This post demonstrates how to build an image-to-Excel converter with the ComPDFKit Java OCR conversion library.
Register for the ComPDFKit API to start with 1,000 free API calls per month. Follow the guide below to build your converter or to replace your existing image-to-Excel conversion technology.
Step 1: Authentication for Converting Image to Excel
Replace the publicKey and secretKey fields in the following code with the values from your console. After authentication, the response returns the accessToken and other verification-related information. The accessToken expires after 12 hours. When calling the image-to-Excel conversion API, you must send this token in the request header: Authorization: Bearer {accessToken}.
import java.io.IOException;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient();
        // The body is JSON, so declare it as such.
        MediaType mediaType = MediaType.parse("application/json");
        RequestBody body = RequestBody.create(mediaType,
            "{\n \"publicKey\": \"{{public_key}}\",\n \"secretKey\": \"{{secret_key}}\"\n}");
        Request request = new Request.Builder()
            .url("https://api-server.compdf.com/server/v1/oauth/token")
            .post(body)
            .build();
        // Print the response, which carries the accessToken; try-with-resources closes it.
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
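Because the accessToken expires after 12 hours, a long-running converter may want to reuse one token and authenticate again only when it is close to expiry. The following is a minimal sketch of that idea, not part of the ComPDFKit API: the class name AccessTokenCache, the five-minute safety margin, and the fetcher (a Supplier that would wrap the Step 1 request) are all assumptions made for illustration.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Hypothetical helper, not part of the ComPDFKit API: caches the accessToken
// and re-runs the authentication call only when the token is about to expire.
final class AccessTokenCache {
    // The guide states that the accessToken expires after 12 hours.
    static final Duration LIFETIME = Duration.ofHours(12);
    // Assumed safety margin so a token is never sent right at its expiry.
    static final Duration MARGIN = Duration.ofMinutes(5);

    private final Supplier<String> fetcher; // would wrap the Step 1 HTTP call
    private final Clock clock;
    private String token;
    private Instant expiresAt = Instant.EPOCH;

    AccessTokenCache(Supplier<String> fetcher, Clock clock) {
        this.fetcher = fetcher;
        this.clock = clock;
    }

    // Returns the cached token, fetching a new one if none exists
    // or if the current one is within MARGIN of its expiry.
    synchronized String get() {
        Instant now = clock.instant();
        if (token == null || !now.isBefore(expiresAt.minus(MARGIN))) {
            token = fetcher.get();
            expiresAt = now.plus(LIFETIME);
        }
        return token;
    }

    // Value for the Authorization request header.
    String authorizationHeader() {
        return "Bearer " + get();
    }
}
```

In production code the clock would typically be Clock.systemUTC(); passing it in keeps the expiry logic easy to test.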
Needed Parameters:
To get an accurate conversion from image to Excel, pass the following parameters. Any parameter you omit falls back to its default.
- contentOptions: Which content to extract (1: only text, 2: only tables, 3: all content). Default: 2.
- worksheetOptions: How to create worksheets (1: one sheet per table, 2: one sheet per page, 3: a single sheet for the whole file). Default: 1.
- isAllowOcr: Whether to enable OCR (1: yes, 0: no). Default: 0.
- isContainOcrBg: Whether to keep the background image when OCR is enabled (1: yes, 0: no). Default: 0.
- isOnlyAiTable: Whether to use AI to recognize tables (1: yes, 0: no). Default: 0.
Example of passing the parameters when converting images to Excel:
{
"contentOptions": "2",
"worksheetOptions": "1",
"isAllowOcr": "0",
"isContainOcrBg": "0",
"isOnlyAiTable": "0"
}
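Rather than writing the parameter JSON by hand, it can be assembled from typed values so that an out-of-range option is caught before any request is sent. This is only a sketch: the class ImageToExcelOptions and its method names are hypothetical, while the keys, value ranges, and defaults are the ones listed above.

```java
// Hypothetical helper: builds the parameter JSON from typed values, starting
// from the documented defaults and rejecting values outside the listed ranges.
final class ImageToExcelOptions {
    private int contentOptions = 2;   // 1: only text, 2: only tables, 3: all content
    private int worksheetOptions = 1; // 1: per table, 2: per page, 3: single sheet
    private boolean allowOcr = false;
    private boolean containOcrBg = false;
    private boolean onlyAiTable = false;

    ImageToExcelOptions contentOptions(int value) {
        contentOptions = requireOneToThree("contentOptions", value);
        return this;
    }

    ImageToExcelOptions worksheetOptions(int value) {
        worksheetOptions = requireOneToThree("worksheetOptions", value);
        return this;
    }

    ImageToExcelOptions allowOcr(boolean value) { allowOcr = value; return this; }
    ImageToExcelOptions containOcrBg(boolean value) { containOcrBg = value; return this; }
    ImageToExcelOptions onlyAiTable(boolean value) { onlyAiTable = value; return this; }

    private static int requireOneToThree(String name, int value) {
        if (value < 1 || value > 3) {
            throw new IllegalArgumentException(name + " must be 1, 2, or 3, got " + value);
        }
        return value;
    }

    // The API example passes every value as a string, so flags become "1" or "0".
    private static String flag(boolean value) {
        return value ? "1" : "0";
    }

    String toJson() {
        return "{\"contentOptions\": \"" + contentOptions + "\", "
                + "\"worksheetOptions\": \"" + worksheetOptions + "\", "
                + "\"isAllowOcr\": \"" + flag(allowOcr) + "\", "
                + "\"isContainOcrBg\": \"" + flag(containOcrBg) + "\", "
                + "\"isOnlyAiTable\": \"" + flag(onlyAiTable) + "\"}";
    }
}
```

For example, new ImageToExcelOptions().contentOptions(3).allowOcr(true).toJson() yields a string that could be passed as the parameter form field when uploading the file.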
Step 2: Create Task
Replace the accessToken placeholder with the token obtained in the previous step. The response data will then contain the taskId.
import java.io.IOException;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient();
        Request request = new Request.Builder()
            .url("https://api-server.compdf.com/server/v1/task/img/xlsx?language={{language}}")
            .get() // OkHttp rejects a GET that carries a request body
            .addHeader("Authorization", "Bearer {{accessToken}}")
            .build();
        // Print the response, which carries the taskId used in the next steps.
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
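The GET requests in this guide (creating the task, starting the execution, and fetching the task information) differ only in their path and query parameters. As a rough sketch, they could share one request builder that also percent-encodes the placeholder values. Note that this example uses the JDK 11+ java.net.http API instead of OkHttp so that it has no external dependency, and that the class ComPdfGet and its methods are hypothetical names; only the URLs and the Authorization header come from the guide.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of ComPDFKit): builds the authenticated GET
// requests used in this guide with the JDK 11+ java.net.http API.
final class ComPdfGet {
    static final String BASE_URL = "https://api-server.compdf.com/server/v1";

    static HttpRequest build(String path, String accessToken, Map<String, String> query) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> entry : query.entrySet()) {
            // Encode values so unusual characters cannot break the URL.
            joined.add(entry.getKey() + "="
                    + URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        String url = BASE_URL + path + (query.isEmpty() ? "" : "?" + joined);
        return HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Authorization", "Bearer " + accessToken)
                .GET() // a GET request carries no body
                .build();
    }

    // Step 2: create an image-to-Excel task.
    static HttpRequest createTask(String accessToken, String language) {
        Map<String, String> query = new LinkedHashMap<>();
        query.put("language", language);
        return build("/task/img/xlsx", accessToken, query);
    }

    // Step 5: fetch the task information.
    static HttpRequest taskInfo(String accessToken, String taskId) {
        Map<String, String> query = new LinkedHashMap<>();
        query.put("taskId", taskId);
        return build("/task/taskInfo", accessToken, query);
    }
}
```

A request built this way can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).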
Step 3: Upload Files
Upload the file you want to convert. Image conversion supports JPG, JPEG, PNG, and BMP formats. Ensure you update the taskId, accessToken, and task parameters as required.
import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient();
        // Multipart form: the image file plus the task fields.
        RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
            .addFormDataPart("file", "{{file}}",
                RequestBody.create(MediaType.parse("application/octet-stream"),
                    new File("<file>")))
            .addFormDataPart("taskId", "{{taskId}}")
            .addFormDataPart("language", "{{language}}")
            .addFormDataPart("password", "")
            .addFormDataPart("parameter", "{ \"contentOptions\": \"2\", \"worksheetOptions\": \"1\"}")
            .build();
        Request request = new Request.Builder()
            .url("https://api-server.compdf.com/server/v1/file/upload")
            .post(body)
            .addHeader("Authorization", "Bearer {{accessToken}}")
            .build();
        // try-with-resources closes the response after printing it.
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
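Since image conversion supports only JPG, JPEG, PNG, and BMP, it can be worth checking the file name locally before spending an API call on the upload. The sketch below makes two assumptions that the guide does not confirm: that the extension is matched case-insensitively, and that checking the file name alone is enough (the server may also inspect the file contents). The class name SupportedImages is hypothetical.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical pre-upload check based on the formats listed in this guide.
final class SupportedImages {
    private static final Set<String> EXTENSIONS = Set.of("jpg", "jpeg", "png", "bmp");

    // Returns true if the file name ends in one of the supported extensions.
    // Assumption: matching is case-insensitive, so "scan.PNG" is accepted.
    static boolean isSupported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension, or the name ends with a dot
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(extension);
    }
}
```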
Step 4: Process Files
Ensure you update the taskId, accessToken, and task parameters as required.
import java.io.IOException;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient();
        Request request = new Request.Builder()
            .url("https://api-server.compdf.com/server/v1/execute/start?taskId={{taskId}}&language={{language}}")
            .get() // OkHttp rejects a GET that carries a request body
            .addHeader("Authorization", "Bearer {{accessToken}}")
            .build();
        // try-with-resources closes the response after printing it.
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
Step 5: Get Task Information
Use the taskId and accessToken obtained in the previous steps to retrieve the task information. This is also where you get the conversion result, an .xlsx file.
import java.io.IOException;
import okhttp3.*;

public class Main {
    public static void main(String[] args) throws IOException {
        OkHttpClient client = new OkHttpClient();
        Request request = new Request.Builder()
            .url("https://api-server.compdf.com/server/v1/task/taskInfo?taskId={{taskId}}")
            .get() // OkHttp rejects a GET that carries a request body
            .addHeader("Authorization", "Bearer {{accessToken}}")
            .build();
        // Print the task information; try-with-resources closes the response.
        try (Response response = client.newCall(request).execute()) {
            System.out.println(response.body().string());
        }
    }
}
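If the conversion has not finished by the time the task information is first requested, the request may need to be repeated. The guide does not describe how the response reports progress, so the sketch below stays generic: the caller supplies both the function that fetches the task information and the test that decides whether it is complete. The class TaskPoller, the attempt limit, and the delay are illustrative assumptions.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper: repeats a lookup until a caller-supplied condition
// holds or the attempt limit is reached.
final class TaskPoller {
    // fetch: would wrap the Step 5 request and return the parsed task information.
    // done:  decides whether that information describes a finished task.
    static <T> Optional<T> pollUntil(Supplier<T> fetch, Predicate<T> done,
                                     int maxAttempts, Duration delay) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T info = fetch.get();
            if (done.test(info)) {
                return Optional.of(info);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop polling.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty(); // gave up without seeing a finished task
    }
}
```

An empty result means the limit was reached, or the thread was interrupted, before the condition held, so the caller can decide whether to retry later or report a timeout.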
Wrapping Up
The ComPDFKit API converts images to Excel effectively because it supports OCR in multiple languages and relies on AI models trained extensively on irregular tables, such as those with partial borders or no borders.
In addition, to minimize manual adjustments to the converted Excel files, we support recognizing and extracting table background colors, attributes, text, and their positions within the table.