How to Extract Table from PDF to JSON with ComPDFKit

Sierra Nakamura | Wed. 09 Jul. 2025

CONTENTS

Method 1: Convert PDF to JSON Using Python

Method 2: Extract PDF Tables to JSON With ComPDFKit

In today’s data-driven world, extracting tabular data from PDFs is essential for businesses, but the PDF format’s structure often makes data extraction challenging. JSON, a lightweight and flexible format, is ideal for storing and analyzing this data. This article will show you how to use ComPDFKit to efficiently extract tables from PDFs into JSON format, enabling easy access and editing of table information.

Method 1: Convert PDF to JSON Using Python

ComPDFKit provides secure and reliable PDF APIs that help developers integrate PDF features into their applications. The ComPDFKit API also offers a 30-day free trial that lets you process over 1,000 documents to test your project needs. Its API documentation includes tutorials on extracting PDF tables to JSON in several languages, such as Java, PHP, and Python.

This article will show you how to use Python to extract tables from PDFs to JSON.

Step 1: Register and authenticate

Sign up online for a free ComPDFKit API account to get 30 days of free processing for over 1,000 documents while you test your project.

The ComPDFKit API uses JSON Web Tokens (JWT) for secure authentication. Get your Public Key and Secret Key from the control panel, then use them to create an authenticated client as follows.

# Create a client
client = CPDFClient(public_key, secret_key)
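Since the Public Key and Secret Key are credentials, it is usually safer to keep them out of source code. The following is a minimal sketch, assuming the keys are stored in environment variables; the variable names COMPDFKIT_PUBLIC_KEY and COMPDFKIT_SECRET_KEY are hypothetical and not part of the ComPDFKit API.

```python
import os


def load_api_keys(env=os.environ):
    """Return (public_key, secret_key) read from environment variables.

    The variable names below are hypothetical; use whatever fits your deployment.
    """
    names = ("COMPDFKIT_PUBLIC_KEY", "COMPDFKIT_SECRET_KEY")
    values = [env.get(name) for name in names]
    # Collect the names of variables that are unset or empty.
    missing = [name for name, value in zip(names, values) if not value]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return values[0], values[1]
```

The returned pair can then be passed to the client shown above, for example public_key, secret_key = load_api_keys() followed by CPDFClient(public_key, secret_key).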

Step 2: Create a PDF to JSON task

Select the PDF to JSON tool and create a task with the client you authenticated in Step 1. You can also set the display language of error messages to your preferred language. The response data includes the task ID (taskId) used in the following steps.

# Create a PDF to JSON task
create_task_result = client.create_task(CPDFConversionEnum.PDF_TO_JSON)

# Get a task id
task_id = create_task_result.task_id

Step 3: Upload files

Upload the PDF file from which you need to extract the table as JSON, and bind it to the task ID.

# Upload files
# file_parameter carries the conversion options; it is defined in the complete sample below
client.upload_file('test.pdf', task_id, file_parameter)

Note: In the file parameters, set the option that controls which content is extracted to "1".
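Before uploading, it can help to confirm that the local file exists and actually is a PDF, so that an obviously wrong file does not use up part of your document quota. This is a minimal sketch that relies only on the Python standard library; the helper name looks_like_pdf is an illustration, not part of the ComPDFKit library.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file exists and starts with the PDF header marker."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open("rb") as handle:
        # A well-formed PDF file starts with the bytes "%PDF-" followed by a version.
        return handle.read(5) == b"%PDF-"
```

A call such as looks_like_pdf('test.pdf') can guard the upload_file call shown above.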

Step 4: Convert PDF table to JSON

After the file is uploaded, execute the task by its task ID to convert the PDF document to JSON.

# Execute task
client.execute_task(task_id)

Step 5: Get task information

Query the task status and file-related metadata by task ID.

# Query TaskInfo
task_info = client.get_task_info(task_id)
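A conversion may still be running when the task information is first queried, so a caller will often repeat the query until the task reaches a final state. The sketch below is a generic polling helper, not part of the ComPDFKit library. It takes a fetch_status callable and sets of status values as arguments, because the exact field name and status strings returned by get_task_info are not shown in this article and should be taken from the API documentation.

```python
import time


def wait_for_task(fetch_status, done_states, failed_states=(),
                  interval=2.0, timeout=120.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() repeatedly until it returns a terminal state.

    fetch_status is any zero-argument callable that returns the current
    status; the status values themselves are supplied by the caller.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in done_states:
            return status
        if status in failed_states:
            raise RuntimeError(f"Task failed with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"Task still {status!r} after {timeout} seconds")
        # Pause before the next query to avoid flooding the service.
        sleep(interval)
```

In practice, fetch_status would wrap client.get_task_info(task_id) and return whichever status field the response exposes.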

Your complete code should now look like the following sample:

# Create a client
client = CPDFClient(public_key, secret_key)

# Create a PDF to JSON task
create_task_result = client.create_task(CPDFConversionEnum.PDF_TO_JSON)

# Get a task id
task_id = create_task_result.task_id

# File handling parameter settings
file_parameter = PDFToJSONParameter()
# Allow OCR
file_parameter.is_allow_ocr = file_parameter.ALLOW_OCR
# Content type to extract
file_parameter.type = file_parameter.TYPE_TEXT

# Upload files
client.upload_file('test.pdf', task_id, file_parameter)

# Execute task
client.execute_task(task_id)

# Query TaskInfo
task_info = client.get_task_info(task_id)
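If you convert more than one file, the four calls above can be grouped into a single function. The following is a minimal sketch that uses only the client methods shown in this article (create_task, upload_file, execute_task, get_task_info). The function name extract_to_json is hypothetical, and the client, conversion type, and file parameter are passed in as arguments so the sketch makes no assumptions about how the library is imported.

```python
def extract_to_json(client, conversion_type, pdf_path, file_parameter):
    """Run the create / upload / execute / query sequence and return the task info."""
    # Create the task and keep its ID for the following calls.
    create_task_result = client.create_task(conversion_type)
    task_id = create_task_result.task_id
    # Bind the file and its conversion options to the task.
    client.upload_file(pdf_path, task_id, file_parameter)
    # Start the conversion, then fetch the task status and file metadata.
    client.execute_task(task_id)
    return client.get_task_info(task_id)
```

With the objects from the sample above, the call would be task_info = extract_to_json(client, CPDFConversionEnum.PDF_TO_JSON, 'test.pdf', file_parameter).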

To convert PDF to JSON in other programming languages, see the ComPDFKit API libraries. ComPDFKit also offers comprehensive guides for other PDF features, including document editing and annotations.

Method 2: Extract PDF Tables to JSON With ComPDFKit

ComPDFKit's data extraction feature uses advanced table algorithms to accurately identify and extract both standard and non-standard tables from PDFs, with support for outputting data in formats such as JSON, XML, and CSV. Today, we will demonstrate how to extract tables from a PDF to JSON on the Windows platform.

1. Set the input file path, the output folder path, and the output file name, then create a converter object that extracts PDF tables as JSON.

string inputFilePath = "***";
string outputFolderPath = "***";
string outputFileName = "***";
CPDFConverterJsonTable converter = CPDFConvertFactroy.CreateConverter(CPDFConvertType.CPDFConvertTypeJsonTable, inputFilePath) as CPDFConverterJsonTable;

2. Set extraction options, including whether to enable OCR and AI layout recognition.

CPDFConvertJsonOptions jsonOptions = new CPDFConvertJsonOptions();
jsonOptions.IsAllowOCR = false;
jsonOptions.IsAILayoutAnalysis = false;

3. Call the Convert method to start extracting tables from the PDF.

ConvertError error = ConvertError.ERR_UNKNOWN;
converter.Convert(outputFolderPath, ref outputFileName, jsonOptions, ref error);

When you are finished, your complete code should look like the following example:

string inputFilePath = "***";
string outputFolderPath = "***";
string outputFileName = "***";
CPDFConverterJsonTable converter = CPDFConvertFactroy.CreateConverter(CPDFConvertType.CPDFConvertTypeJsonTable, inputFilePath) as CPDFConverterJsonTable;
CPDFConvertJsonOptions jsonOptions = new CPDFConvertJsonOptions();
jsonOptions.IsAllowOCR = false;
jsonOptions.IsAILayoutAnalysis = false;
ConvertError error = ConvertError.ERR_UNKNOWN;
converter.Convert(outputFolderPath, ref outputFileName, jsonOptions, ref error);
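Once the conversion has written its JSON file, a quick way to get familiar with the result is to print an outline of its keys and value types. The sketch below is written in Python for consistency with Method 1 and deliberately assumes nothing about the output schema, which this article does not describe. It assumes only that the file is valid UTF-8 JSON, and the function names are illustrative.

```python
import json


def describe_structure(value, depth=0, max_depth=3):
    """Return indented lines summarising the keys and value types of parsed JSON."""
    indent = "  " * depth
    lines = []
    if isinstance(value, dict):
        for key, child in value.items():
            lines.append(f"{indent}{key}: {type(child).__name__}")
            if depth + 1 < max_depth:
                lines.extend(describe_structure(child, depth + 1, max_depth))
    elif isinstance(value, list) and value:
        # Only the first element is inspected, as a representative sample.
        lines.append(f"{indent}[{len(value)} items], first: {type(value[0]).__name__}")
        if depth + 1 < max_depth:
            lines.extend(describe_structure(value[0], depth + 1, max_depth))
    return lines


def describe_json_file(path):
    """Load a JSON file and return its structural outline as a list of lines."""
    with open(path, encoding="utf-8") as handle:
        return describe_structure(json.load(handle))
```

Printing "\n".join(describe_json_file(path)) for the generated file shows where the table data sits before you write code that depends on specific keys.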

Here are two examples of table recognition results with ComPDFKit:

1. Standard Table

Recognition result for standard tables with complete and clear table borders and inner lines:

[Image: extract-standard-table-from-pdf]

2. Non-standard Table

Recognition result for non-standard tables with missing table borders or inner lines, or with unclear table lines:

[Image: extract-non-standard-table-from-pdf]

In addition, the ComPDFKit Data Extraction guide provides complete tutorials and code examples for extracting PDF tables to JSON on more platforms. You can test ComPDFKit in your own project with a 30-day free trial, available online.

Final Words

This guide shows you how easy it is to use ComPDFKit to extract PDF tables as JSON for storing and editing table data.

ComPDFKit offers responsive services to users, including 24/5 online technical support, unlimited error requests, and remote assistance to address any technical issues and ensure smooth project progress.

Additionally, you can instantly try out the ComPDFKit online tools to extract PDF documents into JSON format for free or convert PDF tables into other editable text formats, such as Excel spreadsheets!
