About PDF Extract

ComPDFKit provides a PDF extract API, PDF extract SDKs, and a PDF extract processor (on-premise). These products use ComPDFKit's core data extraction and AI technology to extract the content and structural information from PDF documents (including scanned files) and output it in various structured formats, such as JSON, XML, CSV, Excel, HTML, TXT, and PNG. The PDF data extraction function can extract text, tables, and images as follows:

  • PDF Text Extraction: Extract text content from PDF files, including the page, content type (paragraphs, headings, lists, etc.), location information, fonts, styles, and other text formatting information.
  • PDF Table Extraction: Extract PDF tables and analyze the content of each cell and table formatting information. This feature recognizes all types of tables, both structured and unstructured. The parsed data can be generated in JSON format, or optionally output as XML, CSV, and XLSX files.
  • PDF Image Extraction: Objects recognized as graphics or images will be extracted as PNG files.

In addition to the above types of content extraction, the ComPDFKit Data Extraction API also captures document structure information, such as the natural reading order of the various extracted elements and the layout of the elements on each given page.
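
As an illustration of how the extracted structure might be consumed, the following minimal sketch groups elements by page and sorts them by reading order. The JSON layout and every field name in it (`elements`, `type`, `page`, `order`, `cells`, `file`) are assumptions made for this example, not the documented ComPDFKit output schema.

```python
import json
from collections import defaultdict

# Hypothetical extraction result. The field names are assumptions for
# illustration only, not the documented ComPDFKit JSON schema.
SAMPLE_RESULT = """
{
  "elements": [
    {"type": "table", "page": 1, "order": 2,
     "cells": [["Item", "Qty"], ["Pen", "3"]]},
    {"type": "text", "page": 1, "order": 1,
     "role": "heading", "text": "Invoice"},
    {"type": "image", "page": 2, "order": 1, "file": "image_0.png"}
  ]
}
"""


def group_by_page(result_json):
    """Return {page number: [elements sorted by reading order]}."""
    data = json.loads(result_json)
    pages = defaultdict(list)
    for element in data["elements"]:
        pages[element["page"]].append(element)
    for elements in pages.values():
        elements.sort(key=lambda e: e["order"])
    return dict(pages)


def table_to_csv_rows(table):
    """Join each row of a table element with commas (no quoting applied)."""
    return [",".join(row) for row in table["cells"]]


pages = group_by_page(SAMPLE_RESULT)
for page_number in sorted(pages):
    for element in pages[page_number]:
        print(page_number, element["order"], element["type"])
```

Sorting on an explicit order field keeps downstream steps such as republishing or indexing independent of the order in which elements happen to appear in the file.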

Developers can use the extracted information for secondary development in areas such as content republishing, content processing, data analysis, content aggregation, management, and search. The ComPDFKit Data Extraction SDK is available for Windows, Android, iOS, and Mac platforms, as well as for a variety of development languages (C++, Java, Python), giving developers many deployment options for embedding the SDK into their programs.

Solutions

Integrate ComPDFKit SDK for PDF Data Extraction

ComPDFKit SDK is a library of high-performance development tools that can be used to extract data from PDF files and convert them to various file formats. It can also directly export or save the extracted data in various formats for subsequent development. Refer to the SDK guides for details on using the SDK for data extraction.

Use ComPDFKit API to call the PDF Data Extraction Interface

We also provide the ComPDFKit API, which follows the RESTful API standard, for developers to call the PDF data extraction interface. ComPDFKit API provides you with simple document upload, document processing, and file download workflows to extract data from PDFs. You can refer to ComPDFKit API documentation for relevant information.
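
The upload, processing, and download workflow described above can be sketched as follows. This is only an outline under stated assumptions: the paths (`/upload`, `/extract`, `/tasks/...`, `/download/...`), the response fields, and the `send` transport function are all hypothetical placeholders, so consult the ComPDFKit API documentation for the real endpoints and authentication. An in-memory `fake_send` stands in for the network so the sketch runs on its own.

```python
import time


def extract_pdf(send, pdf_bytes, output_format="json",
                poll_seconds=0.0, max_polls=10):
    """Upload a PDF, start an extraction task, poll it, download the result.

    `send(method, path, payload)` is a caller-supplied transport that returns
    a dict. All paths and field names below are hypothetical.
    """
    upload = send("POST", "/upload", {"file": pdf_bytes})
    task = send("POST", "/extract",
                {"file_id": upload["file_id"], "format": output_format})
    task_path = "/tasks/" + task["task_id"]
    for _ in range(max_polls):
        status = send("GET", task_path, {})
        if status["state"] == "done":
            result = send("GET", "/download/" + task["task_id"], {})
            return result["content"]
        if status["state"] == "failed":
            raise RuntimeError("extraction task failed")
        time.sleep(poll_seconds)
    raise TimeoutError("extraction task did not finish in time")


def fake_send(method, path, payload):
    """In-memory stand-in for an HTTP client, used only for this demo."""
    if path == "/upload":
        return {"file_id": "f1"}
    if path == "/extract":
        return {"task_id": "t1"}
    if path == "/tasks/t1":
        return {"state": "done"}
    if path == "/download/t1":
        return {"content": b'{"elements": []}'}
    raise ValueError("unexpected request: " + method + " " + path)


content = extract_pdf(fake_send, b"%PDF-1.7 sample bytes")
print(content)
```

Passing the transport in as a function keeps the three-step flow separate from any particular HTTP library, and lets the flow be exercised without network access.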

Deploy ComPDFKit Processor for PDF Data Extraction

ComPDFKit Processor is an SDK for converting PDF files on the Linux platform. It provides developers with a rich API, including data extraction functions, and can be deployed on your private server to ensure data security.

Contact Information

Contact ComPDFKit:

Thanks, The ComPDFKit Team