
Intelligent Document Parsing

Our AI-powered ComIDP solution processes and interprets the content of documents and images, such as reports, contracts, essays, and other common file types. It identifies and classifies elements, modules, and structures while preserving the original reading order. By turning unstructured and semi-structured content into structured data, it provides accurate input for downstream applications.

Output a JSON file

Parse PDF documents into JSON files. For interface details, refer to the ComIDP API reference.

Set the executeType parameter to pdf/json.

Pass the following value in the parameter field:

json
{
	"version": "v2"
}

Parameters

version: The version of the PDF-to-JSON conversion, either v1 or v2. Selecting v2 applies the intelligent document parsing described in this guide. Defaults to v1.
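The two request values described above can be assembled as in the following minimal Java sketch. The class PdfToJsonRequest and its parameterJson method are hypothetical helpers, not part of the ComIDP API; the sketch only builds the executeType value and the parameter JSON string, applies the documented default of v1, and leaves out how the values are sent, which the ComIDP API reference defines.

```java
import java.util.Set;

// Hypothetical helper (not part of the ComIDP API) that builds the two request
// values from this guide: executeType and the JSON string for the parameter field.
final class PdfToJsonRequest {
    static final String EXECUTE_TYPE = "pdf/json";
    private static final Set<String> VERSIONS = Set.of("v1", "v2");

    // Builds the parameter JSON; a null version falls back to the documented default, v1.
    static String parameterJson(String version) {
        String v = (version == null) ? "v1" : version;
        if (!VERSIONS.contains(v)) {
            throw new IllegalArgumentException("version must be v1 or v2, got: " + v);
        }
        return "{\"version\": \"" + v + "\"}";
    }

    public static void main(String[] args) {
        System.out.println("executeType = " + EXECUTE_TYPE);
        System.out.println("parameter   = " + parameterJson("v2"));
    }
}
```

Validating the version locally is a design choice of the sketch: an unsupported value fails fast on the client instead of in the service response.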

JSON content description

json
{
    "version": "1.0.0",
    "objects": [
        {
            "type": "Header",
            "rect": [
                49.0,
                43.5,
                171.5,
                76.0
            ],
            "text": "Intelligent Document Parsing",
            "page": 0,
            "order_index": 0
        }
    ]
}
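When consuming the result in Java, each entry in objects can be mirrored by a small data type. The sketch below is a hypothetical model (ParsedObject, ParseResult, and ParseResultDemo are not provided by ComIDP) built by hand from the example above. In practice the records would be filled by a JSON library such as Jackson, mapping the order_index key to orderIndex with @JsonProperty. The check that rect always holds four values is an assumption drawn from the example.

```java
import java.util.List;

// Hypothetical model of one entry in "objects"; the JSON key order_index
// becomes orderIndex to follow Java naming conventions.
record ParsedObject(String type, double[] rect, String text, int page, int orderIndex) {
    ParsedObject {
        // Assumption based on the example: rect always holds four numbers.
        if (rect == null || rect.length != 4) {
            throw new IllegalArgumentException("rect must contain exactly 4 values");
        }
    }
}

// Hypothetical model of the top-level result: a version string plus the object list.
record ParseResult(String version, List<ParsedObject> objects) {
    // Returns all objects whose type matches, e.g. "Header" or "Table".
    List<ParsedObject> ofType(String type) {
        return objects.stream().filter(o -> o.type().equals(type)).toList();
    }
}

class ParseResultDemo {
    public static void main(String[] args) {
        // The Header object from the example JSON, constructed by hand.
        ParsedObject header = new ParsedObject(
                "Header",
                new double[] {49.0, 43.5, 171.5, 76.0},
                "Intelligent Document Parsing",
                0,
                0);
        ParseResult result = new ParseResult("1.0.0", List.of(header));
        System.out.println(result.ofType("Header").get(0).text());
    }
}
```

Records require Java 16 or later. The text field appears in the example but is not listed among the common properties, so it may be empty or absent for some object types.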

The properties common to all objects are as follows:

  • rect: The position of the object on the page, given as an array of four coordinates

  • page: The index of the page containing the object; numbering starts at 0

  • order_index: The object's position in the reading order of its page

  • type: Identifies the kind of object. The currently supported types are:

    • Text: Plain text object, containing the text content
    • Image: Image object, containing the path of the image
    • Table and UnstdTable: Table objects, containing the content and structure of the table
    • Catalogue: Catalogue (table of contents) object, containing the content of the catalogue
    • List and UnorderedList: List objects, containing the content of the list
    • Formula: Formula object, containing the content of the formula
    • Header: Header object, containing the content of the page header
    • Footer: Footer object, containing the content of the page footer
    • PageNumber: Page number object, containing the page number
    • FigureTitle: Figure title object, containing the text of the figure title
    • FigureCaption: Figure caption object, containing the text of the figure caption
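Because order_index only orders objects within a single page, reading the whole document in order means sorting by page first and by order_index second. The following sketch does that and, as one possible post-processing choice, drops Header, Footer, and PageNumber objects to keep only body content. The Item record and ReadingOrder class are hypothetical, and which types to drop depends on the downstream application rather than on the API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for one parsed object; only the fields used here are kept.
record Item(String type, String text, int page, int orderIndex) {}

class ReadingOrder {
    // Page furniture a consumer may choose to drop (an application choice, not an API rule).
    static final Set<String> PAGE_FURNITURE = Set.of("Header", "Footer", "PageNumber");

    // Drops page furniture, then sorts by page and by order_index within each page.
    static List<Item> bodyInReadingOrder(List<Item> items) {
        return items.stream()
                .filter(i -> !PAGE_FURNITURE.contains(i.type()))
                .sorted(Comparator.comparingInt(Item::page).thenComparingInt(Item::orderIndex))
                .collect(Collectors.toList());
    }

    // Concatenates the text of the given items, one item per line.
    static String joinText(List<Item> items) {
        return items.stream().map(Item::text).collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                new Item("Text", "Second paragraph", 0, 2),
                new Item("Header", "Intelligent Document Parsing", 0, 0),
                new Item("Text", "Next page", 1, 0),
                new Item("Text", "First paragraph", 0, 1),
                new Item("PageNumber", "1", 0, 3));
        System.out.println(joinText(bodyInReadingOrder(items)));
    }
}
```

Running main prints "First paragraph", "Second paragraph", and "Next page" on separate lines: the Header and PageNumber objects are filtered out, and the page 1 object follows all page 0 objects regardless of its order_index.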

Supported input formats

  • PDF

Supported output formats

  • ZIP: The archive contains the JSON result files and the image folders.
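A client can sort the contents of the returned ZIP with the standard java.util.zip package. This sketch assumes JSON results are recognized by a .json extension and treats every other file entry as an extracted image. The guide does not specify entry names or folder layout, so the ResultArchive class and that rule are assumptions.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: splits the entries of a result ZIP into JSON result
// files and other files (assumed here to be extracted images).
final class ResultArchive {
    final List<String> jsonFiles = new ArrayList<>();
    final List<String> otherFiles = new ArrayList<>();

    static ResultArchive scan(InputStream in) throws IOException {
        ResultArchive archive = new ResultArchive();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // folder entries carry no content of their own
                }
                String name = entry.getName();
                // Assumption: JSON results are recognized by their .json extension.
                if (name.toLowerCase(Locale.ROOT).endsWith(".json")) {
                    archive.jsonFiles.add(name);
                } else {
                    archive.otherFiles.add(name);
                }
            }
        }
        return archive;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the downloaded result archive
        try (InputStream in = new FileInputStream(args[0])) {
            ResultArchive archive = scan(in);
            System.out.println("JSON files:  " + archive.jsonFiles);
            System.out.println("Other files: " + archive.otherFiles);
        }
    }
}
```

Pass the path of the downloaded archive as the first argument. Reading from an InputStream rather than a file path keeps scan usable for archives held in memory as well.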