Intelligent Document Parsing
Our AI-powered ComIDP solution intelligently processes and comprehends content in documents and images, including reports, contracts, essays, and other standard files. It identifies and classifies elements, modules, and structures while preserving the original reading logic. By structuring unstructured and semi-structured data, it provides accurate data sources for downstream applications.
Output JSON format file
Parse PDF documents into JSON files. For interface details, refer to the ComIDP API.
Parameter executeType uses pdf/json.
Parameter parameter is as follows:
{
  "version": "v2"
}
Required parameters
version: The PDF-to-JSON conversion version (v1 or v2). When v2 is selected, guide document parsing is used. The default is v1.
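The two parameters above can be assembled and validated in code before a request is made. The following is a minimal sketch only: the helper name build_pdf_to_json_request and the choice to return both fields in a single dictionary are illustrative assumptions, and how the fields are actually transmitted is defined by the ComIDP API, not by this example.

```python
import json

# The two versions named in this guide; v1 is the documented default.
SUPPORTED_VERSIONS = ("v1", "v2")


def build_pdf_to_json_request(version: str = "v1") -> dict:
    """Return the executeType/parameter pair for a PDF-to-JSON task.

    Hypothetical helper: the surrounding request format (URL, headers,
    file upload) is not covered here and must come from the API reference.
    """
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(
            f"version must be one of {SUPPORTED_VERSIONS}, got {version!r}"
        )
    return {
        "executeType": "pdf/json",
        "parameter": {"version": version},
    }


if __name__ == "__main__":
    # Show the parameters for a v2 conversion as formatted JSON.
    print(json.dumps(build_pdf_to_json_request("v2"), indent=2))
```

Rejecting unknown versions locally keeps a typo such as "V2" from being sent to the service.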
JSON content explanation
{
  "version": "1.0.0",
  "objects": [
    {
      "type": "Header",
      "rect": [
        49.0,
        43.5,
        171.5,
        76.0
      ],
      "text": "Intelligent Document Parsing",
      "page": 0,
      "order_index": 0
    }
  ]
}
The properties common to all objects are as follows:
rect: The position of the object on the page
page: The page number where the object is located
order_index: The reading order position of the object on the current page
type: Used to identify the type of the object. Currently supported object types are:
- Text: Ordinary text type object, containing text content
- Image: Image type object, containing the path of the image
- Table and UnstdTable: Table type object, containing the content and structure of the table
- Catalogue: Catalogue type object, containing the content of the catalogue
- List and UnorderedList: List type object, containing the content of the list
- Formula: Formula type object, containing the content of the formula
- Header: Header type object, containing the content of the header
- Footer: Footer type object, containing the content of the footer
- PageNumber: Page number type object, containing the content of the page number
- FigureTitle: Figure title type object, containing the content of the figure title
- FigureCaption: Figure caption type object, containing the content of the figure caption
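The common properties are enough to rebuild the reading order of a document and to summarize what it contains. The following is a minimal sketch under stated assumptions: the function names are illustrative, the second object in the sample is made up for demonstration, and sorting by page and then order_index is one reasonable reading of the property descriptions above rather than a documented procedure.

```python
import json
from collections import Counter

# Sample result: the Header object comes from this guide; the Text object
# is an invented example so that the sort has something to reorder.
SAMPLE = """
{
  "version": "1.0.0",
  "objects": [
    {"type": "Text", "rect": [49.0, 90.0, 500.0, 120.0],
     "text": "Body paragraph", "page": 0, "order_index": 1},
    {"type": "Header", "rect": [49.0, 43.5, 171.5, 76.0],
     "text": "Intelligent Document Parsing", "page": 0, "order_index": 0}
  ]
}
"""


def objects_in_reading_order(result: dict) -> list:
    """Sort objects by page, then by their reading position on that page."""
    return sorted(result["objects"], key=lambda o: (o["page"], o["order_index"]))


def count_by_type(result: dict) -> Counter:
    """Count how many objects of each type the result contains."""
    return Counter(o["type"] for o in result["objects"])


if __name__ == "__main__":
    result = json.loads(SAMPLE)
    for obj in objects_in_reading_order(result):
        # Not every object type is guaranteed to carry "text", so use .get().
        print(obj["page"], obj["order_index"], obj["type"], obj.get("text", ""))
    print(dict(count_by_type(result)))
```

Sorting on the (page, order_index) pair is needed because order_index is described as a position on the current page, so it restarts on each page.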
Supported input formats
Supported output formats
- Zip: The zip includes JSON result files and image folders.