PDF Conversion
To convert a PDF file to an Office or other format, send a POST request to the /file/handle endpoint of the processor, including the PDF file as input along with the file processing parameters. Before you begin, make sure ComPDFKit Processor is running.
For more information about multipart requests, refer to the API section.
Convert using local PDF file
Send a multipart request to /file/handle and attach the PDF file:
curl -f -X POST http://localhost:7000/file/handle \
-H "Content-Type: multipart/form-data" \
-F file=@"document.pdf" \
-F executeType="pdf/docx" \
-F password="file open password" \
-F parameter="{ \"isContainAnnot\": \"1\", \"isContainImg\": \"1\", \"wordLayoutMode\": \"1\"}" \
> result.docx
PDF Conversion Parameters
This section describes the file processing parameters supported by ComPDFKit Processor. Format-specific parameters are available for PDF to Word, Excel, PPT, HTML, RTF, PNG, JPG, CSV, and JSON. For other functions, the parameter setting can be omitted, and default parameters are used for document processing.
PDF to Word
Note: Different parameters can be used when uploading files for different functions. The rest of the steps remain the same.
PDF to Word:
{
"isContainAnnot": "1",
"isContainImg": "1",
"wordLayoutMode": "1",
"isAllowOcr": "0",
"isContainOcrBg": "0",
"isOnlyAiTable": "0"
}
Required parameters
isContainAnnot: Whether to include comments (1: yes, 0: no).
isContainImg: Whether to include images (1: yes, 0: no).
wordLayoutMode: Layout mode (1: flow layout; 2: flow layout with table support; 3: box layout). Default: 1.
isAllowOcr: Whether to enable OCR (1: yes, 0: no). Default: 0.
isContainOcrBg: Whether to keep the background image when OCR is enabled (1: yes, 0: no). Default: 0.
isOnlyAiTable: Whether to enable AI table recognition (1: yes, 0: no). Default: 0.
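As a minimal sketch of how the parameters listed above could be passed to the endpoint, the bash helpers below build the parameter JSON and post a local PDF. The function names word_params and pdf_to_word are hypothetical, and the host, port, and executeType value are taken from the curl example earlier in this document; adjust them to your deployment.

```shell
# Build the JSON "parameter" value for PDF to Word.
# Optional arguments: wordLayoutMode, isAllowOcr, isContainOcrBg
# (defaults follow the defaults documented above).
word_params() {
  local layout="${1:-1}" ocr="${2:-0}" ocr_bg="${3:-0}"
  printf '{"isContainAnnot":"1","isContainImg":"1","wordLayoutMode":"%s","isAllowOcr":"%s","isContainOcrBg":"%s","isOnlyAiTable":"0"}' \
    "$layout" "$ocr" "$ocr_bg"
}

# Hypothetical wrapper: posts a local PDF and writes the response to a file.
# Assumes the processor listens on localhost:7000, as in the earlier example.
# $1: input PDF path, $2: output DOCX path
pdf_to_word() {
  local input="$1" output="$2"
  curl -f -X POST http://localhost:7000/file/handle \
    -F file=@"$input" \
    -F executeType="pdf/docx" \
    -F parameter="$(word_params 2)" \
    -o "$output"
}
```

With a running processor, calling pdf_to_word document.pdf result.docx would request a conversion using flow layout with table support (wordLayoutMode 2).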
PDF to Excel
Note: Different parameters can be used when uploading files for different functions. The rest of the steps remain the same.
PDF to Excel:
{
"contentOptions": "2",
"worksheetOptions": "1",
"isContainAnnot": "1",
"isContainImg": "1",
"isAllowOcr": "0",
"isOnlyAiTable": "0"
}
Required parameters
contentOptions: Content extraction options (1: text only, 2: charts only, 3: all content).
worksheetOptions: Options for creating worksheets (1: ForEachTable, 2: ForEachPage, 3: ForTheDocument).
isContainAnnot: Whether to include comments (1: yes, 0: no).
isContainImg: Whether to include images (1: yes, 0: no).
isAllowOcr: Whether to enable OCR (1: yes, 0: no). Default: 0.
isOnlyAiTable: Whether to enable AI table recognition (1: yes, 0: no). Default: 0.
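Because worksheetOptions is passed as a numeric code, a small lookup can make calling scripts easier to read. The sketch below maps the option names listed above to their codes and builds the Excel parameter JSON. The function names are hypothetical, and the executeType value for Excel is not given in this section, so only the parameter string is built here.

```shell
# Map a worksheet option name (as listed above) to its numeric code.
worksheet_code() {
  case "$1" in
    ForEachTable)   echo 1 ;;
    ForEachPage)    echo 2 ;;
    ForTheDocument) echo 3 ;;
    *) echo "unknown worksheet option: $1" >&2; return 1 ;;
  esac
}

# Build the JSON "parameter" value for PDF to Excel.
# $1: contentOptions code (1-3), $2: worksheet option name
excel_params() {
  local content="$1" sheet
  # Stop early if the worksheet option name is not recognized.
  sheet="$(worksheet_code "$2")" || return 1
  printf '{"contentOptions":"%s","worksheetOptions":"%s","isContainAnnot":"1","isContainImg":"1","isAllowOcr":"0","isOnlyAiTable":"0"}' \
    "$content" "$sheet"
}
```

The resulting string can be passed to curl as -F parameter="$(excel_params 3 ForEachPage)".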
PDF to PPT
Note: Different parameters can be used when uploading files for different functions. The rest of the steps remain the same.
PDF to PPT:
{
"isContainAnnot": "1",
"isContainImg": "1",
"isAllowOcr": "0",
"isContainOcrBg": "0",
"isOnlyAiTable": "0"
}
Required parameters
isContainAnnot: Whether to include comments (1: yes, 0: no).
isContainImg: Whether to include images (1: yes, 0: no).
isAllowOcr: Whether to enable OCR (1: yes, 0: no). Default: 0.
isContainOcrBg: Whether to keep the background image when OCR is enabled (1: yes, 0: no). Default: 0.
isOnlyAiTable: Whether to enable AI table recognition (1: yes, 0: no). Default: 0.
PDF to HTML
Note: Different parameters can be used when uploading files for different functions. The rest of the steps remain the same.
PDF to HTML:
{
"pageOptions": "2",
"isContainAnnot": "1",
"isContainImg": "1",
"isAllowOcr": "0",
"isContainOcrBg": "0",
"isOnlyAiTable": "0"
}
Required parameters
pageOptions: HTML page options (1: SinglePage, 2: SinglePageNavigationByBookmarks, 3: MultiplePages, 4: MultiplePagesSplitByBookmarks).
isContainAnnot: Whether to include comments (1: yes, 0: no).
isContainImg: Whether to include images (1: yes, 0: no).
isAllowOcr: Whether to enable OCR (1: yes, 0: no). Default: 0.
isContainOcrBg: Whether to keep the background image when OCR is enabled (1: yes, 0: no). Default: 0.
isOnlyAiTable: Whether to enable AI table recognition (1: yes, 0: no). Default: 0.
PDF to RTF
Note: Different parameters can be used when uploading files for each specific function. The other steps remain consistent.
PDF to RTF:
{
"isContainAnnot": "1",
"isContainImg": "1",
"isAllowOcr": "0",
"isContainOcrBg": "0"
}
Required parameters
isContainAnnot: Whether to include comments (1: yes, 0: no).
isContainImg: Whether to include images (1: yes, 0: no).
isAllowOcr: Whether to enable OCR (1: yes, 0: no). Default: 0.
isContainOcrBg: Whether to keep the background image when OCR is enabled (1: yes, 0: no). Default: 0.
PDF to JPG
Note: Different parameters can be used when uploading files for each specific function. The other steps remain consistent.
PDF to JPG:
{
"imgDpi": "300"
}
Required parameters
imgDpi: Image resolution in DPI (dots per inch). Valid range: 72-1500. Default: 300.
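Since imgDpi has a documented range, a script can check the value before sending the request. The following is a minimal sketch with a hypothetical helper named jpg_params; how the processor itself responds to out-of-range values is not covered here.

```shell
# Build the JSON "parameter" value for PDF to JPG.
# $1: DPI (optional, defaults to 300 as documented above)
jpg_params() {
  local dpi="${1:-300}"
  # Reject anything that is not a plain integer of at most four digits.
  case "$dpi" in
    ''|*[!0-9]*|?????*)
      echo "imgDpi must be an integer between 72 and 1500" >&2
      return 1 ;;
  esac
  # Enforce the documented range of 72-1500.
  if [ "$dpi" -lt 72 ] || [ "$dpi" -gt 1500 ]; then
    echo "imgDpi must be between 72 and 1500" >&2
    return 1
  fi
  printf '{"imgDpi":"%s"}' "$dpi"
}
```

Validating on the client side avoids uploading a large PDF only to have the request rejected or processed with an unintended resolution.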
PDF to CSV
Note: You can use specific parameters for each functionality when uploading files, while the other steps remain the same.
PDF to CSV:
{
"isCsvMerge": "1",
"isOnlyAiTable": "0"
}
Required parameters
isCsvMerge: Whether to merge CSV files (1: yes, 0: no).
- When isCsvMerge is set to 1, the returned file is in .csv format.
- When isCsvMerge is set to 0, the returned file is in .zip format.
isOnlyAiTable: Whether to enable AI table recognition (1: yes, 0: no). Default: 0.
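Because the returned file type depends on isCsvMerge, a script that saves the response needs to choose the output extension accordingly. A minimal sketch, using a hypothetical helper named csv_output_ext:

```shell
# Return the file extension of the response for a given isCsvMerge value.
csv_output_ext() {
  case "$1" in
    1) echo csv ;;   # merged result: a single .csv file
    0) echo zip ;;   # unmerged result: a .zip archive
    *) echo "isCsvMerge must be 0 or 1" >&2; return 1 ;;
  esac
}

# Example: choose the output file name before calling the endpoint.
# out="tables.$(csv_output_ext 1)"   # tables.csv
```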
PDF to JSON
Note: You can use specific parameters for each functionality when uploading files, while the other steps remain the same.
PDF to JSON:
{
"type": "0",
"isAllowOcr": "0",
"extractTextMode": "0"
}
Required parameters
type: The content to extract when converting PDF to JSON (0: all text outside tables; 1: all tables, including the text inside them; 2: all content). Default: 0.
isAllowOcr: Whether to enable OCR (1: yes, 0: no). Default: 0.
extractTextMode: When type is "2", how the text is extracted (0: by line, 1: by paragraph). Default: 0. Extraction by paragraph is currently not available when OCR is enabled.
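The restriction on combining OCR with paragraph extraction can be checked before the request is sent. The sketch below uses a hypothetical helper named json_params and simply refuses that combination. This is a client-side guard only and says nothing about how the processor handles such a request.

```shell
# Build the JSON "parameter" value for PDF to JSON.
# $1: type (default 0), $2: isAllowOcr (default 0), $3: extractTextMode (default 0)
json_params() {
  local kind="${1:-0}" ocr="${2:-0}" mode="${3:-0}"
  # Paragraph extraction (mode 1) is documented as unavailable with OCR enabled.
  if [ "$ocr" = "1" ] && [ "$mode" = "1" ]; then
    echo "extractTextMode 1 cannot be combined with isAllowOcr 1" >&2
    return 1
  fi
  printf '{"type":"%s","isAllowOcr":"%s","extractTextMode":"%s"}' \
    "$kind" "$ocr" "$mode"
}
```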
For an explanation of the fields in the JSON output, see PDF数据提取 JSON格式说明.pdf (PDF Data Extraction: JSON Format Description).