PDF Conversion

To convert a PDF file to an Office document or another format, send a request to /file/handle that includes the PDF file and the file processing parameters. Before you begin, make sure ComPDFKit Processor is running.

Conversion requests are multipart POST requests to the processor's /file/handle endpoint. For more information about multipart requests, see the API section.

Convert Using a Local PDF File

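Send a multipart request to /file/handle with the PDF file attached. The password field holds the PDF's open password and is only relevant for password-protected files: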

shell
curl -f -X POST http://localhost:7000/file/handle \
-H "Content-Type: multipart/form-data" \
-F file=@"document.pdf" \
-F executeType="pdf/docx" \
-F password="file open password" \
-F parameter='{"isContainAnnot": "1", "isContainImg": "1", "wordLayoutMode": "1", "isAllowOcr": "0", "isContainOcrBg": "0", "isOnlyAiTable": "0"}' \
> result.docx
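If you run conversions from scripts, it can be convenient to wrap the request above in a function. The sketch below assumes bash and curl. The names convert_pdf and PROCESSOR_URL are hypothetical and are not part of ComPDFKit Processor. Because curl -f exits with an error on HTTP failures, the function downloads to a temporary file and moves it into place only on success. As a result, a failed conversion does not leave a truncated result behind.

```shell
#!/usr/bin/env bash
# Minimal sketch of a reusable wrapper around POST /file/handle.
# convert_pdf and PROCESSOR_URL are hypothetical names chosen for this example.
PROCESSOR_URL="${PROCESSOR_URL:-http://localhost:7000}"

# Usage: convert_pdf INPUT.pdf EXECUTE_TYPE PARAMETER_JSON OUTPUT [PASSWORD]
convert_pdf() {
  local input="$1" execute_type="$2" parameter="$3" output="$4" password="${5:-}"
  if [ ! -f "$input" ]; then
    echo "input file not found: $input" >&2
    return 1
  fi

  # curl -F builds the multipart/form-data body and sets the Content-Type header itself.
  local args=(-f -sS -X POST "$PROCESSOR_URL/file/handle"
    -F "file=@$input"
    -F "executeType=$execute_type"
    -F "parameter=$parameter")
  # Send the open password only for password-protected PDFs.
  if [ -n "$password" ]; then
    args+=(-F "password=$password")
  fi

  # Download to a temporary file so a failed request leaves no partial output.
  local tmp
  tmp="$(mktemp)" || return 1
  if curl "${args[@]}" -o "$tmp"; then
    mv "$tmp" "$output"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For example, `convert_pdf document.pdf pdf/docx "$PARAMS" result.docx` runs the same conversion as the curl command above. Here PARAMS holds one of the parameter objects described in the next section.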

PDF Conversion Parameters

This section describes the file processing parameters supported by ComPDFKit Processor. Dedicated parameters are available for conversion to Word, Excel, PPT, HTML, RTF, PNG, JPG, CSV, and JSON. For other conversions, you can omit the parameters, and the processor uses its default settings.

PDF to Word

Note: Each conversion type accepts its own set of parameters. The rest of the request is the same for every conversion.

PDF to Word:

json
{    
  "isContainAnnot": "1",  
  "isContainImg": "1",
  "wordLayoutMode": "1",
  "isAllowOcr": "0",
  "isContainOcrBg": "0",
  "isOnlyAiTable": "0"
}

Required parameters

isContainAnnot: Whether to include comments (1: yes, 0: no).

isContainImg: Whether to include images (1: yes, 0: no).

wordLayoutMode: Layout mode (1: flow layout; 2: flow layout with table support; 3: box layout). Default: 1.

isAllowOcr: Whether to enable OCR (1: yes, 0: no). Default: 0.

isContainOcrBg: Whether to keep the background image when OCR is enabled (1: yes, 0: no). Default: 0.

isOnlyAiTable: Whether to use AI-based table recognition (1: yes, 0: no). Default: 0.
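All values are sent as strings, and each parameter accepts only a few codes, so a typo is easy to miss. The following minimal bash sketch validates the codes listed above and then prints the parameter JSON. The helpers word_params and check_opt are hypothetical. The defaults mirror the sample object, not documented server defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if VALUE is one of the allowed codes.
check_opt() {
  local name="$1" value="$2" allowed
  shift 2
  for allowed in "$@"; do
    [ "$value" = "$allowed" ] && return 0
  done
  echo "invalid $name: '$value' (allowed: $*)" >&2
  return 1
}

# Usage: word_params [ANNOT] [IMG] [LAYOUT] [OCR] [OCR_BG] [AI_TABLE]
# Omitted arguments fall back to the values in the sample object.
word_params() {
  local annot="${1:-1}" img="${2:-1}" layout="${3:-1}"
  local ocr="${4:-0}" ocr_bg="${5:-0}" ai_table="${6:-0}"
  check_opt isContainAnnot "$annot" 0 1 || return 1
  check_opt isContainImg "$img" 0 1 || return 1
  check_opt wordLayoutMode "$layout" 1 2 3 || return 1
  check_opt isAllowOcr "$ocr" 0 1 || return 1
  check_opt isContainOcrBg "$ocr_bg" 0 1 || return 1
  check_opt isOnlyAiTable "$ai_table" 0 1 || return 1
  printf '{"isContainAnnot":"%s","isContainImg":"%s","wordLayoutMode":"%s","isAllowOcr":"%s","isContainOcrBg":"%s","isOnlyAiTable":"%s"}\n' \
    "$annot" "$img" "$layout" "$ocr" "$ocr_bg" "$ai_table"
}
```

You could then pass the result directly. For example, `-F parameter="$(word_params 1 1 2)"` requests flow layout with table support.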

PDF to Excel

PDF to Excel:

json
{
    "contentOptions": "2",
    "worksheetOptions": "1",
    "isContainAnnot": "1",
    "isContainImg": "1",
    "isAllowOcr": "0",
    "isOnlyAiTable": "0"
}

Required parameters

contentOptions: Content to extract (1: text only, 2: tables only, 3: all content).

worksheetOptions: How worksheets are created (1: ForEachTable, one worksheet per table; 2: ForEachPage, one worksheet per page; 3: ForTheDocument, one worksheet for the whole document).

isContainAnnot: Whether to include comments (1: yes, 0: no).

isContainImg: Whether to include images (1: yes, 0: no).

isAllowOcr: Whether to enable OCR (1: yes, 0: no). Default: 0.

isOnlyAiTable: Whether to use AI-based table recognition (1: yes, 0: no). Default: 0.
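The worksheetOptions names (ForEachTable, ForEachPage, ForTheDocument) are easier to read in scripts than the numeric codes. The following minimal bash sketch accepts either the name or the code, validates the remaining options, and prints the parameter JSON. The helpers excel_params and worksheet_code are hypothetical. Their defaults copy the sample object above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a worksheetOptions name (or code) into its code.
worksheet_code() {
  case "$1" in
    1|ForEachTable) echo 1 ;;
    2|ForEachPage) echo 2 ;;
    3|ForTheDocument) echo 3 ;;
    *) echo "invalid worksheetOptions: '$1'" >&2; return 1 ;;
  esac
}

# Usage: excel_params [CONTENT] [WORKSHEET] [ANNOT] [IMG] [OCR] [AI_TABLE]
excel_params() {
  local content="${1:-2}" sheet annot="${3:-1}" img="${4:-1}" ocr="${5:-0}" ai_table="${6:-0}" flag
  # Assign separately from "local" so a failed lookup propagates its exit status.
  sheet="$(worksheet_code "${2:-1}")" || return 1
  case "$content" in
    1|2|3) ;;
    *) echo "invalid contentOptions: '$content'" >&2; return 1 ;;
  esac
  # The remaining parameters are yes/no flags encoded as "1" or "0".
  for flag in "$annot" "$img" "$ocr" "$ai_table"; do
    case "$flag" in
      0|1) ;;
      *) echo "invalid yes/no flag: '$flag'" >&2; return 1 ;;
    esac
  done
  printf '{"contentOptions":"%s","worksheetOptions":"%s","isContainAnnot":"%s","isContainImg":"%s","isAllowOcr":"%s","isOnlyAiTable":"%s"}\n' \
    "$content" "$sheet" "$annot" "$img" "$ocr" "$ai_table"
}
```

For example, `-F parameter="$(excel_params 3 ForEachPage)"` requests all content with one worksheet per page.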

PDF to PPT

PDF to PPT:

json
{    
  "isContainAnnot": "1",  
  "isContainImg": "1",
  "isAllowOcr": "0",
  "isContainOcrBg": "0",
  "isOnlyAiTable": "0"
}

Required parameters

isContainAnnot: Whether to include comments (1: yes, 0: no).

isContainImg: Whether to include images (1: yes, 0: no).

isAllowOcr: Whether to enable OCR (1: yes, 0: no). Default: 0.

isContainOcrBg: Whether to keep the background image when OCR is enabled (1: yes, 0: no). Default: 0.

isOnlyAiTable: Whether to use AI-based table recognition (1: yes, 0: no). Default: 0.
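The description of isContainOcrBg implies that it matters only when isAllowOcr is 1. The following bash sketch, using a hypothetical helper ppt_params, validates the flags and prints a warning when the background flag is set without OCR, but it still builds the object. Treating this combination as a warning rather than an error is an assumption, because this guide does not say how the processor handles it.

```shell
#!/usr/bin/env bash
# Usage: ppt_params [ANNOT] [IMG] [OCR] [OCR_BG] [AI_TABLE]
ppt_params() {
  local annot="${1:-1}" img="${2:-1}" ocr="${3:-0}" ocr_bg="${4:-0}" ai_table="${5:-0}" flag
  # Every PPT parameter is a yes/no flag encoded as "1" or "0".
  for flag in "$annot" "$img" "$ocr" "$ocr_bg" "$ai_table"; do
    case "$flag" in
      0|1) ;;
      *) echo "invalid yes/no flag: '$flag'" >&2; return 1 ;;
    esac
  done
  # Assumption: keeping the OCR background has no effect unless OCR is enabled.
  if [ "$ocr_bg" = 1 ] && [ "$ocr" = 0 ]; then
    echo "warning: isContainOcrBg=1 has no documented effect while isAllowOcr=0" >&2
  fi
  printf '{"isContainAnnot":"%s","isContainImg":"%s","isAllowOcr":"%s","isContainOcrBg":"%s","isOnlyAiTable":"%s"}\n' \
    "$annot" "$img" "$ocr" "$ocr_bg" "$ai_table"
}
```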

PDF to HTML

PDF to HTML:

json
{
   "pageOptions": "2",
   "isContainAnnot": "1",  
   "isContainImg": "1",
   "isAllowOcr": "0",
   "isContainOcrBg": "0",
   "isOnlyAiTable": "0"
}

Required parameters

pageOptions: HTML page layout (1: SinglePage; 2: SinglePageNavigationByBookmarks; 3: MultiplePages; 4: MultiplePagesSplitByBookmarks).

isContainAnnot: Whether to include comments (1: yes, 0: no).

isContainImg: Whether to include images (1: yes, 0: no).

isAllowOcr: Whether to enable OCR (1: yes, 0: no). Default: 0.

isContainOcrBg: Whether to keep the background image when OCR is enabled (1: yes, 0: no). Default: 0.

isOnlyAiTable: Whether to use AI-based table recognition (1: yes, 0: no). Default: 0.
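The four pageOptions modes are easier to remember by name than by number. The following bash sketch accepts either form, validates the yes/no flags, and prints the parameter JSON. The helpers html_params and page_option_code are hypothetical, and the defaults follow the sample object above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a pageOptions name (or code) into its code.
page_option_code() {
  case "$1" in
    1|SinglePage) echo 1 ;;
    2|SinglePageNavigationByBookmarks) echo 2 ;;
    3|MultiplePages) echo 3 ;;
    4|MultiplePagesSplitByBookmarks) echo 4 ;;
    *) echo "invalid pageOptions: '$1'" >&2; return 1 ;;
  esac
}

# Usage: html_params [PAGE_OPTION] [ANNOT] [IMG] [OCR] [OCR_BG] [AI_TABLE]
html_params() {
  local page annot="${2:-1}" img="${3:-1}" ocr="${4:-0}" ocr_bg="${5:-0}" ai_table="${6:-0}" flag
  # Assign separately from "local" so a failed lookup propagates its exit status.
  page="$(page_option_code "${1:-2}")" || return 1
  for flag in "$annot" "$img" "$ocr" "$ocr_bg" "$ai_table"; do
    case "$flag" in
      0|1) ;;
      *) echo "invalid yes/no flag: '$flag'" >&2; return 1 ;;
    esac
  done
  printf '{"pageOptions":"%s","isContainAnnot":"%s","isContainImg":"%s","isAllowOcr":"%s","isContainOcrBg":"%s","isOnlyAiTable":"%s"}\n' \
    "$page" "$annot" "$img" "$ocr" "$ocr_bg" "$ai_table"
}
```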

PDF to RTF

PDF to RTF:

json
{    
  "isContainAnnot": "1",  
  "isContainImg": "1",
  "isAllowOcr": "0",
  "isContainOcrBg": "0"
}

Required parameters

isContainAnnot: Whether to include comments (1: yes, 0: no).

isContainImg: Whether to include images (1: yes, 0: no).

isAllowOcr: Whether to enable OCR (1: yes, 0: no). Default: 0.

isContainOcrBg: Whether to keep the background image when OCR is enabled (1: yes, 0: no). Default: 0.
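RTF accepts only the four flags listed above, so a helper that takes named overrides can catch parameters that belong to other conversions, such as isOnlyAiTable. The following minimal bash sketch starts from the sample values and applies KEY=VALUE arguments. The helper rtf_params is hypothetical.

```shell
#!/usr/bin/env bash
# Usage: rtf_params [KEY=VALUE ...]
# Hypothetical helper: starts from the sample values and applies named overrides.
rtf_params() {
  local isContainAnnot=1 isContainImg=1 isAllowOcr=0 isContainOcrBg=0 pair key value
  for pair in "$@"; do
    key="${pair%%=*}"
    value="${pair#*=}"
    # All RTF parameters are yes/no flags encoded as "1" or "0".
    case "$value" in
      0|1) ;;
      *) echo "invalid value for $key: '$value'" >&2; return 1 ;;
    esac
    # Only the four documented RTF keys are accepted.
    case "$key" in
      isContainAnnot) isContainAnnot="$value" ;;
      isContainImg) isContainImg="$value" ;;
      isAllowOcr) isAllowOcr="$value" ;;
      isContainOcrBg) isContainOcrBg="$value" ;;
      *) echo "unknown RTF parameter: '$key'" >&2; return 1 ;;
    esac
  done
  printf '{"isContainAnnot":"%s","isContainImg":"%s","isAllowOcr":"%s","isContainOcrBg":"%s"}\n' \
    "$isContainAnnot" "$isContainImg" "$isAllowOcr" "$isContainOcrBg"
}
```

For example, `rtf_params isAllowOcr=1 isContainOcrBg=1` turns on OCR and keeps the background image.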

PDF to JPG

PDF to JPG:

json
{  
  "imgDpi": "300"
}

Required parameters

imgDpi: Image resolution in DPI (dots per inch). Valid range: 72 to 1500. Default: 300.
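Checking imgDpi locally against the documented 72 to 1500 range avoids sending a request that will be rejected. The following minimal bash sketch uses a hypothetical helper, jpg_params. It first ensures that the value is a plain integer, because the numeric comparison would otherwise fail with a syntax error.

```shell
#!/usr/bin/env bash
# Usage: jpg_params [DPI]
# Hypothetical helper: checks imgDpi against the documented 72-1500 range.
jpg_params() {
  local dpi="${1:-300}"
  # Reject anything that is not a plain non-negative integer.
  case "$dpi" in
    ''|*[!0-9]*) echo "imgDpi must be an integer: '$dpi'" >&2; return 1 ;;
  esac
  # Force base 10 so values with leading zeros are not read as octal.
  dpi=$((10#$dpi))
  if [ "$dpi" -lt 72 ] || [ "$dpi" -gt 1500 ]; then
    echo "imgDpi out of range 72-1500: $dpi" >&2
    return 1
  fi
  printf '{"imgDpi":"%s"}\n' "$dpi"
}
```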

PDF to CSV

PDF to CSV:

json
{    
  "isCsvMerge": "1",
  "isOnlyAiTable": "0"
}

Required parameters

isCsvMerge: Whether to merge the output into a single CSV file (1: yes, 0: no).

  • When isCsvMerge is set to 1, the returned file is in .csv format.
  • When isCsvMerge is set to 0, the returned file is in .zip format.

isOnlyAiTable: Whether to use AI-based table recognition (1: yes, 0: no). Default: 0.
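The value of isCsvMerge determines whether the response is a .csv file or a .zip archive, so the output file name should follow the same flag. The following minimal bash sketch builds the parameter JSON and picks the matching extension. The helpers csv_params and csv_output_name are hypothetical.

```shell
#!/usr/bin/env bash
# Usage: csv_params [MERGE] [AI_TABLE]
csv_params() {
  local merge="${1:-1}" ai_table="${2:-0}" flag
  # Both CSV parameters are yes/no flags encoded as "1" or "0".
  for flag in "$merge" "$ai_table"; do
    case "$flag" in
      0|1) ;;
      *) echo "invalid yes/no flag: '$flag'" >&2; return 1 ;;
    esac
  done
  printf '{"isCsvMerge":"%s","isOnlyAiTable":"%s"}\n' "$merge" "$ai_table"
}

# Per the list above: isCsvMerge=1 returns a .csv file, isCsvMerge=0 a .zip file.
csv_output_name() {
  local base="$1" merge="${2:-1}"
  case "$merge" in
    1) echo "$base.csv" ;;
    0) echo "$base.zip" ;;
    *) echo "invalid isCsvMerge: '$merge'" >&2; return 1 ;;
  esac
}
```

For example, with `merge=0`, you could send `-F parameter="$(csv_params "$merge")"` and redirect the response to `"$(csv_output_name result "$merge")"`, which yields result.zip.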

PDF to JSON

PDF to JSON:

json
{
   "type": "0",
   "isAllowOcr": "0",
   "extractTextMode": "0"
}

Required parameters

type: The content to convert to JSON (0: all text outside tables; 1: all tables, including the text inside them; 2: all content). Default: 0.

isAllowOcr: Whether to enable OCR (1: yes, 0: no). Default: 0.

extractTextMode: When type is 2, how text is extracted (0: by line, 1: by paragraph). Default: 0. Extraction by paragraph is currently not supported when OCR is enabled.
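Two rules above are easy to overlook: extractTextMode applies only when type is 2, and paragraph extraction is unavailable with OCR. The following minimal bash sketch uses a hypothetical helper, json_params, to enforce them before a request is sent. It rejects the OCR plus paragraph combination. When extractTextMode is set for another type, it only warns, on the assumption that the server ignores the field in that case.

```shell
#!/usr/bin/env bash
# Usage: json_params [TYPE] [OCR] [EXTRACT_MODE]
json_params() {
  local type="${1:-0}" ocr="${2:-0}" mode="${3:-0}"
  case "$type" in 0|1|2) ;; *) echo "invalid type: '$type'" >&2; return 1 ;; esac
  case "$ocr" in 0|1) ;; *) echo "invalid isAllowOcr: '$ocr'" >&2; return 1 ;; esac
  case "$mode" in 0|1) ;; *) echo "invalid extractTextMode: '$mode'" >&2; return 1 ;; esac
  # Documented limitation: paragraph extraction is not available with OCR.
  if [ "$ocr" = 1 ] && [ "$mode" = 1 ]; then
    echo "extractTextMode=1 (by paragraph) is not supported when isAllowOcr=1" >&2
    return 1
  fi
  # extractTextMode is only described for type 2; assume it is ignored otherwise.
  if [ "$mode" = 1 ] && [ "$type" != 2 ]; then
    echo "warning: extractTextMode only applies when type is 2" >&2
  fi
  printf '{"type":"%s","isAllowOcr":"%s","extractTextMode":"%s"}\n' "$type" "$ocr" "$mode"
}
```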

For an explanation of the fields in the JSON output, see PDF数据提取 JSON格式说明.pdf (PDF Data Extraction: JSON Format Description).