Convert PDF to Excel
Overview
ComPDFKit Conversion SDK supports converting PDF documents to Microsoft Excel format (.xlsx). The SDK extracts and parses the data in a PDF and writes it to an Excel file, which users can then edit, analyze, or share. This feature helps increase productivity, reduce manual entry errors, and simplify complex document processing tasks.
Set the content options for Excel
When converting a PDF file to an Excel file, the following options directly determine what content is written to the Excel file.
Content options:
If you set the ContentOptions.OnlyText option, only the text content is written to the Excel file; table content is excluded.
Worksheet options:
Options | Description |
---|---|
WorksheetOptions.ForEachTable | Create one sheet for one table. |
WorksheetOptions.ForEachPage | Create one sheet for one PDF page. |
WorksheetOptions.ForTheDocument | Create one sheet for the entire PDF document. |
Note
- For better conversion results, it is recommended to enable OCR or layout analysis.
- When the OCR feature is enabled, the IsContainOCRBgImage option has no effect.
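The options described above can be combined on a single options object. The following is a minimal sketch rather than a definitive configuration: it uses only the type, property, and enum names that appear on this page, it assumes the ComPDFKit Conversion SDK is already referenced and its namespace imported, and the variable name textOnlyOptions is purely illustrative. Any additional setup that OCR may require is not shown.

```csharp
// Sketch only: assumes the ComPDFKit Conversion SDK is referenced and its
// namespace is imported, as in the sample on this page.
CPDFConvertExcelOptions textOnlyOptions = new CPDFConvertExcelOptions();

// Write only text content; table content is left out.
textOnlyOptions.ContentOpts = ContentOptions.OnlyText;

// Put the entire PDF document on a single worksheet.
textOnlyOptions.WorksheetOpts = WorksheetOptions.ForTheDocument;

// Enable OCR, as the note recommends. Per the note, the
// IsContainOCRBgImage option has no effect once OCR is enabled.
textOnlyOptions.IsAllowOCR = true;
```

ForTheDocument is paired with OnlyText here because a per-table sheet layout has nothing to split on when table content is excluded.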
Sample
This sample demonstrates how to convert a PDF file to an XLSX file.
c#
// Input PDF path, output folder, and output file name (placeholders).
string inputFilePath = "***";
string outputFolderPath = "***";
string outputFileName = "***";

// Configure how PDF content is written to the workbook.
CPDFConvertExcelOptions excelOptions = new CPDFConvertExcelOptions();
excelOptions.WorksheetOpts = WorksheetOptions.ForEachPage; // one sheet per PDF page
excelOptions.ContentOpts = ContentOptions.AllContent;      // text and tables
excelOptions.IsAllowOCR = false;
excelOptions.IsContainAnnotations = true;
excelOptions.IsContainImages = true;

// Create an Excel converter for the input file.
CPDFConverterExcel converter = CPDFConvertFactroy.CreateConverter(CPDFConvertType.CPDFConvertTypeExcel, inputFilePath) as CPDFConverterExcel;

// Convert every page; page numbers are 1-based.
int pageCount = converter.GetPagesCount();
int[] pageArray = new int[pageCount];
for (int i = 0; i < pageArray.Length; i++)
{
    pageArray[i] = i + 1;
}

// getPorgress is the progress callback passed by the caller (not defined in this sample).
ConvertError error = ConvertError.ERR_UNKNOWN;
converter.Convert(outputFolderPath, ref outputFileName, excelOptions, pageArray, ref error, getPorgress);
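
The loop that fills pageArray in the sample can also be written with LINQ, which makes it easier to convert only a subset of pages. The sketch below is plain C# and does not call the SDK. PageRanges and its methods are hypothetical helper names introduced here for illustration, and it assumes page numbers are 1-based, as the sample's loop suggests.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the ComPDFKit SDK.
public static class PageRanges
{
    // Returns 1, 2, ..., pageCount, matching the 1-based numbering in the sample.
    public static int[] AllPages(int pageCount)
    {
        if (pageCount < 0)
            throw new ArgumentOutOfRangeException(nameof(pageCount));
        return Enumerable.Range(1, pageCount).ToArray();
    }

    // Returns first, first + 1, ..., last (inclusive) for converting part of a document.
    public static int[] Range(int first, int last)
    {
        if (first < 1 || last < first)
            throw new ArgumentException("Expected 1 <= first <= last.");
        return Enumerable.Range(first, last - first + 1).ToArray();
    }
}
```

With this helper, the loop in the sample could be replaced by int[] pageArray = PageRanges.AllPages(converter.GetPagesCount()); and a partial conversion could pass PageRanges.Range(2, 5) instead.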