Layout Analysis

Overview

Layout analysis uses Artificial Intelligence (AI) to parse and understand the structure of a document's layout. Its primary goal is to extract text, images, tables, layers, and other data from input documents.

Layout analysis has several common use cases, including:

  • Table recognition in PDF documents: useful for analyzing company financial statements, invoices, bank statements, experimental data, medical test reports, and similar documents.
  • Extraction of text, images, or tables from PDF documents: useful for pulling information from identification cards, receipts, licenses, ancient books, and other types of files.

Features that support Layout Analysis:

  • Convert PDF to Word
  • Convert PDF to Excel
  • Convert PDF to PowerPoint
  • Convert PDF to HTML
  • Extract PDF Table

Note

  • You need to integrate the OCR module before using layout analysis.
  • When OCR is enabled, layout analysis is enabled automatically.

Sample

This sample demonstrates how to convert a PDF document to a DOCX file with ComPDFKit, with AI layout analysis enabled through `IsAILayoutAnalysis`.

c#
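// Replace the placeholders with your own input file, output folder, and output file name.
string inputFilePath = "***";
string outputFolderPath = "***";
string outputFileName = "***";

// Create a Word converter for the input PDF.
CPDFConverterWord converter = CPDFConvertFactroy.CreateConverter(CPDFConvertType.CPDFConvertTypeWord, inputFilePath) as CPDFConverterWord;

// Configure the conversion; IsAILayoutAnalysis turns on layout analysis.
CPDFConvertWordOptions wordOptions = new CPDFConvertWordOptions();
wordOptions.IsAILayoutAnalysis = true;
wordOptions.IsContainAnnotations = true;
wordOptions.IsContainImages = true;
wordOptions.LayoutOpts = LayoutOptions.RetainPageLayout;

// Build the list of pages to convert: every page, numbered from 1.
int pageCount = converter.GetPagesCount();
int[] pageArray = new int[pageCount];
for (int i = 0; i < pageArray.Length; i++)
{
    pageArray[i] = i + 1;
}

// Run the conversion. getPorgress is passed as the last argument and must be defined elsewhere in your code.
ConvertError error = ConvertError.ERR_UNKNOWN;
converter.Convert(outputFolderPath, ref outputFileName, wordOptions, pageArray, ref error, getPorgress);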
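
The loop that fills `pageArray` in the sample can also be written with LINQ. The following is a minimal sketch that uses only the .NET base class library; `PageRangeHelper` and `BuildPageArray` are hypothetical names introduced for illustration and are not part of the ComPDFKit API.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of the ComPDFKit API.
public static class PageRangeHelper
{
    // Returns the page numbers 1..pageCount, the same values the sample's for loop produces.
    public static int[] BuildPageArray(int pageCount)
    {
        if (pageCount < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(pageCount));
        }

        // Enumerable.Range(start, count) yields count consecutive integers beginning at start.
        return Enumerable.Range(1, pageCount).ToArray();
    }
}
```

Under these assumptions, the sample's loop could be replaced with `int[] pageArray = PageRangeHelper.BuildPageArray(converter.GetPagesCount());`.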