Layout Analysis

Overview

Layout analysis uses artificial intelligence (AI) to parse and understand the structure of a document's layout. Its primary goal is to extract text, images, tables, layers, and other data from input documents.

Layout analysis has several common use cases, including:

  • Intelligent recognition of tables within PDF documents: This is particularly useful for analyzing company financial statements, invoices, bank statements, experimental data, medical test reports, and more.
  • Smart extraction of text, images, or tables from PDF documents: This helps with analyzing and extracting information from identification cards, receipts, licenses, ancient books, and other types of files.

Features that support Layout Analysis:

  • PDF to Word
  • PDF to Excel
  • PDF to PowerPoint (PPT)
  • PDF to HTML
  • PDF to RTF
  • PDF to TXT
  • PDF to CSV
  • Extract PDF to JSON
  • Extract PDF to Markdown

Notice

  • You need to load the DocumentAI model before using layout analysis, or plug in your own AI engine via the callbacks described in 3.11 Use Custom AI Models via Callbacks.
  • When OCR is enabled, layout analysis is enabled automatically.
  • AI table recognition is a separate stage controlled by its own option. See 3.10 Table Recognition for details.
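
The first two notes can be read as a small pre-flight rule, sketched below under stated assumptions. `LayoutSettings` and `LayoutPreflight` are hypothetical stand-ins written for illustration; they are not SDK types, and the sketch does not call the SDK. It also assumes that layout analysis switched on implicitly by OCR has the same model requirement as layout analysis enabled directly.

```csharp
using System;

// Hypothetical stand-in for the settings discussed in the notes above.
// These names are illustrative and are not SDK types.
public sealed class LayoutSettings
{
    public bool EnableAiLayout { get; set; }
    public bool EnableOcr { get; set; }
    public bool DocumentAiModelLoaded { get; set; }
    public bool CustomEngineRegistered { get; set; }
}

public static class LayoutPreflight
{
    // Layout analysis runs when requested directly or when OCR is on.
    public static bool IsLayoutAnalysisActive(LayoutSettings s)
    {
        return s.EnableAiLayout || s.EnableOcr;
    }

    // An active layout analysis stage needs the DocumentAI model
    // or a custom AI engine (assumption: this also holds when OCR enabled it).
    public static bool IsReady(LayoutSettings s)
    {
        return !IsLayoutAnalysisActive(s)
            || s.DocumentAiModelLoaded
            || s.CustomEngineRegistered;
    }
}

public static class Program
{
    public static void Main()
    {
        // OCR on, layout flag off, no model loaded.
        var settings = new LayoutSettings { EnableAiLayout = false, EnableOcr = true };
        Console.WriteLine(LayoutPreflight.IsLayoutAnalysisActive(settings)); // True
        Console.WriteLine(LayoutPreflight.IsReady(settings));                // False
    }
}
```

A check like this is only a convenience for catching a missing model before a conversion starts; the SDK's own return code remains the authoritative result.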

Sample

This sample demonstrates how to use layout analysis to convert a PDF to a DOCX file.

c#
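// Source PDF path, its password, and the output file name.
string inputFilePath = "***";
string password = "***";
string outputFileName = "***";

WordOptions wordOptions = new WordOptions();
// Include images and annotations in the output.
wordOptions.ContainImage = true;
wordOptions.ContainAnnotation = true;
// Enable layout analysis option.
wordOptions.EnableAiLayout = true;
// OCR stays off here; enabling it would also turn on layout analysis.
wordOptions.EnableOCR = false;
// Run the conversion and keep the returned status code for inspection.
ErrorCode error = CPDFConversion.StartPDFToWord(inputFilePath, password, outputFileName, wordOptions);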