Layout Analysis

Overview

Layout analysis uses artificial intelligence (AI) to parse a document's layout and understand its structure. Its primary goal is to identify and extract text, images, tables, layers, and other content from input documents.
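To make "structure" more concrete, the sketch below models layout-analysis output as typed elements and groups them by type, so that, for example, only the tables on a page can be pulled out. The `ElementType`, `LayoutElement`, and `LayoutExtraction` names are hypothetical and are not part of the ComPDFKit API; the SDK's real output format may differ.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of layout-analysis output.
// These types are illustrative only and are NOT part of the ComPDFKit API.
enum ElementType { TEXT, IMAGE, TABLE }

final class LayoutElement {
    final ElementType type;
    final int page;         // index of the page the element was found on
    final String content;   // recognized text, or a placeholder for images/tables

    LayoutElement(ElementType type, int page, String content) {
        this.type = type;
        this.page = page;
        this.content = content;
    }
}

final class LayoutExtraction {
    // Group detected elements by type, preserving their original order within each group.
    static Map<ElementType, List<LayoutElement>> groupByType(List<LayoutElement> elements) {
        Map<ElementType, List<LayoutElement>> groups = new EnumMap<>(ElementType.class);
        for (LayoutElement e : elements) {
            groups.computeIfAbsent(e.type, t -> new ArrayList<>()).add(e);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<LayoutElement> elements = List.of(
            new LayoutElement(ElementType.TEXT, 0, "Quarterly report"),
            new LayoutElement(ElementType.TABLE, 0, "[table]"),
            new LayoutElement(ElementType.TEXT, 1, "Notes"));
        Map<ElementType, List<LayoutElement>> groups = groupByType(elements);
        System.out.println(groups.get(ElementType.TEXT).size());  // prints 2
    }
}
```

Types with no detected elements (here `IMAGE`) simply have no entry in the returned map.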

Layout analysis has several common use cases, including:

  • Intelligent recognition of tables in PDF documents: This is particularly useful for analyzing company financial statements, invoices, bank statements, experimental data, medical test reports, and more.
  • Smart extraction of text, images, or tables from PDF documents: This helps analyze and extract information from identification cards, receipts, licenses, ancient books, and many other types of documents.
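Table recognition ultimately has to turn positioned text into rows and columns. As a rough illustration only, and not a description of how ComPDFKit does it, the following sketch groups recognized text fragments into rows by their vertical position and then reads each row left to right. The `Cell` and `TableRows` types and the fixed `tolerance` parameter are hypothetical assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical input: a recognized text fragment and the top-left corner of its
// bounding box (y grows downward). Not a ComPDFKit type.
final class Cell {
    final String text;
    final double x;
    final double y;

    Cell(String text, double x, double y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

final class TableRows {
    // Greedy row grouping: a cell joins the current row if its top edge is within
    // `tolerance` of the first cell in that row; otherwise it starts a new row.
    static List<List<String>> toRows(List<Cell> cells, double tolerance) {
        List<Cell> sorted = new ArrayList<>(cells);
        sorted.sort(Comparator.comparingDouble((Cell c) -> c.y));
        List<List<String>> rows = new ArrayList<>();
        List<Cell> current = new ArrayList<>();
        for (Cell c : sorted) {
            if (!current.isEmpty() && c.y - current.get(0).y > tolerance) {
                rows.add(readRow(current));
                current = new ArrayList<>();
            }
            current.add(c);
        }
        if (!current.isEmpty()) {
            rows.add(readRow(current));
        }
        return rows;
    }

    // Order one row's cells left to right and return their text.
    private static List<String> readRow(List<Cell> row) {
        row.sort(Comparator.comparingDouble((Cell c) -> c.x));
        List<String> texts = new ArrayList<>();
        for (Cell c : row) {
            texts.add(c.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("Amount", 200, 10), new Cell("Item", 20, 12),
            new Cell("Coffee", 20, 40), new Cell("3.50", 200, 41));
        System.out.println(toRows(cells, 5));  // [[Item, Amount], [Coffee, 3.50]]
    }
}
```

A fixed tolerance like this is easy to follow but breaks down on skewed scans or merged cells, which is part of why AI-based layout analysis is used for real documents.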

Features that support Layout Analysis:

  • Convert PDF to Word
  • Convert PDF to Excel
  • Convert PDF to PowerPoint
  • Convert PDF to HTML
  • Extract PDF Table

Note

  • The OCR module must be integrated before layout analysis can be used.
  • When OCR is enabled, layout analysis is enabled automatically.

Sample

This sample demonstrates how to use the layout analysis of ComPDFKit Conversion SDK to convert a PDF to a Word document. Layout analysis is turned on by enabling OCR with `setAllowOcr(true)`.

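```java
// Assumes cpdfConvertWord, rootDir, input_file, output_file, password and the
// getPageCounts(...) helper are defined elsewhere in the full sample.
CPDFConvertWordOptions cpdfConvertWordOptions = new CPDFConvertWordOptions();
cpdfConvertWordOptions.setContainAnnot(true);  // include annotations
cpdfConvertWordOptions.setContainImg(true);    // include images
cpdfConvertWordOptions.setAllowOcr(true);      // enable OCR, which also enables layout analysis
cpdfConvertWordOptions.setContainOcrBg(true);  // include the OCR background
String inputPath = rootDir + input_file + "word.pdf";
List<Integer> pageCounts = getPageCounts(cpdfConvertWord.getPageCount(inputPath, password));
ConvertResult convert = cpdfConvertWord.convert(inputPath, rootDir + output_file, "",
        cpdfConvertWordOptions, pageCounts, password, page -> {
            // Callback that receives the current page, e.g. to report progress; left empty here.
        });
```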
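The sample relies on a `getPageCounts(...)` helper that is not shown. Below is a minimal sketch of what such a helper might do, assuming it turns a page count into a list of consecutive page numbers. `PageLists.pageRange` is a hypothetical name, and whether the SDK expects 0-based or 1-based page numbers is an assumption to check against the ComPDFKit reference before passing `firstPage`.

```java
import java.util.ArrayList;
import java.util.List;

final class PageLists {
    // Hypothetical stand-in for the sample's getPageCounts(...) helper.
    // Returns pageCount consecutive page numbers starting at firstPage.
    static List<Integer> pageRange(int firstPage, int pageCount) {
        if (pageCount < 0) {
            throw new IllegalArgumentException("pageCount must be non-negative: " + pageCount);
        }
        List<Integer> pages = new ArrayList<>(pageCount);
        for (int i = 0; i < pageCount; i++) {
            pages.add(firstPage + i);
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(pageRange(1, 4));  // [1, 2, 3, 4]
    }
}
```

For a 4-page document with 1-based numbering, `pageRange(1, 4)` returns `[1, 2, 3, 4]`; with 0-based numbering, call `pageRange(0, 4)` instead.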