
AI-driven OCR Revolutionizes Intelligent Layout Analysis with 24+ Labels

Nathaniel Vale | Mon. 09 Mar. 2026

CONTENTS

Customer Background and Challenges

Customer Pain Points

Customer Requirements

ComPDF AI（ComIDP） Solution

Final Result

With rapid technological development and ever-changing business needs, automating repetitive tasks has become key to improving efficiency in modern enterprises and a cornerstone of digital transformation. RPA (Robotic Process Automation) is an effective technology for addressing this challenge, and a growing number of companies are adopting it to modernize their internal workflows.

Figure: automated labeling of unstructured data

Customer Background and Challenges

A technology company specializing in office software development planned to build an RPA product and an intelligent Q&A product to help enterprises automate workflows and business processes, meeting the need for efficient, cost-effective, and compliant operations while enhancing customer experience.

However, while developing its RPA and AI Q&A products, the company ran into difficulties processing unstructured documents: manually labeling massive volumes of documents was inefficient and error-prone, which drove up costs and slowed development. The company learned that ComPDF AI（ComIDP）'s intelligent document processing solution had previously helped a data provider process over 3 million unstructured documents in 5 days, which prompted it to request automated data labeling for intelligent layout recognition and data parsing.

ComPDF AI（ComIDP） customized the layout recognition parameters for the company and upgraded its OCR technology with AI models, using more than 24 labels to restore the layout and logic of documents and to keep the document layout complete and consistent. The company deployed ComPDF AI（ComIDP）'s intelligent document solution in a clustered environment to develop its RPA and intelligent Q&A products, which significantly shortened the development cycle, reduced costs, and enabled the products to reach the market quickly.

Customer Pain Points

Because unstructured documents have complex content and inconsistent formats, parsing and extracting data from them is extremely challenging. Layout recognition is a major difficulty: each page contains numerous elements, layouts and styles vary, and the logical relationships between pieces of content differ. Noise, skew, and perspective distortion make recognition harder still. Handling all of this requires parsing technology that is highly adaptable and intelligent. Lacking such technology, the enterprise had to rely on manual processing, which was inefficient and inaccurate and directly reduced the effectiveness of its RPA and Q&A systems.

Manual Data Labeling

This technology enterprise previously used manual labeling of unstructured data for document layout recognition, which was time-consuming and prone to errors. When different people handled the same dataset, labeling results varied, leading to inconsistent data quality. This not only increased the cost and time for subsequent data verification but also complicated the development work and extended project timelines.

Massive Document Input

This company processes hundreds of thousands of files daily, which requires servers with high efficiency and high-load processing capacity. However, traditional server architectures could not handle data input at this scale, resulting in slow system performance.

Self-development Challenges

In a competitive market, self-development can bring personalized solutions but is costly and time-consuming. Long development cycles make it difficult for companies to quickly respond to market changes, risking the loss of market opportunities.

Customer Requirements

This company described its product's application scenarios to the ComPDF AI（ComIDP） team and set out specific requirements for intelligent data labeling in layout analysis, aiming to improve data parsing results while automating AI data labeling.

Types of AI Data Labeling

They needed to annotate titles, paragraphs, code blocks, tables, formulas, lists, and non-text content within documents so that no part of an unstructured document was lost. Separating natural paragraphs and segmenting the layout were particularly crucial.

Type of Label	Sub-type	Note
title	title	All levels of titles on the page need to be labeled.
paragraph	paragraph	Text fragments consisting of plain text are categorized as paragraphs. To facilitate data search and location, large sections with multiple independent semantic paragraphs should be split, typically segmented by natural paragraphs and punctuation, and each text fragment after segmentation is called a paragraph.
block	block: unknown category, non-text block	A block is the output of layout segmentation: data of the same type of information that forms a visually connected region. 1. An image on the same row is a block, a table is a block, and a large section of text under the same column is a block. 2. A block must consist of information of a single type; a mixed area of different types cannot form one block and must be split. For example, a mixed area with an image and a table cannot be a block.
	code-block: code block
	img-block: mixed text and image block
	table-block: table block
	sci-block: scientific formulas block
	list-block: list block, text, such as directories or text lists	Must be at least two lines (3 and more), with average line text not exceeding two rows, else it's a paragraph

Beyond these fundamental needs, each labeling type had specific restrictions: titles must be labeled on their own, paragraphs and blocks must not overlap, blocks must not span multiple columns, and blocks must not contain mixed data types.
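The non-overlap restriction can be checked mechanically once each labeled region has a bounding box. The following is a minimal sketch, not ComPDF AI（ComIDP）'s implementation: the `Region` type, its coordinate fields, and the sample page are hypothetical, and the check is applied to every pair of regions, which is stricter than the paragraph-and-block rule stated above.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Region:
    """Hypothetical labeled region; coordinates are in page units."""
    label: str
    x0: float  # left
    y0: float  # top
    x1: float  # right
    y1: float  # bottom


def overlaps(a: Region, b: Region) -> bool:
    """True when the two rectangles share an area of positive size."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def find_overlaps(regions):
    """Return one message per pair of regions that overlap."""
    return [
        f"overlap: {a.label} and {b.label}"
        for a, b in combinations(regions, 2)
        if overlaps(a, b)
    ]


page = [
    Region("title", 50, 40, 550, 80),
    Region("paragraph", 50, 100, 550, 300),
    Region("table-block", 50, 280, 550, 500),  # intrudes 20 units into the paragraph
]
print(find_overlaps(page))  # → ['overlap: paragraph and table-block']
```

Regions that only share an edge are not reported, because the comparisons are strict.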

Output Labeling

After the unstructured documents were parsed, the company required output files in JSON format, restricted to the following output labels: title, paragraph, block, code-block, img-block, table-block, sci-block, and list-block. This output supports subsequent key information extraction and semantic analysis, improving the accuracy of the RPA and Q&A systems.
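To illustrate the restricted label set, here is a minimal sketch of how a consumer of the JSON output might verify that only the eight agreed labels appear. The JSON structure and field names (`page`, `elements`, `label`, `text`) are assumptions made for this example; the actual output schema is not described in this case study.

```python
import json

# The eight output labels listed in the requirement.
OUTPUT_LABELS = frozenset({
    "title", "paragraph", "block", "code-block",
    "img-block", "table-block", "sci-block", "list-block",
})

# Hypothetical output for one page; the field names are illustrative only.
sample = """
{
  "page": 1,
  "elements": [
    {"label": "title", "text": "Quarterly Report"},
    {"label": "paragraph", "text": "Revenue grew in the third quarter."},
    {"label": "table-block", "text": ""},
    {"label": "footer", "text": "Page 1"}
  ]
}
"""


def unexpected_labels(document: str) -> list[str]:
    """Return labels in the parsed output that fall outside the agreed set."""
    data = json.loads(document)
    return [e["label"] for e in data["elements"] if e["label"] not in OUTPUT_LABELS]


print(unexpected_labels(sample))  # → ['footer']
```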

ComPDF AI（ComIDP）'s R&D team customized the layout recognition parameters based on the customer's needs. After continuous updates and iterations, accuracy exceeded 95%, and the solution was successfully delivered to the customer for acceptance.

ComPDF AI（ComIDP） Solution

The ComPDF AI（ComIDP） team held in-depth conversations with the enterprise's R&D team to understand its specific needs and business goals and to ensure a tailored, practical solution. From data collection, AI model training, and model optimization through to test reports, we provide professional, flexible, and efficient services for customers.

Layout Analysis Model Training

By collecting different types of samples for manual data labeling, such as financial reports, papers, newspapers, and books, our R&D team trained a layout analysis AI model applicable to various industries. This model accurately identifies and classifies various elements on the page, such as titles, paragraphs, tables, and images, using 24 predefined labels, with recognition accuracy surpassing 95%.

Figure: geometric document layout

Based on this enterprise's specific data labeling needs, we further optimized our AI model. By refining the labeling types and rules, we achieved precise automated labeling of complex document content. For instance, we designed and tuned dedicated recognition algorithms for code blocks and formula blocks in technical documents so that this content is extracted and distinguished accurately. The AI-based ComPDF AI（ComIDP） analyzes both geometric and logical document layouts, restoring 99% of the document layout and reading-order structure and thereby keeping the layout complete and consistent. As the client requested, labeled results are output in a standardized JSON format, which facilitates secondary processing and data analysis.

Figure: logical document layout
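The difference between geometric layout (where regions sit on the page) and logical layout (the order in which they are read) can be shown with a deliberately simple heuristic. This sketch is not the AI model described in this case study: it assumes a fixed number of equal-width columns, and the `Box` type and sample coordinates are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical region produced by geometric layout analysis."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def reading_order(boxes, page_width, columns=2):
    """Sort boxes column by column (left to right), top to bottom within a column."""
    column_width = page_width / columns

    def key(box):
        center_x = (box.x0 + box.x1) / 2
        # Clamp so a box touching the right page edge stays in the last column.
        column = min(int(center_x // column_width), columns - 1)
        return (column, box.y0)

    return sorted(boxes, key=key)


boxes = [
    Box("paragraph", 320, 100, 580, 300),    # right column, top
    Box("paragraph", 20, 320, 280, 500),     # left column, bottom
    Box("title", 20, 100, 280, 140),         # left column, top
    Box("table-block", 320, 320, 580, 500),  # right column, bottom
]
print([b.label for b in reading_order(boxes, page_width=600)])
# → ['title', 'paragraph', 'paragraph', 'table-block']
```

A fixed-column rule like this breaks on full-width titles, irregular columns, and skewed scans, which is part of why a trained model is used for logical layout in practice.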

Test Reports Verify Effectiveness

Functional Testing

After training the AI model, we ran multiple rounds of rigorous testing to validate its performance, using client-provided examples as validation sets to measure model accuracy, and produced a functional testing report. The report details how our AI OCR model behaves when automatically processing various document types, covering different formats, sizes, and languages as well as elements such as stamps, charts, formulas, and flowcharts. These results served as key acceptance criteria for the model.

Format	PNG, JPG, JPEG, BMP
Size	100KB ~ 30MB
Languages	Simplified Chinese, English, Mixed Chinese and English
Types	Tables, Complex Layouts, Stamps, Handwritten text, Exams, Formulas, Flowcharts, Skewed text, Scanned, and Photographed books and PPTs

From the test report, we highlight how ComPDF AI（ComIDP） handled documents containing formulas. The results showed accurate recognition of both text and formulas, and our customer was very satisfied with them.

Stress Testing

Because this enterprise receives over a hundred thousand documents daily, we performed comprehensive stress testing to ensure the system could withstand that input load. We tested PDF to Word (Grid Layout) conversion with and without OCR in both synchronous and asynchronous environments. The stress test report indicated that ComPDF AI（ComIDP） remained stable, accurate, and responsive under high load, demonstrating its performance and reliability in high-load tasks.

	Synchronous Testing	Asynchronous Testing
Test Scenario	200 users converting files simultaneously.	200 users converting files simultaneously, lasting over 10 minutes.
Test Results	All 200 users succeeded in conversion.	All 200 users succeeded in conversion.
	Success Rate and Accuracy reached 100%, with no error responses.	Success Rate and Accuracy reached 100%, with no error responses.
	99% of responses completed in under 1 second.	99% of responses completed in under 1 second.
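The two headline metrics in the table, success rate and 99th-percentile response time, can be computed from a concurrent run as sketched below. This is an illustration only: `fake_convert` is a hypothetical stand-in for a real conversion request, and the harness is not the tooling used for the stress test reported here.

```python
import asyncio
import math
import time


async def fake_convert(user_id: int) -> bool:
    """Hypothetical stand-in for one conversion request to the service."""
    await asyncio.sleep(0.01)
    return True


async def timed(user_id, convert):
    """Run one request and return (succeeded, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        ok = await convert(user_id)
    except Exception:
        ok = False
    return ok, time.perf_counter() - start


async def run_load(n_users, convert):
    """Launch n_users requests concurrently and collect their results."""
    return await asyncio.gather(*(timed(i, convert) for i in range(n_users)))


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


results = asyncio.run(run_load(200, fake_convert))
success_rate = sum(ok for ok, _ in results) / len(results)
p99 = percentile([elapsed for _, elapsed in results], 99)
print(f"success rate: {success_rate:.0%}, p99 latency: {p99:.3f}s")
```

Reporting a percentile rather than the mean keeps a handful of slow responses from being hidden by many fast ones.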

GPU&CPU Speed Testing

Additionally, we deployed a GPU to accelerate document processing. Comparing GPU and CPU efficiency on the same tasks produced a detailed OCR GPU&CPU speed comparison report.

We measured the time ComPDF AI（ComIDP） took to process 100 image samples on GPU versus CPU. Testing indicated that in a dual-container environment on a dual-GPU system, ComPDF AI（ComIDP） processes up to 20,000 images per minute on average. GPU processing is 100 times faster than CPU processing, a significant speed advantage for large-scale document processing that substantially reduces processing time. For the customer's actual applications and document processing demands, we provided a customized cluster deployment solution to ensure high efficiency in real-world use.
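To put the reported throughput in context, the arithmetic below converts 20,000 images per minute into per-second and per-day figures and estimates the time for a sample workload. The daily volume of 100,000 single-image documents is a hypothetical figure chosen for illustration, and the CPU estimate assumes the reported 100x ratio carries over unchanged to that workload, which the test report does not state.

```python
IMAGES_PER_MINUTE = 20_000  # average dual-GPU, dual-container throughput reported above
GPU_SPEEDUP = 100           # reported GPU-over-CPU speed ratio


def minutes_to_process(image_count: int, images_per_minute: float) -> float:
    """Time needed to process image_count images at a steady rate."""
    return image_count / images_per_minute


images_per_second = IMAGES_PER_MINUTE / 60
images_per_day = IMAGES_PER_MINUTE * 60 * 24

# Hypothetical workload: 100,000 documents per day, one image each.
daily_images = 100_000
gpu_minutes = minutes_to_process(daily_images, IMAGES_PER_MINUTE)
# Assumption: the 100x ratio applies unchanged to this workload.
cpu_minutes = minutes_to_process(daily_images, IMAGES_PER_MINUTE / GPU_SPEEDUP)

print(f"{images_per_second:.1f} images/s, {images_per_day:,} images/day")
# → 333.3 images/s, 28,800,000 images/day
print(f"GPU: {gpu_minutes:.0f} min, CPU (assumed): {cpu_minutes:.0f} min")
# → GPU: 5 min, CPU (assumed): 500 min
```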

Final Result

Through accurate recognition and labeling of complex document layouts, ComPDF AI（ComIDP） helped this technology company achieve highly automated document processing within just two months. This breakthrough made cumbersome document management more convenient and efficient, improving workflow efficiency by over 80% while ensuring 95% layout restoration accuracy. Moreover, using structured JSON data to develop its RPA (Robotic Process Automation) and intelligent Q&A products enabled the company to bring the products to market quickly and at lower cost. This not only significantly shortened time-to-market but also markedly enhanced the products' market competitiveness.

During the collaboration, ComPDF AI（ComIDP）'s R&D team responded promptly to the customer's feedback, communicated actively, and provided solutions that resolved technical issues quickly. The customer was not only satisfied with the results of our products but also highly praised our service.

In the future, we will continue dedicating ourselves to technological development, providing more comprehensive and efficient solutions for enterprises, helping them tackle business challenges, and enabling them to excel in the competitive market.

If you wish to experience automated document processing, we sincerely invite you to try our ComPDF AI（ComIDP） intelligent document processing solution. Through the ComPDF AI（ComIDP） Demo, you can personally experience the powerful functions and ease of AI technology in automated document processing. For more information, please feel free to contact us, and we will provide you with professional support and service.
