Case Study

AI-driven OCR Revolutionizes Intelligent Layout Analysis with 24+ Labels

By ComPDFKit | Sat. 14 Sep. 2024
Case Study | Intelligent Document Processing | AI

With the rapid development of technology and ever-changing business needs, automating repetitive tasks has become a key driver of efficiency in modern enterprises and a cornerstone of digital transformation. RPA (Robotic Process Automation) is an effective technology for addressing this challenge, and companies are increasingly adopting it to modernize their internal workflows.

 

[Figure: automated labeling of unstructured data]


 

 

Customer Background and Challenges 

A technology company specializing in office software development planned to create an RPA product and an intelligent Q&A product to help enterprises automate workflows and business processes, supporting efficient, cost-effective, and compliant operations while enhancing customer experience.

 

However, while developing the RPA and AI Q&A products, the company ran into challenges in processing unstructured documents: manually labeling massive volumes of documents was inefficient and error-prone, which increased costs and slowed development. They learned that ComIDP's intelligent document processing solution had helped a data provider process over 3 million unstructured documents in 5 days, which prompted them to request automated data labeling for intelligent layout recognition and data parsing.

 

ComIDP customized layout recognition parameters for them and upgraded OCR technology using AI models, employing over 24 labels to restore the layout and logic of documents, ensuring the integrity and consistency of the document layout. This company deployed ComIDP's intelligent document solution in a clustered environment for developing RPA and intelligent Q&A products, significantly shortening their development cycle, reducing costs, and enabling rapid market entry for the products. 

 

 

Customer Pain Points 

Due to the complex content and inconsistent format of unstructured documents, data parsing and extraction become extremely challenging. Layout recognition is a major difficulty in parsing unstructured documents, as each layout has numerous page elements, varying layouts and styles, and different logical relationships between contents. Additionally, issues such as noise, skew, and perspective further increase the difficulty of recognition. This requires parsing technology with high adaptability and intelligence. However, lacking advanced technology support, this enterprise had to rely on manual processing, which was inefficient and inaccurate, directly impacting the effectiveness of the RPA and Q&A systems.

 

Manual Data Labeling

This technology enterprise previously used manual labeling of unstructured data for document layout recognition, which was time-consuming and prone to errors. When different people handled the same dataset, labeling results varied, leading to inconsistent data quality. This not only increased the cost and time for subsequent data verification but also complicated the development work and extended project timelines.

 

Massive Document Input

This company processes hundreds of thousands of files daily, which requires servers with high efficiency and high-load processing capacity. However, traditional server architectures could not handle such large-scale data inputs, resulting in slow system performance.

 

Self-development Challenges

In a competitive market, self-development can bring personalized solutions but is costly and time-consuming. Long development cycles make it difficult for companies to quickly respond to market changes, risking the loss of market opportunities. 

 


 

 

Customer Requirements 

This company described its product's application scenarios to our ComIDP team and proposed specific requirements for intelligent data labeling in layout analysis, aiming to improve data parsing results while automating AI data labeling.

 

Types of AI Data Labeling

They needed to annotate titles, paragraphs, code blocks, tables, formulas, lists, and non-text content within documents to ensure unstructured document completeness. Separating natural paragraphs and layout segmentation were particularly crucial.

 

Type of Label | Sub-type | Note
title | title | All levels of titles on the page need to be labeled.
paragraph | paragraph | Text fragments consisting of plain text are categorized as paragraphs. To facilitate data search and location, large sections with multiple independent semantic paragraphs should be split, typically segmented by natural paragraphs and punctuation; each text fragment after segmentation is called a paragraph.
block | block (unknown category, non-text block) | A block is the output of layout segmentation. Data of the same type of information that is visually in a connected domain is a block. (1) An image on the same row is a block, a table is a block, and a large section of text under the same column is a block. (2) Blocks must consist of same-type information; mixed areas of different types cannot form one block and must be split. For example, a mixed area with an image and a table cannot be a block.
block | code-block (code block) |
block | img-block (mixed text and image block) |
block | table-block (table block) |
block | sci-block (scientific formula block) |
block | list-block (list block: text such as directories or text lists) | Must be at least two lines (3 and more), with average line text not exceeding two rows; otherwise it is a paragraph.

 

Beyond these fundamental needs, each data labeling type had specific restrictions: titles must be labeled as standalone elements, paragraphs and blocks must not overlap, a block must not span multiple columns, and a block must not contain mixed data types.
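As an illustration of how one of these restrictions could be checked automatically, here is a minimal Python sketch that flags paragraph regions overlapping block regions. It is not ComIDP's implementation: the Region class, the bounding-box fields, and the function names are assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import combinations

# Block-type labels taken from the label table above.
BLOCK_LABELS = {
    "block", "code-block", "img-block",
    "table-block", "sci-block", "list-block",
}


@dataclass(frozen=True)
class Region:
    """Hypothetical labeled region with an axis-aligned bounding box."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a: Region, b: Region) -> bool:
    """True when the two boxes share a non-empty area."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def find_overlap_violations(regions):
    """Return (paragraph, block) pairs that overlap, in either order."""
    violations = []
    for a, b in combinations(regions, 2):
        paragraph_and_block = (
            (a.label == "paragraph" and b.label in BLOCK_LABELS)
            or (b.label == "paragraph" and a.label in BLOCK_LABELS)
        )
        if paragraph_and_block and overlaps(a, b):
            violations.append((a, b))
    return violations


page = [
    Region("paragraph", 0, 0, 100, 50),
    Region("table-block", 0, 60, 100, 120),
    Region("img-block", 50, 40, 150, 100),
]
for first, second in find_overlap_violations(page):
    print(f"{first.label} overlaps {second.label}")
```

A similar pairwise pass could cover the other restrictions, provided the labeled data carries column information for each region.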

 

Output Labeling

After the unstructured documents were parsed, this company required output files in JSON format, restricted to the following labels: title, paragraph, block, code-block, img-block, table-block, sci-block, and list-block. This supports subsequent key information extraction and semantic analysis, enhancing the accuracy of the RPA and Q&A systems.
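To make the output requirement concrete, the following Python sketch checks that a JSON result uses only the eight permitted labels. The JSON layout shown here (the "pages", "regions", "label", "bbox", and "text" keys) is a hypothetical structure chosen for illustration; the case study does not document ComIDP's actual schema.

```python
import json

# The eight output labels required by the customer.
ALLOWED_LABELS = {
    "title", "paragraph", "block", "code-block",
    "img-block", "table-block", "sci-block", "list-block",
}


def unexpected_labels(json_text: str) -> list[str]:
    """Return labels found in the output that are not in ALLOWED_LABELS."""
    document = json.loads(json_text)
    found = {
        region["label"]
        for page in document["pages"]
        for region in page["regions"]
    }
    return sorted(found - ALLOWED_LABELS)


# Hypothetical sample output; field names are assumptions.
sample_output = json.dumps({
    "pages": [
        {
            "page": 1,
            "regions": [
                {"label": "title", "bbox": [72, 60, 540, 90],
                 "text": "Quarterly Report"},
                {"label": "table-block", "bbox": [72, 120, 540, 400]},
            ],
        }
    ]
})

print(unexpected_labels(sample_output))
```

Running a check like this on every output file is one way to catch labels outside the agreed set before the data reaches downstream extraction.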

 

ComIDP's R&D team customized the layout recognition parameters based on the customer's needs. After continuous updates and iterations, accuracy exceeded 95%, and the solution was delivered to the customer and passed acceptance.

 

 

ComIDP Solution 

The ComIDP team held in-depth discussions with this enterprise's R&D team to understand their specific needs and business goals and to ensure a customized, practical solution. From data collection, AI model training, and model optimization through to test reports, we provide professional, flexible, and efficient services for customers.

 

Layout Analysis Model Training

By collecting different types of samples for manual data labeling, such as financial reports, papers, newspapers, and books, our R&D team trained a layout analysis AI model applicable to various industries. This model accurately identifies and classifies various elements on the page, such as titles, paragraphs, tables, and images, using 24 predefined labels, with recognition accuracy surpassing 95%. 

 

[Figure: geometric document layout]

 

Based on the specific data labeling needs of this enterprise, we further optimized our AI model. Through refined labeling types and rules, we achieved precise automated data labeling of complex document content. For instance, special recognition algorithms were designed and tuned for code blocks and formula blocks in technical documents so that these contents are accurately extracted and distinguished. The AI-based ComIDP analyzes both the geometric and the logical layout of a document, restoring 99% of the document layout and reading order and thereby maintaining layout completeness and consistency. As requested by the client, labeled results are output in a standardized JSON format, facilitating secondary processing and data analysis.

 

[Figure: logical document layout]

 

 

Test Reports Verify Effectiveness

 

Functional Testing

After AI model training was complete, we conducted multiple rounds of rigorous testing to validate its performance and used client-provided examples as validation sets to measure model accuracy, producing a functional testing report. The report details how our AI OCR model handles various document types automatically, covering different formats, sizes, and languages, as well as elements such as stamps, charts, formulas, and flowcharts. These results served as critical acceptance criteria for the model.

 

Format: PNG, JPG, JPEG, BMP
Size: 100KB ~ 30MB
Languages: Simplified Chinese, English, Mixed Chinese and English
Types: Tables, Complex Layouts, Stamps, Handwritten text, Exams, Formulas, Flowcharts, Skewed text, Scanned, and Photographed books and PPTs
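
For readers who want to pre-screen inputs against the tested range above, a small Python sketch follows. The function and constant names are hypothetical, and it assumes 1KB = 1024 bytes, which the report does not specify.

```python
from pathlib import Path

# Formats and size range taken from the functional test scope above.
TESTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}
MIN_BYTES = 100 * 1024          # 100KB, assuming 1KB = 1024 bytes
MAX_BYTES = 30 * 1024 * 1024    # 30MB


def within_tested_range(filename: str, size_bytes: int) -> bool:
    """True when the file matches a tested format and size."""
    extension = Path(filename).suffix.lower()
    return extension in TESTED_EXTENSIONS and MIN_BYTES <= size_bytes <= MAX_BYTES


print(within_tested_range("scan.PNG", 500 * 1024))
print(within_tested_range("scan.tiff", 500 * 1024))
```

Files outside this range are not necessarily unsupported; they simply fall outside what the functional test report covered.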

 

From the test report, we selected the final results of ComIDP processing documents that contain formulas. Both text and formulas were recognized accurately, and our customer was very satisfied with the results.

 

 

Stress Testing

Because this enterprise receives over a hundred thousand documents daily, we performed comprehensive stress testing to ensure the system could handle massive document input. We tested PDF to Word (Grid Layout) conversion with and without OCR in both synchronous and asynchronous environments. Our stress test report indicated that ComIDP maintained stability, accuracy, and quick responses under high load, demonstrating its performance and reliability in high-load tasks.

 

Item | Synchronous Testing | Asynchronous Testing
Test Scenario | 200 users converting files simultaneously. | 200 users converting files simultaneously, lasting over 10 minutes.
Test Results | All 200 users succeeded in conversion. | All 200 users succeeded in conversion.
Test Results | Success rate and accuracy reached 100%, with no error responses. | Success rate and accuracy reached 100%, with no error responses.
Test Results | 99% of response times under 1 second. | 99% of response times under 1 second.
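
The metrics in this table (success rate and 99th-percentile response time under concurrent load) can be computed along the following lines. This is a minimal sketch, not the harness used for the report: fake_convert is a stand-in for a real conversion request, and the nearest-rank percentile method is an assumption.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def fake_convert(user_id: int) -> bool:
    """Stand-in for a real conversion request (hypothetical)."""
    time.sleep(0.01)
    return True


def timed_call(user_id: int):
    """Run one request and return (succeeded, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        succeeded = fake_convert(user_id)
    except Exception:
        succeeded = False
    return succeeded, time.perf_counter() - start


def percentile(values, pct: int) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def run_stress_test(users: int = 200) -> dict:
    """Fire one request per simulated user concurrently and summarize."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(timed_call, range(users)))
    successes = sum(1 for succeeded, _ in results if succeeded)
    latencies = [elapsed for _, elapsed in results]
    return {
        "success_rate": successes / users,
        "p99_seconds": percentile(latencies, 99),
    }


report = run_stress_test(users=200)
print(report)
```

Threads are used here only to simulate simultaneous users; a real test would replace fake_convert with calls to the deployed service and run for the full test duration.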

 

 

GPU&CPU Speed Testing

Additionally, we deployed a GPU to accelerate document processing speeds. Comparing GPU and CPU efficiency for the same tasks resulted in a detailed OCR GPU&CPU speed comparison report.

 

We compared the time ComIDP takes to process 100 image samples on GPU versus CPU. Testing indicated that, in a dual-container environment on a dual-GPU system, ComIDP processes up to 20,000 images per minute on average. GPU processing is 100 times faster than CPU processing, a significant speed advantage for large-scale document processing that substantially reduces processing time. Based on the customer's actual applications and document processing demands, we provided a customized cluster deployment solution to ensure high efficiency in ComIDP's real-world use.
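
As a rough worked example of what the reported throughput implies, the Python sketch below converts 20,000 images per minute into a per-second rate and an estimated processing time for a daily batch. It assumes one image per document and that the reported rate is sustained, neither of which the report states; the function names are illustrative.

```python
REPORTED_IMAGES_PER_MINUTE = 20_000  # figure quoted in the speed test above


def images_per_second(images_per_minute: float) -> float:
    """Convert a per-minute throughput into a per-second rate."""
    return images_per_minute / 60


def minutes_to_process(image_count: int,
                       images_per_minute: float = REPORTED_IMAGES_PER_MINUTE) -> float:
    """Estimated minutes to process a batch at a sustained rate."""
    return image_count / images_per_minute


# Assumption: a daily batch of 100,000 single-image documents.
print(round(images_per_second(REPORTED_IMAGES_PER_MINUTE), 1))
print(minutes_to_process(100_000))
```

Under these assumptions, a batch of 100,000 images would take about 5 minutes, which suggests the GPU deployment leaves considerable headroom for the customer's daily volume.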


 

 

 Final Result

Through accurate recognition and labeling of complex document layouts, ComIDP helped this technology company achieve highly automated document processing within just two months. This breakthrough made cumbersome document management more convenient and efficient, improving workflow efficiency by over 80% while ensuring 95% layout restoration accuracy. Moreover, using structured JSON data to develop its RPA (Robotic Process Automation) and intelligent Q&A products enabled the company to bring products to market rapidly and at a lower cost. This not only significantly shortened time-to-market but also markedly enhanced the products' market competitiveness.

 

During the collaboration, ComIDP's R&D team responded promptly to the customer's feedback, communicated actively, and provided solutions that swiftly addressed technical issues. The customer was not only satisfied with the results of our products but also highly praised our service.

 

In the future, we will continue dedicating ourselves to technological development, providing more comprehensive and efficient solutions for enterprises, helping them tackle business challenges, and enabling them to excel in the competitive market. 

 

If you wish to experience automated document processing, we sincerely invite you to try our ComIDP intelligent document processing solution. Through the ComIDP Demo, you can personally experience the powerful functions and ease of AI technology in automated document processing. For more information, please feel free to contact us, and we will provide you with professional support and service.

 
