As technology advances and business needs keep changing, automating repetitive tasks has become key to efficiency in modern enterprises and a cornerstone of digital transformation. Robotic Process Automation (RPA) is an effective way to automate such tasks, and a growing number of companies are adopting it to modernize their internal workflows.
Customer Background and Challenges
A technology company specializing in office software planned to build an RPA product and an intelligent Q&A product to help enterprises automate workflows and business processes, supporting efficient, cost-effective, and compliant operations while improving customer experience.
However, while developing these RPA and AI Q&A products, the company ran into difficulties processing unstructured documents: manually labeling massive numbers of documents was slow and error-prone, driving up costs and delaying development. The company learned that ComIDP's intelligent document processing solution had helped a data provider process over 3 million unstructured documents in 5 days, and asked ComIDP for automated data labeling to support intelligent layout recognition and data parsing.
ComIDP customized its layout recognition parameters for the company and upgraded its OCR technology with AI models, using 24 labels to restore the layout and logical structure of documents and keep that layout complete and consistent. The company deployed ComIDP's intelligent document solution on a cluster to support development of its RPA and intelligent Q&A products, which significantly shortened the development cycle, reduced costs, and let the products reach the market quickly.
Customer Pain Points
Because unstructured documents have complex content and inconsistent formats, parsing them and extracting data is extremely challenging. Layout recognition is a major difficulty: pages contain numerous elements, vary widely in layout and style, and express different logical relationships between their contents. Noise, skew, and perspective distortion make recognition harder still. Parsing such documents requires highly adaptable, intelligent technology. Lacking it, the enterprise had to rely on manual processing, which was slow and inaccurate and directly undermined the effectiveness of its RPA and Q&A systems.
Manual Data Labeling
The enterprise previously relied on manually labeling unstructured data for document layout recognition, which was time-consuming and error-prone. When different people labeled the same dataset, their results varied, leading to inconsistent data quality. This increased the cost and time of subsequent data verification, complicated development work, and extended project timelines.
Massive Document Input
The company processes hundreds of thousands of files daily, which requires servers that can sustain high throughput under heavy load. Its traditional server architecture could not handle input at this scale, and system performance suffered.
Self-development Challenges
Building an in-house solution could yield a tailored result, but it is costly and time-consuming. In a competitive market, long development cycles make it hard to respond quickly to change and risk losing market opportunities.
Customer Requirements
The company described its products' application scenarios to our ComIDP team and set out specific requirements for intelligent data labeling in layout analysis, with the aim of improving parsing quality and automating data labeling with AI.
Types of AI Data Labeling
The company needed titles, paragraphs, code blocks, tables, formulas, lists, and non-text content to be annotated so that no part of an unstructured document was lost. Separating natural paragraphs and segmenting the layout were especially important.
Beyond these basic needs, each label type came with specific constraints: titles had to be labeled on their own, paragraphs and blocks could not overlap, a block could not span multiple columns, and a block could not mix data types.
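Constraints like these lend themselves to automatic checks on each labeled page. The sketch below is a minimal illustration, not ComIDP's implementation: the `Region` structure, its fields (`bbox`, `column_span`, `content_types`), and the page coordinates are all hypothetical. It flags overlapping regions (which also catches a title merged into a neighboring block), blocks spanning more than one column, and blocks mixing data types.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Region:
    """Hypothetical labeled region on a page."""
    label: str                                  # e.g. "title", "paragraph", "table-block"
    bbox: tuple[float, float, float, float]     # (x0, y0, x1, y1) in page coordinates
    column_span: int = 1                        # number of text columns the region covers
    content_types: set[str] = field(default_factory=set)  # e.g. {"text"}, {"table"}


def overlaps(a, b) -> bool:
    """True if two axis-aligned boxes share an area larger than zero."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return min(ax1, bx1) > max(ax0, bx0) and min(ay1, by1) > max(ay0, by0)


def check_labeling_rules(regions: list[Region]) -> list[str]:
    """Return a human-readable list of constraint violations (empty if none)."""
    violations = []
    for i, r in enumerate(regions):
        if r.column_span > 1:
            violations.append(f"region {i} ({r.label}) spans {r.column_span} columns")
        if len(r.content_types) > 1:
            violations.append(f"region {i} ({r.label}) mixes {sorted(r.content_types)}")
    # Every pair of regions must be disjoint.
    for (i, a), (j, b) in combinations(enumerate(regions), 2):
        if overlaps(a.bbox, b.bbox):
            violations.append(f"regions {i} and {j} overlap")
    return violations


page = [
    Region("title", (50, 40, 550, 70), content_types={"text"}),
    Region("paragraph", (50, 80, 290, 400), content_types={"text"}),
    Region("table-block", (280, 80, 550, 300), content_types={"table"}),
    Region("block", (50, 420, 550, 700), column_span=2, content_types={"text", "image"}),
]
violations = check_labeling_rules(page)
for v in violations:
    print(v)
```

On this sample page the checker reports three problems: the last block spans two columns, it mixes image and text content, and the paragraph overlaps the table. Running such a check over automatically labeled pages is one plausible way to catch rule violations before the output is delivered.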
Output Labeling
After parsing, the company required the output in JSON format, restricted to these labels: title, paragraph, block, code-block, img-block, table-block, sci-block, and list-block. This structured output supports downstream key-information extraction and semantic analysis, improving the accuracy of the RPA and Q&A systems.
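To illustrate what a restricted label set means in practice, here is a minimal sketch of serializing one parsed page to JSON while rejecting any label outside the allowed list. Only the label names come from the requirement; the JSON layout (`page`, `elements`, `order`, `label`, `text`) and the function `to_output_json` are hypothetical and do not describe ComIDP's actual output schema.

```python
import json

# The output labels the customer allowed.
ALLOWED_LABELS = {
    "title", "paragraph", "block", "code-block",
    "img-block", "table-block", "sci-block", "list-block",
}


def to_output_json(elements, page_no):
    """Serialize (label, text) pairs for one page; the schema is hypothetical."""
    items = []
    for order, (label, text) in enumerate(elements):
        if label not in ALLOWED_LABELS:
            raise ValueError(f"label {label!r} is not an allowed output label")
        # Keep reading order explicit so downstream extraction can rely on it.
        items.append({"order": order, "label": label, "text": text})
    return json.dumps({"page": page_no, "elements": items}, ensure_ascii=False, indent=2)


page_json = to_output_json(
    [
        ("title", "Quarterly Report"),
        ("paragraph", "Revenue grew in the third quarter."),
        ("sci-block", "E = mc^2"),
    ],
    page_no=1,
)
print(page_json)
```

Failing loudly on an unexpected label, rather than silently passing it through, keeps the output within the agreed vocabulary that downstream RPA and Q&A components expect.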
ComIDP's R&D team customized the layout recognition parameters to these requirements. After repeated iterations, accuracy exceeded 95%, and the solution was delivered to the customer for acceptance.
ComIDP Solution
The ComIDP team held in-depth discussions with the enterprise's R&D team to understand its specific needs and business goals, so that the solution would be tailored and practical. From data collection, model training, and model optimization to test reports, we provide customers with professional, flexible, and efficient services.
Layout Analysis Model Training
Our R&D team collected diverse document samples, such as financial reports, papers, newspapers, and books, labeled them manually, and used them to train a layout analysis model that applies across industries. The model identifies and classifies page elements such as titles, paragraphs, tables, and images using 24 predefined labels, with recognition accuracy above 95%.
Building on the enterprise's specific labeling needs, we further optimized the model. With refined label types and rules, it achieves precise automated labeling of complex document content. For example, we designed and tuned dedicated recognition algorithms for the code blocks and formula blocks found in technical documents, so that this distinctive content is extracted accurately and kept distinct from the surrounding text. ComIDP analyzes both the geometric and the logical layout of a document, achieving 99% restoration of the layout and reading-order structure and keeping the layout complete and consistent. As the client requested, labeled results are output in a standardized JSON format that is easy to process further and analyze.
Test Reports Verify Effectiveness
Functional Testing
After training, we ran multiple rounds of rigorous testing to validate the model's performance, using client-provided examples as a validation set to measure accuracy, and compiled the results into a functional test report. The report detailed how our AI OCR model handled various document types, including different formats, sizes, and languages, as well as elements such as stamps, charts, formulas, and flowcharts. These results served as key acceptance criteria for the model.
One example from the test report shows ComIDP processing a document that contains formulas: both the text and the formulas were recognized accurately, and the customer was very satisfied with the results.
Stress Testing
Because the enterprise receives more than a hundred thousand documents a day, we ran comprehensive stress tests to confirm that the system could sustain that input volume. We tested PDF-to-Word conversion (Grid Layout), with and without OCR, in both synchronous and asynchronous modes. The stress test report showed that ComIDP remained stable, accurate, and responsive under high load, demonstrating its performance and reliability for high-load tasks.
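The case study does not describe the test harness itself. As a rough illustration only, the sketch below uses Python's `asyncio` to contrast sending 100 requests one at a time with keeping up to 50 in flight at once, which is the kind of difference a synchronous versus asynchronous load test exposes. `convert_stub` is a hypothetical stand-in that merely sleeps for a fixed latency; it is not a ComIDP API, and its timings say nothing about ComIDP's real performance.

```python
import asyncio
import time


async def convert_stub(doc_id: int, latency_s: float = 0.01) -> int:
    """Hypothetical stand-in for one PDF-to-Word request: it only sleeps."""
    await asyncio.sleep(latency_s)
    return doc_id


async def run_one_at_a_time(n: int) -> float:
    """Send n requests sequentially, waiting for each before sending the next."""
    start = time.perf_counter()
    for i in range(n):
        await convert_stub(i)
    return time.perf_counter() - start


async def run_concurrently(n: int, max_in_flight: int = 50) -> float:
    """Send n requests with at most max_in_flight outstanding at once."""
    sem = asyncio.Semaphore(max_in_flight)

    async def one(i: int) -> int:
        async with sem:
            return await convert_stub(i)

    start = time.perf_counter()
    results = await asyncio.gather(*(one(i) for i in range(n)))
    assert results == list(range(n))  # gather returns results in submission order
    return time.perf_counter() - start


seq = asyncio.run(run_one_at_a_time(100))
conc = asyncio.run(run_concurrently(100))
print(f"one at a time: {seq:.2f}s, concurrent: {conc:.2f}s")
```

The semaphore caps concurrency so the client does not overwhelm the service; in a real stress test that cap, the request count, and the latency would be the parameters varied while watching error rates and response times.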
GPU vs. CPU Speed Testing
We also used GPU acceleration to speed up document processing and compared GPU and CPU performance on identical tasks, documenting the results in a detailed OCR speed comparison report.
The comparison below shows the time ComIDP took to process 100 image samples on GPU versus CPU. In a dual-GPU system running two containers, ComIDP's average throughput reached 20,000 images per minute, and GPU processing was 100 times faster than CPU processing. For large-scale document processing, this speed advantage substantially cuts processing time. To match the customer's actual applications and processing demands, we also provided a customized cluster deployment so that ComIDP runs efficiently in production.
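The reported figures can be put in perspective with some back-of-the-envelope arithmetic. The sketch takes the reported average of 20,000 images per minute and the 100x GPU speed ratio, and assumes a daily load of 100,000 single-page images; that load figure and the one-image-per-page equivalence are assumptions for illustration, not measurements.

```python
GPU_IMAGES_PER_MINUTE = 20_000  # reported average throughput (dual GPU, two containers)
GPU_SPEEDUP = 100               # reported GPU-vs-CPU speed ratio
DAILY_IMAGES = 100_000          # assumption: 100,000 single-page images per day

# A 100x shorter processing time implies 100x lower throughput on CPU.
cpu_images_per_minute = GPU_IMAGES_PER_MINUTE / GPU_SPEEDUP

gpu_minutes = DAILY_IMAGES / GPU_IMAGES_PER_MINUTE         # time for one day's load on GPU
cpu_hours = DAILY_IMAGES / cpu_images_per_minute / 60      # same load on CPU, in hours
gpu_daily_capacity = GPU_IMAGES_PER_MINUTE * 60 * 24       # images per 24 hours on GPU

print(f"GPU: {gpu_minutes:.1f} min, CPU: {cpu_hours:.1f} h")  # GPU: 5.0 min, CPU: 8.3 h
print(f"GPU daily capacity: {gpu_daily_capacity:,}")          # GPU daily capacity: 28,800,000
```

Under these assumptions, a day's load that would occupy a CPU deployment for over eight hours clears in about five minutes on the GPU setup, leaving ample headroom for peaks.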
Final Result
By accurately recognizing and labeling complex document layouts, ComIDP helped the company achieve highly automated document processing within just two months. Cumbersome document management became faster and more convenient: workflow efficiency improved by over 80%, with 95% layout restoration accuracy. Building its RPA and intelligent Q&A products on structured JSON data also let the company bring them to market quickly and at lower cost, significantly shortening time-to-market and strengthening the products' competitiveness.
Throughout the collaboration, ComIDP's R&D team responded promptly to the customer's feedback, communicated actively, and resolved technical issues quickly. The customer was satisfied with the results of our products and also praised our service highly.
Looking ahead, we will keep investing in technology development to give enterprises more comprehensive and efficient solutions that help them tackle business challenges and excel in a competitive market.
If you would like to experience automated document processing, we invite you to try the ComIDP intelligent document processing solution. The ComIDP Demo lets you see firsthand how AI makes document processing powerful and easy. For more information, please contact us, and we will provide professional support and service.