How to Extract Words from PDF using PHP - PDF Parsing API

Sierra Nakamura | Wed. 09 Jul. 2025

CONTENTS

Step1: Get and Access the License

Step2: Authentication PDF API

Step3: Create Task

Step4: Upload Files

Step5: Process and Extract Text

Step6: Get Task Information

Conclusion

Parsing and extracting text from PDF documents helps improve efficiency, reduce error rates, and automate business processes, and PHP projects are no exception. This article shows how to call ComPDFKit's PDF API from PHP to extract text from PDF documents.

Text extraction is useful across many domains, where it reduces manual work and improves data accuracy and accessibility. Typical applications include, but are not limited to:

Automated handling and auditing of bank statements and financial reports
Automatic grading and correction of exam papers and student assignments
Extraction of medical records and diagnostic reports for archiving and quick retrieval
Automatic extraction of customer data and feedback forms, which are then stored in database systems for data mining and analysis

Step1: Get and Access the License of PHP PDF API

ComPDFKit API users get 1000 free PDF API requests. Follow the steps below to obtain your keys and start making API requests.

Register for a ComPDFKit API account to reach the dashboard. The dashboard shows your API keys, the usage of your API plan, and the status of your API requests.

Create a project and get its Public Key and Secret Key.

When your account is created, a default project is created with it. You can create more projects to call ComPDFKit API. All supported PDF APIs are listed on the documentation pages.

Each project has its own Public Key and Secret Key. Make sure you use the keys that belong to the project you are calling.

Step2: Authentication PDF API for PDF Text Extraction

Replace publicKey and secretKey with your project's real values to obtain the accessToken. Then use the accessToken to create a task, upload files, extract the PDF text, and get the extracted text as a JSON file.

PHP code example to authenticate with the ComPDFKit PDF text extraction API:

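// Replace the placeholders with the keys of your project.
$publicKey = 'your_public_key';
$secretKey = 'your_secret_key';
$params = [
    'publicKey' => $publicKey,
    'secretKey' => $secretKey
];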
$headers = ['Content-Type: application/json'];
$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL => 'https://api-server.compdf.com/server/v1/oauth/token',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => '',
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 0,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => 'POST',
    CURLOPT_HTTPHEADER => $headers,
    CURLOPT_POSTFIELDS => json_encode($params)
));
$response = curl_exec($curl);
curl_close($curl);
$result = json_decode($response, true);
$accessToken = $result['data']['accessToken'];
$bearerToken = "Bearer $accessToken";
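The example above assumes the request succeeds. As a minimal sketch, the raw response could be checked before the token is used. The function name bearerTokenFromResponse is hypothetical and not part of ComPDFKit; the only thing taken from the example is the data.accessToken path.

```php
<?php
// Hypothetical helper (not part of ComPDFKit): turn the raw result of
// curl_exec() into a "Bearer ..." header value, or fail with an exception.
function bearerTokenFromResponse($response): string
{
    // curl_exec() returns false when the transfer itself fails.
    if ($response === false || $response === '') {
        throw new RuntimeException('Empty or failed HTTP response');
    }
    $result = json_decode($response, true);
    if (!is_array($result)) {
        throw new RuntimeException('Response is not a JSON object');
    }
    // Field path taken from the example above: data.accessToken
    $token = $result['data']['accessToken'] ?? null;
    if (!is_string($token) || $token === '') {
        throw new RuntimeException('No accessToken found in response');
    }
    return 'Bearer ' . $token;
}
```

With this in place, the last three lines of the example could become `$bearerToken = bearerTokenFromResponse($response);`, so that a wrong key or a network failure surfaces as an exception instead of an empty Authorization header.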

Step3: Create Task - Extract PDF Text

Use the accessToken obtained in the previous step. Set the language in which error information is returned (1: English, 2: Chinese). The ComPDFKit PDF API parameters are described on the Quick Start > Request Description page.

The response data then contains the taskId. PHP code example to create a PDF text extraction task:

$language = 1; // Language of error information: 1 = English, 2 = Chinese
$headers = [
    'Content-Type: application/json',
    'Authorization: ' . $bearerToken
];
$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL => 'https://api-server.compdf.com/server/v1/task/pdf/json?language=' . $language,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => '',
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 0,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => 'GET',
    CURLOPT_HTTPHEADER => $headers,
));
$response = curl_exec($curl);
curl_close($curl);
$result = json_decode($response, true);
$taskId = $result['data']['taskId'];
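The create-task URL above is assembled by string concatenation. A small sketch of building it with validation follows. The function name buildCreateTaskUrl is hypothetical; the base URL, the pdf/json path, and the language values 1 and 2 are the ones used in this article.

```php
<?php
// Hypothetical helper: build the create-task URL used in the example above.
// The validation of $language is an illustrative assumption based on the
// two values this article lists.
function buildCreateTaskUrl(int $language, string $executeType = 'pdf/json'): string
{
    if (!in_array($language, [1, 2], true)) {
        throw new InvalidArgumentException('language must be 1 (English) or 2 (Chinese)');
    }
    $base = 'https://api-server.compdf.com/server/v1/task/';
    // http_build_query() takes care of URL-encoding the query string.
    return $base . $executeType . '?' . http_build_query(['language' => $language]);
}
```

Calling `buildCreateTaskUrl(1)` yields the same URL that the example passes to CURLOPT_URL when `$language` is 1.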

Step4: Upload Files for PDF Parser

Replace the following values in the PHP code:

PDF file: The PDF you want to extract text from.
taskId: Obtained in the task creation step.
language: The language in which error information is returned.
accessToken: Obtained in the authentication step.

ComPDFKit API also provides AI and OCR features. You can pass the following parameters in this step:

type: Which content to extract (0: text, 1: table). Default 0.
isAllowOcr: Whether to enable OCR (1: yes, 0: no). Default 0.
isOnlyAiTable: Whether to use AI to recognize tables (1: yes, 0: no). Default 0.
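As a sketch of how these options could be assembled, the helper below merges caller-supplied values over the defaults listed above and encodes them as JSON. The function name buildExtractParameter is hypothetical, and restricting each listed option to 0 or 1 is an assumption based only on the values this article documents.

```php
<?php
// Hypothetical helper: merge caller options over the documented defaults and
// return the JSON string for the upload request's "parameter" field.
function buildExtractParameter(array $options = []): string
{
    // Defaults as stated in the option list above.
    $defaults = ['type' => 0, 'isAllowOcr' => 0, 'isOnlyAiTable' => 0];
    $merged = array_merge($defaults, $options);
    // Only the three listed options are validated; other keys pass through.
    foreach (array_keys($defaults) as $name) {
        if ($merged[$name] !== 0 && $merged[$name] !== 1) {
            throw new InvalidArgumentException("$name must be 0 or 1");
        }
    }
    return json_encode($merged);
}
```

For example, `buildExtractParameter(['isAllowOcr' => 1])` keeps text extraction as the content type and only switches OCR on.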

PHP code example to upload a PDF for parsing:

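$pdfPath = 'test.pdf'; // Placeholder: path to the PDF you want to parse
$params = [
    'taskId' => $taskId, // ID of your task
    'file' => new CURLFile($pdfPath), // File you need to process
    'language' => $language,
    'password' => '', // Password of the PDF; empty if it has none
    // type 0 extracts text; set type to 1 to extract tables instead
    'parameter' => json_encode(['type' => 0, 'isAllowOcr' => 1, 'isContainOcrBg' => 0])
];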
$headers = [
    'Authorization: ' . $bearerToken
];
$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL => 'https://api-server.compdf.com/server/v1/file/upload',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => '',
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 0,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => 'POST',
    CURLOPT_HTTPHEADER => $headers,
    CURLOPT_POSTFIELDS => $params
));
$response = curl_exec($curl);
curl_close($curl);
$result = json_decode($response, true);
$fileKey = $result['data']['fileKey'];

Step5: Process and Extract Text From Uploaded PDF Files

Execute the task to extract words from the PDF you uploaded. Here is the PHP code example:

$headers = [
    'Content-Type: application/json',
    'Authorization: ' . $bearerToken
];
$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL => 'https://api-server.compdf.com/server/v1/execute/start?language=' . $language . '&taskId=' . $taskId,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => '',
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 0,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => 'GET',
    CURLOPT_HTTPHEADER => $headers,
));
$response = curl_exec($curl);
curl_close($curl);

Step6: Get Task Information of PDF Text Extraction

Follow the PHP code example below to obtain the task information, supplying the taskId and accessToken from the earlier steps. The extraction result is delivered as a JSON file, a structured format that makes the extracted PDF text easy to reuse.

$headers = [
    'Content-Type: application/json',
    'Authorization: ' . $bearerToken
];

$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL => 'https://api-server.compdf.com/server/v1/task/taskInfo' . '?taskId=' . $taskId,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => '',
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 0,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => 'GET',
    CURLOPT_HTTPHEADER => $headers,
));
$response = curl_exec($curl);
curl_close($curl);
$result = json_decode($response, true);
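If the task has not finished when this request is sent, the request may need to be repeated. The sketch below shows one generic way to do that, assuming extraction runs asynchronously. The function name pollUntil is hypothetical, and this article does not specify how the response reports completion, so the completion check is passed in as a callback.

```php
<?php
// Hypothetical helper: call $fetch repeatedly until $isDone accepts its
// result, waiting $delaySeconds between attempts.
// $fetch is expected to return the decoded response as an array.
function pollUntil(callable $fetch, callable $isDone, int $maxAttempts = 10, int $delaySeconds = 2): array
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $fetch();
        if ($isDone($result)) {
            return $result;
        }
        // Do not sleep after the final attempt.
        if ($attempt < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }
    throw new RuntimeException("Task not finished after $maxAttempts attempts");
}

// Usage sketch. fetchTaskInfo() would wrap the cURL request shown above;
// 'taskStatus' and 'TaskFinish' are assumed names that should be checked
// against the actual taskInfo response.
// $info = pollUntil(
//     function () use ($taskId, $bearerToken) { return fetchTaskInfo($taskId, $bearerToken); },
//     function (array $r) { return ($r['data']['taskStatus'] ?? '') === 'TaskFinish'; }
// );
```

Passing the fetch and the completion check as callbacks keeps the loop independent of the response format, so only the second callback has to change once the real field names are known.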

Conclusion

Beyond extracting text from PDFs, we also support the extraction of tables, images, and other elements. This makes our PDF API a useful tool for anyone dealing with large amounts of data stored in PDF files.

With accurate data extraction, users can quickly and efficiently make use of the information contained in their documents. Whether for research, data analysis, or simply improving productivity, ComPDFKit API offers a practical way to handle PDF data.
