Conversion Options: Contain Image & Annotation
Overview
When converting PDF documents to other formats, ComPDFKit Conversion SDK offers two additional options: one determines whether images are included in the generated document, and the other determines whether annotations from the PDF file are retained.
With the "Include Images" option enabled, ComPDFKit Conversion SDK will extract the images from the PDF document and embed them in the corresponding pages and positions in the output file. For areas with overlapping images, ComPDFKit Conversion SDK merges these images into one and embeds it into the exact location on the corresponding page of the output file.
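The merging of overlapping image areas described above can be pictured with a small, self-contained sketch. This is not the SDK's implementation: the `Rect` tuple and the helper names `overlaps`, `union`, and `merge_overlapping` are hypothetical, and the sketch only fuses bounding boxes, whereas the SDK also composites the image content itself.

```python
from typing import List, Tuple

# (left, top, right, bottom) in page coordinates; a hypothetical representation.
Rect = Tuple[float, float, float, float]


def overlaps(a: Rect, b: Rect) -> bool:
    """Return True if the two rectangles share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Rect, b: Rect) -> Rect:
    """Return the smallest rectangle that covers both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_overlapping(rects: List[Rect]) -> List[Rect]:
    """Repeatedly fuse overlapping rectangles until no two of them overlap."""
    merged = list(rects)
    changed = True
    while changed:
        changed = False
        result: List[Rect] = []
        for rect in merged:
            for i, kept in enumerate(result):
                if overlaps(rect, kept):
                    # Grow the rectangle already kept so it covers both areas.
                    result[i] = union(rect, kept)
                    changed = True
                    break
            else:
                result.append(rect)
        merged = result
    return merged


if __name__ == "__main__":
    page_images = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
    print(merge_overlapping(page_images))
```

In this sketch the first two rectangles overlap and collapse into a single area, while the third stays separate, which mirrors the "one merged image per overlapping region" behavior described above.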
When the "Include Annotations" option is selected, most annotations are converted into raster images and embedded at their respective positions within the output document. However, text markup annotations (highlight, underline, strikeout, and squiggly) are converted into their formatting equivalents in the converted Word, PPT, and HTML documents and applied to the corresponding text. Note that the conversion is not guaranteed to be fully accurate in every case.
In the ComPDFKit Conversion SDK, the image and annotation options are commonly used in the following format conversions:
- PDF to Word
- PDF to Excel
- PDF to PowerPoint
- PDF to HTML
- PDF to RTF
About Text Markup Annotation
Highlight: When converting PDF to Word while keeping highlight markups, note that Microsoft Word supports only 15 highlighter colors. To approximate the original document's appearance as closely as possible, text in the Word document is marked with a text background color that matches the color of the original highlight annotation. For conversions to Microsoft PPT format, the format's native highlighting feature is used to mark the text. When converting to HTML format, a unique <span> tag is created for the marked text, and its background style is set to match the color of the corresponding annotation in the original document.
Underlines & Squiggly: When converting PDF to Word or PPT while keeping underline and squiggly markups, the text is marked with the same style in Microsoft Office. When converted to HTML format, the marked text is styled to display the same effect. However, if a passage of text in the original document is marked with both underline and squiggly, the text will be marked with only one of them, because squiggly is a type of underline in the Word, PPT, and HTML formats.
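To see why a matching text background color is preferred over Word's fixed highlighter palette, the sketch below snaps an arbitrary annotation color to the nearest of the 15 highlighter colors. The RGB values are the commonly listed ones for Word's highlighter palette and are an assumption here, as are the helper names `nearest_highlight` and `to_hex`; none of this is part of the ComPDFKit API.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]

# Commonly listed RGB values for Word's 15 highlighter colors (assumed here).
WORD_HIGHLIGHT_COLORS: Dict[str, RGB] = {
    "yellow": (255, 255, 0),
    "bright green": (0, 255, 0),
    "turquoise": (0, 255, 255),
    "pink": (255, 0, 255),
    "blue": (0, 0, 255),
    "red": (255, 0, 0),
    "dark blue": (0, 0, 128),
    "teal": (0, 128, 128),
    "green": (0, 128, 0),
    "violet": (128, 0, 128),
    "dark red": (128, 0, 0),
    "dark yellow": (128, 128, 0),
    "gray 50%": (128, 128, 128),
    "gray 25%": (192, 192, 192),
    "black": (0, 0, 0),
}


def nearest_highlight(color: RGB) -> str:
    """Return the palette name closest to `color` by squared RGB distance."""
    def distance(name: str) -> int:
        return sum((a - b) ** 2 for a, b in zip(color, WORD_HIGHLIGHT_COLORS[name]))
    return min(WORD_HIGHLIGHT_COLORS, key=distance)


def to_hex(color: RGB) -> str:
    """Format a color as an RRGGBB hex string, which can carry any RGB value."""
    return "{:02X}{:02X}{:02X}".format(*color)


if __name__ == "__main__":
    pastel = (255, 235, 160)  # a soft yellow, typical of PDF highlight annotations
    print(nearest_highlight(pastel))  # closest palette entry, visibly different
    print(to_hex(pastel))             # exact color preserved as a background fill
```

With this palette, the soft yellow is closest to a gray rather than to any yellow, which illustrates how much fidelity a palette-only approach can lose compared with an exact background color.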
Strikeout: When converting strikeout markups to Word and PowerPoint formats, the marked text receives the strikethrough formatting natively supported by Microsoft Office. However, in these two formats the strikeout color cannot match the original PDF document, because the strikeout color in Word and PPT always follows the font color of the marked text. When converted to HTML format, the strikeout is displayed in the same color as in the original document.
Markup annotations are not supported when converting PDF to Word in flow layout mode.
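As a rough illustration of the HTML side of the rules above, the following sketch builds a `<span>` with inline CSS for each markup type. It is an assumption about how such output could look, not the markup the SDK actually emits; the function name `span_for_markup` and the dictionary layout are hypothetical.

```python
from html import escape
from typing import Dict


def span_for_markup(text: str, markups: Dict[str, str]) -> str:
    """Wrap `text` in a <span> styled for the given markups.

    `markups` maps a markup kind ("highlight", "underline", "squiggly",
    "strikeout") to the CSS color taken from the original annotation.
    """
    declarations = []
    if "highlight" in markups:
        declarations.append(f"background-color: {markups['highlight']}")
    # An element has a single text-decoration-style, so a wavy and a solid
    # underline cannot coexist. Keeping the squiggly one is an arbitrary
    # choice made for this sketch.
    if "squiggly" in markups:
        declarations.append(f"text-decoration: underline wavy {markups['squiggly']}")
    elif "underline" in markups:
        declarations.append(f"text-decoration: underline {markups['underline']}")
    elif "strikeout" in markups:
        # Unlike Word and PPT, CSS lets the line color differ from the font color.
        declarations.append(f"text-decoration: line-through {markups['strikeout']}")
    style = escape("; ".join(declarations))
    return f'<span style="{style}">{escape(text)}</span>'


if __name__ == "__main__":
    print(span_for_markup("important", {"highlight": "#ffeb3b"}))
    print(span_for_markup("typo", {"underline": "#0000ff", "squiggly": "#ff0000"}))
    print(span_for_markup("removed", {"strikeout": "#ff0000"}))
```

For simplicity, the sketch applies at most one line decoration per span, so a strikeout combined with an underline would need additional handling.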
Sample
This sample demonstrates how to use the ComPDFKit Conversion SDK to convert a PDF document to a Word document with both options enabled: include images and include annotations.
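# Configure the conversion: keep images and retain annotations.
options = ConvertOptions()
options.contain_image = True
options.contain_annotation = True

# Convert sample.pdf to Word; callback is assumed to be defined by the caller.
error_code = PDFToOffice.start_pdf_to_word("sample.pdf", "", "path/to/output", options, callback)
if error_code == ErrorCode.Success:
    print("Convert success")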