TextExtractionOptions Enumeration

Apitron.PDF.Kit library for .NET
This enum represents text extraction options.

Namespace:  Apitron.PDF.Kit.Extraction
Assembly:  Apitron.PDF.Kit (in Apitron.PDF.Kit.dll) Version: 2.0.37.0 (2.0.37.0)
Syntax

[FlagsAttribute]
public enum TextExtractionOptions
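
As an illustration of the declaration above, the following sketch mirrors the enum locally, using the numeric values listed under Members, so that it compiles without the Apitron.PDF.Kit assembly. The local declaration and the `Program` class are assumptions made for this example only; they are not the library's source. It prints each member with its value and shows that, because the listed values are consecutive rather than distinct bits, combining two members with bitwise OR can yield another named member.

```csharp
using System;

// Local mirror of the documented enum, declared here only so the sketch
// compiles on its own. Names and values are copied from the Members table;
// this is not the library's actual source.
[Flags]
public enum TextExtractionOptions
{
    RawText = 0,
    FormattedText = 1,
    TaggedText = 2,
    MergedTaggedText = 3,
    HtmlFragment = 4,
    HtmlPage = 5
}

public static class Program
{
    public static void Main()
    {
        // List every member together with its underlying integer value.
        foreach (TextExtractionOptions option in Enum.GetValues(typeof(TextExtractionOptions)))
        {
            Console.WriteLine($"{option} = {(int)option}");
        }

        // 1 | 2 == 3, so this combination is indistinguishable from MergedTaggedText.
        TextExtractionOptions combined =
            TextExtractionOptions.FormattedText | TextExtractionOptions.TaggedText;
        Console.WriteLine(combined == TextExtractionOptions.MergedTaggedText); // True
    }
}
```

Given these values, it is probably safest to pass exactly one member at a time rather than a bitwise combination.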
Members

Member name        Value   Description
RawText            0       Text will be extracted as it appears in the PDF content, without any formatting.
FormattedText      1       Text will be extracted as it appears in the PDF content, and intelligent formatting will be applied.
TaggedText         2       Text will be extracted in XML format, fragmented as it appears on the PDF page.
                           Examples
                           <page><textblock attribute1 ... attributeN>text1</textblock>...<textblock>text2</textblock></page>
MergedTaggedText   3       Text will be extracted in XML format, similar to that used by the TaggedText option. This option enables a strategy that merges text blocks with similar properties as much as possible.
                           Examples
                           <page><textblock attribute1 ... attributeN>text1</textblock>...<textblock>text2</textblock></page>
HtmlFragment       4       Text will be extracted in HTML format. Page content will be wrapped in a <div>, and each text fragment will be represented by a preformatted, absolutely positioned, styled block.
HtmlPage           5       Text will be extracted in HTML format. Page content will be represented by a complete HTML page, and each text fragment by a preformatted, absolutely positioned, styled block.
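
The XML produced by the TaggedText and MergedTaggedText options can be read with any XML parser. The sketch below is a minimal example that assumes the output has the `<page><textblock>...</textblock></page>` shape shown in the examples above; the sample string, the attribute name `attribute1`, and the helper `ReadTextBlocks` are hypothetical and serve only as an illustration. It uses `System.Xml.Linq` from the .NET base class library and does not call the Apitron API.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TaggedTextDemo
{
    // Hypothetical helper: returns the text content of every <textblock>
    // element found under the root element, in document order.
    public static string[] ReadTextBlocks(string taggedXml)
    {
        XElement page = XElement.Parse(taggedXml);
        return page.Descendants("textblock")
                   .Select(block => block.Value)
                   .ToArray();
    }

    public static void Main()
    {
        // Made-up sample shaped like the documented example; real output
        // would carry whatever attributes the library actually emits.
        string sample =
            "<page>" +
            "<textblock attribute1=\"value1\">text1</textblock>" +
            "<textblock>text2</textblock>" +
            "</page>";

        foreach (string text in ReadTextBlocks(sample))
        {
            Console.WriteLine(text);
        }
    }
}
```

In real use, the sample string would be replaced by the text returned from an extraction call made with TaggedText or MergedTaggedText.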