TextExtractionOptions Enumeration

Apitron.PDF.Kit library for .NET

This enum specifies the available text extraction options.

Namespace: Apitron.PDF.Kit.Extraction
Assembly: Apitron.PDF.Kit (in Apitron.PDF.Kit.dll) Version: 2.0.37.0 (2.0.37.0)

Syntax

C#
[FlagsAttribute]
public enum TextExtractionOptions

VB
<FlagsAttribute>
Public Enum TextExtractionOptions
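
The enum is marked with FlagsAttribute, yet the member values listed under Members are sequential (0 to 5) rather than distinct powers of two. The sketch below illustrates what that implies for bitwise combination. It uses a local copy of the enum so that it compiles without the library; the FlagsDemo class and the IsXmlOutput helper are hypothetical names introduced only for this illustration and are not part of Apitron.PDF.Kit.

```csharp
using System;

// Local mirror of the documented enum (names and values copied from the
// Members table). This is NOT the library type, only a stand-in for the demo.
[Flags]
public enum TextExtractionOptions
{
    RawText = 0,
    FormattedText = 1,
    TaggedText = 2,
    MergedTaggedText = 3,
    HtmlFragment = 4,
    HtmlPage = 5
}

public static class FlagsDemo
{
    // Hypothetical helper: decides by equality instead of testing bits,
    // because the values overlap bitwise (3 == 1 | 2, 5 == 4 | 1).
    public static bool IsXmlOutput(TextExtractionOptions options)
    {
        return options == TextExtractionOptions.TaggedText
            || options == TextExtractionOptions.MergedTaggedText;
    }

    public static void Main()
    {
        // Combining two members yields the value of a third member.
        var combined = TextExtractionOptions.FormattedText | TextExtractionOptions.TaggedText;
        Console.WriteLine(combined == TextExtractionOptions.MergedTaggedText); // True

        // HasFlag reports bit overlap, not a meaningful "contains" relation here.
        Console.WriteLine(TextExtractionOptions.HtmlPage.HasFlag(TextExtractionOptions.FormattedText)); // True

        // Equality comparison gives an unambiguous answer.
        Console.WriteLine(IsXmlOutput(TextExtractionOptions.HtmlPage)); // False
    }
}
```

Given these values, it seems safest to pass exactly one member at a time and to compare with == rather than with HasFlag or a bit mask. Whether the library gives combined values any additional meaning is not stated on this page.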

Members

Member name	Value	Description
RawText	0	Text is extracted as it appears in the PDF content, without any formatting.
FormattedText	1	Text is extracted as it appears in the PDF content, with intelligent formatting applied.
TaggedText	2	Text is extracted in XML format, fragmented as it appears on the PDF page. Example: <page><textblock attribute1 ... attributeN>text1</textblock>...<textblock>text2</textblock></page>
MergedTaggedText	3	Text is extracted in XML format similar to that used by the TaggedText option. This option enables a strategy that merges text blocks with similar properties as much as possible. Example: <page><textblock attribute1 ... attributeN>text1</textblock>...<textblock>text2</textblock></page>
HtmlFragment	4	Text is extracted in HTML format. Page content is wrapped in a <div>, and each text fragment is represented by a preformatted, absolutely positioned, styled block.
HtmlPage	5	Text is extracted in HTML format. Page content is represented by a complete HTML page, and each text fragment by a preformatted, absolutely positioned, styled block.
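
The XML shape shown for TaggedText and MergedTaggedText (a page element containing textblock elements) can be read with LINQ to XML. The following is a minimal sketch that works on a hand-written sample string rather than on real library output. The TaggedTextReader class, the ReadTextBlocks method, and the x and y attributes in the sample are hypothetical, since this page does not name the actual attributes. The sketch also assumes the elements carry no XML namespace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TaggedTextReader
{
    // Returns the text of every <textblock> element, in document order.
    // Assumes un-namespaced elements, as in the documented example.
    public static List<string> ReadTextBlocks(string taggedXml)
    {
        XDocument doc = XDocument.Parse(taggedXml);
        return doc.Descendants("textblock").Select(e => e.Value).ToList();
    }

    public static void Main()
    {
        // Hand-written sample in the documented shape; "x" and "y" are
        // placeholder attribute names, not taken from the library.
        string sample =
            "<page><textblock x=\"10\" y=\"20\">text1</textblock>" +
            "<textblock>text2</textblock></page>";

        foreach (string text in ReadTextBlocks(sample))
        {
            Console.WriteLine(text);
        }

        // Attributes are enumerated generically because their names are not documented here.
        foreach (XElement block in XDocument.Parse(sample).Descendants("textblock"))
        {
            string attrs = string.Join(", ",
                block.Attributes().Select(a => a.Name.LocalName + "=" + a.Value));
            Console.WriteLine(block.Value + " [" + attrs + "]");
        }
    }
}
```

Enumerating attributes generically keeps the sketch independent of whichever attributes the library actually emits. If the real output declares a namespace, the element lookup would need an XName that includes it.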