Apitron.PDF.Kit library for .NET
Assembly: Apitron.PDF.Kit (in Apitron.PDF.Kit.dll) Version: 2.0.37.0 (2.0.37.0)
This enum represents text extraction options.
Namespace: Apitron.PDF.Kit.Extraction
Syntax
Members
Member name | Value | Description
---|---|---
RawText | 0 | Text is extracted as it appears in the PDF content, without any formatting.
FormattedText | 1 | Text is extracted as it appears in the PDF content, with intelligent formatting applied.
TaggedText | 2 | Text is extracted in XML format, fragmented as it appears on the PDF page.
MergedTaggedText | 3 | Text is extracted in XML format similar to that used by the TaggedText option. This option enables a strategy that merges text blocks with similar properties as much as possible.
HtmlFragment | 4 | Text is extracted in HTML format. Page content is wrapped in a <div>, and each text fragment is represented by a preformatted, absolutely positioned, styled block.
HtmlPage | 5 | Text is extracted in HTML format. Page content is represented by a complete HTML page, and each text fragment by a preformatted, absolutely positioned, styled block.
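
The member list can be illustrated with a small, self-contained sketch. The enum below is a local mirror of the documented names and values, declared only so the example compiles without the library; real code would use the type from Apitron.PDF.Kit.Extraction. The `OutputFormat` helper and its "text" / "xml" / "html" labels are hypothetical and not part of the library; they only group the members by the output format named in their descriptions.

```csharp
using System;

// Local mirror of the documented members and values (an assumption made for
// illustration); in real code, reference the enum from Apitron.PDF.Kit.Extraction.
public enum TextExtractionOptions
{
    RawText = 0,
    FormattedText = 1,
    TaggedText = 2,
    MergedTaggedText = 3,
    HtmlFragment = 4,
    HtmlPage = 5
}

public static class Program
{
    // Hypothetical helper, not a library API: maps each option to the output
    // format mentioned in its description.
    public static string OutputFormat(TextExtractionOptions option)
    {
        switch (option)
        {
            case TextExtractionOptions.RawText:
            case TextExtractionOptions.FormattedText:
                return "text";
            case TextExtractionOptions.TaggedText:
            case TextExtractionOptions.MergedTaggedText:
                return "xml";
            case TextExtractionOptions.HtmlFragment:
            case TextExtractionOptions.HtmlPage:
                return "html";
            default:
                throw new ArgumentOutOfRangeException(nameof(option));
        }
    }

    public static void Main()
    {
        // List every member with its numeric value and output format.
        foreach (TextExtractionOptions option in Enum.GetValues(typeof(TextExtractionOptions)))
        {
            Console.WriteLine($"{option} = {(int)option} -> {OutputFormat(option)}");
        }
    }
}
```

With the library referenced, the chosen member would be passed to the library's text extraction call for a page; consult the corresponding class documentation for the exact method signature, which this sketch does not assume.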
See Also