SetTextExtractionOptions

Text, Extraction

Description

Sets various options that affect the text extraction functionality.

From version 8.13, this function sets the text extraction options for the selected document only, and it affects only the results of the GetPageText function.

To adjust the text extraction for the ExtractFilePageText and DAExtractPageText functions, use the new DASetTextExtractionOptions function.

Syntax

Delphi

function TPDFlib.SetTextExtractionOptions(OptionID, NewValue: Integer): Integer;

ActiveX

Function PDFlib::SetTextExtractionOptions(OptionID As Long, NewValue As Long) As Long

DLL

int DLSetTextExtractionOptions(int InstanceID, int OptionID, int NewValue);

Parameters

OptionID
1 = Ignore font changes to allow grouping different blocks together
2 = Ignore color changes to allow grouping different blocks together
3 = Ignore text block changes to allow grouping different blocks together
4 = Output CMYK color values
5 = Sort text blocks based on top left position
6 = Descenders from font metrics
7 = Ignore overlaps
8 = Ignore duplicates
9 = Split on double space
10 = Trim characters outside area
11 = Alternative block matching
12 = Ignore rotated text blocks
13 = Trim leading and trailing whitespace from text blocks
14 = Output non-ASCII characters below the space character (0x20, decimal 32)
15 = Remove certain character strings such as underscore lines (see below)

NewValue
For OptionID = 1, 2, 3 and 6: 0 = Use, 1 = Ignore
For OptionID = 4: 0 = Show as RGB (default), 1 = Show as CMYK
For OptionID = 5: 0 = Do not sort blocks (default), 1 = Sort blocks
For OptionID = 7, 8 and 12: 0 = Do not ignore, 1 = Ignore
For OptionID = 9: 0 = Do not split on double space (default), 1 = Split on double space
For OptionID = 10: 0 = Do not trim characters outside area (default), 1 = Trim characters outside area
For OptionID = 11: 0 = Regular block matching, 1 = Alternative block matching
For OptionID = 13: 0 = Do not trim leading or trailing whitespace, 1 = Trim leading and trailing whitespace
For OptionID = 14: 0 = Remove non-ASCII characters below the space character from output (default), 1 = Output raw unfiltered ASCII characters
For OptionID = 15: 0 = Output text lines made with underscore characters (default), 1 = Remove text lines made with underscore characters
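
Because the OptionID values are bare integers, calling code may be easier to read with named constants. The following C sketch shows one way to do that. The enum names and the teo_is_documented_pair helper are hypothetical and not part of the library; the helper only mirrors the ranges listed above (OptionID 1 to 15, NewValue 0 or 1) and makes no claim about how the library validates its arguments internally.

```c
#include <stdbool.h>

/* Hypothetical names for the documented OptionID values.
   The library itself only defines the integers. */
enum TextExtractionOption {
    TEO_IGNORE_FONT_CHANGES      = 1,
    TEO_IGNORE_COLOR_CHANGES     = 2,
    TEO_IGNORE_BLOCK_CHANGES     = 3,
    TEO_OUTPUT_CMYK              = 4,
    TEO_SORT_BLOCKS              = 5,
    TEO_FONT_METRIC_DESCENDERS   = 6,
    TEO_IGNORE_OVERLAPS          = 7,
    TEO_IGNORE_DUPLICATES        = 8,
    TEO_SPLIT_ON_DOUBLE_SPACE    = 9,
    TEO_TRIM_OUTSIDE_AREA        = 10,
    TEO_ALT_BLOCK_MATCHING       = 11,
    TEO_IGNORE_ROTATED_BLOCKS    = 12,
    TEO_TRIM_WHITESPACE          = 13,
    TEO_OUTPUT_RAW_BELOW_SPACE   = 14,
    TEO_REMOVE_UNDERSCORE_LINES  = 15
};

/* Returns true when the pair falls inside the ranges listed in this
   section: OptionID 1..15 and NewValue 0 or 1. This is a local sanity
   check only, based on the documented table. */
bool teo_is_documented_pair(int option_id, int new_value)
{
    return option_id >= TEO_IGNORE_FONT_CHANGES
        && option_id <= TEO_REMOVE_UNDERSCORE_LINES
        && (new_value == 0 || new_value == 1);
}
```

With these names, a call such as DLSetTextExtractionOptions(InstanceID, TEO_SORT_BLOCKS, 1) reads as "sort text blocks" rather than as the pair (5, 1).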

Return values

0 = The OptionID or NewValue parameter was not valid
1 = The text extraction option was set successfully
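
Since each call reports success with 1 and failure with 0, a caller that sets several options may want to stop at the first rejected one. The C sketch below illustrates that pattern. It takes the setter as a function pointer with the same shape as the DLL signature shown under Syntax, so DLSetTextExtractionOptions could be passed in directly. The names SetOptionFn, OptionSetting, apply_text_extraction_options and stub_set_text_extraction_options are all hypothetical, and the stub is only an assumed stand-in so the sketch can run without the library.

```c
#include <stddef.h>

/* Same shape as DLSetTextExtractionOptions(InstanceID, OptionID, NewValue). */
typedef int (*SetOptionFn)(int instance_id, int option_id, int new_value);

/* One OptionID / NewValue pair to apply. */
struct OptionSetting {
    int option_id;
    int new_value;
};

/* Applies the settings in order and stops at the first call that does
   not return 1. Returns how many settings were applied successfully,
   so a result equal to count means every option was accepted. */
size_t apply_text_extraction_options(SetOptionFn set_option, int instance_id,
                                     const struct OptionSetting *settings,
                                     size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (set_option(instance_id, settings[i].option_id,
                       settings[i].new_value) != 1) {
            break;
        }
    }
    return i;
}

/* Stand-in for the real library call, used only so this sketch runs on
   its own. ASSUMPTION: it accepts OptionID 1..15 with NewValue 0 or 1
   and rejects everything else, following the tables in this section. */
int stub_set_text_extraction_options(int instance_id, int option_id,
                                     int new_value)
{
    (void)instance_id; /* unused by the stand-in */
    return (option_id >= 1 && option_id <= 15 &&
            (new_value == 0 || new_value == 1)) ? 1 : 0;
}
```

In real code the stub would be replaced by the library function, and a short count from apply_text_extraction_options would identify the index of the setting that was rejected.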