PDFiumVCL documentation

CharacterGenerated property

Component: TPdf  ·  Unit: PDFium
Returns True when the character at the specified index was synthesised by PDFium during text extraction rather than being drawn explicitly by the page content stream. Synthetic characters are emitted for ligature expansion, soft hyphenation, and word/line boundary heuristics so search and select-all results read naturally.

Syntax

property CharacterGenerated[Index: Integer]: Boolean; // read only

Index: Zero-based character index on the current page, in the range 0 to CharacterCount - 1.

Description

CharacterGenerated returns True when the character at the specified index was generated internally by PDFium rather than being present in the original content stream. Typical examples are ligature decomposition (an fi ligature glyph split into f and i), the space PDFium inserts between adjacent text runs that are visually separated, and the line-break characters injected at the end of each visual line.
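
To see which characters on a page PDFium synthesised, it can help to list them. The following is a minimal sketch of such a diagnostic: ListGeneratedCharacters is a hypothetical helper, not part of the component, and it uses only CharacterCount, CharacterGenerated and Character. It assumes a page is already loaded into the TPdf instance and that Character[Index] returns a single WideChar, as the string concatenation in the Example section suggests.

```pascal
uses
  SysUtils, Classes, PDFium;

// Hypothetical helper, not part of PDFiumVCL: writes one line per generated
// character on the page currently loaded in APdf, formatted as
// "<index>: U+<code point>". Assumes Character[Index] returns a single
// WideChar, so Ord yields its UTF-16 code unit.
procedure ListGeneratedCharacters(APdf: TPdf; Lines: TStrings);
var
  I: Integer;
begin
  Lines.BeginUpdate; // avoid repainting a visible control once per added line
  try
    Lines.Clear;
    for I := 0 to APdf.CharacterCount - 1 do
      if APdf.CharacterGenerated[I] then
        Lines.Add(Format('%d: U+%.4x', [I, Ord(APdf.Character[I])]));
  finally
    Lines.EndUpdate;
  end;
end;
```

Calling ListGeneratedCharacters(Pdf, Memo1.Lines) fills a memo with the index and code point of every synthesised character; an inserted space, for example, shows up as U+0020.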

Generated characters are not drawn by the page content stream, so their bounding boxes are derived from neighbouring glyphs. Treat their on-page position with caution if you need pixel-accurate hit-testing: their CharacterRectangle can collapse to a zero-width or zero-height region.

For tasks that need exact verbatim extraction (digital signatures, hash-of-text workflows, character-exact comparison with the page content stream), filter out generated characters first. For normal text search and clipboard copy, the flag is purely informational: PDFium inserts these characters because users expect them to be there.

Example

// Build a verbatim string that contains only characters from the content stream
var
  I: Integer;
  S: WString;
begin
  S := '';
  for I := 0 to Pdf.CharacterCount - 1 do
    if not Pdf.CharacterGenerated[I] then
      S := S + Pdf.Character[I];
  Memo1.Text := S;
end;

See Also

Character, CharacterIsHyphen, CharacterMapError, CharacterRectangle, Text