Docs PDFiumVCL

CharacterCount property

Component: TPdf  ·  Unit: PDFium
Number of extractable characters on the currently active page.

Syntax

property CharacterCount: Integer; // read only

Description

CharacterCount returns the number of characters PDFium identifies on the active page’s FPDF_TEXTPAGE. The indices 0 to CharacterCount - 1 are the same ones used by Character[Index], CharacterRectangle[Index], CharacterOrigin[Index] and the search / selection layer. The count reflects text after ToUnicode CMap resolution, so each entry corresponds to one Unicode code point as decoded by PDFium, not to one glyph in the PDF font program.
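Because every indexed accessor shares one index space, an index found through Character[Index] can be passed unchanged to the other accessors. The following is a minimal sketch of that idea, not part of the component: the unit name PdfIndexSketch and the helper IndexesOf are hypothetical, and it relies only on CharacterCount and on Character[Index] returning a WideChar as described on this page.

```pascal
unit PdfIndexSketch; // hypothetical unit name, for illustration only

interface

uses
  PDFium; // unit named in this page's header

// Hypothetical helper: returns every zero-based index whose character equals Ch.
// The returned indices are valid for CharacterRectangle[] and CharacterOrigin[] too.
function IndexesOf(Pdf: TPdf; Ch: WideChar): TArray<Integer>;

implementation

function IndexesOf(Pdf: TPdf; Ch: WideChar): TArray<Integer>;
var
  I, Count, Found: Integer;
begin
  Count := Pdf.CharacterCount; // read once; 0 when there is nothing to scan
  Found := 0;
  SetLength(Result, Count);    // worst case: every entry matches
  for I := 0 to Count - 1 do
    if Pdf.Character[I] = Ch then
    begin
      Result[Found] := I;
      Inc(Found);
    end;
  SetLength(Result, Found);    // trim to the number of matches
end;

end.
```

A caller could then loop over the returned array and read CharacterRectangle[Index] for each match, for instance to highlight every occurrence of a character.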

The value is 0 when Active is False, when the page has no text content (pure image / vector pages), or when the text is drawn with fonts that lack a ToUnicode CMap and no fallback heuristic was able to decode them. A non-zero count also includes “generated” whitespace characters that PDFium synthesises to model line breaks and word boundaries. Check CharacterGenerated[Index] to distinguish real glyphs from synthetic ones.
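One way to see how much of the count is synthetic is to tally the entries that are not flagged as generated. This is a rough sketch under stated assumptions: the unit name PdfCountSketch and the helper CountRealCharacters are hypothetical, and CharacterGenerated[Index] is assumed to be a Boolean indexed property on TPdf (its exact declaration is not shown on this page).

```pascal
unit PdfCountSketch; // hypothetical unit name, for illustration only

interface

uses
  PDFium; // unit named in this page's header

// Hypothetical helper: number of entries that PDFium did not synthesise.
function CountRealCharacters(Pdf: TPdf): Integer;

implementation

function CountRealCharacters(Pdf: TPdf): Integer;
var
  I: Integer;
begin
  Result := 0;
  // CharacterCount is 0 for an inactive component or a page without text,
  // so the loop body is skipped in those cases and the result stays 0.
  for I := 0 to Pdf.CharacterCount - 1 do
    if not Pdf.CharacterGenerated[I] then // assumption: Boolean property
      Inc(Result);
end;

end.
```

The difference between CharacterCount and this tally is the number of generated entries on the page.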

The typical loop is for I := 0 to Pdf.CharacterCount - 1 do, with Pdf.Character[I] returning a WideChar. Concatenating these into a single string yields the page’s text content in reading order. CharacterCount itself is O(1) after the page is parsed; the indexed accessors do the per-character work.

Example

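var
  S: string;
  I: Integer;
begin
  // Allocate the whole string once instead of growing it per character.
  SetLength(S, Pdf1.CharacterCount);
  // Delphi strings are 1-based, Character[] is 0-based, hence I - 1.
  for I := 1 to Length(S) do
    S[I] := Pdf1.Character[I - 1];
  Memo1.Lines.Add(S);
end;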

See Also

Character, CharacterRectangle, CharacterOrigin, TextPage