CharacterCount - TPdf

This API entry keeps identifiers, signatures, code blocks and PDF terms in their original form.

Component: TPdf · Unit: PDFium

Number of extractable characters on the currently active page.

Syntax

property CharacterCount: Integer; // read only

Description

CharacterCount returns the number of characters PDFium identifies in the text page (FPDF_TEXTPAGE) of the active page. It uses the same indexing as Character[Index], CharacterRectangle[Index], CharacterOrigin[Index] and the search / selection layer. The count reflects characters after ToUnicode CMap resolution, so each index refers to a character as decoded by PDFium, not to a glyph in the PDF font program. Characters outside the Basic Multilingual Plane occupy two indices (see Remarks).

The value is 0 when Active is False or when the page has no text content (pure image / vector pages). Text drawn with fonts that lack a ToUnicode CMap is still counted when PDFium can decode it by other means; characters it cannot map back to Unicode are flagged by CharacterMapError[Index] rather than omitted from the count. PDFium also includes “generated” whitespace characters that it synthesises to model line breaks and word boundaries; check CharacterGenerated[Index] to distinguish real glyphs from synthetic ones.
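A minimal sketch that tallies real glyphs, PDFium-generated whitespace and unmapped characters on the active page. It assumes CharacterGenerated[Index] and CharacterMapError[Index] are Boolean properties indexed like Character[Index], and that the code lives in a unit that uses PDFium; CountCharacterKinds is a hypothetical helper name, not a TPdf member.

```pascal
// Hypothetical helper: classify every character index on the active page.
// ASSUMPTION: CharacterGenerated[] and CharacterMapError[] return Boolean.
procedure CountCharacterKinds(Pdf: TPdf; out Real, Generated, Unmapped: Integer);
var
  I: Integer;
begin
  Real := 0;
  Generated := 0;
  Unmapped := 0;
  for I := 0 to Pdf.CharacterCount - 1 do
  begin
    if Pdf.CharacterGenerated[I] then
      Inc(Generated)          // whitespace synthesised by PDFium
    else
    begin
      Inc(Real);              // glyph actually drawn on the page
      if Pdf.CharacterMapError[I] then
        Inc(Unmapped);        // drawn, but no usable Unicode mapping
    end;
  end;
end;
```

A high Unmapped count relative to Real is a practical hint that the page is a candidate for OCR or font-program inspection, as noted under Remarks.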

The typical loop is for I := 0 to Pdf.CharacterCount - 1 do, with Pdf.Character[I] returning a WideChar. Concatenating these into a single string yields the page’s text in the order PDFium extracts it, which is usually, but not always, reading order. CharacterCount itself is O(1) once the page is parsed; the indexed accessors do the per-character work.
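Because Character[I] returns a WideChar, the concatenated string is UTF-16, so its length counts code units rather than Unicode code points. The self-contained console program below is a sketch for Delphi 2009 or later (where string is UTF-16); it counts code points by treating a high surrogate followed by a low surrogate as a single character. CodePointCount is a hypothetical helper, not part of TPdf.

```pascal
program CountCodePoints;

{$APPTYPE CONSOLE}

// Counts Unicode code points in a UTF-16 string. A high surrogate
// (D800..DBFF) followed by a low surrogate (DC00..DFFF) is one code point.
function CodePointCount(const S: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;                              // Delphi strings are 1-based
  while I <= Length(S) do
  begin
    if (I < Length(S)) and
       (S[I] >= #$D800) and (S[I] <= #$DBFF) and
       (S[I + 1] >= #$DC00) and (S[I + 1] <= #$DFFF) then
      Inc(I, 2)                        // surrogate pair: two indices
    else
      Inc(I);                          // BMP character or lone surrogate
    Inc(Result);
  end;
end;

var
  S: string;
begin
  // 'A' followed by U+1F600 encoded as the surrogate pair D83D DE00
  S := 'A' + #$D83D#$DE00;
  Writeln(Length(S));                  // 3 UTF-16 code units
  Writeln(CodePointCount(S));          // 2 code points
end.
```

Applied to a string assembled from Character[I], Length(S) equals CharacterCount, while CodePointCount(S) gives the number of code points the page text contains.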

Remarks

Surrogate pairs in supplementary planes (emoji, rare CJK ideographs) occupy two character indices — one high surrogate, one low surrogate — matching UTF-16 code units, not Unicode code points.
Use CharacterMapError[Index] to detect glyphs that PDFium could not map back to Unicode; those typically need OCR or font-program inspection.
For document-wide totals, iterate every page and accumulate the count. CharacterCount is per-page; PDFium does not provide an “all pages” total directly.
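Accumulating per-page counts can be sketched as below. It assumes TPdf exposes PageCount and a 1-based PageNumber property that selects the active page; neither is described in this entry, so verify the names and the index base against the component's reference. DocumentCharacterCount is a hypothetical helper.

```pascal
// Hypothetical helper: sum CharacterCount over every page of the document.
// ASSUMPTION: PageCount and a 1-based PageNumber exist on TPdf.
function DocumentCharacterCount(Pdf: TPdf): Int64;
var
  Page, SavedPage: Integer;
begin
  Result := 0;
  SavedPage := Pdf.PageNumber;
  try
    for Page := 1 to Pdf.PageCount do
    begin
      Pdf.PageNumber := Page;            // make this page the active one
      Inc(Result, Pdf.CharacterCount);   // per-page count only
    end;
  finally
    Pdf.PageNumber := SavedPage;         // restore the caller's page
  end;
end;
```

Restoring PageNumber in the finally block leaves the caller's active page unchanged even if an exception is raised partway through the loop.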

Example

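var
  S: string;
  I: Integer;
begin
  // Reserve one string slot per character index on the active page
  SetLength(S, Pdf1.CharacterCount);
  // Delphi strings are 1-based, while Character[] is 0-based
  for I := 1 to Length(S) do
    S[I] := Pdf1.Character[I - 1];
  Memo1.Lines.Add(S);
end;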
