CharacterMapError property

Component: TPdf  ·  Unit: PDFium
Returns True when the character at the specified index could not be mapped to a valid Unicode code point. The flag is reported per-character by the PDFium text extractor and lets callers identify code points that are likely incorrect because the originating font lacks a usable ToUnicode CMap or encoding vector.

Syntax

property CharacterMapError[Index: Integer]: Boolean; // read only

Index: Zero-based character index on the current page, in the range 0 to CharacterCount - 1.

Description

CharacterMapError returns True when PDFium encountered an error while mapping the character at the specified index to a Unicode code point. In that case the value returned by Character[Index] is unreliable: it often falls back to the replacement character (#$FFFD) or passes the underlying glyph identifier through unchanged.
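Because the character value is unreliable whenever the flag is set, one option is to replace flagged characters with a visible placeholder before showing or storing the text. The following is a minimal sketch, not part of PDFiumVCL: MaskMapErrors is a hypothetical helper, and it assumes the caller has already built Text one character at a time from Character[I] (assumed here to yield a single WideChar) and filled Flags[I] from CharacterMapError[I].

```pascal
// Hypothetical helper, not part of PDFiumVCL.
// Returns a copy of Text in which every character whose flag is True
// is replaced by Placeholder. Text[I + 1] corresponds to Flags[I].
function MaskMapErrors(const Text: UnicodeString;
  const Flags: array of Boolean; Placeholder: WideChar): UnicodeString;
var
  I: Integer;
begin
  Result := Text;
  for I := 0 to High(Flags) do
    // Guard against a flag array that is longer than the text.
    if Flags[I] and (I < Length(Result)) then
      Result[I + 1] := Placeholder;  // Delphi strings are 1-based
end;
```

Keeping the helper independent of TPdf means the same routine can be reused on text collected from several pages, at the cost of one extra pass to gather the flags.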

The flag is most commonly raised on PDFs produced by printer drivers and conversion tools that embed font subsets without a corresponding ToUnicode CMap, or with a CMap that does not cover the full glyph set. Scanned PDFs with embedded OCR text and PDFs that use custom encoding vectors are typical sources of map errors.

Callers that need to round-trip text from a PDF should filter or annotate CharacterMapError hits before treating them as searchable content. A cluster of consecutive map errors usually indicates an entire string run that should be re-extracted through OCR rather than via the text layer.
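The cluster check described above can be sketched as a scan for runs of consecutive flagged characters. This is only an illustration under stated assumptions: TMapErrorRun, TMapErrorRuns and FindMapErrorRuns are hypothetical names, not part of PDFiumVCL, and the minimum run length that should trigger OCR is an application-specific choice.

```pascal
type
  // Hypothetical helper type: one run of consecutive map errors.
  TMapErrorRun = record
    First: Integer;  // index of the first flagged character
    Count: Integer;  // number of consecutive flagged characters
  end;
  TMapErrorRuns = array of TMapErrorRun;

// Hypothetical helper, not part of PDFiumVCL.
// Collects every run of at least MinLength consecutive True flags.
// Flags[I] is expected to hold the value of CharacterMapError[I].
function FindMapErrorRuns(const Flags: array of Boolean;
  MinLength: Integer): TMapErrorRuns;
var
  I, RunStart, N: Integer;
begin
  Result := nil;
  RunStart := -1;
  // Iterate one position past the end so a trailing run is closed too.
  for I := 0 to Length(Flags) do
    if (I < Length(Flags)) and Flags[I] then
    begin
      if RunStart < 0 then
        RunStart := I;  // a new run starts here
    end
    else
    begin
      // The current run (if any) ends just before index I.
      if (RunStart >= 0) and (I - RunStart >= MinLength) then
      begin
        N := Length(Result);
        SetLength(Result, N + 1);
        Result[N].First := RunStart;
        Result[N].Count := I - RunStart;
      end;
      RunStart := -1;
    end;
end;
```

To use it with a TPdf instance, fill a dynamic Boolean array of length CharacterCount from CharacterMapError[I] and pass it in; each returned run identifies a span of the page that is a candidate for OCR instead of text-layer extraction.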

Example

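var
  I, BadCount: Integer;
begin
  BadCount := 0;
  // Scan every character on the current page.
  for I := 0 to Pdf.CharacterCount - 1 do
    if Pdf.CharacterMapError[I] then
    begin
      // Log the index and character code of each character that failed to map.
      Memo1.Lines.Add(Format('Map error at %d (charcode $%x)',
        [I, Pdf.Charcode[I]]));
      Inc(BadCount);
    end;
  // Show the total number of flagged characters in the form caption.
  Caption := Format('%d untranslatable characters on page', [BadCount]);
end;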

See Also

Character, Charcode, CharacterGenerated, CharacterCount, Text