THotPDF.AssignSyntheticCodepointForGID Method

THotPDF.AssignSyntheticCodepointForGID / GetSyntheticCodepointForGID

THotPDF PUA synthetic codepoint allocator (v2.119.68)


Allocates and queries Private Use Area (U+E000 - U+F8FF) synthetic codepoints for OpenType GSUB substitute GIDs that have no natural Unicode codepoint reachable through the font's cmap. Closes the producer-side GID-level emission gap left by the v2.119.43-66 GSUB query and refinement APIs.

Delphi syntax:

function AssignSyntheticCodepointForGID(GID: Word; out SyntheticCP: Word): Boolean;

function GetSyntheticCodepointForGID(GID: Word): Word;

Why the API exists

The v2.119.32-67 producer-side automatic shaping pipeline (Arabic / Latin / Devanagari) requires every substitute GID returned by the GSUB engine to be reachable through a Unicode codepoint. The existing hex-encoded text pipeline emits codepoints, not GIDs, and the consumer reader resolves each codepoint back to a GID through the embedded /CIDToGIDMap. Substitute GIDs that the font's cmap maps from a natural Unicode codepoint (Arabic Presentation Forms, the Latin standard ligatures U+FB00-U+FB06) already work with that pipeline.

Font-specific substitutes that land on font-internal GIDs, however, have no codepoint at all in the font's cmap. Examples are most Devanagari cluster shapes, stylistic alternates the font designer ships only as numbered GIDs, CJK ideographic variation sequences (IVS), and discretionary ligatures with no corresponding Presentation Form. Before v2.119.68 these GIDs were unreachable through the producer-side hex pipeline; v2.119.68 closes that gap by letting callers allocate a synthetic codepoint in the Private Use Area for any GID.
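To make "emits codepoints, not GIDs" concrete, the sketch below formats a PDF hex string operand from a list of 2-byte codes. It is an illustration of the PDF hex string syntax only; HexOperand is a hypothetical helper, not a HotPDF routine, and the codes used are sample values.

```pascal
program HexOperandSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: writes each 2-byte code as four hex digits
// inside the <...> delimiters of a PDF hex string.
function HexOperand(const Codes: array of Word): string;
var
  I: Integer;
begin
  Result := '<';
  for I := Low(Codes) to High(Codes) do
    Result := Result + IntToHex(Integer(Codes[I]), 4);
  Result := Result + '>';
end;

begin
  // A natural codepoint (U+0915) followed by a synthetic PUA codepoint.
  WriteLn(HexOperand([$0915, $E000]) + ' Tj');   // <0915E000> Tj
end.
```

A GID that has no codepoint cannot appear in such an operand at all, which is the gap the synthetic codepoint fills.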

AssignSyntheticCodepointForGID semantics

Allocates the next available PUA codepoint (starting at U+E000) for the supplied GID and mirrors the assignment into every cache that the producer-side hex pipeline and the consumer-reader resolution chain depend on:

1. FUnicodeCpToGid[SyntheticCP] := GID - so the producer-side hex pipeline emits SyntheticCP into the text-showing operator and the consumer reader resolves SyntheticCP back to GID through /CIDToGIDMap at render time.

2. FAcroFormUnicodeAdvances[SyntheticCP] := em-fraction - so the v2.65 word-wrap calculator finds the correct hmtx advance for the synthetic codepoint when it appears in AcroForm text-field content.

3. FUnicodeSyntheticCpForGID[GID] := SyntheticCP - the per-GID reverse-lookup table used by GetSyntheticCodepointForGID to make repeat AssignSyntheticCodepointForGID calls idempotent (the second call with the same GID returns the already-allocated SyntheticCP).

Returns True on success with SyntheticCP set to the allocated codepoint. Returns False with SyntheticCP set to 0 under any of these defensive conditions: no font is registered (RegisterUnicodeTTF was never called, or was called with empty arguments to reset state); the GID is invalid (zero or beyond the cmap's glyph count); the PUA range is exhausted (all 6400 slots U+E000-U+F8FF are allocated); or the cache is uninitialised on entry.
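The three mirrored tables and the defensive checks described above can be modelled in a few lines. The following is a minimal, self-contained sketch and not HotPDF's implementation: TPuaModel, ResetModel and ModelAssign are hypothetical names, the advance is passed in by the caller rather than read from hmtx, and the cursor is assumed to hold the next free codepoint.

```pascal
program SyntheticPuaModel;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  PUA_FIRST = $E000;
  PUA_LAST  = $F8FF;

type
  // Hypothetical stand-in for the per-font state; not HotPDF's real layout.
  TPuaModel = record
    FontRegistered: Boolean;
    GlyphCount: Integer;
    NextCp: Integer;                     // 0 = nothing allocated yet (assumption)
    CpToGid: array[0..$FFFF] of Word;    // models FUnicodeCpToGid
    Advance: array[0..$FFFF] of Double;  // models FAcroFormUnicodeAdvances
    GidToCp: array[0..$FFFF] of Word;    // models FUnicodeSyntheticCpForGID
  end;

// Clears every table and the cursor, like a fresh font registration.
procedure ResetModel(var M: TPuaModel; AGlyphCount: Integer);
begin
  FillChar(M, SizeOf(M), 0);
  M.FontRegistered := AGlyphCount > 0;
  M.GlyphCount := AGlyphCount;
end;

function ModelAssign(var M: TPuaModel; GID: Word; AdvanceEm: Double;
  out SyntheticCP: Word): Boolean;
begin
  Result := False;
  SyntheticCP := 0;
  if not M.FontRegistered then Exit;                  // no font registered
  if (GID = 0) or (GID >= M.GlyphCount) then Exit;    // invalid GID
  if M.GidToCp[GID] <> 0 then
  begin
    SyntheticCP := M.GidToCp[GID];                    // repeat call: reuse the slot
    Result := True;
    Exit;
  end;
  if M.NextCp = 0 then M.NextCp := PUA_FIRST;         // lazy cursor start
  if M.NextCp > PUA_LAST then Exit;                   // all 6400 slots taken
  SyntheticCP := Word(M.NextCp);
  Inc(M.NextCp);
  M.CpToGid[SyntheticCP] := GID;                      // table 1
  M.Advance[SyntheticCP] := AdvanceEm;                // table 2
  M.GidToCp[GID] := SyntheticCP;                      // table 3
  Result := True;
end;

var
  Model: TPuaModel;
  CP1, CP2, CP3, CP4: Word;
begin
  ResetModel(Model, 1000);   // pretend a 1000-glyph font is registered
  ModelAssign(Model, 150, 0.62, CP1);
  ModelAssign(Model, 151, 0.58, CP2);
  ModelAssign(Model, 150, 0.62, CP3);
  WriteLn(IntToHex(Integer(CP1), 4), ' ', IntToHex(Integer(CP2), 4), ' ',
    IntToHex(Integer(CP3), 4));                  // E000 E001 E000
  WriteLn(ModelAssign(Model, 0, 0.0, CP4));      // FALSE
end.
```

Keeping the reverse table (GID to codepoint) separate from the forward table is what makes the repeat call cheap: the allocator never has to scan the PUA range to find an earlier assignment.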

GetSyntheticCodepointForGID semantics

Read-only query of any existing assignment. Returns the synthetic codepoint allocated for GID if AssignSyntheticCodepointForGID(GID, ...) has been called previously; otherwise returns 0. Because 0 is not a valid PUA codepoint, it doubles as a "no assignment" sentinel. Does not allocate, and is safe to call before any AssignSyntheticCodepointForGID call has run.
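The sentinel works because 0 lies outside the PUA range. A small sketch of that range check follows; IsSyntheticRange is a hypothetical helper written for illustration, not part of the THotPDF API.

```pascal
program PuaSentinel;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

const
  PUA_FIRST = $E000;
  PUA_LAST  = $F8FF;
  PUA_SLOTS = PUA_LAST - PUA_FIRST + 1;

// Hypothetical helper: True only for codepoints the allocator could return.
function IsSyntheticRange(CP: Word): Boolean;
begin
  Result := (CP >= PUA_FIRST) and (CP <= PUA_LAST);
end;

begin
  WriteLn(PUA_SLOTS);                 // 6400
  WriteLn(IsSyntheticRange(0));       // FALSE: the "no assignment" sentinel
  WriteLn(IsSyntheticRange($E000));   // TRUE: first allocatable codepoint
end.
```

In caller code the same idea reduces to testing the result of GetSyntheticCodepointForGID against 0 before using it.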

Allocator state lifecycle

FUnicodeSyntheticCpForGID and the next-available-PUA cursor (FUnicodeNextSyntheticCp) are allocated lazily on the first AssignSyntheticCodepointForGID call. The cursor starts at 0 (uninitialised) and moves to $E000 on the first allocation; subsequent allocations advance it through $E001, $E002, ..., $F8FF. Both fields are reset to empty / 0 on every RegisterUnicodeTTF('', nil) call, together with the rest of the per-font subset state, so callers that reuse a THotPDF instance across multiple documents start each document with a fresh PUA cursor.

Typical workflow (Devanagari cluster shape)

// Assumes BaseGID, ClusterGID, SyntheticCP: Word and X, Y are declared by the caller
PDF.RegisterUnicodeTTF('NotoDeva', 'NotoSansDevanagari-Regular.ttf');
PDF.ShapingFeatures := [sfIndicShaping];
PDF.SetGSUBScript('deva');

// Get a font-internal cluster GID through the GSUB engine
ClusterGID := PDF.GetSingleSubstituteGlyph(BaseGID, 'nukt');
if ClusterGID <> BaseGID then
begin
  // The cmap usually does not reach Indic cluster substitutes, since
  // cluster GIDs are font-internal, so allocate a synthetic codepoint
  // that the producer-side hex pipeline can emit
  if PDF.AssignSyntheticCodepointForGID(ClusterGID, SyntheticCP) then
  begin
    // SyntheticCP is now in the U+E000-F8FF range; emit it
    // through UnicodeTextOut just like a normal codepoint
    PDF.CurrentPage.UnicodeTextOut(X, Y, 0, UnicodeChar(SyntheticCP));
    PDF.MarkUnicodeGlyphUsed(ClusterGID);
  end;
end;

Idempotency example

PDF.AssignSyntheticCodepointForGID(150, CP1);   // CP1 = $E000
PDF.AssignSyntheticCodepointForGID(151, CP2);   // CP2 = $E001
PDF.AssignSyntheticCodepointForGID(150, CP3);   // CP3 = $E000 (idempotent)
CP4 := PDF.GetSyntheticCodepointForGID(150);    // CP4 = $E000
CP5 := PDF.GetSyntheticCodepointForGID(999);    // CP5 = 0 (no assignment)

Consumer-reader behaviour

The consumer reader sees the PUA codepoint in the text-showing operator and resolves it through the document-embedded /CIDToGIDMap to the target GID, then renders that GID using the embedded font program. From the reader's perspective there is no difference between a "natural" Unicode codepoint that the cmap routes to GID and a PUA synthetic codepoint that /CIDToGIDMap routes to GID - both produce the same rendered glyph.
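The resolution step a reader performs can be sketched as follows. In a PDF /CIDToGIDMap stream the glyph index for CID c is stored in bytes 2c and 2c+1, high-order byte first. The names TByteMap, MapCid and ResolveCid are hypothetical and exist only for this illustration; GID 150 is a sample value.

```pascal
program CidToGidLookup;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TByteMap = array of Byte;   // stand-in for the decoded stream bytes

// Writes one CID -> GID entry, big-endian, at byte offset 2 * CID.
procedure MapCid(var Map: TByteMap; CID, GID: Word);
begin
  Map[2 * CID]     := Hi(GID);
  Map[2 * CID + 1] := Lo(GID);
end;

// Reads the entry back the way a reader would at render time.
function ResolveCid(const Map: TByteMap; CID: Word): Word;
begin
  Result := (Word(Map[2 * CID]) shl 8) or Map[2 * CID + 1];
end;

var
  Map: TByteMap;
begin
  SetLength(Map, 2 * 65536);        // zero-filled: every CID starts at GID 0
  MapCid(Map, $E000, 150);          // synthetic codepoint routed to GID 150
  WriteLn(ResolveCid(Map, $E000));  // 150
  WriteLn(ResolveCid(Map, $E001));  // 0
end.
```

Nothing in the lookup depends on whether the code is a natural or a synthetic codepoint, which is why the reader cannot tell the two apart.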

Copy / paste behaviour: PUA codepoints round-trip as themselves through copy / paste when the ToUnicode CMap declares them as identity mappings. Callers that want the source Unicode characters (the input run that produced the substitute) to round-trip instead have two options: register a reverse mapping with RegisterToUnicodeReverseMapping, or author ActualText marked-content sequence properties through BeginTaggedContent and emit the synthetic codepoints inside the bracketed content. HotPDF uses the same internal-CID pattern automatically for RegisterUnicodeTTF-backed AcroForm appearance streams that contain supplementary-plane Unicode characters.
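For orientation, a reverse mapping in a ToUnicode CMap is a bfchar entry that pairs the emitted code with the UTF-16BE text it should extract as. The sketch below only formats such an entry; BfCharEntry is a hypothetical helper, not what RegisterToUnicodeReverseMapping does internally, and the pairing of U+E000 with KA + NUKTA (U+0915 U+093C) is an assumed example.

```pascal
program BfCharSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: formats "<code> <utf16be-source>" for a bfchar section.
// Source is taken as UTF-16 code units, one 4-digit hex group per unit.
function BfCharEntry(Code: Word; const Source: UnicodeString): string;
var
  I: Integer;
begin
  Result := '<' + IntToHex(Integer(Code), 4) + '> <';
  for I := 1 to Length(Source) do
    Result := Result + IntToHex(Ord(Source[I]), 4);
  Result := Result + '>';
end;

var
  Src: UnicodeString;
begin
  SetLength(Src, 2);
  Src[1] := WideChar($0915);   // DEVANAGARI LETTER KA
  Src[2] := WideChar($093C);   // DEVANAGARI SIGN NUKTA
  WriteLn(BfCharEntry($E000, Src));    // <E000> <0915093C>

  // Identity mapping: the PUA codepoint extracts as itself.
  SetLength(Src, 1);
  Src[1] := WideChar($E000);
  WriteLn(BfCharEntry($E000, Src));    // <E000> <E000>
end.
```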

Phase 8 roadmap closure

v2.119.68 / Phase 8c.6 closes the Phase 8 GSUB engine roadmap. The following pieces now integrate into a single producer-side shaping surface that handles every kind of substitute glyph an OpenType font can produce:

1. Every LookupType 1-8 query API (Phase 1-6).

2. The Script / LangSys selection API (Phase 7).

3. The TTF subsetter closure entry point (Phase 9).

4. The static post-pass ligature folding (v2.119.32 / 58 / 60 / 62).

5. The opt-in automatic pipeline (v2.119.59).

6. Arabic rlig + Latin liga / clig + rclt automatic emission (Phase 8b / 8c.2 / 8b / GSUB 'rclt').

7. ToUnicode reverse-mapping (v2.119.61 / 62 / 65).

8. Advance query (v2.119.64).

9. Devanagari Indic reorder pre-pass (v2.119.67).

10. PUA synthetic codepoint GID-level emit (v2.119.68).