THotPDF.AssignSyntheticCodepointForGID / GetSyntheticCodepointForGID
THotPDF PUA synthetic codepoint allocator (v2.119.68)
Allocates and queries Private Use Area (U+E000-U+F8FF) synthetic codepoints for OpenType GSUB substitute GIDs that have no natural Unicode codepoint reachable through the font's cmap. This closes the producer-side, GID-level emission gap left by the v2.119.43-66 GSUB query and refinement APIs.
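For a sense of scale, the size of the allocator's range follows directly from the bounds quoted above (the BMP Private Use Area):

$$\mathtt{F8FF}_{16} - \mathtt{E000}_{16} + 1 = 63743 - 57344 + 1 = 6400$$

Assuming each GID receives exactly one codepoint and repeat calls reuse it, a single allocator can serve at most 6400 distinct GIDs, while GIDs themselves are 16-bit values (the API takes `GID: Word`). This bound is derived from the range alone, not from documented THotPDF behavior.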
Delphi syntax:
  function AssignSyntheticCodepointForGID(GID: Word; out SyntheticCP: Word): Boolean;
  function GetSyntheticCodepointForGID(GID: Word): Word;
Why the API exists
The v2.119.32-67 producer-side automatic shaping pipeline (Arabic / Latin / Devanagari) requires every substitute GID returned by the GSUB engine to be reachable through a Unicode codepoint: the existing hex-encoded text pipeline emits codepoints, not GIDs, and the consumer reader resolves each codepoint back to a GID through the embedded
Font-specific substitutes that land on font-internal GIDs, however, have no codepoint at all in the font's cmap. Examples include most Devanagari cluster shapes, stylistic alternates that the font designer ships only as numbered GIDs, CJK ideographic variation sequences (IVS), and discretionary ligatures with no corresponding Presentation Form. Before v2.119.68 these GIDs were unreachable through the producer-side hex pipeline; v2.119.68 closes that gap by letting callers allocate a synthetic Private Use Area codepoint for any GID.
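The gap can be pictured as a failed reverse cmap lookup. The sketch below is a hypothetical helper, not part of THotPDF: it models the cmap as a `TDictionary<Cardinal, Word>` from codepoint to GID (an assumption for illustration) and reports whether any codepoint reaches a given GID. A GID with no hit is exactly the case the PUA allocator exists for. The GID numbers in the demo are made up.

```pascal
program CmapReachability;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Generics.Collections;

type
  // Hypothetical in-memory view of a font's cmap: Unicode codepoint -> GID.
  TCmap = TDictionary<Cardinal, Word>;

// Returns True with the lowest codepoint that maps to GID, or False (and
// CP = 0) when no cmap entry reaches GID, i.e. the GID needs a synthetic
// PUA codepoint before the codepoint-based pipeline can emit it.
function TryNaturalCodepointForGID(Cmap: TCmap; GID: Word;
  out CP: Cardinal): Boolean;
var
  Pair: TPair<Cardinal, Word>;
begin
  Result := False;
  CP := 0;
  // Several codepoints may share one glyph and dictionary enumeration
  // order is unspecified, so keep the lowest match for a stable answer.
  for Pair in Cmap do
    if (Pair.Value = GID) and ((not Result) or (Pair.Key < CP)) then
    begin
      CP := Pair.Key;
      Result := True;
    end;
end;

var
  Cmap: TCmap;
  CP: Cardinal;
begin
  Cmap := TCmap.Create;
  try
    Cmap.Add($0915, 40);  // DEVANAGARI LETTER KA -> GID 40 (made-up GID)
    Cmap.Add($093C, 41);  // DEVANAGARI SIGN NUKTA -> GID 41 (made-up GID)
    if TryNaturalCodepointForGID(Cmap, 40, CP) then
      WriteLn('GID 40 is reachable via U+' + IntToHex(CP, 4));
    // GID 150 stands in for a font-internal cluster glyph.
    if not TryNaturalCodepointForGID(Cmap, 150, CP) then
      WriteLn('GID 150 has no cmap entry: needs a synthetic PUA codepoint');
  finally
    Cmap.Free;
  end;
end.
```

A linear scan is fine for a one-off check; a writer doing this per glyph would more likely build the inverse map once.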
AssignSyntheticCodepointForGID semantics
Allocates the next available PUA codepoint (starting at U+E000) for the supplied GID and mirrors the assignment into every cache that the existing producer-side hex pipeline and the consumer-reader resolution chain depend on:
1. 2. 3.
Returns
GetSyntheticCodepointForGID semantics
Pure-functional query of any existing assignment. Returns the synthetic codepoint allocated for
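The allocation rules described here (sequential codepoints from U+E000, idempotent re-assignment, 0 for a GID with no assignment) can be sketched as a small class. This is a hypothetical stand-in, not THotPDF's implementation: `TPUAAllocator` and its method names are invented for illustration, the `FCPToGID` reverse map stands in for the caches the real method mirrors into, and returning False once U+F8FF is used up is an assumption about what the Boolean result signals.

```pascal
program PUAAllocatorSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Generics.Collections;

const
  PUA_FIRST = $E000;
  PUA_LAST  = $F8FF;  // BMP Private Use Area: 6400 codepoints in total

type
  // Hypothetical allocator; names and structure are illustrative only.
  TPUAAllocator = class
  private
    FNextCP: Cardinal;                  // next unused PUA codepoint
    FGIDToCP: TDictionary<Word, Word>;  // GID -> synthetic codepoint
    FCPToGID: TDictionary<Word, Word>;  // synthetic codepoint -> GID
  public
    constructor Create;
    destructor Destroy; override;
    function AssignForGID(GID: Word; out SyntheticCP: Word): Boolean;
    function GetForGID(GID: Word): Word;  // 0 = no assignment
    function GIDForCodepoint(CP: Word; out GID: Word): Boolean;
  end;

constructor TPUAAllocator.Create;
begin
  inherited Create;
  FNextCP := PUA_FIRST;
  FGIDToCP := TDictionary<Word, Word>.Create;
  FCPToGID := TDictionary<Word, Word>.Create;
end;

destructor TPUAAllocator.Destroy;
begin
  FCPToGID.Free;
  FGIDToCP.Free;
  inherited Destroy;
end;

function TPUAAllocator.AssignForGID(GID: Word; out SyntheticCP: Word): Boolean;
begin
  // Idempotent: a GID that already has a codepoint keeps it.
  if FGIDToCP.TryGetValue(GID, SyntheticCP) then
    Exit(True);
  SyntheticCP := 0;
  if FNextCP > PUA_LAST then
    Exit(False);  // range exhausted (assumed failure signal)
  SyntheticCP := Word(FNextCP);
  Inc(FNextCP);
  // Record both directions so a reader-side lookup can undo the mapping.
  FGIDToCP.Add(GID, SyntheticCP);
  FCPToGID.Add(SyntheticCP, GID);
  Result := True;
end;

function TPUAAllocator.GetForGID(GID: Word): Word;
begin
  // Query only: never allocates.
  if not FGIDToCP.TryGetValue(GID, Result) then
    Result := 0;
end;

function TPUAAllocator.GIDForCodepoint(CP: Word; out GID: Word): Boolean;
begin
  Result := FCPToGID.TryGetValue(CP, GID);
end;

var
  Alloc: TPUAAllocator;
  CP1, CP2, CP3: Word;
  G: Word;
begin
  Alloc := TPUAAllocator.Create;
  try
    Alloc.AssignForGID(150, CP1);  // CP1 = $E000
    Alloc.AssignForGID(151, CP2);  // CP2 = $E001
    Alloc.AssignForGID(150, CP3);  // CP3 = $E000 again (idempotent)
    WriteLn(IntToHex(Alloc.GetForGID(150), 4));  // prints E000
    WriteLn(Alloc.GetForGID(999));               // prints 0
    if Alloc.GIDForCodepoint($E001, G) then
      WriteLn(G);                                // prints 151
  finally
    Alloc.Free;
  end;
end.
```

The demo reproduces the values of the idempotency example on this page. Here the allocator state lives exactly as long as the object; how THotPDF scopes its own state is not covered by this sketch.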
Allocator state lifecycle
Typical workflow (Devanagari cluster shape)
PDF.RegisterUnicodeTTF('NotoDeva', 'NotoSansDevanagari-Regular.ttf');
PDF.ShapingFeatures := [sfIndicShaping];
PDF.SetGSUBScript('deva');

// Get a font-internal cluster GID through the GSUB engine
ClusterGID := PDF.GetSingleSubstituteGlyph(BaseGID, 'nukt');
if ClusterGID <> BaseGID then
begin
  // Check whether the cmap reaches the substitute (usually it does not
  // for Indic clusters, since cluster GIDs are font-internal).
  // Allocate a synthetic codepoint that the producer-side
  // hex pipeline can emit.
  if PDF.AssignSyntheticCodepointForGID(ClusterGID, SyntheticCP) then
  begin
    // SyntheticCP is now in the U+E000-F8FF range; emit it
    // through UnicodeTextOut just like a normal codepoint
    PDF.CurrentPage.UnicodeTextOut(X, Y, 0, UnicodeChar(SyntheticCP));
    PDF.MarkUnicodeGlyphUsed(ClusterGID);
  end;
end;
Idempotency example
PDF.AssignSyntheticCodepointForGID(150, CP1);  // CP1 = $E000
PDF.AssignSyntheticCodepointForGID(151, CP2);  // CP2 = $E001
PDF.AssignSyntheticCodepointForGID(150, CP3);  // CP3 = $E000 (idempotent)
CP4 := PDF.GetSyntheticCodepointForGID(150);   // CP4 = $E000
CP5 := PDF.GetSyntheticCodepointForGID(999);   // CP5 = 0 (no assignment)
Consumer-reader behavior
The consumer reader sees the PUA codepoint in the text-showing operator and resolves it through the document-embedded
Copy / paste behavior: PUA codepoints round-trip as themselves through copy / paste when the ToUnicode CMap declares them as identity mappings. Callers that want the source Unicode characters (the input run that produced the substitute) to round-trip instead can register a reverse mapping with
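Copy / paste text comes from the ToUnicode CMap, whose `bfchar` entries map a source character code to a destination string written as hex UTF-16BE. The sketch below formats such entries for both options just described. It is an illustration under stated assumptions, not THotPDF's CMap writer: it assumes the source code for a synthetic glyph is the 2-byte PUA codepoint itself, `UTF16BEHex` and `BfCharEntry` are hypothetical helper names, and the reverse-mapping demo uses KA + NUKTA (U+0915 U+093C) as an example source run.

```pascal
program ToUnicodeBfChar;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hex-encodes a UTF-16 string one code unit at a time, most significant
// byte first, which is the UTF-16BE form used for bfchar destinations.
// Surrogate pairs come out as two 4-digit units, as required.
function UTF16BEHex(const S: UnicodeString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    Result := Result + IntToHex(Ord(S[I]), 4);
end;

// Formats one "<src> <dst>" bfchar line. Code is assumed (hypothetically)
// to be the 2-byte PUA codepoint written by the hex text pipeline.
function BfCharEntry(Code: Word; const Dest: UnicodeString): string;
begin
  Result := '<' + IntToHex(Code, 4) + '> <' + UTF16BEHex(Dest) + '>';
end;

begin
  // Identity mapping: copy / paste yields the PUA codepoint itself.
  WriteLn(BfCharEntry($E000, #$E000));        // <E000> <E000>
  // Reverse mapping: copy / paste yields the source run KA + NUKTA.
  WriteLn(BfCharEntry($E000, #$0915#$093C));  // <E000> <0915093C>
end.
```

A complete CMap groups such lines into `beginbfchar` / `endbfchar` blocks of at most 100 entries each; only one destination per source code can be active, so a caller chooses identity or reverse mapping per synthetic codepoint.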
Phase 8 roadmap closure
v2.119.68 / Phase 8c.6 closes the Phase 8 GSUB engine roadmap: every LookupType 1-8 query API (Phases 1-6), the Script / LangSys selection API (Phase 7), the TTF subsetter closure entry point (Phase 9), the static post-pass ligature folding (v2.119.32 / 58 / 60 / 62), the opt-in automatic pipeline (v2.119.59), Arabic
See also: OpenType GSUB Substitution Engine, Automatic Shaping Pipeline (Phase 8), Arabic / Persian / Urdu Shaping Support, Syriac / Mongolian / Devanagari Shaping, THPDFPage.BeginTaggedContent