OpenType GSUB Substitution Engine
Glyph substitution capability surface (v2.119.43 - v2.119.50)
The OpenType GSUB (Glyph SUBstitution) engine inside HotPDF lets callers query, drive, and embed every kind of glyph substitution an OpenType font declares - ligatures, stylistic alternates, contextual variants, Arabic / Indic shaping forms, CJK alternate forms, and so on. Every OpenType GSUB LookupType 1 through 8 is implemented and exposed as a capability-only query surface; the caller drives text emission and decides which substitute glyph to write to the page content stream.
Public API
type
  TGSUBStringArray = array of AnsiString;

// LookupType 1 - Single Substitution (one glyph -> one glyph)
function GetSingleSubstituteGlyph(InputGID: Word; const FeatureTag: AnsiString): Word;

// LookupType 2 - Multiple Substitution (one glyph -> sequence of glyphs)
function GetMultipleSubstituteGlyphs(InputGID: Word; const FeatureTag: AnsiString; var OutGIDs: array of Word): Boolean;

// LookupType 3 - Alternate Substitution (one glyph -> one of N alternates)
function GetAlternateGlyphCount(InputGID: Word; const FeatureTag: AnsiString): Integer;
function GetAlternateGlyph(InputGID: Word; const FeatureTag: AnsiString; AlternateIndex: Integer): Word;

// LookupType 4 - Ligature Substitution (N glyphs -> one ligature)
function ApplyLigatureSubstitution(const InputGIDs: array of Word; StartIndex: Integer; const FeatureTag: AnsiString; out OutGID: Word; out ConsumedCount: Integer): Boolean;

// LookupType 5 + 6 - Contextual / Chained Contextual Substitution
function ApplyContextualSubst(const InputGIDs: array of Word; StartIndex: Integer; const FeatureTag: AnsiString; var OutGIDs: array of Word; out ConsumedLen: Integer): Boolean;

// LookupType 8 - Reverse Chained Contextual Single Substitution
function ApplyReverseChainedContextualSubst(const InputGIDs: array of Word; StartIndex: Integer; const FeatureTag: AnsiString; out OutGID: Word): Boolean;

// Script / LangSys selection (Phase 7)
procedure SetGSUBScript(const ScriptTag: AnsiString);
procedure SetGSUBLanguage(const LangTag: AnsiString);
function GetGSUBScripts: TGSUBStringArray;
function GetGSUBLanguages(const ScriptTag: AnsiString): TGSUBStringArray;
function GetGSUBFeatures(const ScriptTag, LangTag: AnsiString): TGSUBStringArray;

// TTF subsetter closure (Phase 9)
procedure MarkUnicodeGlyphUsed(GID: Word);
Description
The engine activates after
Defensive contract throughout: fonts without a GSUB table, non-4-byte feature tags, features the selected script / language does not advertise, GIDs no subtable covers, and LookupFlag-ignored input glyphs all return a safe no-op (False / OutGID = InputGID / empty OutGIDs / ConsumedCount = 1) so callers never see exceptions for routine "no substitution applies" cases.
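Because "no substitution applies" is reported through the return value rather than an exception, a caller-side scan loop needs no try / except around each query. The following is a minimal sketch under stated assumptions, not HotPDF's own code: EmitRunWithLigatures and EmitGlyph are hypothetical names, PDF is assumed to be a THotPDF instance with a Unicode TTF font already registered, and the extra check on ConsumedCount is a caller-side precaution rather than something the contract requires.

```pascal
// Hypothetical helper: EmitGlyph stands in for whatever caller code writes
// a GID into the page content stream. It is not part of the documented API.
procedure EmitRunWithLigatures(PDF: THotPDF; const Run: array of Word;
  const FeatureTag: AnsiString);
var
  I, ConsumedCount: Integer;
  LigGID: Word;
begin
  I := 0;
  while I <= High(Run) do
  begin
    if PDF.ApplyLigatureSubstitution(Run, I, FeatureTag, LigGID, ConsumedCount)
      and (ConsumedCount >= 1) then
    begin
      // A ligature matched at position I: emit it and skip its components.
      EmitGlyph(LigGID);
      PDF.MarkUnicodeGlyphUsed(LigGID);
      Inc(I, ConsumedCount);
    end
    else
    begin
      // Safe no-op case (no GSUB table, unknown feature, uncovered GID...):
      // emit the input glyph unchanged and advance by one.
      EmitGlyph(Run[I]);
      Inc(I);
    end;
  end;
end;
```

The loop always advances by at least one glyph, so it terminates even if a font or feature tag yields no substitutions at all.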
LookupType matrix
LookupType 1 (Single Substitution) - one glyph maps to one substitute. Canonical features:
LookupType 2 (Multiple Substitution) - one glyph splits into a sequence of substitute glyphs. Canonical user:
LookupType 3 (Alternate Substitution) - one glyph maps to one of N alternates. Canonical features:
LookupType 4 (Ligature Substitution) - N input glyphs fold into one ligature. Canonical features:
LookupType 5 (Contextual Substitution) + LookupType 6 (Chained Contextual Substitution) - matches an input glyph sequence and dispatches nested lookups at specific positions inside the match. All three Format variants (1 literal sequence, 2 ClassDef sequence, 3 Coverage sequence) are implemented; the SequenceLookupRecord dispatcher re-enters the LookupList and handles Single / Multiple / Alternate (first) / Ligature nested lookups with live MatchPositions tracking. Canonical features:
LookupType 7 (Extension Substitution) - pure indirection layer the OpenType spec defines for fonts whose substitution subtable lives beyond the 16-bit reach of the LookupList. Every public API transparently follows the 32-bit Offset32 indirection to the real LookupType 1 / 2 / 3 / 4 / 5 / 6 / 8 subtable. Unblocks heavy CJK / Indic fonts (Noto Sans CJK, Noto Sans Devanagari) whose GSUB exceeds 64 KB. No separate API - the unwrap is automatic.
LookupType 8 (Reverse Chained Contextual Single Substitution) - context-aware 1:1 substitution whose distinguishing feature is that callers must apply it in REVERSE scan order over a multi-glyph run (end -> start) because each substitute may depend on FUTURE lookahead context that must not have been substituted yet. Canonical use: Arabic / Syriac / N'Ko / Indic contextual alternates whose final form depends on the following glyph. Use
Script / LangSys selection
By default the engine prefers the
Strict-vs-fallback semantics: an unknown
LookupFlag honour and GDEF
Every query reads each Lookup table's LookupFlag (and the optional trailing markFilteringSet uint16 when
TTF subsetter closure (MarkUnicodeGlyphUsed)
HotPDF's v2.84.0 TTF subsetter derives its used-glyph set from
After emitting any GID returned by
Typical workflow (Latin small caps)
PDF.RegisterUnicodeTTF('myFont', 'C:\Windows\Fonts\arial.ttf');
PDF.SetGSUBScript('latn');
PDF.SetGSUBLanguage(''); // default LangSys
SmallCapGID := PDF.GetSingleSubstituteGlyph(InputGID, 'smcp');
if SmallCapGID <> InputGID then
begin
  // emit SmallCapGID into the page content stream...
  PDF.MarkUnicodeGlyphUsed(SmallCapGID); // pull into subset
end;
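Before querying a feature such as 'smcp', a caller may want to confirm that the font advertises it for the chosen script. The sketch below uses the documented GetGSUBFeatures enumeration; FontAdvertisesFeature is a hypothetical helper name, PDF is assumed to be a THotPDF instance, and passing '' as the language tag to mean the default LangSys is an assumption carried over from the SetGSUBLanguage('') call in the workflow above.

```pascal
// Hypothetical helper, not part of the documented API.
// Returns True when FeatureTag appears in the feature list the font
// advertises for ScriptTag.
// Assumption: '' as the language tag selects the default LangSys.
function FontAdvertisesFeature(PDF: THotPDF;
  const ScriptTag, FeatureTag: AnsiString): Boolean;
var
  Features: TGSUBStringArray;
  I: Integer;
begin
  Result := False;
  Features := PDF.GetGSUBFeatures(ScriptTag, '');
  for I := Low(Features) to High(Features) do
    if Features[I] = FeatureTag then
    begin
      Result := True;
      Exit;
    end;
end;
```

A call such as FontAdvertisesFeature(PDF, 'latn', 'smcp') could then gate the small caps lookup, although skipping the check is also safe because unadvertised features return a no-op.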
Typical workflow (Arabic LAM-ALEF ligature)
PDF.SetGSUBScript('arab');
Run := [LamGID, FathaGID, AlefGID]; // post-cmap GIDs
if PDF.ApplyLigatureSubstitution(Run, 0, 'rlig', LigGID, ConsumedCount) then
begin
  // emit LigGID + advance by ConsumedCount
  PDF.MarkUnicodeGlyphUsed(LigGID);
end;
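LookupType 8 differs from the workflows above in that the run has to be walked from its last glyph to its first. The following is a minimal sketch of such a reverse scan, with several assumptions: ApplyReverseChainedRun and TGIDArray are hypothetical names, PDF is assumed to be a THotPDF instance, the feature tag is left to the caller, and the input run is kept unmodified while results are collected in a separate array. Whether the engine instead expects already-substituted glyphs to be fed back into the input is not stated here, so that choice should be verified against the font and the engine's behaviour.

```pascal
type
  TGIDArray = array of Word; // hypothetical alias for a dynamic GID array

// Hypothetical helper, not part of the documented API.
// Walks Run from end to start and stores the substitute (or the original
// glyph when nothing applies) at the same index in Shaped.
procedure ApplyReverseChainedRun(PDF: THotPDF; const Run: array of Word;
  const FeatureTag: AnsiString; out Shaped: TGIDArray);
var
  I: Integer;
  OutGID: Word;
begin
  SetLength(Shaped, Length(Run));
  for I := High(Run) downto 0 do
  begin
    if PDF.ApplyReverseChainedContextualSubst(Run, I, FeatureTag, OutGID) then
    begin
      Shaped[I] := OutGID;
      PDF.MarkUnicodeGlyphUsed(OutGID); // keep the substitute in the subset
    end
    else
      Shaped[I] := Run[I]; // no-op: keep the input glyph
  end;
end;
```

Since the substitution is 1:1, Shaped always has the same length as Run and can be emitted in normal left-to-right storage order afterwards.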
Scope and limitations
The engine is a capability-only query surface: it answers "what would GSUB do here", but it does not run an automatic shaping pipeline (HarfBuzz-class layout, cluster-aware reordering for Indic, BiDi resolution, GPOS positioning, mark attachment). Callers are responsible for driving the scan loop, choosing which substitute / alternate to emit, calling
Producer-side Arabic / Persian / Urdu shaping (LAM-ALEF mandatory ligature + Arabic Presentation Forms-A) is implemented as a separate built-in pipeline that runs automatically during text emission - see Arabic / Persian / Urdu Shaping.
Version trace
v2.119.43 Single Substitution + Phase 1.
v2.119.44 Multiple + Alternate (Phase 2).
v2.119.45 Ligature (Phase 3).
v2.119.46 Extension + GDEF + LookupFlag honour (Phase 4).
v2.119.47 Contextual + Chained Contextual + SequenceLookupRecord dispatcher (Phase 5).
v2.119.48 Reverse Chained Contextual - LookupType 1-8 matrix closed (Phase 6).
v2.119.49 Script / LangSys selection API (Phase 7).
v2.119.50 TTF subsetter closure via
See also: Arabic / Persian / Urdu Shaping, CFF / OpenType Font Subsetting Functions, THotPDF.EnableFontSubsetting