The automatic shaping pipeline turns the OpenType GSUB engine from a capability-only query surface into a producer-side feature that runs automatically as text is emitted into PDF content streams. Callers enable individual GSUB features through a typed set (ShapingFeatures: THPDFShapingFeatures). HotPDF then runs the matching substitutions, marks each substitute glyph as used so it is kept in the embedded font subset, and emits the ToUnicode CMap reverse-mapping entries needed for accessibility (text extraction, copy / paste, screen readers).
Opt-in framework (v2.119.59 / Phase 8a)
A new enum and property control which automatic substitutions run during text emission:
type
  THPDFShapingFeature = (
    sfArabicGSUB,            // font-defined 'rlig' (Required Ligatures) for Arabic
    sfStandardLigatures,     // Latin 'liga' (ff / fi / fl / ffi / ffl / long-s t / st)
    sfContextualLigatures,   // Latin 'clig' (contextual ligatures)
    sfContextualAlternates,  // 'rclt' (Required Contextual Alternates)
    sfIndicShaping);         // Devanagari Repha + pre-base I-matra reorder
  THPDFShapingFeatures = set of THPDFShapingFeature;

property ShapingFeatures: THPDFShapingFeatures read ... write ...;
Default is [] (empty set), which preserves byte-identical output for callers who depend on the v2.119.32-58 static post-pass shaper. Setting one or more flags elevates the engine into automatic mode for the corresponding features.
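The set semantics can be modelled outside Delphi. The Python sketch below is not HotPDF code: ShapingFeature, FEATURE_TAGS, is_automatic and gsub_tags are hypothetical names that mirror THPDFShapingFeatures with enum.Flag and map each flag to the GSUB feature tag this section associates with it. sfIndicShaping has no tag because it is a reorder pre-pass rather than a GSUB lookup.

```python
from __future__ import annotations

from enum import Flag


class ShapingFeature(Flag):
    """Hypothetical Python mirror of the Delphi THPDFShapingFeature set."""
    NONE = 0
    ARABIC_GSUB = 1             # sfArabicGSUB
    STANDARD_LIGATURES = 2      # sfStandardLigatures
    CONTEXTUAL_LIGATURES = 4    # sfContextualLigatures
    CONTEXTUAL_ALTERNATES = 8   # sfContextualAlternates
    INDIC_SHAPING = 16          # sfIndicShaping (reorder pre-pass, no GSUB tag)


# GSUB feature tag driven by each flag, as described in this section.
FEATURE_TAGS = {
    ShapingFeature.ARABIC_GSUB: "rlig",
    ShapingFeature.STANDARD_LIGATURES: "liga",
    ShapingFeature.CONTEXTUAL_LIGATURES: "clig",
    ShapingFeature.CONTEXTUAL_ALTERNATES: "rclt",
}


def is_automatic(features: ShapingFeature) -> bool:
    """An empty set keeps the legacy static post-pass (byte-identical) output."""
    return features != ShapingFeature.NONE


def gsub_tags(features: ShapingFeature) -> list[str]:
    """GSUB feature tags that would be looked up for the given set."""
    return [tag for flag, tag in FEATURE_TAGS.items() if flag in features]
```

With ARABIC_GSUB | CONTEXTUAL_ALTERNATES, gsub_tags returns ["rlig", "rclt"]; the empty set reports non-automatic, matching the default described above.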
sfArabicGSUB - Phase 8c.2 (v2.119.63)
When sfArabicGSUB is set, font-defined rlig (Required Ligatures) substitutions are applied to Arabic text runs automatically. ApplyArabicGSUBRefinement walks the cmap to build a GID array, calls ApplyLigatureSubstitution with the rlig feature tag, maps substitute GIDs back through the reverse cmap (covering FB50-FDFF + FE70-FEFF Arabic Presentation Forms) to a Unicode codepoint, and calls MarkUnicodeGlyphUsed so the substitute glyph is kept in the embedded font subset. This covers font-specific ligatures beyond the four hard-coded Arabic ligature families (LAM-ALEF v2.119.32, YEH-HAMZA v2.119.58, Allah v2.119.60, Bismillah v2.119.62).
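The four steps can be traced with toy tables. In the Python sketch below, refine_run, apply_ligatures and the mark_used callback are hypothetical stand-ins for ApplyArabicGSUBRefinement, ApplyLigatureSubstitution and MarkUnicodeGlyphUsed. The ligature table is a plain dict rather than a parsed GSUB rlig lookup (coverage tables and lookup flags are ignored), and the LAM + ALEF pair is used only because its presentation form U+FEFB is easy to recognise.

```python
from __future__ import annotations

from typing import Callable

# Reverse-cmap ranges named above: Arabic Presentation Forms-A and -B.
ARABIC_PF_RANGES = [(0xFB50, 0xFDFF), (0xFE70, 0xFEFF)]


def in_ranges(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def apply_ligatures(cps, cmap, ligatures):
    """Greedy longest-match ligature pass over the GID array.

    Returns (gid, source_cp) pairs; source_cp is None for substitute glyphs.
    """
    gids = [cmap.get(cp, 0) for cp in cps]           # step 1: cmap walk (0 = .notdef)
    max_len = max((len(k) for k in ligatures), default=1)
    out, i = [], 0
    while i < len(gids):
        for n in range(min(max_len, len(gids) - i), 1, -1):
            lig = ligatures.get(tuple(gids[i:i + n]))
            if lig is not None:                      # step 2: ligature lookup hit
                out.append((lig, None))
                i += n
                break
        else:
            out.append((gids[i], cps[i]))            # no match: glyph passes through
            i += 1
    return out


def refine_run(text: str, cmap: dict[int, int], ligatures: dict[tuple[int, ...], int],
               mark_used: Callable[[int], None], ranges=ARABIC_PF_RANGES) -> list[int]:
    cps = [ord(ch) for ch in text]
    # Reverse cmap restricted to the presentation-form ranges.
    reverse = {gid: cp for cp, gid in cmap.items() if in_ranges(cp, ranges)}
    result, substitutes = [], []
    for gid, source_cp in apply_ligatures(cps, cmap, ligatures):
        if source_cp is not None:
            result.append(source_cp)
            continue
        cp = reverse.get(gid)                        # step 3: GID -> codepoint
        if cp is None:
            return cps                               # unreachable substitute: no-op
        result.append(cp)
        substitutes.append(cp)
    for cp in substitutes:
        mark_used(cp)                                # step 4: keep glyph in the subset
    return result
```

Unmatched glyphs keep their original codepoints, and an unreachable substitute leaves the run untouched, in line with the safe no-op behaviour described under Scope and limitations.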
Setting sfArabicGSUB implicitly bypasses the v2.85.0 static 4-position shaper for Arabic - callers who need the static shaper to keep handling codepoints outside what the font's GSUB declares should leave sfArabicGSUB off.
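For context on what sfArabicGSUB bypasses: a 4-position shaper chooses the isolated, initial, medial or final form of each letter from whether its neighbours join. The Python sketch below shows that textbook rule with a deliberately tiny, hypothetical joining table (five dual-joining and four right-joining letters; everything else, including spaces, is treated as non-joining, and transparent marks and ZWJ / ZWNJ are ignored). It is not the v2.85.0 HotPDF shaper; the form labels simply reuse the OpenType tag names init / medi / fina / isol.

```python
DUAL_JOINING = {0x0628, 0x0633, 0x0644, 0x0645, 0x0647}   # BEH, SEEN, LAM, MEEM, HEH
RIGHT_JOINING = {0x0627, 0x062F, 0x0631, 0x0648}          # ALEF, DAL, REH, WAW


def joins_backward(cp: int) -> bool:
    """Can this letter connect to the letter before it?"""
    return cp in DUAL_JOINING or cp in RIGHT_JOINING


def joins_forward(cp: int) -> bool:
    """Can this letter connect to the letter after it?"""
    return cp in DUAL_JOINING


def positional_forms(text: str):
    """Return 'init' / 'medi' / 'fina' / 'isol' per letter, None for non-joining chars."""
    cps = [ord(c) for c in text]
    forms = []
    for i, cp in enumerate(cps):
        if not joins_backward(cp):
            forms.append(None)
            continue
        prev_ok = i > 0 and joins_forward(cps[i - 1])
        next_ok = joins_forward(cp) and i + 1 < len(cps) and joins_backward(cps[i + 1])
        if prev_ok and next_ok:
            forms.append("medi")
        elif prev_ok:
            forms.append("fina")
        elif next_ok:
            forms.append("init")
        else:
            forms.append("isol")
    return forms
```

For the string used in the Arabic workflow (U+0628 U+0633 U+0645, space, U+0627 U+0644 U+0644 U+0647) this yields init, medi, fina, None, isol, init, medi, fina.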
sfStandardLigatures / sfContextualLigatures - Phase 8b (v2.119.65)
When sfStandardLigatures is set, Latin Standard Ligatures are folded automatically using the font's liga feature. ApplyLatinLigatureRefinement targets the Alphabetic Presentation Forms block (U+FB00-FB4F) - typically FB00 ff, FB01 fi, FB02 fl, FB03 ffi, FB04 ffl, FB05 long-s + t, FB06 st. sfContextualLigatures adds a second pass for the font's clig feature. Both passes use the same reverse-cmap mechanism as sfArabicGSUB and emit 7 new ToUnicode CMap reverse-mapping entries (FB00-FB06) so consumer-reader copy / paste resolves the ligature back to the source letters.
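The reverse-mapping entries can be sketched as ToUnicode bfchar lines (the `<code> <UTF-16BE hex>` syntax from the ToUnicode CMap section of the PDF specification). In this Python sketch the 2-byte glyph codes are invented for illustration, the function names are hypothetical, and the surrounding begincmap / beginbfchar / endbfchar boilerplate is omitted. Expanding one level of the Unicode compatibility decomposition yields exactly the letters listed above, including long s + t for U+FB05.

```python
from __future__ import annotations

import unicodedata


def ligature_source_letters(cp: int) -> str:
    """Expand a presentation-form ligature one level via its Unicode decomposition."""
    decomp = unicodedata.decomposition(chr(cp))      # e.g. '<compat> 0066 0069'
    parts = [p for p in decomp.split() if not p.startswith("<")]
    return "".join(chr(int(p, 16)) for p in parts) if parts else chr(cp)


def utf16be_hex(s: str) -> str:
    return s.encode("utf-16-be").hex().upper()


def tounicode_bfchar(code_to_cp: dict[int, int]) -> list[str]:
    """One bfchar line per ligature glyph: <glyph code> <UTF-16BE of source letters>."""
    return [f"<{code:04X}> <{utf16be_hex(ligature_source_letters(cp))}>"
            for code, cp in sorted(code_to_cp.items())]
```

For example, a glyph emitted as code 0102 for U+FB01 maps to `<0102> <00660069>`, so copy / paste yields "fi".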
sfContextualAlternates - GSUB 'rclt' (v2.119.66)
When sfContextualAlternates is set, the font's rclt (Required Contextual Alternates) feature is applied. ApplyArabicGSUBContextualRefinement uses the v2.119.47 ApplyContextualSubst entry point and handles variable-length N-to-M output (substitution is only committed when every replacement GID is reachable through the reverse cmap). Reverse cmap range is extended to FB00-FDFF + FE70-FEFF to cover Latin + Arabic + Hebrew Presentation Forms.
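The all-or-nothing commit rule can be expressed as a small check. In this Python sketch, commit_substitution and the dict-based reverse cmap are hypothetical; the output GIDs stand for whatever an N-to-M contextual rule produced, and the function returns None (leave the input run untouched) as soon as one GID cannot be mapped back into the extended ranges.

```python
# Extended reverse-cmap ranges from the text: FB00-FDFF (Alphabetic PF + Arabic PF-A)
# and FE70-FEFF (Arabic PF-B).
RCLT_RANGES = [(0xFB00, 0xFDFF), (0xFE70, 0xFEFF)]


def commit_substitution(output_gids, reverse_cmap, ranges=RCLT_RANGES):
    """Return codepoints for an N-to-M substitution, or None to skip it entirely."""
    cps = []
    for gid in output_gids:
        cp = reverse_cmap.get(gid)
        if cp is None or not any(lo <= cp <= hi for lo, hi in ranges):
            return None          # one unreachable GID vetoes the whole substitution
        cps.append(cp)
    return cps
```

Committing only complete results avoids emitting a half-substituted sequence whose unmapped glyphs could not be written as codepoints or reverse-mapped in ToUnicode.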
Typical uses of rclt: Arabic init / medi / fina / isol selection when the font drives positional shaping through GSUB instead of through Unicode Presentation Forms codepoints; certain Latin sequence-disambiguation rules; and Indic pres / blws / psts / half / pstf / cjct features when the font designer registers them under rclt.
sfIndicShaping - Phase 8e (v2.119.67)
When sfIndicShaping is set, the v2.119.55 Devanagari capability ApplyDevanagariReorder is promoted from a manual method to an automatic pre-pass applied inside the three BuildUnicode*FieldContent helpers. Devanagari runs get Repha (Ra + Halant at cluster start) moved to the post-base position, and pre-base I-matra (U+093F) moved before the cluster base consonant, so the consumer reader's GSUB engine picks up the syllable in the correct rendering order. Other Indic reorders (above-base / below-base matra, conjunct formation) remain in the font's GSUB.
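A simplified Python model of the two reorders for a single syllable with one base consonant: reorder_syllable is a hypothetical name, conjuncts, nukta and other matras are not handled, and the output is the visual order described above (pre-base I-matra first, then the base, then the Repha).

```python
RA = "\u0930"       # DEVANAGARI LETTER RA
HALANT = "\u094D"   # DEVANAGARI SIGN VIRAMA (halant)
I_MATRA = "\u093F"  # DEVANAGARI VOWEL SIGN I (pre-base matra)


def reorder_syllable(syllable: str) -> str:
    """Move a leading Repha after the base and a trailing I-matra before it."""
    chars = list(syllable)
    repha = []
    # Ra + Halant only forms a Repha when a base consonant follows it.
    if len(chars) >= 3 and chars[0] == RA and chars[1] == HALANT:
        repha, chars = chars[:2], chars[2:]
    matra = []
    if chars and chars[-1] == I_MATRA:
        matra, chars = [I_MATRA], chars[:-1]
    # Visual order: pre-base matra, base consonant, post-base Repha.
    return "".join(matra + chars + repha)
```

For example, U+0930 U+094D U+0915 U+093F (Ra, Halant, Ka, I-matra) becomes U+093F U+0915 U+0930 U+094D.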
Advance-query support (v2.119.64 / Phase 8c.5)
A companion API exposes the cached /W em-fraction so callers can compute word-wrap correctly when emitting GSUB-substituted glyphs:
function GetCodepointAdvance(CP: Cardinal): Single;
Returns the hmtx-derived advance width, as a fraction of the em, for the cmap-resolved glyph at CP. The same release also fixed CodeUnitAdvance to classify Arabic Presentation Forms (U+FB50-FDFF + U+FE70-FEFF) as NARROW instead of WIDE; earlier releases misclassified them in the heuristic fallback.
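A greedy word-wrap sketch shows how the em fraction is used. In this Python model the advance callback stands in for GetCodepointAdvance and wrap_words is a hypothetical helper; width in points is taken as advance × font size, which ignores character spacing, word spacing and horizontal scaling. The callback should be given the codepoints actually emitted (for example U+FB01 after liga folding), since a ligature's advance can differ from the sum of its source letters.

```python
from typing import Callable, List


def wrap_words(words: List[str], advance: Callable[[int], float],
               font_size: float, max_width: float) -> List[str]:
    """Greedy line breaking; advance(cp) returns an em fraction."""
    def width(s: str) -> float:
        return sum(advance(ord(ch)) for ch in s) * font_size   # em fraction -> points

    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if current and width(candidate) > max_width:
            lines.append(current)      # line is full: start a new one with this word
            current = word
        else:
            current = candidate        # fits (or an over-long word on an empty line)
    if current:
        lines.append(current)
    return lines
```

With a constant 0.5 em advance at 10 pt, each character is 5 pt wide, so ["aa", "bb", "cc"] wrapped at 25 pt gives ["aa bb", "cc"].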
Typical workflow (full Arabic auto-shaping)
PDF.RegisterUnicodeTTF('NotoArab', 'NotoSansArabic-Regular.ttf');
PDF.ShapingFeatures :=
  [sfArabicGSUB,             // font-defined 'rlig'
   sfContextualAlternates];  // 'rclt' (positional shaping in GSUB-driven fonts)
PDF.SetGSUBScript('arab');
PDF.BeginDoc;
PDF.CurrentPage.SetFont('NotoArab', [], 14);
PDF.CurrentPage.RtLTextOut(100, 700, 0,
  UnicodeString(#$0628#$0633#$0645#$0020#$0627#$0644#$0644#$0647));  // 'بسم الله' (bism Allah)
PDF.EndDoc;
Typical workflow (Latin standard ligatures + Devanagari reorder)
PDF.RegisterUnicodeTTF('NotoSans', 'NotoSans-Regular.ttf');
PDF.RegisterUnicodeTTF('NotoDeva', 'NotoSansDevanagari-Regular.ttf');
PDF.ShapingFeatures :=
  [sfStandardLigatures,      // FB00-FB06 Latin liga
   sfContextualLigatures,    // + clig
   sfIndicShaping];          // Devanagari Repha + I-matra reorder
// ...then BeginDoc, SetFont with 'NotoSans' or 'NotoDeva', emit text, EndDoc.
Phase 8 roadmap closure
Phase 8a (v2.119.59): opt-in framework + Arabic capability
Phase 8b (v2.119.65): Latin standard ligatures
Phase 8c.1 (v2.119.60): Allah ligature
Phase 8c.2 (v2.119.63): GID-level GSUB rlig
Phase 8c.3 (v2.119.61): ToUnicode reverse mapping
Phase 8c.4 (v2.119.62): Bismillah ligature
Phase 8c.5 (v2.119.64): advance query + heuristic fix
Phase 8c.6 (v2.119.68): PUA synthetic codepoint emit
Phase 8d: rolled into the 8c sub-phases
Phase 8e (v2.119.67): Devanagari auto-reorder
With v2.119.68 the Phase 8 capability matrix is closed; further refinements (additional Indic scripts, OpenType GPOS positioning, BiDi resolution) are tracked under separate roadmap items.
Scope and limitations
The opt-in pipeline is intentionally additive over the static post-pass shaper: existing callers see no behavior change unless they opt in, and the default [] set is the safe choice for byte-stable regression testing. Fonts that lack the requested GSUB feature tables produce safe no-op output (no substitution is applied and no exception is raised).
No OpenType GPOS positioning is applied - the pipeline is substitution-only. No BiDi (Bidirectional) algorithm - callers still order mixed-direction runs in visual order or use a separate BiDi library. No automatic Indic shaping for scripts beyond Devanagari yet; future revisions are tracked in dev-notes/GSUB-Engine-Roadmap.md.
See also: OpenType GSUB Substitution Engine, Arabic / Persian / Urdu Shaping Support, Syriac / Mongolian / Devanagari Shaping, THotPDF.AssignSyntheticCodepointForGID, CFF / OpenType Font Subsetting