Syriac / Mongolian / Devanagari Shaping Support

Multi-script capability surfaces (v2.119.53 - v2.119.55)

HotPDF exposes shaping capability surfaces for three complex scripts beyond Arabic: joining-class and positional analysis for Syriac (U+0700-U+074F) and Mongolian (U+1800-U+18AF), and Indic syllabic-category analysis for Devanagari (U+0900-U+097F). Each capability lets callers drive the script through the existing OpenType GSUB engine for correct shaping, while the heavy lifting (cluster boundary detection, BiDi resolution, GPOS positioning) stays outside HotPDF's scope.

Syriac shaping capability (v2.119.53)

Two new methods expose Syriac's joining behaviour:

function GetSyriacJoiningClass(CP: Cardinal): TJoiningClass;
function GetSyriacPosition(const Run: array of Cardinal; Index: Integer): TPosition;

Syriac uses the same four-class joining framework as Arabic (Right-Joining, Dual-Joining, Transparent, Non-Joining). GetSyriacJoiningClass classifies each codepoint in the U+0700-U+074F block according to its Unicode joining property. GetSyriacPosition walks a Syriac run and resolves each character to its isolated, initial, medial or final position from the joining classes of its nearest non-transparent neighbours.
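
The neighbour walk can be sketched in self-contained Object Pascal. This is not HotPDF's implementation: TJoinClass, TJoinPos, SyriacClass and SyriacPos are hypothetical stand-ins for TJoiningClass, TPosition, GetSyriacJoiningClass and GetSyriacPosition. The table is a partial transcription of the Syriac entries in Unicode's ArabicShaping.txt (letters U+0710-U+072C plus combining marks), Transparent marks are skipped when looking for neighbours, and Alaph's special final forms and ZWJ/ZWNJ join control are not modelled.

```pascal
program SyriacPositionSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  // Hypothetical stand-ins for HotPDF's TJoiningClass / TPosition.
  TJoinClass = (jcU, jcR, jcD, jcT);  // Non-, Right-, Dual-joining, Transparent
  TJoinPos = (jpNone, jpIsol, jpInit, jpMedi, jpFina);

const
  PosNames: array[TJoinPos] of string = ('none', 'isol', 'init', 'medi', 'fina');

// Partial Syriac table (assumption: transcribed from ArabicShaping.txt);
// anything not listed is treated as Non-joining.
function SyriacClass(CP: Cardinal): TJoinClass;
begin
  case CP of
    $0711, $0730..$074A:
      Result := jcT;  // superscript Alaph, vowel and diacritic marks
    $0710, $0715..$0719, $071E, $0728, $072A, $072C:
      Result := jcR;  // Alaph, Dalath, Dotless Dalath Rish, He, Waw, Zain, Yudh He, Sadhe, Rish, Taw
    $0712..$0714, $071A..$071D, $071F..$0727, $0729, $072B:
      Result := jcD;
  else
    Result := jcU;
  end;
end;

// Class of the nearest non-transparent neighbour (Step = -1 or +1); jcU at run edges.
function NeighbourClass(const Run: array of Cardinal; Index, Step: Integer): TJoinClass;
var
  J: Integer;
begin
  J := Index + Step;
  while (J >= 0) and (J <= High(Run)) do
  begin
    Result := SyriacClass(Run[J]);
    if Result <> jcT then
      Exit;
    Inc(J, Step);
  end;
  Result := jcU;
end;

function SyriacPos(const Run: array of Cardinal; Index: Integer): TJoinPos;
var
  Cur: TJoinClass;
  JoinsPrev, JoinsNext: Boolean;
begin
  Cur := SyriacClass(Run[Index]);
  if Cur = jcT then
  begin
    Result := jpNone;  // marks take no positional form of their own
    Exit;
  end;
  // The logical predecessor sits to the right in RTL text: Cur links to it when
  // Cur joins on its right side (R or D) and the predecessor joins on its left (D).
  JoinsPrev := (Cur in [jcR, jcD]) and (NeighbourClass(Run, Index, -1) = jcD);
  // Cur links to its successor only if Cur itself joins on its left side (D).
  JoinsNext := (Cur = jcD) and (NeighbourClass(Run, Index, +1) in [jcR, jcD]);
  if JoinsPrev and JoinsNext then
    Result := jpMedi
  else if JoinsPrev then
    Result := jpFina
  else if JoinsNext then
    Result := jpInit
  else
    Result := jpIsol;
end;

var
  Run: array of Cardinal;
  I: Integer;
begin
  // BETH, PTHAHA ABOVE (vowel mark), YUDH, TAW
  SetLength(Run, 4);
  Run[0] := $0712; Run[1] := $0730; Run[2] := $071D; Run[3] := $072C;
  for I := 0 to High(Run) do
    WriteLn('U+', IntToHex(Integer(Run[I]), 4), ' ', PosNames[SyriacPos(Run, I)]);
end.
```

Run as-is, the demo labels BETH, the mark, YUDH and TAW as init, none, medi and fina: the vowel mark is skipped during the walk, so it does not break the BETH-YUDH join. Right-joining letters such as TAW can only ever resolve to isol or fina, because they never join on their left side.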

Unlike Arabic, the Syriac block has no Presentation Forms pre-encoded into Unicode - there is no Syriac equivalent of the U+FB50-FDFF / U+FE70-FEFC Arabic Presentation Forms blocks. Consumers must therefore drive Syriac shaping through font-defined GSUB lookups (typically init / medi / fina / isol + rlig) rather than codepoint rewriting. The capability layer above gives callers the position labels they need to query the right GSUB feature for each glyph.
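
Once a position label is known, the caller still has to pick the matching GSUB feature. A minimal sketch of that mapping, assuming the four standard OpenType positional features (isol, init, medi, fina); TJoinPos, FeatureTags and MakeTag are illustrative names, not HotPDF API. MakeTag packs the four characters the way OpenType stores a Tag in font tables (four ASCII bytes, first character in the most significant byte), which is the value a raw feature-list lookup compares against. The OpenType Syriac specification also defines extra Alaph features (fin2, fin3, med2) that this sketch leaves out.

```pascal
program SyriacFeatureTagSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

uses
  SysUtils;

type
  // Hypothetical stand-in for HotPDF's TPosition (transparent marks excluded).
  TJoinPos = (jpIsol, jpInit, jpMedi, jpFina);

const
  // Standard OpenType positional features for Arabic-style joining scripts.
  FeatureTags: array[TJoinPos] of string = ('isol', 'init', 'medi', 'fina');

// Packs a 4-character tag as OpenType stores it: four ASCII bytes,
// first character in the most significant byte.
function MakeTag(const S: string): Cardinal;
var
  I: Integer;
begin
  Assert(Length(S) = 4, 'OpenType tags are exactly four characters');
  Result := 0;
  for I := 1 to 4 do
    Result := (Result shl 8) or Cardinal(Ord(S[I]));
end;

var
  P: TJoinPos;
begin
  for P := Low(TJoinPos) to High(TJoinPos) do
    WriteLn(FeatureTags[P], ' -> $', IntToHex(Integer(MakeTag(FeatureTags[P])), 8));
end.
```

For example, 'init' packs to $696E6974 and 'fina' to $66696E61; rlig is not positional and would be applied across the whole run rather than selected per glyph.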

Mongolian shaping capability (v2.119.54)

Two parallel methods for Mongolian:

function GetMongolianJoiningClass(CP: Cardinal): TJoiningClass;
function GetMongolianPosition(const Run: array of Cardinal; Index: Integer): TPosition;

Coverage includes the basic Mongolian letters (U+1820-U+1842), the Todo, Sibe and Manchu extensions (U+1843-U+1877), and the Ali Gali extensions (U+1880-U+18A8). The free variation selectors FVS1 / FVS2 / FVS3 (U+180B-U+180D), NIRUGU (U+180A) and the Ali Gali vowel marks are all classified as Transparent (T-class), so they participate in joining without breaking the walk.
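
The same walk applies to Mongolian. The sketch below shows the key consequence of T-class classification: a variation selector sitting between two letters is skipped, so both letters stay joined. Assumptions: the classifier follows the classification described above (U+180A-U+180D Transparent), treats every letter in U+1820-U+1877 and U+1887-U+18A8 as dual-joining, leaves out the Ali Gali marks, and uses hypothetical names (MongolianClass, MongolianPos) rather than HotPDF's methods.

```pascal
program MongolianPositionSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical stand-ins; Mongolian letters are treated as dual-joining only.
  TJoinClass = (jcU, jcD, jcT);
  TJoinPos = (jpNone, jpIsol, jpInit, jpMedi, jpFina);

const
  PosNames: array[TJoinPos] of string = ('none', 'isol', 'init', 'medi', 'fina');

function MongolianClass(CP: Cardinal): TJoinClass;
begin
  case CP of
    $180A..$180D:
      Result := jcT;  // NIRUGU + FVS1-FVS3, Transparent as described above
    $1820..$1877, $1887..$18A8:
      Result := jcD;  // letters (simplified: Ali Gali marks not modelled)
  else
    Result := jcU;
  end;
end;

// Class of the nearest non-transparent neighbour (Step = -1 or +1); jcU at run edges.
function NeighbourClass(const Run: array of Cardinal; Index, Step: Integer): TJoinClass;
var
  J: Integer;
begin
  J := Index + Step;
  while (J >= 0) and (J <= High(Run)) do
  begin
    Result := MongolianClass(Run[J]);
    if Result <> jcT then
      Exit;  // transparent code points are skipped, not treated as breaks
    Inc(J, Step);
  end;
  Result := jcU;
end;

function MongolianPos(const Run: array of Cardinal; Index: Integer): TJoinPos;
var
  JoinsPrev, JoinsNext: Boolean;
begin
  case MongolianClass(Run[Index]) of
    jcT: Result := jpNone;
    jcU: Result := jpIsol;
  else
    JoinsPrev := NeighbourClass(Run, Index, -1) = jcD;
    JoinsNext := NeighbourClass(Run, Index, +1) = jcD;
    if JoinsPrev and JoinsNext then Result := jpMedi
    else if JoinsPrev then Result := jpFina
    else if JoinsNext then Result := jpInit
    else Result := jpIsol;
  end;
end;

var
  Run: array of Cardinal;
  I: Integer;
begin
  // MONGOLIAN LETTER A, FVS1, NA, E
  SetLength(Run, 4);
  Run[0] := $1820; Run[1] := $180B; Run[2] := $1828; Run[3] := $1821;
  for I := 0 to High(Run) do
    WriteLn(I, ': ', PosNames[MongolianPos(Run, I)]);
end.
```

For A, FVS1, NA, E the demo labels the code points init, none, medi, fina. Which glyph variant FVS1 actually selects is still decided by the font's GSUB lookups, not by this walk.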

Like Syriac, Mongolian has no Presentation Forms pre-encoded in Unicode. Mongolian's traditional vertical layout, complex letter-shape variation rules, and FVS-driven shape selection are all expected to be driven by font-defined GSUB lookups (typically init / medi / fina / isol + ccmp / rlig / locl); the capability layer gives callers the position labels needed to query those features.

Devanagari Indic shaping capability (v2.119.55)

Devanagari is fundamentally different from Arabic / Syriac / Mongolian - it is an Indic abugida script where the meaningful unit is a syllable cluster (akshara), not a single letter, and the rendered order of glyphs inside a cluster is often different from the logical Unicode order. Two methods expose Devanagari's Indic capability layer:

function GetDevanagariCategory(CP: Cardinal): TIndicCategory;
procedure ApplyDevanagariReorder(var Run: array of Cardinal);

GetDevanagariCategory classifies each codepoint in U+0900-U+097F into one of 13 Indic syllabic categories (Base / Consonant / Vowel / Independent Vowel / Vowel Mark / Pre-base Matra / Above-base Matra / Below-base Matra / Halant (Virama) / Repha / Anusvara / Visarga / Other), so callers can detect syllable boundaries and identify which positions inside each cluster need reordering.
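
A minimal sketch of per-codepoint classification and the syllable-boundary test it enables. TIndicCat is a hypothetical subset of the 13-value TIndicCategory (Base and Repha are omitted because they depend on context rather than on a single code point, and a Nukta value is added so nukta never opens a syllable), and StartsSyllable uses a deliberately simplified rule: a consonant opens a new syllable unless it follows a halant, in which case it extends the conjunct.

```pascal
program DevanagariCategorySketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical subset of HotPDF's TIndicCategory (not its real declaration).
  TIndicCat = (icOther, icConsonant, icIndependentVowel, icVowelMark,
               icPreBaseMatra, icAboveBaseMatra, icBelowBaseMatra,
               icHalant, icNukta, icAnusvara, icVisarga);

function DevaCategory(CP: Cardinal): TIndicCat;
begin
  case CP of
    $0915..$0939, $0958..$095F: Result := icConsonant;
    $0904..$0914, $0960, $0961: Result := icIndependentVowel;
    $093F:                      Result := icPreBaseMatra;    // VOWEL SIGN I
    $0945..$0948:               Result := icAboveBaseMatra;  // candra E, short E, E, AI
    $0941..$0944, $0962, $0963: Result := icBelowBaseMatra;  // U, UU, vocalic R, RR, L, LL
    $093E, $0940, $0949..$094C: Result := icVowelMark;       // post-base matras
    $094D:                      Result := icHalant;          // virama
    $093C:                      Result := icNukta;
    $0901, $0902:               Result := icAnusvara;        // candrabindu, anusvara
    $0903:                      Result := icVisarga;
  else
    Result := icOther;
  end;
end;

// Simplified boundary rule: a consonant opens a new syllable unless it follows
// a halant (it then extends the conjunct); independent vowels and "other"
// characters always open one; marks never do.
function StartsSyllable(const Run: array of Cardinal; Index: Integer): Boolean;
begin
  case DevaCategory(Run[Index]) of
    icConsonant:
      Result := (Index = 0) or (DevaCategory(Run[Index - 1]) <> icHalant);
    icIndependentVowel, icOther:
      Result := True;
  else
    Result := False;
  end;
end;

var
  Run: array of Cardinal;
  I: Integer;
begin
  // "Hindi": HA, VOWEL SIGN I, NA, VIRAMA, DA, VOWEL SIGN II
  SetLength(Run, 6);
  Run[0] := $0939; Run[1] := $093F; Run[2] := $0928;
  Run[3] := $094D; Run[4] := $0926; Run[5] := $0940;
  for I := 0 to High(Run) do
    if StartsSyllable(Run, I) then
      WriteLn('syllable starts at index ', I);
end.
```

On हिन्दी this finds syllable starts at indices 0 and 2, splitting it into हि + न्दी; DA at index 4 follows the virama and therefore stays inside the conjunct. Production segmenters also handle ZWJ/ZWNJ, stress signs and other edge cases that this rule ignores.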

ApplyDevanagariReorder is a pre-pass that walks the input run and applies the two main Devanagari reorder rules: (1) Repha (Ra + Halant at the start of a syllable cluster) is moved to the position after the cluster's base consonant, so it renders as a superscript hook; (2) the pre-base I-matra (U+093F DEVANAGARI VOWEL SIGN I) is moved before the cluster's base consonant, so it renders as a left-side hook. Other Indic shaping steps (above-base and below-base matra placement, conjunct formation) are left to GSUB processing of the font's own lookups, since they depend on font-specific tables.
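
The two rules can be sketched for a single syllable. Assumptions: the input is already exactly one syllable (segmentation is out of scope here), the base is taken to be the last consonant, and the I-matra is moved to the front of the consonant cluster, which for a single-consonant syllable such as हि is the same as "before the base". ReorderSyllable, ToHex and ReorderedHex are hypothetical helpers, not HotPDF's ApplyDevanagariReorder.

```pascal
program DevanagariReorderSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

const
  DEVA_RA     = $0930;
  DEVA_HALANT = $094D;
  DEVA_SIGN_I = $093F;

function IsDevaConsonant(CP: Cardinal): Boolean;
begin
  Result := ((CP >= $0915) and (CP <= $0939)) or ((CP >= $0958) and (CP <= $095F));
end;

// Reorders ONE already-segmented syllable in place; the length is unchanged.
procedure ReorderSyllable(var S: array of Cardinal);
var
  Tmp: array of Cardinal;
  HasReph: Boolean;
  Start, Base, Matra, I, N: Integer;

  procedure Put(CP: Cardinal);
  begin
    Tmp[N] := CP;
    Inc(N);
  end;

begin
  // Rule 1 applies to Ra + Halant followed by another consonant.
  HasReph := (Length(S) >= 3) and (S[0] = DEVA_RA) and (S[1] = DEVA_HALANT) and
             IsDevaConsonant(S[2]);
  if HasReph then Start := 2 else Start := 0;

  Base := -1;   // simplified base selection: the last consonant
  Matra := -1;  // index of the first VOWEL SIGN I, if any
  for I := Start to High(S) do
  begin
    if IsDevaConsonant(S[I]) then Base := I;
    if (S[I] = DEVA_SIGN_I) and (Matra < 0) then Matra := I;
  end;
  if Base < 0 then Exit;  // no consonant: nothing to reorder

  SetLength(Tmp, Length(S));
  N := 0;
  if Matra >= 0 then Put(S[Matra]);  // rule 2: I-matra leads the consonant cluster
  for I := Start to Base do
    if I <> Matra then Put(S[I]);
  if HasReph then
  begin
    Put(DEVA_RA);                    // rule 1: Ra + Halant right after the base
    Put(DEVA_HALANT);
  end;
  for I := Base + 1 to High(S) do
    if I <> Matra then Put(S[I]);

  for I := 0 to High(S) do
    S[I] := Tmp[I];
end;

function ToHex(const S: array of Cardinal): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(S) do
  begin
    if I > 0 then Result := Result + ' ';
    Result := Result + 'U+' + IntToHex(Integer(S[I]), 4);
  end;
end;

// Reorders a copy of S and returns it as 'U+XXXX U+XXXX ...'.
function ReorderedHex(const S: array of Cardinal): string;
var
  Buf: array of Cardinal;
  I: Integer;
begin
  SetLength(Buf, Length(S));
  for I := 0 to High(S) do
    Buf[I] := S[I];
  ReorderSyllable(Buf);
  Result := ToHex(Buf);
end;

begin
  // RA, HALANT, KA, VOWEL SIGN I in logical order
  WriteLn(ReorderedHex([$0930, $094D, $0915, $093F]));
end.
```

For र्कि (RA, HALANT, KA, VOWEL SIGN I) the demo prints U+093F U+0915 U+0930 U+094D: the matra leads and Ra + Halant follows the base, as the two rules above describe. A syllable with neither Repha nor I-matra, such as न्दी, passes through unchanged.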

Automatic integration (v2.119.67): when sfIndicShaping is in PDF.ShapingFeatures, ApplyDevanagariReorder runs automatically as a pre-pass inside the three BuildUnicode*FieldContent helpers. The consumer reader's GSUB engine then receives each syllable in the correct order and applies its own contextual shaping rules. See Automatic Shaping Pipeline.

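Typical workflow (Syriac)

// DrawSyriacRun is an illustrative (hypothetical) wrapper around the calls
procedure DrawSyriacRun(PDF: THotPDF; const Run: array of Cardinal);
var
  i: Integer;
  Position: TPosition;
begin
  PDF.RegisterUnicodeTTF('Estrangelo', 'SyrCOMEdessa.otf');
  PDF.SetGSUBScript('syrc');  // see GSUB engine doc
  for i := 0 to High(Run) do  // Run holds Syriac codepoints in logical order
  begin
    Position := PDF.GetSyriacPosition(Run, i);  // isolated / initial / medial / final
    // query GSUB for the position-appropriate substitute glyph (isol / init / medi / fina)
    // emit the glyph, then call MarkUnicodeGlyphUsed
  end;
end;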

Typical workflow (Devanagari with automatic reorder)

PDF.RegisterUnicodeTTF('NotoDeva', 'NotoSansDevanagari-Regular.ttf');
PDF.ShapingFeatures := [sfIndicShaping];  // auto Repha + I-matra reorder
PDF.CurrentPage.SetFont('NotoDeva', [], 14);
PDF.CurrentPage.UnicodeTextOut(50, 700, 0,
  UnicodeString(#$0939#$093F#$0928#$094D#$0926#$0940)); // "Hindi"

Scope and limitations

For Syriac and Mongolian, HotPDF provides the capability layer only - the actual GSUB feature query loop is the caller's responsibility (see the OpenType GSUB engine documentation for the query APIs). Fonts without the relevant GSUB tables will render unshaped glyphs.

For Devanagari, the producer-side pre-pass handles only the two structural reorders (Repha, pre-base I-matra). Other Indic shaping concerns (syllable-cluster boundary detection, conjunct formation, below-base matra positioning, Reph above-base hook attachment, and the vowel-sign reorderings specific to other Indic scripts such as Bengali, Tamil, Gujarati, Kannada, Malayalam, Telugu, Oriya and Gurmukhi (Punjabi)) need consumer-reader GSUB engines to complete. Phase 9 will add further Indic script capability layers as separate Get<Script>Category / Apply<Script>Reorder entry points.

See also: Arabic / Persian / Urdu Shaping Support, Automatic Shaping Pipeline (Phase 8), OpenType GSUB Substitution Engine, THotPDF.AssignSyntheticCodepointForGID