Syriac / Mongolian / Devanagari Shaping Support

Multi-script capability surfaces (v2.119.53 - v2.119.55)

 


HotPDF exposes joining-class / positional-analysis / Indic syllabic-category capability surfaces for three complex scripts beyond Arabic: Syriac (U+0700-U+074F), Mongolian (U+1800-U+18AF), and Devanagari (U+0900-U+097F). Each capability lets callers drive the script through the existing OpenType GSUB engine for correct shaping, while keeping the heavy lifting (cluster boundary detection, BiDi resolution, GPOS positioning) outside HotPDF's scope.

 

Syriac shaping capability (v2.119.53)

Two new methods expose Syriac's joining behaviour:

 

function GetSyriacJoiningClass(CP: Cardinal): TJoiningClass;
function GetSyriacPosition(const Run: array of Cardinal; Index: Integer): TPosition;

 

Syriac follows the same Right-Joining / Dual-Joining / Transparent / Non-Joining four-class framework as Arabic; GetSyriacJoiningClass classifies each codepoint in the U+0700-U+074F block according to the Unicode joining property. GetSyriacPosition walks a Syriac run and resolves each character to its isolated / initial / medial / final position based on the joining classes of its immediate neighbours.
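The neighbour walk described above can be sketched as a small standalone program. This is only an illustration of the general joining logic, not HotPDF's implementation: TSketchJoinClass, TSketchPosition, Neighbour and ResolvePosition are hypothetical names, the run is supplied as already-classified joining classes in logical order, and Transparent characters are skipped when looking for a neighbour, as the Unicode joining rules do.

```pascal
program JoiningPositionSketch;
{$ifdef FPC}{$mode objfpc}{$H+}{$endif}

type
  // Hypothetical stand-ins for HotPDF's TJoiningClass / TPosition.
  TSketchJoinClass = (jcNonJoining, jcRightJoining, jcDualJoining, jcTransparent);
  TSketchPosition  = (spIsolated, spInitial, spMedial, spFinal);

// Index of the nearest non-transparent neighbour in direction Step
// (-1 = preceding, +1 = following), or -1 when there is none.
function Neighbour(const Classes: array of TSketchJoinClass;
  Index, Step: Integer): Integer;
begin
  Result := Index + Step;
  while (Result >= 0) and (Result <= High(Classes)) and
        (Classes[Result] = jcTransparent) do
    Inc(Result, Step);
  if (Result < 0) or (Result > High(Classes)) then
    Result := -1;
end;

function ResolvePosition(const Classes: array of TSketchJoinClass;
  Index: Integer): TSketchPosition;
var
  Prev, Next: Integer;
  JoinsPrev, JoinsNext: Boolean;
begin
  Result := spIsolated;
  // Non-joining and transparent characters never take a joined form.
  if not (Classes[Index] in [jcRightJoining, jcDualJoining]) then
    Exit;
  Prev := Neighbour(Classes, Index, -1);
  Next := Neighbour(Classes, Index, +1);
  // A letter connects backwards only if the preceding letter is dual-joining.
  JoinsPrev := (Prev >= 0) and (Classes[Prev] = jcDualJoining);
  // Only a dual-joining letter connects forwards, and only to an R or D letter.
  JoinsNext := (Classes[Index] = jcDualJoining) and (Next >= 0) and
               (Classes[Next] in [jcRightJoining, jcDualJoining]);
  if JoinsPrev and JoinsNext then
    Result := spMedial
  else if JoinsPrev then
    Result := spFinal
  else if JoinsNext then
    Result := spInitial;
end;

const
  PositionLabel: array[TSketchPosition] of string =
    ('isolated', 'initial', 'medial', 'final');
  // Dual, Dual, Right, Dual in logical order.
  DemoRun: array[0..3] of TSketchJoinClass =
    (jcDualJoining, jcDualJoining, jcRightJoining, jcDualJoining);
var
  I: Integer;
begin
  for I := 0 to High(DemoRun) do
    WriteLn(I, ': ', PositionLabel[ResolvePosition(DemoRun, I)]);
end.
```

For the demo run the four letters resolve to initial, medial, final and isolated: the right-joining letter at index 2 cannot connect forwards, so the letter after it starts a new joined group.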

 

Unlike Arabic, the Syriac block has no Presentation Forms pre-encoded into Unicode: there is no Syriac equivalent of the Arabic Presentation Forms blocks (U+FB50-U+FDFF and U+FE70-U+FEFF). Consumers must therefore drive Syriac shaping through font-defined GSUB lookups (typically init / medi / fina / isol + rlig) rather than codepoint rewriting. The capability layer above gives callers the position labels they need to query the right GSUB feature for each glyph.
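As an illustration of how a position label selects a GSUB feature, the sketch below maps each position to the standard OpenType feature tag of the same meaning. TSketchPosition and FeatureTagFor are hypothetical names introduced for this example; how the tag is then passed to HotPDF's GSUB engine is not shown here.

```pascal
program PositionFeatureSketch;
{$ifdef FPC}{$mode objfpc}{$H+}{$endif}

type
  // Hypothetical stand-in for HotPDF's TPosition.
  TSketchPosition = (spIsolated, spInitial, spMedial, spFinal);

// Returns the OpenType positional feature tag for a resolved position.
function FeatureTagFor(Pos: TSketchPosition): string;
begin
  case Pos of
    spInitial: Result := 'init';
    spMedial:  Result := 'medi';
    spFinal:   Result := 'fina';
  else
    Result := 'isol';
  end;
end;

var
  P: TSketchPosition;
begin
  // Print the tag chosen for every position value.
  for P := Low(TSketchPosition) to High(TSketchPosition) do
    WriteLn(Ord(P), ' -> ', FeatureTagFor(P));
end.
```

Features such as rlig are not tied to a single position, so a caller would normally apply them to the whole run after the positional substitutions.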

 

Mongolian shaping capability (v2.119.54)

Two parallel methods for Mongolian:

 

function GetMongolianJoiningClass(CP: Cardinal): TJoiningClass;
function GetMongolianPosition(const Run: array of Cardinal; Index: Integer): TPosition;

 

Coverage includes basic Mongolian (U+1820-U+1842), Todo (U+1843-U+185C), Sibe (U+185D-U+1872), Manchu (U+1873-U+1877), and the Ali Gali extensions (U+1880-U+18A8). The free variation selectors FVS1 / FVS2 / FVS3 (U+180B-U+180D), the NIRUGU stem extender (U+180A), and the Ali Gali vowel marks are all classified as Transparent (T-class), so the positional walk skips over them instead of treating them as a break in joining.
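A rough sketch of such a classifier is shown below. It is not HotPDF's table: SketchMongolianClass and TSketchJoinClass are hypothetical names, only the controls named in the paragraph above are treated as Transparent (with U+1885 and U+1886 assumed to be the Ali Gali marks it refers to), and every letter is treated as dual-joining as a simplifying assumption. Per-codepoint joining data should be taken from the Unicode Character Database.

```pascal
program MongolianClassSketch;
{$ifdef FPC}{$mode objfpc}{$H+}{$endif}

uses
  SysUtils;

type
  // Hypothetical, reduced stand-in for HotPDF's TJoiningClass.
  TSketchJoinClass = (jcNonJoining, jcDualJoining, jcTransparent);

function SketchMongolianClass(CP: Cardinal): TSketchJoinClass;
begin
  case CP of
    $180A..$180D,    // NIRUGU and FVS1..FVS3, per the text above
    $1885, $1886:    // Ali Gali marks (assumption, see lead-in)
      Result := jcTransparent;
    $1820..$1877,    // letter ranges, all assumed dual-joining here
    $1887..$18A8:
      Result := jcDualJoining;
  else
    Result := jcNonJoining;
  end;
end;

const
  ClassLabel: array[TSketchJoinClass] of string =
    ('non-joining', 'dual-joining', 'transparent');
  // MONGOLIAN LETTER A, FVS1, MONGOLIAN LETTER NA
  Demo: array[0..2] of Cardinal = ($1820, $180B, $1828);
var
  I: Integer;
begin
  for I := 0 to High(Demo) do
    WriteLn('U+', IntToHex(Int64(Demo[I]), 4), ' ',
      ClassLabel[SketchMongolianClass(Demo[I])]);
end.
```

Because the variation selector in the middle is Transparent, a positional walk over this run would still see the two letters as adjacent.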

 

Like Syriac, Mongolian has no Presentation Forms pre-encoded in Unicode. Mongolian's traditional vertical layout, complex letter-shape variation rules, and FVS-driven shape selection are all expected to be driven by font-defined GSUB lookups (typically init / medi / fina / isol + ccmp / rlig / locl); the capability layer gives callers the position labels needed to query those features.

 

Devanagari Indic shaping capability (v2.119.55)

Devanagari is fundamentally different from Arabic / Syriac / Mongolian - it is an Indic abugida script where the meaningful unit is a syllable cluster (akshara), not a single letter, and the rendered order of glyphs inside a cluster is often different from the logical Unicode order. Two methods expose Devanagari's Indic capability layer:

 

function GetDevanagariCategory(CP: Cardinal): TIndicCategory;
procedure ApplyDevanagariReorder(var Run: array of Cardinal);

 

GetDevanagariCategory classifies each codepoint in U+0900-U+097F into one of 13 Indic syllabic categories (Base / Consonant / Vowel / Independent Vowel / Vowel Mark / Pre-base Matra / Above-base Matra / Below-base Matra / Halant (Virama) / Repha / Anusvara / Visarga / Other) so callers can detect syllable boundaries and identify which positions inside each cluster need reordering.

 

ApplyDevanagariReorder is a pre-pass that walks the input run and applies the two main Devanagari reorder rules: (1) Repha (Ra + Halant at the beginning of a syllable cluster) is moved to the position after the cluster's base consonant so it renders as a superscript hook; (2) Pre-base I-matra (U+093F DEVANAGARI VOWEL SIGN I) is moved before the cluster's base consonant so it renders as a left-side hook. Other Indic reorders (above-base matra, below-base matra, conjunct formation) are left to the font's GSUB engine since they require font-specific lookup tables.
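The two reorder rules can be sketched on a single, already-isolated cluster as follows. This is a simplified illustration and not the code behind ApplyDevanagariReorder: TCodepoints, MakeRun, BaseIndex, MoveItem and ReorderCluster are hypothetical names, syllable segmentation is assumed to have happened beforehand, and the base consonant is approximated as the last consonant in the cluster.

```pascal
program DevanagariReorderSketch;
{$ifdef FPC}{$mode objfpc}{$H+}{$endif}

uses
  SysUtils;

type
  TCodepoints = array of Cardinal;

const
  CP_RA      = $0930;  // DEVANAGARI LETTER RA
  CP_MATRA_I = $093F;  // DEVANAGARI VOWEL SIGN I
  CP_HALANT  = $094D;  // DEVANAGARI SIGN VIRAMA

function MakeRun(const Items: array of Cardinal): TCodepoints;
var
  I: Integer;
begin
  SetLength(Result, Length(Items));
  for I := 0 to High(Items) do
    Result[I] := Items[I];
end;

function IsConsonant(CP: Cardinal): Boolean;
begin
  Result := (CP >= $0915) and (CP <= $0939);  // KA..HA
end;

// Assumption: the base is the last consonant of the cluster; -1 if none.
function BaseIndex(const C: TCodepoints): Integer;
var
  I: Integer;
begin
  Result := -1;
  for I := High(C) downto 0 do
    if IsConsonant(C[I]) then
    begin
      Result := I;
      Exit;
    end;
end;

// Moves C[From] to index Dest, shifting the elements in between by one.
procedure MoveItem(var C: TCodepoints; From, Dest: Integer);
var
  Tmp: Cardinal;
  I: Integer;
begin
  Tmp := C[From];
  if From < Dest then
  begin
    for I := From to Dest - 1 do
      C[I] := C[I + 1];
  end
  else
  begin
    for I := From downto Dest + 1 do
      C[I] := C[I - 1];
  end;
  C[Dest] := Tmp;
end;

procedure ReorderCluster(var C: TCodepoints);
var
  Base, I: Integer;
begin
  Base := BaseIndex(C);
  if Base < 0 then
    Exit;
  // Rule 1: a leading Ra + Halant (repha) moves to just after the base.
  if (Base >= 2) and (C[0] = CP_RA) and (C[1] = CP_HALANT) then
  begin
    MoveItem(C, 0, Base);  // Ra lands at Base, the rest shifts left
    MoveItem(C, 0, Base);  // Halant lands after Ra
    Dec(Base, 2);          // the base consonant moved two slots left
  end;
  // Rule 2: the I-matra moves to just before the base consonant.
  for I := Base + 1 to High(C) do
    if C[I] = CP_MATRA_I then
    begin
      MoveItem(C, I, Base);
      Break;
    end;
end;

procedure Show(const Title: string; const C: TCodepoints);
var
  I: Integer;
begin
  Write(Title, ':');
  for I := 0 to High(C) do
    Write(' U+', IntToHex(Int64(C[I]), 4));
  WriteLn;
end;

var
  Demo: TCodepoints;
begin
  Demo := MakeRun([CP_RA, CP_HALANT, $0915]);  // Ra + Halant + Ka
  ReorderCluster(Demo);
  Show('repha', Demo);
  Demo := MakeRun([$0915, CP_MATRA_I]);        // Ka + I-matra
  ReorderCluster(Demo);
  Show('i-matra', Demo);
end.
```

In the first demo the sequence Ra, Halant, Ka becomes Ka, Ra, Halant; in the second, Ka followed by the I-matra becomes I-matra followed by Ka.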

 

Automatic integration (v2.119.67): when sfIndicShaping is included in PDF.ShapingFeatures, ApplyDevanagariReorder runs automatically as a pre-pass inside the three BuildUnicode*FieldContent helpers. The GSUB engine of the consuming reader then receives each syllable in the correct order and applies its own contextual shaping rules. See Automatic Shaping Pipeline.

 

Typical workflow (Syriac)

 

PDF.RegisterUnicodeTTF('Estrangelo', 'SyrCOMEdessa.otf');
PDF.SetGSUBScript('syrc');  // see GSUB engine doc
for i := 0 to Length(Run) - 1 do
begin
  Pos := PDF.GetSyriacPosition(Run, i);  // init / medi / fina / isol
  // query GSUB for the position-appropriate substitute glyph
  // emit + MarkUnicodeGlyphUsed
end;

 

Typical workflow (Devanagari with automatic reorder)

 

PDF.RegisterUnicodeTTF('NotoDeva', 'NotoSansDevanagari-Regular.ttf');
PDF.ShapingFeatures := [sfIndicShaping];  // auto Repha + I-matra reorder
PDF.CurrentPage.SetFont('NotoDeva', [], 14);
PDF.CurrentPage.UnicodeTextOut(50, 700, 0,
  UnicodeString(#$0939#$093F#$0928#$094D#$0926#$0940)); // "Hindi"

 

Unicode Subset and Extraction Helpers

RegisterToUnicodeReverseMapping records the source codepoints for shaped or synthetic glyph output. ClearToUnicodeReverseMappings resets that table, and ToUnicodeReverseMappingCount reports its size. GetUnicodeGlyphForCodepoint returns the registered font glyph ID for a Unicode codepoint, and EnableShapingFeatureForSubset marks the substitute glyphs of a GSUB feature for inclusion in the embedded subset.

 

Scope and limitations

Current producer-side integration

Syriac, Mongolian, Tibetan, and Indic shaping now have dedicated API reference pages. Syriac can be enabled through AutoShapeSyriac, Mongolian through sfMongolianShaping, Tibetan through sfTibetanShaping, Indic reordering through sfIndicShaping, and full Indic GSUB shaping through sfIndicGSUB. The detailed entry points are described in Tibetan/Mongolian/Syriac shaping methods and Indic shaping methods.

The current scope also covers N'Ko and Adlam as cursive RTL scripts, Thai/Lao for SARA AM and tone marks, Hebrew for niqqud ordering, and Javanese for pre-base signs. These paths are described in script shaping preprocess methods.

 

 

See also: Arabic / Persian / Urdu Shaping Support, Automatic Shaping Pipeline (Phase 8), OpenType GSUB Substitution Engine, THotPDF.AssignSyntheticCodepointForGID