ApplySinhalaReorder
Signature
function ApplySinhalaReorder(const Wide: UnicodeString): UnicodeString;
Purpose
Applies the Sinhala reorder pre-pass to Wide and returns
the reordered UnicodeString ready for cmap + GSUB consumption.
Non-Sinhala content passes through byte-identical.
Sinhala specifics
- R1 Repha enabled: Ra (U+0DBB) + Halant (U+0DCA AL-LAKUNA) at syllable start is detected; the pair is stripped from the cluster and re-emitted after the reordered output so the font's 'rphf' GSUB feature can substitute the Repha glyph.
- Three pre-base matras (MatraPos = 1): E (U+0DD9), EE (U+0DDA), AI (U+0DDB). Sinhala is unique among the Phase 8f Brahmic scripts in having three logically pre-base matras. EE and AI have Top_And_Left visual positions but are stored pre-base; the visual top component is rendered by font GSUB.
- Three split matras with Unicode 16.0 canonical decompositions:
  - U+0DDC O → U+0DD9 (pre) + U+0DCF (post).
  - U+0DDD OO → U+0DD9 (pre) + U+0DCF (post) + U+0DCA (post). This is a three-part split; the trailing AL-LAKUNA is part of the canonical decomposition, not a syllable-level virama.
  - U+0DDE AU → U+0DD9 (pre) + U+0DDF (post).
- Above-base matras (MatraPos = 3): I (U+0DD2), II (U+0DD3).
- Below-base matras (MatraPos = 4): U (U+0DD4), UU (U+0DD6).
- Post-base matras (MatraPos = 2): AA (U+0DCF), AE / AAE (U+0DD0–U+0DD1), Vocalic R matra (U+0DD8), L matra (U+0DDF), LL / LLL matras (U+0DF2–U+0DF3).
- Halant in Sinhala is called AL-LAKUNA (U+0DCA); RA (Repha trigger) is U+0DBB.
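The classification and split-matra tables above can be sketched as two small lookup functions. This is an illustrative sketch only, not the library's implementation: SketchMatraPos and SketchDecomposeSplitMatra are hypothetical names introduced here (the real lookup is GetSinhalaCategory, whose signature is not shown on this page), and the return value 0 for "not a simple matra" is an assumption.

```pascal
program SinhalaMatraSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

// Hypothetical helper: returns the MatraPos value listed above
// (1 = pre, 2 = post, 3 = above, 4 = below). Returning 0 for
// everything else is an assumption of this sketch.
function SketchMatraPos(Ch: WideChar): Integer;
begin
  case Ord(Ch) of
    $0DD9..$0DDB: Result := 1;               // E, EE, AI
    $0DCF..$0DD1, $0DD8, $0DDF,
    $0DF2..$0DF3: Result := 2;               // AA, AE, AAE, Vocalic R, L, LL, LLL
    $0DD2..$0DD3: Result := 3;               // I, II
    $0DD4, $0DD6: Result := 4;               // U, UU
  else
    Result := 0;
  end;
end;

// Hypothetical helper: expands the three split matras into the
// decompositions listed above; any other character is returned as-is.
function SketchDecomposeSplitMatra(Ch: WideChar): UnicodeString;
begin
  case Ord(Ch) of
    $0DDC: Result := #$0DD9#$0DCF;           // O  -> E + AA
    $0DDD: Result := #$0DD9#$0DCF#$0DCA;     // OO -> E + AA + AL-LAKUNA
    $0DDE: Result := #$0DD9#$0DDF;           // AU -> E + L matra
  else
    Result := Ch;
  end;
end;

begin
  WriteLn(SketchMatraPos(#$0DD9));                     // 1
  WriteLn(Length(SketchDecomposeSplitMatra(#$0DDD)));  // 3
  WriteLn(Length(SketchDecomposeSplitMatra(#$0D9A)));  // 1
end.
```

Keeping the decomposition in its own function means the three-part OO case needs no special handling by the caller: the first part is always pre-base and the remainder is post-base.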
Reorder rules applied
- R1 Repha: Ra + AL-LAKUNA at syllable start re-emitted at the end of the syllable.
- R2 Pre-base matras: E / EE / AI emit before the base block.
- R3 Above-base matras: I / II emit after the base block.
- R4 Below-base matras: U / UU emit after the above-base block.
- R5 Post-base matras: AA / AE / AAE / Vocalic R / L / LL / LLL matras emit after the below-base block.
- Split matra decomposition: O / OO / AU expanded per Unicode 16.0 canonical decomposition; OO is the first three-part split this shaper family produces from a non-Kannada source codepoint.
Output layout per syllable: [pre-matras] + [base + halant + bindu/visarga] + [above-matras] + [below-matras] + [post-matras] + [Repha: Ra AL-LAKUNA]?. Conjuncts (C + AL-LAKUNA + C) preserved in the base block. Single-pass; idempotent on 2-part splits.
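The output layout above can be illustrated with a minimal sketch that reorders one already-segmented syllable by bucketing its characters. It is not the actual ApplySinhalaReorder implementation. All names prefixed with Sketch, and CodepointsHex, are hypothetical. Syllable segmentation is assumed to have happened already, and the rule that a bare Ra + AL-LAKUNA with nothing after it is not treated as Repha is an assumption of this sketch.

```pascal
program SinhalaLayoutSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  SysUtils;

// Hypothetical classifier using the MatraPos values listed on this page.
function SketchMatraPos(Ch: WideChar): Integer;
begin
  case Ord(Ch) of
    $0DD9..$0DDB: Result := 1;                              // pre-base
    $0DCF..$0DD1, $0DD8, $0DDF, $0DF2..$0DF3: Result := 2;  // post-base
    $0DD2..$0DD3: Result := 3;                              // above-base
    $0DD4, $0DD6: Result := 4;                              // below-base
  else
    Result := 0;                                            // base-block material
  end;
end;

// Hypothetical split-matra expansion (O, OO, AU); others pass through.
function SketchExpandSplit(Ch: WideChar): UnicodeString;
begin
  case Ord(Ch) of
    $0DDC: Result := #$0DD9#$0DCF;
    $0DDD: Result := #$0DD9#$0DCF#$0DCA;
    $0DDE: Result := #$0DD9#$0DDF;
  else
    Result := Ch;
  end;
end;

// Assumes Syl holds exactly one syllable (segmentation is out of scope).
function SketchReorderSyllable(const Syl: UnicodeString): UnicodeString;
var
  Pre, Base, Above, Below, Post, Repha, Parts: UnicodeString;
  I, Start: Integer;
begin
  Pre := ''; Base := ''; Above := ''; Below := ''; Post := ''; Repha := '';
  Start := 1;
  // R1: leading Ra + AL-LAKUNA followed by more content moves to the end.
  if (Length(Syl) > 2) and (Syl[1] = #$0DBB) and (Syl[2] = #$0DCA) then
  begin
    Repha := Copy(Syl, 1, 2);
    Start := 3;
  end;
  for I := Start to Length(Syl) do
  begin
    Parts := SketchExpandSplit(Syl[I]);
    if Length(Parts) > 1 then
    begin
      // Split matra: first part is pre-base, the rest is post-base.
      Pre := Pre + Parts[1];
      Post := Post + Copy(Parts, 2, Length(Parts) - 1);
    end
    else
      case SketchMatraPos(Syl[I]) of
        1: Pre := Pre + Parts;      // R2
        2: Post := Post + Parts;    // R5
        3: Above := Above + Parts;  // R3
        4: Below := Below + Parts;  // R4
      else
        Base := Base + Parts;       // consonants, halant, bindu/visarga
      end;
  end;
  Result := Pre + Base + Above + Below + Post + Repha;
end;

// Formats a string as space-separated 4-digit hex code units.
function CodepointsHex(const S: UnicodeString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Integer(Ord(S[I])), 4);
  end;
end;

begin
  // KA + O
  WriteLn(CodepointsHex(SketchReorderSyllable(#$0D9A#$0DDC)));
  // prints: 0DD9 0D9A 0DCF
  // Ra + AL-LAKUNA + KA + I
  WriteLn(CodepointsHex(SketchReorderSyllable(#$0DBB#$0DCA#$0D9A#$0DD2)));
  // prints: 0D9A 0DD2 0DBB 0DCA
end.
```

Because an AL-LAKUNA that is not part of a split-matra expansion falls into the base bucket, a conjunct (C + AL-LAKUNA + C) keeps its original order in this sketch, which matches the layout described above.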
Example
var
  Wide: UnicodeString;
begin
  // Input: KA (U+0D9A) + O-matra (U+0DDC, 2-part split: pre + post)
  Wide := Doc.ApplySinhalaReorder(#$0D9A#$0DDC);
  // Wide is now: E (U+0DD9, pre) + KA + AA (U+0DCF, post)
end;
See also
- ApplyIndicReorder: total dispatcher.
- ApplyDevanagariReorder: Devanagari counterpart.
- ApplyBengaliReorder: Bengali counterpart.
- ApplyGujaratiReorder: Gujarati counterpart.
- ApplyTamilReorder: Tamil counterpart.
- ApplyTeluguReorder: Telugu counterpart.
- ApplyKannadaReorder: Kannada counterpart (also has a three-part split).
- ApplyMalayalamReorder: Malayalam counterpart.
- GetSinhalaCategory: Unicode codepoint → category lookup.
Standards
- Unicode 16.0 §13.2 (Sinhala)
- Unicode 16.0 IndicSyllabicCategory.txt, IndicPositionalCategory.txt, and UnicodeData.txt (canonical decomposition source)
- ISO 32000-1 §9.10 (extraction of text content)
- OpenType Sinhala shaping spec (script tag 'sinh')
Version history
- v2.119.77: Introduced in Phase 8f.8. Complete shaper (R1 + R2 + R3 + R4 + R5 + three split-matra decompositions). Sinhala becomes the eighth registered Indic script and completes the Brahmic SIA (South Indic Aryan) family. Notable for having three pre-base matras (more than any other Phase 8f shaper) and the three-part canonical decomposition of U+0DDD OO.