ApplyMyanmarReorder
Signature
function ApplyMyanmarReorder(const Wide: UnicodeString): UnicodeString;
Purpose
Applies the Myanmar reorder pre-pass to Wide and returns
the reordered UnicodeString ready for cmap + GSUB consumption.
Non-Myanmar content passes through byte-identical. Myanmar is the
tenth and final registered Indic script in HotPDF and
uses the most complex syllable structure in the Phase 8f batch.
Myanmar specifics
- No Repha: Myanmar does not form a Repha visual.
- Kinzi 3-CP prefix: the exact sequence U+1004 (NGA) + U+103A (ASAT) + U+1039 (VIRAMA) at syllable start, when followed by a consonant, is detected and held aside. The 3 Kinzi codepoints are emitted at the very start of the output per OpenType Myanmar shaping spec R8. The Kinzi renders as a small superscript NGA glyph above the consonant that follows it.
- Pre-base vowel E (U+1031) moves to syllable start (after any Kinzi) per R10.
- Medial consonants (U+103B YA, U+103C RA, U+103D WA, U+103E HA) are collected into 4 dedicated slots and emitted in fixed Y → R → W → H order per R9, regardless of source order. Well-formed Myanmar syllables rarely have all 4 medials, but the algorithm handles any subset.
- ASAT (U+103A) and VIRAMA (U+1039) are both treated as virama (category 4) for the stacked-consonant FSM. A standalone ASAT (not part of a Kinzi) stays in BaseBuf at its source position.
- Above-base vowels (MatraPos = 3): I (U+102D), II (U+102E), AI (U+1032), Mon II, Mon O, and E Above (U+1033–U+1035).
- Below-base vowels (MatraPos = 4): U (U+102F), UU (U+1030), Vocalic L / LL matras (U+1058–U+1059).
- Post-base vowels (MatraPos = 2): TALL AA (U+102B), AA (U+102C), Vocalic R / RR matras (U+1056–U+1057).
- ANUSVARA (Bindu, U+1036) routes to above-base.
- VISARGA (U+1038) routes to post-base.
- DOT BELOW (U+1037) and other tone marks share category 12, but the reorder pre-pass routes DOT BELOW to below-base and other tones to above-base.
- Coverage spans the Myanmar core block (U+1000–U+109F, including Burmese, Mon, Sgaw Karen, Western Pwo, Shan, Rumai Palaung, Pa'o vowel positions) and Extended-A (U+AA60–U+AA7F, additional ethnic-language consonants and Pa'o Karen tone marks).
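The Kinzi rule above can be illustrated with a small standalone sketch. It is not HotPDF's implementation: the helper names IsConsonant and HasKinziPrefix are hypothetical, and the consonant test is a deliberate simplification (U+1000–U+1021 only) assumed for this example.

```pascal
program KinziSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

const
  CP_NGA    = $1004;
  CP_ASAT   = $103A;
  CP_VIRAMA = $1039;

// Assumption for this sketch only: a "consonant" is any codepoint in
// U+1000..U+1021. The real category lookup is not reproduced here.
function IsConsonant(C: WideChar): Boolean;
begin
  Result := (Ord(C) >= $1000) and (Ord(C) <= $1021);
end;

// Hypothetical helper: True when S holds NGA + ASAT + VIRAMA at Start
// and the codepoint right after them is a consonant.
function HasKinziPrefix(const S: UnicodeString; Start: Integer): Boolean;
begin
  // The bounds check comes first so short-circuit evaluation keeps
  // the index expressions in range.
  Result := (Start >= 1) and (Start + 3 <= Length(S))
    and (Ord(S[Start]) = CP_NGA)
    and (Ord(S[Start + 1]) = CP_ASAT)
    and (Ord(S[Start + 2]) = CP_VIRAMA)
    and IsConsonant(S[Start + 3]);
end;

begin
  // Kinzi + KA + AA: the prefix is followed by a consonant.
  WriteLn(HasKinziPrefix(#$1004#$103A#$1039#$1000#$102C, 1));
  // NGA + ASAT + KA: no VIRAMA, so this is not a Kinzi.
  WriteLn(HasKinziPrefix(#$1004#$103A#$1000, 1));
end.
```

Requiring the trailing consonant inside the same test mirrors the "when followed by a consonant" condition, so a bare NGA + ASAT + VIRAMA at the end of a string is not treated as a Kinzi.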
Reorder behavior
- Kinzi 3 CPs emit at syllable start (before all other content).
- Pre-base E vowel emits before the base block.
- Base block contains: base consonant, ASAT, VIRAMA, virama-stacked consonants, MEDIAL MON LA (U+1082, which is not part of the Y/R/W/H set), GREAT SA (U+103F), ZWJ / ZWNJ, all in source order.
- Medials emit in fixed Y → R → W → H order regardless of source order.
- Above-base block: vowels, ANUSVARA, non-DOT-BELOW tone marks.
- Below-base block: vowels, DOT BELOW.
- Post-base block: vowels, VISARGA.
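As a rough illustration of the block membership listed above, the routing can be written as a lookup from codepoint to output block. This is a sketch under stated assumptions, not the library's code: TMarkBlock and RouteMark are hypothetical names, and only the codepoints named explicitly on this page are covered (other tone marks, which route to above-base, are not enumerated).

```pascal
program MarkRoutingSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

type
  // Hypothetical enumeration of the output blocks described on this page.
  TMarkBlock = (mbNone, mbPreBase, mbAbove, mbBelow, mbPost);

// Hypothetical helper: maps a codepoint to its output block.
// Codepoints not named on this page fall through to mbNone.
function RouteMark(C: WideChar): TMarkBlock;
begin
  case Ord(C) of
    // Pre-base vowel E
    $1031:
      Result := mbPreBase;
    // I, II, AI, U+1033..U+1035, ANUSVARA (U+1036)
    $102D, $102E, $1032..$1036:
      Result := mbAbove;
    // U, UU, DOT BELOW, Vocalic L / LL
    $102F, $1030, $1037, $1058, $1059:
      Result := mbBelow;
    // TALL AA, AA, VISARGA, Vocalic R / RR
    $102B, $102C, $1038, $1056, $1057:
      Result := mbPost;
  else
    Result := mbNone;
  end;
end;

begin
  WriteLn(RouteMark(#$1031) = mbPreBase); // E vowel
  WriteLn(RouteMark(#$1036) = mbAbove);   // ANUSVARA
  WriteLn(RouteMark(#$1037) = mbBelow);   // DOT BELOW
  WriteLn(RouteMark(#$1038) = mbPost);    // VISARGA
  WriteLn(RouteMark(#$1000) = mbNone);    // KA is a base consonant, not a mark
end.
```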
Output layout per syllable: [Kinzi 3 CPs]? + [Pre-base E] + [Base + ASAT + stacked consonants] + [MedialY + MedialR + MedialW + MedialH] + [Above] + [Below] + [Post]. Single-pass; idempotent (Kinzi at start stays at start; sorted medials stay sorted; pre-base E at start stays at start).
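The fixed medial order and the idempotence claim can be checked with a small standalone sketch. It assumes the input is a single syllable made only of a base block followed by medials, so emitting the medials after everything else matches the layout; MedialSlot and SortMedials are hypothetical names, not HotPDF functions.

```pascal
program MedialOrderSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

// Hypothetical helper: slot index for a medial (Y=0, R=1, W=2, H=3),
// or -1 for any other codepoint.
function MedialSlot(C: WideChar): Integer;
begin
  case Ord(C) of
    $103B: Result := 0; // MEDIAL YA
    $103C: Result := 1; // MEDIAL RA
    $103D: Result := 2; // MEDIAL WA
    $103E: Result := 3; // MEDIAL HA
  else
    Result := -1;
  end;
end;

// Sketch only: keeps non-medial codepoints in source order, then appends
// the medials in Y, R, W, H order. Assumes the input contains no
// above-, below-, or post-base marks.
function SortMedials(const S: UnicodeString): UnicodeString;
var
  Counts: array[0..3] of Integer;
  I, J, Slot: Integer;
begin
  for I := 0 to 3 do
    Counts[I] := 0;
  Result := '';
  for I := 1 to Length(S) do
  begin
    Slot := MedialSlot(S[I]);
    if Slot >= 0 then
      Inc(Counts[Slot])
    else
      Result := Result + S[I];
  end;
  for I := 0 to 3 do
    for J := 1 to Counts[I] do
      Result := Result + WideChar($103B + I);
end;

var
  Once, Twice: UnicodeString;
begin
  // KA + H + W + R + Y (reversed source order)
  Once := SortMedials(#$1000#$103E#$103D#$103C#$103B);
  Twice := SortMedials(Once);
  WriteLn(Once = #$1000#$103B#$103C#$103D#$103E); // sorted Y, R, W, H
  WriteLn(Once = Twice);                          // second pass changes nothing
end.
```

Collecting medials into per-slot counters rather than sorting in place is what makes a second pass a no-op: already-sorted medials land in the same slots and are emitted in the same order.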
Example
var
  Wide: UnicodeString;
begin
  // Input: KA (U+1000) + E-vowel (U+1031, pre-base)
  Wide := Doc.ApplyMyanmarReorder(#$1000#$1031);
  // Wide is now: E (U+1031) + KA (U+1000)

  // Input: Kinzi (U+1004 U+103A U+1039) + KA + AA-vowel (U+102C)
  Wide := Doc.ApplyMyanmarReorder(#$1004#$103A#$1039#$1000#$102C);
  // Wide unchanged: Kinzi + KA + AA (all in canonical positions)

  // Input: KA + medials H + W + R + Y (reversed source order)
  Wide := Doc.ApplyMyanmarReorder(#$1000#$103E#$103D#$103C#$103B);
  // Wide is now: KA + Y + R + W + H (R9 sorted)
end;
See also
- ApplyIndicReorder: total dispatcher.
- ApplyDevanagariReorder: Devanagari counterpart.
- ApplyBengaliReorder: Bengali counterpart.
- ApplyGujaratiReorder: Gujarati counterpart.
- ApplyTamilReorder: Tamil counterpart.
- ApplyTeluguReorder: Telugu counterpart.
- ApplyKannadaReorder: Kannada counterpart.
- ApplyMalayalamReorder: Malayalam counterpart.
- ApplySinhalaReorder: Sinhala counterpart.
- ApplyKhmerReorder: Khmer counterpart (first SE Asian script).
- GetMyanmarCategory: Unicode codepoint → category lookup.
Standards
- Unicode 16.0 §16.3 (Myanmar)
- Unicode 16.0 IndicSyllabicCategory.txt, IndicPositionalCategory.txt
- ISO 32000-1 §9.10 (extraction of text content)
- OpenType Myanmar shaping spec (script tag 'mymr')
Version history
- v2.120.10: Introduced in Phase 8f.10. Complete shaper with Kinzi 3-CP prefix detection (R8), pre-base E vowel rotation (R10), medial Y/R/W/H fixed-order sorting (R9), ASAT/VIRAMA stacked-consonant handling, and DOT BELOW / tone mark routing. Both the Myanmar core block (U+1000–U+109F) and Extended-A (U+AA60–U+AA7F) are registered as separate IndicScripts entries sharing the same Myanmar reorder functions. Completes the 11-phase non-Devanagari Indic shaping batch (Phases 8f.0–8f.10): infrastructure + 10 registered Indic scripts covering the Brahmic SIA family plus 2 South-East Asian scripts.