ApplyMyanmarReorder

Signature

function ApplyMyanmarReorder(const Wide: UnicodeString): UnicodeString;

Purpose

Applies the Myanmar reorder pre-pass to Wide and returns the reordered UnicodeString ready for cmap + GSUB consumption. Non-Myanmar content passes through byte-identical. Myanmar is the tenth and final registered Indic script in HotPDF and uses the most complex syllable structure in the Phase 8f batch.

Myanmar specifics

  • NO Repha — Myanmar does not form a Repha visual.
  • Kinzi 3-CP prefix: the exact sequence U+1004 (NGA) + U+103A (ASAT) + U+1039 (VIRAMA) at syllable start, when followed by a consonant, is detected and held aside. The three Kinzi code points are then emitted at the very start of that syllable's output per OpenType Myanmar shaping spec R8. Kinzi renders as a small superscript NGA glyph above the following consonant.
  • Pre-base vowel E (U+1031) moves to syllable start (after any Kinzi) per R10.
  • Medial consonants (U+103B YA, U+103C RA, U+103D WA, U+103E HA) are collected into 4 dedicated slots and emitted in fixed Y → R → W → H order per R9, regardless of source order. Well-formed Myanmar syllables rarely have all 4 medials, but the algorithm handles any subset.
  • ASAT (U+103A) and VIRAMA (U+1039) are both treated as virama (category 4) for the stacked-consonant FSM. ASAT used standalone (not as part of Kinzi) stays in BaseBuf at its source position.
  • Above-base vowels (MatraPos = 3): I (U+102D), II (U+102E), AI (U+1032), Vocalic R / L variants (U+1033–U+1035).
  • Below-base vowels (MatraPos = 4): U (U+102F), UU (U+1030), Vocalic L / LL matras (U+1058–U+1059).
  • Post-base vowels (MatraPos = 2): TALL AA (U+102B), AA (U+102C), Vocalic R / RR matras (U+1056–U+1057).
  • ANUSVARA (Bindu U+1036) routes to above-base.
  • VISARGA (U+1038) routes to post-base.
  • DOT BELOW (U+1037) and other tone marks share category 12, but the reorder pre-pass routes DOT BELOW to below-base and other tones to above-base.
  • Coverage spans the Myanmar core block (U+1000–U+109F, including Burmese, Mon, Sgaw Karen, Western Pwo, Shan, Rumai Palaung, Pa'o vowel positions) and Extended-A (U+AA60–U+AA7F, additional ethnic-language consonants and Pa'o Karen tone marks).
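
Two of the rules above, Kinzi detection and the fixed medial order, can be sketched in a few lines. This is a minimal illustration and not HotPDF's implementation: IsKinziAt and MedialSlot are hypothetical names, and "consonant" is narrowed to the core letters U+1000–U+1020 as a simplifying assumption.

```pascal
program KinziAndMedialSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helpers for illustration only; not part of HotPDF.

// True when S holds NGA + ASAT + VIRAMA at Index, followed by a consonant.
// Assumption: "consonant" means U+1000..U+1020 (core letters only).
function IsKinziAt(const S: UnicodeString; Index: Integer): Boolean;
begin
  // Short-circuit evaluation keeps the S[...] reads inside the string.
  Result := (Index >= 1) and (Index + 3 <= Length(S)) and
    (Ord(S[Index]) = $1004) and
    (Ord(S[Index + 1]) = $103A) and
    (Ord(S[Index + 2]) = $1039) and
    (Ord(S[Index + 3]) >= $1000) and (Ord(S[Index + 3]) <= $1020);
end;

// Maps a medial to its slot in the fixed Y, R, W, H order (0..3).
// Returns -1 for anything that is not one of the four medials.
function MedialSlot(Ch: WideChar): Integer;
begin
  case Ord(Ch) of
    $103B: Result := 0; // MEDIAL YA
    $103C: Result := 1; // MEDIAL RA
    $103D: Result := 2; // MEDIAL WA
    $103E: Result := 3; // MEDIAL HA
  else
    Result := -1;
  end;
end;

var
  S: UnicodeString;
begin
  S := #$1004#$103A#$1039#$1000#$102C;   // Kinzi + KA + AA
  WriteLn(IsKinziAt(S, 1));              // Kinzi prefix followed by KA
  WriteLn(IsKinziAt(S, 2));              // no Kinzi starting at ASAT
  WriteLn(MedialSlot(WideChar($103E)));  // slot index of MEDIAL HA
end.
```

Keeping one slot per medial is what makes the output order independent of the source order: each medial is written to its own slot as it is seen, and the slots are read back in index order.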

Reorder behaviour

  • Kinzi 3 CPs emit at syllable start (before all other content).
  • Pre-base E vowel emits before the base block.
  • Base block contains: base consonant, ASAT, VIRAMA, virama-stacked consonants, MEDIAL MON LA (U+1082, which is not part of the Y/R/W/H set), GREAT SA (U+103F), ZWJ / ZWNJ — all in source order.
  • Medials emit in fixed Y → R → W → H order regardless of source order.
  • Above-base block: vowels, ANUSVARA, non-DOT-BELOW tone marks.
  • Below-base block: vowels, DOT BELOW.
  • Post-base block: vowels, VISARGA.

Output layout per syllable: [Kinzi 3 CPs]? + [Pre-base E]? + [Base + ASAT + stacked consonants] + [MedialY + MedialR + MedialW + MedialH]? + [Above]? + [Below]? + [Post]?, where ? marks a block that is emitted only when present in the source. The pre-pass is single-pass and idempotent: Kinzi at the start stays at the start, sorted medials stay sorted, and a pre-base E at the start stays at the start.
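
The layout above can be illustrated with a bucket-and-concatenate sketch for a single syllable. It is a simplified reading of this page and not HotPDF's code: ReorderSyllable is a hypothetical name, the input is assumed to be exactly one syllable (no segmentation), Extended-A is ignored, tone marks other than DOT BELOW are left in the base block, and "consonant" after Kinzi is narrowed to U+1000–U+1020.

```pascal
program MyanmarLayoutSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical single-syllable reorder, for illustration only.
function ReorderSyllable(const Syl: UnicodeString): UnicodeString;
var
  Kinzi, PreE, Base, Above, Below, Post: UnicodeString;
  Medial: array[0..3] of UnicodeString; // slots for Y, R, W, H
  I, Start, Cp: Integer;
begin
  Kinzi := ''; PreE := ''; Base := ''; Above := ''; Below := ''; Post := '';
  for I := 0 to 3 do
    Medial[I] := '';
  Start := 1;
  // Kinzi: NGA + ASAT + VIRAMA at syllable start, followed by a consonant.
  if (Length(Syl) >= 4) and (Ord(Syl[1]) = $1004) and
     (Ord(Syl[2]) = $103A) and (Ord(Syl[3]) = $1039) and
     (Ord(Syl[4]) >= $1000) and (Ord(Syl[4]) <= $1020) then
  begin
    Kinzi := Copy(Syl, 1, 3);
    Start := 4;
  end;
  // Drop every remaining code unit into the bucket for its position.
  for I := Start to Length(Syl) do
  begin
    Cp := Ord(Syl[I]);
    case Cp of
      $1031: PreE := PreE + Syl[I];
      $103B..$103E: Medial[Cp - $103B] := Medial[Cp - $103B] + Syl[I];
      $102D, $102E, $1032..$1036: Above := Above + Syl[I];
      $102F, $1030, $1037, $1058, $1059: Below := Below + Syl[I];
      $102B, $102C, $1038, $1056, $1057: Post := Post + Syl[I];
    else
      Base := Base + Syl[I]; // consonants, ASAT, VIRAMA, everything else
    end;
  end;
  Result := Kinzi + PreE + Base +
    Medial[0] + Medial[1] + Medial[2] + Medial[3] +
    Above + Below + Post;
end;

var
  Once: UnicodeString;
begin
  Once := ReorderSyllable(#$1000#$103E#$103D#$103C#$103B);
  WriteLn(Once = #$1000#$103B#$103C#$103D#$103E);        // medials sorted
  WriteLn(ReorderSyllable(#$1000#$1031) = #$1031#$1000); // E moved to front
  WriteLn(ReorderSyllable(Once) = Once);                 // second pass is a no-op
end.
```

Because every bucket preserves source order internally and the buckets are always concatenated in the same sequence, running the sketch on its own output changes nothing, which mirrors the idempotence property stated above.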

Example

var
  Wide: UnicodeString;
begin
  // Doc is assumed to be an existing HotPDF document instance created elsewhere.

  // Input: KA (U+1000) + E-vowel (U+1031, pre-base)
  Wide := Doc.ApplyMyanmarReorder(#$1000#$1031);
  // Wide is now: E (U+1031) + KA (U+1000)

  // Input: Kinzi (U+1004 U+103A U+1039) + KA + AA-vowel (U+102C)
  Wide := Doc.ApplyMyanmarReorder(#$1004#$103A#$1039#$1000#$102C);
  // Wide unchanged: Kinzi + KA + AA (all in canonical positions)

  // Input: KA + medials H + W + R + Y (reversed source order)
  Wide := Doc.ApplyMyanmarReorder(#$1000#$103E#$103D#$103C#$103B);
  // Wide is now: KA + Y + R + W + H (R9 sorted)
end;
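
When checking results like these, it can help to print a string as a list of code points rather than as rendered text. The helper below is a hypothetical debugging aid, not part of HotPDF. It relies on the fact that every code point in the two Myanmar blocks covered here lies in the BMP, so one UTF-16 code unit corresponds to one code point.

```pascal
program DumpCodePoints;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: renders each UTF-16 code unit of S as U+XXXX,
// separated by single spaces. Surrogate pairs are not combined.
function CodePointsToStr(const S: UnicodeString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + 'U+' + IntToHex(Integer(Ord(S[I])), 4);
  end;
end;

begin
  // Reordered form of KA + E from the first example above.
  WriteLn(CodePointsToStr(#$1031#$1000)); // prints: U+1031 U+1000
end.
```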

See also

Standards

  • Unicode 16.0 §16.3 (Myanmar)
  • Unicode 16.0 IndicSyllabicCategory.txt, IndicPositionalCategory.txt
  • ISO 32000-1 §9.10 (extraction of text content)
  • OpenType Myanmar shaping spec (script tag 'mymr')

Version history

  • v2.120.10: Introduced in Phase 8f.10. Complete shaper with Kinzi 3-CP prefix detection (R8), pre-base E vowel rotation (R10), medial Y/R/W/H fixed-order sorting (R9), ASAT/VIRAMA stacked-consonant handling, and DOT BELOW / tone mark routing. Both the Myanmar core block (U+1000–U+109F) and Extended-A (U+AA60–U+AA7F) are registered as separate IndicScripts entries sharing the same Myanmar reorder functions. Completes the 11-phase non-Devanagari Indic shaping batch (Phases 8f.0–8f.10): infrastructure + 10 registered Indic scripts covering the Brahmic SIA family plus 2 South-East Asian scripts.