GetMyanmarCategory

Signature

function GetMyanmarCategory(CP: Cardinal): Integer;

Purpose

Pure Unicode-codepoint → Myanmar syllabic-category lookup. No font state required. Returns one of 11 category codes (same numbering as GetDevanagariCategory et al.). Covers the Myanmar core block (U+1000–U+109F) and Extended-A (U+AA60–U+AA7F).

Return values

Code | Category | Example codepoints
0 | Other | Unassigned / non-syllable; Myanmar punctuation (U+104A–U+104F); symbols (U+109E–U+109F); Extended-A symbols (U+AA77–U+AA79)
1 | Consonant | Core consonants (U+1000–U+1020), GREAT SA (U+103F), medials Y/R/W/H (U+103B–U+103E; treated as consonants here, sorted by reorder pre-pass), Extended Sanskrit (U+1050–U+1055), Mon (U+105A–U+1066), Sgaw Karen / Western Pwo (U+1067–U+1070), Shan (U+1075–U+1081), MEDIAL MON LA (U+1082), Rumai Palaung FA (U+108E), Extended-A consonants (U+AA60–U+AA76, U+AA7A, U+AA7E–U+AA7F)
2 | Independent vowel | U+1021–U+102A (A..AW)
3 | Matra (dependent vowel sign) | U+102B–U+1035, U+1056–U+1059, U+1071–U+1074, U+1083–U+1086, U+109A–U+109D (Burmese / Mon / Shan / Sgaw Karen / Pwo vowel signs at various positions)
4 | Virama (incl. ASAT) | U+1039 VIRAMA (traditional halant), U+103A ASAT (consonant killer, also used as middle codepoint of the Kinzi 3-CP prefix)
6 | Bindu | U+1036 ANUSVARA
7 | Visarga | U+1038
9 | Digit | U+1040–U+1049 (Burmese), U+1090–U+1099 (Shan)
10 | ZWJ | U+200D
11 | ZWNJ | U+200C
12 | Above-/below-base sign | U+1037 DOT BELOW (routed to below-base by reorder), tone marks (U+1087–U+108D, U+108F; routed to above-base), Pa'o Karen tone marks (U+AA7B–U+AA7D)
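
The table above maps naturally onto a single case statement. The following is a minimal sketch of that idea, not the actual implementation: SketchMyanmarCategory and the mc* constant names are hypothetical, and the ranges are transcribed directly from the table. It assumes Free Pascal in objfpc mode.

```pascal
program MyanmarCategorySketch;

{$mode objfpc}

const
  // Hypothetical names for the category codes listed in the table.
  mcOther            = 0;
  mcConsonant        = 1;
  mcIndependentVowel = 2;
  mcMatra            = 3;
  mcVirama           = 4;
  mcBindu            = 6;
  mcVisarga          = 7;
  mcDigit            = 9;
  mcZWJ              = 10;
  mcZWNJ             = 11;
  mcSign             = 12;

// Hypothetical stand-in for GetMyanmarCategory, transcribed from the table.
function SketchMyanmarCategory(CP: Cardinal): Integer;
begin
  case CP of
    $1000..$1020, $103B..$103E, $103F, $1050..$1055, $105A..$1066,
    $1067..$1070, $1075..$1081, $1082, $108E,
    $AA60..$AA76, $AA7A, $AA7E..$AA7F:
      Result := mcConsonant;
    $1021..$102A:
      Result := mcIndependentVowel;
    $102B..$1035, $1056..$1059, $1071..$1074, $1083..$1086, $109A..$109D:
      Result := mcMatra;
    $1039, $103A:
      Result := mcVirama;
    $1036:
      Result := mcBindu;
    $1038:
      Result := mcVisarga;
    $1040..$1049, $1090..$1099:
      Result := mcDigit;
    $200D:
      Result := mcZWJ;
    $200C:
      Result := mcZWNJ;
    $1037, $1087..$108D, $108F, $AA7B..$AA7D:
      Result := mcSign;
  else
    // Punctuation, symbols, and anything outside the covered blocks.
    Result := mcOther;
  end;
end;

begin
  WriteLn(SketchMyanmarCategory($1000));  // KA
  WriteLn(SketchMyanmarCategory($1031));  // vowel sign E
  WriteLn(SketchMyanmarCategory($103A));  // ASAT
  WriteLn(SketchMyanmarCategory($104A));  // punctuation
end.
```

Under these assumptions the four calls print 1, 3, 4, and 0, matching the Consonant, Matra, Virama, and Other rows.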

Matra positional categories

MatraPos | Position | Myanmar codepoints
1 | Pre-base | U+1031 E (the only pre-base matra in Myanmar; rotated to syllable start per R10)
2 | Post-base | U+102B TALL AA, U+102C AA, U+1056–U+1057 Vocalic R/RR matras, U+1083–U+1084 Shan vowel signs, U+109A–U+109B Shan tone vowels
3 | Above-base | U+102D I, U+102E II, U+1032 AI, U+1033–U+1035 Vocalic R/L variants, U+1071–U+1074 vowel signs, U+1085–U+1086 Shan vowel signs, U+109C Shan tone vowel
4 | Below-base | U+102F U, U+1030 UU, U+1058–U+1059 Vocalic L/LL matras, U+109D Shan tone vowel

Myanmar has no split-matra codepoints (no MatraPos = 5 entries).
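
The positional table can be sketched the same way as a second lookup. This is an illustration only: SketchMyanmarMatraPos is a hypothetical helper (this page does not name the real positional lookup), and returning 0 for non-matra codepoints is an assumption made for the sketch.

```pascal
program MyanmarMatraPosSketch;

{$mode objfpc}

// Hypothetical helper: returns the MatraPos code (1..4) from the table,
// or 0 when CP is not a Myanmar matra. The 0 fallback is an assumption.
function SketchMyanmarMatraPos(CP: Cardinal): Integer;
begin
  case CP of
    $1031:
      Result := 1;  // pre-base: the only such matra in Myanmar
    $102B, $102C, $1056..$1057, $1083..$1084, $109A..$109B:
      Result := 2;  // post-base
    $102D, $102E, $1032, $1033..$1035, $1071..$1074, $1085..$1086, $109C:
      Result := 3;  // above-base
    $102F, $1030, $1058..$1059, $109D:
      Result := 4;  // below-base
  else
    Result := 0;    // not a matra (assumed sentinel)
  end;
end;

begin
  WriteLn(SketchMyanmarMatraPos($1031));  // E
  WriteLn(SketchMyanmarMatraPos($102C));  // AA
  WriteLn(SketchMyanmarMatraPos($102D));  // I
  WriteLn(SketchMyanmarMatraPos($102F));  // U
end.
```

No branch returns 5, which mirrors the absence of split matras in Myanmar.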

Notable Myanmar-specific assignments

  • Kinzi prefix is not a category: the exact 3-CP sequence U+1004 + U+103A + U+1039 at syllable start is recognised by ApplyMyanmarReorder's FSM, not by category codes. The individual codepoints retain their normal categories (NGA=1, ASAT=4, VIRAMA=4).
  • ASAT and VIRAMA both category 4: the FSM does not need to distinguish them since both signal consonant stacking. ASAT used standalone outside the Kinzi prefix stays in BaseBuf at its source position.
  • Medials Y/R/W/H all category 1: U+103B–U+103E are categorised as Consonant, but the reorder pre-pass recognises them by codepoint and routes each to a dedicated slot, emitting them in fixed Y → R → W → H order per R9.
  • U+1082 MEDIAL MON LA is not in the Y/R/W/H medial set. It is categorised as plain consonant (1) and stays in BaseBuf in source order.
  • DOT BELOW vs other tones (all category 12): U+1037 DOT BELOW is the only category-12 codepoint that routes to below-base; all other tone marks (Burmese, Shan, Rumai Palaung, Pa'o Karen) route to above-base.
  • Pre-base vowel count: Myanmar has only one pre-base matra (U+1031 E), reordered per R10. The complexity is in Kinzi (3 CPs), medial sorting (4 slots), and ASAT / VIRAMA handling rather than in vowel routing.
  • Extended-B (U+A9E0–U+A9FF) and Extended-C (U+116D0–U+116FF, supplementary plane): not covered in this phase. Future Phase 8g may add these.
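
Two of the notes above, the Kinzi prefix check and the fixed medial order, can be illustrated in isolation. The sketch below is not ApplyMyanmarReorder: TCodepoints, StartsWithKinzi, IsMedial, and SortMedialRun are hypothetical names, and the medial step swaps in a plain insertion sort over codepoint values instead of the dedicated slots described above. That substitution works only because the codepoint order U+103B < U+103C < U+103D < U+103E coincides with the Y → R → W → H order.

```pascal
program MyanmarReorderSketch;

{$mode objfpc}

type
  TCodepoints = array of Cardinal;  // hypothetical buffer type

// True when the buffer starts with the Kinzi prefix NGA + ASAT + VIRAMA.
// The check is by codepoint; the categories (1, 4, 4) do not identify it.
function StartsWithKinzi(const CPs: TCodepoints): Boolean;
begin
  Result := (Length(CPs) >= 3) and (CPs[0] = $1004) and
            (CPs[1] = $103A) and (CPs[2] = $1039);
end;

// Medials Y/R/W/H are category 1, so they are recognised by codepoint.
function IsMedial(CP: Cardinal): Boolean;
begin
  Result := (CP >= $103B) and (CP <= $103E);
end;

// Sorts the run of medials beginning at Start into Y, R, W, H order.
// Insertion sort by codepoint value; an assumption of this sketch.
procedure SortMedialRun(var CPs: TCodepoints; Start: Integer);
var
  I, J, RunEnd: Integer;
  Tmp: Cardinal;
begin
  // Find the end of the contiguous medial run.
  RunEnd := Start;
  while (RunEnd < Length(CPs)) and IsMedial(CPs[RunEnd]) do
    Inc(RunEnd);
  // Sort only the elements inside the run.
  for I := Start + 1 to RunEnd - 1 do
  begin
    Tmp := CPs[I];
    J := I - 1;
    while (J >= Start) and (CPs[J] > Tmp) do
    begin
      CPs[J + 1] := CPs[J];
      Dec(J);
    end;
    CPs[J + 1] := Tmp;
  end;
end;

var
  Buf: TCodepoints;
begin
  // KA + medial H + medial Y + AA: the medials arrive out of order.
  SetLength(Buf, 4);
  Buf[0] := $1000;
  Buf[1] := $103E;
  Buf[2] := $103B;
  Buf[3] := $102C;
  SortMedialRun(Buf, 1);
  WriteLn((Buf[1] = $103B) and (Buf[2] = $103E));  // medials now Y, then H
  WriteLn(StartsWithKinzi(Buf));                   // no Kinzi prefix here
end.
```

The consonant and the trailing vowel sign stay where they were; only the medial run between them is rearranged.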

Version history

  • v2.120.10 — Introduced in Phase 8f.10.