GetMyanmarCategory
Signature
function GetMyanmarCategory(CP: Cardinal): Integer;
Purpose
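Pure lookup from a Unicode codepoint to its Myanmar syllabic category;
no font state is required. Returns one of 11 category codes, using the
same numbering as GetDevanagariCategory et al. (codes 5 and 8 of that
numbering do not appear in the table below). Covers the Myanmar core
block (U+1000–U+109F) and Myanmar Extended-A (U+AA60–U+AA7F), plus the
joiners U+200C and U+200D.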
Return values
| Code | Category | Example codepoints |
|---|---|---|
| 0 | Other | Unassigned / non-syllable; Myanmar punctuation (U+104A–U+104F); symbols (U+109E–U+109F); Extended-A symbols (U+AA77–U+AA79) |
| 1 | Consonant | Core consonants (U+1000–U+1020), GREAT SA (U+103F), medials Y/R/W/H (U+103B–U+103E; treated as consonants here, sorted by reorder pre-pass), Extended Sanskrit (U+1050–U+1055), Mon (U+105A–U+1066), Sgaw Karen / Western Pwo (U+1067–U+1070), Shan (U+1075–U+1081), MEDIAL MON LA (U+1082), Rumai Palaung FA (U+108E), Extended-A consonants (U+AA60–U+AA76, U+AA7A, U+AA7E–U+AA7F) |
| 2 | Independent vowel | U+1021–U+102A (A..AU) |
| 3 | Matra (dependent vowel sign) | U+102B–U+1035, U+1056–U+1059, U+1071–U+1074, U+1083–U+1086, U+109A–U+109D (Burmese / Mon / Shan / Sgaw Karen / Pwo vowel signs at various positions) |
| 4 | Virama (incl. ASAT) | U+1039 VIRAMA (traditional halant), U+103A ASAT (consonant killer, also used as middle codepoint of the Kinzi 3-CP prefix) |
| 6 | Bindu | U+1036 ANUSVARA |
| 7 | Visarga | U+1038 |
| 9 | Digit | U+1040–U+1049 (Burmese), U+1090–U+1099 (Shan) |
| 10 | ZWJ | U+200D |
| 11 | ZWNJ | U+200C |
| 12 | Above-/below-base sign | U+1037 DOT BELOW (routed to below-base by reorder), tone marks (U+1087–U+108D, U+108F; routed to above-base), Pa'o Karen tone marks (U+AA7B–U+AA7D) |
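
The table above can be read as a plain range lookup. The following is a minimal Free Pascal sketch of that reading, not the library's implementation: the name SketchMyanmarCategory and the mc* constant names are hypothetical, and only the numeric codes and codepoint ranges are taken from the table.

```pascal
program MyanmarCategorySketch;
{$ifdef FPC}{$mode objfpc}{$endif}

const
  // Numeric codes as listed in the table; the constant names are illustrative only.
  mcOther = 0;
  mcConsonant = 1;
  mcIndependentVowel = 2;
  mcMatra = 3;
  mcVirama = 4;
  mcBindu = 6;
  mcVisarga = 7;
  mcDigit = 9;
  mcZWJ = 10;
  mcZWNJ = 11;
  mcSign = 12;

// Hypothetical stand-in for GetMyanmarCategory, built only from the table rows.
function SketchMyanmarCategory(CP: Cardinal): Integer;
begin
  case CP of
    // Core consonants, medials Y/R/W/H, GREAT SA, and the extension consonants.
    $1000..$1020, $103B..$103F, $1050..$1055, $105A..$1070,
    $1075..$1082, $108E, $AA60..$AA76, $AA7A, $AA7E..$AA7F:
      Result := mcConsonant;
    $1021..$102A:
      Result := mcIndependentVowel;
    $102B..$1035, $1056..$1059, $1071..$1074, $1083..$1086, $109A..$109D:
      Result := mcMatra;
    $1039, $103A:
      Result := mcVirama;
    $1036:
      Result := mcBindu;
    $1038:
      Result := mcVisarga;
    $1040..$1049, $1090..$1099:
      Result := mcDigit;
    $200D:
      Result := mcZWJ;
    $200C:
      Result := mcZWNJ;
    $1037, $1087..$108D, $108F, $AA7B..$AA7D:
      Result := mcSign;
  else
    // Punctuation, symbols, and anything outside the covered blocks.
    Result := mcOther;
  end;
end;

begin
  WriteLn(SketchMyanmarCategory($1000)); // KA: 1
  WriteLn(SketchMyanmarCategory($1031)); // vowel sign E: 3
  WriteLn(SketchMyanmarCategory($103A)); // ASAT: 4
  WriteLn(SketchMyanmarCategory($1037)); // DOT BELOW: 12
  WriteLn(SketchMyanmarCategory($104A)); // punctuation: 0
end.
```

A case statement over ranges keeps the sketch close to the table layout; the real function may well use a different representation, such as a lookup array.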
Matra positional categories
| MatraPos | Position | Myanmar codepoints |
|---|---|---|
| 1 | Pre-base | U+1031 E (the only pre-base matra in Myanmar; rotated to syllable start per R10) |
| 2 | Post-base | U+102B TALL AA, U+102C AA, U+1056–U+1057 Vocalic R/RR matras, U+1083–U+1084 Shan vowel signs, U+109A–U+109B Shan tone vowels |
| 3 | Above-base | U+102D I, U+102E II, U+1032 AI, U+1033 MON II, U+1034 MON O, U+1035 E ABOVE, U+1071–U+1074 vowel signs, U+1085–U+1086 Shan vowel signs, U+109C Shan tone vowel |
| 4 | Below-base | U+102F U, U+1030 UU, U+1058–U+1059 Vocalic L/LL matras, U+109D Shan tone vowel |
Myanmar has no split-matra codepoints (no MatraPos = 5 entries).
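
The positional table can be sketched the same way. This is an illustrative Free Pascal sketch only: the function name SketchMyanmarMatraPos is hypothetical, and returning 0 for codepoints that are not matras is an assumption, since the source does not say what the real lookup returns in that case.

```pascal
program MyanmarMatraPosSketch;
{$ifdef FPC}{$mode objfpc}{$endif}

// Hypothetical helper: maps a matra codepoint to the MatraPos values in the table.
// Returning 0 for non-matras is an assumption made for this sketch.
function SketchMyanmarMatraPos(CP: Cardinal): Integer;
begin
  case CP of
    $1031:
      Result := 1; // pre-base (the only one)
    $102B, $102C, $1056..$1057, $1083..$1084, $109A..$109B:
      Result := 2; // post-base
    $102D, $102E, $1032..$1035, $1071..$1074, $1085..$1086, $109C:
      Result := 3; // above-base
    $102F, $1030, $1058..$1059, $109D:
      Result := 4; // below-base
  else
    Result := 0;   // not a matra (assumed sentinel)
  end;
end;

begin
  WriteLn(SketchMyanmarMatraPos($1031)); // 1
  WriteLn(SketchMyanmarMatraPos($102C)); // 2
  WriteLn(SketchMyanmarMatraPos($102D)); // 3
  WriteLn(SketchMyanmarMatraPos($102F)); // 4
  WriteLn(SketchMyanmarMatraPos($1000)); // 0 (consonant, not a matra)
end.
```

No branch returns 5, which mirrors the statement that Myanmar has no split matras.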
Notable Myanmar-specific assignments
- Kinzi prefix is not a category: the exact 3-CP sequence U+1004 + U+103A + U+1039 at syllable start is recognised by ApplyMyanmarReorder's FSM, not by category codes. The individual codepoints retain their normal categories (NGA=1, ASAT=4, VIRAMA=4).
- ASAT and VIRAMA both category 4: the FSM does not need to distinguish them since both signal consonant stacking. ASAT used standalone outside the Kinzi prefix stays in BaseBuf at its source position.
- Medials Y/R/W/H all category 1: U+103B–U+103E are categorised as Consonant, but the reorder pre-pass recognises them by codepoint and routes each to a dedicated slot, emitting them in fixed Y → R → W → H order per R9.
- U+1082 MEDIAL MON LA is not in the Y/R/W/H medial set. It is categorised as plain consonant (1) and stays in BaseBuf in source order.
- DOT BELOW vs other tones (all category 12): U+1037 DOT BELOW is the only category-12 codepoint that routes to below-base; all other tone marks (Burmese, Shan, Rumai Palaung, Pa'o Karen) route to above-base.
- Pre-base vowel count: Myanmar has only one pre-base matra (U+1031 E), reordered per R10. The complexity is in Kinzi (3 CPs), medial sorting (4 slots), and ASAT / VIRAMA handling rather than in vowel routing.
- Extended-B (U+A9E0–U+A9FF) and Extended-C (U+116D0–U+116FF, supplementary plane): not covered in this phase. Future Phase 8g may add these.
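
Two of the points above, Kinzi recognition by exact codepoint sequence and the fixed Y → R → W → H medial order, can be illustrated with a small Free Pascal sketch. It is not the code of ApplyMyanmarReorder: IsKinziAt, MedialSlot, and the Boolean slot array are hypothetical names and structures chosen for the illustration, and the sketch ignores everything else the FSM does.

```pascal
program MyanmarReorderSketch;
{$ifdef FPC}{$mode objfpc}{$endif}

const
  // Slot order Y, R, W, H, matching the fixed emission order described above.
  MedialNames: array[0..3] of string =
    ('MEDIAL YA', 'MEDIAL RA', 'MEDIAL WA', 'MEDIAL HA');

// True when the three codepoints starting at Start are exactly NGA + ASAT + VIRAMA.
function IsKinziAt(const CPs: array of Cardinal; Start: Integer): Boolean;
begin
  Result := (Start >= 0) and (Start + 2 <= High(CPs)) and
            (CPs[Start] = $1004) and
            (CPs[Start + 1] = $103A) and
            (CPs[Start + 2] = $1039);
end;

// Slot index 0..3 for U+103B..U+103E, or -1 for any other codepoint
// (including U+1082 MEDIAL MON LA, which is not part of the medial set).
function MedialSlot(CP: Cardinal): Integer;
begin
  if (CP >= $103B) and (CP <= $103E) then
    Result := Integer(CP - $103B)
  else
    Result := -1;
end;

var
  Kinzi: array[0..3] of Cardinal = ($1004, $103A, $1039, $1000);
  Syllable: array[0..2] of Cardinal = ($1000, $103E, $103B); // KA, medial HA, medial YA
  Present: array[0..3] of Boolean = (False, False, False, False);
  I, Slot: Integer;

begin
  WriteLn(IsKinziAt(Kinzi, 0));    // TRUE
  WriteLn(IsKinziAt(Syllable, 0)); // FALSE

  // Record which medials occur, regardless of their source order.
  for I := Low(Syllable) to High(Syllable) do
  begin
    Slot := MedialSlot(Syllable[I]);
    if Slot >= 0 then
      Present[Slot] := True;
  end;

  // Emit in slot order: prints MEDIAL YA, then MEDIAL HA.
  for I := 0 to 3 do
    if Present[I] then
      WriteLn(MedialNames[I]);
end.
```

Both helpers test codepoints directly rather than category codes, which is the point of the notes above: NGA, ASAT, VIRAMA, and the four medials are indistinguishable from their neighbours by category alone.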
See also
Version history
- v2.120.10 — Introduced in Phase 8f.10.