GetKhmerCategory
Signature
```pascal
function GetKhmerCategory(CP: Cardinal): Integer;
```
Purpose
Pure Unicode-codepoint → Khmer syllabic-category lookup. No font
state is required. Returns one of the 10 category codes listed below, using the same
numbering as GetDevanagariCategory et al. (codes 2, 5 and 8 do not occur for Khmer).
Return values
| Code | Category | Example codepoints |
|---|---|---|
| 0 | Other | Unassigned / non-syllable codepoints; Khmer punctuation (U+17D4–U+17DC); Khmer Lek Attak numerals (U+17F0–U+17F9) |
| 1 | Consonant (incl. independent vowel) | U+1780–U+17B3 (consonant letters U+1780–U+17A2 and independent-vowel letters U+17A3–U+17B3, such as U+17A5 INDEPENDENT VOWEL QI, are bundled here for FSM simplicity) |
| 3 | Matra (dependent vowel sign) | U+17B6–U+17C5 (AA, I/II/Y/YY, U/UU/UA, OE, YA/IE, E/AE/AI, OO/AU) |
| 4 | Virama (incl. COENG) | U+17D1 VIRIAM, U+17D2 COENG (subscript joiner) |
| 6 | Bindu | U+17C6 NIKAHIT |
| 7 | Visarga | U+17C7 REAHMUK, U+17C8 YUUKALEAPINTU |
| 9 | Digit | U+17E0–U+17E9 |
| 10 | ZWJ | U+200D |
| 11 | ZWNJ | U+200C |
| 12 | Above-base sign (incl. register shifters) | U+17C9 MUUSIKATOAN (1st-series register), U+17CA TRIISAP (2nd-series register), U+17CB–U+17D0 various above signs, U+17D3 BATHAMASAT, U+17DD ATTHACAN |
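Because the mapping depends only on the codepoint, the table above can be restated as a single case statement. The Free Pascal program below is a minimal sketch that assumes nothing beyond the ranges listed in the table. `KhmerCategorySketch` is a hypothetical name and is not the library's implementation of GetKhmerCategory.

```pascal
program KhmerCategoryDemo;

{$mode objfpc}{$H+}

// Hypothetical restatement of the GetKhmerCategory table; not the
// library's source. Every range below is copied from the table.
function KhmerCategorySketch(CP: Cardinal): Integer;
begin
  case CP of
    $1780..$17B3: Result := 1;    // consonants and independent vowels
    $17B6..$17C5: Result := 3;    // dependent vowel signs (matras)
    $17C6:        Result := 6;    // NIKAHIT (bindu)
    $17C7, $17C8: Result := 7;    // REAHMUK, YUUKALEAPINTU (visarga)
    $17C9..$17D0, $17D3, $17DD:
                  Result := 12;   // above-base signs, incl. register shifters
    $17D1, $17D2: Result := 4;    // VIRIAM, COENG (virama class)
    $17E0..$17E9: Result := 9;    // Khmer digits
    $200D:        Result := 10;   // ZWJ
    $200C:        Result := 11;   // ZWNJ
  else
    Result := 0;                  // punctuation, Lek Attak, Khmer Symbols, other
  end;
end;

begin
  WriteLn(KhmerCategorySketch($1780));  // KA: 1
  WriteLn(KhmerCategorySketch($17D2));  // COENG: 4
  WriteLn(KhmerCategorySketch($17CA));  // TRIISAP: 12
  WriteLn(KhmerCategorySketch($17D4));  // KHAN (punctuation): 0
  WriteLn(KhmerCategorySketch($19E0));  // Khmer Symbols block: 0
end.
```

A case statement keeps each row of the table visible in the source. The compiler also rejects duplicate case labels, so two rows cannot silently claim the same codepoint. A lookup array indexed by `CP - $1780` would be an equally reasonable choice for the Khmer block itself.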
Matra positional categories
| MatraPos | Position | Khmer codepoints |
|---|---|---|
| 1 | Pre-base | U+17BE OE, U+17C1 E, U+17C2 AE, U+17C3 AI, U+17C4 OO, U+17C5 AU |
| 2 | Post-base | U+17B6 AA, U+17BF YA, U+17C0 IE |
| 3 | Above-base | U+17B7 I, U+17B8 II, U+17B9 Y, U+17BA YY |
| 4 | Below-base | U+17BB U, U+17BC UU, U+17BD UA |
Khmer has no split-matra codepoints (no MatraPos = 5 entries). Vowels whose visual rendering spans multiple positions (such as U+17BE OE, Top_And_Left, or U+17C4 OO, Left_And_Right) are categorised by their logical reorder position. Their remaining visual components (top, right) are handled by GSUB at render time.
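The following simplified Free Pascal sketch illustrates what "logical reorder position" means for a pre-base vowel. It assumes two hypothetical routines. `KhmerMatraPosSketch` is built from the positional table above. `ReorderPreBase` moves pre-base matras to the front of one already-segmented syllable. The library's real reorder pass may handle more cases than this. Only the codepoint order changes here; visual components such as the top part of OE are still left to GSUB.

```pascal
program KhmerMatraReorderDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  TCodepoints = array of Cardinal;

// Hypothetical MatraPos lookup mirroring the positional table:
// 1 = pre-base, 2 = post-base, 3 = above-base, 4 = below-base, 0 = not a matra.
function KhmerMatraPosSketch(CP: Cardinal): Integer;
begin
  case CP of
    $17BE, $17C1..$17C5: Result := 1;
    $17B6, $17BF, $17C0: Result := 2;
    $17B7..$17BA:        Result := 3;
    $17BB..$17BD:        Result := 4;
  else
    Result := 0;
  end;
end;

// Hypothetical, simplified logical reorder for one syllable: pre-base
// matras move to the front; everything else (base consonant, COENG pairs,
// other signs) keeps its relative order.
function ReorderPreBase(const Syllable: array of Cardinal): TCodepoints;
var
  I, N: Integer;
begin
  SetLength(Result, Length(Syllable));
  N := 0;
  for I := 0 to High(Syllable) do      // pass 1: pre-base matras
    if KhmerMatraPosSketch(Syllable[I]) = 1 then
    begin
      Result[N] := Syllable[I];
      Inc(N);
    end;
  for I := 0 to High(Syllable) do      // pass 2: everything else, in order
    if KhmerMatraPosSketch(Syllable[I]) <> 1 then
    begin
      Result[N] := Syllable[I];
      Inc(N);
    end;
end;

var
  Reordered: TCodepoints;
  I: Integer;
begin
  // KA + COENG + SA + E: the pre-base vowel E moves ahead of the base block.
  Reordered := ReorderPreBase([$1780, $17D2, $179F, $17C1]);
  for I := 0 to High(Reordered) do
    Write(IntToHex(Integer(Reordered[I]), 4), ' ');
  WriteLn;  // prints: 17C1 1780 17D2 179F
end.
```

The COENG + SA pair stays adjacent behind the base consonant because pass 2 preserves the original order of every non-pre-base codepoint.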
Notable Khmer-specific assignments
- COENG (U+17D2) returns category 4 (virama), so the FSM continues the syllable through stacked-consonant subscripts. The reorder pre-pass keeps COENG + Consonant pairs together in the base block.
- VIRIAM (U+17D1) is also category 4 but is functionally distinct from COENG: it does not stack a following consonant. The syllable FSM tells the two apart by checking the previous codepoint itself (U+17D1 vs U+17D2), not its category.
- Register shifters (U+17C9 MUUSIKATOAN, U+17CA TRIISAP) return category 12, sharing routing with the other above-base signs.
- Consonants and independent vowels (U+1780–U+17B3) are both category 1. Both start a syllable, so bundling them simplifies the FSM without affecting reorder output.
- The Khmer Symbols block (U+19E0–U+19FF, lunar date symbols) is out of scope for the Khmer shaper; codepoints in that range return category 0.
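The COENG/VIRIAM distinction can be sketched as a syllable counter that inspects the previous codepoint whenever it meets a category-1 codepoint. This Free Pascal program is a hypothetical illustration, not the library's FSM. `CountSyllables` and `IsSyllableStarter` are invented names, and every non-starter codepoint is simply attached to the current syllable.

```pascal
program KhmerSyllableDemo;

{$mode objfpc}{$H+}

const
  COENG  = $17D2;
  VIRIAM = $17D1;

// Hypothetical helper: true for the category-1 range (consonants and
// independent vowels) from the GetKhmerCategory table.
function IsSyllableStarter(CP: Cardinal): Boolean;
begin
  Result := (CP >= $1780) and (CP <= $17B3);
end;

// Hypothetical: counts syllable starts. A category-1 codepoint opens a new
// syllable unless the previous codepoint is COENG. VIRIAM shares category 4
// with COENG but does not suppress the break.
function CountSyllables(const Codepoints: array of Cardinal): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to High(Codepoints) do
    if IsSyllableStarter(Codepoints[I]) and
       not ((I > 0) and (Codepoints[I - 1] = COENG)) then
      Inc(Result);
end;

begin
  // KA COENG KA: the subscript KA stays in the first syllable, prints 1
  WriteLn(CountSyllables([$1780, COENG, $1780]));
  // KA VIRIAM KA: VIRIAM does not stack, so the second KA starts a syllable, prints 2
  WriteLn(CountSyllables([$1780, VIRIAM, $1780]));
end.
```

Both inputs look identical at the category level (1, 4, 1). Only the concrete previous codepoint separates the two cases, which is the check the list above describes.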
Version history
- v2.120.9 — Introduced in Phase 8f.9.