GetKhmerCategory

Signature

function GetKhmerCategory(CP: Cardinal): Integer;

Purpose

Pure Unicode-codepoint → Khmer syllabic-category lookup; no font state is required. Returns one of the 10 category codes listed below, using the same numbering as GetDevanagariCategory et al.

Return values

Code | Category | Example codepoints
0 | Other | Unassigned / non-syllable codepoints; Khmer punctuation (U+17D4–U+17DC); Khmer Lek Attak numerals (U+17F0–U+17F9)
1 | Consonant (incl. independent vowel) | U+1780–U+17B3: consonant letters (U+1780–U+17A2) and independent-vowel letters (U+17A3–U+17B3, e.g. U+17A5 INDEPENDENT VOWEL QI) are bundled here for FSM simplicity
3 | Matra (dependent vowel sign) | U+17B6–U+17C5 (AA, I/II/Y/YY, U/UU/UA, OE, YA/IE, E/AE/AI, OO/AU)
4 | Virama (incl. COENG) | U+17D1 VIRIAM, U+17D2 COENG (subscript joiner)
6 | Bindu | U+17C6 NIKAHIT
7 | Visarga | U+17C7 REAHMUK, U+17C8 YUUKALEAPINTU
9 | Digit | U+17E0–U+17E9
10 | ZWJ | U+200D
11 | ZWNJ | U+200C
12 | Above-base sign (incl. register shifters) | U+17C9 MUUSIKATOAN (shifts to 1st series), U+17CA TRIISAP (shifts to 2nd series), U+17CB–U+17D0 various above signs, U+17D3 BATHAMASAT, U+17DD ATTHACAN
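The table maps directly onto a single case statement. The following is a minimal Free Pascal / Delphi sketch written from the table alone, not the shipped implementation; the kc* constant names are hypothetical. Every codepoint not covered by a row, including the Khmer Symbols block, falls through to 0.

```pascal
program KhmerCategorySketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

const
  // Category codes from the Return values table (constant names are hypothetical)
  kcOther     = 0;
  kcConsonant = 1;
  kcMatra     = 3;
  kcVirama    = 4;
  kcBindu     = 6;
  kcVisarga   = 7;
  kcDigit     = 9;
  kcZWJ       = 10;
  kcZWNJ      = 11;
  kcAboveBase = 12;

// Sketch of the lookup described by the table
function GetKhmerCategory(CP: Cardinal): Integer;
begin
  case CP of
    $1780..$17B3: Result := kcConsonant;  // consonants + independent vowels
    $17B6..$17C5: Result := kcMatra;      // dependent vowel signs
    $17C6:        Result := kcBindu;      // NIKAHIT
    $17C7, $17C8: Result := kcVisarga;    // REAHMUK, YUUKALEAPINTU
    $17C9..$17D0: Result := kcAboveBase;  // register shifters + other above signs
    $17D1, $17D2: Result := kcVirama;     // VIRIAM, COENG
    $17D3, $17DD: Result := kcAboveBase;  // BATHAMASAT, ATTHACAN
    $17E0..$17E9: Result := kcDigit;
    $200D:        Result := kcZWJ;
    $200C:        Result := kcZWNJ;
  else
    Result := kcOther;  // punctuation, Lek Attak, Khmer Symbols, everything else
  end;
end;

begin
  WriteLn(GetKhmerCategory($1780));  // 1  (KA)
  WriteLn(GetKhmerCategory($17D2));  // 4  (COENG)
  WriteLn(GetKhmerCategory($17C9));  // 12 (MUUSIKATOAN)
  WriteLn(GetKhmerCategory($19E0));  // 0  (Khmer Symbols block, out of scope)
end.
```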

Matra positional categories

MatraPos | Position | Khmer codepoints
1 | Pre-base | U+17BE OE, U+17C1 E, U+17C2 AE, U+17C3 AI, U+17C4 OO, U+17C5 AU
2 | Post-base | U+17B6 AA, U+17BF YA, U+17C0 IE
3 | Above-base | U+17B7 I, U+17B8 II, U+17B9 Y, U+17BA YY
4 | Below-base | U+17BB U, U+17BC UU, U+17BD UA
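This page does not name the function that returns MatraPos, so the sketch below uses a hypothetical GetKhmerMatraPos and assumes 0 as the "not a matra" result. Only the values 1 to 4 come from the table.

```pascal
program KhmerMatraPosSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper; the name and the 0 = "not a matra" convention are assumptions
function GetKhmerMatraPos(CP: Cardinal): Integer;
begin
  case CP of
    $17BE, $17C1..$17C5: Result := 1;  // pre-base: OE, E, AE, AI, OO, AU
    $17B6, $17BF, $17C0: Result := 2;  // post-base: AA, YA, IE
    $17B7..$17BA:        Result := 3;  // above-base: I, II, Y, YY
    $17BB..$17BD:        Result := 4;  // below-base: U, UU, UA
  else
    Result := 0;  // not a Khmer matra; 5 (split) is never returned
  end;
end;

begin
  WriteLn(GetKhmerMatraPos($17C1));  // 1 (E, pre-base)
  WriteLn(GetKhmerMatraPos($17BB));  // 4 (U, below-base)
  WriteLn(GetKhmerMatraPos($1780));  // 0 (KA is not a matra)
end.
```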

No Khmer codepoint is assigned MatraPos = 5 (split matra). Vowels whose visual rendering spans several positions (for example U+17BE OE, Top_And_Left, or U+17C4 OO, Left_And_Right) are categorised by their logical reorder position; their remaining top or right components are handled by GSUB at render time.
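To illustrate what "logical reorder position" means for a pre-base vowel, here is a hedged sketch. It assumes the syllable has already been segmented and that the reorder step places a MatraPos = 1 vowel in front of the base block, as Indic-style shapers commonly do; IsPreBaseMatra and ReorderPreBase are hypothetical names. Only the matra moves, so a COENG + consonant pair keeps its internal order.

```pascal
program KhmerPreBaseReorderSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

type
  TCodepoints = array of Cardinal;

var
  Syl: TCodepoints;
  I: Integer;

// MatraPos = 1 (pre-base) codepoints from the Matra positional table
function IsPreBaseMatra(CP: Cardinal): Boolean;
begin
  Result := (CP = $17BE) or ((CP >= $17C1) and (CP <= $17C5));
end;

// Hypothetical reorder step for one syllable: moves the first pre-base matra
// to index 0 and shifts everything before it right by one slot, so the base
// block (consonant plus any COENG + consonant pairs) stays in order.
procedure ReorderPreBase(var S: TCodepoints);
var
  K, J: Integer;
  M: Cardinal;
begin
  for K := 1 to High(S) do
    if IsPreBaseMatra(S[K]) then
    begin
      M := S[K];
      for J := K downto 1 do
        S[J] := S[J - 1];
      S[0] := M;
      Exit;
    end;
end;

begin
  // KA + COENG + SA + VOWEL SIGN E, in logical (typed) order
  SetLength(Syl, 4);
  Syl[0] := $1780;
  Syl[1] := $17D2;
  Syl[2] := $179F;
  Syl[3] := $17C1;
  ReorderPreBase(Syl);
  for I := 0 to High(Syl) do
    Write(IntToHex(Integer(Syl[I]), 4), ' ');
  WriteLn;  // prints 17C1 1780 17D2 179F
end.
```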

Notable Khmer-specific assignments

  • COENG (U+17D2) returns category 4 (virama) so the FSM continues the syllable through stacked-consonant subscripts. The reorder pre-pass keeps COENG + Consonant pairs together in the base block.
  • VIRIAM (U+17D1) is also category 4 but is functionally distinct from COENG: it does not stack a following consonant. Because the category alone cannot separate the two, the syllable FSM checks the actual codepoint preceding a consonant (U+17D2 vs U+17D1).
  • Register shifters (U+17C9 MUUSIKATOAN, U+17CA TRIISAP) return category 12, sharing routing with other above-base signs.
  • Consonants and independent vowels (U+1780–U+17B3) are both category 1; both start a syllable, so bundling simplifies the FSM without affecting reorder output.
  • The Khmer Symbols block (U+19E0–U+19FF, lunar date symbols) is out of scope for the Khmer shaper; codepoints in that range return category 0.
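Because COENG and VIRIAM share category 4, the category sequence 1, 4, 1 is identical for a stacked and an unstacked pair; only the codepoint itself tells them apart. A minimal sketch, assuming the FSM decides whether a category-1 codepoint is a stacked subscript by looking at the codepoint immediately before it; IsSubscriptConsonant is a hypothetical name.

```pascal
program KhmerCoengVsViriamSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

const
  COENG  = $17D2;  // subscript joiner, category 4
  VIRIAM = $17D1;  // also category 4, but never stacks

// Hypothetical helper: True when Syl[I] is a category-1 codepoint
// (U+1780..U+17B3) directly preceded by COENG. A consonant after VIRIAM
// yields False even though VIRIAM has the same category.
function IsSubscriptConsonant(const Syl: array of Cardinal; I: Integer): Boolean;
begin
  Result := (I > 0) and (I <= High(Syl))
    and (Syl[I] >= $1780) and (Syl[I] <= $17B3)
    and (Syl[I - 1] = COENG);
end;

begin
  // KA + COENG + KA: the second KA is a stacked subscript
  WriteLn(IsSubscriptConsonant([$1780, COENG, $1780], 2));
  // KA + VIRIAM + KA: same categories, but no stacking
  WriteLn(IsSubscriptConsonant([$1780, VIRIAM, $1780], 2));
end.
```

Short-circuit Boolean evaluation (the default in both Free Pascal and Delphi) keeps Syl[I - 1] from being read when I = 0.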

See also

Version history

  • v2.120.9 — Introduced in Phase 8f.9.