ApplyKhmerReorder
Signature
function ApplyKhmerReorder(const Wide: UnicodeString): UnicodeString;
Purpose
Applies the Khmer reorder pre-pass to Wide and returns
the reordered UnicodeString ready for cmap + GSUB consumption.
Non-Khmer content passes through byte-identical. Khmer is the first
South-East Asian script registered in HotPDF and uses an
independent syllable structure distinct from the
Brahmic R1–R5 family handled by Phases 8f.1–8f.8.
Khmer specifics
- NO Repha: Khmer does not form a Repha visual. Ra + COENG + Consonant stays in original order rather than rotating to the end of the cluster.
- COENG (U+17D2) is a subscript joiner: each COENG + Consonant pair forms a stacked-consonant cluster. The pair stays in the base buffer in original order and the font's GSUB 'pres'/'blws' features handle subscript positioning. Nested coeng (C + COENG + C + COENG + C) is supported by the FSM.
- VIRIAM (U+17D1) is a separate virama-like sign distinct from COENG: it does not stack a following consonant. The syllable FSM tracks the previous codepoint (not just the previous category) so it can distinguish COENG from VIRIAM when deciding whether a trailing consonant continues this syllable.
- Pre-base vowels (MatraPos = 1): E (U+17C1), AE (U+17C2), AI (U+17C3), OE (U+17BE), OO (U+17C4), AU (U+17C5). OE / OO / AU have visual top or right components that are GSUB-rendered; the reorder pre-pass moves only the logical pre-base component to syllable start.
- Register shifters: MUUSIKATOAN (U+17C9, 1st series) and TRIISAP (U+17CA, 2nd series). These plus other signs (U+17CB–U+17D0, U+17D3, U+17DD) are categorised as 12 and route to the above-base buffer.
- Bindu (NIKAHIT U+17C6) is categorised as 6 and routes to above-base.
- Visarga (REAHMUK U+17C7, YUUKALEAPINTU U+17C8) is categorised as 7 and routes to post-base.
- Above-base vowels (MatraPos = 3): I (U+17B7), II (U+17B8), Y (U+17B9), YY (U+17BA).
- Below-base vowels (MatraPos = 4): U (U+17BB), UU (U+17BC), UA (U+17BD).
- Post-base vowels (MatraPos = 2): AA (U+17B6), YA (U+17BF), IE (U+17C0).
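The codepoint groups listed above, and the COENG versus VIRIAM distinction, can be illustrated with a small standalone program. This is only a sketch under stated assumptions: TSketchClass, SketchClassify and ConsonantContinuesSyllable are hypothetical names invented for illustration and are not part of HotPDF, the enum labels do not correspond to the library's numeric categories, and the consonant range U+1780..U+17A2 comes from the Unicode Khmer block rather than from this page. Treating every consonant that does not follow COENG as the start of a new syllable is a simplification.

```pascal
program KhmerClassifySketch;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}

type
  // Hypothetical labels for this sketch only; they are NOT HotPDF's
  // numeric categories (the text above names only 6, 7 and 12).
  TSketchClass = (scOther, scConsonant, scPreVowel, scPostVowel,
                  scAboveVowel, scBelowVowel, scAboveSign, scBindu,
                  scVisarga, scViriam, scCoeng);

const
  ClassNames: array[TSketchClass] of string = (
    'other', 'consonant', 'pre-vowel', 'post-vowel', 'above-vowel',
    'below-vowel', 'above-sign', 'bindu', 'visarga', 'viriam', 'coeng');

// Maps a codepoint to one of the groups described in the list above.
function SketchClassify(CP: Cardinal): TSketchClass;
begin
  case CP of
    $1780..$17A2:               Result := scConsonant;   // KA..QA (Unicode block)
    $17B6, $17BF, $17C0:        Result := scPostVowel;   // AA, YA, IE
    $17B7..$17BA:               Result := scAboveVowel;  // I, II, Y, YY
    $17BB..$17BD:               Result := scBelowVowel;  // U, UU, UA
    $17BE, $17C1..$17C5:        Result := scPreVowel;    // OE, E, AE, AI, OO, AU
    $17C6:                      Result := scBindu;       // NIKAHIT
    $17C7, $17C8:               Result := scVisarga;     // REAHMUK, YUUKALEAPINTU
    $17C9..$17D0, $17D3, $17DD: Result := scAboveSign;   // register shifters + signs
    $17D1:                      Result := scViriam;
    $17D2:                      Result := scCoeng;
  else
    Result := scOther;
  end;
end;

// Looks at the previous CODEPOINT, not its category: a consonant extends
// the current syllable only directly after COENG (U+17D2). After VIRIAM
// (U+17D1) or anything else it is treated here as starting a new syllable.
function ConsonantContinuesSyllable(PrevCP, CP: Cardinal): Boolean;
begin
  Result := (SketchClassify(CP) = scConsonant) and (PrevCP = $17D2);
end;

begin
  WriteLn(ClassNames[SketchClassify($17C1)]);         // E: pre-vowel
  WriteLn(ClassNames[SketchClassify($17C9)]);         // MUUSIKATOAN: above-sign
  WriteLn(ConsonantContinuesSyllable($17D2, $1781));  // COENG + KHA: continues
  WriteLn(ConsonantContinuesSyllable($17D1, $1781));  // VIRIAM + KHA: does not
end.
```

Passing the previous codepoint rather than a category is what lets the predicate separate the two virama-like signs even if a category table were to group them together.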
Reorder behavior
- Pre-base vowels emit before the base block (analogous to Brahmic R2).
- Above-base vowels emit after the base block.
- Below-base vowels emit after the above-base block.
- Post-base vowels and Visarga emit after the below-base block.
- Register shifters, Bindu, and other above signs all route to the above-base block so they render after the consonant stack.
- Consonant cluster (consonants + COENG pairs) stays in the base block in original logical order; GSUB 'pres'/'blws' handle the visual subscript stacking.
- NO Repha extraction: Khmer syllables containing Ra + COENG + Consonant stay verbatim in the base block.
Output layout per syllable: [pre-vowels] + [consonants + COENG pairs] + [above-vowels / register-shifters / Bindu / signs] + [below-vowels] + [post-vowels / Visarga]. Single-pass; idempotent on simple inputs.
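The per-syllable layout can be sketched as a five-buffer routing pass. The program below is a minimal illustration, not HotPDF's implementation: SketchReorderSyllable and HexDump are hypothetical helpers, the input is assumed to be a single already-segmented syllable, each split vowel (OE, OO, AU) is moved as a whole codepoint, and VIRIAM is left in the base buffer because its routing is not stated here.

```pascal
program KhmerRouteSketch;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: reorders ONE already-segmented Khmer syllable by
// routing each code unit to a buffer and concatenating the buffers.
function SketchReorderSyllable(const Syl: UnicodeString): UnicodeString;
var
  PreBuf, BaseBuf, AboveBuf, BelowBuf, PostBuf: UnicodeString;
  I: Integer;
  Ch: WideChar;
begin
  PreBuf := ''; BaseBuf := ''; AboveBuf := ''; BelowBuf := ''; PostBuf := '';
  for I := 1 to Length(Syl) do
  begin
    Ch := Syl[I];
    case Ord(Ch) of
      // Pre-base vowels: OE, E, AE, AI, OO, AU
      $17BE, $17C1..$17C5:
        PreBuf := PreBuf + Ch;
      // Above-base vowels, Bindu, register shifters and other signs
      $17B7..$17BA, $17C6, $17C9..$17D0, $17D3, $17DD:
        AboveBuf := AboveBuf + Ch;
      // Below-base vowels: U, UU, UA
      $17BB..$17BD:
        BelowBuf := BelowBuf + Ch;
      // Post-base vowels (AA, YA, IE) and Visarga
      $17B6, $17BF, $17C0, $17C7, $17C8:
        PostBuf := PostBuf + Ch;
    else
      // Consonants and COENG keep their logical order in the base buffer.
      // Assumption: VIRIAM and anything unlisted also stay here.
      BaseBuf := BaseBuf + Ch;
    end;
  end;
  Result := PreBuf + BaseBuf + AboveBuf + BelowBuf + PostBuf;
end;

// Hypothetical helper: space-separated hex dump of the UTF-16 code units.
function HexDump(const S: UnicodeString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[I]), 4);
  end;
end;

var
  Once: UnicodeString;
begin
  WriteLn(HexDump(SketchReorderSyllable(#$1780#$17C1)));            // 17C1 1780
  WriteLn(HexDump(SketchReorderSyllable(#$1780#$17D2#$1781#$17B6))); // 1780 17D2 1781 17B6
  WriteLn(HexDump(SketchReorderSyllable(#$1780#$17B6#$17B7)));      // 1780 17B7 17B6

  // Running the pass on its own output leaves this input unchanged.
  Once := SketchReorderSyllable(#$1780#$17C1);
  if SketchReorderSyllable(Once) = Once then
    WriteLn('idempotent')
  else
    WriteLn('not idempotent');
end.
```

Because routing depends only on each codepoint and the buffers are concatenated in a fixed order, the sketch is a stable bucket sort: marks within the same buffer keep their relative order, which is why the COENG cluster comes out verbatim.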
Example
var
  Wide: UnicodeString;
begin
  // Input: KA (U+1780) + E-vowel (U+17C1, pre-base)
  Wide := Doc.ApplyKhmerReorder(#$1780#$17C1);
  // Wide is now: E (U+17C1) + KA (U+1780)

  // Input: KA + COENG + KHA + AA-vowel (stacked consonant + post vowel)
  Wide := Doc.ApplyKhmerReorder(#$1780#$17D2#$1781#$17B6);
  // Wide unchanged: KA + COENG + KHA + AA
  // (COENG cluster stays in BaseBuf in original order; AA in PostBuf)
end;
See also
- ApplyIndicReorder: total dispatcher.
- ApplyDevanagariReorder: Devanagari counterpart.
- ApplyBengaliReorder: Bengali counterpart.
- ApplyGujaratiReorder: Gujarati counterpart.
- ApplyTamilReorder: Tamil counterpart.
- ApplyTeluguReorder: Telugu counterpart.
- ApplyKannadaReorder: Kannada counterpart.
- ApplyMalayalamReorder: Malayalam counterpart.
- ApplySinhalaReorder: Sinhala counterpart.
- GetKhmerCategory: Unicode codepoint → category lookup.
Standards
- Unicode 16.0 §16.4 (Khmer)
- Unicode 16.0 IndicSyllabicCategory.txt, IndicPositionalCategory.txt
- ISO 32000-1 §9.10 (extraction of text content)
- OpenType Khmer shaping spec (script tag 'khmr')
Version history
- v2.120.9 — Introduced in Phase 8f.9. Complete shaper with COENG subscript handling, register shifters, and pre-base vowel rotation. Khmer becomes the ninth registered Indic script and the first South-East Asian script in the registry.