ApplyMalayalamReorder
Signature
function ApplyMalayalamReorder(const Wide: UnicodeString): UnicodeString;
Purpose
Applies the Malayalam reorder pre-pass to Wide and returns
the reordered UnicodeString ready for cmap + GSUB consumption.
Non-Malayalam content passes through byte-identical.
Malayalam specifics
- R1 Repha enabled: when Ra (U+0D30) + Halant (U+0D4D, CHANDRAKKALA) occurs at syllable start, the pair is stripped from the cluster and re-emitted after the reordered output so that the font's 'rphf' GSUB feature can substitute the Repha glyph.
- I-matra (U+0D3F) is post-base, as in Tamil and in contrast to Devanagari / Bengali / Gujarati, where I-matra is pre-base. The I-matra stays in the post-base buffer and is not moved to the front of the syllable.
- Pre-base matras (MatraPos = 1): E (U+0D46), EE (U+0D47), AI (U+0D48). These are true pre-base matras, emitted at the start of the reordered syllable.
- Three split matras, with Unicode 16.0 canonical decompositions:
  - U+0D4A O → U+0D46 (pre) + U+0D3E (post)
  - U+0D4B OO → U+0D47 (pre) + U+0D3E (post)
  - U+0D4C AU → U+0D46 (pre) + U+0D57 (post)
- Below-base matras (MatraPos = 4): U (U+0D41), UU (U+0D42), Vocalic R (U+0D43), Vocalic RR (U+0D44), Vocalic L matra (U+0D62), Vocalic LL matra (U+0D63).
- Post-base matras (MatraPos = 2): AA (U+0D3E), I (U+0D3F), II (U+0D40), AU length mark (U+0D57).
- Chillu letters (U+0D54–U+0D56, U+0D7A–U+0D7F): pure consonants with no halant requirement; classified as Consonant (category 1).
- DOT REPH (U+0D4E): Malayalam-specific letter, classified as Consonant per Unicode 16.0 IndicSyllabicCategory.
- Halant (CHANDRAKKALA) is U+0D4D.
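The matra classes and split-matra decompositions listed above can be pictured as two small lookups. The following is a minimal sketch, not the library's implementation: SketchMatraPos and SketchSplitMatra are hypothetical names invented for illustration, and only the MatraPos values (1, 2, 4) and code points come from this page.

```pascal
program MalayalamMatraSketch;
{$ifdef FPC}{$mode delphi}{$endif}
{$ifndef FPC}{$APPTYPE CONSOLE}{$endif}

uses
  SysUtils;

// Hypothetical helper (not part of the documented API): returns the MatraPos
// value this page lists for a Malayalam vowel sign, or 0 for anything else.
function SketchMatraPos(Cp: Integer): Integer;
begin
  case Cp of
    $0D46..$0D48:
      Result := 1;  // E, EE, AI: pre-base
    $0D3E..$0D40, $0D57:
      Result := 2;  // AA, I, II, AU length mark: post-base
    $0D41..$0D44, $0D62, $0D63:
      Result := 4;  // U, UU, vocalic R / RR / L / LL: below-base
  else
    Result := 0;    // not a matra covered by this sketch
  end;
end;

// Hypothetical helper: canonical decomposition of the three split matras.
// Returns False (and zeroed parts) when Cp is not a split matra.
function SketchSplitMatra(Cp: Integer; out PrePart, PostPart: Integer): Boolean;
begin
  Result := True;
  case Cp of
    $0D4A: begin PrePart := $0D46; PostPart := $0D3E; end;  // O  = E  + AA
    $0D4B: begin PrePart := $0D47; PostPart := $0D3E; end;  // OO = EE + AA
    $0D4C: begin PrePart := $0D46; PostPart := $0D57; end;  // AU = E  + AU length mark
  else
    PrePart := 0;
    PostPart := 0;
    Result := False;
  end;
end;

var
  PrePart, PostPart: Integer;
begin
  // I-matra is post-base in Malayalam, so this prints 2.
  WriteLn(SketchMatraPos($0D3F));
  // OO splits into EE (pre) + AA (post), so this prints 0D47 + 0D3E.
  if SketchSplitMatra($0D4B, PrePart, PostPart) then
    WriteLn(IntToHex(PrePart, 4), ' + ', IntToHex(PostPart, 4));
end.
```

For real category lookups, GetMalayalamCategory (listed under See also) is the documented entry point; its exact signature is not shown on this page.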
Reorder rules applied
- R1 Repha: Ra + Halant at syllable start re-emitted at the end of the syllable.
- R2 Pre-base matras: E / EE / AI emit before the base block.
- R4 Below-base matras: U / UU / Vocalic R / RR / Vocalic L / LL matras emit after the above-base block.
- R5 Post-base matras: AA / I (Malayalam!) / II / AU-length-mark emit after the below-base block.
- Split matra decomposition: three splits all routed to pre + post components.
Output layout per syllable: [pre-matras] + [base + halant + bindu/visarga/modifier] + [above-matras] + [below-matras] + [post-matras] + [Repha: Ra Halant]?. Conjuncts (C + Halant + C) preserved in the base block. Single-pass and idempotent.
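To make the output layout concrete, here is a rough single-syllable sketch of how the buffers might be filled and concatenated. It is an illustration under several assumptions, not the shipped shaper: SketchReorderSyllable and ToHex are hypothetical names, the input is assumed to be exactly one syllable, Repha is only detected when something follows Ra + Halant, the above-matra buffer is omitted because this page lists no above-base Malayalam matras, and every code point that is not a listed matra is kept in the base block.

```pascal
program MalayalamLayoutSketch;
{$ifdef FPC}{$mode delphi}{$endif}
{$ifndef FPC}{$APPTYPE CONSOLE}{$endif}

uses
  SysUtils;

const
  RA     = #$0D30;
  HALANT = #$0D4D;

// Hypothetical single-syllable reorder following the documented layout:
// [pre-matras] + [base block] + [below-matras] + [post-matras] + [Repha].
function SketchReorderSyllable(const Syl: UnicodeString): UnicodeString;
var
  Pre, Base, Below, Post, Repha: UnicodeString;
  I, Start: Integer;
  Ch: WideChar;
begin
  Pre := ''; Base := ''; Below := ''; Post := ''; Repha := '';
  Start := 1;
  // R1: Ra + Halant at syllable start is set aside and emitted last.
  // Assumption: at least one more code point must follow the pair.
  if (Length(Syl) >= 3) and (Syl[1] = RA) and (Syl[2] = HALANT) then
  begin
    Repha := Copy(Syl, 1, 2);
    Start := 3;
  end;
  for I := Start to Length(Syl) do
  begin
    Ch := Syl[I];
    case Ord(Ch) of
      $0D46..$0D48:                       // R2: E, EE, AI
        Pre := Pre + Ch;
      $0D4A:                              // split O = E + AA
        begin Pre := Pre + WideChar($0D46); Post := Post + WideChar($0D3E); end;
      $0D4B:                              // split OO = EE + AA
        begin Pre := Pre + WideChar($0D47); Post := Post + WideChar($0D3E); end;
      $0D4C:                              // split AU = E + AU length mark
        begin Pre := Pre + WideChar($0D46); Post := Post + WideChar($0D57); end;
      $0D41..$0D44, $0D62, $0D63:         // R4: below-base matras
        Below := Below + Ch;
      $0D3E..$0D40, $0D57:                // R5: post-base matras (includes I)
        Post := Post + Ch;
    else
      Base := Base + Ch;                  // consonants, halant, bindu, visarga
    end;
  end;
  Result := Pre + Base + Below + Post + Repha;
end;

// Prints code units as hex so the result does not depend on console encoding.
function ToHex(const S: UnicodeString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Integer(Ord(S[I])), 4);
  end;
end;

begin
  // KA + O-matra: prints 0D46 0D15 0D3E
  WriteLn(ToHex(SketchReorderSyllable(#$0D15#$0D4A)));
  // RA + HALANT + KA + I-matra: prints 0D15 0D3F 0D30 0D4D
  WriteLn(ToHex(SketchReorderSyllable(#$0D30#$0D4D#$0D15#$0D3F)));
end.
```

The first call mirrors the Example section on this page; the second shows the Repha pair trailing the syllable while the I-matra stays post-base. Syllable segmentation, which the real pre-pass must perform before any of this, is deliberately left out.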
Example
var
  Wide: UnicodeString;
begin
  // Input: KA (U+0D15) + O-matra (U+0D4A, split into pre + post)
  Wide := Doc.ApplyMalayalamReorder(#$0D15#$0D4A);
  // Wide is now: E (U+0D46, pre) + KA + AA (U+0D3E, post)
end;
See also
- ApplyIndicReorder: total dispatcher.
- ApplyDevanagariReorder: Devanagari counterpart.
- ApplyBengaliReorder: Bengali counterpart.
- ApplyGujaratiReorder: Gujarati counterpart.
- ApplyTamilReorder: Tamil counterpart (also has post-base I-matra).
- ApplyTeluguReorder: Telugu counterpart.
- ApplyKannadaReorder: Kannada counterpart.
- GetMalayalamCategory: Unicode codepoint → category lookup.
Standards
- Unicode 16.0 §12.9 (Malayalam)
- Unicode 16.0 IndicSyllabicCategory.txt, IndicPositionalCategory.txt, and UnicodeData.txt (canonical decomposition source)
- ISO 32000-1 §9.10 (extraction of text content)
- OpenType Malayalam shaping spec (script tag 'mlym')
Version history
- v2.119.76: Introduced in Phase 8f.7. Complete shaper (R1 + R2 + R4 + R5 + three split-matra decompositions). Malayalam becomes the seventh registered Indic script. Includes dedicated support for chillu consonants (U+0D54–U+0D56, U+0D7A–U+0D7F) and the Malayalam DOT REPH letter (U+0D4E).