ApplyMalayalamReorder

Signature

function ApplyMalayalamReorder(const Wide: UnicodeString): UnicodeString;

Purpose

Applies the Malayalam reorder pre-pass to Wide and returns the reordered UnicodeString ready for cmap + GSUB consumption. Non-Malayalam content passes through byte-identical.
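The pass-through behavior can be pictured as a run splitter. Only code units that belong to Malayalam are handed to the reorder step, and everything else is copied verbatim. The following is a minimal sketch under that assumption, not the actual implementation. It assumes that "Malayalam content" means the Unicode Malayalam block U+0D00..U+0D7F. The names TRunReorder, ReorderMalayalamRuns and MarkRun are hypothetical. Joiners such as ZWJ/ZWNJ are not treated specially here.

```pascal
program MalayalamPassThroughSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical callback type standing in for the per-run Malayalam reorder.
  TRunReorder = function(const Run: UnicodeString): UnicodeString;

// Assumption: "Malayalam content" = code units in the Malayalam block U+0D00..U+0D7F.
function IsMalayalamCodeUnit(C: WideChar): Boolean;
begin
  Result := (Ord(C) >= $0D00) and (Ord(C) <= $0D7F);
end;

// Hands each maximal Malayalam run to Reorder; copies every other code unit unchanged.
function ReorderMalayalamRuns(const Wide: UnicodeString; Reorder: TRunReorder): UnicodeString;
var
  I, RunStart: Integer;
begin
  Result := '';
  I := 1;
  while I <= Length(Wide) do
  begin
    if IsMalayalamCodeUnit(Wide[I]) then
    begin
      RunStart := I;
      while (I <= Length(Wide)) and IsMalayalamCodeUnit(Wide[I]) do
        Inc(I);
      Result := Result + Reorder(Copy(Wide, RunStart, I - RunStart));
    end
    else
    begin
      Result := Result + Wide[I];  // non-Malayalam code unit: passed through as-is
      Inc(I);
    end;
  end;
end;

// Demo callback (hypothetical): brackets each run so the hand-off is visible.
function MarkRun(const Run: UnicodeString): UnicodeString;
begin
  Result := '[' + Run + ']';
end;

var
  Src, Got: UnicodeString;
begin
  Src := 'ab';
  Src := Src + WideChar($0D15) + 'c';        // "ab" + KA + "c"
  Got := ReorderMalayalamRuns(Src, MarkRun);
  WriteLn(Length(Got));                      // 6: a, b, [, KA, ], c
  WriteLn(Got[1] = 'a', ' ', Got[6] = 'c');  // TRUE TRUE
end.
```

Splitting at block boundaries keeps the Latin "ab" and "c" untouched. Only the KA run reaches the callback.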

Malayalam specifics

  • R1 Repha enabled: a Ra (U+0D30) + Halant (U+0D4D CHANDRAKKALA) pair at syllable start is detected. The pair is stripped from the cluster and re-emitted after the rest of the reordered syllable, so the font's 'rphf' GSUB feature can substitute the Repha glyph.
  • I-matra (U+0D3F) is POST-base. Devanagari, Bengali and Gujarati place the I-matra before the base. Malayalam, like Tamil, writes it to the right of the base, so it stays in the post-base buffer and is never moved to the front of the syllable.
  • Pre-base matras (MatraPos = 1): E (U+0D46), EE (U+0D47), AI (U+0D48). These are true pre-base matras and are emitted at the start of the reordered syllable.
  • Three split matras with Unicode 16.0 canonical decompositions:
    • U+0D4A O → U+0D46 (pre) + U+0D3E (post).
    • U+0D4B OO → U+0D47 (pre) + U+0D3E (post).
    • U+0D4C AU → U+0D46 (pre) + U+0D57 (post).
  • Below-base matras (MatraPos = 4): U (U+0D41), UU (U+0D42), Vocalic R (U+0D43), Vocalic RR (U+0D44), Vocalic L matra (U+0D62), Vocalic LL matra (U+0D63).
  • Post-base matras (MatraPos = 2): AA (U+0D3E), I (U+0D3F), II (U+0D40), AU length mark (U+0D57).
  • Chillu letters (U+0D54..U+0D56, U+0D7A..U+0D7F): pure consonants with no halant requirement; classified as Consonant (category 1).
  • DOT REPH (U+0D4E): Malayalam-specific letter, classified as Consonant (category 1) by this shaper. Unicode 16.0 IndicSyllabicCategory.txt itself lists it as Consonant_Preceding_Repha.
  • Halant (CHANDRAKKALA) is U+0D4D.
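The classification rules above can be summarized as a small lookup. This is a minimal sketch that uses the MatraPos values listed in this section (1 = pre, 2 = post, 4 = below). The value 0 for "not a matra" is an assumption of the sketch. The helper names MalayalamMatraPos and IsMalayalamConsonant are hypothetical. The regular consonant range U+0D15..U+0D3A (KA..TTTA) is taken from the Unicode Malayalam block rather than from the shaper's own tables.

```pascal
program MalayalamClassifySketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

const
  MatraNone  = 0;  // assumption: "not a simple matra"
  MatraPre   = 1;  // values 1, 2, 4 follow the documentation
  MatraPost  = 2;
  MatraBelow = 4;

// Hypothetical helper: positional class of a simple (non-split) Malayalam matra.
function MalayalamMatraPos(C: WideChar): Integer;
begin
  case Ord(C) of
    $0D46..$0D48:               Result := MatraPre;    // E, EE, AI
    $0D3E..$0D40, $0D57:        Result := MatraPost;   // AA, I, II, AU length mark
    $0D41..$0D44, $0D62..$0D63: Result := MatraBelow;  // U, UU, vocalic R/RR/L/LL
  else
    Result := MatraNone;  // includes split matras U+0D4A..U+0D4C (decomposed separately)
  end;
end;

// Hypothetical helper: category 1 (Consonant) = regular consonants, DOT REPH, chillus.
function IsMalayalamConsonant(C: WideChar): Boolean;
begin
  case Ord(C) of
    $0D15..$0D3A,               // KA..TTTA (assumed range)
    $0D4E,                      // DOT REPH
    $0D54..$0D56, $0D7A..$0D7F: // chillu letters
      Result := True;
  else
    Result := False;
  end;
end;

begin
  WriteLn(MalayalamMatraPos(#$0D3F));     // 2: I-matra is post-base
  WriteLn(MalayalamMatraPos(#$0D46));     // 1: E-matra is pre-base
  WriteLn(MalayalamMatraPos(#$0D4A));     // 0: split matra, handled elsewhere
  WriteLn(IsMalayalamConsonant(#$0D7B));  // TRUE: CHILLU N
end.
```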

Reorder rules applied

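  • R1 Repha: a Ra + Halant pair at syllable start is re-emitted at the end of the syllable.
  • R2 Pre-base matras: E / EE / AI are emitted before the base block.
  • R4 Below-base matras: U / UU / Vocalic R / RR / Vocalic L / LL matras are emitted after the above-base block. Malayalam has no above-base matras, so there is no above-base matra rule (R3) for this script.
  • R5 Post-base matras: AA / I (post-base in Malayalam) / II / AU length mark are emitted after the below-base block.
  • Split matra decomposition: each of the three split matras is decomposed. Its first component is routed to the pre-base block (R2) and its second to the post-base block (R5).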
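Split-matra routing can be sketched as follows. This is an illustration only, not the shaper's code. SplitMalayalamMatra is a hypothetical helper that hard-codes the three canonical decompositions listed earlier. ReorderConsonantPlusSplit (also hypothetical) covers only the two-code-point case: one consonant followed by one split matra.

```pascal
program MalayalamSplitMatraSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper: returns True plus the (pre, post) components of a
// Malayalam split matra, per its canonical decomposition in UnicodeData.txt.
function SplitMalayalamMatra(C: WideChar; out Pre, Post: WideChar): Boolean;
begin
  Result := True;
  case Ord(C) of
    $0D4A: begin Pre := #$0D46; Post := #$0D3E; end;  // O  = E  + AA
    $0D4B: begin Pre := #$0D47; Post := #$0D3E; end;  // OO = EE + AA
    $0D4C: begin Pre := #$0D46; Post := #$0D57; end;  // AU = E  + AU length mark
  else
    Pre := #0;
    Post := #0;
    Result := False;
  end;
end;

// Hypothetical: [pre] + [base] + [post] for "consonant + split matra" only;
// any other input is returned unchanged.
function ReorderConsonantPlusSplit(const S: UnicodeString): UnicodeString;
var
  Pre, Post: WideChar;
begin
  if (Length(S) = 2) and SplitMalayalamMatra(S[2], Pre, Post) then
  begin
    SetLength(Result, 3);
    Result[1] := Pre;   // R2: pre-base component goes first
    Result[2] := S[1];  // base consonant
    Result[3] := Post;  // R5: post-base component goes last
  end
  else
    Result := S;
end;

begin
  // KA + O-matra becomes E + KA + AA
  WriteLn(ReorderConsonantPlusSplit(#$0D15#$0D4A) = #$0D46#$0D15#$0D3E);  // TRUE
end.
```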

Output layout per syllable: [pre-matras] + [base + halant + bindu/visarga/modifier] + [above-matras] + [below-matras] + [post-matras] + [Repha: Ra Halant]?, where the trailing ? marks the Repha pair as optional. Conjuncts (C + Halant + C) are preserved in the base block. The reorder is single-pass and idempotent.
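The per-syllable layout can be sketched as bucket assembly over one syllable. This is a minimal sketch under stated assumptions, not the actual implementation. ReorderSyllable and IsConsonant are hypothetical names. The input is assumed to be a single, already-segmented syllable, with syllable segmentation itself not shown. Repha is assumed to require a consonant after the Ra + Halant pair. The above-base bucket is omitted because Malayalam has no above-base matras.

```pascal
program MalayalamSyllableSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

const
  Ra     = #$0D30;
  Halant = #$0D4D;  // CHANDRAKKALA

// Hypothetical: consonants, DOT REPH and chillus (category 1 in the docs).
function IsConsonant(C: WideChar): Boolean;
begin
  case Ord(C) of
    $0D15..$0D3A, $0D4E, $0D54..$0D56, $0D7A..$0D7F: Result := True;
  else
    Result := False;
  end;
end;

// Hypothetical: reorders ONE syllable into
// [pre] + [base] + [below] + [post] + [Ra Halant]?
function ReorderSyllable(const S: UnicodeString): UnicodeString;
var
  PreBuf, BaseBuf, BelowBuf, PostBuf: UnicodeString;
  I, Start: Integer;
  HasRepha: Boolean;
  C: WideChar;
begin
  PreBuf := ''; BaseBuf := ''; BelowBuf := ''; PostBuf := '';
  // R1 (assumed condition): Ra + Halant at syllable start, followed by a consonant.
  HasRepha := (Length(S) >= 3) and (S[1] = Ra) and (S[2] = Halant)
              and IsConsonant(S[3]);
  if HasRepha then Start := 3 else Start := 1;
  for I := Start to Length(S) do
  begin
    C := S[I];
    case Ord(C) of
      $0D4A: begin PreBuf := PreBuf + #$0D46; PostBuf := PostBuf + #$0D3E; end;  // O
      $0D4B: begin PreBuf := PreBuf + #$0D47; PostBuf := PostBuf + #$0D3E; end;  // OO
      $0D4C: begin PreBuf := PreBuf + #$0D46; PostBuf := PostBuf + #$0D57; end;  // AU
      $0D46..$0D48:               PreBuf := PreBuf + C;      // R2: E, EE, AI
      $0D41..$0D44, $0D62..$0D63: BelowBuf := BelowBuf + C;  // R4
      $0D3E..$0D40, $0D57:        PostBuf := PostBuf + C;    // R5: AA, I, II, AU mark
    else
      BaseBuf := BaseBuf + C;  // consonants, halant, bindu/visarga, other modifiers
    end;
  end;
  Result := PreBuf + BaseBuf + BelowBuf + PostBuf;
  if HasRepha then
    Result := Result + Ra + Halant;  // Repha pair re-emitted last
end;

begin
  // KA + O-matra: E + KA + AA
  WriteLn(ReorderSyllable(#$0D15#$0D4A) = #$0D46#$0D15#$0D3E);              // TRUE
  // Ra + Halant + KA + I-matra: KA + I + Ra + Halant (I stays post-base)
  WriteLn(ReorderSyllable(#$0D30#$0D4D#$0D15#$0D3F) = #$0D15#$0D3F#$0D30#$0D4D);  // TRUE
end.
```

A conjunct such as KA + Halant + SSA falls through to the else branch, so it stays together in the base bucket.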

Example

var
  Wide: UnicodeString;
begin
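  // Doc is the object exposing ApplyMalayalamReorder (declared elsewhere).
  // Input: KA (U+0D15) + O-matra (U+0D4A, split into pre + post)
  Wide := Doc.ApplyMalayalamReorder(#$0D15#$0D4A);
  // Wide is now: E (U+0D46, pre) + KA (U+0D15) + AA (U+0D3E, post),
  // that is #$0D46#$0D15#$0D3E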
end;

See also

Standards

  • Unicode 16.0 §12.10 (Malayalam)
  • Unicode 16.0 IndicSyllabicCategory.txt, IndicPositionalCategory.txt, and UnicodeData.txt (canonical decomposition source)
  • ISO 32000-1 §9.10 (extraction of text content)
  • OpenType Malayalam shaping spec (script tag 'mlym')

Version history

  • v2.119.76: Introduced in Phase 8f.7. Complete shaper (R1 + R2 + R4 + R5 + three split-matra decompositions). Malayalam becomes the seventh registered Indic script. Includes dedicated support for chillu consonants (U+0D54..U+0D56, U+0D7A..U+0D7F) and the Malayalam DOT REPH letter (U+0D4E).