ApplyKannadaReorder

Signature

function ApplyKannadaReorder(const Wide: UnicodeString): UnicodeString;

Purpose

Applies the Kannada reorder pre-pass to Wide and returns the reordered UnicodeString ready for cmap + GSUB consumption. Non-Kannada content passes through byte-identical.
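
Because non-Kannada content passes through unchanged, a caller may want to skip the call for text that contains no Kannada at all. The following is a minimal sketch of such a pre-check, not part of the documented API: ContainsKannada is a hypothetical helper, and it assumes "Kannada content" means any code unit in the Kannada block (U+0C80 to U+0CFF).

```pascal
program KannadaScanSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

// Hypothetical helper (not part of the documented API): returns True if S
// contains at least one code unit from the Kannada block U+0C80..U+0CFF.
function ContainsKannada(const S: UnicodeString): Boolean;
var
  I: Integer;
begin
  Result := False;
  for I := 1 to Length(S) do
    if (Ord(S[I]) >= $0C80) and (Ord(S[I]) <= $0CFF) then
    begin
      Result := True;
      Exit;
    end;
end;

begin
  WriteLn(ContainsKannada('abc'));         // Latin-only input
  WriteLn(ContainsKannada(#$0C95#$0CCB));  // KA + OO matra
end.
```

The pre-check is optional: calling ApplyKannadaReorder on text without Kannada is already described above as a pass-through.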

Kannada specifics

  • R1 Repha enabled: when Ra (U+0CB0) + Halant (U+0CCD) occurs at syllable start, the pair is stripped from the cluster and re-emitted after the rest of the reordered syllable, so that the font's 'rphf' GSUB feature can substitute the Repha glyph.
  • No pre-base matras: I (U+0CBF) and E (U+0CC6) are above-base in Kannada, not pre-base. The pre-base buffer is always empty for valid Kannada text.
  • Five split matras with Unicode 16.0 canonical decompositions:
    • U+0CC0 II → U+0CBF (above) + U+0CD5 (post-base length mark).
    • U+0CC7 EE → U+0CC6 (above) + U+0CD5 (post-base length mark).
    • U+0CC8 AI → U+0CC6 (above) + U+0CD6 (above-base AI length mark) — both components above-base.
    • U+0CCA O → U+0CC6 (above) + U+0CC2 (post-base UU).
    • U+0CCB OO → U+0CC6 (above) + U+0CC2 (post) + U+0CD5 (post) — three-part split, unique among Phase 8f scripts.
  • Above-base matras: I (U+0CBF), E (U+0CC6), AU (U+0CCC), AI length mark (U+0CD6).
  • Below-base matras: Vocalic R / RR (U+0CC3 / U+0CC4), Vocalic L / LL (U+0CE2 / U+0CE3).
  • Post-base matras: AA (U+0CBE), U / UU (U+0CC1 / U+0CC2), post-base length mark (U+0CD5).
  • Halant is U+0CCD.
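
The five decompositions listed above can be written as a small lookup. This is a minimal sketch rather than the library's actual code: DecomposeKannadaMatra is a hypothetical name, and the mapping is copied directly from the list in this section (U+0CCB is shown fully decomposed into its three parts).

```pascal
program SplitMatraSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

// Hypothetical helper (not part of the documented API): returns the
// components of a Kannada split matra, or the input character unchanged
// if it is not one of the five split matras.
function DecomposeKannadaMatra(Ch: WideChar): UnicodeString;
begin
  case Ord(Ch) of
    $0CC0: Result := #$0CBF#$0CD5;        // II -> I + length mark
    $0CC7: Result := #$0CC6#$0CD5;        // EE -> E + length mark
    $0CC8: Result := #$0CC6#$0CD6;        // AI -> E + AI length mark
    $0CCA: Result := #$0CC6#$0CC2;        // O  -> E + UU
    $0CCB: Result := #$0CC6#$0CC2#$0CD5;  // OO -> E + UU + length mark
  else
    Result := Ch;
  end;
end;

begin
  // The OO matra expands to three code units.
  WriteLn(Length(DecomposeKannadaMatra(#$0CCB)));  // prints 3
end.
```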

Reorder rules applied

  • R1 Repha: Ra + Halant at syllable start re-emitted at the end of the syllable.
  • R3 Above-base matras: I / E / AU / AI-length-mark emit after the base block.
  • R4 Below-base matras: Vocalic R / RR / L / LL emit after the above-base block.
  • R5 Post-base matras: AA / U / UU / post-base length mark emit after the below-base block.
  • Split matra decomposition: the five split matras are decomposed and each component is routed to its own block: above + post (II, EE, O), above + above (AI), and above + post + post (OO, the three-part split).

Output layout per syllable: [base + halant + bindu/visarga/modifier] + [above-matras] + [below-matras] + [post-matras] + [Repha: Ra Halant]?. Conjuncts (C + Halant + C) preserved in the base block. Single-pass and idempotent.
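
The per-syllable layout above can be illustrated with a simplified sketch. It is not the library's implementation: ReorderSyllableSketch and ClassifyMatra are hypothetical names, the input is assumed to be exactly one syllable whose split matras are already decomposed, and every character that is not a listed matra (consonants, halant, bindu, visarga, modifiers) is assumed to belong to the base block.

```pascal
program SyllableLayoutSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

type
  TMatraClass = (mcNone, mcAbove, mcBelow, mcPost);

// Classification copied from the matra lists in this section.
function ClassifyMatra(Ch: WideChar): TMatraClass;
begin
  case Ord(Ch) of
    $0CBF, $0CC6, $0CCC, $0CD6: Result := mcAbove;
    $0CC3, $0CC4, $0CE2, $0CE3: Result := mcBelow;
    $0CBE, $0CC1, $0CC2, $0CD5: Result := mcPost;
  else
    Result := mcNone;
  end;
end;

// Hypothetical helper: reorders ONE syllable (assumption: syllable
// segmentation and split-matra decomposition happened earlier).
function ReorderSyllableSketch(const Syl: UnicodeString): UnicodeString;
var
  BaseBlock, Above, Below, Post, Repha: UnicodeString;
  I, Start: Integer;
begin
  BaseBlock := ''; Above := ''; Below := ''; Post := ''; Repha := '';
  Start := 1;
  // R1: Ra + Halant at syllable start is held back and emitted last.
  if (Length(Syl) >= 2) and (Syl[1] = #$0CB0) and (Syl[2] = #$0CCD) then
  begin
    Repha := Copy(Syl, 1, 2);
    Start := 3;
  end;
  // R3 / R4 / R5: sort each remaining character into its block.
  for I := Start to Length(Syl) do
    case ClassifyMatra(Syl[I]) of
      mcAbove: Above := Above + Syl[I];
      mcBelow: Below := Below + Syl[I];
      mcPost:  Post := Post + Syl[I];
    else
      BaseBlock := BaseBlock + Syl[I];
    end;
  Result := BaseBlock + Above + Below + Post + Repha;
end;

begin
  // Illustrative input: Ra + Halant + KA + AA (post) + I (above).
  // Expected layout: KA + I + AA + Ra + Halant.
  if ReorderSyllableSketch(#$0CB0#$0CCD#$0C95#$0CBE#$0CBF) =
     #$0C95#$0CBF#$0CBE#$0CB0#$0CCD then
    WriteLn('ok')
  else
    WriteLn('mismatch');
end.
```

The input sequence is chosen only to exercise every block in one call and is not meant as orthographically valid Kannada.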

Example

var
  Wide: UnicodeString;
begin
  // Input: KA (U+0C95) + OO-matra (U+0CCB, three-part split)
  Wide := Doc.ApplyKannadaReorder(#$0C95#$0CCB);
  // Wide is now: KA + E (U+0CC6, above) + UU (U+0CC2, post) + length-mark (U+0CD5, post)
end;

See also

Standards

  • Unicode 16.0 §12.8 (Kannada)
  • Unicode 16.0 IndicSyllabicCategory.txt, IndicPositionalCategory.txt, and UnicodeData.txt (canonical decomposition source)
  • ISO 32000-1 §9.10 (extraction of text content)
  • OpenType Kannada shaping spec (script tag 'knda')

Version history

  • v2.119.75 — Introduced in Phase 8f.6. Complete shaper (R1 + R3 + R4 + R5 + five split-matra decompositions including a three-part split for U+0CCB OO). Kannada becomes the sixth registered Indic script.