`ApplyTeluguReorder`

Signature

function ApplyTeluguReorder(const Wide: UnicodeString): UnicodeString;

Purpose

Applies the Telugu reorder pre-pass to Wide and returns the reordered UnicodeString ready for cmap + GSUB consumption. Non-Telugu content passes through byte-identical.

Telugu specifics

R1 Repha enabled: when Ra (U+0C30) + Halant (U+0C4D) occurs at syllable start, the pair is stripped from the cluster and re-emitted after the reordered output, so that the font's 'rphf' GSUB feature can substitute the Repha glyph (as in Devanagari, Bengali, and Gujarati).
No pre-base matras: the pre-base buffer is always empty for valid Telugu text. All Telugu matras are above-base, below-base, or split. This is Telugu’s most distinctive trait among the registered Indic scripts.
Nine above-base matras: AA (U+0C3E), I (U+0C3F), II (U+0C40), E (U+0C46), EE (U+0C47), O (U+0C4A), OO (U+0C4B), AU (U+0C4C), and the length mark (U+0C55).
Below-base matras: U / UU / Vocalic R / Vocalic RR (U+0C41–U+0C44), the AI length mark (U+0C56), and the Vocalic L / LL matras (U+0C62–U+0C63).
One split matra: U+0C48 AI decomposes into U+0C46 (E, above-base) + U+0C56 (AI length mark, below-base) at reorder time. This is the first registered split matra that routes to the above and below buffers rather than the pre + post split seen in Bengali and Tamil.
Halant is U+0C4D.
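
The matra classes listed above can be expressed as a small lookup. The following is a minimal sketch only: the type TTeluguMatraClass and the function ClassifyTeluguMatra are hypothetical names introduced for illustration and are not part of the documented API. The code point ranges are taken directly from the list above.

```pascal
program TeluguMatraClassDemo;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical type; the library's internal classification is not shown here.
  TTeluguMatraClass = (tmcNone, tmcAbove, tmcBelow, tmcSplit);

// Maps a UTF-16 code unit to the matra class described in this section.
function ClassifyTeluguMatra(Ch: WideChar): TTeluguMatraClass;
begin
  case Ord(Ch) of
    // AA, I, II, E, EE, O, OO, AU, length mark
    $0C3E..$0C40, $0C46, $0C47, $0C4A..$0C4C, $0C55:
      Result := tmcAbove;
    // U, UU, Vocalic R, Vocalic RR, AI length mark, Vocalic L, Vocalic LL
    $0C41..$0C44, $0C56, $0C62, $0C63:
      Result := tmcBelow;
    // AI: decomposed into U+0C46 + U+0C56 at reorder time
    $0C48:
      Result := tmcSplit;
  else
    // Consonants, Halant, modifiers, and all non-Telugu content
    Result := tmcNone;
  end;
end;

begin
  WriteLn(Ord(ClassifyTeluguMatra(#$0C3F))); // 1 (tmcAbove)
  WriteLn(Ord(ClassifyTeluguMatra(#$0C41))); // 2 (tmcBelow)
  WriteLn(Ord(ClassifyTeluguMatra(#$0C48))); // 3 (tmcSplit)
  WriteLn(Ord(ClassifyTeluguMatra(#$0C15))); // 0 (tmcNone, KA is a consonant)
end.
```

Note that there is no pre-base class at all, which mirrors the empty pre-base buffer described above.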

Reorder rules applied

R1 Repha: Ra + Halant at syllable start re-emitted at the end of the syllable.
R3 Above-base matras: AA / I / II / E / EE / O / OO / AU / length mark emit after the base block.
R4 Below-base matras: U / UU / Vocalic R / RR, AI length mark, Vocalic L / LL matras emit after the above-base block.
Split matra decomposition: U+0C48 AI routed to above + below buffers as documented above.
No R2 pre-base and no R5 post-base emissions are produced under normal Telugu input.

Output layout per syllable: [base + halant + bindu/visarga/modifier] + [above-matras] + [below-matras] + [Repha: Ra Halant]? (the trailing ? marks the Repha block as optional). Conjuncts (C + Halant + C) are preserved in the base block. The pass is single-pass and idempotent.
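
The per-syllable layout can be illustrated with a simplified buffer-based sketch. This is not the library's implementation: ReorderTeluguSyllableSketch is a hypothetical function, it assumes its input is already exactly one syllable (syllable segmentation is not covered in this document), and it assumes Repha is recognised only when Ra + Halant is followed by further content. It makes no attempt to reproduce the idempotence guarantee of the real function.

```pascal
program TeluguSyllableLayoutDemo;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

const
  TeluguRa     = #$0C30;
  TeluguHalant = #$0C4D;

// Hypothetical sketch: reorders ONE pre-segmented syllable into
// [base block] + [above-matras] + [below-matras] + [Ra Halant]?
function ReorderTeluguSyllableSketch(const Syl: UnicodeString): UnicodeString;
var
  Base, Above, Below, Repha: UnicodeString;
  I, Start: Integer;
  Ch: WideChar;
begin
  Base := ''; Above := ''; Below := ''; Repha := '';
  Start := 1;
  // R1 (assumed condition): Ra + Halant at the start, with more content after it.
  if (Length(Syl) > 2) and (Syl[1] = TeluguRa) and (Syl[2] = TeluguHalant) then
  begin
    Repha := Copy(Syl, 1, 2);
    Start := 3;
  end;
  for I := Start to Length(Syl) do
  begin
    Ch := Syl[I];
    case Ord(Ch) of
      // Split matra AI: E goes to the above buffer, AI length mark to the below buffer.
      $0C48:
        begin
          Above := Above + WideChar($0C46);
          Below := Below + WideChar($0C56);
        end;
      // R3: above-base matras
      $0C3E..$0C40, $0C46, $0C47, $0C4A..$0C4C, $0C55:
        Above := Above + Ch;
      // R4: below-base matras
      $0C41..$0C44, $0C56, $0C62, $0C63:
        Below := Below + Ch;
    else
      // Consonants, Halant in conjuncts, bindu/visarga/modifiers, anything else
      Base := Base + Ch;
    end;
  end;
  Result := Base + Above + Below + Repha;
end;

begin
  // KA + AI (split) becomes KA + E + AI length mark
  WriteLn(ReorderTeluguSyllableSketch(#$0C15#$0C48) = #$0C15#$0C46#$0C56); // TRUE
  // RA + HALANT + KA + I becomes KA + I + RA + HALANT
  WriteLn(ReorderTeluguSyllableSketch(#$0C30#$0C4D#$0C15#$0C3F) =
    #$0C15#$0C3F#$0C30#$0C4D); // TRUE
  // Non-Telugu content is left unchanged
  WriteLn(ReorderTeluguSyllableSketch('abc') = 'abc'); // TRUE
end.
```

Because every non-matra character falls through to the base buffer, a conjunct such as C + Halant + C stays together in the base block, as the layout above requires.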

Example

var
  Wide: UnicodeString;
begin
  // Input: KA (U+0C15) + AI-matra (U+0C48, split)
  Wide := Doc.ApplyTeluguReorder(#$0C15#$0C48);
  // Wide is now: KA + E (U+0C46, above) + AI-mark (U+0C56, below)
end;
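
When checking a result like the one above, it can help to print the string as a list of code points, since reordered Telugu text is hard to verify by eye. The helper below is a hypothetical debugging aid, not part of the documented API. It prints UTF-16 code units, which coincide with code points for the Telugu block because the block lies entirely in the BMP.

```pascal
program DumpCodeUnitsDemo;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: formats each UTF-16 code unit of S as U+XXXX,
// separated by single spaces.
function CodeUnitsToText(const S: UnicodeString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + 'U+' + IntToHex(Integer(Ord(S[I])), 4);
  end;
end;

begin
  // The expected output of the example above: KA + E + AI length mark
  WriteLn(CodeUnitsToText(#$0C15#$0C46#$0C56)); // U+0C15 U+0C46 U+0C56
end.
```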

Standards

Unicode 16.0 §12.7 (Telugu)
Unicode 16.0 IndicSyllabicCategory.txt and IndicPositionalCategory.txt
ISO 32000-1 §9.10 (extraction of text content)
OpenType Telugu shaping spec (script tag 'telu')

Version history

v2.119.74: Introduced in Phase 8f.5. Complete shaper (R1 + R3 + R4 + one split-matra decomposition; no pre-base matras, which is Telugu-specific). Telugu becomes the fifth registered Indic script.
