ApplyTeluguReorder

Signature

function ApplyTeluguReorder(const Wide: UnicodeString): UnicodeString;

Purpose

Applies the Telugu reorder pre-pass to Wide and returns the reordered UnicodeString ready for cmap + GSUB consumption. Non-Telugu content passes through byte-identical.

Telugu specifics

  • R1 Repha enabled: a Ra (U+0C30) + Halant (U+0C4D) pair at syllable start is detected, stripped from the cluster, and re-emitted at the end of the reordered syllable so that the font's 'rphf' GSUB feature can substitute the Repha glyph (as in Devanagari, Bengali, and Gujarati).
  • No pre-base matras: the pre-base buffer is always empty for valid Telugu text. All Telugu matras are above-base, below-base, or split. This is Telugu’s most distinctive trait among the registered Indic scripts.
  • Many above-base matras: AA (U+0C3E), I (U+0C3F), II (U+0C40), E (U+0C46), EE (U+0C47), O (U+0C4A), OO (U+0C4B), AU (U+0C4C), and length mark (U+0C55).
  • Below-base matras: U / UU / Vocalic R / Vocalic RR (U+0C41–U+0C44), AI length mark (U+0C56), Vocalic L / LL matras (U+0C62–U+0C63).
  • One split matra: U+0C48 AI decomposes into U+0C46 (E, above-base) + U+0C56 (AI length mark, below-base) at reorder time. This is the first registered split that routes to above + below rather than the pre + post split seen in Bengali / Tamil.
  • Halant is U+0C4D.
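
The split-matra bullet above can be illustrated with a short, self-contained sketch. DecomposeTeluguAI and ToCodePoints are hypothetical helper names, not part of the documented API, and the in-place replacement is a simplification: the actual pre-pass is described as routing the two halves to separate above and below buffers, which this sketch does not model.

```pascal
program TeluguSplitMatraSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper (an assumption for illustration only):
// replaces every U+0C48 (AI matra) with U+0C46 (E) + U+0C56 (AI length mark).
// All other code units are copied through unchanged.
function DecomposeTeluguAI(const S: UnicodeString): UnicodeString;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    if S[I] = #$0C48 then
      Result := Result + #$0C46 + #$0C56
    else
      Result := Result + S[I];
end;

// Hypothetical helper: formats a string as space-separated U+XXXX values.
function ToCodePoints(const S: UnicodeString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + 'U+' + IntToHex(Integer(Ord(S[I])), 4);
  end;
end;

begin
  // KA (U+0C15) + AI matra (U+0C48)
  WriteLn(ToCodePoints(DecomposeTeluguAI(#$0C15#$0C48)));
  // prints: U+0C15 U+0C46 U+0C56
end.
```

Strings that contain no U+0C48 come back unchanged, which matches the pass-through behaviour described for non-Telugu content.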

Reorder rules applied

  • R1 Repha: Ra + Halant at syllable start is re-emitted at the end of the syllable.
  • R3 Above-base matras: AA / I / II / E / EE / O / OO / AU / length mark emit after the base block.
  • R4 Below-base matras: U / UU / Vocalic R / RR, AI length mark, Vocalic L / LL matras emit after the above-base block.
  • Split matra decomposition: U+0C48 AI routed to above + below buffers as documented above.
  • No R2 pre-base and no R5 post-base emissions are produced under normal Telugu input.

Output layout per syllable: [base + halant + bindu/visarga/modifier] + [above-matras] + [below-matras] + [Repha: Ra Halant]?, where the trailing ? marks the Repha block as optional. Conjuncts (C + Halant + C) are preserved in the base block. The pre-pass is single-pass and idempotent.
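
The per-syllable layout can be sketched as a simple bucket pass. This is a minimal illustration under stated assumptions, not the library's implementation: ReorderTeluguSyllable and ToCodePoints are hypothetical names, the input is assumed to be exactly one syllable (syllable segmentation is not shown), and the matra classes are taken from the lists in "Telugu specifics". The sketch does not attempt to reproduce the idempotence guarantee.

```pascal
program TeluguSyllableLayoutSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

const
  TeluguRa     = #$0C30;
  TeluguHalant = #$0C4D;

// Hypothetical helper (assumption): reorders ONE pre-segmented syllable into
// [base block] + [above-matras] + [below-matras] + [Ra Halant]?.
function ReorderTeluguSyllable(const Syl: UnicodeString): UnicodeString;
var
  BaseBuf, AboveBuf, BelowBuf, RephaBuf: UnicodeString;
  I, Start: Integer;
  Ch: WideChar;
begin
  BaseBuf := '';
  AboveBuf := '';
  BelowBuf := '';
  RephaBuf := '';
  Start := 1;

  // R1: a leading Ra + Halant with more content after it is held back.
  if (Length(Syl) > 2) and (Syl[1] = TeluguRa) and (Syl[2] = TeluguHalant) then
  begin
    RephaBuf := Copy(Syl, 1, 2);
    Start := 3;
  end;

  for I := Start to Length(Syl) do
  begin
    Ch := Syl[I];
    case Ord(Ch) of
      // R3: above-base matras and length mark
      $0C3E..$0C40, $0C46, $0C47, $0C4A..$0C4C, $0C55:
        AboveBuf := AboveBuf + Ch;
      // R4: below-base matras and AI length mark
      $0C41..$0C44, $0C56, $0C62, $0C63:
        BelowBuf := BelowBuf + Ch;
      // Split matra AI: E goes above, AI length mark goes below
      $0C48:
        begin
          AboveBuf := AboveBuf + #$0C46;
          BelowBuf := BelowBuf + #$0C56;
        end;
    else
      // Consonants, halant, bindu/visarga and anything else stay in the base block
      BaseBuf := BaseBuf + Ch;
    end;
  end;

  Result := BaseBuf + AboveBuf + BelowBuf + RephaBuf;
end;

// Hypothetical helper: formats a string as space-separated U+XXXX values.
function ToCodePoints(const S: UnicodeString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + 'U+' + IntToHex(Integer(Ord(S[I])), 4);
  end;
end;

begin
  // RA + Halant + KA + AI matra
  WriteLn(ToCodePoints(ReorderTeluguSyllable(#$0C30#$0C4D#$0C15#$0C48)));
  // prints: U+0C15 U+0C46 U+0C56 U+0C30 U+0C4D
end.
```

Keeping one buffer per position class makes the final emission a plain concatenation, so the output order follows the layout regardless of the order in which matras appear in the input.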

Example

var
  Wide: UnicodeString;
begin
  // Input: KA (U+0C15) + AI matra (U+0C48, the split matra)
  Wide := Doc.ApplyTeluguReorder(#$0C15#$0C48);
  // Wide now holds: KA (U+0C15) + E (U+0C46, above-base) + AI length mark (U+0C56, below-base)
end;

See also

Standards

  • Unicode 16.0 §12.7 (Telugu)
  • Unicode 16.0 IndicSyllabicCategory.txt and IndicPositionalCategory.txt
  • ISO 32000-1 §9.10 (extraction of text content)
  • OpenType Telugu shaping spec (script tag 'telu')

Version history

  • v2.119.74 — Introduced in Phase 8f.5. Complete shaper (R1 + R3 + R4 + one split-matra decomposition; no pre-base matras, which is Telugu-specific). Telugu becomes the fifth registered Indic script.