ApplyTeluguReorder
Signature
function ApplyTeluguReorder(const Wide: UnicodeString): UnicodeString;
Purpose
Applies the Telugu reorder pre-pass to Wide and returns
the reordered UnicodeString ready for cmap + GSUB consumption.
Non-Telugu content passes through byte-identical.
Telugu specifics
- R1 Repha enabled: Ra (U+0C30) + Halant (U+0C4D) at syllable start is detected, the pair is stripped from the cluster and re-emitted after the reordered output so the font's 'rphf' GSUB feature can substitute the Repha glyph (similar to Devanagari / Bengali / Gujarati).
- No pre-base matras: the pre-base buffer is always empty for valid Telugu text. All Telugu matras are above-base, below-base, or split. This is Telugu’s most distinguishing trait among registered Indic scripts.
- Many above-base matras: AA (U+0C3E), I (U+0C3F), II (U+0C40), E (U+0C46), EE (U+0C47), O (U+0C4A), OO (U+0C4B), AU (U+0C4C), and length mark (U+0C55).
- Below-base matras: U / UU / Vocalic R / Vocalic RR (U+0C41–U+0C44), AI length mark (U+0C56), Vocalic L / LL matras (U+0C62–U+0C63).
- 1 split matra: U+0C48 AI decomposes into U+0C46 (E above-base) + U+0C56 (AI length mark below-base) at reorder time. This is the first registered split that routes to above + below rather than the pre+post seen in Bengali / Tamil.
- Halant is U+0C4D.
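The matra ranges and the split-matra rule listed above can be sketched as a small lookup plus a decomposition step. This is a minimal illustration only, assuming Free Pascal in objfpc mode; `TMatraPos`, `MatraPosOf`, and `DecomposeSplitMatras` are hypothetical names invented for the sketch and are not the library's actual types or functions (the real lookup is `GetTeluguCategory`, whose signature is not shown here).

```pascal
program TeluguSplitMatraSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical positional classes, for illustration only.
  TMatraPos = (mpNone, mpAbove, mpBelow, mpSplit);

// Classifies a code point using only the ranges listed in this section.
function MatraPosOf(CP: Word): TMatraPos;
begin
  case CP of
    $0C3E..$0C40, $0C46, $0C47, $0C4A..$0C4C, $0C55:
      Result := mpAbove;
    $0C41..$0C44, $0C56, $0C62, $0C63:
      Result := mpBelow;
    $0C48:
      Result := mpSplit;
  else
    Result := mpNone;
  end;
end;

// Replaces every U+0C48 (AI) with U+0C46 (E) + U+0C56 (AI length mark).
// All other code units are copied unchanged.
function DecomposeSplitMatras(const S: UnicodeString): UnicodeString;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    if Ord(S[I]) = $0C48 then
      Result := Result + WideChar($0C46) + WideChar($0C56)
    else
      Result := Result + S[I];
end;

var
  Output: UnicodeString;
  I: Integer;
begin
  // KA (U+0C15) + AI matra (U+0C48)
  Output := DecomposeSplitMatras(#$0C15#$0C48);
  // Print the resulting code points in hexadecimal.
  for I := 1 to Length(Output) do
    Write(HexStr(LongInt(Ord(Output[I])), 4), ' ');
  WriteLn;
end.
```

For the KA + AI input, the sketch emits the code points 0C15, 0C46, 0C56, which matches the decomposition described in the list above.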
Reorder rules applied
- R1 Repha: Ra + Halant at syllable start re-emitted at the end of the syllable.
- R3 Above-base matras: AA / I / II / E / EE / O / OO / AU / length mark emit after the base block.
- R4 Below-base matras: U / UU / Vocalic R / RR, AI length mark, Vocalic L / LL matras emit after the above-base block.
- Split matra decomposition: U+0C48 AI routed to above + below buffers as documented above.
- No R2 pre-base and no R5 post-base emissions are produced under normal Telugu input.
Output layout per syllable: [base + halant + bindu/visarga/modifier] + [above-matras] + [below-matras] + [Repha: Ra Halant]?. Conjuncts (C + Halant + C) are preserved in the base block. The reorder is single-pass and idempotent.
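The per-syllable output layout can be sketched with four buffers that are concatenated at the end. This is a rough illustration under stated assumptions, not the library's implementation: it handles one syllable that the caller has already segmented, it does not check that a consonant follows the leading Ra + Halant, and it does not try to reproduce the idempotence property of the real pass. `ReorderSyllableSketch`, `IsAboveMatra`, and `IsBelowMatra` are hypothetical names.

```pascal
program TeluguSyllableLayoutSketch;
{$mode objfpc}{$H+}

const
  TELUGU_RA     = $0C30;
  TELUGU_HALANT = $0C4D;

// Hypothetical helper: True for the above-base matras listed in this section.
function IsAboveMatra(CP: Word): Boolean;
begin
  case CP of
    $0C3E..$0C40, $0C46, $0C47, $0C4A..$0C4C, $0C55: Result := True;
  else
    Result := False;
  end;
end;

// Hypothetical helper: True for the below-base matras listed in this section.
function IsBelowMatra(CP: Word): Boolean;
begin
  case CP of
    $0C41..$0C44, $0C56, $0C62, $0C63: Result := True;
  else
    Result := False;
  end;
end;

// Reorders ONE pre-segmented syllable into
// [base block] + [above-matras] + [below-matras] + [Repha].
function ReorderSyllableSketch(const Syl: UnicodeString): UnicodeString;
var
  BaseBuf, AboveBuf, BelowBuf, RephaBuf: UnicodeString;
  I, Start: Integer;
  CP: Word;
begin
  BaseBuf := '';
  AboveBuf := '';
  BelowBuf := '';
  RephaBuf := '';
  Start := 1;
  // R1: strip a leading Ra + Halant and hold it for the end of the syllable.
  if (Length(Syl) >= 2) and (Ord(Syl[1]) = TELUGU_RA)
     and (Ord(Syl[2]) = TELUGU_HALANT) then
  begin
    RephaBuf := Copy(Syl, 1, 2);
    Start := 3;
  end;
  for I := Start to Length(Syl) do
  begin
    CP := Ord(Syl[I]);
    if CP = $0C48 then
    begin
      // Split matra: E goes above, AI length mark goes below.
      AboveBuf := AboveBuf + WideChar($0C46);
      BelowBuf := BelowBuf + WideChar($0C56);
    end
    else if IsAboveMatra(CP) then
      AboveBuf := AboveBuf + Syl[I]      // R3
    else if IsBelowMatra(CP) then
      BelowBuf := BelowBuf + Syl[I]      // R4
    else
      BaseBuf := BaseBuf + Syl[I];       // consonants, halant, modifiers
  end;
  Result := BaseBuf + AboveBuf + BelowBuf + RephaBuf;
end;

var
  Output: UnicodeString;
  I: Integer;
begin
  // Ra + Halant + KA + AI matra (split)
  Output := ReorderSyllableSketch(#$0C30#$0C4D#$0C15#$0C48);
  for I := 1 to Length(Output) do
    Write(HexStr(LongInt(Ord(Output[I])), 4), ' ');
  WriteLn;
end.
```

With this input the sketch emits KA, E, AI length mark, Ra, Halant (0C15 0C46 0C56 0C30 0C4D). Keeping separate buffers means each rule only appends to its own buffer, and the final order falls out of a single concatenation.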
Example
var
  Wide: UnicodeString;
begin
  // Input: KA (U+0C15) + AI-matra (U+0C48, split)
  Wide := Doc.ApplyTeluguReorder(#$0C15#$0C48);
  // Wide is now: KA + E (U+0C46, above) + AI-mark (U+0C56, below)
end;
See also
- ApplyIndicReorder: total dispatcher.
- ApplyDevanagariReorder: Devanagari counterpart.
- ApplyBengaliReorder: Bengali counterpart.
- ApplyGujaratiReorder: Gujarati counterpart.
- ApplyTamilReorder: Tamil counterpart.
- GetTeluguCategory: Unicode codepoint → category lookup.
Standards
- Unicode 16.0 §12.7 (Telugu)
- Unicode 16.0 IndicSyllabicCategory.txt and IndicPositionalCategory.txt
- ISO 32000-1 §9.10 (extraction of text content)
- OpenType Telugu shaping spec (script tag 'telu')
Version history
- v2.119.74: Introduced in Phase 8f.5. Complete shaper (R1 + R3 + R4 + 1 split-matra decomposition; no pre-base matras, which is Telugu-specific). Telugu becomes the fifth registered Indic script.