ApplyTamilReorder
Signature
function ApplyTamilReorder(const Wide: UnicodeString): UnicodeString;
Purpose
Applies the Tamil reorder pre-pass to Wide and returns
the reordered UnicodeString ready for cmap + GSUB consumption.
Non-Tamil content passes through byte-identical.
Tamil divergences from other Brahmic scripts
- NO Repha: Tamil traditionally does not form Repha (no 'rphf' visual). The Ra+Halant→syllable-end pre-pass is disabled for Tamil; Ra (U+0BB0) + PULLI (U+0BCD) + Consonant stays in source order.
- I-matra (U+0BBF) is POST-base, unlike the other registered scripts (Devanagari / Bengali / Gujarati all use pre-base I). The algorithm routes U+0BBF to the post-base buffer.
- 3 split matras decompose at reorder time: U+0BCA O → U+0BC6 (E pre) + U+0BBE (AA post); U+0BCB OO → U+0BC7 (EE pre) + U+0BBE (AA post); U+0BCC AU → U+0BC6 (E pre) + U+0BD7 (AU-mark post).
- Halant is named PULLI in Tamil, at U+0BCD.
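The three split-matra decompositions listed above can be sketched as a small lookup. This is a minimal illustration only: `DecomposeSplitMatra` is a hypothetical helper name, not part of the documented API, and the real implementation may be structured differently.

```pascal
program SplitMatraSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper (not the library's API): maps a Tamil split matra to
// its (pre, post) components. Returns False for any other codepoint.
function DecomposeSplitMatra(Ch: WideChar; out Pre, Post: WideChar): Boolean;
begin
  Result := True;
  case Ord(Ch) of
    $0BCA: begin Pre := WideChar($0BC6); Post := WideChar($0BBE); end; // O  -> E  + AA
    $0BCB: begin Pre := WideChar($0BC7); Post := WideChar($0BBE); end; // OO -> EE + AA
    $0BCC: begin Pre := WideChar($0BC6); Post := WideChar($0BD7); end; // AU -> E  + AU length mark
  else
    begin
      // Not a split matra: leave both outputs empty.
      Pre := #0;
      Post := #0;
      Result := False;
    end;
  end;
end;

var
  Pre, Post: WideChar;
begin
  // Print codepoints in hex to avoid depending on console encoding.
  if DecomposeSplitMatra(WideChar($0BCA), Pre, Post) then
    WriteLn('pre=U+', IntToHex(Integer(Ord(Pre)), 4),
            ' post=U+', IntToHex(Integer(Ord(Post)), 4));
  // prints: pre=U+0BC6 post=U+0BBE
end.
```

Returning a Boolean lets a caller fall through to ordinary matra classification when the codepoint is not one of the three split matras.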
Reorder rules applied
- R2 Pre-base matras: U+0BC6 E, U+0BC7 EE, U+0BC8 AI move to syllable start.
- R3 Above-base matras: U+0BC0 II emits after the base block.
- R5 Post-base matras: U+0BBE AA, U+0BBF I (Tamil-specific post-base!), U+0BC1–U+0BC2 U/UU, U+0BD7 AU length mark emit after above-base.
- Split matras: U+0BCA–U+0BCC decompose into their canonical (pre, post) components as documented above.
- No R1 Repha and no R4 below-base in the Tamil main block.
Output layout per syllable: [pre-matras] + [base + pulli + bindu/visarga/modifier] + [above-matras] + [post-matras]. No Repha is appended at the end.
Conjuncts (C + PULLI + C) preserved in the base block.
Single-pass and idempotent.
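The per-syllable layout described above can be sketched as four buffers filled in one pass and concatenated at the end. This is a simplified illustration under stated assumptions, not the library's code: `ReorderOneSyllable` is a hypothetical name, the input is assumed to be exactly one already-segmented syllable, and every character that is not a matra is treated as part of the base block.

```pascal
program SyllableLayoutSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  SysUtils;

// Hypothetical sketch: emits [pre-matras] + [base block] + [above-matras]
// + [post-matras] for a single syllable. Syllable segmentation and
// non-Tamil pass-through are out of scope here.
function ReorderOneSyllable(const Syl: UnicodeString): UnicodeString;
var
  PreBuf, BaseBuf, AboveBuf, PostBuf: UnicodeString;
  I: Integer;
  Ch: WideChar;
begin
  PreBuf := ''; BaseBuf := ''; AboveBuf := ''; PostBuf := '';
  for I := 1 to Length(Syl) do
  begin
    Ch := Syl[I];
    case Ord(Ch) of
      $0BC6..$0BC8:                      // R2: E, EE, AI go to syllable start
        PreBuf := PreBuf + Ch;
      $0BC0:                             // R3: II goes after the base block
        AboveBuf := AboveBuf + Ch;
      $0BBE, $0BBF, $0BC1, $0BC2, $0BD7: // R5: AA, I, U, UU, AU length mark
        PostBuf := PostBuf + Ch;
      $0BCA:                             // split O -> E (pre) + AA (post)
        begin
          PreBuf := PreBuf + WideChar($0BC6);
          PostBuf := PostBuf + WideChar($0BBE);
        end;
      $0BCB:                             // split OO -> EE (pre) + AA (post)
        begin
          PreBuf := PreBuf + WideChar($0BC7);
          PostBuf := PostBuf + WideChar($0BBE);
        end;
      $0BCC:                             // split AU -> E (pre) + AU mark (post)
        begin
          PreBuf := PreBuf + WideChar($0BC6);
          PostBuf := PostBuf + WideChar($0BD7);
        end;
    else
      // Consonants, pulli, bindu/visarga/modifiers stay in source order,
      // so Ra + PULLI + Consonant is never moved (no Repha).
      BaseBuf := BaseBuf + Ch;
    end;
  end;
  Result := PreBuf + BaseBuf + AboveBuf + PostBuf;
end;

var
  Input, Output: UnicodeString;
  I: Integer;
begin
  Input := #$0B95#$0BCA;                 // KA + split O-matra
  Output := ReorderOneSyllable(Input);
  for I := 1 to Length(Output) do
    Write('U+', IntToHex(Integer(Ord(Output[I])), 4), ' ');
  WriteLn;
  // prints: U+0BC6 U+0B95 U+0BBE
end.
```

Collecting into separate buffers keeps the pass linear and leaves conjuncts (C + PULLI + C) untouched inside the base block.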
Example
var
  Wide: UnicodeString;
begin
  // Input: KA (U+0B95) + O-matra (U+0BCA, split)
  Wide := Doc.ApplyTamilReorder(#$0B95#$0BCA);
  // Wide is now: E (U+0BC6) + KA (U+0B95) + AA (U+0BBE) (canonical decomposition)
end;
See also
- ApplyIndicReorder: total dispatcher.
- ApplyDevanagariReorder: Devanagari counterpart.
- ApplyBengaliReorder: Bengali counterpart.
- ApplyGujaratiReorder: Gujarati counterpart.
- GetTamilCategory: Unicode codepoint → category lookup.
Standards
- Unicode 16.0 §12.6 (Tamil)
- Unicode 16.0 IndicSyllabicCategory.txt and IndicPositionalCategory.txt
- ISO 32000-1 §9.10 (extraction of text content)
- OpenType Tamil shaping spec (script tag 'taml')
Version history
- v2.119.73 — Introduced in Phase 8f.4. Complete shaper (R2-R5 + 3 split-matra decompositions; NO Repha — Tamil-specific). Tamil becomes fourth registered Indic script.