ApplyTamilReorder

Signature

function ApplyTamilReorder(const Wide: UnicodeString): UnicodeString;

Purpose

Applies the Tamil reorder pre-pass to Wide and returns the reordered UnicodeString ready for cmap + GSUB consumption. Non-Tamil content passes through byte-identical.

Tamil divergences from other Brahmic scripts

  • NO Repha — Tamil traditionally does not form Repha (no 'rphf' visual). The Ra+Halant→syllable-end pre-pass is disabled for Tamil; Ra (U+0BB0) + PULLI (U+0BCD) + Consonant stays in source order.
  • I-matra (U+0BBF) is POST-base, unlike Devanagari / Bengali / Gujarati, which all use a pre-base I. The algorithm routes U+0BBF to the post-base buffer.
  • 3 split matras decompose at reorder time: U+0BCA O → U+0BC6 (E pre) + U+0BBE (AA post); U+0BCB OO → U+0BC7 (EE pre) + U+0BBE (AA post); U+0BCC AU → U+0BC6 (E pre) + U+0BD7 (AU-mark post).
  • Halant is named PULLI in Tamil at U+0BCD.
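
The three split-matra decompositions listed above can be sketched as a small lookup. This is only an illustration under stated assumptions: DecomposeTamilSplitMatra is a hypothetical helper name, not part of the documented API, and the real pre-pass may organise this differently.

```pascal
program SplitMatraSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper (assumption, not the library's API).
// Returns True and fills PrePart/PostPart when Ch is one of the three
// Tamil split matras; returns False for any other code unit.
function DecomposeTamilSplitMatra(Ch: WideChar;
  out PrePart, PostPart: WideChar): Boolean;
begin
  Result := True;
  case Ord(Ch) of
    $0BCA: // O -> E (pre) + AA (post)
      begin
        PrePart := WideChar($0BC6);
        PostPart := WideChar($0BBE);
      end;
    $0BCB: // OO -> EE (pre) + AA (post)
      begin
        PrePart := WideChar($0BC7);
        PostPart := WideChar($0BBE);
      end;
    $0BCC: // AU -> E (pre) + AU length mark (post)
      begin
        PrePart := WideChar($0BC6);
        PostPart := WideChar($0BD7);
      end;
  else
    begin
      // Not a split matra: leave both parts empty.
      PrePart := #0;
      PostPart := #0;
      Result := False;
    end;
  end;
end;

var
  Pre, Post: WideChar;
begin
  // Prints the decimal code points of the two components of U+0BCA.
  if DecomposeTamilSplitMatra(WideChar($0BCA), Pre, Post) then
    WriteLn(Ord(Pre), ' ', Ord(Post));
end.
```

Returning the two halves separately lets a caller place each one in its own buffer (pre-base and post-base) without building an intermediate string.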

Reorder rules applied

  • R2 Pre-base matras: U+0BC6 E, U+0BC7 EE, U+0BC8 AI move to syllable start.
  • R3 Above-base matras: U+0BC0 II emits after the base block.
  • R5 Post-base matras: U+0BBE AA, U+0BBF I (Tamil-specific post-base!), U+0BC1..U+0BC2 U/UU, U+0BD7 AU length mark emit after above-base.
  • Split matras: U+0BCA..U+0BCC decompose into their canonical (pre, post) components as documented above.
  • No R1 Repha and no R4 below-base in the Tamil main block.

Output layout per syllable: [pre-matras] + [base + pulli + bindu/visarga/modifier] + [above-matras] + [post-matras]. No Repha is appended at the end.

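Conjuncts (C + PULLI + C) are preserved in source order inside the base block. The reorder runs in a single pass and is idempotent: applying it to already-reordered text returns the same text.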
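
As a rough illustration of the per-syllable layout described above, the following sketch distributes the code units of one syllable into four buffers and concatenates them. It is not the library's implementation: ReorderTamilSyllableSketch is a hypothetical name, the input is assumed to be a single already-segmented Tamil syllable, and syllable segmentation and non-Tamil pass-through are left out.

```pascal
program TamilSyllableSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical sketch (assumption, not the library's code): reorders ONE
// already-segmented Tamil syllable into pre + base + above + post order.
function ReorderTamilSyllableSketch(const Syl: UnicodeString): UnicodeString;
var
  PreBuf, BaseBuf, AboveBuf, PostBuf: UnicodeString;
  I: Integer;
  Ch: WideChar;
begin
  PreBuf := '';
  BaseBuf := '';
  AboveBuf := '';
  PostBuf := '';
  for I := 1 to Length(Syl) do
  begin
    Ch := Syl[I];
    case Ord(Ch) of
      $0BC6, $0BC7, $0BC8: // R2: pre-base E, EE, AI
        PreBuf := PreBuf + Ch;
      $0BC0: // R3: above-base II
        AboveBuf := AboveBuf + Ch;
      $0BBE, $0BBF, $0BC1, $0BC2, $0BD7: // R5: post-base AA, I, U, UU, AU mark
        PostBuf := PostBuf + Ch;
      $0BCA: // split O -> E (pre) + AA (post)
        begin
          PreBuf := PreBuf + WideChar($0BC6);
          PostBuf := PostBuf + WideChar($0BBE);
        end;
      $0BCB: // split OO -> EE (pre) + AA (post)
        begin
          PreBuf := PreBuf + WideChar($0BC7);
          PostBuf := PostBuf + WideChar($0BBE);
        end;
      $0BCC: // split AU -> E (pre) + AU length mark (post)
        begin
          PreBuf := PreBuf + WideChar($0BC6);
          PostBuf := PostBuf + WideChar($0BD7);
        end;
    else
      // Consonants, pulli, and bindu/visarga/modifiers stay in source
      // order in the base block; Ra + pulli is NOT moved (no Repha).
      BaseBuf := BaseBuf + Ch;
    end;
  end;
  Result := PreBuf + BaseBuf + AboveBuf + PostBuf;
end;

var
  Input, Output1: UnicodeString;
  I: Integer;
begin
  Input := #$0B95#$0BCA; // KA + split O-matra
  Output1 := ReorderTamilSyllableSketch(Input);
  // Prints the decimal code points in order: E (U+0BC6), KA, AA (U+0BBE).
  for I := 1 to Length(Output1) do
    Write(Ord(Output1[I]), ' ');
  WriteLn;
end.
```

Because the sketch only sorts marks into buffers and every decomposed component lands in the buffer it came from, running it on its own output leaves the text unchanged, which matches the idempotence property stated above.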

Example

var
  Wide: UnicodeString;
begin
  // Input: KA (U+0B95) + O-matra (U+0BCA, split)
  Wide := Doc.ApplyTamilReorder(#$0B95#$0BCA);
  // Wide is now: E (U+0BC6) + KA + AA (U+0BBE)  (canonical decomposition)
end;

See also

Standards

  • Unicode 16.0 §12.6 (Tamil)
  • Unicode 16.0 IndicSyllabicCategory.txt and IndicPositionalCategory.txt
  • ISO 32000-1 §9.10 (extraction of text content)
  • OpenType Tamil shaping spec (script tag 'taml')

Version history

  • v2.119.73 — Introduced in Phase 8f.4. Complete shaper (R2-R5 + 3 split-matra decompositions; NO Repha — Tamil-specific). Tamil becomes fourth registered Indic script.