ApplyBengaliReorder

Signature

function ApplyBengaliReorder(const Wide: UnicodeString): UnicodeString;

Purpose

Applies the Bengali reorder pre-pass to Wide and returns the reordered UnicodeString ready for cmap + GSUB consumption. Non-Bengali content (Latin, digits, punctuation, other scripts including other Indic scripts) passes through byte-identical.
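
A caller that wants to predict whether a string will pass through unchanged could test for the Bengali block (U+0980..U+09FF) first. The sketch below is an illustration only: IsBengali and HasNoBengali are hypothetical helpers, not part of the documented API, and the source does not say whether ApplyBengaliReorder performs such a check internally.

```pascal
program BengaliBlockCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: True when Ch lies in the Bengali block U+0980..U+09FF.
function IsBengali(Ch: WideChar): Boolean;
begin
  Result := (Ord(Ch) >= $0980) and (Ord(Ch) <= $09FF);
end;

// Hypothetical helper: True when S contains no Bengali code point, which is
// the documented pass-through case (output expected to equal input).
function HasNoBengali(const S: UnicodeString): Boolean;
var
  I: Integer;
begin
  Result := True;
  for I := 1 to Length(S) do
    if IsBengali(S[I]) then
    begin
      Result := False;
      Exit;
    end;
end;

begin
  WriteLn(HasNoBengali('Hello, 123'));    // Latin, digits, punctuation only
  WriteLn(HasNoBengali(#$0995#$09BF));    // KA + I-matra, both Bengali
end.
```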

Reorder rules applied

  • R1 Repha: when a syllable starts with Ra (U+09B0) + Halant (U+09CD) + Consonant, the (Ra, Halant) pair moves to the syllable end.
  • R2 Pre-base matras: U+09BF I, U+09C7 E, U+09C8 AI move to the syllable start. Note that Bengali E and AI are pre-base, unlike their Devanagari counterparts, which are above-base.
  • R3 Above-base matras: (empty for Bengali — no above-base matras in the main block)
  • R4 Below-base matras: U+09C1..U+09C4 (U, UU, Vocalic R, Vocalic RR) and U+09E2..U+09E3 (Vocalic L, Vocalic LL) emit after the base.
  • R5 Post-base matras: U+09BE AA, U+09C0 II, U+09D7 AU length mark emit after the below-base matras.
  • Split matras: U+09CB Oo decomposes to U+09C7 (pre) + U+09BE (post); U+09CC AU decomposes to U+09C7 (pre) + U+09D7 (post).
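
The rule list above amounts to a lookup from code point to positional class, plus a two-entry decomposition table. A minimal sketch of that lookup follows; TMatraPos, ClassifyMatra and DecomposeSplit are hypothetical names chosen for illustration, and the actual implementation may organise this differently.

```pascal
program BengaliMatraClasses;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical enum; the documentation only names the rules R2, R4, R5.
  TMatraPos = (mpNone, mpPre, mpBelow, mpPost, mpSplit);

// Maps a code point to the positional class listed in the rules above.
function ClassifyMatra(Ch: WideChar): TMatraPos;
begin
  case Ord(Ch) of
    $09BF, $09C7, $09C8:        Result := mpPre;    // R2
    $09C1..$09C4, $09E2, $09E3: Result := mpBelow;  // R4
    $09BE, $09C0, $09D7:        Result := mpPost;   // R5
    $09CB, $09CC:               Result := mpSplit;  // decomposed before placement
  else
    Result := mpNone;
  end;
end;

// Returns the pre and post halves of a split matra; False for anything else.
function DecomposeSplit(Ch: WideChar; out Pre, Post: WideChar): Boolean;
begin
  Pre := #$09C7;  // both split matras share the same pre-base part
  Result := True;
  case Ord(Ch) of
    $09CB: Post := #$09BE;
    $09CC: Post := #$09D7;
  else
    Pre := #0;
    Post := #0;
    Result := False;
  end;
end;

var
  Pre, Post: WideChar;
begin
  WriteLn(ClassifyMatra(#$09BF) = mpPre);
  if DecomposeSplit(#$09CB, Pre, Post) then
    WriteLn((Pre = #$09C7) and (Post = #$09BE));
end.
```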

Output layout per syllable: [pre-matras] + [base + halant + nukta + bindu/visarga/modifier] + [below-matras] + [post-matras] + [Repha: Ra Halant]?

Conjuncts (C + Halant + C) are preserved in the base block. The reorder is single-pass and idempotent.
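
To make the output layout concrete, here is a rough sketch that reorders one syllable by sorting its code points into buckets and concatenating them in the documented order. It assumes the input is already segmented into a single syllable, uses an approximate consonant test, and ReorderSyllable and IsConsonant are hypothetical names; it is not the library's implementation and skips syllable-boundary detection entirely.

```pascal
program BengaliSyllableSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  RA     = #$09B0;
  HALANT = #$09CD;

// Approximate consonant test (assumption): main range plus RRA, RHA, YYA.
function IsConsonant(Ch: WideChar): Boolean;
begin
  case Ord(Ch) of
    $0995..$09B9, $09DC, $09DD, $09DF: Result := True;
  else
    Result := False;
  end;
end;

// Reorders ONE pre-segmented syllable into
// [pre] + [base block] + [below] + [post] + [Repha].
function ReorderSyllable(const Syl: UnicodeString): UnicodeString;
var
  Pre, Base, Below, Post, Repha: UnicodeString;
  I, First: Integer;
begin
  Pre := ''; Base := ''; Below := ''; Post := ''; Repha := '';
  First := 1;
  // R1: a leading Ra + Halant before a consonant moves to the syllable end.
  if (Length(Syl) >= 3) and (Syl[1] = RA) and (Syl[2] = HALANT)
     and IsConsonant(Syl[3]) then
  begin
    Repha := Copy(Syl, 1, 2);
    First := 3;
  end;
  for I := First to Length(Syl) do
    case Ord(Syl[I]) of
      $09BF, $09C7, $09C8:        Pre := Pre + Syl[I];      // R2
      $09C1..$09C4, $09E2, $09E3: Below := Below + Syl[I];  // R4
      $09BE, $09C0, $09D7:        Post := Post + Syl[I];    // R5
      $09CB: begin Pre := Pre + #$09C7; Post := Post + #$09BE; end;
      $09CC: begin Pre := Pre + #$09C7; Post := Post + #$09D7; end;
    else
      // Consonants, halant, nukta and modifiers keep their relative order,
      // so conjuncts (C + Halant + C) stay intact in the base block.
      Base := Base + Syl[I];
    end;
  Result := Pre + Base + Below + Post + Repha;
end;

begin
  // RA + HALANT + KA + I-matra becomes I-matra + KA + RA + HALANT.
  WriteLn(ReorderSyllable(#$09B0#$09CD#$0995#$09BF) = #$09BF#$0995#$09B0#$09CD);
  // KA + U+09CB becomes U+09C7 + KA + U+09BE.
  WriteLn(ReorderSyllable(#$0995#$09CB) = #$09C7#$0995#$09BE);
end.
```

In this sketch, feeding the result back in leaves it unchanged for the two inputs shown, which is consistent with the idempotence noted above.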

Example

var
  Wide: UnicodeString;
begin
  // Input: KA + Oo-matra (single codepoint U+09CB)
  Wide := Doc.ApplyBengaliReorder(#$0995#$09CB);
  // Wide is now: U+09C7 + KA + U+09BE  (Oo decomposed to pre+post)
end;

Standards

  • Unicode 16.0 §12.2 (Bengali)
  • Unicode 16.0 IndicSyllabicCategory.txt and IndicPositionalCategory.txt
  • ISO 32000-1 §9.10 (extraction of text content)
  • OpenType Bengali shaping spec

Version history

  • v2.119.71 — Introduced in Phase 8f.2. Complete shaper (R1, R2, R4, R5 + split-matra decomposition). Bengali becomes second registered Indic script after Devanagari.