ApplyIndicReorder
Signature
function ApplyIndicReorder(const Wide: UnicodeString): UnicodeString;
Purpose
Single entry point for the Indic-script reorder pre-pass. Walks Wide
left-to-right, dispatches each codepoint to the matching registered
TIndicScriptInfo entry by Unicode-block range, and applies
that script's syllable + reorder callbacks. Non-Indic codepoints pass
through byte-identical. Script-boundary transitions inside Wide
are segmented automatically.
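The range-based dispatch and automatic segmentation described above can be pictured with a minimal sketch. Everything named here (TScriptRange, Ranges, ScriptOf, ListSegments) is hypothetical and is not the library's TIndicScriptInfo registry; only three of the documented block ranges are included, and the sketch only reports segments rather than reordering them.

```pascal
program IndicDispatchSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical stand-in for a registered script entry (assumption).
  TScriptRange = record
    Tag: string;
    First, Last: Word;
  end;

const
  // A subset of the block ranges listed in this document.
  Ranges: array[0..2] of TScriptRange = (
    (Tag: 'deva'; First: $0900; Last: $097F),
    (Tag: 'beng'; First: $0980; Last: $09FF),
    (Tag: 'taml'; First: $0B80; Last: $0BFF)
  );

// Returns the script tag for one UTF-16 code unit, or '' if no range matches.
function ScriptOf(Ch: WideChar): string;
var
  I: Integer;
begin
  Result := '';
  for I := Low(Ranges) to High(Ranges) do
    if (Ord(Ch) >= Ranges[I].First) and (Ord(Ch) <= Ranges[I].Last) then
    begin
      Result := Ranges[I].Tag;
      Exit;
    end;
end;

// Prints each maximal run of code units that share the same script tag.
procedure ListSegments(const Wide: UnicodeString);
var
  I, Start: Integer;
  Cur: string;
begin
  if Length(Wide) = 0 then
    Exit;
  Start := 1;
  Cur := ScriptOf(Wide[1]);
  for I := 2 to Length(Wide) + 1 do
    if (I > Length(Wide)) or (ScriptOf(Wide[I]) <> Cur) then
    begin
      if Cur = '' then
        WriteLn('pass-through: ', I - Start, ' code unit(s)')
      else
        WriteLn(Cur, ': ', I - Start, ' code unit(s)');
      if I <= Length(Wide) then
      begin
        Start := I;
        Cur := ScriptOf(Wide[I]);
      end;
    end;
end;

var
  Sample: UnicodeString;
begin
  // Latin text, then Devanagari KA + I-matra, then Bengali KA.
  Sample := 'Hi ';
  Sample := Sample + WideChar($0915) + WideChar($093F) + WideChar($0995);
  ListSegments(Sample);
end.
```

For this sample the sketch reports a 3-unit pass-through run, a 2-unit 'deva' run, and a 1-unit 'beng' run. A linear scan over the ranges is enough at this table size; all ranges in this document lie in the Basic Multilingual Plane, so the sketch does not handle surrogate pairs.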
Registered scripts (v2.120.10 Phase 8f.10 — batch complete)
- Devanagari ('deva', U+0900–U+097F) — complete shaper (R1-R5)
- Bengali ('beng', U+0980–U+09FF) — complete shaper (R1, R2, R4, R5 + split-matra decomposition)
- Gujarati ('gujr', U+0A80–U+0AFF) — complete shaper (R1-R5; no split matras)
- Tamil ('taml', U+0B80–U+0BFF) — complete shaper (R2-R5 + 3 split-matra decompositions; NO Repha — Tamil-specific)
- Telugu ('telu', U+0C00–U+0C7F) — complete shaper (R1+R3+R4 + 1 split-matra decomposition; no pre-base matras)
- Kannada ('knda', U+0C80–U+0CFF) — complete shaper (R1+R3+R4+R5 + 5 split-matra decompositions including 1 three-part split for U+0CCB OO; no pre-base matras)
- Malayalam ('mlym', U+0D00–U+0D7F) — complete shaper (R1+R2+R4+R5 + 3 split-matra decompositions). I-matra (U+0D3F) is post-base (Tamil-like, unique vs Devanagari/Bengali/Gujarati). Chillu letters (U+0D54–U+0D56, U+0D7A–U+0D7F) and DOT REPH (U+0D4E) classified as consonants.
- Sinhala ('sinh', U+0D80–U+0DFF) — complete shaper (R1+R2+R3+R4+R5 + 3 split-matra decompositions). Three pre-base matras (E = U+0DD9, EE = U+0DDA, AI = U+0DDB) — the highest pre-base-matra count among Phase 8f scripts. U+0DDD OO is a three-part split (pre + post + post). Completes the Brahmic SIA (South Indic Aryan) family.
- Khmer ('khmr', U+1780–U+17FF) — first South-East Asian script; independent syllable FSM (not Brahmic R1-R5). NO Repha. COENG (U+17D2) + Consonant pairs form stacked subscripts and stay in-cluster (GSUB handles subscript positioning). Six pre-base vowels (E/AE/AI/OE/OO/AU) move to syllable start; register shifters (U+17C9 MUUSIKATOAN, U+17CA TRIISAP) and other signs route to above-base; Bindu (NIKAHIT U+17C6) → above; Visarga (REAHMUK / YUUKALEAPINTU, U+17C7 / U+17C8) → post.
- Myanmar ('mymr', U+1000–U+109F + U+AA60–U+AA7F Extended-A) — most complex syllable structure in the batch. NO Repha. Kinzi 3-CP prefix (U+1004 + U+103A + U+1039) detected at syllable start, held aside, and emitted at output start per R8. Pre-base vowel E (U+1031) moves to syllable start per R10. Four medial consonants (U+103B YA, U+103C RA, U+103D WA, U+103E HA) emitted in fixed Y → R → W → H order regardless of source order per R9. ASAT (U+103A) and VIRAMA (U+1039) both treated as virama. Reorder algorithm uses 8 buffer slots: Kinzi + PreVowel + Base + (MedialY/R/W/H) + Above + Below + Post. Two IndicScripts entries (main block + Extended-A) share the same Myanmar reorder functions.
This phase completes the 11-phase non-Devanagari Indic shaping batch: Phase 8f.0 (infrastructure) and Phases 8f.1–8f.10 (10 registered scripts: the Brahmic SIA family plus 2 South-East Asian scripts). Future shaping work in Phase 8g or later may add Myanmar Extended-B / Extended-C, Tibetan, the South-East Asian scripts Lao / Thai, or other Unicode §12-§16 ranges.
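The fixed Y → R → W → H medial order in the Myanmar entry can be illustrated with a small sketch. SortMedials is a hypothetical helper, not one of the library's Myanmar reorder functions; it assumes each medial occurs at most once per syllable and silently drops anything that is not one of the four medials.

```pascal
program MyanmarMedialOrderSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper (assumption): re-emits the medials found in Medials in
// the fixed order YA (U+103B), RA (U+103C), WA (U+103D), HA (U+103E).
// Duplicates and non-medial code units are dropped.
function SortMedials(const Medials: UnicodeString): UnicodeString;
var
  Seen: array[0..3] of Boolean;
  I, Idx: Integer;
begin
  for I := 0 to 3 do
    Seen[I] := False;
  // Record which of the four medials are present, ignoring source order.
  for I := 1 to Length(Medials) do
  begin
    Idx := Ord(Medials[I]) - $103B;
    if (Idx >= 0) and (Idx <= 3) then
      Seen[Idx] := True;
  end;
  // Emit in slot order: index 0 = YA, 1 = RA, 2 = WA, 3 = HA.
  Result := '';
  for I := 0 to 3 do
    if Seen[I] then
      Result := Result + WideChar($103B + I);
end;

var
  Src, Dst: UnicodeString;
  I: Integer;
begin
  // Source order is HA, then YA.
  Src := '';
  Src := Src + WideChar($103E) + WideChar($103B);
  Dst := SortMedials(Src);
  for I := 1 to Length(Dst) do
    Write('U+', IntToHex(Integer(Ord(Dst[I])), 4), ' ');
  WriteLn;
end.
```

Because the four medials occupy consecutive codepoints, the fixed order coincides with ascending codepoint order, which is why one Boolean slot per medial is enough here. With the HA, YA input the sketch prints U+103B followed by U+103E.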
Producer-side automatic application
When sfIndicShaping is included in FShapingFeatures,
ApplyIndicReorder is invoked automatically inside the three
BuildUnicode*FieldContent helpers used for AcroForm appearance
stream generation. Callers that bypass BuildUnicode* can call
ApplyIndicReorder directly before feeding text into the
cmap + GSUB pipeline (via SetGSUBScript('deva'), etc.).
Example
var
  Wide: UnicodeString;
begin
  Wide := Doc.ApplyIndicReorder('Hello ' + #$0915#$093F + ' world.');
  // Result: 'Hello ' + I-matra + KA + ' world.'
  // (Latin segments unchanged; Devanagari segment reordered.)
end;
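To make the reordered result in the example concrete, here is a deliberately narrow sketch of the pre-base move for that one case. MoveIMatra is hypothetical and is not how ApplyIndicReorder is implemented: it only swaps a single Devanagari consonant in U+0915–U+0939 with an immediately following I-matra (U+093F), and ignores conjuncts, nukta, Repha, and the other rules.

```pascal
program PreBaseMatraSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper (assumption): moves I-matra (U+093F) in front of the
// single consonant that precedes it. All other code units are left alone.
function MoveIMatra(const Wide: UnicodeString): UnicodeString;
var
  I: Integer;
  Tmp: WideChar;
begin
  Result := Wide;
  I := 1;
  while I < Length(Result) do
  begin
    if (Ord(Result[I]) >= $0915) and (Ord(Result[I]) <= $0939) and
       (Ord(Result[I + 1]) = $093F) then
    begin
      // Swap consonant and matra so the matra comes first.
      Tmp := Result[I];
      Result[I] := Result[I + 1];
      Result[I + 1] := Tmp;
      Inc(I, 2);
    end
    else
      Inc(I);
  end;
end;

var
  Src, Dst: UnicodeString;
  I: Integer;
begin
  // Latin 'A', then KA (U+0915) + I-matra (U+093F).
  Src := 'A';
  Src := Src + WideChar($0915) + WideChar($093F);
  Dst := MoveIMatra(Src);
  for I := 1 to Length(Dst) do
    Write('U+', IntToHex(Integer(Ord(Dst[I])), 4), ' ');
  WriteLn;
end.
```

The Latin code unit is untouched and the Devanagari pair comes out as U+093F followed by U+0915, matching the I-matra + KA order shown in the example comment.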
See also
- ApplyDevanagariReorder — Devanagari-only wrapper (v2.119.55 backward compat).
- GetDevanagariCategory — Unicode codepoint → Devanagari category lookup.
Standards
- Unicode 16.0 chapters 12 (South Asian) and 16 (Southeast Asian)
- ISO 32000-1 §9.10 (extraction of text content)
- OpenType per-script shaping specs (Devanagari and siblings)
Version history
- v2.119.69 — Introduced in Phase 8f.0. Ships with Devanagari registered (R1 + R2 only, inherited from Phase 8e).
- v2.119.70 — Devanagari upgraded to complete shaper (R1-R5 + conjunct preservation) in Phase 8f.1.
- v2.119.71 — Bengali registered as second Indic script (Phase 8f.2).
- v2.119.72 — Gujarati registered as third Indic script (Phase 8f.3).
- v2.119.73 — Tamil registered as fourth Indic script (Phase 8f.4).
- v2.119.74 — Telugu registered as fifth Indic script (Phase 8f.5).
- v2.119.75 — Kannada registered as sixth Indic script (Phase 8f.6). First script to demonstrate a three-part split-matra decomposition.
- v2.119.76 — Malayalam registered as seventh Indic script (Phase 8f.7). Adds chillu consonants (U+0D54-U+0D56, U+0D7A-U+0D7F) and DOT REPH (U+0D4E) as Malayalam-specific consonant categories; I-matra post-base shared with Tamil.
- v2.119.77 — Sinhala registered as eighth Indic script (Phase 8f.8). Three pre-base matras (E/EE/AI) — most among Phase 8f scripts. U+0DDD OO is a three-part split (pre + post + post). Completes the Brahmic SIA (South Indic Aryan) family.
- v2.120.9 — Khmer registered as ninth Indic script (Phase 8f.9). First South-East Asian script; independent syllable FSM with COENG (U+17D2) subscript handling that stays in-cluster (no Repha, no Brahmic R1-R5). Six pre-base vowels rotate to syllable start; register shifters and other signs route to above-base; Bindu and Visarga get dedicated above / post routing. Per Unicode 16.0 §16.4 and OpenType Khmer shaping spec.
- v2.120.10 — Myanmar registered as tenth and final Indic script (Phase 8f.10). Two IndicScripts entries cover Myanmar core block (U+1000–U+109F) and Extended-A (U+AA60–U+AA7F). Most complex syllable structure in the batch: Kinzi 3-CP prefix detection (R8), pre-base E vowel rotation (R10), fixed Y → R → W → H medial sorting (R9), ASAT / VIRAMA stacked-consonant handling, DOT BELOW / tone mark routing. 8-slot buffer model. Completes the 11-Phase non-Devanagari Indic shaping batch (Phases 8f.0–8f.10): infrastructure + 10 registered Indic scripts covering the Brahmic SIA family plus 2 South-East Asian scripts. Per Unicode 16.0 §16.3 and OpenType Myanmar shaping spec.