GetTamilCategory

Signature

function GetTamilCategory(CP: Cardinal): Integer;

Purpose

Pure lookup from a Unicode codepoint to its Tamil syllabic category. No font state is required. The result uses the same 13-code numbering as GetDevanagariCategory; only the ten codes listed under Return values are returned for Tamil.

Return values

Code  Category                      Example codepoints
0     Other                         U+0BD0 OM, U+0BF0–U+0BFF Tamil numerals/symbols
1     Consonant                     U+0B95–U+0BB9 (coarse range; reserved gaps are treated as consonants for simplicity, as real Tamil text never uses them)
2     Independent vowel             U+0B85–U+0B8A, U+0B8E–U+0B90, U+0B92–U+0B94
3     Matra (dependent vowel sign)  U+0BBE–U+0BC2, U+0BC6–U+0BC8, U+0BCA–U+0BCC, U+0BD7
4     Virama (PULLI)                U+0BCD
6     Bindu (anusvara)              U+0B82
7     Visarga (aytham)              U+0B83
9     Digit                         U+0BE6–U+0BEF
10    ZWJ                           U+200D
11    ZWNJ                          U+200C
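
The table above can be read as a single case statement over the codepoint. The following is a minimal sketch in Free Pascal, not the actual implementation: SketchTamilCategory is a hypothetical stand-in name, and returning 0 for every codepoint the table does not list is an assumption.

```pascal
program TamilCategorySketch;

{$mode objfpc}

// Hypothetical stand-in for GetTamilCategory, built only from the table above.
function SketchTamilCategory(CP: Cardinal): Integer;
begin
  case CP of
    $0B82:
      Result := 6;   // Bindu (anusvara)
    $0B83:
      Result := 7;   // Visarga (aytham)
    $0B85..$0B8A, $0B8E..$0B90, $0B92..$0B94:
      Result := 2;   // Independent vowel
    $0B95..$0BB9:
      Result := 1;   // Consonant (coarse range, reserved gaps included)
    $0BBE..$0BC2, $0BC6..$0BC8, $0BCA..$0BCC, $0BD7:
      Result := 3;   // Matra (dependent vowel sign)
    $0BCD:
      Result := 4;   // Virama (PULLI)
    $0BE6..$0BEF:
      Result := 9;   // Digit
    $200D:
      Result := 10;  // ZWJ
    $200C:
      Result := 11;  // ZWNJ
  else
    // Other: U+0BD0, U+0BF0..U+0BFF, and (by assumption) anything unlisted.
    Result := 0;
  end;
end;

begin
  WriteLn(SketchTamilCategory($0B95)); // KA
  WriteLn(SketchTamilCategory($0BCD)); // PULLI
  WriteLn(SketchTamilCategory($0BD0)); // OM
end.
```

Because the consonant range is coarse, a reserved gap such as U+0B96 also maps to category 1 in this sketch, matching the note in the table.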

Notable Tamil-specific assignments

  • I-matra (U+0BBF): MatraPos = 2 (post-base). This differs from Devanagari / Bengali / Gujarati, which use a pre-base I.
  • II-matra (U+0BC0): MatraPos = 3 (above-base).
  • E (U+0BC6) / EE (U+0BC7) / AI (U+0BC8): MatraPos = 1 (pre-base).
  • O (U+0BCA) / OO (U+0BCB) / AU (U+0BCC): MatraPos = 5 (split — decomposed by ApplyTamilReorder).
  • AU length mark (U+0BD7): MatraPos = 2 (post-base) — emitted by U+0BCC split.
  • Halant is named PULLI in Tamil (U+0BCD).
  • Visarga (U+0B83) is the Tamil aytham.
  • Category 5 (Nukta) is never returned, because the main Tamil block has no Nukta codepoint. No Danda category is returned either, because Tamil uses Latin punctuation.
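
The MatraPos assignments and the split-matra behaviour listed above can be sketched as two small helpers. This is an illustration under stated assumptions, not the code inside ApplyTamilReorder: both function names are hypothetical, the -1 result for codepoints the list does not cover is an assumption, and the split pairs are the Unicode canonical decompositions of U+0BCA, U+0BCB, and U+0BCC.

```pascal
program TamilMatraSketch;

{$mode objfpc}

// Hypothetical helper: MatraPos values exactly as listed in the bullets above.
// Returns -1 for codepoints the list does not cover (assumption of this sketch).
function SketchTamilMatraPos(CP: Cardinal): Integer;
begin
  case CP of
    $0BBF, $0BD7:
      Result := 2;  // post-base: I, AU length mark
    $0BC0:
      Result := 3;  // above-base: II
    $0BC6..$0BC8:
      Result := 1;  // pre-base: E, EE, AI
    $0BCA..$0BCC:
      Result := 5;  // split: O, OO, AU
  else
    Result := -1;
  end;
end;

// Hypothetical helper: splits a two-part vowel sign into its Unicode canonical
// decomposition. Returns False when CP is not a split matra.
function SketchSplitTamilMatra(CP: Cardinal; out First, Second: Cardinal): Boolean;
begin
  Result := True;
  case CP of
    $0BCA: begin First := $0BC6; Second := $0BBE; end;  // O  = E  + AA
    $0BCB: begin First := $0BC7; Second := $0BBE; end;  // OO = EE + AA
    $0BCC: begin First := $0BC6; Second := $0BD7; end;  // AU = E  + AU length mark
  else
    First := 0;
    Second := 0;
    Result := False;
  end;
end;

var
  A, B: Cardinal;
begin
  // AU splits into a pre-base half (E) and a post-base half (AU length mark).
  if SketchSplitTamilMatra($0BCC, A, B) then
    WriteLn(SketchTamilMatraPos(A), ' ', SketchTamilMatraPos(B));
end.
```

After the split, each half carries an ordinary MatraPos of its own (1 for U+0BC6, 2 for U+0BD7), so a reordering pass only has to deal with the pre-base and post-base cases.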

See also

Version history

  • v2.119.73 — Introduced in Phase 8f.4.