Vocabulary
In language AI, vocabulary is token-based and directly affects how models handle rare terms, domain language, and multilingual variation.
Definition
The set of tokens or words that a language model can recognise and process when analysing or generating text.
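As a minimal sketch of this definition, a vocabulary can be pictured as a lookup table from tokens to integer IDs. The entries, the `<unk>` placeholder name, and the `encode` helper below are hypothetical and chosen only for illustration; real models ship their own vocabularies and tokenisers.

```python
# Toy vocabulary: every entry here is hypothetical, not taken from a real model.
UNK = "<unk>"
vocab = {UNK: 0, "the": 1, "contract": 2, "is": 3, "signed": 4}


def encode(words, vocab):
    """Map each word to its ID, falling back to the unknown-token ID."""
    return [vocab.get(word, vocab[UNK]) for word in words]


ids = encode("the contract is notarised".split(), vocab)
print(ids)  # [1, 2, 3, 0]
```

The word "notarised" is not in the toy vocabulary, so it collapses to the unknown-token ID and its meaning is lost to the model. This is the rare-term problem in its simplest form.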
How It Works
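A tokeniser splits text into units drawn from a fixed vocabulary and maps each unit to a numeric ID. Text that falls outside the vocabulary is either broken into smaller subword pieces or replaced with a generic unknown token. Because this determines how well rare terms and domain language are represented, vocabulary coverage shapes what teams can expect from AI and translation workflows in terms of quality and consistency.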
In production environments, teams back this up with process controls such as human review, terminology alignment, and repeatable quality checks across multilingual content.
Subword tokenisation improves rare-word coverage, but professional translation still depends on terminology governance and human review.
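To illustrate how subword tokenisation improves rare-word coverage, the sketch below splits a word by greedy longest-match-first lookup, in the style of WordPiece inference. It is one possible approach, not a description of any particular model. The toy vocabulary, the `##` continuation marker, and the function name `subword_tokenise` are assumptions made for this example.

```python
def subword_tokenise(word, vocab, unk="<unk>"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark word-internal pieces
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; neither full word below is listed in it.
toy_vocab = {"token", "local", "##is", "##ation", "##s"}

print(subword_tokenise("tokenisation", toy_vocab))  # ['token', '##is', '##ation']
print(subword_tokenise("localisation", toy_vocab))  # ['local', '##is', '##ation']
print(subword_tokenise("xyz", toy_vocab))           # ['<unk>']
```

Neither "tokenisation" nor "localisation" appears in the toy vocabulary as a whole word, yet both are covered by reusing smaller pieces. Coverage of this kind says nothing about whether the resulting translation uses the approved term, which is why terminology governance and human review remain necessary.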
Key Concepts
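- vocabulary as the set of tokens a model can recognise and process
- subword tokenisation for rare-word coverage
- workflow-level implementation through process controls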
- terminology and quality consistency
- human validation before publication
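The terminology-consistency and human-validation concepts above can be sketched as a simple automated check that flags segments for a reviewer. The glossary entries, the example sentences, and the `missing_terms` helper are hypothetical, and the case-insensitive substring matching is deliberately naive (it ignores inflection and word boundaries).

```python
# Hypothetical approved glossary: source term -> required target term.
glossary = {"invoice": "Rechnung", "purchase order": "Bestellung"}


def missing_terms(source, target, glossary):
    """Return glossary pairs whose source term appears in the source text
    but whose approved target term is absent from the translation."""
    src, tgt = source.lower(), target.lower()
    return [
        (src_term, tgt_term)
        for src_term, tgt_term in glossary.items()
        if src_term.lower() in src and tgt_term.lower() not in tgt
    ]


source = "Please pay the invoice for the purchase order."
target = "Bitte bezahlen Sie die Faktura für die Bestellung."

flagged = missing_terms(source, target, glossary)
print(flagged)  # [('invoice', 'Rechnung')]
```

A non-empty result does not prove the translation is wrong. It only marks the segment for human validation before publication.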
Where It Is Used
- localisation workflows
- AI translation pipelines
- multilingual content production
- cross-referencing related concepts such as Validation Dataset