
Vocabulary

In language AI, vocabulary is token-based and directly affects how models handle rare terms, domain language, and multilingual variation.

Definition

The set of tokens or words that a language model can recognise and process when analysing or generating text.
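As a minimal illustration of this definition, the sketch below treats a vocabulary as a lookup table from tokens to integer IDs. The vocabulary contents, the `encode` helper, and the `[UNK]` fallback token are hypothetical. The sketch also splits on whitespace for simplicity. Production tokenisers learn their vocabularies from data and usually work with subword units rather than whole words.

```python
# Hypothetical toy vocabulary: each known token maps to an integer ID.
# Real model vocabularies are learned from data and are far larger.
UNK = "[UNK]"
vocab = {UNK: 0, "the": 1, "contract": 2, "is": 3, "signed": 4}


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Map lowercased, whitespace-separated words to IDs; unknown words fall back to [UNK]."""
    return [vocab.get(word, vocab[UNK]) for word in text.lower().split()]


print(encode("The contract is notarised", vocab))  # [1, 2, 3, 0]
```

The word "notarised" is not in the toy vocabulary, so it collapses to the `[UNK]` ID. Subword tokenisation exists largely to avoid this loss of information.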

How It Works

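A tokeniser splits input text into units drawn from the vocabulary and maps each unit to an integer ID. The vocabulary is fixed when the tokeniser is trained, so words outside it must be broken into smaller known pieces or replaced with an unknown-token placeholder. Understanding these limits helps teams build predictable AI and translation workflows and set clear expectations for quality, consistency, and decision-making.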

In production, these vocabulary limits are managed with process controls such as human review, terminology alignment, and repeatable quality checks across multilingual content.
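A repeatable terminology check can be sketched in a few lines. This example assumes a hypothetical English-to-French termbase and a hypothetical `terminology_issues` helper. It flags segments where a source term occurs but its approved translation does not. It uses naive case-insensitive substring matching, so it ignores inflection and word boundaries. Treat it as a starting point that supports human review rather than replaces it.

```python
# Hypothetical English -> French termbase: source term -> approved target term.
TERMBASE = {"invoice": "facture", "purchase order": "bon de commande"}


def terminology_issues(source: str, target: str, termbase: dict[str, str]) -> list[str]:
    """Return source terms found in the source segment whose approved target
    term is missing from the translation (case-insensitive substring match)."""
    src, tgt = source.lower(), target.lower()
    return [
        term
        for term, approved in termbase.items()
        if term in src and approved.lower() not in tgt
    ]


issues = terminology_issues(
    "Attach the invoice to the purchase order.",
    "Joignez la facture à la commande.",
    TERMBASE,
)
print(issues)  # ['purchase order']
```

Here "invoice" is rendered with the approved term "facture", but "purchase order" was translated loosely as "commande", so the check flags it for a reviewer.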

Subword tokenisation improves rare-word coverage, but professional translation still depends on terminology governance and human review.
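To show how subword tokenisation covers rare words, here is a sketch of greedy longest-match-first segmentation in the style of WordPiece, where `##` marks a piece that continues a word. The toy vocabulary and the `wordpiece_split` function are hypothetical. Real tokenisers learn their vocabularies from large corpora, and other schemes such as BPE or Unigram segment text differently.

```python
# Hypothetical toy subword vocabulary. "##" marks a piece that continues a word,
# following the WordPiece convention.
SUBWORDS = {"[UNK]", "trans", "##lat", "##ion", "##or", "term", "##ino", "##logy", "##s"}


def wordpiece_split(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match-first segmentation of one word into subword pieces."""
    pieces: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a vocabulary hit.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # non-initial pieces carry the prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece covers this position, so the whole word becomes unknown.
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces


print(wordpiece_split("translation", SUBWORDS))  # ['trans', '##lat', '##ion']
print(wordpiece_split("terminology", SUBWORDS))  # ['term', '##ino', '##logy']
```

With the same toy vocabulary, "translators" splits into `trans`, `##lat`, `##or`, `##s` even though the full word was never listed. This is the rare-word coverage subword schemes provide. The pieces carry no guarantee that the model will translate the term correctly, which is why terminology governance still matters.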

Key Concepts

  • a fixed token inventory set when the tokeniser is trained
  • workflow-level controls for vocabulary limits
  • terminology and quality consistency
  • human validation before publication

Where It Is Used

  • localisation workflows
  • AI translation pipelines
  • multilingual content production
  • cross-referencing related concepts such as Validation Dataset
