
Large Language Models (LLMs)

Tokenization

Tokenization is the process of breaking text into smaller pieces called tokens—such as words or subwords—that a language model can understand. For example, "ChatGPT" might become "Chat" and "GPT." These tokens are then converted into numbers the model uses to process language. Tokenization affects how much text a model can handle at once, how fast it runs, and how accurate its output is. In short, it's the first step in helping AI read and work with language.
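The idea can be sketched in a few lines of Python. The snippet below uses greedy longest-match lookup against a small hypothetical vocabulary (real tokenizers such as BPE learn their vocabularies from data, so the entries and IDs here are illustrative only):

```python
# Toy vocabulary mapping subword pieces to numeric IDs.
# Hypothetical entries for illustration; real vocabularies are learned.
VOCAB = {"Chat": 0, "GPT": 1, "token": 2, "ization": 3, " ": 4}

def tokenize(text):
    """Split text into the longest matching vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible match first.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            # Unknown character: emit it as its own token.
            tokens.append(text[i])
            i += 1
    return tokens

def encode(text):
    """Convert text into the numeric IDs a model actually consumes."""
    return [VOCAB.get(tok, -1) for tok in tokenize(text)]

print(tokenize("ChatGPT"))       # ['Chat', 'GPT']
print(encode("ChatGPT"))         # [0, 1]
print(tokenize("tokenization"))  # ['token', 'ization']
```

This mirrors the "ChatGPT" → "Chat" + "GPT" example above: the text is split into known pieces, and each piece is replaced by its ID before the model ever sees it.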

FAQ

What is tokenization?

Tokenization is the process of breaking text into smaller pieces called tokens, such as words or subwords, so a language model can understand it. It's the first step that helps AI read and work with language.