Tokenization

Tokenization is the process of breaking text into smaller units called tokens (words, subwords, or characters) that serve as the input to a language model; each token is mapped to an integer ID from the tokenizer's vocabulary. How a given string is split depends on that vocabulary. Example: tokenizing the sentence “I am ChatGPT” might produce the tokens “I,” “am,” “Chat,” “G,” and “PT.”
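
As a concrete illustration, here is a minimal sketch using the tiktoken library (an assumption for this example; any subword tokenizer behaves similarly). It encodes a string into integer token IDs and decodes each ID back into its text piece; the exact splits and IDs depend on the vocabulary chosen.

```python
# Minimal tokenization sketch using the tiktoken library (pip install tiktoken).
# "cl100k_base" is one of tiktoken's built-in BPE vocabularies; the exact
# token splits and IDs shown below vary with the vocabulary used.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "I am ChatGPT"
token_ids = enc.encode(text)                        # text -> list of integer token IDs
pieces = [enc.decode([tid]) for tid in token_ids]   # each ID -> its text piece

print(token_ids)  # e.g. a short list of integers (IDs vary by vocabulary)
print(pieces)     # e.g. ['I', ' am', ' Chat', 'G', 'PT'] (splits vary by vocabulary)
```

Subword tokenizers such as BPE keep the vocabulary small while still being able to represent any string, which is why an unfamiliar word like “ChatGPT” is split into several pieces.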