Grok – Token Trimming

Each chat message you send has a token value based on the combined content of all the messages in the chat session. Because a chat session logs every message, its token total increases the more the chat session is used.

Each model has a limit on the number of tokens that can be used per request.

For methods in the Chats method category, the available models have the following token limits:

Name              Model               Token Limit
Grok 4            grok-4-0709         256,000
Grok 3            grok-3              131,072
Grok 3 Mini       grok-3-mini         131,072
Grok 3 Fast       grok-3-fast         131,072
Grok 3 Mini Fast  grok-3-mini-fast    131,072
Grok 2 Vision     grok-2-vision-1212  32,768

To automatically manage chat session tokens, methods in the Chats method category have two parameters you can use:

  • Trim Tokens?
  • Custom Token Limit

Trim Tokens?

Set this parameter to true to automatically trim a chat session to below the token limit for the model. If no model is found, Cyclr uses a default token limit of 131,072. When trimming a chat session, Cyclr removes the oldest message first and repeats this until the chat session is below the token limit. Cyclr then sends the request with the trimmed chat session.
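The trimming behaviour described above can be pictured with a short sketch. This is only an illustration under stated assumptions, not Cyclr's actual implementation: the Message class, the trim_session function and the MODEL_TOKEN_LIMITS dictionary are hypothetical names, and the sketch assumes each message already carries its own token count.

```python
from dataclasses import dataclass
from typing import List, Optional

# Limits copied from the model table in this guide.
MODEL_TOKEN_LIMITS = {
    "grok-4-0709": 256_000,
    "grok-3": 131_072,
    "grok-3-mini": 131_072,
    "grok-3-fast": 131_072,
    "grok-3-mini-fast": 131_072,
    "grok-2-vision-1212": 32_768,
}
DEFAULT_TOKEN_LIMIT = 131_072  # used when the model is not found


@dataclass
class Message:
    """Hypothetical chat message with a precomputed token count."""
    role: str
    content: str
    tokens: int


def trim_session(messages: List[Message], model: Optional[str] = None) -> List[Message]:
    """Return a copy of the session with the oldest messages dropped
    until the token total is below the limit for the model."""
    limit = MODEL_TOKEN_LIMITS.get(model, DEFAULT_TOKEN_LIMIT)
    trimmed = list(messages)  # leave the caller's list untouched
    total = sum(m.tokens for m in trimmed)
    # Assumption: "below the limit" is read strictly, so a total equal
    # to the limit still triggers trimming.
    while trimmed and total >= limit:
        oldest = trimmed.pop(0)  # index 0 is the oldest message
        total -= oldest.tokens
    return trimmed


session = [
    Message("user", "first question", 20_000),
    Message("assistant", "first answer", 10_000),
    Message("user", "follow-up", 5_000),
]
kept = trim_session(session, "grok-2-vision-1212")
print([m.content for m in kept])  # ['first answer', 'follow-up']
```

In this example the session totals 35,000 tokens, which is over the 32,768 limit for grok-2-vision-1212, so only the oldest message is dropped and the remaining 15,000 tokens are sent.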

Note: The token total for a chat session is calculated by Cyclr calling the tokenizer for each new message sent, and capturing the response tokens for each response message from Grok.
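As a rough picture of the bookkeeping in this note, the sketch below keeps a running total per chat session. It is an assumption-laden illustration rather than Cyclr's code: SessionTokenTracker is a hypothetical name, and count_tokens stands in for whatever tokenizer call is actually used.

```python
from typing import Callable, List


class SessionTokenTracker:
    """Hypothetical running token total for one chat session."""

    def __init__(self, count_tokens: Callable[[str], int]) -> None:
        # count_tokens is a stand-in for the real tokenizer call.
        self._count_tokens = count_tokens
        self.message_tokens: List[int] = []

    def record_request(self, text: str) -> int:
        """Tokenize a new outgoing message and store its token count."""
        tokens = self._count_tokens(text)
        self.message_tokens.append(tokens)
        return tokens

    def record_response(self, response_tokens: int) -> None:
        """Store the token count reported for a response message."""
        self.message_tokens.append(response_tokens)

    @property
    def total(self) -> int:
        """Token total for the whole chat session."""
        return sum(self.message_tokens)


# Toy tokenizer for illustration only: one token per whitespace-separated word.
tracker = SessionTokenTracker(lambda text: len(text.split()))
tracker.record_request("hello there grok")
tracker.record_response(7)
print(tracker.total)  # 10
```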

Warning: If you enable Trim Tokens?, messages that Cyclr removes from chat sessions are permanently removed.

Warning: To keep token calculations accurate, do not change the Trim Tokens? parameter after you have created the chat session.

Custom Token Limit

Set this parameter to specify a custom token limit for Cyclr to trim a chat session down to. This overrides both the limit for the model and the default token limit. The value has a lower limit of 100 tokens to make sure that at least one message can be sent in the request.
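The order of precedence described here (custom limit, then model limit, then default) can be sketched as follows. The function name resolve_token_limit is hypothetical, and raising an error for values under 100 is an assumption: this guide does not say whether Cyclr rejects or adjusts such values.

```python
from typing import Optional

DEFAULT_TOKEN_LIMIT = 131_072
MIN_CUSTOM_TOKEN_LIMIT = 100  # lower bound stated in this guide


def resolve_token_limit(
    model_limit: Optional[int],
    custom_limit: Optional[int] = None,
) -> int:
    """Pick the limit to trim to: custom limit first, then the model's
    limit, then the default."""
    if custom_limit is not None:
        # Assumption: values below the minimum are rejected outright.
        if custom_limit < MIN_CUSTOM_TOKEN_LIMIT:
            raise ValueError(
                f"Custom Token Limit must be at least {MIN_CUSTOM_TOKEN_LIMIT}"
            )
        return custom_limit
    if model_limit is not None:
        return model_limit
    return DEFAULT_TOKEN_LIMIT


print(resolve_token_limit(256_000, custom_limit=5_000))  # 5000
print(resolve_token_limit(256_000))                      # 256000
print(resolve_token_limit(None))                         # 131072
```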