ChatGPT – Token Trimming

Each request sent in a chat session has a token cost based on the total content of all the messages in the session. As each chat session logs messages, the token total for the session increases with use. For more information, see OpenAI’s documentation on tokens and how to count them.
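Exact counts come from the model's tokenizer (for example, OpenAI's tiktoken library). As a rough illustration only, a session's total can be approximated at about four characters per token — an assumption, not the count the API calculates:

```python
# Rough illustration only: real token counts come from the model's tokenizer
# (e.g. OpenAI's tiktoken). The ~4-characters-per-token ratio is an assumption.

def estimate_tokens(text: str) -> int:
    """Approximate the token count at ~4 characters per token."""
    return max(1, len(text) // 4)

def session_token_total(messages: list[dict]) -> int:
    """Sum the estimated tokens across every message in the chat session."""
    return sum(estimate_tokens(m["content"]) for m in messages)

messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing well, thank you for asking!"},
]
print(session_token_total(messages))
```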

Each model has a limit on the number of tokens that can be used per request. For methods in the Chats method category, the available models have the following token limits:

Name            Model               Token limit
GPT-4o          gpt-4o              4,096
GPT-4o latest   chatgpt-4o-latest   16,384
GPT-4o mini     gpt-4o-mini         16,384
GPT-4 turbo     gpt-4-turbo         4,096
GPT-4           gpt-4               8,192
GPT-3.5 turbo   gpt-3.5-turbo       4,096

To automatically manage chat session tokens, methods in the Chats method category have Trim Tokens? and Custom Token Limit parameters that you can use.

Trim Tokens?

Set this parameter to true to automatically trim chat sessions to below the token limit for the model. If no model-specific limit is found, a default limit of 4,096 tokens is used. When Cyclr trims a chat session, it removes the oldest message first and repeats the process until the session is below the token limit, then sends the request with the trimmed chat session.
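The trimming loop described above can be sketched as follows. This is a simplified illustration, not Cyclr's internal implementation, and the per-message token estimate (~4 characters per token) is an assumption:

```python
# Simplified sketch of the trimming behaviour; not Cyclr's actual code.
# Token counts are approximated at ~4 characters per token (an assumption).

DEFAULT_TOKEN_LIMIT = 4096  # fallback when no model-specific limit is found

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_session(messages: list[dict],
                 token_limit: int = DEFAULT_TOKEN_LIMIT) -> list[dict]:
    """Remove the oldest messages until the session total is below the limit."""
    trimmed = list(messages)
    while trimmed and sum(estimate_tokens(m["content"]) for m in trimmed) >= token_limit:
        trimmed.pop(0)  # the oldest message is removed first
    return trimmed

session = [
    {"role": "user", "content": "x" * 400},          # oldest, ~100 tokens
    {"role": "assistant", "content": "y" * 400},     # ~100 tokens
    {"role": "user", "content": "latest question"},  # newest
]
print(len(trim_session(session, token_limit=110)))  # the oldest message is dropped
```

Note that the newest messages survive trimming, which is why enabling this option permanently discards the oldest conversation history first.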

Note: The token total for a chat session is an approximation made by Cyclr and may not reflect the token total the API calculates.

Warning: If you enable Trim Tokens?, messages that Cyclr trims from a chat session are permanently deleted.

Custom Token Limit

Set this parameter to specify a custom token limit for Cyclr to trim a chat session down to. This overrides both the model’s limit and the default token limit. The value has a lower limit of 100 tokens, which ensures that at least one message can be sent in the request.