Count Your Tokens Before They Hatch!
The various subscription plans offered by Anthropic, OpenAI and Google are designed to obfuscate the underlying cost. It's better to keep your own record of input, thought and output tokens and what they cost, so you can optimise token usage and avoid bill shock.
Token Pricing
As an example, the pricing for all of Google's models is available here. I use the gemini-2.5-flash-lite model exclusively for the moment. This costs:
- $0.10 per 1M input tokens
- $0.40 per 1M output tokens
Input Cost
To calculate the input cost, I take the input token count, divide it by one million and multiply by $0.10. For example, if my input token count is 800, the cost is 800 / 1,000,000 * 0.10, which is $0.00008.
Output Cost
For the output cost, divide the output token count by one million and multiply by $0.40. For example, an output token count of 50 costs 50 / 1,000,000 * 0.40, which is $0.00002.
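The two calculations above can be sketched as a single helper. This is a minimal illustration, not a definitive implementation: the function name `token_cost` and the constant names are my own, and the rates are simply the two figures quoted above.

```python
# Rates in USD per 1M tokens, taken from the figures quoted above.
INPUT_RATE_PER_M = 0.10
OUTPUT_RATE_PER_M = 0.40


def token_cost(token_count: int, rate_per_million: float) -> float:
    """Return the USD cost of token_count tokens at a per-1M-token rate."""
    return token_count / 1_000_000 * rate_per_million


# The worked examples from the text: 800 input tokens and 50 output tokens.
input_cost = token_cost(800, INPUT_RATE_PER_M)
output_cost = token_cost(50, OUTPUT_RATE_PER_M)

print(f"input:  ${input_cost:.5f}")
print(f"output: ${output_cost:.5f}")
print(f"total:  ${input_cost + output_cost:.5f}")
```

Passing the rate in as a parameter means the same helper works unchanged if you switch to a model with different pricing.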
Maintain Live Count And Aggregate Cost
In your application, calculate the cost of each call to the LLM and maintain a live count of input, thought and output tokens, along with the aggregate cost of each.
This gives you granular control over your token usage and costs. If the terms of your subscription change, you also have a good body of data from which to predict your new token and cost range.
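A live count could be kept in the application with something like the sketch below. The class name `TokenLedger` and its fields are hypothetical. The thought-token rate is an assumption: the pricing quoted above lists only input and output rates, so the sketch bills thought tokens at the output rate, which you should check against your provider's pricing page. The token counts themselves would come from whatever usage data your client library returns with each response.

```python
from dataclasses import dataclass


@dataclass
class TokenLedger:
    """Running totals of tokens and USD cost across LLM calls."""

    # Rates in USD per 1M tokens; input and output come from the pricing above.
    input_rate: float = 0.10
    output_rate: float = 0.40
    # ASSUMPTION: thought tokens are billed at the output rate.
    thought_rate: float = 0.40

    calls: int = 0
    input_tokens: int = 0
    thought_tokens: int = 0
    output_tokens: int = 0
    input_cost: float = 0.0
    thought_cost: float = 0.0
    output_cost: float = 0.0

    def record(self, input_tokens: int, thought_tokens: int, output_tokens: int) -> float:
        """Add one call's token counts to the totals and return that call's cost."""
        in_cost = input_tokens / 1_000_000 * self.input_rate
        th_cost = thought_tokens / 1_000_000 * self.thought_rate
        out_cost = output_tokens / 1_000_000 * self.output_rate

        self.calls += 1
        self.input_tokens += input_tokens
        self.thought_tokens += thought_tokens
        self.output_tokens += output_tokens
        self.input_cost += in_cost
        self.thought_cost += th_cost
        self.output_cost += out_cost
        return in_cost + th_cost + out_cost

    @property
    def total_cost(self) -> float:
        """Aggregate cost of all recorded calls."""
        return self.input_cost + self.thought_cost + self.output_cost


ledger = TokenLedger()
ledger.record(800, 0, 50)      # the worked example from earlier
ledger.record(1200, 300, 100)  # a second, made-up call for illustration
print(ledger.calls, ledger.input_tokens, ledger.thought_tokens, ledger.output_tokens)
print(f"${ledger.total_cost:.5f}")
```

Keeping the three costs separate, rather than only a grand total, makes it easy to see which kind of token is driving the bill.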
It makes sense to have a central database where you store the token counts and costs. Here is the code for a table that captures each call to the LLM along with its token counts and costs:
```sql
use mydb;

drop table if exists token_cost;

create table token_cost
(
    id bigint auto_increment not null primary key,
    captured timestamp not null default current_timestamp,
    captured_from varchar(100) not null,
    input_tokens int not null,
    thought_tokens int not null,
    output_tokens int not null,
    input_cost double not null,
    thought_cost double not null,
    output_cost double not null,
    index idx_token_cost_from (captured_from)
);
```
Note the index on captured_from, the column that identifies which system made the call. Here is the code for a view to access the table:
```sql
use mydb;

drop view if exists vw_token_cost;

create view vw_token_cost as
select id, captured, captured_from, input_tokens, thought_tokens, output_tokens,
       input_tokens + thought_tokens + output_tokens as total_tokens,
       input_cost, thought_cost, output_cost,
       input_cost + thought_cost + output_cost as total_cost
from token_cost;
```
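To populate token_cost from the application, each call could be written with a parameterized insert. The sketch below only builds the statement and the row; `build_row` and `INSERT_SQL` are hypothetical names, the `%s` placeholders assume a MySQL driver that uses that parameter style, and the thought-token rate is again an assumption (billed at the output rate).

```python
# Parameterized insert matching the token_cost columns defined above.
# id and captured are filled in by the database defaults.
INSERT_SQL = (
    "insert into token_cost "
    "(captured_from, input_tokens, thought_tokens, output_tokens, "
    "input_cost, thought_cost, output_cost) "
    "values (%s, %s, %s, %s, %s, %s, %s)"
)


def build_row(captured_from, input_tokens, thought_tokens, output_tokens,
              input_rate=0.10, output_rate=0.40, thought_rate=0.40):
    """Return the parameter tuple for INSERT_SQL for one LLM call.

    Rates are USD per 1M tokens. ASSUMPTION: thought_rate defaults to the
    output rate because the quoted pricing lists no separate thought rate.
    """
    if len(captured_from) > 100:
        # The column is varchar(100); fail early rather than at the database.
        raise ValueError("captured_from must be at most 100 characters")
    return (
        captured_from,
        input_tokens,
        thought_tokens,
        output_tokens,
        input_tokens / 1_000_000 * input_rate,
        thought_tokens / 1_000_000 * thought_rate,
        output_tokens / 1_000_000 * output_rate,
    )


row = build_row("moder8", 800, 0, 50)
print(row)
# With an open cursor from your MySQL driver of choice, the write would be:
#     cursor.execute(INSERT_SQL, row)
#     connection.commit()
```

Passing the values as parameters, rather than formatting them into the SQL string, leaves quoting and escaping to the driver.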
A summary query can then be run against vw_token_cost to determine the live cost of each system.
```sql
use mydb;

select captured_from, count(total_tokens) as token_count, sum(total_cost) as total_cost
from vw_token_cost
group by captured_from;
```
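Output for my moder8 system in dev. Note that count(total_tokens) counts rows, so token_count here is the number of logged calls rather than a sum of tokens; use sum(total_tokens) if you want the token total: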
| captured_from | token_count | total_cost |
|---|---|---|
| moder8 | 98 | 0.0067418 |
Future Enhancement
As a future enhancement you could create a table that defines the actual cost per 1M tokens for each system.
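The idea can be sketched in application code with an in-memory stand-in for that rate table, keyed by the same system name stored in captured_from. Everything here is illustrative: `Rates`, `RATES` and `cost_for` are hypothetical names, and the thought rate for moder8 is an assumption (set equal to the output rate), since the quoted pricing lists only input and output rates.

```python
from typing import NamedTuple


class Rates(NamedTuple):
    """USD per 1M tokens for one system."""
    input_per_m: float
    thought_per_m: float
    output_per_m: float


# In-memory stand-in for the proposed rate table, keyed by captured_from.
# moder8 uses the gemini-2.5-flash-lite figures quoted earlier;
# ASSUMPTION: its thought rate equals its output rate.
RATES = {
    "moder8": Rates(input_per_m=0.10, thought_per_m=0.40, output_per_m=0.40),
}


def cost_for(system: str, input_tokens: int, thought_tokens: int, output_tokens: int) -> float:
    """Return the total USD cost of one call, using the rates for its system."""
    rates = RATES[system]  # raises KeyError if the system has no rates defined
    return (
        input_tokens / 1_000_000 * rates.input_per_m
        + thought_tokens / 1_000_000 * rates.thought_per_m
        + output_tokens / 1_000_000 * rates.output_per_m
    )


print(f"${cost_for('moder8', 800, 0, 50):.5f}")
```

With rates held in one place, a price change means updating a single entry, and costs can be recomputed from the stored token counts.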