Lab 19: Token Budget — AI Tooling Field Guide

What you'll build

A small script that turns token math into budget math.

A lot of AI cost confusion comes from mixing up request shape, context window, and monthly usage. This lab makes those pieces visible. It estimates token counts, compares provider pricing, and projects what repeated usage would cost at a given call volume.

By default it uses a rough word-based approximation (about 1.3 tokens per word), which is enough to understand the economics. If you install tiktoken, the same script can switch to exact token counts for supported OpenAI models; counts for other providers remain estimates.

Run it

cd ai_ecosystem_labs
python3 19-token-budget/token_budget.py

Starting here? Quick setup

git clone https://github.com/BanditF/ai_ecosystem_labs
cd ai_ecosystem_labs
python3 19-token-budget/token_budget.py

No required dependencies. Optional: pip install tiktoken for exact token counts where supported.

Time guide. Setup: ~2 min. Working through it: 20–35 min if you walk through both counting and cost projection.

Walk through it

Four sections turn one sample request into a budget.

1. Provider catalog

PROVIDERS stores per-million-token pricing plus context window sizes for each model. That lets the same token counts be reused across cost comparison and context usage math.
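
As a rough illustration of that catalog shape, here is a minimal sketch. The field names (input_per_m, output_per_m, context_window) and both helper functions are hypothetical, and the real PROVIDERS in token_budget.py may be laid out differently. The prices are chosen to be consistent with the sample output in this lab, so check current provider pricing before relying on them.

```python
# Hypothetical catalog layout; the real PROVIDERS in token_budget.py may differ.
# Prices are USD per million tokens, consistent with this lab's sample output.
PROVIDERS = {
    "gpt-4o": {"input_per_m": 2.50, "output_per_m": 10.00, "context_window": 128_000},
    "gpt-4o-mini": {"input_per_m": 0.15, "output_per_m": 0.60, "context_window": 128_000},
}


def price_per_token(model: str, direction: str) -> float:
    """Convert a per-million price ("input" or "output") into a per-token price."""
    return PROVIDERS[model][f"{direction}_per_m"] / 1_000_000


def context_share(model: str, tokens: int) -> float:
    """Percentage of the model's context window that `tokens` would occupy."""
    return 100 * tokens / PROVIDERS[model]["context_window"]


# The same token counts feed both the cost math and the context math.
print(price_per_token("gpt-4o", "input"))
print(context_share("gpt-4o", 24 + 39))
```

Keeping price and context window in one record is what lets a single pair of token counts answer both "what does this cost?" and "how much of the window does it use?".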

2. Approximate versus exact token counts

count_tokens_approx() estimates tokens as the word count times 1.3. count_tokens() switches to tiktoken when it is installed and falls back to the approximation otherwise. Running both on the same text shows how far rough planning numbers sit from tokenizer-exact counts.
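
The fallback pattern described above might look roughly like this. It is a sketch, not the script's actual code: the rounding rule, the model parameter, and the broad fallback on any tiktoken failure are all assumptions.

```python
def count_tokens_approx(text: str) -> int:
    """Rough estimate: whitespace-separated words times 1.3, rounded (assumed rounding)."""
    return round(len(text.split()) * 1.3)


def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Use tiktoken when it is usable, otherwise fall back to the approximation."""
    try:
        import tiktoken  # optional dependency

        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text))
    except Exception:
        # tiktoken is missing, does not know this model name, or could not
        # load its encoding data; the rough estimate is good enough for planning.
        return count_tokens_approx(text)


sample = "Summarize the following support ticket in two sentences."
print("approx:", count_tokens_approx(sample))
print("best available:", count_tokens(sample))
```

Without tiktoken installed, both lines print the same number; with it installed, the gap between them is the approximation error for that text.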

3. Request-level cost estimation

estimate_cost() computes input cost, output cost, total cost, and context usage percentage for one request. compare_all_models() then sorts every configured model from cheapest to most expensive.
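
The arithmetic behind one row of the comparison can be sketched as follows. The function names come from this lab, but their signatures, the returned dictionary keys, and the catalog fields are assumptions. The prices are per million tokens and are consistent with the sample output further down.

```python
# Hypothetical three-model catalog (USD per million tokens); the script's may differ.
PROVIDERS = {
    "gpt-4o": {"input_per_m": 2.50, "output_per_m": 10.00, "context_window": 128_000},
    "gpt-4o-mini": {"input_per_m": 0.15, "output_per_m": 0.60, "context_window": 128_000},
    "o1": {"input_per_m": 15.00, "output_per_m": 60.00, "context_window": 200_000},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Cost and context usage for a single request."""
    spec = PROVIDERS[model]
    input_cost = input_tokens * spec["input_per_m"] / 1_000_000
    output_cost = output_tokens * spec["output_per_m"] / 1_000_000
    return {
        "model": model,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + output_cost,
        # Assumes input and output tokens both count against the window.
        "context_pct": 100 * (input_tokens + output_tokens) / spec["context_window"],
    }


def compare_all_models(input_tokens: int, output_tokens: int) -> list:
    """Estimate every configured model, cheapest first."""
    rows = [estimate_cost(name, input_tokens, output_tokens) for name in PROVIDERS]
    return sorted(rows, key=lambda row: row["total_cost"])


for row in compare_all_models(24, 39):
    print(f"{row['model']:<14} ${row['total_cost']:.5f}")
```

For gpt-4o with 24 input and 39 output tokens this works out to 24 × $2.50 / 1,000,000 + 39 × $10.00 / 1,000,000 = $0.00006 + $0.00039 = $0.00045, matching the sample table.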

4. Monthly projection

monthly_projection() multiplies the cost of a single request by calls per day and by a 30-day month. At 1,000 calls per day, a $0.00045 request becomes $13.50 a month, which is where a cheap-looking per-call number starts to feel operationally real.
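
A minimal sketch of that projection, assuming the function takes a per-call cost directly (the script's real signature and return shape may differ):

```python
DAYS_PER_MONTH = 30  # the lab's fixed 30-day month


def monthly_projection(cost_per_call: float, calls_per_day: int,
                       days: int = DAYS_PER_MONTH) -> dict:
    """Scale one request's cost up to a month of traffic."""
    calls_per_month = calls_per_day * days
    return {
        "calls_per_month": calls_per_month,
        "cost_per_call": cost_per_call,
        "monthly_cost": cost_per_call * calls_per_month,
    }


projection = monthly_projection(cost_per_call=0.00045, calls_per_day=1000)
print(f"Calls per month: {projection['calls_per_month']:,}")
print(f"Monthly estimate: ${projection['monthly_cost']:.2f}")
```

With the gpt-4o per-call cost from the sample request, this reproduces the 30,000 calls and $13.50 shown in the expected output.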

The code

The full script is 19-token-budget/token_budget.py in the repository.

Expected output

What the sample comparison looks like.

Token Budget Tool
──────────────────────────────────────────────────

Token counting method: word approximation (install tiktoken for exact counts)
Sample system+user: 24 input tokens
Sample response: 39 output tokens

Cost comparison for 24 input + 39 output tokens:
Model                       Total cost   Input cost  Output cost
─────────────────────────────────────────────────────────────────
gpt-4o-mini-batch         $    0.00001 $    0.00000 $    0.00001
gpt-4o-mini               $    0.00003 $    0.00000 $    0.00002
claude-3-haiku            $    0.00006 $    0.00001 $    0.00005
o1-mini                   $    0.00020 $    0.00003 $    0.00017
gpt-4o                    $    0.00045 $    0.00006 $    0.00039
claude-3-5-sonnet         $    0.00066 $    0.00007 $    0.00059
o1                        $    0.00270 $    0.00036 $    0.00234

Monthly projection (gpt-4o, 1000 calls/day):
  Calls per month: 30,000
  Cost per call: $0.000450
  Monthly estimate: $13.50

Monthly projection (gpt-4o-mini, 1000 calls/day):
  Calls per month: 30,000
  Cost per call: $0.000027
  Monthly estimate: $0.81

If you install tiktoken, the exact token counts may shift a bit, but the overall cost pattern should stay roughly the same.

Try this

Three ways to make the budget feel real.

Run the script and compare gpt-4o vs gpt-4o-mini cost for the same request. Notice how small per-call differences become meaningful once you imagine real traffic.
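Edit the script and increase calls_per_day to 10000. Re-run and check that the monthly projection scales linearly: ten times the calls should mean ten times the cost, so the gpt-4o estimate should rise from $13.50 to $135.00.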
Install tiktoken, then re-run. Compare the exact counts against the approximation and decide whether the difference matters for your use case.
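
To preview the first two exercises without editing the script, a small standalone sweep like the one below can help. It hard-codes the per-call costs from the expected output above. The helper name monthly_cost and the chosen volumes are illustrative, not part of token_budget.py.

```python
# Per-call costs taken from the lab's expected output (24 input + 39 output tokens).
COST_PER_CALL = {"gpt-4o": 0.000450, "gpt-4o-mini": 0.000027}


def monthly_cost(cost_per_call: float, calls_per_day: int, days: int = 30) -> float:
    """Monthly spend for a fixed per-call cost and daily call volume."""
    return cost_per_call * calls_per_day * days


for calls_per_day in (1_000, 10_000):
    for model, cost in COST_PER_CALL.items():
        total = monthly_cost(cost, calls_per_day)
        print(f"{model:<12} {calls_per_day:>6,} calls/day  ${total:,.2f}/month")
```

The gap between the two models stays at the same ratio at every volume; what changes is whether the absolute difference is large enough to matter for your budget.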

Concepts behind this

Read Cost & Tokens for the broader framing around context windows, caching, and why output tokens are often the expensive part.

This also pairs nicely with Lab 17, because token economics are one of the main reasons the “best” model is not always the right one.
