Score a small model catalog against real requirements like privacy, context window, speed, capability, and estimated monthly cost so model choice feels more like decision-making and less like brand preference.
What you'll build
A tiny ranking tool for practical tradeoffs.
This lab takes a handful of model profiles and turns them into something you can score. The script starts with default requirements, lets you override them with JSON on the command line, estimates monthly cost, and prints a ranked table.
It is intentionally opinionated and imperfect. That is the point. Model selection always involves weighting tradeoffs, and this script makes those weights visible instead of hiding them in gut feeling.
Run it
cd ai_ecosystem_labs
python3 17-model-selector/model_selector.py
python3 17-model-selector/model_selector.py '{"private": true}'
Starting here? Quick setup
git clone https://github.com/BanditF/ai_ecosystem_labs
cd ai_ecosystem_labs
python3 17-model-selector/model_selector.py
No dependencies needed. Pass a JSON object as the first argument to override requirements.
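A minimal sketch of how a JSON override could be merged over the defaults. The default values are taken from the requirements shown in the expected output; the names `DEFAULT_REQUIREMENTS` and `load_requirements` are hypothetical and may not match model_selector.py.

```python
import json

# Defaults mirror the requirements printed in the expected output.
DEFAULT_REQUIREMENTS = {
    "min_context_k": 32,
    "private": False,
    "cost_sensitivity": "medium",
    "speed": "medium",
    "min_capability": 6,
    "monthly_tokens": 500000,
    "strengths": ["general"],
}


def load_requirements(argv):
    """Merge a JSON object from argv[1], if present, over the defaults."""
    reqs = dict(DEFAULT_REQUIREMENTS)
    if len(argv) > 1:
        overrides = json.loads(argv[1])
        if not isinstance(overrides, dict):
            raise ValueError("requirements override must be a JSON object")
        reqs.update(overrides)
    return reqs


# In a real script argv would be sys.argv; a literal list keeps the demo self-contained.
reqs = load_requirements(["model_selector.py", '{"private": true}'])
print(json.dumps(reqs, indent=2))
```

Only the keys present in the override change; everything else keeps its default value.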
Time guide. Setup: ~2 min. Working through it: 20–35 min, mostly spent on the scoring logic and tradeoff weights.
Walk through it
Four script sections do the work.
1. Model catalog
MODELS is a hand-written catalog with context size, token pricing, speed, capability, privacy posture, and strengths. It is small enough to audit by eye, which is useful when you want to understand the scoring behavior.
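As a rough sketch of what such a catalog might look like, assuming a list of plain dicts. The field names, model names, and numbers below are illustrative assumptions, not copied from model_selector.py.

```python
# Hypothetical catalog shape; every name and value here is an assumption.
MODELS = [
    {
        "name": "hosted-large",
        "provider": "ExampleCloud",
        "context_k": 128,        # context window, in thousands of tokens
        "input_per_m": 2.50,     # USD per million input tokens
        "output_per_m": 10.00,   # USD per million output tokens
        "speed": 7,              # 1-10, higher is faster
        "capability": 9,         # 1-10, higher is stronger
        "private": False,        # True when data stays on your own hardware
        "strengths": ["general", "code"],
    },
    {
        "name": "local-small",
        "provider": "Local",
        "context_k": 32,
        "input_per_m": 0.0,
        "output_per_m": 0.0,
        "speed": 9,
        "capability": 4,
        "private": True,
        "strengths": ["general"],
    },
]


def audit(models):
    """Return one short line per model so the catalog can be checked by eye."""
    return [
        f'{m["name"]}: {m["context_k"]}K ctx, private={m["private"]}'
        for m in models
    ]


for line in audit(MODELS):
    print(line)
```

Keeping each profile as a flat dict makes it easy to add a model or adjust a number without touching the scoring code.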
2. Hard constraints first
score_model() treats privacy and minimum context as hard filters. If a model cannot satisfy either requirement, it gets a score of zero and disappears from the recommendations.
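The filter-first idea can be sketched as follows. The requirement keys `private` and `min_context_k` come from the requirements shown in the expected output; the model field names and the helper names are assumptions for illustration, not the script's actual code.

```python
def passes_hard_constraints(model, reqs):
    """Return False when the model cannot meet a non-negotiable requirement."""
    # A private deployment rules out any model that is not private.
    if reqs.get("private", False) and not model["private"]:
        return False
    # A model with too small a context window is excluded outright.
    if model["context_k"] < reqs.get("min_context_k", 0):
        return False
    return True


def score_with_filter(model, reqs, soft_score):
    """Zero out the score for filtered models; otherwise keep the soft score.

    soft_score stands in for whatever weighted scoring runs afterwards.
    """
    if not passes_hard_constraints(model, reqs):
        return 0.0
    return soft_score


hosted = {"name": "hosted-large", "private": False, "context_k": 128}
local = {"name": "local-small", "private": True, "context_k": 32}

print(score_with_filter(hosted, {"private": True}, 7.0))
print(score_with_filter(local, {"private": True}, 5.0))
```

Returning zero rather than a small penalty means no amount of speed or capability can rescue a model that fails a hard requirement.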
3. Weighted tradeoffs
After the hard constraints, the script adds points for cost fit, speed, capability floor, and task strengths. Filtering first and weighting second is a useful pattern even if you later replace the exact weights with your own.
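One way the weighted step could look, as a sketch only. The weights, the `budget_usd` parameter, the 1-10 scales, and the function name `soft_score` are all assumptions chosen for illustration; the script's real formula may differ.

```python
# Hypothetical weights; tune them to your own priorities.
WEIGHTS = {"cost": 2.0, "speed": 1.0, "capability": 3.0, "strength": 1.5}


def soft_score(model, reqs, monthly_cost, budget_usd=10.0):
    """Combine four fit values in [0, 1] into one weighted score."""
    # Cost fit: 1.0 when free, falling to 0.0 at or above the budget.
    cost_fit = max(0.0, 1.0 - monthly_cost / budget_usd)
    # Speed fit: the model's 1-10 speed rating scaled to [0, 1].
    speed_fit = model["speed"] / 10
    # Capability floor: all or nothing against the required minimum.
    capability_fit = 1.0 if model["capability"] >= reqs["min_capability"] else 0.0
    # Strength fit: fraction of requested strengths the model covers.
    wanted = set(reqs["strengths"])
    covered = wanted & set(model["strengths"])
    strength_fit = len(covered) / len(wanted) if wanted else 0.0

    total = (
        WEIGHTS["cost"] * cost_fit
        + WEIGHTS["speed"] * speed_fit
        + WEIGHTS["capability"] * capability_fit
        + WEIGHTS["strength"] * strength_fit
    )
    return round(total, 2)


model = {"speed": 8, "capability": 9, "strengths": ["general", "code"]}
reqs = {"min_capability": 6, "strengths": ["general"]}
print(soft_score(model, reqs, monthly_cost=2.5))
```

Because the weights live in one dict, changing what "best" means is a one-line edit rather than a rewrite.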
4. Cost estimation and CLI overrides
estimate_cost() assumes that 70% of monthly tokens are input and 30% are output. The CLI then lets you swap in different requirements with a JSON object, so the same catalog behaves very differently for private deployments, long-context work, or high-volume usage.
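The 70/30 cost estimate can be sketched like this. The function signature and the per-million-token prices are assumptions for illustration; check the script and current provider pricing for real values.

```python
def estimate_cost(monthly_tokens, input_per_m, output_per_m, input_share=0.7):
    """Estimate monthly USD spend from a token volume and per-million prices.

    input_share is the fraction of tokens treated as input; the rest is output.
    """
    input_tokens = monthly_tokens * input_share
    output_tokens = monthly_tokens * (1 - input_share)
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Illustrative prices: $2.50 per million input tokens, $10.00 per million output.
cost = estimate_cost(500_000, input_per_m=2.50, output_per_m=10.00)
print(f"${cost:.2f}")
```

With 500,000 tokens a month, that is 350,000 input tokens and 150,000 output tokens, or roughly $0.88 plus $1.50. Output tokens dominate the bill here even though they are the smaller share, which is why the split assumption matters.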
The code
model_selector.py
Expected output
What the default ranking looks like.
Model Selection Tool
Requirements: {
  "min_context_k": 32,
  "private": false,
  "cost_sensitivity": "medium",
  "speed": "medium",
  "min_capability": 6,
  "monthly_tokens": 500000,
  "strengths": [
    "general"
  ]
}
Model               Provider          Score   Est. $/mo   Context
────────────────────────────────────────────────────────────────────────
gpt-4o              OpenAI            7.15    $2.38       128K
claude-3-5-sonnet   Anthropic         7.13    $3.30       200K
llama-3.3-70b       Local (Ollama)    6.9     free        128K
gemini-1.5-pro      Google            6.88    $1.19       1000K
gpt-4o-mini         OpenAI            6.6     $0.14       128K
claude-3-haiku      Anthropic         5.59    $0.28       200K
llama-3.2-3b        Local (Ollama)    3.5     free        128K
Recommendation: gpt-4o (OpenAI)
Strengths: general, code, reasoning, vision
Note: prices change — verify at provider pricing pages before budgeting.
The exact winner depends on your inputs. For example, {"private": true} removes hosted models entirely, and a very large min_context_k filters out short-context options instead of merely penalizing them.
Try this
Three requirement sets worth comparing.
Run with default requirements and note the ranking. This is your baseline for the built-in catalog.
Run with '{"private": true}'. Observe that only local models remain in the table.
Run with '{"cost_sensitivity": "high", "monthly_tokens": 5000000}'. Compare how the ranking changes when spend becomes a first-class constraint.
Concepts behind this
Read Model Selection for the bigger framing: benchmarks, privacy tradeoffs, context windows, and when local models actually make sense.
This also pairs nicely with Lab 16. In practice, evals tell you the quality side of the tradeoff, and a selector like this helps combine that with cost and operational constraints.