Lab 17: Model Selector

Score a small model catalog against real requirements like privacy, context window, speed, capability, and estimated monthly cost so model choice feels more like decision-making and less like brand preference.

What you'll build

A tiny ranking tool for practical tradeoffs.

This lab takes a handful of model profiles and turns them into something you can score. The script starts with default requirements, lets you override them with JSON on the command line, estimates monthly cost, and prints a ranked table.

It is intentionally opinionated and imperfect. That is the point. Model selection always involves weighting tradeoffs, and this script makes those weights visible instead of hiding them in gut feeling.

Run it

cd ai_ecosystem_labs
python3 17-model-selector/model_selector.py
python3 17-model-selector/model_selector.py '{"private": true}'

Starting here? Quick setup:

git clone https://github.com/BanditF/ai_ecosystem_labs
cd ai_ecosystem_labs
python3 17-model-selector/model_selector.py

No dependencies needed. Pass a JSON object as the first argument to override requirements.
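
Here is a minimal sketch of the override step. It assumes the script shallow-merges the JSON argument into its defaults. DEFAULT_REQUIREMENTS and load_requirements() are hypothetical names. The default values are copied from the Requirements block in the expected output.

```python
import json

# Hypothetical name; values copied from the "Requirements" block in the expected output.
DEFAULT_REQUIREMENTS = {
    "min_context_k": 32,
    "private": False,
    "cost_sensitivity": "medium",
    "speed": "medium",
    "min_capability": 6,
    "monthly_tokens": 500_000,
    "strengths": ["general"],
}


def load_requirements(argv: list[str]) -> dict:
    """Return the defaults, shallow-merged with a JSON object from argv[1] if given.

    Assumption: keys are replaced wholesale, so '{"strengths": ["code"]}'
    replaces the default list instead of extending it.
    """
    requirements = dict(DEFAULT_REQUIREMENTS)  # copy so the defaults stay untouched
    if len(argv) > 1:
        overrides = json.loads(argv[1])  # JSON true/false become Python True/False
        if not isinstance(overrides, dict):
            raise SystemExit("requirements override must be a JSON object")
        requirements.update(overrides)
    return requirements
```

Called as load_requirements(sys.argv), the '{"private": true}' run would set only private to True and leave every other default in place.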

Time guide. Setup: ~2 min. Working through it: 20–35 min, mostly spent on the scoring logic and tradeoff weights.

Walk through it

Four script sections do the work.

1. Model catalog

MODELS is a hand-written catalog with context size, token pricing, speed, capability, privacy posture, and strengths. It is small enough to audit by eye, which is useful when you want to understand the scoring behavior.
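
One way to shape such a catalog is a frozen dataclass, so every entry carries the same fields. The field names are hypothetical. The gpt-4o strengths come from the expected output. The per-million-token prices are assumptions that happen to reproduce the expected-output cost column. The speed and capability values are illustrative placeholders, not the lab's numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """One catalog entry. Field names are hypothetical; the lab's MODELS may differ."""
    name: str
    provider: str
    context_k: int             # context window in thousands of tokens
    input_per_m: float         # USD per million input tokens (0.0 for local models)
    output_per_m: float        # USD per million output tokens
    speed: str                 # "fast" | "medium" | "slow"
    capability: int            # rough 1-10 rating, compared against min_capability
    private: bool              # True if prompts never leave your machine
    strengths: tuple[str, ...]


MODELS = [
    # Speed and capability values below are illustrative placeholders.
    ModelProfile("gpt-4o", "OpenAI", 128, 2.50, 10.00, "medium", 9, False,
                 ("general", "code", "reasoning", "vision")),
    ModelProfile("llama-3.2-3b", "Local (Ollama)", 128, 0.0, 0.0, "fast", 4, True,
                 ("general",)),
]
```

Freezing the entries keeps scoring code from mutating the catalog by accident. A missing field also fails at construction time instead of silently producing a bad score.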

2. Hard constraints first

score_model() treats privacy and minimum context as hard filters. If a model cannot satisfy either requirement, it scores zero and drops out of the recommendations.
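
Here is a minimal sketch of the hard-filter step. It assumes each catalog entry is a dict with context_k and private fields (hypothetical names). Context sizes come from the expected-output table, and only the two Ollama models are marked private. passes_hard_constraints() and eligible() are illustrative helpers, not the lab's code.

```python
# Context sizes (K tokens) copied from the expected-output table; "private" marks
# the locally hosted Ollama models. Field names are hypothetical.
CATALOG = [
    {"name": "gpt-4o", "context_k": 128, "private": False},
    {"name": "claude-3-5-sonnet", "context_k": 200, "private": False},
    {"name": "llama-3.3-70b", "context_k": 128, "private": True},
    {"name": "gemini-1.5-pro", "context_k": 1000, "private": False},
    {"name": "gpt-4o-mini", "context_k": 128, "private": False},
    {"name": "claude-3-haiku", "context_k": 200, "private": False},
    {"name": "llama-3.2-3b", "context_k": 128, "private": True},
]


def passes_hard_constraints(model: dict, requirements: dict) -> bool:
    """False means the model scores zero, whatever its other strengths."""
    if requirements.get("private", False) and not model["private"]:
        return False  # hosted model, but data must stay on your own hardware
    if model["context_k"] < requirements.get("min_context_k", 0):
        return False  # window too small; low pricing cannot compensate
    return True


def eligible(models: list[dict], requirements: dict) -> list[str]:
    """Names of the models that survive the hard filters, in catalog order."""
    return [m["name"] for m in models if passes_hard_constraints(m, requirements)]


print(eligible(CATALOG, {"private": True}))       # ['llama-3.3-70b', 'llama-3.2-3b']
print(eligible(CATALOG, {"min_context_k": 500}))  # ['gemini-1.5-pro']
```

Checking hard constraints before any weighting means no cost or speed advantage can rescue a model that would send data off-machine or cannot hold the prompt.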

3. Weighted tradeoffs

After the hard constraints, the script adds points for cost fit, speed, capability floor, and task strengths. The pattern (filter first, then weight what remains) stays useful even if you later replace the exact weights with your own.
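
The weighted step might look like the sketch below. Everything in it is an assumption invented for illustration: the weights, the COST_BUDGET comfort levels per cost_sensitivity setting, and the half-credit rule for slower models. It will not reproduce the scores in the expected output.

```python
# All weights and budgets are hypothetical; the lab's own values differ.
WEIGHTS = {"cost": 2.0, "speed": 1.5, "capability": 3.0, "strengths": 1.0}
SPEED_RANK = {"slow": 0, "medium": 1, "fast": 2}
COST_BUDGET = {"low": 50.0, "medium": 10.0, "high": 2.0}  # USD/month before cost fit drops


def soft_score(model: dict, req: dict, monthly_cost: float) -> float:
    """Add weighted points for each soft preference; assumes hard filters already passed."""
    score = 0.0

    # Cost fit: full credit within budget, shrinking smoothly as spend exceeds it.
    budget = COST_BUDGET[req.get("cost_sensitivity", "medium")]
    cost_fit = 1.0 if monthly_cost <= budget else budget / monthly_cost
    score += WEIGHTS["cost"] * cost_fit

    # Speed: full credit if at least as fast as requested, half credit otherwise.
    wanted_speed = SPEED_RANK[req.get("speed", "medium")]
    score += WEIGHTS["speed"] * (1.0 if SPEED_RANK[model["speed"]] >= wanted_speed else 0.5)

    # Capability floor: points only for models at or above min_capability.
    if model["capability"] >= req.get("min_capability", 0):
        score += WEIGHTS["capability"] * model["capability"] / 10

    # Strengths: an equal share for each requested strength the model lists.
    wanted = req.get("strengths", [])
    if wanted:
        hits = len(set(wanted) & set(model["strengths"]))
        score += WEIGHTS["strengths"] * hits / len(wanted)

    return round(score, 2)


req = {"cost_sensitivity": "medium", "speed": "medium", "min_capability": 6, "strengths": ["general"]}
strong = {"speed": "medium", "capability": 9, "strengths": ["general", "code"]}
small = {"speed": "fast", "capability": 4, "strengths": ["general"]}
print(soft_score(strong, req, monthly_cost=2.38))  # 7.2
print(soft_score(small, req, monthly_cost=0.0))    # 4.5
```

Scaling cost fit by budget / monthly_cost degrades gradually instead of cutting off at the budget. That gradual falloff is what separates a soft preference from a hard constraint.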

4. Cost estimation and CLI overrides

estimate_cost() assumes 70% of monthly tokens are input and 30% are output. The CLI then lets you swap in different requirements with a JSON blob, so the same catalog can rank very differently for private deployments, long-context work, or high-volume usage.
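
Here is a sketch of the 70/30 cost estimate. The estimate_cost() signature and the PRICES_PER_M table are assumptions. The prices were picked because they reproduce the Est. $/mo column at 500,000 tokens, not copied from the lab, and real prices drift.

```python
# USD per million tokens (input, output). Assumed values that reproduce the
# expected-output cost column; verify against provider pricing pages.
PRICES_PER_M = {
    "gpt-4o": (2.50, 10.00),
    "claude-3-5-sonnet": (3.00, 15.00),
    "gemini-1.5-pro": (1.25, 5.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-haiku": (0.25, 1.25),
    "llama-3.3-70b": (0.0, 0.0),  # local: no per-token charge
    "llama-3.2-3b": (0.0, 0.0),
}

INPUT_SHARE = 0.7  # 70% input, 30% output, as the lab describes


def estimate_cost(model: str, monthly_tokens: int) -> float:
    """Estimated monthly spend in USD for one model (hypothetical signature)."""
    in_price, out_price = PRICES_PER_M[model]
    input_tokens = monthly_tokens * INPUT_SHARE
    output_tokens = monthly_tokens - input_tokens
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


for name in PRICES_PER_M:
    cost = estimate_cost(name, 500_000)
    label = "free" if cost == 0 else f"${cost:.2f}"
    print(f"{name:<20}{label:>8}")
```

Cost scales linearly with monthly_tokens. In the 5,000,000-token scenario from Try this, every hosted estimate grows tenfold (gpt-4o goes from $2.38 to about $23.75), which gives the cost-fit weighting much more to push against.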

The code

model_selector.py

Expected output

What the default ranking looks like.

Model Selection Tool
Requirements: {
  "min_context_k": 32,
  "private": false,
  "cost_sensitivity": "medium",
  "speed": "medium",
  "min_capability": 6,
  "monthly_tokens": 500000,
  "strengths": [
    "general"
  ]
}

Model                  Provider              Score  Est. $/mo   Context
────────────────────────────────────────────────────────────────────────
gpt-4o                 OpenAI                 7.15      $2.38     128K
claude-3-5-sonnet      Anthropic              7.13      $3.30     200K
llama-3.3-70b          Local (Ollama)          6.9       free     128K
gemini-1.5-pro         Google                 6.88      $1.19    1000K
gpt-4o-mini            OpenAI                  6.6      $0.14     128K
claude-3-haiku         Anthropic              5.59      $0.28     200K
llama-3.2-3b           Local (Ollama)          3.5       free     128K

Recommendation: gpt-4o (OpenAI)
Strengths: general, code, reasoning, vision

Note: prices change — verify at provider pricing pages before budgeting.

The exact winner depends on your inputs. For example, {"private": true} removes hosted models entirely, and a very large min_context_k filters out every model whose context window falls short.

Try this

Three requirement sets worth comparing.

  1. Run with default requirements and note the ranking. This is your baseline for the built-in catalog.
  2. Run with '{"private": true}'. Observe that only local models remain in the table.
  3. Run with '{"cost_sensitivity": "high", "monthly_tokens": 5000000}'. Compare how the ranking changes when spend becomes a first-class constraint.

Concepts behind this

Read Model Selection for the bigger framing: benchmarks, privacy tradeoffs, context windows, and when local models actually make sense.

This also pairs nicely with Lab 16. In practice, evals tell you the quality side of the tradeoff, and a selector like this helps combine that with cost and operational constraints.