Turn a model endpoint into one boring local command so the rest of the
stack has a concrete surface to wrap.
What you'll build
A CLI that wraps any model endpoint.
By the end of this lab you will have a single command that sends a prompt
to a model (or a stand-in) and returns structured JSON. That surface is
what every other lab in this sequence wraps, validates, and extends.
The script uses a toy response by default so you can focus on interface
shape without needing an API key. Swapping in a real provider is one
environment variable.
Run it
cd ai_ecosystem_labs
python3 00-model-access/model_cli.py "Hello from the lab" --json
Starting here? Quick setup
git clone https://github.com/BanditF/ai_ecosystem_labs
cd ai_ecosystem_labs
python3 00-model-access/model_cli.py "Hello from the lab" --json
Requires Python 3.8+. No additional packages needed for this lab.
Time guide. Setup: ~2 min. Working through it: 15–25 min if you are mostly focused on the interface shape.
Why this piece exists
The rest of the stack needs something stable to call.
Without this layer, every tool, hook, and agent loop in your stack has to
re-decide how to reach a model: which provider, which API shape, which
credentials. That decision creeps into code that shouldn't care about it.
A single CLI surface fixes this. Callers send a prompt and get a response.
The details — endpoint, API key, model name — live in environment variables
and stay out of the tools themselves. This is exactly what Ollama does at a
larger scale: one local command that anything can call.
The code
model_cli.py
Walk through it
Four things worth noticing.
call_model() is a seam, not a detail
The function signature — prompt, endpoint,
api_key, model — is the stable contract.
Right now the body returns a toy string. To connect a real model,
you replace only the body, not the signature. Everything that calls
call_model() keeps working.
Environment variables keep secrets out of code
os.getenv("TOY_MODEL_ENDPOINT", "local://toy-model")
means the script works with no setup, but a real deployment just sets
an env var — no code change. This is the pattern every real model
client uses. Credentials never get committed.
--json makes output machine-readable
Without --json, the script prints a human string. With it,
it prints a JSON object a tool, agent, or test can parse reliably. The
shape — ok, endpoint, prompt,
response — stays the same every run. That predictability
is what later labs depend on.
argparse gives you stable flags for free
Using argparse instead of reading sys.argv
directly gives you --help, type checking, and consistent
error messages at no cost. A tool an AI calls should always have a
stable, documented flag interface — not "it works if you pass things
in the right order."
Expected output
What a successful run looks like.
Without --json:
Toy model response to: Hello from the lab
With --json:
{
"ok": true,
"endpoint": "local://toy-model",
"api_key": "missing",
"model": "toy-v1",
"prompt": "Hello from the lab",
"response": "Toy model response to: Hello from the lab"
}
If you see this shape, the lab is working. The api_key field
shows "missing" by default — that is expected. It will show
"set" once you point it at a real provider.
Try this
Three things to try before moving on.
Change the model name.
Run with --model gpt-4o and check the JSON output. The model
field changes, but nothing else does. This is the point — callers can
request different models without the rest of the stack caring.
Set a fake endpoint via environment variable.
Run TOY_MODEL_ENDPOINT=myhost://custom python3 00-model-access/model_cli.py "test" --json.
The endpoint field in the output reflects the env var. No code change needed.
This is how you would point the script at Ollama, OpenAI, or any compatible endpoint.
Try the real-call path.
Set TOY_MODEL_API_KEY to your OpenAI-compatible key and run the
script again. The output shape is identical — only the source of the answer
changes. Notice the script does not need to know which provider you are using
as long as the response format matches.
What you just built
The plumbing that everything else stands on.
You now have a local command that turns a prompt into a structured response.
It does not matter whether that response comes from a toy function, a local
Ollama instance, or the OpenAI API — the interface is the same. That is the
whole point of this lab: fix the surface so the layers above it do not have
to care about the details below.
In production systems, this layer is what Ollama, LiteLLM, and provider SDKs
provide. You just built the concept from scratch. Lab 01 adds the first
real capability on top of it.
Concepts behind this
The full decision framework for model access — hosted APIs, aggregators,
local runners, and when to use each — lives on the
model access concept page.
If you are connecting a real provider key, read
API key security first.