Just the mental model
You're done. Go back to Models, Protocols, or Agents if you want to go deeper on a concept.
Hands on
These labs turn the map into something you can touch. Each one builds a deliberately small version of one AI tooling idea with plain files, shell commands, and simple JSON.
Before you start
If you came here to understand the ecosystem, the concept pages already did the job. The labs are optional. They are for people who want to build the pieces themselves and feel how they connect.
Most labs are quick to start, but they are not all the same size. Each detail page has a rough setup and working-time guide, which is more honest than pretending every lab takes the same 15–30 minutes.
You're done. Go back to Models, Protocols, or Agents if you want to go deeper on a concept.
Start at Lab 00 and follow the sequence. Each lab adds one layer to the stack from the previous one.
Go straight to Lab 03b for a real MCP server quickstart. No prior labs required.
Quick reference
If you already know what you want, start here. The path guidance below is optional.
Optional companion pages: bootstrap a real local model endpoint and move the broker behind a backend boundary.
These labs are not the start of the whole journey. They start once you have picked a model-access path. The model access page is the part that gets you to that starting line.
These are not polished products. They are practice pieces. By the end, you should have a tiny local stack that grows from model access into a command-line interface, tools, structured boundaries, memory, coordination, approvals, logs, and evals.
The runnable files live in the repository's labs/ folder.
This page explains the idea; the linked artifacts are the pieces to
inspect, run, and change.
Use a scratch folder, Python 3, and your shell. Keep every tool boring: deterministic input, structured output, and clear failure modes.
Clone the labs repo
and run examples from the repo root, for example:
python3 labs/01-cli/term_count.py agent labs/sample_docs/*.txt
Choose a starting point
This is the path chooser. These cards are not numbered steps. They are just different starting points into the same lab map.
Repo and environment setup
The labs assume one thing only: you have chosen how you will reach a model. That choice can be a subscription product, a direct provider API, a managed model platform, a router, or a local host.
Pick the access path. Use starting paths if you are not sure whether you need a product, API, platform, router, or local host.
Reduce it to one boring interface. Before any agent tooling exists, prove you can make one repeatable request and inspect one repeatable response. If that path uses a provider key, read API key security before you wire it into a host.
Then enter lab 00. From that point on, the labs are about wrapping and extending the model surface, not choosing it.
Step shortcuts
This is shortcut guidance for the 1 → 2 → 3 setup sequence, not a second path chooser.
If you do not yet know whether you are using a subscription, API, platform, router, or local host, start with the access-choice step and work straight through.
If you already have a provider API key, local endpoint, or managed platform access, you have chosen the access path. Reduce it to one boring request shape, then continue into lab 00.
If you are arriving through a hosted CLI agent surface, treat that CLI as your starting model surface. Read lab 00 for translation, then jump to lab 01 if you want to focus on tooling layers.
Optional deeper entrance
This is not a separate curriculum. It is the deeper version of the same beginning for people who want to start from an actual local model instead of a toy model surface or hosted endpoint.
Pick a small instruct model with permissive enough terms and modest hardware needs. Optimize for learnability and easy hosting, not prestige.
Run one local host that can expose a stable endpoint. Avoid mixing several runtimes at the same time while learning the boundary.
Pick one artifact format and quantization that the runtime actually supports, then document exactly what was chosen and why.
Before any tooling work begins, show one repeatable prompt and response against the local endpoint.
Once the local endpoint is real, wrap it in the same boring interface shape used by lab 00. From there, the main lab path stays the same.
Artifact: optional real local bootstrap path. Reference pages: model access and local hosting and model artifacts.
Smoke-test and recovery path
Run all examples with labs/run_all.py.
Restore mutable lab files with labs/reset.py.
python3 labs/run_all.py
python3 labs/reset.py
| Lab | You need | You add | Why it matters |
|---|---|---|---|
| Bootstrap | A decision about model access | One chosen path plus one repeatable request or CLI surface | The rest of the lab path only makes sense after the model surface exists. |
| Optional pre-bootstrap | No usable model surface yet | A real local model endpoint | This is the deeper version of the same beginning, not a different journey. |
| 0 | A way to talk to a model | A tiny model CLI | Model access becomes something you can actually use. |
| 1 | A useful action outside chat | A dumb CLI an AI can call | The model can now rely on a deterministic capability. |
| 2 | Machine-readable results | A stable JSON wrapper | Tool calls become easier to validate, log, and replay. |
| 3 | Tool discovery | A tiny protocol adapter | A host can discover and call tools without knowing CLI flags. |
| 4 | Repeatable judgment | A skill/procedure file | The system learns when and how to use the tool well. |
| 5 | Boundary checks | A lifecycle hook | Policy and logging happen without changing the tool. |
| 6 | Multi-step work | A tiny agent loop | The system can observe, decide, act, and evaluate. |
| 7 | Durable state | A memory/task graph | Work survives restarts and dependency order becomes visible. |
| 8 | More than one worker | A workspace coordinator | Claims and handoffs keep parallel work from colliding. |
| 9 | A usable control surface | A host-like CLI | Users can inspect tools, approve calls, and see results. |
| 10 | Trust and repeatability | Governance, evals, and tool-call logs | Actions become auditable and failures become visible. |
| 11 | The whole shape | A capstone flow | The pieces form one small governed workflow. |
Lab rules
A useful agent tool has predictable flags, predictable output, and predictable errors. Fancy comes later.
JSON makes tool results easy to inspect, log, validate, replay, and pass between layers.
If an agent can act, you should be able to see what it tried, what happened, and why the next step was chosen.
Main spine
Each lab now has its own page. Use this hub to keep the sequence in view, then open the dedicated page when you want the fuller teaching copy, runnable command, artifact links, and a real-world analog.
Choose the path to the model, then reduce it to one stable request or CLI surface before you build any tooling on top.
Real-world analog: curl for proving one boring request/response path.
Turn model access into one repeatable local command so the rest of the stack has something concrete to wrap.
Real-world analog: Ollama CLI.
Build one deterministic capability with stable flags, useful exit codes, and no hidden state.
Real-world analog: Git CLI, especially commands like git status and git grep.
Keep the tool, but give it one machine-readable output shape that callers can validate, log, and replay.
Real-world analog: ripgrep's JSON mode.
Expose discovery and tool calling as a protocol boundary instead of making every host learn raw CLI flags.
Real-world analog: Model Context Protocol.
Use the official Python MCP SDK to build a server that exposes one tool, runs over stdio, and connects to a real host.
Real-world analog: any MCP server in the official MCP servers registry.
Separate the procedure from the tool so the usage pattern is reusable, reviewable, and teachable.
Real-world analog: Make and similar committed automation scripts.
Add policy and logging around the tool without editing the tool itself.
Real-world analog: Git hooks.
Make the observe-decide-act-evaluate loop visible before you let a real model hide that control flow.
Real-world analog: LangGraph.
Persist task state and dependencies so work survives restarts and the next unblocked task is always visible.
Real-world analog: Taskwarrior.
Coordinate multiple workers with a queue, claims, and a readable handoff trail.
Real-world analog: GitHub Actions.
Give the user a control surface that can list tools, request approval, and dispatch approved calls.
Real-world analog: Aider.
Surround the stack with audit records, replayable evals, and explicit policy outcomes.
Real-world analog: OpenTelemetry.
Combine host, tools, policy, durable state, and evals into one small governed workflow.
Real-world analog: OpenHands.
Optional paths
These are not mandatory next steps in the main spine. They are useful companion tracks when you want to go deeper into persistent assistants or credential boundaries.
This optional extension turns the late-game comparison into working code: a tiny gateway with channels, durable memory, skills, routing, scheduling, approvals, and logs.
Real-world analogs: OpenClaw (an early-stage open-source project) and Hermes Agent (a research-adjacent open-source project).
This optional security side path adds a narrow local proxy so a host or agent can use model access without receiving the raw provider key directly.
Real-world analog: internal API gateways and local credential brokers.
Zero-shot, few-shot, chain-of-thought, and structured output side by side.
Build a retrieval-augmented generation pipeline from scratch in pure Python.
Systematic evaluation framework — test cases, assertion types, scoring, tag filtering.
Decision tool that scores models against your requirements — cost, speed, privacy.
Prepare and validate an instruction fine-tuning dataset. No GPU required.
Probe a real local runtime, list models, and prove one boring local response before you rejoin the main lab spine.
Real-world analogs: Ollama, LM Studio, and OpenAI-compatible local servers.
Replace the localhost-only teaching split with a stronger production-shaped boundary where the backend owns the provider secret and the host presents only backend credentials.
Real-world analogs: internal API gateways, backend-for-frontend services, managed-identity-backed apps.
Where the lab stack differs from real tools
Your toy protocol is not MCP, your task graph is not Beads (a niche open-source project), and your coordinator is not Gas Town (a small open-source coordination project). But the boundaries should feel familiar: typed calls, durable state, claims, handoffs, approvals, and logs.
Look at the tooling catalog and ask the same questions: what is open, what is hosted, what runs locally, where does memory live, who approves actions, and what becomes long-running once the system leaves the terminal?