Labs — AI Tooling Field Guide

Before you start

Labs are hands-on. Bring a terminal and Python 3.

If you came here to understand the ecosystem, the concept pages already did the job. The labs are optional. They are for people who want to build the pieces themselves and feel how they connect.

Most labs are quick to start, but they are not all the same size. Each detail page has a rough setup and working-time guide, which is more honest than pretending every lab takes the same 15–30 minutes.

Just the mental model

You're done. Go back to Models, Protocols, or Agents if you want to go deeper on a concept.

Build the pieces

Start at Lab 00 and follow the sequence. Each lab adds one layer to the stack from the previous one.

Jump to MCP

Go straight to Lab 03b for a real MCP server quickstart. No prior labs required.

Quick reference

All labs, in one scannable list.

If you already know what you want, start here. The path guidance below is optional.

Optional companion pages: bootstrap a real local model endpoint and move the broker behind a backend boundary.

Make the hidden boundaries visible.

These labs are not the start of the whole journey. They start once you have picked a model-access path. The model access page is the part that gets you to that starting line.

These are not polished products. They are practice pieces. By the end, you should have a tiny local stack that grows from model access into a command-line interface, tools, structured boundaries, memory, coordination, approvals, logs, and evals.

The runnable files live in the repository's labs/ folder. This page explains the idea; the linked artifacts are the pieces to inspect, run, and change.

Suggested setup

Use a scratch folder, Python 3, and your shell. Keep every tool boring: deterministic input, structured output, and clear failure modes.

Clone the labs repo and run examples from the repo root, for example: python3 labs/01-cli/term_count.py agent labs/sample_docs/*.txt

Choose a starting point

Pick the reader-type card that matches where you are.

This is the path chooser. These cards are not numbered steps. They are just different starting points into the same lab map.

Repo and environment setup

Get the repo ready before lab 00

The labs assume one thing only: you have chosen how you will reach a model. That choice can be a subscription product, a direct provider API, a managed model platform, a router, or a local host.

1. Pick the access path. Use starting paths if you are not sure whether you need a product, API, platform, router, or local host.

2. Reduce it to one boring interface. Before any agent tooling exists, prove you can make one repeatable request and inspect one repeatable response. If that path uses a provider key, read API key security before you wire it into a host.

3. Then enter lab 00. From that point on, the labs are about wrapping and extending the model surface, not choosing it.

Step shortcuts

Use one of these only if the numbered bootstrap steps above already partly describe your situation.

This is shortcut guidance for the 1 → 2 → 3 setup sequence, not a second path chooser.

Start at step 1

If you do not yet know whether you are using a subscription, API, platform, router, or local host, start with the access-choice step and work straight through.

Skip to step 2

If you already have a provider API key, local endpoint, or managed platform access, you have chosen the access path. Reduce it to one boring request shape, then continue into lab 00.

Skip to step 3

If you are arriving through a hosted CLI agent surface, treat that CLI as your starting model surface. Read lab 00 for translation, then jump to lab 01 if you want to focus on tooling layers.

Optional deeper entrance

Want to start from a real local endpoint? Do it before bootstrap.

This is not a separate curriculum. It is the deeper version of the same beginning for people who want to start from an actual local model instead of a toy model surface or hosted endpoint.

Choose a tiny open model

Pick a small instruct model with permissive enough terms and modest hardware needs. Optimize for learnability and easy hosting, not prestige.

Choose one runtime

Run one local host that can expose a stable endpoint. Avoid mixing several runtimes at the same time while learning the boundary.

Download one artifact variant

Pick one artifact format and quantization that the runtime actually supports, then document exactly what was chosen and why.

Prove one local request

Before any tooling work begins, show one repeatable prompt and response against the local endpoint.

Translate into lab 00

Once the local endpoint is real, wrap it in the same boring interface shape used by lab 00. From there, the main lab path stays the same.

Artifact: optional real local bootstrap path. Reference pages: model access and local hosting and model artifacts.

Smoke-test and recovery path

Use the smoke-test path when you want to verify or recover

Run all examples with labs/run_all.py. Restore mutable lab files with labs/reset.py.

python3 labs/run_all.py
python3 labs/reset.py

The mini stack you will assemble

Lab	You need	You add	Why it matters
Bootstrap	A decision about model access	One chosen path plus one repeatable request or CLI surface	The rest of the lab path only makes sense after the model surface exists.
Optional pre-bootstrap	No usable model surface yet	A real local model endpoint	This is the deeper version of the same beginning, not a different journey.
0	A way to talk to a model	A tiny model CLI	Model access becomes something you can actually use.
1	A useful action outside chat	A dumb CLI an AI can call	The model can now rely on a deterministic capability.
2	Machine-readable results	A stable JSON wrapper	Tool calls become easier to validate, log, and replay.
3	Tool discovery	A tiny protocol adapter	A host can discover and call tools without knowing CLI flags.
4	Repeatable judgment	A skill/procedure file	The system learns when and how to use the tool well.
5	Boundary checks	A lifecycle hook	Policy and logging happen without changing the tool.
6	Multi-step work	A tiny agent loop	The system can observe, decide, act, and evaluate.
7	Durable state	A memory/task graph	Work survives restarts and dependency order becomes visible.
8	More than one worker	A workspace coordinator	Claims and handoffs keep parallel work from colliding.
9	A usable control surface	A host-like CLI	Users can inspect tools, approve calls, and see results.
10	Trust and repeatability	Governance, evals, and tool-call logs	Actions become auditable and failures become visible.
11	The whole shape	A capstone flow	The pieces form one small governed workflow.

Lab rules

Keep the pieces small enough to understand.

Prefer boring tools

A useful agent tool has predictable flags, predictable output, and predictable errors. Fancy comes later.
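
As a rough illustration of "boring", here is a minimal sketch of a term counter with fixed arguments, one output shape, and explicit exit codes. It is not the repository's term_count.py; the names count_term and run, the program name, and the exit-code convention are assumptions made for this example.

```python
import argparse
import sys

# Hypothetical exit-code convention for this sketch.
EXIT_OK, EXIT_NO_MATCH, EXIT_ERROR = 0, 1, 2


def count_term(term: str, text: str) -> int:
    """Count case-insensitive, non-overlapping occurrences of term in text."""
    return text.lower().count(term.lower())


def run(argv, out=None, err=None) -> int:
    """Print one 'path<TAB>count' line per file and return an exit code."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    parser = argparse.ArgumentParser(prog="term_count_sketch")
    parser.add_argument("term")
    parser.add_argument("files", nargs="+")
    try:
        args = parser.parse_args(argv)
    except SystemExit:
        # argparse exits on bad usage; turn that into a return value.
        return EXIT_ERROR

    total = 0
    for path in sorted(args.files):  # sorted so output order is deterministic
        try:
            with open(path, encoding="utf-8") as handle:
                count = count_term(args.term, handle.read())
        except OSError as exc:
            print(f"error: {path}: {exc.strerror}", file=err)
            return EXIT_ERROR
        print(f"{path}\t{count}", file=out)
        total += count
    return EXIT_OK if total else EXIT_NO_MATCH
```

A caller would pass the argument list explicitly, for example run(["agent", "notes.txt"]), and branch on the returned code instead of parsing error text.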

Use JSON at boundaries

JSON makes tool results easy to inspect, log, validate, replay, and pass between layers.
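
One way to picture this is a single envelope that every tool result passes through. The sketch below is an assumption for illustration: the key names (tool, args, ok, data, error) and the helper names wrap_result and validate_envelope are hypothetical, not the shape the labs necessarily use.

```python
import json

# Hypothetical envelope keys for this sketch.
REQUIRED_KEYS = {"tool", "args", "ok", "data", "error"}


def wrap_result(tool: str, args: dict, ok: bool, data=None, error=None) -> str:
    """Serialize one tool result into a fixed-shape JSON envelope."""
    envelope = {"tool": tool, "args": args, "ok": ok, "data": data, "error": error}
    # sort_keys keeps the serialized text stable, so logs and replays diff cleanly.
    return json.dumps(envelope, sort_keys=True)


def validate_envelope(text: str) -> dict:
    """Parse an envelope and reject anything that does not match the shape."""
    envelope = json.loads(text)
    if not isinstance(envelope, dict):
        raise ValueError("envelope must be a JSON object")
    missing = REQUIRED_KEYS - envelope.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(envelope["ok"], bool):
        raise ValueError("'ok' must be a boolean")
    return envelope
```

Because the envelope is plain text with a stable key order, the same string can be validated by the caller, appended to a log, and replayed later without special handling.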

Log decisions

If an agent can act, you should be able to see what it tried, what happened, and why the next step was chosen.
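
A decision log can be as small as one JSON object per line. The following is a minimal sketch under assumed field names (step, tried, outcome, next_reason); it writes to any file-like object and deliberately leaves out timestamps so that replays stay byte-for-byte comparable.

```python
import io
import json


def log_decision(stream, step: int, tried: str, outcome: str, next_reason: str) -> None:
    """Append one decision record as a single JSON line."""
    record = {"step": step, "tried": tried, "outcome": outcome, "next_reason": next_reason}
    stream.write(json.dumps(record, sort_keys=True) + "\n")


def read_log(stream) -> list:
    """Read every non-empty line back into a list of records."""
    return [json.loads(line) for line in stream if line.strip()]


# Usage: an in-memory stream stands in for a log file here.
log = io.StringIO()
log_decision(log, 0, "term_count agent a.txt", "count=0", "no match, try next file")
log_decision(log, 1, "term_count agent b.txt", "count=3", "goal met, stop")
log.seek(0)
records = read_log(log)
```

Each record answers the three questions from the rule above: what was tried, what happened, and why the next step was chosen.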

Main spine

Build from model access up to a governed agent

Each lab has its own page. Use this hub to keep the sequence in view, then open the dedicated page when you want the fuller teaching copy, runnable command, artifact links, and a real-world analog.

Bootstrap the model surface

Choose the path to the model, then reduce it to one stable request or CLI surface before you build any tooling on top.

Real-world analog: curl for proving one boring request/response path.

Open the bootstrap page.

Start with model access

Turn model access into one repeatable local command so the rest of the stack has something concrete to wrap.

Real-world analog: Ollama CLI.

Open lab 00.

Build a dumb CLI tool an AI can call

Build one deterministic capability with stable flags, useful exit codes, and no hidden state.

Real-world analog: Git CLI, especially commands like git status and git grep.

Open lab 01.

Wrap the CLI in a stable JSON interface

Keep the tool, but give it one machine-readable output shape that callers can validate, log, and replay.

Real-world analog: ripgrep's JSON mode.

Open lab 02.

Build a tiny protocol adapter

Expose discovery and tool calling as a protocol boundary instead of making every host learn raw CLI flags.

Real-world analog: Model Context Protocol.

Open lab 03.
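
To make the idea of a protocol boundary concrete, here is a toy sketch with two methods: one for discovery and one for calling. The method names (list_tools, call_tool), the request fields, and the registry layout are assumptions for this example. This is not the MCP wire format.

```python
import json


def term_count(term: str, text: str) -> dict:
    """The underlying tool: a plain function standing in for a CLI."""
    return {"count": text.lower().count(term.lower())}


# Hypothetical registry: tool name -> description, parameter names, handler.
REGISTRY = {
    "term_count": {
        "description": "Count occurrences of a term in a text.",
        "params": ["term", "text"],
        "handler": term_count,
    }
}


def handle(request_text: str) -> str:
    """Answer one JSON request with one JSON response."""
    try:
        request = json.loads(request_text)
    except json.JSONDecodeError:
        return json.dumps({"ok": False, "error": "invalid JSON"})
    if not isinstance(request, dict):
        return json.dumps({"ok": False, "error": "request must be an object"})

    method = request.get("method")
    if method == "list_tools":
        # Discovery: describe the tools without exposing their handlers.
        listing = [
            {"name": name, "description": spec["description"], "params": spec["params"]}
            for name, spec in sorted(REGISTRY.items())
        ]
        return json.dumps({"ok": True, "tools": listing})
    if method == "call_tool":
        name = request.get("name")
        spec = REGISTRY.get(name) if isinstance(name, str) else None
        arguments = request.get("arguments", {})
        if spec is None:
            return json.dumps({"ok": False, "error": "unknown tool"})
        if not isinstance(arguments, dict) or sorted(arguments) != sorted(spec["params"]):
            return json.dumps({"ok": False, "error": "arguments do not match params"})
        return json.dumps({"ok": True, "result": spec["handler"](**arguments)})
    return json.dumps({"ok": False, "error": "unknown method"})
```

The host only ever sees names, descriptions, and parameter lists. It never needs to know how the tool is invoked underneath.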

03b

Build a real MCP server

Use the official Python MCP SDK to build a server that exposes one tool, runs over stdio, and connects to a real host.

Real-world analog: any MCP server in the official MCP servers registry.

Open lab 03b.

Write a skill/procedure file

Separate the procedure from the tool so the usage pattern is reusable, reviewable, and teachable.

Real-world analog: Make and similar committed automation scripts.

Open lab 04.

Add a hook/lifecycle automation example

Add policy and logging around the tool without editing the tool itself.

Real-world analog: Git hooks.

Open lab 05.
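
The key property here is that the tool stays untouched while policy and logging wrap around it. A minimal sketch, assuming hypothetical names (with_hooks, deny_secrets, record_call) and a stand-in tool called shout:

```python
def with_hooks(tool, pre_hooks=(), post_hooks=()):
    """Return a callable that runs hooks around tool without modifying it."""
    def wrapped(**kwargs):
        for hook in pre_hooks:
            hook(kwargs)  # a pre-hook blocks the call by raising
        result = tool(**kwargs)
        for hook in post_hooks:
            hook(kwargs, result)
        return result
    return wrapped


def shout(text: str) -> str:
    """The unmodified tool."""
    return text.upper()


audit_log = []


def deny_secrets(kwargs):
    """Policy pre-hook: refuse input that mentions a secret."""
    if "secret" in kwargs.get("text", "").lower():
        raise PermissionError("policy: text mentions a secret")


def record_call(kwargs, result):
    """Logging post-hook: keep a record of every completed call."""
    audit_log.append({"args": dict(kwargs), "result": result})


guarded_shout = with_hooks(shout, pre_hooks=[deny_secrets], post_hooks=[record_call])
```

Calling guarded_shout(text="hello") returns the same value shout would and leaves one audit record; a blocked call raises before the tool runs and leaves no record.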

Build a tiny agent loop

Make the observe-decide-act-evaluate loop visible before you let a real model hide that control flow.

Real-world analog: LangGraph.

Open lab 06.
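
The loop can be made visible with a fixed rule in place of the model. The sketch below is an illustration only: run_loop, its arguments, and the trace fields are hypothetical names, and the "tool" is a simple term count over in-memory documents.

```python
def run_loop(docs: dict, term: str, goal: int, max_steps: int = 10) -> dict:
    """Read documents until at least `goal` mentions of `term` are found."""
    found = 0
    unread = sorted(docs)  # deterministic reading order
    trace = []
    for step in range(max_steps):
        # Observe: snapshot the state the decision will be based on.
        observation = {"found": found, "unread": len(unread)}

        # Decide: a fixed rule stands in for the model here.
        if found >= goal:
            action, reason = "stop", "goal met"
        elif not unread:
            action, reason = "stop", "nothing left to read"
        else:
            action, reason = "count", f"read {unread[0]}"
        trace.append(
            {"step": step, "observation": observation, "action": action, "reason": reason}
        )
        if action == "stop":
            break

        # Act: call the deterministic tool on the next document.
        name = unread.pop(0)
        result = docs[name].lower().count(term.lower())

        # Evaluate: fold the result into state for the next observation.
        found += result
    return {"found": found, "done": found >= goal, "trace": trace}
```

The max_steps bound matters even in a toy: it guarantees the loop ends when neither stop condition is reached.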

Build a memory/task graph

Persist task state and dependencies so work survives restarts and the next unblocked task is always visible.

Real-world analog: Taskwarrior.

Open lab 07.
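
A small sketch of the idea, assuming a hypothetical JSON layout where each task has a status and a list of dependency IDs. The function names save, load, and next_unblocked are made up for this example.

```python
import json
from pathlib import Path


def save(path, tasks: dict) -> None:
    """Persist the whole graph as readable, stably ordered JSON."""
    Path(path).write_text(json.dumps(tasks, indent=2, sort_keys=True), encoding="utf-8")


def load(path) -> dict:
    """Load the graph back after a restart."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def next_unblocked(tasks: dict):
    """Return the first 'todo' task whose dependencies are all 'done', else None."""
    for task_id in sorted(tasks):  # sorted so the answer is deterministic
        task = tasks[task_id]
        deps_done = all(tasks[dep]["status"] == "done" for dep in task["deps"])
        if task["status"] == "todo" and deps_done:
            return task_id
    return None
```

With a graph such as {"fetch": {"status": "todo", "deps": []}, "build": {"status": "todo", "deps": ["fetch"]}}, the next task is fetch until it is marked done, and the answer is the same after a save and load.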

Build a workspace coordinator

Coordinate multiple workers with a queue, claims, and a readable handoff trail.

Real-world analog: GitHub Actions.

Open lab 08.
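
The following in-memory sketch shows claims and handoffs in their simplest form. The Coordinator class and its method names are hypothetical, and a threading.Lock stands in for whatever locking a multi-process version would need.

```python
import threading


class Coordinator:
    """Hand out tasks one claim at a time and record a readable trail."""

    def __init__(self, task_ids):
        self._lock = threading.Lock()
        self._queue = list(task_ids)
        self.claims = {}  # task id -> current owner
        self.trail = []   # human-readable history of claims and handoffs

    def claim(self, worker: str):
        """Give the next queued task to worker, or None if the queue is empty."""
        with self._lock:
            if not self._queue:
                return None
            task = self._queue.pop(0)
            self.claims[task] = worker
            self.trail.append(f"{worker} claimed {task}")
            return task

    def hand_off(self, task: str, from_worker: str, to_worker: str, note: str) -> None:
        """Transfer ownership; only the current owner may hand a task off."""
        with self._lock:
            if self.claims.get(task) != from_worker:
                raise PermissionError(f"{from_worker} does not own {task}")
            self.claims[task] = to_worker
            self.trail.append(f"{from_worker} handed {task} to {to_worker}: {note}")
```

Because every claim and handoff goes through the lock, two workers cannot receive the same task, and the trail reads as a plain history of who held what.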

Build a host-like CLI agent

Give the user a control surface that can list tools, request approval, and dispatch approved calls.

Real-world analog: Aider.

Open lab 09.
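
The approval step can be sketched as a function the host calls before dispatching. Everything named here (HOST_TOOLS, list_tools, dispatch, ask_on_terminal) is hypothetical; the approval function is passed in so it can be a terminal prompt in real use and a fixed answer in tests.

```python
# Hypothetical tool table: name -> callable.
HOST_TOOLS = {
    "term_count": lambda term, text: text.lower().count(term.lower()),
}


def list_tools(tools: dict) -> list:
    """Show the user which tools exist."""
    return sorted(tools)


def dispatch(tools: dict, name: str, args: dict, approve) -> dict:
    """Run a tool only if it exists and the approval callback says yes."""
    if name not in tools:
        return {"ok": False, "error": f"unknown tool: {name}"}
    if not approve(name, args):
        return {"ok": False, "error": "denied"}
    return {"ok": True, "data": tools[name](**args)}


def ask_on_terminal(name: str, args: dict) -> bool:
    """Interactive approval: anything other than 'y' is a refusal."""
    answer = input(f"Run {name} with {args}? [y/N] ")
    return answer.strip().lower() == "y"
```

In a terminal session the call would be dispatch(HOST_TOOLS, "term_count", {...}, ask_on_terminal). Defaulting to refusal keeps an accidental Enter key from approving a call.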

Add governance, evals, and logging around a tool call

Surround the stack with audit records, replayable evals, and explicit policy outcomes.

Real-world analog: OpenTelemetry.

Open lab 10.

Wire the tiny stack together

Combine host, tools, policy, durable state, and evals into one small governed workflow.

Real-world analog: OpenHands.

Open lab 11.

Optional paths

Take the side paths when they match your starting point.

These are not mandatory next steps in the main spine. They are useful companion tracks when you want to go deeper into persistent assistants or credential boundaries.

Optional

Build a toy persistent assistant platform

This optional extension turns the late-game comparison into working code: a tiny gateway with channels, durable memory, skills, routing, scheduling, approvals, and logs.

Real-world analogs: OpenClaw (an early-stage open-source project) and Hermes Agent (a research-adjacent open-source project).

Open the persistent assistant platform lab.

Security

Security side path: put model access behind a localhost broker

This optional security side path adds a narrow local proxy so a host or agent can use model access without receiving the raw provider key directly.

Real-world analog: internal API gateways and local credential brokers.

Open the local-broker side path.

Prompt Patterns

Zero-shot, few-shot, chain-of-thought, and structured output side by side.

Open lab 14.

RAG Pipeline

Build a retrieval-augmented generation pipeline from scratch in pure Python.

Open lab 15.
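
The retrieval half of such a pipeline fits in a few lines of standard-library Python. This is a sketch under simplifying assumptions: it scores chunks with bag-of-words cosine similarity rather than learned embeddings, the function names are hypothetical, and the generation step is left out because it depends on your model access path.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list, k: int = 2) -> str:
    """Assemble the augmented prompt that would be sent to a model."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The prompt string is the boundary: everything before it is retrieval you can test deterministically, and everything after it is the model call.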

Eval Suite

A systematic evaluation framework: test cases, assertion types, scoring, and tag filtering.

Open lab 16.
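
The four parts named above (test cases, assertion types, scoring, tag filtering) can be sketched in one small function. The case fields, the two assertion types, and the name run_suite are assumptions for this example; a plain function stands in for the system under test.

```python
# Hypothetical assertion types: name -> check(output, expected).
ASSERTIONS = {
    "equals": lambda output, expected: output == expected,
    "contains": lambda output, expected: expected in output,
}


def run_suite(system, cases: list, tag=None) -> dict:
    """Run the cases (optionally only those with `tag`) and score the results."""
    selected = [c for c in cases if tag is None or tag in c["tags"]]
    results = []
    for case in selected:
        output = system(case["input"])
        passed = ASSERTIONS[case["assert"]](output, case["expected"])
        results.append({"name": case["name"], "passed": passed})
    # Score is the fraction of selected cases that passed; 0.0 if none ran.
    score = sum(r["passed"] for r in results) / len(results) if results else 0.0
    return {"score": score, "results": results}


# Usage: str.upper stands in for a model or tool under evaluation.
CASES = [
    {"name": "upper", "input": "hi", "assert": "equals", "expected": "HI", "tags": ["smoke"]},
    {"name": "has-ell", "input": "hello", "assert": "contains", "expected": "ELL", "tags": ["slow"]},
    {"name": "fails", "input": "x", "assert": "equals", "expected": "x", "tags": ["smoke"]},
]
report = run_suite(str.upper, CASES, tag="smoke")
```

Keeping per-case results alongside the score means a failing run shows which case broke, not just that the number dropped.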

Model Selector

A decision tool that scores models against your requirements: cost, speed, and privacy.

Open lab 17.

Fine-tune Dataset Prep

Prepare and validate an instruction fine-tuning dataset. No GPU required.

Open lab 18.

Token Budget

Count tokens, compare costs across providers, and project monthly spend.

Open lab 19.
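
The arithmetic behind a spend projection is short enough to show directly. This sketch makes two loud assumptions: token counts are estimated at roughly four characters per token, which is only a rule of thumb and varies by tokenizer and language, and all prices are placeholders you supply, not real provider rates.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; a real tokenizer will give different numbers."""
    return math.ceil(len(text) / chars_per_token)


def monthly_cost(
    input_tokens: int,
    output_tokens: int,
    requests_per_day: int,
    price_in_per_million: float,
    price_out_per_million: float,
    days: int = 30,
) -> float:
    """Project spend from per-request token counts and per-million-token prices."""
    per_request = (
        input_tokens * price_in_per_million + output_tokens * price_out_per_million
    ) / 1_000_000
    return per_request * requests_per_day * days


def compare(providers: dict, input_tokens: int, output_tokens: int, requests_per_day: int) -> list:
    """Return (name, monthly cost) pairs, cheapest first. Prices are caller-supplied."""
    costs = [
        (name, monthly_cost(input_tokens, output_tokens, requests_per_day, p_in, p_out))
        for name, (p_in, p_out) in providers.items()
    ]
    return sorted(costs, key=lambda pair: pair[1])
```

For example, 1,000 input tokens and 500 output tokens per request, at placeholder prices of 2.00 and 6.00 per million tokens, cost 0.005 per request; at 100 requests a day over 30 days that projects to 15.00.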

Local

Optional path: bootstrap a real local model endpoint

Probe a real local runtime, list models, and prove one boring local response before you rejoin the main lab spine.

Real-world analogs: Ollama, LM Studio, and OpenAI-compatible local servers.

Open the real local bootstrap path.

Backend

Optional path: move the broker behind a backend boundary

Replace the localhost-only teaching split with a stronger production-shaped boundary where the backend owns the provider secret and the host presents only backend credentials.

Real-world analogs: internal API gateways, backend-for-frontend services, managed-identity-backed apps.

Open the backend-broker path.

How the lab stack differs from real tools

Compare shape, not polish

Your toy protocol is not MCP, your task graph is not Beads (a niche open-source project), and your coordinator is not Gas Town (a small open-source coordination project). But the boundaries should feel familiar: typed calls, durable state, claims, handoffs, approvals, and logs.

Then inspect real products

Look at the tooling catalog and ask the same questions: what is open, what is hosted, what runs locally, where does memory live, who approves actions, and what becomes long-running once the system leaves the terminal?