Hands on

Playground experiments

The labs give you a straight line. This is where you go off-road — alternate tools, tradeoff comparisons, and experiments you invent yourself.

Build one useful capability several ways.

The best way to understand the ecosystem is to pick a small task and move it up the stack. The labs give that path a clear sequence. This playground is where you try alternate tools, compare tradeoffs, and invent your own small experiments.

If an experiment needs a real provider key, read API key security before you start improvising with shell variables or local configs.

Different ways to build the same capability

Same arc, different choices

These experiments parallel the main lab sequence. Each one assumes you have model access working and asks: what happens if you try it with a different tool, a different pattern, or a different tradeoff? The labs give you one good path; these give you the variation.

1. Wrap model access

Start with an API key, endpoint, or local address, then create a boring command that sends a prompt and returns a visible result.

Why it matters

Before tools and agents matter, model access needs a repeatable interface.

2. Add a deterministic tool

Create a small command that accepts flags, emits JSON, returns meaningful exit codes, and supports --dry-run.

Why it matters

Agents work better with tools that are explicit, inspectable, and easy to validate.

3. Add a protocol boundary

Expose the same behavior as a typed tool with a schema and structured result. MCP is the real protocol to compare against.

Why it matters

Protocols make capabilities discoverable and portable across hosts.

4. Add hooks

Run checks before dangerous inputs, after generated files, or before committing outputs.

Why it matters

Safety and consistency should not depend only on the model remembering rules.

5. Compare hosts

Try the same task through direct shell, a skill-guided agent, a protocol adapter, and a CLI AI wrapper.

Why it matters

The same underlying tool can feel very different depending on UX, approvals, and context policy.

Small enough to build, rich enough to teach.

Doc indexer

Read local docs, produce a JSON index, and answer "where is this concept explained?"

Repo health checker

Inspect a workspace for package files, tests, git status, unfinished tasks, and missing docs.

Glossary builder

Extract terms from docs, detect undefined jargon, and suggest plain English definitions.

Signals of a clean experiment

A small capability should leave a clean trail

Criterion Question
Determinism Can the capability be run twice with predictable results?
Discoverability Can an agent or human understand what the tool does without reading the source?
Safety Are dangerous actions explicit, gated, or dry-run by default?
Observability Can we see what happened, what inputs were used, and why it failed?
Portability Can the same capability be reused by another host or agent?

The doc indexer experiment

1

CLI: doc-index docs/ --json returns pages, headings, and terms.

2

Skill: instructions tell the agent when to rebuild the index and how to use it while answering questions.

3

MCP: the index becomes a resource the host can query without knowing the CLI details.

4

Hook: the index refreshes after docs change, so stale context is easier to catch.