A model-driven loop that observes, reasons, chooses actions, uses tools, and evaluates progress toward a goal. See agents and agent systems.
Agent framework
A developer toolkit for building agent loops, workflows, memory, tools, routing, and multi-agent systems.
Agent runtime
The software loop and state machine around the model: planning, memory, tool routing, retries, and termination. See agents and agent systems.
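To make the loop concrete, here is a minimal sketch of a runtime with tool routing and a termination rule. Everything in it is hypothetical: `scripted_model` stands in for a real model call, `TOOLS` is a made-up registry, and the action shape (`tool` plus `args`) is an assumption, not any framework's format.

```python
def scripted_model(goal, observations):
    """Hypothetical stand-in for a model: decides the next action."""
    if not observations:
        return {"tool": "add", "args": [2, 3]}
    # Once there is an observation, report it and stop.
    return {"tool": "finish", "args": [observations[-1]]}


# Hypothetical tool registry: tool name -> callable.
TOOLS = {"add": lambda a, b: a + b}


def run_agent(goal, model=scripted_model, max_steps=5):
    """Loop: ask the model for an action, route it to a tool, record the result."""
    observations = []
    for _ in range(max_steps):  # termination guard against endless loops
        action = model(goal, observations)
        if action["tool"] == "finish":
            return action["args"][0]
        tool = TOOLS[action["tool"]]  # tool routing
        observations.append(tool(*action["args"]))
    raise RuntimeError("step budget exhausted before the agent finished")
```

A real runtime would add retries, memory, and approvals around the same skeleton.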
AI host
The application that owns the user experience and coordinates models, context, tools, approvals, and results.
API
A programmatic interface exposed by software. Agents often call APIs directly or through wrappers and protocols.
API key
A credential used to authenticate programmatic access to a hosted API. Treat it like a secret. See API key security.
B
Beads
A distributed graph issue tracker for AI agents, exposed through the bd CLI and designed as durable task memory. It is a niche open-source project rather than a mainstream default.
C
CLI AI wrapper
A terminal-facing AI application that wraps models, tools, context, approvals, and sometimes MCP or skills into one workflow.
CLI tool
A command-line program. Agents like CLIs because they are composable and can often be run in a controlled workspace.
Commercial hosted offering
A paid product where important parts run as a provider-operated service. Client code, SDKs, or docs may be public while the model or service remains proprietary.
Context
Information provided to the model for the current task: user prompt, files, tool results, memory, docs, or retrieved data.
Context window
The context window is the maximum amount of text a model can process in one call, measured in tokens, including both prompt and response. Bigger windows let you pass more chat history, code, or documents, but they cost more, and models can still lose track of details buried in the middle of a long input, sometimes called the lost-in-the-middle problem.
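Because prompt and response share one window, a simple budget check can help before a call. This is a sketch only: the function names are made up for illustration, and real token counts come from the model's own tokenizer.

```python
def fits_in_window(prompt_tokens, max_response_tokens, window):
    """Prompt and response share the window, so budget for both."""
    return prompt_tokens + max_response_tokens <= window


def room_for_response(prompt_tokens, window):
    """Tokens left for the response once the prompt is counted."""
    return max(window - prompt_tokens, 0)
```

For example, with an assumed 8,192-token window, a 7,500-token prompt leaves room for at most 692 response tokens.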
Continuous batching
Instead of waiting for a fixed batch to fill and finish, the server keeps mixing new requests into work already in progress. Technically, it schedules overlapping inference streams so finished requests free capacity right away, which is a big part of why production servers like vLLM and TGI get better throughput.
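A toy simulation can show why refilling freed slots helps. It assumes each request needs a known number of decode steps and that every running request advances one step per iteration. This is a simplified model for intuition, not how vLLM or TGI schedule work internally.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Each fixed batch runs until its longest request finishes."""
    steps = 0
    for start in range(0, len(lengths), batch_size):
        steps += max(lengths[start:start + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """A slot freed by a finished request is refilled on the next step."""
    waiting = deque(lengths)
    running = []
    steps = 0
    while waiting or running:
        # Admit waiting requests into any free slots.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        steps += 1
        # Advance every running request by one step; drop finished ones.
        running = [left - 1 for left in running if left - 1 > 0]
    return steps
```

With request lengths `[8, 1, 8, 1]` and two slots, the static version needs 16 steps and the continuous version needs 9, because short requests no longer hold a slot hostage.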
D
Deployment plane
The part of a managed platform that lets users deploy, version, govern, and operate model endpoints rather than only calling a shared public API.
Direct model provider
A company or service that exposes its own hosted models through APIs, SDKs, and product surfaces.
E
Embedding
An embedding is a numeric representation of text or other data that captures rough meaning rather than exact wording. Technically, it is a vector in a high-dimensional space where similar items land near each other, which is why embeddings power semantic search, retrieval, and clustering.
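"Near each other" is usually measured with cosine similarity. The sketch below uses hand-written three-dimensional toy vectors. Real embeddings come from an embedding model and have hundreds or thousands of dimensions, so the numbers here are purely illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration only.
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}


def nearest(word, vectors):
    """Return the other word whose vector is most similar to `word`."""
    others = [w for w in vectors if w != word]
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))
```

Here `nearest("cat", toy_vectors)` picks "kitten" over "invoice", which is the same comparison semantic search runs at scale.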
Embedding model
A model that turns text or other inputs into vectors for search, retrieval, clustering, or similarity workflows.
Evaluation
A repeatable check that asks whether an AI tool or agent behaved as expected for a known input.
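In its simplest form, an evaluation is a table of known inputs and expected outputs run against the system under test. The sketch below assumes exact-match scoring and a hypothetical `agent_fn` callable. Real evals often need fuzzier scoring.

```python
def run_evals(agent_fn, cases):
    """Run each known input through the agent and record pass or fail."""
    results = []
    for case in cases:
        actual = agent_fn(case["input"])
        results.append({
            "input": case["input"],
            "expected": case["expected"],
            "actual": actual,
            "passed": actual == case["expected"],  # exact match, by assumption
        })
    return results


def pass_rate(results):
    """Fraction of cases that passed (0.0 when there are no cases)."""
    if not results:
        return 0.0
    return sum(r["passed"] for r in results) / len(results)
```

Because the cases are fixed, the same run can be repeated after every prompt or tool change to catch regressions.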
F
FOSS
Free and open-source software: software with a license that allows users to inspect, run, modify, and redistribute the code under defined terms.
Function calling
A model-provider feature where the model emits structured calls to application-defined functions.
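On the application side, function calling comes down to parsing the structured call and routing it to real code. The sketch below assumes a call shaped like `{"name": ..., "arguments": {...}}`. Providers differ in the exact field names, and `get_weather` is a hypothetical function with a canned answer.

```python
import json


def get_weather(city):
    """Hypothetical application function with a canned answer."""
    return f"Sunny in {city}"


# Functions the application chooses to expose to the model.
REGISTRY = {"get_weather": get_weather}


def dispatch(call_json):
    """Parse a structured call emitted by the model and run the matching function."""
    call = json.loads(call_json)
    name = call["name"]
    if name not in REGISTRY:
        raise ValueError(f"unknown function: {name}")
    return REGISTRY[name](**call["arguments"])
```

The model only emits the call. The application decides whether and how to execute it, which is where validation and approvals belong.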
G
Gas Town
A multi-agent workspace manager for coding agents, using concepts like Mayor, rigs, polecats, hooks, convoys, and work state backed by Beads. It is a small open-source coordination project rather than a mainstream default.
GGUF
GGUF is the model file format used by llama.cpp and most local model runners. It packages weights, tokenizer data, and metadata into one binary file, replaced the older GGML format, and often appears with suffixes like Q4_K_M, where smaller numbers usually mean smaller files and lower precision.
Governance
Controls that keep AI systems safe and accountable: permissions, approvals, audit logs, evals, policy, and monitoring. See governance on the stack.
H
Hermes Agent
A Nous Research persistent agent project with a CLI, messaging gateway, memory, skills, scheduling, subagents, and tool execution according to its public README. It is a research-adjacent open-source project with early-stage adoption.
Hook
A callback triggered at a lifecycle point, commonly used for policy, context injection, automation, formatting, or logging. See hooks.
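A hook system can be as small as a registry of callbacks keyed by lifecycle event. In this sketch the event name `before_tool_call`, the payload shape, and both callbacks are invented for illustration. Real hosts define their own events and payloads.

```python
class HookRegistry:
    """Maps lifecycle event names to ordered lists of callbacks."""

    def __init__(self):
        self._hooks = {}

    def on(self, event, callback):
        self._hooks.setdefault(event, []).append(callback)

    def fire(self, event, payload):
        # Each callback may inspect or replace the payload in turn.
        for callback in self._hooks.get(event, []):
            payload = callback(payload)
        return payload


audit_log = []


def log_command(payload):
    """Logging hook: record every command before it runs."""
    audit_log.append(payload["command"])
    return payload


def block_force_push(payload):
    """Policy hook: mark force pushes as not allowed."""
    if "--force" in payload["command"]:
        return {**payload, "allowed": False}
    return payload


hooks = HookRegistry()
hooks.on("before_tool_call", log_command)
hooks.on("before_tool_call", block_force_push)
```

Firing `before_tool_call` with a `git push --force` command would log the command and return a payload with `allowed` set to `False`.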
I
Inference endpoint
An API address where a client sends model inputs and receives outputs. It can be hosted by a provider or exposed by local hosting software.
Inference runtime
Software that loads a model and performs inference, often exposing a CLI, server, or local API.
J
JSON interface
A stable input/output shape using JSON so tools are easier for agents to call, validate, log, and replay.
K
KV cache
The KV cache is the saved attention work a transformer keeps around while generating the next token. Technically, it stores key and value tensors from earlier tokens so the model does not recompute them every step, and its memory use grows with context length.
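The growth with context length can be estimated with simple arithmetic: one key tensor and one value tensor per layer, per token. The configuration used in the usage note (32 layers, 32 KV heads, head dimension 128, 16-bit values) is an illustrative assumption, not the spec of any particular model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Rough KV cache size: keys plus values for every layer and token."""
    keys_and_values = 2  # one K tensor and one V tensor
    return (keys_and_values * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


def to_gib(num_bytes):
    """Convert bytes to GiB for readability."""
    return num_bytes / 1024 ** 3
```

Under those assumptions, `to_gib(kv_cache_bytes(32, 32, 128, 4096))` works out to 2.0 GiB for a single 4,096-token sequence, and doubling the context doubles the cache.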
L
Lab
A small hands-on exercise that builds a toy version of one AI tooling concept.
LangChain
A framework for building applications around model calls, prompts, tools, retrieval, and agent loops. It usually belongs in the framework/runtime part of the stack rather than at raw model access.
LangGraph
A graph-oriented orchestration layer in the LangChain ecosystem for stateful agent workflows and longer-running coordination.
Local endpoint
An inference endpoint running on your own machine or local network, commonly used by local model hosts and development tools.
Local hosting software
A desktop app, CLI, or server that runs model artifacts locally and often exposes a local API for other tools to call. See local hosting and model artifacts.
LSP
Language Server Protocol: a standard way for editors and tools to get code intelligence such as definitions and diagnostics.
M
MCP
Model Context Protocol: a client-server protocol for exposing tools, resources, and prompts to AI applications. See protocols and adapters.
MCP client
The host-side component that maintains a connection to a specific MCP server.
MCP server
A program that provides tools, resources, or prompts to an AI host over MCP.
Memory
Stored information from prior interactions or project history that can be retrieved for future tasks.
Model access
The path used to call a model: subscription product, direct API, managed platform, aggregate provider, local endpoint, or local model runtime. See model access.
Model artifact
The downloadable model file or checkpoint, such as weights or a quantized file. It is not the same thing as the runtime that serves it.
Model card
Documentation for a model, usually describing intended use, training notes, limitations, license, benchmarks, and safety considerations.
Model platform
A managed cloud surface that blends model access with deployment, enterprise identity, governance, evaluation, or model-catalog features. It is not just a provider API and not just a router. See managed model platforms.
Model weights
The learned parameters of a model. Access to weights affects whether a model can be run locally, inspected, modified, or redistributed under its license.
O
Open-source local tool
A tool whose important runtime code can be inspected and run locally. It may still call hosted model APIs unless paired with local models.
OpenClaw
A local-first personal AI assistant project with a Gateway, many channels, skills, toolsets, routing, and sandbox options according to its public README. It is an early-stage open-source project rather than a mainstream default. See the persistent assistant platform lab.
Orchestration
Coordination of multi-step, multi-agent, scheduled, or long-running workflows.
P
PagedAttention
PagedAttention is the core memory-management idea that made vLLM much better at serving many requests at once. Instead of reserving one fixed block of GPU memory per request, it pages KV cache in smaller chunks, a bit like virtual memory, which improves utilization and throughput.
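The paging idea can be sketched with a toy allocator that hands out small fixed-size blocks on demand and keeps a per-request block table. The class and method names are invented for illustration. vLLM's real implementation manages GPU tensors and is considerably more involved.

```python
class PagedKVAllocator:
    """Toy block allocator: requests grow one block at a time."""

    def __init__(self, total_blocks, block_size):
        self.block_size = block_size            # tokens per block
        self.free_blocks = list(range(total_blocks))
        self.block_tables = {}                  # request id -> list of block ids
        self.lengths = {}                       # request id -> tokens stored

    def append_token(self, request_id):
        """Reserve cache space for one more token of this request."""
        length = self.lengths.get(request_id, 0)
        table = self.block_tables.setdefault(request_id, [])
        if length == len(table) * self.block_size:  # current blocks are full
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.lengths[request_id] = length + 1

    def release(self, request_id):
        """Return a finished request's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(request_id, []))
        self.lengths.pop(request_id, None)
```

A request only ever wastes part of its last block, instead of holding a full-length reservation it may never use.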
Persistent agent
A long-running agent system with memory, scheduling, tool access, and interfaces beyond a single chat session.
Prompt
A reusable instruction or template. In MCP, prompts are user-controlled templates provided by the server.
Protocol adapter
A wrapper that exposes an existing capability through a protocol-shaped interface, such as turning a CLI into a tool a host can discover and call.
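As a rough sketch of the idea, the helper below wraps a command line in a small tool description with a structured result. The dictionary shape and the `make_cli_tool` name are assumptions for illustration and do not follow MCP or any other specific protocol.

```python
import subprocess
import sys


def make_cli_tool(name, argv_prefix, description):
    """Wrap a command line as a discoverable, callable tool description."""

    def call(args):
        # Run the CLI without a shell and capture its output as text.
        result = subprocess.run(
            argv_prefix + list(args),
            capture_output=True,
            text=True,
            check=False,
        )
        return {
            "exit_code": result.returncode,
            "stdout": result.stdout,
            "stderr": result.stderr,
        }

    return {"name": name, "description": description, "call": call}


# Example tool: uses the current Python interpreter as a stand-in CLI.
echo_tool = make_cli_tool(
    "py_echo",
    [sys.executable, "-c", "import sys; print(sys.argv[1])"],
    "Echo the first argument back on stdout.",
)
```

A host could list `name` and `description` for discovery and invoke `call` with arguments, which is the shape a real adapter formalizes.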
Provider API
A hosted model API exposed by a direct provider or aggregator for programmatic use by applications, tools, and agents.
Q
Quantization
Quantization makes a model smaller by storing weights with less precision. In practice that usually means moving from 16- or 32-bit values to 8-bit or 4-bit values, trading some accuracy for lower memory use and faster local inference; labels like Q4 and Q8 tell you roughly how compressed the file is.
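The precision trade-off can be seen in a toy symmetric 8-bit scheme: scale the weights so the largest maps to 127, round to integers, and multiply back on use. This is a deliberately simple sketch. Formats such as Q4_K_M use block-wise schemes that are more elaborate than this.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    largest = max(abs(w) for w in weights)
    scale = largest / 127 if largest else 1.0  # avoid dividing by zero
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


def max_error(weights):
    """Largest absolute difference after a quantize/dequantize round trip."""
    quantized, scale = quantize_int8(weights)
    restored = dequantize(quantized, scale)
    return max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight comes back within about half a scale step of its original value, which is the accuracy given up in exchange for storing one byte per weight instead of two or four.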
R
RAG (retrieval-augmented generation)
RAG means looking up relevant material first and then giving it to the model before it answers. Technically, it is a retrieval-plus-generation pipeline that usually uses embeddings and a vector store for the lookup step, making it a common alternative to fine-tuning when you want to add domain knowledge.
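The two stages can be sketched without any model at all. To stay self-contained, this example swaps the usual embedding lookup for plain word overlap and stops at building the prompt. The documents, function names, and prompt layout are all made up for illustration.

```python
import re


def words(text):
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, docs, k=1):
    """Rank documents by word overlap (a stand-in for embedding search)."""
    query_words = words(query)
    ranked = sorted(docs, key=lambda d: len(query_words & words(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=1):
    """Put the retrieved text in front of the question for the model."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = [
    "The refund window is 30 days.",
    "Shipping takes 5 business days.",
]
```

Calling `build_prompt("How long is the refund window?", docs)` places the refund sentence ahead of the question, and the resulting string is what would be sent to the model.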
Resource
Contextual data exposed to the model or host, such as a file, database schema, log, or API response.
S
safetensors
safetensors is Hugging Face's safer model file format. Unlike older pickle-based .pt files, it does not execute arbitrary Python when loaded, and it is also designed for fast, straightforward tensor loading.
Sandbox
A constrained execution environment where an agent can run commands with controlled access and reduced risk.
Skill
A packaged unit of procedural knowledge that tells an agent how and when to perform a kind of task. See skills.
Source-available
Software where source code is visible but the license does not grant the full rights of an open-source license. Read the license before assuming fork, hosting, or commercial rights.
Subscription
A paid product access model, usually tied to a human-facing app or assistant surface. It is not automatically the same thing as API access.
T
Tokenizer
A tokenizer is the part that turns text into token IDs before a model sees it, then turns token IDs back into text after generation. Different models use different tokenizers, so token counts vary by model and content, but a rough rule of thumb is that 1,000 tokens is about 750 words of English text.
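The round trip and the rule of thumb can both be sketched in a few lines. The tokenizer here is word-level and built from a tiny corpus purely for illustration. Real tokenizers use subword schemes and much larger vocabularies, and the estimate is only as good as the 750-words-per-1,000-tokens rule it encodes.

```python
class ToyTokenizer:
    """Word-level tokenizer: one ID per distinct word in the corpus."""

    def __init__(self, corpus):
        vocabulary = sorted(set(corpus.split()))
        self.token_to_id = {word: i for i, word in enumerate(vocabulary)}
        self.id_to_token = {i: word for word, i in self.token_to_id.items()}

    def encode(self, text):
        """Text to token IDs (raises KeyError for unknown words)."""
        return [self.token_to_id[word] for word in text.split()]

    def decode(self, ids):
        """Token IDs back to text."""
        return " ".join(self.id_to_token[i] for i in ids)


def estimate_tokens(text):
    """Rough estimate: about 750 words per 1,000 tokens."""
    return round(len(text.split()) / 0.75)
```

For budgeting against a context window, an estimate like this is a starting point. The model's own tokenizer gives the exact count.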
Tool
An executable capability the agent can invoke, from a simple shell command to a typed remote API.
W
Wrapper
A layer that adapts one interface into another, such as wrapping a CLI as an MCP server or adding an AI loop around shell commands. See wrappers.
Terms in context
Tool
gh issue create is a tool because it does a concrete action.
Protocol
MCP is a protocol because it gives hosts and servers a shared way to describe and call capabilities.
Skill
A code-review skill is not the review itself. It is the reusable procedure for doing one well.
Ready to build
Turn the vocabulary back into a path.
If the words feel a little less slippery now, go back through Start Here once at full speed. Then use the labs so the vocabulary sticks to something real.