Tool Catalog — AI Tooling Field Guide

Labels: Current default = common practical choice right now, Useful niche = real tool, narrower audience or fit, Being absorbed = functionality increasingly bundled into broader tools, Legacy / historical = still useful for context, not the default direction.

Same ecosystem, different jobs and trust boundaries.

From far away, a CLI agent, a memory graph, a multi-agent coordinator, and a hosted coding assistant can all blur together as AI tooling. Up close, the useful questions are simpler: what runs locally, what is open, what depends on a hosted model, where state lives, and who gets to approve actions?

Useful split

Open-source, source-available, and commercial hosted tools can all be useful. The important difference is the portability, auditability, and lock-in tradeoffs they create.

Read categories first

Tools are shapes before they are brands.

Each major tool shape gets the same treatment: a metaphor, a plain explanation, a lab handoff, and real examples. The glossary stays short on purpose and the labs stay runnable on purpose.

The goal is to classify real tools by job, trust boundary, and stack fit, without pretending each new repo is the whole future.

Lifecycle tags used here

Current default Useful niche Being absorbed Legacy / historical

Keep older or transitional patterns visible, but mark whether they are still the common default.

Use categories before brands

Not every repo belongs in the catalog.

The catalog is slower on purpose. A tool gets in when it helps explain a category, has enough public material to classify responsibly, and still looks useful after the launch-week hype settles down.

Include it when

The tool has a clear stack fit, recognizable boundary, and a reason to teach it as more than a fleeting example.

Keep it on the radar when

The idea looks interesting but the category fit, docs quality, or staying power are still too fuzzy.

Use the category first

Product names are examples. The page should still make sense if a specific repo disappears next month.

Bleeding-edge watchlist: tool radar.

The major tool shapes, explained the same way every time.

Current default

Model access boundary

ELI5: Decide whether you are eating at a restaurant, ordering through a delivery window, or cooking in your own kitchen.

What it actually is: The first boundary in the stack: where the model runs, how you call it, and which costs, credentials, and controls come bundled.

Try it: Start with model access, then use the bootstrap step and lab 00.

Real tools: OpenAI, Anthropic, Azure AI Foundry, OpenRouter, Ollama, LM Studio.

Current default Absorbs other layers

Local CLI agent

ELI5: A power drill with the toolbox, flashlight, and safety trigger built into the same handle.

What it actually is: A host-shaped tool that bundles model access, context gathering, file edits, shell execution, permissions, and often protocols like MCP.

Try it: Build upward through lab 06, lab 09, and the capstone.

Real tools: Goose, Aider, OpenCode, Gemini CLI, Codex CLI, Crush.

Current default

Runtime framework

ELI5: A construction kit for building your own worker instead of buying the whole appliance.

What it actually is: A developer toolkit for agent loops, tools, memory, graphs, retries, and workflow state rather than an end-user assistant product.

Try it: The closest teaching path is lab 06, lab 07, and lab 08.

Real tools: LangChain, LangGraph, CrewAI, Pydantic AI, Microsoft Agent Framework (formerly Semantic Kernel), OpenAI Agents SDK.

Useful niche

Durable work memory and coordination

ELI5: A ticket wall and whiteboard that still exist after the apprentice goes home.

What it actually is: State that survives the chat: tasks, dependencies, claims, handoffs, schedules, and long-lived assistant context.

Try it: Use lab 07, lab 08, and the persistent assistant platform lab.

Real tools: Beads (niche open-source project), Gas Town (small open-source coordination project), OpenClaw (early-stage open-source project), Hermes Agent (research-adjacent open-source project).

Current default

Hosted assistant product

ELI5: A restaurant where the menu, kitchen, waiter, and billing are already bundled together.

What it actually is: A hosted product surface with account features, policy, model access, and product workflow already composed for the user.

Try it: Use starting paths, then translate that surface through the bootstrap step and lab 09.

Real tools: GitHub Copilot, Claude Code, Cursor, Windsurf, ChatGPT.

Openness

Separate the code, the service, and the model.

A tool can have an open client and still depend on a proprietary hosted model. A commercial product can publish SDKs or plugins without making the whole service open. The useful move is to name the boundary plainly.

Category	What is usually open	What may still be hosted or proprietary	Examples
Open-source or FOSS local tools	Client code, local runtime, extension points, and sometimes tests and docs.	Model APIs, cloud sandboxes, hosted sync, or optional enterprise features.	Goose, Aider, OpenCode, Gemini CLI, Codex CLI, Cline, OpenHands, Open Interpreter, Beads (niche), OpenClaw (early-stage), Hermes Agent (research-adjacent, early-stage).
Source-available tools	Readable source with license terms that are not the same as permissive FOSS.	Commercial restrictions, delayed open licensing, hosted services, or enterprise features.	Crush is a useful example because its license is source-available now with a future MIT conversion.
Commercial hosted offerings	Documentation, APIs, SDKs, extension points, or selected client components.	The hosted service, model weights, ranking systems, indexing, product UX, and billing model.	GitHub Copilot, Claude and Claude Code, Cursor, Windsurf, ChatGPT and OpenAI tooling.

Model access in reference form

Separate products, APIs, platforms, runtimes, and artifacts.

This page just names the categories and examples. The chooser logic lives on model access, and the credential-boundary logic lives on security.

Hosted subscriptions

Product surfaces for humans first: app UX, account features, saved context, and product-specific tools already bundled together.

Examples: ChatGPT, Claude, GitHub Copilot, Gemini, Cursor, Windsurf.

Direct providers

These companies expose their own hosted models through SDKs and APIs. This is the usual choice when you want to build your own tools and agents on top of a stable surface.

Examples: OpenAI, Anthropic, Google, Mistral, Cohere, xAI.

Aggregate providers and routers

These layers normalize access across several providers or models. They are useful for comparison and portability, but they add one more pricing and trust boundary.

Examples: OpenRouter and LiteLLM gateway patterns.

Managed model platforms

These platforms bundle model access with deployment, policy, enterprise identity, and managed operations. They often look like a provider, a router, and a control plane at the same time.

Examples: Azure AI Foundry, Amazon Bedrock, Google Vertex AI.

Local hosting software

This is the runtime side of local AI: desktop apps, CLIs, or servers that load a model and expose an endpoint other tools can call.

Examples: Ollama, LM Studio, LocalAI, llama.cpp servers, vLLM.

Model hubs and local model families

This is the distribution side: model cards, licenses, downloadable files, and release variants. It is adjacent to hosting, but not the same job.

Examples: Hugging Face model pages, publisher repos, instruct and embedding model families.

Deeper splits: model access, managed model platforms, local hosting and model artifacts, and API key security.

Metadata model

Compare domains with the same questions.

The tools change fast, but the comparison questions should stay boring: openness, local or hosted boundary, extension model, control model, memory model, stack fit, and source confidence.

Domain	Openness	Local / hosted boundary	Extension model	Control model	Memory model	Stack fit	Source confidence
Model access paths	Ranges from proprietary hosted services to local open artifacts.	Subscription app, provider API, managed platform, router, local endpoint, or downloaded model.	SDKs, HTTP APIs, platform deployments, local servers, chat surfaces, agent hosts.	API keys, account policy, provider terms, platform governance, local runtime controls.	Usually stateless unless product memory, cache, or local logs are added.	Foundation: model service, endpoint, runtime, artifact.	Provider docs, model cards, license text, and hands-on runtime tests.
Managed model platforms	Usually commercial hosted surfaces with enterprise/cloud coupling.	Managed cloud boundary exposing deployments, model catalogs, and platform-native endpoints.	SDKs, deployment configs, eval surfaces, prompt management, governance features.	Org identity, deployment policy, quota, networking, platform governance.	Often stateless model calls plus platform-managed logs, traces, and deployment state.	Foundation plus governance edge; sometimes deployment/orchestration adjacent.	Commercial docs, product docs, and hands-on platform tests.
Local host agents	Often FOSS or source-available clients.	Local CLI/desktop surface; model access may be hosted.	MCP, tools, skills, hooks, custom commands, plugins.	User approvals, tool permissions, shell/edit prompts.	Session state, repo context, optional persistent memory.	Host, runtime, wrapper, tool consumer.	README/license based; hands-on status varies.
Protocol adapters	Often open implementations around existing systems.	Adapter may run locally or remotely; backing service may differ.	Tool/resource/prompt schemas, transports, auth.	Usually delegated to the host and backing service.	Usually exposes memory rather than owning it.	Protocols and adapters.	Protocol docs plus implementation docs.
Task graph memory	Can be local/FOSS or service-backed.	State may live in local files, databases, or hosted trackers.	CLI, API, MCP package, JSON output.	Task claiming, dependency rules, status transitions.	Durable work items, dependencies, ready-task detection.	Foundation, capability, memory, coordination.	README/license based unless the task graph is run locally.
Workspace coordinators	Often open or source-available when early-stage.	Local workspace manager with optional external agents/services.	Worker roles, hooks, queues, dashboards, launchers.	Claims, watchdogs, merge gates, escalation paths.	Project state, queue state, events, handoff records.	Orchestration, governance, host-adjacent workflow.	README-based snapshot unless hands-on tested.
Persistent assistant platforms	Varies; local-first does not always mean fully open.	Local gateway or apps may call hosted models and services.	Skills, toolsets, channels, routing, companion apps.	Sandboxing, schedules, channel permissions, user approvals.	Long-lived assistant memory, files, schedules, events.	Host, runtime, memory, orchestration, tools.	Fast-moving README-based snapshot.
Commercial hosted assistants	Public docs and APIs; core service is proprietary.	Local/editor surfaces backed by hosted models and product services.	Extensions, APIs, SDKs, editor integrations, MCP support.	Product permissions, org policy, account and billing controls.	Hosted context/indexing, conversation history, product memory.	Host, runtime, model service, governance surface.	Commercial-docs-only unless client code is public.

Coding agents and wrappers

Open GitHub and local-first coding tools

These tools pull model access, context gathering, file editing, shell execution, MCP, permissions, and project memory into one terminal, editor, desktop, or other developer surface.

Tool	What it is	Notable strengths	Stack fit	License / openness
Goose	General-purpose local AI agent with desktop app, CLI, and API.	Runs on-machine, supports many providers, connects to MCP extensions, now under the Agentic AI Foundation.	Host, agent runtime, wrapper, MCP consumer.	Apache 2.0.
Aider	Terminal AI pair programmer for existing or new codebases.	Git-aware editing, repo map, broad model support, automatic lint/test loops, strong terminal workflow.	CLI host, code-editing agent, git workflow wrapper.	Apache 2.0.
OpenCode	Open-source AI coding agent with a terminal UI and desktop app beta.	Provider-agnostic, LSP support, plan/build agents, subagent support, client/server architecture.	CLI host, agent runtime, LSP/MCP-aware coding agent.	MIT.
Gemini CLI	Google's open-source terminal AI agent for Gemini.	Built-in file, shell, web fetch/search tools, MCP support, checkpointing, headless JSON modes, GitHub Action integration.	CLI host, model-specific agent, automation wrapper.	Apache 2.0 client; Gemini service/model access is separate.
Codex CLI	OpenAI coding agent that runs locally in the terminal.	Local coding-agent experience, ChatGPT-plan auth option, IDE and desktop-adjacent ecosystem.	CLI host and coding agent.	Open client repository; OpenAI service/model access is commercial.
Continue CLI (`cn`)	CLI for source-controlled AI checks and agent workflows.	Repo-defined checks as markdown files, CI status checks, suggested diffs on pull requests.	CI agent runner, governance/evaluation wrapper.	Apache 2.0.
Cline	Open-source VS Code coding agent that can edit files, run shell commands, and drive browser tasks with a bring-your-own-model setup.	Growing quickly with developers who want Cursor-style agent behavior without vendor lock-in, plus support for hosted and local model endpoints.	Editor host, coding agent, browser/tool wrapper.	Apache 2.0; privacy and model boundary depend on the endpoint you choose, and the flexibility comes with more setup than Cursor or Windsurf.
OpenHands	AI software-development agent ecosystem with SDK, CLI, local GUI, cloud, and enterprise modes.	Composable software-agent SDK, CLI, GUI/API, and sandboxed development workflows.	Agent framework, CLI host, GUI host, orchestration.	Core MIT; enterprise directory is source-available.
Open Interpreter	Terminal interface that lets LLMs run local code and shell commands.	General-purpose computer control, Python/JavaScript/Shell execution, local model support, approval before code runs.	CLI host, code execution wrapper, local automation agent.	AGPL.
Crush	Terminal coding assistant from Charm.	Multi-model support, session contexts, LSP context, MCP, skills, preliminary hooks, permission controls.	CLI host, coding agent, MCP/skills consumer.	FSL-1.1-MIT. Source-available now; MIT future license.

Frameworks and runtime toolkits

The middle layer people often skip when naming the stack

These are not raw model providers, and they are usually not full end-user products either. They are developer toolkits for the middle of the stack: prompts, tools, retrieval, loops, graphs, state, and evaluation hooks.

LangChain

Best thought of as a framework layer for model calls, prompts, retrieval, tool use, and agent-style application wiring.

Stack fit: framework around protocols, runtime, and retrieval-adjacent patterns.

LangGraph

Best thought of as graph and workflow orchestration in the same ecosystem, especially once stateful or longer-running agent behavior matters.

Stack fit: runtime plus orchestration.

Agent/runtime SDKs

Toolkits such as OpenAI Agents SDK, Pydantic AI, and Microsoft Agent Framework (formerly Semantic Kernel) package agent loops, tool schemas, memory patterns, and application wiring for developers.

Stack fit: mostly runtime, sometimes packaging and protocols.

CrewAI

CrewAI is a Python framework for building role-based teams of agents that hand work to each other toward a shared goal. It is a cleaner mental model than a raw graph when the job really is researcher plus writer plus reviewer.

Adoption signal: real production use, active community, not niche. Honest note: the role-centric abstraction is easier to grasp than LangGraph for team-style workflows, but it gives up some flexibility when you want arbitrary graph shapes and lower-level control.

Observability-adjacent framework tools

Some framework ecosystems grow companion tools for tracing, evals, and debugging. These do not replace the framework layer; they add a governance and observability edge to it.

Stack fit: runtime plus governance edge.

See also: framework placement and agent runtime.

Local runners and inference servers

Local hosting is not one product shape.

Some local tools are great for quick experiments. Others are really API layers for apps, services, and teams that want local inference without rewriting client code.

LocalAI

LocalAI is basically an OpenAI-compatible API server you can run locally while routing requests to backends like llama.cpp, vLLM, Whisper, and image-generation stacks. That makes it a good fit when you want one familiar API surface across several local inference engines.

Adoption signal: solid, with real production use and an active GitHub project. Honest note: it asks more of you than Ollama, so it makes more sense for multi-backend or production-style deployments than for quick local experimentation.

Inference and serving

The engine, the proxy, and the production server are different jobs.

This is the lower part of the stack that a lot of app-level tools sit on top of. A desktop runner, a routing proxy, and a high-throughput inference server can all expose a model endpoint, but they solve pretty different jobs.

llama.cpp

llama.cpp is a C++ transformer inference engine, and it is the part under the hood for Ollama, LM Studio, LocalAI, and a lot of other local-model tooling. If you have run a local model on a CPU or Apple Silicon laptop, there is a good chance you have used llama.cpp indirectly even if you never touched it directly. GGUF is its native model format, which is why that format shows up so often across the local-model ecosystem.

Adoption signal: foundational. Honest note: it is more runtime substrate than friendly end-user product, so many people meet it through wrappers instead of through the raw project.

LiteLLM

LiteLLM is a unified API proxy that lets you keep the OpenAI API shape in your client while routing to Anthropic, Azure, Bedrock, Ollama, and a long list of other providers. In practice, this is the tool for teams that do not want to rewrite their app every time they change model vendors or add a fallback path. It also adds production conveniences like logging, cost tracking, and failover.

Adoption signal: widely used in production for multi-provider routing. Honest note: this is mostly a control-plane and compatibility layer, not the actual inference engine.

Text Generation Inference (TGI)

TGI is Hugging Face's optimized inference server for serving open models at real deployment scale. The important features are things like continuous batching, quantization support, token streaming, and an OpenAI-compatible API, which makes it much more of a production serving stack than a desktop convenience tool. It is not as visible to casual users as Ollama, but it matters once the question becomes throughput and reliability.

Adoption signal: important inside the Hugging Face ecosystem and used by serious deployment teams. Honest note: it makes the most sense when you are already operating models, not when you just want the quickest possible local setup.

Structured output

Prompting for JSON and actually enforcing a shape are not the same thing.

This layer exists because asking nicely for structure is often not enough. Some tools validate after the call. Others constrain generation itself.

instructor

instructor is a Python library that wraps LLM API calls with Pydantic models so you can ask for a typed result instead of babysitting raw JSON in the prompt. You define the schema you want back, and the library handles validation and retries when the model drifts off shape. For a lot of application code, this is the cleanest practical answer to structured output.

Adoption signal: widely used and very practical. Honest note: it improves reliability a lot, but it is still working above the model API rather than constraining token generation at the sampler.

outlines

outlines does structured generation at a lower level by constraining the tokens the model is allowed to emit so the result matches a grammar or schema. That is a stricter guarantee than post-hoc validation, which is why it shows up in research and in production systems that really need hard format guarantees. The tradeoff is that it depends on compatible inference backends, so it is a little more infrastructural than a drop-in API wrapper.

Adoption signal: growing, especially where format guarantees matter. Honest note: more rigorous than instructor, but usually less plug-and-play.

Eval and observability

You usually need both debugging traces and regression checks.

This category is less about model access and more about seeing what the system did, judging whether it was good, and catching breakage before users do. In practice, teams often pair a tracing platform with a prompt-testing tool.

Langfuse

Langfuse is an open-source LLM observability platform with traces, prompt management, evals, datasets, and scoring in one place. It is the tool many teams land on when they want a self-hosted alternative to LangSmith and they care about keeping their own data. The UI and feature set are both mature enough now that it feels like a real operational layer, not just a tracing demo.

Adoption signal: strong and growing, probably the most popular open-source option here. Honest note: if you do not want to run infrastructure, the self-hosted advantage matters less.

LangSmith

LangSmith is LangChain's hosted observability and evaluation platform for tracing LLM calls, agent steps, datasets, and tool use. It is a strong fit if you are already building in the LangChain or LangGraph ecosystem, because the integration path is straightforward and well documented. Outside that ecosystem, the appeal is still real, but it is not quite as obvious a default.

Adoption signal: dominant among LangChain users and widely referenced. Honest note: the ecosystem fit is a feature if you are already in it and a constraint if you are not.

promptfoo

promptfoo is a CLI and CI tool for running prompt tests and model-output evals against expected behavior. It feels a lot like unit testing for prompts: define cases, run them repeatedly, and catch regressions before a prompt tweak or model swap leaks into production. It is less flashy than the tracing platforms, but often more useful for prevention.

Adoption signal: real usage among teams treating prompt quality as an engineering problem. Honest note: it will not debug your whole runtime for you, but it is very good at making prompt changes testable.

Vector and retrieval

The right retrieval store depends on whether you are prototyping or operating.

Vector search is one layer in retrieval, not the whole retrieval system, but it is still a common infrastructure choice. The useful split is usually simple: easiest local prototype, dedicated production database, or an extension inside a database you already run.

ChromaDB

ChromaDB is the low-friction vector store a lot of people start with for local development and prototypes. The API is simple, it can run in-process or as a server, and it plugs into framework ecosystems like LangChain and LlamaIndex without much fuss. It is not usually the first choice once scale and stricter operational guarantees matter, but it is a very common starting point.

Adoption signal: extremely common for prototyping and tutorials. Honest note: easy to start with is not the same thing as best production default.

Qdrant

Qdrant is a production-grade vector database written in Rust, built for fast similarity search plus rich metadata filtering. This is the step up from prototype tooling when you want a dedicated system with stronger performance characteristics, scaling options, and cloud or self-hosted deployment paths. It has become one of the more credible default answers when a team wants a serious vector database rather than just a convenient demo stack.

Adoption signal: strong production adoption and active development. Honest note: it is a real extra system to operate, which matters if your team is trying to keep the stack small.

pgvector

pgvector is a PostgreSQL extension that adds vector similarity search to a database many teams already know how to run. If you are already on Postgres, this is often the most grounded answer because it avoids adding a separate vector database just to support one retrieval feature. It supports indexing approaches like IVFFlat and HNSW, which is why it shows up so often in production stacks that want retrieval without database sprawl.

Adoption signal: widely adopted by Postgres-first teams. Honest note: it is the pragmatic answer surprisingly often, though a dedicated vector database can still win on specialized scale or feature needs.

Memory, coordination, and persistent assistants

Durable work state changes the shape of agent tools.

This category is about making work survive beyond one prompt. These tools help agents remember tasks, claim work, hand off context, recover after restarts, and coordinate across channels or workers.

Task graph memory

This pattern turns work into durable, dependency-aware state that an agent can query and update. Beads is a public example: a distributed graph issue tracker for AI agents with ready-task detection, claiming, JSON output, and MCP integration. It is still a niche open-source project rather than a mainstream default.

Stack fit: executable tool, memory substrate, workflow state. License: MIT.

Workspace coordination

This pattern coordinates multiple agents across projects and leaves a durable trail of claims, handoffs, hooks, watchdogs, and merge queues. Gas Town is a public example of this shape. It is a small open-source project, not a common default.

Stack fit: orchestration, host-adjacent workflow manager, governance. License: MIT.

Local-first assistant gateway

This pattern runs a long-lived assistant across local devices, channels, skills, toolsets, routing, apps, and sandbox options. OpenClaw is a public example according to its README. It looks early-stage rather than widely adopted.

Stack fit: persistent assistant, gateway, host, tools, skills, orchestration. License: MIT.

Persistent agent system

This pattern combines a terminal interface, messaging gateway, skill creation, persistent memory, scheduling, subagents, protocol integration, and multiple execution backends. Hermes Agent is a public example according to its README. It looks research-adjacent and early-stage rather than mainstream.

Stack fit: persistent agent system, CLI host, memory, skills, scheduling. License: MIT.

Commercial and proprietary hosted tools

Hosted products have different boundaries.

These products can be excellent, but they are not the same kind of thing as a FOSS local CLI. The client, service, model, index, and enterprise controls can all sit behind different terms and boundaries.

Tool	What it is	What is open or inspectable	What is commercial/proprietary	Useful differentiator
GitHub Copilot / GitHub Copilot CLI	Commercial developer assistant across GitHub, IDEs, and CLI-style workflows.	Public docs, extension APIs, and some surrounding tooling.	Hosted service, model routing, product UX, billing, and enterprise controls.	Deep GitHub and editor integration, strong fit for teams already living in GitHub.
Claude and Claude Code	Anthropic's hosted model product and coding-agent CLI experience.	Public docs, MCP ecosystem support, and local project configuration patterns.	Model service, subscription/API access, hosted account layer, and most product internals.	Strong conversational coding workflow with local tool use mediated by the client.
Cursor	Commercial AI-first code editor built around codebase context and edits.	Public docs, extension surface inherited from editor ecosystems, and user-visible settings.	Main application, indexing service behavior, hosted AI features, billing, and model access.	Integrated editor experience rather than a separate terminal agent.
Windsurf	AI-first code editor from Codeium, built as a VS Code fork with agentic workflows and codebase-aware multi-file editing.	Public docs, user-facing settings, and the familiar editor-extension surface inherited from the VS Code ecosystem.	Main application, model access through Codeium's cloud, ranking/indexing behavior, billing, and most service internals.	Polished Cursor-style experience with significant adoption momentum, but it still lives on a proprietary hosted boundary.
ChatGPT and OpenAI tooling	Hosted chat, API, coding, and agent-building products from OpenAI.	SDKs, API docs, examples, and some client-side tooling may be open.	Hosted models, product surfaces, server-side orchestration, billing, and data controls.	Broad API ecosystem and hosted product surface, with open components around a proprietary service.

Five questions that make tools easier to compare

Autonomy model

Does it answer once, edit files, run commands, launch subagents, or continue unattended?

Context model

Does it rely on manual file selection, repo maps, LSP, embeddings, memory, MCP resources, or hosted indexing?

Extension model

Can it use MCP servers, skills, hooks, custom commands, plugins, or only built-in tools?

Control model

How does it approve shell commands, edits, network calls, secrets, commits, and unattended execution?

Openness model

Can you inspect, self-host, fork, or replace the important parts, or are the key parts hosted?

Absorption

Why these tools overlap so much

Overlap	What is happening	Example
CLI agent absorbs MCP	The CLI ships built-in tools but also connects to external MCP servers.	Goose, Gemini CLI, OpenCode, Claude Code, and Crush all fit this pattern.
Work tracker becomes memory	Issues stop being just human project management and become agent-readable state.	Beads, a niche open-source project, gives agents dependency-aware durable memory.
Orchestrator becomes operating environment	A coordination tool grows roles, dashboards, queues, watchdogs, hooks, and agent launchers.	Gas Town, a small open-source coordination project, wraps multiple agents and Beads-backed work tracking from a niche memory project.
Personal assistant becomes platform	A local assistant grows chat gateways, skills, memory, scheduling, apps, and sandboxing.	OpenClaw (early-stage) and Hermes Agent (research-adjacent, early-stage) both point in this direction.
Commercial app publishes open edges	A hosted product opens SDKs, APIs, docs, or extension points while the core service remains proprietary.	GitHub Copilot, Claude, Cursor, Windsurf, and OpenAI tooling all need this more precise split.

What has been verified

Sourcing and verification

Current catalog snapshot: 2026-05-08.

Details for Beads and Gas Town, both niche open-source projects, are based on their public GitHub READMEs. Goose details are based on the Goose README, which says the project moved from Block to the Agentic AI Foundation. Aider, OpenCode, Gemini CLI, Codex CLI, Continue, Cline, OpenHands, Open Interpreter, Crush, CrewAI, and LocalAI details are based on their public README/license files. llama.cpp, LiteLLM, Text Generation Inference, instructor, outlines, Langfuse, LangSmith, promptfoo, ChromaDB, Qdrant, and pgvector details are based on their public docs, READMEs, and project sites.

OpenClaw details are based on the public openclaw/openclaw README and license. Hermes Agent details are based on the public NousResearch/hermes-agent README and license. Both are early-stage open-source projects that move quickly, so feature descriptions here should be treated as a snapshot of public docs rather than a hands-on audit.

Commercial tooling notes are intentionally conservative: public docs and client components may be open, but the hosted products, models, account systems, and server-side behavior are not the same thing as a FOSS local tool. Windsurf details here are based on public product docs and site copy rather than an inspectable hosted backend.

README/license based

Used for open repositories where the current description comes from public docs and license files.

Commercial-docs-only

Used when the product can be described from public docs, but the hosted service and internals are not inspectable.

Hands-on tested

Reserved for tools verified through direct hands-on use.

The catalog is updated manually as tools gain or lose relevance in the ecosystem.

Classify four tool domains

Local host agent: mostly a host and agent runtime. It is where the user talks to the system and where tools get called. Goose is one example.

Task graph memory: mostly memory and coordination. It turns work items into structured state that survives the chat. Beads is one example (a niche open-source project).

Persistent assistant platform: combines hosts, channels, memory, skills, scheduling, and tool execution. OpenClaw (early-stage) and Hermes Agent (research-adjacent, early-stage) are examples.

Commercial hosted assistant: local/editor surfaces backed by hosted services. GitHub Copilot, Claude Code, Cursor, and Windsurf are examples, with different openness boundaries than FOSS local agents.

Ready to build

Place the tool before you judge it.

If a tool feels confusing, put it back on the stack before judging it. Then go into the labs and touch the layer it actually belongs to.