The part you use all the time, usually without seeing it.
A lot of what feels like one AI product is really a stack of quieter components underneath it. Copilot, Claude, Cursor, Ollama, and similar tools all sit on top of inference engines, model formats, serving layers, structured output helpers, observability systems, and retrieval plumbing.
You do not need to become an infrastructure engineer to use the tools above this layer. But understanding this layer helps you make better decisions: what is local versus remote, what scales well, what is mature versus still growing, and where the real tradeoffs live.
Plain English first: this is the machinery. Technical version: model execution, API mediation, artifact formats, constrained generation, tracing and evals, and vector retrieval.
A useful framing
When a higher-level tool feels magical, the explanation is usually not magic at all. The useful next questions are something like: what is serving the model, what format is it in, where are calls being routed, and how is output being validated or measured?
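To make the last of those questions concrete, here is a minimal sketch of what "validating output" can look like in its simplest form. It assumes the model was asked to reply with a JSON object, and the field names (title, tags) and the function name validate_reply are hypothetical, chosen only for illustration. Real structured output helpers do considerably more than this, such as constraining generation itself.

```python
import json

# Hypothetical required fields for a structured reply (illustration only,
# not taken from any real tool or API).
REQUIRED_FIELDS = {"title": str, "tags": list}


def validate_reply(raw_text):
    """Return (ok, problems) for a model reply expected to be a JSON object."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as err:
        # The model answered with prose or malformed JSON.
        return False, [f"not valid JSON: {err.msg}"]

    if not isinstance(data, dict):
        return False, ["top-level value is not an object"]

    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected_type):
            problems.append(f"wrong type for field: {name}")

    # ok is True only when no problems were collected.
    return not problems, problems
```

A caller would pass the raw text returned by whatever is serving the model and then decide what to do with the result: accept it, retry the request, or log the problems for later measurement.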