Lab 15: RAG Pipeline

Build a minimal retrieval-augmented generation pipeline in plain Python: index a tiny corpus, retrieve the most relevant documents with TF-IDF, assemble an augmented prompt, and generate a grounded answer.

What you'll build

A tiny RAG loop you can actually read end to end.

This lab keeps the moving parts small on purpose. There is no embedding server, no vector database, and no framework hiding the steps. You can see the pipeline shape directly: documents go in, an index gets built, a query retrieves the top matches, and those matches get stuffed into a prompt.

That makes it easier to understand what RAG really is before the infrastructure gets heavier. The retrieval here is simple, but the control flow is the same shape you would keep in a more serious system.

Run it

cd ai_ecosystem_labs
python3 15-rag-pipeline/rag_pipeline.py "Who created Python?"

Starting here? Quick setup:

git clone https://github.com/BanditF/ai_ecosystem_labs
cd ai_ecosystem_labs
python3 15-rag-pipeline/rag_pipeline.py "Who created Python?"

No dependencies needed. Python 3 is enough.

Time guide. Setup: ~2 min. Working through it: 20–35 min because retrieval, prompt assembly, and grounding each add one more layer.

Walk through it

Five small pieces make the whole pipeline.

1. Document corpus

DOCUMENTS is the toy knowledge base. It mixes Python facts with a little noise from JavaScript and Rust so retrieval has something to rank instead of always returning the obvious answer.
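
A corpus like this might look something like the sketch below. The doc1 to doc3 texts are copied from the expected output shown later in this lab. The dict layout and the two noise entries (doc_js, doc_rust) are assumptions for illustration, not the lab's actual data, and the real script indexes 6 documents rather than 5.

```python
# Sketch of a toy knowledge base: document ID -> text.
# doc1-doc3 come from the lab's expected output; doc_js and doc_rust are
# hypothetical placeholder entries standing in for the "noise" documents.
DOCUMENTS = {
    "doc1": (
        "Python is a high-level programming language known for its clear "
        "syntax and readability. It was created by Guido van Rossum and "
        "first released in 1991."
    ),
    "doc2": (
        "Python supports multiple programming paradigms including "
        "procedural, object-oriented, and functional programming."
    ),
    "doc3": (
        "The Python Package Index (PyPI) hosts thousands of third-party "
        "modules. pip is the standard package manager for Python."
    ),
    "doc_js": "JavaScript is a programming language that runs in web browsers.",
    "doc_rust": "Rust is a systems programming language focused on memory safety.",
}

for doc_id, text in DOCUMENTS.items():
    print(f"[{doc_id}] {text[:60]}")
```

The off-topic entries matter: if every document were about Python, ranking would have nothing to separate.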

2. Tokenization and index building

tokenize() lowercases and splits text into word tokens. build_index() then computes term-frequency values for each document. Combined with the IDF weights from the next step, scoring can reward words that are both present in a document and relatively distinctive across the corpus.
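
As a rough sketch of this step, the two functions could be written as follows. The regex, the dict-of-dicts index layout, and the count-divided-by-length definition of term frequency are assumptions; the lab's rag_pipeline.py may differ in all three.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each document ID to {term: term frequency}.

    Term frequency here is count / document length (an assumed definition).
    """
    index = {}
    for doc_id, text in documents.items():
        tokens = tokenize(text)
        counts = Counter(tokens)
        index[doc_id] = {term: n / len(tokens) for term, n in counts.items()}
    return index


# Two tiny illustrative documents (not from the lab corpus).
docs = {"a": "pip installs Python packages", "b": "Python Python syntax"}
index = build_index(docs)

print(tokenize("Who created Python?"))
print(index["b"]["python"])  # 2 of the 3 tokens in "b" are "python"
```

Dividing by document length keeps long documents from winning just because they contain more words.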

3. IDF scoring and retrieval

idf(), score(), and retrieve() make up the retrieval layer. This is lexical scoring, not semantic search: a document earns points only for query words it literally contains. That is useful here because you can see exactly why some documents rank above others.
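
One way the retrieval layer could fit together is sketched below. The smoothed IDF formula, the function signatures, and the three sample documents are assumptions, so this will not necessarily reproduce the scores in the lab's expected output.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Document ID -> {term: count / document length}."""
    index = {}
    for doc_id, text in documents.items():
        tokens = tokenize(text)
        index[doc_id] = {t: n / len(tokens) for t, n in Counter(tokens).items()}
    return index


def idf(term, index):
    """Smoothed inverse document frequency: rarer terms weigh more (assumed formula)."""
    df = sum(1 for tf in index.values() if term in tf)
    return math.log((1 + len(index)) / (1 + df)) + 1


def score(query_tokens, tf, index):
    """Sum of TF * IDF over the query tokens found in one document."""
    return sum(tf.get(t, 0.0) * idf(t, index) for t in query_tokens)


def retrieve(query, index, top_k=3):
    """Return up to top_k (doc_id, score) pairs, best first, dropping zero scores."""
    q = tokenize(query)
    ranked = sorted(
        ((score(q, tf, index), doc_id) for doc_id, tf in index.items()),
        reverse=True,
    )
    return [(doc_id, s) for s, doc_id in ranked[:top_k] if s > 0]


# Illustrative documents, not the lab corpus.
docs = {
    "d1": "Python was created by Guido van Rossum",
    "d2": "pip is the package manager for Python",
    "d3": "Rust has a borrow checker",
}
results = retrieve("Who created Python?", build_index(docs))
for doc_id, s in results:
    print(doc_id, round(s, 4))
# d1 ranks first: it matches both "created" and "python"; d3 matches nothing.
```

Because every number comes from counting words, you can trace any ranking back to specific term matches by hand.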

4. Augmented prompt assembly

build_augmented_prompt() takes the top retrieved chunks and turns them into a prompt with explicit source IDs. That is the key RAG move: add external context at runtime instead of retraining the model.
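
The prompt layout can be read off the expected output later in this lab, so a sketch of the assembly step is fairly direct. The function signature (a query plus a list of (doc_id, text) pairs) is an assumption.

```python
def build_augmented_prompt(query, chunks):
    """Build a grounded prompt from retrieved chunks.

    chunks: list of (doc_id, text) pairs, best match first (assumed shape).
    """
    # Each chunk is prefixed with its source ID so the answer can cite it.
    context = "\n\n".join(f"[{doc_id}] {text}" for doc_id, text in chunks)
    return (
        "Answer the question using only the provided context. "
        "Cite the source ID.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


chunks = [
    ("doc3", "The Python Package Index (PyPI) hosts thousands of third-party "
             "modules. pip is the standard package manager for Python."),
    ("doc1", "Python is a high-level programming language known for its clear "
             "syntax and readability. It was created by Guido van Rossum and "
             "first released in 1991."),
]
prompt = build_augmented_prompt("Who created Python?", chunks)
print(prompt)
```

Nothing about the model changes here. The only thing that changes per query is the text between "Context:" and "Question:".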

5. Mock generation

mock_generate() stands in for a real model call. It is intentionally simple, but it makes the pipeline runnable without keys or dependencies and keeps the focus on retrieval plus grounding.
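
The lab's mock_generate() is not reproduced here. As a hypothetical stand-in, the sketch below picks the context sentence that shares the most words with the question and appends its source ID. The signature, the overlap heuristic, and the shortened chunk texts are all assumptions made for illustration.

```python
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def mock_generate(question, chunks):
    """Hypothetical mock: return the best-overlapping context sentence plus its ID.

    chunks: list of (doc_id, text) pairs (assumed shape).
    """
    q = set(tokenize(question))
    best = (0, "", "")  # (overlap count, sentence, doc_id)
    for doc_id, text in chunks:
        # Split on whitespace that follows sentence-ending punctuation.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            overlap = len(q & set(tokenize(sentence)))
            if overlap > best[0]:
                best = (overlap, sentence.strip(), doc_id)
    if best[0] == 0:
        return "I don't know based on the provided context."
    return f"{best[1]} [{best[2]}]"


# Shortened, illustrative chunk texts (not the lab's full documents).
chunks = [
    ("doc3", "pip is the standard package manager for Python."),
    ("doc1", "Python was created by Guido van Rossum."),
]
answer = mock_generate("Who created Python?", chunks)
print(answer)  # Python was created by Guido van Rossum. [doc1]
```

Even this crude stand-in shows the grounding contract: the answer comes only from supplied context, and it names the chunk it came from.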

The code

rag_pipeline.py

Expected output

What the default run looks like.

Query: Who created Python?

Indexed 6 documents

Top retrieved chunks:
  [doc3] score=0.1407  The Python Package Index (PyPI) hosts thousands of third-party modules. pip is t...
  [doc1] score=0.138  Python is a high-level programming language known for its clear syntax and reada...
  [doc2] score=0.1114  Python supports multiple programming paradigms including procedural, object-orie...

Augmented prompt (529 chars):
────────────────────────────────────────
Answer the question using only the provided context. Cite the source ID.

Context:
[doc3] The Python Package Index (PyPI) hosts thousands of third-party modules. pip is the standard package manager for Python.

[doc1] Python is a high-level programming language known for its clear syntax and readability. It was created by Guido van Rossum and first released in 1991.

[doc2] Python supports multiple programming paradigms including procedural, object-oriented, and functional programming.

Question: Who created Python?
Answer:
────────────────────────────────────────

Generated answer:
Python was created by Guido van Rossum and first released in 1991. [doc1]

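A small but useful detail: lexical retrieval is not perfect. doc3 ranks slightly above doc1 (0.1407 vs. 0.138) on term overlap with the query, even though doc1 contains the fact you actually need. The answer still cites doc1 because the prompt carries all three retrieved chunks, not just the top one.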

Try this

Three ways to push on the retrieval step.

  1. Run with a different query. Try python3 15-rag-pipeline/rag_pipeline.py "What is pip?" and compare which docs rise to the top.
  2. Add two new documents to DOCUMENTS. Re-run the script and see whether the new text gets retrieved for relevant questions.
  3. Change top_k from 3 to 1. Observe how the prompt gets shorter and whether answer quality gets more brittle.

Concepts behind this

Read RAG for the broader idea, especially the distinction between retrieval quality and answer quality.

Then continue to Lab 16, because once retrieval is in the loop you need a repeatable way to check whether grounding actually improved the result.