Retrieval-Augmented Generation (RAG)

Definition

Retrieval-augmented generation (RAG) is a technique that pairs a language model with an external retrieval system so that responses can draw on information beyond the model's training data. Instead of relying solely on knowledge stored in its trained parameters, a RAG system fetches relevant documents, typically from a vector database, before generating an answer. Grounding the output in retrieved text can reduce hallucinations and lets the model reference proprietary or recent data.

How It Works

When a user submits a query, an embedding model converts it into a vector. That vector is compared against a database of pre-indexed document chunks using similarity search (cosine similarity is a common choice), and the closest-matching chunks are inserted into the LLM's context window alongside the original query. The model then generates a response grounded in the retrieved evidence, often citing specific sources.
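
The retrieval and prompt-assembly steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the embed function is a toy bag-of-words stand-in for a real embedding model, the in-memory INDEX list stands in for a vector database, and the chunk texts and all function names (embed, cosine, retrieve, build_prompt) are hypothetical. The final call to an LLM is omitted; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    Assumption: a real system would call a learned embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, index: list, k: int = 2) -> list:
    """Return the k chunks whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, chunks: list) -> str:
    """Place the retrieved chunks next to the query, numbered for citation."""
    sources = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, 1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}"
    )


# Hypothetical document chunks, embedded once ahead of time (the "index").
CHUNKS = [
    "The refund window for annual plans is 30 days from purchase.",
    "Support is available by email on weekdays.",
    "Monthly plans can be cancelled at any time without a fee.",
]
INDEX = [(chunk, embed(chunk)) for chunk in CHUNKS]

if __name__ == "__main__":
    question = "How long is the refund window for annual plans?"
    top_chunks = retrieve(question, INDEX, k=2)
    # In a full system, this prompt would be passed to the language model.
    print(build_prompt(question, top_chunks))
```

Numbering the sources in the prompt is one simple way to let the model cite what it used. In practice the word-count vectors would be replaced by dense embeddings, and the linear scan in retrieve by an approximate nearest-neighbor index.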