RAG
Also known as: Retrieval-Augmented Generation, Retrieval Augmented Generation
Retrieval-Augmented Generation: a technique that grounds LLM responses in external data. Before generating an answer, the system retrieves relevant documents and adds them to the prompt, so outputs rest on actual sources rather than on training knowledge alone.
How It Works
- Index: Embed documents and store the vectors in a vector database
- Retrieve: Find the chunks most relevant to the user's query
- Augment: Add the retrieved context to the prompt
- Generate: The LLM answers using the provided context
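The four steps above can be sketched end to end. This is a toy illustration, not a production setup: the "embedding model" here is a bag-of-words counter and the "vector database" a plain list, where a real system would use a learned embedding model and a dedicated store; the final generation step is left as a prompt handed to an LLM.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: bag-of-words counts.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Index: embed each document chunk once, up front.
chunks = [
    "The refund window is 30 days from purchase.",
    "Our office is open Monday through Friday.",
]
index = [(c, embed(c)) for c in chunks]

# Retrieve: rank stored chunks by similarity to the query embedding.
def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

# Augment: place the retrieved context ahead of the question in the prompt;
# this prompt would then be sent to the LLM for the Generate step.
def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How many days do I have to get a refund?")
```

The refund chunk scores highest for the refund question, so only that chunk is added to the prompt; the generation step then answers from provided context rather than from memorized training data.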
Why Use RAG
| Problem | RAG Solution |
|---|---|
| Knowledge cutoff | Access current information |
| Hallucination | Ground in real documents |
| Domain knowledge | Include proprietary data |
| Transparency | Cite sources |
Components
- Embedding model: Converts text to vectors
- Vector database: Stores and searches embeddings
- Retriever: Finds relevant chunks
- LLM: Generates final response
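To make the vector-database component concrete, here is a minimal in-memory stand-in: it stores (text, embedding) pairs and returns nearest neighbors by cosine similarity, which is the interface real vector databases expose (the class name and method names are illustrative, not any particular library's API).

```python
import math

class InMemoryVectorStore:
    """Minimal stand-in for a vector database: stores embeddings
    and returns nearest neighbors by cosine similarity."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        # Store the chunk alongside its precomputed embedding.
        self._items.append((text, vector))

    def search(self, query_vector: list[float], k: int = 3) -> list[str]:
        def cos(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0
        ranked = sorted(self._items,
                        key=lambda item: cos(query_vector, item[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

store = InMemoryVectorStore()
store.add("cats purr", [1.0, 0.0])   # toy 2-d embeddings
store.add("dogs bark", [0.0, 1.0])
top = store.search([0.9, 0.1], k=1)  # query vector closest to "cats purr"
```

Production stores add persistence and approximate nearest-neighbor indexes so search stays fast at millions of vectors, but the add/search contract is the same.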
Considerations
- Chunk size and overlap strategy
- Retrieval quality determines output quality
- Balancing context length vs. relevance
- Handling conflicting information in sources
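The chunk size and overlap trade-off can be illustrated with a simple character-window splitter (one of many chunking strategies; sentence- or token-aware splitters are common in practice). Overlap ensures text cut at a chunk boundary also appears whole in the neighboring chunk:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap,
    so content cut at a boundary survives intact in a neighbor."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap  # how far each window advances
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("abcdefghij", size=4, overlap=2)
# Each chunk repeats the last 2 characters of the previous one.
```

Larger chunks give the LLM more surrounding context per hit but dilute retrieval precision; smaller chunks retrieve precisely but can strip away the context needed to interpret them.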