RAG

Also known as: Retrieval-Augmented Generation, Retrieval Augmented Generation

Retrieval-Augmented Generation—a technique that enhances LLM responses by first retrieving relevant information from external sources.

RAG improves LLM accuracy by retrieving relevant documents before generating responses, grounding outputs in actual data rather than just training knowledge.

How It Works

  1. Index: Embed documents into vector database
  2. Retrieve: Find relevant chunks for user query
  3. Augment: Add retrieved context to prompt
  4. Generate: LLM answers using provided context
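
The four steps above can be sketched end to end in a few lines. This is a toy illustration, not a production pipeline: the bag-of-words "embedding", the in-memory index, and the function names are all stand-ins for a real embedding model, vector database, and LLM call.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts. A real system would use a
    # learned embedding model (e.g. a sentence transformer).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Index: here the "vector database" is just a list of (embedding, chunk) pairs.
docs = [
    "RAG retrieves documents before generating an answer.",
    "The Eiffel Tower is in Paris.",
]
index = [(embed(d), d) for d in docs]

def retrieve(query, k=1):
    # 2. Retrieve: rank indexed chunks by similarity to the query.
    q = embed(query)
    ranked = sorted(index, key=lambda e: cosine(q, e[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def build_prompt(query):
    # 3. Augment: prepend the retrieved context to the user question.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# 4. Generate: the assembled prompt would be sent to an LLM; stubbed here.
print(build_prompt("Where is the Eiffel Tower?"))
```

Because the LLM only sees the retrieved chunks as context, the quality of step 2 directly bounds the quality of step 4.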

Why Use RAG

Problem          | RAG Solution
-----------------|---------------------------
Knowledge cutoff | Access current information
Hallucination    | Ground in real documents
Domain knowledge | Include proprietary data
Transparency     | Cite sources

Components

  • Embedding model: Converts text to vectors
  • Vector database: Stores and searches embeddings
  • Retriever: Finds relevant chunks
  • LLM: Generates final response
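
To make the vector-database component concrete, here is a minimal in-memory stand-in. The class name and method signatures are hypothetical, chosen for illustration; real vector databases add persistence, approximate nearest-neighbor search, and metadata filtering.

```python
class InMemoryVectorStore:
    """Minimal stand-in for a vector database: stores (vector, chunk)
    pairs and returns the k chunks whose vectors score highest by
    dot product against a query vector."""

    def __init__(self):
        self._entries = []  # list of (vector, chunk) pairs

    def add(self, vector, chunk):
        # "Index" step: store the embedding alongside its source text.
        self._entries.append((vector, chunk))

    def search(self, vector, k=3):
        # "Retrieve" step: exact (brute-force) similarity search.
        dot = lambda a, b: sum(x * y for x, y in zip(a, b))
        ranked = sorted(self._entries, key=lambda e: dot(vector, e[0]),
                        reverse=True)
        return [chunk for _, chunk in ranked[:k]]

store = InMemoryVectorStore()
store.add([1.0, 0.0], "about cats")
store.add([0.0, 1.0], "about dogs")
print(store.search([0.9, 0.1], k=1))
```

In a full pipeline, the embedding model produces the vectors passed to `add` and `search`, and the top-k chunks returned feed the LLM's prompt.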

Considerations

  • Chunk size and overlap strategy
  • Retrieval quality determines output quality
  • Balancing context length vs. relevance
  • Handling conflicting information in sources
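
One common approach to the chunk size and overlap question is a sliding window: overlapping chunks so that a sentence cut at one boundary still appears whole in a neighboring chunk. A minimal sketch (the sizes are illustrative; real systems often chunk by tokens or sentences rather than characters):

```python
def chunk(text, size=200, overlap=50):
    # Split text into fixed-size character windows, each starting
    # `size - overlap` characters after the previous one, so adjacent
    # chunks share `overlap` characters of context.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Larger chunks carry more context per retrieval but dilute relevance scoring; more overlap reduces boundary losses at the cost of index size, which is the trade-off the considerations above point at.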

External Resources