What is RAG?
Retrieval-Augmented Generation (RAG) is an AI framework that improves the quality of LLM-generated responses by grounding the model in external sources of knowledge. In simple terms, it gives the AI a "library" to consult before it answers a question, helping keep the information accurate, relevant, and up-to-date.
Core Architecture
Our RAG agent follows a three-phase architecture: Ingestion, Retrieval, and Generation. The high-level workflow:
Ingestion (Processing Docs) → Retrieval (Finding Info) → Generation (Answering)
Phase 1: Ingestion - Building the Knowledge Base
1. Text Extraction & Chunking
When a PDF is uploaded, the system first extracts its text content. Because LLMs have a limited context window, and because retrieval works best over focused passages, this text is broken down into smaller, manageable "chunks" of a few hundred words each. This keeps the retrieved information focused and concise.
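To make this concrete, here is a minimal Python sketch of fixed-size chunking with overlap. The chunk size and overlap values are illustrative assumptions, not the agent's actual settings.

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split extracted text into overlapping word-based chunks.

    Overlap keeps sentences that straddle a chunk boundary
    retrievable from both neighboring chunks.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
    return chunks
```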
2. Embedding (Vectorization)
This is the most critical step. Each text chunk is fed into an "embedding model," which converts the text into a numerical vector (a long list of numbers). These vectors represent the semantic meaning of the text. Chunks with similar meanings will have mathematically similar vectors. This is how the AI can understand relationships between concepts.
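As a sketch of why this works: cosine similarity measures how closely two vectors point in the same direction. The 3-dimensional vectors below are made-up toy values (real embeddings have hundreds or thousands of dimensions).

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two vectors: near 1.0 = same direction, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings; real ones come from an embedding model.
v_dog = [0.9, 0.1, 0.0]
v_puppy = [0.8, 0.2, 0.1]
v_tax = [0.0, 0.1, 0.9]

print(cosine_similarity(v_dog, v_puppy))  # ~0.98: related concepts
print(cosine_similarity(v_dog, v_tax))    # ~0.01: unrelated concepts
```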
Phase 2: Retrieval - Finding the Right Information
When you ask a question, the agent must efficiently search its knowledge base for relevant chunks. Our agent uses a powerful hybrid approach.
1. Vector Search
Your question is also converted into a vector. The system then performs a "vector search" to find the text chunks whose vectors are closest to the question's vector. This is excellent for finding conceptually related information, even if the keywords don't match exactly.
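A brute-force version of vector search is simply "score every chunk, keep the best." This sketch reuses cosine_similarity from the embedding example above; production systems use an approximate-nearest-neighbor index (e.g. HNSW) instead of scanning every vector.

```python
def vector_search(query_vec: list[float],
                  index: list[tuple[str, list[float]]],
                  top_k: int = 3) -> list[str]:
    """Return the top_k chunk texts whose vectors are closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```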
2. Keyword Search
Simultaneously, a traditional keyword search is performed to find chunks that contain the exact words from your query. This is useful for finding specific names, acronyms, or terms.
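A toy keyword scorer, counting query-term occurrences, is sketched below. A real system would use an inverted index with BM25 scoring, but the underlying idea is the same.

```python
import re
from collections import Counter

def keyword_search(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Rank chunks by how often they contain the exact query terms."""
    query_terms = set(re.findall(r"\w+", query.lower()))

    def score(chunk: str) -> int:
        counts = Counter(re.findall(r"\w+", chunk.lower()))
        return sum(counts[term] for term in query_terms)

    return sorted(chunks, key=score, reverse=True)[:top_k]
```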
Phase 3: Ranking & Generation - Crafting the Answer
1. Re-Ranking
The results from both vector and keyword searches are combined. A re-ranker model then assesses this combined list to select the absolute best and most relevant chunks of information, discarding anything redundant or less useful.
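Here is a sketch of the combine-and-rerank step. The `relevance` heuristic is a placeholder: the actual system would call a cross-encoder re-ranker model at that point.

```python
def rerank(question: str, candidates: list[str], top_k: int = 4) -> list[str]:
    """Merge results from both searches, drop duplicates, keep the best few."""
    unique = list(dict.fromkeys(candidates))  # preserves order, removes repeats

    def relevance(chunk: str) -> float:
        # Placeholder scorer: term overlap with the question.
        q = set(question.lower().split())
        c = set(chunk.lower().split())
        return len(q & c) / (len(q) or 1)

    return sorted(unique, key=relevance, reverse=True)[:top_k]
```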
2. Augmented Prompting & Generation
Finally, the top-ranked chunks are assembled into a "prompt" along with your original question. This augmented prompt is sent to the main generator LLM. The prompt essentially says: "Using the following information [...], please answer this question: '...'". The LLM then synthesizes this information to generate a fluent, accurate, and contextually aware answer.
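Assembling the augmented prompt is plain string building. The exact wording below is illustrative, not the agent's actual template.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the top-ranked chunks and the user's question into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Using the following information:\n\n"
        f"{context}\n\n"
        f"please answer this question: '{question}'"
    )
```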
Advanced: Agentic RAG - The Thinking Retriever
Standard RAG follows a fixed pipeline. Agentic RAG introduces an LLM-powered "agent" that acts as a coordinator, making intelligent decisions throughout the retrieval process. It's the difference between a librarian who finds books based on your exact query and one who understands your intent, asks clarifying questions, and consults multiple sources to give you the best possible answer.
Key Capabilities of an Agent:
- Query Analysis: The agent first analyzes the user's query. Is it simple or complex? Does it require information from multiple documents?
- Strategic Retrieval: Based on the analysis, the agent decides the best way to retrieve information. It might break a complex question into several sub-questions, search for each one, and then synthesize the results.
- Self-Correction: If the initial retrieval results are poor, the agent can recognize this, refine its search query, and try again (a minimal version of this loop is sketched after this list). For example, if a search for "AI impact on finance" yields nothing, it might try "machine learning in banking" or "algorithmic trading".
- Tool Use: An agent can be given access to multiple "tools," such as the vector store, a keyword search engine, or even external APIs (like a web search). It intelligently chooses the right tool for the job.
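Here is a minimal self-correcting retrieval loop. `retrieve` and `llm` are injected placeholders standing in for the agent's search tools and its LLM; the real agent's control flow is richer than this.

```python
from typing import Callable

def agentic_retrieve(question: str,
                     retrieve: Callable[[str], list[str]],
                     llm: Callable[[str], str],
                     max_attempts: int = 3) -> list[str]:
    """Retrieve, judge the results, and rewrite the query if they are poor."""
    query = question
    results: list[str] = []
    for _ in range(max_attempts):
        results = retrieve(query)
        verdict = llm(
            f"Do these results answer the question '{question}'? "
            f"Reply yes or no.\n\n{results}"
        )
        if verdict.strip().lower().startswith("yes"):
            break
        # Self-correction: ask the LLM for a better-phrased search query.
        query = llm(f"Rewrite this search query to get better results: {query}")
    return results
```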
Advanced: Retrieval Strategies
Retrieval is the backbone of RAG: if the right chunks aren't found, even the best LLM can't produce a good answer. Modern systems use sophisticated techniques to improve retrieval quality.
Hybrid Search
Instead of relying on vector search or keyword search alone, hybrid search combines the strengths of both. Each method produces its own ranking, and a fusion algorithm (like Reciprocal Rank Fusion) merges them into a single, superior ranking. This ensures both semantic relevance and keyword precision.
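Reciprocal Rank Fusion itself is only a few lines: each document scores 1 / (k + rank) in every list it appears in, and k = 60 is the constant used in the original RRF paper.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists (e.g. vector and keyword results) into one."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# fused = reciprocal_rank_fusion([vector_results, keyword_results])
```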
Parent Document Retriever
A common challenge is that small chunks, while precise, may lack broader context. The Parent Document Retriever strategy addresses this in three steps, sketched in code after the list:
1. Search is performed over small, precise chunks.
2. Once the best chunks are identified, the system retrieves the larger "parent documents" they originally came from.
3. The LLM is then given these full parent documents for generation, providing it with complete context to formulate a better answer.
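Here is a sketch under the assumption that each small chunk's vector is stored alongside its parent document's id; cosine_similarity is reused from the embedding example above.

```python
def parent_document_retrieve(query_vec: list[float],
                             child_index: list[tuple[list[float], str]],
                             parents: dict[str, str],
                             top_k: int = 2) -> list[str]:
    """Search over small chunks, but return their full parent documents."""
    scored = [(cosine_similarity(query_vec, vec), parent_id)
              for vec, parent_id in child_index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    seen: set[str] = set()
    results: list[str] = []
    for _, parent_id in scored:
        if parent_id not in seen:
            seen.add(parent_id)
            results.append(parents[parent_id])
        if len(results) == top_k:
            break
    return results
```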
Advanced: Query Transformation
Sometimes, the user's original query isn't the best one for searching. Query Transformation techniques use an LLM to rewrite the query for better retrieval; both techniques below are sketched in code after the list.
- Multi-Query: The LLM generates several different versions of the user's query from different perspectives. All these queries are run in parallel, and the results are combined.
- HyDE (Hypothetical Document Embeddings): The LLM generates a hypothetical, ideal answer to the user's question. This hypothetical answer is then converted to a vector and used for the search. This often works better than using the vector of the short, ambiguous question itself.
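Both techniques reduce to one extra LLM call before the search, as this sketch shows. `llm` and `embed` are injected placeholders for the generator and embedding models.

```python
from typing import Callable

def multi_query(question: str, llm: Callable[[str], str], n: int = 3) -> list[str]:
    """Generate n alternative phrasings of the question to search in parallel."""
    prompt = f"Write {n} different search queries, one per line, for: {question}"
    return [line for line in llm(prompt).splitlines() if line.strip()][:n]

def hyde_vector(question: str,
                llm: Callable[[str], str],
                embed: Callable[[str], list[float]]) -> list[float]:
    """HyDE: embed a hypothetical answer instead of the question itself."""
    hypothetical = llm(f"Write a short passage that answers: {question}")
    return embed(hypothetical)
```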
Future Trends in RAG
RAG is a rapidly evolving field. Here are some of the exciting frontiers:
- Self-Correcting & Adaptive RAG: Systems that learn and adapt their retrieval and generation strategies based on feedback and the type of query. The agent essentially fine-tunes its own processes over time.
- Multi-Modal RAG: The ability to retrieve and reason over not just text but also images, audio, and video. Imagine asking about a chart in a document and having the AI "see" and interpret it.
- Graph RAG: Using knowledge graphs as the retrieval source. This allows the agent to understand and traverse complex relationships between entities, leading to more sophisticated reasoning and answers.
Essential Reading List
To dive deeper into the concepts that power modern RAG, here are some of the most influential papers in the field.
Core RAG & Foundational Concepts
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (2020): The original paper that introduced the RAG framework. A must-read to understand the core idea.
- Dense Passage Retrieval for Open-Domain Question Answering (DPR, 2020): While not strictly a RAG paper, it established the dominance of dense retrieval models, which became a cornerstone for most RAG systems.
Advanced & Agentic RAG
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (2023): A pioneering work in agentic RAG. It introduces a model that can decide *when* to retrieve and *whether* the retrieved information is useful, significantly improving factuality.
- Corrective Retrieval Augmented Generation (CRAG, 2024): Introduces a retrieval evaluator to assess the quality of retrieved documents and triggers self-correction actions, including web search, if needed.
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity (2024): A framework that learns to adaptively choose the best retrieval strategy (e.g., simple retrieval vs. complex iterative search) based on the complexity of the user's question.
Surveys & Overviews
- Retrieval-Augmented Generation for Large Language Models: A Survey (2023): An excellent, comprehensive overview of the different paradigms of RAG (Naive, Advanced, Modular) and the techniques involved.
Loaded Documents
The following documents are currently loaded into the agent's knowledge base. You can upload more on the Demo page.