What is RAG?

Retrieval-Augmented Generation (RAG) is an AI framework for improving the quality of LLM-generated responses by grounding the model in external sources of knowledge. In simple terms, it gives the AI a "library" to consult before it answers a question, so its answers draw on accurate, relevant, and up-to-date information.

Core Architecture

Our RAG agent follows a three-phase architecture: Ingestion, Retrieval, and Generation. The high-level workflow:

1. Ingestion (Processing Docs) → 2. Retrieval (Finding Info) → 3. Generation (Answering)

Phase 1: Ingestion - Building the Knowledge Base

1. Text Extraction & Chunking

When a PDF is uploaded, the system first extracts its text content. Because LLMs have a limited context window, this text is broken down into smaller, manageable "chunks" of a few hundred words each. This ensures that the retrieved information is focused and concise.
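
As a rough illustration, here is a minimal chunking sketch. The 200-word size and 40-word overlap are illustrative defaults, not the agent's actual settings.

    # Split text into fixed-size word chunks with overlap, so content near a
    # chunk boundary still appears with some surrounding context.
    def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
        words = text.split()
        step = chunk_size - overlap
        chunks = []
        for start in range(0, len(words), step):
            chunk = " ".join(words[start:start + chunk_size])
            if chunk:
                chunks.append(chunk)
        return chunks

    # A 500-word document yields four overlapping chunks with these settings.
    sample = " ".join(f"word{i}" for i in range(500))
    print(len(chunk_text(sample)))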

2. Embedding (Vectorization)

This is the most critical step. Each text chunk is fed into an "embedding model," which converts the text into a numerical vector (a long list of numbers). These vectors represent the semantic meaning of the text. Chunks with similar meanings will have mathematically similar vectors. This is how the AI can understand relationships between concepts.
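
A minimal sketch of this step, assuming the sentence-transformers library (the model name is a common choice, not necessarily the one this agent uses):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    # Any embedding model works here; this small one is a common default.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    chunks = ["The cat sat on the mat.",
              "A kitten rested on the rug.",
              "Quarterly revenue grew 12%."]
    vectors = model.encode(chunks)  # one vector per chunk

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Semantically similar sentences score higher than unrelated ones.
    print(cosine(vectors[0], vectors[1]))  # high: both describe a resting cat
    print(cosine(vectors[0], vectors[2]))  # low: unrelated topic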

Phase 2: Retrieval - Finding the Right Information

When you ask a question, the agent must efficiently search its knowledge base for relevant chunks. Our agent uses a powerful hybrid approach.

1. Vector Search

Your question is also converted into a vector. The system then performs a "vector search" to find the text chunks whose vectors are closest to the question's vector. This is excellent for finding conceptually related information, even if the keywords don't match exactly.
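
In code, this boils down to a nearest-neighbor lookup over the chunk vectors. A sketch with plain NumPy (production systems usually delegate this to a vector database; the 384-dimension figure below is just an assumption):

    import numpy as np

    def top_k_by_cosine(query_vec, chunk_vecs, k: int = 3) -> list[int]:
        # Normalize so that a dot product equals cosine similarity.
        q = query_vec / np.linalg.norm(query_vec)
        m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
        scores = m @ q
        return np.argsort(scores)[::-1][:k].tolist()  # indices of the k closest chunks

    rng = np.random.default_rng(0)
    chunk_vecs = rng.normal(size=(100, 384))  # pretend: 100 chunks, 384-dim embeddings
    query_vec = rng.normal(size=384)
    print(top_k_by_cosine(query_vec, chunk_vecs))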

2. Keyword Search

Simultaneously, a traditional keyword search is performed to find chunks that contain the exact words from your query. This is useful for finding specific names, acronyms, or terms.
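
A keyword-search sketch using BM25 via the rank_bm25 library (one common choice; this page does not say which keyword backend the agent actually uses):

    from rank_bm25 import BM25Okapi

    corpus = ["RAG grounds LLM answers in retrieved documents.",
              "BM25 ranks documents by exact term overlap.",
              "Vector search captures semantic similarity."]
    # Simple whitespace tokenization with punctuation stripped.
    tokenized = [doc.lower().replace(".", "").split() for doc in corpus]
    bm25 = BM25Okapi(tokenized)

    query = "bm25 term overlap".split()
    print(bm25.get_top_n(query, corpus, n=1))  # the exact-keyword match wins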

Phase 3: Ranking & Generation - Crafting the Answer

1. Re-Ranking

The results from both vector and keyword searches are combined. A re-ranker model then assesses this combined list, selecting the most relevant chunks and discarding anything redundant or less useful.
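
One common way to implement re-ranking is a cross-encoder that scores each (question, chunk) pair; a sketch assuming the sentence-transformers library (the model name and candidate lists are illustrative, not this agent's actual setup):

    from sentence_transformers import CrossEncoder

    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    query = "How does RAG reduce hallucinations?"
    vector_hits = ["RAG grounds answers in retrieved text.",
                   "Chunking splits documents into passages."]
    keyword_hits = ["RAG grounds answers in retrieved text.",
                    "Hallucinations drop when answers cite sources."]

    # Merge both result lists, dropping duplicates while keeping order.
    candidates = list(dict.fromkeys(vector_hits + keyword_hits))

    # Score every (query, passage) pair and keep only the best few.
    scores = reranker.predict([(query, passage) for passage in candidates])
    ranked = [p for _, p in sorted(zip(scores, candidates), reverse=True)]
    print(ranked[:3])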

2. Augmented Prompting & Generation

Finally, the top-ranked chunks are assembled into a "prompt" along with your original question. This augmented prompt is sent to the main generator LLM. The prompt essentially says: "Using the following information [...], please answer this question: '...'". The LLM then synthesizes this information to generate a fluent, accurate, and contextually-aware answer.
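
A minimal sketch of this final step, assuming an OpenAI-style chat API (the client, model name, and sample chunks are illustrative; the page does not specify the agent's generator):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    question = "What drove revenue growth last quarter?"
    top_chunks = ["Q3 revenue rose 12%, driven by the APAC region.",
                  "Marketing spend was flat quarter over quarter."]

    # Stitch the retrieved chunks and the question into one augmented prompt.
    prompt = ("Using the following information:\n"
              + "\n\n".join(top_chunks)
              + f"\n\nplease answer this question: '{question}'\n"
              + "If the information is insufficient, say so.")

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)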

Advanced: Agentic RAG - The Thinking Retriever

Standard RAG follows a fixed pipeline. Agentic RAG introduces an LLM-powered "agent" that acts as a coordinator, making intelligent decisions throughout the retrieval process. It's the difference between a librarian who finds books based on your exact query and one who understands your intent, asks clarifying questions, and consults multiple sources to give you the best possible answer.

Key Capabilities of an Agent:

  • Query Analysis: The agent first analyzes the user's query. Is it simple? Is it complex? Does it require information from multiple documents?
  • Strategic Retrieval: Based on the analysis, the agent decides the best way to retrieve information. It might break a complex question into several sub-questions, search for each one, and then synthesize the results.
  • Self-Correction: If the initial retrieval results are poor, the agent can recognize this, refine its search query, and try again. For example, if a search for "AI impact on finance" yields nothing, it might try "machine learning in banking" or "algorithmic trading".
  • Tool Use: An agent can be given access to multiple "tools," such as the vector store, a keyword search engine, or even external APIs (like a web search). It intelligently chooses the right tool for the job. A toy agent loop illustrating these capabilities is sketched below.
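
Here is that toy loop. Everything in it is illustrative: `llm`, `vector_search`, and `web_search` are stubs standing in for real components, not this agent's implementation.

    # Toy stubs so the sketch runs end to end; a real agent would call an
    # actual LLM and real search backends here.
    def llm(prompt: str) -> str:
        return "yes"

    def vector_search(query: str) -> list[str]:
        return [f"passage about: {query}"]

    def web_search(query: str) -> list[str]:
        return [f"web result for: {query}"]

    def agentic_retrieve(question: str, max_attempts: int = 3) -> list[str]:
        query = question
        results: list[str] = []
        for _ in range(max_attempts):
            # Query analysis + tool use: pick a tool based on the query.
            in_docs = llm(f"Can the knowledge base answer: {query}?") == "yes"
            results = vector_search(query) if in_docs else web_search(query)
            # Self-correction: check whether the results look sufficient.
            if results and llm(f"Do these passages answer '{question}'? {results}") == "yes":
                return results
            # Otherwise rewrite the query and try again (strategic retrieval).
            query = llm(f"Rewrite this search query for better results: {query}")
        return results

    print(agentic_retrieve("What is the AI impact on finance?"))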

Advanced: Retrieval Strategies

High-quality retrieval is the backbone of RAG. Modern systems use sophisticated techniques to improve it.

Hybrid Search

Instead of relying on vector search or keyword search alone, hybrid search combines the strengths of both. Each method produces its own ranking, and a fusion algorithm (like Reciprocal Rank Fusion) merges them into a single, superior ranking. This ensures both semantic relevance and keyword precision.
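
Reciprocal Rank Fusion itself is only a few lines: each document earns 1/(k + rank) from every ranking it appears in, and the summed scores decide the final order. A sketch (k=60 is the constant from the original RRF paper; the agent's value may differ):

    def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
        # Each document accumulates 1/(k + rank) from every ranking it appears in.
        scores: dict[str, float] = {}
        for ranking in rankings:
            for rank, doc in enumerate(ranking, start=1):
                scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    vector_ranking = ["doc_a", "doc_b", "doc_c"]
    keyword_ranking = ["doc_c", "doc_a", "doc_d"]
    print(rrf([vector_ranking, keyword_ranking]))  # doc_a wins: strong in both lists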

Parent Document Retriever

A common challenge is that small chunks, while precise, may lack broader context. The Parent Document Retriever strategy addresses this (a sketch follows the steps below):
1. Search is performed over small, precise chunks.
2. Once the best chunks are identified, the system retrieves the larger "parent documents" they originally came from.
3. The LLM is then given these full parent documents for generation, providing it with complete context to formulate a better answer.
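
A sketch of the bookkeeping involved; the chunk-level search is stubbed out, and all names here are illustrative:

    parents = {"doc1": "Full text of document 1 ...",
               "doc2": "Full text of document 2 ..."}
    chunk_to_parent = {"doc1-c0": "doc1", "doc1-c1": "doc1", "doc2-c0": "doc2"}

    def search_chunks(query: str) -> list[str]:
        # Toy stand-in: a real system would run vector or hybrid search here.
        return ["doc1-c1", "doc2-c0"]

    def retrieve_parents(query: str) -> list[str]:
        hits = search_chunks(query)
        # De-duplicate parent ids, preserving order, then return full texts.
        parent_ids = list(dict.fromkeys(chunk_to_parent[c] for c in hits))
        return [parents[p] for p in parent_ids]

    print(retrieve_parents("example query"))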

Advanced: Query Transformation

Sometimes, the user's original query isn't the best one for searching. Query Transformation techniques use an LLM to rewrite the query for better retrieval; a HyDE sketch follows the list.

  • Multi-Query: The LLM generates several different versions of the user's query from different perspectives. All these queries are run in parallel, and the results are combined.
  • HyDE (Hypothetical Document Embeddings): The LLM generates a hypothetical, ideal answer to the user's question. This hypothetical answer is then converted to a vector and used for the search. This often works better than using the vector of the short, ambiguous question itself.
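
A HyDE sketch, assuming OpenAI-style chat and embedding APIs (model names and sample chunks are illustrative):

    import numpy as np
    from openai import OpenAI

    client = OpenAI()
    question = "How did the merger affect staffing levels?"
    chunks = ["HR reports headcount fell 5% after the merger.",
              "The cafeteria menu changed in March."]

    # Step 1: ask the LLM for a short, hypothetical answer to the question.
    draft = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Write a short, plausible answer to: {question}"}],
    ).choices[0].message.content

    # Step 2: embed the draft and the chunks with the same model, then search
    # using the draft's vector rather than the question's.
    resp = client.embeddings.create(model="text-embedding-3-small",
                                    input=[draft] + chunks)
    vecs = np.array([d.embedding for d in resp.data])
    draft_vec, chunk_vecs = vecs[0], vecs[1:]
    scores = chunk_vecs @ draft_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(draft_vec))
    print(chunks[int(np.argmax(scores))])  # best-matching chunk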
