Learning Objectives
- Explain why hallucination makes ungrounded LLM use unacceptable in finance and why retrieval-augmented generation is the core architectural response
- Design a financial RAG pipeline from document ingestion through retrieval and grounded generation, including structure-aware parsing, chunking, metadata, embeddings, and citation support
- Compare generic and domain-specific embedding models and evaluate retrieval quality on a target corpus using practical retrieval metrics and latency trade-offs
- Build a retrieval stack that combines semantic search, lexical search, metadata filtering, and re-ranking to improve precision and recall on financial documents
- Use constraint-based prompting, citation checks, and tool-verified computation to make generated answers more faithful, auditable, and numerically reliable
- Diagnose RAG failures by separating retrieval, context, synthesis, computation, and abstention errors, and apply targeted evaluation methods to improve each component
- Distinguish when to use RAG versus fine-tuning for financial applications, and explain how RAG functions as one tool within broader agentic workflows
Introduction: The Generative Leap Beyond Feature Extraction
This section motivates the transition from discriminative text classifiers (Chapter 10) to generative language models capable of multi-step synthesis, open-ended question answering, and narrative summarization over financial documents. It identifies hallucination as the fundamental obstacle: LLMs produce confident but factually wrong outputs, and in finance a single fabricated revenue figure or misattributed quote can lead to compliance breaches or flawed investment decisions. The architectural solution, Retrieval-Augmented Generation, is introduced as a constraint framework that grounds generation in verifiable evidence rather than relying on post-hoc fact-checking.
The Solution: Grounding LLMs with Retrieval-Augmented Generation
The section formalizes the three-stage RAG pipeline: offline indexing of trusted documents into vector embeddings, online retrieval of semantically similar chunks given a query, and constrained generation where the LLM synthesizes answers only from retrieved context. It introduces LlamaIndex and LangChain as orchestration frameworks and argues that naive implementations with fixed-size chunks and generic embeddings fail on complex financial documents. The section establishes that production systems require advanced techniques at every stage, setting up the remaining sections as a bottom-up construction of a reliable financial RAG stack.
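The three-stage pipeline above can be sketched in a few dozen lines. This is a toy illustration, not a LlamaIndex or LangChain implementation: `toy_embed` is a bag-of-words stand-in for a real embedding model, `SimpleIndex` is a hypothetical class, and the sample chunks are invented.

```python
from collections import Counter
import math
import re

def toy_embed(text: str) -> Counter:
    # Bag-of-words stand-in for a real embedding model.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SimpleIndex:
    def __init__(self):
        self.chunks = []

    def add(self, chunk: str) -> None:
        # Stage 1: offline indexing of trusted documents.
        self.chunks.append((chunk, toy_embed(chunk)))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Stage 2: online retrieval of the most similar chunks.
        q = toy_embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

def grounded_prompt(query: str, context: list[str]) -> str:
    # Stage 3: constrained generation -- the LLM may use only this context.
    ctx = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (f"Answer ONLY from the context below; cite sources as [n].\n"
            f"Context:\n{ctx}\n\nQuestion: {query}")

index = SimpleIndex()
index.add("FY2023 revenue was $4.2 billion, up 8% year over year.")
index.add("The company cites supply-chain concentration as a key risk factor.")
query = "What was FY2023 revenue?"
prompt = grounded_prompt(query, index.retrieve(query, k=1))
```

The point of the sketch is the separation of concerns: generation never sees anything that did not first pass through retrieval, which is what makes citations checkable.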
Intelligent Document Ingestion
This section demonstrates that RAG quality is capped by ingestion quality, showing how naive fixed-size chunking destroys the semantic structure of financial documents by fragmenting tables, separating headers from content, and splitting risk factors across chunks. It presents structure-aware parsing tools (LlamaParse, Docling, Marker) that preserve section boundaries, table layouts, and hierarchical relationships, alongside the Parent Document Retrieval pattern that uses small precise chunks for retrieval but passes larger parent sections to the LLM. Temporal metadata tagging (fiscal year, filing date, period end) is treated as essential for preventing stale information retrieval and enabling verifiable citations.
1 notebook
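The Parent Document Retrieval pattern described above can be sketched as follows; the section texts, the 12-word chunk size, and the keyword-overlap scorer are illustrative assumptions standing in for a structure-aware parser and a real embedding model.

```python
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

# Parent sections, as a structure-aware parser might emit them.
parents = {
    "item_1a_risk_factors": (
        "Item 1A. Risk Factors. Our business depends on a small number of "
        "suppliers. Supply-chain concentration could disrupt production. "
        "We also face foreign-exchange and interest-rate risk."
    ),
    "item_7_mdna": (
        "Item 7. MD&A. Revenue grew 8% in fiscal 2023, driven by services. "
        "Gross margin expanded 120 basis points year over year."
    ),
}

# Child chunks: small, precise retrieval units that each keep a pointer
# back to their parent section.
children = []
for pid, text in parents.items():
    words = text.split()
    for i in range(0, len(words), 12):  # 12-word child chunks
        children.append((pid, " ".join(words[i:i + 12])))

def retrieve_parent(query: str) -> str:
    # Match the query against the small child chunks...
    pid, _ = max(children, key=lambda c: len(tokens(query) & tokens(c[1])))
    # ...but hand the full parent section to the LLM.
    return parents[pid]

parent = retrieve_parent("supply chain concentration risk")
```

The match happens on a narrow chunk, but the whole risk-factor section is returned, so the LLM sees the surrounding context the chunk was cut from.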
Domain-Specific Embeddings
The section argues that general-purpose embedding models fail on financial vocabulary and causal chains, citing FinMTEB benchmark evidence of consistent gaps on tasks involving financial jargon, cross-document reasoning, and temporal references. It surveys domain-adapted options, including Voyage AI's finance-specific embeddings and open-source alternatives such as Fin-E5 and BGE-Finance, along with practical considerations such as Matryoshka embeddings for dimension reduction and binary quantization for storage efficiency. The key takeaway is that public leaderboards can generate a shortlist, but practitioners must validate empirically on their own corpus using a protocol of representative queries with ground-truth relevance labels.
1 notebook
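The validation protocol in the takeaway above can be sketched as a small harness: queries with ground-truth relevance labels, scored with recall@k and mean reciprocal rank. Here `rank_docs` is a keyword-overlap stand-in for whichever candidate embedding model is under evaluation, and the documents and queries are invented.

```python
import re

def toks(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def rank_docs(query: str, docs: dict[str, str]) -> list[str]:
    # Stand-in scorer: swap in cosine similarity over the candidate
    # embedding model being evaluated.
    return sorted(docs, key=lambda d: len(toks(query) & toks(docs[d])),
                  reverse=True)

def recall_at_k(eval_set, docs, k: int = 3) -> float:
    hits = sum(1 for query, gold in eval_set
               if gold in rank_docs(query, docs)[:k])
    return hits / len(eval_set)

def mrr(eval_set, docs) -> float:
    # Mean reciprocal rank of the ground-truth document.
    return sum(1.0 / (rank_docs(q, docs).index(gold) + 1)
               for q, gold in eval_set) / len(eval_set)

docs = {
    "d1": "Net interest margin compressed due to deposit repricing.",
    "d2": "The effective tax rate decreased to 18 percent.",
    "d3": "Capital expenditures focused on data center expansion.",
}
eval_set = [
    ("why did net interest margin fall", "d1"),
    ("what is the effective tax rate", "d2"),
]
```

Running the same eval_set against each shortlisted model turns the leaderboard question into a direct, corpus-specific comparison.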
Hybrid Retrieval and Vector Databases
This section addresses the blind spot of pure semantic search on exact matches such as tickers, regulatory codes, and specific figures, presenting hybrid search that combines vector similarity with BM25 keyword retrieval, fused through Reciprocal Rank Fusion. It covers query enhancement techniques including query expansion, Hypothetical Document Embeddings (HyDE), and query decomposition for multi-hop questions. The discussion of vector database selection emphasizes metadata pre-filtering as critical for financial RAG, where queries routinely require restricting results by company, fiscal year, and document type before the vector search to prevent retrieval of stale or irrelevant information.
1 notebook
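Reciprocal Rank Fusion itself is only a few lines. The two input rankings below stand in for a vector-search result list and a BM25 result list; the chunk names are invented, and k=60 is the constant used in the original RRF paper.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each inner list is one retriever's ranking, best first. The constant k
    # dampens the influence of any single ranking's top positions.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Semantic search surfaces paraphrases; BM25 nails the exact-match chunk.
vector_hits = ["chunk_growth", "chunk_guidance", "chunk_ticker_match"]
bm25_hits = ["chunk_ticker_match", "chunk_growth"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Because RRF uses only rank positions, it needs no score normalization across the two retrievers, which is exactly why it is the standard fusion choice for hybrid search.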
Re-ranking and Constraint-Based Prompting
The section covers two final defenses against hallucination: cross-encoder re-ranking that refines retrieval precision by scoring query-document pairs for fine-grained relevance, and constraint-based prompting that enforces grounding through explicit role assignment, context constraints, citation instructions, and structured output requirements. It addresses the critical caveat that LLM-generated citations are not automatically faithful and proposes lightweight verification through semantic similarity checks between claims and cited passages. The retrieve-extract-compute-narrate pattern for numeric questions delegates arithmetic to deterministic tools, addressing the distinct failure mode where evidence is correct but derived calculations are wrong.
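The lightweight citation verification mentioned above can be approximated with token overlap in place of a real semantic-similarity model; the 0.5 threshold, the `claim_supported` helper, and the sample passage are illustrative assumptions to be tuned on real data.

```python
import re

def toks(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def claim_supported(claim: str, cited_passage: str,
                    threshold: float = 0.5) -> bool:
    # Fraction of the claim's tokens that appear in the cited passage;
    # a production check would use embedding similarity or entailment.
    c = toks(claim)
    return bool(c) and len(c & toks(cited_passage)) / len(c) >= threshold

passage = "Gross margin expanded to 44 percent in fiscal 2023."
faithful = claim_supported("Gross margin was 44 percent", passage)    # True
unfaithful = claim_supported("Net income doubled in 2023", passage)   # False
```

Even this crude check catches the failure mode where the model emits a citation marker but the cited passage does not actually support the claim.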
1 notebook
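The retrieve-extract-compute-narrate pattern can be sketched as follows; the regex extractor, the `yoy_growth` helper, and the evidence string are illustrative assumptions, with the key point being that the arithmetic runs in deterministic code rather than in the LLM.

```python
import re

def extract_dollars(passage: str) -> list[float]:
    # Extract step: pull "$X.Y billion" figures from cited evidence.
    return [float(m) for m in re.findall(r"\$([\d.]+) billion", passage)]

def yoy_growth(current: float, prior: float) -> float:
    # Compute step: deterministic arithmetic, never delegated to the LLM.
    return (current - prior) / prior

evidence = ("Revenue was $4.2 billion in fiscal 2023, "
            "compared with $3.5 billion in fiscal 2022.")
current, prior = extract_dollars(evidence)
growth = yoy_growth(current, prior)
narrative = f"Revenue grew {growth:.1%} year over year."  # narrate step
```

The evidence and the derived number are now separately auditable: the citation check covers the extraction, and the tool covers the arithmetic.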
Diagnosing RAG Pipeline Bottlenecks
This section provides a systematic diagnostic framework for RAG failures, distinguishing retrieval failures (correct information exists but is not retrieved), context failures (too much noise in retrieved chunks), synthesis failures (LLM hallucinates despite correct context), and computation failures (correct evidence but wrong arithmetic). It introduces RAGAs metrics for automated evaluation and RAGChecker for claim-level entailment checking, then outlines a practitioner protocol for building evaluation sets, computing baseline metrics, isolating failure modes, and applying targeted fixes. The section also covers adversarial robustness testing against prompt injection and retrieval poisoning, and corrective RAG systems that automatically re-query when initial results are insufficient.
2 notebooks
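The diagnostic framework above amounts to a decision procedure over per-example signals. The boolean inputs and labels here are illustrative assumptions; in practice they would be derived from metrics such as context recall, context precision, and faithfulness scores.

```python
def diagnose(gold_retrieved: bool, context_precise: bool,
             answer_grounded: bool, arithmetic_correct: bool) -> str:
    # Check the pipeline stages in order, since an upstream failure
    # makes downstream signals uninterpretable.
    if not gold_retrieved:
        return "retrieval failure"    # evidence exists but was not fetched
    if not context_precise:
        return "context failure"      # evidence fetched, buried in noise
    if not answer_grounded:
        return "synthesis failure"    # clean context, hallucinated answer
    if not arithmetic_correct:
        return "computation failure"  # correct evidence, wrong derived number
    return "pass"
```

Tallying these labels over an evaluation set shows which stage dominates the error budget, which is what makes the fixes targeted rather than guesswork.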
From Theory to Practice: Applications and Strategic Choices
The section applies the full RAG architecture to three financial use cases: a 10-K due diligence assistant for multi-hop questions across filing sections, ESG analysis comparing RAG-based due diligence with fine-tuned classifiers, and structured data extraction from 13F institutional holdings filings. It frames the strategic trade-off between fine-tuned classifiers (scalable numeric scores for systematic strategies) and RAG systems (cited narrative answers for fundamental research), arguing that sophisticated firms employ both. Production lifecycle concerns including incremental indexing, corpus versioning, caching, and regulatory compliance in the context of FINRA oversight and the EU AI Act are treated as table stakes rather than afterthoughts.
2 notebooks
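The 13F use case above is, at its core, structured extraction: free text or table rows in, typed and validated records out. This sketch uses invented pipe-delimited rows and hypothetical field names, not the actual 13F XML schema.

```python
from dataclasses import dataclass

@dataclass
class Holding:
    issuer: str
    shares: int
    value_usd_thousands: int

def parse_holdings(lines: list[str]) -> list[Holding]:
    # Each row becomes a typed record; int() conversion doubles as a
    # validation step that rejects malformed figures.
    records = []
    for line in lines:
        issuer, shares, value = (field.strip() for field in line.split("|"))
        records.append(Holding(issuer,
                               int(shares.replace(",", "")),
                               int(value.replace(",", ""))))
    return records

raw = [
    "APPLE INC | 1,250,000 | 237,500",
    "MICROSOFT CORP | 400,000 | 168,000",
]
holdings = parse_holdings(raw)
total_value = sum(h.value_usd_thousands for h in holdings)
```

Typed records like these feed the systematic side of the house, while the RAG system serves cited narrative answers to the fundamental side.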
The Next Frontier: Introduction to Agentic Frameworks
This section bridges from passive RAG question-answering to autonomous agents that plan, use tools, and achieve goals, introducing the ReAct paradigm where an LLM controller iterates through thought, action, and observation cycles. RAG becomes one tool among many in an agent's toolkit alongside web search, code interpreters, database queries, and financial APIs, enabling multi-step workflows like computing relative valuation metrics that require document retrieval, database queries, and calculations in sequence. Industry adoption evidence from Man Group and multi-agent research architectures signals a trajectory from features to synthesis to action, setting up Chapter 24's full agent implementation.
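The ReAct loop described above can be sketched as a tool registry driven by a controller. The three tools are stubs, the plan is hard-coded where a real agent would ask the LLM for each (thought, action, input) triple, and the valuation task is an invented example.

```python
def rag_retrieve(query: str) -> str:
    return "10-K: FY2023 net income was $0.8 billion."   # stubbed RAG tool

def market_data(ticker: str) -> str:
    return "Market cap: $16 billion."                    # stubbed API tool

def calculator(expr: str) -> str:
    # Deterministic math on a restricted eval -- no builtins exposed.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"rag_retrieve": rag_retrieve,
         "market_data": market_data,
         "calculator": calculator}

# A real agent would ask the LLM controller for each step and feed the
# observation back in; this scripted plan just shows the loop's shape.
plan = [
    ("Need earnings from the filing", "rag_retrieve", "FY2023 net income"),
    ("Need the current market cap", "market_data", "XYZ"),
    ("P/E = market cap / net income", "calculator", "16 / 0.8"),
]

observations = []
for thought, action, arg in plan:
    observations.append(TOOLS[action](arg))  # act, then observe
```

Note that RAG appears here as just one entry in `TOOLS`, which is exactly the shift the section describes: retrieval stops being the whole system and becomes one capability the agent can invoke.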