Day 6: RAG Architecture - Retrieval, Chunking, Embeddings
Learning Objectives
- Compare chunking strategies (fixed, semantic, hierarchical) and when to use each
- Select embedding models (Titan Embeddings v2, Cohere Embed v3, Nova Multimodal)
- Understand Retrieve vs RetrieveAndGenerate APIs
- Implement query expansion, decomposition, and reranking
- Use MCP for consistent vector query access patterns
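The query expansion and decomposition objective can be sketched in plain Python: fan a user query out into sub-queries, retrieve for each, then merge the ranked lists. The `expand_query` stub below is hypothetical (a real implementation would ask a foundation model to paraphrase or decompose the query); the merge step uses reciprocal-rank fusion, a common way to combine results from multiple query variants.

```python
# Sketch: expand a query into variants, retrieve per variant, merge rankings.
# expand_query is a HYPOTHETICAL stub standing in for an LLM call; the
# merging logic (reciprocal-rank fusion) is real and self-contained.

def expand_query(query: str) -> list[str]:
    # Hypothetical stub: a real version would call an FM to generate
    # paraphrases or decompose the query into independent sub-questions.
    return [query, f"{query} (paraphrase)", f"{query} (sub-question)"]

def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal-rank fusion: score(chunk) = sum over lists of 1 / (k + rank).
    # Chunks that appear near the top of several lists rise to the top overall.
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: "c2" ranks highly in all three lists, so it wins the fused ranking.
merged = rrf_merge([["c1", "c2", "c3"], ["c2", "c4"], ["c5", "c2"]])
print(merged[0])  # c2
```

Reranking (also in the objectives) is a separate, model-based rescoring step that runs after a merge like this one.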
Tasks
- Read (45m)
Bedrock Knowledge Bases - How It Works
The authoritative reference for chunking, embedding, retrieval, and reranking.
- Blog (25m)
Advanced Parsing, Chunking, and Query Reformulation in KB
Deep dive into advanced chunking and query reformulation options.
- Read (20m)
Bedrock Reranker Models Documentation
How reranker models rescore retrieved chunks for better relevance.
- Blog (15m)
Amazon Nova Multimodal Embeddings
Cross-modal search: text, image, video, and audio embeddings in one model.
- Hands-on (120m)
Bedrock RAG Workshop - Chunking and Embedding Notebooks
Work through different chunking strategies and embedding model selections.
Exam Skills
Write your understanding, then reveal the reference answer.
Hands-On Lab
Build real muscle memory with these activities.
Build an End-to-End RAG Pipeline with Knowledge Bases
Create a complete RAG pipeline from document upload to retrieval-augmented generation.
- 1 Create an S3 bucket and upload 5-10 PDF documents on a specific topic
- 2 Create a Bedrock Knowledge Base with OpenSearch Serverless, Titan Embeddings v2, and fixed-size chunking (300 tokens, 20% overlap)
- 3 Sync the data source and wait for indexing to complete
- 4 Test retrieval with the Retrieve API — try different queries and check chunk relevance
- 5 Switch to RetrieveAndGenerate API and compare the augmented response quality
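Steps 4 and 5 can be sketched as request payloads for the two `bedrock-agent-runtime` operations. The knowledge base ID and model ARN below are placeholders; the payload shapes follow the Retrieve and RetrieveAndGenerate APIs, but verify field names against the current boto3 documentation.

```python
# Sketch of steps 4-5: the same question sent through Retrieve (raw chunks
# only) and RetrieveAndGenerate (chunks plus a generated answer).
# KB_ID and MODEL_ARN are placeholders.

KB_ID = "YOUR_KB_ID"
MODEL_ARN = "YOUR_MODEL_ARN"

def build_retrieve_request(query: str) -> dict:
    # Retrieve: returns the top-k chunks with relevance scores; you inspect
    # chunk relevance yourself (step 4).
    return {
        "knowledgeBaseId": KB_ID,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": 5}
        },
    }

def build_rag_request(query: str) -> dict:
    # RetrieveAndGenerate: retrieval plus generation in one call; the FM
    # answers using the retrieved chunks as context (step 5).
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": KB_ID,
                "modelArn": MODEL_ARN,
            },
        },
    }

# With AWS credentials configured, the calls look like:
# client = boto3.client("bedrock-agent-runtime")
# chunks = client.retrieve(**build_retrieve_request("What is chunk overlap?"))
# answer = client.retrieve_and_generate(**build_rag_request("What is chunk overlap?"))
```

Comparing the two outputs on the same query makes the trade-off concrete: Retrieve gives you full control over how chunks are used, while RetrieveAndGenerate handles prompt augmentation for you.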
Experiment with Chunking Strategies
Compare fixed-size, semantic, and hierarchical chunking on the same document set.
- 1 Using your existing KB, note the retrieval quality with fixed-size chunking
- 2 Create a second KB with the same data source but semantic chunking enabled
- 3 Create a third KB with hierarchical chunking (parent: 1500 tokens, child: 300 tokens)
- 4 Run the same 3 test queries against all 3 KBs and compare chunk relevance scores
- 5 Document which strategy works best for your document type and why
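The three knowledge bases above differ only in their `vectorIngestionConfiguration`. A sketch of the three chunking payloads, with token counts mirroring the lab steps; field names follow the `bedrock-agent` `create_data_source` API, but the semantic-chunking parameter values are illustrative, so verify against the current boto3 documentation.

```python
# Sketch of the three chunkingConfiguration payloads used in this lab.
# Token counts mirror the steps above; semantic parameters are illustrative.

fixed = {
    "chunkingConfiguration": {
        "chunkingStrategy": "FIXED_SIZE",
        "fixedSizeChunkingConfiguration": {
            "maxTokens": 300,         # chunk size from step 2 of lab 1
            "overlapPercentage": 20,  # 20% overlap between adjacent chunks
        },
    }
}

semantic = {
    "chunkingConfiguration": {
        "chunkingStrategy": "SEMANTIC",
        "semanticChunkingConfiguration": {
            "maxTokens": 300,
            "bufferSize": 1,                      # sentences of context per breakpoint
            "breakpointPercentileThreshold": 95,  # higher => fewer, larger chunks
        },
    }
}

hierarchical = {
    "chunkingConfiguration": {
        "chunkingStrategy": "HIERARCHICAL",
        "hierarchicalChunkingConfiguration": {
            "levelConfigurations": [
                {"maxTokens": 1500},  # parent chunks (returned to the FM)
                {"maxTokens": 300},   # child chunks (what gets embedded)
            ],
            "overlapTokens": 60,
        },
    }
}

# Each dict is passed as vectorIngestionConfiguration when creating the
# data source, e.g.:
# bedrock_agent.create_data_source(..., vectorIngestionConfiguration=fixed)
```

Note the hierarchical asymmetry: small child chunks are embedded for precise matching, but their larger parents are returned, giving the FM more surrounding context.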
Scenarios
Think through each scenario before revealing the answer.
Multilingual RAG Pipeline
- Which embedding model supports 100+ languages?
- How do you detect the user's query language?
- Should you use metadata filtering for language?
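One answer to the metadata-filtering question can be sketched as follows: tag each document with a `language` attribute at ingestion, detect the query language, and filter at retrieval time. The `detect_language` stub is hypothetical (production code might use Amazon Comprehend's DetectDominantLanguage); the filter shape follows the Retrieve API's `vectorSearchConfiguration`, but verify against the current boto3 documentation.

```python
# Sketch: language-aware retrieval via metadata filtering.
# detect_language is a HYPOTHETICAL stub; the filter payload shape follows
# the Bedrock Retrieve API.

def detect_language(query: str) -> str:
    # Hypothetical stub: keys off a couple of Spanish question words.
    # Replace with a real detector (e.g. Amazon Comprehend).
    return "es" if any(w in query.lower() for w in ("qué", "cómo")) else "en"

def build_filtered_retrieve(kb_id: str, query: str) -> dict:
    lang = detect_language(query)
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {
                "numberOfResults": 5,
                # Only consider chunks whose source document was tagged
                # with the matching language at ingestion time.
                "filter": {"equals": {"key": "language", "value": lang}},
            }
        },
    }

req = build_filtered_retrieve("KB123", "¿Qué es el chunking?")
print(req["retrievalConfiguration"]["vectorSearchConfiguration"]["filter"])
```

Filtering narrows the candidate set before vector search, which is usually cheaper and more precise than hoping a multilingual embedding model keeps languages cleanly separated in vector space.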
Legal RAG Chunking Fix
- Why is fixed-size chunking bad for legal documents?
- Which chunking strategy respects document structure?
- How do reranker models help after retrieval?
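On the reranking question: after the vector search returns candidate chunks, a reranker model rescores each one against the query with a deeper relevance model. A sketch of the request shape for `bedrock-agent-runtime`'s Rerank API follows; the model ARN is a placeholder and field names should be verified against the current boto3 documentation.

```python
# Sketch: rescore retrieved chunks with a Bedrock reranker model.
# RERANK_MODEL_ARN is a placeholder; verify the payload shape against the
# current Rerank API documentation.

RERANK_MODEL_ARN = "YOUR_RERANKER_MODEL_ARN"

def build_rerank_request(query: str, chunks: list[str]) -> dict:
    return {
        "queries": [{"type": "TEXT", "textQuery": {"text": query}}],
        "sources": [
            {
                "type": "INLINE",
                "inlineDocumentSource": {
                    "type": "TEXT",
                    "textDocument": {"text": chunk},
                },
            }
            for chunk in chunks
        ],
        "rerankingConfiguration": {
            "type": "BEDROCK_RERANKING_MODEL",
            "bedrockRerankingConfiguration": {
                "modelConfiguration": {"modelArn": RERANK_MODEL_ARN},
                "numberOfResults": 3,  # keep only the top 3 after rescoring
            },
        },
    }

req = build_rerank_request("indemnification clause", ["chunk A", "chunk B", "chunk C", "chunk D"])

# With AWS credentials configured:
# client = boto3.client("bedrock-agent-runtime")
# ranked = client.rerank(**req)
```

For the legal scenario, this matters because hierarchical chunking fixes *what* gets retrieved (structure-respecting chunks), while reranking fixes *which* of those chunks actually answer the question.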
Practice Questions
15 questions across 3 difficulty levels.
Further Reading
Go deeper into today's topics.
Advanced Parsing, Chunking, and Query Reformulation in KB
Deep dive into advanced chunking and query reformulation options for Bedrock Knowledge Bases.
End-to-End RAG with CDK
IaC deployment of full RAG stack (IAM, OpenSearch, KB) using AWS CDK.
Advanced RAG with Terraform: Chunking, Hybrid Search, Reranking
IaC deployment of advanced RAG: 4 chunking strategies, hybrid search, Cohere reranking.
re:Invent 2024 — Build Scalable RAG with Bedrock KB (AIM305)
Advanced techniques for improving RAG accuracy and cost optimization with Bedrock Knowledge Bases.
Implementing RAG with Bedrock and Lambda
Serverless RAG pattern: Lambda invokes KB retrieval, augments prompt, calls FM — production-ready architecture.