Exam Scenarios

19 real-world scenarios. Think first, then reveal.

All (19) | D1: FM Integration (8) | D2: Implementation (5) | D3: Safety & Security (4) | D4: Optimization (1) | D5: Testing (1)

D1: FM Integration | Medium

Fintech AI Chatbot PoC Design

A fintech startup wants to add an AI chatbot for customer support. They have sensitive financial data, need low latency, and want to start with a PoC before going to production. How do you design this?

Think First

•Which managed service eliminates infrastructure management for FM access?
•How do you keep sensitive data traffic private?
•What safety measure protects financial PII?

D1: FM Integration | Hard

#1B

Multilingual RAG Pipeline

A multinational company needs its RAG system to handle queries in English, German, and Japanese. Documents exist in all three languages. The system must return results in the user's language. Design the retrieval pipeline.

Think First

•Which embedding model supports 100+ languages?
•How do you detect the user's query language?
•Should you use metadata filtering for language?

D1: FM Integration | Medium

Cross-Region Model Resilience

Your production GenAI app serves users in the US and Europe. The primary model (Claude) occasionally hits rate limits during peak hours. How do you ensure availability?

Think First

•What Bedrock feature handles cross-region failover automatically?
•How can Step Functions implement a circuit breaker pattern?
•What tool enables model switching without code changes?
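
To make the circuit breaker question concrete, here is a minimal in-process sketch of the pattern in plain Python. It is not a Step Functions definition and makes no AWS calls: `throttled_primary` and `secondary_model` are hypothetical stand-ins for real model invocations, and the failure limit and reset window are arbitrary assumptions.

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive errors; allow a retry after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def is_open(self):
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Half-open: let one trial call through; one more failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return False
        return True

    def call(self, primary, fallback, prompt):
        if self.is_open():
            return fallback(prompt)  # skip the primary entirely while open
        try:
            result = primary(prompt)
        except RuntimeError:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback(prompt)
        self.failures = 0
        return result


# Hypothetical stand-ins for real model invocations.
def throttled_primary(prompt):
    raise RuntimeError("ThrottlingException (simulated)")


def secondary_model(prompt):
    return f"[fallback] {prompt}"
```

While the circuit is open, requests go straight to the fallback, so a throttled primary is not hit with further calls until the reset window has passed.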

D1: FM Integration | Hard

Healthcare Model Fine-Tuning

A healthcare company has domain-specific medical terminology that general FMs handle poorly. They need to fine-tune a model on their proprietary medical corpus. Which path?

Think First

•What fine-tuning method minimizes compute cost while adapting to domain language?
•Where do you store and version the fine-tuned model?
•What compliance requirements does healthcare impose on deployment?

D1: FM Integration | Hard

Insurance Document Processing Pipeline

An insurance company receives claims as scanned PDFs containing handwritten notes, printed text, and attached photos of vehicle damage. They want to extract all information and generate a structured summary. Design the pipeline.

Think First

•Which service handles OCR for both printed and handwritten text?
•Which service assesses damage from photos?
•How do you extract named entities (names, dates, policy numbers)?
•Where does Bedrock fit in this pipeline?

D1: FM Integration | Medium

E-Commerce Vector Store Selection

An e-commerce company wants semantic search across 2 million product descriptions AND exact-match filtering by product SKU, price range, and category. Users also need analytics on what they're searching for. Which vector store?

Think First

•Which vector store supports hybrid search (BM25 + k-NN)?
•Which has built-in analytics capabilities?
•Can the Bedrock managed store handle this level of customization?

D1: FM Integration | Hard

Legal RAG Chunking Fix

A legal firm's RAG system retrieves contract clauses, but users complain that retrieved chunks often split mid-clause, producing incoherent context. The pipeline uses fixed-size chunking at 512 tokens. How do you fix this?

Think First

•Why is fixed-size chunking bad for legal documents?
•Which chunking strategy respects document structure?
•How do reranker models help after retrieval?
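
The difference between fixed-size and structure-aware chunking can be seen in a small plain-Python sketch. It assumes clauses start with a numbered heading such as "1." or "2.1" at the beginning of a line, and it counts words rather than model tokens; both are simplifications chosen for illustration, and the function names are hypothetical.

```python
import re

# Assumed heading format: "1.", "2.1" or "Section 3" at the start of a line.
CLAUSE_START = re.compile(r"(?m)^(?=(?:Section\s+)?\d+(?:\.\d+)*\.?\s)")


def fixed_size_chunks(text, size):
    """Naive baseline: cut every `size` words, ignoring document structure."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def chunk_by_clause(text, max_words=400):
    """Split at clause headings; only subdivide a clause that exceeds max_words."""
    clauses = [c.strip() for c in CLAUSE_START.split(text) if c.strip()]
    chunks = []
    for clause in clauses:
        words = clause.split()
        if len(words) <= max_words:
            chunks.append(clause)
        else:
            # Oversized clause: fall back to word windows inside the clause.
            chunks.extend(
                " ".join(words[i:i + max_words])
                for i in range(0, len(words), max_words)
            )
    return chunks


contract = """1. Definitions. "Services" means the work described in Schedule A.
2. Payment. The Client shall pay within 30 days of invoice.
Late payments accrue interest.
3. Termination. Either party may terminate on 60 days written notice."""

for chunk in chunk_by_clause(contract):
    print(repr(chunk))
```

With this sample, `chunk_by_clause` returns one chunk per clause, while `fixed_size_chunks(contract, 8)` starts its second chunk in the middle of clause 1.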

D1: FM Integration | Medium

Centralized Prompt Management

A company has 15 different GenAI applications, each with its own prompts stored in application code. Prompt changes require code deployments. The compliance team wants audit trails for all prompt changes. How do you centralize this?

Think First

•Which Bedrock feature centralizes prompt templates?
•How do you version and track changes?
•Which AWS service provides audit trails for API calls?

D2: Implementation | Hard

DevOps Monitoring Agent Architecture

A DevOps team wants to build an agent that monitors their Kubernetes cluster, reads alerts from SigNoz, queries Prometheus, and suggests remediation actions. The agent should use their existing Python monitoring scripts. Which architecture?

Think First

•Does this need a managed agent (Bedrock Agents) or custom framework?
•How do you expose existing Python scripts as agent tools?
•How does the agent learn from past incident remediations?

D2: Implementation | Medium

#8B

Financial Analysis Agent with Code Interpreter

A financial analyst needs an agent that can download CSV files from internal APIs, perform statistical analysis, and generate visualizations. The code execution must be sandboxed. Which architecture?

Think First

•Which AgentCore service provides sandboxed code execution?
•How do you securely access internal APIs?
•What libraries are available in the sandbox?

D2: Implementation | Hard

Model Cascading Cost Optimization

An enterprise has 10,000 daily FM API calls. 70% are simple FAQ lookups, 20% are moderate analysis, 10% are complex reasoning tasks. Current cost is $X/month using Claude Sonnet for everything. How do you optimize?

Think First

•What is the model cascading pattern?
•Which model handles simple queries cheaply?
•How do you add caching for repeated queries?
•What Bedrock feature automates model routing?
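
A back-of-the-envelope sketch of the cost effect of routing by query class, using the 70/20/10 split from the scenario. The cost units per call and the routing table are assumptions made up for illustration, not real pricing; actual cost depends on the models chosen and on token counts.

```python
DAILY_CALLS = 10_000
TRAFFIC_PERCENT = {"simple": 70, "moderate": 20, "complex": 10}  # from the scenario

# Hypothetical cost units per call; not real pricing.
COST_PER_CALL = {"small": 1, "mid": 10}

# Assumed routing: only simple FAQ lookups move to the cheaper tier.
ROUTE = {"simple": "small", "moderate": "mid", "complex": "mid"}


def daily_cost(route):
    """Total daily cost units for a mapping of query class -> model tier."""
    return sum(
        DAILY_CALLS * percent // 100 * COST_PER_CALL[route[kind]]
        for kind, percent in TRAFFIC_PERCENT.items()
    )


baseline = daily_cost({kind: "mid" for kind in TRAFFIC_PERCENT})
cascaded = daily_cost(ROUTE)
saving_percent = 100 * (baseline - cascaded) // baseline
print(baseline, cascaded, saving_percent)  # prints "100000 37000 63"
```

Under these assumed numbers, moving only the simple queries to a tier that costs one tenth as much cuts spend by 63%. A cache in front of the small tier would reduce it further.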

D2: Implementation | Hard

#9B

MCP Server for CRM Integration

A company wants to build an MCP server that exposes their CRM data to multiple AI agents built with different frameworks (Strands, LangGraph). Some tools need persistent database connections. How do you implement this?

Think First

•What is the difference between stateless and stateful MCP servers?
•Which compute service best fits a stateless server, and which a stateful one?
•How do you make MCP servers available to agents?

D2: Implementation | Medium

#10

CI/CD Pipeline for Prompt Quality

A development team wants to automatically test prompt quality before deploying prompt changes to production. They use CodePipeline for CI/CD. Design the pipeline.

Think First

•Where are prompt templates stored as code?
•Which service runs automated evaluations against test datasets?
•What happens if evaluation scores are below threshold?
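
The threshold question comes down to a gate step that fails the build when scores are too low. A minimal sketch of such a gate in plain Python follows. It assumes the evaluation stage has already produced per-example scores between 0 and 1; the 0.8 threshold and the function names are hypothetical.

```python
def gate(scores, threshold=0.8):
    """Return (passed, mean_score) for a list of per-example scores in [0, 1]."""
    if not scores:
        return False, 0.0  # no evidence counts as a failure
    mean = sum(scores) / len(scores)
    return mean >= threshold, mean


def main(scores, threshold=0.8):
    """Print a verdict and return a process exit code for the pipeline stage."""
    passed, mean = gate(scores, threshold)
    verdict = "PASS" if passed else "FAIL"
    print(f"mean score {mean:.3f} vs threshold {threshold:.3f}: {verdict}")
    return 0 if passed else 1  # a non-zero exit code fails the build stage
```

A build step would pass the return value of `main(...)` to `sys.exit`, so that a score below the threshold stops the pipeline before the prompt change is deployed.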

D3: Safety & Security | Medium

#11

PII Leak Remediation

A customer service chatbot accidentally reveals another customer's email address in its response. The compliance team demands immediate remediation. What do you implement?

Think First

•Which Guardrails filter handles PII?
•Should you BLOCK or ANONYMIZE emails?
•What about other PII types?
•Can this filter work with non-Bedrock models?
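
The BLOCK versus ANONYMIZE choice can be illustrated with a toy regex filter. This is only a stand-in to show the two behaviours on an email address: it is not the Guardrails implementation, the pattern is deliberately simple, and the `{EMAIL}` placeholder and refusal message are assumptions.

```python
import re

# Deliberately simple email pattern; real PII detection covers far more cases.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def anonymize(text):
    """Mask each email but keep the rest of the response usable."""
    return EMAIL.sub("{EMAIL}", text)


def block(text, message="Sorry, I can't share that information."):
    """Replace the whole response if it contains any email."""
    return message if EMAIL.search(text) else text


reply = "Your ticket was merged with the one from jane.doe@example.com."
print(anonymize(reply))
print(block(reply))
```

Anonymizing keeps the answer useful to the customer, while blocking discards the whole response; which one fits depends on whether the rest of the reply is still safe to show.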

D3: Safety & Security | Hard

#11B

Insurance Rate Verification with Automated Reasoning

An insurance company needs to verify that policy quotes generated by their FM are mathematically correct and comply with their rate tables. Contextual grounding alone is insufficient because the FM needs to do calculations. What do you implement?

Think First

•What is the difference between Contextual Grounding and Automated Reasoning?
•Can Contextual Grounding verify mathematical correctness?
•How does Automated Reasoning extract formal logic rules?
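
To see why checking a quote needs rules rather than text similarity, here is a plain deterministic recomputation against a rate table. It is not the Automated Reasoning feature itself. The rate table, the field names, and the premium formula are all hypothetical.

```python
from decimal import Decimal

# Hypothetical rate table: annual base premium by vehicle class,
# multiplied by a factor for the driver's age band.
BASE_PREMIUM = {"sedan": Decimal("800.00"), "suv": Decimal("950.00")}
AGE_MULTIPLIER = {
    "18-25": Decimal("1.50"),
    "26-60": Decimal("1.00"),
    "61+": Decimal("1.20"),
}


def expected_premium(vehicle_class, age_band):
    """Recompute the premium from the rate table, rounded to cents."""
    amount = BASE_PREMIUM[vehicle_class] * AGE_MULTIPLIER[age_band]
    return amount.quantize(Decimal("0.01"))


def verify_quote(quote):
    """Return (ok, expected) for a quote dict parsed from the model's output."""
    expected = expected_premium(quote["vehicle_class"], quote["age_band"])
    return Decimal(str(quote["premium"])) == expected, expected


ok, expected = verify_quote(
    {"vehicle_class": "suv", "age_band": "18-25", "premium": "1400.00"}
)
print(ok, expected)  # prints "False 1425.00"
```

A quote of 1400.00 could still look consistent with source text that mentions the SUV rate and the age factor, but the arithmetic (950.00 x 1.50) shows it is wrong.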

D3: Safety & Security | Hard

#12

Financial Institution Security Architecture

A regulated financial institution wants to use Bedrock but requires: (a) no data traversing the public internet, (b) an audit trail for every model invocation, (c) encryption at rest with customer-managed keys, and (d) that only specific teams can use specific models. Design the security architecture.

Think First

•Which networking feature keeps traffic off the public internet?
•Which services provide API-level and prompt-level audit trails?
•How do you use KMS with Bedrock?
•What IAM condition key restricts model access per team?

D3: Safety & Security | Hard

#13

HR Resume Screening Bias Mitigation

An HR department deploys a resume screening agent. The legal team raises concerns about demographic bias. How do you address this?

Think First

•Which service detects bias in model outputs?
•How do you document the model's intended use and limitations?
•How do you prevent the agent from making final hiring decisions?

D4: Optimization | Hard

#14

GenAI App Cost Reduction

A startup's GenAI app costs $12,000/month on Bedrock. Analysis shows 60% of queries are near-duplicates (users asking the same thing in different words). How do you cut costs?

Think First

•Which caching method handles 'similar but not identical' queries?
•How does semantic caching work technically?
•What similarity threshold is appropriate?
•What additional optimization can you add for remaining queries?
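
The mechanics of a semantic cache can be sketched in plain Python: embed the query, compare it with cached queries by cosine similarity, and reuse the stored response above a threshold. The `embed` function here is a toy bag-of-words stand-in for a real embedding model, so it only catches overlapping wording, not true paraphrases. The 0.8 threshold is an assumption that would need tuning against real traffic.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def get(self, query):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query, response):
        self.entries.append((embed(query), response))


def answer(query, cache, call_model):
    """Serve from the cache when a similar query exists; otherwise call the model."""
    cached = cache.get(query)
    if cached is not None:
        return cached
    response = call_model(query)
    cache.put(query, response)
    return response
```

A threshold set too low returns wrong answers for merely related questions, and one set too high rarely hits, so the value is usually chosen by measuring false hits on sampled queries.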

D5: Testing | Hard

#15

RAG Hallucination Diagnosis

After a prompt update, users report that the RAG chatbot is 'making things up': responses contain information that is not in the knowledge base. How do you diagnose and fix this?

Think First

•What Guardrails feature detects ungrounded responses?
•How do you trace the full request path to find the failure point?
•Is the fault in retrieval (wrong or missing context) or in the prompt (context ignored)?
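
One way to separate a retrieval failure from a prompt failure is to log the retrieved chunks alongside each response and check how much of the response they support. The sketch below uses crude token overlap as a stand-in for a real grounding score. The 0.7 threshold and the function names are assumptions, and it is not how the Guardrails check works internally.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grounding_score(response, context_chunks):
    """Fraction of response tokens that also appear in the retrieved context."""
    response_tokens = tokens(response)
    if not response_tokens:
        return 1.0  # an empty response makes no unsupported claims
    context_tokens = set()
    for chunk in context_chunks:
        context_tokens |= tokens(chunk)
    return len(response_tokens & context_tokens) / len(response_tokens)


def diagnose(response, context_chunks, threshold=0.7):
    """Point at the likely failing stage for one logged request."""
    if not context_chunks:
        return "retrieval returned nothing: check the retriever"
    if grounding_score(response, context_chunks) < threshold:
        return "response not supported by retrieved context: check the prompt"
    return "grounded"


context = ["Refunds are available within 30 days of purchase."]
print(diagnose("Refunds are available within 30 days.", context))
print(diagnose("We offer lifetime warranty on all laptops.", context))
```

Because the problem began after a prompt update, requests where the context was retrieved correctly but the score is low point at the new prompt rather than at the knowledge base.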