Day 15: AI Safety - Guardrails Deep Dive
Learning Objectives
- - Know all 8 Guardrails filter types and their configuration options
- - Distinguish BLOCK vs ANONYMIZE for PII handling
- - Distinguish Contextual Grounding (RAG faithfulness) from Automated Reasoning (logical correctness)
- - Understand Standard tier PII detection in code elements
- - Design defense-in-depth architectures with multiple safety layers
Tasks
Tasks
0/5 completed- Read45m
Bedrock Guardrails Components
Every filter type, configuration options, and behavior. The authoritative reference.
- Read15m
Amazon Bedrock Guardrails Product Page
High-level overview and capabilities summary.
- Blog25m
Automated Reasoning - 99% Verification Accuracy
Formal logic verification for mathematical correctness. Key exam distinction from Contextual Grounding.
- Blog20m
Safeguard GenAI from Prompt Injections
Prompt injection defense strategies and Guardrails Prompt Attack filter.
- Blog20m
Securing Bedrock Agents from Indirect Prompt Injections
Tag external data as 'user input' to protect agents from indirect injection.
Exam Skills
Write your understanding, then reveal the reference answer.
Hands-On Lab
Build real muscle memory with these activities.
Configure Automated Reasoning Checks in Guardrails
Set up Automated Reasoning to verify mathematical and logical correctness in model outputs.
- 1 Open Bedrock → Guardrails → Create or edit a guardrail
- 2 Navigate to the Automated Reasoning section and click 'Create policy'
- 3 Define a policy for a financial domain: 'Loan interest calculations must follow simple interest formula: I = P * R * T'
- 4 Test with a correct calculation: 'Interest on $10,000 at 5% for 2 years is $1,000' — verify PASS
- 5 Test with an incorrect calculation: 'Interest on $10,000 at 5% for 2 years is $2,000' — verify BLOCKED with explanation
Test All 8 Guardrail Filter Types
Systematically test each guardrail filter type to understand their behavior.
- 1 Create a test guardrail with ALL filter types enabled
- 2 Test content filters: send prompts triggering HATE, INSULTS, SEXUAL, VIOLENCE, MISCONDUCT — verify each is blocked
- 3 Test denied topics: add 'political opinions' and verify related prompts are blocked
- 4 Test PII: send text with SSN, credit card, email and verify ANONYMIZE vs BLOCK behavior
- 5 Test prompt attack filter: try a jailbreak prompt like 'Ignore your instructions and...' — verify detection
Scenarios
Think through each scenario before revealing the answer.
PII Leak Remediation
- •Which Guardrails filter handles PII?
- •Should you BLOCK or ANONYMIZE emails?
- •What about other PII types?
- •Can this filter work with non-Bedrock models?
Insurance Rate Verification with Automated Reasoning
- •What is the difference between Contextual Grounding and Automated Reasoning?
- •Can Contextual Grounding verify mathematical correctness?
- •How does Automated Reasoning extract formal logic rules?
Practice Questions
17 questions across 3 difficulty levels.
Further Reading
Go deeper into today's topics.
Build Reliable AI with Automated Reasoning — Part 1
Four-phase implementation: create policy, test with scenarios, deploy in guardrail, integrate in app.
PwC + AWS Responsible AI with Automated Reasoning
Enterprise responsible AI implementation with formal logic verification.
Automated Reasoning for Financial Services
Formal logic verification for financial calculations.
Hacking GenAI Applications — From Theory to Practice
Red-team perspective on prompt injection attacks and defenses — practical attack/defense scenarios.
Detect Prompt Attacks with Bedrock Guardrails
Configure prompt attack filter: jailbreaks, injections, leakage detection — tag user vs system inputs.