Safety & Security Week 3 · Monday

Day 15: AI Safety - Guardrails Deep Dive

Learning Objectives

- Know all 8 Guardrails filter types and their configuration options
- Distinguish BLOCK vs ANONYMIZE for PII handling
- Distinguish Contextual Grounding (RAG faithfulness) from Automated Reasoning (logical correctness)
- Understand Standard tier PII detection in code elements
- Design defense-in-depth architectures with multiple safety layers

Tasks

Read · 45 min
Bedrock Guardrails Components
Every filter type, configuration options, and behavior. The authoritative reference.

Read · 15 min
Amazon Bedrock Guardrails Product Page
High-level overview and capabilities summary.

Blog · 25 min
Automated Reasoning - 99% Verification Accuracy
Formal logic verification for mathematical correctness. Key exam distinction from Contextual Grounding.

Blog · 20 min
Safeguard GenAI from Prompt Injections
Prompt injection defense strategies and Guardrails Prompt Attack filter.

Blog · 20 min
Securing Bedrock Agents from Indirect Prompt Injections
Tag external data as 'user input' to protect agents from indirect injection.

Exam Skills

Write your understanding, then reveal the reference answer.

Hands-On Lab

Build real muscle memory with these activities.

Intermediate · 45 min

Configure Automated Reasoning Checks in Guardrails

Set up Automated Reasoning to verify mathematical and logical correctness in model outputs.

1. Open Bedrock → Guardrails → Create or edit a guardrail
2. Navigate to the Automated Reasoning section and click 'Create policy'
3. Define a policy for a financial domain: 'Loan interest calculations must follow simple interest formula: I = P * R * T'
4. Test with a correct calculation: 'Interest on $10,000 at 5% for 2 years is $1,000'; verify PASS
5. Test with an incorrect calculation: 'Interest on $10,000 at 5% for 2 years is $2,000'; verify BLOCKED with explanation
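
Before running the two test prompts, it can help to confirm the expected verdicts by hand. The sketch below is a plain local arithmetic check of I = P * R * T, not a call to Automated Reasoning; the function names are hypothetical and chosen only for illustration.

```python
from decimal import Decimal


def simple_interest(principal, annual_rate, years):
    """Return simple interest I = P * R * T as a Decimal.

    Arguments are passed as strings (e.g. "0.05") to avoid float rounding.
    """
    return Decimal(principal) * Decimal(annual_rate) * Decimal(years)


def claim_is_consistent(principal, annual_rate, years, claimed_interest):
    """True if the claimed interest matches the simple interest formula."""
    expected = simple_interest(principal, annual_rate, years)
    return expected == Decimal(claimed_interest)


# Step 4: $10,000 at 5% for 2 years, claimed $1,000 -> consistent with the rule
print(claim_is_consistent("10000", "0.05", "2", "1000"))

# Step 5: same loan, claimed $2,000 -> contradicts the rule
print(claim_is_consistent("10000", "0.05", "2", "2000"))
```

The first claim should satisfy the policy and the second should violate it, which is the behavior steps 4 and 5 ask you to confirm in the console.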

Intermediate · 40 min

Test All 8 Guardrail Filter Types

Systematically test each guardrail filter type to understand their behavior.

1. Create a test guardrail with ALL filter types enabled
2. Test content filters: send prompts triggering HATE, INSULTS, SEXUAL, VIOLENCE, MISCONDUCT; verify each is blocked
3. Test denied topics: add 'political opinions' and verify related prompts are blocked
4. Test PII: send text with SSN, credit card, email and verify ANONYMIZE vs BLOCK behavior
5. Test prompt attack filter: try a jailbreak prompt like 'Ignore your instructions and...'; verify detection
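
If you prefer to script the setup instead of clicking through the console, the sketch below builds a request body in the shape accepted by the boto3 `bedrock` client's `create_guardrail` call. It covers only the filters exercised in steps 2 to 5, not every filter type. The guardrail name, the topic definition, the blocked messages, the HIGH strengths, and the choice of which PII types to BLOCK versus ANONYMIZE are all assumptions made for illustration.

```python
def build_test_guardrail_config(name="day15-test-guardrail"):
    """Build keyword arguments for bedrock.create_guardrail (illustrative values)."""
    harmful_categories = ["HATE", "INSULTS", "SEXUAL", "VIOLENCE", "MISCONDUCT"]
    filters = [
        {"type": category, "inputStrength": "HIGH", "outputStrength": "HIGH"}
        for category in harmful_categories
    ]
    # The prompt attack filter evaluates user input only, so output strength is NONE.
    filters.append(
        {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"}
    )
    return {
        "name": name,
        # Assumed placeholder messages shown when the guardrail intervenes.
        "blockedInputMessaging": "This request was blocked by the test guardrail.",
        "blockedOutputsMessaging": "This response was blocked by the test guardrail.",
        "contentPolicyConfig": {"filtersConfig": filters},
        "topicPolicyConfig": {
            "topicsConfig": [
                {
                    "name": "Political opinions",
                    # Assumed wording for the denied-topic definition.
                    "definition": "Opinions or recommendations about political "
                                  "parties, candidates, or policies.",
                    "type": "DENY",
                }
            ]
        },
        "sensitiveInformationPolicyConfig": {
            "piiEntitiesConfig": [
                {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
                {"type": "CREDIT_DEBIT_CARD_NUMBER", "action": "BLOCK"},
                {"type": "EMAIL", "action": "ANONYMIZE"},
            ]
        },
    }


config = build_test_guardrail_config()
print([f["type"] for f in config["contentPolicyConfig"]["filtersConfig"]])
```

With AWS credentials configured, the dictionary could be passed as `boto3.client("bedrock").create_guardrail(**config)`. Mixing BLOCK and ANONYMIZE across PII types makes the difference in step 4 easy to observe in one test run.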

Scenarios

Think through each scenario before revealing the answer.

D3: Safety & Security · Medium

#11

PII Leak Remediation

A customer service chatbot accidentally reveals another customer's email address in its response. The compliance team demands immediate remediation. What do you implement?

Think First

• Which Guardrails filter handles PII?
• Should you BLOCK or ANONYMIZE emails?
• What about other PII types?
• Can this filter work with non-Bedrock models?
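
One way to explore the last question is the `apply_guardrail` operation on the boto3 `bedrock-runtime` client, which evaluates text against a guardrail without invoking a model. The sketch below wraps that call; `screen_model_output` and `FakeGuardrailClient` are hypothetical names, and the fake client is only a stand-in so the example runs offline. Its `{EMAIL}` masking imitates what an ANONYMIZE action is expected to return and is an assumption, not captured service output.

```python
def screen_model_output(client, guardrail_id, guardrail_version, text):
    """Run model output through a guardrail and return the text that is safe to show."""
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source="OUTPUT",  # screening a model response rather than user input
        content=[{"text": {"text": text}}],
    )
    if response["action"] == "GUARDRAIL_INTERVENED":
        # On intervention, outputs carry the masked text or the blocked message.
        outputs = response.get("outputs", [])
        return outputs[0]["text"] if outputs else ""
    return text


class FakeGuardrailClient:
    """Offline stand-in for the bedrock-runtime client (illustration only)."""

    def apply_guardrail(self, **kwargs):
        text = kwargs["content"][0]["text"]["text"]
        if "@" not in text:
            return {"action": "NONE", "outputs": []}
        # Imitate ANONYMIZE: replace any word containing "@" with a placeholder tag.
        masked = " ".join("{EMAIL}" if "@" in word else word for word in text.split())
        return {"action": "GUARDRAIL_INTERVENED", "outputs": [{"text": masked}]}


fake = FakeGuardrailClient()
print(screen_model_output(fake, "example-guardrail-id", "1",
                          "Reach Jane at jane@example.com today"))
```

In a real deployment the `client` argument would be `boto3.client("bedrock-runtime")` and the identifier and version would come from your own guardrail. Because the function only needs text in and text out, the response being screened can come from any model.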

D3: Safety & Security · Hard

#11B

Insurance Rate Verification with Automated Reasoning

An insurance company needs to verify that policy quotes generated by their FM are mathematically correct and comply with their rate tables. Contextual grounding alone is insufficient because the FM needs to do calculations. What do you implement?

Think First

• What is the difference between Contextual Grounding and Automated Reasoning?
• Can Contextual Grounding verify mathematical correctness?
• How does Automated Reasoning extract formal logic rules?

Practice Questions

17 questions across 3 difficulty levels.

Foundation

Applied

Expert

Further Reading

Build Reliable AI with Automated Reasoning — Part 1

PwC + AWS Responsible AI with Automated Reasoning

Automated Reasoning for Financial Services

Hacking GenAI Applications — From Theory to Practice

Detect Prompt Attacks with Bedrock Guardrails