Day 4: Data Pipelines + Processing for FM Consumption
Learning Objectives
- Map AWS services to data types (Comprehend=text, Transcribe=audio, Textract=docs, Rekognition=images)
- Design multimodal data processing pipelines
- Understand Bedrock Data Automation for automated processing
- Use Macie for PII discovery and Lake Formation for data access control
- Build Step Functions orchestrations for batch document processing
Tasks
- Read (30m) · Amazon Bedrock Data Automation — Automated multimodal data processing pipeline. Key for questions about processing mixed content types.
- Read (20m) · Amazon Comprehend Documentation — Entity extraction, sentiment analysis, and intent detection from text.
- Read (20m) · Amazon Textract Documentation — OCR, text extraction, and table extraction from documents and images.
- Read (15m) · Amazon Macie for PII Discovery — Scan S3 buckets for PII before ingesting into Knowledge Bases; a critical pre-processing step.
- Read (15m) · AWS Lake Formation for Granular Data Access — Column-, row-, and cell-level access control for data feeding GenAI pipelines.
- Blog (25m) · Bedrock Data Automation + Guardrails PII Pipeline — End-to-end PII detection and redaction architecture using BDA and Guardrails.
- Watch (20m) · Amazon Bedrock Agents: Easy Data Pipelines — Video walkthrough of building data pipelines with Bedrock.
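The Macie task above can be sketched with boto3: a minimal one-time classification job that scans a bucket for PII before Knowledge Base ingestion. The account ID, bucket name, and job name below are placeholders, and the boto3 import is deferred so the parameter-building helper stays testable offline.

```python
import uuid

def build_pii_scan_job(account_id, bucket_name, job_name):
    """Assemble parameters for Macie's CreateClassificationJob (one-time scan)."""
    return {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "jobType": "ONE_TIME",
        "name": job_name,
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": [bucket_name]}
            ]
        },
    }

def start_pii_scan(account_id, bucket_name, job_name="pre-kb-pii-scan"):
    import boto3  # deferred so build_pii_scan_job can be tested without AWS
    macie = boto3.client("macie2")
    job = macie.create_classification_job(
        **build_pii_scan_job(account_id, bucket_name, job_name)
    )
    return job["jobId"]
```

Once the job completes, review Macie's findings before pointing a Knowledge Base data source at the bucket.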
Exam Skills
Write your understanding, then reveal the reference answer.
Hands-On Lab
Build real muscle memory with these activities.
Build a Textract → Comprehend → Bedrock Pipeline
Create a simple document processing pipeline that extracts text, identifies entities, and generates a summary.
1. Upload a sample PDF document to an S3 bucket
2. Use the AWS CLI to call Textract DetectDocumentText and save the extracted text
3. Pass the extracted text to Comprehend DetectEntities to identify people, organizations, and dates
4. Send the extracted text and entities to Bedrock InvokeModel (Claude) with the prompt: "Summarize this document and highlight key entities"
5. Compare the pipeline output with Bedrock Data Automation's single-API approach
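The steps above can be sketched in Python with boto3 instead of the CLI. This is a minimal sketch, not a production pipeline: the model ID is an assumed Claude on Bedrock identifier, the synchronous Textract API only handles single-page documents, and the boto3 import is deferred so the prompt-building helper stays testable offline.

```python
import json

def build_summary_prompt(text, entities):
    """Combine extracted text and Comprehend entities into a Claude prompt."""
    entity_list = ", ".join(f"{e['Text']} ({e['Type']})" for e in entities)
    return (
        "Summarize this document and highlight key entities.\n\n"
        f"Entities found: {entity_list}\n\nDocument:\n{text}"
    )

def run_pipeline(bucket, key, model_id="anthropic.claude-3-haiku-20240307-v1:0"):
    import boto3  # deferred so build_summary_prompt can be tested without AWS
    textract = boto3.client("textract")
    comprehend = boto3.client("comprehend")
    bedrock = boto3.client("bedrock-runtime")

    # Step 2: OCR (sync API; multi-page PDFs need StartDocumentTextDetection)
    ocr = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    text = "\n".join(b["Text"] for b in ocr["Blocks"] if b["BlockType"] == "LINE")

    # Step 3: entity extraction (truncated to stay under Comprehend's sync size limit)
    entities = comprehend.detect_entities(Text=text[:5000], LanguageCode="en")["Entities"]

    # Step 4: summarize with Claude via the Bedrock Messages API
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": build_summary_prompt(text, entities)}],
    })
    resp = bedrock.invoke_model(modelId=model_id, body=body)
    return json.loads(resp["body"].read())["content"][0]["text"]
```

Chaining three services like this is exactly what step 5 asks you to compare against BDA's single API call.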
Test Bedrock Data Automation (BDA) for Document Processing
Use Bedrock Data Automation to process a document with a single API call instead of chaining services.
1. Open the Bedrock console and navigate to Data Automation
2. Create a new project and upload a sample multi-page document
3. Configure the extraction blueprint to extract key fields
4. Run the automation and review the structured output
5. Compare the result with the manual Textract → Comprehend pipeline from the previous activity
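The same BDA run can be driven programmatically. The field and operation names below are assumptions based on the `bedrock-data-automation-runtime` API and should be verified against current documentation; the S3 URIs and ARNs are placeholders, and the boto3 import is deferred so the request-building helper stays testable offline.

```python
import time

def build_bda_request(input_uri, output_uri, project_arn, profile_arn):
    """Parameters for InvokeDataAutomationAsync (field names assumed from the
    bedrock-data-automation-runtime API; check the current API reference)."""
    return {
        "inputConfiguration": {"s3Uri": input_uri},
        "outputConfiguration": {"s3Uri": output_uri},
        "dataAutomationConfiguration": {"dataAutomationProjectArn": project_arn},
        "dataAutomationProfileArn": profile_arn,
    }

def process_document(input_uri, output_uri, project_arn, profile_arn):
    import boto3  # deferred so build_bda_request can be tested without AWS
    bda = boto3.client("bedrock-data-automation-runtime")
    invocation = bda.invoke_data_automation_async(
        **build_bda_request(input_uri, output_uri, project_arn, profile_arn)
    )
    arn = invocation["invocationArn"]
    # Poll the async job; structured output lands in the S3 output URI
    while True:
        status = bda.get_data_automation_status(invocationArn=arn)
        if status["status"] != "InProgress":
            return status
        time.sleep(5)
```

Note the contrast with the previous lab: one asynchronous call replaces the Textract → Comprehend → Bedrock chain.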
Scenarios
Think through each scenario before revealing the answer.
Insurance Document Processing Pipeline
- Which service handles OCR for both printed and handwritten text?
- Which service assesses damage from photos?
- How do you extract named entities (names, dates, policy numbers)?
- Where does Bedrock fit in this pipeline?
Practice Questions
11 questions across 3 difficulty levels.
Further Reading
Go deeper into today's topics.
Intelligent Document Processing at Scale with BDA
End-to-end IDP with BDA: classification, extraction, normalization, validation — reusable IaC.
IDP with Textract, Bedrock, and LangChain
OCR + LLM pipeline: Textract extracts text, Bedrock generates structured output, LangChain orchestrates.
BDA Document Processing Samples
Sample Bedrock Data Automation pipelines for document processing.
Multimodal Power of BDA for Unstructured Data
Process documents, images, audio, video — multimodal data to structured output with BDA.
Lessons Learned with BDA in an IDP Product
Real-world BDA experience: gotchas, workarounds, regional limits.