AIP-C01 Study Hub
FM Integration Week 1 · Thursday

Day 4: Data Pipelines + Processing for FM Consumption

Learning Objectives

  • Map AWS services to data types (Comprehend = text, Transcribe = audio, Textract = documents, Rekognition = images)
  • Design multimodal data processing pipelines
  • Understand Bedrock Data Automation for automated processing
  • Use Macie for PII discovery and Lake Formation for data access control
  • Build Step Functions orchestrations for batch document processing
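The service-to-data-type mapping in the first objective can be sketched as a simple lookup table, which doubles as a self-test for the exam. The file extensions and routing below are illustrative assumptions, not an official taxonomy:

```python
import os

# Illustrative mapping of data modality -> AWS AI service, mirroring the
# first learning objective. Extensions chosen here are assumptions.
MODALITY_SERVICE = {
    "text": "Comprehend",    # entities, sentiment, intent in free text
    "audio": "Transcribe",   # speech -> text
    "document": "Textract",  # OCR, forms, tables from scans/PDFs
    "image": "Rekognition",  # labels, faces, moderation in photos
}

EXTENSION_MODALITY = {
    ".txt": "text", ".csv": "text",
    ".mp3": "audio", ".wav": "audio",
    ".pdf": "document", ".tiff": "document",
    ".jpg": "image", ".png": "image",
}

def service_for(filename: str) -> str:
    """Pick the service that handles a file, based on its extension."""
    ext = os.path.splitext(filename.lower())[1]
    modality = EXTENSION_MODALITY.get(ext)
    if modality is None:
        raise ValueError(f"unknown file type: {filename}")
    return MODALITY_SERVICE[modality]
```

For exam questions that mix content types, working backwards from the input format to the service this way is usually the fastest elimination strategy.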

Tasks

  • Read · 30 min

    Amazon Bedrock Data Automation

    Automated multimodal data processing pipeline. Key for questions about processing mixed content types.

  • Read · 20 min

    Amazon Comprehend Documentation

    Entity extraction, sentiment analysis, and intent detection from text.

  • Read · 20 min

    Amazon Textract Documentation

    OCR, text extraction, and table extraction from documents and images.

  • Read · 15 min

    Amazon Macie for PII Discovery

    Scan S3 buckets for PII BEFORE ingesting into Knowledge Bases. Critical pre-processing step.

  • Read · 15 min

    AWS Lake Formation for Granular Data Access

    Column-, row-, and cell-level access control for data feeding GenAI pipelines.

  • Blog · 25 min

    Bedrock Data Automation + Guardrails PII Pipeline

    End-to-end PII detection and redaction architecture using BDA and Guardrails.

  • Watch · 20 min

    Amazon Bedrock Agents: Easy Data Pipelines

    Video walkthrough of building data pipelines with Bedrock.
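The Macie pre-ingestion scan from the reading list can be sketched as a one-time classification job. The helper below only builds the request parameters (the bucket name and account ID are placeholders); the live call is separated out because it needs credentials and Macie enabled in the account, and the exact `macie2` API shape should be verified against the boto3 docs:

```python
# Sketch: a one-time Macie classification job that scans an S3 bucket
# for PII before its contents are ingested into a Knowledge Base.
def macie_pii_job_params(account_id: str, bucket: str) -> dict:
    """Build parameters for macie2 create_classification_job."""
    return {
        "jobType": "ONE_TIME",  # single pre-ingestion scan, not scheduled
        "name": f"pii-scan-{bucket}",
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": [bucket]}
            ]
        },
    }

def run_pii_scan(account_id: str, bucket: str):
    # Live call -- requires AWS credentials and Macie enabled.
    import boto3
    macie = boto3.client("macie2")
    return macie.create_classification_job(
        **macie_pii_job_params(account_id, bucket)
    )
```

The findings from the job tell you which objects to redact or exclude before Knowledge Base ingestion, which is the "critical pre-processing step" the task refers to.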

Exam Skills

Write your understanding, then reveal the reference answer.


Hands-On Lab

Build real muscle memory with these activities.

Intermediate · 60 min

Build a Textract → Comprehend → Bedrock Pipeline

Create a simple document processing pipeline that extracts text, identifies entities, and generates a summary.

  1. Upload a sample PDF document to an S3 bucket
  2. Use the AWS CLI to call Textract DetectDocumentText and save the extracted text
  3. Pass the extracted text to Comprehend DetectEntities to identify people, organizations, and dates
  4. Send the extracted text and entities to Bedrock InvokeModel (Claude) with the prompt: "Summarize this document and highlight key entities"
  5. Compare the pipeline output with Bedrock Data Automation's single-API approach
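The lab steps above can also be chained in boto3. This is a sketch, not a reference solution: the model ID is an assumption, the synchronous DetectDocumentText call assumes a single-page document (multi-page PDFs need the async Start/Get APIs), and the live calls require AWS credentials and Bedrock model access. The prompt-building helper runs offline:

```python
import json

def build_summary_prompt(text: str, entities: list[dict]) -> str:
    """Combine extracted text and Comprehend entities into one prompt (step 4)."""
    entity_lines = "\n".join(f"- {e['Type']}: {e['Text']}" for e in entities)
    return (
        "Summarize this document and highlight key entities.\n\n"
        f"Entities found:\n{entity_lines}\n\n"
        f"Document text:\n{text}"
    )

def run_pipeline(bucket: str, key: str,
                 model_id: str = "anthropic.claude-3-haiku-20240307-v1:0"):
    # Live calls -- needs AWS credentials and Bedrock model access.
    import boto3
    textract = boto3.client("textract")
    comprehend = boto3.client("comprehend")
    bedrock = boto3.client("bedrock-runtime")

    # Step 2: OCR the document straight from S3 (single page assumed).
    blocks = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )["Blocks"]
    text = "\n".join(b["Text"] for b in blocks if b["BlockType"] == "LINE")

    # Step 3: extract people, organizations, dates (sync limit ~5000 chars).
    entities = comprehend.detect_entities(
        Text=text[:5000], LanguageCode="en"
    )["Entities"]

    # Step 4: summarize via Claude using the Anthropic messages body format.
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user",
                      "content": build_summary_prompt(text, entities)}],
    })
    resp = bedrock.invoke_model(modelId=model_id, body=body)
    return json.loads(resp["body"].read())["content"][0]["text"]
```

Notice how much glue code three chained services need; that contrast is the point of step 5.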

Intermediate · 30 min

Test Bedrock Data Automation (BDA) for Document Processing

Use Bedrock Data Automation to process a document with a single API call instead of chaining services.

  1. Open the Bedrock console and navigate to Data Automation
  2. Create a new project and upload a sample multi-page document
  3. Configure the extraction blueprint to extract key fields
  4. Run the automation and review the structured output
  5. Compare the result with the manual Textract → Comprehend pipeline from the previous activity
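For comparison with the console flow above, the same BDA job can be kicked off programmatically. This sketch assumes the `bedrock-data-automation-runtime` InvokeDataAutomationAsync request shape; this API is newer and may differ, so check the current boto3 documentation before relying on it. ARNs and S3 URIs are placeholders:

```python
# Sketch: single-API document processing with Bedrock Data Automation,
# replacing the chained Textract -> Comprehend calls. Parameter names
# assume the bedrock-data-automation-runtime API; verify before use.
def bda_invoke_params(input_s3_uri: str, output_s3_uri: str,
                      project_arn: str, profile_arn: str) -> dict:
    return {
        "inputConfiguration": {"s3Uri": input_s3_uri},
        "outputConfiguration": {"s3Uri": output_s3_uri},
        "dataAutomationConfiguration": {
            "dataAutomationProjectArn": project_arn  # project with your blueprint
        },
        "dataAutomationProfileArn": profile_arn,
    }

def run_bda(params: dict):
    # Live async call; structured output lands under the output S3 URI.
    import boto3
    runtime = boto3.client("bedrock-data-automation-runtime")
    return runtime.invoke_data_automation_async(**params)
```

The exam-relevant takeaway is the shape: one asynchronous call against a project/blueprint, versus hand-wiring several service APIs.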

Scenarios

Think through each scenario before revealing the answer.

D1: FM Integration · Hard · #4

Insurance Document Processing Pipeline

An insurance company receives claims as scanned PDFs containing handwritten notes, printed text, and attached photos of vehicle damage. They want to extract all information and generate a structured summary. Design the pipeline.
Think First
  • Which service handles OCR for both printed and handwritten text?
  • Which service assesses damage from photos?
  • How do you extract named entities (names, dates, policy numbers)?
  • Where does Bedrock fit in this pipeline?
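One way to organize your thinking before revealing the answer (this is a design sketch following the hints above, not the reference answer): fan each claim artifact out to the service that handles its modality, then let Bedrock consume the merged results. All names below are illustrative:

```python
# Sketch of the claims-pipeline fan-out: each artifact type goes to the
# service suited to it, and Bedrock sits at the end producing the
# structured summary. Routes follow the Think First hints.
def route_claim_artifact(artifact_type: str) -> str:
    routes = {
        "scanned_pdf": "Textract AnalyzeDocument",      # printed + handwritten OCR
        "damage_photo": "Rekognition DetectLabels",     # assess vehicle damage
        "extracted_text": "Comprehend DetectEntities",  # names, dates, policy numbers
    }
    if artifact_type not in routes:
        raise ValueError(f"no route for {artifact_type}")
    return routes[artifact_type]

def assemble_bedrock_input(ocr_text: str, entities: list[str],
                           damage_labels: list[str]) -> str:
    # Bedrock's role: fuse all extracted signals into one structured summary.
    return (
        "Produce a structured claim summary.\n"
        f"Entities: {', '.join(entities)}\n"
        f"Damage indicators: {', '.join(damage_labels)}\n"
        f"Claim text:\n{ocr_text}"
    )
```

In practice a Step Functions state machine would run the three extraction branches in parallel before the final Bedrock step, which ties back to the batch-orchestration learning objective.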

Practice Questions

11 questions across 3 difficulty levels.

Further Reading

Go deeper into today's topics.