Day 9: Deployment Strategies + Enterprise Integration
Learning Objectives
- - Design model cascading patterns (small -> large based on complexity)
- - Understand container-based LLM deployment (ECS/EKS with GPU)
- - Know edge/hybrid options (Outposts, Wavelength, Lambda@Edge)
- - Implement WebSocket streaming with API Gateway + Lambda + Bedrock
- - Apply the AI Gateway pattern for rate limiting and access control
Tasks
Tasks
0/5 completed- Read30m
GenAI Application Builder Architecture Overview
End-to-end reference architecture for production GenAI applications.
- Blog25m
Building an AI Gateway to Bedrock with API Gateway
Rate limiting, access control, usage tracking for GenAI APIs. Enterprise pattern.
- Blog25m
Serverless Generative AI Architectural Patterns
API Gateway + Lambda + Bedrock foundational patterns.
- Blog25m
Orchestrate GenAI Workflows with Bedrock and Step Functions
Parallel API calls, error handling, complex orchestration.
- Watch20m
AWS Step Functions for Generative AI
Video overview of Step Functions integration with GenAI services.
Exam Skills
Write your understanding, then reveal the reference answer.
Hands-On Lab
Build real muscle memory with these activities.
Set Up API Gateway → Lambda → Bedrock Pattern
Build the foundational serverless pattern for exposing Bedrock as a REST API.
- 1 Create a Lambda function (Python) that calls bedrock-runtime InvokeModel with Claude
- 2 Add the bedrock:InvokeModel permission to the Lambda execution role
- 3 Create a REST API in API Gateway with a POST /chat resource
- 4 Configure the Lambda proxy integration
- 5 Test with curl: curl -X POST <api-url>/chat -d '{"prompt": "Hello"}' and verify the Bedrock response
Deploy the AI Gateway Pattern with Rate Limiting
Extend the basic API Gateway pattern with usage plans and API keys for rate limiting.
- 1 In API Gateway, create a Usage Plan with rate limit: 100 requests/second, burst: 200
- 2 Create an API key and associate it with the usage plan
- 3 Enable API key requirement on the /chat resource
- 4 Test that requests without the API key return 403
- 5 Test that requests exceeding the rate limit return 429 (Too Many Requests)
Scenarios
Think through each scenario before revealing the answer.
Model Cascading Cost Optimization
- •What is the model cascading pattern?
- •Which model handles simple queries cheaply?
- •How do you add caching for repeated queries?
- •What Bedrock feature automates model routing?
Practice Questions
5 questions across 3 difficulty levels.
Further Reading
Go deeper into today's topics.
Serverless Generative AI Architectural Patterns
API Gateway + Lambda + Bedrock foundational patterns for production GenAI APIs.
Building an AI Gateway to Bedrock with API Gateway
Rate limiting, access control, usage tracking for GenAI APIs. Enterprise pattern.
Orchestrate GenAI Workflows with Bedrock and Step Functions
Parallel API calls, error handling, complex orchestration with Step Functions.
Serverless Prompt Chaining — GitHub Repo
CDK code: sequential chains, parallel jobs, loops, conditions — Streamlit demo with meal planner example.
GenAI Application Builder Architecture Overview
End-to-end reference architecture for production GenAI applications.