Day 18: Cost Optimization + Performance (Domain 4)
Learning Objectives
- Distinguish 3 caching layers: prompt caching, semantic caching, and exact-match caching
- Design model cascading and Intelligent Prompt Routing architectures
- Know batch inference (50% discount) for non-real-time workloads
- Implement cost monitoring with CloudWatch and Cost Explorer
- Understand provisioned throughput vs. on-demand pricing
Tasks
- Blog (30m): Effective Cost Optimization Strategies for Amazon Bedrock
  Comprehensive guide: model cascading, caching, batch inference, provisioned throughput.
- Blog (20m): Prompt Caching on Bedrock - Up to 85% Cost Reduction
  5-min TTL (1.25x write, 0.1x read) or 1-hour TTL (2.0x write, 0.1x read). Minimum 1,024 tokens.
- Blog (20m): ElastiCache Semantic Cache - 86% Cost Reduction
  Application-level semantic caching using embeddings and cosine similarity.
- Blog (15m): Intelligent Prompt Routing
  Automatically routes each prompt to the cheapest capable model; a managed form of model cascading.
- Read (15m): Model Invocation Logging
  Logs all prompts and responses to CloudWatch or S3. Essential for cost analysis and debugging.
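The prompt-caching multipliers above translate directly into savings you can estimate. A back-of-envelope sketch, where the base token price is a placeholder assumption rather than a current Bedrock rate:

```python
# Back-of-envelope prompt-caching math. BASE_PRICE_PER_1K is a placeholder
# assumption, not a real Bedrock rate; the 1.25x write and 0.1x read
# multipliers are the 5-minute-TTL figures quoted in the task list above.
BASE_PRICE_PER_1K = 0.003          # assumed $ per 1K input tokens
WRITE_MULT, READ_MULT = 1.25, 0.10

def cached_cost(prompt_tokens: int, requests: int) -> float:
    """First request writes the cache; the rest read it within the TTL."""
    per_request = prompt_tokens / 1000 * BASE_PRICE_PER_1K
    return per_request * WRITE_MULT + (requests - 1) * per_request * READ_MULT

def uncached_cost(prompt_tokens: int, requests: int) -> float:
    return requests * prompt_tokens / 1000 * BASE_PRICE_PER_1K

# 100 requests reusing a 2,048-token prefix (cached prefix must be >= 1,024 tokens)
savings = 1 - cached_cost(2048, 100) / uncached_cost(2048, 100)
print(f"savings: {savings:.0%}")   # roughly 89% on the cached prefix
```

Output tokens and any non-cached suffix are billed at normal rates, which is why real-world savings land below this ceiling.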
Exam Skills
Write your understanding, then reveal the reference answer.
Hands-On Lab
Build real muscle memory with these activities.
Implement Semantic Caching with ElastiCache
Set up a semantic cache using ElastiCache to reduce Bedrock API costs for similar queries.
1. Create an ElastiCache Serverless Redis cluster with vector search enabled
2. Write a Lambda function that generates embeddings for incoming queries using Titan Embeddings
3. Before calling Bedrock, search ElastiCache for a cached response with cosine similarity > 0.95
4. On a cache hit, return the cached response (saving the Bedrock API call); on a miss, call Bedrock, then store the query embedding and response in ElastiCache
5. Test with 10 similar queries and measure the cache hit rate and cost savings
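Steps 2-4 above can be sketched in plain Python. The in-memory list stands in for the ElastiCache vector index, and the toy vectors stand in for Titan embeddings; the 0.95 threshold comes from step 3, everything else here is an assumption:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

class SemanticCache:
    """In-memory stand-in for an ElastiCache vector index (sketch only).

    A real implementation would store embeddings in Redis and use its
    vector-search commands instead of this linear scan.
    """
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []               # list of (embedding, response)

    def lookup(self, embedding):
        best_response, best_sim = None, 0.0
        for emb, response in self.entries:
            sim = cosine_similarity(embedding, emb)
            if sim > best_sim:
                best_response, best_sim = response, sim
        # Only treat it as a hit above the similarity threshold (step 3)
        return best_response if best_sim > self.threshold else None

    def store(self, embedding, response):
        self.entries.append((embedding, response))

cache = SemanticCache(threshold=0.95)
cache.store([0.9, 0.1, 0.0], "cached answer")          # earlier Bedrock response
print(cache.lookup([0.91, 0.09, 0.0]))  # near-duplicate query: cache hit
print(cache.lookup([0.0, 0.0, 1.0]))    # unrelated query: miss, call Bedrock
```

In the Lambda from step 2, the toy vectors would instead come from invoking a Titan Embeddings model on the query text before the lookup.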
Enable and Analyze Model Invocation Logging
Set up model invocation logging to CloudWatch to track token usage and cost per request.
1. Open Bedrock console → Settings → Model invocation logging
2. Enable logging to CloudWatch Logs and select a log group
3. Make several Bedrock API calls with different models
4. Open CloudWatch Logs Insights and run: fields @timestamp, inputTokenCount, outputTokenCount, modelId | sort @timestamp desc
5. Calculate the cost per request using the token counts and published pricing
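Step 5 can be scripted once the token counts are exported from the Logs Insights query. A minimal sketch; the per-1K prices in the table below are placeholder assumptions, so check the current Bedrock pricing page before relying on the numbers:

```python
# Compute cost per request from invocation-log token counts.
# The per-1K-token prices are placeholder assumptions, not guaranteed
# current rates; the model IDs follow Bedrock's naming convention.
PRICING = {
    "anthropic.claude-3-haiku-20240307-v1:0":    {"input": 0.00025, "output": 0.00125},
    "anthropic.claude-3-5-sonnet-20240620-v1:0": {"input": 0.003,   "output": 0.015},
}

def request_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Price one request from the inputTokenCount/outputTokenCount fields."""
    price = PRICING[model_id]
    return (input_tokens / 1000) * price["input"] + \
           (output_tokens / 1000) * price["output"]

cost = request_cost("anthropic.claude-3-5-sonnet-20240620-v1:0", 1500, 400)
print(f"${cost:.6f}")   # → $0.010500
```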
Scenarios
Think through each scenario before revealing the answer.
GenAI App Cost Reduction
- Which caching method handles 'similar but not identical' queries?
- How does semantic caching work technically?
- What similarity threshold is appropriate?
- What additional optimization can you add for remaining queries?
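One answer to the last question is model cascading: send remaining (cache-miss) queries to the cheapest model first and escalate only when its answer fails a quality check. A minimal sketch, with lambda stubs standing in for real Bedrock invocations and an assumed quality predicate:

```python
def cascade(prompt, models, is_good_enough):
    """Try models cheapest-first; escalate when the quality check fails.

    `models` is a list of (name, invoke_fn) ordered cheapest to strongest;
    `is_good_enough` is any response-quality predicate (an assumption here,
    e.g. a confidence score or refusal detector in a real system).
    """
    for name, invoke in models[:-1]:
        response = invoke(prompt)
        if is_good_enough(response):
            return name, response       # cheap model sufficed
    name, invoke = models[-1]           # fall back to the strongest model
    return name, invoke(prompt)

# Stubs standing in for Bedrock calls (hypothetical names and behavior).
cheap_model  = lambda p: "I don't know"
strong_model = lambda p: "Detailed answer about " + p

model_used, answer = cascade(
    "VPC peering",
    [("haiku", cheap_model), ("sonnet", strong_model)],
    is_good_enough=lambda r: "don't know" not in r,
)
print(model_used)   # → sonnet
```

Intelligent Prompt Routing is the managed version of this pattern: Bedrock applies the routing decision before invocation instead of after a failed response.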
Practice Questions
14 questions across 3 difficulty levels.
Further Reading
Go deeper into today's topics.
- Effective Cost Optimization Strategies for Bedrock
  Comprehensive guide: model cascading, caching, batch inference, provisioned throughput, distillation.
- Prompt Caching on Bedrock — Up to 85% Cost Reduction
  Up to 85% cost reduction for repeated contexts. TTL options and pricing.
- Track, Allocate, and Manage GenAI Cost with Bedrock
  Application Inference Profiles, cost allocation tags, Cost Explorer integration.
- Batch Job Orchestration with Step Functions
  50% savings with batch inference: S3 Map state, parallel processing.
- Build a Proactive AI Cost Management System
  Automated cost monitoring and alerting for Bedrock workloads.
- Optimizing Cost for FMs with Amazon Bedrock — FinOps Blog
  FinOps perspective: pricing options, model selection for cost, KB optimization.