AIP-C01 Study Hub
Optimization Week 3 · Thursday

Day 18: Cost Optimization + Performance (Domain 4)

Learning Objectives

  • - Distinguish 3 caching layers: Prompt Caching, Semantic Caching, Exact-Match Caching
  • - Design model cascading and Intelligent Prompt Routing architectures
  • - Know batch inference (50% discount) for non-real-time workloads
  • - Implement cost monitoring with CloudWatch + Cost Explorer
  • - Understand provisioned throughput vs on-demand pricing

Tasks

Tasks

0/5 completed
  • Blog30m

    Effective Cost Optimization Strategies for Amazon Bedrock

    Comprehensive guide: model cascading, caching, batch inference, provisioned throughput.

  • Blog20m

    Prompt Caching on Bedrock - Up to 85% Cost Reduction

    5-min TTL (1.25x write, 0.1x read) or 1-hour TTL (2.0x write, 0.1x read). Min 1024 tokens.

  • Blog20m

    ElastiCache Semantic Cache - 86% Cost Reduction

    Application-level semantic caching using embeddings and cosine similarity.

  • Blog15m

    Intelligent Prompt Routing

    Auto-routes to cheapest capable model. Automated model cascading.

  • Read15m

    Model Invocation Logging

    Log all prompts/responses to CloudWatch or S3. Essential for cost analysis and debugging.

Exam Skills

Write your understanding, then reveal the reference answer.

0/16 reviewed

Hands-On Lab

Build real muscle memory with these activities.

advanced 60 min

Implement Semantic Caching with ElastiCache

Set up a semantic cache using ElastiCache to reduce Bedrock API costs for similar queries.

  1. 1 Create an ElastiCache Serverless Redis cluster with vector search enabled
  2. 2 Write a Lambda function that generates embeddings for incoming queries using Titan Embeddings
  3. 3 Before calling Bedrock, search ElastiCache for a cached response with cosine similarity > 0.95
  4. 4 If cache hit: return cached response (saving the Bedrock API call). If cache miss: call Bedrock, store the query embedding and response in ElastiCache
  5. 5 Test with 10 similar queries and measure the cache hit rate and cost savings
Open Lab
intermediate 25 min

Enable and Analyze Model Invocation Logging

Set up model invocation logging to CloudWatch to track token usage and cost per request.

  1. 1 Open Bedrock console → Settings → Model invocation logging
  2. 2 Enable logging to CloudWatch Logs and select a log group
  3. 3 Make several Bedrock API calls with different models
  4. 4 Open CloudWatch Logs Insights and run: fields @timestamp, inputTokenCount, outputTokenCount, modelId | sort @timestamp desc
  5. 5 Calculate the cost per request using the token counts and published pricing
Open Lab

Scenarios

Think through each scenario before revealing the answer.

D4: OptimizationHard
#14

GenAI App Cost Reduction

A startup's GenAI app costs $12,000/month on Bedrock. Analysis shows 60% of queries are near-duplicates (users asking the same thing in different words). How do you cut costs?
Think First
  • Which caching method handles 'similar but not identical' queries?
  • How does semantic caching work technically?
  • What similarity threshold is appropriate?
  • What additional optimization can you add for remaining queries?

Practice Questions

14 questions across 3 difficulty levels.

Further Reading

Go deeper into today's topics.