Every engineer who has operated a Lambda-based production service has encountered the cold start problem. The function responds in 12 milliseconds on the second invocation but takes 3.8 seconds on the first. Monitoring dashboards show a bimodal latency distribution — a tight cluster under 50ms and a long tail stretching past multiple seconds. That long tail is cold starts, and for latency-sensitive workloads, it determines whether Lambda is viable or not.
This post breaks down exactly what happens during a cold start, provides concrete benchmark numbers across runtimes, and walks through seven specific strategies for reducing cold start latency. We will also cover how to measure cold starts accurately and when Lambda is simply the wrong compute choice.
The Cold Start Lifecycle: What Actually Happens
A cold start occurs when AWS has no pre-warmed execution environment available for your function. Every invocation during a cold start passes through these phases sequentially:
Phase 1: Execution Environment Provisioning (~50-200ms)
The Lambda service allocates a microVM using Firecracker. This includes provisioning CPU, memory, and network interfaces according to your function's configuration. This phase is managed entirely by AWS and is outside your control. The time here scales roughly with memory allocation — a 128MB function provisions a different slice of a physical host than a 3008MB function.
Phase 2: Deployment Package Download and Extraction (~50-500ms)
Your function's deployment package (ZIP or container image) is downloaded from S3 (for ZIP) or ECR (for container images) to the execution environment. AWS caches deployment packages aggressively, but the first download in a region or after a deployment will be a full fetch. A 5MB ZIP extracts in roughly 50ms. A 250MB container image takes considerably longer — often 300-500ms even with the Sparse Filesystem optimizations AWS introduced for container images.
Phase 3: Runtime Initialization (~10-2000ms)
The Lambda service starts the runtime process — the Node.js V8 engine, the Python interpreter, the JVM, or whatever runtime your function targets. This is where runtime choice has its largest impact. Starting a JVM with default settings takes 1-3 seconds. Starting the Go runtime takes under 10ms. This phase also includes the Lambda Runtime API bootstrap, where the runtime registers with the Lambda service and signals readiness to receive events.
Phase 4: Handler Module Initialization (~10-5000ms)
Your code outside the handler function runs: top-level imports, global variable initialization, database connection establishment, SDK client creation. This is the phase you have the most control over. If your Python function does import pandas at the top level, that import alone adds 200-400ms. If your Java function initializes a Spring context, you can add 3-8 seconds here.
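To make the phase boundary concrete, here is a minimal Python sketch (the client dict is a stand-in for any expensive import or SDK client construction): everything at module scope runs once per execution environment during Phase 4, while the handler body runs on every invocation.

```python
import time

INIT_COUNT = 0  # counts how many times the init phase actually ran

def _create_client():
    """Stand-in for an expensive top-level import or SDK client build."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"created_at": time.time()}

# Phase 4: runs once per execution environment, during cold start only
client = _create_client()

def handler(event, context):
    # Phase 5: warm invocations reuse the module-level client
    return {"init_runs": INIT_COUNT, "has_client": client is not None}
```

Invoking the handler repeatedly in the same environment leaves `init_runs` at 1; a cold start is precisely the case where the module top level runs again.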
Phase 5: First Invocation
The handler function executes with the actual event payload. On warm invocations, only this phase runs.
The total cold start duration is the sum of all five phases. For a well-optimized Node.js function, that might be 200ms. For a poorly configured Java function in a VPC, it could exceed 10 seconds.
Cold Start Duration by Runtime: Measured Benchmarks
These numbers reflect a minimal (hello-world equivalent) function with 1024MB of memory in us-east-1, measured across 1,000+ cold invocations per runtime. Real-world functions will be slower due to handler initialization:
| Runtime | p50 Cold Start | p99 Cold Start | Notes |
|---|---|---|---|
| Node.js 20 | ~180ms | ~300ms | V8 isolate startup is well-optimized |
| Python 3.12 | ~170ms | ~250ms | CPython interpreter starts fast |
| Java 21 (Corretto) | ~3.2s | ~5.8s | JVM class loading dominates |
| Java 21 + SnapStart | ~200ms | ~400ms | Firecracker snapshot restore |
| .NET 8 (AOT) | ~250ms | ~450ms | NativeAOT significantly improved this |
| .NET 8 (JIT) | ~600ms | ~800ms | CLR JIT compilation overhead |
| Go (provided.al2023) | ~90ms | ~150ms | Statically compiled, minimal runtime |
| Rust (custom runtime) | ~12ms | ~45ms | No runtime overhead, just binary startup |
The takeaway: if cold start latency is a primary concern, runtime selection is your highest-impact decision. Moving from Java (without SnapStart) to Node.js or Python eliminates 2-5 seconds of cold start time immediately.
Factors That Affect Cold Start Duration
Package Size
There is a direct, measurable correlation between deployment package size and cold start duration. The relationship is roughly linear up to about 50MB, then flattens slightly as S3 download parallelism and caching improve throughput for larger packages.
Measured on Node.js 20 with 1024MB memory:
| Package Size | Cold Start (p50) |
|---|---|
| 1MB | ~180ms |
| 10MB | ~220ms |
| 50MB | ~350ms |
| 100MB | ~500ms |
| 250MB (container) | ~700ms |
Every megabyte matters. Removing a 40MB unused dependency is not premature optimization — it is a direct latency improvement.
Memory Allocation
Lambda allocates CPU proportionally to memory. At 128MB, your function gets a fraction of a vCPU. At 1769MB, you get exactly one full vCPU. At 3008MB and above, you get multiple vCPU cores.
More CPU means faster runtime initialization and faster handler code execution during cold start. It is common to see a function's cold start drop by 40-60% when moving from 128MB to 1024MB, even if the function does not need that much memory. The cost-per-invocation increases, but the total billed duration often decreases because the function initializes and executes faster.
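As a back-of-the-envelope sketch of that scaling rule (the linear model and the 1769MB-per-vCPU anchor come from the paragraph above; the actual allocation mechanism is internal to AWS):

```python
def approx_vcpus(memory_mb: int) -> float:
    """Approximate vCPU share implied by the linear rule above:
    1769MB of memory corresponds to exactly one full vCPU."""
    return memory_mb / 1769

for mb in (128, 512, 1024, 1769, 3008):
    print(f"{mb:>5} MB -> ~{approx_vcpus(mb):.2f} vCPU")
```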
Run this to profile across memory configurations:
# Install the Lambda Power Tuning tool (the repo deploys with SAM;
# it is also available one-click via the Serverless Application Repository)
git clone https://github.com/alexcasalboni/aws-lambda-power-tuning.git
cd aws-lambda-power-tuning
sam deploy \
  --guided \
  --stack-name lambda-power-tuning \
  --capabilities CAPABILITY_IAM
# Run the power tuning state machine
aws stepfunctions start-execution \
--state-machine-arn arn:aws:states:us-east-1:123456789:stateMachine:powerTuningStateMachine \
--input '{
"lambdaARN": "arn:aws:lambda:us-east-1:123456789:function:my-func",
"powerValues": [128, 256, 512, 1024, 1769, 3008],
"num": 50,
"payload": "{}",
"parallelInvocation": true,
"strategy": "cost"
}'
VPC Attachment
Historically, attaching a Lambda function to a VPC added 10-14 seconds to cold starts because AWS provisioned a dedicated Elastic Network Interface (ENI) for each execution environment on every cold start.
In late 2019, AWS introduced Hyperplane ENIs — shared network interfaces provisioned asynchronously during function creation or update rather than during cold start. Today, VPC-attached cold starts add roughly 0-200ms of additional latency compared to non-VPC functions. If you are still seeing multi-second VPC cold starts, the ENI is almost certainly not the cause — look instead at init-phase code making network calls the VPC cannot route (a missing NAT gateway or VPC endpoint causes SDK calls to hang until they time out).
# Check when your function was last updated (older configs may not have Hyperplane)
aws lambda get-function-configuration \
--function-name my-function \
--query '{Runtime: Runtime, LastModified: LastModified, VpcConfig: VpcConfig}'
Strategy 1: Provisioned Concurrency
Provisioned Concurrency keeps a specified number of execution environments initialized and ready to serve requests with zero cold start latency. AWS runs your function's initialization code proactively and keeps the result cached.
How it works: When you configure provisioned concurrency of N, AWS maintains N initialized execution environments. Incoming invocations route to these pre-warmed environments. If all N are busy, additional invocations experience regular cold starts.
Cost model:
Provisioned concurrency charges two components:
- Provisioned concurrency — $0.0000041667 per GB-second (~$0.015 per GB-hour)
- Provisioned invocation duration — $0.0000097222 per GB-second (~42% lower than the on-demand duration rate of $0.0000166667)
For a 1024MB function with 10 provisioned instances running 15 hours/day (business hours):
Provisioned cost: 10 instances × 1 GB × 54,000 seconds × $0.0000041667 = $2.25/day
Monthly: $2.25 × 30 = $67.50/month
Compare this against the cost of cold-start-induced retries, timeout failures, and SLA breaches. For a webhook processor handling payment events, a single missed webhook can trigger manual reconciliation that costs far more than $67.50.
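The arithmetic above generalizes to a small helper — a sketch using the us-east-1 provisioned-concurrency rate from the cost model (rates vary by region and architecture):

```python
PROVISIONED_RATE = 0.0000041667  # $/GB-second, us-east-1 (from the cost model above)

def provisioned_cost(instances: int, memory_gb: float,
                     hours_per_day: float, days: int = 30) -> float:
    """Provisioned-concurrency keep-warm cost, excluding invocation duration."""
    return instances * memory_gb * hours_per_day * 3600 * PROVISIONED_RATE * days

# 10 instances x 1GB, kept warm 15h/day for a 30-day month
print(round(provisioned_cost(10, 1.0, 15), 2))  # -> 67.5
```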
Auto-scaling provisioned concurrency:
Static provisioned concurrency is wasteful — you do not need 10 warm instances at 3am. Use Application Auto Scaling to tie provisioned concurrency to actual utilization:
# Register the function as a scalable target
aws application-autoscaling register-scalable-target \
--service-namespace lambda \
--resource-id function:webhook-processor:live \
--scalable-dimension lambda:function:ProvisionedConcurrency \
--min-capacity 2 \
--max-capacity 50
# Create a target tracking policy
aws application-autoscaling put-scaling-policy \
--service-namespace lambda \
--resource-id function:webhook-processor:live \
--scalable-dimension lambda:function:ProvisionedConcurrency \
--policy-name webhook-utilization-tracking \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 0.7,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
},
"ScaleInCooldown": 300,
"ScaleOutCooldown": 60
}'
This scales provisioned instances to maintain 70% utilization — enough headroom to absorb traffic spikes without cold starts, but not so much that you are paying for idle capacity.
Strategy 2: SnapStart for Java
SnapStart addresses Java's fundamental cold start problem — the JVM needs time to load classes, verify bytecode, and JIT-compile hot paths. SnapStart takes a Firecracker microVM snapshot after the initialization phase completes and restores from that snapshot on subsequent cold starts.
How it works:
- You publish a new function version with SnapStart enabled
- AWS invokes your function's init phase and waits for it to complete
- Firecracker takes a memory snapshot of the entire microVM state
- On cold start, AWS restores the snapshot instead of re-running init
- Cold start drops from 3-6 seconds to 200-400ms
What cannot be snapshotted:
SnapStart restores from a frozen point in time. Anything that depends on runtime state will break:
- Open network connections — TCP connections will be stale after restore. Use connection pools that validate before use.
- Random number generators — a `java.security.SecureRandom` initialized before the snapshot will produce identical sequences across restored instances. Use a CRaC restore hook to re-seed.
- Unique identifiers — UUIDs generated during init will be duplicated. Generate them inside the handler.
- Cached timestamps — a `System.currentTimeMillis()` value cached during init will be stale.
// Register a restore hook to re-initialize state after snapshot restore
import java.security.SecureRandom;

import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

public class Handler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent>, Resource {

    private SecureRandom secureRandom;

    public Handler() {
        // Runs during the snapshotted init phase
        Core.getGlobalContext().register(this);
        this.secureRandom = new SecureRandom();
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) {
        // Resource requires both hooks; nothing to release before the snapshot here
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) {
        // Re-seed after snapshot restore so restored instances diverge
        this.secureRandom = new SecureRandom();
    }

    // handleRequest(...) omitted for brevity
}
Enable SnapStart in your SAM template:
MyJavaFunction:
Type: AWS::Serverless::Function
Properties:
Runtime: java21
SnapStart:
ApplyOn: PublishedVersions
AutoPublishAlias: live
Strategy 3: Minimize Package Size
Every byte in your deployment package has a cost measured in cold start milliseconds. Aggressive package optimization is the single most effective strategy for runtimes with fast interpreters (Node.js, Python).
Tree shaking with esbuild (Node.js):
// esbuild.config.mjs
import { build } from 'esbuild';
await build({
entryPoints: ['src/handler.ts'],
bundle: true,
minify: true,
sourcemap: true,
platform: 'node',
target: 'node20',
outfile: 'dist/handler.js',
external: [
'@aws-sdk/*', // Available in Lambda runtime, no need to bundle
],
treeShaking: true,
metafile: true,
});
Results we have measured:
| Approach | Package Size |
|---|---|
| npm install (all deps) | 180MB |
| Production deps only (--omit=dev) | 94MB |
| esbuild bundle (all deps) | 8.2MB |
| esbuild bundle (external AWS SDK) | 2.1MB |
| esbuild bundle + minify | 1.4MB |
Python: avoid large dependencies at the top level:
# Check what is actually large in your package
du -sh .venv/lib/python3.12/site-packages/* | sort -rh | head -20
# Common offenders:
# 60MB boto3/botocore (already in Lambda runtime — do not package)
# 45MB pandas
# 30MB numpy
# 22MB cryptography
For Python, use Lambda layers to share large dependencies across functions and keep your deployment package to application code only:
# Create a layer with shared dependencies
mkdir -p layer/python
pip install -r requirements-layer.txt -t layer/python
cd layer && zip -r ../dependencies-layer.zip python/
aws lambda publish-layer-version \
--layer-name shared-dependencies \
--zip-file fileb://dependencies-layer.zip \
--compatible-runtimes python3.12
Strategy 4: ARM64 (Graviton)
AWS Graviton processors are available for Lambda and provide measurable improvements in both cold start latency and cost.
Cold start benchmarks (Node.js 20, 1024MB, hello-world):
| Architecture | Cold Start (p50) | Cost per ms |
|---|---|---|
| x86_64 | ~180ms | $0.0000166667/GB-s |
| arm64 | ~160ms | $0.0000133334/GB-s |
ARM64 provides approximately 10-15% faster cold starts and 20% lower cost. The cold start improvement comes from Graviton's higher single-threaded performance per watt, which benefits the sequential initialization process.
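The pricing delta can be sanity-checked directly — a sketch using the per-GB-second rates from the table above (the invocation count and average duration are illustrative):

```python
X86_RATE = 0.0000166667  # $/GB-second, on-demand duration, x86_64
ARM_RATE = 0.0000133334  # $/GB-second, on-demand duration, arm64

def monthly_duration_cost(rate: float, invocations: int,
                          avg_ms: float, memory_gb: float) -> float:
    """On-demand duration cost for a month of invocations."""
    return rate * memory_gb * (avg_ms / 1000) * invocations

# 10M invocations/month at 120ms average on a 1GB function
x86 = monthly_duration_cost(X86_RATE, 10_000_000, 120, 1.0)
arm = monthly_duration_cost(ARM_RATE, 10_000_000, 120, 1.0)
print(round(x86, 2), round(arm, 2))       # -> 20.0 16.0
print(round(1 - ARM_RATE / X86_RATE, 2))  # -> 0.2 (the ~20% saving)
```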
Switching to ARM64:
# SAM template
MyFunction:
Type: AWS::Serverless::Function
Properties:
Architectures:
- arm64
Runtime: nodejs20.x
# CLI
aws lambda update-function-code \
  --function-name my-function \
  --zip-file fileb://dist/function.zip \
  --architectures arm64
Compatibility check: If your function uses native Node.js addons or Python C extensions, ensure they are compiled for ARM64. Pure JavaScript/Python functions require zero code changes.
Strategy 5: Connection Reuse
Establishing new TCP and TLS connections inside the handler adds 50-150ms per connection on every invocation (cold or warm). Move connection initialization outside the handler so connections persist across warm invocations, and configure HTTP keep-alive to reuse connections.
Node.js — reuse HTTP connections:
import { DynamoDBClient, GetItemCommand } from '@aws-sdk/client-dynamodb';
import { NodeHttpHandler } from '@smithy/node-http-handler';
import { Agent } from 'node:https';
// Initialized once, reused across invocations
const agent = new Agent({
keepAlive: true,
maxSockets: 50,
timeout: 5000,
});
const dynamodb = new DynamoDBClient({
requestHandler: new NodeHttpHandler({
httpsAgent: agent,
}),
});
export const handler = async (event) => {
// dynamodb client reuses connections from the persistent agent
const result = await dynamodb.send(new GetItemCommand({
TableName: 'webhooks',
Key: { id: { S: event.webhookId } },
}));
return result.Item;
};
Python — reuse database connections:
import psycopg2
import os
# Connection established once, outside the handler
conn = None
def get_connection():
global conn
if conn is None or conn.closed:
conn = psycopg2.connect(
host=os.environ['DB_HOST'],
port=os.environ.get('DB_PORT', 5432),
dbname=os.environ['DB_NAME'],
user=os.environ['DB_USER'],
password=os.environ['DB_PASSWORD'],
connect_timeout=5,
keepalives=1,
keepalives_idle=30,
keepalives_interval=10,
keepalives_count=5,
)
return conn
def handler(event, context):
db = get_connection()
with db.cursor() as cur:
cur.execute(
"INSERT INTO webhook_events (id, payload, received_at) VALUES (%s, %s, NOW())",
(event['id'], event['body'])
)
db.commit()
return {'statusCode': 200}
Strategy 6: Lazy Initialization
Not every invocation needs every dependency. Defer expensive initialization until it is actually required. This reduces cold start duration for the common path while accepting a one-time cost on the first invocation that hits the deferred path.
Node.js — lazy require:
let pdfGenerator;

async function getPdfGenerator() {
  if (!pdfGenerator) {
    // puppeteer-core is ~45MB — only load it when actually generating PDFs.
    // Dynamic import() works in ESM, where require() is unavailable.
    pdfGenerator = await import('puppeteer-core');
  }
  return pdfGenerator;
}

export const handler = async (event) => {
  if (event.action === 'generate-pdf') {
    const puppeteer = await getPdfGenerator();
    // ... generate PDF
  } else {
    // Common path: no puppeteer overhead
    return processWebhook(event);
  }
};
Python — deferred import:
def handler(event, context):
if event.get('report_type') == 'analytics':
# pandas adds ~400ms to cold start — only import when needed
import pandas as pd
df = pd.read_json(event['data'])
return generate_report(df)
# Fast path: no pandas overhead
return process_event(event)
This pattern is particularly effective when a function handles multiple event types through a router pattern. The cold start cost reflects only the imports needed for the specific event type being processed.
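A minimal Python sketch of that router pattern (the `statistics` import stands in for a genuinely heavy dependency like pandas; the event shapes are illustrative):

```python
def handle_analytics(event):
    # Heavy dependency imported only on this path (statistics stands in
    # for something genuinely heavy like pandas)
    import statistics
    return {"type": "analytics", "mean": statistics.mean(event["values"])}

def handle_webhook(event):
    # Fast path: no heavy imports at all
    return {"type": "webhook", "id": event.get("id")}

ROUTES = {
    "analytics": handle_analytics,
    "webhook": handle_webhook,
}

def handler(event, context=None):
    # Unknown event types fall through to the cheap path
    return ROUTES.get(event.get("type"), handle_webhook)(event)
```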
Strategy 7: Keep-Warm Pings
A scheduled EventBridge rule (formerly CloudWatch Events) that invokes your function every 5-15 minutes prevents the execution environment from being reclaimed.
# SAM template
WarmingRule:
Type: AWS::Events::Rule
Properties:
ScheduleExpression: rate(5 minutes)
Targets:
- Arn: !GetAtt MyFunction.Arn
Id: warming-target
Input: '{"source": "warming"}'
WarmingPermission:
Type: AWS::Lambda::Permission
Properties:
FunctionName: !Ref MyFunction
Action: lambda:InvokeFunction
Principal: events.amazonaws.com
SourceArn: !GetAtt WarmingRule.Arn
Add a short-circuit in your handler:
export const handler = async (event) => {
if (event.source === 'warming') {
return { statusCode: 200, body: 'warm' };
}
// Actual business logic
};
Why this is a last resort:
- It keeps exactly one execution environment warm. Concurrent requests still trigger cold starts.
- It does not scale. Keeping N instances warm requires N concurrent warming invocations, which is fragile and expensive.
- AWS can still reclaim your environment between pings.
- Provisioned concurrency solves this problem properly with guarantees.
Use keep-warm pings only for low-traffic functions where provisioned concurrency is not cost-justified.
Measuring Cold Starts
You cannot optimize what you cannot measure. Lambda does not expose a direct "cold start" metric, but you can derive it from existing telemetry.
CloudWatch Logs Insights
Lambda annotates cold start invocations with Init Duration in the REPORT log line. Query it directly:
# Cold start frequency and duration over the last 24 hours
filter @type = "REPORT"
| fields @timestamp, @duration, @initDuration, @memorySize, @maxMemoryUsed
| filter ispresent(@initDuration)
| stats
count(*) as coldStarts,
avg(@initDuration) as avgColdStart,
pct(@initDuration, 50) as p50ColdStart,
pct(@initDuration, 95) as p95ColdStart,
pct(@initDuration, 99) as p99ColdStart,
max(@initDuration) as maxColdStart
by bin(1h)
| sort @timestamp desc
# Cold start percentage relative to total invocations
filter @type = "REPORT"
| stats
count(*) as totalInvocations,
sum(ispresent(@initDuration)) as coldStarts,
sum(ispresent(@initDuration)) / count(*) * 100 as coldStartPct,
avg(@initDuration) as avgColdStartMs
by bin(1d)
X-Ray Tracing
Enable active tracing to see cold start phases in the X-Ray service map:
aws lambda update-function-configuration \
--function-name my-function \
--tracing-config Mode=Active
X-Ray breaks the invocation into subsegments: Initialization, Invocation, and Overhead. The Initialization subsegment appears only on cold starts and shows exactly how long your function's init code took.
Custom Metrics with Embedded Metric Format
For dashboards and alarms, emit a custom metric on cold starts:
let isColdStart = true; // Module scope — must be let, since the handler flips it to false
export const handler = async (event) => {
if (isColdStart) {
console.log(JSON.stringify({
_aws: {
Timestamp: Date.now(),
CloudWatchMetrics: [{
Namespace: 'WebhookProcessor',
Dimensions: [['FunctionName']],
Metrics: [{ Name: 'ColdStart', Unit: 'Count' }],
}],
},
FunctionName: process.env.AWS_LAMBDA_FUNCTION_NAME,
ColdStart: 1,
}));
isColdStart = false; // Module-scoped, persists across warm invocations
}
// handler logic
};
This emits a CloudWatch metric without needing the CloudWatch SDK or a PutMetricData API call — CloudWatch Logs automatically extracts metrics from the embedded metric format.
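If you prefer Python, the same embedded-metric-format document can be assembled by hand — a sketch mirroring the Node example above (the namespace and dimension names are illustrative):

```python
import json
import time

def emf_cold_start_record(function_name: str) -> str:
    """Build an embedded-metric-format log line; printing it to stdout is
    enough for CloudWatch Logs to extract a ColdStart metric."""
    return json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "WebhookProcessor",
                "Dimensions": [["FunctionName"]],
                "Metrics": [{"Name": "ColdStart", "Unit": "Count"}],
            }],
        },
        "FunctionName": function_name,
        "ColdStart": 1,
    })
```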
When Lambda Is the Wrong Choice
Cold start mitigation has limits. If your requirements include any of the following, Lambda is architecturally mismatched:
Consistent sub-50ms latency (p99): Even with provisioned concurrency, Lambda adds invocation overhead (routing, security context creation) that makes guaranteed sub-50ms p99 difficult. Use containers on ECS/EKS or EC2 with connection pooling.
Long-running processes (>15 minutes): Lambda has a hard 15-minute execution timeout. For batch jobs, ETL pipelines, or media processing that exceeds this, use AWS Batch, ECS tasks, or Step Functions with activity workers.
Heavy sustained compute: Lambda charges per millisecond of compute. If your function runs continuously at high concurrency, a reserved EC2 instance or Fargate service will be significantly cheaper. The crossover point is roughly 20-30% sustained utilization — above that, always-on compute is more economical.
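A rough break-even sketch of that crossover (the always-on monthly price is an illustrative placeholder — substitute your actual EC2 or Fargate cost; request charges and per-vCPU pricing differences are ignored):

```python
LAMBDA_GB_S = 0.0000166667  # on-demand duration rate, $/GB-second (x86_64)

def breakeven_utilization(always_on_monthly: float, memory_gb: float) -> float:
    """Sustained utilization at which Lambda duration cost equals an
    always-on service with the same memory footprint."""
    full_month = LAMBDA_GB_S * memory_gb * 30 * 24 * 3600  # cost if 100% busy
    return always_on_monthly / full_month

# e.g. a hypothetical $25/month always-on task with 2GB of memory
print(round(breakeven_utilization(25.0, 2.0), 2))  # -> 0.29
```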
High-throughput streaming: Processing millions of events per second with strict ordering requirements is better served by Kinesis consumers on ECS or Kafka consumers on MSK.
Case Study: Webhook Processing Service
A payment processing integration service handled incoming webhooks from Stripe and Razorpay. The service validated webhook signatures, parsed the payload, updated order status in DynamoDB, and enqueued downstream processing in SQS.
The Problem
Monitoring showed that 5-8% of webhook invocations were cold starts. The function was deployed as a Java 11 application with the full AWS SDK, Jackson, and several internal libraries — a 180MB deployment package running on x86_64 with 512MB memory.
Cold start latency distribution:
- p50: 3.8 seconds
- p95: 4.9 seconds
- p99: 6.2 seconds
Payment providers typically enforce a 5-second timeout on webhook delivery. Webhooks that exceeded this timeout were retried, creating duplicate processing events that required manual reconciliation. The team at Stripe Systems identified cold starts as the root cause through the following CloudWatch Insights query:
filter @type = "REPORT"
| filter ispresent(@initDuration)
| stats
count(*) as coldStarts,
pct(@initDuration, 50) as p50,
pct(@initDuration, 99) as p99,
pct(@duration + @initDuration, 99) as p99Total
by bin(1h)
| sort @timestamp desc
| limit 48
X-Ray traces for cold start invocations showed this breakdown:
- `Initialization` subsegment: 3,200ms (JVM startup + class loading + Spring context)
- DynamoDB connection establishment: 380ms
- Webhook signature verification: 45ms
- DynamoDB write: 28ms
- SQS enqueue: 22ms
The initialization subsegment alone consumed 86% of the total cold start duration.
The Fix
The team applied three changes over two weeks:
1. Runtime migration: Java 11 → Node.js 20
The webhook processor logic was straightforward — signature verification, JSON parsing, DynamoDB writes, SQS publishes. There was no complex business logic that justified the JVM's overhead. The team rewrote the handler in TypeScript (280 lines) and bundled it with esbuild.
2. Package optimization: 180MB → 1.4MB
The esbuild configuration eliminated dead code and externalized the AWS SDK (available in the Lambda runtime):
// esbuild.config.mjs
import { build } from 'esbuild';
const result = await build({
entryPoints: ['src/handler.ts'],
bundle: true,
minify: true,
sourcemap: true,
platform: 'node',
target: 'node20',
outfile: 'dist/handler.js',
external: [
'@aws-sdk/client-dynamodb',
'@aws-sdk/client-sqs',
'@aws-sdk/lib-dynamodb',
],
treeShaking: true,
metafile: true,
});
// Print bundle analysis
const text = await import('esbuild').then(e =>
e.analyzeMetafile(result.metafile)
);
console.log(text);
Bundle analysis output:
dist/handler.js 1.4MB
├ node_modules/crypto-js/core.js 82.3KB
├ src/webhook-validator.ts 4.2KB
├ src/handler.ts 3.1KB
├ src/dynamodb-repository.ts 2.8KB
└ src/sqs-publisher.ts 1.9KB
The function went from a 180MB Java ZIP (full AWS SDK v1, Jackson, Spring Boot, unused transitive dependencies) to a 1.4MB minified JavaScript bundle.
3. Provisioned concurrency with auto-scaling
The SAM template configured provisioned concurrency during business hours when webhook volume was highest:
WebhookProcessor:
Type: AWS::Serverless::Function
Properties:
FunctionName: webhook-processor
Handler: handler.handler
Runtime: nodejs20.x
Architectures:
- arm64
MemorySize: 1024
Timeout: 10
CodeUri: dist/
AutoPublishAlias: live
ProvisionedConcurrencyConfig:
ProvisionedConcurrentExecutions: 10
Environment:
Variables:
WEBHOOKS_TABLE: !Ref WebhooksTable
PROCESSING_QUEUE_URL: !Ref ProcessingQueue
# Auto-scaling for provisioned concurrency
ScalingTarget:
Type: AWS::ApplicationAutoScaling::ScalableTarget
Properties:
MaxCapacity: 50
MinCapacity: 2
ResourceId: !Sub function:webhook-processor:live
ScalableDimension: lambda:function:ProvisionedConcurrency
ServiceNamespace: lambda
ScheduledActions:
- ScheduledActionName: scale-up-business-hours
Schedule: cron(30 2 ? * MON-SAT *) # 8:00 AM IST
ScalableTargetAction:
MinCapacity: 10
MaxCapacity: 50
- ScheduledActionName: scale-down-night
Schedule: cron(30 17 ? * * *) # 11:00 PM IST
ScalableTargetAction:
MinCapacity: 2
MaxCapacity: 10
ScalingPolicy:
Type: AWS::ApplicationAutoScaling::ScalingPolicy
Properties:
PolicyName: webhook-utilization-tracking
PolicyType: TargetTrackingScaling
ScalingTargetId: !Ref ScalingTarget
TargetTrackingScalingPolicyConfiguration:
TargetValue: 0.7
PredefinedMetricSpecification:
PredefinedMetricType: LambdaProvisionedConcurrencyUtilization
ScaleInCooldown: 300
ScaleOutCooldown: 60
Results
After deploying all three changes, cold start metrics dropped dramatically:
| Metric | Before | After |
|---|---|---|
| Cold start p50 | 3,800ms | 180ms |
| Cold start p99 | 6,200ms | 380ms |
| Cold start percentage | 5-8% | 0.3% (non-provisioned overflow only) |
| Package size | 180MB | 1.4MB |
| Timeout-induced retries | ~120/day | 0-2/day |
| Monthly compute cost | $340 | $185 (compute) + $68 (provisioned) = $253 |
The net cost decreased by $87/month while eliminating webhook timeout failures. The previous $340 did not account for the engineering time spent on manual reconciliation of duplicate webhook events — roughly 3-4 hours per week of developer time investigating and resolving payment state inconsistencies.
X-Ray traces after the migration showed:
- `Initialization` subsegment: 160ms (Node.js runtime + handler init)
- DynamoDB operations: 18ms (connection reuse via keep-alive agent)
- Webhook verification: 12ms
- SQS publish: 8ms
- Total cold start invocation: ~200ms
The Stripe Systems engineering team now uses this same pattern — esbuild bundling, ARM64, provisioned concurrency with scheduled scaling — as a baseline template for all new Lambda-based services.
Conclusion
Lambda cold starts are a deterministic engineering problem, not a mystery. The initialization lifecycle is well-documented, the contributing factors are measurable, and the mitigation strategies have predictable outcomes. The decision tree is straightforward:
- Measure first. Use CloudWatch Insights to quantify your cold start frequency and duration before optimizing.
- Pick the right runtime. If cold starts matter, avoid JVM-based runtimes unless you can use SnapStart.
- Minimize your package. Bundle and tree-shake aggressively. Externalize the AWS SDK.
- Use ARM64. It is faster and cheaper with no code changes for most functions.
- Use provisioned concurrency for latency-critical paths. Pair it with auto-scaling to control costs.
- If none of this gets you where you need to be, Lambda may not be the right compute model for that workload.
Every millisecond of cold start latency is the result of a specific phase doing specific work. Identify which phase dominates your cold start, apply the corresponding strategy, and measure again. The numbers do not lie.