[Cheat Sheet] 2026 Serverless Pricing & Quotas: AWS vs GCP vs Azure
Bottom Line
While AWS Lambda continues to dominate the bursty function-as-a-service market, Google Cloud Run has become the 2026 leader for concurrency-heavy workloads and AI inference due to its superior container-native scaling and resource efficiency.
Key Takeaways
- AWS Lambda Tiered Pricing: Savings of up to 20% automatically kick in after the first 6B GB-seconds per month.
- GCP Cloud Run Concurrency: Handles up to 1,000 concurrent requests per instance, significantly lowering per-request costs.
- Azure Flex Consumption: The new 2026 standard for Azure, offering sub-second scaling and integrated VNet support without a premium plan requirement.
- Ephemeral Storage Limits: All three providers now offer up to 10GB of /tmp space, but pricing varies by provider, starting around $0.0000308 per GB-hour.
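The tiered-pricing takeaway can be sketched as a quick calculator. The base rate and the 6B GB-second threshold below are this cheat sheet's figures, not official price-list values, so verify them against your actual bill:

```python
# Sketch: AWS Lambda tiered compute cost, per the takeaway above.
# BASE_RATE and TIER_THRESHOLD come from this cheat sheet's 2026 figures.
BASE_RATE = 0.00001333      # USD per GB-second (Graviton3, per the matrix)
TIER_THRESHOLD = 6e9        # GB-seconds per month before the discount
DISCOUNT = 0.20             # 20% off usage beyond the threshold

def monthly_compute_cost(gb_seconds: float) -> float:
    """Blend the base and discounted rates across the tier boundary."""
    tier1 = min(gb_seconds, TIER_THRESHOLD)
    tier2 = max(gb_seconds - TIER_THRESHOLD, 0.0)
    return tier1 * BASE_RATE + tier2 * BASE_RATE * (1 - DISCOUNT)

# 8B GB-s month: 6B billed at full rate, the remaining 2B at 80%
print(f"${monthly_compute_cost(8e9):,.2f}")
```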
As we move through 2026, the serverless landscape has evolved beyond simple event-driven functions into a sophisticated ecosystem of container-native execution and AI-optimized runtimes. Architects are no longer just choosing a provider based on language support, but on granular pricing tiers, cold-start latency mitigation, and global distribution quotas. This cheat sheet provides a high-fidelity comparison of the 'Big Three' providers, focusing on the latest 2026 pricing models and resource limitations to help you optimize your infrastructure spend.
Pricing Comparison Matrix (2026)
Understanding the unit costs is only the first step. The real complexity lies in how vCPU and Memory are coupled across different providers.
| Dimension | AWS Lambda (Graviton3) | GCP Cloud Run (Gen 2) | Azure Functions (Flex) | Winner |
|---|---|---|---|---|
| Base Compute | $0.00001333 / GB-s | $0.0000178 / vCPU-s | $0.000016 / GB-s | AWS (ARM) |
| Request Fee | $0.20 per 1M | No charge (Compute only) | $0.20 per 1M | GCP |
| Free Tier | 1M req / 400k GB-s | 180k vCPU-s / 360k GB-s | 1M req / 400k GB-s | Tied |
| Concurrency | 1 req per instance* | Up to 1000 per instance | Variable (Dynamic) | GCP |
*Lambda processes one request per execution environment; batched event sources (e.g., SQS) and async patterns are the usual workarounds.
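To see how the matrix's unit prices interact with concurrency, here is a rough back-of-envelope comparison. The workload profile (100 ms per request at 1 GB / 1 vCPU, with 100 requests sharing a Cloud Run vCPU) is hypothetical, and Cloud Run's separate memory billing is omitted for simplicity:

```python
# Sketch: per-million-request compute cost using the matrix above.
# Rates are this cheat sheet's 2026 figures, not official price lists.
REQUESTS = 1_000_000
DURATION_S = 0.1            # assumed 100 ms per request
MEMORY_GB = 1.0             # assumed 1 GB allocation

# AWS Lambda: one request per execution environment,
# billed per GB-second plus a per-request fee.
lambda_cost = REQUESTS * DURATION_S * MEMORY_GB * 0.00001333 \
    + (REQUESTS / 1e6) * 0.20

# Cloud Run: billed per vCPU-second, but many requests can share one
# instance; assume 100 concurrent requests share a single vCPU here.
CONCURRENCY = 100
cloudrun_cost = (REQUESTS * DURATION_S / CONCURRENCY) * 0.0000178

print(f"Lambda:    ${lambda_cost:.4f} per 1M requests")
print(f"Cloud Run: ${cloudrun_cost:.4f} per 1M requests")
```

The gap shrinks quickly as effective concurrency drops, which is why the workload profile matters more than the headline rates.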
Bottom Line
For high-concurrency web applications or AI inference services, Google Cloud Run is almost always more cost-effective in 2026 due to its ability to pack multiple requests into a single vCPU/Memory allocation. However, for deep integration with S3/EventBridge or high-volume background processing, AWS Lambda on Graviton remains the performance-per-dollar king.
Execution & Resource Quotas
Quotas define the architectural boundaries of your application. Pushing these limits often requires moving to provisioned capacity or dedicated instances.
- Maximum Execution Timeout:
- AWS Lambda: 15 minutes (Hard limit).
- GCP Cloud Run: 60 minutes (Configurable up to 3600s).
- Azure Functions: 10 minutes (Consumption), Unlimited (Premium/App Service).
- Memory Limits:
- AWS Lambda: 128MB to 10,240MB (Allocates vCPU proportionally).
- GCP Cloud Run: 512MB to 32GB (Can decouple vCPU from Memory).
- Azure Functions: Up to 4GB (Consumption), much higher on Flex/Premium.
- Payload Limits: Standardized across most providers at 6MB for synchronous invocations and 256KB for asynchronous events. When processing larger datasets, always use signed URLs with S3 or GCS to avoid payload bottlenecks. For security-sensitive data, mask or redact sensitive fields before logging or storing serverless payloads.
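The payload limits above suggest a simple routing check before invoking. This is a sketch; the thresholds are the limits cited in this cheat sheet rather than guaranteed values for every provider:

```python
# Sketch: route payloads around the invocation limits cited above.
# The 6 MB sync / 256 KB async limits come from this cheat sheet.
SYNC_LIMIT = 6 * 1024 * 1024      # 6 MB synchronous payload limit
ASYNC_LIMIT = 256 * 1024          # 256 KB asynchronous event limit

def choose_transport(payload: bytes, synchronous: bool) -> str:
    """Return how this payload should reach the function."""
    limit = SYNC_LIMIT if synchronous else ASYNC_LIMIT
    if len(payload) <= limit:
        return "inline"
    # Too large: stage the object in S3/GCS and pass a signed URL instead.
    return "signed-url"

print(choose_transport(b"x" * 1024, synchronous=False))          # inline
print(choose_transport(b"x" * (300 * 1024), synchronous=False))  # signed-url
```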
CLI Quick Reference
Commands grouped by common management tasks for the 2026 toolsets.
Resource Inspection
```shell
# AWS: check account-level concurrency limits (quotas are per-region)
aws lambda get-account-settings --region us-east-1

# GCP: list Cloud Run services with their resource limits
gcloud run services list \
  --format="table(name,status.address,spec.template.spec.containers[0].resources.limits)"

# Azure: show a function app's configuration
az functionapp config show --name my-app --resource-group my-group
```
Cost-Optimized Deployment Flags
```shell
# Deploy AWS Lambda on ARM64 (Graviton3) for 20% better price/perf
# (the role ARN, handler, and zip path below are placeholders)
aws lambda create-function --function-name my-func \
  --architectures arm64 --runtime nodejs20.x \
  --role arn:aws:iam::123456789012:role/my-lambda-role \
  --handler index.handler --zip-file fileb://function.zip

# Set GCP Cloud Run concurrency to maximize resource density
gcloud run deploy my-service --image gcr.io/my-project/my-service \
  --concurrency 80 --cpu-boost
```
Advanced Scaling Config
In 2026, the 'Cold Start' problem is largely solved by three specific features you should be configuring in your serverless.yml or terraform files:
- Provisioned Concurrency (AWS): Keeps a specified number of functions 'warm'. Use Scheduled Scaling to turn this down at night.
- Startup CPU Boost (GCP): Temporarily allocates more CPU during container startup to decrease initialization time by up to 50%.
- Always-on Instances (Azure): Part of the Premium/Flex plans to ensure zero latency for mission-critical endpoints.
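The Provisioned Concurrency tip above can be automated with Application Auto Scaling scheduled actions. The sketch below only builds the request dictionary (the function name and alias are placeholders); you would pass it to boto3's `put_scheduled_action` on the `application-autoscaling` client:

```python
# Sketch: nightly scale-down of AWS Provisioned Concurrency, as described
# above. Sending the request would look like:
#   boto3.client("application-autoscaling").put_scheduled_action(**action)
def nightly_scale_down(function: str, alias: str, night_min: int = 0) -> dict:
    """Build a scheduled action that drops provisioned concurrency at 23:00 UTC."""
    return {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "ScheduledActionName": f"{function}-night-scale-down",
        "Schedule": "cron(0 23 * * ? *)",
        "ScalableTargetAction": {
            "MinCapacity": night_min,
            "MaxCapacity": night_min,
        },
    }

action = nightly_scale_down("my-func", "live")
print(action["ResourceId"])
```

A mirror-image action in the morning restores daytime capacity, so you only pay for warm instances when traffic justifies them.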
Cost Optimization Strategies
- Right-sizing Memory: Use tools like AWS Lambda Power Tuning to find the 'Goldilocks' zone. Doubling memory often halves execution time, resulting in the same cost but better performance.
- Log Retention: By default, CloudWatch log groups never expire (Cloud Logging defaults to 30-day retention). Set a 7-day retention policy to avoid 'hidden' storage costs that can exceed execution costs.
- VPC Cold Starts: Ensure you are using the latest Hyperplane shared-ENI (AWS) or Direct VPC egress (GCP) configurations to avoid the 10-second networking penalty of older architectures.
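The right-sizing claim (doubling memory can halve duration at the same cost) follows directly from GB-second billing. A minimal model, assuming an idealized CPU-bound function whose vCPU share scales with memory:

```python
# Sketch: why doubling memory can cost the same, per the right-sizing
# tip above. Assumes a CPU-bound function whose duration halves when
# memory (and thus vCPU share) doubles -- an idealization, not a rule.
RATE = 0.00001333  # USD per GB-second (this cheat sheet's Graviton3 rate)

def invocation_cost(memory_gb: float, duration_s: float) -> float:
    """GB-second billing: memory allocation times wall-clock duration."""
    return memory_gb * duration_s * RATE

slow = invocation_cost(memory_gb=0.5, duration_s=2.0)  # 512 MB, 2 s
fast = invocation_cost(memory_gb=1.0, duration_s=1.0)  # 1 GB, 1 s
print(slow == fast)  # same cost, half the latency
```

In practice duration rarely scales perfectly, which is why measuring with a power-tuning tool beats guessing.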