Cloud Infrastructure

[Cheat Sheet] 2026 Serverless Pricing & Quotas: AWS vs GCP vs Azure

Dillip Chowdary
Tech Entrepreneur & Innovator · April 30, 2026 · 12 min read

Bottom Line

While AWS Lambda continues to dominate the bursty function-as-a-service market, Google Cloud Run has become the 2026 leader for concurrency-heavy workloads and AI inference due to its superior container-native scaling and resource efficiency.

Key Takeaways

  • AWS Lambda Tiered Pricing: Tiered discounts begin after the first 6B GB-seconds per month, reaching up to 20% at the highest usage tier.
  • GCP Cloud Run Concurrency: Handles up to 1,000 concurrent requests per instance, significantly lowering per-request costs.
  • Azure Flex Consumption: The new 2026 standard for Azure, offering sub-second scaling and integrated VNet support without a premium plan requirement.
  • Ephemeral Storage Limits: All three providers now offer up to 10GB of /tmp space, but per-GB pricing varies by provider (roughly $0.0000308 per GB-hour at the low end).

As we move through 2026, the serverless landscape has evolved beyond simple event-driven functions into a sophisticated ecosystem of container-native execution and AI-optimized runtimes. Architects are no longer just choosing a provider based on language support, but on granular pricing tiers, cold-start latency mitigation, and global distribution quotas. This cheat sheet provides a high-fidelity comparison of the 'Big Three' providers, focusing on the latest 2026 pricing models and resource limitations to help you optimize your infrastructure spend.

Pricing Comparison Matrix (2026)

Understanding the unit costs is only the first step. The real complexity lies in how vCPU and Memory are coupled across different providers.

| Dimension | AWS Lambda (Graviton3) | GCP Cloud Run (Gen 2) | Azure Functions (Flex) | Edge |
|---|---|---|---|---|
| Base Compute | $0.00001333 / GB-s | $0.0000178 / vCPU-s | $0.000016 / GB-s | AWS (ARM) |
| Request Fee | $0.20 per 1M | No charge (compute only) | $0.20 per 1M | GCP |
| Free Tier | 1M req / 400k GB-s | 180k vCPU-s / 360k GB-s | 1M req / 400k GB-s | Tied |
| Concurrency | 1 req per instance* | Up to 1,000 per instance | Variable (dynamic) | GCP |

*By design, Lambda processes one request per execution environment; higher throughput comes from scaling out more instances, not from in-instance concurrency.

Bottom Line

For high-concurrency web applications or AI inference services, Google Cloud Run is almost always more cost-effective in 2026 due to its ability to pack multiple requests into a single vCPU/Memory allocation. However, for deep integration with S3/EventBridge or high-volume background processing, AWS Lambda on Graviton remains the performance-per-dollar king.
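The concurrency effect in the matrix is easy to quantify. Below is a back-of-envelope sketch using the (hypothetical 2026) unit prices from the table above; the workload figures (10M requests/month, 200ms, 1GB, 100 requests packed per Cloud Run instance) are illustrative assumptions, and free tiers and Cloud Run memory charges are ignored for simplicity:

```python
# Unit prices taken from the pricing matrix above (hypothetical 2026 rates).
LAMBDA_GB_S = 0.00001333        # $/GB-second (Graviton)
LAMBDA_REQ = 0.20 / 1_000_000   # $/request
RUN_VCPU_S = 0.0000178          # $/vCPU-second

# Assumed workload (illustrative only).
requests = 10_000_000
duration_s = 0.2
memory_gb = 1.0
concurrency = 100  # requests sharing one Cloud Run instance

# Lambda bills every request's full duration independently, plus a request fee.
lambda_cost = requests * duration_s * memory_gb * LAMBDA_GB_S + requests * LAMBDA_REQ

# Cloud Run bills instance time; concurrent requests share one vCPU allocation.
cloud_run_cost = (requests * duration_s / concurrency) * RUN_VCPU_S

print(f"Lambda:    ${lambda_cost:.2f}/month")    # $28.66/month
print(f"Cloud Run: ${cloud_run_cost:.3f}/month") # $0.356/month
```

The gap narrows sharply as concurrency drops: at a concurrency of 1, the Cloud Run term grows a hundredfold, which is exactly why bursty, isolated tasks still favor Lambda.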

Execution & Resource Quotas

Quotas define the architectural boundaries of your application. Pushing these limits often requires moving to provisioned capacity or dedicated instances.

  • Maximum Execution Timeout:
    • AWS Lambda: 15 minutes (Hard limit).
    • GCP Cloud Run: 60 minutes (request timeout is configurable, up to a maximum of 3600s).
    • Azure Functions: 10 minutes (Consumption), Unlimited (Premium/App Service).
  • Memory Limits:
    • AWS Lambda: 128MB to 10,240MB (Allocates vCPU proportionally).
    • GCP Cloud Run: 512MB to 32GB (Can decouple vCPU from Memory).
    • Azure Functions: Up to 4GB (Consumption), much higher on Flex/Premium.
  • Payload Limits: Broadly similar across providers at roughly 6MB for synchronous invocations and 256KB for asynchronous events. When processing larger datasets, pass signed URLs (S3 or GCS) instead of inlining data to avoid payload bottlenecks. For security-sensitive data, apply a data masking tool before logging or storing serverless payloads.
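The payload ceilings above can be enforced client-side before invocation. A minimal sketch, assuming the ~6MB/256KB limits quoted above; the `store_and_sign` helper is hypothetical and stands in for a real S3/GCS upload plus signed-URL call:

```python
import json

# Payload ceilings from the quotas above (bytes); treat as illustrative.
SYNC_LIMIT = 6 * 1024 * 1024   # ~6 MB for synchronous invocations
ASYNC_LIMIT = 256 * 1024       # ~256 KB for asynchronous events

def store_and_sign(body: bytes) -> str:
    # Placeholder: a real implementation would upload via boto3 or
    # google-cloud-storage and return a time-limited signed URL.
    return "https://storage.example.com/signed/abc123"

def build_event(payload: dict, asynchronous: bool = False) -> dict:
    """Inline the payload if it fits the invocation limit,
    otherwise hand off a signed-URL reference to object storage."""
    body = json.dumps(payload).encode()
    limit = ASYNC_LIMIT if asynchronous else SYNC_LIMIT
    if len(body) <= limit:
        return {"inline": payload}
    return {"signed_url": store_and_sign(body)}
```

For example, `build_event({"a": 1})` stays inline, while a 300KB body sent asynchronously falls back to the signed-URL branch.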

CLI Quick Reference

Commands grouped by common management tasks for the 2026 toolsets.

Resource Inspection

# AWS: Check concurrency limits for a specific region
aws lambda get-account-settings --region us-east-1

# GCP: List Cloud Run services with resource usage
gcloud run services list --format="table(name,status.address,spec.template.spec.containers[0].resources.limits)"

# Azure: Get function app settings
az functionapp config show --name my-app --resource-group my-group

Cost-Optimized Deployment Flags

# Deploy AWS Lambda on ARM64 (Graviton3) for ~20% better price/perf
# (handler, package, and role ARN below are placeholders; --role and code are required)
aws lambda create-function --function-name my-func \
    --architectures arm64 --runtime nodejs20.x \
    --handler index.handler \
    --zip-file fileb://function.zip \
    --role arn:aws:iam::123456789012:role/my-func-role

# Set GCP Cloud Run concurrency to maximize resource density
# (deploys from source in the current directory; use --image for a prebuilt container)
gcloud run deploy my-service --source . --concurrency 80 --cpu-boost

Advanced Scaling Config

In 2026, the 'cold start' problem is largely solved by three specific features you should be configuring in your serverless.yml or Terraform files:

  1. Provisioned Concurrency (AWS): Keeps a specified number of functions 'warm'. Use Scheduled Scaling to turn this down at night.
  2. Startup CPU Boost (GCP): Temporarily allocates more CPU during container startup to decrease initialization time by up to 50%.
  3. Always-on Instances (Azure): Part of the Premium/Flex plans to ensure zero latency for mission-critical endpoints.

Pro tip: Prefer ARM-based runtimes where available (Graviton on AWS, Arm-based Tau capacity on GCP). In 2026 they typically run 15-20% cheaper than x86_64 for equivalent duration performance.
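Feature 1 above maps to a few lines of Serverless Framework config. A minimal sketch, where the function name, handler path, and instance count are illustrative rather than recommendations:

```yaml
functions:
  checkout:
    handler: src/checkout.handler
    architecture: arm64          # Graviton pricing, per the pro tip above
    provisionedConcurrency: 5    # execution environments kept warm
```

Pair this with scheduled scaling (as noted in the list above) so warm capacity drops outside business hours.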

Cost Optimization Strategies

  • Right-sizing Memory: Use tools like AWS Lambda Power Tuning to find the 'Goldilocks' zone. Doubling memory often halves execution time, resulting in the same cost but better performance.
  • Log Retention: CloudWatch log groups retain logs indefinitely by default (Cloud Logging defaults to 30 days). Set a 7-day retention policy to avoid 'hidden' storage costs that can exceed execution costs.
  • VPC Cold Starts: Ensure you are using the latest ENI-per-VPC (AWS) or Serverless VPC Access (GCP) configurations to avoid the 10-second networking penalty of older architectures.
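The right-sizing point follows directly from the GB-second billing formula: cost scales with memory × duration, so doubling memory is free whenever it at least halves runtime. A quick sketch using the (hypothetical 2026) Graviton rate from the matrix above:

```python
LAMBDA_GB_S = 0.00001333  # $/GB-second, from the pricing matrix above

def invocation_cost(memory_gb: float, duration_s: float) -> float:
    # Lambda bills allocated memory multiplied by wall-clock duration.
    return memory_gb * duration_s * LAMBDA_GB_S

small = invocation_cost(memory_gb=0.5, duration_s=2.0)  # slow, small allocation
large = invocation_cost(memory_gb=1.0, duration_s=1.0)  # fast, doubled memory

print(small == large)  # True: same cost per invocation, half the latency
```

If doubling memory cuts runtime by more than half (common for CPU-bound functions, since Lambda allocates vCPU proportionally), the larger allocation is strictly cheaper.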

Frequently Asked Questions

Is AWS Lambda cheaper than Google Cloud Run in 2026?
It depends on concurrency. AWS Lambda is generally cheaper for low-volume, highly bursty tasks where functions execute in isolation. However, if your application handles multiple requests simultaneously, Cloud Run's concurrency model allows you to process up to 1000 requests on a single instance, making it significantly cheaper for high-traffic web apps.
What is the 2026 free tier for serverless?
All three major providers have maintained a 'forever free' tier. AWS and Azure offer 1 million requests and 400,000 GB-seconds per month. GCP offers 2 million requests (on Cloud Functions) and 180,000 vCPU-seconds on Cloud Run, plus a generous 10GB of egress within the Google network.
How do I avoid cold starts for AI inference?
For heavy AI models (PyTorch/TensorFlow), use GCP Cloud Run with 'min-instances' set to at least 1 and 'startup-cpu-boost' enabled. This keeps the model loaded in memory and ensures the initialization phase uses maximum CPU, which is critical for 2026 LLM deployments.
