Developer Reference

Cloud Cost Management [2026] Serverless GPU Spot Cheat Sheet

Dillip Chowdary
Tech Entrepreneur & Innovator · May 03, 2026 · 11 min read

Bottom Line

The cheapest GPU hour is the one you can interrupt safely. Treat spot capacity as a scheduling strategy, not a discount code: diversify pools, checkpoint aggressively, and keep a small on-demand escape hatch.

Key Takeaways

  • AWS Spot can save up to 90%; Google Cloud Spot discounts are 60-91%; Azure Spot can be up to 90% cheaper.
  • For AWS Batch Spot, AWS recommends SPOT_PRICE_CAPACITY_OPTIMIZED over older allocation strategies.
  • Google Cloud Batch GPU jobs use provisioningModel: SPOT plus accelerators in the allocation policy.
  • Azure Spot VMs can be evicted with a minimum 30-second notice; AWS Spot gives a 2-minute interruption notice.
  • Real savings come from retries, checkpoints, and mixed-capacity queues, not from picking a single cheap SKU.

As of May 3, 2026, the economics are clear: GPU spot capacity is still the fastest path to cheaper training, batch inference, rendering, and ETL, but only if your platform can survive interruption without turning retries into waste. This cheat sheet compresses current AWS, Google Cloud, and Azure guidance into an operator-focused reference: what to choose, which flags matter, and how to wire the configs, plus a live filter and keyboard shortcuts for finding the right command under pressure.

  • AWS Spot offers savings of up to 90% versus On-Demand, with a 2-minute interruption notice.
  • Google Cloud Spot VMs discount many machine types and GPUs by 60-91%.
  • Azure Spot VMs can be up to 90% cheaper, with a minimum 30-second eviction notice.
  • For GPU jobs, managed batch schedulers beat hand-rolled autoscaling when your workload is already queue-driven.

Decision Matrix

“Serverless GPU spot” usually means a managed scheduler or job service placing work onto interruptible GPU-backed nodes for you. That makes AWS Batch and Google Cloud Batch the cleanest defaults. Azure still has strong spot economics, but the operating model is more VM-centric for many GPU workloads.

Bottom Line

Use a managed batch plane first, then optimize across GPU families and zones. The winning pattern is mixed capacity + checkpointing + interruption-aware retries, not chasing the single lowest hourly price.

Platform | Spot savings signal | Interruption signal | GPU path | Best use
AWS | Up to 90% vs On-Demand | 2-minute notice | AWS Batch on EC2 Spot-backed compute environments | Queue-driven training, batch inference, rendering
Google Cloud | 60-91% off for many machine types and GPUs | Preemption can happen any time | Cloud Batch with provisioningModel: SPOT | GPU batch jobs, zonal diversification, simple JSON/YAML workflows
Azure | Up to 90% cheaper than pay-as-you-go | Minimum 30-second eviction notice | Spot GPU VMs or VM scale sets | Interruptible GPU pools where VM-level control is acceptable

Choose managed batch when

  • Your workload already enters the system through a queue or job spec.
  • You can split work into shards, scenes, prompts, files, or epochs.
  • You want retries, logs, and scheduling without maintaining a full Kubernetes control plane.

Avoid all-spot designs when

  • You have strict end-user latency targets.
  • You cannot checkpoint model state or partial outputs.
  • Capacity spikes are deadline-bound and missed work is more expensive than on-demand compute.

Commands by Purpose

Keep these grouped by operator intent: discover, provision, submit, inspect, and clean up. Before sharing logs, billing exports, or account-scoped snippets in tickets, scrub them with the Data Masking Tool.

Discover capacity

aws ec2 describe-spot-price-history \
  --instance-types g5.xlarge g6.xlarge \
  --product-descriptions Linux/UNIX \
  --start-time 2026-05-03T00:00:00Z

gcloud compute accelerator-types list \
  --zones=us-central1-a
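
For Azure parity, the closest discovery step is listing which GPU SKUs a region actually exposes before provisioning. A minimal sketch; the Standard_N size filter is an assumption, so narrow it to the families you run:

az vm list-skus \
  --location eastus \
  --size Standard_N \
  --output table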

Provision the scheduling layer

aws batch create-compute-environment \
  --compute-environment-name gpu-spot-ce \
  --type MANAGED \
  --state ENABLED \
  --service-role AWSBatchServiceRole \
  --compute-resources type=SPOT,allocationStrategy=SPOT_PRICE_CAPACITY_OPTIMIZED,minvCpus=0,maxvCpus=256,instanceTypes=[g5,g6],subnets=[subnet-AAA],securityGroupIds=[sg-BBB],instanceRole=ecsInstanceRole,spotIamFleetRole=AmazonEC2SpotFleetTaggingRole

az group create -n gpu-spot-rg -l eastus

az vm create \
  --resource-group gpu-spot-rg \
  --name gpu-spot-vm \
  --image Ubuntu2204 \
  --size GPU_VM_SIZE \
  --admin-username azureuser \
  --generate-ssh-keys \
  --priority Spot \
  --max-price -1 \
  --eviction-policy Deallocate
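
If you want an interruptible pool rather than a single worker, a Spot-priority scale set maps more closely to the "VM scale sets" path in the decision matrix. A minimal sketch; GPU_VM_SIZE and the instance count are placeholders:

az vmss create \
  --resource-group gpu-spot-rg \
  --name gpu-spot-vmss \
  --image Ubuntu2204 \
  --vm-sku GPU_VM_SIZE \
  --instance-count 2 \
  --priority Spot \
  --max-price -1 \
  --eviction-policy Deallocate \
  --admin-username azureuser \
  --generate-ssh-keys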

Submit GPU work

aws batch submit-job \
  --job-name gpu-inference-spot \
  --job-queue gpu-queue \
  --job-definition gpu-jobdef \
  --container-overrides resourceRequirements=[{type=GPU,value=1},{type=VCPU,value=8},{type=MEMORY,value=32768}]

gcloud batch jobs submit gpu-spot-job \
  --location=us-central1 \
  --config=job.json

Inspect and re-queue

aws batch describe-jobs --jobs JOB_ID
gcloud batch jobs describe gpu-spot-job --location=us-central1
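
describe-jobs answers "what happened to this job"; re-queueing usually starts from "what failed in this queue". One way to pull that list on AWS, assuming the gpu-queue from the submit example:

aws batch list-jobs \
  --job-queue gpu-queue \
  --job-status FAILED \
  --query 'jobSummaryList[].[jobId,jobName,statusReason]' \
  --output table

Failed entries whose statusReason points at host termination are the ones worth resubmitting to a spot or on-demand lane.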

Clean up idle cost

aws batch update-compute-environment \
  --compute-environment gpu-spot-ce \
  --state DISABLED

gcloud batch jobs delete gpu-spot-job --location=us-central1
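
For the Azure examples above, stopping idle cost means deallocating or deleting what the provisioning step created; these assume the gpu-spot-rg and gpu-spot-vm names from earlier:

az vm deallocate --resource-group gpu-spot-rg --name gpu-spot-vm
az group delete --name gpu-spot-rg --yes --no-wait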
Watch out: AWS Batch supports GPUs on EC2-backed compute resources, not on Fargate. If your reference architecture says “serverless GPU” and “Fargate” in the same sentence, it needs a correction.

Configuration Patterns

The safest pattern is a small, explicit config surface: multiple GPU families, interruption-tolerant queues, and no hidden defaults around capacity type.

AWS Batch Spot GPU compute environment

{
  "computeEnvironmentName": "gpu-spot-ce",
  "type": "MANAGED",
  "state": "ENABLED",
  "serviceRole": "AWSBatchServiceRole",
  "computeResources": {
    "type": "SPOT",
    "allocationStrategy": "SPOT_PRICE_CAPACITY_OPTIMIZED",
    "minvCpus": 0,
    "maxvCpus": 256,
    "instanceTypes": ["g5", "g6"],
    "subnets": ["subnet-AAA"],
    "securityGroupIds": ["sg-BBB"],
    "instanceRole": "ecsInstanceRole",
    "spotIamFleetRole": "AmazonEC2SpotFleetTaggingRole"
  }
}
  • Prefer families over a single SKU so the scheduler has room to move.
  • Use g5 and g6 only if both fit your driver and CUDA constraints.
  • Keep minvCpus at zero for purely opportunistic queues.
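
The compute environment alone runs nothing: the gpu-queue and gpu-jobdef referenced in the submit example still need to exist. A minimal sketch of both; the ECR image URI is a placeholder:

aws batch create-job-queue \
  --job-queue-name gpu-queue \
  --state ENABLED \
  --priority 1 \
  --compute-environment-order order=1,computeEnvironment=gpu-spot-ce

aws batch register-job-definition \
  --job-definition-name gpu-jobdef \
  --type container \
  --container-properties '{
    "image": "ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/gpu-worker:latest",
    "resourceRequirements": [
      {"type": "GPU", "value": "1"},
      {"type": "VCPU", "value": "8"},
      {"type": "MEMORY", "value": "32768"}
    ]
  }'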

Google Cloud Batch Spot GPU job

{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "container": {
              "imageUri": "us-docker.pkg.dev/PROJECT_ID/images/gpu-worker:latest"
            }
          }
        ]
      },
      "taskCount": 8,
      "parallelism": 4
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "installGpuDrivers": true,
        "policy": {
          "provisioningModel": "SPOT",
          "reservation": "NO_RESERVATION",
          "accelerators": [
            {
              "type": "nvidia-tesla-t4",
              "count": 1
            }
          ]
        }
      }
    ],
    "location": {
      "allowedLocations": ["regions/us-central1"]
    }
  },
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
  • Set installGpuDrivers when you want Batch to install the required drivers for the specified GPU type.
  • Use allowedLocations to restrict scheduling to regions that actually expose the accelerator you want (see the zone check after this list).
  • Block reservations explicitly with NO_RESERVATION when you want clean spot behavior.
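
To back the allowedLocations bullet with data, check which zones in the region actually expose the accelerator type from the config; the zone list below is an assumption:

gcloud compute accelerator-types list \
  --zones=us-central1-a,us-central1-b,us-central1-c \
  --filter="name=nvidia-tesla-t4"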

Operator defaults that pay off

  • Checkpoint model state, scene progress, or shard offsets to object storage (a minimal wrapper is sketched after this list).
  • Make outputs idempotent so replayed tasks overwrite or skip cleanly.
  • Keep retry logic in the job layer, not only in application code.
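
A minimal shape for the checkpoint bullet, assuming an S3 bucket and a train.py that accepts a --resume flag; both are illustrative, not any specific framework's API:

# Resume from the last checkpoint if an earlier attempt left one behind.
# AWS Batch sets AWS_BATCH_JOB_ID; bucket, paths, and train.py flags are assumptions.
CKPT_URI="s3://YOUR_BUCKET/checkpoints/${AWS_BATCH_JOB_ID:-local}/latest.pt"
RESUME_ARGS=""
if aws s3 ls "$CKPT_URI" > /dev/null 2>&1; then
  aws s3 cp "$CKPT_URI" /tmp/latest.pt
  RESUME_ARGS="--resume /tmp/latest.pt"
fi
# The worker uploads fresh checkpoints to $CKPT_URI as it progresses.
python train.py $RESUME_ARGS --checkpoint-uri "$CKPT_URI"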

Live Search + Shortcuts

Cheat sheets get used under pressure. Add a tiny client-side filter so readers can jump to commands by provider, task, or keyword.

Tag | Snippet | Why it matters
AWS | create-compute-environment | Creates the spot-backed scheduler pool.
AWS | submit-job | Overrides GPU, CPU, and memory at submit time.
GCP | gcloud batch jobs submit | Submits a Spot GPU job from JSON or YAML.
Azure | az vm create --priority Spot | Fast path for an interruptible GPU worker.
Ops | DISABLED or delete | Stops idle cost from lingering after tests.
// Assumes the page provides <input id="gpu-spot-filter">, rows tagged with
// data-filter-row="provider command keywords", and sections with the ids
// "commands-by-purpose" and "config-patterns".
const input = document.getElementById('gpu-spot-filter');
const rows = [...document.querySelectorAll('[data-filter-row]')];

function runFilter(value) {
  const q = value.trim().toLowerCase();
  rows.forEach((row) => {
    // Case-insensitive match so tag casing in the markup does not matter.
    row.style.display = row.dataset.filterRow.toLowerCase().includes(q) ? '' : 'none';
  });
}

input?.addEventListener('input', (e) => runFilter(e.target.value));

document.addEventListener('keydown', (e) => {
  if (e.key === 'Escape') {
    if (input) {
      input.value = '';
      runFilter('');
      input.blur();
    }
    return;
  }
  // Ignore the navigation shortcuts while the user is typing in the filter.
  if (e.target === input) {
    return;
  }
  if (e.key === '/') {
    e.preventDefault();
    input?.focus();
  }
  if (e.key.toLowerCase() === 'g') {
    document.getElementById('commands-by-purpose')?.scrollIntoView({behavior: 'smooth'});
  }
  if (e.key.toLowerCase() === 'c') {
    document.getElementById('config-patterns')?.scrollIntoView({behavior: 'smooth'});
  }
});

Keyboard shortcuts

Shortcut | Action | Use case
/ | Focus filter | Jump into search without touching the mouse.
g | Go to commands | Fast access during incident response or rollout.
c | Go to configuration | Open the JSON patterns immediately.
Esc | Clear filter | Reset hidden rows and exit search.

Advanced Usage

Once the basic queue works, the remaining savings come from reducing wasted retries and improving scheduling flexibility.

Capacity strategy

  • Split work into a spot lane and a deadline lane instead of forcing every task onto interruptible capacity (see the queue sketch after this list).
  • Diversify across GPU families and regions when model requirements allow it.
  • Use one queue for cheap opportunistic work and another for recovery or SLA-sensitive replay.
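
The two-lane split is mostly queue wiring. A sketch of the deadline lane, assuming a separate on-demand compute environment; gpu-ondemand-ce is a placeholder you would create like gpu-spot-ce but with type EC2:

aws batch create-job-queue \
  --job-queue-name gpu-deadline-lane \
  --state ENABLED \
  --priority 10 \
  --compute-environment-order order=1,computeEnvironment=gpu-ondemand-ce

Work that misses its spot-lane deadline gets resubmitted here instead of waiting out another reclaim cycle.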

Retry and checkpoint design

  • Checkpoint by completed batch, shard, epoch, or frame range.
  • Write deterministic output paths so retries do not duplicate artifacts.
  • Persist retry metadata outside the worker node so eviction does not erase progress history.
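
On AWS Batch, one way to keep retries in the job layer is a retry strategy that retries host-level failures, which is how Spot reclaims surface, but exits fast on application errors. The attempt count and the gpu-container.json file name are illustrative:

aws batch register-job-definition \
  --job-definition-name gpu-jobdef \
  --type container \
  --container-properties file://gpu-container.json \
  --retry-strategy '{
    "attempts": 5,
    "evaluateOnExit": [
      {"onStatusReason": "Host EC2*", "action": "RETRY"},
      {"onReason": "*", "action": "EXIT"}
    ]
  }'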

Cost telemetry that actually matters

  • Track successful work per dollar, not only raw hourly price.
  • Measure interruption rate by GPU family and zone.
  • Separate queue wait time from execution time so cheap capacity does not hide slow delivery.
  • Correlate requeues with specific capacity pools before expanding them.
Pro tip: For GPU batch inference, the highest-leverage optimization is often smaller task granularity, not a different instance family. Shorter tasks reduce the amount of work you lose per eviction and make spot volatility cheaper to absorb.
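
A crude starting point for the work side of "work per dollar" is counting terminal job states per queue straight from the job layer; pair it with your billing export for the dollar side. Assumes the gpu-queue name from earlier:

for status in SUCCEEDED FAILED; do
  count=$(aws batch list-jobs \
    --job-queue gpu-queue \
    --job-status "$status" \
    --query 'length(jobSummaryList)' \
    --output text)
  echo "$status: $count"
done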

Practical guardrails

  • Cap per-job runtime if your platform struggles with very large checkpoint intervals.
  • Do not treat historical spot price alone as a placement policy.
  • Budget for a controlled on-demand fallback instead of emergency ad hoc recovery.
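
Guardrails are only as good as the last time they were exercised. On Azure you can rehearse an eviction against a test VM before trusting the recovery path; this assumes the gpu-spot-vm from the provisioning example. AWS offers a comparable rehearsal through Fault Injection Service Spot interruption experiments.

az vm simulate-eviction \
  --resource-group gpu-spot-rg \
  --name gpu-spot-vm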

Frequently Asked Questions

Can I run GPU jobs on AWS Fargate Spot?
No for AWS Batch GPU workloads. AWS Batch documents GPU support for EC2-backed compute resources, and the AWS CLI reference states GPUs aren't available for jobs running on Fargate resources. If you need managed scheduling plus GPUs, use EC2 Spot-backed Batch compute environments.
What is the safest default allocation strategy for AWS Batch Spot?
Use SPOT_PRICE_CAPACITY_OPTIMIZED unless you have a specific reason not to. AWS documents it as the recommended strategy because it balances lower interruption risk with price, instead of chasing the absolute cheapest pool.
How do Google Cloud Batch GPU jobs request Spot capacity?
Set provisioningModel to SPOT inside the job's allocation policy and define the GPU in accelerators. If you want Batch to install drivers automatically, set installGpuDrivers to true.
How much eviction notice do Azure Spot VMs give?
Azure documents a minimum 30-second eviction notice. That is enough for lightweight shutdown hooks and checkpoint signals, but not enough to rely on large in-memory flushes, so design recovery outside the node.
