Serverless GPU Spot Cheat Sheet [2026]
Bottom Line
The cheapest GPU hour is the one you can interrupt safely. Treat spot capacity as a scheduling strategy, not a discount code: diversify pools, checkpoint aggressively, and keep a small on-demand escape hatch.
Key Takeaways
- AWS Spot can save up to 90%; Google Cloud Spot discounts are 60-91%; Azure Spot can be up to 90% cheaper.
- For AWS Batch Spot, AWS recommends SPOT_PRICE_CAPACITY_OPTIMIZED over older allocation strategies.
- Google Cloud Batch GPU jobs use provisioningModel: SPOT plus accelerators in the allocation policy.
- Azure Spot VMs can be evicted with a minimum 30-second notice; AWS Spot gives a 2-minute interruption notice.
- Real savings come from retries, checkpoints, and mixed-capacity queues, not from picking a single cheap SKU.
As of May 3, 2026, the economics are clear: GPU spot capacity is still the fastest path to cheaper training, batch inference, rendering, and ETL, but only if your platform can survive interruption without turning retries into waste. This cheat sheet compresses current AWS, Google Cloud, and Azure guidance into an operator-focused reference: what to choose, which flags matter, how to wire the configs, and how to navigate it all with the live filter, copyable snippets, and keyboard shortcuts below.
- AWS Spot offers savings of up to 90% versus On-Demand, with a 2-minute interruption notice.
- Google Cloud Spot VMs discount many machine types and GPUs by 60-91%.
- Azure Spot VMs can be up to 90% cheaper, with a minimum 30-second eviction notice.
- For GPU jobs, managed batch schedulers beat hand-rolled autoscaling when your workload is already queue-driven.
Decision Matrix
“Serverless GPU spot” usually means a managed scheduler or job service placing work onto interruptible GPU-backed nodes for you. That makes AWS Batch and Google Cloud Batch the cleanest defaults. Azure still has strong spot economics, but the operating model is more VM-centric for many GPU workloads.
Bottom Line
Use a managed batch plane first, then optimize across GPU families and zones. The winning pattern is mixed capacity + checkpointing + interruption-aware retries, not chasing the single lowest hourly price.
| Platform | Spot savings signal | Interruption signal | GPU path | Best use |
|---|---|---|---|---|
| AWS | Up to 90% vs On-Demand | 2-minute notice | AWS Batch on EC2 Spot-backed compute environments | Queue-driven training, batch inference, rendering |
| Google Cloud | 60-91% off for many machine types and GPUs | Preemption can happen any time | Cloud Batch with provisioningModel: SPOT | GPU batch jobs, zonal diversification, simple JSON/YAML workflows |
| Azure | Up to 90% cheaper than pay-as-you-go | Minimum 30-second eviction notice | Spot GPU VMs or VM scale sets | Interruptible GPU pools where VM-level control is acceptable |
Choose managed batch when
- Your workload already enters the system through a queue or job spec.
- You can split work into shards, scenes, prompts, files, or epochs.
- You want retries, logs, and scheduling without maintaining a full Kubernetes control plane.
Avoid all-spot designs when
- You have strict end-user latency targets.
- You cannot checkpoint model state or partial outputs.
- Capacity spikes are deadline-bound and missed work is more expensive than on-demand compute.
Commands by Purpose
Keep these grouped by operator intent: discover, provision, submit, inspect, and clean up. Before sharing logs, billing exports, or account-scoped snippets in tickets, scrub them with the Data Masking Tool.
Discover capacity
```bash
# AWS: recent Spot price history for candidate GPU instance types
aws ec2 describe-spot-price-history \
  --instance-types g5.xlarge g6.xlarge \
  --product-descriptions Linux/UNIX \
  --start-time 2026-05-03T00:00:00Z

# Google Cloud: list the accelerator types a zone actually exposes
gcloud compute accelerator-types list \
  --zones=us-central1-a
```

Provision the scheduling layer
```bash
# AWS Batch: managed Spot compute environment spanning two GPU families
aws batch create-compute-environment \
  --compute-environment-name gpu-spot-ce \
  --type MANAGED \
  --state ENABLED \
  --service-role AWSBatchServiceRole \
  --compute-resources type=SPOT,allocationStrategy=SPOT_PRICE_CAPACITY_OPTIMIZED,minvCpus=0,maxvCpus=256,instanceTypes=g5,g6,subnets=subnet-AAA,securityGroupIds=sg-BBB,instanceRole=ecsInstanceRole,spotIamFleetRole=AmazonEC2SpotFleetTaggingRole

# Azure: interruptible GPU VM; --max-price -1 caps the price at the pay-as-you-go rate
az group create -n gpu-spot-rg -l eastus
az vm create \
  --resource-group gpu-spot-rg \
  --name gpu-spot-vm \
  --image Ubuntu2204 \
  --size GPU_VM_SIZE \
  --admin-username azureuser \
  --generate-ssh-keys \
  --priority Spot \
  --max-price -1 \
  --eviction-policy Deallocate
```

Submit GPU work
```bash
# AWS Batch: request 1 GPU, 8 vCPUs, and 32 GiB at submit time
# (quote the overrides so the shell does not brace-expand them)
aws batch submit-job \
  --job-name gpu-inference-spot \
  --job-queue gpu-queue \
  --job-definition gpu-jobdef \
  --container-overrides 'resourceRequirements=[{type=GPU,value=1},{type=VCPU,value=8},{type=MEMORY,value=32768}]'

# Google Cloud Batch: the job spec lives in job.json (see Configuration Patterns)
gcloud batch jobs submit gpu-spot-job \
  --location=us-central1 \
  --config=job.json
```

Inspect and re-queue
```bash
aws batch describe-jobs --jobs JOB_ID
gcloud batch jobs describe gpu-spot-job --location=us-central1
```

Clean up idle cost
```bash
# Disable the compute environment so it scales to zero instead of idling
aws batch update-compute-environment \
  --compute-environment gpu-spot-ce \
  --state DISABLED

gcloud batch jobs delete gpu-spot-job --location=us-central1
```

Configuration Patterns
The safest pattern is a small, explicit config surface: multiple GPU families, interruption-tolerant queues, and no hidden defaults around capacity type.
AWS Batch Spot GPU compute environment
```json
{
  "computeEnvironmentName": "gpu-spot-ce",
  "type": "MANAGED",
  "state": "ENABLED",
  "serviceRole": "AWSBatchServiceRole",
  "computeResources": {
    "type": "SPOT",
    "allocationStrategy": "SPOT_PRICE_CAPACITY_OPTIMIZED",
    "minvCpus": 0,
    "maxvCpus": 256,
    "instanceTypes": ["g5", "g6"],
    "subnets": ["subnet-AAA"],
    "securityGroupIds": ["sg-BBB"],
    "instanceRole": "ecsInstanceRole",
    "spotIamFleetRole": "AmazonEC2SpotFleetTaggingRole"
  }
}
```

- Prefer families over a single SKU so the scheduler has room to move.
- Use g5 and g6 only if both fit your driver and CUDA constraints.
- Keep minvCpus at zero for purely opportunistic queues.
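Jobs in a SPOT compute environment inherit EC2's 2-minute interruption notice, so long-running containers should watch the instance metadata service and checkpoint when a notice appears. Here is a minimal watcher sketch against the documented spot/instance-action endpoint; the checkpoint_and_exit callback is a hypothetical stand-in for your own save-and-shutdown logic:

```python
import time
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"

def imds_token() -> str:
    # IMDSv2: fetch a short-lived session token before reading metadata.
    req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"},
    )
    return urllib.request.urlopen(req, timeout=2).read().decode()

def interruption_pending() -> bool:
    # The endpoint returns 404 until EC2 schedules an interruption.
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": imds_token()},
    )
    try:
        urllib.request.urlopen(req, timeout=2)
        return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise

def watch(checkpoint_and_exit, poll_seconds: float = 5.0) -> None:
    # Poll well inside the 2-minute notice window, then save and exit.
    while not interruption_pending():
        time.sleep(poll_seconds)
    checkpoint_and_exit()
```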
Google Cloud Batch Spot GPU job
```json
{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "container": {
              "imageUri": "us-docker.pkg.dev/PROJECT_ID/images/gpu-worker:latest"
            }
          }
        ]
      },
      "taskCount": 8,
      "parallelism": 4
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "installGpuDrivers": true,
        "policy": {
          "machineType": "n1-standard-4",
          "provisioningModel": "SPOT",
          "reservation": "NO_RESERVATION",
          "accelerators": [
            {
              "type": "nvidia-tesla-t4",
              "count": 1
            }
          ]
        }
      }
    ],
    "location": {
      "allowedLocations": ["regions/us-central1"]
    }
  },
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
```

- Set installGpuDrivers when you want Batch to install the required drivers for the specified GPU type.
- Pair the accelerator with a compatible machine family; T4 GPUs attach to N1 machine types, hence n1-standard-4 here.
- Use allowedLocations to restrict scheduling to regions that actually expose the accelerator you want.
- Block reservations explicitly with NO_RESERVATION when you want clean spot behavior.
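On the Compute Engine VMs that Batch provisions, a task can likewise detect preemption by reading the instance metadata server. A minimal sketch, assuming the code runs on (or in a container on) the Batch-provisioned VM:

```python
import urllib.request

METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
)

def is_preempted() -> bool:
    # The metadata server reports TRUE once Compute Engine begins preemption.
    req = urllib.request.Request(
        METADATA_URL, headers={"Metadata-Flavor": "Google"}
    )
    body = urllib.request.urlopen(req, timeout=2).read().decode().strip()
    return body == "TRUE"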
Operator defaults that pay off
- Checkpoint model state, scene progress, or shard offsets to object storage (see the sketch after this list).
- Make outputs idempotent so replayed tasks overwrite or skip cleanly.
- Keep retry logic in the job layer, not only in application code.
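A concrete way to combine the first two defaults is to derive shard state from object storage itself, so a retried task asks "what already exists?" before doing work. A minimal sketch on S3 with boto3; the bucket name and key layout are illustrative, and the same pattern carries to GCS or Azure Blob:

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-training-artifacts"  # illustrative bucket name

def shard_key(job_id: str, shard: int) -> str:
    # Deterministic path: a retried shard overwrites the same object.
    return f"jobs/{job_id}/shards/{shard:05d}.parquet"

def already_done(job_id: str, shard: int) -> bool:
    try:
        s3.head_object(Bucket=BUCKET, Key=shard_key(job_id, shard))
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "404":
            return False
        raise

def write_shard(job_id: str, shard: int, payload: bytes) -> None:
    # PutObject is atomic per key, so partial work never becomes visible.
    s3.put_object(Bucket=BUCKET, Key=shard_key(job_id, shard), Body=payload)
```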
Live Search + Shortcuts
Cheat sheets get used under pressure. Add a tiny client-side filter so readers can jump to commands by provider, task, or keyword.
| Tag | Snippet | Why it matters |
|---|---|---|
| AWS | create-compute-environment | Creates the spot-backed scheduler pool. |
| AWS | submit-job | Overrides GPU, CPU, and memory at submit time. |
| GCP | gcloud batch jobs submit | Submits a Spot GPU job from JSON or YAML. |
| Azure | az vm create --priority Spot | Fast path for an interruptible GPU worker. |
| Ops | DISABLED or delete | Stops idle cost from lingering after tests. |
```js
const input = document.getElementById('gpu-spot-filter');
const rows = [...document.querySelectorAll('[data-filter-row]')];

function runFilter(value) {
  const q = value.trim().toLowerCase();
  rows.forEach((row) => {
    row.style.display = row.dataset.filterRow.includes(q) ? '' : 'none';
  });
}

input?.addEventListener('input', (e) => runFilter(e.target.value));

document.addEventListener('keydown', (e) => {
  // Ignore shortcut keys while the user is typing in the filter box.
  if (e.target === input && e.key !== 'Escape') return;
  if (e.key === '/') {
    e.preventDefault();
    input?.focus();
  }
  if (e.key.toLowerCase() === 'g') {
    document.getElementById('commands-by-purpose')?.scrollIntoView({ behavior: 'smooth' });
  }
  if (e.key.toLowerCase() === 'c') {
    document.getElementById('config-patterns')?.scrollIntoView({ behavior: 'smooth' });
  }
  if (e.key === 'Escape' && input) {
    input.value = '';
    runFilter('');
    input.blur();
  }
});
```

Keyboard shortcuts
| Shortcut | Action | Use case |
|---|---|---|
| / | Focus filter | Jump into search without touching the mouse. |
| g | Go to commands | Fast access during incident response or rollout. |
| c | Go to configuration | Open the JSON patterns immediately. |
| Esc | Clear filter | Reset hidden rows and exit search. |
Advanced Usage
Once the basic queue works, the remaining savings come from reducing wasted retries and improving scheduling flexibility.
Capacity strategy
- Split work into a spot lane and a deadline lane instead of forcing every task onto interruptible capacity (routing sketch after this list).
- Diversify across GPU families and regions when model requirements allow it.
- Use one queue for cheap opportunistic work and another for recovery or SLA-sensitive replay.
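One way to implement the two-lane split is at submit time: route each task by its deadline slack. A sketch using the AWS Batch submit_job API; the queue names and the 2x-slack threshold are illustrative choices, not fixed guidance:

```python
from datetime import datetime, timezone

import boto3

batch = boto3.client("batch")

# Hypothetical queue names: one backed by Spot, one by On-Demand capacity.
SPOT_QUEUE = "gpu-spot-queue"
DEADLINE_QUEUE = "gpu-ondemand-queue"

def submit(job_name: str, deadline: datetime, expected_hours: float) -> str:
    slack = (deadline - datetime.now(timezone.utc)).total_seconds() / 3600
    # Use Spot only when there is room to absorb an interruption and retry.
    queue = SPOT_QUEUE if slack > 2 * expected_hours else DEADLINE_QUEUE
    resp = batch.submit_job(
        jobName=job_name,
        jobQueue=queue,
        jobDefinition="gpu-jobdef",
    )
    return resp["jobId"]
```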
Retry and checkpoint design
- Checkpoint by completed batch, shard, epoch, or frame range.
- Write deterministic output paths so retries do not duplicate artifacts.
- Persist retry metadata outside the worker node so eviction does not erase progress history (a resume sketch follows this list).
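A resume step then falls out of the checkpoint layout: list what already exists in object storage and schedule only the gap. A sketch that pairs with the shard layout shown earlier (same illustrative bucket):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-training-artifacts"  # same illustrative bucket as above

def remaining_shards(job_id: str, total: int) -> list[int]:
    # Derive progress from durable outputs, not from any worker's local disk.
    done: set[int] = set()
    paginator = s3.get_paginator("list_objects_v2")
    prefix = f"jobs/{job_id}/shards/"
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            filename = obj["Key"].rsplit("/", 1)[-1]  # e.g. "00042.parquet"
            done.add(int(filename.split(".")[0]))
    return [shard for shard in range(total) if shard not in done]
```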
Cost telemetry that actually matters
- Track successful work per dollar, not only raw hourly price (worked example after this list).
- Measure interruption rate by GPU family and zone.
- Separate queue wait time from execution time so cheap capacity does not hide slow delivery.
- Correlate requeues with specific capacity pools before expanding them.
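To make the first metric concrete, compare pools on cost per completed unit rather than hourly price; a pool that looks 3x cheaper can lose much of that edge to interrupted work. All numbers below are illustrative:

```python
from dataclasses import dataclass

@dataclass
class PoolStats:
    hourly_price: float   # USD per GPU-hour
    billed_hours: float   # total hours paid for
    completed_units: int  # shards/epochs/frames that actually finished

def cost_per_unit(p: PoolStats) -> float:
    # Interrupted, unfinished work inflates this number; raw price hides it.
    return (p.hourly_price * p.billed_hours) / max(p.completed_units, 1)

# Cheap-but-flaky Spot pool vs On-Demand: Spot still wins here, but by far
# less than its hourly price alone would suggest.
spot = PoolStats(hourly_price=0.50, billed_hours=120, completed_units=90)
ondemand = PoolStats(hourly_price=1.60, billed_hours=100, completed_units=100)
print(cost_per_unit(spot), cost_per_unit(ondemand))  # ~0.67 vs 1.60
```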
Practical guardrails
- Cap per-job runtime if your platform struggles with very large checkpoint intervals (see the sketch after this list).
- Do not treat historical spot price alone as a placement policy.
- Budget for a controlled on-demand fallback instead of emergency ad hoc recovery.
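AWS Batch exposes the first two guardrails directly on the job: a per-attempt timeout and a bounded retry count. A minimal sketch with boto3 (names and limits are illustrative):

```python
import boto3

batch = boto3.client("batch")

# Cap runtime and bound retries at the job layer, not only in app code.
resp = batch.submit_job(
    jobName="gpu-train-shard-42",
    jobQueue="gpu-spot-queue",
    jobDefinition="gpu-jobdef",
    retryStrategy={"attempts": 3},                # requeue after interruption
    timeout={"attemptDurationSeconds": 4 * 3600}, # cap each attempt at 4 hours
)
print(resp["jobId"])
```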
Frequently Asked Questions
Can I run GPU jobs on AWS Fargate Spot?
No. GPUs are not supported on Fargate resources. If you need managed scheduling plus GPUs, use EC2 Spot-backed Batch compute environments.

What is the safest default allocation strategy for AWS Batch Spot?
AWS recommends SPOT_PRICE_CAPACITY_OPTIMIZED, which weighs both Spot price and available capacity instead of chasing price alone.

How do Google Cloud Batch GPU jobs request Spot capacity?
Set provisioningModel to SPOT inside the job's allocation policy and define the GPU in accelerators. If you want Batch to install drivers automatically, set installGpuDrivers to true.

How much eviction notice do Azure Spot VMs give?
A minimum of 30 seconds, delivered through the Scheduled Events metadata service before the VM is evicted.
Related Deep-Dives
- AWS Batch vs Kubernetes for ML Jobs: a practical comparison of managed queues, scheduling control, and operational overhead for ML workloads.
- GPU Inference Cost Optimization Playbook: a field guide to batching, quantization, and capacity planning for production inference.
- Cloud Billing Alerts That Catch Real Regressions: how to design spend alerts that map to deployment mistakes instead of generic noise.