Developer Reference

Cloud Cost Management [2026] Serverless GPU Spot Cheat Sheet

Dillip Chowdary
Tech Entrepreneur & Innovator · May 03, 2026 · 11 min read

Bottom Line

The cheapest GPU hour is the one you can interrupt safely. Treat spot capacity as a scheduling strategy, not a discount code: diversify pools, checkpoint aggressively, and keep a small on-demand escape hatch.

Key Takeaways

  • AWS Spot can save up to 90%; Google Cloud Spot discounts are 60-91%; Azure Spot can be up to 90% cheaper.
  • For AWS Batch Spot, AWS recommends SPOT_PRICE_CAPACITY_OPTIMIZED over older allocation strategies.
  • Google Cloud Batch GPU jobs use provisioningModel: SPOT plus accelerators in the allocation policy.
  • Azure Spot VMs can be evicted with a minimum 30-second notice; AWS Spot gives a 2-minute interruption notice.
  • Real savings come from retries, checkpoints, and mixed-capacity queues, not from picking a single cheap SKU.

As of May 3, 2026, the economics are clear: GPU spot capacity is still the fastest path to cheaper training, batch inference, rendering, and ETL, but only if your platform can survive interruption without turning retries into waste. This cheat sheet compresses current AWS, Google Cloud, and Azure guidance into an operator-focused reference: what to choose, which flags matter, and how to wire the configs, plus a live filter and keyboard shortcuts for finding the right command under pressure.

  • AWS Spot offers savings of up to 90% versus On-Demand, with a 2-minute interruption notice.
  • Google Cloud Spot VMs discount many machine types and GPUs by 60-91%.
  • Azure Spot VMs can be up to 90% cheaper, with a minimum 30-second eviction notice.
  • For GPU jobs, managed batch schedulers beat hand-rolled autoscaling when your workload is already queue-driven.

Decision Matrix

“Serverless GPU spot” usually means a managed scheduler or job service placing work onto interruptible GPU-backed nodes for you. That makes AWS Batch and Google Cloud Batch the cleanest defaults. Azure still has strong spot economics, but the operating model is more VM-centric for many GPU workloads.

Bottom Line

Use a managed batch plane first, then optimize across GPU families and zones. The winning pattern is mixed capacity + checkpointing + interruption-aware retries, not chasing the single lowest hourly price.

Platform | Spot savings signal | Interruption signal | GPU path | Best use
AWS | Up to 90% vs On-Demand | 2-minute notice | AWS Batch on EC2 Spot-backed compute environments | Queue-driven training, batch inference, rendering
Google Cloud | 60-91% off for many machine types and GPUs | Preemption can happen any time | Cloud Batch with provisioningModel: SPOT | GPU batch jobs, zonal diversification, simple JSON/YAML workflows
Azure | Up to 90% cheaper than pay-as-you-go | Minimum 30-second eviction notice | Spot GPU VMs or VM scale sets | Interruptible GPU pools where VM-level control is acceptable

Choose managed batch when

  • Your workload already enters the system through a queue or job spec.
  • You can split work into shards, scenes, prompts, files, or epochs.
  • You want retries, logs, and scheduling without maintaining a full Kubernetes control plane.

Avoid all-spot designs when

  • You have strict end-user latency targets.
  • You cannot checkpoint model state or partial outputs.
  • Capacity spikes are deadline-bound and missed work is more expensive than on-demand compute.

Commands by Purpose

Keep these grouped by operator intent: discover, provision, submit, inspect, and clean up. Before sharing logs, billing exports, or account-scoped snippets in tickets, scrub them with the Data Masking Tool.

Discover capacity

aws ec2 describe-spot-price-history \
  --instance-types g5.xlarge g6.xlarge \
  --product-descriptions Linux/UNIX \
  --start-time 2026-05-03T00:00:00Z

gcloud compute accelerator-types list \
  --zones=us-central1-a
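
For Azure parity, the closest discovery step is listing which GPU SKUs a region actually exposes before provisioning. A minimal sketch; the Standard_N size filter is an assumption, so narrow it to the families you run:

az vm list-skus \
  --location eastus \
  --size Standard_N \
  --output table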

Provision the scheduling layer

aws batch create-compute-environment \
  --compute-environment-name gpu-spot-ce \
  --type MANAGED \
  --state ENABLED \
  --service-role AWSBatchServiceRole \
  --compute-resources type=SPOT,allocationStrategy=SPOT_PRICE_CAPACITY_OPTIMIZED,minvCpus=0,maxvCpus=256,instanceTypes=[g5,g6],subnets=[subnet-AAA],securityGroupIds=[sg-BBB],instanceRole=ecsInstanceRole,spotIamFleetRole=AmazonEC2SpotFleetTaggingRole

az group create -n gpu-spot-rg -l eastus

az vm create \
  --resource-group gpu-spot-rg \
  --name gpu-spot-vm \
  --image Ubuntu2204 \
  --size GPU_VM_SIZE \
  --admin-username azureuser \
  --generate-ssh-keys \
  --priority Spot \
  --max-price -1 \
  --eviction-policy Deallocate
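
If you want an interruptible pool rather than a single worker, a Spot-priority scale set maps more closely to the "VM scale sets" path in the decision matrix. A minimal sketch; GPU_VM_SIZE and the instance count are placeholders:

az vmss create \
  --resource-group gpu-spot-rg \
  --name gpu-spot-vmss \
  --image Ubuntu2204 \
  --vm-sku GPU_VM_SIZE \
  --instance-count 2 \
  --priority Spot \
  --max-price -1 \
  --eviction-policy Deallocate \
  --admin-username azureuser \
  --generate-ssh-keys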

Submit GPU work

aws batch submit-job \
  --job-name gpu-inference-spot \
  --job-queue gpu-queue \
  --job-definition gpu-jobdef \
  --container-overrides resourceRequirements=[{type=GPU,value=1},{type=VCPU,value=8},{type=MEMORY,value=32768}]

gcloud batch jobs submit gpu-spot-job \
  --location=us-central1 \
  --config=job.json

Inspect and re-queue

aws batch describe-jobs --jobs JOB_ID
gcloud batch jobs describe gpu-spot-job --location=us-central1
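
describe-jobs answers "what happened to this job"; re-queueing usually starts from "what failed in this queue". One way to pull that list on AWS, assuming the gpu-queue from the submit example:

aws batch list-jobs \
  --job-queue gpu-queue \
  --job-status FAILED \
  --query 'jobSummaryList[].[jobId,jobName,statusReason]' \
  --output table

Failed entries whose statusReason points at host termination are the ones worth resubmitting to a spot or on-demand lane.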

Clean up idle cost

aws batch update-compute-environment \
  --compute-environment gpu-spot-ce \
  --state DISABLED

gcloud batch jobs delete gpu-spot-job --location=us-central1
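
For the Azure examples above, stopping idle cost means deallocating or deleting what the provisioning step created; these assume the gpu-spot-rg and gpu-spot-vm names from earlier:

az vm deallocate --resource-group gpu-spot-rg --name gpu-spot-vm
az group delete --name gpu-spot-rg --yes --no-wait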
Watch out: AWS Batch supports GPUs on EC2-backed compute resources, not on Fargate. If your reference architecture says “serverless GPU” and “Fargate” in the same sentence, it needs a correction.

Configuration Patterns

The safest pattern is a small, explicit config surface: multiple GPU families, interruption-tolerant queues, and no hidden defaults around capacity type.

AWS Batch Spot GPU compute environment

{
  "computeEnvironmentName": "gpu-spot-ce",
  "type": "MANAGED",
  "state": "ENABLED",
  "serviceRole": "AWSBatchServiceRole",
  "computeResources": {
    "type": "SPOT",
    "allocationStrategy": "SPOT_PRICE_CAPACITY_OPTIMIZED",
    "minvCpus": 0,
    "maxvCpus": 256,
    "instanceTypes": ["g5", "g6"],
    "subnets": ["subnet-AAA"],
    "securityGroupIds": ["sg-BBB"],
    "instanceRole": "ecsInstanceRole",
    "spotIamFleetRole": "AmazonEC2SpotFleetTaggingRole"
  }
}
  • Prefer families over a single SKU so the scheduler has room to move.
  • Use g5 and g6 only if both fit your driver and CUDA constraints.
  • Keep minvCpus at zero for purely opportunistic queues.
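
The compute environment alone runs nothing: the gpu-queue and gpu-jobdef referenced in the submit example still need to exist. A minimal sketch of both; the ECR image URI is a placeholder:

aws batch create-job-queue \
  --job-queue-name gpu-queue \
  --state ENABLED \
  --priority 1 \
  --compute-environment-order order=1,computeEnvironment=gpu-spot-ce

aws batch register-job-definition \
  --job-definition-name gpu-jobdef \
  --type container \
  --container-properties '{
    "image": "ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/gpu-worker:latest",
    "resourceRequirements": [
      {"type": "GPU", "value": "1"},
      {"type": "VCPU", "value": "8"},
      {"type": "MEMORY", "value": "32768"}
    ]
  }'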

Google Cloud Batch Spot GPU job

{
  "taskGroups": [
    {
      "taskSpec": {
        "runnables": [
          {
            "container": {
              "imageUri": "us-docker.pkg.dev/PROJECT_ID/images/gpu-worker:latest"
            }
          }
        ]
      },
      "taskCount": 8,
      "parallelism": 4
    }
  ],
  "allocationPolicy": {
    "instances": [
      {
        "installGpuDrivers": true,
        "policy": {
          "provisioningModel": "SPOT",
          "reservation": "NO_RESERVATION",
          "accelerators": [
            {
              "type": "nvidia-tesla-t4",
              "count": 1
            }
          ]
        }
      }
    ],
    "location": {
      "allowedLocations": ["regions/us-central1"]
    }
  },
  "logsPolicy": {
    "destination": "CLOUD_LOGGING"
  }
}
  • Set installGpuDrivers when you want Batch to install the required drivers for the specified GPU type.
  • Use allowedLocations to restrict scheduling to regions that actually expose the accelerator you want (see the zone check after this list).
  • Block reservations explicitly with NO_RESERVATION when you want clean spot behavior.
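
To back the allowedLocations bullet with data, check which zones in the region actually expose the accelerator type from the config; the zone list below is an assumption:

gcloud compute accelerator-types list \
  --zones=us-central1-a,us-central1-b,us-central1-c \
  --filter="name=nvidia-tesla-t4"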

Operator defaults that pay off

  • Checkpoint model state, scene progress, or shard offsets to object storage (a minimal wrapper is sketched after this list).
  • Make outputs idempotent so replayed tasks overwrite or skip cleanly.
  • Keep retry logic in the job layer, not only in application code.
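
A minimal shape for the checkpoint bullet, assuming an S3 bucket and a train.py that accepts a --resume flag; both are illustrative, not any specific framework's API:

# Resume from the last checkpoint if an earlier attempt left one behind.
# AWS Batch sets AWS_BATCH_JOB_ID; bucket, paths, and train.py flags are assumptions.
CKPT_URI="s3://YOUR_BUCKET/checkpoints/${AWS_BATCH_JOB_ID:-local}/latest.pt"
RESUME_ARGS=""
if aws s3 ls "$CKPT_URI" > /dev/null 2>&1; then
  aws s3 cp "$CKPT_URI" /tmp/latest.pt
  RESUME_ARGS="--resume /tmp/latest.pt"
fi
# The worker uploads fresh checkpoints to $CKPT_URI as it progresses.
python train.py $RESUME_ARGS --checkpoint-uri "$CKPT_URI"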

Live Search + Shortcuts

Cheat sheets get used under pressure. Add a tiny client-side filter so readers can jump to commands by provider, task, or keyword.

Tag | Snippet | Why it matters
AWS | create-compute-environment | Creates the spot-backed scheduler pool.
AWS | submit-job | Overrides GPU, CPU, and memory at submit time.
GCP | gcloud batch jobs submit | Submits a Spot GPU job from JSON or YAML.
Azure | az vm create --priority Spot | Fast path for an interruptible GPU worker.
Ops | DISABLED or delete | Stops idle cost from lingering after tests.
// Assumes the page provides <input id="gpu-spot-filter">, rows tagged with
// data-filter-row="provider command keywords", and sections with the ids
// "commands-by-purpose" and "config-patterns".
const input = document.getElementById('gpu-spot-filter');
const rows = [...document.querySelectorAll('[data-filter-row]')];

function runFilter(value) {
  const q = value.trim().toLowerCase();
  rows.forEach((row) => {
    // Case-insensitive match so tag casing in the markup does not matter.
    row.style.display = row.dataset.filterRow.toLowerCase().includes(q) ? '' : 'none';
  });
}

input?.addEventListener('input', (e) => runFilter(e.target.value));

document.addEventListener('keydown', (e) => {
  if (e.key === 'Escape') {
    if (input) {
      input.value = '';
      runFilter('');
      input.blur();
    }
    return;
  }
  // Ignore the navigation shortcuts while the user is typing in the filter.
  if (e.target === input) {
    return;
  }
  if (e.key === '/') {
    e.preventDefault();
    input?.focus();
  }
  if (e.key.toLowerCase() === 'g') {
    document.getElementById('commands-by-purpose')?.scrollIntoView({behavior: 'smooth'});
  }
  if (e.key.toLowerCase() === 'c') {
    document.getElementById('config-patterns')?.scrollIntoView({behavior: 'smooth'});
  }
});

Keyboard shortcuts

Shortcut | Action | Use case
/ | Focus filter | Jump into search without touching the mouse.
g | Go to commands | Fast access during incident response or rollout.
c | Go to configuration | Open the JSON patterns immediately.
Esc | Clear filter | Reset hidden rows and exit search.

Advanced Usage

Once the basic queue works, the remaining savings come from reducing wasted retries and improving scheduling flexibility.

Capacity strategy

  • Split work into a spot lane and a deadline lane instead of forcing every task onto interruptible capacity (see the queue sketch after this list).
  • Diversify across GPU families and regions when model requirements allow it.
  • Use one queue for cheap opportunistic work and another for recovery or SLA-sensitive replay.
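
The two-lane split is mostly queue wiring. A sketch of the deadline lane, assuming a separate on-demand compute environment; gpu-ondemand-ce is a placeholder you would create like gpu-spot-ce but with type EC2:

aws batch create-job-queue \
  --job-queue-name gpu-deadline-lane \
  --state ENABLED \
  --priority 10 \
  --compute-environment-order order=1,computeEnvironment=gpu-ondemand-ce

Work that misses its spot-lane deadline gets resubmitted here instead of waiting out another reclaim cycle.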

Retry and checkpoint design

  • Checkpoint by completed batch, shard, epoch, or frame range.
  • Write deterministic output paths so retries do not duplicate artifacts.
  • Persist retry metadata outside the worker node so eviction does not erase progress history.
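
On AWS Batch, one way to keep retries in the job layer is a retry strategy that retries host-level failures, which is how Spot reclaims surface, but exits fast on application errors. The attempt count and the gpu-container.json file name are illustrative:

aws batch register-job-definition \
  --job-definition-name gpu-jobdef \
  --type container \
  --container-properties file://gpu-container.json \
  --retry-strategy '{
    "attempts": 5,
    "evaluateOnExit": [
      {"onStatusReason": "Host EC2*", "action": "RETRY"},
      {"onReason": "*", "action": "EXIT"}
    ]
  }'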

Cost telemetry that actually matters

  • Track successful work per dollar, not only raw hourly price.
  • Measure interruption rate by GPU family and zone.
  • Separate queue wait time from execution time so cheap capacity does not hide slow delivery.
  • Correlate requeues with specific capacity pools before expanding them.
Pro tip: For GPU batch inference, the highest-leverage optimization is often smaller task granularity, not a different instance family. Shorter tasks reduce the amount of work you lose per eviction and make spot volatility cheaper to absorb.
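
A crude starting point for the work side of "work per dollar" is counting terminal job states per queue straight from the job layer; pair it with your billing export for the dollar side. Assumes the gpu-queue name from earlier:

for status in SUCCEEDED FAILED; do
  count=$(aws batch list-jobs \
    --job-queue gpu-queue \
    --job-status "$status" \
    --query 'length(jobSummaryList)' \
    --output text)
  echo "$status: $count"
done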

Practical guardrails

  • Cap per-job runtime if your platform struggles with very large checkpoint intervals.
  • Do not treat historical spot price alone as a placement policy.
  • Budget for a controlled on-demand fallback instead of emergency ad hoc recovery.
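
Guardrails are only as good as the last time they were exercised. On Azure you can rehearse an eviction against a test VM before trusting the recovery path; this assumes the gpu-spot-vm from the provisioning example. AWS offers a comparable rehearsal through Fault Injection Service Spot interruption experiments.

az vm simulate-eviction \
  --resource-group gpu-spot-rg \
  --name gpu-spot-vm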

Frequently Asked Questions

Can I run GPU jobs on AWS Fargate Spot?
No for AWS Batch GPU workloads. AWS Batch documents GPU support for EC2-backed compute resources, and the AWS CLI reference states GPUs aren't available for jobs running on Fargate resources. If you need managed scheduling plus GPUs, use EC2 Spot-backed Batch compute environments.
What is the safest default allocation strategy for AWS Batch Spot?
Use SPOT_PRICE_CAPACITY_OPTIMIZED unless you have a specific reason not to. AWS documents it as the recommended strategy because it balances lower interruption risk with price, instead of chasing the absolute cheapest pool.
How do Google Cloud Batch GPU jobs request Spot capacity?
Set provisioningModel to SPOT inside the job's allocation policy and define the GPU in accelerators. If you want Batch to install drivers automatically, set installGpuDrivers to true.
How much eviction notice do Azure Spot VMs give?
Azure documents a minimum 30-second eviction notice. That is enough for lightweight shutdown hooks and checkpoint signals, but not enough to rely on large in-memory flushes, so design recovery outside the node.
