Cloud Infrastructure

Graviton5 Batch Orchestration on AWS [Deep Dive 2026]

Dillip Chowdary
Tech Entrepreneur & Innovator · May 15, 2026 · 8 min read

Bottom Line

The reliable pattern is simple: keep AWS Batch queues architecture-pure, publish a real ARM64 image, and let Batch launch Amazon Linux 2023 ECS hosts on Graviton5-backed capacity. As of May 15, 2026, that usually means targeting preview M9g instances where your account and Region expose them, then falling back to M8g only when access blocks rollout.

Key Takeaways

  • As of May 15, 2026, AWS documents M9g as a Graviton5 preview family.
  • AWS Batch job queues cannot mix compute environments with different CPU architectures.
  • Use ECS_AL2023 for new Batch EC2 environments; AL2 creation is blocked after June 30, 2026.
  • Build and push your container explicitly for linux/arm64 to avoid exec format errors.

AWS Batch is already good at scaling embarrassingly parallel work, but many teams still leave efficiency on the table by treating CPU architecture as an afterthought. The better pattern is to make architecture an explicit scheduling boundary. On May 15, 2026, that means pairing AWS Batch with ARM64 containers and Graviton-backed EC2 capacity, then targeting M9g where your account has Graviton5 preview access.

  • M9g offers up to 25% better compute performance than M8g, according to the AWS product page.
  • AWS Batch requires all compute environments attached to one queue to share the same architecture.
  • ECS_AL2023 is the default AMI family for new Batch EC2 environments and the right default for 2026 builds.
  • If your container is not published for linux/arm64, the fleet design is correct but the job still fails.
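One way to make that boundary explicit in submission tooling is a tiny router that refuses to send an image to a queue of the wrong architecture. This is an illustrative sketch, not an AWS Batch feature: the queue names and the image-to-architecture mapping are assumptions you would wire up yourself.

```python
# Illustrative sketch: route jobs to architecture-pure queues.
# Queue names here are hypothetical examples.
ARCH_QUEUES = {
    "arm64": "graviton5-batch-queue",
    "amd64": "x86-batch-queue",
}

def queue_for_image(image_arch: str) -> str:
    """Return the Batch queue matching the image architecture.

    Raises ValueError instead of silently submitting to a queue whose
    compute environments use a different CPU architecture.
    """
    try:
        return ARCH_QUEUES[image_arch]
    except KeyError:
        raise ValueError(f"no queue for architecture {image_arch!r}")
```

Failing fast here is cheaper than letting the scheduler place the job and watching the container die with an exec format error.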

Prerequisites

  • An AWS account with AWS Batch, EC2, ECR, and CloudWatch Logs access.
  • AWS CLI v2, Docker with buildx, and permissions to create an ECR repository.
  • At least two subnets and one security group for the compute environment.
  • An ECS instance profile such as ecsInstanceRole, or your own equivalent instance profile ARN.
  • Access to M9g preview capacity in your Region, or willingness to swap to M8g as a temporary fallback.

Reference the official AWS docs for Amazon EC2 M9g instances, create-compute-environment, and ARM64 ECS workloads.

Watch out: AWS documents M9g as a preview family as of December 4, 2025, and the product page still shows preview enrollment on May 15, 2026. Validate Region availability before you wire the family into automation.

Step 1: Check Graviton5 Capacity

Confirm identity and instance availability

  1. Pick a Region and confirm which account you are using.
  2. Query EC2 for M9g offerings before you create any Batch resources.
  3. If the query returns nothing, keep the workflow and replace the instance family with M8g until preview access is enabled.

export AWS_REGION=us-east-1
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

aws ec2 describe-instance-type-offerings \
  --region "$AWS_REGION" \
  --filters "Name=instance-type,Values=m9g.*" \
  --query 'InstanceTypeOfferings[].InstanceType' \
  --output text

This step matters because ARM orchestration is only as good as the capacity pools behind it. AWS also notes that for EC2-backed ARM workloads you should verify Region support with describe-instance-type-offerings before deployment.
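The fallback rule in step 3 above can be expressed as a pure function over the offerings list, which makes it easy to drop into deployment scripts and unit-test. A minimal sketch; the family names are from this walkthrough:

```python
def pick_family(offerings: list[str],
                preferred: str = "m9g",
                fallback: str = "m8g") -> str:
    """Pick the preferred instance family if the Region offers any size
    of it, otherwise fall back.

    `offerings` is the flattened output of describe-instance-type-offerings,
    e.g. ["m9g.large", "m9g.xlarge"].
    """
    prefix = preferred + "."
    if any(t.startswith(prefix) for t in offerings):
        return preferred
    return fallback
```

An empty result from the CLI query maps cleanly to the M8g fallback without any special-casing.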

Step 2: Build an ARM64 Image

Create a tiny batch workload

  1. Create a minimal program that prints the runtime architecture and current Batch job ID.
  2. Build explicitly for linux/arm64 so you do not inherit your laptop's architecture by accident.
  3. Push to Amazon ECR so Batch nodes can pull it directly.

mkdir -p batch-arm64-demo
cd batch-arm64-demo

cat > app.py <<'PY'
import json
import os
import platform

print(json.dumps({
    "arch": platform.machine(),
    "job_id": os.environ.get("AWS_BATCH_JOB_ID"),
    "job_attempt": os.environ.get("AWS_BATCH_JOB_ATTEMPT")
}))
PY

cat > Dockerfile <<'EOF'
FROM python:3.12-slim
WORKDIR /app
COPY app.py .
CMD ["python", "/app/app.py"]
EOF

aws ecr create-repository \
  --repository-name batch-arm64-demo \
  --region "$AWS_REGION"

aws ecr get-login-password --region "$AWS_REGION" | \
  docker login --username AWS --password-stdin \
  "$ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com"

docker buildx create --name armbuilder --use
docker buildx inspect --bootstrap

docker buildx build \
  --platform linux/arm64 \
  -t "$ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/batch-arm64-demo:latest" \
  --push .

Pro tip: Keep the first image tiny and deterministic. You are testing scheduling and architecture alignment here, not application complexity.
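Before trusting the pushed tag, it is worth confirming the manifest actually advertises arm64. A hedged sketch that parses the JSON you would get from `docker buildx imagetools inspect --raw`; the field names follow the OCI image index format:

```python
import json

def manifest_architectures(raw_index: str) -> set[str]:
    """Extract the architectures advertised by an OCI image index.

    Note: buildx attestation entries report platform "unknown";
    the real platforms appear alongside them.
    """
    index = json.loads(raw_index)
    return {
        m.get("platform", {}).get("architecture", "unknown")
        for m in index.get("manifests", [])
    }
```

If "arm64" is missing from the result, fix the build before touching any Batch resources.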

Step 3: Create Batch Resources

Create an ARM-only compute environment

AWS Batch states that all compute environments attached to one queue must share the same architecture, so the clean design is one ARM queue for ARM jobs. Also note the current AWS warning: new ECS compute environments should use ECS_AL2023, not AL2.

cat > compute-environment.json <<'JSON'
{
  "computeEnvironmentName": "graviton5-batch-ce",
  "type": "MANAGED",
  "state": "ENABLED",
  "computeResources": {
    "type": "EC2",
    "allocationStrategy": "BEST_FIT_PROGRESSIVE",
    "minvCpus": 0,
    "maxvCpus": 128,
    "instanceTypes": ["m9g.large", "m9g.xlarge", "m9g.2xlarge"],
    "subnets": ["subnet-0123456789abcdef0", "subnet-0123456789abcdef1"],
    "securityGroupIds": ["sg-0123456789abcdef0"],
    "instanceRole": "ecsInstanceRole",
    "ec2Configuration": [
      { "imageType": "ECS_AL2023" }
    ],
    "tags": {
      "Name": "graviton5-batch"
    }
  }
}
JSON

aws batch create-compute-environment \
  --cli-input-json file://compute-environment.json \
  --region "$AWS_REGION"

Create the job queue

export COMPUTE_ENV_ARN=$(aws batch describe-compute-environments \
  --compute-environments graviton5-batch-ce \
  --region "$AWS_REGION" \
  --query 'computeEnvironments[0].computeEnvironmentArn' \
  --output text)

cat > job-queue.json <<JSON
{
  "jobQueueName": "graviton5-batch-queue",
  "state": "ENABLED",
  "priority": 10,
  "computeEnvironmentOrder": [
    {
      "order": 1,
      "computeEnvironment": "$COMPUTE_ENV_ARN"
    }
  ],
  "jobQueueType": "ECS"
}
JSON

aws batch create-job-queue \
  --cli-input-json file://job-queue.json \
  --region "$AWS_REGION"

If you do not have M9g preview access, change every m9g entry in the compute environment file to m8g. Do not mix architectures in the same queue as a workaround; AWS explicitly disallows that queue design.
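If you script that fallback, rewrite the instance types in the JSON rather than hand-editing. A sketch, assuming the compute-environment file layout shown above:

```python
import json

def swap_family(ce: dict, old: str = "m9g", new: str = "m8g") -> dict:
    """Return a copy of the compute environment spec with the instance
    family swapped size-for-size, e.g. m9g.large -> m8g.large."""
    out = json.loads(json.dumps(ce))  # deep copy via JSON round-trip
    types = out["computeResources"]["instanceTypes"]
    out["computeResources"]["instanceTypes"] = [
        t.replace(old + ".", new + ".", 1) for t in types
    ]
    return out
```

Because the sizes carry over one-to-one, the rest of the spec (allocation strategy, vCPU bounds, AMI family) needs no changes.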

Step 4: Submit and Verify

Register the job definition

cat > job-definition.json <<JSON
{
  "jobDefinitionName": "arm64-demo",
  "type": "container",
  "platformCapabilities": ["EC2"],
  "containerProperties": {
    "image": "$ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/batch-arm64-demo:latest",
    "command": ["python", "/app/app.py"],
    "resourceRequirements": [
      { "type": "VCPU", "value": "1" },
      { "type": "MEMORY", "value": "2048" }
    ]
  }
}
JSON

aws batch register-job-definition \
  --cli-input-json file://job-definition.json \
  --region "$AWS_REGION"
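Requests that cannot fit any allowed instance size are a classic cause of jobs stuck in RUNNABLE, so a pre-submit sanity check helps. The vCPU/memory table below is illustrative only, not an authoritative M9g spec:

```python
# Hypothetical vCPU / memory-MiB table for the sizes allowed in the
# compute environment above; substitute real values for your family.
SIZES = {
    "m9g.large": (2, 8192),
    "m9g.xlarge": (4, 16384),
    "m9g.2xlarge": (8, 32768),
}

def fits_somewhere(vcpus: int, memory_mib: int) -> bool:
    """True if at least one allowed size can host the request.

    In practice the ECS agent reserves some host memory, so treat a
    near-exact memory match as risky rather than safe.
    """
    return any(vcpus <= c and memory_mib <= m for c, m in SIZES.values())
```

The demo job's 1 vCPU / 2048 MiB request passes comfortably; a 16-vCPU request would never schedule on this environment.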

Submit the job

export JOB_ID=$(aws batch submit-job \
  --job-name arm64-demo-1 \
  --job-queue graviton5-batch-queue \
  --job-definition arm64-demo \
  --region "$AWS_REGION" \
  --query jobId \
  --output text)

echo "$JOB_ID"

Verification and expected output

  1. Wait for the compute environment to become VALID.
  2. Wait for the job status to move from SUBMITTED to RUNNING and then SUCCEEDED.
  3. Check the container output for aarch64.

aws batch describe-compute-environments \
  --compute-environments graviton5-batch-ce \
  --region "$AWS_REGION" \
  --query 'computeEnvironments[0].[status,state,statusReason]' \
  --output table

aws batch describe-jobs \
  --jobs "$JOB_ID" \
  --region "$AWS_REGION" \
  --query 'jobs[0].[status,container.exitCode]' \
  --output table

Expected results:

  • The compute environment reports VALID and ENABLED.
  • The job finishes with SUCCEEDED and exit code 0.
  • Your application output includes "arch": "aarch64", proving the job executed on an ARM64 runtime.
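The verification loop is easy to automate. A sketch of the state handling, with the polling call stubbed out; in a real script `poll` would wrap `aws batch describe-jobs` or a boto3 call:

```python
# Terminal AWS Batch job states.
TERMINAL = {"SUCCEEDED", "FAILED"}

def wait_for_job(poll, max_polls: int = 60) -> str:
    """Poll until the job reaches a terminal state.

    `poll` is any callable returning the current status string,
    e.g. a wrapper around `aws batch describe-jobs`. A real version
    would sleep between polls; omitted here to keep the sketch pure.
    """
    status = "SUBMITTED"
    for _ in range(max_polls):
        status = poll()
        if status in TERMINAL:
            return status
    raise TimeoutError(f"job still {status} after {max_polls} polls")
```

Raising on timeout instead of returning a non-terminal status keeps CI pipelines honest about jobs that never leave RUNNABLE.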

Troubleshooting and What's Next

Troubleshooting top 3

  • Compute environment is INVALID: Check statusReason, confirm subnet and security group reachability, and verify that your account actually sees M9g offerings in the target Region.
  • Job is stuck in RUNNABLE: The queue may reference a compute environment that is not yet VALID, your requested vCPU or memory may not fit the allowed instance sizes, or your service quotas may be too low.
  • Container fails with an architecture error: Rebuild and repush with docker buildx build --platform linux/arm64 --push. This is the most common breakage when a team migrates the fleet before migrating the image.

What's next

  • Add a second ARM queue that uses SPOT capacity for fault-tolerant single-node jobs. AWS recommends SPOT_PRICE_CAPACITY_OPTIMIZED for Spot compute resources.
  • Split job queues by latency class, not just by team. High-priority queues can map to the same ARM architecture but different capacity policies.
  • Add a launch template only when you need deeper host tuning. Otherwise, let Batch keep selecting the latest supported ECS_AL2023 AMI.
  • Standardize architecture in CI so every merge publishes both linux/arm64 and linux/amd64 images, even if Batch only consumes the ARM tag today.

The main lesson is that Graviton efficiency does not come from the instance family alone. It comes from making architecture visible in your build pipeline, queue boundaries, and AMI selection policy.

Frequently Asked Questions

Can AWS Batch mix ARM64 and x86 compute environments in one job queue?
No. AWS Batch documentation states that all compute environments attached to a single queue must share the same architecture. Use separate queues for ARM64 and x86_64 and route jobs intentionally.
Do I need a custom AMI to run Graviton5 jobs on AWS Batch?
Usually no. For EC2-backed Batch environments, AWS can select an ECS_AL2023 AMI automatically, and that is the correct default for new 2026 environments. Use a custom AMI only when you need host-level tuning or extra agents.
Why is my AWS Batch job stuck in RUNNABLE on M9g?
The most common causes are lack of M9g preview access, no matching instance offerings in the Region, an INVALID compute environment, or resource requests that do not fit the permitted instance sizes. Check describe-instance-type-offerings, the compute environment statusReason, and your requested resourceRequirements.
Should I use Spot for Graviton-based batch processing?
Yes for fault-tolerant, restartable jobs. AWS Batch supports Spot compute environments, and AWS recommends SPOT_PRICE_CAPACITY_OPTIMIZED for Spot capacity selection. Avoid Spot if your workload cannot tolerate interruption or if you rely on multi-node parallel jobs.
