Cloud Infrastructure

Vector Sharding with Qdrant on Kubernetes [2026] Guide

Dillip Chowdary
Tech Entrepreneur & Innovator · May 01, 2026 · 10 min read

Bottom Line

On self-hosted Kubernetes, Qdrant scales semantic search well, but shard planning is not automatic. If you choose the right shard count up front and verify distribution continuously, you avoid the two most expensive mistakes: hot nodes and forced collection rebuilds.

Key Takeaways

  • Use a 3-node StatefulSet for resilient self-hosted Qdrant clusters.
  • Set shard_number as a multiple of node count; Qdrant recommends at least 2 shards per node.
  • Enable distributed mode with config.cluster.enabled: true; peer-to-peer traffic uses port 6335.
  • Self-hosted scale-out does not rebalance shards automatically; use the cluster API to move shards.

Vector search usually fails to scale for boring reasons: one node gets too hot, shard counts are chosen once and regretted forever, and cluster expansion happens without verifying where data actually lives. Qdrant gives you the building blocks to avoid that, and Kubernetes gives you stable identities and storage through StatefulSets. The trick is understanding that distributed mode, replication, and shard movement are related but separate controls. This walkthrough shows the setup that holds up under real growth.

Prerequisites and shard plan

Prerequisites

  • A working Kubernetes cluster with dynamic persistent volumes.
  • kubectl and helm installed locally.
  • Network access between Qdrant peers on port 6335 and client/API access on port 6333.
  • A namespace where you can run a StatefulSet with at least three replicas.
  • A shard plan before first ingest, because shard_number cannot be changed later in self-hosted open source deployments without recreating the collection.

Bottom Line

Treat shard count as a capacity decision, not a tuning afterthought. In self-hosted Qdrant on Kubernetes, scale-out is straightforward, but even distribution still depends on you.

As of May 01, 2026, the public Qdrant Helm chart site lists chart 1.17.1, which deploys Qdrant v1.17.1. Qdrant’s distributed deployment guide also recommends using a shard count that is a multiple of your node count, and at least 2 shards per node if you want headroom for future expansion.

  • For a 3-node cluster, start with 6 or 12 shards.
  • Use replication_factor: 2 when you want one extra copy without tripling storage.
  • Use replication_factor: 3 when read availability matters more than storage cost.
  • Remember that scaling the StatefulSet is not the same thing as redistributing existing shards.
Watch out: If you create a collection with too few shards, Kubernetes can add pods later, but Qdrant will not magically gain parallelism. In self-hosted mode, that usually means rebuilding the collection.
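To make the capacity decision concrete, here is a small Python sketch of the shard-plan arithmetic described above. The helper names are ours, not part of any Qdrant API:

```python
# Hypothetical helpers for sanity-checking a shard plan before first ingest.

def plan_shards(node_count: int, shards_per_node: int = 2) -> int:
    """Return a shard_number that is a multiple of node_count,
    with at least `shards_per_node` shards per node for headroom."""
    if node_count < 1:
        raise ValueError("need at least one node")
    return node_count * shards_per_node

def divides_evenly(shard_number: int, node_count: int) -> bool:
    """True when shards can spread evenly across the nodes."""
    return shard_number % node_count == 0

print(plan_shards(3))         # 6 for a 3-node cluster
print(divides_evenly(12, 4))  # True: 12 shards also cover a future 4-node cluster
print(divides_evenly(6, 4))   # False: 6 shards leave a 4-node cluster uneven
```

Running this before you create the collection costs nothing; recreating a badly sharded collection later costs a full reingest.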

Step 1: Deploy a distributed cluster

The official Helm chart already uses a StatefulSet, which is exactly what you want for stable pod names and persistent storage. The minimum changes for a distributed cluster are increasing replicaCount and enabling config.cluster.enabled.

Create the values file

replicaCount: 3

config:
  cluster:
    enabled: true

persistence:
  size: 100Gi


Install the chart

helm repo add qdrant https://qdrant.github.io/qdrant-helm
helm repo update
helm upgrade -i qdrant qdrant/qdrant \
  --namespace vector-search \
  --create-namespace \
  -f values.yaml

Wait until all pods are ready:

kubectl get pods -n vector-search -w

Expected state:

  • Three Qdrant pods become Running and READY 1/1.
  • The workload type is a StatefulSet, not a stateless Deployment.
  • Persistent volume claims are bound for each replica.

Step 2: Create a sharded collection

Now create a collection with shard math that matches the cluster you just deployed. For a 3-node cluster, 6 shards is a practical starting point because it spreads work evenly and still leaves room for moderate growth.

Forward the API locally

kubectl port-forward -n vector-search svc/qdrant 6333:6333

Create the collection

curl -X PUT http://localhost:6333/collections/docs \
  -H 'Content-Type: application/json' \
  -d '{
    "vectors": {
      "size": 4,
      "distance": "Cosine"
    },
    "shard_number": 6,
    "replication_factor": 2
  }'

This example uses a tiny 4-dimensional vector just to verify the cluster mechanics quickly. In production, your vector size must match the embedding model you actually use. The important part here is the relationship between shard_number, node count, and replication.

  • shard_number controls how many logical partitions the collection has.
  • replication_factor controls how many copies of each shard exist.
  • More shards improve distribution flexibility, but too many add overhead.
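The storage implication of these two knobs is simple multiplication, which is worth writing down before you size volumes. A quick Python sketch (illustrative helpers, not a Qdrant API):

```python
# Back-of-the-envelope math for shard_number and replication_factor.

def physical_shards(shard_number: int, replication_factor: int) -> int:
    """Total shard copies the cluster must store."""
    return shard_number * replication_factor

def shards_per_node(shard_number: int, replication_factor: int,
                    node_count: int) -> float:
    """Average shard copies each node carries when placement is even."""
    return physical_shards(shard_number, replication_factor) / node_count

# The collection above: 6 shards, replication_factor 2, on 3 nodes.
print(physical_shards(6, 2))      # 12 shard copies in total
print(shards_per_node(6, 2, 3))   # 4.0 copies per node
```

Note that replication_factor multiplies storage across the whole collection, so moving from 2 to 3 costs another full copy of the data, not another 50%.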

Step 3: Load data and run queries

Before you benchmark anything, prove that writes, reads, and fan-out search work across the cluster. Use a small synthetic dataset first. If you later test with real payloads that include customer or document metadata, scrub that data before sharing it in logs or debugging sessions.

Upsert sample points

curl -X PUT 'http://localhost:6333/collections/docs/points?wait=true' \
  -H 'Content-Type: application/json' \
  -d '{
    "points": [
      {
        "id": 1,
        "vector": [0.91, 0.05, 0.02, 0.02],
        "payload": {"topic": "kubernetes", "tier": "infra"}
      },
      {
        "id": 2,
        "vector": [0.10, 0.80, 0.05, 0.05],
        "payload": {"topic": "qdrant", "tier": "infra"}
      },
      {
        "id": 3,
        "vector": [0.12, 0.10, 0.70, 0.08],
        "payload": {"topic": "security", "tier": "platform"}
      }
    ]
  }'

Because the request sets wait=true, the response reports the operation as completed once the write is applied (without wait, Qdrant acknowledges the write before it finishes). Then issue a query through the newer universal query endpoint, /points/query, instead of the older, deprecated search route.

Run a nearest-neighbor query

curl -X POST http://localhost:6333/collections/docs/points/query \
  -H 'Content-Type: application/json' \
  -d '{
    "query": [0.88, 0.06, 0.03, 0.03],
    "limit": 2,
    "with_payload": true
  }'

Expected result:

  • The point with id 1 as the top match, since its vector is closest to the query.
  • A response object containing status: ok.
  • Points returned under result.points, which confirms you are using the current query API.
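If you script this verification, the response can be validated in a few lines. The sketch below parses a hand-built dict shaped like the /points/query response; the scores are invented for illustration:

```python
# Illustrative response in the shape returned by POST /points/query.
response = {
    "status": "ok",
    "result": {
        "points": [
            {"id": 1, "score": 0.99,
             "payload": {"topic": "kubernetes", "tier": "infra"}},
            {"id": 2, "score": 0.31,
             "payload": {"topic": "qdrant", "tier": "infra"}},
        ]
    },
}

# The current query API nests hits under result.points.
assert response["status"] == "ok"
top = response["result"]["points"][0]
print(top["id"], top["payload"]["topic"])  # 1 kubernetes
```

If your script finds hits directly under result rather than result.points, you are on the old search endpoint, not the universal query API.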

Verify and inspect shard placement

This is the step teams skip, and it is where most scale problems start. You need to verify both cluster health and actual collection distribution.

Check cluster status

curl http://localhost:6333/cluster

Expected output:

  • status: ok at the top level.
  • A cluster result other than disabled once distributed mode is active.
  • Peer entries for all three pods, confirming the nodes formed a cluster rather than running as standalone instances.

Inspect collection shard distribution

curl http://localhost:6333/collections/docs/cluster

Expected output:

  • shard_count equals 6.
  • local_shards and remote_shards entries show where shards live.
  • Shard states appear as Active after the collection settles.
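This placement check is easy to automate. The sketch below counts shards per peer from a hand-built dict shaped like the /collections/docs/cluster result (the peer IDs are examples):

```python
from collections import Counter

# Illustrative result in the shape returned by GET /collections/docs/cluster:
# local_shards live on the responding peer, remote_shards carry a peer_id.
cluster = {
    "peer_id": 111111111,
    "shard_count": 6,
    "local_shards": [{"shard_id": 0, "state": "Active"},
                     {"shard_id": 3, "state": "Active"}],
    "remote_shards": [
        {"shard_id": 1, "peer_id": 222222222, "state": "Active"},
        {"shard_id": 4, "peer_id": 222222222, "state": "Active"},
        {"shard_id": 2, "peer_id": 333333333, "state": "Active"},
        {"shard_id": 5, "peer_id": 333333333, "state": "Active"},
    ],
}

# Count shard copies per peer, starting with the local peer.
counts = Counter({cluster["peer_id"]: len(cluster["local_shards"])})
for shard in cluster["remote_shards"]:
    counts[shard["peer_id"]] += 1

imbalance = max(counts.values()) - min(counts.values())
print(dict(counts), "imbalance:", imbalance)  # each peer holds 2, imbalance 0
```

An imbalance of 0 or 1 is healthy; anything larger means one node is doing disproportionate work and is your future hot node.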

If you later scale from three pods to four, do not assume the new pod receives useful work automatically. Self-hosted Qdrant exposes the cluster API for manual movement:

curl -X POST http://localhost:6333/collections/docs/cluster \
  -H 'Content-Type: application/json' \
  -d '{
    "move_shard": {
      "shard_id": 0,
      "from_peer_id": 111111111,
      "to_peer_id": 222222222,
      "method": "stream_records"
    }
  }'

Use the real peer IDs from your cluster output. This is the key operational distinction: Kubernetes adds compute, but Qdrant needs explicit shard placement changes to use that compute well.
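If you script rebalancing after scale-out, a greedy planner is usually enough: repeatedly move a shard from the fullest peer to the emptiest until no peer holds more than one shard above the minimum. This sketch generates move_shard request bodies from an assumed placement map; the peer IDs are examples, and real moves should be issued one at a time while watching transfer state:

```python
# Hypothetical greedy planner producing move_shard payloads for the
# POST /collections/{name}/cluster endpoint.

def plan_moves(placement: dict[int, list[int]]) -> list[dict]:
    """Propose moves until max and min shard counts differ by at most 1."""
    moves = []
    while True:
        hot = max(placement, key=lambda p: len(placement[p]))
        cold = min(placement, key=lambda p: len(placement[p]))
        if len(placement[hot]) - len(placement[cold]) <= 1:
            return moves
        shard_id = placement[hot].pop()
        placement[cold].append(shard_id)
        moves.append({"move_shard": {"shard_id": shard_id,
                                     "from_peer_id": hot,
                                     "to_peer_id": cold,
                                     "method": "stream_records"}})

# A fourth peer (444) just joined empty; shards 0-5 sit on the original three.
placement = {111: [0, 1], 222: [2, 3], 333: [4, 5], 444: []}
moves = plan_moves(placement)
print(moves)  # one move: a shard streams from peer 111 to the empty peer 444
```

The planner stops at a spread of one because with 6 shards on 4 peers a perfectly equal split is impossible; chasing it would just shuffle shards forever.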

Pro tip: When growth is likely, choose 12 shards on day one for a small cluster. Qdrant’s own guidance calls out 12 as a flexible count because it divides evenly across expansion paths like 1, 2, 3, 4, 6, and 12 nodes.

Troubleshooting and what’s next

Troubleshooting top 3

  1. New nodes stay mostly idle after scale-out. This is expected in self-hosted Qdrant. Scaling the StatefulSet adds peers, but existing shards are not auto-rebalanced; inspect /collections/docs/cluster and move shards explicitly.
  2. The collection cannot use all peers evenly. Your shard_number is probably too low or not a clean multiple of the current node count. In self-hosted deployments, the fix is usually creating a new collection and migrating data.
  3. Helm upgrade fails on immutable StatefulSet fields. The Helm chart documentation calls out StatefulSet immutability. If you change forbidden spec fields, Kubernetes rejects the patch; plan storage and service shape early, and use snapshots before disruptive changes.

What’s next

  • Add workload-aware benchmarking so you measure query latency by payload filter and shard fan-out, not just by raw QPS.
  • Introduce snapshots and restore drills before you call the cluster production-ready.
  • Wrap collection creation in infrastructure code so shard_number and replication_factor are reviewed, not improvised.
  • If you need automatic rebalancing or live resharding, evaluate whether Qdrant Cloud fits better than a fully self-managed Kubernetes path.

Frequently Asked Questions

How many shards should I use for Qdrant on Kubernetes?
Start with a shard_number that is a multiple of your current node count. Qdrant’s distributed deployment guidance also recommends at least 2 shards per node so you can expand later without immediately recreating the collection.
Does Qdrant automatically rebalance shards after I scale my Kubernetes StatefulSet?
Not in a self-hosted open source deployment. Adding replicas gives you more peers, but you still need to inspect shard placement and use POST /collections/{collection}/cluster operations to move shards when distribution is uneven.
Can I change shard_number after creating a Qdrant collection?
In self-hosted Qdrant, treat shard_number as effectively fixed for the lifetime of the collection. If you got the number wrong, the practical fix is usually to create a new collection with the right layout and migrate data.
Why use a Kubernetes StatefulSet instead of a Deployment for Qdrant?
Qdrant is a stateful system, so it benefits from stable pod identity and persistent storage. StatefulSets provide both, which makes peer discovery, storage attachment, and rolling operations much safer than a stateless Deployment.
