Developer Reference

Embedding Models for Semantic Search [2026 Cheat Sheet]

Dillip Chowdary
Tech Entrepreneur & Innovator · May 07, 2026 · 12 min read

Bottom Line

For most production semantic search systems, text-embedding-3-small is the default pick on cost-per-quality. Move to text-embedding-3-large when multilingual recall and ranking quality matter more than vector size and token cost.

Key Takeaways

  • text-embedding-3-small: 1536 dims, 62.3% MTEB, $0.02 per 1M tokens.
  • text-embedding-3-large: 3072 dims, 64.6% MTEB, 54.9% MIRACL, $0.13 per 1M tokens.
  • The dimensions parameter lets v3 models shorten vectors at write time.
  • OpenAI recommends cosine similarity; embeddings are normalized to length 1.
  • Batch pricing cuts embedding costs in half for offline indexing jobs.

As of May 07, 2026, OpenAI's docs still position text-embedding-3-small and text-embedding-3-large as the current text embedding defaults for semantic search. The practical decision is not "best model overall," but which vector size, benchmark profile, and token cost fit your index. This reference compresses the model numbers, setup steps, search UI patterns, and tuning levers you need to ship a fast retrieval stack without re-reading multiple docs pages.

  • text-embedding-3-small is the best default when cost and storage matter.
  • text-embedding-3-large buys higher retrieval quality, especially on multilingual workloads.
  • dimensions is the cleanest lever for shrinking vectors without changing model family.
  • Cosine similarity remains the safest default for ranking and nearest-neighbor search.
Metric                    | text-embedding-3-small | text-embedding-3-large | Edge
Default vector size       | 1536                   | 3072                   | Small
MTEB average              | 62.3%                  | 64.6%                  | Large
MIRACL average            | 44.0%                  | 54.9%                  | Large
Price per 1M tokens       | $0.02                  | $0.13                  | Small
Batch price per 1M tokens | $0.01                  | $0.065                 | Small
Max input in docs         | 8191 tokens            | 8191 tokens            | Tie

Model Selector

Bottom Line

Pick text-embedding-3-small first, then upgrade only if your eval set proves that text-embedding-3-large meaningfully improves ranking quality. In most stacks, index design and chunk quality matter more than jumping to the larger model too early.

What the current numbers say

  • OpenAI lists text-embedding-3-small at 1536 default dimensions and text-embedding-3-large at 3072.
  • On OpenAI's published benchmark summary, text-embedding-3-large leads on both MTEB and MIRACL.
  • text-embedding-3-small is still the price-performance leader for high-volume semantic search.
  • text-embedding-ada-002 remains available, but it is now the older baseline rather than the default recommendation.

When to choose each model

Choose text-embedding-3-small when:

  • You are indexing large corpora and storage cost compounds quickly.
  • Your corpus is mostly English and your eval set already clears quality targets.
  • You want cheaper backfills, refresh jobs, and continuous re-embedding.
  • You care more about throughput than squeezing out the last few points of recall.

Choose text-embedding-3-large when:

  • You need stronger multilingual retrieval quality.
  • Your queries are short, ambiguous, or terminology-heavy.
  • You rank a small but high-value corpus where better recall justifies higher cost.
  • You plan to shorten vectors with dimensions but still want the larger model family.

Live Search JS Filter

A cheat sheet is only useful if engineers can jump to the right command fast. The pattern below gives you a page-local filter that hides unmatched cards, highlights a focused result, and supports /, Esc, j, and k without introducing a framework dependency.

<input id='doc-filter' type='search' placeholder='Filter commands, flags, models' />
<div id='ref-grid'>
  <section class='ref-card' data-tags='small default cheap mteb miracl'>Small model notes</section>
  <section class='ref-card' data-tags='large multilingual recall dimensions'>Large model notes</section>
  <section class='ref-card' data-tags='curl node python embeddings create'>API commands</section>
</div>

<script>
const input = document.getElementById('doc-filter');
const cards = [...document.querySelectorAll('.ref-card')];
let active = 0;

function renderFilter() {
  const q = input.value.trim().toLowerCase();
  const visible = [];

  cards.forEach((card) => {
    const haystack = (card.textContent + ' ' + card.dataset.tags).toLowerCase();
    const match = !q || haystack.includes(q);
    card.hidden = !match;
    card.classList.remove('is-active');
    if (match) visible.push(card);
  });

  if (!visible.length) return;
  active = Math.min(active, visible.length - 1);
  visible[active].classList.add('is-active');
}

input.addEventListener('input', () => {
  active = 0;
  renderFilter();
});

document.addEventListener('keydown', (event) => {
  const typing = document.activeElement === input;

  // Only hijack '/' when the user is not already typing in the filter.
  if (event.key === '/' && !typing) {
    event.preventDefault();
    input.focus();
    return;
  }

  if (event.key === 'Escape') {
    input.value = '';
    input.blur();
    active = 0;
    renderFilter();
    return;
  }

  // Never treat j/k as navigation while the filter input has focus.
  if (typing) return;

  const visible = cards.filter((card) => !card.hidden);
  if (!visible.length) return;

  if (event.key === 'j') {
    active = Math.min(active + 1, visible.length - 1);
    renderFilter();
  }

  if (event.key === 'k') {
    active = Math.max(active - 1, 0);
    renderFilter();
  }
});

renderFilter();
</script>
  • Keep searchable metadata in data-tags so UI labels can stay short.
  • Prefer page-local filtering over remote search for static reference content.
  • Use a single is-active class to support keyboard focus styling and scroll anchoring; a sketch follows this list.
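
For the scroll anchoring part, one extra call at the end of renderFilter() is enough. A minimal sketch; block: 'nearest' only scrolls when the focused card is actually off screen:

// Call with visible[active] after the is-active class is set in renderFilter().
function anchorActiveCard(card) {
  card.scrollIntoView({ block: 'nearest' });
}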

Keyboard Shortcuts

These shortcuts fit a sticky ToC plus live filter layout and keep the reference usable without a mouse.

Shortcut | Action                        | Why it matters
/        | Focus filter input            | Fastest path to a command or model note.
Esc      | Clear filter and blur input   | Gets you back to full-page scan mode.
j        | Move to next visible card     | Works well with dense cheat sheets.
k        | Move to previous visible card | Mirrors terminal-style navigation.
g m      | Jump to model selector        | Useful when comparing small vs large.
g c      | Jump to commands section      | Good default for implementation-heavy readers.
?        | Open shortcut help            | Reduces discoverability friction.
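
The filter script above only wires up /, Esc, j, and k. A minimal sketch for the g sequences, assuming the model selector and commands sections carry the ids model-selector and commands (both ids are placeholders, not from the markup above):

let pendingG = false;

document.addEventListener('keydown', (event) => {
  if (document.activeElement === input) return; // never hijack typing

  if (event.key === 'g') {
    pendingG = true;
    return;
  }

  if (pendingG) {
    // Map the second key of the sequence to a section id.
    const targets = { m: 'model-selector', c: 'commands' };
    const id = targets[event.key];
    if (id) document.getElementById(id)?.scrollIntoView({ behavior: 'smooth' });
    pendingG = false;
  }
});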

Commands Grouped by Purpose

Install and authenticate

npm install openai
pip install openai numpy
export OPENAI_API_KEY='your_api_key_here'

Create a baseline embedding

curl https://api.openai.com/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "input": "How do I rotate API keys safely?",
    "model": "text-embedding-3-small",
    "encoding_format": "float"
  }'
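
The response nests the vector under data; an abridged sketch of the shape (vector values truncated, token counts illustrative):

{
  "object": "list",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0023, -0.0091, 0.0154] }
  ],
  "model": "text-embedding-3-small",
  "usage": { "prompt_tokens": 9, "total_tokens": 9 }
}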

Create a shorter vector at write time

curl https://api.openai.com/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "input": "How do I rotate API keys safely?",
    "model": "text-embedding-3-large",
    "dimensions": 1024,
    "encoding_format": "float"
  }'
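
The same write-time shortening from Node.js; a sketch using the official SDK, where the returned vector length should match the requested dimensions:

import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Ask for 1024 dims up front so nothing needs truncating client-side.
const short = await client.embeddings.create({
  model: 'text-embedding-3-large',
  input: 'How do I rotate API keys safely?',
  dimensions: 1024,
  encoding_format: 'float'
});

console.log(short.data[0].embedding.length); // 1024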

Generate embeddings in Node.js

import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const result = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Zero-downtime schema migration checklist',
  encoding_format: 'float'
});

console.log(result.data[0].embedding.length);

Rank results with cosine similarity

function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i += 1) sum += a[i] * b[i];
  return sum;
}

function cosineSimilarity(a, b) {
  let aNorm = 0;
  let bNorm = 0;
  for (let i = 0; i < a.length; i += 1) {
    aNorm += a[i] * a[i];
    bNorm += b[i] * b[i];
  }
  return dot(a, b) / (Math.sqrt(aNorm) * Math.sqrt(bNorm));
}
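
A usage sketch for the ranking step; rows, queryVec, and the top-5 cutoff are illustrative names, and every vector is assumed to come from the same model and dimension count:

// Score each stored chunk against the query vector, highest first.
const ranked = rows
  .map((row) => ({ ...row, score: cosineSimilarity(queryVec, row.embedding) }))
  .sort((a, b) => b.score - a.score)
  .slice(0, 5);

console.log(ranked.map((r) => [r.chunk_id, r.score.toFixed(4)]));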
  • Use text-embedding-3-small for the first version of a search feature unless your evals say otherwise.
  • Set encoding_format to float when you want plain numeric arrays to score in application code.
  • For offline reindex jobs, OpenAI's pricing page lists lower Batch pricing than standard embedding requests; a request-line sketch follows this list.
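
For a Batch reindex, each embedding request becomes one line of the JSONL input file. A hedged sketch of a single line, with an illustrative custom_id:

{"custom_id": "kb_1042_07", "method": "POST", "url": "/v1/embeddings", "body": {"model": "text-embedding-3-small", "input": "Rotating API keys without breaking production traffic", "encoding_format": "float"}}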

Configuration

Index defaults worth standardizing

  • Store the model name with every vector row so backfills and migrations stay auditable.
  • Store the resolved dimension count as a first-class field, especially if you use dimensions.
  • Keep chunk text, chunk ID, source ID, and content hash next to the embedding metadata.
  • Re-embed a whole index when you change model family, chunking policy, or normalization logic; a skip-if-unchanged sketch follows this list.
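
A minimal skip-if-unchanged sketch matching the schema below; contentHash and needsReembed are illustrative helpers, not SDK functions:

import { createHash } from 'node:crypto';

// Mirror the content_hash field stored next to each vector.
function contentHash(text) {
  return 'sha256:' + createHash('sha256').update(text, 'utf8').digest('hex');
}

// Re-embed only when the model changed or the chunk text changed.
function needsReembed(row, currentText, currentModel) {
  return row.model !== currentModel || row.content_hash !== contentHash(currentText);
}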

Minimal document schema

{
  "doc_id": "kb_1042",
  "chunk_id": "kb_1042_07",
  "model": "text-embedding-3-small",
  "dimensions": 1536,
  "content_hash": "sha256:...",
  "text": "Rotating API keys without breaking production traffic",
  "embedding": [0.0123, -0.0441, 0.0088]
}
Watch out: Do not mix vectors from different models or different dimensions in the same similarity field. A clean migration path beats silent ranking drift.

Security and test data hygiene

  • Do not embed raw secrets, private keys, or production-only credentials.
  • Sanitize support logs, tickets, and user content before building eval corpora.
  • If you need quick redaction before indexing examples, the Data Masking Tool is the right fit.
  • When you paste snippets into docs or demos, the Code Formatter helps keep reference blocks readable.

Advanced Usage

Shortening vectors without changing models

  • Use the API-level dimensions parameter first; it is simpler and safer than truncating arrays yourself.
  • If you manually cut vectors after retrieval, normalize them again before scoring; a sketch follows this list.
  • A smaller vector can reduce memory pressure, network payloads, and vector DB storage cost.
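
A sketch of post-truncation normalization; fullVector and the 256-dim cut are illustrative:

// Rescale a truncated vector to unit length so cosine and dot stay meaningful.
function normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return vec.map((x) => x / norm);
}

const truncated = normalize(fullVector.slice(0, 256));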

Similarity, storage, and retrieval choices

  • OpenAI recommends cosine similarity; because embeddings are normalized to length 1, dot product gives the same ranking as cosine.
  • For large collections, use a vector database instead of brute-force nearest-neighbor scans.
  • Keep one embedding model per index unless you have a deliberate hybrid retrieval design.

Operational details that matter

  • The embeddings FAQ notes that text-embedding-3-small and text-embedding-3-large do not know about events after September 2021.
  • That cutoff matters less for pure semantic similarity than for generated answers, but it still affects taxonomy labels and date-sensitive corpora.
  • Always evaluate on your own query-document pairs before paying for the larger model.
Pro tip: Before upgrading models, try better chunking, cleaner document titles, and query-time metadata filters. Those changes often move semantic search quality more than a model switch.

Frequently Asked Questions

Which OpenAI embedding model should I use for semantic search in 2026?
Start with text-embedding-3-small unless your evaluation set shows a clear recall or ranking gain from text-embedding-3-large. The smaller model is much cheaper per token and usually good enough for English-heavy corpora. Move up only when multilingual search, ambiguous queries, or high-value retrieval justify the extra vector cost.
Should I use cosine similarity or Euclidean distance for OpenAI embeddings?
Use cosine similarity as the default. OpenAI says its embeddings are normalized to length 1, which means cosine similarity and Euclidean distance produce identical rankings. In practice, many teams compute a plain dot product because it is slightly simpler and faster.
Can I reduce embedding dimensions without rebuilding everything from scratch?
You can request fewer dimensions up front with the dimensions parameter on text-embedding-3-small and text-embedding-3-large. That is the preferred path because the API returns vectors at the size you actually want to store. If you already truncated vectors manually, normalize them before scoring.
What chunking strategy works best for semantic search?
There is no universal chunk size, so treat chunking as an evaluation problem rather than a rule-of-thumb problem. Start with coherent, self-contained passages that preserve titles, headings, and nearby context. If results are noisy, improve chunk boundaries and metadata before assuming you need a larger model.
Is Batch worth using for reindexing embeddings?
Yes, especially for offline backfills and scheduled refresh jobs. OpenAI's pricing page lists lower Batch pricing for embeddings than standard requests, so the savings add up quickly on large corpora. It is less relevant for latency-sensitive user queries, but very relevant for bulk indexing.
