Developer Reference

Embedding Models for Semantic Search [2026 Cheat Sheet]

Dillip Chowdary
Tech Entrepreneur & Innovator · May 07, 2026 · 12 min read

Bottom Line

For most production semantic search systems, text-embedding-3-small is the default pick on cost-per-quality. Move to text-embedding-3-large when multilingual recall and ranking quality matter more than vector size and token cost.

Key Takeaways

  • text-embedding-3-small: 1536 dims, 62.3% MTEB, $0.02 per 1M tokens.
  • text-embedding-3-large: 3072 dims, 64.6% MTEB, 54.9% MIRACL, $0.13 per 1M tokens.
  • The dimensions parameter lets v3 models shorten vectors at write time.
  • OpenAI recommends cosine similarity; embeddings are normalized to length 1.
  • Batch pricing cuts embedding costs in half for offline indexing jobs.

As of May 07, 2026, OpenAI's docs still position text-embedding-3-small and text-embedding-3-large as the current text embedding defaults for semantic search. The practical decision is not "best model overall," but which vector size, benchmark profile, and token cost fit your index. This reference compresses the model numbers, setup steps, search UI patterns, and tuning levers you need to ship a fast retrieval stack without re-reading multiple docs pages.

  • text-embedding-3-small is the best default when cost and storage matter.
  • text-embedding-3-large buys higher retrieval quality, especially on multilingual workloads.
  • dimensions is the cleanest lever for shrinking vectors without changing model family.
  • Cosine similarity remains the safest default for ranking and nearest-neighbor search.
Metric                    | text-embedding-3-small | text-embedding-3-large | Edge
Default vector size       | 1536                   | 3072                   | Small
MTEB average              | 62.3%                  | 64.6%                  | Large
MIRACL average            | 44.0%                  | 54.9%                  | Large
Price per 1M tokens       | $0.02                  | $0.13                  | Small
Batch price per 1M tokens | $0.01                  | $0.065                 | Small
Max input in docs         | 8191 tokens            | 8191 tokens            | Tie

Model Selector

Bottom Line

Pick text-embedding-3-small first, then upgrade only if your eval set proves that text-embedding-3-large meaningfully improves ranking quality. In most stacks, index design and chunk quality matter more than jumping to the larger model too early.

What the current numbers say

  • OpenAI lists text-embedding-3-small at 1536 default dimensions and text-embedding-3-large at 3072.
  • On OpenAI's published benchmark summary, text-embedding-3-large leads on both MTEB and MIRACL.
  • text-embedding-3-small is still the price-performance leader for high-volume semantic search.
  • text-embedding-ada-002 remains available, but it is now the older baseline rather than the default recommendation.

When to choose each model

Choose text-embedding-3-small when:

  • You are indexing large corpora and storage cost compounds quickly.
  • Your corpus is mostly English and your eval set already clears quality targets.
  • You want cheaper backfills, refresh jobs, and continuous re-embedding.
  • You care more about throughput than squeezing out the last few points of recall.

Choose text-embedding-3-large when:

  • You need stronger multilingual retrieval quality.
  • Your queries are short, ambiguous, or terminology-heavy.
  • You rank a small but high-value corpus where better recall justifies higher cost.
  • You plan to shorten vectors with dimensions but still want the larger model family.

Live Search JS Filter

A cheat sheet is only useful if engineers can jump to the right command fast. The pattern below gives you a page-local filter that hides unmatched cards, highlights a focused result, and supports /, Esc, j, and k without introducing a framework dependency.

<input id='doc-filter' type='search' placeholder='Filter commands, flags, models' />
<div id='ref-grid'>
  <section class='ref-card' data-tags='small default cheap mteb miracl'>Small model notes</section>
  <section class='ref-card' data-tags='large multilingual recall dimensions'>Large model notes</section>
  <section class='ref-card' data-tags='curl node python embeddings create'>API commands</section>
</div>

<script>
const input = document.getElementById('doc-filter');
const cards = [...document.querySelectorAll('.ref-card')];
let active = 0;

function renderFilter() {
  const q = input.value.trim().toLowerCase();
  const visible = [];

  cards.forEach((card) => {
    const haystack = (card.textContent + ' ' + card.dataset.tags).toLowerCase();
    const match = !q || haystack.includes(q);
    card.hidden = !match;
    card.classList.remove('is-active');
    if (match) visible.push(card);
  });

  if (!visible.length) return;
  active = Math.min(active, visible.length - 1);
  visible[active].classList.add('is-active');
}

input.addEventListener('input', () => {
  active = 0;
  renderFilter();
});

document.addEventListener('keydown', (event) => {
  const typing = document.activeElement === input;

  // Only hijack '/' when the user is not already typing in the filter.
  if (event.key === '/' && !typing) {
    event.preventDefault();
    input.focus();
    return;
  }

  if (event.key === 'Escape') {
    input.value = '';
    input.blur();
    active = 0;
    renderFilter();
    return;
  }

  // Never treat j/k as navigation while the filter input has focus.
  if (typing) return;

  const visible = cards.filter((card) => !card.hidden);
  if (!visible.length) return;

  if (event.key === 'j') {
    active = Math.min(active + 1, visible.length - 1);
    renderFilter();
  }

  if (event.key === 'k') {
    active = Math.max(active - 1, 0);
    renderFilter();
  }
});

renderFilter();
</script>
  • Keep searchable metadata in data-tags so UI labels can stay short.
  • Prefer page-local filtering over remote search for static reference content.
  • Use a single is-active class to support keyboard focus styling and scroll anchoring; a sketch follows this list.
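
For the scroll anchoring part, one extra call at the end of renderFilter() is enough. A minimal sketch; block: 'nearest' only scrolls when the focused card is actually off screen:

// Call with visible[active] after the is-active class is set in renderFilter().
function anchorActiveCard(card) {
  card.scrollIntoView({ block: 'nearest' });
}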

Keyboard Shortcuts

These shortcuts fit a sticky ToC plus live filter layout and keep the reference usable without a mouse.

Shortcut | Action                        | Why it matters
/        | Focus filter input            | Fastest path to a command or model note.
Esc      | Clear filter and blur input   | Gets you back to full-page scan mode.
j        | Move to next visible card     | Works well with dense cheat sheets.
k        | Move to previous visible card | Mirrors terminal-style navigation.
g m      | Jump to model selector        | Useful when comparing small vs large.
g c      | Jump to commands section      | Good default for implementation-heavy readers.
?        | Open shortcut help            | Reduces discoverability friction.
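
The filter script above only wires up /, Esc, j, and k. A minimal sketch for the g sequences, assuming the model selector and commands sections carry the ids model-selector and commands (both ids are placeholders, not from the markup above):

let pendingG = false;

document.addEventListener('keydown', (event) => {
  if (document.activeElement === input) return; // never hijack typing

  if (event.key === 'g') {
    pendingG = true;
    return;
  }

  if (pendingG) {
    // Map the second key of the sequence to a section id.
    const targets = { m: 'model-selector', c: 'commands' };
    const id = targets[event.key];
    if (id) document.getElementById(id)?.scrollIntoView({ behavior: 'smooth' });
    pendingG = false;
  }
});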

Commands Grouped by Purpose

Install and authenticate

npm install openai
pip install openai numpy
export OPENAI_API_KEY='your_api_key_here'

Create a baseline embedding

curl https://api.openai.com/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "input": "How do I rotate API keys safely?",
    "model": "text-embedding-3-small",
    "encoding_format": "float"
  }'
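
The response nests the vector under data; an abridged sketch of the shape (vector values truncated, token counts illustrative):

{
  "object": "list",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0023, -0.0091, 0.0154] }
  ],
  "model": "text-embedding-3-small",
  "usage": { "prompt_tokens": 9, "total_tokens": 9 }
}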

Create a shorter vector at write time

curl https://api.openai.com/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "input": "How do I rotate API keys safely?",
    "model": "text-embedding-3-large",
    "dimensions": 1024,
    "encoding_format": "float"
  }'
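
The same write-time shortening from Node.js; a sketch using the official SDK, where the returned vector length should match the requested dimensions:

import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Ask for 1024 dims up front so nothing needs truncating client-side.
const short = await client.embeddings.create({
  model: 'text-embedding-3-large',
  input: 'How do I rotate API keys safely?',
  dimensions: 1024,
  encoding_format: 'float'
});

console.log(short.data[0].embedding.length); // 1024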

Generate embeddings in Node.js

import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const result = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Zero-downtime schema migration checklist',
  encoding_format: 'float'
});

console.log(result.data[0].embedding.length);

Rank results with cosine similarity

function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i += 1) sum += a[i] * b[i];
  return sum;
}

function cosineSimilarity(a, b) {
  let aNorm = 0;
  let bNorm = 0;
  for (let i = 0; i < a.length; i += 1) {
    aNorm += a[i] * a[i];
    bNorm += b[i] * b[i];
  }
  return dot(a, b) / (Math.sqrt(aNorm) * Math.sqrt(bNorm));
}
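
A usage sketch for the ranking step; rows, queryVec, and the top-5 cutoff are illustrative names, and every vector is assumed to come from the same model and dimension count:

// Score each stored chunk against the query vector, highest first.
const ranked = rows
  .map((row) => ({ ...row, score: cosineSimilarity(queryVec, row.embedding) }))
  .sort((a, b) => b.score - a.score)
  .slice(0, 5);

console.log(ranked.map((r) => [r.chunk_id, r.score.toFixed(4)]));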
  • Use text-embedding-3-small for the first version of a search feature unless your evals say otherwise.
  • Set encoding_format to float when you want plain numeric arrays to score in application code.
  • For offline reindex jobs, OpenAI's pricing page lists lower Batch pricing than standard embedding requests; a request-line sketch follows this list.
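
For a Batch reindex, each embedding request becomes one line of the JSONL input file. A hedged sketch of a single line, with an illustrative custom_id:

{"custom_id": "kb_1042_07", "method": "POST", "url": "/v1/embeddings", "body": {"model": "text-embedding-3-small", "input": "Rotating API keys without breaking production traffic", "encoding_format": "float"}}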

Configuration

Index defaults worth standardizing

  • Store the model name with every vector row so backfills and migrations stay auditable.
  • Store the resolved dimension count as a first-class field, especially if you use dimensions.
  • Keep chunk text, chunk ID, source ID, and content hash next to the embedding metadata.
  • Re-embed a whole index when you change model family, chunking policy, or normalization logic; a skip-if-unchanged sketch follows this list.
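
A minimal skip-if-unchanged sketch matching the schema below; contentHash and needsReembed are illustrative helpers, not SDK functions:

import { createHash } from 'node:crypto';

// Mirror the content_hash field stored next to each vector.
function contentHash(text) {
  return 'sha256:' + createHash('sha256').update(text, 'utf8').digest('hex');
}

// Re-embed only when the model changed or the chunk text changed.
function needsReembed(row, currentText, currentModel) {
  return row.model !== currentModel || row.content_hash !== contentHash(currentText);
}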

Minimal document schema

{
  "doc_id": "kb_1042",
  "chunk_id": "kb_1042_07",
  "model": "text-embedding-3-small",
  "dimensions": 1536,
  "content_hash": "sha256:...",
  "text": "Rotating API keys without breaking production traffic",
  "embedding": [0.0123, -0.0441, 0.0088]
}
Watch out: Do not mix vectors from different models or different dimensions in the same similarity field. A clean migration path beats silent ranking drift.

Security and test data hygiene

  • Do not embed raw secrets, private keys, or production-only credentials.
  • Sanitize support logs, tickets, and user content before building eval corpora.
  • If you need quick redaction before indexing examples, the Data Masking Tool is the right fit.
  • When you paste snippets into docs or demos, the Code Formatter helps keep reference blocks readable.

Advanced Usage

Shortening vectors without changing models

  • Use the API-level dimensions parameter first; it is simpler and safer than truncating arrays yourself.
  • If you manually cut vectors after retrieval, normalize them again before scoring; a sketch follows this list.
  • A smaller vector can reduce memory pressure, network payloads, and vector DB storage cost.
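
A sketch of post-truncation normalization; fullVector and the 256-dim cut are illustrative:

// Rescale a truncated vector to unit length so cosine and dot stay meaningful.
function normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return vec.map((x) => x / norm);
}

const truncated = normalize(fullVector.slice(0, 256));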

Similarity, storage, and retrieval choices

  • OpenAI recommends cosine similarity; because embeddings are normalized to length 1, dot product gives the same ranking as cosine.
  • For large collections, use a vector database instead of brute-force nearest-neighbor scans.
  • Keep one embedding model per index unless you have a deliberate hybrid retrieval design.

Operational details that matter

  • The embeddings FAQ notes that text-embedding-3-small and text-embedding-3-large do not know about events after September 2021.
  • That cutoff matters less for pure semantic similarity than for generated answers, but it still affects taxonomy labels and date-sensitive corpora.
  • Always evaluate on your own query-document pairs before paying for the larger model.
Pro tip: Before upgrading models, try better chunking, cleaner document titles, and query-time metadata filters. Those changes often move semantic search quality more than a model switch.

Frequently Asked Questions

Which OpenAI embedding model should I use for semantic search in 2026?
Start with text-embedding-3-small unless your evaluation set shows a clear recall or ranking gain from text-embedding-3-large. The smaller model is much cheaper per token and usually good enough for English-heavy corpora. Move up only when multilingual search, ambiguous queries, or high-value retrieval justify the extra vector cost.
Should I use cosine similarity or Euclidean distance for OpenAI embeddings?
Use cosine similarity as the default. OpenAI says its embeddings are normalized to length 1, which means cosine similarity and Euclidean distance produce identical rankings. In practice, many teams compute a plain dot product because it is slightly simpler and faster.
Can I reduce embedding dimensions without rebuilding everything from scratch?
You can request fewer dimensions up front with the dimensions parameter on text-embedding-3-small and text-embedding-3-large. That is the preferred path because the API returns vectors at the size you actually want to store. If you already truncated vectors manually, normalize them before scoring.
What chunking strategy works best for semantic search?
There is no universal chunk size, so treat chunking as an evaluation problem rather than a rule-of-thumb problem. Start with coherent, self-contained passages that preserve titles, headings, and nearby context. If results are noisy, improve chunk boundaries and metadata before assuming you need a larger model.
Is Batch worth using for reindexing embeddings?
Yes, especially for offline backfills and scheduled refresh jobs. OpenAI's pricing page lists lower Batch pricing for embeddings than standard requests, so the savings add up quickly on large corpora. It is less relevant for latency-sensitive user queries, but very relevant for bulk indexing.
