Embedding Models for Semantic Search [2026 Cheat Sheet]
Bottom Line
For most production semantic search systems, text-embedding-3-small is the default pick on cost-per-quality. Move to text-embedding-3-large when multilingual recall and ranking quality matter more than vector size and token cost.
Key Takeaways
- text-embedding-3-small: 1536 dims, 62.3% MTEB, $0.02 per 1M tokens.
- text-embedding-3-large: 3072 dims, 64.6% MTEB, 54.9% MIRACL, $0.13 per 1M tokens.
- The dimensions parameter lets v3 models shorten vectors at write time.
- OpenAI recommends cosine similarity; embeddings are normalized to length 1.
- Batch pricing cuts embedding costs in half for offline indexing jobs.
As of May 07, 2026, OpenAI's docs still position text-embedding-3-small and text-embedding-3-large as the current text embedding defaults for semantic search. The practical decision is not "best model overall," but which vector size, benchmark profile, and token cost fit your index. This reference compresses the model numbers, setup steps, search UI patterns, and tuning levers you need to ship a fast retrieval stack without re-reading multiple docs pages.
- text-embedding-3-small is the best default when cost and storage matter.
- text-embedding-3-large buys higher retrieval quality, especially on multilingual workloads.
- dimensions is the cleanest lever for shrinking vectors without changing model family.
- Cosine similarity remains the safest default for ranking and nearest-neighbor search.
| Dimension | text-embedding-3-small | text-embedding-3-large | Edge |
|---|---|---|---|
| Default vector size | 1536 | 3072 | Small |
| MTEB average | 62.3% | 64.6% | Large |
| MIRACL average | 44.0% | 54.9% | Large |
| Price per 1M tokens | $0.02 | $0.13 | Small |
| Batch price per 1M tokens | $0.01 | $0.065 | Small |
| Max input in docs | 8191 tokens | 8191 tokens | Tie |
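The per-token prices in the table translate directly into index-build budgets. A minimal sketch (the PRICES map hard-codes the list prices above; check the pricing page before relying on them):

```javascript
// Rough cost estimate for embedding a corpus, using the list prices above.
// Prices are USD per 1M tokens; 'batch' is the half-price Batch tier.
const PRICES = {
  'text-embedding-3-small': { standard: 0.02, batch: 0.01 },
  'text-embedding-3-large': { standard: 0.13, batch: 0.065 },
};

function embeddingCost(model, totalTokens, tier = 'standard') {
  const perMillion = PRICES[model][tier];
  return (totalTokens / 1_000_000) * perMillion;
}

// 500M tokens through the small model on the Batch tier:
console.log(embeddingCost('text-embedding-3-small', 500_000_000, 'batch')); // 5
```

At these rates, even a full re-embed of a large corpus on the small model is cheap enough to treat as a routine maintenance job rather than a migration event.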
Model Selector
Bottom Line
Pick text-embedding-3-small first, then upgrade only if your eval set proves that text-embedding-3-large meaningfully improves ranking quality. In most stacks, index design and chunk quality matter more than jumping to the larger model too early.
What the current numbers say
- OpenAI lists text-embedding-3-small at 1536 default dimensions and text-embedding-3-large at 3072.
- On OpenAI's published benchmark summary, text-embedding-3-large leads on both MTEB and MIRACL.
- text-embedding-3-small is still the price-performance leader for high-volume semantic search.
- text-embedding-ada-002 remains available, but it is now the older baseline rather than the default recommendation.
When to choose each model
Choose text-embedding-3-small when:
- You are indexing large corpora and storage cost compounds quickly.
- Your corpus is mostly English and your eval set already clears quality targets.
- You want cheaper backfills, refresh jobs, and continuous re-embedding.
- You care more about throughput than squeezing out the last few points of recall.
Choose text-embedding-3-large when:
- You need stronger multilingual retrieval quality.
- Your queries are short, ambiguous, or terminology-heavy.
- You rank a small but high-value corpus where better recall justifies higher cost.
- You plan to shorten vectors with dimensions but still want the larger model family.
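The decision rules above can be collapsed into a small helper. This is a hypothetical sketch: the function name and option fields are illustrative, not part of any OpenAI API.

```javascript
// Hypothetical helper encoding the selection heuristics above.
// Field names (multilingual, highValueCorpus) are illustrative only.
function pickEmbeddingModel({ multilingual = false, highValueCorpus = false } = {}) {
  // Multilingual recall and high-value ranking justify the larger model.
  if (multilingual || highValueCorpus) return 'text-embedding-3-large';
  // Otherwise default to the price-performance leader.
  return 'text-embedding-3-small';
}

console.log(pickEmbeddingModel({ multilingual: true })); // text-embedding-3-large
console.log(pickEmbeddingModel()); // text-embedding-3-small
```

Treat the output as a starting point; only your own eval set can confirm that the larger model is worth the upgrade.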
Live Search JS Filter
A cheat sheet is only useful if engineers can jump to the right command fast. The pattern below gives you a page-local filter that hides unmatched cards, highlights a focused result, and supports /, Esc, j, and k without introducing a framework dependency.
<input id='doc-filter' type='search' placeholder='Filter commands, flags, models' />
<div id='ref-grid'>
<section class='ref-card' data-tags='small default cheap mteb miracl'>Small model notes</section>
<section class='ref-card' data-tags='large multilingual recall dimensions'>Large model notes</section>
<section class='ref-card' data-tags='curl node python embeddings create'>API commands</section>
</div>
<script>
const input = document.getElementById('doc-filter');
const cards = [...document.querySelectorAll('.ref-card')];
let active = 0;
function renderFilter() {
const q = input.value.trim().toLowerCase();
const visible = [];
cards.forEach((card) => {
const haystack = (card.textContent + ' ' + card.dataset.tags).toLowerCase();
const match = !q || haystack.includes(q);
card.hidden = !match;
card.classList.remove('is-active');
if (match) visible.push(card);
});
if (!visible.length) return;
active = Math.min(active, visible.length - 1);
visible[active].classList.add('is-active');
}
input.addEventListener('input', () => {
active = 0;
renderFilter();
});
document.addEventListener('keydown', (event) => {
// Ignore shortcuts while the user is typing in the filter; Escape still works.
if (document.activeElement === input && event.key !== 'Escape') return;
if (event.key === '/') {
event.preventDefault();
input.focus();
}
if (event.key === 'Escape') {
input.value = '';
input.blur();
active = 0;
renderFilter();
}
const visible = cards.filter((card) => !card.hidden);
if (!visible.length) return;
if (event.key === 'j') {
active = Math.min(active + 1, visible.length - 1);
renderFilter();
}
if (event.key === 'k') {
active = Math.max(active - 1, 0);
renderFilter();
}
});
renderFilter();
</script>
- Keep searchable metadata in data-tags so UI labels can stay short.
- Prefer page-local filtering over remote search for static reference content.
- Use a single is-active class to support keyboard focus styling and scroll anchoring.
Keyboard Shortcuts
These shortcuts fit a sticky ToC plus live filter layout and keep the reference usable without a mouse.
| Shortcut | Action | Why it matters |
|---|---|---|
| / | Focus filter input | Fastest path to a command or model note. |
| Esc | Clear filter and blur input | Gets you back to full-page scan mode. |
| j | Move to next visible card | Works well with dense cheat sheets. |
| k | Move to previous visible card | Mirrors terminal-style navigation. |
| g m | Jump to model selector | Useful when comparing small vs large. |
| g c | Jump to commands section | Good default for implementation-heavy readers. |
| ? | Open shortcut help | Reduces discoverability friction. |
Commands Grouped by Purpose
Install and authenticate
npm install openai
pip install openai numpy
export OPENAI_API_KEY='your_api_key_here'
Create a baseline embedding
curl https://api.openai.com/v1/embeddings \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"input": "How do I rotate API keys safely?",
"model": "text-embedding-3-small",
"encoding_format": "float"
}'
Create a shorter vector at write time
curl https://api.openai.com/v1/embeddings \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"input": "How do I rotate API keys safely?",
"model": "text-embedding-3-large",
"dimensions": 1024,
"encoding_format": "float"
}'
Generate embeddings in Node.js
import OpenAI from 'openai';
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const result = await client.embeddings.create({
model: 'text-embedding-3-small',
input: 'Zero-downtime schema migration checklist',
encoding_format: 'float'
});
console.log(result.data[0].embedding.length);
Rank results with cosine similarity
function dot(a, b) {
let sum = 0;
for (let i = 0; i < a.length; i += 1) sum += a[i] * b[i];
return sum;
}
function cosineSimilarity(a, b) {
let aNorm = 0;
let bNorm = 0;
for (let i = 0; i < a.length; i += 1) {
aNorm += a[i] * a[i];
bNorm += b[i] * b[i];
}
return dot(a, b) / (Math.sqrt(aNorm) * Math.sqrt(bNorm));
}
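With a similarity function in place, ranking is just a map and a sort. A self-contained usage sketch (it redefines cosineSimilarity compactly so the snippet runs on its own; the candidate shape with id and embedding fields is an assumption, not an API contract):

```javascript
// Compact cosine similarity, redefined here so this snippet is self-contained.
const dot = (a, b) => a.reduce((sum, v, i) => sum + v * b[i], 0);
const cosineSimilarity = (a, b) =>
  dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));

// Rank candidate vectors against a query vector, highest similarity first.
function rankBySimilarity(queryVec, candidates) {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(queryVec, c.embedding) }))
    .sort((a, b) => b.score - a.score);
}

const ranked = rankBySimilarity([1, 0], [
  { id: 'a', embedding: [0, 1] }, // orthogonal -> 0
  { id: 'b', embedding: [1, 0] }, // identical -> 1
  { id: 'c', embedding: [1, 1] }, // 45 degrees -> ~0.707
]);
console.log(ranked.map((r) => r.id).join(',')); // b,c,a
```

For anything beyond a few thousand vectors, push this ranking into a vector database rather than scoring in application code.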
- Use text-embedding-3-small for the first version of a search feature unless your evals say otherwise.
- Use encoding_format set to float when you want direct numeric scoring in application code.
- For offline reindex jobs, OpenAI's pricing page lists lower Batch pricing than standard embedding requests.
Configuration
Index defaults worth standardizing
- Store the model name with every vector row so backfills and migrations stay auditable.
- Store the resolved dimension count as a first-class field, especially if you use dimensions.
- Keep chunk text, chunk ID, source ID, and content hash next to the embedding metadata.
- Re-embed a whole index when you change model family, chunking policy, or normalization logic.
Minimal document schema
{
"doc_id": "kb_1042",
"chunk_id": "kb_1042_07",
"model": "text-embedding-3-small",
"dimensions": 1536,
"content_hash": "sha256:...",
"text": "Rotating API keys without breaking production traffic",
"embedding": [0.0123, -0.0441, 0.0088]
}
Security and test data hygiene
- Do not embed raw secrets, private keys, or production-only credentials.
- Sanitize support logs, tickets, and user content before building eval corpora.
- If you need quick redaction before indexing examples, the Data Masking Tool is the right fit.
- When you paste snippets into docs or demos, the Code Formatter helps keep reference blocks readable.
Advanced Usage
Shortening vectors without changing models
- Use the API-level dimensions parameter first; it is simpler and safer than truncating arrays yourself.
- If you manually cut vectors after retrieval, normalize them again before scoring.
- A smaller vector can reduce memory pressure, network payloads, and vector DB storage cost.
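If you do cut vectors manually instead of using the dimensions parameter, the re-normalization step looks like this (a sketch; the function name is illustrative):

```javascript
// Manually truncate a vector, then re-normalize to unit length so
// cosine and dot-product scoring stay consistent with API-native vectors.
function truncateAndNormalize(vec, dims) {
  const cut = vec.slice(0, dims);
  const norm = Math.sqrt(cut.reduce((sum, v) => sum + v * v, 0));
  return cut.map((v) => v / norm);
}

const v = truncateAndNormalize([3, 4, 12], 2); // keep [3, 4], norm 5
console.log(v); // [0.6, 0.8]
```

Prefer the API-level dimensions parameter when you can; this fallback is mainly for vectors you have already stored at full size.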
Similarity, storage, and retrieval choices
- OpenAI recommends cosine similarity; because embeddings are normalized to length 1, dot product gives the same ranking as cosine.
- For large collections, use a vector database instead of brute-force nearest-neighbor scans.
- Keep one embedding model per index unless you have a deliberate hybrid retrieval design.
Operational details that matter
- The embeddings FAQ notes that text-embedding-3-small and text-embedding-3-large do not know about events after September 2021.
- That cutoff matters less for pure semantic similarity than for generated answers, but it still affects taxonomy labels and date-sensitive corpora.
- Always evaluate on your own query-document pairs before paying for the larger model.
Frequently Asked Questions
Which OpenAI embedding model should I use for semantic search in 2026?
text-embedding-3-small unless your evaluation set shows a clear recall or ranking gain from text-embedding-3-large. The smaller model is much cheaper per token and usually good enough for English-heavy corpora. Move up only when multilingual search, ambiguous queries, or high-value retrieval justify the extra vector cost.
Should I use cosine similarity or Euclidean distance for OpenAI embeddings?
Cosine similarity. OpenAI recommends it, and because the API returns vectors normalized to length 1, dot product produces the same ranking with less work.
Can I reduce embedding dimensions without rebuilding everything from scratch?
Yes, via the dimensions parameter on text-embedding-3-small and text-embedding-3-large. That is the preferred path because the API returns vectors at the size you actually want to store. If you already truncated vectors manually, normalize them before scoring.
What chunking strategy works best for semantic search?
There is no universal winner; chunk quality usually matters more than model choice. Evaluate candidate chunk sizes on your own query-document pairs, and re-embed the whole index whenever you change chunking policy.
Is Batch worth using for reindexing embeddings?
Yes for offline jobs. OpenAI's pricing page lists Batch embedding prices at half the standard rate, which makes full-index backfills and scheduled refreshes substantially cheaper.