WebAssembly SIMD & Threads Flags [Cheat Sheet 2026]
Bottom Line
For production WebAssembly performance, the core switches are simple: use -msimd128 for vector code and -pthread for parallel code. The real wins come from the surrounding setup: worker pools, cross-origin isolation, memory sizing, and allocator choice.
Key Takeaways
- -msimd128 enables Wasm SIMD and also turns on LLVM autovectorization.
- -pthread must be present at compile and link time for Emscripten pthread builds.
- -sPTHREAD_POOL_SIZE prewarms workers so pthread_create() can start threads synchronously.
- Threaded Wasm needs cross-origin isolation: COOP: same-origin plus COEP: require-corp or credentialless.
- -sMALLOC=mimalloc can scale better under thread contention, but it costs more code size and memory.
WebAssembly performance tuning gets confusing when SIMD flags, pthread settings, browser isolation headers, and memory knobs all overlap. This reference keeps the surface area small and practical: the exact Emscripten switches that matter for SIMD and multi-threading, how to group them by purpose, and which supporting settings usually determine whether a build is merely valid or actually fast in production.
Quick Reference
Bottom Line
Use -msimd128 to unlock Wasm SIMD, and use -pthread only when you can also ship proper cross-origin isolation headers. Most production regressions come from missing worker pools, main-thread blocking, or bad memory defaults rather than from the SIMD flag itself.
| Purpose | Flag / Header | What it changes | When to use it |
|---|---|---|---|
| SIMD | -msimd128 | Enables Wasm SIMD and LLVM autovectorization. | Default starting point for vector-friendly hot paths. |
| SIMD | -mrelaxed-simd | Targets Relaxed SIMD intrinsics. | Only when your code explicitly targets Relaxed SIMD behavior. |
| SIMD control | -fno-vectorize -fno-slp-vectorize | Disables autovectorization even with SIMD enabled. | When you want manual SIMD only. |
| Threads | -pthread | Enables Emscripten pthread code generation. | Required at compile and link time for threaded builds. |
| Threads | -sPTHREAD_POOL_SIZE=n | Precreates workers before main(). | When thread creation latency or sync startup matters. |
| Threads | -sPROXY_TO_PTHREAD | Runs your real main() on a pthread. | Recommended when UI-thread blocking is a risk. |
| Memory | -sINITIAL_HEAP=... | Sets initial dynamic heap size. | Preferred over INITIAL_MEMORY in most cases. |
| Memory | -sALLOW_MEMORY_GROWTH | Lets memory expand at runtime. | Use when peak memory is uncertain. |
| Memory | -sMAXIMUM_MEMORY=... | Caps growth; default is 2GB when growth is enabled. | Set explicitly for large workloads. |
| Allocator | -sMALLOC=mimalloc | Improves scaling under allocator contention. | Large threaded apps with heavy malloc/free traffic. |
| Link | -flto | Enables cross-module optimization at compile and link time. | Release builds where link time is acceptable. |
| Headers | COOP + COEP | Enables cross-origin isolation for shared memory. | Mandatory for browser pthread deployments. |
SIMD Flags
Core compile patterns
- -msimd128 is the baseline switch for WebAssembly SIMD.
- That same flag also enables LLVM autovectorization passes.
- If you want manual intrinsics without auto-vectorized rewrites, add -fno-vectorize -fno-slp-vectorize.
- Use the preprocessor guard __wasm_simd128__ to gate SIMD-specific code paths.
emcc -O3 -msimd128 src/hotpath.cpp -o dist/app.js
emcc -O3 -msimd128 -fno-vectorize -fno-slp-vectorize src/hotpath.cpp -o dist/app.js
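As a rough illustration of what the autovectorizer looks for, the sketch below shows a simple element-wise loop. The function name `saxpy` and its signature are hypothetical, not taken from any project here, and whether LLVM actually vectorizes it depends on the optimization level and compiler version. The `__restrict` qualifiers are a compiler extension that promises the two arrays do not overlap, which typically makes vectorization easier to prove safe.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel: y[i] = a * x[i] + y[i] for i in [0, n).
// A plain counted loop with no aliasing between x and y is the kind of code
// that -msimd128 gives LLVM a chance to autovectorize at -O2/-O3.
void saxpy(float a, const float* __restrict x, float* __restrict y, std::size_t n) {
    assert((x != nullptr && y != nullptr) || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Building the same file with and without -fno-vectorize -fno-slp-vectorize and comparing timings is one way to check whether autovectorization helps this particular loop.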
Intrinsics and source compatibility
- Use #include <wasm_simd128.h> for native Wasm SIMD intrinsics.
- Existing x86 intrinsic code can often be compiled with -msimd128 plus the matching ISA flag such as -msse2 or -msse4.1.
- Existing ARM NEON code can also be targeted through Emscripten’s SIMD support.
- Use -mrelaxed-simd only when you intentionally target Relaxed SIMD intrinsics.
emcc -O3 -msimd128 -msse4.1 src/sse_port.cpp -o dist/app.js
#ifdef __wasm_simd128__
// SIMD-only path
#endif
#ifdef __wasm_relaxed_simd__
// Relaxed SIMD path
#endif
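To make the guard pattern concrete, here is a minimal sketch of one function with a SIMD path and a scalar fallback in the same source file. The helper name `sum_f32` is hypothetical. The intrinsics used are from wasm_simd128.h, and the header is only included when __wasm_simd128__ is defined, so the file still compiles when built without -msimd128.

```cpp
#include <cassert>
#include <cstddef>

#ifdef __wasm_simd128__
#include <wasm_simd128.h>
#endif

// Hypothetical helper: returns the sum of `n` floats starting at `data`.
float sum_f32(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::size_t i = 0;
    float total = 0.0f;
#ifdef __wasm_simd128__
    // SIMD path: accumulate four lanes per iteration.
    v128_t acc = wasm_f32x4_splat(0.0f);
    for (; i + 4 <= n; i += 4) {
        acc = wasm_f32x4_add(acc, wasm_v128_load(data + i));
    }
    // Horizontal reduction of the four accumulator lanes.
    total = wasm_f32x4_extract_lane(acc, 0) + wasm_f32x4_extract_lane(acc, 1) +
            wasm_f32x4_extract_lane(acc, 2) + wasm_f32x4_extract_lane(acc, 3);
#endif
    // Scalar path: handles the tail under SIMD, or every element without it.
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

The two paths add elements in a different order, so floating-point results can differ slightly between SIMD and non-SIMD builds for inputs that are not exactly representable.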
Threading Flags
Build flags that actually matter
- -pthread is required when compiling source files and again when linking the final artifact.
- -sPTHREAD_POOL_SIZE preloads workers before main(), which avoids async startup surprises.
- -sPROXY_TO_PTHREAD moves your real main() off the browser main thread.
- You cannot ship one binary that transparently falls back from multithreaded to single-threaded mode. Build both variants and choose at runtime.
emcc -O3 -pthread src/app.cpp -o dist/app-threads.js
emcc -O3 -pthread \
-sPTHREAD_POOL_SIZE=navigator.hardwareConcurrency \
-sPROXY_TO_PTHREAD \
src/app.cpp -o dist/app-threads.js
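For context on what these flags support on the C++ side, the following is a minimal sketch of a fork-join computation using the standard pthread API. The names `Slice`, `sum_slice`, and `parallel_sum` are hypothetical. It assumes the worker count does not exceed the prewarmed pool size, so that each pthread_create() call can start its thread without waiting on the event loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <vector>

// Hypothetical work item: one contiguous slice of the input plus its result.
struct Slice {
    const int* data;
    std::size_t count;
    long long sum;
};

// Thread entry point: sums one slice and stores the result in the slice.
static void* sum_slice(void* arg) {
    Slice* slice = static_cast<Slice*>(arg);
    slice->sum = 0;
    for (std::size_t i = 0; i < slice->count; ++i) {
        slice->sum += slice->data[i];
    }
    return nullptr;
}

// Splits `values` into `workers` slices, sums each on its own thread, and
// combines the partial sums after joining.
long long parallel_sum(const std::vector<int>& values, std::size_t workers) {
    assert(workers > 0);
    std::vector<Slice> slices(workers);
    std::vector<pthread_t> threads(workers);
    std::vector<char> started(workers, 0);
    const std::size_t chunk = (values.size() + workers - 1) / workers;

    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(w * chunk, values.size());
        const std::size_t end = std::min(begin + chunk, values.size());
        slices[w] = Slice{values.data() + begin, end - begin, 0};
        // pthread_create returns 0 on success; on failure, do the work inline.
        started[w] = pthread_create(&threads[w], nullptr, sum_slice, &slices[w]) == 0;
        if (!started[w]) {
            sum_slice(&slices[w]);
        }
    }

    long long total = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        if (started[w]) {
            pthread_join(threads[w], nullptr);  // blocks the calling thread
        }
        total += slices[w].sum;
    }
    return total;
}
```

The pthread_join() calls block the caller, which is why a function like this is better invoked from a main() that -sPROXY_TO_PTHREAD has moved onto a worker than from the browser main thread.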
Browser isolation and deployment headers
- Browser pthreads depend on shared memory, which means cross-origin isolation.
- Set Cross-Origin-Opener-Policy: same-origin.
- Set Cross-Origin-Embedder-Policy: require-corp or credentialless.
- Check crossOriginIsolated at runtime before assuming SharedArrayBuffer is usable.
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
if (!crossOriginIsolated) {
console.warn('Threaded Wasm is unavailable: missing COOP/COEP isolation');
}
Blocking calls such as pthread_join or pthread_cond_wait are a bad fit for the browser main thread.
Configuration
Memory sizing
- -sINITIAL_HEAP is the preferred dynamic-heap knob in most cases.
- -sINITIAL_MEMORY still matters when you need total memory control.
- -sALLOW_MEMORY_GROWTH helps when peak memory is uncertain, but it can slow JavaScript-side heap access because typed array views must be replaced.
- -sMAXIMUM_MEMORY defaults to 2GB when growth is enabled; set it explicitly for larger workloads.
emcc -O3 -msimd128 \
-sINITIAL_HEAP=64MB \
-sALLOW_MEMORY_GROWTH \
-sMAXIMUM_MEMORY=1GB \
src/app.cpp -o dist/app.js
Allocator choice
- -sMALLOC=dlmalloc is the default general-purpose option.
- -sMALLOC=emmalloc is smaller but less focused on speed.
- -sMALLOC=mimalloc is the threaded scaling option when allocator contention shows up in profiles.
- mimalloc costs more code size and more runtime memory, so treat it as a measured optimization, not a default.
emcc -O3 -pthread -sMALLOC=mimalloc src/app.cpp -o dist/app-threads.js
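Since the allocator choice is best treated as a measured decision, a small contention probe can help before reaching for a full profiler. The sketch below is a hypothetical micro-benchmark, not a method from the Emscripten docs: `malloc_churn_ms` is an invented name, and the allocation sizes and iteration pattern are arbitrary assumptions. It uses std::thread, which Emscripten implements on top of pthreads when building with -pthread.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical probe: runs `threads` workers that each perform `iterations`
// malloc/free pairs, and returns the elapsed wall-clock time in milliseconds.
double malloc_churn_ms(unsigned threads, unsigned iterations) {
    assert(threads > 0);
    auto worker = [iterations] {
        for (unsigned i = 0; i < iterations; ++i) {
            // Vary the size a little so requests hit several size classes.
            void* block = std::malloc(16 + (i % 64) * 16);
            if (block != nullptr) {
                // Touch the block so the allocation is less likely to be elided.
                static_cast<volatile char*>(block)[0] = 1;
            }
            std::free(block);
        }
    };

    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    pool.reserve(threads);
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back(worker);
    }
    for (std::thread& th : pool) {
        th.join();
    }
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Comparing the result for the same thread and iteration counts across builds linked with -sMALLOC=dlmalloc and -sMALLOC=mimalloc gives a first hint of whether allocator contention matters; real workloads should still be profiled, because this loop does not resemble most application allocation patterns.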
Advanced Usage
Commands grouped by purpose
- Fast SIMD release: combine -O3, -msimd128, and -flto.
- Threaded browser build: combine -pthread, a worker pool, and isolation headers.
- Mixed speed and safety: add -sALLOW_MEMORY_GROWTH and an explicit -sMAXIMUM_MEMORY cap for variable workloads.
- Maximal browser-side responsiveness: add -sPROXY_TO_PTHREAD.
emcc -O3 -msimd128 -flto src/core.cpp -c -o build/core.o
emcc -O3 -msimd128 -flto build/core.o -o dist/app-simd.js
emcc -O3 -pthread -msimd128 \
-sPTHREAD_POOL_SIZE=navigator.hardwareConcurrency \
-sPROXY_TO_PTHREAD \
-sMALLOC=mimalloc \
-flto src/app.cpp -o dist/app-full.js
Operational checks before you ship
- Use the same optimization flags at compile time and link time; Emscripten’s docs describe this as usually the right choice.
- Profile across browsers because SIMD support and memory behavior can differ meaningfully by engine.
- Separate your single-threaded and pthread artifacts in CI and choose between them at runtime.
- Re-check long compile commands after edits; tiny flag drift between object compilation and link often causes misleading results.
# SIMD-only release
emcc -O3 -msimd128 -flto src/app.cpp -o dist/app-simd.js
# Threads-only release
emcc -O3 -pthread -sPTHREAD_POOL_SIZE=8 -sPROXY_TO_PTHREAD src/app.cpp -o dist/app-threads.js
# Combined high-performance build
emcc -O3 -pthread -msimd128 -flto -sPTHREAD_POOL_SIZE=8 -sPROXY_TO_PTHREAD src/app.cpp -o dist/app-max.js
Frequently Asked Questions
Which flag enables WebAssembly SIMD in Emscripten?
-msimd128. It also turns on LLVM autovectorization; to keep SIMD enabled without autovectorization, add -fno-vectorize -fno-slp-vectorize.
Do I need to pass -pthread at both compile and link time?
Yes. -pthread is required when compiling source files and again when linking the final artifact.
Why does my threaded Wasm build fail without COOP and COEP headers?
Browser pthreads depend on shared memory, which requires cross-origin isolation. Serve Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp or credentialless so SharedArrayBuffer is available.
Should I use INITIAL_HEAP or INITIAL_MEMORY for performance tuning?
Prefer INITIAL_HEAP in most cases. Use INITIAL_MEMORY when you need tighter control over total memory layout or when imported-memory constraints make heap-only sizing less suitable.
When does mimalloc help a WebAssembly pthread build?
mimalloc scales better for multithreaded malloc/free workloads, but it increases code size and memory usage, so it is best used as a measured optimization.