SQL Optimization [2026] Window Functions & CTE Rewrites
Bottom Line
Most slow analytic SQL is not fixed with one magic index. It gets faster when you reduce repeated scans, narrow sort work, and rewrite window and CTE patterns so the optimizer has less work to do.
Key Takeaways
- ›Use QUALIFY or a single outer filter to avoid extra ranking subqueries.
- ›Default window frames often surprise LAST_VALUE; set ROWS explicitly.
- ›A reused CTE may be recomputed; materialize once when the engine will not reuse it.
- ›Pre-aggregate before joins to reduce shuffle, sort, and spill pressure.
- ›Benchmark rewrites with the plan, not intuition: scan bytes, sorts, spills, and row counts.
High-volume analytics queries usually fail in predictable ways: repeated scans hidden behind readable CTEs, window functions that sort more data than necessary, and ranking patterns that force extra passes through already large intermediates. This cheat sheet focuses on practical rewrites you can apply fast, then verify with the plan. The goal is simple: reduce bytes read, sort width, spill risk, and duplicate work before you touch heavier physical design changes.
Key Takeaways
- Use QUALIFY or a single outer filter to avoid extra ranking subqueries.
- Default window frames often surprise LAST_VALUE; set ROWS explicitly.
- A reused CTE may be recomputed; materialize once when the engine will not reuse it.
- Pre-aggregate before joins to reduce shuffle, sort, and spill pressure.
- Benchmark rewrites with the plan, not intuition: scan bytes, sorts, spills, and row counts.
Quick Reference
Bottom Line
Rewrite for less work before you tune for more horsepower. In practice, the biggest wins come from shrinking intermediate result sets and stopping the planner from doing the same expensive step twice.
Press / to focus search, Esc to clear, j and k to jump sections, and c to copy the next visible code block.
| Shortcut | Action | Use It For |
|---|---|---|
/ |
Focus filter | Jump straight to a rewrite pattern |
Esc |
Clear filter | Reset the cheat sheet view |
j |
Next section | Move through the reference without scrolling |
k |
Previous section | Review the prior block quickly |
c |
Copy code | Grab the next visible snippet |
- Use windows when you need row context without collapsing rows.
- Use grouped aggregates when you can reduce cardinality early.
- Use temp tables or materialized stages when the same heavy subquery feeds multiple branches.
- Before sharing sample data or plans with a team, scrub sensitive fields with the Data Masking Tool.
Commands by Purpose
Filter top-N per partition
- Best when you need one row or a small top-N slice per key.
- Prefer QUALIFY where supported because it removes one nesting layer.
- If QUALIFY is unavailable, keep the ranking in one subquery and filter once.
SELECT
customer_id,
order_id,
event_ts
FROM orders
QUALIFY ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY event_ts DESC
) = 1;
Replace row-to-row self-joins
- Self-joins can explode row counts and add unnecessary join work.
- LAG and LEAD usually preserve intent with less shuffle.
SELECT
session_id,
event_ts,
event_value - LAG(event_value) OVER (
PARTITION BY session_id
ORDER BY event_ts
) AS delta
FROM events;
Fix misleading default frames
- The default frame is often fine for ranking, but not for every value function.
- LAST_VALUE is a common trap because the default frame may stop at the current peer group.
- Set ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING when you want the partition-wide last row.
SELECT
account_id,
event_ts,
LAST_VALUE(status) OVER (
PARTITION BY account_id
ORDER BY event_ts
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS final_status
FROM account_events;
Pre-aggregate before the join
- Large fact-to-dimension joins get cheaper when you collapse the fact side first.
- This is especially useful when downstream logic only needs grouped metrics, not raw rows.
WITH order_totals AS (
SELECT
customer_id,
region_id,
SUM(amount) AS gross
FROM fact_orders
WHERE order_date >= @start_date
GROUP BY customer_id, region_id
)
SELECT
ot.customer_id,
dr.region_name,
ot.gross
FROM order_totals ot
JOIN dim_region dr
ON dr.region_id = ot.region_id;
Inline or materialize a CTE deliberately
- Readable WITH clauses do not guarantee reuse.
- In BigQuery documentation, WITH is described as a readability feature, not a performance guarantee.
- In PostgreSQL, eligible WITH queries can be inlined, and NOT MATERIALIZED can force that behavior.
WITH filtered AS NOT MATERIALIZED (
SELECT
user_id,
event_date,
revenue
FROM fact_events
WHERE event_date >= @start_date
)
SELECT
user_id,
SUM(revenue) AS total_revenue
FROM filtered
GROUP BY user_id;
CREATE TEMP TABLE heavy_stage AS
SELECT
customer_id,
region_id,
SUM(amount) AS gross
FROM fact_orders
WHERE order_date >= @start_date
GROUP BY customer_id, region_id;
SELECT hs.customer_id, dr.region_name, hs.gross
FROM heavy_stage hs
JOIN dim_region dr
ON dr.region_id = hs.region_id;
Configuration
Planner-friendly defaults
- Align physical partitioning or clustering with the columns used in your largest filters.
- Keep window partitions narrow where possible; wide partitions multiply sort pressure.
- Prefer stable sort keys with good cardinality for ranking and dedupe patterns.
- Persist expensive transformations if the same shaped data feeds many dashboards or jobs.
- Keep SQL readable; if you share snippets publicly, clean them with the Code Formatter after the rewrite is stable.
What to inspect in the plan
| Signal | What it usually means | Rewrite first |
|---|---|---|
| Large sort stage | Window or final ordering is wider than it needs to be | Project fewer columns and reduce partition size |
| Repeated scan of same branch | CTE or subquery is being reevaluated | Materialize once or inline to enable pushdown |
| Row explosion after join | Join occurs before dedupe or aggregation | Pre-aggregate, filter earlier, or rewrite self-join |
| Spill to disk | Sort or hash state exceeds memory budget | Shrink intermediate rows and split the workload |
Advanced Usage
Use QUALIFY to keep ranking local
- When your engine supports it, QUALIFY filters window results after the window step without forcing an extra wrapper query.
- It is especially useful for latest-row, top-N, and dedupe patterns.
Trade exact ranking for approximate bucketing when scale demands it
- BigQuery documents that heavy NTILE usage can require a global ORDER BY and may hit resource limits.
- For percentile-style grouping at massive scale, approximate quantiles can be the cheaper path if exact bucket boundaries are not mandatory.
SELECT
APPROX_QUANTILES(latency_ms, 100) AS latency_percentiles
FROM api_requests;
Split logic when the optimizer cannot
- If one monster query both transforms and serves analytics, separate the expensive stage from the final read path.
- Temp tables, scheduled summary tables, and materialized views are often easier to operate than one heroic statement.
Plan Checks
Fast verification loop
- Capture the baseline plan and runtime metrics.
- Rewrite one pattern at a time: ranking, frame, join order, or CTE reuse.
- Re-run with the same date range and concurrency conditions.
- Compare scan bytes, sort stages, spill, and output row counts.
- Keep the simpler query only if the plan is measurably better.
EXPLAIN ANALYZE
SELECT
customer_id,
SUM(amount) AS gross
FROM fact_orders
WHERE order_date >= @start_date
GROUP BY customer_id;
- For shared benchmarks, keep the same predicates, same warehouse or slot conditions, and same output shape.
- If the rewrite changes semantics, stop calling it an optimization and document it as a behavior change.
Frequently Asked Questions
How do I make window functions faster on large analytics tables? +
Are CTEs always materialized in modern SQL engines? +
WITH clauses as a readability feature rather than a guaranteed cache. If a heavy CTE is referenced multiple times, verify the plan and consider a temp table or an engine-specific materialization hint.Why does LAST_VALUE return the wrong-looking result? +
ROWS frame, often ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, when you want the final row from the whole partition.When should I replace a self-join with LAG or LEAD? +
Get Engineering Deep-Dives in Your Inbox
Weekly breakdowns of architecture, security, and developer tooling — no fluff.
Related Deep-Dives
PostgreSQL EXPLAIN ANALYZE Cheat Sheet
A fast reference for reading scan, join, sort, and buffer signals in PostgreSQL plans.
Cloud InfrastructureBigQuery Performance Tuning Guide
Practical tactics for lowering scan bytes, shuffle, and slot pressure in warehouse workloads.
System ArchitectureMaterialized Views vs Views in Production
Where cached query results help, where they hurt, and how refresh strategy changes the tradeoff.