# Introduction
Knowing just SELECT, WHERE, and GROUP BY covers basic data summarization, but many real-world analysis jobs demand more advanced query patterns. Think about tasks like spotting consecutive activity streaks, grouping customers by spending level, smoothing out noisy time-series data, or tracking how users move between subscription plans over time.
This guide covers 7 hands-on SQL techniques that go past the fundamentals, zeroing in on methods that tackle genuine analytical challenges.
# Setting Up the Dataset
We’ll work with a sample transactions table from a made-up subscription software (SaaS) business:
CREATE TABLE transactions (
transaction_id SERIAL PRIMARY KEY,
customer_id INT,
plan_type VARCHAR(20), -- 'starter', 'pro', 'enterprise'
amount NUMERIC(10,2),
status VARCHAR(20), -- 'completed', 'refunded', 'failed'
created_at TIMESTAMP
);The complete dataset includes 36 transactions from 7 customers, covering September 2023 to June 2024, and is stored in seed.sql. Make sure to run it before trying the queries below.
# 1. Measuring Time Between Events with LAG()
LAG() and LEAD() let you pull values from the previous or next row without writing a self-join. They’re especially handy for figuring out the gap between events — like how often a customer renews, signs they might be about to churn, or how long it takes them to come back.
Task: Figure out how many days passed between each customer’s consecutive completed transactions.
SELECT
customer_id,
created_at,
LAG(created_at) OVER (
PARTITION BY customer_id
ORDER BY created_at
) AS previous_transaction_at,
ROUND(
EXTRACT(EPOCH FROM (
created_at - LAG(created_at) OVER (
PARTITION BY customer_id
ORDER BY created_at
)
)) / 86400
) AS days_since_last
FROM transactions
WHERE status="completed"
ORDER BY customer_id, created_at;Output (truncated):
customer_id | created_at | previous_transaction_at | days_since_last
-------------+---------------------+-------------------------+-----------------
3317 | 2024-01-03 11:02:00 | |
3317 | 2024-03-15 10:45:00 | 2024-01-03 11:02:00 | 72
3317 | 2024-05-22 09:30:00 | 2024-03-15 10:45:00 | 68
4482 | 2023-09-10 09:00:00 | |
4482 | 2023-10-10 09:00:00 | 2023-09-10 09:00:00 | 30
4482 | 2023-11-10 09:14:00 | 2023-10-10 09:00:00 | 31
4482 | 2024-01-03 09:14:00 | 2023-11-10 09:14:00 | 54
4482 | 2024-03-03 08:20:00 | 2024-01-03 09:14:00 | 60
4482 | 2024-04-03 10:00:00 | 2024-03-03 08:20:00 | 31
4482 | 2024-05-01 11:00:00 | 2024-04-03 10:00:00 | 28
...
7891 | 2024-02-01 09:00:00 | |
7891 | 2024-04-01 09:00:00 | 2024-02-01 09:00:00 | 60
7891 | 2024-05-15 09:00:00 | 2024-04-01 09:00:00 | 44
8810 | 2024-01-05 12:00:00 | |
8810 | 2024-02-05 12:00:00 | 2024-01-05 12:00:00 | 31
8810 | 2024-04-05 12:00:00 | 2024-02-05 12:00:00 | 60
(29 rows)The first row for every customer shows NULL in both columns — there’s simply no earlier transaction to look back at. EXTRACT(EPOCH ...) turns the time difference into seconds; dividing by 86400 converts that into days.
LEAD() works identically but looks ahead instead of behind, which is useful for computing time until the next renewal or flagging the final transaction before a customer drops off.
# 2. Comparing a Row to Other Rows in the Same Table with a Self-Join
A self-join connects rows from the same table to one another. It’s the go-to approach when you need to compare two events for the same entity at different points in time — upgrades, downgrades, re-activations, or any before-and-after scenario.
Task: Identify customers who moved up from starter to pro (or from pro to enterprise) at some point.
SELECT DISTINCT t1.customer_id
FROM transactions t1
JOIN transactions t2
ON t1.customer_id = t2.customer_id
AND t1.plan_type="starter"
AND t2.plan_type="pro"
AND t2.created_at > t1.created_at
WHERE t1.status="completed"
AND t2.status="completed"
ORDER BY t1.customer_id;Output:
customer_id
-------------
4482
6204
7891
(3 rows)The table gets two aliases (t1, t2) so each one can stand for a different moment in time for the same customer. The condition t2.created_at > t1.created_at makes sure the order is correct — without it, you’d catch customers who had both plan types in any sequence, even the reverse. DISTINCT removes duplicates that would appear when a customer had several starter transactions before moving up.
This same pattern works for spotting downgrades, finding customers who left and returned, or comparing any two states that need to follow a chronological order.
# 3. Selecting the Top Row per Group with ROW_NUMBER()
When you need the top-N rows within each category — the biggest transaction per customer, the latest event per account, the earliest purchase per cohort — wrapping ROW_NUMBER() inside a common table expression (CTE) is the standard method.
Task: Pull each customer’s single largest completed transaction.
WITH ranked AS (
SELECT
customer_id,
transaction_id,
amount,
plan_type,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY amount DESC, created_at DESC
) AS rn
FROM transactions
WHERE status="completed"
)
SELECT customer_id, transaction_id, amount, plan_type
FROM ranked
WHERE rn = 1
ORDER BY customer_id;Output:
customer_id | transaction_id | amount | plan_type
-------------+----------------+--------+------------
3317 | 12 | 19.00 | starter
4482 | 8 | 299.00 | enterprise
5901 | 19 | 299.00 | enterprise
6103 | 25 | 299.00 | enterprise
6204 | 28 | 79.00 | pro
7891 | 32 | 79.00 | pro
8810 | 36 | 79.00 | pro
(7 rows)ROW_NUMBER() gives the value 1 to the top-ranked row within each group. The outer query then keeps only those top rows. The secondary sort on created_at DESC serves as a tiebreaker — when two transactions share the same amount, the newer one takes priority.
If you’d rather keep ties instead of breaking them, replace ROW_NUMBER() with RANK(). RANK() gives tied rows the same number and then skips the next rank (1, 1, 3), while DENSE_RANK() assigns the same number to ties without skipping (1, 1, 2).
# 4. Grouping Customers by Spending Level Using NTILE(n)
NTILE(n) splits sorted rows into n roughly equal groups and labels each row with its group number. It’s ideal for sorting customers into tiers, calculating spend quartiles, or building cohorts for A/B testing — all without setting fixed thresholds.
Task: Sort customers into four spending groups based on their total completed transaction amount.
WITH customer_spend AS (
SELECT
customer_id,
SUM(amount) AS total_spend,
COUNT(*) AS total_transactions
FROM transactions
WHERE status="completed"
GROUP BY customer_id
)
SELECT
customer_id,
total_spend,
total_transactions,
NTILE(4) OVER (ORDER BY total_spend) AS spend_quartile
FROM customer_spend
ORDER BY total_spend DESC;Output:
customer_id | total_spend | total_transactions | spend_quartile
-------------+-------------+--------------------+----------------
5901 | 1495.00 | 5 | 4
6103 | 835.00 | 5 | 3
4482 | 653.00 | 7 | 3
8810 | 237.00 | 3 | 2
6204 | 177.00 | 3 | 2
7891 | 177.00 | 3 | 1
3317 | 57.00 | 3 | 1
(7 rows)Quartile 4 represents your biggest spenders, while quartile 1 covers the smallest. Since NTILE() doesn’t rely on hardcoded spend cutoffs, the groups adjust automatically as new data comes in. This approach is far more flexible than fixed thresholds like CASE WHEN total_spend > 500.
# 5. Smoothing Out Fluctuations with a Rolling Average
A rolling (or moving) average evens out month-to-month swings, making trends in time-series data much clearer. Window functions with an explicit ROWS BETWEEN clause let you control exactly how many periods are included in the calculation.
Task: Compute a 3-month rolling average of monthly revenue to reduce noise.
WITH monthly AS (
SELECT
DATE_TRUNC('month', created_at)::DATE AS month,
SUM(amount) AS monthly_revenue
FROM transactions
WHERE status="completed"
GROUP BY DATE_TRUNC('month', created_at)
)
SELECT
month,
monthly_revenue,
ROUND(AVG(monthly_revenue) OVER (
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 2) AS revenue_3mo_avg
FROM monthly
ORDER BY month;Output:
month | monthly_revenue | revenue_3mo_avg
-------------+-----------------+-----------------
2023-09-01 | 19.00 | 19.00
2023-10-01 | 19.00 | 19.00
2023-11-01 | 79.00 | 39.00
2024-01-01 | 275.00 | 124.33
2024-02-01 | 476.00 | 276.67
2024-03-01 | 555.00 | 435.33
2024-04-01 | 835.00 | 622.00
2024-05-01 | 775.00 | 721.67
2024-06-01 | 598.00 | 736.00
(9 rows)ROWS BETWEEN 2 PRECEDING AND CURRENT ROW tells the window function to consider the current row plus the two rows before it. For the first two rows, there isn’t enough prior data, so they use a 1-month and 2-month average respectively.
Switch ROWS to RANGE if you want to include all rows that share the same ORDER BY value (handy when multiple rows have the same timestamp). For a wider smoothing window, change 2 PRECEDING to 5 PRECEDING for a 6-month rolling average.
# 6. Conditional Aggregation Using FILTER
FILTER allows you to apply a WHERE condition to a specific aggregate without breaking the query into multiple subqueries. This lets you compute several conditional summaries in a single pass through the data.
Task: Pull total revenue, refunds, and failed transaction counts grouped by month — all in one row per month.
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(amount) FILTER (WHERE status="completed") AS revenue_completed,
SUM(amount) FILTER (WHERE status="refunded") AS revenue_refunded,
COUNT(*) FILTER (WHERE status="failed") AS failed_count
FROM transactions
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;Output:
month | revenue_completed | revenue_refunded | failed_count
------------------------+-------------------+------------------+--------------
2023-09-01 00:00:00+00 | 19.00 | | 0
2023-10-01 00:00:00+00 | 19.00 | | 0
2023-11-01 00:00:00+00 | 79.00 | | 0
2024-01-01 00:00:00+00 | 275.00 | | 0
2024-02-01 00:00:00+00 | 476.00 | 79.00 | 1
2024-03-01 00:00:00+00 | 555.00 | 79.00 | 0
2024-04-01 00:00:00+00 | 835.00 | 299.00 | 0
2024-05-01 00:00:00+00 | 775.00 | | 1
2024-06-01 00:00:00+00 | 598.00 | | 2
(9 rows)
Instead of using FILTER, you could write three separate subqueries and join them together — but that means more code, reduced readability, and often worse performance. Keep in mind that SUM paired with FILTER yields NULL (rather than zero) when no rows qualify in a given month, which is correct: those months truly had no refunds. Use COALESCE(..., 0) if you’d rather see zeros.
FILTER is part of standard SQL and is supported in PostgreSQL and BigQuery. On Snowflake and certain other platforms, switch to SUM(CASE WHEN status="completed" THEN amount END) as an alternative.
# 7. Spotting Consecutive Activity Streaks Using Window Functions
Identifying uninterrupted sequences — such as months with no inactivity gap, back-to-back days with transactions, or ongoing subscription runs — ranks among the more challenging SQL tasks. The go-to approach leverages a window function to cluster rows into streaks, avoiding the need for a recursive CTE.
Here’s the idea: give each active month a sequential row number within its customer group. When months are truly back-to-back, subtracting that row number from the month date yields the same value across every month in the streak. Any gap disrupts that constant.
Task: Identify each customer’s runs of consecutive active months (months containing at least one completed transaction).
WITH monthly_activity AS (
SELECT
customer_id,
DATE_TRUNC('month', created_at)::DATE AS active_month
FROM transactions
WHERE status="completed"
GROUP BY customer_id, DATE_TRUNC('month', created_at)
),
with_prev AS (
SELECT
customer_id,
active_month,
LAG(active_month) OVER (
PARTITION BY customer_id
ORDER BY active_month
) AS prev_month
FROM monthly_activity
),
streak_groups AS (
SELECT
customer_id,
active_month,
SUM(CASE WHEN active_month = prev_month + INTERVAL '1 month' THEN 0 ELSE 1 END)
OVER (PARTITION BY customer_id ORDER BY active_month) AS streak_id
FROM with_prev
),
streaks AS (
SELECT
customer_id,
streak_id,
MIN(active_month) AS streak_start,
MAX(active_month) AS streak_end,
COUNT(*) AS streak_length_months
FROM streak_groups
GROUP BY customer_id, streak_id
)
SELECT customer_id, streak_start, streak_end, streak_length_months
FROM streaks
ORDER BY customer_id, streak_start;
Output:
customer_id | streak_start | streak_end | streak_length_months
-------------+--------------+------------+----------------------
3317 | 2024-01-01 | 2024-01-01 | 1
3317 | 2024-03-01 | 2024-03-01 | 1
3317 | 2024-05-01 | 2024-05-01 | 1
4482 | 2023-09-01 | 2023-11-01 | 3
4482 | 2024-01-01 | 2024-01-01 | 1
4482 | 2024-03-01 | 2024-05-01 | 3
5901 | 2024-02-01 | 2024-06-01 | 5
6103 | 2024-01-01 | 2024-04-01 | 4
6103 | 2024-06-01 | 2024-06-01 | 1
6204 | 2024-01-01 | 2024-01-01 | 1
6204 | 2024-03-01 | 2024-03-01 | 1
6204 | 2024-05-01 | 2024-05-01 | 1
7891 | 2024-02-01 | 2024-02-01 | 1
7891 | 2024-04-01 | 2024-05-01 | 2
8810 | 2024-01-01 | 2024-02-01 | 2
8810 | 2024-04-01 | 2024-04-01 | 1
(16 rows)
# Quick Reference
These techniques rely on standard SQL without depending on any database-specific features, and they come up regularly in analytical tasks like retention studies, upgrade funnel monitoring, and revenue reporting.
| Tip | When to Use It |
|---|---|
LAG() / LEAD() | Measuring time between events, comparing before/after values per entity |
| Self-join | Tracking state transitions (plan upgrades, account re-activations) |
ROW_NUMBER() | Selecting top-N rows per group, removing duplicates |
NTILE(n) | Dividing customers into spending or activity tiers |
Rolling window (ROWS BETWEEN) | Smoothing out noisy time-series data, computing moving averages |
FILTER | Applying multiple conditional aggregations in a single query pass |
| Consecutive streak detection | Subscription continuity, retention analysis, identifying session gaps |
Once these patterns feel natural, many multi-step data transformations that people typically handle in Python can be written more clearly and run faster in a single SQL query.
Bala Priya C is a developer and technical writer from India. She loves exploring the crossroads of math, programming, data science, and content creation. Her focus areas and expertise span DevOps, data science, and natural language processing. When she’s not reading, writing, coding, or enjoying coffee, she’s building tutorials, how-to guides, opinion pieces, and resource overviews to help fellow developers grow.



