Skip to content
Rochester, NY design, systems, study tools, and experiments Built in public

SQL: GROUP BY and HAVING

GROUP BY collapses rows that share a key into one row per group so aggregates (COUNT, SUM, AVG) can be applied. HAVING filters groups after aggregation, where WHERE can't reach.

sql sql aggregation relational 2026-04-15

GROUP BY reshapes a query result so that rows sharing a key value collapse into a single row per key. Aggregate functions (COUNT, SUM, AVG, MIN, MAX) then run within each group instead of across the whole table.

Basic shape

SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS revenue
FROM orders
GROUP BY customer_id;

One row per customer_id, with the count and revenue computed within that customer's orders.

The selection rule

Every column in the SELECT clause must either:

  1. Appear in the GROUP BY clause, or
  2. Be wrapped in an aggregate function.

Otherwise the database can't decide which value of that column to show — there are many rows per group, and only one slot in the output. PostgreSQL and standard SQL enforce this strictly. MySQL historically allowed it and would pick an arbitrary value, which is a footgun.

WHERE vs HAVING — the order matters

The logical order of operations in a SQL query is:

  1. FROM / JOIN — assemble rows
  2. WHERE — filter rows
  3. GROUP BY — collapse into groups
  4. HAVING — filter groups
  5. SELECT — pick columns
  6. ORDER BY — sort
  7. LIMIT — cap

WHERE filters individual rows before grouping. HAVING filters whole groups after aggregation. They are not interchangeable.

SELECT customer_id, SUM(total) AS revenue
FROM orders
WHERE status = 'completed'      -- drop incomplete orders before grouping
GROUP BY customer_id
HAVING SUM(total) > 1000;       -- keep only customers with >$1000 in completed orders

You can't reference SUM(total) in WHERE — it doesn't exist yet at that stage. You can't reference status in HAVING (without re-aggregating) — individual row values are gone after grouping.

GROUP BY multiple columns

SELECT region, product, SUM(units) AS units_sold
FROM sales
GROUP BY region, product;

One row per (region, product) combination. The order of columns in GROUP BY doesn't change which rows appear, but it can affect query plans and how some databases sort the output.

Common gotcha

COUNT(*) counts rows. COUNT(column) counts non-NULL values in that column. They differ when nulls are present:

SELECT COUNT(*), COUNT(email) FROM users;   -- second number is smaller if any email is NULL