GROUP BY in SQL

Getting your Trinity Audio player ready...

When working with business data, we rarely want one total number.

We want:

  • Total sales per city
  • Revenue per product category
  • Employees per department
  • Monthly sales summary
  • Incentive paid per employee

This is done using GROUP BY.


What is GROUP BY?

GROUP BY is used to group rows that have the same values in specified columns.

It is usually used with:

  • SUM()
  • COUNT()
  • AVG()
  • MIN()
  • MAX()

Also Read: Aggregate Functions in SQL


FMCG Dataset Tables Used

  • customers
  • sales
  • products
  • employees

Basic GROUP BY Example

Total Sales Per City

SELECT city, SUM(sale_amount) AS total_sales
FROM sales
GROUP BY city;

What happens here?

  • Rows are grouped by city
  • SUM is calculated for each city
  • One row per city is returned

This works the same in PostgreSQL and MySQL.


Count Customers Per City

SELECT city, COUNT(*) AS total_customers
FROM customers
GROUP BY city;

Business Use:
Marketing team can see customer distribution.


Revenue Per Product Category

SELECT category, SUM(price * quantity) AS total_revenue
FROM products
GROUP BY category;

Now we’re generating management-level insights.


Multiple Column GROUP BY

Sales Per City Per Year

PostgreSQL

SELECT city,
       EXTRACT(YEAR FROM sale_date) AS sale_year,
       SUM(sale_amount) AS total_sales
FROM sales
GROUP BY city, sale_year
ORDER BY city, sale_year;

MySQL

SELECT city,
       YEAR(sale_date) AS sale_year,
       SUM(sale_amount) AS total_sales
FROM sales
GROUP BY city, sale_year
ORDER BY city, sale_year;

Now we’re doing time-based analysis 🔥


GROUP BY with WHERE

Important rule:

WHERE filters data BEFORE grouping.

Example: Sales After 2024

SELECT city, SUM(sale_amount) AS total_sales
FROM sales
WHERE sale_date >= '2024-01-01'
GROUP BY city;

Flow:

  1. WHERE filters rows
  2. GROUP BY groups remaining rows
  3. Aggregate applied

HAVING Clause (Very Important)

HAVING filters data AFTER grouping.

You cannot use WHERE to filter aggregate results.

❌ Wrong:

SELECT city, SUM(sale_amount)
FROM sales
WHERE SUM(sale_amount) > 50000
GROUP BY city;

This gives error.


Correct Way Using HAVING

SELECT city, SUM(sale_amount) AS total_sales
FROM sales
GROUP BY city
HAVING SUM(sale_amount) > 50000;

Now only cities with sales greater than 50,000 are shown.


Real-Life FMCG Examples


Example 1: Employees Per Department

SELECT department, COUNT(*) AS total_employees
FROM employees
GROUP BY department;

Example 2: Cities With More Than 100 Customers

SELECT city, COUNT(*) AS total_customers
FROM customers
GROUP BY city
HAVING COUNT(*) > 100;

Now this is business intelligence level reporting.


Example 3: Top 3 Revenue Cities

SELECT city, SUM(sale_amount) AS total_sales
FROM sales
GROUP BY city
ORDER BY total_sales DESC
LIMIT 3;

Used in dashboards all the time.


Execution Order (Very Important)

SQL does NOT run in written order.

Actual order:

  1. FROM
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. SELECT
  6. ORDER BY
  7. LIMIT

Understanding this prevents confusion.


PostgreSQL vs MySQL Differences

FeaturePostgreSQLMySQL
Strict GROUP BY rulesStrictDepends on SQL mode
Date extractionEXTRACT()YEAR(), MONTH()
HAVING support✅ Yes✅ Yes

PostgreSQL forces clean grouping (good for learning).


Common Mistakes

❌ Selecting column not included in GROUP BY
❌ Using WHERE with aggregate function
❌ Forgetting HAVING
❌ Not aliasing aggregated columns


Why GROUP BY Is Critical for Data Analysts

With GROUP BY you can:

  • Generate summary reports
  • Build KPIs
  • Create pivot-style outputs
  • Perform trend analysis
  • Prepare Power BI / Tableau datasets

Without GROUP BY, analytics is incomplete.


Summary

GROUP BY in SQL is used to group rows with similar values and apply aggregate functions like SUM, COUNT, AVG, MIN, and MAX. It is essential for generating business reports such as sales per city, revenue per category, and employee count per department. The HAVING clause is used to filter grouped results after aggregation.

Both PostgreSQL and MySQL support GROUP BY with similar syntax, though PostgreSQL follows stricter grouping rules. GROUP BY is a foundational reporting tool for Data Analysts and MIS professionals.


What’s Next?

👉 Subqueries in SQL

Spread the love

Leave a Comment

Your email address will not be published. Required fields are marked *

Translate »
Scroll to Top