|
Getting your Trinity Audio player ready...
|
If you work with:
- Sales reports
- Customer databases
- MIS dashboards
- Government project data
- FMCG company systems
You already know this truth:
Real-world data is messy.
You will face:
- NULL values
- Wrong date formats
- Extra spaces
- Mixed text cases
- Incorrect data types
- Blank strings
- Duplicates
- Logical inconsistencies
Before analysis, you must clean data.
This article will teach you how to clean data directly inside SQL using:
- IFNULL / COALESCE
- CASE statement
- NULLIF
- TRIM
- CAST
- Date conversion
- String formatting
- Conditional corrections
We will use the FMCG company dataset:
- customers
- products
- employees
- sales
Understanding NULL in SQL
NULL does NOT mean zero.
NULL does NOT mean empty string.
NULL means “Unknown”.
Example:
SELECT customer_name, email
FROM customers;Some customers may have:
- email = NULL
- email = ” (empty string)
These are different.
Also Read: Select Data in SQL
Handling NULL Values
COALESCE
COALESCE returns the first non-null value.
Example: Replace NULL Email
SELECT
customer_name,
COALESCE(email, 'Not Provided') AS email_status
FROM customers;If email is NULL → show “Not Provided”
This works in both PostgreSQL and MySQL.
IFNULL (MySQL Only)
In MySQL:
SELECT
customer_name,
IFNULL(email, 'Not Provided') AS email_status
FROM customers;⚠ PostgreSQL does NOT support IFNULL.
Use COALESCE instead.
Since PostgreSQL is primary for Smart Tutorials → always prefer COALESCE.
CASE Statement
CASE is used for conditional logic.
Think of it as SQL’s version of:
if-else
Example: Categorizing Customers by Sales
SELECT
customer_id,
SUM(sale_amount) AS total_sales,
CASE
WHEN SUM(sale_amount) > 100000 THEN 'Premium'
WHEN SUM(sale_amount) > 50000 THEN 'Gold'
ELSE 'Regular'
END AS customer_category
FROM sales
GROUP BY customer_id;Now you’re applying business logic directly in SQL.
This works in PostgreSQL and MySQL.
Removing Extra Spaces (TRIM)
Sometimes names are stored like:
‘ Rahul Sharma ‘
That creates reporting problems.
TRIM Example
SELECT
TRIM(customer_name) AS cleaned_name
FROM customers;You can also use:
LTRIM(customer_name)
RTRIM(customer_name)Works in both PostgreSQL and MySQL.
Fixing Text Case – UPPER, LOWER
Data may look like:
- raipur
- RAIPUR
- RaIpUr
We want consistency.
PostgreSQL + MySQL
SELECT
UPPER(city) AS city_upper,
LOWER(city) AS city_lower,
INITCAP(city) -- PostgreSQL only
FROM customers;⚠ INITCAP works only in PostgreSQL.
MySQL alternative:
SELECT
CONCAT(UPPER(LEFT(city,1)), LOWER(SUBSTRING(city,2))) AS city_proper
FROM customers;NULLIF (Convert Value to NULL)
NULLIF converts a specific value into NULL.
Example:
If empty string should be treated as NULL.
SELECT
NULLIF(email, '') AS cleaned_email
FROM customers;If email = ” → becomes NULL.
This is extremely useful before using COALESCE.
Converting Data Types (CAST)
Sometimes data is stored incorrectly.
Example:
price stored as TEXT instead of NUMERIC.
PostgreSQL
SELECT
CAST(price AS NUMERIC) AS numeric_price
FROM products;Or:
price::NUMERICMySQL
SELECT
CAST(price AS DECIMAL(10,2)) AS numeric_price
FROM products;Cleaning Date Formats
Real-life problem:
Date stored as text:
’05-Mar-2025 10:30 AM’
MySQL Conversion
SELECT
STR_TO_DATE(order_date, '%d-%b-%Y %h:%i %p') AS clean_date
FROM sales;PostgreSQL Conversion
SELECT
TO_TIMESTAMP(order_date, 'DD-Mon-YYYY HH12:MI AM') AS clean_date
FROM sales;This is extremely important in MIS reporting.
Handling Blank Strings vs NULL
Sometimes:
email = ”
instead of
email = NULL
Correct approach:
SELECT
COALESCE(NULLIF(email, ''), 'Not Provided') AS clean_email
FROM customers;First:
NULLIF(email, ”) → converts blank to NULL
Then COALESCE replaces NULL.
Professional-level cleaning.
Removing Duplicates
If duplicate customers exist:
PostgreSQL
SELECT DISTINCT *
FROM customers;If deleting duplicates:
DELETE FROM customers c1
USING customers c2
WHERE c1.customer_id > c2.customer_id
AND c1.email = c2.email;MySQL
DELETE c1
FROM customers c1
JOIN customers c2
ON c1.email = c2.email
AND c1.customer_id > c2.customer_id;Be careful while deleting duplicates.
Practical FMCG Cleaning Scenario
Imagine sales table has:
- sale_amount as TEXT
- blank city names
- NULL quantity
- inconsistent date format
Let’s clean in one query.
PostgreSQL Example
SELECT
sale_id,
customer_id,
CAST(sale_amount AS NUMERIC) AS clean_sale_amount,
COALESCE(quantity, 0) AS clean_quantity,
TRIM(UPPER(city)) AS clean_city,
TO_TIMESTAMP(order_date, 'DD-Mon-YYYY HH12:MI AM') AS clean_order_date
FROM sales;Now the dataset is analysis-ready.
Using CASE for Business Fixes
Example:
If quantity is NULL or 0 → mark as invalid.
SELECT
sale_id,
quantity,
CASE
WHEN quantity IS NULL OR quantity = 0 THEN 'Invalid'
ELSE 'Valid'
END AS quantity_status
FROM sales;This helps create data quality reports.
Handling Negative Values
Sometimes sale_amount becomes negative due to error.
SELECT
sale_id,
ABS(sale_amount) AS corrected_amount
FROM sales;ABS works in both PostgreSQL and MySQL.
Creating Clean Reporting View
Instead of cleaning every time, create a View.
CREATE VIEW clean_sales AS
SELECT
sale_id,
CAST(sale_amount AS NUMERIC) AS sale_amount,
COALESCE(quantity, 0) AS quantity,
TRIM(UPPER(city)) AS city
FROM sales;Now use:
SELECT * FROM clean_sales;This is how professionals build reporting layers.
PostgreSQL vs MySQL Differences Summary
| Function | PostgreSQL | MySQL |
|---|---|---|
| COALESCE | ✅ | ✅ |
| IFNULL | ❌ | ✅ |
| INITCAP | ✅ | ❌ |
| STR_TO_DATE | ❌ | ✅ |
| TO_TIMESTAMP | ✅ | ❌ |
| CAST | ✅ | ✅ |
| NULLIF | ✅ | ✅ |
PostgreSQL is stricter and more powerful for analytics.
Common Data Cleaning Mistakes
❌ Treating NULL as zero without checking
❌ Not trimming text
❌ Ignoring blank strings
❌ Not converting date types
❌ Running analysis on TEXT numbers
❌ Deleting duplicates without backup
Always clean carefully.
Why Data Cleaning is Critical for Data Analysts
In real jobs:
- 60–70% time goes in cleaning
- 20% in analysis
- 10% in reporting
If your data is wrong → your insights are wrong.
And wrong insights → wrong business decisions.
Final Summary
Data Cleaning in SQL is the process of preparing raw, messy data for accurate analysis. Using functions like COALESCE, CASE, NULLIF, TRIM, CAST, and date conversion tools, analysts can handle NULL values, fix inconsistent text, convert incorrect data types, remove duplicates, and apply business rules.
Both PostgreSQL and MySQL provide strong data cleaning capabilities, though syntax may differ slightly. Mastering data cleaning is essential for building reliable MIS reports, dashboards, and analytical queries. Clean data ensures accurate reporting, better insights, and professional-level SQL skills.