Analyze Data in Pandas - Smart Tutorials

Getting your Trinity Audio player ready...

Data analysis is one of the most in-demand skills today, whether you’re working in MIS, sales, finance, HR, or data science. And when it comes to analyzing data in Python, Pandas is the most powerful and beginner-friendly library you will ever use.

But here’s the real challenge:

Most beginners don’t know where to start. Should you learn filtering first? Or groupby? Or pivot tables? Or merging datasets?

In this complete guide, we will break everything down step-by-step using a real-world dataset that includes sales transactions, customers, products, prices, ratings, and regions.

We’ll learn:

How to load data
How to explore datasets
How to clean and transform data
How to find business insights
How to summarize results
How to prepare your dataset for reporting

This article is practical, simple, and designed to help you analyze any dataset confidently.

Dataset Used in This Tutorial

We will use the following sample dataset stored as a TSV file:

Dataset URL:

https://docs.google.com/spreadsheets/d/e/2PACX-1vTR5NhWbt9Vwz4I4C9r2R7UqmDQrkX_zbeCEykWkWQN3c217r3UFtdSEOwufhT71oC-sy_OzAA9QVIn/pub?gid=0&single=true&output=tsv

https://docs.google.com/spreadsheets/d/e/2PACX-1vTR5NhWbt9Vwz4I4C9r2R7UqmDQrkX_zbeCEykWkWQN3c217r3UFtdSEOwufhT71oC-sy_OzAA9QVIn/pub?gid=0&single=true&output=tsv

This file contains:

Column	Description
OrderID	Unique order number
Date	Order date
CustomerID	Customer code
Product	Product name
Category	Product category
Quantity	Units sold
UnitPrice	Price per unit
Region	Sales region
SalesRep	Sales representative
PaymentMethod	Cash/UPI/Card etc.
Rating	Customer rating

This dataset looks like something you would receive in a real business:
Sales data, product data, missing values, mixed formats—everything needed for practical analysis.

Analyzing Data in Pandas

Let’s start step-by-step.

1. Importing Pandas and Loading the Dataset

import pandas as pd

url = "https://docs.google.com/spreadsheets/d/e/2PACX-1vTR5NhWbt9Vwz4I4C9r2R7UqmDQrkX_zbeCEykWkWQN3c217r3UFtdSEOwufhT71oC-sy_OzAA9QVIn/pub?gid=0&single=true&output=tsv"

df = pd.read_csv(url, sep="\t")

import pandas as pd

url = "https://docs.google.com/spreadsheets/d/e/2PACX-1vTR5NhWbt9Vwz4I4C9r2R7UqmDQrkX_zbeCEykWkWQN3c217r3UFtdSEOwufhT71oC-sy_OzAA9QVIn/pub?gid=0&single=true&output=tsv"

df = pd.read_csv(url, sep="\t")

2. Exploring the Dataset

2.1 Preview First 5 Rows

df.head()

df.head()

2.2 Check Shape (Rows and Columns)

df.shape

df.shape

2.3 Get Summary Information

df.info()

df.info()

This helps us see data types, missing values, and column structure.

2.4 Statistical Summary

df.describe()

df.describe()

Also More: Basic Functions in Pandas

3. Adding a New Column: Total Sales

In business reports, we always need total sales = Quantity × UnitPrice.

df["TotalSales"] = df["Quantity"] * df["UnitPrice"]

df["TotalSales"] = df["Quantity"] * df["UnitPrice"]

Sample Output:

OrderID	Quantity	UnitPrice	TotalSales
1001	1	1200	1200
1002	2	25	50

4. Handling Missing Values

Our dataset contains missing SalesRep and Rating values.

4.1 Check Missing Values

df.isnull().sum()

df.isnull().sum()

4.2 Fill Missing SalesRep with “Unknown”

df["SalesRep"] = df["SalesRep"].fillna("Unknown")

df["SalesRep"] = df["SalesRep"].fillna("Unknown")

4.3 Fill Missing Ratings with Mean Rating

df["Rating"] = df["Rating"].fillna(df["Rating"].mean())

df["Rating"] = df["Rating"].fillna(df["Rating"].mean())

5. Filtering Data

5.1 Filter Electronics Category

df[df["Category"] == "Electronics"]

df[df["Category"] == "Electronics"]

5.2 Filter Orders from North Region

df[df["Region"] == "North"]

df[df["Region"] == "North"]

5.3 Filter Orders with Rating above 4.5

df[df["Rating"] > 4.5]

df[df["Rating"] > 4.5]

5.4 Multiple Conditions

df[(df["Category"] == "Electronics") & (df["Region"] == "North")]

df[(df["Category"] == "Electronics") & (df["Region"] == "North")]

6. Sorting Data

6.1 Sort by Total Sales

df.sort_values("TotalSales", ascending=False)

df.sort_values("TotalSales", ascending=False)

6.2 Sort by Rating then Date

df.sort_values(["Rating", "Date"], ascending=[False, True])

df.sort_values(["Rating", "Date"], ascending=[False, True])

7. Grouping and Aggregation

Grouping is one of the most important parts of data analysis.

7.1 Total Sales by Region

df.groupby("Region")["TotalSales"].sum()

df.groupby("Region")["TotalSales"].sum()

Example Output:

Region	TotalSales
North	4,20,000
South	3,80,000
East	3,10,000
West	2,95,000

7.2 Average Rating by Category

df.groupby("Category")["Rating"].mean()

df.groupby("Category")["Rating"].mean()

7.3 Sales by SalesRep

df.groupby("SalesRep")["TotalSales"].sum()

df.groupby("SalesRep")["TotalSales"].sum()

7.4 Count of Orders Per Category

df.groupby("Category")["OrderID"].count()

df.groupby("Category")["OrderID"].count()

8. Pivot Table Analysis

Pivot tables in Pandas replicate Excel-like reporting.

pd.pivot_table(
    df,
    values="TotalSales",
    index="Category",
    columns="Region",
    aggfunc="sum",
    fill_value=0
)

pd.pivot_table(
    df,
    values="TotalSales",
    index="Category",
    columns="Region",
    aggfunc="sum",
    fill_value=0
)

9. Unique Values and Value Counts

9.1 Unique Categories

df["Category"].unique()

df["Category"].unique()

9.2 Count of Payment Methods

df["PaymentMethod"].value_counts()

df["PaymentMethod"].value_counts()

10. Analyzing Top and Bottom Performers

10.1 Top 5 Highest Selling Products

df.groupby("Product")["TotalSales"].sum().sort_values(ascending=False).head()

df.groupby("Product")["TotalSales"].sum().sort_values(ascending=False).head()

10.2 Bottom 5 Lowest Rated Products

df.groupby("Product")["Rating"].mean().sort_values().head()

df.groupby("Product")["Rating"].mean().sort_values().head()

11. Time-Based Analysis

Convert Date column to datetime:

df["Date"] = pd.to_datetime(df["Date"])

df["Date"] = pd.to_datetime(df["Date"])

11.1 Monthly Sales

df.resample("M", on="Date")["TotalSales"].sum()

df.resample("M", on="Date")["TotalSales"].sum()

11.2 Daily Order Count

df.resample("D", on="Date")["OrderID"].count()

df.resample("D", on="Date")["OrderID"].count()

12. Detecting Outliers

12.1 Using Describe

Outliers are visible through high deviation between mean and max.

12.2 IQR Method

Q1 = df["TotalSales"].quantile(0.25)
Q3 = df["TotalSales"].quantile(0.75)
IQR = Q3 - Q1

df[(df["TotalSales"] < (Q1 - 1.5*IQR)) |
   (df["TotalSales"] > (Q3 + 1.5*IQR))]

Q1 = df["TotalSales"].quantile(0.25)
Q3 = df["TotalSales"].quantile(0.75)
IQR = Q3 - Q1

df[(df["TotalSales"] < (Q1 - 1.5*IQR)) |
   (df["TotalSales"] > (Q3 + 1.5*IQR))]

13. Exporting Data

Export to CSV

df.to_csv("processed_sales.csv", index=False)

df.to_csv("processed_sales.csv", index=False)

Export to Excel

df.to_excel("sales_analysis.xlsx", index=False)

df.to_excel("sales_analysis.xlsx", index=False)

Use Cases of Pandas Data Analysis

✔ Sales Reporting

Total sales, average order value, top products, least performing category.

✔ MIS Reporting

Generate monthly reports, region-based performance, agent activity.

✔ Finance

Calculate revenue, expenses, profit, trends.

✔ HR Analytics

Attendance patterns, performance scores, department-wise summaries.

✔ Ecommerce Analytics

Customer behavior, order trends, product performance.

Benefits of Using Pandas for Analysis

Easy to write and understand
Works fast even with large datasets
Excel-like operations with more power
Integrates easily with NumPy and Matplotlib
Easy exporting and automation
Perfect for beginners and professionals

Challenges Beginners Face

❌ Confusion between loc and iloc
❌ Handling missing values incorrectly
❌ Using wrong merge type
❌ Misinterpreting groupby outputs
❌ Date parsing errors

Best Practices

✔ Always check data types
✔ Fix missing values early
✔ Create calculated fields like TotalSales
✔ Use groupby instead of loops
✔ Convert date columns to datetime
✔ Export cleaned data for reporting
✔ Avoid modifying original DataFrame—use copies

Conclusion

Pandas makes data analysis extremely simple and powerful. Once you learn how to load data, clean it, filter it, group it, and extract insights, you can handle any real dataset confidently—exactly like an analyst or data scientist.

This guide gives you everything you need to analyze data step-by-step using a real dataset. Practice these techniques regularly, and you will quickly master Pandas for business reporting, MIS automation, and data science.

What’s Next?

In the next post, we’ll learn about the Cleaning Data in Pandas

Spread the love