Analyze Data in Pandas

Getting your Trinity Audio player ready...

Data analysis is one of the most in-demand skills today, whether you’re working in MIS, sales, finance, HR, or data science. And when it comes to analyzing data in Python, Pandas is the most powerful and beginner-friendly library you will ever use.

But here’s the real challenge:

Most beginners don’t know where to start. Should you learn filtering first? Or groupby? Or pivot tables? Or merging datasets?

Analyze Data in Pandas

In this complete guide, we will break everything down step-by-step using a real-world dataset that includes sales transactions, customers, products, prices, ratings, and regions.

We’ll learn:

  • How to load data
  • How to explore datasets
  • How to clean and transform data
  • How to find business insights
  • How to summarize results
  • How to prepare your dataset for reporting

This article is practical, simple, and designed to help you analyze any dataset confidently.


Dataset Used in This Tutorial

We will use the following sample dataset stored as a TSV file:

Dataset URL:

https://docs.google.com/spreadsheets/d/e/2PACX-1vTR5NhWbt9Vwz4I4C9r2R7UqmDQrkX_zbeCEykWkWQN3c217r3UFtdSEOwufhT71oC-sy_OzAA9QVIn/pub?gid=0&single=true&output=tsv

This file contains:

ColumnDescription
OrderIDUnique order number
DateOrder date
CustomerIDCustomer code
ProductProduct name
CategoryProduct category
QuantityUnits sold
UnitPricePrice per unit
RegionSales region
SalesRepSales representative
PaymentMethodCash/UPI/Card etc.
RatingCustomer rating

This dataset looks like something you would receive in a real business:
Sales data, product data, missing values, mixed formats—everything needed for practical analysis.


Analyzing Data in Pandas

Let’s start step-by-step.


1. Importing Pandas and Loading the Dataset

import pandas as pd

url = "https://docs.google.com/spreadsheets/d/e/2PACX-1vTR5NhWbt9Vwz4I4C9r2R7UqmDQrkX_zbeCEykWkWQN3c217r3UFtdSEOwufhT71oC-sy_OzAA9QVIn/pub?gid=0&single=true&output=tsv"

df = pd.read_csv(url, sep="\t")

2. Exploring the Dataset

2.1 Preview First 5 Rows

df.head()

2.2 Check Shape (Rows and Columns)

df.shape

2.3 Get Summary Information

df.info()

This helps us see data types, missing values, and column structure.

2.4 Statistical Summary

df.describe()

Also More: Basic Functions in Pandas


3. Adding a New Column: Total Sales

In business reports, we always need total sales = Quantity × UnitPrice.

df["TotalSales"] = df["Quantity"] * df["UnitPrice"]

Sample Output:

OrderIDQuantityUnitPriceTotalSales
1001112001200
100222550

4. Handling Missing Values

Our dataset contains missing SalesRep and Rating values.

4.1 Check Missing Values

df.isnull().sum()

4.2 Fill Missing SalesRep with “Unknown”

df["SalesRep"] = df["SalesRep"].fillna("Unknown")

4.3 Fill Missing Ratings with Mean Rating

df["Rating"] = df["Rating"].fillna(df["Rating"].mean())

5. Filtering Data

5.1 Filter Electronics Category

df[df["Category"] == "Electronics"]

5.2 Filter Orders from North Region

df[df["Region"] == "North"]

5.3 Filter Orders with Rating above 4.5

df[df["Rating"] > 4.5]

5.4 Multiple Conditions

df[(df["Category"] == "Electronics") & (df["Region"] == "North")]

6. Sorting Data

6.1 Sort by Total Sales

df.sort_values("TotalSales", ascending=False)

6.2 Sort by Rating then Date

df.sort_values(["Rating", "Date"], ascending=[False, True])

7. Grouping and Aggregation

Grouping is one of the most important parts of data analysis.


7.1 Total Sales by Region

df.groupby("Region")["TotalSales"].sum()

Example Output:

RegionTotalSales
North4,20,000
South3,80,000
East3,10,000
West2,95,000

7.2 Average Rating by Category

df.groupby("Category")["Rating"].mean()

7.3 Sales by SalesRep

df.groupby("SalesRep")["TotalSales"].sum()

7.4 Count of Orders Per Category

df.groupby("Category")["OrderID"].count()

8. Pivot Table Analysis

Pivot tables in Pandas replicate Excel-like reporting.

pd.pivot_table(
    df,
    values="TotalSales",
    index="Category",
    columns="Region",
    aggfunc="sum",
    fill_value=0
)

9. Unique Values and Value Counts

9.1 Unique Categories

df["Category"].unique()

9.2 Count of Payment Methods

df["PaymentMethod"].value_counts()

10. Analyzing Top and Bottom Performers

10.1 Top 5 Highest Selling Products

df.groupby("Product")["TotalSales"].sum().sort_values(ascending=False).head()

10.2 Bottom 5 Lowest Rated Products

df.groupby("Product")["Rating"].mean().sort_values().head()

11. Time-Based Analysis

Convert Date column to datetime:

df["Date"] = pd.to_datetime(df["Date"])

11.1 Monthly Sales

df.resample("M", on="Date")["TotalSales"].sum()

11.2 Daily Order Count

df.resample("D", on="Date")["OrderID"].count()

12. Detecting Outliers

12.1 Using Describe

Outliers are visible through high deviation between mean and max.

12.2 IQR Method

Q1 = df["TotalSales"].quantile(0.25)
Q3 = df["TotalSales"].quantile(0.75)
IQR = Q3 - Q1

df[(df["TotalSales"] < (Q1 - 1.5*IQR)) |
   (df["TotalSales"] > (Q3 + 1.5*IQR))]

13. Exporting Data

Export to CSV

df.to_csv("processed_sales.csv", index=False)

Export to Excel

df.to_excel("sales_analysis.xlsx", index=False)

Use Cases of Pandas Data Analysis

✔ Sales Reporting

Total sales, average order value, top products, least performing category.

✔ MIS Reporting

Generate monthly reports, region-based performance, agent activity.

✔ Finance

Calculate revenue, expenses, profit, trends.

✔ HR Analytics

Attendance patterns, performance scores, department-wise summaries.

✔ Ecommerce Analytics

Customer behavior, order trends, product performance.


Benefits of Using Pandas for Analysis

  • Easy to write and understand
  • Works fast even with large datasets
  • Excel-like operations with more power
  • Integrates easily with NumPy and Matplotlib
  • Easy exporting and automation
  • Perfect for beginners and professionals

Challenges Beginners Face

❌ Confusion between loc and iloc
❌ Handling missing values incorrectly
❌ Using wrong merge type
❌ Misinterpreting groupby outputs
❌ Date parsing errors


Best Practices

✔ Always check data types
✔ Fix missing values early
✔ Create calculated fields like TotalSales
✔ Use groupby instead of loops
✔ Convert date columns to datetime
✔ Export cleaned data for reporting
✔ Avoid modifying original DataFrame—use copies


Conclusion

Pandas makes data analysis extremely simple and powerful. Once you learn how to load data, clean it, filter it, group it, and extract insights, you can handle any real dataset confidently—exactly like an analyst or data scientist.

This guide gives you everything you need to analyze data step-by-step using a real dataset. Practice these techniques regularly, and you will quickly master Pandas for business reporting, MIS automation, and data science.

FAQs – Analyze Data in Pandas

Pandas comfortably handles datasets up to millions of rows on a normal laptop.

Not necessary, but Excel knowledge makes it easier to learn filtering, grouping, and pivoting.

Because time-based analysis (monthly sum, weekly trends, etc.) only works with datetime format.

Excel is great for manual work.
Pandas is best for automation, repeat analysis, and large datasets.

What’s Next?

In the next post, we’ll learn about the Cleaning Data in Pandas

Spread the love

Leave a Comment

Your email address will not be published. Required fields are marked *

Translate »
Scroll to Top