|
Getting your Trinity Audio player ready...
|
Data analysis is one of the most in-demand skills today, whether you’re working in MIS, sales, finance, HR, or data science. And when it comes to analyzing data in Python, Pandas is the most powerful and beginner-friendly library you will ever use.
But here’s the real challenge:
Most beginners don’t know where to start. Should you learn filtering first? Or groupby? Or pivot tables? Or merging datasets?

In this complete guide, we will break everything down step-by-step using a real-world dataset that includes sales transactions, customers, products, prices, ratings, and regions.
We’ll learn:
- How to load data
- How to explore datasets
- How to clean and transform data
- How to find business insights
- How to summarize results
- How to prepare your dataset for reporting
This article is practical, simple, and designed to help you analyze any dataset confidently.
Dataset Used in This Tutorial
We will use the following sample dataset stored as a TSV file:
Dataset URL:
https://docs.google.com/spreadsheets/d/e/2PACX-1vTR5NhWbt9Vwz4I4C9r2R7UqmDQrkX_zbeCEykWkWQN3c217r3UFtdSEOwufhT71oC-sy_OzAA9QVIn/pub?gid=0&single=true&output=tsvThis file contains:
| Column | Description |
|---|---|
| OrderID | Unique order number |
| Date | Order date |
| CustomerID | Customer code |
| Product | Product name |
| Category | Product category |
| Quantity | Units sold |
| UnitPrice | Price per unit |
| Region | Sales region |
| SalesRep | Sales representative |
| PaymentMethod | Cash/UPI/Card etc. |
| Rating | Customer rating |
This dataset looks like something you would receive in a real business:
Sales data, product data, missing values, mixed formats—everything needed for practical analysis.
Analyzing Data in Pandas
Let’s start step-by-step.
1. Importing Pandas and Loading the Dataset
import pandas as pd
url = "https://docs.google.com/spreadsheets/d/e/2PACX-1vTR5NhWbt9Vwz4I4C9r2R7UqmDQrkX_zbeCEykWkWQN3c217r3UFtdSEOwufhT71oC-sy_OzAA9QVIn/pub?gid=0&single=true&output=tsv"
df = pd.read_csv(url, sep="\t")
2. Exploring the Dataset
2.1 Preview First 5 Rows
df.head()
2.2 Check Shape (Rows and Columns)
df.shape
2.3 Get Summary Information
df.info()
This helps us see data types, missing values, and column structure.
2.4 Statistical Summary
df.describe()
Also More: Basic Functions in Pandas
3. Adding a New Column: Total Sales
In business reports, we always need total sales = Quantity × UnitPrice.
df["TotalSales"] = df["Quantity"] * df["UnitPrice"]
Sample Output:
| OrderID | Quantity | UnitPrice | TotalSales |
|---|---|---|---|
| 1001 | 1 | 1200 | 1200 |
| 1002 | 2 | 25 | 50 |
4. Handling Missing Values
Our dataset contains missing SalesRep and Rating values.
4.1 Check Missing Values
df.isnull().sum()
4.2 Fill Missing SalesRep with “Unknown”
df["SalesRep"] = df["SalesRep"].fillna("Unknown")
4.3 Fill Missing Ratings with Mean Rating
df["Rating"] = df["Rating"].fillna(df["Rating"].mean())
5. Filtering Data
5.1 Filter Electronics Category
df[df["Category"] == "Electronics"]
5.2 Filter Orders from North Region
df[df["Region"] == "North"]
5.3 Filter Orders with Rating above 4.5
df[df["Rating"] > 4.5]
5.4 Multiple Conditions
df[(df["Category"] == "Electronics") & (df["Region"] == "North")]
6. Sorting Data
6.1 Sort by Total Sales
df.sort_values("TotalSales", ascending=False)
6.2 Sort by Rating then Date
df.sort_values(["Rating", "Date"], ascending=[False, True])
7. Grouping and Aggregation
Grouping is one of the most important parts of data analysis.
7.1 Total Sales by Region
df.groupby("Region")["TotalSales"].sum()
Example Output:
| Region | TotalSales |
|---|---|
| North | 4,20,000 |
| South | 3,80,000 |
| East | 3,10,000 |
| West | 2,95,000 |
7.2 Average Rating by Category
df.groupby("Category")["Rating"].mean()
7.3 Sales by SalesRep
df.groupby("SalesRep")["TotalSales"].sum()
7.4 Count of Orders Per Category
df.groupby("Category")["OrderID"].count()
8. Pivot Table Analysis
Pivot tables in Pandas replicate Excel-like reporting.
pd.pivot_table(
df,
values="TotalSales",
index="Category",
columns="Region",
aggfunc="sum",
fill_value=0
)
9. Unique Values and Value Counts
9.1 Unique Categories
df["Category"].unique()
9.2 Count of Payment Methods
df["PaymentMethod"].value_counts()
10. Analyzing Top and Bottom Performers
10.1 Top 5 Highest Selling Products
df.groupby("Product")["TotalSales"].sum().sort_values(ascending=False).head()
10.2 Bottom 5 Lowest Rated Products
df.groupby("Product")["Rating"].mean().sort_values().head()
11. Time-Based Analysis
Convert Date column to datetime:
df["Date"] = pd.to_datetime(df["Date"])
11.1 Monthly Sales
df.resample("M", on="Date")["TotalSales"].sum()
11.2 Daily Order Count
df.resample("D", on="Date")["OrderID"].count()
12. Detecting Outliers
12.1 Using Describe
Outliers are visible through high deviation between mean and max.
12.2 IQR Method
Q1 = df["TotalSales"].quantile(0.25)
Q3 = df["TotalSales"].quantile(0.75)
IQR = Q3 - Q1
df[(df["TotalSales"] < (Q1 - 1.5*IQR)) |
(df["TotalSales"] > (Q3 + 1.5*IQR))]
13. Exporting Data
Export to CSV
df.to_csv("processed_sales.csv", index=False)
Export to Excel
df.to_excel("sales_analysis.xlsx", index=False)
Use Cases of Pandas Data Analysis
✔ Sales Reporting
Total sales, average order value, top products, least performing category.
✔ MIS Reporting
Generate monthly reports, region-based performance, agent activity.
✔ Finance
Calculate revenue, expenses, profit, trends.
✔ HR Analytics
Attendance patterns, performance scores, department-wise summaries.
✔ Ecommerce Analytics
Customer behavior, order trends, product performance.
Benefits of Using Pandas for Analysis
- Easy to write and understand
- Works fast even with large datasets
- Excel-like operations with more power
- Integrates easily with NumPy and Matplotlib
- Easy exporting and automation
- Perfect for beginners and professionals
Challenges Beginners Face
❌ Confusion between loc and iloc
❌ Handling missing values incorrectly
❌ Using wrong merge type
❌ Misinterpreting groupby outputs
❌ Date parsing errors
Best Practices
✔ Always check data types
✔ Fix missing values early
✔ Create calculated fields like TotalSales
✔ Use groupby instead of loops
✔ Convert date columns to datetime
✔ Export cleaned data for reporting
✔ Avoid modifying original DataFrame—use copies
Conclusion
Pandas makes data analysis extremely simple and powerful. Once you learn how to load data, clean it, filter it, group it, and extract insights, you can handle any real dataset confidently—exactly like an analyst or data scientist.
This guide gives you everything you need to analyze data step-by-step using a real dataset. Practice these techniques regularly, and you will quickly master Pandas for business reporting, MIS automation, and data science.
FAQs – Analyze Data in Pandas
How big of a dataset can Pandas handle?
Pandas comfortably handles datasets up to millions of rows on a normal laptop.
Is it necessary to know Excel before learning Pandas?
Not necessary, but Excel knowledge makes it easier to learn filtering, grouping, and pivoting.
Why convert date columns to datetime?
Because time-based analysis (monthly sum, weekly trends, etc.) only works with datetime format.
Which is better—Pandas or Excel?
Excel is great for manual work.
Pandas is best for automation, repeat analysis, and large datasets.
What’s Next?
In the next post, we’ll learn about the Cleaning Data in Pandas