|
Getting your Trinity Audio player ready...
|
If you’ve ever used Excel for managing data, sorting, filtering, or creating reports — then you already understand the power of data manipulation.
But what if you could do all that, and much more, with just a few lines of Python code?
Welcome to Pandas — one of the most powerful and popular Python libraries for data analysis.

What is Pandas?
Pandas stands for “Python Data Analysis Library.”
It is an open-source library that makes it easy to work with structured data — such as tables, spreadsheets, and databases — right inside Python.
With Pandas, you can:
- Load and read data from Excel, CSV, or databases
- Clean and filter large datasets
- Perform calculations and aggregations
- Merge or join multiple data sources
- Generate quick reports and insights
In short, Pandas is like Excel on steroids — faster, more flexible, and capable of handling millions of rows without breaking a sweat!
Why Use Pandas?
Let’s understand why Pandas is essential for anyone learning Data Analysis, MIS Reporting, or Python automation.
| Excel Feature | Pandas Equivalent | Benefit |
|---|---|---|
| Filter rows | df[df["Column"] > 10] | Filter with one line |
| Sort data | df.sort_values("Column") | Sort multiple columns easily |
| Pivot tables | df.pivot_table() | Powerful summaries |
| VLOOKUP | merge() | Easier joins across files |
| Conditional formatting | apply() | Fully programmable |
| Excel Formulas | df["Total"] = df["Qty"] * df["Price"] | Vectorized calculations |
So, if you can do Excel reporting, you’ll love how Pandas takes it to the next level.
Installing Pandas
Before using Pandas, install it with pip using Command Prompt (CMD):
pip install pandas
To verify installation:
import pandas as pd
print(pd.__version__)
You’ll typically see something like:
2.2.3
That’s your Pandas version.
Pandas Core Data Structures
Pandas has two main data types:
- Series – A one-dimensional array (like a column in Excel)
- DataFrame – A two-dimensional table (like an entire Excel sheet)
Let’s learn both.
1. Pandas Series
A Series is like a single column of data with an index.
Example:
import pandas as pd
sales = pd.Series([100, 200, 300, 400], name="Sales")
print(sales)
Output:
0 100
1 200
2 300
3 400
Name: Sales, dtype: int64
- The left side (0,1,2,3) is the index.
- The right side is the data.
name="Sales"labels the column.
You can even use custom indexes:
sales = pd.Series([100, 200, 300], index=["Jan", "Feb", "Mar"])
print(sales)
Output:
Jan 100
Feb 200
Mar 300
dtype: int64
Read More: Python Tutorials
2. Pandas DataFrame
A DataFrame is like a full table (rows + columns).
Example:
data = {
"Month": ["Jan", "Feb", "Mar"],
"Sales": [100, 200, 300],
"Profit": [20, 40, 60]
}
df = pd.DataFrame(data)
print(df)
Output:
Month Sales Profit
0 Jan 100 20
1 Feb 200 40
2 Mar 300 60
Looks just like an Excel sheet, right?
That’s the magic of Pandas.
Reading Data from Files
You can load data directly from files.
CSV File
df = pd.read_csv("sales_data.csv")
Excel File
df = pd.read_excel("sales_report.xlsx")
JSON File
df = pd.read_json("data.json")
After loading, just use:
df.head()
to preview the first 5 rows.
Exploring the DataFrame
Let’s say your data looks like this:
| Month | Sales | Profit |
|---|---|---|
| Jan | 100 | 20 |
| Feb | 200 | 40 |
| Mar | 300 | 60 |
| Apr | 400 | 80 |
Now let’s analyze it step by step.
1. Check top rows
df.head(3)
Shows first 3 rows.
2. Check bottom rows
df.tail(2)
Shows last 2 rows.
3. Get summary info
df.info()
Displays column names, types, and non-null counts.
4. Quick statistics
df.describe()
Shows mean, min, max, and standard deviation.
Selecting Columns and Rows
Select one column
df["Sales"]
Select multiple columns
df[["Month", "Profit"]]
Select row by index
df.iloc[2]
Select row by label
df.loc[1]
Filter rows by condition
df[df["Sales"] > 150]
Adding New Columns
Let’s calculate a “Profit Percentage” column.
df["Profit %"] = (df["Profit"] / df["Sales"]) * 100
print(df)
Output:
Month Sales Profit Profit %
0 Jan 100 20 20.0
1 Feb 200 40 20.0
2 Mar 300 60 20.0
3 Apr 400 80 20.0
Simple math — but on entire columns!
Deleting Columns or Rows
Delete a column
df.drop("Profit %", axis=1, inplace=True)
Delete a row
df.drop(2, axis=0, inplace=True)
Sorting and Filtering
Sort by column
df.sort_values("Sales", ascending=False)
Filter by multiple conditions
df[(df["Sales"] > 150) & (df["Profit"] < 70)]
Grouping and Aggregation
Like Excel’s “Pivot Table”, Pandas can group and summarize easily.
grouped = df.groupby("Month")["Sales"].sum()
print(grouped)
Output:
Month
Apr 400
Feb 200
Jan 100
Mar 300
Name: Sales, dtype: int64
You can even group multiple columns:
df.groupby("Month")[["Sales", "Profit"]].mean()
Merging and Joining DataFrames
You can combine multiple tables just like VLOOKUP or Power Query.
sales = pd.DataFrame({
"ID": [1, 2, 3],
"Sales": [1000, 1500, 1200]
})
targets = pd.DataFrame({
"ID": [1, 2, 3],
"Target": [900, 1300, 1250]
})
merged = pd.merge(sales, targets, on="ID")
print(merged)
Output:
ID Sales Target
0 1 1000 900
1 2 1500 1300
2 3 1200 1250
Exporting Data to Files
When done, save your results:
df.to_csv("final_report.csv", index=False)
df.to_excel("report.xlsx", index=False)
Handling Missing Data
Check missing values
df.isnull().sum()
Fill missing data
df["Sales"].fillna(0, inplace=True)
Drop missing rows
df.dropna(inplace=True)
Real-Life Example: Monthly Sales Analysis
Let’s put all concepts together.
Example Data
import pandas as pd
data = {
"Month": ["Jan", "Feb", "Mar", "Apr", "May"],
"Sales": [1000, 1200, 900, None, 1500],
"Expenses": [500, 550, 400, 450, None]
}
df = pd.DataFrame(data)
print(df)
Step 1: Fill Missing Data
df.fillna(0, inplace=True)
Step 2: Add Profit Column
df["Profit"] = df["Sales"] - df["Expenses"]
Step 3: Calculate Profit %
df["Profit %"] = (df["Profit"] / df["Sales"]) * 100
Step 4: Filter High-Performing Months
high = df[df["Profit %"] > 40]
print(high)
Step 5: Save Report
df.to_excel("Monthly_Report.xlsx", index=False)
And there you go — a complete MIS-style report built in a few lines!
Common Pandas Functions Cheat Sheet
| Function | Purpose | Example |
|---|---|---|
pd.read_csv() | Read CSV file | pd.read_csv("data.csv") |
pd.read_excel() | Read Excel file | pd.read_excel("file.xlsx") |
df.head() | View first rows | df.head(3) |
df.tail() | View last rows | df.tail(2) |
df.info() | Data summary | df.info() |
df.describe() | Statistics | df.describe() |
df["Col"].mean() | Average | df["Sales"].mean() |
df.sort_values() | Sort data | df.sort_values("Sales") |
df.groupby() | Group data | df.groupby("Month").sum() |
df.to_csv() | Export CSV | df.to_csv("output.csv") |
Final Thoughts
The Pandas library is a game changer for anyone working with data.
If Excel is your old friend, Pandas will soon become your new favorite companion.
You’ve now learned how to:
- Read, explore, and manipulate data
- Perform calculations and filtering
- Handle missing data
- Export reports easily
Pandas is just the beginning — once you master it, you can move on to:
- Data Visualization (Matplotlib, Seaborn)
- Data Cleaning Automation
- Powerful Dashboards (Power BI + Python combo)
Stay tuned — your journey from MIS to Data Analyst has just begun
What’s Next?
In the next post, we’ll learn about the Series vs DataFrame in Pandas