Introduction to Pandas - Smart Tutorials

Getting your Trinity Audio player ready...

If you’ve ever used Excel for managing data, sorting, filtering, or creating reports — then you already understand the power of data manipulation.

But what if you could do all that, and much more, with just a few lines of Python code?

Welcome to Pandas — one of the most powerful and popular Python libraries for data analysis.

What is Pandas?

Pandas stands for “Python Data Analysis Library.”

It is an open-source library that makes it easy to work with structured data — such as tables, spreadsheets, and databases — right inside Python.

With Pandas, you can:

Load and read data from Excel, CSV, or databases
Clean and filter large datasets
Perform calculations and aggregations
Merge or join multiple data sources
Generate quick reports and insights

In short, Pandas is like Excel on steroids — faster, more flexible, and capable of handling millions of rows without breaking a sweat!

Why Use Pandas?

Let’s understand why Pandas is essential for anyone learning Data Analysis, MIS Reporting, or Python automation.

Excel Feature	Pandas Equivalent	Benefit
Filter rows	`df[df["Column"] > 10]`	Filter with one line
Sort data	`df.sort_values("Column")`	Sort multiple columns easily
Pivot tables	`df.pivot_table()`	Powerful summaries
VLOOKUP	`merge()`	Easier joins across files
Conditional formatting	`apply()`	Fully programmable
Excel Formulas	`df["Total"] = df["Qty"] * df["Price"]`	Vectorized calculations

So, if you can do Excel reporting, you’ll love how Pandas takes it to the next level.

Installing Pandas

Before using Pandas, install it with pip using Command Prompt (CMD):

pip install pandas

pip install pandas

To verify installation:

import pandas as pd
print(pd.__version__)

import pandas as pd
print(pd.__version__)

You’ll typically see something like:

2.2.3

That’s your Pandas version.

Pandas Core Data Structures

Pandas has two main data types:

Series – A one-dimensional array (like a column in Excel)
DataFrame – A two-dimensional table (like an entire Excel sheet)

Let’s learn both.

1. Pandas Series

A Series is like a single column of data with an index.

Example:

import pandas as pd

sales = pd.Series([100, 200, 300, 400], name="Sales")
print(sales)

import pandas as pd

sales = pd.Series([100, 200, 300, 400], name="Sales")
print(sales)

Output:

0    100
1    200
2    300
3    400
Name: Sales, dtype: int64

The left side (0,1,2,3) is the index.
The right side is the data.
name="Sales" labels the column.

You can even use custom indexes:

sales = pd.Series([100, 200, 300], index=["Jan", "Feb", "Mar"])
print(sales)

sales = pd.Series([100, 200, 300], index=["Jan", "Feb", "Mar"])
print(sales)

Output:

Jan    100
Feb    200
Mar    300
dtype: int64

Read More: Python Tutorials

2. Pandas DataFrame

A DataFrame is like a full table (rows + columns).

Example:

data = {
    "Month": ["Jan", "Feb", "Mar"],
    "Sales": [100, 200, 300],
    "Profit": [20, 40, 60]
}

df = pd.DataFrame(data)
print(df)

data = {
    "Month": ["Jan", "Feb", "Mar"],
    "Sales": [100, 200, 300],
    "Profit": [20, 40, 60]
}

df = pd.DataFrame(data)
print(df)

Output:

  Month  Sales  Profit
0   Jan    100      20
1   Feb    200      40
2   Mar    300      60

Looks just like an Excel sheet, right?
That’s the magic of Pandas.

Reading Data from Files

You can load data directly from files.

CSV File

df = pd.read_csv("sales_data.csv")

df = pd.read_csv("sales_data.csv")

Excel File

df = pd.read_excel("sales_report.xlsx")

df = pd.read_excel("sales_report.xlsx")

JSON File

df = pd.read_json("data.json")

df = pd.read_json("data.json")

After loading, just use:

df.head()

df.head()

to preview the first 5 rows.

Exploring the DataFrame

Let’s say your data looks like this:

Month	Sales	Profit
Jan	100	20
Feb	200	40
Mar	300	60
Apr	400	80

Now let’s analyze it step by step.

1. Check top rows

df.head(3)

df.head(3)

Shows first 3 rows.

2. Check bottom rows

df.tail(2)

df.tail(2)

Shows last 2 rows.

3. Get summary info

df.info()

df.info()

Displays column names, types, and non-null counts.

4. Quick statistics

df.describe()

df.describe()

Shows mean, min, max, and standard deviation.

Selecting Columns and Rows

Select one column

df["Sales"]

df["Sales"]

Select multiple columns

df[["Month", "Profit"]]

df[["Month", "Profit"]]

Select row by index

df.iloc[2]

df.iloc[2]

Select row by label

df.loc[1]

df.loc[1]

Filter rows by condition

df[df["Sales"] > 150]

df[df["Sales"] > 150]

Adding New Columns

Let’s calculate a “Profit Percentage” column.

df["Profit %"] = (df["Profit"] / df["Sales"]) * 100
print(df)

df["Profit %"] = (df["Profit"] / df["Sales"]) * 100
print(df)

Output:

  Month  Sales  Profit  Profit %
0   Jan    100      20     20.0
1   Feb    200      40     20.0
2   Mar    300      60     20.0
3   Apr    400      80     20.0

Simple math — but on entire columns!

Deleting Columns or Rows

Delete a column

df.drop("Profit %", axis=1, inplace=True)

df.drop("Profit %", axis=1, inplace=True)

Delete a row

df.drop(2, axis=0, inplace=True)

df.drop(2, axis=0, inplace=True)

Sorting and Filtering

Sort by column

df.sort_values("Sales", ascending=False)

df.sort_values("Sales", ascending=False)

Filter by multiple conditions

df[(df["Sales"] > 150) & (df["Profit"] < 70)]

df[(df["Sales"] > 150) & (df["Profit"] < 70)]

Grouping and Aggregation

Like Excel’s “Pivot Table”, Pandas can group and summarize easily.

grouped = df.groupby("Month")["Sales"].sum()
print(grouped)

grouped = df.groupby("Month")["Sales"].sum()
print(grouped)

Output:

Month
Apr    400
Feb    200
Jan    100
Mar    300
Name: Sales, dtype: int64

You can even group multiple columns:

df.groupby("Month")[["Sales", "Profit"]].mean()

df.groupby("Month")[["Sales", "Profit"]].mean()

Merging and Joining DataFrames

You can combine multiple tables just like VLOOKUP or Power Query.

sales = pd.DataFrame({
    "ID": [1, 2, 3],
    "Sales": [1000, 1500, 1200]
})

targets = pd.DataFrame({
    "ID": [1, 2, 3],
    "Target": [900, 1300, 1250]
})

merged = pd.merge(sales, targets, on="ID")
print(merged)

sales = pd.DataFrame({
    "ID": [1, 2, 3],
    "Sales": [1000, 1500, 1200]
})

targets = pd.DataFrame({
    "ID": [1, 2, 3],
    "Target": [900, 1300, 1250]
})

merged = pd.merge(sales, targets, on="ID")
print(merged)

Output:

   ID  Sales  Target
0   1   1000     900
1   2   1500    1300
2   3   1200    1250

Exporting Data to Files

When done, save your results:

df.to_csv("final_report.csv", index=False)
df.to_excel("report.xlsx", index=False)

df.to_csv("final_report.csv", index=False)
df.to_excel("report.xlsx", index=False)

Handling Missing Data

Check missing values

df.isnull().sum()

df.isnull().sum()

Fill missing data

df["Sales"].fillna(0, inplace=True)

df["Sales"].fillna(0, inplace=True)

Drop missing rows

df.dropna(inplace=True)

df.dropna(inplace=True)

Real-Life Example: Monthly Sales Analysis

Let’s put all concepts together.

Example Data

import pandas as pd

data = {
    "Month": ["Jan", "Feb", "Mar", "Apr", "May"],
    "Sales": [1000, 1200, 900, None, 1500],
    "Expenses": [500, 550, 400, 450, None]
}

df = pd.DataFrame(data)
print(df)

import pandas as pd

data = {
    "Month": ["Jan", "Feb", "Mar", "Apr", "May"],
    "Sales": [1000, 1200, 900, None, 1500],
    "Expenses": [500, 550, 400, 450, None]
}

df = pd.DataFrame(data)
print(df)

Step 1: Fill Missing Data

df.fillna(0, inplace=True)

df.fillna(0, inplace=True)

Step 2: Add Profit Column

df["Profit"] = df["Sales"] - df["Expenses"]

df["Profit"] = df["Sales"] - df["Expenses"]

Step 3: Calculate Profit %

df["Profit %"] = (df["Profit"] / df["Sales"]) * 100

df["Profit %"] = (df["Profit"] / df["Sales"]) * 100

Step 4: Filter High-Performing Months

high = df[df["Profit %"] > 40]
print(high)

high = df[df["Profit %"] > 40]
print(high)

Step 5: Save Report

df.to_excel("Monthly_Report.xlsx", index=False)

df.to_excel("Monthly_Report.xlsx", index=False)

And there you go — a complete MIS-style report built in a few lines!

Common Pandas Functions Cheat Sheet

Function	Purpose	Example
`pd.read_csv()`	Read CSV file	`pd.read_csv("data.csv")`
`pd.read_excel()`	Read Excel file	`pd.read_excel("file.xlsx")`
`df.head()`	View first rows	`df.head(3)`
`df.tail()`	View last rows	`df.tail(2)`
`df.info()`	Data summary	`df.info()`
`df.describe()`	Statistics	`df.describe()`
`df["Col"].mean()`	Average	`df["Sales"].mean()`
`df.sort_values()`	Sort data	`df.sort_values("Sales")`
`df.groupby()`	Group data	`df.groupby("Month").sum()`
`df.to_csv()`	Export CSV	`df.to_csv("output.csv")`

Final Thoughts

The Pandas library is a game changer for anyone working with data.
If Excel is your old friend, Pandas will soon become your new favorite companion.

You’ve now learned how to:

Read, explore, and manipulate data
Perform calculations and filtering
Handle missing data
Export reports easily

Pandas is just the beginning — once you master it, you can move on to:

Data Visualization (Matplotlib, Seaborn)
Data Cleaning Automation
Powerful Dashboards (Power BI + Python combo)

Stay tuned — your journey from MIS to Data Analyst has just begun

What’s Next?

In the next post, we’ll learn about the Series vs DataFrame in Pandas

Spread the love

What is Pandas?

Why Use Pandas?

Installing Pandas

Pandas Core Data Structures

1. Pandas Series

Example:

You can even use custom indexes:

2. Pandas DataFrame

Example:

Reading Data from Files

CSV File

Excel File

JSON File

Exploring the DataFrame

1. Check top rows

2. Check bottom rows

3. Get summary info

4. Quick statistics

Selecting Columns and Rows

Select one column

Select multiple columns

Select row by index

Select row by label

Filter rows by condition

Adding New Columns

Deleting Columns or Rows

Delete a column

Delete a row

Sorting and Filtering

Sort by column

Filter by multiple conditions

Grouping and Aggregation

Merging and Joining DataFrames

Exporting Data to Files

Handling Missing Data

Check missing values

Fill missing data

Drop missing rows

Real-Life Example: Monthly Sales Analysis

Example Data

Step 1: Fill Missing Data

Step 2: Add Profit Column

Step 3: Calculate Profit %

Step 4: Filter High-Performing Months

Step 5: Save Report

Common Pandas Functions Cheat Sheet

Final Thoughts

What’s Next?

Leave a Comment Cancel Reply