Introduction to Pandas

Getting your Trinity Audio player ready...

If you’ve ever used Excel for managing data, sorting, filtering, or creating reports — then you already understand the power of data manipulation.

But what if you could do all that, and much more, with just a few lines of Python code?

Welcome to Pandas — one of the most powerful and popular Python libraries for data analysis.

Introduction to Pandas

What is Pandas?

Pandas stands for “Python Data Analysis Library.”

It is an open-source library that makes it easy to work with structured data — such as tables, spreadsheets, and databases — right inside Python.

With Pandas, you can:

  • Load and read data from Excel, CSV, or databases
  • Clean and filter large datasets
  • Perform calculations and aggregations
  • Merge or join multiple data sources
  • Generate quick reports and insights

In short, Pandas is like Excel on steroids — faster, more flexible, and capable of handling millions of rows without breaking a sweat!


Why Use Pandas?

Let’s understand why Pandas is essential for anyone learning Data Analysis, MIS Reporting, or Python automation.

Excel FeaturePandas EquivalentBenefit
Filter rowsdf[df["Column"] > 10]Filter with one line
Sort datadf.sort_values("Column")Sort multiple columns easily
Pivot tablesdf.pivot_table()Powerful summaries
VLOOKUPmerge()Easier joins across files
Conditional formattingapply()Fully programmable
Excel Formulasdf["Total"] = df["Qty"] * df["Price"]Vectorized calculations

So, if you can do Excel reporting, you’ll love how Pandas takes it to the next level.


Installing Pandas

Before using Pandas, install it with pip using Command Prompt (CMD):

pip install pandas

To verify installation:

import pandas as pd
print(pd.__version__)

You’ll typically see something like:

2.2.3

That’s your Pandas version.


Pandas Core Data Structures

Pandas has two main data types:

  1. Series – A one-dimensional array (like a column in Excel)
  2. DataFrame – A two-dimensional table (like an entire Excel sheet)

Let’s learn both.


1. Pandas Series

A Series is like a single column of data with an index.

Example:

import pandas as pd

sales = pd.Series([100, 200, 300, 400], name="Sales")
print(sales)

Output:

0    100
1    200
2    300
3    400
Name: Sales, dtype: int64
  • The left side (0,1,2,3) is the index.
  • The right side is the data.
  • name="Sales" labels the column.

You can even use custom indexes:

sales = pd.Series([100, 200, 300], index=["Jan", "Feb", "Mar"])
print(sales)

Output:

Jan    100
Feb    200
Mar    300
dtype: int64

Read More: Python Tutorials


2. Pandas DataFrame

A DataFrame is like a full table (rows + columns).

Example:

data = {
    "Month": ["Jan", "Feb", "Mar"],
    "Sales": [100, 200, 300],
    "Profit": [20, 40, 60]
}

df = pd.DataFrame(data)
print(df)

Output:

  Month  Sales  Profit
0   Jan    100      20
1   Feb    200      40
2   Mar    300      60

Looks just like an Excel sheet, right?
That’s the magic of Pandas.


Reading Data from Files

You can load data directly from files.

CSV File

df = pd.read_csv("sales_data.csv")

Excel File

df = pd.read_excel("sales_report.xlsx")

JSON File

df = pd.read_json("data.json")

After loading, just use:

df.head()

to preview the first 5 rows.


Exploring the DataFrame

Let’s say your data looks like this:

MonthSalesProfit
Jan10020
Feb20040
Mar30060
Apr40080

Now let’s analyze it step by step.


1. Check top rows

df.head(3)

Shows first 3 rows.

2. Check bottom rows

df.tail(2)

Shows last 2 rows.

3. Get summary info

df.info()

Displays column names, types, and non-null counts.

4. Quick statistics

df.describe()

Shows mean, min, max, and standard deviation.


Selecting Columns and Rows

Select one column

df["Sales"]

Select multiple columns

df[["Month", "Profit"]]

Select row by index

df.iloc[2]

Select row by label

df.loc[1]

Filter rows by condition

df[df["Sales"] > 150]

Adding New Columns

Let’s calculate a “Profit Percentage” column.

df["Profit %"] = (df["Profit"] / df["Sales"]) * 100
print(df)

Output:

  Month  Sales  Profit  Profit %
0   Jan    100      20     20.0
1   Feb    200      40     20.0
2   Mar    300      60     20.0
3   Apr    400      80     20.0

Simple math — but on entire columns!


Deleting Columns or Rows

Delete a column

df.drop("Profit %", axis=1, inplace=True)

Delete a row

df.drop(2, axis=0, inplace=True)

Sorting and Filtering

Sort by column

df.sort_values("Sales", ascending=False)

Filter by multiple conditions

df[(df["Sales"] > 150) & (df["Profit"] < 70)]

Grouping and Aggregation

Like Excel’s “Pivot Table”, Pandas can group and summarize easily.

grouped = df.groupby("Month")["Sales"].sum()
print(grouped)

Output:

Month
Apr    400
Feb    200
Jan    100
Mar    300
Name: Sales, dtype: int64

You can even group multiple columns:

df.groupby("Month")[["Sales", "Profit"]].mean()

Merging and Joining DataFrames

You can combine multiple tables just like VLOOKUP or Power Query.

sales = pd.DataFrame({
    "ID": [1, 2, 3],
    "Sales": [1000, 1500, 1200]
})

targets = pd.DataFrame({
    "ID": [1, 2, 3],
    "Target": [900, 1300, 1250]
})

merged = pd.merge(sales, targets, on="ID")
print(merged)

Output:

   ID  Sales  Target
0   1   1000     900
1   2   1500    1300
2   3   1200    1250

Exporting Data to Files

When done, save your results:

df.to_csv("final_report.csv", index=False)
df.to_excel("report.xlsx", index=False)

Handling Missing Data

Check missing values

df.isnull().sum()

Fill missing data

df["Sales"].fillna(0, inplace=True)

Drop missing rows

df.dropna(inplace=True)

Real-Life Example: Monthly Sales Analysis

Let’s put all concepts together.

Example Data

import pandas as pd

data = {
    "Month": ["Jan", "Feb", "Mar", "Apr", "May"],
    "Sales": [1000, 1200, 900, None, 1500],
    "Expenses": [500, 550, 400, 450, None]
}

df = pd.DataFrame(data)
print(df)

Step 1: Fill Missing Data

df.fillna(0, inplace=True)

Step 2: Add Profit Column

df["Profit"] = df["Sales"] - df["Expenses"]

Step 3: Calculate Profit %

df["Profit %"] = (df["Profit"] / df["Sales"]) * 100

Step 4: Filter High-Performing Months

high = df[df["Profit %"] > 40]
print(high)

Step 5: Save Report

df.to_excel("Monthly_Report.xlsx", index=False)

And there you go — a complete MIS-style report built in a few lines!


Common Pandas Functions Cheat Sheet

FunctionPurposeExample
pd.read_csv()Read CSV filepd.read_csv("data.csv")
pd.read_excel()Read Excel filepd.read_excel("file.xlsx")
df.head()View first rowsdf.head(3)
df.tail()View last rowsdf.tail(2)
df.info()Data summarydf.info()
df.describe()Statisticsdf.describe()
df["Col"].mean()Averagedf["Sales"].mean()
df.sort_values()Sort datadf.sort_values("Sales")
df.groupby()Group datadf.groupby("Month").sum()
df.to_csv()Export CSVdf.to_csv("output.csv")

Final Thoughts

The Pandas library is a game changer for anyone working with data.
If Excel is your old friend, Pandas will soon become your new favorite companion.

You’ve now learned how to:

  • Read, explore, and manipulate data
  • Perform calculations and filtering
  • Handle missing data
  • Export reports easily

Pandas is just the beginning — once you master it, you can move on to:

  • Data Visualization (Matplotlib, Seaborn)
  • Data Cleaning Automation
  • Powerful Dashboards (Power BI + Python combo)

Stay tuned — your journey from MIS to Data Analyst has just begun

What’s Next?

In the next post, we’ll learn about the Series vs DataFrame in Pandas

Spread the love

Leave a Comment

Your email address will not be published. Required fields are marked *

Translate »
Scroll to Top