Milan Ghimire

Learning Pandas: Working with Tabular Data

My notes on pandas - the Series and DataFrame, how indexing really works, and the groupby split-apply-combine pattern that finally made data wrangling click.

Why pandas

pandas is the library I reach for whenever data looks like a table - rows and columns, like a spreadsheet but programmable. It sits on top of NumPy, so column-wise operations run as vectorized array code instead of Python loops, and it adds labels (column names and an index) that make the data readable instead of just a wall of numbers.

The two core objects

  • Series - a single labelled column (a 1D array with an index).
  • DataFrame - a whole table: a dict of Series sharing one index.

import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Bina", "Chetan"],
    "score": [82, 91, 77],
    "city": ["Kathmandu", "Pokhara", "Kathmandu"],
})

df.head()        # first 5 rows by default
df.shape         # (rows, columns)
df["score"]      # one column -> a Series
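
To make the "dict of Series sharing one index" idea concrete, here is a minimal sketch that builds the table the other way round, from two Series. The index labels "a", "b", "c" and the variable names are made up for illustration.

```python
import pandas as pd

# Two Series that share the same index labels (labels are hypothetical).
scores = pd.Series([82, 91, 77], index=["a", "b", "c"], name="score")
cities = pd.Series(
    ["Kathmandu", "Pokhara", "Kathmandu"], index=["a", "b", "c"], name="city"
)

# A DataFrame built from a dict of Series aligns them on the shared index.
table = pd.DataFrame({"score": scores, "city": cities})

# Pulling a column back out gives a Series carrying the same index.
score_column = table["score"]
print(type(score_column).__name__)
print(list(table.index))
print(table.shape)
```

Each column keeps the row labels, which is what lets pandas line values up by label rather than by position.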

Indexing: the part that confused me

The thing I had to slow down on is .loc vs .iloc:

  • .loc selects by label (the index/column names).
  • .iloc selects by integer position (like a normal array).

df.loc[0, "name"]      # label-based -> "Asha" (0 is the default index label here)
df.iloc[0, 0]          # position-based -> "Asha"

# boolean filtering: keep rows where score > 80
df[df["score"] > 80]
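
With the default index, labels and positions are both 0, 1, 2, so .loc and .iloc look interchangeable. A small sketch with a different index shows where they part ways. The labels 10, 20, 30 are an assumption chosen only so they differ from the positions.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Asha", "Bina", "Chetan"], "score": [82, 91, 77]},
    index=[10, 20, 30],  # hypothetical labels that differ from positions
)

by_label = df.loc[10, "name"]    # row labelled 10
by_position = df.iloc[0, 0]      # first row, first column

# .loc slices include the end label; .iloc slices exclude the end position.
label_slice = df.loc[10:20]      # rows labelled 10 and 20
position_slice = df.iloc[0:1]    # only the first row

# No row is labelled 0 in this frame, so .loc[0, ...] raises KeyError.
try:
    df.loc[0, "name"]
    missing_label = False
except KeyError:
    missing_label = True

print(by_label, by_position, len(label_slice), len(position_slice), missing_label)
```

The slice difference is the one that tends to bite: the same-looking range returns two rows through .loc and one through .iloc.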

Boolean filtering (df[condition]) was the unlock - you build a True/False mask and pandas keeps only the True rows.
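
A short sketch of the mask idea, reusing the same three-row table. The mask is pulled out into its own variable so it can be inspected; the variable names are mine.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Bina", "Chetan"],
    "score": [82, 91, 77],
    "city": ["Kathmandu", "Pokhara", "Kathmandu"],
})

mask = df["score"] > 80   # a boolean Series, one True/False per row
high = df[mask]           # only the rows where the mask is True

# Combine conditions with & (and), | (or), ~ (not).
# Each condition needs its own parentheses.
high_in_ktm = df[(df["score"] > 80) & (df["city"] == "Kathmandu")]

# .loc accepts the same mask and can pick columns at the same time.
names = df.loc[mask, "name"].tolist()

print(mask.tolist())
print(names)
```

Python's plain `and` / `or` do not work element-wise on a Series, which is why the `&` and `|` operators are used here.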

groupby: split - apply - combine

This is the pattern I use most. Split the data into groups, apply a function to each, combine the results back into a table.

# average score per city
df.groupby("city")["score"].mean()

groupby splits by city, takes the score of each group, averages it, and hands back a tidy Series indexed by city. Once I saw it as split-apply-combine, half of data analysis stopped feeling like magic.
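
To see what groupby is saving me from, here is a rough sketch that does the split and apply steps by hand with boolean filters, then compares with the one-liner. The `manual` dict and the `summary` table are illustrative names, not anything pandas requires.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Bina", "Chetan"],
    "score": [82, 91, 77],
    "city": ["Kathmandu", "Pokhara", "Kathmandu"],
})

# Split and apply by hand: one boolean filter per city, then a mean.
manual = {
    city: df[df["city"] == city]["score"].mean()
    for city in df["city"].unique()
}

# The same computation with groupby; the result is a Series indexed by city.
grouped = df.groupby("city")["score"].mean()

# Several aggregations at once give a DataFrame, one row per city.
summary = df.groupby("city")["score"].agg(["mean", "max", "count"])

print(manual)
print(grouped)
print(summary)
```

The hand-rolled version rescans the table once per group, while groupby handles the splitting and the combining into one labelled result.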

What I keep coming back to

  • df.info() and df.describe() to understand a dataset before touching it.
  • Handling missing data with df.dropna() / df.fillna(...).
  • df.merge(...) to join tables - basically SQL joins in Python.
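
A minimal sketch of the last two habits on toy data. The missing score and the second table (including the extra name "Dev") are invented so that the gaps and the join behaviour are visible.

```python
import pandas as pd

scores = pd.DataFrame({
    "name": ["Asha", "Bina", "Chetan"],
    "score": [82.0, None, 77.0],   # Bina's score is missing (made-up gap)
})
cities = pd.DataFrame({
    "name": ["Asha", "Bina", "Dev"],   # hypothetical second table
    "city": ["Kathmandu", "Pokhara", "Lalitpur"],
})

# Missing data: drop the incomplete row, or fill it with the column mean.
dropped = scores.dropna(subset=["score"])
filled = scores.fillna({"score": scores["score"].mean()})

# how="inner" keeps names present in both tables;
# how="left" keeps every row of scores and leaves unmatched cities as NaN.
inner = scores.merge(cities, on="name", how="inner")
left = scores.merge(cities, on="name", how="left")

print(dropped)
print(filled)
print(inner)
print(left)
```

Filling with the mean is only one possible choice; whether to drop or fill depends on the dataset. The `how` argument plays the role of the join type in SQL.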

A living note - I update it as I use pandas on more real datasets.