Pandas is one of the most widely used Python libraries for data manipulation and analysis. Built on top of NumPy, it provides two primary data structures, Series (1D) and DataFrame (2D), that make working with structured data intuitive and efficient. The name comes from "panel data", a term from econometrics.
A Series is a one-dimensional labelled array:
import pandas as pd
import numpy as np
# Create from a list
s = pd.Series([10, 20, 30, 40, 50])
print(s)
# 0    10
# 1    20
# 2    30
# 3    40
# 4    50
# dtype: int64
# Create with custom index
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s['b']) # 20
# Create from a dictionary
s = pd.Series({'London': 9000000, 'Paris': 2200000, 'Berlin': 3600000})
print(s)
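To make "labelled" concrete, here is a minimal sketch that reuses the city figures from the dictionary example. The second Series (`growth`) and its numbers are hypothetical, added only to illustrate how arithmetic lines values up by label rather than by position.

```python
import pandas as pd

# City populations as given in the dictionary example
population = pd.Series({'London': 9000000, 'Paris': 2200000, 'Berlin': 3600000})

print(population['Paris'])     # access by label -> 2200000
print(population.iloc[0])      # access by position -> 9000000
print(list(population.index))  # ['London', 'Paris', 'Berlin']

# Hypothetical second Series with only partly overlapping labels
growth = pd.Series({'Paris': 100000, 'Madrid': 50000})

# Arithmetic aligns on labels; a label missing from either side yields NaN
combined = population + growth
print(combined['Paris'])           # 2300000.0
print(int(combined.isna().sum()))  # 3
```

Only `Paris` appears in both Series, so it is the only label with a numeric result. The result is floating point because NaN cannot be stored in an integer column of this kind.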
A DataFrame is a two-dimensional labelled table — the workhorse of Pandas:
# Create from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [25, 30, 35, 28],
    'City': ['London', 'Paris', 'Berlin', 'Madrid'],
    'Salary': [55000, 62000, 58000, 51000]
}
df = pd.DataFrame(data)
print(df)
#       Name  Age    City  Salary
# 0    Alice   25  London   55000
# 1      Bob   30   Paris   62000
# 2  Charlie   35  Berlin   58000
# 3    Diana   28  Madrid   51000
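One way to picture how the two structures relate: each column of a DataFrame is itself a Series, and all columns share the same row index. The sketch below uses a cut-down version of the sample table to show this.

```python
import pandas as pd

# Two columns from the sample table above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [25, 30, 35, 28],
})

# Selecting one column gives back a Series
ages = df['Age']
print(type(ages).__name__)           # Series

# The column carries the same row labels as the DataFrame
print(ages.index.equals(df.index))   # True

print(df.shape)                      # (4, 2)
```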
# CSV
df = pd.read_csv('data.csv')
# Excel
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
# JSON
df = pd.read_json('data.json')
# SQL
import sqlite3
conn = sqlite3.connect('database.db')
df = pd.read_sql('SELECT * FROM users', conn)
# From a URL
df = pd.read_csv('https://example.com/data.csv')
df.to_csv('output.csv', index=False)
df.to_excel('output.xlsx', index=False)
df.to_json('output.json')
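The `index=False` argument is easier to appreciate with a round trip. The sketch below writes to an in-memory `io.StringIO` buffer instead of a file, purely so that it runs without touching disk. The two-row table is a shortened version of the sample data.

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [55000, 62000]})

# Write without the row index, then read the text back
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored.columns.tolist())  # ['Name', 'Salary']

# With the default index=True, the index is written as an extra unnamed column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reloaded = pd.read_csv(with_index)
print(reloaded.columns.tolist())  # ['Unnamed: 0', 'Name', 'Salary']
```

If the index holds meaningful labels, keeping it and reading it back with `index_col=0` is a reasonable alternative to dropping it.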
df.head() # First 5 rows
df.head(10) # First 10 rows
df.tail() # Last 5 rows
df.shape # (rows, columns)
df.info() # Column types, non-null counts, memory usage
df.describe() # Statistical summary for numeric columns
df.dtypes # Data type of each column
df.columns # Column names
df.index # Row index
df.nunique() # Number of unique values per column
df['City'].value_counts() # Frequency of each value in one column (a Series)
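As a small worked example of the inspection calls, the sketch below uses a hypothetical table in which one city repeats, so that the counts are not all 1. `value_counts` is called on a single column here, which is its most common use.

```python
import pandas as pd

# Hypothetical data: 'London' appears twice
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'City': ['London', 'Paris', 'London', 'Madrid'],
})

print(df.shape)                 # (4, 2)

# Number of distinct values in each column
unique_counts = df.nunique()
print(unique_counts['City'])    # 3

# Frequency of each value in one column, most frequent first
city_counts = df['City'].value_counts()
print(city_counts['London'])    # 2
print(city_counts.index[0])     # London
```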
# Single column (returns Series)
df['Name']
# Multiple columns (returns DataFrame)
df[['Name', 'Age']]
# By position (iloc — integer location)
df.iloc[0] # First row
df.iloc[0:3] # First three rows
df.iloc[0, 1] # Row 0, column 1
# By label (loc — label-based)
df.loc[0] # Row with index label 0
df.loc[0:2, 'Name':'City'] # Rows 0 through 2 and columns Name through City (both ends included)
# Boolean indexing
df[df['Age'] > 28]
df[df['City'] == 'London']
df[(df['Age'] > 25) & (df['Salary'] > 55000)]
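Two details of the selection syntax are easy to miss: `iloc` slices exclude the end position while `loc` slices include the end label, and boolean conditions are combined with `&`, `|` and `~` (each condition in parentheses) rather than `and`, `or` and `not`. A minimal sketch using the sample table:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [25, 30, 35, 28],
    'City': ['London', 'Paris', 'Berlin', 'Madrid'],
})

# Position-based slice: end position is excluded
print(len(df.iloc[0:2]))  # 2

# Label-based slice: end label is included
print(len(df.loc[0:2]))   # 3

# OR two conditions together with |
mask = (df['Age'] > 30) | (df['City'] == 'London')
print(df[mask]['Name'].tolist())   # ['Alice', 'Charlie']

# Negate a mask with ~
print(df[~mask]['Name'].tolist())  # ['Bob', 'Diana']
```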
# Add a new column
df['Bonus'] = df['Salary'] * 0.1
# Modify existing column
df['Age'] = df['Age'] + 1
# Conditional column
df['Senior'] = df['Age'] >= 30
# Using apply with a function
df['Name_Upper'] = df['Name'].apply(str.upper)
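`apply` calls a Python function once per element. For common operations there is usually a vectorised equivalent, sketched below as an alternative rather than a replacement. The `Band` column and its labels are hypothetical, added only to show a two-way conditional with `np.where`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# String methods are available on the .str accessor
df['Name_Upper'] = df['Name'].str.upper()
print(df['Name_Upper'].tolist())  # ['ALICE', 'BOB']

# np.where picks one of two values per row based on a condition
# ('Band' is a hypothetical column name)
df['Band'] = np.where(df['Age'] >= 30, 'senior', 'junior')
print(df['Band'].tolist())        # ['junior', 'senior']
```

One practical difference: `.str.upper()` passes missing values through as missing, whereas `apply(str.upper)` raises an error when it meets a non-string value such as NaN.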