Exploratory Data Analysis (EDA) is the process of investigating a dataset to discover patterns, spot anomalies, test hypotheses, and check assumptions, primarily through statistical summaries and visualisations. The term is credited to the statistician John Tukey, who championed the approach in the 1970s. EDA is the bridge between raw data and formal modelling.
Before building any model, you need to understand your data.
Skipping EDA is like prescribing medicine without diagnosing the patient.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset
df = pd.read_csv('dataset.csv')
# Basic information
print(f"Shape: {df.shape}") # (rows, columns)
print(f"Columns: {df.columns.tolist()}")
print(f"Data types:\n{df.dtypes}")
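# First few rows (print() is needed outside a notebook, where bare expressions are not displayed)
print(df.head())
# Detailed info (df.info() prints its report directly and returns None)
df.info()
# Numeric summary
print(df.describe())
# Include non-numeric columns
print(df.describe(include='all'))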
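The calls above depend on a file named 'dataset.csv' that is not provided here. To try them without that file, the following is a minimal sketch that reads a small in-memory CSV instead. The data and the column names (age, city, income) are invented purely for illustration and are not part of the lesson's dataset.

```python
import io

import pandas as pd

# Hypothetical data standing in for 'dataset.csv'; the columns are made up.
csv_text = """age,city,income
34,Leeds,42000
29,York,
41,Leeds,58000
29,Bath,39000
"""

# read_csv accepts any file-like object, so StringIO works in place of a path.
df = pd.read_csv(io.StringIO(csv_text))

# Shape is a (rows, columns) tuple.
print(f"Shape: {df.shape}")

# dtypes shows how pandas parsed each column; the blank income value
# becomes NaN, which makes the income column float64.
print(f"Data types:\n{df.dtypes}")

# By default, describe() summarises numeric columns only.
numeric_summary = df.describe()
print(numeric_summary)

# include='all' adds non-numeric columns, reported with
# count, unique, top and freq rows.
full_summary = df.describe(include="all")
print(full_summary)
```

Comparing the two summaries shows why both are worth running: the default output omits the city column entirely, while the count row for income reveals that one value is missing.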