What is Data Visualisation
Data visualisation is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualisation tools provide an accessible way to see and understand trends, outliers, and patterns in data. It sits at the intersection of data science, graphic design, and communication — turning raw numbers into stories that drive decisions.
A Brief History
- 1786 — William Playfair invents the bar chart and the line graph in The Commercial and Political Atlas
- 1801 — Playfair introduces the pie chart in Statistical Breviary
- 1854 — John Snow maps cholera cases in London, demonstrating the power of spatial data visualisation
- 1858 — Florence Nightingale creates the polar area diagram (coxcomb chart) to advocate for military hospital reform
- 1869 — Charles Joseph Minard publishes his famous flow map of Napoleon's Russian campaign — often called the greatest statistical graphic ever drawn
- 1967 — Jacques Bertin publishes Semiology of Graphics, establishing a theoretical foundation for visual encoding
- 1977 — John Tukey publishes Exploratory Data Analysis, formalising EDA as a discipline
- 1983 — Edward Tufte publishes The Visual Display of Quantitative Information, a landmark work on chart design
- 2006 — Hans Rosling's Gapminder TED talk brings animated scatter plots to a global audience
- 2010s — Interactive visualisation libraries (D3.js, Plotly, Bokeh) democratise web-based charting
- Today — Data visualisation is a core skill in data science, journalism, business intelligence, and public health
Why Visualise Data?
1. The Human Visual System is Powerful
Humans perceive many visual properties, such as position, length, and colour, almost instantly and without conscious effort. A well-designed chart can therefore communicate in seconds what a table of numbers takes minutes to parse.
2. Discover Patterns and Relationships
Visualisation makes it easy to spot trends, clusters, correlations, and anomalies that might be invisible in raw data.
3. Communicate Findings Effectively
A clear visualisation bridges the gap between technical analysis and decision-makers who may not be fluent in statistics or code.
4. Facilitate Exploration
During exploratory data analysis (EDA), quick plots help you form hypotheses, check distributions, and guide your next steps.
The Data Visualisation Pipeline
| Stage | Description |
|---|---|
| Acquire | Obtain data from databases, APIs, CSV files, or web scraping |
| Parse | Clean and structure data so it is ready for plotting |
| Filter | Focus on the subset of data relevant to your question |
| Mine | Apply statistics or aggregations to extract key metrics |
| Represent | Choose an appropriate visual encoding (chart type, colour, shape) |
| Refine | Improve aesthetics, remove clutter, add labels and annotations |
| Interact | Add tooltips, zoom, filtering, or animation for exploration |
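The stages above can be sketched end to end in a few lines of pandas and Matplotlib. This is a minimal sketch under stated assumptions. The inline CSV, its columns (`date`, `region`, `sales`) and the question ("how do monthly sales in the North region change?") are invented for illustration. The Interact stage is omitted because it needs an interactive library such as Plotly or Bokeh.

```python
import io

import matplotlib
matplotlib.use("Agg")  # draw off-screen; call plt.show() in a notebook instead
import matplotlib.pyplot as plt
import pandas as pd

# Acquire: a tiny made-up CSV stands in for a database, API or file
RAW_CSV = """date,region,sales
2024-01-05,North,120
2024-01-19,South,95
2024-02-03,North,
2024-02-11,North,140
2024-02-20,South,110
2024-03-02,North,160
"""
raw = pd.read_csv(io.StringIO(RAW_CSV))

# Parse: convert the date strings and drop rows with a missing sales value
df = raw.assign(date=pd.to_datetime(raw["date"])).dropna(subset=["sales"])

# Filter: keep only the region the question is about
north = df[df["region"] == "North"]

# Mine: aggregate to monthly totals
monthly = north.groupby(north["date"].dt.to_period("M"))["sales"].sum()

# Represent: a line plot suits temporal data
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(monthly.index.to_timestamp(), monthly.values, marker="o")

# Refine: add labels and remove clutter
ax.set_title("Monthly sales, North region")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
fig.tight_layout()
```

Each stage maps to one or two lines here. On real data, the Parse and Filter stages usually take most of the effort.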
Types of Data
Understanding what type of data you have determines which charts are appropriate.
Quantitative (Numerical) Data
Data that can be measured and expressed as numbers.
| Sub-type | Examples | Suitable Charts |
|---|---|---|
| Continuous | Temperature, height, revenue | Histograms, line plots, scatter plots |
| Discrete | Number of orders, count of students | Bar charts, dot plots |
Categorical (Qualitative) Data
Data that represents groups or labels.
| Sub-type | Examples | Suitable Charts |
|---|---|---|
| Nominal | Country, colour, product category | Bar charts, pie charts, treemaps |
| Ordinal | Education level, satisfaction rating | Bar charts (ordered), heatmaps |
Temporal Data
Data that involves time — dates, timestamps, durations. Best shown with line plots, area charts, or timeline visualisations.
Geospatial Data
Data tied to locations — latitude/longitude, postcodes, country codes. Best shown with maps, choropleths, and bubble maps.
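A rough first pass at this classification can be automated from pandas dtypes. The helper `classify_column` and the `SUGGESTED_CHARTS` lookup are hypothetical names invented for this sketch, and the rules are heuristics. An integer column is assumed to hold discrete counts. Geospatial data is not handled, because it cannot be detected from the dtype alone: a latitude is just a float.

```python
import pandas as pd
from pandas.api.types import (
    is_bool_dtype,
    is_datetime64_any_dtype,
    is_float_dtype,
    is_integer_dtype,
)

# Chart suggestions copied from the tables above
SUGGESTED_CHARTS = {
    "continuous": "histogram, line plot, scatter plot",
    "discrete": "bar chart, dot plot",
    "nominal": "bar chart, pie chart, treemap",
    "ordinal": "ordered bar chart, heatmap",
    "temporal": "line plot, area chart, timeline",
}


def classify_column(s: pd.Series) -> str:
    """Heuristic data-type label for a column, based only on its dtype."""
    dtype = s.dtype
    if is_datetime64_any_dtype(dtype):
        return "temporal"
    if isinstance(dtype, pd.CategoricalDtype):
        # Ordered categoricals (e.g. low < medium < high) are ordinal
        return "ordinal" if dtype.ordered else "nominal"
    if is_bool_dtype(dtype):
        return "nominal"
    if is_integer_dtype(dtype):
        return "discrete"  # assumption: whole numbers are counts
    if is_float_dtype(dtype):
        return "continuous"
    return "nominal"  # strings and other objects are treated as labels


# Made-up example columns, one per data type
df = pd.DataFrame({
    "revenue": [120.5, 98.0, 143.2],
    "orders": [12, 9, 15],
    "country": ["UK", "FR", "DE"],
    "satisfaction": pd.Categorical(
        ["low", "high", "medium"],
        categories=["low", "medium", "high"],
        ordered=True,
    ),
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
})

for col in df.columns:
    kind = classify_column(df[col])
    print(f"{col:>12}: {kind:<10} -> {SUGGESTED_CHARTS[kind]}")
```

Treat the result as a starting point only. A float column of postcodes, or an integer rating that is really ordinal, still needs a human decision.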
Key Terminology
| Term | Definition |
|---|---|
| Mark | The geometric element representing data — a point, line, bar, or area |
| Channel | The visual property used to encode data — position, length, colour, size, angle |
| Encoding | The mapping of a data variable to a visual channel |
| Scale | A function mapping data values to visual values (e.g., linear, logarithmic) |
| Legend | A key explaining what colours, sizes, or shapes represent |
| Annotation | Text or markers added to highlight specific data points |
| Aspect ratio | The width-to-height ratio of a chart, which affects how trends appear |
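Most of these terms correspond directly to Matplotlib calls. In the minimal sketch below, the comments label which line supplies the mark, each channel encoding, the scale, the legend, the annotation and the aspect ratio. The dataset contains invented income, life-expectancy and population values for five unnamed countries, for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; call plt.show() in a notebook instead
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only, not real statistics
df = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "income": [1_200, 4_500, 15_000, 32_000, 60_000],
    "life_exp": [58, 64, 71, 78, 82],
    "population": [30, 120, 60, 10, 45],  # millions
    "region": ["South", "South", "East", "North", "North"],
})
colours = {"North": "tab:blue", "East": "tab:orange", "South": "tab:green"}

# Aspect ratio: an 8:5 figure (width:height, in inches)
fig, ax = plt.subplots(figsize=(8, 5))

for region, group in df.groupby("region"):
    # Mark: one point per row
    ax.scatter(
        group["income"],            # encoding: income -> x position
        group["life_exp"],          # encoding: life_exp -> y position
        s=group["population"] * 5,  # encoding: population -> marker area
        color=colours[region],      # encoding: region -> colour
        alpha=0.7,
        label=region,               # used by the legend below
    )

ax.set_xscale("log")  # scale: logarithmic mapping for the x axis
ax.set_xlabel("Income per person (log scale)")
ax.set_ylabel("Life expectancy (years)")
ax.legend(title="Region")  # legend: explains the colour encoding
ax.annotate(  # annotation: call out a single data point
    "Country E",
    xy=(60_000, 82),
    xytext=(5_000, 80),
    arrowprops={"arrowstyle": "->"},
)
fig.tight_layout()
```

The log scale is a design choice. Without it, the three lower-income points would be squeezed together at the left edge of the chart.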
Static vs Interactive Visualisation
| Feature | Static | Interactive |
|---|---|---|
| Format | PNG, PDF, print | HTML, dashboard, web app |
| Exploration | Fixed view | Zoom, pan, filter, hover |
| Tools | Matplotlib, Seaborn | Plotly, Bokeh, D3.js, Altair |
| Best for | Reports, papers, presentations | Dashboards, exploration, storytelling |
| Audience | Readers of a final report | Analysts who want to drill down |
The Python Visualisation Ecosystem
| Library | Type | Strengths |
|---|---|---|
| Matplotlib | Static | Fine-grained control, publication-quality figures |
| Seaborn | Static | Statistical plots with minimal code, built on Matplotlib |
| Plotly | Interactive | Rich interactivity, easy to embed in web apps |
| Bokeh | Interactive | Streaming and real-time data, server-backed dashboards |
| Altair | Declarative | Concise grammar-of-graphics API, Vega-Lite backend |
| Pandas | Static | Quick plots directly from DataFrames |
| Folium | Maps | Leaflet-based interactive maps |
| Streamlit | Dashboards | Turn Python scripts into shareable web apps |
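Several libraries in this table are built on top of Matplotlib. For example, pandas' `DataFrame.plot` returns an ordinary Matplotlib `Axes`. You can therefore start with a one-line pandas plot and refine it with Matplotlib calls. The quarterly figures below are made up for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; call plt.show() in a notebook instead
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes

# Made-up quarterly figures, for illustration only
df = pd.DataFrame(
    {"online": [40, 55, 62, 70], "in_store": [80, 72, 65, 60]},
    index=["Q1", "Q2", "Q3", "Q4"],
)

# Pandas: one call draws a grouped bar chart (one bar per cell)
ax = df.plot(kind="bar", rot=0, title="Sales by channel")

# The return value is a Matplotlib Axes, so Matplotlib methods refine it
ax.set_xlabel("Quarter")
ax.set_ylabel("Sales (units)")
ax.figure.tight_layout()
```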
Anscombe's Quartet — Why Visualisation Matters
In 1973, statistician Francis Anscombe constructed four datasets that have nearly identical summary statistics (mean, variance, correlation, regression line) yet look completely different when plotted. This is a powerful demonstration of why you should always visualise your data before relying solely on summary statistics.
```python
import seaborn as sns
import matplotlib.pyplot as plt

# Load Anscombe's quartet (columns: dataset, x, y); downloaded on first use
df = sns.load_dataset("anscombe")

# Plot all four datasets, each with its fitted regression line
g = sns.lmplot(x="x", y="y", col="dataset",
               data=df, col_wrap=2,
               height=4, aspect=1)
g.set_titles("Dataset {col_name}")
plt.suptitle("Anscombe's Quartet", y=1.02)
plt.show()
```
Tip: If two datasets have the same mean, standard deviation, and correlation, they are NOT necessarily the same. Always plot before you model.
Summary
Data visualisation transforms raw data into visual stories. It leverages the power of human perception to reveal patterns, communicate findings, and guide decisions. The field has evolved from hand-drawn charts in the 18th century to interactive web-based dashboards today. In this course, you will learn the principles of effective visual design, master the major Python visualisation libraries — Matplotlib, Seaborn, Plotly, and Pandas — and develop the skills to build dashboards, geospatial maps, and data-driven narratives.