Named Entity Recognition (NER) is the task of identifying and classifying named entities — specific real-world objects such as people, organisations, locations, dates, and more — within text. NER is a fundamental building block for information extraction, question answering, and knowledge graph construction.
| Entity Type | Tag | Examples |
|---|---|---|
| Person | PER / PERSON | "Albert Einstein", "Marie Curie" |
| Organisation | ORG | "Google", "United Nations", "Oxford University" |
| Location | LOC / GPE | "London", "Mount Everest", "France" |
| Date / Time | DATE / TIME | "14 March 2023", "next Monday" |
| Money | MONEY | "£50", "$1.2 million" |
| Percentage | PERCENT | "25%", "three quarters" |
| Product | PRODUCT | "iPhone", "Windows 11" |
| Event | EVENT | "World Cup", "COP26" |
Rule-based NER uses hand-crafted patterns and gazetteers (lists of known entities) to match entity mentions directly in text.
| Advantage | Disadvantage |
|---|---|
| No training data needed | Cannot generalise to unseen entities |
| High precision for known patterns | Extremely labour-intensive |
| Fully explainable | Breaks with new domains |
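A minimal sketch of the rule-based approach, combining a gazetteer lookup with a regular expression for dates (the entity lists and pattern here are illustrative, not from any particular system):

```python
import re

# Hypothetical gazetteer: a hand-maintained list of known entities
GAZETTEER = {
    "Albert Einstein": "PERSON",
    "Google": "ORG",
    "London": "LOC",
}

# A hand-crafted pattern for dates like "14 March 2023"
DATE_PATTERN = re.compile(
    r"\b\d{1,2} (January|February|March|April|May|June|"
    r"July|August|September|October|November|December) \d{4}\b"
)

def rule_based_ner(text):
    """Return (span, label) pairs via gazetteer lookup and regex matching."""
    entities = []
    for name, label in GAZETTEER.items():
        if name in text:  # exact string match: cannot generalise to unseen names
            entities.append((name, label))
    for match in DATE_PATTERN.finditer(text):
        entities.append((match.group(), "DATE"))
    return entities

print(rule_based_ner("Albert Einstein visited London on 14 March 2023."))
```

Note how the exact-match lookup illustrates the main disadvantage in the table above: a name missing from the gazetteer is simply invisible to the system.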
Statistical NER treats the task as sequence labelling: a model learns from annotated corpora to assign an entity tag to each token.

| Algorithm | Description |
|---|---|
| Hidden Markov Model (HMM) | Generative sequence model |
| Conditional Random Fields (CRF) | Discriminative sequence model — considers neighbouring labels |
| Maximum Entropy Markov Model | Discriminative model with feature engineering |
Features typically include the word itself, its capitalisation and shape, prefixes and suffixes, part-of-speech tags, surrounding words, and gazetteer membership.
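Such features are usually extracted per token as a dictionary, which is what a CRF implementation would consume. A small sketch (feature names are illustrative):

```python
def token_features(tokens, i):
    """Hand-engineered features for token i, of the kind fed to a CRF."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),  # capitalisation is a strong NER cue
        "word.isdigit": word.isdigit(),
        "prefix3": word[:3],
        "suffix3": word[-3:],
        # context features let the model use neighbouring words
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

tokens = "Marie Curie worked in Paris".split()
print(token_features(tokens, 1))  # features for "Curie"
```

A CRF then scores entire tag sequences over these per-token features, which is how it accounts for neighbouring labels.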
Modern NER systems use neural networks — particularly BiLSTM-CRF and Transformer-based models.
| Model | Description |
|---|---|
| BiLSTM-CRF | Bi-directional LSTM + CRF layer for sequence labelling |
| BERT + Token Classification | Pre-trained transformer fine-tuned for NER |
| spaCy NER | Efficient CNN/Transformer-based NER built into spaCy |
Using spaCy's built-in NER:

```python
import spacy

# Load the small English pipeline
# (install first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

text = "Apple was founded by Steve Jobs in Cupertino, California on 1 April 1976."
doc = nlp(text)

# doc.ents holds the detected entity spans
for ent in doc.ents:
    print(f"{ent.text:20s} {ent.label_:10s} {spacy.explain(ent.label_)}")
```
Output:

```
Apple                ORG        Companies, agencies, institutions, etc.
Steve Jobs           PERSON     People, including fictional
Cupertino            GPE        Countries, cities, states
California           GPE        Countries, cities, states
1 April 1976         DATE       Absolute or relative dates or periods
```
| Label | Description | Example |
|---|---|---|
| PERSON | People | "Steve Jobs" |
| ORG | Organisations | "Apple" |
| GPE | Geopolitical entities | "California" |
| LOC | Non-GPE locations | "Mount Everest" |
| DATE | Dates | "1 April 1976" |
| TIME | Times | "3:30 pm" |
| MONEY | Monetary values | "$500" |
| CARDINAL | Numerals | "three" |
| ORDINAL | Ordinal numbers | "first" |
| NORP | Nationalities, religions, political groups | "British" |
The same task with a Transformer model via the Hugging Face `pipeline`:

```python
from transformers import pipeline

# BERT fine-tuned on CoNLL-2003;
# aggregation_strategy="simple" merges sub-word pieces into whole entities
ner_pipeline = pipeline(
    "ner",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    aggregation_strategy="simple",
)

text = "Albert Einstein was born in Ulm, Germany and later worked at Princeton University."
entities = ner_pipeline(text)

for ent in entities:
    print(f"{ent['word']:20s} {ent['entity_group']:10s} {ent['score']:.4f}")
```
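Each prediction carries a confidence score, so a common post-processing step is to drop low-confidence entities. A sketch of that filtering, run on a hard-coded sample in the pipeline's output format (the words and scores below are illustrative, not actual model output):

```python
# Illustrative sample in the transformers "ner" pipeline's aggregated format
sample_entities = [
    {"word": "Albert Einstein", "entity_group": "PER", "score": 0.9991},
    {"word": "Ulm", "entity_group": "LOC", "score": 0.9978},
    {"word": "Germany", "entity_group": "LOC", "score": 0.9995},
    {"word": "Princeton University", "entity_group": "ORG", "score": 0.6200},
]

def filter_entities(entities, threshold=0.9):
    """Keep only predictions the model is confident about."""
    return [e for e in entities if e["score"] >= threshold]

for e in filter_entities(sample_entities):
    print(f"{e['word']:20s} {e['entity_group']}")
```

The right threshold depends on whether your application prefers precision (raise it) or recall (lower it).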