
What is Natural Language Processing

Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language. It sits at the intersection of computer science, linguistics, and machine learning.


A Brief History

  • 1950 — Alan Turing publishes Computing Machinery and Intelligence, proposing the Turing Test as a measure of machine intelligence
  • 1954 — The Georgetown-IBM experiment demonstrates automatic translation of over 60 Russian sentences into English
  • 1966 — ELIZA, an early chatbot, is created at MIT by Joseph Weizenbaum
  • 1970s — Conceptual ontologies and early knowledge-based systems emerge
  • 1980s — Statistical approaches begin to replace hand-crafted rules
  • 1990s — Corpus-based methods and probabilistic models (Hidden Markov Models, n-gram language models) gain prominence
  • 2001 — The first neural language model is proposed by Bengio et al.
  • 2013 — Word2Vec is released by Mikolov et al. at Google, popularising word embeddings
  • 2014 — Sequence-to-sequence models with attention transform machine translation
  • 2017 — The Transformer architecture is introduced in Attention Is All You Need
  • 2018 — BERT is released by Google, setting new state-of-the-art results on many NLP benchmarks
  • 2020 — GPT-3 demonstrates remarkable few-shot language generation capabilities
  • 2022–2024 — Large Language Models (LLMs) like ChatGPT, Gemini, and Claude become mainstream
  • Today — NLP underpins search engines, virtual assistants, content moderation, translation services, and more

What is Natural Language Processing?

At its core, NLP is about teaching machines to work with human language — text and speech — so they can perform useful tasks. Language is inherently complex: it is ambiguous, context-dependent, constantly evolving, and deeply tied to culture.

Why is Language Hard for Computers?

| Challenge | Example |
| --- | --- |
| Ambiguity | "I saw her duck" — did she duck, or did I see her duck (the bird)? |
| Context | "It's cold" — weather? food? a person's attitude? |
| Sarcasm | "Oh great, another meeting" — positive words, negative meaning |
| Synonymy | "happy", "glad", "cheerful" — different words, same meaning |
| Polysemy | "bank" — river bank or financial bank? |
| Coreference | "Alice told Bob she was tired" — who is "she"? |

Approaches to NLP

NLP has evolved through several paradigms:

1. Rule-Based Approaches

Early NLP systems relied on hand-crafted rules and grammars written by linguists. For example, a rule might say: "If a sentence starts with 'Wh-' it is a question."

| Advantage | Disadvantage |
| --- | --- |
| Precise for well-defined cases | Brittle — cannot handle exceptions or variation |
| Explainable | Extremely labour-intensive to build and maintain |
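The Wh- rule above can be sketched in a few lines of Python. This is a toy illustration, not how production rule systems were built, but it shows both the appeal (transparent, precise) and the brittleness of hand-written rules:

```python
# A toy rule-based question detector: two hand-crafted rules, no learning.
WH_WORDS = ("what", "who", "where", "when", "why", "which", "how")

def is_question(sentence: str) -> bool:
    """Classify a sentence as a question using hand-written rules."""
    s = sentence.strip().lower()
    # Rule 1: the sentence ends with a question mark
    if s.endswith("?"):
        return True
    # Rule 2: the sentence starts with a Wh- word
    return s.startswith(WH_WORDS)

print(is_question("Where is the station"))    # True
print(is_question("Can you help?"))           # True
print(is_question("The station is closed."))  # False
```

Note how easily the rules misfire: "How wonderful this is!" starts with a Wh- word but is not a question — exactly the brittleness listed in the table above.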

2. Statistical / Machine Learning Approaches

From the 1990s onward, NLP shifted towards learning patterns from data using statistical models and classical machine learning.

| Technique | Use Case |
| --- | --- |
| Naive Bayes | Text classification |
| Hidden Markov Models | Part-of-speech tagging |
| Conditional Random Fields | Named entity recognition |
| TF-IDF + Logistic Regression | Sentiment analysis |
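To make the statistical idea concrete, here is a minimal Naive Bayes text classifier written from scratch with add-one (Laplace) smoothing. The four training sentences are invented for illustration; in practice you would use a library implementation such as scikit-learn's `MultinomialNB`:

```python
import math
from collections import Counter, defaultdict

# Tiny invented training set: (text, label) pairs
train = [
    ("great product love it", "pos"),
    ("excellent quality very happy", "pos"),
    ("terrible waste of money", "neg"),
    ("awful broke after one day", "neg"),
]

class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)          # label -> word frequencies
for text, label in train:
    word_counts[label].update(text.split())

vocab = {w for counter in word_counts.values() for w in counter}

def predict(text):
    """Return the label with the highest log-probability."""
    scores = {}
    for label in class_counts:
        # log prior: how common the class is
        score = math.log(class_counts[label] / len(train))
        total = sum(word_counts[label].values())
        for w in text.split():
            # log likelihood with add-one (Laplace) smoothing
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("great quality"))   # pos
print(predict("terrible awful"))  # neg
```

Everything is learned from data: swap in a different training set and the same code classifies different categories — the key advantage over hand-written rules.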

3. Deep Learning Approaches

Modern NLP is dominated by neural network models — recurrent neural networks (RNNs), convolutional neural networks (CNNs), and most importantly, Transformers.

| Architecture | Breakthrough |
| --- | --- |
| RNN / LSTM | Sequential modelling of text |
| Seq2Seq + Attention | Machine translation |
| Transformer | Parallelisable, scalable, state-of-the-art on virtually all NLP tasks |
| Pre-trained LLMs (BERT, GPT) | Transfer learning for NLP |
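The core operation of the Transformer is scaled dot-product attention: softmax(QKᵀ/√d) V. A minimal NumPy sketch (shapes and random values chosen purely for illustration) shows how each token's output becomes a weighted mix of all tokens' values:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 tokens, embedding dimension 4
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))

out = attention(Q, K, V)
print(out.shape)   # (3, 4): one context-mixed vector per token
```

Because every token attends to every other token in one matrix multiplication, the computation parallelises across the whole sequence — the property that made Transformers scale where RNNs could not.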

Core NLP Tasks

| Task | Description | Example |
| --- | --- | --- |
| Tokenisation | Splitting text into individual words or subwords | "Hello world" → ["Hello", "world"] |
| Part-of-Speech Tagging | Labelling each word with its grammatical role | "The cat sat" → [DET, NOUN, VERB] |
| Named Entity Recognition | Identifying names, places, dates, etc. | "London" → LOCATION |
| Sentiment Analysis | Determining the emotional tone of text | "Great product!" → Positive |
| Machine Translation | Translating text between languages | English → French |
| Text Summarisation | Producing a shorter version of a document | Long article → Summary |
| Question Answering | Answering questions given a context passage | "Who wrote Hamlet?" → "Shakespeare" |
| Text Generation | Producing coherent text from a prompt | Chatbots, story generation |

Python Libraries for NLP

| Library | Purpose |
| --- | --- |
| NLTK | Classic NLP library — tokenisation, stemming, corpora, educational use |
| spaCy | Industrial-strength NLP — fast, production-ready pipelines |
| Hugging Face Transformers | Pre-trained transformer models (BERT, GPT, T5, etc.) |
| Gensim | Topic modelling and word embeddings (Word2Vec, Doc2Vec) |
| TextBlob | Simple API for common NLP tasks (sentiment, translation) |
| Scikit-Learn | TF-IDF vectorisation, classical ML classifiers |

A First NLP Example in Python

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

nltk.download('punkt_tab')
nltk.download('stopwords')

text = "Natural Language Processing enables computers to understand human language."

# Tokenise
tokens = word_tokenize(text)
print("Tokens:", tokens)

# Remove stop words
stop_words = set(stopwords.words('english'))
filtered = [w for w in tokens if w.lower() not in stop_words]
print("Filtered:", filtered)

Output:

Tokens: ['Natural', 'Language', 'Processing', 'enables', 'computers', 'to', 'understand', 'human', 'language', '.']
Filtered: ['Natural', 'Language', 'Processing', 'enables', 'computers', 'understand', 'human', 'language', '.']
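A natural next step after tokenisation and stop-word removal is counting token frequencies — the basis of a bag-of-words representation. This continuation uses only the standard library (the `filtered` list is copied from the output above; punctuation is dropped with `str.isalpha`):

```python
from collections import Counter

# Tokens from the previous example, after stop-word removal
filtered = ['Natural', 'Language', 'Processing', 'enables',
            'computers', 'understand', 'human', 'language', '.']

# Lowercase and drop punctuation, then count occurrences
counts = Counter(w.lower() for w in filtered if w.isalpha())
print(counts.most_common(3))
# [('language', 2), ('natural', 1), ('processing', 1)]
```

These raw counts are exactly what feeds techniques like TF-IDF and Naive Bayes classification mentioned earlier.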

Real-World Applications

Healthcare

  • Clinical note analysis and medical coding
  • Drug interaction detection from literature

Business

  • Customer feedback analysis (sentiment, topics)
  • Intelligent chatbots and virtual assistants

Legal

  • Contract analysis and clause extraction
  • Legal document search and summarisation

Education

  • Automated essay scoring
  • Language learning assistants

Media & Social

  • Content moderation and hate speech detection
  • Fake news detection

When to Use NLP

NLP is the right tool when:

  • You have unstructured text data (emails, reviews, documents, social media)
  • The task requires understanding meaning — not just pattern matching
  • You need to process text at scale (thousands or millions of documents)
  • A human would need to read and interpret the text

Avoid NLP when:

  • The data is purely numerical or structured (use traditional ML instead)
  • A simple keyword search or regular expression suffices
  • You lack sufficient text data for the language or domain
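As an example of the second point: extracting a well-defined pattern such as an email address needs no language understanding at all — a regular expression does the job. (The pattern below is deliberately simplified for illustration.)

```python
import re

text = "Contact alice@example.com or bob@example.org for details."

# A simplified email pattern -- sufficient here, not RFC-complete
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
print(emails)   # ['alice@example.com', 'bob@example.org']
```

Reaching for an NLP model here would add cost and failure modes without adding accuracy.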

Summary

Natural Language Processing is a field of AI dedicated to teaching computers to work with human language. It has evolved from hand-crafted rules through statistical methods to modern deep learning with Transformers. Key NLP tasks include tokenisation, classification, sentiment analysis, named entity recognition, machine translation, and text generation. Python offers a rich ecosystem — NLTK for learning, spaCy for production, and Hugging Face Transformers for state-of-the-art models — making NLP accessible to beginners and practitioners alike.