Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language. It sits at the intersection of computer science, linguistics, and machine learning.
At its core, NLP is about teaching machines to work with human language — text and speech — so they can perform useful tasks. Language is inherently complex: it is ambiguous, context-dependent, constantly evolving, and deeply tied to culture.
| Challenge | Example |
|---|---|
| Ambiguity | "I saw her duck" — did she duck, or did I see her duck (the bird)? |
| Context | "It's cold" — weather? food? a person's attitude? |
| Sarcasm | "Oh great, another meeting" — positive words, negative meaning |
| Synonymy | "happy", "glad", "cheerful" — different words, same meaning |
| Polysemy | "bank" — river bank or financial bank? |
| Coreference | "Alice told Bob she was tired" — who is "she"? |
NLP has evolved through several paradigms:
Early NLP systems relied on hand-crafted rules and grammars written by linguists. For example, a rule might say: "If a sentence starts with 'Wh-' it is a question."
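To make this concrete, here is a toy rule-based question detector in the spirit of those early systems. The specific rules are illustrative assumptions, not a reconstruction of any real historical system:

```python
# A toy rule-based "question detector". Each rule is hand-written,
# just as linguists once hand-wrote grammars for early NLP systems.

WH_WORDS = {"who", "what", "when", "where", "why", "which", "how"}
AUX_VERBS = {"do", "does", "did", "is", "are", "can", "could", "will"}

def is_question(sentence: str) -> bool:
    words = sentence.strip().rstrip("?!.").lower().split()
    if not words:
        return False
    # Rule 1: sentence starts with a wh-word ("Where is...")
    if words[0] in WH_WORDS:
        return True
    # Rule 2: sentence starts with an auxiliary verb ("Do you...")
    if words[0] in AUX_VERBS:
        return True
    # Rule 3: sentence ends with a question mark
    return sentence.strip().endswith("?")

print(is_question("Where is the station"))    # True (Rule 1)
print(is_question("The cat sat on the mat"))  # False
```

Note how brittle this is: "Guess who arrived" is wrongly flagged by nothing, while "Tell me where it is" is missed entirely. Every exception needs yet another hand-written rule, which is exactly why the field moved on.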
| Advantage | Disadvantage |
|---|---|
| Precise for well-defined cases | Brittle — cannot handle exceptions or variation |
| Explainable | Extremely labour-intensive to build and maintain |
From the 1990s onward, NLP shifted towards learning patterns from data using statistical models and classical machine learning.
| Technique | Use Case |
|---|---|
| Naive Bayes | Text classification |
| Hidden Markov Models | Part-of-speech tagging |
| Conditional Random Fields | Named entity recognition |
| TF-IDF + Logistic Regression | Sentiment analysis |
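As a sketch of the statistical approach, here is a from-scratch Naive Bayes sentiment classifier with add-one (Laplace) smoothing. The four training sentences are invented for illustration; a real system would train on thousands of labelled examples:

```python
import math
from collections import Counter, defaultdict

# Tiny invented training corpus: (text, label) pairs
train = [
    ("great product love it", "pos"),
    ("happy with this purchase", "pos"),
    ("terrible waste of money", "neg"),
    ("awful do not buy", "neg"),
]

# Count word frequencies per class
word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior: P(class)
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for w in text.split():
            # Log likelihood with add-one smoothing: P(word | class)
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict("love this great purchase"))  # pos
print(predict("terrible awful waste"))      # neg
```

The "naive" part is the assumption that words are conditionally independent given the class, which is false for real language but works surprisingly well for classification.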
Modern NLP is dominated by neural network models — recurrent neural networks (RNNs), convolutional neural networks (CNNs), and most importantly, Transformers.
| Architecture | Breakthrough |
|---|---|
| RNN / LSTM | Sequential modelling of text |
| Seq2Seq + Attention | Machine translation |
| Transformer | Parallelisable, scalable, state-of-the-art on virtually all NLP tasks |
| Pre-trained LLMs (BERT, GPT) | Transfer learning for NLP |
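The operation at the heart of the Transformer is scaled dot-product attention: every position attends to every other position in a single matrix multiplication, which is what makes the architecture parallelisable. A minimal NumPy sketch (random matrices stand in for learned query/key/value projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights          # weighted mix of values, plus weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 token positions, model dimension 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))

out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)        # (4, 8): one output vector per position
print(w.sum(axis=-1))   # each row of attention weights sums to 1
```

Unlike an RNN, nothing here processes tokens one at a time: all four positions are handled in one pass, so the computation scales to long sequences and large batches on modern hardware.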
| Task | Description | Example |
|---|---|---|
| Tokenisation | Splitting text into individual words or subwords | "Hello world" → ["Hello", "world"] |
| Part-of-Speech Tagging | Labelling each word with its grammatical role | "The cat sat" → [DET, NOUN, VERB] |
| Named Entity Recognition | Identifying names, places, dates, etc. | "London" → LOCATION |
| Sentiment Analysis | Determining the emotional tone of text | "Great product!" → Positive |
| Machine Translation | Translating text between languages | English → French |
| Text Summarisation | Producing a shorter version of a document | Long article → Summary |
| Question Answering | Answering questions given a context passage | "Who wrote Hamlet?" → "Shakespeare" |
| Text Generation | Producing coherent text from a prompt | Chatbots, story generation |
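The first task in the table, tokenisation, can be sketched in a few lines with a regular expression. Real tokenisers (for example the subword/BPE tokenisers used by Transformer models) are considerably more involved, but the core idea is the same:

```python
import re

# A minimal regex tokeniser: runs of word characters, or single
# punctuation marks, become separate tokens.
def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Hello world!"))  # ['Hello', 'world', '!']
print(tokenize("I saw her duck."))  # ['I', 'saw', 'her', 'duck', '.']
```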
| Library | Purpose |
|---|---|
| NLTK | Classic NLP library: tokenisation, stemming, corpora, educational use |
| spaCy | Industrial-strength NLP: fast, production-ready pipelines |
| Hugging Face Transformers | Pre-trained transformer models (BERT, GPT, T5, etc.) |
| Gensim | Topic modelling and word embeddings (Word2Vec, Doc2Vec) |
| TextBlob | Simple API for common NLP tasks (sentiment, translation) |
| Scikit-Learn | TF-IDF vectorisation, classical ML classifiers |
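To see what Scikit-Learn's TF-IDF vectorisation automates, here is the textbook formula computed by hand on a three-document toy corpus. (Scikit-Learn's `TfidfVectorizer` uses a smoothed IDF and L2-normalises rows by default, so its numbers differ slightly from this classic definition.)

```python
import math

# Toy corpus: three invented documents
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [d.split() for d in docs]
N = len(docs)

def tf_idf(term, doc_tokens):
    # Term frequency: how often the term appears in this document
    tf = doc_tokens.count(term) / len(doc_tokens)
    # Document frequency: how many documents contain the term
    df = sum(1 for d in tokenized if term in d)
    # Rare-across-the-corpus terms get a higher IDF weight
    return tf * math.log(N / df)

print(round(tf_idf("cat", tokenized[0]), 3))  # 0.183 -- distinctive word
print(round(tf_idf("the", tokenized[0]), 3))  # 0.135 -- frequent but common
```

Words that occur in every document get an IDF of log(1) = 0, so ubiquitous function words are automatically down-weighted without a hand-written stop-word list.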
```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

nltk.download('punkt_tab')
nltk.download('stopwords')

text = "Natural Language Processing enables computers to understand human language."

# Tokenise
tokens = word_tokenize(text)
print("Tokens:", tokens)

# Remove stop words
stop_words = set(stopwords.words('english'))
filtered = [w for w in tokens if w.lower() not in stop_words]
print("Filtered:", filtered)
```

Output:

```
Tokens: ['Natural', 'Language', 'Processing', 'enables', 'computers', 'to', 'understand', 'human', 'language', '.']
Filtered: ['Natural', 'Language', 'Processing', 'enables', 'computers', 'understand', 'human', 'language', '.']
```
NLP is the right tool when:

- Your data is unstructured text or speech: reviews, emails, documents, support tickets, transcripts.
- The task involves the meaning of language: classification, extraction, translation, summarisation, generation.
- The variety of inputs is too large to cover with hand-written rules.

Avoid NLP when:

- Your data is already structured (numeric tables, fixed-format fields) and can be queried directly.
- A simple exact-match or regular-expression rule fully solves the problem.
- You cannot tolerate the occasional errors that statistical and neural models inevitably make.
Natural Language Processing is a field of AI dedicated to teaching computers to work with human language. It has evolved from hand-crafted rules through statistical methods to modern deep learning with Transformers. Key NLP tasks include tokenisation, classification, sentiment analysis, named entity recognition, machine translation, and text generation. Python offers a rich ecosystem — NLTK for learning, spaCy for production, and Hugging Face Transformers for state-of-the-art models — making NLP accessible to beginners and practitioners alike.