This lesson covers big data for the OCR A-Level Computer Science (H446) specification. You need to understand the characteristics of big data, the difference between structured and unstructured data, and how techniques like data mining and machine learning are applied.
Big data refers to datasets that are so large, complex, or rapidly changing that traditional data processing methods are inadequate. Big data requires specialised tools, techniques, and infrastructure.
| Characteristic | Description | Example |
|---|---|---|
| Volume | The sheer amount of data generated and stored. | Facebook generates over 4 petabytes of data per day. |
| Velocity | The speed at which new data is generated and must be processed. | Stock market trades processed in milliseconds; real-time sensor data. |
| Variety | The different types and formats of data. | Text, images, videos, GPS coordinates, social media posts, sensor readings. |
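Velocity is the reason big data tools often process records as they arrive rather than storing everything first and analysing it later. As a minimal sketch (the sensor values are made up for illustration), the Python generator below keeps only a running count and total. It can therefore report an up-to-date average for an unbounded stream using a constant amount of memory:

```python
from typing import Iterable, Iterator


def running_mean(readings: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all readings seen so far, one value per reading.

    Only a count and a total are kept, so memory use stays constant
    however long the stream runs.
    """
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


# Hypothetical sensor readings arriving one at a time.
sensor_stream = iter([20.0, 22.0, 21.0, 25.0])
for mean in running_mean(sensor_stream):
    print(f"mean so far: {mean:.2f}")
```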
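Many definitions add two further Vs:

| Characteristic | Description |
|---|---|
| Veracity | The accuracy and trustworthiness of the data. Sources can contain errors, duplicates, bias, or missing values, so not all data is reliable. |
| Value | The useful insights, for a business or other organisation, that can be extracted from the data. |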
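Veracity problems are usually tackled by validating data before it is analysed. The sketch below shows one simple approach. It assumes a temperature sensor whose plausible range is -50 °C to 60 °C (an illustrative assumption, not a standard). The function discards missing readings and any value outside that range, such as the obvious fault value 999.0:

```python
from typing import List, Optional

# Assumed plausible range for an outdoor temperature sensor (illustrative only).
MIN_C = -50.0
MAX_C = 60.0


def clean_readings(readings: List[Optional[float]]) -> List[float]:
    """Drop missing readings and any value outside the plausible range."""
    return [r for r in readings if r is not None and MIN_C <= r <= MAX_C]


raw = [18.5, None, 19.2, 999.0, -12.0, 21.7]  # None = missing; 999.0 looks like a fault value
print(clean_readings(raw))  # [18.5, 19.2, -12.0, 21.7]
```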
Data can also be classified by how it is organised:

| Feature | Structured Data | Unstructured Data |
|---|---|---|
| Format | Organised in tables with defined fields and data types | No predefined format or schema |
| Storage | Relational databases (SQL) | Data lakes, NoSQL databases, file systems |
| Examples | Customer records, financial transactions, exam results | Emails, social media posts, images, videos, audio |
| Querying | Easy to query with SQL | Requires specialised techniques such as natural language processing (NLP) or image recognition |
| Proportion | Commonly estimated at around 20% of all data | Commonly estimated at around 80% of all data |
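To illustrate why structured data is easy to query, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `results` table and its contents are hypothetical:

```python
import sqlite3

# In-memory database; the table, columns and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results ("
    "student_id INTEGER PRIMARY KEY, name TEXT NOT NULL, mark INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO results (student_id, name, mark) VALUES (?, ?, ?)",
    [(1, "Asha", 72), (2, "Ben", 58), (3, "Chloe", 85)],
)

# Every row has the same fields and types, so one SQL query filters and sorts them all.
rows = conn.execute(
    "SELECT name, mark FROM results WHERE mark >= ? ORDER BY mark DESC", (70,)
).fetchall()
print(rows)  # [('Chloe', 85), ('Asha', 72)]
conn.close()
```

A production system would more likely use a server-based database, but the principle is the same: a fixed schema lets a single declarative query work across every record.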
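Semi-structured data has some organisation but does not fit neatly into tables. Examples include JSON and XML files, where keys or tags label each value but records need not share the same fields, and emails, which have structured headers (sender, date, subject) but an unstructured body.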
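JSON is a common example of semi-structured data. In this sketch (the posts are invented for illustration), both records are valid JSON objects. However, only the first has a `tags` field and only the second has a `location` field, so the code uses `dict.get()` and `in` to handle fields that may be missing:

```python
import json

# Two hypothetical social media posts that do not share exactly the same fields.
raw = """
[
  {"user": "asha", "text": "Revision done!", "tags": ["cs", "alevel"]},
  {"user": "ben", "text": "Mock exam tomorrow", "location": {"lat": 51.5, "lon": -0.1}}
]
"""

posts = json.loads(raw)  # parses into a list of Python dicts
for post in posts:
    # .get() supplies a default when a record lacks the field.
    tags = post.get("tags", [])
    has_location = "location" in post
    print(post["user"], tags, has_location)
```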
Data mining is the process of analysing large datasets to discover patterns, trends, and relationships that are not immediately obvious.
| Technique | Description | Example |
|---|---|---|
| Classification | Categorising data into predefined groups. | Classifying emails as spam or not spam. |
| Clustering | Grouping similar data points without predefined categories. | Grouping customers by purchasing behaviour. |
| Association rules | Finding relationships between items. | "Customers who buy bread often also buy butter." |
| Anomaly detection | Identifying unusual data points. | Detecting fraudulent bank transactions. |
| Prediction | Using historical data to forecast future outcomes. | Predicting which customers are likely to cancel a subscription. |
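Association rules are commonly measured with two values. Support is the fraction of all transactions that contain every item in the rule. Confidence is the fraction of transactions containing the left-hand side that also contain the right-hand side. The sketch below computes both for the rule bread → butter over a handful of made-up baskets. Algorithms such as Apriori use the same measures but search efficiently through many candidate rules instead of checking a single one:

```python
from typing import AbstractSet, FrozenSet, List

# Hypothetical shopping baskets (made-up data for illustration only).
baskets: List[FrozenSet[str]] = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "jam"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread", "butter", "eggs"}),
]


def count(items: AbstractSet[str], data: List[FrozenSet[str]]) -> int:
    """Number of baskets containing every item in `items`."""
    return sum(1 for basket in data if items <= basket)


def support(items: AbstractSet[str], data: List[FrozenSet[str]]) -> float:
    """Fraction of all baskets containing every item in `items`."""
    return count(items, data) / len(data)


def confidence(lhs: AbstractSet[str], rhs: AbstractSet[str], data: List[FrozenSet[str]]) -> float:
    """Of the baskets containing `lhs`, the fraction that also contain `rhs`."""
    lhs_count = count(lhs, data)
    if lhs_count == 0:
        return 0.0  # the rule never applies, so report no confidence
    return count(set(lhs) | set(rhs), data) / lhs_count


print(support({"bread", "butter"}, baskets))       # 0.6
print(confidence({"bread"}, {"butter"}, baskets))  # 0.75
```

Here 3 of the 5 baskets contain both bread and butter (support 0.6), and 3 of the 4 baskets containing bread also contain butter (confidence 0.75).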
Data mining is applied in many sectors:

| Application | Description |
|---|---|
| Retail | Market basket analysis, personalised recommendations. |
| Healthcare | Disease pattern recognition, drug interaction analysis. |
| Finance | Fraud detection, credit risk assessment. |
| Social media | Sentiment analysis, trend prediction. |
| Science | Genomics research, climate modelling. |
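Fraud detection is often framed as anomaly detection: the aim is to flag transactions that differ sharply from a customer's normal behaviour. The sketch below assumes a simple z-score rule. This is only one of many possible techniques and is much cruder than production fraud systems. The function builds a baseline from a hypothetical customer's past spending and flags a new amount that lies more than three standard deviations from the mean:

```python
import statistics
from typing import List

# Hypothetical past card transactions for one customer, in pounds.
history = [12.50, 8.99, 15.00, 11.20, 9.75, 14.10, 10.40, 13.30]


def is_anomalous(amount: float, past: List[float], threshold: float = 3.0) -> bool:
    """Flag `amount` if it lies more than `threshold` standard deviations from the past mean."""
    mean = statistics.mean(past)
    spread = statistics.stdev(past)  # sample standard deviation (needs at least two values)
    if spread == 0:
        return amount != mean  # any change from a perfectly constant history is unusual
    return abs(amount - mean) / spread > threshold


for new_amount in [11.80, 950.00]:
    print(new_amount, is_anomalous(new_amount, history))
```

It matters that the baseline comes only from past transactions and excludes the suspicious amount. If it were included, a single huge outlier would inflate the standard deviation and could hide itself.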