An introduction to big data
How analysing huge amounts of information can impact our lives
This guide looks at what big data actually is, and how this information can be used in many different ways.
What is big data?
In its simplest form, the term ‘big data’ just refers to massive sets of information. However, it’s a term more commonly used to describe a way of studying that information to understand and predict human behaviour.
As internet users, shoppers, travellers and workers, we’re constantly creating huge datasets about our habits. In the right hands and with the right technology, it’s possible to analyse all this information to spot and predict trends.
There are a few features that make big data so valuable and revolutionary:
- It’s big. And we mean really big. For example, a typical computer hard drive has around 1 terabyte (1TB) of storage. You could fit around 2 million photos on one of these. It’s estimated that Facebook users collectively generate over 500TB of data each day.
- It’s messy. Such enormous datasets hold more information than any one person can comprehend. Powerful algorithms and computer programs are needed to clean, process and understand the information.
- It’s valuable. Businesses are prepared to pay top dollar to get their hands on this big data. They can learn a lot about their customers’ behaviour with such information, whether the customer likes it or not. There are even data marketplaces where customer info is bought and sold.
- It’s the future. Big data analytics is still quite a new field. As we collect more information and process it faster, our predictions should become more accurate.
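To get a feel for the scale described in the list above, here is a quick back-of-the-envelope calculation in Python. It uses only the figures quoted in this guide and assumes decimal units (1TB = 1,000,000,000,000 bytes); the variable names are just for illustration.

```python
# Decimal units: 1 TB = 1,000,000,000,000 bytes (an assumption for this sketch).
TB = 10**12

drive_bytes = 1 * TB              # a typical hard drive, as quoted above
photos_per_drive = 2_000_000      # photos that fit on one drive, as quoted above
facebook_daily_bytes = 500 * TB   # estimated daily Facebook data, as quoted above

# Average photo size implied by "2 million photos per 1TB drive".
avg_photo_bytes = drive_bytes / photos_per_drive

# How many 1TB drives the daily Facebook estimate would fill.
drives_per_day = facebook_daily_bytes // drive_bytes

# The same daily volume expressed as a number of photos.
photo_equivalents_per_day = photos_per_drive * drives_per_day

print(f"Implied average photo size: {avg_photo_bytes / 1000:.0f} KB")
print(f"1TB drives filled per day: {drives_per_day}")
print(f"Photo equivalents per day: {photo_equivalents_per_day:,}")
```

In other words, the daily estimate would fill 500 typical hard drives, or the equivalent of a billion photos, every single day.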
Why is it so useful?
Big data has probably already impacted your life in one way or another. A simple example is Netflix and other streaming services. They analyse vast amounts of user data on what people watch, how long they watch it for, and how much they like it. This data then allows Netflix to give you recommendations, show trending videos, and predict a percentage match for how much you might like something.
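As a rough illustration of how a ‘percentage match’ could be worked out, here is a minimal Python sketch. It is not how Netflix actually does it; it is one simple, well-known approach (finding viewers with similar habits and averaging what they watched). The viewer names, titles and numbers are all made up for the example.

```python
from math import sqrt

# Hypothetical data: the fraction of each title a viewer watched (0.0 to 1.0).
watch_history = {
    "ana":  {"Space Drama": 1.0, "Baking Show": 0.2, "Crime Doc": 0.9},
    "ben":  {"Space Drama": 0.9, "Baking Show": 0.1, "Crime Doc": 1.0, "Heist Film": 0.95},
    "cara": {"Space Drama": 0.1, "Baking Show": 1.0, "Heist Film": 0.2},
}


def similarity(a, b):
    """Cosine similarity between two viewers, using only titles both have watched."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)


def percent_match(user, title):
    """Average how much other viewers watched `title`, weighted by how similar they are to `user`."""
    weighted, total = 0.0, 0.0
    for other, history in watch_history.items():
        if other == user or title not in history:
            continue
        weight = similarity(watch_history[user], history)
        weighted += weight * history[title]
        total += weight
    # None means nobody comparable has watched the title yet.
    return round(100 * weighted / total) if total else None


print(percent_match("ana", "Heist Film"))
```

Because Ana’s habits look much more like Ben’s than Cara’s, Ben’s enthusiasm for the film counts for more, and the match comes out high. A real service would work with millions of viewers and far richer data, which is exactly where big data comes in.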
There are many other uses of big data that matter more than what you fancy binge-watching. For example:
- Healthcare. Big data can help doctors provide medication and treatment targeted specifically to individual patients.
- Marketing. Companies can target their customers with highly relevant promotions and products based on the data they’ve gathered.
- Security. Law enforcement can use big data to predict where and when crime is likely to happen. This allows police to dispatch officers more effectively and help protect more people.
- Sports. Teams and athletes can access detailed information about individual performances in specific areas. They can use this data to improve further and come up with new tactics.
What are the concerns around big data?
Now, useful as all this big data can be for improving your life, there are some potential drawbacks and risks. Here are some of the main concerns surrounding the use of big data:
- Privacy. When everything you do is being tracked, it’s easy to feel a lack of privacy. Data laws aren’t always very strong, and companies are taking more data than many people realise. In 2020, the Chinese social media platform Sina Weibo reported that names, gender, location and phone numbers belonging to more than 172 million users had been posted for sale on dark web markets. Check out our guide on How to protect your data for more info.
- Security. It’s not just companies that can make use of big datasets. The fact that so much information is collected and stored leaves it vulnerable to hackers. Such breaches could leave the personal data of millions of people at risk. For example, in 2015, a breach at the Office of Personnel Management in the USA resulted in the theft of over four million people’s fingerprints and background check information.
- Discrimination. Whilst data itself does not discriminate, algorithms built on that data can reflect and exacerbate inequalities that exist in the world. For example, there were concerns that a UK police algorithm, designed to help with making custody decisions, could lead to increased discrimination against poor people. The system initially used postcodes as one of the data points to decide how likely a person was to re-offend, which would have reinforced existing inequalities based on where people live.
- Targeting based on vulnerability. Combining multiple datasets about people can lead to ‘algorithmic profiling’, where vulnerable groups such as children, people with addictions and victims of abuse are identified and their details sold. A 2013 testimony to the US Congress by Pam Dixon, Executive Director of the World Privacy Forum, revealed a large number of data brokers (companies that collect and sell personal data) selling lists of people who are late on credit card payments and bills to companies that make predatory offers to those in financial trouble.
- Political manipulation. Big data can be harvested to influence politics and sway citizens. In 2014, the consulting company Cambridge Analytica harvested data on millions of Facebook users across the world, without their knowledge, in order to influence various elections.
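To make the discrimination concern above more concrete, here is a small Python sketch of how using a postcode as a data point can skew a risk score. It is a made-up toy model, not the police system mentioned earlier: the areas, records, weights and function names are all hypothetical.

```python
# Hypothetical historical records: (postcode area, did the person re-offend?)
history = [
    ("AREA_A", True), ("AREA_A", True), ("AREA_A", False), ("AREA_A", True),
    ("AREA_B", False), ("AREA_B", False), ("AREA_B", True), ("AREA_B", False),
]


def rate_by_postcode(records):
    """Work out the past re-offending rate for each postcode area."""
    counts = {}
    for postcode, reoffended in records:
        total, positives = counts.get(postcode, (0, 0))
        counts[postcode] = (total + 1, positives + int(reoffended))
    return {pc: positives / total for pc, (total, positives) in counts.items()}


def risk_score(person, postcode_rates, use_postcode):
    """Toy score from 0 to 1, based on the person's own record and (optionally) their area."""
    score = min(person["prior_offences"] * 0.1, 0.5)
    if use_postcode:
        # Blend in the area's history, even though it says nothing about this individual.
        score = (score + postcode_rates[person["postcode"]]) / 2
    return score


rates = rate_by_postcode(history)

# Two people with identical personal records who live in different areas.
person_a = {"prior_offences": 2, "postcode": "AREA_A"}
person_b = {"prior_offences": 2, "postcode": "AREA_B"}

for use_postcode in (False, True):
    a = risk_score(person_a, rates, use_postcode)
    b = risk_score(person_b, rates, use_postcode)
    print(f"use_postcode={use_postcode}: person A {a:.3f}, person B {b:.3f}")
```

Without the postcode, both people get the same score. With it, the person from the area with more recorded re-offending is rated as riskier, purely because of where they live.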