Big Data – is it possible to define it?

This is a big question and one which, once fully considered, has massive implications for any business. Every day, businesses are amassing an increasing amount of data and the scope of what can be measured is also expanding at an incredible rate. While businesses are still getting to grips with how to use this data meaningfully, some are struggling to manage it effectively.

Sometimes that means databases becoming corrupt at an increasing rate as they become larger, or more difficult to store effectively in-house. Massive databases may be creating too much demand on infrastructure when being processed at speed, or be so diverse in nature that it is difficult to know where to start when organising them into a usable format.

What is Big Data?

Once upon a time, a business would store essential information such as client names and invoice details, order history and accounts records. This information would be structured into a usable format and, with the dawn of the computer age, tied up with software that made it easy to access. Looking back to that era, data was gathered conscientiously and with a definite purpose in mind. The bigger the business was, the bigger the databases required to store its prized information.

Digital evolution has changed the data landscape forever. Where data was once input into a system one unit at a time by an operator or by an individual filling in a questionnaire, data collection and transfer is now a more automated affair. Modern communications mean that information relating to somebody’s Facebook usage, while accessing the site from their mobile phone in Indonesia, is usable in the US in the blink of an eye.

It’s not only the speed at which data is transferred that’s changing, or how it’s transferred; the spirit in which data is gathered has become incidental, almost accidental. That is, much of what people do with their credit cards or view in their browsers leaves a data trail that can be detected and processed retrospectively. It’s this discovery of what’s already out there that makes Big Data such a wild card commercially, given its changing nature, and also what makes it so exciting an opportunity.

Types of data

The type of data being collected and the sheer amount of information that’s out there are absolutely mind-boggling. Most companies in the US have at least 100 terabytes of data stored – that’s 100,000 gigabytes if you’re wondering – according to a useful infographic (below) published by IBM. The same document highlights the fact that out of a world population of 7 billion, some 6 billion of us are using a mobile phone. Imagine the humungous amount of information being measured through our phones. What’s particularly interesting in this infographic is how data is categorised according to four criteria: volume, velocity, variety and veracity.

Volume

There’s no particular agreed volume of data needed for it to qualify as Big Data, so it’s a bit of a case of how long is a piece of string. However, if you’re looking at a sizeable amount that’s becoming hard to manage, then it’s safe to say you’ve entered the realm of Big Data. It’s now possible to mine a large array of information from data trails, and where specific data is being gathered deliberately, the processes have become so efficient through better real-time technology that it’s very easy for a business to find it has a tricky amount of facts and figures very quickly.

Velocity

For data to be useful, it needs to be processed. How quickly the data is gathered in a meaningful way, and how quickly it can be analysed and used effectively, is a reflection of its velocity. As shown by the earlier infographic, it is projected that there will be 18.9 billion network connections by 2016; that’s almost 2.5 connections per person on earth, and they will be creating a whole new universe of high velocity, streaming data that will be analysed on the fly. There’s a great illustration of how velocity interacts with volume and variety in this WhatIs article.

Variety

The variety of data that exists is changing all the time as technology changes. Twenty years ago, the idea that 30 billion pieces of content would be shared on Facebook each month or that 400 million Tweets would be generated EVERY DAY was unthinkable. Data is now being collected from hi-tech medical monitoring equipment, computers, mobile phones and more. Just as today’s variables were unimaginable yesterday, it’s highly likely that there are many others beyond tomorrow’s horizon that are unthinkable today. Big Data is going to get much bigger and even more complex, but the rewards for managing it effectively are going to be exciting for everyone.

Veracity

From the Latin root ‘veritas’, meaning ‘truth’, this wonderful word refers to how dependable, or how certain, the gathered data is. Since records began, there have been those who would leave a space blank rather than double-check, or lazily record an approximate value or sometimes an inaccurate one. Whether through human error or mechanical failure, as would be the case if a particular button on a keyboard were faulty, mistakes have always been made and will continue to be a reality for some years yet.

The problem with Big Data in relation to veracity is that mistakes tend to be greatly amplified for greater amounts of data. Also, just as an arrow that’s off course by just half a degree can finish up increasingly further from its target depending on the distance travelled to reach it, so poorly measured or recorded data can have a snowball effect in the long term, especially as data size increases.
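The arrow analogy can be put into numbers. The snippet below is only an illustrative sketch: the function name and the distances are made up for the example, and it assumes the arrow flies in a perfectly straight line. It shows that the miss grows in direct proportion to the distance travelled.

```python
import math


def miss_distance(distance, error_degrees=0.5):
    """Sideways miss for a straight-line shot whose aim is off by error_degrees.

    Hypothetical helper for illustration; distance and result share the same unit.
    """
    return distance * math.tan(math.radians(error_degrees))


# Illustrative distances in metres (assumed values, not from any dataset).
for metres in (10, 100, 1_000):
    print(f"{metres:>5} m travelled -> off target by {miss_distance(metres):.2f} m")
```

The same half-degree error that barely matters over a short distance puts the arrow well wide of the target over a long one, which is the pattern to expect when a small measurement error is carried through ever larger volumes of data.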

What do Volume, Velocity, Variety and Veracity mean for business?

The scale of information gathering and processing now available to business means lucrative opportunities for understanding their target markets, current trends, projections, spending habits, ways to become more efficient; the potential is enormous. The four main categories of Big Data discussed in this article, the 4 V's, present different challenges to business in regard to how they can be practically handled.

Effects on IT infrastructure

As the volume of information being gathered continues to increase, perhaps not exponentially but dramatically, businesses are going to need servers that can handle the extra load, and remote data back-up will also need to be airtight. After all, the potential to lose data will be greater: however much data a business stood to lose in a day yesterday, it stands to lose many times more than that today. The size of data being transferred will also have knock-on effects on IT infrastructure, with cabling, for example, needing to be of a suitable spec to handle the larger volume.

Even smarter software development

The experts are getting better at measuring things and converting that information into noughts and ones. Light, sound, humidity, occupancy are all measurable and these represent just the tip of the iceberg from the world of building management. As we are able to collect an increasing range of variables from the world around us, we will need to store these in an intelligent way for easier processing. They will need separate databases, and sophisticated software packages that can move them, shake them, make sense of them and create the commercial honey we all want.

Robust data gathering and recording

As more data is collected automatically, veracity should improve as software becomes more sophisticated and data transfer more reliable. Even so, the slightest error could mean there’s a black hole for every hundred-thousandth unit of data, for example, and when dealing with a mammoth amount of information that’s going to be put through heaps of processing, the long-term corruption potential could be catastrophic.
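To get a feel for what one faulty unit in every hundred thousand means at scale, here is a rough back-of-the-envelope sketch. The function name and the dataset sizes are assumptions chosen purely for illustration, and the error rate is simply the example figure used above.

```python
def expected_bad_units(total_units, one_in=100_000):
    """Number of faulty units if one in every `one_in` units is bad (hypothetical helper)."""
    return total_units // one_in


# Illustrative dataset sizes: a million, a billion and a trillion units.
for total in (1_000_000, 1_000_000_000, 1_000_000_000_000):
    print(f"{total:>17,} units -> {expected_bad_units(total):>10,} faulty")
```

The error rate never changes, yet the number of bad units climbs from a handful to millions as the dataset grows, and each of those units may then be fed through further processing.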

Whether or not human beings fill in the fields of CRM packages or online forms depends on many factors. Are they being asked too much by a company they have never done business with before? Is the process too tedious? In regard to CRM software, has it been embraced by the workforce who are using it or is it seen as a way of snooping on them? Have they been trained to use it correctly or is it just another unwelcome task that has landed on their lap, another thing to slow them down in an atmosphere where productivity is constantly monitored? After all, if the team who are using it don’t embrace it, how can the data being entered into the system be trusted?

Unintended consequences

Other factors that can affect veracity are the unintended consequences of people, sales staff for example, who are trying to find loopholes in the system or are cutting corners. Perhaps they are creating a duplicate account for a business that has been barred from further trade by the organisation’s accounts department, or maybe an operator has figured out they can get through the system quicker by putting any old number into a particular field.

These accuracy issues can only be dealt with using a thorough approach – software fail-safes, intelligent design, effective communication and training of the staff that are using the software etc.
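As a rough idea of what a software fail-safe might look like, the sketch below checks a new account record for the two shortcuts described above: a filler phone number and a barred business re-entered under a slightly different spelling. Everything here is hypothetical, including the field names (`company`, `phone`), the seven-digit minimum and the matching rule; a real CRM package would have its own schema and far more thorough checks.

```python
def normalise_name(name):
    """Lower-case a company name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def validate_record(record, barred_names):
    """Return a list of problems found in a new account record.

    `record` is a dict with hypothetical 'company' and 'phone' fields;
    `barred_names` is a set of already-normalised barred company names.
    """
    problems = []

    digits = [c for c in record.get("phone", "") if c.isdigit()]
    if len(digits) < 7:  # assumed minimum length, for illustration only
        problems.append("phone too short")
    elif len(set(digits)) == 1:  # e.g. 0000000 typed just to get past the field
        problems.append("phone looks like filler")

    if normalise_name(record.get("company", "")) in barred_names:
        problems.append("company is barred")

    return problems


barred = {"acme ltd"}
print(validate_record({"company": "  ACME  Ltd", "phone": "0000000"}, barred))
print(validate_record({"company": "Widget Co", "phone": "020 7946 0123"}, barred))
```

Checks like these only catch the crudest shortcuts, which is why they need to sit alongside the training and communication mentioned above rather than replace them.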

So is it possible to define Big Data?

Every aspect of Big Data is changing: the amount of data is growing, more is expected from the data collected, and the type of data being processed is evolving. What this means is that trying to define Big Data is like trying to grab hold of blancmange. Any definition that can be applied is likely to become outdated fairly quickly because the boundaries of this new and wonderful entity are in a state of flux.

Big Data is bringing a whole new world of opportunity to everybody. For the everyday person it means added convenience because the organisations that offer everyday services are getting better at giving people what they want. From the perspective of big business, it means getting it right more often, stock that sells, projections that are accurate, efficiencies and increased profits.

The wise will bear in mind that with every opportunity, there is usually a threat and that is certainly true in the case of Big Data. If businesses don’t get to grips with how they handle Big Data accurately, efficiently and effectively, not only will they miss out on growth potential, they will be vulnerable to competitors who are in command of their Big Data.

The real question

For a business to be truly future-proofed, it needs to be prepared for change and ready to adapt to a new world. In the case of Big Data, this means being ready to take advantage of whatever new variables become measurable – and these new possibilities may well be beyond what we can currently see. The question then becomes this:

If businesses really want to reap the benefits that Big Data brings, should they even define it in the first place, or is it better to keep their eyes peeled for the next big change?