According to SAS (http://www.sas.com/) “Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.”
Let’s first analyse exactly what is being said:
- Big Data is unstructured. End of discussion. I say this because people get it wrong more often than not: any other data is simply a large volume of data. Large volumes of data in a structured environment can be handled by tools that have been around for a long time, and there are many different ways to optimise for that volume with regular databases. Because quantity has increased, quality has declined and storage has become cheaper, more data is being retained. This is why all sorts of unstructured data is now being held in storage at a lower cost and for much longer.
- “But it’s not the amount of data that’s important” – who wrote this? Whether the data is structured or unstructured, it needs to be analysed. The more data there is, the more thought has to go into how it is analysed, or sliced and diced, so that the business can make use of the information. The more data in play, the more storage is needed, and that storage must give a return on investment, so for that reason alone this newer, larger volume of data needs a different approach. New methods have come into play: the front-runner is the statistical language R, and next in class is Python. (These are two great programming languages that, in my view, shouldn’t be compared. They both have their advantages and will be discussed at a later date here.) SQL just doesn’t cut it when it comes to unstructured data, although this may change.
- “It’s what organizations do with the data that matters” – again, who wrote this? Of course it’s what they do that matters! Why would an organisation collect and collate this data unless it was of use to them in both current and future analyses? There are many kinds of analysis out there – check my shortened view here.
- The final statement could be construed as true, but many organisations collect this data for the wrong reasons, as in “Do I want to ensure that employees are not going outside their job parameters and checking a recipe for dinner later?” I think not, but this seems to be one of the prevalent ways in which this data is being used. Heavy investment, poor return. Big Brother does not a happy employee make.