written by Emiliano Sammassimo
You have surely heard Big Data mentioned more and more often over the years, usually in contrast to traditional data. But when you look closely, does the data in question really deserve to be called Big Data?
It is not just a question of size. Volumes can range from terabytes upward, but what matters most is that Big Data satisfies some fundamental characteristics.
We are used to talking about the 4 Vs of Big Data, but today many speak of 5 Vs or even 6.
Let’s look in detail at the 4 fundamental characteristics of Big Data and at the more recent additions. Here we cover only the definitions, since we have already discussed how Big Data is stored and managed when explaining the concepts and differences between data lake and data warehouse.
- Volume: the amount of data generated by the systems through which it is collected and stored. The data is usually updated constantly and continuously.
- Variety: data can be divided into two categories, structured and unstructured. The former has a recognizable, predefined format; the latter is raw data, such as the content of web pages. By variety we mean the heterogeneity of the data sources being stored: not only proprietary systems but also social networks, sensors such as beacons or other proximity devices, and so on.
- Velocity: data is constantly updated and constantly changing, and it comes from numerous platforms. Many devices and sources can collect and update data in real time; the real difficulty is analyzing it in real time and making decisions based on how the data is changing at that moment. The change, here as for the other characteristics, is epochal: it concerns not only the underlying technology but also business models and company dynamics. Fast-changing real-time data means profound changes in the choices a company makes and in the way it operates.
- Veracity: in other words, reliability. As the saying goes, “having wrong data is worse than having none at all”. For this reason, reliability is one of the fundamental characteristics of Big Data. Data collected at speed from a variety of sources must be integrated so that the sources talk to each other and, together, tell the truth: how the data changes and varies from one source to another, and how often it is updated. Having, and therefore analyzing, unreliable data can be much riskier than having no data at all.
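As a rough illustration of the veracity point, the sketch below compares the same field as reported by several sources and flags two things: whether the sources disagree, and which of them have not been updated recently. The record layout, the source names and the 24-hour threshold are assumptions made for this example and do not describe any particular system.

```python
from datetime import datetime, timedelta

# Hypothetical readings of the same customer's email from three sources.
readings = [
    {"source": "crm", "email": "ana@example.com",
     "updated_at": datetime(2024, 1, 10, 9, 0)},
    {"source": "web", "email": "ana@example.com",
     "updated_at": datetime(2024, 1, 10, 8, 30)},
    {"source": "app", "email": "ana@old.example.com",
     "updated_at": datetime(2024, 1, 2, 12, 0)},
]


def check_veracity(readings, now, max_age=timedelta(hours=24)):
    """Return (conflicting, stale_sources) for one entity's readings."""
    # The sources conflict if they report more than one distinct value.
    conflicting = len({r["email"] for r in readings}) > 1
    # A source is stale if its last update is older than max_age.
    stale_sources = sorted(
        r["source"] for r in readings if now - r["updated_at"] > max_age
    )
    return conflicting, stale_sources


conflicting, stale = check_veracity(readings, now=datetime(2024, 1, 10, 12, 0))
print(conflicting, stale)
```

In this toy data the stale source is also the one that disagrees with the others, which is the kind of signal that could help decide which value to trust before the data reaches an analysis.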
Big Data 6Vs
We have just clarified the 4 fundamental characteristics of Big Data. Let’s now look at the additional ones, the fifth and the sixth, which are gaining ever greater importance in specific business analysis or process scenarios.
- Variability: large amounts of data arrive in different formats and from different sources. It is essential to take into account the differences in the data from one source to another and the frequency with which each source is updated.
- Value: today data is considered to be like gold, a classic safe haven. However, its value is not limited to mere collection. Analysis is essential, and even more important is the work of the analysts who deal with the data every day, with its characteristics and its meaning, so that the mass of collected data acquires business value.
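To make the variability point more concrete, here is a minimal sketch that measures how often each source is updated by computing the average time between its consecutive updates. The event log, the source names and the function name are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical update log: (source, timestamp of an update).
events = [
    ("sensor", datetime(2024, 1, 1, 0, 0)),
    ("sensor", datetime(2024, 1, 1, 0, 1)),
    ("sensor", datetime(2024, 1, 1, 0, 2)),
    ("crm", datetime(2024, 1, 1, 0, 0)),
    ("crm", datetime(2024, 1, 2, 0, 0)),
]


def mean_update_interval(events):
    """Average number of seconds between consecutive updates, per source."""
    by_source = defaultdict(list)
    for source, ts in events:
        by_source[source].append(ts)

    intervals = {}
    for source, stamps in by_source.items():
        stamps.sort()
        gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
        if gaps:  # a source with a single update has no interval to measure
            intervals[source] = mean(gaps)
    return intervals


intervals = mean_update_interval(events)
print(intervals)
```

A sensor that updates every minute and a CRM that updates once a day should not be compared as if they were equally fresh; a summary like this is one simple way to keep that difference visible when integrating them.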
Understanding the characteristics behind the Vs of Big Data is crucial in order to keep the data reliable and to integrate it correctly into analysis systems that can support the business and guide decisions and plans. 3rdPlace designs and develops systems that integrate Big Data from multiple sources; read, for example, the case history of Italo Treno.