Volume, Variety, Velocity and Veracity of Data Explained
In today’s digital environment, a shortage of data is rarely the issue. On the contrary, the mass of complex data that operations generate requires sophisticated AI and machine learning techniques to extract relevant insights and information, and it serves as the “fuel” that enables high-value applications of these technologies.
AI applications thrive on rich, large-scale data; the closer you can come to a true 360° view of the situations being analysed, the more successful these applications are likely to be.
In considering the data to be analysed, data scientists will weigh up:
- The volume of data: Is there enough data, and is it of such a size that special consideration needs to be given to how to scale its analysis?
- The variety of data: Structured data is the data typically seen in databases, spreadsheets and the like; each record follows the same format, with fields or variables that can be easily extracted. Unstructured data is data that doesn’t exist in such regular, repetitive formats, for example free text, images, audio and video. In the oil and gas industry, data scientists will often combine structured data, such as instrumentation readings and alarms, with unstructured data, such as operational logs and the images and videos from equipment inspections.
- The velocity of data: Static data, such as equipment descriptions, rarely (or never) changes. Dynamic data changes frequently, either regularly (such as daily production measures) or irregularly (for example, the combination of wells feeding into a particular system). Data received at high frequency, often at regular intervals, such as sensor readings, is described as streaming data.
- The veracity of data: Early in any project, data scientists will study the quality of the data to understand where values are missing or incorrect, and then either take steps to rectify those problems or design analytical approaches that work around them.
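
To make the veracity check concrete, the following is a minimal sketch of the kind of quality profile a data scientist might run first. It counts missing and out-of-range values in a handful of pressure readings. The `Reading` class, the `profile_quality` function, the sensor name and the 0 to 50 bar plausibility limits are all hypothetical, chosen only for illustration; real limits would come from the equipment specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reading:
    """One sensor measurement (hypothetical structure for illustration)."""
    sensor_id: str
    pressure_bar: Optional[float]  # None marks a missing measurement


def profile_quality(readings: List[Reading], low: float, high: float) -> dict:
    """Count missing and out-of-range values in a list of readings."""
    total = len(readings)
    missing = sum(1 for r in readings if r.pressure_bar is None)
    out_of_range = sum(
        1
        for r in readings
        if r.pressure_bar is not None and not (low <= r.pressure_bar <= high)
    )
    usable = total - missing - out_of_range
    return {
        "total": total,
        "missing": missing,
        "out_of_range": out_of_range,
        # Guard against division by zero when no readings are supplied.
        "usable_fraction": usable / total if total else 0.0,
    }


# Illustrative values only: one missing reading and one implausible negative.
readings = [
    Reading("P-101", 12.4),
    Reading("P-101", None),
    Reading("P-101", -3.0),
    Reading("P-101", 12.9),
]

# Assumed plausibility limits of 0 to 50 bar.
report = profile_quality(readings, low=0.0, high=50.0)
print(report)
```

A profile like this only shows where the problems are. Whether to repair the affected records, exclude them or choose a method that tolerates gaps remains a judgement made for each project.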