Data Science Glossary
Brush up on your data science terminology with our simple data science glossary below.
An algorithm is a self-contained sequence of actions that performs a specific task. Machine learning algorithms are the basis of applying artificial intelligence (AI) in data-rich environments.
Using machine learning algorithms/models to recognise situations or cases which are unusual – for example, equipment/process status falling outside the bounds of normal operation.
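As a rough illustration of the idea, the sketch below flags readings that fall far outside the range seen during normal operation. It uses a simple statistical bound rather than a trained machine learning model, and the function name, threshold and pressure values are all hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(history, new_readings, threshold=3.0):
    """Return the new readings lying more than `threshold` standard
    deviations from the mean of the historical (normal) readings."""
    centre = mean(history)
    spread = stdev(history)
    return [x for x in new_readings if abs(x - centre) > threshold * spread]

# Hypothetical pressure readings taken during normal operation.
normal_pressure = [100.2, 99.8, 100.5, 99.9, 100.1, 100.3, 99.7, 100.0]

# Only the reading far outside the normal band is flagged.
flagged = find_anomalies(normal_pressure, [100.4, 112.0, 99.6])
print(flagged)
```

In practice the "normal" bounds would usually be learned by a model from many variables at once, not from a single tag.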
If a computer does something that, if done by a human, would be said to be intelligent, it is showing artificial intelligence (AI). AI includes many different technologies and disciplines, including machine learning.
Finds sets of events which tend to happen together.
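A minimal sketch of this idea, assuming each record is simply the set of events seen in one period: count how often each pair of events appears together and keep the pairs that recur. The event names, the `frequent_pairs` helper and the `min_count` cut-off are illustrative assumptions; real association algorithms also handle larger sets and measure the strength of each association.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(records, min_count=2):
    """Count pairs of events that occur in the same record and
    keep those seen together at least `min_count` times."""
    counts = Counter()
    for events in records:
        # Sorting gives each pair a single, consistent ordering.
        counts.update(combinations(sorted(set(events)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical events logged in four separate shifts.
shift_logs = [
    {"high_vibration", "bearing_alarm", "pump_trip"},
    {"high_vibration", "bearing_alarm"},
    {"pump_trip", "low_flow"},
    {"high_vibration", "bearing_alarm", "low_flow"},
]
pairs = frequent_pairs(shift_logs)
print(pairs)
```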
Describes data sets that are so large or complex that traditional data processing and basic analytical approaches are inadequate to deal with them.
Providing analytical information to end users via reports, graphs and other visual representations, usually with facilities to explore the data. Commonly used in descriptive analytics, or as a means of displaying the results of diagnostic, predictive and prescriptive analytics.
Using machine learning algorithms/models to differentiate between different outcome classes e.g. normal vs. abnormal plant operation.
Clustering algorithms group together similar cases. Clustering algorithms can be used, for example, to automatically identify different operating modes of specific equipment.
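The operating-mode example can be sketched with a very basic k-means loop over a single measurement. This is an illustrative simplification: the `kmeans_1d` function, its starting-point choice and the pump speeds are assumptions, and a real project would normally use an established library and several variables.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters with a basic k-means loop."""
    ordered = sorted(values)
    # Start from k values spread evenly through the sorted data
    # (assumes k >= 2 and at least k distinct values).
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest centre.
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (unchanged if empty).
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return clusters

# Hypothetical pump speeds (rpm) covering idle, normal and high-load running.
speeds = [0, 5, 3, 1490, 1510, 1500, 2950, 3010, 2990]
modes = kmeans_1d(speeds, k=3)
print(modes)
```

No labels are supplied; the three groups emerge purely from how similar the values are to one another.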
A widely adopted methodology for machine learning/analytics projects that covers all stages from understanding the business problem, through data science work, to deployment of results and ongoing maintenance of deployed solutions.
The practice of applying machine learning algorithms and other analytical tools and techniques to data to address business/operational problems.
Models where decisions are arrived at by traversing a tree structure. Decision trees have the advantage of being relatively easy to understand; reading through the tree gives insights into how different factors interact in predicting, for example, variations in levels of production.
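To show what "traversing a tree structure" means, here is a small hand-written tree and a function that walks it. The factors, thresholds and predictions are invented for illustration; in practice the tree would be built by a machine learning algorithm from historical data.

```python
# A hypothetical tree: internal nodes test one factor against a threshold,
# leaves hold the predicted production level.
tree = {
    "factor": "choke_opening", "threshold": 40,
    "below": {"prediction": "low"},
    "above": {
        "factor": "water_cut", "threshold": 0.3,
        "below": {"prediction": "high"},
        "above": {"prediction": "medium"},
    },
}

def predict(node, case):
    """Walk from the root to a leaf, recording the path taken.
    'above' here means greater than or equal to the threshold."""
    path = []
    while "prediction" not in node:
        branch = "below" if case[node["factor"]] < node["threshold"] else "above"
        path.append(f'{node["factor"]} {branch} {node["threshold"]}')
        node = node[branch]
    return node["prediction"], path

result, path = predict(tree, {"choke_opening": 55, "water_cut": 0.1})
print(result, path)
```

The recorded path is what makes the model easy to read: it states which factors, in which order, led to the prediction.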
Putting the results of artificial intelligence and predictive analytics to work – going “from analysis to action”.
Using analytical techniques to describe and understand what is or has been happening, up to the current point in time.
Using analytical techniques to discover the root causes of phenomena that are being / have been observed.
Human knowledge about the area to which machine learning is being applied. In the Oil & Gas industry, domain expertise typically comes from engineers.
Data that changes frequently: for example, daily production measures, or the crew assigned to particular activities.
Combinations of predictive models used together to reach a decision with greater accuracy and confidence. Analogous to a panel of human experts collaborating.
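One common way to combine models is a majority vote, sketched below. The three "models" are hypothetical stand-in rules, not trained models, and the share of models that agree is used here only as a rough indication of confidence.

```python
from collections import Counter

# Three hypothetical stand-in "models"; each returns a class label.
def model_a(case):
    return "abnormal" if case["vibration"] > 7.0 else "normal"

def model_b(case):
    return "abnormal" if case["temperature"] > 90 else "normal"

def model_c(case):
    return "abnormal" if case["vibration"] * case["temperature"] > 500 else "normal"

def ensemble_predict(models, case):
    """Return the most common label and the share of models voting for it."""
    votes = Counter(m(case) for m in models)
    label, count = votes.most_common(1)[0]
    return label, count / len(models)

label, agreement = ensemble_predict(
    [model_a, model_b, model_c],
    {"vibration": 7.5, "temperature": 85},
)
print(label, agreement)
```

An odd number of models is used so that a two-class vote cannot tie.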
Using machine learning algorithms/models to predict a numeric value, for example, the level of oil in water at a point in the near future.
Creating, from raw data, higher level features/variables which can help machine learning algorithms create more accurate and useful models. As these features typically embody knowledge about the process or equipment being analysed, domain expertise is an important input to feature engineering.
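As a small sketch of the idea, the function below turns a series of raw readings into higher level features: a rolling average and the change across a short window. The feature names, window size and temperature values are illustrative assumptions; which features are actually useful depends on the process or equipment and on domain expertise.

```python
def engineer_features(readings, window=3):
    """Turn raw readings into one feature row per time step,
    starting from the point where a full window is available."""
    rows = []
    for i in range(window - 1, len(readings)):
        recent = readings[i - window + 1 : i + 1]
        rows.append({
            "value": readings[i],
            "rolling_mean": sum(recent) / window,
            "change_over_window": recent[-1] - recent[0],
        })
    return rows

# Hypothetical bearing temperatures from successive readings.
temperatures = [70, 71, 73, 78, 86]
features = engineer_features(temperatures)
for row in features:
    print(row)
```

Here the raw values alone look unremarkable, while the change over the window makes the accelerating rise explicit for a model to pick up.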
Machine learning algorithms are applied to historical data and create models which can be used to make judgements about current or future cases.
An entity constructed by a machine learning algorithm. It receives input data – for example, a set of current tag values – and produces one or more outputs, such as the probability of near-term equipment failure.
A family of algorithms/models inspired by the structure of the brain. Neural networks provide fine-grained scoring, but for some uses their opacity is an obstacle.
Natural Language Processing (NLP) analyses text with regard to language structure and meaning. Can be used to process unstructured data sources such as log entries and inspection notes.
Using a combination of mathematical / constraint-based techniques to find the “optimal” solutions – for example, changes to plant settings that maximise production while minimising emissions.
The analysis of historical data in such a way that the findings can be used to make robust and accurate assessments of new or future cases. Machine learning, and the predictive models that it produces, are central to this.
Usually built on the outputs of predictive analytics, combined with business rules or optimisation, to recommend actions e.g. interventions to prevent equipment failure or adjustments to improve production efficiency.
A propensity is a measure of the likelihood of a particular outcome. Propensities are usually produced by models and are sometimes confused with probabilities; however, a probability has a precise and formal statistical definition, while the exact meaning of a propensity varies from application to application. A propensity is a figure in a numeric range, often 0.0 to 1.0, where the higher the figure, the more likely the outcome.
Models which represent classification decisions as a series of rules. These are easy to read and understand, and can be thought of as “profiles” of the situations in which different outcomes are expected.
Using machine learning algorithms/models to give the propensity of a particular outcome e.g. equipment failure.
Finds sets of events which tend to happen in a sequence over time.
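A minimal sketch, assuming each log is a list of events in time order: count the logs in which one event is followed, at any later point, by another, and keep the ordered pairs that recur. The event names, the `frequent_sequences` helper and the `min_count` cut-off are hypothetical; real sequence detection also handles longer sequences and time gaps.

```python
from collections import Counter

def frequent_sequences(event_logs, min_count=2):
    """Count how many logs contain event A followed later by event B."""
    counts = Counter()
    for log in event_logs:
        seen = set()  # each ordered pair counts once per log
        for i, first in enumerate(log):
            for second in log[i + 1:]:
                if first != second:
                    seen.add((first, second))
        counts.update(seen)
    return {seq: n for seq, n in counts.items() if n >= min_count}

# Hypothetical event logs, each listed in time order.
logs = [
    ["high_vibration", "bearing_alarm", "pump_trip"],
    ["high_vibration", "pump_trip"],
    ["pump_trip", "high_vibration"],
]
sequences = frequent_sequences(logs)
print(sequences)
```

Unlike a simple co-occurrence count, order matters here: the third log contains both events but in the reverse order, so it does not add to the count.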
Data that does not change, or only changes in exceptional circumstances – for example, the make, model and rated capacity of equipment.
Applying statistical tests to data to test hypotheses, identify correlations, and determine the significance of patterns. Commonly used in descriptive and diagnostic analysis.
Data that is updated at high frequency, for example sensor readings and alarms.
Structured data is the well-organised, well-behaved data typically stored in databases and spreadsheets. Records of, for example, tag values at a particular point in time contain a fixed and known number of fields, and the content of each field will usually be of the same type (e.g. a reading or a flag).
Machine learning techniques applied to historical data where the outcome to be predicted is known, such as for classification.
Data that doesn’t fit the simple repeated structure of structured data: for example, free text, recordings of speech or other sounds, images, and video clips.
Machine learning techniques applied to historical data where no outcome is specified for prediction, but rather patterns are to be detected, such as for clustering.