Data Science Glossary
Brush up on your terminology with our simple data science glossary below.

Algorithm
An algorithm is a self-contained sequence of actions that performs a specific task. Machine learning algorithms are the basis of applying artificial intelligence (AI) in data-rich environments.
Anomaly Detection
Using machine learning algorithms/models to recognise situations or cases which are unusual – for example, equipment/process status falling outside the bounds of normal operation.
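For illustration, a minimal sketch using scikit-learn's IsolationForest; the sensor readings and the contamination setting below are invented for the example:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Hypothetical sensor readings: mostly normal, plus two planted outliers
    rng = np.random.default_rng(0)
    normal = rng.normal(loc=50.0, scale=2.0, size=(200, 1))
    outliers = np.array([[80.0], [15.0]])
    readings = np.vstack([normal, outliers])

    model = IsolationForest(contamination=0.01, random_state=0).fit(readings)
    flags = model.predict(readings)   # -1 = anomaly, 1 = normal
    print(readings[flags == -1])      # the unusual readings are flagged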
Artificial Intelligence
If a computer does something that, if done by a human, would be said to be intelligent, it is showing artificial intelligence (AI). AI includes many different technologies and disciplines, including machine learning.
Association Analysis
Finds sets of events which tend to happen together.
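As a minimal sketch of the idea in plain Python (the event names and windows below are made up), counting how often pairs of events co-occur:

    from collections import Counter
    from itertools import combinations

    # Hypothetical sets of events observed in different time windows
    windows = [
        {"pressure_alarm", "valve_trip"},
        {"pressure_alarm", "valve_trip", "pump_restart"},
        {"pressure_alarm"},
        {"pressure_alarm", "valve_trip"},
    ]

    # Support = fraction of windows in which a pair of events occurs together
    pair_counts = Counter()
    for window in windows:
        for pair in combinations(sorted(window), 2):
            pair_counts[pair] += 1

    for pair, count in pair_counts.most_common():
        print(pair, "support =", count / len(windows))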
Big Data
Describes data sets that are so large or complex that traditional data processing and basic analytical approaches are inadequate to deal with them.
Business Intelligence
Providing analytical information to end users via reports, graphs and other visual representations, usually with facilities to explore the data. Commonly used in descriptive analytics, or as a means of displaying the results of diagnostic, predictive and prescriptive analytics.
Classification
Using machine learning algorithms/models to differentiate between outcome classes, e.g. normal vs. abnormal plant operation.
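A minimal sketch using scikit-learn; the features, labels and values are invented for the example:

    from sklearn.linear_model import LogisticRegression

    # Hypothetical labelled history: [temperature, vibration] -> plant state
    X = [[60, 0.2], [62, 0.3], [61, 0.25], [85, 1.5], [90, 1.8], [88, 1.6]]
    y = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]

    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.predict([[87, 1.7]]))   # -> ['abnormal']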
Clustering
Clustering algorithms group together similar cases. They can be used, for example, to automatically identify different operating modes of specific equipment.
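A minimal sketch using scikit-learn's KMeans; the pump readings and the choice of two clusters are assumptions for the example:

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical [flow_rate, power_draw] readings from one pump
    readings = np.array([
        [10.0, 1.0], [11.0, 1.1], [10.5, 1.05],   # low-throughput operation
        [25.0, 2.9], [26.0, 3.0], [24.0, 2.8],    # high-throughput operation
    ])

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(readings)
    print(kmeans.labels_)   # each reading assigned to an operating mode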
CRISP-DM
The Cross-Industry Standard Process for Data Mining, a widely adopted methodology for machine learning/analytics projects that covers all stages from understanding the business problem, through data science work, to deployment of results and ongoing maintenance of deployed solutions.
Data Science
The practice of applying machine learning algorithms and other analytical tools and techniques to data to address business/operational problems.
Decision Trees
Models where decisions are arrived at by traversing a tree structure. Decision trees have the advantage of being relatively easy to understand; reading through the tree gives insights into how different factors interact in predicting, for example, variations in levels of production.
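To illustrate that readability, a minimal scikit-learn sketch that prints the learned tree as text; the features and data are invented:

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical data: [choke_setting, water_cut] -> production level
    X = [[20, 0.10], [25, 0.15], [60, 0.50], [70, 0.60]]
    y = ["high", "high", "low", "low"]

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["choke_setting", "water_cut"]))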
Deployment
Putting the results of artificial intelligence and predictive analytics to work – going “from analysis to action”.
Descriptive Analytics
Using analytical techniques to describe and understand what is or has been happening, up to the current point in time.
Diagnostic Analytics
Using analytical techniques to discover the root causes of phenomena that are being, or have been, observed.
Domain Expertise
Human knowledge about the area to which machine learning is being applied. In the Oil & Gas industry, domain expertise typically comes from engineers.
Dynamic Data
Data that changes frequently: for example, daily production measures, or the crew assigned to particular activities.
Ensembles
Combinations of predictive models used together to reach a decision with greater accuracy and confidence. Analogous to a panel of human experts collaborating.
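A minimal sketch of the "panel of experts" idea using scikit-learn's VotingClassifier; the choice of models and the data are arbitrary for the example:

    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical labelled data: [temperature, vibration] -> equipment state
    X = [[60, 0.2], [62, 0.3], [61, 0.25], [85, 1.5], [90, 1.8], [88, 1.6]]
    y = [0, 0, 0, 1, 1, 1]

    # Three different "experts" each vote on the same case
    ensemble = VotingClassifier(estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier()),
        ("nb", GaussianNB()),
    ]).fit(X, y)
    print(ensemble.predict([[87, 1.7]]))   # majority vote -> [1]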
Estimation/Forecasting
Using machine learning algorithms/models to predict a numeric value, for example, the level of oil in water at a point in the near future.
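A minimal sketch using scikit-learn; the relationship between hours and oil-in-water level is invented:

    from sklearn.linear_model import LinearRegression

    # Hypothetical history: hours since last cleaning -> oil-in-water level (ppm)
    X = [[0], [24], [48], [72], [96]]
    y = [5.0, 6.1, 7.2, 8.0, 9.1]

    model = LinearRegression().fit(X, y)
    print(model.predict([[120]]))   # estimated level 120 hours after cleaning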
Feature Engineering
Creating, from raw data, higher level features/variables which can help machine learning algorithms create more accurate and useful models. As these features typically embody knowledge about the process or equipment being analysed, domain expertise is an important input to feature engineering.
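A minimal sketch using pandas; the raw series, window size and threshold stand in for choices a domain expert would make:

    import pandas as pd

    # Hypothetical raw vibration readings sampled hourly
    data = pd.DataFrame({"vibration": [0.2, 0.3, 0.25, 1.4, 1.5, 1.6]})

    # Engineered features that embody knowledge about the equipment
    data["vibration_rolling_mean"] = data["vibration"].rolling(window=3).mean()
    data["vibration_change"] = data["vibration"].diff()
    data["above_threshold"] = data["vibration"] > 1.0   # threshold from domain experts
    print(data)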
Insight (X-PAS™)
An insight is a piece of meaningful information, observation or potential opportunity which is derived from a customer's asset data by X-PAS™.
Machine Learning
Machine learning algorithms are applied to historical data and create models which can be used to make judgements about current or future cases.
Model
An entity constructed by a machine learning algorithm. It receives input data – for example, a set of current tag values – and produces one or more outputs, such as the probability of near-term equipment failure.
Neural Network
A family of algorithms/models inspired by the structure of the brain. Neural networks provide fine-grained scoring, but for some uses their opacity is an obstacle.
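A minimal sketch using scikit-learn's MLPClassifier; the data and network size are invented, and a real application would need far more data and input scaling:

    from sklearn.neural_network import MLPClassifier

    # Hypothetical labelled data: [temperature, vibration] -> failure within 7 days?
    X = [[60, 0.2], [62, 0.3], [61, 0.25], [85, 1.5], [90, 1.8], [88, 1.6]]
    y = [0, 0, 0, 1, 1, 1]

    net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)
    print(net.predict_proba([[87, 1.7]]))   # fine-grained score per class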
NLP
Natural Language Processing (NLP) analyses text with regard to language structure and meaning. It can be used to process unstructured data sources such as log entries and inspection notes.
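As a minimal sketch of turning free text into something analysable, using scikit-learn's CountVectorizer on made-up inspection notes:

    from sklearn.feature_extraction.text import CountVectorizer

    # Hypothetical free-text inspection notes
    notes = [
        "minor corrosion observed on pump casing",
        "pump seal replaced after vibration alarm",
        "no issues found during routine inspection",
    ]

    # Turn unstructured text into a structured term-frequency matrix
    vectorizer = CountVectorizer()
    matrix = vectorizer.fit_transform(notes)
    print(vectorizer.get_feature_names_out())
    print(matrix.toarray())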
Optimisation
Using a combination of mathematical / constraint-based techniques to find the “optimal” solutions – for example, changes to plant settings that maximise production while minimising emissions.
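A minimal sketch using SciPy's linear programming solver; the objective and constraints are a toy stand-in for real plant settings:

    from scipy.optimize import linprog

    # Toy problem: choose settings x1, x2 to maximise production 3*x1 + 2*x2
    # subject to emissions and capacity limits. linprog minimises, so the
    # objective is negated.
    result = linprog(
        c=[-3, -2],
        A_ub=[[1, 1], [2, 1]],    # hypothetical emissions and capacity constraints
        b_ub=[10, 15],
        bounds=[(0, None), (0, None)],
    )
    print(result.x, -result.fun)  # optimal settings (5, 5) and production 25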
Predictive Analytics
The analysis of historical data in such a way that the findings can be used to make robust and accurate assessments of new or future cases. Machine learning, and the predictive models it produces, are central to this.
Prescriptive Analytics
Usually built on the outputs of predictive analytics, combined with business rules or optimisation, to recommend actions, e.g. interventions to prevent equipment failure or adjustments to improve production efficiency.
Propensity
A propensity is a measure of the likelihood of a particular outcome. Propensities are usually produced by models and are sometimes confused with probabilities, but probabilities have a precise and formal statistical definition, while the exact meaning of a propensity varies from application to application. A propensity is a figure in a numeric range – often 0.0 to 1.0 – where a higher value means the outcome is more likely.
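A minimal sketch of producing a propensity with scikit-learn; the data is invented:

    from sklearn.linear_model import LogisticRegression

    # Hypothetical history: [running_hours, vibration] -> failed within 30 days?
    X = [[100, 0.2], [200, 0.3], [5000, 1.5], [6000, 1.8]]
    y = [0, 0, 1, 1]

    model = LogisticRegression(max_iter=1000).fit(X, y)
    propensity = model.predict_proba([[5500, 1.6]])[0, 1]   # value in 0.0-1.0
    print(f"failure propensity: {propensity:.2f}")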
Rule Induction
Models which represent classification decisions as a series of rules. These are easy to read and understand, and can be thought of as “profiles” of the situations in which different outcomes are expected.
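For a feel of what induced rules look like when applied, a toy hand-written rule set (the rules and thresholds here are made up, not learned):

    # A rule set of the kind a rule-induction algorithm might produce
    def classify(case):
        if case["vibration"] > 1.0 and case["temperature"] > 80:
            return "failure_expected"
        if case["running_hours"] > 5000:
            return "inspection_due"
        return "normal"

    print(classify({"vibration": 1.4, "temperature": 85, "running_hours": 3000}))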
Scoring
Using machine learning algorithms/models to give the propensity of a particular outcome, e.g. equipment failure.
Sequence Detection
Finds sets of events which tend to happen in a sequence over time.
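A minimal plain-Python sketch of the idea, counting how often one event directly follows another (the sequences are made up):

    from collections import Counter

    # Hypothetical event sequences from several equipment runs
    sequences = [
        ["pressure_alarm", "valve_trip", "shutdown"],
        ["pressure_alarm", "valve_trip"],
        ["valve_trip", "shutdown"],
    ]

    pair_counts = Counter()
    for seq in sequences:
        for earlier, later in zip(seq, seq[1:]):
            pair_counts[(earlier, later)] += 1
    print(pair_counts.most_common())   # which transitions recur over time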
Static Data
Data that does not change, or only changes in exceptional circumstances – for example, the make, model and rated capacity of equipment.
Statistical Analysis
Applying statistical tests to data to support or reject hypotheses, identify correlations, and determine the significance of patterns. Commonly used in descriptive and diagnostic analytics.
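A minimal sketch using SciPy; the production figures are invented:

    from scipy import stats

    # Hypothetical daily production before and after a maintenance change
    before = [102, 98, 101, 99, 100, 97]
    after = [106, 104, 108, 105, 103, 107]

    # Two-sample t-test: is the difference in means statistically significant?
    t_stat, p_value = stats.ttest_ind(before, after)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")   # a small p suggests a real shift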
Streaming Data
Data that is updated at high frequency, for example sensor readings and alarms.
Structured Data
Structured data is the well-organised, well-behaved data typically stored in databases and spreadsheets. A record of, for example, tag values at a particular point in time contains a fixed and known number of fields, and the content of each field will usually be of the same type (e.g. a reading or a flag).
Supervised Learning
Machine learning techniques applied to historical data where the outcome to be predicted is known, as in classification.
Unstructured Data
Data that doesn’t fit the simple repeated structure of structured data: for example, free text, recordings of speech or other sounds, images, and video clips.
Unsupervised Learning
Machine learning techniques applied to historical data where no outcome is specified for prediction; instead, patterns are to be detected, as in clustering.
X-PAS™
X-PAS™ is a cloud-based software platform that helps oil and gas asset teams to make better use of their data, so they can make more informed decisions and achieve cleaner, more efficient and lower cost energy production.
