Data Science 3x2

Algorithm

An algorithm is a self-contained sequence of actions that performs a specific task. Machine learning algorithms are the basis of applying artificial intelligence (AI) in data-rich environments.

Anomaly Detection

Using machine learning algorithms/models to recognise situations or cases which are unusual – for example, equipment/process status falling outside the bounds of normal operation.
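A minimal sketch of the idea, using a simple statistical rule rather than a trained model: flag readings that lie far from the mean of recent values. The pressure figures and the 2-standard-deviation threshold are purely illustrative.

```python
from statistics import mean, stdev

def find_anomalies(readings, threshold=2.0):
    """Return readings lying more than `threshold` standard deviations from the mean."""
    mu = mean(readings)
    sigma = stdev(readings)
    return [r for r in readings if abs(r - mu) > threshold * sigma]

# Pump pressures hovering around 100, plus one obviously unusual reading.
pressures = [99.8, 100.1, 100.3, 99.9, 100.0, 100.2, 99.7, 100.1, 250.0]
anomalies = find_anomalies(pressures)
```

Real anomaly-detection models learn what "normal" looks like from historical data; this fixed-threshold rule only illustrates the intent.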

Artificial Intelligence

If a computer does something that, if done by a human, would be said to be intelligent, it is showing artificial intelligence (AI). AI includes many different technologies and disciplines, including machine learning.

Association Analysis

Finds sets of events which tend to happen together.
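A toy sketch of association analysis: count how often pairs of events co-occur across a set of logs and compute each pair's support (the fraction of logs containing both events). The event names are illustrative.

```python
from itertools import combinations
from collections import Counter

# Each set is one (hypothetical) maintenance log's events.
logs = [
    {"high_vibration", "bearing_alarm", "shutdown"},
    {"high_vibration", "bearing_alarm"},
    {"low_pressure", "shutdown"},
    {"high_vibration", "bearing_alarm", "low_pressure"},
]

# Count co-occurring pairs of events.
pair_counts = Counter()
for events in logs:
    for pair in combinations(sorted(events), 2):
        pair_counts[pair] += 1

# Support = fraction of logs containing the pair.
support = {pair: count / len(logs) for pair, count in pair_counts.items()}
top_pair, top_support = max(support.items(), key=lambda kv: kv[1])
```

Production association-analysis algorithms (such as Apriori) extend this counting to larger event sets efficiently.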

Big Data

Describes data sets that are so large or complex that traditional data processing and basic analytical approaches are inadequate to deal with them.

Business Intelligence

Providing analytical information to end users via reports, graphs and other visual representations, usually with facilities to explore the data. Commonly used in descriptive analytics, or as a means of displaying the results of diagnostic, predictive and prescriptive analytics.

Classification

Using machine learning algorithms/models to differentiate between different outcome classes e.g. normal vs. abnormal plant operation.
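A minimal classification sketch using a nearest-centroid rule: a new reading is assigned the class whose average historical reading it sits closest to. The feature values (say, temperature and vibration) and labels are purely illustrative.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    return [sum(xs) / len(xs) for xs in zip(*points)]

def distance_sq(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(reading, centroids):
    """Return the label of the nearest class centroid."""
    return min(centroids, key=lambda label: distance_sq(reading, centroids[label]))

# Labelled historical readings: [temperature, vibration] (illustrative).
history = {
    "normal":   [[100.0, 45.0], [101.0, 44.0], [99.0, 46.0]],
    "abnormal": [[140.0, 80.0], [138.0, 82.0], [142.0, 79.0]],
}
centroids = {label: centroid(points) for label, points in history.items()}
prediction = classify([139.0, 81.0], centroids)
```

Practical classifiers (decision trees, neural networks, etc.) learn far richer boundaries, but the input/output shape – features in, class label out – is the same.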

Clustering

Clustering algorithms group together similar cases. Clustering algorithms can be used, for example, to automatically identify different operating modes of specific equipment.
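A tiny k-means sketch on one-dimensional toy data, grouping power readings into clusters that might correspond to distinct operating modes. The readings, the crude initialisation, and k=2 are all illustrative assumptions.

```python
def kmeans_1d(values, k=2, iters=20):
    """Cluster 1-D values into k groups; return the sorted cluster centres."""
    centers = sorted(values)[:: max(1, len(values) // k)][:k]  # crude spread-out init
    for _ in range(iters):
        # Assign each value to its nearest centre.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its assigned values.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)

# Pump power draw (kW, illustrative): two apparent operating modes.
power = [10.1, 9.8, 10.3, 55.0, 54.5, 55.4, 10.0, 54.9]
modes = kmeans_1d(power, k=2)
```

The two returned centres sit near 10 and 55, matching the two modes visible in the data.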

CRISP-DM

A widely adopted methodology for machine learning/analytics projects that covers all stages from understanding the business problem, through data science work, to deployment of results and ongoing maintenance of deployed solutions.

Data Science

The practice of applying machine learning algorithms and other analytical tools and techniques to data to address business/operational problems.

Decision Trees

Models where decisions are arrived at by traversing a tree structure. Decision trees have the advantage of being relatively easy to understand; reading through the tree gives insights into how different factors interact in predicting, for example, variations in levels of production.
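To illustrate the readability point, here is what a very small learned tree might look like when written out as nested threshold tests. The feature names and thresholds are purely illustrative, not a real trained model.

```python
def predict_failure_risk(vibration_mm_s, bearing_temp_c):
    """Toy decision tree: each branch is a readable threshold test."""
    if vibration_mm_s > 7.0:
        if bearing_temp_c > 90.0:
            return "high"      # high vibration AND hot bearing
        return "medium"        # high vibration alone
    return "low"               # vibration within normal range

risk = predict_failure_risk(vibration_mm_s=8.2, bearing_temp_c=95.0)
```

Reading the branches top to bottom shows exactly how the two factors interact to produce each prediction, which is the property that makes trees easy to explain.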

Deployment

Putting the results of artificial intelligence and predictive analytics to work – going “from analysis to action”.

Descriptive Analytics

Using analytical techniques to describe and understand what is or has been happening, up to the current point in time.

Diagnostic Analytics

Using analytical techniques to discover the root causes of phenomena that have been, or are being, observed.

Domain Expertise

Human knowledge about the area to which machine learning is being applied. In the Oil & Gas industry, domain expertise typically comes from engineers.

Dynamic Data

Data that changes frequently: for example, daily production measures, or the crew assigned to particular activities.

Ensembles

Combinations of predictive models used together to reach a decision with greater accuracy and confidence. Analogous to a panel of human experts collaborating.
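The simplest way models can act like that panel of experts is majority voting, sketched below. The three "model outputs" are hypothetical stand-ins for real trained models.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common prediction among several models."""
    return Counter(predictions).most_common(1)[0][0]

# Outputs of three illustrative models scoring the same piece of equipment.
votes = ["fail", "fail", "ok"]
decision = majority_vote(votes)
```

Real ensemble methods (bagging, boosting, stacking) combine models in more sophisticated ways, often weighting votes by each model's confidence or accuracy.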

Estimation/Forecasting

Using machine learning algorithms/models to predict a numeric value, for example, the level of oil in water at a point in the near future.
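A minimal forecasting sketch: fit a straight line to recent readings by ordinary least squares and extrapolate one step ahead. The oil-in-water figures are illustrative, and real forecasting models capture far more than a linear trend.

```python
def fit_line(ys):
    """Least-squares fit of y = slope*x + intercept for x = 0, 1, 2, ..."""
    xs = list(range(len(ys)))
    n = len(ys)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

oil_in_water = [10.0, 12.0, 14.0, 16.0]  # ppm over four intervals (illustrative)
slope, intercept = fit_line(oil_in_water)
forecast = slope * len(oil_in_water) + intercept  # estimate for the next interval
```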

Feature Engineering

Creating, from raw data, higher level features/variables which can help machine learning algorithms create more accurate and useful models. As these features typically embody knowledge about the process or equipment being analysed, domain expertise is an important input to feature engineering.
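A small sketch of what feature engineering can mean in practice: deriving a rolling mean (smoothed trend) and a rate of change from raw temperature readings. Features like these often help a model more than the raw values do; the readings and window size here are illustrative.

```python
def rolling_mean(values, window):
    """Mean of each trailing window of `window` readings."""
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]

def rate_of_change(values):
    """Difference between each reading and the previous one."""
    return [b - a for a, b in zip(values, values[1:])]

temps = [70.0, 71.0, 73.0, 76.0, 80.0]  # raw readings (illustrative)
smoothed = rolling_mean(temps, window=3)
deltas = rate_of_change(temps)
```

Here the accelerating `deltas` (1, 2, 3, 4) make a worsening trend explicit – exactly the kind of signal that domain experts know to look for and that raw readings hide from simpler models.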

Machine Learning

Machine learning algorithms are applied to historical data and create models which can be used to make judgements about current or future cases.

Model

An entity constructed by a machine learning algorithm. It receives input data – such as for example, a set of current tag values – and produces one or more outputs such as the probability of near-term equipment failure.

Neural Network

A family of algorithms/models inspired by the structure of the brain. Neural networks provide fine-grained scoring, but for some uses their opacity is an obstacle.

NLP

Natural Language Processing (NLP) analyses text with regard to language structure and meaning. Can be used to process unstructured data sources such as log entries and inspection notes.
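A tiny NLP-flavoured sketch: pull the most frequent meaningful words out of free-text inspection notes by tokenising and filtering stopwords. Real NLP systems go much further (grammar, entities, meaning); the notes below are illustrative.

```python
import re
from collections import Counter

STOPWORDS = {"the", "on", "and", "a", "of", "is", "was"}

notes = [
    "Corrosion observed on the valve casing",
    "Valve leaking, corrosion on flange",
    "Minor corrosion of the pump housing",
]

# Lowercase, split into words, drop stopwords, and count.
words = Counter(
    w
    for note in notes
    for w in re.findall(r"[a-z]+", note.lower())
    if w not in STOPWORDS
)
top_word = words.most_common(1)[0][0]
```

Even this crude frequency count surfaces "corrosion" as the dominant theme across the notes – the kind of signal that would otherwise stay buried in unstructured text.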

Optimisation

Using a combination of mathematical / constraint-based techniques to find the “optimal” solutions – for example, changes to plant settings that maximise production while minimising emissions.

Predictive Analytics

The analysis of historical data in such a way that the findings can be used to make robust and accurate assessments of new or future cases. Machine learning, and the predictive models it produces, are central to this.

Prescriptive Analytics

Usually built on the outputs of predictive analytics, combined with business rules or optimisation, to recommend actions e.g. interventions to prevent equipment failure or adjustments to improve production efficiency.

Propensity

A propensity is a measure of the likelihood of a particular outcome, usually produced by a model as a figure in a numeric range – often 0.0 to 1.0 – where a higher propensity means the outcome is more likely. Propensities are sometimes confused with probabilities, but probabilities have a precise and formal statistical definition, while the exact meaning of a propensity varies from application to application.

Rule Induction

Models which represent classification decisions as a series of rules. These are easy to read and understand, and can be thought of as “profiles” of the situations in which different outcomes are expected.

Scoring

Using machine learning algorithms/models to give the propensity of a particular outcome e.g. equipment failure.
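A scoring sketch: map a raw model output onto a 0.0–1.0 propensity with the logistic function, then rank cases by it. The raw scores and pump names are hypothetical.

```python
import math

def propensity(raw_score):
    """Squash a raw model score into the 0.0-1.0 range (logistic function)."""
    return 1.0 / (1.0 + math.exp(-raw_score))

# Hypothetical raw model outputs for three pieces of equipment.
raw_scores = {"pump_A": 2.0, "pump_B": -1.0, "pump_C": 0.5}

# Rank equipment from most to least likely to fail.
ranked = sorted(raw_scores, key=lambda k: propensity(raw_scores[k]), reverse=True)
```

Ranking by propensity is how scoring is typically put to work: the highest-scoring cases are investigated or acted on first.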

Sequence Detection

Finds sets of events which tend to happen in a sequence over time.
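A sequence-detection sketch: count how often one event immediately follows another across ordered event logs. The event names are illustrative; real sequence-mining algorithms also handle gaps and longer chains.

```python
from collections import Counter

# Each list is one (hypothetical) log of events in time order.
logs = [
    ["pressure_drop", "valve_alarm", "shutdown"],
    ["valve_alarm", "shutdown"],
    ["valve_alarm", "shutdown"],
    ["pressure_drop", "shutdown"],
]

# Count each adjacent (earlier, later) event pair.
pair_counts = Counter()
for events in logs:
    for earlier, later in zip(events, events[1:]):
        pair_counts[(earlier, later)] += 1

most_common_sequence = pair_counts.most_common(1)[0][0]
```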

Static Data

Data that does not change, or only changes in exceptional circumstances – for example, the make, model and rated capacity of equipment.

Statistical Analysis

Applying statistical tests to data to prove or disprove hypotheses, identify correlations, and determine the significance of patterns. Commonly used in descriptive and diagnostic analysis.

Streaming Data

Data that is updated at high frequency, for example sensor readings and alarms.

Structured Data

Structured data is the well-organised, well-behaved data typically stored in databases and spreadsheets. Records of, for example, tag values at a particular point in time contain a fixed and known number of fields, and the content of each field will usually be of the same type (e.g. a reading or a flag).

Supervised Learning

Machine learning techniques applied to historical data where the outcome to be predicted is known, such as for classification.

Unstructured Data

Data that doesn’t fit the simple repeated structure of structured data: for example, free text, recordings of speech or other sounds, images, and video clips.

Unsupervised Learning

Machine learning techniques applied to historical data where no outcome is specified for prediction, but rather patterns are to be detected, such as for clustering.
