Anomaly detection is a term that is commonly used and understood across verticals; however, its semantics vary from one vertical to another. For example, in cybersecurity, anomaly detection refers to detecting a hacking attempt. In financial transactions, it refers to detecting compliance failures. In OT, it refers to detecting that temperature readings from an array of sensors have exceeded a threshold.
Broadly speaking, anomalies fall into two categories.
- Known anomalies (a.k.a. known-knowns and known-unknowns) – The rules for detecting these anomalies are well known. The rules can be a combination of domain knowledge (e.g., if the temperature of a sensor is greater than 75 then it is an anomaly) and predictive rules derived from supervised learning (e.g., classify a transaction as fraud when certain conditions are met).
- Unknown anomalies (unknown-unknowns) – The rules for detecting these anomalies are not known, either because we are still learning about the data or because the data gathered so far provides no clue about these anomalies. In such situations, unsupervised learning techniques (e.g., PCA and clustering) can be used to identify the unknown anomaly when it occurs and then study its causality.
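To make the first category concrete, here is a minimal sketch of "known anomaly" rules expressed directly in code. The temperature threshold comes from the example above; the fraud rule is a hypothetical illustration of conditions that might be exported from a supervised model (the field names and values are made up).

```python
def temperature_rule(reading):
    # Domain-knowledge rule from the text: temperature above 75 is an anomaly.
    return reading["temperature"] > 75

def fraud_rule(txn):
    # Hypothetical rule derived from supervised learning, e.g. a decision-tree
    # path exported as a conjunction of conditions (thresholds are illustrative).
    return txn["amount"] > 10_000 and txn["country"] != txn["home_country"]

readings = [{"temperature": 72}, {"temperature": 80}]
flags = [temperature_rule(r) for r in readings]   # [False, True]
```

Because the detection logic is known up front, rules like these can be authored once and then shipped to the deployment stage unchanged.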
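For the second category, a minimal clustering sketch shows the idea: cluster the historical data into normal operating modes, then treat points far from any cluster center as candidate unknown anomalies. The data here is simulated (two normal modes plus one far-off reading), and the 3-sigma cutoff is an illustrative choice, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated readings: two normal operating modes plus one "unknown unknown".
X = np.vstack([
    rng.normal([0, 0], 0.3, size=(50, 2)),
    rng.normal([5, 5], 0.3, size=(50, 2)),
    [[20.0, -20.0]],
])

# Tiny k-means (Lloyd's algorithm) with k=2, seeded with one point per mode.
centroids = X[[0, 60]].copy()
for _ in range(10):
    d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    labels = d.argmin(axis=1)
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(2)])

# Distance to the assigned centroid; far-away points are candidate anomalies.
dist = np.linalg.norm(X - centroids[labels], axis=1)
outliers = np.where(dist > dist.mean() + 3 * dist.std())[0]
```

Once such a point is flagged, the next step in the text applies: study its causality, and if a pattern emerges, promote it into a known-anomaly rule.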
Typically, implementing anomaly detection is a two-stage process. In the first stage, historical data is used to determine the rules; in the second stage, the rules are deployed and applied to current production data. For the first stage, a search and discovery system (e.g., Splunk or ELK) or some type of big data system (e.g., Hadoop) is typically used along with some type of ML system (e.g., Weka, Python). For the second stage, an OI system (e.g., Hitachi Operational Intelligence) is used for deploying the rules. Please read my blog (Disrupt your OI strategy) on why it is useful to have a system designed for OI for deploying the rules.
This combination of learning and deployment is a continuous process that improves the quality of the rules for capturing anomalies. The rules developed in the learning stage are a mix of different types: rules that capture the known anomalies and rules that capture the unknown anomalies. In the deployment stage, the rules are continuously executed to spot anomalies and trigger appropriate actions.
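The deployment stage described above can be sketched as a simple loop: rules learned offline are evaluated against each incoming record, and a match triggers an action. This is a toy illustration, not tied to any specific OI product; the rule names, thresholds, and the alert callback are all invented for the example.

```python
# Rules learned in the first stage, expressed as predicates over a record.
rules = {
    "high_temperature": lambda r: r.get("temperature", 0) > 75,
    "pressure_spike": lambda r: r.get("pressure", 0) > 120,
}

alerts = []

def on_anomaly(rule_name, record):
    # In a real OI system this might open a ticket or fire a webhook;
    # here we just collect the triggered (rule, record) pairs.
    alerts.append((rule_name, record))

def process(stream):
    # Continuously execute every rule against each incoming record.
    for record in stream:
        for name, rule in rules.items():
            if rule(record):
                on_anomaly(name, record)

process([
    {"temperature": 70, "pressure": 100},
    {"temperature": 80, "pressure": 100},
    {"temperature": 70, "pressure": 130},
])
```

Keeping the rules as data (here, a dict of predicates) is what makes the loop continuous: the learning stage can ship an updated rule set without changing the deployment code.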
There is no silver bullet for anomaly detection; instead, there are methodologies and approaches that can accelerate projects requiring anomaly detection. The following presentation describes a manufacturing use case and the associated methodology for detecting unknown anomalies in high-dimensional data sets (e.g., data from hundreds of sensors).
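As a small taste of that approach, here is a sketch of PCA-based detection of unknown anomalies in high-dimensional sensor data: fit principal components on readings from 100 simulated sensors, then flag readings with a large reconstruction error. The data, the number of components, and the threshold are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulate 1000 readings from 100 sensors driven by 5 latent factors.
base = rng.normal(size=(1000, 5))
X = base @ rng.normal(size=(5, 100)) + 0.1 * rng.normal(size=(1000, 100))
X[-1] += 10   # inject an anomalous reading across all sensors

# Center, fit PCA via SVD, keep the top k principal axes.
mu = X.mean(axis=0)
Xc = X - mu
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 5
P = Vt[:k]

# Project onto the principal subspace and reconstruct; normal readings
# reconstruct well, anomalous ones leave a large residual.
recon = (Xc @ P.T) @ P
err = np.linalg.norm(Xc - recon, axis=1)

# Flag readings whose reconstruction error is far above typical.
thresh = np.median(err) + 5 * err.std()
anomalies = np.where(err > thresh)[0]
```

The appeal for data from hundreds of sensors is that no per-sensor rule is needed: the principal subspace summarizes normal correlated behavior, and anything that breaks those correlations stands out in the residual.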