Why operators are ignoring SCADA alarms
In this blog post, we explain why SCADA alarms are abundant yet of little help to operators.
Virtually any SCADA system contains a data component and an event component. The data component makes sensor data and set-points available, while the event component provides an interface to the logs of the equipment: the event data.
Here are some examples of conventional SCADA events:
Event data can come in various flavours: logs convey any actions from operators (stops, manual interventions), warnings show relevant but not critical events, and finally, alarms contain information critical to the operator. It seems that an operator of a complex piece of equipment (say a wind turbine) is well equipped with this to make well-informed real-time decisions on its operations, right? Unfortunately, that’s not the case.
An average wind turbine easily generates dozens to hundreds of SCADA events per day. In the following plot, we show the number of alarms generated by the SCADA event system for a 19-turbine wind farm. We can easily see that such a volume of alarms is impractical for any operator to review.
Do you still think your operators are actually looking at this information? No, as John Naisbitt once said, they are drowning in information but starved for knowledge.
Conventional SCADA alarms consist of threshold-based, univariate rules (rules based on only a single sensor) and, although narrowly useful, are very limited. Let's see why.
Take example rule 1 above. If we are monitoring a wind turbine, a simplified rule could be: if the generator temperature is above 60ºC, raise an alarm (see Figure 2).
Indeed, a generator temperature above 60ºC can be something we want to avoid at all times, regardless of any other factors. But now imagine that at a certain moment in time the wind flowing over the turbine blades is weak, so the generator produces less energy, and yet its temperature is above 50ºC. In a high-energy-generation operation mode, 50ºC could be fine and as expected. But under low energy generation, 50ºC is actually alarmingly high (see what we did there?). However, no alarm will be raised by the SCADA system, because the reading is still below the 60ºC threshold.
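To make the missed case concrete, here is a minimal Python sketch of such a static, univariate rule. The sensor names (`generator_temp_c`, `active_power_kw`) and the readings are invented for illustration and do not come from any real SCADA system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdRule:
    """A static, univariate rule: one sensor, one fixed limit."""
    sensor: str
    limit: float

    def is_alarm(self, reading: dict) -> bool:
        # Only the configured sensor is inspected; all other context is ignored.
        return reading[self.sensor] > self.limit


rule = ThresholdRule(sensor="generator_temp_c", limit=60.0)

# Hypothetical readings (illustrative values only).
readings = [
    {"generator_temp_c": 48.0, "active_power_kw": 1900.0},  # high load, normal
    {"generator_temp_c": 52.0, "active_power_kw": 150.0},   # low load, suspiciously hot
    {"generator_temp_c": 63.0, "active_power_kw": 2000.0},  # above the fixed limit
]

alarms = [rule.is_alarm(r) for r in readings]
print(alarms)
```

The second reading is the interesting one: the generator is hot for so little power output, but the rule never looks at `active_power_kw`, so it stays silent.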
An alternative would be to configure the rule to be defensive and set the original threshold to a lower value, such as 50ºC. But this would have the reverse problem, where too many alarms would be raised when the wind surrounding the turbine is stronger (in the nomenclature of statistics, this would mean a lower precision and higher recall performance).
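The precision and recall trade-off can be illustrated with a handful of labelled readings. The data below is made up purely for the sake of the example; each entry pairs a generator temperature with a flag saying whether there was a real problem at that moment.

```python
# Hypothetical labelled readings: (generator temperature in ºC, real problem?).
# These values are invented to illustrate the trade-off, not measured data.
readings = [
    (45.0, False), (52.0, False), (55.0, False), (58.0, False), (61.0, False),
    (53.0, True), (57.0, True), (62.0, True), (65.0, True),
]


def precision_recall(threshold_c, data):
    """Precision and recall of the rule 'alarm if temperature > threshold_c'."""
    tp = sum(1 for temp, problem in data if temp > threshold_c and problem)
    fp = sum(1 for temp, problem in data if temp > threshold_c and not problem)
    fn = sum(1 for temp, problem in data if temp <= threshold_c and problem)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


precision_60, recall_60 = precision_recall(60.0, readings)
precision_50, recall_50 = precision_recall(50.0, readings)

print(f"60ºC threshold: precision={precision_60:.2f}, recall={recall_60:.2f}")
print(f"50ºC threshold: precision={precision_50:.2f}, recall={recall_50:.2f}")
```

On this toy data, lowering the threshold catches every real problem but only half of the raised alarms are genuine, which is exactly the noise that operators learn to ignore.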
This is the main problem with rule-based SCADA events: they are not context-aware! They are the one-size-fits-all T-shirt of asset operations. And there are so many elements of context that are important: environmental conditions, operational conditions, asset-specific identity (not all assets are the same), seasonality, etc.
An example of this is shown in Figure 1, where a large number of alarms are flagged, producing more noise than information. This has a fatiguing effect on the operator looking at them. The operator becomes desensitised and ignores the alarms at the times where they could be most useful. This is a known effect called alarm fatigue. For sensors with higher variance or fast dynamics (for example wind speeds), alarm fatigue tends to be even worse.
In order to reduce alarm fatigue, we need more complete rules that give fewer but more meaningful alarms. Rules that are dynamic over time and update their threshold values given the operating context.
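As a rough idea of what a rule with a context-dependent threshold might look like, the sketch below scales the temperature limit with the turbine's load. The linear relationship, the rated power of 2000 kW and the 45ºC and 60ºC limits are all assumptions chosen for illustration; a real rule would need values tuned to the asset.

```python
def dynamic_limit_c(active_power_kw, rated_power_kw=2000.0,
                    low_load_limit_c=45.0, full_load_limit_c=60.0):
    """Temperature limit that rises linearly with load (an assumed relationship)."""
    # Clamp the load fraction to [0, 1] so odd power readings cannot break the rule.
    load = min(max(active_power_kw / rated_power_kw, 0.0), 1.0)
    return low_load_limit_c + load * (full_load_limit_c - low_load_limit_c)


def is_alarm(generator_temp_c, active_power_kw):
    """Raise an alarm when the temperature exceeds the limit for the current load."""
    return generator_temp_c > dynamic_limit_c(active_power_kw)


# The same 52ºC reading is judged differently depending on the operating context.
low_wind_alarm = is_alarm(52.0, active_power_kw=150.0)
high_wind_alarm = is_alarm(52.0, active_power_kw=2000.0)
print(low_wind_alarm, high_wind_alarm)
```

This handles a single element of context (load) for a single sensor, and already requires three hand-picked parameters.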
So how about as an intermediate step we invest more time in configuring a better rule system? One that will be able to handle seasonality? And maybe control options too? And how about environmental conditions, and operational conditions?
And, very importantly, even when they share the same make and model, each asset is slightly different and unique (see Figure 3 for an example). What if we also tweak all of the above for each asset and each sensor individually?
You see where this is going, right? Before you know it, you have a jungle of hundreds or thousands of rules. Testing this system, adding rules, changing rules or removing rules becomes an enormous effort, and any potential financial upsides from this improved rule system will be swallowed by the additional costs in maintaining the system.
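A quick back-of-the-envelope calculation shows how fast the rule count grows. Only the 19 turbines come from the wind farm mentioned earlier; the number of sensors, seasons and operating modes are assumed figures for illustration.

```python
import math

# One rule per combination of asset, sensor and context.
# Only the turbine count comes from the text; the rest are assumptions.
dimensions = {
    "turbines": 19,
    "sensors_per_turbine": 30,   # assumed
    "seasons": 4,                # assumed
    "operating_modes": 3,        # assumed
}

total_rules = math.prod(dimensions.values())
print(f"Rules to configure, test and maintain: {total_rules}")
```

Each extra element of context multiplies the total rather than adding to it.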
However, there’s another way to solve this: letting intelligent systems figure out all these rules! In our post on Normality modelling, we explain how at Jungle we deliver machine-learning-based models that understand the individual asset and the context in which it’s operating. Using all the historical data available, a digital twin learns each individual asset's dynamic behaviour, taking into account the unique operational regimes and environmental context it is exposed to. Based on these models, creating meaningful alarms is quite straightforward.
Based on various SCADA sensors (e.g. active power and wind speed, among others), we model the normal behaviour of several components in wind turbines.
For example, our model output for the generator slip ring is shown below in green. Shades of green indicate the degree of normality, with the green dots marking the most likely normal behaviour. Red dots show actual sensor measurements from the SCADA data.
Through the outputs of our normality models, we now have access to the dynamic and context-aware threshold that we were searching for! These prediction bands tell us if, for a certain turbine and for its operational context, the sensor readings are as expected. They are the basis on which we will be developing our alarms!
We have covered a lot so far! We understood why SCADA events are not insightful enough and why static threshold-based alarms are also not the solution. We also introduced the dynamic and context-aware prediction bands provided by our normality models, on top of which we will build our alarm mechanism.
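To give a feel for how a prediction band acts as a dynamic threshold, here is a minimal sketch that flags readings falling outside a band. It is not the alarm mechanism built on our models; the `BandPoint` structure and the band values are hypothetical, standing in for whatever a normality model would output at each timestamp.

```python
from typing import List, NamedTuple


class BandPoint(NamedTuple):
    """One timestamp: the measured value and a (hypothetical) predicted band."""
    measured: float
    lower: float
    upper: float


def out_of_band(points: List[BandPoint]) -> List[bool]:
    """Flag each reading that falls outside its own context-specific band."""
    return [not (p.lower <= p.measured <= p.upper) for p in points]


# Illustrative values only: the band moves with the operating context.
points = [
    BandPoint(measured=52.0, lower=35.0, upper=45.0),  # low wind: 52ºC is unexpected
    BandPoint(measured=52.0, lower=48.0, upper=58.0),  # high wind: 52ºC is expected
    BandPoint(measured=63.0, lower=50.0, upper=60.0),  # too hot even at high load
]

flags = out_of_band(points)
print(flags)
```

The same 52ºC reading is flagged in one context and accepted in another, because the threshold now travels with the band instead of being fixed in advance.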