Why operators are ignoring SCADA alarms

In this blog post, we explain why SCADA alarms are abundant yet unhelpful to operators.

Silvio Rodrigues

CIO & Co-founder

Technology

Virtually every SCADA system has a data component and an event component. The data component makes sensor data and set-points available, while the event component provides an interface to the equipment logs: the event data.

Here are some examples of conventional SCADA events:

If temperature X is above threshold a, raise alarm.
If speed Y is below threshold b, raise alarm.
If component Z is on for more than c minutes, raise alarm.
If the operator pressed a button, log an entry.
If alarm A has not been acknowledged within M minutes, raise it again.
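
To make the shape of these rules concrete, here is a minimal Python sketch of how such univariate, threshold-based rules might be evaluated on a single SCADA sample. The field names, the `Event` type and the values chosen for a, b, c and M are hypothetical placeholders, not taken from any real SCADA product.

```python
from dataclasses import dataclass
from enum import Enum


class EventKind(Enum):
    LOG = "log"
    ALARM = "alarm"


@dataclass
class Event:
    kind: EventKind
    message: str


# Hypothetical values standing in for thresholds a, b, c and M above.
TEMP_THRESHOLD_A = 60.0   # ºC
SPEED_THRESHOLD_B = 3.0   # m/s
ON_DURATION_C = 30.0      # minutes
ACK_TIMEOUT_M = 15.0      # minutes


def evaluate_rules(sample: dict) -> list[Event]:
    """Apply univariate, threshold-based rules to one SCADA sample."""
    events = []
    if sample["temperature_x"] > TEMP_THRESHOLD_A:
        events.append(Event(EventKind.ALARM, "temperature X above threshold a"))
    if sample["speed_y"] < SPEED_THRESHOLD_B:
        events.append(Event(EventKind.ALARM, "speed Y below threshold b"))
    if sample["component_z_on_minutes"] > ON_DURATION_C:
        events.append(Event(EventKind.ALARM, "component Z on for more than c minutes"))
    if sample["button_pressed"]:
        events.append(Event(EventKind.LOG, "operator pressed a button"))
    if sample["alarm_a_unacked_minutes"] > ACK_TIMEOUT_M:
        events.append(Event(EventKind.ALARM, "alarm A not acknowledged, raised again"))
    return events
```

Each rule looks at one signal in isolation, which is exactly what makes this kind of system easy to configure and, as we will see, hard to trust.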

Event data comes in various flavours: logs record operator actions (stops, manual interventions), warnings flag events that are relevant but not critical, and, finally, alarms carry information that is critical to the operator. It would seem, then, that the operator of a complex piece of equipment (say, a wind turbine) has everything needed to make well-informed, real-time decisions about its operation, right? Unfortunately, that is not the case.

SCADA alarms are fatiguing operators

An average wind turbine easily generates dozens to hundreds of SCADA events per day. Figure 1 shows the number of alarms generated by the SCADA event system of a 19-turbine wind farm. At this volume, following the alarms is simply impractical.

Figure 1: SCADA alarm counts for a wind farm with 19 turbines. Larger circles represent higher counts.

Do you still think your operators are actually looking at this information? Most likely they are not: as John Naisbitt put it, they are drowning in information but starved for knowledge.

Conventional SCADA alarms consist of threshold-based, univariate rules (each based on a single sensor) and, although useful in a narrow sense, they are very limited. Let's see why.

Take the first example rule above. When monitoring a wind turbine, a simplified version could be: if the generator temperature is above 60ºC, raise an alarm (see Figure 2).

Indeed, a generator temperature above 60ºC may be something we want to avoid at all times, regardless of any other factors. But now imagine that, at a certain moment, the wind flowing over the turbine blades is weak, so the generator produces less energy, and yet its temperature is above 50ºC. In a high-generation operating mode, 50ºC could be perfectly fine and expected. Under low generation, however, 50ºC is actually alarmingly high (see what we did there?). Yet the SCADA system raises no alarm, because the 60ºC threshold has not been crossed.

Figure 2: Two example rules with threshold values of 60ºC (top) and 50ºC (bottom). Whereas the first would raise no alarm, the second would raise what could be false alarms. (Spoiler alert: the green bands show the expected range as calculated by Jungle alarms. More on this later.)

An alternative would be to make the rule defensive by lowering the threshold to, say, 50ºC. But this has the reverse problem: too many alarms would be raised whenever the wind around the turbine is stronger and the generator is legitimately warmer. In statistical terms, we would be trading lower precision for higher recall.
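
The trade-off can be illustrated with a toy example. The readings below are invented for illustration only: each has an operating mode, a generator temperature and a label saying whether it is truly abnormal for that mode (54ºC and 57ºC are abnormal under low generation, while 52ºC to 58ºC is normal under high generation). The sketch scores both static thresholds against those labels.

```python
# Invented readings: (operating mode, generator temperature in ºC, truly abnormal?)
READINGS = [
    ("low", 40.0, False),
    ("low", 45.0, False),
    ("low", 54.0, True),    # alarmingly high under low generation
    ("low", 57.0, True),
    ("high", 52.0, False),  # perfectly normal under high generation
    ("high", 55.0, False),
    ("high", 58.0, False),
    ("high", 63.0, True),
]


def precision_recall(readings, threshold):
    """Score the static rule 'temperature > threshold' against the labels."""
    tp = fp = fn = 0
    for _mode, temperature, abnormal in readings:
        alarm = temperature > threshold
        if alarm and abnormal:
            tp += 1  # correct alarm
        elif alarm:
            fp += 1  # false alarm
        elif abnormal:
            fn += 1  # missed problem
    # Precision is undefined if no alarm fires; report 0.0 by convention here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


for threshold in (60.0, 50.0):
    print(threshold, precision_recall(READINGS, threshold))
```

On this data, the 60ºC rule reaches a precision of 1.0 but a recall of only 1/3 (it misses both low-generation problems), while the 50ºC rule reaches a recall of 1.0 but a precision of 0.5 (half of its alarms are false).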

This is the main problem with rule-based SCADA events: they are not context-aware! They are the one-size-fits-all T-shirt of asset operations. And many elements of context matter: environmental conditions, operational conditions, asset-specific identity (not all assets are the same), seasonality, etc.

Figure 1 shows an example of this: a large number of alarms are flagged, producing more noise than information. This has a fatiguing effect on the operator watching them, who becomes desensitised and ignores the alarms precisely when they could be most useful. This well-known effect is called alarm fatigue. For sensors with higher variance or fast dynamics (for example, wind speed), alarm fatigue tends to be even worse.

To reduce alarm fatigue, we need more complete rules that raise fewer but more meaningful alarms: rules that are dynamic over time and update their threshold values according to the operating context.
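
A first, hand-configured step in that direction is a threshold that depends on the operating context. The sketch below picks the temperature limit from the current active power; the power bands and limits are hypothetical values chosen to match the example above, where 54ºC is alarming under low generation but 58ºC is fine under high generation.

```python
# Hypothetical limits per operating regime:
# (upper bound of active power in kW, generator temperature limit in ºC)
CONTEXT_LIMITS = [
    (500.0, 50.0),         # low generation: above 50ºC is already alarming
    (1500.0, 55.0),        # medium generation
    (float("inf"), 60.0),  # high generation
]


def temperature_limit(active_power_kw):
    """Pick the temperature limit for the current operating regime."""
    for max_power_kw, limit_c in CONTEXT_LIMITS:
        if active_power_kw <= max_power_kw:
            return limit_c
    # Only reachable for NaN, since the last band is unbounded.
    raise ValueError("active power must be a number")


def context_alarm(active_power_kw, temperature_c):
    """Raise an alarm only if the temperature exceeds the regime's limit."""
    return temperature_c > temperature_limit(active_power_kw)
```

Here `context_alarm(300.0, 54.0)` returns `True`, while `context_alarm(2000.0, 58.0)` returns `False`. Note that even this tiny table hard-codes three limits for one sensor on one turbine.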

Creating Context-aware Alarms

So how about, as an intermediate step, we invest more time in configuring a better rule system? One that can handle seasonality? And maybe control options too? And environmental and operational conditions?

And, very importantly, even assets of the same make and model are each slightly different and unique (see Figure 3 for an example). What if we also tweak all of the above for each asset and each sensor individually?

Figure 3: Turbines of the same model still present different dynamics and personalities. In this case we see that the generator temperature is similar but not exactly the same for turbines of the same wind farm.

You see where this is going, right? Before you know it, you have a jungle of hundreds or thousands of rules. Testing this system, and adding, changing or removing rules, becomes an enormous effort, and any financial upside of the improved rule system is swallowed by the additional cost of maintaining it.
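
A back-of-the-envelope count shows how quickly this grows. Only the 19 turbines come from the wind farm in Figure 1; the number of sensors, operating regimes and seasons below are hypothetical, and each combination is assumed to need its own tuned threshold.

```python
from math import prod

# Only the turbine count comes from Figure 1; the other factors are hypothetical.
RULE_DIMENSIONS = {
    "turbines": 19,
    "sensors per turbine": 20,
    "operating regimes": 5,
    "seasons": 4,
}

# One tuned threshold per combination of turbine, sensor, regime and season.
rule_count = prod(RULE_DIMENSIONS.values())
print(rule_count)  # 7600
```

Even these modest assumptions give 7,600 thresholds to configure, test and maintain for a single wind farm.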

Figure 4: Even when defined with simple thresholds, the number of possible alarms can quickly grow out of hand. Image source

However, there is another way to solve this: let intelligent systems figure out all these rules! In our post on Normality modelling, we explain how at Jungle we deliver machine-learning-based models that understand each individual asset and the context in which it operates. Using all the available historical data, a digital twin learns the dynamic behaviour of each individual asset, taking into account the unique operational regimes and environmental context it is exposed to. Based on these models, creating meaningful alarms is quite straightforward.

Based on various SCADA sensors (e.g. active power and wind speed, among others), we model the normal behaviour of several components in wind turbines.

For example, our model output for the generator slip ring is shown below in green. Shades of green indicate the degree of normality, and green dots mark the most likely normal behaviour. Red dots show the actual sensor measurements from the SCADA data.

Figure 5: Prediction bands provided by our normality models which inform us, at each timestamp, what should be the observation range for various sensors given the operating conditions of the turbine.
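
One simple way to turn a model's predictions into bands like these, assumed here purely for illustration, is to take empirical quantiles of the model's residuals over a healthy period and add them to each new prediction. The function names and the 90% default coverage are hypothetical; calling `residual_band` with several coverage levels would give nested bands, similar in spirit to the shades of green.

```python
from statistics import quantiles


def residual_band(observed, predicted, coverage=0.9):
    """Offsets (lower, upper) that contain roughly `coverage` of past residuals."""
    tail = round((1 - coverage) / 2 * 100)  # percent in each tail, e.g. 5 for 90%
    if not 1 <= tail <= 49:
        raise ValueError("coverage must be roughly between 0.02 and 0.98")
    residuals = [o - p for o, p in zip(observed, predicted)]
    # 99 cut points: cuts[k - 1] is the k-th percentile of the residuals.
    cuts = quantiles(residuals, n=100, method="inclusive")
    return cuts[tail - 1], cuts[100 - tail - 1]


def prediction_band(prediction, offsets):
    """Expected range for a new observation, given the model's prediction."""
    lower, upper = offsets
    return prediction + lower, prediction + upper
```

A wider coverage gives a wider, more tolerant band; a narrower one flags smaller deviations at the cost of more alarms.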

Through the outputs of our normality models, we now have access to the dynamic, context-aware thresholds we were searching for! These prediction bands tell us whether, for a given turbine and its operational context, the sensor readings are as expected. They are the basis on which we will develop our alarms!
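
To show how such bands could take the place of a static threshold, here is a minimal sketch that flags every timestamp whose measurement falls outside its band. The names and values are hypothetical, and this is only an illustration, not the alarm mechanism itself. With the bands below, 52ºC under low generation is flagged while 58ºC under high generation is not: the two cases that tripped up the static 60ºC and 50ºC rules.

```python
def band_violations(timestamps, measured, lower, upper):
    """Return the timestamps whose measurement lies outside [lower, upper]."""
    return [
        t
        for t, value, lo, hi in zip(timestamps, measured, lower, upper)
        if not lo <= value <= hi
    ]


# Hypothetical readings: low generation at 10:00, high generation at 14:00.
timestamps = ["10:00", "14:00"]
measured = [52.0, 58.0]  # generator temperature, ºC
lower = [38.0, 54.0]     # lower edge of the prediction band, ºC
upper = [46.0, 63.0]     # upper edge of the prediction band, ºC

print(band_violations(timestamps, measured, lower, upper))  # ['10:00']
```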

We have covered a lot so far! We saw why SCADA events are not insightful enough and why static threshold-based alarms are not the solution either. We also introduced the dynamic, context-aware prediction bands provided by our normality models, on top of which we will build our alarm mechanism.
