The software defined programmable logic controller: An introduction to Red Hat’s predictable latency / realtime capabilities
The software-defined industrial control system is a huge opportunity for open source technologies. Moving away from bespoke, specialized hardware (such as PLC devices) to software running on standard hardware is a trend that crosses industries and now extends into the industrial and manufacturing world. For industrial organizations, this shift is similar to telcos moving to network function virtualisation (NFV) and open radio access networks (ORAN).
Successfully implementing this trend, however, requires low latency capabilities. The controller has to respond quickly to an external event, for example by detecting that a robot is about to hit a wall and stopping it before it actually makes contact. Please note that I avoid the term “real time”, which requires specialized Real Time Operating Systems (RTOS). Instead, the best term to use is “predictable latency.”
One of Red Hat’s answers to this burgeoning industrial trend is Red Hat Enterprise Linux (RHEL) for Real Time, which serves as the base for the Red Hat OpenShift Performance Addon Operator (PAO). While this was developed for the telco space, it is well suited to reuse in operational technology and industrial control systems.
Together with Intel, we presented Digital Transformation at the Edge with Red Hat and Intel Edge Controls for Industrial, the result of a combined testing and evaluation effort between our two companies.
Let’s take a look at the data:
This test was run using Red Hat OpenShift on bare metal with the Performance Addon Operator and the RHEL Real-Time kernel (RT-Kernel). Max latency (the time from event to response) is around 47 µsec to 33 µsec. This is more or less out of the box on standard hardware, with next to no special tuning besides activating Intel Cache Allocation Technology. This is already good enough for many industrial control system applications with PLC cycle times in the ~10 msec range, leaving enough headroom for the actual program execution.
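To make the headroom claim concrete, here is a minimal sketch of the arithmetic, using the 47 µsec worst case and the ~10 msec cycle time quoted above. The function name is purely illustrative and is not part of any Red Hat or Intel tooling.

```python
def latency_share_of_cycle(latency_us: float, cycle_ms: float) -> float:
    """Fraction of one PLC cycle consumed by the worst-case response latency."""
    cycle_us = cycle_ms * 1000.0  # 1 msec = 1000 µsec
    return latency_us / cycle_us


# Figures taken from the text: 47 µsec max latency, ~10 msec PLC cycle.
share = latency_share_of_cycle(latency_us=47.0, cycle_ms=10.0)
headroom_us = 10.0 * 1000.0 - 47.0

print(f"latency uses {share:.2%} of the cycle, leaving {headroom_us:.0f} µsec")
```

Under these assumptions the response latency consumes roughly half a percent of each cycle, so almost the whole cycle remains available for the control program itself.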
But it gets even better:
This chart shows the value of the OpenShift Performance Addon Operator and the RHEL RT-Kernel it deploys. Note that we switched from latency (how quickly does the system respond) to jitter (how consistently does the system respond, which is actually more important). You can see that on a default system (standard kernel), it is around 57 msec, which is not acceptable for most control applications.
But just by turning on the OpenShift Performance Addon Operator, with no additional tuning, this drops to 55 µsec. That’s an improvement of three orders of magnitude.
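The difference between latency and jitter, and the size of the improvement, can be sketched in a few lines. This is an illustration only: the sample lists are hypothetical, not measured data, jitter is taken here as the spread between the slowest and fastest response (one common convention, assumed for this sketch), and the default-kernel figure is assumed to be 57 msec.

```python
import math


def summarize(samples_us):
    """Return (max latency, jitter) for a list of response times in µsec.

    Jitter is computed as slowest minus fastest response, which is an
    assumption of this sketch rather than the definition used in the tests.
    """
    worst = max(samples_us)
    return worst, worst - min(samples_us)


def orders_of_magnitude(before_us, after_us):
    """Base-10 logarithm of the improvement factor."""
    return math.log10(before_us / after_us)


# Hypothetical samples: the first system is usually faster but unpredictable.
fast_but_erratic = [5.0, 6.0, 5.0, 900.0]
slower_but_steady = [40.0, 42.0, 41.0, 43.0]

print(summarize(fast_but_erratic))
print(summarize(slower_but_steady))

# 57 msec (assumed default-kernel jitter) versus 55 µsec with the operator.
print(round(orders_of_magnitude(57_000.0, 55.0), 2))
```

A control loop cares about the second system's behavior: its typical response is slower, but its worst case is bounded and close to its best case.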
Let’s take a look at what’s possible when we add Intel capabilities to the equation:
The wide deviations on the left-hand side are caused by CPU cache misses. If the code or data is not in the cache, it takes quite some time to fetch it from memory. First and second level caches can be disturbed by other workloads or processes running on the same core.
Intel Cache Allocation Technology avoids this by dedicating parts of the CPU cache to a process.
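For readers curious what "dedicating parts of the cache" can look like in practice, the following is a rough sketch of how a plain Linux host exposes Cache Allocation Technology through the kernel's resctrl filesystem. It is not necessarily how the tests described here were configured. The helper names, the group name, and the choice of the L3 cache level are assumptions made for illustration; writing to resctrl requires root privileges and a mounted `/sys/fs/resctrl`.

```python
from pathlib import Path

# Conventional resctrl mount point; it must be mounted before use.
RESCTRL_ROOT = Path("/sys/fs/resctrl")


def contiguous_mask(num_ways: int, first_way: int = 0) -> str:
    """Hex capacity bitmask covering `num_ways` adjacent cache ways."""
    if num_ways < 1 or first_way < 0:
        raise ValueError("need at least one way and a non-negative offset")
    return format(((1 << num_ways) - 1) << first_way, "x")


def schemata_line(cache_id: int, mask_hex: str, level: str = "L3") -> str:
    """Build one schemata entry, e.g. 'L3:0=f0' (cache level is an assumption)."""
    return f"{level}:{cache_id}={mask_hex}\n"


def dedicate_cache(group: str, pid: int, cache_id: int, mask_hex: str) -> None:
    """Create a resctrl group, assign it cache ways, and move a process into it."""
    group_dir = RESCTRL_ROOT / group
    group_dir.mkdir(exist_ok=True)
    (group_dir / "schemata").write_text(schemata_line(cache_id, mask_hex))
    (group_dir / "tasks").write_text(f"{pid}\n")


# Only the string helpers run here; dedicate_cache() needs root and CAT hardware.
print(schemata_line(0, contiguous_mask(4, first_way=4)), end="")
```

Note that assigning ways to one group does not by itself make them exclusive: the masks of the default group and any other groups would also need to leave those ways out.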
Looking at the new numbers, the old minimum turns into the new maximum, with a 76% reduction in max jitter.
Conclusion
The numbers show that low-latency workloads like industrial control systems can run on modern standard hardware in containerized environments. There is no need for special-purpose devices and no need for lengthy, heavy fine-tuning. The out-of-the-box performance characteristics are good enough for most use cases.