Which Quality of Service (QoS) is Right for IIoT?

Part 6 of Data Communication for Industrial IoT

Quality of Service (QoS) is a general term to indicate the delivery contract from a sender to a receiver.  In some applications QoS talks about delivery time, reliability, latency or throughput.  In IIoT, QoS generally refers to the reliability of delivery.

Using MQTT as an example, there are three common Quality of Service levels for IIoT:

  • Level 0 – At most once.  Every message will be delivered on a best-effort basis, similar to UDP.  If the message is lost in transit for whatever reason, it is abandoned: the receiver never receives it, and the sender does not know that it was lost.
  • Level 1 – At least once.  Every message will be delivered to a receiver, though sometimes the same message will be delivered two or more times.  The receiver may be able to distinguish the duplicates, but perhaps not.  The sender is not aware that the receiver received multiple copies of the message.
  • Level 2 – Exactly once.  Every message will be delivered exactly once to the receiver, and the sender will be aware that it was received.
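The three contracts can be compared with a small simulation.  This is only a sketch under stated assumptions, not MQTT itself: the deliver function, the 30% loss rate and the fixed seed are all made up for illustration, and QoS 2 is approximated by having the receiver discard message IDs it has already seen, whereas real MQTT uses a four-packet handshake (PUBLISH, PUBREC, PUBREL, PUBCOMP).

```python
import random

def deliver(messages, qos, loss=0.3, seed=1):
    """Return the message ids that the receiving application processes.

    Hypothetical model: each transmission and each acknowledgement is
    lost independently with probability `loss`.
    """
    rng = random.Random(seed)
    received = []      # what the receiving application sees, in order
    seen_ids = set()   # consulted only at QoS 2 to discard duplicates
    for msg_id in messages:
        while True:
            arrived = rng.random() >= loss            # did the message get through?
            if arrived and not (qos == 2 and msg_id in seen_ids):
                received.append(msg_id)
                seen_ids.add(msg_id)
            if qos == 0:
                break                                 # best effort: never retry
            acked = arrived and rng.random() >= loss  # did the ack get back?
            if acked:
                break                                 # sender stops retrying
    return received

ids = list(range(20))
for level in (0, 1, 2):
    got = deliver(ids, level)
    duplicates = len(got) - len(set(got))
    missing = len(set(ids) - set(got))
    print(f"QoS {level}: {missing} missing, {duplicates} duplicates")
```

At level 0 a lost message simply never appears.  At level 1 a lost acknowledgement causes a resend, which is where the duplicates come from.  At level 2 the extra bookkeeping removes them.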

These QoS levels actually miss something important that comes up a lot in industrial systems, but let’s look at these three quickly.

First, QoS level 0 is simply unacceptable.  It is fine to lose a frame of a video once in a while, but not fine to lose a control signal that safely shuts down a stamping machine.  If the sender is transmitting data more quickly than the receiver can handle it, there will come a point where in-flight messages will fill the available queue positions, and new messages will be lost.  Essentially, QoS 0 will favor old messages over new ones.  In IIoT, this is a fatal flaw.  There’s no reason to discuss QoS 0 further.

QoS level 1 seems pretty reasonable at first glance.  Message duplication is not a problem in most cases, and where there is an issue the duplicates can be identified by the receiver and eliminated, assuming the client maintains enough history to be able to identify them.

However, problems arise when the sender is transmitting data more quickly than the receiver can process it.  Since there is a delivery guarantee at QoS 1, the sender must be able to queue an unbounded number of packets while waiting for an opportunity to deliver them.  Longer queues mean longer latencies.  For example, if I turn a light on and off three times (six state changes), and the delivery latency is 5 seconds per message simply due to the queue volume, then it will take 30 seconds for the receiver to see that the light has settled into its final state.  In the meantime the client will be acting on false information.  In the case of a light, this may not matter much (unless it is a visual alarm), but in industrial systems timeliness matters.  The problem becomes even more severe if the client is aggregating data from multiple sources.  If some sources are delayed by seconds or minutes relative to others, then the client will be performing logic on data that are inconsistent not only with reality but also with each other.
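The arithmetic behind the light example, and the way a backlog grows under sustained overload, can be sketched in a few lines.  The function names and the 100 and 80 messages-per-second rates are assumptions chosen only for illustration.

```python
def settle_time_s(queued_messages, latency_per_message_s):
    """Seconds until the last queued message reaches the receiver."""
    return queued_messages * latency_per_message_s

def backlog_after(seconds, produce_rate, consume_rate):
    """Messages still waiting after `seconds` of sustained load (rates in msg/s)."""
    return max(0.0, (produce_rate - consume_rate) * seconds)

def queue_delay_s(backlog, consume_rate):
    """Seconds a newly queued message waits behind the existing backlog."""
    return backlog / consume_rate

# Three on/off cycles are six state changes; at 5 s each the final
# "off" arrives 30 s after the first message was sent.
print(settle_time_s(6, 5))

# Hypothetical rates: the sender produces 100 msg/s, the slowest link carries 80 msg/s.
for t in (10, 60, 600):
    backlog = backlog_after(t, 100, 80)
    print(t, backlog, queue_delay_s(backlog, 80))
```

With a guaranteed-delivery queue the delay keeps growing for as long as the overload lasts, so the sender needs ever more memory.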

Ultimately, QoS 1 cannot be used where any client could produce data faster than the slowest leg of the communication path can handle.  Beyond a certain data rate, the system will effectively “fall off a cliff” and become unusable.  I’ve personally seen this exact thing happen in a municipal waste treatment facility.  It wasn’t pretty.  The solution was to completely replace the communication mechanism.

QoS level 2 is similar to QoS 1, but more severe.  QoS 2 is designed for transactional systems, where every message matters, and duplication is equivalent to failure.  For example, a system that manages invoices and payments would not want to record a payment twice or emit multiple invoices for a single sale.  In that case, latency matters far less than guaranteed unique delivery.

Since QoS level 2 requires more communication to provide its guarantee, it requires more time to deliver each message.  It will exhibit the same problems under load as QoS level 1, but at a lower data rate.  That is, the maximum sustained data rate for QoS 2 will be lower than for QoS 1.  The “cliff” just happens sooner.

QoS Levels 1 and 2 Don’t Propagate

Both QoS level 1 and level 2 suffer from another big flaw – they don’t propagate.  Consider a trivial system where two clients, A and B, are connected to a single broker.  The goal is to ensure that B receives every message that A transmits, meaning that QoS 1 or 2 should apply between A and B.  Looking at QoS 1, A would send a message and wait for a delivery confirmation.  The broker would need to transmit the message to B before sending the confirmation to A.  That would imply that the broker knows that A needs to wait for B to respond.  Two problems arise.  First, A cannot know that B is even connected to the broker.  That is a fundamental property of a one-to-many publish/subscribe protocol like MQTT.  Second, the broker cannot know that the intention of the system is to provide reliable communication between A and B.  Even if the broker were somehow programmed to wait like that, how would it deal with a third client, C, also listening for that message?  Would it wait for delivery on all clients?  What would it do about clients that are temporarily disconnected?  The answer is that it cannot.  If the intention of the system is to offer QoS 1 or 2 among clients then that QoS promise cannot be kept.
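A toy model makes the gap visible.  The Broker class below is hypothetical and ignores everything a real broker does about sessions and retries.  It shows only that the acknowledgement A receives describes the hop from A to the broker and says nothing about B.

```python
class Broker:
    """Toy one-to-many broker that acknowledges the publisher on receipt."""

    def __init__(self):
        self.subscribers = {}   # client name -> {"connected": bool, "inbox": list}

    def subscribe(self, name, connected=True):
        self.subscribers[name] = {"connected": connected, "inbox": []}

    def publish(self, message):
        # Forward to whichever subscribers happen to be reachable right now.
        for sub in self.subscribers.values():
            if sub["connected"]:
                sub["inbox"].append(message)
        # The ack means "the broker has the message", nothing more.
        return "ACK"

broker = Broker()
broker.subscribe("B", connected=False)   # B has dropped off the network
broker.subscribe("C")
ack = broker.publish("shut down press 4")
print(ack, broker.subscribers["B"]["inbox"], broker.subscribers["C"]["inbox"])
```

A sees a successful delivery even though B never saw the message.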

Some brokers have a server-to-server, or daisy-chain, mechanism that allows brokers to transfer messages to each other.  This allows clients on different brokers to intercommunicate.  In this configuration the QoS promise cannot be maintained beyond the connection between the original sender and the first broker in the chain.

Guaranteed Consistency

None of these QoS levels is really right for IIoT.  We need something else, and that is guaranteed consistency.  In a typical industrial system there are analog data points that move continuously, like flows, temperatures and levels.  A client application would like to see as much detail as it can, but most critical is the current value of these points.  If it misses a value that is already superseded by a new measurement, that is not generally a problem.  However, the client cannot accept missing the most recent value for a point.  For example, if I flick a light on and off 3 times, the client does not need to know how many times I did it, but it absolutely must know that the switch ended in the off position.  The communication path needs to guarantee that the final “off” message gets through, even if some intermediate states are lost.  This is the critical insight in IIoT.  The client is mainly interested in the current state of the system, not in every transient state that led up to it.

Guaranteed consistency for QoS is actually slightly more complex than that.  There are really three critical aspects that are too often ignored:

  1. Message queues must be managed for each data point and client. When communication is slow, old messages must be dropped from the queue in favor of new messages to avoid ever-lengthening latencies.  This queuing must occur on a per-point, per-client basis.  Only messages that are superseded for a specific point destined for a specific client can be dropped.  If we drop messages blindly then we risk dropping the last message in a sequence, as in the final switch status above.
  2. Event order must be preserved.  When a new value for a point enters the queue, it goes to the back of the queue even if it supersedes a message near the front of the queue.  If we don’t do this, the client could see the light turn on before the switch is thrown.  Ultimately the client gets a consistent view of the data, but for a short time it may have been inconsistent.
  3. The client must be notified when a value is no longer current.  For the client to trust its data, it must know when data consistency is no longer being maintained.  If a data source is disconnected for any reason, its data will no longer be updated in the client.  The physical world will move on, and the client will not be informed.  Although the data delivery mechanism cannot stop hardware from breaking, it can guarantee that the client knows that something is broken.  The client must be informed, on a per-point basis, whether the point is currently active and current or inaccessible and thus invalid.  In the industrial world this is commonly done using data quality, a per-point indication of the trustworthiness of each data value.
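These three rules can be sketched as a small data structure.  What follows is a minimal illustration, not a description of any particular product: the ConsistencyQueue class, its method names and the quality strings are all assumptions.  One such queue would exist for each connected client, which gives the per-point, per-client behavior described above.

```python
from collections import OrderedDict

GOOD, NOT_CONNECTED = "good", "not connected"

class ConsistencyQueue:
    """Outbound queue for ONE client, holding at most one pending value per point."""

    def __init__(self):
        self._pending = OrderedDict()   # point name -> (value, quality)

    def put(self, point, value, quality=GOOD):
        # Rule 1: drop only the superseded value for this same point.
        self._pending.pop(point, None)
        # Rule 2: the new value goes to the back, preserving event order.
        self._pending[point] = (value, quality)

    def mark_disconnected(self, points):
        # Rule 3: the source is gone, so tell the client that each point is
        # no longer current.  The value is None because it is unknown.
        for point in points:
            self.put(point, None, NOT_CONNECTED)

    def get(self):
        # Deliver the oldest pending update, or None if nothing is waiting.
        if not self._pending:
            return None
        point, (value, quality) = self._pending.popitem(last=False)
        return point, value, quality

    def __len__(self):
        return len(self._pending)

q = ConsistencyQueue()
for state in ("on", "off", "on", "off", "on", "off"):
    q.put("switch", state)      # six changes arrive while the link is slow
print(len(q), q.get())          # only the final state is still queued
```

The queue can never hold more entries than there are points, which is what keeps memory use and latency bounded on a slow link.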

For those instances where it is critical to see every change in a process (that is, where QoS 1 or 2 is required), that critical information should be handled as close as possible to the data source, whether it’s a PLC or an embedded device.  That is, time-critical and event-critical information should be processed at its source, not transmitted via the network to a remote system for processing where that transmission could introduce latency or drop intermediate values. We will discuss this more when we talk about edge processing.

For the IIoT, the beauty of guaranteed consistency for QoS is that it can respond to changes in network conditions without slowing down, backing up or invalidating the client’s view of the system state.  It has a bounded queue size and is thus suitable for resilient embedded systems.  This quality of service can propagate through any number of intermediate brokers and still maintain its guarantee, as well as notify the client when any link in the chain is broken.

So there’s the answer.  For IIoT, you definitely don’t want QoS 0, and probably cannot accept the limitations and failure modes of QoS 1 or 2.  You want something altogether different—guaranteed consistency.


Industrial Speed IIoT

What does “real time” really mean in an industrial system?  And what does “real time” mean for the Industrial IoT?  For some people, updating their data within 5 seconds counts as real time.  For them, getting data updates once per second is blazingly fast.  For us, data updates for the IIoT should be as close to network latencies as possible, typically no more than a few milliseconds.

What does that look like?  Check it out.  We’ve created a SkkyHub demo page for industrial speed IIoT.  This simple demo shows how you can aggregate data from multiple data sources, visualize the data, and more importantly witness real-time Industrial IoT.

In the blue box, as you hover your mouse over the gray dot, it moves.  If you or a friend open the same page on a second browser or a phone and swap IDs, you’ll see a black dot for each other’s mouse (or finger, if it’s on a phone).  Select All, and when all other users move their mouse or finger, you’ll see their black dots move on your page and vice versa. You are participating in the IIoT, in real time.

How close to real time? You can see for yourself the latency of the SkkyHub system.  Just enter and submit your own ID.  Now when you move your mouse or finger around, you get a momentary glimpse of a black dot, shadowing each movement.  The black dot is generated by a round-trip data feed from SkkyHub.  The amount of time it takes for it to catch up to the gray dot is the latency of the data travelling round trip from your browser or phone to SkkyHub running in the cloud, and back.

Why is this useful?  The demo shows that the IIoT can be as responsive as most human operators need it to be.  There is no need to wait a few seconds for each action to have an effect.  This is most valuable for supervisory control, where an operator or manager may need to change a setting in an HMI.  The instant feedback of the SkkyHub service gives assurance to the operator that the system has picked up the change, and has responded accordingly.

At a machine-to-machine level, this kind of industrial speed, along with the ability to sustain multiple simultaneous connections, ensures that internal system activities are well coordinated.  A change in one machine or device propagates in real time to any or all connected devices.  This keeps the logic of the system intact, and ensures the smoothest possible performance.

When this kind of performance is coupled with a secure-by-design architecture and the ability to connect seamlessly to virtually any existing industrial system, then we feel confident in calling it Industrial IoT that works.

Cloud Economics 4: Does Location Matter?

If you’ve been following the recent blogs, you’ll know the “L” in Joe Weinman’s C L O U D definition stands for location independence.  One of the five distinctive attributes of cloud computing, location independence means that you can access your data anywhere.  Location doesn’t matter in cloud economics.

Or does it?  Like many things in life, there is a trade-off.  Time is related to distance, even in cloud computing.  The farther you are from your data source, the longer it takes for the data to reach you.  And since timeliness has value, a better location should give better value.  So maybe location does matter after all.  The question is, how much?

Let’s put things into perspective by translating distance into time.  The calculated speed of data flowing through a fiber optic cable is about 125 miles per millisecond (0.001 seconds).  In real-world terms, since Chicago is located about 800 miles from New York City, it would take about 6.4 milliseconds for a “Hello world” message to traverse that distance.
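The same conversion works for any distance.  In this short sketch the function name is made up for illustration, while the 125 miles per millisecond and the 800 miles are the figures quoted above.  It covers propagation delay only and ignores routing, switching and processing time.

```python
FIBER_MILES_PER_MS = 125.0   # propagation speed quoted in the text

def one_way_latency_ms(distance_miles, speed=FIBER_MILES_PER_MS):
    """Propagation delay over fiber, in milliseconds."""
    return distance_miles / speed

print(one_way_latency_ms(800))       # Chicago to New York City: 6.4
print(2 * one_way_latency_ms(800))   # round trip: 12.8
```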

As we discussed last week, for certain automated trading platforms that operate in the realm of microseconds (0.000001 seconds), 6.4 milliseconds is an eon of lost time.  These systems can make or lose millions of dollars in the blink of an eye.  For that reason you’ll find the serious players setting up shop right next door to their data center.  The rest of us, on the other hand, can pretty much remain in our seats, even for real-time cloud applications.

Why?  Well, first of all, the majority of industrial applications are already optimized for location.  Most SCADA systems are implemented directly inside a plant, or as close as physically practical to the processes they monitor.  Engineers who configure wide-area distributed systems are well aware of the location/time trade-offs involved, and take them into account in their designs.  Furthermore, they keep their mission-critical data communication self-contained, not exposed to the corporate LAN, much less to potential latencies introduced by passing data through the cloud.

Of course, a properly configured hybrid cloud or cloud-enhanced SCADA can separate the potential latencies of the cloud system from the stringent requirements of the core system.  What results is a separation between the deterministic response of the control system and the good-enough response time of the cloud system, which we have defined in a previous blog as “remote accessibility to data with local-like immediacy.”

Another area where the location question arises is for the Internet of Things.  As we have seen, great value can be derived from connecting devices through the cloud.  These of course can be located just about anywhere, and most of them can send data as quickly as required.  For example, devices like temperature sensors, GPS transmitters, and RFID chips respond to environmental input that is normally several orders of magnitude slower than even a slow Internet connection.  Latencies in the range of even a few hundred milliseconds make little difference to most users of this data.  People don’t react much faster than that, anyway.

As we have already seen, user interactions with a cloud system have a time cushion of about 200 milliseconds (ms), the average human response time.  How much of that gets consumed by the impact of location?  Joe Weinman tells us that the longest possible round trip message, going halfway around the world and back, such as from New York to Singapore and back to New York, takes about 160 ms.  Not bad.  That seems to leave some breathing room.  But Weinman goes on to point out that real-world HTTP response times vary between countries, ranging from just under 200 ms to almost 2 seconds.  And even within a single country, such as the USA, average latencies can reach a whole second for some locations.

However, a properly designed real-time cloud system still has a few important cards to play.  Adhering to our core principles for data rates and latency, we recall that a good real-time system does not require round-trip polling for data updates.  A single subscribe request will tell the data source to publish the data whenever it changes.  With the data being pushed to the cloud, no round trips are necessary.  This elimination of the “response” cycle cuts the time in half.  Furthermore, a data-centric infrastructure removes the intervening HTML, XML, SQL etc. translations, freeing the raw data to flow in its simplest form across the network.

What does this do to our Singapore-to-New York scenario?  Does it now approach 80 ms?  It’s quite possible.  Such a system would have to be implemented and tested under real-world conditions, but there is good reason to believe that for many locations with modern infrastructures, data latency can be well under the magic 200 ms threshold.  To the extent that this is true, location really does not matter.
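The budget can be written out explicitly.  This sketch assumes a symmetric path, so that one leg is half of the round trip.  The function names are made up for illustration, while the 200 ms and 160 ms figures are the ones quoted above.

```python
HUMAN_RESPONSE_MS = 200.0   # time cushion quoted in the text
ROUND_TRIP_MS = 160.0       # New York to Singapore and back, per the text

def polled_latency_ms(round_trip_ms):
    # Request/response: the client asks, then waits for the reply.
    return round_trip_ms

def pushed_latency_ms(round_trip_ms):
    # Publish on change: only the source-to-client leg is needed.
    return round_trip_ms / 2

def headroom_ms(latency_ms, budget_ms=HUMAN_RESPONSE_MS):
    """Time left in the budget for processing and network overhead."""
    return budget_ms - latency_ms

print(headroom_ms(polled_latency_ms(ROUND_TRIP_MS)))   # 40.0
print(headroom_ms(pushed_latency_ms(ROUND_TRIP_MS)))   # 120.0
```

Under these assumptions, pushing data leaves three times as much room inside the 200 ms budget as polling does.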

Cloud Economics 3: The Value of Timeliness

The other day at our local supermarket the line seemed to be going slower than usual.  When it came my turn to pay, I realized why.  The store had “upgraded” their debit card readers, and the new type of machine was agonizingly slow.  Instead of the usual one second to read my card and tell me to enter my PIN, the thing took at least three whole seconds.  Then it took an additional couple of seconds to calculate and complete the transaction.

Now you might think I’m making a big deal about nothing, but don’t we all expect instant response these days?  There is an enormous value in timeliness, especially when you are providing a service.  The “single most important factor in determining a shopper’s opinion of the service he or she receives is waiting time,” according to Paco Underhill, CEO of Envirosell, in his book Why We Buy.  He continues, “… a short wait enhances the entire shopping experience and a long one poisons it.”  This insight was quoted and expanded on by Joe Weinman in his book Cloudonomics.

Weinman points out the direct relationship between timeliness and the bottom line.  For example, he quotes a recent Aberdeen Group study showing that a one-second delay in load time for a web page causes an 11% drop in page views, which cascades into a 7% reduction in conversions (people taking action), and a 16% decrease in customer satisfaction.

Well below the one-second benchmark, new interactive abilities on the web compete to beat the speed of human reaction time.  Since I can type fairly quickly, I’m not a big fan of the Google pop-down suggestion box, but you have to admire the technology.  For the first letter you type, it goes out and finds a list of the most-searched words.  Each new letter modifies the list, completing a round-trip message to the server before you can even type the next letter.  How’s that for quick service?  No wonder I get frustrated at the supermarket.

Computer-to-computer communication operates at still finer magnitudes of scale.  For example, one of the colocation/cloud data center services provided by the New York Stock Exchange guarantees a round trip time for data at under 70 microseconds.  That’s just 0.00007 seconds.  This speed is highly valued by the traders who use the service, and they are willing to pay a premium for it.  It’s basic cloud economics.

Wonderful as all this is, Weinman points out that there are limits to how quickly data can travel over a network.  Once you are already sending bits close to the speed of light through a fiber optic cable, the only other ways to speed things up are to move closer to your data source, and/or optimize your processing.  Whatever it takes to achieve it, faster response time means less wait, more satisfied customers, and more cash in the till.

Real-time cloud computing is all about the value of timeliness.  People who are watching and interacting with real-time processes expect at least the same kind of responsiveness as you get with Google.  When you click a button or adjust a gauge, the value should change immediately, not after 2 or 3 seconds.  All of this is possible when the core requirements for real-time computing are implemented, particularly those for high data rates and low latency.

How to move large quantities of rapidly changing data through the cloud, and allow meaningful user interaction in the 200 ms range of average human response time is a problem for the software engineers and techies to grapple with.  What is clear is that everyone—be it a customer waiting at the checkout counter, a manager viewing plant data, or a highly energized commodities trader—everyone at their own level knows the value of timeliness.