Post Syndicated from The HiveMQ Team original https://www.hivemq.com/blog/12-questions-you-should-ask-your-broker-vendor-about-mqtt-clustering/
Not all MQTT broker clusters are created equal. While clusters are typically used to solve the single point of failure challenge in pub/sub systems, there are many pitfalls and edge cases that need to be addressed. These cases aren’t always obvious for companies that are evaluating MQTT solutions. We gathered the top 12 questions about MQTT clustering that came up in our customers’ evaluation processes, so you can check with your MQTT broker vendor whether the broker of your choice meets the guarantees your project deserves.
Questions you should ask yourself
First, it’s very important to ask yourself the following questions to find out what’s important for your MQTT project.
- Do you want the MQTT broker cluster to behave like a single MQTT broker from the client’s perspective, regardless of which cluster node a client is connected to?
- Do you require the MQTT broker to preserve Quality of Service 1 and 2 guarantees in the cluster?
- Should your cluster stay online and keep working even if networking problems (e.g. Network Splits) arise?
- Should the cluster scale elastically by having the possibility to add and remove cluster nodes at runtime?
- Do you plan to deploy the MQTT broker cluster in a cloud environment?
- Should MQTT client sessions still be available, even if one or more cluster nodes crash?
If you answer at least one of these questions with “yes”, a resilient, reliable and scalable MQTT broker cluster is what you’re looking for. This also means that a replicated topic tree is probably not enough and you need a more holistic clustering approach that also covers QoS messages, client sessions, queued messages and retained messages. Read on to decide which questions may be important to ask to get the cluster characteristics you desire.
Questions you should ask your MQTT broker vendor
Marketing material doesn’t always answer all the questions, especially when it comes to edge cases and error cases. These 12 questions are the ones our customers asked most often during their evaluation processes. Consider adding them to your own broker evaluation:
The following paragraphs discuss the reasoning behind the questions and give background information that might be useful for understanding the implications of cluster characteristics for your MQTT deployment.
Is the cluster distributed or fully replicated?
Full replication is most often used in high availability scenarios that consist of two nodes, where one broker node serves all requests and the second node is the failover node. For use cases that need massive scalability, full replication simply doesn’t scale. The main reason is that RAM, disk space and bandwidth (between cluster nodes) are finite. If full replication were used in a consistent way, there would be n-1 replicas, which means a 10-node cluster must replicate each piece of data to 9 other nodes, a 20-node cluster to 19 other nodes, and so on.
This situation is less severe if only the topic tree is replicated (although this has the same scaling problems discussed above), since message routing itself doesn’t necessarily need full replication. However, if the broker cluster replicates only the topic tree, this has a severe impact on other aspects of MQTT. We will discuss these in a minute.
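The scaling argument can be made concrete with a back-of-the-envelope calculation. This sketch (the function name is ours, not any broker’s API) compares the extra write load of full replication with a fixed replication factor:

```python
def extra_write_ops(nodes, messages, replication_factor=None):
    """Extra write operations needed to replicate `messages` in a cluster.

    With full replication (the default), every message is copied to all
    n-1 other nodes; with a fixed replication factor k, only to k-1 replicas.
    """
    k = nodes if replication_factor is None else min(replication_factor, nodes)
    return messages * (k - 1)

# Full replication: replication cost grows linearly with cluster size.
print(extra_write_ops(10, 1_000_000))   # 9,000,000 extra writes
print(extra_write_ops(20, 1_000_000))   # 19,000,000 extra writes

# Fixed replication factor: cost stays constant as the cluster grows.
print(extra_write_ops(20, 1_000_000, replication_factor=3))  # 2,000,000
```

This is why distributed (partitioned, partially replicated) data placement is the usual choice for clusters that need to scale beyond a failover pair.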
Split Brain: In case of a network split, does the broker continue to operate properly?
No network is reliable; a reliable network as a prerequisite for MQTT brokers is a warning sign that message and data loss is inevitable in the long term. When it comes to distributed systems, you can’t sacrifice partition tolerance. So how an MQTT broker behaves in case of a network split is a key question.
It’s important that the cluster doesn’t lose its availability by refusing to accept new MQTT clients or even disconnecting MQTT clients, regardless of whether the clients reside in a minority partition or not. So it’s also extremely important that the cluster can handle split brain scenarios. Imagine 500,000 MQTT clients connected to each MQTT broker node: if a split brain occurs and a single MQTT node refuses to operate properly (e.g. because it is in a minority partition), half a million MQTT clients are forced to disconnect, which could result in a reconnection storm on all other cluster nodes.
What guarantees does the broker have when a network split occurs?
As discussed above, network splits are inevitable in cloud environments (and in on-premise networks), so the question is how the MQTT broker deals with a network split or other cluster communication failures, which can occur due to high latencies. There are essentially three ways to deal with these partitions:
- Don’t handle the network split and the whole system potentially loses availability and consistency.
- Trade consistency with eventual consistency and provide reconciliation mechanics.
- Trade availability and make at least the minority partition(s) unavailable, which means they are out of operation until the network split is resolved.
The CAP Theorem defines that we unfortunately can’t have all these 3 characteristics in a distributed system: Consistency, Availability, and Partition Tolerance. So the decision is really about consistency and availability and the questions you should ask yourself are:
- Is there something in your use case that doesn’t allow Eventual Consistency? If yes, would you rather want to have parts of your broker deployment unavailable but have the guarantee of strong consistency?
- Is uptime and availability important? Should the cluster be resilient enough to handle network partitions and still serve MQTT clients?
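The third strategy above (trading availability) is usually implemented with a majority quorum. As a rough illustration of that decision rule, not any particular broker’s behavior:

```python
def has_quorum(partition_size, cluster_size):
    """Consistency-first decision: only a strict majority partition stays
    writable; minority partitions refuse writes until the split heals."""
    return partition_size > cluster_size // 2

# A 5-node cluster splits into partitions of 3 and 2 nodes:
print(has_quorum(3, 5))  # True  -> majority partition keeps operating
print(has_quorum(2, 5))  # False -> minority partition becomes unavailable

# Note the 2-node corner case: after a 1/1 split, NO partition has quorum,
# so a strictly consistent 2-node cluster loses availability entirely.
print(has_quorum(1, 2))  # False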
If your MQTT broker cluster does not replicate or distribute any data other than topic subscriptions, you are unlikely to hit edge cases where the CAP tradeoffs are important to consider. In such a case, however, you will face many other challenges, as the next paragraphs show.
Are clients able to resume a persistent session on other cluster nodes?
MQTT broker clusters typically reside behind a hardware or software load balancer. MQTT clients are typically not aware of whether they are communicating with an MQTT broker cluster or a single MQTT broker. If the MQTT clients use persistent sessions, they expect to continue their session on any broker node in case they get disconnected for any reason. This also means that the broker cluster needs to queue messages while a client is offline.
If the MQTT broker does not support continuing an MQTT session on a different node, messages will be lost in the following cases:
- The broker node that contains the information about the client session becomes unavailable
- The client connects to a different broker node and creates a new session
Allowing an MQTT client to continue its session on a different cluster node is one of the key features every MQTT broker should provide if you need a reliable messaging solution based on MQTT.
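The behavior to test for can be sketched with a toy model. Here a shared (conceptually replicated) session store lets a client reconnect to a different node and still receive messages queued while it was offline; all class and method names are illustrative, not a real broker API:

```python
class SessionStore:
    """Toy stand-in for a replicated session store every node can read."""
    def __init__(self):
        self.sessions = {}  # client id -> {"subscriptions": set, "queued": list}

class BrokerNode:
    def __init__(self, name, store):
        self.name, self.store = name, store

    def connect(self, client_id, clean_session=False):
        # Resume the persistent session if one exists, otherwise create it.
        if clean_session or client_id not in self.store.sessions:
            self.store.sessions[client_id] = {"subscriptions": set(), "queued": []}
        return self.store.sessions[client_id]

    def queue(self, client_id, message):
        # Message for an offline client with a persistent session.
        self.store.sessions[client_id]["queued"].append(message)

store = SessionStore()
node_a, node_b = BrokerNode("A", store), BrokerNode("B", store)

node_a.connect("sensor-1")                 # first connection lands on node A
node_a.queue("sensor-1", "while-offline")  # client disconnects; a message arrives
session = node_b.connect("sensor-1")       # load balancer routes reconnect to B
print(session["queued"])                   # ['while-offline'] -- nothing lost
```

If the session store were local to node A instead of shared, the reconnect on node B would start an empty session and the queued message would be gone.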
Does the broker give message order guarantees in the cluster?
The MQTT 3.1.1 specification defines Ordered Topics, which means that message ordering must be guaranteed within a single MQTT topic. Message ordering is a fundamental problem in distributed systems, though. So most MQTT broker clusters use one of the two following approaches to guarantee message ordering across cluster nodes:
- Wall Clock Timestamps
- Logical Timestamps
Wall clock timestamps require time synchronization between all cluster nodes. This is often achieved with NTP, which still doesn’t solve the problem that POSIX timestamps are not monotonic. So if your broker uses wall clock timestamps for message ordering, you might get ordering issues, especially for queued messages.
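Logical timestamps avoid the wall clock problem by counting events instead of reading clocks. The classic example is a Lamport clock, sketched here as a minimal illustration of the idea (not any broker’s internals):

```python
class LamportClock:
    """Minimal logical clock: ordering comes from counters, not wall time."""
    def __init__(self):
        self.time = 0

    def tick(self):
        # Called for every local event, e.g. a message published on this node.
        self.time += 1
        return self.time

    def observe(self, remote_time):
        # Called when receiving a message from another node: never fall
        # behind the sender, so causally later events get larger timestamps.
        self.time = max(self.time, remote_time) + 1
        return self.time

node_a, node_b = LamportClock(), LamportClock()
t_publish = node_a.tick()           # node A stamps an outgoing message
t_receive = node_b.observe(t_publish)
assert t_receive > t_publish        # holds even if B's wall clock is "behind" A's
```

Unlike NTP-synchronized timestamps, this guarantee does not degrade when a node’s system clock jumps backwards.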
Can you remove and add cluster nodes at runtime?
Many MQTT projects start with a small number of MQTT devices and scale up over time. So, depending on your project, you may want to add and remove cluster nodes at runtime to meet traffic spikes for MQTT connections or message throughput. In high availability scenarios, broker hardware can be upgraded at runtime by removing a node, upgrading the server, and adding the node back to the broker cluster. Even in use cases where the number of connections and the message throughput are rather static, being able to replace faulty hardware at any point in time by removing and adding broker nodes without sacrificing cluster uptime is still a useful feature.
What consistency guarantees does the broker give when it acknowledges messages?
MQTT uses acknowledgement messages for responding to CONNECT, SUBSCRIBE and UNSUBSCRIBE messages. When a broker acknowledges such a message, it indicates that the request and the resulting state modification on the broker were successful (or unsuccessful). The same principle is also desired for MQTT broker clusters.
Imagine a case where an MQTT broker node acknowledges a SUBSCRIBE message to the client but does not notify all other cluster nodes of the new subscription before sending out the acknowledgement. The MQTT client that just subscribed would then miss messages published via other nodes for an arbitrary amount of time (until all cluster nodes are aware of the new subscription). If you depend on messages being delivered to your client, make sure the consistency guarantees in the cluster match what your use case requires.
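The safe ordering is: replicate first, acknowledge second. This toy sketch (our own names, not a broker API) shows the invariant a cluster should uphold before sending a SUBACK:

```python
class Node:
    """Toy broker node holding its local subscription table."""
    def __init__(self):
        self.subscriptions = {}  # topic filter -> set of client ids

    def replicate_subscription(self, client_id, topic_filter):
        self.subscriptions.setdefault(topic_filter, set()).add(client_id)
        return True  # replication acknowledged by this node

def handle_subscribe(local, peers, client_id, topic_filter):
    """Only send the SUBACK once every cluster node has stored the
    subscription -- otherwise other nodes may drop matching messages."""
    local.replicate_subscription(client_id, topic_filter)
    if all(p.replicate_subscription(client_id, topic_filter) for p in peers):
        return "SUBACK"
    raise RuntimeError("replication incomplete -- withhold the acknowledgement")

nodes = [Node() for _ in range(3)]
assert handle_subscribe(nodes[0], nodes[1:], "client-1", "sensors/+/temp") == "SUBACK"
# At this point every node can route matching messages to the new subscriber.
```

A broker that acknowledges before replication completes is choosing lower SUBSCRIBE latency over the consistency guarantee; whether that tradeoff is acceptable depends on your use case.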
Is client session data preserved if a node is unavailable?
MQTT sessions typically consist of the following information:
- Session meta information
- Queued Messages
- QoS 1 and 2 message flow state
If session information is not distributed or replicated across the cluster, you are going to lose state if a broker node crashes or is unavailable for any other reason. For persistent sessions this means that in the best case you lose session information and subscriptions, and in the worst case you lose queued messages or messages in the message flow – even if you declared them with QoS 1 or 2!
If your applications can’t tolerate losing messages or persistent sessions, make sure your broker supports replicating them in the cluster.
Does the broker cluster survive without human interaction if one or more cluster nodes crash?
In worst case scenarios where one or more broker nodes crash, a cluster needs to transition to a stable state as quickly as possible in order to avoid cascading failures. If human interaction (e.g. reconfiguration) is required to regain a stable cluster state, this may result in long outages and complex failures. Sophisticated broker implementations don’t need any human interaction and come with self-healing mechanisms.
Are elastic node discovery methods supported?
Depending on the deployment environment, different broker discovery methods need to be supported. To form a broker cluster, all MQTT broker nodes need a mechanism to find each other, so-called node discovery. The most popular discovery methods that brokers implement are:
- Static Discovery with pre-defined cluster nodes
- UDP Multicast
For use cases that don’t require elastic scaling, static discovery might be well suited. For elastic deployments that need to scale over time, UDP multicast is a good fit. Unfortunately, most IaaS providers like AWS don’t support multicast in their networks. If you plan to deploy your MQTT broker to cloud environments, you might need to rely on other discovery methods. So check whether the discovery methods your broker supports are the ones your project needs.
Are rolling upgrades supported?
Downtime must be avoided at all costs for most MQTT deployments. One of the advantages of sophisticated broker clusters is that you can add and remove nodes at runtime. But this does not necessarily mean that you can also update the cluster while old versions are still running inside it. Even if you don’t plan to upgrade the MQTT broker version often, make sure there is an upgrade path that doesn’t require bringing down the whole cluster; otherwise you can lose state and MQTT messages, and all your MQTT clients will disconnect.
Are client takeovers supported?
MQTT client takeovers are an important MQTT feature for improving the resilience of your MQTT applications. In a nutshell, a client takeover means that if a second client connects with the same client identifier, the MQTT broker will disconnect the already connected client. This is very important to mitigate the half-open sockets problem for MQTT clients. But what happens if the first client is connected to one node and the second client connects to another node? If the broker cluster allowed a second client with the same client identifier to join, inconsistent session state (which could result in message reordering and even message loss) would become a problem.
If you plan to utilize MQTT clients that are connected via mobile networks or any other kind of unreliable network, you may want your broker to support client takeovers.
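For a takeover to work cluster-wide, the broker needs a cluster-wide view of which node currently holds each client identifier. This toy registry (illustrative names only) shows the expected behavior when the second connection lands on a different node:

```python
class Cluster:
    """Toy cluster-wide registry of connected client identifiers."""
    def __init__(self):
        self.connected = {}  # client id -> (node, connection)

    def register(self, client_id, node, conn):
        old = self.connected.get(client_id)
        if old is not None:
            old_node, old_conn = old
            # Takeover: drop the previous connection, whichever node holds it,
            # which also cleans up half-open sockets left by flaky networks.
            old_node.disconnect(old_conn)
        self.connected[client_id] = (node, conn)

class Node:
    def __init__(self, name):
        self.name = name
        self.open = set()  # connections this node currently serves

    def connect(self, cluster, client_id, conn):
        self.open.add(conn)
        cluster.register(client_id, self, conn)

    def disconnect(self, conn):
        self.open.discard(conn)

cluster = Cluster()
node_a, node_b = Node("A"), Node("B")
node_a.connect(cluster, "device-1", "conn-1")  # first connection on node A
node_b.connect(cluster, "device-1", "conn-2")  # same client id arrives on node B
print("conn-1" in node_a.open)  # False -- the stale connection was taken over
print("conn-2" in node_b.open)  # True  -- exactly one live session remains
```

A cluster without this cross-node registry would happily keep both connections alive, with the session state inconsistencies described above.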
Unexpected situations and maintenance happen in production. Some MQTT brokers can form broker clusters to solve the single point of failure problem in pub/sub architectures. Unfortunately, not all cluster implementations are created equal, which can lead to negative surprises in error and edge cases. The questions we discussed in this blog post are the minimum you should check in your MQTT evaluation to decide whether an MQTT broker is the right choice for you. As always, there is no ultimate answer for every single deployment, so you need to decide on your own what is important for your MQTT scenario.
If you are interested in how HiveMQ solves the challenges we discussed in the blog post, get in touch!
What additional questions do you think we should add to this list? Let us know in the comments!