MQTT Tutorial for Arduino and ESP8266

In this article you will learn what MQTT is and how this messaging protocol works.

This tutorial covers the following parts:

Sequence of MQTT
Message Protocol
Message Formats
Security of the MQTT Protocol

MQTT stands for Message Queuing Telemetry Transport and was invented by Andy Stanford-Clark of IBM and Arlen Nipper of Cirrus Link in 1999. Over the last few years the “Internet of Things” (IoT) has become very popular. The core of the IoT is the communication and interaction of physical devices, either directly or over the internet, which requires a machine-to-machine (M2M) communication protocol. MQTT is such a protocol: a lightweight messaging transport that is based on publish/subscribe messaging and works on top of TCP/IP.

Because it is so lightweight, the protocol is suitable for small devices like the Arduino, NodeMCU or Raspberry Pi. I personally use MQTT to send data from my weather stations, built with a NodeMCU, to my Raspberry Pi, which is the central control unit of my smart home. There are also industry applications based on MQTT. For example, Facebook Messenger is based on MQTT.

The payload sent via the MQTT protocol is a sequence of bytes, typically plain text, without any metadata. A value such as 21.5 therefore arrives without its unit, and the subscriber has to add it. The minimal message length is 2 bytes and the maximal message length is 256 megabytes.

MQTT: How it works

As mentioned, MQTT is based on a publish and subscribe pattern. Therefore a message broker, often called the server, is needed to manage the connection between the publisher and the subscriber. A network is not limited to a single broker. Publishers and subscribers are also called clients.

Information is organized in a hierarchy of topics. This means that every message is published under a topic.

In total, three different roles take part in an MQTT exchange:

Publisher
The publisher sends information to the broker. In a smart home use case a weather station would be a publisher because it sends temperature information. One advantage of the MQTT protocol is that a publisher does not need to know how many subscribers there are or how they are connected.
Subscriber
The subscriber gets information from the broker. A laptop with a dashboard that plots the temperature from the weather station would be a subscriber. Like the publisher, the subscriber does not need any information about the publisher's connection.
Broker
The broker receives every published message and forwards it to the subscribers of its topic. The machine that runs the broker can also be a publisher or subscriber at the same time. I use a Raspberry Pi as server, which at the same time subscribes to all publishers to show me a dashboard of the current smart home status.
If a message is published with the RETAIN flag set, the broker saves it as the last message of its topic even if there is no subscriber for the topic. A new client subscribing to that topic then gets the retained message immediately instead of waiting for the next time a publisher sends data to the broker.
There are different kinds of brokers. We distinguish between self-hosted brokers like Mosquitto or HiveMQ and cloud-based brokers like those from IBM or Microsoft (Azure).

Sequence of MQTT

Publisher and subscriber connect to the broker.
When the publisher has new data to distribute, it sends a message including the topic and the data to the broker.
The broker distributes the incoming data to all clients which are subscribed to the topic.
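
To make the sequence above more tangible, here is a minimal sketch of the publish/subscribe pattern in Python. It is not a real MQTT broker: there is no network, topics are matched exactly, and the class name `ToyBroker` and the topic `home/garden/temperature` are made up for this illustration.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker (hypothetical, no networking)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks
        self.retained = {}                    # topic -> last retained payload

    def subscribe(self, topic, callback):
        """Register a callback and deliver a retained message, if there is one."""
        self.subscribers[topic].append(callback)
        if topic in self.retained:
            callback(topic, self.retained[topic])

    def publish(self, topic, payload, retain=False):
        """Forward the payload to every client subscribed to the topic."""
        if retain:
            self.retained[topic] = payload
        for callback in self.subscribers[topic]:
            callback(topic, payload)


broker = ToyBroker()
received = []

# Step 1: the subscriber (the dashboard) connects and subscribes.
broker.subscribe("home/garden/temperature", lambda t, p: received.append((t, p)))
# Step 2: the publisher (the weather station) sends topic and data to the broker.
broker.publish("home/garden/temperature", "21.5", retain=True)
# Step 3 happened inside publish(): the broker forwarded the message.
print(received)  # [('home/garden/temperature', '21.5')]

# A client that subscribes later still gets the retained value right away.
late = []
broker.subscribe("home/garden/temperature", lambda t, p: late.append(p))
print(late)  # ['21.5']
```

Note that neither the publisher nor the subscriber knows about the other one. Both only talk to the broker.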

MQTT Message Protocol


Example layout of a SUBSCRIBE message (Message Type = 8):

Bit	7	6	5	4	3	2	1	0
Byte 1	Message Type = 8				DUP	QoS Level		-	MQTT fixed header
Byte 2	Remaining Length								MQTT fixed header
Byte 3	Message ID (MSB)								MQTT variable header
Byte 4	Message ID (LSB)								MQTT variable header
Byte 5	Topic Name String Length (MSB)								List of topics
Byte 6	Topic Name String Length (LSB)
Byte 7	Topic Name
...	Topic Name
Byte n	Reserved (not used)						QoS Level

The MQTT protocol works by exchanging a series of MQTT Control Packets. An MQTT Control Packet consists of up to three parts, always in the following order:

Fixed header, present in all MQTT Control Packets
Variable header, present in some MQTT Control Packets
Payload, present in some MQTT Control Packets

Fixed header

The fixed header starts with one byte that is split in two parts, followed by the Remaining Length field. Bits 7 to 4 contain the MQTT Control Packet type and bits 3 to 0 contain flags specific to each MQTT Control Packet type. The following legend and table show the 14 different MQTT Control Packet types and their corresponding flags.

C = Client
S = Server
DUP = Duplicate message flag. Indicates to the receiver that this message may have already been received.
- = 1: Client or server re-delivers a PUBLISH, PUBREL, SUBSCRIBE or UNSUBSCRIBE message.
QoS = PUBLISH Quality of Service
- = 0: At-most-once delivery (fire and forget): There is no guarantee that the message is delivered. MQTT depends on the delivery guarantees of the underlying network (TCP/IP).
- = 1: At-least-once delivery: Messages are guaranteed to arrive, but there may be duplicates.
- = 2: Exactly-once delivery: This is the highest level. It also incurs the most overhead in terms of control messages and the need to store messages locally.
RETAIN = 1: Instructs the server to retain the last received PUBLISH message and deliver it as the first message to new subscriptions.


Name	Value	Direction of flow	Description	Fixed header flags	Bit 0	Bit 1	Bit 2	Bit 3
Reserved	0	Forbidden	Reserved
CONNECT	1	C → S	Client request to connect to Server	Reserved	0	0	0	0
CONNACK	2	S → C	Connect acknowledgment	Reserved	0	0	0	0
PUBLISH	3	C → S, S → C	Publish message	Used	RETAIN	QoS	QoS	DUP
PUBACK	4	C → S, S → C	Publish acknowledgment	Reserved	0	0	0	0
PUBREC	5	C → S, S → C	Publish received (assured delivery part 1)	Reserved	0	0	0	0
PUBREL	6	C → S, S → C	Publish release (assured delivery part 2)	Reserved	0	1	0	0
PUBCOMP	7	C → S, S → C	Publish complete (assured delivery part 3)	Reserved	0	0	0	0
SUBSCRIBE	8	C → S	Client subscribe request	Reserved	0	1	0	0
SUBACK	9	S → C	Subscribe acknowledgment	Reserved	0	0	0	0
UNSUBSCRIBE	10	C → S	Unsubscribe request	Reserved	0	1	0	0
UNSUBACK	11	S → C	Unsubscribe acknowledgment	Reserved	0	0	0	0
PINGREQ	12	C → S	PING request	Reserved	0	0	0	0
PINGRESP	13	S → C	PING response	Reserved	0	0	0	0
DISCONNECT	14	C → S	Client is disconnecting	Reserved	0	0	0	0
Reserved	15	Forbidden	Reserved
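
The bit layout of the first header byte can be tried out with a few lines of Python. This is only a sketch: the function names `pack_first_byte` and `unpack_first_byte` are my own, and the DUP, QoS and RETAIN interpretation of the lower four bits is only meaningful for PUBLISH packets.

```python
PACKET_TYPES = {
    1: "CONNECT", 2: "CONNACK", 3: "PUBLISH", 4: "PUBACK", 5: "PUBREC",
    6: "PUBREL", 7: "PUBCOMP", 8: "SUBSCRIBE", 9: "SUBACK", 10: "UNSUBSCRIBE",
    11: "UNSUBACK", 12: "PINGREQ", 13: "PINGRESP", 14: "DISCONNECT",
}


def pack_first_byte(packet_type, dup=False, qos=0, retain=False):
    """Put the packet type into bits 7-4 and the flags into bits 3-0."""
    if packet_type not in PACKET_TYPES:
        raise ValueError("packet type must be between 1 and 14")
    if qos not in (0, 1, 2):
        raise ValueError("QoS must be 0, 1 or 2")
    return (packet_type << 4) | (int(dup) << 3) | (qos << 1) | int(retain)


def unpack_first_byte(byte):
    """Split the first header byte back into packet type and flags."""
    return {
        "type": PACKET_TYPES.get(byte >> 4, "Reserved"),
        "dup": bool(byte & 0x08),      # bit 3
        "qos": (byte >> 1) & 0x03,     # bits 2 and 1
        "retain": bool(byte & 0x01),   # bit 0
    }


print(hex(pack_first_byte(3, qos=1, retain=True)))  # 0x33
print(unpack_first_byte(0x82))
# {'type': 'SUBSCRIBE', 'dup': False, 'qos': 1, 'retain': False}
```

The second call shows why SUBSCRIBE has bit 1 set in the table: its fixed flag pattern reads like a QoS level of 1.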

The following message sequence chart shows the CONNECT and SUBSCRIBE setup

Remaining Length

The Remaining Length is the number of bytes remaining within the current packet, including data in the variable header and the payload. The Remaining Length does not include the bytes used to encode the Remaining Length. The Remaining Length is encoded using a variable length encoding scheme which uses a single byte for values up to 127. Larger values are handled as follows.

The least significant seven bits of each byte encode the data, and the most significant bit is used to indicate that there are following bytes in the representation. Thus each byte encodes 128 values and a “continuation bit”. The maximum number of bytes in the Remaining Length field is four.

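The equation to calculate the Remaining Length is the following: a*128^0 + b*128^1 + c*128^2 + d*128^3, where a, b, c and d are the lower seven bits of the first, second, third and fourth length byte.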

Example: Remaining Length = 364 (CBn is the continuation bit of byte n)

= 2*128^1 -> b=2, CB1=0
+ 108*128^0 -> a=108, CB0=1

On the wire these are the two bytes 0xEC, 0x02.


Bit	CB	0	1	2	3	4	5	6	DEC
Byte 0	1	0	0	1	1	0	1	1	108
Byte 1	0	0	1	0	0	0	0	0	2

Example: Remaining Length = 25897

= 1*128^2 -> c=1, CB2=0
+ 74*128^1 -> b=74, CB1=1
+ 41*128^0 -> a=41, CB0=1

On the wire these are the three bytes 0xA9, 0xCA, 0x01.


Bit	CB	0	1	2	3	4	5	6	DEC
Byte 0	1	1	0	0	1	0	1	0	41
Byte 1	1	0	1	0	1	0	0	1	74
Byte 2	0	1	0	0	0	0	0	0	1

The Remaining Length field occupies between one and four bytes. The following table shows the value range for each field size.


Bytes	From	To
1	0 (0x00)	127 (0x7F)
2	128 (0x80, 0x01)	16 383 (0xFF, 0x7F)
3	16 384 (0x80, 0x80, 0x01)	2 097 151 (0xFF, 0xFF, 0x7F)
4	2 097 152 (0x80, 0x80, 0x80, 0x01)	268 435 455 (0xFF, 0xFF, 0xFF, 0x7F)
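
The encoding scheme described in this section can be written down in a few lines of Python. The following is a sketch under the rules stated above (seven data bits per byte, continuation bit in the most significant bit, at most four bytes). The function names are my own choice.

```python
def encode_remaining_length(length):
    """Encode a Remaining Length value into one to four bytes."""
    if not 0 <= length <= 268_435_455:
        raise ValueError("Remaining Length must fit into four bytes")
    encoded = bytearray()
    while True:
        length, digit = divmod(length, 128)  # digit = lowest seven bits
        if length > 0:
            digit |= 0x80                    # set the continuation bit
        encoded.append(digit)
        if length == 0:
            return bytes(encoded)


def decode_remaining_length(data):
    """Decode a Remaining Length and return (value, number of bytes used)."""
    value = 0
    for index, byte in enumerate(data[:4]):
        value += (byte & 0x7F) * 128 ** index
        if not byte & 0x80:                  # no continuation bit: last byte
            return value, index + 1
    raise ValueError("length field is truncated or longer than four bytes")


print(encode_remaining_length(364).hex())          # ec02
print(encode_remaining_length(25897).hex())        # a9ca01
print(decode_remaining_length(b"\xa9\xca\x01"))    # (25897, 3)
```

The two printed byte sequences correspond to the examples with the values 364 and 25897.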

Variable Header

Some types of MQTT Control Packets contain a variable header component. It resides between the fixed header and the payload. The content of the variable header varies depending on the Packet type. The Packet Identifier field of variable header is common in several packet types.

Packet Identifier Bytes


Bit	7	6	5	4	3	2	1	0
Byte 1	Packet Identifier MSB (most significant byte)
Byte 2	Packet Identifier LSB (least significant byte)

The following table shows which packet types use a Packet Identifier.


Control Packet	Packet Identifier field
CONNECT	No
CONNACK	No
PUBLISH	Yes (if QoS > 0)
PUBACK	Yes
PUBREC	Yes
PUBREL	Yes
PUBCOMP	Yes
SUBSCRIBE	Yes
SUBACK	Yes
UNSUBSCRIBE	Yes
UNSUBACK	Yes
PINGREQ	No
PINGRESP	No
DISCONNECT	No

SUBSCRIBE, UNSUBSCRIBE, and PUBLISH (in cases where QoS > 0) Control Packets MUST contain a non-zero 16-bit Packet Identifier. Each time a Client sends a new packet of one of these types it MUST assign it a currently unused Packet Identifier.

If a Client re-sends a particular Control Packet, then it MUST use the same Packet Identifier in subsequent re-sends of that packet.

The Packet Identifier becomes available for reuse after the Client has processed the corresponding acknowledgment packet. The following table shows the corresponding acknowledgment packet for the packet types.


Packet type	Acknowledgment Packet
PUBLISH (QoS = 1)	PUBACK
PUBLISH (QoS = 2)	PUBCOMP
SUBSCRIBE	SUBACK
UNSUBSCRIBE	UNSUBACK

A PUBLISH Packet MUST NOT contain a Packet Identifier if its QoS value is set to 0.
A PUBACK, PUBREC or PUBREL Packet MUST contain the same Packet Identifier as the PUBLISH Packet that was originally sent. Similarly SUBACK and UNSUBACK MUST contain the Packet Identifier that was used in the corresponding SUBSCRIBE and UNSUBSCRIBE Packet.
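
One possible way for a client to follow these rules is a small pool that remembers which Packet Identifiers are still waiting for their acknowledgment. The class `PacketIdPool` below is a hypothetical sketch of that idea, not code from any MQTT library.

```python
class PacketIdPool:
    """Hands out non-zero 16-bit Packet Identifiers that are currently unused."""

    def __init__(self):
        self.in_flight = set()  # identifiers still waiting for an acknowledgment
        self.next_id = 1        # 0 is never used

    def acquire(self):
        """Return an unused identifier for a new SUBSCRIBE, UNSUBSCRIBE or PUBLISH (QoS > 0)."""
        if len(self.in_flight) >= 0xFFFF:
            raise RuntimeError("all 65535 Packet Identifiers are in use")
        while self.next_id in self.in_flight:
            self.next_id = self.next_id % 0xFFFF + 1  # wrap from 65535 to 1
        packet_id = self.next_id
        self.in_flight.add(packet_id)
        self.next_id = packet_id % 0xFFFF + 1
        return packet_id

    def release(self, packet_id):
        """Call this after the matching PUBACK, PUBCOMP, SUBACK or UNSUBACK was processed."""
        self.in_flight.discard(packet_id)


pool = PacketIdPool()
first = pool.acquire()   # used for a SUBSCRIBE, for example
second = pool.acquire()  # used for a PUBLISH with QoS 1
pool.release(first)      # the SUBACK arrived, so the identifier is free again
print(first, second, sorted(pool.in_flight))  # 1 2 [2]
```

A re-sent packet simply keeps the identifier it got from `acquire()`, because the identifier is only released after the acknowledgment.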

Payload

Some MQTT Control Packets contain a payload as the final part of the packet. In the case of the PUBLISH packet this is the Application Message. The following table shows the Control Packets that contain a Payload.


Control Packet	Payload
CONNECT	Required
CONNACK	None
PUBLISH	Optional
PUBACK	None
PUBREC	None
PUBREL	None
PUBCOMP	None
SUBSCRIBE	Required
SUBACK	Required
UNSUBSCRIBE	Required
UNSUBACK	None
PINGREQ	None
PINGRESP	None
DISCONNECT	None

MQTT Message Formats

CONNECT

The CONNECT message carries a lot of session-related information, much of it in optional fields.


Bit	7	6	5	4	3	2	1	0
Byte 1	Message Type = 1				-	-		-	MQTT fixed header
Byte 2	Remaining Length								MQTT fixed header
Byte 3	Protocol name UTF-8 encoded (e.g. MQIsdp) prefixed with 2 bytes string length (MSB first)								MQTT variable header
...
Byte n	Protocol version (value 0x03 for MQTT version 3.1)
Byte n+1	Username Flag	Password Flag	Will Retain	Will QoS		Will Flag	Clean Session	Reserved
Byte n+2	Keep Alive Timer MSB
Byte n+3	Keep Alive Timer LSB
Byte n+4	Client Identifier								Payload
...	Will Topic
...	Will Message
...	Username
Byte m	Password

Protocol Name: UTF-8 encoded protocol name string, “MQIsdp” for MQTT version 3.1 (version 3.1.1 uses “MQTT”).
Protocol Version: Value 3 for MQTT version 3.1 (value 4 for version 3.1.1).
Username Flag: If set to 1, the payload contains a username.
Password Flag: If set to 1, the payload contains a password. A password can only be sent together with a username: if the username flag is 0, the password flag must be 0 as well.
Will Retain: If set to 1, the server retains the Will message when it publishes it, which happens when the client disconnects unexpectedly.
Will QoS: Specifies the QoS level for the Will message.
Will Flag: Indicates that the payload contains a Will topic and a Will message and that the Will Retain and Will QoS flags apply.
Clean Session: If set to 1, the server discards any previous information about the (re)connecting client (clean new session). If set to 0, the server keeps the subscriptions of a disconnecting client and stores QoS level 1 and 2 messages for this client. When the client reconnects, the server publishes the stored messages to the client.
Keep Alive Timer: Maximum time in seconds that may pass between two messages from the client. The server uses it to detect broken connections to the client.
Client Identifier: The client identifier (between 1 and 23 characters) uniquely identifies the client to the server. The client identifier must be unique across all clients connecting to a server.
Will Topic: Topic to which the Will message is published if the Will flag is set.
Will Message: Message that is published if the Will flag is set.
Username and Password: Username and password if the corresponding flags are set.
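
To see how these fields end up as bytes, here is a minimal sketch that builds a CONNECT message by hand. Some assumptions: the function name `build_connect` is my own, the defaults use the protocol name “MQTT” with version value 4 (MQTT 3.1.1), the Will fields are left out to keep it short, and only messages with a Remaining Length of up to 127 bytes are supported. For version 3 you would pass `protocol_name="MQIsdp"` and `protocol_version=3`.

```python
import struct


def utf8_field(text):
    """UTF-8 string prefixed with its 2-byte length (MSB first)."""
    data = text.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def build_connect(client_id, keep_alive=60, clean_session=True,
                  username=None, password=None,
                  protocol_name="MQTT", protocol_version=4):
    """Build a CONNECT message without Will fields (sketch, short messages only)."""
    if password is not None and username is None:
        raise ValueError("a password requires a username")
    flags = 0x02 if clean_session else 0x00   # bit 1: Clean Session
    payload = utf8_field(client_id)
    if username is not None:
        flags |= 0x80                         # bit 7: Username Flag
        payload += utf8_field(username)
    if password is not None:
        flags |= 0x40                         # bit 6: Password Flag
        payload += utf8_field(password)
    variable_header = (utf8_field(protocol_name)
                       + bytes([protocol_version, flags])
                       + struct.pack("!H", keep_alive))
    body = variable_header + payload
    if len(body) > 127:
        raise ValueError("this sketch only supports a one-byte Remaining Length")
    return bytes([0x10, len(body)]) + body    # 0x10 = Message Type 1, flags 0


packet = build_connect("weather-1")
print(packet.hex())
# 101500044d5154540402003c0009776561746865722d31
```

In the hex dump, `10` is the message type, `15` the Remaining Length (21 bytes), `00044d515454` the protocol name, `04` the version, `02` the connect flags and `003c` the keep alive time of 60 seconds. The rest is the client identifier.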

CONNACK

Reserved: Reserved field for future use.
Connect Return Code:
- 0: Connection Accepted
- 1: Connection Refused, reason = unacceptable protocol version
- 2: Connection Refused, reason = identifier rejected
- 3: Connection Refused, reason = server unavailable
- 4: Connection Refused, reason = bad user name or password
- 5: Connection Refused, reason = not authorized
- 6-255: Reserved for future use
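
A client can map the return code to a readable message with a small lookup table. The sketch below assumes the usual CONNACK layout of four bytes (message type 2, a Remaining Length of 2, the reserved byte and the return code). The function name `parse_connack` is made up for this example.

```python
CONNECT_RETURN_CODES = {
    0: "Connection Accepted",
    1: "Connection Refused: unacceptable protocol version",
    2: "Connection Refused: identifier rejected",
    3: "Connection Refused: server unavailable",
    4: "Connection Refused: bad user name or password",
    5: "Connection Refused: not authorized",
}


def parse_connack(packet):
    """Return (accepted, description) for a 4-byte CONNACK message."""
    if len(packet) != 4 or packet[0] != 0x20 or packet[1] != 0x02:
        raise ValueError("not a CONNACK message")
    code = packet[3]  # packet[2] is the reserved byte
    return code == 0, CONNECT_RETURN_CODES.get(code, "Reserved for future use")


print(parse_connack(bytes([0x20, 0x02, 0x00, 0x00])))
# (True, 'Connection Accepted')
print(parse_connack(bytes([0x20, 0x02, 0x00, 0x04])))
# (False, 'Connection Refused: bad user name or password')
```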

PUBLISH


Bit	7	6	5	4	3	2	1	0
Byte 1	Message Type = 3				DUP	QoS Level		RETAIN	MQTT fixed header
Byte 2	Remaining Length								MQTT fixed header
Byte 3	Topic Name String Length (MSB)								MQTT variable header
Byte 4	Topic Name String Length (LSB)
Byte 5	Topic Name
...
Byte n
Byte n+1	Message ID (MSB)
Byte n+2	Message ID (LSB)
Byte n+3	Publish Message								Payload
Byte m	Publish Message								Payload

Topic Name with Topic Name String Length: Name of the topic to which the message is published. The first 2 bytes of the topic name field indicate the topic name string length.
Message ID: The message ID is only present if the QoS level is 1 (at-least-once delivery, acknowledged delivery) or 2 (exactly-once delivery).
Publish Message: Message as an array of bytes. The structure of the published message is application-specific.
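
The PUBLISH layout can be reproduced with a short Python sketch. The function `build_publish` and the topic `a/b` are only illustrations. The helper for the Remaining Length uses the variable length encoding with seven data bits per byte.

```python
import struct


def encode_remaining_length(length):
    """Variable length encoding: 7 data bits per byte plus a continuation bit."""
    encoded = bytearray()
    while True:
        length, digit = divmod(length, 128)
        encoded.append(digit | 0x80 if length else digit)
        if length == 0:
            return bytes(encoded)


def build_publish(topic, payload, qos=0, retain=False, dup=False, message_id=None):
    """Build a PUBLISH message; payload must be bytes."""
    if qos not in (0, 1, 2):
        raise ValueError("QoS must be 0, 1 or 2")
    if qos > 0 and not message_id:
        raise ValueError("QoS 1 and 2 need a non-zero message ID")
    if qos == 0 and message_id is not None:
        raise ValueError("QoS 0 messages carry no message ID")
    topic_bytes = topic.encode("utf-8")
    body = struct.pack("!H", len(topic_bytes)) + topic_bytes
    if qos > 0:
        body += struct.pack("!H", message_id)  # only present for QoS 1 and 2
    body += payload
    first_byte = 0x30 | (int(dup) << 3) | (qos << 1) | int(retain)
    return bytes([first_byte]) + encode_remaining_length(len(body)) + body


print(build_publish("a/b", b"21.5").hex())
# 30090003612f6232312e35
print(build_publish("a/b", b"21.5", qos=1, retain=True, message_id=10).hex())
# 330b0003612f62000a32312e35
```

In the second message the first byte changes from `30` to `33` (QoS 1 and RETAIN set) and the message ID `000a` appears between the topic name and the payload.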


SUBSCRIBE


Bit	7	6	5	4	3	2	1	0
Byte 1	Message Type = 8				DUP	QoS Level		-	MQTT fixed header
Byte 2	Remaining Length								MQTT fixed header
Byte 3	Message ID (MSB)								MQTT variable header
Byte 4	Message ID (LSB)								MQTT variable header
Byte 5	Topic Name String Length (MSB)								List of topics
Byte 6	Topic Name String Length (LSB)
Byte 7	Topic Name
...	Topic Name
Byte n	Reserved (not in use)						QoS Level

Message ID: The message ID field is used for acknowledgment of the SUBSCRIBE message since these have a QoS level of 1.
Topic Name with Topic Name String Length: Name of topic to which the client subscribes. The first 2 bytes of the topic name field indicate the topic name string length. Topic name strings can contain wildcard characters. Multiple topic names along with their requested QoS level may appear in a SUBSCRIBE message.
QoS Level: QoS level at which the client wants to receive messages from the given topics.
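
A SUBSCRIBE message with one or more topics could be assembled like this. Again this is only a sketch: `build_subscribe` is a name I made up, and to keep it short the function only handles messages with a Remaining Length of up to 127 bytes.

```python
import struct


def build_subscribe(message_id, topics):
    """Build a SUBSCRIBE message from a list of (topic name, requested QoS) pairs."""
    if not 1 <= message_id <= 0xFFFF:
        raise ValueError("the message ID must be a non-zero 16-bit value")
    if not topics:
        raise ValueError("at least one topic is required")
    body = struct.pack("!H", message_id)
    for topic, qos in topics:
        if qos not in (0, 1, 2):
            raise ValueError("QoS must be 0, 1 or 2")
        topic_bytes = topic.encode("utf-8")
        # 2-byte string length, topic name, then one byte with the requested QoS
        body += struct.pack("!H", len(topic_bytes)) + topic_bytes + bytes([qos])
    if len(body) > 127:
        raise ValueError("this sketch only supports a one-byte Remaining Length")
    return bytes([0x82, len(body)]) + body  # 0x82 = Message Type 8 with QoS 1


print(build_subscribe(1, [("a/b", 1)]).hex())
# 820800010003612f6201
```

Each additional topic simply appends another block of string length, topic name and QoS byte to the list of topics.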

If you are interested in the full message format description, you can find it in the official MQTT specification.

MQTT security (example of Mosquitto as broker)

As with every other connection between devices, the level of security you need depends on your use case. I would recommend at least a basic level of security because the effort it takes is nearly zero. Our objective is to protect the data that is transferred between publisher, broker and subscriber.

In MQTT the security mechanisms are initiated by the broker and applied by the clients. In total there are 3 ways for the broker to verify the identity of a client.

Identify a client via the client ID

Identification via the client ID relies on the fact that every MQTT client has to provide a client ID. When a client subscribes to one or more topics, the client ID is linked to those topics and to the TCP connection. With a persistent session, the broker remembers the client ID and the corresponding subscribed topics.

Username and password

Username and password authentication is the most widely used mechanism in MQTT connections because it is easy to implement. The broker requires a valid username and password from the client before a connection is permitted. The client transmits the username and password as plain text. If the username and the password are valid, the connection between client and server is established. If they are invalid, the server refuses the connection.

The downside is that the username and password are not protected in transit without additional transport encryption such as SSL/TLS. An additional benefit of this mechanism is that the username can also be used to control access to topics.

It is also possible to access the server as an anonymous client. Whether access is restricted therefore depends on two broker settings: the anonymous access option and the username and password file. The following table shows all combinations and whether access is restricted.


Anonymous access	Password file specified	Access restricted
True	No	No
True	Yes	If the client sends a username/password, it must be valid, otherwise an authentication error is returned. If the client does not send one, none is required and a normal connection results.
False	No	The client must send a username and password, but they are not checked. If the client does not send a username/password, an authentication error code is generated.
False	Yes	Yes

Certificates

Clients can also be authenticated by certificates. This is the most secure method of client authentication but also the most difficult because of the certificate management.

There are two main cryptographic protocols which you can use to secure your MQTT connection:

Transport Layer Security (TLS)
Secure Sockets Layer (SSL), the deprecated predecessor of TLS

Both provide a secure communication channel between a client and a server. A handshake mechanism is used to negotiate the parameters of the secure connection. After the handshake is complete, the communication between client and server is encrypted and an attacker can no longer read it by eavesdropping. There is a drawback to using MQTT over TLS: security comes at a cost in terms of CPU usage and communication overhead.
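
On the client side, the TLS setup could look like the following sketch, which only uses the `ssl` module from the Python standard library. The function name `make_tls_context`, the file name parameters and the host name `broker.example` in the comment are assumptions for illustration. Port 8883 is the port registered for MQTT over TLS.

```python
import ssl


def make_tls_context(ca_file=None, client_cert=None, client_key=None):
    """Create a TLS context that verifies the broker certificate.

    ca_file: CA certificate used to verify the broker (None = system default CAs).
    client_cert / client_key: optional client certificate for certificate-based login.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse old SSL/TLS versions
    if client_cert is not None:
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return context


context = make_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)  # True True

# Usage idea (not executed here, the host name is a placeholder):
#   import socket
#   with socket.create_connection(("broker.example", 8883)) as raw:
#       with context.wrap_socket(raw, server_hostname="broker.example") as tls:
#           tls.sendall(connect_packet)  # the MQTT bytes now travel encrypted
```

With such a context the username and password inside the CONNECT message are no longer readable on the network, because the whole MQTT traffic is sent through the encrypted channel.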

alex9ufo: The wise are eager for knowledge

Friday, July 3, 2020