Simplest way to send short messages using TCP/IP - embedded

I typically use hardwired serial port connections between embedded devices for custom command/response/status protocols.
In this application I plan to use the microchip TCP/IP stack and Wi-Fi module with no OS to exchange short (<= 100 byte) commands and responses.
The stack is up and running on a microchip ethernet development kit and I am able to ping it from my desktop (not using the Wi-Fi module just yet).
I suppose I could hack into ping (microchip provides the c source for the stack) and add the messages I need to it, but I am looking for the correct/simplest/best method.

Correct / simplest / best aren't necessarily the same thing. But if I were you, I would consider using UDP instead of TCP.
UDP is a datagram protocol; TCP is stream-oriented and has a lot more overhead (and benefits that come with the overhead). But UDP more closely matches the current serial port byte-oriented (packet-oriented) approach you have today.
Chances are you have some higher-level protocol that receives/buffers/checksums/delimits/parses the data stream you receive from the UART. If you use UDP, you can mimic this nicely with a lean, lightweight UDP implementation. With UDP you just shoot out the bytes (packets) & cross your fingers that they got to the other end (a lot like serial).
TCP is heavier-weight connection-based protocol with built-in resending, acknowledgments, in-order delivery, timers, back-off algorithms, etc. On most embedded systems I've worked with using TCP (several different stacks), UDP is lighter-weight & outperforms TCP in terms of code size & throughput.
Also don't forget that TCP is used as the backbone of the internet; some packets pass through a dozen or more hops (routers/gateways) en route to the final destination. Lots of places for packets to get dropped, so TCP handles a lot of messy details transparently. I'd guess in your system/situation, we're talking about a LAN (everyone on the same wire) and transmission will be pretty reliable... thus the TCP overhead isn't really necessary.
There are times when the benefits of TCP justify the overhead, but from what you've written I think you should consider a basic UDP datagram set up. Just google "simple udp example" and you'll see the basic structure. For example, here is a simple UDP client/server example using just 43 lines (server) and 30 lines (client).

When you have a TCP/IP stack, it should provide a send() function to send some data messages.
Some small devices come only with a simple UDP/IP implementation, since this is a lot simpler. If you don't need the sequence control and reliability of TCP you could consider to use UDP to send short messages. This is much better to hack the ICMP messages.
But if you need the comfort and reliabilty of the TCP stream protocol, dont re-invent this on base of IUDP. Usually you can't do it better, more efficient and with less effort.

Related

Reliable transport protocol on top of UDP

UDP has one good feature - it is connectionless. But it has many bad features - packets can be lost, arrive multiple times, there is no packet sequence - packet 2 can arrive faster than 1. How to keep good and remove bad?. Is there any good implementations that provide reliable transport protocol on top of udp so that we are still conectionless but without mentioned problems. One example of what can be done with it is mosh.
What you describe as bad isn't really bad depending on the context.
For example UDP is used a lot in realtime streaming, delivery confirmation and resending is useless in this context.
That being said there are e few implementations that you might want to look at:
ENet (http://enet.bespin.org/)
RUDP (https://en.wikipedia.org/wiki/Reliable_User_Datagram_Protocol)
UDT (https://en.wikipedia.org/wiki/UDP-based_Data_Transfer_Protocol)
I work in embedded context:
CoAP (https://en.wikipedia.org/wiki/Constrained_Application_Protocol) also implements a lot of these features, so its worth a look.
What is your reason for not choosing TCP?

Why UDP does not employ the path MTU?

According to my knowledge, UDP does not use the path MTU to avoid fragmentation which however TCP does. I am trying to come up a reason to this particular design.
TCP needs to avoid fragmentation because it has to retransmit the whole datagram even if just one fragment is dropped. On the contrary, UDP does not need to retransmit datagram, it's up to the application layer to ensure the integrity.
In conclusion, fragment will slow the TCP transport protocol but not UDP transport protocol.
Here comes the problem, for communication that need integrity, whether employ TCP which natrually guarantee the integrity, or develop an application layer retransmit protocol upon UDP, it will need to resend the whole datagram if not ACKed. Then, fragmentation will slow down application layer retransmit protocol upon UDP just the same as TCP.
What's wrong with my reasoning?
UDP is a datagram protocol where each packet means a single entity, independent of the other packets (UDP does not detected duplication, reordering etc). TCP instead is a stream protocol, that is the whole transfer consists of a single unstructured octet stream similar to a large file. To make the transfer of this stream more efficient it makes sense to detect the MTU of the connection and try to send mostly packets which max out this MTU, thus reducing the overhead of the transfer. To further reduce the overhead TCP will merge multiple consecutive writes into as few packets (with max MTU) as possible.
UDP instead can not avoid fragmentation by itself because it transmits the datagram as it is, that is datagram boundary is determined by packet boundary. Any kinds of optimization to reduce overhead have to be done by the application itself.
Thus TCP is best suited for applications where its features like guaranteed and ordered delivery and efficient use of bandwidth are needed. Unfortunately these features come with drawbacks like comparable slow connection setup, higher latency (in case of packet loss) etc. But there are applications which don't need all the good parts but have to avoid the bad parts. For example real time audio and video can deal with packet loss but needs a low latency, i.e. it does not matter if all data arrive but they have to arrive fast. In these cases the more simple UDP protocol is better suited.
Because there's nothing useful it can do with the MTU. It's a datagram protocol.
Because there is no path. It's an connectionless protocol.

Chip to chip communication protocol over SPI

I'm trying to design an efficient communication protocol between a micro-controller on one side and an ARM processor on a multi-core TI chip on the other side through SPI.
The requirements for the needed protocol:
1 - Multi-session with queuing support, as I have multiple sending/receiving threads, so it will be more than one application using this communication protocol and I need the protocol to handle queuing these requests (I will keep holding the buffer if the transmission is queue but I just need the protocol to manage scheduling the queues).
2 - Works over SPI as an underlying protocol.
3 - Simple error checking.
In this thread: "Simple serial point-to-point communication protocol", PPP was a recommended option, however I see PPP does only part of the job.
I also found Light weight IP (LwIP) project featuring PPP over serial (which I assume that I can use it over SPI), so I thought about the possibility of utilizing any of the upper layers protocols like TCP/UDP to do the rest of the required jobs. Fortunately, I found TI including LwIP as part of their ethernet SW in the starterware package, which I assume to ease porting at least on the TI chip side.
So, my questions are:
1 - Is it valid to use LwIP for this communication scheme? Won't this introduce much overhead due to IP headers which are not necessary for a point to point (on the chip level) communication and kill the throughput?
2 - Will the TCP or any similar protocol residing in LwIP handle the queuing of transmission requests, for example if I request transmission through a socket while the communication channel is busy transmitting/receiving request for another socket (session) of another thread, will this be managed by the protocol stack? If so, which protocol layer manages it?
3 - Is their a more efficient protocol stack than LwIP, that meets the above requirements?
Update 1: More points to consider
1 - SPI is the only available option, I use it with available GPIOs to indicate to the master when the slave has data to send.
2 - The current implemented (non-standard) protocol uses DMA with SPI, and a message format of《STX_MsgID_length_payload_ETX》with a fixed message fragments length, however the main drawback of the current scheme is that the master waits for a response on the message (not fragment) before sending another one, which kills the throughput and does not utilise the full duplex nature of SPI.
3- An improvement to this point was to use a kind of mailbox for receiving fragments, so a long message can be interrupted by a higher priority one so that fragments of a single message can arrive non sequentially, but the problem is that this design lead to complicating things especially that I don't have much available resources for many buffers to use the mailbox approach on the controller (master) side. So I thought that it's like I'm re-inventing the wheel by designing a protocol stack for a simple point to point link which may not be efficient.
4- What kind of higher level protocols can be normally used above SPI to establish multiple sessions and solve the queuing/scheduling of messages?
Update 2: Another useful thread "A good serial communications protocol/stack for embedded devices?"
Update 3: I had a look at Modbus protocol, it seems to specify the application layer then directly the data link layer for serial line communication, which sounds to skip the unnecessary overhead of network oriented protocols layers.
Do you think this will be a better option than LwIP for the intended purpose? Also, is there a widely used open source implementation like LwIP but for Modbus?
I think that perhaps you are expecting too much of the humble SPI.
An SPI link is little more a pair of shift registers one in each node. The master selects a single node to connect to its SPI shift register. As it shifts in its data, the slave simultaneously shifts data out. Data is not exchanged unless the master explicitly clocks the data out. Efficient protocols on SPI involve the slave having something useful to output while the master inputs. This may be difficult to arrange, so you usually need a means of indicating null data.
PPP is useful when establishing a connection between two arbitrary endpoints, when the endpoints are fixed and known a priori, PPP would serve no purpose other than to complicate things unnecessarily.
SPI is not a very sophisticated nor flexible interface and probably unsuited to heavyweight general purpose protocols such as TCP/IP. Since "addressing" on SPI is performed by physical chip-select, the addressing inherent in such protocols is meaningless.
Flow control is also a problem with SPI. The master has no way of determining that the slave has copied the data from SPI the shift register before pushing more data. If your slave SPI supports DMA you would be wise to use it.
Either way I suggest that you develop something specific to your purpose. Since SPI is not a network as such, you only need a means to address threads on the selected node. This could be as simple as STX<thread ID><length><payload>ETX.
Added 27 September 2013 in response to comments
Generally SPI as its names suggests is used to connect to peripheral devices, and in that context the protocol is defined by the peripheral. EEPROMS for example typically use a common or at least compatible command interface across vendors, and SD/MMC card SPI interface uses a standardised command test and protocol.
Between two microcontrollers, I would imagine that most implementations are proprietary and application specific. Open protocols are designed for generic interoperability and to achieve that might impose significant unnecessary overhead for a closed system, unless perhaps the nodes were running a system that already had a network stack built in.
I would suggest that if you do want to use a generic network stack that you should abstract the SPI with device drivers at each end that give the SPI a standard I/O stream interface (open(), close(), read(), write() etc.), then you can use the higher-level PPP and TCP/IP protocols (although PPP can probably be avoided since the connection is permanent). However that would only be attractive if both nodes already supported these protocols (running Linux for example), otherwise it will be significant effort and code for little benefit, and would certainly not be "efficient".
I assume you dont really want or have room for a full ip (lwip) stack on the microcontroller? This just sounds like a lot of overkill. Why not just roll your own simple packet structure to move the data items you need to move. Depending on how spi is supported on both sides you may or may not be able to use it to define the frame for your data, if not a simple start pattern, length and a trailing checksum and maybe tail pattern would suffice for finding packet boundaries in the stream (no different than a serial/uart solution). You can even use the PPP solution for that with a start pattern and I think end pattern with the payload using a two byte pattern whenever the start pattern happens to show up in the data. I dont remember all the details now.
Whatever your frame is then add a packet type and your handshakes, or if the data is going to just be microcontroller to arm then you dont even need to do that.
To get back to your direct question. Yes, I think that an ip stack (lwip or other) will introduce a lot of overhead. both bandwidth and more important the amount of code needed to support that stack will chew up rom/ram on both sides. If you ultimately need to present this data in an ip fashion (a website hosted by the embedded system) then somewhere in the path you need an ip stack, etc.
I cant imagine that lwip manages your queues for you. I assume you would need to do that yourself. the various queues might want to talk to a single driver that deals with the single spi bus (assuming there is a single spi bus with multiple chip selects). It also depends on how you are using the spi interface, if you are allowing the arm to talk to multiple microcontrollers and the packets of data are broken up into a little bit from this controller a little from that controller so that nobody has to wait to long before they get a few more bytes of data. Or will a complete frame have to move from one microcontroller before moving onto the next gpio interrupt to pull that guys data? The long and short of it is I would assume you have to manage the shared resource just like you would in any other situation where you have multiple users of a shared resource (rtos, full blown operating system, etc). I dont remember lwip that well at all but with a full blown berkeley sockets application interface the user could write separate applications where each application only cared about one TCP or UDP port and the libraries and drivers managed separating those packets out to each application as well as all of the rules for the IP stack.
If you are not already doing experiments with moving data over the spi interface(s) I would start with simple experiments first just to get the feel for how well it is or isnt going to work, the sizes of transfers you can do reliably per spi transction, etc. Your solution may naturally just fall out of those experiments.

Moving from Qt to Boost - Networking

I's using Qt to build Udp and Tcp servers/clients. Now in recent project, Qt is not allowed and i'm supposed to use Boost. Having gone through boost::asio, i feel confused since there are so many features and usages are very different from Qt. For Udp (multicast,broadcast etc) i'm more or less clear now. But for Tcp i'm having problems understanding:
(have to do all async)
TCP is stream oriented so there is no way to know the number of bytes that have arrived. So we use asio::ip::tcp::socket::async_read_some(). However in Qt we have readAll() for a socket which can be made to read all data on readyRead() signal and return a QByteArray, the size of which we needn't specify as it can grow. How do i achieve this in Boost? readyRead() signal i think would map roughly to async_read_some(), what about the rest (readAll() and "growable" buffer/array?
disconnected() signal is emitted in Qt if the TCP connection disconnects (server/client off, lan cable unplugged etc). What would be the equivalent check in Boost? Do i need a timer to periodically send and expect heartbeat messages myself?
Use some overload of async_read free function with the appropriate completion-condition.
As you see there, this function accepts asio::streambuf, which is growing automatically.
Yes, the only portable & reliable way is to send some heartbeat. I don't know how Qt does that, but try cutting the network cable and see whether you get any notification from Qt...

How does Jgroup on UDP become reliable?

In my mind, UDP is fast but not reliable, and in a lot of places jgroup is based on UDP, is that reliable? I saw a lot of places use jgroup to transfer caching information. Does jgroup have to use TCP to make transition reliable?
JGroups provides components to implement reliability above UDP, see here for some details,
http://www.jgroups.org/manual/html_single/index.html#d0e5392