MacOS: strange delay between UDP/TCP packets - objective-c

I am developing an application that sends data over UDP, using the AsyncUDPSocket class, to another client on Mac and Windows. It is very important that packets arrive instantly.
The problem is that roughly every 1000 packets I get a delay of about 2 seconds when receiving packets. A delay of 100-200 ms would be OK, but 2 seconds makes for a bad user experience.
I run the UDP communication in a separate thread, so it is barely affected by user interaction with the UI and the like. I have already tried sending packets faster, slower, and with different packet sizes: the delay stays there. I also tried TCP instead of UDP, with the same result :(
It does not seem to happen on Windows clients.
Maybe there is some system buffer in macOS that needs to be flushed every time it has N packets or N bytes of data?
Does anyone have an idea how I can prevent the delay from happening?

There are a lot of things that can slow down a network program temporarily, so it's hard to know where to start. Have you tried this on multiple networks? Both wireless and Ethernet networks? What kind of switch do you have? Does this happen on different OS X computers, or just on one? Can you reproduce the delay with a simpler command-line program? Are you using garbage collection? (Grasping at straws here...)
Just out of curiosity, I tested the round-trip time of UDP echo packets sent from my Mac to another computer on the same LAN. Out of over 60,000 UDP packets with a 1,000-byte payload, none took longer than 32 ms, the mean round trip was 0.6 ms, and the sample standard deviation was 0.21 ms.
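For anyone who wants to repeat that kind of measurement, here is a minimal sketch in Python. It is not the program used for the numbers above: the function names (`run_echo_server`, `measure_rtts`, `demo`) are made up for illustration, and the demo runs against an echo server on the loopback interface, so the figures it prints say nothing about a real LAN.

```python
import socket
import statistics
import threading
import time


def run_echo_server(sock, stop):
    """Echo every datagram back to its sender until `stop` is set."""
    sock.settimeout(0.1)
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(65535)
        except socket.timeout:
            continue
        sock.sendto(data, addr)


def measure_rtts(addr, count=200, payload_size=1000, timeout=1.0):
    """Send `count` echo requests and return the round-trip times in ms.

    Requests whose reply does not arrive in time are skipped, so the
    returned list may be shorter than `count`.
    """
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        for seq in range(count):
            # A sequence number in the first 4 bytes lets us ignore stale
            # replies that belong to an earlier, timed-out request.
            payload = seq.to_bytes(4, "big").ljust(payload_size, b"\0")
            start = time.perf_counter()
            client.sendto(payload, addr)
            try:
                while True:
                    reply, _ = client.recvfrom(65535)
                    if reply[:4] == payload[:4]:
                        break
            except socket.timeout:
                continue
            rtts.append((time.perf_counter() - start) * 1000.0)
    return rtts


def demo():
    """Measure round trips against a local echo server (loopback only)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # OS-assigned port
    stop = threading.Event()
    thread = threading.Thread(target=run_echo_server, args=(server, stop))
    thread.start()
    try:
        return measure_rtts(server.getsockname())
    finally:
        stop.set()
        thread.join()
        server.close()


if __name__ == "__main__":
    rtts = demo()
    print(f"n={len(rtts)} max={max(rtts):.3f} ms "
          f"mean={statistics.mean(rtts):.3f} ms "
          f"stdev={statistics.stdev(rtts):.3f} ms")
```

To test a real link, run the echo loop on the second machine and pass its address to `measure_rtts` instead of calling `demo()`. An occasional multi-second outlier in the returned list would reproduce the problem outside the application.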
(I'm also curious what you need such low latency for.)

Related

How do I detect the ideal UDP payload size?

I heard a UDP payload of 508 bytes will be safe from fragmentation. I heard the real MTU is 1500 but people should use a payload of 1400 because headers will eat the rest of the bytes. I also heard that many packets will be fragmented anyway, so using around 64K is fine. But I want to forget about all of these and programmatically detect what gets me good latency and throughput from my local machine to my server.
I was thinking about implementing something like the sliding window that TCP has. I'll send a few UDP packets, then more and more until packets are lost. I'm not exactly sure how to tell whether a packet was delayed or lost, and I'm not sure how to slide back down without going too far back. Is there an algorithm typically used for this? If I know the average number of hops between my machine and the server, or the average ping, is there a way to estimate the maximum delay time of a packet?
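On the "delayed vs. lost" part of the question: one well-known approach, borrowed from TCP's retransmission timer (RFC 6298), is to keep a smoothed round-trip time plus a variation term and to treat a packet as lost once it is later than that timeout. The sketch below is only an illustration of those formulas. The class name `RttEstimator` is hypothetical, and the sketch leaves out the RFC's minimum timeout and clock-granularity terms.

```python
class RttEstimator:
    """Smoothed RTT and loss timeout, using the RFC 6298 update formulas."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation

    def __init__(self, initial_timeout=1.0):
        self.srtt = None      # smoothed round-trip time, seconds
        self.rttvar = None    # round-trip time variation, seconds
        self.timeout = initial_timeout

    def update(self, sample):
        """Feed one measured round-trip time (seconds); return the new timeout."""
        if self.srtt is None:
            # First measurement: initialise from the sample itself.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # The variation is updated first, using the previous smoothed RTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - sample))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        self.timeout = self.srtt + 4 * self.rttvar
        return self.timeout
```

A sender would call `update()` for every acknowledged packet and consider a packet lost when no acknowledgement arrives within `timeout` seconds. This only estimates a plausible upper bound from observed samples. It cannot be derived from hop count alone.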

Losing data with UDP over WiFi when multicasting

I'm currently working on a network protocol which includes a client-to-client system with auto-discovery of clients on the current local network.
Right now, I'm periodically broadcasting to 255.255.255.255, and if a client doesn't emit anything for 30 seconds I consider it dead (and therefore offline). The goal is to keep an up-to-date list of running clients. It works well using UDP, but UDP does not ensure that the packets have been successfully delivered. So when it comes to the WiFi parts of the network, I sometimes get "false positives" of dead clients. For now I've reduced the time between two broadcasts to work around the issue (it is still not working well), but I don't find this clean.
Is there anything I can do to keep a list of "online" clients without this risk of "false positives"?
To minimize the false positives due to dropped packets, you should slightly alter the logic of your heartbeat protocol.
Rather than relying on a single packet broadcast every N seconds, you can send a burst of 3 or more packets immediately one after the other every N seconds. This is an approach that ping and traceroute tools follow. With this method you significantly decrease the probability of a lost announcement from a peer.
Furthermore, you can specify a certain number of lost announcements that your application can afford. Also, in order to minimize the possibility of packet loss on a wireless network, try to keep the broadcast UDP packet as small as possible.
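A minimal sketch of both ideas (burst announcements and a tolerance for missed announcements) could look like the following. The names `PeerTracker` and `send_burst`, the 10-second interval and the limit of 3 missed announcements are all assumptions for illustration, not part of any existing library.

```python
import time


class PeerTracker:
    """Track which peers are alive based on their periodic announcements."""

    def __init__(self, interval=10.0, allowed_misses=3, clock=time.monotonic):
        self.interval = interval              # seconds between announcements
        self.allowed_misses = allowed_misses  # announcements we may miss
        self.clock = clock                    # injectable for testing
        self.last_seen = {}                   # peer id -> time last heard

    def heard_from(self, peer_id):
        """Record an announcement (any packet of a burst counts)."""
        self.last_seen[peer_id] = self.clock()

    def alive_peers(self):
        """Return peers heard from within `allowed_misses` intervals."""
        deadline = self.interval * self.allowed_misses
        now = self.clock()
        return sorted(peer for peer, seen in self.last_seen.items()
                      if now - seen <= deadline)


def send_burst(sock, message, addr, copies=3):
    """Send the same small announcement several times back to back."""
    for _ in range(copies):
        sock.sendto(message, addr)
```

Here `sock` would be a UDP socket with `SO_BROADCAST` enabled and `addr` something like `("255.255.255.255", port)`. A peer is only dropped from `alive_peers()` after every packet of several consecutive bursts has been lost, which is much less likely than losing one packet.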
You can turn this around: the server broadcasts a "ServerIsUp" message, and every client can then register with the server. When a client goes offline it unregisters; otherwise you can consider it alive.

Having difficulty sending small lwip packets immediately using the lwip API

I am creating a server on a ST Cortex M3 device. I am using the lwip API and FreeRTOS. All is working, but the response time is way off. I am currently using lwip 1.3.2 and FreeRTOS 7.3.
A single client connects to the server and must have some time-critical data sent frequently. These packets are on the order of 6 or so bytes. Other times, I am sending upwards of 20K.
The problem I am having is that these smaller packets seem to be taking forever to be sent. I assume this is because lwip is waiting for more data to be enqueued to make more efficient transmissions. I cannot wait around for 2 or 3 seconds for the data to be sent; the client is expecting the data nominally within a few microseconds or milliseconds.
I have tried using lwip_send and lwip_write. (I understand that one is the same as the other with a flag passed at the end. Just had to try...) I have tried setting TCP_NODELAY on the socket to no avail. I tried to set SO_SNDLOWAT to '1', but this always returned -1, so I do not think it is supported.
I do not want to redo all of my code using TCP RAW. Is there a way to invoke the tcp_output() function outside of TCP RAW mode? Is there any way to speed things up or is this just how slow lwip TCP with small packets is?
Any and all suggestions are welcome. Thanks.
--EDIT--
I would also like to add that once I am ready to transmit, I make sure that my TX task in FreeRTOS is at the highest priority. There are no other tasks running up to the point at which I call lwip_send/write.
I'm fairly experienced with bare-metal lwIP on Xilinx, and lwIP does not wait to send things out. It will pump packets out as fast as your interrupts are acknowledged by the Ethernet hardware. I've been using UDP only. What comes to mind, though, is that your problem might be on the receive end. If you are doing TCP, maybe those small packets are coming out late because you are having receive issues. What you need to do is find the lowest-level point in the code at which an Ethernet frame is transmitted and put a general-purpose output toggle on that. Then also put a general-purpose output toggle on the point where an Ethernet packet is received. Look at the signals on a scope. If it confirms your hypothesis, move the output toggles around to narrow down the issue. Wash, rinse and repeat until you are down to where the issue is. It's crude and time-consuming, but this brute-force approach often solves "impossible" embedded software problems through pure determination. Good luck!

The most reliable and efficient udp packet size?

Would sending lots of small packets by UDP take more resources (CPU, zlib compression, etc.)? I read here that sending one big packet of ~65 kB by UDP would probably fail, so I thought that sending lots of smaller packets would succeed more often, but then comes the computational overhead of using more processing power (or at least that's what I'm assuming). The question is basically this: what is the best scenario for sending the maximum number of successful packets while keeping computation to a minimum? Is there a specific size that works most of the time? I'm using Erlang for the server and ENet for the client (written in C++). I'm also using zlib compression, and I send the same packets to every client (broadcasting is the term, I guess).
The maximum size of a UDP payload that, most of the time, will not cause IP fragmentation is:
MTU size of the host handling the PDU (in most cases it will be 1500) -
size of the IP header (20 bytes) -
size of the UDP header (8 bytes)
1500 MTU - 20 IP hdr - 8 UDP hdr = 1472 bytes
@EJP talked about 534 bytes, but I would fix it at 508. This is the size commonly treated as safe from fragmentation, because every IPv4 host must be able to accept a datagram of at least 576 bytes and the IP header can be at most 60 bytes (508 = 576 - 60 IP - 8 UDP).
By the way, I'd try to go with 1472 bytes, because 1500 is a standard-enough value.
Use 1492 instead of 1500 for calculation if you're passing through a PPPoE connection.
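The arithmetic above is easy to wrap in a small helper. This is just a sketch of the same subtraction. The function name `max_udp_payload` and the constant names are made up for illustration, and it covers IPv4 only.

```python
IPV4_MIN_HEADER = 20  # bytes, IPv4 header without options
IPV4_MAX_HEADER = 60  # bytes, IPv4 header with the maximum amount of options
UDP_HEADER = 8        # bytes


def max_udp_payload(mtu, ip_header=IPV4_MIN_HEADER):
    """Largest UDP payload that fits in one IPv4 packet of size `mtu`."""
    payload = mtu - ip_header - UDP_HEADER
    if payload < 0:
        raise ValueError("MTU is too small to hold the IP and UDP headers")
    return payload


if __name__ == "__main__":
    print(max_udp_payload(1500))                  # typical Ethernet MTU
    print(max_udp_payload(1492))                  # PPPoE link
    print(max_udp_payload(576, IPV4_MAX_HEADER))  # conservative figure
```

The three calls correspond to the 1472-byte, PPPoE and 508-byte cases discussed above.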
Would sending lots of small packets by UDP take more resources?
Yes, it would, definitely! I just did an experiment with a streaming app. The app sends 2000 frames of data each second, precisely timed. The data payload for each frame is 24 bytes. I used UDP with sendto() to send this data to a listener app on another node.
What I found was interesting. This level of activity brought my sending CPU to its knees! I went from having about 64% free CPU time to having about 5%! That was disastrous for my application, so I had to fix that. I decided to experiment with variations.
First, I simply commented out the sendto() call, to see what the packet assembly overhead looked like. About a 1% hit on CPU time. Not bad. OK... must be the sendto() call!
Then, I did a quick fakeout test... I called the sendto() API only once in every 10 iterations, but I padded the data record to 10 times its previous length, to simulate the effect of assembling a collection of smaller records into a larger one, sent less often. The results were quite satisfactory: 7% CPU hit, as compared to 59% previously. It would seem that, at least on my *NIX-like system, the operation of sending a packet is costly just in the overhead of making the call.
Just in case anyone doubts whether the test was working properly, I verified all the results with Wireshark observation of the actual UDP transmissions to confirm all was working as it should.
Conclusion: it uses MUCH less CPU time to send larger packets less often than to send the same amount of data as smaller packets more frequently. Admittedly, I do not know what happens if your overly large UDP datagram starts getting fragmented... I mean, I don't know how much CPU overhead this adds. I will try to find out (I'd like to know myself) and update this answer.
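The "collect small records and send them together" idea from the experiment can be sketched as follows. This is not the code from the test above: `RecordBatcher` is a hypothetical name, the limits of 10 records and 1472 bytes are assumptions, and the receiver is assumed to know the fixed record size so it can split the datagram again.

```python
class RecordBatcher:
    """Collect small records and hand them to `send` as one larger datagram."""

    def __init__(self, send, max_records=10, max_bytes=1472):
        self.send = send                # callable taking one bytes object
        self.max_records = max_records  # flush after this many records
        self.max_bytes = max_bytes      # never build a datagram larger than this
        self._pending = []
        self._size = 0

    def add(self, record):
        """Queue one record, flushing first if it would overflow the datagram."""
        if len(record) > self.max_bytes:
            raise ValueError("record is larger than max_bytes")
        if self._pending and self._size + len(record) > self.max_bytes:
            self.flush()
        self._pending.append(record)
        self._size += len(record)
        if len(self._pending) >= self.max_records:
            self.flush()

    def flush(self):
        """Send whatever is queued as a single datagram."""
        if self._pending:
            self.send(b"".join(self._pending))
            self._pending = []
            self._size = 0
```

With a real socket, `send` could be something like `lambda data: sock.sendto(data, addr)`, which results in one `sendto()` call per 10 records instead of one per record. The price is added latency: a record may wait until the batch fills, so a time-based `flush()` is usually needed as well.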
534 bytes. That is required to be transmitted without fragmentation. It can still be lost altogether of course. The overheads due to retransmission of lost packets and the network overheads themselves are several orders of magnitude more significant than any CPU cost.
You're probably using the wrong protocol. UDP is almost always a poor choice for data you care about transmitting. You wind up layering sequencing, retry, and integrity logic atop it, and then you have TCP.

GCDAsyncUdpSocket dropping packets and creating lots of DISPATCH_WORKER_THREADs

I'm building a multicast client with GCDAsyncUdpSocket and I'm facing a lot of packet loss.
I have monitored the server with Wireshark as well as captured the WiFi packets in the air with AirCap, and I'm sure the packets are transmitted properly. I also looked at the debug traces from the GCDAsyncUdpSocket library, and I see that sometimes socket4FDBytesAvailable: is called with a large argument, like 4000, but when it reads the socket it reads fewer bytes (maybe 500), and that's where the packets are lost. I increased the socket buffer, but that doesn't help.
Last, I noticed using Instruments' time profiler that, coincidence or not, each time I lose packets one new instance of a DISPATCH_WORKER_THREAD is created. Is this normal?