GCDAsyncUdpSocket dropping packets and creating lots of DISPATCH_WORKER_THREADs - cocoa-touch

I'm building a multicast client with GCDAsyncUdpSocket and I'm facing a lot of packet loss.
I have monitored the server with Wireshark and captured the WiFi packets in the air with AirCap, and I'm sure the packets are transmitted properly. I also looked at the debug traces from the GCDAsyncUdpSocket library, and I see that sometimes socket4FDBytesAvailable: is called with a large argument, like 4000, but when it then reads the socket it reads fewer bytes -- maybe 500 -- and that's where the packets are lost. I increased the socket buffer, but that doesn't help.
Lastly, I noticed using Instruments' Time Profiler that, coincidence or not, each time I lose packets a new DISPATCH_WORKER_THREAD is created. Is this normal?

Related

Losing data with UDP over WiFi when multicasting

I'm currently working on a network protocol that includes a client-to-client system with auto-discovery of clients on the current local network.
Right now, I'm periodically broadcasting over 255.255.255.255, and if a client doesn't emit for 30 seconds I consider it dead (and thus offline). The goal is to keep an up-to-date list of running clients. It works well over UDP, but UDP does not ensure that packets are successfully delivered, so on the WiFi parts of the network I sometimes get "false positives" of dead clients. For now I've reduced the time between two broadcasts to mitigate the issue (it still doesn't work well), but I don't find this clean.
Is there anything I can do to keep a list of "online" clients without this risk of false positives?
To minimize false positives due to dropped packets, you should alter the logic of your heartbeat protocol slightly.
Rather than relying on a single broadcast packet every N seconds, send a burst of 3 or more packets back to back every N seconds; this is the approach the ping and traceroute tools follow. With this method you significantly decrease the probability of a lost announcement from a peer.
Furthermore, you can specify a certain number of lost announcements that your application can afford. Also, to minimize the chance of packet loss on a wireless network, keep the broadcast UDP packet as small as possible.
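A minimal sketch of the burst idea over plain BSD sockets, in Python; the port, payload, and timings are made up for illustration:

```python
# Burst heartbeat sender: instead of one announcement every N seconds,
# send several back to back so a single dropped datagram doesn't cost
# the whole period. Port, payload, and timings are illustrative.
import socket
import time

ANNOUNCE_PORT = 9999   # hypothetical discovery port
BURST_SIZE = 3         # packets per burst, per the suggestion above
PERIOD_S = 10          # seconds between bursts

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

payload = b"ALIVE node-42"  # keep announcements small on WiFi

while True:
    for _ in range(BURST_SIZE):
        sock.sendto(payload, ("255.255.255.255", ANNOUNCE_PORT))
    time.sleep(PERIOD_S)
```

On the receiving side, the dead timer then only fires when several whole bursts in a row are lost, which is far less likely than losing a single packet.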
You can also turn this around, so that the server broadcasts a "ServerIsUp" message and every client then registers with the server. When a client goes offline it unregisters; otherwise you can consider it alive.
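A minimal sketch of that registry, with invented message names; note that a real implementation would still need a timeout for clients that crash without unregistering:

```python
# Registration-based presence (message names are invented for the sketch).
# After hearing the server's "ServerIsUp" broadcast, clients register once
# and unregister on clean shutdown instead of heartbeating.
import socket

REG_PORT = 10001  # hypothetical registration port

def serve_registry():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", REG_PORT))
    online = set()
    while True:
        msg, (ip, _) = sock.recvfrom(256)
        verb, _, node = msg.partition(b" ")
        if verb == b"REGISTER":
            online.add((node, ip))
        elif verb == b"UNREGISTER":
            online.discard((node, ip))
        print("online:", online)

# Client side: one datagram on startup, one on clean shutdown.
def announce(server_ip, node_id, leaving=False):
    verb = b"UNREGISTER" if leaving else b"REGISTER"
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.sendto(verb + b" " + node_id, (server_ip, REG_PORT))
```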

UDP broadcast/multicast vs unicast behaviour (dropped packets)

I have an embedded device (the source) which sends out a stream of (audio) data in chunks of 20 ms (about 330 bytes) as UDP packets. The network volume is thus fairly low, at about 16 kB/s (somewhat more in practice due to UDP/IP overhead). The device runs the lwIP stack (v1.3.2) and connects to a WiFi network using a WiFi solution from H&D Wireless (HDG104, WiFi G-mode). The destination (the sink) is a Windows Vista PC, also connected to the WiFi network, using a USB WiFi dongle (WiFi G-mode). A program running on the PC lets me monitor the number of dropped packets, and I am also running Wireshark to analyze the network traffic directly. No other clients are actively sending data over the network at this point.
When I send the data using broadcast or multicast, many packets are dropped, sometimes up to 15%. However, when I switch to UDP unicast, the number of dropped packets is negligible (< 2%).
With UDP I expect packets to be dropped (which is OK in my audio application), but why do I see such a big difference in performance between broadcast/multicast and unicast?
My router is a WRT54GS (FW v7.50.2) and the PC (sink) uses a TRENDnet TEW-648UB network adapter, running in WiFi G-mode.
This looks like a well-known WiFi issue:
Quoted from http://www.wi-fiplanet.com/tutorials/article.php/3433451
The 802.11 (Wi-Fi) standards specify support for multicasting as part of asynchronous services. An 802.11 client station, such as a wireless laptop or PDA (not an access point), begins a multicast delivery by sending multicast packets in 802.11 unicast data frames directed to only the access point. The access point responds with an 802.11 acknowledgement frame sent to the source station if no errors are found in the data frame.
If the client sending the frame doesn't receive an acknowledgement, then the client will retransmit the frame. With multicasting, the leg of the data path from the wireless client to the access point includes transmission error recovery. The 802.11 protocols ensure reliability between stations in both infrastructure and ad hoc configurations when using unicast data frame transmissions.
After receiving the unicast data frame from the client, the access point transmits the data (that the originating client wants to multicast) as a multicast frame, which contains a group address as the destination for the intended recipients. Each of the destination stations can receive the frame; however, they do not respond with acknowledgements. As a result, multicasting doesn't ensure a complete, reliable flow of data.
The lack of acknowledgements with multicasting means that some of the data your application is sending may not make it to all of the destinations, and there's no indication of a successful reception. This may be okay, though, for some applications, especially ones where it's okay to have gaps in data. For instance, the continual streaming of telemetry from a control valve monitor can likely miss status updates from time to time.
This article has more information:
http://hal.archives-ouvertes.fr/docs/00/08/44/57/PDF/RR-5947.pdf
One very interesting side-effect of the multicast implementation (at the WiFi MAC layer) is that as long as your receivers are wired, you will not experience any issues (due to the acknowledgement on the receiver side, which is really a unicast connection). However, with WiFi receivers (as in my case), packet loss is enormous and completely unacceptable for audio.
Multicast does not have ack packets and so there is no retransmission of lost packets. This makes perfect sense as there are many receivers and it's not like they can all reply at the same time (the air is shared like coax Ethernet). If they were all to send acks in sequence using some backoff scheme it would eat all your bandwidth.
UDP streaming with packet loss is a well-known challenge, and it is usually solved using some type of forward error correction. Recently a class of codes known as fountain codes, such as RaptorQ, has shown promise for the packet-loss problem, in particular when there are several unreliable sources for the data at the same time (for example, multiple WiFi access points covering an area).
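Real fountain codes are involved, but the underlying idea of forward error correction can be shown with the simplest possible scheme: one XOR parity packet per group, which lets the receiver rebuild any single lost packet without a retransmission. A toy sketch in Python (no sockets, just the math):

```python
# Toy single-parity FEC: for every group of k data packets, send one
# extra packet that is the XOR of the group. The receiver can then
# rebuild any ONE lost packet per group without a retransmission.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(packets):
    size = max(len(p) for p in packets)          # pad to a common length
    parity = bytes(size)
    for p in packets:
        parity = xor_bytes(parity, p.ljust(size, b"\0"))
    return parity

def recover_missing(arrived, parity):
    # XORing the parity with every packet that did arrive leaves
    # exactly the one packet that didn't.
    missing = parity
    for p in arrived:
        missing = xor_bytes(missing, p.ljust(len(parity), b"\0"))
    return missing

group = [b"chunk-0", b"chunk-1", b"chunk-2"]
parity = make_parity(group)
print(recover_missing([group[0], group[2]], parity))  # b'chunk-1'
```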

Having difficulty sending small lwip packets immediately using the lwip API

I am creating a server on an ST Cortex-M3 device. I am using the lwIP API and FreeRTOS. All is working, but the response time is way off. I am currently using lwIP 1.3.2 and FreeRTOS 7.3.
A single client connects to the server and must have some time-critical data sent frequently. These packets are on the order of 6 or so bytes. Other times, I am sending upwards of 20K.
The problem I am having is that these smaller packets seem to take forever to be sent. I assume this is because lwIP is waiting for more data to be enqueued so it can transmit more efficiently. I cannot wait around for 2 or 3 seconds for the data to be sent; the client expects the data nominally within a few microseconds or milliseconds.
I have tried using lwip_send and lwip_write. (I understand that one is the same as the other with a flag passed at the end. Just had to try...) I have tried setting TCP_NODELAY on the socket to no avail. I tried to set SO_SNDLOWAT to '1', but this always returned -1, so I do not think it is supported.
I do not want to redo all of my code using TCP RAW. Is there a way to invoke the tcp_output() function outside of TCP RAW mode? Is there any way to speed things up, or is this just how slow lwIP TCP is with small packets?
Any and all suggestions are welcome. Thanks.
--EDIT--
I would also like to add that once I am ready to transmit, I make sure that my TX task in FreeRTOS is at the highest priority. There are no other tasks running up to the point at which I call lwip_send/write.
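For reference, this is the shape of the TCP_NODELAY attempt in plain BSD-socket terms, sketched in Python against a local echo server. lwIP's socket layer mirrors this API, but whether the option is honored depends on the lwIP build, so treat that side as an assumption; the port and payload here are arbitrary.

```python
# Disabling Nagle's algorithm so small writes go out immediately instead
# of being coalesced. Demonstrated against a local echo server.
import socket
import threading
import time

def echo_server(port):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    conn, _ = srv.accept()
    while data := conn.recv(64):
        conn.sendall(data)                      # echo straight back

threading.Thread(target=echo_server, args=(5005,), daemon=True).start()
time.sleep(0.2)                                 # let the server come up

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", 5005))
cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # no batching

t0 = time.perf_counter()
cli.sendall(b"6bytes")                          # tiny, time-critical payload
cli.recv(64)                                    # wait for the echo
print(f"round trip: {(time.perf_counter() - t0) * 1e3:.2f} ms")
```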
I'm fairly experienced with bare-metal lwIP on Xilinx, and lwIP does not wait to send things out; it will pump packets out as fast as your interrupts are acknowledged, based on the Ethernet hardware. I've been using UDP only. What comes to mind, though, is that your problem might be on the receive end: if you are doing TCP, maybe those small packets are coming out late because you are having receive issues. What you need to do is find the lowest-level point in the code at which an Ethernet frame is transmitted and put a general-purpose output toggle on it. Then put another general-purpose output toggle on the point where an Ethernet packet is received. Look at the signals on a scope. If they confirm your hypothesis, move the output toggles around to narrow down the issue. Wash, rinse, and repeat until you are down to where the issue is. It's crude and time-consuming, but oftentimes this brute-force approach solves many "impossible" embedded software problems through pure determination. Good luck!

multicast packet loss - running two instances of the same application

On Red Hat Linux, I have a multicast listener listening to a very busy multicast data source. It runs perfectly by itself, with no packet loss. However, once I start a second instance of the same application with exactly the same settings (same src/dst IP address, socket buffer size, user buffer size, etc.), I start to see very frequent packet loss from both instances, and they lose exactly the same packets. If I stop one of the instances, the remaining one returns to normal without any packet loss.
Initially, I thought it was a CPU/kernel load issue; maybe it could not get the packets out of the buffer quickly enough. So I did another test: I kept one instance of the application running but started a totally different multicast listener on the same computer, using a second NIC and listening to a different but even busier multicast source. Both applications ran fine without any packet loss.
So it looks like one NIC is not powerful enough to support two multicast applications, even though they listen to exactly the same thing. The possible cause of the packet loss might be that, in this scenario, the NIC driver needs to copy the incoming data into two socket buffers, and this extra copying is too much for the card to handle, so it drops packets. Any deeper analysis of this issue, and any possible solutions?
Thanks
You are basically finding out that the kernel is inefficient at fanning out multicast packets. In the worst-case scenario, for every incoming packet the code allocates two new buffers (the SKB object and the packet payload) and copies the NIC buffer twice.
Take the best-case scenario instead: for every incoming packet a new SKB is allocated, but the packet payload is shared between the two sockets with reference counting. Now imagine what happens when two applications run, each on its own core and on separate sockets. Every reference to the packet payload causes the memory bus to stall while both cores' caches flush and reload, and on top of that each application has to context-switch into the kernel and back to pull the payload off its socket. The result is terrible performance.
You aren't the first to encounter such a problem and many vendors have created solutions to it. The basic design is to limit the incoming data to one thread on one core on one socket, then have that thread distribute the data to all other interested threads, preferably using user space code building upon shared memory and lockless data structures.
Examples are TIBCO Rendezvous and 29West's Ultra Messaging, which demonstrates a 660 ns IPC bus:
http://www.globenewswire.com/newsroom/news.html?d=194703
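A minimal in-process sketch of that design using Python's standard socket/threading modules: one thread owns the multicast socket, and every consumer reads from a queue instead of opening its own socket. The group and port are placeholders, and the commercial products above do the same thing across processes with shared memory rather than within one process as here.

```python
# Single-reader fan-out: exactly one thread receives from the multicast
# socket; every interested consumer gets each payload via a queue, so
# the kernel never has to duplicate a packet.
import queue
import socket
import struct
import threading

GROUP, PORT = "239.1.2.3", 5000   # hypothetical multicast group/port

def reader(subscribers):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        data, _ = sock.recvfrom(65535)
        for q in subscribers:          # one kernel receive, N hand-offs
            q.put(data)

def consumer(q):
    while True:
        data = q.get()
        # ... per-application processing of each packet ...

subs = [queue.Queue(), queue.Queue()]
threading.Thread(target=reader, args=(subs,), daemon=True).start()
for q in subs:
    threading.Thread(target=consumer, args=(q,), daemon=True).start()
threading.Event().wait()               # keep the demo alive
```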

iPhone: Sending large data with Game Kit

I am trying to write an app that exchanges data with other iPhones running the app through the Game Kit framework. The iPhones discover each other and connect fine, but the problem happens when I send the data. I know the iPhones are connected properly because when I serialize an NSString and send it through the connection it comes out on the other end fine. But when I try to archive a larger object (using NSKeyedArchiver) I get the error message "AGPSessionBroadcast failed (801c0001)".
I am assuming this is because the data I am sending is too large (my files are about 500k in size; Apple seems to recommend a max of 95k). I have tried splitting the data into several transfers, but I can never get it to unarchive properly at the other end. I'm wondering if anyone else has come up against this problem, and how you solved it.
I had the same problem with files around 300K. The trouble is that the sender needs to know when the receiver has emptied the pipe before sending the next chunk.
I ended up with a simple state engine that ran on both sides. The sender transmits a header with how many total bytes will be sent and the packet size, then waits for acknowledgement from the other side. Once it gets the handshake, it proceeds to send fixed-size packets, each stamped with a sequence number.
The receiver gets each one, reads it and appends it to a buffer, then writes back to the pipe that it got the packet with that sequence number. The sender reads the packet number, slices out another buffer's worth, and so on and so forth. Each side keeps track of the state it's in (idle, sending header, receiving header, sending data, receiving data, error, done, etc.). The two sides have to keep track of when to read/write the last fragment, since it's likely to be smaller than the full buffer size.
This works fine (albeit a bit slowly), and it can scale to any size. I started with 5K packet sizes but it ran pretty slowly; I pushed it to 10K but that started causing problems, so I backed off and held it at 8096. It works fine for both binary and text data.
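A compact Python sketch of that state machine, with in-memory queues standing in for the GameKit session; the header/ACK formats are invented for illustration, and a real version would also handle timeouts and the error state:

```python
# Stop-and-wait chunked transfer: header with the total byte count, then
# fixed-size, sequence-stamped packets, each acknowledged before the next
# is sent. Queues stand in for the transport.
import queue
import threading

CHUNK = 8096  # the packet size the answer settled on

def send_file(blob, tx, rx):
    tx.put(len(blob).to_bytes(4, "big"))   # header: total byte count
    assert rx.get() == b"ACK-HDR"          # wait for the handshake
    for seq in range(0, len(blob), CHUNK):
        tx.put(seq.to_bytes(4, "big") + blob[seq:seq + CHUNK])
        # Don't send the next chunk until this one is acknowledged.
        assert rx.get() == b"ACK" + seq.to_bytes(4, "big")

def recv_file(rx, tx):
    total = int.from_bytes(rx.get(), "big")
    tx.put(b"ACK-HDR")
    chunks, got = [], 0
    while got < total:
        pkt = rx.get()
        seq, payload = pkt[:4], pkt[4:]
        chunks.append(payload)
        got += len(payload)
        tx.put(b"ACK" + seq)               # tell the sender the pipe is empty
    return b"".join(chunks)

# Demo: push ~500k through the state machine.
a_to_b, b_to_a = queue.Queue(), queue.Queue()
data = b"x" * 500_000
result = []
t = threading.Thread(target=lambda: result.append(recv_file(a_to_b, b_to_a)))
t.start()
send_file(data, a_to_b, b_to_a)
t.join()
assert result[0] == data
```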
Bear in mind that Game Kit isn't a general file-transfer API; it's meant more for updates of where the player is, what the current location of other objects is, etc. So sending 300k for a game doesn't seem that sensible, though I can understand hijacking the API for general sharing mechanisms.
The problem is that it isn't a TCP connection; it's more like a UDP (datagram) connection. In this case the data isn't a stream (which TCP would packetize) but rather one giant chunk of data. (Technically, a UDP datagram can be fragmented into multiple IP packets, but lose one of those and the entire datagram is lost, as opposed to TCP, which will retry.)
The MTU for most wired networks is ~1.5k; for Bluetooth it's around ~0.5k. So any UDP packet that you send (a) may get lost, (b) may be split into multiple MTU-sized IP packets, and (c) if one of those packets is lost, you automatically lose the entire set.
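To put illustrative numbers on (b) and (c): a 500k archive sent as one datagram over a ~1.5k MTU splits into roughly 340 IP fragments. If each fragment independently survives with probability 0.99 (a figure assumed purely for illustration), the whole datagram arrives with probability 0.99^340, which is about 3%. The failure rate of one large send is therefore vastly worse than the per-packet loss rate suggests.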
Your best strategy is to emulate TCP: send packets with a sequence number, and let the receiving end request retransmission of packets that went missing. If you're using the equivalent of an NSKeyedArchiver, one suggestion is to iterate through the keys and write those out as individual keys (assuming each keyed value isn't that big on its own). You'll need some kind of ACK for each packet that gets sent back, and a total ACK when you're done, so the sender knows it's OK to drop the data from memory.