Maximum number of concurrent NSURLConnections to the same host? - objective-c

I'm running in to an issue in an OS X app that creates multiple, persistent connections to the same host using NSURLConnection. I create a separate connection for different rooms, and it stays connected the entire time the room is open to consume a streaming API. When opening many rooms, it stops working correctly.
I created a separate sample app that creates 10 connections, and it seems to only allow 6 connections to work, and the others are queued. Does anyone know if there is a way to override this limit? I can't find it documented anywhere. The only workaround I've found is it seems to be per host name, so testing with "localhost" and "127.0.0.1" allows 6 connections per host. I uploaded a sample project with client and server here - http://cl.ly/1x3K0D1F072V3U2T0C0I.

I filed a Radar for something that seems like the same issue but on iOS. I found that you can't have more than 5 connections open at once. The connections don't have to be pointing to the same domain. Anything after that would be queued. So if you have 5 connections open to an extremely slow endpoint, any other connections will not go through.
Radar: http://openradar.appspot.com/radar?id=2542401
Apple's reply:
This is the effect of our NSURLConnection connection cache. It is expected. We expect to address this type of configuration with new API.
I asked if they could give me anymore information (does it vary? does the type of connection affect it?) and they said:
Unfortunately, we can't give details about the connection limit behavior.
User agents in general (Chrome, Firefox, Safari) use six simultaneous TCP connections per hostname, with potential one-offs.

You could break this limitation by using CFNetwork API (CFHTTPMessage).
Here is the CFNetwork Programming Guide.
https://developer.apple.com/library/mac/documentation/Networking/Conceptual/CFNetwork/Introduction/Introduction.html#//apple_ref/doc/uid/TP30001132
BTW, if you decide to use CFNetwork, you'll need to work around the proxy and authenticate.
Wish this could helped!

Related

Seemingly random WebRTC ICE connection failure when connecting to same machine

I have an app which creates two instances of RTCPeerConnection (within the same JS context) which attempt to connect to each other. While I'm developing, I reload the page often, maybe several times per minute. About 10% of the time, WebRTC will fail to progress to the 'iceConnectionState == "connected"' stage. This failure occurs even when I pass no STUN/TURN servers to createPeer().
I primarily use Chrome (OSX, currently version 81.0.4044.138). I have never been able to reproduce this on Firefox.
I have captured nearly-identical dumps of the success and failure cases using chrome://webrtc-internals.
I have spent many hours on this and haven't found any clue as to why this might be failing. Is it just some kind of temporary local network outage? Is there anything I can do within the code to have a 100% local connection rate?
I have seen similar flakiness because of mDNS candidates. Try disabling #enable-webrtc-hide-local-ips-with-mdns in chrome://flags and see if that helps!
After that I would grab a tcpdump and confirm you see ICE traffic flowing each way.

Extra TCP connections on the RabbitMQ server after resource alarm

I have RabbitMQ Server 3.6.0 installed on Windows (I know it's time to upgrade, I've already done that on the other server node).
Heartbeats are enabled on both server and client side (heartbeat interval 60s).
I have had a resource alarm (RAM limit), and after that I have observed the raise of amount of TCP connections to RMQ Server.
At the moment there're 18000 connections while normal amount is 6000.
Via management plugin I can see there is a lot of connections with 0 channels, while our "normal" connection have at least 1 channel.
And even RMQ Server restart won't help: all connections would re-establish.
   1. Does that mean all of them are really alive?
Similar issue was described here https://github.com/rabbitmq/rabbitmq-server/issues/384, but as I can see it was fixed exactly in v3.6.0.
   2. Do I understand right that before RMQ Server v3.6.0 the behavior after resource alarm was like that: several TCP connections could hang on server side per 1 real client autorecovery connection?
Maybe important: we have haProxy between the server and the clients. 
   3. Could haProxy be an explanation for this extra connections? Maybe it prevents client from receiving a signal the connection was closed due to resource alarm?
Are all of them alive?
Only you can answer this, but I would ask - how is it that you are ending up with many thousands of connections? Really, you should only create one connection per logical process. So if you really have 6,000 logical processes connecting to the server, that might be a reason for that many connections, but in my opinion, you're well beyond reasonable design limits even in that case.
To check, see how many connections decrease when you kill one of your logical processes.
Do I understand right that before RMQ Server v3.6.0 the behavior after resource alarm was like that: several TCP connections could hang on server side per 1 real client autorecovery connection?
As far as I can tell, yes. It looks like the developer in this case ran across a common problem in sockets, and that is the detection of dropped connections. If I had a dollar for every time someone misunderstood how TCP works, I'd have more money than Bezos. So, what they found is that someone made some bad assumptions, when actually read or write is required to detect a dead socket, and the developer wrote code to (attempt) to handle it properly. It is important to note that this does not look like a very comprehensive fix, so if the conceptual design problem had been introduced to another part of the code, then this bug might still be around in some form. Searching for bug reports might give you a more detailed answer, or asking someone on that support list.
Could haProxy be an explanation for this extra connections?
That depends. In theory, haProxy as is just a pass-through. For the connection to be recognized by the broker, it's got to go through a handshake, which is a deliberate process and cannot happen inadvertently. Closing a connection also requires a handshake, which is where haProxy might be the culprit. If haProxy thinks the connection is dead and drops it without that process, then it could be a contributing cause. But it is not in and of itself making these new connections.
The RabbitMQ team monitors this mailing list and only sometimes answers questions on StackOverflow.
I recommended that this user upgrade from Erlang 18, which has known TCP connection issues -
https://groups.google.com/d/msg/rabbitmq-users/R3700QdIVJs/taDYKI6bAgAJ
I've managed to reproduce the problem: in the end it was a bug in the way our client used RMQ connections.
It created 1 auto-recovery connection (that's all fine with that) and sometimes it created a separate simple connection for "temporary" purposes.
Step to reproduce my problem were:
Reach memory alarm in RabbitMQ (e.g. set up an easily reached RAM
limit and push a lot of big messages). Connections would be in state
"blocking".
Start sending message from our client with this new "temp" connection.
Ensure the connection is in state "blocked".
Without eliminating resource alarm, restart RabbitMQ node.
The "temp" connection itself was here! Despite the fact auto-recovery
was not enabled for it. And it continued sending heartbeats so the
server didn't close it.
We will fix the client to use one and the only connection always.
Plus we of course will upgrade Erlang.

JMeter and Connect Times for SSL Connections

For a benchmarking test, I have a very basic test setup wherein I have a single user looping for 100 times (loop delay 100ms) hitting an https endpoint (GET) with HttpClient4 implementation, keep-alive has been turned on.
In the test results, I have observed a pattern wherein every 5/6th request the connect metric is higher as if a full SSL handshake is occurring, check the image below. I am a bit confused with this, any ideas on whats going on here and why the connect times are higher every n request?
[UPDATE]
I was able to troubleshoot this issue a bit further today after turning on access logs on the load balancer (target of this test) and I can see a pattern wherein JMeter seems to be switching the ports on the client side every few requests - the frequency matches the pattern observed previously with the JMeter test results.
This should probably explain the elevated connect times, now the question is why JMeter switches the port?
This could be keep-alive, it certainly was for my issue. Firstly make sure it's enabled on the sampler. Then there's also this JMeter setting to say how long to keep connections alive for.
httpclient4.time_to_live
I've set to 120000 in jmeter.properties but looking at the docs user.properties file should be used. I know jmeter.properties with a setting of 120000 worked for me.
I set the value high to see if it is an http keep alive causing the port switch. Whatever you set it to you need to ensure the client you are emulating does the same.
As you get some quick results I would guess it is a short timer somewhere and not the server side not allowing keep alive at all. Wireshark can help you pin point this as it could be the server side resetting the connection after a certain time. The above config extends the client side time which may get the information you need, if not have a look at the server side equivalent which will vary depending on what services the endpoint.

RTI DDS subscriber not getting data from publisher

Short story: My DDS subscriber cannot see data from my DDS publisher. What am I missing?
Long story:
QNX 6.4.1 VM A -- Broken Publisher. IP ends with .113
QNX 6.4.1 VM B -- Working Publisher. IP ends with .114
Windows 7 -- Subscriber/Main Dev box. IP ends with .203
RTI DDS 5.0 -- Middleware version
I have a QNX VM (hosted on the network, not on my machine) that is publishing some data via RTI DDS. The data never shows up in my Windows 7 subscriber application.
Interestingly enough, I can put the same code on VM B, and the subscriber gets data. Thinking this must be a Windows 7 firewall issue, I swapped VM A's IP address with VM B, but this did not help.
Using Wireshark, I can see some heartbeat traffic from VM A, but no data. From VM B, I see the heartbeat traffic and the data. Below is a sanitized Wireshark snippet.
NDDS_DISCOVERY_PEERS is set to include both multicast and the explicit IP address of the other side of each conversation. The QOS profiles are the same, and the RTI Analyzer indicates the Match Analysis was successful (all green).
VM A:
NDDS_DISCOVERY_PEERS=udpv4://239.255.0.1,udpv4://127.0.0.1,udpv4://BLAH.203
VM B:
NDDS_DISCOVERY_PEERS=udpv4://239.255.0.1,udpv4://127.0.0.1,udpv4://BLAH.203
Windows 7 box:
NDDS_DISCOVERY_PEERS=udpv4://239.255.0.1,udpv4://127.0.0.1,udpv4://BLAH.113,udpv4://BLAH.114
When I include them in the NDDS_DISCOVERY_PEERS line, other folks on the network can see DDS traffic from VM A with DDS SPY on their Windows 7 box. My Windows 7 box can not.
Windows 7 event log does not appear to show any firewall or WFP stopping the data packets.
RTI DDS Spy run from my Windows 7 machine shows that VM A (0A061071) writers are visible on the network, but no data is being received. It also shows that the readers in my subscriber on my Windows 7 machine are visible, though it shows up at an odd IP address.
Bonus question (out of curiosity only, NOT the primary question): why does traffic on my local machine show up in DDS SPY as 192.168.11.1 instead of my machine's IP or even 127.0.0.1?
Main Question: What am I missing?
Update:
route print on my Windows 7 box appears to show that I have joined a multicast group with VM A.
netsh interface ip show joins seemed to concur.
Investigation Update:
I rebooted the VM to no effect.
I rebooted the Windows box to no effect.
I removed the multicast from the NDDS_DISCOVERY_PEERS environment
variables on both sides to no effect.
The Windows 7 box has three network interfaces (plus loopback): 1
LAN connection and 2 (unrelated) VM adapters. We are working with
the LAN connection. The QNX VM has one network interface (plus
loopback). Note that the working VM and the broken VM use a
different ethernet driver than each other, as they are slightly
different flavors of QNX 6.4.1. The broken one has wm0 as the
interface, and the working one has en0 as the interface. I don't believe this is the issue, but it is a difference.
I ran DDS SPY on the "broken" QNX VM while it was playing back, and
I got DDS data. I don't have a good method to sniff the network
between where the VM is hosted and my Windows 7 machine to see if it
makes it out of the interface, but looking at the transmitted packet
count out of the ethernet interface on the QNX VM indicates that it
is definitely transmitting something, and the Wireshark captures on the Windows 7 machine itself show that at least some traffic is making it through.
Other folks on the LAN here can see the DDS traffic from the
"broken" VM, which leads me to believe it is a Windows setup issue,
rather than a broken VM--I just can't see what it could be. I've
re-checked the firewall, to no avail. I would have thought that if it were a firewall issue, the problem would have gone away when I swapped IP addresses between VM A and VM B. In any case, the Windows 7 firewall is currently off, to no avail.
Below are several screens of Wireshark output. I skipped a bunch between the third and the fourth, as after the fourth, the traffic tended to look like the bottom of the fourth until the end.
(Skipped a bunch here)
(Pretty much continues on like the last 11 lines above)
What else should I try?
Update:
To answer Rose's question below, using rtiddsping -publisher on the bad VM and rtiddsping -subscriber works appropriately.
I wonder if this issue is caused by the weird IP address. The IP address it happens to publish and somehow latch on to is a local VM ethernet adapter (separate from VM A). See the screenshot below.
The address I would like it to attach to is 10.6.6.203. Any way I can specify that?
More than a year later this happened to me again with a different virtual machine. I had it working yesterday, so I was very suspicious. I scoured all my code changes for the past 24 hours for issues, but didn't find any. Then I decided to see if IT had pushed any patches to my computer.
Guess what? The Windows Firewall had been aggressively updated since the day before. Rules missing or changed, etc. The log said packets were being dropped. I opened the firewall filters up a bit, and suddenly, everything worked again. I hesitate to close this issue, as I am not 100% this was exactly the same as last year, but it feels very similar. I suspect that last year the settings in the firewall were not logging the packet drops.
Long and short of it: if DDS suddenly stops working, check your firewall settings.
A couple of things to try:
Try running rtiddsping -publisher on the broken VM and rtiddsping -subscriber on Windows. This has two advantages:
The data type is small and well-known, so if there's some problem with the data being fragmented due to the different Ethernet drivers, it will not happen with rtiddsping, and may help track down the problem.
Rtiddsping prints out when the publisher and subscriber discover each other, so you will be able to confirm that discovery is completing correctly on both sides. I am guessing discovery is working, because Analyzer is showing both applications, but it is good to confirm.
If you see the same problem with rtiddsping that you see with your application, increase the verbosity to rtiddsping -verbosity 3, and then 5. At the highest verbosity level, this will print (a lot of) additional output, which may give us a hint about what is happening.
To answer your bonus question about spy: The reason why spy is showing that IP address is because that is one of the addresses that is being announced as part of discovery. During discovery, a DomainParticipant can announce up to four IP addresses that can be used to reach it. Spy will choose one of those to display, but it may not be the actual address that is being used to communicate with the application. If your machine does not have any interface with the 192.168.11.1 IP address, this could indicate a larger problem. (This may be normal, though - as long as the correct IP is one of the four that are announced.)
Looking through the packet trace images, there is nothing that is obviously the problem. A few things I notice:
There seems to be a normal pattern of heartbeats/ACKNACKs in the final packet trace image. This indicates that there is some bidirectional communication between the two applications.
It is difficult to tell from these images whether the DATA being sent from .113 to .203 consists of participant-to-participant messages, or real discovery messages - except for two packets: packet #805, and packet #816 (fragments 811-815) look like discovery announcements that are being sent to .203. This indicates that you have at least four entities (DataWriters or DataReaders) in your application on .113.
So, discovery data is being sent by the application on .113. It is being received and reassembled by WireShark, but that doesn't always mean it was received correctly by the application.
Packet #816 has a heartbeat on the end of it. It is possible that packet #818 or #819 might be the ACKNACK that is responding to that heartbeat, but I can't be sure from the image. The next step is to look at those ACKNACKs from .203 to .113 to see if .203 thinks it has received all the discovery data. Here is an example of a HB/ACKNACK pair where a discovery DataReader has received all data:
Submessage: HEARTBEAT
...
firstSeqNumber: 1
lastSeqNumber: 1
The heartbeat sequence number is 1, which indicates it has only sent an announcement about a single DataReader.
Submessage: ACKNACK
...
readerSNState: 2/0:
bitmapBase: 2
numBits: 0
The readerSNState is 2/0, meaning it has received everything before sequence number two, and there is nothing missing. If there is something other than a 0 in the bitmap, it indicates the DataReader did not receive some data.
If you can confirm that the application is receiving all the discovery data correctly, it will be helpful if you can use a WireShark filter to show only user data, since the images aren't highlighting discovery vs. user data.
WireShark filter for just rtps2 user data:
rtps2 && (rtps2.traffic_nature == 3 || rtps2.traffic_nature == 1)
We had a similar issue with this. Here is the environment in a very summarized way:
A publisher
A working subscriber (laptop)
A non-working subscriber (desktop)
Both subscribers held exactly the same software (the desktop was a clone from the laptop, through Clonezilla), but rtiddsspy was blind from the desktop point of view; however, the opposite way worked well: the publisher machine's rtiddsspy saw the desktop. Laptop and publisher machines' always worked well. Laptop and desktop too (they saw each other's subscriptions)
The workaround for this (based on https://community.rti.com/content/forum-topic/discovery-issues) was to increase the MTU on the desktop NIC. Don't ask me why, but it worked.
EDIT: At the beginning, the MTU in the publisher was set to a higher value than the subscriber. So, we changed the MTU in the subscriber to match the publisher's.

Too many TIME_WAIT connections

We have a fairly busy website (1 million page views/day) using Apache mod proxy that keeps getting overloaded with connections (>1,000) in the TIME_WAIT state. The connections are to port 3306 (mysql), but mysql only shows a few connections (show process list) and is performing fine.
We have tried changing a bunch of things (keep alive on/off), but nothing seems to help. All other system resources are within reasonable range.
I've searched around, which seems to indicate changing the tcp_time_wait_interval. But that seems a bit drastic. I've worked on busy website before, but never had this problem.
Any suggestions?
Each time_wait connection is a connection that has been closed.
You're probably connecting to mysql, issuing a query, then disconnecting. Repeat for each query on the page. Consider using a connection pooling tool, or at very least, a global variable that holds on to your database connection. If you use a global, you'll have to close the connection at the end of the page. Hopefully you have someplace common you can put that, like a footer include.
As a bonus, you should get a faster page load. MySQL is quick to connect, but not having to re-connect is even faster.
If your client applications are using JDBC, you might be hitting this bug:
http://bugs.mysql.com/bug.php?id=56979
I believe that php has the same problem
Cheers,
Gilles.
We had a similar problem, where our web servers all froze up because our php was making connections to a mysql server that was set up to do reverse host lookups on incoming connections.
When things were slow it worked fine, but under load the responstimes shot through the roof and all the apache servers got stuck in time_wait.
The way we figured the problem out was through using xdebug to create profiling data on the scripts under high load, and looking at that. the mysql_connect calls took up 80-90% of the execution time.