ARP Timeouts. Why fixed periodic? - udp

This one's been bugging me for years.
Basic question: Is there some reason ARP has to be implemented with fixed timeouts on ARP cache entries?
I do a lot of work in Real Time circles. We do most of our inter-system communications these days on dedicated UDP/IP links. For the most part this works reliably in Real Time, but for one nit: ARP entry timeouts.
The way typical implementations do ARP is the following:
When a client asks to send an IP packet to an IP address with an unknown MAC address, instead of sending that IP packet, the stack sends out an ARP request. If an upper layer (TCP) does resends, that's no problem. But since we use UDP, the original message is lost. At startup time this is OK, but in the middle of operation this is a Bad Thing™.
(Dynamic) ARP table entries are removed from the ARP table periodically, even if we just got a packet from that system a millisecond ago. This means the Bad Thing™ happens to our system regularly.
The obvious solution (which we use religiously) is to make all the ARP entries static. However, that's a royal PITA (particularly on RTOSes where finding an interface's MAC address is not always a matter of a couple of easy GUI clicks).
Back when we wrote our own IP stack, I solved this problem by never (ever) timing out ARP table entries. That has obvious drawbacks. A more robust and perfectly reasonable solution might be to refresh the entry timeout whenever a packet from the same MAC/IP combo is seen. That way an entry would only get timed out if it hadn't communicated with the stack in that amount of time.
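A toy sketch of the refresh-on-traffic idea (Python just for illustration; the timeout value, structure, and function names are made up, not any real stack, and certainly not our vendor's):

    import time

    ARP_TIMEOUT_S = 60.0  # hypothetical configurable timeout

    arp_cache = {}  # ip -> (mac, last_seen)

    def on_frame_received(src_ip, src_mac):
        # Refresh the entry whenever we hear from this IP/MAC combo,
        # so only peers that have gone silent ever age out.
        arp_cache[src_ip] = (src_mac, time.monotonic())

    def lookup(dst_ip):
        entry = arp_cache.get(dst_ip)
        if entry is None:
            return None            # caller must ARP (and ideally queue the packet)
        mac, last_seen = entry
        if time.monotonic() - last_seen > ARP_TIMEOUT_S:
            del arp_cache[dst_ip]  # aged out: no traffic seen for the full period
            return None
        return mac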
But now we're using our vendor's IP stack, and we're back to the stupid ARP timeouts. We have enough leverage with this vendor that I could perhaps get them to use a less inconvenient scheme. However, the universality of this brain-dead timeout algorithm leads me to believe it might be a required part of the implementation.
So that's the question. Is this behavior somehow required?

RFC1122 Requirements for Internet Hosts discusses this.
2.3.2.1 ARP Cache Validation
An implementation of the Address Resolution Protocol (ARP)
[LINK:2] MUST provide a mechanism to flush out-of-date cache
entries. If this mechanism involves a timeout, it SHOULD be
possible to configure the timeout value.
...
DISCUSSION:
The ARP specification [LINK:2] suggests but does not
require a timeout mechanism to invalidate cache entries
when hosts change their Ethernet addresses. The
prevalence of proxy ARP (see Section 2.4 of [INTRO:2])
has significantly increased the likelihood that cache
entries in hosts will become invalid, and therefore
some ARP-cache invalidation mechanism is now required
for hosts. Even in the absence of proxy ARP, a long-
period cache timeout is useful in order to
automatically correct any bad ARP data that might have
been cached.
Networks can be very dynamic; DHCP servers can assign the same IP address to different computers when old lease times expire (making current ARP data invalid), there can be IP conflicts that will never be noticed unless ARP requests are periodically made, etc.
It also provides a mechanism for checking if a host is still on the network. Imagine you're streaming a video over UDP to some IP address 192.168.0.5. If you cache the MAC address of that machine forever, you'll just keep spamming out UDP packets even if the host goes down. Doing an ARP request every now and then will stop the stream with a destination unreachable error because no one responded with a MAC for that IP.
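As a rough illustration of that last point (the address and port below are placeholders, and the exact errno and timing depend on the OS), a connected UDP sender will eventually see the failure as a send error once ARP stops resolving:

    import socket
    import time

    # Hypothetical receiver; on Linux a failed ARP resolution typically
    # surfaces as OSError (e.g. EHOSTUNREACH) on a later send from a
    # connected UDP socket.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.connect(("192.168.0.5", 5004))
    while True:
        try:
            sock.send(b"video frame")
        except OSError as exc:
            print("stream stopped:", exc)
            break
        time.sleep(0.02)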

It originated in distrust of routing protocols, particularly in the non-Ethernet world (especially MIT's CHAOS networks). Chris Moon, one of the early "ARPAnauts", was quoted specifically about this in the original ARP RFC.
You can, of course, keep the other guys' ARP caches from timing out by proactively broadcasting your own ARP announcements. Most Ethernet layers will accept gratuitous ARP responses into their caches without trying to correlate them to ARP requests they have previously sent.
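For example, a sketch with scapy (assuming scapy is installed and you can send raw frames; the IP, MAC, and interface below are placeholders) that broadcasts a gratuitous ARP reply most stacks will happily cache:

    from scapy.all import ARP, Ether, sendp

    MY_IP = "192.168.0.42"        # placeholder
    MY_MAC = "02:00:00:aa:bb:cc"  # placeholder

    # Gratuitous ARP: an unsolicited reply announcing our own IP/MAC binding,
    # broadcast so every host on the segment can refresh its cache entry.
    frame = Ether(dst="ff:ff:ff:ff:ff:ff", src=MY_MAC) / ARP(
        op=2, psrc=MY_IP, hwsrc=MY_MAC, pdst=MY_IP, hwdst="ff:ff:ff:ff:ff:ff")
    sendp(frame, iface="eth0")    # placeholder interface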

Related

What causes a SOAP service to keep disconnecting TLS clients after responding to a single message?

I loaded a client-side .svclog file inside Microsoft Service Trace Viewer and there are a lot of entries in the log saying setting up secure session and close secure session. On the server side, I can see many instances of trust/RST/SCT/Cancel, indicating that the connections are being closed on the server side, but only after giving a response to a SOAP message. It seems like every web service call involves setting up a TLS session for SOAP, and then the connection being closed immediately after sending a response, requiring that TLS be set up again for the very next call.
I read this article: https://blogs.technet.microsoft.com/tspring/2015/02/23/poor-mans-guide-to-troubleshooting-tls-failures/
It said:
Keep in mind that TCP resets should always be expected at some point as the client closes out the session to the server. However, if there are a high volume of TCP resets with little or no “Application Data” (traffic which contains the encapsulated encrypted data between client and server) then you likely have a problem. Particularly if the server side is resetting the connection as opposed to the client.
Unfortunately, the article doesn't expand on this, because it is exactly what I am seeing!
This is a net.tcp web service installed in some customer environment, set up to use Windows authentication.
What's the next step in my diagnosis?
Most likely the behavior you are seeing is normal, and unless you are experiencing some problems I would not be concerned. The MSFT document you quote is referring to TCP resets, but you said your logs show trust/RST/SCT/Cancel entries, and in that context RST means RequestSecurityToken. In other words, your log messages don't in any way imply that there are TCP reset (RST) frames occurring.
The Web Services Secure Conversation Language (WS-SecureConversation) spec (here) says:
It is not uncommon for a requestor to be done with a security context token before it expires. In such cases the requestor can explicitly cancel the security context using this specialized binding based on the WS-Trust Cancel binding. The following Action URIs are used with this binding:
http://schemas.xmlsoap.org/ws/2005/02/trust/RST/SCT/Cancel
http://schemas.xmlsoap.org/ws/2005/02/trust/RSTR/SCT/Cancel
Once a security context has been cancelled it MUST NOT be allowed for authentication or authorization or allow renewal. Proof of possession of the key associated with the security context MUST be proven in order for the context to be cancelled.
If you actually are experiencing transport problems due to unexpected TCP RST frames, or if you are seeing them and are curious to understand their underlying cause, then you'll need to capture network traffic to see how and why TCP resets are occurring, and whether they are normal or abnormal.
I'd do that by firing up WireShark and looking at the frames. If you see FIN, ACK exchanges from each side, the connection is being closed gracefully after a waiting period. Otherwise you'll see RST frames for a variety of reasons: application resets (performed to avoid tying up a lot of ports in wait states), a bad sequence number when re-accessing a port that's in a wait state, router or firewall RST messages (typically sent in both directions), retransmit timeouts, port-choice RST messages, and others.
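If it helps, a couple of standard WireShark display filters to narrow things down (the address is a placeholder; substitute your server's IP):

    tcp.flags.reset == 1 && ip.addr == 10.0.0.1     (only frames carrying a TCP RST to/from the server)
    tcp.flags.fin == 1 || tcp.flags.reset == 1      (graceful closes and resets together)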
There are lots of resources to help with TCP traffic analysis. You might find it helpful to take a look at https://blogs.technet.microsoft.com/networking/2009/08/12/where-do-resets-come-from-no-the-stork-does-not-bring-them/ for a quick overview.
If you're not familiar with WireShark it can seem a little complicated, but the thing you want to do here is very simple and you can get your answer very quickly even with no prior experience. Just search for wireshark tutorials and you'll find one that fits your cognitive style.
You can also use WireShark to troubleshoot higher level protocols, including TLS. You can find information about that in many places. I'll just list a few to get you started:
WireShark documentation on SSL is here.
Wikiversity section on HTTPS is here.
A 5-minute youtube tutorial for looking at SSL traffic is here.
I believe this covers your next diagnostic step reasonably well, but if not, feel free to post more information and I can try to provide a better answer.

Extra TCP connections on the RabbitMQ server after resource alarm

I have RabbitMQ Server 3.6.0 installed on Windows (I know it's time to upgrade, I've already done that on the other server node).
Heartbeats are enabled on both server and client side (heartbeat interval 60s).
I had a resource alarm (RAM limit), and after that I observed a rise in the number of TCP connections to the RMQ server.
At the moment there are 18000 connections, while the normal amount is 6000.
Via the management plugin I can see there are a lot of connections with 0 channels, while our "normal" connections have at least 1 channel.
Even restarting the RMQ server doesn't help: all the connections re-establish.
   1. Does that mean all of them are really alive?
A similar issue was described here https://github.com/rabbitmq/rabbitmq-server/issues/384, but as far as I can see it was fixed in v3.6.0.
   2. Do I understand right that before RMQ Server v3.6.0 the behavior after a resource alarm was like this: several TCP connections could hang on the server side per 1 real client auto-recovery connection?
Maybe important: we have haProxy between the server and the clients.
   3. Could haProxy be an explanation for these extra connections? Maybe it prevents the clients from receiving a signal that the connection was closed due to the resource alarm?
Are all of them alive?
Only you can answer this, but I would ask - how is it that you are ending up with many thousands of connections? Really, you should only create one connection per logical process. So if you really have 6,000 logical processes connecting to the server, that might be a reason for that many connections, but in my opinion, you're well beyond reasonable design limits even in that case.
To check, see how many connections decrease when you kill one of your logical processes.
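If your clients happen to be in Python, a minimal sketch of the "one connection per process, many channels" pattern with pika (1.x API assumed; the host and queue names are placeholders) looks like this:

    import pika

    # One long-lived connection per process (placeholder host; heartbeat
    # matches the 60s interval mentioned in the question).
    params = pika.ConnectionParameters(host="rabbitmq.example.com", heartbeat=60)
    connection = pika.BlockingConnection(params)

    # Cheap, multiplexed channels for the different logical tasks,
    # instead of opening a new TCP connection for each one.
    publish_channel = connection.channel()
    consume_channel = connection.channel()

    publish_channel.queue_declare(queue="work")   # placeholder queue
    publish_channel.basic_publish(exchange="",
                                  routing_key="work",
                                  body=b"hello")
    connection.close()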
Do I understand right that before RMQ Server v3.6.0 the behavior after resource alarm was like that: several TCP connections could hang on server side per 1 real client autorecovery connection?
As far as I can tell, yes. It looks like the developer in this case ran across a common problem with sockets: detecting dropped connections. If I had a dollar for every time someone misunderstood how TCP works, I'd have more money than Bezos. What they found is that someone had made some bad assumptions, when in fact a read or write is required to detect a dead socket, and the developer wrote code to (attempt to) handle it properly. It is important to note that this does not look like a very comprehensive fix, so if the conceptual design problem had been introduced to another part of the code, then this bug might still be around in some form. Searching for bug reports might give you a more detailed answer, or asking someone on that support list.
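For what it's worth, a minimal sketch of that underlying point (Linux-specific option names are guarded, and the values are arbitrary): silence on a TCP socket proves nothing, so either application traffic (RabbitMQ heartbeats) or TCP keepalive has to generate the read/write that exposes a dead peer.

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Without any traffic, a vanished peer is never noticed. Keepalive makes
    # the kernel generate the probes that eventually error the socket out.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):  # Linux-only knobs
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30)   # idle seconds before probing
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # failed probes before declaring dead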
Could haProxy be an explanation for this extra connections?
That depends. In theory, haProxy is just a pass-through. For a connection to be recognized by the broker, it has to go through a handshake, which is a deliberate process and cannot happen inadvertently. Closing a connection also requires a handshake, which is where haProxy might be the culprit. If haProxy thinks the connection is dead and drops it without that process, then it could be a contributing cause. But it is not in and of itself making these new connections.
The RabbitMQ team monitors this mailing list and only sometimes answers questions on StackOverflow.
I recommended that this user upgrade from Erlang 18, which has known TCP connection issues -
https://groups.google.com/d/msg/rabbitmq-users/R3700QdIVJs/taDYKI6bAgAJ
I've managed to reproduce the problem: in the end it was a bug in the way our client used RMQ connections.
It created 1 auto-recovery connection (that's all fine with that) and sometimes it created a separate simple connection for "temporary" purposes.
Steps to reproduce my problem were:
   1. Reach the memory alarm in RabbitMQ (e.g. set up an easily reached RAM limit and push a lot of big messages). Connections would be in state "blocking".
   2. Start sending a message from our client over this new "temp" connection.
   3. Ensure the connection is in state "blocked".
   4. Without eliminating the resource alarm, restart the RabbitMQ node.
The "temp" connection was still there! Despite the fact that auto-recovery was not enabled for it. And it continued sending heartbeats, so the server didn't close it.
We will fix the client to always use one and only one connection.
Plus, of course, we will upgrade Erlang.

Fragmented UDP packet loss?

We have an application doing UDP broadcast.
The packets are mostly larger than the MTU, so they get fragmented.
tcpdump says the packets are all being received, but the application doesn't get them all.
None of this happens if the MTU is set large enough that there is no fragmentation. (This is our workaround right now - but Germans don't like workarounds.)
So it looks like fragmentation is the problem.
But I am not able to understand why and where the packets get lost.
The app developers say they can see the loss of the packets right at the socket they are picking them up. So their application isn't losing the packets.
My questions are:
Where in the Linux network stack does tcpdump monitor the device?
Are the packets there already reassembled or is this done later?
How can I debug this issue further?
tcpdump uses libpcap which gets copies of packets very early in the Linux network stack. IP fragment reassembly in the Linux network stack would happen after libpcap (and therefore after tcpdump). Save the pcap and view with Wireshark; it will have better analysis features and will help you find any missing IP fragments (if there are any).
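One hedged suggestion for debugging further: if the kernel is dropping fragments during reassembly, the IP counters should show it. Something like this Linux-only sketch prints the reassembly statistics from /proc/net/snmp (netstat -s reports the same numbers in prose):

    # Linux-only sketch: print the kernel's IP reassembly counters.
    # Rising ReasmFails while your app misses datagrams points at the
    # reassembly step (timeouts or the ipfrag_* memory limits).
    with open("/proc/net/snmp") as f:
        lines = [l.split() for l in f if l.startswith("Ip:")]
    header, values = lines[0][1:], lines[1][1:]
    stats = dict(zip(header, values))
    for key in ("ReasmReqds", "ReasmOKs", "ReasmFails", "ReasmTimeout"):
        print(key, stats.get(key))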

The Node.js event loop - nginx/apache

Both nginx and Node.js have event loops to handle requests. I put nginx in front of Node.js as has been recommended here
Using Node.js only vs. using Node.js with Apache/Nginx
with the setup shown here
Node.js + Nginx - What now?
How do the two event loops play together? Is there any risk of conflicts between the two? I wonder because Nginx may not be able to handle as many events per second as Node.js or vice versa. For example, if Nginx can handle 1000 events per second but node.js only 500, won't that cause issues? (I have no idea if 1000,500 are reasonable orders of magnitude, you could correct me on that.)
What about putting Apache in front of Node.js? Apache has no event loop. Just threads. So won't putting Apache in front of Node.js defeat the purpose?
In this 2010 talk, Node.js creator Ryan Dahl had a vision of getting rid of nginx/apache/whatever entirely and making Node talk directly to the internet. When do you think this will be a reality?
Both nginx and Node use an asynchronous and event-driven approach. The communication between them will go more or less like this:
nginx receives a request
nginx forwards the request to the Node process and immediately goes back to wait for more requests
Node receives the request from nginx
Node handles the request with minimal CPU usage, until at some point it needs to issue one or more I/O requests (read from a database, write the response, etc). At this point it launches all these I/O requests and goes back to wait for more requests.
The above can repeat lots of times. You could have hundreds of thousands of requests all in a non-blocking wait state where nginx is waiting for Node and Node is waiting for I/O. And while this happens both nginx and Node are ready to accept even more requests!
Eventually async I/O started by the Node process will complete and a callback function will get invoked.
If there are still I/O requests that haven't completed for this request, then Node goes back to its loop one more time. It can also happen that once an I/O operation completes this data is consumed by the Node callback and then new I/O needs to happen, so Node can start more async I/O requests before going back to the loop.
Eventually all I/O operations started by Node for a particular request will be complete, including those that write the response back to nginx. So Node ends this request, and then as always goes back to its loop.
nginx receives an event indicating that response data has arrived for a request, so it takes that data and writes it back to the client, once again in a non-blocking fashion. When the response has been written to the client, an event will trigger and nginx will then end the request.
You are asking about what would happen if nginx and Node can handle a different number of maximum connections. They really don't have a maximum; the maximum in general comes from operating system configuration, for example from the maximum number of open handles the system can have at a time, or from CPU throughput. So your question does not really apply. If the system is configured correctly and all processes are I/O bound, neither nginx nor Node will ever block.
Putting Apache in front of Node will only work well if you can guarantee that your Apache never blocks (i.e. it never reaches its maximum connection limit). This is hard/impossible to achieve for a large number of connections, because Apache uses an individual process or thread for each connection. nginx and Node scale really well; Apache does not.
Running Node without another server in front works fine and it should be okay for small/medium load sites. The reason putting a web server in front of it is preferred is that web servers like nginx come with features that Node does not have and you would need to implement yourself. Things like caching, load balancing, running multiple apps from the same server, etc.
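For reference, a minimal sketch of that setup as nginx configuration (the domain, paths, and port are placeholders, so adjust to your deployment): nginx serves the static files itself and proxies everything else to the Node process.

    server {
        listen 80;
        server_name example.com;               # placeholder

        location /static/ {
            root /var/www/myapp;               # nginx serves static files directly
        }

        location / {
            proxy_pass http://127.0.0.1:3000;  # the Node.js process
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;   # websocket support
            proxy_set_header Connection "upgrade";
            proxy_set_header Host $host;
        }
    }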
I think your questions have been largely covered by some of the others answers, but there are a few pieces missing, and some that I disagree with, so here are mine:
The event loops are isolated from each other at the process level, but do interact. The issues you're most likely to encounter are around the configuration of nginx response buffers, chunked data, etc. but this is optimisation rather than error resolution.
As you point out, if you use Apache you're nullifying the benefit of using Node.js, i.e. massive concurrency and websockets. I wouldn't recommend doing that.
People are already using Node.js at the front of their stack. Searching for benchmarks returns some reasonable-looking results in Node's favour, so performance to my mind isn't an issue. However, there are still reasons to put Nginx in front of Node.
Security - Node has been given increasing scrutiny, but it's still young. You may not have problems here, but caution is often your friend.
Training - Ops staff that you hire will know how to manage Nginx, but the configuration and management of your custom Node app will only ever be understood by those people your developers successfully communicate it to. In some companies this is nobody.
Operational Flexibility - If you reach scale you might want to split out the serving of static content, purely to reduce the load on your app servers. You might want to split content amongst different domains and have it managed separately, or have different SSL or proxying behaviour for different domains or URL patterns. These are the things that are easy for Ops guys to configure in Nginx, but you'd have to code manually in a Node app.
The event loops are independent. Event loops are implemented at the application level, so neither cares what sort of architecture the other uses.
NodeJS is good at many things, but there are some places where it still falters. One example is serving static files. At the moment, Node.js performs fairly poorly at this, so having a dedicated web server for your static files greatly improves response time. Also, Node.js is still in its infancy, and has not been "tested and hardened" in matters of security like Apache or nginx.
It'll take a long time for people to consider fronting nodejs all by itself. The cluster module is a step in the right direction, but it'll take a long time even after it reaches v1 before it happens.
Both event loops are unrelated. They don't play together.
Yes, it is pretty useless. Apache is not a load balancer.
What Ryan Dahl said may be applicable already. The limit of concurrent users is definitely higher than that of Apache. Before Node.js, websites with a fair number of concurrent users had to use nginx to balance the load. For small to medium sized businesses it can be done with Node.js alone. But ruling out nginx completely will take time. Let Node.js become stable before it can follow this ambitious dream.

RTI DDS subscriber not getting data from publisher

Short story: My DDS subscriber cannot see data from my DDS publisher. What am I missing?
Long story:
QNX 6.4.1 VM A -- Broken Publisher. IP ends with .113
QNX 6.4.1 VM B -- Working Publisher. IP ends with .114
Windows 7 -- Subscriber/Main Dev box. IP ends with .203
RTI DDS 5.0 -- Middleware version
I have a QNX VM (hosted on the network, not on my machine) that is publishing some data via RTI DDS. The data never shows up in my Windows 7 subscriber application.
Interestingly enough, I can put the same code on VM B, and the subscriber gets data. Thinking this must be a Windows 7 firewall issue, I swapped VM A's IP address with VM B's, but this did not help.
Using Wireshark, I can see some heartbeat traffic from VM A, but no data. From VM B, I see the heartbeat traffic and the data. Below is a sanitized Wireshark snippet.
NDDS_DISCOVERY_PEERS is set to include both multicast and the explicit IP address of the other side of each conversation. The QOS profiles are the same, and the RTI Analyzer indicates the Match Analysis was successful (all green).
VM A:
NDDS_DISCOVERY_PEERS=udpv4://239.255.0.1,udpv4://127.0.0.1,udpv4://BLAH.203
VM B:
NDDS_DISCOVERY_PEERS=udpv4://239.255.0.1,udpv4://127.0.0.1,udpv4://BLAH.203
Windows 7 box:
NDDS_DISCOVERY_PEERS=udpv4://239.255.0.1,udpv4://127.0.0.1,udpv4://BLAH.113,udpv4://BLAH.114
When I include them in the NDDS_DISCOVERY_PEERS line, other folks on the network can see DDS traffic from VM A with DDS SPY on their Windows 7 box. My Windows 7 box can not.
Windows 7 event log does not appear to show any firewall or WFP stopping the data packets.
RTI DDS Spy run from my Windows 7 machine shows that VM A (0A061071) writers are visible on the network, but no data is being received. It also shows that the readers in my subscriber on my Windows 7 machine are visible, though it shows up at an odd IP address.
Bonus question (out of curiosity only, NOT the primary question): why does traffic on my local machine show up in DDS SPY as 192.168.11.1 instead of my machine's IP or even 127.0.0.1?
Main Question: What am I missing?
Update:
route print on my Windows 7 box appears to show that I have joined a multicast group with VM A.
netsh interface ip show joins seemed to concur.
Investigation Update:
I rebooted the VM to no effect.
I rebooted the Windows box to no effect.
I removed the multicast from the NDDS_DISCOVERY_PEERS environment variables on both sides to no effect.
The Windows 7 box has three network interfaces (plus loopback): 1 LAN connection and 2 (unrelated) VM adapters. We are working with the LAN connection. The QNX VM has one network interface (plus loopback). Note that the working VM and the broken VM use a different ethernet driver than each other, as they are slightly different flavors of QNX 6.4.1. The broken one has wm0 as the interface, and the working one has en0 as the interface. I don't believe this is the issue, but it is a difference.
I ran DDS SPY on the "broken" QNX VM while it was playing back, and I got DDS data. I don't have a good method to sniff the network between where the VM is hosted and my Windows 7 machine to see if it makes it out of the interface, but looking at the transmitted packet count out of the ethernet interface on the QNX VM indicates that it is definitely transmitting something, and the Wireshark captures on the Windows 7 machine itself show that at least some traffic is making it through.
Other folks on the LAN here can see the DDS traffic from the "broken" VM, which leads me to believe it is a Windows setup issue, rather than a broken VM--I just can't see what it could be. I've re-checked the firewall, to no avail. I would have thought that if it were a firewall issue, the problem would have gone away when I swapped IP addresses between VM A and VM B. In any case, the Windows 7 firewall is currently off, to no avail.
Below are several screens of Wireshark output. I skipped a bunch between the third and the fourth, as after the fourth, the traffic tended to look like the bottom of the fourth until the end.
(Skipped a bunch here)
(Pretty much continues on like the last 11 lines above)
What else should I try?
Update:
To answer Rose's question below: running rtiddsping -publisher on the bad VM and rtiddsping -subscriber on the Windows box works appropriately.
I wonder if this issue is caused by the weird IP address. The IP address it happens to publish and somehow latch on to is a local VM ethernet adapter (separate from VM A). See the screenshot below.
The address I would like it to attach to is 10.6.6.203. Any way I can specify that?
More than a year later this happened to me again with a different virtual machine. I had it working yesterday, so I was very suspicious. I scoured all my code changes for the past 24 hours for issues, but didn't find any. Then I decided to see if IT had pushed any patches to my computer.
Guess what? The Windows Firewall had been aggressively updated since the day before. Rules were missing or changed, etc. The log said packets were being dropped. I opened up the firewall filters a bit, and suddenly everything worked again. I hesitate to close this issue, as I am not 100% sure this was exactly the same as last year, but it feels very similar. I suspect that last year the firewall settings were simply not logging the packet drops.
Long and short of it: if DDS suddenly stops working, check your firewall settings.
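For example, from an elevated prompt (the rule name is a placeholder, and the port range is an assumption based on RTPS discovery and user traffic defaulting to UDP ports starting at 7400; adjust for your domain IDs):

    netsh advfirewall show allprofiles
    netsh advfirewall firewall add rule name="RTI DDS (UDP 7400-7500)" dir=in action=allow protocol=UDP localport=7400-7500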
A couple of things to try:
Try running rtiddsping -publisher on the broken VM and rtiddsping -subscriber on Windows. This has two advantages:
The data type is small and well-known, so if there's some problem with the data being fragmented due to the different Ethernet drivers, it will not happen with rtiddsping, and may help track down the problem.
Rtiddsping prints out when the publisher and subscriber discover each other, so you will be able to confirm that discovery is completing correctly on both sides. I am guessing discovery is working, because Analyzer is showing both applications, but it is good to confirm.
If you see the same problem with rtiddsping that you see with your application, increase the verbosity to rtiddsping -verbosity 3, and then 5. At the highest verbosity level, this will print (a lot of) additional output, which may give us a hint about what is happening.
To answer your bonus question about spy: The reason why spy is showing that IP address is because that is one of the addresses that is being announced as part of discovery. During discovery, a DomainParticipant can announce up to four IP addresses that can be used to reach it. Spy will choose one of those to display, but it may not be the actual address that is being used to communicate with the application. If your machine does not have any interface with the 192.168.11.1 IP address, this could indicate a larger problem. (This may be normal, though - as long as the correct IP is one of the four that are announced.)
Looking through the packet trace images, there is nothing that is obviously the problem. A few things I notice:
There seems to be a normal pattern of heartbeats/ACKNACKs in the final packet trace image. This indicates that there is some bidirectional communication between the two applications.
It is difficult to tell from these images whether the DATA being sent from .113 to .203 consists of participant-to-participant messages, or real discovery messages - except for two packets: packet #805, and packet #816 (fragments 811-815) look like discovery announcements that are being sent to .203. This indicates that you have at least four entities (DataWriters or DataReaders) in your application on .113.
So, discovery data is being sent by the application on .113. It is being received and reassembled by WireShark, but that doesn't always mean it was received correctly by the application.
Packet #816 has a heartbeat on the end of it. It is possible that packet #818 or #819 might be the ACKNACK that is responding to that heartbeat, but I can't be sure from the image. The next step is to look at those ACKNACKs from .203 to .113 to see if .203 thinks it has received all the discovery data. Here is an example of a HB/ACKNACK pair where a discovery DataReader has received all data:
Submessage: HEARTBEAT
...
firstSeqNumber: 1
lastSeqNumber: 1
The heartbeat sequence number is 1, which indicates it has only sent an announcement about a single DataReader.
Submessage: ACKNACK
...
readerSNState: 2/0:
bitmapBase: 2
numBits: 0
The readerSNState is 2/0, meaning it has received everything before sequence number two, and there is nothing missing. If there is something other than a 0 in the bitmap, it indicates the DataReader did not receive some data.
If you can confirm that the application is receiving all the discovery data correctly, it will be helpful if you can use a WireShark filter to show only user data, since the images aren't highlighting discovery vs. user data.
WireShark filter for just rtps2 user data:
rtps2 && (rtps2.traffic_nature == 3 || rtps2.traffic_nature == 1)
We had a similar issue with this. Here is the environment in a very summarized way:
A publisher
A working subscriber (laptop)
A non-working subscriber (desktop)
Both subscribers ran exactly the same software (the desktop was a clone of the laptop, made with Clonezilla), but rtiddsspy was blind from the desktop's point of view; the opposite direction worked well, however: the publisher machine's rtiddsspy saw the desktop. The laptop and the publisher machine always worked well together, and so did the laptop and the desktop (they saw each other's subscriptions).
The workaround for this (based on https://community.rti.com/content/forum-topic/discovery-issues) was to increase the MTU on the desktop NIC. Don't ask me why, but it worked.
EDIT: At the beginning, the MTU in the publisher was set to a higher value than the subscriber. So, we changed the MTU in the subscriber to match the publisher's.