Inaccurate packet counter in Open vSwitch - OpenFlow

I attempted to send a file from host A to B and capture the packet loss using Open vSwitch. I connected hosts A and B to separate Open vSwitch VMs and connected the two Open vSwitch VMs to each other. The topology looks like this:
A -- OVS_A -- OVS_B -- B
On each Open vSwitch VM, I added two very simple flows using the commands below:
ovs-ofctl add-flow br0 in_port=1,actions=output:2
ovs-ofctl add-flow br0 in_port=2,actions=output:1
Then I sent a 10GB file between A and B and compared the packet counts of the egress flow on the sending switch with those of the ingress flow on the receiving switch. I found that the packet count on the receiving switch was much larger than the count on the sending switch, indicating that more packets were received than were sent!
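For reference, the per-flow counters compared here can be read on each switch with ovs-ofctl; the n_packets/n_bytes values below are only illustrative:
$ ovs-ofctl dump-flows br0
 cookie=0x0, duration=132.4s, table=0, n_packets=84312, n_bytes=127012345, in_port=1 actions=output:2
 cookie=0x0, duration=128.1s, table=0, n_packets=80216, n_bytes=6110234, in_port=2 actions=output:1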
I tried matching more specific flows, e.g. a TCP flow from IP A.A.A.A to B.B.B.B on port C, and got the same result. Is there anything wrong with my settings? Or is this a known bug in Open vSwitch? Any ideas?
BTW, is there any other way to passively capture the packet loss rate? Meaning measuring the loss rate without introducing any intrusive test flows, but simply using statistics available on the sending/receiving ends or switches.
Thanks in advance!

I just realized that it was not Open vSwitch's fault. I tested with a UDP stream and the packet count was correct. I also used tcpdump to capture inbound TCP packets on the switches, and the switch at the receiving end saw more packets than the switch at the sending end. The result is consistent with what Open vSwitch's flow counters captured. I guess I must have missed something important about TCP.
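For anyone hitting the same thing: a common cause of this kind of TCP-only counter mismatch (my assumption, not something confirmed in this setup) is segmentation offload. With TSO/GSO on the sending side and GRO on the receiving side, one hop can see a few large aggregated segments while the other sees many MTU-sized packets, so per-packet counters on the two switches will not match even with zero loss. The offload settings can be inspected and, for testing, disabled with ethtool:
# Show current offload settings on the interface (interface name is an example).
ethtool -k eth0 | grep -E 'segmentation|generic-receive-offload|large-receive-offload'
# Temporarily turn the offloads off to see whether the counters line up.
ethtool -K eth0 tso off gso off gro off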

Related

Delay of incoming network package on Linux - How to analyse?

The problem is: sometimes tcpdump sees that the reception of a UDP packet is held back until the next incoming UDP packet, although the network tap device shows it going through the cable without delay.
Scenario: My Profinet stack on Linux (located in user space) has a cyclic connection on which it receives and sends Profinet protocol packets every 4 ms (via raw sockets). About every 30 ms it also receives UDP packets in another thread on a UDP socket and replies to them immediately, according to that protocol. CPU load is around 10%. Sometimes such received UDP packets seem to be stuck in the network driver. After 2 seconds the next UDP packet comes in, and both the missed UDP packet and that next one are received. There are no dropped packets.
My measuring:
I use tcpdump -i eth0 --time-stamp-precision=nano --time-stamp-type=adapter_unsynced -w /tmp/tcpdump.pcap to record the UDP traffic to a RAM disk file.
At the same time I use a network tap device to record the traffic.
Question:
How to find out where the delay comes from (or is it a known effect)?
(2. What does the timestamp that tcpdump sets on each packet tell me? I mean, which OSI layer does it refer to; in other words, when is it taken?)
Topology: "embedded device with Linux and eth0" <---> tap-device <---> PLC. The program "tcpdump" is running on the embedded device. The tap device is listening on the cable. The actual Profinet connection is between the PLC and the embedded device. A PC is connected to the tap device to record what it is listening to.
Wireshark (via tap and tcpdump): see here (packet no. 3189 in tcpdump.pcap)
It was a bug in the Freescale Fast Ethernet driver (fec_main.c), which NXP's support has now fixed.
The actual answer (to the question "How to find out where the delay comes from?") is: one has to build a Linux kernel with tracing enabled, patch the driver code with trace points, and then analyse the resulting trace with the Linux developer tool trace-cmd. It's a very involved process, but I'm very happy it is fixed now:
trace-cmd record -o /tmp/trace.dat -p function -l fec_enet_interrupt -l fec_enet_rx_napi -e 'fec:fec_rx_tp' tcpdump -i eth0 --time-stamp-precision=nano --time-stamp-type=adapter_unsynced -w /tmp/tcpdump.pcap
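For anyone repeating this, the recorded trace can then be inspected offline (file name taken from the command above):
trace-cmd report -i /tmp/trace.dat | less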

Fragmented UDP packet loss?

We have an application doing UDP broadcast.
The packets are mostly larger than the MTU, so they get fragmented.
tcpdump says the packets are all being received, but the application doesn't get them all.
None of this happens if the MTU is set large enough that there is no fragmentation. (This is our workaround right now, but Germans don't like workarounds.)
So it looks like fragmentation is the problem.
But I am not able to understand why and where the packets get lost.
The app developers say they can see the packet loss right at the socket they are reading from, so their application isn't losing the packets.
My questions are:
Where in the Linux stack does tcpdump monitor the device?
Are the packets already reassembled at that point, or is this done later?
How can I debug this issue further?
tcpdump uses libpcap which gets copies of packets very early in the Linux network stack. IP fragment reassembly in the Linux network stack would happen after libpcap (and therefore after tcpdump). Save the pcap and view with Wireshark; it will have better analysis features and will help you find any missing IP fragments (if there are any).
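As an additional debugging step (my suggestion, not part of the original answer), the kernel's own reassembly and UDP socket counters can show whether fragments are being dropped before or after reassembly; the interface-independent counters live in /proc/net/snmp:
# Ip: columns include ReasmReqds, ReasmOKs and ReasmFails; compare them before and after reproducing the loss.
grep '^Ip:' /proc/net/snmp
# Udp: RcvbufErrors counts datagrams dropped because the application's socket buffer was full.
grep '^Udp:' /proc/net/snmp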

How to filter packets seen on unnumbered eth then dump raw filtered stream out another eth without using iptables

I can capture packets fine using tcpdump, as the source eth1 port is connected to a Cisco switch SPAN port, and I can filter using tcpdump options (at this stage I am only interested in DNS packets to and from a particular IP). Rather than writing to a file, I want to simply dump the filtered raw (DNS) packets onto eth2 (which could be unnumbered or numbered). The reason for this is that a third party needs access to the raw data, but I need to filter out non-DNS traffic (otherwise I'd just let them connect to the switch SPAN port).
Preferably I also want to run the process continuously. Is there an easy way to direct the tcpdump output to an unnumbered eth interface, or is there a better way of achieving this?
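One possible approach (a sketch only, not a tested answer): instead of piping tcpdump's output anywhere, the kernel's tc mirred action can mirror the matching packets from eth1 to eth2 continuously. The interface names, the IP 192.0.2.10 and port 53 below are placeholders:
# Attach an ingress qdisc to the capture interface.
tc qdisc add dev eth1 handle ffff: ingress
# Mirror DNS traffic coming from the IP of interest out of eth2.
tc filter add dev eth1 parent ffff: protocol ip u32 \
    match ip src 192.0.2.10/32 match ip sport 53 0xffff \
    action mirred egress mirror dev eth2
# Mirror DNS traffic going to the IP of interest out of eth2.
tc filter add dev eth1 parent ffff: protocol ip u32 \
    match ip dst 192.0.2.10/32 match ip dport 53 0xffff \
    action mirred egress mirror dev eth2
This stays entirely in the kernel and keeps running until the rules are removed, which covers the "run continuously" part; note that the u32 sport/dport matches assume packets without IP options.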

UDP Health Check

So we have an application that makes UDP calls and sends packets. However, since no responses are given for UDP calls, how can we ensure that the service is up, the port is open, and things are getting stored?
The only thought we have right now is to send in test packets and verify they are getting saved to the DB.
So my overall question is: is there a better, easier way to ensure that our UDP calls are succeeding?
On the listening host, you can validate that the port is open with netstat. For example, if your application uses UDP port 68, you could run:
# Grep for :<port> from netstat output.
$ netstat -lnu | grep :68
udp        0      0 0.0.0.0:68              0.0.0.0:*
You could also send some test data to your application, and then check your database to verify that the fixture data made it in. That doesn't mean it always will, just that it was working at the time of the test.
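For example, a throwaway test datagram can be pushed in from the shell with netcat (the host, port and payload below are placeholders) before checking the database:
# Send a single UDP test datagram and give up after one second.
echo "healthcheck-$(date +%s)" | nc -u -w1 app-host.example.com 68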
Ultimately, the problem is that UDP packets are best-effort, and not guaranteed. So unless you can configure your logging platform to send some sort of acknowledgment after data is received and/or written, then you can't guarantee anything. The very nature of UDP is that it leaves acknowledgments (if any) to the application layer.
We took a different approach and are checking to make sure the calls made it to the DB. It's easy enough to query a table and ensure records are in there. If there are no recent records, we know something is wrong. CodeGnome had a good idea, just not the route we went. Thanks!

Why is SNMP usually run over UDP and not TCP/IP?

This morning, there were big problems at work because an SNMP trap didn't "go through", since SNMP is run over UDP. I remember from the networking class in college that UDP doesn't guarantee delivery the way TCP/IP does. And Wikipedia says that SNMP can be run over TCP/IP, but UDP is more common.
I get that some of the advantages of UDP over TCP/IP are speed, broadcasting, and multicasting. But it seems to me that guaranteed delivery is more important for network monitoring than broadcasting ability, particularly when there are serious high-security needs. One of my coworkers told me that UDP packets are the first to be dropped when traffic gets heavy. That is yet another reason to prefer TCP/IP over UDP for network monitoring (IMO).
So why does SNMP use UDP? I can't figure it out and can't find a good reason on Google either.
UDP is actually expected to work better than TCP in lossy (or congested) networks. TCP is far better at transferring large quantities of data, but when the network fails, it's more likely that UDP will get through. (In fact, I recently did a study testing this, and it found that SNMP over UDP succeeded far better than SNMP over TCP in lossy networks when the UDP timeout was set properly.) Generally, TCP starts behaving poorly at about 5% packet loss and becomes completely useless at 33% (ish), while UDP will still succeed (eventually).
So the right thing to do, as always, is pick the right tool for the right job. If you're doing routine monitoring of lots of data, you might consider TCP. But be prepared to fall back to UDP for fixing problems. Most stacks these days can actually use both TCP and UDP.
As for sending TRAPs: yes, TRAPs are unreliable because they're not acknowledged. However, SNMP INFORMs are an acknowledged version of an SNMP TRAP. Thus, if you want to know that the notification receiver got the message, use INFORMs. Note that TCP does not solve this problem, as it only provides a transport-layer acknowledgement that the message was received; there is no assurance that the notification receiver application actually got it. SNMP INFORMs do application-level acknowledgement and are much more trustworthy than assuming a TCP ACK means the receiver got it.
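With the Net-SNMP command-line tools (assumed here; the community string, manager address and notification OID are placeholders), the difference is just a matter of which utility you call: snmpinform waits for the acknowledgement, snmptrap does not.
# Fire-and-forget v2c trap (no acknowledgement).
snmptrap -v 2c -c public manager.example.com '' SNMPv2-MIB::coldStart
# INFORM: retransmitted until the receiver acknowledges it (or the retries run out).
snmpinform -v 2c -c public manager.example.com '' SNMPv2-MIB::coldStart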
If systems sent SNMP traps via TCP, they could block waiting for the packets to be ACKed if there was a problem getting the traffic to the receiver. If a lot of traps were generated, this could use up the available sockets on the system and the system would lock up. With UDP that is not an issue, because it is stateless. A similar problem took out BitBucket in January, although it involved the syslog protocol rather than SNMP: they were inadvertently using syslog over TCP due to a configuration error, the syslog server went down, and all of the servers locked up waiting for the syslog server to ACK their packets. If SNMP traps were sent over TCP, a similar problem could occur.
http://blog.bitbucket.org/2012/01/12/follow-up-on-our-downtime-last-week/
Check out O'Reilly's writings on SNMP: https://library.oreilly.com/book/9780596008406/essential-snmp/18.xhtml
One advantage of using UDP for SNMP traps is that you can direct them to a broadcast address and then field them with multiple management stations on that subnet.
The use of traps with SNMP is considered unreliable. You really should not be relying on traps.
SNMP was designed to be used as a request/response protocol. The protocol details are simple (hence the name, "simple network management protocol"). And UDP is a very simple transport. Try implementing TCP on your basic agent - it's considerably more complex than a simple agent coded using UDP.
SNMP get/getnext operations have a retry mechanism: if a response is not received within the timeout, the same request is resent, up to a maximum number of tries.
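With the Net-SNMP tools, for instance, the timeout and retry count are plain command-line options (the values and target below are illustrative):
# Wait up to 2 seconds per attempt and retry 5 times before giving up.
snmpget -v 2c -c public -t 2 -r 5 router.example.com SNMPv2-MIB::sysUpTime.0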
Usually, when you're doing SNMP, you're on a company network; you're not doing this over the long haul. UDP can be more efficient. Let's look at (a gross oversimplification of) the conversation via TCP, then via UDP...
TCP version:
client sends SYN to server
server sends SYN/ACK to client
client sends ACK to server - socket is now established
client sends DATA to server
server sends ACK to client
server sends RESPONSE to client
client sends ACK to server
client sends FIN to server
server sends FIN/ACK to client
client sends ACK to server - socket is torn down
UDP version:
client sends request to server
server sends response to client
Generally, the UDP version succeeds, since it's on the same subnet or not far away (i.e. on the company network).
However, if there is a problem with either the initial request or the response, it's up to the app to decide what to do. (A) Can we get by with a missed packet? If so, who cares; just move on. (B) Do we need to make sure the message got through? Simple: redo the whole exchange (client sends request to server, server sends response to client). The application can include a sequence number so that if the recipient receives both messages, it knows it's really the same message being sent again.
This same technique is why DNS is done over UDP. It's much lighter weight and generally it works the first time because you are supposed to be near your DNS resolver.
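The same application-level knobs are visible in DNS tooling; dig, for example, lets you set the per-try timeout and the number of retries explicitly (the server address below is a placeholder):
# 2-second timeout per attempt, retry twice before giving up.
dig +time=2 +retry=2 example.com @192.0.2.53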