How do I target the MSS value in a TCP packet using BPF - iptables

I am learning BPF and converting some iptables rules to BPF bytecode. I am primarily using the nfbpf_compile application to do this, rather than trying to write C or assembler. It has mostly worked, but the syntax of one rule is escaping me.
I'd like to drop packets that have the SYN flag set and are also missing an MSS option. In iptables the MSS is targeted with --tcp-option 2. I know that the TCP options start at byte 20 of the TCP header and that MSS is 'kind' 2, so when MSS is the first option its value sits at byte 22. I am able to filter on the MSS value by using tcp[22:2]==$NUMBER in BPF syntax. However, what I want to do is target SYN packets where the MSS is missing entirely.
I have tried every variant of "null" I can think of but am having no luck.
Does anyone know the equivalent of iptables ! --tcp-option 2 in BPF syntax?
An example of something I have tried:
$ ./nfbpf_compile RAW 'tcp[22:2]==0x0' (I know this won't work; it's just an example)
12,48 0 0 0,84 0 0 240,21 0 8 64,48 0 0 9,21 0 6 6,40 0 0 6,69 4 0 8191,177 0 0 0,72 0 0 22,21 0 1 0,6 0 0 65535,6 0 0 0
# iptables -I INPUT -m bpf --bytecode '12,48 0 0 0,84 0 0 240,21 0 8 64,48 0 0 9,21 0 6 6,40 0 0 6,69 4 0 8191,177 0 0 0,72 0 0 22,21 0 1 0,6 0 0 65535,6 0 0 0' -j DROP

TL;DR If you know there are only 0 or 1 TCP options, or if you know the MSS option is always the first option, then you can use the following filter:
tcp && (tcp[tcpflags] == tcp-syn) && ((((tcp[12] & 0xf0) >> 2) < 21) || tcp[20] != 2)
If you don't know this (there are several TCP options and the MSS option may be any of them), which is generally the case, I don't think it's possible to express a matching filter with nfbpf_compile's syntax. In that case, I would recommend writing a C program and loading it with -m bpf --object-pinned /path/to/pinned/bpf.
Let me explain the above filter first. You have two cases to match: 1) there is no TCP option or 2) the first TCP option is not the MSS:
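tcp[12] & 0xf0 extracts the data offset field from the TCP header, i.e., the number of 32-bit words in the TCP header. The field occupies the upper 4 bits of byte 12.
(tcp[12] & 0xf0) >> 2 multiplies this by 4 to get the number of bytes: shifting right by 4 would give the word count and shifting left by 2 multiplies it by 4, so the two combine into a single right shift by 2.
If the TCP header is shorter than 21 bytes (i.e., it is exactly the 20-byte minimum), then you know there are no TCP options.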
tcp[20] != 2 checks that the Option-Kind field of the first TCP option (starts at offset 20) is not 2, the Option-Kind for MSS.
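To make the byte offsets concrete, here is a minimal Python sketch that evaluates the same conditions as the filter above on a raw TCP header (the bytes starting at the TCP header, without the IP header). It only illustrates the filter's logic and is not something iptables can load. The function name is hypothetical.

```python
TCP_SYN = 0x02
MSS_KIND = 2

def matches_simple_filter(tcp: bytes) -> bool:
    """Mirror of: tcp[tcpflags] == tcp-syn &&
    ((((tcp[12] & 0xf0) >> 2) < 21) || tcp[20] != 2)"""
    # tcp[tcpflags] is byte 13; the filter requires exactly SYN, no other flag
    if len(tcp) < 20 or tcp[13] != TCP_SYN:
        return False
    # Upper nibble of byte 12 = data offset in 32-bit words;
    # (x & 0xf0) >> 2 equals data_offset * 4, the header length in bytes
    header_len = (tcp[12] & 0xF0) >> 2
    if header_len < 21:
        return True                  # 20-byte header: no options at all
    if len(tcp) < 21:
        return False                 # a cBPF load past the end rejects the packet
    return tcp[20] != MSS_KIND       # only the FIRST option's kind is checked
```

Because the option offset is fixed at 20, a SYN that starts with a NOP and carries its MSS as the second option still matches (a false positive). That is the limitation of the general case.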
Why is the general case harder to match? TCP options have a variable length (depending on their Option-Kind) and there is a variable, bounded number of TCP options. Say you want to extend the above filter to match on the second TCP option. You first need to know where that option starts; the first option has a variable length so this is not a fixed offset.
With cBPF (the BPF bytecode emitted by nfbpf_compile), you might be able to express that by storing the current option offset in the X register and then loading a byte into register A with addressing mode 2, [x + k] (see the Linux documentation, BPF engine and instruction set). However, I do not think you can do this with the limited nfbpf_compile syntax (assuming it's the same syntax as tcpdump's).
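For comparison, here is a Python sketch of the option walk that the general case needs, assuming `tcp` holds the raw TCP header. A C program loaded through --object-pinned would need an equivalent loop. The TCP option space is at most 40 bytes, so the loop is bounded. The function name is hypothetical.

```python
TCPOPT_EOL, TCPOPT_NOP, TCPOPT_MSS = 0, 1, 2

def syn_without_mss(tcp: bytes) -> bool:
    """True for a pure SYN whose option list contains no MSS option."""
    if len(tcp) < 20 or tcp[13] != 0x02:
        return False
    header_len = (tcp[12] >> 4) * 4       # data offset in bytes
    end = min(header_len, len(tcp))
    off = 20                              # options start right after the fixed header
    while off < end:
        kind = tcp[off]
        if kind == TCPOPT_EOL:            # end of option list
            break
        if kind == TCPOPT_NOP:            # single-byte padding, no length byte
            off += 1
            continue
        if kind == TCPOPT_MSS:
            return False
        if off + 1 >= end:                # truncated: length byte missing
            break
        length = tcp[off + 1]             # all other kinds carry a length byte
        if length < 2:                    # malformed length would never advance
            break
        off += length                     # variable-length jump to the next option
    return True
```

Unlike the fixed-offset filter, this one also catches an MSS that follows a NOP or another option.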

Related

ZFS: Unable to expand pool after increasing disk size in vmware

I have a CentOS 7 VM with ZFS on Linux installed.
The VM has a disk /dev/sdb, that I've added to a pool named 'backup', and in this pool created a dataset.
Now, I wanted to increase the size of the disk in VMware, and then expand the size of the pool, but I'm not getting this to work.
I've tried 'zpool online -e backup sdb', but nothing changes.
I've tried running 'partprobe /dev/sdb' before and after the line above, but nothing changes.
I've tried rebooting + the above, nothing changes.
I've tried "parted /dev/sdb", resizing the partition (it suggests the actual new size of the volume), and then all of the above. But nothing changes.
I've tried 'zpool export backup' + 'zpool import backup' in various combinations with all of the above. No luck
And also: 'lsblk' and 'df -h' report the old/wrong size of /dev/sdb, even though parted seems to understand that it has been increased.
PS: autoexpand=on
What to do?
I faced a similar issue today and had to try a lot before finding the solution.
When I tried the known solutions (using zpool), setting autoexpand to on and re-running partprobe, the system would not auto-expand, even after a restart.
Finally, I could solve it using parted instead of getting into zpool at all.
We need to be careful here since wrong partition selections can cause data loss.
What worked for me in your situation
Step 1: Find the partition you are trying to expand. In my case it is partition 5, as seen below (the unallocated space comes after it). Use parted -l:
parted -l
Output
Model: VMware, VMware Virtual S (scsi)
Disk /dev/sda: 69.8GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name                  Flags
 1      1049kB  2097kB  1049kB                                     bios_grub
 2      2097kB  540MB   538MB   fat32        EFI System Partition  boot, esp
 3      540MB   2009MB  1469MB                                     swap
 4      2009MB  3592MB  1583MB  zfs
 5      3592MB  32.2GB  28.6GB  zfs
Step 2: Explicitly tell parted to expand partition 5 to 100% of the available space. Note that '5' is not fixed: use the number of the partition you want to expand, and double-check it. The general form is parted /dev/XXX resizepart YY 100%
parted /dev/sda resizepart 5 100%
After this, I was able to use the entire space in VM.
For reference:
lsblk before:
sda              8:0    0   65G  0 disk
├─sda1           8:1    0    1M  0 part
├─sda2           8:2    0  513M  0 part  /boot/grub
│                                        /boot/efi
├─sda3           8:3    0  1.4G  0 part
│ └─cryptoswap 253:1    0  1.4G  0 crypt [SWAP]
├─sda4           8:4    0  1.5G  0 part
└─sda5           8:5    0 29.5G  0 part
lsblk after:
sda              8:0    0   65G  0 disk
├─sda1           8:1    0    1M  0 part
├─sda2           8:2    0  513M  0 part  /boot/grub
│                                        /boot/efi
├─sda3           8:3    0  1.4G  0 part
│ └─cryptoswap 253:1    0  1.4G  0 crypt [SWAP]
├─sda4           8:4    0  1.5G  0 part
└─sda5           8:5    0 61.7G  0 part

How to set up the RSS hash function on the XL710 to receive the IPv4 flow type?

In DPDK, the ETH_RSS_IPV4 flow type is not enabled by default for the Intel XL710 NIC. So, when you want to distribute packets among lcores, you have to select other IPv4 flow types that the XL710 supports, namely ETH_RSS_FRAG_IPV4, ETH_RSS_NONFRAG_IPV4_TCP, ETH_RSS_NONFRAG_IPV4_UDP, ETH_RSS_NONFRAG_IPV4_SCTP, and ETH_RSS_NONFRAG_IPV4_OTHER. However, you will run into an annoying problem with fragmented IP packets. If you choose ETH_RSS_FRAG_IPV4 together with ETH_RSS_NONFRAG_IPV4_TCP, then the fragmented packets of a connection will fall into a different queue from the rest of the connection, because they don't have L4 port numbers. If you exclude ETH_RSS_NONFRAG_IPV4_TCP, then the ETH_RSS_FRAG_IPV4 hash function is not applied to non-fragmented packets, and those packets all go to queue 0. No other combination of hash functions works either. So, what should we do?
The behavior of XL710 is not compatible with the conventions in DPDK. So, you must directly work with the API offered by i40e driver in order to set up RSS for ETH_RSS_IPV4. As mentioned in the Intel® Ethernet Controller 710 Series Specification Update, page 18 (release Jan 2017):
Functions that require the Hash (RSS) filters on IPv4 packets should
set all IPv4 PCTYPEs in the PFQF_HENA / VFQF_HENA (PCTYPEs 31, 33…36)
Supported packet types (PCTYPE) are mentioned in Intel® Ethernet Controller 710 Series Datasheet pages 597 and 598 (release Jan 2017). You can see that there is no packet type defined for IPv4.
However, there is a solution. The key is to modify the input set for all required flow types (or packet types). Let's try it with the testpmd tool, which DPDK provides in the app folder. After compiling DPDK and the app, run the testpmd application:
./app/test-pmd/testpmd -c ff -n 2 -w 0a:00.0 -w 0a:00.1 -- -i --rxq=4 --txq=4
We have two XL710 ports in our system. The following commands configure the XL710 to support the IPv4 flow type the way you want:
port config all rss all
set_hash_input_set 0 ipv4-tcp src-ipv4 select
set_hash_input_set 0 ipv4-tcp dst-ipv4 add
set_hash_input_set 0 ipv4-udp src-ipv4 select
set_hash_input_set 0 ipv4-udp dst-ipv4 add
set_hash_input_set 1 ipv4-tcp src-ipv4 select
set_hash_input_set 1 ipv4-tcp dst-ipv4 add
set_hash_input_set 1 ipv4-udp src-ipv4 select
set_hash_input_set 1 ipv4-udp dst-ipv4 add
set_hash_global_config 0 default ipv4-frag enable
set_hash_global_config 0 default ipv4-tcp enable
set_hash_global_config 0 default ipv4-udp enable
set_hash_global_config 1 default ipv4-frag enable
set_hash_global_config 1 default ipv4-tcp enable
set_hash_global_config 1 default ipv4-udp enable
It selects the proper input set for TCP and UDP flow types by removing the L4 port section. The set_hash_global_config command enables the symmetric hash if you need it. By modifying the TCP input set, it behaves just like Frag IPv4 flow type and as a result all packets belonging to the same connection go to the same lcore.
Note that the default input set for the Frag IPv4 and Non-Frag IPv4 Other flow types is already IP4-S and IP4-D, so those don't need to be modified. Remember to modify the input sets of all the other IPv4 flow types, and their symmetric-hash setting, in the same way.
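To see why removing the L4 ports from the input set keeps a whole connection on one queue, here is a minimal Python sketch of an RSS-style Toeplitz hash over a configurable input set. Assumptions: the NIC uses the standard Toeplitz hash, the key is arbitrary (it is not what your XL710 is programmed with), the modulo stands in for the real redirection table, and the symmetric-hash option is not modeled. The function names are hypothetical.

```python
import ipaddress

# Arbitrary 40-byte example key; NOT the key your NIC is programmed with
EXAMPLE_KEY = bytes(range(1, 41))

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """For every set input bit, XOR in the 32-bit key window starting there."""
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 32:
        raise ValueError("key is too short for this input")
    k = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):      # input bit i, MSB first
            result ^= (k >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result

def rss_queue(src: str, dst: str, sport=None, dport=None,
              key: bytes = EXAMPLE_KEY, n_queues: int = 4) -> int:
    # Input set: source IPv4 + destination IPv4, optionally the L4 ports
    data = ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    if sport is not None and dport is not None:
        data += sport.to_bytes(2, "big") + dport.to_bytes(2, "big")
    # Simplification: a real NIC indexes a redirection table with the low bits
    return toeplitz_hash(key, data) % n_queues
```

With an IP-only input set, rss_queue(src, dst) is computed over the same bytes for a fragment and for a TCP segment of the same connection, so both land on the same queue. Once the ports are included for the TCP segment, the two can diverge.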
You can find the API functions of those commands by looking at the source code of the testpmd application.

LVS: All connections are InActConn

I'm a newbie using LVS. I've tried both LVS/TUN and LVS/DR, and the result is the same: all connections are InActConn. But the real servers can be reached (via ping). Please help!
OS: CentOS 6.2
     RemoteAddress:Port   Forward Weight ActiveConn InActConn
UDP  192.168.10.240:2345 rr
  -> 192.168.10.251:2345  Tunnel  1      0          10
  -> 192.168.10.252:2345  Tunnel  1      0          9
  -> 192.168.10.253:2345  Tunnel  1      0          9
This is the expected behavior for services that don't maintain connections, like UDP. You may want to read the LVS HOWTO, especially the part about active/inactive connections:
http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.ipvsadm.html#ActiveConn
This is an old question, but I got to this post from Google and want to share my findings here.
The link posted by #remi-ggacogne in the answer above misses one step for the real servers.
You have to turn off rp_filter (reverse path filtering), especially on CentOS/RHEL: https://www.slashroot.in/linux-kernel-rpfilter-settings-reverse-path-filtering
Open /etc/sysctl.conf and add the following lines (adjust the interface names to match your setup):
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.tunl0.rp_filter = 0
To apply the settings:
$ sysctl -p

size of ICMP type 11 packet payload

What's the size of the ICMP packet payload when the type is 11, i.e. time exceeded?
Since it contains an IP header and the first 8 Bytes of the IP packet payload generating the ICMP message, I thought its size was 20 + 8 = 28.
I'm replaying some common user traffic with TTL=1. In the ICMP messages I have dumped I noticed that:
all ICMP packets generated by UDP packets have payload of size 28 Bytes
all those generated by TCP packets have payload of size 40 Bytes
Since I need to match ICMP time-exceeded messages with the packets that triggered them by comparing those bytes, this information is essential, but I can't figure out why this happens.
The problem is that you're quoting the 8-byte header payload from RFC 792, Page 4, but the requirements were changed by RFC 1812...
Time Exceeded Message (in RFC 792)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |     Code      |          Checksum             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             unused                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Internet Header + 64 bits of Original Data Datagram      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
RFC 1812, Section 4.3.2.3 dramatically increases the allowable payload in an ICMP Error message (emphasis mine):
4.3.2.3 Original Message Header
Historically, every ICMP error message has included the Internet
header and at least the first 8 data bytes of the datagram that
triggered the error. This is no longer adequate, due to the use of
IP-in-IP tunneling and other technologies. Therefore, the ICMP
datagram SHOULD contain as much of the original datagram as possible
without the length of the ICMP datagram exceeding 576 bytes. The
returned IP header (and user data) MUST be identical to that which
was received, except that the router is not required to undo any
modifications to the IP header that are normally performed in
forwarding that were performed before the error was detected (e.g.,
decrementing the TTL, or updating options).
The ICMP Errors you're generating from Scapy packets should contain all the information from the IP and TCP layers of the original packet.
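Because the quoted length varies by router, one way to match errors to the packets you sent is to compare only the bytes that both copies contain. Here is a minimal Python sketch, assuming `sent` is the raw IP packet you transmitted and `quoted` is the ICMP error payload, starting at the quoted IP header. It masks the TTL and the IP header checksum, since RFC 1812 lets the router leave its own TTL decrement in place. IP options the router may have updated are not masked. The function name is hypothetical.

```python
def quote_matches(sent: bytes, quoted: bytes) -> bool:
    """Compare an ICMP error's quoted datagram with the packet we sent."""
    ihl = (sent[0] & 0x0F) * 4        # sender's IP header length in bytes
    n = min(len(sent), len(quoted))   # RFC 1812 routers may quote much more
    if n < ihl + 8:                   # shorter than the RFC 792 minimum
        return False
    masked = {8, 10, 11}              # TTL (byte 8) and header checksum (10-11)
    return all(sent[i] == quoted[i] for i in range(n) if i not in masked)
```

This works whether the router quoted 28 bytes or several hundred.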
As you noted, the ICMP payload is the IP header plus 8 octets of the original packet's payload. IP headers, however, are not always 20 octets long; 20 is only the minimum. The IP header itself may contain options, and the header length is indicated by the value in the IHL field of the header. See sec 3.1 of RFC 791. So it looks like the TCP packets have 12 additional octets of options in their IP headers. RFC 791 defines some standard options such as source routing and timestamping. You'll have to decode the header to determine what options are being used.
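Reading the header length from the IHL field is a one-liner. Here is a Python sketch with hypothetical function names:

```python
def ip_header_len(packet: bytes) -> int:
    """IHL is the low nibble of byte 0, counted in 32-bit words (5 -> 20 bytes)."""
    ihl = packet[0] & 0x0F
    if ihl < 5:
        raise ValueError("IHL below the 5-word minimum")
    return ihl * 4

def rfc792_quote_len(packet: bytes) -> int:
    """Minimum quote per RFC 792: the full IP header plus 8 data bytes."""
    return ip_header_len(packet) + 8
```

For a header with IHL = 8 (32 bytes, i.e. 12 bytes of options), the RFC 792 minimum quote is 40 bytes.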
I would like to add for future reference that not only do ICMP payloads vary in size as Mike said, they might also be longer than 128 Bytes in the case of ICMP extensions for MPLS. See this draft for more information

How to tell an IP address with 4 LEDs?

I am developing a net-managed device with the .NET Micro Framework. Since the idea is to have a bunch of devices in an office, sometimes it is necessary for the user to know the IP address of a specific device.
So I've been trying to come up with ideas on how to show the IP address to the user. The only user interface is 4 LED lights that I can blink on and off at varying speeds.
So far, the best idea I could come up with is this: seeing how the IP address has 4 parts and I have 4 LEDs, it would make sense that each LED be responsible for a single IP address part.
So for an address like 192.168.0.34, I'd have LED1 blink once, pause, blink 9 times, pause, then blink 2 times. The action would then shift to LED2, which would blink out 168 in a similar manner, and so on. The number 0 would be indicated by blinking really fast for half a second.
Any other ideas?
Use all 4 LEDs at once for each digit, showing it in binary. Blink all 4 really fast for a 0, and light all 4 for longer to denote a dot.
[ ] [ ] [ ] [x] # 1
[x] [ ] [ ] [x] # 9
[ ] [ ] [x] [ ] # 2
[x] [x] [x] [x] # . (long)
[ ] [ ] [ ] [x] # 1
[ ] [x] [x] [ ] # 6
[x] [ ] [ ] [ ] # 8
[x] [x] [x] [x] # . (long)
[x] [x] [x] [x] # 0 (short)
Alternatively, you can use an unused value (e.g. 10) to denote 0:
[ ] [ ] [ ] [x] # 1
[x] [ ] [ ] [x] # 9
[ ] [ ] [x] [ ] # 2
[x] [x] [x] [x] # .
[ ] [ ] [ ] [x] # 1
[ ] [x] [x] [ ] # 6
[x] [ ] [ ] [ ] # 8
[x] [x] [x] [x] # .
[x] [ ] [x] [ ] # 0
Having a lookup table ready by the device should be enough for those who don't know binary.
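Here is a short Python sketch of the "unused value for 0" variant, which turns an address into the sequence of 4-LED frames shown above (most significant LED first, all four on as the dot). The function names are hypothetical. On the device itself this would be C# for the .NET Micro Framework.

```python
SEPARATOR = (1, 1, 1, 1)   # all four LEDs on: the dot between octets

def digit_to_leds(d: int, zero_code: int = 10) -> tuple:
    """4-bit pattern for one decimal digit; 0 is sent as an unused code so it
    is not confused with 'all LEDs off'."""
    if not 10 <= zero_code <= 14:          # must not collide with 1-9 or the dot
        raise ValueError("zero_code must be an unused 4-bit value (10-14)")
    value = zero_code if d == 0 else d
    return tuple((value >> bit) & 1 for bit in (3, 2, 1, 0))

def ip_to_led_frames(ip: str, zero_code: int = 10) -> list:
    frames = []
    for i, octet in enumerate(ip.split(".")):
        if i:
            frames.append(SEPARATOR)
        frames.extend(digit_to_leds(int(ch), zero_code) for ch in octet)
    return frames
```

For 192.168.0.34 this yields 12 frames: 9 digits plus 3 dots.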
I'd do the reverse. From a control station, I would bring up a list of all IPs used by my devices. I'd then select one and make it blink in an easily recognizable pattern (like 1 2 3 4 over and over) until it is shut off. That way I could ask everybody whose LEDs are blinking like that and know which device owns that IP.
I'd then write the IP on the bottom of the device in magic marker. There's an amazing amount of bandwidth in a sharpie.
Provide a well-mounted cord for the user to swing the device around in the air like a lasso
Then flash the LEDs like a propeller clock
You might also consider binary, displaying a single digit at a time. But this would require the user to know (or take a crash course on) binary.
9: 1 0 0 1
8: 1 0 0 0
7: 0 1 1 1
6: 0 1 1 0
5: 0 1 0 1
4: 0 1 0 0
3: 0 0 1 1
2: 0 0 1 0
1: 0 0 0 1
0: 0 0 0 0
To indicate the decimal point, you could show 1 1 1 1. It would be ideal if you had a button or some form of user interaction so that you could iterate through the digits.
You could translate the number to HEX and print off the hex representation in binary.
F: 1 1 1 1
E: 1 1 1 0
D: 1 1 0 1
C: 1 1 0 0
B: 1 0 1 1
A: 1 0 1 0
9: 1 0 0 1
8: 1 0 0 0
7: 0 1 1 1
6: 0 1 1 0
5: 0 1 0 1
4: 0 1 0 0
3: 0 0 1 1
2: 0 0 1 0
1: 0 0 0 1
0: 0 0 0 0
192.168.0.34 becomes C0.A8.00.22. This is very similar to the solution put forth by #JYelton, just taken a step further to reduce the amount of work an individual needs to do to read the message from the LEDs. It still requires a bit of translation, though, because you have to convert from hex back to decimal (a standard calculator is an easy, handy tool for that).
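Here is a minimal Python sketch of both directions of the hex scheme: the conversion the device would do before blinking, and the back-translation a user would otherwise do with a calculator. The function names are hypothetical.

```python
def ip_to_hex(ip: str) -> str:
    """Dotted decimal -> dotted two-digit hex, e.g. 192.168.0.34 -> C0.A8.00.22."""
    return ".".join(f"{int(octet):02X}" for octet in ip.split("."))

def hex_digit_to_leds(digit: str) -> tuple:
    """One hex digit as four LED states, most significant LED first."""
    value = int(digit, 16)
    return tuple((value >> bit) & 1 for bit in (3, 2, 1, 0))

def hex_to_ip(hex_ip: str) -> str:
    """The reader's side: dotted hex back to dotted decimal."""
    return ".".join(str(int(part, 16)) for part in hex_ip.split("."))
```

Every address is exactly 8 hex digits, so the device always blinks 8 frames, e.g. [hex_digit_to_leds(c) for c in ip_to_hex(ip) if c != "."].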
I'm thinking outside the box, but one of the biggest complaints I see here is the translation. What about an app that takes a video (live or prerecorded) and does the interpretation? This reminds me of iPhone apps that can read UPC codes.
Alternatively, along the same lines, what about a parallel port or USB?
Why don't you get an external LCD screen... no teaching users binary and you can display loads more information. If you provide me with which micro framework device you are using I may be able to provide more detailed help.
LCDs - SparkFun <= good products and service
LCDs - Jameco
LCDs - Mouser
You could do it the way pulse dialing worked in the phone system. Basically, one blink is zero, and it counts up from there.
1 = ** (2 blinks)
9 = ********** (10 blinks)
2 = *** (3 blinks)
Long Blink
1 = ** (2 blinks)
6 = ******* (7 blinks)
8 = ********* (9 blinks)
Long Blink
0 = * (1 blink)
0 = * (1 blink)
1 = ** (2 blinks)
Long Blink
2 = *** (3 blinks)
4 = ***** (5 blinks)
1 = ** (2 blinks)
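The blink listing above spells out 192.168.001.241, with each octet padded to three digits. Here is a small Python sketch of that mapping (digit d becomes d + 1 blinks, and a long blink separates the groups). The function name is hypothetical.

```python
def pulse_counts(ip: str, width: int = 3) -> list:
    """Blinks per digit, one group per octet; octets are zero-padded to
    `width` digits so every group has the same length."""
    return [[int(ch) + 1 for ch in octet.zfill(width)] for octet in ip.split(".")]
```

Because a zero is still one blink, every digit is visible, and the user never has to notice the absence of blinks.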
Depending on how geeky your users are, you could also use:
Morse code
Display IP as a sequence of digits in binary
...
If it's DHCP, and users can see a list of the devices' IP addresses next to their MAC addresses on a computer, you could write the MAC address on each device; then they'd be able to tell which device has which IP.
If you think MAC addresses would be too user-unfriendly, you could keep a table mapping each MAC address to a short description or device name.
Going further, you could write a program that takes the list of IP addresses with their MAC addresses and matches it against the table of device names by MAC address.
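Here is a minimal Python sketch of that matching program, assuming the DHCP lease list has already been exported as a MAC-to-IP mapping (how you obtain it depends on your DHCP server). The names and sample data are hypothetical.

```python
def normalize_mac(mac: str) -> str:
    """Lower-case, colon-separated, so '00-1A-..' and '00:1a:..' compare equal."""
    return mac.strip().lower().replace("-", ":")

def label_devices(leases: dict, names: dict) -> dict:
    """leases: MAC -> IP (exported from the DHCP server)
    names:  MAC -> device name (the table you maintain)
    returns: device name -> current IP, or None if it has no lease"""
    ip_by_mac = {normalize_mac(mac): ip for mac, ip in leases.items()}
    return {name: ip_by_mac.get(normalize_mac(mac)) for mac, name in names.items()}
```

For example, label_devices({"00-1A-2B-3C-4D-5E": "192.168.0.34"}, {"00:1a:2b:3c:4d:5e": "Reception"}) gives {"Reception": "192.168.0.34"}.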
If you replace one of the leds with an IR led, you can write an app for a cell phone IR sensor that decodes and displays the binary pattern for the IP address.
What about broadcasting UDP packets and using a WinForms app to listen for them? If you have several of these devices, the following might work:
Open a Windows client that is listening on the correct port.
Reset the device or push a button on it to trigger the UDP broadcast.
Maybe combine this with flashing the LEDs on that unit for 1 minute.
The Windows client then receives the IP address and any other status information.
There may be an option here to set a unique ID (1-16 binary) in the device that is displayed on the device and broadcast with the IP Information. (??DIP Switches??)
This avoids having users interpret binary flashing of the LEDs.
So device 1010 shows its LEDs, and the output in the Windows app shows:
On, Off, On, Off = 192.168.0.150
If you wanted to get fancy, showing images of an LED on and off would be even better.
I'm in a similar situation and haven't tested these theories yet.
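Here is a rough Python sketch of the listening side (standing in for the WinForms client). The port number and the "ID=1010;IP=192.168.0.150" message format are hypothetical; use whatever your device firmware actually broadcasts.

```python
import socket

ANNOUNCE_PORT = 30303  # hypothetical port; use whatever your devices broadcast to

def parse_announcement(payload: bytes) -> tuple:
    """Parse a hypothetical 'ID=1010;IP=192.168.0.150' message."""
    fields = dict(item.split("=", 1) for item in payload.decode("ascii").split(";"))
    return fields["ID"], fields["IP"]

def led_text(device_id: str) -> str:
    """Render a binary device ID the way its LEDs look, e.g. 'On, Off, On, Off'."""
    return ", ".join("On" if bit == "1" else "Off" for bit in device_id)

def wait_for_device(port: int = ANNOUNCE_PORT, timeout: float = 60.0) -> str:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", port))        # all interfaces, so broadcasts arrive too
        sock.settimeout(timeout)     # raises socket.timeout if nothing arrives
        payload, _sender = sock.recvfrom(1024)
    device_id, ip = parse_announcement(payload)
    return f"{led_text(device_id)} = {ip}"
```

Calling wait_for_device() and then pressing the button on a device would print something like "On, Off, On, Off = 192.168.0.150".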
Well, does the IP address need to be interpreted by a machine or by a human? Your suggestion uses decimal digits, which is wonderful for humans but very complicated for computers to understand.
IP addresses are actually just 32-bit binary numbers. The IP 192.168.0.34 is seen by the router (and carried in every packet's header) as 11000000 10101000 00000000 00100010
If you're having a computer or other hardware device interpret the IP address, I suggest just using binary. You could have one light which displays the next digit, and another which toggles a "ready" light to indicate that the digit is in fact the next one and not a repetition of the previous one. This would only require 2 LEDs, and you would essentially display the aforementioned address like so:
on on,
on off,
off on,
off off,
off on,
off off,
off on,
off off,
on on,
off off,
on on,
off off,
on on,
off off,
off on,
off off,
etc.
Make sure the second bit has toggled before reading the first bit, otherwise you could read the same number twice.
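Here is a short Python sketch of the two-LED scheme that generates the (data, clock) pairs listed above: the data LED shows the current bit and the clock LED toggles on every new bit. The function name is hypothetical.

```python
def two_led_frames(ip: str) -> list:
    """(data, clock) LED states for each of the 32 address bits, MSB first."""
    frames, clock = [], 0
    for octet in ip.split("."):
        for bit in format(int(octet), "08b"):   # 8 bits per octet, zero-padded
            clock ^= 1                          # toggle so repeated bits stay distinguishable
            frames.append(("on" if bit == "1" else "off",
                           "on" if clock else "off"))
    return frames
```

A reader samples the data LED each time the clock LED changes state, which is the "make sure the second bit has toggled" rule.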
If you want to display it using four LEDs for human interpretation then having the LEDs blinks according to digit might be difficult since humans have trouble counting 4 numbers simultaneously. It may be easier if you just went through all the digits 1, 9, 2, 1, 6, 8, 0, 0, 0, 0, 3, 4 (3 digits per number) and displayed these in binary using all four LEDs.
off off off on,
on off off on,
off off on off,
etc.
With a pause in between each one.
Adding an LCD display would work really well, but would add a lot to the cost. However, what about using 8 LEDs instead of 4? If you purchase the 8 LEDs in the form of a 7-segment LED display with decimal point, it might not cost much more than 4 discrete LEDs, but it would let you display the decimal digits of the IP address sequentially. No complicated translation scheme for the users to master.
It depends on your environment, but I wouldn't display an entire IP address, just the component that is relevant, and I'd map that to a single 4-bit number. This assumes you only need to uniquely identify at most 2^4 = 16 devices. If you need more, just use more LEDs (if possible).
In this way, you'd only need to indicate a local mapping, which could then be used to look up the actual IP address on a local internal website. You can use the binary strategy already described in this thread to have the LEDs flash out a 4-bit number, and it should be pretty easy to train people on (with appropriate labeling on the device).