Boost ASIO iostream random delays on reading - boost-asio

I have a client talking to server with TCP via localhost. The server uses Boost ASIO iostream in blocking mode. It accepts the incoming connections, reads the request, sends response and closes the socket. The problem is - sometimes server have a random delay for 10-200 milliseconds on the first read via getline. I've set TCP_NODELAY flag on both server's and client's socket. What can be the reason for this delays? I know, that i should use select before reading from socket, but i expected that there shouldn't be such a great delay via localhost.
Here is the relevant part of server's code:
asio::io_service io_service;
ip::tcp::endpoint endpoint(bindAddress, 80);
ip::tcp::acceptor acceptor(io_service, endpoint);
for(;;)
{
ip::tcp::iostream stream;
acceptor.accept(*stream.rdbuf(), peer);
ip::tcp::no_delay no_delay(true);
stream.rdbuf()->set_option(no_delay);
string str;
getline(stream, str); // at this line i get random delays
//the main part of code
}
I have around 200 requests/second, delay happens several times per minute.
netstat -m shows, that there is enough buffers.
UPDATE:
It looks like the problem of client, not server: Apache HttpClient random delays under high requests/second

Answering this question for the sake of closing it out.
Apache HttpClient random delays under high requests/second
Apache's ab(1) also has "saw tooth"-like performance because it dispatches -c connections that it monitors via select(2), then once all connections have returned, it will dispatch another -c connections. The alternate (and better) approach would be to establish a new connection and readd the file-descriptor to ab(1)'s select(2) array to make sure -c connections are always active processing.
I've seen ab(1) give some very misleading results because one connection out of a thousand hung (still not a good thing, but it skews results very negatively when using it through a load balancer).

Related

UDP server and connected sockets

[edit]
Seems my question was asked nearly 10 years ago here...
Emulating accept() for UDP (timing-issue in setting up demultiplexed UDP sockets)
...with no clean and scalable solution. I think this could be solved handily by supporting listen() and accept() for UDP, just as connect() is now.
[/edit]
In a followup to this question...
Can you bind() and connect() both ends of a UDP connection
...is there any mechanism to simultaneously bind() and connect()?
The reason I ask is that a multi-threaded UDP server may wish to move a new "session" to its own descriptor for scalability purposes. The intent is to prevent the listener descriptor from becoming a bottleneck, similar to the rationale behind SO_REUSEPORT.
However, a bind() call with a new descriptor will take over the port from the listener descriptor until the connect() call is made. That provides a window of opportunity, albeit briefly, for ingress datagrams to get delivered to the new descriptor queue.
This window is also a problem for UDP servers wanting to employ DTLS. It's recoverable if the clients retry, but not having to would be preferable.
connect() on UDP does not provide connection demultiplexing.
connect() does two things:
Sets a default address for transmit functions that don't accept a destination address (send(), write(), etc)
Sets a filter on incoming datagrams.
It's important to note that the incoming filter simply discards datagrams that do not match. It does not forward them elsewhere. If there are multiple UDP sockets bound to the same address, some OSes will pick one (maybe random, maybe last created) for each datagram (demultiplexing is totally broken) and some will deliver all datagrams to all of them (demultiplexing succeeds but is incredibly inefficient). Both of these are "the wrong thing". Even an OS that lets you pick between the two behaviors via a socket option is still doing things differently from the way you wanted. The time between bind() and connect() is just the smallest piece of this puzzle of unwanted behavior.
To handle UDP with multiple peers, use a single socket in connectionless mode. To have multiple threads processing received packets in parallel, you can either
call recvfrom on multiple threads which process the data (this works because datagram sockets preserve message boundaries, you'd never do this with a stream socket such as TCP), or
call recvfrom on a single thread, which doesn't do any processing, just queues the message to the thread responsible for processing it.
Even if you had an OS that gave you an option for dispatching incoming UDP based on designated peer addresses (connection emulation), doing that dispatching inside the OS is still not going to be any more efficient than doing it in the server application, and a user-space dispatcher tuned for your traffic patterns is probably going to perform substantially better than a one-size-fits-all dispatcher provided by the OS.
For example, a DNS (DHCP) server is going to transact with a lot of different hosts, nearly all running on port 53 (67-68) at the remote end. So hashing based on the remote port would be useless, you need to hash on the host. Conversely, a cache server supporting a web application server cluster is going to transact with a handful of hosts, and a large number of different ports. Here hashing on remote port will be better.
Do the connection association yourself, don't use socket connection emulation.
The issue you described is the one I encountered some time ago doing TCP-like listen/accept mechanism for UDP.
In my case the solution (which turned out to be bad as I will describe later) was to create one UDP socket to receive any incoming datagrams and when one arrives making this particular socket connected to sender (via recvfrom() with MSG_PEEK and connect()) and returning it to new thread. Moreover, new not connected UDP socket was created for next incoming datagrams. This way the new thread (and dedicated socket) did recv() on the socket and was handling only this particular channel from now on, while the main one was waiting for new datagrams coming from other peers.
Everything had worked well until the incoming datagram rate was higher. The problem was that while the main socket was transitioning to connected state, it was buffering not one but a few more datagrams (coming from many peers) and thus thread created to handle the particular sender was reading in effect a few more datagrams not intended to it.
I could not find solution (e.g. creating new connected socket (instead connecting the main one) and pass the received datagram on main socket to its receive buffer for futher recv()). Eventually, I ended up with N threads, each one having one "listening" socket (with use of SO_REUSEPORT) with datagram scattering done on OS level.

What is the correct method to receive UDP data from several clients synchronously?

I have 1 server and several (maybe up to 20) clients. All clients are sending UDP datagram at random time. Each datagram is quite short (about 10B), but I must make sure all the data from each client is received correctly.
If I let all clients send datagram to the same port, and client B sends it datagram at the exact time when the server is receiving data from client A, it seems the server will miss the data from client A.
So what's the correct method to do this job? Do I need to create a listener for each of the 20 clients?
When you bind a UDP socket to a port, the networking stack will allocate a buffer for a finite number of incoming UDP packets for you, so that (assuming you call recv() in a relatively timely manner), no incoming packets should get lost.
If you want see your buffer size in terminal, you can take a look at:
/proc/sys/net/core/rmem_default for recv
and
/proc/sys/net/core/wmem_default for send
I think the default buffer size on Linux is 131071B.
On Linux, you can change the UDP buffer size (e.g. to 26214400) by (as root):
sysctl -w net.core.rmem_max=26214400
You can also make it permanent by adding this line to /etc/sysctl.conf:
net.core.rmem_max=26214400
Since each packet is only 10B, shouldnt be a problem.
If you are still worried about packet loss you could implement a protocol where your client waits for a ACK from the server or it will resend. Many protocols use such a feature, but this is only possible if timing allows it. For example in streaming data it is not useful because there is no time to resend.
or consider using tcp ( if it is an option)

OpenSSL SSL_ERROR_WANT_WRITE never recovers during SSL_write()

I have two applications talking to each other over SSL. The client is running on a windows machine, the server is a linux based application. The client is sending a large amount of data to the server on startup. The data is sent in ~4000byte chunks over to the server that contains 30 entries. I have to send about 50000 entries over.
During that transmission the server sends a message to the client, the message size is ~4000bytes. After that happens, the SSL_write() on the client side begins to return error of SSL_ERROR_WANT_WRITE. The client sleeps for 10ms, and retries the SSL_write with the exact same parameters, however, the SSL_write fails infinitely. Subsequently it aborts. If it tries to send a new message, I get an error indicating I am not sending the same aborted message from earlier.
error:1409F07F:SSL routines:SSL3_WRITE_PENDING: bad write retryā€¯
The server eventually kills the connection since it has not heard from the client for 60s and re-establishes a new one. This is just an FYI, the real issue is how can I get SSL_write to resume.
If the server does not send a request during the receive the problem goes away. If I shrink the size of the request from 16K to 100 bytes the problem does not happen.
The SSL CTX MODE is set to SSL_MODE_AUTO_RETRY and SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER.
Does anyone have an idea what might cause a simultaneous transmission from both sides with large information can cause this failure. What can I do to prevent it if this is a limitation other than capping the size that goes out from the server to the client. My concern is that if the client is not sending anything the throttling I applied to avoid this issue is a waste.
On the client side I tried to perform an SSL_read to see if I need to read during a write even though I never receive an SSL_ERROR_PENDING_READ, but the buffer is not that big anyway. ~1000bytes in size.
Any insight on this would be appreciated.
SSL_ERROR_WANT_WRITE - This error is returned by OpenSSL (I am assuming you are using OpenSSL) only when socket send gives it an EWOULDBLOCK or EAGAIN error. The socket send will give a EWOUDLBLOCK error when the send side buffer is full, which in turn means that your Server is not reading the messages sent from Client.
So, essentially, the problem lies with your Server which is not reading the messages sent to it. You need to check your server and fix it, which will automatically fix your client problem.
Also, why have you set the option "SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER"? SSL always expects that the record which it is trying to send should be sent completely before the next record can be sent.
As it turns out that with both the client and server side app, the read and writes are processed in one thread. In a perfect storm as I described above, the client is busy writing (non blocking). The server then decides to do a write a large set of messages of its own in between processing its rx buffers. The server tx is a blocking call. The server gets stuck writing, starves the read, the buffers fill up and we have a deadlock scenario.
The default windows buffer is 8k bytes so it doesn't take much to fill it up.
The architecture should be such that there is a separate thread for the rx and tx processing on both sides. As a short cut/term fix, once can increase the rx buffers and rate limit the tx side to prevent the deadlock.

tcp stream replay tool

I'm looking for a tool for recording and replaying one side of a TCP stream for testing.
I see tools which record the entire TCP stream (both server and client) for testing firewalls and such, but what I'm looking for is a tool which would record just the traffic submitted by the client (with timing information) and then resubmit it to the server for testing.
Due to the way that TCP handles retransmissions, sequence numbers, SACK and windowing this could be a more difficult task than you imagine.
Typically people use tcpreplay for packet replay; however, it doesn't support synchronizing TCP sequence numbers. Since you need to have a bidirectional TCP stream, (and this requires synchronization of seq numbering) use one of the following options:
If this is a very interactive client / server protocol, you could use scapy to strip out the TCP contents of your stream, parse for timing and interactivity. Next use this information, open a new TCP socket to your server and deserialize that data into the new TCP socket. Parsing the original stream with scapy could be tricky, if you run into TCP retransmissions and windowing dynamics. Writing the bytes into a new TCP socket will not require dealing with sequence numbering yourself... the OS will take care of that.
If this is a simple stream and you could do without timing (or want to insert timing information manually), you can use wireshark to get the raw bytes from a TCP steam without worrying about parsing with scapy. After you have the raw bytes, write these bytes into a new TCP socket (accounting for interactivity as required). Writing the bytes into a new TCP socket will not require dealing with sequence numbering yourself... the OS will take care of that.
If your stream is strictly text (but not html or xml) commands, such as a telnet session, an Expect-like solution could be easier than the aforementioned parsing. In this solution, you would not open a TCP socket directly from your code, using expect to spawn a telnet (or whatever) session and replay the text commands with send / expect. Your expect library / underlying OS would take care of seq numbering.
If you're testing a web service, I suspect it would be much easier to simulate a real web client clicking through links with Selenium or Splinter. Your http library / underlying OS would take care of seq numbering in the new stream.
Take a look at WirePlay code.google.com/p/wireplay or github.com/abhisek/wireplay which promises to replay either client or server side of a captured TCP session with modification of all the SYN/ACK sequence numbers as required.
I don't know if there are any binary builds available, you'll need to compile it yourself.
Note I have not tried this myself yet, but am looking into it.
Yes, it is a difficult task to implement such a tool.
I started to implement this kind of tool two years ago and the tool is mature now.
Try it and maybe you will find that it is the tool that you are looking for.
https://github.com/wangbin579/tcpcopy
I wanted something similar so I worked with scapy for a bit and came up with a solution that worked for me. My goal was to replay the client portion of a captured pcap file. I was interested in getting responses from the server - not necessarily with timings. Below is my scapy solution - it is by no means tested or complete but it did what I wanted it to do. Hopefully it's a good example of how to replay a TCP stream using scapy.
from scapy.all import *
import sys
#NOTE - This script assumes that there is only 1 TCP stream in the PCAP file and that
# you wish to replay the role of the client
#acks
ACK = 0x10
#client closing the connection
RSTACK = 0x14
def replay(infile, inface):
recvSeqNum = 0
first = True
targetIp = None
#send will put the correct src ip and mac in
#this assumes that the client portion of the stream is being replayed
for p in rdpcap(infile):
if 'IP' in p and 'TCP' in p:
ip = p[IP]
eth = p[Ether]
tcp = p[TCP]
if targetIp == None:
#figure out the target ip we're interested in
targetIp = ip.dst
print(targetIp)
elif ip.dst != targetIp:
# don't replay a packet that isn't to our target ip
continue
# delete checksums so that they are recalculated
del ip.chksum
del tcp.chksum
if tcp.flags == ACK or tcp.flags == RSTACK:
tcp.ack = recvSeqNum+1
if first or tcp.flags == RSTACK:
# don't expect a response from these
sendp(p, iface=inface)
first=False
continue
rcv = srp1(p, iface=inface)
recvSeqNum = rcv[TCP].seq
def printUsage(prog):
print("%s <pcapPath> <interface>" % prog)
if __name__ == "__main__":
if 3 != len(sys.argv):
printUsage(sys.argv[0])
exit(1)
replay(sys.argv[1], sys.argv[2])
Record a packet capture of the full TCP client/server communication. Then, you can use tcpliveplay to replay just the client side of the communication to a real server. tcpliveplay will generate new sequence numbers, IP addresses, MAC addresses, etc, so the communication will flow properly.

how to timeout periodically in libpcap packet receiving functions

I found this post in stackoverflow.com
listening using Pcap with timeout
I am facing a similar (but different) problem: what is the GENERIC (platform-independent) method to timeout periodically when receiving captured packets by using libpcap packet receiving functions?
Actually, I am wondering if it is possible to periodically timeout from the pcap_dispatch(pcap_t...) / pcap_next_ex(pcap_t...)? If that is possible, I can use them just like using the classic select(...timeout) function ( http://linux.die.net/man/2/select ).
In addition, from the official webpage ( http://www.tcpdump.org/pcap3_man.html ), I found the original timeout mechanism is considered buggy and platform-specific (This is bad, since my program may run on different Linux and Unix boxes):
"... ... to_ms specifies the read timeout in milliseconds. The read timeout is used to arrange that the read not necessarily return immediately when a packet is seen, but that it wait for some amount of time to allow more packets to arrive and to read multiple packets from the OS kernel in one operation. Not all platforms support a read timeout; on platforms that don't, the read timeout is ignored ... ...
NOTE: when reading a live capture, pcap_dispatch() will not necessarily return when the read times out; on some platforms, the read timeout isn't supported, and, on other platforms, the timer doesn't start until at least one packet arrives. This means that the read timeout should NOT be used in, for example, an interactive application, to allow the packet capture loop to "poll" for user input periodically, as there's no guarantee that pcap_dispatch() will return after the timeout expires... ..."
Therefore, I guess I need to implement the GENERIC (platform-independent) timeout mechanism by myself like below?
create a pcap_t structure with pcap_open_live().
set it in nonblocking mode with pcap_setnonblock(pcap_t...).
poll this nonblocking pcap_t with registered OS timer like:
register OS timer_x, and reset timer_x;
while(1) {
if(timer_x times out)
{do something that need to be done periodically; reset timer_x;}
poll pcap_t by calling pcap_dispatch(pcap_t...)/pcap_next_ex(pcap_t...) to receive some packets;
do something with these packets;
}//end of while(1)
Regards,
DC
You can get the handle with pcap_fileno() and select() it.
There's a sample here in OfferReceiver::Listen().