Use of frame_max in RabbitMQ

I've read about frame_max for RabbitMQ; the documentation describes it as the "Maximum permissible size of a frame (in bytes) to negotiate with clients. Setting to 0 means "unlimited" but will trigger a bug in some QPid clients. Setting a larger value may improve throughput; setting a smaller value may improve latency."
Why is the default value 128 KB? In a production environment there is hardly ever a case where one wants high latency, so why is the default set so low? Couldn't it default to a very high value so that one always gets high throughput? Is there any harm in having a high value by default? Also, beyond what value does frame_max behave as if it were zero, i.e. unlimited, which can trigger the bug in QPid clients?

First, you shouldn't need to change this value.
Second, frame_max sets the size of the chunks into which content is split, i.e. the unit of multiplexing. It's used to avoid a situation where a single channel can saturate the whole connection. If you publish a few big messages on different AMQP channels, they will be multiplexed, and smaller messages will still be able to move through at the same time.
Actually, better concurrency could be achieved by using multiple connections, but that's a different story.
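If you do decide to tune it, frame_max can be set on the broker side and is also negotiated by the client when the connection is opened; the two typically settle on the lower of the two values. As a minimal sketch, assuming the rabbitmq-c (librabbitmq) client and a local broker with default credentials:
#include <amqp.h>
#include <amqp_tcp_socket.h>
#include <stdio.h>

int main(void) {
    amqp_connection_state_t conn = amqp_new_connection();
    amqp_socket_t *sock = amqp_tcp_socket_new(conn);
    if (amqp_socket_open(sock, "localhost", 5672) != AMQP_STATUS_OK)
        return 1;
    /* the 4th argument is the frame_max (in bytes) requested by the client;
       131072 (128 KiB) is the usual default */
    amqp_login(conn, "/", 0 /* channel_max */, 131072 /* frame_max */,
               0 /* heartbeat */, AMQP_SASL_METHOD_PLAIN, "guest", "guest");
    printf("negotiated frame_max: %d bytes\n", amqp_get_frame_max(conn));
    amqp_connection_close(conn, AMQP_REPLY_SUCCESS);
    amqp_destroy_connection(conn);
    return 0;
}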

Related

WebRTC Datachannel for high bandwidth application

I want to send unidirectional streaming data over a WebRTC datachannel, and I'm looking for the best configuration options (high bandwidth, low latency/jitter) and for others' experience with the bitrates to expect in this kind of application.
My test program sends chunks of 2 KB, with bufferedAmountLowThreshold set to 2 KB; the event callback calls send again until bufferedAmount exceeds 16 KB. Using this in Chrome, I achieve ~135 Mbit/s on LAN and ~20 Mbit/s from/to a remote connection that has a 100 Mbit/s WAN connection on both ends.
What is the limiting factor here?
How can I see if the data is truly going peer to peer directly, or whether a TURN server is used?
My ultimate application will use the google-webrtc library on Android - I'm only using JS for prototyping. Can I set options to speed up bitrate in the library, that I cannot do in official JS APIs?
There are many variables that impact throughput and it also highly depends on how you've measured it. But I'll list a couple of things I have adjusted to increase the throughput of WebRTC data channels.
Disclaimer: I have not done these adjustments for libwebrtc but for my own WebRTC data channel library called RAWRTC, which btw also compiles for Android. However, both use the same SCTP library underneath, both use some OpenSSL-ish library and UDP sockets, so all of this should be applicable to libwebrtc.
Note that WebRTC data channel implementations using usrsctp are usually CPU bound when executed on the same machine, so keep that in mind when testing. With RAWRTC's default settings, I'm able to achieve ~520 Mbit/s on my i7 5820k. From my own tests, both Chrom(e|ium) and Firefox were able to achieve ~350 Mbit/s with default settings.
Alright, so let's dive into adjustments...
UDP Send/Receive Buffer Size
The send/receive buffers of UDP sockets on Linux are quite small by default. If you can, you may want to adjust them.
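As an illustration, a per-socket sketch on Linux (the 4 MiB figure is just an assumption; the effective size is capped by the net.core.rmem_max / net.core.wmem_max sysctls, which may need raising as well):
#include <sys/socket.h>

/* fd is an already-created UDP socket; request 4 MiB send/receive buffers.
   Linux doubles the requested value internally; verify with getsockopt. */
int size = 4 * 1024 * 1024;
setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size));
setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));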
DTLS Cipher Suites
Most Android devices have ARM processors without hardware AES support. ChaCha20 usually performs better in software and thus you may want to prefer it.
(This is what RAWRTC negotiates by default, so I have not included it in the end results.)
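For an OpenSSL-based DTLS stack, that preference looks roughly like the sketch below; how you get hold of the SSL_CTX depends on the library, so treat this as an assumption rather than something libwebrtc exposes:
#include <openssl/ssl.h>

SSL_CTX *ctx = SSL_CTX_new(DTLS_method());
/* list the ChaCha20-Poly1305 suites ahead of AES-GCM for CPUs
   without AES hardware acceleration */
SSL_CTX_set_cipher_list(ctx,
    "ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:"
    "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256");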
SCTP Send/Receive Buffer Size
The default send/receive window size of usrsctp, the SCTP stack used by libwebrtc, is 256 KiB which is way too small to achieve high throughput with moderate delay. The theoretical maximum throughput is limited by mbits = (window / (rtt_ms / 1000)) / 131072. So, with the default window of window=262144 and a fairly moderate RTT of rtt_ms=20, you will end up with a theoretical maximum of 100 Mbit/s.
In practice, you'll land well below that theoretical maximum (see my test results). This may be a bug in the usrsctp stack (see sctplab/usrsctp#245).
The buffer size has been increased in Firefox (see bug 1051685) but not in libwebrtc used by Chrom(e|ium).
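If you build against usrsctp yourself, the window can be raised via its sysctl setters; a sketch (the 4 MiB value is an assumption, and libwebrtc does not expose this knob):
#include <usrsctp.h>

/* call after usrsctp_init() and before creating any sockets.
   Rearranging the formula above, window = mbits * 131072 * (rtt_ms / 1000),
   so ~700 Mbit/s at 20 ms RTT needs roughly 1.75 MiB of window. */
usrsctp_sysctl_set_sctp_sendspace(4 * 1024 * 1024);
usrsctp_sysctl_set_sctp_recvspace(4 * 1024 * 1024);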
Release Builds
Optimisation level 3 makes a difference (duh!).
Message Size
You probably want to send 256 KiB sized messages.
Unless you need to support Chrome < ??? (sorry, I currently don't know where it landed...), in which case the maximum message size is 64 KiB (see issue 7774).
Unless you also need to support Firefox < 56, in which case the maximum message size is 16 KiB (see bug 979417).
It also depends on how much you send before you pause sending (i.e. the buffer's high water mark), and when you continue sending after the buffer has been drained (i.e. the buffer's low water mark). My tests have shown that targeting a high water mark of 1 MiB and setting a low water mark of 256 KiB results in adequate throughput.
This reduces the number of API calls and can increase throughput.
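A rough sketch of that high/low water mark pattern; dc_buffered_amount, dc_send and next_message are hypothetical placeholders for whatever your data channel API provides:
#define HIGH_WATER_MARK (1024 * 1024)  /* stop queueing above 1 MiB buffered */
#define LOW_WATER_MARK  (256 * 1024)   /* resume once the buffer drains below 256 KiB */
#define MESSAGE_SIZE    (256 * 1024)

static void pump(void) {
    /* keep queueing 256 KiB messages until the high water mark is reached;
       register pump() as the "buffered amount low" callback with a
       threshold of LOW_WATER_MARK so it runs again after draining */
    while (dc_buffered_amount() < HIGH_WATER_MARK)
        dc_send(next_message(), MESSAGE_SIZE);
}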
End Results
Using optimisation level 3 with default settings on RAWRTC brought me up to ~600 Mbit/s.
Based on that, increasing the SCTP and UDP buffer sizes to 4 MiB brought me up further to ~700 Mbit/s, with one CPU core at 100% load.
However, I believe there is still room for improvement, but it's unlikely to be low-hanging fruit.
How can I see if the data is truly going peer to peer directly, or whether a TURN server is used?
Open about:webrtc in Firefox or chrome://webrtc-internals in Chrom(e|ium) and look for the chosen ICE candidate pair. Or use Wireshark.

Optimal Sizes of data for sends and receives in MPI

I am writing a parallel application with MPI in which the master process has data approximately as large as the cache (4 MB on the platform I am working on) to send over to each process. As 4 MB might be too large for the master to send at a time, it is necessary to break the entire data into smaller chunks of a size suitable for sending and receiving.
My question is: is there any suggestion on what the optimal size should be for sending and receiving each smaller chunk, given the size of the entire data?
Thanks.
4MB won't be any problem for any MPI implementation out there; I'm not sure what you mean by "too large" though.
A rule of thumb is that if you can easily send the data all in one message, that is usually faster -- the reason being that there is some finite amount of time required to send and receive any one message (the latency) that comes from the function calls, calls to the transport layer, etc. On top of that, there is a usually close-to-fixed amount of time it takes to send each additional byte of data (which is one over the bandwidth). That's only a very crude approximation of the real complexity of sending messages (especially large messages) between processors, but it's a very useful approximation. Within that model, the fewer messages you send, the better, because you incur the latency overhead fewer times.
The above is almost always true if you are contemplating sending many little messages; however, if you're talking about sending (say) 4x 1 MB messages vs one 4 MB message, even under that model the difference may be small, and may be overwhelmed by other effects specific to your transport. If you want a more accurate assessment of how long things take on your platform, there's really no substitute for empirically measuring how long things actually take. The best way is to try it in your code a few ways and see what is best; that's really the only definitive answer. A second method would be to take a look at MPI "microbenchmarks":
The Intel MPI Benchmarks (IMB)
The Ohio State University MPI Benchmarks (OSU)
Both of the above include benchmarks of how long it takes to send and receive messages of various sizes; you compile them against your MPI and can simply read off how long it takes to send/receive (say) a 4 MB message vs 4x 1 MB messages, which may give you some clues as to how to proceed.
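A minimal sketch of that kind of in-code measurement with two ranks, comparing one 4 MB send against four 1 MB sends (in practice, repeat the loop and average):
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define TOTAL (4 * 1024 * 1024)
#define CHUNK (1024 * 1024)

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    char *buf = malloc(TOTAL);

    if (rank == 0) {
        double t0 = MPI_Wtime();
        MPI_Send(buf, TOTAL, MPI_BYTE, 1, 0, MPI_COMM_WORLD);      /* one 4 MB message */
        double t1 = MPI_Wtime();
        for (int off = 0; off < TOTAL; off += CHUNK)               /* four 1 MB messages */
            MPI_Send(buf + off, CHUNK, MPI_BYTE, 1, 1, MPI_COMM_WORLD);
        double t2 = MPI_Wtime();
        printf("single send: %g s, chunked sends: %g s\n", t1 - t0, t2 - t1);
    } else if (rank == 1) {
        MPI_Recv(buf, TOTAL, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int off = 0; off < TOTAL; off += CHUNK)
            MPI_Recv(buf + off, CHUNK, MPI_BYTE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}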

Suggestions to Increase tcp level throughput

We have an application requirement where we'll be receiving messages from around 5-10 clients at a rate of 500 KB/sec, doing some internal logic, and then distributing the received messages among 30-35 other network entities.
What TCP-level or thread-level optimizations are suggested?
Sometimes programmers can "shoot themselves in the foot". One example is attempting to increase a linux user-space application's socket buffer size with setsockopt/SO_RCVBUF. On recent Linux distributions, this deactivates auto-tuning of the receive window, leading to poorer performance than what would have been seen had we not pulled the trigger.
~4 Mbit/sec (500 KB/sec x 8 bits per byte) per TCP connection is well within the capability of well-written code without any special optimizations. This assumes, of course, that your target machine's clock rate is measured in GHz and it isn't low on RAM.
When you get into the range of 60-80 Mbits/sec per TCP connection, then you begin to hit some bottlenecks that might need profiling and countermeasures.
So to answer your question, unless you're seeing trouble, no TCP or thread optimizations are suggested.
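If you're curious what the autotuner actually arrives at, read the buffer size rather than setting it; a sketch, assuming fd is a connected TCP socket:
#include <stdio.h>
#include <sys/socket.h>

/* getsockopt only reports the current receive buffer; unlike
   setsockopt(SO_RCVBUF), it leaves autotuning enabled */
int rcvbuf;
socklen_t len = sizeof(rcvbuf);
getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len);
printf("current receive buffer: %d bytes\n", rcvbuf);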

How can I calculate an optimal UDP packet size for a datastream?

I have a short radio link with an attached data source that needs 1280 Kbps of throughput over IPv6, using a UDP stop-and-wait protocol; there are no other clients or noticeable noise sources in the area. How on earth can I calculate the best packet size to minimise overhead?
UPDATE
I thought it would be an idea to show my working so far:
IPv6 has a 40 byte header, so including ACK responses, that's 80 bytes overhead per packet.
To meet the throughput requirement, 1280 K/p packets need to be sent a second, where p is the packet payload size.
So by my reckoning the total overhead is (1280K/p)*(80), which decreases monotonically as p grows; throwing it into Wolfram gives a function with no minimum, so no 'optimal' value.
I did a lot more math trying to shoehorn bit error rate calculations in there but came up against the same thing; if there's no minimum, how do I choose the optimal value?
Your best bet is to use a simulation framework for networks. This is a hard problem, and doesn't have an easy answer.
NS2 or SimPy can help you devise a discrete event simulation to find optimal conditions, if you know your model in terms of packet loss.
Always work with the largest packet size available on the network, then in deployment configure the network MTU for the most reliable setting.
Consider latency requirements, how is the payload being generated, do you need to wait for sufficient data before sending a packet or can you immediately send?
The radio channel is already optimized for noise at the low packet level; you will usually have other implementation demands such as power requirements: sending in heavy batches or under a light continuous load.
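To make the "largest packet" advice concrete, here is a small sketch of the overhead arithmetic, assuming a 1500-byte link MTU and counting only the 40-byte IPv6 and 8-byte UDP headers on the data direction (the stop-and-wait ACKs roughly double it, but don't change the shape of the curve):
#include <stdio.h>

int main(void) {
    const double goal_bps = 1280e3;          /* required payload throughput */
    const int mtu = 1500, headers = 40 + 8;  /* IPv6 + UDP header bytes */
    for (int payload = 128; payload <= mtu - headers; payload *= 2) {
        double pkts_per_s = goal_bps / (payload * 8.0);
        double overhead_bps = pkts_per_s * headers * 8.0;
        printf("payload %4d B -> %7.1f pkt/s, header overhead %8.0f bit/s\n",
               payload, pkts_per_s, overhead_bps);
    }
    /* the overhead falls monotonically with payload size, so the practical
       optimum is simply the largest payload that still fits the path MTU */
    return 0;
}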

Simple robust error correction for transmission of ascii over serial (RS485)

I have a very low speed data connection over serial (RS485):
9600 baud
actual data transmission rate is about 25% of that.
The serial line is going through an area of extremely high EMR. Peak fluctuations can reach 3000 kV.
I am not in the position (yet) to force a change in the physical medium, but could easily offer to put in a simple robust forward error correction scheme. The scheme needs to be easy to implement on a PIC18 series micro.
Ideas?
This site claims to implement Reed-Solomon on the PIC18. I've never used it myself, but perhaps it could be a helpful reference?
Search for the CRC algorithm used in the MODBUS protocol (MODBUS RTU uses a CRC-16; MODBUS ASCII uses a simpler LRC).
I develop with PIC18 devices and currently use the MCC18 and PICC18 compilers. I noticed a few weeks ago that the peripheral headers for PICC18 incorrectly map the Busy2USART() macro to the TRMT bit instead of the TRMT2 bit. This caused me major headaches for a short time before I discovered the problem. For example, a simple transmission:
putc2USART(*p_value++);
while (Busy2USART());  /* wait for the byte to leave the shift register */
putc2USART(*p_value);
When the Busy2USART() macro was incorrectly mapped to the TRMT bit, I was never actually waiting for bytes to leave the shift register because I was monitoring the wrong bit. Before I discovered the inaccurate header file, the only way I was able to successfully transmit a byte over 485 was to wait 1 ms between bytes. My baud rate was 91912, and the delays between bytes killed my throughput.
I also suggest implementing a means of collision detection and checksums. Checksums are cheap, even on a PIC18. If you are able to listen to your own transmissions, do so; it will allow you to be aware of collisions that may result from duplicate addresses on the same loop and from incorrect timings.
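For the checksum part, a small sketch of the common bitwise CRC-16 (the reflected 0x8005 polynomial, i.e. 0xA001, with init 0xFFFF, as used by MODBUS RTU), which fits comfortably on a PIC18:
#include <stdint.h>
#include <stddef.h>

uint16_t crc16(const uint8_t *data, size_t len) {
    uint16_t crc = 0xFFFF;
    while (len--) {
        crc ^= *data++;                       /* fold in the next byte */
        for (uint8_t bit = 0; bit < 8; bit++) /* process it bit by bit */
            crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
    }
    return crc;  /* append low byte, then high byte, to the frame */
}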