How does frame size affect throughput for different message sizes in RabbitMQ?

While testing the performance of RabbitMQ, I found that once the message size grew beyond roughly 250 KB, throughput dropped very quickly. I then changed frame_size from 256 KB to 5 MB and, as expected, throughput increased. The increase also applied to smaller messages: for 50 KB messages throughput went from 700 to 1,100 messages per second, and for 10,000 KB messages it went from 2 to 3 messages per second.
My question is: can somebody explain how messages are passed on the wire between client and broker, and vice versa? If a message is larger than the frame size, I assume it is split across multiple frames (which is, I think, why throughput dropped sharply once the message size exceeded the 256 KB frame size). But if a message is, say, around 1/10th the size of a frame, does a single frame then carry 10 messages (assuming 10 messages are ready to be delivered)?
If so, what happens when 10 messages are not ready: how long does the frame wait to fill up? There must be some timeout after which the frame is sent with however many messages it currently holds.
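To make the arithmetic behind my assumption concrete, here is a small sketch of how I picture a message body being split into content frames (the 8 bytes of per-frame overhead is my reading of the AMQP 0-9-1 frame header plus end octet; all numbers are illustrative, not measured):
# Sketch (assumption): a message body is split into content-body frames, each at
# most frame_max bytes including ~8 bytes of frame overhead (7-byte header + end octet).
import math

FRAME_OVERHEAD = 8

def body_frames(message_bytes, frame_max):
    payload_per_frame = frame_max - FRAME_OVERHEAD
    return math.ceil(message_bytes / payload_per_frame)

for size_kb in (50, 250, 1024, 10_000):
    for frame_max in (256 * 1024, 5 * 1024 * 1024):
        frames = body_frames(size_kb * 1024, frame_max)
        print(f"{size_kb} KB message, frame_max {frame_max // 1024} KB -> {frames} body frame(s)")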
I think I have put all required observations in the question. Any help is warmly welcome.

Related

Apache Flume gets stuck after ChannelFullException occurs 500 times

I have a Flume configuration with a RabbitMQ source, a file channel, and a Solr sink. Sometimes the sink becomes so busy that the file channel fills up, and the file channel starts throwing ChannelFullException. After 500 ChannelFullExceptions have been thrown, Flume gets stuck and never responds or recovers on its own. I want to know where the value 500 comes from and how I can change it. The 500 is consistent: whenever Flume gets stuck, I count the exceptions and find exactly 500 ChannelFullException log lines every time.
You are walking into a typical producer-consumer problem, where one is working faster than the other. In your case, there are two possibilities (or a combination of both):
RabbitMQ is sending messages faster than Flume can process.
Solr cannot ingest messages fast enough so that they remain stuck in Flume.
The solution is either to send messages more slowly (i.e. throttle RabbitMQ) or to tweak Flume so that it can process messages faster; the latter is probably what you want. Furthermore, the unresponsiveness of Flume is probably caused by the Java heap filling up. Increase the heap size and try again until the error disappears.
# Modify java maximum memory size
vi bin/flume-ng
JAVA_OPTS="-Xmx2048m"
Additionally, you can also increase the number of agents or channels, or the capacity of those channels. This naturally increases the footprint on the Java heap, so try raising the heap size first.
# Example configuration (memory channel)
agent1.channels = ch1
agent1.channels.ch1.type = memory
# Maximum number of events the channel can hold, and per transaction
agent1.channels.ch1.capacity = 10000
agent1.channels.ch1.transactionCapacity = 10000
# Cap on heap bytes used by event bodies, with a 20% buffer reserved for event headers
agent1.channels.ch1.byteCapacityBufferPercentage = 20
agent1.channels.ch1.byteCapacity = 800000
I don't know where the exact number 500 comes from; a wild guess is that by the time 500 exceptions have been thrown the Java heap is full and Flume stops responding. Another possibility is that the default configuration above happens to work out to exactly 500. So try tweaking it and see whether the number changes or, better, whether the problem stops occurring at all.

Google Pub/Sub + Cloud Run scalability

I have a Python application writing Pub/Sub messages into BigQuery. The Python code uses the google-cloud-bigquery library, and the TableData.insertAll() method quota is 10,000 requests per second per table (see the Quotas documentation).
Cloud Run container auto scaling is set to 100 instances with 1,000 requests per container. So technically I should be able to reach 10,000 requests/sec, right? With the BQ insert API being the biggest bottleneck.
I only see a few hundred requests per second at the moment, with multiple services running at the same time.
CPU and RAM sit at around 50%.
Having confirmed your project structure and the details given in the comments, I would review the Pub/Sub quotas and limits, especially the Quota and Resource limits tables, where you can check this information depending on size; the Throughput quota units section explains how to calculate quota usage.
I would answer your question with a yes: you should be able to reach 10,000 requests/sec. And, as in this related question, depending on the byte size you can send up to 10,000 rows per insert, although the recommended maximum is 500 rows per request.
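If you end up tuning the per-request row count, here is a minimal sketch of chunking streaming inserts with the google-cloud-bigquery client (the table id and the 500-row batch size are placeholders, not values from your project):
# Sketch: keep each insertAll request at the recommended row count.
from google.cloud import bigquery

client = bigquery.Client()
TABLE_ID = "my-project.my_dataset.my_table"   # hypothetical table

def insert_in_batches(rows, batch_size=500):
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        errors = client.insert_rows_json(TABLE_ID, batch)  # one streaming-insert request per batch
        if errors:
            print("Insert errors:", errors)   # retry or dead-letter the failed rows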
The concurrency setting in Cloud Run can also be modified if you need to change it.

Ignite off-heap memory consumption

Is there a way to know how much off-heap memory each cache record will take? My cache is:
IgniteCache<String, byte[]>
Each key is around 24-26 characters and each value is 12 bytes. After putting 40,000 records, off-heap usage grew by 8 MB, which is around 210 bytes per record. The page size is configured as 1 KB, and metrics show the page fill factor is around 0.97-1.0. Assume there are no backups.
Is there anywhere to read about how each record is stored off-heap, so I can understand where those 210 bytes come from? Queries are disabled. Or what else could possibly cause such consumption?
According to the capacity planning docs (https://www.gridgain.com/docs/latest/administrators-guide/capacity-planning), there is roughly 200 bytes of overhead per entry, so I think this is expected.
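As a back-of-the-envelope check, the observed growth does line up with that estimate (all inputs below come from the question and the capacity-planning figure):
# Observed off-heap growth vs. the documented ~200-byte per-entry overhead.
records = 40_000
observed_growth = 8 * 1024 * 1024                 # bytes of off-heap growth from the question
print(observed_growth / records)                  # ~210 bytes per stored entry

key_bytes = 25                                    # 24-26 character String key
value_bytes = 12                                  # byte[] payload
entry_overhead = 200                              # capacity-planning estimate
print(key_bytes + value_bytes + entry_overhead)   # ~237 bytes, the same ballpark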

Redis dequeue rate 10x slower over the network

I was testing the enqueue and dequeue rate of Redis over a network with 1 Gbps LAN speed; both machines have 1 Gbps ethernet cards.
Redis version: 3.2.11
I lpush 100,000 (1 lakh) items of 1 byte each using the Python client.
Dequeuing the items with rpop over the network took around 55 seconds, which is only about 1,800 dequeues/sec, whereas the same operation completes within 5 seconds when I dequeue locally, around 20,000 dequeues/sec.
Enqueue rates are almost the same as the dequeue rates.
This was done on the office network with little other traffic, and the same is observed in production environments too.
A drop of up to 3x over the network would be acceptable; around 10x makes it look like I am doing something wrong.
Please suggest if I need to make any configuration changes on the server or client side.
Thanks in advance.
Retroactively replying in case anyone else discovers this question.
Round-trip latency and lack of concurrency are likely your bottlenecks here. If all of the dequeue calls are made serially, you are stacking that network latency: for example, 1 million serial calls at 2 ms latency add at least 2 million ms of latency overhead, or about 33 minutes. That is to say, your application waits for the server to receive the payload, do something, and reply to acknowledge that the operation succeeded. Some Redis clients also perform multiple calls to enqueue/dequeue a single job (pop plus ack/del), potentially doubling that number.
The following link illustrates the different approaches libraries take to Redis job queues (Ruby's Resque vs. Clojure's Carmine); note the multiple Redis commands executed on the server for a single message. This is likely why you see a 10x slowdown instead of the 3x you were expecting.
https://kirshatrov.com/2018/07/20/redis-job-queue/
An oversimplified example of two calls per message dequeued (network latency of 1 ms each way, and Redis server operations taking 1 ms):
time | client                                  server
~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1ms | pop msg >--(1ms)--> receive pop request
2ms | [process request (1ms)]
3ms | receive msg <--(1ms)--< send msg to client
4ms | send del >--(1ms)--> receive del
5ms | [delete msg from queue (1ms)]
6ms | receive ack <--(1ms)--< reply with delete ack
Improving dequeue times often involves using a client that supports multi-threaded or multi-process concurrency (e.g. 10 concurrent workers would significantly reduce the overall time to completion). This keeps the network better utilized by sending a stream of dequeue requests instead of waiting for one request to complete before grabbing the next one.
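A minimal sketch of that pattern, assuming the redis-py client and a hypothetical list key named "jobs" (host, port, and worker count are placeholders):
# Sketch: 10 concurrent workers draining a list so their round trips overlap.
from concurrent.futures import ThreadPoolExecutor
import redis

r = redis.Redis(host="redis-host", port=6379)   # redis-py shares a thread-safe connection pool

def drain():
    count = 0
    while r.rpop("jobs") is not None:           # each worker issues its own round trips
        count += 1
    return count

with ThreadPoolExecutor(max_workers=10) as pool:
    totals = list(pool.map(lambda _: drain(), range(10)))
print("dequeued:", sum(totals))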
As for 1 byte vs. 500 bytes: the default Ethernet MTU is 1,500 bytes. Subtracting TCP/IP headers, the payload is roughly 1,460 bytes (less if tunneling with GRE/IPsec, more if using jumbo frames). Since both payload sizes fit in a single TCP packet, they will have similar performance characteristics.
A 1gbps ethernet interface can deliver anywhere between 81,274 and 1,488,096 packets per second (depending on payload size).
So really, it's a question of how many processes & threads you can run concurrently on the client to keep the network & redis server busy.
Redis is generally I/O bound, not CPU bound. It may be hitting network bandwidth limits. Given the small size of your messages most of the bandwidth may be eaten by TCP overhead.
On a local machine you are bound by memory bandwidth, which is much faster than your 1 Gbps network bandwidth. You can likely increase network throughput by increasing the amount of data you grab at a time, as sketched below.
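For example, here is a sketch of batching RPOPs with a redis-py pipeline so one round trip carries many commands (the key name and batch size are placeholders):
# Sketch: fetch up to 500 items per network round trip instead of one at a time.
import redis

r = redis.Redis(host="redis-host", port=6379)

def dequeue_batch(batch_size=500):
    pipe = r.pipeline(transaction=False)        # plain pipelining; no MULTI/EXEC needed
    for _ in range(batch_size):
        pipe.rpop("jobs")
    return [item for item in pipe.execute() if item is not None]

while True:
    batch = dequeue_batch()
    if not batch:
        break
    # process(batch) ...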

Impact of increasing the MAXMSGL parm of a receiver channel

What is the impact of increasing the MAXMSGL parm of a receiver channel? Does it automatically increase the amount of memory allocated for the channel, regardless of the size of the messages that actually flow across it? For a cluster-receiver channel, which typically supports multiple channel instances, does it increase the memory allocation for each channel instance? (For example, if the channel is supporting 10 connections and we increase MAXMSGL from 4 MB to 100 MB, does memory usage go from 40 MB to 1 GB?)
We are using MQ v7.5.0.3 on AIX.
Thanks!!
The short answer is no. If the messages flowing across the channel are 100 KB, you will see no difference. On the other hand, if the message size jumps to 100 MB then memory usage will increase, but I don't believe it will reach 1 GB.