Apache QPID queue size and count

I have a qpid queue with these parameters:
bus-sync-queue --durable --file-size=48 --file-count=64
I want to put 1,000,000 messages into this queue. Each message is just a 12-character string (002000333222, 002000342678, and so on). What values must I set for --file-size=X and --file-count=Y so that all the messages fit in the queue?

There is quite a big overhead on a single persistent message; in your case one message will require at least 128 bytes of storage. You should rethink your design: either decrease the expected number of unacknowledged messages or use a different approach.

Related

How to send 2MB of data through UDP?

I am using a TMS570LS3137 (DP83640 PHY) and trying to program UDP (unicast) using lwIP to send 2 MB of data.
As of now I can send up to 63 KB of data. How can I send 2 MB at a time? UDP supports only up to ~64 KB per transmission, but in this link
https://stackoverflow.com/questions/32512345/how-to-send-udp-packets-of-size-greater-than-64-kb#:~:text=So%20it's%20not%20possible%20to,it%20up%20into%20multiple%20datagrams.
they mention: "If you need to send larger messages, you need to break it up into multiple datagrams." How do I proceed with this?
Since UDP uses IP, you're generally limited to the maximum IP packet size of 64 KiB, even with fragmentation. So the hard limit for any UDP payload is 65,535 - 28 = 65,507 bytes.
You need to either
chunk your data into multiple datagrams. Since datagrams may arrive out of sending order or even get lost, this requires some kind of protocol or header. That could be as simple as four bytes at the beginning to define the buffer offset the data goes to, or a datagram sequence number. While you're at it, you won't want to rely on fragmentation but, depending on the scenario, use either the maximum UDP payload size over plain Ethernet (1500 bytes MTU - 20 bytes IP header - 8 bytes UDP header = 1472 bytes), or a sane maximum that should work all the time (e.g. 1432 bytes).
use TCP which can transport arbitrarily sized data and does all the work for you.

High precision queue statistics from RabbitMQ

I need to log with the highest possible precision the rate with which messages enter and leave a particular queue in Rabbit. I know the API already provides publishing and delivering rates, but I am interested in capturing raw incoming and outgoing values in a known period of time, so that I can estimate rates with higher precision and time periods of my choice.
Ideally, I would check on-demand (i.e. on a schedule of my choice) e.g. the current cumulative count of messages that have entered the queue so far ("published" messages), and the current cumulative count of messages consumed ("delivered" messages).
With these types of cumulative counts, I could:
Compute my own deltas of messages entering or exiting the queue, e.g. doing Δ_count = cumulative_count(t) - cumulative_count(t-1)
Compute throughput rates doing throughput = Δ_count / Δ_time
Potentially infer how long messages stay on the queue throughout the day.
The last two would ideally rely on the precise timestamps when those cumulative counts were calculated.
I am trying to solve this problem directly using RabbitMQ’s API, but I’m encountering a problem when doing so. When I calculate the message cumulative count in the queue, I get a number that I don’t expect.
For example consider the screenshot below.
The Δ_message_count between entries 90 and 91 is 1810 - 1633 = 177. So I would expect the difference between published and delivered messages to be 177 as well (in particular, 177 more messages published than delivered).
However, when I calculate these differences, I see that the difference is not 177:
Δ of published (incoming) messages: 13417517652009 - 13417517651765 = 244
Δ of delivered (outgoing) messages: 1341751765667 - 1341751765450 = 217
so we get 244 - 217 = 27 messages. This suggests that there are 177 - 27 = 150 messages "unaccounted" for.
Why?
I tried taking into account the redelivered messages reported by the API, but that count stayed constant while I ran my tests, suggesting there were no redelivered messages, so I wouldn't expect them to play a role.

Apache NiFi PutElasticsearch can wait forever to fill up its batch size?

I am trying to write streaming data into Elasticsearch with the Apache NiFi PutElasticsearch processor.
PutElasticsearch has a property named "Batch Size"; when I set this value to 1, all events are written to Elasticsearch as soon as possible.
But such a low batch size obviously doesn't work when the load is high, so in order to get reasonable throughput I need to set it to 1000.
My question is: does PutElasticsearch wait until a full batch of events is available? If so, it could wait for hours when there are only 999 events sitting on the processor.
I am trying to understand how Logstash does the same job in its Elasticsearch output plugin. There may be some time-based flushing logic (e.g. if events have been waiting ~2 seconds, flush them to Elasticsearch).
Do you have any ideas?
Edit: I just found that Logstash implements this: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-idle_flush_time :)
How can I get the same functionality in NiFi?
According to the code, the batch size parameter is the maximum number of FlowFiles taken from the incoming queue.
For example, with batch size = 1000:
1/ if the incoming queue contains 1001 flow files, only 1000 will be taken in one transaction;
2/ if the incoming queue contains 999 flow files, all 999 will be taken in one transaction.
So everything is processed as soon as there is something in the incoming queue and there are available threads in NiFi.
references:
PutElasticsearch.java
ProcessSession.java

When sending multiple Redis write commands in a pipeline, do I need to read the return values?

for example:
344 r.Send("HINCRBY", key, set_timestamp, value)
345 r.Send("EXPIRE", key, 84600)
346 r.Flush()
347 //r.Receive()
348 //r.Receive()
Do I need lines 347 and 348 to be uncommented? I don't care about the return values. Is there an advantage to not reading them?
Pipelined responses are queued in memory until read. See http://redis.io/topics/pipelining
While the client sends commands using pipelining, the server will be forced to queue the replies, using memory. So if you need to send a lot of commands with pipelining, it is better to send them as batches having a reasonable number, for instance 10k commands, read the replies, and then send another 10k commands again, and so forth. The speed will be nearly the same, but the additional memory used will be at max the amount needed to queue the replies for this 10k commands.
You should read the replies for all requests at the end of the pipeline.
If you're using the Go github.com/garyburd/redigo/redis package, this can be done by using Do as the final call in the pipeline; calling Do with an empty command argument will only flush the output buffer and return all pending replies. Not only do you want to receive the responses to clear the queue on the server, but getting unexpected responses or errors on a later call to Do could lead to hard-to-find bugs.
Also, since Redis 3.2, you have the option of turning off the replies from the server with CLIENT REPLY ON|OFF|SKIP.

Query random but unread keys the Redis way

I have thousands of messages, each stored as a list of properties (text, subject, date, etc.) under a separate key: msg:1001, msg:1002, etc.
There is also a list keyed messages with the IDs of all existing messages: 1001,1002,1003...
Now I need to get 10 random messages, but only messages that are not flagged by the user (a sort of "unread" state).
There is a hash for each user, keyed flags:USERID = 1001=red,1005=blue,1010=red,...
Currently I have to keep in my application's memory a full list of messages plus all the flags of every user currently logged in, and do all the math by hand (in JavaScript).
Is there a way to do such a query the Redis way, without duplicating all the data on the application side?
Your question is an example of a space–time tradeoff. On the one hand, you say that you don't want to keep a list of the unflagged messages in your system, but I would guess that you also want to keep your application relatively fast. Therefore, I suggest giving up some space and keeping a set of unflagged messages.
As messages are created in your system, add them both to messages (SADD messages <messageid>) and messages_unflagged (SADD messages_unflagged <messageid>). After a user adds a flag to a message, remove the message from the unflagged set (SREM messages_unflagged <messageid>). When you need 10 random, unflagged messages, you can get their IDs in constant time (SRANDMEMBER messages_unflagged 10).