ActiveMQ ActiveMQ.Advisory.TempQueue getting bigger and bigger

Problem:
On ActiveMQ, for some reason (I don't know why), ActiveMQ.Advisory.TempQueue is getting bigger and bigger (1 GB per day).
Here is a snapshot:
Name: ActiveMQ.Advisory.TempQueue
Producer #: 0
Consumer #: 816
Enqueue #: 187550135
Dequeue #: 0
Memory %: 0
Dispatch #: 187836323
Always retroactive: FALSE
Average blocked time: 0
Average enqueue time: 0.3694736
Average message size: 1024
Blocked producer warning interval: 30000
Blocked sends: 0
Dlq: FALSE
Expired count: 0
Forward count: 0
In flight count: 187836323
Max audit depth: 2048
Max enqueue time: 1233
Max message size: 1024
Max page size: 200
Max producers to audit: 1024
Memory limit: 668309914
Memory usage byte count: 0
Memory usage portion: 1
Min enqueue time: 0
Min message size: 1024
Options: (blank)
Prioritized messages: FALSE
Producer flow control: TRUE
Queue size: 0
Slow consumer strategy: (blank)
Store message size: 0
Total blocked time: 0
Use cache: TRUE
Object name: org.apache.activemq:type=Broker,brokerName=localhost,destinationType=Topic,destinationName=ActiveMQ.Advisory.TempQueue
Any idea?

Advisory topics in ActiveMQ don't accumulate data. They are topics, and as such, when there are no consumers on them, the messages sent to them are dropped. If you have a consumer on an advisory topic, then messages pass through it, but they are not stored in the broker's persistent storage. The stats can sometimes be deceiving, given that the enqueue count keeps ticking up.
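If you want to watch those stats from a script instead of the web console, here is a minimal sketch that reads the destination MBean over the broker's Jolokia REST API. The MBean name is the one from your snapshot; port 8161 and the admin/admin credentials are just the web-console defaults, so adjust them to your setup:

import requests

# MBean name taken from the snapshot above; host, port, and credentials
# are the ActiveMQ web-console defaults and may differ in your broker.
MBEAN = ("org.apache.activemq:type=Broker,brokerName=localhost,"
         "destinationType=Topic,destinationName=ActiveMQ.Advisory.TempQueue")

resp = requests.get("http://localhost:8161/api/jolokia/read/" + MBEAN,
                    auth=("admin", "admin"))
stats = resp.json()["value"]

# EnqueueCount keeps climbing even though nothing is persisted.
print(stats["EnqueueCount"], stats["DequeueCount"], stats["MemoryPercentUsage"])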
Without knowing more about what you are seeing there's not much more help that can be offered.
If you are seeing growth in the KahaDB logs, then it is unrelated to your advisory topics; as stated, they never store messages, so something else is going on. There are some nice instructions on the ActiveMQ website on how to take a look at what is keeping the KahaDB journal files alive, which you should use to help debug your issue.

Related

How to Cluster and create a timechart in Splunk

I have a field with LogMsg error messages that I am grouping based on similarities using cluster.
What I am trying to achieve is a display that will show a time series with the grouped errors:
index="my_index_here" LogLevel=ERROR
| cluster showcount=t t=0.2 field=Message | eval "Error Count" = cluster_count
| head 10 | timechart count("Error Count") By LogMsg span=60m
The idea is this:
Get all the error messages: LogLevel=ERROR
Group the items based on the Message field: | cluster showcount=t t=0.2 field=Message | eval "Error Count" = cluster_count
Get the top 10 results: | head 10
Draw a timechart: timechart count("Error Count") By LogMsg span=60m. The timechart should plot the number of different error messages generated from the cluster against time, something like:
+------------------------------+------+------+-------+-------+
| Message                      | 8:00 | 9:00 | 10:00 | 11:00 |
+------------------------------+------+------+-------+-------+
| Unable to authenticate       |   90 |   40 |    30 |    60 |
| Another Error                |   80 |   40 |    30 |    60 |
| Yet another error            |   70 |   40 |    30 |    60 |
| ...                          |  ... |  ... |   ... |   ... |
| The 10th most frequent error |   50 |   40 |    30 |    60 |
+------------------------------+------+------+-------+-------+
My approach above is not working; it returns a blank plot.
The way to debug SPL is to execute one pipe at a time and verify the results before adding the next pipe.
One thing I believe you'll discover is that the head command ruins the timechart. It's possible all of the top 10 results will be in the same hour, so the results may be less than useful.
A common cause of a "blank plot" is a stats or timechart command that references a non-existent or null field. You should discover which field is null during the debug.
FWIW, here's a run-anywhere query similar to yours that produces a plot.
index=_internal log_level=INFO
| cluster showcount=t t=0.2 field=event_message
| eval "Error Count" = cluster_count
| head 10
| timechart count("Error Count") By group span=60m

Multiplexing RabbitMQ messages

For example, I have 4 sources which publish metrics.
I would like to multiplex/merge all these messages into one queue/exchange:
        +----+----+----+----+
Source1 | M1 | M2 | M3 |    |            +----+----+----+----+----+----+----+
Source2 | M4 |    |    | M5 |  => Result | M1 | M4 | M2 | M3 | M6 | M5 | M7 |
Source3 |    |    | M6 |    |            +----+----+----+----+----+----+----+
Source4 |    |    |    | M7 |
        +----+----+----+----+
For each queue:
* Read one message
* Publish message to the Result queue
Is there a "native" way to do this in RabbitMQ, or should I write my own consumer/publisher?
EDIT 1
An example to clarify: let's say after some time I have
Processing "window"
+-+
Source1 |X|XXXXXXXXXXXXX
Source2 |Y|YYYYYYY
Source3 |Z|ZZZZZZZZZZ
Source4 |W|WW
+-+
And then later
Processing "window"
+-+
Source1 XXX|X|XXXXXXXXXX
Source2 YYY|Y|YYYY
Source3 ZZZ|Z|ZZZZZZZ
Source4 WWW| |
+-+
And then later
Processing "window"
+-+
Source1 XXXXXXXXX|X|XXXX
Source2 YYYYYYYY | |
Source3 ZZZZZZZZZ|Z|Z
Source4 WWW | |
+-+
The result consuming order will be:
X Y Z W X Y Z W X Y Z W X Y Z X Y Z X Y Z X Y Z X Y Z X Z X Z X Z X X X
X,Y,Z,W then
X,Y,Z,W then
X,Y,Z,W then
X,Y,Z
...
X,Z
...
This way, even if one source is "spamming", all the messages from the other sources still have a chance to be consumed.
For technical/financial reasons I need to consume only 1 message at a time.
The consumer is much slower than the producers, and the producers publish a lot, but only occasionally.
If each source published to an exchange bound to the same queue, the result might be XXXXXXXXXXXXXX YYYYYYYY ZZZZZZZZZZZ WWW or
XXXXX Y XXXXX YYY XXX YYYY ZZZZZZZZZZZ WWW (depending on the publish rate of each source)
I think what you want can be achieved simply by running a single script that subscribes to all the queues.
The key requirement is to use a single application thread to handle all messages, regardless of which queue they arrive from. What that looks like will vary depending on what language and client library you're using - if you're using PHP, you'd have to really go out of your way not to be single-threaded, but maybe there are some client libraries that assume each callback is on a separate worker thread, and you'll need some shared resource for them to block on.
In terms of the actual RabbitMQ side of things, you will need to:
register a subscription for the server to push messages to you, with basic.consume; this is generally recommended over explicitly polling with basic.get anyway
use a single "channel" for all the basic.consume calls
use manual acknowledgements so that messages remain in the queue until your process has finished
set a per-queue prefetch limit of 1 with basic.qos
If you have 4 queues, A, B, C, and D, which have varying amounts of messages when you start the consumer:
When you first subscribe, the prefetch limit will mean that one message from each queue will be sent to the channel; call them A1, B1, C1, and D1
The client library will raise an asynchronous event in your application for each of these in turn
Your single worker thread will handle the first of these events, and start processing message A1
Until you manually acknowledge that message, no other messages can arrive
Once you acknowledge the first message (A1), a new message can be pre-fetched from that queue (A2)
Meanwhile, your worker thread will unblock and handle the next event which was already raised, for message B1
Only once you've processed the pending events for B1, C1, and D1 will the worker thread see the event for message A2
As long as the queues have messages waiting, they will be processed in round-robin fashion. Even if all but one of the queues become empty, they will slot back into rotation as soon as a message arrives, because only one message from the busy queue will have been pre-fetched; the rest will just be waiting on the RabbitMQ server.
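For illustration, here is a minimal sketch of that setup using the Python pika client. The queue names are hypothetical, and the single BlockingConnection thread plays the role of the single worker thread described above:

import pika

QUEUES = ["source1", "source2", "source3", "source4"]  # hypothetical names

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()  # one channel shared by all four consumers

# One unacknowledged message per consumer (i.e. per queue) may be in flight.
channel.basic_qos(prefetch_count=1)

def handle(ch, method, properties, body):
    print("processing", body)  # stand-in for the slow, one-at-a-time work
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack releases the next prefetch

for queue in QUEUES:
    channel.basic_consume(queue=queue, on_message_callback=handle)

# BlockingConnection dispatches every callback on this single thread,
# which produces the round-robin interleaving described above.
channel.start_consuming()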

Can RTMP multiplex messages within a single chunk stream?

Reading the RTMP specification, in an effort to write a rudimentary RTMP server, I can't tell whether multiple messages (message stream id) can be sent over the same chunk stream (chunk stream id).
Section 5.3.2 shares two examples: one where multiple messages with the same stream id are sent sequentially over multiple chunks for a single chunk stream id and one where a single message is sent over multiple chunks for a single chunk stream id.
But there's no example demonstrating multiple messages with different stream ids being sent concurrently over multiple chunks for a single chunk stream id. I can't find anything that would prevent this, but I'd like confirmation.
For example, say you have two messages like in example 2:
+---------+-------------------+-----------------+------+--------+
|         | Message Stream ID | Message Type ID | Time | Length |
+---------+-------------------+-----------------+------+--------+
| Msg # 1 | 27                | 9 (video)       | 1000 | 307    |
+---------+-------------------+-----------------+------+--------+
| Msg # 2 | 42                | 9 (video)       | 1000 | 197    |
+---------+-------------------+-----------------+------+--------+
Can the RTMP client send the following sequence of chunks?
Type 0 chunk for 27
Type 0 chunk for 42
Type 3 chunk for 27
Type 3 chunk for 27 (completes Msg # 1)
Type 3 chunk for 42 (completes Msg # 2)
In other words, is chunk 3 expected to use the header from chunk 1 or from chunk 2 (i.e., chosen based on message stream id)?
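Not an authoritative answer, but it may help to look at the state a receiver has to keep. A demultiplexer caches the most recent full header per chunk stream id, and a Type 3 chunk carries no message header of its own, so under that reading the second Type 0 header would overwrite the state the first message's Type 3 chunks depend on. A sketch of that bookkeeping (my reading of §5.3.1, not something the spec's examples confirm):

# Per-chunk-stream header cache, as a receiver would keep it (sketch).
# Assumes a full header has always arrived on a csid before any Type 3 chunk.
last_header = {}  # csid -> dict with msg_stream_id, msg_type_id, timestamp, length

def on_chunk(csid, fmt, header=None):
    if fmt == 0:
        # Full message header: replaces any previous state for this csid.
        last_header[csid] = header
    # fmt == 3 carries no message header; it can only reuse whatever
    # header was most recently stored for this chunk stream id.
    return last_header[csid]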

How to Optimize Google BigQuery Bytes Billed

I have recently discovered Google BigQuery and its open datasets. Upon performing the following query on the 311_service_requests table in the new_york dataset, the cloud console reports the bytes billed to be 130 MB.
SQL Query:
SELECT unique_key FROM `bigquery-public-data.new_york.311_service_requests` LIMIT 10
Query Returns:
+------+-------------+
| Rows | unique_key |
+------+-------------+
| 1 | 37911459 |
| 2 | 38162601 |
| 3 | 32560181 |
| 4 | 38259076 |
| 5 | 36034528 |
| 6 | 36975822 |
| 7 | 38028455 |
| 8 | 37993135 |
| 9 | 37988664 |
| 10 | 35382611 |
+------+-------------+
For a query returning such a small amount of data, why is the bytes billed valued at 130 MB?
Is there a way to optimize this? Should the results of a query be stored in another database for later retrieval?
why is the bytes billed valued at 130 MB?
Query pricing refers to the cost of running your SQL commands and user-defined functions. BigQuery charges for queries by using one metric: the number of bytes processed (also referred to as bytes read). You are charged for the number of bytes processed whether the data is stored in BigQuery or in an external data source such as Cloud Storage, Google Drive, or Cloud Bigtable.
When you run a query, you're charged according to the total data processed in the columns you select, even if you set an explicit LIMIT on the results. The total bytes per column is calculated based on the types of data in the column. For more information about how we calculate your data size, see Data size calculation.
Query pricing is based on your usage pattern: a monthly flat rate for queries or pricing based on interactive queries. Enterprise customers generally prefer flat-rate pricing for queries because that model offers consistent month-to-month costs. On-demand (or interactive) pricing offers flexibility and is based solely on usage.
You can see more at https://cloud.google.com/bigquery/pricing
So, in your case, 130 MB is the size of the unique_key column.
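You can confirm this before running anything: a dry run reports the bytes a query would process without executing or billing it. A minimal sketch with the google-cloud-bigquery Python client (assumes your credentials and default project are already configured):

from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT unique_key
FROM `bigquery-public-data.new_york.311_service_requests`
LIMIT 10
"""

# dry_run estimates bytes processed without running (or billing) the query.
config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(sql, job_config=config)
print("Would process", job.total_bytes_processed, "bytes")  # ~130 MB here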
Should the results of a query be stored in another database for later retrieval?
Sure.
You can do so to manage cost for subsequent processing of that small result without touching the original data.
Keep in mind that this will incur storage charges; see the same link above for details.
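As a sketch of that approach with the same Python client (the destination project, dataset, and table names are made up, and the dataset must already exist):

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical destination table in your own project; create the dataset first.
config = bigquery.QueryJobConfig(destination="my-project.my_dataset.request_keys")

client.query(
    "SELECT unique_key "
    "FROM `bigquery-public-data.new_york.311_service_requests` LIMIT 10",
    job_config=config,
).result()  # wait for the job; later reads hit the small table, not the 130 MB column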

What is the meaning of the "Load" column in Apache balancer-manager?

I've set up an Apache (2.4) load balancer, which is working okay. To monitor its performance, I enabled the balancer-manager handler, which shows the status of the balancers.
I noticed a "Load" column, which was not present in version 2.2, with a value that may be negative, but I don't understand its meaning, nor was I able to find any documentation about it.
Can anyone explain the meaning of that value or point me to the right documentation?
I now understand how the calculation of "Load" works. Here is, I think, a simpler example than the one in the Apache documentation.
Let's say we have 3 workers and a configured load factor of 1.
1) Start
a | b | c
--+---+---
0 | 0 | 0
add the load factor of 1 to all workers
a | b | c
--+---+---
1 | 1 | 1
now select the one with the highest value --> a, and decrease it by the sum of all load factors (= 3); this is the selected worker
a | b | c
---+---+---
-2 | 1 | 1
2) next round: add 1 to all again
a | b | c
---+---+---
-1 | 2 | 2
now select the one with the highest value --> b, and decrease it by the sum of all load factors (= 3); this is the selected worker
a | b | c
---+----+----
-1 | -1 | 2
3) next round: add 1 again
a | b | c
---+----+----
0 | 0 | 3
now select the one with the highest value --> c, and decrease it by the sum of all load factors (= 3); this is the selected worker
a | b | c
---+----+----
0 | 0 | 0
start over again :)
I hope this helps others.
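The same bookkeeping in a few lines of Python, in case the tables are easier to follow as code (worker names and factors taken from the example above):

# Each round: every worker gains its load factor; the highest lbstatus
# wins and is then decremented by the sum of all load factors.
def select_worker(workers):
    total = sum(w["lbfactor"] for w in workers)
    for w in workers:
        w["lbstatus"] += w["lbfactor"]
    chosen = max(workers, key=lambda w: w["lbstatus"])
    chosen["lbstatus"] -= total
    return chosen["name"]

workers = [{"name": n, "lbfactor": 1, "lbstatus": 0} for n in ("a", "b", "c")]
print([select_worker(workers) for _ in range(6)])  # ['a', 'b', 'c', 'a', 'b', 'c']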
The Load value is populated by lbstatus based on this line of code:
ap_rprintf(r, "<td>%d</td><td>", worker->s->lbstatus);
in https://svn.apache.org/viewvc/httpd/httpd/trunk/modules/proxy/mod_proxy_balancer.c?view=markup#l1767 (the line number may change as the code is modified).
Since your method is byrequests, lbstatus is specified by mod_lbmethod_byrequests, which defines:
lbstatus is how urgent this worker has to work to fulfill its quota of
work.
Details on the algorithm can be found here: https://httpd.apache.org/docs/2.4/mod/mod_lbmethod_byrequests.html
I too want to know the description of the other columns, like BUSY, ELECTED, etc. My LB has BUSY over 100 already; I thought BUSY should not exceed 100 (as in 100% server busyness or something).