I am running an application which produces JSON data to ActiveMQ, and another process which consumes it and does some processing on this data. But as the JSON data I produce to the queue grows larger, I am getting a broken pipe exception. Is there any limit on the size of the data I can store/produce into ActiveMQ? Any help will be greatly appreciated.
Thanks
Well, you can configure a max frame size on the transport connector in ActiveMQ. By default in recent versions it's around 100 MB. In any case, when you have messages of that size, you should think about splitting your data into smaller chunks.
Check out the ActiveMQ logs as well; they may give you a clue whether it's the frame size limit being hit or something else. Broken pipe simply means the connection was broken for some reason, so that message on its own does not say much.
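As a rough illustration of the chunking approach, here is a minimal JMS producer sketch that splits a large JSON string into smaller messages and tags each chunk so the consumer can reassemble them. The broker URL, queue name, chunk size, and the batchId/chunkIndex/chunkCount properties are all assumptions for the example, not anything ActiveMQ mandates.

import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class ChunkedProducer {
    // Keep each chunk well below the broker's maxFrameSize (1 MB here, chosen arbitrarily)
    private static final int CHUNK_SIZE = 1024 * 1024;

    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616"); // assumed broker URL
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(session.createQueue("json.data")); // assumed queue

        String largeJson = buildLargeJson(); // placeholder for your JSON payload
        int totalChunks = (largeJson.length() + CHUNK_SIZE - 1) / CHUNK_SIZE;
        for (int i = 0; i < totalChunks; i++) {
            String chunk = largeJson.substring(i * CHUNK_SIZE,
                    Math.min((i + 1) * CHUNK_SIZE, largeJson.length()));
            TextMessage msg = session.createTextMessage(chunk);
            // Correlation metadata so the consumer can put the chunks back together
            msg.setStringProperty("batchId", "job-42");
            msg.setIntProperty("chunkIndex", i);
            msg.setIntProperty("chunkCount", totalChunks);
            producer.send(msg);
        }
        connection.close();
    }

    private static String buildLargeJson() {
        return "{\"data\":\"...\"}"; // stand-in; in practice this is your large JSON document
    }
}

On the consuming side you would buffer chunks per batchId until chunkCount of them have arrived, then concatenate them in chunkIndex order before processing.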
I have a Flume configuration with a RabbitMQ source, a file channel, and a Solr sink. Sometimes the sink becomes so busy that the file channel fills up. At that point a ChannelFullException is thrown by the file channel. After 500 ChannelFullExceptions are thrown, Flume gets stuck, never responds, and does not recover by itself. I want to know where that value of 500 comes from and how I can change it. The 500 is consistent, because every time Flume gets stuck I count the exceptions and find exactly 500 ChannelFullException log lines.
You are walking into a typical producer-consumer problem, where one is working faster than the other. In your case, there are two possibilities (or a combination of both):
RabbitMQ is sending messages faster than Flume can process.
Solr cannot ingest messages fast enough so that they remain stuck in Flume.
The solution is to send messages more slowly (i.e. throttle RabbitMQ) or to tweak Flume so that it can process messages faster. I assume the latter is what you want. Furthermore, the unresponsiveness of Flume is probably caused by the Java heap being full. Increase the heap size and try again until the error disappears.
# Modify java maximum memory size
vi bin/flume-ng
JAVA_OPTS="-Xmx2048m"
Additionally, you can increase the number of agents or channels, or the capacity of those channels. This naturally raises the footprint on the Java heap, so increase the heap size first.
# Example configuration
agent1.channels = ch1
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000
agent1.channels.ch1.transactionCapacity = 10000
agent1.channels.ch1.byteCapacityBufferPercentage = 20
agent1.channels.ch1.byteCapacity = 800000
I don't know where the exact number 500 comes from; a wild guess would be that after 500 exceptions are thrown the Java heap is full and Flume stops responding.
Another possibility is that the default configuration above is what makes it exactly 500, so try tweaking those values and see whether the number changes or, better, the problem no longer occurs.
I know that Message Queuing has a message size limit of 4 MB, but I have recently run into situations where it will be necessary for me to support messages that are greater than 4 MB. I have seen it mentioned that it is possible to use a transactional queue, split a message into 'chunks', and then reassemble them on the receiving end, but I have seen very little information on how to accomplish this. The messages I am sending contain SQL record data formatted as XML (we use some nvarchar(MAX) and varbinary(MAX) fields, which is why the size limit is an issue). Any assistance in accomplishing this would be most appreciated!
MSFT have documented a code sample here: https://support.microsoft.com/en-us/kb/198686
If a producer is sending a message of a large size (let's say 120 MB), how do KahaDB and LevelDB handle such messages?
KahaDB: What I understand is that the journal size is 32 MB by default. If I send a message larger than 32 MB, how will it handle that message? Do I need to change this size to an appropriate value according to the message size?
LevelDB: By default, 100 MB is the size used to store message data, after which rolling happens. If a message is more than 100 MB, how does it handle it?
Thanks,
ANuj
For KahaDB there is a rolling log of messages: messages and commands are stored in data files of fixed length, and if a write would exceed the remaining length of the current journal file, a new file is created.
KahaDB just appends the new message to the existing journal and takes care of creating new journal files itself.
Also, KahaDB holds indexes to messages in the form of a BTree. These BTree indexes hold references to the messages in the data logs, indexed by their message ID. In short, KahaDB knows exactly where your message is stored with the help of this index, so adding any new configuration for storing such a message should not be required.
Regarding whether the whole message ends up in a single data log file, I am not sure; a bit of research may be needed.
And before trying your luck at changing the journalMaxFileLength for KahaDB, please go through this
link (reading the comments there might be helpful).
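If you do end up experimenting with the journal file length, one place to set it is on the persistence adapter. The sketch below uses an embedded broker purely for illustration; the data directory, connector URL, and the 128 MB value are assumptions, and the same journalMaxFileLength setting can also be applied to the kahaDB element in activemq.xml.

import java.io.File;
import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter;

public class BrokerWithLargerJournal {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();

        KahaDBPersistenceAdapter kahaDB = new KahaDBPersistenceAdapter();
        kahaDB.setDirectory(new File("activemq-data/kahadb")); // assumed data directory
        // Journal data files are 32 MB by default; raise the limit if you are
        // experimenting with very large messages (the value here is arbitrary).
        kahaDB.setJournalMaxFileLength(128 * 1024 * 1024);
        broker.setPersistenceAdapter(kahaDB);

        broker.addConnector("tcp://localhost:61616"); // assumed listen address
        broker.start();
        broker.waitUntilStopped();
    }
}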
Good luck!
Hope it helps.
I am using LogStash to collect the logs from my service. The volume of the data is so large (20GB/day) that I am afraid that some of the data will be dropped at peak time.
So I asked question here and decided to add a Redis as a buffer between ELB and LogStash to prevent data loss.
However, I am curious about when will LogStash exceed the queue capacity and drop messages?
Because I've done some experiments and the result shows that LogStash can completely process all the data without any loss, e.g., local file --> LogStash --> local file, netcat --> LogStash --> local file.
Can someone give me a concrete example of when Logstash would eventually drop messages? That would give me a better understanding of why we need a buffer in front of it.
As far as I know, Logstash queue is very small. Please refer to here.
Logstash sets each queue size to 20. This means only 20 events can be pending into the next phase.
This helps reduce any data loss and in general avoids logstash trying to act as a data storage
system. These internal queues are not for storing messages long-term.
As you say, your daily log volume is 20 GB, which is quite a large amount. So it is recommended to install Redis in front of Logstash. Another advantage of installing Redis is that when your Logstash process hits an error and shuts down, Redis can buffer the logs for you; otherwise all your logs will be dropped.
The maximum queue size is configurable and the queue can be stored on-disk or in-memory. (Strongly advise in-memory due to high volume).
When the queue is full, logstash will stop reading log messages and drop incoming logs.
For log files, logstash will stop reading further when it can't keep up and can resume reading later. It keeps track of the active log files and the last read position. The files basically act like an enormous buffer, so it's really unlikely to lose data (unless the files are deleted).
For TCP/UDP input, messages can be lost if the queue is full.
For other inputs/outputs, you have to check the doc, whether it can support back pressure, whether it can replay missed messages if a network connection was lost.
Generally speaking, 20 GB a day is pretty low (even in 2014 when this was originally posted); we're talking about roughly 1000 messages a second. logstash really doesn't need a redis in front.
For very large deployments (multiple TB per day), it's common to encounter kafka somewhere in the chain to buffer messages. At this stage there are typically many clients with different types of messages, flowing over a variety of protocols.
I have a message queue named amq570queue; after accumulating 2 million messages it started to slow down. What broker settings do I need to adjust to fix this issue? I temporarily moved it to a new message queue (in the same broker) and it is working fine. I initially thought that KahaDB had reached its size limit and that is why it is getting slow. Is there a way to limit the size of messages dequeued? Thank you in advance for any input.
Regards,
Walter
One possible reason is the producer flow control and systemUsage settings. Take a look at
http://activemq.apache.org/producer-flow-control.html
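For reference, here is a minimal sketch of where those knobs live when a broker is configured programmatically. The limits and the per-queue policy below are placeholder values, not recommendations; the equivalent settings go in the destinationPolicy and systemUsage elements of activemq.xml.

import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.broker.region.policy.PolicyEntry;
import org.apache.activemq.broker.region.policy.PolicyMap;

public class FlowControlTuning {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();

        // Per-destination policy: memory limit and producer flow control.
        PolicyEntry policy = new PolicyEntry();
        policy.setQueue(">");                         // ">" matches all queues
        policy.setMemoryLimit(64L * 1024 * 1024);     // 64 MB per queue (example value)
        policy.setProducerFlowControl(false);         // let messages spool to the store instead of blocking producers
        PolicyMap policyMap = new PolicyMap();
        policyMap.setDefaultEntry(policy);
        broker.setDestinationPolicy(policyMap);

        // Broker-wide systemUsage limits: heap memory, persistent store, temp store.
        broker.getSystemUsage().getMemoryUsage().setLimit(512L * 1024 * 1024);      // 512 MB
        broker.getSystemUsage().getStoreUsage().setLimit(8L * 1024 * 1024 * 1024);  // 8 GB
        broker.getSystemUsage().getTempUsage().setLimit(4L * 1024 * 1024 * 1024);   // 4 GB

        broker.addConnector("tcp://localhost:61616"); // assumed listen address
        broker.start();
        broker.waitUntilStopped();
    }
}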