I know that Message Queuing (MSMQ) has a message size limit of 4 MB, but I have recently run into situations where it will be necessary for me to support messages greater than 4 MB. I have seen it mentioned that it is possible to use a transactional queue, split a message into 'chunks', and reassemble them on the receiving end, but I have seen very little information on how to accomplish this. The messages I am sending contain SQL record data formatted as XML (we use some nvarchar(MAX) and varbinary(MAX) fields, which is why the size limit is an issue). Any assistance in accomplishing this would be most appreciated!
Microsoft has documented a code sample here: https://support.microsoft.com/en-us/kb/198686
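The KB sample is .NET-specific, but the core idea is language-agnostic: serialize the message, split the bytes into fixed-size chunks tagged with a sequence number and total count, send each chunk as its own message inside one transaction, and concatenate the chunks in order on the receiving side. A rough sketch of just the split/reassemble step (the helper class and the 3 MB chunk size are illustrative choices, not part of the KB sample; the MSMQ send/receive calls and chunk header fields are left out):

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: splits a serialized message into chunks small enough
// for MSMQ's 4 MB limit and reassembles them on the receiving side.
public class MessageChunker {

    // Keep chunks comfortably below the 4 MB MSMQ message size limit.
    static final int CHUNK_SIZE = 3 * 1024 * 1024;

    // Split the payload into pieces of at most CHUNK_SIZE bytes.
    static List<byte[]> split(byte[] payload) {
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < payload.length; offset += CHUNK_SIZE) {
            int length = Math.min(CHUNK_SIZE, payload.length - offset);
            byte[] chunk = new byte[length];
            System.arraycopy(payload, offset, chunk, 0, length);
            chunks.add(chunk);
        }
        return chunks;
    }

    // Reassemble chunks received in order (a transactional queue preserves
    // ordering, so sequence numbers serve mainly as a sanity check).
    static byte[] reassemble(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

In practice each chunk would be sent as a separate MSMQ message within a single transaction, with the correlation ID and sequence number carried in the message properties, as in the KB sample.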
I am trying to understand how streaming works with respect to Mule 4.4.
I am reading a large file and am using 'Repeatable file store stream' as the streaming strategy.
'In memory size' = 128 KB
The file is 24 MB, and for the sake of argument let's say 1000 records are equivalent to 128 KB,
so about 1000 records will be stored in memory and the rest will be written to the file store by Mule.
Here's the flow:
At stage #1 we are reading the file.
At stage #2 we are logging the payload - so I am assuming that initially 128 KB worth of data is logged, and internally Mule will move the rest of the data from the file store into memory so it can also be written to the log.
Question: so does the heap memory usage increase from 128 KB to 24 MB?
I am assuming no, but I need confirmation.
At stage #3 we are using a Transform script to create a JSON payload.
So what happens here:
Is the JSON payload now entirely in memory (say 24 MB)?
What has happened to the stream?
So I am really struggling to understand how streaming is beneficial if during transformation the data is stored in memory.
Thanks
It really depends on how each component works, but usually logging means loading the full payload into memory. Having said that, logging 'big' payloads is considered bad practice and you should avoid doing it in the first place. Even logs of a few KB are really not a good idea; logs are not intended to be used that way. Logging, like any computational operation, has a cost in processing and resource usage. I have seen people cause out-of-memory errors or performance issues several times because of excessive logging.
The case with the Transform component is different. In some cases it is able to benefit from streaming, depending on the format used and the script. Sequential access to records is required for streaming to work: if you try indexed access into the 24 MB payload (for example payload[14203]), it will probably load the entire payload into memory. Also, referencing the payload more than once in a step may fail, because streamed records are consumed as they are read, so it is not possible to use them twice.
Streaming in DataWeave is not the default; it needs to be enabled with the reader property streaming=true.
You can find more details in the documentation for DataWeave Streaming and Mule Streaming.
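Mule's repeatable streams are not plain Java streams, but the memory trade-off they rely on is the same one you can see with ordinary java.io: processing a file record by record keeps the footprint roughly constant, while anything that needs the whole payload at once (logging it, indexed access, a non-streamable transformation) forces it to be fully materialized in the heap. A rough analogy outside of Mule, just to illustrate the difference:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingVsBuffering {

    // Sequential, streaming-style processing: only one line is held in
    // memory at a time, regardless of whether the file is 24 MB or 24 GB.
    static long countRecords(Path file) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            while (reader.readLine() != null) {
                count++; // process one record, then forget it
            }
        }
        return count;
    }

    // "Log the whole payload" style processing: the entire file has to be
    // materialized in the heap before anything can be done with it.
    static String loadWholePayload(Path file) throws IOException {
        return Files.readString(file); // ~24 MB on the heap for a 24 MB file
    }
}
```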
I am running an application that produces JSON data to ActiveMQ and another process that consumes it and does some processing on this data. But as the JSON data I am producing to the queue becomes larger, I get a broken pipe exception. Is there any limit on the size of the data I can store/produce in ActiveMQ? Any help will be greatly appreciated.
Thanks
Well, you can configure a max frame size on the transport connector in ActiveMQ. By default in recent versions, it's around 100 MB. Anyway, when you have messages of that size, you should think about splitting your data into smaller chunks.
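For reference, the frame size limit is the wireFormat.maxFrameSize option on the transport connector. A minimal sketch with an embedded broker (for a standalone broker the same option goes on the transportConnector URI in activemq.xml); the 256 MB figure is just an illustrative value:

```java
import org.apache.activemq.broker.BrokerService;

public class BrokerWithLargerFrames {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();
        broker.setPersistent(false); // keep the example self-contained

        // Raise the OpenWire frame size limit from the ~100 MB default
        // to 256 MB so larger messages are not rejected at the transport.
        broker.addConnector(
            "tcp://0.0.0.0:61616?wireFormat.maxFrameSize=268435456");

        broker.start();
        broker.waitUntilStopped();
    }
}
```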
Check the ActiveMQ logs as well; they may give you a clue whether it's the frame size limit being hit or something else. 'Broken pipe' simply means that the connection was broken for some reason, so that message alone does not say much.
BigQuery has officially become our device log data repository and live monitoring/analysis/diagnostics base. As one step further, we need to measure and monitor data streaming performance. Is there any relevant benchmark you are using for BigQuery live streaming? Which relevant ones can I refer to?
Since streaming has a limited payload size (see the Quota policy), it's easier to talk about times and other side effects.
We measure between 1200-2500 ms for each streaming request, and this was consistent over the last month as you can see in the chart.
We have seen several side effects, though:
the request randomly fails with type 'Backend error'
the request randomly fails with type 'Connection error'
the request randomly fails with type 'timeout'
some other error messages are non-descriptive, and they are so vague that they don't help you; just retry.
we see hundreds of such failures each day, so they are pretty much constant, and not related to Cloud health.
For all of these we opened cases with paid Google Enterprise Support, but unfortunately they didn't resolve them. It seems the recommended approach is exponential backoff with retry; even support told us to do so. Which personally doesn't make me happy.
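For what it's worth, the backoff-and-retry they recommend looks roughly like this with the BigQuery Java client library (the dataset/table names, row contents, and retry limits here are made-up placeholders):

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.TableId;
import java.util.Map;

public class StreamingInsertWithBackoff {

    static void insertWithRetry(BigQuery bigquery, Map<String, Object> row)
            throws InterruptedException {
        TableId table = TableId.of("device_logs", "events"); // placeholder names
        long backoffMillis = 500;                            // initial backoff

        for (int attempt = 0; attempt < 6; attempt++) {
            try {
                InsertAllResponse response = bigquery.insertAll(
                        InsertAllRequest.newBuilder(table).addRow(row).build());
                if (!response.hasErrors()) {
                    return; // success
                }
                // Per-row errors (schema problems etc.) usually won't go away
                // with a retry; a real application should log them and give up.
            } catch (BigQueryException e) {
                // Backend error / connection error / timeout: retry below.
            }
            Thread.sleep(backoffMillis);
            backoffMillis *= 2; // exponential backoff
        }
        throw new RuntimeException("streaming insert failed after retries");
    }

    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        insertWithRetry(bigquery, Map.of("device_id", "abc-123", "level", "INFO"));
    }
}
```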
UPDATE
Someone requested new stats in the comments, so I posted the 2017 numbers. It's still the same; there was some heavy data reorganization on our side (you can see the spike), but essentially it's still around 2 seconds per request if you use the maximum streaming insert payload.
If a producer is sending messages of a large size (let's say 120 MB), how do KahaDB and LevelDB handle such messages?
KahaDB: what I understand is that the journal size is 32 MB by default. If I send a message of more than 32 MB, how will it handle such a message? Do I need to change this size to an appropriate value according to the message size?
LevelDB: by default 100 MB is the size used to store message data, after which rolling happens. If a message is more than 100 MB, how does it handle it?
Thanks,
ANuj
For KahaDB there is a rolling log of messages: messages and commands are stored in data files of fixed length, and if the length exceeds the size of the message journal, a new file is created.
KahaDB just appends the new message to the existing journal and takes care of creating a new journal file when needed.
Also, KahaDB holds indexes to messages in the form of a B-tree. These B-tree indexes hold references to the messages in the data logs, indexed by their message ID. In short, KahaDB knows exactly where your message is stored with the help of this index, so adding any new configuration just to store this message should not be required.
Regarding whether the whole message ends up in a single data log file, I am not sure; a bit of research may be needed.
Before trying your luck at changing the journal file size (journalMaxFileLength) for KahaDB, please go through this link (reading the comments there might be helpful).
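If you do decide to change it, the setting is journalMaxFileLength on the KahaDB persistence adapter (an attribute on the kahaDB element in activemq.xml). A rough sketch of the same thing on an embedded broker, with 128 MB as an arbitrary illustrative value:

```java
import java.io.File;
import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter;

public class BrokerWithLargerJournal {
    public static void main(String[] args) throws Exception {
        KahaDBPersistenceAdapter kahaDB = new KahaDBPersistenceAdapter();
        kahaDB.setDirectory(new File("activemq-data/kahadb"));
        // Journal data files are 32 MB by default; raise them so a single
        // large message does not have to span many journal files.
        kahaDB.setJournalMaxFileLength(128 * 1024 * 1024);

        BrokerService broker = new BrokerService();
        broker.setPersistenceAdapter(kahaDB);
        broker.addConnector("tcp://0.0.0.0:61616");
        broker.start();
        broker.waitUntilStopped();
    }
}
```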
Good luck!
Hope it helps.
If I declare a queue with x-max-length, messages will be dropped or dead-lettered from the head of the queue once the limit is reached.
I'm wondering if, instead of dropping or dead-lettering, RabbitMQ could activate the flow control mechanism, like the memory/disk watermarks do. The reason is that I want to preserve message order (FIFO behaviour at submission time), and it would be much more convenient to slow down the producers.
Try implementing the queue length limit at the application level. For example, increment/decrement a Redis key and check it against a maximum value, as in the sketch below. It might not be as accurate as the native RabbitMQ mechanism, but it works pretty well for a separate queue/exchange without affecting other ones on the same broker.
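A rough sketch of that idea with Jedis and the RabbitMQ Java client (the key name, limit, queue name, and back-off delay are arbitrary placeholders):

```java
import com.rabbitmq.client.Channel;
import redis.clients.jedis.Jedis;

public class GatedPublisher {

    static final String LENGTH_KEY = "queue:myqueue:length"; // placeholder key
    static final long MAX_LENGTH = 10_000;                    // placeholder limit

    // Producer side: wait instead of letting messages be dropped at the broker.
    static void publish(Channel channel, Jedis redis, byte[] body)
            throws Exception {
        while (redis.incr(LENGTH_KEY) > MAX_LENGTH) {
            redis.decr(LENGTH_KEY); // undo the optimistic increment
            Thread.sleep(100);      // back off until the consumer catches up
        }
        channel.basicPublish("", "myqueue", null, body);
    }

    // Consumer side: decrement after each message is processed and acked.
    static void onMessageProcessed(Jedis redis) {
        redis.decr(LENGTH_KEY);
    }
}
```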
P.S. Alternatively, for some tasks RabbitMQ is not the best choice and old-school relational databases (MySQL, PostgreSQL, or whatever you like) work best, with RabbitMQ still used as an event bus.
There are two open issues related to this topic on the rabbitmq-server GitHub repo. I recommend expressing your interest there:
Block publishers when queue length limit is reached
Nack messages that cannot be deposited to all queues due to max length reached