Here is my flow graph:
File source > Throttle > Packet encoder > Packed to unpacked > Packet decoder > File sink.
No matter what I do, the final 1 or 2 packets (depending on the number of bytes from the file source) don't get written to the file sink. The problem is the same if I replace the file source and file sink with a TCP source and TCP sink.
I think it is an issue with the Packet encoder and decoder. Any idea how to fix this?
This issue is probably related to the internal buffering of each block, or to the buffering of the file sink. Try decreasing the number of buffered items in each block and/or setting the Unbuffered option of the file sink to On.
Another solution would be to choose the No GUI and Run to Completion options from the flowgraph's options. That way, when the file source block reaches the end of the file, it sends a special value to the downstream blocks indicating that the flowgraph is stopping. All buffered items in the flowgraph's blocks should then eventually be flushed.
I have a Flume configuration with a RabbitMQ source, file channel, and Solr sink. Sometimes the sink becomes so busy that the file channel fills up. At that point a ChannelFullException is thrown by the file channel. After 500 ChannelFullExceptions are thrown, Flume gets stuck and never responds or recovers. I want to know where the value 500 comes from, and how I can change it. The 500 is consistent: whenever Flume gets stuck, I count the exceptions and find exactly 500 ChannelFullException log lines every time.
You are walking into a typical producer-consumer problem, where one side is working faster than the other. In your case, there are two possibilities (or a combination of both):
RabbitMQ is sending messages faster than Flume can process.
Solr cannot ingest messages fast enough so that they remain stuck in Flume.
The solution is to send messages more slowly (i.e., throttle RabbitMQ) or to tweak Flume so that it can process messages faster. I think the latter is what you want. Furthermore, the unresponsiveness of Flume is probably caused by the Java heap being full. Increase the heap size and try again until the error disappears.
# Modify java maximum memory size
vi bin/flume-ng
JAVA_OPTS="-Xmx2048m"
Additionally, you can increase the number of agents or channels, or the capacity of those channels. This will naturally cause a higher footprint on the Java heap, so try increasing the heap size first.
# Example configuration
agent1.channels = ch1
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000
agent1.channels.ch1.transactionCapacity = 10000
agent1.channels.ch1.byteCapacityBufferPercentage = 20
agent1.channels.ch1.byteCapacity = 800000
I don't know where the exact number 500 comes from; a wild guess would be that after 500 exceptions are thrown the Java heap is full and Flume stops responding.
Another possibility is that the default configuration above happens to produce exactly 500, so try tweaking it and see whether the number changes or, better, the error no longer occurs.
I have a huge file that can have anywhere from a few hundred thousand to 5 million records. It's a tab-delimited file. I need to read the file from an FTP location, transform it, and finally write it to an FTP location.
I was going to use the FTP connector to get a repeatable stream and feed it into a Mule batch job. Inside the batch process, the idea was to use a batch step to transform the records and finally, in a batch aggregator, FTP-write the file to the destination in append mode, 100 records at a time.
Q1. Is this a good approach or is there some better approach?
Q2. How does the Mule batch load and dispatch phase work (https://docs.mulesoft.com/mule-runtime/4.3/batch-processing-concept#load-and-dispatch)? Is it waiting for the entire stream of millions of records to be read into memory before dispatching a batch instance?
Q3. While doing the FTP write in the batch aggregator, there is a chance that parallel threads will start appending content to the FTP file at the same time, thereby corrupting the records. Is that avoidable? I read about file locks (https://docs.mulesoft.com/ftp-connector/1.5/ftp-write#locks). My assumption is that it will simply raise a file lock exception and not necessarily wait to write to FTP in append mode.
Q1. Is this a good approach or is there some better approach?
See the answer to Q3; this might not work for you. You could instead use a foreach and process the file sequentially, though that will increase the processing time significantly.
Q2. How does the Mule batch load and dispatch phase work (https://docs.mulesoft.com/mule-runtime/4.3/batch-processing-concept#load-and-dispatch)? Is it waiting for the entire stream of millions of records to be read into memory before dispatching a batch instance?
Batch doesn't load large numbers of records into memory; it uses file-based queues. And yes, it loads all records into the queue before starting to process them.
Q3. While doing the FTP write in the batch aggregator, there is a chance that parallel threads will start appending content to the FTP file at the same time, thereby corrupting the records. Is that avoidable? I read about file locks (https://docs.mulesoft.com/ftp-connector/1.5/ftp-write#locks). My assumption is that it will simply raise a file lock exception and not necessarily wait to write to FTP in append mode.
The file write operation will throw a FILE:FILE_LOCK error if the file is already locked. Note that Mule 4 doesn't manage errors through exceptions; it uses Mule errors.
If you are using the DataWeave flat-file format to parse the input file, note that it will load the file into memory and use significantly more memory than the file itself to process it, so you are probably going to get an out-of-memory error anyway.
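As a hedged Mule 4 XML sketch (the config name, path, and logger message are illustrative, and the error type is taken from the answer above), writing with a lock and handling the resulting Mule error rather than catching an exception could look roughly like:

```xml
<!-- Attempt the append; "lock" makes concurrent writers fail fast -->
<ftp:write config-ref="FTP_Config" path="out/result.csv" mode="APPEND" lock="true"/>
<error-handler>
    <!-- A locked file surfaces as a Mule error, not a Java exception -->
    <on-error-continue type="FILE:FILE_LOCK">
        <logger level="WARN" message="Target file was locked; write skipped"/>
    </on-error-continue>
</error-handler>
```

In a real flow you would typically retry or buffer the record instead of only logging, but the point is that the lock failure is routed through the error handler, not thrown.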
I have a bunch of XML files, say hundreds, in my source directory. I have made my flow's processing strategy synchronous so that only one XML file is processed at a time, as performance is not a high priority for me. But I do have batch processing in my flow. What I understand is that the flow thread creates a child thread to execute my batch processing, and control moves forward. My whole transformation code lies in the batch processing, which takes 30 seconds to execute one XML. There is not much logic in my main flow except the file inbound endpoint and the batch execute component (to trigger the batch job). So the file inbound endpoint keeps polling for files, and the whole bunch of XMLs gets picked up in very little time, making Mule run out of memory and behave unexpectedly.
I came to know about the fork-join pattern very late, and it may or may not fit my requirement.
So is there any configuration to make my batch process run to completion before picking up the next files? Help me out. I already made the processing strategy synchronous!
Shouldn't you in this case just adjust the polling frequency at the file inbound endpoint?
https://docs.mulesoft.com/mule-user-guide/v/3.7/file-connector
Polling Frequency
(Applies to inbound File endpoints only.)
Specify how often the endpoint should check for incoming messages. The default value is 1000 ms.
Set maxThreadsActive and maxBufferSize
https://docs.mulesoft.com/mule-user-guide/v/3.6/tuning-performance#calculating-threads
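A hedged Mule 3 sketch of slowing down the file inbound endpoint (the paths and the 30-second frequency are illustrative; `pollingFrequency` is the attribute the docs above describe):

```xml
<!-- Poll every 30 seconds instead of the default 1000 ms,
     so a batch job has time to drain before the next pickup -->
<file:inbound-endpoint path="/data/in"
                       pollingFrequency="30000"
                       moveToDirectory="/data/processed"
                       doc:name="File"/>
```

Combined with a lower `maxThreadsActive`, this throttles how quickly files enter the flow rather than changing how batch itself schedules work.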
I have a streamed service. The message returned from the operation has a stream as its only body member, which is a stream to a file in the file system. I wonder if there's a way to record, from the server, how much time it takes the client to consume that file?
One way to go: return from the server not only the stream but also a data structure that contains the file size.
On the client, use a timer and compute progress from the bytes already read versus the elapsed time versus the full file size.
See this example: http://www.codeproject.com/Articles/20364/Progress-Indication-while-Uploading-Downloading-Fi
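A minimal client-side sketch of that idea (plain Java, not WCF-specific; the class and method names are illustrative, and `totalSize` is assumed to come from the server alongside the stream):

```java
import java.io.IOException;
import java.io.InputStream;

public class DownloadTimer {
    // Reads the whole stream, printing progress against the size the
    // server reported, and returns the number of bytes consumed.
    public static long consume(InputStream in, long totalSize) throws IOException {
        long start = System.nanoTime();
        byte[] buf = new byte[8192];
        long read = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            read += n;
            // Progress = bytes read so far vs. the advertised total size
            double pct = totalSize > 0 ? 100.0 * read / totalSize : 0;
            System.out.printf("%.1f%% after %d ms%n",
                    pct, (System.nanoTime() - start) / 1_000_000);
        }
        System.out.printf("Consumed %d bytes in %d ms%n",
                read, (System.nanoTime() - start) / 1_000_000);
        return read;
    }
}
```

The same elapsed-time bookkeeping can be mirrored on the server if it wraps the returned stream and timestamps the first and last read calls.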
What is currently considered state of the art, so to speak, when transferring large files over Apache NMS (using ActiveMQ)? Putting the whole content into a StreamMessage? However, I've seen the naming here is a bit misleading, as the file isn't actually streamed over JMS; the entire content resides in memory (or on disk?) and is sent all at once. Here I ran into problems with files > 100 MB: Apache.NMS.NMSException: Unable to write data to the transport connection: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
BlobMessage is not supported in NMS... I really see no option but to split the file into chunks, re-assemble them on the other side, etc.
Thank you,
Cristian.
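The chunk-and-reassemble idea mentioned in the question can be sketched like this (plain Java, not NMS-specific; the class name, chunk size, and the callback standing in for the message sender are all illustrative):

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.BiConsumer;

public class FileChunker {
    public static final int CHUNK_SIZE = 512 * 1024; // 512 KB per message

    // Splits a file into numbered chunks and hands each one to a sender.
    // In a real flow the sender would publish a BytesMessage carrying the
    // chunk index (and total count) as message properties, so the receiver
    // can reassemble the chunks in order. Returns the number of chunks sent.
    public static int split(File file, BiConsumer<Integer, byte[]> sender) throws IOException {
        int index = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            byte[] buf = new byte[CHUNK_SIZE];
            int n;
            while ((n = in.read(buf)) != -1) {
                byte[] chunk = new byte[n];
                System.arraycopy(buf, 0, chunk, 0, n);
                sender.accept(index++, chunk);
            }
        }
        return index;
    }
}
```

Keeping each chunk well under the broker's socket buffer limits avoids the NMSException above, at the cost of the reassembly bookkeeping on the consumer side.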
How about compressing the payload and using GZIPInputStream to decompress it on the receiving side? For example:

import java.io.ByteArrayInputStream;
import java.util.zip.GZIPInputStream;

GZIPInputStream inputStream = new GZIPInputStream(new ByteArrayInputStream(gzipped));
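For completeness, a minimal round-trip sketch showing the matching compression side with GZIPOutputStream (the class name is illustrative; this only shrinks the payload, it does not remove the need for chunking on very large files):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipUtil {
    // Compresses raw bytes; closing the GZIPOutputStream flushes the trailer.
    public static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    // Decompresses gzipped bytes back to the original payload.
    public static byte[] gunzip(byte[] gzipped) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Text-heavy payloads compress well, so this can keep many messages under the broker's buffer limits, but for files in the 100 MB range you would still combine it with chunking.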