How to write a large CSV file to SFTP in Mule 4 - mule

I am trying to write a large CSV file to SFTP.
I used a For Each to split the records and wrote them using the SFTP connector.
But the file never reaches the SFTP server.
What am I doing wrong here?
Below is the flow:
<flow name="sftp-Flow" doc:id="294e7265-0bb3-466b-add4-5819088bd33c">
    <file:listener doc:name="File picked from path" directory="${processing.folder}"
                   config-ref="File-Inbound" autoDelete="true" matcher="filename-regex-filter"
                   doc:id="bbfb12df-96a4-443f-a137-ef90c74e7de1" outputMimeType="application/csv"
                   primaryNodeOnly="true" timeBetweenSizeCheck="1" timeBetweenSizeCheckUnit="SECONDS">
        <repeatable-in-memory-stream initialBufferSize="1" bufferSizeIncrement="1" maxBufferSize="500" bufferUnit="MB"/>
        <scheduling-strategy>
            <fixed-frequency frequency="${file.connector.polling.frequency}"/>
        </scheduling-strategy>
    </file:listener>
    <set-variable value="#[attributes.fileName]" doc:name="fileName - Variable"
                  doc:id="5f064507-be62-4484-86ea-62d6cfb547fc" variableName="fileName"/>
    <foreach doc:name="For Each" doc:id="87b79f6d-1321-4231-bc6d-cffbb859d94b" batchSize="500" collection="#[payload]">
        <sftp:write doc:name="Push file to SFTP" doc:id="d1562478-5276-4a6f-a7fa-4a912bb44b8c"
                    config-ref="SFTP-Connector" path='#["${sftp.remote.folder}" ++ "/" ++ vars.fileName]' mode="APPEND">
            <reconnect frequency="${destination.sftp.connection.retries.delay}" count="${destination.sftp.connection.retries}"/>
        </sftp:write>
    </foreach>
    <error-handler ref="catch-exception-strategy"/>
</flow>

I have found the solution. The foreach scope only supports collections in Java, JSON, or XML formats. I just placed a transformer to convert the CSV to JSON before the foreach, and now the file is properly written in batches.
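A minimal sketch of that fix, assuming a DataWeave transform placed between the File listener and the foreach (doc:name is illustrative):

```xml
<!-- Convert the CSV payload to JSON so foreach can iterate over it -->
<ee:transform doc:name="CSV to JSON">
    <ee:message>
        <ee:set-payload><![CDATA[%dw 2.0
output application/json
---
payload]]></ee:set-payload>
    </ee:message>
</ee:transform>
```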

Instead of splitting the payload into records, try setting the CSV reader to streaming mode:
outputMimeType="application/csv; streaming=true"
Update: the best solution might just be to remove both the foreach and the outputMimeType attribute from the File listener. The file will then be read and written as binary, streaming directly into the SFTP Write operation. Removing outputMimeType prevents Mule from trying to parse the big file as CSV, which is not needed: the only CSV processing in the flow was the foreach, which is no longer required. This method will be faster and consume fewer resources.
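A minimal sketch of that simplified flow, reusing the config refs and properties from the question (no outputMimeType, no foreach, so the binary payload streams straight through):

```xml
<flow name="sftp-Flow">
    <file:listener doc:name="File picked from path" directory="${processing.folder}"
                   config-ref="File-Inbound" autoDelete="true" matcher="filename-regex-filter"
                   primaryNodeOnly="true" timeBetweenSizeCheck="1" timeBetweenSizeCheckUnit="SECONDS">
        <scheduling-strategy>
            <fixed-frequency frequency="${file.connector.polling.frequency}"/>
        </scheduling-strategy>
    </file:listener>
    <!-- No parsing, no splitting: the file content is streamed to SFTP as-is -->
    <sftp:write doc:name="Push file to SFTP" config-ref="SFTP-Connector"
                path='#["${sftp.remote.folder}" ++ "/" ++ attributes.fileName]'/>
</flow>
```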

Related

How to write the files which are present in one folder to another folder with the same name and based on timestamp using Mule

Requirement: we have two different folders, e.g. input and output.
The input folder contains different files. We need to write the files to the output folder, with the same file names, ordered by creation timestamp.
Note: files should be written based on the time they were created, e.g. first in, first out.
You can use the List operation of the File connector. It returns an array with one entry per matching file in the directory; each entry contains the file's contents in its payload, plus attributes like creationTime.
You can sort the list by that attribute with a DataWeave expression, for example payload orderBy $.attributes.creationTime.
Then iterate over the result with a foreach, writing each entry as a separate file using the Write operation.
Example:
<file:list doc:name="List" directoryPath="someDirectory"/>
<ee:transform doc:name="Transform Message">
    <ee:message>
        <ee:set-payload><![CDATA[%dw 2.0
output application/java
---
payload orderBy $.attributes.creationTime]]></ee:set-payload>
    </ee:message>
</ee:transform>
<foreach doc:name="For Each">
    <file:write ... />
</foreach>

ND-JSON Split in SFTP

I have a large ND-JSON file in SFTP (~20K lines). Is there a way to generate sub-files out of it (~500 lines each) and place them in another folder on the SFTP server?
Does Mule 4 have the capability to split a large file and write it to SFTP? Or is there a need for a Java component?
Please advise.
If the input file is parsed as NDJSON, you can use the DataWeave function divideBy() from the dw::core::Arrays module to separate the array read from the file into subarrays of n elements.
Example:
%dw 2.0
output application/java
import * from dw::core::Arrays
---
payload divideBy 500
Then you should be able to use a foreach to process each segment and write each one out as a separate NDJSON file.
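A minimal sketch of that second step, assuming the divided payload from above; the config ref, target folder, and naming scheme are illustrative:

```xml
<foreach doc:name="For Each segment">
    <!-- Serialize the current subarray as NDJSON (one JSON object per line) -->
    <set-payload value="#[output application/x-ndjson --- payload]"/>
    <!-- vars.counter is the implicit foreach iteration counter, used to name each segment -->
    <sftp:write doc:name="Write segment" config-ref="SFTP-Config"
                path='#["output/segment-" ++ (vars.counter as String) ++ ".ndjson"]'/>
</foreach>
```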

Apache NiFi - Process content of a CSV S3Object

I am trying to process a CSV stored in an S3 bucket with Apache NiFi.
For this aim I am using the following flow:
The thing is that I need to replace some text in the CSV file, but what I get as the output of FetchS3Object is a file, not text.
How can I access the text of the S3Object?
Thanks!
Finally I found a way to fix the problem.
Using the SplitText processor after FetchS3Object, it is possible to split the content line by line (Line Split Count = 1) and process each line after that processor.

How to write a file object to outbound endpoint in Mule

I have a <file:inbound-endpoint> which reads a large file and passes it to a Java component, which splits the large file into multiple smaller files. I add all these smaller files to a list and return the list from the Java component into the Mule flow.
Now, in the Mule flow, I am using <collection-splitter> or <foreach> to output those files to the <file:outbound-endpoint>.
The problems are that:
It outputs only a single file (it overwrites the file instead of using the original filename for the output file).
The content of the file is the filename, not the file content.
You need to add a file:file-to-byte-array-transformer after you've split the List<File> and before file:outbound-endpoint so Mule will read the actual content of the java.io.File.
You need to define an outputPattern on the file:outbound-endpoint, using a MEL expression to construct a unique file name based on the properties of the in-flight message and also on other expressions, like timestamp or a UUID, whatever fits your needs.
For the 1st part, I did as @David suggested and added file:file-to-byte-array-transformer.
For the 2nd part, to make the name of the file written to <file:outbound-endpoint> match the name assigned when the file was created, I did the following:
<foreach>
    <set-variable variableName="fname" value="#[payload.path]"/>
    <logger level="INFO" message="fname is: #[fname]"/>
    <file:file-to-byte-array-transformer/>
    <file:outbound-endpoint path="${file.csv.path}" outputPattern="#[fname]"/>
</foreach>
Get the file name before converting the file to a byte array: after the conversion it's no longer available in #[payload], though you may still get it from #[originalPayload].

Mule unzip transformer for processing zip file contents

I need to unzip a zip file, so I am looking for an unzip transformer similar to gzip-uncompress-transformer.
<sub-flow name="unzip" doc:name="unzip">
    <gzip-uncompress-transformer></gzip-uncompress-transformer>
    <logger level="INFO" message="Unzipped payload" doc:name="Logger"/>
    <byte-array-to-string-transformer doc:name="Byte Array to String"/>
    <logger message="Payload is #[payload]" level="INFO" doc:name="Logger"/>
</sub-flow>
Does Mule provide such a transformer out of the box, or do I need to write a custom one?
I do not believe Mule has a zip transformer, due to the number of files that can result from using it: decompressing a single zip could produce X files (one input file results in many output files), whereas a gzip transformer is always 1-to-1 (one input file results in one output file).
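If you do end up writing a custom transformer, its core could be a plain java.util.zip loop like the sketch below. This is a minimal, Mule-agnostic example (class and method names are illustrative, not a Mule API); it makes the 1-to-many problem explicit by returning a map of entry names to contents:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class Unzipper {

    // Expands a zip archive (as bytes) into a map of entry name -> entry content.
    // One input archive can yield many outputs, which is exactly why a simple
    // 1-to-1 transformer like gzip-uncompress-transformer does not fit zip.
    public static Map<String, byte[]> unzip(byte[] zipBytes) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int read;
                while ((read = zis.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                entries.put(entry.getName(), out.toByteArray());
            }
        }
        return entries;
    }
}
```

A Mule custom transformer would wrap this logic and then decide how to hand the resulting collection to the rest of the flow (e.g. one message per entry).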