I have a <file:inbound-endpoint> which reads a large file and passes it to a Java component which splits the large file into multiple smaller files. I add all these smaller files to a list and return the list from the Java component to the Mule flow.
Now, in the Mule flow, I am using <collection-splitter> or <foreach> to output those files to the <file:outbound-endpoint>.
The problem is that:
It outputs only a single file (each file overwrites the previous one, because the original filename is not used for the output file).
The content of the output file is the filename, not the file content.
You need to add a file:file-to-byte-array-transformer after you've split the List<File> and before the file:outbound-endpoint, so Mule will read the actual content of each java.io.File.
You need to define an outputPattern on the file:outbound-endpoint, using a MEL expression to construct a unique file name based on the properties of the in-flight message or on other expressions, like a timestamp or a UUID, whatever fits your needs.
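For example, a sketch of the two changes combined (the output path and the naming pattern here are placeholders, not taken from the question):

<collection-splitter/>
<file:file-to-byte-array-transformer/>
<file:outbound-endpoint path="/some/output/dir"
    outputPattern="#[server.dateTime.format('yyyyMMddHHmmss')]-#[java.util.UUID.randomUUID()].csv"/>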
For the 1st part, I did as @David suggested and added a file:file-to-byte-array-transformer.
For the 2nd part, to make the file name written to the <file:outbound-endpoint> the same as the file name assigned when the file was created, I did the following:
<foreach>
    <set-variable variableName="fname" value="#[payload.path]"/>
    <logger level="INFO" message="fname is: #[fname]"/>
    <file:file-to-byte-array-transformer/>
    <file:outbound-endpoint path="${file.csv.path}" outputPattern="#[fname]"/>
</foreach>
Get the file name before converting the file to a byte array: after the conversion it's no longer available in #[payload], though you may still get it from #[originalPayload].
I have multiple API request bodies and I'm passing them using a txt file in CSV Data Set Config. These API request bodies have certain values to parameterize. How do I achieve this? Something like parameterizing inside a parameterized CSV file.
If your CSV file contains JMeter Functions or Variables which you want to evaluate at runtime, you need to wrap the variable(s) defined in the CSV Data Set Config in the __eval() function.
For example, if you have:
a test.csv file with a single line containing ${foo}, and a CSV Data Set Config reading this file into some-variable
a User Defined Variables config which assigns the variable foo the value bar
and a couple of Debug Samplers for visualization
You will see that:
${some-variable} will return ${foo}, basically the raw line from the CSV file
${__eval(${some-variable})} will return bar, because the variable will be evaluated and its respective value will be resolved.
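Putting it together, a sampler body that should send the resolved value would reference the variable through __eval (a sketch; some-variable stands for whatever you put in the Variable Names field of the CSV Data Set Config):

${__eval(${some-variable})}

JMeter first substitutes ${some-variable} with the raw line ${foo} from the CSV, then __eval evaluates that string, resolving ${foo} to bar.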
Requirement: we have two different folders, e.g. input and output.
The input folder contains different files; we need to write the files to the output folder with the same file names, in order of their creation timestamps.
Note: files should be written based on the time they were created, i.e. first in, first out.
You can use the List operation of the File connector. It returns an array with one entry per matching file in a directory. Each entry contains the contents of the file in its payload, but also attributes like creationTime.
You can sort the list by this criterion with a DataWeave expression, for example payload orderBy $.attributes.creationTime.
Then iterate over the result with a foreach to write each entry as a separate file using the Write operation.
Example:
<file:list doc:name="List" directoryPath="someDirectory"/>
<ee:transform doc:name="Transform Message">
    <ee:message>
        <ee:set-payload><![CDATA[%dw 2.0
output application/java
---
payload orderBy $.attributes.creationTime]]></ee:set-payload>
    </ee:message>
</ee:transform>
<foreach doc:name="For Each">
    <file:write ... />
</foreach>
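The Write operation inside the foreach could look something like this (a sketch, assuming an output directory named output; inside the foreach each item keeps its file attributes, so attributes.fileName resolves to the original file name):

<foreach doc:name="For Each">
    <file:write doc:name="Write" path='#["output/" ++ attributes.fileName]'/>
</foreach>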
I am trying to write a large CSV file to SFTP.
I used For Each to split the records and write them using the SFTP connector.
But the file is not reaching the SFTP server.
What am I doing wrong here?
Below is the flow:
<flow name="sftp-Flow" doc:id="294e7265-0bb3-466b-add4-5819088bd33c">
    <file:listener doc:name="File picked from path" directory="${processing.folder}" config-ref="File-Inbound" autoDelete="true" matcher="filename-regex-filter" doc:id="bbfb12df-96a4-443f-a137-ef90c74e7de1" outputMimeType="application/csv" primaryNodeOnly="true" timeBetweenSizeCheck="1" timeBetweenSizeCheckUnit="SECONDS">
        <repeatable-in-memory-stream initialBufferSize="1" bufferSizeIncrement="1" maxBufferSize="500" bufferUnit="MB"/>
        <scheduling-strategy>
            <fixed-frequency frequency="${file.connector.polling.frequency}"/>
        </scheduling-strategy>
    </file:listener>
    <set-variable value="#[attributes.fileName]" doc:name="fileName - Variable" doc:id="5f064507-be62-4484-86ea-62d6cfb547fc" variableName="fileName"/>
    <foreach doc:name="For Each" doc:id="87b79f6d-1321-4231-bc6d-cffbb859d94b" batchSize="500" collection="#[payload]">
        <sftp:write doc:name="Push file to SFTP" doc:id="d1562478-5276-4a6f-a7fa-4a912bb44b8c" config-ref="SFTP-Connector" path='#["${sftp.remote.folder}" ++ "/" ++ vars.fileName]' mode="APPEND">
            <reconnect frequency="${destination.sftp.connection.retries.delay}" count="${destination.sftp.connection.retries}"/>
        </sftp:write>
    </foreach>
    <error-handler ref="catch-exception-strategy"/>
</flow>
I have found the solution. The foreach scope only supports collections in Java, JSON, or XML formats. I just placed a transformer to convert the CSV to JSON before the foreach. Now the file is properly saved in batches.
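A sketch of such a transform, placed between the listener and the foreach:

<ee:transform doc:name="CSV to JSON">
    <ee:message>
        <ee:set-payload><![CDATA[%dw 2.0
output application/json
---
payload]]></ee:set-payload>
    </ee:message>
</ee:transform>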
Instead of splitting the payload into records, try setting the CSV reader to streaming mode:
outputMimeType="application/csv; streaming=true"
Update: the best solution might be to just remove both the foreach and the outputMimeType attribute from the File listener. The file will then be read and written as binary, streamed straight to the SFTP Write operation. Removing outputMimeType prevents Mule from trying to parse the big file as CSV, which is not really needed, since the only CSV processing the flow does is the foreach, which will no longer be needed. This method will be faster and consume fewer resources.
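Under that approach the flow reduces to something like this (a sketch based on the flow above; the APPEND mode is also dropped since the whole file is written in one operation):

<flow name="sftp-Flow">
    <file:listener doc:name="File picked from path" directory="${processing.folder}" config-ref="File-Inbound" autoDelete="true" matcher="filename-regex-filter" primaryNodeOnly="true" timeBetweenSizeCheck="1" timeBetweenSizeCheckUnit="SECONDS">
        <scheduling-strategy>
            <fixed-frequency frequency="${file.connector.polling.frequency}"/>
        </scheduling-strategy>
    </file:listener>
    <sftp:write doc:name="Push file to SFTP" config-ref="SFTP-Connector" path='#["${sftp.remote.folder}" ++ "/" ++ attributes.fileName]'/>
    <error-handler ref="catch-exception-strategy"/>
</flow>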
What I want to do is the following...
I want to divide the input file into records, convert each record into a file, and leave all the files in a directory.
My .csv file has the following structure:
ERP,J,JACKSON,8388 SOUTH CALIFORNIA ST.,TUCSON,AZ,85708,267-3352,,ALLENTON,MI,48002,810,710-0470,369-98-6555,462-11-4610,1953-05-00,F,
ERP,FRANK,DIETSCH,5064 E METAIRIE AVE.,BRANDSVILLA,MO,65687,252-5592,1176 E THAYER ST.,COLUMBIA,MO,65215,557,291-9571,217-38-5525,129-10-0407,1/13/35,M,
As you can see, it doesn't have a header row.
Here is my flow.
My problem is that when the SplitRecord processor divides my CSV into flowfiles of 400 lines each, they aren't saved in my output directory.
It's my first time using NiFi, sorry.
Make sure your RecordReader controller service is configured correctly (delimiter, etc.) to read the incoming flowfile.
Set the Records Per Split value as 1.
You need to use an UpdateAttribute processor before the PutFile processor to change the filename to a unique value (like a UUID), unless you have configured the PutFile processor's Conflict Resolution Strategy as Ignore.
The reason for changing the filename is that the SplitRecord processor gives the same filename to all the split flowfiles.
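For example, in UpdateAttribute you could add a property named filename with a NiFi Expression Language value like this (a sketch; fragment.index is an attribute SplitRecord writes on every split):

filename = ${filename}-${fragment.index}

Or use ${UUID()} instead if you don't care about keeping the original name.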
I tried your case and the flow worked as expected. Use this template for your reference: upload it to your NiFi instance and make changes as per your requirements.
I have a custom Extractor with AtomicFileProcessing set to false. It extracts a large number of JSON files (each line in the file is a JSON document) and outputs two files, with the successful and the failed requests, both of them containing the JSON rows (more than 1 AU is allocated to extract the files). The problem is that when I use the same extractor, with more than one AU, to extract the files output in the first step, it fails with the error: Unexpected character encountered while parsing value: e. Path '', line 0, position 0.
If I assign 1 AU on Azure, or run this locally with AU set to more than 1, it successfully processes the data. Is this behavior because more AUs are provided to process a single JSON file, and since the file is in a non-splittable format, it can't be parallelized?
You can solve this problem by converting your JSON file to JSON Lines:
http://jsonlines.org/examples/
Then you need to read the file using the text extractor and use the JsonFunctions available in Microsoft.Analytics.Samples.Formats to read the JSON.
That transformation will make your file splittable, and you can parallelize it!
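A sketch of that pattern in U-SQL (the paths and the JSON field name are placeholders; Microsoft.Analytics.Samples.Formats comes from the U-SQL samples and must be registered in your database along with Newtonsoft.Json):

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

// Each input line is one complete JSON document (JSON Lines),
// so the built-in text extractor can split the file across vertices.
@lines =
    EXTRACT jsonLine string
    FROM "/input/data.jsonl"
    USING Extractors.Text(delimiter : '\b', quoting : false);

// Parse each line independently with the sample JSON functions.
@parsed =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonLine)["id"] AS id
    FROM @lines;

OUTPUT @parsed TO "/output/parsed.csv" USING Outputters.Csv();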