I have a source file in a Storage account that is JSON in "document per line" format. I transform it through a Data Flow and would like the output JSON file to be an array of objects (i.e. actually valid JSON).
Source file:
{..}
{..}
{..}
Expected output (array of objects):
[
{},
{},
{}
]
The sink in a Data Flow doesn't have any options to specify the desired output. Does anyone have any idea how to achieve this?
My current effort consists of selecting "Output to single file" in the Sink settings, but there are no options for selecting a JSON format.
I fixed it by sinking into a CSV (everything will be set to string), adjusting every column with coalesce (because it would otherwise create null values), and using a Copy Data activity to transform to JSON.
In the sink tab of the Copy Data activity, set the File Pattern to "Array of objects".
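If you are editing the pipeline JSON rather than the UI, that setting is carried in the copy activity's sink format settings. A minimal sketch, assuming the newer JSON dataset type (the filePattern value matches the Copy Data UI option; the rest is illustrative):

"sink": {
    "type": "JsonSink",
    "storeSettings": { "type": "AzureBlobStorageWriteSettings" },
    "formatSettings": {
        "type": "JsonWriteSettings",
        "filePattern": "arrayOfObjects"
    }
}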
I have a job which processes files and then lands them as single CSVs in a blob storage container. The problem I face is that I also need to land empty files, which contain only the header. How can this be achieved when I use .saveSingleFile?
Example code snippet:
df.coalesce(1)
.write
.options(configuration.readAndWriteOptions)
.partitionBy(INGESTION_TIME)
.format("csv")
.mode("append")
.saveSingleFile(path.toString)
Example readAndWriteOptions:
{"sep": ";", "header": "true"}
In other words:
In the above case, if df.show() displays only a header, no CSV file is written. However, I want to output a CSV file with no data but with the column names. Is there an option which would allow this? Both cases need to be possible, whether data is available or not, so something like .take(1) will not be a sufficient solution.
Update:
Looks like this is related to a Spark API bug and should have been resolved with Spark 3.
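If upgrading is not yet an option, a possible workaround is to detect the empty case and write the header line yourself. Below is a minimal sketch in Kotlin against the Java Dataset API; writeHeaderOnlyCsv and targetDir are illustrative names, not part of the question's code, and the non-empty case keeps the existing saveSingleFile write from the question.

import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession

// Write a CSV containing only the header line, built from the DataFrame schema
// and using the ";" separator from readAndWriteOptions.
fun writeHeaderOnlyCsv(spark: SparkSession, df: Dataset<Row>, targetDir: String) {
    val header = df.columns().joinToString(";")
    val fs = FileSystem.get(spark.sparkContext().hadoopConfiguration())
    fs.create(Path(targetDir, "part-00000.csv"), true).use { out ->
        out.write((header + "\n").toByteArray())
    }
}

// Usage sketch:
// if (df.isEmpty) writeHeaderOnlyCsv(spark, df, path.toString()) else /* existing saveSingleFile write */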
In a Dataflow job written in Kotlin,
using a Pub/Sub subscription as input, I receive a proto object (Event) and map this object to Strings.
My pipeline has type:
PCollection<KV<Event, String>>
These strings are the lines of a file that must be written to GCS.
The Event object has an "Id" that must be used to set the file name, and a "name" to set the folder.
Is this possible using FileIO?
pipeline.apply(
FileIO.writeDynamic<String, String>()
.to("gs://my-bucket")
// withNaming?
)
My goal is to write the right lines to the right files, based on the information in the Event object.
File-names can be customized by providing a FileNaming implementation to the withNaming() API.
However, this currently does not support mapping input elements directly to final file names. Input elements can be mapped to groups using the dynamic destinations API, and for each group you can provide a file-naming strategy.
To fully customize naming using input element values you might need to implement a new sink transform.
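As a rough Kotlin sketch of that grouping approach (not a tested pipeline): the destination string is derived from the Event key, and withNaming() builds a FileNaming per destination. It assumes the Event proto exposes getName() and getId().

import org.apache.beam.sdk.coders.StringUtf8Coder
import org.apache.beam.sdk.io.FileIO
import org.apache.beam.sdk.io.TextIO
import org.apache.beam.sdk.transforms.Contextful
import org.apache.beam.sdk.transforms.SerializableFunction
import org.apache.beam.sdk.values.KV

val write = FileIO.writeDynamic<KV<Event, String>, String>()
    // Group elements into destinations derived from the Event key ("name/id").
    .by(SerializableFunction<KV<Event, String>, String> { kv -> "${kv.key!!.name}/${kv.key!!.id}" })
    // Write only the String value of each element as a text line.
    .via(
        Contextful.fn(SerializableFunction<KV<Event, String>, String> { kv -> kv.value!! }),
        TextIO.sink()
    )
    .to("gs://my-bucket")
    .withDestinationCoder(StringUtf8Coder.of())
    // One file-naming strategy per destination group.
    .withNaming(SerializableFunction<String, FileIO.Write.FileNaming> { dest ->
        FileIO.Write.defaultNaming(dest, ".txt")
    })
    // A fixed shard count is required for unbounded (Pub/Sub) input.
    .withNumShards(1)

// Then, on a windowed PCollection<KV<Event, String>>: lines.apply(write)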
When running Postman requests through the Collection Runner by passing the values contained in a CSV file for the input parameters, how can I validate that each response contains the string mentioned in the Expected value column of this CSV file? I also want to write each response to the Actual result column in this CSV file.
For example, after executing the first request in the above pic using the Postman Collection Runner, I want to validate that the response contains 'Sydney' as a text value and give the result as 'PASS' or 'FAIL', as well as write the actual response to the Actual result column of the above file. This should continue until the last row of the CSV file.
You can access the data in the CSV using the data dictionary, for example data.Scenario.
Check out this article, chapter "Data variables in pre-request and test scripts".
I don't think it is possible to write data back to the CSV file.
I have written a simple query and joined it with JSON reference data. I can see correct results when testing the query in the "Test results" tab. However, no output is generated when starting the job.
I have confirmed that the output blob is created when no join with reference data is used in the query.
Any help is appreciated. The sample reference JSON follows:
[
{
"DeviceId":"DEV-021",
"Brand":"brand01",
"Model":"model01"
}
]
Use a flat JSON structure instead of an array. It should give you the output.
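For example, a flat version of the sample reference data from the question would drop the surrounding array brackets and keep just the object:

{
"DeviceId":"DEV-021",
"Brand":"brand01",
"Model":"model01"
}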
Check the path you specified in the reference data; maybe it is not correct, or you did not specify the file name. Does it contain something like {date}/{time}/filename.json?
If you forget to specify the file name, it does not work either.
And if you are testing the job, you usually specify the file manually, and that is why your query works there.
I have a simple conversion of JSON to XML using MuleSoft. In the "Transform Message" component, I provided a JSON schema as input and an XML schema as output. When I run the app, the conversion happens if the file matches both schemas, but it generates an empty XML file if it doesn't match.
I want the below conditions:
1) If the file matches the schema, the converted output file should be sent to the Converted folder and the original file should move to the Success folder.
2) If the file doesn't match the schema, the original file should move to the Failure folder instead of being converted.
I hope I explained it comprehensively, as I am new to MuleSoft. Here is a sample diagram which may clarify my requirement. Provide me with a new one if I designed the process badly.
First, you need to create a flowVar that will hold your original payload.
When you're doing your evaluation, if it's XML then use a simple XPath expression like //elementName[not(node())].
Lastly, on success use a scatter-gather for a multi-threaded write: pull your original payload from the flowVar and write it to the Success folder, and write your transformed payload to your Converted folder.