MuleSoft Anypoint Platform: what to put in the "content" section of the SFTP Write connector general configuration - anypoint-studio

Using MuleSoft Anypoint Platform: I have a use case where the request is to download a CSV from Azure Blob and upload the same file to an SFTP server. The problem arises in the SFTP connector (Write), at the last step, which is the actual upload to SFTP. There are two fields under the general settings for this step: "Path" and "Content". As the path I have chosen the variable that holds the downloaded CSV file. All the data looks good in the preview when pressing the down arrow on the variable block. I don't understand what should go in the Content field; it says "content to be written into the file". I just want to pass the file along in the flow, with no transformation. Is this necessary? Any ideas? If I keep the value as "payload" in the Content field, the error in the output section reads "Cannot write a null content".

Related

Airbyte ETL, connection between an HTTP API source and BigQuery

I have a task in hand where I am supposed to create a Python-based HTTP API connector for Airbyte. The connector will return a response that contains links to zip files.
Each zip file contains a CSV file, which is supposed to be uploaded to BigQuery.
I have now made the connector, and it returns the URL of the zip file.
The main question is how to send the underlying CSV file to BigQuery.
I can certainly unzip or even read the CSV file in the Python connector, but I am stuck on the part of sending it to BigQuery.
P.S. If you can tell me about sending the CSV to Google Cloud Storage instead, that would be awesome too.
When you are building an Airbyte source connector with the CDK, your connector code must output records that will be sent to the destination, BigQuery in your case. This decouples the extraction logic from the loading logic and makes your source connector destination agnostic.
I'd suggest this high-level logic in your source connector's implementation:
Call the source API to retrieve the zip file's URL
Download and unzip the archive
Parse the CSV file with pandas
Output the parsed records
This is under the assumption that all the CSV files have the same schema; if not, you'll have to declare one stream per schema.
A great guide, with more details on how to develop a Python connector, is available here.
Once your source connector outputs AirbyteRecordMessages you'll be able to connect it to BigQuery and choose the loading method that best fits your needs (Standard or GCS staging).
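A minimal sketch of that flow, assuming a hypothetical endpoint and response shape (fetch_zip_url, the /export path, and the zip_url field are all placeholders; in a real CDK connector this logic would live in your stream's read_records method):

import io
import zipfile

import pandas as pd
import requests


def fetch_zip_url(api_base: str) -> str:
    """Call the source API and return the zip file's URL (hypothetical endpoint and field)."""
    response = requests.get(f"{api_base}/export")  # placeholder endpoint
    response.raise_for_status()
    return response.json()["zip_url"]              # placeholder field name


def read_records(api_base: str):
    """Download the zip, parse each CSV with pandas, and yield one record per row."""
    archive_bytes = requests.get(fetch_zip_url(api_base)).content
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".csv"):
                continue
            with archive.open(name) as csv_file:
                df = pd.read_csv(csv_file)
            # Each row becomes a dict; the CDK wraps these into
            # AirbyteRecordMessages that the BigQuery destination loads.
            for record in df.to_dict(orient="records"):
                yield record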

Executing a Pentaho transformation (ktr) using Node.js with Pentaho CE

I am able to successfully execute the .ktr files in a browser, as well as with the Postman tool, using the URL below
http://localhost:8089/kettle/executeTrans/?trans=D:\Pentaho\ktr\MyJson_to_Database.ktr
But I want to automate the process, and the ktr needs to accept a JSON file as input (right now the JSON data is inside the ktr file itself). Since I am using Node.js to automate the ktr execution, I am trying to use wreck with a POST request to execute it (I am new to wreck), and I am having difficulty identifying whether the error is due to wreck or to the Kettle transformation itself.
In the meantime I am trying to execute it without passing the path as a query string in the URL; instead I want to pass it in the body. I have searched Google with no success so far.
EDIT 1
I am able to reach the ktr file from the Node.js microservice, and now the challenge is to read the file path inside the Docker image.
Could you work around this by storing the JSON data in a file, and modifying the transformation (or adding a step) to read that file and pass along the information it contains?
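As a rough illustration of that workaround (shown in Python for brevity; the same two HTTP steps apply with wreck from Node.js, and the file paths and example payload below are placeholders):

import json

import requests

# 1. Write the incoming JSON payload to a file that the transformation will read.
payload = {"orders": [{"id": 1, "amount": 42.0}]}        # example data
with open(r"D:\Pentaho\data\input.json", "w") as fh:      # placeholder path read by the ktr
    json.dump(payload, fh)

# 2. Ask Carte to execute the transformation, passing the ktr path as a
#    query parameter (the same URL that already works in the browser/Postman).
resp = requests.get(
    "http://localhost:8089/kettle/executeTrans/",
    params={"trans": r"D:\Pentaho\ktr\MyJson_to_Database.ktr"},
    auth=("cluster", "cluster"),   # default Carte credentials; adjust as needed
)
resp.raise_for_status()
print(resp.text)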

Reading a YAML properties file from S3

I have a YAML properties file stored in an S3 bucket. In Mule 4 I can read this file using the S3 connector. I need to use the properties defined in this file (reading dynamic values and using them in Mule 4) in DB connectors. I am not able to create properties from this file such that I can use them as ${dbUser} in a Mule configuration or flow, for example. Any guidance on how I can accomplish this?
You will not be able to use the S3 connector to do that. The connector can read the file in an operation at execution time, but property placeholders like ${dbUser} have to be defined earlier, at deployment time.
You might be able to read the value into a variable (for example: #[vars.dbUser]) and use the variable in the database connector configuration. That is called a dynamic configuration, because it is evaluated dynamically at execution time.

Is there a way to import backups in NiFi?

Using NiFi v0.6.1, is there a way to import backups/archives?
And by backups I mean the files that are generated when you call
POST /controller/archive using the REST API, or "Controller Settings" (toolbar button) and then "Back-up flow" (link).
I tried unzipping the backup and importing it as a template, but that didn't work. After comparing it to an exported template file, the formats are considerably different. But perhaps there is a way to transform it into a template?
At the moment my workaround is to not select any components on the top-level flow and then select "create template", which adds a template containing all my components. Then I just export that. My issue with this is that it's a bit trickier to automate via the REST API. I used Fiddler to determine what the UI is doing: it first generates a snippet that includes all the components (labels, processors, connections, etc.), then it calls create template (POST /nifi-api/controller/templates) using the snippet ID. So the template call is easy enough, but generating the definition for the snippet is going to take some work.
Note: Once the following feature request is implemented I'm assuming I would just use that instead:
https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows
The entire flow for a NiFi instance is stored in a file called flow.xml.gz in the conf directory (flow.xml.tar in a cluster). The back-up functionality is essentially taking a snapshot of that file at the given point in time and saving it to the conf/archive directory. At a later point in time you could stop NiFi and replace conf/flow.xml.gz with one of those back-ups to restore the flow to that state.
Templates are a different format from the flow.xml.gz. Templates are more public facing and shareable, and can be used to represent portions of a flow, or the entire flow if no components are selected. Some people have used templates as a model to deploy their flows, essentially organizing their flow into process groups and making a template for each group. This project provides some automation for working with templates: https://github.com/aperepel/nifi-api-deploy
You just need to stop NiFi, replace the NiFi flow configuration file (for example, this could be flow.xml.gz in the conf directory) and start NiFi back up.
If you have trouble finding it, check your nifi.properties file for the string nifi.flow.configuration.file= to find out what you've set this to.
If you are using clustered mode, you only need to do this on the NCM.
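A minimal sketch of that restore, assuming a default-style install layout (the conf path and back-up file name are examples; check nifi.flow.configuration.file= in nifi.properties for your actual location, and stop NiFi before running it):

import shutil
from pathlib import Path

conf_dir = Path("/opt/nifi/conf")                        # example install path
backup = conf_dir / "archive" / "flow-backup.xml.gz"     # hypothetical back-up file name
active = conf_dir / "flow.xml.gz"

# Keep a copy of the current flow before overwriting it.
shutil.copy2(active, active.with_suffix(".gz.bak"))
# Replace the active flow with the archived one, then start NiFi back up.
shutil.copy2(backup, active)
print(f"Restored {backup} over {active}; start NiFi back up.")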

Use of FTP "append" command

I want to upload a file to an FTP server programmatically (C++). If the connection is lost while uploading a file, I wouldn't want to upload the file from scratch, but only the part that I haven't sent yet.
Does the APPE command fulfill my requirement? What list of FTP commands should I use exactly, and how?
I was googling for details about the APPE FTP command and what it actually does, but most sites just say "append". So I tried the command out to make sure it behaves as expected.
I am designing an FTP auto-sender that is used to send a log file from a machine to a server for reporting. I only want to send the last line of the log file.
When using the APPE command, it actually appends the whole file content to the existing file on the server, which duplicates the line entries.
The answer:
To resume a file transfer after a failed upload, there is no single command for that; you need a sequence of commands to achieve it.
The key point is to seek your local file to the last uploaded byte if you are using the APPE command, or to use the REST command, which makes the transfer start at that particular byte position. I ended up with this solution, performed after the connection is established:
Using APPE (I got the idea from the FileZilla log):
Use SIZE to check whether the file exists and use the result as the resume marker.
Open the local file and seek to the marker.
Use APPE to upload; the FTP server will append to the remote file automatically.
Using STOR with REST (I got the idea from edtFTPnet; see the sketch just after this list):
Use SIZE to check whether the file exists and use the result as the resume marker.
Send REST with the result you get from SIZE to tell the FTP server to start writing at that position.
Open the local file and seek to the marker.
Use STOR as a normal upload.
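Here is a minimal sketch of the STOR + REST sequence using Python's ftplib (host, credentials, and file names are placeholders):

import os
from ftplib import FTP, error_perm

LOCAL_FILE = "report.log"     # placeholder local path
REMOTE_FILE = "report.log"    # placeholder remote path

ftp = FTP("ftp.example.com")          # placeholder host
ftp.login("user", "password")         # placeholder credentials
ftp.voidcmd("TYPE I")                 # binary mode, so SIZE/REST count bytes

# SIZE tells us how many bytes the server already has (the resume marker).
try:
    remote_size = ftp.size(REMOTE_FILE) or 0
except error_perm:
    remote_size = 0                   # file does not exist yet on the server

with open(LOCAL_FILE, "rb") as fh:
    fh.seek(remote_size)              # seek the local file to the marker
    # rest=remote_size makes ftplib send REST before STOR, so the server
    # writes starting at that byte position instead of truncating the file.
    ftp.storbinary(f"STOR {REMOTE_FILE}", fh, rest=remote_size)

ftp.quit()

For the APPE variant described above, you would seek the local file the same way and then call ftp.storbinary(f"APPE {REMOTE_FILE}", fh) without the rest argument, letting the server append for you.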
Note that not all FTP servers support both ways. I have seen FileZilla switch between the two depending on the server. My observation is that using REST is the standard way. Downloads can also use REST to start the download at a given byte position.
Remember that using resume support with the ASCII transfer type will produce unexpected results, since Unix and Windows use different line-break byte counts.
Try experimenting with FileZilla and observe the behaviour in its log.
You can also check this useful open-source FTP library for .NET to see how they do it:
edtFTPnet
Check the RFC and specifically the APPEND command:
This command causes the server-DTP to accept the data transferred via the data connection and to store the data in a file at the server site. If the file specified in the pathname exists at the server site, then the data shall be appended to that file; otherwise the file specified in the pathname shall be created at the server site.
Note that you cannot simply APPE the same file again in full; you should send only the remaining bytes, that is, continue from the position at which the connection was lost.