Append data to a file using REST API

Does anyone have an example of appending data to a file stored in Azure Data Lake from a source using Data Factory and the REST API?
Can I use a copy activity with a REST dataset on the sink side?
Below is my pipeline: it consists of a copy activity inside a ForEach loop. My requirement is: if the file already exists on the sink, append the data to that same file. (The copy activity here overwrites the existing file with just the new data.)
Sink:

Currently, appending data to a file in Azure Data Lake is not supported in Azure Data Factory.
As a workaround:
Load the multiple files into the data lake using a ForEach activity.
Merge the individual files into a single (final) file using a copy data activity.
Delete the individual files after merging.
Please refer to this SO thread for a similar process.
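If you can run code outside Data Factory (for example in an Azure Function invoked from the pipeline), the ADLS Gen2 REST API itself does support appending through the Path - Update operation (an append followed by a flush). Below is a minimal Python sketch using the azure-storage-file-datalake SDK, which wraps that API; the account, container, file path and credential are placeholders.

from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder connection details -- replace with your own account and credential.
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential="<account-key-or-token>",
)
file_client = service.get_file_system_client("<container>").get_file_client("folder/target.csv")

new_data = b"extra,rows,to,append\n"

# Append at the current end of the file, then flush to commit the new length.
current_size = file_client.get_file_properties().size
file_client.append_data(new_data, offset=current_size, length=len(new_data))
file_client.flush_data(current_size + len(new_data))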

Related

Azure Data Factory Copy Activity for JSON to Table in Azure SQL DB

I have a copy activity that takes a bunch of JSON files and merges them into a single JSON.
I would now like to copy the merged single JSON to Azure SQL DB. Is that possible?
OK, it appears to be working; however, the output in SQL is just countryCode and CompanyId.
I need to retrieve all the financial information in the JSON as well.
I reproduced the same scenario and below are the steps.
Two JSON files are taken as the source.
Those files are merged into a single file using a copy activity.
The merged JSON data is then taken as the source dataset in another copy activity.
In the sink, a dataset for Azure SQL DB is created and the Auto create table option is selected.
In the sink dataset, the Edit checkbox is selected and the sink table name is given.
Once the pipeline is run, the data is copied to the table.
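The asker notes that only the top-level countryCode and CompanyId columns arrive in SQL. The copy activity's default mapping typically only picks up top-level properties, so if the nested financial details are also needed, one option is to flatten the JSON before loading it. A rough Python sketch of that idea, assuming a hypothetical layout where each record carries a nested financials array:

import json
import pandas as pd

# Hypothetical merged JSON -- adjust record_path/meta to the real structure.
with open("merged.json") as f:
    records = json.load(f)

flat = pd.json_normalize(
    records,
    record_path=["financials"],          # assumed nested array of financial rows
    meta=["countryCode", "CompanyId"],   # top-level fields repeated per row
)
print(flat.head())
# 'flat' could then be written to Azure SQL (e.g. DataFrame.to_sql) or saved as a
# flat file and loaded with a plain copy activity.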

Get the blob URL of output file written by sink operation in a Data Flow Activity - Azure Synapse Analytics

I have a Data Flow which reads multiple CSV files from Azure Data Lake and writes them into Azure Blob Storage as a single file. I need to get the URL of the file written to the blob.
This data flow is part of a pipeline, and I need to give the blob URL as the output of the pipeline. Is there any way to achieve this? Thanks in advance for your help.
To create a new column in your data stream that holds the source file name and path, use the "Column to store file name" field under Source transformation --> Source Options.
Source Options settings
Data Preview
You can also have a parameter inside your data flow to hold the file name or file path, and add that parameter value as a new column using a derived column transformation.
Please note: you need to supply the value of your data flow parameter from the data flow activity.
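Once you know the account, container and file path the sink writes to (for example from pipeline parameters or the file-name column above), the blob URL can be composed outside the data flow. A minimal Python sketch using the azure-storage-blob SDK; the account, container, file name and credential are placeholders.

from azure.storage.blob import BlobClient

# Placeholder values -- these come from your sink settings / pipeline parameters.
blob = BlobClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    container_name="<container>",
    blob_name="output/merged.csv",
    credential="<account-key-or-sas>",
)
print(blob.url)  # https://<storage-account>.blob.core.windows.net/<container>/output/merged.csv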

Data Factory Childitem modified or created date

I have a Data Factory V2 pipeline consisting of 'get metadata' and 'forEach' activities that reads a list of files on a file share (on-prem) and logs it in a database table. Currently, I'm only able to read file name, but would like to also retrieve the date modified and/or date created property of each file. Any help, please?
Thank you
According to the MS documentation, both the File system and SFTP connectors support the lastModified property, but we can only get the lastModified of one file or folder at a time.
I'm using the File system connector for this test. The process is basically the same as in the previous post; we need to add a Get Metadata activity inside the ForEach activity.
This is my local files.
First, I created a table for logging.
create table Copy_Logs (
Copy_File_Name varchar(max),
Last_modified datetime
)
In ADF, I'm using Child Items in the Get Metadata1 activity to get the file list of the folder.
Then I add the dynamic content #activity('Get Metadata1').output.childItems to the ForEach1 activity.
Inside the ForEach1 activity, I use Last modified in the Get Metadata2 activity.
In the dataset of the Get Metadata2 activity, I key in #item().name as follows.
I use the CopyFiles_To_Azure activity to copy the local files to Azure Data Lake Storage Gen2.
I key in #item().name in the source dataset of the CopyFiles_To_Azure activity.
In the Create_Logs activity, I'm using the following SQL to get the info we need.
select '#{item().name}' as Copy_File_Name, '#{activity('Get Metadata2').output.lastModified}' as Last_modified
In the end, sink the result to the SQL table we created previously. The result is as follows.
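For intuition about what ends up in Copy_Logs, here is a quick local Python sketch (the share path is a placeholder) that lists each file with its last-modified timestamp, which is exactly the pair of values the pipeline inserts per file:

import os
from datetime import datetime, timezone

folder = r"\\fileshare\incoming"   # placeholder UNC path to the on-prem share

for entry in os.scandir(folder):
    if entry.is_file():
        modified = datetime.fromtimestamp(entry.stat().st_mtime, tz=timezone.utc)
        print(f"{entry.name}  last modified {modified.isoformat()}")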
One way I can think of is to add a new Get Metadata activity inside the ForEach loop, use a parameterized dataset, and pass the file name as the parameter. The animation below should help; I tested the same.
HTH.

Load two files in the same DB table with Azure Data Flow

How can I load two CSV files with the same schema into my SQL Database with an Azure Data Factory data flow?
I've created one flow with two inputs and the same output, but I only get the data of one file in the table; the other one is NULL.
As you said, the two CSV files have the same schema, so you could put them in the same folder or container; make sure the container or folder holds only those two files.
Then you could follow my steps:
My Container :
Data and file schema:
Data FLOW Source dataset settings:
Just choose the container or folder; all the CSV files in it will be picked up. When we preview the data, the data of the two CSV files will be merged together.
Sink dataset settings and data preview (the data will be inserted into the sink table):
Run the pipeline:
Check the data in Sink table:
Hope this helps.
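The source pointed at the folder effectively unions every CSV it finds there. As a rough, local Python equivalent of that behaviour (the folder path is a placeholder):

import glob
import pandas as pd

# Read every CSV in the folder (placeholder path) -- they share one schema.
frames = [pd.read_csv(path) for path in glob.glob("input_folder/*.csv")]
combined = pd.concat(frames, ignore_index=True)

print(combined)
# 'combined' is what the data flow sink would insert into the SQL table.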

Getting files and folders in the data lake while reading from Data Factory

While reading Azure SQL table data (which actually consists of the paths of directories) from Azure Data Factory, how can I use those paths to dynamically get the files from the data lake?
Can anyone tell me what I should give in the dataset?
Screenshot
You could use a Lookup activity to read the data from Azure SQL, followed by a ForEach activity. Then pass #item(). to your dataset parameter k1.
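For a sense of the end-to-end idea outside ADF, here is a rough Python sketch (server, database, table, column, storage account and container names are all placeholders): read the directory paths from the SQL table, then fetch the files under each path from the data lake.

import pyodbc
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder connection details.
sql = pyodbc.connect("Driver={ODBC Driver 17 for SQL Server};Server=<server>;"
                     "Database=<db>;Uid=<user>;Pwd=<password>")
lake = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential="<account-key>",
)
fs = lake.get_file_system_client("<container>")

# Assumed table and column names holding the directory paths.
for (dir_path,) in sql.execute("SELECT dir_path FROM FilePaths"):
    for item in fs.get_paths(path=dir_path):
        if not item.is_directory:
            content = fs.get_file_client(item.name).download_file().readall()
            print(item.name, len(content), "bytes")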