Get the blob URL of the output file written by the sink of a Data Flow activity - Azure Synapse Analytics

I have a Data Flow which reads multiple CSV files from Azure Data Lake and writes them into Azure Blob Storage as a single file. I need to get the URL of the file written to the blob.
This data flow is part of a pipeline, and I need to return the blob URL as the output of the pipeline. Is there any way to achieve this? Thanks in advance for your help.

To create a new column in your data stream that contains the source file name and path, use the "Column to store file name" field under Source transformation --> Source Options.
(Screenshots: Source Options settings, data preview)
You can also have a parameter inside your data flow to hold the file name or file path, and add that parameter value as a new column using a Derived Column transformation.
Please note: you need to supply the value for the data flow parameter from the Data Flow activity in the pipeline.
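Once the sink container and file name are known (for example, captured via the approaches above), the blob URL can be assembled from the storage account, container, and path. A minimal sketch using the Python storage SDK, with placeholder account, container, and file names:

```python
from azure.storage.blob import BlobClient

# Placeholder values for illustration; substitute your storage account,
# the sink container, and the single-file name produced by the Data Flow sink.
account_url = "https://<storageaccount>.blob.core.windows.net"
container_name = "output"
blob_name = "merged/output.csv"

# BlobClient exposes the full URL of the blob it points to.
blob = BlobClient(account_url=account_url, container_name=container_name, blob_name=blob_name)
print(blob.url)  # https://<storageaccount>.blob.core.windows.net/output/merged/output.csv
```

Inside the pipeline itself, the same URL can be produced by concatenating those three pieces with a string expression and surfacing it as a pipeline output.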

Related

Azure Data Factory creates .CSV that's incompatible with Power Query

I have a pipeline that creates a dataset from a stored procedure on Azure SQL Server.
I want to then manipulate it in a power query step within the factory, but it fails to load in the power query editor with this error.
It opens up the JSON file (to correct it, I assume) but I can't see anything wrong with it.
If I download the extract from blob and upload it again as a .csv then it works fine.
The only difference I can find is that if I upload a blob direct to storage then the file information for the blob looks like this:
If I just let ADF create the .csv in blob storage the file info looks like this:
So my assumption is that somewhere in the process in ADF that creates the .csv file it's getting something wrong, and the Power Query module can't recognise it as a valid file.
All the other parts of the pipeline (Data Flows, other datasets) recognise it fine, and the 'preview data' brings it up correctly. It's just PQ that won't read it.
Any thoughts? TIA
I reproduced the same behaviour. When data is copied from a SQL database to Blob storage as a CSV file, Power Query is unable to read it. (Power Query also doesn't support JSON files.) But when I downloaded the CSV file and re-uploaded it, it worked.
Below are the steps to overcome this issue.
When I uploaded the file to Blob storage myself and created the dataset for that file in Power Query, the schema was imported from the connection/store. The editor forces you to import the schema either from the connection/store or from a sample file; there is no "None" option here.
When data is copied from the SQL database to Azure Blob storage, the dataset that points to the blob does not have the schema imported by default.
Once the schema was imported, the Power Query activity ran successfully.
(Screenshot: output before importing the schema in the dataset)
(Screenshot: after importing the schema in the dataset)
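To compare the "file information" mentioned in the question programmatically (for example, the content type that can differ between a manually uploaded blob and one written by ADF), a minimal sketch with the Python storage SDK, assuming placeholder connection and container/blob names:

```python
from azure.storage.blob import BlobClient

# Placeholder connection string and names for illustration only.
blob = BlobClient.from_connection_string(
    "<connection-string>", container_name="extracts", blob_name="output.csv"
)

props = blob.get_blob_properties()
print(props.content_settings.content_type)      # e.g. text/csv vs application/octet-stream
print(props.content_settings.content_encoding)  # encoding metadata, if any
print(props.size, "bytes")
```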

Append data to a file using REST API

Does anyone have an example of appending data to a file stored in Azure Data Lake from a source using Data Factory and the REST API here?
Can I use a Copy activity with a REST dataset on the sink side?
Below is my pipeline; it consists of a Copy activity inside a ForEach loop. My requirement is: if the file already exists on the sink, then append data to the same file. (The Copy activity here overwrites the existing file with just the new data.)
Sink:
Currently, appending data to an existing file in Azure Data Lake is not supported by Azure Data Factory.
As a workaround:
Load the multiple files into the data lake using a ForEach activity.
Merge the individual files into a single (final) file using a Copy data activity.
Delete the individual files after merging.
Please refer to this SO thread for a similar process.
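If calling the Data Lake Storage Gen2 REST API directly (outside the Copy activity) is an option, its append and flush path operations can append to an existing file. A minimal sketch using the Python wrapper for that API, with placeholder account, credential, and path names:

```python
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder account, credential, and path names for illustration only.
service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net", credential="<account-key>"
)
file_client = service.get_file_system_client("container").get_file_client("folder/data.csv")

new_rows = b"2023-01-01,42\n"
offset = file_client.get_file_properties().size      # append after the existing content

file_client.append_data(new_rows, offset=offset, length=len(new_rows))
file_client.flush_data(offset + len(new_rows))       # commit the appended block
```

These calls correspond to the Path - Update REST operations (action=append followed by action=flush).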

Dynamic filename in Data Factory dataflow source

I’m working with a pipeline that dynamically loads table data from on-premises SQL to data lake CSV files, sinking a .csv file for each table that I have configured to load in a versionControl table in Azure SQL, using a ForEach.
So, after loading the data, I want to update the versionControl table with the lastUpdate date, based on the MAX(lastUpdate) field of each .csv file loaded. To accomplish that, I know I need to add a data flow after the copy activity so I can use the Aggregate transformation, but I don't know how to pass the file name to the source of the data flow dynamically in a parameter.
Thanks!
2 options:
Parameterized dataset. Use a source dataset in the dataflow that has a parameter for the file name. You can then pass in that filename as a pipeline parameter.
Parameterized Source wildcard. You can also use a source dataset in the dataflow that points just to a folder in your container. You can then parameterize the wildcard property in the Source and send in the filename there as a pipeline parameter.
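For reference, a local sketch of what the data flow's Aggregate would compute once the file name has been passed in through either option (pandas, with a hypothetical local path standing in for the data lake and the lastUpdate column from the question):

```python
import pandas as pd

# Hypothetical local path standing in for the data lake folder; file_name is the
# value the pipeline would pass into the data flow (or dataset) parameter per table.
file_name = "dbo_Customers.csv"
df = pd.read_csv(f"./datalake/{file_name}")

# MAX(lastUpdate) for this file, i.e. the value to write back to versionControl.
last_update = pd.to_datetime(df["lastUpdate"]).max()
print(file_name, last_update)
```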

Azure Data Factory 2 : How to split a file into multiple output files

I'm using Azure Data Factory and am looking for the complement to the "Lookup" activity. Basically I want to be able to write a single line to a file.
Here's the setup:
Read from a CSV file in blob store using a Lookup activity
Connect the output of that to a For Each
within the For Each, take each record (a line from the file read by the Lookup activity) and write it to a distinct file, named dynamically.
Any clues on how to accomplish that?
Use a Data Flow: use the Derived Column transformation to create a filename column, then use that filename column in the sink. Details on how to implement dynamic filenames in ADF are described here: https://kromerbigdata.com/2019/04/05/dynamic-file-names-in-adf-with-mapping-data-flows/
Data Flow would probably be better for this, but as a quick hack, you can do the following to read the text file line by line in a pipeline:
Define your source dataset to output a line as a single column. Normally I would use "NoDelimiter" for this, but that isn't supported by Lookup. As a workaround, define it with an incorrect Column Delimiter (like | or \t for a CSV file). You should also go to the Schema tab, and CLEAR the schema. This will generate a column in the output named "Prop_0".
In the foreach activity, set the Items to the Lookup's "output.value" and check "Sequential".
Inside the foreach, you can use item().Prop_0 to grab the text of the line:
To the best of my understanding, creating a blob isn't directly supported by pipelines [hence my suggestion above to look into Data Flow]. It is, however, very simple to do in Logic Apps. If I was tackling this problem, I would create a logic app with an HTTP Request Received trigger, then call it from ADF with a Web activity and send the text line and dynamic file name in the payload.
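For comparison, the per-record split the question describes (one dynamically named file per line) can also be done directly against Blob Storage with the Python SDK; a minimal sketch with a placeholder connection string and container/blob names:

```python
from azure.storage.blob import BlobServiceClient

# Placeholder connection string and container/blob names for illustration only.
service = BlobServiceClient.from_connection_string("<connection-string>")
source = service.get_blob_client(container="input", blob="records.csv")
output = service.get_container_client("output")

lines = source.download_blob().readall().decode("utf-8").splitlines()
header, rows = lines[0], lines[1:]

for i, row in enumerate(rows):
    # One output blob per record, named dynamically from the row index.
    output.upload_blob(f"record-{i:05d}.csv", f"{header}\n{row}\n", overwrite=True)
```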

Getting files and folders in the datalake while reading from datafactory

While reading Azure SQL table data (which actually consists of the paths of the directories) from Azure Data Factory, how can I use those paths to dynamically get the files from the data lake?
Can anyone tell me what I should give in the dataset?
(Screenshot)
You could use a Lookup activity to read the data from Azure SQL, followed by a ForEach activity. Then pass @item().<column name> (the path column returned by the Lookup) to your dataset parameter k1.
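For reference, the equivalent of that Lookup + ForEach pattern expressed outside ADF, as a minimal Python sketch (placeholder connection details; the path column name is hypothetical):

```python
import pyodbc
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder connection details and a hypothetical path column name for illustration.
sql = pyodbc.connect("<azure-sql-odbc-connection-string>")
dir_paths = [row.path for row in sql.execute("SELECT path FROM dbo.DirectoryPaths")]  # Lookup

lake = DataLakeServiceClient("https://<account>.dfs.core.windows.net", credential="<key>")
filesystem = lake.get_file_system_client("<container>")

for dir_path in dir_paths:                            # ForEach over the Lookup rows
    # Each path from the SQL table drives which data lake folder is listed/read.
    for item in filesystem.get_paths(path=dir_path, recursive=True):
        if not item.is_directory:
            print(item.name)
```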