Getting files and folders in the datalake while reading from datafactory - azure-sql-database

While reading azure sql table data (which actually consists of path of the directories) from azure data factory by using the paths how to dynamically get the files from the datalake.
Can any one tell me what should I give in the dataset
Screenshot

You could use lookup activity to read data from azure sql, and then following it by an foreach activity. And then, pass #item(). to your dataset parameter k1.

Related

Azure Data Factory Copy Activity for JSON to Table in Azure SQL DB

I have a copy activity that takes a bunch of JSON files and merges them into a singe JSON.
I would now like to copy the merged single JSON to Azure SQL DB. Is that possible?
Ok, it appears to be working however the output in SQL is just countryCode and CompanyId
However, I need to retrieve all the financial information in the JSON as well
Azure Data Factory Copy Activity for JSON to Table in Azure SQL DB
I repro'd the same and below are the steps.
Two json files are taken as source.
Those files are merged into single file using copy activity.
Then Merged Json data is taken as source dataset in another copy activity.
In sink, dataset for Azure SQL db is created and Auto create table option is selected.
In sink dataset, edit checkbox is selected and sink table name is given.
Once the pipeline is run, data is copied to table.

Azure Data Factory creates .CSV that's incompatible with Power Query

I have a pipeline that creates a dataset from a stored procedure on Azure SQL Server.
I want to then manipulate it in a power query step within the factory, but it fails to load in the power query editor with this error.
It opens up the JSON file (to correct it, I assume) but I can't see anything wrong with it.
If I download the extract from blob and upload it again as a .csv then it works fine.
The only difference I can find is that if I upload a blob direct to storage then the file information for the blob looks like this:
If I just let ADF create the .csv in blob storage the file info looks like this:
So my assumption is that somewhere in the process in ADF that creates the .csv file it's getting something wrong, and the Power Query module can't recognise it as a valid file.
All the other parts of the pipeline (Data Flows, other datasets) recognise it fine, and the 'preview data' brings it up correctly. It's just PQ that won't read it.
Any thooughts? TIA
I reproduced the same. When data is copied from SQL database to BLOB as csv file, Power query is unable to read. Also, Power query doesn't support json file. But when I tried to download the csv file and reupload, it worked.
Below are the steps to overcome this issue.
When I tried to upload the file in Blob and create the dataset for that file in power query, Schema is imported from connection/Store. It forces us to import schema either from connection/store or from sample file. There is no option as none here.
When data is copied from sql db to Azure blob, and dataset which uses the blob storage didn't have schema imported by default.
Once imported the schema, power query activity ran successfully.
Output before importing schema in dataset
After importing schema in dataset

Append data to a file using REST API

Does any one have an example of appending data to a file stored in azure datalake from source using Data Factory and the rest API here
Can I use a copy activity with REST dataset on the sink side ?
Bellow is my pipeline, it consists of a copy activity inside a foreach loop. My requirement is : if the file already exists on the sink then append data to the same file. (the copy activity here overwrite the existing file with just the new data)
Sink :
Currently, the append data to a file in Azure Data Lake is not supported in the Azure Data Factory.
As a workaround,
Load multiple files using ForEach activity into data lake.
Merge the individual files to get a single file (final file) using copy data activity.
Delete the individual files after merging.
Please refer this SO thread for similar process.

Get the blob URL of output file written by sink operation in a Data Flow Activity - Azure Synapse Analytics

I have a Data Flow which reads multiple CSV files from Azure Data Lake and writes them into Azure Blob Storage as a single file. I need to get the url of the file written to the blob.
This data flow is a part of a pipeline and I need to give the Blob url as the output of the pipeline. Is there any way achieve this? Thanks in advance for your help.
To create a new column name in your data stream that is the source
file name and path utilize "Column to store file name" field under
Source transformation --> Source Options.
Source Options settings
Data Preview
You can also have a parameter inside your data flow to hold file name or file path and add that parameter value as new column using derived column transformation.
Please Note: You need supply value in to your data flow parameter from data flow activity.

Bulk copy multiple csv files from Blob Container to Azure SQL Database

Environment:
MS Azure:
Blob Container, multiple csv files saved in a folder. This is my source.
Azure Sql Database. This is my target
Goal:
Use Azure Data Factory and build a pipeline to "copy" all files from the container and store them in their respective tables in the Azure Sql database by automatically creating those tables.
How do I do that? I tried following this but I just end up having tables incorrectly created in the database, where table is created with a single column having same name as the table name.
I believe I followed the instructions from that link pretty must as they are.
My CSV file is as follows, one column contains the table name.
The previous steps will not be repeated,it is the same as the link.
At Step3 inside the Foreach activity, we should add a Lookup activity to query the table name from the source dataset.
We can declare a String type variable tableName pervious, then set the value via expression #activity('Lookup1').output.firstRow.tableName.
At sink setting of the Copy activity, we can key in #variables('tableName').
ADF will auto create the table for us.
The debug result is as follows: