Bulk copy multiple csv files from Blob Container to Azure SQL Database - sql

Environment:
MS Azure:
Blob Container, multiple csv files saved in a folder. This is my source.
Azure Sql Database. This is my target
Goal:
Use Azure Data Factory and build a pipeline to "copy" all files from the container and store them in their respective tables in the Azure Sql database by automatically creating those tables.
How do I do that? I tried following this but I just end up having tables incorrectly created in the database, where table is created with a single column having same name as the table name.
I believe I followed the instructions from that link pretty must as they are.

My CSV file is as follows, one column contains the table name.
The previous steps will not be repeated,it is the same as the link.
At Step3 inside the Foreach activity, we should add a Lookup activity to query the table name from the source dataset.
We can declare a String type variable tableName pervious, then set the value via expression #activity('Lookup1').output.firstRow.tableName.
At sink setting of the Copy activity, we can key in #variables('tableName').
ADF will auto create the table for us.
The debug result is as follows:

Related

Azure Data Factory Copy Activity for JSON to Table in Azure SQL DB

I have a copy activity that takes a bunch of JSON files and merges them into a singe JSON.
I would now like to copy the merged single JSON to Azure SQL DB. Is that possible?
Ok, it appears to be working however the output in SQL is just countryCode and CompanyId
However, I need to retrieve all the financial information in the JSON as well
Azure Data Factory Copy Activity for JSON to Table in Azure SQL DB
I repro'd the same and below are the steps.
Two json files are taken as source.
Those files are merged into single file using copy activity.
Then Merged Json data is taken as source dataset in another copy activity.
In sink, dataset for Azure SQL db is created and Auto create table option is selected.
In sink dataset, edit checkbox is selected and sink table name is given.
Once the pipeline is run, data is copied to table.

How Paramterize Copy Activity to SQL DB with Azure Data Factory

I'm trying to automatically update tables in Azure SQL Database from another SQLDB with Azure Data Factory. At the moment, the only way to update the table Azure SQL Database is to physically select the table you want to update in Azure SQL Database, as shown here:
My configuration to automatically select a table the SQLDB that I want to copy to Azure SQL Database is as follows:
The parameters are as follows:
#concat('SELECT * FROM ',pipeline().parameters.Domain,'.',pipeline().parameters.TableName)
Can someone let me know how to configure my SINK and/or connection to automatically insert the table selected from SOURCE.
My SINK looks like the following:
And my connection looks like the following:
Can someone let me know how to configure my SINK and/or connection to
automatically insert the table selected from SOURCE.
You can use Edit option in the SQL dataset.
Create a dataset parameter for the sink table name. In the SQL sink dataset check the Edit checkbox in it and use the dataset parameter. If you want, you can use dataset parameter for the database name also. Here I have given directly (dbo).
Now in the copy activity sink, you can give the table name dynamically from any pipeline parameter (give your parameter in this case) or any variable using the dynamic content.
Also, enable the Auto create table which will create new table if the table with the given name not exists and if it exists it ignores creation and copies data to it.
My sample result:

Get CSV Data from Blob Storage to SQL server Using ADF

I want to transfer data from csv file which is in an azure blob storage with the correct data types to SQL server table.
How can I get the structure for the table in the CSV file? ( I mean like when we do script table to new query in SSMS).
Note that the CSV file is not available on premise.
If your target table is already created in SSMS, copy activity will take care of the schema of source and target tables.
This is my sample csv file from blob:
In the sink I have used a table from Azure SQL database. For you, you can create SQL server dataset by SQL server linked service.
You can see the schema of csv and target tables and their mapping.
Result:
if your target table is not created in SSMS, you can use dataflows and can define the schema that you want in the Projection.
Create a data flow and in the sink give our blob csv file. In the projection of sink, we can give the datatypes that we want for the csv file.
As our target table is not created before, check on edit in the dataset and give the name for the table.
In the sink, give this dataset (SQL server dataset in your case) and make sure you check on the Recreate table in the sink Settings, so that a new table with that name will be created.
Execute this Dataflow, your target table will be created with your user defined data types.

Table name is getting appending with column names in resultent file in azure datafactory

I was trying to get data from On-prem hive Source to Azure data lake gen 2 using azure data factory.
As I need to get data for multiple tables I have created and file(ex: tnames.txt) with all my table names and stored in data lake gen 2.
In Azure Data Factory created a lookup activity and passed tnames.txt file to it.
Then added a foreach activity to that lookup actvity and in foreach activity added a copy activity.
In copy activity in source, I was giving query to extract data.
Sink is datalake gen 2.
Example code:
select * from tableName
Here table is dynamically passed from tnames.txt.
But after data is copied into data lak,e I am getting headers in copied data are like:
"tablename.columnname".
For example: Table name is Employee and few columns are ID, Name, Gender,....
My resultent file columns are like Employee.ID,Employee.Name,Employee.Gender, but my requirement is just column name.
Basically tabe name is append to column name.
How to solve this issue/Is there any other way to get data for multiple tables in single pipeline/copy activity?
Check the mapping tab of your copy activity . If the mapping is enabled, clear it and use auto-create table . It will auto-generate the schema according to the source schema. No need to explicitly create the table with defined schema. Let it be auto create table. It will generate required mapping automatically.

U-SQL job to query multiple tables with dynamic names

Our challenge is the following one :
in an Azure SQL database, we have multiple tables with the following table names : table_num where num is just an integer. These tables are created dynamically so the number of tables can vary. (from table_1, table_2 to table_N) All tables have the same columns.
As part of a U-SQL script file, we would like to execute the same query on all of these tables and generate an output csv file with the combined results of all these queries.
We tried several things :
U-SQL does not allow looping so we were thinking creating a View in our Azure SQL database that would combine all the tables using a cursor of some sort. Then, the U-SQL file would query this View (using external source). However, a View in Azure SQL database can only be created via a function and a function cannot execute dynamic SQL or even call a stored procedure...
We did not find a way to call a stored procedure of the external data source directly from U-SQL
we dont want to update our U-SQL job each time a new table is added...
Is there a way to do that in U-SQL through a custom extractor for instance? Any other ideas?
One solution I can think of is to use Azure Data Factory (v2) to assist in this.
You could create a pipeline with the following activities:
Lookup activity configured to execute the stored procedure
For Each activity that uses the output of the lookup activity as a source
As a child item use a U-Sql Activity that executes your U-Sql script which writes the output of a single table (the item of the For Each activity) to blob or datalake
Add a Copy Activity that merges the blobs from step 2.1 to one final blob.
If you have little or no experience working with ADF v2 do mind that it takes some time to get to know it but once you do, you won't regret it. Having a GUI to create the pipeline is a nice bonus.
Edit: as #wBob mentions another (far easier) solution is to somehow create a single table with all rows since all dynamically generated table have the same schema. You can create a stored procedure for populating this table for example.