Table name is getting appended to column names in resultant file in Azure Data Factory - azure-data-factory-2

I was trying to get data from an on-prem Hive source to Azure Data Lake Gen 2 using Azure Data Factory.
Since I need to get data for multiple tables, I created a file (e.g. tnames.txt) with all my table names and stored it in Data Lake Gen 2.
In Azure Data Factory I created a Lookup activity and passed the tnames.txt file to it.
Then I added a ForEach activity to that Lookup activity, and inside the ForEach activity added a Copy activity.
In the Copy activity source, I gave a query to extract the data.
The sink is Data Lake Gen 2.
Example code:
select * from tableName
Here the table name is dynamically passed from tnames.txt.
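For reference, the source query's dynamic content in the Copy activity might look like the following, assuming the lookup output exposes a column named tablename (the column name is an assumption):
@concat('select * from ', item().tablename)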
But after the data is copied into the data lake, the headers in the copied data look like "tablename.columnname".
For example: the table name is Employee and a few of its columns are ID, Name, Gender, ...
The columns in my resultant file are like Employee.ID, Employee.Name, Employee.Gender, but my requirement is just the column name.
Basically, the table name is appended to the column name.
How do I solve this issue? Or is there any other way to get data for multiple tables in a single pipeline/copy activity?

Check the mapping tab of your Copy activity. If mapping is enabled, clear it and use auto create table: it will auto-generate the schema according to the source schema, so there is no need to explicitly create the table with a defined schema. Leave it on auto create table and it will generate the required mapping automatically.
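If the prefixes come from Hive itself rather than from the mapping, another option worth testing (an assumption, not confirmed by the answer above) is to disable Hive's unique column naming in the source query:
set hive.resultset.use.unique.column.names=false;
select * from Employee;
With that property set to false, the result-set headers are emitted as ID, Name, Gender instead of Employee.ID, Employee.Name, Employee.Gender.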

Related

Azure Data Factory Copy Activity for JSON to Table in Azure SQL DB

I have a copy activity that takes a bunch of JSON files and merges them into a single JSON.
I would now like to copy the merged single JSON to Azure SQL DB. Is that possible?
OK, it appears to be working; however, the output in SQL is just countryCode and CompanyId.
I need to retrieve all the financial information in the JSON as well.
I repro'd the same and below are the steps.
Two JSON files are taken as the source.
Those files are merged into a single file using a copy activity.
The merged JSON data is then taken as the source dataset in another copy activity.
In the sink, a dataset for Azure SQL DB is created and the Auto create table option is selected.
In the sink dataset, the Edit checkbox is selected and the sink table name is given.
Once the pipeline is run, the data is copied to the table.
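If nested fields (such as the financial information) are still dropped by the flat tabular mapping, one workaround is to land the raw document in an NVARCHAR(MAX) column and unpack it in Azure SQL with OPENJSON. The sketch below assumes hypothetical names (staging.RawDocs, RawJson, a financials array with an amount field):
SELECT j.countryCode, j.CompanyId, f.amount
FROM staging.RawDocs AS d
CROSS APPLY OPENJSON(d.RawJson)
    WITH (
        countryCode VARCHAR(10)   '$.countryCode',
        CompanyId   INT           '$.CompanyId',
        financials  NVARCHAR(MAX) '$.financials' AS JSON
    ) AS j
CROSS APPLY OPENJSON(j.financials)
    WITH (amount DECIMAL(18, 2) '$.amount') AS f;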

How to Parameterize a Copy Activity to SQL DB with Azure Data Factory

I'm trying to automatically update tables in Azure SQL Database from another SQLDB with Azure Data Factory. At the moment, the only way to update a table in Azure SQL Database is to manually select the table you want to update in Azure SQL Database, as shown here:
My configuration to automatically select a table from the SQLDB that I want to copy to Azure SQL Database is as follows:
The parameters are as follows:
@concat('SELECT * FROM ', pipeline().parameters.Domain, '.', pipeline().parameters.TableName)
Can someone let me know how to configure my SINK and/or connection to automatically insert the table selected from SOURCE.
My SINK looks like the following:
And my connection looks like the following:
You can use the Edit option in the SQL dataset.
Create a dataset parameter for the sink table name. In the SQL sink dataset, check the Edit checkbox and use the dataset parameter. If you want, you can use a dataset parameter for the schema name as well; here I have given it directly (dbo).
Now, in the copy activity sink, you can give the table name dynamically from any pipeline parameter (your parameter in this case) or any variable using dynamic content, as sketched below.
Also, enable Auto create table, which creates a new table if no table with the given name exists; if it already exists, creation is skipped and the data is copied into it.
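For concreteness, the dynamic content might look like the following, assuming a dataset parameter named SinkTableName (the parameter name is an assumption):
Dataset table name: @dataset().SinkTableName
Value passed to that parameter in the copy activity sink: @pipeline().parameters.TableName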
My sample result:

Azure Data Factory Incremental Load of Data Using Copy Activity

I would like to load incremental data from the data lake into on-premise SQL, so I created a data flow to do the necessary data transformation and cleaning.
After that, I copied all the final data to a staging data lake sink, stored in CSV format.
I am facing two kinds of issues here.
Whenever I trigger/debug the pipeline to load my dataset (the full data flow activity), the data is loaded into the CSV the first time; if I run the same pipeline a second time, the CSV file in the target data lake is loaded empty, meaning the column headers are written but I cannot see any values inside the file.
Coming to the copy activity, which is connected to the on-premise SQL Server: I am trying to load the data, but if we trigger this pipeline again and again, duplicate data is loaded. I want to load only incremental or updated data coming from the data lake CSV file. How do we handle this?
Kindly suggest.
When we want to incrementally load our data into a database table, we need to use the Upsert option in the copy data tool.
Upsert helps you incrementally load the source data based on a key column (or columns). If the key column value is already present in the target table, it updates the rest of the column values; otherwise it inserts a new row with that key and the other values.
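Conceptually, the upsert behaves like the following T-SQL MERGE against the player table used in the demonstration below. This is only a sketch of the semantics (ADF generates its own equivalent statement, and the source values here are assumed):
MERGE INTO player AS tgt
USING (VALUES (1, 'updated name', 'team A'),
              (4, 'new player', 'team B')) AS src (id, gname, team)
ON tgt.id = src.id
WHEN MATCHED THEN
    UPDATE SET gname = src.gname, team = src.team
WHEN NOT MATCHED THEN
    INSERT (id, gname, team) VALUES (src.id, src.gname, src.team);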
Look at the following demonstration to understand how upsert works. I used an Azure SQL database as an example.
My initial table data:
create table player(id int, gname varchar(20), team varchar(10))
My source CSV data (the data I want to incrementally load):
I have taken one id which already exists in the target table (id=1) and another which is new (id=4).
My copy data sink configuration:
Create/select the dataset for the target table. Check the Upsert option as your write behavior and select a key column based on which the upsert should happen.
Table after upsert using Copy data:
Now, after the upsert using Copy data, the id=1 row is updated and the id=4 row is inserted. The following is the final output, which is in line with the expected output.
You can use the primary key of your target table (which is also present in your source CSV) as the key column in the Copy data sink configuration. Any other configuration (like filtering the source by last-modified date) should not affect the process.

Bulk copy multiple csv files from Blob Container to Azure SQL Database

Environment:
MS Azure:
Blob Container, multiple csv files saved in a folder. This is my source.
Azure SQL Database. This is my target.
Goal:
Use Azure Data Factory and build a pipeline to "copy" all files from the container and store them in their respective tables in the Azure SQL Database by automatically creating those tables.
How do I do that? I tried following this, but I just end up with tables incorrectly created in the database, where each table is created with a single column having the same name as the table name.
I believe I followed the instructions from that link pretty much as they are.
My CSV file is as follows; one column contains the table name.
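A hypothetical control file of that shape (the header name tableName is an assumption) might contain:
tableName
Customers
Orders
Products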
The previous steps will not be repeated; they are the same as in the link.
At Step 3, inside the ForEach activity, we should add a Lookup activity to query the table name from the source dataset.
We can declare a String-type variable tableName beforehand, then set its value via the expression @activity('Lookup1').output.firstRow.tableName.
At the sink setting of the Copy activity, we can key in @variables('tableName').
ADF will auto-create the table for us.
The debug result is as follows:

Best way to merge JSON blob files to SQL table using Azure Data Factory

I have a bunch of JSON files coming into Azure Data Lake Gen 2; the JSON files contain new data as well as updates.
The data needs to be merged into a SQL table so I can start to do some reporting. The way I solved the problem was to create an Azure Data Factory that looks like this:
Create and copy to temp table:
First I use Copy data to take the JSON, create a table from its schema, and dump the content into that table.
Create delivery table:
Creates a table with the right schema if it doesn't already exist.
Merge temp with delivery:
Here I use a MERGE clause to cast and merge the data from the table created in step 1 into the table from step 2 (see the sketch after these steps).
Delete temp data:
Deletes the table from step 1.
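A rough T-SQL sketch of steps 2 through 4, with hypothetical table and column names (dbo.TempJson, dbo.Delivery, Id, Amount) standing in for the real schema:
IF OBJECT_ID('dbo.Delivery', 'U') IS NULL
    CREATE TABLE dbo.Delivery (Id INT PRIMARY KEY, Amount DECIMAL(18, 2));
MERGE INTO dbo.Delivery AS tgt
USING (SELECT CAST(Id AS INT) AS Id, CAST(Amount AS DECIMAL(18, 2)) AS Amount
       FROM dbo.TempJson) AS src
ON tgt.Id = src.Id
WHEN MATCHED THEN UPDATE SET Amount = src.Amount
WHEN NOT MATCHED THEN INSERT (Id, Amount) VALUES (src.Id, src.Amount);
DROP TABLE dbo.TempJson;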
This data factory gets triggered each time there's a new file in the data lake.
The pipeline solves my problem, but I feel like there's a lot of unnecessary overhead in creating and dropping a new table each time I process a file.
Is there a way to optimize this flow, maybe by merging the JSON directly to the "Delivery" table?
Thanks in advance