Azure Data Factory copy data from XML to SQL Synapse pool - sql

I've created a SQL Dedicated pool table in Synapse, and now trying to copy data from multiple XML files to this database
I've mapped all fields from the XML file that I need to each specific column in the destination table, but the following error is blocking the copy data activity:
Message=Column count in target table does not match column count specified in input. If BCP command, ensure format file column count matches destination table. If SSIS data import, check column mappings are consistent with target.,Source=.Net SqlClient Data Provider,SqlErrorNumber=107098,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=107098,State=1,Message=Column count in target table does not match column count specified in input. If BCP command, ensure format file column count matches destination table. If SSIS data import, check column mappings are consistent with target.,},],'
Any idea what I am doing wrong?

This is because the count of column in the XML files and Synapse is not equal. You need to check the count of your XML files and make them be same as the count of your sink's column.

Related

Azure Data Factory Incremental Load data by using Copy Activity

I would like to load incremental data from data lake into on premise SQL, so that i created data flow do the necessary data transformation and cleaning the data.
after that i copied all the final data sink to staging data lake to stored CSV format.
I am facing two kind of issues here.
when ever i am trigger / debug to loading my dataset(data flow full activity ), the first time data loaded in CSV, if I load second time similar pipeline, the target data lake, the CSV file loaded empty data, which means, the column header loaded but i could not see the any value inside file.
coming to copy activity, which is connected to on premise SQL server, i am trying to load the data but if we trigger this pipeline again and again, the duplicate data loaded, i want to load only incremental or if updated data comes from data lake CSV file. how do we handle this.
Kindly suggest.
When we want to incrementally load our data to a database table, we need to use the Upsert option in copy data tool.
Upsert helps you to incrementally load the source data based on a key column (or columns). If the key column is already present in target table, it will update the rest of the column values, else it will insert the new key column with other values.
Look at following demonstration to understand how upsert works. I used azure SQL database as an example.
My initial table data:
create table player(id int, gname varchar(20), team varchar(10))
My source csv data (data I want to incrementally load):
I have taken an id which already exists in target table (id=1) and another which is new (id=4).
My copy data sink configuration:
Create/select dataset for the target table. Check the Upsert option as your write behavior and select a key column based on which upsert should happen.
Table after upsert using Copy data:
Now, after the upsert using copy data, the id=1 row should be updated and id=4 row should be inserted. The following is the final output achieved which is inline with expected output.
You can use the primary key in your target table (which is also present in your source csv) as the key column in Copy data sink configuration. Any other configuration (like source filter by last modified configuration) should not effect the process.

Table name is getting appending with column names in resultent file in azure datafactory

I was trying to get data from On-prem hive Source to Azure data lake gen 2 using azure data factory.
As I need to get data for multiple tables I have created and file(ex: tnames.txt) with all my table names and stored in data lake gen 2.
In Azure Data Factory created a lookup activity and passed tnames.txt file to it.
Then added a foreach activity to that lookup actvity and in foreach activity added a copy activity.
In copy activity in source, I was giving query to extract data.
Sink is datalake gen 2.
Example code:
select * from tableName
Here table is dynamically passed from tnames.txt.
But after data is copied into data lak,e I am getting headers in copied data are like:
"tablename.columnname".
For example: Table name is Employee and few columns are ID, Name, Gender,....
My resultent file columns are like Employee.ID,Employee.Name,Employee.Gender, but my requirement is just column name.
Basically tabe name is append to column name.
How to solve this issue/Is there any other way to get data for multiple tables in single pipeline/copy activity?
Check the mapping tab of your copy activity . If the mapping is enabled, clear it and use auto-create table . It will auto-generate the schema according to the source schema. No need to explicitly create the table with defined schema. Let it be auto create table. It will generate required mapping automatically.

How to Set default value of empty data of column in copy activity from csv file using azure data factory v2

I've multiple csv files and multiple tables.
The table name is file name and column name is first row of csv file.
Now I want to add default value of empty string to the sink table.
Consider my scenario,
employee:
id int, name varchar, is_active bit NULL
employee.csv:
id|name|is_active
1|raja|
Now I'm trying to copy the csv data to PostgreSQL table its throwing error.
Expected result is default value if its empty value.
You can use NULLIF in PostgreSQL:
NULLIF(argument_1,argument_2);
The NULLIF function returns a null value if argument_1 equals to argument_2, otherwise it returns argument_1.
This way you can replace NULL value with some other value
If your error is related to Type mismatch then consider typecasting the column first
Thanks!
As per the issue, tried to repro the scenario and here is the following outcome which was successfully copied. You have to use
Source Dataset: employee.csv from Azure Blob Storage
Sink Dataset : Here, I have used the sink as Azure SQL DB for some limitations but as you have used PostgreSQL is almost similar.
Copy Activity Settings:
Under the mapping settings there will be type conversion, where you have to import schema else you can dynamically add
Output:
Alternative to use DataFlow - if you have multiple data fields, you need to use the derived column transformation to generate new columns in your data flow or to modify existing fields.
For more details, refer Derived column transformation in mapping data flow.
You can even refer to this Microsoft Q&A post for more insights: Copy Task failure because of conversion failure

Bulk copy multiple csv files from Blob Container to Azure SQL Database

Environment:
MS Azure:
Blob Container, multiple csv files saved in a folder. This is my source.
Azure Sql Database. This is my target
Goal:
Use Azure Data Factory and build a pipeline to "copy" all files from the container and store them in their respective tables in the Azure Sql database by automatically creating those tables.
How do I do that? I tried following this but I just end up having tables incorrectly created in the database, where table is created with a single column having same name as the table name.
I believe I followed the instructions from that link pretty must as they are.
My CSV file is as follows, one column contains the table name.
The previous steps will not be repeated,it is the same as the link.
At Step3 inside the Foreach activity, we should add a Lookup activity to query the table name from the source dataset.
We can declare a String type variable tableName pervious, then set the value via expression #activity('Lookup1').output.firstRow.tableName.
At sink setting of the Copy activity, we can key in #variables('tableName').
ADF will auto create the table for us.
The debug result is as follows:

CSV file matadata validation (comparing with existing SQl Table)

I have a requirement of validating CSV file before loading into staged-folder, and later have to load into sql table.
I need to validate metadata (the structure of the file must be same as target sql table)
No. of columns should be equal to the target sql table
order of columns should be same as target sql table
Data types of columns (no text values should exist in numeric field of csv file)
looking for some easy and efficient way achieve this.
Thanks for help
A Python program and module that does most of what you're looking for is chkcsv.py: https://pypi.org/project/chkcsv/. It can be used to verify that a CSV file contains a specified set of columns and that the data type of each column conforms to the specification. It does not, however, verify that the order of columns in the CSV file is the same as the order in the database table. Instead of loading the CSV file directly into the target table, you can load it into a staging table and then move it from there into the target table--this two-step process eliminates column order dependence.
Disclaimer: I wrote chkcsv.py
Edit 2020-01-26: I just added an option that allows you to specify that the column order should be checked also.