Data Flow output to Azure SQL Database contains only NULL data in Azure Data Factory

I'm testing data flows in Azure Data Factory. I created a Data Flow with the following details:
Source dataset linked service: CSV files dataset from Blob storage
Sink linked service: Azure SQL Database with a pre-created table
My CSV files are quite simple: they contain only 2 columns (PARENT, CHILD), so my table in SQL DB also has only 2 columns.
For the sink settings of my data flow, I enabled Allow insert and left the other options at their defaults.
I also mapped the 2 input and output columns, as shown in the screenshot.
The pipeline with the data flow ran successfully. When I checked the result, I could see that 5732 rows were processed. Is this the correct way to check? This is the first time I have tried this functionality in Azure Data Factory.
However, when I click the Data preview tab, all the values are NULL.
Also, when I select the top 1000 rows from the target table in my Azure SQL DB, I don't see any of the data from the CSV files in Blob storage.
Could you please tell me what I configured incorrectly in my Data Flow? Thank you very much in advance.
Here is the screenshot of the ADF data flow source data. The right side shows data (the values are not NULL), but the left side is all NULL. I imagine the right side is the data from the source CSV in Blob storage, and the left side is the sink destination, since the table is empty for now. Is that right?
And here is the screenshot of the sink's Inspect tab for its input. I think this is correct, as it reads the 2 columns (Parent, Child) correctly. Is it?
After adding a Map drifted transformation to map "Parent" => "parent" and "Child" => "child", I get this error message after running the pipeline.
When checking the sink data preview, I get this error message. It seems like the mapping is incorrect?
I changed the MapDrifted1 expression to "toString(byName('Parent1'))", and did the same for Child1, as suggested.
The data flow executed successfully, but I still get NULL data in the sink SQL table.

Can you copy and paste the script behind your data flow design graph? In the ADF UI, open the data flow, then click the Script button at the top right.
In your Source transformation, click Data Preview to see the data. Make sure you are seeing your data, not NULLs. Also, look at the Inspect tab on the INPUT of your Sink to see whether ADF is reading additional columns.
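If the source preview shows data but the sink receives NULLs, one thing worth ruling out is a name mismatch between the incoming columns and the names the sink mapping looks up. The Python sketch below only models that failure mode, assuming a name-based lookup returns NULL rather than raising an error when the column is missing; it is not ADF code, and the byte-order-mark header is a hypothetical example of an invisible mismatch.

```python
def map_by_name(row, mapping):
    """Project a source row onto sink columns by name.

    mapping: sink column -> source column name. In this sketch a source name
    that is absent from the row yields None (NULL) instead of an error.
    """
    return {sink: row.get(source) for sink, source in mapping.items()}


def find_unmatched(header, mapping):
    """Return the mapped source names that do not appear in the header."""
    return [source for source in mapping.values() if source not in header]


# Hypothetical CSV header whose first name carries a UTF-8 byte-order mark.
header = ["\ufeffParent", "Child"]
row = dict(zip(header, ["A", "B"]))
mapping = {"parent": "Parent", "child": "Child"}

print(map_by_name(row, mapping))        # {'parent': None, 'child': 'B'}
print(find_unmatched(header, mapping))  # ['Parent']
```

Checking which mapped names are missing from the actual header, as `find_unmatched` does, is the same idea as comparing the source Data Preview with the sink's Inspect input.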

Related

Azure Data Factory Copy Activity for JSON to Table in Azure SQL DB

I have a copy activity that takes a bunch of JSON files and merges them into a single JSON file.
I would now like to copy the merged JSON file to Azure SQL DB. Is that possible?
OK, it appears to be working; however, the output in SQL contains only countryCode and CompanyId.
I also need to retrieve all the financial information in the JSON.
I reproduced this; the steps are below.
Two JSON files are used as the source.
Those files are merged into a single file using a copy activity.
The merged JSON data is then used as the source dataset in another copy activity.
In the sink, a dataset for Azure SQL DB is created and the Auto create table option is selected.
In the sink dataset, the Edit checkbox is selected and the sink table name is entered.
Once the pipeline runs, the data is copied to the table.
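The follow-up (only countryCode and CompanyId reaching SQL) suggests the nested financial fields are not being mapped to columns. As a rough illustration of what the merge and mapping steps need to produce, the Python sketch below merges several JSON documents and flattens nested objects into underscore-joined column names. The field names financials, revenue and profit are hypothetical, and in ADF itself the nested paths would presumably still have to be mapped in the copy activity rather than by code like this.

```python
import json


def merge_json_documents(documents):
    """Combine several JSON documents into one list of records.

    Each document may hold a single object or an array of objects.
    """
    merged = []
    for text in documents:
        data = json.loads(text)
        if isinstance(data, list):
            merged.extend(data)
        else:
            merged.append(data)
    return merged


def flatten(record, prefix="", sep="_"):
    """Flatten nested objects into a single level of column names."""
    columns = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            columns.update(flatten(value, name, sep))
        else:
            columns[name] = value
    return columns


# Hypothetical documents: one single object, one array of objects.
docs = [
    '{"countryCode": "US", "CompanyId": 1, "financials": {"revenue": 10, "profit": 2}}',
    '[{"countryCode": "GB", "CompanyId": 2, "financials": {"revenue": 20, "profit": 5}}]',
]
rows = [flatten(r) for r in merge_json_documents(docs)]
print(rows[0])
# {'countryCode': 'US', 'CompanyId': 1, 'financials_revenue': 10, 'financials_profit': 2}
```

Each flattened key corresponds to one SQL column, which is why the nested values only appear in the table once they are mapped explicitly.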

Azure Data Factory Incremental Load data by using Copy Activity

I would like to load incremental data from a data lake into an on-premises SQL Server, so I created a data flow to do the necessary data transformation and cleaning.
After that, I wrote all the final data from the sink to a staging data lake in CSV format.
I am facing two kinds of issues here.
Whenever I trigger or debug the pipeline that loads my dataset (the full data flow activity), the data is loaded into the CSV the first time. If I run the same pipeline a second time, the CSV file in the target data lake is empty: the column header is written, but there are no values in the file.
In the copy activity, which is connected to the on-premises SQL Server, I load the data, but every time the pipeline is triggered, duplicate data is loaded. I want to load only new or updated data coming from the data lake CSV file. How do we handle this?
Kindly suggest.
When we want to incrementally load data into a database table, we need to use the Upsert write behavior in the Copy data sink.
Upsert helps you to incrementally load the source data based on a key column (or columns). If the key column is already present in target table, it will update the rest of the column values, else it will insert the new key column with other values.
Look at the following demonstration to understand how upsert works. I used Azure SQL Database as an example.
My initial table data:
create table player(id int, gname varchar(20), team varchar(10))
My source csv data (data I want to incrementally load):
I have taken an id which already exists in target table (id=1) and another which is new (id=4).
My copy data sink configuration:
Create or select a dataset for the target table. Select Upsert as the write behavior and choose the key column on which the upsert should be based.
Table after upsert using Copy data:
After the upsert using Copy data, the id=1 row should be updated and the id=4 row should be inserted. The final output achieved is in line with the expected output.
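The key-based upsert rule can be modelled in a few lines of Python, using made-up rows for the player(id, gname, team) table: the id=1 row is updated and the id=4 row is inserted. The row values are hypothetical, and this only mirrors the semantics described above, not how the Copy activity implements upsert internally.

```python
def upsert(target, source, key):
    """Apply key-based upsert semantics to an in-memory table.

    target, source: lists of dicts (rows). key: column used to match rows.
    Rows whose key already exists are updated; the others are inserted.
    The target list is not modified; a new list is returned.
    """
    result = [dict(row) for row in target]
    index = {row[key]: i for i, row in enumerate(result)}
    for row in source:
        if row[key] in index:
            result[index[row[key]]].update(row)  # existing key: update columns
        else:
            index[row[key]] = len(result)
            result.append(dict(row))             # new key: insert row
    return result


# Hypothetical rows for the player table.
player = [
    {"id": 1, "gname": "alpha", "team": "red"},
    {"id": 2, "gname": "beta", "team": "blue"},
    {"id": 3, "gname": "gamma", "team": "red"},
]
incoming = [
    {"id": 1, "gname": "alpha", "team": "green"},  # existing key -> update
    {"id": 4, "gname": "delta", "team": "blue"},   # new key -> insert
]
for row in upsert(player, incoming, "id"):
    print(row)
```

Running the same source twice leaves the result unchanged, which is why upsert avoids the duplicate rows that a plain insert produces on repeated triggers.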
You can use the primary key of your target table (which is also present in your source CSV) as the key column in the Copy data sink configuration. Other configuration (such as filtering the source by last modified date) should not affect the process.

Why isn't there an option to upsert data in Azure Data Factory inline sink

The problem I'm trying to tackle is inserting and/or updating dynamic tables in a sink within an Azure Data Factory data flow. I've managed to get the source data, transform it how I want, and send it to a sink. The pipeline ran successfully and reported that it copied 37 rows (as expected), but investigation showed that no data was actually written to the target table. This was because the Table action on the sink was set to 'None'. In trying to fix this last part, it seems I don't have the 'Create' option but do have the 'Recreate' option (see the screenshot of the sink below), which is not what I want, as the data source will eventually contain only changed data. I need the process to create the table if it doesn't exist and then upsert the data. (Recreate drops the table and then creates it.)
If I change the sink type from Inline to Dataset, then I can select Insert and Upsert, etc options but this is then not dynamic as I need to select a specific dataset.
So has anyone come across the same issue, and have you managed to have dynamic sinks in your data flow where the table is created if it doesn't exist and the data is then upserted?
I guess I can add a Pre SQL script which takes care of the 'create the table if it doesn't exist' but I still can't select the Upsert option with inline tables.
For the CREATE TABLE IF NOT EXISTS issue, I would recommend a Stored Procedure that is executed in the pipeline prior to the Data Flow.
For Inline vs Dataset, you can make the Dataset very flexible:
It is still driven by the table name supplied at runtime and has no schema, so there is no need to target a specific table.
For the UPSERT issue, make sure you have an Alter Row transformation before the Sink:
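The Alter Row and sink interplay can be sketched as a toy model: each row is tagged with the action of the first matching condition (here a single always-true condition, analogous to upsertIf(true())), and the sink writes upsert-tagged rows only when upserts are allowed. This is an assumption-laden Python illustration, not ADF's implementation; the table and rows are hypothetical.

```python
def tag_rows(rows, policies):
    """Tag each row with the action of the first policy whose predicate matches.

    policies is an ordered list of (action, predicate) pairs. Rows that match
    no predicate are tagged None; that default is a choice made for this sketch.
    """
    tagged = []
    for row in rows:
        action = next((name for name, pred in policies if pred(row)), None)
        tagged.append((action, row))
    return tagged


def apply_upserts(table, tagged, key, allow_upsert):
    """Write upsert-tagged rows into a copy of `table` (a dict keyed by `key`).

    Nothing is written unless allow_upsert is True, mimicking a sink whose
    'Allow upsert' option is off.
    """
    result = {k: dict(v) for k, v in table.items()}
    if not allow_upsert:
        return result
    for action, row in tagged:
        if action == "upsert":
            result[row[key]] = {**result.get(row[key], {}), **row}
    return result


# Hypothetical target table and incoming rows.
target = {1: {"id": 1, "name": "a"}}
incoming = [{"id": 1, "name": "b"}, {"id": 2, "name": "c"}]

tagged = tag_rows(incoming, [("upsert", lambda row: True)])
print(apply_upserts(target, tagged, "id", allow_upsert=True))
# {1: {'id': 1, 'name': 'b'}, 2: {'id': 2, 'name': 'c'}}
print(apply_upserts(target, tagged, "id", allow_upsert=False))
# {1: {'id': 1, 'name': 'a'}}
```

The point of the model is that both halves are needed: rows must be tagged for upsert upstream, and the sink must be configured to accept that action.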

Azure Data Factory: trivial SQL query in Data Flow returns nothing

I am experimenting with Data Flows in Azure Data Factory.
I have:
Set up a LinkedService to a SQL Server db. This db only has 2 tables.
The two tables are called "dummy_data_table1" and "dummy_data_table2" and are registered as Datasets
The data flow reads from these 2 tables, and in the Data Flow they are called "source1" and "source2"
However, when I select a source, go to Source options, change Input from Table to Query, and enter a simple query, it returns 0 columns (there are 11 columns in dummy_data_table1). I suspect my syntax is wrong, but how should I change it?
Hopefully this screenshot will help.
The problem was not the syntax. The problem was that the data flow could not recognize "dummy_data_table1" because it didn't refer to anything known. To make it work, I had to:
Enable Data Flow Debug (at the top of the page, not visible in my screenshot)
Once that's enabled, I had to click "Import projection" to import the schema of my table
Once this is done, the table name and fields are all recognized automatically and can be referenced in the query just as one would in SQL Server.
Source:
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-source#import-schema

Copy Data from Blob to SQL via Azure data factory

I have two sample files in Blob storage, sample1.csv and sample2.csv, with the sample data below.
The SQL table is named sample2, with the columns Name, id, last name, amount.
I created an ADF data flow without a schema, and the data preview looks like this:
Source settings: Allow schema drift is checked.
Sink settings: auto mapping is turned on, Allow insert is checked, and Table action is None.
I have also tried defining a schema in the dataset; the result is the same.
Any help here?
My expected outcome is that the rows from sample1 are inserted with NULL in the "last name" column.
If I understand correctly, you expect the rows from sample1 to be inserted with NULL in the "last name" column. In that case, you only need to add a derived column for your sample1.csv file.
You could follow my steps:
I created a sample1.csv file in Blob Storage and a sample2 table in my SQL database:
Use a Derived Column transformation to create a new column "last name" with a null value:
expression: toString(null())
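The derived-column step amounts to adding a constant NULL column so that every sink column has a value. A minimal Python sketch of that idea follows; the sample1.csv values are hypothetical, and the helpers are illustrative rather than part of ADF.

```python
def add_derived_column(rows, name, value=None):
    """Return copies of rows with an extra column set to a constant value."""
    return [{**row, name: value} for row in rows]


def align_to_sink(rows, sink_columns):
    """Order each row's values by the sink's column list."""
    return [tuple(row[col] for col in sink_columns) for row in rows]


SINK_COLUMNS = ["Name", "id", "last name", "amount"]

# Hypothetical sample1.csv rows, which have no "last name" column.
sample1 = [{"Name": "Ann", "id": "1", "amount": "10"}]
aligned = align_to_sink(add_derived_column(sample1, "last name"), SINK_COLUMNS)
print(aligned)  # [('Ann', '1', None, '10')]
```

Without the derived column, `align_to_sink` would raise a KeyError for "last name", which is the in-code analogue of the sink not finding a value for that column.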
Sink settings:
Run the pipeline and check the data in table:
Hope this helps.
You cannot mix schemas in the same source in the same data flow execution.
Schema Drift will handle changes to the schema on an execution-per-execution basis.
But if you are reading multiple different schemas from a folder, you will get non-deterministic results.
Instead, if you loop through those files in a pipeline ForEach one-by-one, data flow will be able to handle the evolving schema.
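The per-file approach can be illustrated in Python: each file is parsed with its own header and then aligned to the sink's columns, so a file without "last name" simply yields None for it. Reading one file's rows with another file's header, by contrast, silently shifts values. The file contents are hypothetical, and the loop only stands in for a pipeline ForEach.

```python
import csv
import io

SINK_COLUMNS = ["Name", "id", "last name", "amount"]


def read_one_file(text):
    """Parse one CSV with its own header; columns it lacks become None."""
    reader = csv.DictReader(io.StringIO(text))
    return [{col: row.get(col) for col in SINK_COLUMNS} for row in reader]


def load_files(files):
    """Process files one at a time, as a ForEach over the folder would."""
    rows = []
    for _name, text in files:
        rows.extend(read_one_file(text))
    return rows


# Hypothetical file contents with different schemas.
files = [
    ("sample1.csv", "Name,id,amount\nAnn,1,10\n"),
    ("sample2.csv", "Name,id,last name,amount\nBob,2,Smith,20\n"),
]
for row in load_files(files):
    print(row)

# Reading sample2's data row with sample1's header misaligns the values.
wrong = next(csv.DictReader(io.StringIO("Bob,2,Smith,20\n"),
                            fieldnames=["Name", "id", "amount"]))
print(wrong)  # {'Name': 'Bob', 'id': '2', 'amount': 'Smith', None: ['20']}
```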