Azure Data Factory V2: How to pass a file name to a stored procedure variable

I have a big fact table in Azure SQL with the following structure:
Company   Revenue
-------   -------
A         100
B         200
C         100
...       ...
I am now building a stored procedure, to be run from Azure Data Factory V2, that will delete all records of a specific company from the Azure SQL fact table above on a monthly basis. For this exercise, that company is identified by the variable @company. The structure of the stored procedure was created as:
@company NVARCHAR(5)
DELETE FROM table
WHERE [company] = @company
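A complete version of this procedure might look like the following sketch (the procedure and fact table names are placeholders):
CREATE PROCEDURE dbo.usp_DeleteCompany   -- hypothetical name
    @company NVARCHAR(5)
AS
BEGIN
    -- Remove the existing rows for the given company before the new monthly load
    DELETE FROM dbo.FactRevenue          -- placeholder for the fact table name
    WHERE [company] = @company;
END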
As I will have a different Excel file from each company inserting data into this table on a monthly basis (via a Copy activity), I want to use the stored procedure above to delete the old data for that company before I add the most recent data.
I would then like to pass the name of that Excel file (stored in a blob container) to the variable "@company" so that the stored procedure knows which data to delete from the fact table. For example, if the Excel file is "A", the stored procedure should run "DELETE FROM table WHERE company = 'A'".
Any ideas on how to pass the Excel file names to the variable "@company" and set this up in Azure Data Factory V2?

Any ideas on how to pass the Excel file names to the variable "@company" and set this up in Azure Data Factory V2?
Based on your description, an event-based trigger in Azure Data Factory may meet your needs. An event-based trigger runs pipelines in response to an event, such as the arrival or deletion of a file in Azure Blob Storage.
So, when a new Excel file is created in blob storage (note that this only supports V2 storage accounts; for more details, please refer to this article), you can get @triggerBody().folderPath and @triggerBody().fileName. To use the values of these properties in a pipeline, you must map the properties to pipeline parameters. After mapping the properties to parameters, you can access the values captured by the trigger through the @pipeline().parameters.parameterName expression throughout the pipeline. (doc)
You can get the file name, pass it into your stored procedure, and then do the deletion and copy activities.
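A minimal sketch of the wiring, assuming a pipeline parameter named sourceFileName and a stored procedure parameter named company (both names are assumptions):
Trigger parameter mapping:             sourceFileName = @triggerBody().fileName
Stored Procedure activity parameter:   company = @pipeline().parameters.sourceFileName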

If you use the Stored Procedure activity, you can fill its parameters from pipeline parameters: https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-stored-procedure
You can click "Import" to automatically add the parameters after selecting a stored procedure, or add them by hand. Then you can use a pipeline expression to fill them dynamically, e.g. as suggested by Jay Gong, using the trigger properties @triggerBody().folderPath and @triggerBody().fileName.
Alternatively, instead of using a stored procedure, you could add a pre-copy script to your copy activity:
This shows up only for appropriate sinks, such as a database table. You can fill this script dynamically as well. In your case, this could look like:
@{concat('DELETE FROM table
WHERE [company] = ''', triggerBody().fileName, '''')}
It might also make sense to add a parameter to the pipeline containing the file name and set it to @triggerBody().fileName or a more complex expression, in case you are using it multiple times.
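For example, if the blobs are Excel files with an .xlsx extension (an assumption), a slightly more complex expression for that parameter could strip the extension so that only the company name reaches the stored procedure:
@replace(triggerBody().fileName, '.xlsx', '')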

Related

How to Parameterize Copy Activity to SQL DB with Azure Data Factory

I'm trying to automatically update tables in Azure SQL Database from another SQL DB with Azure Data Factory. At the moment, the only way to update a table in Azure SQL Database is to physically select the table you want to update, as shown here:
My configuration to automatically select a table from the SQL DB that I want to copy to Azure SQL Database is as follows:
The parameters are as follows:
@concat('SELECT * FROM ', pipeline().parameters.Domain, '.', pipeline().parameters.TableName)
Can someone let me know how to configure my SINK and/or connection to automatically insert the table selected from SOURCE?
My SINK looks like the following:
And my connection looks like the following:
Can someone let me know how to configure my SINK and/or connection to automatically insert the table selected from SOURCE?
You can use the Edit option in the SQL dataset.
Create a dataset parameter for the sink table name. In the SQL sink dataset, check the Edit checkbox and use the dataset parameter. If you want, you can use a dataset parameter for the schema name as well; here I have given it directly (dbo).
Now, in the copy activity sink, you can give the table name dynamically from any pipeline parameter (your parameter, in this case) or any variable, using dynamic content.
Also, enable Auto create table, which will create a new table if a table with the given name does not exist; if it does exist, creation is skipped and the data is copied into it.
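A minimal sketch of the two settings, assuming a dataset parameter named SinkTableName and a pipeline parameter named TableName (both names are assumptions):
Sink dataset, table field (Edit checked):   @dataset().SinkTableName
Copy activity sink, SinkTableName value:    @pipeline().parameters.TableName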
My sample result:

Why isn't there an option to upsert data in Azure Data Factory inline sink

The problem I'm trying to tackle is inserting and/or updating dynamic tables in a sink within an Azure Data Factory data flow. I've managed to get the source data, transform it how I want it, and send it to a sink. The pipeline ran successfully and reported that it copied 37 rows (as expected), but investigation showed that no data was actually deposited in the target table. This was because the Table Action on the sink was set to 'None'. So, in trying to fix this last part, it seems I don't have a 'Create' option but do have a 'Recreate' option (see the screenshot of the sink below), which is not what I want, as the data source will eventually contain only changed data. I need the process to create the table if it doesn't exist and then upsert data. (Recreate drops the table and then creates it.)
If I change the sink type from Inline to Dataset, then I can select the Insert and Upsert options, etc., but this is then not dynamic, as I need to select a specific dataset.
So, has anyone come across the same issue, and have you managed to build dynamic sinks in your data flow where the table is created if it doesn't exist and data is then upserted?
I guess I can add a pre-SQL script that takes care of the 'create the table if it doesn't exist' part, but I still can't select the Upsert option with inline tables.
For the CREATE TABLE IF NOT EXISTS issue, I would recommend a Stored Procedure that is executed in the pipeline prior to the Data Flow.
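A minimal sketch of such a procedure, with hypothetical table and column names:
CREATE PROCEDURE dbo.usp_EnsureTargetTable
AS
BEGIN
    -- Create the target table only if it does not exist yet
    IF OBJECT_ID(N'dbo.TargetTable', N'U') IS NULL
    BEGIN
        CREATE TABLE dbo.TargetTable
        (
            Id   INT           NOT NULL PRIMARY KEY,  -- key column used later for the upsert
            Name NVARCHAR(100) NULL
        );
    END
END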
For Inline vs Dataset, you can make the Dataset very flexible:
That way it is still based on your runtime table name and has no fixed schema, so there is no need to target a specific table.
For the UPSERT issue, make sure you have an Alter Row transformation before the Sink:
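A minimal sketch of the relevant settings, assuming a single key column named Id (an assumption):
Alter Row transformation:   Upsert if => true()
Sink settings:              Update method = Allow upsert, Key columns = Id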

How to pass dynamic table names for sink database in Azure Data Factory

I am trying to copy tables from one schema to another within the same Azure SQL DB. So far, I have created a lookup pipeline and passed the parameters to the ForEach loop and copy activity. But my sink dataset is not taking the parameter value I have given in the "table option" field; rather, it is using the dummy table I chose when creating the sink dataset. Can someone tell me how I can pass a dynamic table name to a sink dataset?
I have given @{concat('dest_schema.STG_', item().table_name)} in the table option field.
To make the schema and table names dynamic, add Parameters to the Dataset:
Most important - do NOT import a schema. If you already have one defined in the Dataset, clear it. For this Dataset to be dynamic, you don't want improper schemas interfering with the process.
In the Copy activity, provide the values at runtime. These can be hardcoded, variables, parameters, or expressions, so very flexible.
If it's the same database, you can even use the same Dataset for both, just provide different values for the Source and Sink.
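A minimal sketch of the runtime values, assuming dataset parameters named SchemaName and TableName and a ForEach item exposing table_name, as in the question (the parameter names are assumptions):
Source dataset:  SchemaName = src_schema    TableName = @{item().table_name}
Sink dataset:    SchemaName = dest_schema   TableName = @{concat('STG_', item().table_name)}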
WARNING: If you use the "Auto-create table" option, the schema for the new table will define any character field as varchar(8000), which can cause serious performance problems.
MY OPINION:
While you can do this, one of my personal rules is to not cross the database boundary. If the Source and Sink are on the same SQL database, I would try to solve this problem with a Stored Procedure rather than a data factory.
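A minimal sketch of that alternative, assuming the staging tables already exist and using hypothetical schema names; the table name is passed in and the copy is done with dynamic SQL:
CREATE PROCEDURE dbo.usp_CopyTableToStaging
    @TableName SYSNAME
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) =
        N'INSERT INTO dest_schema.' + QUOTENAME(N'STG_' + @TableName) +
        N' SELECT * FROM src_schema.' + QUOTENAME(@TableName) + N';';

    -- Run the generated INSERT ... SELECT for the requested table
    EXEC sys.sp_executesql @sql;
END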

Bulk copy multiple csv files from Blob Container to Azure SQL Database

Environment:
MS Azure:
Blob Container, with multiple CSV files saved in a folder. This is my source.
Azure SQL Database. This is my target.
Goal:
Use Azure Data Factory and build a pipeline to "copy" all files from the container and store them in their respective tables in the Azure SQL Database by automatically creating those tables.
How do I do that? I tried following this, but I just end up with tables incorrectly created in the database, where each table is created with a single column having the same name as the table name.
I believe I followed the instructions from that link pretty much as they are.
My CSV file is as follows; one column contains the table name.
The previous steps are the same as in the linked article, so I won't repeat them.
At Step 3, inside the ForEach activity, we should add a Lookup activity to query the table name from the source dataset.
We can declare a String-type variable tableName beforehand, then set its value via the expression @activity('Lookup1').output.firstRow.tableName.
In the sink settings of the Copy activity, we can key in @variables('tableName').
ADF will auto create the table for us.
The debug result is as follows:

U-SQL job to query multiple tables with dynamic names

Our challenge is the following:
In an Azure SQL database, we have multiple tables with the following table names: table_num, where num is just an integer. These tables are created dynamically, so the number of tables can vary (from table_1, table_2 up to table_N). All tables have the same columns.
As part of a U-SQL script file, we would like to execute the same query on all of these tables and generate an output csv file with the combined results of all these queries.
We tried several things:
U-SQL does not allow looping, so we were thinking of creating a View in our Azure SQL database that would combine all the tables using a cursor of some sort. Then the U-SQL file would query this View (using an external source). However, a View in an Azure SQL database can only be created via a function, and a function cannot execute dynamic SQL or even call a stored procedure...
We did not find a way to call a stored procedure of the external data source directly from U-SQL.
We don't want to update our U-SQL job each time a new table is added...
Is there a way to do that in U-SQL through a custom extractor for instance? Any other ideas?
One solution I can think of is to use Azure Data Factory (v2) to assist in this.
You could create a pipeline with the following activities:
1. A Lookup activity configured to execute the stored procedure that returns the table names (see the sketch after this list)
2. A ForEach activity that uses the output of the Lookup activity as its items
2.1. As a child item, a U-SQL activity that executes your U-SQL script, which writes the output of a single table (the item of the ForEach activity) to blob storage or Data Lake
3. A Copy activity that merges the blobs from step 2.1 into one final blob
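A minimal sketch of a stored procedure the Lookup activity could call, relying on the table_<num> naming convention from the question (the procedure name is a placeholder):
CREATE PROCEDURE dbo.usp_GetDynamicTableNames
AS
BEGIN
    -- Return every table that follows the table_<number> naming convention
    SELECT name AS TableName
    FROM sys.tables
    WHERE name LIKE N'table[_]%';
END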
If you have little or no experience working with ADF v2, be aware that it takes some time to get to know it, but once you do, you won't regret it. Having a GUI to create the pipeline is a nice bonus.
Edit: as @wBob mentions, another (far easier) solution is to somehow create a single table with all rows, since all dynamically generated tables have the same schema. You could create a stored procedure for populating this table, for example.
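A minimal sketch of such a procedure, using dynamic SQL to append every table_<num> table into one combined table (the combined table is assumed to already exist with the same columns; all names are placeholders):
CREATE PROCEDURE dbo.usp_PopulateCombinedTable
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) = N'';

    -- Build one INSERT ... SELECT statement per dynamically created table
    SELECT @sql = @sql +
        N'INSERT INTO dbo.CombinedTable SELECT * FROM ' +
        QUOTENAME(SCHEMA_NAME(schema_id)) + N'.' + QUOTENAME(name) + N';'
    FROM sys.tables
    WHERE name LIKE N'table[_]%';

    EXEC sys.sp_executesql @sql;
END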