I have a scenario to copy output of GET Metadata activity into a SQL table. Can I do this directly without using Databricks notebook?
You can use a Lookup activity.
GetMetadata -> Lookup
Write the INSERT statement in the Lookup activity's Query field, or call a stored procedure from it.
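To make the shape of that query concrete, here is a rough sketch in Python of the statement the Lookup would need to end up with. It is only an illustration: the table name `dbo.FileLog` and the helper names are hypothetical, and `itemName` and `size` are assumed to be among the fields you requested from Get Metadata. In the pipeline itself you would build the same string with dynamic content instead of Python. The trailing `SELECT` is there on the assumption that the Lookup activity expects the query to return a result set.

```python
def sql_literal(value):
    """Render a Python value as a T-SQL literal, doubling single quotes."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "N'" + str(value).replace("'", "''") + "'"


def build_insert(table, metadata, columns):
    """Build an INSERT for the chosen metadata fields (hypothetical helper)."""
    column_list = ", ".join(f"[{c}]" for c in columns)
    values = ", ".join(sql_literal(metadata.get(c)) for c in columns)
    # Trailing SELECT: assumed to be needed so the Lookup gets rows back.
    return f"INSERT INTO {table} ({column_list}) VALUES ({values}); SELECT 1 AS done;"


# Example metadata, shaped like a Get Metadata output with two fields selected.
example = {"itemName": "sales.csv", "size": 1024}
statement = build_insert("dbo.FileLog", example, ["itemName", "size"])
```

A stored procedure with parameters is usually the safer choice, since it avoids building SQL text from values at all.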
I am trying to create a data flow in SSIS where the source data comes from an Excel file and lands in a temporary staging table in SQL Server, where I can run various stored procedures against the data.
The dataflow that I have created stores the data permanently on what is supposed to be the staging area.
I would like to get some ideas on creating the staging table in SQL with the SSIS dataflow.
Your question is a bit confusing. I suppose you are trying to make the data loaded into the staging table temporary, without keeping the data from previous loads.
If I'm right, what you're trying to accomplish is a "full refresh" data flow.
From your description I assume you already have the staging table (so there is no need to CREATE it) but you need to truncate it at every run. You can achieve this by adding an Execute SQL Task to the control flow with a TRUNCATE TABLE <YOUR TABLE NAME> statement in it. The data flow that loads the data must depend on this task, so that the table is truncated at every run before the load starts.
If you need to CREATE a table, you can also do it in the control flow with an Execute SQL Task (you can execute any kind of query with this task). Remember to set the task's connection manager correctly.
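The truncate-then-load order described above can be sketched outside SSIS. This is a minimal Python model using an in-memory SQLite database, not an SSIS package: the table name `staging`, its columns, and the `full_refresh` helper are made up for illustration, and SQLite has no TRUNCATE TABLE, so `DELETE FROM` stands in for it.

```python
import sqlite3


def full_refresh(conn, rows):
    """Empty the staging table, then load the new batch (full refresh)."""
    with conn:  # one transaction: the clear and the load succeed or fail together
        # In SQL Server this step would be: TRUNCATE TABLE staging
        conn.execute("DELETE FROM staging")
        conn.executemany("INSERT INTO staging (id, name) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, name TEXT)")

full_refresh(conn, [(1, "a"), (2, "b")])  # first run
full_refresh(conn, [(3, "c")])            # second run replaces the first batch

remaining = conn.execute("SELECT id, name FROM staging").fetchall()
```

After the second run only the latest batch is left in the table, which is the behaviour the Execute SQL Task plus data flow combination gives you.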
I have a file on Google Cloud Storage that contains a number of queries (create table, truncate/delete table, insert, merge, select, etc.). I need to execute all statements in sequence, as they appear in the file, against BigQuery. How do I do that?
Currently there is no way to do this directly. You could follow this procedure instead:
1. Split your file so that each DDL statement stands on its own, check that it follows the correct syntax, and run the statements.
2. Create a CSV and import the data into BigQuery, following the import procedure.
If your database is huge, you may want to do the import using the API.
Also, see the documentation for DML syntax.
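For the splitting step, a small sketch along these lines may help. It is deliberately naive: it splits on semicolons outside single-quoted strings and does not understand comments, backticks, or double-quoted strings. The function names are hypothetical, and `execute` is whatever callable you supply; with the BigQuery Python client it could be something like `lambda sql: client.query(sql).result()`, assuming you have already downloaded the file contents from Cloud Storage.

```python
def split_statements(script):
    """Split a SQL script on semicolons that are not inside single quotes."""
    statements, current, in_string = [], [], False
    for ch in script:
        if ch == "'":
            in_string = not in_string  # a doubled '' toggles twice, so it is safe
        if ch == ";" and not in_string:
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def run_in_order(script, execute):
    """Run each statement in file order; stops at the first exception."""
    for stmt in split_statements(script):
        execute(stmt)


script = "CREATE TABLE t (x INT64);\nINSERT INTO t VALUES (1);\nSELECT 'a;b' FROM t"
executed = []
run_in_order(script, executed.append)  # here 'executing' just records the statement
```

Running the statements one at a time also tells you exactly which one failed, which is useful when the file mixes DDL and DML.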
Our challenge is the following:
In an Azure SQL database, we have multiple tables named table_num, where num is just an integer. These tables are created dynamically, so the number of tables can vary (from table_1, table_2 to table_N). All tables have the same columns.
As part of a U-SQL script file, we would like to execute the same query on all of these tables and generate an output csv file with the combined results of all these queries.
We tried several things:
U-SQL does not allow looping, so we were thinking of creating a view in our Azure SQL database that would combine all the tables using a cursor of some sort. The U-SQL file would then query this view (using an external source). However, a view in Azure SQL database can only be created via a function, and a function cannot execute dynamic SQL or even call a stored procedure...
We did not find a way to call a stored procedure of the external data source directly from U-SQL.
We don't want to update our U-SQL job each time a new table is added...
Is there a way to do that in U-SQL through a custom extractor for instance? Any other ideas?
One solution I can think of is to use Azure Data Factory (v2) to assist in this.
You could create a pipeline with the following activities:
1. A Lookup activity configured to execute the stored procedure
2. A For Each activity that uses the output of the Lookup activity as its source
2.1. As a child item, a U-SQL activity that executes your U-SQL script, which writes the output of a single table (the item of the For Each activity) to blob storage or Data Lake
3. A Copy activity that merges the blobs from step 2.1 into one final blob
If you have little or no experience with ADF v2, bear in mind that it takes some time to get to know it, but once you do, you won't regret it. Having a GUI to create the pipeline is a nice bonus.
Edit: as @wBob mentions, another (far easier) solution is to create a single table with all the rows, since all the dynamically generated tables have the same schema. You could create a stored procedure that populates this table, for example.
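To show what such a procedure would have to generate, here is a sketch in Python that builds the combined query text from a list of table names. It is only a model of the dynamic SQL: the function name is hypothetical, and in the database you would read the names from the catalog (for example `sys.tables`) and execute the resulting string inside the procedure.

```python
import re

# Only names of the exact form table_<integer> are accepted.
TABLE_NAME = re.compile(r"table_(\d+)")


def build_union_all(table_names):
    """Return one SELECT ... UNION ALL ... statement over all table_N tables."""
    numbered = []
    for name in table_names:
        match = TABLE_NAME.fullmatch(name)
        if match:
            numbered.append((int(match.group(1)), name))
    numbered.sort()  # numeric order, so table_10 comes after table_2
    selects = [f"SELECT * FROM [{name}]" for _, name in numbered]
    if not selects:
        return ""
    return "\nUNION ALL\n".join(selects) + ";"


combined = build_union_all(["table_2", "table_10", "other", "table_1"])
```

Because the statement is rebuilt from the current list of tables on every run, a newly added table_N is picked up without touching the U-SQL job.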
I need to store all the SQL queries under one folder and reference them from the SSIS package, to better organize the SQL files I am using and so that we can easily change a SQL file later without having to rebuild the package. This includes all queries used in the "Execute SQL Task" as well as the queries in the "OLE DB Source" in the Data Flow component.
In the Data Flow task, instead of putting the SQL query for the source database into the query window:
I see four options under Data access mode for the OLE DB source:
1. Table or View
2. Table or view name variable
3. SQL Command
4. SQL Command from variable
I understand I could store the query in a variable and reference it in the "Execute SQL Task", but I am looking for a way to store all the queries in SQL files and use them in the Data Flow component as well as in the "Execute SQL Task". I can't seem to find a way to make it dynamic in the Data Flow task. Can anyone help with this, please?
I don't think storing them as sql files is any good for any type of scenario.
Here is what I would do given what you have described.
You can store the queries as varchar values in a database table instead of as files. Then you can use a Foreach Loop over the result set and map each row to the variable that you then use as the query for your OLE DB source in the data flow.
So create a variable and make a Foreach Loop that loops over "select query from dbo.queries" or whatever. Map the output to the variable you created, create your data flow inside the container, and set the source either with an expression or with "SQL command from variable".
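The control flow of that loop can be modelled in a few lines of Python with an in-memory SQLite database. This is not SSIS code, just a sketch of the idea: the tables `queries` and `src` and the function name are invented for the example, and the `query` loop variable plays the role of the SSIS variable used by "SQL command from variable".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE queries (name TEXT, query TEXT);
CREATE TABLE src (id INTEGER);
INSERT INTO src VALUES (1), (2), (3);
INSERT INTO queries VALUES
    ('all', 'SELECT id FROM src ORDER BY id'),
    ('big', 'SELECT id FROM src WHERE id > 1 ORDER BY id');
""")


def run_stored_queries(conn):
    """Read each stored query and run it, like a Foreach Loop over the result set."""
    results = {}
    stored = conn.execute("SELECT name, query FROM queries ORDER BY name").fetchall()
    for name, query in stored:
        # 'query' stands in for the variable mapped inside the loop container.
        results[name] = conn.execute(query).fetchall()
    return results


results = run_stored_queries(conn)
```

Changing a query then means updating a row in the table; the package itself stays as it is.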
As for the control flow queries why not just have them be stored procedures that you can change when you need to?
Can I create a table variable in one SQL Task
DECLARE @TableVar TABLE (...)
and then, in another SQL Task or a data source/destination, select from or insert into that table variable?
The other option I have considered is using a Temp Table.
CREATE TABLE #TempTable (...)
I would prefer to use a table variable so that it remains in memory, but I can use a temp table if a table variable is not possible. Also, I cannot use the Recordset destination, as I need to perform straight SQL tasks on the data later on.
The use case is essentially performing a transformation in place of BizTalk. There is a very large flat-file-to-flat-file transformation that BizTalk has to perform; unfortunately, the data volume would put an unacceptable load on the BizTalk server, so the idea is to offload it to SSIS. However, it is not a simple row-to-row transformation: there are different types of rows that relate to each other. The first task in SSIS loads the rows into the appropriate (temp) tables; then, in the second data task, a select is performed that produces the correct output format.
You could use some of the techniques in this post: http://consultingblogs.emc.com/jamiethomson/archive/2006/11/19/SSIS_3A00_-Using-temporary-tables.aspx
especially the ones about using RetainSameConnection=TRUE on the connection manager.
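The reason the connection setting matters is that a temp table belongs to the session that created it. The sketch below shows the same effect with SQLite from Python, as an analogy rather than SQL Server behaviour verbatim: the file name and the table `work` are made up, and `CREATE TEMP TABLE` stands in for `CREATE TABLE #TempTable`.

```python
import os
import sqlite3
import tempfile

# A database file that both connections open, so they share the permanent tables.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

first = sqlite3.connect(path)
first.execute("CREATE TEMP TABLE work (id INTEGER)")  # lives only in this connection
first.execute("INSERT INTO work VALUES (1)")
first.commit()

# A later step that reuses the same connection still sees the temp table.
same_connection_rows = first.execute("SELECT id FROM work").fetchall()

# A step that opens a new connection does not.
second = sqlite3.connect(path)
try:
    second.execute("SELECT id FROM work")
    visible_elsewhere = True
except sqlite3.OperationalError:
    visible_elsewhere = False

first.close()
second.close()
```

With RetainSameConnection=TRUE, the tasks behave like the first case: they all run on one session, so the temp table created by the first task is still there for the next one.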
I would be interested to see more information about what use case you have that requires you to write out data to a temp table or table variable before further SSIS processing. Couldn't you take care of all of the SQL required steps in your source query before you start processing the dataflow with SSIS?
Table variables are not kept solely in memory and can be written to disk under memory pressure. I tend to use table variables for very small lookups. If you need to push a table into SQL Server because of necessary and complex transformations, then use a 'permanent' temp table that is truncated within the SSIS package prior to the insert. It is simple and will get done what you need.
The SSIS package would be run in a job; I assume it runs inside a SQL job. In that case, using a temp table won't do any harm. SQL jobs generally run after office hours, so it does not matter.