SQL external table takes a long time to select and insert data into a temp table - sql

I am using an external table in my data warehouse database that points to the main database. Selecting data from the external table into a # (temp) table takes almost 9 minutes, and sometimes longer. How can I improve the performance of the external table?

One option is to use T-SQL that runs the query in the external database and returns only the data required. The filter is applied first in the external database, and only the filtered rows are then sent back to the calling database.
When you enable the Actual Execution Plan option, you can see
that the query SELECT * FROM PerformanceVarcharNVarchar first brings the
data from the external database to the temporary database, and only then
does the engine apply the filter.
See the official Microsoft documentation: EXECUTE (Transact-SQL) | Docs
Alternatively, you can use SQL Data Sync, an Azure SQL Database-based service that lets you synchronize selected data bidirectionally between multiple databases, both on-premises and in the cloud.
The original post has more detailed insights: Lesson Learned #56:
External tables and performance issues | techcommunity
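A minimal sketch of the first option, assuming the elastic query procedure sp_execute_remote is used to push the filter to the remote side. The data source name RemoteDataSource, the columns Id and Name, and the filter value are hypothetical; only the table name comes from the answer above. Treat it as an illustration rather than the answer's exact script.

```sql
-- Hypothetical temp table; the column list must match the remote SELECT.
-- sp_execute_remote appends a $ShardName column to its result set,
-- so the temp table reserves a column for it.
CREATE TABLE #Filtered (
    Id           INT,
    Name         NVARCHAR(200),
    [$ShardName] NVARCHAR(500)
);

-- The statement in @stmt runs in the remote database, so the WHERE clause
-- is evaluated there and only matching rows cross the network.
INSERT INTO #Filtered
EXEC sp_execute_remote
    @data_source_name = N'RemoteDataSource',   -- assumed external data source name
    @stmt   = N'SELECT Id, Name FROM dbo.PerformanceVarcharNVarchar WHERE Id = @id',
    @params = N'@id INT',
    @id     = 42;

SELECT Id, Name FROM #Filtered;
```

Comparing the actual execution plan of this call with the plain SELECT against the external table should show whether the filter is being applied remotely.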

Related

ADF - How should I copy table data from source Azure SQL Database to 6 other Azure SQL Databases?

We curate data in the "Dev" Azure SQL Database and currently use RedGate's Data Compare tool to push it up to 6 higher Azure SQL Databases. I am trying to migrate that manual process to ADFv2 and would like to avoid copying and pasting the 10+ copy data activities for each database (x6), to keep it more maintainable for future changes. The static tables have some customization in the copy data activity, but the basic idea follows this post to perform an upsert.
How can the implementation described above be done in Azure Data Factory?
I was imagining something like the following:
Using one parameterized linked service that has the server name and database name configurable, to generate a dynamic connection to Azure SQL Database.
Creating a pipeline for each table's copy data activity.
Creating a master pipeline to nest each table's pipeline in.
Using variables to loop over the different connections and passing those to the sub-pipelines' parameters.
Not sure if that is the most efficient plan or even works yet. Other ideas/suggestions?
We cannot tell you whether that is the most efficient plan, but I think it is. Just get it working first.
As you said in the comment:
We can use dynamic pipelines: copy multiple tables in bulk with
'Lookup' and 'ForEach'. We can perform dynamic copies of your table
list in bulk within a single pipeline. Lookup returns either the
list of rows or the first row of data. ForEach items:
@activity('Azure SQL Table lists').output.value ; sink file name:
@concat(item().TABLE_SCHEMA,'.',item().TABLE_NAME,'.csv'). This is
efficient and cost-optimized since it uses fewer activities and
datasets.
Usually we would choose the same solution as you: dynamic parameters/pipelines with Lookup + ForEach activities to achieve the scenario. In a word, give the pipeline strong logic and keep it simple and efficient.
Adding the information from the comment as an answer.
Yes, we can use dynamic pipelines: copy multiple tables in bulk with 'Lookup' and 'ForEach'.
We can perform dynamic copies of your table list in bulk within a single pipeline. Lookup returns either the list of rows or the first row of data.
ForEach items: @activity('Azure SQL Table lists').output.value ;
sink file name: @concat(item().TABLE_SCHEMA,'.',item().TABLE_NAME,'.csv')
This is efficient and cost-optimized since it uses fewer activities and datasets.
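For illustration, the Lookup activity named 'Azure SQL Table lists' could run a query like the one below against the source database. The query itself is an assumption (the answer does not show it); it is chosen because it returns the TABLE_SCHEMA and TABLE_NAME columns that the item() expressions above refer to.

```sql
-- Assumed Lookup query: one row per user table in the source database.
-- Each row becomes one ForEach item, exposing item().TABLE_SCHEMA
-- and item().TABLE_NAME to the copy activity inside the loop.
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
ORDER BY TABLE_SCHEMA, TABLE_NAME;
```

With "First row only" disabled on the Lookup, the full row list is available as output.value, which is what the ForEach iterates over.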

How to insert table between 2 server in Azure data warehouse?

I want to insert a table from a different factory / server in Azure data warehouse. Is it possible to do the insert with a query?
It takes a lot of time if I have to create a dataset and pipeline for each table in Azure Data Factory.
Judging from the Azure SQL icon in your screenshot, you're using Azure SQL Database, not Azure Data Warehouse.
You could use Elastic Query to do a cross database query in Azure.
Ref the tutorial: Get started with cross-database queries (vertical partitioning) (preview)
Elastic database query (preview) for Azure SQL Database allows you to run T-SQL queries that span multiple databases using a single connection point. This article applies to vertically partitioned databases.
When completed, you will: learn how to configure and use an Azure SQL Database to perform queries that span multiple related databases.
Then you can insert data from the external table into a table in the current database.
Note: Azure SQL Database already has the master key, so you don't need to create it again.
Hope this helps.
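The setup described in that tutorial can be sketched as follows. Every object name, the server address, and the credentials are hypothetical placeholders, and the column list must match the real remote table; this is an outline of the approach, not a tested script.

```sql
-- Run in the database that should receive the data.
-- A database master key must exist before a scoped credential can be
-- created. Uncomment only if the database has none yet:
-- CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- Login used to connect to the remote database (placeholder values).
CREATE DATABASE SCOPED CREDENTIAL RemoteDbCredential
WITH IDENTITY = '<remote login>', SECRET = '<remote password>';

-- Points at the other Azure SQL Database (vertical partitioning).
CREATE EXTERNAL DATA SOURCE RemoteDb
WITH (
    TYPE = RDBMS,
    LOCATION = '<remote-server>.database.windows.net',
    DATABASE_NAME = '<remote database>',
    CREDENTIAL = RemoteDbCredential
);

-- Local definition that mirrors the remote table's columns.
CREATE EXTERNAL TABLE dbo.RemoteOrders (
    OrderId    INT  NOT NULL,
    CustomerId INT  NOT NULL,
    OrderDate  DATE NOT NULL
)
WITH (
    DATA_SOURCE = RemoteDb,
    SCHEMA_NAME = 'dbo',
    OBJECT_NAME = 'Orders'
);

-- Copy the remote rows into a local table with a single query.
INSERT INTO dbo.Orders (OrderId, CustomerId, OrderDate)
SELECT OrderId, CustomerId, OrderDate
FROM dbo.RemoteOrders;
```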

SQL Server Merge with two tables from different azure SQL Database from same Azure SQL SERVER

I need to merge 2 tables from 2 different Azure SQL databases, both of which are on the same Azure SQL server.
For performance reasons, I need to do bulk inserts and/or bulk updates. This will also be a continuous activity: the very first time I have to merge all the data, which is huge, and afterwards, whenever the respective topic receives a message, I need to add or update only that single record.
What are the different options for doing this, for both processes?
Please help. Thanks.
You can use Azure SQL Data Sync to merge those tables, located in 2 different databases, into a third, new database. You just need to create the table with no records, then use Azure SQL Data Sync with one-way sync from those 2 databases (member databases) to the newly created table in the new database (hub database). On the first sync, the data will be merged into the new table in the hub database. Every time a record is updated or deleted, or a new record arrives, in the member databases, that change is replicated to the hub database and to the merged table.
To know more about the free Azure SQL Data Sync please read here.

Move data between two Azure SQL databases without using elastic query

I need suggestions for moving data from a particular table in one Azure SQL database to another Azure SQL database that has the same table structure, without using elastic query.
Using SQL Server Management Studio, connect to the Azure SQL database, right-click the source database, and select Generate Scripts.
In the wizard, after you have selected the tables that you want to output to a query window, click Advanced. About halfway down the properties window there is an option for "Types of data to script". Select it and change it to "Data only", then finish the wizard.
Then check the script, rearrange the inserts to satisfy constraints, and change the USE statement at the top so it runs against the target DB.
Then right-click the target database, select New Query, copy the script into it, and run it.
This will migrate the data.
Please consider using the "Transfer SQL Server Objects task" in SSIS. You can learn about all the advantages it provides in this article.
You can use PowerShell to query each database and move data between them as needed. Here's an example article on how to get this done.
Using PowerShell when working with Azure has a number of other benefits in what you can do and can control as well. It's a good choice to spend time learning.
In the source database I created SPs to select the data from the tables.
In the target database I created table types (which would be available in programmability) for the tables with the same structure as in the source.
I used Azure function to move the data into table type from source.
In the target database I created SPs to insert data into the tables from their respective table types.
After ensuring the transfer of data, I would be deleting those records moved to the target in the source database and for this I created SPs.
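The target-side pieces of this approach could look roughly like the following. The table dbo.Customer, its columns, the type name, and the procedure name are all hypothetical; the calling code (the Azure Function in the answer) would pass the rows read from the source as a table-valued parameter.

```sql
-- Hypothetical table type mirroring the source table's structure.
CREATE TYPE dbo.CustomerTableType AS TABLE (
    CustomerId INT           NOT NULL PRIMARY KEY,
    Name       NVARCHAR(100) NOT NULL
);
GO

-- Hypothetical procedure: receives a batch of rows as a table-valued
-- parameter and inserts those not already present in the target table.
CREATE PROCEDURE dbo.usp_InsertCustomers
    @rows dbo.CustomerTableType READONLY
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.Customer (CustomerId, Name)
    SELECT r.CustomerId, r.Name
    FROM @rows AS r
    WHERE NOT EXISTS (
        SELECT 1
        FROM dbo.Customer AS c
        WHERE c.CustomerId = r.CustomerId
    );
END;
GO
```

Skipping rows that already exist makes a retried batch harmless, which matters when the source rows are deleted only after the transfer is confirmed.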

U-SQL Parallel reading from SQL Table

I have a scenario in which I am ingesting data from a MS SQL DB into Azure Data Lake using U-SQL. My table is quite big, with over 16 million records (soon it will be much more). I just do a SELECT a, b, c FROM dbo.myTable;
I realized, however, that only one vertex is used to read from the table.
My question is, is there any way to leverage parallelism while reading from a SQL table?
I don't believe parallelism for external data sources is supported yet for U-SQL (although happy to be corrected). If you feel this is an important missing feature you can create a request and vote for it here:
https://feedback.azure.com/forums/327234-data-lake
As a workaround, you could manually parallelise your queries, depending on the columns available in your data source, e.g. by date:
// External query working
USE DATABASE yourADLADB;
// Create the external query for year 2016
@results2016 =
SELECT *
FROM EXTERNAL yourSQLDBDataSource EXECUTE
@"SELECT * FROM dbo.yourBigTable WITH (NOLOCK) WHERE yourDateCol BETWEEN '1 Jan 2016' AND '31 Dec 2016'";
// Create the external query for year 2017
@results2017 =
SELECT *
FROM EXTERNAL yourSQLDBDataSource EXECUTE
@"SELECT * FROM dbo.yourBigTable WITH (NOLOCK) WHERE yourDateCol BETWEEN '1 Jan 2017' AND '31 Dec 2017'";
// Output 2016 results
OUTPUT @results2016
TO "/output/bigTable/results2016.csv"
USING Outputters.Csv();
// Output 2017 results
OUTPUT @results2017
TO "/output/bigTable/results2017.csv"
USING Outputters.Csv();
Now, I have created a different issue by breaking up the output into multiple files. However, you could then read these using filesets, which will also parallelise, e.g.:
@input =
EXTRACT
... // your column list
FROM "/output/bigTable/results{year}.csv"
USING Extractors.Csv();
I would ask why you are choosing to move such a large table into your lake, given that ADLA and U-SQL offer you the ability to query data where it lives. Can you explain further?
Queries to external data sources are not automatically parallelized in U-SQL. (This is something we are considering for the future.)
wBob's answer does give one option for achieving somewhat the same effect, though it of course requires you to manually partition and query the data using multiple U-SQL statements.
Please note that doing parallel reads in a non-transacted environment can lead to duplicate or missed data if parallel writes occur at the source. So some care needs to be taken, and users will need to know the tradeoffs.
Another potential solution here would be to create an HDInsight cluster backed by the same ADLS store as your ADLA account.
You can then use Apache Sqoop to copy the data in parallel from SQL server to a directory in ADLS, and then import that data (which will be split across multiple files) to tables using U-SQL.