Comparing Azure table data against Cosmos DB - azure-sql-database

We need to do a weekly sync of data from an Azure SQL table into a Cosmos DB table. The Azure SQL table is the source and has millions of records. Has anyone done this before, and was there a tool used to do it?

I'd suggest using the Copy activity in Azure Data Factory, which is designed for data transfer. You could configure Azure SQL Database as the source and Cosmos DB as the sink. Please see the document: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview#supported-data-stores-and-formats
Since you need a weekly sync, you can create a schedule trigger for the copy activity.

Related

How to insert a table between 2 servers in Azure Data Warehouse?

I want to insert a table from a different factory/server into Azure Data Warehouse. Is it possible to do the insert with a query?
It takes a lot of time if I create a dataset and pipeline for each table in Azure Data Factory.
Judging by the icon in your screenshot, you're using Azure SQL Database, not Azure Data Warehouse.
You could use Elastic Query to run a cross-database query in Azure.
See the tutorial: Get started with cross-database queries (vertical partitioning) (preview)
Elastic database query (preview) for Azure SQL Database allows you to run T-SQL queries that span multiple databases using a single connection point. This article applies to vertically partitioned databases.
When completed, you will have learned how to configure and use an Azure SQL Database to perform queries that span multiple related databases.
Then you can insert data from the external table into a table in the current database.
Note: an Azure SQL database already has a database master key, so you don't need to create it again.
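Here is a minimal sketch of those steps, assuming hypothetical names (server myserver, remote database RemoteDb, remote table dbo.Orders, local table dbo.LocalOrders); adjust the credential and names to your environment:

```sql
-- The master key already exists in Azure SQL Database, so start with the credential.
CREATE DATABASE SCOPED CREDENTIAL RemoteCred
    WITH IDENTITY = 'remote_user', SECRET = '<password>';

-- An external data source of type RDBMS points at the remote Azure SQL database.
CREATE EXTERNAL DATA SOURCE RemoteDbSource
WITH (
    TYPE = RDBMS,
    LOCATION = 'myserver.database.windows.net',
    DATABASE_NAME = 'RemoteDb',
    CREDENTIAL = RemoteCred
);

-- An external table mirroring the remote table's schema.
CREATE EXTERNAL TABLE dbo.RemoteOrders (
    OrderId INT,
    Amount  DECIMAL(10, 2)
)
WITH (DATA_SOURCE = RemoteDbSource, SCHEMA_NAME = 'dbo', OBJECT_NAME = 'Orders');

-- Insert the remote data into a local table with a single query.
INSERT INTO dbo.LocalOrders (OrderId, Amount)
SELECT OrderId, Amount FROM dbo.RemoteOrders;
```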
Hope this helps.

SQL Server MERGE with two tables from different Azure SQL databases on the same Azure SQL server

I need to merge two tables from two different Azure SQL databases, where both databases are on the same Azure SQL server.
For performance reasons, I also need to do a bulk insert and/or bulk update. This will be a continuous activity: the very first time, I have to merge all the data, which is huge; after that, whenever the respective topic receives a message, I need to add/update that single record only.
What are the different options for both processes?
Please help. Thanks.
You can use Azure SQL Data Sync to merge those tables, located on 2 different databases, into a third, new database. You just need to create the table with no records, then configure Azure SQL Data Sync with a one-way sync from those 2 databases (the member databases) to the newly created table on the new database (the hub database). On the first sync, the data will be merged into the new table on the hub database. Every time a record is updated or deleted, or a new record arrives, on a member database, that change is replicated to the hub database and the merged table.
To learn more about the free Azure SQL Data Sync service, read here.
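On the hub side, the pre-created empty table might look like this minimal sketch (dbo.MergedOrders and its columns are hypothetical; note that Data Sync requires a primary key on every synced table):

```sql
-- Hypothetical empty hub table, created before configuring Azure SQL Data Sync.
-- Data Sync requires a primary key on every table it syncs.
CREATE TABLE dbo.MergedOrders (
    OrderId      INT            NOT NULL PRIMARY KEY,
    CustomerName NVARCHAR(100)  NULL,
    Amount       DECIMAL(10, 2) NULL
);
```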

Update changes in Azure SQL Data Warehouse using PolyBase

I need help with Azure SQL Data Warehouse. I'm using PolyBase to ELT data from Azure Data Lake Storage Gen2 into Azure SQL DW. When we load data into the DW for the first time, there are no issues, but how do we upsert data on subsequent/incremental loads?
The flow we are using:
ADLS Gen2 -> (PolyBase) -> External table -> (CTAS) -> Staging tables -> (transformation) -> Dimension tables
Every time the data changes, we reload it into ADLS Gen2.
What is the best way to upsert the data, or should we reload it all into SQL DW as well?
Because MERGE is not supported in Azure SQL Data Warehouse, you need another way to move data from the external tables into your staging tables. PolyBase can load both initial and incremental data into the external table schema; the difference lies in how you perform the load from there into the staging tables.
The following is a great tutorial on how to deploy this solution: Using PolyBase to Update Tables in Data Warehouse from ADLS
Once the data is loaded into the external tables via PolyBase in an ADFv2 pipeline, a trigger executes a stored procedure in the data warehouse to perform the load into the staging tables.
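As a rough sketch of such a load, assuming hypothetical tables ext.Sales (external), stg.Sales (staging), and dbo.DimSales (target) keyed on SalesId:

```sql
-- Land the incremental load in a staging table via CTAS.
CREATE TABLE stg.Sales
WITH (DISTRIBUTION = HASH(SalesId))
AS SELECT * FROM ext.Sales;

-- Emulate MERGE: first update the rows that already exist in the target...
UPDATE dbo.DimSales
SET    Amount = s.Amount
FROM   stg.Sales AS s
WHERE  dbo.DimSales.SalesId = s.SalesId;

-- ...then insert the rows that don't exist yet.
INSERT INTO dbo.DimSales (SalesId, Amount)
SELECT s.SalesId, s.Amount
FROM   stg.Sales AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.DimSales AS d WHERE d.SalesId = s.SalesId);
```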

Azure SQL external table over Azure Table storage data

Is it possible to create an external table in Azure SQL of the data residing in Azure Table storage?
The answer is no.
I am currently facing a similar issue, and this is my research so far:
Azure SQL Database doesn't allow Azure Table storage as an external data source.
Sources:
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017
Reason:
The possible data source scenarios are copying from Hadoop (Data Lake/Hive, ...), Blob storage (text files, CSV), or an RDBMS (another SQL server). Azure Table storage is not listed.
The possible external data formats are only variations of text files/Hadoop: delimited text, Hive RCFile, Hive ORC, Parquet.
Note: even copying from blob storage in JSON format requires implementing a custom data format.
Workaround:
Create a copy pipeline with Azure Data Factory.
Create a copy function/script with Azure Functions using C# and manually transfer the data.
Yes, there are a couple of options. Please see the following:
CREATE EXTERNAL TABLE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016) Azure SQL Database Azure SQL Data Warehouse Parallel Data Warehouse
Creates an external table for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external table created for PolyBase cannot be used for Elastic Database queries. Similarly, an external table created for Elastic Database queries cannot be used for PolyBase, etc.
CREATE EXTERNAL DATA SOURCE (Transact-SQL)
APPLIES TO: SQL Server (starting with 2016) Azure SQL Database Azure SQL Data Warehouse Parallel Data Warehouse
Creates an external data source for PolyBase, or Elastic Database queries. Depending on the scenario, the syntax differs significantly. An external data source created for PolyBase cannot be used for Elastic Database queries. Similarly, an external data source created for Elastic Database queries cannot be used for PolyBase, etc.
What is your use case?

How to ensure faster response times using Transact-SQL in Azure SQL DW when combining SQL data with non-relational data in Azure Blob storage?

What should I do to ensure optimal query performance when using Transact-SQL in Azure SQL Data Warehouse to combine data sets from SQL with non-relational data in Azure Blob storage? Any input would be greatly appreciated.
The best practice is to load data from Azure Blob Storage into SQL Data Warehouse instead of attempting interactive queries over that data.
The reason is that when you run a query against data residing in Azure Blob storage (via an external table), SQL Data Warehouse imports, under the covers, all the data from Azure Blob storage into SQL Data Warehouse temporary tables to process the query. So even if you run a SELECT TOP 1 query on your external table, the entire dataset for that table will be imported temporarily to process the query.
As a result, if you know that you will be querying the external data frequently, it is recommended that you explicitly and permanently load the data into SQL Data Warehouse using a CREATE TABLE AS SELECT command, as shown in the document: https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-load-with-polybase/.
As a best practice, break your Azure Storage data into files of no more than 1 GB each when possible, for parallel processing with SQL Data Warehouse. More information about how to configure PolyBase in SQL Data Warehouse to load data from Azure Blob storage is here: https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-load-with-polybase/
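A minimal sketch of that permanent load, with hypothetical names (BlobStorage data source, CsvFormat file format, ext.Events external table) standing in for your own:

```sql
-- External table over the blob data; the data source and file format
-- (BlobStorage, CsvFormat) are assumed to be defined beforehand.
CREATE EXTERNAL TABLE ext.Events (
    EventId   INT,
    EventTime DATETIME2,
    Payload   NVARCHAR(4000)
)
WITH (LOCATION = '/events/', DATA_SOURCE = BlobStorage, FILE_FORMAT = CsvFormat);

-- Load the data into SQL Data Warehouse once, instead of re-importing it on every query.
CREATE TABLE dbo.Events
WITH (DISTRIBUTION = HASH(EventId), CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM ext.Events;
```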
Let me know if that helps!