I've done a fair bit of reading, and it seems there are a couple of off-the-shelf products that replicate/sync data from an on-premises database to Azure SQL Data Warehouse, but I've found nothing that syncs using an Azure database as the source. Azure Data Factory holds some promise, but it looks more suited to one-off loads.
Does anyone know of a way? (An SSIS package isn't really an option, as I want the transfer to occur wholly inside the cloud.)
Azure Data Factory can run continuous loads from SQL Database to SQL Data Warehouse. You'll want to look into the frequency and interval parameters for the pipeline.
The documentation is here: https://azure.microsoft.com/en-us/documentation/articles/data-factory-create-datasets/
Related
I have N databases, for example 10 databases.
Every database has the same schema but different data.
Now I would like to take the data from table "Table1" in every database and insert it into a common table, Table1Common, in a new database, "DWHDatabase".
So it's an N-to-1 insert.
How can I do that? I'm trying to solve this with elastic queries, but they seem to be a 1-to-1 thing.
Use Azure Data Factory with Linked Services to each database. Use the Copy activity to load the data.
You can also parameterize the solution.
Parameterize linked services
Parameters in Azure Data Factory by Cathrine Wilhelmsen
Elastic query is best suited for reporting scenarios in which the majority of the processing (filtering, aggregation) can be done on the external source side. It is unsuitable for ETL procedures that transfer significant amounts of data from a remote database (or databases). Consider Azure Synapse Analytics for large reporting workloads or data warehousing applications with more sophisticated queries.
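For completeness, the N-to-1 pattern from the question is still expressible with elastic query: you define one external table per source database and insert them all into the common table. A minimal sketch, assuming hypothetical credential, server, and table names:

```sql
-- Run in DWHDatabase. One external data source + external table per source DB.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

CREATE DATABASE SCOPED CREDENTIAL SourceCred
WITH IDENTITY = 'sqluser', SECRET = '<password>';

CREATE EXTERNAL DATA SOURCE SourceDb1 WITH (
    TYPE = RDBMS,
    LOCATION = 'myserver.database.windows.net',
    DATABASE_NAME = 'Db1',
    CREDENTIAL = SourceCred
);

-- The column list must match the remote table's schema.
CREATE EXTERNAL TABLE dbo.Table1_Db1 (
    Id INT,
    Payload NVARCHAR(100)
) WITH (DATA_SOURCE = SourceDb1);

-- Repeat for Db2..DbN, then consolidate into the common table.
INSERT INTO dbo.Table1Common (SourceDb, Id, Payload)
SELECT 'Db1', Id, Payload FROM dbo.Table1_Db1;
```

This works, but per the note above, pulling whole tables through elastic query does not scale well, which is why the ADF Copy activity is the better fit here.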
You may use the Copy activity to copy data across on-premises and cloud-based data stores. After you've copied the data, you may use other activities to transform and analyse it. The Copy activity may also be used to publish transformation and analysis results for business intelligence (BI) and application consumption.
MSFT Copy Activity Overview: Here.
Due to reasons (I've been told it's a networking issue with MIs; regardless, we can't fix it, we're waiting on a solution from MS that may or may not come out this year), we cannot talk from on-prem to managed instances. However, we can reach Azure SQL Databases.
We would like to replicate lookup data from on-prem to Azure Managed Instances (MIs) as well as ASDs. Is there any way to use the ASD as a "jump" box for replication, maybe by putting the Distributor on an MI that can talk to the ASD?
Looked at Azure Data Sync, but the 5-minute minimum sync interval makes it a no-go.
Otherwise, our current fallback is to run an Azure VM/AKS instance, replicate to it, then from there to the ASDs/MIs. But man, I'd rather not have to do that.
Any suggestions appreciated.
One-way transactional replication, or SQL Data Sync for Azure.
If you want to keep replication running after the migration to Managed Instances, transactional replication is the best option at this time. See: Replication to Azure SQL Database.
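As a rough illustration of the one-way setup: the Azure SQL Database can only be a push subscriber, so the subscription is created from the publisher side. A sketch with hypothetical names, assuming the publication and distributor already exist:

```sql
-- Run at the publisher (the instance that can reach the ASD).
-- Azure SQL Database only supports push subscriptions.
EXEC sp_addsubscription
    @publication = N'LookupDataPub',
    @subscriber = N'myserver.database.windows.net',  -- logical server of the ASD
    @destination_db = N'LookupDb',
    @subscription_type = N'Push';

-- Create the agent job that pushes changes to the subscriber.
-- SQL authentication is required for an Azure SQL Database subscriber.
EXEC sp_addpushsubscription_agent
    @publication = N'LookupDataPub',
    @subscriber = N'myserver.database.windows.net',
    @subscriber_db = N'LookupDb',
    @subscriber_security_mode = 0,
    @subscriber_login = N'sqladmin',
    @subscriber_password = N'<password>';
```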
Or use ETL via Azure Data Factory.
Transfer data from a SQL Server database to an Azure SQL Database using Azure Blob Storage and Azure Data Factory (ADF): this is a supported legacy technique that benefits from using a staged copy.
The ADF pipeline consists of two copy activities that work together to transfer data between the SQL Server database and the Azure SQL Database on a regular basis. The two activities are as follows:
Copy the data from the SQL Server database to an Azure Blob Storage account.
Copy the data from the Blob Storage account to the Azure SQL Database.
I have a need to load data into Azure Hyperscale incrementally.
The source data is in an Azure VM that has SQL Server installed on it.
The source database is about 6 TB in size and has about 370 tables.
We need a way to get the incremental changes from the last X hours and sync them into the same database in Hyperscale.
Ideally, we would extend our database with an availability group setup, but since Hyperscale does not support that, we need to find another way to keep these in sync.
The source database does have change data capture (CDC) enabled.
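For reference, pulling the last X hours of changes from CDC looks roughly like this per table (hypothetical capture instance dbo_Table1, assuming net changes were enabled when CDC was set up):

```sql
-- Map "last 4 hours" to an LSN range, then read the net changes for that range.
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10);

SET @from_lsn = sys.fn_cdc_map_time_to_lsn(
    'smallest greater than or equal', DATEADD(HOUR, -4, GETDATE()));
SET @to_lsn = sys.fn_cdc_map_time_to_lsn(
    'largest less than or equal', GETDATE());

-- One row per changed key; __$operation indicates insert/update/delete.
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_Table1(@from_lsn, @to_lsn, 'all');
```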
The best online migration option is to use the Azure Database Migration Service (link), where the Online (continuous sync) migration scenario (link) you need is supported:
The sync essentially runs in the background until completed, while the data that has already been migrated remains accessible. Note that this is a continuous-copy scenario, not an incremental one. With PaaS database services, you do not have access to perform snapshot replication operations from external data sources. The Hyperscale instance is built on snapshot replication, but it currently only serves the hosted database functionality.
Regards,
Mike
I need an expert opinion on a project I am working on. We currently get data files that we load into our Azure SQL database using a local script that calls stored procedures. I am planning on replacing the script with SSIS jobs to load the data into Azure SQL, but I'm wondering if that's a good option given our needs. I am open to different suggestions too.

The process we go through is to load the data files into staging tables and validate them before making updates to the live tables. The validation and updates are done by calling stored procedures, so the SSIS package would just load the data and make calls to those stored procedures.

I have looked at ADF IR and Databricks, but they seem like overkill; I'm open to hearing from people with experience using those as well. I am currently running the SSIS package locally as well. Any suggestions on a better architecture or tools for this scenario? Thanks!
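For reference, the validate-then-update step described above is essentially a stored procedure over the staging table; a rough sketch with made-up table, column, and procedure names:

```sql
-- Validate staged rows, then upsert the good ones into the live table.
CREATE OR ALTER PROCEDURE dbo.usp_ApplyStagedTable1
AS
BEGIN
    SET NOCOUNT ON;

    -- Example validation: reject rows with missing keys.
    DELETE FROM dbo.Staging_Table1 WHERE Id IS NULL;

    -- Apply the remaining rows to the live table.
    MERGE dbo.Table1 AS target
    USING dbo.Staging_Table1 AS source
        ON target.Id = source.Id
    WHEN MATCHED THEN
        UPDATE SET target.Payload = source.Payload
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Payload) VALUES (source.Id, source.Payload);

    TRUNCATE TABLE dbo.Staging_Table1;
END;
```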
I would definitely have a look at Azure Data Factory Data Flows. With these you can easily build your ETL pipelines in the Azure Data Factory GUI.
In the following example, two text files from Blob Storage are read and joined, a surrogate key is added, and finally the data is loaded into Azure Synapse Analytics (it would be the same for Azure SQL).
You then put this Mapping Data Flow into a pipeline and can trigger it, e.g. when new data arrives.
You can just BULK INSERT data from Azure Blob Storage:
https://learn.microsoft.com/en-us/sql/relational-databases/import-export/examples-of-bulk-access-to-data-in-azure-blob-storage?view=sql-server-ver15#accessing-data-in-a-csv-file-referencing-an-azure-blob-storage-location
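A minimal sketch of that approach, assuming a CSV file in a container and made-up account/credential names (the SAS token is elided):

```sql
-- One-time setup: credential + external data source pointing at the container.
CREATE DATABASE SCOPED CREDENTIAL BlobCred
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token, without the leading "?">';

CREATE EXTERNAL DATA SOURCE MyBlobStore
WITH (TYPE = BLOB_STORAGE,
      LOCATION = 'https://myaccount.blob.core.windows.net/mycontainer',
      CREDENTIAL = BlobCred);

-- Per load: pull the file straight into a staging table.
BULK INSERT dbo.Staging_Table1
FROM 'incoming/table1.csv'
WITH (DATA_SOURCE = 'MyBlobStore', FORMAT = 'CSV', FIRSTROW = 2);
```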
Then you can use ADF (no self-hosted IR needed), Databricks, Azure Batch, or Azure Elastic Jobs to schedule the execution.
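For the Elastic Jobs route, scheduling the load is a couple of calls in the job database; a sketch with hypothetical names, assuming the agent, credentials, and target group already exist:

```sql
-- Define a daily job and attach the BULK INSERT as its single step.
EXEC jobs.sp_add_job
    @job_name = N'DailyBulkLoad',
    @enabled = 1,
    @schedule_interval_type = N'Days',
    @schedule_interval_count = 1;

EXEC jobs.sp_add_jobstep
    @job_name = N'DailyBulkLoad',
    @step_name = N'LoadFromBlob',
    @command = N'BULK INSERT dbo.Staging_Table1 FROM ''incoming/table1.csv''
                 WITH (DATA_SOURCE = ''MyBlobStore'', FORMAT = ''CSV'', FIRSTROW = 2);',
    @credential_name = N'JobRunCred',
    @target_group_name = N'LoadTargets';
```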
We want to use Azure servers to run our Power Apps applications; however, we have local SQL Servers containing our data warehouse. We want only certain tables to be on Azure, and we want to create data feeds between the two, with information going from one to the other.
Does anyone have any insight into how I can achieve this?
I have googled, but there doesn't appear to be a wealth of information on this topic.
It depends on how soon after a change in your source (the on-premises SQL Server) you need that change reflected in your sink (Azure SQL).
If you have a few minutes of leeway, or only need to update once a day, I would suggest a basic Data Factory pipeline (search for "data factory upsert"). How exactly you achieve this depends on your data.
If you need it faster, or it is impossible to extract an incremental update from your source, you will need to either use triggers to write the changes from one database to the other (see the sketch below), or get a tool that does change data capture for you.
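The trigger approach amounts to capturing every change into a change table that a transfer job later drains and applies to Azure SQL. A rough sketch with made-up table names:

```sql
-- Change log that the transfer job periodically reads and forwards to Azure SQL.
CREATE TABLE dbo.Table1_Changes (
    Id         INT       NOT NULL,
    ChangeType CHAR(1)   NOT NULL,  -- 'U' = upsert, 'D' = delete
    ChangedAt  DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

CREATE TRIGGER dbo.trg_Table1_Capture
ON dbo.Table1
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Inserted and updated rows.
    INSERT INTO dbo.Table1_Changes (Id, ChangeType)
    SELECT Id, 'U' FROM inserted;

    -- Deleted rows (present in deleted but not in inserted).
    INSERT INTO dbo.Table1_Changes (Id, ChangeType)
    SELECT d.Id, 'D'
    FROM deleted d
    WHERE NOT EXISTS (SELECT 1 FROM inserted i WHERE i.Id = d.Id);
END;
```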
It sounds like you just want to sync the data in some tables between a local SQL Server and an Azure SQL database.
You can use the Azure SQL Data Sync.
Summary:
SQL Data Sync is a service built on Azure SQL Database that lets you synchronize the data you select bi-directionally across multiple SQL databases and SQL Server instances.
With Data Sync, you can keep data synchronized between your on-premises databases and Azure SQL databases to enable hybrid applications.
A Sync Group has the following properties:
The Sync Schema describes which data is being synchronized.
The Sync Direction can be bi-directional or can flow in only one direction. That is, the Sync Direction can be Hub to Member, or Member to Hub, or both.
The Sync Interval describes how often synchronization occurs.
The Conflict Resolution Policy is a group-level policy, which can be Hub wins or Member wins.
Next, you need to learn how to configure Data Sync. Please reference this Azure document: Tutorial: Set up SQL Data Sync between Azure SQL Database and SQL Server on-premises.
In this tutorial, you learn how to set up Azure SQL Data Sync by creating a sync group that contains both Azure SQL Database and SQL Server instances. The sync group is custom configured and synchronizes on the schedule you set.
Hope this helps.
The most robust solution here is Transactional Replication. You can also use SSIS or Azure Data Factory for copying tables to/from Azure SQL Database. And Azure SQL Data Sync also exists.