Building a data warehouse with multiple data sources using SQL Change Tracking

I would like to set up a data warehouse that can be used by our company's Qlik application. The applications I would like to retrieve data from are mostly running on-premises, and all have a SQL Server database as a source. Applications that don't have an accessible SQL Server as a source can be accessed via a REST API and/or web service.
This is what the setup looks like:
The data warehouse is a SQL Server (Standard edition, not Express).
The SQL data sources are running on 3 different SQL Servers (2 are Standard, 1 is Express).
The other sources are accessible via a web service and/or REST API (SaaS application).
The SQL Servers are all on-premises and located within our network. The SaaS application is running in a data center (cloud).
Preferably, I would like the data to be as live as possible and the load on the servers as small as possible. To do so, I was wondering if there are ETL tools that work with SQL Change Tracking to keep track of changes at table level (so that changes are PUSH-based rather than PULL-based). If so, I can let them sync sequentially and set per table whether it is synced based on Change Tracking or only fully synced once a day. As soon as I have the data in my data warehouse, I can create data transformations in the ETL tool or with T-SQL; that doesn't matter.
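For what it's worth, SQL Server Change Tracking is not really push-based. The consumer still asks the source for everything that changed since the version it last synced (in T-SQL via CHANGE_TRACKING_CURRENT_VERSION() and CHANGETABLE(CHANGES ...)). So it is an incremental pull, in which only changed rows are transferred.

The sketch below simulates that version-watermark loop in plain Python with in-memory dicts. It also shows the per-table choice between change-tracked and full sync. TrackedTable, sync_table and the "change_tracking"/"full" mode names are hypothetical. A real job would run the equivalent queries against SQL Server and persist the returned version per table.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedTable:
    """Hypothetical in-memory stand-in for a source table with change tracking."""
    rows: dict = field(default_factory=dict)     # primary key -> row value
    changes: list = field(default_factory=list)  # (version, operation, key)
    version: int = 0                             # loosely like CHANGE_TRACKING_CURRENT_VERSION()

    def write(self, key, value):
        self.version += 1
        op = "U" if key in self.rows else "I"
        self.rows[key] = value
        self.changes.append((self.version, op, key))

    def delete(self, key):
        self.version += 1
        del self.rows[key]
        self.changes.append((self.version, "D", key))

    def changes_since(self, last_version):
        """Latest operation per key after last_version (loosely like CHANGETABLE)."""
        latest = {}
        for version, op, key in self.changes:
            if version > last_version:
                latest[key] = op
        return latest


def sync_table(source, target, last_version, mode):
    """Bring target up to date; return the version to store for the next run."""
    # Read the current version first, as a real job would, so that changes
    # made while this run is in progress are picked up next time.
    current = source.version
    if mode == "full":
        target.clear()
        target.update(source.rows)
    else:  # "change_tracking": only touch keys changed since last_version
        for key, op in source.changes_since(last_version).items():
            if op == "D":
                target.pop(key, None)
            else:
                target[key] = source.rows[key]
    return current


orders = TrackedTable()
orders.write(1, "new")
orders.write(2, "new")
warehouse = {}
v = sync_table(orders, warehouse, 0, "change_tracking")
orders.write(1, "shipped")
orders.delete(2)
v = sync_table(orders, warehouse, v, "change_tracking")
print(warehouse, v)  # {1: 'shipped'} 4
```

Tables that are cheap to reload, or that have no tracking enabled, can simply be given mode "full" and scheduled daily.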
Hopefully there are some people around here who can tell me which ETL tool to use. There is a lot of information on the internet, but not much of it goes into the subject of SQL Change Tracking.
Many thanks in advance!

Related

Creating Feeds between local SQL servers and Azure SQL servers?

We want to use Azure servers to run our Power Apps applications. However, we have local SQL Servers that contain our data warehouse. We want only certain tables to be on Azure, and we want to create data feeds between the two, with information going from one to the other.
Does anyone have any insight into how I can achieve this?
I have googled but there doesn't appear to be a wealth of information on this topic.
It depends on how quickly a change in your source (the on-premises SQL Server) needs to be reflected in your sink (Azure SQL).
If a delay of a few minutes is acceptable, or you only need to update it once a day, I would suggest a basic Data Factory pipeline (search for "data factory upsert"). How you achieve this depends on your data.
If you need it faster, or it is impossible to extract an incremental update from your source, you would need to either use triggers to write the changes from one database to the other, or get a program that does change data capture.
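The trigger-based option can be sketched as follows. This uses Python's built-in sqlite3 module only so that the example runs anywhere. In SQL Server, the triggers would be written in T-SQL using the inserted/deleted pseudo-tables, and a separate job would forward the log rows to Azure SQL. The table names (customers, change_log) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Append-only log that a sync job reads and forwards to the sink.
CREATE TABLE change_log (
    seq  INTEGER PRIMARY KEY AUTOINCREMENT,
    op   TEXT NOT NULL,    -- 'I', 'U' or 'D'
    id   INTEGER NOT NULL,
    name TEXT              -- NULL for deletes
);

CREATE TRIGGER customers_ins AFTER INSERT ON customers
BEGIN
    INSERT INTO change_log (op, id, name) VALUES ('I', NEW.id, NEW.name);
END;

CREATE TRIGGER customers_upd AFTER UPDATE ON customers
BEGIN
    INSERT INTO change_log (op, id, name) VALUES ('U', NEW.id, NEW.name);
END;

CREATE TRIGGER customers_del AFTER DELETE ON customers
BEGIN
    INSERT INTO change_log (op, id, name) VALUES ('D', OLD.id, NULL);
END;
""")

with conn:  # the three writes are committed together
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Contoso')")
    conn.execute("UPDATE customers SET name = 'Contoso Ltd' WHERE id = 1")
    conn.execute("DELETE FROM customers WHERE id = 1")


def read_changes(conn, after_seq):
    """Return log entries newer than after_seq, oldest first."""
    return conn.execute(
        "SELECT seq, op, id, name FROM change_log WHERE seq > ? ORDER BY seq",
        (after_seq,),
    ).fetchall()


print(read_changes(conn, 0))
# [(1, 'I', 1, 'Contoso'), (2, 'U', 1, 'Contoso Ltd'), (3, 'D', 1, None)]
```

The sync job would remember the last seq it forwarded, much like a Change Tracking version, and only read newer entries on each run.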
It looks like you just want to sync the data in some table between local SQL Server and Azure SQL database.
You can use the Azure SQL Data Sync.
Summary:
SQL Data Sync is a service built on Azure SQL Database that lets you synchronize the data you select bi-directionally across multiple SQL databases and SQL Server instances.
With Data Sync, you can keep data synchronized between your on-premises databases and Azure SQL databases to enable hybrid applications.
A Sync Group has the following properties:
The Sync Schema describes which data is being synchronized.
The Sync Direction can be bi-directional or can flow in only one direction. That is, the Sync Direction can be Hub to Member, or Member to Hub, or both.
The Sync Interval describes how often synchronization occurs.
The Conflict Resolution Policy is a group level policy, which can be Hub wins or Member wins.
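To make the Conflict Resolution Policy concrete: when the same row has changed on both the hub and a member since the last sync, the policy only decides whose version is kept. The function below is a hypothetical illustration of that rule, not Data Sync's actual implementation. Rows are plain dicts keyed by primary key, and deletes and sync direction are ignored for brevity.

```python
from enum import Enum


class Policy(Enum):
    HUB_WINS = "hub"
    MEMBER_WINS = "member"


def merge(hub_changes, member_changes, policy):
    """Combine changed rows from both sides; on a key conflict the policy picks the winner.

    Each argument maps primary key -> row. Returns the rows to apply on both sides.
    """
    merged = dict(hub_changes)
    for key, row in member_changes.items():
        # Non-conflicting member changes always go through; conflicting ones
        # replace the hub's version only under MEMBER_WINS.
        if key not in merged or policy is Policy.MEMBER_WINS:
            merged[key] = row
    return merged


hub = {1: {"qty": 5}}
member = {1: {"qty": 7}, 2: {"qty": 1}}
print(merge(hub, member, Policy.HUB_WINS))     # {1: {'qty': 5}, 2: {'qty': 1}}
print(merge(hub, member, Policy.MEMBER_WINS))  # {1: {'qty': 7}, 2: {'qty': 1}}
```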
Next, you need to learn how to configure Data Sync. Please refer to this Azure document: Tutorial: Set up SQL Data Sync between Azure SQL Database and SQL Server on-premises.
In this tutorial, you learn how to set up Azure SQL Data Sync by creating a sync group that contains both Azure SQL Database and SQL Server instances. The sync group is custom configured and synchronizes on the schedule you set.
Hope this helps.
The most robust solution here is Transactional Replication. You can also use SSIS or Azure Data Factory for copying tables to/from Azure SQL Database. And Azure SQL Data Sync also exists.

Best Approach for syncing Azure SQL Database

Right now, our application only has one Web Site instance, along with a SQL Database deployed at the Azure US datacenter. We are looking into deploying more Web Site instances at other datacenters, such as APAC and Europe. There will still be a local SQL Database for each of those web site instances. We would like end users to be able to fail over to another instance if their registered instance is not available. For example, if the US web site instance is down, we could fail users over to the Europe instance. For this, we would need to synchronize the local SQL Databases at all data centers: US, Europe and APAC.
So we are looking for the best approach to implement database synchronization for Azure SQL Database. Here is what we have found so far:
Azure Data Sync: it looks like the perfect choice, since it is available right away in the Azure Management Portal and can be up and running with some simple configuration. However, there seem to be a couple of catches. The feature has been in preview for about 2 years now (see this link, with the following quote from a comment):
SQL Data Sync has been in preview for over 2 years and the last update was December 2012. Has this been abandoned? Is this a technology we should encourage our clients to use? There absolutely needs to be an ability to synchronize data between a local SQL DB and Azure but Microsoft seems to have dropped this and I'm leery of putting a client on this only to find that the plug has been pulled. You owe it to your users to give us some information
I also saw the post "Azure data sync not syncing all databases" on SO. It seems that this feature is a second-class feature in Azure and MS doesn't really pay sufficient attention to it, so I am worried about how good it is.
Microsoft Sync Framework: it seems to be a more generic sync framework, more suitable for client-server sync than for sync among server databases. Also, it is not as simple as SQL Data Sync above, which is available just by configuration in Azure.
Any other suggestions on SQL Database sync in Azure? It would be really appreciated if you could share your experience here.
Thanks very much in advance for your insight.
Update:
Azure Data Sync is built on the Microsoft Sync Framework (see link). Quote:
Microsoft SQL Data Sync is a cloud-based data synchronization service built on the Microsoft Sync Framework technologies.
Since no one is answering this question, I am going to answer it myself. Based on the latest information I could find, Azure Data Sync is buggy and cannot be used for production at this point. I guess that's the reason why it has never moved out of preview, even after around 2 years. There is no other good approach for handling Azure SQL Database sync at this point unless you want to build something yourself.
You can use Redgate SQL Data Compare to sync your Azure SQL DB with your local DB.

What is the best way to achieve data sync between SQL Azure and Multiple On-Premises SQL server databases?

I have a scenario as explained below and I need to implement the best Data Sync method.
I have a centralized SQL Azure database (master Database)
There are about 20 on-premises SQL Server databases (this number will increase in future). These databases are not necessarily always connected to the internet.
The master and all on-premises DBs will have the same schema/table structures.
I would like to do bidirectional data sync between all on-premises databases and SQL Azure.
Data sync frequency will be once a day.
Each on-premises DB is of reasonable size (not too big and not too small).
These are the options I have explored:
SQL Azure Data Sync
Microsoft Sync Framework
SQL Server 2008 Change Data Capture
SQL Server Change Tracking
I would like to know the best possible method to achieve this.
I have been working with SQL Azure Data Sync, Microsoft Sync Framework and SQL Server Change Tracking. I have no experience with Change Data Capture.
SQL Azure Data Sync
This is the easiest way to implement data sync; it is a matter of configuration. Unfortunately, it is still in preview and Microsoft does not recommend it for production yet. We have been using it to sync 20 databases spread across different geographical locations, and so far it works well. No coding is required. You may have to pay for this service in future; at the moment it is free.
Microsoft Sync Framework
Microsoft Sync Framework is for developers. Developers can use Sync Framework as an API to develop a sync application, and SQL Azure Data Sync uses Sync Framework internally. To implement data sync with Azure, you need to implement an N-tier architecture with WCF and host your WCF service in an Azure web site or virtual machine. Considerable development time is required; see the following link for a sample implementation from Microsoft. Once it is developed, you can easily configure it and use it to sync multiple databases.
Database Sync:SQL Server and SQL Express N-Tier with WCF
SQL Server Change Tracking
You need to program the data sync manually for each table, and you need a linked server set up between each SQL Server. To set up a linked server to an Azure database, you need to open some specific ports.
Items #3 and #4 in your list are not really synchronization solutions, just part of one. Both SQL CDC and SQL CT simply allow you to track the changes. You have to write extra code to grab those changes and apply/sync them to another database.
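As a rough illustration of that extra code: once you have read a batch of changes, something still has to apply them to the other database in order. For Change Tracking, the batch would come from CHANGETABLE(CHANGES ...). For CDC, it would come from the cdc.fn_cdc_get_all_changes_<capture_instance> function.

The sketch uses sqlite3 as a stand-in target. The (operation, id, name) tuple format and the products table are assumptions for illustration.

```python
import sqlite3


def apply_changes(target, changes):
    """Apply (op, id, name) tuples to the products table in one transaction.

    op is 'I' (insert), 'U' (update) or 'D' (delete). Inserts and updates are
    both written as upserts, so replaying the same batch gives the same result.
    """
    with target:  # commit on success, roll back the whole batch on error
        for op, key, name in changes:
            if op == "D":
                target.execute("DELETE FROM products WHERE id = ?", (key,))
            elif op in ("I", "U"):
                target.execute(
                    "INSERT OR REPLACE INTO products (id, name) VALUES (?, ?)",
                    (key, name),
                )
            else:
                raise ValueError(f"unknown operation {op!r}")


target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
batch = [("I", 1, "bolt"), ("I", 2, "nut"), ("U", 1, "hex bolt"), ("D", 2, None)]
apply_changes(target, batch)
print(target.execute("SELECT id, name FROM products ORDER BY id").fetchall())
# [(1, 'hex bolt')]
```

Making the apply step idempotent matters in practice: if a sync run fails after applying but before recording its new version, the next run can safely replay the batch.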
SQL Data Sync service will be your best option if you don't want to write code. Note that as of today (despite being in preview for so long), Data Sync is still in preview.
If you're fine with writing code, Sync Fx is a good option as well (SQL Data Sync internally uses Sync Framework).
Azure SQL Data Sync has now reached general availability (GA) as shown on the following Microsoft Article.
Announcing the general availability of Azure SQL Data Sync

Pulling data across multiple servers

The company I am working for is implementing SharePoint with reporting servers that run on a SQL back end. The information that we need lives on two different servers. The first is the manufacturing server, which collects data from PLCs and inputs that information into a SQL database. The other is our ERP server, which has data for payroll and hours worked on specific projects. The idea I have is to create a view on a separate database, and from there pull the information from both servers. I am having a little trouble with the syntax for connecting the two servers to run the view. We are running MS SQL Server. If you need any more information or clarification, please let me know.
Please read this about Linked Servers.
Alternatively, you can build a data warehouse, which would be a reporting database. You can feed it either with stored procedures that use linked servers, or with SSIS packages if the servers are not linked.
It all depends on a project size and complexity, but in many cases it is difficult to aggregate data from multiple sources with Views. The reason is that the source data structure is modeled for the source application and not optimized for reporting.
In that case, I would suggest going with an ETL process, where you would create a set of Extract, Transform and Load jobs to get data from multiple sources (databases) into a target database where data will be stored in the format optimized for reporting.
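Here is a minimal sketch of such an ETL job. It uses in-memory sqlite3 databases as stand-ins for the manufacturing server, the ERP server and the reporting database. The table and column names (machine_output, hours_worked, project_report) are hypothetical. With SQL Server, this would typically live in SSIS packages or stored procedures, but the extract / transform / load split is the same.

```python
import sqlite3
from collections import defaultdict

# Stand-ins for the two source systems (hypothetical schemas).
mfg = sqlite3.connect(":memory:")
mfg.execute("CREATE TABLE machine_output (project TEXT, units INTEGER)")
mfg.executemany("INSERT INTO machine_output VALUES (?, ?)",
                [("P1", 100), ("P1", 50), ("P2", 30)])

erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE hours_worked (project TEXT, employee TEXT, hours REAL)")
erp.executemany("INSERT INTO hours_worked VALUES (?, ?, ?)",
                [("P1", "ann", 10.0), ("P1", "bob", 5.0), ("P2", "ann", 6.0)])

# Extract: pull raw rows from each source.
units = mfg.execute("SELECT project, units FROM machine_output").fetchall()
hours = erp.execute("SELECT project, hours FROM hours_worked").fetchall()

# Transform: aggregate per project and derive a reporting metric.
totals = defaultdict(lambda: [0, 0.0])  # project -> [units, hours]
for project, n in units:
    totals[project][0] += n
for project, h in hours:
    totals[project][1] += h
report_rows = [
    (project, u, h, u / h if h else None)
    for project, (u, h) in sorted(totals.items())
]

# Load: write a reporting-friendly table into the target database.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE project_report "
           "(project TEXT PRIMARY KEY, units INTEGER, hours REAL, units_per_hour REAL)")
with dw:
    dw.executemany("INSERT INTO project_report VALUES (?, ?, ?, ?)", report_rows)

print(dw.execute("SELECT * FROM project_report ORDER BY project").fetchall())
# [('P1', 150, 15.0, 10.0), ('P2', 30, 6.0, 5.0)]
```

The report table is shaped for the question being asked (output per hour worked per project), rather than mirroring either source schema.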
Ralph Kimball has many great books on the subject, for example:
1) The Data Warehouse ETL Toolkit
2) The Data Warehouse Toolkit
They are truly worth reading if you are dealing with data.

Is it possible to run SQL Express within a Azure Web Role?

I am working on a project which uses a relational database (SQL Server 2008). The local (on-premises) application both reads and writes to the database. I am working on a different front end for Azure (MVC2 Web Role), which will use the same data, but in a read only fashion. If I was deploying a traditional web app, I would use SQL Express to act as the local database, and deploy changes with updates to the application (the data changes very slowly) or via some sync system.
With Azure, the picture is a little cloudy (sorry, I had to). I can't seem to find any information to indicate if SQL Express will work inside of Web Roles, and if so, how to do it. Does anyone know if using SQL Express in an Azure web role is possible?
Other options I could do if forced: SQL CE or use SQL Azure. Both have a number of downsides, and are definitely less than perfect.
Thanks,
Erick
Edit
I think my scenario may not have been clear enough.
This data won't change between deployments, and is only accessed from within the Web Role; it is basically a static cache. The on-premises part is kind of a red herring, as it doesn't impact the data on the web role (aside from being its source). Basically, what I want to do is have a local data store/cache that I use existing T-SQL/DAL code with.
While I could use SQL Azure, it doesn't add anything, and if anything only adds additional overhead and failure points. I could also use a VM Role, but that is way too costly/complex.
In a perfect world, I would package the MDF into the cspkg (so it gets deployed with the app) and then use it locally from within the role. If there is no way to do this, then that is ok and I need to figure out the pros and cons of other solutions. We don't live in a perfect world. :)
You might be able to run SQL Express using a custom VHD, but you won't be able to rely on any data ever being present on that VHD. The VMs are completely reset when they reboot; there is no physical persistence across reboots.
If you wanted to, you might be able to locate your entire SQL Server installation in Azure blob storage.
However, in doing all of this, you'll only be able to have one worker/web role that can use that database. Remember: a SQL Server database can only be attached to one SQL Server at a time. If you want to scale out, you'll have to create new SQL Server instances for every web/worker role.
Outside of cost concerns, I can't think of anything that is in SQL Express that should be a show stopper for 99.9% of applications out there.
Adding to Jeremiah's answer: SQL Azure should give you nearly everything SQL Express does today, and you can use the Sync service to synchronize on-premise SQL Server with SQL Azure.
If you installed SQL Express into a VM role, you'd be consuming around $90 monthly just for that instance, plus blob storage (you'd want a Cloud Drive for durability). By definition, a VM Role (or any role) must support scale-out; if you were to scale to 2 instances for whatever reason, both instances would need their own copy of the database, so you'd need to create a blob snapshot for each instance.
Keep in mind, though, if you choose to install SQL Express in a VM: once you're at 2 instances, along with, say, 20GB per instance of blob storage, you're nearing $200 monthly and you're maintaining your VM's OS patches, SQL Express configuration and updates, failure recovery procedures, etc. In contrast, SQL Azure at 20GB, while costing the same $200, will offer better performance and works with the sync service, while completely removing any OS or database server management tasks from you.
To add to the existing answers, for anyone wondering whether it's a good idea to run SQL Express in the cloud:
it does make sense as a temporary storage area. Consider this architectural approach:
say you're spinning up nodes to run jobs. Storing a gazillion calculation results in a local SQL Express on each node might be a good idea, and it lets you provide the aggregated responses immediately when the job finishes on the node. The no-longer-hot results can be transferred to an off-premises SQL Server for future reporting afterwards. SQL Azure may not be optimal for storing a gazillion results from a volume/latency/cost perspective. ATS (Azure Table Storage) will not always fit the bill either, especially when relational data, performance or existing code are involved.
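The pattern described in this answer works as follows. Each node writes its raw results locally and answers with an aggregate as soon as the job finishes. Only afterwards does it hand the cold rows off to central storage.

In the sketch below, sqlite3 stands in for the per-node SQL Express instance, and a plain Python list stands in for the off-premises SQL Server. Both are assumptions for illustration, and the NodeStore class is hypothetical.

```python
import sqlite3


class NodeStore:
    """Hypothetical node-local results store (SQL Express in the answer, sqlite3 here)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE results (job TEXT, value REAL)")

    def record(self, job, values):
        with self.db:
            self.db.executemany("INSERT INTO results VALUES (?, ?)",
                                [(job, v) for v in values])

    def summary(self, job):
        """Aggregated answer, available as soon as the job finishes."""
        return self.db.execute(
            "SELECT COUNT(*), SUM(value) FROM results WHERE job = ?", (job,)
        ).fetchone()

    def offload(self, job, central):
        """Later: copy the no-longer-hot rows to central storage and free local space."""
        with self.db:
            rows = self.db.execute(
                "SELECT job, value FROM results WHERE job = ? ORDER BY rowid", (job,)
            ).fetchall()
            central.extend(rows)
            self.db.execute("DELETE FROM results WHERE job = ?", (job,))
        return len(rows)


node = NodeStore()
node.record("job-1", [1.5, 2.5, 4.0])
print(node.summary("job-1"))  # (3, 8.0)
central = []
print(node.offload("job-1", central), len(central))  # 3 3
```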
To expand on what David mentioned you can register for SQL Azure Data Sync CTP2 that would allow sync from SQL Server to SQL Azure here: http://www.microsoft.com/en-us/SQLAzure/datasync.aspx
Make sure to use CTP2 though since CTP1 did not support SQL Server.
If it's a read only local cache - SQL CE 4 or SQLite.
Both have Entity Framework providers.
If you're writing to it - SQL Azure
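For the read-only SQLite route, one way to make sure the role never writes to the packaged database is to open it in read-only mode with a URI filename. Python's sqlite3 module supports this via uri=True. The file name cache.db and the lookup table are hypothetical. In the scenario above, the file would be built on-premises and deployed with the package.

```python
import os
import pathlib
import sqlite3
import tempfile

# Build step (on-premises): create the cache file that ships with the deployment.
path = os.path.join(tempfile.mkdtemp(), "cache.db")
build = sqlite3.connect(path)
with build:
    build.execute("CREATE TABLE lookup (code TEXT PRIMARY KEY, label TEXT)")
    build.executemany("INSERT INTO lookup VALUES (?, ?)",
                      [("A", "Alpha"), ("B", "Beta")])
build.close()

# Runtime (in the web role): open the same file read-only.
# as_uri() produces a portable file:/// URI; mode=ro tells SQLite not to allow writes.
ro = sqlite3.connect(pathlib.Path(path).as_uri() + "?mode=ro", uri=True)
print(ro.execute("SELECT label FROM lookup WHERE code = ?", ("B",)).fetchone())  # ('Beta',)

try:
    ro.execute("INSERT INTO lookup VALUES ('C', 'Gamma')")
except sqlite3.OperationalError as exc:
    print("write rejected:", exc)
```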