I would like to setup a datawarehouse which can be used by our companies Qlik application. The applications where I would like to retrieve data from a mostly running on-premises and all have a SQL server database as a source. Application which doesn't have an accessible SQL server as a source can be accessed via a REST API and/or Webservice.
This is how the setup looks like:
Data warehouse is a SQL-server (Standard, no Express version)
SQL datasources are running on 3 different SQL-servers (2 are Standard, 1 is Express)
Other sources are Webservice and/or REST API accessibel (SaaS application).
The SQL-servers are all on-premises are located within our network. The SaaS application is running in a data center (cloud).
Preferable I would like to have data as live as possible and the load on the server as small as possible. To do so I was wondering if there are ETL-tools which work with SQL Change Tracking to keep track of changes on table level (so that changes are PUSH based on not PULL). If this is the case I can let them sync sequential and set per table if it has to be synced based on Change Tracking or only full sync a day. As soon as I have the data in my datawarehouse I can create some data transformations in the ETL-tool or with T-SQL, that doesn't matter.
Hopefully there are some people around here who can tell me which ETL tool to use. There is a lot of information on the internet, but not much who go into the subject of SQL Change Tracking.
Many thanks in advance!
Currently our team is having a major database management/data management issue where hundreds of databases are being built and used for minor/one off applications where the app should really be pulling from an already existing database.
Since our security is so tight, the owners of these Systems of authority will not allow others to pull data from them at a consistent (App Necessary) rate, rather they allow a single app to do a weekly pull and that data is then given to the org.
I am being asked to compile all of those publicly available (weekly snapshots) into a single data warehouse for end users to go to. We realistically are talking 30-40 databases each with hundreds of thousands of records.
What is the best way to turn this into a data warehouse? Create a SQL server and treat each one as its own DB on the server? As far as the individual app connections I am less worried, I really want to know what is the best practice to house all of the data for consumption.
What you're describing is more of a simple data lake. If all you're being asked for is a single place for the existing data to live as-is, then sure, directly pulling all 30-40 databases to a new server will get that done. One thing to note is that if they're creating Database Snapshots, those wouldn't be helpful here. With actual database backups, it would be easy to build a process that would copy and restore those to your new server. This is assuming all of the sources are on SQL Server.
"Data warehouse" implies a certain level of organization beyond that, to facilitate reporting on an aggregate of the data across the multiple sources. Generally you'd identify any concepts that are shared between the databases and create a unified table for each concept, then create an ETL (extract, transform, load) process to standardize the data from each source and move it into those unified tables. This would be a large lift for one person to build. There's plenty of resources that you could read to get you started--Ralph Kimball's The Data Warehouse Toolkit is a comprehensive guide.
In either case, a tool you might want to look into is SSIS. It's good for copying data across servers and has drivers for multiple different RDBMS platforms. You can schedule SSIS packages from SQL Agent. It has other features that could help for data warehousing as well.
According to my system i have maintained two databases in LAN and online db.But i want to synchronize these two databases. I hope to do this things using microsoft sync frame work.
.http://msdn.microsoft.com/en-us/library/ee819079.aspx
Can i do sync local and online sql db using this? or any suitable method for do this.thank you
Sync Framework is designed for occasionally connected systems, eg. a laptop that can access the corporate network every other day and update its database, but needs to work when it has no corpnet access too. The pairing of Sync Framework is usually a central DB (SQL Server) and local embedded SQL Server Compact or SQL Express on the devices (laptops, phones, tablets etc).
IF the databases are always connected (eg. two DBs in two servers, with 24x7 connectivity between them, even if over Internet) then the appropriate technology is replication. Either Merge or Transactional. Theoretically replication also works when disconnect periods are expected, but Sync Framework is much better at it, and most importantly Sync Framework is not strongly dependent on DNS names as replication is (very important for occasionally connected systems).
Synchronizing the database is a vague term, you have to consider if you want a Master-Slave replication shcme or a Master-Master (the later being very difficult to achieve) and you have to consider what do you want replicated from the database. You also need to consider if more partners will be later added (more databases to 'synchronize'). And you have to be way more careful now about schema changes.
The company i am working for is implementing Share-point with reporting servers that runs on an SQL back end. The information that we need lives on two different servers. The first server being the Manufacturing server that collects data from PLCs and inputs that information into a SQL database, the other server is our erp server which has data for payroll and hours worked on specific projects. The i have is to create a view on a separate database and then from there i can pull the information from both servers. I am having a little bit of trouble with the syntax for connecting the two servers to run the View. We are running ms SQL. If you need any more information or clarification please let me know.
Please read this about Linked Servers.
Alternatively you can make a Data Warehouse - which would be a reporting data base. You can feed this by either making procs with linked servers or use SSIS packages if they're not linked.
It all depends on a project size and complexity, but in many cases it is difficult to aggregate data from multiple sources with Views. The reason is that the source data structure is modeled for the source application and not optimized for reporting.
In that case, I would suggest going with an ETL process, where you would create a set of Extract, Transform and Load jobs to get data from multiple sources (databases) into a target database where data will be stored in the format optimized for reporting.
Ralph Kimball has many great books on the subject, for example:
1) The Data Warehouse ETL Toolkit
2) The Data Warehouse Toolkit
They are truly worth the read if you are dealing with data
I am working on a project which uses a relational database (SQL Server 2008). The local (on-premises) application both reads and writes to the database. I am working on a different front end for Azure (MVC2 Web Role), which will use the same data, but in a read only fashion. If I was deploying a traditional web app, I would use SQL Express to act as the local database, and deploy changes with updates to the application (the data changes very slowly) or via some sync system.
With Azure, the picture is a little cloudy (sorry, I had to). I can't seem to find any information to indicate if SQL Express will work inside of Web Roles, and if so, how to do it. Does anyone know if using SQL Express in an Azure web role is possible?
Other options I could do if forced: SQL CE or use SQL Azure. Both have a number of downsides, and are definitely less than perfect.
Thanks,
Erick
Edit
I think my scenario may not have been clear enough.
This data won't change between deployments, and is only accessed from within the Web Role; it is basically a static cache. The on-premises part is kind of a red herring, as it doesn't impact the data on the web role (aside from being its source). Basically, what I want to do is have a local data store/cache that I use existing T-SQL/DAL code with.
While I could use SQL Azure, it doesn't add anything, and if anything only adds additional overhead and failure points. I could also use a VM Role, but that is way too costly/complex.
In a perfect world, I would package the MDF into the cspkg (so it gets deployed with the app) and then use it locally from within the role. If there is no way to do this, then that is ok and I need to figure out the pros and cons of other solutions. We don't live in a perfect world. :)
You might be able to run SQL Express using a custom VHD but you won't be able to rely on any data every being present on that VHD. The VMs are completely reset when they reboot - there is no physical persistence across reboots.
If you wanted to, you might be able to locate your entire SQL Server installation in Azure blob storage.
However, in doing all of this, you'll only be able to have one worker/web role that can use that database. Remember: a SQL Server database can only be attached to one SQL Server at a time. If you want to scale out, you'll have to create new SQL Server instances for every web/worker role.
Outside of cost concerns, I can't think of anything that is in SQL Express that should be a show stopper for 99.9% of applications out there.
Adding to Jeremiah's answer: SQL Azure should give you nearly everything SQL Express does today, and you can use the Sync service to synchronize on-premise SQL Server with SQL Azure.
If you installed SQL Express into a VM role, you'd be consuming around $90 monthly just for that instance, plus blob storage (you'd want a Cloud Drive for durability). By definition, a VM Role (or any role) must support scale-out; if you were to scale to 2 instances for whatever reason, both instances would need their own copy of the database, so you'd need to create a blob snapshot for each instance.
Keep in mind, though, if you choose to install SQL Express in a VM: once you're at 2 instances, along with, say, 20GB per instance of blob storage, you're nearing $200 monthly and you're maintaining your VM's OS patches, SQL Express configuration and updates, failure recovery procedures, etc. In contrast, SQL Azure at 20GB, while costing the same $200, will offer better performance and works with the sync service, while completely removing any OS or database server management tasks from you.
To add to the already existing answers and for anyone wondering if its a good idea to run SQL Express in the cloud:
it does makes sense as a temporary storage area. Consider this architectural approach:
say you're spinning up nodes to run jobs. Storing a gazillion of calculation results might be a good idea inside a local SQL Express for each node, and provide the aggregated responses immediately when the job finishes on the node. Transfer of the no longer hot results to off-prem SQL server for future reporting/etc can be done afterwords. SQL Azure may not be optimal from the volume/latency/cost perspective to store gazillion of results and ATS will not always fit the bill, especially when relational data, performance or existing code are involved.
To expand on what David mentioned you can register for SQL Azure Data Sync CTP2 that would allow sync from SQL Server to SQL Azure here: http://www.microsoft.com/en-us/SQLAzure/datasync.aspx
Make sure to use CTP2 though since CTP1 did not support SQL Server.
If it's a read only local cache - SQL CE 4 or SQLite.
Both have Entity Framework providers.
If you're writing to it - SQL Azure