Handling data between remote instances - SQL

We have an HR system that holds employee data and many remote databases that use this data. Currently we use a mixture of copying the data across periodically to the remote databases and pulling the data across using views at runtime. I'm curious as to which option you think is best. My personal preference is to copy the data across periodically, as it removes the dependency on the master database. However, it seems both have pros and cons.
What's the best practice for this?
Thanks
P.S. we have a mixture of SQL 2000, 2005 and 2008 servers

Part of the answer will depend on what level of latency is acceptable for the other systems that use the HR data. Is a day behind OK? An hour? Or does it need to be current?
Each instance could result in a different solution.
I prefer a data pull instead of a push. The remote decides when it needs its data, and you can encapsulate all that logic on the server where it belongs. In a push, you have to keep processes on the HR server in sync with the demands of the subsystem.
I have reservations about multiple remote databases querying a source system directly. If some latency is not an issue, build a process on the HR system to snapshot the required data into some local tables (or a data warehouse?) and have all remotes query this data. At the very least, build local views against the HR source and only allow remote servers rights to those.
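A minimal sketch of that last idea, assuming SQL Server 2005 or later, a hypothetical HR table dbo.Employee, and a RemoteReader database user already mapped to the login the remote servers connect with:

    -- On the HR server: expose only the columns the remotes need through a view
    CREATE VIEW dbo.vEmployeePublic
    AS
    SELECT EmployeeID, FirstName, LastName, Department, Email
    FROM   dbo.Employee;
    GO

    -- Grant the remote servers' user SELECT on the view only, not the base tables
    GRANT SELECT ON dbo.vEmployeePublic TO RemoteReader;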
Are you doing this across a linked server? If so, I recommend creating synonyms on the remote that point to the HR source across the link. This will allow you to move source data locations around and only have to change your synonym definition.
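For example, a hedged sketch assuming SQL Server 2005 or later (synonyms are not available on SQL 2000), a linked server named HRSRV, and a hypothetical HRDB.dbo.Employee source table:

    -- On the remote server: the synonym hides the four-part linked-server path
    CREATE SYNONYM dbo.Employee FOR HRSRV.HRDB.dbo.Employee;
    GO

    -- Remote code queries the synonym; if the HR data moves, only the synonym changes
    SELECT EmployeeID, LastName FROM dbo.Employee WHERE Department = 'Finance';

If the source location ever changes, you drop and recreate the synonym and the querying code stays untouched.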

Related

Creating a Data Warehouse

Currently our team is having a major database management/data management issue where hundreds of databases are being built and used for minor/one-off applications, where the app should really be pulling from an already existing database.
Since our security is so tight, the owners of these systems of authority will not allow others to pull data from them at the rate the apps need; instead, they allow a single app to do a weekly pull, and that data is then given to the org.
I am being asked to compile all of those publicly available (weekly snapshots) into a single data warehouse for end users to go to. We realistically are talking 30-40 databases each with hundreds of thousands of records.
What is the best way to turn this into a data warehouse? Create a SQL Server instance and treat each source as its own DB on the server? As far as the individual app connections go, I am less worried; I really want to know the best practice for housing all of the data for consumption.
What you're describing is more of a simple data lake. If all you're being asked for is a single place for the existing data to live as-is, then sure, directly pulling all 30-40 databases to a new server will get that done. One thing to note is that if they're creating Database Snapshots, those wouldn't be helpful here. With actual database backups, it would be easy to build a process that would copy and restore those to your new server. This is assuming all of the sources are on SQL Server.
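As a rough sketch of that copy-and-restore step, assuming hypothetical backup paths and a source database named SourceHR (adjust the logical file names to whatever RESTORE FILELISTONLY reports):

    -- Inspect the backup to find the logical data/log file names
    RESTORE FILELISTONLY FROM DISK = N'\\fileshare\backups\SourceHR.bak';

    -- Restore the weekly backup onto the new server under its own name
    RESTORE DATABASE SourceHR_Snapshot
    FROM DISK = N'\\fileshare\backups\SourceHR.bak'
    WITH MOVE N'SourceHR'     TO N'D:\Data\SourceHR_Snapshot.mdf',
         MOVE N'SourceHR_log' TO N'E:\Logs\SourceHR_Snapshot.ldf',
         REPLACE;

Wrapped in a stored procedure or an Agent job, the same pattern can be repeated for each of the 30-40 sources on the weekly schedule.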
"Data warehouse" implies a certain level of organization beyond that, to facilitate reporting on an aggregate of the data across the multiple sources. Generally you'd identify any concepts that are shared between the databases and create a unified table for each concept, then create an ETL (extract, transform, load) process to standardize the data from each source and move it into those unified tables. This would be a large lift for one person to build. There's plenty of resources that you could read to get you started--Ralph Kimball's The Data Warehouse Toolkit is a comprehensive guide.
In either case, a tool you might want to look into is SSIS. It's good for copying data across servers and has drivers for multiple different RDBMS platforms. You can schedule SSIS packages from SQL Agent. It has other features that could help for data warehousing as well.

What permissions are required on the source to copy a SQL Azure database?

I need to grant permissions to a remote development team so they can copy schema changes on a database to their local dev instances. I see many posts similar to this, but they seem to focus on what is required in the destination server, rather than rights to read everything necessary on the source.
Currently, the user is in the db_datareader role, and while they seem to be able to read a good portion of the table structure, configuration items such as defaults seem to be obscured, and stored proc and view definitions don't seem to be available either.
I need the team to be able to copy from our Test/UAT instance, but I don't want them to be able to modify it. They should already have sa access to their local dev instances.
I think you can do this using Azure SQL Data Sync.
Data Sync is useful in cases where data needs to be kept up-to-date across several Azure SQL databases or SQL Server databases. Here are the main use cases for Data Sync:
Hybrid Data Synchronization: With Data Sync, you can keep data synchronized between your on-premises databases and Azure SQL databases to enable hybrid applications. This capability may appeal to customers who are considering moving to the cloud and would like to put some of their application in Azure.
Distributed Applications: In many cases, it's beneficial to separate different workloads across different databases. For example, if you have a large production database, but you also need to run a reporting or analytics workload on this data, it's helpful to have a second database for this additional workload. This approach minimizes the performance impact on your production workload. You can use Data Sync to keep these two databases synchronized.
Globally Distributed Applications: Many businesses span several regions and even several countries/regions. To minimize network latency, it's best to have your data in a region close to you. With Data Sync, you can easily keep databases in regions around the world synchronized.
Data Sync is based around the concept of a Sync Group. A Sync Group is a group of databases that you want to synchronize.
A Sync Group has the following properties:
The Sync Schema describes which data is being synchronized.
The Sync Direction can be bi-directional or can flow in only one direction. That is, the Sync Direction can be Hub to Member, or Member to Hub, or both.
The Sync Interval describes how often synchronization occurs.
The Conflict Resolution Policy is a group-level policy, which can be Hub wins or Member wins.
For more detail, please see Overview of SQL Data Sync.
With Data Sync, you can set your Azure SQL database as the hub database and the team's local dev instances as member databases, with the Sync Direction set to 'Hub to Member'.
Then you can sync the schema changes on a database to their local dev instances manually or automatically. Reference: Tutorial: Set up SQL Data Sync between Azure SQL Database and SQL Server on-premises
Hope this helps.
GRANT VIEW DEFINITION was what I needed.
Not sure how I didn't stumble on that in my searches, but there it is.
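For reference, a minimal sketch of that grant on the source (Test/UAT) database, using a hypothetical user name:

    -- Run in the source database; lets the user script out views, procs, defaults, etc.
    GRANT VIEW DEFINITION TO [RemoteDevTeam];

Combined with db_datareader, that is enough to read both the data and the object definitions without granting any ability to modify the source.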

Online and local SQL database synchronization

In my system I maintain two databases, one on the LAN and one online, and I want to synchronize them. I hope to do this using Microsoft Sync Framework.
http://msdn.microsoft.com/en-us/library/ee819079.aspx
Can I sync the local and online SQL databases using this, or is there a more suitable method? Thank you.
Sync Framework is designed for occasionally connected systems, e.g. a laptop that can access the corporate network every other day and update its database, but needs to work when it has no corpnet access too. The usual pairing for Sync Framework is a central DB (SQL Server) and a local embedded SQL Server Compact or SQL Express database on the devices (laptops, phones, tablets, etc.).
If the databases are always connected (e.g. two DBs on two servers, with 24x7 connectivity between them, even if over the Internet) then the appropriate technology is replication, either merge or transactional. Theoretically replication also works when disconnected periods are expected, but Sync Framework is much better at it, and most importantly Sync Framework is not strongly dependent on DNS names as replication is (very important for occasionally connected systems).
'Synchronizing the database' is a vague term; you have to consider whether you want a master-slave replication scheme or master-master (the latter being very difficult to achieve), and you have to consider what you want replicated from the database. You also need to consider whether more partners will be added later (more databases to 'synchronize'). And you have to be much more careful now about schema changes.
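If you go the replication route for an always-connected master-slave setup, a rough, hedged sketch of one-directional transactional replication looks like the following; every name here is hypothetical, and the real setup involves more steps (configuring the Distributor, agent security, snapshot delivery):

    -- On the publisher: enable the database for publication
    EXEC sp_replicationdboption @dbname = N'MainDB', @optname = N'publish', @value = N'true';

    -- Create a transactional publication and add one table (article) to it
    EXEC sp_addpublication @publication = N'MainDB_Pub', @status = N'active';
    EXEC sp_addarticle @publication = N'MainDB_Pub', @article = N'Customers',
         @source_object = N'Customers';

    -- Push a subscription to the remote (online) server
    EXEC sp_addsubscription @publication = N'MainDB_Pub', @subscriber = N'ONLINESRV',
         @destination_db = N'MainDB_Copy', @subscription_type = N'push';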

Local SQL database interface to cloud database

Excuse me if the question is simple. We have multiple medical clinics, each running their own SQL database EHR.
Is there any way I can interface each local SQL database with a cloud system?
I essentially want to use the data of the patient currently being consulted to generate a pathology request that links to a cloud (Google App Engine?) database.
As a medical student / software developer this project of yours interests me greatly!
If you don't mind me asking, where are you based? I'm from the UK and unfortunately there's just no way a system like this would get off the ground as most data is locked in proprietary databases.
What you're talking about is fairly complex anyway, whatever country you're in I assume there would have to be a lot of checks / security around any cloud system that dealt with patient data. Theoretically though, what you would want to do ideally is create an online database (cloud, hosted, intranet etc), and scrap the local databases entirely.
You then have one 'pool' of data each clinic can pull information from (i.e. ALL records for patient #3563). They could then edit that data and/or insert new records and SAVE them, exporting them back to the main database.
If there is a need to keep certain information private to one clinic only, this could still be achieved on one database in a number of ways, or you could retain parts of the local database and have them merge with the cloud data as they're requested by the clinic.
This might be a bit outdated, but you guys should check out https://www.firebase.com/. It would let you do what you want fairly easily. We just did this for a client in the exact same business you are in.
Basically, Firebase lets you work with a central database in the cloud that is automatically synchronised with all its front-ends. It even handles losing the connection to the server automagically. It's the best solution I've found so far for keeping several systems running against a single cloud database.
We used to have our own backend that would try its best to sync changes, but you need to be really careful with inter-system unique IDs for your tables (i.e. making a new user at one branch must not yield an ID that already exists in any other branch or in the central database). It becomes cumbersome very quickly.
CakePHP can generate these kinds of unique IDs pretty easily and automatically, but you still have to work on syncing all the local databases with the central repository.
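On the SQL Server side, one common way to sidestep ID collisions across branches (independent of whatever the application framework generates) is to use GUID keys; a minimal sketch with hypothetical table and column names:

    -- Each branch can insert rows independently without coordinating identity ranges
    CREATE TABLE dbo.PathologyRequest
    (
        RequestID  uniqueidentifier NOT NULL
                   CONSTRAINT DF_PathologyRequest_ID DEFAULT NEWSEQUENTIALID()
                   CONSTRAINT PK_PathologyRequest PRIMARY KEY,
        PatientID  uniqueidentifier NOT NULL,
        BranchCode varchar(10)      NOT NULL,
        CreatedAt  datetime2        NOT NULL DEFAULT SYSUTCDATETIME()
    );

NEWSEQUENTIALID() keeps the clustered index insert-friendly; NEWID() also works if fully random values are acceptable.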

Where is the bottleneck / what are the gotchas when selecting records from a remote (linked) SQL server?

I'm in a satellite office that needs to pull some data from our main office for display on our intranet. We use MS SQL Server in both locations and we're planning to create a linked server in our satellite office pointing to the main office. The connection between the two is a VPN tunnel I believe (does that sound right? What do I know, I'm a programmer!)
I'm concerned about generating a lot of traffic across a potentially slow connection. We will be getting access to a SQL view on the main office's server. It's not a lot of data (~500 records) once the select query has run, but the view is huge (~30000 records) without a query.
I assume running a query on a linked server will bring back only the results over the wire (and not the entire view to be queried locally). In that case the major bottleneck is most likely the connection itself assuming the view is indexed, etc. Are there any other gotchas or potential bottlenecks (maybe based on the way I structure queries) that I should be aware of?
From what you explained, your connection is likely to be the bottleneck.
You might also consider caching data at the satellite location.
The decision will depend on the following:
- how many rows and how often data are updated in the main database
- how often you need to load the same data set at the satellite location
Two edge examples:
Data is static or relatively static - inserts only in the main DB. In the satellite location users often query the same data again and again. In this case it would make sense to cache the data locally at the satellite location.
Data is volatile, with a lot of updates and/or deletes. Users in the satellite location rarely query data, and when they do, it is always with a different WHERE condition. In this case it doesn't make sense to cache. If the connection is slow and there are frequent changes, you might end up never being in sync with the main DB.
Another advantage of caching is that you can implement data compression, which will alleviate the effect of the slow connection.
If you choose to cache at the local location there are a lot of options, but I believe that would be another topic.
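Although the details are indeed another topic, here is one minimal illustration of the caching approach, assuming a linked server named MAINOFFICE, a hypothetical source view MainDB.dbo.IntranetData, and a pre-created local cache table:

    -- On the satellite server: refresh a local copy of just the rows the intranet needs
    TRUNCATE TABLE dbo.IntranetDataCache;

    INSERT INTO dbo.IntranetDataCache (RecordID, Title, UpdatedAt)
    SELECT RecordID, Title, UpdatedAt
    FROM   OPENQUERY(MAINOFFICE,
           'SELECT RecordID, Title, UpdatedAt
            FROM   MainDB.dbo.IntranetData
            WHERE  IsPublished = 1');

OPENQUERY forces the filtering to happen on the main office server, so only the ~500 result rows travel over the VPN rather than the full ~30,000-row view; an Agent job can run the refresh on whatever schedule matches your acceptable staleness.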
[Edit]
About compression: you can use compressed transaction log shipping. In SQL 2008, backup compression is supported in Enterprise edition only; in SQL 2008 R2 it is available starting with Standard edition. http://msdn.microsoft.com/en-us/library/bb964719.aspx
You can implement custom compression before you ship transaction logs, using any compression library you like.
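For the native option, a minimal sketch (assuming an edition that supports backup compression and a hypothetical database and path):

    -- Compressed log backup; log shipping then copies a much smaller file over the WAN
    BACKUP LOG MainDB
    TO DISK = N'D:\LogShip\MainDB_log.trn'
    WITH COMPRESSION, INIT;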