Continuously synchronize tables between two databases - sql

My experience with MSSQL Server dates back some 6 years, so I have only basic knowledge of its workings now.
The problem I'm faced with is syncing the databases of two live CRMs (NopCommerce and a Rainbow Portal-based one, if anyone's curious) running on the same DB server. The data I'm interested in is spread across 7 tables in one DB and 5 in the other. The idea is to have two web applications with the same data, with updates in one instantly propagating to the other.
Each database has numerous triggers and stored procedures that are used to keep the data consistent.
I am not aware of all of SQL Server's capabilities, so I am open to suggestions as to the best and quickest way to achieve the goal. Is it a matter of writing more triggers? Should I create a "watcher" application? Is there some built-in mechanism for this?
Thanks!

You should look at SQL Server Replication, and/or at SSIS for the integration (ETL), scheduling, etc.
Triggers (especially cross-DB) can be messy to maintain and debug - you might also consider loading data into a separate (third) staging database before propagating the data into your other 2 databases.
(Other alternatives include synchronous and asynchronous mirroring, which would require the entire DBs to be in sync, and log shipping - also the entire DB - which is one-way only and typically used for redundancy. These aren't likely to be useful for your purpose, though.)
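To illustrate the trigger route (and why it gets messy), here is a minimal sketch of a cross-database trigger - all database, table, and column names are made up, and you would still have to guard against the two CRMs' triggers re-firing each other:

    CREATE TRIGGER trg_SyncCustomer ON dbo.Customer
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Update rows that already exist in the other CRM's database (same server).
        UPDATE tgt
        SET    tgt.Email    = src.Email,
               tgt.FullName = src.FullName
        FROM   OtherCrmDb.dbo.Customer AS tgt
        JOIN   inserted AS src ON src.CustomerId = tgt.CustomerId;

        -- Insert rows the other database has not seen yet.
        INSERT INTO OtherCrmDb.dbo.Customer (CustomerId, Email, FullName)
        SELECT i.CustomerId, i.Email, i.FullName
        FROM   inserted AS i
        WHERE  NOT EXISTS (SELECT 1
                           FROM OtherCrmDb.dbo.Customer AS c
                           WHERE c.CustomerId = i.CustomerId);
    END;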

You might want to look at SQL Server Replication - http://msdn.microsoft.com/en-us/library/bb500346.aspx - in particular Merge Replication.
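As a rough sketch of what setting up a merge publication involves (the publication and table names are placeholders; a distributor, snapshot agent, and subscriptions still have to be configured on top of this):

    -- Enable the database for merge publishing, then publish one table as an article.
    EXEC sp_replicationdboption
        @dbname  = N'CrmDb',
        @optname = N'merge publish',
        @value   = N'true';

    EXEC sp_addmergepublication
        @publication = N'CrmSharedData',
        @description = N'Merge publication of the shared CRM tables';

    EXEC sp_addmergearticle
        @publication   = N'CrmSharedData',
        @article       = N'Customer',
        @source_owner  = N'dbo',
        @source_object = N'Customer';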

Related

SQL Server - Avoiding write timeouts on logging table due to reporting queries

I have two very busy tables in an email dispatch system. One is for batching mail for dispatch, the other is used for logging. Expensive queries that use both of these tables are run to produce stats for a UI. I would like to remove the reporting overhead on these tables, as I am seeing timeouts during report generation.
My question is - what are my options for reducing the query overhead on these two tables while generating the report data?
I've considered using triggers to create exact copies of the tables. Is there any built-in functionality in SQL Server for mirroring data within a database? If I can avoid growing the database unnecessarily, though, that would be an advantage. It doesn't matter if the stats are not real-time.
There is built-in functionality for this scenario; it's known as a Database Snapshot.
If you run a query against a DB snapshot table, no shared locks should be created on the original database.
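A minimal sketch of creating a snapshot to report against - the logical file name must match the source database's data file, and all names and paths here are hypothetical:

    CREATE DATABASE MailSystem_Reporting
    ON ( NAME = MailSystem_Data,                           -- logical name of the source data file
         FILENAME = 'D:\Snapshots\MailSystem_Reporting.ss' )
    AS SNAPSHOT OF MailSystem;
    GO

    -- Reporting queries then read from the snapshot instead of the live tables.
    SELECT COUNT(*) FROM MailSystem_Reporting.dbo.DispatchLog;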
You can use Resource Governor for SQL Server. Unfortunately, I have only read about it and haven't used it yet. It is used to isolate workloads on SQL Server.
Please try and let us know if it helps.
Some helpful links: MSDN, SQLBlog, TechNet.
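For what it's worth, isolating the reporting workload with Resource Governor might look roughly like this - the login, pool, and group names are hypothetical, and the classifier function has to be created in master:

    CREATE RESOURCE POOL ReportingPool WITH (MAX_CPU_PERCENT = 30);
    CREATE WORKLOAD GROUP ReportingGroup USING ReportingPool;
    GO

    -- Classifier: route the reporting login into the reporting group, everything else to default.
    CREATE FUNCTION dbo.rg_classifier() RETURNS SYSNAME
    WITH SCHEMABINDING
    AS
    BEGIN
        IF SUSER_NAME() = N'report_user'
            RETURN N'ReportingGroup';
        RETURN N'default';
    END;
    GO

    ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
    ALTER RESOURCE GOVERNOR RECONFIGURE;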

Querying multiple database servers?

I am working on a database for a monitoring application, and I got all the business logic sorted out. It's all well and good, but one of the requirements is that the monitoring data is to be completely stand-alone.
I'm using a local database on my web server to do some event handling and to cache notifications. Since there is one event row per system in my monitoring database, it's easy to just get the id and query the monitoring data if needed, and since this is something only my web server uses, integrity can be enforced externally. Querying is not an issue either, as all the relationships are one-to-one, so it's very straightforward.
My problem comes with user administration. My original plan had it in yet another database (to meet the requirement of leaving the monitoring database alone), but I don't think I was thinking straight when I thought of that. I can get all the ids of the systems a user has access to easily enough, but how then can I efficiently pass that to a query on the other database? Is there a solution for this? Making a chain of ORs seems like an ugly and bug-prone solution.
I assume this kind of problem isn't that uncommon? What do most developers do when they have to integrate different database servers? In any case, I am leaning towards just talking my employer into putting user administration data in the same database, but I want to know if this kind of thing can be done.
There are a few ways to accomplish what you are after:
Use concepts like linked servers (SQL Server - http://msdn.microsoft.com/en-us/library/ms188279.aspx) - see the sketch below
Individual connection strings within your front end driving the database layer
Use things like replication to duplicate the data
Also, keeping multiple databases on a single database server instance does not seem like it would violate your business requirements, and I would investigate that as a starting point, given the details you have provided.
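As a sketch of the linked-server option (the server, database, table, and column names below are all hypothetical), you create the link once and then join across it with four-part names:

    EXEC sp_addlinkedserver
        @server     = N'MONITORSRV',
        @srvproduct = N'',
        @provider   = N'SQLNCLI',
        @datasrc    = N'monitor-host\SQL2008';
    GO

    -- Filter the remote monitoring data by the system ids the user may access locally.
    DECLARE @UserId INT = 42;

    SELECT m.*
    FROM   MONITORSRV.MonitoringDb.dbo.SystemEvents AS m
    JOIN   dbo.UserSystemAccess AS a
           ON a.SystemId = m.SystemId
    WHERE  a.UserId = @UserId;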

How do I keep a table synchronized with a query in SQL Server - ETL?

I wasn't sure how to word this question, so I'll try and explain. I have a third-party database on SQL Server 2005. I have another server running SQL Server 2008, to which I want to "publish" some of the data from the third-party database. This database I shall then use as the back-end for a portal and Reporting Services - it will be the data warehouse.
On the destination server I want to store the data in table structures different from those in the third-party db. Some tables I want to denormalize, and there are lots of columns that aren't necessary. I'll also need to add additional fields to some of the tables, which I'll need to populate based on data stored in the same rows. For example, there are varchar fields that contain info I'll want to populate other columns with. All of this should cleanse the data and make it easier to report on.
I can write the queries to get all the info I want into a particular destination table. However, I want to be able to keep it up to date with the source on the other server. It doesn't have to be updated immediately (although that would be good), but I'd like it to be updated perhaps every 10 minutes. There are hundreds of thousands of rows of data, but the volume of changes and new rows isn't huge.
I've had a look around but I'm still not sure of the best way to achieve this. As far as I can tell, replication won't do what I need. I could manually write the T-SQL to do the updates, perhaps using the MERGE statement, and then schedule it as a job with SQL Server Agent. I've also been having a look at SSIS, and that looks to be geared towards this kind of ETL.
I'm just not sure what to use to achieve this, and I was hoping to get some advice on how one should go about doing this kind of thing. Any suggestions would be greatly appreciated.
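For reference, the scheduled MERGE approach mentioned above might look roughly like this - the linked server, database, and column names are hypothetical, and the derived ShortNote column just stands in for the varchar-parsing/cleansing step:

    MERGE dw.dbo.CustomerSummary AS target
    USING (
        SELECT c.CustomerId,
               c.Name,
               LEFT(c.Notes, 50) AS ShortNote   -- example of deriving a cleansed column
        FROM   SOURCESRV.ThirdPartyDb.dbo.Customer AS c
    ) AS source
        ON target.CustomerId = source.CustomerId
    WHEN MATCHED THEN
        UPDATE SET target.Name      = source.Name,
                   target.ShortNote = source.ShortNote
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerId, Name, ShortNote)
        VALUES (source.CustomerId, source.Name, source.ShortNote);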
For those tables whose schemas/relations are not changing, I would still strongly recommend Replication.
For the tables whose data and/or relations are changing significantly, I would recommend developing a Service Broker implementation to handle that. The high-level approach with Service Broker (SB) is:
Table-->Trigger-->SB.Service >====> SB.Queue-->StoredProc(activated)-->Table(s)
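A very condensed, hedged sketch of that chain (all object names are hypothetical; conversation reuse, error handling, and poison-message handling are omitted):

    CREATE MESSAGE TYPE RowChangedMsg VALIDATION = WELL_FORMED_XML;
    CREATE CONTRACT RowChangedContract (RowChangedMsg SENT BY INITIATOR);
    CREATE QUEUE RowChangeQueue;
    CREATE SERVICE RowChangeService ON QUEUE RowChangeQueue (RowChangedContract);
    GO

    -- Trigger on the source table: package the changed keys as XML and send them to the service.
    CREATE TRIGGER trg_Customer_Changed ON dbo.Customer
    AFTER INSERT, UPDATE
    AS
    BEGIN
        DECLARE @h UNIQUEIDENTIFIER, @payload XML;
        SET @payload = (SELECT CustomerId FROM inserted FOR XML PATH('row'), TYPE);

        BEGIN DIALOG CONVERSATION @h
            FROM SERVICE RowChangeService
            TO SERVICE 'RowChangeService'
            ON CONTRACT RowChangedContract
            WITH ENCRYPTION = OFF;

        SEND ON CONVERSATION @h MESSAGE TYPE RowChangedMsg (@payload);
    END;
    GO

    -- Activated procedure: drain the queue and apply the changes to the destination table(s).
    CREATE PROCEDURE dbo.ProcessRowChanges
    AS
    BEGIN
        DECLARE @h UNIQUEIDENTIFIER, @msg XML;

        RECEIVE TOP (1) @h = conversation_handle,
                        @msg = CAST(message_body AS XML)
        FROM RowChangeQueue;

        IF @msg IS NOT NULL
        BEGIN
            -- Re-query the source rows identified in @msg and upsert them into the destination here.
            END CONVERSATION @h;
        END
    END;
    GO

    ALTER QUEUE RowChangeQueue
        WITH ACTIVATION (STATUS = ON,
                         PROCEDURE_NAME = dbo.ProcessRowChanges,
                         MAX_QUEUE_READERS = 1,
                         EXECUTE AS OWNER);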
I would not recommend SSIS for this, unless you wanted to go to something like daily exports/imports. It's fine for that kind of thing, but IMHO far too kludgy and cumbersome for either continuous or short-period incremental data distribution.
Nick, I have gone the SSIS route myself. I have SSIS-based jobs that run every 15 minutes and do exactly what you are trying to do. We have a huge relational database, and we wanted to do complicated reporting on top of it using a product called Tableau. We quickly discovered that our relational model wasn't really suited to that, so I built a cube over it with SSAS, and that cube is updated and processed every 15 minutes.
Yes, SSIS does give the impression of being mainly for straight ETL jobs, but I have found that it can be used for simple, quick jobs like this as well.
I think staging and partitioning would be too much for your case. I am implementing the same thing in SSIS now, but with a frequency of 1 hour, as I need to allow some time for support activities. I am sure that SSIS is a good way of doing it.
During the design, I had thought of another way to achieve custom replication: building on the Change Data Capture (CDC) process. This way you can get near real-time replication, but it is a tricky thing.
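A rough sketch of that CDC route (assuming SQL Server 2008 Enterprise on the source; the table name is hypothetical) - you enable capture and then periodically pull the changes yourself:

    EXEC sys.sp_cdc_enable_db;

    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'Customer',
        @role_name     = NULL;
    GO

    -- A job can then pull whatever changed between two log sequence numbers.
    DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_Customer'),
            @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_Customer(@from_lsn, @to_lsn, N'all');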

Single or multiple databases

SQL Server 2008 database design problem.
I'm defining the architecture for a service where site users would manage a large volume of data on multiple websites that they own (100MB average, 1GB maximum per site). I am considering whether to split the databases up such that the core site management tables (users, payments, contact details, login details, products, etc.) are held in one database, and the data relating to the customers' own websites is held in a separate database.
I see a possible gain in that I can distribute the hardware architecture to give more muscle to the heavy lifting done in the websites database, leaving the site management database on more appropriate hardware. But I'm also conscious of losing the ability to directly relate the sites to the customers through a foreign key (as far as I know, this can't be done across databases?).
So, the question is twofold: in general terms, should data in this sort of scenario be split out into multiple databases, or should it all be held in a single database?
If it is split into multiple databases, is there a recommended way to protect the integrity and security of the system at the database layer, to ensure that there is a strong relationship between the two?
Thanks for your help.
This question, and thus my answer, may be close to the gray line of being subjective, but at the least I think it would be common practice to separate out the 'admin' tables into their own db for what it sounds like you're doing. If you can tie a client to a specific server and db instance, then having separate db instances opens up some easy paths for adding servers to accommodate new clients. A single db would require you to monkey with various clustering approaches if you got too big.
[edit] Building in the idea early that each client gets its own DB also sets the tone for how you develop, while it is still easy to make structural and organizational changes. Discovering 2 years from now that you need to do it will be a lot more painful. I've worked with split dbs plenty of times in the past, and it really isn't hard to deal with as long as you can establish some idea of what the context is. Here it sounds like you already have the idea that the client is the context.
Just my two cents - like I said, this one may be close to subjective.
Single Database Pros
One database to maintain. One database to rule them all, and in the darkness - bind them...
One connection string
Can use Clustering
Separate Database per Customer Pros
Support for customization on per customer basis
Security: No chance of customers seeing each others data
Conclusion
The separate database approach would be valid if you plan to support per-customer customization. Otherwise, I don't see the value.
You can use a link (e.g. a linked server) to connect the databases.
Your architecture is smart.
If you can't use a link, you can always replicate critical data from the users database to the website database in a read-only mode.
Concerning security - the best way is to have a service layer between ASP (or another web language) and the database, so your databases will be pretty much isolated.
If you expect to have to split the databases across different hardware in the future because of heavy load, I'd say split it now. You can use replication to push copies of some of the tables from the main database to the site management databases. For now, you can run both databases on the same instance of SQL Server and later on, when you need to, you can move some of the databases to a separate machine as your volume grows.
Imagine we had infinitely fast computers: would you split your databases? Of course not. The only reason we split them is to make it easy to scale out at some point. You don't really have a choice here; 100MB-1000MB per client is huge.

SQL Server 2008: N small databases VS 1 database with N schemas

I have a database server with a few main databases and a few dozen small ones.
These small databases are intermediary/staging databases for data import from various sources into the main database. Data import is a daily task. They are all quite similar in structure, as the implementations of these data imports are similar; basically they have configuration tables, which define mappings, conversions, etc., and data tables, which contain the results of the import.
Some time ago there were only a handful of small ones, but now I have more than 20 of them, and the number will grow further with the number of supported data feeds.
I have just migrated the whole server environment to SQL Server 2008, and since I now have some time for clean-up/refactoring, I am thinking of merging all of the data-import databases into just one database and using database schemas to separate them.
Question-0: Any other ideas for the described situation?
Question-1: Shall I change from separate databases to separate schemas?
Question-2 (!!!): Are there any tricky things to be careful about in a database schema implementation?
Edit-1: highlighted question-2 as the most 'unanswered' currently.
In your situation, I would probably merge the databases into one. I don't really see a reason to have them separated, and merging them will reduce the amount of work you have to do to support backups, etc. If you were importing data from a data source once and then never using the staging tables again, I could see a reason to bring up separate databases to handle the data transformation. Since you use these tables on an ongoing basis, I would much rather keep them together, so that I only have to go to one place to find the full end-to-end state of the production data and the data load states.
2008 is really good at handling partitioning too; if the db gets too large, or you need to separate data for security reasons, you get the benefit of a single db with the advantages of having several smaller ones. You won't get that with multiple smaller dbs.
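If it helps, a minimal sketch of 2008 partitioning (the boundary values, filegroup, and table are hypothetical), splitting one import table by feed id:

    CREATE PARTITION FUNCTION pf_FeedRange (INT)
        AS RANGE LEFT FOR VALUES (10, 20, 30);

    CREATE PARTITION SCHEME ps_FeedRange
        AS PARTITION pf_FeedRange ALL TO ([PRIMARY]);

    CREATE TABLE dbo.ImportRow
    (
        FeedId  INT           NOT NULL,
        RawData NVARCHAR(MAX) NULL
    ) ON ps_FeedRange (FeedId);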
When we migrated, we had a very similar situation, and I ended up moving everything into one somewhat large importing database, like you have hinted at. We did not, however, separate them using schemas.
Because the database is the unit of referential integrity and backup, if you are bringing in large amounts of data for staging which does not need to be backed up on the same schedule, it might be easiest to keep it in a separate DB.
You can use a single DB with multiple file groups and different backups, but it will require a lot more design.
The basic factors this will depend on are: recovery model, backup objectives, usage patterns and amount of effort to design and maintain your file group design.
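For example, a sketch of the filegroup route (file names, paths, and the database name are hypothetical): staging tables go on their own filegroup, which can then be backed up, or simply not backed up, on its own schedule:

    ALTER DATABASE ImportDb ADD FILEGROUP StagingFG;

    ALTER DATABASE ImportDb ADD FILE
        ( NAME = ImportDb_Staging,
          FILENAME = 'D:\Data\ImportDb_Staging.ndf' )
        TO FILEGROUP StagingFG;
    GO

    -- Staging tables are created ON StagingFG; the filegroup can be backed up separately.
    BACKUP DATABASE ImportDb
        FILEGROUP = 'StagingFG'
        TO DISK = 'E:\Backups\ImportDb_StagingFG.bak';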
All the prior answers work for me, particularly your comment about selectively combining databases -- if some are very busy, very large, or process sensitive data, you might want to keep them separate, or in separate groupings. This would make it easier to configure backups/restores and disk/drive allocation (give the busy ones their own set of spindles).
Like possibly most database developers, I have dealt almost exclusively with objects in the dbo schema, but I have done some recent work with other schemas. The main gotcha I've encountered is remembering to always specify the schema when referring to any database object. Never assume that any given connection will reference an object in the schema you want it to--always be clear and precise!
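A quick illustration of that gotcha (the schema and table names are made up): whether an unqualified name resolves depends on the caller's default schema, so always qualify it:

    CREATE SCHEMA feedA;
    GO
    CREATE TABLE feedA.ImportConfig (Id INT PRIMARY KEY, Mapping NVARCHAR(200));
    GO

    SELECT * FROM ImportConfig;        -- resolves via the caller's default schema; may fail or hit the wrong table
    SELECT * FROM feedA.ImportConfig;  -- explicit schema: unambiguous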
I would put all your import staging tables in one database, separate from your regular production database, as the backup needs may be very different. This database should also contain things like your configuration management for SSIS packages, any logging tables, and any import metadata tables (we keep track of every run of the imports and the status of that run, as well as a bazillion other things about the import, like the filename, the normal file size, etc.). This comes in handy for researching problems and for adding checks to the processing. We use a schema per client, and then an additional schema for objects related to the importing/exporting process (logs, metadata, etc.).