SQL Server replication for HA and LB - sql-server-2005

I'm currently trying to build a high-availability, load-balanced web application using SQL Server Replication Services. Automatic fail-over is built into the application logic. Basically, there are two groups of application servers running the same application, each with its own SQL Server instance, and each group is set to use the other group's instance in case of failure. Data is continuously replicated between the two SQL Server instances via transactional replication. (A few seconds of lag exists; that's okay.)
I set up both servers so that the Distribution Agents run on the Distributor (= Publisher). My idea is that as long as Server A (publisher) is working, it 'collects' the transactions and forwards them to Server B (subscriber) as soon as it's available, and the same goes for Server B. The default option (the Distribution Agent on the subscriber) would 'lose' changes while the subscriber is offline. Am I right about that? UPDATE to this first question: transactions waiting to be delivered are stored in the distribution database, so they are "safe" anyway. This leads me to another question: if the Distribution Agent runs on the subscriber, how does it learn about new transactions to be delivered? Via frequent polling?
The two databases have the same schema (except for identity seed and increment). Each read-write table is cross-replicated. Why don't I see a loop when I insert a row into a table? And when documents or blogs talk about bi-directional replication, do they mean this scenario, or rather updatable transactional replication?
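To illustrate, the seed/increment convention I'm using looks roughly like this (table and column names simplified):

-- On Server A: odd identity values (1, 3, 5, ...)
create table dbo.Orders (
    OrderID int identity(1, 2) primary key,
    Amount  money not null
);
-- On Server B the same table is created with identity(2, 2),
-- producing even values (2, 4, 6, ...), so inserts on the two
-- servers can never collide.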
What do you think about this scenario in general? This is my first time using replication, and I'm wary of the risks, so any comments are very welcome.

Related

SQL mirroring or failover clustering vs. Azure built-in infrastructure

I read in a few places that SQL Azure data is automatically replicated and that the Azure platform provides redundant copies of the data; therefore, SQL Server high-availability features such as database mirroring and failover clustering aren't needed.
Has anyone had a chance to investigate this more deeply? Are all those availability enhancements really unneeded in Azure? Thanks!
To clarify, I'm talking about SQL as a service, not VM-hosted SQL.
The SQL Database service (database-as-a-service) is a multi-tenant database service, and your databases are triple-replicated within the data center, providing durable storage. The service itself, being large-scale, provides high availability (since there are many VMs running the service, along with replicated data). Nothing is needed in terms of mirroring or failover clusters. Having said that: if, say, your particular database became unavailable for a period of time, you'd need to consider how to handle that situation (perhaps syncing to another SQL Database, maybe even in another data center).
If you go with SQL Database (DBaaS), you'll still need to work out your backup strategy, and possibly sync with another data center (or an on-premises database server) for DR purposes.
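For example, if your server supports active geo-replication, a continuously maintained copy in another data center can be set up with a single command run in the primary server's master database (the server and database names below are placeholders):

-- Creates and continuously maintains a readable secondary in another region
ALTER DATABASE [mydb]
    ADD SECONDARY ON SERVER [partner-server]
    WITH (ALLOW_CONNECTIONS = ALL);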
More info on SQL Database fault tolerance is here.
Your desired detail is probably contained in this MSDN article on Business Continuity and Azure SQL Database (see: http://msdn.microsoft.com/en-us/library/windowsazure/hh852669.aspx). At the most basic level, Azure SQL Database keeps three replicas of your database: one primary and two secondaries.
While this helps with BCP / DR scenarios, you may also want to investigate ways to back up your database so you have point-in-time restore capabilities. More information on backup / restore can be found here: http://msdn.microsoft.com/en-us/library/windowsazure/jj650016.aspx

Replicating data in Microsoft SQL Server

I am new to SQL Server. I have one SQL Server instance and a second instance (a backup server). I want to copy data from the first server to the second, and all my transactions (create, update, delete) must be reflected on the 2nd server immediately, in real time. I have very large data in my tables, assume millions of rows. How can I achieve this? I don't have the privilege to use third-party tools.
You specify that you need data to be reflected on the second server immediately, in real time.
This requirement will add latency to every transaction that occurs on your primary server, since every action will need to be committed on the secondary server before it can be committed on your primary server.
In addition to the latency, this requirement will also likely reduce your availability. If the secondary server is no longer able to successfully commit transactions from the primary, then the primary can no longer commit transactions either, and your system is down.
For more information about these constraints, see the extensive discussion around this topic (the CAP theorem).
If you're OK with these restrictions, you might consider using Synchronous Database Mirroring (High-Safety Mode).
If you're not OK with these restrictions, please adjust / clarify your requirements.
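If you do go the mirroring route, the core of the setup is small. A minimal sketch, assuming the mirroring endpoints already exist on both instances and the database has been restored WITH NORECOVERY on the mirror (all names are placeholders):

-- On the mirror server, point at the principal:
ALTER DATABASE SalesDb SET PARTNER = 'TCP://principal.example.com:5022';

-- On the principal server, point at the mirror and require synchronous commit:
ALTER DATABASE SalesDb SET PARTNER = 'TCP://mirror.example.com:5022';
ALTER DATABASE SalesDb SET PARTNER SAFETY FULL;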
You may try to get some help from these two references:
Replication in MS SQL Server
SQL Server Replication
On a side note, an important thing to plan before setting up replication is the replication model you will use. The models to choose from:
Peer-to-peer model
Central publisher model
Central publisher with remote distributor model
Central subscriber model
Publishing subscriber model
Each of the above has its own advantages; check them against your needs. In addition, there are three types of replication:
Transactional replication
Merge replication
Snapshot replication
Check out this tutorial on SQL Server 2008 replication:
Tutorial: Replicating Data Between Continuously Connected Servers
Since you said "... reflected in the 2nd server immediately in real time", you want transactional replication. You may still choose to replicate only certain tables.
Do the "millions of rows" represent some kind of history? If so, consider the risks that Michael mentioned, and whether you need all of the history in the 2nd server or just current/recent activity. If it's just current/recent activity, it may be safer, and less of a drain on the system, to write something in T-SQL or SSIS as a job that loops, reads, and copies the data.
That could also be done with linked servers and triggers, but the risks Michael mentions, about preventing the primary server from committing transactions, are at least as much of a concern with triggers; you can avoid them with your own T-SQL/SSIS plus a job.
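As a rough sketch of what such a job step might look like, assuming a LastModified column and a linked server named backupserver (all names here are hypothetical):

-- Copy only rows changed since the last sync to the backup server
declare @last datetime;
select @last = max(LastModified) from backupserver.SalesDb.dbo.Orders;

insert into backupserver.SalesDb.dbo.Orders (OrderID, CustomerID, Amount, LastModified)
select OrderID, CustomerID, Amount, LastModified
from SalesDb.dbo.Orders
where LastModified > isnull(@last, '19000101');

Scheduled from a SQL Server Agent job every minute or so, that keeps the 2nd server near-current without ever blocking commits on the primary.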
Hope that helps...

Does it matter if time goes out of sync on merge replication clients?

I'm quite new to merge replication, but here's a scenario: I have a server and two clients with pull subscriptions. Does it matter if the time on those machines goes out of sync with each other or with the server?
When I modify some data on one of those clients, does it store the time against that change?
I am using MS SQL 2012.
I will be using a setup as described here:
http://msdn.microsoft.com/en-us/library/ms151329(v=sql.105).aspx
OK, I've found that merge replication doesn't store the time against a specific change.
If you have client subscriptions then the first person to sync will win in the case of a conflict.
If you have server subscriptions then the priority of those server subscriptions will determine who will win.
Neither of these options rely on time.
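For reference, the distinction is made when the subscription is created; here is a sketch using sp_addmergesubscription, where the publication, subscriber, and database names are placeholders (verify the argument list against Books Online for your version):

-- Client subscription: first to sync wins on conflict
EXEC sp_addmergesubscription
    @publication = N'MyPub',
    @subscriber = N'CLIENT1',
    @subscriber_db = N'FieldDb',
    @subscription_type = N'pull',
    @subscriber_type = N'local';

-- Server subscription: the explicit priority decides conflicts
EXEC sp_addmergesubscription
    @publication = N'MyPub',
    @subscriber = N'BRANCH1',
    @subscriber_db = N'BranchDb',
    @subscription_type = N'pull',
    @subscriber_type = N'global',
    @subscription_priority = 75.0;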
I currently have a 3rd-party application in production that uses merge replication. This includes a SQL 2008 R2 database server as publisher with ~100 desktop clients interacting with the database via the application, plus another 65 laptops with SQL Server Express 2008 R2 as subscribers to the publisher. The laptops synchronize via wireless when they get to their home base locations, with any deltas being pushed to the subscribers.
Now to get to your question: time on the clients, down to the millisecond, likely won't make any difference. What will make a difference is whether or not the publisher and subscriber are out of sync for more than a specific period of time. Using SQL Server Management Studio, look at the server publishing the database: expand the Replication section (between Server Objects and Management), then under Local Publications right-click the publication and select Properties.
On the General page, look at the Subscription expiration area. The default from Microsoft is 14 days and can be moved up or down as you need, or you can specify that subscriptions never expire. Note the phrase "Replication metadata is never kept longer than this amount of time."
In other words, if one of your subscribers is out of contact for more than 14 days, you will have to unsubscribe and then re-subscribe that unit. That's because of the gap between the oldest metadata on the publisher and the newest record/metadata on the subscriber: no overlap, no sync-up.
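If the 14-day default is too tight for how long your laptops stay away, the retention can be checked and extended with T-SQL at the publisher; a sketch, where 'MyMergePub' is a placeholder publication name:

-- Show current publication settings, including retention
EXEC sp_helpmergepublication @publication = N'MyMergePub';

-- Extend metadata retention to 30 days
EXEC sp_changemergepublication
    @publication = N'MyMergePub',
    @property = N'retention',
    @value = N'30';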

secure database distribution to external clients

We want to distribute / synchronize data from our data warehouse (MS SQL Server) to external customers (also MS SQL Server). The connection has to be secure, because we are dealing with sensitive data, and transmission of data from our system to the external client systems must be via HTTP/HTTPS.
In addition, it is possible that clients still run their systems with an older database schema, so tables and columns that already exist there should be transmitted and non-existent ones should be ignored.
It's most likely that we will have large database updates, and the updates have to arrive in near real time.
And it is definitely necessary that the data is stored in a client side datawarehouse / SQL database.
The whole process should also include good monitoring possibilities in case something goes wrong.
We started to develop our own .NET solution, but I thought this must be a fairly common problem: exchanging data between different systems.
Does anybody know about an existing solution which we can adapt to our scenario?
Any help is appreciated!
The problem is so common that it has a dedicated component in SQL Server: Service Broker. Rather than building your own .NET thing, let it take care of the many problems: how are you going to handle downtime? Retries? Duplicates? Out-of-order delivery? Authentication of non-domain-joined computers? Routing for machines that change names? Service upgrades? Transactional consistency and rollbacks? Are you going to use DTC? You can look at the demo I gave at SQL Connections to see how easily SSB scales to a throughput of well over 1000 msgs/sec (1 KB payload) on commodity hardware.
The only requirement is that all participants must be at least SQL Server 2005 (there is no SSB in SQL Server 2000).
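To give a feel for the moving parts, here is a minimal sketch of the objects involved; the names are illustrative, and the routes, certificates, and activation procedures that a real cross-server deployment needs are omitted:

-- Message type, contract, queues, and services
CREATE MESSAGE TYPE [//dw/RowBatch] VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT [//dw/SyncContract] ([//dw/RowBatch] SENT BY INITIATOR);
CREATE QUEUE dbo.DwSendQueue;
CREATE QUEUE dbo.DwReceiveQueue;
CREATE SERVICE [//dw/SenderService] ON QUEUE dbo.DwSendQueue ([//dw/SyncContract]);
CREATE SERVICE [//dw/ReceiverService] ON QUEUE dbo.DwReceiveQueue ([//dw/SyncContract]);

-- Sending one batch of changed rows as XML
DECLARE @h uniqueidentifier;
BEGIN DIALOG CONVERSATION @h
    FROM SERVICE [//dw/SenderService]
    TO SERVICE '//dw/ReceiverService'
    ON CONTRACT [//dw/SyncContract]
    WITH ENCRYPTION = ON;
SEND ON CONVERSATION @h
    MESSAGE TYPE [//dw/RowBatch]
    (N'<rows>...</rows>');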
Alternatively, just use regular SQL connections over a secure VPN or an SSH tunnel. That should be very easy for your networking guys to set up.
For example, you can create a linked server. Then a SQL scheduled job could move the data:
-- TRUNCATE TABLE does not accept a four-part linked-server name,
-- so run it remotely (requires 'RPC Out' enabled on the linked server):
exec ('truncate table dbname.dbo.tablename') at targetserver;

-- Then reload the target table in full:
insert into targetserver.dbname.dbo.tablename (a, b, c)
select a, b, c
from dbname.dbo.sourcetable;
Since the linked server talks to the remote server over the VPN or SSH tunnel, all data is sent encrypted over the internet.
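For completeness, a minimal sketch of creating the linked server itself and mapping a login (the provider, address, and credentials are placeholders):

EXEC sp_addlinkedserver
    @server = N'targetserver',
    @srvproduct = N'',
    @provider = N'SQLNCLI',
    @datasrc = N'10.0.0.5';  -- reachable only through the VPN / SSH tunnel

EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'targetserver',
    @useself = N'FALSE',
    @locallogin = NULL,
    @rmtuser = N'sync_user',
    @rmtpassword = N'...';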

SQL Server 2005 Replication and different indexes on the subscriber

We have a SQL Server database setup, and we are setting up a replication scenario with one publisher and one subscriber. The subscriber will be used as a reporting platform so that we can run all the BI queries we need without having to hit the server that is receiving all the data from our clients. The subscriber is set to pull data in from the distributor.
We don't have many indexes on the publisher DB, but we will need them on the reporting server (i.e. the subscriber).
My question is: a) will SQL Server allow this scenario, given that no changes on the subscriber are pushed back to the publisher? b) if a snapshot is run, I presume it will overwrite our indexes; can I stop this from happening? c) is this a wise course of action?
Thanks.
The scenario you describe is a common one, and it's one of the benefits of using replication. No changes or indexes you create on the subscriber will go to the publisher; it is a one-way process. If you have to re-run the Snapshot Agent for some reason and re-initialize the subscriber, then you will need to re-create your indexes on the subscriber. There are a lot of things you can do to minimize the need to re-initialize the subscriber, but some of them require manual steps. Generally, if you keep all of your index-creation scripts for the subscriber up to date, it isn't a big deal to re-run them when needed.
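A simple pattern is to keep every subscriber-only index in a re-runnable, guarded script, so recovering from a re-initialization is just one execution away (the index, table, and column names below are hypothetical):

-- Re-runnable: creates the reporting index only if it is missing
IF NOT EXISTS (SELECT 1 FROM sys.indexes
               WHERE name = N'IX_Orders_OrderDate'
                 AND object_id = OBJECT_ID(N'dbo.Orders'))
    CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
        ON dbo.Orders (OrderDate)
        INCLUDE (CustomerID, TotalDue);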