Active - Active DR Strategy for SQL Server 2005


We are trying to come up with an Active - Active DR strategy for our 6 TB data warehouse. The data warehouse has 40 databases, and everything has to be replicated on a real-time basis.
Site 1 : Needs to handle all the ETL
Site 2 : Will handle all the reporting queries.
Options we are considering:
Database Mirroring (we cannot afford to drop and create snapshots, as we cannot kill any connections)
Replication
Log shipping
Migrating to SQL Server 2008 is an option.
Which is the best way for performance and availability?
Regards,
Nagy

Since you can't afford to drop active connections, log shipping isn't an option either: you need exclusive access to the database to restore the log. Hardware support (SAN) will be a big help here. I'd almost like to see you ETL into one server, and then snap over, making that the active server for reporting, and use the other server for ETL. That way you have a reporting server with no ETL process and an ETL server with no reporting, but you swap which is which on a nightly? basis.

You need to talk with your hardware vendor, especially the storage one, to see if they provide some sort of hardware-based replication. Looking at the volume of the data, I don't think a software-based solution will be optimal.
Here is how I handle it for 3 databases (11, 17 and 23 TB) right now.
1) We host the databases on an EMC SAN.
2) Every 12 hours the databases are cloned onto different LUNs located on the same SAN and then mounted on different servers. This is the backup in case the primary servers get hosed. These databases are generally 12 hours behind the primary databases. We use them for reporting where we can live with 12-hour-old data.
3) Every 24 hours, the clones from step 2 are copied to a different SAN in a different building and mounted. This is the secondary backup. On these databases we run the diagnostics, DBCC checks, etc.
In total we are running 9 SQL Server Enterprise Edition instances (3 prod, 3 first-line DR and 3 second-line DR).
We decided to go this way because we could live with up to 24 hours of lag in the data.
This is certainly doable, but it will require a fair bit of planning as well as investment on your part. For us, the cost of 9 EE licenses was not much compared to the price of two SANs and the interconnect between them.

Peer-to-peer transactional replication is probably the best option for you unless you want to go down the expensive SAN hardware replication path.
It offers near real-time replication, which should be good enough for reporting.

Pretty much SQL Server Replication, or some sort of custom solution using SQL Server Service Broker, is going to be your best bet. If your tables are static and all data changes are being done at one site, then transactional replication may be your best bet. You'll need a large WAN pipe to handle the replication, as transactional consistency is maintained even if multiple threads are used.
SQL Server 2008 has some improvements to replication performance, as it allows multiple threads to the distributor, so that may help you.
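To put a rough number on "a large WAN pipe", here is a back-of-the-envelope sketch. The daily change volume, the time window and the 30% protocol overhead are all made-up assumptions for illustration; measure your own log generation rate before sizing anything.

```python
# Rough WAN sizing for replication traffic. Every input here is an
# illustrative assumption, not a figure from the thread.

def required_mbps(changed_gb_per_day: float,
                  window_hours: float = 24.0,
                  overhead: float = 1.3) -> float:
    """Sustained megabits per second needed to ship one day's changes.

    changed_gb_per_day: data changed per day, in decimal gigabytes (assumed)
    window_hours:       hours over which those changes are generated
    overhead:           multiplier for protocol/replication overhead (assumed)
    """
    bits = changed_gb_per_day * 8 * 1000**3 * overhead
    return bits / (window_hours * 3600) / 1e6


# Changes spread evenly over the whole day
print(f"{required_mbps(100):.1f} Mbps")
# Same volume produced by a 4-hour ETL window
print(f"{required_mbps(100, window_hours=4):.1f} Mbps")
```

The point of the second call is that a bursty ETL load needs far more bandwidth than the daily average suggests if the reporting site has to stay close to real time.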

Related

Azure SQL Database vs. MS SQL Server on Dedicated Machine

I'm currently running an instance of MS SQL Server 2014 (12.1.4100.1) on a dedicated machine I rent for $270/month with the following specs:
Intel Xeon E5-1660 processor (six physical 3.3 GHz cores + hyperthreading + turbo to 3.9 GHz)
64 GB registered DDR3 ECC memory
240GB Intel SSD
45000 GB of bandwidth transfer
I've been toying around with Azure SQL Database for a bit now, and have been entertaining the idea of switching over to their platform. I fired up an Azure SQL Database using their P2 Premium pricing tier on a V12 server (just to test things out), and loaded a copy of my existing database (from the dedicated machine).
I ran several sets of queries side-by-side, one against the database on the dedicated machine, and one against the P2 Azure SQL Database. The results were sort of shocking: my dedicated machine outperformed (in terms of execution time) the Azure db by a huge margin each time. Typically, the dedicated db instance would finish in under 1/2 to 1/3 of the time that it took the Azure db to execute.
Now, I understand the many benefits of the Azure platform. It's managed vs. my non-managed setup on the dedicated machine, they have point-in-time restore better than what I have, the firewall is easily configured, there's geo-replication, etc., etc. But I have a database with hundreds of tables with tens to hundreds of millions of records in each table, and sometimes need to query across multiple joins, etc., so performance in terms of execution time really matters. I just find it shocking that a ~$930/month service performs that poorly next to a $270/month dedicated machine rental. I'm still pretty new to SQL as a whole, and very new to servers/etc., but does this not add up to anyone else? Does anyone perhaps have some insight into something I'm missing here, or are those other, "managed" features of Azure SQL Database supposed to make up the difference in price?
Bottom line is I'm beginning to outgrow even my dedicated machine's capabilities, and I had really been hoping that Azure's SQL Database would be a nice, next stepping stone, but unless I'm missing something, it's not. I'm too small of a business still to go out and spend hundreds of thousands on some other platform.
Anyone have any advice on if I'm missing something, or is the performance I'm seeing in line with what you would expect? Do I have any other options that can produce better performance than the dedicated machine I'm running currently, but don't cost in the tens of thousand/month? Is there something I can do (configuration/setting) for my Azure SQL Database that would boost execution time? Again, any help is appreciated.
EDIT: Let me revise my question to maybe make it a little more clear: is what I'm seeing in terms of sheer execution time performance to be expected, where a dedicated server @ $270/month is well outperforming Microsoft's Azure SQL DB P2 tier @ $930/month? Ignore the other "perks" like managed vs. unmanaged, ignore intended use like Azure being meant for production, etc. I just need to know if I'm missing something with Azure SQL DB, or if I really am supposed to get MUCH better performance out of a single dedicated machine.

(Disclaimer: I work for Microsoft, though not on Azure or SQL Server).
"Azure SQL" isn't equivalent to "SQL Server" - and I personally wish that we did offer a kind of "hosted SQL Server" instead of Azure SQL.
On the surface the two are the same: they're both relational database systems with the power of T-SQL to query them (well, they both, under-the-hood use the same DBMS).
Azure SQL is different in that the idea is that you have two databases: a development database using a local SQL Server (ideally 2012 or later) and a production database on Azure SQL. You (should) never modify the Azure SQL database directly, and indeed you'll find that SSMS does not offer design tools (Table Designer, View Designer, etc) for Azure SQL. Instead, you design and work with your local SQL Server database and create "DACPAC" files (or special "change" XML files, which can be generated by SSDT) which then modify your Azure DB such that it copies your dev DB, a kind of "design replication" system.
Otherwise, as you noticed, Azure SQL offers built-in resiliency, backups, simplified administration, etc.
As for performance, is it possible you were missing indexes or other optimizations? You also might notice slightly higher latency with Azure SQL compared to a local SQL Server. I've seen ping times (from an Azure VM to an Azure SQL host) of around 5-10 ms, which means you should design your application to be less chatty, or to parallelise data retrieval operations, in order to reduce page load times (assuming this is a web application you're building).
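A minimal sketch of why chattiness matters at a 5-10 ms round trip. The number of calls per page and the concurrency level are hypothetical, and the model counts only network wait, not query execution time.

```python
# Toy model: network wait per page load as a function of how many
# database round trips the page makes. Numbers are illustrative only.

def page_latency_ms(round_trips: int, rtt_ms: float, concurrency: int = 1) -> float:
    """Network wait for one page, ignoring query execution time.

    round_trips: separate database calls made while rendering the page
    rtt_ms:      network round-trip time per call
    concurrency: how many calls are in flight at once
    """
    # Calls go out in waves of `concurrency`; each wave costs one round trip.
    waves = -(-round_trips // concurrency)  # ceiling division
    return waves * rtt_ms


chatty = page_latency_ms(round_trips=50, rtt_ms=10)                    # 50 sequential calls
batched = page_latency_ms(round_trips=5, rtt_ms=10)                    # same data in 5 calls
parallel = page_latency_ms(round_trips=50, rtt_ms=10, concurrency=10)  # 10 at a time

print(chatty, batched, parallel)
```

Under these assumptions, batching or parallelising cuts the network wait by a factor of ten without touching the queries themselves.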

Perf and availability aside, there are several other important factors to consider:
Total cost: your $270 rental cost is only one of many cost factors. Space, power and hvac are other physical costs. Then there's the cost of administration. Think work you have to do each patch Tuesday and when either Windows or SQL Server ships a service pack or cumulative update. Even if you don't test them before rolling out, it still takes time and effort. If you do test, then there's a second machine and duplicating the product instance and workload for test.
Security: there is a LOT written about how bad and dangerous and risky it is to store any data you care about in the cloud. Personally, I've seen way worse implementations and processes on security with local servers (even in banks and federal agencies) than I've seen with any of the major cloud providers (Microsoft, Amazon, Google). It's a lot of work getting things right then even more work keeping them right. Also, you can see and audit their security SLAs (See Azure's at http://azure.microsoft.com/en-us/support/trust-center/).
Scalability: not just raw scalability but the cost and effort to scale. Azure SQL DB recently released the huge P11 edition which has 7x the compute capacity of the P2 you tested with. Scaling up and down is not instantaneous but really easy and reasonably quick. Best part is (for me anyway), it can be bumped to some higher edition when I run large queries or reindex operations then back down again for "normal" loads. This is hard to do with a regular SQL Server on bare metal - either rent/buy a really big box that sits idle 90% of the time or take downtime to move. Slightly easier if in a VM; you can increase memory online but still need to bounce the instance to increase CPU; your Azure SQL DB stays online during scale up/down operations.

There is an alternative from Microsoft to Azure SQL DB:
“Provision a SQL Server virtual machine in Azure”
https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-provision-sql-server/
A detailed explanation of the differences between the two offerings: “Understanding Azure SQL Database and SQL Server in Azure VMs”
https://azure.microsoft.com/en-us/documentation/articles/data-management-azure-sql-database-and-sql-server-iaas/
One significant difference between your stand alone SQL Server and Azure SQL DB is that with SQL DB you are paying for high levels of availability, which is achieved by running multiple instances on different machines. This would be like renting 4 of your dedicated machines and running them in an AlwaysOn Availability Group, which would change both your cost and performance. However, as you never mentioned availability, I'm guessing this isn't a concern in your scenario. SQL Server in a VM may better match your needs.

SQL DB has built-in availability (which can impact performance), point-in-time restore capability and DR features. You have the option to scale your DB up or down based on your usage to reduce the cost. You can improve your query performance using Global query (shard data). SQL DB manages automatic upgrades and patching, and greatly improves the manageability story. You may need to pay a little premium for that. Application-level caching, evenly distributing the load, downgrading when cold, etc. may help improve your database performance and optimize the cost.

SQL Server Architecture on Production Environment

I want to understand the best approach for SQL Server architecture in a production environment.
Here is my problem:
I have a database which has, on average, around 20,000 records being inserted every second in various tables.
We also have reports implemented on the same database. What's happening now is that whenever a user runs a report, the performance of the other applications drops.
We have implemented
Table Partitioning
Indexing
And all other required things.
My question is: can anyone suggest an architecture that has different SQL Server databases for reports and for the application, and keeps them in sync online whenever new data is entered in the master SQL Server?
Somewhat like a master and slave architecture. I understand master and slave architecture in general, but I need to get a better idea of how to apply it here.
Our main tables have around 40 million rows (table partitioning done).

In SQL Server 2008 R2 you have database mirroring and replication available, which will keep two databases in sync.
A schema which is efficient for OLTP is unlikely to be efficient for large-volume reporting. The 'live' and 'reporting' databases should have different schemas, with an ETL process moving data from one to the other. I would negotiate with the business just how synchronised the reporting database needs to be. If the reports are processing large amounts of data they will take some time to run, so I would suggest a lag in data replication will not be noticed. In extremis you could construct a solution using Service Broker to move the data, with processing on the reporting server to distribute it amongst the reporting tables.
The numbers you quote (20,000 inserts per second, 40 million rows in the largest table) suggest a record doesn't reside in the DB for long. You would have a significant load performing DELETEs. Moving these out of peak hours could be sufficient to solve your problems.
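The arithmetic behind that inference, as a quick sketch. It assumes, as a worst case, that all 20,000 inserts per second land in the largest table; the question says they are spread over "various tables", so the real retention per table would be longer.

```python
# How long can a row live if the table stays at a steady size?
# Assumption: every insert goes to the one table being measured.

def implied_retention_seconds(rows_in_table: int, inserts_per_second: int) -> float:
    """Steady-state row lifetime = table size / insert rate."""
    return rows_in_table / inserts_per_second


seconds = implied_retention_seconds(40_000_000, 20_000)
print(f"{seconds:.0f} s (~{seconds / 60:.0f} min)")
```

Even if only a tenth of the inserts hit the largest table, rows would turn over within hours, which is why the DELETE load is worth looking at.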

Is database replication the way to go to keep production and development databases in sync?

I am not a DBA; however, my small company is using SQL Server for a project that we are working on. On the same SQL Server instance there is a MS Great Plains (Dynamics GP) database - as we pass data back and forth between the two databases (mainly a scribe process getting our data and transferring it into GP).
We are using database replication (snapshot) as a means of syncing our production and development (and soon DR) environments. Right now it's set to replicate every three hours during core business hours, mainly to keep production and development up to date for us while we are working.
1) Is this the correct way of doing such a thing? Is there a better way?
2) Does this stress the server or the SQL Server? Is this a possible cause of GP database issues because they are on the same server and instance?
3) Replication only occurs on the non GP database - this shouldn't affect the GP database at all right?
Our database should stay rather small. It is my understanding that tables get locked while the snapshot replication is running. Do the tables stay locked until the entire replication is done, or is each table released once it has been processed, while the rest of the process continues?

There are many ways to sync a SQL Server with another. There is replication which you are currently using, log shipping, backup/restore, mirroring, and Always On to name a few methods.
The "best" method depends on your requirements. If you're concerned about disaster recovery, snapshot replication is not a great option and I would look into AlwaysOn Availability Groups.
If load on your production system is a concern I would look into nightly restoring a backup of the production system.
To answer your specific questions:
1) Is this the correct way of doing such a thing? Is there a better way?
This depends on your exact requirements.
2) Does this stress the server or the SQL Server?
Doing something is always more work than doing nothing. Depending on many factors this could affect your production server.
3) Replication only occurs on the non GP database - this shouldn't affect the GP database at all right?
Your server only has a finite amount of hardware resources, so it could affect the performance of queries against the GP database.

We have found that having replication in place also adds complexity when it comes to upgrades and schema changes. If you must have dev and prod in sync (and I would argue about that) Always On or log shipping would be my preferred techniques.
DR is a separate issue. You have to determine your Recovery Point Objective (RPO) and Recovery Time Objective (RTO) and adopt the appropriate technology to satisfy your requirements.
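As a minimal sketch of what checking a technology against an RPO looks like: the three-hour snapshot schedule comes from the question, while the one-hour and four-hour RPO values are hypothetical requirements invented for illustration.

```python
from datetime import timedelta

# Worst-case data loss for any scheduled copy mechanism (snapshot
# replication, log shipping, nightly restore): a failure just before the
# next sync loses everything since the previous one.

def worst_case_data_loss(sync_interval: timedelta,
                         transfer_time: timedelta = timedelta(0)) -> timedelta:
    """Age of the oldest change that would be lost in the worst case."""
    return sync_interval + transfer_time


def meets_rpo(sync_interval: timedelta, rpo: timedelta,
              transfer_time: timedelta = timedelta(0)) -> bool:
    """True if the schedule can never lose more data than the RPO allows."""
    return worst_case_data_loss(sync_interval, transfer_time) <= rpo


snapshot_every = timedelta(hours=3)   # schedule described in the question
assumed_rpo = timedelta(hours=1)      # hypothetical business requirement

print(meets_rpo(snapshot_every, assumed_rpo))
```

With these numbers the three-hour snapshot schedule fails a one-hour RPO but would satisfy a four-hour one, which is the kind of comparison to run for each candidate technology.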

SQL Server 2012 Database Transaction Replication performance issue

We have configured SQL Server 2012 transactional replication for our client's .NET web application, to split the transactional and reporting workloads across different SQL Servers.
SQL-Node1 acts as the master DB server. We configured replication of the master database to SQL-Node2, which our web application uses to pull reports. The application handles a lot of transactions, plus data uploads from Excel sheets of around 10 million entries each day.
A few weeks after configuring replication between the two SQL Server 2012 instances, we started facing performance issues. We found that some resources get locked while files are being uploaded to the database, so the application is unable to access those tables and data. We also found that the server is very slow during the daytime, when users are accessing our web application.
Now we are looking to distribute the load across 3 SQL Server 2012 nodes: the web application will access and transact data on SQL-Node1, reporting queries will pull data from SQL-Node2, and SQL-Node3 will be used to upload Excel sheet data to the database, which will be replicated to all other SQL nodes.
Current setup: all servers run Windows Server 2008 Standard and SQL Server 2012 Enterprise Edition.
Database size: approx. 15 GB / Replication used: Transactional / Distributor role configured on SQL-Node1 / Subscriber role configured on SQL-Node2.
We are looking for a solution to the above issues that can distribute the different loads (reporting, data uploading, transactions) and replicate data between all SQL nodes.
Which feature will perform best for the above scenario: SQL Server 2012 HA, SQL Server Replication or SQL Server Mirroring?
A quick response will be highly appreciated.

Because you have changes happening at more than one node (transactional data at node 1, excel uploads at node 3), "none of the above". All of the abovementioned technologies are built on having data changes happen in one location and propagating to others. You could look at peer to peer replication, but it seems like overkill.
If it were me, I'd try to diagnose why your file upload process is killing performance and fix/work around that. Once you do that, I'd move that process back to node 1 and implement an availability group to cover your reporting needs (with the added bonus of HA).

All of the technologies would bog down on a large data import that is done in one big transaction. I suggest doing it as an ETL-like process: import into a staging table, then migrate the data into the production table in bite-sized chunks (test a range of chunk sizes to find the one that works best for your environment). For the workloads you are talking about, 2 servers with replication, on a cluster for HA, should be fine.
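The staging-and-chunks idea can be sketched as follows. This uses Python's built-in sqlite3 purely as a runnable stand-in for SQL Server, and the table names, column names and chunk size are hypothetical. The point is the shape of the loop: each chunk is its own short transaction, so locks are held briefly and replication ships many small transactions instead of one huge one.

```python
import sqlite3


def migrate_in_chunks(conn: sqlite3.Connection, chunk_size: int) -> int:
    """Move rows from `staging` to `production` one small transaction at a time.

    Returns the number of chunks committed. Table and column names are
    hypothetical placeholders.
    """
    chunks = 0
    while True:
        # Pick the next chunk by key so each transaction touches a bounded range.
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM staging ORDER BY id LIMIT ?", (chunk_size,))]
        if not ids:
            return chunks
        low, high = ids[0], ids[-1]
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO production (id, payload) "
                "SELECT id, payload FROM staging WHERE id BETWEEN ? AND ?",
                (low, high))
            conn.execute("DELETE FROM staging WHERE id BETWEEN ? AND ?",
                         (low, high))
        chunks += 1


# Demo: bulk-load 10 rows into staging, then migrate them 4 at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE production (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 11)])
conn.commit()

print(migrate_in_chunks(conn, chunk_size=4))
```

In a real deployment the chunk size would be tuned by experiment, as the answer suggests, and the loop could pause between chunks to let other workloads through.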

SQL Server replication over WAN

I'm looking at developing a simple ecommerce platform and need to replicate product and customer data to the web host over the internet so the website can run disconnected. The two options I can think of at present are enterprise messaging and database replication.
I'm leaning towards database replication, as enterprise messaging would require additional developer resource to write all the plumbing code. Has anyone had any success using SQL Server one-way replication over unreliable WAN links through the internet?

I'm sorry I missed this... NitroAccelerator from Nitrosphere.com is built exactly to speed up replication over the internet. It compresses the TDS packets very efficiently and results in 80-90% improvement in replication times.

At the last company I worked for, we had full merge replication for some of our customers.
There were 2 scenarios:
Merge replication for handheld devices
Some of our customers had PDAs that subscribed to some published tables of our main database. They were disconnected for long periods, and merge replication worked fine, applying changes on both sides when the connection was restored.
Full site-to-site merge replication
This was used for customers that had remote offices but required a fully synchronized local database for performance reasons. In most cases the VPN was extremely poor, and we did have some instances of the VPN being down for a week; on restoration, replication synchronized both databases without an issue.
In both cases replication proved to be very fault tolerant and performed very well.
In your case it's one-way replication, so there should be no merge conflicts to deal with, which makes the situation easier.
There is a learning curve with replication, but as a technology I found it works very well, even over poor connections.
Liam
