Daily Data subset of main database - sql

I have a large database (500 GB). One of our customers wants a daily snapshot of only his data. He only has a 3 Mb connection, and I suspect that is the maximum. What is the most effective method I could use?
1. Views that are updated, but it wants the underlying tables.
2. Replication; I don't know much about this.
3. An alternative method.

Merge replication allows you to initialize a subscriber without using a snapshot. You will have to initialize replication on the subscriber from a backup. You can possibly do the same with transactional replication, but it just never worked quite the same for me. YMMV. When it breaks (etc.) you will have to be prepared to ship a new backup and start over (hacking replication sometimes works, but don't count on it). Changing database structures is also a pain once in replication. I have seen ~500 GB with less than 3 Mbit work in production, though not without proper planning and preparation (and grey hair).
I have used transactional replication with a read-only subscription, where I invoked the distribution with a batch file on a task schedule once a day. Transactional replication did not feel as maintained (from MS) or as stable as merge replication, though the data integrity with transactional was more consistent.
I have not tried transaction log shipping, but that might also be an option.
(p.s. notice I didn't say "If it breaks")
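To see why initializing from a shipped backup matters on a link like this, here is a rough back-of-the-envelope sketch. It assumes "3 Mb" means 3 megabits per second, that the link is fully available, and it ignores protocol overhead and compression; the function names are made up for illustration.

```python
# Rough transfer-time arithmetic. Assumptions (not from the thread):
# "3 Mb" = 3 megabits per second, link fully available, no overhead.

def transfer_days(size_gb: float, link_mbit_per_s: float) -> float:
    """Days needed to push size_gb (decimal gigabytes) over the link."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbit_per_s * 1e6)
    return seconds / 86_400


def daily_capacity_gb(link_mbit_per_s: float) -> float:
    """Upper bound on gigabytes that fit through the link in 24 hours."""
    bytes_per_day = link_mbit_per_s * 1e6 / 8 * 86_400
    return bytes_per_day / 1e9


if __name__ == "__main__":
    print(f"Full 500 GB copy: {transfer_days(500, 3):.1f} days")
    print(f"Daily ceiling:    {daily_capacity_gb(3):.1f} GB")
```

Under those assumptions a full 500 GB copy takes roughly 15 days, and the daily changes for the customer's subset have to stay well under about 32 GB, so only incremental changes are realistic over the wire.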

Related

Strategy for keeping separate Databases in Sync

I have a NoSQL database that we are using for data processing, as it can be used for my application faster than SQL can. I'm treating our NoSQL database almost like a cache of information, with the SQL being the authority of data, and the NoSQL store being updated with changes. Right now this is being done through our application, so when a request comes in for a change, it is made in the SQL database, and the NoSQL database. This is failing at times as sometimes the NoSQL update fails, or other situations cause the NoSQL database to get out of sync.
I could do a batch update every X minutes, however it is a lot of information in the data stores, and it would take hours to ensure that they are in sync. We have some timestamps to do a difference of what has been changed, but this is not always accurate.
I'm wondering what the recommended strategies are for keeping a data store (a secondary database cache) in sync with my main store?
I've done this with messaging in the past, specifically JMS with ActiveMQ. I would send the updates to a NoSQL store (Mongo) by using a queue. This way messages could accumulate in the queue and if the connection to the NoSQL store ever got severed, it could pick up where it left off.
It worked really well because ActiveMQ was really stable and simple to work with.
I've always seen this done with diffs like you mentioned. You introduce date fields all over and then keep track of the latest sync. The nice thing about this approach is that it easily allows you to replay transactions by modifying the last sync date.
One last piece of advice ... write good tools around pumping data from point A to point B (in this case SQL to NoSQL). I wrote several tools to bulk load the NoSQL store from SQL at my last job and it made life easy if anything got really out of sync. Between scripts and bulk loading processes, I could always recover.
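The "date fields plus last sync date" approach can be sketched roughly as below. This is only an illustration under assumptions: `sqlite3` stands in for the SQL database, a plain dict stands in for the NoSQL store, and the `items` table, its columns, and the function names are hypothetical.

```python
import sqlite3


def sync_changes(conn, cache, last_sync):
    """Copy rows modified after last_sync into cache; return the new watermark."""
    rows = conn.execute(
        "SELECT id, payload, modified_at FROM items "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_sync,),
    ).fetchall()
    for item_id, payload, modified_at in rows:
        cache[item_id] = payload   # upsert, so replaying a window is harmless
        last_sync = modified_at    # advance the watermark as rows are applied
    return last_sync


def demo():
    # Stand-in for the authoritative SQL store (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT, modified_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [(1, "a", "2024-01-01T00:00:00"), (2, "b", "2024-01-02T00:00:00")],
    )
    cache = {}  # stand-in for the NoSQL store
    watermark = sync_changes(conn, cache, "")
    conn.execute(
        "UPDATE items SET payload = 'a2', modified_at = '2024-01-03T00:00:00' "
        "WHERE id = 1"
    )
    watermark = sync_changes(conn, cache, watermark)
    return cache, watermark
```

Because the writes are upserts, setting the watermark back to an earlier value simply replays that window. A strict `>` comparison can miss rows that share the watermark's timestamp, which may be one source of the inaccuracy mentioned in the question.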

SQL Server 2005 mirrored database transaction log file maintenance

Ok so for standard, non-mirrored databases, the transaction log is kept in check either simply by having the database in simple mode or by doing regular backups. We keep ours in simple as we have SAN snapshot backups taking place and there is no need for SQL backups.
We're now going to mirroring. I obviously no longer have the choice of simple mode and must use full. This obviously leads to large log files and the need for log backups. That's fine, I can deal with that: a maintenance plan that takes a log backup and discards any previous ones. I realise that this backup is essentially useless without its predecessors, but the SAN snapshots are doing the backups.
My question is...
a) Is there a way to truncate the log file of all processed rows without creating a backup? (as I can't use them anyway...)
b) A maintenance plan is local to a server and is not replicated across a mirrored pair. How should it be done on a mirrored setup, such that when the database fails over, the plan starts running on the new principal, but doesn't get upset when it's a mirror?
Thanks
A. If your server is important enough to mirror it, why isn't it important enough to take transaction log backups? SAN snapshots each capture just one point in time; they don't give you the ability to stop at different points along the way. When your developers truncate a table, you want to replay all of the logs right up until that statement, and stop there. That's what transaction log backups are good for.
B. Set up a maintenance plan (or even better, T-SQL scripts like Ola Hallengren's at http://ola.hallengren.com) to back up all of the databases, but check the boxes to only back up the online ones. (Off the top of my head, not sure if that's an option in 2005 - might be 2008 only.) That way, you'll always get whatever ones happen to fail over.
Of course, keep in mind that you need to be careful with things like cleanup scripts and copying those backup files. If you have half of your t-log backups on one share and half on the other, it's tougher to restore.
a) No, you cannot truncate a log that is part of a mirrored database. Backing the logs up is your best option. I have several databases that are set up with mirroring simply based on the HA needs, but DR is not required for various reasons. That seems to be your situation? I would really still recommend keeping the log backups for a period of time. No reason to kill a perfectly good recovery plan that is added by your HA strategy. :)
b) My own solution for this is to have a secondary agent job that monitors the status of the mirror. If the mirror role is found to change, the job on the mirror instance is enabled and, if possible, the job on the old principal is disabled. If the principal was down and it comes back up, its job stays disabled. The only way the jobs would be switched back is in the event of another forced failover.
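The decision that monitoring job makes can be sketched as a small function. This is only an illustration: the function names are hypothetical, and the role strings are assumed to look like the `mirroring_role_desc` values in `sys.database_mirroring` (`PRINCIPAL` / `MIRROR`); actually enabling or disabling the agent job is left out.

```python
# Hypothetical helpers: decide where the log-backup job should be enabled,
# given each instance's current mirroring role for the database.

def backup_job_should_be_enabled(role: str) -> bool:
    """Enable the job only where the database is currently the principal."""
    return role.strip().upper() == "PRINCIPAL"


def plan_job_states(roles_by_instance: dict) -> dict:
    """Map each instance name to the desired enabled (True) / disabled (False) state."""
    return {
        name: backup_job_should_be_enabled(role)
        for name, role in roles_by_instance.items()
    }
```

For example, after a failover from a server called A to a server called B, `plan_job_states({"A": "MIRROR", "B": "PRINCIPAL"})` leaves the job enabled only on B, and A stays disabled when it comes back up as the mirror.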

Creating tables in SQL Server 2005 master DB

I am adding a monitoring script to check the size of my DB files so I can deliver a weekly report which shows each file's size and how much it grew over the last week. In order to get the growth, I was simply going to log a record into a table each week with each DB's size, then compare to the previous week's results. The only trick is where to keep that table. What are the trade-offs in using the master DB instead of just creating a new DB to hold these logs? (I'm assuming there will be other monitors we will add in the future.)
The main reason is that master is not calibrated for additional load: it is not installed on an IO system with proper capacity planning, it is hard to move to a new IO location, its maintenance plan takes backups and log backups only as frequently as needed for a very low volume of activity, and its initial size and growth rate are planned as if no changes are expected. Another reason against it is that in many troubleshooting scenarios you would want a copy of the database to inspect, but you'd have to attach a new master to your instance. These are the main reasons why adding objects to master is discouraged. Also, many admins understandably prefer an application to use its own database so it can be properly accounted for, and ultimately easily uninstalled.
Similar problems exist for msdb, but if push comes to shove it would be better to store app data in msdb rather than master, since the former is an ordinary database (despite the widespread belief that it is a system database, it actually is not).
The Master DB is a system database that belongs to SQL Server. It should not be used for any other purposes. Create your own DB to hold your logs.
I would refrain from putting anything in master; it could be overwritten or recreated on an upgrade.
I have put a DBA-only ServerInfo database on each server for uses like this, as well as any application-specific environmental things (things that differ between prod and test and dev).
You should add a separate database for the logging. There is no guarantee that the next SQL Server patch will not break things if you leave your objects in master.
Microsoft itself also advises against it.
http://msdn.microsoft.com/en-us/library/ms187837.aspx
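The weekly size log described in the question could be sketched roughly like this, kept in its own database rather than master. Everything here is an assumption for illustration: `sqlite3` stands in for the dedicated logging database, and the `file_size_log` table, its columns, the week labels, and the file names are made up; collecting the real file sizes from SQL Server is left out.

```python
import sqlite3


def record_sizes(conn, week, sizes_mb):
    """Log one row per database file for the given week label."""
    conn.executemany(
        "INSERT INTO file_size_log (week, file_name, size_mb) VALUES (?, ?, ?)",
        [(week, name, size) for name, size in sizes_mb.items()],
    )


def weekly_growth(conn, week, previous_week):
    """Return (file_name, size_mb, growth_mb); growth is None for new files."""
    return conn.execute(
        "SELECT cur.file_name, cur.size_mb, cur.size_mb - prev.size_mb "
        "FROM file_size_log AS cur "
        "LEFT JOIN file_size_log AS prev "
        "  ON prev.file_name = cur.file_name AND prev.week = ? "
        "WHERE cur.week = ? ORDER BY cur.file_name",
        (previous_week, week),
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")  # stand-in for a dedicated logging DB
    conn.execute(
        "CREATE TABLE file_size_log (week TEXT, file_name TEXT, size_mb REAL)"
    )
    record_sizes(conn, "2024-W01", {"app.mdf": 1000.0, "app.ldf": 200.0})
    record_sizes(
        conn, "2024-W02", {"app.mdf": 1150.0, "app.ldf": 200.0, "new.ndf": 50.0}
    )
    return weekly_growth(conn, "2024-W02", "2024-W01")
```

The left join keeps files that first appear this week, reporting their growth as unknown instead of dropping them from the report.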

Using SQL Server Replication

We are using replication and seem to be having endless problems with it. It seems to shut down for unknown reasons. It needs to be shut down to remove a column and only starts back up half the time. Does anyone have any advice on how to properly use replication, or some alternatives to it?
Edit:
We are using SQL Server 2005. We cannot use database mirroring as we use the other database for reporting. As far as I am aware you cannot query a mirrored database.
If you need just a couple of tables from your DB for reports, replication is more useful, but you can also set up log shipping with the secondary server in STANDBY mode (especially if you need a significant part of your data for reports); then you can run reports on the secondary server. You just have to remember that log shipping will interfere with transaction log backups, so you have to use the same folder of log backup files for both processes.
I would think the combination of database mirroring and database snapshots will solve your issues.
First, database mirroring is very easy to set up and I have never had any problems with it (using it for the past 4+ years).
Second, creating a database snapshot on your failover server will allow you to run reports. You can set up a SQL Agent job to drop and re-create the snapshot at whatever interval is acceptable to you.
Of course, this all depends on whether you need your reports to run on real-time data or whether they can be delayed somewhat.
Here is a list of the problems that I have had to resolve to get replication working:
1) The replication sometimes lies to me and tells me this, even when it's working fine.
"The server 'Bob' is not a Subscriber. (.Net SqlClient Data Provider)" I have tried to re-initialise it thinking that it was broken and it never was...
2) It can take a little while to restart itself, especially if your remote DB is on the other side of the planet, which it is in my case. If you are on a slow network connection, or it is not 100% reliable, then you can have problems. Also, the jobs which restart the process can sometimes take a while to run, which also delays things further.
3) Some changes require full re-initialisation, which involves sending a new snapshot out. If you don't have your permissions quite right, so that you can re-initialise manually but it doesn't happen automatically, then this can be another reason for problems.
We have a SQL transactional replication which runs perfectly happily. You seem to say that it is when you are making schema changes to the publisher that you get problems. Each time we do a schema change we drop the publication, subscription and the subscription database, do the change, then re-build it all. We can do this because we can tolerate the time it takes to re-apply the snapshot. There are ways to apply schema changes to the publication and have them propagate to the subscriber. Take a look at sp_register_custom_scripting. We have made this work once, so I can give some more information about it if you need.
As @Jason says, you can report from a mirrored database by using a snapshot. Beware that the snapshot will take up space and cause more work for the mirror server, although how much space will depend on how much data is changing and how big your original database is. We do use a snapshot on a mirrored database for occasional reports because our entire database is not replicated.
log shipping http://msdn.microsoft.com/en-us/library/ms187103.aspx
What version of SQL Server are you using?
We're using replication now for a particular solution, and it seems to just work, day in, day out.
I would examine your event logs and SQL Server logs to see if you can determine why it is shutting down, and why it doesn't start up.
Are you possibly patching the servers, or are you having network errors?
The alternatives to replication are log shipping, or database mirroring.
I personally prefer Database Mirroring, but it really depends what you're trying to do, as some of these aren't appropriate for certain situations.
We also have used SQL transactional replication. We had the same pains with updating schema, which requires dropping the publication on all servers, performing the updates, and then reinitializing replication, and hoping for the best. Sometimes it would not initialize, or a node would fall behind and we'd get little warning for it. A few times we even lost all the stored procedure execute permissions causing pretty much total failure on the websites.
We have a rather large database so reinitialization could take quite some time, meaning all updates had to be done at 2am on Sunday - not exactly when we're awake and alert and able to use all our faculties to deal with a problem that might arise.
We are ditching replication in favor of failover clustering on SQL 2008, but it can still be done all the way back to SQL 2000.
http://technet.microsoft.com/en-us/library/cc917693.aspx

SQL 2005 Transactional Replication: Behavior during snapshot processing?

So, I've got SQL (2005) Transactional Replication generally working well with a single publisher and single (read-only) subscriber. Data changes and updates flow perfectly, with about 5 second latency, which is just fine.
My one nagging problem, that I've spent a couple days trying to solve (and Googling everywhere for answers) is that new sprocs/tables/etc. do not get propagated to the read-only subscriber, even though I've added them as "articles" to the "publication". The publication has "transmit schema changes" set to ON, and stored procedures are set to transfer their definitions. But, for some reason, they don't.
My "snapshot agent" process is set to NOT SCHEDULED. (In other words, it only happens once, when I initiate it manually.) Should I be putting this on a schedule to enable the transfer of new or modified tables and sprocs?
I thought the mere act of adding the object as an article to the publication would do it, but it's still not sending it unless I do a snapshot. The WAN connecting these is totally fast and reliable, so that's not the issue, and table-data-updates transfer relatively fast and flawlessly.
While I could put my snapshot agent on a schedule, does this have any real-time production impacts for users of the main publication database or the read-only copy? (My site currently gets 4+ million unique-users a month, so I'd like to have minimal disruption...) Thanks!
Transactional replication only distributes (and then subsequently publishes) the DML (Data Manipulation Language) statements from the transaction log of the source (publication) database.
New tables and stored procedures are not replicated to the subscriber. Schema changes in this particular context, although I have to admit it is a little unclear in some of the Books Online documentation, refer to the existing schema, i.e. if you were to add a column to an existing published table, this change would be propagated to the subscribers.
For clarification here is a Microsoft article that details the schema changes that you can make.
http://msdn.microsoft.com/en-us/library/ms151870(SQL.90).aspx
I hope this helps. Replication is a big subject area so please let me know if I can be of further assistance.
Oh yes, you are correct, if you add new articles to your publication you will need to create an updated snapshot.
Cheers,