Deleting records from backups sounds counter-intuitive, since the whole point of a backup is to serve in a disaster. In our case, however, data deletion is a valid use case.
Requirement: in brief, we need a system capable of deleting a specific record from an active database instance and from all of its backups.
We already have a fully functional internal system that handles the first part of the requirement, deleting data from the active database. What we don't know is how to do the same against all of the database backups.
Question:
Is it possible to find a specific record inside a backup?
Is there any predefined schema or data-allocation layout within a SQL Server backup file that would allow us to isolate a specific record?
Can you share any thoughts or experience you have with this style of deletion?
Note: we take two full backups daily and keep a week's worth (14 in total) at any point in time.
I do understand the business concept of "deleted everywhere".
I do not know of any way to do this. I do not believe the format of the backup is even published. That doesn't mean that someone hasn't hacked it, but it certainly isn't a broadly known capability.
I think that, in order to do this, you will need to securely wipe all copies of backups and take new backups. You then lose the point in time recovery capability.
Solution: The way that I would address this business requirement is to recover each backup, delete the desired record(s), secure wipe the backup media (or destroy the old media and use new media), and then take a new backup of THAT recovered version. That will give you a point in time recovery of that data without the specific record(s).
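That recover-and-re-backup cycle could be sketched in T-SQL roughly as follows. All database, table, and path names here are hypothetical placeholders; adjust them to the environment, and remember the secure wipe of the old media happens outside SQL Server:

```sql
-- Restore the old backup under a temporary name (names/paths are hypothetical).
RESTORE DATABASE SalesDb_Scrub
    FROM DISK = N'D:\Backups\SalesDb_20240101.bak'
    WITH MOVE 'SalesDb_Data' TO N'D:\Scrub\SalesDb_Scrub.mdf',
         MOVE 'SalesDb_Log'  TO N'D:\Scrub\SalesDb_Scrub.ldf',
         REPLACE;

-- Remove the record(s) that must be deleted everywhere.
DELETE FROM SalesDb_Scrub.dbo.Customers WHERE CustomerId = 42;

-- Take a fresh backup of the scrubbed copy; then securely wipe the old media.
BACKUP DATABASE SalesDb_Scrub
    TO DISK = N'D:\Backups\SalesDb_20240101_scrubbed.bak'
    WITH CHECKSUM, INIT;
```

Repeated for each of the 14 retained backups, this preserves a usable restore point for each backup date, minus the deleted record(s).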
You can't modify the contents of a .bak file. You shouldn't want to do that either. If you want to restore to a specific point in time you should use the Full recovery model and take differential and log backups instead of just full backups.
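If point-in-time restore is the goal, a schedule along these lines is the usual approach (database name and paths are illustrative, not from the original post):

```sql
-- Switch to the FULL recovery model so the log chain is preserved.
ALTER DATABASE SalesDb SET RECOVERY FULL;

-- Periodic full, differential, and log backups (hypothetical paths).
BACKUP DATABASE SalesDb TO DISK = N'D:\Backups\SalesDb_full.bak' WITH INIT;
BACKUP DATABASE SalesDb TO DISK = N'D:\Backups\SalesDb_diff.bak' WITH DIFFERENTIAL;
BACKUP LOG      SalesDb TO DISK = N'D:\Backups\SalesDb_log.trn';

-- A later restore can then stop at an exact moment, e.g.:
-- RESTORE LOG SalesDb FROM DISK = N'D:\Backups\SalesDb_log.trn'
--     WITH STOPAT = '2024-01-01T12:00:00', RECOVERY;
```

Log backups are what make STOPAT possible; full backups alone only give you the discrete points at which they were taken.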
Related
I'm using SpecFlow with Selenium for doing UI testing on my ASP.NET MVC websites. I want to be able to restore the database (SQL Server 2012) to its pre-test condition after I have finished my suite of tests, and I want to do it as quickly as possible. I can do a full backup and restore with replace (or with STOPAT), but that takes well over a minute, when the differential backup itself took only a few seconds. I want to basically set a restore point and then revert to it as quickly as possible, dropping any changes made since the backup. It seems to me that this should be able to be done very quickly, without needing to overwrite the whole database. Is this possible, and if so, how?
Not with a differential backup. A differential backup is an image of all of the data pages that have changed since the last full backup. In order to restore a differential backup, you must first restore its base (i.e. full) backup and then the differential.
What you're asking for is some process that keeps track of the changes since the backup. That's where a database snapshot will shine. It's a copy-on-write technology which is to say that when you make a data modification, it writes the pre-change state of the data page to the snapshot file before writing the change itself. So reverting is quick as it need only pull back those changed pages from the snapshot. Check out the "creating a database snapshot" example in the CREATE DATABASE documentation.
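A minimal sketch of that snapshot-and-revert cycle (the database, logical file, and path names are assumptions; the NAME value must match the source database's logical data-file name, and any other snapshots of the database must be dropped before reverting):

```sql
-- Create the restore point before the test suite runs.
CREATE DATABASE TestDb_Snapshot ON
    (NAME = TestDb_Data, FILENAME = N'C:\Snapshots\TestDb_Data.ss')
AS SNAPSHOT OF TestDb;

-- ...run the tests, then revert all changes made since the snapshot:
ALTER DATABASE TestDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
RESTORE DATABASE TestDb FROM DATABASE_SNAPSHOT = 'TestDb_Snapshot';
ALTER DATABASE TestDb SET MULTI_USER;
DROP DATABASE TestDb_Snapshot;
```

Because only the changed pages are copied back, the revert typically takes seconds rather than the minutes a full restore would.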
Keep in mind that this isn't really a good guard against failure (which is one of the reasons to take a backup). But for your described use case, it sounds like a good fit.
What are the consequences of a full DB Backup on the LSN?
Will it break the log shipping that is already configured and running when a full DB backup completes (i.e., does it change the LSN chain)?
Also, what is the best way to implement database resilience?
Just have log shipping configured?
Have log shipping configured (15-minute interval) with weekly full backups?
Have an incremental backup every 15 minutes and then a weekly full backup?
Are there any great tools that would simplify the above process?
Full backups have no effect on the log chain except for the first full backup taken after database creation or after a change to the FULL recovery model. It will not break log shipping. What could break log shipping is if you took a manual log backup and did not apply that to your secondary database.
The best recovery strategy for your database depends completely upon your situation. How much data loss is acceptable? How long of an outage can you reasonably allow for a restore to occur? It's not possible for someone on an online forum to provide an answer that you should rely on for your recovery strategy. You need to research backup and recovery very carefully, understand it inside and out, then apply the strategy that is best for your particular situation. There are many tools out there for backup from companies such as Red-Gate, Idera, Dell, etc.
That being said, just having log shipping is absolutely not good enough.
Also, @Anup: COPY_ONLY only prevents the differential bitmap from being reset. It does not impact log backups.
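For an out-of-band full backup that leaves the differential base (and the log-shipping chain) untouched, COPY_ONLY is the tool; the database name and path below are hypothetical:

```sql
-- Ad-hoc full backup that does not reset the differential bitmap.
BACKUP DATABASE SalesDb
    TO DISK = N'D:\Backups\SalesDb_adhoc.bak'
    WITH COPY_ONLY, CHECKSUM;
```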
LiteSpeed works great for your full backups; I would run a full backup daily, NOT weekly. Log shipping, with or without LiteSpeed, every 15 minutes is very typical. If you wait too long, the data on your secondary falls too far out of sync with your primary and you end up with errors.
I have always loved Red Gate products, but I have not used their log shipping / backup tools, only their data compare and schema compare tools.
Quest makes LiteSpeed, and Quest is, I believe, owned by Dell.
Currently we are stuck having to monitor the log shipping, and it is not foolproof. I would prefer mirroring or replication.
OK, so for standard, non-mirrored databases, the transaction log is kept in check either simply by having the database in the SIMPLE recovery model or by taking regular log backups. We keep ours in SIMPLE because SAN snapshot backups are taking place and there is no need for SQL backups.
We're now moving to mirroring. I obviously no longer have the choice of the SIMPLE model and must use FULL. This obviously leads to large log files and the need for log backups. That's fine, I can deal with that: a maintenance plan that takes a log backup and discards any previous ones. I realise that such a backup is essentially useless without its predecessors, but the SAN snapshots are doing the backups.
My question is...
a) Is there a way to truncate the log file of all processed records without creating a backup (as I can't use the backups anyway)?
b) A maintenance plan is local to a server and is not replicated across a mirrored pair. How should it be done in a mirrored setup, such that when the database fails over, the plan starts running on the new principal, but doesn't get upset when it's a mirror?
Thanks
A. If your server is important enough to mirror, why isn't it important enough to take transaction log backups? SAN snapshots are images of a single point in time; they don't give you the ability to stop at different points of time along the way. When your developers truncate a table, you want to replay all of the logs right up until that statement, and stop there. That's what transaction log backups are good for.
B. Set up a maintenance plan (or even better, T-SQL scripts like Ola Hallengren's at http://ola.hallengren.com) to back up all of the databases, but check the boxes to only back up the online ones. (Off the top of my head, not sure if that's an option in 2005 - might be 2008 only.) That way, you'll always get whatever ones happen to fail over.
Of course, keep in mind that you need to be careful with things like cleanup scripts and copying those backup files. If you have half of your t-log backups on one share and half on the other, it's tougher to restore.
a) No, you cannot truncate the log of a database that is part of a mirroring session; backing the logs up is your best option. I have several databases that are set up with mirroring based purely on the HA needs, while DR is not required for various reasons. That seems to be your situation? I would still recommend keeping the log backups for a period of time. There's no reason to kill a perfectly good recovery plan that is aided by your HA strategy. :)
b) My own solution for this is to have a secondary agent job that monitors the status of the mirror. If the mirror is found to have changed, the secondary job on the mirror instance is enabled and, if possible, the job on the old principal is disabled. If the principal was down and comes back up, its job stays disabled. The only way the jobs themselves get switched back is in the event of, again, another forced failover.
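The role check that such a monitoring job performs could be sketched like this (the database and job names are hypothetical; in sys.database_mirroring, mirroring_role 1 = principal, 2 = mirror):

```sql
-- Enable the backup job only where this database is currently the principal.
IF EXISTS (SELECT 1
           FROM sys.database_mirroring
           WHERE database_id = DB_ID(N'SalesDb')
             AND mirroring_role = 1)   -- 1 = principal, 2 = mirror
    EXEC msdb.dbo.sp_update_job @job_name = N'SalesDb log backup', @enabled = 1;
ELSE
    EXEC msdb.dbo.sp_update_job @job_name = N'SalesDb log backup', @enabled = 0;
```

Run on a short schedule on both instances, this keeps the log-backup job active only on whichever side currently holds the principal role.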
I am adding a monitoring script to check the size of my DB files so I can deliver a weekly report which shows each files size and how much it grew over the last week. In order to get the growth, I was simply going to log a record into a table each week with each DB's size, then compare to the previous week's results. The only trick is where to keep that table. What are the trade-offs in using the master DB instead of just creating a new DB to hold these logs? (I'm assuming there will be other monitors we will add in the future)
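The weekly capture described above could be as simple as the following, using sys.master_files (where size is reported in 8-KB pages); the monitoring database and table names are hypothetical:

```sql
-- Record current file sizes for every database into a history table.
INSERT INTO DbaMonitor.dbo.FileSizeHistory
    (capture_date, database_name, file_name, size_mb)
SELECT GETDATE(),
       DB_NAME(mf.database_id),
       mf.name,
       mf.size * 8 / 1024.0   -- pages -> MB
FROM sys.master_files AS mf;
```

The weekly report then joins each row to the previous week's row for the same database/file to compute growth.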
The main reason is that master is not calibrated for additional load: it is not installed on an I/O system with proper capacity planning; it is hard to move to a new I/O location; its maintenance plan takes backups, and log backups are only as frequent as needed for a very low volume of activity; and its initial size and growth rate are planned as if no changes are expected. Another reason against it is that in many troubleshooting scenarios you would want a copy of the database to inspect, but you'd have to attach a new master to your instance. These are the main reasons why adding objects to master is discouraged. Also, many admins understandably prefer an application to use its own database so it can be properly accounted for and, ultimately, easily uninstalled.
Similar problems exist for msdb, but if push comes to shove it would be better to store app data in msdb rather than master, since the former is an ordinary database (despite the widespread belief that it is a system database, it actually is not).
The Master DB is a system database that belongs to SQL Server. It should not be used for any other purposes. Create your own DB to hold your logs.
I would refrain from putting anything in master, it could be overwritten/recreated on an upgrade.
I have put a DBA only ServerInfo database on each server for uses like this, as well as any application specific environmental things (things that differ between prod and test and dev).
You should add a separate database for the logging. There is no guarantee that objects you leave in the master database won't break the next SQL Server patch.
And Microsoft itself advises you not to do it.
http://msdn.microsoft.com/en-us/library/ms187837.aspx
Say there is a database with 100+ tables and a major feature is added, which requires 20 existing tables to be modified and 30 more to be added. The changes were made over a long period (6 months) by multiple developers on the development database. Let's assume the changes do not invalidate any existing production data (e.g. added columns have default values or allow nulls, and there are no new relations or constraints that could not be fulfilled).
What is the easiest way to publish these changes in schema to the production database? Preferably, without shutting the database down for an extended amount of time.
Write a T-SQL script that performs the needed changes. Test it on a copy of your production database (restore from a recent backup to get the copy). Fix the inevitable mistakes that the test will discover. Repeat until script works perfectly.
Then, when it's time for the actual migration: lock the DB so only admins can log in. Take a backup. Run the script. Verify results. Put DB back online.
The longest part will be the backup, but you'd be crazy not to do it. You should know how long backups take, the overall process won't take much longer than that, so that's how long your downtime will need to be. The middle of the night works well for most businesses.
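The migration window itself could look something like this in T-SQL (the database name and backup path are placeholders, not from the original answer):

```sql
-- Lock out regular users; admins can still connect.
ALTER DATABASE ProdDb SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE;

-- Take the safety backup before touching anything (hypothetical path).
BACKUP DATABASE ProdDb
    TO DISK = N'D:\Backups\ProdDb_premigration.bak'
    WITH CHECKSUM, INIT;

-- Run the tested migration script here, then verify the results.

-- Reopen the database to everyone.
ALTER DATABASE ProdDb SET MULTI_USER;
```

If verification fails, the pre-migration backup gives you a clean rollback point for the whole window.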
There is no generic answer on how to make 'changes' without downtime. The answer really depends on the case, based on exactly what the changes are. Some changes have no impact on downtime (e.g. adding new tables), some have minimal impact (e.g. adding columns to existing tables with no data-size change, like a new nullable column that does not increase the null bitmap size), and other changes will wreak havoc on downtime (any operation that changes data size will force an index rebuild and lock the table for the duration). Some changes are impossible to apply without *significant* downtime. I know of cases where the changes were applied in parallel: a copy of the database is created, replication is set up to keep it current, then the copy is changed and kept in sync, and finally operations are moved over to the modified copy, which becomes the master database. There is a presentation from PASS 2009 given by Michelle Ufford that mentions how GoDaddy went through such a change that lasted weeks.
But, at a lesser scale, you must apply the changes through a well-tested script, and measure the impact during the test run.
But the real question is: is this the last change you will ever make to the schema? Have you finally discovered the perfect schema for the application, so the production database will never change again? Congratulations, once you pull this off you can rest. But realistically, you will face the very same problem in 6 months. The real problem is your development process, with developers making changes from SSMS or from the VS Server Explorer straight into the database. Your development process must make a conscious effort to adopt a schema-change strategy based on versioning and T-SQL scripts, like the one described in Version Control and your Database.
Use a tool to create a diff script and run it during a maintenance window. I use RedGate SQL Compare for this and have been very happy with it.
I've been using dbdeploy successfully for a couple of years now. It allows your devs to create small SQL change deltas, which can then be applied against your database. The changes are tracked by a changelog table within the database, so the tool knows what has already been applied.
http://dbdeploy.com/