RavenDB Periodic Backups: How can I "clear the ledger" and force a full backup every so often?

When you enable RavenDB's 'Periodic Backups' bundle, RavenDB does the following:
Backs up the entire database.
At every interval (or 'n' minutes), RavenDB makes an incremental backup (or delta backup) of all the changes that occurred since the last interval.
I'm comfortable with this configuration with one caveat.
Every week, I'd like to "clear the ledger" and force RavenDB to backup the entire database and resume making incremental backups from this new starting point.
How can I do this in an automated fashion?

From the Raven.Backup utility documentation:
incremental - Optional. When specified, the backup process will be incremental when done to a folder where a previous backup lies. If dest is an empty folder, or it does not exist, a full backup will be created. For incremental backups to work, the configuration option Raven/Esent/CircularLog has to be set to false.
So the solution to my problem is:
Every week, delete the dest directory.
This will force RavenDB to create a full backup.
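For instance, a minimal PowerShell sketch of that weekly "clear the ledger" step (the path is a placeholder, and the schedule would come from Task Scheduler or whatever job runner you already use):

$dest = 'C:\Backups\MyDatabase'                       # placeholder: the periodic backup's dest folder
if (Test-Path $dest) {
    Remove-Item -Path $dest -Recurse -Force           # wipe the old chain (or move it aside if you want to keep it)
}
New-Item -ItemType Directory -Path $dest | Out-Null   # empty folder => the next backup will be a full one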

I don't believe this is a supported scenario.
RavenDB's Periodic Backup bundle is intended to work with incremental updates, and AFAIK there's nothing to force a full backup or make the bundle believe it's starting from a clean slate.
If you want to do full backups, you'll need to use Raven.Backup.exe, which can do either incremental or full backups. You can trigger it programmatically, via REST, or via the command-line utility, and it works with IIS.
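As a rough illustration only (the incremental and dest options come from the documentation quoted above; the url switch and the paths are assumptions, so check Raven.Backup.exe's help output for your version):

# Full backup: point dest at an empty or brand-new folder
.\Raven.Backup.exe --url=http://localhost:8080 --dest=C:\Backups\MyDatabase-full

# Incremental backup: reuse a folder that already contains a previous backup
.\Raven.Backup.exe --url=http://localhost:8080 --dest=C:\Backups\MyDatabase --incremental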

Related

How to Automate Backup verification in SQL Server

I take a full backup every day, a differential backup every 12 hours, and a log backup every 15 minutes.
I want to verify those backups automatically.
I know the command "RESTORE VERIFYONLY FROM DISK = 'D:\Test.bak'", but that has to be run manually, one backup at a time. Since the backup process is automated, the verification needs to be automated too.
Is there any way to do this?
If you use Ola Hallengren's Maintenance Solution (and I recommend that you do, as it makes the scheduling and validation of the backups you've described here, and other maintenance tasks, much easier to set up and manage), you can have it automatically run RESTORE VERIFYONLY after backing up databases by specifying @Verify = 'Y' in the parameters you pass to it.
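As a sketch only (directory, database selection, and the other parameters are placeholders for your environment), a job step calling Ola's DatabaseBackup procedure might look like:

-- Nightly full backups of all user databases, verified with RESTORE VERIFYONLY afterwards
EXECUTE dbo.DatabaseBackup
    @Databases  = 'USER_DATABASES',
    @Directory  = 'D:\Backup',      -- placeholder path
    @BackupType = 'FULL',
    @Verify     = 'Y',
    @CheckSum   = 'Y';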
If you'd like a PowerShell solution, check out the dbatools function Restore-DbaDatabase with the -VerifyOnly switch passed in. It will locate the most recent backup of each database (or a specified list of databases) and perform a test of that backup in the way you specify (verify only or, by default, a full restore and DBCC CHECKDB).
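A minimal sketch of that approach (instance name and backup path are placeholders; see the dbatools docs for the exact parameters available in your version):

# Verify a backup file (or a folder of backups) without actually restoring it
Restore-DbaDatabase -SqlInstance 'SQLPROD01' -Path 'D:\Backup\MyDb' -VerifyOnly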

How to do a quick differential backup and restore in SQL Server

I'm using SpecFlow with Selenium for doing UI testing on my ASP.NET MVC websites. I want to be able to restore the database (SQL Server 2012) to its pre-test condition after I have finished my suite of tests, and I want to do it as quickly as possible. I can do a full backup and restore with replace (or with STOPAT), but that takes well over a minute, when the differential backup itself took only a few seconds. I want to basically set a restore point and then revert to it as quickly as possible, dropping any changes made since the backup. It seems to me that this should be able to be done very quickly, without needing to overwrite the whole database. Is this possible, and if so, how?
Not with a differential backup. A differential backup is an image of all of the data pages that have changed since the last full backup. In order to restore a differential backup, you must first restore its base (i.e. full) backup and then the differential.
What you're asking for is some process that keeps track of the changes since the backup. That's where a database snapshot will shine. It's a copy-on-write technology, which is to say that when you make a data modification, it writes the pre-change state of the data page to the snapshot file before writing the change itself. So reverting is quick, as it need only pull back those changed pages from the snapshot. Check out the "creating a database snapshot" example in the CREATE DATABASE documentation.
Keep in mind that this isn't really a good guard against failure (which is one of the reasons to take a backup). But for your described use case, it sounds like a good fit.
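A minimal sketch, assuming a database named MyDb whose logical data file name is MyDb_Data (both placeholders; check sys.database_files for the real logical name):

-- Create the snapshot (the "restore point") before running the test suite
CREATE DATABASE MyDb_Snapshot
ON (NAME = MyDb_Data, FILENAME = 'D:\Snapshots\MyDb_Data.ss')
AS SNAPSHOT OF MyDb;

-- Revert after the tests: only the pages changed since the snapshot are pulled back
RESTORE DATABASE MyDb FROM DATABASE_SNAPSHOT = 'MyDb_Snapshot';

-- Drop the snapshot once the restore point is no longer needed
DROP DATABASE MyDb_Snapshot;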

In Endeca, I want to have dgraph backups saved on the dgraph server automatically after a baseline update.

How can I have the backups for my 3 dgraphs saved automatically on the dgraph servers and not on the ITL server? By default, the backup of the dgidx output gets saved on the ITL server. I want it to be saved on the dgraph server, i.e. the MDEX host. Please help.
I don't believe there is an out-of-the-box option for backing up the deployed dgidx output on the target server. Have you gone through the documentation? I would also question whether it is a good idea. Suppose you are deploying and 2 of the 3 servers go through successfully but the third one fails: you now need to roll back only two of the machines, and your central EAC will not know which ones to roll back and which ones to keep. However, by keeping it all at a central point (i.e. on the ITL server), in the event of a rollback you will always push the same backup out to all three servers.
Assuming that you are trying to speed up the deployment of very large indices (Endeca copies the entire dgidx output to each MDEX), you can always look at the performance tuning guide.
You should be able to do this in any number of ways:
In any baseline update, dgidx_output is automatically copied to each dgraph server, so you could add a copy or archive job as a pre-shutdown task for your dgraph.
You could also create a custom copy job for each dgraph server that would run at the end or beginning of a baseline update.
Or it could run entirely outside of your baseline update.
To the point radimpe makes, making copies on the dgraph servers is not that hard; rather, it's the rollback process you really need to consider. You need to set that up and ensure it uses whatever backup copies you've made, whether local to the ITL machine or on the dgraph servers.
Also know that dgidx_output will not include any partial update information added since the index was created. Partial update info will only be available in the dgraph_input on the dgraph servers. Accordingly, if you incorporate partial updates, you should archive the dgraph input and make that available for any rollback job.
You can create a DGRAPH post-startup task and assign it in the dgraph definitions. It will be executed on each MDEX startup:
<dgraph id="Dgraph01" host-id="LiveMDEXHost01" port="10000"
        pre-shutdown-script="DgraphPreShutdownScript"
        post-startup-script="DgraphPostStartupScript" />

<script id="DgraphPostStartupScript">
  <bean-shell-script>
    <![CDATA[
      ...code to backup here
    ]]>
  </bean-shell-script>
</script>

SQL Server 2005 mirrored database transaction log file maintenance

OK, so for standard, non-mirrored databases, the transaction log is kept in check either simply by having the database in simple mode or by doing regular backups. We keep ours in simple as we have SAN snapshot backups taking place and there is no need for SQL backups.
We're now moving to mirroring. I obviously no longer have the choice of simple mode and must use full recovery, which leads to large log files and the need for log backups. That's fine, I can deal with that: a maintenance plan that takes a log backup and discards any previous ones. I realise that this backup is essentially useless without its predecessors, but the SAN snapshots are doing the backups.
My question is...
a) Is there a way to truncate the log file of all processed transactions without creating a backup? (as I can't use the backups anyway...)
b) A maintenance plan is local to a server and is not replicated across a mirrored pair. How should it be done in a mirrored setup, such that when the database fails over the plan starts running on the new principal, but doesn't get upset when it's a mirror?
Thanks
A. If your server is important enough to mirror, why isn't it important enough to take transaction log backups? SAN snapshots are images of a single point in time, but they don't give you the ability to stop at different points along the way. When your developers truncate a table, you want to replay all of the logs right up until that statement, and stop there. That's what transaction log backups are good for.
B. Set up a maintenance plan (or even better, T-SQL scripts like Ola Hallengren's at http://ola.hallengren.com) to back up all of the databases, but check the boxes to only back up the online ones. (Off the top of my head, not sure if that's an option in 2005 - might be 2008 only.) That way, you'll always get whatever ones happen to fail over.
Of course, keep in mind that you need to be careful with things like cleanup scripts and copying those backup files. If you have half of your t-log backups on one share and half on the other, it's tougher to restore.
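For example, a hedged sketch of the log backup step using Ola's scripts (the share path and retention are placeholders; the same job can live on both partners since, as noted above, you only back up the databases that are online):

-- Log backups of whichever user databases are currently online on this partner
EXECUTE dbo.DatabaseBackup
    @Databases   = 'USER_DATABASES',
    @Directory   = '\\backupserver\sqlbackups',   -- central share so both partners write to one place
    @BackupType  = 'LOG',
    @CleanupTime = 48;                            -- hours of log backups to keep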
a) No, you cannot truncate a log that is part of a mirrored database; backing the logs up is your best option. I have several databases that are set up with mirroring simply based on the HA needs, but DR is not required for various reasons. That seems to be your situation? I would really still recommend keeping the log backups for a period of time. No reason to kill a perfectly good recovery option that complements your HA strategy. :)
b) My own solution for this is to have a secondary agent job that monitors the status of the mirror. If the mirroring role is found to have changed, the secondary job on the mirror instance is enabled and, if possible, the one on the old principal is disabled. If the principal was down and it comes back up, the job is still disabled. The only way the jobs themselves would be switched back is, again, in the event of another forced failover.
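A sketch of that watcher logic in T-SQL (database and job names are purely illustrative):

-- Run every few minutes from an agent job on both partners
DECLARE @role nvarchar(60);

SELECT @role = mirroring_role_desc
FROM sys.database_mirroring
WHERE database_id = DB_ID(N'MyMirroredDb');

IF @role = N'PRINCIPAL'
    EXEC msdb.dbo.sp_update_job @job_name = N'LogBackup - MyMirroredDb', @enabled = 1;
ELSE
    EXEC msdb.dbo.sp_update_job @job_name = N'LogBackup - MyMirroredDb', @enabled = 0;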

sane backup strategy for webapps

I'm doing a webapp and need a backup plan. Here's what I've got so far:
nightly encrypted backup of the SQL database to Amazon S3 and my external drive (incremental if possible, not overly familiar with PostgreSQL yet, but that's another thread)
nightly backup of my Mercurial repo (which includes Apache configs, deploy scripts, etc) to S3 (w/ local backups via Time Machine)
Should I add anything else, or will this cover it? For a gauge of how critical the data is/would be, it's a project management app along the lines of Basecamp.
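(For reference, the nightly database step above would be something along these lines; a sketch only, with the database name, GPG recipient, bucket, and choice of upload tool (aws CLI here; s3cmd would work too) as placeholders:)

# nightly cron job: encrypted logical dump shipped to S3 (the external drive copy happens separately)
pg_dump --format=custom mydb \
  | gpg --encrypt --recipient backups@example.com \
  > /backups/mydb-$(date +%F).dump.gpg
aws s3 cp /backups/mydb-$(date +%F).dump.gpg s3://my-backup-bucket/postgres/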
Perhaps a weekly full backup of your database as well as the nightly incremental ones?
It means that if one of your old incremental backups gets corrupted then you have lost less than a week of data.
Also, ensure you have a backup test plan to ensure your backups work. There are a lot of horror stories going around about this, from companies that have been doing backups for years, never testing them and then finding out none of them are any good once they need them. (I've also been at a company like this. Thankfully I spotted the backups weren't working before they were required and fixed the problems).
One of the best strategies that worked for me in the past was to have the "backup" process just be the same as the install process, i.e. we fully scripted the server configuration, application creation, database setup, etc. on Linux, so an install would look like:
./install.sh [server] [application name]
and the backup/recovery
./install [server] [application name] -database [database backup file]
In terms of backup, the database (MySQL) was backed up fully by a cron job.
This pretty much ensured that the recovery was tested every time a new instance was deployed, and the scripts ended up being used also to move instances when hardware needed replacement, or when a given server was getting too much load from a customer.
This was the setup for a SaaS enterprise application that I worked on a few years back, so we had full control of the servers.
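A minimal sketch of that cron-driven full dump (database name, cron user, and paths are placeholders; the real scripts obviously did more):

# /etc/cron.d/app-backup -- nightly full dump of the application database
0 2 * * * backup mysqldump --single-transaction --routines appdb | gzip > /var/backups/appdb-$(date +\%F).sql.gz

(The \% is there because cron treats a bare % specially.)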
I would, if you can, change from an incremental backup to a differential. With incrementals, you would have to apply the weekly full backup and then every incremental following it; if one of your incrementals fails early in the week, then all your subsequent backups will fail too.
However, if you use differentials, then each differential contains all the changes since the last full backup, so even if one of the backups failed earlier in the week you would still be able to recover fully as long as you have a successful recent one.
I hope I am explaining this well!
:)