Troubleshooting a failover cluster problem in W2K8 / SQL05

I have an active/passive W2K8 (64) cluster pair, running SQL05 Standard. Shared storage is on a HP EVA SAN (FC).
I recently expanded the filesystem on the active node for a database, adding a drive designation. The shared storage drives are designated as F:, I:, J:, L: and X:, with SQL filesystems on the first 4 and X: used for a backup destination.
Last night, as part of a validation process (the passive node had been offline for maintenance), I moved the SQL instance to the other cluster node. The database in question immediately moved to Suspect status.
Review of the system logs showed that the database would not load because the file "K:\SQLDATA\whatever.ndf" could not be found. (Note that we do not have a K: drive designation.)
A review of the J: storage drive showed zero contents -- nothing -- this is where "whatever.ndf" should have been.
Hmm, I thought. Problem with the server. I'll just move SQL back to the other server and figure out what's wrong.
Still no database. Suspect. Uh-oh. "Whatever.ndf" had gone into the bit bucket.
I finally decided to just restore from the backup (which had been taken immediately before the validation test), so nothing was lost but a few hours of sleep.
The questions: (1) Why did the passive node think the whatever.ndf files were supposed to go to drive "K:", when this drive didn't exist as a resource on the active node?
(2) How can I get the cluster nodes "re-synced" so that failover can be accomplished?
I don't know that there wasn't a "K:" drive as a cluster resource at some time in the past, but I do know that this drive did not exist on the cluster at the time of the resource move.

Random thought based on what happened to me a few months ago... sounds quite similar.
Do you have NTFS mount points? I forget what it was exactly (I'm a code monkey and relied on the DBAs), but the mount points were either "double booked", not part of the cluster resource, or the SAN volumes were not configured correctly.
We had "zero size" drives (I used xp_fixeddrives) for our log files, but we could still write to them.
Assorted reboots and failovers were unsuccessful. Basically, it took a thorough review of all settings in the SAN management tool to sort it out.
A possibility for your K: drive...
The other thing I've seen is mounted drives that have drive letters as well as being mounted in folders. I used to use mounted folders for SQL Server, but the backup system used a direct drive letter.

Related

azure virtual machine disk errors

I've been using 3 identical VMs on Azure for a month or more without problem.
Today I couldn't Remote Desktop to one of them, and restarted it from the Azure Portal. That took a long time. It eventually came back up, and the Event log has numerous entries such as:
"The IO operation at logical block address 70 for Disk 0 ..... was retried"
"Windows cannot access the file C:\windows\Microsoft.Net\v4.0.30319\clrjit.dll for one of the following reasons, network, disk etc.
There are lots of errors like this. To me they seem symptomatic that the underlying disk system is having serious problems. Given the VHD is stored in a triple replicated Azure blob, I would have thought there was some immunity to this kind of thing?
Many hours later it's still doing the same thing. It works fine for a few hours, then slows to a crawl with the Event log containing lots of disk problems. I can upload screen shots of the event log if people are interested.
This is a pretty vanilla VM, I'm only using the one OS disk it came with.
The other two identical VMs in the same region are fine.
Just wondering if anybody has seen this before with Azure VMs and how to safeguard against it, or recover from it.
Thanks.
Thank you for providing all the details and we apologize for the inconvenience. We have investigated the failures and determined that they were caused by a platform issue. Your virtual machine’s disk does not have any problems and therefore you should be able to continue using it as is.

Query about EBS Backed Instances + Backup on S3 + Snapshots

I've spent a number of days looking into putting up two Windows Servers on Amazon, a domain controller and a remote desktop services server but there are a few questions that I can't find detailed or any answers for:
1) When you have an EBS-backed instance, I assume this means that all files (OS/applications/pagefile etc.) are stored on EBS? Physically in the datacentre, let's assume I have 50 gig of OS files/application data etc.; are these all stored on just one SAN-type device? What happens if that device blows up, or say that particular data centre gets destroyed? Is the data elsewhere? What is the probability that your entire EBS volume can just disappear?
2) As I understand it you can backup your EBS instance to S3 with snapshotting. I assume you can choose how often to snapshot (say daily?). In my above scenario if I have 50 gig of files, and snapshot once a day. Over 7 days will my S3 storage be 350 gig or will it be 50 gig + incremental changes I have made over the week?
3) I remember reading somewhere that the instance has to go offline to snapshot. If that is the case, does it do this by shutting down the guest OS, snapshotting, then booting up, or does it just detach the data, prevent you from connecting while it snapshots, then bring it back to the exact moment before it went for a snapshot?
4) I understand the concept of paying per month per gig of space, but I am concerned about the $0.11 per 1 million I/O requests. How does that work when I am running a Windows server? I have no idea how many I/O requests a server makes to its disks; I am assuming a lot, since the entire VM is being stored on an EBS volume. Is running a server on standard EBS going to slow it down radically?
5) Are people using the snapshot to S3 as their main backup, or are people running other types of backup for data?
Sorry for the noob questions - I'd appreciate any partial answers, answers or advice anyone could offer me. Thanks in advance!
1) Amazon is fuzzy on this. They say that data is replicated within the AZ it belongs to, and that if you have less than 20GB of data changed since the last snapshot your annual failure rate is ~0.1-0.4%.
2) Snapshots are triggered manually, and are done incrementally.
3) Depends on your filesystem. For example, on a Linux box with an XFS volume you can freeze IO to the volume, do your snapshot (takes only a second or so) and then unfreeze (see the sketch after this answer). If you take a snapshot without doing something similar, you run the risk of the data being in an inconsistent state.
4) I run all my instances on EBS. You probably wouldn't want your pagefile on EBS; it would make more sense to use instance storage for that. The IO count depends heavily on your workload: an application server does a lot fewer IOPs than a database server, for example. You're unlikely to use more than a few dollars a month per volume unless you're running particularly IO-heavy operations.
5) Personally I don't care about the installed software/configuration (I have AMIs with that all set up, so I can restore it in minutes); I only care about the data. I back that data up separately (S3 & Glacier), partly because I was bitten by a bug EBS had about a year ago where they lost some snapshots.
You can also use multiple strategies, as Fantius commented. For example, on the mongodb servers I run, the boot volume is small (and never snapshotted or backed up, since it can be restored automatically from an AMI), with a separate data volume containing the actual mongodb data. The mongodb volume is snapshotted as well as having dumps stored on S3. Snapshots are an efficient way of creating backups (since you're only storing incremental changes), but you can't transfer them out of your EC2 region, whereas a tarball on S3 can easily be copied anywhere.
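Not part of the original answer, but a minimal sketch of the freeze/snapshot/unfreeze step described in point 3 above, assuming an XFS data volume mounted at /data and the modern aws CLI (the volume ID is a placeholder):

VOLUME_ID="vol-0abc1234"      # placeholder EBS volume ID
MOUNT_POINT="/data"           # assumed XFS mount point

xfs_freeze -f "$MOUNT_POINT"  # freeze writes so the on-disk state is consistent
aws ec2 create-snapshot \
    --volume-id "$VOLUME_ID" \
    --description "nightly backup $(date +%F)"   # snapshots are incremental per volume
xfs_freeze -u "$MOUNT_POINT"  # unfreeze; the snapshot completes in the background

The create-snapshot call returns almost immediately, so the volume only needs to stay frozen for a moment; the actual copy to S3 finishes in the background.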

Is Amazon EC2 recommended for a persistent public facing website?

My company is about to write a new public-facing website in SharePoint (so Windows Server 2008 R2, SQL Server 2008 R2, etc.) and we're looking at using Amazon EC2 to host it. I've read and been told that instances can disappear (often through user error, but also in batches), so I'm skeptical that EC2 is the best idea for us.
I've done research on the Amazon AWS site, but must confess that most of the terminology used is confusing, and Googling my questions often brought me here, so I thought I'd ask my questions here too and see if people can advise me.
1) It's critical that our website be available to the public as much as possible (the usual 99.9% uptimes apply). The Amazon EC2 Service Level Agreement commitment is 99.95% availability, which is fine, but what happens if we hit that 0.05% scenario? Would our EC2 instance be lost? Can these be recovered? If so, what would we need to do to ensure that we recover to a not-too-old version of our site?
2) I've read about Amazon Elastic Block Store (EBS), and how it persists independently of the lifetime of the instance. If I understand right, EBS is like having a hard drive, so if the instance is lost we can start a new instance using our EBS to recover the latest version, while the 'local instance store' would be lost along with the instance. Is that right?
3) Are 'reserved instances' a more stable option? i.e. are they less likely to disappear? If they do still disappear, what recovery benefits do they offer, if any?
I know these questions are kinda vague, but hopefully you'll be able to offer a newbie some basic info, enough to point me in the right direction for further, deeper research at least.
Many thanks.
Kevin
We rely on AWS for our webservers. I won't use anything else. They're highly scalable, easily configurable and have an absurd uptime. I've never experienced downtime with them. We've been with them for two years.
Reserved instances are cheaper. Get them if you're planning on having that instance for a while. It's simply a cost/budgeting issue.
Never heard of people losing an EC2 instance.
Not terribly knowledgeable about EBS, but S3 is a good way to back up data.
HTH
EDIT:
Came across some links that might be helpful. Cheers.
http://techblog.netflix.com/2010/12/four-reasons-we-choose-amazons-cloud-as.html
http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
http://www.codinghorror.com/blog/2011/04/working-with-the-chaos-monkey.html
One of the main design goals of AWS is to make fault-tolerant services, that is, services that can recover from failures. They design all of their services with the assumption that something will fail in some way at some point, but that there will be redundancies and other mechanisms in place to recover from those inevitable failures.
In the case of storage services like S3 and SimpleDB, this is achieved primarily by replicating your data across multiple nodes (machines) in multiple data centers. So when one node experiences a hardware failure or one data center experiences a power outage, there's no real down time as the replicas can still service the requests. As a consumer, you aren't even aware of the down nodes or data centers.
EC2 is designed to work similarly, but it is not quite as encapsulated as S3 and SimpleDB, so you'll need to plan a bit of the work yourself. For example, if you need a web service with guaranteed uptime and availability, you'll want to look into the AWS ELB (Elastic Load Balancing) service. That way, if an instance is down, requests will automatically be routed to other healthy instances. For your data, you can either store it in other AWS services (like S3, SimpleDB and EBS), which have built-in redundancy, or you can build your own solution using similar redundancy techniques.
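As an illustration only (not from the original answer), here is a rough sketch of putting two web instances behind a classic ELB using the aws CLI; the load balancer name, zones and instance IDs are all placeholders:

# Create the load balancer listening on HTTP port 80
aws elb create-load-balancer \
    --load-balancer-name my-web-elb \
    --listeners "Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=80" \
    --availability-zones us-east-1a us-east-1b

# Health check: instances failing this probe are taken out of rotation
aws elb configure-health-check \
    --load-balancer-name my-web-elb \
    --health-check Target=HTTP:80/,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=3

# Register the web servers; traffic is spread across the healthy ones
aws elb register-instances-with-load-balancer \
    --load-balancer-name my-web-elb \
    --instances i-0aaa1111 i-0bbb2222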
The SLA amounts to nothing; we found out that:
Instances and EBS volumes DID get lost
It takes Amazon more than 2 days to recover from a disaster, and even then not to the full extent
We were the lucky ones who managed to get back on our feet in less than 2 days. Other companies got stuck with no recovery option.
And what does Amazon recommend? "Don't trust our reliability. Pay for 2 or 3 more copies of your system in different regions, and then you will be safe".
More information can be found here:
http://www.zdnet.com/blog/saas/lightning-strike-zaps-ec2-ireland/1382
tldr: AWS is very reliable if you know what you're doing, a bad idea if you don't.
As you're unfamiliar with the terms, here's a very quick glossary:
AZ - Availability zone; there are several availability zones per region (e.g. 3 in Ireland). They are physically isolated datacentres with different power grids, flood plains etc., but with internal-network-quality connections between them. It's possible, even likely, that an AZ will become unavailable at some point; I don't think all AZs in a region have ever been down at once, though.
EBS/Instance store - These are the two main types of storage available to an instance. The best way to describe them: instance store is the equivalent of an HDD plugged into your motherboard via SATA; it's very fast. But what happens if you shut down your instance (or if the motherboard fails) and want to instantly start on another board? (Amazon completely hides the physical hardware setup.) Obviously you aren't going to wait for an engineer to unplug a drive from one server and plug it into another, so they don't even offer this. Instance store is fast but temporary and tied to the physical machine: DO NOT store anything important on it. EBS, then, is the alternative; it is a very low-latency network drive that any server can connect to as though it were local. You shut down a server, change the size and restart on a completely different server on the other side of the datacentre (again, the physical stuff is hidden); it doesn't matter, your EBS volume hasn't gone anywhere (by default they're also on multiple physical discs).
Commodity cloud hardware - My interpretation of all the 'cloud hardware fails all the time, it's really risky and unreliable' talk is that yes, AWS hardware is not as reliable as enterprise-level components in a managed datacentre. This doesn't mean it's unreliable; it just means you should build failure in as an option in your design.
The first very important thing to note when talking about SLAs is that Amazon states very clearly that the SLA ONLY applies if one or more AZs goes down. So if you do not understand how their service works, build only one server in one AZ, and a generator or router fails, it's your own fault.
As for recovery, that depends: is your entire application state stored on one server? If it is, don't bother with the cloud. If, however, you can cluster your state on multiple servers, store it in RDS or some other persistent DB, or your content changes so infrequently that you can utilise periodic copies to S3 storage, you'll be fine. Your failure strategy (in order of preference) could be clustered, failover, or auto repair. For the first you have clustered servers sharing state; it doesn't matter if you lose a server or an AZ. For the second you only have one live server, but if it goes down you have a failover standing by with the same content. Finally, with auto repair there are two possible situations: if your data is only on one EBS drive, you could start another instance with the same drive and carry on. But if the EBS drive or AZ fails, you will need to be ready with a snapshot in S3 that a completely fresh instance can copy and start up with.
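For the auto-repair case, a hedged sketch of rebuilding a data volume from the latest snapshot and attaching it to a fresh instance with the aws CLI (the IDs and availability zone are placeholders, not from the original answer):

SNAPSHOT_ID="snap-0abc1234"   # placeholder: the most recent snapshot of the data volume
INSTANCE_ID="i-0abc1234"      # placeholder: the fresh instance
AZ="eu-west-1a"               # must match the new instance's availability zone

VOLUME_ID=$(aws ec2 create-volume \
    --snapshot-id "$SNAPSHOT_ID" \
    --availability-zone "$AZ" \
    --query 'VolumeId' --output text)

aws ec2 wait volume-available --volume-ids "$VOLUME_ID"

aws ec2 attach-volume \
    --volume-id "$VOLUME_ID" \
    --instance-id "$INSTANCE_ID" \
    --device /dev/sdf          # then mount it inside the instance as usual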
Reserved instances are no more reliable; they're the same hardware. You're just entering into a contract to say "I'll have x machines for y years", which allows AWS to plan better, which is cheaper for you.

Deploying on EC2

This question is for anyone who has actually used Amazon EC2. I'm looking into what it would take to deploy a server there.
It looks like I can start in VirtualBox, set up my server and then export the image using the provided ec2-tools.
What gets tricky is if I actually want to make configuration changes to my running server, they will not be persistent.
I have some PHP code that I need to be able to deploy (and redeploy) to the system, so I was thinking that EBS would be a good choice there.
I have a massive amount of data that I need stored, but it just so happens that latency is not an issue, so I was thinking something like s3fs might work.
So my question is... What would you do? What does your configuration look like? What have been particular challenges that perhaps you didn't see coming?
We have deployed a large-scale commercial app in the AWS environment.
There are three basic approaches to keeping your changes under control once the server is running, all of which we use in different situations:
Keep the changes in source control. Have a script that is part of your original image that can pull down the latest and greatest. You can pull down PHP code, Apache settings, whatever you need. If you need to restart your instance from your AMI (Amazon Machine Image), just run your script to get the latest code and configuration, and you're good to go.
Use EBS (Elastic Block Storage). EBS is like a big external hard drive that you can attach to your instance. Even if your instance goes away, EBS survives. If you later need two (or more) identical instances, you can give each one of them access to what you save in EBS. See https://stackoverflow.com/a/3630707/141172
Burn a new AMI after each change. There's a tool to create a new AMI from a running instance. If EBS is like having an external hard drive, creating a new AMI is like having a DVD-R. You can save the current state of your machine to it. Next time you have to start a new instance, base it on that new AMI. Good to go.
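A minimal sketch of that third approach using the modern aws CLI (the original tooling was the EC2 API tools/console; the instance ID and image name here are placeholders):

aws ec2 create-image \
    --instance-id i-0abc1234 \
    --name "app-server-baseline-$(date +%Y%m%d)" \
    --description "PHP app server after latest configuration change" \
    --no-reboot    # skip the reboot, at the cost of a small risk of filesystem inconsistency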
I recommend storing your PHP code in a repository such as SVN, and writing a script that checks the latest code out of the repository and redeploys it when you want to upgrade. You could also have this script run on instance startup so that you get the latest code whenever you spin up a new instance; saves on having to create a new AMI every time.
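A hedged sketch of such a redeploy script; the repository URL, deploy path and Apache setup are purely illustrative:

#!/bin/bash
REPO_URL="http://svn.example.com/myapp/trunk"   # assumed repository location
DEPLOY_DIR="/var/www/myapp"                     # assumed deploy path

if [ -d "$DEPLOY_DIR/.svn" ]; then
    svn update "$DEPLOY_DIR"                    # existing working copy: just pull the latest
else
    svn checkout "$REPO_URL" "$DEPLOY_DIR"      # first run, e.g. on instance startup
fi

apachectl graceful                              # reload Apache without dropping connections

Hooking the same script into the instance's startup (rc.local, user data, or similar) gives you the "latest code on boot" behaviour described above.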
The main challenge that I didn't see coming with EC2 is instance startup time - especially with Windows. Linux instances take 5 to 10 minutes to launch, but I've seen Windows instances take up to 40 minutes; this can be an issue if you want to do dynamic load balancing and start up new instances when your load increases.
I'd suggest the best bet is to simply 'try it'. The charges to run a small instance are not high and data transfer rates are very low - I have moved quite a few GB and my data fees are still less than a dollar(!) in my first month. You will likely end up paying mostly for system time rather than data I suspect.
I haven't deployed yet but have run up an instance, migrated it from Ubuntu 8.04 to 8.10, tried different port security settings, seen what sort of access attempts unknown people have tried (mostly looking for phpadmin), run some testing against it and generally experimented with the config and restart of the components I'm deploying. It has been a good prelude to my end deployment. I won't be starting with a big DB so will be initially sticking with the standard EC2 instance space.
The only negativity I have heard is that some spammers have made some of the IP ranges subject to spam-blocking, but I have not yet confirmed that.
I suggest you take your VirtualBox approach only after you are more familiar with the EC2 infrastructure. Start by going to EC2, opening an account and following Amazon's EC2 getting-started guide. This guide will give you enough of an overview of all the pieces (EBS, IPs, connections, and others) to get you started. We are currently using EC2 for production, and the way we started was as I am explaining here.
I hope you become a cloud expert soon.
Per timbo's concern, I was able to nab an IP that, so far, hasn't legitimately shown up on any spam lists. You will have a few hiccups, since many blacklists are technically whitelists and will have every IP on their list until otherwise notified that a mail server is running on that IP. It's really easy to get removed: most of them have automated removal request forms, and every one that doesn't has been very cooperative in removing me from their lists. Just be professional; ask if they can give a time and reason for the block and what steps you should take to remove your IP. None of the services I emailed asked me to jump through any hoops; within two or three business days they all informed me my IP had been removed.
Still, if you plan on running a mail server I would recommend reserving IPs now. They're 1 cent for every hour they are not bound to an instance, so it works out to about $7 a month. I went ahead and reserved an extra one, as I plan on starting up another instance soon.
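For illustration only, reserving and binding an Elastic IP looks roughly like this with the modern aws CLI (the instance ID and address are placeholders):

aws ec2 allocate-address                # reserves an Elastic IP and prints the address

aws ec2 associate-address \
    --instance-id i-0abc1234 \
    --public-ip 198.51.100.10           # placeholder: the address returned above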
I have deployed some simple stuff to EC2 Win2k3 instances. Here's my advice:
Find a tutorial. Sign up for the service. Just spend an afternoon setting up your first server. It's pretty darned easy, though there will be obstacles to overcome. It's not too tough.
When I was fooling with EC2 I think I spent like $2.00 setting up a server and playing with it for a while.
Some of your data will be persistent, but you can connect S3 to EC2 as well.
Just go for it!
With regards to the concerns about blacklisting of mail servers, you can also use Amazon's Simple Email Service (SES), which obviates the need to run the mail server on the EC2 instances.
I had trouble with this as well, but posted a note here in their forums - https://forums.aws.amazon.com/thread.jspa?threadID=80158&tstart=0

How do you back up your development machine? [closed]

How do you back up your development machine so that in the event of a catastrophic hardware malfunction, you are up and running in the least amount of time possible?
There's an important distinction between backing up your development machine and backing up your work.
For a development machine, your best bet is an imaging solution that offers as near a "one-click restore" process as possible. Time Machine (Mac) and Windows Home Server (Windows) are both excellent for this purpose. Not only can you have your entire machine restored in 1-2 hours (depending on HDD size), but both run automatically and store deltas, so you can have months of backups in relatively little space. There are also numerous "ghosting" packages, though they usually do not offer incremental/delta backups, so they take more time/space to back up your machine.
Less good are products such as Carbonite/Mozy/JungleDisk/rsync. These products WILL allow you to retrieve your data, but you will still have to reinstall the OS and programs. Some also have limited or no history.
In terms of backing up your code and data, I would recommend a source-code control product like SVN. While a general backup solution will protect your data, it does not offer the labeling/branching/history functionality that SCC packages do. These functions are invaluable for any type of project with a shelf-life.
You can easily run a SVN server on your local machine. If your machine is backed up then your SVN database will be also. This IMO is the best solution for a home developer and is how I keep things.
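A minimal sketch of running SVN locally with svnserve (paths and project names are just examples):

svnadmin create ~/svnrepo                                     # one-time: create the repository
svnserve -d -r ~/svnrepo                                      # serve it as svn://localhost/

svn import ~/projects/myapp svn://localhost/myapp -m "initial import"
svn checkout svn://localhost/myapp ~/work/myapp               # working copy to develop in

Since the repository lives under your home directory, whatever backs up the machine backs up the repository too, which is the point being made above.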
All important files are in version control (Subversion)
My subversion layout generally matches the file layout on my web server so I can just do a checkout and all of my library files and things are in the correct places.
Twice-daily backups to an external hard drive
Nightly rsync backups to a remote server.
This means that I send stuff on my home server over to my webhost and all files & databases on my webhost back home so I'm not screwed if I lose either my house or my webhost.
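A rough sketch of what those cron-driven rsync jobs might look like; the hosts and paths are placeholders, not the poster's actual setup:

# Home server -> webhost (run from cron on the home server):
rsync -az --delete /home/me/sites/ me@webhost.example.com:/var/backups/home-sites/

# Webhost files and database dumps -> home (run from cron at home):
rsync -az me@webhost.example.com:/var/backups/ /backup/webhost/

# Example crontab entry for the nightly run at 02:30:
# 30 2 * * * /home/me/bin/nightly-sync.sh >> /var/log/nightly-sync.log 2>&1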
I use Mozy, and rarely think about it. That's one weight off my shoulders that I won't ever miss.
Virtual machines and CVS.
Desktops are rolled out with ghost and are completely vanilla.
Except they have VirtualBox.
Then developers pull the configured baseline development environment
down from CVS.
They log into the development VM image as themselves, refresh the source and libraries from CVS, and they're up and working again.
This also makes doing development and maintenance at the same time a lot easier.
(I know some people won't like CVS or VirtualBox, so feel free to substitute your tools of choice.)
Oh, and you check your work into a private branch off trunk daily.
There you go.
Total time to recover : 1 hour (tops)
Time to "adopt" a shbiy new laptop for a customer visit : 1 hour ( tops)
And a step towards CMMI Configuration Management.
BTW your development machine should not contain anything of value. All your work (and your company's work) should be in central repositories (SVN).
I use TimeMachine.
For my home and development machines I use Acronis True Image.
In my opinion, with HD prices so cheap, nothing replaces a full incremental daily HD backup.
A little preparation helps:
All my code is kept organized in one single directory (with categorized sub-directories).
All email is kept in various PSTs.
All code is also checked into source control at the end of every day.
All documents are kept in one place as well.
Backup:
Backup your code, email, documents as often as it suits you (daily).
Keep an image of your development environment always ready.
Failure and Recovery
If everything fails, format and install the image.
Copy back everything from backup and you are up and running.
Of course there are tweaks here and there (incremental backup, archiving, etc.) which you have to do to make this process real.
If you are talking absolute least amount of restore time... I've often setup machines to do Ghost (Symantec or something similar) backups on a nightly basis to either an image or just a direct copy to another drive. That way all you have to do is reimage the machine from the image or just swap the drives. You can be back up in under 10 minutes... The setup I did before was in situation where we had some production servers that were redundant and it was acceptable for them to be offline long enough to clone the drive...but only at night. During the day they had to be up 100%...it saved my butt a couple times when a main drive failed... I just opened the case, swapped the cables so the backup drive was the new master and was back online in 5 minutes.
I've finally gotten my "fully automated data back-up strategy" down to a fine art. I never have to manually intervene, and I'll never lose another harddrive worth of data. If my computer dies, I'll always have a full bootable back-up that is no more than 24 hours old, and incremental back-ups no more than an hour old. Here are the details of how I do it.
My only computer is a 160 gig MacBook running OSX Leopard.
On my desk at work I have 2 external 500 gig harddrives.
One of them is a single 500 gig partition called "External".
The other has a 160 gig partition called "Clone" and a 340 gig partition called TimeMachine.
TimeMachine runs whenever I'm at work, constantly backing up my "in progress" files (which are also committed to Version Control throughout the day).
Every weekday at 12:05, SuperDuper! automatically copies my entire laptop harddrive to the "Clone" drive. If my laptop's harddrive dies, I can actually boot directly from the Clone drive and pick up work without missing a beat -- giving me some time to replace the drive (This HAS happened to me TWICE since setting this up!). (Technical Note: It actually only copies over whatever has changed since the previous weekday at 12:05... not the entire drive every time. Works like a charm.)
At home I have a D-Link DNS-323, which is a 1TB (2x500 gig) Network Attached Storage device running a Mirrored RAID, so that everything on the first 500 gig drive is automatically copied to the second 500 gig drive. This way, you always have a backup, and it's fully automated. This little puppy has a built-in Dynamic DNS client, and FTP server.
So, on my WRT54G router, I forward the FTP port (21) to my DNS-323, and leave its FTP server up.
After the SuperDuper clone has been made, rSync runs and synchronizes my "External" drive with the DNS-323 at home, via FTP.
That's it.
Using 4 drives (2 external, 2 in the NAS) I have:
1) An always-bootable complete backup less than 24 hours old, Monday-Friday
2) A working-backup of all my in-progress files, which is never more than 30 minutes old, Monday-Friday (when I'm at work and connected to the external drives)
3) Access to all my MP3s (170GB) and documents at work on the "External" drive and at home on the NAS
4) Two complete backups of all my MP3s and documents on the NAS (External is original copy, both drives on NAS are mirrors via ChronoSync)
Why do I do all of this?
Because:
1) In 2000, I dropped a 40 gig harddrive 1 inch, and it cost me $2500 to get that data back.
2) In the past year, I've had to take my MacBook in for repair 4 times. One dead harddrive, two dead motherboards, and a dead webcam. On the 4th time, they replaced my MacBook with a newer better one at no charge, and I haven't had a problem since.
Thanks to my daily backups, I didn't lose any work, or productivity. If I hadn't had them, though, all my work would have been gone, along with my MP3s, and my writing, and all the photos of my trips to Peru, Croatia, England, France, Greece, Netherlands, Italy, and all my family photos. Can you imagine? I'm sure you can, because I bet you have a pile of digital photos sitting on your computer right now... not backed-up in any way.
A combination of RAID1, Acronis, xcopy, DVDs and ftp. See:
http://successfulsoftware.net/2008/02/04/your-harddrive-will-fail-its-just-a-question-of-when/
Maybe just a simple hardware hard disk raid would be a good start. This way if one drive fails, you still have the other drive in the raid. If something other than the drives fail you can pop these drives into another system and get your files quickly.
I'm just sorting this out at work for the team. An image with all common tools is on the network (we actually have a hot-swap machine ready). All work in progress is on the network too.
So a developer's machine goes boom. Use the hot-swap machine and continue. Downtime ~15 mins + coffee break.
We have a corporate solution pushed down on us called Altiris, which works when it wants to. It depends on whether or not it's raining outside. I think Altiris might be a rain-god, and just doesn't know it. I am actually delighted when it's not working, because it means that I can have my 99% of CPU usage back, thank you very much.
Other than that, we don't have any rights to install other software solutions for backing things up or places we are permitted to do so. We are not permitted to move data off of our machines.
So, I end up just crossing my fingers while laughing at the madness.
I don't.
We do continuous integration, submit code often to the central source control system (which is backed up like crazy!).
If my machine dies at most I've lost a couple of days work.
And all I need to do is get a clean disk and set up the dev environment from a ghost image, or by spending a day sticking CDs in, rebooting after Windows updates, etc. Not a pleasant day, but I do get a nice clean machine.
At work NetBackup or PureDisk depending on the box, at home rsync.
Like a few others, I have a clean copy of my virtual PC that I can grab and start fresh at any time, and all code is stored in Subversion.
I use SuperDuper! and back up my virtual machine to another external drive (I have two).
All the code is on a SVN server.
I have a clean VM in case mine fails. But in either case it takes me a couple of hours to install WinXP + VStudio; I don't use anything else on that box.
I use xcopy to copy all my personal files to an external hard drive on startup.
Here's my startup.bat:
xcopy d:\files f:\backup\files /D /E /Y /EXCLUDE:BackupExclude.txt
This recurses directories, only copies files that have been modified, and suppresses the prompt to overwrite an existing file; the files/folders listed in BackupExclude.txt will not be copied.
Windows Home Server. My dev box has two drives with about 750GB of data between them (C: is a 300GB SAS 15K RPM drive with apps and system on it, D: is a mirrored 1TB set with all my enlistments). I use Windows Home Server to back this machine up and have successfully restored it several times after horking it.
My development machine is backed up using Retrospect and Acronis. These are nightly backups that run when I'm asleep - one to an external drive and one to a network drive.
All my source code is in SVN repositories, I keep all my repositories under a single directory so I have a scheduled task running a script that spiders a path for all SVN repositories and performs a number of hotcopies (using the hotcopy.py script) as well as an svndump of each repository.
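A rough shell equivalent of that spidering backup, for illustration only (the author used hotcopy.py; the repository and backup paths here are placeholders):

#!/bin/bash
REPO_ROOT="/home/me/svn"        # directory that holds all repositories (placeholder)
BACKUP_ROOT="/backup/svn"       # backup destination (placeholder)

mkdir -p "$BACKUP_ROOT/hotcopy" "$BACKUP_ROOT/dumps"

# Anything under REPO_ROOT with a top-level "format" file is treated as a repository
for repo in "$REPO_ROOT"/*; do
    [ -f "$repo/format" ] || continue
    name=$(basename "$repo")

    rm -rf "$BACKUP_ROOT/hotcopy/$name"
    svnadmin hotcopy "$repo" "$BACKUP_ROOT/hotcopy/$name"            # safe copy of a live repository
    svnadmin dump --quiet "$repo" > "$BACKUP_ROOT/dumps/$name.dump"  # portable full dump
done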
My work machine gets backed up however they handle it, however I also have the same script running to do hotcopies and svndumps onto a couple of locations that get backed up.
I make sure that of the work backups, one location is NOT on the SAN, yes it gets backed up and managed, but when it is down, it is down.
I would like a recommendation for an external RAID container, or perhaps just an external drive container, preferably interfacing using FireWire 800.
I also would like a recommendation for a manufacturer for the backup drives to go into the container. I read so many reviews of drives saying that they failed I'm not sure what to think.
I don't like backup services like Mozy because I don't want to trust them to not look at my data.
SuperDuper complete bootable backups every few weeks
Time Machine backups for my most important directories daily
Code is stored in network subversion/git servers
MySQL backups with cron on the web servers; ssh/rsync pulls them down onto our local servers, also via a nightly cron.
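A hedged sketch of that MySQL dump-and-pull arrangement; host names, the credentials file and paths are placeholders:

# On the web server (nightly cron):
mysqldump --defaults-extra-file=/root/.my.cnf --all-databases \
    | gzip > /var/backups/mysql/all-$(date +%F).sql.gz

# On the local server (cron, later the same night):
rsync -az -e ssh backup@web.example.com:/var/backups/mysql/ /backup/mysql/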
If you use a Mac, it's a no brainer - just plug in an external hard drive and the built in Time Machine software will back up your whole system, then maintain an incremental backup on the schedule you define. This has got me out of a hole many a time when I've messed up my environment; it also made it super easy to restore my system after installing a bigger hard drive.
For offsite backups, I like JungleDisk - it works on Mac, Windows and Linux and backs up to Amazon S3 (or, added very recently, the Rackspace cloud service). This is a nice solution if you have multiple machines (or even VMs) and want to keep certain directories backed up without having to think about it.
Home Server Warning!
I installed Home Server on my development server for two reasons: it's a cheap version of Windows Server 2003, and for backup.
The backup software side of things is seriously hit or miss. If you 'Add' a machine to the list of computers to be backed up right at the start of installing Home Server, generally everything is great.
BUT it seems it becomes a WHOLE lot harder to add any other machines after a certain amount of time has passed.
(Case in point: I did a complete rebuild on my laptop, tried to Add it - NOPE!)
So I'm seriously doubting the reliability of this platform for backup purposes. It seems to defeat the purpose if you can't trust it 100%.
I have the following backup scenarios and use rsync as a primary backup tool.
(weekly) Windows backup for "bare metal" recovery
Content of System drive C:\ using Windows Backup for quick recovery after physical disk failure, as I don't want to reinstall Windows and applications from scratch. This is configured to run automatically using Windows Backup schedule.
(daily and conditional) Active content backup using rsync
Rsync takes care of all changed files from the laptop, phone, and other devices. I back up the laptop every night and after significant changes in content, like importing recent photo RAWs from an SD card to the laptop.
I've created a bash script that I run from Cygwin on Windows to start rsync: https://github.com/paravz/windows-rsync-backup