Offsite backups - possible with large amounts of code/source, images, etc.? (development-environment)

The biggest hurdle I have in developing an effective backup strategy is managing some sort of offsite backup. Unfortunately, the only way to get data offsite is to upload it, and my cable internet connection's upload speed makes that prohibitive.
Has anyone here managed to do offsite backups of large libraries of source code?
This question is about home users, not the workplace, where bigger budgets may open up more options.
EDIT: I am using Windows Vista (so *nix solutions aren't relevant).
Thanks

I don't think your connection's upload speed will be as prohibitive as you think. Just make sure you look for a solution where your changes can be sent as diffs. Even if your initial sync takes days, the daily changes would likely be much more manageable.
Knowing a few more specifics about how much data you are talking about, and exactly how slow your connection is, would let the community make more specific suggestions.
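As a toy Python sketch of the "send only what changed" idea (the directory and marker file are placeholders; real tools like rsync or the online-backup clients do this far more efficiently, down to the block level):

import os

ROOT = r"C:\projects"                 # placeholder source tree
STAMP = r"C:\backup\last_run.txt"     # placeholder marker file recording the last backup time

last_run = os.path.getmtime(STAMP) if os.path.exists(STAMP) else 0
changed = []
for dirpath, _dirs, files in os.walk(ROOT):
    for name in files:
        path = os.path.join(dirpath, name)
        if os.path.getmtime(path) > last_run:
            changed.append(path)

print(len(changed), "files to upload this cycle")
open(STAMP, "w").close()              # touch the marker so the next run only sees newer files

Even a multi-gigabyte source tree usually changes by only a few megabytes a day, which is why the daily uploads stay manageable.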

Services like Mozy allow you to back up large amounts of data offsite.
They upload slowly in the background, and getting the initial sync to the servers can take a while depending on your speed and amount of data, but after that they use efficient diffs to keep the stored data in sync.
The pricing is very home-friendly too.

I think you have to define what "backup" means and what's acceptable to you.
At my house, I have a hot backup of our repositories: I poll SVN once an hour over the VPN and pull down any check-ins. This just catches check-ins that would not otherwise be captured by the normal 24-hour backup. I also send a full backup through the pipe every 2 days, outside the normal 3-tier backup we do at the office. Our current repository is 2GB zipped at maximum compression. That takes 34 hrs at 12 k/s and 17 hrs at 24 k/s; you did not say the speed of your connection, so it's hard to judge whether that's workable.
If this isn't viable, you might want to invest in a couple of 2.5" USB drives and rotate them offsite to a safety deposit box at the bank. This used to be my responsibility, but I lacked the discipline to do it consistently each week to maintain a real safety net. In the end it was just easier to live with uploading the data to an FTP site at my house.
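For the hourly poll, a minimal Python sketch of the kind of job you could schedule (with Task Scheduler on Windows, or cron elsewhere), assuming a local mirror repository already initialised with svnsync against the office repository over the VPN; the URL is a placeholder:

import subprocess

MIRROR_URL = "file:///C:/backups/repo-mirror"   # placeholder local mirror repository

# Pull down any check-ins made since the last run.
subprocess.run(["svnsync", "synchronize", MIRROR_URL], check=True)

svnsync only transfers the new revisions, so each hourly run moves very little data compared with shipping the full zipped repository.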

Related

How should data be provided to a web server using a data warehouse?

We have data stored in a data warehouse as follows:
Price
Date
Product Name (varchar(25))
We currently only have four products. That changes very infrequently (on average once every 10 years). Once every business day, four new data points are added representing the day's price for each product.
On the website, a user can request this information by entering a date range and selecting one or more product names. Analytics shows that the feature is not heavily used (about 10 user requests per week).
It was suggested that the data warehouse should daily push (SFTP) a CSV file containing all data (currently 6718 rows of this data and growing by four each day) to the web server. Then, the web server would read data from the file and display that data whenever a user made a request.
Usually, the push would only be once a day, but more than one push could be possible to communicate (infrequent) price corrections. Even in the price correction scenario, all data would be delivered in the file. What are problems with this approach?
Would it be better to have the web server make a request to the data warehouse per user request? Or does this have issues such as a greater chance for network errors or performance issues?
Would it be better to have the web server make a request to the data warehouse per user request?
Yes, it would. You have very little data, so there is no need to try to "cache" it in some way (apart from the fact that CSV might not be the best format for that anyway).
There is nothing stopping you from making these requests from the web server to the database server. With as little data as this, you will not find performance an issue, and even if it became one as everything grows, there is a lot to be gained on the database side (indexes etc.) that will let you survive the next 100 years in this fashion.
The number of requests from your users (also extremely small) does not need any special treatment, so again, a direct query is best.
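To make the "direct query" option concrete, here is a minimal Python sketch of the per-request lookup, assuming a table and columns named daily_price(product_name, price_date, price); the table name, DSN and the use of pyodbc are all placeholders for whatever your warehouse actually provides:

import pyodbc

conn = pyodbc.connect("DSN=warehouse")   # placeholder connection
cursor = conn.cursor()
cursor.execute(
    "SELECT product_name, price_date, price "
    "FROM daily_price "
    "WHERE price_date BETWEEN ? AND ? AND product_name IN (?, ?)",
    ("2015-01-01", "2015-03-31", "Product A", "Product B"),
)
rows = cursor.fetchall()

With a few thousand rows and an index on (product_name, price_date), this returns essentially instantly; using parameter placeholders instead of string concatenation also takes care of the SQL-injection concern discussed further down.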
Or does this have issues such as a greater chance for network errors or performance issues?
Well, it might, but that would not justify the CSV method. Some examples, and why you need not worry:
The connection to the database server is down.
This is an issue for both methods, but with only one connection per day, the chance of hitting a 1-in-10,000 failure might seem lower for the once-a-day approach. These issues should not come up very often, though, and when they do, you can handle them (retry the request, show a message to the user). This is what enormous numbers of websites do, so trust me when I say it will not be an issue. Also, think about what it would mean if your daily update failed: that would be a bigger problem!
Performance issues
As said, given the amount of data and the number of requests, this is not a problem. And even if it became one, it is a problem you can catch at a different level: use a caching system (not CSV) on the database server, use a caching system on the web server, or fix your indexes so performance stops being a problem.
BUT:
It is far from strange to want your data warehouse separated from your web system. If this is a requirement, and it surely could be, the best thing you can do is re-create your warehouse database (the one I just defended as being good enough to query directly) on another machine. You might get good results with a master-slave setup:
Your data warehouse is the master database: it sends all changes to the slave but is inaccessible otherwise.
Your second database (even on your web server) gets all updates from the master and is read-only; you can only query it for data.
Your web server cannot connect to the data warehouse, but can connect to the slave to read information. Even if there were an injection attack, it wouldn't matter, as the slave is read-only.
Now there is never a moment when you have to push updates to the queried database (master-slave replication keeps it up to date at all times), and there is no chance that queries from the web server put your warehouse in danger. Profit!
I don't really see how SQL injection could be a real concern. I assume you have some calendar-type field that the user fills in to get data out. If that is the only form, just ensure the only field in it is a date, and something like DROP TABLE isn't possible. Getting access to the database is another issue, but a separate file containing just the connection function should do fine in most cases, so that a user can't, say, open your web page in an HTML viewer and see your database connection string.
As for the CSV, I would say that querying the database per user request, especially when the feature is only used ~10 times weekly, is much more efficient. The CSV is overkill: with only ~10 users trying to get some information, exporting an updated CSV every day is too much work for such a small payoff.
EDIT:
Also, if an attack is a big concern (which really depends on the nature of the business, the data being stored, and the visitors you receive), you could always keep a backup as another option. I don't really see a reason for this as your question is currently stated, but even with the best security an attack could happen; that mainly depends on whether attackers want the information you have.

Query about EBS Backed Instances + Backup on S3 + Snapshots

I've spent a number of days looking into putting up two Windows servers on Amazon, a domain controller and a Remote Desktop Services server, but there are a few questions I can't find detailed answers for (or any answers at all):
1) When you have an EBS-backed instance, I assume this means that all files (OS, applications, pagefile, etc.) are stored on EBS? Physically in the data centre, let's assume I have 50 gig of OS files/application data; is this all stored on just one SAN-type device? What happens if that device blows up, or that particular data centre gets destroyed? Is the data elsewhere? What is the probability that your entire EBS volume can just disappear?
2) As I understand it, you can back up your EBS instance to S3 with snapshotting. I assume you can choose how often to snapshot (say, daily?). In my above scenario, if I have 50 gig of files and snapshot once a day, over 7 days will my S3 storage be 350 gig, or will it be 50 gig plus the incremental changes I have made over the week?
3) I remember reading somewhere that the instance has to go offline to snapshot. If that is the case, does it do this by shutting down the guest OS, snapshotting, then booting back up, or does it just detach the data and prevent you from connecting while it snapshots, then bring it back to the exact moment before the snapshot?
4) I understand the concept of paying per gigabyte per month for space, but I am concerned about the $0.11 per 1 million I/O requests. How does that work when I am running a Windows server? I have no idea how many I/O requests a server makes to its disks, and I assume it is a lot since the entire VM is stored on an EBS volume. Is running a server on standard EBS going to slow it down radically?
5) Are people using snapshots to S3 as their main backup, or are people running other types of backups for their data?
Sorry for the noob questions - I'd appreciate any partial answers, full answers, or advice anyone could offer me. Thanks in advance!
1) Amazon is fuzzy on this. They say that data is replicated within the Availability Zone (AZ) it belongs to, and that if you have less than 20GB of data changed since the last snapshot, your annual failure rate is roughly 0.1-0.4%.
2) Snapshots are triggered manually, and are stored incrementally.
3) It depends on your filesystem. For example, on a Linux box with an XFS volume you can freeze I/O to the volume, take your snapshot (initiating it takes only a second or so), and then unfreeze. If you take a snapshot without doing something similar, you run the risk of the data being in an inconsistent state.
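To make that concrete, here is a minimal Python sketch of the freeze-snapshot-unfreeze sequence, assuming boto3, an XFS volume mounted at /data and a placeholder EBS volume ID (run it on the instance with appropriate AWS credentials):

import subprocess
import boto3

MOUNT_POINT = "/data"                      # placeholder mount point of the XFS data volume
VOLUME_ID = "vol-0123456789abcdef0"        # placeholder EBS volume ID

ec2 = boto3.client("ec2")
subprocess.run(["xfs_freeze", "-f", MOUNT_POINT], check=True)      # pause writes
try:
    snap = ec2.create_snapshot(VolumeId=VOLUME_ID, Description="consistent backup snapshot")
    print("Started snapshot", snap["SnapshotId"])
finally:
    subprocess.run(["xfs_freeze", "-u", MOUNT_POINT], check=True)  # resume writes

The snapshot continues in the background after the call returns, so the volume only needs to stay frozen for the second or so it takes to initiate it.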
4) I run all my instances on EBS. You probably wouldn't want your pagefile on EBS, though; it would make more sense to use instance storage for that. The I/O count depends heavily on your workload - an application server does far fewer IOPS than a database server, for example. You're unlikely to spend more than a few dollars a month per volume unless you're running particularly I/O-heavy operations.
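For a rough sense of scale at the $0.11 per million I/O requests mentioned in the question, a back-of-the-envelope Python calculation (the average IOPS figure is a made-up example, not a measurement):

avg_iops = 20                              # assumed sustained average I/O operations per second
seconds_per_month = 30 * 24 * 3600
requests = avg_iops * seconds_per_month    # about 52 million requests
print("monthly I/O cost: $%.2f" % (requests / 1_000_000 * 0.11))   # about $5.70

So even a steady 20 IOPS around the clock only costs a few dollars a month; a mostly idle application server would come in well below that.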
5) Personally I don't care about the installed software/configuration (I have AMIs with all of that set up, so I can restore it in minutes); I only care about the data, and I back that data up separately (S3 & Glacier). That's partly because I was bitten by a bug EBS had about a year or so ago where they lost some snapshots.
You can also use multiple strategies, as Fantius commented. For example, on the MongoDB servers I run, the boot volume is small (and never snapshotted or backed up, since it can be restored automatically from an AMI), with a separate data volume containing the actual MongoDB data. The data volume is snapshotted, and dumps are also stored on S3. Snapshots are an efficient way of creating backups (since you're only storing incremental changes), but you can't transfer them out of your EC2 region, whereas a tarball on S3 can easily be copied anywhere.

SimpleDB and Amazon S3 performance tools

Are there any benchmark tools that I can use to test Amazon SimpleDB performance and Amazon S3 performance?
Help needed, please.
It's going to depend on your usage and whether or not you're running in EC2. There are some benchmarks out there for S3 access from EC2, but your mileage will vary with object sizes, the SDK library you're using, and where you're accessing from.
Roll your own tests and then you'll know that you're testing something close to your end goal...
You need to write your own code that approximates what you want to do.
Having said that: In my experience, S3 is about as fast as your connection. You may have to upload/download more than one item at a time to hit your local bandwidth limit, but you can get there.
Listing performance is also pretty good on S3, but the results are uncompressed XML, so they are a little large. If you want to do 'something' to, say, a million files, you need to run several requests in parallel. This goes for SimpleDB too. The number of requests 'in flight' that works best is a mix of ping time, bandwidth, AWS service response times and other factors.
SimpleDB, on the other hand, I find to be pretty slow for many tasks. It totally depends on your needs, though. Selecting a record and getting back its attributes when you know the item name is usually limited by ping time, but searching with the %like% operator is usually quite slow (it's easy to hit multiple seconds).
Add to this that it's all much faster if you are running on EC2 versus a local machine, plus the extra delay/bandwidth if your app is in, say, Singapore and you are trying to use the US Standard region to store everything. There is just too much to factor in.
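If you do roll your own test, a minimal Python sketch with boto3 is enough to get a first number (bucket name, key and payload size are placeholders; results will differ greatly between EC2 and a local machine):

import time
import boto3

s3 = boto3.client("s3")
bucket, key = "my-test-bucket", "benchmark/object-1"   # placeholders
payload = b"x" * (1024 * 1024)                         # 1 MiB test object

start = time.perf_counter()
s3.put_object(Bucket=bucket, Key=key, Body=payload)
print("upload:", time.perf_counter() - start, "seconds")

start = time.perf_counter()
s3.get_object(Bucket=bucket, Key=key)["Body"].read()
print("download:", time.perf_counter() - start, "seconds")

Repeat with your real object sizes, and with several transfers in parallel, before drawing any conclusions.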

Amazon S3: when/why [closed]

So, I have a dedicated server. I host about a dozen or so small sites.
Is there a real benefit to using S3 (or Mosso) for my image and static file hosting? My server has more than enough disk space, so am I completely missing the point of S3?
I keep reading about how wonderful and cheap it is, and I ask myself "self, why aren't you using this" and the reply is always "why?"
If you're running within the included storage and bandwidth of your server and your needs are being served well, you are already doing the simplest thing that works for you, and that is where you should always start. Off the top of my head I can think of a couple of reasons why you might want to move some storage to S3 in the future:
Your storage or bandwidth needs grow beyond what you have and S3 is cheaper than upgrading your current solution
You move to a multiple-dedicated-server solution for failover/performance reasons and want to be able to store your assets in a single shared location
Your bandwidth needs are highly variable (so you can avoid a monthly fee when you're not getting traffic) [Thanks Jim, from the comments]
If you run an entire website off of a single machine, and that machine is more than enough to handle your site, then kudos, images are not a bottleneck that needs solving right now. Forget about S3 for now.
However, as your server gets busier, you will want your server to be spending all of its time doing server things. Transferring static content like flat HTML files and images is an easy, dumb job, and wasting precious active connections, bandwidth, and CPU cycles on them is no good. By switching to S3, your server can concentrate on doing what's important, which is whatever your program actually DOES.
S3 also has benefits of being distributed around and attached to what's probably a fatter pipe than your server, which means the images will show up slightly more quickly on your client's machines, so that's an added bonus.
S3 is also backed up, which means that it makes for a pretty nice place to store pretty much any private data under the sun, in addition to stuff that you want to serve to others (although don't confuse the permissions settings between those two things -- in fact, you may want to use separate accounts entirely).
S3 is also nigh-infinite, which means that if you want to let users upload files to your site (profile images, attachments, etc), S3 is a great choice so that you don't have to constantly worry if your server is going to run out of disk space (obligatory $$$ warning here).
But like I said at the top, if you're a one-server setup with a handful of users, none of this really matters. It's a tool like any other, and it may not be something you need yet.
It's simply a matter of doing the numbers: given a certain amount of traffic for a set of files, you can calculate exactly how much hosting those files on S3 would cost you, and you should be able to do the same for your current provider. If the number is lower for S3, there you have your reason.
An added benefit is that S3 scales pretty much linearly with traffic and you pay only for what you actually use, whereas most providers charge you a flat fee no matter how little traffic you actually have, and some will gouge you badly if you ever exceed the maximum traffic included in the flat fee.
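As a toy illustration of "doing the numbers" in Python (every rate below is a placeholder, not a real price; substitute current S3 pricing and your own provider's fees):

storage_gb = 20            # assumed data stored on S3
transfer_gb = 100          # assumed monthly outbound traffic
s3_storage_rate = 0.15     # placeholder $/GB-month
s3_transfer_rate = 0.15    # placeholder $/GB transferred
flat_host_fee = 30.00      # placeholder flat fee from the current provider

s3_cost = storage_gb * s3_storage_rate + transfer_gb * s3_transfer_rate
print("S3: $%.2f vs flat fee: $%.2f" % (s3_cost, flat_host_fee))   # $18.00 vs $30.00

Whichever side comes out lower, and stays lower as traffic grows, is the one to pick.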
Better speed and availability could be an additional benefit.
Basically, if you have a site that could see wildly variable traffic, then using S3 for its images and other static files means that if you're hit by the Slashdot effect, the site has a much better chance of staying reachable, and you have a much better chance of avoiding nasty surprises from excess traffic fees.
The advantages of Amazon S3 are reliability, scalability, speed and cost. Here is some info on each.
Reliability: Amazon stores your data in multiple data centers. If there was a disaster and one data center was destroyed your content would continue to be served from the second data center. It’s very unlikely that data you upload to Amazon would ever be lost.
Scalability: If one of your web sites becomes popular and millions of people visit the site, your web server will not be able to handle the load. In comparison when you upload your files to Amazon they are stored in multiple locations. If the load on your content grows your files are automatically replicated to more servers so your files will always be available.
Speed: Amazon has a service called CloudFront that works in conjunction with Amazon S3. When you activate CloudFront on your S3 content, your content is cached at edge locations. These are servers that make your content available for high-speed transfer.
Cost: With Amazon S3 you only pay for what you use. If you have a few files that get little traffic you will only pay a few cents a month.
SprightlySoft has a blog post which gives even more reasons why Amazon S3 is great. Read it at http://sprightlysoft.com/blog/?p=8
If you're hosting a high-traffic site, the bandwidth cost (and latency issues) of hosting images yourself makes S3 and other services like Akamai attractive. For a low-traffic site, it probably isn't an issue.
I'd say that there's no reason if your base hosting plan provides enough space/bandwidth. Where I think it's useful is when your file transfers become enough that you have to look at buying an add-on of storage/bandwidth from the provider -- in that case, S3 may be a viable alternative. But if I'm paying $X/month and not using all of the storage, there's no upside to it.
On the other hand, if your capacity planning calls for you to someday exceed the provider's limits, S3 may be a good solution from the start so you don't have files being served from multiple places.
I would second the mention of "redundancy" -- you can count on any content that's in S3 being distributed to multiple data centers, and effectively being accessible pretty much always for anyone with a functioning network connection.
Cost may be another factor: data transfer rates for S3 are quite competitive.
And speed is the last one: you can access data VERY fast from S3. But that's more of an issue for data other than browser-viewable images.
For small sites, S3 or Mosso may not be that reasonable for image hosting, but if you have any video files (.wmv, .flv, etc...) or large downloads (app distributions, etc..), I'd still put them on S3 or Mosso to save potential bandwidth spikes if for some odd reason, your content becomes wildly popular.
You write:
My server has more than enough disk space, or am I completely missing the point of S3?
You are not missing the point if what you have on your server is write-once, read-less-than-once stuff, such as disaster-recovery backups (which you hope will never be read), because transfer times will not matter. The point of S3 is delivery speed.
First, S3 distributes your content geographically. End users benefit from shorter paths.
Second, S3 can act as a BitTorrent seed, which not only conserves your bandwidth, it means your most popular content will be distributed faster because it can take advantage of the ad-hoc swarm. There are reports on the AWS Discussion Forums that S3 support of the BitTorrent protocol is "very, very spotty." I have not tested it myself.
Many of you won't have this problem, but if you (and your web server) are located in Australia (read: the 3rd world of the Internet), you run into the issue that S3 does not have geographically close locations, which means there will be a higher latency on your images and other static content. Scalable: yes. Fast: no.
From what I hear, besides low cost, the main advantage is the ease of backup from an EC2 setup.
Link..
http://groups.drupal.org/node/2383
Speed might be the only benefit. If your dedicated server is simply networked through your ISP (which may well throttle upstream speeds even if downstream speeds are high) then you might find that your sites are often slow to load. If so, then S3 or another dedicated server provider can help. Other than that, I can think of absolutely no reason why Amazon's service would be more appropriate for you - especially with simple, static sites.
It's not really directly related to your actual hosting of web sites, but it's certainly an important part of it, especially if the sites don't belong to you alone -- S3 is a great backup solution. There are tools such as duplicity that can automatically and efficiently back things up onto S3 for you, and it's extremely cheap for this purpose. I back up a fairly large amount of data for less than $1/month.
Besides the fat-pipe and local-delivery arguments for S3, there is also the matter that a single server does not function optimally when it's acting both as a DB server and as a file server. If you're running any sort of DB, I would suggest offloading all your static files to S3. The cost is trivial and you will see pretty big performance gains on page load.

How do you back up your development machine? [closed]

How do you back up your development machine so that in the event of a catastrophic hardware malfunction, you are up and running in the least amount of time possible?
There's an important distinction between backing up your development machine and backing up your work.
For a development machine your best bet is an imaging solution that offers as near a "one-click-restore" process as possible. TimeMachine (Mac) and Windows Home Server (Windows) are both excellent for this purpose. Not only can you have your entire machine restored in 1-2 hours (depending on HDD size), but both run automatically and store deltas so you can have months of backups in relatively little space. There are also numerous "ghosting" packages, though they usually do not offer incremental/delta backups so take more time/space to backup your machine.
Less good are products such as Carbonite/Mozy/JungleDisk/RSync. These products WILL allow you to retrieve your data, but you will still have to reinstall the OS and programs. Some have limited/no histories either.
In terms of backing up your code and data then I would recommend a sourcecode control product like SVN. While a general backup solution will protect your data, it does not offer the labeling/branching/history functionality that SCC packages do. These functions are invaluable for any type of project with a shelf-life.
You can easily run a SVN server on your local machine. If your machine is backed up then your SVN database will be also. This IMO is the best solution for a home developer and is how I keep things.
All important files are in version control (Subversion)
My subversion layout generally matches the file layout on my web server so I can just do a checkout and all of my library files and things are in the correct places.
Twice-daily backups to an external hard drive
Nightly rsync backups to a remote server.
This means that I send stuff on my home server over to my webhost and all files & databases on my webhost back home so I'm not screwed if I lose either my house or my webhost.
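A minimal Python sketch of what that nightly cross-sync could look like (hosts and paths are placeholders; the real thing would simply be cron-driven rsync commands):

import subprocess

# Push local work to the web host...
subprocess.run(["rsync", "-az", "--delete",
                "/home/me/projects/", "me@webhost:/backups/projects/"], check=True)

# ...and pull the web host's files and database dumps back home.
subprocess.run(["rsync", "-az",
                "me@webhost:/var/backups/", "/home/me/webhost-backups/"], check=True)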
I use Mozy, and rarely think about it. That's one weight off my shoulders that I won't ever miss.
Virtual machines and CVS.
Desktops are rolled out with Ghost and are completely vanilla, except that they have VirtualBox.
Developers then pull the configured baseline development environment down from CVS.
They log into the development VM image as themselves, refresh the source and libraries from CVS, and they're up and working again.
This also makes doing development and maintenance at the same time a lot easier.
(I know some people won't like CVS or VirtualBox, so feel free to substitute your tools of choice.)
Oh, and you check your work into a private branch off trunk daily.
There you go.
Total time to recover: 1 hour (tops).
Time to "adopt" a shiny new laptop for a customer visit: 1 hour (tops).
And a step towards CMMI Configuration Management.
BTW your development machine should not contain anything of value. All your work (and your company's work) should be in central repositories (SVN).
I use TimeMachine.
For my home and development machines I use Acronis True Image.
In my opinion, with hard drives as cheap as they are, nothing replaces a full incremental daily HD backup.
A little preparation helps:
All my code is kept organized in one single directory (with categorized sub-directories).
All email is kept in various PSTs.
All code is also checked into source control at the end of every day.
All documents are kept in one place as well.
Backup:
Backup your code, email, documents as often as it suits you (daily).
Keep an image of your development environment always ready.
Failure and Recovery
If everything fails, format and install the image.
Copy back everything from backup and you are up and running.
Of course there are tweaks here and there (incremental backup, archiving, etc.) which you have to do to make this process real.
If you are talking about the absolute least amount of restore time... I've often set up machines to do Ghost (Symantec or something similar) backups on a nightly basis, either to an image or just a direct copy to another drive. That way all you have to do is reimage the machine from the image or just swap the drives. You can be back up in under 10 minutes. The setup I did before was in a situation where we had some production servers that were redundant and it was acceptable for them to be offline long enough to clone the drive, but only at night; during the day they had to be up 100%. It saved my butt a couple of times when a main drive failed: I just opened the case, swapped the cables so the backup drive was the new master, and was back online in 5 minutes.
I've finally gotten my "fully automated data back-up strategy" down to a fine art. I never have to manually intervene, and I'll never lose another harddrive worth of data. If my computer dies, I'll always have a full bootable back-up that is no more than 24 hours old, and incremental back-ups no more than an hour old. Here are the details of how I do it.
My only computer is a 160 gig MacBook running OSX Leopard.
On my desk at work I have 2 external 500 gig harddrives.
One of them is a single 500 gig partition called "External".
The other has a 160 gig partition called "Clone" and a 340 gig partition called TimeMachine.
TimeMachine runs whenever I'm at work, constantly backing up my "in progress" files (which are also committed to Version Control throughout the day).
Every weekday at 12:05, SuperDuper! automatically copies my entire laptop harddrive to the "Clone" drive. If my laptop's harddrive dies, I can actually boot directly from the Clone drive and pick up work without missing a beat -- giving me some time to replace the drive (This HAS happened to me TWICE since setting this up!). (Technical Note: It actually only copies over whatever has changed since the previous weekday at 12:05... not the entire drive every time. Works like a charm.)
At home I have a D-Link DNS-323, which is a 1TB (2x500 gig) Network Attached Storage device running a Mirrored RAID, so that everything on the first 500 gig drive is automatically copied to the second 500 gig drive. This way, you always have a backup, and it's fully automated. This little puppy has a built-in Dynamic DNS client, and FTP server.
So, on my WRT54G router, I forward the FTP port (21) to my DNS-323, and leave its FTP server up.
After the SuperDuper clone has been made, rSync runs and synchronizes my "External" drive with the DNS-323 at home, via FTP.
That's it.
Using 4 drives (2 external, 2 in the NAS) I have:
1) An always-bootable complete backup less than 24 hours old, Monday-Friday
2) A working-backup of all my in-progress files, which is never more than 30 minutes old, Monday-Friday (when I'm at work and connected to the external drives)
3) Access to all my MP3s (170GB) and documents at work on the "External" drive and at home on the NAS
4) Two complete backups of all my MP3s and documents on the NAS (External is original copy, both drives on NAS are mirrors via ChronoSync)
Why do I do all of this?
Because:
1) In 2000, I dropped a 40 gig harddrive 1 inch, and it cost me $2500 to get that data back.
2) In the past year, I've had to take my MacBook in for repair 4 times. One dead harddrive, two dead motherboards, and a dead webcam. On the 4th time, they replaced my MacBook with a newer better one at no charge, and I haven't had a problem since.
Thanks to my daily backups, I didn't lose any work, or productivity. If I hadn't had them, though, all my work would have been gone, along with my MP3s, and my writing, and all the photos of my trips to Peru, Croatia, England, France, Greece, Netherlands, Italy, and all my family photos. Can you imagine? I'm sure you can, because I bet you have a pile of digital photos sitting on your computer right now... not backed-up in any way.
A combination of RAID1, Acronis, xcopy, DVDs and ftp. See:
http://successfulsoftware.net/2008/02/04/your-harddrive-will-fail-its-just-a-question-of-when/
Maybe just a simple hardware hard disk RAID would be a good start. That way, if one drive fails, you still have the other drive in the RAID. If something other than the drives fails, you can pop these drives into another system and get your files back quickly.
I'm just sorting this out at work for the team. An image with all common tools is on the network (we actually have a hot-swap machine ready). All work in progress is on the network too.
So a developer's machine goes boom: use the hot-swap machine and continue. Downtime: ~15 mins plus a coffee break.
We have a corporate solution pushed down on us called Altiris, which works when it wants to. It depends on whether or not it's raining outside. I think Altiris might be a rain-god, and just doesn't know it. I am actually delighted when it's not working, because it means that I can have my 99% of CPU usage back, thank you very much.
Other than that, we don't have any rights to install other software solutions for backing things up or places we are permitted to do so. We are not permitted to move data off of our machines.
So, I end up just crossing my fingers while laughing at the madness.
I don't.
We do continuous integration, submit code often to the central source control system (which is backed up like crazy!).
If my machine dies at most I've lost a couple of days work.
And all I need to do is get a clean disk and set up the dev environment from a ghost image, or by spending a day sticking CDs in, rebooting after Windows updates, etc. Not a pleasant day, but I do get a nice clean machine.
At work NetBackup or PureDisk depending on the box, at home rsync.
Like a few others, I have a clean copy of my virtual PC that I can grab to start fresh at any time, and all code is stored in Subversion.
I use SuperDuper! and back up my virtual machine to another external drive (I have two).
All the code is on a SVN server.
I have a clean VM in case mine fails, but either way it takes me a couple of hours to install WinXP + Visual Studio; I don't use anything else on that box.
I use xcopy to copy all my personal files to an external hard drive on startup.
Here's my startup.bat:
xcopy d:\files f:\backup\files /D /E /Y /EXCLUDE:BackupExclude.txt
This recurses into subdirectories, copies only files that have been modified, and suppresses the prompt when replacing an existing file; the files/folders listed in BackupExclude.txt will not be copied.
Windows Home Server. My dev box has two drives with about 750GB of data between them (C: is a 300GB SAS 15K RPM drive with apps and system on it, D: is a mirrored 1TB set with all my enlistments). I use Windows Home Server to back this machine up and have successfully restored it several times after horking it.
My development machine is backed up using Retrospect and Acronis. These are nightly backups that run when I'm asleep - one to an external drive and one to a network drive.
All my source code is in SVN repositories, I keep all my repositories under a single directory so I have a scheduled task running a script that spiders a path for all SVN repositories and performs a number of hotcopies (using the hotcopy.py script) as well as an svndump of each repository.
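A minimal Python sketch of that spider-and-dump script, using svnadmin hotcopy in place of the hotcopy.py helper; the root and backup paths are placeholders, and a directory is treated as a repository if it contains the standard "format" file:

import os
import subprocess

REPO_ROOT = r"C:\svn-repos"      # placeholder directory holding all repositories
BACKUP_DIR = r"D:\svn-backups"   # placeholder backup destination

for name in os.listdir(REPO_ROOT):
    repo = os.path.join(REPO_ROOT, name)
    if not os.path.isfile(os.path.join(repo, "format")):
        continue                                  # not an SVN repository
    subprocess.run(["svnadmin", "hotcopy", repo,
                    os.path.join(BACKUP_DIR, name + "-hotcopy")], check=True)
    with open(os.path.join(BACKUP_DIR, name + ".dump"), "wb") as out:
        subprocess.run(["svnadmin", "dump", repo], stdout=out, check=True)

Note that svnadmin hotcopy expects a fresh destination directory on each run, so a real script would rotate or clear the previous copies first.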
My work machine gets backed up however the company handles it, but I also have the same script running to do hotcopies and svndumps onto a couple of locations that get backed up.
I make sure that, of the work backups, one location is NOT on the SAN. Yes, the SAN gets backed up and managed, but when it is down, it is down.
I would like a recommendation for an external RAID container, or perhaps just an external drive container, preferably interfacing using FireWire 800.
I also would like a recommendation for a manufacturer for the backup drives to go into the container. I read so many reviews of drives saying that they failed I'm not sure what to think.
I don't like backup services like Mozy because I don't want to trust them to not look at my data.
SuperDuper complete bootable backups every few weeks
Time Machine backups for my most important directories daily
Code is stored in network subversion/git servers
MySQL backups run via cron on the web servers; we use ssh/rsync, also via nightly cron, to pull them down onto our local servers (a sketch follows).
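A minimal Python sketch of the pull half of that arrangement, assuming the dumps already land in a directory on the web server (host names and paths are placeholders; in practice this would be invoked by cron):

import subprocess

# Pull the server-side MySQL dumps down to a local backup directory over SSH.
subprocess.run(
    ["rsync", "-az", "-e", "ssh",
     "deploy@webhost:/var/backups/mysql/",    # placeholder remote dump directory
     "/srv/backups/mysql/"],                  # placeholder local destination
    check=True,
)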
If you use a Mac, it's a no brainer - just plug in an external hard drive and the built in Time Machine software will back up your whole system, then maintain an incremental backup on the schedule you define. This has got me out of a hole many a time when I've messed up my environment; it also made it super easy to restore my system after installing a bigger hard drive.
For offsite backups, I like JungleDisk - it works on Mac, Windows and Linux and backs up to Amazon S3 (or, added very recently, the Rackspace cloud service). This is a nice solution if you have multiple machines (or even VMs) and want to keep certain directories backed up without having to think about it.
Home Server Warning!
I installed Home Server on my development server for two reasons: it's a cheap version of Windows Server 2003, and it does backups.
The backup software side of things is seriously hit or miss. If you 'Add' a machine to the list of computers to be backed up right at the start of installing Home Server, generally everything is great.
BUT it seems it becomes a WHOLE lot harder to add any other machines after a certain amount of time has passed.
(Case in point: I did a complete rebuild on my laptop, tried to Add it - NOPE!)
So I'm seriously doubting the reliability of this platform for backup purposes. It seems to defeat the purpose if you can't trust it 100%.
I have the following backup scenarios and use rsync as a primary backup tool.
(weekly) Windows backup for "bare metal" recovery
Content of System drive C:\ using Windows Backup for quick recovery after physical disk failure, as I don't want to reinstall Windows and applications from scratch. This is configured to run automatically using Windows Backup schedule.
(daily and conditional) Active content backup using rsync
Rsync takes care of all changed files from the laptop, phone and other devices. I back up the laptop every night and after significant changes in content, like importing recent photo RAWs from an SD card to the laptop.
I've created a bash script that I run from Cygwin on Windows to start rsync: https://github.com/paravz/windows-rsync-backup