What's a good way to back up (and maybe synchronize) your development machine? [closed]

I make extensive use of source control for anything that relates to a project I'm working on (source, docs etc.) and I've never lost anything that way.
However, I have had two or three crashes (spread over the last 4 years) on my development machine that forced me to reinstall my system and reconfigure my apps (eclipse, vim, Firefox, etc.). For weeks after reinstalling, I was missing one little app or another, some PHP or Python module wasn't there, stuff like that.
While this is not fatal, it's very annoying and sucks up time. Because it seemed so rare, I didn't bother with an actual solution, but meanwhile I've developed a mindset where I just don't want stuff like that happening anymore.
So, what are good backup solutions for a development machine? I've read this very similar question, but that guy really wants something different than me.
What I want is to have spare harddrives on the shelf and reduce my recovery time after a crash to something like an hour or less.
Thinking about this, I figured there might also be a way to use the backup mechanism for keeping two or more dev workstations in sync, so I can continue work at a different PC anytime.
EDIT: I should've mentioned that
I'm running Linux
I want incremental backup, so that it's cheap to do it frequently (once or twice a day)
RAID is good, but I'm on a laptop most of the time, no second hd in there, no E-SATA and I'm not sure about RAIDing to a USB drive: would that actually work?
I've seen sysadmins use rsync, has anybody had any experiences with that?

I would set up the machine how you like it and then image it. Then, you can set up rsync (or even SVN) to back up your home dir nightly/etc.
Then when your computer dies, you can reimage, and then redeploy your home dir.
The only problem would be upgraded/new software, but the only way to deal completely with that would be to do complete nightly backups of your drive(s).
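A minimal sketch of that nightly home-dir rsync, assuming an external drive mounted at /media/backupdrive (the user name and paths are placeholders):
# Mirror the home directory to the external drive, skipping caches.
# -a preserves permissions and timestamps, --delete drops files removed locally.
rsync -a --delete --exclude='.cache/' /home/youruser/ /media/backupdrive/home-backup/
# To run it automatically every night at 02:00, add this line via crontab -e:
0 2 * * * rsync -a --delete --exclude='.cache/' /home/youruser/ /media/backupdrive/home-backup/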
Thanks, this sounds like a good suggestion. I think it should be possible to also update the image regularly (to get software updates / installs), but maybe not that often. E.g. I could boot the image in a VM and perform a global package update or something.
Hanno

You could create an image of your workstation after you've installed & configured everything. Then when your computer crashes, you can just restore the image.
A (big) downside to this, is that you won't have any updates or changes you've made since you created the image.

Cobian Backup is a reliable backup system for Windows that will perform scheduled backups to an external drive.

You could create a hard drive image. Restoring from a backup image restores everything to the exact state that it was at the time you took the image.
Or you could create an installer that installs just about everything needed.
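On a Debian-based system, one cheap version of that "installer" is just a saved package list; a rough sketch, assuming apt/dpkg and a hypothetical file name:
# Capture the list of installed packages on the working machine.
dpkg --get-selections > my-packages.txt
# On the freshly reinstalled machine, feed the list back in and install everything.
sudo dpkg --set-selections < my-packages.txt
sudo apt-get dselect-upgrade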

Since you expressed interest in rsync, here's an article that covers how to make a bootable backup image via rsync for Debian Linux:
http://www.debian-administration.org/articles/575
Rsync is fast and easy for local and network syncing and is by nature incremental.
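Since the question asks for cheap incremental backups: rsync's --link-dest option hard-links unchanged files against the previous snapshot, so each run only stores what actually changed. A rough sketch, with the snapshot directories being placeholders:
# Create a dated snapshot; unchanged files are hard-linked to the previous one.
TODAY=$(date +%Y-%m-%d)
rsync -a --delete \
      --link-dest=/media/backupdrive/snapshots/latest \
      /home/youruser/ /media/backupdrive/snapshots/$TODAY/
# Point "latest" at the snapshot we just made, ready for the next run.
ln -sfn /media/backupdrive/snapshots/$TODAY /media/backupdrive/snapshots/latest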

You can use RAID-1 for that, though strictly speaking that's synchronization, not backup.

I use RAID mirroring in conjunction with an external hard drive using Vista's system backup utility to backup the entire machine. That way I can easily fix a hard drive failure, but in the event my system becomes corrupted, I can restore from the E-SATA drive (which I only connect for backup).
Full disclosure: I've never had to restore the backup, so it's kind of like the airbag in your car; hopefully it works when you need it, but there's no way to be sure. Also, the backup process is manual (it can be automated) so I'm only as safe as the last backup.

You can use the linux "dd" command line utility to clone a hard drive.
Just boot from a linux cd, clone or restore your drive and reboot.
It works great for Windows/Mac drives too.
This will clone partition 1 of the first hard drive (/dev/sda) to partition 1 of the second drive (/dev/sdb)
dd if=/dev/sda1 of=/dev/sdb1
This will clone partition 1 of the first hard drive to a FILE on the second drive.
dd if=/dev/sda1 of=/media/drive2/backup/2009-02-25.iso
Simply swap the values for if= and of= to restore the drive.
If you boot from the Ubuntu live CD it will automount your USB drives making it easy to perform the backup/restore with external drive(s).
CAUTION: Verify the identity of your drives BEFORE running the above commands. It's easy to overwrite the wrong drive if you're not careful.
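One way to double-check which device is which before running dd, assuming a reasonably recent util-linux (the column list is just an example):
# List block devices with their size, model and mount points.
lsblk -o NAME,SIZE,MODEL,MOUNTPOINT
# Or print the partition tables (needs root).
sudo fdisk -l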

Guess this is not exactly what you are looking for, but I just document everything I install and configure on a machine. Google Docs lets me do this from anywhere, and keeps the document intact when the machine crashes.
A good step-by-step document usually reduces the recovery time to a day or so.

If you use a Mac, just plug in an external hard drive and Time Machine will do the rest, creating a complete image of your machine on the schedule you set. I restored from a Time Machine image when I swapped out the hard drive in my MacBook Pro and it worked like a charm.
One other option that a couple of guys use at my company is to have their development environment on a large Linux server. They just use their local machines to run an NX client to access the remote desktop (NX is much faster than VNC) - this has the advantages of fast performance, automatic backup of their files on the server, and the fact that they're developing on the same hardware that our customers use.

No matter what solution you use, it is always a good idea to have a secondary backup, too. Secondary backup should be off-site and include your essential work (source code, important docs). In case something happens to your main site (fire at the office, somebody breaks in and steals all your hardware, etc.), you would still be able to recover, eventually.
There are many online backup solutions. You could just get a remote storage at a reliable provider (e.g. Amazon S3) and sync your work on a daily basis. The solution depends on the type of access you can get, but rsync is probably the tool you would use for that.
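As a sketch, that daily off-site sync could be as simple as rsync over SSH to whatever remote storage you have; the host name and paths here are hypothetical:
# Push the important work to an off-site machine once a day (e.g. from cron).
rsync -az --delete -e ssh ~/projects ~/documents backupuser@offsite.example.com:/backups/workstation/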

Related

Virtual machine undoing

I am working with fairly sensitive information and I am also just a very paranoid person in general. I am using my work computer, but seeing as how I don't work for a company, I don't have any way of safely wiping everything completely whenever I decide to get rid of it. I mean, I can have somebody wipe it for me (I just don't know how secure it is) or just destroy the computer, but outside of that I am not sure what I can do.
So I was thinking about using a Virtual Machine, but I don't understand much about it. For example, I see this article about internet browsing, sandboxing, and an "undo" feature. I realize this is about internet browsing, but the idea of whenever I close the application and it deleting from the VM is appealing. However, I've also read things where you can use VMWare Tools or something like that to recover data that you deleted on the VM.
Is it possible to have the VM delete the data and, at least, make it virtually impossible to recover the data? If not completely, at least make it very unlikely?
The VM's storage is an abstraction and compartmentalization of the storage on the host machine. You can delete the VM, but recovering its image and therefore its contents is not any harder than using forensic tools to recover regular files on your device. If you're worried about security, use strong passwords and a VPN service. In terms of file destruction, you can simply encrypt your data before "destruction". This way even if someone recovers it, it'll be computationally infeasible * for them to undo the encryption just for the chance to maybe peek at your files.
Computational infeasibility means a computation which although computable would take far too many resources to actually compute. Ideally in cryptography one would like to ensure an infeasible computation’s cost is greater than the reward obtained by computing it.
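For the "encrypt before destruction" idea, a minimal sketch using GnuPG's symmetric mode (the file name is a placeholder); once the plaintext is gone and the passphrase forgotten, the data is effectively destroyed:
# Encrypt the sensitive archive with a strong passphrase (AES-256).
gpg --symmetric --cipher-algo AES256 sensitive-data.tar
# Remove the plaintext; shred overwrites it before unlinking (less effective on SSDs).
shred -u sensitive-data.tar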

Correct way to take a snapshot

I've asked this question before, but I am still confused. What's the correct and quickest way to take a snapshot? (I only use EBS-backed Unix and Windows machines, so that's all my interest right now.) Some ideas:
Just take the snapshot... This seems to sometimes cause system corruption.
Stop the machine, take the snapshot and then start the machine. I guess this also means I need to wait for each individual task to complete, making scripting a bit of a challenge?
Take a snapshot with the 'reboot machine' flag set. There is very little in the documentation to specify why a reboot is needed...
Hope you EC2 experts can help me.
If a bit of data loss is acceptable, just take the snapshot while the instance is running.
If it's not acceptable, write some script magic to ensure everything your application is working on is saved to disk, take the snapshot, then let your app resume work.
For what I do, I find it best to keep a separate EBS volume with my application and its data on it, and when I need a snapshot, I just stop the app for a moment, snapshot, and fire it back up. That way you don't have to worry about what the OS is doing during the snapshot, and it also comes with the added bonuses of being able to quickly move your app and its data to more powerful hardware, and having much smaller snapshots.
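With the current AWS CLI, that pause/snapshot/resume pattern might look roughly like this (the service name and volume ID are placeholders):
# Quiesce the app so its data volume is in a consistent state.
sudo systemctl stop myapp
sync
# Kick off the snapshot of the data volume; the call returns immediately.
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "myapp data $(date +%F)"
# The app can resume as soon as the snapshot has started.
sudo systemctl start myapp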
Quick answer - both operating systems have features for safely unmounting the drive. An unmounted drive can be snapshotted without fear of corruption.
Long answer
An EBS snapshot is point-in-time and differential (it does not "copy" your data per se), so as long as your drive is in a consistent and restorable state when it begins, you'll avoid corruption (since the snapshot is atomic).
As you have implied, whatever state the whole drive is in when it starts, that's what your snapshot image will be when it is restored (if you snapshot while you are half-way done writing a file, then, by-golly, that file will be half-written when you restore).
For both Linux and Windows, a consistent state can be achieved by unmounting the drive. This guarantees that your buffers are flushed to disk and no writes can occur. In both Linux and Windows there are commands to list which processes are using the drive; once you have stopped those processes or otherwise gotten them to stop marking the drive as in use (different for each program/service), you can unmount. In Windows, this is easy: set the drive up as a "removable drive" and then use the "safely remove hardware" feature to unmount. In Linux, you can unmount with the "umount" command.
There are other sneakier ways, but the above is pretty universal.
So assuming you get to a restorable state before you begin, you can resume using your drive as soon as the snapshot starts (you do not need to wait for the snapshot to complete before you unlock, or remount, and resume use). At that point you can remount the volume.
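Put together, that unmount-based procedure might look roughly like this on Linux (device name and mount point are placeholders):
# See which processes are still using the volume before unmounting.
sudo fuser -vm /data
# Flush buffers and unmount; the volume is now in a restorable state.
sync
sudo umount /data
# ...start the EBS snapshot here, then remount and resume work:
sudo mount /dev/xvdf /data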
How AWS snapshots work:
Your volume and its snapshots are just sets of pointers to blocks. When you take a snapshot, any blocks you write to from that point forward diverge: they become new blocks associated with the volume, while the old blocks at that logical place in the volume are left alone, so the snapshot stays the same logically.
This is also why subsequent snapshots will tend to go faster (they are differential).
http://harish11g.blogspot.com/2013/04/understanding-Amazon-Elastic-block-store-EBS-snapshots.html
Taking a snapshot often requires scheduled downtime.
Procedure:
I'd unmount the drive.
Start the snapshot and wait until it's done.
And then re-mount it.
AFAIK, this is the only viable way to get a consistent snapshot.
If you could share more about what kind of data is on the snapshot (e.g. a database?), then I could probably extend my answer.
I cannot comment on Windows-based instances since I'm not at all familiar with them, but in order to avoid redundancy, check out this blog entry, as it explains a lot:
http://shlomoswidler.com/2010/01/creating-consistent-snapshots-of-live.html
In a nutshell, they use the XFS file system and freeze it to create the snapshot: pending writes are flushed to disk, and new writes are held until the filesystem is unfrozen again. According to the comments, it seems to work for most people.
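On a reasonably modern Linux instance, the same freeze/snapshot/unfreeze idea can be sketched with fsfreeze, which works for XFS and ext4 alike (mount point and volume ID are placeholders):
# Flush everything to disk and block new writes.
sudo fsfreeze --freeze /data
# Start the snapshot; it is point-in-time, so we can unfreeze right away.
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "frozen /data"
# Let writes through again.
sudo fsfreeze --unfreeze /data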

Virtual desktop environment for development

Our network team is thinking of setting up a virtual desktop environment (via Windows 2008 virtual host) for each developer.
So we are going to have dumb terminals/laptops and should be using the virtual desktops for all of our work.
Ours is a Microsoft shop and we work with all versions of .net framework. Not having the development environments on the laptops is making the team uncomfortable.
Are there any potential problems with that kind of setup? Is there any reason to be worried about this setup?
Unless there's a very good development-oriented reason for doing this, I'd say don't.
Your developers are going to work best in an environment they want to work in. Unless your developers are the ones suggesting it and pushing for it, you shouldn't be instituting radical changes in their work environments without very good reasons.
I personally am not at all a fan of remote virtualized instances for development work, either. They're often slower, you have to deal with network issues and latency, you often don't have as much control as you would on your own machine. The list goes on and on, and little things add up to create major annoyances.
What happens when the network goes down? Are your devs just supposed to sit on their hands? Or maybe they could bring cards and play real solitaire...
Seriously, though: unless you have virtually 100% network uptime and your devs never work off-site (say, from home), I'm on the "this is a Bad Idea" side.
One option is to get rid of your network team.
Seriously though, I have worked with this same type of setup through VMWare and it wasn't much fun. The only reason why I did it was because my boss thought it might be worth a try. Since I was newly hired, I didn't object. However, after several months of programming this way, I told him that I preferred to have my development studio on my machine and he agreed.
First, the graphical interface isn't really clear with a virtual workstation since it's sending images over the network rather than having your video card's graphical driver render the image. Constant viewing of this gave me a headache.
Secondly, any install of components or tools required the network administrator's help which meant I had to hurry up and wait.
Third, your computer is going to process one application faster than your server is going to process many apps and besides that, it has to send the rendered image over the network. It doesn't sound like it slows you down but it does. Again, hurry up and wait.
Fourth, this may be specific to VMWare, but the virtual disk size was fixed at 4GB, which my network guy seemed to think was enough. It filled up rather quickly. In order for me to expand the drive, I had to wait for the network admin to run Partition Magic on my drive, which screwed it up, and I had to have him rebuild my installation.
There are several more reasons, but I would strongly encourage you to protest if you can. Your company is probably trying to implement this because it's a new fad and it can be a way for them to save money. However, your productivity time will be wasted, and that needs to be considered as a cost.
Bad Idea. You're taking the most critical tool in your developers' arsenal and making it run much, much, much slower than it needs to, and introducing several critical dependencies along the way.
It's good if you ever have to develop on-site, you can move your dev environment to a laptop and hit the road.
I could see it being required for some highly confidential multiple client work - there is a proof that you didn't leak any test data or debug files from one customer to another.
Down sides:
Few VMs support multiple monitors - without multiple monitors you can't be a productive developer.
Only VirtualBox 3 gets close to being able to develop for OpenGL/ActiveX on a VM.
In my experience Virtual environments are ideal for test environments (for testing deployments) and not development environments. They are great as a blank slate / clean sheet for testing. I think the risk of alienating your developers is high if you pursue this route. Developers should have all the best tools at their disposal, i.e. high spec laptop / desktop, this keeps morale and productivity high.
Going down this route precludes any home-working which may or may not be an issue. Virtual environments are by their nature slower than dedicated environments, you may also have issues with multiple monitor setups on a VM.
If you go that route, make sure you bench the system aggressively before any serious commitment.
My experience of remote desktops is that it's ok for occasional use, but seldom sufficient for intensive computations and compilation typical of development work, especially at crunch time when everyone needs resources at the same time.
Not sure if that will affect you, but both VMWare and Virtual PC work very slow when viewed via Remote Desktop. For some reason Radmin (http://www.radmin.com/ ) does a much better job.
I regularly work with remote development environments and it is OK (although it takes some time to get used to keeping track of which system you're working in at the moment ;) ) - but most of the time I'm alone on the system.

Carrying and Working on an Entire Development Box from a USB Stick. Feasible?

Lately I have been thinking about investing in a worthy USB pen drive (something along the lines of this), and install Operating Systems on Virtual Machines and start developing on them.
What I have in mind is that I want to be able to carry my development boxes, being a Windows Distribution for .Net development and a Linux Distribution for stuff like RoR, Perl and whatnot, so that I would be able to carry them around where need be...be it work, school, different computers at home etc...
I am thinking of doing this also for backup purposes... i.e. to back up my almost-single VM file to an external HD, instead of doing routine updates to my normal Windows box. I am also thinking about maybe even committing the VM boxes to source control (is that even feasible?)
So, am I on the right track with this? Would you suggest that I try to implement it?
How feasible is it to have your development box on Virtual Machine that runs from a USB Pen-Drive ?
I absolutely agree with where you are heading. I wish to do this myself.
But if you don't already know: it's not just about drive size. Believe it or not, USB flash drives can be much slower than your spinning disk drives!
This can be a big problem if you plan to actually run the VMs directly from the USB drive!
I've tried running a 4GB Windows XP VM on a 32GB Corsair Survivor and the VM was virtually unusable! Also, copying my 4GB VM off and back onto the drive was quite slow - about 10 minutes to copy it onto the drive.
If you have an esata port I'd highly recommend looking at high-speed ESata options like this Kanguru 32GB ESata/USB Flash drive OR this 32GB one by OCZ.
The read and write speeds of these drives are much higher over ESata than other USB drives. And you can still use them as USB if you don't have an ESata port. Though if you don't have an ESata port you can buy PCI to ESata cards online and even ESata ExpressCards for your laptop.
EDIT: A side note: you'll find most USB flash drives come formatted with FAT instead of NTFS. You don't want to use NTFS because it makes a lot more writes to the disk, and flash only survives a limited number of writes before it dies. But by using FAT you'll be limited by the maximum file size (2GB on FAT16, 4GB on FAT32), which might be a problem with your VM. If this is the case, you can split your VM disks into 2GB chunks. Also make sure you back up your VM daily in case your drive does reach its maximum number of writes. :)
This article on USB thumbdrives states, "Never run disk-intensive applications directly against files stored on the thumb drive."
USB thumbdrives utilize flash memory and these have a maximum number of writes before going bad and corruption occurs. The author of the previously linked article found it to be in the range of 10,000 - 100,000 writes but if you are using a disk intensive application this could be an issue.
So if you do this, have an aggressive backup policy to back up your work. Similarly, when you run your development suite, it would be ideal if it could write to the local hard drive as a temporary workspace.
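On the Linux side, one low-tech way to do that is to point scratch space at the host's hard drive instead of the stick; a minimal sketch, assuming /tmp lives on the local disk:
# Keep compiler/temp scratch files on the host's hard drive rather than the flash drive.
export TMPDIR=/tmp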
Hopefully you are talking about interpreted language projects. I couldn't imagine compiling a C/C++ of any size on a VM, let alone a VM running off of a USB drive.
I do it quite frequently with Xen, but also include a bare metal bootable kernel on the drive. This is particularly useful when working on something from which a live CD will be based.
The bad side is the bloat on the VM image needed to keep it bootable across many machines... so where you would normally build only a very lean and mean paravirtualized kernel, you also have to include one that has everything including the kitchen sink (up to what you want, i.e. do you need audio, or Token Ring, etc.?)
I usually carry two sticks, one has Xen + a patched Linux 2.6.26, the other has my various guest images which are ready to boot either way. A debootstrapped copy of Debian or Ubuntu makes a great starting point to create the former.
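If you haven't used it before, debootstrap builds that minimal Debian/Ubuntu root with a single command; the suite name and target directory here are just examples:
# Build a minimal Debian root filesystem into /mnt/guestroot.
sudo debootstrap stable /mnt/guestroot http://deb.debian.org/debian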
If nothing else, it's fun to tinker with. Sorry to be a bit GNU/Linux centric, but that's what I use exclusively :) I started messing around with this when I had to find an odd path to upgrading my distro, which was two years behind the current one. So, I debootstrapped a guest, installed what I wanted and pointed GRUB at the new LV for my root file system. Inside, I just mounted my old /home LV and away I went.
Check out MojoPac:
http://www.mojopac.com/
Hard-core gamers use it to take World of Warcraft with them on the go -- it should work fine for your development needs, at least on Windows. Use Cygwin with it for your Unix dev needs.
I used to do this, and found that compiling was so deathly slow, it wasn't worth it.
Keep in mind that USB flash drives are extremely slow (maybe 10 to 100 times slower) compared to hard drives at random write performance (writing lots of small files to a partition which already has lots of files).
A typical build process using GNU tools will create lots of small files - a simple configure script creates thousands of small files and deletes them again just to test the environment before you even start compiling. You could be waiting a long time.

Best Dual HD Set up for Development

I've got a machine I'm going to be using for development, and it has two 7200 RPM 160 GB SATA HDs in it.
The information I've found on the net so far seems to be a bit conflicted about which things (OS, Swap files, Programs, Solution/Source code/Other data) I should be installing on how many partitions on which drives to get the most benefit from this situation.
Some people suggest having a separate partition for the OS and/or Swap, some don't bother. Some people say the programs should be on the same physical drive as the OS with the data on the other, some the other way around. Same with the Swap and the OS.
I'm going to be installing Vista 64 bit as my OS and regularly using Visual Studio 2008, VMWare Workstation, SQL Server management studio, etc (pretty standard dev tools).
So I'm asking you--how would you do it?
If the drives support RAID configurations in your BIOS, you should do one of the following:
RAID 1 (Mirror) - Since this is a dev machine this will give you the fault tolerance and peace of mind that your code is safe (and the environment since they are such a pain to put together). You get better performance on reads because it can read from both/either drive. You don't get any performance boost on writes though.
RAID 0 - No fault tolerance here, but this is the fastest configuration because you read and write off both drives. Great if you just want as fast as possible performance and you know your code is safe elsewhere (source control) anyway.
Don't worry about multiple partitions or OS/data configs, because on a dev machine you sort of need it all anyway, and you shouldn't be running heavy multi-user databases or anything like that (like a server).
If your BIOS doesn't support RAID configurations, however, then you might consider doing the OS/Data split over the two drives just to balance out their use (but as you mentioned, keep the programs on the system drive because it will help with caching). Up to you where to put the swap file (OS will give you dump files, but the data drive is probably less utilized).
If they're both going through the same disk controller, there's not going to be much difference performance-wise no matter which way you do it; if you're going to be doing lots of VMs, I would put the OS, swap, programs and data on one drive and keep all the VMs on the other drive.
Having all the VMs on an independent drive would let you move that drive to another machine seamlessly if the host fails, or if you upgrade.
Mark one drive as being your warehouse, put all of your source code, data, assets, etc. on there and back it up regularly. You'll want this to be stable and easy to recover. You can even switch My Documents to live here if wanted.
The other drive should contain the OS, drivers, and all applications. This makes it easy and secure to wipe the drive and reinstall the OS every 18-24 months as you tend to have to do with Windows.
If you want to improve performance, some say put the swap on the warehouse drive. This will increase OS performance, but will decrease the life of the drive.
In reality it all depends on your goals. If you need more performance then you even out the activity level. If you need more security then you use RAID and mirror it. My mix provides for easy maintenance with a reasonable level of data security and minimal bit rot problems.
Your most active files will be the registry, page file, and running applications. If you're doing lots of data crunching then those files will be very active as well.
I would suggest that if 160GB total capacity will cover your needs (plenty of space for OS, applications and source code; it just depends on what else you plan to put on it), you should mirror the drives in a RAID 1, unless you have a server that data is backed up to, an external hard drive, an online backup solution, or some other means of keeping a copy of the data on more than one physical drive.
If you need to use all of the drive capacity, I would suggest using the first drive for the OS and applications and the second drive for data, simply because if you change computers at some point, the OS on the first drive doesn't do you much good and most applications would have to be reinstalled, but you could take the entire data drive with you.
As for splitting off the OS, a big downfall of this is not giving the partition enough space, so that eventually you may need partitioning software to steal some space from the other partition on the drive. It never seems to fail: you allocate a certain amount of space for the OS partition, right after install you have several gigs of free space so you think you are fine, but as time goes by things build up on that partition and you run out of space.
With that in mind, I still typically use an OS partition, as it is useful when reloading a system: you can format that partition, blowing away the OS but keeping the rest of your data. Ways to keep the space buildup from happening too fast are to change the location of your My Documents folder and to change environment variables such as TEMP and TMP. However, some things just refuse to put their data anywhere besides the system partition. I used to use 10GB; these days I go for 20GB.
Dividing your swap space can be useful for keeping drive fragmentation down when letting your swap file grow and shrink as needed. Again this is an issue though of guessing how much swap you need. This will depend a lot on the amount of memory you have and how much stuff you will be running at one time.
For the posters suggesting RAID - it's probably OK at 160GB, but I'd hesitate for anything larger. Soft errors in the drives reduce the overall reliability of the RAID. See these articles for the details:
http://alumnit.ca/~apenwarr/log/?m=200809#08
http://permabit.wordpress.com/2008/08/20/are-fibre-channel-and-scsi-drives-more-reliable/
You can't believe everything you read on the internet, but the reasoning makes sense to me.
Sorry I wasn't actually able to answer your question.
I usually run a box with two drives. One for the OS, swap, typical programs and applications, and one for VMs, "big" apps (e.g., Adobe CS suite, anything that hits the disk a lot on startup, basically).
But I also run a cheap fileserver (just an old machine with a coupla hundred gigs of disk space in RAID1), that I use to store anything related to my various projects. I find this is a much nicer solution than storing everything on my main dev box, doesn't cost much, gives me somewhere to run a webserver, my personal version control, etc.
Although I admit, it really isn't doing much I couldn't do on my machine. I find it's a nice solution as it helps prevent me from spreading stuff around my workstation's filesystem at random by forcing me to keep all my work in one place where it can be easily backed up, copied elsewhere, etc. I can leave it on all night without huge power bills (it uses <50W under load) so it can back itself up to a remote site with a little script, I can connect to it from outside via SSH (so I can always SCP anything I need).
But really the most important benefit is that I store nothing of any value on my workstation box (at least nothing that isn't also on the server). That means if it breaks, or if I want to use my laptop, etc. everything is always accessible.
I would put the OS and all the applications on the first disk (1 partition). Then, put the data from the SQL server (and any other overflow data) on the second disk (1 partition). This is how I'd set up a machine without any other details about what you're building. Also make sure you have a backup so you don't lose work. It might even be worth it to mirror the two drives (if you have RAID capability) so you don't lose any progress if/when one of them fails. Also, backup to an external disk daily. The RAID won't save you when you accidentally delete the wrong thing.
In general I'd try to split up things that are going to be doing a lot of I/O (such as if you have autosave in VS going off fairly frequently). Think of it as a sort of I/O multithreading.
I've observed significant speedups by putting my virtual machines on a separate disk. Whenever Windows is doing something stupid in the VM (e.g., indexing yet again), it doesn't thrash my Mac's disk quite so badly.
Another issue is that many tools (Visual Studio comes to mind) break in frustrating ways when bits of them are on the non-primary disk.
Use your second disk for big random things.