Best Dual HD Set up for Development - hardware

I've got a machine I'm going to be using for development, and it has two 7200 RPM 160 GB SATA HDs in it.
The information I've found on the net so far seems to be a bit conflicted about which things (OS, Swap files, Programs, Solution/Source code/Other data) I should be installing on how many partitions on which drives to get the most benefit from this situation.
Some people suggest having a separate partition for the OS and/or Swap, some don't bother. Some people say the programs should be on the same physical drive as the OS with the data on the other, some the other way around. Same with the Swap and the OS.
I'm going to be installing Vista 64 bit as my OS and regularly using Visual Studio 2008, VMWare Workstation, SQL Server management studio, etc (pretty standard dev tools).
So I'm asking you--how would you do it?

If the drives support RAID configurations in your BIOS, you should do one of the following:
RAID 1 (Mirror) - Since this is a dev machine this will give you the fault tolerance and peace of mind that your code is safe (and the environment since they are such a pain to put together). You get better performance on reads because it can read from both/either drive. You don't get any performance boost on writes though.
RAID 0 - No fault tolerance here, but this is the fastest configuration because you read and write off both drives. Great if you just want as fast as possible performance and you know your code is safe elsewhere (source control) anyway.
Don't worry about mutiple partitions or OS/Data configs because on a dev machine you sort of need it all anyway and you shouldn't be running heavy multi-user databases or anything anyway (like a server).
If your BIOS doesn't support RAID configurations, however, then you might consider doing the OS/Data split over the two drives just to balance out their use (but as you mentioned, keep the programs on the system drive because it will help with caching). Up to you where to put the swap file (OS will give you dump files, but the data drive is probably less utilized).

If they're both going through the same disk controller, there's not going to be much difference performance-wise no matter which way you do it; if you're going to be doing lots of VM's, I would split one drive for OS and swap / Programs and Data, then keep all the VM's on the other drive.
Having all the VM's on an independant drive would let you move that drive to another machine seamlessly if the host fails, or if you upgrade.

Mark one drive as being your warehouse, put all of your source code, data, assets, etc. on there and back it up regularly. You'll want this to be stable and easy to recover. You can even switch My Documents to live here if wanted.
The other drive should contain the OS, drivers, and all applications. This makes it easy and secure to wipe the drive and reinstall the OS every 18-24 months as you tend to have to do with Windows.
If you want to improve performance, some say put the swap on the warehouse drive. This will increase OS performance, but will decrease the life of the drive.
In reality it all depends on your goals. If you need more performance then you even out the activity level. If you need more security then you use RAID and mirror it. My mix provides for easy maintenance with a reasonable level of data security and minimal bit rot problems.
Your most active files will be the registry, page file, and running applications. If you're doing lots of data crunching then those files will be very active as well.

I would suggest if 160gb total capacity will cover your needs (plenty of space for OS, Applications and source code, just depends on what else you plan to put on it), then you should mirror the drives in a RAID 1 unless you will have a server that data is backed up to, an external hard drive, an online backup solution, or some other means of keeping a copy of data on more then one physical drive.
If you need to use all of the drive capacity, I would suggest using the first drive for OS and Applications and second drive for data. Purely for the fact of, if you change computers at some point, the OS on the first drive doesn't do you much good and most Applications would have to be reinstalled, but you could take the entire data drive with you.
As for dividing off the OS, a big downfall of this is not giving the partition enough space and eventually you may need to use partitioning software to steal some space from the other partition on the drive. It never seems to fail that you allocate a certain amount of space for the OS partition, right after install you have several gigs free space so you think you are fine, but as time goes by, things build up on that partition and you run out of space.
With that in mind, I still typically do use an OS partition as it is useful when reloading a system, you can format that partition blowing away the OS but keep the rest of your data. Ways to keep the space build up from happening too fast is change the location of your my documents folder, change environment variables for items such as temp and tmp. However, there are some things that just refuse to put their data anywhere besides on the system partition. I used to use 10gb, these days I go for 20gb.
Dividing your swap space can be useful for keeping drive fragmentation down when letting your swap file grow and shrink as needed. Again this is an issue though of guessing how much swap you need. This will depend a lot on the amount of memory you have and how much stuff you will be running at one time.

For the posters suggesting RAID - it's probably OK at 160GB, but I'd hesitate for anything larger. Soft errors in the drives reduce the overall reliability of the RAID. See these articles for the details:
http://alumnit.ca/~apenwarr/log/?m=200809#08
http://permabit.wordpress.com/2008/08/20/are-fibre-channel-and-scsi-drives-more-reliable/
You can't believe everything you read on the internet, but the reasoning makes sense to me.
Sorry I wasn't actually able to answer your question.

I usually run a box with two drives. One for the OS, swap, typical programs and applications, and one for VMs, "big" apps (e.g., Adobe CS suite, anything that hits the disk a lot on startup, basically).
But I also run a cheap fileserver (just an old machine with a coupla hundred gigs of disk space in RAID1), that I use to store anything related to my various projects. I find this is a much nicer solution than storing everything on my main dev box, doesn't cost much, gives me somewhere to run a webserver, my personal version control, etc.
Although I admit, it really isn't doing much I couldn't do on my machine. I find it's a nice solution as it helps prevent me from spreading stuff around my workstation's filesystem at random by forcing me to keep all my work in one place where it can be easily backed up, copied elsewhere, etc. I can leave it on all night without huge power bills (it uses <50W under load) so it can back itself up to a remote site with a little script, I can connect to it from outside via SSH (so I can always SCP anything I need).
But really the most important benefit is that I store nothing of any value on my workstation box (at least nothing that isn't also on the server). That means if it breaks, or if I want to use my laptop, etc. everything is always accessible.

I would put the OS and all the applications on the first disk (1 partition). Then, put the data from the SQL server (and any other overflow data) on the second disk (1 partition). This is how I'd set up a machine without any other details about what you're building. Also make sure you have a backup so you don't lose work. It might even be worth it to mirror the two drives (if you have RAID capability) so you don't lose any progress if/when one of them fails. Also, backup to an external disk daily. The RAID won't save you when you accidentally delete the wrong thing.

In general I'd try to split up things that are going to be doing a lot of I/O (such as if you have autosave on VS going off fairly frequently) Think of it as sort of I/O multithreading

I've observed significant speedups by putting my virtual machines on a separate disk. Whenever Windows is doing something stupid in the VM (e.g., indexing yet again), it doesn't thrash my Mac's disk quite so badly.
Another issue is that many tools (Visual Studio comes to mind) break in frustrating ways when bits of them are on the non-primary disk.
Use your second disk for big random things.

Related

Prevent Memory Corruption During Writes with Power Loss

I have a system that runs windows via a USB stick (it's a proprietary machine). This type of machine is commonly powered off by 'pulling the plug'. There is no way around it, that is how it is operated.
We occasionally have drive corruption on the USB stick, or at least corruption in the directory that we write things into. Is there really any software solution to get around this problem other than 'write as little/infrequently as possible'?
It's a windows machine and the applications that write are typically written in Java/C# if that is useful to anyone. The corruption typically shows up as a write directory or the parent of a write directory that can no longer be accessed due to the corruption. The only way to deal with it is to delete it via command line and start over.
Is there any way to programmatically deal with such a scenario, to perhaps restore a previous state of the memory as opposed to deleting and starting anew?
I don't feel as though there is any way to prevent this type of thing from happening given our current design. If you do enough writes and keep pulling the plug you are eventually going to get a corruption and that's just facts. Especially in this design. Even if the backup batteries are charged, if the software doesn't shutdown gracefully within the battery's discharge time, the corruptions could still occur. Not to mention as gravitymixes said above its going to damage hardware eventually which we have seen before.
A system redesign needs to considered for this project as a whole. Some type of networked solution comes to mind immediately where data is sent off the volatile machine to be logged on a machine with a more reliable power source over a reliable network connection with writing to the disk on the actual volatile machine as a last ditch effort if network comms are not reliable at a given point in time (backfill). I feel like this would increase hardware life as well. Of course the problem of network reliability then becomes your problem.

Write time to hard drive

I realize this number will change based on many factors, but in general, when I write data to a hard-drive (e.g. copy a file), how long does it take for that data to actually be written to the platter after Windows says the copy is done?
Could anyone point me in the right direction to discover more on this topic?
If you are looking for a hard number, that is pretty much unknowable. Generally it is the order of a tens to a few hundred milliseconds for the data to start reaching the disk platters, but can be as high as several seconds in a large server disk array with RAID and de-duplication.
The flow of events goes something like this.
The application calls a function like fwrite().
This call is handled by the filesystem layer in your Operating System, which has to figure out what specific disk sectors are to be manipulated.
The SATA/IDE driver in your OS will talk to the hard drive controller hardware. On a modern PC, it typically uses DMA to feed the data to the disk.
The data sits in a write cache inside the hard disk (RAM).
When the physical platters and heads have made it into position, it will begin to transfer the contents of cache onto the platters.
Steps 3-6 may repeat several times depending on how much data is to be written, where on the disk it is to be written. Additionally, there is usually filesystem metadata that must be updated (e.g. free space counters), which will trigger more writes to the disk.
The time it takes from steps 1-3 can be unpredictable in a general purpose OS like Windows due to task scheduling, background threads, and your disk write is probably queued up with a few dozen other processes. I'd say it is usually on the order of 10-100msec on a typical PC. If you go to the Windows Resource Monitor and click the Disk tab, you can get an idea of the average disk queue length. You can use the Performance Monitor to produce more finely-controlled graphs.
Steps 3-4 are largely controlled by the disk controller and disk interface (SATA, SAS, etc). In the server world, you can be talking about a SAN with FC or iSCSI network switches, which impose their own latencies.
Step 5 will be controlled by they physical performance of the disk. Many consumer-grade HDD manufacturers do not post average seek times anymore, but 10-20msec is common.
Interesting detail about Step 5: Some HDDs lie about flushing their write cache to get better benchmark scores.
Step 6 will depend on your filesystem and how much data you are writing.
You are right that there can be a delay between Windows indicating that data writing is finished and the last data actually written. Things to consider are:
Device Manager, Disk Drive, Properties, Policies - Options for disabling Write Caching.
You might be better off using Direct I/O so that Windows does not save it temporarily in File Cache.
If your program writes the data, you can log what has been copied.
If you are sending the data over a network, you are likely to have no control of when the remote system has finished.
To see what is happening, you can set up Perfmon logging. One of my examples of monitoring:
http://www.roylongbottom.org.uk/monitor1.htm#anchor2

Is writing to tape drives any different? (Plain C and Objective-C)

I understand that writing to tape (say floppy) drives using plain C (say the openf statement and subsequent standard C file-write functions) is fundamentally different than writing to regular hard drives. I understand that I have to be careful about what block sizes I use, etc. Can some C veteran confirm that I am right? If I'm right, some further info would be appreciated, such as how I determine the right block size at run-time, etc.
And for Objective-C programmers: Do the Foundation classes to write files abstract away such details in that I can just stop worrying about what kind of a physical media I'm writing to? I.e., do the, say, NSFileManager methods support tape drives without me having to worry about anything?
Note: I am writing a modern Mac app. However, even though tape drives are rare these days (right?), it seems imprudent to just assume them away. Agreed? If this is the case, and Foundation abstracts such details away (which I hope it does), I should much rather prefer Foundation over plain C, right?
openf? What OS is this? I always just used open, read, write, and close for writing to tape for the most part. I think there's some ioctl commands to do seeks, and they take a while, but that was it.
As for floppies, they have always just looked like small volumes without a partition map. vfat was the usual Linux volume type, IIRC. Nothing special about accessing them.
P.S. I can honestly say that, unless you need a tape drive, you can assume them away at this point. I got rid of my last one years ago, and at work the sysadmin only uses a few specialized programs (tar, mt, etc.) with them, and it's all scripted. Nobody uses tapes as secondary storage these days.
Further, I use hard drives, a la Time Machine, as backups these days. They are far faster and more cost effective.
I don't think the unix concept of filesystems goes all the way to tape drives. Normally, tapedrives are accessed in a completely different manner (using special programs) than mountable media. Foundation will probably help you with mountable media, but not with tape drives that are anyway just used for backup.
A little googleing on tape drives for mac turns up this: LTO4 Tape Archiving on the Mac - and it's from 2009. I haven't found any info about tape drives in snow leopard or lion.
So really, I wonder how you would got about accessing one if you can't even test your code.

Carrying and Working on an Entire Development Box from a USB Stick. Feasible?

Lately I have been thinking about investing in a worthy USB pen drive (something along the lines of this), and install Operating Systems on Virtual Machines and start developing on them.
What I have in mind is that I want to be able to carry my development boxes, being a Windows Distribution for .Net development and a Linux Distribution for stuff like RoR, Perl and whatnot, so that I would be able to carry them around where need be...be it work, school, different computers at home etc...
I am thinking of doing this also for backup purposes...ie to backup my almost-single VM file to an external hd, instead of doing routinely updates to my normal Windows Box. I am also thinking about maybe even committing the VM boxes under Source Control (is that even feasible?)
So, am I on the right track with this ? Do you suggest that I try to implement this out?
How feasible is it to have your development box on Virtual Machine that runs from a USB Pen-Drive ?
I absolutely agree with where you are heading. I wish to do this myself.
But if you don't already know, it's not just about drive size, believe it or not USB Flash drives can be much slower than your spinning disk drives!
This can be a big problem if you plan to actually run the VMs directly from the USB drive!
I've tried running a 4GB Windows XP VM on a 32GB Corsair Survivor and the VM was virtually unusuable! Also copying my 4GB VM off and back onto the drive was also quite slow - about 10 minutes to copy it onto the drive.
If you have an esata port I'd highly recommend looking at high-speed ESata options like this Kanguru 32GB ESata/USB Flash drive OR this 32GB one by OCZ.
The read and write speeds of these drives are much higher over ESata than other USB drives. And you can still use them as USB if you don't have an ESata port. Though if you don't have an ESata port you can buy PCI to ESata cards online and even ESata ExpressCards for your laptop.
EDIT: A side note, you'll find the USB flash drives use FAT instead of NTFS. You don't want to use NTFS because it makes a lot more reads & writes on the disk and your drive will only have a limited number of reads & writes before it dies. But by using FAT you'll be limited to max 2GB file size which might be a problem with your VM. If this is the case, you can split your VM disks into 2GB chunks. Also make sure you backup your VM daily incase your drive does reach it's maximum number of writes. :)
This article on USB thumbdrives states,
Never run disk-intensive applications
directly against files stored on the
thumb drive.
USB thumbdrives utilize flash memory and these have a maximum number of writes before going bad and corruption occurs. The author of the previously linked article found it to be in the range of 10,000 - 100,000 writes but if you are using a disk intensive application this could be an issue.
So if you do this, have an aggressive backup policy to backup your work. Similarly, if when you run your development suite, if it could write to the local hard drive as a temporary workspace this would be ideal.
Hopefully you are talking about interpreted language projects. I couldn't imagine compiling a C/C++ of any size on a VM, let alone a VM running off of a USB drive.
I do it quite frequently with Xen, but also include a bare metal bootable kernel on the drive. This is particularly useful when working on something from which a live CD will be based.
The bad side is the bloat on the VM image to keep it bootable across many machines .. so where you would normally build a very lean and mean paravirtualized kernel only .. you have to also include one that has everything including the kitchen sink (up to what you want, i.e. do you need Audio, or token ring, etc?)
I usually carry two sticks, one has Xen + a patched Linux 2.6.26, the other has my various guest images which are ready to boot either way. A debootstrapped copy of Debian or Ubuntu makes a great starting point to create the former.
If nothing else, its fun to tinker with. Sorry to be a bit GNU/Linux centric, but that's what I use exclusively :) I started messing around with this when I had to find an odd path to upgrading my distro, which was two years behind the current one. So, I strapped a guest, installed what I wanted and pointed GRUB at the new LV for my root file system. Inside, I just mounted my old /home LV and away I went.
Check out MojoPac:
http://www.mojopac.com/
Hard-core gamers use it to take world of warcraft with them on the go -- it should work fine for your development needs, at least on Windows. Use cygwin with it for your unix-dev needs.
I used to do this, and found that compiling was so deathly slow, it wasn't worth it.
Keep in mind that USB flash drives are extremely slow (maybe 10 to 100 times slower) compared to hard drives at random write performance (writing lots of small files to a partition which already has lots of files).
A typical build process using GNU tools will create lots of small files - a simple configure script creates thousands of small files and deletes them again just to test the environment before you even start compiling. You could be waiting a long time.

What's a good way to backup (and maybe synchronize) your development machine? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I make extensive use of source control for anything that relates to a project I'm working on (source, docs etc.) and I've never lost anything that way.
However, I have had two or three crashes (spread over the last 4 years) on my development machine that forced me to reinstall my system and reconfigure my apps (eclipse, vim, Firefox, etc.). For weeks after reinstalling, I was missing one little app or another, some PHP or Python module wasn't there, stuff like that.
While this is not fatal, it's very annoying and sucks up time. Because it seemed so rare, I didn't bother about an actual solution, but meanwhile I've developed a mindset where I just don't want stuff like that happening anymore.
So, what are good backup solutions for a development machine? I've read this very similar question, but that guy really wants something different than me.
What I want is to have spare harddrives on the shelf and reduce my recovery time after a crash to something like an hour or less.
Thinking about this, I figured there might also be a way to use the backup mechanism for keeping two or more dev workstations in sync, so I can continue work at a different PC anytime.
EDIT: I should've mentioned that
I'm running Linux
I want incremental backup, so that it's cheap to do it frequently (once or twice a day)
RAID is good, but I'm on a laptop most of the time, no second hd in there, no E-SATA and I'm not sure about RAIDing to a USB drive: would that actually work?
I've seen sysadmins use rsync, has anybody had any experiences with that?
I would set up the machine how you like it and then image it. Then, you can set up rsync(or even SVN) to backup your homedir nightly/etc.
Then when your computer dies, you can reimage, and then redeploy your home dir.
The only problem would be upgraded/new software, but the only way to deal completely with that would be to do complete nightly backups of your drive(s).
Thanks, this sounds like a good suggestion. I think it should be possible to also update the image regularly (to get software updates / installs), but maybe not that often. E. g. I could boot the image in a VM and perform a global package update or something.
Hanno
You could create an image of your workstation after you've installed & configured everything. Then when your computer crashes, you can just restore the image.
A (big) downside to this, is that you won't have any updates or changes you've made since you created the image.
Cobian Backup is a reliable backup system for Windows that will perform scheduled backups to an external drive.
You could create a hard drive image. Restoring from a backup image restores everything to the exact state that it was at the time you took the image.
Or you could create an installer that installs just about everything needed.
Since you expressed interest in rsync, here's an article that covers how to make a bootable backup image via rsync for Debian Linux:
http://www.debian-administration.org/articles/575
Rsync is fast and easy for local and network syncing and is by nature incremental.
You can use RAID-1 for that. It’s the synchronize type, not the backup type.
I use RAID mirroring in conjunction with an external hard drive using Vista's system backup utility to backup the entire machine. That way I can easily fix a hard drive failure, but in the event my system becomes corrupted, I can restore from the E-SATA drive (which I only connect for backup).
Full disclosure: I've never had to restore the backup, so it's kind of like the airbag in your car; hopefully it works when you need it, but there's no way to be sure. Also, the backup process is manual (it can be automated) so I'm only as safe as the last backup.
You can use the linux "dd" command line utility to clone a hard drive.
Just boot from a linux cd, clone or restore your drive and reboot.
It works great for Windows/Mac drives too.
This will clone partition 1 of the first hard drive (/dev/sda) to partition 1 of the second drive (/dev/sdb)
dd if=/dev/sda1 of=/dev/sdb1
This will clone partition 1 of the first hard drive to a FILE on the second drive.
dd if=/dev/sda1 of=/media/drive2/backup/2009-02-25.iso
Simply swap the values for if= and of= to restore the drive.
If you boot from the Ubuntu live CD it will automount your USB drives making it easy to perform the backup/restore with external drive(s).
CAUTION: Verify the identity of your drives BEFORE running the above commands. It's easy to overwrite the wrong drive if you not careful.
Guess this is not exactly what you are looking for, but I just document all what I install and configure on a machine. Google Docs lets me do this from anywhere, and keeps the document intact when the machine crashes.
A good step by step document usually reduces the recovery time to one day or so
If you use a Mac, just plug in an external hard drive and Time Machine will do the rest, creating a complete image of your machine on the schedule you set. I restored from a Time Machine image when I swapped out the hard drive in my MacBook Pro and it worked like a charm.
One other option that a couple of guys use at my company is to have their development environment on a large Linux server. They just use their local machines to run an NX client to access the remote desktop (NX is much faster than VNC) - this has the advantages of fast performance, automatic backup of their files on the server, and the fact that they're developing on the same hardware that our customers use.
No matter what solution you use, it is always a good idea to have a secondary backup, too. Secondary backup should be off-site and include your essential work (source code, important docs). In case something happens to your main site (fire at the office, somebody breaks in and steals all your hardware, etc.), you would still be able to recover, eventually.
There are many online backup solutions. You could just get a remote storage at a reliable provider (e.g. Amazon S3) and sync your work on a daily basis. The solution depends on the type of access you can get, but rsync is probably the tool you would use for that.