Are raw LevelDB files safe to use as a backup storage layer? - data-storage

I have a service that stores data to LevelDB. To make backups, I simply zip the entire data folder that LevelDB writes to and upload it to S3.
If I need to restore, I just unzip the data and copy it back into the data folder, and it seems to work great. I can also do this across Mac OS X and Linux machines.
If I am running the same version of LevelDB on all machines, is there anything wrong with this approach?
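For reference, here is a minimal Python sketch of the zip-and-upload workflow described above, assuming boto3 is available and the service is stopped (or otherwise not writing) while the folder is archived; the paths and bucket name are hypothetical.

```python
import shutil
import boto3  # assumed available; any S3 client would do

# Hypothetical paths and bucket -- substitute your own.
DATA_DIR = "/var/lib/myservice/leveldb"   # folder LevelDB writes to
ARCHIVE = "/tmp/leveldb-backup"           # shutil appends the .zip extension
BUCKET = "my-backup-bucket"

def backup():
    # Zip the whole LevelDB folder (CURRENT, MANIFEST-*, *.ldb, LOG, ...)
    # while no writes are in flight, then push the archive to S3.
    zip_path = shutil.make_archive(ARCHIVE, "zip", DATA_DIR)
    boto3.client("s3").upload_file(zip_path, BUCKET, "leveldb-backup.zip")

def restore():
    # Pull the archive back down and unpack it into the data folder.
    boto3.client("s3").download_file(BUCKET, "leveldb-backup.zip", ARCHIVE + ".zip")
    shutil.unpack_archive(ARCHIVE + ".zip", DATA_DIR, "zip")

if __name__ == "__main__":
    backup()
```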

Related

How to access Amazon S3 backup without Jungle Disk

I've been backing up my Mac to the Amazon S3 cloud using Jungle Disk. Now that Mac is dead. Fine, my backups are in the cloud. So, I go to my other Mac and download Jungle Disk. It is a workgroup version of the software. When I run it, it wants me to verify that I purchased the software. Well, when I first set up the Jungle Disk client some years ago there was a free client. I'd rather not pay for this unless there's no good alternative.
Next, I log in to my Amazon S3 console. I have a bunch of buckets there which are impossible to navigate.
So, I google around for S3 browsers and find Cyberduck. I download and install that. When I run it, it wants a server URL. At this point I'm stuck.
Is there a client that knows about the structure of backups in S3 that I can install on this other Mac to get to my backed up data?
After a couple of conversations with Jungle Disk support I was given this (undocumented) url:
https://downloads.jungledisk.com/jungledisk/JungleDiskDesktop3160.dmg
I've downloaded and installed the client, didn't have to pay anything, and I've gotten to my backed-up data. Whew!
Sol got his stuff fixed. Sharing additional background for future readers: Jungle Disk uses the WebDAV standard to allow access through our web service layer. Depending on the version of Jungle Disk you're running, we have a few different URLs you'll authenticate to. Ping our team at support.jungledisk.com and we'll get you set up.

Direct upload to S3 from NAS

I'm trying to set up an S3 backup for my company's NAS (a QNAP TS-EC879U-RD), and I'm having some trouble. The NAS device itself has a much faster network connection than the computer that I'm using to upload (10Gb/s vs 1Gb/s), but it seems like every tool I explore has to move data through my computer first. I'm sure there must be a way to bypass this bottleneck, but I can't imagine what it is. Any help pointing me in the right direction would be much appreciated.
The only way to do this would be to have the NAS itself run the upload software. Otherwise, the data MUST be read from the NAS into your PC and then sent via the upload software to S3.
Think about it this way: What you're asking for is software that can see what files are on the NAS, then tell the NAS to upload the files through its network connection. If you could do this without having to install software on the NAS, then any software running on your PC (i.e., a virus) could tell the NAS to upload its data to some hacker's servers.
As such, either you need to have the software running on the NAS itself, which can then directly upload data to S3, or the upload software will need to suck the data from the NAS into your PC and then upload it to S3.
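As a concrete illustration of "run the upload software on the NAS itself", here is a minimal Python sketch of the kind of script that could run on the NAS, assuming Python and boto3 can be installed on the device; the share path and bucket name are hypothetical.

```python
import os
import boto3  # assumes the NAS can run Python with boto3 installed

# Hypothetical values -- adjust to your share and bucket.
SHARE_ROOT = "/share/backups"
BUCKET = "company-nas-backup"

def upload_share():
    s3 = boto3.client("s3")
    for dirpath, _dirnames, filenames in os.walk(SHARE_ROOT):
        for name in filenames:
            local_path = os.path.join(dirpath, name)
            # Use the path relative to the share as the S3 key,
            # so the directory layout is preserved in the bucket.
            key = os.path.relpath(local_path, SHARE_ROOT).replace(os.sep, "/")
            s3.upload_file(local_path, BUCKET, key)

if __name__ == "__main__":
    upload_share()
```

Run this way, the data goes straight from the NAS's 10Gb/s link to S3 and never touches the PC.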

Amazon EC2 Windows AMI with shared S3 storage

I've currently got a base Windows 2008 Server AMI that I created on Amazon EC2. I use it to create 20-30 EBS-based EC2 instances at a time for processing large amounts of data into PDFs for a client. However, once the data processing is complete, I have to manually connect to each machine and copy off the files. This takes a lot of time and effort, and so I'm trying to figure out the best way to use S3 as a centralised storage for the outputted PDF files.
I've seen a number of third party (commercial) utilities that can map S3 buckets to drives within Windows, but is there a better, more sensible way to achieve what I want? Having not used S3 before, only EC2, I'm not sure of what options are available, and I've not been able to find anything online addressing the issue of using S3 as centralised storage for multiple EC2 Windows instances.
Update: Thanks for the suggestions of command line tools for using S3. I was hoping for something a little more integrated and less ad hoc. Seeing as EC2 is closely related to S3 (S3 used to be the default storage mechanism for AMIs, etc.), I thought there might be something neater/easier I could do. Perhaps even something around Private Cloud Networks and EC2-backed S3 servers (an area I know nothing about). Any other ideas?
I'd probably look for a command line tool. A quick search on Google led me to a .NET tool:
http://s3.codeplex.com/
And a Java one:
http://www.beaconhill.com/opensource/s3cp.html
I'm sure there are others out there as well.
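Whichever tool you pick, the core of the job is just pushing each instance's output folder to a shared bucket. Here is a minimal boto3 sketch of that step, assuming each instance has AWS credentials configured; the bucket name, output path, and key scheme are hypothetical.

```python
import os
import boto3  # assumes AWS credentials are configured on each instance

# Hypothetical values -- one bucket shared by all processing instances.
OUTPUT_DIR = r"C:\jobs\output"
BUCKET = "pdf-output-bucket"
INSTANCE_NAME = os.environ.get("COMPUTERNAME", "unknown-instance")

def push_results():
    """Upload every PDF produced on this instance to the shared bucket."""
    s3 = boto3.client("s3")
    for name in os.listdir(OUTPUT_DIR):
        if name.lower().endswith(".pdf"):
            # Prefix keys with the instance name so outputs don't collide.
            s3.upload_file(os.path.join(OUTPUT_DIR, name),
                           BUCKET,
                           f"{INSTANCE_NAME}/{name}")

if __name__ == "__main__":
    push_results()
```

Scheduled at the end of each processing run, this removes the need to connect to every machine and copy files off by hand.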
You could use an EC2 instance with an EBS volume exported through Samba, which could act as the centralized storage that the Windows instances map as a drive.
This sounds very much like a Hadoop/Amazon MapReduce job to me. Unfortunately, Hadoop is best deployed on Linux:
Hadoop on Windows Server
I assume the software you use for PDF processing is Windows-only?
If that is not the case, I'd seriously consider porting your solution to Linux.

Transfer of directory structure over network

I am designing a remote CD/DVD burner to address hardware constraints on my machine.
My design works like this (analogous to a network printer):
Unix-based machine (acts as server) hosts a burner.
Windows-based machine acts as client.
Client prepares data to burn and transfers it to the server.
Server burns the data on CD/DVD.
My question is: what is the best protocol to transfer data over the network (keeping the same directory hierarchy) between different operating systems?
I would think some kind of archive format would be best. The *nix .tar archive format works well for most things. However, since you are burning CD/DVD discs, the disc's native .iso format may be a good choice.
You'll likely need to transfer the entire archive prior to burning to prevent buffer under-run issues.
Edit:
You can use mkisofs to create the .iso file from a folder, or your CD burning software may be able to output an .iso file.
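To make the archive-first approach concrete, here is a minimal Python sketch using the standard tarfile module; the function names are illustrative, and the transport between client and server (SFTP, SMB, HTTP, ...) is left open.

```python
import tarfile

# Client side (Windows): pack the directory tree, preserving the hierarchy.
def pack(source_dir: str, archive_path: str) -> None:
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=".")  # store paths relative to source_dir

# Server side (Unix): unpack, then hand the folder to mkisofs or the burner.
def unpack(archive_path: str, target_dir: str) -> None:
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target_dir)
```

The tar format is what preserves the directory structure and file names; how the archive file itself is moved between the machines matters much less.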

Backup tool for Microsoft Virtual Server 2005 R2?

I am seeking a backup tool to back up virtual OS instances run through Microsoft Virtual Server 2005 R2. According to the MS docs, it should be possible to do it live through the Volume Shadow Copy Service, but I am having trouble finding any tool that does it.
What is the best solution to back up MS Virtual Server instances?
I'm personally fond of using ImageX to capture the VHD to a WIM file. (This is called file-based imaging, as opposed to sector-based imaging.) WIMs are sort of like an NTFS-specific compression format. The format also has a single-instance store, which means that files appearing multiple times are only stored once. The compression is superb, and the filesystem is restored with ACLs and reparse points perfectly intact.
You can store multiple VHDs and multiple versions of those VHDs in a WIM, which means you can back up incremental versions of your VHD and it'll just add a small delta to the end of the WIM each time.
As for live images, you can script vshadow.exe to make a copy of your virtual machine before backing it up.
You can capture the image to WIM format in one of two ways:
Mount the virtual machine you want to capture in Windows PE using Virtual Server. Then run ImageX with the /CAPTURE flag and save the WIM to a network drive.
Use a tool like VHDMount to mount the virtual machine as a local drive and then capture with ImageX. (In my experience VHDMount is flaky and I would recommend SmartVDK for this task. VHDMount is better for formatting disks and partitioning.)
This only skims the surface of this approach. I've been meaning to write up a more detailed tutorial covering the nuances of all of this.
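As a rough illustration of the capture step, here is a minimal Python sketch that shells out to imagex.exe once the VHD is mounted (by either method above); the paths and image name are hypothetical, and the /capture and /append flags should be checked against your Windows AIK documentation.

```python
import os
import subprocess

# Hypothetical paths: D: is the mounted VHD (Windows PE or VHDMount),
# and the WIM lives on a network share.
MOUNTED_VHD = "D:\\"
WIM_PATH = r"\\backupserver\vms\webserver.wim"

def capture(image_name: str) -> None:
    """Capture the mounted VHD into the WIM, or append a new version to it."""
    # First run creates the WIM with /capture; later runs add deltas with /append.
    verb = "/append" if os.path.exists(WIM_PATH) else "/capture"
    subprocess.run(["imagex", verb, MOUNTED_VHD, WIM_PATH, image_name],
                   check=True)

if __name__ == "__main__":
    capture("webserver nightly")
```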
http://technet.microsoft.com/en-us/library/cc720377.aspx
http://support.microsoft.com/kb/867590
There appear to be a number of ways you can do this.
I'm using BackupChain for Virtual Server 2005 as well as VMware. It creates delta incremental files which contain only the file changes, and it takes snapshots while the VMs are running. This way we save a lot of storage space and bandwidth, because it sends the backups via FTP to another server.