I would like to use Azure Blobs to store user-uploaded images for a website. Upon upload, the images are resized and put into folders for thumbnails, large pics, and originals. These can be easily referenced from the website, so the solution works pretty well.
The problem is the backup. I understand that Azure keeps three copies of every blob to protect against hardware failure. If an authenticated user deletes a blob, Microsoft will faithfully delete all three copies, which is a problem.
I couldn't find an easy way to regularly back up and restore a blob container to a point in time. Is there such a solution offered in the Azure Marketplace that anyone knows of? Maybe this would be better on Server Fault since I'm looking for a canned solution, but the MS link sent me over to Stack Overflow, so I'm giving it a shot here.
One method is to use blob snapshots, which give you a read-only, point-in-time copy of a blob. Refer to https://msdn.microsoft.com/en-us/library/azure/ee691971.aspx
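For illustration, here is a minimal sketch of taking and restoring snapshots with the azure-storage-blob Python SDK; the connection string, container name, blob name, and snapshot timestamp below are all placeholders:

```python
# pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient

# Placeholder connection string and container name.
service = BlobServiceClient.from_connection_string("<your-connection-string>")
container = service.get_container_client("images")

# Take a point-in-time snapshot of every blob in the container.
for blob in container.list_blobs():
    snap = container.get_blob_client(blob.name).create_snapshot()
    print(blob.name, "->", snap["snapshot"])  # the snapshot timestamp identifies it

# Restore a blob by copying one of its snapshots over the base blob.
# (Blob name and snapshot timestamp here are made up.)
blob_client = container.get_blob_client("thumbnails/cat.jpg")
snapshot_url = f"{blob_client.url}?snapshot=2015-01-01T00:00:00.0000000Z"
blob_client.start_copy_from_url(snapshot_url)
```

Snapshots are billed incrementally (only blocks that differ from the base blob are charged), and a blob with snapshots cannot be deleted unless the snapshots are deleted with it, which gives you some protection against accidental deletes.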
I have an iOS 7 app that uses Core Data. Some of the Core Data objects have related (one-to-one relationship) images that are >1 MB and <4 MB and are stored in the app's Documents folder. The Core Data objects store only the image names as strings.
I want to integrate iCloud support into the app so I can sync data between devices. I am planning to use iCloud Core Data storage to sync the Core Data objects. But what to do with the images? After reading different posts, I found a couple of options, highlighted below. I am struggling to pick the one that would suit me best, so it would be nice to hear someone's experience and recommendations. What should I be careful with, and what haven't I thought of? I also need to consider migration of the existing data to whichever option I pick.
OPTION 1. Store the UIImage in Core Data as Binary Data with the External Binary Data option (read here). At the moment it seems to be the easiest solution, but I guess not the best. From the documentation:
It is better, however, if you are able to store BLOBs as resources on the filesystem, and to maintain links (such as URLs or paths) to those resources.
Also, will the external files be synced? If so, how reliable would the sync be if the user quits or minimises the app? Will the sync process resume? From objc.io, about External File References:
In our testing, when this occurs, iCloud does not always know how to resolve the relationship and can throw exceptions. If you plan to use iCloud syncing, consider unchecking this box in your iCloud entities.
OPTION 2. Store images using UIDocument (good tutorial here) and somehow track the relation between the Core Data entry and the UIDocument. From what I understand, whatever I put in the app's iCloud Documents directory will be automatically synchronised to iCloud by a system daemon. So even if the user quits the app, the images will still be synced to iCloud, right?
OPTION 3. Use FileManager (more info here). I haven't read a lot about this approach, but I think it can also work.
OPTION 4. Any other?
There are similar posts (e.g. Core Data with iCloud design), but unfortunately they don't fully answer my question.
It seems Apple will reject the application because of large-database iCloud synchronization.
I think the best solution is to store the images on a remote host and keep the image URL in Core Data. The local path of an image should also be derivable from its remote URL.
So the algorithm looks like this:
1) Get the remote URL from Core Data.
2) Resolve the local path of the image.
3) If the local image exists, use it; otherwise fetch it from the remote URL and save it to local storage.
You can have a look at Amazon S3 here.
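The cache-or-fetch flow is language-agnostic; here is a minimal Python sketch of steps 1-3. The cache directory and the local_path_for helper are hypothetical names of my own:

```python
import hashlib
import os
import urllib.request

CACHE_DIR = "image_cache"  # stand-in for the app's local storage

def local_path_for(remote_url: str) -> str:
    # Step 2: derive a stable local filename from the remote URL.
    name = hashlib.sha1(remote_url.encode()).hexdigest()
    return os.path.join(CACHE_DIR, name)

def image_data(remote_url: str) -> bytes:
    # Step 3: use the cached copy if present, otherwise download and cache it.
    path = local_path_for(remote_url)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = urllib.request.urlopen(remote_url).read()
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return data
```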
Is it possible to recreate a scenario like iTunes Match with the iCloud APIs available to date (I am writing/editing this in April 2012)? Specifically, I mean something like this:
A user creates a media document (audio or video) in my app, and it is automatically uploaded to his iCloud space. Then the user decides to delete his local copy, but the app still shows the document in, let's say, a table view, and if the user presses play it begins to stream from iCloud. The user is also able to recreate (download) a local copy of the document. (Just to be sure: I understand the difference between the document I describe here and the concept of UIDocument.)
If yes, how would I implement the transfer of the file, let's say a one-minute video recording, to the cloud? What folder would it go in?
Apple's docs state that there shouldn't be a distinction between where the data is stored: no sense of a local copy versus a cloud copy. I think it is possible to do what you ask, but you're likely to get rejected. Use something like Amazon AWS for hosting instead. You'd have more control over the files, and unless you're going to have tons of users, you'll also qualify for the free tier.
Dropbox claims that during syncing only the portion of a file that changed is transmitted back to the main server, which is obviously great functionality, but how do they apply those changes to files stored in the Amazon S3 cloud? For example, let's say a 30-page document on a user's desktop has changes only on page 4. Dropbox syncs the blocks representing the changes, but what happens on the backend if the files they store are in the cloud? Does that mean they have to download the 30-page document from S3 to their server, replace the blocks representing page 4, and then upload it back to the cloud? I doubt this is the case because it would be rather inefficient. The other option I can think of is if Amazon S3 supports updating a stored file by byte range, for example a PUT request to file X for bytes 100-200 that replaces those bytes with the value in the request body. So I was curious how companies that build on cloud services such as Amazon's implement this type of syncing.
Thanks
As S3 and similar storages don't offer filesystem capabilities, anything that pretends to store files and directories needs to emulate a file system. When doing this, files are often split into pages of a certain size, where each page is stored as a separate object in the storage. That way a changed block requires uploading only one page, not the whole file. I should note that with files like office documents this approach can be fragile when the file size changes: if you insert a page at the beginning or delete one, the page boundaries shift, so the whole file changes and the complete file would need to be re-uploaded. We didn't analyze how Dropbox in particular does its job; I have just described the common scenario. There also exist various "patch algorithms", where a patch can be created locally (if Dropbox has an older local copy in its cache) and then applied to one or more blocks on the server.
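To make the page-splitting idea concrete, here is a rough Python sketch (not how Dropbox actually does it); the 4 MB page size and the list of hashes the server is assumed to already know are both assumptions. Note how an insertion near the start of a file shifts every subsequent page, which is exactly the caveat above:

```python
import hashlib

PAGE_SIZE = 4 * 1024 * 1024  # assumed 4 MB pages

def changed_pages(path: str, known_hashes: list) -> dict:
    # Split the file into fixed-size pages and return only the pages whose
    # content hash differs from what the server already has.
    changed = {}
    with open(path, "rb") as f:
        index = 0
        while True:
            page = f.read(PAGE_SIZE)
            if not page:
                break
            digest = hashlib.sha256(page).hexdigest()
            if index >= len(known_hashes) or known_hashes[index] != digest:
                changed[index] = page  # upload this page as its own object
            index += 1
    return changed
```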
There are several synchronizing tools which transfer deltas over the wire, like rsync, rdiff, rdiff-backup, etc. For bi-directional synchronization with S3 there are paid services, s3rsync for example. For pure client-side synchronization, tools like zsync can be considered (which is what many people use to roll out app updates).
An alternative approach would be to tarball the directory, generate a delta file (using rdiff or xdelta3), and upload the delta file using a timestamp as part of the key. To sync, all you need to do is perform these two checks client-side:
1) Check that you have all the delta files from S3. If not, pull them and apply them to generate the latest backup state.
2) Check that your last backup state corresponds to your current directory. If not, generate a new delta file and push it to S3.
The concern here is the at-least-100% additional space utilization client-side, but this approach also lets you revert changes if needed.
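A minimal sketch of that delta approach, assuming the xdelta3 CLI is on the PATH and using boto3 for S3; the bucket name and key scheme are made up:

```python
# pip install boto3; requires the xdelta3 command-line tool
import subprocess
import time
import boto3

s3 = boto3.client("s3")
BUCKET = "my-backups"  # hypothetical bucket name

def push_delta(prev_tar: str, curr_tar: str):
    # Check 2: encode a binary delta between the previous and current
    # tarballs and upload it under a timestamped key.
    delta = f"backup-{int(time.time())}.xdelta"
    subprocess.run(["xdelta3", "-e", "-s", prev_tar, curr_tar, delta], check=True)
    s3.upload_file(delta, BUCKET, delta)

def pull_deltas(prefix: str = "backup-"):
    # Check 1: fetch any delta files we do not yet have locally.
    for obj in s3.list_objects_v2(Bucket=BUCKET, Prefix=prefix).get("Contents", []):
        s3.download_file(BUCKET, obj["Key"], obj["Key"])
        # ...then apply each in timestamp order:
        # xdelta3 -d -s <base.tar> <delta> <restored.tar>
```

Applying the deltas in timestamp order reconstructs the latest state, and stopping early at any timestamp is what gives you the revert capability mentioned above.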
I just started playing with the Azure Library for Lucene.NET (http://code.msdn.microsoft.com/AzureDirectory). Until now, I was using my own custom code for writing Lucene indexes to an Azure blob: I copied the blob to the local storage of the Azure web/worker role and read/wrote docs to the index there, using my own locking mechanism to make sure reads and writes to the blob don't clash. I am hoping the Azure Library will take care of these issues for me.
However, while trying out the test app, I tweaked the code to use the compound-file option, and that created a new file every time I wrote to the index. Now my question is: if I have to maintain the index, i.e. keep a snapshot of the index file and use it if the main index gets corrupted, how do I go about doing this? Should I keep a backup of all the .cfs files that are created, or is handling only the latest one fine? Are there API calls to clean up the blob so that only the latest file is kept after each write to the index?
Thanks
Kapil
After I answered this, we ended up changing our search infrastructure and used Windows Azure Drive. We had a worker role that would mount a VHD stored in Blob Storage and host the Lucene.NET index on it. The code checked that the VHD was mounted and that the index directory existed. If the worker role fell over, the VHD would automatically dismount after 60 seconds, and a second worker role could pick it up.
We have since changed our infrastructure again and moved to Amazon with a Solr instance for search, but the VHD option worked well during development. It could have worked well in test and production too, but requirements meant we needed to move to EC2.
I am using AzureDirectory for full-text indexing on Azure, and I am getting some odd results too... but hopefully this answer will be of some use to you.
Firstly, the compound-file option: from what I have been reading and figuring out, the compound file is a single large file with all the index data inside. The alternative is having lots of smaller files (configured using the SetMaxMergeDocs(int) function of IndexWriter) written to storage. The problem with that is once you get to lots of files (I foolishly set this to about 5000), it takes an age to download the indexes (on the Azure server it takes about a minute; on my dev box... well, it's been running for 20 minutes now and still not finished...).
As for backing up indexes, I have not come up against this yet, but given that we have about 5 million records currently, and that will grow, I am wondering about it too. If you are using a single compound file, maybe downloading the files to a worker role, zipping them, and uploading them with today's date would work (see the sketch below)... If you have a smaller set of documents, you might get away with re-indexing the data if something goes wrong... but again, it depends on the number...
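Something like this zip-and-upload idea, sketched in Python with the azure-storage-blob SDK; the container names are made up, and it assumes flat blob names and enough local disk for the whole index:

```python
# pip install azure-storage-blob
import datetime
import os
import shutil
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<your-connection-string>")
index_container = service.get_container_client("lucene-index")
backup_container = service.get_container_client("index-backups")

# Pull every index file down to local storage (e.g. a worker role's disk).
os.makedirs("index", exist_ok=True)
for blob in index_container.list_blobs():
    with open(os.path.join("index", blob.name), "wb") as f:
        f.write(index_container.download_blob(blob.name).readall())

# Zip the directory and upload the archive under today's date.
archive = shutil.make_archive(f"index-{datetime.date.today()}", "zip", "index")
with open(archive, "rb") as f:
    backup_container.upload_blob(os.path.basename(archive), f, overwrite=True)
```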
I recently came across a problem with image file storage on a network.
I have developed a desktop application. It runs on a network with a central database system. Users log in from their own computers on the network and do their jobs.
So far the database operations are going fine, no problem. Users share data from the same database server.
Now I am being asked to save the user (operator) photos too. I am getting confused about whether to save them in the database like the other data or to store them on a separate file server.
I would like to know which is better: storing images in the database or on a file server?
EDIT:
The main purpose is to store the account holder's photo and signature, and later show them during a transaction so that the teller can verify that the person and signature are correct.
See these:
Storing images in database: Yea or nay?
Should I store my images in the database or folders?
Would you store binary data in database or folders?
Store pictures as files or in the database for a web app?
Storing a small number of images: blob or fs?
User Images: Database or filesystem storage?
Since this is a desktop application, it's a bit different.
It really depends on how much data we're talking about here. If you've only got 100 or so users, and it's only profile pictures, I would store them in the DB for a few practical reasons:
No need to manage or worry about a separate file store
You don't need to give shared folder access to each user
No permissions issues
No chance of people messing up your image store
It will be included in your standard DB backup
It will be nicely linked to your data (no absolute vs. relative path issues)
Of course, if you're going to be storing tons of images for thousands of users, I would go with file system storage.
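A minimal sketch of the in-database option, using SQLite as a stand-in for the central database; the table and column names are made up, and the same pattern applies to any RDBMS with a BLOB column:

```python
import sqlite3

db = sqlite3.connect("bank.db")  # placeholder for the central database
db.execute("""CREATE TABLE IF NOT EXISTS account_holder (
                  id INTEGER PRIMARY KEY,
                  name TEXT,
                  photo BLOB,
                  signature BLOB)""")

def save_images(holder_id: int, photo_path: str, signature_path: str):
    # Read the image files as bytes and store them in BLOB columns,
    # so they ride along with the normal DB backup.
    with open(photo_path, "rb") as p, open(signature_path, "rb") as s:
        db.execute("UPDATE account_holder SET photo = ?, signature = ? WHERE id = ?",
                   (p.read(), s.read(), holder_id))
    db.commit()

def load_photo(holder_id: int) -> bytes:
    # Fetch the photo bytes back for display during a transaction.
    row = db.execute("SELECT photo FROM account_holder WHERE id = ?",
                     (holder_id,)).fetchone()
    return row[0] if row else None
```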
I think you have to define what you mean by "better".
If it is speed, my guess is you don't want a database; you probably just want the files stored plainly on a file server. If you want something like a mini-Facebook, where you need a much more dynamic environment, perhaps you are better off storing them in a database.
This is more a question than an answer: what do you want to do with the pictures?