How to back up on-premises data to an AWS S3 bucket using a tool/service? - amazon-s3

Let me explain a little bit: we keep users' data in centralized share folders (configured on a Domain Controller, with permissions set via NTFS & security groups), like M-Marketing, S-Sales & T-Trading, etc. These drives are mapped to each user's Windows login profile according to their work profile.
On-premises backup is already configured. Now I want to back up some of the important drives (like M, S, T) to AWS S3 to keep the data safe, and whenever the source data is unavailable for whatever reason, I must be able to map those drives back according to each user's work profile.
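For context, managed options for this include AWS Storage Gateway, AWS DataSync, and the AWS CLI's aws s3 sync command. As a minimal self-rolled sketch, a scheduled boto3 script could walk each share and push files into per-drive prefixes; the bucket name and share paths below are hypothetical:

```python
import os
import boto3

# Hypothetical values - replace with your bucket and mapped share paths.
BUCKET = "corp-share-backup"
SHARES = {"M": r"\\DC01\Marketing", "S": r"\\DC01\Sales", "T": r"\\DC01\Trading"}

s3 = boto3.client("s3")

def backup_share(letter: str, root: str) -> None:
    """Walk a share and upload every file under an S3 prefix named after the drive."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            local_path = os.path.join(dirpath, name)
            # Key mirrors the on-premises folder structure, e.g. "M/reports/q1.xlsx".
            rel = os.path.relpath(local_path, root).replace(os.sep, "/")
            s3.upload_file(local_path, BUCKET, f"{letter}/{rel}")

for drive, path in SHARES.items():
    backup_share(drive, path)
```

Restoring the mapped-drive experience would then be a matter of syncing the prefixes back down (or fronting the bucket with a file gateway) and re-mapping the drive letters per work profile.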

Related

Can a developer have access to a limited S3 console

A developer of mine wants to be able to see the entire contents of the S3 bucket that I've given him to develop with. It seems the only way to do this is to give him a limited version of the AWS console so he can watch objects enter the bucket.
Is this even possible? Is there any other way to allow him to see objects populate the bucket?
You can use IAM policies (attached to users or roles) to control access to resources at a granular level, even down to individual objects contained in an S3 bucket.
You can read more about IAM here: https://aws.amazon.com/iam/
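As an illustration, a read-only policy scoped to a single bucket might look like the following (the bucket name, user name, and policy name are hypothetical). Attached to the developer's IAM user, it lets him list and read objects via the console or CLI without granting any wider access:

```python
import json
import boto3

# Hypothetical bucket and IAM user names.
BUCKET = "dev-sandbox-bucket"
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Allow listing the bucket's contents.
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}"],
        },
        {   # Allow reading individual objects.
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/*"],
        },
    ],
}

iam = boto3.client("iam")
iam.put_user_policy(
    UserName="developer",
    PolicyName="s3-read-only-dev-bucket",
    PolicyDocument=json.dumps(POLICY),
)
```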

Download NCDC weather dataset from s3 to local machine using S3cmd

I would like to download the publicly available NCDC weather dataset from Amazon's public datasets, but I am unable to find it. Could anyone tell me the exact bucket it is located in? Also, could you please tell me how to download it onto my local machine with s3cmd?
Any help would be much appreciated.
http://aws.amazon.com/datasets/2759 is the link to the specific dataset I am looking for.
The dataset you have referenced is provided as an EBS Snapshot, not an S3 bucket. You will need to convert the snapshot into a new EBS Volume, which can be mounted on an Amazon EC2 instance.
To access it:
1) Open the Amazon EC2 Management Console.
2) Set the region to US East.
3) In the search bar, choose "Public Snapshots".
4) Enter the snapshot ID (from the page you referenced) in the search filter and press Enter. You should now see a single snapshot.
5) Optional: "Copy" the snapshot to another region if you desire.
6) "Create Volume" to create an EBS Volume from the snapshot (be sure to select the same Availability Zone as your EC2 instance).
7) Go to "Volumes" in the left navigation pane.
8) "Attach" the new EBS Volume onto an existing EC2 instance.
You will then have access to the data.
Please note that the dataset is from 2009. It does not contain recent data.
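If you prefer to script the console steps above, a rough boto3 equivalent might look like this; the snapshot ID, availability zone, and instance ID are placeholders, not values from the dataset page:

```python
import boto3

# Placeholder identifiers - substitute the real snapshot ID from the dataset
# page, plus your own instance's availability zone and ID.
SNAPSHOT_ID = "snap-0123456789abcdef0"
AVAILABILITY_ZONE = "us-east-1a"
INSTANCE_ID = "i-0123456789abcdef0"

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an EBS volume from the public snapshot, in the same AZ as the instance.
volume = ec2.create_volume(SnapshotId=SNAPSHOT_ID, AvailabilityZone=AVAILABILITY_ZONE)

# Wait until the volume is ready before attaching it.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

# Attach the volume; it can then be mounted from within the instance.
ec2.attach_volume(VolumeId=volume["VolumeId"], InstanceId=INSTANCE_ID, Device="/dev/sdf")
```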

UIDocument VS CoreData External Binary Data VS File Manager

I have an iOS 7 app that uses Core Data. Some of the Core Data objects have related (one-to-one relationship) images that are > 1MB & < 4MB and are stored in the app's Documents folder. The Core Data objects only store the image names as strings.
I want to integrate iCloud support for the app so I can sync data between devices. I am planning to use iCloud Core Data storage to sync the Core Data objects. But what to do with the images?! After reading different posts, I found a couple of options, which are highlighted underneath. I am struggling to pick the one that would suit me best. It would be nice to hear about someone's experience/recommendations. What should I be careful with, and what didn't I think of? I also need to consider migration of the existing data to whichever option I pick.
OPTION 1. Store the UIImage in Core Data as Binary Data with the External Binary Data option (read here). At the moment it seems to be the easiest solution, but I guess not the best. From the documentation:
It is better, however, if you are able to store BLOBs as resources on the filesystem, and to maintain links (such as URLs or paths) to those resources.
Also, will the external files be synced? If so, how reliable would the sync be? If the user quits or minimises the app, will the sync process resume? From objc.io about External File References:
In our testing, when this occurs, iCloud does not always know how to resolve the relationship and can throw exceptions. If you plan to use iCloud syncing, consider unchecking this box in your iCloud entities.
OPTION 2. Store images using UIDocument (good tutorial here) and somehow track the relation between the Core Data entry and the UIDocument. From what I understand, whatever I put in this directory will be automatically synchronised to iCloud by a system daemon. So if the user quits the app, the images will still be synced to iCloud, right?
OPTION 3. Use FileManager (more info here). I haven't read a lot about this approach, but I think it can also work.
OPTION 4. Any other?
There are similar posts (e.g. Core Data with iCloud design), but unfortunately they don't fully answer my question.
It seems Apple will reject the application because of large-database iCloud synchronization.
I think the best solution is to store the images on a remote host and keep the image URL in Core Data.
The local path of an image should also be resolvable from its remote URL.
So the algorithm will look like this ->
1) Get the remote URL from Core Data.
2) Resolve the local path of the image.
3) If the local image exists, retrieve it; otherwise fetch it from the remote host and save it to local storage.
You can have a look at Amazon S3 here.
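The cache-or-fetch logic described above is language-agnostic; here is a minimal sketch (in Python for brevity, with a hypothetical cache directory), where an iOS app would use the equivalent NSFileManager/NSURLSession calls:

```python
import os
import hashlib
import urllib.request

CACHE_DIR = "/tmp/image-cache"  # hypothetical local cache location

def local_path_for(remote_url: str) -> str:
    """Derive a stable local path from the remote URL (step 2)."""
    name = hashlib.sha256(remote_url.encode()).hexdigest()
    return os.path.join(CACHE_DIR, name)

def image_data_for(remote_url: str) -> bytes:
    """Return cached bytes if present, otherwise download and cache (step 3)."""
    path = local_path_for(remote_url)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    os.makedirs(CACHE_DIR, exist_ok=True)
    data = urllib.request.urlopen(remote_url).read()
    with open(path, "wb") as f:
        f.write(data)
    return data
```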

Streaming from iCloud?

Is it possible to recreate a scenario like iTunes Match with the iCloud APIs available to date (I am writing/editing this in April 2012)? Specifically, I mean something like this:
A user creates a media document (audio or video) in my app, and it is automatically uploaded to (his) iCloud space. Then the user decides to delete his local copy. But the app still shows the document in, let's say, a table view, and if the user presses play, it begins to stream from iCloud. The user is also able to recreate (download) a local copy of the document. (Just to be sure: I understand the difference between the document I describe here and the concept of UIDocument.)
If yes, how would I implement the transfer of the file, let's say a recording of a 1-minute video, to the cloud? What folder would it go in?
Apple docs state that there shouldn't be a distinction in where the data is stored: no sense of a local copy versus a cloud copy. I think it is possible to do what you ask, but you're likely to get rejected. Use something like Amazon AWS for hosting instead. You'd have more control over the files, and unless you're going to have tons of users, you'll also qualify for the free tier.
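If the media were hosted on S3 as this answer suggests, streaming and re-downloading could both go through time-limited presigned URLs. A short sketch (the bucket, key, and file names are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket/key for a user's uploaded recording.
BUCKET = "myapp-user-media"
KEY = "user-123/recording.mp4"

# Upload the local copy (the user can then delete it locally).
s3.upload_file("recording.mp4", BUCKET, KEY)

# Later: hand the player a presigned URL valid for one hour, so the app
# can stream or re-download the file without bundling AWS credentials.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": BUCKET, "Key": KEY},
    ExpiresIn=3600,
)
print(url)
```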

How do services like Dropbox implement delta encoding if their files are stored in the cloud?

Dropbox claims that during syncing only the portions of files that change are transmitted back to the main server, which is obviously great functionality, but how do they perform changes to files stored in the Amazon S3 cloud? For example, let's say a 30-page document on a user's desktop contains changes to only page 4. Dropbox now syncs the blocks representing the changes, but what happens on the backend if the files they store are in the cloud? Does that mean they have to download the 30-page document stored in S3 to their server, perform the replacement of the blocks representing page 4, and then upload it back to the cloud? I doubt this would be the case, because that would be somewhat inefficient. The other option I could think of is if Amazon S3 supports updating a stored file by byte range, so for example, make a PUT request to file X for bytes 100-200 which would replace all the bytes from 100 to 200 with the value of the PUT request. So I was curious how companies that use cloud services such as Amazon's implement this type of syncing.
Thanks
As S3 and similar storages don't offer filesystem capabilities, anything that pretends to store files and directories needs to emulate a file system. When doing this, files are often split into pages of a certain size, where each page is stored as a separate object in the storage. This way a changed block requires uploading only one page (for example) and not the whole file. I should note that with files like office documents this approach can be faulty if the file size changes - for example, if you insert a page at the beginning or delete a page, then all subsequent pages shift and the complete file would need to be re-uploaded. We didn't analyze how Dropbox in particular does its job; I just described the common scenario. There also exist various "patch algorithms", where a patch can be created locally (if Dropbox has an older local copy in the cache) and then applied to one or more blocks on the server.
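A toy illustration of that page-splitting scheme (the bucket name, page size, and key layout are all made up): each fixed-size page is hashed, and a sync re-uploads only the pages whose hashes changed since the last run:

```python
import hashlib
import boto3

PAGE_SIZE = 4 * 1024 * 1024  # 4 MiB pages - an arbitrary choice
BUCKET = "sync-page-store"   # hypothetical bucket

s3 = boto3.client("s3")

def sync_file(local_path: str, file_id: str, old_hashes: dict) -> dict:
    """Upload only the pages of `local_path` that differ from `old_hashes`.

    Returns the new {page_index: sha256} map to persist for the next sync.
    """
    new_hashes = {}
    with open(local_path, "rb") as f:
        index = 0
        while page := f.read(PAGE_SIZE):
            digest = hashlib.sha256(page).hexdigest()
            new_hashes[index] = digest
            if old_hashes.get(index) != digest:
                # Page changed (or is new): store it under a per-page key.
                s3.put_object(Bucket=BUCKET, Key=f"{file_id}/{index}", Body=page)
            index += 1
    return new_hashes
```

Note how an insertion near the start of the file shifts every later page boundary and defeats this fixed-size scheme, which is exactly the office-document caveat mentioned above.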
There are several synchronizing tools which transfer deltas over the wire, like rsync, rdiff, rdiff-backup, etc. For bi-directional synchronization with S3 there are paid services like s3rsync, for example. For pure client-side synchronization, tools like zsync can be considered (which is what many people employ to roll out app updates).
An alternative approach would be to tarball a directory, generate a delta file (using rdiff or xdelta3), and upload the delta file using a timestamp as part of the key. In order to sync, all you need to do is perform these two checks client-side (a sketch of the flow follows this list):
1) You have all the delta files from S3. If not, pull them and apply them to generate the latest backup state.
2) Your last backup state corresponds to your current directory. If not, generate a new delta file and push it to S3.
The concerning factor here would be the at least 100% additional space utilization client-side. But this approach will help you revert changes if needed.
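For illustration, the tarball-plus-delta flow might look roughly like this, shelling out to tar and xdelta3 (the file names, bucket, and key scheme are hypothetical, xdelta3 must be installed, and the very first run would upload a full tarball instead of a delta):

```python
import subprocess
import time
import boto3

BUCKET = "dir-backup-deltas"  # hypothetical bucket

s3 = boto3.client("s3")

def push_delta(directory: str, last_state: str) -> str:
    """Tarball `directory`, diff it against the last state, upload the delta."""
    current = "current.tar"
    subprocess.run(["tar", "-cf", current, directory], check=True)

    # xdelta3: -e encode, -s <source> <target> <output delta>
    delta = f"delta-{int(time.time())}.xd3"
    subprocess.run(["xdelta3", "-e", "-s", last_state, current, delta], check=True)

    # Timestamped key so deltas can be replayed in order on other clients.
    s3.upload_file(delta, BUCKET, delta)
    return current  # becomes the new "last state" tarball

def apply_delta(last_state: str, delta: str, output: str) -> None:
    """Rebuild the next state from the previous tarball plus a pulled delta."""
    subprocess.run(["xdelta3", "-d", "-s", last_state, delta, output], check=True)
```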