Is it possible to keep files in glacier after deletion from s3? - amazon-s3

Is it possible to move or copy files from S3 to Glacier (or, if that's not possible, another cheaper storage class) even though the original S3 files will be deleted? I'm looking for a robust solution for server backups from WHM > S3 > Glacier. I've trialled multiple lifecycle rules, and can see several questions have been asked around this here, but I can't seem to get the settings right.
WHM sends backups to S3 fine for me. It works by essentially creating a mirror of the on-server backups on S3. My problem is that the way the WHM/S3 integration works means that when the on-server backups are deleted at the end of the month, so are the backups in the S3 bucket.
What I'd like to achieve is that before the files are deleted from S3, they're kept for a specified period, say 6 months. I've tried rules to archive them to Glacier without success, and I think this is because when the original files are deleted, so are the Glacier copies?
Is what I'm trying to achieve possible?
Thanks.

There are actually two ways to use Amazon Glacier:
As an Amazon S3 storage class (as you describe), or
By interacting with Amazon Glacier directly
Amazon Glacier has its own API that you can use to upload/download objects to/from a Glacier vault (which is the equivalent of an S3 bucket). In fact, when you use Amazon S3 to move data into Glacier, S3 is simply calling the standard Glacier API to send the data to Glacier. The difference is that S3 manages the vault for you, so you won't see the objects listed in your Glacier console.
So, what you might choose to do is:
Create your WHM backups
Send them directly to Glacier
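For the "send them directly to Glacier" step, here is a minimal sketch using boto3 (the vault name and backup path are placeholders, and the vault is assumed to already exist):

import boto3

glacier = boto3.client("glacier")

# Placeholder vault name and backup path - adjust for your WHM backup layout.
with open("/backup/monthly/account.tar.gz", "rb") as archive:
    response = glacier.upload_archive(
        vaultName="whm-backups",
        archiveDescription="WHM monthly backup",
        body=archive,
    )

# Keep the returned archiveId somewhere safe; you need it to retrieve or
# delete the archive later, since Glacier has no real-time object listing.
print(response["archiveId"])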
Versioning
An alternative approach is to use Amazon S3 Versioning. This means that objects deleted from Amazon S3 are not actually deleted. Rather, a delete marker hides the object, but the object is still accessible as a non-current version.
You could then define a lifecycle policy to delete non-current versions (including deleted objects) after a period of time.
See (old article): Amazon S3 Lifecycle Management for Versioned Objects | AWS News Blog
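As a sketch of that approach with boto3 (the bucket name is a placeholder, and roughly six months is expressed as 180 days):

import boto3

s3 = boto3.client("s3")
bucket = "whm-backup-bucket"  # placeholder bucket name

# Versioning must be enabled so a delete only adds a delete marker.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Lifecycle rule: move non-current (deleted/overwritten) versions to Glacier
# after 30 days and expire them after roughly 6 months.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "retain-deleted-backups",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
                ],
                "NoncurrentVersionExpiration": {"NoncurrentDays": 180},
            }
        ]
    },
)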

Related

How to stop AWS ElasticBeanstalk from creating an S3 Bucket or inserting into it?

It created an S3 bucket. If I delete it, it just creates a new one. How can I set it to not create a bucket or to stop write permissions from it?
You cannot prevent AWS Elastic Beanstalk from creating the S3 bucket, because it stores your application and settings as a bundle in that bucket and executes deployments from it. That bucket is required for as long as you run/deploy your application using AWS EB. Please be wary of deleting these buckets, as this may cause your deployments/applications to crash. You may, however, remove older objects (which may no longer be in use).
Take a look at this link for detailed information on how EB uses S3 buckets for deployments: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/AWSHowTo.S3.html
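If your goal is simply to trim older application versions (and their bundles in that bucket), a hedged boto3 sketch along these lines may help; the application name and the number of versions to keep are placeholders:

import boto3

eb = boto3.client("elasticbeanstalk")
app_name = "my-app"  # placeholder application name

# List the application's versions, oldest first.
versions = eb.describe_application_versions(ApplicationName=app_name)
ordered = sorted(versions["ApplicationVersions"], key=lambda v: v["DateCreated"])

# Delete everything except the five most recent versions, removing each
# version's source bundle from the S3 bucket as well.
for version in ordered[:-5]:
    eb.delete_application_version(
        ApplicationName=app_name,
        VersionLabel=version["VersionLabel"],
        DeleteSourceBundle=True,
    )

Elastic Beanstalk also has a built-in application version lifecycle setting that can do this pruning automatically.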

Syncing buckets between two S3 Storage Providers

I am currently using RIAK CS as an S3 provider but I want to change to Scality S3. Therefore, I need to migrate the existing data from RIAK to Scality. Is there a quick and easy way of syncing buckets between the two different storage providers? I have two Docker containers running, one with each provider's image.
One way of doing it would be to simply download the contents of the buckets to a local folder and then upload to Scality using s3cmd or a similar tool. However, I was hoping there was a direct route between the buckets.
Any ideas?
There would not be a "direct route between the buckets".
While the Amazon S3 CopyObject command can copy objects between different Amazon S3 buckets (even if they are in different regions), it will not work with a non-Amazon endpoint.
Your only hope is if Riak/Scality have somehow built-in connectivity with each other.
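One way to avoid the download-to-a-local-folder round trip is to stream each object from one endpoint to the other in a small script. A hedged boto3 sketch, with the endpoint URLs, credentials and bucket names as placeholders for your RIAK CS and Scality containers:

import boto3

riak = boto3.client(
    "s3",
    endpoint_url="http://riak-cs:8080",  # placeholder endpoint
    aws_access_key_id="RIAK_KEY",
    aws_secret_access_key="RIAK_SECRET",
)
scality = boto3.client(
    "s3",
    endpoint_url="http://scality:8000",  # placeholder endpoint
    aws_access_key_id="SCALITY_KEY",
    aws_secret_access_key="SCALITY_SECRET",
)

source_bucket = "source-bucket"
dest_bucket = "dest-bucket"

# Stream every object from the source endpoint to the destination endpoint.
# If the source provider does not support ListObjectsV2, fall back to the
# older "list_objects" paginator.
paginator = riak.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=source_bucket):
    for obj in page.get("Contents", []):
        body = riak.get_object(Bucket=source_bucket, Key=obj["Key"])["Body"]
        scality.upload_fileobj(body, dest_bucket, obj["Key"])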

Best approach for setting up AEM S3 Data Store

We have an existing setup of AEM 6.1 which uses TarMK for data storage. To migrate all assets to S3, I followed the steps here: https://docs.adobe.com/docs/en/aem/6-1/deploy/platform/data-store-config.html#Data%20Store%20Configurations (Amazon S3 Data Store). Apparently the data synced to S3, but when I checked the disk usage report, I still see that assets are using disk space, even for existing and newly added assets. What's the purpose of using S3 for assets if they still use the disk space? Or am I doing something wrong? How can I verify that my setup is really using S3? Here is my S3DataStore.config
accessKey="xxxxxxxxxx"
secretKey="xxxxxxxxxx"
s3Bucket="dev-aem-assets-local"
s3Region="eu-west-1"
connectionTimeout="120000"
socketTimeout="120000"
maxConnections="40"
writeThreads="30"
maxErrorRetry="10"
continueOnAsyncUploadFailure=B"true"
cacheSize="0"
minRecordLength="10"
Another question is: Do I need to do the same setup on publisher? Or is it ok just to do it on author and use publisher as is by replicating the binary data?
There are a few parts to your question, so I'll break the answer into logical blocks. Shout if I miss anything.
Your setup for the migration is correct, and yes, the S3 data store will still use local disk space. This is for the write-through cache.
AEM uses a write-through cache for writing to S3, and all the settings for this cache are in your S3 config file. Any writes to the data store are first written to this cache; asynchronous background threads then upload them to the S3 bucket. This mechanism makes AEM very responsive, as it's not blocked by slow S3 writes. Data reads for recently written blobs are also fast because they don't need slow reads from S3. In short, S3 IO traffic is too slow for AEM, so this cache boosts performance. You cannot disable it, as it is required for asynchronous writes to S3. You can reduce its size, but it's recommended to be at least 50% of your S3 bucket size.
You can verify your S3 setup by looking at your logs for messages related to AWS (grep for aws).
As for publisher, yes you need to migrate from your old publisher to the new publisher. Assuming that you are not using binary-less replication, you will need a different S3 bucket for your publisher. In general, you migrate from author to author and publisher to publisher for a standard implementation.
You can also verify your S3 data usage by looking at the S3 bucket and the traffic on it. If versioning is enabled on your S3 bucket, all the blobs will show version stamping.
Async upload of blobs can be monitored from the logs, and IP traffic monitoring will show activity related to your S3 bucket. The most useful approach is to watch the network traffic between your AEM server and the S3 endpoint.
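For a quick sanity check from outside AEM, here is a small hedged boto3 sketch that counts the objects and bytes in the data store bucket (the bucket name below is the s3Bucket from your S3DataStore.config):

import boto3

s3 = boto3.client("s3")
bucket = "dev-aem-assets-local"  # s3Bucket from S3DataStore.config

# Count objects and total size to confirm blobs are actually landing in
# the bucket as assets are uploaded to AEM.
count, total_bytes = 0, 0
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        count += 1
        total_bytes += obj["Size"]

print(f"{count} objects, {total_bytes / (1024 ** 3):.2f} GiB in {bucket}")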

S3 Replication Between Regions

I'm looking for a way to replicate between S3 buckets across regions.
The purpose is that if a file is accidentally deleted because of a bug in my application, I would be able to restore it from the other bucket.
Is there any way to do this without uploading the file twice (meaning, not in the application layer)?
Set versioning on your S3 bucket. After that, it will keep every version of the files you upload or update in the bucket, and you can restore any version of a file from the version listing. See: Amazon S3 Object Lifecycle Management
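A hedged boto3 sketch of that approach (the bucket and key names are placeholders): enable versioning, and to recover an accidentally deleted file, remove its delete marker so the previous version becomes current again.

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # placeholder bucket name

# Enable versioning so a delete only adds a delete marker instead of
# removing the data.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# To "undelete" an object, find its delete marker and remove it.
key = "accidentally-deleted.txt"  # placeholder key
versions = s3.list_object_versions(Bucket=bucket, Prefix=key)
for marker in versions.get("DeleteMarkers", []):
    if marker["Key"] == key and marker["IsLatest"]:
        s3.delete_object(Bucket=bucket, Key=key, VersionId=marker["VersionId"])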

How do I use Amazon's new RRS for S3?

Reduced Redundancy Storage (RRS) is a new service from Amazon that is a bit cheaper than S3 because there is less redundancy.
However, I cannot find any information on how to specify that my data should use RRS rather than standard S3. In fact, there doesn't seem to be any website interface for the S3 service. If I log into AWS, there are only options for EC2, Elastic MapReduce, CloudFront and RDS, none of which I use.
I know this question is old, but it's worth mentioning that Amazon's interface for S3 now has an option to change your files (recursively) to RRS. Select a folder and right-click on it; under Properties, change the storage class to RRS.
You can use S3 Browser to switch to Reduced Redundancy Storage. It allows you to view/edit storage class for a single file or for multiple files. Moreover, you can configure default storage class for the bucket, so S3 Browser will automatically apply predefined storage class for all new files you are uploading through S3 Browser.
If you are using S3 Browser to work with RRS, the following article may be helpful:
Working with Amazon S3 Reduced Redundancy Storage (RRS)
Note: storage class preferences are stored in a local settings file. Other S3 applications use their own ways of storing bucket defaults, and currently there is no single standard for this.
All objects in Amazon S3 have a storage class setting. The default setting is STANDARD. You can use an optional header on a PUT request to specify the setting REDUCED_REDUNDANCY.
From: http://aws.amazon.com/s3/faqs/#How_do_I_specify_that_I_want_to_store_my_data_using_RRS
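In boto3 (the current AWS SDK for Python), that header maps onto the StorageClass parameter; a hedged sketch with placeholder bucket and key names:

import boto3

s3 = boto3.client("s3")

# Upload a new object with the Reduced Redundancy storage class.
s3.put_object(
    Bucket="my-bucket",
    Key="example.txt",
    Body=b"hello",
    StorageClass="REDUCED_REDUNDANCY",
)

# An existing object can be converted by copying it onto itself with the
# new storage class.
s3.copy_object(
    Bucket="my-bucket",
    Key="example.txt",
    CopySource={"Bucket": "my-bucket", "Key": "example.txt"},
    StorageClass="REDUCED_REDUNDANCY",
)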
If you are looking for a way to convert existing data in Amazon S3, you can use a fairly recent version of boto and a script I wrote. Details are explained on my blog:
http://www.bryceboe.com/2010/07/02/amazon-s3-convert-objects-to-reduced-redundancy-storage/
If you're on a Mac, the free Cyberduck program will do it. Log into S3, right-click on the bucket (or folder, or file), choose 'Info' and change the storage class from 'Unknown' or 'Regular S3 Storage' to 'Reduced Redundancy Storage'. It took about 2 hours to change 30,000 files for me...
If you use boto, you can do this on an existing Key object:
key.change_storage_class('REDUCED_REDUNDANCY')