Delete an S3 Bucket that has some data archived in Glacier - amazon-s3

We have a huge bucket for which we have set up lifecycle rules to archive data to Glacier.
We have now decided that we do not need the data in that bucket, so we want to delete all of it, both what is stored in Glacier and what is in S3.
If I delete the bucket from S3, would we incur a Glacier charge for retrieving the data, or would the deletes be free?
The bucket holds terabytes of data, and we definitely don't want to pay AWS thousands of dollars in retrieval costs.

You can't delete a bucket that is not empty, so you'll need to delete everything stored in the bucket, including what's stored in Glacier, first.
If everything in Glacier was migrated to the glacier storage class over 3 months ago, then you should not incur any charges.
If you don't restore the Glacier objects -- you just delete them -- then the only charge will be for anything that has been in Glacier for less than 3 months. Deleting these objects will incur the documented pro-rated charge for early deletion, which is equivalent to the charge for storing the content in Glacier for 3 months, less the charge already incurred for storing the objects in Glacier.
http://aws.amazon.com/s3/faqs/#How_am_I_charged_for_deleting_objects_from_Amazon_Glacier_that_are_less_than_3_months_old
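To get a feel for the pro-rated early-deletion charge described above, here is a rough sketch. It assumes "3 months" means 90 days, a 30-day billing month, and a made-up storage price of $0.004 per GB-month; check the current AWS pricing page for the real numbers. The function name `early_delete_fee` is just for illustration.

```shell
#!/usr/bin/env bash
# Rough estimate of the pro-rated early-deletion charge.
# Assumptions (not from AWS): 90-day minimum, 30-day month, caller-supplied price.
# Usage: early_delete_fee SIZE_GB DAYS_ALREADY_STORED PRICE_PER_GB_MONTH
early_delete_fee() {
  awk -v gb="$1" -v days="$2" -v price="$3" 'BEGIN {
    min_days = 90
    # Only the part of the minimum period not yet paid for is charged.
    remaining = (days < min_days) ? min_days - days : 0
    printf "%.2f\n", gb * price * remaining / 30
  }'
}

# 1000 GB archived 30 days ago at a hypothetical $0.004 per GB-month
early_delete_fee 1000 30 0.004
```

Anything that has already been archived for the full minimum period comes out as 0.00, which matches the "no charge after 3 months" statement above.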

Related

How to make AWS S3 replication faster

I want to back up my S3 storage to Glacier, so I set up some replication rules on the bucket. But when I check the metrics, the processing speed is 6000 objects per day, which is super slow. I have 200k objects in my bucket, so it may take a month to complete replication. Is there any way to make replication faster?

How to have EMRFS consistent view on S3 buckets with retention policy?

I am using an AWS EMR compute cluster (version 5.27.0), which uses S3 for data persistence.
This cluster both reads from and writes to S3.
S3 has an issue with eventual consistency: after data is written, it cannot always be listed immediately. Because of this I use EMRFS with DynamoDB to store newly written paths for immediate listing.
The problem now is that I have to set a retention policy on S3, so data more than a month old will get deleted from S3. However, when that happens, the data does not get deleted from the EMRFS DynamoDB table, which leads to consistency issues.
My question is: how can I ensure that when the retention policy deletes paths from S3, the same paths get deleted from the DynamoDB table?
One naive solution I have come up with is to define a Lambda that fires periodically and manually sets a TTL of, say, 1 day on the DynamoDB records. Is there a better approach than this?
You can configure DynamoDB Time to Live (TTL) with the same expiration period as your S3 objects:
https://aws.amazon.com/blogs/aws/new-manage-dynamodb-items-using-time-to-live-ttl/
That way, DynamoDB and S3 should end up holding the same set of existing objects.
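DynamoDB TTL works off a per-item attribute holding an expiry time in epoch seconds, so each record needs a timestamp that matches the S3 retention period. A minimal sketch of that arithmetic follows. The table name `EmrFSMetadata` and attribute name `expires_at` are placeholders, and whether EMRFS is happy with an extra attribute on its items is an assumption you would need to verify.

```shell
#!/usr/bin/env bash
# Compute the epoch-seconds value DynamoDB TTL expects.
# Usage: ttl_epoch WRITTEN_AT_EPOCH RETENTION_DAYS
ttl_epoch() {
  echo $(( $1 + $2 * 86400 ))
}

# Enable TTL on a table (needs AWS credentials; not called here).
# Table and attribute names are hypothetical placeholders.
enable_ttl() {
  aws dynamodb update-time-to-live \
    --table-name "$1" \
    --time-to-live-specification "Enabled=true,AttributeName=$2"
}

# Example: an item written now, with a 30-day retention to match S3
ttl_epoch "$(date +%s)" 30
# enable_ttl EmrFSMetadata expires_at
```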

AWS S3 Sync --force-glacier-transfer

A few days back I was experimenting with S3 & Glacier and my data got archived, so to restore it I had to use their expedited service (which costs a lot). I want to move all of my content from one bucket to another bucket in the same region and the same account.
When I try to sync the data it gives the following error
Completed 10.9 MiB/~10.9 MiB (30.0 KiB/s) with ~0 file(s) remaining (calculatingwarning: Skipping file s3://bucket/zzz0dllquplo1515993694.mp4. Object is of storage class GLACIER. Unable to perform copy operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 copy help for additional parameter options to ignore or force these transfers.
I am using the command below and I was wondering what it would cost me in dollars. The storage class of all my files has changed from "Standard" to "Glacier", so I am forced to use the --force-glacier-transfer flag.
aws s3 sync s3://bucketname1 s3://bucketname2 --force-glacier-transfer --storage-class STANDARD
If you restored them and the expiry-date has not passed yet, you should be able to sync them without an additional restore. You get the Glacier error for all recursive commands because the API they use doesn't check whether the objects have been restored. You can read about it in the ticket where they added --force-glacier-transfer.
https://github.com/aws/aws-cli/issues/1699
The --force-glacier-transfer flag doesn't do another restore; it just ignores the API saying the object is in Glacier and tries anyway. It will fail if the object is not restored (it won't try to restore it).
Note that this applies only to the recursive commands (e.g. sync, and cp/mv with --recursive); if you just copy one file, it will work without the force flag.
Copy file of a Glacier storage class to a different bucket
You wrote: "I want to move all of my content from one bucket to another bucket in the same region same account."
If you want to copy files kept in a Glacier storage class from one bucket to another, even with the sync command, you have to restore the files first, i.e. make them available for retrieval before you can actually copy them. The exception is a file stored in the "Amazon S3 Glacier Instant Retrieval" storage class. In this case, you don't need to explicitly restore the files.
Therefore, you have to issue the restore-object command to each of the files to initiate a restore request. Then you have to wait until the restore request completes. After that, you will be able to copy your files within the number of days that you have specified during the restore request.
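To tell whether a restore request has completed, you can look at the Restore field that head-object returns. A small sketch, assuming the usual `ongoing-request="..."` format of that field; the helper name `restore_state` and the bucket/key in the usage comment are made up for illustration.

```shell
#!/usr/bin/env bash
# Classify the Restore field reported by `aws s3api head-object`.
# Usage: restore_state 'ongoing-request="false", expiry-date="..."'
restore_state() {
  case "$1" in
    *'ongoing-request="true"'*)  echo "in-progress" ;;
    *'ongoing-request="false"'*) echo "restored" ;;
    *)                           echo "not-requested" ;;
  esac
}

# With credentials, something like this feeds it real data:
# restore_state "$(aws s3api head-object --bucket my-bucket --key my-key \
#                    --query Restore --output text)"
```

Only once this reports "restored" is the copy expected to succeed, and only until the expiry-date shown in the same field.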
Retrieval pricing
You also wrote: "I was wondering what would it cost me in terms of dollars".
With the command you provided, aws s3 sync s3://bucketname1 s3://bucketname2 --force-glacier-transfer --storage-class STANDARD, you copy the files from the Glacier storage class to the Standard storage class. In this case, you first pay for retrieval (one-off) and then pay (monthly) for storing both copies of each file: one copy in the Glacier tier and another in the Standard storage class.
According to Amazon (quote),
To change the object's storage class to Amazon S3 Standard, use copy (by overwriting the existing object or copying the object into another location).
However, for a file stored in the Glacier storage class, you can only copy it to another location in S3 within the same bucket; you cannot actually retrieve the file contents unless you restore it first, i.e. make it available for retrieval.
Since you have asked "what would it cost me in terms of dollars", you will have to pay according to the retrieval prices and storage prices published by Amazon.
You can check the retrieval pricing at https://aws.amazon.com/s3/glacier/pricing/
The storage prices are available at https://aws.amazon.com/s3/pricing/
The retrieval prices depend on what kind of Glacier storage class you initially selected to store the files: "S3 Glacier Instant Retrieval", "S3 Glacier Flexible Retrieval" or "S3 Glacier Deep Archive". The storage class can be modified by lifecycle rules, so to be more correct, it is the current storage class for each file that matters.
Unless you store your files in the "S3 Glacier Instant Retrieval" storage class, the cheapest option is to first restore the files (make them available for retrieval) using "Bulk" retrieval option (restore tier), which is a free option for "S3 Glacier Flexible Retrieval" and very cheap for "S3 Glacier Deep Archive". Thus you can copy the files with minimal restoration costs if at all.
Since you prefer to use command-line, you can use the Perl script to make the files available for retrieval with the "Bulk" retrieval option (restore tier). Otherwise, the aws s3 sync command that you gave will use the "Standard" restore tier.
As of today, in the Ohio US region, the prices for retrieval are the following.
For "S3 Glacier Instant Retrieval", at the time of writing, it costs $0.03 per GB to restore, with no other options.
For "S3 Glacier Flexible Retrieval", the "Standard" retrieval costs $0.01 per GB while "Bulk" retrieval is free.
For "S3 Glacier Deep Archive", the "Standard" retrieval costs $0.02 per GB while "Bulk" costs $0.0025 per GB.
You will also pay for retrieval requests regardless of the data size. However, for "S3 Glacier Instant Retrieval" you won't pay for retrieval requests, and for "Bulk" the retrieval request costs are minimal (S3 Glacier Deep Archive) or free (S3 Glacier Flexible Retrieval).
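To put those per-GB figures into dollars, here is a rough calculator. It hard-codes the Ohio prices quoted above, which may well have changed since, and it ignores the per-request charges. The function name and the short class/tier labels are my own.

```shell
#!/usr/bin/env bash
# Estimate the per-GB retrieval charge from the prices quoted above.
# Per-request fees are NOT included. Prices are a snapshot and may be outdated.
# Usage: retrieval_cost CLASS TIER SIZE_GB
#   CLASS: instant | flexible | deep     TIER: standard | bulk
retrieval_cost() {
  local rate
  case "$1/$2" in
    instant/*)         rate=0.03 ;;
    flexible/standard) rate=0.01 ;;
    flexible/bulk)     rate=0 ;;
    deep/standard)     rate=0.02 ;;
    deep/bulk)         rate=0.0025 ;;
    *) echo "unknown class/tier: $1/$2" >&2; return 1 ;;
  esac
  awk -v gb="$3" -v rate="$rate" 'BEGIN { printf "%.2f\n", gb * rate }'
}

# 1000 GB out of Deep Archive using the Bulk tier
retrieval_cost deep bulk 1000
```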
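BUCKET=my-bucket
DATE=$1
BPATH=/pathInBucket/FolderPartitioDate=$DATE
DAYS=5
# Request a restore for every key under the prefix, then print its status.
# Note: keys that contain whitespace are not handled by this loop.
for x in $(aws s3 ls "s3://$BUCKET$BPATH" --recursive | awk '{print $4}'); do
  echo "1:Restore $x"
  aws s3api --profile sriAthena restore-object --bucket "$BUCKET" --key "$x" \
    --restore-request "Days=$DAYS,GlacierJobParameters={Tier=Standard}"
  echo "2:Monitor $x"
  # The Restore field in the output shows whether the request is still ongoing.
  aws s3api head-object --bucket "$BUCKET" --key "$x"
done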
https://aws.amazon.com/premiumsupport/knowledge-center/restore-s3-object-glacier-storage-class/

Do original files stay frozen in Amazon Glacier when I restore them to S3?

I'm using s3cmd and I want to restore some project files from Glacier, and I'm trying to understand what happens to the files. As far as I can see, I get a temporary copy of the object restored to S3. If I change or delete the restored object, will the Glacier object also be changed or deleted? And after X days, will the objects simply be Glacier objects again?
And a second question: if I restore files for 90 days but no longer need them after 30 days, can I remove them from S3 before the 90 days are over and still keep the ones in Glacier? Could I extend the 90-day period?
According to the official documentation:
You use S3’s new RESTORE operation to access an object archived in Glacier. As part of the request, you need to specify a retention period in days. Restoring an object will generally take 3 to 5 hours. Your restored object will remain in both Glacier and S3’s Reduced Redundancy Storage (RRS) for the duration of the retention period. At the end of the retention period the object’s data will be removed from S3; the object will remain in Glacier.
So your original object will remain in Glacier even if you change or delete the retrieved S3 copy, and then the S3 copy will be deleted after the retention period you specify.

How to move object from Amazon S3 to Glacier with Vault Locked enabled?

I'm looking for a solution for moving Amazon S3 objects to Glacier with Vault Lock enabled (like described here https://aws.amazon.com/blogs/aws/glacier-vault-lock/). I'd like to use Amazon built in tools for that (lifecycle management or some other) if possible.
I cannot find any instructions or options to do that. S3 seems to only allow moving objects to the Glacier storage class. But that does not provide data integrity nor defends against data loss.
I know I could do it with a program. It would download S3 object and move it to Glacier through their respective REST API's. This approach seems too complicated for this simple task.
Picture it this way:
Glacier is a service of AWS.
S3 is a service of AWS.
But S3 is also a customer of the Glacier service.
When you migrate an object in S3 to the Glacier storage class, S3 stores the object in Glacier... using an AWS account that is owned by S3.
Those objects in S3 that use the GLACIER storage class aren't in "your" Glacier vaults, they're in vaults owned by S3.
This is consistent with the externally-observable evidence:
You can't see these S3 objects in vaults from the Glacier console.
You don't have to give S3 any IAM permissions to access Glacier (by contrast, you do have to give S3 permission to publish event notifications to SQS, SNS, or Lambda)
Glacier doesn't bill you for Glacier storage class objects -- S3 does.
In that light, what you are trying to accomplish is completely different. You want to store some archives in your Glacier vault, with your policy, and that content currently just "happens to be" stored in S3 at the moment.
Downloading from S3 and then uploading to Glacier is the solution.
You wrote: "But that does not provide data integrity nor defends against data loss."
The integrity of the payload can be assured when uploading to Glacier because the tree hash algorithm effectively prevents corrupt uploads.
When downloading from S3, unless the object is stored with SSE-C, the ETag is the MD5 hash of the stored object if a single-part upload was used, or the hex-encoded MD5 hash of the concatenated binary-encoded MD5 hashes of the parts, followed by - and the number of parts. Ideally, when uploading to S3, you'd store a better hash (e.g. sha256) in the object metadata, e.g. x-amz-meta-content-sha256.
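The multipart ETag rule is easier to follow in code. Below is a sketch that recomputes it locally so it can be compared with what S3 reports. It assumes bash, GNU md5sum, and that you know the part size the uploader used (every part the same size except the last); none of that comes from the answer above, and `s3_multipart_etag` is just an illustrative name.

```shell
#!/usr/bin/env bash
# Recompute a multipart-style ETag for a local file.
# Usage: s3_multipart_etag FILE PART_SIZE_BYTES
s3_multipart_etag() {
  local file=$1 part_size=$2
  local size parts i hex=""
  size=$(wc -c < "$file")
  parts=$(( (size + part_size - 1) / part_size ))
  # Hex MD5 of each part, concatenated in order.
  for (( i = 0; i < parts; i++ )); do
    hex+=$(dd if="$file" bs="$part_size" skip="$i" count=1 2>/dev/null \
             | md5sum | cut -d' ' -f1)
  done
  # Turn the hex digests back into raw bytes, hash them again,
  # then append "-<number of parts>".
  local final
  final=$(printf "$(printf '%s' "$hex" | sed 's/../\\x&/g')" | md5sum | cut -d' ' -f1)
  echo "$final-$parts"
}

# Example (part size must match the one used for the upload):
# s3_multipart_etag ./backup.tar $((8 * 1024 * 1024))
```

If the value matches the ETag from head-object, the download is very likely intact; a mismatch can also just mean the part size guess was wrong.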
Defense against data loss -- yes, Glacier does offer more functionality here, but S3 is not entirely without capability: a bucket policy statement with a matching DENY will always override any conflicting ALLOW, whether that ALLOW is in the bucket policy or in any other IAM policy (e.g. role, user).
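As an illustration of the DENY approach, here is a sketch that prints a bucket policy denying object deletion to everyone. The bucket name and the Sid are placeholders. Keep in mind that, unlike a locked vault policy, anyone allowed to edit the bucket policy can remove this statement again.

```shell
#!/usr/bin/env bash
# Print a bucket policy that explicitly denies object deletion.
# Usage: deny_delete_policy BUCKET_NAME
deny_delete_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyObjectDeletion",
      "Effect": "Deny",
      "Principal": "*",
      "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
      "Resource": "arn:aws:s3:::$1/*"
    }
  ]
}
EOF
}

# Applying it needs credentials, so it is left commented out:
# aws s3api put-bucket-policy --bucket my-bucket \
#   --policy "$(deny_delete_policy my-bucket)"
```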