How can you set a limit on S3 bucket size in AEM 6.2 - amazon-s3

Is there a configuration to set a limit on S3 bucket size in AEM 6.2? I am aware of the S3 cache size that can be configured in the S3 data store configuration file.
My issue is that the S3 bucket can grow very quickly; although there is no limit on its size, there is a constraint on budget. For example, if my bucket size is 250GB and it more or less stays the same after every compaction, I don't ever want it to cross 1TB.
I am aware that S3 itself can limit this, but I want to do it via AEM so that operations don't fail and the data store is never corrupted.
Any hints?

There is no configuration available that will limit the size of an Amazon S3 bucket.
You can, however, obtain Amazon S3 metrics in Amazon CloudWatch. You could create an alarm that sends a notification when the amount of data stored in a bucket exceeds a certain threshold.
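As a minimal sketch, such an alarm could be created with boto3; the bucket name, SNS topic ARN, and the 1TB threshold below are placeholders for illustration:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when the daily BucketSizeBytes metric exceeds roughly 1 TB.
# Bucket name, SNS topic ARN, and threshold are placeholders.
cloudwatch.put_metric_alarm(
    AlarmName="s3-datastore-size-over-1tb",
    Namespace="AWS/S3",
    MetricName="BucketSizeBytes",
    Dimensions=[
        {"Name": "BucketName", "Value": "my-aem-datastore-bucket"},
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    Statistic="Average",
    Period=86400,  # S3 storage metrics are published once per day
    EvaluationPeriods=1,
    Threshold=1_000_000_000_000,  # ~1 TB in bytes
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:datastore-alerts"],
)
```

Note that this only notifies you; it does not stop AEM from writing, so you would still act on the alert manually or with your own automation.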

Related

AWS S3 folder-wise metrics

We are using Grafana's CloudWatch data source for AWS metrics. We would like to differentiate folders in an S3 bucket by their sizes and show them as graphs. We know that CloudWatch gives bucket-level metrics, not object-level ones. Is there any possible solution for monitoring the size of the folders in the bucket?
Any suggestion is appreciated.
Thanks in advance.
Amazon CloudWatch provides daily storage metrics for Amazon S3 buckets but, as you mention, these metrics are for the whole bucket, rather than folder-level.
Amazon S3 Inventory can provide a daily CSV file listing all objects. You could load this information into a database or use Amazon Athena to query the contents.
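For illustration only, assuming the inventory report has already been registered as an Athena table (the table, database, and output-location names below are made up), per-folder totals could be computed like this:

```python
import boto3

athena = boto3.client("athena")

# Sum object sizes per top-level "folder" from an S3 Inventory table.
# Table, database, and output location names are placeholders.
query = """
SELECT regexp_extract(key, '^[^/]+') AS folder,
       sum(size) / 1024.0 / 1024.0   AS size_mb
FROM my_bucket_inventory
GROUP BY 1
ORDER BY size_mb DESC
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "s3_inventory_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
```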
If you require storage metrics at a higher resolution than daily, then you would need to track this information yourself (a rough sketch follows the list below). This could be done with:
An Amazon S3 Event that triggers an AWS Lambda function whenever an object is created or deleted
An AWS Lambda function that receives this information and updates a database
Your application could then retrieve the storage metrics from the database
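A minimal sketch of such a Lambda handler, assuming a hypothetical DynamoDB table named folder-sizes with a partition key called prefix:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("folder-sizes")  # hypothetical table, partition key "prefix"

def handler(event, context):
    """Maintain a running byte count per top-level folder from S3 object events."""
    for record in event["Records"]:
        key = record["s3"]["object"]["key"]
        prefix = key.split("/", 1)[0]
        # Delete notifications carry no object size, so a real implementation
        # would need to look the size up (or store per-object sizes) to decrement.
        size = record["s3"]["object"].get("size", 0)
        delta = size if record["eventName"].startswith("ObjectCreated") else -size

        table.update_item(
            Key={"prefix": prefix},
            UpdateExpression="ADD total_bytes :d",
            ExpressionAttributeValues={":d": delta},
        )
```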
Thanks for the reply, John.
However, I found a solution using an s3_exporter. It gives metrics for the sizes of the folders and sub-folders inside the S3 bucket.

How to have EMRFS consistent view on S3 buckets with retention policy?

I am using an AWS EMR compute cluster (version 5.27.0), which uses S3 for data persistence.
This cluster both reads and writes to S3.
S3 is eventually consistent, so newly written data cannot always be listed immediately. Because of this, I use EMRFS with DynamoDB to store newly written paths for immediate listing.
The problem now is that I have to set a retention policy on S3, under which data more than a month old gets deleted from S3. However, the corresponding entries do not get deleted from the EMRFS DynamoDB table, leading to consistency issues.
My question is: when I set the retention policy in S3, how can I ensure that the same paths get deleted from the DynamoDB table?
One naive solution I have come up with is to define a Lambda that fires periodically and manually sets a TTL of, say, 1 day on the DynamoDB records. Is there a better approach than this?
You can configure DynamoDB with the same expiration policy as your S3 objects have, using DynamoDB Time to Live (TTL):
https://aws.amazon.com/blogs/aws/new-manage-dynamodb-items-using-time-to-live-ttl/
In this way you ensure that DynamoDB and S3 hold the same set of objects.
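A minimal sketch of enabling TTL, assuming the default EMRFS metadata table name and an expires_at attribute (both are assumptions; DynamoDB only deletes items whose TTL attribute holds a past Unix epoch timestamp, so something such as the periodic Lambda you describe still has to stamp that attribute on each record to match the one-month S3 retention):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Enable TTL on the EMRFS metadata table. "EmrFSMetadata" and "expires_at"
# are placeholders; items are removed once the epoch timestamp stored in the
# TTL attribute has passed.
dynamodb.update_time_to_live(
    TableName="EmrFSMetadata",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)
```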

Get S3 bucket free space in a Ceph cluster from the Amazon S3 API

Is there any way to get the available free space in a Ceph cluster from the Amazon S3 API?
I need to implement automatic deletion of outdated objects from a bucket when the Ceph cluster has no space to store new objects. I know there are ways to calculate the used space in a bucket, but that is the logical data size, and I can't compare it to the raw size of the cluster disks.
From the S3 perspective you can't check free space.
Ceph has implemented an expiration mechanism that will delete outdated objects from your object storage.
Check the doc: http://docs.ceph.com/docs/master/radosgw/s3/#features-support
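As a hedged sketch of setting such an expiration rule through the S3 API (the endpoint, credentials, bucket name, and 30-day window are placeholders, and lifecycle support depends on your Ceph release, per the link above):

```python
import boto3

# Point the S3 client at the Ceph RGW endpoint (placeholder URL and credentials).
s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:7480",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Expire objects older than 30 days so space is reclaimed automatically.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-objects",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "Expiration": {"Days": 30},
            }
        ]
    },
)
```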

Does Amazon S3 permit you to set a whole bucket to RRS?

I'm using S3, and I realize that it provides Reduced Redundancy Storage (RRS), which is substantially cheaper. Is there a way to set a whole bucket to RRS?
Setting a whole bucket to RRS is not possible at present in Amazon S3; RRS is set at the object level.
I am one of the developers of Bucket Explorer. You can set a bucket default so that every new upload to that S3 bucket has RRS enabled on it, and for all existing S3 files you can set RRS using a batch operation.
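For illustration, the S3 API itself sets the storage class per object, either at upload time or by copying an existing object onto itself (bucket and key names below are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# New uploads: request Reduced Redundancy Storage explicitly.
s3.put_object(
    Bucket="my-bucket",
    Key="new-file.txt",
    Body=b"hello",
    StorageClass="REDUCED_REDUNDANCY",
)

# Existing objects: copy the object onto itself to change its storage class.
s3.copy_object(
    Bucket="my-bucket",
    Key="existing-file.txt",
    CopySource={"Bucket": "my-bucket", "Key": "existing-file.txt"},
    StorageClass="REDUCED_REDUNDANCY",
    MetadataDirective="COPY",
)
```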

What are the data size limitations when using the GET and PUT methods to retrieve and store objects in Amazon S3?

What is the maximum size of data that can be sent using the GET and PUT methods to store and retrieve data from Amazon S3? I would also like to know where I can learn more about the APIs available for storage in Amazon S3, other than the documentation that is already provided.
The PUT method is addressed in the respective Amazon S3 FAQ How much data can I store?:
The total volume of data and number of objects you can store are unlimited. Individual Amazon S3 objects can range in size from 1 byte to 5 terabytes. The largest object that can be uploaded in a single PUT is 5 gigabytes. For objects larger than 100 megabytes, customers should consider using the Multipart Upload capability. [emphasis mine]
As mentioned, Uploading Objects Using Multipart Upload API is recommended for objects larger than 100MB and required for objects larger than 5GB.
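As an illustrative sketch, the boto3 transfer helpers switch to multipart upload automatically once a file crosses a configurable threshold (file, bucket, and key names are placeholders):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Use multipart upload for anything above ~100 MB, in 100 MB parts.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=100 * 1024 * 1024,
)

s3.upload_file(
    "big-archive.tar.gz",
    "my-bucket",
    "backups/big-archive.tar.gz",
    Config=config,
)
```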
The GET method is essentially unlimited. Please note that S3 supports the BitTorrent protocol out of the box, which (depending on your use case) might ease working with large files considerably, see Using BitTorrent with Amazon S3:
Amazon S3 supports the BitTorrent protocol so that developers can save costs when distributing content at high scale. [...]