Get S3 bucket free space in Ceph cluster from Amazon S3 API

Is there any way to get the available free space in a Ceph cluster through the Amazon S3 API?
I need to implement automatic deletion of outdated objects from a bucket when the Ceph cluster has no space left to store new objects. I know there are ways to calculate the space used in a bucket, but that is logical data size, which I can't compare to the raw size of the cluster's disks.

From the S3 perspective you can't check free space.
Ceph has, however, implemented an expiration mechanism, which will delete outdated objects from your object storage.
Check the doc: http://docs.ceph.com/docs/master/radosgw/s3/#features-support
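If your Ceph release supports S3 lifecycle rules (the feature-support matrix linked above is the authority for your version), a minimal sketch with boto3 looks like this; the endpoint URL, credentials, bucket name, and prefix are all placeholders:

    import boto3

    # Point a standard S3 client at the Ceph RadosGW endpoint
    # (endpoint, credentials, and bucket name are hypothetical).
    s3 = boto3.client(
        "s3",
        endpoint_url="http://rgw.example.com",
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    # Expire (delete) objects under the "logs/" prefix 30 days after creation.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-bucket",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "expire-old-objects",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},
            }]
        },
    )

Note that lifecycle rules expire objects by age, not by remaining cluster capacity, so this only approximates the space-based deletion you describe.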

Related

What are the pros and cons of using AWS S3 vs Cassandra as an image store?

Which DB is better for storing images in a photo-sharing application?
We don't recommend storing images directly in Cassandra. Most companies (household names you'd know very well and whose services you likely use) store images/videos/media on an object store like AWS S3 or Google Cloud Storage.
Only the metadata of the media are stored in Cassandra for very fast retrieval -- S3 URL/URI, user info, media info, etc.
The advantage of using Cassandra is that it can be deployed to a hybrid combination of public clouds so you're not tied to one vendor. Being able to distribute your Cassandra nodes across clouds means that you can get as close as possible to your users. Cheers!
AWS S3 is an object storage service that works very well for unstructured data. It offers virtually unlimited storage, with the size of a single object capped at 5 TB. S3 is well suited to storing large objects.
DynamoDB is a NoSQL, low-latency database which is suitable for semi-structured data. Typical DynamoDB use cases involve storing a large number of small records with millisecond latency; the DynamoDB record size limit is 400 KB.
For a photo-sharing application, you need both S3 and DynamoDB: S3 acts as the storage, while DynamoDB is your database listing all galleries, files, timestamps, captions, users, etc.
You can store photos in Amazon S3, but keep the photos' metadata in some other database.
Amazon S3 is well suited for objects of large size as well.
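A minimal sketch of that split with boto3 (the bucket name, table name, and attributes are all hypothetical): the image bytes go to S3, and only a small metadata record goes to the database:

    import uuid
    import boto3

    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table("photos")  # keyed on photo_id

    def save_photo(data: bytes, user: str, caption: str) -> str:
        photo_id = str(uuid.uuid4())
        key = f"photos/{photo_id}.jpg"

        # The large binary payload goes to the object store...
        s3.put_object(Bucket="my-photo-bucket", Key=key, Body=data,
                      ContentType="image/jpeg")

        # ...while only the small metadata record goes to the database.
        table.put_item(Item={
            "photo_id": photo_id,
            "s3_key": key,
            "user": user,
            "caption": caption,
        })
        return photo_id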

How can you set a limit on S3 bucket size in AEM 6.2?

Is there a configuration to set a limit on S3 bucket size in AEM 6.2? I am aware of the S3 cache size that can be configured using the S3 data store configuration file.
My issue is that the S3 bucket can grow exponentially; while there is no limit to its size, there is a constraint on budget. For example, my bucket size is 250 GB and stays more or less the same after every compaction, but I don't ever want it to cross 1 TB.
I am aware that S3 can limit this, but I want to do it via AEM so that operations don't fail and the data store is never corrupted.
Any hints?
There is no configuration available that will limit the size of an Amazon S3 bucket.
You can, however, obtain Amazon S3 metrics in Amazon CloudWatch. You could create an alarm on a bucket to send a notification when the amount of data stored in the bucket exceeds a certain threshold.
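As a sketch (the bucket name, SNS topic ARN, and threshold are placeholders), such an alarm on the daily BucketSizeBytes metric could be created with boto3:

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Notify when stored bytes exceed ~800 GB, leaving headroom below a
    # 1 TB budget ceiling. S3 reports BucketSizeBytes once per day.
    cloudwatch.put_metric_alarm(
        AlarmName="s3-bucket-size-warning",
        Namespace="AWS/S3",
        MetricName="BucketSizeBytes",
        Dimensions=[
            {"Name": "BucketName", "Value": "my-aem-datastore-bucket"},
            {"Name": "StorageType", "Value": "StandardStorage"},
        ],
        Statistic="Average",
        Period=86400,
        EvaluationPeriods=1,
        Threshold=800 * 1024**3,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:bucket-alerts"],
    )

The alarm only notifies; acting on it (cleanup, compaction) is still up to you.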

How to move objects from Amazon S3 to Glacier with Vault Lock enabled?

I'm looking for a solution for moving Amazon S3 objects to Glacier with Vault Lock enabled (as described here: https://aws.amazon.com/blogs/aws/glacier-vault-lock/). I'd like to use Amazon's built-in tools for that (lifecycle management or something similar) if possible.
I cannot find any instructions or options to do that. S3 seems to only allow moving objects to the Glacier storage class, but that does not provide data integrity, nor does it defend against data loss.
I know I could do it with a program that downloads each S3 object and moves it to Glacier through their respective REST APIs. That approach seems too complicated for this simple task.
Picture it this way:
Glacier is a service of AWS.
S3 is a service of AWS.
But S3 is also a customer of the Glacier service.
When you migrate an object in S3 to the Glacier storage class, S3 stores the object in Glacier... using an AWS account that is owned by S3.
Those objects in S3 that use the GLACIER storage class aren't in "your" Glacier vaults, they're in vaults owned by S3.
This is consistent with the externally-observable evidence:
You can't see these S3 objects in vaults from the Glacier console.
You don't have to give S3 any IAM permissions to access Glacier (by contrast, you do have to give S3 permission to publish event notifications to SQS, SNS, or Lambda).
Glacier doesn't bill you for Glacier storage class objects -- S3 does.
In that light, what you are trying to accomplish is completely different. You want to store some archives in your Glacier vault, with your policy, and that content just happens to be stored in S3 at the moment.
Downloading from S3 and then uploading to Glacier is the solution.
But that does not provide data integrity nor defends against data loss.
The integrity of the payload can be assured when uploading to Glacier because the tree hash algorithm effectively prevents corrupt uploads.
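To make that concrete, here is a minimal sketch of the SHA-256 tree hash as documented for the Glacier API (1 MiB leaf chunks whose digests are combined pairwise up to a single root); this is illustrative, not the SDK's implementation:

    import hashlib

    def glacier_tree_hash(data: bytes) -> str:
        """Compute the SHA-256 tree hash of a payload, per the Glacier docs."""
        chunk = 1024 * 1024
        # Hash each 1 MiB chunk; an empty payload hashes as one empty chunk.
        hashes = [hashlib.sha256(data[i:i + chunk]).digest()
                  for i in range(0, len(data), chunk)] or [hashlib.sha256(b"").digest()]
        # Combine adjacent pairs until a single root digest remains;
        # an odd leftover digest is carried up unchanged.
        while len(hashes) > 1:
            paired = [hashlib.sha256(hashes[i] + hashes[i + 1]).digest()
                      for i in range(0, len(hashes) - 1, 2)]
            if len(hashes) % 2:
                paired.append(hashes[-1])
            hashes = paired
        return hashes[0].hex()

Glacier rejects an upload whose x-amz-sha256-tree-hash header does not match what it computes server-side, which is what makes corrupt uploads detectable.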
When downloading from S3 (unless the object was stored with SSE-C), the ETag is the MD5 hash of the stored object if single-part upload was used, or the hex-encoded MD5 hash of the concatenated binary-encoded MD5 hashes of the parts, followed by - and the number of parts, if multipart upload was used. Ideally, when uploading to S3, you'd store a stronger hash (e.g. SHA-256) in the object metadata, e.g. x-amz-meta-content-sha256.
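A sketch of that convention (bucket and key are placeholders; the metadata name follows the x-amz-meta-content-sha256 suggestion above):

    import hashlib
    import boto3

    s3 = boto3.client("s3")
    payload = open("archive.bin", "rb").read()

    # Record a strong digest with the object at upload time. User metadata
    # keys are exposed under the x-amz-meta- prefix, so "content-sha256"
    # travels as x-amz-meta-content-sha256.
    digest = hashlib.sha256(payload).hexdigest()
    s3.put_object(Bucket="my-bucket", Key="archive.bin", Body=payload,
                  Metadata={"content-sha256": digest})

    # On download, verify integrity independently of the ETag.
    obj = s3.get_object(Bucket="my-bucket", Key="archive.bin")
    assert hashlib.sha256(obj["Body"].read()).hexdigest() == \
        obj["Metadata"]["content-sha256"]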
Defense against data loss: yes, Glacier does offer more functionality here, but S3 is not entirely without capability. A bucket policy with a matching Deny action will always override any conflicting Allow action, whether that Allow is in the bucket policy or in any other IAM policy (e.g. a role or user policy).
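To illustrate that last point, a hedged sketch (bucket name hypothetical) of a bucket policy whose explicit Deny on deletion overrides any Allow granted elsewhere:

    import json
    import boto3

    s3 = boto3.client("s3")

    # An explicit Deny on DeleteObject beats any conflicting Allow,
    # approximating a write-once posture for existing objects.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyObjectDeletion",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:DeleteObject",
            "Resource": "arn:aws:s3:::my-archive-bucket/*",
        }],
    }
    s3.put_bucket_policy(Bucket="my-archive-bucket", Policy=json.dumps(policy))

Unlike a locked vault policy, the bucket policy itself can still be edited by the account owner, which is why Glacier offers more functionality here.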

What are the data size limitations when using the GET and PUT methods to get and store objects in the Amazon S3 cloud?

What is the maximum size of data that can be sent using the GET and PUT methods to store and retrieve data from the Amazon S3 cloud? I would also like to know where I can learn more about the APIs available for storage in Amazon S3, beyond the documentation that is already provided.
The PUT method is addressed in the respective Amazon S3 FAQ, How much data can I store?:
The total volume of data and number of objects you can store are unlimited. Individual Amazon S3 objects can range in size from 1 byte to 5 terabytes. The largest object that can be uploaded in a single PUT is 5 gigabytes. For objects larger than 100 megabytes, customers should consider using the Multipart Upload capability.
As mentioned there, Uploading Objects Using Multipart Upload API is recommended for objects larger than 100 MB already, and it is required for objects larger than 5 GB.
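As a sketch, boto3's transfer manager switches to multipart automatically above a configurable threshold, so no hand-rolled part bookkeeping is needed (file and bucket names are placeholders):

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")

    # Per the FAQ's recommendation: multipart for anything over 100 MB,
    # uploaded in 100 MB parts.
    config = TransferConfig(
        multipart_threshold=100 * 1024 * 1024,
        multipart_chunksize=100 * 1024 * 1024,
    )
    s3.upload_file("big-backup.tar", "my-bucket",
                   "backups/big-backup.tar", Config=config)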
The GET method is essentially unlimited. Please note that S3 supports the BitTorrent protocol out of the box, which (depending on your use case) might ease working with large files considerably; see Using BitTorrent with Amazon S3:
Amazon S3 supports the BitTorrent protocol so that developers can save costs when distributing content at high scale. [...]
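For completeness, a sketch of fetching an object's .torrent descriptor with boto3 (bucket and key are placeholders; this only works for publicly readable objects, and AWS has since deprecated the feature, so treat it as historical):

    import boto3

    s3 = boto3.client("s3")

    # Retrieve the .torrent file describing the object; peers then fetch
    # the payload over BitTorrent instead of plain HTTP GETs.
    resp = s3.get_object_torrent(Bucket="my-public-bucket", Key="big-file.iso")
    with open("big-file.iso.torrent", "wb") as f:
        f.write(resp["Body"].read())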

Should I persist images on EBS or S3?

I am migrating my Java, Tomcat, MySQL server to AWS EC2.
I have already attached an EBS volume for storing the MySQL data. In my web application people may upload images, so I need to persist them. There are two alternatives in my mind:
Save uploaded images to EBS volume.
Use the S3 service.
The following are my notes; please be skeptical about them, as my expertise is not in servers but in software development.
EBS plus: S3 storage is more expensive. ($0.15/GB > $0.10/GB)
S3 plus: Serving static files from EBS may hurt my web server's performance. Is this true? Does serving images affect server performance notably? With S3, my server is not responsible for serving static files.
S3 plus: Serving static files from EBS incurs I/O cost, though it will probably be minor.
EBS plus: People say EBS is faster.
S3 plus: People say S3 is safer for persistence.
EBS plus: No need to learn an API; it is straightforward to save the images to the EBS volume.
In short, I cannot decide, and I would be happy if you could guide me.
Thanks
The price comparison is not quite right:
S3 charges $0.14 per GB used, whereas EBS charges $0.10 per GB provisioned (the size of your EBS volume), whether you use it or not. For example, 100 GB of images on a 500 GB EBS volume would cost $50/month at these rates, versus $14/month on S3. As a result, S3 may or may not be cheaper than EBS.
I'm currently using S3 for a project and it's working extremely well.
EBS means you need to manage a volume + machines to attach it to. You need to add space as it's filling up and perform backups (not saying you shouldn't back up your S3 data, just that it's not as critical).
It also makes it harder to scale: when you want to add additional machines, you either need to pull off the images to a separate machine or clone the images across all. This also means you're adding a bottleneck: you'll have to manage your own upload process that will either upload to all machines or have a single machine managing it.
I recommend S3: it's set and forget. Any number of machines can be performing uploads in parallel and you don't really need to notify other machines about the upload.
In addition, you can use Amazon CloudFront as a cheap CDN in front of the images instead of having clients download directly from S3.
I have architected solutions on AWS for stock photography sites which store millions of images spanning terabytes of data. I would like to share some best practices in AWS for your requirement:
P1) Store the original image files using the S3 Standard option.
P2) Store reproducible images, like thumbnails, using the S3 Reduced Redundancy option (RRS) to save costs (see the sketch after this list).
P3) Metadata about the images, including the S3 URL, can be stored in Amazon RDS or Amazon DynamoDB, depending upon the query complexity, and queried from there. If your queries are complex, it is also common practice to store the metadata in Amazon CloudSearch or Apache Solr.
P4) Deliver your thumbnails to users with low latency using Amazon CloudFront.
P5) Queue your image conversions through either SQS or RabbitMQ on Amazon EC2.
P6) If you are planning to use EBS, note that a volume does not scale across EC2 instances. Ideally you can use GlusterFS as a common storage pool for all your images; multiple Amazon EC2 instances in Auto Scaling mode can still connect to it and read/write images.
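As a sketch of P2 (bucket and key names are hypothetical), Reduced Redundancy is just a storage-class flag at upload time:

    import boto3

    s3 = boto3.client("s3")

    # Originals keep the default STANDARD storage class...
    s3.upload_file("photo.jpg", "my-photo-bucket", "originals/photo.jpg")

    # ...while regenerable derivatives such as thumbnails can use
    # REDUCED_REDUNDANCY to trade some durability for lower cost.
    s3.upload_file(
        "thumb.jpg", "my-photo-bucket", "thumbs/photo.jpg",
        ExtraArgs={"StorageClass": "REDUCED_REDUNDANCY"},
    )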
You already outlined the advantages and disadvantages of both.
If you are planning to store terabytes of images, with storage requirements increasing day after day, S3 will probably be your best bet, as it is built especially for these kinds of situations. You get unlimited storage space without having to worry about sharding your data over many EBS volumes.
The recurring cost of S3 is that it comes out 50% more expensive than EBS (at the per-GB prices quoted in the question). You will also have to learn the API and implement it in your application, but that is a one-off expense which I think you should be able to absorb very quickly.
Do you expect the images to last indefinitely?
The Amazon EBS FAQ is pretty clear: the annual failure rate is not "essentially zero"; they quote 0.1% to 0.5%. That's better than the disk under your desk, but it would need some kind of backup.