When to create an S3 Bucket - amazon-s3

I'm setting up my client with a system that allows users to upload a video or two. These videos will be stored on Amazon S3, which I've not used before. I'm unsure about buckets, and what they represent. Do you think I would have a single bucket for my application, a bucket per user or a bucket per file?
If I were to just have the one bucket, presumably I'd have to have really long, illogical file names to prevent a file name clash.

There is no limit to the number of objects you can store in a bucket, so generally you would have a single bucket per application, or even share one across multiple applications. Bucket names have to be globally unique across S3, so it would certainly be impossible to manage a bucket per object. A bucket per user would also become difficult to manage if you had more than a handful of users.
For more background on buckets, try reading Working with Amazon S3 Buckets.
Your application should generate unique keys for the objects you add to the bucket. Try to avoid numeric ascending IDs, as these are considered inefficient. Simply reversing a numeric ID can usually make an effective object key. See Amazon S3 Performance Tips & Tricks for a more detailed explanation.
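To make the key-generation advice concrete, here is a minimal sketch of both approaches: reversing a numeric ascending ID (as the linked article suggests) and prefixing the original filename with a UUID to avoid name clashes. The function names are my own, purely illustrative:

```python
import uuid

def object_key_from_id(numeric_id: int) -> str:
    """Reverse a numeric ascending id so keys spread across the keyspace."""
    return str(numeric_id)[::-1]

def random_object_key(filename: str) -> str:
    """Prefix the original filename with a UUID so names can never clash."""
    return f"{uuid.uuid4()}/{filename}"

print(object_key_from_id(1000234))        # -> 4320001
print(random_object_key("holiday.mp4"))   # e.g. 3f1c.../holiday.mp4
```

Either way, the user-facing filename can be kept as object metadata or in your database, so the "illogical" key never has to be shown to anyone.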

Related

Unique resource identifiers for S3 blobs

Is there any "standard" way to uniquely identify an S3 blob using a single string?
There are multiple services that support the S3 protocol: AWS, Minio, GCS.
Usually, to access an S3 blob, you must provide an endpoint (plus region), bucket, and key. The 's3://' URIs seem to contain only bucket and key. Is there any standard that also includes the endpoint?
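As the question notes, an `s3://` URI carries only the bucket and key, so the endpoint has to travel separately; there is no standard that bundles all three. A minimal sketch of what an `s3://` URI actually contains, using only the standard library:

```python
from urllib.parse import urlparse

def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key).

    Note what is *absent*: no endpoint, no region -- those must be
    supplied out-of-band (client config, environment, etc.).
    """
    parts = urlparse(uri)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3 URI: {uri}")
    return parts.netloc, parts.path.lstrip("/")

print(parse_s3_uri("s3://my-bucket/videos/clip.mp4"))
# -> ('my-bucket', 'videos/clip.mp4')
```

This is why tools that target non-AWS S3-compatible services (MinIO, GCS in interoperability mode) take the endpoint as a separate client parameter rather than encoding it in the URI.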
At least for GCS there is no standard for how you should name the objects uploaded to a bucket; they just need to follow the format rules mentioned here.
Nevertheless, as mentioned in this video, there are some tricks you can follow to get better performance on read/write operations in a bucket through object naming.
If you have a bit more time, you can also check this other video, which goes more in depth on how GCS works, as well as performance and reliability tips.

How can I search the changes made on a `s3` bucket between two timestamps?

I am using an S3 bucket to store my data, and I push data to this bucket every single day. I wonder whether there is a feature that lets me compare the files in my bucket between two dates. If not, is there a way for me to build one via the AWS CLI or an SDK?
The reason I want to check this is that I have an S3 bucket and my clients keep pushing data to it. I want to see how much data they have pushed since the last time I loaded it. Is there a pattern in AWS that supports this query, or do I have to create rules on the S3 bucket to analyse it?
Listing from Amazon S3
You can activate Amazon S3 Inventory, which can provide a daily file listing the contents of an Amazon S3 bucket. You could then compare differences between two inventory files.
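Once you have two inventory listings, the comparison itself is just a set difference over the object keys. A minimal sketch (the inventory file is a CSV with more columns, but the key column is enough for a presence diff; the function name is illustrative):

```python
def diff_listings(old_keys, new_keys):
    """Compare two bucket listings; return (added, removed) keys."""
    old, new = set(old_keys), set(new_keys)
    return sorted(new - old), sorted(old - new)

added, removed = diff_listings(
    ["data/2024-01-01.csv", "data/2024-01-02.csv"],
    ["data/2024-01-02.csv", "data/2024-01-03.csv"],
)
print(added)    # -> ['data/2024-01-03.csv']
print(removed)  # -> ['data/2024-01-01.csv']
```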
List it yourself and store it
Alternatively, you could list the contents of a bucket and look for objects dated since the last listing. However, if objects are deleted, you will only know this if you keep a list of objects that were previously in the bucket. It's probably easier to use S3 inventory.
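If you list it yourself, the "objects dated since the last listing" step is a filter on each object's `LastModified` timestamp. A sketch of that filter, assuming records shaped like the `Contents` entries the S3 ListObjectsV2 API returns (with boto3 you would feed it the pages from `s3.get_paginator("list_objects_v2")`):

```python
from datetime import datetime, timezone

def modified_between(objects, start, end):
    """Keep keys of objects whose LastModified falls in [start, end)."""
    return [o["Key"] for o in objects if start <= o["LastModified"] < end]

listing = [
    {"Key": "a.csv", "LastModified": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"Key": "b.csv", "LastModified": datetime(2024, 1, 5, tzinfo=timezone.utc)},
]
print(modified_between(
    listing,
    datetime(2024, 1, 2, tzinfo=timezone.utc),
    datetime(2024, 1, 6, tzinfo=timezone.utc),
))  # -> ['b.csv']
```

Note this only sees objects that still exist; as mentioned above, deletions are invisible unless you keep the previous listing to diff against.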
Process it in real-time
Instead of thinking about files in batches, you could configure Amazon S3 Events to trigger something whenever a new file is uploaded to the Amazon S3 bucket. The event can:
Trigger a notification via Amazon Simple Notification Service (SNS), such as an email
Invoke an AWS Lambda function to run some code you provide. For example, the code could process the file and send it somewhere.
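For the Lambda route, the handler receives the upload details in the S3 event notification payload. A minimal sketch of extracting the bucket and key from each record (the event structure follows the S3 notification format; note keys arrive URL-encoded):

```python
import urllib.parse

def handler(event, context=None):
    """Minimal Lambda handler: collect (bucket, key) for each uploaded object."""
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        results.append((bucket, key))
    return results

# Simulated event, as S3 would deliver it for an upload:
event = {"Records": [{"s3": {"bucket": {"name": "media"},
                             "object": {"key": "videos/clip+1.mp4"}}}]}
print(handler(event))  # -> [('media', 'videos/clip 1.mp4')]
```

From there the handler can do whatever the batch comparison was trying to achieve: log the key, append it to a running manifest, or forward it elsewhere.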

AWS bucket that is duplicate of another bucket in S3

I have the following requirement; however, I'm unsure of how to go about it:
Bucket 1 contains data.
Bucket 2 should hold a duplicate of the data in Bucket 1. Whenever any file is changed in Bucket 1, it should also be changed in Bucket 2.
Data in bucket 2 can be independently changed. However, this data change should not be reflected in bucket 1.
This entire process must be automated and run in real time.
Depending on your needs, you might find Cross Region Replication works for you. This would require the buckets to be in separate regions. It also wouldn't copy items that were replicated from another bucket.
Essentially you just create two buckets in separate regions, create an IAM role allowing the replication, then create a Replication Configuration.
If you already have data in the source bucket that you want to appear in the target bucket, then you will also need to run a sync (you can do this as a one-off via the CLI).
Another option is using AWS Lambda, which allows the buckets to be in the same region, and gives you more control should you need it. You can also replicate to multiple buckets if you want to.
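For the Lambda option, the core of the function is a `copy_object` call from the event's source bucket into the duplicate bucket; because the copy is one-way, edits in bucket 2 never flow back. A sketch, with the target bucket name as an assumption and the S3 client passed in so the logic can be exercised without AWS:

```python
import urllib.parse

TARGET_BUCKET = "bucket-2"  # hypothetical name of the duplicate bucket

def copy_new_objects(event, s3_client):
    """Copy each object in an S3 event into TARGET_BUCKET (one-way)."""
    copied = []
    for record in event["Records"]:
        src_bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        s3_client.copy_object(
            Bucket=TARGET_BUCKET,
            Key=key,
            CopySource={"Bucket": src_bucket, "Key": key},
        )
        copied.append(key)
    return copied
```

In a real deployment `s3_client` would be `boto3.client("s3")` and this function would be wrapped in the Lambda handler, triggered by bucket 1's object-created (and, if needed, object-removed) events.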

Is there a benefit to using multiple buckets on Amazon S3 versus consolidating into a single bucket?

We currently serve up downloadable content (mp3, pdf, mp4, zip files, etc) in a single S3 bucket called media.domainname.com.
We have a separate bucket that stores all the various video encodings for our iOS app: app.domainname.com.
We're investigating moving all of our images to S3 as well in order to ease the server load and prep us for moving to a load balanced server setup.
That said, is it better/more efficient to move our images to a separate bucket i.e., images.domainname.com? Or is it a better practice to create an images subfolder in the media bucket, like media.domainname.com/images?
What are the pros/cons of either method?
The primary benefits of using separate buckets are that you can assign separate policies to each:
Reduced redundancy to save on costs
Versioning of changed contents
Automatic archival to Glacier
Separate permissions
The only downside that I can think of is that it means you'd have to manage all these things separately across multiple buckets.

Amazon storage unlimited bucket

Hi, is there any version of Amazon Web Services that provides unlimited buckets?
No. From the S3 documentation:
Each AWS account can own up to 100 buckets at a time.
S3 buckets are expensive (in terms of resources) to create and destroy:
The high availability engineering of Amazon S3 is focused on get, put, list, and delete operations. Because bucket operations work against a centralized, global resource space, it is not appropriate to make bucket create or delete calls on the high availability code path of your application. It is better to create or delete buckets in a separate initialization or setup routine that you run less often.
There's also no good reason to use lots of buckets:
There is no limit to the number of objects that can be stored in a bucket and no variation in performance whether you use many buckets or just a few. You can store all of your objects in a single bucket, or you can organize them across several buckets.
You want a separate space for each of your users to put things. Fine: create a single bucket and give your user-specific information a <user_id>/ prefix. Better yet, put it in users/<user_id>/, so you can use the same bucket for other non-user-specific things later, or change naming schemes, or anything else you might want.
ListObjects accepts a prefix parameter (users/<user_id>/), and has special provisions for hierarchical keys that might be relevant.
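With boto3 this would be a `list_objects_v2` call with `Prefix="users/"` and `Delimiter="/"`, where each per-user "folder" comes back under `CommonPrefixes`. A sketch of the two helpers you'd want around that call (both names are illustrative; the response shape follows the ListObjectsV2 API):

```python
def user_prefix(user_id: str) -> str:
    """Build the users/<user_id>/ key prefix for one user's objects."""
    return f"users/{user_id}/"

def user_ids(list_response: dict) -> list[str]:
    """Extract user ids from a ListObjectsV2 response made with
    Prefix='users/' and Delimiter='/' (CommonPrefixes holds the 'folders')."""
    return [
        p["Prefix"].removeprefix("users/").rstrip("/")
        for p in list_response.get("CommonPrefixes", [])
    ]

resp = {"CommonPrefixes": [{"Prefix": "users/alice/"}, {"Prefix": "users/bob/"}]}
print(user_ids(resp))  # -> ['alice', 'bob']
```

This gives you per-user isolation (including per-prefix IAM policies, if needed) without spending any of the bucket quota.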
Will is correct. For cases where you can really prove a decent use-case, I would imagine that AWS would consider bumping your quota (as they will do for most any service). I wouldn't be surprised if particular users have 200-300 buckets or more, but not without justifying it to AWS on good grounds.
With that said, I cannot find any S3 Quota Increase form alongside the other quota increase forms.