Add a random prefix to the key names to improve S3 performance? - amazon-s3

You expect this bucket to immediately receive over 150 PUT requests per second. What should the company do to ensure optimal performance?
A) Amazon S3 will automatically manage performance at this scale.
B) Add a random prefix to the key names.
The correct answer was B and I'm trying to figure out why that is. Can someone please explain the significance of B and if it's still true?

As of a 7/17/2018 AWS announcement, hashing and randomly prefixing the S3 key are no longer required to see improved performance:
https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3-announces-increased-request-rate-performance/

S3 prefixes used to be determined by the first 6-8 characters;
This changed in mid-2018 - see the announcement:
https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3-announces-increased-request-rate-performance/
But that is a half-truth. In practice, prefixes (in the old sense) still matter.
S3 is not traditional “storage” - each directory/filename is a separate object in a key/value object store, and the data has to be partitioned/sharded to scale to a huge number of objects. So yes, the new sharding is kind of “automatic”, but not really if you create a new process that writes to the bucket with heavy parallelism across different subdirectories. Before S3 learns the new access pattern, you may run into S3 throttling until it reshards/repartitions the data accordingly.
Learning new access patterns takes time. Repartitioning the data takes time.
Things did improve in mid-2018 (~10x throughput for a new bucket with no statistics), but it's still not what it could be if the data were partitioned properly. To be fair, this may not apply to you if you don't have a ton of data, or if your access pattern is not hugely parallel (e.g. running a Hadoop/Spark cluster on many TBs of data in S3 with hundreds of tasks accessing the same bucket in parallel).
TLDR:
"Old prefixes" still do matter.
Write data to the root of your bucket; the first-level directory there determines the "prefix" (make it random, for example - see the sketch below).
"New prefixes" do work, but not initially. It takes time for them to adapt to the load.
PS. Another approach - you can reach out to your AWS TAM (if you have one) and ask them to pre-partition a new S3 bucket if you expect a ton of data to be flooding it soon.
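A minimal sketch of that random first-level prefix idea, assuming boto3; the bucket name, key layout, and 4-hex-character segment are placeholders, not something prescribed by S3:

```python
import secrets
import boto3

s3 = boto3.client("s3")

def put_with_random_prefix(bucket, name, body):
    # The first path segment is what (old-style) partitioning keys off,
    # so make it random, e.g. "9f3a/events/file-0001.json".
    key = f"{secrets.token_hex(2)}/{name}"
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    return key  # remember the key - a random prefix cannot be recomputed later

key = put_with_random_prefix("my-bucket", "events/file-0001.json", b"{}")
```

The trade-off is that you need to record the generated key somewhere (a database, a manifest), because the random segment cannot be derived from the file name alone.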

@tagar That's true, especially if you are not in a read-intensive scenario!
You have to read the fine print of the docs to reverse-engineer how it works internally and how you are limited by the system. There is no magic!
503 Slow Down errors are typically emitted when a single shard of S3 is in a hot-spot scenario: too many requests to a single shard. What is difficult to understand is how the sharding is done internally, and that the advertised request limit is not guaranteed.
The pre-2018 behavior gives the details: it was advised to make the first 6-8 characters of the prefix random to avoid hot spots.
One can then assume that the initial sharding of an S3 bucket is done based on the first 8 characters of the prefix.
https://aws.amazon.com/blogs/aws/amazon-s3-performance-tips-tricks-seattle-hiring-event/
Post-2018: automatic sharding was put in place, and AWS no longer advises worrying about the first characters of the prefix... However, from these docs:
http-5xx-errors-s3
amazon-s3-performance-tips-fb76daae65cb
One can understand that this automatic shard rebalancing can only work well if the load on a prefix is PROGRESSIVELY scaled up to the advertised limits:
If the request rate on the prefixes increases gradually, Amazon S3 scales up to handle requests for each of the two prefixes. (S3 will scale up to handle 3,500 PUT/POST/DELETE or 5,500 GET requests per second.) As a result, the overall request rate handled by the bucket doubles.
In my experience, 503s can appear well before the advertised levels, and there is no guarantee on how quickly S3 performs its internal rebalancing.
If you are in a write-intensive scenario, for example uploading a lot of small objects, the automatic scaling won't rebalance your load efficiently.
In short: if you are relying on S3 performance, I advise sticking to the pre-2018 rules so that the initial sharding of your storage works immediately and does not rely on the auto-rebalancing algorithm of S3:
hash the first 6 characters of the prefix, or design a data model that balances partitions uniformly across the first 6 characters of the prefix (see the sketch below)
avoid small objects (target an object size of ~128 MB)
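A hedged sketch of that hashed-prefix rule: derive the leading 6 hex characters from a hash of the logical name, so partitions stay uniformly balanced and, unlike a purely random prefix, the key can be recomputed at read time. The function name and key layout are illustrative only:

```python
import hashlib

def hashed_key(logical_name: str) -> str:
    # The first 6 hex characters of an MD5 digest spread keys evenly
    # across the keyspace while remaining reproducible.
    shard = hashlib.md5(logical_name.encode("utf-8")).hexdigest()[:6]
    return f"{shard}/{logical_name}"

# The same logical name always maps to the same sharded key, e.g.
# "reports/2018/07/17/part-0001.parquet" -> "<6 hex chars>/reports/2018/07/17/part-0001.parquet"
print(hashed_key("reports/2018/07/17/part-0001.parquet"))
```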

Because of the way lookups and writes work, using key names that are similar or ordered can harm performance.
Adding hash/random-ID prefixes to the S3 key is still advisable to alleviate high load on heavily accessed objects.
Amazon S3 Performance Tips & Tricks
Request Rate and Performance Considerations

How to introduce randomness to S3?
Prefix folder names with random hex hashes. For example: s3://BUCKET/23a6-FOLDERNAME/FILENAME.zip
Prefix file names with timestamps. For example: s3://BUCKET/FOLDERNAME/2013-26-05-15-00-00-FILENAME.zip

B is correct because, without randomness (also called entropy), keys that share an ordered prefix (for example, a key prefixed with the current year) are placed close to each other in the same partition of the index. When your application experiences an increase in traffic, it ends up reading from the same section of the index, resulting in decreased performance. So application developers add random prefixes to avoid this.
Note: AWS may have taken care of this by now, so developers might not need to, but I wanted to give the correct answer to the question as asked.

As of June 2021.
As mentioned in the AWS guide Best practices design patterns: optimizing Amazon S3 performance, an application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket.
I think random prefixes will still help to scale S3 performance.
For example, with 10 prefixes in one S3 bucket, the bucket can handle up to 35,000 PUT/COPY/POST/DELETE and 55,000 GET/HEAD requests per second.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html

Related

Seaweedfs: TTL options for S3 and/or volume

I'm really interested in the TTL feature for files in Seaweedfs. The only part missing in my understanding is how it interacts with the S3 layer and at the volume level.
Currently my app uses S3 libraries to interact with the storage. It writes a lot of small files, which are never rewritten but are accessed from time to time. I also need to keep only x days of written data, while older entries should be removed. Seaweedfs looks like a perfect solution to me, but as far as I understand the docs, I can only set a TTL using its own API.
So here are my two questions:
Can I somehow set the TTL for a file using the S3 abstraction layer?
If not, can I have a default TTL value, say per bucket (or volume, I guess) or per cluster?
Currently not. This can be a good feature to add.

When would you want to make S3 object keys similar

So S3 uses the object key in partitioning data, and you should build your keys with some randomness to distribute workloads across multiple partitions. My question is: are there any scenarios in which you would want to have similar keys? And if not, why would AWS use the key to partition your data instead of randomly partitioning the data itself?
I ask this because it seems like an odd design: it makes it easy for developers to make partitioning mistakes if they generate keys that follow a pattern, but it also prevents developers from creating keys in a logical manner, since that would inevitably result in a pattern and the data being partitioned poorly.
You appear to be referring to Request Rate and Performance Considerations - Amazon Simple Storage Service, which states:
The Amazon S3 best practice guidelines in this topic apply only if you are routinely processing 100 or more requests per second. If your typical workload involves only occasional bursts of 100 requests per second and fewer than 800 requests per second, you don't need to follow these guidelines.
This is unlikely to affect most applications, but if applications do have such high traffic, then spreading requests across the keyname space can improve performance.
AWS has not explained why they have designed Amazon S3 in this manner.
So S3 uses the object key in partitioning data
Wait. Your question seems premised on this assumption, but it isn't correct.
S3 does not use the object key to partition the data. That would indeed, as you suggest, be a very "odd design" (or worse).
S3 uses the object key to partition the index of objects in the bucket -- otherwise the index would be stored in an order that doesn't support enumerating the object keys in sorted order, which would eliminate the ability to list objects by prefix or identify common prefixes using delimiters -- or there would need to be a secondary index, which would just compound the potential scaling issue and move the same problem down one level.
The case for similar keys is when you want to find objects with a common prefix (in the same "folder") on demand. Storing log files is an easy example, yyyy/mm/dd/.... Note that when the various services store log files in buckets for you (S3 logs, CloudFront, ELB), the object keys are sequential like this, because the date and time are in the object key.
When S3 does a partition split, only the index is split. The data is already durably stored and doesn't move. The potential performance considerations are related to the performance of the index, not that of the actual storage of the object data.
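To make the index-order point concrete, here is a small hedged sketch of prefix and delimiter listing with boto3; the bucket name and log-style key layout are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# List one "folder" worth of log objects, relying on keys like "logs/2021/06/15/...".
day = s3.list_objects_v2(Bucket="my-log-bucket", Prefix="logs/2021/06/15/")
for obj in day.get("Contents", []):
    print(obj["Key"])

# With a delimiter, the sub-"folders" under a prefix come back as CommonPrefixes.
month = s3.list_objects_v2(Bucket="my-log-bucket", Prefix="logs/2021/06/", Delimiter="/")
for cp in month.get("CommonPrefixes", []):
    print(cp["Prefix"])  # e.g. "logs/2021/06/15/"
```

Both calls depend on the bucket index keeping keys in sorted order, which is exactly why the key (and not something random) drives the index partitioning.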

s3 rate limit against website endpoint

I'm hitting my S3 bucket via its website endpoint with various paths/keys. I'm able to get OK (200) responses when hitting it at 1,000 requests per second over the course of 5 minutes. I'm using a popular tool, https://github.com/tsenart/vegeta, so I have confidence in these stats.
This is surprising, considering the documentation says that anything above 800 requests per second is problematic.
Is using a website endpoint different from an API call in terms of throttling? Is 800 a real rate limit or a crude threshold?
It's a soft limit, and not really a limit from the bucket level perspective. Read carefully. The documentation warns of a rapid request rate increase beyond 800 requests per second potentially resulting in temporary rate limits on your request rate.
S3 increases available capacity by keyspace partition splitting and it takes some time for this to happen... but buckets scale up with workload.
If you are requesting the same object(s) repeatedly, you are also not likely to be imposing as much load on the available resources as you would be if you were hitting 800 unique objects per second -- and reading between the lines, that is the threshold under discussion: the time to look up keys in the bucket index. Recent hits are probably already more accessible than cold spots in the index.
The problem that document highlights is that if your object keys are lexically sequential, then S3 will be unable to split the partitions meaningfully, because you will always be creating new objects on only one side of the split or the other, thus working against the scaling algorithm of S3.
The documentation has been updated in the meantime and the limits have been increased. The limits are now per bucket prefix, and 1,000 req/s isn't a problem any more. For more, see the doc mentioned above.
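Even within those limits, bursts during a ramp-up can still return 503 Slow Down while S3 scales the prefix. A hedged sketch of letting the SDK absorb that, assuming boto3; the retry mode and attempt count are illustrative, not a recommendation from the docs quoted above:

```python
import boto3
from botocore.config import Config

# "adaptive" retry mode adds client-side rate limiting on top of
# exponential backoff, so transient 503 SlowDown responses are retried
# instead of surfacing immediately.
s3 = boto3.client(
    "s3",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)

s3.get_object(Bucket="my-bucket", Key="some/key")
```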

Amazon storage unlimited bucket

Hi, is there any version of Amazon Web Services that provides unlimited buckets?
No. From the S3 documentation:
Each AWS account can own up to 100 buckets at a time.
S3 buckets are expensive (in terms of resources) to create and destroy:
The high availability engineering of Amazon S3 is focused on get, put, list, and delete operations. Because bucket operations work against a centralized, global resource space, it is not appropriate to make bucket create or delete calls on the high availability code path of your application. It is better to create or delete buckets in a separate initialization or setup routine that you run less often.
There's also no good reason to use lots of buckets:
There is no limit to the number of objects that can be stored in a bucket and no variation in performance whether you use many buckets or just a few. You can store all of your objects in a single bucket, or you can organize them across several buckets.
You want a separate space for each of your users to put things. Fine: create a single bucket and give your user-specific information a <user_id>/ prefix. Better yet, put it in users/<user_id>/, so you can use the same bucket for other non-user-specific things later, or change naming schemes, or anything else you might want.
ListObjects accepts a prefix parameter (users/<user_id>/), and has special provisions for hierarchical keys that might be relevant.
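A short sketch of that single-bucket, per-user-prefix layout with boto3; the bucket name and user ID are placeholders:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-app-data"
user_id = "42"

# Write user-specific objects under users/<user_id>/ ...
s3.put_object(Bucket=bucket, Key=f"users/{user_id}/settings.json", Body=b"{}")

# ... and fetch only that user's objects by passing the same prefix to ListObjects.
resp = s3.list_objects_v2(Bucket=bucket, Prefix=f"users/{user_id}/")
for obj in resp.get("Contents", []):
    print(obj["Key"])
```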
Will is correct. For cases where you can really prove a decent use case, I would imagine that AWS would consider bumping your quota (as they will do for almost any service). I wouldn't be surprised if particular users have 200-300 buckets or more, but not without justifying it to AWS on good grounds.
With that said, I cannot find any S3 Quota Increase form alongside the other quota increase forms.

How can I organize a million+ files now that I'm moving to Amazon S3?

Well I'm getting booted from my shared host and I'm switching over to a combination of a VPS from Linode and Amazon S3 to host a few million jpegs.
My big worry is keeping some kind of sanity with all these images. Is there any hope of that? My understanding is you're only allowed 100 "buckets" and "buckets" are the only type of structure within S3.
Is putting a few million files in a bucket something you'd advise against?
You may notice in Bucket Restrictions and Limitations, it is stated:
There is no limit to the number of objects that can be stored in a bucket
My experience is that a very large number of objects in a single bucket will not affect the performance of getting a single object by its key (that is, get appears to be of constant complexity).
Having a very large number of objects also does not affect the speed of listing a given number of objects:
List performance is not substantially affected by the total number of keys in your bucket
However, I must warn you, that most S3 management tools I've used (like S3Fox) will choke and die a horrible slow death when attempting to access a bucket with a very large number of objects. One tool that seems to cope well with very large numbers of objects is S3 Browser (they have a free version and a Pro version, I am not affiliated with them in any way).
Using "folders" or prefixes, does not change any of these points (get and listing a given number of objects are still constant, most tools still fall over themselves and hang).