Why are S3 and Google Storage bucket names a global namespace? - amazon-s3

This has me puzzled. I can obviously understand why account IDs are global, but why bucket names?
Wouldn't it make more sense to have something like: https://accountID.storageservice.com/bucketName
Which would namespace buckets under accountID.
What am I missing? Why did these obviously elite architects choose to handle bucket names this way?

“The bucket namespace is global - just like domain names”
— http://aws.amazon.com/articles/1109#02
That's more than coincidental.
The reason seems simple enough: buckets and their objects can be accessed through a custom hostname that's the same as the bucket name... and a bucket can optionally host an entire static web site -- with S3 automatically mapping requests from the incoming Host: header onto the bucket of the same name.
In S3, these variant URLs reference the same object "foo.txt" in the bucket "bucket.example.com". The first one requires DNS configuration: either static website hosting enabled with a CNAME (or Alias in Route 53) pointing to the bucket's website endpoint, or a CNAME pointing to the regional REST endpoint. The others require no configuration:
http://bucket.example.com/foo.txt
http://bucket.example.com.s3.amazonaws.com/foo.txt
http://bucket.example.com.s3[-region].amazonaws.com/foo.txt
http://s3[-region].amazonaws.com/bucket.example.com/foo.txt
If an object store service needs a simple mechanism to resolve the Host: header of an incoming HTTP request into a bucket name, the bucket name namespace also needs to be global. Anything else, it seems, would complicate the implementation significantly.
For hostnames to be mappable to bucket names, something has to be globally unique, since obviously no two buckets could respond to the same hostname. The restriction being applied to the bucket name itself leaves no room for ambiguity.
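To make the Host-to-bucket idea concrete, here's a rough illustrative sketch -- purely hypothetical, not S3's actual implementation -- of how a single global namespace reduces the lookup to a string operation on the Host header:

    # Illustrative sketch only -- not S3's actual implementation.
    # With one global bucket namespace, routing an incoming request to a
    # bucket needs nothing more than inspecting the Host header.

    S3_SUFFIX = ".s3.amazonaws.com"

    def bucket_for_request(host: str) -> str:
        """Resolve the Host header of an incoming request to a bucket name."""
        host = host.lower().rstrip(".")
        if host.endswith(S3_SUFFIX):
            # Virtual-hosted-style request: bucket.s3.amazonaws.com
            return host[: -len(S3_SUFFIX)]
        # Custom hostname (CNAME / static website hosting):
        # the hostname itself *is* the bucket name.
        return host

    print(bucket_for_request("bucket.example.com.s3.amazonaws.com"))  # bucket.example.com
    print(bucket_for_request("bucket.example.com"))                   # bucket.example.com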
It also seems likely that many potential clients wouldn't like to have their account identified in bucket names.
Of course, you could always add your account id, or any random string, to your desired bucket name, e.g. jozxyqk-payroll, jozxyqk-personnel, if the bucket name you wanted wasn't available.

The more I drink, the more sense the concept below makes, so I've elevated it from a comment on the accepted answer to its own entity:
An additional thought that popped into my head randomly tonight:
Given the ability to use the generic host names that the various object store services provide, one could easily obscure one's corporate (or other) identity as the owner of any given data resource.
So, let's say Black Hat Corp hosts a data resource at http://s3.amazonaws.com/obscure-bucket-name/something-to-be-disassociated.txt.
It would be very difficult for any non-governmental entity to determine who the owner of that resource is without cooperation from the object store provider.
Not nefarious by design, just objective pragmatism.
And possibly a stroke of brilliance by the architects of this paradigm.

Related

hiding s3 path in aws cloudfront url

I am trying to make sure I did not miss anything in the AWS CloudFront documentation or anywhere else ...
I have a (not public) S3 bucket configured as the origin in a CloudFront web distribution (I don't think it matters, but I am using signed URLs).
Let's say I have a file in an S3 path like
/someRandomString/someCustomerName/someProductName/somevideo.mp4
So, perhaps the URL generated by CloudFront would be something like:
https://my.domain.com/someRandomString/someCustomerName/someProductName/somevideo.mp4?Expires=1512062975&Signature=unqsignature&Key-Pair-Id=keyid
Is there a way to obfuscate the path to the actual file in the generated URL? All 3 parts before the filename can change, so I prefer not to use "Origin Path" in the Origin Settings to hide the beginning of the path. With that approach, I would have to create a lot of origins mapped to the same bucket but with different paths. If that's the only way, then the limit of 25 origins per distribution would be a problem.
Ideally, I would like to get something like
https://my.domain.com/someRandomObfuscatedPath/somevideo.mp4?Expires=1512062975&Signature=unqsignature&Key-Pair-Id=keyid
Note: I am also using my own domain/CNAME.
Thanks
Cris
One way could be to use a Lambda function that receives the S3 file's path, copies the object into an obfuscated directory (perhaps keeping a simple mapping from the source path to the obfuscated one) and then returns a signed URL for the copied file, as sketched below. This ensures that only the obfuscated path is visible externally.
Of course, this will (potentially) double the data storage, so you need some way to clean up the obfuscated folders. That could be done in a time-based manner: if each signed URL is expected to expire after 24 hours, you could create folders based on date, and each of the obfuscated directories could be deleted every other day.
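For what it's worth, here's a minimal sketch of that copy-then-sign idea using boto3; the bucket name, prefix scheme, and helper function are all hypothetical:

    # Minimal sketch of the copy-then-sign idea, assuming boto3 and a bucket
    # named "my-bucket" (all names here are hypothetical).
    import datetime
    import uuid
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-bucket"

    def obfuscated_signed_url(source_key: str, expires_in: int = 24 * 3600) -> str:
        # Copy the object under a date-based, random prefix so the original
        # path never appears in the URL handed to the client.
        prefix = datetime.date.today().isoformat()
        obfuscated_key = f"{prefix}/{uuid.uuid4().hex}/{source_key.rsplit('/', 1)[-1]}"
        s3.copy_object(
            Bucket=BUCKET,
            Key=obfuscated_key,
            CopySource={"Bucket": BUCKET, "Key": source_key},
        )
        # Return a pre-signed URL for the copy; the date-based prefix makes it
        # easy to delete expired copies later (e.g. on a schedule).
        return s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": BUCKET, "Key": obfuscated_key},
            ExpiresIn=expires_in,
        )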
Alternatively, you could use a service like tinyurl.com or something similar to create a mapping. It would be much easier, save on storage, etc. The only downside would be that it would not reflect your domain name.
If you have the ability to modify the routing of your domain then this is a non-issue, but I presume that's not an option.
Obfuscation is not a form of security.
If you wish to control which objects users can access, you should use Pre-Signed URLs or Cookies. This way, you can grant access to private objects via S3 or CloudFront and not worry about people obtaining access to other objects.
See: Serving Private Content through CloudFront
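For reference, generating a CloudFront signed URL from code looks roughly like this, using botocore's CloudFrontSigner; the key pair ID, private key file, and URL are placeholders:

    # Sketch of a CloudFront signed URL, following the botocore CloudFrontSigner
    # pattern; the key pair ID, private key path, and URL are placeholders.
    import datetime
    from botocore.signers import CloudFrontSigner
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    def rsa_signer(message: bytes) -> bytes:
        with open("private_key.pem", "rb") as key_file:
            private_key = serialization.load_pem_private_key(key_file.read(), password=None)
        return private_key.sign(message, padding.PKCS1v15(), hashes.SHA1())

    signer = CloudFrontSigner("KEY_PAIR_ID", rsa_signer)
    signed_url = signer.generate_presigned_url(
        "https://my.domain.com/someRandomString/someCustomerName/someProductName/somevideo.mp4",
        date_less_than=datetime.datetime.utcnow() + datetime.timedelta(hours=1),
    )
    print(signed_url)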

AWS S3 bucket image URL

I am using AWS S3 in a Ruby on Rails project to store images for my models. Everything is working fine. I was just wondering if it is okay/normal that if someone right-clicks an image, it shows the following URL:
https://mybucketname.s3.amazonaws.com/uploads/photo/picture/100/batman.jpg
Is this a hacking risk, letting people see your bucket name? I guess I was expecting to see a bunch of randomized letters or something. /Noob
Yes, it's normal.
It's not a security risk unless your bucket permissions allow unauthenticated actions like uploading and deleting objects by anonymous users (obviously, having the bucket name would be necessary if a malicious user wanted to overwrite your files) or your bucket name itself provides some kind of information you don't want revealed.
If it makes you feel better, you can always associate a CloudFront distribution with your bucket -- a CloudFront distribution has a default hostname like d1a2b3c4dexample.cloudfront.net, which you can use in your links, or you can associate a vanity hostname with the CloudFront distribution, like assets.example.com, neither of which will reveal the bucket name.
But your bucket name, itself, is not considered sensitive information. It is common practice to use links to objects in buckets, which necessarily include the bucket name.

Using aws s3 prefix and delimiter to differentiate between "folders" and "files" with a common prefix

The question is a bit wordy, but it is exactly what I'm looking to do.
The question is relative to an existing (currently open) question I have here. I believe that understanding this concept is key to answering my existing question, but distinct enough to warrant a whole new one.
Given these keys in an S3 bucket named "my-permtest":
/1/
/1/a
/1/2/b
/1/3/c
How can I use prefix and delimiter correctly to get the objects that don't end in "/" (i.e. "files")?
The ultimate goal is to apply this knowledge to an IAM group policy granting ListBucket and GetObject on /1/a while denying GetObject or ListBucket on /1/2/, /1/2/*, /1/3, and /1/3/*.
I'm effectively looking to mimic traditional file system permissions that let a user access all "files" in a "folder" but restrict access to "subfolders".
Currently, I'm using s3api calls with different values of the prefix and delimiter options to get a feel for usage of these things. I've been reading and practicing with these resources, but going is slow and assistance would be greatly appreciated.
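For what it's worth, a prefix/delimiter listing against the keys above might look something like this with boto3 (bucket name taken from the question; the exact output depends on your keys):

    # Sketch of a prefix/delimiter listing against the keys in the question,
    # assuming boto3 and the bucket name "my-permtest" from the question.
    import boto3

    s3 = boto3.client("s3")

    resp = s3.list_objects_v2(Bucket="my-permtest", Prefix="/1/", Delimiter="/")

    # Keys directly under the prefix that don't end in "/" are the "files".
    files = [o["Key"] for o in resp.get("Contents", []) if not o["Key"].endswith("/")]
    # CommonPrefixes are the "subfolders" rolled up by the delimiter.
    subfolders = [p["Prefix"] for p in resp.get("CommonPrefixes", [])]

    print(files)       # e.g. ['/1/a']
    print(subfolders)  # e.g. ['/1/2/', '/1/3/']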
This is actually pretty straightforward once it's understood that folders don't exist, per se. S3 objects are key/value pairs. The key is the "filename", which represents a full path containing "/" characters. The value for a key is the actual file content. So an S3 object with the key s3://my-bucket/1/2/a is NOT a file named a in a subfolder of 1 named 2. It's an object with a key that looks like a full path.
Knowing this, and applying the understanding that the * wildcard can be used to match key names in an allow or deny statement when writing a policy, effectively answers the question. Additionally, it's important to include tightly scoped object or bucket allow/deny statements for specific actions.
So, basically, to allow get/put access to the contents of the /1/2 "folder", you would need to allow list actions on s3://bucket/1/2 and allow object get/put actions on s3://bucket/1/2/*. Note that it's key to separate your S3 object actions and S3 bucket list actions into distinct statements within your policy.
If you wanted to deny access to s3://bucket/1/2/3/*, you would add 2 statements to the same policy: the first denying list access to s3://bucket/1/2/3, the second denying get/put object access to s3://bucket/1/2/3/*.
Now, should you want to allow some people access to s3://bucket/1/2/3/a, you would be in a bind if you tried to use this policy, because s3://bucket/1/2/3/* has been explicitly denied. Any policy granting access will be ignored because of that explicit deny. The only option would be to have two policies that are nearly identical: one containing the original allow to s3://bucket/1/2/3/*, and the other containing the original plus the deny for list access to s3://bucket/1/2/3/ and object get/put access for s3://bucket/1/2/3/*. The folks with no access to s3://bucket/1/2/3/* would be in the group with the explicit deny, and the folks with access there would be in the first group that just has the allow.
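To make the shape of such a policy concrete, here's a minimal sketch of the "allow /1/2/ but deny /1/2/3/" variant described above, attached as an inline group policy. Bucket name, group name, and prefixes are illustrative; note that listing is granted on the bucket ARN with an s3:prefix condition:

    # Minimal sketch of the "allow /1/2/ but deny /1/2/3/" policy discussed
    # above, attached as an inline group policy; names are illustrative.
    import json
    import boto3

    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # Listing is a bucket-level action, scoped with an s3:prefix condition.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": "arn:aws:s3:::my-permtest",
                "Condition": {"StringLike": {"s3:prefix": "1/2/*"}},
            },
            {   # Object-level actions go in their own statement, on object ARNs.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": "arn:aws:s3:::my-permtest/1/2/*",
            },
            {   # An explicit deny for the nested "subfolder" overrides any allow.
                "Effect": "Deny",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": "arn:aws:s3:::my-permtest/1/2/3/*",
            },
        ],
    }

    iam = boto3.client("iam")
    iam.put_group_policy(
        GroupName="folder-1-2-users",
        PolicyName="allow-1-2-deny-1-2-3",
        PolicyDocument=json.dumps(policy),
    )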
This quickly gets out of control when there are many "subfolders" with different groups having access to each. Updating a policy every time a new nested "subfolder" is created is an untenable paradigm. For this reason, when using an IAM group policy based approach to securing s3 resources, you should take care to organize the data in your buckets in such a way that you don't have to do this kind of maintenance.
See my related answer here for detail on what I mean, but basically avoid creating subfolders with arbitrary restrictions on who can and can't access them. You're going to have a hard time saying "Joe can access 1/3/5 and 1/3/7 but not 1/2/4 or 1/2/6." It's a lot easier to move /1/, /3/, and /5/ under /odd/ and move /2/, /4/, and /6/ under /even/, then just allow him to access /odd/*. You don't even have to specify a deny for /even/ because it's implicit.

S3 Bucket Types

Just wondering if there is a recommended strategy for storing different types of assets/files in separate S3 buckets or just put them all in one bucket? The different types of assets that I have include: static site images, user's profile images, user-generated content like documents, files, and videos.
As far as how to group files into buckets goes, that is really not that critical of an issue unless you want to have different domain names or CNAMEs for different types of content, in which case you would need a separate bucket for each domain name you want to use.
I would tend to group them by functionality. Static files used in your application that you have full control over might go in a separate bucket from content that is going to be user generated. Or you might want to have video in a different bucket than images, etc.
To add to my earlier comments about S3 metadata: it is going to be a critical part of optimizing how you serve up content from S3/CloudFront.
Basically, S3 metadata consists of key-value pairs. So you could have Content-Type as a key with a value of image/jpeg, for example, if the file is a .jpg. This will automatically send the appropriate Content-Type header corresponding to your values for requests made directly to the S3 URL or via CloudFront. The same is true of Cache-Control metadata. You can also use your own custom metadata keys. For example, I use a custom key named x-amz-meta-md5 to store an md5 hash of the file. It is used for simple bucket comparisons against content stored in a revision control system, so we don't have to compute checksums of each file in the bucket on the fly. We use this for pushing differential content updates to the buckets (i.e. only pushing those that have changed).
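For example, an upload that sets those headers plus the custom md5 metadata might look like this with boto3 (bucket and key names are placeholders):

    # Sketch of uploading an object with the metadata described above, assuming
    # boto3; bucket and key names are placeholders.
    import hashlib
    import boto3

    s3 = boto3.client("s3")

    with open("bigimage.jpg", "rb") as f:
        body = f.read()

    s3.put_object(
        Bucket="my-assets-bucket",
        Key="images/bigimage.jpg",
        Body=body,
        ContentType="image/jpeg",           # served back as the Content-Type header
        CacheControl="max-age=31536000",    # long-lived caching for versioned files
        # User-defined metadata; S3 stores it under the x-amz-meta-md5 header.
        Metadata={"md5": hashlib.md5(body).hexdigest()},
    )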
As far as revision control goes, I would HIGHLY recommend using versioned file names. In other words, say you have bigimage.jpg and you want to make an update: call it bigimage1.jpg and change your code to reflect this. Why? Because, optimally, you would like to set long expiration time frames in your Cache-Control headers. Unfortunately, if you then deploy a file of the same name and you are using CloudFront, it becomes problematic to invalidate the edge cache locations. Whereas if you have a new file name, CloudFront just begins to populate the edge nodes and you don't have to worry about invalidating the cache at all.
Similarly for user-produced content, you might want to include an md5 or some other (mostly) unique identifier scheme, so that each video/image can have its own unique filename and place in the cache.
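One possible way to derive such a name (the naming scheme here is just an example):

    # One way to derive a versioned, cache-friendly key from the file's content:
    # a changed file gets a new name, so CloudFront never needs invalidation.
    import hashlib
    import pathlib

    def versioned_key(path: str) -> str:
        p = pathlib.Path(path)
        digest = hashlib.md5(p.read_bytes()).hexdigest()[:8]
        return f"{p.stem}-{digest}{p.suffix}"

    print(versioned_key("bigimage.jpg"))  # e.g. "bigimage-3f2a9c1d.jpg"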
For your reference, here is a link to the AWS documentation on setting up streaming in CloudFront:
http://docs.amazonwebservices.com/AmazonCloudFront/latest/DeveloperGuide/CreatingStreamingDistributions.html

How to create multiple keys for the same object in Amazon S3 OR CLOUDFRONT?

I am syndicating out my multimedia content (mp4 and images) to several clients. So I create one S3 object for every mp4, say "my_content_that_pays_my_bills.mp4", and let the clients access the S3 URL for the object and embed it wherever they want.
What I want is for client A to access this MP4 as "A_my_content_that_pays_my_bills.mp4"
and Client B to access this as "B_my_content_that_pays_my_bills.mp4" and so on.
I want to bill the clients by usage: so I could process access logs and count access to "B_my_content_that_pays_my_bills.mp4" and bill client B for usage.
I know that S3 allows only one key per object. So how do I get around this?
I don't know that you can alias file names in the way you'd like. Here are a couple of hacks I can think of for public files embedded freely by a customer:
1) Create one Cloudfront distribution per client, each pointing at the same bucket. Each AWS account can have 100 distributions, so you could support only that many clients. Or,
2) Duplicate the files, using the client-specific names that you'd like, as in the sketch below. This is simpler, but your file storage costs scale with the number of clients (which may or may not be significant).
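A rough sketch of option 2 with boto3, using the file name from the question and a placeholder bucket name:

    # Sketch of option 2: duplicate the object under client-specific key names,
    # so access logs can be grouped per client (names taken from the question).
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-media-bucket"          # placeholder bucket name
    SOURCE = "my_content_that_pays_my_bills.mp4"

    for client in ("A", "B"):
        s3.copy_object(
            Bucket=BUCKET,
            Key=f"{client}_{SOURCE}",   # e.g. A_my_content_that_pays_my_bills.mp4
            CopySource={"Bucket": BUCKET, "Key": SOURCE},
        )
    # Each client now embeds its own URL, and S3 access logs (or CloudFront logs)
    # can be filtered by key name to bill per client.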