I am using AMWS s3 in a ruby on rails project to store images for my models. Everything is working fine. I was just wondering if it okay/normal that if someone right clicks an image, it shows the following url:
https://mybucketname.s3.amazonaws.com/uploads/photo/picture/100/batman.jpg
Is this a hacking risk, letting people see your bucket name? I guess I was expecting to see a bunch of randomized letters or something. /Noob
Yes, it's normal.
It's not a security risk unless your bucket permissions allow unauthenticated actions like uploading and deleting objects by anonymous users (obviously, having the bucket name would be necessary if a malicious user wanted to overwrite your files) or your bucket name itself provides some kind of information you don't want revealed.
If it makes you feel better, you can always associate a CloudFront distribution with your bucket -- a CloudFront distribution has a default hostname like d1a2b3c4dexample.cloudfront.net, which you can use in your links, or you can associate a vanity hostname with the CloudFront distribution, like assets.example.com, neither of which will reveal the bucket name.
But your bucket name, itself, is not considered sensitive information. It is common practice to use links to objects in buckets, which necessarily include the bucket name.
Related
I am trying to make sure I did not miss anything in the AWS CloudFront documentation or anywhere else ...
I have a (not public) S3 bucket configured as origin in a CloudFront web distribution (i.e. I don't think it matters but I am using signed urls).
Let's say a have a file in a S3 path like
/someRandomString/someCustomerName/someProductName/somevideo.mp4
So, perhaps the url generated by CloudFront would be something like:
https://my.domain.com/someRandomString/someCustomerName/someProductName/somevideo.mp4?Expires=1512062975&Signature=unqsignature&Key-Pair-Id=keyid
Is there a way to obfuscate the path to actual file on the generated URL. All 3 parts before the filename can change, so I prefer not to use "Origin Path" on Origin Settings to hide the begging of the path. With that approach, I would have to create a lot of origins mapped to the same bucket but different paths. If that's the only way, then the limit of 25 origins per distribution would be a problem.
Ideally, I would like to get something like
https://my.domain.com/someRandomObfuscatedPath/somevideo.mp4?Expires=1512062975&Signature=unqsignature&Key-Pair-Id=keyid
Note: I am also using my own domain/CNAME.
Thanks
Cris
One way could be to use a lambda function that receives the S3 file's path, copies it into an obfuscated directory (maybe it has a simple mapping from source to origin) and then returns the signed URL of the copied file. This will ensure that only the obfuscated path is visible externally.
Of course, this will (potentially) double the data storage so you need some way to clean up the obfuscated folders. That could be done on a time-based manner, so if each signed URL is expected to expire after 24 hours, you could create folders based on date, and each of the obfuscated directories could be deleted every other day.
Alternatively, you could use a service like tinyurl.com or something similar to create a mapping. It would be much easier, save on storage, etc. The only downside would be that it would not reflect your domain name.
If you have the ability to modify the routing of your domain then this is a non-issue, but I presume that's not an option.
Obfuscation is not a form of security.
If you wish to control which objects users can access, you should use Pre-Signed URLs or Cookies. This way, you can grant access to private objects via S3 or CloudFront and not worry about people obtaining access to other objects.
See: Serving Private Content through CloudFront
This has me puzzled. I can obviously understand why account ID's are global, but why bucket names?
Wouldn't it make more sense to have something like: https://accountID.storageservice.com/bucketName
Which would namespace buckets under accountID.
What am I missing, why did these obviously elite architects choose to handle bucket names this way?
“The bucket namespace is global - just like domain names”
— http://aws.amazon.com/articles/1109#02
That's more than coincidental.
The reason seems simple enough: buckets and their objects can be accessed through a custom hostname that's the same as the bucket name... and a bucket can optionally host an entire static web site -- with S3 automatically mapping requests from the incoming Host: header onto the bucket of the same name.
In S3, these variant URLs reference the same object "foo.txt" in the bucket "bucket.example.com". The first one works with static website hosting enabled and requires a DNS CNAME (or Alias in Route 53) or a DNS CNAME pointing to the regional REST endpoint; the others require no configuration:
http://bucket.example.com/foo.txt
http://bucket.example.com.s3.amazonaws.com/foo.txt
http://bucket.example.com.s3[-region].amazonaws.com/foo.txt
http://s3[-region].amazonaws.com/bucket.example.com/foo.txt
If an object store service needs a simple mechanism to resolve the Host: header in an HTTP incoming request into a bucket name, the bucket name namespace also needs to be global. Anything else, it seems, would complicate the implementation significantly.
For hostnames to be mappable to bucket names, something has to be globally unique, since obviously no two buckets could respond to the same hostname. The restriction being applied to the bucket name itself leaves no room for ambiguity.
It also seems likely that many potential clients wouldn't like to have their account identified in bucket names.
Of course, you could always add your account id, or any random string, to your desired bucket name, e.g. jozxyqk-payroll, jozxyqk-personnel, if the bucket name you wanted wasn't available.
The more I drink the greater the concept below makes sense, so I've elevated it from a comment on the accepted answer to its own entity:
An additional thought that popped into my head randomly tonight:
Given the ability to use the generic host names that the various object store services provide, one could easily obscure your corporate (or other) identity as the owner of any given data resource.
So, let's say Black Hat Corp hosts a data resource at http://s3.amazonaws.com/obscure-bucket-name/something-to-be-dissassociated.txt.
It would be very difficult for any non-governmental entity to determine who the owner of that resource is without co-operaton from the object store provider.
Not nefarious by design, just objective pragmatism.
And possibly a stroke of brilliance by the architects of this paradigm
Just wondering if there is a recommended strategy for storing different types of assets/files in separate S3 buckets or just put them all in one bucket? The different types of assets that I have include: static site images, user's profile images, user-generated content like documents, files, and videos.
As far as how to group files into buckets. That is really not that critical of an issue unless you want to have different domain names or CNAMEs fordifferent types on content, in which case you would need a separate bucket for each domain name you would want to use.
I would tend to group them by functionality. Perhaps static files used in your application that you have full control over you might deploy into a separate bucket from content that is going to be user generated. Or you might want to have video in a different bucket than images, etc.
To add to my earlier comments about S3 metadata. It is going to be a critical part of optimizing how you server up content from S3/Cloudfront.
Basically, S3 metadata consists of key-value pairs. So you could have Content-Type as a key with a value of image/jpeg for example if the file is .jpg. This will automatically send appropriate Content-Type headers corresponding to your values for requests made directly to S3 URL or via Cloudfront. The same is true of Cache-Control metatags. You can also use your own custom metatags. For example, I use a custom metatag named x-amz-meta-md5 to store an md5 hash of the file. It is used for simple bucket comparisons against content stored in a revision control system, so we don't have to make checksums of each file in the bucket on the fly. We use this for pushing differential content updates to the buckets (i.e. only push those that have changed).
As far as how revision control goes. I would HIGHLY recommend using versioned file names. In other words say you have bigimage.jpg and you want to make an update, call it bigimage1.jpg and change your code to reflect this. Why? Because optimally, you would like to set long expiration time frames in your Cache-Control headers. Unfortunately, if you then want to deploy a file of the same name and you are using Cloudfront, it becomes problematic to invalidate the edge caching locations. Whereas if you have a new file name, Cloudfront would just begin to populate the edge nodes and you don't have to worry about invalidating the cache at all.
Similarly for user-produced content, you might want to include an md5 or some other (mostly) unique identifier scheme, so that each video/image can have its own unique filename and place in the cache.
For your reference here is a link to the AWs documentation on setting up streaming in Cloudfront
http://docs.amazonwebservices.com/AmazonCloudFront/latest/DeveloperGuide/CreatingStreamingDistributions.html
Is there a possibility to set maximum file (object) size using a bucket's policy?
I found here a question like this, but there is no size limitation in the examples.
No, you can't do this with a bucket policy. Check the Element Descriptions page of the S3 documentation for an exhaustive list of the things you can do in a bucket policy.
However, you can specify a content-length-range restriction within a Browser Uploads policy document. This feature is commonly used for giving untrusted users write access to specific keys within an S3 bucket you control (e.g. user-facing media uploads), and it provides the appropriate tools for limiting the location, size, and data types that can be uploaded without needing to expose your S3 credentials.
So I have been reading about signed URL's and some of its benefits. Especially the part about hot linking. Our app doesn't allow users to embed media (photo, video, audio) from our site. So signed URL's looks like the right direction. Mostly to prevent hotlinking.
So now that I know my requirements. I have a few questions.
Does this mean I have to add a policy to my bucket, denying read-write access to any of the files or folders in the bucket?
Do I have to create signed URL's for each page visit? So let's say 100 users visit the same page where the song can be played. Does this mean I have to create 100 signed URL's?
Creating S3 signed URL's are free?
Touching on point #2. Is it normal practice for Amazon S3 to create several signed URL's? I mean what happens if 1,000 users end up coming to the same song page..
Your thoughts?
REFERENCE:
For anyone interested on how I was able to generate signed url's. Based on https://github.com/appoxy/aws gem and the docs at http://rubydoc.info/gems/aws/2.4.5/frames :
s3 = Aws::S3.new(APP_CONFIG['amazon_access_key_id'], APP_CONFIG['amazon_secret_access_key'])
bucket_gen = Aws::S3Generator::Bucket.create(s3, 'bucket_name')
signed_url = bucket_gen.get(URI.unescape(URI.parse(URI.escape('http://bucket_name.s3.amazonaws.com/uploads/foobar.mp3')).path[1..-1]), 1.hour)
By default, your bucket will be set to private. When you upload files to S3, you can set the ACL (permissions) - in your case, you'll want to make sure the files are private.
The simplest solution is to create new signed urls for each visitor. You could do something like generate new urls everyday, store them somewhere, and then use those but that adds complexity for little benefit. The one place where you might need this though is too enable client side caching. Everytime you create a new url, the browser sees it as a different file and will download a fresh copy. If this isn't the behaviour you want, you need to generate urls that expire far in the future and reuse those - but that will reduce the effectiveness of preventing hotlinking.
Yes, generating urls are free. They are generated on the client and don't touch S3. I suppose there is a time/processing cost, but I have created pages with hundreds of urls that are generated on each visit and have not noticed any performance issues.