How is it even possible for an S3 bucket's global CORS policy to not apply to all files in the bucket?

I have an S3 bucket used for media file storage, most importantly for mp4 videos. There is a web front-end to all of that. (There's no CloudFront involved, and I don't want to add one due to the extra cost.) When a browser encounters the raw URL of an mp4 video, the default behavior on click is to play the file (either in place or in a new tab, depending on the target HTML tag). You can then download the video from the native browser player, but that takes many clicks, and we are sometimes talking about 40 videos in a playlist, so it would be a huge productivity hit compared to the videos just downloading.
So, understandably, a user need emerged for a download button that downloads the video instead of playing it. After hours of research I stumbled upon StreamSaver.js, which works great, and I can even give the downloaded video a meaningful name (with the athlete name, position, jersey #, etc.) instead of the GUIDs. But there's a downside: I hit CORS policy errors like: Access to fetch at 'https://sportsboardmedia.s3.amazonaws.com/uploads/video/{GUID}_{GUID}.mp4' from origin 'https://app.sportsboard.io' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
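For reference, the download path is roughly along these lines (a simplified sketch; the URL and file name are placeholders), and it's the fetch() call that triggers the CORS check:

```typescript
import streamSaver from "streamsaver";

// Simplified sketch: stream an S3 video to disk under a friendly file name.
// The fetch() below is a cross-origin request, so S3 must answer with an
// Access-Control-Allow-Origin header that matches https://app.sportsboard.io.
async function downloadVideo(url: string, fileName: string): Promise<void> {
  const res = await fetch(url); // this is where the CORS error surfaces
  if (!res.ok || !res.body) {
    throw new Error(`Download failed: ${res.status}`);
  }
  const fileStream = streamSaver.createWriteStream(fileName);
  await res.body.pipeTo(fileStream); // stream straight to the user's disk
}
```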
My bucket has been accessible to the public from the get-go. After some research I switched my bucket over to static website hosting mode (as per this SO entry), and I also applied a CORS policy advised by this SO entry.
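For reference, a bucket-wide CORS configuration of the kind those answers recommend can be applied like this (a sketch using the AWS SDK v3; the origins and methods here are illustrative, not necessarily my exact rules):

```typescript
import { S3Client, PutBucketCorsCommand } from "@aws-sdk/client-s3";

// Sketch only: a permissive, bucket-wide CORS configuration along the lines
// of the linked answers.
const s3 = new S3Client({ region: "us-east-1" });

await s3.send(
  new PutBucketCorsCommand({
    Bucket: "sportsboardmedia",
    CORSConfiguration: {
      CORSRules: [
        {
          AllowedOrigins: ["https://app.sportsboard.io"],
          AllowedMethods: ["GET", "HEAD"],
          AllowedHeaders: ["*"],
          ExposeHeaders: ["Content-Length", "Content-Type"],
          MaxAgeSeconds: 3000,
        },
      ],
    },
  })
);
```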
The current situation is that some video files are downloadable now, but other files (in the same bucket) still throw CORS errors. How is that even possible? The CORS policy is supposed to apply globally to the whole bucket, and every file should be equal from a CORS point of view, as far as I know. Where do I even start to fix this?
I applied this CORS policy a few days ago, and my bucket contains about 5 TB of videos. I think that's not a big whoop for Amazon. Here is an example athlete locker where some of the videos throw CORS errors: https://app.sportsboard.io/playerlocker/media/event/6272C5BE-CE03-4344-BED1-29369306C831/5e267f70-f1de-11e7-b8d3-f23c91087063/videos/
I opened an AWS Forum post as well, but all I hear is crickets there too: https://forums.aws.amazon.co/thread.jspa?messageID=991094#991094
Now I opened an AWS IQ request as well: https://iq.aws.amazon.com/p/N3SFMRLKRH
I recorded a video too: https://youtu.be/Dy38JRI5-oU
Maybe I should record a TikTok video as well???

Related

How to improve anti scraping for S3 bucket with specific referer policy and CloudFront distribution

The current situation is that I have a CloudFront distribution with an OAI pointing to the bucket of images that are served in my app. I've managed to make the bucket policy allow the GetObject action only from the CloudFront distribution, and only when the request comes from my app's domain via the Referer header.
The issue is that if someone manages to fake that Referer from a script, they could access the images, build a scraping bot, and fetch all my data. Is there any way to restrict access even further, so the images are only accessible from my own app?
The app's business is a photo news market between news agencies and photographers, so my aim is to have cheap and scalable mechanisms for those restrictions, as I manage a lot of photos at once; a presigned request for each one is load-intensive, and using a firewall could be very expensive.
If you don't want to go with the signed URLs approach, another approach you could try:
You can add a custom header along with traditional headers like Referer, etc. This will take care of 99% of the scraping; there might be a few scrapers who replicate even this header, and those you can simply start blocking (one way to enforce such a header at the edge is sketched below).
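One way to enforce such a custom header at the edge is a Lambda@Edge viewer-request check, roughly like the sketch below (the header name and value are hypothetical; your app would send the same header when it fetches images):

```typescript
import type { CloudFrontRequestEvent, CloudFrontRequestResult } from "aws-lambda";

// Sketch: reject viewer requests that don't carry the expected custom header.
// Header name and value are hypothetical placeholders.
const SECRET_HEADER = "x-app-client";
const SECRET_VALUE = "some-shared-secret";

export const handler = async (
  event: CloudFrontRequestEvent
): Promise<CloudFrontRequestResult> => {
  const request = event.Records[0].cf.request;
  const sent = request.headers[SECRET_HEADER]?.[0]?.value;

  if (sent !== SECRET_VALUE) {
    // Block scrapers that only spoof Referer/User-Agent.
    return { status: "403", statusDescription: "Forbidden", body: "Forbidden" };
  }
  return request; // forward the request to the origin
};
```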

Is it possible to have GitHub Readme images follow redirects?

I'm trying to add a test coverage badge to the Readme of a private repository on GitHub. Our continuous integration process saves out the image to a secured Google Cloud Storage bucket that's not accessible to the public, and should remain that way.
Google's authorization layer is smart enough that if I go to the URL for the image, I'm automatically redirected to the resource with a valid auto-generated signed URL.
E.g., if I go to http://storage.cloud.google.com/secret-files/mysecretfile.png, then if I'm logged in and allowed to view it, I'm automatically redirected to something like https://blahblah-apidata.googleusercontent.com/download/storage/v1/b/secret-files/o/mysecretfile.png?key=verylongkey, where I can load the image.
This seemed perfect. Reference the canonical path in the GitHub Readme, authenticated users see the image, unauthenticated users are still blocked, we don't have to make the file public, and we don't have to do anything complicated.
Except that GitHub is proxying the image request, meaning that it will always be unauthenticated. My browser is loading something like https://camo.githubusercontent.com/mysecretimage.png.
Is there a clever way to work around this? Or do I need to go back to the drawing board?
All images on github.com are proxied using the Camo image proxy. There are a couple reasons for this:
It preserves the privacy of users. It isn't possible for a document to track users by directing them to a different site or using cookies to track them.
It means images can be cached and served at an appropriate size.
GitHub can have a very strict content security policy that does not allow loading from untrusted sites, which means that any sort of accidental security problem (like an XSS) is a lot less likely to work.
Note the last part. Even if you found some sneaky way to get another image URL to render properly in the website, your browser wouldn't load it because it violates the Content-Security-Policy header the site sent, and moreover, your browser would tattle about that to the reporting URL that GitHub provided.
So any image URL you provide will need to be readable by GitHub's image proxy and it won't be possible to serve different content to different users.

Prevent view of individual file in AWS S3 bucket

I'm currently looking to host an app with an Angular frontend in an AWS S3 bucket, connecting to a PHP backend using AWS Elastic Beanstalk. I've got it set up and it's working nicely.
However, using S3 to create a static website, anyone can view your code, including the various Angular JS files. This is mostly fine, but I want to create either a file or folder to keep sensitive information in that cannot be viewed by anyone, but can be included/required by all other files. Essentially I want a key that I can attach to all calls to the backend to make sure only authorised requests get through.
I've experimented with various permissions, but all files always seem to be viewable, presumably because the static website hosting bucket policy ensures everything is public.
Any suggestions appreciated!
Cheers.
The whole idea of static website hosting on S3 is that the content is public. For example, if your app/web is under maintenance, you redirect users to the S3 static page notifying them that maintenance is ongoing.
I am not sure what all you have tried when you refer to "experimented with various permissions"; however, have you tried setting up a bucket policy, or setting up the bucket as a CloudFront origin and using a signed URL? This might be a bit tricky considering you want these sensitive files to be included/required by other files, but the way to hide those sensitive files will be either some sort of bucket policy or some sort of signed URL, in my opinion.
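If you go the signed URL route, generating one looks roughly like this (a sketch using the AWS SDK v3 CloudFront signer; the domain, key pair, and object path are placeholders, and the URL would have to be minted by your PHP backend, since the private key cannot live in the public static files):

```typescript
import { getSignedUrl } from "@aws-sdk/cloudfront-signer";

// Sketch: a short-lived CloudFront signed URL for one sensitive file.
// Distribution domain, key pair ID, and object path are placeholders.
const signedUrl = getSignedUrl({
  url: "https://dxxxxxxxxxxxx.cloudfront.net/config/app-secrets.json",
  keyPairId: "KXXXXXXXXXXXXX",
  privateKey: process.env.CLOUDFRONT_PRIVATE_KEY!,
  dateLessThan: new Date(Date.now() + 60_000).toISOString(), // valid for ~1 minute
});
```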

CrossDomain Access, HLS through CloudFront with Signed URL(JWplayer)

I am using HLS streaming with Amazon S3 and CloudFront, using JWPlayer (with Rails).
I used signed URLs and created an Origin Access Identity as described in the Amazon CloudFront documentation.
The signed URLs are generated fine.
I also have a 'crossdomain.xml' file in my bucket which allows all origins (I have given '*').
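For reference, it is the standard allow-all policy file, something like:

```xml
<?xml version="1.0"?>
<!DOCTYPE cross-domain-policy SYSTEM "http://www.adobe.com/xml/dtds/cross-domain-policy.dtd">
<cross-domain-policy>
  <allow-access-from domain="*" />
</cross-domain-policy>
```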
Now, when I try to play my HLS video files from my bucket, I get a cross-domain access denied issue.
I think JWPlayer is trying to access the 'crossdomain.xml' file without the signed hash, so it's getting that error.
I have tested my file in the demo JWPlayer stream tester, and this is the error I am getting in the console:
Fetch API cannot load http://xxxxxxxx.cloudfront.net/xxx/1/1m_test.ts.
No 'Access-Control-Allow-Origin' header is present on the requested resource.
Origin 'http://demo.jwplayer.com' is therefore not allowed access.
The response had HTTP status code 403.
If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
Here is the ScreenShot.
Please help me out. Thank You.
This is the link I followed to configure my CloudFront Distribution
I just had the same problem (but with the Flowplayer). I am not sure yet about security risks (and if all steps are needed), but I got it running with:
adding permissions on the crossdomain.xml for everyone to open/download
adding a behaviour in the CloudFront distribution only for crossdomain.xml, without restricting access (above the behaviour for * with restricted access)
and then I noticed that in the bucket, the link to the crossdomain.xml was something like "https://some-server.amazonaws.com/bucket.name/%1Fcrossdomain.xml" (notice the weird %1F), and that when I went to rename the crossdomain.xml, I could delete one invisible character at the first position of the name (I didn't create the crossdomain.xml, so I am not sure how this happened)
Edit:
I also had hls.js running with this, and making the crossdomain.xml accessible somehow disabled the CORS request. I am still looking into this.

Is it possible to use both S3 Query String Authentication and HTTP caching?

I have the following requirements for a (Rails) web application that uses S3/Cloudfront for image storage:
A user may only see an image if they are logged in. If the user sends an image URL to a friend, it will not work.
If a user has seen an image, it should be cached by their browser, so they don't have to download it again.
…
Requirement 1 can be solved with S3's Query String Authentication (QSA) (e.g. with a 30-second expiry). Requirement 2 can be solved using HTTP caching.
Is it possible to use them both together?
The challenge I'm facing is that QSA effectively changes the URL of the image after expiry, even though a perfectly good copy may reside in the browser cache.
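For illustration, here is roughly how such a short-lived QSA (presigned) URL is generated (a sketch using the AWS SDK v3 rather than my actual Rails code; the bucket and key are placeholders). Every call produces a fresh signature and expiry in the query string, which is exactly what defeats the browser cache:

```typescript
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({ region: "us-east-1" });

// Sketch: presign a GET for a private image with a 30-second expiry.
// The returned URL carries X-Amz-Signature / X-Amz-Expires query parameters,
// so it changes on every call even though the object itself is unchanged.
async function imageUrlFor(key: string): Promise<string> {
  const command = new GetObjectCommand({ Bucket: "my-image-bucket", Key: key });
  return getSignedUrl(s3, command, { expiresIn: 30 });
}
```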