Video.js - HLS => No 'Access-Control-Allow-Origin' header [S3, CloudFront] - amazon-s3

I have a problem playing HLS videos using the video.js plugin in my application.
I have an S3 bucket of HLS videos (.m3u8, .ts) and it's connected to CloudFront. The videos work in Safari, but they don't work properly in Chrome: they only play after I hard-reload the page (clearing cache, cookies, etc.).
My configurations:
Video.JS:
videojs.Hls.xhr.beforeRequest = function (options) {
  options.headers = {
    "Access-Control-Allow-Origin": "*",
  };
  return options;
};
S3 bucket CORS:
[
  {
    "AllowedHeaders": [
      "*"
    ],
    "AllowedMethods": [
      "GET",
      "PUT",
      "POST",
      "HEAD"
    ],
    "AllowedOrigins": [
      "*"
    ],
    "ExposeHeaders": [
      "ETag",
      "Access-Control-Allow-Origin",
      "Connection",
      "Content-Length"
    ],
    "MaxAgeSeconds": 3000
  }
]
CloudFront:

I faced a similar problem. In my case, some files were received successfully while others (in the same directory, uploaded at the same time by the same mechanism) threw CORS errors.
After days of investigation, I fixed it (I hope). I'm leaving what I figured out here for future researchers.
CORS support is implemented in S3, and there is a lot of info on the Internet about how to configure it.
When a CloudFront link is requested, AWS checks whether the requested object is in the CloudFront cache. If it is, CloudFront returns it; if not, CloudFront requests it from the origin (S3 in my case), caches it, and returns it.
When an S3 link is requested and there is an Origin header in the request, S3 returns the file with an access-control-allow-origin header; otherwise, access-control-allow-origin is not added to the response headers.
When CloudFront requests a file from the origin (S3), it can pass the request headers (the ones sent with the file request) through to the origin. That's why you have to add the Origin header (and any others you need) under 'Choose which headers to include in the cache key.' In that case, if the request to CloudFront contains the Origin header, it will also be sent to S3. Otherwise, CloudFront will request the file from S3 without the Origin header, S3 will return the file without the access-control-allow-origin header, and that header-less file will be cached and returned to the browser (CORS error).
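A toy model can make this caching interaction concrete. The sketch below is plain JavaScript, not AWS code; `makeEdge`, the simulated `s3` responder, and the paths and header values are all illustrative assumptions:

```javascript
// Toy model of an edge cache: CloudFront caches whatever the origin returned
// for a given cache key. If "Origin" is not part of the key, a response cached
// without CORS headers is served to every later request, including ones that
// do send an Origin header.
function makeEdge({ originInCacheKey }) {
  const cache = new Map();

  // Simulated S3: only adds the CORS header when the request has an Origin.
  function s3(requestHeaders) {
    const headers = { "content-type": "video/mp2t" };
    if (requestHeaders.origin) {
      headers["access-control-allow-origin"] = "*";
    }
    return headers;
  }

  return function request(path, requestHeaders) {
    const key = originInCacheKey
      ? `${path}|${requestHeaders.origin || ""}`
      : path;
    if (!cache.has(key)) {
      // Forward only the headers that are part of the cache key.
      const forwarded = originInCacheKey ? requestHeaders : {};
      cache.set(key, s3(forwarded));
    }
    return cache.get(key);
  };
}

// Without Origin in the cache key: the first (origin-less) request poisons
// the cache, and the browser's later CORS request fails.
const bad = makeEdge({ originInCacheKey: false });
bad("/video.ts", {}); // e.g. a mobile app, no Origin header
const fromBrowser = bad("/video.ts", { origin: "https://example.com" });
// fromBrowser has no access-control-allow-origin -> CORS error

// With Origin in the cache key: the browser's request is a cache miss and
// gets its own CORS-enabled copy.
const good = makeEdge({ originInCacheKey: true });
good("/video.ts", {});
const ok = good("/video.ts", { origin: "https://example.com" });
// ok["access-control-allow-origin"] === "*"
```

The only difference between the two edges is whether Origin is part of the cache key, which is exactly what the setting above controls.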
[screenshot: headers included in the cache key]
Now there are two options under the cache and origin settings: 'Cache policy and origin request policy (recommended)' and 'Legacy cache settings' (it seems that earlier there were no such options, and only the settings now under 'Legacy cache settings' existed). Under 'Cache policy and origin request policy (recommended)' there are 'Cache policy' and 'Origin request policy - optional' sections. If the predefined recommended policies are set, the Origin header (and others) is predefined for 'Origin request policy - optional' but not for 'Cache policy'. To be honest, I don't understand the exact meaning of each, but it seems the legacy 'Choose which headers to include in the cache key' setting is now split into these two sections. So if you use 'Cache policy and origin request policy (recommended)' instead of 'Legacy cache settings', you have to create a new cache policy (a duplicate of the recommended one) and add the headers to it (the same ones as in the CORS-S3Origin policy).
[screenshot: recommended settings with the CORS-S3Origin and CachingOptimized policies]
In my case, when files were requested from a mobile app first, the requests didn't have an Origin header. That's why S3 returned them without the access-control-allow-origin header, and they were cached in CloudFront without CORS headers. All subsequent requests with an Origin header (the browser always adds this header when you make a request from JS) then failed with a CORS error ("No 'Access-Control-Allow-Origin'...").
There is also the ability to add custom headers to requests from CloudFront to S3 (Origins -> Edit a particular origin -> Add custom header). If you don't care where users request your files from, you can add the Origin header here and set it to any value. In that case, all requests to S3 will have an Origin header.
[screenshot: custom header]
There are many CloudFront edge locations, each with its own cache; a user receives files from the nearest one. That's why it's possible that some users receive files successfully while others get CORS errors.
There is an x-cache header in CloudFront response headers. Its value is either "Miss from cloudfront" (the requested file was not in the cache) or "Hit from cloudfront" (the file was returned from the cache). So you can see whether your request is the first one to reach a particular edge location (disable the browser cache in devtools if you want to try). But sometimes it behaves seemingly at random, and I don't know why.
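A small Node script (18+, for the global fetch) can automate this check. This is a hypothetical diagnostic, not an AWS tool, and the example.com Origin value is an arbitrary assumption:

```javascript
// Classify CloudFront's x-cache response header value.
function classifyXCache(value) {
  if (!value) return "unknown";
  if (/^hit/i.test(value)) return "hit";
  if (/^miss/i.test(value)) return "miss";
  return "unknown";
}

// Request a CloudFront URL with an Origin header, as a browser would,
// and report the cache status and the CORS header that came back.
async function probe(url) {
  const res = await fetch(url, { headers: { Origin: "https://example.com" } });
  return {
    xCache: res.headers.get("x-cache"),
    allowOrigin: res.headers.get("access-control-allow-origin"),
  };
}

// Usage: node probe.js https://dxxxx.cloudfront.net/video/playlist.m3u8
const urlArg = process.argv[2];
if (urlArg) {
  probe(urlArg).then(({ xCache, allowOrigin }) => {
    console.log(`x-cache: ${xCache} (${classifyXCache(xCache)})`);
    console.log(`access-control-allow-origin: ${allowOrigin}`);
  });
}
```

Running it twice against the same edge should flip "miss" to "hit"; a hit with a missing access-control-allow-origin header is the poisoned-cache situation described above.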
It even looks like the same edge location can keep different caches for different clients. I don't know what this is based on, but I've experimented with a browser, Postman, and curl, and got the following results (I tried many times with different files and different orderings - the requests from curl never saw the cache created for the browser and Postman, and vice versa):
the request from the browser returns "Miss from cloudfront";
the request from the browser returns "Hit from cloudfront";
the request from Postman returns "Hit from cloudfront";
the request from curl returns "Miss from cloudfront";
the request from curl returns "Hit from cloudfront".
As the AWS docs are quite poor on this question and support just recommends reading the docs, I'm not sure about some of my conclusions. That's just what I think.

Related

CORS issue between subdomains in IE

I have 3 subdomains:
clients.mywebsite.com,
admins.mywebsite.com,
api.mywebsite.com
api.mywebsite.com is the RESTful service consumed by the other two websites. When calling the API from those websites I got cross-origin errors. I was able to fix the issue in most browsers by setting 'Access-Control-Allow-Origin: *' at the API, but in Internet Explorer the issue remained the same.
I was able to fix this manually by enabling CORS (this is turned off by default) in IE
Alt -> Tools -> Internet Options -> Security (Tab) -> Custom Level
-> Miscellaneous -> Access data sources across domains -> Set to Enable
Then, from the console of the IE debugger, I tried a GET request:
var xhttp = new XMLHttpRequest();
xhttp.open("GET", "https://api.mywebsite.com/v1/", true);
xhttp.send();
After that, all GET and POST requests started working normally and I was able to log in. I can't make the clients configure IE in such a way. What are some alternative solutions?
In IE 9 and earlier you could use the approach you describe and configure IE to deal with CORS issues. In IE 10+, your server must attach the following headers to all responses:
Access-Control-Allow-Origin: http://example.com
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: ACL, CANCELUPLOAD, CHECKIN, CHECKOUT, COPY, DELETE, GET, HEAD, LOCK, MKCALENDAR, MKCOL, MOVE, OPTIONS, POST, PROPFIND, PROPPATCH, PUT, REPORT, SEARCH, UNCHECKOUT, UNLOCK, UPDATE, VERSION-CONTROL
Access-Control-Allow-Headers: Overwrite, Destination, Content-Type, Depth, User-Agent, Translate, Range, Content-Range, Timeout, X-File-Size, X-Requested-With, If-Modified-Since, X-File-Name, Cache-Control, Location, Lock-Token, If
Access-Control-Expose-Headers: DAV, content-length, Allow
Optionally, you can also attach the Access-Control-Max-Age header, specifying the number of seconds for which the preflight response will be cached; this reduces the number of requests:
Access-Control-Max-Age: 3600
You can refer to this link about implementing CORS for a specific server.
If you don't want to configure IE as described above, you can refer to this article on bypassing CORS:
The way we can bypass it is: rather than calling the other domain's API from your browser, call an API on your own domain (for example /api) and, at the nginx (or any other web server) level, proxy it to the destination server.

S3 CORS policy for public bucket

It seems easy, but I don't know what I'm missing.
I have a public bucket with a JS script that I fetch from my web site. I noticed that I don't send an Origin header to S3; it is not required, and everything works without any CORS configuration.
What's more, even after I manually added an Origin header to that GET call and explicitly disallowed GET and my domain via:
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <CORSRule>
    <AllowedOrigin>http://www.nonexistingdomain.com</AllowedOrigin>
    <AllowedMethod>POST</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
</CORSConfiguration>
I can still get the content. What's going on here?
OK, after a conversation with Quentin, I think I understand where I was misinterpreting how CORS should work.
In the Java world, it's a very common practice to actually reject requests when the Origin doesn't match. Here is another thread where it's mentioned.
If we take Spring as an example (the de-facto standard in the Java world), here is what happens when the CORS filter is added:
String allowOrigin = checkOrigin(config, requestOrigin);
...
if (allowOrigin == null) {
    logger.debug("Reject: '" + requestOrigin + "' origin is not allowed");
    rejectRequest(response);
    return false;
}
where:
/**
 * Invoked when one of the CORS checks failed.
 */
protected void rejectRequest(ServerHttpResponse response) {
    response.setStatusCode(HttpStatus.FORBIDDEN);
}
You can find the code here.
But to my surprise, this is not such a common practice with other stacks and server-side technologies. Another common approach is to send whatever CORS configuration the server has to the browser and leave the decision to it.
S3 is even trickier: it only sends CORS response headers when the bucket's CORS rules match a CORS-enabled request (a request with an Origin header). Otherwise, there are no CORS response headers at all.
The Same Origin Policy is a feature enforced by browsers which prevents JavaScript running on one website from reading data from a different website. (This stops random websites from using JavaScript in your browser to skip past your corporate firewall and access your intranet, or to read your Gmail with your cookies.)
CORS lets a website relax the Same Origin Policy to allow other websites to read data from it that way.
CORS is not authentication/authorisation. Your public bucket is public.
You aren't using JavaScript to read data from your bucket; you are loading the JS directly from the bucket.
Let's break down the problem and try to understand the fundamentals of CORS.
What are cross-origin requests & CORS?
Cross-origin request: a request for a resource (like an image, a font, or an XHR resource) outside of the origin is known as a cross-origin request.
Cross-Origin Resource Sharing (CORS): the mechanism by which a server declares which cross-origin requests browsers should allow. CORS is helpful when you are requesting a protected resource from another origin.
Why do we need CORS when the resources can be protected by using authentication/authorization tokens?
CORS is the first line of defense. When both the client (e.g., a browser) and the server are CORS-aware, the client allows requests to the server only from the origins the server permits.
By default, browsers are expected to implement the same-origin policy security mechanism as part of the guidelines for building a browser, and almost all modern browsers do.
The same-origin policy is a browser security mechanism; you can read more about it here. It is because of this feature that the browser blocks requests when the destination origin and the source origin are different. (Servers are not even aware that this is happening. Wow!)
For simpler use cases, when the assets (JS, CSS, images, fonts) and XHR resources are accessible from the same origin, there is no need to worry about CORS.
If assets are hosted on another origin, or XHR resources are hosted on servers with a different domain than the source, browsers will block those cross-origin requests by default. Only with the appropriate CORS request and response headers are browsers allowed to make cross-origin requests.
Let's look at the request and response headers.
Request headers
Origin
Access-Control-Request-Method
Access-Control-Request-Headers
Response headers
Access-Control-Allow-Origin
Access-Control-Allow-Credentials
Access-Control-Expose-Headers
Access-Control-Max-Age
Access-Control-Allow-Methods
Access-Control-Allow-Headers
For setting up CORS, the Origin and Access-Control-Allow-Origin headers are the essential pair. Browsers automatically add the Origin header to every cross-origin request, so a developer only needs to configure the Access-Control-Allow-Origin response header.
For protecting access to resources from specific domains only, S3 provides an option to configure CORS rules. If the value of the Access-Control-Allow-Origin header is *, all cross-origin requests are allowed; otherwise, list the allowed origins in the bucket's CORS rules and S3 will echo the matching one back in the header.
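S3's origin matching can be illustrated with a toy function (not AWS code; real S3 CORS evaluation also matches the method and request headers, while this sketch checks only the origin):

```javascript
// Returns the value S3 would put in Access-Control-Allow-Origin,
// or null when no CORS headers should be sent at all.
function allowOriginFor(allowedOrigins, requestOrigin) {
  if (!requestOrigin) return null; // no Origin header: not a CORS request
  for (const allowed of allowedOrigins) {
    if (allowed === "*") return "*";
    if (allowed === requestOrigin) return requestOrigin; // echo the match back
  }
  return null; // no rule matched: respond without CORS headers
}

// allowOriginFor(["*"], "https://app.example.com") -> "*"
// allowOriginFor(["https://app.example.com"], "https://evil.example") -> null
```

The null cases correspond to the behavior discussed earlier: a request without an Origin header, or with a non-matching one, simply gets no CORS headers back.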
There are a couple of things you need to be aware of when using CORS.
It's a first level of defense for a protected resource, not the ultimate defense.
You still need to implement appropriate authentication & authorization for the resource to perform CRUD operations on the server.
Implementing the same-origin policy is a guideline for building browsers, not a mandate.
CORS headers are useful only when clients honor them, and only modern browsers do. If you are not using a browser to make the resource request, then CORS does not apply.
If you type the link into the browser's address bar, the CORS rules are not applied, because the browser does not send the Origin header with that request. The Origin header is sent by the browser only on subsequent resource requests (stylesheets, JS files, fonts) and XHR requests made by the origin.
Also, if you want to restrict GET access, use S3 pre-signed URLs on a private bucket.

How to enable Safari browser caching for URLs which are 302 redirects

I have a single-page application which depends on a JavaScript bundle to work. To fetch this bundle's CDN (CloudFront) URL, I make a call to an AWS API Gateway endpoint, which returns an HTTP 302 response with the Location header set to the CDN URL. The CDN URL responds with cache-control headers carrying a sufficiently large max-age value. Other browsers, like Chrome and Firefox, honor this and cache the CDN URL's response for further requests, but Safari doesn't (version 12). However, it does cache the response when I make the request to the CDN URL directly. Do I need to add some more headers or additional metadata to the 302 response to make it work in Safari?
I tried fiddling with the cache-control parameters, like adding 'immutable', but nothing worked. I googled quite a lot about this issue, but nothing concrete turned up.
I expected Safari to work with just the max-age parameter present in the CDN's response, but it never caches it.

AWS CloudFront Leverage browser caching not working

I am trying to set the following Origin Custom Headers:
Header Name: Cache-Control
Value: max-age=31536000
But it gives a com.amazonaws.services.cloudfront.model.InvalidArgumentException: The parameter HeaderName : Cache-Control is not allowed. (Service: AmazonCloudFront; Status Code: 400; Error Code: InvalidArgument) error.
I tried multiple ways, along with setting the Minimum TTL, Default TTL, and Maximum TTL, but no luck.
I assume you are trying to get a good rating on the GTmetrix page score by leveraging browser caching! If you are serving content from S3 through CloudFront, then you need to add the following header to objects while uploading files to S3:
Expires: {some future date}
Bonus: you do not need to specify this header for every object individually. You can upload a bunch of files together to S3, click Next, and then on the screen that asks for the S3 storage class, scroll down and add the header there. And don't forget to click Save!

Does Amazon pass custom headers to origin?

I am using CloudFront to front requests to our service hosted outside of Amazon. The service is protected, and we expect an "Authorization" header to be passed by the applications invoking it.
We have tried invoking our service through CloudFront, but it looks like the header is dropped by CloudFront. Hence the service rejects the request and the client gets a 401 response.
For some static requests, which do not need authorization, we get proper responses from CloudFront without any errors.
I have gone through the CloudFront documentation, and there is no specific information available on how headers are handled, so I was hoping they would be passed as-is, but it looks like that's not the case. Any guidance from you folks?
The list of headers CloudFront drops or modifies can be found here:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#RequestCustomRemovedHeaders
CloudFront does drop the Authorization header by default and will not pass it to the origin.
If you would like certain headers to be sent to the origin, you can set up a whitelist of headers under CloudFront -> Behavior Settings -> Forward headers. Just select the headers you would like forwarded and CloudFront will do the job for you. I have tested it this way for one of our location-based services and it works like a charm.
One thing to verify is whether the Authorization header is then included in the cache key and whether it is safe to do that. That is something you might want to watch out for as well.
It makes sense that CloudFront drops the Authorization header. Just imagine two users asking for the same object: the first one is granted access and CloudFront caches the object; the second user would then get the object straight from the cache, as it was previously cached by CloudFront.
The good news is that, using forwarded headers, you can forward the Authorization header to the origin. That means the object will be cached more than once, as the header value becomes part of the cache "key".
For example, user A GETs private/index.html with
Authorization: XXXXXXXXXXXXX
The object will be cached as private/index.html + XXXXXXXXXXXXX (this is the key under which CloudFront caches the object).
Now a request from a different user arrives at CloudFront:
GET private/index.html
Authorization: YYYYYYYYYYYY
The request is passed to the origin, as the combination private/index.html + YYYYYYYYYYYY is not in the CloudFront cache.
CloudFront will then have cached two different objects with the same name (but different key combinations).
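The per-header-value caching described above can be sketched as a toy illustration (CloudFront's real cache key format is internal; this just shows that distinct header values yield distinct keys):

```javascript
// Build a cache key from the path plus the values of the forwarded headers.
function cacheKey(path, headers, forwardedHeaders) {
  const parts = forwardedHeaders.map(
    (name) => `${name}=${headers[name.toLowerCase()] || ""}`
  );
  return [path, ...parts].join("|");
}

const a = cacheKey("/private/index.html",
  { authorization: "XXXXXXXXXXXXX" }, ["Authorization"]);
const b = cacheKey("/private/index.html",
  { authorization: "YYYYYYYYYYYY" }, ["Authorization"]);
// a !== b: the same object is cached once per Authorization value
```

This is also why forwarding a high-cardinality header like Authorization effectively fragments the cache: every distinct value gets its own copy.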
In addition to specifying them under the Origin Behaviour section, you can also add custom headers in your origin configuration. From the AWS documentation for CloudFront custom headers:
If the header names and values that you specify are not already present in the viewer request, CloudFront adds them. If a header is present, CloudFront overwrites the header value before forwarding the request to the origin.
The benefit of this is that you can then use an all/wildcard setting when whitelisting headers in the behaviour section.
It sounds like you are trying to serve up dynamic content from CloudFront (at least in the sense that the content is different for authenticated vs unauthenticated users) which is not really what it is designed to do.
CloudFront is a Content Distribution Network (CDN) for caching content at distributed edge servers so that the data is served near your clients rather than hitting your server each time.
You can configure CloudFront to cache pages for a short time if it changes regularly and there are some use cases where this is worthwhile (e.g. a high volume web site where you want to "micro cache" to reduce server load) but it doesn't sound like this is the way you are trying to use it.
In the case you describe:
The user will hit CloudFront for the page.
It won't be in the cache so CloudFront will try to pull a copy from the origin server.
The origin server will reply with a 401 so CloudFront will not cache it.
Even if this worked and headers were passed back and forth in some way, there is simply no point in using CloudFront if every page is going to hit your server anyway; you would just make the page slower because of the extra round trip to your server.