Does Amazon pass custom headers to origin? - http-headers

I am using CloudFront to front requests to our service hosted outside of amazon. The service is protected and we expect an "Authorization" header to be passed by the applications invoking our service.
We have tried invoking our service through CloudFront, but it looks like the header is getting dropped by CloudFront. Hence the service rejects the request and the client gets a 401 Unauthorized response.
For some static requests, which do not need authorization, we are not getting any errors and receive the proper response from CloudFront.
I have gone through the CloudFront documentation and there is no specific information available on how headers are handled, so I was hoping they would be passed as-is, but it looks like that's not the case. Any guidance from you folks?

The list of headers CloudFront drops or modifies can be found here:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#RequestCustomRemovedHeaders

CloudFront does drop the Authorization header by default and will not pass it to the origin.
If you would like certain headers to be sent to the origin, you can set up a whitelist of headers under CloudFront -> Behavior Settings -> Forward Headers. Just select the headers that you would like to be forwarded and CloudFront will do the job for you. I have tested it this way for one of our location-based services and it works like a charm.
One thing I still need to verify is whether the Authorization header is included in the cache key and whether it's safe to do that. That is something you might want to watch out for as well.
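If you manage the distribution with the AWS SDK rather than the console, the same whitelist lives in the distribution config. Here is a minimal boto3 sketch; the distribution ID is a placeholder, and it assumes the behavior still uses the legacy ForwardedValues settings rather than a cache policy:

import boto3

cloudfront = boto3.client("cloudfront")

# Fetch the current config plus its ETag (required for the update call).
resp = cloudfront.get_distribution_config(Id="E1EXAMPLE")  # hypothetical ID
config = resp["DistributionConfig"]

# Whitelist the Authorization header on the default cache behavior.
headers = config["DefaultCacheBehavior"]["ForwardedValues"]["Headers"]
headers["Quantity"] = 1
headers["Items"] = ["Authorization"]

cloudfront.update_distribution(
    Id="E1EXAMPLE", IfMatch=resp["ETag"], DistributionConfig=config
)

Once forwarded, the header does become part of the cache key, which is exactly what the next answer describes.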

It makes sense that CloudFront drops the Authorization header by default: just imagine two users asking for the same object. The first one would be granted access, CloudFront would cache the object, and the second user would then get the object straight from the cache because it was previously cached by CloudFront.
The good news is that using forwarded headers you can forward the Authorization header to the origin, which means the object will be cached more than once, since the header value becomes part of the cache "key".
For example, user A GETs private/index.html
Authorization: XXXXXXXXXXXXX
The object will be cached as private/index.html + XXXXXXXXXXXXX (this is the key CloudFront uses to cache the object).
Now a request from a different user arrives at CloudFront:
GET private/index.html
Authorization: YYYYYYYYYYYY
The request will be passed to the origin, because the combination of private/index.html + YYYYYYYYYYYY is not in the CloudFront cache.
CloudFront will then cache two different objects with the same name (but different key combinations).

In addition to whitelisting them under the Behaviour settings, you can also add custom headers to your origin configuration. From the AWS documentation for CloudFront custom headers:
If the header names and values that you specify are not already present in the viewer request, CloudFront adds them. If a header is present, CloudFront overwrites the header value before forwarding the request to the origin.
The benefit of this is that you can then use an All/wildcard setting for whitelisting your headers in the behaviour section.
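For completeness, here is a hedged boto3 sketch of adding such an origin custom header; the distribution ID, header name, and value are placeholders, not anything from the question:

import boto3

cloudfront = boto3.client("cloudfront")

resp = cloudfront.get_distribution_config(Id="E1EXAMPLE")  # hypothetical ID
config = resp["DistributionConfig"]

# Attach a fixed header to the first origin; CloudFront adds it to (or
# overwrites it on) every request it forwards to that origin.
config["Origins"]["Items"][0]["CustomHeaders"] = {
    "Quantity": 1,
    "Items": [{"HeaderName": "X-Origin-Secret", "HeaderValue": "some-shared-secret"}],
}

cloudfront.update_distribution(
    Id="E1EXAMPLE", IfMatch=resp["ETag"], DistributionConfig=config
)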

It sounds like you are trying to serve up dynamic content from CloudFront (at least in the sense that the content is different for authenticated vs unauthenticated users) which is not really what it is designed to do.
CloudFront is a Content Distribution Network (CDN) for caching content at distributed edge servers so that the data is served near your clients rather than hitting your server each time.
You can configure CloudFront to cache pages for a short time if it changes regularly and there are some use cases where this is worthwhile (e.g. a high volume web site where you want to "micro cache" to reduce server load) but it doesn't sound like this is the way you are trying to use it.
In the case you describe:
The user will hit CloudFront for the page.
It won't be in the cache so CloudFront will try to pull a copy from the origin server.
The origin server will reply with a 401 so CloudFront will not cache it.
Even if this worked and headers were passed back and forth in some way, there is simply no point in using CloudFront if every page is going to hit your server anyway; you would just make each page slower because of the extra round trip to your server.

Related

CloudFront Masks User Agent and Remote Address

We are running an SPA that communicates with an API. Both are exposed to the public via Cloudfront.
We now have the issue that the requests we see in the backend are masked by CloudFront. Meaning:
The Remote Address we see is the address of the AWS cloud
The User-Agent header field is set to "Amazon CloudFront" and not the browser of the user
So CloudFront somehow intercepts the request in a way we didn't anticipate.
I already went through these steps: https://aws.amazon.com/premiumsupport/knowledge-center/configure-cloudfront-to-forward-headers/ but ended up cutting the connection between the API and the frontend.
We don't care about caching implications (we don't have a lot of traffic), we just need the right fields to show up in the backend.
By default, most request headers are removed, because CloudFront's default behaviors are generally designed around optimal caching. CloudFront's default header handling behavior is documented.
If you need to see specific headers at the origin, whitelist those headers for forwarding in the distribution's cache behavior. The documentation refers to this as “Selecting the Headers on Which You Want CloudFront to Base Caching” -- and that is what it does -- but that description masks what's actually happening. CloudFront removes the rest of the headers because it has no way of knowing for certain whether a specific header with a certain value might change the response that the origin generates. If it didn't remove these headers by default, there would be confusion in the other direction when users saw the "wrong" responses served from the cache.
In your case, you almost certainly don't want to include the Host header in what you are whitelisting for forwarding.
When testing, especially, be sure you also set the Error Caching Minimum TTL to 0, because the default value is 300 seconds... so you can't see whether the problem is fixed for up to 5 minutes after you fix it. This default is also by design, a protective measure to avoid overloading your origin with requests that are likely to continue to fail.
When examining responses from CloudFront, keep an eye on the Age response header, which is present any time the response is served from cache. It tells you how long it's been (in seconds) since CloudFront obtained the response it is currently returning to you.
If you want to disable CloudFront caching, you can set Maximum, Minimum, and Default TTL all to 0 (this only affects 2xx and 3xx HTTP responses -- errors are cached for a different time window, as noted above), or your origin can consistently return Cache-Control: s-maxage=0, which will prevent CloudFront from caching the response.
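The forwarded-headers whitelist shown in the first answer above is what exposes fields like User-Agent to the origin; the TTL settings mentioned here can be changed the same way. A rough boto3 sketch, assuming a placeholder distribution ID and legacy TTL settings, with 403 standing in for whichever error code you are actually seeing:

import boto3

cloudfront = boto3.client("cloudfront")
resp = cloudfront.get_distribution_config(Id="E1EXAMPLE")  # hypothetical ID
config = resp["DistributionConfig"]

# Effectively disable caching of 2xx/3xx responses on the default behavior.
behavior = config["DefaultCacheBehavior"]
behavior["MinTTL"] = 0
behavior["DefaultTTL"] = 0
behavior["MaxTTL"] = 0

# Stop caching error responses (the default is 300 seconds) while testing;
# repeat the entry for each error code you care about.
config["CustomErrorResponses"] = {
    "Quantity": 1,
    "Items": [{"ErrorCode": 403, "ErrorCachingMinTTL": 0}],
}

cloudfront.update_distribution(
    Id="E1EXAMPLE", IfMatch=resp["ETag"], DistributionConfig=config
)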

CloudFront and Lambda@Edge - fetch from custom origin depending on user agent

I'm serving my JavaScript app (SPA) by uploading it to S3 (default root object index.html with Cache-Control:max-age=0, no-cache, pointing to fingerprinted js/css assets) and configuring it as an origin to a CloudFront distribution. My domain name, let's say SomeMusicPlatform.com has a CNAME entry in Route53 containing the distribution URL. This is working great and all is well cached.
Now I want to serve a prerendered HTML version for purposes of bots and social network crawlers. I have set up a server that responds with a pre-rendered version of the JavaScript app (SPA) at the domain prerendered.SomeMusicPlatform.com.
What I'm trying to do in the lambda function is to detect the user agent, identify bots and serve them the prerendered version from my custom server (and not the JavaScript contents from S3 as I would normally serve to regular browsers).
I thought I could achieve this by using a Lambda@Edge function ("Using an Origin-Request Trigger to Change From an Amazon S3 Origin to a Custom Origin") that switches the origin to my custom prerender server in case it identifies a crawler bot in the request headers (or, in the testing phase, via a prerendered=true query parameter).
The problem is that the Origin-Request trigger with the Lambda@Edge function is not firing, because CloudFront still has the Default Root Object index.html cached and returns the content from the edge cache. I get X-Cache: RefreshHit from CloudFront using both SomeMusicPlatform.com/?prerendered=true and SomeMusicPlatform.com, even though there is Cache-Control: max-age=0, no-cache on the Default Root Object - index.html.
How can I keep the well-cached serving and low latency of my JavaScript SPA with CloudFront and add serving content from my custom prerender server just for crawler bots?
The problem with caching (getting the same hit when using either mywebsite.com/?prerendered=true or mywebsite.com) was solved by adding prerendered to the query string whitelist in the CloudFront distribution. This means that CloudFront now correctly maintains both normal and prerendered versions of the website content, depending on the presence of the parameter (without the parameter, cached content from the S3 origin is served; with the parameter, cached content from my custom origin specified in the lambda function is served).
This was enough for the testing phase - to ensure the mechanism is working correctly. Then I followed Michael's advice and added another lambda function in the Viewer Request trigger which adds a custom header Is-Bot in case a bot is detected in User-Agent. Again, whitelisting was needed, this time for the custom header (to maintain caches for both origins depending on the custom header). The other lambda function later in the Origin Request trigger then decides which origin to use, depending on the Is-Bot header.
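For illustration, the two triggers described above could look roughly like this as Python Lambda@Edge handlers (Lambda@Edge also accepts Node.js); the prerender domain, header name, and bot patterns below are made-up examples, not the poster's actual setup:

import re

PRERENDER_DOMAIN = "prerendered.somemusicplatform.com"  # hypothetical origin
BOT_PATTERN = re.compile(r"googlebot|bingbot|facebookexternalhit|twitterbot", re.I)


def viewer_request_handler(event, context):
    # Viewer-request trigger: tag bot traffic with a custom header.
    # The header must also be whitelisted in the cache behavior so that bot
    # and non-bot responses are cached separately.
    request = event["Records"][0]["cf"]["request"]
    user_agent = request["headers"].get("user-agent", [{}])[0].get("value", "")
    is_bot = "true" if BOT_PATTERN.search(user_agent) else "false"
    request["headers"]["is-bot"] = [{"key": "Is-Bot", "value": is_bot}]
    return request


def origin_request_handler(event, context):
    # Origin-request trigger: switch to the prerender server for bots.
    request = event["Records"][0]["cf"]["request"]
    is_bot = request["headers"].get("is-bot", [{}])[0].get("value") == "true"
    if is_bot:
        request["origin"] = {
            "custom": {
                "domainName": PRERENDER_DOMAIN,
                "port": 443,
                "protocol": "https",
                "path": "",
                "sslProtocols": ["TLSv1.2"],
                "readTimeout": 30,
                "keepaliveTimeout": 5,
                "customHeaders": {},
            }
        }
        # The Host header must match the new origin.
        request["headers"]["host"] = [{"key": "Host", "value": PRERENDER_DOMAIN}]
    return request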

How to serve HLS streams from S3 in a secure way (authorized & authenticated)

Problem:
I am storing number of HLS streams in S3 with given file structure:
Video1
├── hls3
│   ├── hlsv3-master.m3u8
│   ├── media-1
│   ├── media-2
│   ├── media-3
│   ├── media-4
│   └── media-5
└── hls4
    ├── hlsv4-master.m3u8
    ├── media-1
    ├── media-2
    ├── media-3
    ├── media-4
    └── media-5
In my user API I know exactly which user has access to which video content, but I also need to ensure that video links are not shareable and are only accessible by users with the right permissions.
Solutions:
1) Use signed / temporary S3 URLs for private S3 content. Whenever the client wants to play a specific video, it sends a request to my API. If the user has the right permissions, the API generates a signed URL and returns it to the client, which passes it to the player.
The problem I see here is that the real video content is stored in dozens of segment files in the media-* directories, and I do not really see how I can protect all of them - would I need to sign each of the segment file URLs separately?
2) S3 content is private. Video stream requests made by players go through my API or a separate reverse proxy. So whenever the client decides to play a specific video, the API / reverse proxy gets the request, does authentication & authorization, and passes along the right content (master playlist files & segments).
In this case I still need to make the S3 content private and accessible only by my API / reverse proxy. What would be the recommended way here?
S3 REST authentication via tokens?
3) Use encryption with a protected key. In this case all of the video segments are encrypted and publicly available. The key is also stored in S3 but is not publicly available. Every key request made by the player is authenticated & authorized by my API / reverse proxy.
These are the 3 solutions I have in mind right now. I am not convinced by any of them. I am looking for something simple and bulletproof secure. Any recommendations / suggestions?
Used technology:
ffmpeg for video encoding to different bitrates
bento4 for video segmentation
would I need to sign each of the segment file urls separately?
If the player is requesting directly from S3, then yes. So that's probably not going to be the ideal approach.
One option is CloudFront in front of the bucket. CloudFront can be configured with an Origin Access Identity, which allows it to sign requests and send them to S3 so that it can fetch private S3 objects on behalf of an authorized user. CloudFront supports both signed URLs (using a different algorithm than S3, with two important differences that I will explain below) and signed cookies. Signed URLs and signed cookies in CloudFront work very similarly to each other, with the important difference being that a cookie can be set once, then automatically used by the browser for each subsequent request, avoiding the need to sign individual URLs. (Aha.)
For both signed URLs and signed cookies in CloudFront, you get two additional features not easily done with S3 if you use a custom policy:
The policy associated with a CloudFront signature can allow a wildcard in the path, so you could authorize access to any file in, say /media/Video1/* until the time the signature expires. S3 signed URLs do not support wildcards in any form -- an S3 URL can only be valid for a single object.
As long as the CloudFront distribution is configured for IPv4 only, you can tie a signature to a specific client IP address, allowing only access with that signature from a single IP address (CloudFront now supports IPv6 as an optional feature, but it isn't currently compatible with this option). This is a bit aggressive and probably not desirable with a mobile user base, which will switch source addresses as they switch from provider network to Wi-Fi and back.
Signed URLs must still be generated for all of the content links, but you can generate and sign a URL only once and then reuse the signature, just string-rewriting the URL for each file, which makes that option computationally less expensive... but still cumbersome. Signed cookies, on the other hand, should "just work" for any matching object.
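To make the custom-policy/wildcard idea concrete, here is roughly what the signing looks like in Python using botocore's CloudFrontSigner and the rsa package; the key pair ID, key file, domain name, and paths are placeholders:

from datetime import datetime, timedelta

import rsa
from botocore.signers import CloudFrontSigner

KEY_PAIR_ID = "APKAEXAMPLE"               # hypothetical CloudFront key pair ID
PRIVATE_KEY_FILE = "cf_private_key.pem"   # hypothetical PKCS#1 PEM key file


def rsa_signer(message):
    with open(PRIVATE_KEY_FILE, "rb") as f:
        key = rsa.PrivateKey.load_pkcs1(f.read())
    return rsa.sign(message, key, "SHA-1")  # CloudFront signatures use SHA-1/RSA


signer = CloudFrontSigner(KEY_PAIR_ID, rsa_signer)

# Custom policy with a wildcard resource: one signature is valid for every
# playlist and segment under Video1/hls3/ until it expires.
policy = signer.build_policy(
    "https://d111example.cloudfront.net/Video1/hls3/*",  # hypothetical domain
    date_less_than=datetime.utcnow() + timedelta(hours=1),
)

signed_url = signer.generate_presigned_url(
    "https://d111example.cloudfront.net/Video1/hls3/hlsv3-master.m3u8",
    policy=policy,
)
print(signed_url)

The same policy and signature are what go into signed cookies, sketched after the next paragraph.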
Of course, adding CloudFront should also improve performance through caching and Internet path shortening, since the request hops onto the managed AWS network closer to the browser than it typically will for requests direct to S3. When using CloudFront, requests from the browser are sent to whichever of 60+ global "edge locations" is assumed to be nearest the browser making the request. CloudFront can serve the same cached object to different users with different URLs or cookies, as long as the sigs or cookies are valid, of course.
To use CloudFront signed cookies, at least part of your application -- the part that sets the cookie -- needs to be "behind" the same CloudFront distribution that points to the bucket. This is done by declaring your application as an additional Origin for the distribution, and creating a Cache Behavior for a specific path pattern which, when requested, is forwarded by CloudFront to your application, which can then respond with the appropriate Set-Cookie: headers.
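As a hedged sketch of that application-side step: the cookies are just the policy, its signature, and the key pair ID, base64-encoded with CloudFront's URL/cookie-safe character substitutions. How you emit the Set-Cookie headers depends on your framework; the domain and path below are placeholders:

import base64
from datetime import datetime, timedelta

import rsa
from botocore.signers import CloudFrontSigner

KEY_PAIR_ID = "APKAEXAMPLE"               # hypothetical key pair ID
PRIVATE_KEY_FILE = "cf_private_key.pem"   # hypothetical key file


def rsa_signer(message):
    with open(PRIVATE_KEY_FILE, "rb") as f:
        return rsa.sign(message, rsa.PrivateKey.load_pkcs1(f.read()), "SHA-1")


def cloudfront_b64(data: bytes) -> str:
    # Standard base64, then CloudFront's substitutions: + -> -, = -> _, / -> ~
    return (base64.b64encode(data).decode("ascii")
            .replace("+", "-").replace("=", "_").replace("/", "~"))


signer = CloudFrontSigner(KEY_PAIR_ID, rsa_signer)
policy = signer.build_policy(
    "https://d111example.cloudfront.net/Video1/hls3/*",  # hypothetical domain
    date_less_than=datetime.utcnow() + timedelta(hours=1),
)

cookies = {
    "CloudFront-Policy": cloudfront_b64(policy.encode("utf-8")),
    "CloudFront-Signature": cloudfront_b64(rsa_signer(policy.encode("utf-8"))),
    "CloudFront-Key-Pair-Id": KEY_PAIR_ID,
}
# Emit these as Set-Cookie headers scoped to the distribution's domain/path,
# e.g. "Domain=.example.com; Path=/Video1/; Secure; HttpOnly".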
I am not affiliated with AWS, so don't mistake the following as a "pitch" -- just anticipating your next question: CloudFront + S3 is priced such that the cost difference compared to using S3 alone is usually negligible -- S3 doesn't charge you for bandwidth when objects are requested through CloudFront, and CloudFront's bandwidth charges are in some cases slightly lower than the charge for using S3 directly. While this seems counterintuitive, it makes sense that AWS would structure pricing in such a way as to encourage distribution of requests across its network rather than to focus them all against a single S3 region.
Note that no mechanism, either the one above or the one below is completely immune to unauthorized "sharing," since the authentication information is necessarily available to the browser, and thus to the user, depending on the user's expertise... but both approaches seem more than sufficient to keep honest users honest, which is all you can ever hope to do. Since signatures on signed URLs and cookies have expiration times, the duration of the share-ability is limited, and you can identify such patterns through CloudFront log analysis, and react accordingly. No matter what approach you take, don't forget the importance of staying on top of your logs.
The reverse proxy is also a good idea, probably easily implemented, and should perform quite acceptably with no additional data transport charges or throughput issues, if the EC2 machines running the proxy are in the same AWS region as the bucket, and the proxy is based on solid, efficient code like that found in Nginx or HAProxy.
You don't need to sign anything in this environment, because you can configure the bucket to allow the reverse proxy to access the private objects because it has a fixed IP address.
In the bucket policy, you do this by granting "anonymous" users the s3:getObject privilege, only if their source IPv4 address matches the IP address of one of the proxies. The proxy requests objects anonymously (no signing needed) from S3 on behalf of authorized users. This requires that you not be using an S3 VPC endpoint, but instead give the proxy an Elastic IP address or put it behind a NAT Gateway or NAT instance and have S3 trust the source IP of the NAT device. If you do use an S3 VPC endpoint, it should be possible to allow S3 to trust the request simply because it traversed the S3 VPC Endpoint, though I have not tested this. (S3 VPC Endpoints are optional; if you didn't explicitly configure one, then you don't have one, and probably don't need one).
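For reference, such a bucket policy can be applied with boto3 along these lines; the bucket name and the proxy's Elastic IP are placeholders:

import json

import boto3

BUCKET = "my-hls-bucket"          # hypothetical bucket name
PROXY_IP = "203.0.113.10/32"      # hypothetical Elastic IP of the proxy

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowProxyOnly",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {"IpAddress": {"aws:SourceIp": PROXY_IP}},
        }
    ],
}

boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))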
Your third option seems weakest, if I understand it correctly. An authorized but malicious user gets the key and can share it all day long.

How can I hide a custom origin server from the public when using AWS CloudFront?

I am not sure if this exactly qualifies for StackOverflow, but since I need to do this programmatically, and I figure lots of people on SO use CloudFront, I think it does... so here goes:
I want to hide public access to my custom origin server.
CloudFront pulls from the custom origin, however I cannot find documentation or any sort of example on preventing direct requests from users to my origin when proxied behind CloudFront unless my origin is S3... which isn't the case with a custom origin.
What technique can I use to identify/authenticate that a request is being proxied through CloudFront instead of being directly requested by the client?
The CloudFront documentation only covers this case when used with an S3 origin. The AWS forum post that lists CloudFront's IP addresses has a disclaimer that the list is not guaranteed to be current and should not be relied upon. See https://forums.aws.amazon.com/ann.jspa?annID=910
I assume that anyone using CloudFront has some sort of way to hide their custom origin from direct requests / crawlers. I would appreciate any sort of tip to get me started. Thanks.
I would suggest using something similar to facebook's robots.txt in order to prevent all crawlers from accessing all sensitive content in your website.
https://www.facebook.com/robots.txt (you may have to tweak it a bit)
After that, just point your app (e.g. Rails) to be the custom origin server.
Now rewrite all the URLs on your site to become absolute URLs like:
https://d2d3cu3tt4cei5.cloudfront.net/hello.html
Basically all URLs should point to your CloudFront distribution. Now if someone requests a file from https://d2d3cu3tt4cei5.cloudfront.net/hello.html and CloudFront does not have hello.html cached, it will fetch it from your server (over an encrypted channel like HTTPS) and then serve it to the user.
So even if the user does a view-source, they do not see your origin server... they only see your CloudFront distribution.
more details on setting this up here:
http://blog.codeship.io/2012/05/18/Assets-Sprites-CDN.html
Create a custom CNAME that only CloudFront uses. On your own servers, block any request for static assets not coming from that CNAME.
For instance, if your site is http://abc.mydomain.net then set up a CNAME for http://xyz.mydomain.net that points to the exact same place and put that new domain in CloudFront as the origin pull server. Then, on requests, you can tell if it's from CloudFront or not and do whatever you want.
Downside is that this is security through obscurity. The client never sees the requests for http://xyz.mydomain.net but that doesn't mean they won't have some way of figuring it out.
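If your origin happens to be a Python app, the server-side half of this can be a simple check of the Host header against the CloudFront-only CNAME. A minimal Flask sketch under that assumption, reusing the example hostnames from above:

from flask import Flask, abort, request

app = Flask(__name__)

CLOUDFRONT_ONLY_HOST = "xyz.mydomain.net"  # the CNAME only CloudFront knows about


@app.before_request
def require_cloudfront_cname():
    # Requests arriving under any other hostname did not come through
    # CloudFront (or someone has discovered the hidden CNAME).
    if request.host.split(":")[0] != CLOUDFRONT_ONLY_HOST:
        abort(403)


@app.route("/hello.html")
def hello():
    return "hello"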
[I know this thread is old, but I'm answering it for people like me who see it months later.]
From what I've read and seen, CloudFront does not consistently identify itself in requests. But you can get around this problem by overriding robots.txt at the CloudFront distribution.
1) Create a new S3 bucket that only contains one file: robots.txt. That will be the robots.txt for your CloudFront domain.
2) Go to your distribution settings in the AWS Console and click Create Origin. Add the bucket.
3) Go to Behaviors and click Create Behavior:
Path Pattern: robots.txt
Origin: (your new bucket)
4) Set the robots.txt behavior at a higher precedence (lower number).
5) Go to invalidations and invalidate /robots.txt.
Now abc123.cloudfront.net/robots.txt will be served from the bucket and everything else will be served from your domain. You can choose to allow/disallow crawling at either level independently.
Another domain/subdomain will also work in place of a bucket, but why go to the trouble.

HttpWebRequest cookie with empty domain

I have an ASP.NET MVC action that sends a GET request to another server via HttpWebRequest. I'd like to include all cookies in the original action's request in the new request. Some of the System.Web.HttpCookies in the original request have empty domain values (i.e. ""), which apparently doesn't cause any issues. When I create a System.Net.Cookie using the name, value, path, and domain of each of these cookies and add it to the request's CookieContainer, I get this error:
"System.ArgumentException: The parameter '{0}' cannot be an empty string. Parameter name: cookie.Domain"
Here's some code that will throw the same error (when the cookie is added):
var request = (HttpWebRequest)WebRequest.Create("http://www.whatever.com");
request.Method = "GET";
request.CookieContainer = new CookieContainer();
request.CookieContainer.Add(new Cookie("MyCookieName", "MyCookieValue", "/", ""));  // throws ArgumentException: cookie.Domain cannot be empty
EDIT
I sort of fixed this by using "localhost" for the domain, instead of the null or empty string value from the original HttpCookie. So, why does an empty domain not work for the CookieContainer? And does HttpCookie use an empty value to signify localhost, or do I need to find another fix for this problem?
Disclaimer:
As stated earlier by @feroze, setting your cookies' domain to localhost is not going to work out so well for you. I'm assuming you're writing a helper that allows you to tunnel HTTP requests out to foreign domains. Note that this is not best practice and in a lot of cases is not needed (i.e. jQuery has a lot of cool cross-domain support built-in, also see the new CORS specification). But sometimes you may be stuck doing this (i.e. the external resource is XML only, and is on a server that doesn't support CORS).
Background Information on Cookie Domains and How They Work:
If you haven't already take a look at HTTP Cookie: Domain and Path on Wikipedia -- pretty much everything you need to know is in there.
When evaluating a cookie, the Domain and Path are taken into account by both the client (the "local" requester) and the web server (the "foreign" responder). When a client requests a resource, the client should only send cookies where those cookies match the Domain (or a more generic parent domain) and Path (or a more generic parent path) of the URI being requested.
Web browsers handle this correctly. If a web browser has a cookie for the domain "localhost" and you're requesting "google.com", for example, those cookies for the "localhost" domain won't be sent in the request to "google.com". -- In fact, most modern browsers won't just not send them, they'll completely ignore them in Set-Cookie response headers that they receive (these are called third-party cookies -- enabling the acceptance of third party cookies in your web browser is a huge privacy/security concern -- don't do it!).
It works in the other direction as well -- even though it's unlikely for a client to include a third party cookie in a request, if it does, the foreign web server is supposed to ignore it (and even some cookies for correct domains/paths, so as to prevent the infamous super-cookie issue. (i.e. The web server hosting "example.com" should ignore cookies belonging to its parent domain: ".com", because ".com" is a "public suffix")).
What You Should Do [if you have to]:
The course of action I recommend for you, is when you read in your client's cookies (I'm not an MVC guy, but in regular ASP.NET this would be in Request.Cookies), loop through them (make sure to filter out your own site's legitimate cookies, especially SessionId, etc -- or use Path properly so they never get sent to this page in the first place), then add them one at a time to the outgoing request's cookie collection, rewriting the domain to be "www.whatever.com" (per your example -- if you're doing this dynamically, load the URL into a new Uri() object and use the .Host property), and then set the Path to "/". -- This will build the "Cookie" header for the outgoing request to the foreign web server.
When that request returns to your server, you then need to check its incoming response for new cookies -- those cookies can be repackaged and sent back down to your client in much the same type of loop as I illustrated in the previous paragraph, except you'll want to rewrite the Domain to be Request.Url.Host -- and you'll want to set the Path back to "/" unless the path to your passthru page is static (I'm guessing it isn't, since you're using MVC), in which case you'd want to set it to Request.Url.AbsolutePath, for instance.
Happy Coding!
EDIT:
Also, you'll want to set the X-Forwarded-For header of the outgoing request, so that the website you're calling doesn't think your web server is one single client that's been spamming the crap out of them.
Not sure it solves your problem, but to add cookies without the "Domain" property you must add them directly to the headers using HttpRequestHeader.Cookie, as follows.
request.Headers.Add(HttpRequestHeader.Cookie, "Your cookies...");
Hope it helps!
Some background
This occurs because CookieContainer is a client-side container designed to be reused across multiple HttpWebRequests. Reusing it provides the expected cookie behavior: cookies set by the remote host are sent back with every subsequent HttpWebRequest targeted at the same host.
As a result of the reuse, a CookieContainer might actually contain cookies from multiple requests and/or hosts.
So, in order to determine which of the cookies in the container need to be sent with a particular HttpWebRequest to some host (domain), CookieContainer examines the Domain and Path properties.
That's why a Cookie in a CookieContainer needs to have a valid Domain.
Conversely, on the server side cookies are delivered via a different type, CookieCollection, which is a simple list of cookies with no extra logic.
Specifically, in your case, while copying cookies from the CookieCollection to the CookieContainer you need to set the Domain property of every cookie to the domain you are going to forward the request to, so that HttpWebRequest will know to include the cookies when sending the request.
You are trying to get cookies sent to localhost, right?
Why don't you do something like this where you give your own machine a real name:
Edit your hosts file and add a line "127.0.0.1 myname.com"
Test using myname.com - which is actually your localhost.
Your browser or app will not know the difference and send cookies to myname.com if that is where the cookie belongs.
Detailed info:
The Hosts file on windows is located at C:\Windows\System32\drivers\etc\hosts