I'm building classic stateless RESTful APIs on Symfony2: users/apps get an authentication token from the authentication API and pass it to all the other APIs in order to be identified, post data, and access protected/private/personal data.
I now have three concerns regarding this workflow and caching:
How can I use the HTTP cache for my 'static' APIs (those that always deliver the same content, regardless of the logged-in user and their token)? Different users pass different tokens in the URL for the same API, so the URL is never the same. How can a shared HTTP cache work then?
Some APIs produce different output for the same URL depending on the logged-in user's rights (I basically have 4 different rights levels). Is that a good pattern? Wouldn't it be better to have 4 different URLs, one for each rights level, that I could cache? If not, how do I implement a proper cache for that?
Does a shared HTTP cache work over HTTPS? If not, which type of caching should I implement, and how?
Thanks for your answers and insights.
I have had a similar issue (with all 3 scenarios) and have used the following strategy successfully with Symfony's built-in reverse-proxy cache:
If you are using Apache, update .htaccess to add an environment variable your application can base the HTTP cache decision on (note: mod_rewrite automatically prefixes REDIRECT_ to environment variables it sets):
# Add `REDIRECT_CACHE` if API subdomain
RewriteCond %{HTTP_HOST} ^api\.
RewriteRule .* - [E=CACHE:1]
# Add `REDIRECT_CACHE` if API subfolder
RewriteRule ^api(.*)$ - [E=CACHE:1]
Add this to app.php after instantiating AppKernel:
// If environment instructs us to use cache, enable it
if (getenv('CACHE') || getenv('REDIRECT_CACHE')) {
require_once __DIR__.'/../app/AppCache.php';
$kernel = new AppCache($kernel);
}
For your "static" APIs, all you have to do is take your response object and modify it:
$response->setPublic();
$response->setSharedMaxAge(6 * 60 * 60);
Because you have a session, user or security token, Symfony effectively defaults to $response->setPrivate().
Regarding your second point: per REST conventions (as well as reverse-proxy recommendations), GET & HEAD responses aren't meant to change between requests for the same URL. Because of this, if content changes based on the logged-in user, you should set the response to private & prevent the reverse-proxy cache from caching it at all.
If caching is required for speed, it should be handled internally & not by the reverse-proxy.
Because we didn't want to introduce URLs based on each user role, we simply cached the response by role internally (using Redis) & returned it directly rather than letting the cache (mis)handle it.
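As an illustration only (not the exact code we used), here is a minimal sketch of that internal per-role caching, assuming a Predis client is available and a hypothetical buildReportForRole() produces the payload for a given rights level:

use Predis\Client;
use Symfony\Component\HttpFoundation\Response;

function getReportResponse(Client $redis, $roleLevel)
{
    $cacheKey = 'api:report:role:'.$roleLevel;      // one cache entry per rights level
    $payload  = $redis->get($cacheKey);

    if ($payload === null) {
        $payload = json_encode(buildReportForRole($roleLevel)); // hypothetical builder
        $redis->setex($cacheKey, 300, $payload);                // keep it for 5 minutes
    }

    $response = new Response($payload, 200, array('Content-Type' => 'application/json'));
    $response->setPrivate();                        // never let the shared cache store it

    return $response;
}

The key point is that the cache key varies only by rights level (4 entries at most), not by user, while the HTTP response itself stays private.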
As for your third point: because HTTP & HTTPS traffic hit the same cache, and the responses have their public/private & cache-control settings set explicitly, the AppCache serves the same response to both secure & insecure traffic.
I hope this helps as much as it has for me!
Let's say I have my website SiteA.com running on an Apache web server. I have defined the following in my httpd.conf file:
Header set Access-Control-Allow-Origin "CustomBank.com"
Questions:
Does this mean only CustomBank.com can access my site (SiteA.com) directly? Or does it mean only my site (SiteA.com) can access the CustomBank.com domain directly? I am confused about whether this setting applies to inbound or outbound requests.
In reality I don't have any CORS requirement for my site, so I didn't implement the setting mentioned above; instead, the header below shows up in my responses.
Access-Control-Allow-Origin: *
The penetration testing team said this setting is overly permissive. Do I just need to remove it? If not, what should I do?
It means JavaScript loaded from CustomBank.com can make requests to your site (the site whose configuration sets the header) via XMLHttpRequest in the background.
Since XMLHttpRequest will send the user's existing session cookie for your site, malicious scripts could do all kinds of nefarious/misleading things on behalf of your user. That's why * is not normally a suitable fix.
The restrictions also apply to other, more esoteric script-like invocations that you can read about in the specs.
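If nothing legitimately needs cross-origin access, the simplest fix is to stop sending the header at all; a minimal Apache sketch (mod_headers/mod_setenvif, with the origin name taken from your example and the https:// scheme assumed) would be:

# Drop the wildcard header if another layer is adding it
Header unset Access-Control-Allow-Origin

# Or, if one known origin genuinely needs cross-origin access,
# echo back only that exact origin instead of "*"
SetEnvIf Origin "^(https://custombank\.com)$" CORS_ALLOWED_ORIGIN=$1
Header set Access-Control-Allow-Origin "%{CORS_ALLOWED_ORIGIN}e" env=CORS_ALLOWED_ORIGIN
Header merge Vary "Origin"

Note that an Access-Control-Allow-Origin value has to be a full origin (scheme plus host), which is why a bare "CustomBank.com" would not match what browsers send.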
I have a website (userbob.com) that normally serves all pages over https. However, I am trying to have one subdirectory (userbob.com/tools/) always serve content over http. Currently, it seems like Chrome's HSTS feature (which I don't fully understand) is forcing my site's pages to load over https. I can go to chrome://net-internals/#hsts and delete my domain from Chrome's HSTS set, and the next request will work as I want without redirecting to the https version. However, if I try to load the page a second time, it ends up redirecting again. The only way I can get it to work is if I go to chrome://net-internals/#hsts and delete my domain from Chrome's HSTS set after each request. How do I let browsers know that I want all pages under userbob.com/tools/ to load over http? My site uses an Apache/Tomcat web server.
(Just FYI, the reason I want the pages in the tools directory to be served over http instead of https is that some of them are meant to iframe http pages. If I try to iframe an http page from an https page I end up getting mixed-content errors.)
HTTP Strict Transport Security (or HSTS) is a setting your site can send to browsers which says "I only want to use HTTPS on my site - if someone tries to go to an HTTP link, automatically upgrade it to HTTPS before you send the request". Once set, the browser basically won't send any HTTP traffic to your site, whether a plain-HTTP link was followed accidentally or intentionally.
This is a security feature. HTTP traffic can be intercepted, read, altered and redirected to other domains. HTTPS-only websites should redirect HTTP traffic to HTTPS, but there are various security issues/attacks if any requests are still initially sent over HTTP so HSTS prevents this.
The way HSTS works is that your website sends an HTTP header, Strict-Transport-Security, with a value of, for example, max-age=31536000; includeSubDomains on its HTTPS responses. The browser caches this and activates HSTS for 31536000 seconds (1 year), in this example. You can see this HTTP header in your browser's web developer tools or by using a site like https://securityheaders.io . By using the chrome://net-internals/#hsts page you are able to clear that cache and allow HTTP traffic again. However, as soon as you visit the site over HTTPS it will send the header again and the browser will revert back to HTTPS-only.
So to permanently remove this setting you need to stop sending that Strict-Transport-Security header. Find it in your Apache/Tomcat configuration and turn it off. Or, better yet, change it to max-age=0; includeSubDomains for a while first (which tells the browser to expire its cached HSTS policy immediately and so turns it off without having to visit chrome://net-internals/#hsts, as long as the browser visits the site over HTTPS to pick up this header), and then remove the header completely later.
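For example, if the header is being set by mod_headers in your Apache config (it could equally be set by Tomcat or by the application itself, so treat this as a sketch), the transition could look like this:

# Phase 1: keep sending the header, but with max-age=0 so returning
# HTTPS visitors immediately expire their cached HSTS policy
Header always set Strict-Transport-Security "max-age=0; includeSubDomains"

# Phase 2 (later): stop sending it entirely, or explicitly unset it
# Header always unset Strict-Transport-Security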
Once you turn off HSTS you can revert back to having some pages on HTTPS and some on HTTP with standard redirects.
However, it would be remiss of me not to warn you against going back to HTTP. HTTPS is the new standard and there is a general push to encourage all sites to move to HTTPS and penalise those that do not. Read this post for more information:
https://www.troyhunt.com/life-is-about-to-get-harder-for-websites-without-https/
While you are correct that you cannot frame HTTP content on a HTTPS page, you should consider if there is another way to address this problem. A single HTTP page on your site can cause security problems like leaking cookies (if they are not set up correctly). Plus frames are horrible and shouldn't be used anymore :-)
You can use rewrite rules to redirect HTTPS requests to HTTP inside that subdirectory. Create an .htaccess file inside the tools directory and add the following content:
RewriteEngine On
RewriteCond %{HTTPS} on
RewriteRule (.*) http://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
Make sure that Apache's mod_rewrite module is enabled.
Basically, a 301 response to an HTTPS request that redirects to an HTTP target should never be honored by any browser; servers doing that are clearly violating basic security, or are severely compromised.
However, a 301 reply to an HTTPS request can still redirect to another HTTPS target (including on another domain, provided that the other CORS requirements are met).
If you navigate an HTTPS link (or a JavaScript event handler does) and the HTTPS target replies with a 301 redirect to HTTP, the browser should behave as if it were a 500 server error or a connection failure (DNS name not resolved, server timeout).
Such server-side redirects are clearly invalid, and website admins should never do that! If they want to close a service and inform HTTPS users that it is hosted elsewhere and no longer secure, they MUST return a valid HTTPS response page with NO redirect at all, and it should really be a 4xx error page (most probably 404 PAGE NOT FOUND). They should not redirect to another HTTPS service (e.g. a third-party hosted search engine or parking page) that does not respect CORS requirements or sends false media types (it is acceptable not to honor the requested language and to display that page in another language).
Browsers that implement HSTS are perfectly correct and going in the right direction. But I really think the CORS specifications are a mess, tweaked just to let advertising networks keep hosting and controlling the ads they broadcast to other websites.
I strongly think that serious websites that still want to display ads (or any tracker for audience measurement) for valid reasons can host those ads/trackers themselves, on their own domain and over the same protocol: servers can fetch the ad content they want to broadcast by downloading/refreshing it themselves and maintaining their own local cache. They can track their audience themselves by collecting the data they need and filtering it on their own server if they want it analysed by a third party: websites will have to seriously implement the privacy requirements themselves.
I now hate the far too many websites that, when visited, track you via dozens of third parties, including very intrusive ones like Facebook and most advertising networks, plus many very weak third-party services with very poor quality/security that send very bad content they never control (including fake ads, fake news, promotion of illegal activities and businesses, invalid age ratings...).
Let's return to the origins of the web: one site, one domain, one party. This does not mean sites cannot link to third-party sites, but such links must only be followed on an explicit user action (tapping or clicking), and visitors MUST be able to know where the link will go, or which content will be displayed.
This is even possible for embedding videos (e.g. YouTube) in news articles: the news website can itself host a cache of static images for the frame and an icon for the "play" button; when users click that icon, it activates the third-party video, and only then does the third party interact directly with the user and collect other data. The unactivated content is tracked only by the origin website, under its own published policy.
In my local development environment I use an Apache server. What worked for me was:
Open your config file in sites-available/yoursite.conf, then add the following line inside your VirtualHost:
Header always set Strict-Transport-Security "max-age=0"
Then restart your server.
I want to use an Apache proxy server (mod_proxy) to intercept all requests and responses to a web server. However, I want to change the requests and responses before forwarding them. Simply rewriting URLs is easy and documented, but the changes I want to make are more sophisticated: they need to inspect the request for user credentials and conditionally issue redirects.
Is this possible in Apache's mod_rewrite, possibly in combination with other modules?
While the main goal is to implement this in Apache, I would also be happy with an alternative solution which doesn't necessarily use Apache.
Here is a more precise explanation of what I want to achieve, to give a little more context:
Check each incoming request for user credentials. If credentials are present, replace them with user information that the web server can use to identify the user (ideally in the Authorization header).
For example, let's assume a request contains a cookie which authenticates it as being sent by the user "John"; this cookie is removed, and the Authorization header is changed to Authorization: Authenticated_by_proxy {"id":12345,"name":"John"}
Check each response to see if it is a 403 error. If so and the user is not logged in, redirect the user to a login page instead of forwarding the error.
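To make this concrete, here is a rough sketch of the static parts of what I'm after (hostnames and the login URL are placeholders); the dynamic credential-to-identity translation is exactly the piece I don't know how to do with plain directives:

# Strip the user's cookie before the request is proxied on
# (the per-user Authorization value would need custom code, e.g. an
# external auth program or module, since static directives can't compute it)
RequestHeader unset Cookie

# Forward everything to the backend web server
ProxyPass        / http://backend.internal/
ProxyPassReverse / http://backend.internal/

# Replace backend 403 responses: a full URL here makes Apache send the
# client a redirect to the login page instead of passing the 403 through
ProxyErrorOverride On
ErrorDocument 403 https://www.example.com/login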
We have an application running on WebLogic 10.3, with authentication provided by the application itself. We want to put the WebLogic server behind an Apache server. The idea is that we will have some public content on the Apache server, and the application will be accessed through the reverse proxy. That's pretty standard. The issue is that there is some content on the Apache server that can only be accessed if the user has logged in to the application. So basically the Apache server will serve three types of content, on different URIs:
/ -> Will contain the public information, and will be served by Apache
/myApp -> Will be proxied by Apache to the WebLogic server behind it
/private -> Will contain the private static content. This should only be accessible if the user has previously logged in to myApp successfully.
My question (I'm a total newbie with Apache) is whether this is possible. My idea is that the application can set a cookie on its responses indicating that the user has logged in, and that Apache will check for that cookie when the user tries to access /private.
Any thoughts?
The public information under / is no problem; it's straightforward. Using ProxyPass or ProxyPassMatch to reverse proxy "/myApp" to your internal WebLogic server is also straightforward. You may need a couple of other options to make sure the proxy hostname and cookie domains are set up correctly (see the sketch below). But setting up static protected information in "/private" is going to be a little trickier.
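As an illustration of those basics (hostnames, port and domains are placeholders, not your actual values), the /myApp part might look something like this:

# Reverse proxy /myApp to the internal WebLogic server
ProxyPass        /myApp http://weblogic.internal:7001/myApp
ProxyPassReverse /myApp http://weblogic.internal:7001/myApp

# Keep the original Host header and rewrite the cookie domain so the
# session cookie set by WebLogic is valid for the public hostname
ProxyPreserveHost On
ProxyPassReverseCookieDomain weblogic.internal www.example.com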
1) You can check for the existence of the cookie set by myApp using mod_rewrite, something like this:
RewriteCond %{HTTP_COOKIE} !the_name_of_the_auth_cookie
RewriteRule ^private - [F,L]
The problem with checking a cookie through something like this is that there's no way to verify that the cookie is actually a valid session. People can arbitrarily create a cookie with that name and be able to access the data in /private.
2) You could set it up so that whenever something in "/private" is accessed, the request is rewritten to a PHP script (or similar) that can check the cookie to ensure it's a valid session cookie, and then serve the requested page. Something like:
RewriteRule ^private/(.*)$ /cookie_check.php?file=$1 [L]
So when someone accesses, for example, "/private/reports.pdf", it gets internally redirected to "/cookie_check.php?file=reports.pdf", and it's up to this PHP script to access whatever it needs to in order to validate the cookie that /myApp has set up. If the cookie is a valid session, read the "reports.pdf" file and send it to the browser; otherwise return FORBIDDEN.
I think this is the preferable way of handling this.
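A minimal sketch of what such a cookie_check.php could look like (the cookie name, the directory, and the validation logic are placeholders; how the session is actually verified depends on what myApp exposes):

<?php
// cookie_check.php - serve a file from /private only if the myApp
// session cookie validates; otherwise return 403 FORBIDDEN.
$privateDir = __DIR__ . '/private';                 // adjust to the real location
$file       = isset($_GET['file']) ? basename($_GET['file']) : ''; // basename() blocks ../ traversal
$path       = $privateDir . '/' . $file;

$sessionId  = isset($_COOKIE['MYAPP_SESSION']) ? $_COOKIE['MYAPP_SESSION'] : null; // placeholder cookie name

// Placeholder: replace with a real lookup wherever myApp stores its
// sessions (a database table, an internal validation URL on WebLogic, etc.)
function isValidSession($sessionId)
{
    return false; // stub: nothing is accepted until a real check is wired in
}

if (!isValidSession($sessionId) || $file === '' || !is_file($path)) {
    header('HTTP/1.1 403 Forbidden');
    exit('Forbidden');
}

header('Content-Type: application/octet-stream');
header('Content-Disposition: inline; filename="' . $file . '"');
readfile($path);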
3) If you can't run PHP or any other scripts, or the cookie cannot be verified (for example with a database lookup of a session_id or something similar), then you'll have to proxy from within WebLogic. This is more or less the same basic idea as accessing "/private" through "cookie_check.php", except it's an app on the WebLogic server. Just like /myApp, you'll need to set up a reverse proxy to access it; this app will then get the request (which has been internally rewritten from "/private/some_file"), check the cookie's validity, read the "some_file" file ON THE APACHE SERVER, and then either send it to the browser or send FORBIDDEN. This is the general idea:
ProxyPass /CheckCookie http://internal_server/check_cookie_app
RewriteCond %{REMOTE_HOST} !internal_server
RewriteRule ^private/(.*)$ /CheckCookie?file=$1 [L]
This condition reroutes all requests for "/private" that didn't originate from "internal_server" through the /CheckCookie app; since the app is running on "internal_server", its own requests bypass the rewrite and it can fetch the files in "/private" just fine. This is kind of a roundabout way of doing it, but if the validity of session cookies issued by /myApp can only be checked on the WebLogic server, you'll have to reroute requests back and forth like this or do something similar.
One of YSlow's measurables is to use cookie-free domains to serve static files.
"When the browser requests a static
image and sends cookies with the
request, the server ignores the
cookies. These cookies are unnecessary
network traffic. To workaround this
problem, make sure that static
components are requested with
cookie-free requests by creating a
subdomain and hosting them there." --
Yahoo YSlow
I interpret this to mean that I could experience performance gains if I move www.example.com/images to static.example.com/images.
Although this is easy to do, I would lose the handy ability within my content management system (Joomla/WordPress) to easily reference and link to these images.
Is it possible to use .htaccess to redirect all requests for a particular folder on www.example.com to a folder on static.example.com instead? Would this method also fool the CMS into thinking the images were located in the default locations on its own domain?
Is it possible to use .htaccess to redirect all requests for a particular folder on www.example.com to a folder on static.example.com instead?
Possible, but counterproductive: the client would have to make an HTTP request, get the redirect response, then make another HTTP request.
This costs a lot more than the single line of cookie data saved!
Would this method also fool the CMS into thinking the images were located in the default locations on its own domain?
No.
Although this is easy to do, I would lose the handy ability within my content management system (Joomla/WordPress) to easily reference and link to these images.
What you could try to do is create a plugin in Joomla that dynamically creates these references.
For example, you could have a plugin so that when you enter {dynamic_path path} in an article, it prepends 'static.example.com/images' to the path provided. Then, every time you need to change the server path, you only change it in the plugin. For the links that are already in the database, you can try using phpMyAdmin to convert them to this structure.
You still lose the WYSIWYG ability in TinyMCE, but it is an alternative.
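The core of such a plugin is just a string substitution on the article HTML before it is rendered; a hypothetical, framework-agnostic sketch (the {dynamic_path} tag and the static hostname are made up for illustration):

<?php
// Replace {dynamic_path some/image.png} placeholders with the full
// static-domain URL; hook this into the CMS's content-filter plugin API.
function rewriteDynamicPaths($html)
{
    return preg_replace(
        '/\{dynamic_path\s+([^\s}]+)\}/',
        'http://static.example.com/images/$1',
        $html
    );
}

// Example: '<img src="{dynamic_path logo.png}">'
// becomes  '<img src="http://static.example.com/images/logo.png">'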
In theory you could create a virtual domain that points directly to the images folder, such as images.example.com. Then in your CMS (hopefully at the theme layer) you could replace any paths that point to the images folder with an absolute path to the subdomain.
The redirects would cause far more network traffic, and far more latency, than simply leaving things as they are.
It would redirect the request, but the client would still be sending its cookies to the server, so really you'd accomplish nothing. For it to work, you would have to access the files directly from a domain that isn't storing cookies.
What you really want to do is use staticexample.com/images instead of static.example.com/images so that you don't pick up any cookies that may have been set on the example.com domain. If all you do is serve images from that domain with a simple Apache server or the like, you can configure that server not to return even a session cookie.
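A minimal sketch of such a cookie-free static host (hostname and paths are placeholders; assumes mod_headers and mod_expires are enabled):

# Bare static host: no application layer, so nothing sets a session
# cookie, and any stray cookie headers are stripped for good measure
<VirtualHost *:80>
    ServerName staticexample.com
    DocumentRoot /var/www/static

    # Never emit or accept cookies on this host
    Header unset Set-Cookie
    RequestHeader unset Cookie

    # Long cache lifetime for static assets
    ExpiresActive On
    ExpiresDefault "access plus 1 month"
</VirtualHost>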
The redirects are a very bad idea. Cookies cause some performance hit, but the extra round trips to the server that a redirect causes are a much more serious performance issue.
I did the following and it worked:
<FilesMatch "^(?!.*\.(gif|jpe?g|png)$)">
php_value session.cookie_domain example.com
</FilesMatch>
What this means is that the session cookie settings are only applied to non-image requests, so the images are served cookie-free by the server.