Expires cache for same script across multiple pages - http-headers

I have a script that is used across multiple pages on my site. I want to set the Expires header so that browsers cache it and it doesn't get downloaded every time. Setting the header is fine and I understand how to do that, but I don't quite know how the browser's cache works.
Does the browser cache the script by its path, and is it then smart enough to use the cached copy for any page that requests it? Or is there an association between the script and the page, so it would have to be cached separately against each page?

In the browser cache, there is no connection between the URL and the requesting page. Browser cache keys contain the path and sometimes the query string (see Is it the filename or the whole URL used as a key in browser caches?).
That's why Google recommends using their Libraries API: if every page that requires a specific version of jQuery pointed the browser to fetch the library from Google, the browser would fetch it only once for www.xyz.com and then reuse it from its cache for www.abc.com.
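To make that concrete, here is a minimal sketch (Node.js with TypeScript; the script path, file name and port are made up for illustration) of a server attaching a long-lived caching header to a shared script. Because the cache entry is keyed on the URL /scripts/shared.js, every page on the site that references that same URL reuses the single cached copy until it expires.

// Minimal sketch: serve one shared script with a long-lived cache header.
import * as http from "node:http";
import { readFileSync } from "node:fs";

const server = http.createServer((req, res) => {
  if (req.url === "/scripts/shared.js") {
    res.writeHead(200, {
      "Content-Type": "application/javascript",
      // One year; an Expires header would achieve a similar effect,
      // but Cache-Control takes precedence when both are present.
      "Cache-Control": "public, max-age=31536000",
    });
    res.end(readFileSync("./scripts/shared.js"));
    return;
  }
  res.writeHead(404);
  res.end();
});

server.listen(8080);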

PWA Caching Issue

I have a PWA which has been developed in ASP.NET Core and is hosted on an Azure App Service (Linux).
When a new version of the PWA was released, I found that devices failed to update without clearing the browser cache.
To resolve this, I discovered a tag helper called asp-append-version that will bust the cache for a specific file. I also discovered that I can append a version to the src attribute that specifies the URL of a file, to trigger the browser to retrieve the latest file. For example, src="/scripts/pwa.js?v=1". Each time I update the pwa.js file I would also change the version, i.e. v=2.
I've now discovered that my PWA is caching other JavaScript files in my application, which results in the app not working on devices that have updated to the new version but failed to clear the cache for specific files.
I believed that if I didn't specify any cache control headers such as Cache-Control, the browser would not cache any files; however, this appears not to be the case.
To resolve this issue, is the recommended approach to add the appropriate cache control headers (Cache-Control, Pragma, and Expires) to prevent browser caching, or should I only add the asp-append-version tag helper to, for example, script tags to automatically bust the cache for those specific files?
I would prefer the browser to store, for example, images rather than going to the server each time to retrieve them. I believe setting the header Cache-Control: no-cache would work, as this checks whether the file has changed before retrieving the updated version?
Thanks.
Thanks @SteveSandersonMS for your insights. If your web server returns correct HTTP cache control headers, browsers will know not to re-use cached resources.
Refer to link 1 and link 2 for cache control headers on a Linux App Service.
For example, if you use the "ASP.NET Core hosted" version of the Blazor WebAssembly template, the server will return Cache-Control: no-cache headers which means the browser will always check with the server whether updated content is present (and this uses etags, so the server will return 304 meaning "keep using your cached content" if nothing has changed since the browser last updated its content).
If you use a different web server or service, you need to configure the web server to return correct caching headers. Blazor WebAssembly can't control or even influence that.
Refer here
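As a rough, framework-agnostic sketch of the no-cache plus ETag behaviour described above (Node.js with TypeScript rather than ASP.NET Core; the file path and port are placeholders): the browser revalidates on every use, but only re-downloads the file when its hash has changed; otherwise it gets a 304 and keeps its cached copy.

// Sketch: Cache-Control: no-cache with ETag revalidation.
import * as http from "node:http";
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

const server = http.createServer((req, res) => {
  const body = readFileSync("./wwwroot/scripts/pwa.js"); // hypothetical file
  const etag = `"${createHash("sha256").update(body).digest("hex")}"`;

  if (req.headers["if-none-match"] === etag) {
    // Unchanged since the browser last fetched it: send no body.
    res.writeHead(304, { ETag: etag });
    res.end();
    return;
  }

  res.writeHead(200, {
    "Content-Type": "application/javascript",
    "Cache-Control": "no-cache", // cache it, but revalidate before every use
    ETag: etag,
  });
  res.end(body);
});

server.listen(8080);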

Is it possible to have GitHub Readme images follow redirects?

I'm trying to add a test coverage badge to the Readme of a private repository on GitHub. Our continuous integration process saves out the image to a secured Google Cloud Storage bucket that's not accessible to the public, and should remain that way.
Google's authorization layer is smart enough that if I go to the URL for the image, I'm automatically redirected to the resource with a valid auto-generated signed URL.
E.g., if I go to http://storage.cloud.google.com/secret-files/mysecretfile.png, then if I'm logged in and allowed to view it, I'm automatically redirected to something like https://blahblah-apidata.googleusercontent.com/download/storage/v1/b/secret-files/o/mysecretfile.png?key=verylongkey, where I can load the image.
This seemed perfect. Reference the canonical path in the GitHub Readme, authenticated users see the image, unauthenticated users are still blocked, we don't have to make the file public, and we don't have to do anything complicated.
Except that GitHub is proxying the image request, meaning that it will always be unauthenticated. My browser is loading something like https://camo.githubusercontent.com/mysecretimage.png.
Is there a clever way to work around this? Or do I need to go back to the drawing board?
All images on github.com are proxied using the Camo image proxy. There are a few reasons for this:
It preserves the privacy of users. It isn't possible for a document to track users by directing them to a different site or by using cookies.
It means images can be cached and served at an appropriate size.
GitHub can have a very strict content security policy that does not allow loading from untrusted sites, which means that any sort of accidental security problem (like an XSS) is a lot less likely to work.
Note the last part. Even if you found some sneaky way to get another image URL to render properly in the website, your browser wouldn't load it because it violates the Content-Security-Policy header the site sent, and moreover, your browser would tattle about that to the reporting URL that GitHub provided.
So any image URL you provide will need to be readable by GitHub's image proxy and it won't be possible to serve different content to different users.
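For illustration only, here is roughly what such a policy looks like on the wire (Node.js with TypeScript; the directive below is a generic example, not GitHub's actual policy): the server tells the browser which hosts images may be loaded from, and anything else is refused and reported.

// Sketch: a page served with a restrictive img-src Content-Security-Policy.
import * as http from "node:http";

const server = http.createServer((req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/html",
    // Only same-origin images and the proxy host are allowed; violations are
    // reported to the /csp-report endpoint (hypothetical path).
    "Content-Security-Policy":
      "img-src 'self' https://camo.githubusercontent.com; report-uri /csp-report",
  });
  // This image is outside img-src, so the browser refuses to load it.
  res.end('<img src="https://example.com/outside.png">');
});

server.listen(8080);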

Google AppEngine API Explorer redirects and lists no URLs

I'm having an unending issue trying to use the AppEngine API Explorer with the stupidly simple helloworld example.
When trying to navigate to the URL to explore the API, my Chrome browser redirects from the default HTTP to HTTPS and no APIs are listed. I have gone through every possible fix I can find (like this, and all of these) and none of them work reliably.
What's most infuriating is that I have gotten the API listed TWICE, but now it no longer displays with any of the methods below.
The setup I had when it worked the first time:
Chrome launched with "C:\Program Files (x86)\Google\Chrome\Application\Chrome.exe" --unsafely-treat-insecure-origin-as-secure=http://localhost:8080 (As per the tutorial)
The url being: (http://)apis-explorer.appspot.com/apis-explorer/?base=http://localhost:8080/_ah/api&root=http://localhost:8080/_ah/api#p/
The second time it worked, I was also using the above URL, but it lasted only a second before being redirected to HTTPS and not listing anything.
Some specifics:
Windows 10 OS.
Every time the page loads I get the "The API you are exploring is hosted over HTTP, which can cause problems. Learn how to use Explorer with a local HTTP API." message, even the times the API displayed correctly.
Every time I now load any of the API Explorer URLs I get redirected to HTTPS, and nothing is listed. Also the URL is escaped (%3A instead of ':'). Not sure if it's important but the first time it worked the URL was HTTP and NOT escaped.
I have tried the shield in the address bar and enabling Load unsafe scripts (from here).
Tried launching Chrome as usual and with the flags --unsafely-treat-insecure-origin-as-secure=http://localhost:8080 and/or --allow-running-insecure-content (from this answer).
Tried http://localhost:8080/_ah/api/explorer
Tried http://apis-explorer.appspot.com/apis-explorer/?base=http://localhost:8080/_ah/api#p/
http://localhost:8080/_ah/admin works correctly and shows the Admin console every time.
Since the APIs were listed that once, I haven't touched the project code, but I have restarted the server and Chrome and tried different URLs on more occasions than I care to count.
I also tried accessing the API URL directly as explained in this answer, but I cannot find the correct URL to access the helloworld /sayHi endpoint. Maybe someone can help me work out what I need to prefix it with, as all of the variations I try give me a 404.
Any help would be very, very appreciated.

Who knows which files should be included in a website?

When the browser requests a website, any website, from an HTTP server, which of the two parses the site's content in order to know which other files need to be included in the webpage?
What I mean is this:
the browser asks for the HTML file, then observes that it needs to import some external CSS files, and it is the one who requests them;
OR
the HTTP server, when faced with a request for a website, parses (or already knows) which files need to be linked to a certain webpage and sends them along with the HTML page?
I'm guessing the first case is the correct one, but if someone can confirm and maybe clarify it, I'd appreciate it.
It's all done by the client (which is usually a browser). When it sees <script>, <iframe>, <img>, <link>, etc. tags that reference other documents, it downloads them if necessary.
According to Wikipedia:
The primary function of a web server is to cater web page to the request of clients using the Hypertext Transfer Protocol (HTTP). This means delivery of HTML documents and any additional content that may be included by a document, such as images, style sheets and scripts.
and
The primary purpose of a web browser is to bring information resources to the user ("retrieval" or "fetching"), allowing them to view the information ("display", "rendering"), and then access other information ("navigation", "following links").
It is the browser that parses the HTML and requests the associated content.
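A toy sketch of that client-side behaviour (TypeScript on Node 18+, using a made-up URL and a deliberately crude regex in place of a real HTML parser): the server only returns the document it was asked for, and it is the client that scans the markup and issues the follow-up requests.

async function loadPage(): Promise<void> {
  const base = "https://example.com/index.html"; // hypothetical page

  // Step 1: the only thing the server sends back is the HTML itself.
  const html = await (await fetch(base)).text();

  // Step 2: crude stand-in for the browser's HTML parser, collecting src/href references.
  const refs = [...html.matchAll(/(?:src|href)="([^"]+)"/g)].map((m) => m[1]);

  // Step 3: the client resolves each reference against the page URL and fetches it,
  // just as a browser would for scripts, stylesheets, images, iframes, ...
  for (const ref of refs) {
    await fetch(new URL(ref, base).toString());
  }
}

loadPage();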

Xenu Link checker

I want to use an application that checks for broken links. I learned that Xenu is one such tool. I do not have access to the internal aspx/http files on a drive. The problem I am facing is that the website requires the user to be authenticated. After login, I need to crawl the site to determine which links are broken.
As an example, I kick off with mail.google.com. We end up typing the username and password, after which we are served different URLs. If I give Xenu (or a similar program) a link such as mail.google.com, it will not be able to fetch URLs inside mail.google.com, which will be of the type /mail/u/0/?shva=1#inbox/ etc. There lies the problem.
With minimal scripting, how can I give Xenu (or another similar app) the capability to log in by providing an external URL (mail.google.com in this example), in order to do whatever Xenu has to do?
Thanks
Balaji S
Xenu can be used with an authenticated user as long as the cookies are persistent. You will need to enable cookies in Xenu and log in once yourself using IE.
From their FAQ:
By default, cookies are disabled, and Xenu rejects all cookies. If you need cookies because you have used Internet Explorer to authenticate yourself before starting a run, or to prevent the server from delivering URLs with a session ID, then you can enable the cookies in the advanced options dialog. (This has been available since Version 1.2g)
Warning: You should not use this option if you have links that delete data, e.g. a database or a shop - you are risking data loss!!!
You can enable cookies in the Options menu. Click Preferences and switch to the Advanced tab.
For single-page applications (like Gmail) you will also need to configure Xenu to parse JavaScript.
This is done by modifying the ini file (traditionally at C:\Program Files (x86)\Xenu135\Xenu.ini) and adding a line under [Options]:
Javascript=[Jj]ava[Ss]cript: *[_a-zA-Z0-9]+ *\( *['"]((/|ftp://|https?://)[^'"]+)['"]
There are several variations provided in their FAQ, but I didn't get them to work perfectly.
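If you want to sanity-check the pattern outside Xenu before editing the ini file, something like the following works (TypeScript on Node; the sample link is made up): the capture group should pull the absolute URL out of a javascript: pseudo-link.

// The same pattern as above, written as a JavaScript/TypeScript regex literal.
const xenuPattern =
  /[Jj]ava[Ss]cript: *[_a-zA-Z0-9]+ *\( *['"]((\/|ftp:\/\/|https?:\/\/)[^'"]+)['"]/;

// Hypothetical link of the kind a single-page app might emit.
const sampleLink = `<a href="javascript:openMail('https://mail.google.com/mail/u/0/')">Inbox</a>`;

const match = sampleLink.match(xenuPattern);
console.log(match?.[1]); // https://mail.google.com/mail/u/0/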