Why use a service worker for caching when browser cache handles the caching?

I read that using a service worker for offline caching is similar to browser caching. If so, why would you prefer a service worker for this? Browser caching checks whether a file has been modified and then serves it from the cache, and with a service worker we handle the same thing from our own code. The browser already has that feature by default, so why prefer a service worker?

Service workers give you complete control over network requests. You can return anything you want from the fetch event handler; it does not need to be the past or current contents of that particular file.
However, if the HTTP cache handles your needs, you are under no obligation to use Service Workers.
They are also used for things such as push notifications.
Documentation: https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API, https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API/Using_Service_Workers

I'd like to share the points I observed while going through the service worker documentation and implementing it.
The browser cache is different: a service worker supports an offline cache, so the web app can access cached content even when the network is unavailable.
A service worker can give the app a native-like experience.
A service worker cannot modify the DOM directly, but it can still serve the pages within its scope, and with events like postMessage the page can be reached and its DOM changed.
A service worker does not require user interaction or an open webpage; it runs in the background.

Actually, it is slower to respond to a request with a service worker than with the HTTP cache.
Because a service worker stores content through the Cache API, it is genuinely slower than the browser cache (the in-memory and on-disk caches).
It is not designed to be faster than the HTTP cache. However, with a service worker you can fully customize the response, and I think that full customizability is the reason to use one.
If your situation is not complicated enough, you should not use it.

Related

How to throttle requests to sites instead of to proxy server in scrapy?

I am using a proxy and have set AUTOTHROTTLE_ENABLED to True. I was under the impression that Scrapy would throttle the sites I am crawling; instead, it seems Scrapy throttles requests to the proxy itself. How do I throttle requests to the sites instead of the proxy?
Update: I am manually setting proxy in meta while making each request, instead of using the proxy middleware.
I don't think this is possible to do solely from the spider side. By looking at the throttling algorithm and at the AutoThrottle extension source code, you can see that the delay being used is the time difference between sending a request and getting back a response. Everything that happens in between is added up to this delay (including the proxy delay).
To further verify this, consider the steps:
AutoThrottle uses latency information from the response, found in response.meta['download_latency'] (see here).
The latency information ('download_latency') is set in a dedicated callback once the download is completed, by subtracting the start time from the current time (see here).
The start time is set just before the download agent is instructed to download the request, which means everything that happens in between is added to the final latency (see here).
If you want to actually throttle according to target latency through a proxy, this will have to be handled by the proxy itself. I suggest using some of the managed proxy pool solutions like Crawlera.
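The practical consequence can be seen in the update rule AutoThrottle applies — a simplified sketch of the algorithm described in Scrapy's documentation (the function name and defaults below are mine, not Scrapy's API):

```python
def autothrottle_next_delay(prev_delay, latency, target_concurrency=1.0,
                            min_delay=0.0, max_delay=60.0):
    """Simplified sketch of Scrapy's AutoThrottle update rule.

    `latency` is response.meta['download_latency'], measured from the moment
    the request is handed to the downloader until the response arrives, so
    any proxy round trip is included in it.
    """
    target_delay = latency / target_concurrency
    # New delay is the average of the previous delay and the target delay,
    # clamped between the configured minimum and maximum.
    new_delay = (prev_delay + target_delay) / 2.0
    return max(min_delay, min(new_delay, max_delay))

# A slow proxy inflates the observed latency, which raises the delay,
# even if the target site itself responded quickly:
print(autothrottle_next_delay(prev_delay=1.0, latency=4.0))  # 2.5
```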

How to use Twisted Web static.File resource without cache?

I am using the Twisted Web static.File resource for the static part of the web server.
For development I would like to be able to add new files or modify the current static files without having to restart the Twisted web server.
I am looking at the source code for static.File at getChild method and I can not see how the resources are cached:
http://twistedmatrix.com/trac/browser/tags/releases/twisted-11.0.0/twisted/web/static.py#L280
From my understanding getChild method is returning a new resource at each call.
Any help in creating a non-cached static.File resource is much appreciated.
Many thanks,
Adi
twisted.web.static.File serves straight from the filesystem. It does not have a cache. Your web browser probably has a cache, though.

Stop an in-progress query-string-authorized download on Amazon S3?

With Amazon S3, can I stop a query-string-authorized download that is in progress?
Are there other file download services that provide such a feature?
I'm not aware of a built in way to do this. If I understand your goal, you want to potentially stop an HTTP response mid-stream based on some custom rules you have. Is that right?
If so, perhaps you could write a very thin proxy to S3 that encapsulates this logic. If you ran the proxy on EC2 you wouldn't incur any additional bandwidth fees.
The downside is that you would have to manage scaling the proxy (i.e., add more EC2 nodes based on traffic), so depending on your scaling requirements this could require a bit of work. But the proxy script itself would probably be fairly trivial. Something like:
make a streaming HTTP request to S3 for the object
for each x-byte chunk in the response from S3:
    check the auth condition; continue if valid, break if not
    send the chunk to the caller
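A runnable sketch of that loop in Python. The auth check and the chunk source are stand-ins; a real proxy would iterate over a streaming HTTP response from S3 (say, 64 KB at a time) instead of a list:

```python
def stream_with_auth(chunks, still_authorized):
    """Relay chunks from an upstream response, aborting mid-stream
    the moment the (hypothetical) authorization check fails."""
    for chunk in chunks:
        if not still_authorized():
            break  # stop sending; the client sees a truncated body
        yield chunk

# Demo with a fake upstream: authorization is revoked after two chunks.
upstream = [b"aa", b"bb", b"cc", b"dd"]
state = {"calls": 0}

def auth():
    state["calls"] += 1
    return state["calls"] <= 2

print(list(stream_with_auth(iter(upstream), auth)))  # [b'aa', b'bb']
```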
I'm not aware of anyone that allows this. In general, the authentication is only checked once, when you begin downloading, but not thereafter.
Can you describe what you're trying to do more broadly?

How do I cache WCF REST web service in IIS7?

When I turn on output caching for my service it doesn't appear to be cache-worthy in IIS. It really should be since I'm returning the same JSON content over and over. The varyByQueryString option seems like it would do the trick, but since my resources are URI based, there really isn't a query string, just a path to a resource. Has anyone successfully gotten IIS to output cache a WCF REST service?
After much digging using the FREB logs in IIS, my service is in fact cache-worthy. You can listen to the cache events in IIS and they will show you exactly what is and is not caching. I found this more helpful than using PerfMon. I used the following link to set it up. Output caching will work and will in fact serve your content right out of memory after things get warmed up.
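For reference, an IIS7 output-caching profile in web.config looks roughly like this (the .svc extension and the 30-second duration are illustrative; adjust them to your service):

```xml
<system.webServer>
  <caching enabled="true" enableKernelCache="true">
    <profiles>
      <!-- Cache responses to .svc requests for 30 seconds -->
      <add extension=".svc" policy="CacheForTimePeriod" duration="00:00:30" />
    </profiles>
  </caching>
</system.webServer>
```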

Are there alternatives to CGI (and do I really need one)?

I am designing an application that is going to consist of 3-4 services that run as separate processes and are linked by a suitable IPC. The system is going to have a web interface and I want to use whatever webserver is there.
The web interface should be accessed under some URL that allows me to have other URLs on the same webserver doing totally different things. I'm planning to use the path below that URL to specify what the web interface should do. It has facilities for use by other applications over the net, and for humans to interact with in a browser.
Off the cuff, I'd work as follows:
make the webserver fire up a CGI process for every request it receives (like SetHandler in Apache)
let the CGI connect to the IPC
let it get whatever it needs from the backend services
let the CGI return HTML / XML and whatever HTTP Status based on the services' answers
Now, what I really want is to avoid the first two steps, or if I can't, avoid the second one, because I'm afraid I'm wasting performance on unnecessary overhead (the requests coming from other applications might be frequent).
PHP, for example, can open persistent connections to a MySQL database that survive the script's runtime and don't need to be recreated next time, though I don't know how they actually do it. Also, as I understand it, the Apache modules are loaded once when the server starts, so that might remove the first step but would tie me to Apache.
So, what are good ways to hook a handler for specific URLs into different webservers? I don't want to handle the HTTP myself, otherwise I might just use a proxy setup to a second server, but that seems like reinventing the wheel. If you think CGI is fine and have examples of it handling large numbers of requests of a similar structure, please let me know.
OK, I overlooked this previously. Explaining my question here led me to it:
Instead of creating a new process for every request, FastCGI can use a single persistent process which handles many requests over its lifetime. -- Wikipedia: FastCGI
Even under moderate loads, CGI is a pretty unscalable beast. FastCGI is an option, but you'll probably also find a mod_XXXX package where XXXX is the name of your language. There's a mod for ruby, perl, and python for instance and probably a fair few others.
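The contrast is easy to see in Python, where WSGI plays the role of that persistent-process interface (hosted by mod_wsgi, FastCGI bridges, or standalone servers). A minimal sketch using the stdlib's wsgiref; the module-level BACKEND dict stands in for a DB or IPC connection that survives across requests:

```python
from wsgiref.simple_server import make_server

# Created once at startup and reused for every request -- unlike plain CGI,
# where each request pays for process startup and reconnecting to backends.
BACKEND = {"greeting": b"hello from a persistent backend connection"}

def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [BACKEND["greeting"]]

# To serve standalone during development:
# make_server("", 8000, app).serve_forever()
```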