Dynamic Image Caching - optimization

I have a CherryPy app that dynamically generates images, and those images are re-used a lot but regenerated on every request. Each image is generated from a query string containing the variables, so the same query string will always return the same image (until I rewrite the generation code), and the images are not user-specific.
It occurred to me that I should be aggressively caching those images, but I have no idea how I would go about doing that. Should I be memoizing in CherryPy? Should I do something like this for Apache? Another layer entirely?

Your first attempt should be to use the HTTP cache that comes with CherryPy. See http://docs.cherrypy.org/dev/refman/lib/caching.html for an overview. From there, it's usually a "simple" matter of balancing computational expense versus RAM consumption via cache timeouts.
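For example, assuming a CherryPy handler that renders the image purely from its query-string parameters, enabling the built-in caching tool could look roughly like the sketch below (render_image and the one-hour delay are illustrative, not part of your app):

import cherrypy

class ImageGenerator:
    @cherrypy.expose
    def index(self, **params):
        # Expensive generation, driven entirely by the query string.
        png_bytes = render_image(params)  # hypothetical generator function
        cherrypy.response.headers['Content-Type'] = 'image/png'
        return png_bytes

config = {
    '/': {
        'tools.caching.on': True,      # cache responses in RAM, keyed by URL + query string
        'tools.caching.delay': 3600,   # seconds before a cached entry expires
        'tools.expires.on': True,      # also let browsers/proxies cache the response
        'tools.expires.secs': 3600,
    }
}

if __name__ == '__main__':
    cherrypy.quickstart(ImageGenerator(), '/', config)

Because the cache key includes the query string, identical requests are served from RAM until the delay expires, which is exactly the expense-versus-RAM trade-off mentioned above.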

Related

Can caching an API based on the hash of the whole URL be a potential threat?

I am adding caching to an API server. The caching system of the framework I am using simply hashes the whole URL of the request and uses it as the cache key (along with some other data, such as the language).
But with this simple system I can add fake query parameters to the URL with arbitrary values, and my system will also cache those requests. For example:
GET https://example.com/apicall # Cached
GET https://example.com/apicall?fake=1 # Also cached
GET https://example.com/apicall?fake=2 # Also cached!
Is this something I should worry about, i.e. that people can very easily fill my cache with junk entries that are never used? Or am I exaggerating the potential impact of this?
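To make the behaviour concrete, here is a minimal sketch of a cache key built by hashing the whole request URL, as described above, plus one possible mitigation (the whitelist-based normalized_cache_key helper is hypothetical, not something the framework provides):

import hashlib
from urllib.parse import urlsplit, parse_qsl, urlencode

def cache_key(url):
    # Cache key as described above: a hash of the whole request URL.
    return hashlib.sha256(url.encode()).hexdigest()

# Every made-up query parameter value yields a distinct key, hence a new cache entry:
print(cache_key("https://example.com/apicall"))
print(cache_key("https://example.com/apicall?fake=1"))
print(cache_key("https://example.com/apicall?fake=2"))

def normalized_cache_key(url, allowed_params):
    # Hypothetical mitigation: drop query parameters the endpoint does not understand
    # before hashing, so junk parameters cannot create extra cache entries.
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in allowed_params)
    return hashlib.sha256((parts.path + "?" + urlencode(kept)).encode()).hexdigest()

# All three URLs above now map to the same cache entry:
for u in ("https://example.com/apicall",
          "https://example.com/apicall?fake=1",
          "https://example.com/apicall?fake=2"):
    print(normalized_cache_key(u, {"lang"}))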

Does ImageResizer support delivering fallback image when image corrupted?

We use ImageResizer images in a number of places, for example images embedded into PDF reports, and sometimes users will upload corrupted files or password-protected PDFs. These currently throw an exception and don't return any image.
Does ImageResizer have some way of returning an alternative fallback image instead? (Ideally with equivalent scaling, and returned immediately, without any 301/302 redirect.) In a perfect world we could specify rules for mapping exception types to an appropriate fallback image.
We have no control client-side to handle this (e.g. for embedded images in Telerik/Microsoft reporting).
We currently only do this for 404s; masking 500 errors could be even more problematic.
You could copy/modify this plugin, though: https://imageresizing.net/docs/v4/plugins/image404

ImageResizer DiskCache+AzureReader strange behaviour

I'm using ImageResizer with the DiskCache plugin and I'm having trouble getting the cache to work properly. Either the images are cached forever (regardless of how many times I upload a new image), or, after changing some settings, I get the old image in some browsers/computers and the new one in others.
This is what I have now in my web.config:
<add name="AzureReader2" connectionString="blahblahblah" endpoint="http://blahblahblah.blob.core.windows.net/" prefix="~/" redirectToBlobIfUnmodified="false" requireImageExtension="false" checkForModifiedFiles="true" cacheMetadata="true"/>
and:
<diskcache dir="~/imagecache" autoclean="true" hashModifiedDate="true" subfolders="8192" asyncWrites="true" asyncBufferSize="10485760" cacheAccessTimeout="15000" logging="true" />
I'm not sure if this is something I can achieve using the existing parameters. My goal is to invalidate the cache, preferably as soon as the new image has been uploaded, without having to change the query string used to serve the image.
I was thinking:
Maybe having a blob storage trigger that, when a replacement image has been uploaded, fires a webhook that deletes the cache for that image?
Or making a web request to my ImageResizer app to preload the new image into the cache so it replaces the old cached one?
I've seen some posts about using IVirtualFileWithModifiedDate, but from what I understand that would have a big performance impact. Probably only 5% of our image requests involve someone uploading a new image and expecting to see it right away, since most of the images barely change, but it's really frustrating when the new image still isn't shown even a day after it was uploaded!
Could I use IVirtualFileWithModifiedDate to invalidate the cache only when the image has changed, rather than on every image request? Would that be possible?
As for "I get the old image in some browsers/computers and the new one in others": the fact that different browsers display different versions indicates that either browser caching or proxy/CDN caching is at fault.
ImageResizer's DiskCache hashes the modified date, so it is always as correct as the storage provider.
Regarding your expectations around server-side invalidation:
You're using checkForModifiedFiles="true" cacheMetadata="true", which means that Azure is queried for the latest modified date, but that metadata is cached with a sliding expiration window of 1 hour. That is, if a URL hasn't been accessed in the last hour, the next request will cause the modified date to be checked again. See StandardMetadataCache.
You can change this behavior by implementing IMetadataCache yourself and assigning that cache to the .MetadataCache member of the storage provider you're using.

WebKitGTK about webkit_web_view_load_uri

I have a question about WebKitGTK.
I am writing a program that analyses web pages for suspicious content.
When the "load-changed" signal is emitted with WEBKIT_LOAD_FINISHED (or "load-failed" fires),
the program goes on to analyse the next page by calling webkit_web_view_load_uri again and again.
(http://webkitgtk.org/reference/webkit2gtk/stable/WebKitWebView.html#webkit-web-view-load-uri)
The question I want to ask is about a memory problem.
The more webpages the program analyses, the bigger WebKitWebProcess grows.
The return value of webkit_back_forward_list_get_length() also keeps increasing as pages are analysed. Where should I free memory?
Do you know how I can solve this problem, or could you give me any advice on where I can get help?
Thank you very much :-) Have a nice day ^^
In theory, what you're doing is perfectly fine, and you shouldn't need to change your code at all. In practice, WebKit has a lot of memory leaks, and programmatically loading many new URIs in the same web view is eventually going to be problematic, as you've found.
My recommendation is to periodically, every so many page loads, create a new web view that uses a separate web process, and destroy the original web view. (That will also reset the back/forward list to stop it from growing, though I suspect the memory lost to the back/forward list is probably not significant compared to memory leaks when rendering the page.) I filed Bug 151203 - [GTK] Start a new web process when calling webkit_web_view_load functions? to consider having this happen automatically; your issue indicates we may need to bump the priority on that. In the meantime, you'll have to do it manually (a rough sketch follows the steps below):
Before doing anything else in your application, set the process model to WEBKIT_PROCESS_MODEL_MULTIPLE_SECONDARY_PROCESSES using webkit_web_context_set_process_model(). (If you are not creating your own web contexts, you'll need to use the default web context webkit_web_context_get_default().)
Periodically destroy your web view with gtk_widget_destroy(), then create a new one using webkit_web_view_new() et al. and attach it somewhere in your widget hierarchy. (Be sure NOT to use webkit_web_view_new_with_related_view(), as that's how you get two web views to use the same web process.)
If you have trouble getting that solution to work, an extreme alternative would be to periodically send SIGTERM to your web process to get a new one. Connect to WebKitWebView::web-process-crashed, and call webkit_web_view_load_uri() from there. That will result in the same web view using a new web process.
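A rough sketch of steps 1 and 2, shown here with the WebKit2GTK introspection bindings (Python gi module) purely for brevity; the corresponding C calls are the ones named above, and the URI list and rotation interval are illustrative:

import gi
gi.require_version('Gtk', '3.0')
gi.require_version('WebKit2', '4.0')
from gi.repository import GLib, Gtk, WebKit2

URIS = ["https://example.com/page1", "https://example.com/page2"]  # pages to analyse
PAGES_PER_PROCESS = 50   # recreate the web view (and its web process) this often

# Step 1: before anything else, use one web process per web view.
WebKit2.WebContext.get_default().set_process_model(
    WebKit2.ProcessModel.MULTIPLE_SECONDARY_PROCESSES)

window = Gtk.Window()
state = {"view": None, "loaded": 0, "next": 0}

def on_load_changed(view, load_event):
    if load_event == WebKit2.LoadEvent.FINISHED:
        # ... analyse the finished page here ...
        state["loaded"] += 1
        GLib.idle_add(load_next)   # rotate/load outside the signal handler

def new_view():
    # Step 2: destroy the old view and create a fresh one (new web process).
    if state["view"] is not None:
        window.remove(state["view"])
        state["view"].destroy()    # gtk_widget_destroy()
    view = WebKit2.WebView()       # NOT webkit_web_view_new_with_related_view()
    view.connect("load-changed", on_load_changed)
    window.add(view)
    view.show()
    state["view"] = view

def load_next():
    if state["next"] >= len(URIS):
        Gtk.main_quit()
        return False
    if state["loaded"] and state["loaded"] % PAGES_PER_PROCESS == 0:
        new_view()
    state["view"].load_uri(URIS[state["next"]])
    state["next"] += 1
    return False   # one-shot idle callback

new_view()
window.show_all()
load_next()
Gtk.main()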

datatables: If I decide to use lazy loading, would search show incorrect data?

I have a large amount of text information which I'll be loading in a table.
I want the user to search through it, using datatables search.
I wasn't doing lazy loading earlier, but I'm thinking of using it now.
However, if I lazy load and the user searches for data, he/she won't be able to see everything, since the data isn't completely loaded.
Am I guessing this correctly, or does DataTables work around this somehow?
There are two processing modes: client-side and server-side; see Processing modes for more information.
Client-side processing - the full data set is loaded up-front and searching/filtering/pagination is done in the browser.
Server-side processing - an Ajax request is made for every table redraw, with only the data required for each display returned. Searching/filtering/pagination is performed on the server.
There is also the Scroller extension, a virtual rendering plug-in for DataTables that allows large datasets to be drawn on screen very quickly.
Combined with server-side processing, it can be used to lazily load a large amount of data. When searching, a request is made to the server to search the whole dataset, and only the subset needed for the current display is returned (see the sketch below).
See Server-side processing for more information on request and response in server-side processing mode.
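For illustration, a minimal server-side-processing endpoint could look like the sketch below; Flask and the in-memory ROWS list are assumptions here, but the draw/start/length/search[value] request parameters and the draw/recordsTotal/recordsFiltered/data response fields are the ones DataTables uses in server-side mode:

from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the large dataset; in practice this would be a database query.
ROWS = [{"id": i, "text": "row %d" % i} for i in range(100000)]

@app.route("/data")
def data():
    draw = request.args.get("draw", type=int, default=1)
    start = request.args.get("start", type=int, default=0)
    length = request.args.get("length", type=int, default=10)
    search = request.args.get("search[value]", default="")

    # Search the whole dataset on the server, then return only one page of it.
    filtered = [r for r in ROWS if search.lower() in r["text"].lower()] if search else ROWS
    page = filtered[start:start + length]

    return jsonify({
        "draw": draw,                      # echoed back so DataTables can match the request
        "recordsTotal": len(ROWS),         # size of the full dataset
        "recordsFiltered": len(filtered),  # size after filtering
        "data": page,                      # only the rows needed for the current display
    })

# Client side, server-side processing is enabled with something like:
#   $('#example').DataTable({ serverSide: true, ajax: '/data', columns: [...] });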