Does ImageResizer support delivering a fallback image when an image is corrupted?

We use ImageResizer images in a number of places, for example images embedded into PDF reports, and sometimes users will upload corrupted files or password-protected PDFs. These currently throw an exception and don't return any image.
Does ImageResizer have some way of returning an alternative fallback image instead? (Ideally with equivalent scaling, and returned immediately, without any 301/302 redirect.) In a perfect world we could specify rules for mapping exception types to an appropriate fallback image.
We have no control client-side to handle this (e.g. in Telerik/Microsoft Reporting embedded images).

We currently only do this for 404s. Masking 500 errors could be even more problematic.
You could copy/modify this plugin, though: https://imageresizing.net/docs/v4/plugins/image404
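If you build the report images in code rather than through the HTTP pipeline, one low-tech variant of that copy/modify approach is to wrap the build call yourself. Below is a minimal sketch assuming the ImageJob/Instructions API of ImageResizer v3.1+/v4; SafeImaging, fallbackPath, and the exception mapping are placeholder names of our own, not part of the library:

using System;
using ImageResizer;

public static class SafeImaging
{
    // Re-runs the same resize instructions against a known-good fallback image
    // if the original source fails to decode (corrupt file, password-protected PDF, ...).
    public static void BuildWithFallback(object source, object dest, string query, string fallbackPath)
    {
        var instructions = new Instructions(query); // e.g. "width=400&format=jpg"
        try
        {
            ImageBuilder.Current.Build(new ImageJob(source, dest, instructions));
        }
        catch (Exception) // map specific exception types to specific fallbacks here if needed
        {
            // Identical scaling, because the fallback reuses the same instructions.
            ImageBuilder.Current.Build(new ImageJob(fallbackPath, dest, instructions));
        }
    }
}

Because the fallback is rendered inline, no 301/302 redirect is involved, which matters for the embedded-report scenario.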

Related

ImageResizer DiskCache+AzureReader strange behaviour

I'm using ImageResizer + the DiskCache plugin and I'm having trouble getting the cache to work properly. Either the images are cached forever (regardless of how many times I upload a new image), or, after changing some settings, I get the old image in some browsers/computers and the new one in others.
This is what I have now in my web.config:
<add name="AzureReader2" connectionString="blahblahblah" endpoint="http://blahblahblah.blob.core.windows.net/" prefix="~/" redirectToBlobIfUnmodified="false" requireImageExtension="false" checkForModifiedFiles="true" cacheMetadata="true"/>
and:
<diskcache dir="~/imagecache" autoclean="true" hashModifiedDate="true" subfolders="8192" asyncWrites="true" asyncBufferSize="10485760" cacheAccessTimeout="15000" logging="true" />
I'm not sure if this is something I can achieve using the existing parameters. My goal is to invalidate the cache, preferably when the new image has been uploaded, without having to change the query string serving the image to get the new one.
I was thinking:
Maybe having a blob storage trigger that, when a replacement image has been uploaded, fires a webhook that deletes the cache for that image?
Or a web request to my ImageResizer app to preload the new image into the cache so it replaces the old cached image?
I've seen some posts about using IVirtualFileWithModifiedDate, but from what I understand that would have a big performance impact. Probably only 5% of our image requests involve someone uploading an image and expecting to see it right away, since most of the images barely change, but it's really frustrating if the new image still isn't showing even a day after it was uploaded!
Could I use IVirtualFileWithModifiedDate to invalidate the cache when the image has changed, rather than on every image request? Would that be possible?
I get the old image in some browsers/computers and the new one in others.
That different browsers are displaying different versions indicates that either browser caching or proxy/CDN caching is at fault.
ImageResizer's DiskCache hashes the modified date, so it is always as correct as the storage provider.
Regarding your expectations around server-side invalidation:
You're using checkForModifiedFiles="true" cacheMetadata="true", which means that Azure is queried for the latest modified date, but that metadata is cached with a sliding expiration window of one hour. I.e., if a URL hasn't been accessed in the last hour, the next request will cause the modified date to be checked. See StandardMetadataCache.
You can change this behavior by implementing IMetadataCache yourself and assigning that cache to the .MetadataCache member of the storage provider you're using.
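For illustration, here's a rough C# shape for such a cache with a shorter expiration. This is a sketch under the assumption that IMetadataCache exposes Get/Put-style members (as used by StandardMetadataCache); verify the exact interface against the ImageResizer source before relying on it:

using System;
using System.Collections.Concurrent;

// Assumed shape of ImageResizer.Plugins.IMetadataCache - check the source.
public class ShortLivedMetadataCache
{
    // Re-check Azure every 5 minutes instead of the default 1-hour sliding window.
    private readonly TimeSpan ttl = TimeSpan.FromMinutes(5);
    private readonly ConcurrentDictionary<string, Tuple<DateTime, object>> entries =
        new ConcurrentDictionary<string, Tuple<DateTime, object>>();

    public object Get(string key)
    {
        Tuple<DateTime, object> entry;
        if (entries.TryGetValue(key, out entry) && DateTime.UtcNow - entry.Item1 < ttl)
            return entry.Item2;
        return null; // missing or expired: the provider re-queries Azure and calls Put()
    }

    public void Put(string key, object data)
    {
        entries[key] = Tuple.Create(DateTime.UtcNow, data);
    }
}

Assigning an instance to the provider's .MetadataCache member at startup trades a few extra Azure metadata requests for much faster invalidation.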

Override forcedownload behavior in Sitecore

We had a problem with some of our IE clients failing to download a PDF, even after clicking on the link. We found the answer here resolved our problems: set forcedownload=true for PDF mime types in web.config.
However, that created another problem: we are now unable to render a PDF in a browser when we want to. We used to do this with an iframe. However, as you can see, the PDF just downloads, and does not render in the browser.
I learned that the forcedownload=true setting is actually a default in a subsequent version of Sitecore (v7.2). So, I'm hesitant to revert that.
So, how do I render a PDF in a browser in this situation?
You can leave forceDownload=false on the PDF mime type and instead set the following setting to false:
<setting name="Media.EnableRangeRetrievalRequest" value="false"/>
I faced the same dilemma a few months back, with the same initial fix. I found out the actual issue last week and wrote a blog post about it. (In fact, I wrote the answer you linked to; I've updated it with the same information now for future visitors.)
The issue is basically a combination of the Adobe Reader plugin for IE9, chunked transfer encoding, and streaming the file directly from the database. I found that if you close your browser and try again, or force refresh with Ctrl+F5, it worked fine. Once Sitecore had cached the file to disk, it would continue to work for everyone.
The above setting disables chunked transfer encoding, instead sending the file down to the browser as a single piece. The setting was introduced in Sitecore 6.5+.
This is one of the flaws in the MediaRequestHandler and, in my opinion, the forceDownload option is pretty useless the way it is designed by default. (Why would you ever want to configure this option per media extension only?)
You'll basically have to turn off the forceDownload option again and replace the MediaRequestHandler with your own. I usually end up writing my own anyway, because of other issues with the default handler, such as dealing properly with CDNs etc.
In the ProcessRequest pipeline, you can determine whether the item should be "downloaded" or not by setting the Content-Disposition header. You basically need to get rid of the default handling of forceDownload and set your headers based on your own logic.
Personally I prefer to set a query string parameter, such as ?dl=1, and base the Content-Disposition header on that (see the sketch below). You could also extend the media item template to contain a default behavior for each item or subtree (leveraging Sitecore inheritance and standard values), and you could thereby potentially also define (override) a specific filename for the attachment part of the Content-Disposition header on each item.
When rendering the link, you can use the properties collection (write a suitable extension method or similar) so that you can clearly mark in your code that the link is meant for download, while still using the built-in field render methods. That way you eliminate the risk of messing up the Page Editor etc.
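To make that concrete, here's a sketch of such a handler. It assumes the virtual DoProcessRequest(HttpContext, MediaRequest, Media) override point that MediaRequestHandler exposes in the Sitecore 6.x/7.x line, and the ?dl=1 convention is our own invention, so verify the signature against your Sitecore assembly:

using System.Web;
using Sitecore.Resources.Media;

public class DownloadAwareMediaRequestHandler : MediaRequestHandler
{
    protected override bool DoProcessRequest(HttpContext context, MediaRequest request, Media media)
    {
        // Our own convention: ?dl=1 means "download"; anything else renders inline.
        if (context != null && context.Request.QueryString["dl"] == "1")
        {
            // The filename could instead come from a field on the media item
            // (standard values / inheritance, as described above).
            var fileName = media.MediaData.MediaItem.InnerItem.Name + "." + media.MediaData.Extension;
            context.Response.AddHeader("Content-Disposition", "attachment; filename=\"" + fileName + "\"");
        }
        return base.DoProcessRequest(context, request, media);
    }
}

You then swap this class in for the default media handler mapping in web.config so media URLs route through it.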
/ Mikael
You have to disable range retrieval requests in web.config by setting the following setting to false:
<setting name="Media.EnableRangeRetrievalRequest" value="false" />
MediaRequestHandler enables Sitecore to serve PDF content partially, in ranges, using the HTTP 206 status code. You can also override MediaRequestHandler and write your own custom implementation to handle media requests.

Asynchronous Pluggable Protocol for CID: (email), how to handle duplicate URLs

This is somewhat a duplicate of this question, but that question has no (valid) answer and is 1.5 years old, so I'm asking my own in the hope that people have more info now.
If you are using multiple instances of a WebBrowser control, MSHTML, IHTMLDocument, or whatever... from inside the APP instance, mostly IInternetProtocol::Start, is there a way to know which instance is loading the resource? Or is there a way to use a different APP for each instance of the control, maybe by providing one via IDocHostUIHandler or ICustomDoc or otherwise? I'm currently using IInternetSession::RegisterNameSpace to make it process wide.
Optional reading below; don't feel you need to read it unless the above isn't clear.
I'm working on a legacy (Win32 C++) email client that uses the MS ActiveX WebBrowser control (MSHTML or other names it goes by) to display HTML emails. It was saving everything to temp files, updating the cid: URLs, and then having the control load that. Now I want to do it the correct way, using APP. I've got it all working with some test code that just uses static variables/globals and loads one email.
My problem now is, the app might have several instances of the control all loading different emails (and other stuff) at the same time... not really multiple threads so much, just the asynchronous nature of the control. I can give each instance of the control a unique URL to load the email, say, cid:email-GUID, and then in my APP code I can use that URL to know which email to load. However, when it comes to loading any content inside the email, like attached images using src="cid:", those will not always be unique so I will not always know which image it is, for which email. I'd like to avoid having to modify the URLs of the HTML before displaying it (I'm doing that now for the temp file thing, but want to do it a better way).
IInternetBindInfo::GetBindString can return the referrer, BINDSTRING_XDR_ORIGIN, or the root URL, BINDSTRING_ROOTDOC_URL, but those require newer versions of IE and my legacy app must support older XP installs that might even have IE6 or IE7, so I'd rather not use these.
Tagged as TWebBrowser because that is actually what I'm using (Borland C++ Builder 6), but I don't need answers specific to that platform.
As the Asynchronous Pluggable Protocol handler is very low level, you cannot attach handlers individually to different rendering controls.
Here is a way to get the referrer:
Obtain BINDSTRING_HEADERS
Extract the referrer by parsing the line Referer: http://....
See also How can I add an extra http header using IHTTPNegotiate?
Here is another, crazier way:
Create a MIME filter (another pluggable protocol handler) by calling RegisterMimeFilter.
Monitor text/plain and text/html.
Scan the incoming email source (the content arrives incrementally) and parse and store all image links in a dictionary.
In your namespace handler, you can then use this dictionary to resolve the reference of any image resource.

Amazon S3 pre signed URLs using Amazon Java SDK and extra / characters

I've been creating presigned HTTP PUT URLs and everything was working great until I wanted to start using "folders" in S3; I wanted the key to contain the character '/'.
Now I get "Signature doesn't match" when I send the HTTP PUT request, probably because the '/' changes to %2F... If I escape the character before creating the presigned URL it works great, but then the Amazon management console doesn't understand it and shows it as one file instead of subfolders.
Any idea?
P.S.
The HTTP PUT requests are sent using C++ with the POCO NET library.
EDIT
I'm using a Poco HTTPRequest from C++ to my Java web server to generate a signed URL (returned in the response).
C++ then uses this URL to PUT a file into S3, using Poco again.
The problem was that the URLs returned from the web server were parsed through Poco URI objects, which auto-decoded the S3 object key and thus changed it. With that in mind I was able to fix my problem.
Tricky - I'll try to approach this bottom up.
Disclaimer: I got carried away visually inspecting the Poco libraries instead of actually debugging a code sample, which should yield more reliable results much faster, see below ;)
Analysis
If I escape the character before creating the presigned URL it works great, but then the Amazon console management doesn't understand it and shows it as one file instead of subfolders.
The latter stems from S3 not actually having a concept of folders at the storage level; see e.g. the section Index Documents and Folders within Index Document Support:
Objects stored in Amazon S3 are stored within a flat container, i.e., an Amazon S3 bucket, and it does not provide any hierarchical organization, similar to a file system's. However, you can create a logical hierarchy using object key names and use these names to infer logical folders that contain these objects.
That's exactly what the AWS Management Console is doing here as well:
The AWS Management Console also supports the concept of folders, by using the same key naming convention used in the preceding sample.
However, your test regarding the assumption of / being encoded as %2F proves that this is indeed how Poco::Net is encoding the URL when performing the HTTP PUT request.
(I'm actually a bit surprised that the AWS Java SDK seems to generate different URLs here for / vs. %2F, insofar a recent analysis regarding Why is my S3 pre-signed request invalid when I set a response header override that contains a “+”? seems to indicate respective canonicalization by the AWS .NET SDK, see below for more on this.)
Potential Solution
In order for your scenario to work as desired, you'll need to figure out where the URL is encoded this way - I could think of two components in principle:
Poco::Net
Finding out why Poco::Net is encoding the URL differently than S3 (if at all, see below) is best done by debugging your code; here's where I'd start:
Class HTTPRequest uses class URI in turn, which automatically performs a few normalizations on all URIs and URI parts passed to it; in particular, percent-encoded characters are decoded. The other direction is handled by method encode(), which is where things get interesting and calls for a breakpoint, see URI.cpp:
lines 575 ff. - here encode() does its magic, which indeed seems to be in place, insofar neither the code within the function nor the various chars passed in via the reserved parameter contain the offending / (see lines 47 ff. for the respective constants in use)
consequently you might want to set a breakpoint in this function and backtrace the callstack to find out which code is actually doing the encoding upfront, which might not yield an offender at all, see below.
Java => C++ transition
You haven't specified yet which channel is actually used to communicate the pre-signed URL generated by the AWS Java SDK to C++. Given that the code review (mind you, visual inspection only; I haven't debugged this myself) of the Poco::Net functionality concludes that no obvious offender can be identified in the library itself, it seems more likely that the URL already enters your C++ layer encoded (easily verified via debugging, of course) - are you by chance using any kind of web service between these components, for example?
Good luck!

Dynamic Image Caching

I have a CherryPy app that dynamically generates images; those images are re-used a lot but generated each time. Each image is generated from a query string containing the variables, so the same query string will always return the same image (until I rewrite the generation code), and the images are not user-specific.
It occurred to me that I should be aggressively caching those images, but I have no idea how I would go about doing that. Should I be memoizing in CherryPy? Should I do something like this for Apache? Another layer entirely?
Your first attempt should be to use the HTTP cache that comes with CherryPy. See http://docs.cherrypy.org/dev/refman/lib/caching.html for an overview. From there, it's usually a "simple" matter of balancing computational expense versus RAM consumption via cache timeouts.
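For example, turning on the built-in caching tool for the image endpoint in a CherryPy config file might look like this (a sketch; the /images path and the one-hour delay are placeholders for your app):

[/images]
tools.caching.on = True
tools.caching.delay = 3600

Since the cache key is derived from the request URL including the query string, each distinct variable combination (and hence each distinct image) should be cached independently, which fits your "same query string, same image" property.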