Default script resolution in Sling

I don't know that I have the terminology completely correct, but it seems that there are some default behaviors for script resolution in Sling (which I'm using as part of Day CQ). For example, .infinity.json returns a representation of the node and its children. Also, if I have a piece of content that I normally would access with a .html extension, I've been able to use a .xml or .json extension to get a representation of that data. However, if I use a .txt extension, I get nothing back, even though, as far as I can tell, I have scripts that should match the request (e.g. GET.jsp). Are these behaviors documented somewhere?

GET.jsp will match a request ending with .html, as Sling considers HTML the default representation. To activate a JSP script for the .txt rendering, you need to name it txt.jsp.
See http://sling.apache.org/site/servlets.html for details.
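For illustration, a hypothetical component script folder following that naming convention (the resource type path is made up; the name-to-extension mapping is per the Sling docs linked above):

```
/apps/myapp/components/page/
    GET.jsp    <-- handles GET requests for the default (.html) rendering
    txt.jsp    <-- handles GET requests with the .txt extension
    xml.jsp    <-- would override Sling's built-in .xml rendering
```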

How to force dispatcher to cache URLs with GET parameters

As I understood after reading these links:
How to find out what does dispatcher cache?
http://docs.adobe.com/docs/en/dispatcher.html
The Dispatcher always requests the document directly from the AEM instance in the following cases:
- If the HTTP method is not GET. Other common methods are POST for form data and HEAD for the HTTP header.
- If the request URI contains a question mark "?". This usually indicates a dynamic page, such as a search result, which does not need to be cached.
- If the file extension is missing. The web server needs the extension to determine the document type (the MIME type).
- If the authentication header is set (this can be configured).
But I want to cache URLs with parameters.
If I request myUrl/?p1=1&p2=2&p3=3 once, then the next request to myUrl/?p1=1&p2=2&p3=3 must be served from the dispatcher cache, but myUrl/?p1=1&p2=2&p3=3&newParam=newValue should be served by CQ the first time and from the dispatcher cache for subsequent requests.
I think the config /ignoreUrlParams is what you are looking for. It can be used to whitelist the query parameters that are used to determine whether a page is cached / delivered from cache or not.
Check http://docs.adobe.com/docs/en/dispatcher/disp-config.html#Ignoring%20URL%20Parameters for details.
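For reference, a minimal /ignoreUrlParams fragment for dispatcher.any (the parameter names are hypothetical). Parameters matched by an allow rule are ignored when the dispatcher decides whether a page can be cached, so requests that differ only in those parameters are served from the same cached file:

```
/ignoreUrlParams
  {
  # default: no parameter is ignored, so any query string bypasses the cache
  /0001 { /glob "*" /type "deny" }
  # ignore these (hypothetical) tracking parameters for caching decisions
  /0002 { /glob "utm_source" /type "allow" }
  /0003 { /glob "utm_medium" /type "allow" }
  }
```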
It's not possible to cache requests that contain a query string. Such calls are considered dynamic, so they should not be expected to be cached.
On the other hand, if you are certain that such requests should be cached because your application/feature is query-driven, you can work on it this way:
1. Add an Apache rewrite rule that moves the given query-string parameter into a selector (see the sketch below).
2. (Optional) Add a CQ filter that recognizes the selector and moves it back into the query string.
The selector can be constructed as key_value, but that puts some constraints on what can be passed here.
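A hedged sketch of step 1, for a virtual-host context (the parameter name p1 and the path are hypothetical; %1 carries the value captured by the RewriteCond, and the trailing ? drops the original query string):

```apache
# Map /content/mypage.html?p1=1 to /content/mypage.p1_1.html (key_value selector)
RewriteCond %{QUERY_STRING} ^p1=([^&]+)$
RewriteRule ^/content/mypage\.html$ /content/mypage.p1_%1.html? [PT,L]
```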
You can do this with Apache rewrites BUT it would not be ideal practice. You'll be breaking the pattern that AEM uses.
Instead, use selectors and extensions. E.g. instead of server.com/mypage.html?somevalue=true, use:
server.com/mypage.somevalue-true.html
Most things you'd ever need cached will work this way just fine. If you give me more details about your requirements and what you are trying to achieve, I can help you perfect the solution.
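On the CQ/Sling side, a minimal sketch of reading such a key-value selector inside a component (plain Sling API; the selector name mirrors the example above):

```java
import org.apache.sling.api.SlingHttpServletRequest;

// Sketch: parse a "somevalue-true" style selector from the request path,
// e.g. /mypage.somevalue-true.html, instead of a ?somevalue=true parameter.
public final class SelectorParams {
    public static String getSelectorValue(SlingHttpServletRequest request, String key) {
        for (String selector : request.getRequestPathInfo().getSelectors()) {
            String[] parts = selector.split("-", 2); // "somevalue-true" -> ["somevalue", "true"]
            if (parts.length == 2 && parts[0].equals(key)) {
                return parts[1];
            }
        }
        return null; // selector not present
    }
}
```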

Override forcedownload behavior in Sitecore

We had a problem with some of our IE clients failing to download a PDF, even after clicking on the link. We found the answer here resolved our problems: set forcedownload=true for PDF mime types in web.config.
However, that created another problem: we are now unable to render a PDF in the browser when we want to. We used to do this with an iframe; now the PDF just downloads and does not render in the browser.
I learned that the forcedownload=true setting is actually a default in a subsequent version of Sitecore (v7.2). So, I'm hesitant to revert that.
So, how do I render a PDF in a browser in this situation?
You can leave forceDownload=false on the PDF mime type and instead set the following setting to false:
<setting name="Media.EnableRangeRetrievalRequest" value="false"/>
I faced the same dilemma a few months back with the same initial fix, and found out the actual issue last week; I wrote a blog post about it. (In fact, I wrote the answer you linked to; I've updated it with the same information now for future visitors.)
The issue is basically a combination of Adobe Reader plugin for IE9, chunked transfer encoding and streaming the file directly from the database. I found if you close your browser and try again, or force refresh with Ctrl+F5 it worked fine. Once Sitecore had cached the file to disk it would continue to work for everyone.
The above setting disables chunked transfer encoding, instead sending the file down to the browser as a single piece. This setting was introduced in Sitecore 6.5+
This is one of the flaws in the MediaRequestHandler, and in my opinion the forceDownload option is pretty useless the way it is designed by default. (Why would you ever want to configure this option per media extension only?)
You'll basically have to turn off the forcedownload option again and replace the MediaRequestHandler with your own. I usually end up writing my own anyway because of other issues with the default handler, such as dealing properly with CDNs etc.
In the ProcessRequest pipeline, you can determine if the item should be “downloaded” or not by setting the Content-Disposition header. You basically need to get rid of the default handling of forceDownload and set your headers based on your own logic.
Personally I prefer to set a query string parameter, such as ?dl=1, and base the Content-Disposition header on that. You could also extend the MediaItem template to contain a default behavior for each item or subtree (leveraging Sitecore inheritance and standard values), and potentially also define (override) a specific filename on each item for the attachment part of the Content-Disposition header.
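Sitecore handlers are .NET, so the following is only a language-neutral sketch (expressed in Java servlet terms) of the ?dl=1 idea described above; the parameter and filename are illustrative:

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: choose inline rendering vs. forced download per request,
// based on a ?dl=1 query parameter rather than a per-extension setting.
public class MediaDispositionServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("application/pdf");
        if ("1".equals(req.getParameter("dl"))) {
            // force the download dialog; the filename could come from the item
            resp.setHeader("Content-Disposition", "attachment; filename=\"document.pdf\"");
        } else {
            // let the browser render the PDF inline (e.g. in an iframe)
            resp.setHeader("Content-Disposition", "inline");
        }
        // ... stream the media binary to resp.getOutputStream() here ...
    }
}
```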
When rendering the link, you can use the properties collection (write a suitable extension method or similar) so that your code clearly marks the link as meant for download while still using the built-in field render methods. That way you eliminate the risk of messing up the Page Editor etc.
/ Mikael
You have to disable range retrieval requests in web.config by setting the value to false:
<setting name="Media.EnableRangeRetrievalRequest" value="false" />
The MediaRequestHandler lets Sitecore serve PDF content partially, in ranges, using the HTTP 206 status code. You can also override MediaRequestHandler and write your own custom implementation to handle media requests.

Is meta charset required if AddDefaultCharset is set?

Is there a reason why I should keep <meta charset='utf-8'> in my html head when my .htaccess file already has AddDefaultCharset utf-8?
Just for serving files over the web, it is rather redundant. However, since people may want to save the page to a file and open it later without the context of a web server, it is good practice to embed the information in the document itself using the meta tag.
This W3 document gives a great overview of the tradeoffs of each approach:
http://www.w3.org/International/questions/qa-html-encoding-declarations
You should definitely use HTTP header declarations if it is likely that the document will be transcoded (i.e. the character encoding will be changed by intermediary servers), since HTTP declarations have higher precedence than in-document ones.
Otherwise you should use HTTP headers if it makes sense for any type of content, but in conjunction with an in-document declaration (see below). You should always ensure that HTTP declarations are consistent with the in-document declarations.
One specific example where the <meta> tag may still be appropriate is when user-contributed content may be in a different character set but those users don't have access to modify your Apache server settings to control that themselves; in that case it's beneficial to offer control of the charset within the document itself.
See the document for more in-depth details.
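For illustration, a minimal page carrying the in-document declaration; the comment notes why it is kept even with AddDefaultCharset utf-8 in .htaccess:

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Kept even though .htaccess sends the charset in the HTTP header:
       a copy saved to disk no longer has that header, only this tag. -->
  <meta charset="utf-8">
  <title>Example</title>
</head>
<body>
</body>
</html>
```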

How to set http headers in dotCMS

I'm trying to create an XML data feed with dotCMS. I can easily output the correct XML document structure in a .dot "page", but the HTTP headers sent to the client still say that my page contains "text/html". How can I change them to "text/xml" or "application/xml"?
Apparently there's no way to do it using the administration console. The only way I found is to add this line of (velocity) code
$response.setHeader("Content-Type", "application/xml")
to the top of the page template.
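In context, the top of such a .dot page template might look like this (the feed structure is a placeholder; only the setHeader line comes from the answer above):

```velocity
## Override the default text/html content type before any output is written
$response.setHeader("Content-Type", "application/xml")
<?xml version="1.0" encoding="UTF-8"?>
<feed>
  ## ... render items here ...
</feed>
```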
Your solution is the easiest. However there are other options that are a bit more work, but that would prevent you from having to use velocity to do the XML generation, which is more robust most of the time.
dotCMS uses XStream to generate XML files (and vice versa). You could write a generic plugin to use this as well.
A JSONContentServlet also exists in dotCMS; it takes a query and generates JSON or XML (depending on your parameters). It is not mapped to a URL by default, but that is easy to add, as sketched below.
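If you go the servlet route, the mapping itself is standard web.xml; note the servlet class below is a guess for illustration only, so verify the actual class name in your dotCMS version:

```xml
<!-- Hypothetical mapping; confirm the real servlet class in your dotCMS build -->
<servlet>
  <servlet-name>JSONContentServlet</servlet-name>
  <servlet-class>com.dotmarketing.servlets.JSONContentServlet</servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>JSONContentServlet</servlet-name>
  <url-pattern>/jsoncontent/*</url-pattern>
</servlet-mapping>
```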

Amazon S3 pre signed URLs using Amazon Java SDK and extra / characters

I've been creating pre-signed HTTP PUT URLs and everything was working great until I wanted to start using "folders" in S3; I wanted the key to contain the character '/'.
Now I get a "Signature does not match" error when I send the HTTP PUT request, probably because the '/' changes to %2F... If I escape the character before creating the pre-signed URL it works great, but then the Amazon management console doesn't understand it and shows it as one file instead of subfolders.
Any idea?
P.s.
The HTTP PUT requests are sent using C++ with POCO NET library.
EDIT
I'm using a Poco HTTPRequest from C++ against my Java web server to generate a signed URL (returned in the response).
C++ then uses this URL to PUT a file into S3, again using Poco.
The problem was that the URLs returned from the web server were parsed through Poco URI objects, which auto-decoded the S3 object key and thereby changed it. With that in mind I was able to fix my problem.
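For reference, a minimal sketch of generating such a PUT pre-signed URL with the AWS SDK for Java (v1 API; bucket and key names are hypothetical). The key can contain '/' characters; the important part is that the resulting URL reaches the uploading client byte-for-byte unmodified:

```java
import java.net.URL;
import java.util.Date;
import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;

public class PresignPutExample {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        GeneratePresignedUrlRequest request =
                new GeneratePresignedUrlRequest("my-bucket", "folder/subfolder/file.bin")
                        .withMethod(HttpMethod.PUT)
                        .withExpiration(new Date(System.currentTimeMillis() + 15 * 60 * 1000L));
        URL url = s3.generatePresignedUrl(request);
        // Hand this URL to the client verbatim; decoding or re-encoding any part
        // of it (e.g. '/' vs. '%2F' in the key) invalidates the signature.
        System.out.println(url);
    }
}
```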
Tricky - I'll try to approach this bottom up.
Disclaimer: I got carried away visually inspecting the Poco libraries instead of actually debugging a code sample, which should yield more reliable results much faster, see below ;)
Analysis
If I escape the character before creating the presigned URL it works
great, but then the Amazon console management doesn't understand it
and shows it as one file instead of subfolders.
The latter stems from the fact that S3 does not actually have a concept of folders at the storage level; see e.g. section Index Documents and Folders within Index Document Support:
Objects stored in Amazon S3 are stored within a flat container, i.e.,
an Amazon S3 bucket, and it does not provide any hierarchical
organization, similar to a file system's. However, you can create a
logical hierarchy using object key names and use these names to infer
logical folders that contain these objects.
That's exactly what the AWS Management Console is doing here as well:
The AWS Management Console also supports the concept of folders, by
using the same key naming convention used in the preceding sample.
However, your test regarding the assumption of / being encoded as %2F proves that this is indeed how Poco::Net is encoding the URL when performing the HTTP PUT request.
(I'm actually a bit surprised that the AWS Java SDK seems to generate different URLs here for / vs. %2F, insofar as a recent analysis regarding Why is my S3 pre-signed request invalid when I set a response header override that contains a “+”? seems to indicate respective canonicalization by the AWS .NET SDK; see below for more on this.)
Potential Solution
In order for your scenario to work as desired, you'll need to figure out where the URL is encoded this way - I could think of two components in principle:
Poco::Net
Finding out why Poco::Net is encoding the URL different than S3 (if at all, see below) is best done by debugging your code, here's where I'd start:
Class HTTPRequest uses class URI in turn, which automatically performs a few normalizations on all URIs and URI parts passed to it; in particular, percent-encoded characters are decoded. The other way round is handled by method encode(), which is where things get interesting and calls for a breakpoint, see URI.cpp:
lines 575 ff. - here encode() does its magic, which indeed seems to be in place, insofar as neither the code within the function nor the various chars passed in via the reserved parameter contain the offending / (see lines 47 ff. for the respective constants in use)
consequently you might want to set a breakpoint in this function and backtrace the call stack to find out which code is actually doing the encoding upfront, which might not yield an offender at all; see below.
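To make that decoding step concrete (plain Java for illustration, not Poco code): once any layer percent-decodes the URL, a %2F in the key becomes a literal '/', so the request the client eventually signs and sends no longer matches the string that was signed server-side:

```java
import java.net.URLDecoder;

public class DecodeDemo {
    public static void main(String[] args) throws Exception {
        String signedKey = "folder%2Fsubfolder%2Ffile.bin"; // as embedded in the pre-signed URL
        String decoded = URLDecoder.decode(signedKey, "UTF-8");
        System.out.println(decoded); // folder/subfolder/file.bin -> signature mismatch on send
    }
}
```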
Java => C++ transition
You haven't specified yet which channel is actually used to communicate the pre-signed URL generated by the AWS Java SDK back to C++. Given that a code review (mind you, visual inspection only; I haven't debugged this myself) of the Poco::Net functionality yields the conclusion that no obvious offender can be identified in the library itself, it seems more likely that the URL already enters your C++ layer encoded (easily verified via debugging, of course) - are you by chance using any kind of web service between these components, for example?
Good luck!