Is meta charset required if AddDefaultCharset is set? - apache

Is there a reason why I should keep <meta charset='utf-8'> in my html head when my .htaccess file already has AddDefaultCharset utf-8?

Just for serving files over the web it is rather redundant. Since people may want to save the page to a file and open it later without the context of a web server though, it's good practice to embed the information into the document itself using the meta tag.

This W3 document gives a great overview of the tradeoffs of each approach:
http://www.w3.org/International/questions/qa-html-encoding-declarations
You should definitely use HTTP header declarations if it is likely that the document will be transcoded (ie. the character encoding will be changed by intermediary servers), since HTTP declarations have higher precedence than in-document ones.
Otherwise you should use HTTP headers if it makes sense for any type of content, but in conjunction with an in-document declaration (see below). You should always ensure that HTTP declarations are consistent with the in-document declarations.
One specific example where <meta> tag may still be appropriate is when specific, user-contributed content may be of a different character set, but the users don't have access to modify your Apache server settings to control that themselves, therefore its beneficial to offer control of the charset within the document.
See the document for more in-depth details.

Related

Which of the following are valid URI's(Uniform Resource Identifier) as per the REST API specifications?

How to identify which of the following Uniform Resource Identifier(URI) is valid as per REST API specifications.
Choose one or more options
1. POST https://api.example.com/whales/create/9xf3df
2. PUT https://api.example.com/whales/9xf3df
3. GET https://api.example.com/whales/9xf3df?sort=name&valid=true
4. DELETE https://api.example.com/whales
REST doesn't care what spelling conventions you use for your resource identifiers; anything that conforms to the production rules defined by RFC 3986 is fine.
/whales/create/9xf3df
/whales/9xf3df
/whales/9xf3df?sort=name&valid=true
/whales
/0cc846bb-678d-45d8-9c06-d9cf94cee0a5
/9xf3df/whales
These are all fine identifiers.
Identifiers for a "REST API" are exactly like identifiers for web pages - you can use any spelling you want, and the browsers, caches, web crawlers, and so on will work with them quite happily; these general purpose components treat identifiers like identifiers - they don't try to extract any meaning from them.
By way of demonstration, please observe that all of the following work exactly the way you would expect them to:
https://www.merriam-webster.com/dictionary/create
https://www.merriam-webster.com/dictionary/get
https://www.merriam-webster.com/dictionary/put
https://www.merriam-webster.com/dictionary/post
https://www.merriam-webster.com/dictionary/patch
https://www.merriam-webster.com/dictionary/delete
Does REST care about the POST, PUT, GET, and DELETE for the above options?
Hard to be sure which question you are asking here.
PUT /dictionary/delete HTTP/1.1
That's a perfectly satisfactory request-line, and there is no ambiguity about what it means. In this example, PUT is the method-token; that tells the server that we are requesting that the representation of the target resource (identified by /dictionary/delete) be replaced by the representation include in the message-body of the request
For this specific resource, that probably means that the message-body is an HTML document (we'd see Content-Type: text/html in the headers, to ensure that the server knows how to correctly interpret the bytes provided).
PATCH /dictionary/delete HTTP/1.1
This is also a satisfactory request line; we are again requesting a change to the representation of the /dictionary/delete resource, but we're going about it in a slightly different way - instead of including a replacement representation in the message body, we're providing a representation of a list of changes to make (aka a "patch document").
Uniform interface means that we should expect the folks at www.merriam-webster.com to understand these messages exactly as we've described them here.
Now, for these specific resources, they probably don't want random stack-overflow members making changes to their website, so they are likely to respond 403 Forbidden or 405 Method Not Allowed.
All of the general purpose components will understand what that means, again because the standardized response meta-data is common to all resources.

Google cloud load balancer custom http header is missing

While using Google Cloud HTTPS Load Balancer we hit the following bug. Couldn't find any information on it.
We have a custom http header in our request:
X-<Company name>-abcde. If we are working directly against the server all is good, but once we are working through the load balancer, than our custom header is missing. We didn't find any reference in the documentation that there is a need to white list our headers or something like that.
Why my custom header is not being transferred to my backend server while working through Google Cloud Load Balancer? And how to make it work?
Thanks
Data
After a lot of testing, these are the results I've come up with:
The Google Cloud HTTPS Load Balancer does transfer custom HTTP headers to the backend service.
However, it changes them to lower-case.
So, in your case, X-Custom-Header is transformed to x-custom-header.
Solution
Simply change your code to read the lower-case version of your custom HTTP header. This is a simple fix, but one which may not be supported in the long-term by Google (there's not a word on this in Google's documentation so it's subject to change with no notice).
Petition Google to change this idiosyncratic behaviour or at the very least mention it clearly in their documentation.
A little extra
As far as I know, the RFC 2047 which specified the X- prefix for custom HTTP headers and propagated the pseudo-standard of a capital letter for each word has been deprecated and replaced by RFC 6648 which recommends against the X- prefix in general and mentions nothing regarding the rest of the words in the custom HTTP header key name. If I were Google, I would change this behaviour to pass custom HTTP headers as is and let developers deal with the strings as they've set them.
The RFC (RFC 7230) for HTTP/1.1 Message Syntax and Routing says that header fields have a case-insensitive field name. If you're relying on case to match the header that doesn't align with the RFC.
Way back in the day I looked through either the Tomcat of Jetty source and they worked with everything as a .toLower().
Go has a CanonicalMIMEHeaderKey where it'll format the headers in a common way to be sure everything is on the same page.
Python still harkens back to the RFC822 (hg.python.org/cpython/file/2.7/Lib/rfc822.py#l211) days, but it forces a .lower() on headers to standardize.
Basically though what the GCP HTTP(S) Load Balancer is doing is acceptable as far as the RFC is concerned.
This is most likely an application bug.
As other answers have stated, HTTP header names are case insensitive. Ime, every time headers appear to be case sensitive, it is because there is a request wrapper somewhere in the application call stack.
Request wrappers like this are common (usually necessary) in Java servlet filters. It's a common, newbie mistake to use case-sensitive matching (e.g. a regular Java HashMap<String, T>()) for the header names in the wrapper.
That's where I would start looking for your bug.
A reasonable way to create a Java Map<String, T> that is both case insensitive and that doesn't modify the keys is to use new TreeMap<String, T>( String.CASE_INSENSITIVE_ORDER ).

Is there any benefit of processing request at Apache (using Server side directives) Vs AEM Publisher (using Sightly conditions)?

In my project, there are examples were components uses Server Side directives to say include/exclude components. And few places, proper sightly/JSP code is used for same purpose. For example:
<c:if test="${authorMode}">
<cq:include path="headerpar" resourceType="foundation/components/iparsys"/>
</c:if>
<c:if test="${not authorMode}">
<!--#include virtual="/content/myapp/${lang}/global/customer/header.html"--
</c:if>
From basics I understand Server Side directives gets resolved at Apache itself whereas Sightly/Handlebar codes is processed at Publisher. Is there a benefit of coding using directives? If yes, then pretty much lot of code can be moved to server side scripting since it reduces another layer and Apache is far far faster compared to publisher? Why does Adobe promote their scripts and not promote writing with server side directives?
SSI includes are resolved on apache while cq:include within CQ. Using SSI's require from you a bit more knowledge so you know what you're doing
You can benefit from using SSI includes in many ways (mainly performance-wise as you suggested) if you use it properly.
Assume you have a header and footer that is the same for every page but contains heavy logic (navigation, crx tree traversing etc). In this case, if you properly configure your dispatcher caching the header will actually get rendered only once (for the first page) and if user requests another page that has the same header it will simply be fetched from dispatcher cache
Assuming you have a mainly brochureware page but some components that are dynamic you can also cache parts of the page and deny caching for others - again you have to properly configure your dispatcher
Assuming you have user specific content you could also cache parts of pages and dynamically render only parts of those (that's what Sling Dynamic Include is for -> https://github.com/Cognifide/Sling-Dynamic-Include)
So I would definitely use SSI includes for specific fragments of pages that are reused all across your site or meant to differ for different users, like some headers, footers or sth like "Welcome user 'ABC'". I would though not do that with regular content as it will just be cached within the pages html.

Override forcedownload behavior in Sitecore

We had a problem with some of our IE clients failing to download a PDF, even after clicking on the link. We found the answer here resolved our problems: set forcedownload=true for PDF mime types in web.config.
However, that created another problem: we are now unable to render a PDF in a browser when we want to. We used to do this with an iframe. However, as you can see, the PDF just downloads, and does not render in the browser.
I learned that the forcedownload=true setting is actually a default in a subsequent version of Sitecore (v7.2). So, I'm hesitant to revert that.
So, how do I render a PDF in a browser in this situation?
You can leave forceDownload=false on the PDF mime type and instead set the following setting to false:
<setting name="Media.EnableRangeRetrievalRequest" value="false"/>
I faced the same dilema a few months back with the same initial fix. Found out the actual issue last week, I wrote a blog post about it. (In fact, I wrote the answer you linked to, I've updated it with the same information now for future visitors)
The issue is basically a combination of Adobe Reader plugin for IE9, chunked transfer encoding and streaming the file directly from the database. I found if you close your browser and try again, or force refresh with Ctrl+F5 it worked fine. Once Sitecore had cached the file to disk it would continue to work for everyone.
The above setting disables chunked transfer encoding, instead sending the file down to the browser as a single piece. This setting was introduced in Sitecore 6.5+
This is one of the flaws in the MediaRequestHandler and in my opinion; the forceDownload option is pretty useless the way it is designed by default. (Why would ever want to configure this option on media extension only?)
You’ll have to basically turn off the forcedownload option again and replace the MediaRequestHandler with your own one. I usually end up with writing my own anyway because if other issues with the default handler, such as dealing properly with CDN’s etc.
In the ProcessRequest pipeline, you can determine if the item should be “downloaded” or not by setting the Content-Disposition header. You basically need to get rid of the default handling of forceDownload and set your headers based on your own logic.
Personally I prefer to set a query string parameter, such as ?dl=1, and base the Content-Disposition header on this. You could also extend the MediaItem template to contain a default behavior on each item or sub tree (leverage from Sitecore inheritance and standard values), and potentially you could thereby also define (override) a specific filename on each item for the attachment part in the Content-Disposition header.
When rendering the link, you can leverage from the properties collection (write a suitable extension method or similar), so that you can clearly mark your code that the link is meant for download, but still leverage from the built in field render methods. Thereby you eliminate the risk of messing up the page editor etc.
/ Mikael
You have to disable range retrieval request in web.config by setting its value to false.
<setting name="Media.EnableRangeRetrievalRequest" value="false" />
MediaRequestHandler enables Sitecore to download PDF content partially in range using HTTP 206 Status code. You can also overwrite MediaRequestHandler and write your own custom implementation to handle media request.

Default script resolutions in Sling

I don't know that I have the terminology completely correct, but it seems that there are some default behaviors for script resolution in Sling (which I'm using as part of Day CQ). For example, .infinity.json returns a representation of the node and its children. Also, if I have a piece of content that I normally would access with a .html extension, I've been able to use a .xml or .json extension to get a representation of that data. However, if I use a .txt extension, I get nothing back, even though as far as I can tell, I do have scripts that should match the request (eg GET.jsp). Are these behaviors documented somewhere?
GET.jsp will match a request ending with .html, as Sling considers html as the default representation. To activate a JSP script for the .txt rendering, you need to name it txt.jsp
See http://sling.apache.org/site/servlets.html for details.