How to force dispatcher cache urls with get parameters - apache

As I understood after reading these links:
How to find out what does dispatcher cache?
http://docs.adobe.com/docs/en/dispatcher.html
The Dispatcher always requests the document directly from the AEM instance in the following cases:
If the HTTP method is not GET. Other common methods are POST for form data and HEAD for the HTTP header.
If the request URI contains a question mark "?". This usually indicates a dynamic page, such as a search result, which does not need to be cached.
The file extension is missing. The web server needs the extension to determine the document type (the MIME-type).
The authentication header is set (this can be configured)
But I want to cache url with parameters.
If I once request myUrl/?p1=1&p2=2&p3=3
then next request to myUrl/?p1=1&p2=2&p3=3 must be served from dispatcher cache, but myUrl/?p1=1&p2=2&p3=3&newParam=newValue should served by CQ for the first time and from dispatcher cache for subsequent requests.

I think the config /ignoreUrlParams is what you are looking for. It can be used to white list the query parameters which are used to determine whether a page is cached / delivered from cache or not.
Check http://docs.adobe.com/docs/en/dispatcher/disp-config.html#Ignoring%20URL%20Parameters for details.

It's not possible to cache the requests that contain query string. Such calls are considered dynamic therefore it should not be expected to cache them.
On the other hand, if you are certain that such request should be cached cause your application/feature is query driven you can work on it this way.
Add Apache rewrite rule that will move the query string of given parameter to selector
(optional) Add a CQ filter that will recognize the selector and move it back to query string
The selector can be constructed in a way: key_value but that puts some constraints on what could be passed here.

You can do this with Apache rewrites BUT it would not be ideal practice. You'll be breaking the pattern that AEM uses.
Instead, use selectors and extensions. E.g. instead of server.com/mypage.html?somevalue=true, use:
server.com/mypage.myvalue-true.html
Most things you will need to do that would ever get cached will work this way just fine. If you give me more details about your requirements and what you are trying to achieve, I can help you perfect the solution.

Related

Request URI too long on spartacus services

I've been trying to make use of service.getNavigation() method, but apparently the Request URI is too long which causes this error:
Request-URI Too Long
The requested URL's length exceeds the capacity limit for this server.
Is there a spartacus config that can resolve this issue?
Or is this supposed to be handled in the cloud (ccv2) config?
Not sure which service are you talking about specifically and what data are you passing there. For starters, please read this: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/414
Additionally it would benefit everyone if you could say something about the service you're using and the data you are trying to pass/get.
The navigation component is firing a request for all componentIds. If you have a navigation with a lot of (root?) elements, the maximum length of HTTP GET request might be too long for the given client or server.
The initial implementation of loading components was actually done by a POST request, but the impression was that we would not need to support requests with so many components. I guess we were wrong.
Luckily, the legacy POST based request is still in the code base, it's OccCmsComponentAdapter.findComponentsByIdsLegacy.
The easiest way for you to use this code, is to provide a CustomOccCmsComponentAdapter, that extends from OccCmsComponentAdapter. Then you can override the findComponentsByIds method and simply call the super.findComponentsByIdsLegacy and pass in a copy of the arguments.
A more cleaner way would be to override the CmsComponentConnector and directly delegate the load to the adapter.findComponentsByIdsLegacy. I would not start here, as it's more complicated. Do a POC with the first suggested approach.

Changing page content with custom scheme in WebKitGTK1

I have an app using the WebKitGTK1 API with WebKit-GTK 2.4.9 on Linux. (This is the current version in Debian Jessie and versions 2.5+ don't support the v1 API.)
I've implemented a custom URI scheme for loading entire basic page content by using a resource-request-starting handler, which parses the incoming URI via webkit_web_resource_get_uri and if it matches the custom scheme, generates some HTML content and calls webkit_network_request_set_uri to replace the original URI with a base64'd data: URI containing the content to render. (This is similar to the accepted answer of this question.)
This mostly works well, and my handler is called on each request (including repeated requests with the same original URI) and generates the correct content -- but somewhere upstream the browser appears to render only the first returned data for any given original URI, even if the data URI I generate is different.
Possibly of note is that webkit_web_resource_get_uri returns the original non-data: URI even after calling webkit_network_request_set_uri, so I assume this URI is being cached, and in turn is then being used as a key in some higher-level component to cache the data instead of using the real URI from the request.
Unfortunately this appears to be a G_PARAM_CONSTRUCT_ONLY property and there doesn't appear to be any public API to set and/or clear it so that it uses the rewritten URI of the request instead. Is there some way to force GTK to set the property after construction anyway? As far as I can tell it does have a setter method internally, and the getter would do the Right Thing™ if the internal property were reset to NULL.
Or is there some better method to force WebKit to render the new data: URI despite anything it thinks to the contrary?
For the moment I've worked around it by including the values that make it generate different data in the original custom URI (passed to webkit_web_view_load_uri or in links in the generated page). This does work but it's a bit ugly, and could be problematic if I forget to add something in the future, or if something changes generation but is not known in advance. It seems a bit silly that it goes to all the trouble of raising the event that generates the correct data, only to throw it away later, (presumably) due to an URI comparison on the wrong URI.
I suppose using a known-unique value (eg. sequential incrementing id) would also work, and resolve some of the unknown-in-advance issues, but that's no less ugly.

jMeter issue when using Cookie manager and Regular expression extractor

So basically I need to extract an auth token from header response of 1st http request and then use the extracted data in 2nd (and all the following) http requests cookies.
The issue here is, that I have cookie manager set for the whole controller and instead of getting actual data I get the name of variable in my cookie ".authToken=${auth}".
I am guessing the reason is that the variable is not declared when the test reaches Cookie manager, but I would expect jmeter to be smart enough to declare the variable when it gets to the regular expression extractor.
Structure
Thread
Cache Manager
Cookie Manager (Cookie Policy:compatibility; Implementation:HC3)
Controller
Http Request
Regular expression extractor
Http request (I need to use value extracted above in Request Cookie here)
Http request (I need to use the same value in Request Cookie here)
Http request (I need to use the same value in Request Cookie here)
.....
Details:
All the http requests are recorded with implementation HttpClient3.1
Pretty sure I have everything configured correctly as in variable names, regular expression since it works in a very specific case:
The only time it seemed to work correctly was when I had Cookie manager inside the http request and disabled the 'main' Cookie manager (the one for the whole controller). Then it got extracted correctly, but that would be really silly workaround for such a basic requirement and also I have many http requests (over 100) where I need to use the extracted value.
Jmeter doesn't need to use the variable before it's declared by the regular expression extractor, I made sure that the domain is correct and it gets used for the first time after it should have been extracted.
Another workaround I thought of would be having separate threads, have them linked and send the variable in between them, launching the next one once the data gets extracted, but that seems a little bit too drastic.
What I tried:
Splitting http requests into 2 different controllers and using 2 different Cookie managers - got "${auth}" instead of some value
Defining user variable above controller and then using "Apply to: Jmeter Variable" option - again got just string "${auth}" instead of some value.
Moving the Cookie manager to a position after the http request which is used for the extraction - again "${auth}" instead of some value
Setting different cookie's policy (not all of them, but few)
Setting "CookieManager.save.cookies=true" in jmeter.properties (and still have on true)
Any help/ideas are appreciated. I have been trying to figure this out for about an hour and I think I must be missing something very simple.
Alright, finally got this resolved after roughly 2 hours.
Thanks to this article, I was able to do what I needed
https://capacitas.wordpress.com/2013/06/11/thats-the-way-the-cookie-crumbles-jmeter-style-part-2/
In nutshell: You need to use beanshell pre-processor and add the cookie manually
Here is the beanshell script in case the site dies:
import org.apache.jmeter.protocol.http.control.CookieManager;
import org.apache.jmeter.protocol.http.control.Cookie;
CookieManager manager = sampler.getCookieManager();
Cookie cookie = new Cookie("CookieName", vars.get("YourExtractedVariable"), "Domain", "Path", false, 0);
manager.add(cookie);

Confused about Http verbs

I get confused when and why should you use specific verbs in REST?
I know basic things like:
Get -> for retrieval
Post -> adding new entity
PUT -> updating
Delete -> for deleting
These attributes are to be used as per the operation I wrote above but I don't understand why?
What will happen if inside Get method in REST I add a new entity or inside POST I update an entity? or may be inside DELETE I add an entity. I know this may be a noob question but I need to understand it. It sounds very confusing to me.
#archil has an excellent explanation of the pitfalls of misusing the verbs, but I would point out that the rules are not quite as rigid as what you've described (at least as far as the protocol is concerned).
GET MUST be safe. That means that a GET request must not change the server state in any substantial way. (The server could do some extra work like logging the request, but will not update any data.)
PUT and DELETE MUST be idempotent. That means that multiple calls to the same URI will have the same effect as one call. So for example, if you want to change a person's name from "Jon" to "Jack" and you do it with a PUT request, that's OK because you could do it one time or 100 times and the person's name would still have been updated to "Jack".
POST makes no guarantees about safety or idempotency. That means you can technically do whatever you want with a POST request. However, you will lose any advantage that clients can take of those assumptions. For example, you could use POST to do a search, which is semantically more of a GET request. There won't be any problems, but browsers (or proxies or other agents) would never cache the results of that search because it can't assume that nothing changed as a result of the request. Further, web crawlers would never perform a POST request because it could not assume the operation was safe.
The entire HTML version of the world wide web gets along pretty well without PUT or DELETE and it's perfectly fine to do deletes or updates with POST, but if you can support PUT and DELETE for updates and deletes (and other idempotent operations) it's just a little better because agents can assume that the operation is idempotent.
See the official W3C documentation for the real nitty gritty on safety and idempotency.
Protocol is protocol. It is meant to define every rule related to it. Http is protocol too. All of above rules (including http verb rules) are defined by http protocol, and the usage is defined by http protocol. If you do not follow these rules, only you will understand what happens inside your service. It will not follow rules of the protocol and will be confusing for other users. There was an example, one time, about famous photo site (does not matter which) that did delete pictures with GET request. Once the user of that site installed the google desktop search program, that archieves the pages locally. As that program knew that GET operations are only used to get data, and should not affect anything, it made GET requests to every available url (including those GET-delete urls). As the user was logged in and the cookie was in browser, there were no authorization problems. And the result - all of the user photos were deleted on server, because of incorrect usage of http protocol and GET verb. That's why you should always follow the rules of protocol you are using. Although technically possible, it is not right to override defined rules.
Using GET to delete a resource would be like having a function named and documented to add something to an array that deletes something from the array under the hood. REST has only a few well defined methods (the HTTP verbs). Users of your service will expect that your service stick to these definition otherwise it's not a RESTful web service.
If you do so, you cannot claim that your interface is RESTful. The REST principle mandates that the specified verbs perform the actions that you have mentioned. If they don't, then it can't be called a RESTful interface.

Double request from mod-rewrite

I've written a module that sets Apache environment variables to be used by mod-rewrite. It hooks into ap_hook_post_read_request() and that works fine, but if mod-rewrite matches a RewriteRule then it makes a second call to my request handler with the rewritten URL. This looks like a new request to me in that the environment variables are no longer set and therefore I have to execute my (expensive) code twice for each hit.
What am I doing wrong, or is there a work around for this?
Thanks
You can use the [NS] modifier on a rule to cause it to not be processed for internal subrequests (the second pass you're seeing is an internal subrequest).
As I understand it, the NS flag (suggested in another answer) on a rule makes it evaluate as "if I am being called a second time, ignore me". The trouble is, by then it's too late since the hook has already been called. I believe this will be a problem no matter what you do in mod_rewrite. You can detect the second request, but I don't know of any way to prevent the second request.
My best suggestion is to put the detection in your handler before your (expensive) code and exit if it's being run a second time. You could have mod_rewrite append something to the URL so you'd know when it's being called a second time.
However...
If your (expensive) code is being called on every request, it's also being called on images, css files, favicons, etc. Do you really want that? Or is that possibly what you are seeing as the second call?
Thanks a bunch, I did something similar to what bmb suggested and it works! But rather than involving mod-rewrite in this at all, I added a "fake" request header in my module's request handler, like so:
apr_table_set(r->headers_in, "HTTP_MY_MODULE", "yes");
Then I could detect it at the top of my handler on the second rewritten request. Turns out that even though mod-rewrite (or Apache?) doesn't preserve added env or notes variables (r->subprocess_env, r->notes) in a subrequest, it does preserve added headers.
As for my expensive code getting called on every request, I have a configurable URL suffix/extension filter in the handler to ignore image, etc requests.