Double request from mod-rewrite - apache

I've written a module that sets Apache environment variables to be used by mod-rewrite. It hooks into ap_hook_post_read_request() and that works fine, but if mod-rewrite matches a RewriteRule then it makes a second call to my request handler with the rewritten URL. This looks like a new request to me in that the environment variables are no longer set and therefore I have to execute my (expensive) code twice for each hit.
What am I doing wrong, or is there a workaround for this?
Thanks

You can use the [NS] flag on a rule so that it is not processed for internal subrequests (the second pass you're seeing is an internal subrequest).
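For example (the pattern and target here are placeholders, not taken from your configuration):

```apache
# NS (nosubreq): skip this rule when the request is an internal subrequest,
# so the rewrite fires only on the initial client request.
RewriteRule ^/old/(.*)$ /new/$1 [NS,L]
```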

As I understand it, the NS flag (suggested in another answer) on a rule makes it evaluate as "if I am being called a second time, ignore me". The trouble is, by then it's too late since the hook has already been called. I believe this will be a problem no matter what you do in mod_rewrite. You can detect the second request, but I don't know of any way to prevent the second request.
My best suggestion is to put the detection in your handler before your (expensive) code and exit if it's being run a second time. You could have mod_rewrite append something to the URL so you'd know when it's being called a second time.
However...
If your (expensive) code is being called on every request, it's also being called on images, css files, favicons, etc. Do you really want that? Or is that possibly what you are seeing as the second call?

Thanks a bunch, I did something similar to what bmb suggested and it works! But rather than involving mod-rewrite in this at all, I added a "fake" request header in my module's request handler, like so:
apr_table_set(r->headers_in, "HTTP_MY_MODULE", "yes");
Then I could detect it at the top of my handler on the second rewritten request. Turns out that even though mod-rewrite (or Apache?) doesn't preserve added env or notes variables (r->subprocess_env, r->notes) in a subrequest, it does preserve added headers.
As for my expensive code getting called on every request, I have a configurable URL suffix/extension filter in the handler to ignore image, etc requests.


Mod_security turn off rule based on post data

Is it possible to remove a specific security based on post data?
For example, if post data contain username='someone'
then turn off rule 950004:
SecRuleRemoveById 950004
I already have SecRequestBodyAccess turned on
Location of the manual, which will give you guidance for creating the proper syntax for your installation: https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual-(v2.x)
It's not entirely clear what you're after, but in general I think you want the directive to begin something like:
SecRule ARGS_POST:username "someone"
That will test the POST argument 'username' for the value 'someone' and, if it matches, trigger the action part of the rule. The contents of the rest will depend largely on your own configuration, so I can't include it here, but the general idea is:
"phase:2,id:nnnnnnn,t:lowercase,nolog,pass,maturity:2,accuracy:9,ctl:ruleRemoveById=950004"
You'll need to replace the values in this action list with those appropriate to your setup. Note that ARGS_POST is only available from phase 2 onward, once the request body has been parsed. The only other important thing is that this rule needs to be executed before rule 950004 gets executed. Most configurations of mod_security have a place for local exceptions to go; use that.
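Putting the pieces together, a complete rule might look like this (the id, the phase placement, and the local-exceptions location are assumptions you'll need to adapt to your installation):

```apache
# Hypothetical local-exceptions rule: if the POST argument "username"
# equals "someone", disable rule 950004 for this transaction.
# ARGS_POST requires phase 2, after the request body has been parsed.
SecRule ARGS_POST:username "@streq someone" \
    "phase:2,id:1000001,nolog,pass,ctl:ruleRemoveById=950004"
```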

How to force dispatcher cache urls with get parameters

As I understood after reading these links:
How to find out what does dispatcher cache?
http://docs.adobe.com/docs/en/dispatcher.html
The Dispatcher always requests the document directly from the AEM instance in the following cases:
If the HTTP method is not GET. Other common methods are POST for form data and HEAD for the HTTP header.
If the request URI contains a question mark "?". This usually indicates a dynamic page, such as a search result, which does not need to be cached.
The file extension is missing. The web server needs the extension to determine the document type (the MIME-type).
The authentication header is set (this can be configured)
But I want to cache URLs with parameters.
If I once request myUrl/?p1=1&p2=2&p3=3
then the next request to myUrl/?p1=1&p2=2&p3=3 must be served from the dispatcher cache, but myUrl/?p1=1&p2=2&p3=3&newParam=newValue should be served by CQ the first time and from the dispatcher cache for subsequent requests.
I think the config /ignoreUrlParams is what you are looking for. It lists the query parameters the Dispatcher ignores when deciding whether a page can be cached: a request whose parameters are all on the ignore list is cached (with those parameters dropped), while any other parameter causes the request to be passed through to the AEM instance.
Check http://docs.adobe.com/docs/en/dispatcher/disp-config.html#Ignoring%20URL%20Parameters for details.
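A minimal sketch of that pattern, assuming you want a hypothetical `utm_source` parameter ignored for caching purposes:

```
# dispatcher.any fragment: parameters matching an "allow" glob are ignored
# when deciding cacheability; all others ("deny") still bypass the cache.
/ignoreUrlParams
  {
  /0001 { /glob "*" /type "deny" }
  /0002 { /glob "utm_source" /type "allow" }
  }
```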
It's not possible to cache requests that contain a query string. Such calls are considered dynamic, so they are not expected to be cached.
On the other hand, if you are certain that such requests should be cached because your application/feature is query-driven, you can work around it this way:
1. Add an Apache rewrite rule that moves the given query string parameter into a selector.
2. (Optional) Add a CQ filter that recognizes the selector and moves it back into the query string.
The selector can be constructed as key_value, but that puts some constraints on what can be passed there.
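A sketch of step 1, assuming a single hypothetical parameter p1 on a page called mypage (names are placeholders):

```apache
# Move ?p1=<value> into a selector so the Dispatcher sees an extension and
# no query string, e.g. /mypage.html?p1=1 -> /mypage.p1_1.html
RewriteCond %{QUERY_STRING} ^p1=([^&]+)$
RewriteRule ^/mypage\.html$ /mypage.p1_%1.html? [PT,L]
```

The trailing `?` in the substitution strips the original query string so the rewritten URL is cacheable.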
You can do this with Apache rewrites BUT it would not be ideal practice. You'll be breaking the pattern that AEM uses.
Instead, use selectors and extensions. E.g. instead of server.com/mypage.html?somevalue=true, use:
server.com/mypage.myvalue-true.html
Most things you will need to do that would ever get cached will work this way just fine. If you give me more details about your requirements and what you are trying to achieve, I can help you perfect the solution.

What could cause this API Get request to not work?

I am making a GET request to the following url:
http://testsurveys.com/surveys/demo-survey/?collector=10720
and the request works fine. The point is to assign collector ID 10720 to the survey. There is absolutely no issue with this request. However, when I add another parameter the collector ID is passed through as a get parameter but it does nothing. For example:
http://testsurveys.com/surveys/demo-survey/?code=123456&collector=10720
Why does the collector parameter work in the first scenario but not in the second?
That would depend on the code in your backend. There's nothing wrong with the second request. The fact that the parameters do get through to the other end is evidence of this. Try looking at your backend code and see why it isn't being processed properly. Were you coding it with any assumptions in mind that you ended up changing later? If you don't find anything, include the code in your question.

Performance issue on CInlineAction::runWithParams()

Based on a New Relic report, about 70% of the time spent handling a request was consumed by CInlineAction::runWithParams. In fact, Controller::action() took less than 20% of the total time.
At first I thought it might be because urlManager rewrites the request instead of the webserver, so I let the webserver handle the rewrite portion and disabled the whole urlManager configuration in my config file.
Based on the information provided by Xdebug, there is still not much difference.
I just wanted to ask whether it's natural for CInlineAction::runWithParams to take that much processing time, or whether there is something wrong with my configuration.
Thanks.
Even though I can't really tell why New Relic reports so much time spent in that method, I don't think there's anything wrong with your configuration. Yii creates a new CInlineAction object for each call to an inline action (i.e. an action defined inline in a controller as an action... function, hence the name). runWithParams is then called to run the currently requested action.
So it's quite natural that you see many calls to that method. The only suspicious thing inside it is that reflection is used to actually call the controller method, but that should not have such a dramatic impact on execution time. So I'd probably blame it on the way New Relic does its measuring.
If you really want to hunt it down, you could add some echo microtime() calls before each line in that method to find out where the cycles go.

Intercepting with XMLHttpRequest for a specific address using greasemonkey

I'm trying to write a Greasemonkey script that will work in both Chrome and Firefox: a script that blocks XMLHttpRequest calls to a certain hard-coded URL.
I am kind of new to this area and would appreciate some help.
thanks.
It's possible now using
// @run-at document-start
(see http://wiki.greasespot.net/Metadata_Block#.40run-at), but it needs more improvement; check this example:
http://userscripts-mirror.org/scripts/show/125936
This is almost impossible to do with Greasemonkey; it is the wrong tool for the job. Here's what to use, most effective first:
Set your hardware firewall, or router, to block the URL.
Set your software firewall to block the URL.
Use Adblock to block the URL.
Write a convoluted userscript that tries to block requests from one set of pages to a specific URL. Note that this potentially has to block inline src requests as well as AJAX, etc.
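Along the lines of that last option, here is a minimal sketch of the wrapping technique. The blocked URL is a placeholder, and it assumes you can reach the page's XMLHttpRequest prototype at document-start (in Greasemonkey's sandbox that may require unsafeWindow or script injection); the prototype is passed in as a parameter so the idea can be exercised outside a browser.

```javascript
// Wrap open() and send() on an XMLHttpRequest-like prototype so that
// requests to one hard-coded URL prefix are silently dropped.
function installBlocker(proto, blockedUrl) {
  const originalOpen = proto.open;
  const originalSend = proto.send;

  proto.open = function (method, url, ...rest) {
    if (typeof url === "string" && url.indexOf(blockedUrl) === 0) {
      this._blocked = true; // remember, so send() can be swallowed too
      return;               // never actually open the blocked request
    }
    return originalOpen.call(this, method, url, ...rest);
  };

  proto.send = function (...args) {
    if (this._blocked) return; // drop sends on blocked requests
    return originalSend.apply(this, args);
  };
}

// In a userscript you would then call something like:
// installBlocker(unsafeWindow.XMLHttpRequest.prototype, "http://example.com/tracker");
```

Note that, as the answer says, this only covers AJAX from pages the script runs on; inline src requests (images, scripts) go through a different path and are untouched.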