how to continue the request in mod_wsgi after processing the request - mod-wsgi

After processing the request in a mod_wsgi module, I want to continue the request as it would normally have been handled without the module. How can I do that?
def application(environ, start_response):
    # do some processing here...
    # ...then continue the request as it normally would be handled

If you mean you want to perform some task after the response has been sent, see:
http://code.google.com/p/modwsgi/wiki/RegisteringCleanupCode
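The pattern described on that page is roughly the following (a minimal sketch, not the complete code from the wiki): wrap the WSGI iterable so your callback runs from close(), which the server only calls once the full response has been sent to the client.

class Generator:
    def __init__(self, iterable, callback, environ):
        self._iterable = iterable
        self._callback = callback
        self._environ = environ

    def __iter__(self):
        for item in self._iterable:
            yield item

    def close(self):
        # called by the WSGI server after the response is complete
        try:
            if hasattr(self._iterable, 'close'):
                self._iterable.close()
        finally:
            self._callback(self._environ)

class ExecuteOnCompletion:
    def __init__(self, application, callback):
        self._application = application
        self._callback = callback

    def __call__(self, environ, start_response):
        result = self._application(environ, start_response)
        return Generator(result, self._callback, environ)

# usage: application = ExecuteOnCompletion(application, my_cleanup_function)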
Doing such tasks in-process can be problematic. You are better off submitting the details to a separate task system such as Celery, Redis Queue or Gearman and letting it handle the work. That way the request handler thread is released to handle other requests and you don't reduce the capacity of the WSGI server as far as handling HTTP requests is concerned.
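For example, a minimal sketch assuming Celery with a Redis broker (the task name, broker URL and record_id argument are placeholders, not anything from the question):

# tasks.py
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def post_process(record_id):
    # the slow work that used to run after the response goes here
    ...

Inside the WSGI application you would then only enqueue the job and return immediately, e.g. post_process.delay(record_id).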
If this is not what you are asking, you need to explain it a bit better as your description is a little confusing.

Related

Scrapy Redis: fetch next_request without waiting for idle signal

I am using the Scrapy framework to make API calls (broad crawls) and scrapy-redis to run it on a distributed network. I fetch the start URLs from Redis and then use middleware to make the subsequent requests. The response time of a task (initial request + set of subsequent requests) varies with the API parameters.
Since spiders in scrapy-redis rely on the spider idle signal to fetch start URLs, I am unable to utilize all the resources, as the spider waits for the batch request to be over (batch size = 100).
How can I tweak scrapy-redis so that it fetches the start URLs immediately after a task is over? I tried running multiple processes with redis-batch-size=1, but that didn't solve my problem, as each Scrapy process takes a lot of memory.

Handling cache warm-up with twisted and systemd

I have a simple twisted application which I run using a systemd service, executing a script, which subsequently executes a .tac file.
The application is structured as a JSON RPC endpoint (fastjsonrpc), built into a t.w.r.Resource, which is in a t.w.s.Site, served by a t.a.i.TCPServer, and the whole thing packed into a t.a.Application. This works fine.
Where I do run into trouble is when I try to warm up caches at startup. This warm-up process is pretty slow (~300 seconds), and makes systemd timeout and kill the process. Increasing the timeout is not really a viable option, since I wouldn't want this to block system boot.
Analogous code is used in a separate stack running on Flask from within Apache and wsgi. That server starts itself off and lets systemd go on while it takes its time building the caches. This behaviour is fine for me.
I've tried calling the warmup function using the following within the setup function of the t.w.r.Resource:
reactor.callLater(1, ep.warmup, None)
I've not yet tried using this from within systemd, and have been testing it from twistd directly on the command line. The server does work as expected, however it no longer responds to SIGINT (^C). Removing the callLater is all that's needed to let the server respond to SIGINT.
If the warmup function is called directly (not by callLater, i.e., the arrangement which makes systemd give up while waiting for warm up to complete), the resulting server also continues to respond to SIGINT.
Is there a better / good way to handle this sort of long-running warmup code?
Why would twistd / the reactor not respond to SIGINT? Am I missing something here?
Twisted is single-threaded. It sounds like your "cache warmup" code is blocking the reactor for those 300 seconds. One easy way to fix this would be to use deferToThread to let it run without blocking the reactor.
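For example, a minimal sketch (warmup here stands in for the blocking cache-building function from the question):

from twisted.internet import reactor
from twisted.internet.threads import deferToThread
from twisted.python import log

def warmup():
    # the blocking ~300 second cache-building work goes here
    ...

def start_warmup():
    d = deferToThread(warmup)   # runs in the reactor's thread pool
    d.addErrback(log.err)       # don't swallow failures silently
    return d

reactor.callWhenRunning(start_warmup)

This keeps the reactor free to serve requests (and to handle SIGINT) while the caches are being built.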

What happens AFTER Apache says "Script timed out before returning headers" to the running script?

I have a Perl web app served by Apache httpd using plain mod_cgi or optionally mod_perl with PerlHandler ModPerl::Registry. Recently the app encountered the error Script timed out before returning headers on some invocations and behaved differently afterwards: While some requests seemed to be processed successfully in the background, after httpd sent status 504 to the client, others didn't.
So how exactly does httpd behave AFTER it has reached its configured timeout and sent the error to the client? The request/response cycle is finished at that point, so I guess things like KeepAlive come into play to decide whether the TCP connection stays alive or not, etc. But what happens to the running script, and in which environment, e.g. mod_cgi vs. mod_perl?
Especially in mod_cgi, where new processes are started for each request, I would have guessed that httpd simply keeps the processes running. Because all our Perl files have a shebang, I'm not even sure whether httpd is able to track the processes and does so or not. That could be completely different with mod_perl, because in that case httpd is aware of the interpreters and what they are doing. In fact, the same operation which timed out using plain mod_cgi succeeded using mod_perl without any timeout, but even with a timeout in mod_cgi at least one request succeeded afterwards as well.
I find this question interesting for runtimes other than Perl as well, because they share the concept of plain mod_cgi vs. some persistent runtime embedded into the httpd processes or running in external daemons.
So, my question is NOT about how to make the error message go away. Instead I want to understand how httpd behaves AFTER the error occurred, because I don't seem to find much information on that topic. Everything out there is about increasing configuration values and trying to avoid the problem in the first place, which is fine, but not what I need to know currently.
Thanks!
Both mod_cgi and mod_cgid set a cleanup function on the request scope to kill the child process, but they do it in slightly different ways. This happens shortly after the timeout is reported (allowing a little time for mod_cgi to return control, the error response to be written, the request to be logged, etc.).
mod_cgi uses a core facility in httpd that does SIGTERM, sleeps for 3 seconds, then does a SIGKILL.
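As an illustration (a sketch in Python rather than Perl, since the mechanism is the same for any CGI program), a script can trap that SIGTERM and use the roughly three-second window to clean up before the SIGKILL arrives:

#!/usr/bin/env python
import signal
import sys

def on_sigterm(signum, frame):
    # release shared memory, locks, temp files, etc. -- quickly
    sys.exit(0)

signal.signal(signal.SIGTERM, on_sigterm)

print("Content-Type: text/plain\n")
# ... long-running work that might exceed Apache's Timeout ...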

Force Twilio to respond to request slowly in order to test app's handling

I'm debugging an application that sends SMS messages via the Twilio REST API. The other day we had a strange bug that we can't reproduce, and I think it may have been happening because the Twilio API took a very long time to respond (2-3 seconds) and the app didn't handle the delay well.
We're working on improving the app to better handle a scenario like this, but I'm not sure how to test if we've really fixed the issue. Is there a way to force Twilio to respond slowly, in order to test this?
I realize that I could make my own mock web service with a long delay and substitute it in for Twilio -- but I'd like to avoid that if possible. In particular, I'm using one of the Twilio helper libraries for all of my call-outs, and would like to avoid monkey-patching them if at all possible.
You might want to try configuring a proxy server in your tests that responds very slowly. Here's one that we use, in nginx. Note it requires the nginx-lua module.
location ~ /slow {
    # Proxy pass is necessary so the incoming request is accepted and
    # processed by nginx.
    proxy_pass http://127.0.0.1:11418;
}

server {
    listen 127.0.0.1:11418;

    location ~ / {
        # return a funny HTTP code here, so it's clear that the slow block
        # got hit.
        add_header X-Served-By slow-as-heck;
        content_by_lua 'ngx.sleep(25); ngx.exit(418)';
    }
}
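With that in place, the test only needs to point the HTTP client at the /slow location. A minimal sketch using Python's requests library (the URL and timeout here are assumptions for illustration):

import requests

try:
    requests.get('http://localhost/slow', timeout=5)   # backend sleeps 25 s
except requests.exceptions.Timeout:
    print('request timed out as expected')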
You can also try connecting to a dead IP that won't send back a TCP reset, like 10.255.255.1.
Hope it helps,
Kevin
Twilio evangelist here.
There is not currently a way to tell Twilio to simulate a slow HTTP response or an HTTP response timeout.
Depending on your platform there may be a way to catch a network timeout, which is what it sounds like you suspect caused the problem.
Hope that helps.

Apache CGI Timeout - how does it kill and/or notify the child?

I have some potentially long lived CGI applications which must clean up their environment regardless of whether they complete normally or if they're killed by Apache because they're taking too long. They're using shared memory so I can't rely on the operating system's normal process cleanup mechanisms.
How does Apache kill its CGI children when they're timing out? I can't find any documentation or specification for how it's done, nor whether it's possible for the child to intercept that so it can shut down cleanly.
I could not find any official Apache documentation on this, but the following script shows that CGI scripts are sent SIGTERM on timeout, not SIGKILL (at least in my version of Apache, 2.2.15):
#!/usr/bin/perl
use strict;
use warnings;
use sigtrap 'handler' => \&my_handler, 'normal-signals';
use CGI;

sub my_handler {
    my ($sig) = @_;
    open my $fh, ">", "/var/www/html/signal.log" or die $!;
    print $fh "Caught SIG$sig";
    close $fh;
}

sleep 10 while 1;
Output:
Caught SIGTERM
Nope, Apache sends a KILL signal, and that signal cannot be caught or handled, so a signal handler does nothing in this case.
Looks like Apache doesn't do anything? I just added a signal handler to one of my Perl CGI scripts that Apache timed out on, and I got nothing :(
Bit of a shame, really.
Note that if these tasks really take too long and the client isn't actually expecting a reply, you could instead start a background process on your server whenever you receive such a request.
This of course means that you probably want to make sure you don't start the background process more than a certain number of times (possibly just once), and you can have that process save information in a file or shared memory so the client can check progress.
Not allowing the background process to be started too many times will save your server memory/CPU; otherwise it will become unresponsive.
And that way you do not have to worry about Apache killing your long process, since there are no longer any timeout concerns with it.
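A minimal sketch of that idea in Python (the lock file, log path and worker command are made up for illustration; the original stack is Perl, but the approach is the same):

#!/usr/bin/env python
import os
import subprocess

LOCK_PATH = '/tmp/long_task.lock'           # crude guard against duplicate starts

def start_background_task():
    if os.path.exists(LOCK_PATH):
        return False                        # already running or recently started
    open(LOCK_PATH, 'w').close()
    subprocess.Popen(
        ['/usr/local/bin/long_task'],       # hypothetical worker program
        stdout=open('/tmp/long_task.log', 'w'),
        stderr=subprocess.STDOUT,
        start_new_session=True,             # detach from the CGI process
    )
    return True

print("Content-Type: text/plain\n")
print('started' if start_background_task() else 'already running')

The CGI request returns immediately, and the detached worker is never subject to Apache's timeout; the client can poll the log or lock file for progress.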