What to do if I have a CGI that runs for several minutes before outputting data, and Apache times it out? - apache

I have a CGI script that takes a really long time to execute. Long story short, it needs to process a lot of data, run a bunch of slow commands, and make some slow web queries, during which time it doesn't output anything, and when it's done, it finally prints its results out in JSON format. It takes several minutes to run, which is longer than the Timeout directive set in my Apache web server's httpd.conf.
I am not at liberty to change that Timeout value globally for everyone on the entire server. I thought of maybe overriding it on a per-directory basis using a .htaccess file, but it looks like the Timeout directive is not allowed in .htaccess context, so that cannot be done. From what I understand, my script must continually output data, and if it produces nothing for Timeout seconds, Apache gives up.
I am getting the following error in Apache: (70007)The timeout specified has expired: ap_content_length_filter: apr_bucket_read() failed
What can I do?

Well, to offer the stupidly simple solution, why not just make the script occasionally produce some output while it's working? You could just print "Processing..." every few steps, or if you want to be more creative, have it print some status updates to indicate what it's doing. Or if you're worried about getting bored, print out a funny poem a line at a time. (Kind of reminds me of http://pages.cs.wisc.edu/~veeve/404.html)
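To make that concrete, here's a rough sketch. The question doesn't say what language the CGI is written in, so this uses Python purely for illustration, and do_slow_step() is a hypothetical stand-in for the real work:
#!/usr/bin/env python3
# Sketch only: emit a status line between slow steps so Apache sees output
# before its Timeout expires. do_slow_step() stands in for the real work.
import json
import sys
import time

def do_slow_step(step):
    time.sleep(30)                            # pretend this is a slow command or web query
    return {"step": step}

print("Content-Type: text/plain")
print()                                       # blank line ends the CGI headers
sys.stdout.flush()

results = []
for step in range(10):
    print("Processing step %d..." % step)     # the keepalive chatter
    sys.stdout.flush()                        # make sure it actually goes out
    results.append(do_slow_step(step))

print(json.dumps(results))                    # the real payload, at the end
The obvious downside is that the status chatter ends up in the response body along with the JSON, so whatever consumes the output has to tolerate it (and other modules such as mod_deflate can still buffer the output on the Apache side).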
If you don't want to do that, the next thing that comes to my mind is to use asynchronous processing. Basically, you'll have to spawn a separate process from the CGI script, and do the lengthy processing in that separate process. The main CGI script itself just outputs a simple HTML page that says the process is working and then exits. That HTML page would also have to contain some logic for periodically checking to see whether the background process on the server has finished. It could be a <meta http-equiv="refresh" ...> HTML element, or you could use AJAX.
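A minimal sketch of that idea, again in Python for illustration; long_job.py, check_job.cgi, and the /var/tmp/jobs path are all made-up names:
#!/usr/bin/env python3
# The CGI just kicks off a detached worker and returns a page that reloads
# itself; a separate (hypothetical) check_job.cgi would serve the JSON once
# the worker has written its result file.
import os
import subprocess
import uuid

job_id = uuid.uuid4().hex
os.makedirs("/var/tmp/jobs", exist_ok=True)
result_path = "/var/tmp/jobs/%s.json" % job_id

subprocess.Popen(
    ["/usr/local/bin/long_job.py", "--output", result_path],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
    start_new_session=True,        # keep the worker alive after this CGI exits
)

print("Content-Type: text/html")
print()
print('<html><head>')
print('<meta http-equiv="refresh" content="10;url=/cgi-bin/check_job.cgi?id=%s">' % job_id)
print('</head><body>Your request is being processed...</body></html>')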

I came up with a solution.
I start by outputting a dummy HTTP header, like Dummy: .... I can put whatever I want as the value of that header, and it doesn't affect the rest of the output. While the work is running, I output one more character of that dummy value every minute or so, which keeps Apache from timing out. When I am ready, I print a line break to end that header, then the rest of my (real) HTTP headers, a blank line, and the content of the document.
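For anyone wanting to see the shape of this (the answer doesn't give code, so this is my own sketch in Python CGI): run the real work in a thread and drip characters into the throwaway header until it finishes. slow_work() is a placeholder.
#!/usr/bin/env python3
# Sketch of the dummy-header trick: keep the connection busy by slowly
# extending the value of a header nobody will look at, then finish the
# headers and send the real document.
import json
import sys
import threading
import time

result = {}

def slow_work():
    time.sleep(300)                 # stand-in for the multi-minute job
    result["answer"] = 42

worker = threading.Thread(target=slow_work)
worker.start()

sys.stdout.write("Dummy: ")         # begin the throwaway header line
sys.stdout.flush()
while worker.is_alive():
    worker.join(timeout=60)         # wait up to a minute at a time
    sys.stdout.write(".")           # one more character of header value
    sys.stdout.flush()

sys.stdout.write("\n")                                  # end the Dummy header
sys.stdout.write("Content-Type: application/json\n\n")  # real headers + blank line
sys.stdout.write(json.dumps(result))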

A very pragmatic approach could be to start a background job and email the response to the client. Ten to one they'd prefer that to having a browser window open all afternoon.

Related

Interrupt, stop or timeout LotusScript Agent internally

I would like to timeout a LotusScript agent internally. The Agent Manager has a timeout of 60 min with one task, which is needed for some agents. In my case the agent normally runs 7 - 10 min, but it might hang on opening a mail calendar profile. It just hangs, does nothing but consuming CPU and blocking other agents from running.
Is there any way to stop/interrupt the agent internally, so that I can set a timeout of 30 sec for that operation and if it does not succeed stop the agent?
Problematic Code Snippet:
Set notesDocument = notesDatabase.GetProfileDocument("calendarprofile")
Error on the mail server a short time after the problem (a different server than the agent server):
SchedMgr: Error processing calendar profile document (NoteID: NT00000902) in database XXXX/XXXX.nsf: Document has been deleted
By "internally" I mean without an external agent, process, and so on.
If no single operation is blocking, you can instantiate a timer, check periodically whether your time is consumed, and end your code gracefully.
If your code is blocked on one operation (as in your example), there is nothing you can do: there is no way to preempt the task or interrupt it.
I had the same issue when using OLE to write to Excel. When a dialog box was opened in Excel, the task (run by HTTP) just stopped forever. The maximum execution time didn't even apply.
As per @Emmanuel's answer, I don't believe you can do anything to set a timeout on the operation that is hanging. However, since you know about the problem, you might be able to work around it using the NotesNoteCollection class. I.e., something like this:
Dim c As NotesNoteCollection
Set c = db.CreateNoteCollection(False)   ' start with an empty selection
c.SelectProfiles = True                  ' select only profile documents
Call c.BuildCollection
Then loop through the collection using id = c.GetFirstNoteId and id = c.GetNextNoteId(id), in a pattern similar to the one you use for a regular NotesDocumentCollection. Retrieve each profile document with Set doc = db.GetDocumentByID(id) and check doc.IsValid to make sure it's not a deletion stub (which seems to be the root of your problem). Then check whether it is the calendar profile by calling doc.GetItemValue("$Name") and examining the 0th element of the returned array: it's a string containing the prefix "$profile" followed by an underscore, a number (three digits, always?), then the profile doc name and another underscore. (In some profile docs, $Name also contains a username, which IIRC comes after the second underscore, but that's not the case with the calendarprofile doc. Use NotesPeek to examine a mail database to see the format.) Then, once you've verified that the document exists and is not a stub, go ahead and use db.GetProfileDocument to make sure you're working on the cached version of the note.
You might also want to investigate why your code is hanging. I've not run into a situation like this before, but I'm wondering if there might be an excessive number of deletion stubs in the database and your code is triggering some sort of cleanup operation on them that is taking a very long time. That's just a guess; this behavior isn't normal, and even though I believe you can work around it, that might not be true. Building and iterating the NotesNoteCollection might even trigger the same bad behavior for all I know.

PhantomJS - set time limit on page.open()? Or workaround?

Using PhantomJS and bash, I'm working on a little piece of anti-malware that reads a web page, grabs all the domains that are delivering assets to the browser, then prints each server's country of origin. It works fine except for one site that has a... uh... 'suboptimal' piece of javascript that calls to an external server every 5 seconds. PhantomJS just loads the resource over and over and over, page.open() never finishes, and page.onLoadFinished() is never called.
Is there a way around this? Can I set a time limit on page.open()? Or, I guess as a workaround, can I set a time limit on the Linux process?
Thanks in advance, and if anyone is interested in a copy of this script let me know and I'll post it somewhere public.
I solved this problem using the solutions given here to set an execution time limit on the phantomjs command and kill it if needed:
Command line command to auto-kill a command after a certain amount of time
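That linked question is mostly about the coreutils timeout command; if you're driving phantomjs from a wrapper script anyway, the same idea looks roughly like this (a sketch in Python, with the script name and URL as placeholders):
# Run phantomjs with a hard time limit and treat a hang as "no result".
import subprocess

def run_phantomjs(script, url, limit=60):
    try:
        result = subprocess.run(
            ["phantomjs", script, url],
            capture_output=True, text=True,
            timeout=limit,          # subprocess kills the child when this expires
        )
        return result.stdout
    except subprocess.TimeoutExpired:
        return None

output = run_phantomjs("grab_domains.js", "http://example.com")
print(output if output is not None else "phantomjs timed out")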

Hiawatha CGI: writing to the client as soon as a command gets executed

I have a CGI script run by Hiawatha web server that needs to
return some data to the client,
do some system work (may take 20-30 seconds)
and then return yet more data.
So far I haven't been able to achieve this result: the script doesn't write data as the commands get executed; rather, it writes everything in a single shot when its execution ends. Is it even possible to achieve with Hiawatha what I described above? Thank you.
I figured it out: you need to set the WaitForCGI = yes option in the config file.

PHP script stops running arbitrarily with no errors

I have a PHP script that seemed to stop running after about 20 minutes.
To try to figure out why, I made a very simple script to see how long it would run without any complex code to confuse me.
I found that the same thing was happening with this simple infinite loop. At some point between 15 and 25 minutes of running, it stops without any message or error. The browser says "Done".
I've been over every single possible thing I could think of:
set_time_limit() (and session.gc_maxlifetime in php.ini)
memory_limit
max_execution_time
The point that the script is stopped is not consistent. Sometimes it will stop at 15 minutes, sometimes 22 minutes.
Please, any help would be greatly appreciated.
It is hosted on a 1and1 server. I contacted them and they don't provide support for bugs caused by developers.
At some point your browser times out and stops loading the page. If you want to test, open up the command line and run the code in there. The script should run indefinitely.
Have you considered just running the script from the command line, e.g.:
php script.php
and having the script flush out a message every so often so you know it's still running:
<?php
while (true) {
    doWork();                 // placeholder for the real work
    echo "still alive...\n";  // heartbeat so you can see it's making progress
    flush();                  // push the output out immediately
}
In such cases, I turn on all the development settings in php.ini (on a development server, of course). This displays many more messages, including deprecation warnings.
In my experience of debugging long running php scripts, the most common cause was memory allocation failure (Fatal error: Allowed memory size of xxxx bytes exhausted...)
I think what you need to find out is the exact time at which it stops (you can record an initial time and keep dumping out the current time minus the initial). There is something on the server side that is stopping the script. Also, consider doing an ini_get() to make sure the execution time limit is actually 0. If you want, set the time limit to 30 and then keep setting it back to 30 on EVERY iteration of your loop; every time you call set_time_limit() the counter resets, which might let you bypass the actual limits. If this still isn't working, there is something on 1and1's servers that might be killing the script.
Also, did you try ignore_user_abort()?
I appreciate everyone's comments, especially James Hartig's; you were very helpful and sent me on the right path.
I still don't know what the problem was. I got it to run on the server over SSH, just by using the exec() command as well as ignore_user_abort(). But it would still time out.
So, I just had to break it into small pieces that will run for only about 2 minutes each, and use session variables/arrays to store where I left off.
I'm glad to be done with this fairly simple project now, and am supremely pissed at 1and1. Oh well...
I think this is caused by some process monitor killing off "zombie processes" in order to allow resources for other users.
Run the exec using "2>&1" to log anything including stderr.
In my output I managed to catch this:
...
script.sh: line 4: 15932 Killed php5-cli -d max_execution_time=0 -d memory_limit=128M myscript.php
So something (an external force, not PHP itself) is killing my process!
I use IdWebSpace, which is excellent BTW, but I think most shared hosting providers impose this kind of resource/process control just to be sane.

How can I speed up batch processing job in Coldfusion?

Every once in a while I am fed a large data file that my client uploads and that needs to be processed through CFML. The problem is that if I put the processing on a CF page, it runs into a timeout issue after 120 seconds. I was able to move the processing code to a CFC, where it seems not to have the timeout issue. However, sometime during the processing it causes ColdFusion to crash and it has to be restarted. There are a number of database queries (5 or more, a mixture of updates and selects) required for each line (8,000+) of the file, as well as other logic provided by me in the form of CFML.
My question is: what would be the best way to go through this file? One caveat: I am not able to move the file to the database server and process it entirely with the DB. That said, would it be more efficient to pass each line to a stored procedure that took care of everything? It would still be a lot of calls to the database, but nothing compared to what I have now. Also, what would be the best way to provide feedback to the user about how much of the file has been processed?
Edit:
I'm running CF 6.1
I just did a similar thing and use CF often for data parsing.
1) Maintain a file upload table (Parent table). For every file you upload you should be able to keep a list of each file and what status it is in (uploaded, processed, unprocessed)
2) Temp table (child table) to store all the rows of the data file. Import the entire data file into a temporary table; attempting to do it all in memory will inevitably lead to some errors. Each row in this table will link to a file upload table entry above.
3) Maintain a processing status - For each row of the datafile you bring in, set a "process/unprocessed" tag. This way if it breaks, you can start from where you left off. As you run through each line, set it to be "processed".
4) Transaction - use cftransaction if possible to commit all of it at once, or at least one line at a time (with your 5 queries). That way if something goes boom, you don't have one row of data that is half computed/processed/updated/tested.
5) Once you're done processing, set the file name entry in the table in step 1 to be "processed"
By using the approach above, if something fails you can restart from where it left off, or at least have a clearer idea of where to start investigating, or, worst case, clean up your data. You will have a clear way of displaying to the user the status of the current upload: where it's at, and where it left off if there was an error.
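Purely to illustrate the bookkeeping (not CF code), here is the same parent/child status idea sketched in Python with SQLite; the table and column names are invented, and process_row() stands in for the per-line queries and logic:
import sqlite3

conn = sqlite3.connect("uploads.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS file_upload (id INTEGER PRIMARY KEY, name TEXT,
                                        status TEXT DEFAULT 'uploaded');
CREATE TABLE IF NOT EXISTS file_row (id INTEGER PRIMARY KEY, upload_id INTEGER,
                                     line TEXT, status TEXT DEFAULT 'unprocessed');
""")

def process_row(line):
    pass                              # placeholder for the real per-line work

def process_upload(upload_id):
    rows = conn.execute("SELECT id, line FROM file_row "
                        "WHERE upload_id = ? AND status = 'unprocessed'",
                        (upload_id,)).fetchall()
    for row_id, line in rows:
        with conn:                    # one transaction per line, like cftransaction
            process_row(line)
            conn.execute("UPDATE file_row SET status = 'processed' WHERE id = ?",
                         (row_id,))
    with conn:
        conn.execute("UPDATE file_upload SET status = 'processed' WHERE id = ?",
                     (upload_id,))

# If the job dies partway through, running process_upload() again resumes at
# the first row still marked 'unprocessed'.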
If you have any questions, let me know.
Other thoughts:
You can increase timeouts, give the VM more memory, or move to 64-bit, but all of those only increase the capacity of your system so much. It's a good idea to make those changes per call and to do it in conjunction with the approach above.
Java has some neat file processing libraries that are available as CFCs. If you run into a lot of issues with speed, you can use one of those to read the file into a variable and then into the database.
If you are playing with XML, do not use ColdFusion's XML parsing. It works well for smaller files but has fits when things get bigger. There are several CFCs out there (check RIAForge, etc.) that wrap some excellent Java libraries for parsing XML data. You can then create a cfquery manually if need be with this data.
It's hard to tell without more info, but from what you have said I'll throw out three ideas.
The first thing is that with so many database operations, it's possible you are generating too much debugging output. Make sure that, under Debug Output Settings in the Administrator, the following settings are turned off:
Enable Robust Exception Information
Enable AJAX Debug Log Window
Request Debugging Output
The second thing I would do is look at those DB queries and make sure they are optimized. Make sure selects are using indices, etc.
The third thing I would suspect is that the file hanging out in memory is probably suboptimal.
I would try looping through the file using file looping:
<cfloop file="#VARIABLES.filePath#" index="VARIABLES.line">
<!--- Code to go here --->
</cfloop>
Have you tried an event gateway? I believe those threads are not subject to the same timeout settings as page request threads.
SQL Server Integration Services (SSIS) is the recommended tool for complex ETL (Extract, Transform, and Load) work, which is what this sounds like. (It can be configured to access files on other servers.) The question might be: can you work up an interface between ColdFusion and SSIS?
If you can, upgrade to CF8 and take advantage of cfloop file="", which would give you greater speed, and the file would not be put in memory (which is probably the cause of the crashing).
Depending on the situation you are encountering you could also use cfthread to speed up processing.
Currently, an event gateway is the only way to get around the timeout limits of an HTTP request cycle. CF does not have a way to process CF pages offline; that is, there is no command-line invocation (one of my biggest gripes about CF - very little offline processing).
Your best bet is to use an Event Gateway or rewrite your parsing logic in straight Java.
I had to do the same thing. Ben Nadel has written a bunch of great articles on using Java file I/O to read and write files more speedily, etc.
It really helped improve the performance of our CSV importing application.