Hiawatha CGI: writing to the client as soon as a command gets executed

I have a CGI script run by the Hiawatha web server that needs to:
1. return some data to the client,
2. do some system work (may take 20-30 seconds),
3. and then return yet more data.
So far I haven't been able to achieve this result: the script doesn't write data as the commands get executed, rather, it writes everything in a single shot when its execution ends. Is it even possible to achieve with Hiawatha what I described above? Thank you.

I figured it out: you need to set the WaitForCGI = yes option in the config file.
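Independent of the server setting, the script itself has to flush its output after each chunk, otherwise the language runtime may hold the data in its own buffer until exit. A minimal sketch in Python (not the asker's script; `slow_system_work` is a hypothetical placeholder for the 20-30 seconds of system work):

```python
import time


def slow_system_work(seconds):
    """Hypothetical stand-in for the slow system work."""
    time.sleep(seconds)


def run(out, work_seconds=20.0):
    # CGI output starts with headers, then a blank line, then the body.
    out.write("Content-Type: text/plain\r\n\r\n")
    out.write("first chunk of data\n")
    out.flush()  # hand the first chunk to the server before the slow part

    slow_system_work(work_seconds)

    out.write("second chunk of data\n")
    out.flush()
```

A CGI entry point would call `run(sys.stdout)`. Whether the client actually sees the first chunk early still depends on the server and any proxy not buffering the response.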

Related

PhantomJS - set time limit on page.open()? Or workaround?

Using PhantomJS and bash, I'm working on a little piece of anti-malware that reads a web page, grabs all the domains that are delivering assets to the browser, then prints each server's country of origin. It works fine except for one site that has a... uh... 'suboptimal' piece of javascript that calls to an external server every 5 seconds. PhantomJS just loads the resource over and over and over, page.open() never finishes, and page.onLoadFinished() is never called.
Is there a way around this? Can I set a time limit on page.load()? I guess, as a workaround, can I set a time limit on the Linux process?
Thanks in advance, and if anyone is interested in a copy of this script let me know and I'll post it somewhere public.
I solved this problem using the solutions given here to set an execution time limit on the phantomjs command and kill it if needed.
Command line command to auto-kill a command after a certain amount of time
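For illustration, the same "kill it after N seconds" idea can be sketched in Python with the standard `subprocess` module. This is not the poster's bash solution; the helper name `run_with_limit` is made up for this example.

```python
import subprocess
import sys


def run_with_limit(cmd, limit_seconds):
    """Run cmd; return its exit code, or None if it overran the limit."""
    try:
        completed = subprocess.run(cmd, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None
    return completed.returncode
```

Something like `run_with_limit(["phantomjs", "script.js", url], 30)` would then give up after 30 seconds (`script.js` and `url` are placeholders). From a shell, the GNU coreutils `timeout` command covers the same need.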

execute system command on rails - not working in production

On development everything works great. On production, however, this line of code in a controller is not working:
output = `mclines #{paramFileName} #{logFileName} #{outputFileName}`
where mclines is a C program, and the rest are names of files. mclines is not executed on the production server, but it is on my laptop. I have no idea what to fix. I have been trying different things for hours, but the truth is that I'm quite lost. In production SSL is on; that's the only major difference.
If I execute the command in the shell, it gets executed. When I say it doesn't get executed, it is because the first thing it should do is print some info to a file, and it doesn't. The server, like my laptop, is running Ubuntu, but I have no idea which logs could be useful to read. The system log had nothing useful.
Any ideas that can lead to find the culprit are welcome.
Make sure mclines really exists on the production server, and use the full path to the mclines executable, as in
output = `/full/path/to/mclines #{paramFileName} #{logFileName} #{outputFileName}`.
Reference this
Try to print out your exit status code as:
$?.exitstatus
after the command...
or as pointed out in this link you can always use popen3/popen4 for better handling of input/output for system commands...
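The checks suggested above (resolve the full path, look at the exit status, capture stderr) can be sketched as follows. This is written in Python purely for illustration; in Ruby the `Open3` module plays the same role. The helper name `run_and_report` is an assumption of this example.

```python
import shutil
import subprocess


def run_and_report(program, *args):
    """Resolve program on PATH, run it, return (exit_code, stdout, stderr)."""
    full_path = shutil.which(program)
    if full_path is None:
        # The web server's PATH is often shorter than an interactive shell's.
        raise FileNotFoundError(f"{program!r} not found on this process's PATH")
    result = subprocess.run([full_path, *args], capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr
```

A non-zero exit code or a message on stderr usually narrows the problem down to a missing binary, a permissions issue, or a wrong working directory.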

What to do if I have a CGI that runs for several minutes before outputting data, and Apache times it out?

I have a CGI script that takes a really long time to execute. Long story short, it needs to process a lot of data, run a bunch of slow commands, and make some slow web queries, during which time it doesn't output anything, and when it's done, it finally prints its results out in JSON format. It takes several minutes to run, which is longer than the Timeout directive set in my Apache web server's httpd.conf.
I am not at liberty to change that Timeout value globally for everyone on the entire server. I thought of maybe overriding that in a per-directory basis using a .htaccess file, but it looks like the Timeout directive is not in .htaccess context, so that cannot be done. From what I understand, my script must continually output data, and if it doesn't output data for the Timeout number of seconds, Apache gives up.
I am getting the following error in Apache: (70007)The timeout specified has expired: ap_content_length_filter: apr_bucket_read() failed
What can I do?
Well, to offer the stupidly simple solution, why not just make the script occasionally produce some output while it's working? You could just print "Processing..." every few steps, or if you want to be more creative, have it print some status updates to indicate what it's doing. Or if you're worried about getting bored, print out a funny poem a line at a time. (Kind of reminds me of http://pages.cs.wisc.edu/~veeve/404.html)
If you don't want to do that, the next thing that comes to my mind is to use asynchronous processing. Basically, you'll have to spawn a separate process from the CGI script, and do the lengthy processing in that separate process. The main CGI script itself just outputs a simple HTML page that says the process is working and then exits. That HTML page would also have to contain some logic for periodically checking to see whether the background process on the server has finished. It could be a <meta http-equiv="refresh" ...> HTML element, or you could use AJAX.
I came up with a solution.
I would start outputting a dummy HTTP header, like Dummy: ..., and I can put whatever data I want as the value of that header, and it wouldn't affect the rest of the output. So I would output a character to that dummy value every minute or so, preventing it from timing out. And when I am ready, I can print a line return and continue printing the rest of my (real) HTTP headers and the content of the document.
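A rough sketch of this dummy-header trick in Python, under the assumption that the server accepts a header value that arrives slowly (the poster reports it worked with Apache; other servers may buffer or cap header size). The header name `X-Dummy` and the function name are made up for this example.

```python
def respond_with_keepalive(out, steps, body):
    """Pad a throwaway header while working, then send the real response."""
    out.write("X-Dummy: ")  # header line deliberately left unfinished
    out.flush()
    for step in steps:
        step()              # one slice of the slow work
        out.write(".")      # one more character of the dummy header value
        out.flush()
    out.write("\r\n")       # finish the dummy header
    out.write("Content-Type: application/json\r\n\r\n")
    out.write(body)
    out.flush()
```

Each step has to finish within the server's Timeout for this to help, so the slow job needs to be split into slices shorter than that.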
A very pragmatic approach could be to start a background job and email the response to the client. 10 to 1 they'd prefer that rather than having a browser window open all afternoon.

PHP script stops running arbitrarily with no errors

I have a PHP script that seemed to stop running after about 20 minutes.
To try to figure out why, I made a very simple script to see how long it would run without any complex code to confuse me.
I found that the same thing was happening with this simple infinite loop. At some point between 15 and 25 minutes of running, it stops without any message or error. The browser says "Done".
I've been over every single possible thing I could think of:
set_time_limit ( session.gc_maxlifetime in the php.ini)
memory_limit
max_execution_time
The point that the script is stopped is not consistent. Sometimes it will stop at 15 minutes, sometimes 22 minutes.
Please, any help would be greatly appreciated.
It is hosted on a 1and1 server. I contacted them and they don't provide support for bugs caused by developers.
At some point your browser times out and stops loading the page. If you want to test, open up the command line and run the code in there. The script should run indefinitely.
Have you considered just running the script from the command line, e.g.:
php script.php
and have the script flush out a message every so often that it's still running:
<?php
while (true) {
    doWork();                 // placeholder for one unit of the real work
    echo "still alive...\n";
    flush();                  // push the message out immediately
}
In such cases, I turn on all the development settings in php.ini, on a development server of course. This displays many more messages, including deprecation warnings.
In my experience of debugging long running php scripts, the most common cause was memory allocation failure (Fatal error: Allowed memory size of xxxx bytes exhausted...)
I think what you need to find out is the exact time that it stops (you can set an initial time and keep dumping out the current time minus initial). There is something on the server side that is stopping the file. Also, consider doing an ini_get to check to make sure the execution time is actually 0. If you want, set the time limit to 30 and then EVERY loop you make, continue setting at 30. Every time you call set_time_limit, the counter resets and this might allow you to bypass the actual limits. If this still isn't working, there is something on 1and1's servers that might kill the script.
Also, did you try the ignore_user_abort?
I appreciate everyone's comments. Especially James Hartig's, you were very helpful and sent me on the right path.
I still don't know what the problem was. I got it to run on the server over SSH, just by using the exec() command as well as ignore_user_abort(). But it would still time out.
So, I just had to break it into small pieces that will run for only about 2 minutes each, and use session variables/arrays to store where I left off.
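The "small pieces plus a saved position" idea can be sketched like this. It is a Python illustration rather than the poster's PHP, and it stores the position in a JSON checkpoint file instead of session variables; the function and file names are assumptions of this example.

```python
import json
import os
import time


def process_slice(items, handle, checkpoint_path, budget_seconds=120.0):
    """Process items from the saved position until the time budget runs out.

    Returns True once every item has been handled.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            start = json.load(fh)["next_index"]

    deadline = time.monotonic() + budget_seconds
    index = start
    while index < len(items) and time.monotonic() < deadline:
        handle(items[index])
        index += 1

    # Remember where to resume on the next request.
    with open(checkpoint_path, "w") as fh:
        json.dump({"next_index": index}, fh)
    return index >= len(items)
```

Each request calls `process_slice` once and the client keeps re-requesting until it returns True, so no single run exceeds roughly two minutes.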
I'm glad to be done with this fairly simple project now, and am supremely pissed at 1and1. Oh well...
I think this is caused by some process monitor killing off "zombie processes" in order to allow resources for other users.
Run the exec using "2>&1" to log anything including stderr.
In my output I managed to catch this:
...
script.sh: line 4: 15932 Killed php5-cli -d max_execution_time=0 -d memory_limit=128M myscript.php
So something (an external force, not PHP itself) is killing my process!
I use IdWebSpace which is excellent BTW but I think most shared hosting providers impose this resource/process control mechanism just to be sane.

How can I speed up batch processing job in Coldfusion?

Every once in a while I am fed a large data file that my client uploads and that needs to be processed through CFML. The problem is that if I put the processing on a CF page, then it runs into a timeout issue after 120 seconds. I was able to move the processing code to a CFC where it seems to not have the timeout issue. However, sometime during the processing, it causes ColdFusion to crash and it has to be restarted. There are a number of database queries (5 or more, a mixture of updates and selects) required for each line (8,000+) of the file I go through, as well as other logic provided by me in the form of CFML.
My question is what would be the best way to go through this file. One caveat, I am not able to move the file to the database server and process it entirely with the DB. However, would it be more efficient to pass each line to a stored procedure that took care of everything? It would still be a lot of calls to the database, but nothing compared to what I have now. Also, what would be the best way to provide feedback to the user about how much of the file has been processed?
Edit:
I'm running CF 6.1
I just did a similar thing and use CF often for data parsing.
1) Maintain a file upload table (Parent table). For every file you upload you should be able to keep a list of each file and what status it is in (uploaded, processed, unprocessed)
2) Temp table to store all the rows of the data file. (child table) Import the entire data file into a temporary table. Attempting to do it all in memory will inevitably lead to some errors. Each row in this table will link to a file upload table entry above.
3) Maintain a processing status - For each row of the datafile you bring in, set a "process/unprocessed" tag. This way if it breaks, you can start from where you left off. As you run through each line, set it to be "processed".
4) Transaction - use cftransaction if possible to commit all of it at once, or at least one line at a time (with your 5 queries). That way if something goes boom, you don't have one row of data that is half computed/processed/updated/tested.
5) Once you're done processing, set the file name entry in the table in step 1 to be "processed"
By using the approach above, if something fails, you can set it to start where it left off, or at least have a clearer path of where to start investigating, or worst case clean up in your data. You will have a clear way of displaying to the user the status of the current upload processing, where it's at, and where it left off if there was an error.
If you have any questions, let me know.
Other thoughts:
You can increase timeouts, give the VM more memory, put it in 64 bit but all of those will only increase the capacity of your system so much. It's a good idea to do these per call and do it in conjunction with the above.
Java has some neat file processing libraries that are available as CFCs. If you run into a lot of issues with speed, you can use one of those to read the file into a variable and then into the database.
If you are playing with XML, do not use ColdFusion's XML parsing. It works well for smaller files and has fits when things get bigger. There are several CFCs out there (check RIAForge, etc.) that wrap some excellent Java libraries for parsing XML data. You can then create a cfquery manually if need be with this data.
It's hard to tell without more info, but from what you have said I'll throw out three ideas.
The first thing, is with so many database operations, it's possible that you are generating too much debugging. Make sure that under Debug Output settings in the administrator that the following settings are turned off.
Enable Robust Exception Information
Enable AJAX Debug Log Window
Request Debugging Output
The second thing I would do is look at those DB queries and make sure they are optimized. Make sure selects are using indexes, etc.
The third thing I would suspect is that the file hanging out in memory is probably suboptimal.
I would try looping through the file using file looping:
<cfloop file="#VARIABLES.filePath#" index="VARIABLES.line">
    <!--- Code to go here --->
</cfloop>
Have you tried an event gateway? I believe those threads are not subject to the same timeout settings as page request threads.
SQL Server Integration Services (SSIS) is the recommended tool for complex ETL (Extract, Transform, and Load) work, which is what this sounds like. (It can be configured to access files on other servers.) The question might be, can you work up an interface between ColdFusion and SSIS?
If you can upgrade to CF8, you can take advantage of cfloop file="", which would give you greater speed, and the file would not be put in memory (which is probably the cause of the crashing).
Depending on the situation you are encountering you could also use cfthread to speed up processing.
Currently, an event gateway is the only way to get around the timeout limits of an HTTP request cycle. CF does not have a way to process CF pages offline, that is, there is no command-line invocation (one of my biggest gripes about CF: very little offline processing).
Your best bet is to use an Event Gateway or rewrite your parsing logic in straight Java.
I had to do the same thing. Ben Nadel has written a bunch of great articles that use Java file I/O to let you read and write files more speedily.
Really helped improve the performance of our csv importing application.