Debug and avoid periodic REBOL2 error, that try[] does not(?) catch? - rebol

I've hit an apparently uncatchable error while toying around with Rebol/Core (278-3-1) to make a kind of web server that serves a static text containing a redirect link to a new service location.
The error appears to be located in example code written by Carl Sassenrath himself, back in 2006, so I'm kind of baffled that there could be an undetected error after all these years.
I have three of these scripts running simultaneously, monitoring three individual ports. Essentially the script works as it should: when accessed repeatedly with multiple browsers at once (on all parallel scripts) it appears to be pretty stable. But one after another they fail. Sometimes after 2 minutes, sometimes after 20 minutes, and after adding the print statements sometimes even after 60 minutes, but eventually they will fail like this:
** Script Error: Out of range or past end
** Where: forever
** Near: not empty? request: first http-port
I've tried wrapping just about every part of the program loop in a try[][exception], but the error still occurs. Unfortunately my search-fu appears to be weak this time of year, as I haven't found anything that could explain the problem.
The code is a cut-down version of Carl Sassenrath's Tiny Web Server, slightly modified to bind to a specific IP, and to emit HTML instead of loading files:
REBOL [title: "TestMovedServer"]
AppName: "Test"
NewSite: "http://test.myserver.org"
listen-port: open/lines tcp://:81 browse http://10.100.44.6?
buffer: make string! 1024 ; will auto-expand if needed
forever [
http-port: first wait listen-port
clear buffer
while [not empty? request: first http-port][
print request
repend buffer [request newline]
print "----------"
]
repend buffer ["Address: " http-port/host newline]
print buffer
Location: ""
mime: "text/html"
parse buffer ["get" ["http" | "/ " | copy Location to " "]]
data: rejoin [{
<HTML><HEAD><TITLE>Site Relocated</TITLE></HEAD>
<BODY><CENTER><BR><BR><BR><BR><BR><BR>
<H1>} AppName { have moved to } NewSite {</H1>
<BR><BR><BR>Please update the link you came from.
<BR><BR><BR><BR><BR>(Continue directly to the requested page)
</CENTER></BODY></HTML>
}]
insert data rejoin ["HTTP/1.0 200 OK^/Content-type: " mime "^/^/"]
write-io http-port data length? data
close http-port
print "============"
]
I'm looking forward to seeing what you guys make of this!

You get an error when trying to read from a closed connection. This seems to work.
n: 0
forever [
http-port: first wait listen-port
clear buffer
if attempt [all [request: first http-port not empty? request]] [
until [
print request
repend buffer [request newline]
print "----------"
any [not request: first http-port empty? request]
]
repend buffer ["Address: " http-port/host newline]
print buffer
Location: ""
mime: "text/html"
parse buffer ["get" ["http" | "/ " | copy Location to " "]]
data: rejoin [{
<HTML><HEAD><TITLE>Site Relocated</TITLE></HEAD>
<BODY><CENTER><BR><BR><BR><BR><BR><BR>
<H1>} AppName n: n + 1 { has moved to } NewSite {</H1>
<BR><BR><BR>Please update the link you came from.
<BR><BR><BR><BR><BR>(Continue directly to the requested page)
</CENTER></BODY></HTML>
}]
insert data rejoin ["HTTP/1.0 200 OK^/Content-type: " mime "^/^/"]
write-io http-port data length? data
]
attempt [close http-port]
print "============"
]

Let us see the documentation for empty?
Summary:
Returns TRUE if a series is at its tail.
Usage:
empty? series
Arguments:
series - The series argument. (must be: series port bitset)
So empty? requires a series, port or bitset argument. As long as the connection to the port is open, your variable (request) receives one of these, and empty? can determine whether it is at its tail.
When the connection is closed or interrupted, your variable receives nothing; instead there is an access error on the port. An error has no tail, so empty? cannot handle it and fails with an error.
sqlab has wrapped the read and the empty? test in attempt:
if attempt [all [request: first http-port not empty? request]]
The ATTEMPT function is a shortcut for the frequent case of:
error? try [block]
With all, he is guarding against an error as well as none.
ATTEMPT returns the result of the block if an error did not occur. If an error did occur, NONE is returned.
Likewise, with until and
any [not request: first http-port empty? request]
he is guarding against both.
That is why his code works.
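For readers who don't know Rebol, the same guard can be sketched in Python. This is only an analogy under stated assumptions, not a translation of sqlab's code: the function name read_request_lines is made up for illustration, and socket.socketpair() stands in for a real browser connection. The point it shows is the one made above: the read loop has to tell "blank line, end of headers" apart from "the peer has gone away", and must survive both.

```python
import socket


def read_request_lines(conn):
    """Collect request lines until a blank line, EOF, or a connection error.

    Hypothetical helper for illustration only.
    """
    lines = []
    reader = conn.makefile("r", encoding="latin-1", newline="\n")
    try:
        while True:
            raw = reader.readline()
            if raw == "":
                # Peer closed the connection: there is nothing left to read.
                break
            line = raw.rstrip("\r\n")
            if line == "":
                # Blank line: the normal end of the request headers.
                break
            lines.append(line)
    except OSError:
        # Reset or aborted connection: keep whatever was read so far.
        pass
    finally:
        reader.close()
    return lines


if __name__ == "__main__":
    # A client that connects and hangs up without sending anything.
    client, server_side = socket.socketpair()
    client.close()
    print(read_request_lines(server_side))
    server_side.close()
```

A client that hangs up early simply yields an empty list, so the serving loop can skip the reply and move on to the next connection.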

Related

Ghostscript for PS integrity test: terminate at EOF, return error unless stack is empty

To test the integrity of PostScript files, I'd like to run Ghostscript in the following way:
Return 1 (or other error code) on error
Return 0 (success) at EOF if stack is empty
Return 1 (or other error code) otherwise
I could run gs in the background, and use a timeout to force termination if gs hangs with items left on the stack. Is there an easier solution?
Ghostscript won't hang if you send files as input (unless you write a program which enters an infinite loop or otherwise fails to reach a halting state). Having items on any of the stacks won't cause it to hang.
On the other hand, it won't give you an error if a PostScript program leaves operands on the operand stack (or dictionaries on the dictionary stack, clips on the clip stack or gstates on the graphics state stack). This is because that's not an error, and since PostScript interpreters normally run in a job server loop, it's not a problem either. Terminating the job returns control to the job server loop, which does a save and restore around the total job, thereby clearing up anything left behind.
If you really want to do this, I'd suggest adopting the same approach: write a PostScript program that executes the PostScript program you want to 'test', then checks the operand stack (and other stacks if required) to see if anything is left. Note that you will want to execute the test program in a stopped context, as an error in the course of the program may well leave stuff lying around.
Ghostscript returns 0 on a clean exit and a value less than 0 for errors, if I remember correctly. You would need to use signalerror in your test framework in order to raise an error if items are left at the end of a program.
[EDIT]
Anything supplied to Ghostscript on the command line by either -s or -d is defined in systemdict, so if we do -sInputFileName=/test.pdf then we will find in systemdict a key /InputFileName whose value is a string with the contents (/test.pdf). We can use that to pass the filename to our program.
The stopped operator takes an executable array as an argument, and returns either true or false depending on whether an error occurred while executing the array (3rd Edition PLRM, p 697).
So we need to run the program contained in the filename we've been given, and do it in a 'stopped' context. Something like this:
{InputFileName run} stopped
{
(Error occurred\n) print flush
%% Potentially check $error for more information.
}{
(program terminated normally\n) print flush
%% Here you could check the various stacks
} ifelse
The following, based 90% on KenS's answer, is 99% satisfactory:
Program checkIntegrity.ps:
{Script run} stopped
{
(\n===> Integrity test failed: ) print Script print ( has error\n\n) print
handleerror
(ignore this error which only serves to force a return value of 1) /syntaxerror signalerror
}{
% script passed, now check the stack
count dup 0 eq {
pop (\n===> Integrity test passed: ) print Script print ( terminated normally\n\n) print
} {
(\n===> Integrity test failed: ) print Script print ( left ) print
3 string cvs print ( item(s) on stack\n\n) print
Script /syntaxerror signalerror
} ifelse
} ifelse
quit
Execute with
gs -q -sScript=CodeToBeChecked.ps checkIntegrity.ps ; echo $?
For the last 1% of satisfaction I would need a replacement for
(blabla) /syntaxerror signalerror
It forces an exit with return code 1, but is very verbose and distracts from the actual error in the checked script that is reported by handleerror. Therefore a cleaner way to exit(1) would be welcome.

How, in Rebol, to copy files without loading them into memory?

In Rebol, there are words for directory and file management, like make-dir, what-dir, rename, create-link, etc.
But I cannot find a word to simply copy a file to another location or to a newly created file.
A solution is to READ and WRITE. For example, I can do:
>> source: %.bash_history
== %.bash_history
>> target: %nothing
== %nothing
>> write/binary target (read/binary source)
And it works well. But what if I have a file larger than the available memory? Is there any way to copy a file without loading it into memory?
At the moment, I do it with a CALL to the underlying OS:
>> call rejoin ["cp " to-string source " " to-string target]
But this is not portable to platforms other than mine (GNU/Linux Mint): it will run on all Unices and Mac OS X, but not the rest.
I suppose it shouldn't be too hard to write a small function to do this, guessing the running operating system, and adapting the command line accordingly.
So my question: is there already a rebol standard word to copy files? If not, is there a plan to make one, in a module or something?
I don't recall a built-in way to do it aside from what's in the question, but you can do that by using file ports without buffering:
source: open/direct/binary/read %source
target: open/direct/binary/write %target
bytes_per: 1024 * 100
while [not none? data: copy/part source bytes_per][
insert target data
]
close target
close source
(Note: This answer is for Rebol 2)
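The same idea, reading a bounded chunk and writing it out until the source is exhausted, can be sketched outside Rebol too. The following is a minimal Python illustration, not Rebol code; the function name copy_in_chunks and the file names in the usage part are made up, and the chunk size just mirrors the 1024 * 100 used above.

```python
import os
import tempfile


def copy_in_chunks(source_path, target_path, chunk_size=1024 * 100):
    """Copy a file while holding at most chunk_size bytes in memory.

    Returns the number of bytes copied. Hypothetical helper for illustration.
    """
    copied = 0
    with open(source_path, "rb") as source, open(target_path, "wb") as target:
        while True:
            data = source.read(chunk_size)
            if not data:
                # An empty read means the end of the source file was reached.
                break
            target.write(data)
            copied += len(data)
    return copied


if __name__ == "__main__":
    # Usage sketch with throwaway files in a temporary directory.
    with tempfile.TemporaryDirectory() as folder:
        src = os.path.join(folder, "source.bin")
        dst = os.path.join(folder, "target.bin")
        with open(src, "wb") as handle:
            handle.write(b"x" * 250000)
        print(copy_in_chunks(src, dst))
```

Memory use is bounded by the chunk size rather than by the file size, which is the property the question asks for.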
You can also use system/version to detect which OS your script runs on:
call rejoin either 3 = system/version/4 [
;windows
[{copy "} to-local-file source {" "} to-local-file target {"}]
] [
;others
["cp " to-string source " " to-string target]
]
Check this script as well: http://www.rebol.org/view-script.r?script=environ.r
If there are other cases, you can use:
switch/default system/version/4 [
2 [] ;mac
3 [] ;win
;...
] [
;default
]
Also, here are a few other answers to this problem:
Carl implemented something (I'm surprised it is not included in the heart of Rebol):
http://www.rebol.com/article/0281.html
And Patrick was as surprised as you, a decade and some days ago:
http://www.mail-archive.com/rebol-list@rebol.com/msg16473.html

Unexpected error while loading data

I am getting an "Unexpected" error. I tried a few times, and I still could not load the data. Is there any other way to load data?
gs://log_data/r_mini_raw_20120510.txt.gz to 567402616005:myv.may10c
Errors:
Unexpected. Please try again.
Job ID: job_4bde60f1c13743ddabd3be2de9d6b511
Start Time: 1:48pm, 12 May 2012
End Time: 1:51pm, 12 May 2012
Destination Table: 567402616005:myvserv.may10c
Source URI: gs://log_data/r_mini_raw_20120510.txt.gz
Delimiter: ^
Max Bad Records: 30000
Schema:
zoneid: STRING
creativeid: STRING
ip: STRING
update:
I am using the file that can be found here:
http://saraswaticlasses.net/bad.csv.zip
bq load -F '^' --max_bad_record=30000 mycompany.abc bad.csv id:STRING,ceid:STRING,ip:STRING,cb:STRING,country:STRING,telco_name:STRING,date_time:STRING,secondary:STRING,mn:STRING,sf:STRING,uuid:STRING,ua:STRING,brand:STRING,model:STRING,os:STRING,osversion:STRING,sh:STRING,sw:STRING,proxy:STRING,ah:STRING,callback:STRING
I am getting an error "BigQuery error in load operation: Unexpected. Please try again."
The same file works from Ubuntu while it does not work from CentOS 5.4 (Final)
Does the OS encoding need to be checked?
The file you uploaded has an unterminated quote. Can you delete that line and try again? I've filed an internal bigquery bug to be able to handle this case more gracefully.
$grep '"' bad.csv
3000^0^1.202.218.8^2f1f1491^CN^others^2012-05-02 20:35:00^^^^^"Mozilla/5.0^generic web browser^^^^^^^^
When I run a load from my workstation (Ubuntu), I get a warning about the line in question. Note that if you were using a larger file, you would not see this warning, instead you'd just get a failure.
$bq show --format=prettyjson -j job_e1d8636e225a4d5f81becf84019e7484
...
"status": {
"errors": [
{
"location": "Line:29057 / Field:12",
"message": "Missing close double quote (\") character: field starts with: <Mozilla/>",
"reason": "invalid"
}
]
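A quick pre-check along these lines might help spot such a row before loading. This is only a sketch in Python and not part of the bq tool; the function name is made up, and it assumes that a quoted field never spans multiple lines, so any line with an odd number of double quotes is suspect.

```python
def find_unbalanced_quote_lines(lines, quote='"'):
    """Return the 1-based numbers of lines with an odd count of quote characters.

    Assumption: quoted fields do not contain line breaks, so quotes must
    balance within each line. Hypothetical helper for illustration.
    """
    return [
        number
        for number, line in enumerate(lines, start=1)
        if line.count(quote) % 2 == 1
    ]


if __name__ == "__main__":
    sample = [
        '1^2^"ok"^x',
        '3000^0^"Mozilla/5.0^generic web browser',  # quote is never closed
        "a^b^c",
    ]
    print(find_unbalanced_quote_lines(sample))
```

On a real file, the lines can be passed in directly from an open file handle, and the reported line numbers can then be compared with the "location" field in the job status.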
My suspicion is that you have rows or fields in your input data that exceed the 64 KB limit. Perhaps re-check the formatting of your data, check that it is gzipped properly, and if all else fails, try importing uncompressed data. (One possibility is that the entire compressed file is being interpreted as a single row/field that exceeds the aforementioned limit.)
To answer your original question, there are a few other ways to import data: you could upload directly from your local machine using the command-line tool or the web UI, or you could use the raw API. However, all of these mechanisms (including the Google Storage import that you used) funnel through the same CSV parser, so it's possible that they'll all fail in the same way.

Perl SQL file write delayed

Here is a simple Perl script fetching data from SQL.
It reads the data, writes it to a file OUTFILE, and prints every 10000th line on screen.
One thing I am curious about is that printing the data on screen finishes very quickly (in 30 seconds), whereas fetching the data and writing it to the file finishes very slowly (30 minutes later).
The amount of data is not large. The output file size is less than 100 MB.
while ( my ($a,$b) = $curSqlEid->fetchrow_array() )
{
printf OUTFILE ("%s,%d\n", $a,$b);
$counter ++;
if($counter % 10000 == 0){
printf ("%s,%d\n", $a,$b);
}
}
$curSqlEid->finish();
$dbh->disconnect();
close(OUTFILE);
You are suffering from buffering.
Handles other than STDERR are buffered by default, and most handles use block buffering. That means Perl will wait until there is 8KB* of data to write before sending anything to the system.
STDOUT is special. When it is attached to a terminal (and only then), it uses a different kind of buffering: line buffering. With line buffering, the data is flushed every time a newline is encountered in the data to write.
You can see this by running
$ perl -e'print "abc"; print "def"; sleep 5; print "\n"; sleep 5;'
[ 5 seconds pass ]
abcdef
[ 5 seconds pass ]
$ perl -e'print "abc"; print "def"; sleep 5; print "\n"; sleep 5;' | cat
[ 10 seconds pass ]
abcdef
The solution is to turn off buffering.
use IO::Handle qw( ); # Not needed on Perl 5.14 or later
OUTFILE->autoflush(1);
* — 8KB is the default. It can be configured when Perl is compiled. It used to be a non-configurable 4KB until 5.14.
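The effect of block buffering on the file size seen from outside can be shown with a small experiment. The sketch below is Python rather than Perl and only illustrates the general mechanism; the function name is made up, and the 8192-byte buffer is chosen to mirror the default mentioned above.

```python
import os
import tempfile


def sizes_before_and_after_flush(payload=b"abc,1\n", buffer_size=8192):
    """Write a small payload through a block buffer and report the on-disk
    file size before and after an explicit flush.

    Hypothetical helper for illustration.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb", buffering=buffer_size) as out:
            out.write(payload)                # stays in the in-process buffer
            before = os.path.getsize(path)
            out.flush()                       # hand the buffered bytes to the OS
            after = os.path.getsize(path)
        return before, after
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(sizes_before_and_after_flush())
```

Until the buffer fills or is flushed, the file looks empty from outside even though the write call has already returned, which is why the output file appears to lag behind the console.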
I think you are seeing the output file size as 0 while the script is running and displaying on the console. Do not go by that. The file size will show up only once the script has finished. This is due to output buffering.
Anyways, the delay cannot be as large as 30 min. Once the script is done, you should see the output file data.
I tried various things, but my final conclusion is that Python and Perl handle the data flow from the DB in fundamentally different ways. It looks like in Perl it is possible to handle the data line by line while it is being transferred from the DB, whereas in Python one needs to wait until the entire data set has been downloaded from the server before processing it.

What's the proper syntax in Rebol to execute an in-memory block of code including the header

This syntax doesn't work:
>> do load/header {rebol [Title: "Hello World"] Print System/Header/Script/Title }
** Script Error: Invalid path value: Header
** Near: Print System/Header/Script/Title
I want to get the meta-data in header.
My goal is mostly to be able to copy a whole Rebol source, including the header, to the clipboard and execute it in the console by doing something like do read clipboard://. That doesn't work if I include the header, and I can't strip it since I need it.
Rewritten in response to comment.
Use load/header/next to create a two-item block: the script header followed by the script content:
loaded: load/header/next {rebol [Title: "Hello World"] Print "this is my script"^/a: 99 + 5 print a}
probe loaded/1 ;; shows the header
do loaded/2 ;; executes the script