Script for Testing with Input files and Output Solutions - testing

I have a set of *.in files and a set of *.soln files with matching file names. I would like to run my program with each *.in file as input and compare the output to the one found in the matching *.soln file. What would be the best way to go about this? I can think of 3 options.
Write some driver in Java to list files in the folder, run the program, and compare. This would be difficult.
Write a bash script to do this. How?
Write a python script to do this?

I would go for the bash solution. Also, given that what you are doing is a test, I would always save the output of myprogram so that if there are failures you have the output to compare against.
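#!/bin/bash
for infile in *.in; do
    # Strip the .in extension to get the shared base name.
    basename=${infile%.*}
    # Save the actual output so it can be inspected after a failure.
    myprogram "$infile" > "$basename.output"
    diff "$basename.output" "$basename.soln"
done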
Add checking of exit statuses etc. as required by your report.
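The exit-status checking mentioned above could be sketched roughly as follows. This is only a minimal sketch: the function name run_tests, the FAIL message format and the pass/fail summary are made up for illustration, and it assumes (as in the loop above) that the program takes the input file as its first argument.

```shell
#!/bin/sh
# run_tests PROGRAM [DIR]  (hypothetical helper, not part of any tool)
# Runs PROGRAM on every DIR/*.in and compares the result with DIR/*.soln.
run_tests() {
    prog=$1
    dir=${2:-.}
    pass=0
    fail=0
    for infile in "$dir"/*.in; do
        [ -e "$infile" ] || continue          # the glob matched nothing
        base=${infile%.in}
        # Keep the actual output so failures can be inspected later.
        "$prog" "$infile" > "$base.output"
        status=$?
        # A test passes only if the program succeeded and the output matches.
        if [ "$status" -eq 0 ] && diff "$base.output" "$base.soln" > /dev/null; then
            pass=$((pass + 1))
        else
            fail=$((fail + 1))
            echo "FAIL: $infile (exit status $status)"
        fi
    done
    echo "passed: $pass, failed: $fail"
    # Return non-zero if anything failed, so callers can react to it.
    [ "$fail" -eq 0 ]
}
```

It would be called as, for example, run_tests ./myprogram tests, and its own exit status can then drive a Makefile or CI step.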

If the program exists, I suspect the bash script is the best bet.
If your soln files are named right, some kind of loop like
for file in base*.soln
do
    # Feed the matching .in file to the program on standard input.
    myprogram < "${file%.soln}.in" > "new_$file"
    diff "$file" "new_$file"
done
Of course, you can check the exit code of diff and
do various other things to create a test report . . .
That looks simplest to me . . .
Karl

This is primarily a problem that requires the use of the file system with minimal logic. Bash isn't a bad choice for such problems. If it turns out you want to do something more complicated than just comparing for equality, Python becomes a more attractive choice. Java doesn't seem like a good choice for a throwaway script such as this.
A basic bash implementation might look something like this:
cd dir_with_files || exit 1
program=your_program
input_ext=".in"
compare_to_ext=".soln"
for file in *"$input_ext"; do
    # Compare the program's output with the matching solution file.
    diff <("$program" "$file") "${file%"$input_ext"}$compare_to_ext"
done
Untested.

Related

Buffering output with AWK

I have an input file which consists of three parts:
inputFirst
inputMiddle
inputLast
Currently I have an AWK script which with this input creates an output file which consists of two parts:
outputFirst
outputLast
where outputFirst and outputLast are generated (on the fly) from inputFirst and inputLast respectively. However, to calculate the outputMiddle part (which is only one line) I need to scan the entire input, so I store it in a variable. The problem is that the value of this variable should go in between outputFirst and outputLast in the output file.
Is there a way to solve this using a single portable AWK script that takes no arguments? Is there a portable way to create temporary files in an AWK script or should I store the output from outputFirst and outputLast in two variables? I suspect that using variables will be quite inefficient for large files.
All versions of AWK (since at least 1985) can do basic I/O redirection to files or pipelines, just like the shell can, as well as run external commands without I/O redirection.
So, there are any number of ways to approach your problem and solve it without having to read the entire input file into memory. The best solution will depend on exactly what you're trying to do, and what constraints you must honour.
A simple approach to the more precise example problem you describe in your comment above would perhaps go something like this. First, in the BEGIN clause, form two unique filenames with rand() (and define your variables). Then read and sum the first 50 numbers from standard input while also writing them to a temporary file, and continue to read and sum the next 50 numbers while writing them to a second file. Finally, in an END clause, use a loop to read the first temporary file with getline and write it to standard output, print the total sum, read the second temporary file the same way and write it to standard output, and call system("rm " file1 " " file2) to remove the temporary files.
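The steps just described might be sketched like this, wrapped in a shell function so that it can be tried on a pipe. The function name sum_between, the /tmp file name prefixes and the configurable split point n are assumptions made for illustration. Note that names built from rand() are not guaranteed to be unique.

```shell
#!/bin/sh
# sum_between [N]  (hypothetical helper)
# Copies stdin to stdout and inserts the sum of all numbers after line N.
sum_between() {
    awk -v n="${1:-50}" '
    BEGIN {
        srand()
        # Assumed temp-file names; rand() gives no collision guarantee.
        f1 = "/tmp/awk_first_" int(rand() * 1000000)
        f2 = "/tmp/awk_last_"  int(rand() * 1000000)
    }
    { sum += $1 }                 # running total over the whole input
    NR <= n { print > f1 }        # first part goes to temp file 1
    NR >  n { print > f2 }        # last part goes to temp file 2
    END {
        close(f1); close(f2)      # finish writing before reading back
        while ((getline line < f1) > 0) print line
        print sum                 # the middle line
        while ((getline line < f2) > 0) print line
        close(f1); close(f2)
        system("rm -f " f1 " " f2)
    }'
}
```

For example, printf '1\n2\n3\n4\n' | sum_between 2 copies the first two lines, prints the total, then copies the remaining two.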
If the output file is not too large (whatever that is), saving outputLast in a variable is quite reasonable. The first part, outputFirst, can (as described) be generated on the fly. I tried this approach and it worked fine.
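A rough sketch of that idea follows, assuming the same "sum of all numbers" middle line as the example in the previous answer. The function name buffer_last and the split point n are hypothetical, and an awk array stands in for the single variable.

```shell
#!/bin/sh
# buffer_last [N]  (hypothetical helper)
# Prints the first N lines as they arrive, holds the rest in memory,
# and emits the total before the buffered lines.
buffer_last() {
    awk -v n="${1:-50}" '
    { sum += $1 }
    NR <= n { print; next }       # outputFirst: emitted on the fly
    { buf[++count] = $0 }         # outputLast: kept in an array
    END {
        print sum                 # outputMiddle
        for (i = 1; i <= count; i++) print buf[i]
    }'
}
```

Memory use grows with the size of the last part only, which is why this is reasonable when that part is not too large.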
Print the "first" output while processing the file, then write the remainder to a temporary file until you have written the middle.
Here is a self-contained shell script which processes its input files and writes to standard output.
#!/bin/sh
t=$(mktemp -t middle.XXXXXXXXX) || exit 127
trap 'rm -f "$t"' EXIT
trap 'exit 126' HUP INT TERM
# n is never assigned, so n+1 is only a stand-in for the real per-line output.
awk -v temp="$t" 'NR<500000 { print n+1 }
    { s+=$1 }
    NR>=500000 { print n+1 >>temp }
    END { print s }' "$@"
cat "$t"
For illustration purposes, I used really big line numbers. I'm afraid your question is still too vague to really obtain a less general answer, but perhaps this can help you find the right direction.

How do I tell Octave where to find functions without picking up other files?

I've written an octave script, hello.m, which calls subfunc.m, and which takes a single input file, a command line argument, data.txt, which it loads with load(argv(){1}).
If I put all three files in the same directory, and call it like
./hello.m data.txt
then all is well.
But if I've got another data.txt in another directory, and I want to run my script on it, and I call
../helloscript/hello.m data.txt
this fails because hello.m can't find subfunc.m.
If I call
octave --path "../helloscript" ../helloscript/hello.m data.txt
then that seems to work fine.
The problem is that if I don't have a data.txt in the directory, then the script will pick up any data.txt that is lying around in ../helloscript.
This seems a bit fragile. Is there any way to tell octave, preferably in the script itself, to get subfunctions from the same directory as the script, but to get everything else relative to the current directory?
The best robust solution I can think of at the moment is to inline the subfunction in the script, which is a bit nasty.
Is there a good way to do this, or is it just a thorny problem that will cause occasional hard to find problems and can't be avoided?
Is this in fact just a general problem with scripting languages that I've just never noticed before? How does e.g. python deal with it?
It seems like there should be some sort of library-load-path that can be set without altering the data-load-path.
Adding all your subfunctions to your program file is not nasty at all. Why would you think so? It is perfectly normal to have function definitions in your script. The only language I know that does not do this is Matlab but that's just braindead.
The other alternative you have is to check that the input file argument, data.txt exists. Like so:
fpath = argv (){1};
[info, err, msg] = stat (fpath);
if (err)
error ("could not stat `%s' : %s", fpath, msg);
endif
## continue your script knowing the file exists
But really, I would recommend that you do both. Add your subfunctions to your main program (the only reason to keep them in a separate file is if you plan on sharing them with other programs), and always check input arguments.

SCIP write best feasible solution in automated test

Based on the steps in http://scip.zib.de/doc/html/TEST.php, I have managed to set up an automated test using SCIP. However, I'd like to write the solution (best feasible solution) to a file, instead of just getting the objective value. Is there any way to do it in the automated test?
I did a hack in check.sh by replacing
OPTCOMMAND=optimize; write solution myfilename.sol;
Unfortunately, it doesn't seem to work. When I ran make TEST=mytest test, this line appeared in the output:
bash ./check.sh mytest bin/scip-3.1.0.linux.x86_64.gnu.opt.spx default scip-3.1.0.linux.x86_64.gnu.opt.spx 3600 2100000000 6144 1 default 10000 false false 3.1.0 spx false /tmp optimize;
write: solution is not logged in on myfilename.sol
I know it is possible to write the solution via interactive shell, but I am trying to automate the test in order to retrieve both solution and obj value. Any help or clarification will be much appreciated!
You are getting an error because, with the syntax you are using, the semicolon ends the variable assignment and bash then tries to invoke a separate command called "write":
The write utility allows you to communicate with other users, by
copying lines from your terminal to theirs.
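Just try without the semicolon ;) and quote the value if it has to contain spaces, otherwise bash still treats write as a command.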
The cleaner solution would be to modify the file "check/configuration_tmpfile_setup_scip.sh"
and add the line
echo write solution /absolute/path/to/solutions/${INSTANCE}.sol >> $TMPFILE
before the quit command. This configuration file sets up a batch file to feed SCIP with all commands that the interactive shell should execute, and you can model arbitrary user behavior.

correct way to write to the same file from multiple processes awk

The title says it all.
I have 4 awk processes logging to the same file, and output seems fine, not mangled, but I'm not sure that just redirecting print output like this: print "xxx" >> file in every process is the right way to do it.
There are many similar questions around the site, but this one is particularly about awk and a pragmatic, code-correct way to approach the problem.
EDIT
Sorry folks, of course I wasn't "just redirecting" like I wrote, I was appending.
No, it is not safe.
The awk statement print "foo" > "file" truncates the file when it is first opened and then keeps writing to it until the end of the script.
That is, if your 4 awk processes start writing to the same file at different times, they overwrite each other's results.
To reproduce it, you could start two (or more) awk like this:
awk '{while(++i<9){system("sleep 2");print "p1">"file"}}' <<<"" &
awk '{while(++i<9){system("sleep 2");print "p2">"file"}}' <<<"" &
If you monitor the content of file at the same time, you will see that in the end there are not exactly 8 "p1" and 8 "p2" lines.
Using >> avoids losing entries, but the order of the entries from the 4 processes could be mixed up.
EDIT
Ok, the > was a typo.
I don't know why you really need 4 processes to write into the same file. As I said, with >> the entries won't get lost (if your awk scripts work correctly). However, personally I wouldn't do it this way. If I had to have 4 processes, I would write to different files. I don't know your requirements, so I'm just speaking in general.
Writing to different files makes testing and debugging easier. Imagine that one of your processes has a problem and you want to fix it.
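The one-file-per-process suggestion could look something like the sketch below. The function name log_separately, the worker_N.log and combined.log file names, and the three dummy log lines per worker are all assumptions for illustration. Each awk process appends only to its own file, and the files are concatenated once every process has finished.

```shell
#!/bin/sh
# log_separately DIR  (hypothetical helper)
# Starts 4 awk processes that each log to their own file in DIR,
# then merges the per-process logs into DIR/combined.log.
log_separately() {
    outdir=$1
    for id in 1 2 3 4; do
        awk -v id="$id" -v logfile="$outdir/worker_$id.log" '
            BEGIN {
                # Stand-in for real work: three log lines per worker.
                for (i = 1; i <= 3; i++) print "p" id, i >> logfile
                close(logfile)
            }' &
    done
    wait                                   # let all workers finish
    cat "$outdir"/worker_*.log > "$outdir/combined.log"
}
```

When one worker misbehaves, its own log file shows only its entries, which is the debugging benefit described above.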
I think using the operating system's print command is safe. In fact, it appends the string you provide as a log entry to the file's write buffer, so the system manages the actual writing of the data to disk. If another process wants to use the same file, the system will see that the resource is already claimed and will wait for the first process to finish, then allow the second process to write to the buffer.

Finding files in subdirectories created after a certain date

I'm in the process of writing a bash script (just learning it) which needs to find files in subdirectories created after a certain date. I have a folder /images/ with jpegs in various subfolders - I want to find all jpegs uploaded to that directory (or any subdirectories) after a certain date. I know about the -mtime flag, but my "last import" date is stored in %Y-%m-%d format and it'd be nice to use that if possible?
Also, each file/pathname will then be used to generate a MySQL SELECT query. I know find generally outputs the filenames found, line-by-line. But if find isn't actually the command that I should be using, it'd be nice to have a similar output format I could use to generate the SELECT query (WHERE image.file_name IN (...))
Try the script below:
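DATE="<<date>>"   # placeholder: your last-import date in %Y-%m-%d format
SEARCH_PATH=/images/
# touch -t expects [[CC]YY]MMDDhhmm: drop the dashes and append midnight.
DATE=$(echo "$DATE" | sed 's/-//g')
DATE="${DATE}0000"
FILE=~/timecheck_${RANDOM}_$(date +"%Y%m%d%H%M")
touch -t "$DATE" "$FILE"
# Print every path newer than the reference file as a quoted, comma-separated list.
find "$SEARCH_PATH" -newer "$FILE" 2>/dev/null | awk 'BEGIN{f=0}{if(f==1)printf("\"%s\", ",l);l=$0;f=1}END{printf("\"%s\"",l)}'
rm -f "$FILE"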
You can convert your date into the "last X days" format that find -mtime expects.
find is the correct command for this task. Send its output somewhere, then parse the file into the query.
Beware of SQL injection attacks if the files were uploaded by users. Beware of special-character quoting even if they weren't.
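As a sketch of how find and the quoting concern could fit together: GNU and BSD find accept -newermt with a date string, which lets the stored %Y-%m-%d value be used directly (this test is not in POSIX, so it is an assumption about your find). The function names jpegs_since and sql_in_list are made up for illustration, and the sketch assumes file names contain no newlines.

```shell
#!/bin/sh
# jpegs_since DIR DATE  (hypothetical helper)
# Lists JPEG files under DIR modified after DATE (YYYY-MM-DD).
# Relies on -newermt and -iname, available in GNU and BSD find.
jpegs_since() {
    find "$1" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) -newermt "$2"
}

# sql_in_list  (hypothetical helper)
# Reads one name per line, doubles any single quote, wraps each name
# in single quotes, and joins the names with commas.
sql_in_list() {
    sed "s/'/''/g; s/.*/'&'/" | paste -sd, -
}
```

Something like jpegs_since /images 2013-05-01 | sql_in_list then produces the body of the IN (...) clause. Doubling single quotes is not a complete defence, since MySQL also treats backslash as an escape character by default, so a parameterized query is the safer route when the file names come from users.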