How to manipulate how getline sees lines - OpenVMS

I have some code on OpenVMS where getline doesn't split lines the same way that VMS editors do, for example.
Is there some way to manipulate how getline returns lines?
It worked well with files FTPed over, but it doesn't work with some other files. I think they are RMS fixed-length records, with a lot of binary zeroes in them.
I am using ifstream.getline(buffer, maxsize), but it can be any getline.

This may be an issue with the RMS record attributes, notably a missing implied newline.
Check out $ help set file /attr
Look for the various RAT options.
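For example, if DIRECTORY/FULL shows the records lack an implied carriage return, something like the following might fix it. This is only a sketch: the actual RFM and RAT values to use depend on what the file really contains, so check first.
$ SET FILE /ATTRIBUTES=(RFM:STMLF, RAT:CR) x.x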
How did this file come to be?
If you would like further help, then please provide details on the file with 'issues'.
Attach, or include, the output from
$ DIRECTORY/FULL x.x
and if possible
$ DUMP/RECORD=COUNT=3/WID=80 x.x
as well as
$ DUMP/BLOCK=COUNT=1/WID=80 x.x
Hope this helps,
Hein

Related

OpenVMS: Extracting an RMS indexed file to Windows as a sequential flat file

I haven't used OpenVMS for 20+ years. It was my 1st OS. I've been asked if it is possible to copy the data from RMS files on an OpenVMS server to Windows as a text file, so that it's readable.
No-one has experience or knowledge of the record structures etc.
The files are xyz.DAT and are relative files. I'm hoping the .DAT files are fixed length.
My 1st attempt would be to try and use Datatrieve (DTR), but I get an error that the image isn't loaded.
I thought it might be as easy as using CONVERT/FDL = nnnn.FDL and changing the Relative organization to Sequential, but the file still seems to be unreadable.
Is there an easy way to stream an RMS indexed file to a flat ASCII file?
I used to use COBOL and C to access the data in the past, but I had lots of libraries to help....
I've noticed some solutions may use ODBC to connect, but I'm not sure what I can or cannot install on the server.
I can FTP to the server using FileZilla....
Another plan is writing a C application to read a file and output it as strings..... or DCL too..... it doesn't have to be quick...
Any ideas?
As mentioned before
The simple solution MIGHT be to just use: $ TYPE/OUT=test.TXT test.DAT.
This will handle Relative and Indexed files alike.
It is much the same as $ CONVERT/FDL=NL: test.DAT test.TXT
Both will just read records from the source and transfer the bytes, byte for byte, to the records in a sequential file.
FTP in ASCII mode will transfer that nicely to Windows.
You can also use an 'inline' FDL file to generate a 'unix' LF file like:
$ conv /fdl="record; format stream_lf" test.DAT test.TXT
Or a CR-LF file using:
$ conv /fdl="record; format stream" test.DAT test.TXT
Both can be transferred in Binary or ASCII mode with FTP.
MOSTLY, because this really only works well for a TEXT ONLY source .DAT file.
There should be no CR, LF, FF or NUL characters in the source, or things will break.
As 'habo' points out, use DUMP /RECORD=COUNT=3 to see how 'readable' the source data is.
If you spot 'binary' data using DUMP then you will need to find a record definition somewhere which maps bytes to integers or floating points or dates as needed.
These definitions can be COBOL LIB files or BASIC MAPs, and are often stored IN the CDD (Common Data Dictionary) or indeed in DATATRIEVE .DIC DICTIONARIES.
To use such a definition you likely need a program that just reads along the 'map' and writes/prints as text. Normally that's not too hard, notably not when you can find an example program on the server to tweak.
If it is just one or two 'suspect' byte ranges, then you can create a DCL loop to read and write, and use F$EXTRACT to select the chunks you like, as sketched below.
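A minimal DCL sketch of such a loop. The F$EXTRACT offsets and lengths here are hypothetical; adjust them to the actual record layout:
$ OPEN /READ in TEST.DAT
$ OPEN /WRITE out TEST.TXT
$ LOOP:
$   READ /END_OF_FILE=DONE in rec
$!  Keep bytes 0-19 and 30-39 of each record, skipping the binary ranges
$   WRITE out F$EXTRACT(0, 20, rec) + " " + F$EXTRACT(30, 10, rec)
$   GOTO LOOP
$ DONE:
$ CLOSE in
$ CLOSE out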
If you want further help, kindly describe in words what kind of data is expected and perhaps provide the output from DUMP for 3 or 5 rows.
Good luck!
Hein.

Compare 2 files in awk and append Pass / Fail

Thank you all for the feedback. Apologies since I am new to coding and new to SO. Below is the code I have currently been running.
awk 'FNR==NR{a[$4,$5]=$0}{if(b=a[$4,$5]); print b, "PASS";next}else{if(b!=a[$4,$5]){print a, b, "FAIL";next}}'
This appends a PASS next to each line if it is the same, but it does not print FAIL if there are any inconsistencies in the line.
Trying to get myself more familiar with awk. Using FNR==NR I've been able to compare 2 files (line by line) and then print PASS at the end of the file. However, I cannot actually get it to properly fail the scenario and print FAIL if they do not match. Could anybody help a noobie out?
Here is some awk script to get you started.
$ awk 'NR==FNR{a[NR]=$0;next}
{f=$0!=a[FNR]; delete a[FNR]}
f{c=FNR;exit}
END{c=c?c:(FNR+1);print f||(c in a)?"FAIL on line "c:"PASS"}' file1 file2
The additional complexity is because the files might have different lengths. Note also that there are existing tools (diff, comm, ...) that do this in a much more compact way, as sketched below.
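For example, a whole-file PASS/FAIL check reduces to a single exit-status test. A sketch using cmp; diff -q with its output discarded would behave the same way:
cmp -s file1 file2 && echo "PASS" || echo "FAIL"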

Buffering output with AWK

I have an input file which consists of three parts:
inputFirst
inputMiddle
inputLast
Currently I have an AWK script which with this input creates an output file which consists of two parts:
outputFirst
outputLast
where outputFirst and outputLast is generated (on the fly) from inputFirst and inputLast respectively. However, to calculate the outputMiddle part (which is only one line) I need to scan the entire input, so I store it in a variable. The problem is that the value of this variable should go in between outputFirst and outputLast in the output file.
Is there a way to solve this using a single portable AWK script that takes no arguments? Is there a portable way to create temporary files in an AWK script or should I store the output from outputFirst and outputLast in two variables? I suspect that using variables will be quite inefficient for large files.
All versions of AWK (since at least 1985) can do basic I/O redirection to files or pipelines, just like the shell can, as well as run external commands without I/O redirection.
So, there are any number of ways to approach your problem and solve it without having to read the entire input file into memory. The most optimal solution will depend on exactly what you're trying to do, and what constraints you must honour.
A simple approach to the more precise example problem you describe in your comment above would perhaps go something like this. First, in the BEGIN clause, form two unique filenames with rand() (and define your variables). Then read and sum the first 50 numbers from standard input while also writing them to a temporary file, and continue by reading and summing the next 50 numbers while writing them to a second file. Finally, in an END clause, use a loop to read the first temporary file with getline and write it to standard output, print the total sum, read the second temporary file the same way and write it to standard output, and call system("rm " file1 " " file2) to remove the temporary files.
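A minimal sketch of that approach, assuming 50+50 numbers on standard input and temp-file names built from rand() in the current directory:
awk 'BEGIN { srand(); f1 = "tmp." rand(); f2 = "tmp." rand() }
NR <= 50 { s += $1; print > f1 }   # first half: sum and save
NR >  50 { s += $1; print > f2 }   # second half: sum and save
END {
    close(f1); close(f2)
    while ((getline line < f1) > 0) print line   # outputFirst
    print s                                      # outputMiddle
    while ((getline line < f2) > 0) print line   # outputLast
    system("rm " f1 " " f2)                      # clean up
}'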
If the output file is not too large (whatever that is), saving outputLast in a variable is quite reasonable. The first part, outputFirst, can (as described) be generated on the fly. I tried this approach and it worked fine.
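A sketch of that variable approach, again assuming the sum of column 1 as the middle line and a split point of 50 lines (input.txt is a hypothetical name):
awk 'NR <= 50 { s += $1; print; next }   # outputFirst goes straight out
{ s += $1; tail = tail $0 ORS }          # buffer outputLast in a variable
END { print s; printf "%s", tail }       # outputMiddle, then outputLast
' input.txt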
Print the "first" output while processing the file, then write the remainder to a temporary file until you have written the middle.
Here is a self-contained shell script which processes its input files and writes to standard output.
#!/bin/sh
t=$(mktemp -t middle.XXXXXXXXX) || exit 127
trap 'rm -f "$t"' EXIT
trap 'exit 126' HUP INT TERM
awk -v temp="$t" 'NR < 500000 { print }
{ s += $1 }
NR >= 500000 { print > temp }
END { print s }' "$@"
cat "$t"
For illustration purposes, I used really big line numbers. I'm afraid your question is still too vague to really obtain a less general answer, but perhaps this can help you find the right direction.

Correct way to write to the same file from multiple awk processes

The title says it all.
I have 4 awk processes logging to the same file, and output seems fine, not mangled, but I'm not sure that just redirecting print output like this: print "xxx" >> file in every process is the right way to do it.
There are many similar questions around the site, but this one is particularly about awk and a pragmatic, code-correct way to approach the problem.
EDIT
Sorry folks, of course I wasn't "just redirecting" like I wrote, I was appending.
No, it is not safe.
In awk, print "foo" > "file" will open the file and overwrite its contents, and the file stays open until the end of the script.
That is, if your 4 awk processes start writing to the same file at different times, they overwrite each other's results.
To reproduce it, you could start two (or more) awk like this:
awk '{while(++i<9){system("sleep 2");print "p1">"file"}}' <<<"" &
awk '{while(++i<9){system("sleep 2");print "p2">"file"}}' <<<"" &
and monitor the content of file at the same time; you will see that in the end there are not exactly 8 "p1" and 8 "p2" entries.
Using >> avoids losing entries, but the sequence of entries from the 4 processes could still be interleaved unpredictably, as sketched below.
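For example, the same reproduction with appending should end up with all 16 entries, though possibly interleaved (a sketch; the uniq -c tally at the end is just a quick way to count them):
rm -f file
awk '{while(++i<9){system("sleep 2");print "p1">>"file"}}' <<<"" &
awk '{while(++i<9){system("sleep 2");print "p2">>"file"}}' <<<"" &
wait; sort file | uniq -c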
EDIT
Ok, the > was a typo.
I don't know why you really need 4 processes writing into the same file. As I said, with >>, the entries won't get lost (if your awk scripts work correctly). However, personally I wouldn't do it this way. If I had to have 4 processes, I would write to different files; well, I don't know your requirements, I'm just speaking in general.
Outputting to different files makes testing and debugging easier.. imagine when one of your processes has a problem and you want to solve it, etc... A sketch of that pattern follows.
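The script name, input names and log names here are all hypothetical; merge at the end only if a single file is really needed:
awk -f job.awk data1 > log.1 &
awk -f job.awk data2 > log.2 &
awk -f job.awk data3 > log.3 &
awk -f job.awk data4 > log.4 &
wait
cat log.1 log.2 log.3 log.4 > file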
I think using the operating system's print command is safe, as in fact it appends the string you provide as a log entry to the file's write buffer. So the system will manage the actual writing of the data to disk; also, if another process wants to use the same file, the system will see that the resource is already claimed and will wait for the 1st process to finish its processing, then will allow the 2nd process to write to the buffer.

Script for Testing with Input files and Output Solutions

I have a set of *.in files and a set of *.soln files with matching file names. I would like to run my program with the *.in file as input and compare the output to the ones found in the *.soln files. What would be the best way to go about this? I can think of 3 options.
Write some driver in Java to list the files in the folder, run the program, and compare. This would be hard and cumbersome.
Write a bash script to do this. How?
Write a python script to do this?
I would go for the bash solution. Also, given that what you are doing is a test, I would always save the output of myprogram so that if there are failures, you always have the output to compare against.
#!/bin/bash
for infile in *.in; do
    basename=${infile%.*}
    myprogram "$infile" > "$basename.output"
    diff "$basename.output" "$basename.soln"
done
Add checking of exit statuses etc. as required by your report, for example:
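A sketch of such a check using diff's exit status (0 means the files are identical), meant to replace the plain diff line inside the loop body above:
if diff "$basename.output" "$basename.soln" > /dev/null; then
    echo "PASS: $infile"
else
    echo "FAIL: $infile"
fi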
If the program already exists, I suspect the bash script is the best bet.
If your soln files are named right, some kind of loop like
for file in base*.soln
do
    myprogram "${file%.soln}.in" > "new_$file"
    diff "$file" "new_$file"
done
Of course, you can check the exit code of diff and
do various other things to create a test report . . .
That looks simplest to me . . .
Karl
This is primarily a problem that requires use of the file system with minimal logic. Bash isn't a bad choice for such problems. If it turns out you want to do something more complicated than just comparing for equality, Python becomes a more attractive choice. Java doesn't seem like a good choice for a throwaway script such as this.
Basic bash implementation might look something like this:
cd dir_with_files
program=your_program
input_ext=".in"
compare_to_ext=".soln"
for file in *"$input_ext"; do
    diff <("$program" "$file") "${file%$input_ext}$compare_to_ext"
done
Untested.