How to parse fields from different lines to build a new record with all these fields - awk

I have a file with this structure:
http://paste.ubuntu.com/21136265/
I have to capture all the data from an 'ADSTART ACTION(ADD)' line up to the next line containing the same text, and combine it into a single record (line).
Sorry, but I can't post an example of the output, because it is simply all the data between the 'ADSTART' lines on a single line or record. I'm working under z/OS, where we have the concept of record length.
I'm trying this in REXX for z/OS and in AWK under UNIX System Services for z/OS, but I'm stuck putting all the fields on one line, and I can't figure out how to do it.
I'm capturing the data through nested loops, but I don't know how to put it on a single line.

If you're using REXX then why don't you just use the parse instruction to scrape the report file? The parse instruction uses a template pattern which is very simple but powerful.
Here's an example:
/* REXX */
queue "ADSTART ACTION(ADD)"
queue " ADID(ABCD0B ) ADVALFROM(111230) CALENDAR(CALSEM7J )"
queue " DESCR('DESCRIPTION ')"
queue " ADTYPE(A)"
queue " GROUP(PBQOPC )"
queue " OWNER('OWNER1')"
queue " PRIORITY( 5) ADSTAT(A)"
queue " ODESCR('ALADIN ')"
queue "ADRUN ACTION(ADD)"
queue " PERIOD(HEB ) RULE(3) VALFROM(091230) VALTO(711231)"
queue " SHIFT( 0) SHSIGN(F)"
queue " DESCR('DESCRIPTION')"
queue " TYPE(N)"
queue " IADAYS( 1, 2, 3, 4, 5, 6, 7)"
queue " IATIME(1700) DLDAY( 1) DLTIME(0600)"
do while queued() > 0
parse pull rec
select
when startswith(rec,"ADSTART") then do
p. = '' /* the output record */
parse var rec with . 'ACTION('p.action')'
do queued()
parse pull rec
if left(rec,1) /= ' ' then do
/* End of parameter group. Re-queue the record and break */
push rec
leave
end
select
when startswith(rec, " ADID") then do
parse var rec with . "ADID("p.adid") ADVALFROM("p.advalfrom")" ,
"CALENDAR("p.calendar")"
end
when startswith(rec, " DESCR") then do
parse var rec with "DESCR('"p.desc"')"
end
when startswith(rec, " PRI") then do
parse var rec with "PRIORITY("p.priority") ADSTAT("p.adstat")"
end
otherwise nop
end
end
/* write out the record in 1 line */
say strip(p.action) strip(p.adid) strip(p.advalfrom) strip(p.calendar),
strip(p.desc) strip(p.priority) strip(p.adstat)
end
when startswith(rec,"ADRUN") then do
/* do some stuff to parse this */
end
otherwise nop
end
end
exit 0
startswith:
parse arg input, prefix
input_len = length(input)
if input_len = 0 then return 0
prefix_len = length(prefix)
if prefix_len = 0 then return 0
return input_len >= prefix_len & left(input,prefix_len) = prefix
Seeing as you're comfortable in the z/OS UNIX environment, if you want something a little more powerful than REXX and/or AWK you should check out my z/OS port of Lua. It comes with an LPeg package which makes it trivially easy to write lexers and parsers in very few lines of code.
If all you want to do is text flow the TWS control statements onto one line without capturing the fields then that's very simple to do.
/* REXX */
queue "ADSTART ACTION(ADD)"
queue " ADID(ABCD0B ) ADVALFROM(111230) CALENDAR(CALSEM7J )"
queue " DESCR('DESCRIPTION ')"
queue " ADTYPE(A)"
queue " GROUP(PBQOPC )"
queue " OWNER('OWNER1')"
queue " PRIORITY( 5) ADSTAT(A)"
queue " ODESCR('ALADIN ')"
queue "ADRUN ACTION(ADD)"
queue " PERIOD(HEB ) RULE(3) VALFROM(091230) VALTO(711231)"
queue " SHIFT( 0) SHSIGN(F)"
queue " DESCR('DESCRIPTION')"
queue " TYPE(N)"
queue " IADAYS( 1, 2, 3, 4, 5, 6, 7)"
queue " IATIME(1700) DLDAY( 1) DLTIME(0600)"
do while queued() > 0
parse pull rec
if left(rec,1) /= ' ' then do
line = rec
do queued()
parse pull rec
if left(rec,1) /= ' ' then do
push rec;leave
end
line = line rec
end
say space(line,1)
end
end
exit 0
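Since the question also mentions AWK under USS, here is a rough awk counterpart of the field-capturing REXX program above. It is only a sketch: the function name extract_ad_fields and the helper names field and emit are made up for illustration, the list of keywords is the same one the REXX example captures, and it assumes that no value contains a ")" of its own.

```shell
# extract_ad_fields: hypothetical helper; reads the TWS statements on stdin
# and prints one line per ADSTART group with the captured values.
extract_ad_fields() {
    awk -v q="'" '
        # Return the text inside NAME(...) on the current line,
        # with padding blanks and surrounding quotes removed.
        function field(name,    s) {
            if (!match($0, "(^| )" name "\\([^)]*\\)")) return ""
            s = substr($0, RSTART, RLENGTH)
            sub(/^[^(]*\(/, "", s)
            sub(/\)$/, "", s)
            gsub("^[ " q "]+|[ " q "]+$", "", s)
            return s
        }
        # Print the record collected so far, if there is one.
        function emit() {
            if (have) print p["ACTION"], p["ADID"], p["ADVALFROM"], p["CALENDAR"], p["DESCR"], p["PRIORITY"], p["ADSTAT"]
            have = 0
        }
        BEGIN { n = split("ADID ADVALFROM CALENDAR DESCR PRIORITY ADSTAT", keys, " ") }
        /^ADSTART/ {                  # start of a new parameter group
            emit(); split("", p)      # split("", p) empties the array
            p["ACTION"] = field("ACTION"); have = 1; in_ad = 1
            next
        }
        /^[^ ]/ { in_ad = 0; next }   # any other statement (ADRUN, ...) ends the group
        in_ad {
            for (i = 1; i <= n; i++) {
                v = field(keys[i])
                if (v != "") p[keys[i]] = v
            }
        }
        END { emit() }
    '
}
```

Called as extract_ad_fields < report.txt (report.txt being a placeholder name), it prints the values separated by single blanks, much like the say instruction in the REXX version.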

Try this:
sed -n '/ADSTART ACTION(ADD)/,/ADRUN/p' <filename> | sed 's/ADRUN ACTION(ADD)//g'

Maybe this would do it:
awk '/ADSTART ACTION\(ADD\)/{print buf; buf=""} {buf=buf""$0" "} END{print buf}' test.in
Commented version:
/ADSTART ACTION\(ADD\)/ { # for records where ADSTART occurs
print buf # output the buffer variable
buf="" # then empty the buffer
}
{ # for all records
# gsub(/^ +| +$/,"") # here you could trim leading and trailing space
buf=buf""$0" " # build the buffer
}
END { # in the end
print buf # output the remaining buffer
}
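A small variation on the script above, offered as a sketch rather than a drop-in replacement: it skips the empty line that the first ADSTART would otherwise print (the buffer is still empty at that point) and trims each line before appending it. The function name join_adstart_blocks is made up for the example.

```shell
# join_adstart_blocks: hypothetical helper; reads the report on stdin and
# prints each ADSTART block on a single line.
join_adstart_blocks() {
    awk '
        /^ADSTART ACTION\(ADD\)/ {       # a new block starts here
            if (buf != "") print buf     # flush the previous block, if any
            buf = ""
        }
        {
            gsub(/^ +| +$/, "")          # trim leading and trailing blanks
            buf = (buf == "" ? $0 : buf " " $0)
        }
        END { if (buf != "") print buf } # flush the last block
    '
}
```

Usage would be along the lines of join_adstart_blocks < test.in > joined.out, with both file names being placeholders.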

The solution above should work as long as there are not too many lines per block. Here is an alternative that prints only the text between the ADSTART ACTION(ADD) lines, joined onto one line; it assumes that only one block is to be printed.
Bash:
gawk 'BEGIN{s=0} /ADSTART.*ACTION\(ADD\)/ {s=(s+1)%2; next} (s==1){ print }' | sed ':a;N;$!ba;s/\n//g'
(ADSTART... lines are omitted)

Thank you very much for all the answers.
In the end it was pretty easy, because when I FTP the file from z/OS to USS (Unix System Services for z/OS) in binary mode, all the data arrives on one line.
At first I was working with a file transferred by FTP (ASCII translation) to my PC, and then sent to USS by binary FTP with WinSCP.
This is the code I used to replace a text pattern with a newline:
sed 's/ADSTART ACTION(ADD)/\
/g' <input file> ><output file>
with the newline inserted by pressing the Enter key, because /r /'$''' /n /x0D didn't work in USS; I don't know why.
Thank you all again for your time.
Patricio.

Related

How to return 0 if awk returns null from processing an expression?

I currently have an awk script that checks whether or not the output of an expression contains more than one line. If it does, it aggregates and prints the sum. For example:
someexpression=$'JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)'
might be the one-liner where it DOESN'T yield any information. Then,
echo "$someexpression" | awk '
NR>1 {a[$4]++}
END {
for (i in a) {
printf "%d\n", a[i]
}
}'
this will yield NULL or an empty return. Instead, I would like to have it return a numeric value of $0$ if empty. How can I modify the above to do this?
Nothing in UNIX "returns" anything (despite the unfortunately named keyword for setting the exit status of a function), everything (tools, functions, scripts) outputs X and exits with status Y.
Consider these 2 identical functions named foo(), one in C and one in shell:
C (x=foo() means set x to the return code of foo()):
foo() {
printf "7\n"; // this is outputting 7 from the full program
return 3; // this is returning 3 from this function
}
x=foo(); <- 7 is output on screen and x has value '3'
shell (x=$(foo) means set x to the output of foo()):
foo() {
printf "7\n"; # this is outputting 7 from just this function
return 3; # this is setting this function's exit status to 3
}
x=$(foo) <- nothing is output on screen, x has value '7', and '$?' has value '3'
Note that what the return statement does is vastly different in each. Within an awk script, printing and return codes from functions behave the same as they do in C but in terms of a call to the awk tool, externally it behaves the same as every other UNIX tool and shell script and produces output and sets an exit status.
So when discussing anything in UNIX avoid using the term "return" as it's imprecise and ambiguous and so different people will think you mean "output" while others think you mean "exit status".
In this case I assume you mean "output" BUT you should instead consider setting a non-zero exit status when there's no match like grep does, e.g.:
echo "$someexpression" | awk '
NR>1 {a[$4]++}
END {
for (i in a) {
print a[i]
}
exit (NR < 2)
}'
and then your code that uses the above can test for the success/fail exit status rather than testing for a specific output value, just like if you were doing the equivalent with grep.
You can of course tweak the above to:
echo "$someexpression" | awk '
NR>1 {a[$4]++}
END {
if ( NR > 1 ) {
for (i in a) {
print a[i]
}
}
else {
print "$0$"
exit 1
}
}'
if necessary and then you have both a specific output value and a success/fail exit status.
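To make the "test the exit status" suggestion concrete, here is a sketch of how calling code might look. The function name count_per_user and the messages are invented for the example; the awk body is the exit (NR < 2) variant shown earlier in this answer.

```shell
# count_per_user: hypothetical wrapper; reads the listing on stdin.
# Outputs one count per user and exits non-zero when there is only a header line.
count_per_user() {
    awk '
        NR > 1 { a[$4]++ }
        END {
            for (i in a) print a[i]
            exit (NR < 2)     # status 1 when no data lines were seen
        }
    '
}

# The caller branches on the exit status, exactly as it would with grep.
if counts=$(printf 'JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n' | count_per_user); then
    echo "per-user counts: $counts"
else
    echo "no jobs"
fi
```

With only the header line as input, as here, the else branch runs.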
You may set a flag inside the for loop to detect whether the loop has executed or not:
echo "$someexpression" |
awk 'NR>1 {
a[$4]++
}
END {
for (i in a) {
p = 1
printf "%d\n", a[i]
}
if (!p)
print "$0$"
}'
$0$

looks like widechar input from getline in awk

I'm having trouble with AWK that I've never seen before.
I'm reading in a file, no special chars, and printing it back out.
When I read a text file, it prints out with a NUL between every char.
Reading an HTML file works exactly as expected and prints out what was read in.
Code snippet:
while ((getline line < In) > 0) {
print ":0:", line, ":0:" > "out";
}
reads the line "signature1"
and prints
":0: xFFxFEsNULiNULgNULnNULaNULtNULuNULrNULeNUL1NUL/r
NUL :0:/r/n"
as viewed in Notepad++.
"In" is the input filename.
I assume it is some Language setting on my machine, but I can't find anything.
A second print line, redirected to a file, prints every other line in Chinese.
TL;DR: Complete text of the app:
BEGIN { ProcessFile(); }
function ProcessFile() {
In = "default.txt";
Works = "NoProblem.html";
Out = "quote.txt";
RS = "/n";
while ((getline textLine < In) > 0) {
print "*0*", textLine, "*0:*" > "out.txt";
print textLine > Out; # prints every other line in Chinese ???
}
close(In);
close(Out);
}
Output of the second print line:
signature1
਍猀椀最渀愀琀甀爀攀㈀ഀഀ
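The xFFxFE at the start of the dumped line matches the UTF-16 little-endian byte-order mark, and a NUL after every character is what plain ASCII text looks like in that encoding. One plausible reading, then, is that default.txt was saved as UTF-16 while the HTML file was not. The sketch below checks for that mark and converts such a file before awk sees it. The function names are made up, and it assumes that od and iconv are available on the machine in question.

```shell
# has_utf16le_bom FILE: succeeds when FILE starts with the bytes FF FE.
has_utf16le_bom() {
    [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "fffe" ]
}

# utf16_to_utf8 FILE: writes FILE to stdout converted from UTF-16 to UTF-8.
# Assumes an iconv that knows the UTF-16 and UTF-8 encoding names.
utf16_to_utf8() {
    iconv -f UTF-16 -t UTF-8 "$1"
}
```

If has_utf16le_bom default.txt succeeds, feeding awk the output of utf16_to_utf8 default.txt instead of the raw file would be one way to test this theory.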

Endless recursion in gawk-script

Please pardon me in advance for posting such a big part of my problem, but I just can't put my finger on the part that fails...
I got input-files like this (abas-FO if you care to know):
.fo U|xiininputfile = whatever
.type text U|xigibsgarnich
.assign U|xigibsgarnich
..
..Comment
.copy U|xigibswohl = Spaß
.ein "ow1/UWEDEFTEST.FOP"
.in "ow1/UWEINPUT2"
.continue BOTTOM
.read "SOemthing" U|xttmp
!BOTTOM
..
..
Now I want to recursively follow each .in[put]/.ein[gabe] statement, parse the file it mentions and, if I don't know that file yet, add it to an array. My code looks like this:
#!/bin/awk -f
function getFopMap(inputregex, infile, mandantdir, infiles){
while(getline f < infile){
#printf "*"
#don't match if there is a '
if(f ~ inputregex "[^']"){
#remove .input-part
sub(inputregex, "", f)
#trim right
sub(/[[:blank:]]+$/, "", f)
#remove leading and trailing "
gsub(/(^\"|\"$)/,"" ,f)
if(!(f in infiles)){
infiles[f] = "found"
}
}
}
close(infile)
for (i in infiles){
if(infiles[i] == "found"){
infiles[i] = "parsed"
cmd = "test -f \"" i "\""
if(system(cmd) == 0){
close(cmd)
getFopMap(inputregex, f, mandantdir, infiles)
}
}
}
}
BEGIN{
#Matches something like [.input myfile] or [.ein "ow1/myfile"]
inputregex = "^\\.(in|ein)[^[:blank:]]*[[:blank:]]+"
#Get absolute path of infile
cmd = "python -c 'import os;print os.path.abspath(\"" ARGV[1] "\")'"
cmd | getline rootfile
close(cmd)
infiles[rootfile] = "parsed"
getFopMap(inputregex, rootfile, mandantdir, infiles)
#output result
for(infile in infiles) print infile
exit
}
I call the script (in the same directory the paths are relative to) like this:
./script ow1/UWEDEFTEST.FOP
I get no output; it just hangs. If I uncomment the printf "*" command, I see stars without end.
I appreciate any help and hints on how to do it better.
My awk:
gawk Version 3.1.7
I don't know if it's your only problem, but you're calling getline incorrectly and will consequently go into an infinite loop in some scenarios. Make sure you fully understand all of the caveats at http://awk.info/?tip/getline and you might want to use the recursion example there as the starting point for your code.
The most important point for your code is that getline returns a negative value when it fails, so while(getline f < infile) becomes an infinite loop: the failing getline keeps returning a non-zero value, so it keeps being called and keeps failing. You need to use while ( (getline f < infile) > 0) instead.
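A minimal sketch of the safe form, using a made-up helper named count_lines: because the loop condition is > 0, it stops both at end of file (getline returns 0) and on an error such as an unreadable file (getline returns -1).

```shell
# count_lines FILE: hypothetical helper that counts lines with a getline loop.
count_lines() {
    awk -v file="$1" 'BEGIN {
        n = 0
        # > 0 ends the loop on EOF (0) and on failure (-1) alike
        while ((getline line < file) > 0) n++
        close(file)
        print n
    }'
}
```

Calling count_lines on a path that does not exist prints 0 and finishes, whereas the same loop written as while (getline line < file) would never end.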

How can I check if a GNU awk coprocess is open, or force it to open without writing to it?

I have a gawk program that uses a coprocess. However, sometimes I don't have any data to write to the coprocess, and my original script hangs while waiting for the output of the coprocess.
The code below reads from STDIN, writes each line to a "cat" program, running as a coprocess. Then it reads the coprocess output back in and writes it to STDOUT. If we change the if condition to be 1==0, nothing gets written to the coprocess, and the program hangs at the while loop.
From the manual, it seems that the coprocess and the two-way communication channels are only started the first time there is an IO operation with the |& operator. Perhaps we can start things without actually writing anything (e.g. writing an empty string)? Or is there a way to check if the coprocess ever started?
#!/usr/bin/awk -f
BEGIN {
cmd = "cat"
## print "" |& cmd
}
{
if (1 == 1) {
print |& cmd
}
}
END {
close (cmd, "to")
while ((cmd |& getline line)>0) {
print line
}
close(cmd)
}
Great question, +1 for that!
Just test the return code of the close(cmd, "to") - it will be zero if the pipe was open, -1 (or some other value) otherwise. e.g.:
if (close(cmd, "to") == 0) {
while ((cmd |& getline line)>0) {
print line
}
close(cmd)
}

Awk to escape quotation marks

So I have a file like
select * from tb where start_date = to_date('20131010','yyyymmdd');
p23 VARCHAR2(300):='something something
still part of something above with 'this' between single quotes and close
something to end';
(code goes on)
That would be some automatically generated code which I should be able to execute via sqlplus. But that obviously won't work, since the third line should have had its quotes escaped, like (..) with ''this'' between (...).
I can't access the script that generated that code, but I was trying to get an awk script to do the job. Notice the script has to be smart enough not to escape every quote in the code (the to_date('20131010','yyyymmdd') is correct).
I am no expert in awk, so I went as far as:
BEGIN {
RS=";"
FS="\n"
}
/\tp[0-9]+/{
ini = match($0, "\tp[0-9]+")
fim = match($0, ":='")
s = substr($0,ini,fim+1)
txt = substr($0, fim+3, length($0))
block = substr(txt, 0, length(txt)-1)
print gensub("'", "''", block)
}
!/\tp[0-9]+/{
print $0";"
}
but it got way too messy with the print gensub("'", "''", block) and it is not working.
Can someone give me a quick way out?
You have forgotten one parameter to gensub. Try:
BEGIN {
RS=";"
FS="\n"
}
/^[[:space:]]+p[0-9]+/{
ini = match($0, "\tp[0-9]+")
fim = match($0, ":='")
s = substr($0,ini,fim+1)
txt = substr($0, fim+3, length($0))
block = substr(txt, 0, length(txt)-1)
printf "%s'%s';", s, gensub("'", "''", "g",block)
next
}
{
printf "%s;", $0
}
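Note that gensub is a gawk extension. If the script has to run under another awk, the quote doubling itself can be done with the standard gsub, which changes its target in place. Below is a sketch of just that step, with the made-up name double_quotes; passing the quote character in through -v avoids fighting the shell's own quoting.

```shell
# double_quotes: hypothetical helper; doubles every single quote on each input line.
double_quotes() {
    awk -v q="'" '{
        gsub(q, q q)    # replace each quote with two quotes, in place
        print
    }'
}
```

In the script above, the same idea would be gsub("'", "''", block) followed by printing block, in place of the gensub call.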