SED extract first occurrence after 2 patterns match

I'm trying to use C shell (I'm afraid no other option is available) and sed to solve this problem. Given this example file, a report of the tests that failed:
============
test_085
============
- Signature code: F2B0C
- Failure reason: timeout
- Error: test has timed out
============
test_102
============
- Signature code: B4B4A
- Failure reason: syntax
- Error: Syntax error on file example.c at line 245
============
test_435
============
- Signature code: 000FC0
- Failure reason: timeout
- Error: test has timed out
I have a script that loops through all the tests I'm running and checks each one against this report to see whether it has failed, so that I can compute some statistics later on:
if (`grep -c $test_name $test_report` > 0) then
printf ",TEST FAILED" >>! $report
else
printf ",TEST PASSED" >>! $report
endif
What I would like to do is extract the failure reason when $test_name is found in $test_report. For example, for test_085 I want to extract only 'timeout', for test_102 only 'syntax', and for test_435 'timeout'. For test_045 there is nothing to extract, because it is not in the report (meaning it passed). In essence, I want to extract the first occurrence after these two pattern matches: test_085, Failure reason:

To extract "Failure reason" for the specified test name - short awk approach:
awk -v t_name="test_102" '$1==t_name{ f=1 }f && /Failure reason/{ print $4; exit }' reportfile
$1==t_name{ f=1 } - on encountering line matching the pattern(i.e. test name t_name) - set the flag f into active state
f && /Failure reason/ - while iterating through the lines under considered test name section (while f is "active") - capture the line with Failure reason and print the reason which is in the 4th field
exit - exit script execution immediately to avoid redundant processing
The output:
syntax
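A minimal sketch of how that one-liner might replace the grep test in the loop, so that the reason is recorded alongside the pass/fail status. It assumes a POSIX sh rather than the C shell from the question, and the failure_reason helper and the temporary report file are hypothetical names introduced only for illustration:

```shell
# Build a small sample report (same layout as in the question).
report=$(mktemp)
cat > "$report" <<'EOF'
============
test_085
============
- Signature code: F2B0C
- Failure reason: timeout
- Error: test has timed out
============
test_102
============
- Signature code: B4B4A
- Failure reason: syntax
- Error: Syntax error on file example.c at line 245
EOF

# Hypothetical helper: print the failure reason for test $1 found in
# report $2, or print nothing if the test is not listed in the report.
failure_reason() {
    awk -v t_name="$1" '
        $1 == t_name { f = 1 }                    # entered the section of this test
        f && /Failure reason/ { print $4; exit }  # first reason after the test name
    ' "$2"
}

for t in test_102 test_045; do
    reason=$(failure_reason "$t" "$report")
    if [ -n "$reason" ]; then
        printf '%s,TEST FAILED,%s\n' "$t" "$reason"
    else
        printf '%s,TEST PASSED\n' "$t"
    fi
done
```

An empty result doubles as the "test passed" signal, so no separate grep -c call is needed.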

You can try handling RS and FS variables of awk to make the parsing easier:
$ awk -v RS='' -F='==*' '{gsub(/\n/," ")
sub(/.*Failure reason:/,"",$3)
sub(/- Error:.*/,"",$3)
printf "%s : %s\n",$2,$3}' file
output:
test_085 : timeout
test_102 : syntax
test_435 : timeout
If you don't care about the newlines, you can remove the gsub() call.

Whenever you have input that has attributes with name-to-value mappings, as yours does, the best approach is to first create an array to capture those mappings (n2v[] below) and then access the values by their names. For example:
$ cat tst.awk
BEGIN { RS=""; FS="\n" }
$2 == id {
for (i=4; i<=NF; i++) {
name = value = $i
gsub(/^- |:.*$/,"",name)
gsub(/^[^:]+: /,"",value)
n2v[name] = value
}
print n2v[attr]
}
$ awk -v id='test_085' -v attr='Failure reason' -f tst.awk file
timeout
$ awk -v id='test_085' -v attr='Error' -f tst.awk file
test has timed out
$ awk -v id='test_102' -v attr='Signature code' -f tst.awk file
B4B4A
$ awk -v id='test_102' -v attr='Error' -f tst.awk file
Syntax error on file example.c at line 245
$ awk -v id='test_102' -v attr='Failure reason' -f tst.awk file
syntax

Related

Bash how to split file on empty line with awk

I have a text file (A.in) and I want to split it into multiple files. The split should occur every time an empty line is found. The filenames should be progressive (A1.in, A2.in, ...).
I found this answer that suggests using awk, but I can't make it work with my desired naming convention
awk -v RS="" '{print $0 > $1".txt"}' file
I also found other answers telling me to use the command csplit -l but I can't make it match empty lines, I tried matching the pattern '' but I am not that familiar with regex and I get the following
bash-3.2$ csplit A.in ""
csplit: : unrecognised pattern
Input file:
A.in
4
RURDDD

6
RRULDD
KKKKKK

26
RRRULU
Desired output:
A1.in
4
RURDDD
A2.in
6
RRULDD
KKKKKK
A3.in
26
RRRULU

Another fix for the awk:
$ awk -v RS="" '{
split(FILENAME,a,".") # separate name and extension
f=a[1] NR "." a[2] # form the filename, use NR as number
print > f # output to file
close(f) # close each file; with MANY records this avoids running out of fds
}' A.in

In any normal case, the following script should work:
awk 'BEGIN{RS=""}{ print > ("A" NR ".in") }' file
If it fails, the most likely cause is CRLF line terminators (see here and here).
As mentioned by James, you can make it a bit more robust:
awk 'BEGIN{RS=""}{ f = "A" NR ".in"; print > f; close(f) }' file
If you want to use csplit, the following will do the trick:
csplit --suppress-matched -f "A" -b "%0.2d.in" A.in '/^$/' '{*}'
See man csplit for understanding the above.
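The CRLF failure mode can be sketched as follows. With Windows line endings the "empty" lines actually contain a carriage return, so awk's paragraph mode (RS="") never sees a blank line. This sketch assumes that stripping every carriage return with tr is acceptable for the data; workdir and prefix are hypothetical names used only for this example:

```shell
workdir=$(mktemp -d)

# Sample input with CRLF line endings: the separator lines hold "\r".
printf '4\r\nRURDDD\r\n\r\n6\r\nRRULDD\r\nKKKKKK\r\n\r\n26\r\nRRRULU\r\n' > "$workdir/A.in"

# Drop the carriage returns, then split on blank lines as above.
tr -d '\r' < "$workdir/A.in" |
    awk -v prefix="$workdir/A" 'BEGIN { RS = "" } {
        f = prefix NR ".in"   # A1.in, A2.in, ...
        print > f
        close(f)              # keep the number of open files small
    }'

ls "$workdir"
```

Because awk reads from the pipe here, FILENAME is not usable for building the output name, which is why the prefix is passed in with -v.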

Input file content:
$ cat A.in
4
RURDDD

6
RRULDD
KKKKKK

26
RRRULU
AWK file content:
BEGIN{
n=1
}
{
if(NF!=0){
print $0 >> ("A" n ".in")
}else{
n++
}
}
Execution:
awk -f ctrl.awk A.in
Output:
$ cat A1.in
4
RURDDD
$ cat A2.in
6
RRULDD
KKKKKK
$ cat A3.in
26
RRRULU
PS: One-liner execution without AWK file:
awk 'BEGIN{n=1}{if(NF!=0){print $0 >> ("A" n ".in")}else{n++}}' A.in

How to avoid the return of system command with awk?

When I use awk with the system command like this:
awk 'BEGIN{ if ( system("wc -l file_1") == 0 ) {print "something"} }' text.txt >> file_1
the output of the system command is written to my file file_1:
0 file_1
something
How can I avoid that, or just redirect the output?

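You appear to be under the impression that the return value of the system() function includes the stdout of the command it runs. It does not: system() returns the command's exit status, and the command writes to the same stdout as awk, which here is redirected to file_1.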
If you want to test only for the existence of a non-zero-sized file, you might do it using the test command (on POSIX systems):
awk '
BEGIN{
if ( system("test -s file_1") ) { # a return value of 0 is "false" to awk
print "something"
}
}' text.txt >> file_1
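If the original wc -l call should stay, one possible sketch is to redirect the command's output inside the string passed to system(), so that only the exit status is used. The temporary file and the variable f are assumptions made for this example, not part of the question:

```shell
tmpfile=$(mktemp)
printf 'one\ntwo\n' > "$tmpfile"

# The redirection is part of the command line that system() hands to the
# shell, so the "2 <file>" line from wc never reaches awk's own stdout.
out=$(awk -v f="$tmpfile" 'BEGIN {
    if (system("wc -l \"" f "\" >/dev/null") == 0)   # exit status 0 means success
        print "something"
}')

printf '%s\n' "$out"
```

Here only the line printed by awk itself ("something") ends up in the captured output.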

Awk script: How to prevent ARGV from being treated as an input file name

It seems that an awk script treats ARGV[1] through ARGV[ARGC-1] as input files.
Is there any way to make awk treat the ARGV elements as plain arguments instead of input file names?
Example:
test.awk
#!/usr/bin/awk -f
BEGIN {title=ARGV[2]}
{if ($1=="AA") {print title}}
dat file
AB
BA
AA
CC
$ test.awk dat 'My Interesting Title'
My Interesting Title
awk: test.awk:3: fatal: cannot open file `My Interesting Title' for reading (No such file or directory)

You can modify ARGV at any time. Awk processes the elements of ARGV in turn, so if you modify them during processing, you can arrange to read different files or not to treat some arguments as file names. In particular, if you modify ARGV in the BEGIN block, anything is possible. For example, the following snippet causes awk to read from standard input even when arguments were passed, and saves the arguments in an array called args:
awk '
BEGIN {for (i in ARGV) {args[i] = ARGV[i]; delete ARGV[i]}}
…
' hello world
If you just want to skip the first argument, delete it only:
awk '
BEGIN {title = ARGV[1]; delete ARGV[1]}
$1 == "AA" {print title}
' 'My Interesting Title' input.txt
However, this is unusual and may therefore be considered hard to maintain. Consider using a shell wrapper and passing the title through an environment variable instead.
#!/bin/sh
title=$1; shift
export title # awk only sees exported variables in ENVIRON
awk '
$1 == "AA" {print ENVIRON["title"]}
' "$@"
You can also pass a string as an awk variable. Beware that the value undergoes backslash expansion.
awk -v 'title=My Interesting Title\nThis is a subtitle' '
$1 == "AA" {print title} # prints two lines!
' input.txt
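The difference between the two ways of passing the string can be sketched like this. The input is fed on stdin instead of input.txt, and via_v and via_env are hypothetical variable names used only for the comparison:

```shell
title='Title\nSubtitle'   # contains a literal backslash followed by n

# -v: awk processes escape sequences, so \n becomes a real newline.
via_v=$(printf 'AA\n' | awk -v title="$title" '$1 == "AA" { print title }')

# Environment: the value reaches awk unchanged through ENVIRON.
via_env=$(printf 'AA\n' | title="$title" awk '$1 == "AA" { print ENVIRON["title"] }')

printf 'with -v:\n%s\n' "$via_v"
printf 'with ENVIRON:\n%s\n' "$via_env"
```

The ENVIRON route is the safer choice whenever the string may contain backslashes, for example Windows paths or regular expressions.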

Something like this?
$ awk -v title='My Interesting Title' '$0 ~ /AA/ {print title}1' input
AB
BA
My Interesting Title
AA
CC

Yes:
BEGIN{title=ARGV[2];ARGV[--ARGC]=""}
$1=="AA" {print title}
but you probably want this instead:
$ cat tst.sh
awk -v title="$2" '$1=="AA" {print title}' "$1"
See http://cfajohnson.com/shell/cus-faq-2.html#Q24 for details on those and the other ways to pass the value of shell variables to awk scripts.
As an aside, note that whether you use this script or your original, the contents of your file is a shell script that calls awk, not an awk script, so the suffix should not be .awk, it should be .sh or similar.

You can decrement ARGC after reading the arguments so that awk treats only the first argument(s) as input file(s):
#!/bin/awk -f
BEGIN {
for (i=ARGC; i>2; i--) {
print ARGV[ARGC-1];
ARGC--;
}
}
…
Or alternatively, you can reset ARGC after having read all arguments :
#!/bin/awk -f
BEGIN {
for (i=0; i<ARGC; i++) {
print ARGV[i];
}
ARGC=2;
}
…
Both methods will correctly process myawkscript.awk foobar foo bar … as if foobar was the only file to process (of course you can set ARGC to 3 if you want the two first arguments as files, etc.).

This is also allowed:
awk '$1=="AA" {print title}' "title=My Interesting Title" input.txt
Any command-line argument of the form varname=VarContent is treated as a variable assignment rather than a file name. The assignment takes effect when awk reaches that argument, so it must come before the input file.

simple awk string comparison unexpected result

In general string comparison, "A" > "a" is false.
However, I am getting unexpected result from this awk execution:
$ echo "A a"| awk '{if ($1 > $2) print "gt"; else print "leq"}'
gt
What am I missing?
Environment info:
$ uname -r -s -v -M
AIX 1 6 IBM,9110-510
$ locale
LANG=en_AU.8859-15
LC_COLLATE="en_AU.8859-15"
LC_CTYPE="en_AU.8859-15"
LC_MONETARY="en_AU.8859-15"
LC_NUMERIC="en_AU.8859-15"
LC_TIME="en_AU.8859-15"
LC_MESSAGES="en_AU.8859-15"
LC_ALL=
Diagnostics:
$ echo "A a"| awk '{print NF}'
2
Update: It produces the correct result after setting LC_ALL=POSIX (thanks JS웃). I need to investigate this further.

I am unable to reproduce this but you can force a string comparison by concatenating the operand with the null string:
echo "A a"| awk '{if ($1"" > $2"") print "gt"; else print "leq"}'
Note: Concatenating with any one operand should suffice.
Update:
As suspected, the OP's locale settings were causing the issue. After setting LC_ALL=POSIX the issue was resolved.
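A minimal sketch of that fix, assuming it is acceptable to override the locale for the single awk invocation only. In the C (POSIX) locale strings are compared by byte value, and "A" (65) sorts before "a" (97):

```shell
# Override the collation locale just for this one command.
result=$(echo "A a" | LC_ALL=C awk '{ if ($1 > $2) print "gt"; else print "leq" }')

echo "$result"   # leq
```

Setting the variable on the command line leaves the rest of the session's locale untouched.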

awk won't print new line characters

I am using the below code to change an existing awk script so that I can add more and more cases with a simple command.
echo `awk '{if(/#append1/){print "pref'"$1"'=0\n" $0 "\n"} else{print $0 "\n"}}' tf.a`
Note that the first print is "pref'"$1"'=0\n", so $1 there refers to the shell script's first argument, not to awk's first field.
The command ./tfb.a "c" should change the code from:
BEGIN{
#append1
}
...
to:
BEGIN{
prefc=0
#append1
}
...
However, it gives me everything on one line.
Does anyone know why this is?

If you take awk right out of the equation you can see what's going on:
# Use a small test file instead of an awk script
$ cat xxx
hello
there
$ echo `cat xxx`
hello there
$ echo "`cat xxx`"
hello
there
$ echo "$(cat xxx)"
hello
there
$
The backtick operator expands the output into shell "words" too soon. You could play around with the $IFS variable in the shell (yikes), or you could just use double-quotes.
If you're running a modern sh (e.g. ksh or bash, not the "classic" Bourne sh), you may also want to use the $() syntax (it's easier to find the matching start/end delimiter).

Do it like this: pass the variable from the shell to awk properly, using -v.
#!/bin/bash
toinsert="$1"
awk -v toinsert="$toinsert" '
/#append1/{
$0="pref"toinsert"=0\n"$0
}
{print}
' file > temp
mv temp file
output
$ cat file
BEGIN{
#append1
}
$ ./shell.sh c
BEGIN{
prefc=0
#append1
}
