awk if else condition when string is not present in the file - awk

I am trying to use awk to convert a multi-line block into a single record and then run a search operation on it. I am actually running lspci -v as the input command, but I have mocked up the data for this question.
Input data:
name: foobar
data: 123 bad
name: foozoo
data: 123 good
name: foozoo
data: 123 bad
name: zoobar
data: 123 good
name: barzpp
data: 123 bad
First I converted the input data that was in blocks into single-line records.
awk -v RS='' '{$1=$1}1' xx
name: foobar data: 123 bad
name: foozoo data: 123 good
name: foozoo data: 123 bad
name: zoobar data: 123 good
name: barzpp data: 123 bad
Now I am searching for the string "foozoo", and this gives me the desired results. Here, I first check whether foozoo is present on the line, and then whether .*good is present on the same line. This works fine.
awk -v RS='' -v var=foozoo '{$1=$1}; {if(match($0,var)) if(match($0,var ".*good")) print var " is good"; else print var " is missing"}' xx
foozoo is good
foozoo is missing
Now, when I supply a non-existing string, awk returns nothing, which makes sense as there is no else block.
awk -v RS='' -v var=THIS_DOES_NOT_EXIST '{$1=$1}; {if(match($0,var)) if(match($0,var ".*good")) print var " is good"; else print var " is missing"}' xx
When I add an else block and search for an existing string in the input, I get the output below, which I do not want. I only want the foozoo is good and foozoo is missing lines.
awk -v RS='' -v var=foozoo '{$1=$1}; {if(match($0,var)) {if(match($0,var ".*good")) print var " is good"; else print var " is missing"} else {print "NON-EXISTING_DATA_REQUESTED"}}' xx
NON-EXISTING_DATA_REQUESTED
foozoo is good
foozoo is missing
NON-EXISTING_DATA_REQUESTED
NON-EXISTING_DATA_REQUESTED
Similarly, when I run it for non-existing data, I get the line NON-EXISTING_DATA_REQUESTED for each record. How do I print just one line saying the data does not exist?
awk -v RS='' -v var=monkistrying '{$1=$1}; {if(match($0,var)) {if(match($0,var ".*good")) print var " is good"; else print var " is missing"} else {print "NON-EXISTING_DATA_REQUESTED"}}' xx
NON-EXISTING_DATA_REQUESTED
NON-EXISTING_DATA_REQUESTED
NON-EXISTING_DATA_REQUESTED
NON-EXISTING_DATA_REQUESTED
NON-EXISTING_DATA_REQUESTED
Here's that last script above formatted legibly by gawk -o-:
{
    $1 = $1
}

{
    if (match($0, var)) {
        if (match($0, var ".*good")) {
            print var " is good"
        } else {
            print var " is missing"
        }
    } else {
        print "NON-EXISTING_DATA_REQUESTED"
    }
}

It sounds like you only want to print NON-EXISTING_DATA_REQUESTED when no matches (foozoo and good) are found, and then only print it once. If that's correct, one idea is to keep track of the number of matches and, in an END{...} block, print a single NON-EXISTING_DATA_REQUESTED if that count is zero ...
Found a match:
awk -v RS='' -v var=foozoo '
{ $1=$1 }
{ if(match($0,var)) {
      # found++                              # uncomment if both "is good" AND "is missing" should be considered as "found"
      if(match($0,var ".*good"))
          { print var " is good"; found++ }  # remove "found++" if the previous line is uncommented
      else
          { print var " is missing" }
  }
}
END { if (!found) print "NON-EXISTING_DATA_REQUESTED" }
' xx
foozoo is good
foozoo is missing
Found no matches:
awk -v RS='' -v var=monkistrying '
{ $1=$1 }
{ if(match($0,var)) {
      # found++
      if(match($0,var ".*good"))
          { print var " is good"; found++ }
      else
          { print var " is missing" }
  }
}
END { if (!found) print "NON-EXISTING_DATA_REQUESTED" }
' xx
NON-EXISTING_DATA_REQUESTED

There's no need to compress your records onto individual lines; that just wastes time and potentially makes the comparisons harder. Also, by using match() you're treating var as a regexp and doing a partial record comparison, when it looks like you just want a full-field string comparison. Try match($0,var) when the input contains badfoozoohere and foozoo, given -v var=foozoo, to see one way the way you're using match() will fail (there are several others). And since you aren't using RSTART or RLENGTH, using match($0,var) instead of $0 ~ var was inefficient anyway.
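For example, here is a quick illustration of that partial-match pitfall (the input line is made up, not taken from the question's data):
$ printf 'name: badfoozoohere\ndata: 123 good\n' | awk -v RS='' -v var='foozoo' '{ print (match($0, var) ? "matched" : "no match") }'
matched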
$ cat tst.awk
BEGIN { RS="" }
$2 == var {
    print var, "is", ( $NF == "good" ? "good" : "missing" )
    found = 1
}
END {
    if ( !found ) {
        print "NON-EXISTING_DATA_REQUESTED"
    }
}
$ awk -v var='foozoo' -f tst.awk file
foozoo is good
foozoo is missing
$ awk -v var='monkistrying' -f tst.awk file
NON-EXISTING_DATA_REQUESTED

A single-pass awk-based solution that doesn't need to transform the data:
The bytes \300 (xC0), \301 (xC1), and \371 (xF9) are never valid in UTF-8, so the chances of them appearing in the input data are minuscule.
INPUT
name: foobar
data: 123 bad
name: foozoo
data: 123 good
name: foozoo
data: 123 bad
name: zoobar
data: 123 good
name: barzpp
data: 123 bad
CODE (gawk, mawk 1/2, or LC_ALL=C nawk)
{m,n~,g}awk '
BEGIN {
______ = "NON-EXISTING_DATA_REQUESTED\n"
FS = "((data|name): ([0-9]+ )?|\n)+"
RS = "^$" (ORS = _)
___ = "\300"
____ = "\371"
_____ =(_="\301")(__="foozoo")(\
OFS = _)
} ! ( NF *= /name: foozoo[ \n]/) ? $NF = ______\
: gsub(_____ "bad"_, (_)(___)_) + \
gsub(_____ "good"_,(_)(____)_) + gsub("[\1-~]+","")+\
gsub( ___, __ " is missing\n") + \
gsub(____, __ " is " "good\n") + gsub((___)"|"(_)("|")____,"")'
OUTPUT
foozoo is good
foozoo is missing

Related

Awk printing each line in file separately

I am making a script that takes a list of zone records and values and puts them into a DNS server.
The formatting of the wanted output is just for Ansible; the problem is that I can't operate on each line separately with awk.
When I don't mention an NR, it prints all the items on the same line.
When I do mention an NR, it prints either nothing or only the specified NR (i.e. with NR==1 it prints only the first line).
My objective is to iterate over all lines and print each one in the format I want, with a newline after the end of each line.
bash_script
#! /bin/bash
read -p "File: " file
zone=`awk 'NR==1 { print $2}' ${file}`
echo "${zone}:" >> /etc/ansible/roles/create_dns/defaults/main/${zone}.yml
lines=`wc -l < ${file}`
for line_num in $(seq 1 $lines)
do
echo $line_num
echo `awk 'NR==$line_num {print " - { zone: \x27" $2"\x27, record: \x27" $1"\x27, value: \x27" $3"\x27 }\\n"}' ${file}` >> /etc/ansible/roles/create_dns/defaults/main/${zone}.yml
done
$file
ansible ben.com 10.0.0.10
test ben.com 10.0.0.110
example ben.com 10.0.0.120
Wanted output:
ben.com:
- { zone: 'ben.com', record: 'ansible', value: '10.0.0.10' }
- { zone: 'ben.com', record: 'test', value: '10.0.0.110' }
- { zone: 'ben.com', record: 'example', value: '10.0.0.120' }
Output i get:
ben.com:
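As an aside, the immediate bug in the loop above is that $line_num sits inside single quotes, so the shell never expands it and awk compares NR against an empty value. A hedged sketch of a fix for just that line (the variable name n is my own) would be:
awk -v n="$line_num" 'NR == n { printf " - { zone: \047%s\047, record: \047%s\047, value: \047%s\047 }\n", $2, $1, $3 }' "${file}"
But the per-line loop is unnecessary in the first place, as the answers below show.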
You can use this single awk for this:
read -p "File: " file
awk '{printf "\t- { zone: \047%s\047, record: \047%s\047, value: \047%s\047 }\n", $2, $1, $3 > $2}' "$file"
cat ben.com
- { zone: 'ben.com', record: 'ansible', value: '10.0.0.10' }
- { zone: 'ben.com', record: 'test', value: '10.0.0.110' }
- { zone: 'ben.com', record: 'example', value: '10.0.0.120' }
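If you also want the leading "ben.com:" header from the wanted output, a hedged extension of the same idea (same sample file assumed) could be:
awk '!seen[$2]++ { print $2 ":" > $2 }
     { printf "\t- { zone: \047%s\047, record: \047%s\047, value: \047%s\047 }\n", $2, $1, $3 > $2 }' "$file"
The !seen[$2]++ rule prints the header only the first time each zone is seen, and both rules write to a file named after the zone in $2.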
With your shown samples, please try the following solution. It is generic: you can list any number of column names in the split() call in the BEGIN section. It assumes you want to add a label (e.g. zone, record, etc.) before each field value. If you have fewer labels than fields in the Input_file, you can change the i<=NF condition as needed to control how many columns are fetched.
read -p "File: " file
awk -v s1='\047' 'BEGIN{OFS=", ";split("zone:,record:,value:",headings,",")} {for(i=1;i<=NF;i++){$i=headings[i]" " s1 $i s1};$0=" - { " $0" }"} 1' "$file"
Adding a non one liner form of solution here:
awk -v s1="'" '
BEGIN{
    OFS=", "
    split("zone:,record:,value:",headings,",")
}
{
    for(i=1;i<=NF;i++){
        $i=headings[i]" " s1 $i s1
    }
    $0=" - { " $0" }"
}
1
' "$file"

Getting awk to print a line with a keyword, but only within a range

I am using FreeBSD's geom command to gather information about partitions on my storage devices and filter it using awk. Specifically, I'm trying to extract two lines from the Providers section of the output: Mediasize, and type.
This is what the unfiltered output looks like:
$ geom part list da0
Geom name: da0
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 120845263
first: 40
entries: 128
scheme: GPT
Providers:
1. Name: da0p1
Mediasize: 61872754688 (58G)
Sectorsize: 512
Stripesize: 0
Stripeoffset: 20480
Mode: r0w0e0
efimedia: HD(1,GPT,1b5fe285-3be5-11ea-8179-b827ebb30e4e,0x28,0x733f3a8)
rawuuid: 1b5fe285-3be5-11ea-8179-b827ebb30e4e
rawtype: 516e7cb6-6ecf-11d6-8ff8-00022d09712b
label: (null)
length: 61872754688
offset: 20480
type: freebsd-ufs
index: 1
end: 120845263
start: 40
Consumers:
1. Name: da0
Mediasize: 61872793600 (58G)
Sectorsize: 512
Mode: r0w0e0
I can use this awk one-liner to get Mediasize and type, but it returns both the Providers and Consumers Mediasize: since the search string appears in both sections:
$ geom part list da0 | awk '/Mediasize:/ { print $2 } /[ ]+type:/ { print $2 }'
61872754688
freebsd-ufs
61872793600
I can use this command to limit the output to only the lines that fall between Providers: and Consumers:
$ geom part list da0 | awk '/Providers:/,/Consumers:/'
Providers:
1. Name: da0p1
Mediasize: 61872754688 (58G)
Sectorsize: 512
Stripesize: 0
Stripeoffset: 20480
Mode: r0w0e0
efimedia: HD(1,GPT,1b5fe285-3be5-11ea-8179-b827ebb30e4e,0x28,0x733f3a8)
rawuuid: 1b5fe285-3be5-11ea-8179-b827ebb30e4e
rawtype: 516e7cb6-6ecf-11d6-8ff8-00022d09712b
label: (null)
length: 61872754688
offset: 20480
type: freebsd-ufs
index: 1
end: 120845263
start: 40
Consumers:
What I'm struggling with is how to combine the two into an awk one-liner, to print Mediasize:, but only from the Providers: section.
I've tried this, but it gives me errors:
$ geom part list da0 | awk '/Providers:/,/Consumers:/ { /Mediasize:/ { print $2 } /[ ]+type:/ { print $2 } }'
awk: syntax error at source line 1
context is
/Providers:/,/Consumers:/ { /Mediasize:/ >>> { <<<
awk: illegal statement at source line 1
awk: syntax error at source line 1
Piping the output of one awk program to another gets me what I want, but it seems like a kludge.
$ geom part list da0 | awk '/Providers:/,/Consumers:/' | awk '/Mediasize:/ { print $2 } /[ ]+type:/ { print $2 }'
61872754688
freebsd-ufs
Ideally, I'd like to get the output from a single awk one-liner.
Ways I can think of (ordered from most elegant to least elegant) include:
1) Somehow fixing awk '/Providers:/,/Consumers:/ { /Mediasize:/ { print $2 } /[ ]+type:/ { print $2 } }'
2) Exiting prematurely once the Consumers: keyword is encountered.
3) Using a flag to toggle printing off once the Consumers: keyword is encountered.
I can get #3 to work, with a flag and a ternary operator, but it seems less than elegant:
$ geom part list da0 | awk '/Mediasize:/ { print (++flag==1)?$2:"" } /[ ]type:/ { print (flag==1)?$2:"" }'
61872754688
freebsd-ufs
Any ideas on how I might get solution #1 or #2 to work, or perhaps another solution I am overlooking?
Untested:
/Mediasize/ { print $2 }
/ type:/    { print $2 }    # leading space so it does not also match the rawtype: line
/Consumers/ { exit }
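Saved to a file (the name providers.awk below is my own), it could be run as
geom part list da0 | awk -f providers.awk
which, against the sample output shown above, should print 61872754688 and freebsd-ufs.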
You could use a flag, for example:
awk '/Providers/ {f=1; next} f && /Mediasize/{print $2; f=0}'
This can be read as: after matching Providers, find Mediasize and print its second field.
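To also pick up the type: line while ignoring the Consumers section, a hedged extension of the same flag idea (my own, not taken from the answer above) could be:
geom part list da0 | awk '/Providers:/ { f=1; next } /Consumers:/ { f=0 } f && (/Mediasize:/ || /[ ]type:/) { print $2 }'
Here f is only set inside the Providers section, and the [ ]type: pattern (like the one in the question) avoids matching rawtype:.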
For those interested in the final outcome, I was able to put user448810's answer to work and get the output I wanted.
The command:
geom part list mmcsd0 | awk 'BEGIN { printf "{" } /Name/ { printf "%s\n \"%s\": { ", (++count==1)?"":",", $3 } /Mediasize/ { printf "\"size\": %s, ", $2 } / type:/ { printf "\"type\": \"%s\" }", $2 } /Consumers/ { exit } END { printf "\n}\n" }'
The output:
{
"mmcsd0s1": { "size": 52383744, "type": "fat32lba" },
"mmcsd0s2": { "size": 31052026368, "type": "freebsd" }
}
How beautiful!

AWK Column not printing at all when others do?

I'm having a very weird issue while working with two files.
My awk script:
It's meant to match on the first fields of both files where the rows are equal, then run other conditionals on the remaining fields and check whether they match. This seems to work fine for all the other fields; however, the second field, $2, of the first file fails to be populated.
#!/bin/awk -f
BEGIN {
FS=OFS=","
total = 0;
}
FNR==NR{
reg[$1] = $1;
reg_s[$2] = $2;
account[$3] = $3;
site_name[$4] = $4;
next;
}
{
if ($1 in reg)
if ( (($2 != "Yes") && (reg_s[$2] == "3")) || (($2 == "Yes") && (reg_s[$2] != "3")) ) {
print "Status Error";
total++;
}
}
END {
print " - DONE - " total" Errors"
}
Where am I going wrong?
file1:
abcd,3,Paper,go
abcde,3,stapler,staples
abb,0,pencil,sharpener
file2:
abcd,Yes,Paper,go
abcde,Yes,stapler,staples
abb,No,pencil,sharpener
to run it:
awk -f myscript.awk file1 file2
Here is something you can use...
$ join -t, <(sort file1) <(sort file2) |
awk -F, '($2==3) != ($5=="Yes"){count++} END{print count+0}'
join the files by the key (they need to be sorted first), then count the records where the two status fields disagree. Note that !a && b || a && !b is the definition of xor and can be written simply as a!=b, as I did above.
This should print zero. (count+0 is to initialize the value as a numeric in case it never satisfies the condition)
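A quick way to convince yourself of that xor identity, with a standalone awk truth table (my own illustration):
$ awk 'BEGIN { for (a=0; a<=1; a++) for (b=0; b<=1; b++) print a, b, ((!a && b) || (a && !b)), (a != b) }'
0 0 0 0
0 1 1 1
1 0 1 1
1 1 0 0
The last two columns always agree.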
Run the script with the following debug modifications. It debugs the first part, when you populate the arrays:
#!/bin/awk -f
BEGIN {
FS=OFS=","
total = 0;
}
FNR==NR{
reg[$1] = $1;
reg_s[$2] = $2;
account[$3] = $3;
site_name[$4] = $4;
next;
}
{
print "----------reg----------------"
for (key in reg) { print key " : " reg[key] }
print "----------reg_s--------------"
for (key in reg_s) { print key " : " reg_s[key] }
print "----------account------------"
for (key in account) { print key " : " account[key] }
print "-----------site_name---------"
for (key in site_name) { print key " : " site_name[key] }
print "============================"
}
The output is:
----------reg----------------
abcd : abcd
abb : abb
abcde : abcde
----------reg_s--------------
0 : 0
3 : 3
----------account------------
stapler : stapler
Paper : Paper
pencil : pencil
-----------site_name---------
staples : staples
go : go
sharpener : sharpener
============================
As you can see, all arrays have three items except reg_s, and that is because reg_s gets assigned twice with the same key "3". In awk, when an array item is assigned with an existing key, it doesn't create a new array item; instead it replaces the previous value.
That is why the other arrays have three elements: they all have different keys, except this one, reg_s, which was populated using only two different keys, "3" and "0".
Hope this helps. I can edit and elaborate some more if you need.
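A minimal sketch of that overwrite behaviour (the keys and values here are made up):
$ awk 'BEGIN { x["3"]="a"; x["3"]="b"; x["0"]="c"; n=0; for (k in x) n++; print n }'
2
Two assignments to key "3" leave a single element holding the last value, so the array ends up with 2 elements, not 3.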

Convert rows into columns using awk

Not all columns (and data) are present for all records. Hence, whenever fields are missing they should be replaced with nulls.
My Input format:
.set 1000
EMP_NAME="Rob"
EMP_DES="Developer"
EMP_DEP="Sales"
EMP_DOJ="20-10-2010"
EMR_MGR="Jack"
.set 1001
EMP_NAME="Koster"
EMP_DEP="Promotions"
EMP_DOJ="20-10-2011"
.set 1002
EMP_NAME="Boua"
EMP_DES="TA"
EMR_MGR="James"
My desired output Format:
Rob~Developer~Sales~20-10-2010~Jack
Koster~~Promotions~20-10-2011~
Boua~TA~~~James
I tried the below:
awk 'NR>1{printf "%s"(/^\.set/?RS:"~"),a} {a=substr($0,index($0,"=")+1)} END {print a}' $line
This is printing:
Rob~Developer~Sales~20-10-2010~Jack
Koster~Promotions~20-10-2011~
Boua~TA~James~
This awk script produces the desired output:
BEGIN { FS = "[=\"]+"; OFS = "~" }
/\.set/ { ++records; next }
NR > 1 { f[records,$1] = $2 }
END {
    for (i = 1; i <= records; ++i) {
        print f[i,"EMP_NAME"], f[i,"EMP_DES"], f[i,"EMP_DEP"], f[i,"EMP_DOJ"], f[i,"EMR_MGR"]
    }
}
A two-dimensional array is used to store all of the values that are defined for each record.
After the whole file has been processed, the loop goes through each row of the array and prints all of the values. Elements that are undefined evaluate to an empty string.
Specifying the elements explicitly allows you to control the order in which they are printed. Using print rather than printf lets you make correct use of the OFS variable, which has been set to ~, as well as ORS, which is a newline character by default.
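A tiny side-by-side illustration of that last point (the values are made up):
$ awk 'BEGIN { OFS="~"; print "Rob", "", "Sales" }'
Rob~~Sales
$ awk 'BEGIN { OFS="~"; printf "%s%s%s\n", "Rob", "", "Sales" }'
RobSales
With print, the empty element still gets its ~ separators; with printf you would have to insert them yourself.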
Thanks to #Ed for his helpful comments that pointed out some flaws in my original script.
Output:
Rob~Developer~Sales~20-10-2010~Jack
Koster~~Promotions~20-10-2011~
Boua~TA~~~James
$ cat tst.awk
BEGIN{ FS="[=\"]+"; OFS="~" }
/\.set/ { ++numRecs; next }
{ name2val[numRecs,$1] = $2 }
!seen[$1]++ { names[++numNames] = $1 }
END {
    for (recNr=1; recNr<=numRecs; recNr++)
        for (nameNr=1; nameNr<=numNames; nameNr++)
            printf "%s%s", name2val[recNr,names[nameNr]], (nameNr<numNames?OFS:ORS)
}
$ awk -f tst.awk file
Rob~Developer~Sales~20-10-2010~Jack
Koster~~Promotions~20-10-2011~
Boua~TA~~~James
If you want some pre-defined order of fields in your output, rather than creating it on the fly from the rows in each record as they're read, just populate the names[] array explicitly in the BEGIN section. If you have that situation AND don't want to save the whole file in memory:
$ cat tst.awk
BEGIN{
    FS="[=\"]+"; OFS="~";
    numNames=split("EMP_NAME EMP_DES EMP_DEP EMP_DOJ EMR_MGR",names,/ /)
}
function prtName2val(   nameNr, i) {
    if ( length(name2val) ) {
        for (nameNr=1; nameNr<=numNames; nameNr++)
            printf "%s%s", name2val[names[nameNr]], (nameNr<numNames?OFS:ORS)
        delete name2val
    }
}
/\.set/ { prtName2val(); next }
{ name2val[$1] = $2 }
END { prtName2val() }
$ awk -f tst.awk file
Rob~Developer~Sales~20-10-2010~Jack
Koster~~Promotions~20-10-2011~
Boua~TA~~~James
The above uses GNU awk for length(name2val) and delete name2val; if you don't have that, then use for (i in name2val) { do stuff; break } and split("",name2val) instead.
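For reference, a hedged sketch of prtName2val() rewritten with those portable constructs (same intended behaviour):
function prtName2val(   nameNr, i, nonEmpty) {
    nonEmpty = 0
    for (i in name2val) { nonEmpty = 1; break }   # portable "is the array non-empty?" test
    if (nonEmpty) {
        for (nameNr=1; nameNr<=numNames; nameNr++)
            printf "%s%s", name2val[names[nameNr]], (nameNr<numNames?OFS:ORS)
        split("", name2val)                       # portable way to empty the array
    }
}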
This is all I can suggest:
awk '{ t = $0; sub(/^[^"]*"/, "", t); gsub(/"[^"]*"/, "~", t); sub(/".*/, "", t); print t }' file
Or sed:
sed -re 's|^[^"]*"||; s|"[^"]*"|~|g; s|".*||' file
Output:
Rob~Developer~Sales~20-10-2010~Jack~Koster~Promotions~20-10-2011~Boua~TA~James

get the user input in awk

Is there any way to read user input in an awk program?
I am trying to write a script that reads a file containing students' names and IDs.
I have to get the student's name from the user via the keyboard and return all of that student's results using awk.
You can collect user input using the getline function. Make sure to do this in the BEGIN block. Here are the contents of script.awk:
BEGIN {
    printf "Enter the student's name: "
    getline name < "-"
}
$2 == name {
    print
}
Here's an example file with ID's, names, and results:
1 jonathan good
2 jane bad
3 steve evil
4 mike nice
Run like:
awk -f ./script.awk file.txt
Assuming the input file is formatted as:
name<tab>id
pairs and you want to print the line where the name in the file matches the user input, try this:
awk '
BEGIN { FS=OFS="\t"; printf "Enter name: " }
NR == FNR { name = $0; next }
$1 == name
' - file
or with GNU awk you can use nextfile so you don't have to enter control-D after your input:
awk '
BEGIN { FS=OFS="\t"; printf "Enter name: " }
NR == FNR { name = $0; nextfile }
$1 == name
' - file
Post some sample input and expected output if that's not what you're trying to do.
I've tested with this line:
awk 'BEGIN{printf "enter:";getline name<"/dev/tty"} {print $0} END{printf "[%s]", name}' < /etc/passwd
and for me it is a better solution and more readable.
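The same command spread over several lines for readability (behaviour unchanged):
awk '
BEGIN { printf "enter:"; getline name < "/dev/tty" }
{ print $0 }
END { printf "[%s]", name }
' < /etc/passwd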