Awk not printing what is wanted

My attempt:
awk '$4 != "AZ" && max<$6 || NR==1{ max=$6; data=$0 } END{ print data }' USA.txt
I am trying to print the row that does NOT have "AZ" in the 4th column and has the greatest value in the 6th column.
The file has 6 columns:
firstname lastname town/city state-abv. zipcode score
Shellstrop Eleanor Phoenix AZ 85023 -2920765
Shellstrop Donna Tarantula_Springs NV 89047 -5920765
Mendoza Jason Jacksonville FL 32205 -4123794
Mendoza Douglas Jacksonville FL 32209 -3193274
Peleaz Steven Jacksonville FL 32203 -3123794

Based on your attempt, please try the following awk code. For every line whose 4th field is NOT AZ, it compares the previous max with the current $6 and keeps whichever is greater; the END block prints the final value. (Your attempt goes wrong because || NR==1 fires on line 1 whatever its 4th field is. If the file starts directly with the data, line 1 is the AZ row, and its score, -2920765, is larger than every other score, so it is never replaced.)
awk -v max="" '$4!="AZ"{max=(max>$6?max:$6)} END{print max}' Input_file
To print the complete row for the maximum value found:
awk -v max="" '$4!="AZ"{max=(max>$6?max:$6);arr[$6]=$0} END{print arr[max]}' Input_file
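A slightly more defensive variant is sketched below, assuming whitespace-separated input with the score in column 6 and possibly a header line. It seeds max from the first qualifying (non-AZ) row instead of using NR==1, and only considers rows whose 6th field looks like an integer, so a header row is skipped automatically. The function name max_non_az is hypothetical.

```shell
# max_non_az: hypothetical helper; reads rows on stdin and prints the non-AZ row
# with the largest score in column 6 (prints nothing if no row qualifies).
max_non_az() {
  awk '
    $4 != "AZ" && $6 ~ /^-?[0-9]+$/ {    # skip AZ rows and non-numeric scores (e.g. a header)
      if (!seen || $6 + 0 > max) {       # first qualifying row seeds max; later rows must beat it
        max = $6 + 0
        row = $0
        seen = 1
      }
    }
    END { if (seen) print row }
  '
}

# Demo on the sample rows from the question
printf '%s\n' \
  'Shellstrop Eleanor Phoenix AZ 85023 -2920765' \
  'Shellstrop Donna Tarantula_Springs NV 89047 -5920765' \
  'Mendoza Jason Jacksonville FL 32205 -4123794' \
  'Mendoza Douglas Jacksonville FL 32209 -3193274' \
  'Peleaz Steven Jacksonville FL 32203 -3123794' | max_non_az
```

Because max is only ever set from a qualifying row, all-negative scores are handled correctly, unlike || NR==1, which fires on line 1 even when that line is an AZ row or a header.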


Join of two files introduces extraneous newline

Update: I figured out the reason for the extraneous newline. I created file1 and file2 on a Windows machine. Windows adds <cr><newline> to the end of each line. So, for example, the first record in file1 is not this:
Bill <tab> 25 <newline>
Instead, it is this:
Bill <tab> 25 <cr><newline>
So when I set a[Bill] to $2 I am actually setting it to $2<cr>.
I used a hex editor and removed all of the <cr> symbols in file1 and file2. Now the AWK program works as desired.
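DOS line endings can also be detected and removed without a hex editor, using awk itself. A minimal sketch, assuming a POSIX awk that understands \r in regular expressions; count_cr and strip_cr are hypothetical helper names.

```shell
# count_cr: hypothetical helper; prints how many input lines end in a carriage return.
count_cr() {
  awk '/\r$/ { n++ } END { print n + 0 }' "$@"
}

# strip_cr: hypothetical helper; removes one trailing CR from every line and
# writes the cleaned text to stdout.
strip_cr() {
  awk '{ sub(/\r$/, "") } 1' "$@"
}

# Demo: both sample lines end in CR
printf 'Bill\t25\r\nJohn\t24\r\n' | count_cr
```

Running count_cr file1 file2 before the join would have pointed straight at the <cr> problem; strip_cr file1 > file1.unix is one way to clean a file.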
I have seen the SO posts on using AWK to do a natural join of two files. I took one of the solutions and am trying to get it to work. Alas, I have been unsuccessful. I am hoping you can tell me what I am doing wrong.
Note: I appreciate other solutions, but what I really want is to understand why my AWK program doesn't work (i.e., why/how an extraneous newline is being introduced).
I want to do a join of these two files:
file1 (name, tab, age):
Bill 25
John 24
Mary 21
file2 (name, tab, marital-status):
Bill divorced
Glenn married
John married
Mary single
When joined, I expect to see this (name, tab, age, tab, marital-status):
Bill 25 divorced
John 24 married
Mary 21 single
Notice that file2 has a person named Glenn, but file1 doesn't. No record in file1 joins to it.
My AWK program almost produces that result. But, for reasons I don't understand, the marital-status value is on the next line:
Bill 25
divorced
John 24
married
Mary 21
single
Here is my AWK program:
awk 'BEGIN { OFS = "\t" }
NR == FNR { a[$1] = ($1 in a? a[$1] OFS : "")$2; next }
$1 in a { $0 = $0 OFS a[$1]; delete a[$1]; print }' file2 file1 > joined_file1_file2
You may try this awk solution:
awk 'BEGIN {FS=OFS="\t"} {sub(/\r$/, "")}
FNR == NR {m[$1]=$2; next} {print $0, m[$1]}' file2 file1
Bill 25 divorced
John 24 married
Mary 21 single
Here:
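Using sub(/\r$/, "") to remove any DOS line ending (the trailing <cr>) before the fields are used
If $1 doesn't exist in mapping m, then m[$1] is an empty string, so no separate lookup check is needed; every line of file1 is printed, with an empty last field when there is no match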

Not extracting data between two patterns

I have tried this awk command, but for some reason it is not printing the data between the two patterns.
This is my entire script:
for file in `cat out.txt`
do
awk -v ff="$file" 'BEGIN {print "Start Parsing for"ff} /ff-START/{flag=1; next}/ff-END/{flag=0}flag; END{print "End Parsing"ff}' data.txt
done
This is the content of data.txt
JOHN SMITH-START
Device,Number
TV,1
Washing Machine,1
Phones, 5
JOHN SMITH-END
MARY JOE-START
Device,Number
TV,3
Washing Machine,1
Phones, 2
MARY JOE-END
There are about 100 more similar lines following the pattern NAME-START ... NAME-END. For example, JOHN SMITH-START is the first pattern and JOHN SMITH-END is the second, and I want to extract the data between them, which is:
Device,Number
TV,1
Washing Machine,1
Phones, 5
But the output I get is
Start Parsing forJOHN SMITH
End ParsingJOHN SMITH
Content of out.txt is
JOHN SMITH
MARY JOE
With your shown samples, could you please try the following.
awk '/JOHN SMITH-END/ && found{exit} /JOHN SMITH-START/{found=1;next} found' Input_file
Explanation: Adding a detailed explanation of the above.
awk ' ##Starting awk program from here.
/JOHN SMITH-END/ && found{ ##Checking condition if line contains JOHN SMITH-END and found is SET then do following.
exit ##Exiting the program here.
}
/JOHN SMITH-START/{ ##Checking condition if line contains JOHN SMITH-START then do following.
found=1 ##Setting found to 1 here.
next ##next will skip all further statements from here.
}
found ##If found is set then print that line.
' Input_file ##Mentioning Input_file name here.
NOTE: If you want to pass the search strings to awk as variables, try the following:
awk -v start="JOHN SMITH-START" -v end="JOHN SMITH-END" '$0 ~ end && found{exit} $0 ~ start{found=1;next} found' Input_file
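In the question's command, /ff-START/ is a regex literal: it matches the literal text ff-START and never uses the awk variable ff, so the markers have to be built from the variable instead. Also, with the default IFS, for file in `cat out.txt` iterates over words rather than lines, so JOHN SMITH would arrive as two separate values. A sketch addressing both, assuming each line of out.txt holds one name exactly as it appears before -START/-END; extract_block and parse_all are hypothetical names, and exact string comparison is used instead of a regex so names containing regex metacharacters are matched literally.

```shell
# extract_block NAME FILE: hypothetical helper; prints the lines strictly between
# "NAME-START" and the next "NAME-END" in FILE.
extract_block() {
  awk -v name="$1" '
    found && $0 == (name "-END") { exit }   # stop at the matching end marker
    found                                   # print every line inside the block
    $0 == (name "-START") { found = 1 }     # start printing from the next line
  ' "$2"
}

# parse_all NAMES_FILE DATA_FILE: hypothetical helper; reads one name per line so
# names containing spaces stay intact.
parse_all() {
  while IFS= read -r name; do
    printf 'Start Parsing for %s\n' "$name"
    extract_block "$name" "$2"
    printf 'End Parsing %s\n' "$name"
  done < "$1"
}
```

Usage would then be parse_all out.txt data.txt.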

Linux word in column ends with 'a'

I have a text file. I need to print the people whose names end with 'a' and who are older than 30.
I did this:
awk '{if($4>30)print $1,$2}' New.txt
I don't know how to finish.
New.txt:
Name Lastname City Age
John Smith Verona 12
Barney Stinson York 55
Jessica Alba London 33
Could you please try the following, written for the shown samples in GNU awk.
awk '$1~/[aA]$/ && $NF>30{print $1,$2}' Input_file
Explanation: if the 1st field ends with a or A AND the last field is greater than 30, print that line's first and second fields.
You can try:
awk '{if($4>30 && $1 ~ /a$/) print $1, $2}' New.txt
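Both answers exclude the header line only because "Name" does not end in a; "Age" > 30 on its own would be true, since it is a string comparison. A slightly more explicit sketch, assuming every input file starts with one header line and the age is the last column; older_a_names is a hypothetical name.

```shell
# older_a_names: hypothetical helper; prints first and last name of people whose
# first name ends in a or A and whose age (last column) is greater than 30.
older_a_names() {
  # FNR > 1 skips the header of each input file; $NF + 0 forces a numeric comparison
  awk 'FNR > 1 && $1 ~ /[aA]$/ && $NF + 0 > 30 { print $1, $2 }' "$@"
}

# Demo on the sample from the question
printf '%s\n' 'Name Lastname City Age' 'John Smith Verona 12' \
  'Barney Stinson York 55' 'Jessica Alba London 33' | older_a_names
```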

AWK: How do I subtract the final field on those lines that have numbers

I have a file containing:
A......Page 23
by John Smith
B......Page 73
by Jane Doe
C......Page 131
by Alice Grey
And I want to subtract 22 from the numbers, so the first line becomes A......Page 1.
I have searched many places for gsub or other awk options, to no avail. I have done it in the vim editor, but knowing the awk solution would be great.
Short awk approach:
$ awk '$NF ~ /^[0-9]+$/{ $NF = $NF-22 }1' file
A......Page 1
by John Smith
B......Page 51
by Jane Doe
C......Page 109
by Alice Grey
$NF ~ /^[0-9]+$/ - considers only lines whose last field $NF consists solely of digits, since that field is used in an arithmetic operation
Could you please try the following.
awk '{$NF=$NF~/[0-9]/ && $NF>22?$NF-22:$NF} 1' Input_file
I have assumed that you do NOT want negative values in the last column, so the subtraction happens only if the last field $NF is greater than 22. The second assumption (which is obvious) is that you only want to subtract from numbers, so there is also a check that the last field contains digits.
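If the offset might change, it can be passed in with -v instead of being hard-coded. A sketch combining the first answer's all-digits test with a guard against negative results; the function name shift_pages and the awk variable off are hypothetical. Note that assigning to $NF makes awk rebuild the whole line with OFS (a single space by default), so runs of spaces in modified lines collapse to one; unmodified lines are printed unchanged.

```shell
# shift_pages OFFSET: hypothetical helper; reads stdin and subtracts OFFSET from a
# purely numeric last field, leaving it unchanged if the result would be negative.
shift_pages() {
  awk -v off="$1" '$NF ~ /^[0-9]+$/ && $NF - off >= 0 { $NF = $NF - off } 1'
}

# Demo on the sample from the question
printf '%s\n' 'A......Page 23' 'by John Smith' 'B......Page 73' \
  'by Jane Doe' 'C......Page 131' 'by Alice Grey' | shift_pages 22
```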

Understanding two file processing in awk

I am trying to understand how two-file processing works, so I created an example.
file1.txt
zzz pq Fruit Apple 10
zzz rs Fruit Car 50
zzz tu Study Book 60
file2.txt
aa bb Book 100
cc dd Car 200
hj kl XYZ 500
ee ff Apple 300
ff gh ABC 400
I want to compare the 4th column of file1 with the 3rd column of file2. If they match, print the 3rd, 4th and 5th columns of file1, followed by the 4th column of file2 and the sum of the 5th column of file1 and the 4th column of file2.
Expected Output:
Fruit Apple 10 300 310
Fruit Car 50 200 250
Study Book 60 100 160
Here is what I have tried:
awk ' FNR==NR{ a[$4]=$5;next} ( $3 in a){ print $3, a[$4],$4}' file1.txt file2.txt
Code output:
Book 100
Car 200
Apple 300
I am having trouble printing the file1 columns, and with how to store the other columns of file1 in array a. Please guide me.
Could you please try the following.
awk 'FNR==NR{a[$4]=$3 OFS $4 OFS $5;b[$4]=$NF;next} ($3 in a){print a[$3],$NF,b[$3]+$NF}' file1.txt file2.txt
Output will be as follows.
Study Book 60 100 160
Fruit Car 50 200 250
Fruit Apple 10 300 310
Explanation: Adding an explanation of the above code.
awk ' ##Starting awk program here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when first Input_file named file1.txt is being read.
a[$4]=$3 OFS $4 OFS $5 ##Creating an array named a whose index is $4 and whose value is the 3rd, 4th and 5th fields joined by OFS (a space by default in awk).
b[$4]=$NF ##Creating an array named b whose index is $4 and whose value is $NF (last field of current line).
next ##next skips all further statements for the current line.
}
($3 in a){ ##Checking if 3rd field of current line(from file2.txt) is present in array a then do following.
print a[$3],$NF,b[$3]+$NF ##Printing array a whose index is $3, last column value of current line and then SUM of array b with index $3 and last column value here.
}
' file1.txt file2.txt ##Mentioning Input_file names file1.txt and file2.txt
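In the attempt, a[$4] is looked up while reading file2.txt, where $4 is the number (100, 200, ...), which is never a key of a, so nothing is printed for it; the lookup has to use $3. Also, the answer's output follows the line order of file2.txt. If the order of file1.txt (as in the expected output) matters, one option, sketched here as a variant rather than the answer's method, is to read file2.txt first and keep only its 4th column. The function name join_sum is hypothetical.

```shell
# join_sum FILE1 FILE2: hypothetical helper; for each line of FILE1 whose 4th column
# matches a 3rd column in FILE2, prints FILE1 columns 3-5, FILE2 column 4, and their sum.
join_sum() {
  awk 'FNR == NR { v[$3] = $4; next }                  # first file read is FILE2
       ($4 in v) { print $3, $4, $5, v[$4], $5 + v[$4] }' "$2" "$1"
}
```

Usage would be join_sum file1.txt file2.txt; output lines then appear in file1.txt order.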