Awk - Conditionally print an element from a certain row, based on the condition of a different element in a different row

Say I have a lot of files with a consistent number of columns and rows, and a sample one looks like this:
1 2 3
4 5 6
7 8 9
I want to print column 3 of row 2, but only if column 3 of row 3 == 4 (in this case it is 9). I'm using this logic as a means to determine whether the file is valid for my use case, and to extract the relevant field if it is.
My attempt, based on other answers to people asking how to isolate certain rows, was this: awk 'BEGIN{FNR=3} $3=="4"{FNR=2;print $2}'

So you are looking for something like this?
awk 'FNR==2{ x = $3 }FNR==3 && $3=="4"{ print x }' file.txt
cat file.txt
1 2 3
4 5 6
7 8 4
Output:
6
cat file.txt
1 2 3
4 5 6
7 8 9
Output:
Nothing, since column 3 of row 3 is 9, not 4

A shorter variant: p is updated with $3 at the end of every line, so when row 3 is tested, p still holds column 3 of row 2:
awk 'FNR==3 && $3==4{print p} {p=$3}' *

Here's another which doesn't care about the order in which the records appear. In the OP the problem was to print a value (v) from the 2nd record based on the tested value (t) on the 3rd record. This solution also allows the test value to appear in an earlier record than the value to be printed:
$ awk '
FNR==2 {          # record holding the value to print
    v=$3
    f=1           # flag: the value v has been read
}
FNR==3 {          # record holding the value to test
    t=$3
    g=1           # flag: the test value has been read
}
f && g {          # once both values are acquired
    if(t==4)      # run the test
        print v   # output
    exit          # and exit
}' file
6
Record order reversed (FNR values changed in the code):
$ cat file2
1 2 3
7 8 4 # records
4 5 6 # reversed
$ awk 'FNR==3{v=$3;f=1}FNR==2{t=$3;g=1}f&&g{if(t==4)print v;exit}' file2
6
Separate flags f and g are used instead of testing v and t themselves, in case either value happens to be empty ("").

How do I print starting from a certain row of output with awk?

I have millions of records in my file. What I need to do is print columns 1396 to 1400 for a specific number of rows, and ideally get the result into Excel or Notepad.
I tried this command:
awk '{print $1396,$1397,$1398,$1399,$1400}' file_name
But this is running for each row.
You need a condition to specify which rows to apply the action to:
awk '<<condition goes here>> {print $1396,$1397,$1398,$1399,$1400}' file_name
For example, to do this only for rows 50 to 100:
awk 'NR >= 50 && NR <= 100 {print $1396,$1397,$1398,$1399,$1400}' file_name
(Depending on what you want to do, you can also have much more complicated selection patterns than this.)
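For instance, a hypothetical combined condition that restricts the rows and also filters on a field value:
awk 'NR >= 50 && NR <= 100 && $1 != 0 {print $1396,$1397,$1398,$1399,$1400}' file_name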
Here's a simpler example for testing:
awk 'NR >= 3 && NR <= 5 {print $2, $3}'
If I run this on an input file containing
1 2 3 4
2 3 4 5
3 a b 6
4 c d 7
5 e f 8
6 7 8 9
I get the output
a b
c d
e f

Find the ratio among columns

I have some input files of the following format:
File1.txt File2.txt File3.txt
1 2 1 6 1 20
2 3 2 9 2 21
3 7 3 14 3 28
Now I need to output a single new file using AWK with three columns. The first column remains the same; it is identical across the three files (just an ordinal number).
For the 2nd and 3rd columns of this newly created file, I need the values of the 2nd column of the second file divided by the values of the 2nd column of the 1st file, and likewise the values of the 2nd column of the third file divided by the values of the 2nd column of the first file. In other words, the 2nd columns of the 2nd and 3rd files divided by the 2nd column of the first file.
e.g.:
Result.txt
1 3 10
2 3 7
3 2 4
Use a multidimensional array to store the values:
awk 'FNR==NR {a[$1]=$2; next}
     {b[$1,ARGIND]=$2/a[$1]}
     END {for (i in a)
              print i, b[i,2], b[i,3]
     }' f1 f2 f3
Test
$ awk 'FNR==NR {a[$1]=$2; next} {b[$1,ARGIND]=$2/a[$1]} END {for (i in a) print i,b[i,2],b[i,3]}' f1 f2 f3
1 3 10
2 3 7
3 2 4
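Note that ARGIND is a GNU awk extension. On a POSIX awk you can keep the file count yourself; a minimal sketch of the same logic (same f1 f2 f3 inputs assumed):
awk 'FNR==1 {fileno++}
     fileno==1 {a[$1]=$2; next}
     {b[$1,fileno]=$2/a[$1]}
     END {for (i in a) print i, b[i,2], b[i,3]}' f1 f2 f3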

Processing Multiple Files with Awk - Unwanted Lines Are Printing

I'm trying to write a script to process two files. I'm getting stuck on a small detail that I've been unsuccessful in troubleshooting - hoping someone here can help!
I have two text files, the first with a single column and seven rows (all fruits). The second text file has two columns and seventeen rows (first column numbers, second column colors). My script is below - I've eliminated the rest of it, because after some troubleshooting I've found that the problem is here.
This script...:
BEGIN { FS = " " }
NR==FNR
{
print NR "\t" FNR
}
END{}
When invoked with "awk -f script.awk file1.txt file2.txt", it produces this output:
apples
1 1
oranges
2 2
pears
3 3
grapes
4 4
mango
5 5
kiwi
6 6
banana
7 7
8 1
9 2
10 3
11 4
(truncated)
I don't understand what's happening here. The fields of file1 (the fruits) are being printed, but the only print statement in this script is printing the values of NR and FNR, which, from what I understand, are always numbers.
When I comment out the NR==FNR statement,
BEGIN { FS = " " }
#NR==FNR
{
print NR "\t" FNR
}
END{}
The output is as expected:
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 1
9 2
10 3
11 4
(truncated)
I need to use the NR==FNR statement in order to process multiple files.
Does anyone know what's happening here? Seems like such a basic issue (it's only 3 statements!), but I can't get rid of the damn fruits.
NR==FNR by itself is a pattern without an action, and the default action is to print the line (i.e. {print}).
So awk sees NR==FNR as a test that is true for the first file (as you indicated), and when it succeeds it runs the default action.
So your script is effectively:
BEGIN { FS = " " }
NR==FNR {
    print
}
{
    print NR "\t" FNR
}
END{}
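If the goal is to use NR==FNR to do something with the first file without echoing its lines, attach an explicit action to the pattern; any explicit action suppresses the default print. A minimal sketch (the fruits array is just a placeholder for whatever first-file processing you actually need):
BEGIN { FS = " " }
NR==FNR { fruits[FNR] = $1 }   # explicit action: the fruit lines are no longer printed
{ print NR "\t" FNR }
END{}
This prints the number pairs for every line of both files while silently collecting the fruits from the first one.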

how to output data from multiple files in side by side columns in one file via awk?

I have 30 files, called UE1.dat, UE2.dat, ..., with 4 columns in each of them. An example of their column structure is given below for UE1.dat and UE2.dat.
UE1.dat
1 4 2 1
2 2 3 3
3 2 4 4
4 4 4 2
UE2.dat
2 6 8 7
4 4 9 6
7 1 1 2
9 3 3 3
So, i have tried with the following code:
for((i=1;i<=30;i++)); do awk 'NR$i {printf $1",";next} 1; END {print ""}' UE$i.dat; done > UE_all.dat
to get only the first column from every file and write them side by side in a single file. The desired output is given below.
1 2
2 4
3 7
4 9
But unfortunately the code arranges them in rows; can you give a hint?
Thank you in advance!
In awk you can do it this way:
1) Put this code in a file named output_data_from_multiple_files.awk:
BEGIN {
    # All the input files are processed in one run.
    # filenumber counts the number of input files seen so far.
    filenumber = 0
}
{
    # FNR is the input record number in the current input file.
    # Concatenate the value of the first column to the corresponding
    # line of the output.
    output[FNR] = output[FNR] " " $1
    # FNR == 1 means we are starting a new file.
    if (FNR == 1) {
        ++filenumber
    }
}
END {
    # print the output
    for (i = 1; i <= FNR; i++)
        printf("%s\n", output[i])
}
2) Run awk -f output_data_from_multiple_files.awk UE*
All the files are handled in a single execution of awk. FNR is the input record number in the current input file. filenumber is used to count the number of processed files. The values read in the input files are concatenated in the output array.
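Since filenumber is only bookkeeping (the output array does all the work), the same idea also fits in a one-liner; a compact sketch with the same leading-space behavior:
awk '{output[FNR] = output[FNR] " " $1} END {for (i = 1; i <= FNR; i++) print output[i]}' UE*.dat
In END, FNR still holds the record count of the last file, which is fine here because all the files have the same number of rows.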
Concatenate all of the columns into one file with an awk associative array:
# use a wildcard to get all the files (could also use a for-loop)
# add each new row to the array using line number as an index
# at the end of reading all files, go through each index (will be 1-4 in
# your example) and print index, and then the fully concatenated rows
awk '{a[FNR] = a[FNR]" "$0}END{ for (i in a) print i, a[i] | "sort -k1n"}' allfiles*
I'd probably go with something like the following, using perl rather than awk because I prefer its handling of data structures. Here we use a two-dimensional array, insert the first column of each file as a new column of the array, then print the whole thing.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my $num_files = 2;
my @rows;
my $max = 0;

for my $filenum ( 1 .. $num_files ) {
    open( my $input, "<", "UE${filenum}.dat" ) or die $!;
    my $count = 0;
    while (<$input>) {
        my @fields = split;
        # first field of each line becomes a new row of this file's column
        push( @{ $rows[$filenum] }, $fields[0] );
        $count++;
    }
    close($input);
    if ( $count > $max ) { $max = $count }
}

print Dumper \@rows;    # debug: show the collected columns

for ( 1 .. $max ) {
    foreach my $filenum ( 1 .. $num_files ) {
        print shift( @{ $rows[$filenum] } ) || '', " ";
    }
    print "\n";
}
My solution is this
gawk 'BEGINFILE{f++}{print FNR,f,$1}' UE* | sort -nk 1,2 | cut -d" " -f3 | xargs -L $(ls UE*.dat | wc -l)
This is how I got to it... I number the lines and files using gawk, then sort them first by line number and then by file using sort, and finally remove the file and line numbers with cut. So...
gawk 'BEGINFILE{f++}{print FNR,f,$1}' UE*
1 1 1 # line 1 file 1 is 1
2 1 2 # line 2 file 1 is 2
3 1 3 # line 3 file 1 is 3
4 1 4 # line 4 file 1 is 4
1 2 2 # line 1 file 2 is 2
2 2 4 # line 2 file 2 is 4
3 2 7 # line 3 file 2 is 7
4 2 9 # line 4 file 2 is 9
Then use sort like this to put the first line of file 1 followed by the first line of file 2, first line of file n, second line of file 1, second line of file 2, second line of file n. Then get the third column:
gawk 'BEGINFILE{f++}{print FNR,f,$1}' UE* | sort -nk 1,2 | cut -d" " -f3
1
2
2
4
3
7
4
9
And then put them back together with xargs
gawk 'BEGINFILE{f++}{print FNR,f,$1}' UE* | sort -nk 1,2 | cut -d" " -f3 | xargs -L2
1 2
2 4
3 7
4 9
The -L2 at the end must match the number of files, i.e. -L30 in your case.
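For a small, fixed set of files, paste with process substitution is another way to get the same side-by-side layout (bash, shown for the two sample files):
paste -d" " <(awk '{print $1}' UE1.dat) <(awk '{print $1}' UE2.dat)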

How to Add Column with Percentage

I would like to calculate the percentage that each line's value represents out of the total of all lines, and add it as another column.
Input (delimiter is \t):
1 10
2 10
3 20
4 40
Desired output with added third column showing calculated percentage based on values in second column:
1 10 12.50
2 10 12.50
3 20 25.00
4 40 50.00
I have tried to do it myself, but after calculating the total for all lines I didn't know how to preserve the rest of each line unchanged. Thanks a lot for help!
Here you go, a two-pass awk solution (the file is read twice) -
awk 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' file file
[jaypal:~/Temp] cat file
1 10
2 10
3 20
4 40
[jaypal:~/Temp] awk 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' file file
1 10 12.5
2 10 12.5
3 20 25
4 40 50
Update: If a tab is required in the output, then just set the OFS variable to "\t".
[jaypal:~/Temp] awk -v OFS="\t" 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' file file
1 10 12.5
2 10 12.5
3 20 25
4 40 50
Breakout of pattern {action} statements:
The first pattern is NR==FNR. FNR is awk's built-in variable that keeps track of the number of records (by default separated by a newline) in the current file, so FNR in our case runs up to 4. NR is similar to FNR, but it does not get reset between files; it continues to grow, so NR in our case ends at 8.
This pattern is true only for the first 4 records, and that's exactly what we want. While reading those 4 records, we accumulate the column 2 total in a variable a. Notice that we did not initialize it; in awk we don't have to. However, this would break if the entire column 2 is 0, so you can handle it by putting an if statement in the second action statement, i.e. do the division only if a > 0, else say "division by 0" or something (see the sketch below).
next is needed because we don't want the second pattern {action} statement to execute during the first pass. next tells awk to skip further actions and move to the next record.
Once the four records are parsed, the second pattern {action} takes over, and it is pretty straightforward: compute the percentage and print columns 1 and 2 with the percentage next to them.
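Here is a sketch of that a > 0 guard applied to the same two-pass one-liner (the "div by 0" message is just an example placeholder):
awk 'NR==FNR {a += $2; next} {print $1, $2, (a > 0 ? ($2/a)*100 : "div by 0")}' file file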
Note: As @lhf mentioned in the comment, this one-liner will only work as long as you have the data set in a file. It won't work if you pass data through a pipe.
In the comments there is a discussion on ways to make this awk one-liner take input from a pipe instead of a file. Well, the only way I could think of was to store the column values in an array and then use a for loop to spit each value out along with its percentage.
Now arrays in awk are associative and are never in order, i.e. pulling the values out of the array will not necessarily happen in the same order as they went in. If that is OK, then the following one-liner should work.
[jaypal:~/Temp] cat file
1 10
2 10
3 20
4 40
[jaypal:~/Temp] cat file | awk '{b[$1]=$2;sum=sum+$2} END{for (i in b) print i,b[i],(b[i]/sum)*100}'
2 10 12.5
3 20 25
4 40 50
1 10 12.5
To get them in order, you can pipe the result to sort.
[jaypal:~/Temp] cat file | awk '{b[$1]=$2;sum=sum+$2} END{for (i in b) print i,b[i],(b[i]/sum)*100}' | sort -n
1 10 12.5
2 10 12.5
3 20 25
4 40 50
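With GNU awk (4.0 or later) you can skip the external sort and make the for (i in b) loop itself traverse the indices in numeric order via PROCINFO["sorted_in"]:
gawk '{b[$1]=$2;sum=sum+$2} END{PROCINFO["sorted_in"]="@ind_num_asc"; for (i in b) print i,b[i],(b[i]/sum)*100}' file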
You can do it in a couple of passes
#!/bin/bash
total=$(awk '{total=total+$2}END{print total}' file)
awk -v total=$total '{ printf ("%s\t%s\t%.2f\n", $1, $2, ($2/total)*100)}' file
To print a literal % sign with printf, you need to escape it as %%. For instance:
printf("%s\t%s\t%s%%\n", $1, $2, $3)
Perhaps there is a better way, but I would pass the file twice.
Content of 'infile':
1 10
2 10
3 20
4 40
Content of 'script.awk':
BEGIN {
    ## Tab as field separator.
    FS = "\t";
}

## First pass of input file. Get total from second field.
ARGIND == 1 {
    total += $2;
    next;
}

## Second pass of input file. Print each original line and percentage as third field.
{
    printf( "%s\t%2.2f\n", $0, $2 * 100 / total );
}
Running the script on my Linux box:
gawk -f script.awk infile infile
And the result:
1 10 12.50
2 10 12.50
3 20 25.00
4 40 50.00