Use awk to print selected rows - awk

I have a text file and I want to print only selected rows of it. Below is a dummy version of the text file:
Name Sub Marks percentage
A AB 50 50
Name Sub Marks percentage
b AB 50 50
Name Sub Marks percentage
c AB 50 50
Name Sub Marks percentage
d AB 50 50
I need the output as shown below (no heading before every record, and only 3 columns, omitting "Marks"):
Name Sub percentage
A AB 50
b AB 50
c AB 50
d AB 50
Please suggest an awk command with which I can achieve this. Thanks for your support.

You can use:
awk '(NR == 1) || ((NR % 2) == 0) {print $1" "$2" "$4}' inputFile
This will print columns one, two and four but only if the record number is one or even. The results are:
Name Sub percentage
A AB 50
b AB 50
c AB 50
d AB 50
If you want it nicely formatted, you can use printf instead:
awk '(NR == 1) || ((NR % 2) == 0) {printf "%-10s %-10s %10s\n", $1, $2, $4}' inputFile
Name       Sub        percentage
A          AB                 50
b          AB                 50
c          AB                 50
d          AB                 50

awk solution:
awk 'NR==1 || !(NR%2){ print $1,$2,$4 }' OFS='\t' file
NR==1 || !(NR%2) - consider only the 1st line and every even-numbered line
OFS='\t' - output field separator (a tab)
The output:
Name Sub percentage
A AB 50
b AB 50
c AB 50
d AB 50

If the input file has a slightly different format, the above solutions will fail. For example:
Name Sub Marks percentage
A AB 50 50
Name Sub Marks percentage
b AB 50 50
c AB 50 50
Name Sub Marks percentage
d AB 50 50
In such a case, something like this will work:
$ awk '$0!=h;NR==1{h=$0}' file1
Name Sub Marks percentage
A AB 50 50
b AB 50 50
c AB 50 50
d AB 50 50
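Note that this keeps all four columns. To also drop the Marks column, as in the original question, the two ideas can be combined; a minimal sketch, assuming the real header is always the very first line of the file:
awk '$0 != h { print $1, $2, $4 }   # print columns 1, 2 and 4 for every line that is not a repeated header
     NR == 1 { h = $0 }             # remember the header text from line 1
' file
This again produces the three-column output shown earlier, even when the header does not repeat before every record.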

Related

AWK: print all rows with the max value in one field, per the other field, including identical rows with the max value

I am trying to keep the rows with the highest value in column 2 per column 1, including identical rows with the max value, as in the desired output below.
Data is
a 55
a 66
a 130
b 88
b 99
b 99
c 110
c 130
c 130
Desired output is
a 130
b 99
b 99
c 130
c 130
I could find great answers on this site, but not exactly for the current question.
awk '{ max=(max>$2?max:$2); arr[$2]=(arr[$2]?arr[$2] ORS:"")$0 } END{ print arr[max] }' file
yields output that includes the identical rows, but the max value is taken from all rows, not per column 1.
a 130
c 130
c 130
awk '$2>max[$1] {max[$1]=$2 ; row[$1]=$0} END{for (i in row) print row[i]}' file
The output includes the max value per column 1 but does NOT include identical rows with the max value.
a 130
b 99
c 130
Would you please help me trim the data in the desired way? All the code above was obtained from questions and answers on this site. I appreciate that! Many thanks in advance!
I've used this approach in the past:
awk 'NR==FNR{if($2 > max[$1]){max[$1]=$2}; next} max[$1] == $2' test.txt test.txt
a 130
b 99
b 99
c 130
c 130
This requires you to pass in the same file twice (i.e. awk '...' test.txt test.txt), so it's not ideal, but hopefully it provides the required output with your actual data.
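If reading the file twice is a problem, a single-pass alternative (not from the original answer) could buffer the lines in memory and filter them at the end; a sketch, assuming the whole file fits in memory:
awk '{ lines[NR] = $0; key[NR] = $1; val[NR] = $2            # buffer every line
       if (!($1 in max) || $2+0 > max[$1]+0) max[$1] = $2    # track the max per column 1
     }
 END { for (i = 1; i <= NR; i++)                             # replay the buffer
         if (val[i] == max[key[i]]) print lines[i]
 }' test.txt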
Using any awk:
awk '
{ cnt[$1,$2]++; if (!($1 in max) || $2+0 > max[$1]+0) max[$1]=$2 }   # count each (key, value) pair and track the max per key
END { for (key in max) { val=max[key]; for (i=1; i<=cnt[key,val]; i++) print key, val } }
' file
a 130
b 99
b 99
c 130
c 130
Here is a Ruby one-liner to do that:
ruby -e '
grps=$<.read.split(/\R/).
group_by{|line| line[/^\S+/]}
# {"a"=>["a 55", "a 66", "a 130"], "b"=>["b 88", "b 99", "b 99"], "c"=>["c 110", "c 130", "c 130"]}
maxes=grps.map{|k,v| v.max_by{|s| s.split[-1].to_f}}
# ["a 130", "b 99", "c 130"]
grps.values.flatten.each{|s| puts s if maxes.include?(s)}
' file
Prints:
a 130
b 99
b 99
c 130
c 130
Another way using awk. The second loop should be light; it just repeats the duplicated max values.
% awk 'arr[$1] < $2{arr[$1] = $2; # get max value
co[$1]++; if(co[$1] == 1){x++; id[x] = $1}} # count unique ids
arr[$1] == $2{n[$1,arr[$1]]++} # count repeated max
END{for(i=1; i<=x; i++){
for(j=1; j<=n[id[i],arr[id[i]]]; j++){print id[i], arr[id[i]]}}}' file
a 130
b 99
b 99
c 130
c 130
or, if order doesn't matter
% awk 'arr[$1] < $2{arr[$1] = $2}
arr[$1] == $2{n[$1,arr[$1]]++}
END{for(i in arr){
j=0; do{print i, arr[i]; j++} while(j < n[i,arr[i]])}}' file
c 130
c 130
b 99
b 99
a 130
-- EDIT --
Printing data in additional columns
% awk 'arr[$1] < $2{arr[$1] = $2}
arr[$1] == $2{n[$1,arr[$1]]++; line[$1,arr[$1],n[$1,arr[$1]]] = $0}
END{for(i in arr){
j=0; do{j++; print line[i,arr[i],j]} while(j < n[i,arr[i]])}}' file
c 130 data8
c 130 data9
b 99 data5
b 99 data6
a 130 data3
Data
% cat file
a 55 data1
a 66 data2
a 130 data3
b 88 data4
b 99 data5
b 99 data6
c 110 data7
c 130 data8
c 130 data9

Count the number of occurrences of values larger than x in every row

I have a file with multiple rows and 26 columns. I want to count the number of occurrences of values that are higher than 0 (I guess "different from 0" is also valid) in each row, excluding the first two columns. The file looks like this:
X Y Sample1 Sample2 Sample3 .... Sample24
a a1 0 7 0 0
b a2 2 8 0 0
c a3 0 3 15 3
d d3 0 0 0 0
I would like to have an output file like this:
X Y Result
a a1 1
b a2 2
c a3 3
d d3 0
awk or sed would be good.
I saw a similar question but in that case the columns were summed and the desired output was different.
awk 'NR==1{ printf "X\tY\tResult%s", ORS }        # print the header
     NR>1{
       count=0                                     # reset the count for each row
       for(i=3; i<=NF; i++){                       # iterate from field 3 to the last field (NF)
         if($i>0){                                 # $i expands to $3, $4, and so on
           count++                                 # increment when the value is positive
         }
       }
       printf "%s\t%s\t%s%s", $1, $2, count, ORS   # print X, Y and the count for each row
     }' file
should do that
another awk
$ awk '{if(NR==1) c="Result";
else for(i=3;i<=NF;i++) c+=($i>0);
print $1,$2,c; c=0}' file | column -t
X Y Result
a a1 1
b a2 2
c a3 3
d d3 0
$ awk '{print $1, $2, (NR>1 ? gsub(/ [1-9]/,"") : "Result")}' file
X Y Result
a a1 1
b a2 2
c a3 3
d d3 0

Average of specific rows in a file

I have 6 rows in a file. I need to find the average of only specific rows; the others should be left as they are. The average should be calculated for A1 and A2, and for B1 and B2; the other lines should stay as they are.
Input:
A1 1 1 2
A2 5 6 1
A3 1 1 1
B1 10 12 12
B2 10 12 10
B3 100 200 300
Output:
A1A2 3 3.5 1.5
A3 1 1 1
B1B2 10 12 11
B3 100 200 300
EDIT: There are n columns in total
awk to the rescue!
$ awk '/[AB][12]/{a=substr($1,1,1);
k=a"1"a"2";
c1[k]+=$2; c2[k]+=$3; c3[k]+=$4; n[k]++; next}
1;
END{for(k in c1)
print k, c1[k]/n[k], c2[k]/n[k], c3[k]/n[k]}' file | sort | column -t
A1A2 3 3.5 1.5
A3 1 1 1
B1B2 10 12 11
B3 100 200 300
Pattern-match the grouped rows, create a key, and accumulate the sum of each field and the count of rows per key; print unmatched rows as they are; when done, print the averaged rows. Since order is not preserved, sort and pipe to column for easy formatting.
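Since the edit says there are n columns in total, the same idea can loop over all fields instead of hardcoding three sum arrays; a sketch under that assumption:
awk '/[AB][12]/{ a=substr($1,1,1); k=a "1" a "2"; n[k]++   # build the A1A2 / B1B2 key
                 for(i=2;i<=NF;i++) s[k,i]+=$i             # sum every value column
                 if(NF>maxnf) maxnf=NF
                 next }
     1                                                     # print unmatched rows unchanged
     END{ for(k in n){ line=k
            for(i=2;i<=maxnf;i++) line=line OFS s[k,i]/n[k]
            print line } }' file | sort | column -t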
$ cat tst.awk
$1 ~ /^[AB]1$/ { for (i=2;i<=NF;i++) val[$1,i]=$i; next }
$1 ~ /^[AB]2$/ { p=$1; sub(2,1,p); $1=p $1; for (i=2;i<=NF;i++) $i=($i + val[p,i])/2 }
{ print }
$ awk -f tst.awk file | column -t
A1A2 3 3.5 1.5
A3 1 1 1
B1B2 10 12 11
B3 100 200 300

Processing 2 files with different field separators using awk

Let's say I have 2 files:
$ cat file1
A:10
B:5
C:12
$ cat file2
100 A
50 B
42 C
I'd like to have something like:
A 10 100
B 5 50
C 12 42
I tried this:
awk 'BEGIN{FS=":"}NR==FNR{a[$1]=$2;next}{FS=" ";print $2,a[$2],$1}' file1 file2
This outputs:
100 A
B 5 50
C 12 42
I guess the problem comes from the field separator, which is set too late for the second file. How can I set a different field separator for each file (rather than a single one for all files)?
Thanks
Edit: a more general case
With file2 and file3 like this:
$ cat file3
A:10 foo
B:5 bar
C:12 baz
How to get:
A 10 foo 100
B 5 bar 50
C 12 baz 42
Just set FS between files:
awk '...' FS=":" file1 FS=" " file2
i.e.:
$ awk 'NR==FNR{a[$1]=$2;next}{print $2,a[$2],$1}' FS=":" file1 FS=" " file2
A 10 100
B 5 50
C 12 42
You need to get awk to re-split $0 after you change FS.
You can do that with $0=$0 (for example).
So {FS=" ";$0=$0;...} in your final block will do what you want.
Though doing that only the first time FS actually changes, rather than on every record, will likely perform slightly better for large files.
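Applied to the command from the question, that could look like this (a sketch; the re-split only really matters for the first record of file2, since later records are already read with the new FS):
awk 'BEGIN{FS=":"}
     NR==FNR{a[$1]=$2; next}
     {FS=" "; $0=$0; print $2, a[$2], $1}   # re-split the current record with the new FS
    ' file1 file2
A 10 100
B 5 50
C 12 42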
You can try something like:
$ cat f1
A:10
B:5
C:12
$ cat f2
100 A
50 B
42 C
$ awk 'NR==FNR{split($0,tmp,/:/);a[tmp[1]]=tmp[2];next}$2 in a{print $2,a[$2],$1}' f1 f2
A 10 100
B 5 50
C 12 42
Or set multiple field separators:
$ awk -F"[: ]" 'NR==FNR{a[$1]=$2;next}$2 in a{print $2,a[$2],$1}' f1 f2
A 10 100
B 5 50
C 12 42
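For the more general case in the edit (file3 with an extra column), one option, sketched below, is to read file2 first and split the first field of file3 by hand:
$ awk 'NR==FNR{a[$2]=$1; next}               # file2: map letter -> number
       {split($1,t,":"); $1=t[1] " " t[2]    # file3: turn "A:10" into "A 10"
        print $0, a[t[1]]}' file2 file3
A 10 foo 100
B 5 bar 50
C 12 baz 42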

Extract columns from multiple text files with awk

I am trying to extract column 1 based on the values of column 2. I would like to print the values of column 1 only if column 2 is ≤ 30 and greater than 5.
I also need to print the total count of each column 1 value in that output. How can I do this with awk across multiple text files?
A sample of the text file is shown below.
col1 col2
aa 25
bb 4
cc 6
dd 23
aa 30
The output would be
aa
cc
dd
aa
Total number of aa is 2
Total number of cc is 1
Total number of dd is 1
Something like this to get you started:
{ if ($2 <= 30 && $2 > 5) {   # keep rows where column 2 is greater than 5 and at most 30
      print $1
      tot[$1] += 1            # count how often each column-1 value is kept
  }
}
END {
    for (i in tot) {
        print "Total number of", i, "is", tot[i]
    }
}
Output:
$ awk -f i.awk input
aa
cc
dd
aa
Total number of aa is 2
Total number of cc is 1
Total number of dd is 1
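The question also mentions multiple text files: awk accepts several filenames, and the END block above then reports totals combined across all of them. If per-file totals are wanted instead, a sketch using GNU awk's ENDFILE block (a gawk extension; the input file names below are just examples) could look like this:
$2 <= 30 && $2 > 5 { print $1; tot[$1]++ }
ENDFILE {                                     # runs after each input file (gawk only)
    for (i in tot)
        print "Total number of", i, "is", tot[i], "in", FILENAME
    delete tot                                # reset the counts for the next file
}
Run it as, for example, gawk -f i.awk input1 input2.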