awk NR returns the wrong total number of lines - awk

When awk's NR was used to get the total number of lines of a file, the wrong number was returned. Could you help find out what happened?
File 'test.txt' contents :
> 2012 09 10 30.0 8 14
fdafadf
> 2013 08 11 05.0 9 1.5
fdafa
> 2011 01 12 02.0 7 1.2
daff
The goal was to get the average of the last column of the records that begin with '>'.
Code:
awk 'BEGIN{SUM=0}/^> /{SUM=SUM+$6}END{print SUM/NR}' test.txt
With this code, the wrong mean of the last column was obtained instead of the right number 3. How can I get the right result with awk? Thanks

Could you please try the following. This takes the sum of the last column of each matching line, and keeps doing so until the Input_file has been fully read. It also counts the number of occurrences of > lines, because an average means the sum divided by the count (here, the count of lines); in the END block of awk we divide them to get the average as needed.
awk 'BEGIN{sum=0;count=0}/^>/{sum+=$NF;count++} END{print "avg="sum/count}' Input_file
If you want to take the average of the 6th column, then use $6 instead of $NF in the above code.
Explanation: Adding following only for explanation purposes.
awk ' ##Starting awk command/script here.
/^>/{ ##Checking condition: if a line starts with > then do the following.
sum+=$NF ##Adding $NF (the last field of the current line) to the variable named sum.
count++ ##Incrementing the variable named count by 1 each time control reaches here.
}
END{ ##END block of awk code here.
print "avg="sum/count ##Printing the string avg= followed by the result of sum divided by count.
}
' Input_file ##Mentioning Input_file name here.
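As a quick sanity check, the one-liner above can be run against a small invented sample (the file name sample.txt and the last-field values 2, 3 and 4 are made up here so that the expected average is exactly 3):

```shell
# Build a small sample file; the lines starting with '>' have
# last fields 2, 3 and 4, so their average should be 3.
cat > sample.txt <<'EOF'
> 2012 09 10 30.0 8 2
fdafadf
> 2013 08 11 05.0 9 3
fdafa
> 2011 01 12 02.0 7 4
EOF

# Sum the last field of the '>' lines and divide by their count
# (dividing by NR instead would wrongly count every line of the file).
awk '/^>/{sum+=$NF; count++} END{print "avg=" sum/count}' sample.txt
```

Dividing by count (3 matching lines) rather than NR (5 lines total here) is the whole fix.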

Related

awk - Print all lines containing the Max value found in the initial analysis (Containing U+2500 Unicode Character between the lines)

I have an issue that was answered here awk - Print all lines containing the Max value found in the initial analysis, but that now needs to be adapted to work when there is a U+2500 unicode character between the lines.
The problem is as follows below, I have a new entry file as below:
0.0008 6
────────────
9.0 10
────────────
9.0 19
────────────
0.7 33
If I try to find the maximum value using the answers to awk - Print all lines containing the Max value found in the initial analysis, the output always comes out as shown below:
──────
──────
──────
This is not the expected output, but I should get something like:
9.0 10
9.0 19
Note: This issue was created so as not to compromise the choice of the solution marked as "resolved" in the awk - Print all lines containing the Max value found in the initial analysis.
You may use this 2-pass awk as well:
awk '$1+0 != $1 {next} FNR==NR {if (max < $1) max=$1; next} $1 == max' file{,}
9.0 10
9.0 19
We compute max in the first pass, ignoring all lines where $1 is non-numeric, and then in the 2nd pass print all records whose $1 equals the max value.
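A self-contained sketch of the two-pass run (the sample file is recreated inline here; `file{,}` is simply Bash brace expansion that repeats the file name, written out twice below for portability):

```shell
# Recreate the sample with U+2500 separator lines.
cat > data.txt <<'EOF'
0.0008 6
────────────
9.0 10
────────────
9.0 19
────────────
0.7 33
EOF

# Pass 1 (FNR==NR): find the numeric max of $1, skipping non-numeric
# lines via $1+0 != $1. Pass 2: print records whose $1 equals that max.
awk '$1+0 != $1 {next} FNR==NR {if (max < $1) max=$1; next} $1 == max' data.txt data.txt
```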
With your shown samples, please try the following. Written and tested in GNU awk.
awk '
$1+0==$1{
max=(max>$1?max:$1)
arr[$1]=(arr[$1]?arr[$1] ORS:"")$0
}
END{
print arr[max]
}
' Input_file
Explanation: Adding detailed explanation for above solution.
awk ' ##Starting awk program from here.
$1+0==$1{ ##Checking condition: if the 1st field is numeric.
max=(max>$1?max:$1) ##Setting max to the larger of the 1st field and the current max.
arr[$1]=(arr[$1]?arr[$1] ORS:"")$0 ##Appending the current line to the array entry indexed by $1, joining multiple lines with ORS.
}
END{ ##Starting END block of this program from here.
print arr[max] ##Printing array arr value with key of max here.
}
' Input_file ##Mentioning Input_file name here.
NOTE: As per @karafka's suggestion, $1+0==$1 is used so that scientific notation and negative numbers are not missed.
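The single-pass version can be exercised the same way (sample rebuilt inline; the answer above was tested with GNU awk, though nothing here is gawk-specific):

```shell
cat > data.txt <<'EOF'
0.0008 6
────────────
9.0 10
────────────
9.0 19
EOF

# For numeric $1, track the maximum and group whole lines per value;
# END prints only the group stored under the max key.
awk '$1+0==$1{max=(max>$1?max:$1); arr[$1]=(arr[$1]?arr[$1] ORS:"")$0} END{print arr[max]}' data.txt
```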

print the count of the lines from one file and total from another file

I have a directory called Stem; in that stem directory I have two files, result.txt and title.txt, as below:
result.txt:
Column1 Column2 Column3 Column4
----------------------------------------------------------
Setup First Second Third
Setdown Fifth Sixth Seven
setover Eight Nine Ten
Setxover Eleven Twelve Thirteen
Setdrop Fourteen Fifteen sixteen
title.txt:
Column1 Column2 Column3 Column4
----------------------------------------------------------
result 20 40 60
result1 40 80 120
Total: 60 120 180
I need to count the number of lines, excluding the first two, in the first file (result.txt), and from the second file (title.txt) I need the value in Column3 of the Total line. The output should look like below:
Stem : 5 120
I used this script, but I am not getting the exact output.
#!/bin/bash
for d in stem;
do
echo "$d"
File="result.txt"
File1="title.txt"
awk 'END{print NR - 2}' "$d"/"$File"
awk '/Total/{print $(NF-1);exit}' "$d"/"$File1"
done
EDIT: Since OP's question was not clear about exactly which value was needed, the previous answer gave the sum of the 2nd columns; in case OP needs the 2nd-last field of the line containing the Total: keyword, try the following:
awk '
FNR==NR{
tot=FNR
next
}
/Total:/{
sum=$(NF-1)
}
END{
print "Stem : ",tot-2,sum+0
}
' result.txt title.txt
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when result.txt is being read.
tot=FNR ##Creating tot which has value of FNR in it.
next ##next will skip all further statements from here.
}
/Total:/{ ##Checking condition if line contains Total: then do following.
sum=$(NF-1) ##Creating sum which has 2nd last field of current line.
}
END{ ##Starting END block of this program from here.
print "Stem : ",tot-2,sum+0 ##Printing Stem string tot-2 and sum value here.
}
' result.txt title.txt ##Mentioning Input_file names here.
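Putting the two sample files and the command together (the directory name stem is taken from the question's script):

```shell
mkdir -p stem
cat > stem/result.txt <<'EOF'
Column1 Column2 Column3 Column4
----------------------------------------------------------
Setup First Second Third
Setdown Fifth Sixth Seven
setover Eight Nine Ten
Setxover Eleven Twelve Thirteen
Setdrop Fourteen Fifteen sixteen
EOF
cat > stem/title.txt <<'EOF'
Column1 Column2 Column3 Column4
----------------------------------------------------------
result 20 40 60
result1 40 80 120
Total: 60 120 180
EOF

# FNR==NR is true only while the first file is read: remember its line
# count. In the second file, keep the 2nd-last field of the Total: line.
awk 'FNR==NR{tot=FNR; next} /Total:/{sum=$(NF-1)} END{print "Stem : ", tot-2, sum+0}' stem/result.txt stem/title.txt
```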

Sum specific column value until a certain value is reached

I want to print lines until the running sum of the first column reaches a certain value, like:
43 12.00 53888
29 10.00 36507
14 9.00 18365
8 8.00 10244
1 7.00 2079
1 9.50 1633
0 6.00 760
I would like the output to be:
val = 90
43 12.00 53888
29 10.00 36507
14 9.00 18365
Could you please try the following, written and tested with the shown samples. exit is placed explicitly in the condition where the 1st column's running sum exceeds the mentioned value, to avoid unnecessarily reading the rest of the Input_file.
awk -v val="90" '($1+prev)>val{exit} ($1+prev)<=val{print}{prev+=$1}' Input_file
OR
awk -v val="90" '($1+prev)>val{exit} ($1+prev)<=val; {prev+=$1}' Input_file
Explanation: Adding detailed explanation for above.
awk -v val="90" ' ##Starting awk program from here and mentioning variable val as 90 here.
($1+prev)>val{ ##Checking condition if first field and prev variable sum is greater than val then do following.
exit ##exit from program to save some time.
}
($1+prev)<=val; ##If the sum of $1 and prev is less than or equal to val, this bare pattern prints the current line.
{
prev+=$1 ##keep adding 1st field to prev variable here.
}
' Input_file ##Mentioning Input_file name here.
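The threshold behaviour can be verified against the question's sample (the running totals 43, 72, 86 stay within 90; the next line would reach 94):

```shell
cat > nums.txt <<'EOF'
43 12.00 53888
29 10.00 36507
14 9.00 18365
8 8.00 10244
1 7.00 2079
EOF

# Print while the running sum of $1 stays <= val; exit on the first
# line that would push it past val.
awk -v val="90" '($1+prev)>val{exit} ($1+prev)<=val{print} {prev+=$1}' nums.txt
```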
Perl to the rescue!
perl -sape ' $s += $F[0] ; exit if $s > $vv' -- -vv=90 file
-s enables setting variables from the command line, -vv=90 sets the $vv variable to 90
-p processes the input line by line, it prints each line after processing
-a splits each line on whitespace and populates the @F array
The variable $s holds the running sum. A line is printed only while the sum is not greater than $vv; once the sum is too large, the program exits.
Consider small one-line awk
Revised (Sep 2020): modified to take Bruno's comments into account, going for a readable solution; see kvantour's answer for a compact one.
awk -v val=85 '{ s+= $1 ; if ( s > val ) exit ; print }'
Original Post: (Aug 2020)
awk -v val=85 '{ s += $1 ; if ( s <= val ) print }'
Or even
awk -v val=85 '{ s+= $1 } s <= val'
Consider an even smaller awk, very much in line with dash-o's solution:
awk -v v=90 '((v-=$1)<0){exit}1' file
or the smallest:
awk -v v=90 '0<=(v-=$1)' file
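The countdown variant behaves the same on the sample: v starts at 90 and each line's first field is subtracted from it (the trailing 1 is awk's always-true pattern, which prints the line):

```shell
cat > nums.txt <<'EOF'
43 12.00 53888
29 10.00 36507
14 9.00 18365
8 8.00 10244
EOF

# v counts down: 90-43=47, 47-29=18, 18-14=4, then 4-8 goes negative
# and the program exits before printing that line.
awk -v v=90 '((v-=$1)<0){exit}1' nums.txt
```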

Add new column with times same value was found in 2 columns

Add a new column whose value is the number of times the combination of the values in columns 1 and 2 occurs in the file.
input file
46849,39785,2,012,023,351912.29,2527104.70,174.31
46849,39785,2,012,028,351912.45,2527118.70,174.30
46849,39785,3,06,018,351912.12,2527119.51,174.33
46849,39785,3,06,020,351911.80,2527105.83,174.40
46849,39797,2,012,023,352062.45,2527118.50,173.99
46849,39797,2,012,028,352062.51,2527105.51,174.04
46849,39797,3,06,020,352063.29,2527116.71,174.13,
46849,39809,2,012,023,352211.63,2527104.81,173.74
46849,39809,2,012,028,352211.21,2527117.94,173.69
46849,39803,2,012,023,352211.63,2527104.81,173.74
46849,39803,2,012,028,352211.21,2527117.94,173.69
46849,39801,2,012,023,352211.63,2527104.81,173.74
Expected output file:
4,46849,39785,2,012,023,351912.29,2527104.70,174.31
4,46849,39785,2,012,028,351912.45,2527118.70,174.30
4,46849,39785,3,06,018,351912.12,2527119.51,174.33
4,46849,39785,3,06,020,351911.80,2527105.83,174.40
3,46849,39797,2,012,023,352062.45,2527118.50,173.99
3,46849,39797,2,012,028,352062.51,2527105.51,174.04
3,46849,39797,3,06,020,352063.29,2527116.71,174.13,
2,46849,39809,2,012,023,352211.63,2527104.81,173.74
2,46849,39809,2,012,028,352211.21,2527117.94,173.69
2,46849,39803,2,012,023,352211.63,2527104.81,173.74
1,46849,39803,2,012,028,352211.21,2527117.94,173.69
1,46849,39801,2,012,023,352211.63,2527104.81,173.74
attempt:
awk -F, '{x[$1 $2]++}END{ for(i in x) {print i,x[i]}}' file
4684939785 4
4684939797 3
4684939801 1
4684939803 2
4684939809 2
Could you please try following.
awk '
BEGIN{
FS=OFS=","
}
FNR==NR{
a[$1,$2]++
next
}
{
print a[$1,$2],$0
}
' Input_file Input_file
Explanation: the Input_file is read twice. On the first read, an array named a is created, indexed by the first and second fields, counting each occurrence of the pair. On the second read, the count for the first 2 fields is printed, followed by the whole line.
One-liner code:
awk 'BEGIN{FS=OFS=","} FNR==NR{a[$1,$2]++;next} {print a[$1,$2],$0}' Input_file Input_file
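A trimmed-down run of the two-pass counting (only a few of the question's rows, and the file name pairs.csv is made up here):

```shell
cat > pairs.csv <<'EOF'
46849,39785,2,012,023
46849,39785,3,06,018
46849,39797,2,012,023
EOF

# First read: count each ($1,$2) pair. Second read: prepend the count.
awk 'BEGIN{FS=OFS=","} FNR==NR{a[$1,$2]++; next} {print a[$1,$2], $0}' pairs.csv pairs.csv
```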

Select current and previous line if values are the same in 2 columns

Check the values in columns 2 and 3; if they are the same in the previous line and the current line (for example lines 2-3 and 6-7), then print both lines separated by ,
Input file
1 1 2 35 1
2 3 4 50 1
2 3 4 75 1
4 7 7 85 1
5 8 6 100 1
8 6 9 125 1
4 6 9 200 1
5 3 2 156 2
Desired output
2,3,4,50,1,2,3,4,75,1
8,6,9,125,1,4,6,9,200,1
I tried to modify this code, but got no results:
awk '{$6=$2 $3 - $p2 $p3} $6==0{print p0; print} {p0=$0;p2=p2;p3=$3}'
Thanks in advance.
$ awk -v OFS=',' '{$1=$1; cK=$2 FS $3} pK==cK{print p0, $0} {pK=cK; p0=$0}' file
2,3,4,50,1,2,3,4,75,1
8,6,9,125,1,4,6,9,200,1
With your own code and its mechanism updated:
awk '(($2=$2) $3) - (p2 p3)==0{printf "%s", p0; print} {p0=$0;p2=$2;p3=$3}' OFS="," file
2,3,4,50,12,3,4,75,1
8,6,9,125,14,6,9,200,1
But it has an underlying problem, so better to use this simplified/improved way:
awk '($2=$2) FS $3==cp{print p0,$0} {p0=$0; cp=$2 FS $3}' OFS=, file
The FS is needed, check the comments under Mr. Morton's answer.
Why your code fails:
Concatenation (what the space does) has higher precedence than the minus -.
You used $6 to save the value you want to compare, and then it becomes part of $0, the line (as its last column). You can change it to a temporary variable name instead.
You have a typo (p2=p2), and you used $p2 and $p3, which means taking p2's value and looking up the corresponding column. So if p2==3 then $p2 equals $3.
You didn't set OFS, so even if your code worked, the output would be separated by spaces.
print adds a trailing newline \n, so even if the above problems didn't exist, you would get 4 lines instead of the 2-line output you wanted.
Could you please try following too.
awk 'prev_2nd==$2 && prev_3rd==$3{$1=$1;print prev_line,$0} {prev_2nd=$2;prev_3rd=$3;$1=$1;prev_line=$0}' OFS=, Input_file
Explanation: Adding explanation for above code now.
awk '
prev_2nd==$2 && prev_3rd==$3{ ##Checking whether the previous line's variables prev_2nd and prev_3rd have the same values as the current line's 2nd and 3rd fields; if yes then do the following.
$1=$1 ##Resetting $1 to its own value; OP needs a comma as the output field separator, and rebuilding the record applies the OFS.
print prev_line,$0 ##Printing the previous line and the current line.
} ##Closing this condition block here.
{
prev_2nd=$2 ##Setting current line $2 to prev_2nd variable here.
prev_3rd=$3 ##Setting current line $3 to prev_3rd variable here.
$1=$1 ##Resetting $1 to its own value so the comma separator is applied to this line too.
prev_line=$0 ##Setting prev_line to the current line, now rebuilt with commas.
}
' OFS=, Input_file ##Setting OFS (output field separator) to comma and mentioning the Input_file name here.
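Both answers can be checked on a cut-down sample (rows.txt is invented here from the question's first four lines, where lines 2-3 share the same 2nd and 3rd fields):

```shell
cat > rows.txt <<'EOF'
1 1 2 35 1
2 3 4 50 1
2 3 4 75 1
4 7 7 85 1
EOF

# cK is the current ($2,$3) key and pK the previous one; $1=$1 rebuilds
# the record so OFS=',' applies before the pair is printed.
awk -v OFS=',' '{$1=$1; cK=$2 FS $3} pK==cK{print p0, $0} {pK=cK; p0=$0}' rows.txt
```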