Print if col2 is greater than col2 in last line - awk

I want to use awk to print only the lines whose second column is greater than that of the previous line.
The sample data looks like this:
a 3
a 5
a 4
b 1
c 2
c 3
c 6
I tried the command below, but it did not work:
awk '{if(($1!=a) || ($1==a && $2>b)){getline; print}};{a=$1;b=$2}'
the expected output:
a 3
a 5
b 1
c 2
c 3
c 6
Only the "a 4" line should be removed, because 4 is smaller than the 2nd column of the previous line (5).
But the actual result from my code is:
a 5
c 2
c 6
How can I resolve it? Thanks

Here is one:
$ awk '$1!=p1 || $2>p2; {p1=$1;p2=$2}' file
a 3
a 5
b 1
c 2
c 3
c 6
If $1 changes, or $2 is greater than in the previous line, print the line.
Generic solution for more fields, see this comment below.
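The one-liner above can also be written out rule by rule. This is only a sketch: the sample rows are fed in with printf, and the variable name out is chosen purely for illustration.

```shell
# Feed the sample rows to awk and keep the result in "out".
out=$(printf '%s\n' 'a 3' 'a 5' 'a 4' 'b 1' 'c 2' 'c 3' 'c 6' |
awk '
  # Print when the group in $1 changes, or when $2 grew within the group.
  $1 != p1 || $2 > p2 { print }
  # Remember the current fields for the comparison on the next line.
  { p1 = $1; p2 = $2 }
')
printf '%s\n' "$out"
```

The original attempt fails because getline replaces the current record with the next input line before print runs, so it prints the line after each match and never compares that line itself.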

Could you please try the following (when the comparison should NOT be restricted to lines with the same 1st field).
awk '$2>prev; {prev=$2}' Input_file
In case the 2nd field should be compared only between lines that share the same 1st field, then try the following.
awk 'prev_1st!=$1 || prev==""; $2>prev && prev_1st==$1; {prev=$2;prev_1st=$1}' Input_file
Explanation: Adding explanation for above code.
awk ' ##Starting awk program here.
prev_1st!=$1 || prev=="" ##Checking condition if prev_1st variable is NOT equal to $1 OR variable prev is NULL, then simply print the line.
$2>prev && prev_1st==$1 ##Checking condition if $2 is greater than prev AND prev_1st equals $1, then print the line.
{
prev=$2 ##Creating variable prev and setting its value to $2.
prev_1st=$1 ##Creating variable prev_1st and setting its value to $1.
}
' Input_file ##Mentioning Input_file name here.
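To see how the two variants differ, both can be run on the question's sample data. This is a sketch under assumptions: the grouped variant is written in a condensed form, and the names rows, plain and grouped are illustrative only.

```shell
rows=$(printf '%s\n' 'a 3' 'a 5' 'a 4' 'b 1' 'c 2' 'c 3' 'c 6')

# Variant 1: compare $2 with the previous line regardless of $1.
plain=$(printf '%s\n' "$rows" | awk '$2 > prev; { prev = $2 }')

# Variant 2: compare only within the same $1; a new $1 always prints.
grouped=$(printf '%s\n' "$rows" |
  awk '$1 != prev_1st || $2 > prev; { prev = $2; prev_1st = $1 }')

printf 'plain:\n%s\ngrouped:\n%s\n' "$plain" "$grouped"
```

The plain variant drops "b 1" as well as "a 4", because 1 is not greater than the 4 on the line before it; the grouped variant keeps "b 1" because the first field changed.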

Related

how to keep newline(s) when selecting a given column with awk

Suppose I have a file like this (disclaimer: the size is not fixed; I can have more than 7 rows and more than 4 columns)
R H A 23
S E A 45
T E A 34
U   A 35
Y T A 35
O E A 353
J G B 23
I want to select the second column when the third column is A, while keeping the whitespace where the second column is empty.
output should be:
HEE TE
I tried this:
awk '{if ($3=="A") print $2}' file | awk 'BEGIN{ORS = ""}{print $1}'
But this gives:
HEETE%
Which has a weird % and is missing the space.
You may use this gnu-awk solution using FIELDWIDTHS:
awk 'BEGIN{ FIELDWIDTHS = "1 1 1 1 1 1 *" } $5 == "A" {s = s $3}
END {print s}' file
HEE TE
awk splits each record using the width values provided in the FIELDWIDTHS variable.
1 1 1 1 1 1 * means each of the first 6 columns is a single character wide, and the remaining text goes into the 7th column. Since there is a space after each value, $2, $4 and $6 will each hold a single space, and $1, $3 and $5 will hold the values from the input.
$5 == "A" {s = s $3}: Here we check whether $5 is A, and if so we append the value of $3 to a variable s. In the END block we just print s.
Without fixed-width parsing, awk would treat the A in the 4th row as $2.
Alternatively, if we let the trailing spaces be part of each column value, use:
awk '
BEGIN{ FIELDWIDTHS = "2 2 2 *" }
$3 == "A " {s = s substr($2,1,1)}
END {print s}
' file
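Where gawk is not available, the same fixed-position idea can be sketched with substr(), which any POSIX awk provides. This swaps FIELDWIDTHS for character offsets and assumes single-character columns separated by exactly one space, with the 4th row's second column blank; the name out is illustrative.

```shell
out=$(printf '%s\n' 'R H A 23' 'S E A 45' 'T E A 34' 'U   A 35' \
                    'Y T A 35' 'O E A 353' 'J G B 23' |
awk '
  # Character 5 holds the third column, character 3 the second one.
  substr($0, 5, 1) == "A" { s = s substr($0, 3, 1) }
  # print ends the output with a newline, unlike the ORS = "" attempt.
  END { print s }
')
printf '%s\n' "$out"
```

The stray % in the question is most likely the shell (zsh does this) marking output that did not end in a newline; printing from the END block avoids it.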

Add new column with times same value was found in 2 columns

Add a new column holding the number of times the same pair of values occurs in columns 1 and 2.
input file
46849,39785,2,012,023,351912.29,2527104.70,174.31
46849,39785,2,012,028,351912.45,2527118.70,174.30
46849,39785,3,06,018,351912.12,2527119.51,174.33
46849,39785,3,06,020,351911.80,2527105.83,174.40
46849,39797,2,012,023,352062.45,2527118.50,173.99
46849,39797,2,012,028,352062.51,2527105.51,174.04
46849,39797,3,06,020,352063.29,2527116.71,174.13,
46849,39809,2,012,023,352211.63,2527104.81,173.74
46849,39809,2,012,028,352211.21,2527117.94,173.69
46849,39803,2,012,023,352211.63,2527104.81,173.74
46849,39803,2,012,028,352211.21,2527117.94,173.69
46849,39801,2,012,023,352211.63,2527104.81,173.74
Expected output file:
4,46849,39785,2,012,023,351912.29,2527104.70,174.31
4,46849,39785,2,012,028,351912.45,2527118.70,174.30
4,46849,39785,3,06,018,351912.12,2527119.51,174.33
4,46849,39785,3,06,020,351911.80,2527105.83,174.40
3,46849,39797,2,012,023,352062.45,2527118.50,173.99
3,46849,39797,2,012,028,352062.51,2527105.51,174.04
3,46849,39797,3,06,020,352063.29,2527116.71,174.13,
2,46849,39809,2,012,023,352211.63,2527104.81,173.74
2,46849,39809,2,012,028,352211.21,2527117.94,173.69
2,46849,39803,2,012,023,352211.63,2527104.81,173.74
2,46849,39803,2,012,028,352211.21,2527117.94,173.69
1,46849,39801,2,012,023,352211.63,2527104.81,173.74
attempt:
awk -F, '{x[$1 $2]++}END{ for(i in x) {print i,x[i]}}' file
4684939785 4
4684939797 3
4684939801 1
4684939803 2
4684939809 2
Could you please try following.
awk '
BEGIN{
FS=OFS=","
}
FNR==NR{
a[$1,$2]++
next
}
{
print a[$1,$2],$0
}
' Input_file Input_file
Explanation: Input_file is read 2 times. The first time, an array named a is built, indexed by the first and second fields, counting each occurrence. The second time, the total count for the line's first 2 fields is printed, followed by the whole line.
One liner code:
awk 'BEGIN{FS=OFS=","} FNR==NR{a[$1,$2]++;next} {print a[$1,$2],$0}' Input_file Input_file
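When the input cannot be read twice (for example, when it arrives on a pipe), a single-pass variant can buffer the lines in memory instead. This is a sketch rather than the answer's method; the array names key, line and cnt are illustrative, and only four rows of the sample are used, so the counts differ from the full file.

```shell
out=$(printf '%s\n' \
  '46849,39785,2,012,023,351912.29,2527104.70,174.31' \
  '46849,39785,2,012,028,351912.45,2527118.70,174.30' \
  '46849,39797,2,012,023,352062.45,2527118.50,173.99' \
  '46849,39801,2,012,023,352211.63,2527104.81,173.74' |
awk '
  BEGIN { FS = OFS = "," }
  {
    key[NR]  = $1 SUBSEP $2   # same index that cnt[$1,$2] uses
    line[NR] = $0             # keep the line until all counts are known
    cnt[$1, $2]++
  }
  END { for (i = 1; i <= NR; i++) print cnt[key[i]], line[i] }
')
printf '%s\n' "$out"
```

The trade-off is memory: every line is held until END, whereas the two-pass version only stores the counts.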

How do I sum of the first n rows of another column in bash

For example given
1 4
2 5
3 6
I want to sum up the numbers in the second column and create a new column with it. The new column is 4, 9 (4+5), and 15 (4+5+6)
1 4 4
2 5 9
3 6 15
Could you please try following if you are ok with awk.
awk 'FNR==1{print $0,$2;prev=$2;next} {print $0,$2+prev;prev+=$2}' Input_file
OR
awk 'FNR==1{print $0,$2;prev=$2;next} {prev+=$2;print $0,prev}' Input_file
Explanation: Adding explanation for above code now.
awk ' ##Starting awk program here.
FNR==1{ ##Checking condition if line is first line then do following.
print $0,$2 ##Printing current line with 2nd field here.
prev=$2 ##Creating variable prev whose value is 2nd field of current line.
next ##next will skip all further statements from here.
} ##Closing block for FNR condition here.
{ ##Starting new block here.
prev+=$2 ##Adding $2 value to prev variable value here.
print $0,prev ##Printing current line and prev variable here.
}' Input_file ##mentioning Input_file name here.
PS: Welcome to SO, you need to mention your efforts which you have put in order to solve your problems as we all are here to learn.
this is more idiomatic
$ awk '{print $0, s+=$2}' file
1 4 4
2 5 9
3 6 15
Print the current line and the value of s, which is incremented by the second field; in other words, s is a rolling sum.
This can be golfed into the following if all values are positive (so there is no chance of the sum being 0), but it is perhaps too cryptic.
$ awk '$3=s+=$2' file
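The caveat about a sum of 0 can be seen with a small made-up input whose first running sum is 0. In this sketch the two input rows and the names golfed and safe are hypothetical, chosen only to trigger the edge case.

```shell
# Hypothetical input: the running sum is 0 after the first line.
golfed=$(printf '%s\n' '1 0' '2 5' | awk '$3 = s += $2')
safe=$(printf '%s\n' '1 0' '2 5' | awk '{ print $0, (s += $2) }')
printf 'golfed:\n%s\nsafe:\n%s\n' "$golfed" "$safe"
```

In the golfed form the assignment itself is the pattern, so a line is printed only when the assigned sum is non-zero; the first row is silently dropped. The explicit print keeps it.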
Another awk..
$ cat john_ward.txt
1 4
2 5
3 6
$ awk ' {$(NF+1)=s+=$NF}1 ' john_ward.txt
1 4 4
2 5 9
3 6 15

Select current and previous line if values are the same in 2 columns

Check the values in columns 2 and 3; if they are the same in the previous line and the current line (for example, lines 2-3 and 6-7), then print both lines joined together, with the fields separated by commas.
Input file
1 1 2 35 1
2 3 4 50 1
2 3 4 75 1
4 7 7 85 1
5 8 6 100 1
8 6 9 125 1
4 6 9 200 1
5 3 2 156 2
Desired output
2,3,4,50,1,2,3,4,75,1
8,6,9,125,1,4,6,9,200,1
I tried to modify this code, but got no results:
awk '{$6=$2 $3 - $p2 $p3} $6==0{print p0; print} {p0=$0;p2=p2;p3=$3}'
Thanks in advance.
$ awk -v OFS=',' '{$1=$1; cK=$2 FS $3} pK==cK{print p0, $0} {pK=cK; p0=$0}' file
2,3,4,50,1,2,3,4,75,1
8,6,9,125,1,4,6,9,200,1
With your own code and its mechanism updated:
awk '(($2=$2) $3) - (p2 p3)==0{printf "%s", p0; print} {p0=$0;p2=$2;p3=$3}' OFS="," file
2,3,4,50,12,3,4,75,1
8,6,9,125,14,6,9,200,1
But it has an underlying problem, so it is better to use this simplified/improved way:
awk '($2=$2) FS $3==cp{print p0,$0} {p0=$0; cp=$2 FS $3}' OFS=, file
The FS is needed, check the comments under Mr. Morton's answer.
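Why the separator matters can be sketched with two made-up rows whose columns 2 and 3 differ but concatenate to the same digits. The rows and the names no_sep and with_sep are hypothetical.

```shell
# Hypothetical rows: "1" "23" and "12" "3" both concatenate to "123".
rows=$(printf '%s\n' 'x 1 23 a' 'y 12 3 b')

# Key without a separator: the two rows are wrongly treated as equal.
no_sep=$(printf '%s\n' "$rows" |
  awk '{ k = $2 $3 } k == pk { print "match at line " NR } { pk = k }')

# Key with FS in between: "1 23" and "12 3" stay distinct.
with_sep=$(printf '%s\n' "$rows" |
  awk '{ k = $2 FS $3 } k == pk { print "match at line " NR } { pk = k }')

printf 'no_sep: [%s]\nwith_sep: [%s]\n' "$no_sep" "$with_sep"
```

Without FS the second row is reported as a match; with FS nothing is reported, which is the correct result for these rows.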
Why your code fails:
Concatenation (what the space does) has lower precedence than minus (-), so $2 $3 - $p2 $p3 is evaluated as $2 ($3 - $p2) $p3.
You used $6 to hold the value you want to compare, so it becomes part of the line $0 (as the last column). You can use a temporary variable instead.
You have a typo (p2=p2), and you used $p2 and $p3, which take the value of p2 (or p3) and fetch the field with that number. So if p2==3 then $p2 equals $3.
You didn't set OFS, so even if your code worked, the output would be separated by spaces.
print adds a trailing newline (\n), so even without the problems above, you would get 4 lines instead of the 2 lines of output you wanted.
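The precedence point and the $p2 point from the list above can be checked in isolation. In this sketch a, b, p and q are hypothetical stand-ins for two fields and their saved copies, and the input row 'k 10 20' is made up.

```shell
# Without parentheses: a (b - p) q, i.e. "2" "1" "3".
no_parens=$(awk 'BEGIN { a = 2; b = 3; p = 2; q = 3; x = a b - p q; print x }')

# With parentheses: 23 - 23.
parens=$(awk 'BEGIN { a = 2; b = 3; p = 2; q = 3; x = (a b) - (p q); print x }')

# $p2 means "the field whose number is stored in p2", here field 3.
indirect=$(printf '%s\n' 'k 10 20' | awk '{ p2 = 3; print $p2 }')

printf '%s %s %s\n' "$no_parens" "$parens" "$indirect"
```

The unparenthesized expression yields 213 rather than 0, which is why the comparison with 0 in the original code never behaves as intended.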
Could you please try following too.
awk 'prev_2nd==$2 && prev_3rd==$3{$1=$1;print prev_line,$0} {prev_2nd=$2;prev_3rd=$3;$1=$1;prev_line=$0}' OFS=, Input_file
Explanation: Adding explanation for above code now.
awk '
prev_2nd==$2 && prev_3rd==$3{ ##Checking if previous lines variable prev_2nd and prev_3rd are having same value as current line 2nd and 3rd field or not, if yes then do following.
$1=$1 ##Resetting $1 to its own value, because OP needs comma as the output field separator and the line is only rebuilt with OFS when a field is assigned.
print prev_line,$0 ##Printing value of previous line and current line here.
} ##Closing this condition block here.
{
prev_2nd=$2 ##Setting current line $2 to prev_2nd variable here.
prev_3rd=$3 ##Setting current line $3 to prev_3rd variable here.
$1=$1 ##Resetting $1 to itself so that the line is rebuilt with commas as separators.
prev_line=$0 ##Now setting prev_line to the current line, rebuilt with comma as the separator.
}
' OFS=, Input_file ##Setting OFS(output field separator) value as comma here and mentioning Input_file name here.

Subtract two fields of two consecutive rows in awk

I have a file as follows:
5 6
7 8
12 15
Using awk, how can I find the distance between the second column of one line and the first column of the next line? In this case, that is the distance between 6 and 7, and between 8 and 12, printed as follows, with the distance for the first line set to zero:
5 6 0
7 8 1
12 15 4
awk '{print $0, (NR>1?$1-p:0); p=$2}' file
try:
awk 'NR==1{val=$2;print $0,"0";next} {print $0,$1-val;val=$2}' Input_file
Adding an explanation too.
When NR==1 (the first line of Input_file), set the variable val to the second field, print the current line followed by "0", and then call next (which skips all further statements). For every other line, print the current line along with the value of $1-val, and then set val to $2 of the current line.
Short awk approach:
awk 'NR==1{ $3=0 }NR>1{ $3=$1-p }{ p=$2 }1' file
The output:
5 6 0
7 8 1
12 15 4
p=$2 - capture the 2nd field value (p - considered as previous line value)
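All three answers follow the same pattern: remember $2, then subtract it from $1 on the next line. A sketch of that pattern run on the sample data, with out as an illustrative variable name:

```shell
out=$(printf '%s\n' '5 6' '7 8' '12 15' |
awk '
  # The first line has no predecessor, so its distance is 0.
  { print $0, (NR > 1 ? $1 - p : 0) }
  # Save the second field for the comparison on the next line.
  { p = $2 }
')
printf '%s\n' "$out"
```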