Select current and previous line if values are the same in 2 columns - awk

Check the values in columns 2 and 3; if they are the same in the previous line and the current line (e.g. lines 2-3 and 6-7), then print the two lines joined, separated by commas.
Input file
1 1 2 35 1
2 3 4 50 1
2 3 4 75 1
4 7 7 85 1
5 8 6 100 1
8 6 9 125 1
4 6 9 200 1
5 3 2 156 2
Desired output
2,3,4,50,1,2,3,4,75,1
8,6,9,125,1,4,6,9,200,1
I tried to modify this code, but got no results.
awk '{$6=$2 $3 - $p2 $p3} $6==0{print p0; print} {p0=$0;p2=p2;p3=$3}'
Thanks in advance.

$ awk -v OFS=',' '{$1=$1; cK=$2 FS $3} pK==cK{print p0, $0} {pK=cK; p0=$0}' file
2,3,4,50,1,2,3,4,75,1
8,6,9,125,1,4,6,9,200,1

With your own code and its mechanism updated:
awk '(($2=$2) $3) - (p2 p3)==0{printf "%s", p0; print} {p0=$0;p2=$2;p3=$3}' OFS="," file
2,3,4,50,12,3,4,75,1
8,6,9,125,14,6,9,200,1
But it has an underlying problem, so it is better to use this simplified/improved way:
awk '($2=$2) FS $3==cp{print p0,$0} {p0=$0; cp=$2 FS $3}' OFS=, file
The FS is needed; check the comments under Mr. Morton's answer.
Why your code fails:
Concatenation (what the space does) has higher precedence than the minus operator.
You used $6 to save the value you want to compare, but it then becomes part of $0 as a new last column. You can use a temporary variable instead.
You have a typo (p2=p2), and you used $p2 and $p3, which means "take p2's value and look up that field"; so if p2==3 then $p2 is the same as $3.
You didn't set OFS, so even if your code worked, the output would be separated by spaces.
print adds a trailing newline (\n), so even if the problems above didn't exist, you would get 4 lines instead of the 2-line output you wanted.
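Putting those fixes together, here is a minimal sketch of the corrected approach (a temporary key variable instead of $6, OFS set to a comma, SUBSEP between the two key fields to avoid the ambiguous concatenation that FS addresses above, and one print per matching pair); file is a placeholder for your input file name:
awk -v OFS=',' '{key=$2 SUBSEP $3; $1=$1} key==prev_key{print prev_line, $0} {prev_key=key; prev_line=$0}' file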

Could you please try the following too.
awk 'prev_2nd==$2 && prev_3rd==$3{$1=$1;print prev_line,$0} {prev_2nd=$2;prev_3rd=$3;$1=$1;prev_line=$0}' OFS=, Input_file
Explanation: adding an explanation for the above code now.
awk '
prev_2nd==$2 && prev_3rd==$3{ ##Checking whether the previous line's prev_2nd and prev_3rd variables have the same values as the current line's 2nd and 3rd fields; if yes, then do the following.
$1=$1 ##Re-assigning $1 to itself; OP needs the output field separator to be a comma, and rebuilding the record by resetting a field to its own value applies it.
print prev_line,$0 ##Printing the value of the previous line and the current line here.
} ##Closing this condition block here.
{
prev_2nd=$2 ##Saving the current line's $2 in the prev_2nd variable here.
prev_3rd=$3 ##Saving the current line's $3 in the prev_3rd variable here.
$1=$1 ##Re-assigning $1 to itself so the comma separator is applied to this record too.
prev_line=$0 ##Now saving the current (comma-separated) line in the prev_line variable.
}
' OFS=, Input_file ##Setting OFS (output field separator) to a comma and mentioning the Input_file name here.
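For reference, assuming the sample input at the top of this question is saved as Input_file, running the one-liner should reproduce the desired output:
$ awk 'prev_2nd==$2 && prev_3rd==$3{$1=$1;print prev_line,$0} {prev_2nd=$2;prev_3rd=$3;$1=$1;prev_line=$0}' OFS=, Input_file
2,3,4,50,1,2,3,4,75,1
8,6,9,125,1,4,6,9,200,1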

Related

Add new column with the number of times the same value was found in 2 columns

Add a new column giving how many times the combination of values in columns 1 and 2 occurs with exactly the same values.
Input file
46849,39785,2,012,023,351912.29,2527104.70,174.31
46849,39785,2,012,028,351912.45,2527118.70,174.30
46849,39785,3,06,018,351912.12,2527119.51,174.33
46849,39785,3,06,020,351911.80,2527105.83,174.40
46849,39797,2,012,023,352062.45,2527118.50,173.99
46849,39797,2,012,028,352062.51,2527105.51,174.04
46849,39797,3,06,020,352063.29,2527116.71,174.13,
46849,39809,2,012,023,352211.63,2527104.81,173.74
46849,39809,2,012,028,352211.21,2527117.94,173.69
46849,39803,2,012,023,352211.63,2527104.81,173.74
46849,39803,2,012,028,352211.21,2527117.94,173.69
46849,39801,2,012,023,352211.63,2527104.81,173.74
Expected output file:
4,46849,39785,2,012,023,351912.29,2527104.70,174.31
4,46849,39785,2,012,028,351912.45,2527118.70,174.30
4,46849,39785,3,06,018,351912.12,2527119.51,174.33
4,46849,39785,3,06,020,351911.80,2527105.83,174.40
3,46849,39797,2,012,023,352062.45,2527118.50,173.99
3,46849,39797,2,012,028,352062.51,2527105.51,174.04
3,46849,39797,3,06,020,352063.29,2527116.71,174.13,
2,46849,39809,2,012,023,352211.63,2527104.81,173.74
2,46849,39809,2,012,028,352211.21,2527117.94,173.69
2,46849,39803,2,012,023,352211.63,2527104.81,173.74
1,46849,39803,2,012,028,352211.21,2527117.94,173.69
1,46849,39801,2,012,023,352211.63,2527104.81,173.74
attempt:
awk -F, '{x[$1 $2]++}END{ for(i in x) {print i,x[i]}}' file
4684939785 4
4684939797 3
4684939801 1
4684939803 2
4684939809 2
Could you please try the following.
awk '
BEGIN{
FS=OFS=","
}
FNR==NR{
a[$1,$2]++
next
}
{
print a[$1,$2],$0
}
' Input_file Input_file
Explanation: reading the Input_file 2 times. On the first read I am creating an array named a, indexed by the first and second fields, and counting each occurrence. On the 2nd read it prints the count for the first 2 fields and then the whole line.
One-liner code:
awk 'BEGIN{FS=OFS=","} FNR==NR{a[$1,$2]++;next} {print a[$1,$2],$0}' Input_file Input_file
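If reading the file twice is not convenient (for example when the data arrives on a pipe), a single-pass sketch is also possible; this is an alternative I am adding here, which buffers every line in memory and prints the counts from the END block while preserving the original order:
awk 'BEGIN{FS=OFS=","} {cnt[$1,$2]++; line[NR]=$0; key[NR]=$1 SUBSEP $2} END{for(i=1;i<=NR;i++) print cnt[key[i]], line[i]}' Input_file
The trade-off is memory: the two-pass version only keeps the counts, while this one keeps the whole file.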

Print if col2 is greater than col2 in the previous line

I want to use awk to extract lines whose second-column value is greater than that of the previous line.
The sample data looks like this:
a 3
a 5
a 4
b 1
c 2
c 3
c 6
I tried to use the command below, but it didn't work:
awk '{if(($1!=a) || ($1==a && $2>b)){getline; print}};{a=$1;b=$2}'
the expected output:
a 3
a 5
b 1
c 2
c 3
c 6
only "a 4" line should be removed, because 4 is smaller than 2nd column of last line (5).
But the actual result from my code:
a 5
c 2
c 6
How can I resolve it? Thanks
Here is one:
$ awk '$1!=p1 || $2>p2; {p1=$1;p2=$2}' file
a 3
a 5
b 1
c 2
c 3
c 6
If $1 changes or $2 is greater than in the previous round, print.
For a generic solution with more fields, see the comment below.
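That comment is not reproduced here, but one possible generic sketch (my assumption of what is meant) is to pass the column to compare as an awk variable c instead of hard-coding $2:
$ awk -v c=2 '$1!=p1 || $c>pc; {p1=$1; pc=$c}' file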
Could you please try the following (for when you are NOT requiring the 1st field values to be the same before comparing).
awk '$2>prev; {prev=$2}' Input_file
In case you want the 2nd field comparison to apply only within the same 1st field, then try the following.
awk 'prev_1st!=$1 || prev==""; $2>prev && prev_1st==$1; {prev=$2;prev_1st=$1}' Input_file
Explanation: adding an explanation for the above code.
awk ' ##Starting awk program here.
prev_1st!=$1 || prev=="" ##If the prev_1st variable is NOT equal to $1, OR the variable prev is still empty (NULL), then simply print the line.
$2>prev && prev_1st==$1 ##If $2 is greater than prev AND prev_1st equals $1, then print the line.
{
prev=$2 ##Creating variable prev and setting its value to $2.
prev_1st=$1 ##Creating variable prev_1st and setting its value to $1.
}
' Input_file ##Mentioning the Input_file name here.
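With the sample data above saved as Input_file, this should print the expected six lines:
$ awk 'prev_1st!=$1 || prev==""; $2>prev && prev_1st==$1; {prev=$2;prev_1st=$1}' Input_file
a 3
a 5
b 1
c 2
c 3
c 6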

How do I sum the first n rows of another column in bash

For example, given
1 4
2 5
3 6
I want to sum up the numbers in the second column and create a new column with the running total. The new column should be 4, 9 (4+5), and 15 (4+5+6):
1 4 4
2 5 9
3 6 15
Could you please try the following if you are OK with awk.
awk 'FNR==1{print $0,$2;prev=$2;next} {print $0,$2+prev;prev+=$2}' Input_file
OR
awk 'FNR==1{print $0,$2;prev=$2;next} {prev+=$2;print $0,prev}' Input_file
Explanation: adding an explanation for the above code now.
awk ' ##Starting awk program here.
FNR==1{ ##If this is the first line, then do the following.
print $0,$2 ##Printing current line with 2nd field here.
prev=$2 ##Creating variable prev whose value is 2nd field of current line.
next ##next will skip all further statements from here.
} ##Closing block for FNR condition here.
{ ##Starting new block here.
prev+=$2 ##Adding $2 value to prev variable value here.
print $0,prev ##Printing current line and prev variable here.
}' Input_file ##mentioning Input_file name here.
PS: Welcome to SO. You need to mention the efforts you have put in to solve your problem, as we are all here to learn.
This is more idiomatic:
$ awk '{print $0, s+=$2}' file
1 4 4
2 5 9
3 6 15
Print the current line and the value s, which is incremented by the second field; in other words, it is a rolling sum.
This can be golfed into the following if all values are positive (so there is no chance of the sum being 0), but it is perhaps too cryptic:
$ awk '$3=s+=$2' file
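If zero or negative running sums are possible, a less cryptic variant of the same idea that always prints would be:
$ awk '{$3=s+=$2} 1' file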
Another awk:
$ cat john_ward.txt
1 4
2 5
3 6
$ awk ' {$(NF+1)=s+=$NF}1 ' john_ward.txt
1 4 4
2 5 9
3 6 15
$

awk to move a file's last line to above the previous line

In the awk below I am trying to move only the last line so that it sits above the line before it. The problem is that since my input file varies (it is not always 4 lines like below), I cannot use i=3 every time and cannot seem to fix it. Thank you :).
file
this is line 1
this is line 2
this is line 3
this is line 4
desired output
this is line 1
this is line 2
this is line 4
this is line 3
awk attempt (it seems the last line is being moved, but to position 2):
awk '
{lines[NR]=$0}
END{
print lines[1], lines[NR];
for (i=3; i<NR; i++) {print lines[i]}
}
' OFS=$'\n' file
this is line 1
this is line 4
this is line 3
$ seq 4 | awk 'NR>2{print p2} {p2=p1; p1=$0} END{print p1 ORS p2}'
1
2
4
3
$ seq 7 | awk 'NR>2{print p2} {p2=p1; p1=$0} END{print p1 ORS p2}'
1
2
3
4
5
7
6
Try the following awk once:
awk '{a[FNR]=$0} END{for(i=1;i<=FNR-2;i++){print a[i]};print a[FNR] ORS a[FNR-1]}' Input_file
Explanation: creating an array named a, indexed by FNR (the current line number), holding the current line's value. In the END section of awk, a for loop runs from i=1 to i<=FNR-2; it stops at FNR-2 because only the last 2 lines need to be swapped. Once all those lines are printed, it simply prints a[FNR] (the last line) and then a[FNR-1], joined with ORS (to print a new line between them).
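For example (using seq as in the earlier answer), running it on five lines of input should swap only the last two lines:
$ seq 5 | awk '{a[FNR]=$0} END{for(i=1;i<=FNR-2;i++){print a[i]};print a[FNR] ORS a[FNR-1]}'
1
2
3
5
4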
2nd solution: by counting the number of lines in the Input_file and putting the count into an awk variable.
awk -v lines=$(wc -l < Input_file) 'FNR==(lines-1){val=$0;next} FNR==lines{print $0 ORS val;next} 1' Input_file
You nearly had it. You just have to change the order.
awk '
{lines[NR]=$0}
END{
for (i=1; i<NR-1; i++) {print lines[i]}
print lines[NR];
print lines[NR-1];
}
' OFS=$'\n' file
I'd reverse the file, swap the first two lines, then re-reverse the file
tac file | awk 'NR==1 {getline line2; print line2} 1' | tac
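For example, with the same seq style used earlier, this should give:
$ seq 4 | tac | awk 'NR==1 {getline line2; print line2} 1' | tac
1
2
4
3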

Ignore spaces after a particular column number

Hi everyone, I have the data below.
61684 376 23 106 38695633 1 0 0 -1 /C/Program Files (x86)/ 16704 root;TrustedInstaller#NT:SERVICE root;TrustedInstaller#NT:SERVICE 0 1407331175 1407331175 1247541608
8634 416 13 86 574126 1 0 0 -1 /E/KYCImages/ 16832 root;kycfinal#CGKYCAPP03 root;None#CGKYCAPP03 0 1406018846 1406018846 1352415392
60971 472 22 86 38613076 1 0 0 -1 /E/KYCwebsvc binaries/ 16832 root;kycfinal#CGKYCAPP03 root;None#CGKYCAPP03 0 1390829495 1390829495 1353370744
1 416 10 86 1 1 0 0 -1 /E/KycApp/ 16832 root;kycfinal#CGKYCAPP03 root;None#CGKYCAPP03 0 1411465772 1411465772 1351291187
Now I am using the code below:
awk 'BEGIN{FPAT = "([^ ]+)|(\"[^\"]+\")"}{print $10}' | awk '$1!~/^\/\./' | sort -u | sed -e 's/\,//g' | perl -p00e 's/\n(?!\Z)/;/g' filename
I am getting this output
/C/Program;/E/KycApp/;/E/KYCImages/;/E/KycServices/;/E/KYCwebsvc
However, I need the output to start at $10 and continue until "/" is encountered again; basically, I want to ignore any spaces from column 10 until the closing "/" is encountered.
Is it possible?
The desired output is
/C/Program Files (x86)/;/E/KycApp/;/E/KYCImages/;/E/KycServices/;/E/KYCwebsvc binaries/
With a single gawk:
awk 'BEGIN{ FPAT="/[^/]+/[^/]+/"; PROCINFO["sorted_in"]="#ind_str_asc"; IGNORECASE = 1 }
{ a[$1] }END{ for(i in a) r=(r!="")? r";"i : i; print r }' filename
The output (without /E/KycServices/;, since it is not present in your input):
/C/Program Files (x86)/;/E/KycApp/;/E/KYCImages/;/E/KYCwebsvc binaries/
Try the following too, in a single awk.
awk '{match($0,/\/.*\//);VAL=VAL?VAL ORS substr($0,RSTART,RLENGTH):substr($0,RSTART,RLENGTH)} END{num=split(VAL, array,"\n");for(i=1;i<=num;i++){printf("%s%s",array[i],i==num?"":";")};print""}' Input_file
I will add a non-one-liner form of the solution with an explanation shortly.
EDIT1: Adding the non-one-liner form of the solution now.
awk '{
match($0,/\/.*\//);
VAL=VAL?VAL ORS substr($0,RSTART,RLENGTH):substr($0,RSTART,RLENGTH)
}
END{
num=split(VAL, array,"\n");
for(i=1;i<=num;i++){
printf("%s%s",array[i],i==num?"":";")
};
print""
}
' Input_file
EDIT2: Adding an explanation of the code in non-one-liner form now.
awk '{
match($0,/\/.*\//); ##Using awk's match function, which applies the regex to find the string in the line from the first / to the last /; note the slashes are escaped here.
VAL=VAL?VAL ORS substr($0,RSTART,RLENGTH):substr($0,RSTART,RLENGTH) ##Creating a variable named VAL which appends to its own value when there is more than one occurrence. RSTART and RLENGTH are built-in awk variables that are set whenever match() finds the regex in a line.
}
END{ ##Starting the END block here.
num=split(VAL, array,"\n"); ##Creating a variable num whose value is the number of elements in the array named array; split is an awk built-in that creates an array using the given delimiter, here a newline.
for(i=1;i<=num;i++){ ##Starting a for loop from i=1 up to the value of num.
printf("%s%s",array[i],i==num?"":";") ##Printing the array element whose index is i, followed by either nothing (if i equals num) or a semicolon.
};
print"" ##Printing an empty string to produce the final newline.
}
' Input_file ##Mentioning the Input_file here.
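As a small, self-contained illustration of how match() sets RSTART and RLENGTH (using the second sample line from the input above), the following should print the start position, the match length and the extracted path:
$ echo '8634 416 13 86 574126 1 0 0 -1 /E/KYCImages/ 16832 root;kycfinal#CGKYCAPP03 root;None#CGKYCAPP03 0 1406018846 1406018846 1352415392' | awk '{match($0,/\/.*\//); print RSTART, RLENGTH, substr($0,RSTART,RLENGTH)}'
32 13 /E/KYCImages/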