Add a new column containing how many times the pair of values in columns 1 and 2 occurs in the file.
input file
46849,39785,2,012,023,351912.29,2527104.70,174.31
46849,39785,2,012,028,351912.45,2527118.70,174.30
46849,39785,3,06,018,351912.12,2527119.51,174.33
46849,39785,3,06,020,351911.80,2527105.83,174.40
46849,39797,2,012,023,352062.45,2527118.50,173.99
46849,39797,2,012,028,352062.51,2527105.51,174.04
46849,39797,3,06,020,352063.29,2527116.71,174.13,
46849,39809,2,012,023,352211.63,2527104.81,173.74
46849,39809,2,012,028,352211.21,2527117.94,173.69
46849,39803,2,012,023,352211.63,2527104.81,173.74
46849,39803,2,012,028,352211.21,2527117.94,173.69
46849,39801,2,012,023,352211.63,2527104.81,173.74
Expected output file:
4,46849,39785,2,012,023,351912.29,2527104.70,174.31
4,46849,39785,2,012,028,351912.45,2527118.70,174.30
4,46849,39785,3,06,018,351912.12,2527119.51,174.33
4,46849,39785,3,06,020,351911.80,2527105.83,174.40
3,46849,39797,2,012,023,352062.45,2527118.50,173.99
3,46849,39797,2,012,028,352062.51,2527105.51,174.04
3,46849,39797,3,06,020,352063.29,2527116.71,174.13,
2,46849,39809,2,012,023,352211.63,2527104.81,173.74
2,46849,39809,2,012,028,352211.21,2527117.94,173.69
2,46849,39803,2,012,023,352211.63,2527104.81,173.74
2,46849,39803,2,012,028,352211.21,2527117.94,173.69
1,46849,39801,2,012,023,352211.63,2527104.81,173.74
attempt:
awk -F, '{x[$1 $2]++}END{ for(i in x) {print i,x[i]}}' file
4684939785 4
4684939797 3
4684939801 1
4684939803 2
4684939809 2
Could you please try following.
awk '
BEGIN{
FS=OFS=","
}
FNR==NR{
a[$1,$2]++
next
}
{
print a[$1,$2],$0
}
' Input_file Input_file
Explanation: Input_file is read twice. On the first pass (FNR==NR), the array a, indexed by the first and second fields, counts each occurrence of that pair. On the second pass, the count for the line's first two fields is printed, followed by the whole line.
One liner code:
awk 'BEGIN{FS=OFS=","} FNR==NR{a[$1,$2]++;next} {print a[$1,$2],$0}' Input_file Input_file
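If the data comes from a pipe and cannot be read twice, a single pass that buffers the lines is one possible alternative. This is only a sketch: count_pairs is a hypothetical helper name, and it keeps the whole input in memory. Note that it keys on $1 SUBSEP $2 (the same thing a[$1,$2] does), whereas the x[$1 $2] in the attempt concatenates the fields without a separator, so different pairs could collide.

```shell
# count_pairs (hypothetical name): prefix each CSV line with the number of
# times its (column 1, column 2) pair occurs. Reads stdin in a single pass.
count_pairs() {
  awk '
    BEGIN { FS = OFS = "," }
    {
      line[NR] = $0            # remember the line in input order
      key[NR]  = $1 SUBSEP $2  # remember its (col1, col2) key
      count[key[NR]]++         # tally the key
    }
    END {
      for (i = 1; i <= NR; i++)
        print count[key[i]], line[i]
    }
  '
}

printf '%s\n' '46849,39785,2' '46849,39785,3' '46849,39797,2' | count_pairs
# 2,46849,39785,2
# 2,46849,39785,3
# 1,46849,39797,2
```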
I want to use awk to extract lines whose column value is greater than that of the previous line.
The sample data looks like this:
a 3
a 5
a 4
b 1
c 2
c 3
c 6
I tried the command below, but it did not work:
awk '{if(($1!=a) || ($1==a && $2>b)){getline; print}};{a=$1;b=$2}'
the expected output:
a 3
a 5
b 1
c 2
c 3
c 6
Only the "a 4" line should be removed, because 4 is smaller than the 2nd column of the previous line (5).
But the actual result from my code is:
a 5
c 2
c 6
How can I resolve it? Thanks
Here is one:
$ awk '$1!=p1 || $2>p2; {p1=$1;p2=$2}' file
a 3
a 5
b 1
c 2
c 3
c 6
Print the line if $1 changed or $2 is greater than on the previous line.
Could you please try the following (when the comparison should NOT be restricted to lines with the same 1st field).
awk '$2>prev; {prev=$2}' Input_file
If you want to compare 2nd field values only among lines with the same 1st field, then try the following.
awk 'prev_1st!=$1 || prev==""; $2>prev && prev_1st==$1; {prev=$2;prev_1st=$1}' Input_file
Explanation: Adding explanation for above code.
awk ' ##Starting awk program here.
prev_1st!=$1 || prev=="" ##If prev_1st is NOT equal to $1 OR prev is NULL (first line of a group), then simply print the line.
$2>prev && prev_1st==$1 ##If $2 is greater than prev AND prev_1st equals $1, then print the line.
{
prev=$2 ##Creating variable prev and setting its value to $2.
prev_1st=$1 ##Creating variable prev_1st and setting its value to $1.
}
' Input_file ##Mentioning Input_file name here.
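For reference, the same-group comparison can be sketched with a single pattern. This is a minimal sketch, not a definitive version: keep_rising is a hypothetical function name, and it assumes the 2nd field is always numeric (the +0 forces a numeric comparison).

```shell
# keep_rising (hypothetical name): print a line when its 1st field differs
# from the previous line, or when its 2nd field is greater than the
# previous line's 2nd field.
keep_rising() {
  awk '
    $1 != group || $2 + 0 > last   # pattern without action: print the line
    { group = $1; last = $2 + 0 }  # remember this line for the next comparison
  '
}

printf '%s\n' 'a 3' 'a 5' 'a 4' 'b 1' 'c 2' 'c 3' 'c 6' | keep_rising
# a 3
# a 5
# b 1
# c 2
# c 3
# c 6
```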
For example given
1 4
2 5
3 6
I want to sum up the numbers in the second column and create a new column with it. The new column is 4, 9 (4+5), and 15 (4+5+6)
1 4 4
2 5 9
3 6 15
Could you please try following if you are ok with awk.
awk 'FNR==1{print $0,$2;prev=$2;next} {print $0,$2+prev;prev+=$2}' Input_file
OR
awk 'FNR==1{print $0,$2;prev=$2;next} {prev+=$2;print $0,prev}' Input_file
Explanation: Adding explanation for above code now.
awk ' ##Starting awk program here.
FNR==1{ ##Checking condition if line is first line then do following.
print $0,$2 ##Printing current line with 2nd field here.
prev=$2 ##Creating variable prev whose value is 2nd field of current line.
next ##next will skip all further statements from here.
} ##Closing block for FNR condition here.
{ ##Starting new block here.
prev+=$2 ##Adding $2 value to prev variable value here.
print $0,prev ##Printing current line and prev variable here.
}' Input_file ##mentioning Input_file name here.
PS: Welcome to SO. Please mention in your question the efforts you have made to solve the problem, as we are all here to learn.
this is more idiomatic
$ awk '{print $0, s+=$2}' file
1 4 4
2 5 9
3 6 15
This prints the current line and the value of s, which is incremented by the second field on every line; in other words, s is a rolling sum.
It can be golfed into the following if all values are positive (so the sum can never be 0, which would suppress the line), but it is perhaps too cryptic.
$ awk '$3=s+=$2' file
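To show why the golfed form needs positive values: there the assignment itself is the pattern, so a line whose running sum is 0 evaluates as false and is not printed. A small sketch that avoids this by moving the assignment into an action (running_sum is a hypothetical function name):

```shell
# running_sum (hypothetical name): append the cumulative sum of column 2
# as column 3. The assignment is an action, and the trailing 1 prints
# every line regardless of the value of the sum.
running_sum() {
  awk '{ $3 = s += $2 } 1'
}

printf '%s\n' '1 4' '2 -4' '3 6' | running_sum
# 1 4 4
# 2 -4 0
# 3 6 6
```

With the pattern-only form, the middle line of this input would be dropped because its sum is 0.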
Another awk..
$ cat john_ward.txt
1 4
2 5
3 6
$ awk ' {$(NF+1)=s+=$NF}1 ' john_ward.txt
1 4 4
2 5 9
3 6 15
$
In the awk below I am trying to move only the last line up, so that it comes before the line above it. The problem is that my input file varies in length (it is not always 4 lines as below), so I cannot hard-code i=3 every time, and I cannot seem to fix it. Thank you :).
file
this is line 1
this is line 2
this is line 3
this is line 4
desired output
this is line 1
this is line 2
this is line 4
this is line 3
awk (seems like the last line is being moved, but to i=2)
awk '
{lines[NR]=$0}
END{
print lines[1], lines[NR];
for (i=3; i<NR; i++) {print lines[i]}
}
' OFS=$'\n' file
this is line 1
this is line 2
this is line 4
this is line 3
$ seq 4 | awk 'NR>2{print p2} {p2=p1; p1=$0} END{print p1 ORS p2}'
1
2
4
3
$ seq 7 | awk 'NR>2{print p2} {p2=p1; p1=$0} END{print p1 ORS p2}'
1
2
3
4
5
7
6
Try the following awk once:
awk '{a[FNR]=$0} END{for(i=1;i<=FNR-2;i++){print a[i]};print a[FNR] ORS a[FNR-1]}' Input_file
Explanation: Each line is stored in the array a, indexed by FNR (the current line number). In the END block, a for loop prints lines 1 through FNR-2; it stops at FNR-2 because only the last two lines need to be swapped. Then a[FNR] (the last line) is printed, followed by ORS (a new line) and a[FNR-1].
Solution 2: Count the number of lines in Input_file and pass that count into an awk variable.
awk -v lines=$(wc -l < Input_file) 'FNR==(lines-1){val=$0;next} FNR==lines{print $0 ORS val;next} 1' Input_file
You nearly had it. You just have to change the order.
awk '
{lines[NR]=$0}
END{
for (i=1; i<NR-1; i++) {print lines[i]}
print lines[NR];
print lines[NR-1];
}
' OFS=$'\n' file
I'd reverse the file, swap the first two lines, then re-reverse the file
tac file | awk 'NR==1 {getline line2; print line2} 1' | tac
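A streaming variant that keeps only two lines in memory and also copes with very short inputs could look like the sketch below. swap_last_two is a hypothetical function name; the guards in END are an addition for inputs with fewer than two lines, which the question does not cover.

```shell
# swap_last_two (hypothetical name): print the input with its last two
# lines exchanged, holding only two lines in memory at any time.
swap_last_two() {
  awk '
    NR > 2 { print older }          # lines before the final two pass through
    { older = newer; newer = $0 }   # slide the two-line window
    END {
      if (NR >= 1) print newer      # the last line goes first
      if (NR >= 2) print older      # then the former second-to-last line
    }
  '
}

printf '%s\n' 'this is line 1' 'this is line 2' 'this is line 3' 'this is line 4' | swap_last_two
# this is line 1
# this is line 2
# this is line 4
# this is line 3
```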
Hi Everyone I have below data.
61684 376 23 106 38695633 1 0 0 -1 /C/Program Files (x86)/ 16704 root;TrustedInstaller#NT:SERVICE root;TrustedInstaller#NT:SERVICE 0 1407331175 1407331175 1247541608
8634 416 13 86 574126 1 0 0 -1 /E/KYCImages/ 16832 root;kycfinal#CGKYCAPP03 root;None#CGKYCAPP03 0 1406018846 1406018846 1352415392
60971 472 22 86 38613076 1 0 0 -1 /E/KYCwebsvc binaries/ 16832 root;kycfinal#CGKYCAPP03 root;None#CGKYCAPP03 0 1390829495 1390829495 1353370744
1 416 10 86 1 1 0 0 -1 /E/KycApp/ 16832 root;kycfinal#CGKYCAPP03 root;None#CGKYCAPP03 0 1411465772 1411465772 1351291187
Now I am using below code:
awk 'BEGIN{FPAT = "([^ ]+)|(\"[^\"]+\")"}{print $10}' filename | awk '$1!~/^\/\./' | sort -u | sed -e 's/\,//g' | perl -p00e 's/\n(?!\Z)/;/g'
I am getting this output
/C/Program;/E/KycApp/;/E/KYCImages/;/E/KycServices/;/E/KYCwebsvc
However I need to start the output from $10 till "/" is again encountered, basically I want to ignore any spaces from column 10 till "/" is encountered.
Is it possible?
Desired output is
/C/Program Files (x86)/;/E/KycApp/;/E/KYCImages/;/E/KycServices/;/E/KYCwebsvc binaries/
With single gawk:
awk 'BEGIN{ FPAT="/[^/]+/[^/]+/"; PROCINFO["sorted_in"]="@ind_str_asc"; IGNORECASE = 1 }
{ a[$1] }END{ for(i in a) r=(r!="")? r";"i : i; print r }' filename
The output (without /E/KycServices/, because it is not in your input):
/C/Program Files (x86)/;/E/KycApp/;/E/KYCImages/;/E/KYCwebsvc binaries/
try following too in single awk.
awk '{match($0,/\/.*\//);VAL=VAL?VAL ORS substr($0,RSTART,RLENGTH):substr($0,RSTART,RLENGTH)} END{num=split(VAL, array,"\n");for(i=1;i<=num;i++){printf("%s%s",array[i],i==num?"":";")};print""}' Input_file
EDIT1: The same solution in non-one-liner form.
awk '{
match($0,/\/.*\//);
VAL=VAL?VAL ORS substr($0,RSTART,RLENGTH):substr($0,RSTART,RLENGTH)
}
END{
num=split(VAL, array,"\n");
for(i=1;i<=num;i++){
printf("%s%s",array[i],i==num?"":";")
};
print""
}
' Input_file
EDIT2: Adding an explanation of the code in the non-one-liner form.
awk '{
match($0,/\/.*\//); ##match() finds the part of the line from the first / to the last /; note the slashes are escaped inside the regex.
VAL=VAL?VAL ORS substr($0,RSTART,RLENGTH):substr($0,RSTART,RLENGTH) ##Appending the matched text to VAL, separating multiple matches with ORS. RSTART and RLENGTH are built-in awk variables which match() sets to the start position and length of the match.
}
END{ ##Starting END block here.
num=split(VAL, array,"\n");##split() breaks VAL on new lines into the array named array and returns the number of elements, which is stored in num.
for(i=1;i<=num;i++){ ##Starting a for loop from i=1 till the value of num.
printf("%s%s",array[i],i==num?"":";") ##Printing array[i] followed by a semicolon, except for the last element (i equal to num), where nothing is appended.
};
print"" ##Printing an empty string to emit the final new line.
}
' Input_file ###Mentioning the Input_file here.
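The match() approach above keeps the input order and does not remove duplicates, while the pipeline in the question used sort -u. A possible sketch that adds both with standard tools follows. join_paths is a hypothetical function name, LC_ALL=C with sort -f is an assumption chosen to get a reproducible, case-insensitive order, and the greedy regex assumes no other / appears later on the line.

```shell
# join_paths (hypothetical name): extract the /.../ part of each line,
# drop duplicates, sort case-insensitively, and join with semicolons.
# Reads the named files, or stdin when no file is given.
join_paths() {
  awk 'match($0, /\/.*\//) { print substr($0, RSTART, RLENGTH) }' "$@" |
    LC_ALL=C sort -fu |
    paste -s -d ';' -
}

printf '%s\n' \
  '1 416 /E/KycApp/ 16832 root' \
  '61684 376 /C/Program Files (x86)/ 16704 root' \
  '8634 416 /E/KYCImages/ 16832 root' \
  '1 416 /E/KycApp/ 16832 root' | join_paths
# /C/Program Files (x86)/;/E/KycApp/;/E/KYCImages/
```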