I have a file (72 columns in total) and would like to sum every other column, starting from column 4.
infile
20170101 1 1 1.5 2 2 3 3
20170101 2 1 2 2 4 3 4
20170101 3 1 5 2 3 3 6
The output should be
20170101 1 6.5
20170101 2 10
20170101 3 14
This is what I have, but it does not work:
awk '{for(i=4;i<=NF;i+=2) sum[i]+=$i; print}' infile
Thank you for help.
The following simple awk could help you with this.
awk '{for(i=4;i<=NF;i+=2){sum+=$i};print $1,$2,sum;sum=0}' Input_file
Adding a multi-line form of the solution as well.
awk '
{
for(i=4;i<=NF;i+=2){ sum+=$i };
print $1,$2,sum;
sum=0
}
' Input_file
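For context on why the attempt in the question did not work: a bare print emits the whole input line ($0), and sum[i] accumulates per column index across all rows instead of per row. The following is a minimal sketch of the same per-row idea, with the reset moved to the top of the action. The temporary file created with mktemp is only there to make the example self-contained.

```shell
# Build the three sample rows from the question in a temporary file.
infile=$(mktemp)
printf '%s\n' \
  '20170101 1 1 1.5 2 2 3 3' \
  '20170101 2 1 2 2 4 3 4' \
  '20170101 3 1 5 2 3 3 6' > "$infile"

# Reset sum for every row, then add columns 4, 6, 8, ... of that row.
result=$(awk '{
    sum = 0
    for (i = 4; i <= NF; i += 2) sum += $i
    print $1, $2, sum
}' "$infile")

printf '%s\n' "$result"
rm -f "$infile"
```

Resetting at the top rather than after the print makes no difference to the result; it only keeps the initialisation next to the loop that uses it.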
I have a very large table of values that is formatted like this:
apple 1 1
apple 2 1
apple 3 1
apple 4 1
banana 25 4
banana 35 10
banana 36 10
banana 37 10
Column 1 has many different fruit, with varying numbers of rows for each fruit.
I would like to calculate the cumulative sum of column 3 for each type of fruit in column 1, and the cumulative percentage of the total at each row, and add these as new columns. So the desired output would be this:
apple 1 1 1 25.00
apple 2 1 2 50.00
apple 3 1 3 75.00
apple 4 1 4 100.00
banana 25 4 4 11.76
banana 35 10 14 41.18
banana 36 10 24 70.59
banana 37 10 34 100.00
I can get part way there with awk, but I am struggling with how to get the cumulative sum to reset at each new fruit. Here is my horrendous awk attempt for your viewing pleasure:
#!/bin/bash
awk '{cumsum += $3; $3 = cumsum} 1' fruitfile > cumsum.tmp
total=$(awk '{total=total+$3}END{print total}' fruitfile)
awk -v total=$total '{ printf ("%s\t%s\t%s\t%.5f\n", $1, $2, $3, ($3/total)*100)}' cumsum.tmp > cumsum.txt
rm cumsum.tmp
Could you please try the following, written and tested with the shown samples.
awk '
FNR==NR{
a[$1]+=$NF
next
}
{
sum[$1]+=($NF/a[$1])*100
print $0,++b[$1],sum[$1]
}
' Input_file Input_file |
column -t
Output for shown samples will be as follows.
apple 1 1 1 25
apple 2 1 2 50
apple 3 1 3 75
apple 4 1 4 100
banana 25 4 1 11.7647
banana 35 10 2 41.1765
banana 36 10 3 70.5882
banana 37 10 4 100
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when first time Input_file is being read.
a[$1]+=$NF ##Creating array a with index $1 and keep adding its last field value to it.
next ##next will skip all further statements from here.
}
{
sum[$1]+=($NF/a[$1])*100 ##Adding this row's share of the fruit's total, (last field/a[$1])*100, to the running percentage sum[$1].
print $0,++b[$1],sum[$1] ##Printing current line, a per-fruit row counter b[$1] (increased by 1 each time) and the running percentage sum[$1].
}
' Input_file Input_file | ##Mentioning Input_file name here.
column -t ##Sending awk output to column command for better look.
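Note that the fourth column printed above is a per-fruit row counter, which matches the cumulative sum only for the apple rows. A minimal sketch that follows the desired output from the question more closely (running sum of column 3 plus a percentage with two decimals) could look like this. The array names total and cum and the temporary file are assumptions made for illustration, and the sketch assumes each fruit's total is non-zero.

```shell
# Sample data from the question, written to a temporary file.
fruitfile=$(mktemp)
printf '%s\n' \
  'apple 1 1' 'apple 2 1' 'apple 3 1' 'apple 4 1' \
  'banana 25 4' 'banana 35 10' 'banana 36 10' 'banana 37 10' > "$fruitfile"

# Pass 1 (NR == FNR): total of column 3 per fruit.
# Pass 2: running sum per fruit and its percentage of that fruit's total.
result=$(awk '
NR == FNR { total[$1] += $3; next }
{
    cum[$1] += $3
    printf "%s %s %s %s %.2f\n", $1, $2, $3, cum[$1], 100 * cum[$1] / total[$1]
}' "$fruitfile" "$fruitfile")

printf '%s\n' "$result"
rm -f "$fruitfile"
```

Because the running sum is keyed by the fruit name, it restarts for each new fruit without any explicit reset logic.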
Using awk I would like to insert a row whenever the value in the second column changes.
I have:
1 3
2 3
3 1
4 1
5 2
I would like to get:
1 3
2 3
>
3 1
4 1
>
5 2
Could anyone point me in the right direction how this can be achieved in one file?
You can use this awk:
awk 'NR==1{prev=$2; print; next} prev!=$2{print ">"} {prev=$2}1' file
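The one-liner remembers the previous value of column 2 and prints a ">" line before any row whose value differs. As a sketch of the same idea with the first-line special case folded into the condition (the temporary file is only for illustration):

```shell
# Sample data from the question.
file=$(mktemp)
printf '%s\n' '1 3' '2 3' '3 1' '4 1' '5 2' > "$file"

# From the second line on, print ">" when column 2 differs from the
# previous line; then remember the value and print the line (pattern 1).
result=$(awk 'NR > 1 && $2 != prev { print ">" } { prev = $2 } 1' "$file")

printf '%s\n' "$result"
rm -f "$file"
```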
I'm trying to compare 2 tables and retrieve the matches based on two columns:
file 1
0.736 5 100 T
0.723 1 15 T
0.792 6 100 T
0.634 3 100 T
0.754 7 100 T
0.708 2 100 T
0.722 9 100 T
0.542 1 6 T
File 2
0.736 5
0.634 3
0.542 1
output
0.736 5 100 T
0.634 3 100 T
0.542 1 6 T
When I try this code, it tells me that awk is not found, which doesn't make sense because I use awk regularly. Could you help me spot the error here, please?
awk 'FNR==NR{a[$1,$2]=$0;next}{if(b=a[$1,$2]){print b}}' file1 file2> output
You could use grep:
grep -f file2 file1
or awk (note that this matches on the first column only):
awk 'NR==FNR{A[$1];next}$1 in A' file2 file1
Hope this helps :)
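Since the question asks for a match on two columns, a minimal sketch that uses both fields as a compound array key might look like the following. The array name want and the temporary files are assumptions for illustration. Fields are compared as text, so 0.5 and 0.50 would not be treated as equal.

```shell
# Sample tables from the question.
file1=$(mktemp)
file2=$(mktemp)
printf '%s\n' \
  '0.736 5 100 T' '0.723 1 15 T' '0.792 6 100 T' '0.634 3 100 T' \
  '0.754 7 100 T' '0.708 2 100 T' '0.722 9 100 T' '0.542 1 6 T' > "$file1"
printf '%s\n' '0.736 5' '0.634 3' '0.542 1' > "$file2"

# Read file2 first and record each (column 1, column 2) pair as a key,
# then print the lines of file1 whose first two columns form a known pair.
result=$(awk 'NR == FNR { want[$1, $2]; next } ($1, $2) in want' "$file2" "$file1")

printf '%s\n' "$result"
rm -f "$file1" "$file2"
```

Unlike grep -f, this compares whole fields rather than substrings, and the output keeps the order of file1.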
I have a file that looks like the following
1
1
1
1
1
1
12
2
2
2
2
2
2
2
3
4
What I want to do is convert this column into multiple rows. Each new row should start after 5 entries, so that the output will look like this:
1 1 1 1 1
1 12 2 2 2
2 2 2 2 3
4
I tried to achieve that by using
awk '{printf "%s" (NR%5==0?"RS:FS),$1}' file
but I get the following error
awk: line 1: runaway string constant "RS:FS),$1} ...
Any idea on how to achieve the desired output?
Maybe this awk one liner can help.
awk '{if (NR%5==0){a=a $0" ";print a; a=""} else a=a $0" "}END{if (a!="") print a}' file
Output:
1 1 1 1 1
1 12 2 2 2
2 2 2 2 3
4
Longer awk:
{
if (NR%5==0)
{
a=a $0" ";
print a;
a="";
}
else
{
a=a $0" ";
}
}
END {
if (a != "") print a
}
Slightly different approach, still using awk:
$ awk '{if (NR%5) {ORS=""} else {ORS="\n"}{print " "$0}}' input.txt
1 1 1 1 1
1 12 2 2 2
2 2 2 2 3
4
Using perl:
$ perl -p -e 's/\n/ / if $.%5' input.txt
1 1 1 1 1
1 12 2 2 2
2 2 2 2 3
4
No need to complicate...
$ pr -5ats' ' <file
1 1 1 1 1
1 12 2 2 2
2 2 2 2 3
4
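For reference, the error in the question's own attempt comes from the unbalanced double quote in "RS:FS, which opens a string constant that is never closed. A minimal sketch of the printf idea with the quoting repaired could look like this. It prints the separator before each value instead of after it, to avoid a trailing space; the temporary file is only for illustration.

```shell
# Sample column from the question, one value per line.
file=$(mktemp)
printf '%s\n' 1 1 1 1 1 1 12 2 2 2 2 2 2 2 3 4 > "$file"

# Print each value preceded by a space, except at the start of a row;
# end the row after every 5th value, and close a partial last row in END.
result=$(awk '
{ printf "%s%s", (NR % 5 == 1 ? "" : OFS), $1 }
NR % 5 == 0 { print "" }
END { if (NR % 5) print "" }
' "$file")

printf '%s\n' "$result"
rm -f "$file"
```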
I have these two files
File1:
9 8 6 8 5 2
2 1 7 0 6 1
3 2 3 4 4 6
File2: (which has over 4 million lines)
MN 1 0
JK 2 0
AL 3 90
CA 4 83
MK 5 54
HI 6 490
I want to compare field 6 of file1 with field 2 of file2. If they match, then append field 3 of file2 to the end of the matching line of file1.
I've looked at other solutions but I can't get it to work correctly.
Desired output:
9 8 6 8 5 2 0
2 1 7 0 6 1 0
3 2 3 4 4 6 490
My attempt:
awk 'NR==FNR{a[$2]=$2;next}a[$6]{print $0,a[$6]}' file2 file1
The program just hangs after that.
To print all lines in file1 with match if available:
$ awk 'FNR==NR{a[$2]=$3;next;} {print $0,a[$6];}' file2 file1
9 8 6 8 5 2 0
2 1 7 0 6 1 0
3 2 3 4 4 6 490
To print only the lines that have a match:
$ awk 'NR==FNR{a[$2]=$3;next} $6 in a {print $0,a[$6]}' file2 file1
9 8 6 8 5 2 0
2 1 7 0 6 1 0
3 2 3 4 4 6 490
Note that I replaced a[$2]=$2 with a[$2]=$3 and changed the test a[$6] (which is false if the value is zero) to $6 in a.
Your own attempt basically has two bugs, as seen in @John1024's answer:
1. You use field 2 as both key and value in a, where you should be storing field 3 as the value (since you want to keep it for later), i.e., it should be a[$2] = $3.
2. The test a[$6] is false when the value in a is zero, even if the key exists. The correct test is $6 in a.
Hence:
awk 'NR==FNR { a[$2]=$3; next } $6 in a {print $0, a[$6] }' file2 file1
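The difference between the two tests can be seen in isolation with a small sketch. The key "k" and the printed labels are made up for illustration.

```shell
# a["k"] exists but holds 0: the value test fails, the membership test succeeds.
result=$(awk 'BEGIN {
    a["k"] = 0
    print (a["k"] ? "value test: true" : "value test: false")
    print (("k" in a) ? "membership test: true" : "membership test: false")
}')

printf '%s\n' "$result"
```

A further difference is that merely referencing a[key] creates an empty entry for that key, whereas key in a leaves the array untouched.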
However, there might be better approaches, but it is not clear from your specifications. For instance, you say that file2 has over 4 million lines, but it is unknown if there are also that many unique values for field 2. If yes, then a will also have that many entries in memory. And, you don't specify how long file1 is, or if its order must be preserved for output, or if every line (even without matches in file2) should be output.
If it is the case that file1 has many fewer lines than file2 has unique values for field 2, and only matching lines need to be output, and order does not need to be preserved, then you might wish to read file1 first…