Using awk I would like to insert a row whenever the value in the second column changes.
I have:
1 3
2 3
3 1
4 1
5 2
I would like to get:
1 3
2 3
>
3 1
4 1
>
5 2
Could anyone point me in the right direction how this can be achieved in one file?
You can use this awk:
awk 'NR==1{prev=$2; print; next} prev!=$2{print ">"} {prev=$2}1' file
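A slightly shorter variant, offered as a sketch rather than a replacement: it folds the first-line special case into the condition, so no separate NR==1 rule is needed. The wrapper name mark_changes is hypothetical and only there to make the snippet easy to call.

```shell
# Print ">" before any line whose second column differs from the previous one.
# mark_changes is a hypothetical helper name used for illustration only.
mark_changes() {
  awk 'NR > 1 && $2 != prev { print ">" }   # column 2 changed: emit a separator
       { prev = $2 }                         # remember the current value
       1                                     # print every input line
      ' "$@"
}

printf '1 3\n2 3\n3 1\n4 1\n5 2\n' | mark_changes
```

With no file argument the function reads standard input; with one, it reads that file, as in mark_changes file.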
I have a file (72 columns in total) and would like to sum every other column, starting from column 4.
infile
20170101 1 1 1.5 2 2 3 3
20170101 2 1 2 2 4 3 4
20170101 3 1 5 2 3 3 6
The output should be
20170101 1 6.5
20170101 2 10
20170101 3 14
This is what I have, but it does not work:
awk '{for(i=4;i<=NF;i+=2) sum[i]+=$i; print}' infile
Thank you for your help.
The following simple awk command should do this:
awk '{for(i=4;i<=NF;i+=2){sum+=$i};print $1,$2,sum;sum=0}' Input_file
Here is the same solution in multi-line form:
awk '
{
for(i=4;i<=NF;i+=2){ sum+=$i };
print $1,$2,sum;
sum=0
}
' Input_file
I have a file that looks like the following
1
1
1
1
1
1
12
2
2
2
2
2
2
2
3
4
What I want to do is convert this column into multiple rows. A new row should start after every 5 entries, so that the output looks like this:
1 1 1 1 1
1 12 2 2 2
2 2 2 2 3
4
I tried to achieve that by using
awk '{printf "%s" (NR%5==0?"RS:FS),$1}' file
but I get the following error
awk: line 1: runaway string constant "RS:FS),$1} ...
Any idea on how to achieve the desired output?
Maybe this awk one liner can help.
awk '{if (NR%5==0){a=a $0" ";print a; a=""} else a=a $0" "}END{if (a!="") print a}' file
Output:
1 1 1 1 1
1 12 2 2 2
2 2 2 2 3
4
Longer awk:
{
    if (NR%5==0)
    {
        a=a $0" ";
        print a;
        a="";
    }
    else
    {
        a=a $0" ";
    }
}
END {
    # print the leftover partial row, if any
    if (a != "") print a
}
Slightly different approach, still using awk:
$ awk '{if (NR%5) {ORS=""} else {ORS="\n"}{print " "$0}}' input.txt
1 1 1 1 1
1 12 2 2 2
2 2 2 2 3
4
Using perl:
$ perl -p -e 's/\n/ / if $.%5' input.txt
1 1 1 1 1
1 12 2 2 2
2 2 2 2 3
4
No need to complicate...
$ pr -5ats' ' <file
1 1 1 1 1
1 12 2 2 2
2 2 2 2 3
4
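The printf idea from the question can also be made to work. The attempt there fails because the quote opened before RS is never closed, and because the format string has only one %s for two arguments. Below is a minimal sketch of a repaired version, assuming a single-column input; it writes the separator before each value instead of after it, so the last row carries no trailing space. The wrapper name five_per_row is hypothetical.

```shell
# Reflow a one-column file into rows of five values.
# five_per_row is a hypothetical helper name; 5 is the row width from the question.
five_per_row() {
  awk '{
         sep = (NR % 5 == 1) ? ORS : OFS             # a new row starts at lines 1, 6, 11, ...
         printf "%s%s", (NR == 1 ? "" : sep), $1     # no separator before the first value
       }
       END { if (NR) print "" }                      # terminate the last row
      ' "$@"
}

printf '%s\n' 1 1 1 1 1 1 12 2 2 2 2 2 2 2 3 4 | five_per_row
```

ORS and OFS are awk's built-in output record and field separators (newline and space by default), so they are used as variables here, not quoted as strings.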
I have these two files
File1:
9 8 6 8 5 2
2 1 7 0 6 1
3 2 3 4 4 6
File2: (which has over 4 million lines)
MN 1 0
JK 2 0
AL 3 90
CA 4 83
MK 5 54
HI 6 490
I want to compare field 6 of file1 with field 2 of file2. If they match, append field 3 of file2 to the end of the matching line of file1.
I've looked at other solutions but I can't get it to work correctly.
Desired output:
9 8 6 8 5 2 0
2 1 7 0 6 1 0
3 2 3 4 4 6 490
My attempt:
awk 'NR==FNR{a[$2]=$2;next}a[$6]{print $0,a[$6]}' file2 file1
The program just hangs after that.
To print all lines in file1 with match if available:
$ awk 'FNR==NR{a[$2]=$3;next;} {print $0,a[$6];}' file2 file1
9 8 6 8 5 2 0
2 1 7 0 6 1 0
3 2 3 4 4 6 490
To print only the lines that have a match:
$ awk 'NR==FNR{a[$2]=$3;next} $6 in a {print $0,a[$6]}' file2 file1
9 8 6 8 5 2 0
2 1 7 0 6 1 0
3 2 3 4 4 6 490
Note that I replaced a[$2]=$2 with a[$2]=$3 and changed the test a[$6] (which is false if the value is zero) to $6 in a.
Your own attempt basically has two bugs, as seen in @John1024's answer:
You use field 2 as both key and value in a, where you should be storing field 3 as the value (since you want to keep it for later), i.e., it should be a[$2] = $3.
The test a[$6] is false when the value in a is zero, even if it exists. The correct test is $6 in a.
Hence:
awk 'NR==FNR { a[$2]=$3; next } $6 in a {print $0, a[$6] }' file2 file1
There might be better approaches, but that depends on details your question leaves open. For instance, you say that file2 has over 4 million lines, but not whether field 2 also has that many unique values; if it does, a will hold that many entries in memory. You also don't specify how long file1 is, whether its order must be preserved in the output, or whether every line (even one without a match in file2) should be output.
If it is the case that file1 has many fewer lines than file2 has unique values for field 2, and only matching lines need to be output, and order does not need to be preserved, then you might wish to read file1 first…
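As a hedged illustration of that idea (this is one possible reading of it, not code from the answer above): store the lines of the small file keyed by field 6, then stream the large file once and print a stored line whenever field 2 matches. The helper name join_small_first is hypothetical, and the sketch assumes file1 is non-empty and fits comfortably in memory.

```shell
# Keep only the small file in memory, then stream the large file.
# join_small_first is a hypothetical helper; arguments: small file, large file.
join_small_first() {
  awk 'NR == FNR {                       # first file: store each line under its field 6
         if ($6 in lines) lines[$6] = lines[$6] "\n" $0
         else lines[$6] = $0
         next
       }
       $2 in lines {                     # second file: field 2 matches a stored key
         n = split(lines[$2], row, "\n")
         for (i = 1; i <= n; i++) print row[i], $3
       }' "$1" "$2"
}

tmp=$(mktemp -d)
printf '9 8 6 8 5 2\n2 1 7 0 6 1\n3 2 3 4 4 6\n' > "$tmp/file1"
printf 'MN 1 0\nJK 2 0\nAL 3 90\nCA 4 83\nMK 5 54\nHI 6 490\n' > "$tmp/file2"
join_small_first "$tmp/file1" "$tmp/file2"
rm -r "$tmp"
```

The output follows the order of file2, not file1, so on the sample data the line with key 1 comes first. If file2 repeats a key, the matching file1 lines are printed once per occurrence.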
I have a file in which the 4th column has numbers.
If the 4th column is greater than 2, I want to add a 5th column containing the string gain; otherwise, the 5th column should contain the string loss.
Input
1 762097 6706109 6
1 7202143 7792617 3
1 8922949 9815420 1
1 10502346 11074110 3
1 11188922 12267136 1
1 12566829 13910626 3
Desired output:
1 762097 6706109 6 gain
1 7202143 7792617 3 gain
1 8922949 9815420 1 loss
1 10502346 11074110 3 gain
1 11188922 12267136 1 loss
1 12566829 13910626 3 gain
How should I do this with awk?
Use awk like this:
$ awk '{print $0, ($4>2?"gain":"loss")}' file
1 762097 6706109 6 gain
1 7202143 7792617 3 gain
1 8922949 9815420 1 loss
1 10502346 11074110 3 gain
1 11188922 12267136 1 loss
1 12566829 13910626 3 gain
As you see, it is printing the full line ($0) followed by a string. This string is determined by the value of $4 using a ternary operator.
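For readers less used to the ternary operator, the same decision can be spelled out with if/else. This is only an equivalent sketch; the wrapper name label_gain_loss and the variable label are illustrative, and the strings gain and loss are the ones requested in the question.

```shell
# Append "gain" when column 4 is greater than 2, otherwise "loss".
# label_gain_loss is a hypothetical helper name used for illustration only.
label_gain_loss() {
  awk '{
         if ($4 > 2) label = "gain"    # same test as the ternary form
         else        label = "loss"
         print $0, label               # whole line, then the label
       }' "$@"
}

printf '1 762097 6706109 6\n1 8922949 9815420 1\n' | label_gain_loss
```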
I have a one-column file containing only integers, such as:
1
1
4
3
3
2
I want to count how many times each number appears in the file. The output file should be:
1 2
2 1
3 2
4 1
Thanks
Try this line:
awk '{a[$0]++}END{for(x in a)print x,a[x]}' file
awk '{tot[$0]++} END{for (n in tot) {print n,tot[n]}} ' numbers
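One caveat: awk does not specify the order in which for (key in array) visits the keys, so the counts may come out in any order. A minimal sketch that gets the ascending order shown in the question is to pipe the result through sort -n. The wrapper name count_values is hypothetical.

```shell
# Count how often each distinct line occurs, then sort numerically by value.
# count_values is a hypothetical helper name used for illustration only.
count_values() {
  awk '{ count[$0]++ }                                # tally each distinct line
       END { for (v in count) print v, count[v] }    # value, then its count
      ' "$@" |
    sort -n                                           # for-in order is unspecified, so sort here
}

printf '%s\n' 1 1 4 3 3 2 | count_values
```

Sorting outside awk keeps the one-liner portable; gawk users could sort inside awk instead, but that relies on gawk-specific features.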