I have two files; each file has two columns. I need to match each value in column two of file1 against the values in column two of file2. If they are equal, I need to subtract the matched column-one values from each other; after that, I need to combine column one from the two files into one file, with the two columns adjacent to each other.
If the two values do not match, do nothing.
file1
344 0
465 1
729 2
777 3
676 4
862 5
file2
766 0
937 1
980 2
237 3
736 5
example
422
208
251
I don't understand how you come up with your given output. This might help you:
$ join -j 2 file1 file2
0 344 766
1 465 937
2 729 980
3 777 237
5 862 736
But to say where you'd go from there, you need to provide more details.
To expand a bit on @glenn jackman's answer... perhaps this:
join -j 2 file1 file2 | awk '{if ($3>$2) print $3-$2; else print $2-$3}'
That will print the absolute difference between the two column-one values whenever there is a match in the column-two values. But it doesn't match your expected output, which, aside from the first value, has no obvious relation to your inputs in any way that matches your question...
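For completeness, a single-pass awk sketch of the same idea (my rewrite, not taken from the question): it stores file1's column one keyed by column two, then prints the absolute difference for each line of file2 whose column-two value appeared in file1, skipping unmatched values as the question asks:
awk 'NR==FNR { a[$2]=$1; next }
     $2 in a { d = $1 - a[$2]; print (d < 0 ? -d : d) }' file1 file2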
Maybe this can help:
awk '
BEGIN { printf "%s\t%s\t%s\n","File1","File2","Difference(f2-f1)" }
# first pass (file1): index column one by column two
NR==FNR { a[$2]=$1; next }
# second pass (file2): print only when the column-two value was seen in file1
$2 in a { printf "%d\t%d\t%d\n",a[$2],$1,$1-a[$2] }' file1 file2
Output:
File1 File2 Difference(f2-f1)
344 766 422
465 937 472
729 980 251
777 237 -540
862 736 -126
Related
Hi, I am trying to remove "duplicate" lines from a file, but I want to maintain the file order, and I don't want to match the entire line, just the second column.
Example
23 google.com 345 432 3
543 google.com d9 0ds aa
8 amazon.com 820 2 2
45 google.com 80a s0d e
32 yahoo.com wqq 33 234
Would become
23 google.com 345 432 3
8 amazon.com 820 2 2
32 yahoo.com wqq 33 234
I know how to sort -u -o file, but that matches the entire line and it reorders the file. I saw this awk '!seen[$0]++' file, which avoids the sorting, but it still matches the entire line.
Anyone know if this can be done?
Thanks,
Chris
The awk solution can be modified to match only the second column:
awk '!seen[$2]++' file
Try just matching the 2nd column:
awk '!seen[$2]++'
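Both one-liners rely on the same idiom: seen[$2]++ evaluates to 0 (false) the first time a given second-column value appears, so !seen[$2]++ is true only on the first occurrence, and awk's default action prints the line. A spelled-out equivalent, purely as illustration:
awk '{
    if (!seen[$2])   # first time this column-2 value appears?
        print        # print the whole line; input order is preserved
    seen[$2] = 1     # remember the value for later lines
}' file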
I want to perform two different sort-and-count operations on a file, based on each line's content.
I need to take the first column of a .tsv file.
For lines that start with three digits, I would like to group by the first three digits and keep only them; for everything else, just sort and count the occurrences of the whole first-column value.
Sample data:
687/878 9
890987 4
01a 55
1b 8743917
890a 34
abcdee 987
dfeqfe fkdjald
890897 34213
6878853 834
32fasd 53891
abcdee 8794371
abd 873
result:
687 2
890 3
01a 1
1b 1
32fasd 1
abd 1
dfeqfe 1
abcdee 2
I would also appreciate a solution that takes into account a sample input like
687/878 9
890987 4
01a 55
1b 8743917
890a 34
abcdee 987
dfeqfe 545
890897 34213
6878853 834
(632)fasd 53891
(88)abcdee 8794371
abd 873
so the first column may have values containing all kinds of characters, like (, ), #, or '.
The output will have two columns: the first with the values extracted from the source file, and the second with the new count.
Again, the preferred output format is tsv.
So I need to extract all values that start with ^\d\d\d and, for those, sort and count unique values by their first three digits;
but in a second pass, also do the same for each line that does not start with three digits, this time keeping the whole column value and sorting and counting by it.
What I have tried:
| sort | uniq -c | sort -nr for the lines that do start with ^\d\d\d, and the same for those that do not fulfill the above regex. But is there a more elegant way using either sed or awk?
$ cat tst.awk
BEGIN { FS=OFS="\t" }
# key = first three digits if the line starts with three digits, else the whole first field
{ cnt[/^[0-9]{3}/ ? substr($1,1,3) : $1]++ }
END {
    for (key in cnt) {
        # prepend two sort keys (digit flag, count); cut -f3- strips them afterwards
        print (key !~ /^[0-9]{3}/), cnt[key], key, cnt[key]
    }
}
$ awk -f tst.awk file | sort -k1,2n | cut -f3-
687 1
890 2
abcdee 1
You can try Perl
$ cat nefijaka.txt
687 878 9
890987 4
890a 34
abcdee 987
$ perl -lne ' /^(\d{3})|(\S+)/; $x=$1?$1:$2; $kv{$x}++; END { print "$_\t$kv{$_}" for (sort keys %kv) } ' nefijaka.txt
687 1
890 2
abcdee 1
$
You can pipe it to sort and get the values sorted:
$ perl -lne ' /^(\d{3})|(\S+)/; $x=$1?$1:$2; $kv{$x}++; END { print "$_\t$kv{$_}" for (sort keys %kv) } ' nefijaka.txt | sort -k2 -nr
890 2
abcdee 1
687 1
EDIT1:
$ cat nefijaka.txt2
687 878 9
890987 4
890a 34
abcdee 987
a word and then 23
$ perl -lne ' /^(\d{3})|(.+?\t)/; $x=$1?$1:$2; $x=~s/\t//g; $kv{$x}++; END { print "$_\t$kv{$_}" for (sort keys %kv) } ' nefijaka.txt2
687 1
890 2
a word and then 1
abcdee 1
$
I want to sum the corresponding lines from each block of N lines, for example every 4 lines:
cat file
1
11
111
1111
2
22
222
2222
3
33
333
3333
The output should be:
6 #(1+2+3)
66 #(11+22+33)
666 #(111+222+333)
6666 #(1111+2222+3333)
How can I do this with awk?
Basically you can use the following awk command:
awk -vN=4 '{s[NR%N]+=$0}END{for(i=0;i<N;i++){print s[i]}}' input.txt
You can choose N as you wish.
Output:
6666
6
66
666
But as you can see, the output comes out rotated: NR%N maps the first line of each block to index 1 and the N-th line to index 0, so s[0], printed first, holds the sum of each block's last lines. You can fix this by shifting the line number by one:
awk -vN=4 '{s[(NR-1)%N]+=$0}END{for(i=0;i<N;i++){print s[i]}}' input.txt
Output:
6
66
666
6666
My question: I have one file
344 0
465 1
729 2
777 3
676 4
862 5
766 0
937 1
980 2
837 3
936 5
I need to compare each pair with the same column-two value (zero with zero, one with one, and so on). If a column-two value exists twice, subtract the matching column-one values (766-344, 937-465, and so on); if it exists only once, like the fourth value (4 exists one time), do nothing. The output:
422
472
251
060
074
I also need to add an index, for example:
1 422
2 472
3 251
4 060
5 074
Finally, I need to add this code as part of a Tcl script, or as a function of a Tcl program. I have a Tcl script containing awk functions like this:
set awkCBR0 {
    {
        if ($1 == "r" && $6 == 280) {
            print $2, i >> "cbr0.q";
            i += 1;
        }
    }
}
exec rm -f cbr0.q
exec touch cbr0.q
exec awk $awkCBR0 cbr.trq
thanks
Try this:
awk 'a[$2]{printf "%d %03d\n",++x,$1-a[$2];next}{a[$2]=$1}' file
Output
$ awk 'a[$2]{printf "%d %03d\n",++x,$1-a[$2];next}{a[$2]=$1}' file
1 422
2 472
3 251
4 060
5 074
I will leave it to you to add it to your Tcl function.
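That said, here is a minimal sketch of the Tcl wiring, following the awkCBR0 pattern from your script (the input filename file and the output filename diff.out are placeholders, not from your original script):
set awkDiff {
    a[$2] { printf "%d %03d\n", ++x, $1 - a[$2]; next }
    { a[$2] = $1 }
}
exec rm -f diff.out
exec awk $awkDiff file > diff.out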
I have file1 as the result of a first operation; it has the following structure:
201 12 0.298231 8.8942
206 13 -0.079795 0.6367
101 34 0.86348 0.7456
301 15 0.215355 4.6378
303 16 0.244734 5.9895
File2 is the result of a different operation and has the same type of structure. A sample of file2:
204 60 -0.246038 6.0535
304 83 -0.246209 6.0619
101 34 -0.456629 6.0826
211 36 -0.247003 6.1011
305 83 -0.247134 6.1075
206 46 -0.247485 6.1249
210 39 -0.248066 6.1537
107 41 -0.248201 6.1603
102 20 -0.248542 6.1773
I would like to select the field 1 and field 2 values that have a field 3 value higher than a threshold (0.8) in file1, and then, for those selected field 1 and field 2 pairs, keep the ones whose field 3 value in file2 is higher than another threshold in absolute value (abs(x) >= 0.4).
Note that although file1 and file2 have the same structure, their field 1 and field 2 values are not the same (not the same number of lines, etc.).
Can you do this with awk?
desired output
101 34
If you combine awk with Unix commands, you can do the following:
sort file1.txt > sorted1.txt
sort file2.txt > sorted2.txt
Sorting will allow you to use join on the first field (which I assume is unique). In the joined output, field 3 of file1 is $3 and field 3 of file2 is $6. Using awk you can write the following:
join sorted1.txt sorted2.txt | awk 'function abs(value){return (value<0?-value:value)} $3>=0.8 && abs($6)>=0.4 {print $1"\t"$2}'
In essence, in the awk you first define a function to deal with absolute values, then you print fields 1 and 2 only for lines that meet the criteria on $3 and $6 (formerly field 3 of file1 and file2, respectively). Note that the pattern must come before the {print} action so that only matching lines are printed.
Hope this helps...
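Alternatively, a two-file awk sketch that avoids the sort/join step entirely (assuming file1 fits in memory; note that when reading file2 directly, its third field is $3, not $6):
awk 'NR==FNR { if ($3 >= 0.8) keep[$1,$2]; next }
     (($1,$2) in keep) && ($3 >= 0.4 || $3 <= -0.4) { print $1, $2 }' file1 file2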