Compare two 4-column files on columns 1 and 2, and output the line from the first file for a unique combination or the line from the second file for a duplicate combination - awk

I have two tab-separated value files, say:
File1.txt
chr1 894573 rs13303010 GG
chr2 18674 rs10195681 **CC**
chr3 104972 rs990284 AA <--- Unique Line
chr4 111487 rs17802159 AA
chr5 200868 rs4956994 **GG**
chr5 303686 rs6896163 AA <--- Unique Line
chrX 331033 rs4606239 TT
chrY 2893277 i4000106 **GG**
chrY 2897433 rs9786543 GG
chrM 57 i3002191 **TT**
File2.txt
chr1 894573 rs13303010 GG
chr2 18674 rs10195681 AT
chr4 111487 rs17802159 AA
chr5 200868 rs4956994 CC
chrX 331033 rs4606239 TT
chrY 2893277 i4000106 GA
chrY 2897433 rs9786543 GG
chrM 57 i3002191 TA
Desired Output:
Output.txt
chr1 894573 rs13303010 GG
chr2 18674 rs10195681 AT
chr3 104972 rs990284 AA <--Unique Line from File1.txt
chr4 111487 rs17802159 AA
chr5 200868 rs4956994 CC
chr5 303686 rs6896163 AA <--Unique Line from File1.txt
chrX 331033 rs4606239 TT
chrY 2893277 i4000106 GA
chrY 2897433 rs9786543 GG
chrM 57 i3002191 TA
File1.txt has 10 entries in total, while File2.txt has 8.
I want to compare both files using Column 1 and Column 2.
If the first two column values are the same in both files, the corresponding line from File2.txt should be printed to Output.txt.
When File1.txt has a unique combination (Column1:Column2 not present in File2.txt), the corresponding line from File1.txt should be printed to Output.txt.
I tried various awk and perl combinations found online, but couldn't get the correct answer.
Any suggestions would be helpful.
Thanks,
Amit

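Next time, show your awk attempt so we can help with errors or missing pieces. Note that $1","$2 in k tests a comma-joined string, while k[$1,$2] stores keys joined by SUBSEP, so that test never matches; use ($1,$2) in k instead. Also, for (K in k) visits keys in an unspecified order, so record File1's order while reading it:
awk 'NR==FNR{k[$1,$2]=$0; order[++n]=$1 SUBSEP $2; next} ($1,$2) in k{k[$1,$2]=$0}
     END{for(i=1;i<=n;i++) print k[order[i]]}' File1.txt File2.txt > Output.txt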
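To check the matching rule outside awk, here is a minimal Python sketch of the same logic. The function name merge_by_position is hypothetical (not from the thread); it assumes tab-separated lines already stripped of their newlines.

```python
def merge_by_position(file1_lines, file2_lines):
    """Return File1's lines in order, swapping in File2's line whenever
    both files share the same (column 1, column 2) key."""
    # Index File2 by its first two tab-separated columns.
    # If a key repeats, the last line wins, as with the awk array.
    file2_by_key = {}
    for line in file2_lines:
        fields = line.split("\t")
        file2_by_key[(fields[0], fields[1])] = line

    merged = []
    for line in file1_lines:
        fields = line.split("\t")
        # Shared key: take File2's line; unique key: keep File1's line.
        merged.append(file2_by_key.get((fields[0], fields[1]), line))
    return merged


if __name__ == "__main__":
    file1 = ["chr2\t18674\trs10195681\tCC", "chr3\t104972\trs990284\tAA"]
    file2 = ["chr2\t18674\trs10195681\tAT"]
    for out_line in merge_by_position(file1, file2):
        print(out_line)
```

The dictionary plays the role of the awk array, and looping over File1's lines is what keeps the unique lines in their original positions.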

Related

How to update one file's column from another file's column in awk

I have two files. The first file:
1 AA
2 BB
3 CC
4 DD
and the second file:
15 AA
17 BB
20 CC
25 FF
File 1 should be updated, and the expected output should look like this:
15 AA
17 BB
20 CC
4 DD
I tried this script from another post, but it didn't work:
awk 'NR==FNR{a[$1]=$2;next}a[$1]{print $2,a[$1]}' file1 file2
$ awk 'NR==FNR{a[$2]=$1; next} $2 in a{$1=a[$2]} 1' file2 file1
15 AA
17 BB
20 CC
4 DD
Here is another awk approach:
awk 'FNR==NR{f2[$2]=$0; next}
$2 in f2 {print f2[$2]; next}
1' file2 file1
Prints:
15 AA
17 BB
20 CC
4 DD
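Both answers boil down to a lookup table keyed on column 2. A minimal Python sketch of that idea, for comparison (update_first_column is a hypothetical name; it assumes every line has exactly two whitespace-separated fields):

```python
def update_first_column(file1_lines, file2_lines):
    """Replace column 1 of File 1 with File 2's column 1 when column 2 matches."""
    # Map column 2 (e.g. "AA") to File 2's column 1 (e.g. "15").
    new_first = {}
    for line in file2_lines:
        first, second = line.split()
        new_first[second] = first

    updated = []
    for line in file1_lines:
        first, second = line.split()
        # Keep File 1's value when column 2 has no match in File 2.
        updated.append(f"{new_first.get(second, first)} {second}")
    return updated


if __name__ == "__main__":
    file1 = ["1 AA", "2 BB", "3 CC", "4 DD"]
    file2 = ["15 AA", "17 BB", "20 CC", "25 FF"]
    print("\n".join(update_first_column(file1, file2)))
```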

Cut Columns and Append to Same File

I'm working with a tab-separated file on macOS. The file contains 15 columns and thousands of rows. I want to cut columns 1, 2, and 3, and then append columns 11, 12, and 13 below them as additional rows. I was hoping to do this in a pipe so that no extra files need to be created. The only post I found used the sponge command, but apparently I don't have it on macOS, or it isn't available in my bash.
The input TSV is actually being generated within the same pipeline:
arbitrary command to generate input.tsv | cut -f1-3,11-13 | <Step to cut -f4-6 and append -f1-3> | sort > out.file
Input tsv
chr1 21018 21101 A B C D E F G chr1 20752 21209
chr10 74645 74836 A B C D E F G chr10 74638 74898
chr10 75267 75545 A B C D E F G chr10 75280 75917
chr4 212478 212556 A B C D E F G chr4 212491 213285
Desired Output tsv
chr1 21018 21101
chr1 20752 21209
chr10 74638 74898
chr10 74645 74836
chr10 75280 75917
chr4 212478 212556
chr4 212491 213285
Using perl and awk:
perl -pe 's/chr[0-9]+/\n$&/g' file | awk '/./{print $1, $2, $3}'
Output:
chr1 21018 21101
chr1 20752 21209
chr10 74645 74836
chr10 74638 74898
chr10 75267 75545
chr10 75280 75917
chr4 212478 212556
chr4 212491 213285
Here is a short awk solution:
awk 'BEGIN{FS=OFS="\t"} {print $1, $2, $3; print $11, $12, $13}' input.tsv
output:
chr1 21018 21101
chr1 20752 21209
chr10 74645 74836
chr10 74638 74898
chr10 75267 75545
chr10 75280 75917
chr4 212478 212556
chr4 212491 213285
Explanation
{ # for each input line
print $1, $2, $3; # print 1st, 2nd and 3rd fields, terminated with a newline
print $11, $12, $13; # print 11th, 12th and 13th fields, terminated with a newline
}
Pipe the result through sort if you need the sorted order from your pipeline.
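The same reshaping as a minimal Python sketch, assuming tab-separated rows with at least 13 columns (stack_column_groups is a hypothetical name). It also sorts the result using plain string order, which for ASCII input matches byte order (like LC_ALL=C sort) rather than a locale-aware sort:

```python
def stack_column_groups(rows):
    """Turn each row into two rows (columns 1-3 and columns 11-13), then sort."""
    out = []
    for row in rows:
        fields = row.split("\t")
        out.append("\t".join(fields[0:3]))    # columns 1, 2, 3
        out.append("\t".join(fields[10:13]))  # columns 11, 12, 13
    # Plain string sort: code-point order, same as byte order for ASCII.
    return sorted(out)


if __name__ == "__main__":
    row = "\t".join(["chr1", "21018", "21101", "A", "B", "C", "D", "E", "F", "G",
                     "chr1", "20752", "21209"])
    for line in stack_column_groups([row]):
        print(line)
```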

manipulating columns in a text file in awk

I have a tab-separated text file and want to do a math operation on one column and write the result to a new tab-separated text file.
This is an example of my file:
chr1 144520803 144520804 12 chr1 144520813 58
chr1 144520840 144520841 12 chr1 144520845 36
chr1 144520840 144520841 12 chr1 144520845 36
chr1 144520848 144520849 14 chr1 144520851 32
chr1 144520848 144520849 14 chr1 144520851 32
I want to change the 4th column: divide every element in the 4th column by the sum of all elements in the 4th column, then multiply by 1000000, as in the expected output.
Expected output:
chr1 144520803 144520804 187500 chr1 144520813 58
chr1 144520840 144520841 187500 chr1 144520845 36
chr1 144520840 144520841 187500 chr1 144520845 36
chr1 144520848 144520849 218750 chr1 144520851 32
chr1 144520848 144520849 218750 chr1 144520851 32
I am trying to do that in awk with the following command, but it does not return what I want. Do you know how to fix it?
awk '{print $1 "\t" $2 "\t" $3 "\t" $4/{sum+=$4}*1000000 "\t" $5 "\t" $6 "\t" $7}' myfile.txt > new_file.txt
You need two passes: one to compute the sum, and then one to scale the field.
Something like this (file{,} is bash brace expansion for "file file", so the same file is read twice):
$ awk -v OFS='\t' 'NR==FNR {sum+=$4; next}
{$4*=(1000000/sum)}1' file{,} > newfile
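The two-pass idea, sketched in Python under the assumption that the whole file fits in memory and the column total is non-zero (scale_fourth_column is a hypothetical name). For the sample data the total is 64, so 12 becomes 12 / 64 * 1000000 = 187500 and 14 becomes 218750:

```python
def scale_fourth_column(rows, factor=1_000_000):
    """Divide column 4 by its column total and multiply by factor."""
    split_rows = [row.split("\t") for row in rows]
    # Pass 1: sum column 4 over all rows.
    total = sum(float(fields[3]) for fields in split_rows)

    scaled = []
    for fields in split_rows:
        # Pass 2: rescale this row's column 4.
        value = float(fields[3]) / total * factor
        # Write whole numbers without a trailing ".0".
        fields[3] = str(int(value)) if value.is_integer() else str(value)
        scaled.append("\t".join(fields))
    return scaled


if __name__ == "__main__":
    rows = [
        "chr1\t144520803\t144520804\t12\tchr1\t144520813\t58",
        "chr1\t144520840\t144520841\t12\tchr1\t144520845\t36",
        "chr1\t144520840\t144520841\t12\tchr1\t144520845\t36",
        "chr1\t144520848\t144520849\t14\tchr1\t144520851\t32",
        "chr1\t144520848\t144520849\t14\tchr1\t144520851\t32",
    ]
    print("\n".join(scale_fourth_column(rows)))
```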

awk, merge two data sets based on column value

I need to combine two data sets stored in variables. The merge should be based on matching the 1st column of "$x" against the 3rd column of "$y".
-->echo "$x"
12 hey
23 hello
34 hi
-->echo "$y"
aa bb 12
bb cc 55
ff gg 34
ss ww 23
With the following command, I managed to store the first column of $x in a[] and check the third column of $y against it, but I'm not getting what I expect. Can someone please help?
awk 'NR==FNR{a[$1]=$1;next} $3 in a{print $0,a[$1]}' <(echo "$x") <(echo "$y")
aa bb 12
ff gg 34
ss ww 23
Expected result:
aa bb 12 hey
ff gg 34 hi
ss ww 23 hello
Your attempt is almost right:
awk 'NR==FNR{a[$1]=$2;next} ($3 in a){print $0,a[$3]}' <(echo "$x") <(echo "$y")
Note the a[$1]=$2 and the print $0,a[$3].
join -1 1 -2 3 <(sort -k 1b,1 a.txt) <(sort -k 3b,3 b.txt) |awk '{print $3, $4, $1, $2 }'
This might be a solution if your input is in two text files, a.txt and b.txt: it uses join on the two numeric columns.
It does not preserve the original order, though, so you might have to sort again if order matters.
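For comparison, a minimal Python sketch of the corrected awk logic (merge_on_key is a hypothetical name; it assumes each line of $x is a key and a single-word value, as in the example). Unlike the join approach, it keeps $y's original order:

```python
def merge_on_key(x_lines, y_lines):
    """Append $x's column 2 to each $y line whose column 3 matches $x's column 1."""
    # Map column 1 of $x (e.g. "12") to its column 2 (e.g. "hey").
    lookup = {}
    for line in x_lines:
        key, value = line.split()
        lookup[key] = value

    merged = []
    for line in y_lines:
        fields = line.split()
        # Keep only matching lines, in $y's original order.
        if fields[2] in lookup:
            merged.append(f"{line} {lookup[fields[2]]}")
    return merged


if __name__ == "__main__":
    x = ["12 hey", "23 hello", "34 hi"]
    y = ["aa bb 12", "bb cc 55", "ff gg 34", "ss ww 23"]
    print("\n".join(merge_on_key(x, y)))
```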

Reading from a file and writing to another using Awk

There are two tab-delimited text files. My aim is to change File 1 so that its zeros are replaced with the corresponding values from the 2nd column of File 2.
To illustrate:
File 1:
AA 0
BB 0
CC 0
DD 0
EE 0
File 2:
AA 256
DD 142
EE 26
File 1 - Output:
AA 256
BB 0
CC 0
DD 142
EE 26
I wrote the command below, but as you can see, I give the value from the 1st row of File 2 by hand. I want to do this automatically. What should I do?
awk -F'\t' 'BEGIN {OFS=FS} {if($1 == "AA") $2="256";print}' test > test.tmp && mv test.tmp test
Thank you in advance.
awk 'BEGIN {FS=OFS="\t"} NR==FNR{a[$1]=$2; next} {print $1, a[$1]+0}' file2 file1
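A minimal Python sketch of the same lookup (fill_from_lookup is a hypothetical name; it assumes exactly two tab-separated fields per line). One small difference from the awk answer: for keys missing from File 2 it keeps File 1's existing value, whereas a[$1]+0 always prints 0. With the sample data both give the same result.

```python
def fill_from_lookup(file1_lines, file2_lines):
    """Replace File 1's column 2 with File 2's value for matching keys."""
    # Map column 1 of File 2 (e.g. "AA") to its column 2 (e.g. "256").
    values = dict(line.split("\t") for line in file2_lines)

    filled = []
    for line in file1_lines:
        key, current = line.split("\t")
        # Use File 2's value when the key exists; otherwise keep File 1's.
        filled.append(f"{key}\t{values.get(key, current)}")
    return filled


if __name__ == "__main__":
    file1 = ["AA\t0", "BB\t0", "CC\t0", "DD\t0", "EE\t0"]
    file2 = ["AA\t256", "DD\t142", "EE\t26"]
    print("\n".join(fill_from_lookup(file1, file2)))
```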