matching columns in two files with different line numbers - awk

This is a frequently asked question, but I could not get it to work with my files, so any help would be highly appreciated.
I have two files. I want to compare their first fields and print the matching lines into a third file. Here is an example of my files:
file 1:
gene1
gene2
gene3
file 2:
gene1|trans1|12|233|345 45
gene1|trans2|12|342|232 45
gene2|trans2|12|344|343 12
gene2|trans2|12|344|343 45
gene2|trans2|12|344|343 12
gene2|trans3|12|34r|343 325
gene2|trans2|12|344|343 545
gene3|trans4|12|344|333 454
gene3|trans2|12|343|343 545
gene3|trans3|12|344|343 45
gene4|trans2|12|344|343 2112
gene4|trans2|12|344|343 455
File 2 contains more fields. Please note that the first field of file 2 is not exactly the same as the lines of file 1; only the gene element (the part before the first |) matches.
The output should look like this:
gene1|trans1|12|233|345 45
gene1|trans2|12|342|232 45
gene2|trans2|12|344|343 12
gene2|trans2|12|344|343 45
gene2|trans2|12|344|343 12
gene2|trans3|12|34r|343 325
gene2|trans2|12|344|343 545
gene3|trans4|12|344|333 454
gene3|trans2|12|343|343 545
gene3|trans3|12|344|343 45
I am using this code. It does not give me any error, but it does not give me any output either:
awk '{if (f[$1] != FILENAME) a[$1]++; f[$1] = FILENAME; } END{ for (i in a) if (a[i] > 1) print i; }' file1 file1
Thank you very much.

Something like this?
awk -F\| 'FNR==NR {a[$0]++;next} $1 in a' file1 file2
gene1|trans1|12|233|345 45
gene1|trans2|12|342|232 45
gene2|trans2|12|344|343 12
gene2|trans2|12|344|343 45
gene2|trans2|12|344|343 12
gene2|trans3|12|34r|343 325
gene2|trans2|12|344|343 545
gene3|trans4|12|344|333 454
gene3|trans2|12|343|343 545
gene3|trans3|12|344|343 45
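A commented, self-contained run of the same two-file idiom, assuming a subset of the sample data written to file1 and file2 in a scratch directory created with mktemp. While awk reads the first file, FNR (line number within the current file) equals NR (line number overall), so those lines only fill the lookup array; for the second file, -F'|' makes $1 the bare gene name, which is then looked up in the array.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
cd "$(mktemp -d)" || exit 1

# Sample inputs (a subset of the data from the question).
printf 'gene1\ngene2\ngene3\n' > file1
printf '%s\n' 'gene1|trans1|12|233|345 45' \
              'gene2|trans3|12|34r|343 325' \
              'gene3|trans4|12|344|333 454' \
              'gene4|trans2|12|344|343 2112' > file2

# FNR == NR is only true while reading the first file:
#   store each gene name as an array key, then skip to the next line.
# For file2, -F'|' makes $1 the gene name; "$1 in a" is a pattern
#   with no action, so matching lines are printed unchanged.
out=$(awk -F'|' 'FNR == NR { a[$0] = 1; next } $1 in a' file1 file2)
printf '%s\n' "$out"
```

One known caveat of FNR == NR: if file1 is empty, the condition stays true while reading file2 as well, so nothing is printed.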

In this example, grep is sufficient (note that -w matches the names as whole words anywhere on the line, not only in the first field):
grep -w -f file1 file2
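If the gene names could also appear in later fields, one option is to turn each name into a pattern anchored at the start of the line, such as ^gene1|, before handing it to grep. This is a sketch assuming the names contain no regex metacharacters; in a basic regular expression the | is a literal character. The file name patterns.txt is hypothetical.

```shell
cd "$(mktemp -d)" || exit 1

printf 'gene1\ngene2\n' > file1
printf '%s\n' 'gene1|trans1|12|233|345 45' \
              'gene3|gene1|12|344|333 454' \
              'gene2|trans2|12|344|343 12' > file2

# Build one anchored pattern per gene: "gene1" -> "^gene1|".
# patterns.txt is a hypothetical scratch file name.
sed 's/.*/^&|/' file1 > patterns.txt

# -f reads the patterns from the file; only lines whose first
# |-separated field is one of the genes are printed, so the
# "gene3|gene1|..." line is skipped (grep -w would have kept it).
out=$(grep -f patterns.txt file2)
printf '%s\n' "$out"
```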

Related

conditional awk statement to create a new field with additive value

Question
How would I use awk to create a new field that holds $2 plus a constant value?
I am planning to cycle through a list of values, but I wouldn't mind using a one-liner for each command.
PseudoCode
awk '$1 == Bob {$4 = $2 + 400}' file
Sample Data
Philip 13 2
Bob 152 8
Bob 4561 2
Bob 234 36
Bob 98 12
Rey 147 152
Rey 15 1547
Expected Output
Philip 13 2
Bob 152 8 408
Bob 4561 2 402
Bob 234 36 436
Bob 98 12 412
Rey 147 152
Rey 15 1547
Just quote Bob; also, you want to add to the third field, not the second:
$ awk '$1=="Bob" {$4=$3+400}1' file | column -t
Philip 13 2
Bob 152 8 408
Bob 4561 2 402
Bob 234 36 436
Bob 98 12 412
Rey 147 152
Rey 15 1547
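A self-contained check of the fixed one-liner on part of the sample data, without column -t. Unquoted, Bob is an uninitialized awk variable (an empty string), so no line ever matched; quoted, it is a string constant. Assigning $4 rebuilds the record with OFS (a single space by default), and the trailing 1 prints every line.

```shell
# Sample lines from the question, fed on standard input.
out=$(printf '%s\n' 'Philip 13 2' 'Bob 152 8' 'Bob 4561 2' 'Rey 147 152' |
  awk '$1 == "Bob" { $4 = $3 + 400 } 1')
printf '%s\n' "$out"
```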
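Here, check whether $1 is equal to Bob and, if so, rebuild the record ($0) by appending FS followed by $2 + 400 (in awk, addition binds more tightly than string concatenation). FS is the field separator, placed here between the 3rd field and the new 4th field. The 1 at the end tells awk to take the default action, which is to print. Note that this version adds 400 to $2, as in the question's pseudocode, so its 4th column differs from the expected output; use $3 instead to get 408, 402, and so on.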
awk '$1=="Bob"{$0=$0 FS $2 + 400}1' file
Philip 13 2
Bob 152 8 552
Bob 4561 2 4961
Bob 234 36 634
Bob 98 12 498
Rey 147 152
Rey 15 1547
Or, if you want to keep the name (Bob) in a variable:
awk -vname="Bob" '$1==name{$0=$0 FS $2 + 400}1' file
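The question mentions cycling through a list of values. A minimal sketch, assuming the list is a set of hypothetical name:offset pairs, passes the whole list with a single -v and builds a lookup table in BEGIN, so one awk pass handles every name and nothing is spliced into the program text.

```shell
# Hypothetical list of name:offset pairs; adjust to your own values.
pairs='Bob:400 Rey:100'

out=$(printf '%s\n' 'Philip 13 2' 'Bob 152 8' 'Rey 15 1547' |
  awk -v pairs="$pairs" '
    BEGIN {
      # Split "Bob:400 Rey:100" into offset["Bob"] = 400, offset["Rey"] = 100.
      n = split(pairs, list, " ")
      for (i = 1; i <= n; i++) {
        split(list[i], kv, ":")
        offset[kv[1]] = kv[2]
      }
    }
    # Only names present in the table get the extra field.
    $1 in offset { $4 = $3 + offset[$1] }
    1')
printf '%s\n' "$out"
```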
1st solution: Could you please also try the following. It uses awk's built-in NF variable (the number of fields): $NF is the value of the last column of the current line, and assigning to $(NF+1) creates an additional column, holding 400 + $NF if the 1st field is the string `Bob` and an empty string otherwise.
awk '{$(NF+1)=$1=="Bob"?400+$NF:""} 1' OFS="\t" Input_file
2nd solution: If we don't want to create a new field and simply want to print the values according to the condition, try the following; I believe it should be faster.
awk 'BEGIN{OFS="\t"}{$1=$1;print $0,($1=="Bob"?400+$NF:"")}' Input_file
Output will be as follows.
Philip 13 2
Bob 152 8 408
Bob 4561 2 402
Bob 234 36 436
Bob 98 12 412
Rey 147 152
Rey 15 1547
Explanation: here is an explanation of the first solution.
awk ' ##Starting the awk program here.
{
$(NF+1)=$1=="Bob"?400+$NF:"" ##Creating a new last field whose value depends on a condition check:
##if the 1st field is the string Bob, the new field is the value of the last field plus 400; otherwise it is empty.
}
1 ##awk works on a condition-then-action model; 1 makes the condition TRUE and, since no action is defined, the default action (printing the current line) happens.
' OFS="\t" Input_file ##Setting OFS (the output field separator) to a TAB here and giving the Input_file name.
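In $(NF+1)=...$NF... the new field and $NF appear in the same statement, and as far as I know POSIX does not pin down which side awk evaluates first. A minimal sketch that sidesteps the question by computing the value before NF grows, and that adds a column only to the Bob lines (so the other lines get no trailing tab):

```shell
out=$(printf '%s\n' 'Philip 13 2' 'Bob 152 8' 'Rey 147 152' |
  awk 'BEGIN { OFS = "\t" }
       # Compute the new value from the current last field before NF grows.
       $1 == "Bob" { v = $NF + 400; $(NF + 1) = v }
       # Rebuild non-matching records too, so every line uses OFS.
       { $1 = $1 }
       1')
printf '%s\n' "$out"
```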

exchange columns based on some conditions

I have a text file with 5 columns. If the number in the 5th column is less than the number in the 3rd column, swap the 4th and 5th columns with the 2nd and 3rd columns. If the number in the 5th column is greater than the 3rd column, leave that line as it is.
1EAD A 396 B 311
1F3B A 118 B 171
1F5V A 179 B 171
1G73 A 162 C 121
1BS0 E 138 G 230
Desired output
1EAD B 311 A 396
1F3B A 118 B 171
1F5V B 171 A 179
1G73 C 121 A 162
1BS0 E 138 G 230
$ awk '{ if ($5 >= $3) print $0; else print $1"\t"$4"\t"$5"\t"$2"\t"$3; }' foo.txt
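A variant that swaps the fields in place, so swapped and unswapped lines keep the same single-space separator (the one-liner above prints tabs only on the swapped lines). Fields that look numeric are compared as numbers, so 311 < 396 is a numeric test rather than a string one. The sample is a subset of the question's data.

```shell
out=$(printf '%s\n' '1EAD A 396 B 311' '1F3B A 118 B 171' '1F5V A 179 B 171' |
  awk '
    # Swap fields 2<->4 and 3<->5 only when column 5 is numerically smaller.
    $5 < $3 { t = $2; $2 = $4; $4 = t; t = $3; $3 = $5; $5 = t }
    # Print every line; swapped lines are rebuilt with OFS (a space).
    1')
printf '%s\n' "$out"
```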

How to append the count of numbers in each line of text using awk?

I have several very large text files and would like to prepend the count of numbers, followed by a space, to each line. Could anyone kindly suggest how to do this efficiently using awk?
Input:
10 109 120 200 1148 1210 1500 5201
9 139 1239 1439 6551
199 5693 5695
Desired Output:
8 10 109 120 200 1148 1210 1500 5201
5 9 139 1239 1439 6551
3 199 5693 5695
You can use
awk '{print NF,$0}' input.txt
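It prints the number of fields on the current line (NF), then the output field separator (OFS, inserted by the comma, a space by default), then the current line itself ($0).
This will also work for you (7, like any non-zero number, is a true pattern, so every line is printed after NF and FS have been prepended):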
awk '{$0=NF FS $0}7' file
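A quick self-contained check of the print NF, $0 approach on the sample input. Because $0 is printed unmodified, the original spacing of each line is kept. One edge case worth stating: a blank input line has NF = 0, so it comes out as "0 " (a zero followed by a space).

```shell
# NF is the number of whitespace-separated fields on the current line.
out=$(printf '%s\n' '10 109 120 200 1148 1210 1500 5201' \
                    '9 139 1239 1439 6551' '199 5693 5695' |
  awk '{ print NF, $0 }')
printf '%s\n' "$out"
```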

awk print multiple column file into single column

My file looks like that:
315
717
461 737
304
440
148 206 264 322 380 438 496
801
495
355
249 989
768
946
I want to print all those columns in a single column file (one long first column).
If I try
awk '{print $1}'> new_file;
awk '{print $2}' >> new_file
there are blank lines in between. How can I solve this?
Perhaps a bit cryptic:
awk '1' RS='[[:space:]]+' inputfile
It says "print every record, treating any run of whitespace as a record separator". A regular-expression RS is not POSIX; it works in GNU awk and mawk.
You can simply use something like:
awk '{ for (i=1; i<=NF; i++) print $i }' file
For each line, iterate through columns, and print each column in a new line.
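Unlike printing $1 and then $2 in separate passes, the loop never prints a missing field, because it stops at NF; that is why no blank lines appear. A small self-contained run, using part of the sample data:

```shell
out=$(printf '%s\n' '315' '461 737' '148 206 264' '249 989' |
  awk '{
    # NF is the number of whitespace-separated fields on this line;
    # print each one on its own line. Blank lines have NF = 0 and
    # therefore produce no output at all.
    for (i = 1; i <= NF; i++) print $i
  }')
printf '%s\n' "$out"
```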
You don't need as much as sed for this: just translate spaces to newlines
tr ' ' '\n' < file
tr is purely a filter, so you have to redirect a file into it.
A perl solution:
perl -pe 's/\s+(?=\S)/\n/g' infile
Output:
315
717
461
737
304
440
148
206
264
322
380
438
496
801
495
355
249
989
768
946

Linux red-hat5.4 + awk file manipulation

How can I match the word PARAM (param=name) in file.txt and, via awk, print the lines between NAMEx and NAMEy, as follows:
if PARAM is matched in file.txt, then awk prints only the lines between the two NAMES headers that enclose the line containing PARAM
remark1: PARAM can be any name, such as Pitter, Bob, etc.
remark2: awk will get PARAM=(any name)
remark3: we do not know how many spaces there are between # and NAMES
more file.txt
# NAMES1
Pitter 23
Bob 75
# NAMES2
Donald 54
Josef 85
Patrick 21
# NAMES3
Tom 32
Jennifer 85
Svetlana 25
# NAMES4
Examples (based on the contents of file.txt):
If PARAM=Pitter, awk will print these names to the out.txt file:
Pitter 23
Bob 75
If PARAM=Josef, awk will print these names to the out.txt file:
Donald 54
Josef 85
Patrick 21
If PARAM=Jennifer, awk will print these names to the out.txt file:
Tom 32
Jennifer 85
Svetlana 25
Using awk's RS (record separator) is helpful in this case. See the test below,
using your example file:
kent$ cat file
# NAMES1
Pitter 23
Bob 75
# NAMES2
Donald 54
Josef 85
Patrick 21
# NAMES3
Tom 32
Jennifer 85
Svetlana 25
# NAMES4
kent$ awk -vPARAM="Pitter" 'BEGIN{RS="# NAMES."} {if($0~PARAM)print}' file
Pitter 23
Bob 75
kent$ awk -vPARAM="Josef" 'BEGIN{RS="# NAMES."} {if($0~PARAM)print}' file
Donald 54
Josef 85
Patrick 21
kent$ awk -vPARAM="Jennifer" 'BEGIN{RS="# NAMES."} {if($0~PARAM)print}' file
Tom 32
Jennifer 85
Svetlana 25
Note that there are some empty lines in the output, because the newlines next to each # NAMES header remain part of the records; however, it would be easy to remove them from the output.
update
If you have spaces between # and NAMES, you can try:
awk -vPARAM="Pitter" 'BEGIN{RS="# *NAMES."} {if($0~PARAM)print}' file
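A multi-character or regular-expression RS is an extension (GNU awk and mawk support it); POSIX leaves its behavior unspecified. A sketch that avoids it, assuming a header is any line starting with #, optional spaces and NAMES, buffers each block and prints it only if one of its lines has PARAM as its first field. Note the design choice: this is an exact match on the name rather than the regex match ($0~PARAM) used above, and it produces no empty lines. The sample file is written to a scratch directory.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' '# NAMES1' 'Pitter 23' 'Bob 75' '#  NAMES2' 'Donald 54' 'Josef 85' \
              'Patrick 21' '# NAMES3' 'Tom 32' 'Jennifer 85' 'Svetlana 25' '# NAMES4' > file.txt

out=$(awk -v PARAM="Josef" '
  # A header ("#", any number of spaces, "NAMES") closes the previous block:
  # print it if PARAM was seen, then start a new, empty block.
  /^# *NAMES/ { if (found) printf "%s", buf; buf = ""; found = 0; next }
  # Any other line is appended to the current block.
  { buf = buf $0 "\n"; if ($1 == PARAM) found = 1 }
  # Flush the last block in case the file does not end with a header.
  END { if (found) printf "%s", buf }
' file.txt)
printf '%s\n' "$out"
```

To save the result as in the question, redirect the awk command's output with > out.txt.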