Awk command has unexpected results when comparing two files - awk
I am using an awk command to compare the first column in two files.
I want to take col1 of file1 and if there is a match in col1 of file2, update the "date updated" in the last column. If there is no match, I want to append the entire line of file1 to file2 and append a "date updated" value to that line as well. Here is the command I'm currently using:
awk 'FNR == NR { f1[$1] = $0; next }
$1 in f1 { print; delete f1[$1] }
END { for (user in f1) print f1[user] }' file1 file2
File1:
tnash,172.2,plasma-de+,serv01,datetimeNEW
jhwagner,169.4,plasma-de+,serv01,datetimeNEW
steadmah,161.1,plasma-de+,serv01,datetimeNEW
guillwt,158.3,plasma-de+,serv01,datetimeNEW
mwinebra,122.2,plasma-de+,serv01,datetimeNEW
jbomba,100.0,plasma-de+,serv01,datetimeNEW
mschrode,161.1,plasma-de+,serv01,datetimeNEW
File2:
jbomba,114.0,plasma-de+,serv01,datetimeOLD
mschrode,104.0,plasma-de+,serv01,datetimeOLD
deleteme,192.0,random,serv01,datetimeOLD #<---- Needs to be removed: WORKS!
Expected Output:(order does not matter)
tnash,172.2,plasma-de+,serv01,datetimeNEW
jhwagner,169.4,plasma-de+,serv01,datetimeNEW
steadmah,161.1,plasma-de+,serv01,datetimeNEW
guillwt,158.3,plasma-de+,serv01,datetimeNEW
mwinebra,122.2,plasma-de+,serv01,datetimeNEW
jbomba,100.0,plasma-de+,serv01,datetimeOLD #<---- NEED THIS VALUE
mschrode,161.1,plasma-de+,serv01,datetimeOLD #<---- NEED THIS VALUE
Current Output:(order does not matter)
tnash,172.2,plasma-de+,serv01,datetimeNEW
jhwagner,169.4,plasma-de+,serv01,datetimeNEW
steadmah,161.1,plasma-de+,serv01,datetimeNEW
guillwt,158.3,plasma-de+,serv01,datetimeNEW
mwinebra,122.2,plasma-de+,serv01,datetimeNEW
jbomba,100.0,plasma-de+,serv01,datetimeNEW #<----WRONG OUTPUT
mschrode,161.1,plasma-de+,serv01,datetimeNEW #<----WRONG OUTPUT
The Logic Broken Down:
If $usr/col1 in file2 does NOT exist in file1
remove entire line from file2
(ex: line3 in file2, user: deleteme)
If $usr/col1 in file1 does NOT exist in file2
append entire line to file2
(ex: lines 1-5 in file1)
So the issue is, when there IS a match between the two files, I need to keep the information from file2, not the information from file1. In the output examples above you'll see I need to keep the datetimeOLD from file2 along with the new information from file1.
Set the field separator to a comma and read file2 first, so that a matching line from file2 is printed in place of the file1 line:
$ awk -F',' 'FNR==NR{a[$1]=$0;next} $1 in a{print a[$1];next} 1' file2 file1
tnash,172.2,plasma-de+,serv01,datetimeNEW
jhwagner,169.4,plasma-de+,serv01,datetimeNEW
steadmah,161.1,plasma-de+,serv01,datetimeNEW
guillwt,158.3,plasma-de+,serv01,datetimeNEW
mwinebra,122.2,plasma-de+,serv01,datetimeNEW
jbomba,114.0,plasma-de+,serv01,datetimeOLD
mschrode,104.0,plasma-de+,serv01,datetimeOLD
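Note that the command above prints file2's whole line for a matching user, so jbomba comes out with 114.0 rather than the 100.0 shown in the expected output. If the goal is file1's values with only the timestamp taken from file2, a minimal sketch might look like the following. It assumes the timestamp is always the last comma-separated field; the temporary directory and the abbreviated sample data are only there to make the example self-contained.

```shell
# Recreate an abbreviated version of the question's sample data.
tmp=$(mktemp -d)
printf '%s\n' \
  'tnash,172.2,plasma-de+,serv01,datetimeNEW' \
  'jbomba,100.0,plasma-de+,serv01,datetimeNEW' \
  'mschrode,161.1,plasma-de+,serv01,datetimeNEW' > "$tmp/file1"
printf '%s\n' \
  'jbomba,114.0,plasma-de+,serv01,datetimeOLD' \
  'mschrode,104.0,plasma-de+,serv01,datetimeOLD' \
  'deleteme,192.0,random,serv01,datetimeOLD' > "$tmp/file2"

# Pass 1 (file2): remember each user's old timestamp (the last field).
# Pass 2 (file1): for users seen in file2, overwrite the last field with
#                 the remembered timestamp; then print every file1 line.
result=$(awk 'BEGIN { FS = OFS = "," }
  FNR == NR { old[$1] = $NF; next }
  $1 in old { $NF = old[$1] }
  { print }' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -r "$tmp"
```

Users that exist only in file2 (deleteme here) are never printed, because output is driven entirely by file1.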
Related
Use awk to remove lines based on a column from another file
I have the following code that works to extract lines from the multiple-column file_1 that have a value in the first column that appears in the single-column file_2:
awk 'NR==FNR{a[$1][$0];next} $0 in a {for (i in a[$0]) print i}' file_1 file_2
I got this code from the answer to this question: AWK to filter a file based upon columns of another file
I want to change the code to do the opposite, namely to remove every line from file_1 where the first column matches any value that appears in the single-column file_2. How do I do this?
awk 'NR==FNR { arr[$0]="1";next } arr[$1]!="1" { print $0 }' file2 file1
Process file2 first (NR==FNR is true only while awk reads the first file named on the command line) and create an array called arr, with the whole line ($0) as the key and 1 as the value. Then, when processing the next file (file1), check whether the first space-delimited field ($1) exists as a key in arr and, if it doesn't, print the line.
Redirect the output to a file if you want to store the results:
awk 'NR==FNR { arr[$0]="1";next } arr[$1]!="1" { print $0 }' file2 file1 > file3
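One side effect worth knowing: in awk, merely referencing arr[$1] in a comparison creates an empty entry for that key, so the array grows with every unmatched line. A minimal sketch of the same filter using the in operator, which tests membership without creating entries, is shown below. The array name skip and the three-line sample data are made up for illustration.

```shell
# Hypothetical sample data: file1 has two columns, file2 lists keys to drop.
tmp=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' 'c 3' > "$tmp/file1"
printf '%s\n' 'b' > "$tmp/file2"

# Pass 1 (file2): record each key; no value is needed.
# Pass 2 (file1): print lines whose first field is NOT a recorded key.
result=$(awk 'NR == FNR { skip[$1]; next }
  !($1 in skip)' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -r "$tmp"
```

With this sample data the line starting with b is dropped and the other two are printed unchanged.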
how to get the common rows according to the first column in awk
I have two ','-separated files as follows:
file1:
A,inf
B,inf
C,0.135802
D,72.6111
E,42.1613
file2:
A,inf
B,inf
C,0.313559
D,189.5
E,38.6735
I want to compare the 2 files and get the common rows based on the 1st column. So, for the files above, the output would look like this:
A,inf,inf
B,inf,inf
C,0.135802,0.313559
D,72.6111,189.5
E,42.1613,38.6735
I am trying to do that in awk and tried this:
awk ' NR == FNR {val[$1]=$2; next} $1 in val {print $1, val[$1], $2}' file1 file2
This code returns these results:
A,inf
B,inf
C,0.135802
D,72.6111
E,42.1613
which is not what I want. Do you know how I can improve it?
$ awk 'BEGIN{FS=OFS=","}NR==FNR{a[$1]=$0;next}$1 in a{print a[$1],$2}' file1 file2
A,inf,inf
B,inf,inf
C,0.135802,0.313559
D,72.6111,189.5
E,42.1613,38.6735
Explained:
$ awk '
BEGIN {FS=OFS="," }   # set separators
NR==FNR {             # first file
    a[$1]=$0          # hash to a, $1 as index
    next              # next record
}
$1 in a {             # second file, if $1 in a
    print a[$1],$2    # print indexed record from a with $2
}' file1 file2
Your awk code basically works; you just need to tell awk to use , as the field delimiter. You can do that by adding BEGIN{FS=OFS=","} to the beginning of the script.
But since the files are sorted, as in the examples in your question, you can simply use the join command:
join -t, file1 file2
This will join the files based on the first column. -t, tells join that columns are separated by commas.
If the files are not sorted, you can sort them on the fly like this:
join -t, <(sort file1) <(sort file2)
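To illustrate the first point, here is a sketch of the asker's original script with only the BEGIN block added. The data is an abbreviated copy of the question's two files, written to a temporary directory so the example is self-contained.

```shell
# Abbreviated copies of the question's file1 and file2.
tmp=$(mktemp -d)
printf '%s\n' 'A,inf' 'C,0.135802' 'E,42.1613' > "$tmp/file1"
printf '%s\n' 'A,inf' 'C,0.313559' 'E,38.6735' > "$tmp/file2"

# The asker's script, unchanged except for the BEGIN block that makes
# awk split input on commas and join output fields with commas.
result=$(awk 'BEGIN { FS = OFS = "," }
  NR == FNR { val[$1] = $2; next }
  $1 in val { print $1, val[$1], $2 }' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -r "$tmp"
```

Without the BEGIN block, awk splits on whitespace, so $1 is the whole line and $2 is empty, which is why the original attempt printed only the file contents.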
printing multiple NR from one file based on the value from other file using awk
I want to print out multiple rows from one file based on the input values from the other. Following is the representation of file 1:
2
4
1
Following is the representation of file 2:
MANCHKLGO
kflgklfdg
fhgjpiqog
fkfjdkfdg
fghjshdjs
jgfkgjfdk
ghftrysba
gfkgfdkgj
jfkjfdkgj
Based on the first column of the first file, the code should first print the second row of the second file, followed by the fourth row and then the first row of the second file. Hence, the output should be the following:
kflgklfdg
fkfjdkfdg
MANCHKLGO
This is the code that I tried:
awk 'NR==FNR{a[$1];next}FNR in a{print $0}' file1.txt file2.txt
However, as expected, the output is not in the right order: it printed the first row first, then the second row, and the fourth row last. How can I print the rows of the second file in exactly the order given in the first file?
Try:
$ awk 'NR==FNR{a[NR]=$0;next} {print a[$1]}' file2 file1
kflgklfdg
fkfjdkfdg
MANCHKLGO
How it works
NR==FNR{a[NR]=$0;next}
This saves the contents of file2 in array a.
print a[$1]
For each number in file1, we print the desired line of file2.
Solution to earlier version of question
$ awk 'NR==FNR{a[NR]=$0;next} {print a[2*$1];print a[2*$1+1]}' file2 file1
fkfjdkfdg
fghjshdjs
gfkgfdkgj
jfkjfdkgj
kflgklfdg
fhgjpiqog
Another take:
awk '
NR==FNR {a[$1]; order[n++] = $1; next}
FNR in a {lines[FNR] = $0}
END {for (i=0; i<n; i++) print lines[order[i]]}
' file1.txt file2.txt
This version stores fewer lines in memory, if your files are huge.
How to print two lines of several files to a new file with specific order?
I have a task to do with awk. I am doing sequence analysis for some genes. I have several files with sequences in order. I would like to extract the first sequence of each file into a new file, then the second sequence of each file, and so on until the last sequence. I only know how to do this for the first line or any specific line with awk:
awk 'FNR == 2 {print; nextfile}' *.txt > newfile
Here I have input like this:
File 1
Saureus081.1
ATCGGCCCTTAA
Saureus081.2
ATGCCTTAAGCTATA
Saureus081.3
ATCCTAAAGGTAAGG
File 2
SaureusRF1.1
ATCGGCCCTTAC
SauruesRF1.2
ATGCCTTAAGCTAGG
SaureusRF1.3
ATCCTAAAGGTAAGC
File 3
SaureusN305.1
ATCGGCCCTTACT
SauruesN305.2
ATGCCTTAAGCTAGA
SaureusN305.3
ATCCTAAAGGTAATG
And there are 12 similar files: File 4 . . . . File 12
Required Output
Newfile
Saureus081.1
ATCGGCCCTTAA
SaureusRF1.1
ATCGGCCCTTAC
SaureusN305.1
ATCGGCCCTTACT
Saureus081.2
ATGCCTTAAGCTATA
SaureusRF1.2
ATGCCTTAAGCTAGG
SauruesN305.2
ATGCCTTAAGCTAGA
Saureus081.3
ATCCTAAAGGTAAGG
SaureusRF1.3
ATCCTAAAGGTAAGC
SaureusN305.3
ATCCTAAAGGTAATG
I guess this task can be done easily with awk, but I have no idea how to do it for multiple lines.
Based on the modified question, the answer needs some changes.
$ awk -F'.' 'NR%2{k=$2;v=$0;getline;a[k]=a[k]?a[k] RS v RS $0:v RS $0} END{for(i in a)print a[i]}' file1 file2 file3
Saureus081.1
ATCGGCCCTTAA
SaureusRF1.1
ATCGGCCCTTAC
SaureusN305.1
ATCGGCCCTTACT
Saureus081.2
ATGCCTTAAGCTATA
SauruesRF1.2
ATGCCTTAAGCTAGG
SauruesN305.2
ATGCCTTAAGCTAGA
Saureus081.3
ATCCTAAAGGTAAGG
SaureusRF1.3
ATCCTAAAGGTAAGC
SaureusN305.3
ATCCTAAAGGTAATG
Brief explanation:
Set '.' as the delimiter.
For every odd record, take k=$2 as the key of array a and save the header line in v.
Invoke getline to read the next record into $0, then append the header and that record to the value stored under key k.
Print the whole array as the last step.
If your data is very large, I would suggest creating temporary files:
awk 'FNR%2==1 { filename = $1 } { print $0 >> filename }' file1 ... filen
Afterwards, you can cat them together:
cat Seq1 ... Seqn > result
This has the additional advantage that it will work if not all sequences are present in all files.
paste + awk solution:
paste File1 File2 | awk '{ p=$2;$2="" }NR%2{ k=p; print }!(NR%2){ v=p; print $1 RS k RS v }'
paste File1 File2 - merge corresponding lines of files
p=$2;$2="" - capture the value of the 2nd field which is the respective key/value from File2
The output:
Seq1
ATCGGCCCTTAA
Seq1
ATCGGCCCTTAC
Seq2
ATGCCTTAAGCTATA
Seq2
ATGCCTTAAGCTAGG
Seq3
ATCCTAAAGGTAAGG
Seq3
ATCCTAAAGGTAAGC
Additional approach for multiple files:
paste Files[0-9]* | awk 'NR%2{ k=$1; n=NF; print k } !(NR%2){ print $1; for(i=2;i<=n;i++) print k RS $i }'
output the record by comparing string in other file using awk script
I want to output the records that have a matching string in another file, using an awk script.
file1:
849002|48|1208004|1
849007|28|1208004|1
855003|48|1208004|1
855004|28|1208004|1
855006|28|1208004|1
file2:
00990029000000004804470425|ST1400029|0.550|Recurring|1248073|ST1400029
00990029000000008410517183|IM1450029|1.000|Recurring|855003|ST1400029
009900290000000000007800612988|IM3350029|1.000|Recurring|1248063|ST1400029
Notice that 855003 occurs in the middle row of each sample? That's the match I'm looking for, and the output should be:
00990029000000008410517183|IM1450029|1.000|Recurring|855003|ST1400029
I want to search for $1 of file1 in $5 of file2 and, if a match is found, output the line. I tried this, but it returns zero records:
awk 'NR==FNR{a[$1]=$1;next}a[$5]{print $0}' file2 file1 > outfile
Your help will resolve my issue; I have to search a long list of data.
Don't forget to set the delimiter using the -F flag:
awk -F "|" 'FNR==NR { a[$1]; next } $5 in a' file1 file2
Results:
00990029000000008410517183|IM1450029|1.000|Recurring|855003|ST1400029
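To see why the delimiter matters here, the following sketch counts the fields awk finds in one of the question's file2 lines, first with the default whitespace splitting and then with -F'|'. The variable names are made up for illustration.

```shell
# One line copied from the question's file2.
line='00990029000000008410517183|IM1450029|1.000|Recurring|855003|ST1400029'

# Default FS splits on whitespace; the line has none, so it is one field.
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')

# With FS set to "|", the line splits into six fields and $5 is the key.
pipe_nf=$(printf '%s\n' "$line" | awk -F'|' '{ print NF }')
fifth=$(printf '%s\n' "$line" | awk -F'|' '{ print $5 }')

printf 'default FS: %s field(s); FS="|": %s fields; fifth field: %s\n' \
  "$default_nf" "$pipe_nf" "$fifth"
```

With the default separator $5 is empty, so a lookup keyed on it can never match. The original attempt also passed the files in the opposite order (file2 file1), so the lookup array was built from the wrong file.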
Try this (untested; note the -F'|', which is needed because the fields are pipe-separated):
awk -F'|' 'NR==FNR{a[$1];next}$5 in a' file1 file2