Compare two columns in two different files using awk
I have two files as given below
file1
1|a
2|b
3|c
4|d
5|e
6|g
7|h
8|i
file2
2
3
Expected output using awk on Linux (the lines of file1 whose first column does not appear in file2):
1|a
4|d
5|e
6|g
7|h
8|i
I tried the command below, but it does not produce the expected output:
awk '{k=$1 FS $2} NR!=FNR{a[k]; next} !(k in a)' file1 file2
The two input files need different field separators (FS), since one file is pipe-delimited (|) and the other is not. Could you please try the following? It prints only those lines of Input_file1 whose first field is NOT present in Input_file2.
awk 'FNR==NR{a[$0];next} !($1 in a)' Input_file2 FS="|" Input_file1
Output will be as follows.
1|a
4|d
5|e
6|g
7|h
8|i
Explanation: a commented version of the above command.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR, which is TRUE while the first file on the command line, Input_file2, is being read.
a[$0] ##Creating an entry in array a whose index is the whole current line ($0).
next ##next is a built-in awk statement which skips all further statements for the current line.
} ##Closing BLOCK for FNR==NR condition here.
!($1 in a) ##Evaluated while Input_file1 is being read: the line is printed when its $1 is NOT an index of array a, i.e. NOT present in Input_file2.
' Input_file2 FS="|" Input_file1 ##Mentioning Input_file2 name, then setting FS as pipe, then mentioning Input_file1.
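The same filter can also be sketched with a single field separator: a line of file2 contains no pipe, so with FS set to | its $1 is the whole line anyway. The block below is a minimal, self-contained sketch under that assumption; the scratch directory and the names tmpdir, seen and result are illustrative and not part of the original answer.

```shell
# Recreate the sample files from the question in an illustrative scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '1|a' '2|b' '3|c' '4|d' '5|e' '6|g' '7|h' '8|i' > "$tmpdir/file1"
printf '%s\n' 2 3 > "$tmpdir/file2"

# file2 is read first: remember each key. Its lines contain no "|",
# so $1 is the entire line even with -F'|'.
# file1 is read second: print the lines whose first field was not remembered.
result=$(awk -F'|' 'FNR==NR { seen[$1]; next } !($1 in seen)' \
  "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The order of the file arguments matters here: the lookup file (file2) has to come first so that its keys are already stored when file1 is filtered.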
Related
How to join two CSV files by a temporary common column in awk?
I have two CSV files in the form of
file1
A,44
A,21
B,65
C,79
file2
A,7
B,4
C,11
I used awk as
awk -F, 'NR==FNR{a[$1]=$0;next} ($1 in a){print a[$1]","$2 }' file1.csv file2.csv
producing
A,44,7
A,21,7
B,65,4
C,79,11
a[$1] prints the entire line from file1. How can I omit the first columns in both files (the first column is only used to match the second columns) to produce:
44,7
21,7
65,4
79,11
In other words, how can I pass the columns from the first file to the print block, as $2 does for the second file?
Could you please try the following? It was written and tested on the shown samples only.
awk 'BEGIN{FS=OFS=","} FNR==NR{a[$1]=$2;next} ($1 in a){print $2,a[$1]}' file2 file1
Explanation: a commented version of the above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS=OFS="," ##Setting field separator and output field separator as comma here.
}
FNR==NR{ ##Checking condition FNR==NR, which is TRUE while file2 is being read.
a[$1]=$2 ##Creating array a with index $1 and value $2 from current line.
next ##next will skip all further statements from here.
}
($1 in a){ ##Evaluated while file1 is being read: if $1 is present in array a then do following.
print $2,a[$1] ##Printing 2nd field and value of array a with index $1 here.
}
' file2 file1 ##Mentioning Input_file names here.
Output will be as follows for shown samples.
44,7
21,7
65,4
79,11
2nd solution: A more generic solution, for the case where both Input_files can contain duplicate keys. It pairs the 1st occurrence of a key in file1 with the 1st occurrence of that key in file2, the 2nd with the 2nd, and so on.
awk '
BEGIN{
FS=OFS=","
}
FNR==NR{
a[$1]
b[$1,++c[$1]]=$2
next
}
($1 in a){
print $2,b[$1,++d[$1]]
}
' file2 file1
You can join them using the join command and choose which fields you want to have in the output:
kent$ join -t',' -o 1.2,2.2 file1 file2
44,7
21,7
65,4
79,11
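To see the awk lookup approach end to end, here is a minimal sketch that recreates the sample files in a scratch directory and runs the keyed lookup on them. The directory and the names tmpdir, val and result are illustrative, not part of the original answers.

```shell
# Recreate the sample CSV files from the question.
tmpdir=$(mktemp -d)
printf '%s\n' 'A,44' 'A,21' 'B,65' 'C,79' > "$tmpdir/file1"
printf '%s\n' 'A,7' 'B,4' 'C,11' > "$tmpdir/file2"

# Pass 1 (file2): map each key to its second column.
# Pass 2 (file1): for keys seen in file2, print the value from file1,
# then the value stored from file2.
result=$(awk 'BEGIN { FS = OFS = "," }
  FNR==NR     { val[$1] = $2; next }
  ($1 in val) { print $2, val[$1] }' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Reading file2 first keeps the output in the order of file1 and lets each repeated key in file1 (A here) pick up the same value from file2. The join command, by contrast, expects both inputs to be sorted on the join field, which the sample files happen to be.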
bash compare two columns with exact match
I am comparing columns between two files for an exact match, but I am ending up with an inaccurate result. Example as follows.
File1   File2
adam    sunny
jhon    adam
kelly   adam
matt    kevin
stuart  adam
Gary    Gary
When we look at the files, there is only one match, i.e. Gary. My output should be the following.
Emptyline
Emptyline
Emptyline
Emptyline
Emptyline
Gary
To achieve this, I am running the following command
awk 'NR==FNR { n[$1]=$0;next } ($1 in n) { print n[$1],$2 }' file1 file2
and I am getting the following output
adam
adam
adam
Gary
You should be tracking line numbers, not just line contents:
$ awk 'NR==FNR { lines[NR]=$0; next } { if ($0 == lines[FNR]) print; else print "" }' file1.txt file2.txt
The output is five empty lines followed by:
Gary
1st solution: With simple awk.
awk 'FNR==NR{a[FNR]=$0;next} a[FNR]==$0{print;next} {print ""}' file1 file2
OR as per anubhava sir's comment:
awk 'FNR==NR{a[FNR]=$0;next} a[FNR]!=$0{$0=""} 1' file1 file2
Explanation: a commented version of the first command.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR, which is TRUE while the first file, file1, is being read.
a[FNR]=$0 ##Creating an array a with index FNR (the line number within the file) and value of current line here.
next ##next will skip all further statements from here.
}
a[FNR]==$0{ ##Checking if the line of file1 stored under the same line number equals the current line of file2, then do following.
print ##Printing current line here.
next ##Skipping the last block for this line.
}
{ ##Reached only when the two lines differ.
print "" ##Printing an empty line here.
}
' file1 file2 ##Mentioning Input_file names here.
2nd solution: Considering that each of your actual Input_file(s) has a single column as per the shown samples (so that paste produces 2 columns), could you please try the following.
paste Input_file1 Input_file2 | awk '$1==$2{print $1};$1!=$2{print ""}'
This code prints the value for lines which are equal in Input_file1 and Input_file2, and an empty line for all other lines.
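The position-by-position comparison can be checked on the sample data with a short sketch like the one below. It is only an illustration: the scratch directory and the names tmpdir, line and result are assumptions, and the awk program is a compact rewrite of the idea rather than any answer's exact command.

```shell
# Recreate the two single-column sample files from the question.
tmpdir=$(mktemp -d)
printf '%s\n' adam jhon kelly matt stuart Gary > "$tmpdir/file1"
printf '%s\n' sunny adam adam kevin adam Gary > "$tmpdir/file2"

# Pass 1 (file1): remember every line under its line number.
# Pass 2 (file2): print the line if it equals the line of file1 at the
# same position, otherwise print an empty line.
result=$(awk 'NR==FNR { line[FNR] = $0; next }
  { print ($0 == line[FNR] ? $0 : "") }' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Keying the array on FNR rather than on the line content is what prevents the three adam lines of file2 from matching the single adam on line 1 of file1.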
Extract info using awk
My input is:
group;|line1_1|line2_2|line3_3
I want to extract the information line1,line2,line3, as in the output below:
line1,line2,line3
I have tried the following command, but it does not work:
LINE="group;|line1_1|line2_2|line3_3"; echo $LINE | awk -F ";" '{print $2}' | awk -F "|" '{for(i=1;i<=NF;++i){print $i | system("cut -d _ -f1")}}'
Considering that your actual Input_file is the same as the shown samples, could you please try the following?
awk -F'|' '{for(i=2;i<=NF;i++){sub(/_.*/,"",$i);val=val?val OFS $i:$i};print val;val=""}' OFS="," Input_file
2nd solution: Using only sub and gsub of awk here.
awk '{sub(/^[^|]*\|/,"");gsub(/_[0-9]\|/,",");sub(/_[0-9]$/,"")} 1' Input_file
OR
awk '{gsub(/^[^|]*\||_[0-9]$/,"");gsub(/_[0-9]\|/,",")} 1' Input_file
Output will be as follows.
line1,line2,line3
Explanation: a commented version of the first command.
awk -F'|' ' ##Setting field separator as | here for all lines.
{ ##Starting block here.
for(i=2;i<=NF;i++){ ##Starting for loop from i=2 till value of NF here.
sub(/_.*/,"",$i) ##Using sub to substitute everything from the first _ onward with NULL in value of $i.
val=val?val OFS $i:$i ##Appending $i to variable val, preceded by OFS, when val already has a NON ZERO value, else saving $i in it for the 1st time.
} ##Close for loop block here.
print val ##Printing value of val here.
val="" ##Nullifying value of variable val here.
}' OFS="," Input_file ##Setting value of OFS to comma here and mentioning Input_file name here.
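As a further illustration, the same extraction can be sketched with split() instead of sub(), which leaves the fields themselves unmodified. This is a hedged variant, not one of the commands from the answer; the names line, part, sep, out and result are illustrative.

```shell
# The sample input from the question.
line='group;|line1_1|line2_2|line3_3'

# Split the record on "|", skip the first field ("group;"), and keep only
# the part of each remaining field that precedes the first "_".
result=$(printf '%s\n' "$line" | awk -F'|' '{
  out = ""; sep = ""
  for (i = 2; i <= NF; i++) {
    split($i, part, "_")       # part[1] holds the text before the first "_"
    out = out sep part[1]
    sep = ","                  # every value after the first gets a comma
  }
  print out
}')

printf '%s\n' "$result"
```

Resetting out and sep at the start of each record plays the same role as val="" in the one-liner, so multi-line input does not carry values over from one line to the next.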
Comparing Only First 4 columns from File1 (CSV Formatted ) to First 4 columns in File2 and printing all columns from File1 i
I'm trying to compare two files in CSV format. The first file is dynamic: columns are added to it daily. The second file has only 4 columns (static). I want to compare the first 4 columns of File1 with the 4 columns of File2 and print all columns of the File1 lines which match File2.
Ex:
File1
AIXTSM1,VHOST,10.199.114.72,DAILY_1800_VM_SDC-CTL-PROD3,COMP,COMP,COMP
AIXTSM1,VHOST,ADMET007,DAILY_1800_VM_SDC-CTL-PROD3,COMP,COMP,COMP
AIXTSM2,VHOST,ADMET014,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM1,VHOST,AGGREGATE,DAILY_2200_VM_SDC-CTL-PROD5,COMP,COMP,COMP
AIXTSM1,PHOST,APLEE01,DAILY_2000_SU_W3,COMP,COMP,COMP
AIXTSM1,PHOST,APYRK02,DAILY_2000_SU_W3,COMP,COMP,COMP
AIXTSM2,PHOST,APYRK04,DAILY_1800_V7K,COMP,COMP,COMP
AIXTSM1,VHOST,ARCLIC01,DAILY_2200_VM_SDC-CTL-PROD5,COMP,COMP,COMP
AIXTSM2,PHOST,ARIELN,DAILY_1800_V7K,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET005,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM2,PHOST,ASMET014,WIN_INCRE_2000,COMP,COMP,COMP
AIXTSM1,VHOST,ASMET038,DAILY_1800_VM_SDC-CTL-PROD2,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET042,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM1,VHOST,ASMET044,DAILY_1800_VM_SDC-CTL-PROD3,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET046,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET068,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET069,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET070,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET071,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET072,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET073,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM1,PHOST,ASMET074,DAILY_INCR_1900,COMP,COMP,COMP
AIXTSM1,VHOST,ASMET084-T,DAILY_1800_VM_SDC-CTL-PROD3,COMP,COMP,COMP
File2
AIXTSM1,VHOST,10.199.114.72,DAILY_1800_VM_SDC-CTL-PROD
AIXTSM1,VHOST,ADMET007,DAILY_1800_VM_SDC-CTL-PROD3
AIXTSM2,VHOST,ADMET014,DAILY_1900_VM_UDC-CTL-PROD
AIXTSM1,VHOST,AGGREGATE,DAILY_2200_VM_SDC-CTL-PROD5
Result
AIXTSM1,VHOST,10.199.114.72,DAILY_1800_VM_SDC-CTL-PROD3,COMP,COMP,COMP,OK
AIXTSM1,VHOST,ADMET007,DAILY_1800_VM_SDC-CTL-PROD3,COMP,COMP,COMP,OK
AIXTSM2,VHOST,ADMET014,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP,OK
AIXTSM1,VHOST,AGGREGATE,DAILY_2200_VM_SDC-CTL-PROD5,COMP,COMP,COMP,OK
Code
awk -F, 'NR==FNR{ arr[$2]=$1 $2 $3 $4; next } { print $0, (arr[$2]==$1 $2 $3 $4?"OK":"NOK") }' OFS=, File2 File1
But it matches only the first line.
I believe the first line of your expected output is a typo, since the 4 fields of the first line of Input_file2 do not all occur in Input_file1: its 4th field is DAILY_1800_VM_SDC-CTL-PROD in Input_file2 but DAILY_1800_VM_SDC-CTL-PROD3 in Input_file1. Could you please try the following.
awk 'BEGIN{FS=OFS=","}FNR==NR{a[$1,$2,$3,$4];next} (($1,$2,$3,$4) in a){print $0, "OK"}' Input_file2 Input_file1
Explanation: a commented version of the above code.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of awk program here.
FS=OFS="," ##Setting field separator and output field separator as comma(,) here.
}
FNR==NR{ ##FNR==NR condition is TRUE while the 1st Input_file, named Input_file2, is being read.
a[$1,$2,$3,$4] ##Creating an array named a whose index is $1,$2,$3,$4 fields of Input_file2 lines.
next ##next will skip all further statements from here.
} ##Closing first condition block now.
(($1,$2,$3,$4) in a){ ##Checking condition if $1,$2,$3,$4 of Input_file1 are present in array a, if yes then do following.
print $0,"OK" ##Printing current line with OFS and OK string here now.
}
' Input_file2 Input_file1 ##Mentioning Input_file name(s) Input_file2 and Input_file1 here.
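The command above prints only the matching lines. If every line of File1 should be labelled, as the OK/NOK ternary in the question's attempt suggests, a variant could look like the sketch below. It is an assumption about the desired behaviour, run on a three-line subset of File1 and a two-line subset of File2 from the question; the names tmpdir, key and result are illustrative.

```shell
# A small subset of the sample data from the question.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'AIXTSM1,VHOST,ADMET007,DAILY_1800_VM_SDC-CTL-PROD3,COMP,COMP,COMP' \
  'AIXTSM2,VHOST,ADMET014,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP' \
  'AIXTSM1,PHOST,APLEE01,DAILY_2000_SU_W3,COMP,COMP,COMP' > "$tmpdir/File1"
printf '%s\n' \
  'AIXTSM1,VHOST,ADMET007,DAILY_1800_VM_SDC-CTL-PROD3' \
  'AIXTSM2,VHOST,ADMET014,DAILY_1900_VM_UDC-CTL-PROD' > "$tmpdir/File2"

# Pass 1 (File2): store the 4-field combination as a composite array index.
# Pass 2 (File1): append OK when the first 4 fields were stored, else NOK.
result=$(awk 'BEGIN { FS = OFS = "," }
  FNR==NR { key[$1,$2,$3,$4]; next }
  { print $0, ((($1,$2,$3,$4) in key) ? "OK" : "NOK") }' \
  "$tmpdir/File2" "$tmpdir/File1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Using the comma-separated subscript keeps the four fields distinct in the index, whereas the plain concatenation $1 $2 $3 $4 in the question's attempt could make different field combinations collide, and indexing by $2 alone kept only the last File2 line per $2 value.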
vlookup function between 2 files and append matches at EOL
I need to do a vlookup between two different files which have multiple entries:
cat file1.csv
aaaaaaa;24/09/2018;06/09/2018;1;89876768
bbbbbbb;15/09/2018;03/09/2018;2;76958489
ccccccc;10/09/2018;28/08/2018;3;57848472
ddddddd;22/09/2018;08/09/2018;4;17929730
eeeeeee;19/09/2018;30/08/2018;5;18393770
cat file2.csv
20180901;abc;1
20180901;sdf;2
20180904;jhh;2
20180905;skf;3
20180911;asf;2
20180923;ghf;4
20180925;asb;4
20180918;mnj;3
The fourth column of file1.csv is the identifier found in the third column of file2.csv. The required output is:
aaaaaaa;24/09/2018;06/09/18;1;89876768;20180901
bbbbbbb;15/09/2018;03/09/18;2;76958489;20180901;20180904;20180911
ccccccc;10/09/2018;28/08/18;3;57848472;20180905;20180918
ddddddd;22/09/2018;08/09/18;4;17929730;20180923;20180925
eeeeeee;19/09/2018;30/08/18;5;18393770;unknown
Could you please try the following.
awk 'BEGIN{FS=OFS=";"}FNR==NR{a[$NF]=a[$NF]?a[$NF] OFS $1:$1;next} {print ($4 in a)?$0 OFS a[$4]:$0 OFS "unknown"}' file2.csv file1.csv
Output will be as follows.
aaaaaaa;24/09/2018;06/09/2018;1;89876768;20180901
bbbbbbb;15/09/2018;03/09/2018;2;76958489;20180901;20180904;20180911
ccccccc;10/09/2018;28/08/2018;3;57848472;20180905;20180918
ddddddd;22/09/2018;08/09/2018;4;17929730;20180923;20180925
eeeeeee;19/09/2018;30/08/2018;5;18393770;unknown
Explanation of code:
awk '
BEGIN{ ##Starting BEGIN section for awk here.
FS=OFS=";" ##Setting values for FS and OFS as semicolon.
} ##Closing block for BEGIN section here.
FNR==NR{ ##Checking condition FNR==NR, which is TRUE while the first Input_file, named file2.csv, is being read.
a[$NF]=a[$NF]?a[$NF] OFS $1:$1 ##Creating an array named a whose index is $NF; $1 is appended to any value already stored under the same index, separated by OFS.
next ##next will skip all further statements from here.
} ##Closing block for FNR==NR condition here.
{
print ($4 in a)?$0 OFS a[$4]:$0 OFS "unknown" ##Executed while the 2nd Input_file is being read: if $4 is present in array a, the stored value is appended to the current line, else unknown is appended.
}' file2.csv file1.csv ##Mentioning Input_file names here.
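The collect-then-append pattern can be tried on a reduced data set with a sketch like the following. It uses three lines of file1.csv and four lines of file2.csv from the question, and spells out the accumulation with an explicit in test instead of the ternary. The names tmpdir, dates and result are illustrative assumptions.

```shell
# A reduced subset of the sample files from the question.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'aaaaaaa;24/09/2018;06/09/2018;1;89876768' \
  'bbbbbbb;15/09/2018;03/09/2018;2;76958489' \
  'eeeeeee;19/09/2018;30/08/2018;5;18393770' > "$tmpdir/file1.csv"
printf '%s\n' \
  '20180901;abc;1' '20180901;sdf;2' '20180904;jhh;2' '20180911;asf;2' \
  > "$tmpdir/file2.csv"

# Pass 1 (file2.csv): collect all dates per identifier (last field),
# joined with ";".
# Pass 2 (file1.csv): append the collected dates for the identifier in $4,
# or "unknown" when the identifier never occurred in file2.csv.
result=$(awk 'BEGIN { FS = OFS = ";" }
  FNR==NR {
    if ($NF in dates) dates[$NF] = dates[$NF] OFS $1
    else              dates[$NF] = $1
    next
  }
  { print $0, (($4 in dates) ? dates[$4] : "unknown") }' \
  "$tmpdir/file2.csv" "$tmpdir/file1.csv")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Testing membership with in rather than the truth value of the stored string means a first value that is empty or 0 would still be kept, which the ternary form would treat as "nothing stored yet".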