Here are the input:
files1.csv
21|AAAAA|1023
21|BBBBB|1203
21|CCCCC|2533
22|DDDDD|1294
22|EEEEE|1249
22|FFFFF|4129
22A|GGGGG|4121
22A|HHHHH|1284
31B|IIIII|5403
31B|JJJJJ|1249
file2.csv
21|A800
22|B900
22A|C1000
31B|D1000
expect output:
files3.csv
21|A800|AAAAA|1023
21|A800|BBBBB|1203
21|A800|CCCCC|2533
22|B900|EEEEE|1249
22|B900|FFFFF|4129
22A|C1000|GGGGG|4121
22A|C1000|HHHHH|1284
31B|D1000|IIIII|5403
31B|D1000|JJJJJ|1249
currently tried using join,
join -a1 -t '|' -1 1 -2 1 -o 1.1,2.2,1.2,1.3 file1.csv file2.csv > file3.csv
But it found that some rows missed matching, so i turn my concept to use most likely vlookup functionality for this two files. Please help.
Thanks all
Could you please try following with awk, written and tested with GNU awk with shown samples.
awk '
BEGIN{
FS=OFS="|"
}
FNR==NR{
arr[$1]=$2
next
}
($1 in arr){
$1=($1 OFS arr[$1])
}
1
' file2.csv file1.csv
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here of this program.
FS=OFS="|" ##Setting | as field separator and output field separator.
}
FNR==NR{ ##Checking condition if FNR==NR which will be TRUE when file2.csv is being read.
arr[$1]=$2 ##Creating arr with index of 1st field and value of 2nd field.
next ##next will skip all further statements from here.
}
($1 in arr){ ##checking condition if $1 is present in arr then do following.
$1=($1 OFS arr[$1]) ##Saving current $1 OFS and value of arr with index of $1 in $1.
}
1 ##1 will print the current line.
' file2.csv file1.csv ##Mentioning Input_file names here.
I tested the join command you provided and I think it produces the intended output on my machine (FreeBSD 12.2-RELEASE):
21|A800|AAAAA|1023
21|A800|BBBBB|1203
21|A800|CCCCC|2533
22|B900|DDDDD|1294
22|B900|EEEEE|1249
22|B900|FFFFF|4129
22A|C1000|GGGGG|4121
22A|C1000|HHHHH|1284
31B|D1000|IIIII|5403
31B|D1000|JJJJJ|1249
It's possible that you need to sort both files first on the columns (or in this case where you join the first columns the whole line should also work) that you intend to join, i.e. join -a1 -t '|' -1 1 -2 1 -o 1.1,2.2,1.2,1.3 <(sort file1.csv) <(sort file2.csv) > file3.csv
I have two CSV files in the form of
file1
A,44
A,21
B,65
C,79
file2
A,7
B,4
C,11
I used awk as
awk -F, 'NR==FNR{a[$1]=$0;next} ($1 in a){print a[$1]","$2 }' file1.csv file2.csv
producing
A,44,7
A,21,7
B,65,4
C,79,11
a[$1] prints the entire line from file1. How can I omit the first columns in both files (the first column is only used to match the second columns) to produce:
44,7
21,7
65,4
79,11
In other words, how can I pass the columns from the first file to the print block, as $2 does for the second file?
Could you please try following, tested and written on shown samples only.
awk 'BEGIN{FS=OFS=","} FNR==NR{a[$1]=$2;next} ($1 in a){print $2,a[$1]}' file2 file1
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS=OFS="," ##Setting field and output field separator as comma here.
}
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when file2 is being read.
a[$1]=$2 ##Creating array a with index $1 and value is $2 from current line.
next ##next will skip all further statement from here.
}
($1 in a){ ##Statements from here will be executed when file1 is being read and it's checking if $1 is present in array a then do following.
print $2,a[$1] ##Printing 2nd field and value of array a with index $1 here.
}
' file2 file1 ##Mentioning Input_file names here.
Output will be as follows for shown samples.
44,7
21,7
65,4
79,11
2nd solution: More Generic solution, where considering that your both Input_files could have duplicates in that case it will print 1st value of A in Input_file1 to first value of Input_file2 and so on.
awk '
BEGIN{
FS=OFS=","
}
FNR==NR{
a[$1]
b[$1,++c[$1]]=$2
next
}
($1 in a){
print $2,b[$1,++d[$1]]
}
' file2 file1
You can join them using the join command and chose which fields you want to have in the output:
kent$ join -t',' -o 1.2,2.2 file1 file2
44,7
21,7
65,4
79,11
I'm Trying to compare two Files in CSV format
First File is dynamic , daily the columns will be added ,
on second File it has only 4 columns ( static )
comparing First 4 columns on File1 to 4 columns on File2 and printing all columns from File1 which matches from File2
Ex :
File1
AIXTSM1,VHOST,10.199.114.72,DAILY_1800_VM_SDC-CTL-PROD3,COMP,COMP,COMP
AIXTSM1,VHOST,ADMET007,DAILY_1800_VM_SDC-CTL-PROD3,COMP,COMP,COMP
AIXTSM2,VHOST,ADMET014,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM1,VHOST,AGGREGATE,DAILY_2200_VM_SDC-CTL-PROD5,COMP,COMP,COMP
AIXTSM1,PHOST,APLEE01,DAILY_2000_SU_W3,COMP,COMP,COMP
AIXTSM1,PHOST,APYRK02,DAILY_2000_SU_W3,COMP,COMP,COMP
AIXTSM2,PHOST,APYRK04,DAILY_1800_V7K,COMP,COMP,COMP
AIXTSM1,VHOST,ARCLIC01,DAILY_2200_VM_SDC-CTL-PROD5,COMP,COMP,COMP
AIXTSM2,PHOST,ARIELN,DAILY_1800_V7K,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET005,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM2,PHOST,ASMET014,WIN_INCRE_2000,COMP,COMP,COMP
AIXTSM1,VHOST,ASMET038,DAILY_1800_VM_SDC-CTL-PROD2,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET042,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM1,VHOST,ASMET044,DAILY_1800_VM_SDC-CTL-PROD3,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET046,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET068,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET069,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET070,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET071,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET072,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM2,VHOST,ASMET073,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP
AIXTSM1,PHOST,ASMET074,DAILY_INCR_1900,COMP,COMP,COMP
AIXTSM1,VHOST,ASMET084-T,DAILY_1800_VM_SDC-CTL-PROD3,COMP,COMP,COMP
File2
AIXTSM1,VHOST,10.199.114.72,DAILY_1800_VM_SDC-CTL-PROD
AIXTSM1,VHOST,ADMET007,DAILY_1800_VM_SDC-CTL-PROD3
AIXTSM2,VHOST,ADMET014,DAILY_1900_VM_UDC-CTL-PROD
AIXTSM1,VHOST,AGGREGATE,DAILY_2200_VM_SDC-CTL-PROD5
Result
AIXTSM1,VHOST,10.199.114.72,DAILY_1800_VM_SDC-CTL-PROD3,COMP,COMP,COMP,OK
AIXTSM1,VHOST,ADMET007,DAILY_1800_VM_SDC-CTL-PROD3,COMP,COMP,COMP,OK
AIXTSM2,VHOST,ADMET014,DAILY_1900_VM_UDC-CTL-PROD,COMP,COMP,COMP,OK
AIXTSM1,VHOST,AGGREGATE,DAILY_2200_VM_SDC-CTL-PROD5,COMP,COMP,COMP,OK
Code
awk -F, 'NR==FNR{ arr[$2]=$1 $2 $3 $4; next } { print $0, (arr[$2]==$1 $2 $3 $4?"OK":"NOK") }' OFS=, File2 File1
But it matches only first line .
I believe your expected output's first line is typo? Since all 4 fields of Input_file2 are NOT coming in Input_file1. Could you please try following.
awk 'BEGIN{FS=OFS=","}FNR==NR{a[$1,$2,$3,$4];next} (($1,$2,$3,$4) in a){print $0, "OK"}' Input_file2 Input_file1
Explanation: Adding explanation for above code too here.
awk -F, ' ##Mentioning field separator as comma(,) here for all lines of Input_file(s).
BEGIN{ ##Starting BEGIN section of awk program here.
FS=OFS="," ##Setting field separator and output field separator as comma(,) here.
}
FNR==NR{ ##FNR==NR condition will be when 1st Input_file named Input_file2 is being read.
a[$1,$2,$3,$4] ##Creating an array named a whose index is $1,$2,$3,$4 fields of Input_file2 lines.
next ##next will skip all further statements from here.
} ##Closing first condition block now.
(($1,$2,$3,$4) in a){ ##Checking condition if $1,$2,$3,$4 of Input_file1 are present in array a if yes then do following.
print $0,"OK" ##Printing current line with OFS and OK string here now.
}
' Input_file2 Input_file1 ##Mentioning Input_file name(s) Input_file2 and Input_file1 here.