change the date of month in number format in awk [duplicate] - gawk

This question already has answers here:
Calculate date difference between $2,$3 from file in awk
(2 answers)
Closed 9 years ago.
File 1
P1,06/Jul/2013,09/Jul/2013
P2,06/Jul/2013,10/Jul/2013
P3,06/Jul/2013,15/Jul/2013
Output I want like this:
P1,06/07/2013,09/07/2013,3days
P2,06/07/2013,10/07/2013,4days
P3,06/07/2013,15/07/2013,9days
Can someone please help with this?

This answer is heavily dependent on the BSD date formatting available on macOS.
#!/usr/bin/awk -f
BEGIN { FS=" " }
{
split( $0, arr, "," )
ts1 = date2ts( arr[ 2 ] )
ts2 = date2ts( arr[ 3 ] )
days = (ts2-ts1)/86400
date1 = ts2date( ts1 )
date2 = ts2date( ts2 )
printf( "%s,%s,%s,%ddays\n", arr[ 1 ], date1, date2, days )
}
function runCmd( cmd ) {
cmd | getline output
close( cmd )
gsub( "\"", "", output )
return output
}
function date2ts( date ) {
return runCmd( sprintf( \
"date -j -f\\\"%%d/%%b/%%Y\\\" \\\"%s\\\" +\\\"%%s\\\"", date ) )
}
function ts2date( ts ) {
return runCmd( sprintf( \
"date -j -f\\\"%%s\\\" \\\"%s\\\" +\\\"%%d/%%m/%%Y\\\"", ts ) )
}
I get the following output:
P1,06/07/2013,09/07/2013,3days
P2,06/07/2013,10/07/2013,4days
P3,06/07/2013,15/07/2013,9days
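The day count falls out of plain epoch arithmetic: the two date commands convert each date to seconds since the epoch, and dividing the difference by 86400 (seconds per day) gives whole days. A minimal sketch with hardcoded epoch values (06 Jul and 09 Jul 2013, midnight UTC):

```shell
awk 'BEGIN {
    ts1 = 1373068800            # 06 Jul 2013 00:00:00 UTC
    ts2 = 1373328000            # 09 Jul 2013 00:00:00 UTC
    print (ts2 - ts1) / 86400 "days"   # 259200 / 86400 = 3days
}'
```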

Here is another awk solution.
#!/usr/bin/awk -f
BEGIN {
FS=",";
}
{
# close each command pipe after reading so open pipes don't accumulate
cmd = epoch_date_format($2); cmd | getline d1; close(cmd)
cmd = epoch_date_format($3); cmd | getline d2; close(cmd)
days=(d2-d1)/3600/24;
cmd = month_format(d1); cmd | getline sd1; close(cmd)
cmd = month_format(d2); cmd | getline sd2; close(cmd)
print $1","sd1","sd2","days"days"
}
function epoch_date_format( string ) {
split(string,array,"/");
return "date -d\""array[1]"-"array[2]"-"array[3]"\" +%s";
}
function month_format( epoch ) {
return "date -d @"epoch" +%d/%m/%Y"
}
Output:
P1,06/07/2013,09/07/2013,3days
P2,06/07/2013,10/07/2013,4days
P3,06/07/2013,15/07/2013,9days

With GNU awk:
awk -F'[,/]' -v OFS=',' '
function date2secs(fld) {
return mktime($(fld+1)" "(match("JanFebMarAprMayJunJulAugSepOctNovDec",$fld)+2)/3" "$(fld-1)" 0 0 0")
}
{
start= date2secs(3)
end = date2secs(6)
diff = int((end-start)/(60*60*24))
print $1,strftime("%d/%m/%Y",start),strftime("%d/%m/%Y",end),diff"days"
}
' file
P1,06/07/2013,09/07/2013,3days
P2,06/07/2013,10/07/2013,4days
P3,06/07/2013,15/07/2013,9days
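The compact month lookup in date2secs relies on match() returning the 1-based position of the abbreviation inside the packed month string; (position + 2) / 3 then maps that offset to the month number. A minimal sketch of just that trick:

```shell
# match() gives the 1-based offset of the month abbreviation in the
# packed string; "Jul" starts at position 19, and (19 + 2) / 3 = 7.
awk 'BEGIN {
    m = "Jul"
    printf "%d\n", (match("JanFebMarAprMayJunJulAugSepOctNovDec", m) + 2) / 3
}'
```

"Jan" gives (1 + 2) / 3 = 1 and "Dec" gives (34 + 2) / 3 = 12, so the mapping covers the whole year.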

Related

Split big file in multiple files based on column

I have a file with a semicolon as delimiter and headers. I would like to split that file based on the date column. The file has dates in ascending order.
The name of the output file should be as follows: 01_XX_YYMMDD_YYMMDD.txt
e.g. 01_XX_210920_210920.txt
Here's an example file:
--INPUT
K;V1.00;;;;;;
P;01.01.2021 00:01;16;EXA;31;TESTA;95.9;XXXX
P;01.01.2021 00:02;33;EXA;31;TESTA;95.9;XYXY
P;02.01.2021 00:54;16;EXB;33;TESTB;94.0;DWAD
P;02.01.2021 00:56;11;EXB;33;TESTB;94.0;DADA
P;03.01.2021 01:00;16;EXC;32;TESTC;94.6;WEWEQ
P;03.01.2021 01:22;16;EXC;32;TESTC;94.6;QEQR
P;04.01.2021 02:39;16;EXD;33;TESTD;94.3;DFAG
The output should be as follows, while taking the previous file as example
--OUTPUT FILES
FILE1: 01_XX_210101_210101.txt
P;01.01.2021 00:01;16;EXA;31;TESTA;95.9;XXXX
P;01.01.2021 00:02;33;EXA;31;TESTA;95.9;XYXY
FILE2: 01_XX_210102_210102.txt
P;02.01.2021 00:54;16;EXB;33;TESTB;94.0;DWAD
P;02.01.2021 00:56;11;EXB;33;TESTB;94.0;DADA
FILE3: 01_XX_210103_210103.txt
P;03.01.2021 01:00;16;EXC;32;TESTC;94.6;WEWEQ
P;03.01.2021 01:22;16;EXC;32;TESTC;94.6;QEQR
FILE4: 01_XX_210104_210104.txt
P;04.01.2021 02:39;16;EXD;33;TESTD;94.3;DFAG
I tried awk but had no success because of the timestamp in my file…
Thank you!
UPDATE: Solution
awk -F';' '
NR > 1 {
dt = substr($2,9,2) substr($2,4,2) substr($2,1,2)
print > ("01_LPR_" dt "_" dt ".txt")
}' input
You may try this awk:
awk -F';' '
NR > 1 {
dt = substr($2,9,2) substr($2,4,2) substr($2,1,2)
print > ("01_XX_" dt "_" dt ".txt")
}' input
For the updated requirements in comments below:
awk -F';' '
NR == 1 {
hdr = $0
next
}
{
dt = substr($2,9,2) substr($2,4,2) substr($2,1,2)
}
dt != pdt {
if (pdt) {
print "END" > fn
close(fn)
}
fn = "01_XX_" dt "_" dt ".txt"
print hdr > fn
}
{
print > fn
pdt = dt
}
END {
print "END" > fn
close(fn)
}' input
With your shown samples, please try the following awk code. It uses the close function, which also takes care of avoiding the "too many open files" error.
awk -F'\\.| |;' '
{
outputFile="01_XX_"substr($4,3)$3 $2"_"substr($4,3)$3 $2".txt"
}
FNR>1{
if(prev!=outputFile){
close(prev)
}
print > (outputFile)
prev=outputFile
}
' Input_file
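The close-on-change pattern used above matters because awk keeps every file named in print > file open until close() is called; with many distinct dates, the process can hit its open-file limit. A minimal sketch of the pattern with hypothetical out_ file names (written to the current directory), splitting three lines on the key in $1:

```shell
printf '1 a\n1 b\n2 c\n' | awk '
{
    fn = "out_" $1 ".txt"          # one output file per key in $1
    if (fn != prev) {              # key changed: close the finished file
        if (prev) close(prev)
        prev = fn
    }
    print > fn
}'
```

After running, out_1.txt holds the two lines keyed 1 and out_2.txt holds the line keyed 2.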
Try the following script:
while read; do
day=${REPLY:2:2}
month=${REPLY:5:2}
year=${REPLY:10:2}
echo "$REPLY" >> 01_XX_${year}${month}${day}_${year}${month}${day}.txt
done<inputfile.txt
or the same as a "one-liner":
while read; do echo "$REPLY" >> 01_XX_${REPLY:10:2}${REPLY:5:2}${REPLY:2:2}_${REPLY:10:2}${REPLY:5:2}${REPLY:2:2}.txt; done<inputfile.txt
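The ${REPLY:offset:length} expansions slice the date fields straight out of the line by position (this is a bash feature, not POSIX sh). A quick sketch with one sample record; offsets are 0-based, so the day starts at offset 2 after the leading "P;":

```shell
REPLY='P;01.01.2021 00:01;16;EXA;31;TESTA;95.9;XXXX'
day=${REPLY:2:2}     # "01"
month=${REPLY:5:2}   # "01"
year=${REPLY:10:2}   # "21" (last two digits of 2021)
echo "01_XX_${year}${month}${day}_${year}${month}${day}.txt"
# 01_XX_210101_210101.txt
```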

Writing from lines into columns based on third column

I have files that look like this -- already sorted by year (inside years sorted by id, which appears to be equivalent to strict sorting by id, but this may not always apply).
ID,COU,YEA, VOT
1,USA,2000,1
2,USA,2000,0
3,USA,2001,1
4,USA,2003,2
5,USA,2003,0
I would like to rewrite them like this (ids for year N after 1999 in column 2N-1, corresponding votes in column 2N):
2000 IDS, VOTE, 2001 IDS, VOTE, 2002 IDS, VOTE, 2003 IDS, VOTE
1,1,3,1, , ,4,2
2,0, , , , ,5,0
I don't know how to do it. My basic thinking with awk was:
if $3 == 2000, { print $1, $4 }
if $3 == 2001, { print " "," ", $1, $4 } etc
But there are two problems:
this way the columns for years other than 2000 would start with a lot of empty lines
I have found no intelligent way to generalise the print command, so I would have to write 20 if-statements
The only working idea I have is, to create 20 unneeded files and glue them with paste which I have never used, but which seems suitable, according to man on my system.
The key is to use multidimensional arrays
BEGIN {FS = ","}
NR == 2 {minYear = maxYear = $3}
NR > 1 {
year=$3
count[year]++
id[year, count[year]] = $1
vote[year, count[year]] = $4
if (year < minYear) minYear = year
if (year > maxYear) maxYear = year
if (count[year] > maxCount) maxCount = count[year]
}
END {
sep = ""
for (y=minYear; y<=maxYear; y++) {
printf "%s%d ID,VOTE", sep, y
sep = ","
}
print ""
for (i=1; i<=maxCount; i++) {
sep = ""
for (y=minYear; y<=maxYear; y++) {
printf "%s%s,%s", sep, id[y, i], vote[y, i]
sep = ","
}
print ""
}
}
Then,
$ awk -f transpose.awk input_file
2000 ID,VOTE,2001 ID,VOTE,2002 ID,VOTE,2003 ID,VOTE
1,1,3,1,,,4,2
2,0,,,,,5,0
If you really want the spaces in the output, change the last printf to
printf "%s%s,%s", sep,
((y, i) in id ? id[y, i] : " "),
((y, i) in vote ? vote[y, i] : " ")
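The (y, i) in id test is worth noting: it checks membership for a multidimensional (SUBSEP-joined) key without creating the element, whereas referencing id[y, i] directly would create it as an empty entry. A minimal sketch with illustrative keys:

```shell
awk 'BEGIN {
    a[2000, 1] = "x"
    # "(k1, k2) in arr" tests the joined key without side effects
    if ((2000, 1) in a)    print "2000,1 present"
    if (!((2001, 1) in a)) print "2001,1 absent"
}'
```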
This is functionally the same as @Glenn's and no better than it in any way, so his should remain the accepted answer, but I came up with it before looking at his and thought it might be useful to post anyway to show some small alternatives in style and implementation details:
$ cat tst.awk
BEGIN { FS=OFS="," }
NR == 1 { next }
{
id = $1
year = $3
votes = $4
if ( ++numYears == 1 ) {
begYear = year
}
endYear = year
yearIds[year,++numIds[year]] = id
yearVotes[year,numIds[year]] = votes
maxIds = (numIds[year] > maxIds ? numIds[year] : maxIds)
}
END {
for (year=begYear; year<=endYear; year++) {
printf "%s IDS%sVOTE%s", year, OFS, (year<endYear ? OFS : ORS)
}
for (idNr=1; idNr<=maxIds; idNr++) {
for (year=begYear; year<=endYear; year++) {
id = votes = " "
if ( (year,idNr) in yearIds ) {
id = yearIds[year,idNr]
votes = yearVotes[year,idNr]
}
printf "%s%s%s%s", id, OFS, votes, (year<endYear ? OFS : ORS)
}
}
}
$ awk -f tst.awk file
2000 IDS,VOTE,2001 IDS,VOTE,2002 IDS,VOTE,2003 IDS,VOTE
1,1,3,1, , ,4,2
2,0, , , , ,5,0
With the respect and permission of glenn jackman, I am taking his suggested code. The only thing I am trying to add here is getting the maximum and minimum years into awk variables up front, rather than calculating them inside the main block of the awk program, since the OP confirmed that the Input_file is sorted by year. The answers by Glenn and Ed sir are awesome; I just thought to add a variant here.
BTW, we could use awk instead of tail and head for the variables too :)
awk -v max=$(tail -1 Input_file | cut -d, -f3) -v min=$(head -2 Input_file | tail -1 | cut -d, -f3) '
BEGIN { FS = "," }
NR > 1 {
year=$3
count[year]++
id[year, count[year]] = $1
vote[year, count[year]] = $4
if (count[year] > maxCount) maxCount = count[year]
}
END {
sep = ""
for (y=min; y<=max; y++) {
printf "%s%d ID,VOTE", sep, y
sep = ","
}
print ""
for (i=1; i<=maxCount; i++) {
sep = ""
for (y=min; y<=max; y++) {
printf "%s%s,%s", sep, id[y, i], vote[y, i]
sep = ","
}
print ""
}
}' Input_file

awk count selective combinations only:

I would like to read and count the fields whose value == "TRUE", from the 3rd field to the 5th field only.
Input.txt
Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,FALSE,TRUE,ab1234
ab123,Name2,FALSE,FALSE,TRUE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,ab1234
While reading the headers from the 3rd field to the 5th field, i.e. A, B, C, I want to generate the unique combinations A,B,C,AB,AC,BC,ABC only.
Note: AA, BB, CC, BA etc. are excluded
If the "TRUE" is considered for the "AB" combination count, then it should not be considered for the "A" count & "B" count again, to avoid duplicates.
Example#1
Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234
Op#1
Desc,A,B,C,AB,AC,BC,ABC
Name1,,,,,,,1
Example#2
Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,FALSE,ab1234
Op#2
Desc,A,B,C,AB,AC,BC,ABC
Name1,,,,1,,,
Example#3
Locationx,Desc,A,B,C,Locationy
ab123,Name1,FALSE,TRUE,FALSE,ab1234
Op#3
Desc,A,B,C,AB,AC,BC,ABC
Name1,,1,,,,,
Desired Output:
Desc,A,B,C,AB,AC,BC,ABC
Name1,,,,1,,,2
Name2,,,1,,1,,1
Name3,1,,,2,,,
Actual File is like below :
Input.txt
Locationx,Desc,INCOMING,OUTGOING,SMS,RECHARGE,DEBIT,DATA,Locationy
ab123,Name1,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,ab1234
ab123,Name2,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,ab1234
I have tried a lot, but nothing has materialised. Any suggestions please!
Edit: Desired Output from Actual Input:
Desc,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DEBIT-DATA,INCOMING-SMS-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT,SMS-RECHARGE-DEBIT-DATA,OUTGOING-RECHARGE-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DATA,OUTGOING-SMS-RECHARGE-DEBIT,INCOMING-RECHARGE-DEBIT-DATA,INCOMING-SMS-DEBIT-DATA,INCOMING-SMS-RECHARGE-DATA,INCOMING-SMS-RECHARGE-DEBIT,INCOMING-OUTGOING-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT,INCOMING-OUTGOING-SMS-DATA,INCOMING-OUTGOING-SMS-DEBIT,INCOMING-OUTGOING-SMS-RECHARGE,RECHARGE-DEBIT-DATA,SMS-DEBIT-DATA,SMS-RECHARGE-DATA,SMS-RECHARGE-DEBIT,OUTGOING-RECHARGE-DATA,OUTGOING-RECHARGE-DEBIT,OUTGOING-SMS-DATA,OUTGOING-SMS-DEBIT,OUTGOING-SMS-RECHARGE,INCOMING-DEBIT-DATA,INCOMING-RECHARGE-DATA,INCOMING-RECHARGE-DEBIT,INCOMING-SMS-DATA,INCOMING-SMS-DEBIT,INCOMING-SMS-RECHARGE,INCOMING-OUTGOING-DATA,INCOMING-OUTGOING-DEBIT,INCOMING-OUTGOING-RECHARGE,INCOMING-OUTGOING-SMS,DEBIT-DATA,RECHARGE-DATA,RECHARGE-DEBIT,SMS-DATA,SMS-DEBIT,SMS-RECHARGE,OUTGOING-DATA,OUTGOING-DEBIT,OUTGOING-RECHARGE,OUTGOING-SMS,INCOMING-DATA,INCOMING-DEBIT,INCOMING-RECHARGE,INCOMING-SMS,INCOMING-OUTGOING,DATA,DEBIT,RECHARGE,SMS,OUTGOING,INCOMING
Name1,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,,,1,,,,,,,,,,,,,,,,,,,,,
Name2,,,,1,1,,,,,,,,,,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Name3,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,,,,,,,,,,,,,,,,,,,1,,,
I don't have Perl or Python access!
I have written a perl script that does this for you. As you can see from the size and comments, it is really simple to get this done.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
use Algorithm::Combinatorics qw(combinations);
## change the file to the path where your file exists
open my $fh, '<', 'file';
my (%data, @new_labels);
## capture the header line in an array
my @header = split /,/, <$fh>;
## backup the header
my @fields = @header;
## remove first, second and last columns
@header = splice @header, 2, -1;
## generate unique combinations
for my $iter (1 .. +@header) {
my $combination = combinations(\@header, $iter);
while (my $pair = $combination->next) {
push @new_labels, "@$pair";
}
}
## iterate through rest of the file
while(my $line = <$fh>) {
my @line = split /,/, $line;
## identify combined labels that are true
my @is_true = map { $fields[$_] } grep { $line[$_] eq "TRUE" } 0 .. $#line;
## increment counter in hash map keyed at description and then new labels
++$data{$line[1]}{$_} for map { s/ /-/g; $_ } "@is_true";
}
## print the new header
print join ( ",", "Desc", map {s/ /-/g; $_} reverse @new_labels ) . "\n";
## print the description and counter values
for my $desc (sort keys %data){
print join ( ",", $desc, ( map { $data{$desc}{$_} //= "" } reverse @new_labels ) ) . "\n";
}
}
Output:
Desc,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DEBIT-DATA,INCOMING-SMS-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT-DATA,INCOMING-OUTGOING-SMS-DEBIT-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DATA,INCOMING-OUTGOING-SMS-RECHARGE-DEBIT,SMS-RECHARGE-DEBIT-DATA,OUTGOING-RECHARGE-DEBIT-DATA,OUTGOING-SMS-DEBIT-DATA,OUTGOING-SMS-RECHARGE-DATA,OUTGOING-SMS-RECHARGE-DEBIT,INCOMING-RECHARGE-DEBIT-DATA,INCOMING-SMS-DEBIT-DATA,INCOMING-SMS-RECHARGE-DATA,INCOMING-SMS-RECHARGE-DEBIT,INCOMING-OUTGOING-DEBIT-DATA,INCOMING-OUTGOING-RECHARGE-DATA,INCOMING-OUTGOING-RECHARGE-DEBIT,INCOMING-OUTGOING-SMS-DATA,INCOMING-OUTGOING-SMS-DEBIT,INCOMING-OUTGOING-SMS-RECHARGE,RECHARGE-DEBIT-DATA,SMS-DEBIT-DATA,SMS-RECHARGE-DATA,SMS-RECHARGE-DEBIT,OUTGOING-DEBIT-DATA,OUTGOING-RECHARGE-DATA,OUTGOING-RECHARGE-DEBIT,OUTGOING-SMS-DATA,OUTGOING-SMS-DEBIT,OUTGOING-SMS-RECHARGE,INCOMING-DEBIT-DATA,INCOMING-RECHARGE-DATA,INCOMING-RECHARGE-DEBIT,INCOMING-SMS-DATA,INCOMING-SMS-DEBIT,INCOMING-SMS-RECHARGE,INCOMING-OUTGOING-DATA,INCOMING-OUTGOING-DEBIT,INCOMING-OUTGOING-RECHARGE,INCOMING-OUTGOING-SMS,DEBIT-DATA,RECHARGE-DATA,RECHARGE-DEBIT,SMS-DATA,SMS-DEBIT,SMS-RECHARGE,OUTGOING-DATA,OUTGOING-DEBIT,OUTGOING-RECHARGE,OUTGOING-SMS,INCOMING-DATA,INCOMING-DEBIT,INCOMING-RECHARGE,INCOMING-SMS,INCOMING-OUTGOING,DATA,DEBIT,RECHARGE,SMS,OUTGOING,INCOMING
Name1,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,,,1,,,,,,,,,,,,,,,,,,,,,
Name2,,,,1,,1,,,,,,,,,,,,,,,,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Name3,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2,,,,,,,,,,,,,,,,,,,1,,,
Note: Please revisit your expected output. It has a few mistakes in it, as you can see from the output generated by the script above.
Here is an attempt at solving this using awk:
Content of script.awk
BEGIN { FS = OFS = "," }
function combinations(flds, itr, i, pre) {
for (i=++cnt; i<=numRecs; i++) {
++n
sep = ""
for (pre=1; pre<=itr; pre++) {
newRecs[n] = newRecs[n] sep (sprintf ("%s", flds[pre]));
sep = "-"
}
newRecs[n] = newRecs[n] sep (sprintf ("%s", flds[i])) ;
}
}
NR==1 {
for (fld=3; fld<NF; fld++) {
recs[++numRecs] = $fld
}
for (iter=0; iter<numRecs; iter++) {
combinations(recs, iter)
}
next
}
!seen[$2]++ { desc[++d] = $2 }
{
y = 0;
var = sep = ""
for (idx=3; idx<NF; idx++) {
if ($idx == "TRUE") {
is_true[++y] = recs[idx-2]
}
}
for (z=1; z<=y; z++) {
var = var sep sprintf ("%s", is_true[z])
sep = "-"
}
data[$2,var]++;
}
END{
printf "%s," , "Desc"
for (k=1; k<=n; k++) {
printf "%s%s", newRecs[k],(k==n?RS:FS)
}
for (name=1; name<=d; name++) {
printf "%s,", desc[name];
for (nR=1; nR<=n; nR++) {
printf "%s%s", (data[desc[name],newRecs[nR]]?data[desc[name],newRecs[nR]]:""), (nR==n?RS:FS)
}
}
}
Sample file
Locationx,Desc,A,B,C,Locationy
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,FALSE,TRUE,ab1234
ab123,Name2,FALSE,FALSE,TRUE,ab1234
ab123,Name1,TRUE,TRUE,TRUE,ab1234
ab123,Name2,TRUE,TRUE,TRUE,ab1234
ab123,Name3,FALSE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,FALSE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name3,TRUE,TRUE,FALSE,ab1234
ab123,Name1,TRUE,TRUE,FALSE,ab1234
Execution:
$ awk -f script.awk file
Desc,A,B,C,A-B,A-C,A-B-C
Name1,,,,1,,2
Name2,,,1,,1,1
Name3,1,,,2,,
Now, there is a pretty evident bug in the combinations function: it does not recurse to print all combinations. E.g., for A B C D it will print
A
B
C
AB
AC
ABC
but not BC

awk system not setting variables properly

I am having an issue getting the output of grep (used in system() in nawk) assigned to a variable.
nawk '{
CITIZEN_COUNTRY_NAME = "INDIA"
CITIZENSHIP_CODE=system("grep "CITIZEN_COUNTRY_NAME " /tmp/OFAC/country_codes.config | cut -d # -f1")
}' /tmp/*****
The value IND is displayed in the console, but when I do a printf the value of CITIZENSHIP_CODE is 0. Can you please help me here?
printf("Country Tags|%s|%s\n", CITIZEN_COUNTRY_NAME ,CITIZENSHIP_CODE)
Contents of country_codes.config file
IND#INDIA
IND#INDIB
CAN#CANADA
system returns the exit value of the called command, but the output of the command is not returned to awk (or nawk). To get the output, you want to use getline directly. For example, you might re-write your script:
awk ' {
file = "/tmp/OFAC/country_codes.config";
CITIZEN_COUNTRY_NAME = "INDIA";
FS = "#";
while( (getline < file) > 0 ) {
if( $0 ~ CITIZEN_COUNTRY_NAME ) {
CITIZENSHIP_CODE = $1;
}
}
close( file );
}'
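The contrast is easy to demonstrate: system() sends the command's output straight to the terminal and returns only its exit status, while a command pipe with getline captures the output. A small sketch, using echo as a stand-in for the grep pipeline:

```shell
awk 'BEGIN {
    cmd = "echo IND"
    rc = system(cmd)    # "IND" goes straight to the terminal; rc is the exit status (0)
    cmd | getline line  # a command pipe captures the output instead
    close(cmd)
    print "rc=" rc ", line=" line
}'
```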
Pre-load the config file with awk:
nawk '
NR == FNR {
split($0, x, "#")
country_code[x[2]] = x[1]
next
}
{
CITIZEN_COUNTRY_NAME = "INDIA"
if (CITIZEN_COUNTRY_NAME in country_code) {
value = country_code[CITIZEN_COUNTRY_NAME]
} else {
value = "null"
}
print "found " value " for country name " CITIZEN_COUNTRY_NAME
}
' country_codes.config filename

awk Merge two files based on common field and print similarities and differences

I have two files I would like to merge into a third, but I need to see both when they share a common field and where they differ. Since there are minor differences in other fields, I cannot use a diff tool, and I thought this could be done with awk.
File 1:
aWonderfulMachine 1 mlqsjflk
AnotherWonderfulMachine 2 mlksjf
YetAnother WonderfulMachine 3 sdg
TrashWeWon'tBuy 4 jhfgjh
MoreTrash 5 qsfqf
MiscelleneousStuff 6 qfsdf
MoreMiscelleneousStuff 7 qsfwsf
File2:
aWonderfulMachine 22 dfhdhg
aWonderfulMachine 23 dfhh
aWonderfulMachine 24 qdgfqf
AnotherWonderfulMachine 25 qsfsq
AnotherWonderfulMachine 26 qfwdsf
MoreDifferentStuff 27 qsfsdf
StrangeStuffBought 28 qsfsdf
Desired output:
aWonderfulMachine 1 mlqsjflk aWonderfulMachine 22 dfhdhg
aWonderfulMachine 23 dfhdhg
aWonderfulMachine 24 dfhh
AnotherWonderfulMachine 2 mlksjf AnotherWonderfulMachine 25 qfwdsf
AnotherWonderfulMachine 26 qfwdsf
File1
YetAnother WonderfulMachine 3 sdg
TrashWeWon'tBuy 4 jhfgjh
MoreTrash 5 qsfqf
MiscelleneousStuff 6 qfsdf
MoreMiscelleneousStuff 7 qsfwsf
File2
MoreDifferentStuff 27 qsfsdf
StrangeStuffBought 28 qsfsdf
I have tried a few awk scripts here and there, but they are either based on two fields only and I don't know how to modify the output, or they delete the duplicates based on two fields only, etc. (I am new to this and awk syntax is tough).
Thank you much in advance for your help.
You can come very close using these three commands:
join <(sort file1) <(sort file2)
join -v 1 <(sort file1) <(sort file2)
join -v 2 <(sort file1) <(sort file2)
This assumes a shell, such as Bash, that supports process substitution (<()). If you're using a shell that doesn't, the files would need to be pre-sorted.
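If it helps to see the three calls in action, here is a tiny self-contained run with two hypothetical files, each already sorted on the join key:

```shell
tmp=$(mktemp -d)
printf 'a 1\nb 2\n' > "$tmp/f1"
printf 'a 9\nc 3\n' > "$tmp/f2"
join "$tmp/f1" "$tmp/f2"        # keys in both files:  a 1 9
join -v 1 "$tmp/f1" "$tmp/f2"   # only in file1:       b 2
join -v 2 "$tmp/f1" "$tmp/f2"   # only in file2:       c 3
rm -r "$tmp"
```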
To do this in AWK:
#!/usr/bin/awk -f
BEGIN { FS="\t"; flag=1; file1=ARGV[1]; file2=ARGV[2] }
FNR == NR { lines1[$1] = $0; count1[$1]++; next } # process the first file
{ # process the second file and do output
lines2[$1] = $0;
count2[$1]++;
if ($1 != prev) { flag = 1 };
if (count1[$1]) {
if (flag) printf "%s ", lines1[$1];
else printf "\t\t\t\t\t"
flag = 0;
printf "\t%s\n", $0
}
prev = $1
}
END { # output lines that are unique to one file or the other
print "File 1: " file1
for (i in lines1) if (! (i in lines2)) print lines1[i]
print "File 2: " file2
for (i in lines2) if (! (i in lines1)) print lines2[i]
}
To run it:
$ ./script.awk file1 file2
The lines won't be output in the same order that they appear in the input files. The second input file (file2) needs to be sorted since the script assumes that similar lines are adjacent. You will probably want to adjust the tabs or other spacing in the script. I haven't done much in that regard.
One way to do it (albeit with hardcoded file names):
BEGIN {
FS="\t";
readfile(ARGV[1], s1);
readfile(ARGV[2], s2);
ARGV[1] = ARGV[2] = "/dev/null"
}
END{
for (k in s1) {
if ( s2[k] ) printpair(k,s1,s2);
}
print "file1:"
for (k in s1) {
if ( !s2[k] ) print s1[k];
}
print "file2:"
for (k in s2) {
if ( !s1[k] ) print s2[k];
}
}
function readfile(fname, sary) {
while ( getline <fname ) {
key = $1;
if (sary[key]) {
sary[key] = sary[key] "\n" $0;
} else {
sary[key] = $0;
};
}
close(fname);
}
function printpair(key, s1, s2) {
n1 = split(s1[key],l1,"\n");
n2 = split(s2[key],l2,"\n");
for (i=1; i<=max(n1,n2); i++){
if (i==1) {
b = l1[1];
gsub("."," ",b);
}
if (i<=n1) { f1 = l1[i] } else { f1 = b };
if (i<=n2) { f2 = l2[i] } else { f2 = b };
printf("%s\t%s\n",f1,f2);
}
}
function max(x,y){ z = x; if (y>x) z = y; return z; }
Not particularly elegant, but it handles many-to-many cases.