I have a large file with entries such as:
<VAL>17,451.26</VAL>
<VAL>353.93</VAL>
<VAL>395.00</VAL>
<VAL>2,405.00</VAL>
<DATE>31 Jul 2013</DATE>
<DATE>31 Jul 2013</DATE>
<DATE>31 Dec 2014</DATE>
<DATE>21 Jun 2002</DATE>
<DATE>10 Jul 2002</DATE>
<MOD>PL</MOD>
<BATCH>13382</BATCH>
<TYPE>Invoice</TYPE>
<REF1>13541/13382</REF1>
<REF2>671042638320</REF2>
<NOTES>a-07 final elec</NOTES>
<SNAME>EDF ENERGY ( Electricity )</SNAME>
<VAL>55.22</VAL>
</CLT>
<CLT>
<CHD>MAT-01</CHD>
<OPN>U5U1</OPN>
<PERIOD>07 2013</PERIOD>
<DATE>13 Jun 2013</DATE>
<DATE>10 Jul 2002</DATE>
<DATE>10 Jul 2002</DATE>
<DATE>21 Aug 2007</DATE>
<DATE>10 Jul 2002</DATE>
<VAL>-4,122,322.03</VAL>
I need to remove the commas in the VAL fields and change the dates to YYYY-MM-DD (e.g. 2013-07-31) in the DATE fields.
Looking for a quick (efficient) way of doing this.
Thanks
This should get you started:
awk -F"[<>]" 'BEGIN {split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec",month," ");for (i=1;i<=12;i++) mdigit[month[i]]=i} /<VAL>/ {gsub(/\,/,"")} /<DATE>/ {split($3,a," ");$0=sprintf("<DATE>%s-%02d-%02d</DATE>",a[3],mdigit[a[2]],a[1])}1' file
<VAL>17451.26</VAL>
<VAL>353.93</VAL>
<VAL>395.00</VAL>
<VAL>2405.00</VAL>
<DATE>2013-07-31</DATE>
<DATE>2013-07-31</DATE>
<DATE>2014-12-31</DATE>
<DATE>2002-06-21</DATE>
<DATE>2002-07-10</DATE>
<MOD>PL</MOD>
<BATCH>13382</BATCH>
<TYPE>Invoice</TYPE>
<REF1>13541/13382</REF1>
<REF2>671042638320</REF2>
<NOTES>a-07 final elec</NOTES>
<SNAME>EDF ENERGY ( Electricity )</SNAME>
<VAL>55.22</VAL>
</CLT>
<CLT>
<CHD>MAT-01</CHD>
<OPN>U5U1</OPN>
<PERIOD>07 2013</PERIOD>
<DATE>2013-06-13</DATE>
<DATE>2002-07-10</DATE>
<DATE>2002-07-10</DATE>
<DATE>2007-08-21</DATE>
<DATE>2002-07-10</DATE>
<VAL>-4122322.03</VAL>
sed '# init month convertor in holding buffer
1{h;s/.*/Jan01Feb02Mar03Apr04May05Jun06Jul07Aug08Sep09Oct10Nov11Dec12/;x;}
# change Val
/^<VAL>/ s/,//g
# Change Date
/^<DATE>/ {
# change month
G
s/[[:space:]]\{1,\}\([A-Z][a-z][a-z]\)[[:space:]]\{1,\}\(.*\)\n.*\1\([0-9][0-9]\).*/-\3-\2/
# reformat order
s/>\(.*\)-\(.*\)-\(.*\)</>\3-\2-\1</
}' YourFile
POSIX sed, with no extra subshell for the date conversion.
Reformatting the date takes two s/// commands here; they could be merged into a single s///, but that would make an already rather hairy regex even less readable.
You could easily add some safety checks on the source date, e.g. against a bad date format.
Your input seems like XML. I'd use a proper XML handling tool, e.g. XML::XSH2, a wrapper around Perl's XML::LibXML:
open file.xml ;
for //VAL set . xsh:subst(., ',', '','g') ;
perl { use Time::Piece } ;
for my $d in //DATE {
$t = $d/text() ;
set $d/text() { Time::Piece->strptime($t, '%d %b %Y')->ymd } ;
}
save :b ;
This might work for you (GNU sed & bash):
sed -r '/^<VAL>/s/,//g;/^(<DATE>)(.*)(<\/DATE>)$/s//echo "\1"$(date -d "\2" +%F)"\3"/e' file
This removes all commas on a line starting <VAL> and for those lines that contain date tags, uses the date utility and the evaluate flag in the substitution command to rearrange the date to YYYY-MM-DD.
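If the e flag is new to you, here is the same idea in isolation (GNU sed only; the date utility must be able to parse the embedded date format):
echo '<DATE>31 Jul 2013</DATE>' |
sed -r 's/^(<DATE>)(.*)(<\/DATE>)$/echo "\1"$(date -d "\2" +%F)"\3"/e'
<DATE>2013-07-31</DATE>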
An alternative solution, using only sed's own commands:
sed -r '/^<VAL>/s/,//g;/^<DATE>/!b;s/$/\nJan01Feb02Mar03Apr04May05Jun06Jul07Aug08Sep09Oct10Nov11Dec12/;s/^(<DATE>)(..) (...) (....)(<\/DATE>\n).*\3(..)/\1\4-\6-\2\5/;P;d' file
Appends a lookup to the end of the date line and uses regexp to rearrange the output.
I am trying to extract information from SMTP mails in text form, i.e.:
the date (e.g. Wed, 9 Oct 2019 01:55:58 -0700 (PDT))
the sender (e.g. from xxx.yyy.com (zzz:com. [111.222.333.444]))
URLs present in the mail (e.g. http://some.thing)
Here's an example of an input:
Delivered-To: SOME#ADDRESS.COM
Received: by X.X.X.X with SMTP id SOMEID;
Wed, 9 Oct 2019 01:55:58 -0700 (PDT)
X-Received: by X.X.X.X with SMTP id SOMEID;
Wed, 09 Oct 2019 01:55:58 -0700 (PDT)
Return-Path: <SOME#ADDRESS.COM>
Received: from SOME.URL.COM (SOME.OTHER.URL.COM. [X.X.X.X])
by SOME.THIRD.URL.COM with ESMTP id SOMEID
for <SOME#ADDRESS.COM>;
Wed, 09 Oct 2019 01:55:58 -0700 (PDT)
SOME_HTML
SOME_HTML
href="http://URL1"><img
SOME_HTML
src="http://URL2"
SOME_HTML
The example is deliberately truncated; the real header is longer, but this is enough for illustration.
I've tried sed and awk and managed to do something, but not quite what I want.
SED:
sed -e 's/http/\nhttp/g' -n -e '/Received: from/{h;n;n;n;H;x;s/\n \+/;/;p}' a.txt
The first substitution is there to get each URL on its own line, but I didn't manage to make use of the result afterwards.
And anyway, the output is not in order.
AWK:
BEGIN{
RS = "\n";
FS = "";
}
/Received: from/{
from = $0;
getline;
getline;
getline;
date = $0
}
/"\"https?://[^\"]+"/
{
FS="\"";
print $0;
}
END{
print date";"from;
};
This one works except for the URLs: the regexp doesn't work here, although it does as a one-liner.
I also tried to find a more elegant way to get the date by using the value of NR+3, but it didn't work.
And I want to display this in CSV format:
date;sender;URL1;URL2;...
I would prefer pure sed or pure awk: I could probably do it with a mix of grep, tail, sed and awk, but since I want to learn, I'd rather stick to one or both of them :)
Well, here is a longish sed script, with comments inside:
sed -nE '
/Received: from /{
# hold my line!
h
# ach, spaghetti code, here we go again
: notdate
${
s/.*/ERROR: INVALID INPUT: DATE NOT FOUND/
p
q1
}
# the next line after the line ending with ; should be the date
/;$/!{
# so search for a line ending with ;
n
b notdate
}
# the next line is the date
n
# remove leading spaces
s/^[[:space:]]*//
# grab the Received: from line
G
# and save it for later
h
}
# headers end with an empty line
/^$/{
# loop over lines
: read_next_line
n
# flag all occurrences of URLs as \x01<URL>\x02
s/"(http[^"]+)"/\x01\1\x02/g
# we found at least one URL if there is \x01 in the pattern space
/\x01/{
# extract each occurrence to the end of pattern space with a newline
: again
s/^([^\x01]*)\x01([^\x02]*)\x02(.*)$/\1\3\n\2/
t again
# remove everything in front of separator - the unparsed part of line
s/^[^\n]*\n//
# add URLs to hold space
H
}
# if this is the last line, we should finally print something, and exit
${
# grab the hold space
x
# replace each separator with a ;
s/\n/;/g
# print and exit successfully
p
q 0
}
# here we go again!
b read_next_line
}
'
for the following input:
Delivered-To: SOME#ADDRESS.COM
Received: by X.X.X.X with SMTP id SOMEID;
Wed, 9 Oct 2019 01:55:58 -0700 (PDT)
X-Received: by X.X.X.X with SMTP id SOMEID;
Wed, 09 Oct 2019 01:55:58 -0700 (PDT)
Return-Path: <SOME#ADDRESS.COM>
Received: from SOME.URL.COM (SOME.OTHER.URL.COM. [X.X.X.X])
by SOME.THIRD.URL.COM with ESMTP id SOMEID
for <SOME#ADDRESS.COM>;
Wed, 09 Oct 2019 01:55:58 -0700 (PDT)
SOME_HTML
SOME_HTML
href="http://URL1"><img
SOME_HTML
src="http://URL2"
SOME_HTML
SOMEHTML src="http://URL3" SOMEHTML src="http://URL4"
outputs:
Wed, 09 Oct 2019 01:55:58 -0700 (PDT);Received: from SOME.URL.COM (SOME.OTHER.URL.COM. [X.X.X.X]);http://URL1;http://URL2;http://URL3;http://URL4
I have a file with IDs and the Names applicable to them, as below:
1234|abc|cde|fgh
5678|ijk|abc|lmn
9101|cde|fgh|klm
1213|klm|abc|cde
I need a file with only unique Names as a list.
Output File:
abc|sysdate
cde|sysdate
fgh|sysdate
ijk|sysdate
lmn|sysdate
klm|sysdate
Where sysdate is the current timestamp of processing.
Requesting your help on this, and also an explanation of the suggested code.
What this code does:
awk -F\| '{ for(i=2; i <= NF; i++) a[$i] = a[$i] FS $1 }' input.csv
-F sets the delimiter to |. awk processes your file line by line and builds an array named a: for each cell from column 2 to the last, it uses that cell's value as the key and appends the field separator plus the first column's ID to that key's value.
When awk has finished processing the first line, a is:
a['abc'] = '|1234'
a['cde'] = '|1234'
a['fgh'] = '|1234'
This script does not print anything.
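If you are curious, an END block would dump the accumulated array (a sketch only; note that for (name in a) iterates in no particular order):
awk -F'|' '{ for (i = 2; i <= NF; i++) a[$i] = a[$i] FS $1 }
           END { for (name in a) print name a[name] }' input.csv
which, for your input, prints lines such as abc|1234|5678|1213.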
What you want is something like this:
awk -F'|' '{for(i=2;i<=NF;i++){if(seen[$i] != 1){print $i, strftime(); seen[$i]=1}}}' OFS='|' input.csv
-F sets the input delimiter to |; OFS does the same for the output.
For each value from column 2 to the end of the line, we check whether it has been seen before. If not, we print the value and the time of processing, then record the value in a map so we avoid processing it again.
Output:
abc|Thu Oct 18 10:40:13 CEST 2018
cde|Thu Oct 18 10:40:13 CEST 2018
fgh|Thu Oct 18 10:40:13 CEST 2018
ijk|Thu Oct 18 10:40:13 CEST 2018
lmn|Thu Oct 18 10:40:13 CEST 2018
klm|Thu Oct 18 10:40:13 CEST 2018
You can change the format of sysdate; see the gawk documentation for strftime.
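For instance, to print an ISO-like timestamp instead (gawk only, since strftime is not in POSIX awk; the !seen[$i]++ test is just a shorter way to write the same dedup check):
awk -F'|' '{for(i=2;i<=NF;i++) if(!seen[$i]++) print $i, strftime("%Y-%m-%d %H:%M:%S")}' OFS='|' input.csv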
I, Newbie, have searched this forum high and low, and have tried several awks, seds, & greps.
I am trying to search log files to output all logs within a date & time.
Unfortunately, the logs that I am searching all have different date formats.
I did get this one to work:
awk '$0 >= "2018-08-23.11:00:00" && $0 <= "2018-08-23.14:00:00"' catalina.out
for that specific date format.
I can't get these date formats to work, maybe an issue with the spacing?
2018-08-23 11:00:00, or Aug 23, 2018 11:00:00
Some examples of what I have tried:
sed -n '/2018-08-23 16:00/,/2018-08-23 18:00/p' testfile.txt
sed -n '/Feb 23 13:55/,/Feb 23 14:00/p' testfile.txt
awk '$0 >= "2018-08-23 17:00:00" && $0 <= "2018-08-23 19:00:00"' testfile.txt
I have also tried setting variables:
FROM="Aug 23, 2018 17:00:00" , TO="Aug 23, 2018 19:00:00"
awk '$0 >= "$FROM" && $0 <= "$TO"' testfile.txt
Can anyone help me with this?
UPDATE: I got THIS to work for the 2018-08-23 11:00:00 format
grep -n '2018-08-23 11:[0-9][0-9]' testfile.txt | head -1
grep -n '2018-08-23 12:[0-9][0-9]' testfile.txt | tail -1
awk 'NR>=2 && NR<=4' testfile.txt > rangeoftext
But I could not get it to work with the Aug 23, 2018 11:00:00 -- again, I think this may be a space issue? Not sure how to resolve....
This is a difficult problem. grep and sed have no concept of a date, and even GNU awk has only limited support for dates and times.
The problem becomes somewhat more tractable if you use a sane date format, i.e. a date format that can be used in string comparisons, such as 2018-08-15 17:00:00. This should work regardless of whether the string contains whitespace or not. However, beware of tools that automatically split on whitespace, such as the shell and awk.
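For example, with a fixed-width ISO timestamp you can compare a substring directly (a sketch, assuming every line starts with the timestamp):
awk -v min='2018-08-23 11:00:00' -v max='2018-08-23 14:00:00' '
  { ts = substr($0, 1, 19) }   # "YYYY-MM-DD hh:mm:ss" is exactly 19 characters
  ts >= min && ts <= max
' testfile.txt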
Now, to your examples:
sed -n '/2018-08-23 16:00/,/2018-08-23 18:00/p' testfile.txt
sed -n '/Feb 23 13:55/,/Feb 23 14:00/p' testfile.txt
awk '$0 >= "2018-08-23 17:00:00" && $0 <= "2018-08-23 19:00:00"' testfile.txt
The first two should work, but only if the file really contains both timestamps, since you are only checking for the presence of certain arbitrary strings. The third should also work, provided that the records all start with a timestamp.
This might be what you're looking for (making some assumptions about what your input file might look like):
$ cat file
Aug 22, 2018 11:00:00 bad
2018-08-23 11:00:00 good
Aug 23, 2018 11:00:00 good
2018-08-24 11:00:00 bad
$ cat tst.awk
BEGIN {
min = raw2dt(min)
max = raw2dt(max)
}
{ cur = raw2dt($0) }
(cur >= min) && (cur <= max)
function raw2dt(raw, tmp, mthNr, dt, fmt) {
fmt = "%04d%02d%02d%02d%02d%02d"
if ( match(raw,/[0-9]{4}(-[0-9]{2}){2}( [0-9:]+)?/) ) {
split(substr(raw,RSTART,RLENGTH),tmp,/[^[:alnum:]]+/)
dt = sprintf(fmt, tmp[1], tmp[2], tmp[3], tmp[4], tmp[5], tmp[6])
}
else if ( match(raw,/[[:alpha:]]{3} [0-9]{2}, [0-9]{4}( [0-9:]+)?/) ) {
split(substr(raw,RSTART,RLENGTH),tmp,/[^[:alnum:]]+/)
mthNr = (index("JanFebMarAprMayJunJulAugSepOctNovDec",tmp[1])+2)/3
dt = sprintf(fmt, tmp[3], mthNr, tmp[2], tmp[4], tmp[5], tmp[6])
}
return dt
}
$ awk -v min='Aug 23, 2018 11:00' -v max='2018-08-23 11:00' -f tst.awk file
2018-08-23 11:00:00 good
Aug 23, 2018 11:00:00 good
The above will work using any POSIX awk in any shell on any UNIX box.
When trying to obtain the set of log entries that appear between two dates, you should never use sed. Yes, it is true that sed has a cool and very useful feature for checking address ranges (so does awk, btw), but
sed -n '/date1/,/date2/p' file
will not always work: it only works if date1 and date2 actually occur in the file. If one of them is missing, this fails.
An editing command with two addresses shall select the inclusive range from the first pattern space that matches the first address through the next pattern space that matches the second.
[address[,address]]
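You can see the failure mode easily (illustrative input; the end address never matches, so the range silently runs to the end of the input):
printf '%s\n' 'before' '2018-08-23 16:00 start' 'middle' 'last line' |
sed -n '/2018-08-23 16:00/,/2018-08-23 18:00/p'
2018-08-23 16:00 start
middle
last line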
On top of that, when comparing dates, one should never use string comparisons unless the format is sane. Some sane formats are YYYY-MM-DD, YYYY-MM-DD hh:mm:ss, ... Some bad formats are "Aug 1 2018" (it sorts before "Jan 1 2018"), "99-01-31" (it sorts after "01-01-31") and "2018-2-1" (it sorts after "2018-11-1").
So if you can, convert the date you obtain into a sane format. The sanest of all is the date-difference with respect to an epoch: Unix has various tools for computing the number of seconds since the Unix epoch of 1970-01-01 00:00:00 UTC, and that is what you are really after.
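With GNU date, for example:
TZ=UTC date -d '2018-08-23 11:00:00' +%s
1535022000
TZ=UTC date -d '@1535022000' '+%F %T'
2018-08-23 11:00:00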
As you mention, your log file has various date formats, and this does not make things easy. Even though GNU awk has various time functions, they require that you know the format beforehand.
Since we do not know which formats exist in your log file, we will make use of the Unix utility date, which has a very elaborate interpreter that knows a lot of formats.
Also, I will assume that in awk you are able to uniquely identify the date somehow and store it in a string called date. Maybe there is a special character always appearing after the date that allows you to do this:
Example input file:
2018-08-23 16:00 | some entry
Aug 23 2018 16:01:01 | some other entry
So, in this case, we can say:
awk -F'|' -v t1="$(date -d "START_DATE" +%s)" \
    -v t2="$(date -d "END_DATE" +%s)" \
    '{ date = $1 }
     { cmd = "date -d \"" date "\" +%s"; cmd | getline epoch; close(cmd) }
     (t1 <= epoch && epoch <= t2)' testfile
I am trying to report on the number of files created on each date. I can do that with this little one liner:
ls -la foo*.bar|awk '{print $7, $6}'|sort|uniq -c
and I get a list of how many fooxxx.bar files were created by date, but the month is in the form Aaa (i.e. Apr) and I want xx (i.e. 04).
I have a feeling the answer is in here:
awk '
BEGIN{
m=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",d,"|")
for(o=1;o<=m;o++){
months[d[o]]=sprintf("%02d",o)
}
format = "%m/%d/%Y %H:%M"
}
{
split($4,time,":")
date = (strftime("%Y") " " months[$2] " " $3 " " time[1] " " time[2] " 0")
print strftime(format, mktime(date))
}'
But I have little to no idea what I need to strip out, and no idea how to pass $7 to whatever I carve out of this to convert Apr to 04.
Thanks!
Here's the idiomatic way to convert an abbreviated month name to a number in awk:
$ echo "Feb" | awk '{printf "%02d\n",(index("JanFebMarAprMayJunJulAugSepOctNovDec",$0)+2)/3}'
02
$ echo "May" | awk '{printf "%02d\n",(index("JanFebMarAprMayJunJulAugSepOctNovDec",$0)+2)/3}'
05
Let us know if you need more info to solve your problem.
Assuming the names of the months only appear in the month column, you could do this:
ls -la foo*.bar|awk '{sub(/Jan/,"01");sub(/Feb/,"02");print $7, $6}'|sort|uniq -c
Just use the field number of your month as an index into the months array.
print months[$6]
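Putting that together with your pipeline might look like this (a sketch; the field numbers assume your ls -la prints the month in $6 and the day in $7, which varies by system):
ls -la foo*.bar | awk '
  BEGIN { n = split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec", d, "|")
          for (o = 1; o <= n; o++) months[d[o]] = sprintf("%02d", o) }
  { print $7, months[$6] }   # day, numeric month
' | sort | uniq -c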
Since ls output differs from system to system and sometimes on the same system depending on file age and you didn't give any examples, I have no way of knowing how to guide you further.
Oh, and don't parse ls.
To parse AIX istat, I use:
istat .profile | grep "^Last modified" | read dummy dummy dummy mon day time dummy yr dummy
echo "M: $mon D: $day T: $time Y: $yr"
-> M: Mar D: 12 T: 12:05:36 Y: 2012
To parse the month out of AIX istat, I use this two-liner (AIX 6.1, ksh88):
monstr="???JanFebMarAprMayJunJulAugSepOctNovDec???"
mon="Oct" ; hugo=${monstr%${mon}*} ; hugolen=${#hugo} ; let hugol=hugolen/3 ; echo "Month: $hugol"
-> Month: 10
A result of 1..12 means the month name is OK; a result less than 1 or greater than 12 means the month name is not OK.
Instead of "hugo", use meaningful names ;-)
Adding a version for AIX that shows how to retrieve all the date elements (in whatever timezone you need them in) and display ISO 8601 output:
tempTZ="UTC" ; TZ="$tempTZ" istat /path/to/somefile \
| grep modified \
| awk -v tmpTZ="$tempTZ" '
BEGIN {Mmms="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec";
n=split(Mmms,Mmm," ") ;
for(i=1;i<=n;i++){ mm[Mmm[i]]=sprintf("%02d",i) }
}
{ printf("%s-%s-%sT%s %s",$NF, mm[$4], $5, $6, tmpTZ ) }
' ## this will output an iso8601 date of the modification date of that file,
## for ex: 2019-04-18T14:16:05 UTC
## you can set tempTZ to anything, e.g. tempTZ="UTC+2" to see that date in the UTC+2 timezone, or tempTZ="EST", etc.
I show the ISO 8601 version to make it better known and used, but of course you may only need the "mm" portion, which is easily done: mm[$4]
I have some text files containing lines as follows:
07JAN01, -0.247297942769082E+07, -0.467133797284279E+07, 0.355810777473149E+07
07JAN02, -0.247297942405032E+07, -0.467133797586388E+07, 0.355810777517715E+07
07JAN03, -0.247297942584851E+07, -0.467133797727224E+07, 0.355810777627353E+07
. . . .
. . . .
I need to produce a script which will amend the format of the date to:
01/01/07, -0.247297942769082E+07, -0.467133797284279E+07, 0.355810777473149E+07
02/01/07, -0.247297942405032E+07, -0.467133797586388E+07, 0.355810777517715E+07
03/01/07, -0.247297942584851E+07, -0.467133797727224E+07, 0.355810777627353E+07
. . . .
. . . .
I was looking for an appropriate sed or grep command to extract only some characters of each line, to define them as variables in my script. As I would like to "reorganize" the date, I was thinking about defining three variables, where, for the first line, it would be:
a=07
b=JAN (need to implement a "case" in the script to deal with this, I think?)
c=01
I looked at some grep examples, and tons of docs, but nothing really clear appeared ...
I found something about the cut command, but I'm not too sure it's appropriate here.
The other question I have is about the output: since sed doesn't modify the input data, how can I modify the files directly? Is there a way?
Any help would really be appreciated :)
I don't think grep is the right tool for the job myself. You need something a little more expressive like Perl or awk:
echo '07JAN01, -0.24729E+07, -0.46713E+07, 0.35581E+07
07JAN02, -0.24729E+07, -0.46713E+07, 0.35581E+07
07AUG03, -0.24729E+07, -0.46713E+07, 0.35581E+07' | awk -F, '
{
yy=substr($1,1,2);
mm=substr($1,3,3);
mm=(index(":JAN:FEB:MAR:APR:MAY:JUN:JUL:AUG:SEP:OCT:NOV:DEC",mm)+2)/4;
dd=substr($1,6,2);
printf "%02d/%02d/%02d,%s,%s,%s\n",dd,mm,yy,$2,$3,$4
}'
which generates:
01/01/07, -0.24729E+07, -0.46713E+07, 0.35581E+07
02/01/07, -0.24729E+07, -0.46713E+07, 0.35581E+07
03/08/07, -0.24729E+07, -0.46713E+07, 0.35581E+07
Obviously, that's just pumping some test data through a command line awk script. You'd be better off putting that into an actual awk script file and running your input through it.
If datechg.awk contains:
{
yy=substr($1,1,2);
mm=substr($1,3,3);
mm=(index(":JAN:FEB:MAR:APR:MAY:JUN:JUL:AUG:SEP:OCT:NOV:DEC",mm)+2)/4;
dd=substr($1,6,2);
printf "%02d/%02d/%02d,%s,%s,%s\n",dd,mm,yy,$2,$3,$4
}
then:
echo '07JAN01, -0.24729E+07, -0.46713E+07, 0.35581E+07
07JAN02, -0.24729E+07, -0.46713E+07, 0.35581E+07
07AUG03, -0.24729E+07, -0.46713E+07, 0.35581E+07' | awk -F, -fdatechg.awk
also produces:
01/01/07, -0.24729E+07, -0.46713E+07, 0.35581E+07
02/01/07, -0.24729E+07, -0.46713E+07, 0.35581E+07
03/08/07, -0.24729E+07, -0.46713E+07, 0.35581E+07
The way this works is as follows. Each line is split into fields (-F, sets the field separator to a comma) and we extract and process the relevant parts of field 1 (the date). By this I mean the year and day are reversed and the textual month is turned into a numeric month by searching a string for it and manipulating the index where it was found, so that it falls in the range 1 through 12.
This is the only (relatively) tricky bit and is done with some basic mathematics: the index function simply finds the position within the string of your month (where the first char is 1). So JAN is at position 2, FEB at 6, MAR at 10, ..., DEC at 46 (the set {2, 6, 10, ..., 46}). They're 4 apart, so we'll eventually need to divide by 4 to get consecutive month numbers, but first we add 2 so that each position becomes an exact multiple of 4. Adding that 2 gives you the set {4, 8, 12, ..., 48}. Then you divide by 4 to get {1, 2, 3, ..., 12} and there's your month number:
Text Pos +2 /4
---- --- -- --
JAN 2 4 1
FEB 6 8 2
MAR 10 12 3
APR 14 16 4
MAY 18 20 5
JUN 22 24 6
JUL 26 28 7
AUG 30 32 8
SEP 34 36 9
OCT 38 40 10
NOV 42 44 11
DEC 46 48 12
Then we just output the new information. Obviously, this is likely to barf if you provide bad data but I'm assuming either:
the data is good; or
you'll add your own error checks.
Regarding modifying the files directly, the time-honored UNIX tradition is to use a shell script to save the current file elsewhere, process it to create a new file, then overwrite the old file with the new file (but not touching the saved file, in case something goes horribly wrong).
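In its simplest form (a sketch; the filenames are illustrative):
cp data.txt data.txt.bak                        # save the current file elsewhere
awk -F, -f datechg.awk data.txt > data.txt.new  # process it to create a new file
mv data.txt.new data.txt                        # overwrite the old file; the backup survives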
I won't make my answer any longer by detailing that further, you've probably fallen asleep already :-)
A bit clunky, but you could do:
sed -e 's/^\(..\)JAN\(..\)/\2\/01\/\1/'
sed -e 's/^\(..\)FEB\(..\)/\2\/02\/\1/'
...
In order to run sed in-place, see the -i command-line option:
sed -i -e ...
Edit
Just to point out that this answers a previous version of the question where AWK was not specified.
awk 'BEGIN{
OFS=FS=","
# create table of mapping of months to numbers
s=split("JAN:FEB:MAR:APR:MAY:JUN:JUL:AUG:SEP:OCT:NOV:DEC",d,":")
for(o=1;o<=s;o++){
m=sprintf("%02s",o) # add 0 is single digit
date[d[o]]=m
}
}
{
yr=substr($1,1,2)
mth=substr($1,3,3)
day=substr($1,6,2)
$1=day"/"date[mth]"/"yr
}1' file