Bash - does awk have a built-in julian day function? - awk

I have a csv file where the 1st column is the date and time the data was generated. I am trying to use awk to convert that date into a julian day and add it as an extra column on the end:
"2021-01-22 22:02:00",475673,485,0,0,0,0,0,0,0,0,-1.308788,-4.421722,-99
"2021-01-22 23:03:00",475674,485,0,0,0,0,0,0,0,0,-1.329033,-4.373959,-99
"2021-01-22 24:04:00",475675,485,0,0,0,0,0,0,0,0,-1.320374,-4.359528,-99
"2021-01-22 25:05:00",475676,485,0,0,0,0,0,0,0,0,-1.329685,-4.494766,-99
"2021-01-22 26:06:00",475677,485,0,0,0,0,0,0,0,0,-1.343422,-4.650154,-99
I have written a script in bash that is called when a file arrives for processing. I have tried a couple of different variations on the below:
awk '{ jday=date -d(substr($0,2,10)) +%j;print $0","jday }' temp.CMP
The reason I am using the awk command is that I am also extracting the year, month, day, hour and minute data and adding them as individual columns on the end of each line.
Is what I am trying possible using awk in Bash?
Thanks in advance for any help.

If you have access to GNU awk, you can try the following:
awk -F, '{ dattim=gensub("[-:\"]"," ","g",$1);print $0","strftime("%j",mktime(dattim))}' file
Use gensub to replace all "-", ":" and double-quote characters in the first comma-delimited field with a space, and read the result into the variable dattim. Then pass this variable to mktime and strftime to append the julian-format day to the end of the line.
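Since the question also mentions adding year, month, day, hour and minute columns, here is a sketch extending the same gawk approach; the output column order is only a guess, and temp.CMP is the filename from the question's own attempt:
awk -F, '{
    dattim = gensub("[-:\"]", " ", "g", $1)   # e.g. " 2021 01 22 22 02 00 "
    split(dattim, t, " ")                     # t[1]=year, t[2]=month, ... t[5]=minute
    jday = strftime("%j", mktime(dattim))     # day of year, zero-padded
    print $0 "," t[1] "," t[2] "," t[3] "," t[4] "," t[5] "," jday
}' temp.CMP
For the first sample line this should append ,2021,01,22,22,02,022.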

Related

Filtering data using a specific column on a unix system

I'm trying to filter this data by year. I've tried using the grep command and awk; I need to have the data only for 1975 onwards. Currently, when I run the script I also get data from other years, where the year appears in the other columns. This is one example of the code I have tried, initially focusing on filtering one year.
{awk -F, '{ if($1 == "1975") print $0} ' caleq.dat >! regioneq.dat}
I have also tried using
{grep -F "1975" caleq.dat >! regioneq.dat}
I would be grateful for any assistance. Thank you.
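A minimal sketch of one approach, assuming the year really is the entire first comma-separated field and "1975 onwards" means a numeric comparison:
# keep rows whose first comma-separated field is 1975 or later
awk -F, '$1 + 0 >= 1975' caleq.dat > regioneq.dat
Comparing numerically on $1 avoids matching 1975 when it happens to appear in other columns, which is the problem grep -F runs into. (The >! redirect in the attempts above is csh syntax; plain > is used here.)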

check for value in csv file then print line with awk / sed

Is it possible to parse a .csv file and look for the 13th entry containing a particular value?
So data for example would be
10,1,a,bhd,5,7,10,,,8,9,3,19,0
I only want to extract lines which have a value of 3 in the 13th field if that makes sense.
I tried it with a bash while loop using cut etc., but it was messy.
Not sure if there is an awk / sed method.
Thanks in advance.
This is beginner level awk.
awk -F, '$13==3' file
-F, sets the field separator to a comma; $13 is the 13th field's value. For each line, if $13==3 evaluates to true, the line is printed.
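For example, with a hypothetical test.csv containing the sample line plus a second line whose 13th field really is 3, only the second line comes back:
$ cat test.csv
10,1,a,bhd,5,7,10,,,8,9,3,19,0
10,1,a,bhd,5,7,10,,,8,9,4,3,0
$ awk -F, '$13==3' test.csv
10,1,a,bhd,5,7,10,,,8,9,4,3,0
(In the original sample line the 13th field is actually 19; the 3 sits in field 12.)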

awk: calculating sum from values in single field with multiple delimiters

Related to another post I had,
parsing a sql string for integer values with multiple delimiters,
in which I said I could easily accomplish the same with UNIX tools (ahem). I found it a bit messier than expected. I'm looking for an awk solution. Any suggestions on the following?
Here is my original post, paraphrased:
#
I want to use awk to parse data sourced from a pipe-delimited flat file. One of the fields is sub-formatted as follows. My end state is to sum the integers within the field, but my question here is to see ways of using awk to sum the numeric values in the field. The pattern of the sub-formatting will always be that the desired integers are preceded by a tilde (~) and followed by an asterisk (*), except for the last one in the field. The number of sub-fields may vary too (my example has 5, but there could be more or fewer). The 4-char TAG name is of no importance.
So here is a sample:
|GADS~55.0*BILK~0.0*BOBB~81.0*HETT~32.0*IGGR~51.0|
From this example, all I would want for processing is the final number of 219. Again, I can work on the sum part as a further step; I'm just interested in getting the numbers.
#
My solution currently entails two awk statements. The first uses gsub to replace the '~' with a '*' delimiter in my target field, 77:
awk -F'|' 'BEGIN {OFS="|"} { gsub("~", "*", $77) ; print }' file_1 > file_2
My second awk statement calculates the numeric sum of the target field, 77, which is the last field, and replaces it with the calculated value. It is built on the assumption that there will be no other asterisks (*) anywhere else in the file; I'm okay with that. It is working for most examples, but not others, and my gut tells me this isn't that robust of an answer. Any ideas? The suggestions on my other post for SQL were great, but I couldn't implement them for unrelated silly reasons.
awk -F'*' '{if (NF>=2) {s=0; for (i=1; i<=NF; i++) s=s+$i; print substr($1, 1, length($1)-4) s;} else print}' file_2 > file_3
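For what it's worth, a single-pass sketch that combines both steps, assuming field 77 is the target pipe-delimited field as in the commands above:
awk -F'|' 'BEGIN{OFS="|"} {
    n = split($77, a, /[^0-9.]+/)    # numeric pieces of field 77
    s = 0
    for (i = 1; i <= n; i++) s += a[i]
    $77 = sprintf("%.2f", s)         # replace the sub-formatted field with its sum
    print
}' file_1 > file_3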
To get the sum (219) from your example, you can use this:
awk -F'[^0-9.]+' '{for(i=1;i<=NF;i++)s+=$i;print s}' file
or the following for 219.00:
awk -F'[^0-9.]+' '{for(i=1;i<=NF;i++)s+=$i;printf "%.2f\n", s}' file
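One caveat with the above: s is never reset, so on a multi-line file the totals accumulate across records. A sketch that resets it per line:
awk -F'[^0-9.]+' '{s=0; for(i=1;i<=NF;i++) s+=$i; print s}' file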

awk for date ranges date format like mm/dd/yyyy hh:mm:ss

I have my log file like this, and I am trying to retrieve a date range:
"07/10/2013 01:31:54","SNMP"
"07/10/2013 01:31:54","SNMP"
.... ... ..
"07/10/2013 03:03:54","SNMP"
I am using the following awk command, but it gives all the rows. I tried different combinations to no avail. Is there a standard date format I need to use in awk?
awk -F, '"07/10/2013 01:35:40" > $1&&$1 <= "07/10/2013 01:50:03"' Mylog.log | wc -l
You have two problems: CSV parsing and date comparison.
The first one you can solve using match(), or a CSV parsing function.
The second one you can solve by either using a proper date format like ISO-8601, a happy side-effect being that dates (ex timezone/DST changes) can be compared lexically.
If you are really using gawk instead of plain awk or nawk, you can use the built-in date function mktime() to parse timestamps and return an epoch-second ordinal, which allows dates to be compared numerically.
awk has no native date/time types, and no standard data/time libraries, so lexical or numeric comparisons are the most straightforward option here.
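A sketch of the mktime() route, assuming the quoted timestamp is the second double-quote-delimited field and using the question's boundary times as an illustrative range:
gawk -F'"' '
function epoch(ts,    a) {
    # ts looks like 07/10/2013 01:35:40; split on "/", space and ":"
    split(ts, a, /[\/ :]/)
    # mktime() wants "YYYY MM DD HH MM SS"
    return mktime(a[3] " " a[1] " " a[2] " " a[4] " " a[5] " " a[6])
}
BEGIN {
    lo = epoch("07/10/2013 01:35:40")
    hi = epoch("07/10/2013 01:50:03")
}
{ t = epoch($2); if (t > lo && t <= hi) print }   # rows strictly after lo and up to hi
' Mylog.log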
A final option with gawk is a nasty hack:
/^"07.10.2013 01:35:40"/,/^"07.10.2013 01:50:03"/ {
# your code here
}
This uses a range expression to limit the scope of matching to between certain lines. It should work for your file format as long as the times are monotonically increasing; this is not true for Apache logs (since they are logged in order of completion, but by default contain the original request time stamp, and so are not guaranteed to be monotonically increasing).
String compare:
$ awk -F\" '"07/10/2013 01:35:40" > $2 && $2 <= "07/10/2013 01:50:03"' file
"07/10/2013 01:31:54","SNMP"
"07/10/2013 01:31:54","SNMP"
.... ... ..
... seems to work! To compare times better, use mktime("YYYY MM DD HH MM SS").

Using awk to fill in SQL Dates

I am trying to generate SQL filling in a date via command line, using awk's printf. The code I am using is:
awk 'BEGIN{ printf " convert_tz(time,\047GMT\047,\047America/New_York\047) as timestamp , date_format(convert_tz(time,\047GMT\047,\047America/New_York\047), \047\045Y\045m\045d \045H\045i\045s\047) as dt from table where time >= convert_tz(%s,\047America/New_York\047,\047GMT\047) and time <= convert_tz(%s + interval 1 day ,\047America/New_York\047,\047GMT\047);", "2011-01-01", "2011-01-01"}'
I believe I have the escaping correct, but I get the following result:
awk: fatal: not enough arguments to satisfy format string
Does anyone have an idea of why the %s is not getting caught and populated?
The specific version of awk I'm using is GNU Awk 3.1.6.
Your escaping is a bit off: you can't replace % with \045, since it is converted back to % before printf is called, which confuses the format string. The way to escape % in printf is to use %% instead, and it will work well.
...\047%%Y%%m%%d %%H%%i%%s\047...
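A minimal sketch of the principle (the table and column names here are just placeholders):
awk 'BEGIN{ printf "select date_format(time, \047%%Y-%%m-%%d\047) from t where day = \047%s\047;\n", "2011-01-01" }'
which prints:
select date_format(time, '%Y-%m-%d') from t where day = '2011-01-01';
Each %% survives as a literal % for MySQL's date_format, while %s is still consumed by awk's printf.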