For example, I want to add 13.29s to 2013-4-24 3:10:50.50.
How do I handle the milliseconds?
I've tried to use mktime and strftime, but it seems they can only deal with whole seconds...
awk is really a powerful tool, but I don't think it is the best choice here; I would go with GNU date.
See the test with your example data:
#add 13.29s to date 2013-4-24 3:10:50.50
kent$ date -d'+13.29 second 2013-4-24 3:10:50.50' +"%F %T.%N"
2013-04-24 03:11:03.790000000
Well, I know there are trailing zeros on the nanoseconds, but I think it wouldn't be a problem for you to remove them if you want.
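For instance, a sketch of trimming them with sed, reusing the command above:
kent$ date -d'+13.29 second 2013-4-24 3:10:50.50' +"%F %T.%N" | sed 's/0*$//'
2013-04-24 03:11:03.79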
You can invoke an external command from awk, if using awk is a must for you.
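For example, here is a minimal sketch of driving GNU date from within awk via getline (the variable t and the hard-coded offset are just for illustration):
awk -v t="2013-4-24 3:10:50.50" 'BEGIN {
    cmd = "date -d \"+13.29 second " t "\" +\"%F %T.%N\""   # build the GNU date command
    cmd | getline result                                    # read its single line of output
    close(cmd)
    print result
}'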
Not a simple thing to do, but here we go:
time="2013-4-24 3:10:50.50"
echo "13.29" | awk '{split(v,a,"[ -:.]");t=mktime(a[1]" "a[2]" "a[3]" "a[4]" "a[5]" "a[6])+(a[7]/100)+$1;print strftime("%Y-%m-%d %H:%M:%S",t)"."(t-int(t))*100}' v="$time"
2013-04-24 03:11:03.79
With explanation:
echo "13.29" | awk '
{
split(v,a,"[ -:.]") # Split the date string into separate parts
t=mktime(a[1]" "a[2]" "a[3]" "a[4]" "a[5]" "a[6])+(a[7]/100)+$1 # Convert to epoch time, then add the fractional seconds and the offset to get the new value
print strftime("%Y-%m-%d %H:%M:%S",t)"."(t-int(t))*100 # Convert back to normal time format and print it out
}
' v="$time" # Read the variable
For my powerlevel10k custom prompt, I currently have this function to display the seconds since the epoch, comma separated. I display it under the current time so I always have a cue to remember roughly what the current epoch time is.
function prompt_epoch() {
  MYEPOCH=$(/bin/date +%s | sed ':a;s/\B[0-9]\{3\}\>/,&/;ta')
  p10k segment -f 66 -t ${MYEPOCH}
}
My prompt looks like this: https://imgur.com/0IT5zXi
I've been told I can do this without the forked processes using these commands:
$ zmodload -F zsh/datetime p:EPOCHSECONDS
$ printf "%'d" $EPOCHSECONDS
1,648,943,504
But I'm not sure how to do that without the forking. I know to add the zmodload line in my ~/.zshrc before my powerlevel10k is sourced, but formatting ${EPOCHSECONDS} isn't something I know how to do without a fork.
If I were doing it the way I know, this is what I'd do:
function prompt_epoch() {
  MYEPOCH=$(printf "%'d" ${EPOCHSECONDS})
  p10k segment -f 66 -t ${MYEPOCH}
}
But as far as I understand it, that still forks a process every time the prompt is called, correct? Am I misunderstanding the advice given? I can't see a way to get the latest epoch seconds without running some sort of process, which requires a fork, correct?
The printf zsh builtin can assign the value to a variable using the -v flag. Therefore my function can be rewritten as:
function prompt_epoch() {
  printf -v MYEPOCH "%'d" ${EPOCHSECONDS}
  p10k segment -f 66 -t ${MYEPOCH}
}
Thanks to this answer on the Unix & Linux Stack Exchange: https://unix.stackexchange.com/a/697807/101884
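Putting it together, the relevant part of ~/.zshrc might look like this (a sketch; it assumes powerlevel10k is sourced after these lines):
# load only the EPOCHSECONDS parameter from the datetime module
zmodload -F zsh/datetime p:EPOCHSECONDS

function prompt_epoch() {
  # the printf builtin writes straight into MYEPOCH, so no subshell is forked
  printf -v MYEPOCH "%'d" ${EPOCHSECONDS}
  p10k segment -f 66 -t ${MYEPOCH}
}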
awk can generate a timestamp with the strftime function, e.g.
$ awk 'BEGIN {print strftime("%Y/%m/%d %H:%M:%S")}'
2019/03/26 08:50:42
But I need a timestamp with fractional seconds, ideally down to nanoseconds. GNU date can do this with the %N format element:
$ date "+%Y/%m/%d %H:%M:%S.%N"
2019/03/26 08:52:32.753019800
But it is relatively inefficient to invoke date from within awk compared to calling strftime, and I need high performance as I'm processing many large files with awk and need to generate many timestamps while processing the files. Is there a way that awk can efficiently generate a timestamp that includes fractional seconds (ideally nanoseconds, but milliseconds would be acceptable)?
Adding an example of what I am trying to perform:
awk -v logFile="$logFile" -v outputFile="$outputFile" '
  BEGIN {
    print "[" strftime("%Y%m%d %H%M%S") "] Starting to process " FILENAME "." >> logFile
  }
  {
    data[$1] += $2
  }
  END {
    print "[" strftime("%Y%m%d %H%M%S") "] Processed " NR " records." >> logFile
    for (id in data) {
      print id ": " data[id] >> outputFile
    }
  }
' oneOfManyLargeFiles
If you are really in need of subsecond timing, then any call to an external command such as date, or any read of an external system file such as /proc/uptime or /proc/rtc, defeats the purpose of the subsecond accuracy. Both cases require too many resources to retrieve the requested information (i.e. the time).
Since the OP already makes use of GNU awk, you could make use of a dynamic extension. Dynamic extensions are a way of adding new functionality to awk by implementing new functions written in C or C++ and dynamically loading them with gawk. How to write these functions is described extensively in the GNU awk manual.
Luckily, GNU awk 4.2.1 comes with a set of default dynamic libraries which can be loaded at will. One of these libraries is a time library with two simple functions:
the_time = gettimeofday()
Return the time in seconds that has elapsed since 1970-01-01 UTC as a floating-point value. If the time is unavailable on this platform, return -1 and set ERRNO. The returned time should have sub-second precision, but the actual precision may vary based on the platform. If the standard C gettimeofday() system call is available on this platform, then it simply returns the value. Otherwise, if on MS-Windows, it tries to use GetSystemTimeAsFileTime().
result = sleep(seconds)
Attempt to sleep for seconds seconds. If seconds is negative, or the attempt to sleep fails, return -1 and set ERRNO. Otherwise, return zero after sleeping for the indicated amount of time. Note that seconds may be a floating-point (nonintegral) value. Implementation details: depending on platform availability, this function tries to use nanosleep() or select() to implement the delay.
source: GNU awk manual
It is now possible to call this function in a rather straightforward way:
awk '@load "time"; BEGIN{printf "%.6f", gettimeofday()}'
1553637193.575861
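To get a formatted timestamp with the fractional seconds the question asks for, gettimeofday() can be combined with the built-in strftime(); a sketch, assuming gawk with the bundled time extension:
gawk '@load "time"
BEGIN {
    now = gettimeofday()                      # epoch time as a floating-point value
    frac = int((now - int(now)) * 1000000)    # microsecond part
    printf "%s.%06d\n", strftime("%Y/%m/%d %H:%M:%S", int(now)), frac
}'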
In order to demonstrate that this method is faster than the more classic implementations, I timed all three implementations using gettimeofday():
awk '@load "time"
function get_uptime( a) {
    if((getline line < "/proc/uptime") > 0)
        split(line,a," ")
    close("/proc/uptime")
    return a[1]
}
function curtime( cmd, line, time) {
    cmd = "date \047+%Y/%m/%d %H:%M:%S.%N\047"
    if ( (cmd | getline line) > 0 ) {
        time = line
    }
    else {
        print "Error: " cmd " failed" | "cat>&2"
    }
    close(cmd)
    return time
}
BEGIN{
    t1=gettimeofday(); curtime(); t2=gettimeofday();
    print "curtime()",t2-t1
    t1=gettimeofday(); get_uptime(); t2=gettimeofday();
    print "get_uptime()",t2-t1
    t1=gettimeofday(); gettimeofday(); t2=gettimeofday();
    print "gettimeofday()",t2-t1
}'
which outputs:
curtime() 0.00519109
get_uptime() 7.98702e-05
gettimeofday() 9.53674e-07
While it is evident that curtime() is the slowest, as it loads an external binary, it is rather startling to see how blazingly fast awk is at reading an extra external /proc/ file.
If you are on Linux, you could use /proc/uptime:
$ cat /proc/uptime
123970.49 354146.84
to get some centiseconds (the first value is the uptime) and compute the time difference between the beginning and whenever something happens:
$ while true ; do echo ping ; sleep 0.989 ; done | # yes | awk got confusing
awk '
function get_uptime( a, line) {
    if((getline line < "/proc/uptime") > 0)
        split(line,a," ")
    close("/proc/uptime")
    return a[1]
}
BEGIN {
    basetime=get_uptime()
}
{
    if(!wut)                            # put the triggering condition here
        print get_uptime()-basetime     # print the time difference since the start
}'
Output:
0
0.99
1.98
2.97
3.97
I am using orgmode on Emacs and want to automatically update parts of an orgmode file using cron scheduling.
I know how to get the cron job to run at the times I choose but now I am faced with the issue of selecting certain parts of the file to change.
I would like to increment numbers at certain locations in a file every day (like every day at 3am or something).
So say I have the file fruit.org:
* Apple
age: 2
* Bananas
age: 1
A really bad fruit
* Cranberry
* Death
* Easter
A cool day
I want to select all the numerical values after age and then increment them every day. How would I do this selection and replacement? I believe it would involve a regexp and some tool (maybe awk), but I am relatively clueless from there on.
In awk, you could say:
awk '/age:/ { $2++ } { print }' foo.org
If you have a recent version of GNU awk, you can edit the file in-place using the option -i inplace. Otherwise, just do the usual, i.e. redirect to a temporary file and then replace the original:
awk '/age:/ { $2++ } { print }' foo.org > foo.org.tmp && mv foo.org{.tmp,}
That's basically what the inplace option of awk or sed does behind the scenes anyway.
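For completeness, the in-place variant mentioned above would look like this (assuming GNU awk 4.1 or later):
gawk -i inplace '/age:/ { $2++ } { print }' foo.org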
I have a text file in the format below. The first column represents a timestamp with a very high resolution. The second number represents the sequence number. I want to plot a graph between these two values, i.e. sequence number over time. For this purpose I want to scale the sequence number and the timestamp. The timestamp can be scaled by subtracting the first timestamp from the remaining timestamps. The sequence number should be scaled the same way. However, when scaled, the sequence number can have negative values. How do I write a bash script using awk to achieve this? The file name is print_1010171.txt. Please note that I have a number of files in the same format, so I want the script to be generic.
5698771509078629376 1133254688
5698771509371165696 1150031904
5698771510035551232 1150031904
5698771510036082688 4170258464
5698771510036715520 2895583264
5698771510037202176 1620908064
5698771510037665280 346232864
5698771510038193664 3366459424
5698771510332259072 2091784224
5698771510332816128 817109024
5698771510333344512 3837335584
5698771510339882240 2562660384
5698771510340411392 1287985184
5698771510340939776 13309984
5698771510348048896 3033536544
5698771510348577280 1758861344
5698771510349228800 484186144
5698771510632804864 3504412704
5698771510633441792 2229737504
5698771510634390272 955062304
5698771510638858496 3975288864
5698771510639347712 2700613664
5698771510642663168 1425938464
5698771510643387136 134486304
5698771510643808768 3154712864
5698771510648858368 1880037664
5698771510649410560 605362464
5698771510655600384 3625589024
5698771510656128768 2350913824
5698771510656657408 1076238624
Very similar to Dennis Williamson's solution -- this should be more efficient (but probably not something you'd ever notice), and it will also silently ignore blank lines (the other solution will give very large negative numbers for blank lines).
#script coolscript.gp
if(!exists("DATAFILE")) DATAFILE='test.txt'
EXT_INDEX=strstr(DATAFILE,'.txt') #assume data has a .txt extension.
set term post enh color
set output DATAFILE[:EXT_INDEX] . '.ps' #gnuplot string slicing and concatenation
plot "< awk 'BEGIN{getline; header_col1=$1; header_col2=$2 }{if(NF){print $1-header_col1,$2-header_col2}}' ".DATAFILE using 1:2
You can definitely do this using an all-gnuplot solution. (See #andyras's nice solution and my answer that he linked to.) This (alternate) solution works by reading the first line in awk and assigning the variables header_col1 and header_col2 with the data in column 1 and column 2. It then subtracts those from the values on all subsequent lines (as expected), as long as the line isn't empty.
Note that this solution can be called from the commandline using:
gnuplot -e "DATAFILE='mydatafile.txt'" coolscript.gp
Unfortunately, the quotes are necessary since gnuplot needs them, meaning that if you're using this in a shell loop, you should definitely use the double quotes on the outside as I show.
for FILE in *.dat; do
gnuplot -e "DATAFILE='${FILE}'" coolscript.gp
done
awk 'NR == 1 {basets = $1; baseseq = $2} {print $1 - basets, $2 - baseseq}' inputfile
or, if you don't want to output the initial pair of zeros:
awk 'NR == 1 {basets = $1; baseseq = $2; next} {print $1 - basets, $2 - baseseq}' inputfile
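Since the question mentions a number of files in the same format, a small wrapper loop could apply this to each of them (a sketch; the print_*.txt glob and the scaled_ output prefix are just illustrative):
for f in print_*.txt; do
    awk 'NR == 1 {basets = $1; baseseq = $2} {print $1 - basets, $2 - baseseq}' "$f" > "scaled_$f"
done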
Here is a bash wrapper script which should do what you want:
#!/bin/bash
gnuplot << EOF
set terminal png truecolor size 800,600
set output 'plot_$1.png'
firstx=0
offsetx=0
funcx(x)=(offsetx=(firstx==0)?x:offsetx,firstx=1,x-offsetx)
firsty=0
offsety=0
funcy(x)=(offsety=(firsty==0)?x:offsety,firsty=1,x-offsety)
plot '$1' u (funcx(\$1)):(funcy(\$2))
EOF
To use the script, give it the name of the file you want to plot as an argument:
$ myscript.sh print_1010171.txt
I modified the answer given here to accommodate two variables. See that answer also if you want to subtract the lowest value from all data rather than the first.
I'm currently running an awk script to process a large (8.1GB) access-log file, and it's taking forever to finish. In 20 minutes, it wrote 14MB of the (1000 +- 500)MB I expect it to write, and I wonder if I can process it much faster somehow.
Here is the awk script:
#!/bin/bash
awk '{t=$4" "$5; gsub("[\[\]\/]"," ",t); sub(":"," ",t);printf("%s,",$1);system("date -d \""t"\" +%s");}' $1
EDIT:
For non-awkers, the script reads each line, gets the date information, modifies it to a format the date utility recognizes, and calls date to convert it to the number of seconds since 1970, finally returning it as a line of a .csv file along with the IP.
Example input: 189.5.56.113 - - [22/Jan/2010:05:54:55 +0100] "GET (...)"
Returned output: 189.5.56.113,124237889
OP, your script is slow mainly due to the excessive calls to the external date command, one for every line of the file, and it's a big file as well (in the GB range). If you have gawk, use its built-in mktime() function to do the date-to-epoch-seconds conversion:
awk 'BEGIN{
  m=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",d,"|")
  for(o=1;o<=m;o++){
    date[d[o]]=sprintf("%02d",o)
  }
}
{
  gsub(/\[/,"",$4); gsub(":","/",$4); gsub(/\]/,"",$5)
  n=split($4, DATE,"/")
  day=DATE[1]
  mth=DATE[2]
  year=DATE[3]
  hr=DATE[4]
  min=DATE[5]
  sec=DATE[6]
  MKTIME= mktime(year" "date[mth]" "day" "hr" "min" "sec)
  print $1,MKTIME
}' file
output
$ more file
189.5.56.113 - - [22/Jan/2010:05:54:55 +0100] "GET (...)"
$ ./shell.sh
189.5.56.113 1264110895
If you really really need it to be faster, you can do what I did. I rewrote an Apache log file analyzer using Ragel. Ragel allows you to mix regular expressions with C code. The regular expressions get transformed into very efficient C code and then compiled. Unfortunately, this requires that you are very comfortable writing code in C. I no longer have this analyzer. It processed 1 GB of Apache access logs in 1 or 2 seconds.
You may have limited success removing unnecessary printfs from your awk statement and replacing them with something simpler.
If you are using gawk, you can massage your date and time into a format that mktime (a gawk function) understands. It will give you the same timestamp you're using now and save you the overhead of repeated system() calls.
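A compact sketch of that idea (not the original poster's exact code; it assumes the Apache log layout from the question, with the bracketed timestamp in $4):
gawk 'BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
    for (i in m) mon[m[i]] = i                    # month name -> month number
}
{
    split(substr($4, 2), d, "[/:]")               # "[22/Jan/2010:05:54:55" -> day, month, year, hh, mm, ss
    print $1 "," mktime(d[3] " " mon[d[2]] " " d[1] " " d[4] " " d[5] " " d[6])
}' access.log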
This little Python script handles ~400MB worth of copies of your example line in about 3 minutes on my machine, producing ~200MB of output (keep in mind your sample line was quite short, so that's a handicap):
import time
src = open('x.log', 'r')
dest = open('x.csv', 'w')
for line in src:
    ip = line[:line.index(' ')]
    date = line[line.index('[') + 1:line.index(']') - 6]
    t = time.mktime(time.strptime(date, '%d/%b/%Y:%X'))
    dest.write(ip)
    dest.write(',')
    dest.write(str(int(t)))
    dest.write('\n')
src.close()
dest.close()
A minor problem is that it doesn't handle timezones (strptime() problem), but you could either hardcode that or add a little extra to take care of it.
But to be honest, something as simple as that should be just as easy to rewrite in C.
gawk '{
  dt=substr($4,2)           # strip the leading "[" -> "22/Jan/2010:05:54:55"
  gsub(/\//," ",dt)         # -> "22 Jan 2010:05:54:55"
  sub(/:/," ",dt)           # -> "22 Jan 2010 05:54:55"
  cmd="date -d \""dt"\" +%s"
  cmd|getline ts
  close(cmd)                # close the pipe so each line gets a fresh date call
  print $1, ts
}' yourfile