I am trying to append some text to the /var/log/messages output whenever the timestamps of two consecutive log lines differ, such as:
previous log: 00:01:59 and current log: 00:02:00
or
previous log: 00:01:49 and current log: 00:01:50
Whenever this substring of the timestamp differs between consecutive logs, append a message to $0.
You may run the command below; it works at 1-minute granularity, but I need it at 10-second granularity.
tail -f /var/log/messages |awk '{split($3,a,":");split($3,b,"");new_time=a[1]":"a[2]":"b[1]; if(prev_time==new_time) print $0; else print "10 Second group is over, starting new: "$0" "prev_time " "new_time } {split($3,a,":");split($3,b,"");prev_time=a[1]":"a[2]":"b[1]}'
The required result is a modification of the above command so that it prints the message at 10-second gaps in the logs; currently it does so at 1-minute gaps. I used split() to capture 'HH:MM:S', not 'HH:MM:SS', so that whenever the previous 'HH:MM:S' and the current 'HH:MM:S' differ, the message "10 Second group is over, starting new: $0" is printed. Not sure what the mistake is here.
In short, it currently fires when the minute changes; I need it to fire when the seconds cross from 39 to 40 or from 09 to 10, but NOT from 11 to 12. In HH:MM:SS, it is the first S that needs to be watched.
Sample lines:
Jan 23 15:09:54 foo bar
Jan 23 15:10:04 bla bla
this is the general idea:
$ for((i=35;i<45;i++)); do echo Jan 23 00:01:$i; done |
awk '{split($3,a,":"); print $0, (p!=(c=int(a[3]/10))?"<<<":""); p=c}'
Jan 23 00:01:35 <<<
Jan 23 00:01:36
Jan 23 00:01:37
Jan 23 00:01:38
Jan 23 00:01:39
Jan 23 00:01:40 <<<
Jan 23 00:01:41
Jan 23 00:01:42
Jan 23 00:01:43
Jan 23 00:01:44
The first part is test data generated for the script, since you didn't provide enough. There is a spurious match on the first line, which can be eliminated with an NR>1 condition, but I don't think that's critical.
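Wired into your original pipeline with your original message text, and with the NR>1 guard added, that becomes (a sketch, assuming the timestamp is still whitespace-separated field 3):

tail -f /var/log/messages |
awk '{
  split($3, a, ":")        # a[3] is the SS part of HH:MM:SS
  c = int(a[3] / 10)       # tens digit of the seconds: changes 39->40 and 09->10, not 11->12
  if (NR > 1 && c != p)
    print "10 Second group is over, starting new: " $0
  else
    print
  p = c
}'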
I have the following input file with 2 fields separated by a tab. I hope I can explain it well enough.
Description
Field 2 contains chapters. In this case there are 2: HISTORY OF THE COUNTRY and PHYSICAL GEOGRAPHY.
All chapters are marked with the value 10 in field 1.
The sections beneath each chapter are marked with the value 07.
The content is marked with the value 05.
The next chapter begins when $1==10 and $2 is different from the previous chapter.
In this case Chapter 1 goes from line 1 to line 16.
Chapter 2 goes from line 17 to the end of the file.
The chapters, sections and content can appear repeated times in field 2.
For example:
HISTORY OF THE COUNTRY appears 4 times between line 1 and line 16
PHYSICAL GEOGRAPHY appears 2 times between line 17 and end of file
My goal is:
Remove repeated occurrences of chapters and sections, leaving everything in the same order of appearance. For the content, don't remove anything. I mean:
for chapter 1, remove repeated HISTORY OF THE COUNTRY within chapter 1's context (between lines 1 and 16)
for chapter 2, remove repeated PHYSICAL GEOGRAPHY within chapter 2's context (between lines 17 and 25)
The Input is this:
10 HISTORY OF THE COUNTRY
07 FIRST PART
07 INTRODUCTION
05 Article 1
10 HISTORY OF THE COUNTRY
07 FIRST PART
07 INTRODUCTION
05 Article 2
10 HISTORY OF THE COUNTRY
07 SECOND PART
07 REVIEW
05 Article 1
10 HISTORY OF THE COUNTRY
07 SECOND PART
07 METHODOLOGY
05 Article1
10 PHYSICAL GEOGRAPHY
07 FIRST PART
07 INTRODUCTION
05 First section
10 PHYSICAL GEOGRAPHY
07 FIRST PART
07 INTRODUCTION
05 Second Section
and output would be like this:
10 HISTORY OF THE COUNTRY
07 FIRST PART
07 INTRODUCTION
05 Article 1
05 Article 2
07 SECOND PART
07 REVIEW
05 Article 1
07 METHODOLOGY
05 Article1
10 PHYSICAL GEOGRAPHY
07 FIRST PART
07 INTRODUCTION
05 First section
05 Second Section
My current code prints something close, but not what I'm looking for.
awk '$2 in a {next} {a[$2]++} 1' input.txt
10 HISTORY OF THE COUNTRY
07 FIRST PART
07 INTRODUCTION
05 Article 1
07 SECOND PART
07 REVIEW
07 METHODOLOGY
05 Article1
10 PHYSICAL GEOGRAPHY
05 First section
05 Second Section
Thanks for any help.
You may use this awk:
awk -F '\t' '$1 == 10 { ch = $2; sec = "" } $1+0 == 7 { sec = $2 }
($1+0 == 5 && !seen[ch,sec,$0]++) || !seen[ch,$0]++' file
10 HISTORY OF THE COUNTRY
07 FIRST PART
07 INTRODUCTION
05 Article 1
05 Article 2
07 SECOND PART
07 REVIEW
05 Article 1
07 METHODOLOGY
05 Article1
10 PHYSICAL GEOGRAPHY
07 FIRST PART
07 INTRODUCTION
05 First section
05 Second Section
another awk
You need to apply uniqueness in a path-dependent way: print all unique chapters, parts within a chapter, and articles/sections within a part.
$ awk '$1=="10" && !h1[c=$0]++;
$1=="07" && !h2[c,s=$0]++;
$1=="05" && !h3[c,s,$0]++' file
10 HISTORY OF THE COUNTRY
07 FIRST PART
07 INTRODUCTION
05 Article 1
05 Article 2
07 SECOND PART
07 REVIEW
05 Article 1
07 METHODOLOGY
05 Article1
10 PHYSICAL GEOGRAPHY
07 FIRST PART
07 INTRODUCTION
05 First section
05 Second Section
{print $0} is implied when the action is missing.
In awk, an associative array a[k] (similar to a hash map) is keyed by k, and an entry starts out as 0 or the null value (false in boolean context). a[k]++ post-increments the entry, and !a[k]++ negates the pre-increment value, so the expression is true only for the first occurrence of a given k; it can therefore be used as a uniqueness filter. Here h1, h2, and h3 hold the unique headers at each level, with keys that include the parent path, so only values unique under their parent are printed.
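Seen in isolation, this is the classic awk dedup one-liner; the first occurrence of each line prints, later duplicates don't:

$ printf 'a\nb\na\n' | awk '!seen[$0]++'
a
b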
This script can be shortened for this problem but as given it's easy to modify if you need to add another layer.
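For example, if the format ever gained a fourth level — say a hypothetical 03 marker for sub-items under each article — you would capture the article into a variable and add one more array keyed on the full path (a sketch; the 03 level does not exist in the sample data):

$ awk '$1=="10" && !h1[c=$0]++;
       $1=="07" && !h2[c,s=$0]++;
       $1=="05" && !h3[c,s,t=$0]++;
       $1=="03" && !h4[c,s,t,$0]++' file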
I am trying to calculate the value of the last measurement taken (according to the date column) divided by the lowest value recorded (according to the measurement column) when two values in the "SUBJECT" column match and two values in the "PROCEDURE" column match. The calculation should be produced in a new column. I am having trouble with this and would appreciate a solution.
data Have;
input Subject Type :$12. Date &:anydtdte. Procedure :$12. Measurement;
format date yymmdd10.;
datalines;
500 Initial 15 AUG 2017 Invasive 20
500 Initial 15 AUG 2017 Surface 35
500 Followup 15 AUG 2018 Invasive 54
428 Followup 15 AUG 2018 Outer 29
765 Seventh 3 AUG 2018 Other 13
500 Followup 3 JUL 2018 Surface 98
428 Initial 3 JUL 2017 Outer 10
765 Initial 20 JUL 2019 Other 19
610 Third 20 AUG 2019 Invasive 66
610 Initial 17 Mar 2018 Invasive 17
;
*Intended output table
Subject Type Date Procedure Measurement Output
500 Initial 15 AUG 2017 Invasive 20 20/20
500 Initial 15 AUG 2017 Surface 35 35/35
500 Followup 15 AUG 2018 Invasive 54 54/20
428 Followup 15 AUG 2018 Outer 29 29/10
765 Seventh 3 AUG 2018 Other 13 13/19
500 Followup 3 JUL 2018 Surface 98 98/35
428 Initial 3 JUL 2017 Outer 10 10/10
765 Initial 20 JUL 2019 Other 19 19/19
610 Third 20 AUG 2019 Invasive 66 66/17
610 Initial 17 Mar 2018 Invasive 17 17/17 ;
*Attempt;
PROC SQL;
create table want as
select a.*,
(select measurement as measurement_last_date
from have
where subject = a.subject and type = a.type
having date = max(date)) / min(a.measurement) as ratio
from have as a
group by subject, type
order by subject, type, date;
QUIT;
I think you need to use the RETAIN statement in a DATA step.
The statement will retain the value from the last row, so you can compare the last row with the row currently being processed.
There are tutorials on how to use the RETAIN statement, and the SAS documentation covers it as well.
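This isn't SAS, but as a rough analogy in awk: a variable assigned while processing one row is still available on the next row, which is essentially what RETAIN gives you in a DATA step. A sketch, assuming a whitespace-separated file have.txt with Subject in the first column:

awk 'NR > 1 && $1 != prev { print "--- new subject ---" }
     { print; prev = $1 }     # prev survives into the next record
' have.txt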
I'm trying to skip the calculation on rows where End_time has the value "Failed".
Here is my actual file.
check_time.log
Done City Start_time End_time
Yes Chicago 18 10
Yes Atlanta 208 11
No Minnetonka 57 Failed
Yes Hopkins 112 80
No Marietta 2018 Failed
Here is what I have so far.
awk 'BEGIN { OFS = "\t" } NR == 1 { $5 = "Time_diff" } NR >= 2 { $5 = $3 - $4 } 1' < files |column -t
Output:
Done City Start_time End_time Time_diff
Yes Chicago 18 10 8
Yes Atlanta 208 11 197
No Minnetonka 57 Failed 57
Yes Hopkins 112 80 32
No Marietta 2018 Failed 2018
Desired output should look like this:
Done City Start_time End_time Time_diff
Yes Chicago 18 10 8
Yes Atlanta 208 11 197
No Minnetonka 57 Failed
Yes Hopkins 112 80 32
No Marietta 2018 Failed
So how do I skip that?
You should be able to just change:
$5 = $3 - $4
into:
if ($4 != "Failed") { $5 = $3 - $4 }
This will:
refuse to change $5 from empty to the calculated value in lines where the end time is Failed; and
correctly do the calculation for all other lines.
I say correctly since it appears you want the start time minus the end time in those cases, despite the fact that durations tend to be end time minus start time. I've changed it to match your desired output rather than the "sane" expectation.
A transcript follows so you can see it in action:
pax$ awk 'BEGIN{OFS="\t"}NR==1{$5="Time_diff"}NR>=2{if($4!="Failed"){$5=$3-$4}}1' <inputFile.txt |column -t
Done City Start_time End_time Time_diff
Yes Chicago 18 10 8
Yes Atlanta 208 11 197
No Minnetonka 57 Failed
Yes Hopkins 112 80 32
No Marietta 2018 Failed
And, just as an aside, you may want to consider what will happen when you start getting information from New York, San Antonio, Salt Lake City or, even worse, Maccagno con Pino e Veddasca :-)
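A quick sketch of what goes wrong there, since awk splits on whitespace by default:

pax$ echo 'Yes New York 18 10' | awk '{print "start:", $3, "end:", $4}'
start: York end: 18

Every field after the city shifts by one for each extra word in its name.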
Could you please try the following. (This assumes your Input_file's last fields are always in this order, with no additional fields after them. Note that if a city's value contains a space, field numbers counted from the start of the line will differ from line to line; that is why the fields are referenced from the end here. Adjust the field numbers if your format differs.)
awk '
FNR==1{
print $0,"Time_Diff"
next
}
$NF!="Failed"{
$(NF+1)=$(NF-1)-$NF
}
1
' Input_file | column -t
Output will be as follows.
Done City Start_time End_time Time_Diff
Yes Chicago 18 10 8
Yes Atlanta 208 11 197
No Minnetonka 57 Failed
Yes Hopkins 112 80 32
No Marietta 2018 Failed
Explanation: Adding complete explanation for above code now.
awk ' ##Starting awk program from here.
FNR==1{ ##Checking condition: if this is the very first line, then do the following.
print $0,"Time_Diff" ##Printing the current line plus the string Time_Diff, so the heading row gets its new column.
next ##next is an awk keyword that skips all further statements for this line.
}
$NF!="Failed"{ ##Checking that the last field ($NF, where NF is the number of fields) is NOT "Failed"; if so, do the following.
$(NF+1)=$(NF-1)-$NF ##Adding a new column $(NF+1) whose value is the second-to-last column minus the last column, as per the samples.
} ##Closing this condition block here.
1 ##A pattern of 1 prints every edited/non-edited line of Input_file.
' Input_file | ##Mentioning the Input_file name and passing awk program output to the next command via pipe (|).
column -t ##column -t aligns the output into a table.
If you are considering Perl,
> cat kwa.in
Done City Start_time End_time
Yes Chicago 18 10
Yes Atlanta 208 11
No Minnetonka 57 Failed
Yes Hopkins 112 80
No Marietta 2018 Failed
> perl -lane ' print join(" ",@F,"Time_Diff") if $.==1; if($.>1 ) { $F[4]=$F[2]-$F[3] if not $F[3]=~/Failed/; print join(" ",@F) } ' kwa.in | column -t
Done City Start_time End_time Time_Diff
Yes Chicago 18 10 8
Yes Atlanta 208 11 197
No Minnetonka 57 Failed
Yes Hopkins 112 80 32
No Marietta 2018 Failed
>
I searched for catchup in the Airflow documentation, but I still don't understand the purpose of this parameter:
catchup (bool) – Perform scheduler catchup (or only run latest)? Defaults to True
Thanks.
You'll find an expanded explanation in the documentation on scheduling, backfill, and catchup.
Let me try to expand on it with an example.
Assume this calendar for January this year:
January 2018
Su Mo Tu We Th Fr Sa
1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
Let's say you add a DAG on the 23rd with start_date=datetime(2018, 1, 1) and schedule_interval='0 0 * * MON'.
With catchup=True, on first parsing the DAG the scheduler will immediately recognize that the periods 1-1 to 1-8, 1-8 to 1-15, and 1-15 to 1-22 have closed and passed. It will schedule a DAG run for execution_date 2018-01-01, starting when you add the DAG on the 23rd. If max_active_runs > 2, it will also schedule DAG runs for 2018-01-08 and 2018-01-15.
With catchup=False, on first parsing the DAG the scheduler will still recognize that the same periods have closed and passed, but it will schedule a DAG run only for execution_date 2018-01-15, starting when you add the DAG on the 23rd. I.e. it runs the most recent closed period first and does not run any prior periods. The next run, for execution_date 2018-01-22, would then start at 2018-01-29T00:00:00±scheduler_lag. But if, after the 2018-01-15 run completed, you paused the DAG and then unpaused it at 2018-01-29T09:00, the scheduler would see that there are prior DAG runs and that the start time of the most recent period is well past; it would not run a catchup run for that missed period.
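Independent of the catchup setting, missed periods can also be run explicitly from the CLI; in Airflow 1.x this was the backfill command. From memory, so treat it as a sketch (my_dag_id is a placeholder for your DAG's ID):

airflow backfill -s 2018-01-01 -e 2018-01-22 my_dag_id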
I'm trying to reformat serial input, which consists of two integers separated by a comma (sent from an Arduino):
1,2
3,4
0,0
0,1
etc. I would like to append the date after each line, separating everything with a tab character. Here's my code so far:
cat /dev/cu.usbmodem3d11 | awk 'BEGIN {RS=","};{printf "%i\t%i\t%s",$1,$2,system("date")}'
Here's the result I get (with date in my locale):
1 2 0Mer 26 fév 2014 22:09:20 EST
3 4 0Mer 26 fév 2014 22:09:20 EST
0 0 0Mer 26 fév 2014 22:09:20 EST
0 1 0Mer 26 fév 2014 22:09:20 EST
Why is there an extra '0' in front of my date field? Sorry for the newbie question :(
EDIT This code solved my problem. Thanks to all who helped.
awk 'BEGIN {FS=","};{system("date")|getline myDate;printf "%i\t%i\t%s",$1, $2, myDate}' /dev/cu.usbmodem3d11
I'm not clear why, but in order for the date to keep updating and recording the time at which the data was received, I have to use system("date") instead of just "date" in the code above. (The likely explanation: awk keeps the "date" pipe open after the first getline, so it never re-runs the command unless close("date") is called after each read.)
2 things
It will be easier to see your problem if you add a \n at the end of your printf string
Then the output is
>echo '1,2' | awk 'BEGIN {RS=","};{printf "%i\t%i\t%s\n",$1,$2,system("date")}'
Wed Feb 26 21:30:17 CST 2014
1 0 0
Wed Feb 26 21:30:17 CST 2014
2 0 0
The output from system("date") never actually passes through awk: date writes directly to the terminal, and system() returns only the command's exit status (0), which is what your %s receives. That's why the date appears on its own line and a stray 0 shows up in the printf output.
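You can see both behaviours in a minimal test: date's output goes straight to the terminal, and the value awk actually gets back is the exit status (output will look something like this):

> awk 'BEGIN { r = system("date"); print "system() returned:", r }'
Wed Feb 26 21:30:17 CST 2014
system() returned: 0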
To get the output you want, I'm using the getline function to capture the output of the date command to a variable (myDt). Now the output is
> echo '1,2' | awk 'BEGIN {RS=","};{"date" | getline myDt ; printf "%i\t%i\t%s\n",$1,$2,myDt}'
1 0 Wed Feb 26 21:31:15 CST 2014
2 0 Wed Feb 26 21:31:15 CST 2014
Finally, we remove the "debugging" \n char, and get the output you specify:
> echo '1,2' | awk 'BEGIN {RS=","};{"date" | getline myDt ; printf "%i\t%i\t%s",$1,$2,myDt}'
1 0 Wed Feb 26 21:34:56 CST 2014
2 0 Wed Feb 26 21:34:56 CST 2014
And, per Jaypal's post, I see now that FS="," is another issue, so when we make that change AND restore the \n char, we have
echo '1,2' | awk 'BEGIN {FS=","};{"date" | getline myDt ; printf "%i\t%i\t%s\n",$1,$2,myDt}'
1 2 Wed Feb 26 21:44:42 CST 2014
Two issues:
First - RS is the record separator. You need FS, the field separator, to separate the two columns, so that $1 is 1 and $2 is 2 (as per your first row).
Second - The extra 0 you see in the output is the return value of the system() command; it means it ran successfully. You can instead run the shell command in quotes and pipe it to getline; putting a variable after getline lets you capture the command's output.
Try this:
awk 'BEGIN {FS=","};{"date"|getline var;printf "%i\t%i\t%s\n",$1,$2,var}'
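One caveat: awk keeps the "date" pipe open after the first read, so on a long-running stream every line gets the same cached timestamp. Closing the pipe after each getline forces date to run again (a sketch):

awk 'BEGIN {FS=","} {"date" | getline var; close("date"); printf "%i\t%i\t%s\n", $1, $2, var}' /dev/cu.usbmodem3d11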
This is a simpler solution:
awk -F, '{print $1,$2,dt}' dt="$(date)" OFS="\t" /dev/cu.usbmodem3d11
1 2 Thu Feb 27 06:23:41 CET 2014
3 4 Thu Feb 27 06:23:41 CET 2014
0 0 Thu Feb 27 06:23:41 CET 2014
0 1 Thu Feb 27 06:23:41 CET 2014
If you'd like to show the date in another format, just read the manual for date.
E.g. dt="$(date +%D)" gives 02/27/14