awk: Check if a number is between two numbers

I have a file in the following format:
02 Jul 2016 00:00:00 2736359
02 Jul 2016 00:02:00 2736594
02 Jul 2016 00:04:00 2736828
02 Jul 2016 00:06:00 2737068
02 Jul 2016 00:08:00 2737303
02 Jul 2016 00:10:00 2737542
02 Jul 2016 00:12:00 2737775
02 Jul 2016 00:14:00 2738011
02 Jul 2016 00:16:00 2738251
02 Jul 2016 00:18:00 2738483
The first column is the timestamp and the second is a number. Given the input 2737778, I want the output to be "02 Jul 2016 00:12:00 and 02 Jul 2016 00:14:00", because 2737778 falls between 2737775 and 2738011. Can I do this in awk? Is it possible to compare a number in the current line with the one in the next line?

Another, similar awk approach:
awk -v n=2737778 'n<=$NF{if(p) print p; print; exit} {p=$0}' file
02 Jul 2016 00:12:00 2737775
02 Jul 2016 00:14:00 2738011

Yes, it is possible to read ahead in awk (see the question "Peek at next line, but don't consume it"). Here is another way to do it:
awk '{ if (NR == 1) { save = $0; prtsw = 1 }
       else if (prtsw == 1 && 2737778 < $5) {
           print save
           print $0
           prtsw = 0
       }
       else { save = $0 }
     }' abetween.txt
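The same idea (remember the previous line and stop at the first line whose value is at least the target) can be sketched in Python to check the logic outside awk. This is a minimal sketch assuming whitespace-separated lines, sorted ascending, whose last field is the number; the function name `bracketing_lines` is hypothetical.

```python
def bracketing_lines(lines, target):
    """Return (previous_line, current_line) around target, or None.

    Mirrors the awk one-liner: keep the previous line, and at the first line
    whose last field is >= target, return both lines. previous_line is None
    when the target is not larger than the first value.
    """
    prev = None
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        value = int(line.split()[-1])  # last field, like awk's $NF
        if target <= value:
            return prev, line
        prev = line
    return None  # target is larger than every value in the input


sample = [
    "02 Jul 2016 00:10:00 2737542",
    "02 Jul 2016 00:12:00 2737775",
    "02 Jul 2016 00:14:00 2738011",
]
print(bracketing_lines(sample, 2737778))
```

Like the awk versions, this reads the input only once and never needs true look-ahead: holding on to the previous line is enough.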


Get the number of unique days with overlapping dates (in SAS)

I couldn't briefly explain the problem, so I'll try to explain it this way. Let's say I have a table similar to the one below.
How do I get, per student, the total number of days in October on which that student has at least one book checked out?
Please note that a single student can check out more than one book at a time, which causes overlapping dates.
Student  Book                                   Date_Borrowed  Date_Returned
David    A Thousand Splendid Suns               01 Oct 2021    05 Oct 2021
David    Jane Eyre                              09 Oct 2021    13 Oct 2021
David    Please Look After Mom                  21 Oct 2021    29 Oct 2021
Fiona    Sense and Sensibility                  05 Oct 2021    14 Oct 2021
Fiona    The Girl Who Saved the King of Sweden  05 Oct 2021    14 Oct 2021
Fiona    A Fort of Nine Towers                  02 Oct 2021    17 Oct 2021
Fiona    One Hundred Years of Solitude          20 Oct 2021    30 Oct 2021
Fiona    The Unbearable Lightness of Being      20 Oct 2021    30 Oct 2021
Greg     Fahrenheit 451                         06 Oct 2021    11 Oct 2021
Greg     One Hundred Years of Solitude          10 Oct 2021    17 Oct 2021
Greg     Please Look After Mom                  15 Oct 2021    21 Oct 2021
Greg     4 3 2 1                                20 Oct 2021    27 Oct 2021
Greg     The Girl Who Saved the King of Sweden  27 Oct 2021    03 Nov 2021
Marcus   Fahrenheit 451                         01 Oct 2021    04 Oct 2021
Marcus   Nectar in a Sieve                      15 Oct 2021    15 Oct 2021
Marcus   Please Look After Mom                  30 Oct 2021    31 Oct 2021
Priya    Like Water for Chocolate               02 Oct 2021    21 Oct 2021
Priya    Fahrenheit 451                         21 Oct 2021    22 Oct 2021
Sasha    Baudolino                              03 Oct 2021    29 Oct 2021
Sasha    A Thousand Splendid Suns               07 Oct 2021    16 Oct 2021
Sasha    A Fort of Nine Towers                  26 Oct 2021    01 Nov 2021
Thanks in advance!
Using a data step, you can expand each borrowing period into one row per date (a long format). From there, after removing overlapping dates, you can use SQL to do a simple count by student.
data foo;
set have;
do date = date_borrowed to date_returned;
output;
end;
keep student date;
format date date9.;
run;
This gets us a long table of all the dates with at least one book checked out for each student.
student date
David 01OCT2021
David 02OCT2021
David 03OCT2021
David 04OCT2021
David 05OCT2021
David 09OCT2021
...
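Now we need to remove the overlapping dates, i.e. the duplicate student/date rows created when a student had more than one book out on the same day.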
proc sort data=foo nodupkey;
by student date;
run;
From here, we can do a simple SQL count per student.
proc sql noprint;
create table want as
select student
, intnx('month', date, 0, 'B') as month format=monyy7.
, count(*) as days_checked_out
from foo
where calculated month = '01OCT2021'd
group by student, calculated month
;
quit;
Output:
student month days_checked_out
David OCT2021 19
Fiona OCT2021 27
Greg OCT2021 26
Marcus OCT2021 7
Priya OCT2021 21
Sasha OCT2021 29
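For readers without SAS, the same "expand to one entry per day, drop duplicates, count" logic can be sketched in Python. This is a swapped-in illustration, not part of the SAS answer: the function name `days_with_a_book` is hypothetical, loans are assumed to be (student, borrowed, returned) tuples with inclusive dates, and each loan is clipped to the window directly instead of filtering by month afterwards.

```python
from collections import defaultdict
from datetime import date, timedelta


def days_with_a_book(loans, start, end):
    """Count distinct days in [start, end] on which each student has >= 1 book.

    loans: iterable of (student, borrowed, returned) with inclusive dates.
    Every loan is expanded to one entry per day; a set per student drops the
    duplicate days caused by overlapping loans, like PROC SORT NODUPKEY.
    """
    covered = defaultdict(set)  # student -> set of covered days
    for student, borrowed, returned in loans:
        day = max(borrowed, start)  # clip the loan to the window
        last = min(returned, end)
        while day <= last:
            covered[student].add(day)
            day += timedelta(days=1)
    return {student: len(days) for student, days in covered.items()}


loans = [
    ("David", date(2021, 10, 1), date(2021, 10, 5)),
    ("David", date(2021, 10, 9), date(2021, 10, 13)),
    ("David", date(2021, 10, 21), date(2021, 10, 29)),
    ("Greg", date(2021, 10, 6), date(2021, 10, 11)),
    ("Greg", date(2021, 10, 10), date(2021, 10, 17)),
    ("Greg", date(2021, 10, 15), date(2021, 10, 21)),
    ("Greg", date(2021, 10, 20), date(2021, 10, 27)),
    ("Greg", date(2021, 10, 27), date(2021, 11, 3)),
]
print(days_with_a_book(loans, date(2021, 10, 1), date(2021, 10, 31)))
# {'David': 19, 'Greg': 26}
```

The counts agree with the SAS output above for David and Greg; Greg's last loan runs into November, and clipping at 31 Oct keeps those days out of the total.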
An easy way is to make a temporary array with one variable for each day in the time period you want to count. Then use a do loop to set the variables representing the borrowed days to 1. When you reach the last record for a student, take the sum to find the number of days covered.
First let's convert your posted table into a dataset.
data have;
infile cards dsd dlm='|' truncover;
input Student :$20. Book :$100. (Date_Borrowed Date_Returned) (:date.);
format Date_Borrowed Date_Returned date11.;
cards;
David|A Thousand Splendid Suns|01 Oct 2021|05 Oct 2021
David|Jane Eyre|09 Oct 2021|13 Oct 2021
David|Please Look After Mom|21 Oct 2021|29 Oct 2021
Fiona|Sense and Sensibility|05 Oct 2021|14 Oct 2021
Fiona|The Girl Who Saved the King of Sweden|05 Oct 2021|14 Oct 2021
Fiona|A Fort of Nine Towers|02 Oct 2021|17 Oct 2021
Fiona|One Hundred Years of Solitude|20 Oct 2021|30 Oct 2021
Fiona|The Unbearable Lightness of Being|20 Oct 2021|30 Oct 2021
Greg|Fahrenheit 451|06 Oct 2021|11 Oct 2021
Greg|One Hundred Years of Solitude|10 Oct 2021|17 Oct 2021
Greg|Please Look After Mom|15 Oct 2021|21 Oct 2021
Greg|4 3 2 1|20 Oct 2021|27 Oct 2021
Greg|The Girl Who Saved the King of Sweden|27 Oct 2021|03 Nov 2021
Marcus|Fahrenheit 451|01 Oct 2021|04 Oct 2021
Marcus|Nectar in a Sieve|15 Oct 2021|15 Oct 2021
Marcus|Please Look After Mom|30 Oct 2021|31 Oct 2021
Priya|Like Water for Chocolate|02 Oct 2021|21 Oct 2021
Priya|Fahrenheit 451|21 Oct 2021|22 Oct 2021
Sasha|Baudolino|03 Oct 2021|29 Oct 2021
Sasha|A Thousand Splendid Suns|07 Oct 2021|16 Oct 2021
Sasha|A Fort of Nine Towers|26 Oct 2021|01 Nov 2021
;
Now we can use BY-group processing in a data step to aggregate per student. We can set the lower and upper bounds of the array index to the values SAS uses to represent those days. Temporary arrays are automatically retained across observations; we just need to clear the array when we start a new student.
The SAS compiler does not accept a date literal as an array index bound, so we use %SYSEVALF() to convert each date literal to the integer it represents.
data want;
set have;
by student ;
array october [%sysevalf('01oct2021'd):%sysevalf('31oct2021'd)] _temporary_ ;
if first.student then call missing(of october[*]);
do date=max(date_borrowed,'01oct2021'd) to min(date_returned,'31oct2021'd);
october[date]=1;
end;
if last.student;
days = sum(0, of october[*]);
keep student days;
run;
Results:
Obs Student days
1 David 19
2 Fiona 27
3 Greg 26
4 Marcus 7
5 Priya 21
6 Sasha 29
You could also modify it slightly to count not only the number of "covered" (unique) days but also the total number of "book" days.
data want;
set have;
by student ;
array october [%sysevalf('01oct2021'd):%sysevalf('31oct2021'd)] _temporary_ ;
if first.student then call missing(of october[*]);
do date=max(date_borrowed,'01oct2021'd) to min(date_returned,'31oct2021'd);
october[date]=sum(october[date],1);
end;
if last.student;
unique_days = n(of october[*]);
book_days = sum(0,of october[*]);
keep student unique_days book_days;
run;
Results:
Obs Student unique_days book_days
1 David 19 19
2 Fiona 27 58
3 Greg 26 34
4 Marcus 7 7
5 Priya 21 22
6 Sasha 29 43
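The temporary-array trick (one slot per day, incremented once per book out that day) maps naturally onto a counter keyed by date. Below is a minimal Python sketch for a single student's loans, assuming the caller has already grouped loans by student (the role BY-group processing plays in SAS); the function name `unique_and_book_days` is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def unique_and_book_days(loans, start, end):
    """Return (unique_days, book_days) for one student's loans in [start, end].

    Like the SAS array: one counter slot per day, incremented once for every
    book out on that day. Unique days = number of slots that were touched
    (like N()); book days = total of all slots (like SUM()).
    """
    per_day = Counter()
    for borrowed, returned in loans:
        day = max(borrowed, start)  # clip to the window, like MAX()/MIN() in SAS
        last = min(returned, end)
        while day <= last:
            per_day[day] += 1
            day += timedelta(days=1)
    return len(per_day), sum(per_day.values())


fiona = [
    (date(2021, 10, 5), date(2021, 10, 14)),
    (date(2021, 10, 5), date(2021, 10, 14)),
    (date(2021, 10, 2), date(2021, 10, 17)),
    (date(2021, 10, 20), date(2021, 10, 30)),
    (date(2021, 10, 20), date(2021, 10, 30)),
]
print(unique_and_book_days(fiona, date(2021, 10, 1), date(2021, 10, 31)))  # (27, 58)
```

The result matches Fiona's row in the SAS output above (27 unique days, 58 book days).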

Is there a way to rejig a dataframe to show a better time series dataset?

Hi, I have the following df:
Variable Total Month
Year
2011 110 01
2011 111 02
2011 112 03
2011 113 04
2011 114 05
2011 115 06
....
....
2021 302 04
2021 303 05
2021 304 06
Is it possible to rejig the dataset to this:
Jan Feb Mar Apr May .... Nov Dec
Year
2011 110 111 112 113 114
2012 ...
2013 ...
2014 ...
2015 ...
....
2020
2021
I would also like to remove the word "Variable" from the corner of the table.
My eventual goal is to do some simple data visualization using matplotlib to create line plots of the respective years (2011...2021)
Thank you in advance!
Use pivot()+reindex():
import pandas as pd
from calendar import month_abbr
df['Month']=pd.to_datetime(df['Month'],format='%m').dt.strftime('%b')
df=df.pivot(columns='Month',values='Total').rename_axis(columns=None)
df=df.reindex(columns=month_abbr[1:])
OR
via pivot()+pd.Categorical():
df['Month']=pd.to_datetime(df['Month'],format='%m').dt.strftime('%b')
df=df.pivot(columns='Month',values='Total').rename_axis(columns=None)
df.columns=pd.Categorical(df.columns,month_abbr[1:],ordered=True)
df=df.sort_index(axis=1)
Now if you print df, you will get your expected output.

Awk looping within condition

I need to create a condition which separates the data by decade. The first column is the year value (going back to year 0). How do I change the condition within the awk query?
0 Jan 10 2:04:40 Tot D
0 Jul 05 11:33:06 Tot A
3 May 04 22:22:05 Tot A
3 Oct 29 1:32:40 Tot D
7 Feb 20 23:03:27 Tot A
7 Aug 17 5:58:18 Tot D
10 Dec 10 6:28:52 Tot A
11 Jun 04 15:36:12 Tot D
14 Apr 04 4:41:23 Tot D
14 Sep 27 7:18:39 Tot A
18 Jan 20 10:38:27 Tot D
18 Jul 16 18:04:17 Tot A
21 May 15 5:47:44 Tot A
21 Nov 08 9:27:47 Tot D
22 May 04 23:00:32 Tot A
25 Mar 03 6:19:48 Tot A
25 Aug 27 13:47:51 Tot D
28 Dec 20 15:07:37 Tot A
29 Jun 14 22:37:10 Tot D
32 Apr 14 11:56:36 Tot D
32 Oct 07 15:38:15 Tot A
36 Jan 31 19:07:10 Tot D
36 Jul 27 0:39:47 Tot A
39 May 26 13:13:25 Tot A
39 Nov 19 17:26:37 Tot D
40 May 15 6:26:43 Tot A
I need to present the data as follows:
awk '{if ($1 >= 0 && $1 < 10) print }' All_Lunar_Eclipse.txt
0 Jan 10 2:04:40 Tot D
0 Jul 05 11:33:06 Tot A
3 May 04 22:22:05 Tot A
3 Oct 29 1:32:40 Tot D
7 Feb 20 23:03:27 Tot A
7 Aug 17 5:58:18 Tot D
But I would have to do it manually for every 10 years.
awk '{if ($1 >= 10 && $1 < 20) print }' All_Lunar_Eclipse.txt
10 Dec 10 6:28:52 Tot A
11 Jun 04 15:36:12 Tot D
14 Apr 04 4:41:23 Tot D
14 Sep 27 7:18:39 Tot A
18 Jan 20 10:38:27 Tot D
18 Jul 16 18:04:17 Tot A
I have tried something similar to the following with no joy.
awk 'BEGIN { for (i = 0; i <= 2019; +=10) print i }'
$ awk '
int(p/10)!=int($1/10) {
print "New decade begins:"
}
{ p=$1 }
1' file
0 Jan 10 2:04:40 Tot D
0 Jul 05 11:33:06 Tot A
3 May 04 22:22:05 Tot A
3 Oct 29 1:32:40 Tot D
7 Feb 20 23:03:27 Tot A
7 Aug 17 5:58:18 Tot D
New decade begins:
10 Dec 10 6:28:52 Tot A
11 Jun 04 15:36:12 Tot D
...
... on your definition of a decade (if ($1 >= 10 && $1 < 20)). I would have assumed that years 1-10 are the first decade, 11-20 the second, etc. I did not check, though. It would have made it one summation harder, too.
It depends on what you want, but you can use the first column as the key by dividing it by 10 and taking the integer part:
awk '
# separator process
{ Decade = int( $1 / 10 ) }
# apply sample (unsorted and just stored by decade)
{ Data[ Decade] = Data[Decade] "\n" $0 }
END { for ( Dec in Data ) printf "--- Decade: %d ----\n%s\n", Dec, Data[ Dec] }
' YourFile
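The bucketing rule in the awk answer, decade = int(year / 10), can be sketched in Python to show which lines land in which group. This is an illustrative sketch, not the answer's code: the function name `group_by_decade` is hypothetical, and it assumes non-negative years in the first whitespace-separated field.

```python
from collections import defaultdict


def group_by_decade(lines):
    """Group lines by decade index int(year / 10), as the awk answer does.

    Returns a dict mapping decade index (0 for years 0-9, 1 for 10-19, ...)
    to the lines of that decade, in input order.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        year = int(fields[0])
        groups[year // 10].append(line)  # same as int(year / 10) for year >= 0
    return dict(groups)


sample = [
    "7 Aug 17 5:58:18 Tot D",
    "10 Dec 10 6:28:52 Tot A",
    "18 Jul 16 18:04:17 Tot A",
    "21 May 15 5:47:44 Tot A",
]
for decade, rows in sorted(group_by_decade(sample).items()):
    print(f"--- Decade: {decade} ----")
    print("\n".join(rows))
```

One design difference is deliberate: awk's `for (Dec in Data)` visits array keys in an unspecified order, so the sketch sorts the decades before printing.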

Read serial input with awk, insert date

I'm trying to reformat serial input, which consists of two integers separated by a comma (sent from an Arduino):
1,2
3,4
0,0
0,1
etc. I would like to append the date after each line, separating everything with a tab character. Here's my code so far:
cat /dev/cu.usbmodem3d11 | awk 'BEGIN {RS=","};{printf "%i\t%i\t%s",$1,$2,system("date")}'
Here's the result I get (with date in my locale):
1 2 0Mer 26 fév 2014 22:09:20 EST
3 4 0Mer 26 fév 2014 22:09:20 EST
0 0 0Mer 26 fév 2014 22:09:20 EST
0 1 0Mer 26 fév 2014 22:09:20 EST
Why is there an extra '0' in front of my date field? Sorry for the newbie question :(
EDIT: This code solved my problem. Thanks to all who helped.
awk 'BEGIN {FS=","};{system("date")|getline myDate;printf "%i\t%i\t%s",$1, $2, myDate}' /dev/cu.usbmodem3d11
I'm not clear why, but in order for the date to keep updating and record the time at which the data was received, I have to use system("date") instead of just "date" in the code above.
Two things:
It will be easier to see your problem if you add a \n at the end of your printf string.
Then the output is:
>echo '1,2' | awk 'BEGIN {RS=","};{printf "%i\t%i\t%s\n",$1,$2,system("date")}'
Wed Feb 26 21:30:17 CST 2014
1 0 0
Wed Feb 26 21:30:17 CST 2014
2 0 0
What happens is that system("date") does not return the date as a string: the date command writes directly to standard output as soon as it runs, and system() returns the command's exit status (0), which is what %s prints.
To get the output you want, I'm using getline to capture the output of the date command into a variable (myDt). Now the output is:
> echo '1,2' | awk 'BEGIN {RS=","};{"date" | getline myDt ; printf "%i\t%i\t%s\n",$1,$2,myDt}'
1 0 Wed Feb 26 21:31:15 CST 2014
2 0 Wed Feb 26 21:31:15 CST 2014
Finally, we remove the "debugging" \n char, and get the output you specify:
> echo '1,2' | awk 'BEGIN {RS=","};{"date" | getline myDt ; printf "%i\t%i\t%s",$1,$2,myDt}'
1 0 Wed Feb 26 21:34:56 CST 2014
2 0 Wed Feb 26 21:34:56 CST 2014
And, per Jaypal's post, I see now that FS="," is another issue, so when we make that change AND restore the \n character, we have:
echo '1,2' | awk 'BEGIN {FS=","};{"date" | getline myDt ; printf "%i\t%i\t%s\n",$1,$2,myDt}'
1 2 Wed Feb 26 21:44:42 CST 2014
Two issues:
First - RS is the record separator. You need FS, the field separator, to split the two columns, so that $1 is 1 and $2 is 2 (as in your first row).
Second - The extra 0 you see in the output is the return value of system(), i.e. the command's exit status; 0 means it ran successfully. Instead, you can put the shell command in quotes and pipe it to getline; putting a variable after getline captures the command's output. Call close() on the command afterwards so that it runs again for the next line; otherwise later getline calls hit end-of-file and the variable keeps the first date.
Try this:
awk 'BEGIN {FS=","} {cmd = "date"; cmd | getline var; close(cmd); printf "%i\t%i\t%s\n", $1, $2, var}'
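The distinction this answer draws (running a command yields an exit status, while capturing its output is a separate step) is not specific to awk. Below is a Python analogue, swapped in purely for illustration: the helper names `run_for_status` and `run_for_output` are hypothetical, and the current Python interpreter printing a fixed string stands in for `date` so the example is portable.

```python
import subprocess
import sys


def run_for_status(cmd):
    """Run cmd and return only its exit status.

    Like awk's system(): the command's own output goes straight to our stdout.
    """
    return subprocess.run(cmd).returncode


def run_for_output(cmd):
    """Run cmd and return its standard output as a string.

    Like awk's `cmd | getline var` followed by close(cmd).
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.rstrip("\n")


# Portable stand-in for `date`: the current interpreter prints a fixed line.
child = [sys.executable, "-c", "print('Thu Feb 27 06:23:41 CET 2014')"]

status = run_for_status(child)  # the line is printed directly; status is 0
text = run_for_output(child)    # the line is captured instead of printed
print(f"1\t2\t{status}")        # reproduces the stray 0 from the question
print(f"1\t2\t{text}")
```

Printing `status` reproduces the stray 0 from the question, while printing `text` gives the intended tab-separated line.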
This is a simpler solution. Note that dt is evaluated once, when the command starts, so every line gets the same timestamp:
awk -F, '{print $1,$2,dt}' dt="$(date)" OFS="\t" /dev/cu.usbmodem3d11
1 2 Thu Feb 27 06:23:41 CET 2014
3 4 Thu Feb 27 06:23:41 CET 2014
0 0 Thu Feb 27 06:23:41 CET 2014
0 1 Thu Feb 27 06:23:41 CET 2014
If you want the date in another format, see the manual page for date.
For example, dt="$(date +%D)" gives 02/27/14.

Pulling the required rows from a single table by applying conditional statements on columns in SQL Server?

I have tried many ways to solve this, but unfortunately I got stuck every time.
Source table:
id year jan feb mar apr may jun jul aug sep oct nov dec
1234 2014 05 06 12 15 16 17 18 19 20 21 22 23
1234 2013 05 06 12 15 16 17 18 19 20 21 22 23
Task: Assume that we are currently in March 2014 and we need the previous 12 months of data (i.e., from Mar 2013 to Feb 2014); all remaining month values need to be zero, while id and year are kept.
Expected result:
id year jan feb mar apr may jun jul aug sep oct nov dec
1234 2014 05 06 0 0 0 0 0 0 0 0 0 0
1234 2013 0 0 12 15 16 17 18 19 20 21 22 23
This needs a code solution for SQL Server 2008. I would be very happy if anybody can solve this.
Note:
I got stuck trying to pull the column names dynamically.
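You can try this. A month column is kept only when that month lies 1 to 12 months before the current month; DATEADD(year, [year] - 1900, 0) builds 1 January of the row's year, and source_table stands in for your table name. To reproduce the March 2014 example, replace getdate() with '20140301'.
select id, [year],
  case when datediff(month, dateadd(month, 0, dateadd(year, [year] - 1900, 0)), getdate()) between 1 and 12 then jan else 0 end as jan,
  -- feb through nov: same pattern with month offsets 1 through 10
  case when datediff(month, dateadd(month, 11, dateadd(year, [year] - 1900, 0)), getdate()) between 1 and 12 then [dec] else 0 end as [dec]
from source_table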
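The window rule can be checked independently of SQL Server with a small Python sketch that numbers months as year * 12 + (month - 1) and keeps a value only when it is 1 to 12 months before the reference month. This is an illustration only; the names `MONTHS` and `keep_last_12_months` and the dict-per-row layout are assumptions, not part of the question or the answer.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def keep_last_12_months(row, ref_year, ref_month):
    """Zero every month outside the 12 months before (ref_year, ref_month).

    row: dict with 'id', 'year' and one key per month name.
    For March 2014 the kept window is Mar 2013 through Feb 2014.
    """
    ref_index = ref_year * 12 + (ref_month - 1)  # months counted from year 0
    out = {"id": row["id"], "year": row["year"]}
    for offset, name in enumerate(MONTHS):
        months_back = ref_index - (row["year"] * 12 + offset)
        out[name] = row[name] if 1 <= months_back <= 12 else 0
    return out


values = [5, 6, 12, 15, 16, 17, 18, 19, 20, 21, 22, 23]
row_2014 = {"id": 1234, "year": 2014, **dict(zip(MONTHS, values))}
row_2013 = {**row_2014, "year": 2013}
for row in (row_2014, row_2013):
    result = keep_last_12_months(row, 2014, 3)
    print(result["id"], result["year"], *(result[m] for m in MONTHS))
```

Run against the two sample rows, this prints the expected result from the question: 2014 keeps only jan and feb, and 2013 keeps mar through dec.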