How to choose rows with time-stamps in a complicated range? - pandas

My data frame has a time stamp column (dtype: datetime64[ns]) like this
ID TIMESTAMP
1 2014-08-14 17:57:17
2 2014-08-14 17:50:11
3 2014-08-14 17:49:28
4 2014-08-14 17:58:10
5 2014-08-14 17:59:37
6 2014-08-14 17:25:46
7 2014-08-14 17:54:06
8 2014-08-14 17:55:48
9 2014-08-14 17:49:23
10 2014-08-14 17:40:21
...
301 2014-12-21 14:11:52
302 2014-12-21 14:22:22
303 2014-12-21 14:29:19
304 2014-12-21 14:27:37
305 2014-12-21 14:22:33
306 2014-12-21 14:26:25
307 2014-12-21 14:11:13
308 2014-12-21 11:41:54
309 2014-12-21 13:18:44
310 2014-12-21 14:26:31
Now suppose I want to find the rows from 2014-08-04 to 2014-08-24 that also fall between 17:55 and 18:00 on each day in that period. How can I do this with pandas? I think I should use Timedelta, but I can't find any Timedelta functionality that works on the time of day only. Thank you.

Ugly but should work for you:
In [113]:
import datetime as dt
df[(df['TIMESTAMP'] > '2014-08-04') & (df['TIMESTAMP'] < '2014-08-24') & (df['TIMESTAMP'].dt.time > dt.time(17,55)) & (df['TIMESTAMP'].dt.time < dt.time(18))]
Out[113]:
ID TIMESTAMP
0 1 2014-08-14 17:57:17
3 4 2014-08-14 17:58:10
4 5 2014-08-14 17:59:37
7 8 2014-08-14 17:55:48
So date strings work for comparing the date part, but for the time-of-day part you have to construct a datetime.time object.
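As a possible alternative, the same selection can be sketched with DataFrame.between_time, which filters on the time of day once the timestamps are the index. The sample frame and the names sample, in_dates and selected are made up for illustration. Note that between_time includes both endpoints by default, so a row at exactly 18:00:00 would be kept here, unlike with the strict comparison above. The upper date bound is written as "< 2014-08-25" on the assumption that all of 2014-08-24 should be included.

```python
import pandas as pd

# Small illustrative sample, taken from the rows shown in the question.
sample = pd.DataFrame({
    "ID": [1, 2, 4, 6, 8, 308],
    "TIMESTAMP": pd.to_datetime([
        "2014-08-14 17:57:17", "2014-08-14 17:50:11", "2014-08-14 17:58:10",
        "2014-08-14 17:25:46", "2014-08-14 17:55:48", "2014-12-21 11:41:54",
    ]),
})

ts = sample["TIMESTAMP"]
# Date part: from 2014-08-04 up to and including all of 2014-08-24.
in_dates = (ts >= "2014-08-04") & (ts < "2014-08-25")

# Time-of-day part: between_time needs a DatetimeIndex.
selected = (
    sample[in_dates]
    .set_index("TIMESTAMP")
    .between_time("17:55", "18:00")
    .reset_index()
)
print(selected)
```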

Related

Convert decimal Day-of-year dataframe to datetime with HH:MM

Is it possible to convert an entire column of decimal day-of-year (DOY) values into datetime format YYYY-mm-dd HH:MM? I tried counting the number of seconds and minutes in a day, but decimal DOY is different from decimal hours.
Example:
DOY = 181.82015046296297
Converted to:
Timestamp('2021-06-05 14:00:00')
Here the date would be a datetime object appearing only as 2021-06-05 14:00:00 in my dataframe. And the year I am interested in is 2021.
Use Timedelta to create an offset from the first day of the year:
Input data:
>>> df
DayOfYear
0 254
1 156
2 303
3 32
4 100
5 8
6 329
7 82
8 218
9 293
df['Date'] = pd.to_datetime('2021-01-01') + pd.to_timedelta(df['DayOfYear'] - 1, unit='D')
Output result:
>>> df
DayOfYear Date
0 254 2021-09-11
1 156 2021-06-05
2 303 2021-10-30
3 32 2021-02-01
4 100 2021-04-10
5 8 2021-01-08
6 329 2021-11-25
7 82 2021-03-23
8 218 2021-08-06
9 293 2021-10-20
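The example above uses whole days. For a decimal day-of-year, as in the question, a similar sketch could rely on pd.to_timedelta, which accepts fractional days. This assumes that DOY 1.0 means 2021-01-01 00:00; the frame name doy, the column name DOY and the rounding to whole seconds are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Hypothetical decimal day-of-year values; 1.0 is taken to mean Jan 1, 00:00.
doy = pd.DataFrame({"DOY": [1.0, 32.5, 181.82015046296297]})

start_of_year = pd.Timestamp("2021-01-01")

# Subtract 1 so that DOY 1.0 maps to an offset of zero days.
offset = pd.to_timedelta(doy["DOY"] - 1, unit="D")

# Round to whole seconds to hide floating-point noise in the fraction.
doy["Date"] = (start_of_year + offset).dt.round("s")
print(doy)
```

If only HH:MM is wanted for display, doy["Date"].dt.strftime("%Y-%m-%d %H:%M") would give strings in that format.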

get the records before and after the nearest merge by 30 minutes in python

I have two data frames in CSV files. The first describes traffic incidents (df1) and the second holds traffic records at 15-minute intervals (df2). I want to merge them based on the closest time. I used pandas.merge_asof and got the nearest match, but I also want the traffic records from 30 minutes before and after the match. And I want each incident joined to the closest traffic record time: if an incident occurred at 14:02:00, it should be merged with the traffic record from 14:00:00.
For example:
1- Incidents data
Date detector_id Incident_type
09/30/2015 8:00:00 1 crash
09/30/2015 8:02:00 1 congestion
04/22/2014 15:30:00 9 congestion
04/22/2014 15:33:00 9 Emergency vehicle
2 - Traffic data
Date detector_id traffic_volume
09/30/2015 7:30:00 1 55
09/30/2015 7:45:00 1 45
09/30/2015 8:00:00 1 60
09/30/2015 8:15:00 1 200
09/30/2015 8:30:00 1 70
04/22/2014 15:00:00 9 15
04/22/2014 15:15:00 9 7
04/22/2014 15:30:00 9 50
04/22/2014 15:45:00 9 11
04/22/2014 16:00:00 9 7
3- The desired table
Date detector_id traffic_volume Incident_type
09/30/2015 7:30:00 1 55 NA
09/30/2015 7:45:00 1 45 NA
09/30/2015 8:00:00 1 60 Crash
09/30/2015 8:00:00 1 60 congestion
09/30/2015 8:15:00 1 200 NA
09/30/2015 8:30:00 1 70 NA
04/22/2014 15:00:00 9 15 NA
04/22/2014 15:15:00 9 7 NA
04/22/2014 15:30:00 9 50 Congestion
04/22/2014 15:30:00 9 50 Emergency vehicle
04/22/2014 15:45:00 9 11 NA
04/22/2014 16:00:00 9 7 NA
The code that I used is as follows:
Merge = pd.merge_asof(df2, df1, left_index = True, right_index = True, allow_exact_maches = False,
on='Date', by='detector_id', direction='nearest')
but it gave me this table.
Date detector_id traffic_volume Incident_type
09/30/2015 8:00:00 1 60 Crash
04/22/2014 15:30:00 9 50 Congestion
I want to know the situation before and after the incidents occur.
Any ideas?
Thank you.
*If I made a mistake by asking this way, please let me know.
For anyone who has the same problem and wants to merge with pandas.merge_asof: you have to use the tolerance parameter, which limits the allowed time difference between matched rows of the two datasets.
You may run into two problems, one related to Timedelta and one to sorting. For the Timedelta problem, convert the Date columns to datetime as follows:
df1.Date = pd.to_datetime(df1.Date)
df2.Date = pd.to_datetime(df2.Date)
For the sorting problem, sort both frames in the merge call as follows:
x = pd.merge_asof(df1.sort_values('Date'),  # sort_values fixes the error "left keys must be sorted"
df2.sort_values('Date'),
on='Date',
by='detector_id',
direction='backward',
tolerance=pd.Timedelta('45 min'))
Note that merge_asof keeps at most one match per row of the left frame. The direction can be 'nearest', which in my case matches each record to the closest record on either side, within 45 minutes.
With 'backward', each record is matched to the closest record at or before its timestamp, within 45 minutes,
and with 'forward' it is matched to the closest record at or after its timestamp, within 45 minutes.
Thank you, and hopefully this will help someone in the future.
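Since merge_asof returns a single match per row, the table asked for in the question (every traffic record within 30 minutes of a matched record, with the incident type attached where there is one) needs an extra step. A minimal sketch of one way to do it is below, assuming column names as in the question. The frames incidents and traffic, the helper column traffic_time, the 15-minute tolerance and the extra 09:00 row are illustrative assumptions.

```python
import pandas as pd

incidents = pd.DataFrame({
    "Date": pd.to_datetime(["2015-09-30 08:00:00", "2015-09-30 08:02:00"]),
    "detector_id": [1, 1],
    "Incident_type": ["crash", "congestion"],
})
traffic = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-09-30 07:30:00", "2015-09-30 07:45:00", "2015-09-30 08:00:00",
        "2015-09-30 08:15:00", "2015-09-30 08:30:00", "2015-09-30 09:00:00",
    ]),
    "detector_id": [1, 1, 1, 1, 1, 1],
    "traffic_volume": [55, 45, 60, 200, 70, 30],
})

# Step 1: snap each incident to its nearest traffic timestamp.
traffic_keys = traffic[["Date", "detector_id"]].assign(traffic_time=traffic["Date"])
matched = pd.merge_asof(
    incidents.sort_values("Date"),
    traffic_keys.sort_values("Date"),
    on="Date", by="detector_id",
    direction="nearest", tolerance=pd.Timedelta("15min"),
)
labels = (
    matched[["traffic_time", "detector_id", "Incident_type"]]
    .rename(columns={"traffic_time": "Date"})
)

# Step 2: keep traffic rows within 30 minutes of any matched timestamp.
anchors = (
    labels[["Date", "detector_id"]]
    .drop_duplicates()
    .rename(columns={"Date": "incident_time"})
)
pairs = traffic.merge(anchors, on="detector_id")
near = pairs[(pairs["Date"] - pairs["incident_time"]).abs() <= pd.Timedelta("30min")]
keep = near[["Date", "detector_id"]].drop_duplicates()

# Step 3: attach incident types; rows without an incident get NaN.
result = (
    traffic.merge(keep, on=["Date", "detector_id"])
    .merge(labels, on=["Date", "detector_id"], how="left")
)
print(result)
```

The per-detector join in step 2 compares every traffic row with every matched timestamp for the same detector, so on large data a different windowing strategy may be needed.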

Future dates calculating incorrectly in FBProphet - make_future_dataframe method

I'm trying to do a weekly forecast in FBProphet for just 5 weeks ahead. The make_future_dataframe method doesn't seem to be working right: it produces correct one-week intervals except for the one between Jul 3 and Jul 5. Every other interval is correct at 7 days. Code and output below:
INPUT DATAFRAME
ds y
548 2010-01-01 3117
547 2010-01-08 2850
546 2010-01-15 2607
545 2010-01-22 2521
544 2010-01-29 2406
... ... ...
4 2020-06-05 2807
3 2020-06-12 2892
2 2020-06-19 3012
1 2020-06-26 3077
0 2020-07-03 3133
CODE
future = m.make_future_dataframe(periods=5, freq='W')
future.tail(9)
OUTPUT
ds
545 2020-06-12
546 2020-06-19
547 2020-06-26
548 2020-07-03
549 2020-07-05
550 2020-07-12
551 2020-07-19
552 2020-07-26
553 2020-08-02
All you need to do is create a dataframe with the dates you need for the predict method. Using the make_future_dataframe method is not necessary.
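One way to build such a frame by hand is sketched below with pandas only. It rests on the fact that the pandas frequency alias 'W' means weeks anchored on Sunday ('W-SUN'), while the history in the question falls on Fridays, which would explain the odd 2020-07-05 row. Using 'W-FRI' instead is a suggestion of this sketch, and last_date, sunday_weeks and future are illustrative names.

```python
import pandas as pd

last_date = pd.Timestamp("2020-07-03")  # last 'ds' value in the history (a Friday)

# 'W' is shorthand for 'W-SUN', so the first generated date is the next Sunday.
sunday_weeks = pd.date_range(start=last_date, periods=6, freq="W")
print(sunday_weeks[0])

# Anchoring on Friday keeps the 7-day spacing relative to the history.
friday_weeks = pd.date_range(start=last_date, periods=6, freq="W-FRI")

# Drop the last historical date itself and keep the next 5 weeks.
future = pd.DataFrame({"ds": friday_weeks[friday_weeks > last_date][:5]})
print(future)
```

A frame like future, with a ds column, is the kind of input that m.predict expects. If the historical rows are wanted in the output as well, they would have to be concatenated in first.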

Subtract day column from date column in pandas data frame

I have two columns in my data frame. One column is a date (df["Start_date"]) and the other is a number of days (df["days"]). I want to subtract the number of days from the date column.
I was trying something like this
df["new_date"]=df["Start_date"]-datetime.timedelta(days=df["days"])
I think you need to_timedelta:
df["new_date"]=df["Start_date"]-pd.to_timedelta(df["days"], unit='D')
Sample:
import numpy as np
import pandas as pd
np.random.seed(120)
start = pd.to_datetime('2015-02-24')
rng = pd.date_range(start, periods=10)
df = pd.DataFrame({'Start_date': rng, 'days': np.random.choice(np.arange(10), size=10)})
print (df)
Start_date days
0 2015-02-24 7
1 2015-02-25 0
2 2015-02-26 8
3 2015-02-27 4
4 2015-02-28 1
5 2015-03-01 7
6 2015-03-02 1
7 2015-03-03 3
8 2015-03-04 8
9 2015-03-05 9
df["new_date"]=df["Start_date"]-pd.to_timedelta(df["days"], unit='D')
print (df)
Start_date days new_date
0 2015-02-24 7 2015-02-17
1 2015-02-25 0 2015-02-25
2 2015-02-26 8 2015-02-18
3 2015-02-27 4 2015-02-23
4 2015-02-28 1 2015-02-27
5 2015-03-01 7 2015-02-22
6 2015-03-02 1 2015-03-01
7 2015-03-03 3 2015-02-28
8 2015-03-04 8 2015-02-24
9 2015-03-05 9 2015-02-24

How to create a multiple variable in SQL?

I am trying to calculate a sort of moving average for my data in SQL Server 2008, but the only way I have found is by using a @variable. For example, I have this set of data:
StudyDate Cpty Value
---------- ---- ----------------------
2015-11-24 1 3009
2015-11-24 2 2114
2015-11-24 3 558
2015-11-24 4 121
2015-11-24 5 2515
2015-11-24 6 81
2015-11-24 7 80
2015-11-24 8 1534
2015-11-24 9 136
2015-11-24 10 5674
2015-11-25 1 2731
2015-11-25 2 2197
2015-11-25 3 550
2015-11-25 4 124
2015-11-25 5 2532
2015-11-25 6 81
2015-11-25 7 80
2015-11-25 8 1700
2015-11-25 9 122
2015-11-25 10 5788
2015-11-26 1 2666
2015-11-26 2 2175
2015-11-26 3 408
2015-11-26 4 124
2015-11-26 5 2545
2015-11-26 6 81
2015-11-26 7 81
2015-11-26 8 1712
2015-11-26 9 122
2015-11-26 10 5967
And I want to find a moving average for every day. If I run this query:
DECLARE @StudyDate DATE = '2015-11-26'
SELECT @StudyDate,
Cpty,
AVG(Value)
FROM #MovAvg
WHERE StudyDate > DATEADD(m,-1,@StudyDate) AND StudyDate <= @StudyDate
GROUP BY Cpty
ORDER BY Cpty
Then I get the average for only one day, '2015-11-26'. Can I get an average for every day and every Cpty?
Thank you in advance!
In SQL Server 2008, you would do this using outer apply. I'm not sure what you mean exactly by "moving average", but it appears to be the average for the previous month.
So:
select t.*, tavg.value
from t outer apply
(select avg(t2.value) as value
from t t2
where t2.cpty = t.cpty and
t2.studydate > DATEADD(month, -1, t.StudyDate) and
t2.StudyDate <= t.StudyDate
) tavg;