I am trying to make a graph in BI, which requires grouping the time variable in my database into 10-minute buckets, i.e. 11:04:00, 11:08:30, 11:00:28 are grouped as 11-1, then 11-2, ..., 11-6, 12-1 and so on.
06:00:00 06-1
06:03:00
06:06:00
06:09:00
06:12:00 06-2
06:15:00
06:18:00
06:21:00 06-2
06:24:00
06:27:00
06:30:00 06-3
06:33:00
06:36:00
06:39:00
06:42:00 06-4
06:45:00
06:48:00
06:51:00 06-5
06:54:00
06:57:00
07:00:00 07-1
07:03:00
07:06:00
07:09:00
07:12:00 07-2
Is there any way I can do this in BI?
Thank you for helping.
This will give you 10-minute block groupings, starting with 0:
=FormatDate([Date];"HH")+"-"+Floor(ToNumber(FormatDate([Date];"mm"))/10)
I have two pandas DataFrames, as follows:
ts1
Out[50]:
soil_moisture_ids41
date_time
2007-01-07 05:00:00 0.1830
2007-01-07 06:00:00 0.1825
2007-01-07 07:00:00 0.1825
2007-01-07 08:00:00 0.1825
2007-01-07 09:00:00 0.1825
... ...
2017-10-10 20:00:00 0.0650
2017-10-10 21:00:00 0.0650
2017-10-10 22:00:00 0.0650
2017-10-10 23:00:00 0.0650
2017-10-11 00:00:00 0.0650
[94316 rows x 3 columns]
and the other one is
ts2
Out[51]:
soil_moisture_ids42
date_time
2016-07-20 00:00:00 0.147
2016-07-20 01:00:00 0.148
2016-07-20 02:00:00 0.149
2016-07-20 03:00:00 0.150
2016-07-20 04:00:00 0.152
... ...
2019-12-31 19:00:00 0.216
2019-12-31 20:00:00 0.216
2019-12-31 21:00:00 0.215
2019-12-31 22:00:00 0.215
2019-12-31 23:00:00 0.215
[30240 rows x 3 columns]
You can see that from 2007-01-07 to 2016-07-19 only ts1 has data points, and from 2016-07-20 to 2017-10-11 the two time series overlap. Now I want to combine these two data frames. During the overlapped period, I want the mean of ts1 and ts2. During the non-overlapped periods (2007-01-07 to 2016-07-19 and 2017-10-12 to 2019-12-31), the value at each time stamp is taken from whichever of ts1 or ts2 has data. So how can I do it?
Thanks!
Use concat with an aggregate mean: if there is only one value for a time stamp you get that same value back, and if there are multiple you get their mean. The resulting DatetimeIndex is also sorted:
s = pd.concat([ts1, ts2]).groupby(level=0).mean()
Just store the concatenated series first and then apply the mean, i.e. merged_ts = pd.concat([ts1, ts2]) and then mean_ts = merged_ts.groupby(level=0).mean()
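A tiny self-contained example of the idea, with toy hourly data; note it assumes both frames use the same column name, so the frames from the question would first need to be renamed to a common name:

import pandas as pd

# Two frames with a two-hour overlap at 00:00 and 01:00 on 2016-07-20.
idx1 = pd.date_range('2016-07-19 22:00', periods=4, freq='H')
idx2 = pd.date_range('2016-07-20 00:00', periods=4, freq='H')
ts1 = pd.DataFrame({'soil_moisture': [0.18, 0.17, 0.16, 0.15]}, index=idx1)
ts2 = pd.DataFrame({'soil_moisture': [0.14, 0.13, 0.12, 0.11]}, index=idx2)

s = pd.concat([ts1, ts2]).groupby(level=0).mean()
print(s)
# Non-overlapping stamps keep their single value; the overlapping ones
# (00:00 and 01:00) become the mean of the two frames.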
I have a time series that is very irregular. The time difference between two records can be 1s or 10 days.
I want to resample the data every 1h, but only when the sequential records are less than 1h apart.
How do I approach this without making too many loops?
In the example below, I would like to resample only rows 5-6 (delta difference is 10s) and rows 6-7 (delta difference is 50min).
The others should remain as they are.
tmp=vals[['datumtijd','filter data']]
datumtijd filter data
0 1970-11-01 00:00:00 129.0
1 1970-12-01 00:00:00 143.0
2 1971-01-05 00:00:00 151.0
3 1971-02-01 00:00:00 151.0
4 1971-03-01 00:00:00 163.0
5 1971-03-01 00:00:10 163.0
6 1971-03-01 00:00:20 163.0
7 1971-03-01 00:01:10 163.0
8 1971-03-01 00:04:10 163.0
.. ... ...
244 1981-08-19 00:00:00 102.0
245 1981-09-02 00:00:00 98.0
246 1981-09-17 00:00:00 92.0
247 1981-10-01 00:00:00 89.0
248 1981-10-19 00:00:00 92.0
You can be a little explicit about this by using groupby on the hour-floor of the time stamps:
grouped = df.groupby(df['datumtijd'].dt.floor('1H')).mean()
This is explicitly looking for the hour of each existing data point and grouping the matching ones.
But you can also just do the resample and then filter out the empty data, as pandas can still do this pretty quickly:
resampled = df.resample('1H', on='datumtijd').mean().dropna()
In either case, you get the following (note that I changed the last time stamp just so that the console would show the hours):
filter data
datumtijd
1970-11-01 00:00:00 129.0
1970-12-01 00:00:00 143.0
1971-01-05 00:00:00 151.0
1971-02-01 00:00:00 151.0
1971-03-01 00:00:00 163.0
1981-08-19 00:00:00 102.0
1981-09-02 00:00:00 98.0
1981-09-17 00:00:00 92.0
1981-10-01 00:00:00 89.0
1981-10-19 03:00:00 92.0
One quick clarification: in your example, rows 5-8 all occur within the same hour (they differ only in minutes and seconds), so they all get grouped together!
Also, see this related post.
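For reference, a small self-contained sketch of both approaches, using a handful of rows shaped like the question's data (the 'datumtijd' and 'filter data' column names are taken from the question):

import pandas as pd

df = pd.DataFrame({
    'datumtijd': pd.to_datetime(['1971-03-01 00:00:00', '1971-03-01 00:00:10',
                                 '1971-03-01 00:00:20', '1971-03-01 00:01:10',
                                 '1981-10-19 03:00:00']),
    'filter data': [163.0, 163.0, 163.0, 163.0, 92.0],
})

# Approach 1: group on the hour floor of each timestamp.
grouped = df.groupby(df['datumtijd'].dt.floor('1H'))['filter data'].mean()

# Approach 2: resample to hourly bins and drop the empty ones.
resampled = df.resample('1H', on='datumtijd')['filter data'].mean().dropna()

print(grouped)
print(resampled)
# Both collapse the four 1971-03-01 00:0x rows into a single hourly value.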
I have a df that looks similar to this (shortened version, with fewer rows):
Time (EDT) Open High Low Close
0 02.01.2006 19:00:00 0.85224 0.85498 0.85224 0.85498
1 02.01.2006 20:00:00 0.85498 0.85577 0.85423 0.85481
2 02.01.2006 21:00:00 0.85481 0.85646 0.85434 0.85646
3 02.01.2006 22:00:00 0.85646 0.85705 0.85623 0.85651
4 02.01.2006 23:00:00 0.85643 0.85691 0.85505 0.85653
5 03.01.2006 00:00:00 0.85653 0.8569 0.85601 0.85626
6 03.01.2006 01:00:00 0.85626 0.85653 0.85524 0.8557
7 03.01.2006 02:00:00 0.85558 0.85597 0.85486 0.85597
8 03.01.2006 03:00:00 0.85597 0.85616 0.85397 0.8548
9 03.01.2006 04:00:00 0.85469 0.85495 0.8529 0.85328
10 03.01.2006 05:00:00 0.85316 0.85429 0.85222 0.85401
11 03.01.2006 06:00:00 0.85401 0.8552 0.853 0.8552
12 03.01.2006 07:00:00 0.8552 0.8555 0.85319 0.85463
13 03.01.2006 08:00:00 0.85477 0.85834 0.8545 0.85788
14 03.01.2006 09:00:00 0.85788 0.85838 0.85341 0.85416
15 03.01.2006 10:00:00 0.8542 0.8542 0.85006 0.85111
16 03.01.2006 11:00:00 0.85115 0.85411 0.85 0.85345
17 03.01.2006 12:00:00 0.85337 0.85432 0.8526 0.85413
18 03.01.2006 13:00:00 0.85413 0.85521 0.85363 0.85363
19 03.01.2006 14:00:00 0.85325 0.8561 0.85305 0.85606
20 03.01.2006 15:00:00 0.8561 0.85675 0.85578 0.85599
I need to convert the date string to datetime, then set the date column as the index, and resample. When I use method 1, I can't resample properly: the way the data resamples is wrong, and it creates extra future dates. Say my last date is 2018-11; then I will see something like 2018-12.
method 1:
df['Time (EDT)'] = pd.to_datetime(df['Time (EDT)']) <---- this also takes long, because there are 90,000 rows
df.set_index('Time (EDT)', inplace=True)
ohlc_dict = {'Open':'first', 'High':'max', 'Low':'min', 'Close':'last'}
df = df.resample('4H', base=17, closed='left', label='left').agg(ohlc_dict)
result:
Time (EDT) Open High Low Close
1/1/2006 21:00 0.86332 0.86332 0.86268 0.86321
1/2/2006 1:00 0.86321 0.86438 0.86111 0.86164
1/2/2006 5:00 0.86164 0.86222 0.8585 0.86134
1/2/2006 9:00 0.86149 0.86297 0.85695 0.85793
1/2/2006 13:00 0.85801 0.85947 0.85759 0.8591
1/2/2006 17:00 0.8591 0.86034 0.85757 0.85825
1/2/2006 21:00 0.85825 0.85969 0.84377 0.84412
1/3/2006 1:00 0.84445 0.8468 0.84286 0.84642
1/3/2006 5:00 0.84659 0.8488 0.84494 0.84872
1/3/2006 9:00 0.84829 0.84915 0.84271 0.84416
1/3/2006 13:00 0.84372 0.8453 0.84346 0.84423
1/3/2006 17:00 0.84426 0.84693 0.84426 0.84516
1/3/2006 21:00 0.84523 0.8458 0.84442 0.84579
When I use method 2, it resamples properly.
method 2:
from datetime import datetime

def to_datetime_obj(date_string):
    datetime_obj = datetime.strptime(date_string[:], '%d.%m.%Y %H:%M:%S')
    return datetime_obj
datetime_objs = None
date_list = df['Time (EDT)'].tolist()
datetime_objs=list(map(to_datetime_obj, date_list)) <--- this is faster also
df.iloc[:,:1] = datetime_objs
df.set_index('Time (EDT)', inplace=True)
ohlc_dict = {'Open':'first', 'High':'max', 'Low':'min', 'Close':'last'}
df = df.resample('4H', base=17, closed='left', label='left').agg(ohlc_dict)
result:
Time (EDT) Open High Low Close
1/2/2006 17:00 0.85224 0.85577 0.85224 0.85481
1/2/2006 21:00 0.85481 0.85705 0.85434 0.85626
1/3/2006 1:00 0.85626 0.85653 0.8529 0.85328
1/3/2006 5:00 0.85316 0.85834 0.85222 0.85788
1/3/2006 9:00 0.85788 0.85838 0.85 0.85413
1/3/2006 13:00 0.85413 0.85675 0.85305 0.85525
1/3/2006 17:00 0.85525 0.85842 0.85502 0.85783
1/3/2006 21:00 0.85783 0.85898 0.85736 0.85774
1/4/2006 1:00 0.85774 0.85825 0.8558 0.85595
1/4/2006 5:00 0.85595 0.85867 0.85577 0.85839
1/4/2006 9:00 0.85847 0.85981 0.85586 0.8578
1/4/2006 13:00 0.85773 0.85886 0.85597 0.85653
1/4/2006 17:00 0.85653 0.85892 0.85642 0.8584
1/4/2006 21:00 0.8584 0.85863 0.85658 0.85715
1/5/2006 1:00 0.85715 0.8588 0.85641 0.85791
1/5/2006 5:00 0.85803 0.86169 0.85673 0.86065
The df.index of methods 1 and 2 look the same before resampling.
They are both pandas.core.indexes.datetimes.DatetimeIndex.
But when I compare them (method1_df.index != method2_df.index), they are actually different.
Why is that? How do I fix it? Thanks.
It's surprising that a vectorized method (pd.to_datetime), written in Cython, is slower than a pure Python method (datetime.strptime).
You can specify the format to pd.to_datetime, which speeds it up a lot:
pd.to_datetime(df['Time (EDT)'], format='%d.%m.%Y %H:%M:%S')
For your second problem, I think it may have something to do with the order of day and month in your string data. Have you verified that the two methods actually give you the same datetimes?
s1 = pd.to_datetime(df['Time (EDT)'])
s2 = pd.Series(map(to_datetime_obj, date_list))
(s1 == s2).all()
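Putting the suggestions together, here is a minimal end-to-end sketch using a few rows from the question. Note that recent pandas versions replaced resample's base= argument with offset=; offset='17h' anchors the 4-hour bins at 17:00 the same way base=17 did:

import pandas as pd

df = pd.DataFrame({
    'Time (EDT)': ['02.01.2006 19:00:00', '02.01.2006 20:00:00',
                   '02.01.2006 21:00:00', '02.01.2006 22:00:00'],
    'Open':  [0.85224, 0.85498, 0.85481, 0.85646],
    'High':  [0.85498, 0.85577, 0.85646, 0.85705],
    'Low':   [0.85224, 0.85423, 0.85434, 0.85623],
    'Close': [0.85498, 0.85481, 0.85646, 0.85651],
})

# Passing the exact format is fast and removes any ambiguity about
# day-first dates such as 02.01.2006 (2 January 2006).
df['Time (EDT)'] = pd.to_datetime(df['Time (EDT)'], format='%d.%m.%Y %H:%M:%S')
df = df.set_index('Time (EDT)')

ohlc_dict = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}
out = df.resample('4H', offset='17h', closed='left', label='left').agg(ohlc_dict)
print(out)   # two bins: 17:00 (rows 19:00-20:00) and 21:00 (rows 21:00-22:00)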
For me, datetime.strptime was 3 times faster than pd.to_datetime for 2 operations per row on an 880,000+ row DataFrame.
I have two data frames in CSV files. The first describes traffic incidents (df1) and the second has the traffic records for each 15 minutes (df2). I want to merge them based on the closest time. I used pandas merge_asof and got the nearest match, but I also want the records from 30 minutes before and after the match from the traffic data. And I want to join the closest incidents to the traffic data time: if an incident occurred at 14:02:00, it should be merged with the traffic data recorded at 14:00:00.
For example:
1- Incidents data
Date detector_id Inident_type
09/30/2015 8:00:00 1 crash
09/30/2015 8:02:00 1 congestion
04/22/2014 15:30:00 9 congestion
04/22/2014 15:33:00 9 Emergency vehicle
2 - Traffic data
Date detector_id traffic_volume
09/30/2015 7:30:00 1 55
09/30/2015 7:45:00 1 45
09/30/2015 8:00:00 1 60
09/30/2015 8:15:00 1 200
09/30/2015 8:30:00 1 70
04/22/2014 15:00:00 9 15
04/22/2014 15:15:00 9 7
04/22/2014 15:30:00 9 50
04/22/2014 15:45:00 9 11
04/22/2014 16:00:00 9 7
3 - The desired table
Date detector_id traffic_volume Incident_type
09/30/2015 7:30:00 1 55 NA
09/30/2015 7:45:00 1 45 NA
09/30/2015 8:00:00 1 60 Crash
09/30/2015 8:00:00 1 60 congestion
09/30/2015 8:15:00 1 200 NA
09/30/2015 8:30:00 1 70 NA
04/22/2014 15:00:00 9 15 NA
04/22/2014 15:15:00 9 7 NA
04/22/2014 15:30:00 9 50 Congestion
04/22/2014 15:30:00 9 50 Emergency vehicle
04/22/2014 15:45:00 9 11 NA
04/22/2014 16:00:00 9 7 NA
The code that I used as follow
Merge = pd.merge_asof(df2, df1, on='Date', by='detector_id',
                      allow_exact_matches=False, direction='nearest')
but it gave me this table.
Date detector_id traffic_volume Incident_type
09/30/2015 8:00:00 1 60 Crash
04/22/2014 15:30:00 9 50 Congestion
and I want to know the situation before and after the incidents occur.
Any idea?
Thank you.
*If I made a mistake by asking this way, please let me know.
For anyone who has the same problem and wants to do the merge using pandas.merge_asof: you have to use the tolerance parameter. It lets you set the allowed time difference between the two datasets.
But you may face two problems, one related to the Timedelta and one related to the sort order of the keys. The solution to the Timedelta problem is converting the time columns to datetime, as follows:
df1.Date = pd.to_datetime(df1.Date)
df2.Date = pd.to_datetime(df2.Date)
and for the sorting problem you need to sort the keys in your main code, as follows:
x = pd.merge_asof(df1.sort_values('Date'),  # sort_values fixes the "left keys must be sorted" error
                  df2.sort_values('Date'),
                  on='Date',
                  by='detector_id',
                  direction='backward',
                  tolerance=pd.Timedelta('45 min'))
With direction='nearest', each incident is matched to the closest traffic record within 45 minutes, whether it comes before or after the incident.
With direction='backward', each incident is matched to the last traffic record at or before it, within 45 minutes.
With direction='forward', each incident is matched to the first traffic record at or after it, within 45 minutes.
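A minimal runnable sketch of this approach, using a few rows from the sample tables in the question (only detector 1 is included, to keep it short):

import pandas as pd

incidents = pd.DataFrame({
    'Date': pd.to_datetime(['09/30/2015 08:00:00', '09/30/2015 08:02:00']),
    'detector_id': [1, 1],
    'Incident_type': ['crash', 'congestion'],
})
traffic = pd.DataFrame({
    'Date': pd.to_datetime(['09/30/2015 07:30:00', '09/30/2015 07:45:00',
                            '09/30/2015 08:00:00', '09/30/2015 08:15:00',
                            '09/30/2015 08:30:00']),
    'detector_id': [1, 1, 1, 1, 1],
    'traffic_volume': [55, 45, 60, 200, 70],
})

# Each incident picks up the last traffic record at or before it,
# provided that record is no more than 45 minutes away.
matched = pd.merge_asof(incidents.sort_values('Date'),
                        traffic.sort_values('Date'),
                        on='Date',
                        by='detector_id',
                        direction='backward',
                        tolerance=pd.Timedelta('45 min'))
print(matched)
# Both incidents (08:00 and 08:02) are matched to the 08:00 traffic record.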
Thank you and hopefully this will help anyone in future.
How can I get the start and end time of this list? I can add the date to these times and take the min and max, but you can see that row 3 is a next-day shift, yet it should come under the same date because it is a night shift.
I have also added a normal day-shift employee, to check that the logic is right.
EmployeeId ShiftDate ShiftStartTime ShiftEndTime
-----------------------------------------------------
20040 2017-11-01 21:00:00 23:00:00
20040 2017-11-01 23:00:00 00:30:00
20040 2017-11-01 00:30:00 06:00:00
20124 2017-11-01 09:00:00 16:30:00
20124 2017-11-01 16:30:00 22:00:00
20124 2017-11-01 22:00:00 22:30:00
I need it like below:
EmployeeId ShiftDate ShiftStartTime ShiftEndTime
----------------------------------------------------
20040 2017-11-01 21:00:00 06:00:00
20124 2017-11-01 09:00:00 22:30:00
In a commercial environment we solved this by attaching a FLAG to each shift. The flag would indicate the 'Reporting Date' of the shift: it would have a value of 1 if the 'Reporting / Administrative date' was the 'next' day, 0 for the same day, and -1 for the previous day (which we never used; it depends on your scenario).
I modified your table to show a possible SHIFTS table, which I guess should also have a NAME column (like Morning, Afternoon, Day, Night shift, etc.)
ReportFlag ShiftStartTime ShiftEndTime
1 21:00:00 23:00:00
1 23:00:00 00:30:00
0 00:30:00 06:00:00
0 09:00:00 16:30:00
0 16:30:00 22:00:00
1 22:00:00 22:30:00
Notice how I added 1 - to say that 'this shift' is actually considered to be on the 'next' day.
Then you can use your flag value (0 or 1) to add to the DATE functions in your queries too.
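The question looks like SQL (where DATEADD(day, flag, ShiftDate) would do the same date arithmetic), but the flag idea can be sketched in pandas as well. This is only an illustration using the question's rows; the NextDayFlag column is hypothetical and marks segments that physically take place on the day after ShiftDate:

import pandas as pd

df = pd.DataFrame({
    'EmployeeId':     [20040, 20040, 20040, 20124, 20124, 20124],
    'ShiftDate':      pd.to_datetime(['2017-11-01'] * 6),
    'ShiftStartTime': ['21:00:00', '23:00:00', '00:30:00',
                       '09:00:00', '16:30:00', '22:00:00'],
    'ShiftEndTime':   ['23:00:00', '00:30:00', '06:00:00',
                       '16:30:00', '22:00:00', '22:30:00'],
    'NextDayFlag':    [0, 0, 1, 0, 0, 0],   # hypothetical flag column
})

# Real start datetimes: flagged segments are shifted to the next day.
start = (df['ShiftDate']
         + pd.to_timedelta(df['NextDayFlag'], unit='D')
         + pd.to_timedelta(df['ShiftStartTime']))
# If the end time-of-day is earlier than the start, the segment crosses midnight.
end = start - pd.to_timedelta(df['ShiftStartTime']) + pd.to_timedelta(df['ShiftEndTime'])
end = end.where(end >= start, end + pd.Timedelta('1D'))

result = (df.assign(Start=start, End=end)
            .groupby(['EmployeeId', 'ShiftDate'], as_index=False)
            .agg(ShiftStart=('Start', 'min'), ShiftEnd=('End', 'max')))
print(result)
# 20040  2017-11-01  2017-11-01 21:00:00  2017-11-02 06:00:00
# 20124  2017-11-01  2017-11-01 09:00:00  2017-11-01 22:30:00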