Time difference between two columns in Pandas - pandas

How can I subtract the times in two columns and convert the result to minutes?
      Date Time Ordered Time Delivered
0  1/11/19   9:25:00 am    10:58:00 am
1  1/11/19  10:16:00 am    11:13:00 am
2  1/11/19  10:25:00 am    10:45:00 am
3  1/11/19  10:45:00 am    11:12:00 am
4  1/11/19  11:11:00 am    11:47:00 am
I want to compute Time Delivered minus Time Ordered to get the number of minutes each delivery took.
df.time_ordered = pd.to_datetime(df.time_ordered)
This doesn't give the result I want: instead of keeping just the time, it adds today's date to it.

Convert both time columns to datetimes, subtract the order time from the delivery time, convert the difference to seconds with Series.dt.total_seconds, and then divide by 60 to get minutes:
df['diff'] = (pd.to_datetime(df.time_delivered, format='%I:%M:%S %p')
                .sub(pd.to_datetime(df.time_ordered, format='%I:%M:%S %p'))
                .dt.total_seconds()
                .div(60))
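As a quick sanity check of the approach above, here is a minimal, self-contained sketch that rebuilds the sample rows from the question and subtracts the order time from the delivery time. The column names time_ordered and time_delivered are assumed to match the question's code. The placeholder date that to_datetime attaches to each time is the same on both sides, so it cancels out in the subtraction.

```python
import pandas as pd

# Sample rows from the question; the column names are an assumption.
df = pd.DataFrame({
    "time_ordered": ["9:25:00 am", "10:16:00 am", "10:25:00 am",
                     "10:45:00 am", "11:11:00 am"],
    "time_delivered": ["10:58:00 am", "11:13:00 am", "10:45:00 am",
                       "11:12:00 am", "11:47:00 am"],
})

fmt = "%I:%M:%S %p"  # 12-hour clock with an am/pm marker
ordered = pd.to_datetime(df["time_ordered"], format=fmt)
delivered = pd.to_datetime(df["time_delivered"], format=fmt)

# Timedelta -> seconds -> minutes
df["diff"] = (delivered - ordered).dt.total_seconds().div(60)
print(df["diff"].tolist())
```

This only holds when both times fall on the same day; a delivery that crosses midnight would come out negative.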

Try pd.to_datetime() on both columns and subtract the results:
df = pd.DataFrame([['9:25:00 AM','10:58:00 AM']],
                  columns=['time1', 'time2'])
print(pd.to_datetime(df.time2)-pd.to_datetime(df.time1))
Output:
0   0 days 01:33:00
dtype: timedelta64[ns]

Another way is to combine the date with each time and divide the difference by np.timedelta64(1, "m"):
print(df)
      Date Time Ordered Time Delivered
0  1/11/19   9:25:00 am    10:58:00 am
1  1/11/19  10:16:00 am    11:13:00 am
2  1/11/19  10:25:00 am    10:45:00 am
3  1/11/19  10:45:00 am    11:12:00 am
4  1/11/19  11:11:00 am    11:47:00 am
import numpy as np
df['mins'] = (
    pd.to_datetime(df["Date"] + " " + df["Time Delivered"])
    - pd.to_datetime(df["Date"] + " " + df["Time Ordered"])
) / np.timedelta64(1, "m")
Output:
print(df)
      Date Time Ordered Time Delivered  mins
0  1/11/19   9:25:00 am    10:58:00 am  93.0
1  1/11/19  10:16:00 am    11:13:00 am  57.0
2  1/11/19  10:25:00 am    10:45:00 am  20.0
3  1/11/19  10:45:00 am    11:12:00 am  27.0
4  1/11/19  11:11:00 am    11:47:00 am  36.0

Related

Calculate the rolling average every two weeks for the same day and hour in a DataFrame

I have a Dataframe like the following:
df = pd.DataFrame()
df['datetime'] = pd.date_range(start='2023-1-2', end='2023-1-29', freq='15min')
df['week'] = df['datetime'].apply(lambda x: int(x.isocalendar()[1]))
df['day_of_week'] = df['datetime'].dt.weekday
df['hour'] = df['datetime'].dt.hour
df['minutes'] = pd.DatetimeIndex(df['datetime']).minute
df['value'] = range(len(df))
df.set_index('datetime',inplace=True)
df =
                     week  day_of_week  hour  minutes  value
datetime
2023-01-02 00:00:00     1            0     0        0      0
2023-01-02 00:15:00     1            0     0       15      1
2023-01-02 00:30:00     1            0     0       30      2
2023-01-02 00:45:00     1            0     0       45      3
2023-01-02 01:00:00     1            0     1        0      4
...                   ...          ...   ...      ...    ...
2023-01-08 23:00:00     1            6    23        0    668
2023-01-08 23:15:00     1            6    23       15    669
2023-01-08 23:30:00     1            6    23       30    670
2023-01-08 23:45:00     1            6    23       45    671
2023-01-09 00:00:00     2            0     0        0    672
And I want to calculate the average of the column "value" for the same hour/minute/day, every two consecutive weeks.
What I would like to get is the following:
df=
                                              value
day_of_week hour minutes datetime
0           0    0       2023-01-02 00:00:00    NaN
                         2023-01-09 00:00:00    NaN
                         2023-01-16 00:00:00    336
                         2023-01-23 00:00:00   1008
                 15      2023-01-02 00:15:00    NaN
                         2023-01-09 00:15:00    NaN
                         2023-01-16 00:15:00    337
                         2023-01-23 00:15:00   1009
So the first two weeks should have NaN values, week 3 should be the average of weeks 1 and 2, week 4 the average of weeks 2 and 3, and so on.
I tried the following code but it does not seem to do what I expect:
df = pd.DataFrame(df.groupby(['day_of_week','hour','minutes'])['value'].rolling(window='14D', min_periods=1).mean())
As what I am getting is:
                                              value
day_of_week hour minutes datetime
0           0    0       2023-01-02 00:00:00      0
                         2023-01-09 00:00:00    336
                         2023-01-16 00:00:00   1008
                         2023-01-23 00:00:00   1680
                 15      2023-01-02 00:15:00      1
                         2023-01-09 00:15:00    337
                         2023-01-16 00:15:00   1009
                         2023-01-23 00:15:00   1681
I think you want to shift within each group, so that each row receives the average of the two weeks before it. For that you need a second groupby:
(df.groupby(['day_of_week','hour','minutes'])['value']
   .rolling(window='14D', min_periods=2).mean()         # min_periods=2, so a window holding one week gives NaN
   .groupby(['day_of_week','hour','minutes']).shift()   # shift within each group
   .to_frame()
)
Output:
                                               value
day_of_week hour minutes datetime
0           0    0       2023-01-02 00:00:00     NaN
                         2023-01-09 00:00:00     NaN
                         2023-01-16 00:00:00   336.0
                         2023-01-23 00:00:00  1008.0
                 15      2023-01-02 00:15:00     NaN
...                                              ...
6           23   30      2023-01-15 23:30:00     NaN
                         2023-01-22 23:30:00  1006.0
                 45      2023-01-08 23:45:00     NaN
                         2023-01-15 23:45:00     NaN
                         2023-01-22 23:45:00  1007.0
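As a cross-check of the numbers above, the same "average of the two previous weeks" can be sketched without a time-based window by shifting twice inside each group. This is a different technique from the rolling answer, and it assumes exactly one observation per (day_of_week, hour, minutes) slot per week with no gaps. The column name two_week_mean is made up for illustration.

```python
import pandas as pd

# Rebuild the question's frame: one row every 15 minutes over four weeks.
idx = pd.date_range(start="2023-01-02", end="2023-01-29", freq="15min")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)
df.index.name = "datetime"
df["day_of_week"] = df.index.weekday
df["hour"] = df.index.hour
df["minutes"] = df.index.minute

# Within each weekly slot, shift(1) is last week's value and shift(2)
# is the value from two weeks ago; their mean is NaN until both exist.
g = df.groupby(["day_of_week", "hour", "minutes"])["value"]
df["two_week_mean"] = (g.shift(1) + g.shift(2)) / 2

print(df.loc[pd.Timestamp("2023-01-16 00:00:00"), "two_week_mean"])
```

If a week is missing from the data, the shifted values no longer line up with calendar weeks, which is where the time-based rolling window is the safer choice.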

SQL: Split time interval into 1 hour with overlapping minutes split (Bigquery)

This is the data that I have:
date        event_type  interval_start  interval_end  duration_in_min
2022-06-06  s1          09:05:00        11:45:00      160
2022-06-01  s2          08:00:00        08:17:00      17
2022-05-31  c1          17:55:00        18:08:00      13
2022-04-05  s3          07:58:00        08:46:00      48
...
And this is what I would like to achieve:
Here, interval represents a 1-hour interval (or 59 min 59 sec to be precise, in case an event starts or ends at exactly 10:00:00, although that should not happen very often).
date        interval  event_type  interval_start  interval_end  duration_in_min
2022-06-06  09:00:00  s1          09:05:00        11:45:00      55
2022-06-06  10:00:00  s1          09:05:00        11:45:00      60
2022-06-06  11:00:00  s1          09:05:00        11:45:00      45
2022-06-01  08:00:00  s2          08:00:00        08:17:00      17
2022-05-31  17:00:00  c1          17:55:00        18:08:00      5
2022-05-31  18:00:00  c1          17:55:00        18:08:00      8
2022-04-05  07:00:00  s3          07:58:00        08:46:00      2
2022-04-05  08:00:00  s3          07:58:00        08:46:00      46
...
I am struggling to break the data down per hour, so that an event spanning several hours is split and its overlapping minutes are assigned to a new interval row for each hour.
Any help would be greatly appreciated :)
Consider the approach below:
select
  date, time(hour, 0, 0) as `interval`,
  event_type, interval_start, interval_end,
  time_diff(least(time(hour + 1, 0, 0), interval_end), greatest(time(hour, 0, 0), interval_start), minute) as duration_in_min
from your_table,
unnest(generate_array(0, 23)) hour
where hour between extract(hour from time(interval_start)) and extract(hour from time(interval_end))
If applied to the sample data in your question, this produces the expected result shown above.
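The clamping idea in the query (take the later of the bucket start and the event start, and the earlier of the bucket end and the event end) can be checked outside BigQuery with a small Python sketch. The function name split_by_hour is hypothetical, and the sketch assumes each event starts and ends on the same day with the end after the start.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"

def split_by_hour(interval_start, interval_end):
    """Return (bucket_start, minutes) for each clock hour the event touches."""
    start = datetime.strptime(interval_start, FMT)
    end = datetime.strptime(interval_end, FMT)
    bucket = start.replace(minute=0, second=0)  # top of the first hour
    rows = []
    while bucket < end:
        next_bucket = bucket + timedelta(hours=1)
        # Clamp the event to the current one-hour bucket.
        overlap = min(next_bucket, end) - max(bucket, start)
        rows.append((bucket.strftime(FMT), int(overlap.total_seconds() // 60)))
        bucket = next_bucket
    return rows

print(split_by_hour("09:05:00", "11:45:00"))
```

Unlike the `between` filter in the query, this loop stops before a bucket that begins exactly at the event's end, so an event ending on the hour does not produce a zero-minute row.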

Overlap in seconds between datetime range and a time range

I have a dataframe like this:
df11 = pd.DataFrame(
    {
        "Start_date": ["2018-01-31 12:00:00", "2018-02-28 16:00:00", "2018-02-27 22:00:00"],
        "End_date": ["2019-01-31 21:45:00", "2019-03-24 22:00:00", "2018-02-28 01:00:00"],
    }
)
            Start_date             End_date
0  2018-01-31 12:00:00  2019-01-31 21:45:00
1  2018-02-28 16:00:00  2019-03-24 22:00:00
2  2018-02-27 22:00:00  2018-02-28 01:00:00
I need to check the overlap time duration in specific periods in seconds. My expected results are like this:
            Start_date             End_date  12h-16h  16h-22h  22h-00h  00h-02h30
0  2018-01-31 12:00:00  2019-01-31 21:45:00    14400    20700        0          0
1  2018-02-28 16:00:00  2019-03-24 22:00:00        0    21600        0          0
2  2018-02-27 22:00:00  2018-02-28 01:00:00        0        0     7200       3600
I know it's completely wrong, and I've tried other solutions as well. This is one of my attempts:
df11['12h-16h']=np.where(df11['Start_date']<timedelta(hours=16, minutes=0, seconds=0) & df11['End_date']>timedelta(hours=12, minutes=0, seconds=0),(np.minimum(df11['End_date'],timedelta(hours=16, minutes=0, seconds=0)))-(np.maximum(df11['Start_date'],timedelta(hours=12, minutes=0, seconds=0)))
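One possible way to get the expected table is sketched below. It rests on an assumption read off the expected results rather than stated in the question: only the time of day of Start_date and End_date matters (the calendar dates are ignored), and an end time earlier than the start time means the range runs past midnight. Each window is checked once for "today" and once shifted by a day to catch the part after midnight.

```python
import pandas as pd

df11 = pd.DataFrame({
    "Start_date": ["2018-01-31 12:00:00", "2018-02-28 16:00:00", "2018-02-27 22:00:00"],
    "End_date": ["2019-01-31 21:45:00", "2019-03-24 22:00:00", "2018-02-28 01:00:00"],
})
start = pd.to_datetime(df11["Start_date"])
end = pd.to_datetime(df11["End_date"])

DAY = 24 * 3600
# Seconds since midnight for each timestamp (dates are ignored by assumption).
s = (start - start.dt.normalize()).dt.total_seconds()
e = (end - end.dt.normalize()).dt.total_seconds()
# An end before the start is treated as falling on the next day.
e = e.where(e >= s, e + DAY)

# Window bounds in seconds since midnight.
windows = {
    "12h-16h": (12 * 3600, 16 * 3600),
    "16h-22h": (16 * 3600, 22 * 3600),
    "22h-00h": (22 * 3600, 24 * 3600),
    "00h-02h30": (0, 2 * 3600 + 1800),
}

for name, (lo, hi) in windows.items():
    total = 0
    for offset in (0, DAY):  # the window today, then the same window tomorrow
        overlap = e.clip(upper=hi + offset) - s.clip(lower=lo + offset)
        total = total + overlap.clip(lower=0)  # negative means no overlap
    df11[name] = total.astype(int)

print(df11)
```

If the ranges are meant to count every day between the two dates, this interpretation does not apply and the per-day overlaps would have to be multiplied out instead.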

replace 0 that is in between exactly two numbers in a column

I only want to replace a 0 that lies directly between two non-zero numbers with the average of those two numbers.
My dataset looks like below:
time value
9:45:00 0
10:00:00 0
10:15:00 0
10:30:00 10
10:45:00 0
11:00:00 10
11:15:00 10
11:30:00 0
11:45:00 10
12:00:00 0
12:15:00 0
12:30:00 0
12:45:00 10
13:00:00 0
13:15:00 0
I want it to look like this:
time value
9:45:00 0
10:00:00 0
10:15:00 0
10:30:00 10
10:45:00 10
11:00:00 10
11:15:00 10
11:30:00 10
11:45:00 10
12:00:00 0
12:15:00 0
12:30:00 0
12:45:00 10
13:00:00 0
13:15:00 0
Here, since the zeros between 11:45 and 12:45 are not a single 0 between two numbers (there are several zeros in a row), they are not filled in.
How about this?
from io import StringIO as sio
data = sio("""
time value
9:45:00 0
10:00:00 0
10:15:00 0
10:30:00 10
10:45:00 0
11:00:00 10
11:15:00 10
11:30:00 0
11:45:00 10
12:00:00 0
12:15:00 0
12:30:00 0
12:45:00 10
13:00:00 0
13:15:00 0
""")
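import pandas as pd
df = pd.read_csv(data, sep=r'\s+')  # raw string avoids an invalid-escape warning
# flag a zero only when both the previous and the next value are non-zero
df['flag_to_fill'] = (df['value']==0) & (df['value'].shift(1)!=0) & (df['value'].shift(-1)!=0)
# replace each flagged zero with the average of its two neighbours
df.loc[df['flag_to_fill'], 'value'] = 0.5*(df['value'].shift(1) + df['value'].shift(-1))
df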
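One edge case worth noting with the shift-based flag: at the first and last rows the shifted value is NaN, and NaN != 0 evaluates to True, so a zero at either end next to a non-zero value would be flagged and filled with NaN. The sample data does not hit this case. A minimal sketch that guards against it, using a made-up five-element series and the fill_value argument of Series.shift:

```python
import pandas as pd

# Hypothetical series with zeros at both ends and one isolated zero inside.
s = pd.Series([0, 10, 0, 10, 0], dtype="float64")

# Treat the missing neighbour at each end as 0 so edge zeros are never flagged.
prev = s.shift(1, fill_value=0)
nxt = s.shift(-1, fill_value=0)

isolated = (s == 0) & (prev != 0) & (nxt != 0)
filled = s.mask(isolated, (prev + nxt) / 2)  # average of the two neighbours

print(filled.tolist())
```

Only the middle zero is replaced; the zeros at the two ends are left alone because they do not sit between two numbers.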

Pandas Datetime conversion

I have the following dataframe;
Date = ['01-Jan','01-Jan','01-Jan','01-Jan']
Heure = ['00:00','01:00','02:00','03:00']
value =[1,2,3,4]
df = pd.DataFrame({'value':value,'Date':Date,'Hour':Heure})
print(df)
     Date   Hour  value
0  01-Jan  00:00      1
1  01-Jan  01:00      2
2  01-Jan  02:00      3
3  01-Jan  03:00      4
I am trying to create a datetime index, knowing that the file I am working with is for 2015. I have tried a lot of things but can't get it to work! I tried converting only the day and the month, but even that does not work:
df.index = pd.to_datetime(df['Date'],format='%d-%m')
I expect the following result:
                       Date   Hour  value
2015-01-01 00:00:00  01-Jan  00:00      1
2015-01-01 01:00:00  01-Jan  01:00      2
2015-01-01 02:00:00  01-Jan  02:00      3
2015-01-01 03:00:00  01-Jan  03:00      4
Does anyone know how to do it?
Thanks,
You need to explicitly add 2015 somehow, and include the Hour column as well. Note that %b, not %m, matches an abbreviated month name such as Jan. I would do something like this:
df.index = pd.to_datetime(df.Date + '-2015 ' + df.Hour, format='%d-%b-%Y %H:%M')
>>> df
                       Date   Hour  value
2015-01-01 00:00:00  01-Jan  00:00      1
2015-01-01 01:00:00  01-Jan  01:00      2
2015-01-01 02:00:00  01-Jan  02:00      3
2015-01-01 03:00:00  01-Jan  03:00      4
You can also parse without a year, which defaults to 1900, and then swap in 2015 using Timestamp.replace:
s=pd.to_datetime(df['Date']+df['Hour'],format='%d-%b%H:%M').apply(lambda x : x.replace(year=2015))
s
Out[131]:
0   2015-01-01 00:00:00
1   2015-01-01 01:00:00
2   2015-01-01 02:00:00
3   2015-01-01 03:00:00
dtype: datetime64[ns]
df.index=s