pandas to_datetime does not accept '24' as time - pandas

The time is in the YYYYMMDDHH format. The first time is 2010010101; it increases by 1 hour until it reaches 2010010124, and the next value is 2010010201.
date
0 2010010101
1 2010010124
2 2010010201
df['date'] = pd.to_datetime(df['date'], format ='%Y%m%d%H')
I am getting error:
'int' object is unsliceable
If I run:
df2['date'] = pd.to_datetime(df2['date'], format ='%Y%m%d%H', errors = 'coerce')
All the rows with hour '24' are converted to NaT.

Hours run from 00 (midnight) to 23, so hour 24 in your data corresponds to 00 of the next day. One way is to define a custom to_datetime function that handles this date format.
import pandas as pd

df = pd.DataFrame({'date':['2010010101', '2010010124', '2010010201']})

def custom_to_datetime(date):
    # If the hour is 24, parse only the date part and move to 00:00 of the next day
    if date[8:10] == '24':
        return pd.to_datetime(date[:-2], format = '%Y%m%d') + pd.Timedelta(days=1)
    else:
        return pd.to_datetime(date, format = '%Y%m%d%H')

df['date'] = df['date'].apply(custom_to_datetime)
date
0 2010-01-01 01:00:00
1 2010-01-02 00:00:00
2 2010-01-02 01:00:00
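A vectorized variant of the same idea, as a minimal sketch: it assumes the column holds integers (which is what the "'int' object is unsliceable" error suggests) and that every value has exactly ten digits. The names as_text, day_part and hour_part are only illustrative. The date part is parsed on its own and the hour is added as a timedelta, so hour 24 rolls over to the next day without a special case.

```python
import pandas as pd

# Integer column in YYYYMMDDHH form (assumed input, not the asker's real data)
df = pd.DataFrame({"date": [2010010101, 2010010124, 2010010201]})

# Work on the string form so the value can be sliced
as_text = df["date"].astype(str)

# First 8 characters are the calendar date, the last 2 are the hour (01..24)
day_part = pd.to_datetime(as_text.str[:8], format="%Y%m%d")
hour_part = pd.to_timedelta(as_text.str[8:].astype(int), unit="h")

# Adding 24 hours to a date lands on 00:00 of the following day
df["date"] = day_part + hour_part
print(df)
```

This avoids calling a Python function per row, which may matter on large frames.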

Related

Getting start date of week from week number

I have a list like this lst = [25,26,27]
numbers: 25, 26, 27 are the week number.
For each number in the list I would like to have the start date, e.g. for week 25 the start date is 2021-06-21 (the week starts on Monday).
Please, can someone help?
I have tried something like
import datetime
from dateutil.relativedelta import relativedelta
week = 25
year = 2021
date = datetime.date(year, 1, 1) + relativedelta(weeks=+week)
print(date)
but it doesn't work.
IIUC, you can use to_datetime:
lst = [25,26,27]
year = 2021
out = pd.to_datetime(pd.Series(lst).astype(str)+str(year)+'Mon', format='%W%Y%a')
output:
0 2021-06-21
1 2021-06-28
2 2021-07-05
dtype: datetime64[ns]
intermediate:
pd.Series(lst).astype(str)+str(year)+'Mon'
0 252021Mon
1 262021Mon
2 272021Mon
dtype: object
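If ISO week numbering is what is wanted, the standard library can do this without pandas. This is a sketch assuming Python 3.8 or later, where datetime.date.fromisocalendar is available; the name starts is illustrative. Note that ISO weeks and the %W weeks used above agree for 2021 but can differ by one in other years.

```python
import datetime

lst = [25, 26, 27]
year = 2021

# fromisocalendar(year, week, weekday): weekday 1 is Monday
starts = [datetime.date.fromisocalendar(year, week, 1) for week in lst]

for week, start in zip(lst, starts):
    print(week, start)
```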

Difference between datetime object returns only days

I'm trying to calculate the difference between two datetime objects but it only returns the difference between days and not between hours/minutes/seconds.
This is my code:
import pandas as pd
import datetime as dt
df = pd.read_csv(r'recorridos-realizados-2020.csv')
df.head(2)
Id_start start_date end_date Id_end ID_cyclist
75 2020-09-14 11:52:21 2020-09-14 11:58:10 186.0 155721
210 2020-09-14 11:51:41 2020-09-14 11:53:06 210.0 191320
df['start_date'] = pd.to_datetime(df['start_date'], format='%Y-%m-%d %H:%M:%S')
df['end_date'] = pd.to_datetime(df['end_date'], format='%Y-%m-%d %H:%M:%S')
df['timelapse'] = df['end_date'] - df['start_date']
df['timelapse'].head()
0 0 days
1 0 days
The result should be:
0 days, 00:05:49
0 days, 00:01:25
What am I doing wrong?
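Please look at pandas time deltas. A Timedelta keeps the hours, minutes and seconds; for example, its seconds attribute gives the part of the difference below one day: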
d1 = pd.to_datetime('2020-09-14 11:52:21')
d2 = pd.to_datetime('2020-09-14 11:58:10')
delta = (d2-d1)
print('seconds: ', delta.seconds)
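For a whole column, the .dt accessor exposes the same information. The sketch below rebuilds the two rows from the question by hand (the CSV itself is not available here) and uses Series.dt.total_seconds(), which returns the full difference in seconds rather than only the sub-day component that .seconds gives.

```python
import pandas as pd

# Two rows copied from the question's df.head(2) output
df = pd.DataFrame({
    "start_date": ["2020-09-14 11:52:21", "2020-09-14 11:51:41"],
    "end_date": ["2020-09-14 11:58:10", "2020-09-14 11:53:06"],
})

df["start_date"] = pd.to_datetime(df["start_date"], format="%Y-%m-%d %H:%M:%S")
df["end_date"] = pd.to_datetime(df["end_date"], format="%Y-%m-%d %H:%M:%S")

# Timedelta column, then the whole difference expressed in seconds
df["timelapse"] = df["end_date"] - df["start_date"]
df["seconds"] = df["timelapse"].dt.total_seconds()
print(df[["timelapse", "seconds"]])
```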

breaking down time intervals and assigning them to corresponding hours

Given the start_time and end_time of an event, I want to assign its duration to the hour(s) it falls in, keyed by hour(start_time):
For example if I have a dataframe of:
event start_time end_time
a 8:00 8:30
b 8:49 10:22
In this case hour(start_time) = 8, is assigned with 30 mins like in first row.
However if the hours of start_time and end_time are not equal like in second rows,
then I want to split the start_time and end_time as below:
event start_time end_time hour(start_time) duration
a 8:00 8:30 8 30
b 8:49 9:00 8 11
b 9:00 10:00 9 60
b 10:00 10:22 10 22
Is there a straightforward way to accomplish this in pandas?
To make the data easier to manipulate, convert the times to datetimes, repeat each row once per hour it spans, then add or subtract timedeltas for the start, end and hour columns using GroupBy.cumcount and to_timedelta. Finally, round the new datetimes with Series.dt.floor and Series.dt.ceil:
print (df)
event start_time end_time
0 a 8:00 8:30
1 b 8:49 10:22
df['s'] = pd.to_datetime(df['start_time'], format='%H:%M')
df['e'] = pd.to_datetime(df['end_time'], format='%H:%M')
df['hour'] = df['s'].dt.hour
df = df.loc[df.index.repeat(df['e'].dt.hour.sub(df['hour']).add(1))]
idx = df.index
m1 = idx.duplicated()
m2 = idx.duplicated(keep='last')
df = df.reset_index(drop=True)
s = df.groupby(idx).cumcount()
s1 = df.groupby(idx).cumcount(ascending=False)
df['hour'] = df['hour'].add(s)
df.loc[m1, 's'] += pd.to_timedelta(s, unit='h')
df.loc[m1, 's'] = df.loc[m1, 's'].dt.floor('h')
df.loc[m2, 'e'] -= pd.to_timedelta(s1, unit='h')
df.loc[m2, 'e'] = df.loc[m2, 'e'].dt.ceil('h')
df['duration'] = df['e'].sub(df['s']).dt.total_seconds().div(60).astype(int)
df['start_time'] = df.pop('s').dt.strftime('%H:%M')
df['end_time'] = df.pop('e').dt.strftime('%H:%M')
print (df)
event start_time end_time hour duration
0 a 08:00 08:30 8 30
1 b 08:49 09:00 8 11
2 b 09:00 10:00 9 60
3 b 10:00 10:22 10 22
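The same splitting can also be written as an explicit loop, which is slower but may be easier to follow. This is only a sketch under the question's assumptions (times in H:MM form, end after start on the same day); the names rows, cur, nxt and out are illustrative. Each interval is cut at the next full hour until the end time is reached.

```python
import pandas as pd

df = pd.DataFrame({
    "event": ["a", "b"],
    "start_time": ["8:00", "8:49"],
    "end_time": ["8:30", "10:22"],
})

rows = []
for row in df.itertuples(index=False):
    cur = pd.to_datetime(row.start_time, format="%H:%M")
    end = pd.to_datetime(row.end_time, format="%H:%M")
    while cur < end:
        # Next full hour after cur, capped at the end of the event
        next_hour = cur.replace(minute=0, second=0, microsecond=0) + pd.Timedelta(hours=1)
        nxt = min(next_hour, end)
        rows.append({
            "event": row.event,
            "start_time": cur.strftime("%H:%M"),
            "end_time": nxt.strftime("%H:%M"),
            "hour": cur.hour,
            "duration": int((nxt - cur).total_seconds() // 60),
        })
        cur = nxt

out = pd.DataFrame(rows)
print(out)
```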

Add random datetimes to timestamps

I have a column of timestamps that span over 24 hours. I want to convert these to differentiate between days. I've done this by converting to timedelta. The result is displayed below.
The question I have is: can these be converted or rearranged again to provide datetimes on an arbitrary date, e.g. dd:mm:yyyy hh:mm:ss?
import pandas as pd
df = pd.DataFrame({
'Time' : ['8:00','18:00','28:00'],
})
df['Time'] = [x + ':00' for x in df['Time']]
df['Time'] = pd.to_timedelta(df['Time'])
Out:
Time
0 0 days 08:00:00
1 0 days 18:00:00
2 1 days 04:00:00
Intended Output:
Time
0 1/01/1904 08:00:00 AM
1 1/01/1904 18:00:00 PM
2 2/01/1904 04:00:00 AM
The input timestamps will never span more than 2 days. Is there a package that can achieve this, or would dummy start and end dates be needed?
After you convert Time to timedelta, just add the date part:
df.Time+pd.to_datetime('1904-01-01')
0 1904-01-01 08:00:00
1 1904-01-01 18:00:00
2 1904-01-02 04:00:00
Name: Time, dtype: datetime64[ns]
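To get the result as text in a day-first layout, the sum can be passed through Series.dt.strftime. This is a sketch: the base date 1904-01-01 is the dummy date from the question, and the format string '%d/%m/%Y %H:%M:%S' is an assumed reading of the intended output (24-hour clock, so no AM/PM marker).

```python
import pandas as pd

times = pd.Series(["8:00", "18:00", "28:00"], name="Time")

# Append seconds so to_timedelta can parse the values; 28:00 becomes 1 day 04:00
deltas = pd.to_timedelta(times + ":00")

# Anchor the timedeltas to a dummy start date
stamps = pd.Timestamp("1904-01-01") + deltas

# Render as day/month/year strings
formatted = stamps.dt.strftime("%d/%m/%Y %H:%M:%S")
print(formatted)
```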

How to convert object to hour and add to date?

I have the following data frame:
correction = ['2.0','-2.5','4.5','-3.0']
date = ['2015-05-19 20:45:00','2017-04-29 17:15:00','2011-05-09 10:40:00','2016-12-18 16:10:00']
I want to convert correction to hours and add it to the date. I tried the following code, but I get an error.
df['correction'] = pd.to_timedelta(df['correction'],unit='h')
df['date'] =pd.DatetimeIndex(df['date'])
df['date'] = df['date'] + df['correction']
I get the error in converting correction to timedelta as:
ValueError: no units specified
Casting the correction column to float works for me:
df['correction'] = pd.to_timedelta(df['correction'].astype(float),unit='h')
df['date'] = pd.DatetimeIndex(df['date'])
df['date'] = df['date'] + df['correction']
print (df)
correction date
0 02:00:00 2015-05-19 22:45:00
1 -1 days +21:30:00 2017-04-29 14:45:00
2 04:30:00 2011-05-09 15:10:00
3 -1 days +21:00:00 2016-12-18 13:10:00