Difference between two timestamps in Pandas - pandas

I have the following two time columns, "Time1" and "Time2". I need to calculate the "Difference" column, which is (Time2 - Time1), in Pandas:
Time1 Time2 Difference
8:59:45 9:27:30 -1 days +23:27:45
9:52:29 10:08:54 -1 days +23:16:26
8:07:15 8:07:53 00:00:38
When Time1 and Time2 are in different hours, I get a result that starts with "-1 days +". My desired output for the first two rows is given below:
Time1 Time2 Difference
8:59:45 9:27:30 00:27:45
9:52:29 10:08:54 00:16:26
How can I get this output in Pandas?
Both time values are in 'datetime64[ns]' dtype.

The issue is not that Time1 and Time2 are in different hours; it is that Time2 is before Time1, so Time2 - Time1 is negative, and this is how pandas displays negative timedeltas. If you just want the difference in minutes as a signed number, you could extract the minutes before calculating the difference:
(df.Time1.dt.minute - df.Time2.dt.minute)
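To illustrate why the result prints as "-1 days +...", here is a minimal sketch. The two timestamps are illustrative values, not the question's actual DataFrame. It shows how pandas splits a negative Timedelta into a negative day count plus a positive remainder, and two ways to get a more readable value:

```python
import pandas as pd

# Illustrative values: the end time is earlier than the start time
start = pd.Timestamp("2020-01-01 09:27:30")
end = pd.Timestamp("2020-01-01 08:59:45")

diff = end - start  # a negative Timedelta of 27 minutes 45 seconds

# pandas represents it as a negative day count plus a positive remainder
days, seconds = diff.days, diff.seconds

# A signed number of seconds, or the magnitude, is often easier to read
signed_seconds = diff.total_seconds()
magnitude = abs(diff)

print(diff, signed_seconds, magnitude)
```

If the subtraction is the right way round (later minus earlier), the "-1 days" prefix should not appear at all.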

I was not able to reproduce the issue using pandas 0.17.1:
import pandas as pd
d = {
"start_time": [
"8:59:45",
"9:52:29",
"8:07:15"
],
"end_time": [
"9:27:30",
"10:08:54",
"8:07:53"
]
}
df = pd.DataFrame(data=d)
df['start_time'] = pd.to_datetime(df['start_time'])
df['end_time'] = pd.to_datetime(df['end_time'])
df.end_time - df.start_time
0 00:27:45
1 00:16:25
2 00:00:38
dtype: timedelta64[ns]

Related

Dataframe - How to convert datetime to timestamp

A column in dataframe keeps date like:
2019-06-19 23:04:36
2018-06-29 20:06:56
2019-03-04 11:12:35
2019-07-12 21:16:44
I tried the below code but it gives not correct results:
df['timestamps'] = pd.to_datetime(df['datetimes']).astype('int64') / 10**9
The results are like these:
1.465506e+09
1.465516e+09
1.465503e+09
If I convert them again to date, I get incorrect date time:
df['new'] = df['timestamps'].apply(lambda x: pd.Timestamp(x))
1970-01-01 00:00:01.465506396
1970-01-01 00:00:01.465506397
1970-01-01 00:00:01.465506397
Something is not correct...
What is the way to convert date time that is as string like "2019-06-19 23:04:36" to timestamp?
Thank you.
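One possible approach, as a sketch rather than a definitive answer: a bare number passed to pd.Timestamp is interpreted as nanoseconds since 1970-01-01, which is why seconds-valued floats land a second or so after the epoch. Stating the unit explicitly when converting back avoids this. The column names follow the question; the two sample rows are taken from it.

```python
import pandas as pd

df = pd.DataFrame({"datetimes": ["2019-06-19 23:04:36", "2018-06-29 20:06:56"]})

# Parse the strings into datetimes
parsed = pd.to_datetime(df["datetimes"])

# Whole seconds since the Unix epoch; dividing by a Timedelta keeps this
# independent of the internal datetime resolution
epoch = pd.Timestamp("1970-01-01")
df["timestamps"] = (parsed - epoch) // pd.Timedelta(seconds=1)

# Converting back: say that the numbers are seconds, not nanoseconds
df["new"] = pd.to_datetime(df["timestamps"], unit="s")

print(df)
```

The strings are treated as naive (timezone-free) datetimes here; if they represent local times, a timezone conversion would be needed first.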

Convert float days to timedelta64[ns] in pandas dataframe

I found this, where 1,5 means 1.5 days:
pd.to_timedelta('1,5')
But it gives me 0 days 00:00:00.000000015
I needed this: 1 days 12:00:00.000000000
How can I convert float days to timedelta64[ns] in a pandas DataFrame column?
The solution was this:
pd.to_timedelta('1.5 days')
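For a whole column of float days, a sketch along these lines may be more convenient than building strings. The column names "days" and "delta" are made up for illustration, and the comma-to-dot step is only needed if the values arrive as strings with a decimal comma:

```python
import pandas as pd

# Hypothetical column of durations expressed as float days
df = pd.DataFrame({"days": [1.5, 0.25, 2.0]})

# unit="D" tells pandas the numbers are days
df["delta"] = pd.to_timedelta(df["days"], unit="D")

# If the values are strings with a decimal comma, normalise them first
raw = pd.Series(["1,5", "0,25"])
as_float = raw.str.replace(",", ".", regex=False).astype(float)
from_strings = pd.to_timedelta(as_float, unit="D")

print(df)
print(from_strings)
```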

Adding variable hours to timestamp in Spark SQL

I have one column Start_Time with a timestamp, and one column Time_Zone_Offset, an integer. How can I add the Time_Zone_Offset to Start_Time as a number of hours?
Example MyTable:
id Start_Time Time_Zone_Offset
1 2020-01-12 00:00:00 1
2 2020-01-12 00:00:00 2
Desired output:
id Local_Start_Time
1 2020-01-12 01:00:00
2 2020-01-12 02:00:00
Attempt:
SELECT id, Start_time + INTERVAL time_zone_offset HOURS AS Local_Start_Time
FROM MyTable
This doesn't seem to work, and I can't use from_utc_timestamp as I don't have the actual timezone details, just the time-zone offset at the time being considered.
(I hope you are using pyspark.)
Indeed, I couldn't make it work with SQL. I managed to get the result by converting to a Unix timestamp. It's probably not the best way, but it works (I proceeded step by step to make sure the references were working; I thought I would need a user-defined function, but apparently not):
from pyspark.sql import functions as F
# Convert the timestamp to Unix seconds
df2 = df.withColumn("Start_Time", F.unix_timestamp("Start_Time"))
df2.show()
# Add the offset, converted from hours to seconds
df3 = df.withColumn("Start_Time", F.unix_timestamp("Start_Time") + df["Time_Zone_Offset"]*60*60)
df3.show()
# Convert back to a timestamp string; the pattern uses lowercase yyyy and dd
# (uppercase YYYY and DD mean week-based year and day-of-year)
df4 = df3.withColumn("Start_Time", F.from_unixtime("Start_Time", "yyyy-MM-dd HH:mm:ss"))
df4.show()
Adding an alternative to Benoit's answer using a python UDF:
from pyspark.sql import SQLContext
from datetime import datetime, timedelta
from pyspark.sql.types import TimestampType
# Defining pyspark function to add hours onto datetime
def addHours(my_datetime, hours):
    # Accounting for NULL (None in python) values
    if (hours is None) or (my_datetime is None):
        adjusted_datetime = None
    else:
        adjusted_datetime = my_datetime + timedelta(hours = hours)
    return adjusted_datetime
# Registering the function as a UDF to use in SQL, and defining the output type as 'TimestampType' (this is important, the default is StringType)
sqlContext.udf.register("add_hours", addHours, TimestampType())
followed by:
SELECT id, add_hours(Start_Time, Time_Zone_Offset) AS Local_Start_Time
FROM MyTable
Beginning from Spark 3.0, you may use the make_interval(years, months, weeks, days, hours, mins, secs) function if you want to add intervals using values from other columns.
SELECT
id
, Start_time + make_interval(0, 0, 0, 0, time_zone_offset, 0, 0) AS Local_Start_Time
FROM MyTable
For anyone else coming to this question and using Spark SQL via Databricks, the dateadd function works in the same way as most other SQL languages:
select dateadd(microsecond,30,'2022-11-04') as microsecond
,dateadd(millisecond,30,'2022-11-04') as millisecond
,dateadd(second ,30,'2022-11-04') as second
,dateadd(minute ,30,'2022-11-04') as minute
,dateadd(hour ,30,'2022-11-04') as hour
,dateadd(day ,30,'2022-11-04') as day
,dateadd(week ,30,'2022-11-04') as week
,dateadd(month ,30,'2022-11-04') as month
,dateadd(quarter ,30,'2022-11-04') as quarter
,dateadd(year ,30,'2022-11-04') as year
Output (a single row, listed here as column name and value):
microsecond 2022-11-04T00:00:00.000+0000
millisecond 2022-11-04T00:00:00.030+0000
second 2022-11-04T00:00:30.000+0000
minute 2022-11-04T00:30:00.000+0000
hour 2022-11-05T06:00:00.000+0000
day 2022-12-04T00:00:00.000+0000
week 2023-06-02T00:00:00.000+0000
month 2025-05-04T00:00:00.000+0000
quarter 2030-05-04T00:00:00.000+0000
year 2052-11-04T00:00:00.000+0000

How to convert a numeric column (without a calendar date) into datetime

I am working with time series data in a pandas df that doesn't have a real calendar date, only an index value that indicates an equal time interval between consecutive values. I'm trying to convert it into a datetime type with daily or weekly frequency. Is there a way to keep the values the same while changing the type (i.e. without setting an actual calendar date)?
Index,Col1,Col2
1,6.5,0.7
2,6.2,0.3
3,0.4,2.1
pd.to_datetime can create dates when given time units relative to some origin. The default is the POSIX origin 1970-01-01 00:00:00 and time in nanoseconds.
import pandas as pd
df['date1'] = pd.to_datetime(df.index, unit='D', origin='2010-01-01')
df['date2'] = pd.to_datetime(df.index, unit='W')
Output:
# Col1 Col2 date1 date2
#Index
#1 6.5 0.7 2010-01-02 1970-01-08
#2 6.2 0.3 2010-01-03 1970-01-15
#3 0.4 2.1 2010-01-04 1970-01-22
Alternatively, you can add timedeltas to the specified start:
pd.to_datetime('2010-01-01') + pd.to_timedelta(df.index, unit='D')
or just keep them as a timedelta:
pd.to_timedelta(df.index, unit='D')
#TimedeltaIndex(['1 days', '2 days', '3 days'], dtype='timedelta64[ns]', name='Index', freq=None)
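As a self-contained sketch of the timedelta approach with weekly spacing: the frame below is rebuilt from the question's sample rows, while the origin date and the "date_weekly" column name are arbitrary choices for illustration. Weeks are expressed as multiples of 7 days.

```python
import pandas as pd

# Rebuild the sample frame from the question
df = pd.DataFrame(
    {"Col1": [6.5, 6.2, 0.4], "Col2": [0.7, 0.3, 2.1]},
    index=pd.Index([1, 2, 3], name="Index"),
)

# Arbitrary origin (an assumption); each index step is treated as one week
origin = pd.Timestamp("2010-01-01")
df["date_weekly"] = origin + pd.to_timedelta(df.index * 7, unit="D")

# Optionally use the new dates as the index for time series operations
indexed = df.set_index("date_weekly")

print(indexed)
```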

Get the time spent since midnight in dataframe

I have a dataframe which has a column of type Timestamp. I want to find the time elapsed (in seconds) since midnight as a new column. How to do it in a simple way ?
Eg :
Input :
samples['time']
2018-10-01 00:00:01.000000000
2018-10-01 00:00:12.000000000
type(samples['time'].iloc[0])
<class 'pandas._libs.tslib.Timestamp'>
Output :
samples['time_elapsed']
1
12
The current answers are either too complicated or too specialized.
samples = pd.DataFrame(data=['2018-10-01 00:00:01', '2018-10-01 00:00:12'], columns=['time'], dtype='datetime64[ns]')
samples['time_elapsed'] = ((samples['time'] - samples['time'].dt.normalize()) / pd.Timedelta('1 second')).astype(int)
print(samples)
time time_elapsed
0 2018-10-01 00:00:01 1
1 2018-10-01 00:00:12 12
normalize() removes the time component from the datetime (moves clock back to midnight).
pd.Timedelta('1 second') sets the unit of measurement, i.e. dividing by it gives the number of seconds in the timedelta.
.astype(int) casts the decimal number of seconds to int. Use round functionality if that is preferred.
Note that the date part in each row may differ (the rows need not all be from the same day), so you cannot take a single "base date" (midnight) for the whole DataFrame, as one of the other solutions does.
My intention was also not to "contaminate" the source DataFrame with any intermediate columns, e.g. the time (actually date and time) string converted to a "true" DateTime.
So my proposal is:
convert the DateTime string to DateTime,
take the time part from it,
compute the number of seconds from the hour / minute / second parts.
All the above steps go in a dedicated function.
So to do the task, define a function:
def secSinceMidnight(datTimStr):
    tt = pd.to_datetime(datTimStr).time()
    return tt.hour * 3600 + tt.minute * 60 + tt.second
Then call:
samples['Secs'] = samples.time.apply(secSinceMidnight)
For source data:
samples = pd.DataFrame(data=[
    [ '2018-10-01 00:00:01' ], [ '2018-10-01 00:00:12' ],
    [ '2018-11-02 01:01:10' ], [ '2018-11-04 03:02:15' ] ],
    columns = ['time'])
when you print the result, you will see:
time Secs
0 2018-10-01 00:00:01 1
1 2018-10-01 00:00:12 12
2 2018-11-02 01:01:10 3670
3 2018-11-04 03:02:15 10935
Doing this in Pandas is very simple!
midnight = pd.Timestamp('2018-10-01 00:00:00')
print((pd.Timestamp('2018-10-01 00:00:01.000000000') - midnight).seconds)
>
1
And by extension we can use an apply on a Pandas Series:
samples = pd.DataFrame(['2018-10-01 00:00:01.000000000', '2018-10-01 00:00:12.000000000'], columns=['time'])
samples.time = pd.to_datetime(samples.time)
midnight = pd.Timestamp('2018-10-01 00:00:00')
samples['time_elapsed'] = samples['time'].apply(lambda x: (x - midnight).seconds)
samples
>
time time_elapsed
0 2018-10-01 00:00:01 1
1 2018-10-01 00:00:12 12
Note that the answers here use an alternative method: comparing the timestamp to itself converted to a date. This zeros all time data and so is the equivalent of midnight of that day. This method might be slightly more performant.
I ran into the same problem in one of my projects and here's how I solved it (assuming your time column has already been converted to Timestamp):
(samples['time'] - samples['time'].dt.normalize()) / pd.Timedelta(seconds=1)
The beauty of this approach is that you can change the last part to get seconds, minutes, hours or days elapsed:
... / pd.Timedelta(seconds=1) # seconds elapsed
... / pd.Timedelta(minutes=1) # minutes elapsed
... / pd.Timedelta(hours=1) # hours elapsed
... / pd.Timedelta(days=1) # days elapsed
datetime = samples['time']
(datetime - datetime.dt.normalize()).dt.total_seconds()
We can do:
(samples['time'].dt.hour * 3600 +
 samples['time'].dt.minute * 60 +
 samples['time'].dt.second)
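As a quick check that the vectorised approaches above agree, here is a small sketch that uses the multi-day sample data from one of the answers. The variable names by_normalize and by_fields are illustrative. Both methods work per row, so rows from different days are handled without a shared "midnight" value:

```python
import pandas as pd

# Sample data spanning several days
samples = pd.DataFrame({"time": pd.to_datetime([
    "2018-10-01 00:00:01", "2018-10-01 00:00:12",
    "2018-11-02 01:01:10", "2018-11-04 03:02:15",
])})
t = samples["time"]

# Approach 1: subtract each row's own midnight, then take total seconds
by_normalize = (t - t.dt.normalize()).dt.total_seconds().astype(int)

# Approach 2: combine the hour, minute and second fields directly
by_fields = t.dt.hour * 3600 + t.dt.minute * 60 + t.dt.second

samples["time_elapsed"] = by_normalize
print(samples)
print(by_normalize.tolist() == by_fields.tolist())
```

Note that the field-based version drops any fractional seconds, whereas total_seconds() keeps them until the cast to int.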