Get the time spent since midnight in dataframe - pandas
I have a dataframe with a column of type Timestamp. I want to find the time elapsed (in seconds) since midnight as a new column. How can I do this in a simple way?
E.g.:
Input :
samples['time']
2018-10-01 00:00:01.000000000
2018-10-01 00:00:12.000000000
type(samples['time'].iloc[0])
<class 'pandas._libs.tslib.Timestamp'>
Output :
samples['time_elapsed']
1
12
The current answers are either too complicated or too specialized.
samples = pd.DataFrame(data=['2018-10-01 00:00:01', '2018-10-01 00:00:12'], columns=['time'], dtype='datetime64[ns]')
samples['time_elapsed'] = ((samples['time'] - samples['time'].dt.normalize()) / pd.Timedelta('1 second')).astype(int)
print(samples)
time time_elapsed
0 2018-10-01 00:00:01 1
1 2018-10-01 00:00:12 12
normalize() removes the time component from the datetime (moves the clock back to midnight).
Dividing by pd.Timedelta('1 second') sets the unit of measurement, i.e. converts the timedelta to a number of seconds.
.astype(int) casts the fractional number of seconds to int. Use rounding instead if that is preferred.
Note that the date part may differ from row to row (the rows need not
all come from one and the same day), so you cannot take a single "base date"
(midnight) for the whole DataFrame, as one of the other solutions does.
My intention was also not to "contaminate" the source DataFrame
with any intermediate columns, e.g. the time (actually date and time)
string converted to a "true" DateTime.
So my proposition is:
convert the DateTime string to a DateTime,
take the time part from it,
compute the number of seconds from its hour / minute / second parts.
All the above steps go in a dedicated function.
So to do the task, define a function:
def secSinceMidnight(datTimStr):
    tt = pd.to_datetime(datTimStr).time()
    return tt.hour * 3600 + tt.minute * 60 + tt.second
Then call:
samples['Secs'] = samples.time.apply(secSinceMidnight)
For source data:
samples = pd.DataFrame(data=[
    ['2018-10-01 00:00:01'], ['2018-10-01 00:00:12'],
    ['2018-11-02 01:01:10'], ['2018-11-04 03:02:15']],
    columns=['time'])
when you print the result, you will see:
time Secs
0 2018-10-01 00:00:01 1
1 2018-10-01 00:00:12 12
2 2018-11-02 01:01:10 3670
3 2018-11-04 03:02:15 10935
Doing this in pandas is very simple!
midnight = pd.Timestamp('2018-10-01 00:00:00')
print((pd.Timestamp('2018-10-01 00:00:01.000000000') - midnight).seconds)
>
1
And by extension we can use an apply on a Pandas Series:
samples = pd.DataFrame(['2018-10-01 00:00:01.000000000', '2018-10-01 00:00:12.000000000'], columns=['time'])
samples.time = pd.to_datetime(samples.time)
midnight = pd.Timestamp('2018-10-01 00:00:00')
samples['time_elapsed'] = samples['time'].apply(lambda x: (x - midnight).seconds)
samples
>
time time_elapsed
0 2018-10-01 00:00:01 1
1 2018-10-01 00:00:12 12
Note that other answers here use an alternative method: comparing the timestamp to itself converted to a date. This zeroes all the time fields and so is equivalent to midnight of that day. That method might be slightly more performant.
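A minimal sketch of that "convert to a date" alternative (the column name `time` is taken from the question; the sample values are illustrative):

```python
import pandas as pd

samples = pd.DataFrame({'time': pd.to_datetime(['2018-10-01 00:00:01',
                                                '2018-11-02 01:01:10'])})

# Converting each timestamp to its date and back to a Timestamp zeroes
# the time fields, i.e. yields midnight of that row's own day.
midnights = pd.to_datetime(samples['time'].dt.date)
samples['time_elapsed'] = ((samples['time'] - midnights)
                           .dt.total_seconds().astype(int))
```

Because the midnight is computed per row, this also works when the rows span different days.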
I ran into the same problem in one of my projects, and here's how I solved it (assuming your time column has already been converted to Timestamp):
(samples['time'] - samples['time'].dt.normalize()) / pd.Timedelta(seconds=1)
The beauty of this approach is that you can change the last part to get seconds, minutes, hours or days elapsed:
... / pd.Timedelta(seconds=1) # seconds elapsed
... / pd.Timedelta(minutes=1) # minutes elapsed
... / pd.Timedelta(hours=1) # hours elapsed
... / pd.Timedelta(days=1) # days elapsed
The same idea, using total_seconds():
datetime = samples['time']
(datetime - datetime.dt.normalize()).dt.total_seconds()
We can do:
(samples['time'].dt.hour * 3600
 + samples['time'].dt.minute * 60
 + samples['time'].dt.second)
Related
how to get Date difference in postgres with date part
How can I get a datetime difference in Postgres? I am using the syntax below:
DATE_PART('hour', A_column::timestamp - B_column::timestamp)
I want output like this: if A_column = 2020-05-20 00:00:00 and B_column = 2020-05-15 00:00:00, I want to get 72 (in hours). Is there any possibility to skip weekends (Saturday and Sunday) in the first case, i.e. to get the result as 72 hours (excluding weekend hours)? If A_column = 2020-08-15 12:00:00 and B_column = 2020-08-15 00:00:00, I want to get 12 (in hours).
You could write this as:
select extract(epoch from a_column::timestamp - b_column::timestamp) / 60 / 60
from mytable
Rationale: subtracting the two timestamps gives you an interval; you can then turn it into a number of seconds and do arithmetic to convert that to hours.
Adding variable hours to timestamp in Spark SQL
I have one column Start_Time with a timestamp, and one column Time_Zone_Offset, an integer. How can I add Time_Zone_Offset to Start_Time as a number of hours?
Example MyTable:
id Start_Time          Time_Zone_Offset
1  2020-01-12 00:00:00 1
2  2020-01-12 00:00:00 2
Desired output:
id Local_Start_Time
1  2020-01-12 01:00:00
2  2020-01-12 02:00:00
Attempt:
SELECT id, Start_time + INTERVAL time_zone_offset HOURS AS Local_Start_Time
FROM MyTable
This doesn't seem to work, and I can't use from_utc_timestamp as I don't have the actual timezone details, just the time-zone offset at the time being considered.
(I hope you are using PySpark.) Indeed, I couldn't make it work with SQL, but I managed to get the result by converting to a Unix timestamp. It's probably not the best way, but it works (I proceeded step by step to make sure the references were working; I thought I would need a user-defined function, but apparently not):
from pyspark.sql import functions as F

df2 = df.withColumn("Start_Time", F.unix_timestamp("Start_Time"))
df2.show()
df3 = df.withColumn("Start_Time", F.unix_timestamp("Start_Time") + df["Time_Zone_Offset"] * 60 * 60)
df3.show()
df4 = df3.withColumn("Start_Time", F.from_unixtime("Start_Time", "yyyy-MM-dd HH:00:00")).show()
Adding an alternative to Benoit's answer using a Python UDF:
from datetime import timedelta
from pyspark.sql.types import TimestampType

# Defining a Python function to add hours onto a datetime
def addHours(my_datetime, hours):
    # Accounting for NULL (None in Python) values
    if (hours is None) or (my_datetime is None):
        adjusted_datetime = None
    else:
        adjusted_datetime = my_datetime + timedelta(hours=hours)
    return adjusted_datetime

# Registering the function as a UDF for use in SQL, and declaring the output
# type as TimestampType (this is important; the default is StringType)
sqlContext.udf.register("add_hours", addHours, TimestampType())
followed by:
SELECT id, add_hours(Start_Time, Time_Zone_Offset) AS Local_Start_Time
FROM MyTable
Beginning with Spark 3.0, you may use the make_interval(years, months, weeks, days, hours, mins, secs) function if you want to add intervals using values from other columns:
SELECT id,
       Start_time + make_interval(0, 0, 0, 0, time_zone_offset, 0, 0) AS Local_Start_Time
FROM MyTable
For anyone else coming to this question and using Spark SQL via Databricks, the dateadd function works the same way as in most other SQL dialects:
select dateadd(microsecond, 30, '2022-11-04') as microsecond
      ,dateadd(millisecond, 30, '2022-11-04') as millisecond
      ,dateadd(second,      30, '2022-11-04') as second
      ,dateadd(minute,      30, '2022-11-04') as minute
      ,dateadd(hour,        30, '2022-11-04') as hour
      ,dateadd(day,         30, '2022-11-04') as day
      ,dateadd(week,        30, '2022-11-04') as week
      ,dateadd(month,       30, '2022-11-04') as month
      ,dateadd(quarter,     30, '2022-11-04') as quarter
      ,dateadd(year,        30, '2022-11-04') as year
Output:
microsecond  2022-11-04T00:00:00.000+0000
millisecond  2022-11-04T00:00:00.030+0000
second       2022-11-04T00:00:30.000+0000
minute       2022-11-04T00:30:00.000+0000
hour         2022-11-05T06:00:00.000+0000
day          2022-12-04T00:00:00.000+0000
week         2023-06-02T00:00:00.000+0000
month        2025-05-04T00:00:00.000+0000
quarter      2030-05-04T00:00:00.000+0000
year         2052-11-04T00:00:00.000+0000
Difference between two timestamps in Pandas
I have the following two time columns, "Time1" and "Time2". I have to calculate a "Difference" column, which is (Time2 - Time1), in pandas:
Time1    Time2     Difference
8:59:45  9:27:30   -1 days +23:27:45
9:52:29  10:08:54  -1 days +23:16:26
8:07:15  8:07:53   00:00:38
When Time1 and Time2 are in different hours, I am getting results like "-1 days +...". My desired output for the first two rows is:
Time1    Time2     Difference
8:59:45  9:27:30   00:27:45
9:52:29  10:08:54  00:16:26
How can I get this output in pandas? Both time columns are of dtype 'datetime64[ns]'.
The issue is not that Time1 and Time2 are in different hours; it's that Time2 is before Time1, so Time2 - Time1 is negative, and this is how negative timedeltas are stored. If you just want the difference in minutes as a negative number, you could extract the minutes before calculating the difference:
(df.Time1.dt.minute - df.Time2.dt.minute)
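To see what "this is how negative timedeltas are stored" means, here is a small sketch (timestamps adapted from the question's first row; the date is made up, since the question only has times):

```python
import pandas as pd

t1 = pd.Timestamp('2018-01-01 08:59:45')
t2 = pd.Timestamp('2018-01-01 09:27:30')

# A negative timedelta is displayed as a negative number of days plus a
# positive time-of-day remainder, which produces the "-1 days +..." output.
diff = t1 - t2
print(diff)       # -1 days +23:32:15

# Taking abs() recovers the magnitude, if that is all you need.
print(abs(diff))  # 0 days 00:27:45
```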
I was not able to reproduce the issue using pandas 0.17.1:
import pandas as pd

d = {"start_time": ["8:59:45", "9:52:29", "8:07:15"],
     "end_time": ["9:27:30", "10:08:54", "8:07:53"]}
df = pd.DataFrame(data=d)
df['start_time'] = pd.to_datetime(df['start_time'])
df['end_time'] = pd.to_datetime(df['end_time'])
df.end_time - df.start_time

0   00:27:45
1   00:16:25
2   00:00:38
dtype: timedelta64[ns]
Time Difference in Redshift
How can I get the exact time difference between two columns? E.g.:
col1 date is 2014-09-21 02:00:00
col2 date is 2014-09-22 01:00:00
output like:
result: 23:00:00
I am getting results like:
Hours Minutes Seconds
3     3       20
1     2       30
using the following query:
SELECT start_time, end_time,
       DATE_PART(H, end_time) - DATE_PART(H, start_time) AS Hours,
       DATE_PART(M, end_time) - DATE_PART(M, start_time) AS Minutes,
       DATE_PART(S, end_time) - DATE_PART(S, start_time) AS Seconds
FROM user_session
but I need:
Difference
-----------
03:03:20
01:02:30
Use DATEDIFF to get the seconds between the two datetimes:
DATEDIFF(second, '2014-09-23 00:00:00.000', '2014-09-23 01:23:45.000')
Then use DATEADD to add the seconds to '1900-01-01 00:00:00':
DATEADD(seconds, 5025, '1900-01-01 00:00:00')
Then CAST the result to a TIME data type (note that this limits you to 24 hours max):
CAST('1900-01-01 01:23:45' as TIME)
Then LTRIM the date part off the TIME value (as discovered by Benny; Redshift does not allow use of TIME on actual stored data):
LTRIM('1900-01-01 01:23:45', '1900-01-01')
Now, do it in a single step:
SELECT LTRIM(DATEADD(seconds, DATEDIFF(second, '2014-09-23 00:00:00', '2014-09-23 01:23:45.000'), '1900-01-01 00:00:00'), '1900-01-01');
:)
SQL generating a set of dates
I am trying to find a way to have a SELECT statement return the set of dates between two input dates with a given interval. I would like to be able to easily change the time frame and interval returned, so hard-coding something with a series of SELECT ... UNIONs would not be ideal. For example, I want all the dates at 5-second intervals for the last 60 seconds.
Expected:
times
---------------------
2009-02-05 08:00:00
2009-02-05 08:00:05
2009-02-05 08:00:10
2009-02-05 08:00:15
2009-02-05 08:00:20
...
2009-02-05 08:00:55
Edit: generate_series(...) can be used in place of a table in the SELECT; it simulates a table containing a series of numbers with a given start value, end value, and optionally a step. From there the values can be CAST to the type I need for the time functions and manipulated during the SELECT. Thanks Quassnoi.
SELECT CAST (s || ' seconds' AS INTERVAL) + TIMESTAMP 'now' FROM generate_series(0, -60, -5) s
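For comparison, the same kind of series can be produced in pandas with date_range (a sketch, not part of the Postgres answer; the start time is taken from the expected output above):

```python
import pandas as pd

# A timestamp every 5 seconds covering a 60-second window,
# analogous to generate_series(0, -60, -5) above.
times = pd.date_range(start='2009-02-05 08:00:00', periods=12, freq='5s')
print(times[0])   # 2009-02-05 08:00:00
print(times[-1])  # 2009-02-05 08:00:55
```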