Get the time spent since midnight in dataframe - pandas
I have a dataframe with a column of type Timestamp. I want to find the time elapsed (in seconds) since midnight as a new column. How can I do this in a simple way?
E.g.:
Input :
samples['time']
2018-10-01 00:00:01.000000000
2018-10-01 00:00:12.000000000
type(samples['time'].iloc[0])
<class 'pandas._libs.tslib.Timestamp'>
Output :
samples['time_elapsed']
1
12
The current answers are either too complicated or too specialized.
samples = pd.DataFrame(data=['2018-10-01 00:00:01', '2018-10-01 00:00:12'], columns=['time'], dtype='datetime64[ns]')
samples['time_elapsed'] = ((samples['time'] - samples['time'].dt.normalize()) / pd.Timedelta('1 second')).astype(int)
print(samples)
time time_elapsed
0 2018-10-01 00:00:01 1
1 2018-10-01 00:00:12 12
normalize() removes the time component from the datetime (moves the clock back to midnight).
Dividing by pd.Timedelta('1 second') sets the unit of measurement, i.e. converts the timedelta to a number of seconds.
.astype(int) casts the fractional number of seconds to int. Use rounding instead if that is preferred.
Note that the date part may differ from row to row (the rows need not
all come from one and the same day), so you cannot take a single "base date"
(midnight) for the whole DataFrame, as one of the other solutions does.
My intention was also not to "contaminate" the source DataFrame
with any intermediate columns, e.g. the time (actually date and time)
string converted to a "true" DateTime.
So my proposition is:
convert the DateTime string to a DateTime,
take the time part from it,
compute the number of seconds from its hour / minute / second parts.
All the above steps go in a dedicated function.
So to do the task, define a function:
def secSinceMidnight(datTimStr):
    tt = pd.to_datetime(datTimStr).time()
    return tt.hour * 3600 + tt.minute * 60 + tt.second
Then call:
samples['Secs'] = samples.time.apply(secSinceMidnight)
For source data:
samples = pd.DataFrame(data=[
    ['2018-10-01 00:00:01'], ['2018-10-01 00:00:12'],
    ['2018-11-02 01:01:10'], ['2018-11-04 03:02:15']],
    columns=['time'])
when you print the result, you will see:
time Secs
0 2018-10-01 00:00:01 1
1 2018-10-01 00:00:12 12
2 2018-11-02 01:01:10 3670
3 2018-11-04 03:02:15 10935
Doing this in pandas is very simple!
midnight = pd.Timestamp('2018-10-01 00:00:00')
print((pd.Timestamp('2018-10-01 00:00:01.000000000') - midnight).seconds)
>
1
And by extension we can use an apply on a Pandas Series:
samples = pd.DataFrame(['2018-10-01 00:00:01.000000000', '2018-10-01 00:00:12.000000000'], columns=['time'])
samples.time = pd.to_datetime(samples.time)
midnight = pd.Timestamp('2018-10-01 00:00:00')
samples['time_elapsed'] = samples['time'].apply(lambda x: (x - midnight).seconds)
samples
>
time time_elapsed
0 2018-10-01 00:00:01 1
1 2018-10-01 00:00:12 12
Note that other answers here use an alternative method: comparing the timestamp to itself converted to a date. This zeroes all the time fields and so is equivalent to midnight of that day. That method might be slightly more performant.
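A minimal sketch of that "convert to a date" alternative (the column name `time` is taken from the question; the sample values are illustrative):

```python
import pandas as pd

samples = pd.DataFrame({'time': pd.to_datetime(['2018-10-01 00:00:01',
                                                '2018-11-02 01:01:10'])})

# Converting each timestamp to its date and back to a Timestamp zeroes
# the time fields, i.e. yields midnight of that row's own day.
midnights = pd.to_datetime(samples['time'].dt.date)
samples['time_elapsed'] = ((samples['time'] - midnights)
                           .dt.total_seconds().astype(int))
```

Because the midnight is computed per row, this also works when the rows span different days.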
I ran into the same problem in one of my projects, and here's how I solved it (assuming your time column has already been converted to Timestamp):
(samples['time'] - samples['time'].dt.normalize()) / pd.Timedelta(seconds=1)
The beauty of this approach is that you can change the last part to get seconds, minutes, hours or days elapsed:
... / pd.Timedelta(seconds=1) # seconds elapsed
... / pd.Timedelta(minutes=1) # minutes elapsed
... / pd.Timedelta(hours=1) # hours elapsed
... / pd.Timedelta(days=1) # days elapsed
The same idea, using total_seconds():
datetime = samples['time']
(datetime - datetime.dt.normalize()).dt.total_seconds()
We can do:
(samples['time'].dt.hour * 3600
 + samples['time'].dt.minute * 60
 + samples['time'].dt.second)
Related
how to get Date difference in postgres with date part
How can I get a datetime difference in Postgres? I am using the syntax below:
DATE_PART('hour', A_column::timestamp - B_column::timestamp)
I want output like this: if A_column = 2020-05-20 00:00:00 and B_column = 2020-05-15 00:00:00, I want to get 72 (in hours). Is there any possibility to skip weekends (Saturday and Sunday) in the first case, i.e. to get the result as 72 hours (excluding weekend hours)? If A_column = 2020-08-15 12:00:00 and B_column = 2020-08-15 00:00:00, I want to get 12 (in hours).
You could write this as:
select extract(epoch from a_column::timestamp - b_column::timestamp) / 60 / 60
from mytable
Rationale: subtracting the two timestamps gives you an interval; you can then turn it into a number of seconds and do arithmetic to convert that to hours.
Adding variable hours to timestamp in Spark SQL
I have one column Start_Time with a timestamp, and one column Time_Zone_Offset, an integer. How can I add Time_Zone_Offset to Start_Time as a number of hours?
Example MyTable:
id Start_Time          Time_Zone_Offset
1  2020-01-12 00:00:00 1
2  2020-01-12 00:00:00 2
Desired output:
id Local_Start_Time
1  2020-01-12 01:00:00
2  2020-01-12 02:00:00
Attempt:
SELECT id, Start_time + INTERVAL time_zone_offset HOURS AS Local_Start_Time
FROM MyTable
This doesn't seem to work, and I can't use from_utc_timestamp as I don't have the actual timezone details, just the time-zone offset at the time being considered.
(I hope you are using PySpark.) Indeed, I couldn't make it work with SQL, but I managed to get the result by converting to a Unix timestamp. It's probably not the best way, but it works (I proceeded step by step to make sure the references were working; I thought I would need a user-defined function, but apparently not):
from pyspark.sql import functions as F

df2 = df.withColumn("Start_Time", F.unix_timestamp("Start_Time"))
df2.show()
df3 = df.withColumn("Start_Time", F.unix_timestamp("Start_Time") + df["Time_Zone_Offset"] * 60 * 60)
df3.show()
df4 = df3.withColumn("Start_Time", F.from_unixtime("Start_Time", "yyyy-MM-dd HH:00:00")).show()
Adding an alternative to Benoit's answer using a Python UDF:
from datetime import timedelta
from pyspark.sql.types import TimestampType

# Defining a Python function to add hours onto a datetime
def addHours(my_datetime, hours):
    # Accounting for NULL (None in Python) values
    if (hours is None) or (my_datetime is None):
        adjusted_datetime = None
    else:
        adjusted_datetime = my_datetime + timedelta(hours=hours)
    return adjusted_datetime

# Registering the function as a UDF for use in SQL, and declaring the output
# type as TimestampType (this is important; the default is StringType)
sqlContext.udf.register("add_hours", addHours, TimestampType())
followed by:
SELECT id, add_hours(Start_Time, Time_Zone_Offset) AS Local_Start_Time
FROM MyTable
Beginning with Spark 3.0, you may use the make_interval(years, months, weeks, days, hours, mins, secs) function if you want to add intervals using values from other columns:
SELECT id,
       Start_time + make_interval(0, 0, 0, 0, time_zone_offset, 0, 0) AS Local_Start_Time
FROM MyTable
For anyone else coming to this question and using Spark SQL via Databricks, the dateadd function works the same way as in most other SQL dialects:
select dateadd(microsecond, 30, '2022-11-04') as microsecond
      ,dateadd(millisecond, 30, '2022-11-04') as millisecond
      ,dateadd(second,      30, '2022-11-04') as second
      ,dateadd(minute,      30, '2022-11-04') as minute
      ,dateadd(hour,        30, '2022-11-04') as hour
      ,dateadd(day,         30, '2022-11-04') as day
      ,dateadd(week,        30, '2022-11-04') as week
      ,dateadd(month,       30, '2022-11-04') as month
      ,dateadd(quarter,     30, '2022-11-04') as quarter
      ,dateadd(year,        30, '2022-11-04') as year
Output:
microsecond  2022-11-04T00:00:00.000+0000
millisecond  2022-11-04T00:00:00.030+0000
second       2022-11-04T00:00:30.000+0000
minute       2022-11-04T00:30:00.000+0000
hour         2022-11-05T06:00:00.000+0000
day          2022-12-04T00:00:00.000+0000
week         2023-06-02T00:00:00.000+0000
month        2025-05-04T00:00:00.000+0000
quarter      2030-05-04T00:00:00.000+0000
year         2052-11-04T00:00:00.000+0000
Difference between two timestamps in Pandas
I have the following two time columns, "Time1" and "Time2". I have to calculate a "Difference" column, which is (Time2 - Time1), in pandas:
Time1    Time2     Difference
8:59:45  9:27:30   -1 days +23:27:45
9:52:29  10:08:54  -1 days +23:16:26
8:07:15  8:07:53   00:00:38
When Time1 and Time2 are in different hours, I am getting results like "-1 days +...". My desired output for the first two rows is:
Time1    Time2     Difference
8:59:45  9:27:30   00:27:45
9:52:29  10:08:54  00:16:26
How can I get this output in pandas? Both time columns are of dtype 'datetime64[ns]'.
The issue is not that Time1 and Time2 are in different hours; it's that Time2 is before Time1, so Time2 - Time1 is negative, and this is how negative timedeltas are stored. If you just want the difference in minutes as a negative number, you could extract the minutes before calculating the difference:
(df.Time1.dt.minute - df.Time2.dt.minute)
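To see what "this is how negative timedeltas are stored" means, here is a small sketch (timestamps adapted from the question's first row; the date is made up, since the question only has times):

```python
import pandas as pd

t1 = pd.Timestamp('2018-01-01 08:59:45')
t2 = pd.Timestamp('2018-01-01 09:27:30')

# A negative timedelta is displayed as a negative number of days plus a
# positive time-of-day remainder, which produces the "-1 days +..." output.
diff = t1 - t2
print(diff)       # -1 days +23:32:15

# Taking abs() recovers the magnitude, if that is all you need.
print(abs(diff))  # 0 days 00:27:45
```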
I was not able to reproduce the issue using pandas 0.17.1:
import pandas as pd

d = {"start_time": ["8:59:45", "9:52:29", "8:07:15"],
     "end_time": ["9:27:30", "10:08:54", "8:07:53"]}
df = pd.DataFrame(data=d)
df['start_time'] = pd.to_datetime(df['start_time'])
df['end_time'] = pd.to_datetime(df['end_time'])
df.end_time - df.start_time

0   00:27:45
1   00:16:25
2   00:00:38
dtype: timedelta64[ns]
Time Difference in Redshift
How can I get the exact time difference between two columns? E.g.:
col1 date is 2014-09-21 02:00:00
col2 date is 2014-09-22 01:00:00
output like:
result: 23:00:00
I am getting results like:
Hours Minutes Seconds
3     3       20
1     2       30
using the following query:
SELECT start_time, end_time,
       DATE_PART(H, end_time) - DATE_PART(H, start_time) AS Hours,
       DATE_PART(M, end_time) - DATE_PART(M, start_time) AS Minutes,
       DATE_PART(S, end_time) - DATE_PART(S, start_time) AS Seconds
FROM user_session
but I need:
Difference
-----------
03:03:20
01:02:30
Use DATEDIFF to get the seconds between the two datetimes:
DATEDIFF(second, '2014-09-23 00:00:00.000', '2014-09-23 01:23:45.000')
Then use DATEADD to add the seconds to '1900-01-01 00:00:00':
DATEADD(seconds, 5025, '1900-01-01 00:00:00')
Then CAST the result to a TIME data type (note that this limits you to 24 hours max):
CAST('1900-01-01 01:23:45' as TIME)
Then LTRIM the date part off the TIME value (as discovered by Benny; Redshift does not allow use of TIME on actual stored data):
LTRIM('1900-01-01 01:23:45', '1900-01-01')
Now, do it in a single step:
SELECT LTRIM(DATEADD(seconds, DATEDIFF(second, '2014-09-23 00:00:00', '2014-09-23 01:23:45.000'), '1900-01-01 00:00:00'), '1900-01-01');
:)
SQL generating a set of dates
I am trying to find a way to have a SELECT statement return the set of dates between two input dates with a given interval. I would like to be able to easily change the time frame and interval returned, so hard-coding something with a series of SELECT ... UNIONs would not be ideal. For example, I want all the dates at 5-second intervals for the last 60 seconds.
Expected:
times
---------------------
2009-02-05 08:00:00
2009-02-05 08:00:05
2009-02-05 08:00:10
2009-02-05 08:00:15
2009-02-05 08:00:20
...
2009-02-05 08:00:55
Edit: generate_series(...) can be used in place of a table in the SELECT; it simulates a table containing a series of numbers with a given start value, end value, and optionally a step. From there the values can be CAST to the type I need for the time functions and manipulated during the SELECT. Thanks Quassnoi.
SELECT CAST (s || ' seconds' AS INTERVAL) + TIMESTAMP 'now' FROM generate_series(0, -60, -5) s
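For comparison, the same kind of series can be produced in pandas with date_range (a sketch, not part of the Postgres answer; the start time is taken from the expected output above):

```python
import pandas as pd

# A timestamp every 5 seconds covering a 60-second window,
# analogous to generate_series(0, -60, -5) above.
times = pd.date_range(start='2009-02-05 08:00:00', periods=12, freq='5s')
print(times[0])   # 2009-02-05 08:00:00
print(times[-1])  # 2009-02-05 08:00:55
```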