Converting pandas._libs.tslibs.timestamps.Timestamp to seconds since midnight? - pandas

I have a pandas._libs.tslibs.timestamps.Timestamp object, e.g., 2016-01-01 07:00:04.85+00:00, and I want to compute the number of seconds (as a float, including the fractional part) since the previous midnight.
In the above example, it would return 7 * 3600 + 0 * 60 + 4.85 = 25204.85
Is there a quick way to do this in pandas?

You can use normalize() to subtract the date part:
# ts = pd.to_datetime('2016-01-01 07:00:04.85+00:00')
>>> (ts - ts.normalize()).total_seconds()
25204.85
It also works with DataFrame through dt accessor:
# df = pd.DataFrame({'date': [ts]})
>>> (df['date'] - df['date'].dt.normalize()).dt.total_seconds()
0 25204.85
Name: date, dtype: float64

Not sure if this is what you are looking for but here is an implementation:
import pandas as pd
def seconds_from_midnight(date):
    return date.hour * 3600 + date.minute * 60 + date.second + date.microsecond / 1000000
date = pd.Timestamp.now()
print(date)
print(seconds_from_midnight(date))
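For a whole column, the same component arithmetic can be written with the dt accessor. This is only a sketch: the helper name seconds_from_midnight_series is made up for illustration, and like the scalar function above it ignores nanoseconds.

```python
import pandas as pd

def seconds_from_midnight_series(s):
    # Hypothetical helper: s is assumed to be a Series of datetimes
    # (tz-aware or naive). Nanoseconds are not included.
    return (s.dt.hour * 3600 + s.dt.minute * 60 + s.dt.second
            + s.dt.microsecond / 1_000_000)

df = pd.DataFrame({'date': [pd.to_datetime('2016-01-01 07:00:04.85+00:00')]})
secs = seconds_from_midnight_series(df['date'])
print(secs)
```

For a tz-aware column the components are taken in the column's own timezone, so the result is seconds since local midnight.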

Related

Add a column of minutes to a datetime in pandas

I have a dataframe with a start time and the length of an operation. I'm trying to figure out how to add the length (in minutes) to the start time to get the end time of the session. I've run a few variations of the same general idea and keep getting the same error: "unsupported type for timedelta minutes component: Series". The code extract is below:
data= {'Name': ['John', 'Peter'],
'Start' : [2, 2],
'Length': [120, 90],
}
df = pd.DataFrame.from_records(data)
df['Start'] = pd.to_datetime(df['Start'])
df['Length'] = pd.to_datetime(df['Length'])
df["tdiffinmin"] = df['Start'].apply(lambda x: x + pd.DateOffset(minutes = df["Length"]))
I've also tried the following alternatives for doing this math and keep getting similar errors.
df["tdiffinmin"] = df['Start'].apply(lambda x: x -pd.DateOffset(minutes = df["Length"]))
df["tdiffinmin"] = (df['Start']. + timedelta(minutes = df["Length"])).dt.total_seconds() / 60
df['tdiffinmin'] = df['Start'] - pd.DateOffset(minutes = df["Length"])
The full code reads from a data set (excel sheet or CSV), populates a Dataframe, and this is some of the math I am doing. Originally it was done with Start and Stop times, so I know something similar is possible. In the dataset, Length is in minutes and Start is a date and time, so datetime is necessary.
You should convert Length into timedelta, not datetime:
df['Start'] = pd.to_datetime(df['Start'])
df['Length'] = pd.to_timedelta(df['Length'], unit='min')
df['tdiffinmin'] = df['Start'] + df['Length']
Output:
     Length   Name                          Start                     tdiffinmin
0  02:00:00   John  1970-01-01 00:00:00.000000002  1970-01-01 02:00:00.000000002
1  01:30:00  Peter  1970-01-01 00:00:00.000000002  1970-01-01 01:30:00.000000002
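The Start values in the output above land in 1970 because the sample data uses the integer 2 as a start time. A minimal sketch of the same approach with made-up start times (the dates and the End column name are assumptions, not from the question) might look like this:

```python
import pandas as pd

# Made-up sample data: Start as date strings, Length in minutes.
df = pd.DataFrame({
    'Name': ['John', 'Peter'],
    'Start': ['2021-03-01 09:00', '2021-03-01 13:30'],
    'Length': [120, 90],
})

df['Start'] = pd.to_datetime(df['Start'])
# Length is a duration, so it becomes a timedelta, not a datetime.
df['End'] = df['Start'] + pd.to_timedelta(df['Length'], unit='min')
print(df)
```

Adding a timedelta column to a datetime column is vectorized, so no apply or DateOffset is needed.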

Getting Binance Historical Data For Specific TimeZone

I found this Python script on the web. It gets OHLCV historical data from the Binance API for the requested dates, assets and time intervals. The script currently returns the data in UTC.
I want to modify it so that it returns the data (daily/hourly) according to a specified timezone. I guess it only takes changing one function or adding an argument, but I can't manage to do it correctly.
How can I change it so that it returns data for UTC+2 (or any other timezone)?
import time
import dateparser
import pytz
import os
from datetime import datetime
import binance
print(binance.__file__)
from binance.client import Client
import time
import pandas as pd
def date_to_milliseconds(date_str):
    """Convert UTC date to milliseconds.
    If using offset strings add "UTC" to date string e.g. "now UTC", "11 hours ago UTC"
    See dateparser docs for formats http://dateparser.readthedocs.io/en/latest/
    :param date_str: date in readable format, i.e. "January 01, 2018", "11 hours ago UTC", "now UTC"
    :type date_str: str
    """
    # get epoch value in UTC
    epoch = datetime.utcfromtimestamp(0).replace(tzinfo=pytz.utc)
    # parse our date string
    d = dateparser.parse(date_str)
    # if the date is not timezone aware apply UTC timezone
    if d.tzinfo is None or d.tzinfo.utcoffset(d) is None:
        d = d.replace(tzinfo=pytz.utc)
    # return the difference in time
    return int((d - epoch).total_seconds() * 1000.0)
def interval_to_milliseconds(interval):
    """Convert a Binance interval string to milliseconds
    :param interval: Binance interval string 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, 12h, 1d, 3d, 1w
    :type interval: str
    :return:
        None if unit not one of m, h, d or w
        None if string not in correct format
        int value of interval in milliseconds
    """
    ms = None
    seconds_per_unit = {
        "m": 60,
        "h": 60 * 60,
        "d": 24 * 60 * 60,
        "w": 7 * 24 * 60 * 60
    }
    unit = interval[-1]
    if unit in seconds_per_unit:
        try:
            ms = int(interval[:-1]) * seconds_per_unit[unit] * 1000
        except ValueError:
            pass
    return ms
def GetUpdateData(kline):
    Time = time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime(kline[0]/1000))
    Open = kline[1]
    High = kline[2]
    Low = kline[3]
    Close = kline[4]
    Volume = kline[5]
    Close_time = time.strftime("%a, %d %b %Y %H:%M:%S", time.localtime(kline[6]/1000))
    Quote_asset_volume = kline[7]
    Number_of_trades = kline[8]
    Taker_buy_base_asset_volume = kline[9]
    Taker_buy_quote_asset_volume = kline[10]
    return Time,Open,High,Low,Close,Volume,Close_time,Quote_asset_volume,Number_of_trades,Taker_buy_base_asset_volume,Taker_buy_quote_asset_volume
def get_historical_klines(symbol, interval, start_str, end_str=None):
    """Get Historical Klines from Binance
    See dateparser docs for valid start and end string formats http://dateparser.readthedocs.io/en/latest/
    If using offset strings for dates add "UTC" to date string e.g. "now UTC", "11 hours ago UTC"
    :param symbol: Name of symbol pair e.g BNBBTC
    :type symbol: str
    :param interval: Binance Kline interval
    :type interval: str
    :param start_str: Start date string in UTC format
    :type start_str: str
    :param end_str: optional - end date string in UTC format
    :type end_str: str
    :return: list of OHLCV values
    """
    # create the Binance client, no need for api key
    client = Client("", "")
    # init our list
    output_data = []
    # setup the max limit
    limit = 500
    # convert interval to useful value in milliseconds
    timeframe = interval_to_milliseconds(interval)
    # convert our date strings to milliseconds
    start_ts = date_to_milliseconds(start_str)
    # if an end time was passed convert it
    end_ts = None
    if end_str:
        end_ts = date_to_milliseconds(end_str)
    idx = 0
    # it can be difficult to know when a symbol was listed on Binance so allow start time to be before list date
    symbol_existed = False
    while True:
        # fetch the klines from start_ts up to max 500 entries or the end_ts if set
        temp_data = client.get_klines(
            symbol=symbol,
            interval=interval,
            limit=limit,
            startTime=start_ts,
            endTime=end_ts
        )
        # handle the case where our start date is before the symbol pair listed on Binance
        if not symbol_existed and len(temp_data):
            symbol_existed = True
        if symbol_existed:
            # append this loop's data to our output data
            output_data += temp_data
            # update our start timestamp using the last value in the array and add the interval timeframe
            start_ts = temp_data[len(temp_data) - 1][0] + timeframe
        else:
            # it wasn't listed yet, increment our start date
            start_ts += timeframe
        idx += 1
        # check if we received less than the required limit and exit the loop
        if len(temp_data) < limit:
            # exit the while loop
            break
        # sleep after every 3rd call to be kind to the API
        if idx % 3 == 0:
            time.sleep(1)
    return output_data
start = "01 January, 2017"
end = "01 February, 2017"
symbols = ['ETHBTC']
interval = '1d'#Client.KLINE_INTERVAL_15MIN
for symbol in symbols:
    klines = get_historical_klines(symbol, interval, start, end)
    times = []
    Opens = []
    Highs = []
    Lows = []
    Closes = []
    Volumes = []
    Close_times = []
    Quote_asset_volumes = []
    Number_of_tradess = []
    Taker_buy_base_asset_volumes = []
    Taker_buy_quote_asset_volumes = []
    for k in klines:
        Time,Open,High,Low,Close,Volume,Close_time,Quote_asset_volume,Number_of_trades,Taker_buy_base_asset_volume,Taker_buy_quote_asset_volume = GetUpdateData(k)
        times.append(Time)
        Opens.append(Open)
        Highs.append(High)
        Lows.append(Low)
        Closes.append(Close)
        Volumes.append(Volume)
        Close_times.append(Close_time)
        Quote_asset_volumes.append(Quote_asset_volume)
        Number_of_tradess.append(Number_of_trades)
        Taker_buy_base_asset_volumes.append(Taker_buy_base_asset_volume)
        Taker_buy_quote_asset_volumes.append(Taker_buy_quote_asset_volume)
    DataStruct = pd.DataFrame()
    DataStruct['time'] = times
    DataStruct['Open'] = Opens
    DataStruct['High'] = Highs
    DataStruct['Low'] = Lows
    DataStruct['Close'] = Closes
    DataStruct['Volume'] = Volumes
    DataStruct['Close_time'] = Close_times
    DataStruct['Quote_asset_volume'] = Quote_asset_volumes
    DataStruct['Number_of_trades'] = Number_of_tradess
    DataStruct['Taker_buy_base_asset_volume'] = Taker_buy_base_asset_volumes
    DataStruct['Taker_buy_quote_asset_volume'] = Taker_buy_quote_asset_volumes
    FileName = symbol+ '_' + start+ '_' + end + ' .csv'
    FileName = FileName.replace(' ','_')
    FileName = FileName.replace(',','')
    Path2Save = os.path.normpath(r'')
    SaveStrFile = os.path.normpath(Path2Save+ '\\' +FileName)
    #save FeatureWeights to CSV file
    D_S_header = ['time','Open','High','Low','Close','Volume','Close_time','Quote_asset_volume','Number_of_trades','Taker_buy_base_asset_volume','Taker_buy_quote_asset_volume']
    DataStruct.to_csv(path_or_buf = SaveStrFile, header = D_S_header )
In these lines you see the timezone being defined:
# get epoch value in UTC
epoch = datetime.utcfromtimestamp(0).replace(tzinfo=pytz.utc)
Just redefine the timezone there. pytz.all_timezones gives the list of timezones supported by pytz.
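Separately from the epoch line, the script formats each kline's open time with time.gmtime in GetUpdateData, so the printed times are UTC regardless. A rough sketch of relabelling those millisecond timestamps in another timezone with pandas follows. The two timestamps are made-up sample values and the fixed UTC+2 offset is an assumption; note that this only changes how the timestamps are displayed, not how the candles are bucketed.

```python
from datetime import timedelta, timezone

import pandas as pd

# Made-up kline open times in milliseconds since the epoch (UTC):
# 2017-01-01 00:00 and 2017-01-01 01:00.
open_times_ms = [1483228800000, 1483232400000]

# Assumed target: a fixed UTC+2 offset with no daylight saving rules.
target_tz = timezone(timedelta(hours=2))

# Interpret the integers as UTC instants, then convert for display.
times_utc = pd.to_datetime(pd.Series(open_times_ms), unit='ms', utc=True)
times_local = times_utc.dt.tz_convert(target_tz)
print(times_local.dt.strftime('%a, %d %b %Y %H:%M:%S'))
```

A named zone such as one from pytz.all_timezones could be passed to tz_convert instead of the fixed offset if daylight saving time matters.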

Pandas Timeseries: Total duration meeting a specific condition

I have a timeseries
ts = pd.Series(data=[0,1,2,3,4],index=[pd.Timestamp('1991-01-01'),pd.Timestamp('1995-01-01'),pd.Timestamp('1996-01-01'),pd.Timestamp('2010-01-01'),pd.Timestamp('2011-01-01')])
What's the fastest, most readable way to get the total duration during which the value is below 2, assuming each value is valid until the next time step indicates otherwise (no linear interpolation)? I imagine there is probably a pandas function for this.
This seems to be working quite well, however I am still baffled that there does not seem to be a pandas function for this!
import pandas as pd
import numpy as np
ts = pd.Series(data=[0,1,2,3,4],index=[pd.Timestamp('1991-01-01'),pd.Timestamp('1995-01-01'),pd.Timestamp('1996-01-01'),pd.Timestamp('2010-01-01'),pd.Timestamp('2011-01-01')])
# making the timeseries binary. 1 = meets condition, 0 = does not
ts = ts.where(ts>=2,other=1)
ts = ts.where(ts<2,other=0)
delta_time = ts.index.to_pydatetime()[1:]-ts.index.to_pydatetime()[:-1]
time_below_2 = np.sum(delta_time[np.invert(ts.values[:-1])]).total_seconds()
time_above_2 = np.sum(delta_time[(ts.values[:-1])]).total_seconds()
The above function seems to break for certain timeframes. This option is slower, but did not break in any of my tests:
def get_total_duration_above_and_below_value(value, ts):
    # boolean mask: True where the sample is at or above value, False where it is below
    above = ts >= value
    time_above_value = 0
    time_below_value = 0
    for i in range(ts.size - 1):
        # each sample is valid until the next timestamp; duration in hours
        hours = abs(pd.Timedelta(
            ts.index[i] - ts.index[i + 1]).total_seconds()) / 3600
        if above.iloc[i]:
            time_above_value += hours
        else:
            time_below_value += hours
    return time_above_value, time_below_value
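The same "valid until the next time step" rule can also be sketched without a Python loop, by computing how long each sample lasts and summing the durations under a boolean mask. This is only an illustration under the question's assumptions (sorted index, no interpolation); the variable names are made up.

```python
import pandas as pd

ts = pd.Series(
    data=[0, 1, 2, 3, 4],
    index=pd.to_datetime(['1991-01-01', '1995-01-01', '1996-01-01',
                          '2010-01-01', '2011-01-01']),
)

# How long each value stays valid: the gap to the next timestamp.
# The last sample has no successor, so its duration is NaT and sum() skips it.
durations = ts.index.to_series().diff().shift(-1)

time_below_2 = durations[ts < 2].sum()
time_at_or_above_2 = durations[ts >= 2].sum()
print(time_below_2, time_at_or_above_2)
```

Both results are Timedelta objects, so .total_seconds() can be called on them if a plain number is needed.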

Time Difference between Time Period and Instant

I have some time periods (df_A) and some time instants (df_B):
import pandas as pd
import numpy as np
import datetime as dt
from datetime import timedelta
# Data
df_A = pd.DataFrame({'A1': [dt.datetime(2017,1,5,9,8), dt.datetime(2017,1,5,9,9), dt.datetime(2017,1,7,9,19), dt.datetime(2017,1,7,9,19), dt.datetime(2017,1,7,9,19), dt.datetime(2017,2,7,9,19), dt.datetime(2017,2,7,9,19)],
'A2': [dt.datetime(2017,1,5,9,9), dt.datetime(2017,1,5,9,12), dt.datetime(2017,1,7,9,26), dt.datetime(2017,1,7,9,20), dt.datetime(2017,1,7,9,21), dt.datetime(2017,2,7,9,23), dt.datetime(2017,2,7,9,25)]})
df_B = pd.DataFrame({ 'B': [dt.datetime(2017,1,6,14,45), dt.datetime(2017,1,4,3,31), dt.datetime(2017,1,7,3,31), dt.datetime(2017,1,7,14,57), dt.datetime(2017,1,9,14,57)]})
I can match these together:
# Define an Extra Margin
M = dt.timedelta(days = 10)
df_A["A1X"] = df_A["A1"] + M
df_A["A2X"] = df_A["A2"] - M
# Match
Bv = df_B.B.values
A1 = df_A.A1X.values
A2 = df_A.A2X.values
i, j = np.where((Bv[:, None] >= A1) & (Bv[:, None] <= A2))
df_C = pd.DataFrame(np.column_stack([df_B.values[i], df_A.values[j]]),
                    columns=df_B.columns.append(df_A.columns))
I would like to find the time difference between each time period and the time instant matched to it. I mean that
if B is between A1 and A2
then dT = 0
I've tried doing it like this:
# Calculate dt
def time(A1,A2,B):
    if df_C["B"] < df_C["A1"]:
        return df_C["A1"].subtract(df_C["B"])
    elif df_C["B"] > df_C["A2"]:
        return df_C["B"].subtract(df_C["A2"])
    else:
        return 0
df_C['dt'] = df_C.apply(time)
I'm getting "ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series"
So, I found two fixes:
You are adding M to the lower value and subtracting from the higher one. Change it to:
df_A['A1X'] = df_A['A1'] - M
df_A['A2X'] = df_A['A2'] + M
You are only passing one row of your dataframe at a time to your time function, so it should be something like:
def time(row):
    if row['B'] < row['A1']:
        return row['A1'] - row['B']
    elif row['B'] > row['A2']:
        return row['B'] - row['A2']
    else:
        return 0
And then you can call it like this:
df_C['dt'] = df_C.apply(time, axis=1)
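If the matched frame is large, the row-wise apply can be slow. A possible vectorized sketch of the same rule (distance to the nearest edge of the period, zero inside it) is shown below; the small df_C here is made-up sample data, not the frame from the question, and every dt value comes out as a Timedelta rather than a mix of Timedelta and 0.

```python
import pandas as pd

# Made-up matched rows: B before, inside, and after the period [A1, A2].
df_C = pd.DataFrame({
    'B': pd.to_datetime(['2017-01-05 09:00', '2017-01-05 09:10', '2017-01-05 09:30']),
    'A1': pd.to_datetime(['2017-01-05 09:08'] * 3),
    'A2': pd.to_datetime(['2017-01-05 09:12'] * 3),
})

zero = pd.Timedelta(0)
# Gap when B falls before the period starts, otherwise zero.
before = (df_C['A1'] - df_C['B']).where(df_C['B'] < df_C['A1'], zero)
# Gap when B falls after the period ends, otherwise zero.
after = (df_C['B'] - df_C['A2']).where(df_C['B'] > df_C['A2'], zero)
# At most one of the two is non-zero for any row.
df_C['dt'] = before + after
print(df_C)
```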

Represent negative timedelta in most basic form

If I create a negative Timedelta, e.g. for 0.5 hours, the internal representation looks as follows:
In [2]: pd.Timedelta('-0.5h')
Out[2]: Timedelta('-1 days +23:30:00')
How can I get back a (str) representation of this Timedelta in the form -00:30?
I want to display these deltas, and requiring the user to evaluate the expression -1 day + something is a bit awkward.
I can't comment on your question, so I'm adding this here. I'm not sure whether it helps, but I think you can use the Python humanize package.
import humanize as hm
hm.naturaltime(pd.Timedelta('-0.5h'))
Out:
'30 minutes from now'
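Ok, I will live with a hack going through a date:
# delta = pd.Timedelta('-0.5h')
sign = ''
# start from midnight so that %H:%M shows only the size of the delta
date = pd.to_datetime('today').normalize()
if delta.total_seconds() < 0:
    sign = '-'
    date = date - delta
else:
    date = date + delta
print('{}{:%H:%M}'.format(sign, date.to_pydatetime()))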
You can use the components of a Pandas timedelta
import pandas as pd
t = pd.Timedelta('-0.5h')
print(t.components)
>> Components(days=-1, hours=23, minutes=30, seconds=0, milliseconds=0, microseconds=0, nanoseconds=0)
You can access each component with
print(t.components.days)
>> -1
print(t.components.hours)
>> 23
print(t.components.minutes)
>> 30
The rest is then formatting.
This is a total hack that won't work for Series data, but....
import pandas as pd
t = pd.Timedelta('-0.5h').components
mins = t.days*24*60 + t.hours*60 + t.minutes
# only prefix a sign for negative values
sign = '-' if mins < 0 else ''
hours, minutes = divmod(abs(mins), 60)
print('{}{:02d}:{:02d}'.format(sign, hours, minutes))
>> -00:30
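For Series data, one option is to wrap the formatting in a small function and map it over the column. A minimal sketch, assuming seconds can simply be truncated; the name format_hhmm is made up for illustration, and it works from total_seconds() rather than from components.

```python
import pandas as pd

def format_hhmm(td):
    # Hypothetical helper: signed HH:MM string; leftover seconds are dropped.
    seconds = td.total_seconds()
    sign = '-' if seconds < 0 else ''
    total_minutes = int(abs(seconds) // 60)
    hours, minutes = divmod(total_minutes, 60)
    return '{}{:02d}:{:02d}'.format(sign, hours, minutes)

# Made-up sample deltas in minutes: -0.5 h, +1.5 h and about -25 h.
deltas = pd.Series(pd.to_timedelta([-30, 90, -1505], unit='min'))
formatted = deltas.map(format_hhmm).tolist()
print(formatted)
```

Hours are not wrapped at 24, so a delta longer than a day shows as, for example, -25:05 instead of a days component.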
I was looking for something similar (see https://github.com/pandas-dev/pandas/issues/17232 )
I'm not sure if it will be implemented in Pandas, so here is a workaround
import pandas as pd
def timedelta2str(td, display_plus=False, format=None):
    """
    Parameters
    ----------
    format : None|all|even_day|sub_day|long
    Returns
    -------
    converted : string of a Timedelta
    >>> td = pd.Timedelta('00:00:00.000')
    >>> timedelta2str(td)
    '0 days'
    >>> td = pd.Timedelta('00:01:29.123')
    >>> timedelta2str(td, display_plus=True, format='sub_day')
    '+ 00:01:29.123000'
    >>> td = pd.Timedelta('-00:01:29.123')
    >>> timedelta2str(td, display_plus=True, format='sub_day')
    '- 00:01:29.123000'
    """
    td_zero = pd.Timedelta(0)
    sign_sep = ' '
    if td >= td_zero:
        # _repr_base is a private pandas method (leading underscore)
        s = td._repr_base(format=format)
        if display_plus:
            s = "+" + sign_sep + s
        return s
    else:
        s = timedelta2str(-td, display_plus=False, format=format)
        s = "-" + sign_sep + s
        return s
if __name__ == "__main__":
    import doctest
    doctest.testmod()