Add a column of minutes to a datetime in pandas - pandas

I have a DataFrame with a start time and the length of an operation. I'm trying to figure out how to add the length (in minutes) to the start time to get the end time of the session. I've run a few variations of the same general idea and keep getting the same error: "unsupported type for timedelta minutes component: Series". The code extract is below:
data= {'Name': ['John', 'Peter'],
'Start' : [2, 2],
'Length': [120, 90],
}
df = pd.DataFrame.from_records(data)
df['Start'] = pd.to_datetime(df['Start'])
df['Length'] = pd.to_datetime(df['Length'])
df["tdiffinmin"] = df['Start'].apply(lambda x: x + pd.DateOffset(minutes = df["Length"]))
I've also tried the following variations of the same math and keep getting similar errors.
df["tdiffinmin"] = df['Start'].apply(lambda x: x -pd.DateOffset(minutes = df["Length"]))
df["tdiffinmin"] = (df['Start']. + timedelta(minutes = df["Length"])).dt.total_seconds() / 60
df['tdiffinmin'] = df['Start'] - pd.DateOffset(minutes = df["Length"])
The full code reads a data set (Excel sheet or CSV), populates a DataFrame, and does math like the above. It was originally done with Start and Stop times, so I know something similar is possible. In the data set, Length is in minutes and Start is a date and time, so datetime is necessary.

You should convert Length into timedelta, not datetime:
df['Start'] = pd.to_datetime(df['Start'])
df['Length'] = pd.to_timedelta(df['Length'], unit='min')
df['tdiffinmin'] = df['Start'] + df['Length']
Output:
Length Name Start tdiffinmin
0 02:00:00 John 1970-01-01 00:00:00.000000002 1970-01-01 02:00:00.000000002
1 01:30:00 Peter 1970-01-01 00:00:00.000000002 1970-01-01 01:30:00.000000002
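The sample above parses the integer 2 as a datetime, which is why Start shows up as 2 nanoseconds after the 1970 epoch. A minimal sketch of the same approach with realistic start times follows; the dates and the `End` column name are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data: Start holds date-and-time strings, Length is in minutes
df = pd.DataFrame({
    "Name": ["John", "Peter"],
    "Start": ["2021-03-01 09:00", "2021-03-01 10:30"],
    "Length": [120, 90],
})

df["Start"] = pd.to_datetime(df["Start"])
# Convert the minutes to timedeltas, then add them to the start times element-wise
df["End"] = df["Start"] + pd.to_timedelta(df["Length"], unit="min")
print(df)
```

Because both operands are whole columns, no `apply` or per-row `DateOffset` is needed.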

Related

Converting pandas._libs.tslibs.timestamps.Timestamp to seconds since midnight?

I have a pandas._libs.tslibs.timestamps.Timestamp object, e.g., 2016-01-01 07:00:04.85+00:00 and I want to create an int object that stores the number of seconds since the previous midnight.
In the above example, it would return 7 * 3600 + 0 * 60 + 4.85 = 25204.85
Is there a quick way to do this in pandas?
You can use normalize() to subtract the date part:
# ts = pd.to_datetime('2016-01-01 07:00:04.85+00:00')
>>> (ts - ts.normalize()).total_seconds()
25204.85
It also works with DataFrame through dt accessor:
# df = pd.DataFrame({'date': [ts]})
>>> (df['date'] - df['date'].dt.normalize()).dt.total_seconds()
0 25204.85
Name: date, dtype: float64
Not sure if this is what you are looking for but here is an implementation:
import pandas as pd
def seconds_from_midnight(date):
    return date.hour * 3600 + date.minute * 60 + date.second + date.microsecond / 1000000
date = pd.Timestamp.now()
print(date)
print(seconds_from_midnight(date))
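As a quick sanity check, the two approaches can be run side by side on the timestamp from the question. This is only a sketch; the variable names `via_normalize` and `via_fields` are illustrative.

```python
import pandas as pd

def seconds_from_midnight(date):
    # Manual field arithmetic, as in the second answer above
    return date.hour * 3600 + date.minute * 60 + date.second + date.microsecond / 1_000_000

ts = pd.Timestamp("2016-01-01 07:00:04.85+00:00")

# normalize() resets the time part to midnight, so the difference is the time of day
via_normalize = (ts - ts.normalize()).total_seconds()
via_fields = seconds_from_midnight(ts)
print(via_normalize, via_fields)
```

Both values should agree up to floating-point rounding. Note that the field-based version ignores any nanosecond component of the timestamp.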

Increment a time and add it in data frame column

Hi, I am new to Python and I am looking for the result below.
I have three From_Curr values and three To_Curr values. I build currency pairs from them and add a new time column to my DataFrame.
That gives 3*3 = 9 currency pairs. I want all nine pairs to share the same time, and then the same pairs to repeat with the time incremented by 1 hour, as shown below.
The problem is that my code increments the time after every row.
Actual df:
Expected df:
Thanks for any help and appreciate your time.
import pandas as pd
import datetime
from datetime import timedelta
data = pd.DataFrame({'From': ["EUR", "GBP", 'USD'],
                     'To': ["INR", "SGD", 'HKD'],
                     'time': ''})
init_date = datetime.datetime(1, 1, 1)
for index, row in data.iterrows():
    row['time'] = str(init_date)[11:19]
    init_date = init_date + timedelta(hours=1.0)
I don't understand why you are repeating the combinations and adding one hour in the second half.
But for this case, you can do something like this:
import pandas as pd
data = pd.DataFrame({'From': ["EUR", "GBP", 'USD'],
                     'To': ["INR", "SGD", 'HKD'],
                     'time': ''})
outlist = [(i, j) for i in data["From"] for j in data["To"]] * 2  # Create double combinations
data = pd.DataFrame(data=outlist, columns=["From", "To"])
data["time"] = "00:00:00"
data.loc[len(data) // 2:, "time"] = "01:00:00"  # Assign 1 hour to the second half (.loc avoids chained assignment)
data["time"] = pd.to_datetime(data["time"]).dt.time
Update: After some clarifications
import pandas as pd
data = pd.DataFrame(
    {"From": ["EUR", "GBP", "USD"], "To": ["INR", "SGD", "HKD"], "time": ""}
)
outlist = [
    (i, j) for i in data["From"] for j in data["To"]
] * 2  # Create double combinations; I think that for your case it would be 24 instead of 2
data = pd.DataFrame(data=outlist, columns=["From", "To"])
data["time"] = data.groupby(["From", "To"]).cumcount()  # Occurrence number of each pair (0, 1, ...)
data["time"] = data["time"] * pd.Timedelta(hours=1).value  # Multiply occurrences by one hour in nanoseconds
data["time"] = pd.to_datetime(data["time"]).dt.time  # Convert to time of day
I think this script covers all your needs, happy coding :)
Regards,
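An alternative sketch that builds the rows directly, so every pair in a block shares one time and the next block is one hour later. It assumes the goal is "all pairs, repeated once per hour"; `n_hours` is a hypothetical parameter, not something from the question.

```python
import itertools
import pandas as pd

from_curr = ["EUR", "GBP", "USD"]
to_curr = ["INR", "SGD", "HKD"]
n_hours = 2  # assumption: number of hourly blocks to generate (at most 24)

# Outer loop over hours, inner loop over all From/To pairs:
# each block of 9 pairs gets the same HH:00:00 string
rows = [
    (f, t, f"{hour:02d}:00:00")
    for hour in range(n_hours)
    for f, t in itertools.product(from_curr, to_curr)
]
data = pd.DataFrame(rows, columns=["From", "To", "time"])
print(data)
```

Generating the time inside the comprehension avoids mutating rows in `iterrows()`, which operates on copies and does not write back to the DataFrame.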

Extract PI OSIsoft Monthly Interval in Python

I am trying to extract the sum of PI data from OSIsoft 10-minute ("10m") data over one-month intervals using Python and pandas. However, I get an error from OSIsoft when I write the interval as "M" and an error from Python when I write it as "1mo"; neither notation works without an error. I have a function that pulls the data for a given interval so that I can plot and save it. It works for intervals such as "1d", "30d", "1w", and "1y", but I cannot get the sum of the data for each one-month interval. Is this a conflict between how Python describes a month ("M") and how OSIsoft describes it ("1mo")? Thank you. Here is my code:
def get_tag_history2(tagname, starttime, endtime, interval="10m"):
    # pull historical data
    tag = PIPoint.FindPIPoint(piServer, tagname)
    # name = tag.Name.lower()
    timerange = AFTimeRange(starttime, endtime)
    span = AFTimeSpan.Parse(interval)
    # summaries values
    summaries = tag.Summaries(timerange, span, AFSummaryTypes.Average, AFCalculationBasis.TimeWeighted, AFTimestampCalculation.Auto)
    recordedValuesDict = dict()
    for summary in summaries:
        for event in summary.Value:
            dt = datetime.strptime(
                event.Timestamp.LocalTime.ToString(), '%m/%d/%Y %I:%M:%S %p')
            recordedValuesDict[dt] = event.Value
    # turn dictionary into pd.DataFrame
    df = pd.DataFrame(
        recordedValuesDict.items(), columns=['TimeStamp', 'Value'])
    # Send it to a dateTime Index then set the index
    df['TimeStamp'] = pd.to_datetime(df['TimeStamp']) + pd.Timedelta(interval)
    df.set_index(['TimeStamp'], inplace=True)
    return df
if __name__ == '__main__':
    """
    Set inputs
    """
    pitags = ['JC1.WF.DOMINA.ProdEffective', 'HO1.WF.DOMINA.ProdEffective', 'BC1.WF.DOMINA.ProdEffective']
    start_time = '2020-01-01 00:00'
    end_time = '2022-01-01 00:00'
    interval = "M"
    """
    Run Script
    """
    connect_to_Server('PDXPI01')
    output = pd.DataFrame()
    for tag in pitags:
        values = get_tag_history2(
            tag, start_time, end_time, interval=interval)
        output[tag] = values['Value']
    for i, col in enumerate(output.columns):
        output[col].plot(fig=plt.figure(i))
        plt.title(col)
    plt.show()
The error when using interval = "1mo" is:
ValueError: invalid unit abbreviation: mo
The error when using interval = "M" is:
FormatException: The 'M' token in the string 'M' was not expected.
at OSIsoft.AF.Time.AFTimeSpan.FormatError(String input, Char token, Boolean throwErrors, AFTimeSpan& result)
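Judging only from the two tracebacks, the ValueError looks like it comes from `pd.Timedelta(interval)` (a fixed-length duration cannot express a calendar month), while the FormatException comes from `AFTimeSpan.Parse`. One possible workaround, sketched below, is to keep passing "1mo" to the PI side and shift the pandas timestamps with `pd.DateOffset` instead. This is an untested assumption about the PI side; `shift_timestamps` is a hypothetical helper and the data is synthetic, so no PI server is involved.

```python
import pandas as pd

def shift_timestamps(timestamps, interval):
    """Shift a datetime Series forward by one interval.

    Assumption: intervals ending in "mo" mean calendar months (e.g. "1mo");
    everything else is handed to pd.Timedelta as in the original function.
    """
    if interval.endswith("mo"):
        months = int(interval[:-2] or 1)
        # DateOffset understands variable-length calendar months
        return timestamps + pd.DateOffset(months=months)
    return timestamps + pd.Timedelta(interval)

# Synthetic stand-in for the TimeStamp column built from the PI summaries
stamps = pd.Series(pd.to_datetime(["2020-01-01", "2020-02-01"]))
print(shift_timestamps(stamps, "1mo"))
print(shift_timestamps(stamps, "1d"))
```

If this fits, the line that adds `pd.Timedelta(interval)` in `get_tag_history2` would be the place to call such a helper.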

Getting Binance Historical Data For Specific TimeZone

I found this Python script on the web. It gets OHLCV historical data from the Binance API for the requested dates, assets, and time intervals. The script currently returns the data in UTC.
I want to modify it so that it returns the data (daily/hourly) for a specified timezone. I guess it only takes changing one function or adding an argument, but I can't manage to do it correctly.
How can I change it so that it returns data for UTC+2 (or any other timezone)?
import time
import dateparser
import pytz
import os
from datetime import datetime
import binance
print(binance.__file__)
from binance.client import Client
import pandas as pd
def date_to_milliseconds(date_str):
    """Convert UTC date to milliseconds.
    If using offset strings add "UTC" to date string e.g. "now UTC", "11 hours ago UTC"
    See dateparse docs for formats http://dateparser.readthedocs.io/en/latest/
    :param date_str: date in readable format, i.e. "January 01, 2018", "11 hours ago UTC", "now UTC"
    :type date_str: str
    """
    # get epoch value in UTC
    epoch = datetime.utcfromtimestamp(0).replace(tzinfo=pytz.utc)
    # parse our date string
    d = dateparser.parse(date_str)
    # if the date is not timezone aware apply UTC timezone
    if d.tzinfo is None or d.tzinfo.utcoffset(d) is None:
        d = d.replace(tzinfo=pytz.utc)
    # return the difference in time
    return int((d - epoch).total_seconds() * 1000.0)
def interval_to_milliseconds(interval):
    """Convert a Binance interval string to milliseconds
    :param interval: Binance interval string 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, 12h, 1d, 3d, 1w
    :type interval: str
    :return:
        None if unit not one of m, h, d or w
        None if string not in correct format
        int value of interval in milliseconds
    """
    ms = None
    seconds_per_unit = {
        "m": 60,
        "h": 60 * 60,
        "d": 24 * 60 * 60,
        "w": 7 * 24 * 60 * 60
    }
    unit = interval[-1]
    if unit in seconds_per_unit:
        try:
            ms = int(interval[:-1]) * seconds_per_unit[unit] * 1000
        except ValueError:
            pass
    return ms
def GetUpdateData(kline):
    Time = time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime(kline[0]/1000))
    Open = kline[1]
    High = kline[2]
    Low = kline[3]
    Close = kline[4]
    Volume = kline[5]
    Close_time = time.strftime("%a, %d %b %Y %H:%M:%S", time.localtime(kline[6]/1000))
    Quote_asset_volume = kline[7]
    Number_of_trades = kline[8]
    Taker_buy_base_asset_volume = kline[9]
    Taker_buy_quote_asset_volume = kline[10]
    return Time,Open,High,Low,Close,Volume,Close_time,Quote_asset_volume,Number_of_trades,Taker_buy_base_asset_volume,Taker_buy_quote_asset_volume
def get_historical_klines(symbol, interval, start_str, end_str=None):
    """Get Historical Klines from Binance
    See dateparse docs for valid start and end string formats http://dateparser.readthedocs.io/en/latest/
    If using offset strings for dates add "UTC" to date string e.g. "now UTC", "11 hours ago UTC"
    :param symbol: Name of symbol pair e.g BNBBTC
    :type symbol: str
    :param interval: Binance Kline interval
    :type interval: str
    :param start_str: Start date string in UTC format
    :type start_str: str
    :param end_str: optional - end date string in UTC format
    :type end_str: str
    :return: list of OHLCV values
    """
    # create the Binance client, no need for api key
    client = Client("", "")
    # init our list
    output_data = []
    # setup the max limit
    limit = 500
    # convert interval to useful value in milliseconds
    timeframe = interval_to_milliseconds(interval)
    # convert our date strings to milliseconds
    start_ts = date_to_milliseconds(start_str)
    # if an end time was passed convert it
    end_ts = None
    if end_str:
        end_ts = date_to_milliseconds(end_str)
    idx = 0
    # it can be difficult to know when a symbol was listed on Binance so allow start time to be before list date
    symbol_existed = False
    while True:
        # fetch the klines from start_ts up to max 500 entries or the end_ts if set
        temp_data = client.get_klines(
            symbol=symbol,
            interval=interval,
            limit=limit,
            startTime=start_ts,
            endTime=end_ts
        )
        # handle the case where our start date is before the symbol pair listed on Binance
        if not symbol_existed and len(temp_data):
            symbol_existed = True
        if symbol_existed:
            # append this loop's data to our output data
            output_data += temp_data
            # update our start timestamp using the last value in the array and add the interval timeframe
            start_ts = temp_data[len(temp_data) - 1][0] + timeframe
        else:
            # it wasn't listed yet, increment our start date
            start_ts += timeframe
        idx += 1
        # check if we received less than the required limit and exit the loop
        if len(temp_data) < limit:
            # exit the while loop
            break
        # sleep after every 3rd call to be kind to the API
        if idx % 3 == 0:
            time.sleep(1)
    return output_data
start = "01 January, 2017"
end = "01 February, 2017"
symbols = ['ETHBTC']
interval = '1d'  # Client.KLINE_INTERVAL_15MIN
for symbol in symbols:
    klines = get_historical_klines(symbol, interval, start, end)
    times = []
    Opens = []
    Highs = []
    Lows = []
    Closes = []
    Volumes = []
    Close_times = []
    Quote_asset_volumes = []
    Number_of_tradess = []
    Taker_buy_base_asset_volumes = []
    Taker_buy_quote_asset_volumes = []
    for k in klines:
        Time,Open,High,Low,Close,Volume,Close_time,Quote_asset_volume,Number_of_trades,Taker_buy_base_asset_volume,Taker_buy_quote_asset_volume = GetUpdateData(k)
        times.append(Time)
        Opens.append(Open)
        Highs.append(High)
        Lows.append(Low)
        Closes.append(Close)
        Volumes.append(Volume)
        Close_times.append(Close_time)
        Quote_asset_volumes.append(Quote_asset_volume)
        Number_of_tradess.append(Number_of_trades)
        Taker_buy_base_asset_volumes.append(Taker_buy_base_asset_volume)
        Taker_buy_quote_asset_volumes.append(Taker_buy_quote_asset_volume)
    DataStruct = pd.DataFrame()
    DataStruct['time'] = times
    DataStruct['Open'] = Opens
    DataStruct['High'] = Highs
    DataStruct['Low'] = Lows
    DataStruct['Close'] = Closes
    DataStruct['Volume'] = Volumes
    DataStruct['Close_time'] = Close_times
    DataStruct['Quote_asset_volume'] = Quote_asset_volumes
    DataStruct['Number_of_trades'] = Number_of_tradess
    DataStruct['Taker_buy_base_asset_volume'] = Taker_buy_base_asset_volumes
    DataStruct['Taker_buy_quote_asset_volume'] = Taker_buy_quote_asset_volumes
    FileName = symbol+ '_' + start+ '_' + end + ' .csv'
    FileName = FileName.replace(' ','_')
    FileName = FileName.replace(',','')
    Path2Save = os.path.normpath(r'')
    SaveStrFile = os.path.normpath(Path2Save+ '\\' +FileName)
    # save FeatureWeights to CSV file
    D_S_header = ['time','Open','High','Low','Close','Volume','Close_time','Quote_asset_volume','Number_of_trades','Taker_buy_base_asset_volume','Taker_buy_quote_asset_volume']
    DataStruct.to_csv(path_or_buf = SaveStrFile, header = D_S_header )
In these lines you see the timezone being defined:
# get epoch value in UTC
epoch = datetime.utcfromtimestamp(0).replace(tzinfo=pytz.utc)
Just redefine the timezone there. You can list the timezones that pytz supports with pytz.all_timezones.
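Separately from the epoch definition, the readable timestamps in the script are produced in GetUpdateData with `time.gmtime` and `time.localtime`. A minimal sketch of formatting a kline timestamp in a fixed UTC offset instead is shown below. It uses only the standard library; `format_kline_time` and its `utc_offset_hours` parameter are hypothetical names, and the sample value is simply midnight UTC on 1 January 2017 expressed in milliseconds.

```python
from datetime import datetime, timedelta, timezone

def format_kline_time(ms, utc_offset_hours=2):
    """Format a millisecond epoch timestamp in a fixed UTC offset (default UTC+2)."""
    target_tz = timezone(timedelta(hours=utc_offset_hours))
    # Interpret the timestamp as UTC first, then convert to the target offset
    moment = datetime.fromtimestamp(ms / 1000, tz=timezone.utc).astimezone(target_tz)
    return moment.strftime("%a, %d %b %Y %H:%M:%S")

open_time_ms = 1483228800000  # 2017-01-01 00:00:00 UTC
print(format_kline_time(open_time_ms))       # shifted to UTC+2
print(format_kline_time(open_time_ms, 0))    # unchanged UTC
```

This only changes how each timestamp is displayed. It does not change which time span each candle covers, so daily candles would still start and end on the boundaries the API uses.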

Get day of year from a string date in pandas dataframe

I want to turn my date string into the day of the year. I tried this code:
import pandas as pd
import datetime
data = pd.DataFrame()
data = pd.read_csv(xFilename, sep=",")
and get this DataFrame
Index Date Tmin Tmax
0 1950-01-02 -16.508 -2.096
1 1950-01-03 -6.769 0.875
2 1950-01-04 -1.795 8.859
3 1950-01-05 1.995 9.487
4 1950-01-06 -17.738 -9.766
Then I tried this:
convert = lambda x: x.DatetimeIndex.dayofyear
data['Date'].map(convert)
which fails with this error:
AttributeError: 'str' object has no attribute 'DatetimeIndex'
I expect to get a new column where 1950-01-02 = 2, 1950-01-03 = 3, and so on.
Thanks for your help, and sorry, I'm new to Python.
I think you need to pass the parse_dates parameter to read_csv and then use Series.dt.dayofyear:
data = pd.read_csv(xFilename, parse_dates=["Date"])
data['dayofyear'] = data['Date'].dt.dayofyear
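A self-contained sketch of the same idea, using an in-memory CSV in place of xFilename (the first three rows of the sample are copied from the question). It also shows `pd.to_datetime` as an option when the data is already loaded with Date as plain strings.

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file from the question
csv_text = """Date,Tmin,Tmax
1950-01-02,-16.508,-2.096
1950-01-03,-6.769,0.875
1950-01-04,-1.795,8.859
"""

# Option 1: parse the Date column while reading
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
data["dayofyear"] = data["Date"].dt.dayofyear

# Option 2: the column was read as strings, so convert it afterwards
raw = pd.read_csv(io.StringIO(csv_text))
raw["dayofyear"] = pd.to_datetime(raw["Date"]).dt.dayofyear

print(data)
```

The `.dt` accessor only exists on datetime-like columns, which is why the original `map` over strings raised an AttributeError.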