Pandas resampling hourly timeseries into hourly proportion timeseries - pandas

I am working with an hourly time series (Date, Time (hr), P) and trying to calculate each hour's proportion of the daily total of P. I know I can use pandas' resample('D', how='sum') to calculate the daily sum of P (DailyP), but in the same step I would like to use that daily sum to calculate the proportion of the daily P in each hour (that is, P/DailyP), ending up with an hourly time series (the same frequency as the original). I am not sure whether this can even be called 'resampling' in pandas terms.
This is probably apparent from my use of terminology, but I am an absolute newbie at Python or programming for that matter. If anyone can suggest a way to do this, I would really appreciate it.
Thanks!

A possible approach is to reindex the daily sums back onto the original hourly index (reindex) and fill the values forward (ffill), so that every hour carries the sum of its day. In current pandas the aggregation is written resample('D').sum(), since the how= argument has been removed:
df.resample('D').sum().reindex(df.index).ffill()
You can then divide your original dataframe by this result.
An example:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame({'P' : np.random.rand(72)}, index=pd.date_range('2013-05-05', periods=72, freq='h'))
>>> df.resample('D').sum().reindex(df.index).ffill()
P
2013-05-05 00:00:00 14.049649
2013-05-05 01:00:00 14.049649
...
2013-05-05 22:00:00 14.049649
2013-05-05 23:00:00 14.049649
2013-05-06 00:00:00 13.483974
2013-05-06 01:00:00 13.483974
...
2013-05-06 23:00:00 13.483974
2013-05-07 00:00:00 12.693711
2013-05-07 01:00:00 12.693711
..
2013-05-07 22:00:00 12.693711
2013-05-07 23:00:00 12.693711
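To make the final division step concrete, here is a minimal sketch on the same kind of random data. It computes the hourly proportion twice: once with the reindex/forward-fill idea described above, and once with groupby(...).transform('sum'), which is a swapped-in alternative and not part of the original answer. The column name proportion and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Three days of hourly data, as in the example above.
df = pd.DataFrame(
    {'P': np.random.rand(72)},
    index=pd.date_range('2013-05-05', periods=72, freq='h'),
)

# Approach 1: daily sums reindexed back onto the hourly index and forward-filled.
daily_on_hourly = df['P'].resample('D').sum().reindex(df.index, method='ffill')
prop_reindex = df['P'] / daily_on_hourly

# Approach 2 (alternative): broadcast each day's sum with transform.
day = df.index.normalize()  # midnight of each timestamp's day
daily_total = df['P'].groupby(day).transform('sum')
df['proportion'] = df['P'] / daily_total

# Within every day the proportions should add up to 1.
check = df['proportion'].groupby(day).sum()
print(check)
```

Both routes keep the original hourly frequency; the transform version avoids the intermediate daily frame.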

Related

Facebook Prophet Future Dataframe

I have the last 5 years of monthly data and am using it to build a forecasting model with fbprophet. The last 5 months of my data are as follows:
data1['ds'].tail()
Out[86]:
55 2019-01-08
56 2019-01-09
57 2019-01-10
58 2019-01-11
59 2019-01-12
I have created the model on this and made a future prediction dataframe.
model = Prophet(
    interval_width=0.80,
    growth='linear',
    daily_seasonality=False,
    weekly_seasonality=False,
    yearly_seasonality=True,
    seasonality_mode='additive'
)
# fit the model to data
model.fit(data1)
future_data = model.make_future_dataframe(periods=4, freq='m', include_history=True)
After December 2019, I need the first four months of the next year, but it adds the next 4 months within the same year, 2019.
future_data.tail()
ds
59 2019-01-12
60 2019-01-31
61 2019-02-28
62 2019-03-31
63 2019-04-30
How can I get the first 4 months of the next year in the future dataframe? Is there a specific parameter to adjust the year?
The issue is the date format: 2019-01-12 (December 2019, as per your question) is written in the format "%Y-%d-%m", but it is read as "%Y-%m-%d", i.e. 12 January 2019.
Hence, Prophet creates dates with month-end frequency (specified by 'm') for the next 4 periods, starting from January 2019.
Just for reference this is how the future dataframe is created by Prophet:
dates = pd.date_range(
    start=last_date,
    periods=periods + 1,  # An extra in case we include start
    freq=freq)
dates = dates[dates > last_date]  # Drop start if equals last_date
dates = dates[:periods]  # Return correct number of periods
Hence, it infers the date format and extrapolates in the future dataframe.
Solution: Change the date format in training data to "%Y-%m-%d"
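As a rough illustration of that fix without running Prophet itself, the sketch below parses the question's date strings with an explicit "%Y-%d-%m" format and then repeats the three date_range steps quoted above. The use of a month-start offset (pd.offsets.MonthBegin(), the object form of 'MS') is an assumption made here because the parsed dates fall on the first of each month; the question itself used 'm'. The variable names raw and ds are illustrative.

```python
import pandas as pd

# Date strings as they appear in the question (year-day-month order).
raw = pd.Series(['2019-01-08', '2019-01-09', '2019-01-10',
                 '2019-01-11', '2019-01-12'])

# Parse with the explicit format so '2019-01-12' becomes 1 December 2019.
ds = pd.to_datetime(raw, format='%Y-%d-%m')
last_date = ds.max()

# Same steps as the excerpt above, with a month-start offset (assumption).
periods = 4
dates = pd.date_range(start=last_date, periods=periods + 1,
                      freq=pd.offsets.MonthBegin())
dates = dates[dates > last_date]  # drop the start, which equals last_date
dates = dates[:periods]           # keep the requested number of periods
print(dates)
```

With the dates parsed this way, the extrapolation starts after December 2019 and runs into the following year.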
I stumbled here while searching for the appropriate frequency string for minutes.
As per the docs, the datetime needs to be in YYYY-MM-DD format:
The input to Prophet is always a dataframe with two columns: ds and y. The ds (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. The y column must be numeric, and represents the measurement we wish to forecast.
2019-01-12 rewritten in YYYY-MM-DD is 2019-12-01; using this:
>>> dates = pd.date_range(start='2019-12-01',periods=4 + 1,freq='M')
>>> dates
DatetimeIndex(['2019-12-31', '2020-01-31', '2020-02-29', '2020-03-31',
'2020-04-30'],
dtype='datetime64[ns]', freq='M')
Other frequency strings are listed here; they are not given explicitly for Python in the Prophet docs:
https://pandas.pydata.org/docs/reference/api/pandas.tseries.frequencies.to_offset.html
>>> dates = pd.date_range(start='2022-03-17 11:40:00',periods=10 + 1,freq='min')
>>> dates
DatetimeIndex(['2022-03-17 11:40:00', '2022-03-17 11:41:00',
'2022-03-17 11:42:00', '2022-03-17 11:43:00',
..],
dtype='datetime64[ns]', freq='T')

Timeseries resample error - none of Dateindex in column pandas

Please excuse obvious errors - still in the learning process.
I am trying to make a simple time series plot of my data, which has a frequency of 15 minutes. The idea is to plot monthly means, starting by resampling the data to hourly means and including only those hourly means that have at least 1 observation in the interval. There are subsequent conditions for daily and monthly means.
This would be relatively simple if the following error did not crop up: "None of [DatetimeIndex(['2016-01-01 05:00:00', '2016-01-01 05:15:00',\n....2016-12-31 16:15:00'],\n dtype='datetime64[ns]', length=103458, freq=None)] are in the [columns]"
This is my code:
#Original dataframe
Date value
0 1/1/2016 0:00 405.22
1 1/1/2016 0:15 418.56
Date object
value object
dtype: object
#Conversion of 'value' column to numeric/float values.
df.Date = pd.to_datetime(df.Date,errors='coerce')
year=df.Date.dt.year
df['Year'] = df['Date'].map(lambda x: x.year )
df.value = pd.to_numeric(df.value,errors='coerce' )
Date datetime64[ns]
value float64
Year int64
dtype: object
Date value Year
0 2016-01-01 00:00:00 405.22 2016
1 2016-01-01 00:15:00 418.56 2016
df=df.set_index(Date)
diurnal1 = df[df['Date']].resample('h').mean().count()>=2
**(line of error)**
diurnal_mean_1 = diurnal1.mean()[diurnal1.count() >= 1]
(the code follows)
Any help in solving the error will be appreciated.
I think you want df=df.set_index('Date') (the column name Date must be passed as a string). Also, once you get it working, I would move the conversions into the constructor if possible.
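For what it's worth, here is a small sketch of the hourly step with the index set by name. It also avoids df[df['Date']], which asks pandas to look up the datetime values as column labels and is what produces the "None of [...] are in the [columns]" message. The sample rows (including the deliberately non-numeric 'bad' entry) and the names hourly and hourly_mean are made up for illustration.

```python
import pandas as pd

# A few rows shaped like the question's data; 'bad' stands in for a dirty value.
df = pd.DataFrame({
    'Date': ['1/1/2016 0:00', '1/1/2016 0:15', '1/1/2016 0:30', '1/1/2016 2:00'],
    'value': ['405.22', '418.56', 'bad', '410.0'],
})

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df['value'] = pd.to_numeric(df['value'], errors='coerce')

# Pass the column name as a string.
df = df.set_index('Date')

# Hourly mean together with the number of valid observations per hour.
hourly = df['value'].resample('h').agg(['mean', 'count'])

# Keep only hours that have at least 1 observation.
hourly_mean = hourly.loc[hourly['count'] >= 1, 'mean']
print(hourly_mean)
```

The same mean/count pattern could be repeated for the daily and monthly conditions mentioned in the question.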

Convert daily stock returns to weekly stock returns

I'm very new to pandas and I'm trying to convert daily stock return into weekly stock returns by finding the product of (1 + return) for each day Monday to Friday.
Here is an example of what I have so far (data is just an example, not real numbers):
In [1]: df
Out[1]:
Date AAPL NFLX INTC
2019-09-09 0.01 0.0012 0.00873
2019-09-10 0.014 0.0074 0.0837738
2019-09-11 0.0123 0.007123 0.09383
2019-09-12 0.0028 0.07234 0.0484
2019-09-13 0.00172 0.8427 0.09484
My dataset is much larger than what I'm showing. But essentially I just want to find the product of (1+return) for every consecutive Monday to Friday.
The ideal output would be a dataframe with Fridays as the index and the weekly return values displayed under the stock tickers.
The line of code below should do it:
(1+df).resample('W-FRI').prod()-1
The line above resamples the (1 + daily return) values (see the pandas resample documentation for further information) to a weekly frequency with weeks ending on Friday ('W-FRI'). prod() then multiplies the (1 + daily return) values within each week, and subtracting 1 gives the accumulated return for each week.
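Applied to the sample rows from the question, that line can be checked like this. The frame is rebuilt by hand here with Date as a DatetimeIndex, which is an assumption about how the original data is stored.

```python
import numpy as np
import pandas as pd

# The five sample rows from the question (Monday to Friday of one week).
df = pd.DataFrame(
    {
        'AAPL': [0.01, 0.014, 0.0123, 0.0028, 0.00172],
        'NFLX': [0.0012, 0.0074, 0.007123, 0.07234, 0.8427],
        'INTC': [0.00873, 0.0837738, 0.09383, 0.0484, 0.09484],
    },
    index=pd.to_datetime(['2019-09-09', '2019-09-10', '2019-09-11',
                          '2019-09-12', '2019-09-13']),
)
df.index.name = 'Date'

# Compound (1 + return) within each week ending on Friday, then subtract 1.
weekly = (1 + df).resample('W-FRI').prod() - 1
print(weekly)

# Cross-check one ticker by hand.
aapl_by_hand = np.prod(1 + df['AAPL'].to_numpy()) - 1
```

The single output row is labelled with the Friday that closes the week.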

groupby date using other start time than midnight

I am aggregating some data by date.
for dt, group in df.groupby(df.timestamp.dt.date):
    # do stuff
Now, I would like to do the same, but without using midnight as time offset.
Still, I would like to use groupby, but e.g. in 6AM-6AM bins.
Is there any better solution than a dummy column?
unfortunately, resample as discussed in
Resample daily pandas timeseries with start at time other than midnight
Resample hourly TimeSeries with certain starting hour
does not work, as I do not need to apply any resampling/aggregation function
You can, for example, subtract the offset before grouping:
for dt, group in df.groupby(df.timestamp.sub(pd.to_timedelta('6h')).dt.date):
    # do stuff
There's a base argument for resample or pd.Grouper that is meant to handle this situation (base was deprecated in pandas 1.1 and removed in 2.0, where the offset argument, e.g. offset='6h', takes its place). There are many ways to accomplish this; pick whichever you feel is clearest.
'1D' frequency with base=0.25
'24h' frequency with base=6
'1440min' frequency with base=360
Code
df = pd.DataFrame({'timestamp': pd.date_range('2010-01-01', freq='10min', periods=200)})
df.resample(on='timestamp', rule='1D', base=0.25).timestamp.agg(['min', 'max'])
#df.resample(on='timestamp', rule='24h', base=6).timestamp.agg(['min', 'max'])
#df.resample(on='timestamp', rule=f'{60*24}min', base=60*6).timestamp.agg(['min', 'max'])
min max
timestamp
2009-12-31 06:00:00 2010-01-01 00:00:00 2010-01-01 05:50:00 #[Dec31 6AM - Jan1 6AM)
2010-01-01 06:00:00 2010-01-01 06:00:00 2010-01-02 05:50:00 #[Jan1 6AM - Jan2 6AM)
2010-01-02 06:00:00 2010-01-02 06:00:00 2010-01-02 09:10:00 #[Jan2 6AM - Jan3 6AM)
For completeness, resample is a convenience method and is in all ways the same as groupby. If for some reason you absolutely cannot use resample you could do:
for dt, gp in df.groupby(pd.Grouper(key='timestamp', freq='24h', base=6)):
    ...
which is equivalent to
for dt, gp in df.resample(on='timestamp', rule='24h', base=6):
    ...
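On pandas versions where base is no longer available, the same 6AM-6AM bins can be sketched with the offset argument. The example below reuses the 10-minute test frame from above and adds a helper column n (an assumption, only there to have something to count). It compares three routes: resample with offset, groupby with pd.Grouper and offset, and the subtract-then-group-by-date idea.

```python
import pandas as pd

df = pd.DataFrame({'timestamp': pd.date_range('2010-01-01', freq='10min', periods=200)})
df['n'] = range(len(df))  # helper column, only used for counting rows

# 24-hour bins shifted by 6 hours via offset (replacement for base=6).
by_resample = df.resample('24h', on='timestamp', offset='6h')['n'].count()

# The same bins through groupby, which also allows plain iteration over groups.
grouper = pd.Grouper(key='timestamp', freq='24h', offset='6h')
by_grouper = {start: len(gp) for start, gp in df.groupby(grouper)}

# Subtracting the offset and grouping by calendar date gives the same split.
shifted_date = (df['timestamp'] - pd.Timedelta(hours=6)).dt.date
by_shift = df.groupby(shifted_date)['n'].count()

print(by_resample)
```

The bin labels from resample and pd.Grouper are the 6AM bin starts, while the subtract-and-group route labels each group with the shifted calendar date.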

Use pandas asfreq/resample to get the end of the period

I am trying to find the mean for each day and put that value as an input for every hour in a year. The problem is that the last day only contains the first hour.
rng = pd.date_range('2011-01-01', '2011-12-31')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
days = ts.resample('D').mean()
day_mean_in_hours = days.asfreq('H', method='ffill')
day_mean_in_hours.tail(5)
2011-12-30 20:00:00 -0.606819
2011-12-30 21:00:00 -0.606819
2011-12-30 22:00:00 -0.606819
2011-12-30 23:00:00 -0.606819
2011-12-31 00:00:00 -2.086733
Freq: H, dtype: float64
Is there a nice way to change the frequency to hour and still get the full last day?
You could reindex the series using an hourly DatetimeIndex that covers the full last day (in pandas 1.4 and later, the closed argument of date_range is called inclusive).
hourly_rng = pd.date_range('2011-01-01', '2012-01-01', freq='1H', inclusive='left')
day_mean_in_hours = day_mean_in_hours.reindex(hourly_rng, method='ffill')
See Resample a time series with the index of another time series for another example.
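A minimal end-to-end sketch of the reindex idea follows. To stay independent of the closed/inclusive naming across pandas versions, it spells out the last hour of the year as the end point instead of using a half-open range; that choice, and starting from the daily series ts directly, are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Daily series for 2011, as in the question.
rng = pd.date_range('2011-01-01', '2011-12-31')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

# Hourly index that runs through 23:00 on the last day.
hourly_rng = pd.date_range('2011-01-01 00:00', '2011-12-31 23:00', freq='h')

# Forward-fill each day's value onto all 24 of its hours.
day_mean_in_hours = ts.reindex(hourly_rng, method='ffill')

last_day = day_mean_in_hours.loc['2011-12-31']
print(last_day.tail(5))
```

The last day now has all 24 hourly entries, each carrying that day's value.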