I'm trying to generate a pandas DataFrame with a time-series index at a fixed interval. As input parameters I need to provide the start and end dates. The problem is that the generated index starts either at the month start with freq='3MS' or at the month end with freq='3M'. The interval cannot be defined as a number of days, because each year must contain exactly 4 periods and the index must begin on the given start date.
The expected output in this case is:
2020-10-05
2021-01-05
2021-04-05
2021-07-05
2021-10-05
Any ideas appreciated.
interpolated = pd.DataFrame(index=pd.date_range('2020-10-05', '2045-10-05', freq='3M'), columns=['dummy'])
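One possible approach, offered as a sketch rather than a definitive answer: pass a pd.DateOffset(months=3) as the frequency instead of a month-anchored alias. The offset steps three calendar months from the start date, so the day of the month should be preserved. The variable name idx is only illustrative.

```python
import pandas as pd

# A 3-calendar-month step that is not anchored to month start or month end,
# so the index keeps the day of the month of the start date.
idx = pd.date_range('2020-10-05', '2045-10-05', freq=pd.DateOffset(months=3))

interpolated = pd.DataFrame(index=idx, columns=['dummy'])
print(idx[:5])
```

A start day that does not exist in every month (the 31st, for example) is a case this sketch does not handle; the offset would then have to pick a different day in shorter months.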
Related
I know that it is possible to group your data by time intervals of the same length by using the function resample. But how can I group by time intervals of custom length (i.e. irregular time intervals)?
Here is an example:
Say we have a dataframe with time values, like this:
rng = pd.date_range(start='2015-02-11', periods=7, freq='M')
df = pd.DataFrame({ 'Date': rng, 'Val': np.random.randn(len(rng)) })
And we have the following time intervals:
2015-02-12 ----- 2015-05-10
2015-05-10 ----- 2015-08-20
2015-08-20 ----- 2016-01-01
It is clear that rows with index 0, 1, 2 belong to the first time interval, rows with index 3, 4, 5 belong to the second time interval, and row 6 belongs to the last time interval.
My question is: how do I group these rows according to those specific time intervals, in order to perform aggregate functions (e.g. mean) on them?
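A minimal sketch of one way to do this, assuming the interval boundaries can be given as a sorted list of edges: pd.cut assigns each date to a bin, and the resulting categorical column can be used as the groupby key. The names edges, interval and means are illustrative, the month-end dates are written out explicitly, and Val uses fixed numbers instead of random ones so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Same month-end dates as in the question, written out explicitly.
rng = pd.to_datetime(['2015-02-28', '2015-03-31', '2015-04-30', '2015-05-31',
                      '2015-06-30', '2015-07-31', '2015-08-31'])
df = pd.DataFrame({'Date': rng, 'Val': np.arange(1, 8, dtype=float)})

# Boundaries of the custom intervals; right=False makes each bin [start, end).
edges = pd.to_datetime(['2015-02-12', '2015-05-10', '2015-08-20', '2016-01-01'])
df['interval'] = pd.cut(df['Date'], bins=edges, right=False)

# Aggregate per custom interval.
means = df.groupby('interval', observed=True)['Val'].mean()
print(means)
```

Dates that fall outside all of the edges get NaN as their interval and are dropped by the groupby.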
I have the last 5 years of monthly data. I am using it to build a forecasting model with fbprophet. The last 5 months of my data are as follows:
data1['ds'].tail()
Out[86]:
55 2019-01-08
56 2019-01-09
57 2019-01-10
58 2019-01-11
59 2019-01-12
I have created the model on this and made a future prediction dataframe.
from fbprophet import Prophet

model = Prophet(
    interval_width=0.80,
    growth='linear',
    daily_seasonality=False,
    weekly_seasonality=False,
    yearly_seasonality=True,
    seasonality_mode='additive'
)
# fit the model to data
model.fit(data1)
future_data = model.make_future_dataframe(periods=4, freq='m', include_history=True)
After December 2019, I need the first four months of the next year. Instead, it adds the next 4 months within the same year, 2019.
future_data.tail()
ds
59 2019-01-12
60 2019-01-31
61 2019-02-28
62 2019-03-31
63 2019-04-30
How do I get the first 4 months of the next year in the future dataframe? Is there a specific parameter to adjust the year?
The issue is caused by the date format: 2019-01-12 (December 2019, as per your question) is written in the format "%Y-%d-%m", but it is read as "%Y-%m-%d", i.e. 12 January 2019.
Hence, Prophet creates dates at month-end frequency (specified by 'm') for the next 4 periods, starting from January 2019.
Just for reference this is how the future dataframe is created by Prophet:
dates = pd.date_range(
    start=last_date,
    periods=periods + 1,  # An extra in case we include start
    freq=freq)
dates = dates[dates > last_date]  # Drop start if equals last_date
dates = dates[:periods]  # Return correct number of periods
Hence, it extrapolates the future dataframe from the last date as it was parsed.
Solution: change the date format in the training data to "%Y-%m-%d".
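As a rough illustration of that fix, assuming the ds column still holds the raw strings (if it has already been parsed, the raw data would need to be re-read): passing an explicit format to pd.to_datetime makes the day/month order unambiguous. The sample values are the five from the question; freq='MS' is an assumption chosen here because the corrected dates fall on the first of each month, and the last three lines mirror the Prophet logic quoted above.

```python
import pandas as pd

# Raw strings as shown in the question: year-day-month order.
data1 = pd.DataFrame({'ds': ['2019-01-08', '2019-01-09', '2019-01-10',
                             '2019-01-11', '2019-01-12']})

# Parse with the format the strings are actually written in.
data1['ds'] = pd.to_datetime(data1['ds'], format='%Y-%d-%m')
last_date = data1['ds'].max()  # 2019-12-01 after correct parsing

# Same steps as the quoted Prophet snippet, with a month-start frequency.
periods = 4
dates = pd.date_range(start=last_date, periods=periods + 1, freq='MS')
dates = dates[dates > last_date]
dates = dates[:periods]
print(dates)
```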
Stumbled here searching for the appropriate frequency string for minutes.
As per the docs, the datetime needs to be in YYYY-MM-DD format:
The input to Prophet is always a dataframe with two columns: ds and y. The ds (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. The y column must be numeric, and represents the measurement we wish to forecast.
2019-01-12 in the question is 2019-12-01 in YYYY-MM-DD format; using this:
>>> dates = pd.date_range(start='2019-12-01',periods=4 + 1,freq='M')
>>> dates
DatetimeIndex(['2019-12-31', '2020-01-31', '2020-02-29', '2020-03-31',
'2020-04-30'],
dtype='datetime64[ns]', freq='M')
Other frequency strings are listed here; the Prophet docs do not give them explicitly for Python:
https://pandas.pydata.org/docs/reference/api/pandas.tseries.frequencies.to_offset.html
>>> dates = pd.date_range(start='2022-03-17 11:40:00',periods=10 + 1,freq='min')
>>> dates
DatetimeIndex(['2022-03-17 11:40:00', '2022-03-17 11:41:00',
'2022-03-17 11:42:00', '2022-03-17 11:43:00',
..],
dtype='datetime64[ns]', freq='T')
Is there any way to take the last month of data based on a given date? I have searched a lot but I can't find a good and precise solution. If the current date is at index 420, the date is 2012-01-09. I want a data frame with the data from 2011-12-09 until 2012-01-09.
import pandas as pd
import numpy as np
times = pd.DataFrame(pd.date_range('2012-01-01', '2012-04-01', freq='30min'), columns=['date'])
times['date'] = pd.to_datetime(times['date'])
times['value'] = np.random.randint(1, 6, times.shape[0])
months = times.iloc[0:420].sort_values(by='date', ascending=True).set_index('date').last('1M')
Using .last, the results only cover the calendar month of the last row (from 2012-01-01 onward), not the preceding 30 days. I understand why, but is there any way to get the last month of data without using timedelta or relativedelta? With both of those, an error appears if a date is missing, which is also a problem.
Thank you.
I think what you're looking for is pd.Period. You can convert all of your datetimes to month periods and then search using those:
# turn your datetimes to month periods
times["month"] = times["date"].dt.to_period("M")
# turn your search date to a period
your_date = pd.Period(your_date, "M")
# search times
times[times["month"] == your_date]
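Note that a Period match selects a whole calendar month. If a rolling one-month window ending at a given row is wanted instead, one option is a boolean mask between two timestamps. This is a sketch using pd.DateOffset, which is a different technique from the Period approach above; because it compares ranges rather than looking up an exact label, a missing start date should not raise an error. The names end, start and window are illustrative.

```python
import numpy as np
import pandas as pd

times = pd.DataFrame(pd.date_range('2012-01-01', '2012-04-01', freq='30min'),
                     columns=['date'])
times['value'] = np.random.randint(1, 6, times.shape[0])

# Window end: the row at index 420 (2012-01-09 18:00 in this data).
end = times['date'].iloc[420]
# One calendar month earlier; this date does not need to exist in the data.
start = end - pd.DateOffset(months=1)

# Range comparison instead of an exact label lookup.
window = times[(times['date'] >= start) & (times['date'] <= end)]
print(window['date'].min(), window['date'].max(), len(window))
```

Here the data only begins on 2012-01-01, so the window simply contains every row up to index 420.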
I would like to create a column named Date.
The first entry should be today's date, e.g. 23/07/2019,
and each following row should be the previous date + 1, i.e. 24/07/2019 and so on.
This is easily done in Excel, but I tried this simple thing in pandas and I just can't figure out how!
I already have a dataframe called df,
so putting down today's date is relatively simple:
df['Date'] = pd.Timestamp.now().date()
But I'm not sure which function would give me the date + 1 in the following rows.
Thanks
pd.date_range can use 'today' to set the dates. Normalize then create the Series yourself, otherwise pandas thinks the DatetimeIndex should be the Index too.
import pandas as pd
pd.Series(pd.date_range('today', periods=30, freq='D').normalize(),
name='Date')
0 2019-07-23
1 2019-07-24
...
28 2019-08-20
29 2019-08-21
Name: Date, dtype: datetime64[ns]
If adding a new column to the DataFrame:
df['Date'] = pd.date_range('today', periods=len(df), freq='D').normalize()
pd.date_range is what you are looking for. To build a series of 31 days starting from today:
today = pd.Timestamp.now().normalize()
s = pd.date_range(today, today + pd.Timedelta(days=30), freq='D').to_series()
I need a way to return a LIST of months, not the number of months between the start and end of a period, and can't find a solution.
For example, if a record shows that a contract period starts on 2017-04-12 and ends on 2018-04-22, I need to return the full list of months that the period covers. So the output would look like this:
Months_active
2017-04
2017-05
2017-06
2017-07
....
2018-03
2018-04
Any help would be much appreciated.
There's a nice extension for creating time series in Teradata:
SELECT To_Char(Begin(pd),'yyyy-mm') -- this extracts the year/month
FROM myTable
-- this creates one row for each month in the date range
EXPAND ON PERIOD(Trunc(start_date,'mon'), Last_Day(end_date)+1) AS pd BY ANCHOR Month_Begin
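For comparison only, and outside Teradata: the same month expansion can be sketched in pandas with pd.period_range, assuming the start and end dates are available as strings or timestamps. The variable name months is illustrative.

```python
import pandas as pd

# One monthly period for every month touched by the contract dates.
months = pd.period_range(start='2017-04-12', end='2018-04-22', freq='M')
months = months.strftime('%Y-%m').tolist()
print(months)
```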