Let's say I am training a model to predict tomorrow's sales. I have data about previous days and future days, and I know my previous sales. About tomorrow I know that it is a weekday, that it will rain, and that it is a holiday. How can I use this data to make predictions?
The dataset looks like this.
Weekday  Holiday  Weather  Sales
1        0        Rainy    25
1        0        Rainy    27
1        1        Sunny    23
0        0        Sunny    24
0        0        Cloudy   31
I created the training set from the previous 150 days and trained a multivariate LSTM. However, to make a prediction I use only the previous days' data.
I have data about tomorrow and want to use it. How can I do that?
You can shift the Weekday/Holiday/Weather columns by -1, so that each row also carries the next day's values, and use them as inputs during training. Then, at inference time, you use tomorrow's data as those inputs.
As an example please see "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition", by Aurélien Géron, p. 559:
"...df_mulvar["next_day_type"] = df["day_type"].shift(-1) # we know tomorrow's type"
This example is also available at (see section "Multivariate time series"):
https://github.com/ageron/handson-ml3/blob/main/15_processing_sequences_using_rnns_and_cnns.ipynb
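The shifting step can be sketched in pandas as follows. This is only an illustration under assumptions: the column names come from the question's table, while the helper name add_next_day_features and the next_ prefix are made up here and are not taken from the book.

```python
import pandas as pd

def add_next_day_features(df, known_ahead, target="Sales"):
    """Append next-day copies of the columns that are known in advance."""
    out = df.copy()
    for col in known_ahead:
        # shift(-1) moves tomorrow's value onto today's row
        out[f"next_{col}"] = df[col].shift(-1)
    # the value to predict is tomorrow's target
    out[f"next_{target}"] = df[target].shift(-1)
    return out

df = pd.DataFrame({
    "Weekday": [1, 1, 1, 0, 0],
    "Holiday": [0, 0, 1, 0, 0],
    "Weather": ["Rainy", "Rainy", "Sunny", "Sunny", "Cloudy"],
    "Sales":   [25, 27, 23, 24, 31],
})
features = add_next_day_features(df, ["Weekday", "Holiday", "Weather"])
# the last row has no "tomorrow" yet, so it is dropped for training
train = features.dropna()
```

At inference time the next_ columns of the most recent row would be filled with what is known about tomorrow (weekday, holiday, forecast weather) before the row is passed to the model.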
Related
I have a price series in a dataframe, with a datetime index. Just appending the top 5 rows of the price series, but I basically have data all the way from 2020-04-01.
Date        CTH3 Comdty
2022-11-28  78.95
2022-11-25  80.18
2022-11-23  82.90
2022-11-22  82.42
2022-11-21  79.78
So for example, the weekly return for 2022-11-28 should be based on the price from 5 business days ago, i.e. 2022-11-21, and so (78.95 - 79.78)/79.78 = -1.04%
I would like to calculate the week-on-week (WoW), month-on-month (MoM) and year-on-year (YoY) return for each day. For MoM and YoY, it should be based on the price from exactly 1 month or 1 year ago respectively, but if that day is not a business day and there is no price, it should take the price from the day before, and so on. For this I know I can use .ffill or .bfill in some way.
My current thinking is to use .loc and use a for loop to input the 1 week ago, 1 month ago, and 1 year ago prices as 3 different columns and then proceed to do the % calculation. But this seems a tad bit tedious. How would I go about doing this in a more efficient way?
Instead of .iloc, you can also try .iat.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iat.html
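One way to avoid the explicit loop is Series.asof, which returns the last available value at or before each requested date, so a missing business day falls back to the previous price. The sketch below is an illustration, not a tested solution for the full dataset: the function name lookback_returns is made up here, and it assumes that "1 week ago" means 7 calendar days, which matches the 2022-11-28 vs 2022-11-21 example in the question.

```python
import pandas as pd

def lookback_returns(prices: pd.Series) -> pd.DataFrame:
    """WoW/MoM/YoY returns, using the last price at or before the lookback date."""
    prices = prices.sort_index()  # asof needs an ascending index
    lookbacks = {
        "WoW": pd.DateOffset(weeks=1),
        "MoM": pd.DateOffset(months=1),
        "YoY": pd.DateOffset(years=1),
    }
    out = pd.DataFrame(index=prices.index)
    for name, offset in lookbacks.items():
        past_dates = prices.index - offset
        # NaN where the lookback date precedes the first available price
        past_prices = prices.asof(past_dates).to_numpy()
        out[name] = prices.to_numpy() / past_prices - 1
    return out

prices = pd.Series(
    [78.95, 80.18, 82.90, 82.42, 79.78],
    index=pd.to_datetime(
        ["2022-11-28", "2022-11-25", "2022-11-23", "2022-11-22", "2022-11-21"]
    ),
    name="CTH3 Comdty",
)
returns = lookback_returns(prices)
```

With only these five rows, just the WoW value for 2022-11-28 is defined; the MoM and YoY columns fill in once the series reaches back far enough.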
I have data for the period from December 2013 to November 2018. I converted it into a data frame as shown here.
Date 0.1 0.2 0.3 0.4 0.5 0.6
2013-12-01 301.04 297.4 296.63 295.76 295.25 295.25
2013-12-04 297.96 297.15 296.25 295.25 294.43 293.45
2013-12-05 298.4 297.61 296.65 295.81 294.75 293.89
2013-12-08 298.82 297.95 297.15 296.25 295.45 294.41
2013-12-09 298.65 297.65 296.95 296.02 295.13 294.05
2013-12-12 299.05 297.33 296.65 295.81 294.85 293.85
2013-12-16 301.05 300.28 299.38 298.45 297.65 296.51
....
2014-01-10 301.65 297.45 296.46 295.52 294.65 293.56
2014-01-11 301.99 298.95 298.39 297.15 296.05 295.11
2014-01-12 299.86 298.65 297.73 296.82 296.35 295.37
2014-01-13 299.25 298.15 297.3 296.43 295.26 294.31
I want to take the monthly mean and the seasonal mean of this data.
For the monthly mean I have tried
df.resample('M').mean()
And it worked well.
For seasons, I would like to decompose this data into 4 seasons of three months each (Dec-Feb, Mar-May, Jun-Aug and Sep-Nov). I tried resample with a 3-month interval, i.e.
df.resample('3M').mean()
However, this did not work well: it gives the average for the starting December month separately and then applies the 3-month interval to the calendar year (i.e. from January to March and so on).
I would like to know if there is any way to avoid this by specifying the month in which our period of consideration begins.
Moreover, I would also like to know whether we can define these seasons beforehand and group the data accordingly, to get the averages more easily.
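The origin argument of resample only shifts fixed-width frequencies (such as '3D' or '12h'), so it has no effect on month-based bins. For seasons that start in December you can use a quarterly frequency anchored on December instead:
df.resample('QS-DEC').mean()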
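For the second part of the question (defining the seasons beforehand and grouping by them), a minimal sketch is shown below. It assumes the frame has a DatetimeIndex; the labels DJF/MAM/JJA/SON, the helper name seasonal_mean and the choice to count December towards the following year's winter are assumptions made for illustration.

```python
import pandas as pd

# month number -> season label (Dec-Feb, Mar-May, Jun-Aug, Sep-Nov)
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}

def seasonal_mean(df: pd.DataFrame) -> pd.DataFrame:
    """Mean per (season_year, season); December counts towards the next year."""
    month = df.index.month
    season = pd.Index(month.map(SEASON_OF_MONTH), name="season")
    season_year = pd.Index(
        df.index.year + (month == 12).astype(int), name="season_year"
    )
    return df.groupby([season_year, season]).mean()

example = pd.DataFrame(
    {"x": [1.0, 3.0, 5.0]},
    index=pd.to_datetime(["2013-12-01", "2014-01-10", "2014-03-05"]),
)
result = seasonal_mean(example)
```

Here December 2013 and January 2014 end up in the same group, (2014, "DJF"), so a winter is never split across two calendar years.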
To preface my question: I did see the following link, and if it is the same query, I fear my skills may be too primitive to confirm whether that is the case: Flat Apportionment of values across time periods. Any help would be appreciated.
Please bear with me, as my programming skills are still relatively basic.
I was wondering if it was possible to equally apportion values over time based on a date field?
Data is based on enrolment units, with a number value for hours and start and end date values.
Eg: Row level data
Unit Name  Department   Hours  Start Date  End Date
ABC        Electrical   30     1/01/2021   31/03/2021
DEF        Hospitality  50     1/03/2021   31/04/2021
I then want to distribute the hours equally by month, based on the end date.
End result to display would be something like this:
Total Hours by Department
Department   Jan 2021  Feb 2021  Mar 2021  April 2021
Electrical   10        10        10
Hospitality                      25        25
Total        10        10        35        25
Is something like this even possible?
Again my skills are still basic and this may be a very stupid question. Apologies in advance.
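It is possible. The question does not say which tool is being used, so the following is only a sketch in Python with pandas, under stated assumptions: every calendar month touched by the start/end range gets an equal share of the hours, the column and function names are made up here, and the second unit's end date is entered as 30/04/2021 because 31/04/2021 is not a valid date.

```python
import pandas as pd

def apportion_hours(units: pd.DataFrame) -> pd.DataFrame:
    """Split each unit's hours equally over the months it spans, then pivot."""
    rows = []
    for unit in units.itertuples(index=False):
        months = pd.period_range(unit.start, unit.end, freq="M")
        for month in months:
            rows.append({
                "Department": unit.department,
                "Month": str(month),  # e.g. "2021-03"
                "Hours": unit.hours / len(months),
            })
    table = pd.DataFrame(rows).pivot_table(
        index="Department", columns="Month", values="Hours", aggfunc="sum"
    )
    table.loc["Total"] = table.sum()  # column totals, blanks are skipped
    return table

units = pd.DataFrame({
    "unit": ["ABC", "DEF"],
    "department": ["Electrical", "Hospitality"],
    "hours": [30, 50],
    "start": pd.to_datetime(["2021-01-01", "2021-03-01"]),
    "end": pd.to_datetime(["2021-03-31", "2021-04-30"]),
})
table = apportion_hours(units)
```

Months in which a department has no hours stay empty (NaN) in the resulting table, as in the desired layout.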
I'm using Prophet (Python) to predict and analyse time series in bulk. That means that my time series share the same properties, but they are not exactly the same. They all run from 2016-01-01 to 2020-07-01.
I would like to cross validate my results using the first 3 years of data, and my forecast goal is 15 days only.
What is the best configuration to test my fit using the first 3 years, aiming for a 15 days forecast?
My naive try is the one below:
df_cv = cross_validation(mts, initial="1095 days", period='31 days', horizon = '15 days')
I'm not sure what to add in the 'period' and in the 'horizon' parameters.
As mentioned in Prophet's documentation:
We specify the forecast horizon (horizon), and then optionally the size of the initial training period (initial) and the spacing between cutoff dates (period).
Thus, a forecast is made for every observed point between cutoff and cutoff + horizon.
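So horizon should be the length you want to forecast (15 days), and initial the training span before the first cutoff (about 3 years, i.e. 1095 days). period does not have to add up to anything: it only sets how far apart the cutoff dates are, and therefore how many 15-day forecasts are evaluated. If you leave it out, Prophet spaces the cutoffs half a horizon apart.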
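To see how initial, period and horizon interact, the cutoff dates can be listed with plain pandas. This sketch does not call Prophet; it assumes that cutoffs are laid out backwards from the last observation minus the horizon, in steps of period, for as long as at least initial of history remains. The function name rolling_cutoffs is made up for this illustration.

```python
import pandas as pd

def rolling_cutoffs(first_date, last_date, initial, period, horizon):
    """List cutoff dates, working backwards from last_date - horizon."""
    first_date, last_date = pd.Timestamp(first_date), pd.Timestamp(last_date)
    initial, period, horizon = (
        pd.Timedelta(x) for x in (initial, period, horizon)
    )
    cutoffs = []
    cutoff = last_date - horizon
    # keep a cutoff only if at least `initial` of history precedes it
    while cutoff >= first_date + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return cutoffs[::-1]

cutoffs = rolling_cutoffs(
    "2016-01-01", "2020-07-01",
    initial="1095 days", period="31 days", horizon="15 days",
)
```

Each cutoff corresponds to one model fit and one 15-day forecast, so a smaller period means more forecasts to evaluate and a longer run time.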
I've been working on this for a while and trying different styles... I need to develop a tracking spreadsheet (for 50 employees) that will calculate accruing vacation time and vacation time used based on the following parameters:
*Employees who have worked less than 3 years but are past their 90-day anniversary with the company (at which point they start accruing; one caveat is that the accrual date is the 1st of the month following the 90 days, which I have already worked out in my Excel sheet) accrue at:
- 4 hours a month, at the end of the month, with a MAX cap of 72 hours that they can have stored (but if they use vacation time and fall below the 72 hours they can continue to accrue again up to 72 hours...)
*Employees who have worked more than 3 years but less than 6 years with the company carry over their prior vacation and begin to accrue from here onward at:
- 6.8 hours a month, at the end of the month, with a MAX cap of 122.4 hours that they can have stored (but if they use vacation time and fall below the 122.4 hours they can continue to accrue again up to 122.4 hours...)
*Employees 6 years and over with the company carry over their prior vacation and begin to accrue from here onward at:
- 10 hours a month, at the end of the month, with a MAX cap of 180 hours that they can have stored (but if they use vacation time and fall below the 180 hours they can continue to accrue again up to 180 hours....)
& yes, I need to be able to deduct vacation time used.
Does anyone have any suggestions for layout or for a formula that can do part of these functions? I appreciate any advice or suggestion on what else I can use!
I have created a test sheet, and was accruing based on these conditions for the first and started on the second set of rules (for years 3+).
However when I accrue to the max of 72 hours on the first policy, it no longer accrues correctly if they used vacation and fall under the 72 hours cap again.
I know this is an overcomplicated policy, but that is what the company wants and they will not budge... Any help or advice is appreciated. I know my sheet isn't great, but I'm trying options.
Below is the equation I used for the 90 day:
=IF(F2<TODAY()-90, F2+90, "90 Day Period")
Then to get the first day of the month after I used:
=IF(G2="90 Day Period","N/A",DATE(YEAR(G2),MONTH(G2)+1,1))
I tried using for the accrual of the first rule (but it has issues...):
=IF(N2="N/A","N/A",IF(N2<=36,MIN(72,((N2*4)-P2),72),(72-P2)))
For second rule:
=IF(AND(N2>36,N2<=72),MIN(122.4,((N2-36)*6.8)+Q2-S2),0)
Let
column A EmpName
column B EntryDate
column C DaysSpent for the current month
column D capped accrual buffer at end of month
repeat Columns C and D month after month
example
01-Sep-2013 01-Oct-2013
EmpName EntryDate spent buffer spent buffer ... etc ...
.-------.-----------.-----------.------.-----------.------.
me 01-Jan-2010 0 12 0 18.8
you 01-Jun-2013 0 4 0 4
To avoid spaghetti formulas, I recommend creating some user-defined functions in VBA, like
Function GetCap(EntryDate, ThisDate) As Single
Function GetMonthly(EntryDate, ThisDate) As Single
In my experience it's easier to debug and maintain 2-3 nested If's or Select Case's in VBA than 92-character-long formulas with no blanks, no comments etc. in the sheet. Should the business logic change, there's one code block to review, instead of dozens or hundreds of formulas in a sheet that has grown for 3x12 months x 50 users.
The above functions may want the help of e.g.
Function EndOfMonth(MyDate) as Date
Function BeginOfNextMonth(MyDate) as Date
so that in the sheet you just
manually enter hours spent month after month
calculate new buffer as = MIN([oldbuffer] - [Spent] + GetMonthly(...), GetCap(...))
carefully use relative/absolute addressing to make the formula "copyable" across columns/rows, e.g.
row-absolute on ThisDate got from the header when copying downwards
column-absolute on EntryDate for each Emp when copying rightwards
You can of course use =GetCap(...) and =GetMonthly(...) directly in cells of your sheet to display intermediate results and for debugging purposes.
Be careful when you compare dates.
Tips:
3 years later is not always 365x3 days later
check what the VBA DateSerial() function does for months > 12 and months < 0
the end of next month is always the first day of the month 2 months ahead, minus 1 ... even in February of a leap year ggg
and post more questions if you get stuck on these functions.
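The logic behind GetCap, GetMonthly and the buffer formula can be sketched as follows. The sketch is written in Python rather than VBA, purely to show the structure. The rates and caps come from the question; the function names mirror the VBA stubs above. Two assumptions are made here: an employee at exactly 3 or 6 years moves to the higher tier, and the 90-day waiting period is ignored.

```python
from datetime import date

# (minimum full years of service, hours accrued per month, cap), highest first
TIERS = [(6, 10.0, 180.0), (3, 6.8, 122.4), (0, 4.0, 72.0)]

def years_of_service(entry_date: date, this_date: date) -> int:
    """Full anniversaries reached, compared by calendar date, not 365-day blocks."""
    years = this_date.year - entry_date.year
    if (this_date.month, this_date.day) < (entry_date.month, entry_date.day):
        years -= 1
    return years

def _tier(entry_date: date, this_date: date):
    service = years_of_service(entry_date, this_date)
    return next(t for t in TIERS if service >= t[0])

def get_monthly(entry_date: date, this_date: date) -> float:
    return _tier(entry_date, this_date)[1]

def get_cap(entry_date: date, this_date: date) -> float:
    return _tier(entry_date, this_date)[2]

def next_buffer(old_buffer, spent, entry_date, this_date):
    """New buffer = MIN(old buffer - spent + monthly accrual, cap)."""
    accrued = old_buffer - spent + get_monthly(entry_date, this_date)
    return min(accrued, get_cap(entry_date, this_date))
```

Because the cap is applied after the hours spent are deducted, an employee who drops below the cap starts accruing again the following month, which is the behaviour the question says breaks in the sheet formula. For the "me" row of the layout above, a buffer of 12 with nothing spent becomes 18.8 at the next month end.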