Forecasting with Postgres - sql

I'm trying to do some forecasting on some data I have. The tables below are just examples.
Basically I have an integer value x for today's date in a table.
todays_date, x
07/15/2018, 3
I have another query that has generated the monthly avg change of x for the past 3 years.
month, change
jan, 1
feb, 2
mar, 1
apr,-1
may, 1
jun, -2
jul, 2 ...
All I want to do now is create entries in a new table that hold the "forecast" for the next 6 months: add the current month's historical avg change to the current value of x, then keep adding each following month's change to that running number, putting one row in the table for each month.
todays_date, forecast_date, value
07/15/2018, 08/01/2018, 6
07/15/2018, 09/01/2018, 8
07/15/2018, 10/01/2018, 9
07/15/2018, 11/01/2018, 11
07/15/2018, 12/01/2018, 13
07/15/2018, 01/01/2019, 13
I could do this in Go but I would much rather do it in Postgres and possibly create a trigger to populate this forecast table.
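The roll-forward described above can be sketched outside the database to pin down the arithmetic. This is a minimal Python sketch under stated assumptions, not a Postgres solution: the function name forecast is hypothetical, the Aug-Dec change values are made-up placeholders (the question elides them), and the logic follows the written description (the first row adds the current month's change), so its numbers are not meant to reproduce the example table.

```python
from datetime import date

def forecast(today, x, monthly_change, months=6):
    """Return (today, forecast_date, value) rows for the next `months` months."""
    rows = []
    value = x
    year, month = today.year, today.month
    for _ in range(months):
        # Add the average change of the month we are leaving...
        value += monthly_change[month]
        # ...and date the result on the first day of the following month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
        rows.append((today, date(year, month, 1), value))
    return rows

# Jan-Jul come from the question; Aug-Dec are hypothetical placeholders.
monthly_change = {1: 1, 2: 2, 3: 1, 4: -1, 5: 1, 6: -2, 7: 2,
                  8: 1, 9: 1, 10: 1, 11: 1, 12: 1}

rows = forecast(date(2018, 7, 15), 3, monthly_change)
for row in rows:
    print(row)
```

Each returned tuple corresponds to one row of the forecast table; the year rolls over after December, so the sixth row is dated January of the following year.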

Related

Ensuring years and months are running as part of data cleaning

I have 2 datasets:
rainfall per month (mm) from 1982-01 to 2022-08
no. of rainy days per month per year from 1982-01 to 2022-08.
month no_of_rainy_days
0 1982-01 10
1 1982-02 5
2 1982-03 11
3 1982-04 14
4 1982-05 10
month total_rainfall
0 1982-01 107.1
1 1982-02 27.8
2 1982-03 160.8
3 1982-04 157.0
4 1982-05 102.2
Qn 1: As part of ensuring data integrity, how do I ensure that the dates run consecutively, i.e., that 1982-01 is followed by 1982-02 and does not skip to 1982-03?
I am unsure how to perform this check and have searched online without success. Is it common practice to assume that the years and months run consecutively?
First, separate the year from the month.
df.rename(columns={"month": "ym"}, inplace=True)
df[["year", "month"]] = df["ym"].astype(str).str.split("-", expand=True)
Then you can group the dataframe by year and count the number of observations per year (counts number of rows per year).
observations_per_year = df["year"]\
.groupby(df["year"])\
.agg("count")\
.reset_index(name="observations")
observations_per_year[observations_per_year["observations"] < 12]
Any years with fewer than 12 observations will be displayed like so:
year observations
0 1982 11
4 1986 11
5 1987 11
6 1988 10
11 1993 11
Given the lack of detail and no sample data provided, I made some assumptions about your data:
Each data set will not have more than one row for any month of the year (i.e., a maximum of 12 rows/observations per year).
Each dataframe contains a single observation per row, as shown in your examples (so you would do this for each dataframe prior to merging them). As such, counting rows per year is an accurate means of counting the number of months observed in that year.
The sorted order of the data is irrelevant (you can later sort by year-month if needed).
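The per-year count above flags incomplete years but does not say which month is missing. As a complement, here is a minimal plain-Python sketch that lists the absent months directly; the function name missing_months is hypothetical, and it assumes every label is a "YYYY-MM" string like the ones in the question.

```python
def missing_months(labels):
    """Return the 'YYYY-MM' labels absent between the earliest and latest label."""
    def to_index(label):
        # Map 'YYYY-MM' to a running month number so gaps are easy to spot.
        year, month = label.split("-")
        return int(year) * 12 + int(month) - 1

    seen = {to_index(label) for label in labels}
    return [
        f"{i // 12:04d}-{i % 12 + 1:02d}"
        for i in range(min(seen), max(seen) + 1)
        if i not in seen
    ]

months = ["1982-01", "1982-02", "1982-04", "1982-05"]
print(missing_months(months))  # ['1982-03']
```

An empty list means the months run consecutively from the first label to the last. Duplicated months are not reported by this sketch; they would need a separate check.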

Create Date in Google Big Query using Newly Created Column

I have a view that converts fiscal year periods to calendar periods, creating a new column called "NewPeriod". I would then like to create a date from this "NewPeriod" column using the Date() function, Date(Year, NewPeriod, "1"). I am unable to use NewPeriod in the Date function. Is there a way I can accomplish this in the same view?
SELECT DISTINCT
  company_code,
  Period,
  Year,
  CASE COMPANY_CODE
    WHEN 1 THEN CASE Period
      WHEN 4 THEN 1
      WHEN 5 THEN 2
      WHEN 6 THEN 3
      WHEN 7 THEN 4
      WHEN 8 THEN 5
      WHEN 9 THEN 6
      WHEN 10 THEN 7
      WHEN 11 THEN 8
      WHEN 12 THEN 9
      WHEN 1 THEN 10
      WHEN 2 THEN 11
      WHEN 3 THEN 12
      ELSE Period
    END
    ELSE Period
  END AS NewPeriod,
FROM
  `table`
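For reference, the period mapping in the CASE expression above is a fixed shift, which can be checked with a few lines of Python. This is only a sketch of the arithmetic, not a BigQuery answer: the function names new_period and period_start are hypothetical, and using Year unchanged is an assumption, since the question does not say whether the year should also shift for periods 1-3.

```python
from datetime import date

def new_period(company_code, period):
    """Mirror the CASE expression: for company 1, fiscal period 4 is calendar month 1."""
    if company_code == 1 and 1 <= period <= 12:
        return (period + 8) % 12 + 1
    return period  # other companies keep their period unchanged

def period_start(company_code, year, period):
    # Assumption: Year is used as-is when building the date.
    return date(year, new_period(company_code, period), 1)

print(new_period(1, 4), new_period(1, 12), new_period(1, 3))  # 1 9 12
print(period_start(1, 2018, 5))  # 2018-02-01
```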

Tableau: How to get moving average with respect to day of week in last 4 weeks?

e.g: If I have the data as below:
        S  M  T  W  T  F  S
Week 1  2  5  6  7  5  5  3
Week 2  4  5  7  2  4  3  2
Week 3  4  5  2  1  2  7  8
If today is Monday, my average will be (5+5+5)/3, which is 5. Tomorrow it will be (6+7+2)/3, which is 5 again, and the day after it will be (7+2+1)/3, which is 3.33.
How to get this in Tableau?
First, you can use "Weekday" as a column or row (by right-clicking on the date).
Then you can simply add a "Moving Average" table calculation with the specific computing dimension "Week of [Date]".
Data source used: Tableau Sample Superstore.
You can do the following:
Columns: Week(Order Date)
Rows: Weekday(Order Date)
Put Sales in Text.
Right-click Sales > Quick Table Calculation > Moving Average
Right-click Sales > Edit Quick Table Calculation
Set the following:
Moving along: "Table across"
Previous values: 4
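To sanity-check whatever Tableau returns, the per-weekday averages from the question can be reproduced in a few lines of Python. This is a minimal sketch using the three weeks of sample data from the question; the names weekday_average and window are hypothetical, and with fewer weeks than the window it simply averages what is available.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

# One list per week, Sunday first, copied from the question.
weeks = [
    [2, 5, 6, 7, 5, 5, 3],
    [4, 5, 7, 2, 4, 3, 2],
    [4, 5, 2, 1, 2, 7, 8],
]

def weekday_average(weeks, window=4):
    """Average each weekday over the most recent `window` weeks."""
    recent = weeks[-window:]
    return {
        day: sum(week[i] for week in recent) / len(recent)
        for i, day in enumerate(DAYS)
    }

averages = weekday_average(weeks)
print(averages["Mon"], averages["Tue"], round(averages["Wed"], 2))  # 5.0 5.0 3.33
```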

SQL Teradata - in query create new column that multiplies column by 2 if certain value is true

I have a SQL query I'm running that exports 2 columns, cost and months. The months column has a value of either 6 or 12. I want to create a new column that checks the months column: if the months value is 6, multiply the cost column by 2, and if the months value is 12, just copy the number from the cost column. Sample data:
cost months
100 6
200 12
400 6
expected result:
cost months total
100 6 200
200 12 200
400 6 800
A simple case statement should work:
select
cost,
months,
case when months = 6 then cost * 2
else cost
end as total
from <your table>

how to get a moving average in sql

If I have data from week 1 to week 52 and I want a 4-week moving average that advances 1 week at a time, how can I write a SQL query for this? For example, for week 5 I want the week1-week4 average, for week 6 the week2-week5 average, and so on.
I have the columns week and target_value in table A.
Sample data is like this:
Week target_value
1 20
2 10
3 10
4 20
5 60
6 20
The output I want starts from week 5, because week 1-week 4 is the first complete 4-week window; no earlier week has four prior weeks available.
Output data will look like:
Week Output
5 15 (20+10+10+20)/4=15 Moving Average week1-week4
6 25 (10+10+20+60)/4=25 Moving Average week2-week5
The data is in hive but I can move it to oracle if it is simpler to do this there.
SELECT
  Week,
  (SELECT ISNULL(AVG(B.target_value), A.target_value)
   FROM tblA B
   WHERE (B.Week < A.Week)
     AND B.Week >= (A.Week - 4)
  ) AS Moving_Average
FROM tblA A
The ISNULL keeps you from getting a null for your first week since there is no week 0. If you want it to be null, then just leave the ISNULL function out.
If you want it to start at week 5 only, then add the following line to the end of the SQL that I wrote:
WHERE A.Week > 4
Results:
Week Moving_Average
1 20
2 20
3 15
4 13
5 15
6 25
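The window logic (average of the four weeks before the current one) can be checked against the sample data with a short Python sketch. This is an illustration of the arithmetic only, not a Hive or Oracle query; the function name trailing_average is hypothetical, and unlike the results above it skips weeks that do not have a full four-week history, which matches the output requested in the question.

```python
def trailing_average(rows, window=4):
    """Average the `window` weeks preceding each week; skip weeks lacking a full window."""
    values = dict(rows)
    result = {}
    for week in sorted(values):
        prior = [values[w] for w in range(week - window, week) if w in values]
        if len(prior) == window:
            result[week] = sum(prior) / window
    return result

# (week, target_value) pairs from the question's sample data.
data = [(1, 20), (2, 10), (3, 10), (4, 20), (5, 60), (6, 20)]
print(trailing_average(data))  # {5: 15.0, 6: 25.0}
```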