I'm trying to get the 13-week moving average of my inventory and sales columns. I have thousands of rows, and since new data arrives every week, I want the average over the most recent 13 weeks, counting back from the latest week.
The weekno column is formatted as YYYYWW (year followed by week number).
Does anyone have an idea how I can do this in Oracle SQL?
| weekno | inventory | sales |
| 202111| 5 | 78 |
| 202110| 6 | 50 |
| 202109| 3 | 80 |
| 202108| 2 | 75 |
| 202107| 5 | 33 |
| 202106| 8 | 77 |
| 202105| 3 | 80 |
| 202104| 2 | 75 |
| 202103| 5 | 33 |
| 202102| 8 | 77 |
| 202101| 8 | 77 |
| 202053| 2 | 75 |
| 202052| 5 | 33 |
| 202051| 8 | 77 |
| 202050| 8 | 77 |
..... and so on
You can use window functions. Assuming you have data for every week:
select t.*,
       avg(inventory) over (order by weekno rows between 12 preceding and current row) as avg_inventory_13w,
       avg(sales) over (order by weekno rows between 12 preceding and current row) as avg_sales_13w
from t;
Note: This assumes that the previous 13 weeks include the current week. If not, you would use:
avg(sales) over (order by weekno rows between 13 preceding and 1 preceding)
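If you only need the figure for the most recent week, one option is to compute the window averages and then keep the latest row. Here is a minimal sketch that runs the same window logic against the sample rows from the question, using SQLite through Python as a stand-in for Oracle (the table name t and the aliases avg_inventory / avg_sales are assumptions; in Oracle 12c or later you would write FETCH FIRST 1 ROW ONLY instead of LIMIT 1):

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table (assumed name: t)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (weekno INTEGER, inventory INTEGER, sales INTEGER)")
rows = [(202050, 8, 77), (202051, 8, 77), (202052, 5, 33), (202053, 2, 75),
        (202101, 8, 77), (202102, 8, 77), (202103, 5, 33), (202104, 2, 75),
        (202105, 3, 80), (202106, 8, 77), (202107, 5, 33), (202108, 2, 75),
        (202109, 3, 80), (202110, 6, 50), (202111, 5, 78)]
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Compute the 13-row moving averages for every week, then keep the latest week only
query = """
SELECT weekno, avg_inventory, avg_sales
FROM (
    SELECT weekno,
           AVG(inventory) OVER (ORDER BY weekno
                                ROWS BETWEEN 12 PRECEDING AND CURRENT ROW) AS avg_inventory,
           AVG(sales)     OVER (ORDER BY weekno
                                ROWS BETWEEN 12 PRECEDING AND CURRENT ROW) AS avg_sales
    FROM t
)
ORDER BY weekno DESC
LIMIT 1
"""
latest = conn.execute(query).fetchone()
print(latest)  # (weekno, 13-week average inventory, 13-week average sales)
```

The filter has to sit outside the subquery: if you restricted the rows before the window function ran, the window would no longer see the preceding 12 weeks.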
I have a table which, when sorted by week number, gives the units left of a product at a store. The units left should always be decreasing. However, there are some garbage values that make the units left in a store increase for a few weeks and then decrease again. I just have these four columns to work with, and I want to replace the garbage values with the correct value. I am looking for SQL that implements the following replacement logic: the units left for each week should be the minimum of the units left values in all rows up to and including that week, sorted by week number ascending.
e.g. here it goes to 12 for week 4 and 5 and then back to 9 - which is incorrect - they [the 12s] should each be replaced by 9
INPUT:---
+-------+------------+-------------+------------+
| Store | Product ID | Week Number | Units left |
+-------+------------+-------------+------------+
| XXX | A1 | 1 | 10.0 |
| XXX | A1 | 2 | 9 |
| XXX | A1 | 3 | 9 |
| XXX | A1 | 4 | 12 |
| XXX | A1 | 5 | 12 |
| XXX | A1 | 6 | 9 |
| XXX | A1 | 7 | 8 |
+-------+------------+-------------+------------+
OUTPUT:----
+-------+------------+-------------+------------+
| Store | Product ID | Week Number | Units left |
+-------+------------+-------------+------------+
| XXX | A1 | 1 | 10.0 |
| XXX | A1 | 2 | 9 |
| XXX | A1 | 3 | 9 |
| XXX | A1 | 4 | 9 |
| XXX | A1 | 5 | 9 |
| XXX | A1 | 6 | 9 |
| XXX | A1 | 7 | 8 |
+-------+------------+-------------+------------+
The DB is Teradata.
You could try the cumulative minimum function in Teradata.
Select Store, Product_ID, Week_Number, Units,
       MIN(Units) over (PARTITION BY Store, Product_ID
                        ORDER BY Week_Number
                        ROWS UNBOUNDED PRECEDING) as Corrected_units
from TABLE_NAME;
You can use a cumulative minimum:
select t.*,
       min(units_left) over (partition by store, product_id
                             order by week_number
                             rows between unbounded preceding and current row
                            ) as imputed_units_left
from t;
This is standard SQL syntax and should work in all the databases you originally tagged.
If you want to actually change the data -- well, the syntax varies by database.
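To check the cumulative minimum against the sample input, here is a small sketch that runs the same window expression in SQLite through Python (SQLite is only a stand-in for Teradata here, and the table name t and the snake_case column names are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (store TEXT, product_id TEXT, week_number INTEGER, units_left REAL)"
)
# Sample rows from the question: weeks 4 and 5 carry the garbage value 12
conn.executemany(
    "INSERT INTO t VALUES ('XXX', 'A1', ?, ?)",
    [(1, 10.0), (2, 9), (3, 9), (4, 12), (5, 12), (6, 9), (7, 8)],
)

# Running minimum per store/product, ordered by week
corrected = [
    row[0]
    for row in conn.execute("""
        SELECT MIN(units_left) OVER (
                   PARTITION BY store, product_id
                   ORDER BY week_number
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        FROM t
        ORDER BY week_number
    """)
]
print(corrected)  # [10.0, 9.0, 9.0, 9.0, 9.0, 9.0, 8.0]
```

The two 12s come back as 9, which matches the expected output in the question.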
I have two columns containing sales from last year and this year, and a column with the week number. I want to calculate a yearly rolling sum in a new column, running from the current week back to the same week of last year.
For example, if I'm in week 2 of this year, my rolling sum is the sum of last year's sales from week 2 through week 52 plus this year's sales up to and including week 2.
Here's an example in excel of what my table and results would look like:
Assuming your data look like this
Table
+------+----------+-------+
| week | sales_ly | sales |
+------+----------+-------+
| 1 | 65 | 100 |
+------+----------+-------+
| 2 | 93 | 130 |
+------+----------+-------+
| 3 | 83 | 134 |
+------+----------+-------+
| 4 | 3083 | 59 |
+------+----------+-------+
| 5 | 30984 | 39 |
+------+----------+-------+
| 6 | 38 | 580 |
+------+----------+-------+
| 7 | 28 | 94 |
+------+----------+-------+
| 8 | 48 | 93 |
+------+----------+-------+
| 9 | 24 | 984 |
+------+----------+-------+
| 10 | 49 | 95 |
+------+----------+-------+
You need to create two cumulatives and sum them in the same measure.
Rolling Sum =
VAR CurrentYearCumulative =
    CALCULATE(
        SUM('Table'[sales]),
        FILTER(ALLSELECTED('Table'), 'Table'[week] <= MAX('Table'[week]))
    )
VAR LastYearCumulative =
    CALCULATE(
        SUM('Table'[sales_ly]),
        FILTER(ALLSELECTED('Table'), 'Table'[week] >= MAX('Table'[week]))
    )
RETURN
    CurrentYearCumulative + LastYearCumulative
The output
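To sanity-check the arithmetic of the measure outside Power BI, here is a rough Python sketch of the same two cumulatives over the sample table. It assumes the measure is evaluated once per week row, so that MAX('Table'[week]) is simply that row's week; rolling_sum is a hypothetical helper name, not part of DAX.

```python
# (week, sales_ly, sales) rows copied from the sample table above
rows = [(1, 65, 100), (2, 93, 130), (3, 83, 134), (4, 3083, 59), (5, 30984, 39),
        (6, 38, 580), (7, 28, 94), (8, 48, 93), (9, 24, 984), (10, 49, 95)]

def rolling_sum(rows, current_week):
    """This year's sales up to current_week plus last year's sales from current_week on."""
    this_year = sum(sales for week, _, sales in rows if week <= current_week)
    last_year = sum(sales_ly for week, sales_ly, _ in rows if week >= current_week)
    return this_year + last_year

print(rolling_sum(rows, 2))   # 230 (this year, weeks 1-2) + 34430 (last year, weeks 2-10) = 34660
print(rolling_sum(rows, 10))  # 2308 (this year, weeks 1-10) + 49 (last year, week 10) = 2357
```

Note that both comparisons are inclusive, so the current week is counted once for each year, which is what the question asks for.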
I want to calculate a rolling average in a table and keep track of the starting time of each calculated window frame.
My problem is that I expect the result to have fewer rows than the table, but my query returns exactly the same number of rows. I think I understand why it does not work, but I don't know the remedy.
Let's say I have a table with example data that looks like this:
+------+-------+
| Tick | Value |
+------+-------+
| 1 | 1 |
| 2 | 3 |_
| 3 | 5 |
| 4 | 7 |_
| 5 | 9 |
| 6 | 11 |_
| 7 | 13 |
| 8 | 15 |_
| 9 | 17 |
| 10 | 19 |_
+------+-------+
I want to calculate the average of every n rows, for example of every two rows (see the marks above), so that I get a result of:
+--------------+--------------+
| OccurredTick | ValueAverage |
+--------------+--------------+
| 1 | 2 |
| 3 | 6 |
| 5 | 10 |
| 7 | 14 |
| 9 | 18 |
+--------------+--------------+
I tried that with
SELECT
FIRST_VALUE(Tick) OVER (
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
) OccurredTick,
AVG(Value) OVER (
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
) ValueAverage
FROM TableName;
What I get in return is:
+--------------+--------------+
| OccurredTick | ValueAverage |
+--------------+--------------+
| 1 | 2 |
| 2 | 4 |
| 3 | 6 |
| 4 | 8 |
| 5 | 10 |
| 6 | 12 |
| 7 | 14 |
| 8 | 16 |
| 9 | 18 |
| 10 | 19 |
+--------------+--------------+
You could use aggregation. If tick is always increasing with no gaps:
select min(tick), avg(value) avg_value
from mytable
group by cast((tick - 1) / 2 as integer)
You can change 2 to whatever group size suits you best.
If the tick values are not sequentially increasing, we can generate a sequence with row_number():
select min(tick), avg(value) avg_value
from (
select t.*, row_number() over(order by tick) rn
from mytable t
) t
group by cast((rn - 1) / 2 as integer)
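Here is a quick way to try the row_number() variant on the sample data, using SQLite through Python. This is only a sketch: SQLite divides two integers with integer division, so the cast is not needed there, whereas on a database where / returns a decimal you may want floor() to make sure the result is truncated rather than rounded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (tick INTEGER, value INTEGER)")
# Sample data from the question: ticks 1..10 with values 1, 3, 5, ..., 19
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(i, 2 * i - 1) for i in range(1, 11)])

# Rows 1-2 fall into bucket 0, rows 3-4 into bucket 1, and so on
result = conn.execute("""
    SELECT MIN(tick) AS occurred_tick, AVG(value) AS value_average
    FROM (
        SELECT tick, value, ROW_NUMBER() OVER (ORDER BY tick) AS rn
        FROM mytable
    )
    GROUP BY (rn - 1) / 2
    ORDER BY occurred_tick
""").fetchall()
print(result)  # [(1, 2.0), (3, 6.0), (5, 10.0), (7, 14.0), (9, 18.0)]
```

Because this is a GROUP BY rather than a window function, the row count drops to one row per bucket, which is the reduction the question was missing.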
I have been working on this query for most of the night, and just cannot get it to work. This is an addendum to this question. The query should find the "Seqnum" of the last Maximum over the last 10 records. I am unable to limit the last Maximum to just the window.
Below is my best effort at getting there although I have tried many other queries to no avail:
SELECT [id], high, running_max, seqnum,
MAX(CASE WHEN ([high]) = running_max THEN seqnum END) OVER (ORDER BY [id]) AS [lastmax]
FROM (
SELECT [id], [high],
MAX([high]) OVER (ORDER BY [id] ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS running_max,
ROW_NUMBER() OVER (ORDER BY [id]) as seqnum
FROM PY t
) x
Running the above query produces the following results:
+----+--------+-------------+--------+---------+
| id | high | running_max | seqnum | lastmax |
+----+--------+-------------+--------+---------+
| 1 | 28.12 | 28.12 | 1 | 1 |
| 2 | 27.45 | 28.12 | 2 | 1 |
| 3 | 27.68 | 28.12 | 3 | 1 |
| 4 | 27.4 | 28.12 | 4 | 1 |
| 5 | 28.09 | 28.12 | 5 | 1 |
| 6 | 28.07 | 28.12 | 6 | 1 |
| 7 | 28.2 | 28.2 | 7 | 7 |
| 8 | 28.7 | 28.7 | 8 | 8 |
| 9 | 28.05 | 28.7 | 9 | 8 |
| 10 | 28.195 | 28.7 | 10 | 8 |
| 11 | 27.77 | 28.7 | 11 | 8 |
| 12 | 28.27 | 28.7 | 12 | 8 |
| 13 | 28.185 | 28.7 | 13 | 8 |
| 14 | 28.51 | 28.7 | 14 | 8 |
| 15 | 28.5 | 28.7 | 15 | 8 |
| 16 | 28.23 | 28.7 | 16 | 8 |
| 17 | 27.59 | 28.7 | 17 | 8 |
| 18 | 27.6 | 28.51 | 18 | 8 |
| 19 | 27.31 | 28.51 | 19 | 8 |
| 20 | 27.11 | 28.51 | 20 | 8 |
| 21 | 26.87 | 28.51 | 21 | 8 |
| 22 | 27.12 | 28.51 | 22 | 8 |
| 23 | 27.22 | 28.51 | 23 | 8 |
| 24 | 27.3 | 28.5 | 24 | 8 |
| 25 | 27.66 | 28.23 | 25 | 8 |
| 26 | 27.405 | 27.66 | 26 | 8 |
| 27 | 27.54 | 27.66 | 27 | 8 |
| 28 | 27.65 | 27.66 | 28 | 8 |
+----+--------+-------------+--------+---------+
Unfortunately the lastmax column is taking the last max of all the previous records and not the max of the last 10 records only. The way it should result is below:
It is important to note that there can be duplicates in the "High" column, so this will need to be taken into account.
Any help would be greatly appreciated.
This isn't a bug. The issue is that high and running_max have to come from the same row. This is a confusing aspect of window functions.
Your logic in the outer query looks for a row where running_max on that row matches high on that row. That last occurred on row 8. The later maxima are "local": when each of those rows was itself evaluated, its window still contained a higher value, so running_max on that row did not equal its own high.
For instance, on row 25 the value is 27.66. That is the maximum you want from row 26 onward. But on row 25 itself, the running maximum is 28.23, which is clearly not equal to high on that row, so it doesn't match in the outer query.
I don't think you can easily do what you want using window functions. There may be some tricky way.
A version using cross apply works. I've used id for the lastmax. I'm not sure if you really need seqnum:
select py.[id], py.high, t.high as running_max, t.id as lastmax
from py cross apply
     (select top (1) t.*
      from (select top (10) t.*   -- the 10 most recent rows up to the current id
            from PY t
            where t.id <= py.id
            order by t.id desc
           ) t
      order by t.high desc, t.id desc   -- on ties in high, prefer the latest row
     ) t;
Here is a db<>fiddle.
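If it helps to see the intended logic outside SQL, here is a plain Python sketch of "position of the last maximum within the last 10 rows", using the high values from the question. The function name last_max_in_window is made up for illustration, and ids are assumed to be consecutive starting at 1, as in the sample.

```python
highs = [28.12, 27.45, 27.68, 27.4, 28.09, 28.07, 28.2, 28.7, 28.05, 28.195,
         27.77, 28.27, 28.185, 28.51, 28.5, 28.23, 27.59, 27.6, 27.31, 27.11,
         26.87, 27.12, 27.22, 27.3, 27.66, 27.405, 27.54, 27.65]

def last_max_in_window(highs, window=10):
    """Return (id, running_max, lastmax) per row, looking back at most `window` rows."""
    result = []
    for i in range(len(highs)):
        start = max(0, i - window + 1)
        # Tuples compare by high first, then by id, so ties resolve to the latest id
        best_high, best_id = max((highs[j], j + 1) for j in range(start, i + 1))
        result.append((i + 1, best_high, best_id))
    return result

for row in last_max_in_window(highs):
    print(row)
```

For row 18, for example, the window covers ids 9 to 18 and this gives a lastmax of 14 (high 28.51) instead of the 8 reported by the original query.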
I have a table with customer_number, week, and sales. I need to check if there were 12 consecutive weeks of no sales for each customer and create a flag of 0/1.
I can check the last 12 weeks or a certain time frame, but what's the best way to check for consecutive runs? Here is the code I have so far:
select * from weekly_sales
where customer_nbr in (123, 234)
and week < '2015-11-01'
and week > '2014-11-01'
order by customer_nbr, week
;
Sql Fiddle Demo
Here is a simplified version that needs only weekid and sales:
SELECT S1.weekid start_week, MAX(S2.weekid) end_week, SUM (S2.sales)
FROM Sales S1
JOIN Sales S2
ON S2.weekid BETWEEN S1.weekid and S1.weekid + 11
WHERE S1.weekid BETWEEN 1 and 25 -- your search range
GROUP BY S1.weekid
Let me know if that works for you.
OUTPUT
| start_week | end_week | |
|------------|----------|----|
| 1 | 12 | 12 |
| 2 | 13 | 8 |
| 3 | 14 | 3 |
| 4 | 15 | 2 |
| 5 | 16 | 0 | <-
| 6 | 17 | 0 | <- no sales for 12 week
| 7 | 18 | 0 | <-
| 8 | 19 | 4 |
| 9 | 20 | 9 |
| 10 | 21 | 11 |
| 11 | 22 | 15 |
| 12 | 23 | 71 |
| 13 | 24 | 78 |
| 14 | 25 | 86 |
| 15 | 25 | 86 | < - less than 12 week range
| 16 | 25 | 86 | < - below this line
| 17 | 25 | 86 |
| 18 | 25 | 86 |
| 19 | 25 | 86 |
| 20 | 25 | 82 |
| 21 | 25 | 77 |
| 22 | 25 | 75 |
| 23 | 25 | 71 |
| 24 | 25 | 15 |
| 25 | 25 | 8 |
Your final query should have
HAVING SUM (S2.sales) = 0
AND COUNT(*) = 12
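Putting the HAVING clause together with the self-join, here is a small runnable sketch in SQLite through Python. The data is made up for illustration (25 weeks, with zero sales in weeks 5 to 18), and it assumes every week has a row, including weeks with no sales; if zero-sales weeks are simply missing from the table, the COUNT(*) = 12 check will not work as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (weekid INTEGER, sales INTEGER)")
# Made-up data: 25 weeks, with no sales at all in weeks 5 through 18
conn.executemany("INSERT INTO Sales VALUES (?, ?)",
                 [(w, 0 if 5 <= w <= 18 else 1) for w in range(1, 26)])

# Start weeks of every full 12-week window whose total sales are zero
starts = [
    row[0]
    for row in conn.execute("""
        SELECT S1.weekid
        FROM Sales S1
        JOIN Sales S2
          ON S2.weekid BETWEEN S1.weekid AND S1.weekid + 11
        GROUP BY S1.weekid
        HAVING SUM(S2.sales) = 0
           AND COUNT(*) = 12
        ORDER BY S1.weekid
    """)
]
flag = 1 if starts else 0
print(starts, flag)  # [5, 6, 7] 1
```

For the per-customer flag in the question, you would also join and group on customer_nbr and set the flag to 1 for any customer that has at least one such start week.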
You could filter with BETWEEN on the week column, and you could also use COUNT(column) to improve performance.
Then you only have to check whether the result is greater than 0.