How to calculate smoothed moving average using Postgres SQL - sql

I have a table in my Postgresql DB that has the fields: product_id, date, sales_amount.
I am calculating a simple moving average over the last week using the SQL below:
SELECT date,
AVG(amount)
OVER(PARTITION BY product_id ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_amount
FROM sales
How can I calculate a smoothed moving average (SMMA) instead of the simple moving average above? I have found that the formula is smma_today = (smma_yesterday * (lookback_period - 1) + amount) / lookback_period,
but how do I translate it to SQL?
A suggestion using a CTE, a function, or a plain query would be appreciated.

I am pretty sure you will need recursion since your formula depends on using a value calculated for a previous row in the current row.
with recursive mma as (
(select distinct on (product_id) *, ddate as basedate,
amount as sm_mov_avg
from smma
order by product_id, ddate)
union all
select smma.*, mma.basedate,
( mma.sm_mov_avg
* least(smma.ddate - mma.basedate, 6)
+ smma.amount) / least(smma.ddate - mma.basedate + 1, 7)
from mma
join smma on smma.product_id = mma.product_id
and smma.ddate = mma.ddate + 1
)
select ddate, product_id, amount, round(sm_mov_avg, 2) as sm_mov_avg,
round(
avg(amount) over (partition by product_id
order by ddate
rows between 6 preceding
and current row), 2) as mov_avg
from mma;
Please note how the smooth moving average and the moving average begin to diverge after you reach the lookback of seven days:
ddate | product_id | amount | sm_mov_avg | mov_avg
:--------- | ---------: | -----: | ---------: | ------:
2020-11-01 | 1 | 8 | 8.00 | 8.00
2020-11-02 | 1 | 4 | 6.00 | 6.00
2020-11-03 | 1 | 7 | 6.33 | 6.33
2020-11-04 | 1 | 9 | 7.00 | 7.00
2020-11-05 | 1 | 4 | 6.40 | 6.40
2020-11-06 | 1 | 6 | 6.33 | 6.33
2020-11-07 | 1 | 4 | 6.00 | 6.00
2020-11-08 | 1 | 1 | 5.29 | 5.00
2020-11-09 | 1 | 8 | 5.67 | 5.57
2020-11-10 | 1 | 10 | 6.29 | 6.00
2020-11-11 | 1 | 8 | 6.54 | 5.86
2020-11-12 | 1 | 4 | 6.17 | 5.86
2020-11-13 | 1 | 3 | 5.72 | 5.43
2020-11-14 | 1 | 2 | 5.19 | 5.14
2020-11-15 | 1 | 5 | 5.16 | 5.71
2020-11-16 | 1 | 8 | 5.57 | 5.71
2020-11-17 | 1 | 4 | 5.34 | 4.86
2020-11-18 | 1 | 10 | 6.01 | 5.14
2020-11-19 | 1 | 5 | 5.86 | 5.29
2020-11-20 | 1 | 3 | 5.46 | 5.29
2020-11-21 | 1 | 3 | 5.10 | 5.43
2020-11-22 | 1 | 9 | 5.66 | 6.00
2020-11-23 | 1 | 7 | 5.85 | 5.86
2020-11-24 | 1 | 1 | 5.16 | 5.43
2020-11-25 | 1 | 10 | 5.85 | 5.43
2020-11-26 | 1 | 7 | 6.01 | 5.71
2020-11-27 | 1 | 8 | 6.30 | 6.43
2020-11-28 | 1 | 8 | 6.54 | 7.14
2020-11-29 | 1 | 1 | 5.75 | 6.00
2020-11-30 | 1 | 9 | 6.21 | 6.29
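To make the arithmetic behind the recursive query easier to check by hand, here is a minimal Python sketch of the same recurrence. It is an illustration rather than a replacement for the SQL: the function name `smoothed_moving_average` is hypothetical, and it assumes one amount per consecutive day for a single product.

```python
def smoothed_moving_average(amounts, lookback=7):
    """Smoothed moving average for consecutive daily amounts of one product.

    While fewer than `lookback` days have been seen, the weight of the
    previous value grows with the day count, so the result equals the
    simple average. After that the previous value keeps a fixed weight
    of (lookback - 1), as in the recursive query.
    """
    result = []
    for day_index, amount in enumerate(amounts):
        if day_index == 0:
            smma = float(amount)  # base row: the first amount
        else:
            # Mirrors least(ddate - basedate, 6) for lookback = 7
            weight = min(day_index, lookback - 1)
            smma = (result[-1] * weight + amount) / (weight + 1)
        result.append(smma)
    return result


# First ten amounts from the result table
amounts = [8, 4, 7, 9, 4, 6, 4, 1, 8, 10]
smma_values = [round(v, 2) for v in smoothed_moving_average(amounts)]
print(smma_values)
```

The rounded values agree with the sm_mov_avg column for 2020-11-01 through 2020-11-10, including the first divergence from mov_avg on 2020-11-08.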

Many thanks to Mike Organek, whose answer showed that a recursive query was the way forward for this calculation. I start off with the simple moving average at some distant point in the past and use the smoothed average for every day after that, which has given us exactly what we needed:
with recursive my_table_with_rn as
(
SELECT
product_id,
amount,
sale_date,
row_number() over (partition by product_id order by sale_date) as rn
FROM sale
where 1=1
and sale_date > '01-Jan-2018'
order by sale_date
),
rec_query(rn, product_id, amount, sale_date, smma) as
(
SELECT
rn,
product_id,
amount,
sale_date,
AVG(amount) OVER(PARTITION BY product_id ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS smma --seed for the recursion; only the rn = 1 row is in scope here, so this equals that row's amount
from my_table_with_rn
where rn = 1
union all
select
t.rn, t.product_id, t.amount, t.sale_date,
(p.smma * (7 - 1) + t.amount) / 7 -- 7 is the lookback_period; formula is smma_today = (smma_previous * (lookback_period - 1) + amount) / lookback_period
from rec_query p
join my_table_with_rn t on
(t.rn = p.rn + 1 AND t.product_id = p.product_id)
)
SELECT * FROM rec_query
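The fixed-weight step used in this query can also be read as "move the previous value one seventh of the way towards today's amount". The Python sketch below shows both forms side by side. The function names are hypothetical, and the seed is simply the first amount, which is an assumption about how the series is started.

```python
def smma_seeded(amounts, lookback=7):
    """Seed with the first amount, then apply the fixed-weight update."""
    values = [float(amounts[0])]
    for amount in amounts[1:]:
        previous = values[-1]
        values.append((previous * (lookback - 1) + amount) / lookback)
    return values


def smma_incremental(amounts, lookback=7):
    """Algebraically the same update: previous + (amount - previous) / lookback."""
    values = [float(amounts[0])]
    for amount in amounts[1:]:
        values.append(values[-1] + (amount - values[-1]) / lookback)
    return values


sample = [8, 4, 7, 9, 4, 6, 4, 1]
seeded = smma_seeded(sample)
incremental = smma_incremental(sample)
print([round(v, 4) for v in seeded])
```

Because the weights are fixed from the second row onwards, the early values depend heavily on the seed, which is why starting far enough in the past matters.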

Related

How can I create duplicate records based on the value in another table

I have two tables in my database. The Work_Order table is the source table where work order information is stored. I also have a Work_Schedule table, which contains the work schedules that tell people on the production floor what to build, when, and how much.
Work_Order Table looks like
Work order ItemCode Size Qty Qty_per_HR
41051 600111 14L-16.1 55 10
I want to duplicate the above line of the Work_Order table based on the quantity per hour and automatically create a work schedule as shown below,
where TARGET = Work_Order.Qty/Work_Order.Qty_per_HR
Work_Schedule Table
Id Start Date/Time End Date/Time Work Order Work Center TARGET ACTUAL
1001 2019-07-22 7:00AM 2019-07-22 8:00AM 41051 1 10
1001 2019-07-22 8:00AM 2019-07-22 9:00AM 41051 1 10
1001 2019-07-22 9:00AM 2019-07-22 10:00AM 41051 1 10
1001 2019-07-22 10:15AM 2019-07-22 11:00AM 41051 1 10
1001 2019-07-22 11:00AM 2019-07-22 12:00PM 41051 1 10
1001 2019-07-22 1:30PM 2019-07-22 2:30PM 41051 1 5
My plan is to use an AFTER INSERT trigger so that the duplicates are created as soon as the user creates a work order.
This seems like a natural for a recursive CTE:
with cte as (
select convert(datetime, '2019-07-22 7:00AM') as dt, workorder, 1 as workcenter, qtyperh as target,
itemcode, size, (qty - qtyperh) as qty, qtyperh
from t
union all
select dateadd(hour, 1, dt), workorder, workcenter,
(case when qty > qtyperh then qtyperh else qty end) as target,
itemcode, size, (qty - qtyperh), qtyperh
from cte
where qty > 0
)
select cte.*,
dateadd(second, 60 * 60 * target / qtyperh, dt) as end_dt
from cte
order by workorder, dt;
Here is a db<>fiddle.
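For readers who want to trace what the recursive CTE produces, the loop below is a rough Python equivalent. It is only a sketch: `hourly_schedule` is a hypothetical name, slots are assumed to start exactly one hour apart, and the breaks visible in the question's schedule (the 10:15AM and 1:30PM starts) are not modelled.

```python
from datetime import datetime, timedelta


def hourly_schedule(start, qty, qty_per_hour):
    """Split qty into hourly slots of at most qty_per_hour units each."""
    rows = []
    remaining = qty
    slot_start = start
    while remaining > 0:
        target = min(remaining, qty_per_hour)
        # A partial final slot is shortened in proportion to its target,
        # like dateadd(second, 60 * 60 * target / qtyperh, dt) in the query.
        slot_end = slot_start + timedelta(seconds=3600 * target / qty_per_hour)
        rows.append((slot_start, slot_end, target))
        remaining -= target
        slot_start += timedelta(hours=1)
    return rows


schedule = hourly_schedule(datetime(2019, 7, 22, 7, 0), qty=55, qty_per_hour=10)
for slot_start, slot_end, target in schedule:
    print(slot_start, slot_end, target)
```

For the sample work order (55 units at 10 per hour) this gives five full slots and a final half-hour slot with a target of 5.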
Is this what you are after?
CREATE TABLE T(
WorkOrder INT,
ItemCode INT,
Size VARCHAR(25),
Qty INT,
QtyPerH INT
);
INSERT INTO T VALUES
(41051, 600111, '14L-16.1', 55, 10),
(41052, 600112, '14L-16.2', 55, 5);
SELECT T.*
FROM T CROSS APPLY
(
SELECT 1 N
FROM master..spt_values
WHERE [Type] = 'P'
AND
[Number] < (T.Qty / T.QtyPerH)
) TT;
Returns:
+-----------+----------+----------+-----+---------+
| WorkOrder | ItemCode | Size | Qty | QtyPerH |
+-----------+----------+----------+-----+---------+
| 41051 | 600111 | 14L-16.1 | 55 | 10 |
| 41051 | 600111 | 14L-16.1 | 55 | 10 |
| 41051 | 600111 | 14L-16.1 | 55 | 10 |
| 41051 | 600111 | 14L-16.1 | 55 | 10 |
| 41051 | 600111 | 14L-16.1 | 55 | 10 |
| 41052 | 600112 | 14L-16.2 | 55 | 5 |
| 41052 | 600112 | 14L-16.2 | 55 | 5 |
| 41052 | 600112 | 14L-16.2 | 55 | 5 |
| 41052 | 600112 | 14L-16.2 | 55 | 5 |
| 41052 | 600112 | 14L-16.2 | 55 | 5 |
| 41052 | 600112 | 14L-16.2 | 55 | 5 |
| 41052 | 600112 | 14L-16.2 | 55 | 5 |
| 41052 | 600112 | 14L-16.2 | 55 | 5 |
| 41052 | 600112 | 14L-16.2 | 55 | 5 |
| 41052 | 600112 | 14L-16.2 | 55 | 5 |
| 41052 | 600112 | 14L-16.2 | 55 | 5 |
+-----------+----------+----------+-----+---------+
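The row counts in that result follow from integer division: 55 / 10 is 5 and 55 / 5 is 11. The Python sketch below reproduces the counts and adds an optional rounding-up mode, in case the final partial hour should get a row of its own. The function name and the `round_up` flag are hypothetical and not part of the query above.

```python
def duplicate_rows(work_orders, round_up=False):
    """Repeat each work order once per (whole) hour of production."""
    duplicated = []
    for order in work_orders:
        qty, rate = order["qty"], order["qty_per_hour"]
        if round_up:
            copies = -(-qty // rate)  # ceiling division: count the partial hour too
        else:
            copies = qty // rate      # integer division, like T.Qty / T.QtyPerH
        duplicated.extend([order] * copies)
    return duplicated


work_orders = [
    {"work_order": 41051, "qty": 55, "qty_per_hour": 10},
    {"work_order": 41052, "qty": 55, "qty_per_hour": 5},
]
rows = duplicate_rows(work_orders)
print(len(rows))
```

With `round_up=True` the first work order gets six rows instead of five, which matches the six schedule lines in the question.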

How to find latest effective price for each month DB2 SQL

I'm trying to build a query that will return the latest established price for each month within a year for two price types. A price may or may not change each month for each item and type. Below is my source table:
item | date | type | price
------------------------------
itm1 | 20180101 | 1 | 3
itm1 | 20180101 | 2 | 1
itm1 | 20180105 | 1 | 5
itm2 | 20180101 | 1 | 8
itm2 | 20180103 | 2 | 6
itm2 | 20180105 | 2 | 5
itm3 | 20171215 | 1 | 7
itm3 | 20180201 | 1 | 9
itm3 | 20180201 | 2 | 10
And this is what I'm trying to achieve:
item | YYYYMM | type1_prc | type1_last(max) | type2_prc | type2_last(max)
| | | effective_date | | effective_date
---------------------------------------------------------------------------
itm1 | 201801 | 5 | 20180105 | 1 | 20180101
itm2 | 201801 | 8 | 20180101 | 5 | 20180105
itm3 | 201801 | 7 | 20171215 | - | -
itm1 | 201802 | 5 | 20180105 | 1 | 20180101
itm2 | 201802 | 8 | 20180101 | 5 | 20180105
itm3 | 201802 | 9 | 20180201 | 10 | 20180201
Thank you!
DB2 doesn't have (to the best of my knowledge) aggregation functions for getting the first element. One relatively simple way is to use the first_value() window function:
select distinct item, to_char(date, 'YYYY-MM') as yyyymm,
first_value(price) over (partition by item, to_char(date, 'YYYY-MM') order by type asc, date desc) as type1_price,
max(case when type = 1 then date end) over (partition by item, to_char(date, 'YYYY-MM')) as type1_date,
first_value(price) over (partition by item, to_char(date, 'YYYY-MM') order by type desc, date desc) as type2_price,
max(case when type = 2 then date end) over (partition by item, to_char(date, 'YYYY-MM')) as type2_date
from t
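The expected output in the question also carries a price forward into months where nothing changed (itm1 keeps its January prices in 201802). One way to think about that requirement is an "as of month end" lookup. The Python sketch below illustrates that idea on the question's sample rows. It is not DB2 SQL, the function name is hypothetical, and it assumes the dates are stored as yyyymmdd integers.

```python
# (item, date as yyyymmdd, type, price) from the question's source table
rows = [
    ("itm1", 20180101, 1, 3), ("itm1", 20180101, 2, 1), ("itm1", 20180105, 1, 5),
    ("itm2", 20180101, 1, 8), ("itm2", 20180103, 2, 6), ("itm2", 20180105, 2, 5),
    ("itm3", 20171215, 1, 7), ("itm3", 20180201, 1, 9), ("itm3", 20180201, 2, 10),
]


def latest_price_as_of(rows, item, price_type, month):
    """Return (price, effective_date) of the latest row on or before the end
    of `month` (a YYYYMM integer), or None if no price exists yet."""
    cutoff = month * 100 + 31  # every yyyymmdd value in the month is <= this
    candidates = [
        (date, price)
        for row_item, date, row_type, price in rows
        if row_item == item and row_type == price_type and date <= cutoff
    ]
    if not candidates:
        return None
    date, price = max(candidates)  # latest effective date wins
    return price, date


for month in (201801, 201802):
    for item in ("itm1", "itm2", "itm3"):
        print(item, month,
              latest_price_as_of(rows, item, 1, month),
              latest_price_as_of(rows, item, 2, month))
```

On the sample data this reproduces every cell of the desired result, with `None` where the question shows a dash.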

Obtain MIN() and MAX() over non-consecutive values in PostgreSQL

I have a problem that I can't find a solution to. This is my scenario:
parent_id | transaction_code | way_to_pay | type_of_receipt | unit_price | period | series | number_from | number_to | total_numbers
10 | 2444 | cash | local | 15.000 | 2018 | A | 19988 | 26010 | 10
This is the result of grouping by parent_id, transaccion_code, way_to_pay, type_of_receipt, unit_price, periodo, series and selecting MIN(number), MAX(number) and COUNT(number). But the grouping hides the fact that the numbers are not consecutive, because this is what my child rows look like:
parent_id | child_id | number
10 | 1 | 19988
10 | 2 | 19989
10 | 3 | 19990
10 | 4 | 19991
10 | 5 | 22001
10 | 6 | 22002
10 | 7 | 26007
10 | 8 | 26008
10 | 9 | 26009
10 | 10 | 26010
What is the magic SQL to achieve the following?
parent_id | transaction_code | way_to_pay | type_of_receipt | unit_price | period | series | number_from | number_to | total_numbers
10 | 2444 | cash | local | 15.000 | 2018 | A | 19988 | 19991 | 4
10 | 2444 | cash | local | 15.000 | 2018 | A | 22001 | 22002 | 2
10 | 2444 | cash | local | 15.000 | 2018 | A | 26007 | 26010 | 4
You can identify adjacent numbers by subtracting a sequence. It would help if you showed your query, but the idea is this:
select parent_id, transaccion_code, way_to_pay, type_of_receipt, unit_price, periodo, series,
min(number), max(number), count(*)
from (select t.*,
row_number() over
(partition by parent_id, transaccion_code, way_to_pay, type_of_receipt, unit_price, periodo, series
order by number
) as seqnum
from t
) t
group by parent_id, transaccion_code, way_to_pay, type_of_receipt, unit_price, periodo, series,
(number - seqnum);
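The trick works because, within a run of consecutive numbers, the number and its row number increase in step, so their difference stays constant and changes only at a gap. A small Python sketch of the same idea on the child numbers from the question follows. The function name is hypothetical, and the numbers are assumed to be distinct.

```python
from itertools import groupby


def consecutive_ranges(numbers):
    """Group distinct integers into runs of consecutive values.

    Returns (number_from, number_to, total_numbers) per run.
    """
    ordered = sorted(numbers)
    # number - seqnum is constant inside a run, like (number - seqnum) in the query
    keyed = [(number - seqnum, number) for seqnum, number in enumerate(ordered, start=1)]
    ranges = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        run = [number for _, number in group]
        ranges.append((run[0], run[-1], len(run)))
    return ranges


child_numbers = [19988, 19989, 19990, 19991, 22001, 22002, 26007, 26008, 26009, 26010]
ranges = consecutive_ranges(child_numbers)
print(ranges)
# [(19988, 19991, 4), (22001, 22002, 2), (26007, 26010, 4)]
```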

max(sum(field)) query in Hive/SQL

I have a table with lots of transactions for users across a month.
I need to take the hour from each day where Sum(cost) is at its highest.
I've tried MAX(SUM(Cost)) but get an error.
How would I go about doing this please?
Here is some sample data:
+-------------+------+----------+------+
| user id | hour | date | Cost |
+-------------+------+----------+------+
| 343252 | 13 | 20170101 | 21.5 |
| 32532532 | 13 | 20170101 | 22.5 |
| 35325325 | 13 | 20170101 | 30.5 |
| 325325325 | 13 | 20170101 | 10 |
| 64643643 | 12 | 20170101 | 22 |
| 643643643 | 12 | 20170101 | 31 |
| 436325234 | 13 | 20170101 | 15 |
| 213213213 | 13 | 20170101 | 12 |
| 53265436436 | 17 | 20170101 | 19 |
+-------------+------+----------+------+
Expected Output:
I need just one row per day, where it shows the total cost from the 'most expensive' hour. In this case, 13:00 had a total cost of 111.5
select hr
,dt
,total_cost
from (select dt
,hr
,sum(cost) as total_cost
,row_number () over
(
partition by dt
order by sum(cost) desc
) as rn
from mytable
group by dt,hr
) t
where rn = 1
+----+------------+------------+
| hr | dt | total_cost |
+----+------------+------------+
| 13 | 2017-01-01 | 111.5 |
+----+------------+------------+
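The query does two things in order: it sums the cost per (date, hour), then keeps the highest total for each date. The Python sketch below walks through the same two steps on the sample data. The function name is hypothetical, and ties between hours are resolved by keeping whichever hour was seen first, which is an assumption.

```python
from collections import defaultdict


def most_expensive_hour_per_day(transactions):
    """Map each date to (hour, total_cost) of its highest-cost hour."""
    # Step 1: sum(cost) ... group by dt, hr
    totals = defaultdict(float)
    for date, hour, cost in transactions:
        totals[(date, hour)] += cost
    # Step 2: keep the top total per date (the rn = 1 row)
    best = {}
    for (date, hour), total in totals.items():
        if date not in best or total > best[date][1]:
            best[date] = (hour, total)
    return best


# (date, hour, cost) from the sample data
transactions = [
    (20170101, 13, 21.5), (20170101, 13, 22.5), (20170101, 13, 30.5),
    (20170101, 13, 10), (20170101, 12, 22), (20170101, 12, 31),
    (20170101, 13, 15), (20170101, 13, 12), (20170101, 17, 19),
]
best = most_expensive_hour_per_day(transactions)
print(best)
# {20170101: (13, 111.5)}
```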
Try this:
select AVG(hour) as 'Hour',date as 'Date',sum(cost) as 'TotalCost' from dbo.Table_3 group by date

SQL sum of multiple groups per group

I had a rather large error in my previous question:
select earliest date from multiple rows
The answer by horse_with_no_name returns a perfect result, and I am hugely appreciative. However, I got my own initial question wrong, for which I apologise. Take a look at the table below:
circuit_uid |customer_name |rack_location |reading_date | reading_time | amps | volts | kw | kwh | kva | pf | key
--------------------------------------------------------------------------------------------------------------------------------------
cu1.cb1.r1 | Customer 1 | 12.01.a1 | 2012-01-02 | 00:01:01 | 4.51 | 229.32 | 1.03 | 87 | 1.03 | 0.85 | 15
cu1.cb1.r1 | Customer 1 | 12.01.a1 | 2012-01-02 | 01:01:01 | 4.18 | 230.3 | 0.96 | 90 | 0.96 | 0.84 | 16
cu1.cb1.r2 | Customer 1 | 12.01.a1 | 2012-01-02 | 00:01:01 | 4.51 | 229.32 | 1.03 | 21 | 1.03 | 0.85 | 15
cu1.cb1.r2 | Customer 1 | 12.01.a1 | 2012-01-02 | 01:01:01 | 4.18 | 230.3 | 0.96 | 23 | 0.96 | 0.84 | 16
cu1.cb1.s2 | Customer 2 | 10.01.a1 | 2012-01-02 | 00:01:01 | 7.34 | 228.14 | 1.67 | 179 | 1.67 | 0.88 | 24009
cu1.cb1.s2 | Customer 2 | 10.01.a1 | 2012-01-02 | 01:01:01 | 9.07 | 228.4 | 2.07 | 182 | 2.07 | 0.85 | 24010
cu1.cb1.s3 | Customer 2 | 10.01.a1 | 2012-01-02 | 00:01:01 | 7.34 | 228.14 | 1.67 | 121 | 1.67 | 0.88 | 24009
cu1.cb1.s3 | Customer 2 | 10.01.a1 | 2012-01-02 | 01:01:01 | 9.07 | 228.4 | 2.07 | 124 | 2.07 | 0.85 | 24010
cu1.cb1.r1 | Customer 3 | 01.01.a1 | 2012-01-02 | 00:01:01 | 7.32 | 229.01 | 1.68 | 223 | 1.68 | 0.89 | 48003
cu1.cb1.r1 | Customer 3 | 01.01.a1 | 2012-01-02 | 01:01:01 | 6.61 | 228.29 | 1.51 | 226 | 1.51 | 0.88 | 48004
cu1.cb1.r4 | Customer 3 | 01.01.a1 | 2012-01-02 | 00:01:01 | 7.32 | 229.01 | 1.68 | 215 | 1.68 | 0.89 | 48003
cu1.cb1.r4 | Customer 3 | 01.01.a1 | 2012-01-02 | 01:01:01 | 6.61 | 228.29 | 1.51 | 217 | 1.51 | 0.88 | 48004
As you can see, each customer now has multiple circuits. The result should now be the sum of the earliest kwh reading of each circuit, per customer, so for this table it would be:
customer_name | kwh(sum)
--------------+-----------
customer 1 | 108 (the result of 87 + 21)
customer 2 | 300 (the result of 179 + 121)
customer 3 | 438 (the result of 223 + 215)
There will be more than 2 circuits per customer and the readings can happen at varying times, hence the need for the 'earliest' reading.
Would anybody have any suggestions for the revised question?
PostgreSQL 8.4 on CentOs/Redhat.
SELECT customer_name, sum(kwh) AS kwh_total
FROM (
SELECT DISTINCT ON (customer_name, circuit_uid)
customer_name, circuit_uid, kwh
FROM readings
WHERE reading_date = '2012-01-02'::date
ORDER BY customer_name, circuit_uid, reading_time
) x
GROUP BY 1
Same as before, just pick the earliest per (customer_name, circuit_uid).
Then sum per customer_name.
Index
A multi-column index like the following will make this very fast:
CREATE INDEX readings_multi_idx
ON readings(reading_date, customer_name, circuit_uid, reading_time);
This is an extension to your original question:
select customer_name,
sum(kwh)
from (
select customer_name,
kwh,
reading_time,
reading_date,
row_number() over (partition by customer_name, circuit_uid order by reading_time) as rn
from readings
where reading_date = date '2012-01-02'
) t
where rn = 1
group by customer_name
Note the new sum() in the outer query and the changed partition by definition in the inner query (compared to your previous question) which calculates the first reading for each circuit_uid now (instead of the first for each customer).
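Both queries follow the same two steps: pick the earliest reading per (customer_name, circuit_uid), then sum those readings per customer. As a cross-check against the expected totals in the question, here is a minimal Python sketch of those steps. The function name is hypothetical, and only the columns needed for the calculation are included.

```python
def earliest_kwh_per_customer(readings):
    """Sum the earliest kwh reading of every circuit, per customer."""
    earliest = {}  # (customer, circuit) -> (date, time, kwh)
    for customer, circuit, date, time, kwh in readings:
        key = (customer, circuit)
        # ISO-formatted date and time strings sort chronologically
        if key not in earliest or (date, time) < earliest[key][:2]:
            earliest[key] = (date, time, kwh)
    totals = {}
    for (customer, _circuit), (_date, _time, kwh) in earliest.items():
        totals[customer] = totals.get(customer, 0) + kwh
    return totals


# (customer_name, circuit_uid, reading_date, reading_time, kwh)
readings = [
    ("Customer 1", "cu1.cb1.r1", "2012-01-02", "00:01:01", 87),
    ("Customer 1", "cu1.cb1.r1", "2012-01-02", "01:01:01", 90),
    ("Customer 1", "cu1.cb1.r2", "2012-01-02", "00:01:01", 21),
    ("Customer 1", "cu1.cb1.r2", "2012-01-02", "01:01:01", 23),
    ("Customer 2", "cu1.cb1.s2", "2012-01-02", "00:01:01", 179),
    ("Customer 2", "cu1.cb1.s2", "2012-01-02", "01:01:01", 182),
    ("Customer 2", "cu1.cb1.s3", "2012-01-02", "00:01:01", 121),
    ("Customer 2", "cu1.cb1.s3", "2012-01-02", "01:01:01", 124),
    ("Customer 3", "cu1.cb1.r1", "2012-01-02", "00:01:01", 223),
    ("Customer 3", "cu1.cb1.r1", "2012-01-02", "01:01:01", 226),
    ("Customer 3", "cu1.cb1.r4", "2012-01-02", "00:01:01", 215),
    ("Customer 3", "cu1.cb1.r4", "2012-01-02", "01:01:01", 217),
]
totals = earliest_kwh_per_customer(readings)
print(totals)
# {'Customer 1': 108, 'Customer 2': 300, 'Customer 3': 438}
```

Keying on both customer and circuit matters here because Customer 1 and Customer 3 share the circuit_uid cu1.cb1.r1.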