SQL sum of multiple groups per group

I had a rather large error in my previous question:
select earliest date from multiple rows
The answer by horse_with_no_name returns a perfect result, and I am hugely appreciative; however, I got my own initial question wrong, so I really apologise. If you look at the table below:
circuit_uid |customer_name |rack_location |reading_date | reading_time | amps | volts | kw | kwh | kva | pf | key
--------------------------------------------------------------------------------------------------------------------------------------
cu1.cb1.r1 | Customer 1 | 12.01.a1 | 2012-01-02 | 00:01:01 | 4.51 | 229.32 | 1.03 | 87 | 1.03 | 0.85 | 15
cu1.cb1.r1 | Customer 1 | 12.01.a1 | 2012-01-02 | 01:01:01 | 4.18 | 230.3 | 0.96 | 90 | 0.96 | 0.84 | 16
cu1.cb1.r2 | Customer 1 | 12.01.a1 | 2012-01-02 | 00:01:01 | 4.51 | 229.32 | 1.03 | 21 | 1.03 | 0.85 | 15
cu1.cb1.r2 | Customer 1 | 12.01.a1 | 2012-01-02 | 01:01:01 | 4.18 | 230.3 | 0.96 | 23 | 0.96 | 0.84 | 16
cu1.cb1.s2 | Customer 2 | 10.01.a1 | 2012-01-02 | 00:01:01 | 7.34 | 228.14 | 1.67 | 179 | 1.67 | 0.88 | 24009
cu1.cb1.s2 | Customer 2 | 10.01.a1 | 2012-01-02 | 01:01:01 | 9.07 | 228.4 | 2.07 | 182 | 2.07 | 0.85 | 24010
cu1.cb1.s3 | Customer 2 | 10.01.a1 | 2012-01-02 | 00:01:01 | 7.34 | 228.14 | 1.67 | 121 | 1.67 | 0.88 | 24009
cu1.cb1.s3 | Customer 2 | 10.01.a1 | 2012-01-02 | 01:01:01 | 9.07 | 228.4 | 2.07 | 124 | 2.07 | 0.85 | 24010
cu1.cb1.r1 | Customer 3 | 01.01.a1 | 2012-01-02 | 00:01:01 | 7.32 | 229.01 | 1.68 | 223 | 1.68 | 0.89 | 48003
cu1.cb1.r1 | Customer 3 | 01.01.a1 | 2012-01-02 | 01:01:01 | 6.61 | 228.29 | 1.51 | 226 | 1.51 | 0.88 | 48004
cu1.cb1.r4 | Customer 3 | 01.01.a1 | 2012-01-02 | 00:01:01 | 7.32 | 229.01 | 1.68 | 215 | 1.68 | 0.89 | 48003
cu1.cb1.r4 | Customer 3 | 01.01.a1 | 2012-01-02 | 01:01:01 | 6.61 | 228.29 | 1.51 | 217 | 1.51 | 0.88 | 48004
As you can see, each customer now has multiple circuits. The result should now be the sum of the earliest kwh reading for each of a customer's circuits, so for this table it would be:
customer_name | kwh(sum)
--------------+-----------
Customer 1    | 108 (the result of 87 + 21)
Customer 2    | 300 (the result of 179 + 121)
Customer 3    | 438 (the result of 223 + 215)
There will be more than 2 circuits per customer and the readings can happen at varying times, hence the need for the 'earliest' reading.
Would anybody have any suggestions for the revised question?
PostgreSQL 8.4 on CentOS/Red Hat.

SELECT customer_name, sum(kwh) AS kwh_total
FROM  (
   SELECT DISTINCT ON (customer_name, circuit_uid)
          customer_name, circuit_uid, kwh
   FROM   readings
   WHERE  reading_date = '2012-01-02'::date
   ORDER  BY customer_name, circuit_uid, reading_time
   ) x
GROUP  BY 1
Same as before, just pick the earliest per (customer_name, circuit_uid).
Then sum per customer_name.
Index
A multi-column index like the following will make this very fast:
CREATE INDEX readings_multi_idx
ON readings(reading_date, customer_name, circuit_uid, reading_time);
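
To verify the planner actually uses it, you can run EXPLAIN on the inner query (a quick sanity check, assuming the table and index above):

EXPLAIN ANALYZE
SELECT DISTINCT ON (customer_name, circuit_uid)
       customer_name, circuit_uid, kwh
FROM   readings
WHERE  reading_date = '2012-01-02'::date
ORDER  BY customer_name, circuit_uid, reading_time;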

This is an extension to your original question:
select customer_name,
       sum(kwh)
from (
   select customer_name,
          kwh,
          reading_time,
          reading_date,
          row_number() over (partition by customer_name, circuit_uid
                             order by reading_time) as rn
   from readings
   where reading_date = date '2012-01-02'
) t
where rn = 1
group by customer_name
Note the new sum() in the outer query and the changed partition by definition in the inner query (compared to your previous question), which now calculates the first reading for each circuit_uid instead of the first for each customer.
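
For comparison, the same "earliest reading per circuit" logic can also be expressed with a plain aggregate join. This is only a sketch, and it assumes at most one reading per circuit at the earliest reading_time on a given date (otherwise it returns duplicates):

SELECT r.customer_name, sum(r.kwh) AS kwh_total
FROM   readings r
JOIN  (SELECT circuit_uid, min(reading_time) AS first_time
       FROM   readings
       WHERE  reading_date = date '2012-01-02'
       GROUP  BY circuit_uid) f
       ON  f.circuit_uid = r.circuit_uid
       AND f.first_time  = r.reading_time
WHERE  r.reading_date = date '2012-01-02'
GROUP  BY r.customer_name;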

Related

PostgreSQL: Comparing two tables, obtaining their results, and comparing them with a third table

TABLE 2 : trip_delivery_sales_lines
+-------+---------------------+------------+----------+------------+-------------+--------+--+
| Sl no | Order_date | Partner_id | Route_id | Product_id | Product qty | amount | |
+-------+---------------------+------------+----------+------------+-------------+--------+--+
| 1 | 2020-08-01 04:25:35 | 34567 | 152 | 432 | 2 | 100 | |
| 2 | 2021-09-11 02:25:35 | 34572 | 130 | 312 | 4 | 150 | |
| 3 | 2020-05-10 04:25:35 | 34567 | 152 | 432 | 3 | 123 | |
| 4 | 2021-02-16 01:10:35 | 34572 | 130 | 432 | 5 | 123 | |
| 5 | 2020-02-19 01:10:35 | 34567 | 152 | 432 | 2 | 600 | |
| 6 | 2021-03-20 01:10:35 | 34569 | 152 | 123 | 1 | 123 | |
| 7 | 2021-04-23 01:10:35 | 34570 | 152 | 432 | 4 | 200 | |
| 8 | 2021-07-08 01:10:35 | 34567 | 152 | 432 | 3 | 32 | |
| 9 | 2019-06-28 01:10:35 | 34570 | 152 | 432 | 2 | 100 | |
| 10 | 2018-11-14 01:10:35 | 34570 | 152 | 432 | 5 | 20 | |
+-------+---------------------+------------+----------+------------+-------------+--------+--+
From Table 2 we have to find the partners on route 152 and sum the product_qty of each partner's last 2 sales (selected by descending order_date). The result is shown in Table 3. The relevant rows are:
34567 – serial numbers [1, 8]
34570 – serial numbers [7, 9]
34569 – serial number [6]
TABLE 3 : RESULT OBTAINED FROM TABLE 1,2
+------------+-------+
| Partner_id | count |
+------------+-------+
| 34567 | 5 |
| 34569 | 1 |
| 34570 | 6 |
+------------+-------+
From Table 4 we want to find the leaf count for each of the above partner_ids.
TABLE 4 :coupon_leaf
+------------+-------+
| Partner_id | Leaf |
+------------+-------+
| 34567 | XYZ1 |
| 34569 | XYZ2 |
| 34569 | DDHC |
| 34567 | DVDV |
| 34570 | DVFDV |
| 34576 | FVFV |
| 34567 | FVV |
+------------+-------+
From that we can find the result as:
34567 – 3
34569 – 2
34570 – 1
TABLE 5: result obtained from TABLE 4
+------------+-------+
| Partner_id | count |
+------------+-------+
| 34567 | 3 |
| 34569 | 2 |
| 34570 | 1 |
+------------+-------+
Now we want to compare Tables 3 and 5:
If partner_id count [Table 3] > partner_id count [Table 5], print the partner_id.
I want a single query to do all these operations.
The distinct partner_ids can be found from Table 1 by:
SELECT DISTINCT partner_id
FROM trip_delivery_sales ts
WHERE ts.route_id='152'
GROUP BY ts.partner_id
This answers the original version of the problem.
You seem to want to compare totals after aggregating tables 2 and 4. I don't know what table1 is for. It doesn't seem to do anything.
So:
select *
from (select tdsl.partner_id, sum(tdsl.quantity) as sum_quantity
      from (select tdsl.*,
                   -- newest sales first, so seqnum 1 and 2 are the last 2 sales
                   row_number() over (partition by tdsl.partner_id
                                      order by tdsl.order_date desc) as seqnum
            from trip_delivery_sales_lines tdsl
            where tdsl.route_id = 152     -- only the route in question
           ) tdsl
      where seqnum <= 2
      group by tdsl.partner_id
     ) tdsl left join
     (select cl.partner_id, count(*) as leaf_cnt
      from coupon_leaf cl
      group by cl.partner_id
     ) cl
     on cl.partner_id = tdsl.partner_id
where leaf_cnt is null or sum_quantity > leaf_cnt
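
The same logic reads a little more clearly as CTEs. This is an equivalent sketch of the query above (quantity is the column name the answer uses, assumed to correspond to the "Product qty" column in Table 2):

WITH last_two AS (
    SELECT partner_id, sum(quantity) AS sum_quantity
    FROM (SELECT tdsl.*,
                 row_number() OVER (PARTITION BY partner_id
                                    ORDER BY order_date DESC) AS seqnum
          FROM trip_delivery_sales_lines tdsl
          WHERE route_id = 152) t
    WHERE seqnum <= 2
    GROUP BY partner_id
),
leaves AS (
    SELECT partner_id, count(*) AS leaf_cnt
    FROM coupon_leaf
    GROUP BY partner_id
)
SELECT lt.partner_id
FROM last_two lt
LEFT JOIN leaves l ON l.partner_id = lt.partner_id
WHERE l.leaf_cnt IS NULL OR lt.sum_quantity > l.leaf_cnt

On the sample data this should return 34567 (5 > 3) and 34570 (6 > 1), but not 34569 (1 is not greater than 2).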

How to calculate smoothed moving average using Postgres SQL

I have a table in my PostgreSQL DB that has the fields product_id, date, and sales_amount.
I am calculating a simple moving average over the last week using the SQL below:
SELECT date,
       AVG(amount) OVER (PARTITION BY product_id
                         ORDER BY date
                         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_amount
FROM sales
How can I calculate a smoothed moving average (SMMA) instead of the simple moving average above? I have found that the formula is
smma_today = (smma_yesterday * (lookback_period - 1) + amount) / lookback_period
but how do I translate that to SQL?
A CTE, function, or query approach suggestion would be appreciated.
I am pretty sure you will need recursion since your formula depends on using a value calculated for a previous row in the current row.
with recursive mma as (
   (select distinct on (product_id)
           *, ddate as basedate, amount as sm_mov_avg
    from smma
    order by product_id, ddate)
   union all
   select smma.*, mma.basedate,
          (mma.sm_mov_avg * least(smma.ddate - mma.basedate, 6)
           + smma.amount) / least(smma.ddate - mma.basedate + 1, 7)
   from mma
   join smma on smma.product_id = mma.product_id
            and smma.ddate = mma.ddate + 1
)
select ddate, product_id, amount, round(sm_mov_avg, 2) as sm_mov_avg,
       round(avg(amount) over (partition by product_id
                               order by ddate
                               rows between 6 preceding
                                        and current row), 2) as mov_avg
from mma;
Please note how the smoothed moving average and the simple moving average begin to diverge after you reach the lookback of seven days:
ddate | product_id | amount | sm_mov_avg | mov_avg
:--------- | ---------: | -----: | ---------: | ------:
2020-11-01 | 1 | 8 | 8.00 | 8.00
2020-11-02 | 1 | 4 | 6.00 | 6.00
2020-11-03 | 1 | 7 | 6.33 | 6.33
2020-11-04 | 1 | 9 | 7.00 | 7.00
2020-11-05 | 1 | 4 | 6.40 | 6.40
2020-11-06 | 1 | 6 | 6.33 | 6.33
2020-11-07 | 1 | 4 | 6.00 | 6.00
2020-11-08 | 1 | 1 | 5.29 | 5.00
2020-11-09 | 1 | 8 | 5.67 | 5.57
2020-11-10 | 1 | 10 | 6.29 | 6.00
2020-11-11 | 1 | 8 | 6.54 | 5.86
2020-11-12 | 1 | 4 | 6.17 | 5.86
2020-11-13 | 1 | 3 | 5.72 | 5.43
2020-11-14 | 1 | 2 | 5.19 | 5.14
2020-11-15 | 1 | 5 | 5.16 | 5.71
2020-11-16 | 1 | 8 | 5.57 | 5.71
2020-11-17 | 1 | 4 | 5.34 | 4.86
2020-11-18 | 1 | 10 | 6.01 | 5.14
2020-11-19 | 1 | 5 | 5.86 | 5.29
2020-11-20 | 1 | 3 | 5.46 | 5.29
2020-11-21 | 1 | 3 | 5.10 | 5.43
2020-11-22 | 1 | 9 | 5.66 | 6.00
2020-11-23 | 1 | 7 | 5.85 | 5.86
2020-11-24 | 1 | 1 | 5.16 | 5.43
2020-11-25 | 1 | 10 | 5.85 | 5.43
2020-11-26 | 1 | 7 | 6.01 | 5.71
2020-11-27 | 1 | 8 | 6.30 | 6.43
2020-11-28 | 1 | 8 | 6.54 | 7.14
2020-11-29 | 1 | 1 | 5.75 | 6.00
2020-11-30 | 1 | 9 | 6.21 | 6.29
Working Fiddle
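
For reference, the query above assumes a table shaped roughly like this (a hypothetical sketch; the actual DDL is not shown here), with one row per product per consecutive day:

CREATE TABLE smma (
    product_id integer NOT NULL,
    ddate      date    NOT NULL,  -- consecutive days per product
    amount     numeric NOT NULL,
    PRIMARY KEY (product_id, ddate)
);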
Many thanks to Mike Organek, whose answer showed that the way forward for the calculation was a recursive query. I start off with the simple moving average at some distant point in the past and thereafter use the smoothed average daily, which has given us exactly what we needed:
with recursive my_table_with_rn as (
   SELECT product_id,
          amount,
          sale_date,
          row_number() over (partition by product_id order by sale_date) as rn
   FROM sale
   WHERE sale_date > date '2018-01-01'
),
rec_query (rn, product_id, amount, sale_date, smma) as (
   SELECT rn,
          product_id,
          amount,
          sale_date,
          -- the first entry is a simple moving average to start off
          AVG(amount) OVER (PARTITION BY product_id
                            ORDER BY sale_date
                            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS smma
   FROM my_table_with_rn
   WHERE rn = 1
   UNION ALL
   SELECT t.rn, t.product_id, t.amount, t.sale_date,
          -- smma_today = (smma_previous * (lookback_period - 1) + amount) / lookback_period
          -- with a lookback_period of 7
          (p.smma * (7 - 1) + t.amount) / 7
   FROM rec_query p
   JOIN my_table_with_rn t
     ON t.rn = p.rn + 1
    AND t.product_id = p.product_id
)
SELECT * FROM rec_query
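
As in the first question above, a multi-column index should help both the row_number() scan and the join in each recursion step (a sketch, assuming the sale table from the query):

CREATE INDEX sale_product_date_idx
ON sale (product_id, sale_date);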

SQL percent of total and weighted average

I have the following PostgreSQL table stock, where the structure is as follows:
| column | pk |
+--------+-----+
| date | yes |
| id | yes |
| type | yes |
| qty | |
| fee | |
The table looks like this:
| date       | id  | type | qty  | fee  |
+------------+-----+------+------+------+
| 2015-01-01 | 001 | CB04 | 500 | 2 |
| 2015-01-01 | 002 | CB04 | 1500 | 3 |
| 2015-01-01 | 003 | CB04 | 500 | 1 |
| 2015-01-01 | 004 | CB04 | 100 | 5 |
| 2015-01-01 | 001 | CB02 | 800 | 6 |
| 2015-01-02 | 002 | CB03 | 3100 | 1 |
I want to create a view or query, so that the result looks like this.
The table will show the t_qty, % of total Qty, and weighted fee for each day and each type:
% of total Qty = qty / t_qty
weighted fee = fee * % of total Qty
| date       | id  | type | qty  | fee  | t_qty | % of total Qty | weighted fee |
+------------+-----+------+------+------+-------+----------------+--------------+
| 2015-01-01 | 001 | CB04 | 500 | 2 | 2600 | 0.19 | 0.38 |
| 2015-01-01 | 002 | CB04 | 1500 | 3 | 2600 | 0.58 | 1.73 |
| 2015-01-01 | 003 | CB04 | 500 | 1 | 2600 | 0.19 | 0.192 |
| 2015-01-01 | 004 | CB04 | 100 | 5 | 2600 | 0.04 | 0.192 |
I could do this in Excel, but the dataset is too big to process.
You can use SUM as a window function plus some calculation to achieve this. Partitioning by date and type gives the total per day and per type:
SELECT *,
       SUM(qty) OVER (PARTITION BY date, type) AS t_qty,
       qty::numeric / SUM(qty) OVER (PARTITION BY date, type) AS pct_of_total_qty,
       fee * (qty::numeric / SUM(qty) OVER (PARTITION BY date, type)) AS weighted_fee
FROM T
If you want rounding, you can use the ROUND function.
SELECT *,
       SUM(qty) OVER (PARTITION BY date, type) AS t_qty,
       ROUND(qty::numeric / SUM(qty) OVER (PARTITION BY date, type), 3) AS "% of total Qty",
       ROUND(fee * (qty::numeric / SUM(qty) OVER (PARTITION BY date, type)), 3) AS "weighted fee"
FROM T
sqlfiddle
[Results]:
| date | id | type | qty | fee | t_qty | % of total Qty | weighted fee |
|------------|-----|------|------|-----|-------|----------------|--------------|
| 2015-01-01 | 001 | CB04 | 500 | 2 | 2600 | 0.192 | 0.385 |
| 2015-01-01 | 002 | CB04 | 1500 | 3 | 2600 | 0.577 | 1.731 |
| 2015-01-01 | 003 | CB04 | 500 | 1 | 2600 | 0.192 | 0.192 |
| 2015-01-01 | 004 | CB04 | 100 | 5 | 2600 | 0.038 | 0.192 |
| 2015-01-02 | 002 | CB03 | 3100 | 1 | 3100 | 1 | 1 |
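
Since the question asks for a view, the rounded query can be wrapped directly. A sketch, where stock_breakdown is a hypothetical name and the table is assumed to be called stock as in the question:

CREATE VIEW stock_breakdown AS
SELECT *,
       SUM(qty) OVER (PARTITION BY date, type) AS t_qty,
       ROUND(qty::numeric / SUM(qty) OVER (PARTITION BY date, type), 3) AS pct_of_total_qty,
       ROUND(fee * qty::numeric / SUM(qty) OVER (PARTITION BY date, type), 3) AS weighted_fee
FROM stock;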

SQL - Window Functions with dense_rank()

I have a dataset structured such as the one below stored in Hive, call it df:
+-----+-----+----------+--------+
| id1 | id2 | date | amount |
+-----+-----+----------+--------+
| 1 | 2 | 11-07-17 | 0.93 |
| 2 | 2 | 11-11-17 | 1.94 |
| 2 | 2 | 11-09-17 | 1.90 |
| 1 | 1 | 11-10-17 | 0.33 |
| 2 | 2 | 11-10-17 | 1.93 |
| 1 | 1 | 11-07-17 | 0.25 |
| 1 | 1 | 11-09-17 | 0.33 |
| 1 | 1 | 11-12-17 | 0.33 |
| 2 | 2 | 11-08-17 | 1.90 |
| 1 | 1 | 11-08-17 | 0.30 |
| 2 | 2 | 11-12-17 | 2.01 |
| 1 | 2 | 11-12-17 | 1.00 |
| 1 | 2 | 11-09-17 | 0.94 |
| 2 | 2 | 11-07-17 | 1.94 |
| 1 | 2 | 11-11-17 | 1.92 |
| 1 | 1 | 11-11-17 | 0.33 |
| 1 | 2 | 11-10-17 | 1.92 |
| 1 | 2 | 11-08-17 | 0.94 |
+-----+-----+----------+--------+
I wish to partition by id1 and id2, and then order by date descending within each grouping of id1 and id2, and then rank "amount" within that, where the same "amount" on consecutive days would receive the same rank. The ordered and ranked output I'd hope to see is shown here:
+-----+-----+------------+--------+------+
| id1 | id2 | date | amount | rank |
+-----+-----+------------+--------+------+
| 1 | 1 | 2017-11-12 | 0.33 | 1 |
| 1 | 1 | 2017-11-11 | 0.33 | 1 |
| 1 | 1 | 2017-11-10 | 0.33 | 1 |
| 1 | 1 | 2017-11-09 | 0.33 | 1 |
| 1 | 1 | 2017-11-08 | 0.30 | 2 |
| 1 | 1 | 2017-11-07 | 0.25 | 3 |
| 1 | 2 | 2017-11-12 | 1.00 | 1 |
| 1 | 2 | 2017-11-11 | 1.92 | 2 |
| 1 | 2 | 2017-11-10 | 1.92 | 2 |
| 1 | 2 | 2017-11-09 | 0.94 | 3 |
| 1 | 2 | 2017-11-08 | 0.94 | 3 |
| 1 | 2 | 2017-11-07 | 0.93 | 4 |
| 2 | 2 | 2017-11-12 | 2.01 | 1 |
| 2 | 2 | 2017-11-11 | 1.94 | 2 |
| 2 | 2 | 2017-11-10 | 1.93 | 3 |
| 2 | 2 | 2017-11-09 | 1.90 | 4 |
| 2 | 2 | 2017-11-08 | 1.90 | 4 |
| 2 | 2 | 2017-11-07 | 1.94 | 5 |
+-----+-----+------------+--------+------+
I attempted this with the following SQL query:
SELECT
id1,
id2,
date,
amount,
dense_rank() OVER (PARTITION BY id1, id2 ORDER BY date DESC) AS rank
FROM
df
GROUP BY
id1,
id2,
date,
amount
But that query doesn't seem to be doing what I'd like it to as I'm not receiving the output I'm looking for.
It seems like a window function using dense_rank, partition by and order by is what I need but I can't quite seem to get it to give me that sample output that I desire. Any help would be much appreciated! Thanks!
This is quite tricky. A plain dense_rank() over amount won't work, because the same amount can reappear after a gap (e.g. 1.94 for id1 = 2, id2 = 2 on both 11-07 and 11-11) and would wrongly receive the same rank. I think you need to use lag() to see where the value changes and then do a cumulative sum:
select df.*,
       sum(case when prev_amount = amount then 0 else 1 end)
           over (partition by id1, id2 order by date desc) as rank
from (select df.*,
             lag(amount) over (partition by id1, id2 order by date desc) as prev_amount
      from df
     ) df;
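
To see why this gives the desired ranks, trace the id1 = 2, id2 = 2 partition in descending date order: 2.01 has no previous value (running sum 1), 1.94 differs (2), 1.93 differs (3), 1.90 differs (4), the second 1.90 matches its predecessor (still 4), and the final 1.94 differs again (5). This matches the expected output even though 1.94 appears twice in the partition.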

How to calculate running total in SQL

I have my dataset in the format given below.
It is month-level data with a salary figure for each month.
I need to calculate the cumulative salary at the end of each month.
+----------+-------+--------+---------------+
| Account | Month | Salary | Running Total |
+----------+-------+--------+---------------+
| a | 1 | 586 | 586 |
| a | 2 | 928 | 1514 |
| a | 3 | 726 | 2240 |
| a | 4 | 538 | 538 |
| b | 1 | 956 | 1494 |
| b | 3 | 667 | 2161 |
| b | 4 | 841 | 3002 |
| c | 1 | 826 | 826 |
| c | 2 | 558 | 1384 |
| c | 3 | 558 | 1972 |
| c | 4 | 735 | 2707 |
| c | 5 | 691 | 3398 |
| d | 1 | 670 | 670 |
| d | 4 | 838 | 1508 |
| d | 5 | 1000 | 2508 |
+----------+-------+--------+---------------+
The Running Total column above is the cumulative column I need to compute. How can I do this efficiently in SQL?
You can use SUM with an ORDER BY clause inside the OVER clause:
SELECT Account, Month, Salary,
SUM(Salary) OVER (PARTITION BY Account ORDER BY Month) AS RunningTotal
FROM mytable
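
One caveat: with ORDER BY Month alone, the default window frame is RANGE, so duplicate Month values within an account would be summed together as peers. If that can happen in your data, an explicit ROWS frame pins the running total to physical rows (a defensive variant of the same query):

SELECT Account, Month, Salary,
       SUM(Salary) OVER (PARTITION BY Account
                         ORDER BY Month
                         ROWS BETWEEN UNBOUNDED PRECEDING
                                  AND CURRENT ROW) AS RunningTotal
FROM mytable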