How to simplify this multiple-CTE solution to this SQL question?

DDL:
create table transactions
(
product_id int,
store_id int,
quantity int,
price numeric
);
DML:
insert into transactions values
(1, 1, 10, 2),
(2, 1, 5, 2),
(1, 2, 5, 4),
(2, 2, 2, 4),
(2, 3, 1, 20),
(1, 3, 1, 8),
(2, 4, 2, 10),
(1, 5, 2, 5),
(2, 5, 1, 3),
(2, 6, 4, 8);
I'm trying to find the top 3 products of the top 3 stores, both ranked by sale amount. My current solution uses CTEs, as below:
with cte as
(
select store_id, rank_store
from
(select
*,
dense_rank() over(order by sale desc) as rank_store
from
(select
store_id, sum(quantity * price) as sale
from transactions
group by 1) t) t2
where
rank_store <= 3
),
cte2 as
(
select
a.store_id, a.product_id,
sum(a.quantity * a.price) as sale_store_product
from
transactions as a
join
cte as b on a.store_id = b.store_id
group by
1, 2
order by
1, 2
),
cte3 as
(
select
*,
dense_rank() over (partition by store_id order by sale_store_product desc) as rank_product
from
cte2
)
select *
from cte3
where rank_product <= 3;
Here is the expected result:
Basically, the first CTE gets the top 3 stores based on sale amount; I use the dense_rank() window function to handle ties. The second CTE gets the products of those top 3 stores and their total sale amounts. The last CTE uses dense_rank() again to rank the products in each store by sale amount. My final query then gets the top 3 products in each store based on sale amount.
I'm wondering if this can be improved, since three CTEs feels too complicated. I'd appreciate any solutions and ideas. Thanks.

I'm trying to find the top 3 products of the top 3 stores
How can this be done without aggregating the data twice -- once for store/products and once for stores? This is possible using window functions along with aggregation:
select sp.*
from (select sp.*,
dense_rank() over (order by store_sales desc, store_id) as store_seqnum
from (select t.store_id, t.product_id,
sum(quantity * price) as sp_sales,
sum(sum(quantity * price)) over (partition by store_id) as store_sales,
row_number() over (partition by t.store_id order by sum(quantity * price) desc) as sp_seqnum
from transactions t
group by t.store_id, t.product_id
) sp
) sp
where store_seqnum <= 3 and sp_seqnum <= 3;
The inner subquery calculates the product/store information. The next level ranks the stores; note that ties are broken using store_id.
Here is a db<>fiddle.
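As a rough check of the single-pass idea above, here is a minimal sketch that loads the sample data into an in-memory SQLite database through Python's sqlite3 module (an assumption; the question does not name a database, and SQLite 3.25 or later is needed for window functions). To stay on the safe side with SQLite, the aggregation is done in its own subquery and the window functions are applied one level up, which differs slightly in shape from the query above but ranks the same way: highest sales first, store ties broken by store_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table transactions (product_id int, store_id int, quantity int, price numeric);
insert into transactions values
 (1, 1, 10, 2), (2, 1, 5, 2), (1, 2, 5, 4), (2, 2, 2, 4), (2, 3, 1, 20),
 (1, 3, 1, 8), (2, 4, 2, 10), (1, 5, 2, 5), (2, 5, 1, 3), (2, 6, 4, 8);
""")

query = """
select store_id, product_id, sp_sales
from (select sp.*,
             -- rank stores by total sales, highest first; store_id breaks ties
             dense_rank() over (order by store_sales desc, store_id) as store_seqnum,
             -- rank products inside each store, highest first
             row_number() over (partition by store_id order by sp_sales desc) as sp_seqnum
      from (select sp.*,
                   sum(sp_sales) over (partition by store_id) as store_sales
            from (select store_id, product_id, sum(quantity * price) as sp_sales
                  from transactions
                  group by store_id, product_id) sp
           ) sp
     ) sp
where store_seqnum <= 3 and sp_seqnum <= 3
order by store_seqnum, sp_seqnum
"""

top_rows = conn.execute(query).fetchall()
for row in top_rows:
    print(row)
```

Because ties are broken by store_id, store 2 takes third place and store 3 (same total) is dropped; the dense_rank() version in the question would keep both.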

Related

postgresql: group by columns/windows function/min-max and complex query

Imagine I've invoices from two branches. I need to select min and max invoice date and on that date show branch id. If at min/max date several branches have invoices, choose any.
CREATE TEMP TABLE invoice (
id int not null,
branch_id int not null,
c_date date not null,
PRIMARY KEY (id)
);
insert into invoice (id, branch_id, c_date) values
(1, 1, '2020-01-01')
,(2, 2, '2020-01-01')
,(3, 1, '2020-01-02')
,(4, 2, '2020-01-02')
,(5, 2, '2020-01-03');
The straightforward solution is below (the max part is skipped so as not to overcomplicate the query).
select i2.branch_id, i2.c_date from (
select min(i1.id) minid
from (select min(i.c_date) mind, max(i.c_date) maxd
from invoice i
)a
join invoice i1
on a.mind=i1.c_date) b
join invoice i2 on b.minid=i2.id
The window function solution is a bit simpler, but awkward too. Please keep in mind that the actual query is more complex, and I provide only the core part.
select * from (
select a.branch_id, a.c_date from(
select *, rank() over (order by c_date) r from invoice i
) a where a.r=1
limit 1
) mn,
(select a.branch_id, a.c_date from(
select *, rank() over (order by c_date desc) r from invoice i
) a where a.r=1
limit 1
) mx
Any guesses on how to write the query more elegantly?
One method is a trick using arrays:
select min(c_date),
(array_agg(branch_id order by c_date))[1] as first_branch,
max(c_date),
(array_agg(branch_id order by c_date desc))[1] as last_branch
from invoice;
This does aggregate all values into an array, so you wouldn't want to use this if there are too many values in each result row.
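If building arrays is a concern, a different technique (not the array trick itself) is first_value() over two orderings. The sketch below is only an illustration: it runs on SQLite via Python's sqlite3 module rather than PostgreSQL, stores c_date as text, and uses id as the tie-breaker, all of which are assumptions made for the example since the question accepts any branch on a tied date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table invoice (id int primary key, branch_id int not null, c_date text not null);
insert into invoice (id, branch_id, c_date) values
 (1, 1, '2020-01-01'), (2, 2, '2020-01-01'), (3, 1, '2020-01-02'),
 (4, 2, '2020-01-02'), (5, 2, '2020-01-03');
""")

query = """
select distinct
       -- earliest row: its date and its branch (id breaks ties, an assumption)
       first_value(c_date)    over (order by c_date, id)      as min_date,
       first_value(branch_id) over (order by c_date, id)      as first_branch,
       -- latest row: its date and its branch
       first_value(c_date)    over (order by c_date desc, id) as max_date,
       first_value(branch_id) over (order by c_date desc, id) as last_branch
from invoice
"""

# Every input row yields the same four values, so DISTINCT leaves one row.
bounds = conn.execute(query).fetchall()
print(bounds)
```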

Finding last and second last date and corresponding values

Consider the following schema (fiddle):
CREATE TABLE meters
(
id int,
description varchar(10)
);
CREATE TABLE readings
(
id int,
meterid int,
date date,
value int
);
INSERT INTO readings (id, meterid, date, value)
VALUES
(1, 4, '20081231', 500),
(2, 4, '20090203', 550),
(3, 1, '20090303', 300),
(4, 2, '20090303', 244),
(5, 4, '20090303', 600),
(6, 1, '20090403', 399),
(7, 2, '20090403', 288),
(8, 3, '20090403', 555);
INSERT INTO meters (id, description)
VALUES
(1, 'this'),
(2, 'is'),
(3, 'not'),
(4, 'really'),
(5, 'relevant');
For each meter.id I need to find the latest reading date, value, and the value difference vs previous reading.
For the sample data my output would look like this (plus some other columns from meters):
meterid  latest    value  delta value
1        20090403  399    99
2        20090403  288    44
3        20090403  555    null
4        20090303  600    50
5        null      null   null
I figured I could first create a query with the relevant info and then join with that, but I'm struggling to achieve it.
I've tried to adapt this method, but for each id I get 2 rows instead of one:
SELECT
p.meterid,
[1] AS [LastDate],
[2] AS [BeforeLastDate]
FROM
(SELECT TOP (2) WITH ties
*,
RowN = ROW_NUMBER() OVER (PARTITION BY r.meterid ORDER BY date DESC)
FROM
readings AS r
ORDER BY
(ROW_NUMBER() OVER (PARTITION BY r.meterid ORDER BY date DESC) - 1) / 2 + 1) a
PIVOT
(MAX(date) FOR RowN IN ([1], [2])) p
ORDER BY
p.meterId
I'm looking for ideas on how to solve the double-row issue or, if that's a dead end, how to get my desired output.
If I understand correctly, you can just use window functions:
select m.id, r.date, r.value, r.value - prev_value
from meters m left join
(select r.*,
lag(value) over (partition by meterid order by date) as prev_value,
row_number() over (partition by meterid order by date desc) as seqnum
from readings r
) r
on r.meterid = m.id and seqnum = 1
order by m.id;
No aggregation is necessary. Here is a db<>fiddle.
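To see that this produces the table asked for, here is a minimal sketch that runs the same lag()/row_number() query against the sample data in an in-memory SQLite database via Python's sqlite3 module. SQLite is an assumption (the question targets SQL Server), the date column is declared as text so the YYYYMMDD strings are kept as-is, and the delta_value alias is added for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table meters (id int, description varchar(10));
create table readings (id int, meterid int, date text, value int);
insert into readings (id, meterid, date, value) values
 (1, 4, '20081231', 500), (2, 4, '20090203', 550), (3, 1, '20090303', 300),
 (4, 2, '20090303', 244), (5, 4, '20090303', 600), (6, 1, '20090403', 399),
 (7, 2, '20090403', 288), (8, 3, '20090403', 555);
insert into meters (id, description) values
 (1, 'this'), (2, 'is'), (3, 'not'), (4, 'really'), (5, 'relevant');
""")

query = """
select m.id, r.date, r.value, r.value - r.prev_value as delta_value
from meters m
left join (select r.*,
                  -- previous reading of the same meter, in date order
                  lag(value) over (partition by meterid order by date) as prev_value,
                  -- 1 marks the latest reading of each meter
                  row_number() over (partition by meterid order by date desc) as seqnum
           from readings r) r
  on r.meterid = m.id and r.seqnum = 1
order by m.id
"""

latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Meter 3 has a single reading, so its delta is null, and meter 5 has no readings at all, so the left join leaves every reading column null.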
Use LEAD to get the next value going backwards, and ROW_NUMBER to get the first row.
SELECT *
FROM
(SELECT *,
delta_value = value - LEAD(value) over (PARTITION BY r.meterid ORDER BY date DESC),
RowN = Row_Number() over(PARTITION BY r.meterid ORDER BY date DESC)
FROM readings AS r
) a
WHERE RowN = 1
ORDER BY a.meterid

How to query the V shaped data?

TxnID RunningAmount MemberID
==================================
1 80000 20
2 90000 20
3 70000 20 //<==== Falls but previously never below 100k, hence ignore
4 90000 20
5 110000 20
6 60000 20 //<==== Falls below 100k, hence we want ID 8
7 80000 20
8 120000 20
9 85000 28
...
....
How can I construct a query that groups by member and gets the first transaction ID that completes the "V" shape? Even pseudocode is fine; I can't share an attempt because I am totally clueless about how to do it.
UPDATES:
Sorry for the lack of explanation of the conditions. The base amount we are looking at is 100k. IDs are random, so we definitely need a row number.
We ignore all transactions before ID = 5 because their RunningAmount never exceeded 100k.
At ID = 5 the amount exceeds 100k, so we check whether the transactions after ID = 5 show a downtrend in RunningAmount that falls below 100k.
Immediately we see that ID = 6 falls below 100k, so we want to find the first transaction that exceeds 100k again (if there is one).
From the data sample above, the expected result is only one record, which is ID = 8.
For every member, there will be either one or zero records found based on the conditions I've mentioned.
Try this query:
create table #tbl(TxnID int, RunningAmount int, MemberID int);
insert into #tbl values
(1, 80000, 20),
(2, 90000, 20),
(3, 70000, 20),
(4, 90000, 20),
(5, 110000, 20),
(6, 60000, 20),
(7, 120000, 20),
(8, 85000, 28);
select TxnID, RunningAmount, MemberID,
LAG(VShape) over (partition by MemberID order by TxnID) VShape
from (
select TxnID, RunningAmount, MemberID,
case when rn < lagrn and rn < leadrn then 1 else 0 end VShape
from (
select *,
LAG(rn) over (partition by MemberID order by TxnID) lagRn,
LEAD(rn) over (partition by MemberID order by TxnID) leadRn
from (
select TxnID,
RunningAmount,
MemberID,
ROW_NUMBER() over (partition by MemberID order by RunningAmount) rn
from #tbl
) a
) a
) a
The last column, VShape, indicates whether the value in RunningAmount completes a V shape (although you could be clearer about what that means instead of leaving everybody to figure it out). Now you can filter values based on RunningAmount (whether they fall below or above 100k).
Here is version for earlier versions of SQL Server that don't have LAG and LEAD functions:
;with cte as (
select *,
ROW_NUMBER() over (partition by MemberID order by RunningAmount) rn
from #tbl
), cte2 as (
select c1.TxnID, c1.RunningAmount, c1.MemberID, c1.rn, c2.rn [lagRn] , c3.rn [leadRn]
from cte c1
left join cte c2 on c1.TxnID = c2.TxnID + 1 and c1.MemberID = c2.MemberID
left join cte c3 on c1.TxnID = c3.TxnID - 1 and c1.MemberID = c3.MemberID
), cte3 as (
select TxnID, RunningAmount, MemberID,
case when rn < lagrn and rn < leadrn then 1 else 0 end VShape
from cte2
), FinalResult as (
select c1.TxnID, c1.RunningAmount, c1.MemberID, c2.VShape
from cte3 c1
left join cte3 c2 on c1.TxnID = c2.TxnID + 1 and c1.MemberID = c2.MemberID
)
select fr.*, fr2.RunningAmount RunningAmountLagBy2 from FinalResult fr
left join FinalResult fr2 on fr.TxnID = fr2.TxnID + 2
where fr.RunningAmount > 100000 and fr2.RunningAmount > 100000 and fr.VShape = 1
UPDATE
After question update, here's solution:
select TxnID from (
select *, ROW_NUMBER() over (partition by MemberID, VShape order by TxnID) CompletesVShape from (
select TxnID,
RunningAmount,
MemberID,
sum(case when RunningAmount >= 100000 then 1 else 0 end) over (partition by MemberID order by TxnID rows between unbounded preceding and current row) VShape
from #tbl
) a
) a where VShape > 1 and CompletesVShape = 1
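The conditions from the question update can also be traced procedurally, which may help when checking any SQL version against edge cases. The sketch below is plain Python, not a translation of the query above. It assumes that transactions are ordered by TxnID within a member and that "exceeds" and "falls below" are strict comparisons against 100k; the function names are made up for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

THRESHOLD = 100_000  # base amount named in the question


def first_recovery(txns: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the TxnID that completes the V for one member, or None.

    txns holds (txn_id, running_amount) pairs in transaction order.
    """
    state = "waiting_for_peak"
    for txn_id, amount in txns:
        if state == "waiting_for_peak":
            if amount > THRESHOLD:       # first time above 100k
                state = "waiting_for_dip"
        elif state == "waiting_for_dip":
            if amount < THRESHOLD:       # fell back below 100k
                state = "waiting_for_recovery"
        elif amount > THRESHOLD:         # first time above 100k again
            return txn_id
    return None


def v_shape_by_member(rows: Iterable[Tuple[int, int, int]]) -> Dict[int, Optional[int]]:
    """rows are (txn_id, running_amount, member_id); sorting puts them in TxnID order."""
    by_member: Dict[int, List[Tuple[int, int]]] = {}
    for txn_id, amount, member_id in sorted(rows):
        by_member.setdefault(member_id, []).append((txn_id, amount))
    return {member: first_recovery(txns) for member, txns in by_member.items()}


sample = [
    (1, 80000, 20), (2, 90000, 20), (3, 70000, 20), (4, 90000, 20),
    (5, 110000, 20), (6, 60000, 20), (7, 80000, 20), (8, 120000, 20),
    (9, 85000, 28),
]
result = v_shape_by_member(sample)
print(result)
```

On the sample from the question this yields transaction 8 for member 20 and nothing for member 28, which matches the expected result stated there.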
Based on your question update, and assuming the necessary condition for a V shape is that the rows above and below have running amounts > 100000 while the middle row is smaller than both, below is a query showing how to do it in SQL Server 2008.
Also see the live demo.
; with firstlargeamount as
(
select MemberId, minTrxid=min(TxnID)
from t
where RunningAmount>100000
group by MemberId
)
,tbl as
(
select *,
rn=row_number() over( partition by MemberId order by TxnId)
from
t
)
select t3.*,f.*
from tbl t1
join tbl t2
on
t1.memberId=t2.memberid and t1.rn=t2.rn +1
and t1.RunningAmount<t2.RunningAmount
join tbl t3
on
t1.memberId=t3.memberid and t1.rn=t3.rn -1
and t1.RunningAmount<t3.RunningAmount
join firstlargeamount f
on
f.Memberid=t2.memberid and f.minTrxid>=t1.TxnID
Explanation:
The first step generates a row number sequence at member level in the CTE tbl, and the minimum limiting transaction in the CTE firstlargeamount.
The second step is a double self join that finds the rows above and below each row that satisfy the V-shape criteria, together with a join to firstlargeamount to find the rows that satisfy the 100000 criterion.
Note that the rows above and below are found simply by applying +1/-1 to the current row's row number computed in step 1.

Customer order total by month, w/ all customers listed, even if customer had no orders in a month, in one SQL statement?

Get a list of all customers order totals by month, and if the customer has no order in a given month, include a line for that month with 0 as the order total. In one statement? Totals already computed, no need for aggregate function.
Use of the coalesce function is acceptable.
Given list of customer order totals by month:
create table orders (cust char(1), month num, exps num);
insert into orders
values('a', 1, 5)
values('b', 2, 4)
values('c', 1, 8);
And a list of customers:
create table custs(cust char(1));
insert into custs
values('a')
values('b')
values('c')
values('d');
Generate this table:
cust, month, exps
a, 1, 5
a, 2, 0
b, 1, 0
b, 2, 4
c, 1, 8
c, 2, 0
d, 1, 0
d, 2, 0
select or1.cust, a.[month], sum(coalesce(or2.[exps], 0)) as exps
from (
select 1 as[month] union all select 2
) a cross join (select distinct cust from custs) or1
left join orders or2 on or2.[month] = a.[month] and or2.cust = or1.cust
group by or1.cust, a.[month]
order by or1.cust,a.[month]
Sqlfiddle
Here is another version that picks up all existing months from the table. The results are the same for our test data:
select or1.cust, a.[month], sum(coalesce(or2.[exps], 0)) as exps
from (
select distinct [month] from orders
) a cross join (select distinct cust from custs) or1
left join orders or2 on or2.[month] = a.[month] and or2.cust = or1.cust
group by or1.cust, a.[month]
order by or1.cust,a.[month]
Sqlfiddle
Making the cartesian product of customers and months was the first crack in the egg... and then a left join/coalesce w/ the result.
select all_possible_months.cust,
all_possible_months.month,
coalesce(orders.exps,0) as exps
from
(select order_months.month,
custs.cust
from
(select distinct month
from
orders
) as order_months,
custs
) all_possible_months
left join
orders on(
all_possible_months.cust = orders.cust and
all_possible_months.month = orders.month
);
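The cross join plus left join/coalesce pattern can be checked against the expected table with a minimal sketch like the one below. It uses an in-memory SQLite database through Python's sqlite3 module, which is an assumption since the question's DDL is written in a different dialect; the column types are adapted accordingly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table orders (cust text, month int, exps int);
insert into orders values ('a', 1, 5), ('b', 2, 4), ('c', 1, 8);
create table custs (cust text);
insert into custs values ('a'), ('b'), ('c'), ('d');
""")

query = """
select c.cust, m.month, coalesce(o.exps, 0) as exps
from custs c
-- every customer paired with every month that appears in orders
cross join (select distinct month from orders) m
-- attach the real total where one exists; coalesce fills the gaps with 0
left join orders o on o.cust = c.cust and o.month = m.month
order by c.cust, m.month
"""

totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

Customer d has no orders at all and still gets one row per month, because the row set comes from the cross join rather than from orders.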

Select column based on sum of another column

Let's say I have
SalesManagerId, SaleAmount, ProductId
I want to sum up the SaleAmount for each (SalesManagerId, ProductId) and grab the ProductId with the maximum sum(SaleAmount).
Is this possible in one query?
Example:
1, 100, 1
1, 200, 1
1, 600, 1
1, 400, 2
2, 100, 3
3, 100, 4
3, 100, 4
2, 500, 6
3, 100, 5
result:
1, 900, 1
2, 500, 6
3, 200, 4
If you have analytic functions available, you can use a RANK()
Something like:
SELECT SalesManagerId, ProductId, Total
FROM (
SELECT SalesManagerId,
ProductId,
SUM(SaleAmount) as Total,
RANK() OVER(PARTITION BY SalesManagerId
ORDER BY SUM(SaleAmount) DESC) as R
FROM <Table name>
GROUP BY SalesManagerId, ProductId) as InnerQuery
WHERE InnerQuery.R = 1
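As a quick check of the RANK() approach against the example rows, here is a minimal sketch using an in-memory SQLite database via Python's sqlite3 module. The table name sales is hypothetical, SQLite itself is an assumption, and the sums are computed in a subquery first with the ranking applied one level up, which is a slightly more conservative shape than the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table sales (SalesManagerId int, SaleAmount int, ProductId int);  -- hypothetical name
insert into sales values
 (1, 100, 1), (1, 200, 1), (1, 600, 1), (1, 400, 2), (2, 100, 3),
 (3, 100, 4), (3, 100, 4), (2, 500, 6), (3, 100, 5);
""")

query = """
select SalesManagerId, Total, ProductId
from (select SalesManagerId, ProductId, Total,
             -- rank each manager's products by summed sales, highest first
             rank() over (partition by SalesManagerId order by Total desc) as r
      from (select SalesManagerId, ProductId, sum(SaleAmount) as Total
            from sales
            group by SalesManagerId, ProductId) totals
     ) ranked
where r = 1
order by SalesManagerId
"""

best = conn.execute(query).fetchall()
for row in best:
    print(row)
```

Since rank() assigns the same rank to ties, a manager whose two best products have equal totals would get two rows; row_number() would force a single row instead.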
Assuming at least SQL 2005 so you can use a CTE:
;with cteTotalSales as (
select SalesManagerId, ProductId, SUM(SaleAmount) as TotalSales
from YourSalesTable
group by SalesManagerId, ProductId
),
cteMaxSales as (
select SalesManagerId, MAX(TotalSales) as MaxSale
from cteTotalSales
group by SalesManagerId
)
select ts.SalesManagerId, ms.MaxSale, ts.ProductId
from cteMaxSales ms
inner join cteTotalSales ts
on ms.SalesManagerId = ts.SalesManagerId
and ms.MaxSale = ts.TotalSales
order by ts.SalesManagerId
Use GROUP BY and ORDER:
SELECT SalesManagerId, SUM(SaleAmount) AS SaleSum, ProductId FROM [table-name] GROUP BY SalesManagerId, ProductId ORDER BY SaleSum DESC
Very good question!
Try this:
SELECT MAX(SUM(SaleAmount)), ProductId GROUP BY SalesManagerId, ProductId;
Or alternatively
SELECT SUM(SaleAmount) as Sum, ProductId FROM [table-name] GROUP BY SalesManagerId, ProductId ORDER BY Sum DESC;
You can't just drop the sum column and get ONLY the product id