SQL Server dynamically sum rows based on other rows

Here's the situation: I need a way to total sales of a certain class of item every month. Easy enough, right?
Except sometimes, the item will be suppressed (with 0 price) and a special item will be put on the order with the price. I solved this by looking for suppressed lines and using LAG to pull the price from the special item on the line below it:
CASE
WHEN olu.supress_print = 'Y'
THEN LAG(shrv.sales_price_home, 1, 0) OVER (ORDER BY shrv.order_no, pvol.line_seq_no DESC)
ELSE shrv.sales_price_home
END AS total_sales
However, I recently discovered that sometimes they will split the suppressed item into multiple "special" lines. I'm trying to dynamically sum rows of certain trigger items until the row below the trigger item contains a non-special item. I'll illustrate with a table:
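+---------+-------------+-----------+----------+----------------+
| item_id | qty_ordered | tot_price | line_seq | suppress_print |
+---------+-------------+-----------+----------+----------------+
| A       | 10          | 150       | 1        | N              |
| B       | 10          | 0         | 2        | Y              |
| SPECIAL | 4           | 140       | 3        | N              |
| SPECIAL | 6           | 90        | 4        | N              |
| SPECIAL | 8           | 70        | 8        | N              |
| SPECIAL | 6           | 80        | 9        | N              |
+---------+-------------+-----------+----------+----------------+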
So in this example, I'd like the prices for lines 2, 3, and 4 summed and rolled into one line. I really only need the total price and ideally to be able to preserve item id "B".
I'm trying to think of a way to solve this using exclusively SQL. I know I could write a script to do it, but I'd like to limit this to just SQL if possible.
Edit - unfiltered table (imagine 2 is the item class I want the sum of sales for):
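+---------+-------------+-----------+----------+----------------+-------+
| item_id | qty_ordered | tot_price | line_seq | suppress_print | class |
+---------+-------------+-----------+----------+----------------+-------+
| A       | 10          | 150       | 1        | N              | 2     |
| B       | 10          | 0         | 2        | Y              | 2     |
| SPECIAL | 4           | 140       | 3        | N              | NULL  |
| SPECIAL | 6           | 90        | 4        | N              | NULL  |
| C       | 5           | 80        | 5        | N              | NULL  |
| D       | 3           | 50        | 6        | N              | NULL  |
| D       | 14          | 0         | 7        | N              | NULL  |
| SPECIAL | 8           | 70        | 8        | N              | NULL  |
| SPECIAL | 6           | 80        | 9        | N              | NULL  |
+---------+-------------+-----------+----------+----------------+-------+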
Edit 2 - expected results:
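+---------+-------------+-----------+----------+----------------+-------+
| item_id | qty_ordered | tot_price | line_seq | suppress_print | class |
+---------+-------------+-----------+----------+----------------+-------+
| A       | 10          | 150       | 1        | N              | 2     |
| B       | 10          | 230       | 2        | Y              | 2     |
| C       | 5           | 80        | 5        | N              | NULL  |
| D       | 3           | 50        | 6        | N              | NULL  |
| D       | 14          | 0         | 7        | N              | NULL  |
| SPECIAL | 8           | 70        | 8        | N              | NULL  |
| SPECIAL | 6           | 80        | 9        | N              | NULL  |
+---------+-------------+-----------+----------+----------------+-------+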

Here's something based on your unfiltered table.
I didn't attempt to limit the logic to a specific class.
But that could be added easily, at the end, or as needed.
I also didn't really need the suppress_print column in the logic.
We could also easily exclude the 'D' items from the SPECIAL logic. Based on the summed qty values and the 0 tot_price, I guessed we should treat them specially too. That's easily adjusted.
We handle this much like an edge-detection problem: the first CTE term, groups, marks each non-SPECIAL row as an edge and assigns every row to a group.
Then, in the sums CTE term, we use these groups to SUM the SPECIAL rows within their partitions. A non-SPECIAL row with no SPECIAL rows after it forms a group of its own, so it can be summed the same way.
The final query expression keeps only the edge rows, which hides the SPECIAL rows and shows only the leading item_id, as requested.
Here's the SQL Server test case:
Working Test Case (Updated)
and the corresponding solution:
WITH groups AS (
SELECT t.*
, SUM(CASE WHEN item_id <> 'SPECIAL' THEN 1 END) OVER (ORDER BY line_seq) AS seq
, CASE WHEN item_id <> 'SPECIAL' THEN 1 END AS edge
FROM unfiltered AS t
)
, sums AS (
SELECT item_id, qty_ordered
, line_seq, suppress_print, class
, SUM(tot_price) OVER (PARTITION BY seq) AS tot_price
, edge
FROM groups
)
SELECT item_id, qty_ordered, tot_price
, line_seq, suppress_print, class
FROM sums
WHERE edge = 1
;
Result:
+---------+-------------+-----------+----------+----------------+-------+
| item_id | qty_ordered | tot_price | line_seq | suppress_print | class |
+---------+-------------+-----------+----------+----------------+-------+
| A | 10 | 150 | 1 | N | 2 |
| B | 10 | 230 | 2 | Y | 2 |
| C | 5 | 80 | 5 | N | NULL |
| D | 3 | 50 | 6 | N | NULL |
| D | 14 | 150 | 7 | N | NULL |
+---------+-------------+-----------+----------+----------------+-------+
Both 'B' and the second 'D' item are summed as described in the question description.
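To see how the groups CTE carves the rows into partitions, it can help to look at the seq column on its own. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module, with a SQLite build that supports window functions, 3.25 or later) as a stand-in for SQL Server. The table and column names follow the answer; the Python names are made up for illustration.

```python
import sqlite3

# Rows copied from the question's unfiltered table:
# (item_id, qty_ordered, tot_price, line_seq, suppress_print, class)
ROWS = [
    ("A", 10, 150, 1, "N", 2),
    ("B", 10, 0, 2, "Y", 2),
    ("SPECIAL", 4, 140, 3, "N", None),
    ("SPECIAL", 6, 90, 4, "N", None),
    ("C", 5, 80, 5, "N", None),
    ("D", 3, 50, 6, "N", None),
    ("D", 14, 0, 7, "N", None),
    ("SPECIAL", 8, 70, 8, "N", None),
    ("SPECIAL", 6, 80, 9, "N", None),
]

# SQLite is only a stand-in here; the original answer targets SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE unfiltered (item_id TEXT, qty_ordered INTEGER, tot_price INTEGER,"
    " line_seq INTEGER, suppress_print TEXT, class INTEGER)"
)
conn.executemany("INSERT INTO unfiltered VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Same running SUM as in the groups CTE: the counter only advances on
# non-SPECIAL rows, so each SPECIAL row inherits the seq of the item above it.
seq_by_line = conn.execute(
    """
    SELECT line_seq, item_id,
           SUM(CASE WHEN item_id <> 'SPECIAL' THEN 1 END)
               OVER (ORDER BY line_seq) AS seq
    FROM unfiltered
    ORDER BY line_seq
    """
).fetchall()

for line_seq, item_id, seq in seq_by_line:
    print(line_seq, item_id, seq)
```

Lines 2 to 4 share one seq value and lines 7 to 9 share another, which is why SUM(tot_price) OVER (PARTITION BY seq) rolls the SPECIAL prices into 'B' and into the second 'D'.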
The data in the unfiltered table:
+---------+-------------+-----------+----------+----------------+-------+
| item_id | qty_ordered | tot_price | line_seq | suppress_print | class |
+---------+-------------+-----------+----------+----------------+-------+
| A | 10 | 150 | 1 | N | 2 |
| B | 10 | 0 | 2 | Y | 2 |
| SPECIAL | 4 | 140 | 3 | N | NULL |
| SPECIAL | 6 | 90 | 4 | N | NULL |
| C | 5 | 80 | 5 | N | NULL |
| D | 3 | 50 | 6 | N | NULL |
| D | 14 | 0 | 7 | N | NULL |
| SPECIAL | 8 | 70 | 8 | N | NULL |
| SPECIAL | 6 | 80 | 9 | N | NULL |
+---------+-------------+-----------+----------+----------------+-------+
The following query produces exactly the requested result.
I haven't tried to reduce this. The requirement to restrict the behavior to a specific class added work. There were a couple of places I could have re-stated expressions to avoid additional CTE terms. Feel free to collapse them.
I also regenerated the groups (seq) a second time, once the main class logic was handled.
WITH groups AS (
SELECT t.*
, SUM(CASE WHEN item_id <> 'SPECIAL' THEN 1 END) OVER (ORDER BY line_seq) AS seq
, CASE WHEN item_id <> 'SPECIAL' THEN 1 END AS edge
FROM unfiltered AS t
)
, classes AS (
SELECT item_id, qty_ordered, tot_price
, line_seq, suppress_print
, edge, seq
, MAX(class) OVER (PARTITION BY seq) AS class
FROM groups
)
, edges AS (
SELECT item_id, qty_ordered, tot_price
, line_seq, suppress_print
, class
, CASE WHEN edge = 1 OR class IS NULL THEN 1 END AS edge
, SUM(CASE WHEN edge = 1 OR class IS NULL THEN 1 END) OVER (ORDER BY line_seq) AS seq
FROM classes
)
, sums AS (
SELECT item_id, qty_ordered
, line_seq, suppress_print, class
, SUM(tot_price) OVER (PARTITION BY seq) AS tot_price
, edge
FROM edges
)
SELECT item_id, qty_ordered, tot_price
, line_seq, suppress_print, class
FROM sums
WHERE edge = 1
;
Result:
+---------+-------------+-----------+----------+----------------+-------+
| item_id | qty_ordered | tot_price | line_seq | suppress_print | class |
+---------+-------------+-----------+----------+----------------+-------+
| A | 10 | 150 | 1 | N | 2 |
| B | 10 | 230 | 2 | Y | 2 |
| C | 5 | 80 | 5 | N | NULL |
| D | 3 | 50 | 6 | N | NULL |
| D | 14 | 0 | 7 | N | NULL |
| SPECIAL | 8 | 70 | 8 | N | NULL |
| SPECIAL | 6 | 80 | 9 | N | NULL |
+---------+-------------+-----------+----------+----------------+-------+
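For readers who find the chain of CTE terms hard to follow, the rule it implements can be restated procedurally. This is only a sketch in Python, not a replacement for the SQL: it assumes the rows are already sorted by line_seq and that SPECIAL rows never carry a class of their own, as in the sample data. The function name roll_up is hypothetical.

```python
# Rows copied from the question's unfiltered table:
# (item_id, qty_ordered, tot_price, line_seq, suppress_print, class)
ROWS = [
    ("A", 10, 150, 1, "N", 2),
    ("B", 10, 0, 2, "Y", 2),
    ("SPECIAL", 4, 140, 3, "N", None),
    ("SPECIAL", 6, 90, 4, "N", None),
    ("C", 5, 80, 5, "N", None),
    ("D", 3, 50, 6, "N", None),
    ("D", 14, 0, 7, "N", None),
    ("SPECIAL", 8, 70, 8, "N", None),
    ("SPECIAL", 6, 80, 9, "N", None),
]


def roll_up(rows, special="SPECIAL"):
    """Fold SPECIAL rows into the preceding item, but only if that item has a class."""
    result = []
    absorbing = False  # True while the last regular item has a non-NULL class
    for item_id, qty, price, line_seq, suppress, cls in rows:
        if item_id != special:
            # A regular item always starts a new output row (an "edge").
            result.append([item_id, qty, price, line_seq, suppress, cls])
            absorbing = cls is not None
        elif absorbing:
            # Add the SPECIAL price to the parent row and hide the SPECIAL row.
            result[-1][2] += price
        else:
            # The parent has no class, so the SPECIAL row stays visible as-is.
            result.append([item_id, qty, price, line_seq, suppress, cls])
    return result


rolled = roll_up(ROWS)
for row in rolled:
    print(row)
```

On the sample data this keeps seven rows, with 230 on the 'B' row and the last two SPECIAL rows left untouched because the second 'D' has no class.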

Using APPLY to get the parent info for 'SPECIAL' rows that follow an item with suppress_print = 'Y':
WITH grp AS (
SELECT -- all but tot_price from parent
coalesce(parent.item_id, itm.item_id) item_id,
coalesce(parent.qty_ordered, itm.qty_ordered) qty_ordered,
itm.tot_price,
coalesce(parent.line_seq, itm.line_seq) line_seq,
coalesce(parent.suppress_print, itm.suppress_print) suppress_print,
coalesce(parent.class, itm.class) class
FROM myTbl itm
OUTER APPLY (
SELECT t3.*
FROM (
SELECT top(1) t2.*
FROM myTbl t2
WHERE itm.item_id = 'SPECIAL' AND t2.line_seq < itm.line_seq AND t2.item_id != 'SPECIAL'
ORDER BY line_seq DESC
) t3
WHERE t3.suppress_print = 'Y'
) parent
)
select item_id, qty_ordered, sum(tot_price) tot_price, line_seq, suppress_print, class
from grp
group by item_id, qty_ordered, line_seq, suppress_print, class
order by line_seq

Related

Get some values from the table by selecting

I have a table:
| id | Number | Address |
| -- | ------ | ------- |
| 1  | 0      | NULL    |
| 1  | 1      | NULL    |
| 1  | 2      | 50      |
| 1  | 3      | NULL    |
| 2  | 0      | 10      |
| 3  | 1      | 30      |
| 3  | 2      | 20      |
| 3  | 3      | 20      |
| 4  | 0      | 75      |
| 4  | 1      | 22      |
| 4  | 2      | 30      |
| 5  | 0      | NULL    |
I need to get: the NUMBER of the last ADDRESS change for each ID.
I wrote this select:
select dh.id, dh.number from table dh where dh =
(select max(min(t.history)) from table t where t.id = dh.id group by t.address)
But this select does not correctly handle the case where the address changes and then changes back to a previous value. For example, for id=1 the group by returns:
| Number |
| -------- |
| NULL |
| 50 |
I have been thinking about this select for several days, and I will be happy to receive any help.
You can do this using row_number() -- twice:
select t.id, min(number)
from (select t.*,
row_number() over (partition by id order by number desc) as seqnum1,
row_number() over (partition by id, address order by number desc) as seqnum2
from t
) t
where seqnum1 = seqnum2
group by id;
What this does is enumerate the rows by number in descending order:
Once per id.
Once per id and address.
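The two values are equal only for the trailing run of rows that share the most recent address: any earlier row has at least one row with a different address after it, so its per-id number is larger than its per-address number. The aggregation then pulls back the earliest row in this run.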
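The double row_number() trick can be checked against the sample rows. Below is a minimal sketch, assuming SQLite (via Python's sqlite3, version 3.25 or later for window functions) in place of the asker's database and a table simply named t, as in the answer; the derived-table alias ranked and the Python names are illustrative.

```python
import sqlite3

# Rows copied from the question: (id, number, address)
ROWS = [
    (1, 0, None), (1, 1, None), (1, 2, 50), (1, 3, None),
    (2, 0, 10),
    (3, 1, 30), (3, 2, 20), (3, 3, 20),
    (4, 0, 75), (4, 1, 22), (4, 2, 30),
    (5, 0, None),
]

# SQLite is only a stand-in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, number INTEGER, address INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# seqnum1 counts down per id; seqnum2 counts down per (id, address).
# They agree only on the trailing run of rows with the latest address.
last_change = dict(
    conn.execute(
        """
        SELECT id, MIN(number)
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY id ORDER BY number DESC) AS seqnum1,
                     ROW_NUMBER() OVER (PARTITION BY id, address ORDER BY number DESC) AS seqnum2
              FROM t) AS ranked
        WHERE seqnum1 = seqnum2
        GROUP BY id
        """
    ).fetchall()
)
print(last_change)
```

For id=1 the result is 3, because the address went back to NULL at number 3; for id=3 it is 2, the first row of the final run of address 20.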
I answered my own question; in case anyone needs it, here is my solution:
select * from table dh1 where dh1.number = (
select max(x.number)
from (
select
dh2.id, dh2.number, dh2.address, lag(dh2.address) over(order by dh2.number asc) as prev
from table dh2 where dh1.id=dh2.id
) x
where NVL(x.address, 0) <> NVL(x.prev, 0)
);

Postgresql hierarchical (tree) query

I found a few topics about this, but none of them gives the results I expect.
I have levels of categories stored in a table and just want to display them as a tree structure.
All the answers use some variation of the following query:
DB FIDDLE
WITH RECURSIVE cte AS (
SELECT category_id, category_name, parent_category, 1 AS level
FROM category
WHERE level = 1
UNION ALL
SELECT c.category_id, c.category_name, c.parent_category, ct.level + 1
FROM cte ct
JOIN category c ON c.parent_category = ct.category_id
)
SELECT *
FROM cte;
But the results are like
level
1
1
2
2
2
3
3
3
3
3
What I want to achieve is
level
1
2
3
3
2
3
3
1
2
3
3
2
3
3
You would typically keep track of the path to each node and use that for ordering. In Postgres, arrays come in handy for this:
with recursive cte as (
select category_id, category_name, parent_category, 1 as level, array[category_id] path
from category
where parent_category is null
union all
select c.category_id, c.category_name, c.parent_category, ct.level + 1, ct.path || c.category_id
from cte ct
join category c on c.parent_category = ct.category_id
)
select *
from cte
order by path
Note that there is no need to store the level in the table; you can compute the information on the fly as you iterate. To identify the root nodes, you can filter on rows whose parent is null.
In your db fiddle, the query returns:
category_id | category_name | parent_category | level | path
----------: | :------------ | --------------: | ----: | :-------
1 | cat1 | null | 1 | {1}
3 | cat3 | 1 | 2 | {1,3}
8 | cat8 | 3 | 3 | {1,3,8}
9 | cat9 | 3 | 3 | {1,3,9}
4 | cat4 | 1 | 2 | {1,4}
6 | cat6 | 4 | 3 | {1,4,6}
7 | cat7 | 4 | 3 | {1,4,7}
5 | cat5 | 1 | 2 | {1,5}
10 | cat10 | 5 | 3 | {1,5,10}
11 | cat11 | 5 | 3 | {1,5,11}
2 | cat2 | null | 1 | {2}
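Ordering by the path gives a depth-first listing because arrays compare element by element, so a parent's path sorts immediately before the paths of its descendants. Here is a minimal Python sketch of the same idea, using tuples in place of Postgres arrays; the PARENT mapping is reconstructed from the result above and the helper name path_to is hypothetical.

```python
# category_id -> parent_category, reconstructed from the query result above.
PARENT = {1: None, 2: None, 3: 1, 4: 1, 5: 1, 6: 4, 7: 4, 8: 3, 9: 3, 10: 5, 11: 5}


def path_to(category_id, parent):
    """Return the ids from the root down to category_id as a tuple."""
    path = []
    while category_id is not None:
        path.append(category_id)
        category_id = parent[category_id]
    return tuple(reversed(path))


# Tuples compare lexicographically, like arrays in ORDER BY path.
ordered = sorted(PARENT, key=lambda c: path_to(c, PARENT))
levels = [len(path_to(c, PARENT)) for c in ordered]

for category_id, level in zip(ordered, levels):
    print("  " * (level - 1) + f"cat{category_id}")
```

The level of a node is simply the length of its path, which is why it does not need to be stored in the table.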
You can keep track of the hierarchy as an array and use that for ordering:
WITH RECURSIVE cte AS (
SELECT category_id, category_name, parent_category, 1 AS level, array[category_id] as categories
FROM category
WHERE level = 1
UNION ALL
SELECT c.category_id, c.category_name, c.parent_category, ct.level + 1, ct.categories || c.category_id
FROM cte ct JOIN
category c
ON c.parent_category = ct.category_id
)
SELECT *
FROM cte
ORDER BY categories;
Here is a db<>fiddle.

Efficient ROW_NUMBER increment when column matches value

I'm trying to find an efficient way to derive the column Expected below from only Id and State. What I want is for the number Expected to increase each time State is 0 (ordered by Id).
+----+-------+----------+
| Id | State | Expected |
+----+-------+----------+
| 1 | 0 | 1 |
| 2 | 1 | 1 |
| 3 | 0 | 2 |
| 4 | 1 | 2 |
| 5 | 4 | 2 |
| 6 | 2 | 2 |
| 7 | 3 | 2 |
| 8 | 0 | 3 |
| 9 | 5 | 3 |
| 10 | 3 | 3 |
| 11 | 1 | 3 |
+----+-------+----------+
I have managed to accomplish this with the following SQL, but the execution time is very poor when the data set is large:
WITH Groups AS
(
SELECT Id, ROW_NUMBER() OVER (ORDER BY Id) AS GroupId FROM tblState WHERE State=0
)
SELECT S.Id, S.[State], S.Expected, G.GroupId FROM tblState S
OUTER APPLY (SELECT TOP 1 GroupId FROM Groups WHERE Groups.Id <= S.Id ORDER BY Id DESC) G
Is there a simpler and more efficient way to produce this result? (In SQL Server 2012 or later)
Just use a cumulative sum:
select s.*,
sum(case when state = 0 then 1 else 0 end) over (order by id) as expected
from tblState s;
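Another method uses a correlated subquery. Note that the comparison must be inclusive (<=) so that a row with state 0 counts itself:
select t.*,
(select count(*)
from tblState t1
where t1.id <= t.id and t1.state = 0
) as expected
from tblState t;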
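Both approaches can be checked against the Expected column from the question. The sketch below assumes SQLite (via Python's sqlite3, 3.25 or later) as a stand-in for SQL Server 2012 and uses an inclusive comparison (<=) in the correlated subquery, since a row with State 0 has to count itself to match Expected.

```python
import sqlite3

# State and Expected columns copied from the question, for Id 1 through 11.
STATES = [0, 1, 0, 1, 4, 2, 3, 0, 5, 3, 1]
EXPECTED = [1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3]

# SQLite is only a stand-in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblState (Id INTEGER PRIMARY KEY, State INTEGER)")
conn.executemany(
    "INSERT INTO tblState (Id, State) VALUES (?, ?)",
    list(enumerate(STATES, start=1)),
)

# Cumulative sum: a single ordered pass over the table.
running_sum = [
    row[0]
    for row in conn.execute(
        """
        SELECT SUM(CASE WHEN State = 0 THEN 1 ELSE 0 END) OVER (ORDER BY Id)
        FROM tblState
        ORDER BY Id
        """
    )
]

# Correlated subquery: re-counts the preceding State = 0 rows for every row.
correlated = [
    row[0]
    for row in conn.execute(
        """
        SELECT (SELECT COUNT(*)
                FROM tblState t1
                WHERE t1.Id <= t.Id AND t1.State = 0)
        FROM tblState t
        ORDER BY t.Id
        """
    )
]

print(running_sum == EXPECTED, correlated == EXPECTED)
```

The two queries return the same numbers, but the subquery repeats its count for each outer row, so the window function is usually the better fit for large tables.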

Find duplicate combinations

I need a query to find duplicate combinations in these tables:
AttributeValue:
id | name
------------------
1 | green
2 | blue
3 | red
4 | 100x200
5 | 150x200
Product:
id | name
----------------
1 | Produkt A
ProductAttribute:
id | id_product | price
--------------------------
1 | 1 | 100
2 | 1 | 200
3 | 1 | 100
4 | 1 | 200
5 | 1 | 100
6 | 1 | 200
7 | 1 | 100 -- duplicate combination
8 | 1 | 100 -- duplicate combination
ProductAttributeCombinations:
id_product_attribute | id_attribute
-------------------------------------
1 | 1
1 | 4
2 | 1
2 | 5
3 | 2
3 | 4
4 | 2
4 | 5
5 | 3
5 | 4
6 | 3
6 | 5
7 | 1
7 | 4
8 | 1
8 | 5
I need SQL that creates result like:
id_product | duplicate_attributes
----------------------------------
1 | {7,8}
If I understand correctly, 7 is a duplicate of 1 and 8 is a duplicate of 2. As phrased, your question is a bit confusing, because 7 and 8 are not related to each other and the only table of interest is ProductAttributeCombinations.
If this is the case, then one method is to use string aggregation:
with combos as (
select id_product_attribute,
string_agg(id_attribute::text, ',' order by id_attribute) as combo
from ProductAttributeCombinations pac
group by id_product_attribute
)
select *
from combos c
where exists (select 1
from combos c2
where c2.id_product_attribute > c.id_product_attribute and
c2.combo = c.combo
);
Your question leaves some room for interpretation. Here is my educated guess:
For each product, return an array of all instances with the same set of attributes as any other instance of the same product with smaller ID.
WITH combo AS (
SELECT id_product, id, array_agg(id_attribute) AS attributes
FROM (
SELECT pa.id_product, pa.id, pac.id_attribute
FROM ProductAttribute pa
JOIN ProductAttributeCombinations pac ON pac.id_product_attribute = pa.id
ORDER BY pa.id_product, pa.id, pac.id_attribute
) sub
GROUP BY 1, 2
)
SELECT id_product, array_agg(id) AS duplicate_attributes
FROM combo c
WHERE EXISTS (
SELECT 1
FROM combo
WHERE id_product = c.id_product
AND attributes = c.attributes
AND id < c.id
)
GROUP BY 1;
Sorting can be inlined into the aggregate function so we don't need a subquery for the sort (like @Gordon already provided). This is shorter, but also typically slower:
WITH combo AS (
SELECT pa.id_product, pa.id
, array_agg(pac.id_attribute ORDER BY pac.id_attribute) AS attributes
FROM ProductAttribute pa
JOIN ProductAttributeCombinations pac ON pac.id_product_attribute = pa.id
GROUP BY 1, 2
)
SELECT ...
This only returns products with duplicate instances.
SQL Fiddle.
Your table names are rather misleading / contradict the rest of your question. Your sample data is not very clear either, only featuring a single product. I assume there are many in your table.
It's also unclear whether you are using double-quoted table names preserving CaMeL-case spelling. I assume: no.
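The rule used in this answer (flag every instance whose attribute set already occurred for the same product under a smaller id) can also be written out procedurally. This is a sketch in Python with the sample rows hard-coded; the names COMBINATIONS, PRODUCT_OF and duplicates are made up for illustration.

```python
from collections import defaultdict

# (id_product_attribute, id_attribute) pairs from ProductAttributeCombinations.
COMBINATIONS = [
    (1, 1), (1, 4), (2, 1), (2, 5), (3, 2), (3, 4), (4, 2), (4, 5),
    (5, 3), (5, 4), (6, 3), (6, 5), (7, 1), (7, 4), (8, 1), (8, 5),
]
# ProductAttribute.id -> id_product; every sample row belongs to product 1.
PRODUCT_OF = {pa_id: 1 for pa_id in range(1, 9)}

# Collect the attribute set of each product attribute (the array_agg step).
attributes = defaultdict(set)
for pa_id, attribute_id in COMBINATIONS:
    attributes[pa_id].add(attribute_id)

# Walk the instances in id order; one is a duplicate if the same
# (product, attribute set) pair was already seen under a smaller id.
seen = set()
duplicates = defaultdict(list)
for pa_id in sorted(attributes):
    key = (PRODUCT_OF[pa_id], frozenset(attributes[pa_id]))
    if key in seen:
        duplicates[PRODUCT_OF[pa_id]].append(pa_id)
    else:
        seen.add(key)

print(dict(duplicates))
```

Using a set for each combination plays the role of the sorted array in the SQL: two instances match regardless of the order in which their attributes were stored.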

Sub-sub-selects and grouping: Get name column from the row containing the max value of a group

I have two tables: States, and Items.
States:
+----+------+-------+----------+
| id | name | state | priority |
+----+------+-------+----------+
| 1 | AA | 10 | 1 |
| 2 | AB | 10 | 2 |
| 3 | AC | 10 | 3 |
| 4 | BA | 20 | 1 |
| 5 | BB | 20 | 5 |
| 6 | BC | 20 | 10 |
| 7 | BD | 20 | 50 |
+----+------+-------+----------+
Items:
+----+--------+-------+
| id | item | state |
+----+--------+-------+
| 1 | Blue | 10 |
| 2 | Red | 20 |
| 3 | Green | 20 |
| 4 | Yellow | 10 |
| 5 | Brown | 10 |
+----+--------+-------+
The priority column is not used in the Items table, but complicates getting the data I need, as shown below.
What I want is a list of the rows in the Items table, replacing the state.id value in each row with the name of the highest priority state.
Results would look like this:
+----+--------+-------+
| id | item | state |
+----+--------+-------+
| 1 | Blue | AC |
| 2 | Red | BD |
| 3 | Green | BD |
| 4 | Yellow | AC |
| 5 | Brown | AC |
+----+--------+-------+
Here's the tiny monster I've come up with. Is this the best way, or can I be more efficient / less verbose? (Sub-sub-selects make my palms itch. :-P )
SELECT *
FROM
Items AS itm
INNER JOIN (SELECT sta.name, sta.state
FROM (SELECT state, MAX(priority) [highest]
FROM States
GROUP BY state) AS pri
INNER JOIN States AS sta
ON sta.state = pri.state
AND sta.priority = pri.highest) AS nam
ON itm.state = nam.state
Update: I'm using MS-SQL 2005 and MS-SQL 2008R2
You did not post your version of SQL-Server. Assuming you are on 2005 or later you can use the ROW_NUMBER() function together with a cross apply like this:
CREATE TABLE dbo.States(id INT, name NVARCHAR(25), state INT, priority INT);
INSERT INTO dbo.States
VALUES
( 1 ,'AA', 10 , 1 ),
( 2 ,'AB', 10 , 2 ),
( 3 ,'AC', 10 , 3 ),
( 4 ,'BA', 20 , 1 ),
( 5 ,'BB', 20 , 5 ),
( 6 ,'BC', 20 , 10 ),
( 7 ,'BD', 20 , 50 );
CREATE TABLE dbo.Items( id INT ,item NVARCHAR(25), state INT );
INSERT INTO dbo.Items
VALUES
( 1 ,'Blue', 10 ),
( 2 ,'Red', 20 ),
( 3 ,'Green', 20 ),
( 4 ,'Yellow', 10 ),
( 5 ,'Brown', 10 );
SELECT i.id,
i.item,
s.name,
s.priority
FROM dbo.Items i
CROSS APPLY (
SELECT *,ROW_NUMBER()OVER(ORDER BY priority DESC) rn FROM dbo.States si WHERE si.state = i.state
)s
WHERE s.rn = 1;
CROSS APPLY works like a join, but it lets the right side reference columns from the left side, as you can see in the WHERE clause inside the APPLY. The ROW_NUMBER() function numbers all rows in the States table that match the current state value in descending priority order, so the row with the highest priority always gets the number 1. The final WHERE clause keeps only those rows.
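The same "row number 1 per state" idea also works without APPLY, by numbering the States rows once with PARTITION BY and joining to the result. Note that this is a different formulation from the answer's CROSS APPLY, shown here only as a sketch; it runs on SQLite (via Python's sqlite3, 3.25 or later) as a stand-in for SQL Server, and the Python names are illustrative.

```python
import sqlite3

# Rows copied from the question: (id, name, state, priority) and (id, item, state)
STATES = [
    (1, "AA", 10, 1), (2, "AB", 10, 2), (3, "AC", 10, 3),
    (4, "BA", 20, 1), (5, "BB", 20, 5), (6, "BC", 20, 10), (7, "BD", 20, 50),
]
ITEMS = [(1, "Blue", 10), (2, "Red", 20), (3, "Green", 20), (4, "Yellow", 10), (5, "Brown", 10)]

# SQLite is only a stand-in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE States (id INTEGER, name TEXT, state INTEGER, priority INTEGER)")
conn.execute("CREATE TABLE Items (id INTEGER, item TEXT, state INTEGER)")
conn.executemany("INSERT INTO States VALUES (?, ?, ?, ?)", STATES)
conn.executemany("INSERT INTO Items VALUES (?, ?, ?)", ITEMS)

# Number the States rows within each state by descending priority,
# then keep only the top row (rn = 1) when joining to Items.
top_state_per_item = conn.execute(
    """
    SELECT i.id, i.item, s.name
    FROM Items AS i
    JOIN (SELECT name, state,
                 ROW_NUMBER() OVER (PARTITION BY state ORDER BY priority DESC) AS rn
          FROM States) AS s
      ON s.state = i.state AND s.rn = 1
    ORDER BY i.id
    """
).fetchall()

for row in top_state_per_item:
    print(row)
```

Both forms rely on ROW_NUMBER(), so either should be usable on the 2005 and 2008R2 versions mentioned in the question.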
EDIT:
I just started a blog series about joins: A Join A Day
The Cross Apply will be topic of day 8 (12/8/2012).