Get only best ranked rows from a subquery

I want to get the price of an article for a specific customer.
There are several price levels, which I ranked in my query.
For example, article A has prices at ranks 1, 4 and 6, and article B has prices at ranks 3 and 5. The result should always be the best-ranked price, i.e. the one with the lowest rank number.
So article A should get its rank 1 price and article B its rank 3 price.
My query is below.
SELECT p2.* FROM(
SElect ART_ID, MIN(RANG) RANG FROM (
Select p.ART_ID, p.betrag ,
CASE p.PREIS_EBENE WHEN 'KA' THEN 1 WHEN 'KW' THEN 2 WHEN 'W' THEN 7 WHEN 'A' THEN 6 ELSE 99 END RANG
FROM MDART a
INNER JOIN MDPRSVK p ON (a.KLIENT_ID = p.KLIENT_ID AND a.ART_ID = p.ART_ID)
WHERE ICP_KZ.IS_SET(KENNUNG_USER, 'P') = 1
ORDER BY RANG)
GROUP BY ART_ID) T
INNER JOIN MDPRSVK p2 ON (p2.ART_ID = T.ART_ID AND p2.PREIS_EBENE = p.PREIS_EBENE)
I want every article to appear only once in the result.

You have tagged your request PL/SQL, so I guess your DBMS may be Oracle.
If I understand correctly, the table MDPRSVK contains several prices per ART_ID. And you want to select each ART_ID's best price (best to worst: 'KA' -> 'KW' -> 'A' -> 'W' -> any other PREIS_EBENE).
You can use a window function (ROW_NUMBER, RANK or DENSE_RANK) for this:
select *
from mdprsvk
order by row_number()
over (partition by art_id
order by decode(preis_ebene, 'KA', 1, 'KW', 2, 'A', 3, 'W', 4, 5))
fetch first row with ties;
This is standard SQL. In Oracle, FETCH FIRST is available as of version 12c. In earlier versions you'd use a subquery instead:
select *
from
(
select
mdprsvk.*,
row_number() over (partition by art_id
order by decode(preis_ebene, 'KA', 1, 'KW', 2, 'A', 3, 'W', 4, 5))
as rn
from mdprsvk
)
where rn = 1;
Or use Oracle's KEEP FIRST:
select art_id, max(betrag)
keep (dense_rank first
order by decode(preis_ebene, 'KA', 1, 'KW', 2, 'A', 3, 'W', 4, 5))
from mdprsvk
group by art_id;
It is not clear how MDART comes into play. It looks like you want to restrict your results to articles for certain clients, and KENNUNG_USER is the column in MDART to check. If so, add a WHERE clause:
where exists
(
select *
from mdart
where mdart.klient_id = mdprsvk.klient_id
and mdart.art_id = mdprsvk.art_id
and icp_kz.is_set(mdart.kennung_user, 'P') = 1
)
Or with IN instead of EXISTS:
where (klient_id, art_id) in
(
select klient_id, art_id
from mdart
where icp_kz.is_set(kennung_user, 'P') = 1
)
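To try the ROW_NUMBER approach without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are invented for illustration, the table is reduced to three columns, and CASE stands in for DECODE, which SQLite does not provide. It assumes the bundled SQLite is recent enough to support window functions (3.25 or later).

```python
import sqlite3

# Hypothetical toy data: (art_id, preis_ebene, betrag). Not from the original post.
rows = [
    ("A", "KA", 10.0),
    ("A", "A", 12.0),
    ("A", "W", 15.0),
    ("B", "KW", 20.0),
    ("B", "X", 25.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mdprsvk (art_id TEXT, preis_ebene TEXT, betrag REAL)")
conn.executemany("INSERT INTO mdprsvk VALUES (?, ?, ?)", rows)

# Number the prices of each article from best to worst level, then keep number 1.
query = """
SELECT art_id, preis_ebene, betrag
FROM (
    SELECT m.*,
           ROW_NUMBER() OVER (
               PARTITION BY art_id
               ORDER BY CASE preis_ebene
                            WHEN 'KA' THEN 1 WHEN 'KW' THEN 2
                            WHEN 'A' THEN 3 WHEN 'W' THEN 4 ELSE 5
                        END
           ) AS rn
    FROM mdprsvk m
)
WHERE rn = 1
ORDER BY art_id
"""
best_prices = conn.execute(query).fetchall()
print(best_prices)
```

Each article appears exactly once, with the price from its best available level.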

BigQuery recursively join based on links between 2 ID columns

Given a table representing a many-many join between IDs like the following:
WITH t AS (
SELECT 1 AS id_1, 'a' AS id_2,
UNION ALL SELECT 2, 'a'
UNION ALL SELECT 2, 'b'
UNION ALL SELECT 3, 'b'
UNION ALL SELECT 4, 'c'
UNION ALL SELECT 5, 'c'
UNION ALL SELECT 6, 'd'
UNION ALL SELECT 6, 'e'
UNION ALL SELECT 7, 'f'
)
SELECT * FROM t
|id_1|id_2|
|----|----|
|1|a|
|2|a|
|2|b|
|3|b|
|4|c|
|5|c|
|6|d|
|6|e|
|7|f|
I would like to be able to recursively join and then aggregate rows in order to find each disconnected sub-graph represented by these links, that is, each collection of IDs that are linked together.
The desired output for the example above would look something like this:
|id_1_coll|id_2_coll|
|---------|---------|
|1, 2, 3|a, b|
|4, 5|c|
|6|d, e|
|7|f|
where each row contains all the other IDs one could reach following the links in the table.
Note that 1 links to b even though there is no explicit link row, because we can follow the path 1 --> a --> 2 --> b using the links in the first 3 rows.

One potential approach is to remodel the relationships between id_1 and id_2 so that we get all the links from id_1 to itself. We can then use a recursive common table expression to traverse all the possible paths between id_1 values, and finally aggregate (somewhat arbitrarily) to the lowest id_1 value that can be reached from each id_1.
Explanation
Our steps are:
1. Remodel the relationship into a series of self-joins for id_1
2. Map each id_1 to the lowest id_1 that it is linked to via a recursive CTE
3. Aggregate the recursive CTE using the lowest id_1s as the GROUP BY column and grabbing all the linked id_1 and id_2 values via the ARRAY_AGG() function
We can use something like this to remodel the relationships into a self join (1.):
SELECT
a.id_1, a.id_2, b.id_1 AS linked_id
FROM t as a
INNER JOIN t as b
ON a.id_2 = b.id_2
WHERE a.id_1 != b.id_1
Next - to set up the recursive table expression (2.) we can tweak the query above to also give us the lowest (LEAST) of the values for id_1 at each link then use this as the base iteration:
WITH RECURSIVE base_iter AS (
SELECT
a.id_1, b.id_1 AS linked_id, LEAST(a.id_1, b.id_1) AS lowest_linked_id
FROM t as a
INNER JOIN t as b
ON a.id_2 = b.id_2
WHERE a.id_1 != b.id_1
)
We can also grab the lowest id_1 value at this time:
|id_1|linked_id|lowest_linked_id|
|----|---------|----------------|
|1|2|1|
|2|1|1|
|2|3|2|
|3|2|2|
|4|5|4|
|5|4|4|
For our recursive loop, we want to maintain an ARRAY of linked ids and join each new iteration such that the id_1 value of the n+1th iteration is equal to the linked_id value of the nth iteration AND the nth linked_id value is not in the array of previously linked ids.
We can code this as follows:
recursive_loop AS (
SELECT id_1, linked_id, lowest_linked_id, [linked_id ] AS linked_ids
FROM base_iter
UNION ALL
SELECT
prev_iter.id_1, prev_iter.linked_id,
iter.lowest_linked_id,
ARRAY_CONCAT(iter.linked_ids, [prev_iter.linked_id])
FROM base_iter AS prev_iter
JOIN recursive_loop AS iter
ON iter.id_1 = prev_iter.linked_id
AND iter.lowest_linked_id < prev_iter.lowest_linked_id
AND prev_iter.linked_id NOT IN UNNEST(iter.linked_ids )
)
Giving us the following results:
|id_1|linked_id|lowest_linked_id|linked_ids|
|----|---------|------------|---|
|3|2|1|[1,2]|
|2|3|1|[1,2,3]|
|4|5|4|[5]|
|1|2|1|[2]|
|5|4|4|[4]|
|2|3|2|[3]|
|2|1|1|[1]|
|3|2|2|[2]|
which we can now link back to the original table for the id_2 values then aggregate (3.) as shown in the complete query below
Solution
WITH RECURSIVE t AS (
SELECT 1 AS id_1, 'a' AS id_2,
UNION ALL SELECT 2, 'a'
UNION ALL SELECT 2, 'b'
UNION ALL SELECT 3, 'b'
UNION ALL SELECT 4, 'c'
UNION ALL SELECT 5, 'c'
UNION ALL SELECT 6, 'd'
UNION ALL SELECT 6, 'e'
UNION ALL SELECT 7, 'f'
),
base_iter AS (
SELECT
a.id_1, b.id_1 AS linked_id, LEAST(a.id_1, b.id_1) AS lowest_linked_id
FROM t as a
INNER JOIN t as b
ON a.id_2 = b.id_2
WHERE a.id_1 != b.id_1
),
recursive_loop AS (
SELECT id_1, linked_id, lowest_linked_id, [linked_id ] AS linked_ids
FROM base_iter
UNION ALL
SELECT
prev_iter.id_1, prev_iter.linked_id,
iter.lowest_linked_id,
ARRAY_CONCAT(iter.linked_ids, [prev_iter.linked_id])
FROM base_iter AS prev_iter
JOIN recursive_loop AS iter
ON iter.id_1 = prev_iter.linked_id
AND iter.lowest_linked_id < prev_iter.lowest_linked_id
AND prev_iter.linked_id NOT IN UNNEST(iter.linked_ids )
),
link_back AS (
SELECT
t.id_1, IFNULL(lowest_linked_id, t.id_1) AS lowest_linked_id, t.id_2
FROM t
LEFT JOIN recursive_loop
ON t.id_1 = recursive_loop.id_1
),
by_id_1 AS (
SELECT
id_1,
MIN(lowest_linked_id) AS grp
FROM link_back
GROUP BY 1
),
by_id_2 AS (
SELECT
id_2,
MIN(lowest_linked_id) AS grp
FROM link_back
GROUP BY 1
),
result AS (
SELECT
by_id_1.grp,
ARRAY_AGG(DISTINCT id_1 ORDER BY id_1) AS id1_coll,
ARRAY_AGG(DISTINCT id_2 ORDER BY id_2) AS id2_coll,
FROM
by_id_1
INNER JOIN by_id_2
ON by_id_1.grp = by_id_2.grp
GROUP BY grp
)
SELECT grp, TO_JSON(id1_coll) AS id1_coll, TO_JSON(id2_coll) AS id2_coll
FROM result ORDER BY grp
Giving us the required output:
|grp|id1_coll|id2_coll|
|---|--------|--------|
|1|[1,2,3]|[a,b]|
|4|[4,5]|[c]|
|6|[6]|[d,e]|
|7|[7]|[f]|
Limitations/Issues
Unfortunately this approach is inefficient (we have to traverse every single pathway before aggregating it back together) and fails in the real-world case where we have several million join rows. When trying to execute on this data, BigQuery runs up a huge "Slot time consumed" and eventually errors out with:
Resources exceeded during query execution: Your project or organization exceeded the maximum disk and memory limit available for shuffle operations. Consider provisioning more slots, reducing query concurrency, or using more efficient logic in this job.
I hope there might be a better way of doing the recursive join such that pathways can be merged/aggregated as we go (if an id_1 value and a linked_id are already in the list of linked_ids, we don't need to check them further).

Using ROW_NUMBER(), the query is as follows:
WITH RECURSIVE
t AS (
SELECT 1 AS id_1, 'a' AS id_2,
UNION ALL SELECT 2, 'a'
UNION ALL SELECT 2, 'b'
UNION ALL SELECT 3, 'b'
UNION ALL SELECT 4, 'c'
UNION ALL SELECT 5, 'c'
UNION ALL SELECT 6, 'd'
UNION ALL SELECT 6, 'e'
UNION ALL SELECT 7, 'f'
),
t1 AS (
SELECT ROW_NUMBER() OVER(ORDER BY t.id_1) n, t.id_1, t.id_2 FROM t
),
t2 AS (
SELECT n, [n] n_arr, [id_1] arr_1, [id_2] arr_2, id_1, id_2 FROM t1
WHERE n IN (SELECT MIN(n) FROM t1 GROUP BY id_1)
UNION ALL
SELECT t2.n, ARRAY_CONCAT(t2.n_arr, [t1.n]),
CASE WHEN t1.id_1 NOT IN UNNEST(t2.arr_1)
THEN ARRAY_CONCAT(t2.arr_1, [t1.id_1])
ELSE t2.arr_1 END,
CASE WHEN t1.id_2 NOT IN UNNEST(t2.arr_2)
THEN ARRAY_CONCAT(t2.arr_2, [t1.id_2])
ELSE t2.arr_2 END,
t1.id_1, t1.id_2
FROM t2 JOIN t1 ON
t2.n < t1.n AND
t1.n NOT IN UNNEST(t2.n_arr) AND
(t2.id_1 = t1.id_1 OR t2.id_2 = t1.id_2) AND
(t1.id_1 NOT IN UNNEST(t2.arr_1) OR t1.id_2 NOT IN UNNEST(t2.arr_2))
),
t3 AS (
SELECT
n,
ARRAY_AGG(DISTINCT id_1 ORDER BY id_1) arr_1,
ARRAY_AGG(DISTINCT id_2 ORDER BY id_2) arr_2
FROM t2
WHERE n IN (SELECT MIN(n) FROM t2 GROUP BY id_1)
GROUP BY n
)
SELECT n, TO_JSON(arr_1), TO_JSON(arr_2) FROM t3 ORDER BY n
t1 : Append with row numbers.
t2 : Extract rows matching either id_1 or id_2 by recursive query.
t3 : Make arrays from id_1 and id_2 with ARRAY_AGG().
However, it may not resolve the problems described under Limitations/Issues.

The way this question is phrased makes it appear you want "show me distinct groups from a presorted list, unchained to a previous group". For that, something like this should suffice (assuming auto-incrementing order/one or both id's move to the next value):
SELECT GrpNr,
STRING_AGG(DISTINCT CAST(id_1 as STRING), ',') as id_1_coll,
STRING_AGG(DISTINCT CAST(id_2 as STRING), ',') as id_2_coll
FROM
(
SELECT id_1, id_2,
SUM(CASE WHEN a.id_1 <> a.previous_id_1 and a.id_2 <> a.previous_id_2 THEN 1 ELSE 0 END)
OVER (ORDER BY RowNr) as GrpNr
FROM
(
SELECT *,
ROW_NUMBER() OVER () as RowNr,
LAG(t.id_1, 1) OVER (ORDER BY 1) AS previous_id_1,
LAG(t.id_2, 1) OVER (ORDER BY 1) AS previous_id_2
FROM t
) a
ORDER BY RowNr
) a
GROUP BY GrpNr
ORDER BY GrpNr
I don't think this is the question you mean to ask. This seems to be a graph-walking problem, as referenced in the other answers and in the response from @GordonLinoff to the question here, which I tested (and presume works for BigQuery).
This can also be done using sequential updates, as done by @RomanPekar here (which I also tested). The main consideration seems to be performance. I'd assume DBMSs have gotten better at recursion since this was posted.
Rolling it up in either case should be fairly easy using STRING_AGG() as given above or as you have.
I'd be curious to see a more accurate representation of the data. If there is some consistency to how the data is stored/limitations to levels of nesting/other group structures there may be a shortcut approach other than recursion or iterative updates.
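To make the graph-walking reading concrete, here is a minimal sketch outside SQL: a union-find (disjoint-set) pass over the link rows from the question. It is not a BigQuery solution and says nothing about performance there. It only illustrates that each link needs to be visited once to merge groups, rather than enumerating every path. The helper names are hypothetical.

```python
from collections import defaultdict

# The link rows from the question's example table t.
links = [(1, "a"), (2, "a"), (2, "b"), (3, "b"), (4, "c"),
         (5, "c"), (6, "d"), (6, "e"), (7, "f")]

parent = {}

def find(node):
    """Return the representative of node's group, shortening the path as we go."""
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node

def union(a, b):
    """Merge the groups containing a and b."""
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[root_b] = root_a

# Tag nodes by column so an id_1 value can never collide with an id_2 value.
for id_1, id_2 in links:
    union(("id_1", id_1), ("id_2", id_2))

# Collect the id_1 and id_2 values that ended up in each group.
groups = defaultdict(lambda: (set(), set()))
for id_1, id_2 in links:
    root = find(("id_1", id_1))
    groups[root][0].add(id_1)
    groups[root][1].add(id_2)

components = sorted((sorted(a), sorted(b)) for a, b in groups.values())
print(components)
```

For the example data this yields the four groups from the desired output in the question.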

How to display null values in IN operator for SQL with two conditions in where

I have this query
select *
from dbo.EventLogs
where EntityID = 60181615
and EventTypeID in (1, 2, 3, 4, 5)
and NewValue = 'Received'
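If event types 2 and 4 do not exist with NewValue 'Received', the query returns no rows for them:
[image: current results]
What I want is for those event types to still appear, with null values:
[image: what I want]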

Ideally you should maintain somewhere a table containing all possible EventTypeID values. Sans that, we can use a CTE in place along with a left join:
WITH EventTypes AS (
SELECT 1 AS ID UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
)
SELECT et.ID AS EventTypeId, el.*
FROM EventTypes et
LEFT JOIN dbo.EventLogs el
ON el.EventTypeID = et.ID AND
el.EntityID = 60181615 AND
el.NewValue = 'Received'
WHERE
et.ID IN (1,2,3,4,5);
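As a quick check of the left-join idea, here is a minimal sketch using Python's sqlite3 module. The log rows are invented, the table is reduced to three columns, and the join is assumed to match on EventTypeID in addition to the two filters. Note that the filters sit in the ON clause, not in WHERE, so event types without a matching row are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EventLogs (EntityID INTEGER, EventTypeID INTEGER, NewValue TEXT)"
)
# Hypothetical rows: types 2 and 4 have no 'Received' entry for entity 60181615.
conn.executemany("INSERT INTO EventLogs VALUES (?, ?, ?)", [
    (60181615, 1, "Received"),
    (60181615, 2, "Pending"),
    (60181615, 3, "Received"),
    (60181615, 5, "Received"),
    (99999999, 4, "Received"),
])

query = """
WITH EventTypes(ID) AS (VALUES (1), (2), (3), (4), (5))
SELECT et.ID, el.NewValue
FROM EventTypes et
LEFT JOIN EventLogs el
       ON el.EventTypeID = et.ID
      AND el.EntityID = 60181615
      AND el.NewValue = 'Received'
ORDER BY et.ID
"""
result = conn.execute(query).fetchall()
print(result)
```

Event types 2 and 4 come back with NULL (None in Python) in the log columns instead of disappearing.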

Select Distinct values once from multiple columns in this table preserving original order?

I have a (subquery) table that lists meal preferences for my friends. Each meal can only be taken once, and each person can only eat one meal.
row_number person_id meal_id
1 1 3
2 2 1
3 2 2
4 2 3
5 3 1
6 3 2
7 3 3
The picking order is determined by the original order of the table, so I would like the result to be:
person_id meal_id
1 3
2 1
3 2
Because meal 1 is taken by user 2, user 3 gets meal 2. I think this could be solved by selecting distinct values in both columns based on their original order, but I cannot figure out how to write that query. Any help appreciated.
Update: added row_number to the original table.

If I understand correctly, this is a rather complicated graph walking problem. I should first note that there is no guarantee of an optimal solution -- without lots and lots of work. But you can implement a greedy algorithm using recursive CTEs:
with recursive t as (
select v.*
from (values (1, 1, 3), (2, 2, 1), (3, 2, 2), (4, 2, 3), (5, 3, 1), (6, 3, 2), (7, 3, 3)
) v(row_number, person_id, meal_id)
),
cte (row_number, person_id, meal_id, rows, persons, meals, lev) as (
select row_number, person_id, meal_id, array[row_number], array[person_id], array[meal_id], 1 as lev
from t
where row_number = 1
union all
select t.row_number, t.person_id, t.meal_id,
(case when t.person_id = any(cte.persons) or t.meal_id = any(cte.meals)
then cte.rows
else array_append(cte.rows, t.row_number)
end),
(case when t.person_id = any(cte.persons) or t.meal_id = any(cte.meals)
then cte.persons
else array_append(cte.persons, t.person_id)
end),
(case when t.person_id = any(cte.persons) or t.meal_id = any(cte.meals)
then cte.meals
else array_append(cte.meals, t.meal_id)
end),
cte.lev + 1
from cte join
t
on t.row_number = cte.row_number + 1
)
select t.*
from t cross join
(select rows from cte order by lev desc fetch first 1 row only) as last1
where t.row_number = any (last1.rows);
Here is a db<>fiddle.
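For comparison, the same greedy rule is easy to state procedurally. The following is a minimal Python sketch, not a translation of the recursive CTE: it walks the rows in row_number order and accepts a row only if neither its person nor its meal has been taken yet. The function name is hypothetical.

```python
# (row_number, person_id, meal_id) rows from the question.
preferences = [(1, 1, 3), (2, 2, 1), (3, 2, 2), (4, 2, 3),
               (5, 3, 1), (6, 3, 2), (7, 3, 3)]

def assign_meals(rows):
    """Greedily give each person the first still-free meal in picking order."""
    seen_persons, seen_meals, picks = set(), set(), []
    for _, person_id, meal_id in sorted(rows):  # sort by row_number
        if person_id in seen_persons or meal_id in seen_meals:
            continue  # person already served, or meal already taken
        seen_persons.add(person_id)
        seen_meals.add(meal_id)
        picks.append((person_id, meal_id))
    return picks

assignments = assign_meals(preferences)
print(assignments)
```

As with the SQL version, this is greedy: an early pick can leave a later person without any meal even if a different assignment would have served everyone.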

SQL Server- Return Items Only When All Sub-Items Are Available

I have an Item table (denormalized for this example) containing a list of items, parts and whether the part is available. I want to return all the items for which all the parts are available. Each item can have a varying number of parts. For example:
Item Part Available
A 1 Y
A 2 N
A 3 N
B 1 Y
B 4 Y
C 2 N
C 5 Y
D 4 Y
D 6 Y
D 7 Y
The query should return the following:
Item Part
B 1
B 4
D 4
D 6
D 7
Thanks in advance for any assistance.

Here is one trick using the Min() Over() window aggregate function. Since 'N' sorts before 'Y', the minimum is 'Y' only when every part of the item is available:
SELECT Item,
Part
FROM (SELECT Min([Available])OVER(partition BY [Item]) m_av,*
FROM yourtable) a
WHERE m_av = 'Y'
or using Group By and Having clause
Using IN clause
SELECT Item,
Part
FROM yourtable
WHERE Item IN (SELECT Item
FROM yourtable
GROUP BY Item
HAVING Count(*) = Sum(Iif(Available = 'Y', 1, 0)))
using Exists
SELECT Item,
Part
FROM yourtable A
WHERE EXISTS (SELECT 1
FROM yourtable B
WHERE A.Item = B.Item
HAVING Count(*) = Sum(Iif(Available = 'Y', 1, 0)))
using NOT EXISTS
SELECT Item,
Part
FROM yourtable A
WHERE NOT EXISTS (SELECT *
FROM yourtable B
WHERE A.Item = B.Item
AND B.Available = 'N')

I'd start with rephrasing the requirement - you want to return the items that don't have any parts that are not available. Once you put it like that, it's easy to translate the requirement to SQL using the not exists operator:
SELECT item, part
FROM parts a
WHERE NOT EXISTS (SELECT *
FROM parts b
WHERE a.item = b.item AND b.available = 'N')
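To see the rephrased requirement at work on the sample data from the question, here is a minimal sketch using Python's sqlite3 module in place of SQL Server. The table name parts follows the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (item TEXT, part INTEGER, available TEXT)")
# Sample rows from the question.
conn.executemany("INSERT INTO parts VALUES (?, ?, ?)", [
    ("A", 1, "Y"), ("A", 2, "N"), ("A", 3, "N"),
    ("B", 1, "Y"), ("B", 4, "Y"),
    ("C", 2, "N"), ("C", 5, "Y"),
    ("D", 4, "Y"), ("D", 6, "Y"), ("D", 7, "Y"),
])

# Keep a row only if its item has no unavailable part at all.
query = """
SELECT item, part
FROM parts a
WHERE NOT EXISTS (SELECT 1
                  FROM parts b
                  WHERE b.item = a.item AND b.available = 'N')
ORDER BY item, part
"""
fully_available = conn.execute(query).fetchall()
print(fully_available)
```

Items A and C each have at least one 'N' part, so only the rows for B and D remain.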

Using window function does a single table read.
MIN and MAX window function
select *
from (
select
t.*,
max(available) over (partition by item) a,
min(available) over (partition by item) b
from your_table t
) t where a = b and a = 'Y';
COUNT window function:
select *
from (
select
t.*,
count(*) over (partition by item) n1,
count(case when available = 'Y' then 1 end) over (partition by item) n2
from your_table t
) t where n1 = n2;

You can use NOT IN or NOT EXISTS to achieve this.
NOT EXISTS
Select item, part
from tbl as t1
where not exists( select 1 from tbl where item = t1.item and available = 'N')
NOT IN
Select item, part
from tbl
where item not in( select item from tbl where available = 'N')

I want to point out that the question in the text is: "I want to return all the items for which all the parts are available". However, your example results include the parts.
If the question is indeed that you want the items only, then you can use simple aggregation:
select item
from parts
group by item
having min(available) = max(available) and min(available) = 'Y';
If you indeed want the detail on the parts as well, then the other answers provide that information.
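The aggregation above relies on string ordering: 'N' sorts before 'Y', so MIN(available) = 'Y' can only hold when no part is 'N'. A minimal sketch with Python's sqlite3 module, using the sample rows from the question and assuming the default binary text collation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (item TEXT, part INTEGER, available TEXT)")
conn.executemany("INSERT INTO parts VALUES (?, ?, ?)", [
    ("A", 1, "Y"), ("A", 2, "N"), ("A", 3, "N"),
    ("B", 1, "Y"), ("B", 4, "Y"),
    ("C", 2, "N"), ("C", 5, "Y"),
    ("D", 4, "Y"), ("D", 6, "Y"), ("D", 7, "Y"),
])

# One row per item; an item qualifies only if all its flags are 'Y'.
query = """
SELECT item
FROM parts
GROUP BY item
HAVING MIN(available) = MAX(available) AND MIN(available) = 'Y'
ORDER BY item
"""
items = [row[0] for row in conn.execute(query)]

# For contrast: MAX alone only tells us that at least one part is available.
any_available = [row[0] for row in conn.execute(
    "SELECT item FROM parts GROUP BY item HAVING MAX(available) = 'Y' ORDER BY item"
)]
print(items, any_available)
```

The contrast query shows why a check on MAX alone is not enough: it also lets through items A and C, which have unavailable parts.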

I do like it when problems lend themselves well to being solved by infrequently used language features:
with cte as (
select * from (values
('A', 1, 'Y'),
('A', 2, 'N'),
('A', 3, 'N'),
('B', 1, 'Y'),
('B', 4, 'Y'),
('C', 2, 'N'),
('C', 5, 'Y'),
('D', 4, 'Y'),
('D', 6, 'Y'),
('D', 7, 'Y')
) as x(Item, Part, Available)
)
select *
into #t
from cte as c;
select *
from #t as c
where 'Y' = all (
select Available
from #t as a
where c.Item = a.Item
)
Here, we use a correlated subquery and the all keyword to see if all of the parts are available. My understanding is that, like exists, this will stop if it finds a counter-example.

How to write a Sql statement without using union?

I have a SQL statement like the one below. How can I add a single row (code = 0, desc = 1) to the result of this statement without using the UNION keyword? Thanks.
select code, desc
from material
where material.ExpireDate ='2010/07/23'

You can always create a view for your table which itself uses UNION keyword
CREATE VIEW material_view AS SELECT code, desc, ExpireDate FROM material UNION SELECT '0', '1', NULL;
SELECT code, desc FROM material_view WHERE ExpireDate = '2010/07/23' OR code = '0';

WITH material AS
(
SELECT *
FROM
(VALUES (2, 'x', '2010/07/23'),
(3, 'y', '2009/01/01'),
(4, 'z', '2010/07/23')) vals (code, [desc], ExpireDate)
)
SELECT
COALESCE(m.code,x.code) AS code,
COALESCE(m.[desc],x.[desc]) AS [desc]
FROM material m
FULL OUTER JOIN (SELECT 0 AS code, '1' AS [desc] ) x ON 1=0
WHERE m.code IS NULL OR m.ExpireDate ='2010/07/23'
Gives
code desc
----------- ----
2 x
4 z
0 1

Since you don't want to use either a union or a view, I'd suggest adding a dummy row to the material table (with code = 0, desc = 1, and ExpireDate something that would never normally be selected - eg. 01 January 1900) - then use a query like the following:
select code, desc
from material
where material.ExpireDate ='2010/07/23' or
material.ExpireDate ='1900/01/01'
Normally, a Union would be my preferred option.
