I would like to solve a problem consisting of 2 recursions.
In one of the 2 recursions I find out the answer to one question which is "What is the leaf member of a specific input (template)?" This is already solved.
In a second recursion I would like to run this query for a number of other inputs (templates).
1st part of the problem:
I have a tree and would like to find the leaf of it. This part of the recursion can be solved using this query:
with recursive full_tree as (
select id, "previousVersionId", 1 as level
from template
where
template."id" = '5084520a-bb07-49e8-b111-3ea8182dc99f'
union all
select c.id, c."previousVersionId", p.level + 1
from template c
inner join full_tree p on c."previousVersionId" = p.id
)
select * from full_tree
order by level desc
limit 1
The query output is one record including the leaf id I'm interested in. This is fine.
2nd part of the query:
Here's the problem. I would like to run the first query n times.
Currently I can run the query only if it's just one id ('5084520a-bb07-49e8-b111-3ea8182dc99f' in the example). But what If I have a list of 100 such ids.
My ultimate goal is to get one id response (the leaf id) to each of the 100 template ids in the list.
In theory, a query that allows me to run above query for each of my e.g. 100 template ids would solve my problem.
Related
Find the first common parent, if any, from many different children.
Example:
1
/ \
2 3
/ / \
7 8 9
/ \
10 11
Input: [10, 9]
Output: 3 (first common parent for this elements)
Table example:
+------------------+-----------+------+
|EmployeePositionId|Subdivision|Parent|
+------------------+-----------+------+
|4718 |485 |42 |
|4719 |5064 |485 |
|4720 |5065 |5064 |
|4721 |5065 |5064 |
|4722 |3000 |null |
+------------------+-----------+------+
If I try to search for EmployeePositionId [4719, 4720, 4721],
I would like to get the Subdivision 5064, because it is the closest common subdivision for both employees (5065 nested in 5064).
If I were looking for 4719, 4720, 4721, 4722, then I would like to get null, because these elements do not have a common parent.
Or the answer will help me how get the data so that later solve this in Python
This class of problems is hard for SQL.
It's even harder with your particular table. It's not properly normalized. There is no level indicator. And input IDs can be from mixed hierarchy levels.
Setup
You clarified in a later comment that every path is terminated with a row that has "Parent" IS NULL (root), even if sample data in the question suggest otherwise. That helps a bit.
I assume valid "EmployeePositionId" as input. And no loops in your tree or the CTE enters an endless loop.
If you don't have a level of hierarchy in the table, add it. It's a simple task. If you can't add it, create a VIEW or, preferably, a MATERIALIZED VIEW instead:
CREATE MATERIALIZED VIEW mv_tbl AS
WITH RECURSIVE cte AS (
SELECT *, 0 AS level
FROM tbl
WHERE "Parent" IS NULL
UNION ALL
SELECT t.*, c.level + 1
FROM cte c
JOIN tbl t ON t."Parent" = c."Subdivision"
)
TABLE cte;
These would be the perfect indices for the task:
CREATE UNIQUE INDEX mv_tbl_id_uni ON mv_tbl ("EmployeePositionId") INCLUDE ("Subdivision", "Parent", level);
CREATE INDEX mv_tbl_subdivision_idx ON mv_tbl ("Subdivision") INCLUDE ("Parent", level);
See:
Covering index for top read performance
Query
Pure SQL solution with recursive CTE, based on a table with level indicator (or the MV from above):
WITH RECURSIVE init AS (
SELECT "Subdivision", "Parent", level
FROM mv_tbl
WHERE "EmployeePositionId" IN (4719, 4720, 4721) -- input
)
, cte AS (
TABLE init
UNION
SELECT c."Parent", t."Parent", c.level - 1
FROM cte c
JOIN mv_tbl t ON t."Subdivision" = c."Parent" -- recursion terminated at "Parent" IS NULL
)
, agg AS (
SELECT level, min("Subdivision") AS "Subdivision", count(*) AS ct
FROM cte
GROUP BY level
)
SELECT "Subdivision"
FROM agg a
WHERE ct = 1 -- no other live branch
AND level < (SELECT max(level) FROM cte WHERE "Parent" IS NULL) IS NOT TRUE -- no earlier dead end
AND level <= (SELECT min(level) FROM init) -- include highest (least) level
ORDER BY level DESC -- pick earliest (greatest) qualifying level
LIMIT 1;
db<>fiddle here
Covers all possible input, works for any modern version of Postgres.
I added basic explanation in the code.
Related:
How to aggregate a table with tree-structure to a single nested JSON object?
How to turn a set of flat trees into a single tree with multiple leaves?
Legal, lower-case, unquoted identifiers make your life with Postgres easier. See:
Are PostgreSQL column names case-sensitive?
I can't seem to find the reason behind the infinite loop in this query, nor how to correct it.
Here is the context :
I have a table called mergesWith with this description :
mergesWith: information about neighboring seas. Note that in this relation, for every pair of
neighboring seas (A,B), only one tuple is given – thus, the relation is not symmetric.
sea1: a sea
sea2: a sea.
I want to know every sea accessible from the Mediterranean Sea by navigating. I have opted for a recursive query using "with" :
With
acces(p,d) as (
select sea1 as p, sea2 as d
from MERGESWITH
UNION ALL
select a.p, case when mw.sea1=a.d
then mw.sea2
else mw.sea1
end as d
from acces a, MERGESWITH mw
where a.d=mw.sea1 or a.d=mw.sea2)
select d
from acces
where p= 'Mediterranean Sea';
I think the cause is either the case when or the a.d=mw.sea1 or a.d=mw.sea2 that is not restrictive enough, but I can't seem to pinpoint why.
I get this error message :
32044. 00000 - "cycle detected while executing recursive WITH query"
*Cause: A recursive WITH clause query produced a cycle and was stopped
in order to avoid an infinite loop.
*Action: Rewrite the recursive WITH query to stop the recursion or use
the CYCLE clause.
The cycles are caused by the structure of your query, not by cycles in the data. You ask for the reason for cycling. That should be obvious: at the first iteration, one row of output has d = 'Aegean Sea'. At the second iteration, you will find a row with d = 'Mediterranean Sea', right? Can you now see how this will result in cycles?
Recursive queries have a cycle clause used exactly for this kind of problem. For some reason, even many users who learned the recursive with clause well, and use it all the time, seem unaware of the cycle clause (as well as the unrelated, but equally useful, search clause - used for ordering the output).
In your code, you need to make two changes. Add the cycle clause, and also in the outer query filter for non-cycle rows only. In the cycle clause, you can decide what to call the "cycle" column, and what values to give it. To make this look as similar to connect by queries as possible, I like to call the new column IS_CYCLE and to give it the values 0 (for no cycle) and 1 (for cycle). In the outer query below, add is_cycle to the select list to see what it adds to the recursive query.
Notice the position of the cycle clause: it comes right after the recursive with clause (in particular, after the closing parenthesis at the end of the recursive factored subquery).
with
acces(p,d) as (
select sea1 as p, sea2 as d
from MERGESWITH
UNION ALL
select a.p, case when mw.sea1=a.d
then mw.sea2
else mw.sea1
end as d
from acces a, MERGESWITH mw
where a.d=mw.sea1 or a.d=mw.sea2)
cycle d set is_cycle to 1 default 0 -- add this line
select d
from acces
where p= 'Mediterranean Sea'
and is_cycle = 0 -- and this line
;
Clearly, this would be data-dependent due to cycles in the data. I typically include a lev value when developing recursive CTEs. This makes it simpler to debug them.
So, try something like this:
with acces(p, d, lev) as (
select sea1 as p, sea2 as d, 1 as lev
from MERGESWITH
union all
select a.p,
(case when mw.sea1 = a.d then mw.sea2 else mw.sea1 end) as d,
lev + 1
from acces a join
MERGESWITH mw
on a.d in (mw.sea1, mw.sea2)
where lev < 5)
select d
from acces
where p = 'Mediterranean Sea';
If you find the reason but can't fix the code, ask a new question with sample data and desired results. A DB fiddle of some sort is also helpful.
I'm using Access (I normally use SQL Server) for a little job, and I'm getting "enter parameter value" for Night.NightId in the statement below that has a subquery within a subquery. I expect it would work if I wasn't nesting it two levels deep, but I can't think of a way around it (query ideas welcome).
The scenario is pretty simple, there's a Night table with a one-to-many relationship to a Score table - each night normally has 10 scores. Each score has a bit field IsDouble which is normally true for two of the scores.
I want to list all of the nights, with a number next to each representing how many of the top 2 scores were marked IsDouble (would be 0, 1 or 2).
Here's the SQL, I've tried lots of combinations of adding aliases to the column and the tables, but I've taken them out for simplicity below:
select Night.*
,
( select sum(IIF(IsDouble,1,0)) from
(SELECT top 2 * from Score where NightId=Night.NightId order by Score desc, IsDouble asc, ID)
) as TopTwoMarkedAsDoubles
from Night
This is a bit of speculation. However, some databases have issues with correlation conditions in multiply nested subqueries. MS Access might have this problem.
If so, you can solve this by using aggregation with a where clause that chooses the top two values:
select s.nightid,
sum(IIF(IsDouble, 1, 0)) as TopTwoMarkedAsDoubles
from Score as s
where s.id in (select top 2 s2.id
from score as s2
where s2.nightid = s.nightid
order by s2.score desc, s2.IsDouble asc, s2.id
)
group by s.nightid;
If this works, it is a simply matter to join Night back in to get the additional columns.
Your subquery can only see one level above it. so Night.NightId is totally unknown to it hence why you are being prompted to enter a value. You can use a Group By to get the value you want for each NightId then correlate that back to the original Night table.
Select *
From Night
left join (
Select N.NightId
, sum(IIF(S.IsDouble,1,0)) as [Number of Doubles]
from Night N
inner join Score S
on S.NightId = S.NightId
group by N.NightId) NightsWithScores
on Night.NightId = NightsWithScores.NightId
Because of the IIF(S.IsDouble,1,0) I don't see the point is using top.
I've a table like this which contains links :
key_a key_b
--------------
a b
b c
g h
a g
c a
f g
not really tidy & infinite recursion ...
key_a = parent
key_b = child
Require a query which will recompose and attribute a number for each hierarchical group (parent + direct children + indirect children) :
key_a key_b nb_group
--------------------------
a b 1
a g 1
b c 1
**c a** 1
f g 2
g h 2
**link responsible of infinite loop**
Because we have
A-B-C-A
-> Only want to show simply the link as shown.
Any idea ?
Thanks in advance
The problem is that you aren't really dealing with strict hierarchies; you're dealing with directed graphs, where some graphs have cycles. Notice that your nbgroup #1 doesn't have any canonical root-- it could be a, b, or c due to the cyclic reference from c-a.
The basic way of dealing with this is to think in terms of graph techniques, not recursion. In fact, an iterative approach (not using a CTE) is the only solution I can think of in SQL. The basic approach is explained here.
Here is a SQL Fiddle with a solution that addresses both the cycles and the shared-leaf case. Notice it uses iteration (with a failsafe to prevent runaway processes) and table variables to operate; I don't think there's any getting around this. Note also the changed sample data (a-g changed to a-h; explained below).
If you dig into the SQL you'll notice that I changed some key things from the solution given in the link. That solution was dealing with undirected edges, whereas your edges are directed (if you used undirected edges the entire sample set is a single component because of the a-g connection).
This gets to the heart of why I changed a-g to a-h in my sample data. Your specification of the problem is straightforward if only leaf nodes are shared; that's the specification I coded to. In this case, a-h and g-h can both get bundled off to their proper components with no problem, because we're concerned about reachability from parents (even given cycles).
However, when you have shared branches, it's not clear what you want to show. Consider the a-g link: given this, g-h could exist in either component (a-g-h or f-g-h). You put it in the second, but it could have been in the first instead, right? This ambiguity is why I didn't try to address it in this solution.
Edit: To be clear, in my solution above, if shared braches ARE encountered, it treats the whole set as a single component. Not what you described above, but it will have to be changed after the problem is clarified. Hopefully this gets you close.
You should use a recursive query. In the first part we select all records which are top level nodes (have no parents) and using ROW_NUMBER() assign them group ID numbers. Then in the recursive part we add to them children one by one and use parent's groups Id numbers.
with CTE as
(
select t1.parent,t1.child,
ROW_NUMBER() over (order by t1.parent) rn
from t t1 where
not exists (select 1 from t where child=t1.parent)
union all
select t.parent,t.child, CTE.rn
from t
join CTE on t.parent=CTE.Child
)
select * from CTE
order by RN,parent
SQLFiddle demo
Painful problem of graph walking using recursive CTEs. This is the problem of finding connected subgraphs in a graph. The challenge with using recursive CTEs is to prevent unwarranted recursion -- that is, infinite loops In SQL Server, that typically means storing them in a string.
The idea is to get a list of all pairs of nodes that are connected (and a node is connected with itself). Then, take the minimum from the list of connected nodes and use this as an id for the connected subgraph.
The other idea is to walk the graph in both directions from a node. This ensures that all possible nodes are visited. The following is query that accomplishes this:
with fullt as (
select keyA, keyB
from t
union
select keyB, keyA
from t
),
CTE as (
select t.keyA, t.keyB, t.keyB as last, 1 as level,
','+cast(keyA as varchar(max))+','+cast(keyB as varchar(max))+',' as path
from fullt t
union all
select cte.keyA, cte.keyB,
(case when t.keyA = cte.last then t.keyB else t.keyA
end) as last,
1 + level,
cte.path+t.keyB+','
from fullt t join
CTE
on t.keyA = CTE.last or
t.keyB = cte.keyA
where cte.path not like '%,'+t.keyB+',%'
) -- select * from cte where 'g' in (keyA, keyB)
select t.keyA, t.keyB,
dense_rank() over (order by min(cte.Last)) as grp,
min(cte.Last)
from t join
CTE
on (t.keyA = CTE.keyA and t.keyB = cte.keyB) or
(t.keyA = CTE.keyB and t.keyB = cte.keyA)
where cte.path like '%,'+t.keyA+',%' or
cte.path like '%,'+t.keyB+',%'
group by t.id, t.keyA, t.keyB
order by t.id;
The SQLFiddle is here.
you might want to check with COMMON TABLE EXPRESSIONS
here's the link
I am trying to wrap my head around this one this morning.
I am trying to show inventory status for parts (for our products) and this query only becomes complex if I try to return all parts.
Let me lay it out:
single table inventoryReport
I have a distinct list of X parts I wish to display, the result of which must be X # of rows (1 row per part showing latest inventory entry).
table is made up of dated entries of inventory changes (so I only need the LATEST date entry per part).
all data contained in this single table, so no joins necessary.
Currently for 1 single part, it is fairly simple and I can accomplish this by doing the following sql (to give you some idea):
SELECT TOP (1) ldDate, ptProdLine, inPart, inSite, inAbc, ptUm, inQtyOh + inQtyNonet AS in_qty_oh, inQtyAvail, inQtyNonet, ldCustConsignQty, inSuppConsignQty
FROM inventoryReport
WHERE (ldPart = 'ABC123')
ORDER BY ldDate DESC
that gets me my TOP 1 row, so simple per part, however I need to show all X (lets say 30 parts). So I need 30 rows, with that result. Of course the simple solution would be to loop X# of sql calls in my code (but it would be costly) and that would suffice, but for this purpose I would love to work this SQL some more to reduce the x# calls back to the db (if not needed) down to just 1 query.
From what I can see here I need to keep track of the latest date per item somehow while looking for my result set.
I would ultimately do a
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
to limit the parts I need. Hopefully I made my question clear enough. Let me know if you have an idea. I cannot do a DISTINCT as the rows are not the same, the date needs to be the latest, and I need a maximum of X rows.
Thoughts? I'm stuck...
SELECT *
FROM (SELECT i.*,
ROW_NUMBER() OVER(PARTITION BY ldPart ORDER BY ldDate DESC) r
FROM inventoryReport i
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
)
WHERE r = 1
EDIT: Be sure to test the performance of each solution. As pointed out in this question, the CTE method may outperform using ROW_NUMBER.
;with cteMaxDate as (
select ldPart, max(ldDate) as MaxDate
from inventoryReport
group by ldPart
)
SELECT md.MaxDate, ir.ptProdLine, ir.inPart, ir.inSite, ir.inAbc, ir.ptUm, ir.inQtyOh + ir.inQtyNonet AS in_qty_oh, ir.inQtyAvail, ir.inQtyNonet, ir.ldCustConsignQty, ir.inSuppConsignQty
FROM cteMaxDate md
INNER JOIN inventoryReport ir
on md.ldPart = ir.ldPart
and md.MaxDate = ir.ldDate
You need to join into a Sub-query:
SELECT i.ldPart, x.LastDate, i.inAbc
FROM inventoryReport i
INNER JOIN (Select ldPart, Max(ldDate) As LastDate FROM inventoryReport GROUP BY ldPart) x
on i.ldPart = x.ldPart and i.ldDate = x.LastDate