Extract row families from linked rows - sql

I have a table of linked transactions similar to the following:
+----+----+----+
| # | A | B |
+----+----+----+
| 1 | 1 | 4 |
| 2 | 3 | 5 |
| 3 | 4 | 6 |
| 4 | 5 | 8 |
| 5 | 6 | 1 |
| 6 | 7 | 7 |
| 7 | 8 | 3 |
| 8 | 9 | 3 |
| 9 | 10 | 4 |
| 10 | 11 | 14 |
| 11 | 2 | 2 |
| 12 | 12 | 4 |
| 13 | 13 | 14 |
| 14 | 14 | 9 |
| 15 | 15 | 1 |
+----+----+----+
The numbers in columns A and B are transaction IDs. For instance, transaction 1 is linked with transaction 4 by some criterion, transaction 3 with transaction 5, transaction 4 with transaction 6, and so on.
Transactions 2 and 7 aren't linked to any other transaction, so they are self-linked.
What I want to extract from this table are transaction families. Since transactions 1 and 4 are linked, 4 and 6 are linked, 10 and 4 are linked, and so on, they all belong to one transaction family: (1, 4, 6, 10, 12, 15).
Within each family, the lowest transaction ID should be the master transaction.
Ideally, the output would look like this:
+----+------+--------------+
| # | Tran | Master_tran |
+----+------+--------------+
| 1 | 1 | 1 |
| 2 | 3 | 3 |
| 3 | 4 | 1 |
| 4 | 5 | 3 |
| 5 | 6 | 1 |
| 6 | 7 | 7 |
| 7 | 8 | 3 |
| 8 | 9 | 3 |
| 9 | 10 | 1 |
| 10 | 11 | 3 |
| 11 | 2 | 2 |
| 12 | 12 | 1 |
| 13 | 13 | 3 |
| 14 | 14 | 3 |
| 15 | 15 | 1 |
+----+------+--------------+
I have been toying with self-joins.
SELECT t1.a as x,
least (min(t1.b), min(t2.a)) as y
FROM test t1
LEFT JOIN test t2 on t2.b = t1.a
GROUP BY t1.a
ORDER BY t1.a asc
This code gives the following output:
+------+----+---+
| Col1 | X | Y |
+------+----+---+
| 1 | 1 | 4 |
| 2 | 2 | 2 |
| 3 | 3 | 5 |
| 4 | 4 | 1 |
| 5 | 5 | 3 |
| 6 | 6 | 1 |
| 7 | 7 | 7 |
| 8 | 8 | 3 |
| 9 | 9 | 3 |
| 10 | 10 | |
| 11 | 11 | |
| 12 | 12 | |
| 13 | 13 | |
| 14 | 14 | 9 |
| 15 | 15 | |
+------+----+---+
I am not sure what is wrong in my code. Can someone point me in the right direction?
Thanks!

In principle you need a CONNECT BY statement to solve hierarchical problems like this.
Since your data contains circular links, you also need the NOCYCLE clause. It drops the last link in each loop, which is fine because that link will never be part of the answer.
Your links also run in both directions (e.g. (13, 14) and (14, 9)), so the query must include every link twice, once per direction.
WITH t_order
AS (SELECT qt.qt_id, qt.qt_a, qt.qt_b, LEAST( qt.qt_a, qt.qt_b ) AS t_parent, GREATEST( qt.qt_a, qt.qt_b ) AS t_child
FROM query_test qt
UNION
SELECT qb.qt_id, qb.qt_a, qb.qt_b, GREATEST( qb.qt_a, qb.qt_b ) AS t_parent, LEAST( qb.qt_a, qb.qt_b ) AS t_child
FROM query_test qb)
, hier
AS (SELECT ps.qt_id
, ps.qt_a
, ps.qt_b
, t_parent
, t_child
, LEVEL
, CONNECT_BY_ROOT t_parent AS prev_tran
FROM t_order ps
CONNECT BY NOCYCLE PRIOR t_child = t_parent)
SELECT hr.qt_id, hr.qt_a, MIN( hr.prev_tran ) AS master_tran
FROM hier hr
GROUP BY hr.qt_id, hr.qt_a
ORDER BY hr.qt_id, hr.qt_a;
This will solve your problem, but it might get very slow if those 100,000 records must be handled. The SQL statement also gets hard to understand if you need to combine this method with lots of other columns. In that case, factor out all the qt columns and join them back in the final SELECT:
WITH t_order
AS (SELECT DISTINCT tran, root_tran
FROM (SELECT LEAST( qt.qt_a, qt.qt_b ) AS tran, GREATEST( qt.qt_a, qt.qt_b ) AS root_tran
FROM query_test qt
UNION
SELECT GREATEST( qb.qt_a, qb.qt_b ) AS tran, LEAST( qb.qt_a, qb.qt_b ) AS root_tran
FROM query_test qb))
, hier
AS (SELECT DISTINCT tran, root_tran
FROM (SELECT tran, CONNECT_BY_ROOT root_tran AS root_tran
FROM t_order
CONNECT BY NOCYCLE PRIOR tran = root_tran)
WHERE tran >= root_tran)
SELECT qt.qt_id
, qt.qt_a
, MIN( LEAST( h1.root_tran, h2.root_tran ) ) AS master_tran
FROM query_test qt
INNER JOIN hier h1 ON qt.qt_a = h1.tran
INNER JOIN hier h2 ON qt.qt_b = h2.tran
GROUP BY qt.qt_id, qt.qt_a
ORDER BY qt.qt_id, qt.qt_a;
I could not test this last statement.
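On engines without CONNECT BY, the same traversal can be approximated with a recursive CTE. The sketch below is not the Oracle query above: it uses Python's built-in sqlite3 module as a stand-in engine (an assumption made so the example is runnable), with the table name test and the columns a and b taken from the question. UNION rather than UNION ALL discards rows that were already produced, which plays the role of NOCYCLE here.

```python
import sqlite3

# The fifteen links from the question (columns A and B).
LINKS = [(1, 4), (3, 5), (4, 6), (5, 8), (6, 1), (7, 7), (8, 3), (9, 3),
         (10, 4), (11, 14), (2, 2), (12, 4), (13, 14), (14, 9), (15, 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO test VALUES (?, ?)", LINKS)

QUERY = """
WITH RECURSIVE
edges(src, dst) AS (            -- every link, once per direction
    SELECT a, b FROM test
    UNION
    SELECT b, a FROM test
),
reach(tran, node) AS (          -- every node reachable from each transaction
    SELECT a, a FROM test
    UNION                       -- UNION drops repeated rows, so loops terminate
    SELECT r.tran, e.dst
    FROM reach r JOIN edges e ON e.src = r.node
)
SELECT tran, MIN(node) AS master_tran
FROM reach
GROUP BY tran
ORDER BY tran
"""

# Maps each transaction to the lowest ID in its family.
families = dict(conn.execute(QUERY).fetchall())
print(families)
```

On the question's data this maps transactions 1, 4, 6, 10, 12 and 15 to master 1, the seven transactions of the second family to master 3, and 2 and 7 to themselves.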

I may have found another solution.
Instead of using a CONNECT BY statement, you can also double your links, and double them again whenever that is needed.
The query to retrieve all links stays the same, but it is followed by a simple query that replaces the original links with all distinct combinations of two links.
Including the link formed by tran_a and tran_b itself, you have 2 + 1 + 2 links, so you can find paths up to 5 links long.
If that is too short, insert an identical subquery below the previous one, and the reach becomes 4 + 1 + 4 = 9 links.
As you can see, the maximum path length doubles with each added subquery, at only a moderate extra performance cost.
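The doubling idea can be sketched outside SQL as a repeated self-join on a set of pairs. This is plain Python, not the answer's query; the function name close_by_doubling is hypothetical, and adding the identity pairs (x, x) is an assumption made here so that shorter paths survive each round and the loop can detect when nothing new appears.

```python
# The fifteen links from the question (columns A and B).
LINKS = [(1, 4), (3, 5), (4, 6), (5, 8), (6, 1), (7, 7), (8, 3), (9, 3),
         (10, 4), (11, 14), (2, 2), (12, 4), (13, 14), (14, 9), (15, 1)]


def close_by_doubling(links):
    """Return (reach, rounds); reach holds every connected (start, end) pair."""
    nodes = {x for pair in links for x in pair}
    # Every link in both directions, plus (x, x) so shorter paths are kept.
    reach = set(links) | {(b, a) for a, b in links} | {(x, x) for x in nodes}
    rounds = 0
    while True:
        ends_by_start = {}
        for start, end in reach:
            ends_by_start.setdefault(start, set()).add(end)
        # Self-join: glue two known paths together, doubling the maximum length.
        doubled = {(start, far)
                   for start, mid in reach
                   for far in ends_by_start[mid]}
        if doubled == reach:  # no new pairs: the path length is sufficient
            return reach, rounds
        reach, rounds = doubled, rounds + 1


reach, rounds = close_by_doubling(LINKS)

# The master of each transaction is the smallest ID it can reach.
master = {}
for start, end in reach:
    master[start] = min(end, master.get(start, end))

print(rounds, sorted(master.items()))
```

The equality test inside the loop does the same job as a MINUS check between two levels: once a further doubling adds no pairs, the current level is enough. For the demo data two doubling rounds suffice.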
First the query to check your demo data:
WITH double_0
AS (SELECT DISTINCT root_tran, tran
FROM ( SELECT LEAST( td_0.tran_a, td_0.tran_b ) AS root_tran
, GREATEST( td_0.tran_a, td_0.tran_b ) AS tran
FROM tran_demo td_0
UNION
SELECT GREATEST( qb.tran_a, qb.tran_b ) AS root_tran
, LEAST( qb.tran_a, qb.tran_b ) AS tran
FROM tran_demo qb ))
, double_1
AS (SELECT DISTINCT oa.root_tran, ob.tran
FROM double_0 oa INNER JOIN double_0 ob ON oa.tran = ob.root_tran)
SELECT td_1.td_id
, td_1.tran_a
, MIN( LEAST( d1.root_tran, d2.root_tran ) ) AS master_tran
FROM tran_demo td_1
INNER JOIN double_1 d1 ON td_1.tran_a = d1.tran
INNER JOIN double_1 d2 ON td_1.tran_b = d2.tran
GROUP BY td_1.td_id, td_1.tran_a
ORDER BY td_1.td_id, td_1.tran_a;
Then modify it as follows.
Note that the final query now reads from double_2.
WITH double_0
AS (SELECT DISTINCT root_tran, tran
FROM ( SELECT LEAST( td_0.tran_a, td_0.tran_b ) AS root_tran
, GREATEST( td_0.tran_a, td_0.tran_b ) AS tran
FROM tran_demo td_0
UNION
SELECT GREATEST( qb.tran_a, qb.tran_b ) AS root_tran
, LEAST( qb.tran_a, qb.tran_b ) AS tran
FROM tran_demo qb ))
, double_1
AS (SELECT DISTINCT oa.root_tran, ob.tran
FROM double_0 oa INNER JOIN double_0 ob ON oa.tran = ob.root_tran)
, double_2
AS (SELECT DISTINCT oa.root_tran, ob.tran
FROM double_1 oa INNER JOIN double_1 ob ON oa.tran = ob.root_tran)
SELECT td_1.td_id
, td_1.tran_a
, MIN( LEAST( d1.root_tran, d2.root_tran ) ) AS master_tran
FROM tran_demo td_1
INNER JOIN double_2 d1 ON td_1.tran_a = d1.tran
INNER JOIN double_2 d2 ON td_1.tran_b = d2.tran
GROUP BY td_1.td_id, td_1.tran_a
ORDER BY td_1.td_id, td_1.tran_a;
Finally, a query to check whether the path length you are using is still sufficient:
It computes the result at the next level and subtracts the result at your current level.
As long as this query returns no rows, the current query is correct.
WITH double_0
AS (SELECT DISTINCT root_tran, tran
FROM ( SELECT LEAST( td_0.tran_a, td_0.tran_b ) AS root_tran
, GREATEST( td_0.tran_a, td_0.tran_b ) AS tran
FROM tran_demo td_0
UNION
SELECT GREATEST( qb.tran_a, qb.tran_b ) AS root_tran
, LEAST( qb.tran_a, qb.tran_b ) AS tran
FROM tran_demo qb ))
, double_1
AS (SELECT DISTINCT oa.root_tran, ob.tran
FROM double_0 oa INNER JOIN double_0 ob ON oa.tran = ob.root_tran)
, double_2
AS (SELECT DISTINCT oa.root_tran, ob.tran
FROM double_1 oa INNER JOIN double_1 ob ON oa.tran = ob.root_tran)
SELECT td_1.tran_a
, MIN( LEAST( d1.root_tran, d2.root_tran ) ) AS master_tran
FROM tran_demo td_1
INNER JOIN double_2 d1 ON td_1.tran_a = d1.tran
INNER JOIN double_2 d2 ON td_1.tran_b = d2.tran
GROUP BY td_1.tran_a
MINUS
SELECT td_2.tran_a
, MIN( LEAST( d1.root_tran, d2.root_tran ) ) AS master_tran
FROM tran_demo td_2
INNER JOIN double_1 d1 ON td_2.tran_a = d1.tran
INNER JOIN double_1 d2 ON td_2.tran_b = d2.tran
GROUP BY td_2.tran_a
ORDER BY tran_a;
You will have to do the performance testing yourself.
I am optimistic, because the subquery is cheap and the effective path length doubles each time.
Sooner or later this should become faster than the previous solution.
By the way, the remark about sorting the original links works here too!
Please mark my answer if it works.

Related

Postgres: Find missing items in a version table

I have a table in Postgres 12 that tracks which items i are used in which versions v:
CREATE TABLE compare_test(v BIGINT, i BIGINT);
With example data:
INSERT INTO compare_test VALUES
(1,21),
(1,22),
(1,23),
(2,21),
(2,22),
(2,23),
(3,21),
(3,22);
I'm trying to create a View that returns:
source_v | target_v | source_i | target_i
-------: | -------: | -------: | -------:
1 | 3 | 23 | null
2 | 3 | 23 | null
Queries used to compare missing values in two Tables like:
SELECT l.v as source_v, l.i as source_i,
r.v as target_v, r.i as target_i
FROM compare_test l
LEFT JOIN
compare_test r ON r.i = l.i
WHERE r.i IS NULL;
and
SELECT l.v as source_v, l.i as source_i
FROM compare_test l
WHERE NOT EXISTS
(
SELECT i as target_i
FROM compare_test r
WHERE r.i = l.i
)
do not seem to work if the joined Table is the same Table or if more than 2 Versions are in the Table.
I don't have the option to change the Database Structure but I can use plugins.
The solution below gives those results.
It reuses a CTE.
(Somehow I have a feeling that a more efficient way should exist.)
with cte1 as (
SELECT i
, count(*) cnt
, min(v) min_v
, max(v) max_v
FROM compare_test
GROUP BY i
)
, cte2 as
(
select *
from cte1 as c1
where not exists (
select 1
from cte1 c2
where c2.min_v = c1.min_v
and c2.max_v < c1.max_v
)
)
select distinct
t.v as source_v
, c1.max_v as target_v
, c2.i as source_i
, null as target_i
from cte2 c2
left join compare_test t
on t.i = c2.i
left join cte1 c1
on t.v between c1.min_v and c1.max_v
and c1.i != t.i
order by source_v
But if it's not really required to follow the relations, then it becomes really simple.
Then it's just a left join from all possible version/item combinations to the existing rows.
select distinct
src.v as source_v
, missing.v as target_v
, src.i as source_i
, missing.i as target_i
from
(
select ver.v, itm.i
from (select distinct v from compare_test) as ver
cross join (select distinct i from compare_test) as itm
left join compare_test t
on t.v = ver.v and t.i = itm.i
where t.v is null
) as missing
left join compare_test as src
on src.i = missing.i and src.v != missing.v
order by target_i, target_v, source_v
source_v | target_v | source_i | target_i
-------: | -------: | -------: | -------:
1 | 5 | 21 | 21
2 | 5 | 21 | 21
3 | 5 | 21 | 21
1 | 5 | 22 | 22
2 | 5 | 22 | 22
3 | 5 | 22 | 22
1 | 3 | 23 | 23
2 | 3 | 23 | 23
1 | 5 | 23 | 23
2 | 5 | 23 | 23
5 | 1 | 44 | 44
5 | 2 | 44 | 44
5 | 3 | 44 | 44
db<>fiddle here
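The result table above appears to come from a fiddle with extra rows (version 5, item 44) that are not in the question's sample data. As a rough check, the sketch below runs the cross-join query against the question's eight rows. It uses Python's sqlite3 as a stand-in for Postgres, which is an assumption made for runnability, and it returns NULL as target_i because that is what the question's desired view shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compare_test (v BIGINT, i BIGINT)")
conn.executemany(
    "INSERT INTO compare_test VALUES (?, ?)",
    [(1, 21), (1, 22), (1, 23), (2, 21), (2, 22), (2, 23), (3, 21), (3, 22)],
)

QUERY = """
SELECT DISTINCT
       src.v     AS source_v,
       missing.v AS target_v,
       src.i     AS source_i,
       NULL      AS target_i      -- the item is absent from the target version
FROM (
    -- every (version, item) combination that has no row in compare_test
    SELECT ver.v AS v, itm.i AS i
    FROM (SELECT DISTINCT v FROM compare_test) AS ver
    CROSS JOIN (SELECT DISTINCT i FROM compare_test) AS itm
    LEFT JOIN compare_test t ON t.v = ver.v AND t.i = itm.i
    WHERE t.v IS NULL
) AS missing
LEFT JOIN compare_test AS src
       ON src.i = missing.i AND src.v != missing.v
ORDER BY target_v, source_v
"""

missing_rows = conn.execute(QUERY).fetchall()
print(missing_rows)
```

With the sample data, item 23 is the only gap: it is present in versions 1 and 2 and missing from version 3.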

Values Disappear when Filtering Correlated Subquery

This question is related to the recent answer I provided here.
Setup
Using MS Access 2007.
Assume I have a table called mytable consisting of three fields:
id Long Integer AutoNumber (PK)
type Text
num Long Integer
With the following sample data:
+----+------+-----+
| id | type | num |
+----+------+-----+
| 1 | A | 10 |
| 2 | A | 20 |
| 3 | A | 30 |
| 4 | B | 40 |
| 5 | B | 50 |
| 6 | B | 60 |
| 7 | C | 70 |
| 8 | C | 80 |
| 9 | C | 90 |
| 10 | D | 100 |
+----+------+-----+
Similar to the linked answer, say I wish to output the three fields together with a running total for each type value, keeping only the rows whose running total is below 100. I might use a correlated subquery such as the following:
select q.* from
(
select t.id, t.type, t.num,
(
select sum(u.num)
from mytable u where u.type = t.type and u.id <= t.id
) as rt
from mytable t
) q
where q.rt < 100
This produces the expected result:
+----+------+-----+----+
| id | type | num | rt |
+----+------+-----+----+
| 1 | A | 10 | 10 |
| 2 | A | 20 | 30 |
| 3 | A | 30 | 60 |
| 4 | B | 40 | 40 |
| 5 | B | 50 | 90 |
| 7 | C | 70 | 70 |
+----+------+-----+----+
Observation
Now assume that I wish to filter the result to show only those values for type like "[AB]".
If I use either of the following queries:
select q.* from
(
select t.id, t.type, t.num,
(
select sum(u.num)
from mytable u where u.type = t.type and u.id <= t.id
) as rt
from mytable t
where t.type like "[AB]"
) q
where q.rt < 100
or
select q.* from
(
select t.id, t.type, t.num,
(
select sum(u.num)
from mytable u where u.type = t.type and u.id <= t.id
) as rt
from mytable t
) q
where q.rt < 100 and q.type like "[AB]"
The results are filtered as expected, but the values in the rt (running total) column disappear:
+----+------+-----+----+
| id | type | num | rt |
+----+------+-----+----+
| 1 | A | 10 | |
| 2 | A | 20 | |
| 3 | A | 30 | |
| 4 | B | 40 | |
| 5 | B | 50 | |
+----+------+-----+----+
Question
Why would the filter cause the values returned by the correlated subquery to disappear?
Thank you for your time reading my question and in advance for any advice you can offer.
Moving the type criteria into the aggregate subquery works.
Removing one level of nesting also works, but the aggregate subquery then has to be repeated in the WHERE clause:
SELECT mytable.*, (select sum(u.num)
from mytable u where u.type = MyTable.type and u.id <= MyTable.id
) AS rt
FROM mytable
WHERE ((((select sum(u.num)
from mytable u where u.type = MyTable.type and u.id <= MyTable.id
))<100) AND ((mytable.[type]) Like "[AB]"));
An INNER JOIN version:
select MyTable.*, q.* from MyTable INNER JOIN
(
select t.id, t.type, t.num,
(
select sum(u.num)
from mytable u where u.type = t.type and u.id <= t.id
) as rt
from mytable t
) q
ON q.id=MyTable.ID
where q.rt < 100 AND MyTable.Type LIKE "[AB]";

Group by following numbers with different values

ID | Week | BeginDate | EndDate | Value
1 | 38 | 14.9.2015 | 20.9.2015 | 100
2 | 39 | 21.9.2015 | 27.9.2015 | 100
3 | 40 | 28.9.2015 | 2.10.2015 | 100
4 | 42 | 12.10.2015 | 18.10.2015 | 100
5 | 43 | 19.10.2015 | 25.10.2015 | 100
6 | 44 | 26.10.2015 | 31.10.2015 | 80
How can I group these records into runs of consecutive weeks with the same value?
The begin date and end date are also important.
In this case I expect 3 records:
StartDate | EndDate | Value
14.9.2015 | 02.10.2015 | 100
12.10.2015 | 25.10.2015 | 100
26.10.2015 | 31.10.2015 | 80
This could probably be made a lot more efficient than it currently is, but I believe it produces the desired output.
; WITH CTE1 AS (
SELECT T1.[Week]
, T1.[BeginDate]
, T1.[EndDate]
, T1.[Value]
, T2.[Week] [T2Week]
, ROW_NUMBER() OVER (ORDER BY T1.[Week]) [RN]
FROM tblName T1
LEFT JOIN (SELECT [Week], [Value] FROM tblName) T2 ON T1.[Week] = T2.[Week] + 1 AND T1.[Value] = T2.[Value])
, CTE2 AS (SELECT T3.[Week]
, T3.[BeginDate]
, T3.[EndDate]
, T3.[Value]
, T4.[EndDate] [T2EndDate]
, ROW_NUMBER() OVER (ORDER BY T3.[Week]) [RN]
FROM CTE1 T3
LEFT JOIN (SELECT [EndDate], [RN] FROM CTE1) T4 ON T3.[RN] = T4.[RN] + 1
WHERE T3.[T2Week] IS NULL)
SELECT T5.[BeginDate]
, ISNULL(T6.[T2EndDate], T5.[EndDate]) [T2EndDate]
, T5.[Value]
FROM CTE2 T5
LEFT JOIN (SELECT [T2EndDate], [RN] FROM CTE2) T6 ON T5.[RN] = T6.[RN] - 1
-- ORDER BY T5.[RN]
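A common alternative for this kind of "gaps and islands" grouping is to subtract a row number from the week number: within a run of consecutive weeks the difference stays constant. The sketch below is not the query above. It runs on Python's sqlite3 (window functions need SQLite 3.25 or later), the column names are hypothetical, and the dates are stored in ISO format as an assumption so that MIN and MAX order them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblName (id INTEGER, week INTEGER, "
    "begin_date TEXT, end_date TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO tblName VALUES (?, ?, ?, ?, ?)",
    [
        (1, 38, "2015-09-14", "2015-09-20", 100),
        (2, 39, "2015-09-21", "2015-09-27", 100),
        (3, 40, "2015-09-28", "2015-10-02", 100),
        (4, 42, "2015-10-12", "2015-10-18", 100),
        (5, 43, "2015-10-19", "2015-10-25", 100),
        (6, 44, "2015-10-26", "2015-10-31", 80),
    ],
)

QUERY = """
SELECT MIN(begin_date) AS run_start,
       MAX(end_date)   AS run_end,
       value
FROM (
    SELECT begin_date, end_date, value,
           -- constant within a run of consecutive weeks with the same value
           week - ROW_NUMBER() OVER (PARTITION BY value ORDER BY week) AS grp
    FROM tblName
)
GROUP BY value, grp
ORDER BY run_start
"""

runs = conn.execute(QUERY).fetchall()
for run in runs:
    print(run)
```

This yields the three expected records. Week numbers that wrap around a year boundary would need extra handling, which this sketch leaves out.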

Repeat row based on the number in a column

SqlFiddle Demo
I need to repeat each barcode of the article based on the quantity of this article in the table Stock.
This is source data:
| BarCode | quantity |
|---------|----------|
| 5142589 | 7 |
| 123454 | 5 |
| 1111145 | 3 |
I want result that looks like this:
Barcode
-------
5142589
5142589
5142589
5142589
5142589
5142589
5142589
123454
123454
123454
123454
123454
1111145
1111145
1111145
How can I do this?
Thanks
You can use a table of numbers, either permanent or generated on the fly.
The query below uses a CTE to generate up to 1000 numbers. Here is SQL Fiddle.
WITH
e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
) -- 10
,e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b) -- 10*10
,e3(n) AS (SELECT 1 FROM e1 CROSS JOIN e2) -- 10*100
,CTE_Numbers
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY n) AS Number
FROM e3
)
SELECT b.BarCode, s.quantity
FROM
TABLE_BARCODE b
INNER JOIN TABLE_STOCK s ON b.IdArticle = s.IdArticle
CROSS APPLY
(
SELECT TOP(s.quantity) CTE_Numbers.Number
FROM CTE_Numbers
ORDER BY CTE_Numbers.Number
) AS CA
Results:
| BarCode | quantity |
|---------|----------|
| 5142589 | 7 |
| 5142589 | 7 |
| 5142589 | 7 |
| 5142589 | 7 |
| 5142589 | 7 |
| 5142589 | 7 |
| 5142589 | 7 |
| 123454 | 5 |
| 123454 | 5 |
| 123454 | 5 |
| 123454 | 5 |
| 123454 | 5 |
| 1111145 | 3 |
| 1111145 | 3 |
| 1111145 | 3 |
You can get this by a simple recursive CTE.
WITH cte
AS
(
SELECT IdArticle,1 AS rn FROM TABLE_STOCK
UNION ALL
SELECT t.IdArticle,rn+1 AS rn
FROM cte c
INNER JOIN TABLE_STOCK t ON t.IdArticle = c.IdArticle and rn<t.QUANTITY
)
SELECT t.BarCode,TS.QUANTITY
FROM cte c
INNER JOIN TABLE_BARCODE t ON t.IdArticle = c.IdArticle
INNER JOIN TABLE_STOCK TS ON TS.IdArticle = C.IdArticle
ORDER BY t.IdArticle
Here is SQL Fiddle
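To try the recursive approach without a SQL Server instance, here is a minimal sketch on Python's sqlite3. It is a simplification, not the query above: it assumes a single hypothetical table stock(barcode, quantity) instead of TABLE_STOCK joined to TABLE_BARCODE, and the quantity >= 1 guard in the anchor is an addition so that articles with zero stock produce no rows.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (barcode INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO stock VALUES (?, ?)",
    [(5142589, 7), (123454, 5), (1111145, 3)],
)

QUERY = """
WITH RECURSIVE rep(barcode, quantity, rn) AS (
    -- anchor: the first copy of every article that is in stock
    SELECT barcode, quantity, 1 FROM stock WHERE quantity >= 1
    UNION ALL
    -- recursive step: one more copy until rn reaches the quantity
    SELECT barcode, quantity, rn + 1 FROM rep WHERE rn < quantity
)
SELECT barcode FROM rep ORDER BY barcode, rn
"""

barcodes = [row[0] for row in conn.execute(QUERY)]
copies = Counter(barcodes)  # number of output rows per barcode
print(copies)
```

Each barcode appears as many times as its quantity. On SQL Server the recursion depth is capped by MAXRECURSION (100 by default), so large quantities would need an OPTION clause or a numbers table instead.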
Simplified and improved version of Vladimir's answer:
DECLARE @t table(BarCode int, quantity int)
INSERT @t values(5142589, 7),(123454, 5),(1111145,3)
;WITH
e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
) -- 10
,e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b) -- 10*10
,e3(n) AS (SELECT 1 FROM e2 CROSS JOIN e2 ex) -- 100*100
SELECT BarCode
FROM @t t
CROSS APPLY
(
SELECT top(t.quantity) null dummy
FROM e3
) x

Select a row X times

I have a very specific sql problem.
I have a table given with order positions (each position belongs to one order, but this isn't a problem):
| Article ID | Amount |
|--------------|----------|
| 5 | 3 |
| 12 | 4 |
For the customer, I need an export with every physical item that is ordered, e.g.
| Article ID | Position |
|--------------|------------|
| 5 | 1 |
| 5 | 2 |
| 5 | 3 |
| 12 | 1 |
| 12 | 2 |
| 12 | 3 |
| 12 | 4 |
How can I build my SELECT statement to give me these results? I think there are two key tasks:
1) Select a row X times based on the amount
2) Set the position for each physical article
You can do it like this
SELECT ArticleID, n.n Position
FROM table1 t JOIN
(
SELECT a.N + b.N * 10 + 1 n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
) n
ON n.n <= t.amount
ORDER BY ArticleID, Position
Note: subquery n generates a sequence of numbers from 1 to 100 on the fly. If you run a lot of queries like this, consider creating a persisted tally (numbers) table and using it instead.
Here is SQLFiddle demo
or using a recursive CTE
WITH tally AS (
SELECT 1 n
UNION ALL
SELECT n + 1 FROM tally WHERE n < 100
)
SELECT ArticleID, n.n Position
FROM table1 t JOIN tally n
ON n.n <= t.amount
ORDER BY ArticleID, Position
Here is SQLFiddle demo
Output in both cases:
| ARTICLEID | POSITION |
|-----------|----------|
| 5 | 1 |
| 5 | 2 |
| 5 | 3 |
| 12 | 1 |
| 12 | 2 |
| 12 | 3 |
| 12 | 4 |
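The persisted tally table mentioned in the note could look roughly like the sketch below. It uses Python's sqlite3 as a stand-in engine; the table name tally and its size of 100 rows are assumptions, while table1, ArticleID and Amount follow the queries above. The extra count at the end is an addition: it reports amounts larger than the tally table, which the join would otherwise silently truncate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ArticleID INTEGER, Amount INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(5, 3), (12, 4)])

# Persisted numbers table: filled once, reused by every query that needs it.
conn.execute("CREATE TABLE tally (n INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO tally VALUES (?)", [(n,) for n in range(1, 101)])

positions = conn.execute(
    """
    SELECT t.ArticleID, n.n AS Position
    FROM table1 t
    JOIN tally n ON n.n <= t.Amount
    ORDER BY t.ArticleID, n.n
    """
).fetchall()

# Rows whose Amount exceeds the largest tally number would lose positions.
too_big = conn.execute(
    "SELECT COUNT(*) FROM table1 WHERE Amount > (SELECT MAX(n) FROM tally)"
).fetchone()[0]

print(positions, too_big)
```

If too_big is ever non-zero, the tally table needs more rows before the export can be trusted.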
Query:
SELECT t1.[Article ID],
t2.number
FROM Table1 t1,
master..spt_values t2
WHERE t1.Amount >= t2.number
AND t2.type = 'P'
AND t2.number <= 255
AND t2.number <> 0
Result:
| ARTICLE ID | NUMBER |
|------------|--------|
| 5 | 1 |
| 5 | 2 |
| 5 | 3 |
| 12 | 1 |
| 12 | 2 |
| 12 | 3 |
| 12 | 4 |