Get Row's Sequence (Linked-List) in PostgreSQL

I have a submissions table which is essentially a singly linked list. Given the id of a row, I want to return the entire list that the row is a part of, in the proper order. For example, in the table below, if I had id 2 I would want to get back rows 1,2,3,4 in that order.
(4,3) -> (3,2) -> (2,1) -> (1,null)
I expect 1,2,3,4 here because 4 is essentially the head of the list that 2 belongs to and I want to traverse all the way through the list.
http://sqlfiddle.com/#!15/c352e/1
Is there a way to do this using PostgreSQL's recursive CTE? So far I have the following, but it only gives me the parents and not the descendants:
WITH RECURSIVE "sequence" AS (
SELECT * FROM submissions WHERE "submissions"."id" = 2
UNION ALL SELECT "recursive".* FROM "submissions" "recursive"
INNER JOIN "sequence" ON "recursive"."id" = "sequence"."link_id"
)
SELECT "sequence"."id" FROM "sequence"

This approach uses what you have already come up with.
It adds another block to calculate the rest of the list and then combines both doing a custom reverse ordering.
WITH RECURSIVE pathtobottom AS (
-- Get the path from element to bottom list following next element id that matches current link_id
SELECT 1 i, -- add fake order column to reverse retrieved records
* FROM submissions WHERE submissions.id = 2
UNION ALL
SELECT pathtobottom.i + 1 i, -- add fake order column to reverse retrieved records
recursive.* FROM submissions recursive
INNER JOIN pathtobottom ON recursive.id = pathtobottom.link_id
)
, pathtotop AS (
-- Get the path from element to top list following previous element link_id that matches current id
SELECT 1 i, -- add fake order column to reverse retrieved records
* FROM submissions WHERE submissions.id = 2
UNION ALL
SELECT pathtotop.i + 1 i, -- add fake order column to reverse retrieved records
recursive2.* FROM submissions recursive2
INNER JOIN pathtotop ON recursive2.link_id = pathtotop.id
), pathtotoprev as (
-- Reverse path to top using fake 'i' column
SELECT pathtotop.id FROM pathtotop order by i desc
), pathtobottomrev as (
-- Reverse path to bottom using fake 'i' column
SELECT pathtobottom.id FROM pathtobottom order by i desc
)
-- Elements ordered from bottom to top
SELECT pathtobottomrev.id FROM pathtobottomrev where id != 2 -- remove element to avoid duplicate
UNION ALL
SELECT pathtotop.id FROM pathtotop;
/*
-- Elements ordered from top to bottom
SELECT pathtotoprev.id FROM pathtotoprev
UNION ALL
SELECT pathtobottom.id FROM pathtobottom where id != 2; -- remove element to avoid duplicate
*/
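The same two-direction idea can be tried outside PostgreSQL. Below is a minimal sketch, assuming SQLite (through Python's sqlite3 module) and the four sample rows from the question. The names make_demo_db, full_list, to_bottom, to_top and pos are made up for illustration. Instead of reversing one half, it gives the start row position 0, counts down while following link_id, counts up while following the rows that point back, and sorts once by that position.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory copy of the list (4,3) -> (3,2) -> (2,1) -> (1,NULL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE submissions (id INTEGER PRIMARY KEY, link_id INTEGER)")
    conn.executemany(
        "INSERT INTO submissions (id, link_id) VALUES (?, ?)",
        [(4, 3), (3, 2), (2, 1), (1, None)],
    )
    return conn


FULL_LIST_SQL = """
WITH RECURSIVE
to_bottom(id, link_id, pos) AS (
    -- start row gets position 0, each followed link_id gets one less
    SELECT id, link_id, 0 FROM submissions WHERE id = :start
    UNION ALL
    SELECT s.id, s.link_id, b.pos - 1
    FROM submissions s JOIN to_bottom b ON s.id = b.link_id
),
to_top(id, link_id, pos) AS (
    -- rows whose link_id points at the current row get one more
    SELECT id, link_id, 0 FROM submissions WHERE id = :start
    UNION ALL
    SELECT s.id, s.link_id, t.pos + 1
    FROM submissions s JOIN to_top t ON s.link_id = t.id
)
SELECT id FROM (
    SELECT id, pos FROM to_bottom
    UNION ALL
    SELECT id, pos FROM to_top WHERE pos > 0  -- skip the duplicated start row
)
ORDER BY pos
"""


def full_list(conn, start_id):
    """Return every id in the list containing start_id, bottom to top."""
    rows = conn.execute(FULL_LIST_SQL, {"start": start_id}).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(full_list(make_demo_db(), 2))
```

Sorting in the outermost SELECT matters: an ORDER BY inside a CTE is not guaranteed to survive a later UNION ALL.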

It was yet another quest for my brain. Thanks.
with recursive r as (
select *, array[id] as lst from submissions s where id = 6
union all
select s.*, r.lst || s.id
from
submissions s inner join
r on (s.link_id=r.id or s.id=r.link_id)
where not (array[s.id] <@ r.lst) -- skip rows whose id is already in lst
)
select * from r;
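For anyone who wants to see the "walk in both directions, but never revisit a row" guard in action without a PostgreSQL server, here is a rough sketch under some assumptions: it uses SQLite through Python's sqlite3 module, the question's four sample rows, and a comma-delimited text column (seen) in place of the PostgreSQL array, because SQLite has no array type. The names make_demo_db, connected_rows and seen are hypothetical.

```python
import sqlite3


def make_demo_db():
    """In-memory copy of the question's list (4,3) -> (3,2) -> (2,1) -> (1,NULL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE submissions (id INTEGER PRIMARY KEY, link_id INTEGER)")
    conn.executemany(
        "INSERT INTO submissions (id, link_id) VALUES (?, ?)",
        [(4, 3), (3, 2), (2, 1), (1, None)],
    )
    return conn


CONNECTED_SQL = """
WITH RECURSIVE r(id, link_id, seen) AS (
    SELECT id, link_id, ',' || id || ','
    FROM submissions WHERE id = :start
    UNION ALL
    SELECT s.id, s.link_id, r.seen || s.id || ','
    FROM submissions s
    JOIN r ON (s.link_id = r.id OR s.id = r.link_id)  -- neighbours in either direction
    WHERE instr(r.seen, ',' || s.id || ',') = 0       -- not visited on this path yet
)
SELECT id, seen FROM r
"""


def connected_rows(conn, start_id):
    """Return (id, visited-path) pairs for every row reachable from start_id."""
    return conn.execute(CONNECTED_SQL, {"start": start_id}).fetchall()


if __name__ == "__main__":
    for row in connected_rows(make_demo_db(), 2):
        print(row)
```

Note that this only collects the members of the list. The rows come back in discovery order, so a final sort is still needed if list order matters.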

Related

Efficiently outer join two array columns row-wise in BigQuery table

I'll first state the question as simply as possible, then elaborate with more detail and an example.
Concise question without context
I have a table with rows containing columns of arrays. I need to outer join the elements of some pairs of these, compute some variables, and then aggregate the results back into a new array. I'm currently using a pattern where I:
unnest each column in the pair to be joined (cross join to PK of row)
full outer join the two on the PK and compute desired fields
group by PK to get back to single row with array column that summarizes the results
Is there a way to do this without the multiple unnesting and grouping back down?
More context and an example
I have a table which represents edits to an entity that is made up of multiple sub-records. Each row represents a single entity. There is a column before that contains the records before the edit, and another after that contains the records afterwards.
My goal is to label each sub-record with exactly one of the four valid edit types:
DELETE - record exists in before but not after
ADD - record exists in after but not before
EDIT - record exists in both before and after but any field was changed
NONE - record exists in both before and after and no fields were changed
Each of the sub-record values is represented by its ID and a hash of all of its fields. I've created some fake data and provided my initial implementation below. This works, but it seems very roundabout.
WITH source_data AS (
SELECT
1 AS pkField,
[
STRUCT(1 AS id, 1 AS fieldHash),
STRUCT(2 AS id, 2 AS fieldHash),
STRUCT(3 AS id, 3 AS fieldHash)
] AS before,
[
STRUCT(1 AS id, 1 AS fieldHash),
STRUCT(2 AS id, 0 AS fieldHash), -- record 2 edited
-- record 3 deleted
STRUCT(4 AS id, 4 AS fieldHash), -- record 4 added
STRUCT(5 AS id, 5 AS fieldHash) -- record 5 added
] AS after
)
SELECT
pkField,
ARRAY_AGG(STRUCT(
id,
CASE
WHEN beforeHash IS NULL THEN "ADD"
WHEN afterHash IS NULL THEN "DELETE"
WHEN beforeHash <> afterHash THEN "EDIT"
ELSE "NONE"
END AS editType
)) AS edits
FROM (
SELECT pkField, id, fieldHash AS beforeHash
FROM source_data
CROSS JOIN UNNEST(source_data.before)
)
FULL OUTER JOIN (
SELECT pkField, id, fieldHash AS afterHash
FROM source_data
CROSS JOIN UNNEST(source_data.after)
)
USING (pkField, id)
GROUP BY pkField
Is there a simpler and/or more efficient way to do this? Perhaps something that avoids the multiple unnesting and grouping back down?
I think what you have is already a simple and efficient approach.
That said, you can consider the version below, which runs the full outer join inside each row and so avoids the final GROUP BY:
select pkField,
array(select struct(
id, case
when b.fieldHash is null then 'ADD'
when a.fieldHash is null then 'DELETE'
when b.fieldHash != a.fieldHash then 'EDIT'
else 'NONE'
end as editType
) edits
from (select id, fieldHash from t.before) b
full outer join (select id, fieldHash from t.after) a
using(id)
) edits
from source_data t
if applied to sample data in your question - output is
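As a cross-check of the labelling rules (not a reproduction of the BigQuery result), the per-row full outer join can be sketched in plain Python. The function name label_edits and the (id, fieldHash) tuple layout are assumptions made for this illustration. The sample values are the ones from the question.

```python
def label_edits(before, after):
    """Label each sub-record id as ADD, DELETE, EDIT or NONE.

    before and after are lists of (id, field_hash) pairs for one entity row.
    Taking the union of the ids plays the role of the full outer join on id.
    """
    before_by_id = dict(before)
    after_by_id = dict(after)
    edits = []
    for rec_id in sorted(before_by_id.keys() | after_by_id.keys()):
        if rec_id not in before_by_id:
            edit_type = "ADD"      # only present after the edit
        elif rec_id not in after_by_id:
            edit_type = "DELETE"   # only present before the edit
        elif before_by_id[rec_id] != after_by_id[rec_id]:
            edit_type = "EDIT"     # present in both, hash changed
        else:
            edit_type = "NONE"     # present in both, hash unchanged
        edits.append((rec_id, edit_type))
    return edits


if __name__ == "__main__":
    before = [(1, 1), (2, 2), (3, 3)]
    after = [(1, 1), (2, 0), (4, 4), (5, 5)]
    print(label_edits(before, after))
```

Each row is handled on its own, which is the same reason the array(select ...) form needs no GROUP BY: nothing ever leaves the row it came from.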

How to unnest BigQuery nested records into multiple columns

I am trying to unnest the table below.
I am using the following unnest query to flatten the table:
SELECT
id,
name ,keyword
FROM `project_id.dataset_id.table_id`
,unnest (`groups` ) as `groups`
where id = 204358
The problem is that this duplicates the rows (except name), as is always the case when flattening a table.
How can I modify the query to put the names in two different columns rather than in separate rows?
Expected output below -
That's because the comma is a cross join - in combination with an unnested array it is a lateral cross join. You repeat the parent row for every row in the array.
One problem with pivoting arrays is that arrays can have a variable amount of rows, but a table must have a fixed amount of columns.
So you need a way to decide for a certain row that becomes a certain column.
E.g. with
SELECT
id,
name,
-- index the array directly; nothing is unnested, so the parent row is not repeated
`groups`[ordinal(1)] as firstArrayEntry,
`groups`[ordinal(2)] as secondArrayEntry,
keyword
FROM `project_id.dataset_id.table_id`
where id = 204358
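To make the "variable number of array entries, fixed number of columns" point concrete, here is a small Python sketch. The function name pivot_names and the name_N column labels are invented for the example, and the input list stands in for the names found in one row's array. Missing positions become None and surplus entries are dropped, which is the decision you have to make once the column count is fixed.

```python
def pivot_names(names, n_columns):
    """Spread a variable-length list over exactly n_columns named slots.

    Positions past the end of the list are filled with None; entries beyond
    n_columns are ignored, because the output shape is fixed up front.
    """
    return {
        f"name_{i + 1}": (names[i] if i < len(names) else None)
        for i in range(n_columns)
    }


if __name__ == "__main__":
    print(pivot_names(["first", "second"], 3))
    print(pivot_names(["first", "second", "third"], 2))
```

In BigQuery itself, ordinal(n) raises an error when the array is shorter than n, while safe_ordinal(n) returns NULL instead, which matches the padding behaviour sketched here.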
If your array had a key-value pair you could decide using the key. E.g.
SELECT
id,
name,
(select value from unnest(`groups`) where key='key1') as key1,
keyword
FROM `project_id.dataset_id.table_id`
where id = 204358
But that doesn't seem to be the case with your table ...
A third option could be PIVOT in combination with your cross-join solution, but this one has restrictions too, and I'm not sure how computation-heavy it is.
Consider below simple solution
select * from (
select id, name, keyword, offset
from `project_id.dataset_id.table_id`,
unnest(`groups`) with offset
) pivot (max(name) name for offset + 1 in (1, 2))
If applied to the sample data in your question, the output is
Note: when you apply this to your real case, you just need to know how many such name_NNN columns to expect and extend the list accordingly, for example for offset + 1 in (1, 2, 3, 4, 5)) if you expect 5 such columns.
If for whatever reason you want to improve on this, use the version below, where everything is built dynamically so you don't need to know in advance how many columns the output will have:
execute immediate (select '''
select * from (
select id, name, keyword, offset
from `project_id.dataset_id.table_id`,
unnest(`groups`) with offset
) pivot (max(name) name for offset + 1 in (''' || string_agg('' || pos, ', ') || '''))
'''
from (select pos from (
select max(array_length(`groups`)) cnt
from `project_id.dataset_id.table_id`
), unnest(generate_array(1, cnt)) pos
))
Your question is a little unclear, because it does not specify what to do with other keywords or other columns. If you specifically want the first two values in the array for keyword "OVG", you can unnest the array and pull out the appropriate names:
SELECT id,
(SELECT g.name
FROM UNNEST(t.groups) g WITH OFFSET n
WHERE key = 'OVG'
ORDER BY n
LIMIT 1
) as name_1,
(SELECT g.name
FROM UNNEST(t.groups) g WITH OFFSET n
WHERE key = 'OVG'
ORDER BY n
LIMIT 1 OFFSET 1
) as name_2,
'OVG' as keyword
FROM `project_id.dataset_id.table_id` t
WHERE id = 204358;

Teradata SQL Reverse Parent Child Hierarchy

I know how to build a hierarchy starting with the root node (i.e. where parent_id is null or something like that), but I can't find anything on how to build a hierarchy upward from the final child/edge node. I'd like to start with a child and build all the way back up to the top. Assume I don't know how many levels, or who the parent is, and we'll have to use SQL to figure it out.
Here is my base table:
old_entity_key,new_entity_key
1,2
2,3
3,4
4,5
5,6
Desired output:
new_entity_key,path
2,1/2
3,1/2/3
4,1/2/3/4
5,1/2/3/4/5
6,1/2/3/4/5/6
This is also acceptable:
new_entity_key,path
2,2/1
3,3/2/1
4,4/3/2/1
5,5/4/3/2/1
6,6/5/4/3/2/1
Here is the CTE I've started with:
with recursive history as (
select
old_entity_key,
new_entity_key,
cast(old_entity_key||'/'||new_entity_key as varchar(1000)) as path
from table
where new_entity_key not in (select old_entity_key from table)
and cast(start_time as date) between current_date - interval '3' day and current_date
union all
select
c.old_entity_key,
c.new_entity_key,
p.new_entity_key||'/'||c.path
from history c
join table p on p.new_entity_key = c.old_entity_key
)
select new_entity_key, old_entity_key, substr(path, 1, instr(path, '/') - 1) as original_entity_key, path
from history s;
The problem with the above query is that it runs forever. I think I've created an infinite loop. I've also tried using the below where filter in the bottom query of the union to try to find the root node, but Teradata gives me an error:
where p.new_entity_key in (select old_entity_key from table)
Any help would be greatly appreciated.
You'll need some sort of counter, and I think your join logic in your CTE doesn't make sense. I threw together a very simple volatile table example:
create volatile table tb
(old_entity_key char(1),
new_entity_key char(1),
rn integer)
on commit preserve rows;
insert into tb values ('1','2',1);
insert into tb values ('2','3',2);
insert into tb values ('3','4',3);
Now we can put together a recursive CTE:
with recursive history as (
select
old_entity_key,
new_entity_key,
cast(old_entity_key||'/'||new_entity_key as varchar(1000)) as path,
rn
from tb t
where
rn = 1
union all
select
t.old_entity_key,
t.new_entity_key,
h.path || '/' || t.new_entity_key,
t.rn
from
tb t
join history h
on t.rn = h.rn + 1
)
select * from history order by rn
The important things here are:
Limit your first pass (accomplished here by rn=1).
The second pass needs to pick up the "next" row, based on the previous row (t.rn = h.rn + 1)
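If the real table has no rn counter, the same two rules (a limited first pass, then "pick up the next row") can be expressed with the keys themselves. The following is only a sketch under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than Teradata, the table name links and the depth column are made up, and the data is the five rows from the question. The first pass is limited to the root link (an old_entity_key that never appears as a new_entity_key), and each later pass joins on old_entity_key = the previous row's new_entity_key.

```python
import sqlite3


def make_links_db():
    """In-memory copy of the question's base table: 1->2, 2->3, 3->4, 4->5, 5->6."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE links (old_entity_key INTEGER, new_entity_key INTEGER)")
    conn.executemany(
        "INSERT INTO links VALUES (?, ?)",
        [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)],
    )
    return conn


PATH_SQL = """
WITH RECURSIVE history(new_entity_key, path, depth) AS (
    -- first pass: only the link whose old key is nobody's new key (the root)
    SELECT new_entity_key, old_entity_key || '/' || new_entity_key, 1
    FROM links
    WHERE old_entity_key NOT IN (SELECT new_entity_key FROM links)
    UNION ALL
    -- later passes: the link that continues from the previous row's new key
    SELECT l.new_entity_key, h.path || '/' || l.new_entity_key, h.depth + 1
    FROM links l
    JOIN history h ON l.old_entity_key = h.new_entity_key
)
SELECT new_entity_key, path FROM history ORDER BY depth
"""


def build_paths(conn):
    """Return (new_entity_key, path) rows, shortest path first."""
    return conn.execute(PATH_SQL).fetchall()


if __name__ == "__main__":
    for key, path in build_paths(make_links_db()):
        print(key, path)
```

Because every recursive step moves strictly forward along the chain, the recursion stops by itself at the last link, provided the data contains no cycle.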

How to Update a group of rows

My sqlfiddle: http://sqlfiddle.com/#!15/4f9da/1
I'm bad at explaining this and new to complex queries (I only know the basics), so bear with me.
Situation: the revision column numbers the successive versions of the same object. For example, ids 1, 2 and 3 are the same object, and each row's ground_id refers to the id of the previous version.
Problem: I need to fill the ord column so that every row in the same group of objects gets the same value. For example, ids 1, 2 and 3 need ord set to 1, because revision 0 of that object is id 1. Likewise, id 4 must have ord 4, and so must id 5.
Basically it must look like this:
You need a recursive query to do this. First you select the rows where ground_id IS NULL, set ord to the value of id. In the following iterations you add more rows based on the value of ground_id, setting the ord value to that of the row it is being matched to. You can then use that set of rows (id, ord) as a row source for the UPDATE:
WITH RECURSIVE set_ord (id, ord) AS (
SELECT id, id
FROM ground
WHERE ground_id IS NULL
UNION
SELECT g.id, o.ord
FROM ground g
JOIN set_ord o ON o.id = g.ground_id
)
UPDATE ground g
SET ord = s.ord
FROM set_ord s
WHERE g.id = s.id;
(SQLFiddle is currently not-responsive so I can't post my code there)
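Since the fiddle is unavailable, here is a small stand-in that can be run locally. It is a sketch under assumptions: SQLite through Python's sqlite3 module instead of PostgreSQL, and five made-up rows reconstructed from the question's description (ids 1, 2, 3 form one chain, ids 4, 5 another). It computes the (id, ord) pairs with the same recursive CTE and then applies them with a plain parameterised UPDATE, which avoids relying on UPDATE ... FROM support in older SQLite versions.

```python
import sqlite3


def make_ground_db():
    """Assumed sample data: 1 <- 2 <- 3 is one object, 4 <- 5 is another."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ground (id INTEGER PRIMARY KEY, ground_id INTEGER, ord INTEGER)"
    )
    conn.executemany(
        "INSERT INTO ground (id, ground_id) VALUES (?, ?)",
        [(1, None), (2, 1), (3, 2), (4, None), (5, 4)],
    )
    return conn


SET_ORD_SQL = """
WITH RECURSIVE set_ord(id, ord) AS (
    SELECT id, id FROM ground WHERE ground_id IS NULL   -- first revision of each object
    UNION
    SELECT g.id, o.ord                                  -- later revisions inherit ord
    FROM ground g
    JOIN set_ord o ON o.id = g.ground_id
)
SELECT ord, id FROM set_ord
"""


def assign_ord(conn):
    """Fill ground.ord with the id of the first revision in each chain."""
    pairs = conn.execute(SET_ORD_SQL).fetchall()
    conn.executemany("UPDATE ground SET ord = ? WHERE id = ?", pairs)
    conn.commit()


if __name__ == "__main__":
    conn = make_ground_db()
    assign_ord(conn)
    print(conn.execute("SELECT id, ord FROM ground ORDER BY id").fetchall())
```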

Ordering a SQL query based on the value in a column determining the value of another column in the next row

My table looks like this:
Value Previous Next
37 NULL 42
42 37 3
3 42 79
79 3 NULL
Except, that the table is all out of order. (There are no duplicates, so that is not an issue.) I was wondering if there was any way to make a query that would order the output, basically saying "Next row 'value' = this row 'next'" as it's shown above ?
I have no control over the database and how this data is stored. I am just trying to retrieve it and organize it. SQL Server I believe 2008.
I realize that this wouldn't be difficult to reorganize afterwards, but I was just curious if I could write a query that just did that out of the box so I wouldn't have to worry about it.
This should do what you need:
WITH CTE AS (
SELECT YourTable.*, 0 Depth
FROM YourTable
WHERE Previous IS NULL
UNION ALL
SELECT YourTable.*, Depth + 1
FROM YourTable JOIN CTE
ON YourTable.Value = CTE.Next
)
SELECT * FROM CTE
ORDER BY Depth;
[SQL Fiddle] (Referential integrity and indexes omitted for brevity.)
We use a recursive common table expression (CTE) to travel from the head of the list (WHERE Previous IS NULL) to the trailing nodes (ON YourTable.Value = CTE.Next) and at the same time memorize the depth of the recursion that was needed to reach the current node (in Depth).
In the end, we simply sort by the depth of recursion that was needed to reach each of the nodes (ORDER BY Depth).
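The depth-then-sort idea is easy to try locally. The following sketch assumes SQLite through Python's sqlite3 module rather than SQL Server, stores the question's four rows deliberately out of order, and shortens the column names to val, prev and nxt. Those names, the nodes table and the function names are all assumptions for this example.

```python
import sqlite3


def make_nodes_db():
    """The question's four rows, inserted in scrambled order on purpose."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (val INTEGER, prev INTEGER, nxt INTEGER)")
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [(3, 42, 79), (79, 3, None), (37, None, 42), (42, 37, 3)],
    )
    return conn


ORDERED_SQL = """
WITH RECURSIVE cte(val, prev, nxt, depth) AS (
    SELECT val, prev, nxt, 0 FROM nodes WHERE prev IS NULL   -- head of the list
    UNION ALL
    SELECT n.val, n.prev, n.nxt, c.depth + 1                 -- one hop further
    FROM nodes n
    JOIN cte c ON n.val = c.nxt
)
SELECT val FROM cte ORDER BY depth
"""


def ordered_values(conn):
    """Return the values in linked-list order, head first."""
    return [row[0] for row in conn.execute(ORDERED_SQL)]


if __name__ == "__main__":
    print(ordered_values(make_nodes_db()))
```

On SQL Server, long lists can hit the default recursion limit of 100 levels, which can be raised with the MAXRECURSION query hint.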
Use a recursive query. With the one I list here you can have multiple paths along your linked list:
with cte (Value, Previous, Next, Level)
as
(
select Value, Previous, Next, 0 as Level
from data
where Previous is null
union all
select d.Value, d.Previous, d.Next, Level + 1
from data d
inner join cte c on d.Previous = c.Value
)
select * from cte
If you are using Oracle, try START WITH ... CONNECT BY:
select ... start with initial-condition connect by
nocycle recursive-condition;
EDIT: For SQL-Server, use WITH syntax as below:
WITH rec(value, previous, next) AS
(SELECT value, previous, next
FROM table1
WHERE previous is null
UNION ALL
SELECT nextRec.value, nextRec.previous, nextRec.next
FROM table1 as nextRec, rec
WHERE rec.next = nextRec.value)
SELECT value, previous, next FROM rec;
One way to do this is with a join:
select t.*
from t left outer join
t tnext
on t.next = tnext.value
order by tnext.value
However, won't this do?
select t.*
from t
order by t.next
Something like this should work:
With Parent As (
Select
Value,
Previous,
Next
From
table
Where
Previous Is Null
Union All
Select
t.Value,
t.Previous,
t.Next
From
table t
Inner Join
Parent
On Parent.Next = t.Value
)
Select
*
From
Parent