Teradata: how to use recursion to build a JSON hierarchy

What is the equivalent Teradata syntax for the answer given to the question "Reverse aggregation inside of common table expression"?
I am trying to write a Teradata version that iterates over a parent-child relationship table and builds JSON that nests each child under its parent, level after level, in a single JSON field.
The answer given in the linked question, which I think is written for PostgreSQL, is shown below. I would really appreciate help translating it to Teradata, as I think it should let me accomplish my task. If not, please set me straight.
I am not sure what row_to_json(c) is doing; should it be JSON_AGG(c.children)? I also think the double colon (NULL::JSON) is casting a NULL to the JSON data type. In any case, I have tried a few variations to no avail. Please help.
Here is the PostgreSQL syntax answer given:
WITH RECURSIVE cte AS (
SELECT id, parent_id, name, NULL::JSON AS children
FROM people p
WHERE NOT EXISTS ( -- only leaf nodes; see link below
SELECT 1 FROM people
WHERE parent_id = p.id
)
UNION ALL
SELECT p.id, p.parent_id, p.name, row_to_json(c) AS children
FROM cte c
JOIN people p ON p.id = c.parent_id
)
SELECT id, name, json_agg(children) AS children
FROM cte
GROUP BY 1, 2;

When translating the PostgreSQL to Teradata I ran into a restriction: JSON columns are not supported by set operations like UNION.
Casting JSON/VarChar back and forth is a workaround:
CREATE VOLATILE TABLE people (id INT, name VARCHAR(20), parent_id INT) ON COMMIT PRESERVE ROWS;
INSERT INTO people VALUES(1, 'Adam', NULL);
INSERT INTO people VALUES(2, 'Abel', 1);
INSERT INTO people VALUES(3, 'Cain', 1);
INSERT INTO people VALUES(4, 'Enoch', 3);
WITH RECURSIVE cte AS (
SELECT id, parent_id, name,
CAST(NULL AS VARCHAR(2000)) AS children
FROM people p
WHERE NOT EXISTS (
SELECT * FROM people
WHERE parent_id = p.id
)
UNION ALL
SELECT p.id, p.parent_id, p.name,
-- VarChar -> JSON -> VarChar
CAST(JSON_COMPOSE(c.id,
c.name,
NEW JSON(c.children) AS children) AS VARCHAR(10000)) AS children
FROM cte c
JOIN people p ON p.id = c.parent_id
)
SELECT id, name,
JSON_AGG(NEW JSON(children) AS children) AS children
FROM cte
GROUP BY 1, 2;
The result is similar but not exactly the same: Teradata adds "children":, e.g.:
{"children":{"id":4,"name":"Enoch","children":null}} -- Teradata
[{"id":4,"name":"Enoch","children":null}] -- PostgreSQL
Finally adding JSONExtract to get the array only:
SELECT id, name,
JSON_AGG(NEW JSON(children) AS X).JSONExtract('$..X') AS children
FROM cte
GROUP BY 1, 2;
[{"id":4,"name":"Enoch","children":null}]
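As a cross-check for the SQL output, here is a minimal Python sketch that builds the same nested shape from the sample rows. It is not Teradata code; the `PEOPLE` list and the `build_children` helper are hypothetical names introduced only for illustration, and the sketch assumes the data contains no cycles.

```python
import json

# Sample rows mirroring the volatile table above: (id, name, parent_id).
PEOPLE = [(1, "Adam", None), (2, "Abel", 1), (3, "Cain", 1), (4, "Enoch", 3)]


def build_children(rows, parent_id):
    """Return the nested list of children of parent_id, or None for a leaf.

    Assumes the parent/child data is acyclic; a cycle would recurse forever.
    """
    kids = [
        {"id": pid, "name": name, "children": build_children(rows, pid)}
        for pid, name, parent in rows
        if parent == parent_id
    ]
    return kids or None


# One line per person: id, name, and that person's children as JSON.
for pid, name, _ in PEOPLE:
    print(pid, name, json.dumps(build_children(PEOPLE, pid)))
```

Comparing this output with the `children` column returned by the query is a quick way to spot wrapping differences such as the extra "children" key.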

Related

SQL aliasing with FROM AS

SELECT A.barName AS BarName1, B.barName AS BarName2
FROM (
SELECT Sells.barName, COUNT(barName) AS count
FROM Sells
GROUP BY barName
) AS A, B
WHERE A.count = B.count
I'm trying to do a self join on this table that I created, but I'm not sure how to alias the table twice in this format (i.e. FROM AS). Unfortunately, this is a school assignment where I can't create any new tables. Anyone have experience with this syntax?
edit: For clarification I'm using PostgreSQL 8.4. The schema for the tables I'm dealing with are as follows:
Drinkers(name, addr, hobby, frequent)
Bars(name, addr, owner)
Beers(name, brewer, alcohol)
Drinks(drinkerName, drinkerAddr, beerName, rating)
Sells(barName, beerName, price, discount)
Favorites(drinkerName, drinkerAddr, barName, beerName, season)
Again, this is for a school assignment, so I'm given read-only access to the above tables.
What I'm trying to find is pairs of bars (Name1, Name2) that sell the same set of drinks. My thinking in doing the above was to try and find pairs of bars that sell the same number of drinks, then list the names and drinks side by side (BarName1, Drink1, BarName2, Drink2) to try and compare if they are indeed the same set.
You have not mentioned what RDBMS you use.
If Oracle or MS SQL, you can do something like this (I use my sample data table, but you can try it with your tables):
create table some_data (
parent_id int,
id int,
name varchar(10)
);
insert into some_data values(1, 2, 'val1');
insert into some_data values(2, 3, 'val2');
insert into some_data values(3, 4, 'val3');
with data as (
select * from some_data
)
select *
from data d1
left join data d2 on d1.parent_id = d2.id
In your case this query
SELECT Sells.barName, COUNT(barName) AS count
FROM Sells
GROUP BY barName
should be placed in the WITH section and referenced from the main query twice, as A and B.
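A minimal sketch of that idea, run against an in-memory SQLite database from Python rather than PostgreSQL, Oracle, or MS SQL. The sample rows and the `counts` CTE name are made up for illustration; the condition `A.barName < B.barName` is an added assumption that keeps each pair once and drops self-pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sells (barName TEXT, beerName TEXT);
    INSERT INTO Sells VALUES
        ('Alpha', 'Lager'), ('Alpha', 'Stout'),
        ('Beta',  'Lager'), ('Beta',  'Porter'),
        ('Gamma', 'Lager');
""")

# The aggregate is written once in the WITH section and aliased twice.
pairs = conn.execute("""
    WITH counts AS (
        SELECT barName, COUNT(*) AS cnt
        FROM Sells
        GROUP BY barName
    )
    SELECT A.barName AS BarName1, B.barName AS BarName2
    FROM counts AS A
    JOIN counts AS B
      ON A.cnt = B.cnt
     AND A.barName < B.barName
    ORDER BY A.barName, B.barName
""").fetchall()

print(pairs)
```

Equal counts only narrow down the candidates; the pairs still have to be compared beer by beer to confirm they sell the same set.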
It is slightly unclear what you are trying to achieve. Are you looking for a list of bar names, with how many times they appear in the table? If so, there are a couple of ways you could do this. Firstly:
SELECT SellsA.barName AS BarName1, SellsB.count AS Count
FROM
(
SELECT DISTINCT barName
FROM Sells
) SellsA
LEFT JOIN
(
SELECT Sells.barName, COUNT(barName) AS count
FROM Sells
GROUP BY barName
) AS SellsB
ON SellsA.barName = SellsB.barName
Secondly, if you are using MSSQL:
SELECT barName, MAX(rn) AS Count
FROM
(
SELECT barName,
ROW_NUMBER() OVER(PARTITION BY barName ORDER BY barName) as rn
FROM Sells
) CountSells
GROUP BY barName
Thirdly, you could avoid a self-join in MSSQL by using OVER():
SELECT DISTINCT
barName,
COUNT(*) OVER(PARTITION BY barName) AS Count
FROM Sells

Matching a set of child records between two similar table hierarchies

I have two similar table hierarchies:
Owner -> OwnerGroup -> Parent
and
Owner2 -> OwnerGroup2
I would like to determine if there is an exact match of Owners that exists in Owner2 based on a set of values. There are approximately a million rows in each Owner table. Some OwnerGroups contain up to 100 Owners.
So basically if there is an OwnerGroup that contains Owners "Smith", "John" and "Smith", "Jane", I want to know the id of the OwnerGroup2s that are exact matches.
The first attempt at this was to generate a join per Owner (which required dynamic SQL to be generated in the application):
select og.id
from owner_group2 og
-- dynamic bit starts here
join owner2 o1 on
(og.id = o1.og_id) AND
(o1.given_names = 'JOHN' and o1.surname='SMITH')
-- dynamic bit ends here
join owner2 o2 on
(og.id = o2.og_id) AND
(o2.given_names = 'JANE' and o2.surname='SMITH');
This works fine for small numbers of owners, but when we have to deal with the scenario of 100 Owners in a group, the query plan contains 100 nested loops and it takes almost a minute to run.
Another option I had was to use something around the intersect operator. E.g.
select * from (
select o1.surname, o1.given_names
from owner1 o1
join owner_group1 og1 on o1.og_id = og1.id
where
og1.parent_id = 1936233
)
intersect
select o2.surname, o2.given_names
from owner2 o2
join owner_group2 og2 on og2.id = o2.og_id;
I'm not sure how to suck out the owner2.id in this scenario either - and it was still running in the 4-5 second range.
I feel like I am missing something obvious - so please feel free to provide some better solutions!
You're on the right track with intersect, you just need to go a bit further. You need to join the results of it back to the owner_groups2 table to find the ids.
You can use the listagg function to convert the groups into comma-separated lists of the names (note - requires 11g). You can then take the intersection of these name lists to find the matches and join this back to the list in owner_groups2.
I've created a simplified example below. In it, "Dave, Jill" is the group that is present in both tables.
create table grps (id integer, name varchar2(100));
create table grps2 (id integer, name varchar2(100));
insert into grps values (1, 'Dave');
insert into grps values(1, 'Jill');
insert into grps values (2, 'Barry');
insert into grps values(2, 'Jane');
insert into grps2 values(3, 'Dave');
insert into grps2 values(3, 'Jill');
insert into grps2 values(4, 'Barry');
with grp1 as (
SELECT id, listagg(name, ',') within group (order by name) n
FROM grps
group by id
), grp2 as (
SELECT id, listagg(name, ',') within group (order by name) n
FROM grps2
group by id
)
SELECT * FROM grp2
where n in (
-- find the duplicates
select n from grp1
intersect
select n from grp2
);
Note this will still require a full scan of owner_groups2; I can't think of a way you can avoid this. So your query is likely to remain slow.
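If the rows can be pulled into application code, the same comparison can be sketched without string aggregation by treating each group as a set of names and indexing the second table by that set. This swaps listagg plus intersect for a hash lookup; it is an illustration only, and `members_by_group` and `matching_groups` are hypothetical helper names.

```python
from collections import defaultdict

# Same sample data as the grps / grps2 tables above: (group id, name).
grps = [(1, "Dave"), (1, "Jill"), (2, "Barry"), (2, "Jane")]
grps2 = [(3, "Dave"), (3, "Jill"), (4, "Barry")]


def members_by_group(rows):
    """Collect the member names of each group id into a frozenset."""
    groups = defaultdict(set)
    for gid, name in rows:
        groups[gid].add(name)
    return {gid: frozenset(names) for gid, names in groups.items()}


def matching_groups(rows1, rows2):
    """Map each group id in rows1 to the ids in rows2 with exactly the same members."""
    index = defaultdict(list)
    for gid2, names in members_by_group(rows2).items():
        index[names].append(gid2)
    return {
        gid1: sorted(index.get(names, []))
        for gid1, names in members_by_group(rows1).items()
    }


print(matching_groups(grps, grps2))
```

Like the SQL version, this reads every row of the second table once, so it does not remove the full scan; it only avoids building and comparing long delimited strings.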

Is there a better way to get the last level of groups from a table?

I have a Group table and within that table, there is a ParentId column that denotes a group's parent within the Group table. The purpose is to build a dynamic menu from these groups. I know I can loop and grab the last child and construct a result set, but I'm curious if there's a more SQL-y way of accomplishing this.
The table has Id, ParentId and Title fields of int, int and varchar.
Basically, a hierarchy may be constructed this way (People is the base group):
People -> Male -> Boy
-> Man
-> Female
I want to grab the last child(ren) of each branch. So, {Boy, Man, Female} in this case.
As I mentioned, getting that info isn't a problem. I'm just looking for a better way of getting it without having to write a bunch of unions and loops where I can basically change the base group and traverse the entire hierarchy outward, dynamically. I'm not really a Db guy, so I don't know if there's a slick way of doing this or not.
To get the leaf levels for one of many hierarchies, you can use a recursive common table expression (CTE) to enumerate the hierarchy, and then filter to the leaves by checking which members are not the parent of another group:
Declare @RootID int = 1
;with cte as (
select
Id,
ParentId,
Title
From
Groups
Where
Id = @RootID
Union All
Select
g.Id,
g.ParentId,
g.Title
From
cte c
Inner Join
Groups g
On c.Id = g.ParentID
)
Select
*
From
cte g
Where
Not Exists (
Select
'x'
From
Groups g2
Where
g2.ParentID = g.Id
);
You can also do this with a left join rather than a not exists
http://sqlfiddle.com/#!6/8f1aa/9
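For readers without SQL Server at hand, the left join variant can be sketched against an in-memory SQLite database from Python. The table contents follow the People / Male / Female example from the question; the `leaves` helper and the `:root` parameter name are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Groups (Id INTEGER PRIMARY KEY, ParentId INTEGER, Title TEXT);
    INSERT INTO Groups VALUES
        (1, NULL, 'People'),
        (2, 1, 'Male'),
        (3, 1, 'Female'),
        (4, 2, 'Boy'),
        (5, 2, 'Man');
""")

LEAVES_SQL = """
    WITH RECURSIVE cte(Id, ParentId, Title) AS (
        -- Anchor: the chosen root group.
        SELECT Id, ParentId, Title FROM Groups WHERE Id = :root
        UNION ALL
        -- Recursive step: every group whose parent is already in the CTE.
        SELECT g.Id, g.ParentId, g.Title
        FROM cte AS c
        JOIN Groups AS g ON g.ParentId = c.Id
    )
    -- A leaf is a group with no row pointing at it as a parent.
    SELECT c.Title
    FROM cte AS c
    LEFT JOIN Groups AS child ON child.ParentId = c.Id
    WHERE child.Id IS NULL
    ORDER BY c.Title
"""


def leaves(root_id):
    """Return the titles of all leaf groups below (or equal to) root_id."""
    return [title for (title,) in conn.execute(LEAVES_SQL, {"root": root_id})]


print(leaves(1))
```

Changing the root id to 2 restricts the result to the leaves of the Male branch.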
Since you are using SQL Server 2012 you could take advantage of hierarchyid; here is an example following Laurence's schema:
CREATE TABLE Groups
(
Id INT NOT NULL
PRIMARY KEY
, Title VARCHAR(20)
, HID HIERARCHYID
)
INSERT INTO Groups
VALUES ( 1, 'People', '/' ),
( 2, 'Male', '/1/' ),
( 3, 'Female', '/2/' ),
( 4, 'Boy', '/1/1/' ),
( 5, 'Man', '/1/2/' );
SELECT Id
, Title
FROM Groups
WHERE HID NOT IN ( SELECT HID.GetAncestor(1)
FROM Groups
WHERE HID.GetAncestor(1) IS NOT NULL )
http://sqlfiddle.com/#!6/00330/1/0
Results:
ID TITLE
3 Female
4 Boy
5 Man

SQL Server 2005 recursive query with loops in data - is it possible?

I've got a standard boss/subordinate employee table. I need to select a boss (specified by ID) and all his subordinates (and their subordinates, etc). Unfortunately the real world data has some loops in it (for example, both company owners have each other set as their boss). The simple recursive query with a CTE chokes on this (maximum recursion level of 100 exceeded). Can the employees still be selected? I don't care about the order in which they are selected, just that each of them is selected once.
Added: You want my query? Umm... OK... I thought it was pretty obvious, but here it is:
with
UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = @UserID
union all
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
)
select * from UserTbl
Added 2: Oh, in case it wasn't clear - this is a production system and I have to do a little upgrade (basically add a sort of report). Thus, I'd prefer not to modify the data if it can be avoided.
I know it has been a while but thought I should share my experience as I tried every single solution and here is a summary of my findings (and maybe this post?):
Adding a column with the current path did work but had a performance hit so not an option for me.
I could not find a way to do it using CTE.
I wrote a recursive SQL function which adds employeeIds to a table. To get around the circular referencing, there is a check to make sure no duplicate IDs are added to the table. The performance was average but was not desirable.
Having done all of that, I came up with the idea of dumping the whole subset of [eligible] employees to code (C#) and filter them there using a recursive method. Then I wrote the filtered list of employees to a datatable and export it to my stored procedure as a temp table. To my disbelief, this proved to be the fastest and most flexible method for both small and relatively large tables (I tried tables of up to 35,000 rows).
This will work for the initial recursive link, but might not work for longer links:
DECLARE @Table TABLE(
ID INT,
PARENTID INT
)
INSERT INTO @Table (ID,PARENTID) SELECT 1, 2
INSERT INTO @Table (ID,PARENTID) SELECT 2, 1
INSERT INTO @Table (ID,PARENTID) SELECT 3, 1
INSERT INTO @Table (ID,PARENTID) SELECT 4, 3
INSERT INTO @Table (ID,PARENTID) SELECT 5, 2
SELECT * FROM @Table
DECLARE @ID INT
SELECT @ID = 1
;WITH boss (ID,PARENTID) AS (
SELECT ID,
PARENTID
FROM @Table
WHERE PARENTID = @ID
),
bossChild (ID,PARENTID) AS (
SELECT ID,
PARENTID
FROM boss
UNION ALL
SELECT t.ID,
t.PARENTID
FROM @Table t INNER JOIN
bossChild b ON t.PARENTID = b.ID
WHERE t.ID NOT IN (SELECT PARENTID FROM boss)
)
SELECT *
FROM bossChild
OPTION (MAXRECURSION 0)
What I would recommend is to use a WHILE loop, and only insert links into a temp table if the id does not already exist, thus removing endless loops.
Not a generic solution, but might work for your case: in your select query modify this:
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
to become:
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
and a.[User_ID] <> @UserID
You don't have to do it recursively. It can be done in a WHILE loop. In my experience it is quicker: it has been every time I've done timings on the two techniques. This sounds inefficient but it isn't, since the number of loops is the recursion level. At each iteration you can check for looping and correct where it happens. You can also put a constraint on the temporary table to fire an error if looping occurs, though you seem to prefer something that deals with looping more elegantly. You can also trigger an error when the WHILE loop iterates over a certain number of levels (to catch an undetected loop; oh boy, it sometimes happens).
The trick is to insert repeatedly into a temporary table (which is primed with the root entries), including a column with the current iteration number, and doing an inner join between the most recent results in the temporary table and the child entries in the original table. Just break out of the loop when @@rowcount=0!
Simple eh?
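The loop described above can be sketched in plain Python, with a dictionary standing in for the temporary table and its iteration-number column. The sample rows, the `subordinates_by_level` name, and the `max_levels` guard are assumptions for illustration; this is not the T-SQL itself.

```python
# (user_id, manager_id) rows; users 1 and 2 are each other's boss.
USERS = [(1, 2), (2, 1), (3, 1), (4, 3), (5, 2)]


def subordinates_by_level(rows, root_id, max_levels=1000):
    """Return {user_id: level} for root_id and everyone below, each user once."""
    seen = {root_id: 0}    # plays the role of the temp table (id, iteration)
    frontier = {root_id}   # rows added by the most recent iteration
    level = 0
    while frontier:        # equivalent to stopping when @@rowcount = 0
        level += 1
        if level > max_levels:
            raise RuntimeError("exceeded max_levels; possible undetected loop")
        # Join the latest rows to their children, skipping ids already stored.
        new_ids = {uid for uid, mgr in rows if mgr in frontier and uid not in seen}
        for uid in new_ids:
            seen[uid] = level
        frontier = new_ids
    return seen


print(subordinates_by_level(USERS, 1))
```

Because an id is only added when it is not already stored, the 1 <-> 2 loop in the sample data ends the walk instead of repeating it.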
I know you asked this question a while ago, but here is a solution that may work for detecting infinite recursive loops. I generate a path and check in the CTE condition whether the USER ID is already in the path; if it is, it won't be processed again. Hope this helps.
Jose
DECLARE @Table TABLE(
USER_ID INT,
MANAGER_ID INT )
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 1, 2
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 2, 1
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 3, 1
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 4, 3
INSERT INTO @Table (USER_ID,MANAGER_ID) SELECT 5, 2
DECLARE @UserID INT
SELECT @UserID = 1
;with
UserTbl as -- Selects an employee and his subordinates.
(
select
'/'+cast( a.USER_ID as varchar(max)) as [path],
a.[User_ID],
a.[Manager_ID]
from @Table a
where [User_ID] = @UserID
union all
select
b.[path] +'/'+ cast( a.USER_ID as varchar(max)) as [path],
a.[User_ID],
a.[Manager_ID]
from @Table a
inner join UserTbl b
on (a.[Manager_ID]=b.[User_ID])
-- append '/' so the last id in the path is matched as well
where charindex('/'+cast( a.USER_ID as varchar(max))+'/', b.[path]+'/') = 0
)
select * from UserTbl
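The same path check can be tried out against an in-memory SQLite database from Python. This is a sketch under assumptions: the table is named `Users` here, `instr` and `||` replace `charindex` and `+`, and every id in the path is wrapped in slashes so that the last id is matched too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (User_ID INTEGER, Manager_ID INTEGER);
    INSERT INTO Users VALUES (1, 2), (2, 1), (3, 1), (4, 3), (5, 2);
""")

PATH_SQL = """
    WITH RECURSIVE UserTbl(path, User_ID, Manager_ID) AS (
        -- Anchor: the starting user, with a path of the form '/1/'.
        SELECT '/' || User_ID || '/', User_ID, Manager_ID
        FROM Users
        WHERE User_ID = :root
        UNION ALL
        -- Recursive step: add a subordinate only if its id is not in the path yet.
        SELECT b.path || a.User_ID || '/', a.User_ID, a.Manager_ID
        FROM Users AS a
        JOIN UserTbl AS b ON a.Manager_ID = b.User_ID
        WHERE instr(b.path, '/' || a.User_ID || '/') = 0
    )
    SELECT User_ID, path FROM UserTbl ORDER BY User_ID
"""

rows = conn.execute(PATH_SQL, {"root": 1}).fetchall()
for user_id, path in rows:
    print(user_id, path)
```

With the 1 <-> 2 loop in the sample data, user 1 is not revisited from user 2 because '/1/' is already part of the path '/1/2/'.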
Basically, if you have loops like this in the data you'll have to do the retrieval logic by yourself.
You could use one CTE to get only subordinates and another to get bosses.
Another idea is to have a dummy row as a boss to both company owners so they wouldn't be each other's bosses, which is ridiculous. This is my preferred option.
I can think of two approaches.
1) Produce more rows than you want, but include a check to make sure it does not recurse too deep. Then remove duplicate User records.
2) Use a string to hold the Users already visited. Like the not in subquery idea that didn't work.
Approach 1:
; with TooMuchHierarchy as (
select "User_ID"
, Manager_ID
, 0 as Depth
from "User"
WHERE "User_ID" = @UserID
union all
select U."User_ID"
, U.Manager_ID
, M.Depth + 1 as Depth
from TooMuchHierarchy M
inner join "User" U
on U.Manager_ID = M."user_id"
where Depth < 100) -- Warning MAGIC NUMBER!!
, AddMaxDepth as (
select "User_ID"
, Manager_id
, Depth
, max(depth) over (partition by "User_ID") as MaxDepth
from TooMuchHierarchy)
select "user_id", Manager_Id
from AddMaxDepth
where Depth = MaxDepth
The line where Depth < 100 is what keeps you from getting the max recursion error. Make this number smaller, and fewer records will be produced that need to be thrown away. Make it too small and employees won't be returned, so make sure it is at least as large as the depth of the org chart being stored. It is a bit of a maintenance nightmare as the company grows. If it needs to be bigger, add option (maxrecursion ... number ...) to the whole thing to allow more recursion.
Approach 2:
; with Hierarchy as (
select "User_ID"
, Manager_ID
, '#' + cast("user_id" as varchar(max)) + '#' as user_id_list
from "User"
WHERE "User_ID" = @UserID
union all
select U."User_ID"
, U.Manager_ID
, M.user_id_list + '#' + cast(U."user_id" as varchar(max)) + '#' as user_id_list
from Hierarchy M
inner join "User" U
on U.Manager_ID = M."user_id"
where user_id_list not like '%#' + cast(U."User_id" as varchar(max)) + '#%')
select "user_id", Manager_Id
from Hierarchy
The preferable solution is to clean up the data and to make sure you do not have any loops in the future - that can be accomplished with a trigger or a UDF wrapped in a check constraint.
However, you can use a multi statement UDF as I demonstrated here: Avoiding infinite loops. Part One
You can add a NOT IN() clause in the join to filter out the cycles.
This is the code I used on a project to chase up and down hierarchical relationship trees.
User defined function to capture subordinates:
CREATE FUNCTION fn_UserSubordinates(@User_ID INT)
RETURNS @SubordinateUsers TABLE (User_ID INT, Distance INT) AS BEGIN
IF @User_ID IS NULL
RETURN
INSERT INTO @SubordinateUsers (User_ID, Distance) VALUES ( @User_ID, 0)
DECLARE @Distance INT, @Finished BIT
SELECT @Distance = 1, @Finished = 0
WHILE @Finished = 0
BEGIN
INSERT INTO @SubordinateUsers
SELECT S.User_ID, @Distance
FROM Users AS S
JOIN @SubordinateUsers AS C
ON C.User_ID = S.Manager_ID
LEFT JOIN @SubordinateUsers AS C2
ON C2.User_ID = S.User_ID
WHERE C2.User_ID IS NULL
IF @@RowCount = 0
SET @Finished = 1
SET @Distance = @Distance + 1
END
RETURN
END
User defined function to capture managers:
CREATE FUNCTION fn_UserManagers(@User_ID INT)
RETURNS @User TABLE (User_ID INT, Distance INT) AS BEGIN
IF @User_ID IS NULL
RETURN
DECLARE @Manager_ID INT
SELECT @Manager_ID = Manager_ID
FROM Users WITH (NOLOCK)
WHERE User_ID = @User_ID
-- collect the managers further up the chain, one step farther away each
INSERT INTO @User (User_ID, Distance)
SELECT User_ID, Distance + 1
FROM dbo.fn_UserManagers(@Manager_ID)
INSERT INTO @User (User_ID, Distance) VALUES (@User_ID, 0)
RETURN
END
You need some method to prevent your recursive query from adding User IDs that are already in the set. However, as sub-queries and double mentions of the recursive table are not allowed (thank you van), you need another solution to remove the users already in the list.
The solution is to use EXCEPT to remove these rows. This should work according to the manual. Multiple recursive statements linked with union-type operators are allowed. Removing the users already in the list means that after a certain number of iterations the recursive result set returns empty and the recursion stops.
with UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = @UserID
union all
(
select a.[User_ID], a.[Manager_ID]
from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
where a.[User_ID] not in (select [User_ID] from UserTbl)
EXCEPT
select a.[User_ID], a.[Manager_ID] from UserTbl a
)
)
select * from UserTbl;
The other option is to hardcode a level variable that will stop the query after a fixed number of iterations or use the MAXRECURSION query option hint, but I guess that is not what you want.
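For comparison, some engines remove already-seen rows for you: in SQLite, joining the anchor and the recursive part with UNION instead of UNION ALL means a row is only queued if an identical row has not been produced before, so the recursion ends on cyclic data. The sketch below shows that behavior from Python; it is not a SQL Server solution (SQL Server requires UNION ALL between the anchor and the recursive member), and the `Users` table name and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (User_ID INTEGER, Manager_ID INTEGER);
    INSERT INTO Users VALUES (1, 2), (2, 1), (3, 1), (4, 3), (5, 2);
""")

DEDUP_SQL = """
    WITH RECURSIVE UserTbl(User_ID, Manager_ID) AS (
        SELECT User_ID, Manager_ID FROM Users WHERE User_ID = :root
        UNION   -- not UNION ALL: rows already produced are discarded
        SELECT a.User_ID, a.Manager_ID
        FROM Users AS a
        JOIN UserTbl AS b ON a.Manager_ID = b.User_ID
    )
    SELECT User_ID FROM UserTbl ORDER BY User_ID
"""

user_ids = [uid for (uid,) in conn.execute(DEDUP_SQL, {"root": 1})]
print(user_ids)
```

Each employee appears once even though users 1 and 2 manage each other, which matches the requirement in the question.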

Determine the hierarchy of records in a SQL database

I've got a problem I was wondering if there's an elegant solution to. It is a real business problem and not a class assignment!
I have a table with thousands of records, some of which are groups related to each other.
The database is SQL 2005.
ID is the primary key. If the record replaced an earlier record, the ID of that record is in the REP_ID column.
ID REP_ID
E D
D B
C B
B A
A NULL
So in this example, A was the original row, B replaced A, C replaced B unsuccessfully, D replaced B successfully and finally E replaced D.
I'd like to be able to display all the records in this table in a grid.
Then, I'd like for the user to be able to right click any record in any group, and for the system to locate all the related records and display them in some sort of tree.
Now I can obviously brute force a solution to this but I'd like to ask the community if they can see a more elegant answer.
It's a recursive CTE you need, something like (untested)
;WITH myCTE AS
(
SELECT
ID
FROM
myTable
WHERE
REP_ID IS NULL
UNION ALL
SELECT
T.ID
FROM
myTable T
JOIN
myCTE C ON T.REP_ID = C.ID
)
SELECT
*
FROM
myCTE
However, both C->B and D->B point at B.
So do you want C->B, D->B, or both?
Do you want a ranking?
etc?
Use a CTE to build your hierarchy. Something like
CREATE TABLE #test(ID CHAR(1), REP_ID CHAR(1) NULL)
INSERT INTO #test VALUES('E','D')
INSERT INTO #test VALUES('D','B')
INSERT INTO #test VALUES('C','B')
INSERT INTO #test VALUES('B','A')
INSERT INTO #test VALUES('A',NULL)
;WITH tree( ID,
REP_ID,
Depth
)
AS
(
SELECT
ID,
REP_ID,
1 AS [Depth]
FROM
#test
WHERE
REP_ID IS NULL
UNION ALL
SELECT
[test].ID,
[test].REP_ID,
tree.[Depth] + 1 AS [Depth]
FROM
#test [test]
INNER JOIN
tree
ON
[test].REP_ID = tree.ID
)
SELECT * FROM tree
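The queries above start from the root rows. To start from whichever record the user clicked, one option is to first walk up the REP_ID chain to the original row and then walk down from there. A minimal sketch against an in-memory SQLite database from Python follows; the `records` table name, the extra X/Y rows, and the `related_records` helper are assumptions for illustration, and the data is assumed to contain no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (ID TEXT PRIMARY KEY, REP_ID TEXT);
    INSERT INTO records VALUES
        ('E', 'D'), ('D', 'B'), ('C', 'B'), ('B', 'A'), ('A', NULL),
        ('Y', 'X'), ('X', NULL);   -- an unrelated second group
""")

GROUP_SQL = """
    WITH RECURSIVE
    up(ID, REP_ID) AS (
        -- Walk from the clicked record towards the original row.
        SELECT ID, REP_ID FROM records WHERE ID = :start
        UNION ALL
        SELECT r.ID, r.REP_ID
        FROM records AS r
        JOIN up ON r.ID = up.REP_ID
    ),
    tree(ID, REP_ID, Depth) AS (
        -- Restart at the original row (REP_ID IS NULL) and walk down.
        SELECT ID, REP_ID, 1 FROM up WHERE REP_ID IS NULL
        UNION ALL
        SELECT r.ID, r.REP_ID, tree.Depth + 1
        FROM records AS r
        JOIN tree ON r.REP_ID = tree.ID
    )
    SELECT ID, REP_ID, Depth FROM tree ORDER BY Depth, ID
"""


def related_records(start_id):
    """Return (ID, REP_ID, Depth) for every record in start_id's group."""
    return conn.execute(GROUP_SQL, {"start": start_id}).fetchall()


for row in related_records("C"):
    print(row)
```

The Depth column gives the level of each record, which is enough to indent the rows into a tree in the grid.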
You probably already considered it but have you looked into simply adding a column to store the "original_id"? That'd make your queries lightning fast compared to building a tree of who inherited from whom.
Barring that, just google for "SQL tree DFS".
Just make sure you have an optimization for your DFS as follows: if you know most records only have <=3 revisions, you can start with a 3-way join to find A, B and C right away.