Creating a Materialized Path for a tree with SQL (Teradata)


I have an org tree stored in a table that is sorted "top-down" (from parent to child). The level of each row is also given as an attribute. Example of the data structure:
Index  Employee_name  Employee_level
------------------------------------
1      Michael        1
2      Pam            2
3      Jim            2
4      Dwight         3
5      Angela         1
In the above tree, Michael is the parent of Pam and Jim, while Jim is the parent of Dwight. Angela is parallel to Michael with no children.
I wish to create a column that would let me query all the employees in a selected branch. After some research, I think that a Materialized Path could work. I would therefore probably need to create a column with the parent of each employee, and then use recursion to create another column with the desired key. Any ideas how to create this with Teradata SQL? Thanks

This gets you the parent:
select t.*,
(
select max(t2.idx) -- the question's "Index" column is called idx here, because INDEX is a reserved word in Teradata
from tab as t2
where t2.idx < t.idx
and t2.Employee_level < t.Employee_level
) as parent_idx
from tab as t
When you materialize it in a table you can do simple recursion:
WITH RECURSIVE cte AS
( -- traverse the hierarchy and build the path
SELECT idx, parent_idx, Employee_level, Employee_name
,Cast(Employee_name AS VARCHAR(500)) AS Path -- must be large enough for concatenating all levels
FROM mytab
WHERE Employee_level = 1
UNION ALL
SELECT t.idx, t.parent_idx, t.Employee_level, t.Employee_name
,cte.Path || ',' || Trim(t.Employee_name)
FROM cte JOIN mytab AS t
ON cte.idx = t.parent_idx
)
select * from cte
order by idx
;
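The two steps above can be tried end to end with a small sketch. This is not Teradata: it uses Python's built-in sqlite3 module as a stand-in, with the sample rows from the question, and the helper name build_paths is made up for illustration. The SQL mirrors the answer (a correlated subquery for the parent, then a recursive CTE for the path), so it should only be read as a way to check the logic.

```python
import sqlite3

# Sample rows from the question: (idx, Employee_name, Employee_level).
ROWS = [(1, "Michael", 1), (2, "Pam", 2), (3, "Jim", 2), (4, "Dwight", 3), (5, "Angela", 1)]

def build_paths(rows):
    """Return (idx, path) pairs; SQLite is only a stand-in for Teradata here."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tab (idx INTEGER, Employee_name TEXT, Employee_level INTEGER)")
    con.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)
    # Step 1: materialize the parent, i.e. the nearest earlier row with a lower level.
    con.execute("""
        CREATE TABLE mytab AS
        SELECT t.*,
               (SELECT max(t2.idx) FROM tab AS t2
                 WHERE t2.idx < t.idx
                   AND t2.Employee_level < t.Employee_level) AS parent_idx
        FROM tab AS t
    """)
    # Step 2: start at the roots and append each name to its parent's path.
    result = con.execute("""
        WITH RECURSIVE cte(idx, Path) AS (
            SELECT idx, Employee_name FROM mytab WHERE Employee_level = 1
            UNION ALL
            SELECT t.idx, cte.Path || ',' || t.Employee_name
            FROM cte JOIN mytab AS t ON cte.idx = t.parent_idx
        )
        SELECT idx, Path FROM cte ORDER BY idx
    """).fetchall()
    con.close()
    return result

paths = build_paths(ROWS)

# Querying a branch is then a prefix match on the path (here: Jim and everyone below him).
branch = [idx for idx, path in paths
          if path == "Michael,Jim" or path.startswith("Michael,Jim,")]
```

Matching on the path plus a trailing separator avoids picking up a sibling whose name merely starts with the same letters. Building the path from idx values instead of names would sidestep duplicate names altogether.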

Related

Creating a category tree table from an array of categories in PostgreSQL

How to generate ids and parent_ids from the arrays of categories. The number or depth of subcategories can be anything between 1-10 levels.
Example PostgreSQL column. Datatype character varying array.
data_column
character varying[] |
----------------------------------
[root_1, child_1, childchild_1] |
[root_1, child_1, childchild_2] |
[root_2, child_2] |
I would like to convert the column of arrays into the table as shown below that I assume is called the Adjacency List Model. I know there is also the Nested Tree Sets Model and Materialised Path model.
Final output table
id | title | parent_id
------------------------------
1 | root_1 | null
2 | root_2 | null
3 | child_1 | 1
4 | child_2 | 2
5 | childchild_1 | 3
6 | childchild_2 | 3
Final output tree hierarchy
root_1
--child_1
----childchild_1
----childchild_2
root_2
--child_2

You can do this with a recursive CTE
WITH RECURSIVE cte AS
( SELECT data[1] as title, 2 as idx, null as parent, data FROM t -- 1
UNION
SELECT data[idx], idx + 1, title, data -- 2
FROM cte
WHERE idx <= cardinality(data)
)
SELECT DISTINCT -- 3
title,
parent
FROM cte
1. The starting query of the recursion: get all root elements and the data you'll need within the recursion.
2. The recursive part: get the element at the new index and increase the index.
3. After the recursion: query the columns you finally need. The DISTINCT removes duplicate rows (e.g. root_1 appearing twice).
Now you have created the hierarchy. Now you need the ids.
You can generate them in many different ways, for example using the row_number() window function:
WITH RECURSIVE cte AS (...)
SELECT
*,
row_number() OVER ()
FROM (
SELECT DISTINCT
title,
parent
FROM cte
) s
Now every row has its own id. The order criterion could be tweaked; without further information about the desired ordering there is little to go on here, but the algorithm stays the same.
With an id for each row, we can use a self join to look up the parent id via the parent title column. Because a self join would repeat the select query, it makes sense to encapsulate it in a second CTE to avoid duplicating code. The final result is:
WITH RECURSIVE cte AS
( SELECT data[1] as title, 2 as idx, null as parent, data FROM t
UNION
SELECT data[idx], idx + 1, title, data
FROM cte
WHERE idx <= cardinality(data)
), numbered AS (
SELECT
*,
row_number() OVER ()
FROM (
SELECT DISTINCT
title,
parent
FROM cte
) s
)
SELECT
n1.row_number as id,
n1.title,
n2.row_number as parent_id
FROM numbered n1
LEFT JOIN numbered n2 ON n1.parent = n2.title
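For comparison, here is a minimal sketch of the same conversion outside the database, in plain Python. It is not the answer's method: instead of joining on the title, it keys every node by its full path from the root, which is an assumption made here so that two nodes with the same title under different parents stay separate. The function name adjacency_list is hypothetical.

```python
def adjacency_list(paths):
    """Return (id, title, parent_id) rows for a list of root-to-leaf title arrays."""
    ids = {}    # maps a full path prefix, e.g. ("root_1", "child_1"), to its id
    rows = []
    depth = 0
    # Assign ids level by level so that roots are numbered first.
    while any(len(p) > depth for p in paths):
        for p in paths:
            if len(p) <= depth:
                continue  # this array has no element at the current depth
            prefix = tuple(p[:depth + 1])
            if prefix in ids:
                continue  # node already created from another array
            ids[prefix] = len(ids) + 1
            parent_id = ids[prefix[:-1]] if depth > 0 else None
            rows.append((ids[prefix], p[depth], parent_id))
        depth += 1
    return rows

data = [
    ["root_1", "child_1", "childchild_1"],
    ["root_1", "child_1", "childchild_2"],
    ["root_2", "child_2"],
]
rows = adjacency_list(data)
```

With the three arrays from the question this yields the ids and parent ids of the desired output table.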

Select all related records

I have a table (in SQL Server) that stores records as shown below. The purpose for Old_Id is for change tracking.
Meaning that when I want to update a record, the original record has to be unchanged, but a new record has to be inserted with a new Id and with updated values, and with the modified record's Id in Old_Id column
Id Name Old_Id
---------------------
1 Paul null
2 Paul 1
3 Jim null
4 Paul 2
5 Tim null
My question is:
When I search for id = 1, 2 or 4, I want to select all related records.
In this case I want to see the records with the following ids: 1, 2, 4.
How can this be written in a stored procedure?
Even if this design is bad practice, I can't change the logic because it's a legacy database and it's quite large.
Can anyone help with this?

You can do that with a recursive Common Table Expression (CTE):
WITH cte_history AS (
SELECT
h.id,
h.name,
h.old_id
FROM
history h
WHERE old_id IS NULL
and id in (1,2,4)
UNION ALL
SELECT
e.id,
e.name,
e.old_id
FROM
history e
INNER JOIN cte_history o
ON o.id = e.old_id
)
SELECT * FROM cte_history;
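Note that the anchor above only matches rows whose old_id is NULL, so a search for id 2 or 4 alone would return nothing. One possible way around that is to climb from the searched row up to the original first and only then walk down. The sketch below shows this idea with Python's sqlite3 module as a stand-in for SQL Server; the function name related_ids and the CTE names up and down are made up for illustration, and in T-SQL the same shape would be written with WITH instead of WITH RECURSIVE.

```python
import sqlite3

def related_ids(con, start_id):
    """Return every id in the same change chain as start_id, oldest first."""
    sql = """
        WITH RECURSIVE up(id, old_id) AS (
            -- climb from the searched row to the original record
            SELECT id, old_id FROM history WHERE id = ?
            UNION ALL
            SELECT h.id, h.old_id FROM history AS h JOIN up ON h.id = up.old_id
        ),
        down(id) AS (
            -- then descend from the original record to every later revision
            SELECT id FROM up WHERE old_id IS NULL
            UNION ALL
            SELECT h.id FROM history AS h JOIN down ON h.old_id = down.id
        )
        SELECT id FROM down ORDER BY id
    """
    return [row[0] for row in con.execute(sql, (start_id,))]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, name TEXT, old_id INTEGER)")
con.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [(1, "Paul", None), (2, "Paul", 1), (3, "Jim", None), (4, "Paul", 2), (5, "Tim", None)],
)
```

Calling related_ids(con, 4) walks 4, 2, 1 on the way up and then collects 1, 2, 4 on the way down, so any id in the chain gives the same answer.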

How to remove Repeated field in BigQuery schema?

I have a schema that has a repeated field nested into another repeated field like so: person.children.toys. I want to make this inner field not repeated (so child can have only single nullable toy). I know that for such change I need to make a new table with new schema and run SQL query that inserts modified results into it, but I don't know how to make the query. I need it to select first toy (or null) for each child and insert resulting objects into new table. There is a guarantee that in source table all children have no more than 1 toy.

Below is for BigQuery Standard SQL
I know it might look over-complicated, but it fully preserves the original schema while eliminating all but the first toy (or null) for each child. This can be handy if your real schema has more than just a few fields, because you don't need to list them.
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, STRUCT([STRUCT('mike' AS name, ['woody'] AS toys)] AS children) AS person UNION ALL
SELECT 2 id, STRUCT([STRUCT('nik', ['buzz', 'bobeep']), ('john', ['car', 'buzz', 'bobeep'])] AS children) AS person UNION ALL
SELECT 3 id, STRUCT([STRUCT('vincent', IF(TRUE,[],['']))] AS children) AS person
)
SELECT *
REPLACE(
(SELECT AS STRUCT *
REPLACE (
(SELECT ARRAY_AGG(t) FROM
(SELECT * REPLACE((SELECT toy FROM UNNEST(toys) toy WITH OFFSET ORDER BY OFFSET LIMIT 1) AS toys) FROM UNNEST(children)) t)
AS children)
FROM UNNEST([person]))
AS person)
FROM `project.dataset.table`
If applied to the data below
Row  id  person.children.name  person.children.toys
1    1   mike                  toy1
2    2   nik                   toy2
                               toy3
         john                  toy4
                               toy5
                               toy6
3    3   vincent
the result will be
Row  id  person.children.name  person.children.toys
1    1   mike                  toy1
2    2   nik                   toy2
         john                  toy4
3    3   vincent               null
Note: toys field originally REPEATED STRING becomes just STRING
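To make the intent of the nested REPLACE easier to follow, here is a rough Python sketch of the same transformation on plain dictionaries. It is only an illustration of the logic, not BigQuery code: the record layout mirrors the sample rows from the WITH clause above, and the function name first_toy_only is hypothetical.

```python
def first_toy_only(rows):
    """Copy each row, replacing every child's toy list with its first toy (or None)."""
    result = []
    for row in rows:
        children = [
            # keep all other child fields, swap the repeated field for a single value
            {**child, "toys": child["toys"][0] if child["toys"] else None}
            for child in row["person"]["children"]
        ]
        # keep all other person and row fields untouched
        result.append({**row, "person": {**row["person"], "children": children}})
    return result

rows = [
    {"id": 1, "person": {"children": [{"name": "mike", "toys": ["woody"]}]}},
    {"id": 2, "person": {"children": [
        {"name": "nik", "toys": ["buzz", "bobeep"]},
        {"name": "john", "toys": ["car", "buzz", "bobeep"]},
    ]}},
    {"id": 3, "person": {"children": [{"name": "vincent", "toys": []}]}},
]
flat = first_toy_only(rows)
```

As in the query, every field other than toys is carried over unchanged, which is the point of using REPLACE rather than listing columns.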

I could give you a better answer if you had a better described schema, but with the data provided:
CREATE OR REPLACE TABLE `temp.flat` AS
WITH data AS (
SELECT 1 id, STRUCT([STRUCT(['woody']AS toy)] AS children) AS person
UNION ALL
SELECT 2 id, STRUCT([STRUCT(['buzz', 'bobeep'])] AS children) AS person
UNION ALL
SELECT 3 id, STRUCT([STRUCT(IF(true,[],['']))] AS children) AS person
)
SELECT id, person.children[SAFE_OFFSET(0)].toy[SAFE_OFFSET(0)] first_toy
FROM `data`

sql complex order by multiple fields

I have a table category that has an int field that can reference the primary key in the same table.
like this:
ID category isSubCategoryOf orderingNumber
3 "red t-shirts" 2 2
1 "clothes" NULL 1
4 "cars" NULL 1
6 "Baby toys" 5 1
5 "Toys" NULL 1
2 "t-shirt" 1 1
I want the table to be ordered so that all sub-categories are listed under each category, and under each sub-category all of its own sub-categories.
ID category isSubCategoryOf orderingNumber
1 "clothes" NULL 1
2 "t-shirt" 1 1
3 "red t-shirts" 2 2
4 "cars" NULL 1
5 "Toys" NULL 1
6 "Baby toys" 5 1
Is such a thing possible to do with SQL or do I have to order this later by hand?

If I understand your needs correctly, you'll need a recursive query to deal with your hierarchical data:
WITH recCTE AS
(
--recursive seed
SELECT
category,
ID,
isSubCategoryOf as ParentID,
orderingNumber,
CAST(ID as VARCHAR(100)) as Path,
1 as depth
FROM table
WHERE isSubCategoryOf IS NULL
UNION ALL
--recursive term
SELECT
table.category,
table.id,
table.isSubCategoryOf,
table.orderingNumber,
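CAST(recCTE.Path + '>' + CAST(table.ID AS VARCHAR(100)) AS VARCHAR(100)), -- cast the ID to text, and the result back to VARCHAR(100) so it matches the seed's Path type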
recCTE.depth + 1
FROM recCTE
INNER JOIN table ON
recCTE.ID = table.isSubCategoryOf
)
SELECT * FROM recCTE ORDER BY path
Recursive queries are made up of three parts:
1. The recursive seed, which is the starting point of the recursive lookups. In your case it's any record with a NULL isSubCategoryOf. You can think of these as the roots of your hierarchy.
2. The recursive term, which is the part of the recursive CTE that refers back to itself. It iterates until it finds no more records for each leg of the hierarchy.
3. The final SELECT statement that selects from the recursive CTE.
Here I made a path field that stitches together each ID that is part of the hierarchy. This gives you a field you can sort on to get the hierarchical order you asked for.
It seems like, with your data, your orderingNumber is akin to the depth field that I added to the recursive CTE above. If that's the case, then you can remove that field from the CTE and save a bit of processing.
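A small runnable sketch of the path-based sort, using Python's sqlite3 module rather than the asker's database. Two things here are assumptions and not part of the answer: the table is called categories, and each ID in the path is zero-padded to five digits so that the text sort keeps working once IDs reach two digits (as plain text, '10' sorts before '2').

```python
import sqlite3

# Rows from the question: (ID, category, isSubCategoryOf).
ROWS = [
    (3, "red t-shirts", 2),
    (1, "clothes", None),
    (4, "cars", None),
    (6, "Baby toys", 5),
    (5, "Toys", None),
    (2, "t-shirt", 1),
]

def ordered_categories(rows):
    """Return (ID, category) pairs with every sub-category listed under its parent."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE categories (ID INTEGER, category TEXT, isSubCategoryOf INTEGER)")
    con.executemany("INSERT INTO categories VALUES (?, ?, ?)", rows)
    result = con.execute("""
        WITH RECURSIVE recCTE(ID, category, Path) AS (
            -- recursive seed: the roots, with a zero-padded ID as the start of the path
            SELECT ID, category, printf('%05d', ID)
            FROM categories WHERE isSubCategoryOf IS NULL
            UNION ALL
            -- recursive term: append the child's padded ID to the parent's path
            SELECT c.ID, c.category, recCTE.Path || '>' || printf('%05d', c.ID)
            FROM recCTE JOIN categories AS c ON c.isSubCategoryOf = recCTE.ID
        )
        SELECT ID, category FROM recCTE ORDER BY Path
    """).fetchall()
    con.close()
    return result

ordered = ordered_categories(ROWS)
```

To order siblings by orderingNumber instead of by ID, the padded orderingNumber could be used as the path segment; that variation is not shown here.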

If you just want to sort and print, you can create a calculated column:
select ID, category, isSubCategoryOf, orderingNumber, CONCAT(ID, '_', isSubCategoryOf) as SortOrder -- CONCAT avoids the int-to-string conversion error that + would raise
from table
order by CONCAT(ID, '_', isSubCategoryOf)

How to split the string value in one column and return the result table

Assume we have the following table:
id name member
1 jacky a;b;c
2 jason e
3 kate i;j;k
4 alex null
Now I want to use the sql or t-sql to return the following table:
1 jacky a
1 jacky b
1 jacky c
2 jason e
3 kate i
......
How to do that?
I'm using the MSSQL, MYSQL and Oracle database.

This is about as short and readable as a string-to-rows splitter gets, and it could be faster too.
One reason to choose a pure CTE over a function: you may not be allowed to create functions on the database :-)
Creating a row generator via a function (which could itself be implemented with a loop or with a CTE) would still require a lateral join (DB2 and Sybase offer this through the LATERAL keyword; in SQL Server the equivalents are CROSS APPLY and OUTER APPLY) to join the split rows produced by the function back to the main table.
A pure CTE approach could be faster than a function-based one. The only way to know is to profile, though: compare the execution plan of this query with those of the other solutions to check whether it really is faster:
with Pieces(theId, pn, start, stop) AS
(
SELECT id, 1, 1, charindex(';', member)
from tbl
UNION ALL
SELECT id, pn + 1, stop + 1, charindex(';', member, stop + 1)
from tbl
join pieces on pieces.theId = tbl.id
WHERE stop > 0
)
select
t.id, t.name,
word =
substring(t.member, p.start,
case WHEN stop > 0 THEN p.stop - p.start
ELSE 512
END)
from tbl t
join pieces p on p.theId = t.id
order by t.id, p.pn
Output:
ID NAME WORD
1 jacky a
1 jacky b
1 jacky c
2 jason e
3 kate i
3 kate j
3 kate k
4 alex (null)
Base logic sourced here: T-SQL: Opposite to string concatenation - how to split string into multiple records
Live test: http://www.sqlfiddle.com/#!3/2355d/1
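The start/stop bookkeeping of the Pieces CTE can be easier to follow as a loop. The following Python sketch imitates that walk on the sample table; it is an illustration of the logic only, the function name split_rows is made up, and Python's zero-based str.find (which returns -1 when nothing is found) plays the role of CHARINDEX returning 0.

```python
def split_rows(rows, sep=";"):
    """Expand (id, name, member) rows into one row per separated piece."""
    out = []
    for row_id, name, member in rows:
        if member is None:
            out.append((row_id, name, None))  # a NULL member stays a single NULL row
            continue
        start = 0
        while True:
            stop = member.find(sep, start)    # position of the next separator, or -1
            if stop == -1:
                out.append((row_id, name, member[start:]))  # last piece: take the rest
                break
            out.append((row_id, name, member[start:stop]))
            start = stop + 1                  # continue right after the separator
    return out

table = [
    (1, "jacky", "a;b;c"),
    (2, "jason", "e"),
    (3, "kate", "i;j;k"),
    (4, "alex", None),
]
pieces = split_rows(table)
```

Each pass of the loop corresponds to one recursion step of the CTE: the previous stop becomes the next start, and the walk ends when no further separator is found.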

Well... let me first introduce you to Adam Machanic who taught me about a Numbers table. He's also written a very fast split function using this Numbers table.
http://dataeducation.com/counting-occurrences-of-a-substring-within-a-string/
After you implement a Split function that returns a table, you can then join against it and get the results you want.

IF OBJECT_ID('dbo.Users') IS NOT NULL
DROP TABLE dbo.Users;
CREATE TABLE dbo.Users
(
id INT IDENTITY NOT NULL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
member VARCHAR(1000)
)
GO
INSERT INTO dbo.Users(name, member) VALUES
('jacky', 'a;b;c'),
('jason', 'e'),
('kate', 'i;j;k'),
('alex', NULL);
GO
DECLARE @spliter CHAR(1) = ';';
WITH Base AS
(
SELECT 1 AS n
UNION ALL
SELECT n + 1
FROM Base
WHERE n < CEILING(SQRT(1000)) --generates 32 rows; the cross join below turns them into the numbers 1 to 1024. Change 1000 to a larger value depending on the member column's length.
)
, Nums AS --Numbers Common Table Expression, if your database version doesn't support it, just create a physical table.
(
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n
FROM Base AS B1 CROSS JOIN Base AS B2
)
SELECT id,
SUBSTRING(member, n, CHARINDEX(@spliter, member + @spliter, n) - n) AS element
FROM dbo.Users
JOIN Nums
ON n <= DATALENGTH(member) + 1
AND SUBSTRING(@spliter + member, n, 1) = @spliter
ORDER BY id
OPTION (MAXRECURSION 0); --Base CTE is generated recursively, we don't want to limit recursion count.
