Need denormalized query from heirarchial table - sql

I have a table(lets name it HEIRARCHY_TABLE ) in this format representing hierarchical data where a tree is stored in this format.
HEIRARCHY_TABLE (parent_name ,parent_id ,child_name ,child_id)
Sample Data in HEIRARCHY_TABLE:-
-------------------------------------------
parent_name |parent_id |child_name |child_id
--------------------------------------------
parent_1 | parent_1_id | leaf_node | leaf_node_id1
parent_2 | parent_2_id | parent_1 | parent_1_id
the above scenario is showing data for a case where parent_2->parent_1->leaf_node
where -> representing "is parent of " relation-ship.
I need to query this table and get a result like this for all leaf nodes.
Result:-
leaf_node | parent_1 | parent_2 | parent_3 |parent_4 | parent_5 |parent_6 | parent_7
and if for suppose a leaf node has only two parents then i require the rest of the parent values to be null.
i.e..,if it has only 5 parents.then parent_6 and parent_7 should be null.
Note :- the above table contains multiple trees.hence it contain multiple roots.I need data for all the trees available in this table in this format.the maximum level in all the trees is 7 only.

Google around for Common Table Expressions. What you are looking for, I think, is a recursive query. You are going to have to play some games to force the query to return nulls for parents that don't actually exist. [Check this.] (http://www.morganslibrary.org/reference/with.html)

The fastest query would be to use heirarchy IDs. then you can query against all or part of a tree using CONTAINS on the heirarchy id column.
Of course this assumes you are allowed to/can build this column.
For oracle, you can do Heirarchy Query. Not sure about its efficiency/demormalization factor. http://docs.oracle.com/cd/B19306_01/server.102/b14200/queries003.htm

Related

Querying array of text in postgres

I have an array type I want to store in Postgres. One of the major use cases I have is to see if any of the records has an array which has a string in it.
eg.
| A | ["NY", "Paris", "Milan"] |
| B | ["Paris", "NY"] |
| C | [] |
| D | ["Milan"] |
Does there exist a row with Paris in the array? Which rows have Milan in the array? and so on.
I have 2 options on how to store the column. I can either make it a type text[] or convert it into a json as {"cities": ["NY", "Paris", "Milan"]} and then store as a JSONB field
However, I am not sure what would allow the fastest querying for the use case I have. Is there any one obviously better way of doing this? Am I tying myself down in any way by choosing one over the other? If I choose one over the other then how can I query the DB?
As you seem to be storing simple lists of values, I would recommend to use datataype Array over JSON, which better fits more complex cases (nested datastructures, associative arrays, ...).
To check for the value of an element at any position in the array, you can use array function ANY().
Here is a query that will return all records where the array stored in column cities contains 'Paris' :
SELECT t.* FROM mytable t WHERE 'Paris' = ANY(t.cities);
Yields :
id cities
---------------------------
A ["NY","Paris","Milan"]
B ["Paris","NY"]
Demo on DB Fiddle
For more information :
Postgres Arrays Documentation
Postgres Arrays Tutorial
I've noticed it is better to query JSONB, if it is a simple key-value store.
As in for instance you want to store arbitrary info on a row that your not sure what the columns(keys) would be.
info = {"a":"apple", "b":"ball"}
For use cases like yours, it would be better if you could design the db with simple tables so you could use JOINS and Indexes to your advantage.
You could restructure the tables like :
Location
id | name
----------
1 | Paris
2 | NY
3 | Milan
Other Table (with foreign key on location table)
user | location_id
--------------------
A | 1
A | 3
B | 2
Using these set of tables it would be easy to query all users with location paris using JOINS.

Creating a list tree with SQLite

I'm trying to make a hierarchical list with PHP and an SQLite table setup like this:
| itemid | parentid | name |
-----------------------------------------
| 1 | null | Item1 |
| 2 | null | Item2 |
| 3 | 1 | Item3 |
| 4 | 1 | Item4 |
| 5 | 2 | Item5 |
| 6 | 5 | Item6 |
The lists would be built with unordered lists and allow for this type of tree structure:
Item1
|_Item3
|_Item4
Item2
|_Item5
|_Item6
I've seen this done with directories and flat arrays, but I can't seem to make it work right with this structure and without a depth limit.
You're using a textbook design for storing hierarchical data in an SQL database. This design is called Adjacency List, i.e. each node in the hierarchy has a parentid foreign key to its immediate parent.
With this design, you can't generate a tree like you describe and support arbitrary depth for the tree. You've already figured this out.
Most other SQL databases (PostgreSQL, Microsoft, Oracle, IBM DB2) support recursive queries, which solve this problem.
Update:
MySQL supports recursive CTE queries from version 8.0.1 (2017-04-10). See https://dev.mysql.com/doc/refman/8.0/en/with.html
SQLite supports recursive CTE queries from version 3.34.0 (2020-12-01). See https://www.sqlite.org/lang_with.html
If you use an older version, you need another solution to store the hierarchy. There are several solutions for this. See my presentation Models for Hierarchical Data with PHP and MySQL for descriptions and examples.
I usually prefer a design I call Closure Table, but each design has strength and weaknesses. Which one is best for your project depends on what kinds of queries you need to do efficiently with your data. So you should go study the solutions and choose one for yourself.
I know this was asked log time ago, but with current SQLite version it is trivial to do and no need level depth as #Bill-Karwin says. So the correct answer should be reconsidered :)
My table has columns MCTMPLID and REF_TMPLID and my structure starting node is called ROOT
CREATE TABLE MyStruct (
`TMPLID` text,
`REF_TMPLID` text
);
INSERT INTO MyStruct
(`TMPLID`, `REF_TMPLID`)
VALUES
('Root', NULL),
('Item1', 'Root'),
('Item2', 'Root'),
('Item3', 'Item1'),
('Item4', 'Item1'),
('Item5', 'Item2'),
('Item6', 'Item5');
And here is the main query, that builds tree structure
WITH RECURSIVE
under_root(name,level) AS (
VALUES('Root',0)
UNION ALL
SELECT tmpl.TMPLID, under_root.level+1
FROM MyStruct as tmpl JOIN under_root ON tmpl.REF_TMPLID=under_root.name
ORDER BY 2 DESC
)
SELECT substr('....................',1,level*3) || name as TreeStructure FROM under_root
And here is result
Root
...Item1
......Item3
......Item4
...Item2
......Item5
.........Item6
I'm sure this can be modified to work tik OP's table structure, so let this sample be starting point
Documentation and some samples https://www.sqlite.org/lang_with.html#rcex1

Recursion & MYSQL?

I got a really simple table structure like this:
Table Page Hits
id | title | parent | hits
---------------------------
1 | Root | | 23
2 | Child | 1 | 20
3 | ChildX | 1 | 30
4 | GChild | 2 | 40
As I don't want to have the recursion in my code I would like to do a recurisive SQL.
Is there any SELECT statement to get the sum of Root (23+20+30+40) or Child ( 20 + 40 ) ?
You are organizing your hierarchical data using the adjacency list model. The fact that such recursive operations are difficult is in fact one major drawback of this model.
Some DBMSes, such as SQL Server 2005, Postgres 8.4 and Oracle 11g, support recursive queries using common table expressions with the WITH keyword.
As for MySQL, you may be interested in checking out the following article which describes an alternative model (the nested set model), which makes recursive operations easier (possible):
Mike Hillyer: Managing Hierarchical Data in MySQL
Not in 1 select statment, no.
If you knew the maximum depth of the relationshop (ie parent->child->child or parent->child->child->child) you could write a select statement which would give you a bunch of numbers that you would then have to sum up seperately (1 number per level of depth).
You could, however, do it with a mysql stored procedure which is recursive.

Creating new table from data of other tables

I'm very new to SQL and I hope someone can help me with some SQL syntax. I have a database with these tables and fields,
DATA: data_id, person_id, attribute_id, date, value
PERSONS: person_id, parent_id, name
ATTRIBUTES: attribute_id, attribute_type
attribute_type can be "Height" or "Weight"
Question 1
Give a person's "Name", I would like to return a table of "Weight" measurements for each children. Ie: if John has 3 children names Alice, Bob and Carol, then I want a table like this
| date | Alice | Bob | Carol |
I know how to get a long list of children's weights like this:
select d.date,
d.value
from data d,
persons child,
persons parent,
attributes a
where parent.name='John'
and child.parent_id = parent.person_id
and d.attribute_id = a.attribute_id
and a.attribute_type = "Weight';
but I don't know how to create a new table that looks like:
| date | Child 1 name | Child 2 name | ... | Child N name |
Question 2
Also, I would like to select the attributes to be between a certain range.
Question 3
What happens if the dates are not consistent across the children? For example, suppose Alice is 3 years older than Bob, then there's no data for Bob during the first 3 years of Alice's life. How does the database handle this if we request all the data?
1) It might not be so easy. MS SQL Server can PIVOT a table on an axis, but dumping the resultset to an array and sorting there (assuming this is tied to some sort of program) might be the simpler way right now if you're new to SQL.
If you can manage to do it in SQL it still won't be enough info to create a new table, just return the data you'd use to fill it in, so some sort of external manipulation will probably be required. But you can probably just use INSERT INTO [new table] SELECT [...] to fill that new table from your select query, at least.
2) You can join on attributes for each unique attribute:
SELECT [...] FROM data AS d
JOIN persons AS p ON d.person_id = p.person_id
JOIN attributes AS weight ON p.attribute_id = weight.attribute_id
HAVING weight.attribute_type = 'Weight'
JOIN attributes AS height ON p.attribute_id = height.attribute_id
HAVING height.attribute_type = 'Height'
[...]
(The way you're joining in the original query is just shorthand for [INNER] JOIN .. ON, same thing except you'll need the HAVING clause in there)
3) It depends on the type of JOIN you use to match parent/child relationships, and any dates you're filtering on in the WHERE, if I'm reading that right (entirely possible I'm not). I'm not sure quite what you're looking for, or what kind of database you're using, so no good answer. If you're new enough to SQL that you don't know the different kinds of JOINs and what they can do, it's very worthwhile to learn them - they put the R in RDBMS.
when you do a select, you need to specify the exact columns you want. In other words you can't return the Nth child's name. Ie this isn't possible:
1/2/2010 | Child_1_name | Child_2_name | Child_3_name
1/3/2010 | Child_1_name
1/4/2010 | Child_1_name | Child_2_name
Each record needs to have the same amount of columns. So you might be able to make a select that does this:
1/2/2010 | Child_1_name
1/2/2010 | Child_2_name
1/2/2010 | Child_3_name
1/3/2010 | Child_1_name
1/4/2010 | Child_1_name
1/4/2010 | Child_2_name
And then in a report remap it to how you want it displayed

cloning hierarchical data

let's assume i have a self referencing hierarchical table build the classical way like this one:
CREATE TABLE test
(name text,id serial primary key,parent_id integer
references test);
insert into test (name,id,parent_id) values
('root1',1,NULL),('root2',2,NULL),('root1sub1',3,1),('root1sub2',4,1),('root
2sub1',5,2),('root2sub2',6,2);
testdb=# select * from test;
name | id | parent_id
-----------+----+-----------
root1 | 1 |
root2 | 2 |
root1sub1 | 3 | 1
root1sub2 | 4 | 1
root2sub1 | 5 | 2
root2sub2 | 6 | 2
What i need now is a function (preferrably in plain sql) that would take the id of a test record and
clone all attached records (including the given one). The cloned records need to have new ids of course. The desired result
would like this for example:
Select * from cloningfunction(2);
name | id | parent_id
-----------+----+-----------
root2 | 7 |
root2sub1 | 8 | 7
root2sub2 | 9 | 7
Any pointers? Im using PostgreSQL 8.3.
Pulling this result in recursively is tricky (although possible). However, it's typically not very efficient and there is a much better way to solve this problem.
Basically, you augment the table with an extra column which traces the tree to the top - I'll call it the "Upchain". It's just a long string that looks something like this:
name | id | parent_id | upchain
root1 | 1 | NULL | 1:
root2 | 2 | NULL | 2:
root1sub1 | 3 | 1 | 1:3:
root1sub2 | 4 | 1 | 1:4:
root2sub1 | 5 | 2 | 2:5:
root2sub2 | 6 | 2 | 2:6:
root1sub1sub1 | 7 | 3 | 1:3:7:
It's very easy to keep this field updated by using a trigger on the table. (Apologies for terminology but I have always done this with SQL Server). Every time you add or delete a record, or update the parent_id field, you just need to update the upchain field on that part of the tree. That's a trivial job because you just take the upchain of the parent record and append the id of the current record. All child records are easily identified using LIKE to check for records with the starting string in their upchain.
What you're doing effectively is trading a bit of extra write activity for a big saving when you come to read the data.
When you want to select a complete branch in the tree it's trivial. Suppose you want the branch under node 1. Node 1 has an upchain '1:' so you know that any node in the branch of the tree under that node must have an upchain starting '1:...'. So you just do this:
SELECT *
FROM table
WHERE upchain LIKE '1:%'
This is extremely fast (index the upchain field of course). As a bonus it also makes a lot of activities extremely simple, such as finding partial trees, level within the tree, etc.
I've used this in applications that track large employee reporting hierarchies but you can use it for pretty much any tree structure (parts breakdown, etc.)
Notes (for anyone who's interested):
I haven't given a step-by-step of the SQL code but once you get the principle, it's pretty simple to implement. I'm not a great programmer so I'm speaking from experience.
If you already have data in the table you need to do a one time update to get the upchains synchronised initially. Again, this isn't difficult as the code is very similar to the UPDATE code in the triggers.
This technique is also a good way to identify circular references which can otherwise be tricky to spot.
The Joe Celko's method which is similar to the njreed's answer but is more generic can be found here:
Nested-Set Model of Trees (at the middle of the article)
Nested-Set Model of Trees, part 2
Trees in SQL -- Part III
#Maximilian: You are right, we forgot your actual requirement. How about a recursive stored procedure? I am not sure if this is possible in PostgreSQL, but here is a working SQL Server version:
CREATE PROCEDURE CloneNode
#to_clone_id int, #parent_id int
AS
SET NOCOUNT ON
DECLARE #new_node_id int, #child_id int
INSERT INTO test (name, parent_id)
SELECT name, #parent_id FROM test WHERE id = #to_clone_id
SET #new_node_id = ##IDENTITY
DECLARE #children_cursor CURSOR
SET #children_cursor = CURSOR FOR
SELECT id FROM test WHERE parent_id = #to_clone_id
OPEN #children_cursor
FETCH NEXT FROM #children_cursor INTO #child_id
WHILE ##FETCH_STATUS = 0
BEGIN
EXECUTE CloneNode #child_id, #new_node_id
FETCH NEXT FROM #children_cursor INTO #child_id
END
CLOSE #children_cursor
DEALLOCATE #children_cursor
Your example is accomplished by EXECUTE CloneNode 2, null (the second parameter is the new parent node).
This sounds like an exercise from "SQL For Smarties" by Joe Celko...
I don't have my copy handy, but I think it's a book that'll help you quite a bit if this is the kind of problems you need to solve.