Locating all reachable nodes using SQL - sql

Suppose a table with two columns: From and To. Example:
From To
1 2
2 3
2 4
4 5
I would like to know the most effective way to locate all nodes that are reachable from a node using a SQL Query. Example: given 1 it would return 2,3,4 and 5. It is possible to use several queries united by UNION clauses but it would limit the number of levels that can be reached. Perhaps a different data structure would make the problem more tractable but this is what is available.
I am using Firebird but I would like to have a solution that only uses standard SQL.

You can use a recursive common table expression (CTE) in most brands of database. When this answer was first written, MySQL and SQLite were the notable exceptions; MySQL 8.0 and SQLite 3.8.3 have since added support. This syntax is ANSI SQL standard, and Firebird supports it as of version 2.1, as @Hugues Van Landeghem comments.
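As a rough illustration of the recursive CTE approach, here is a minimal sketch that runs the query through Python's built-in sqlite3 module. The table name edges and the column names from_node and to_node are assumptions (chosen to avoid the reserved words From and To); the WITH RECURSIVE statement itself should carry over to Firebird 2.1+, PostgreSQL and others with little or no change.

```python
import sqlite3

def reachable(conn, start):
    """Return every node reachable from `start`, sorted ascending."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            -- anchor: direct successors of the start node
            SELECT to_node FROM edges WHERE from_node = ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            -- recursive step: successors of nodes found so far
            SELECT e.to_node FROM edges e JOIN reach r ON e.from_node = r.node
        )
        SELECT node FROM reach ORDER BY node
        """,
        (start,),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (from_node INTEGER, to_node INTEGER)")
conn.executemany("INSERT INTO edges VALUES (?, ?)", [(1, 2), (2, 3), (2, 4), (4, 5)])

print(reachable(conn, 1))  # [2, 3, 4, 5]
```

There is no fixed limit on the number of levels here, which is the limitation of the chained UNION approach mentioned in the question.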
Otherwise see my presentation Models for Hierarchical Data with SQL for several different approaches.
For example, you could store additional rows for every path in your tree, not just the immediate parent/child paths. I call this design Closure Table.
From To Length
1 1 0
1 2 1
1 3 2
1 4 2
1 5 3
2 2 0
2 3 1
2 4 1
2 5 2
3 3 0
4 4 0
4 5 1
5 5 0
Now you can query SELECT * FROM MyTable WHERE From = 1 and get all the descendants of that node.
PS: I'd avoid naming a column From, because that's an SQL reserved word.
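A closure table like this has to be populated somehow. One possible sketch, assuming a database with recursive CTEs, derives it from the parent/child pairs in a single statement; it is shown here with Python's sqlite3 module. The names edges, closure, ancestor, descendant and length are assumptions for illustration, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE edges (ancestor INTEGER, descendant INTEGER);
    INSERT INTO edges VALUES (1, 2), (2, 3), (2, 4), (4, 5);
    CREATE TABLE closure (ancestor INTEGER, descendant INTEGER, length INTEGER);
    """
)

# Every node gets a zero-length path to itself; longer paths are built by
# extending an existing path with one more parent/child edge.
conn.execute(
    """
    WITH RECURSIVE paths(ancestor, descendant, length) AS (
        SELECT n, n, 0
        FROM (SELECT ancestor AS n FROM edges UNION SELECT descendant FROM edges)
        UNION ALL
        SELECT p.ancestor, e.descendant, p.length + 1
        FROM paths p JOIN edges e ON e.ancestor = p.descendant
    )
    INSERT INTO closure SELECT ancestor, descendant, length FROM paths
    """
)

def descendants(conn, node):
    """All proper descendants of `node` (length > 0 skips the self row)."""
    rows = conn.execute(
        "SELECT descendant FROM closure WHERE ancestor = ? AND length > 0 "
        "ORDER BY descendant",
        (node,),
    )
    return [row[0] for row in rows]

print(descendants(conn, 1))  # [2, 3, 4, 5]
```

The read side is a plain indexed lookup with no recursion; the cost moves to writes, since inserting or moving a node means maintaining several closure rows.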

Unfortunately there isn't a good generic solution to this that will work for all situations on all databases.
I recommend that you look at these resources for a MySQL solution:
Managing Hierarchical Data in MySQL
Models for hierarchical data - presentation by Bill Karwin which discusses this subject, demonstrates different solutions, and compares the adjacency list model you are using with other alternative models.
For PostgreSQL and SQL Server you should take a look at recursive CTEs.
If you are using Oracle you should look at CONNECT BY which is a proprietary extension to SQL that makes dealing with tree structures much easier.

With standard SQL the only way to store a tree with acceptable read performance is by using a hack such as path enumeration. Note that this is very heavy on writes.
ID PATH
1 1
2 1;2
3 1;2;3
4 1;2;4
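SELECT * FROM tree WHERE path LIKE '1;2;%' -- descendants of node 2; anchoring the pattern at the start avoids false matches such as '1;12;...'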
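To make the path enumeration lookup concrete, here is a small sketch using Python's sqlite3 module. It reads the node's own path first and then matches on that path as a prefix. The helper name descendants and the extra row with ID 12 are assumptions, added only to show that a prefix match does not confuse node 2 with node 12.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, path TEXT)")
conn.executemany(
    "INSERT INTO tree VALUES (?, ?)",
    [
        (1, "1"),
        (2, "1;2"),
        (3, "1;2;3"),
        (4, "1;2;4"),
        (12, "1;12"),  # hypothetical extra node, not in the original table
    ],
)

def descendants(conn, node_id):
    """Return ids of all nodes whose path starts with the path of `node_id`."""
    row = conn.execute("SELECT path FROM tree WHERE id = ?", (node_id,)).fetchone()
    if row is None:
        return []
    # The trailing ';' keeps '1;2' from matching '1;20' and similar siblings.
    rows = conn.execute(
        "SELECT id FROM tree WHERE path LIKE ? ORDER BY id", (row[0] + ";%",)
    )
    return [r[0] for r in rows]

print(descendants(conn, 2))  # [3, 4]
```

Because the pattern has no leading wildcard, an ordinary index on path can typically be used for the search.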

Related

SQL Server 2014 equivalent to mysql's find_in_set()

I'm working with a database that has a locations table such as:
locationID | locationHierarchy
1 | 0
2 | 1
3 | 1,2
4 | 1
5 | 1,4
6 | 1,4,5
which makes a tree like this
1
--2
----3
--4
----5
------6
where locationHierarchy is a CSV string of the locationIDs of all its ancestors (think of a hierarchy tree). This makes it easy to determine the hierarchy when working toward the top of the tree given a starting locationID.
Now I need to write code to start with an ancestor and recursively find all descendants. MySQL has a function called 'find_in_set' which easily parses a csv string to look for a value. It's nice because I can just say "find in set the value 4" which would give all locations that are descendants of locationID of 4 (including 4 itself).
Unfortunately this is being developed on SQL Server 2014 and it has no such function. The CSV string is of variable length (virtually unlimited levels allowed) and I need a way to find all descendants of a location.
A lot of what I've found on the internet to mimic the find_in_set function in SQL Server assumes a fixed depth of hierarchy (such as 4 levels maximum), which wouldn't work for me.
Does anyone have a stored procedure or anything that I could integrate into a query? I'd really rather not have to pull all records from this table to use code to individually parse the CSV string.
I would imagine searching the locationHierarchy string for locationID% or %,{locationid},% would work but be pretty slow.
I think you want LIKE, which works in either database. Something like this:
select l.*
from locations l
where l.locationHierarchy like @LocationHierarchy + ',%';
If you want the original location included, then one method is:
select l.*
from locations l
where l.locationHierarchy + ',' like @LocationHierarchy + ',%';
I should also note that SQL Server has proper support for recursive queries, so it has other options for hierarchies apart from hierarchy trees (which are still a very reasonable solution).
Finally, this worked for me:
SELECT * FROM locations WHERE locationHierarchy = @param OR
locationHierarchy like CONCAT(@param,',%') OR
locationHierarchy like CONCAT('%,',@param,',%') OR
locationHierarchy like CONCAT('%,',@param)
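The separate LIKE cases above can be collapsed into one by wrapping the CSV string in commas before matching, which is a common way to imitate find_in_set. The sketch below runs that idea against the sample data using Python's sqlite3 module as a stand-in; SQLite concatenates with ||, whereas on SQL Server the same expression would be written with + or CONCAT. The helper name descendants is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locations (locationID INTEGER PRIMARY KEY, locationHierarchy TEXT)"
)
conn.executemany(
    "INSERT INTO locations VALUES (?, ?)",
    [(1, "0"), (2, "1"), (3, "1,2"), (4, "1"), (5, "1,4"), (6, "1,4,5")],
)

def descendants(conn, location_id):
    """Return ids of locations that list `location_id` among their ancestors."""
    # ',1,4,5,' LIKE '%,4,%' matches, while ',1,14,' does not.
    pattern = f"%,{int(location_id)},%"
    rows = conn.execute(
        "SELECT locationID FROM locations "
        "WHERE ',' || locationHierarchy || ',' LIKE ? ORDER BY locationID",
        (pattern,),
    )
    return [r[0] for r in rows]

print(descendants(conn, 4))  # [5, 6]
```

This returns descendants only, not the starting location, and it works at any depth. The leading wildcard still forces a scan of the table, as the question anticipated.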

Summarizing leaf node values in a tree using SQL?

Given a column containing a set of strings that represent leaf nodes in a tree, along with some statistics:
leafnodes count
--------- -----
/a/b 1
/a/c 3
/d/e/f 2
/d/e/c 5
How can I generate the set of non-leaf nodes with summarized statistics? It would be nice to summarize both the immediate children and also recursively summarize all descendants.
non-leafnodes immediate-counts recursive-counts
--- ---------------- ----------------
/a 4 4
/d 0 7
/d/e 7 7
Generic SQL preferred, but Oracle-specific solutions are fine.
There is no generic SQL solution, short of adding precalculated fields to the table. In Oracle you can use hierarchical queries, but it would be better to change the structure anyway, because otherwise you will have to struggle with substrings.
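For what it is worth, on databases that offer recursive CTEs the substring work can be kept inside one query. The following is only a sketch, run through Python's sqlite3 module; the table name leaves, the column names path and cnt, and the CTE name split are assumptions, and substr/instr are SQLite spellings that other dialects name differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leaves (path TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO leaves VALUES (?, ?)",
    [("/a/b", 1), ("/a/c", 3), ("/d/e/f", 2), ("/d/e/c", 5)],
)

SUMMARY = """
WITH RECURSIVE split(cnt, prefix, rest) AS (
    -- start with an empty prefix and the whole path (minus the leading '/')
    SELECT cnt, '', substr(path, 2) || '/' FROM leaves
    UNION ALL
    -- move one path segment from `rest` to `prefix` per step
    SELECT cnt,
           prefix || '/' || substr(rest, 1, instr(rest, '/') - 1),
           substr(rest, instr(rest, '/') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT prefix,
       -- a leaf is an immediate child when exactly one segment remains
       SUM(CASE WHEN instr(rest, '/') = length(rest) THEN cnt ELSE 0 END),
       SUM(cnt)
FROM split
WHERE prefix <> '' AND rest <> ''   -- proper ancestors only
GROUP BY prefix
ORDER BY prefix
"""

summary = conn.execute(SUMMARY).fetchall()
for node, immediate, recursive in summary:
    print(node, immediate, recursive)
```

On the sample data this prints /a 4 4, /d 0 7 and /d/e 7 7, matching the table in the question.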

SQL to retrieve tree structure nicely

Given the simple data structure:
ID | Category_Name | Parent_ID
Example:
1 Cars 0
2 Boxes 0
3 Lamborghinis 1
4 VW Camper Vans 1
5 Big Boxes 2
6 Small Boxes 2
7 Cereal Boxes 2
8 Broken Lambos 3
9 Yellow Ones 3
10 Rusty 8
11 Milkshake Stained 8
12 Chocolate Flavour 11
13 Strawberry 11
14 Indiscernible Solution 11
Representing a simple tree navigation structure, what would programmatically be the best way to retrieve the tree in a presentable format? Can we create an SQL statement to retrieve them 'in order'?
Thanks for any help! If my approach is wrong, feel free to comment also.
I'm using SQL-Server 2000.
If you're using SQL Server 2008 you might want to try out the new hierarchyid data type.
If you're not then another way is to look into the nested sets model which works on all databases.
If you're using SQL Server 2005 and up you can use recursive CTEs to retrieve the tree structure.
I usually build the tree structure in my application code, partly because I'm more confident with C# than SQL, but also because I usually need to process the data into suitable C# structures anyway.
SQL is quite bad at recursive structures like lists and trees. If I had to put the tree building in my database I'd go for a stored procedure. But there might be a smart way I don't know about.
If you use Oracle you might be able to hack something up with Connect By.
Not for SQL2000, but if you manage to upgrade to 2k5, you can do
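WITH t AS (
    -- anchor: the top-level categories, stored with Parent_ID 0 in the sample data
    SELECT id, parent_id, category_name FROM mytable WHERE parent_id = 0
    UNION ALL
    -- recursive step: children of the rows found so far
    SELECT c.id, c.parent_id, c.category_name FROM t p JOIN mytable c ON c.parent_id = p.id
)
SELECT * FROM t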
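A recursive CTE on its own does not guarantee that rows come back 'in order'. One way to get a depth-first listing is to carry a sortable path along in the recursion and order by it. Below is a sketch of that idea using Python's sqlite3 module and the sample data from the question; the columns depth and sort_path are assumptions, and printf is the SQLite function used to zero-pad ids (other databases have their own padding functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, category_name TEXT, parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "Cars", 0), (2, "Boxes", 0), (3, "Lamborghinis", 1),
        (4, "VW Camper Vans", 1), (5, "Big Boxes", 2), (6, "Small Boxes", 2),
        (7, "Cereal Boxes", 2), (8, "Broken Lambos", 3), (9, "Yellow Ones", 3),
        (10, "Rusty", 8), (11, "Milkshake Stained", 8), (12, "Chocolate Flavour", 11),
        (13, "Strawberry", 11), (14, "Indiscernible Solution", 11),
    ],
)

ORDERED_TREE = """
WITH RECURSIVE t(id, category_name, depth, sort_path) AS (
    SELECT id, category_name, 0, printf('%05d', id)
    FROM mytable WHERE parent_id = 0
    UNION ALL
    -- zero-padded ids make the text sort agree with numeric order
    SELECT c.id, c.category_name, p.depth + 1,
           p.sort_path || '/' || printf('%05d', c.id)
    FROM t p JOIN mytable c ON c.parent_id = p.id
)
SELECT id, category_name, depth FROM t ORDER BY sort_path
"""

rows = conn.execute(ORDERED_TREE).fetchall()
for _id, name, depth in rows:
    print("  " * depth + name)  # indent each category by its depth
```

Siblings are ordered by id here; padding the category name into sort_path instead would order them alphabetically.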

MySQL: Getting connected (similar) data with left/right fields

In MySQL I have two tables:
PRODUCTS (id, Name)
SEEALSO (id, prodLeft, prodRight)
SEEALSO defines which PRODUCTS are related to each other; each row binds a pair of products through the fields "prodLeft" and "prodRight".
For Example:
PRODUCTS:
1 Desk
2 Table
3 Chair
4 Doors
5 Tree
6 Flower
SEEALSO
1 1 2
2 2 3
3 3 4
4 5 6
From that we can see the bindings Desk-Table-Chair-Doors and Tree-Flower.
I would now like to write an SQL statement where I specify a PRODUCT name (e.g. Chair) and get the bound products that are connected with it (e.g. Chair: Desk-Table-Chair-Doors).
I would like to know whether this is even possible with the way my data is represented in SEEALSO and, if it is, how to solve my problem.
As you're wondering whether it's even possible, you could look into this information on Nested Sets, which is the MySQL way of doing this (I gather).
I could not give you a worked sample, as I'm no MySQL expert: perhaps this will help you enough given the general nature of your question.
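Since the SEEALSO rows are really undirected links, another option worth considering is a recursive CTE that follows each link in both directions. MySQL supports WITH RECURSIVE from version 8.0; the sketch below uses Python's sqlite3 module as a stand-in so that it can be run as is. The helper name related and the CTE name connected are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE seealso (id INTEGER PRIMARY KEY, prodLeft INTEGER, prodRight INTEGER);
    INSERT INTO products VALUES
        (1, 'Desk'), (2, 'Table'), (3, 'Chair'), (4, 'Doors'), (5, 'Tree'), (6, 'Flower');
    INSERT INTO seealso VALUES (1, 1, 2), (2, 2, 3), (3, 3, 4), (4, 5, 6);
    """
)

def related(conn, product_name):
    """Return the names of all products linked, directly or not, to the given one."""
    rows = conn.execute(
        """
        WITH RECURSIVE connected(id) AS (
            SELECT id FROM products WHERE name = ?
            UNION  -- de-duplicates, so walking links back and forth terminates
            SELECT CASE WHEN s.prodLeft = c.id THEN s.prodRight ELSE s.prodLeft END
            FROM connected c
            JOIN seealso s ON c.id IN (s.prodLeft, s.prodRight)
        )
        SELECT p.name FROM connected c JOIN products p ON p.id = c.id ORDER BY p.id
        """,
        (product_name,),
    )
    return [r[0] for r in rows]

print(related(conn, "Chair"))  # ['Desk', 'Table', 'Chair', 'Doors']
```

The result includes the starting product itself, which matches the Desk-Table-Chair-Doors output asked for in the question.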

Storing parameterized definitions of sets of elements and single pass queries to fetch them in SQL

Suppose a database table containing properties of some elements:
Table Element (let's say 1 000 000 rows):
ElementId Property_1 Property_2 Property_3
------- ---------- ---------- ----------
1 abc 1 1
2 bcd 1 2
3 def 2 4
...
The table is being frequently updated. I'd like to store definitions of sets of these elements so that using a single SQL statement I would get eg.
SetId Element
--- -------
A 2
B 1
B 3
C 2
C 3
...
I'd also like to change the definitions when needed. So far I have stored the definitions of the sets as unions of intersections like this:
Table Subset (~1 000 rows):
SubsetId Property Value Operator
-------- -------- ----- --------
1 1 bcd =
1 3 1 >
2 2 3 <=
...
and
Table Set (~300 rows):
SetId SubsetId
--- ------
...
E 3
E 4
F 7
F 9
...
In SQL I suppose I could generate lots of case expressions from the tables, but so far I've just loaded the tables and used an external tool to do essentially the same thing.
When I came up with this I was pleased (and also implemented it). Lately I've been wondering whether it is as wonderful as I thought. Is there a better way to store the definitions of the sets?
I would think using duck-typing may be intuitive here, as an alternative.
For example, all modern languages (C#, Java, Python) have the concept of sets. If you are going to "intersect" or "union" (set operators) via SQL, then you have to store them in a relational way. Otherwise, why not store them in a language-native way (as opposed to relational)? By native way, I mean that if it was done in Python and we used a Python set, then that is what I would persist. Same with Java or C#.
So if a set-id 10 had the members 1,4,5,6 it would be persisted in the DB as follows:
SetId Set
______________________________________
10 1,4,5,6
11 2,3
12 null
Sure, this has the disadvantage that it could be proprietary, or maybe even non-performant - which you can perhaps tell as you have the complete problem definition. If you need SQL to analyze it, maybe my suggestion has further downsides.
In a sense, the set representation feature of each of these languages are like a DSL (Domain specific Language) - if you will need to 'talk' a lot of set-stuff between your application classes / objects, then why not use the natural fit?
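Staying with the relational design from the question, the "unions of intersections" can also be turned into one generated SQL statement instead of being evaluated in an external tool. The sketch below is one possible way to do that with Python's sqlite3 module. The Subset rows are the ones shown in the question; the rows in the Set table (sets A and X) and the function name build_set_query are hypothetical, since the question elides the real mapping.

```python
import sqlite3

ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}  # whitelist: operators are spliced into SQL

def build_set_query(conn):
    """Build one statement (plus parameters) listing (SetId, ElementId) pairs."""
    conditions = {}  # SubsetId -> list of (sql fragment, bound value), ANDed together
    for subset_id, prop, value, op in conn.execute(
        "SELECT SubsetId, Property, Value, Operator FROM Subset"
    ):
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op!r}")
        conditions.setdefault(subset_id, []).append((f"Property_{int(prop)} {op} ?", value))

    parts, params = [], []
    for set_id, subset_id in conn.execute('SELECT SetId, SubsetId FROM "Set"'):
        conds = conditions.get(subset_id)
        if not conds:
            continue  # a subset without conditions contributes nothing
        where = " AND ".join(fragment for fragment, _ in conds)
        parts.append(f"SELECT ? AS SetId, ElementId FROM Element WHERE {where}")
        params.append(set_id)
        params.extend(value for _, value in conds)
    if not parts:
        raise ValueError("no set definitions found")
    # UNION merges the subsets of each set and removes duplicate pairs.
    return " UNION ".join(parts) + " ORDER BY SetId, ElementId", params

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Element (ElementId INTEGER PRIMARY KEY, Property_1 TEXT,
                          Property_2 INTEGER, Property_3 INTEGER);
    INSERT INTO Element VALUES (1, 'abc', 1, 1), (2, 'bcd', 1, 2), (3, 'def', 2, 4);
    CREATE TABLE Subset (SubsetId INTEGER, Property INTEGER, Value, Operator TEXT);
    INSERT INTO Subset VALUES (1, 1, 'bcd', '='), (1, 3, 1, '>'), (2, 2, 3, '<=');
    CREATE TABLE "Set" (SetId TEXT, SubsetId INTEGER);
    INSERT INTO "Set" VALUES ('A', 1), ('X', 1), ('X', 2);  -- hypothetical mapping
    """
)

sql, params = build_set_query(conn)
members = conn.execute(sql, params).fetchall()
print(members)  # [('A', 2), ('X', 1), ('X', 2), ('X', 3)]
```

Values are bound as parameters and only whitelisted operators are inserted into the SQL text, so changing a definition means editing rows in Subset or Set rather than editing queries.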