Summarizing leaf node values in a tree using SQL?


Given a column containing a set of strings that represent leaf nodes in a tree, along with some statistics:
leafnodes  count
---------  -----
/a/b       1
/a/c       3
/d/e/f     2
/d/e/c     5
How can I generate the set of non-leaf nodes with summarized statistics? It would be nice to summarize both the immediate children and, recursively, all descendants.
non-leafnodes  immediate-counts  recursive-counts
-------------  ----------------  ----------------
/a             4                 4
/d             0                 7
/d/e           7                 7
Generic SQL preferred, but Oracle-specific solutions are fine.

There is no generic SQL solution, short of adding precalculated fields to the table. For Oracle you can use hierarchical queries, but it would be better to change the structure anyway, as otherwise you will have to struggle with substrings.
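Where recursive common table expressions are available, the substring work can at least be kept inside one query. The following is only a sketch, run through Python's sqlite3 module so it can be tried without a server; the table name leaves, the columns path and cnt, and the CTE name walk are assumptions made for illustration and do not come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leaves (path TEXT PRIMARY KEY, cnt INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO leaves VALUES (?, ?)",
    [("/a/b", 1), ("/a/c", 3), ("/d/e/f", 2), ("/d/e/c", 5)],
)

# walk() peels one path segment at a time off the front of `rest` and
# appends it to `node`, so every proper ancestor of each leaf appears once.
SUMMARY_SQL = """
WITH RECURSIVE walk(node, rest, cnt) AS (
    SELECT '', path, cnt FROM leaves
    UNION ALL
    SELECT node || substr(rest, 1, instr(substr(rest, 2), '/')),
           substr(rest, instr(substr(rest, 2), '/') + 1),
           cnt
    FROM walk
    WHERE instr(substr(rest, 2), '/') > 0
)
SELECT node,
       -- rest holds a single segment when the leaf is a direct child of node
       SUM(CASE WHEN instr(substr(rest, 2), '/') = 0 THEN cnt ELSE 0 END)
           AS immediate_count,
       SUM(cnt) AS recursive_count
FROM walk
WHERE node <> ''
GROUP BY node
ORDER BY node
"""

summary = conn.execute(SUMMARY_SQL).fetchall()
for row in summary:
    print(row)
```

On the sample data this yields the three rows asked for in the question (/a, /d and /d/e). The string functions are SQLite's; other databases would need their own equivalents of substr and instr.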

Related

efficient breadth first search using sql joins

I'm dealing with a binary tree.
So I have a database table in my database where each node is a parent to up to 2 other nodes. I have a plan to efficiently find the top most node (under a given node) that is a parent to less than 2 other nodes. I'm looking for the top most open position to place a new node in other words. So I have this implemented as a breadth-first search. But the way I'm calling the database for each and every node is inefficient. I'm basically going down the tree, producing a running list of nodes on each level and checking each one if it is a parent to two other nodes.
Here's a diagram:
And here's the code if you'd like to see it:
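# breadth-first search
# (the module name is a placeholder; the original snippet omitted the opening defmodule line)
defmodule NodeSearch do
  import Ecto.Query, only: [from: 2]

  # Returns the id of the first node, in breadth-first order, with fewer than two children.
  def build_and_return_parent_id([{node_id} | tail]) do
    child_list = fetch_children_id(node_id)

    if length(child_list) >= 2 do
      # node is full: queue its children and keep searching (recursion)
      build_and_return_parent_id(tail ++ child_list)
    else
      node_id
    end
  end

  # One db call per node: the children of `id` as {id} tuples, oldest first.
  def fetch_children_id(id) do
    Repo.all(
      from n in Node,
        where: n.parent_id == ^id,
        order_by: [asc: n.inserted_at],
        select: {n.id}
    )
  end
end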
So instead of doing that so inefficiently (one db call per node), I was thinking: how about I produce a list of all the nodes that have fewer than two children, then travel down the tree and, for each level, use one db call to get a list of all the nodes on that level, then simply compare the two lists? If an ID appears in both lists, I've found a node that has an available spot under it.
Here's a diagram:
The problem is I know almost nothing about SQL queries. My guess is that this can be done with some kind of self join on the table.
node_id | parent_id
----------------------
1 | nil
2 | 1
3 | 1
4 | 2
5 | 2
6 | 3
7 | 4
8 | 5
9 | 6
10 | 3
So anyway, I'm sure that if this method works someone has done it before, but I can't seem to find any information on the kinds of SQL queries that would be used to generate the open list or the level list.
Now I suppose the second query is pretty simple: since we have an open list, we can just use a where-in-[list] clause. But the first one is the one I'm struggling with.
If you have anything you can point me to, or any help you can offer, I'd really appreciate it.

You can add depth and child_count columns and create a partial index covering the nodes that still have a free slot:
create index nodes_depth_open_idx on nodes(depth) where child_count < 2;
Then searching should be basically instant with:
select node_id from nodes where child_count < 2 order by depth limit 1;
You should also create triggers to maintain these values. This would slow down insert operations slightly, as each insert would have to read the parent node's depth and update the parent node's child_count.
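As a rough illustration of the trigger-maintained columns, here is a sketch using Python's sqlite3 module, which also supports partial indexes and triggers. The trigger name, the index name, and the node_id tie-break within a level are assumptions for the example; the condition child_count < 2 is used so that nodes with no children also count as open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    node_id     INTEGER PRIMARY KEY,
    parent_id   INTEGER REFERENCES nodes(node_id),
    depth       INTEGER NOT NULL DEFAULT 0,
    child_count INTEGER NOT NULL DEFAULT 0
);

-- Partial index: only nodes that can still accept a child are indexed.
CREATE INDEX nodes_open_depth_idx ON nodes(depth) WHERE child_count < 2;

-- Keep depth and child_count current whenever a child is inserted.
CREATE TRIGGER nodes_after_insert AFTER INSERT ON nodes
WHEN NEW.parent_id IS NOT NULL
BEGIN
    UPDATE nodes
       SET depth = (SELECT depth + 1 FROM nodes WHERE node_id = NEW.parent_id)
     WHERE node_id = NEW.node_id;
    UPDATE nodes
       SET child_count = child_count + 1
     WHERE node_id = NEW.parent_id;
END;
""")

# The sample tree from the question; parents are inserted before children.
tree = [(1, None), (2, 1), (3, 1), (4, 2), (5, 2),
        (6, 3), (7, 4), (8, 5), (9, 6), (10, 3)]
conn.executemany("INSERT INTO nodes (node_id, parent_id) VALUES (?, ?)", tree)

# Topmost open position in the whole tree.
open_node = conn.execute(
    "SELECT node_id FROM nodes WHERE child_count < 2 "
    "ORDER BY depth, node_id LIMIT 1"
).fetchone()[0]
print(open_node)
```

Note that this finds the topmost open node in the whole table. Restricting the search to the subtree under a given node, as the question asks, would need an additional filter that this sketch does not attempt.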

SQL Server 2014 equivalent to mysql's find_in_set()

I'm working with a database that has a locations table such as:
locationID | locationHierarchy
1 | 0
2 | 1
3 | 1,2
4 | 1
5 | 1,4
6 | 1,4,5
which makes a tree like this
1
--2
----3
--4
----5
------6
where locationHierarchy is a CSV string of the locationIDs of all its ancestors (think of a hierarchy tree). This makes it easy to determine the hierarchy when working toward the top of the tree given a starting locationID.
Now I need to write code to start with an ancestor and recursively find all descendants. MySQL has a function called 'find_in_set' which easily parses a CSV string to look for a value. It's nice because I can just say "find in set the value 4", which would give all locations that are descendants of locationID 4 (including 4 itself).
Unfortunately this is being developed on SQL Server 2014, which has no such function. The CSV string is of variable length (virtually unlimited levels allowed) and I need a way to find all descendants of a location.
A lot of what I've found on the internet to mimic the find_in_set function in SQL Server assumes a fixed depth of hierarchy (such as 4 levels maximum), which wouldn't work for me.
Does anyone have a stored procedure or anything that I could integrate into a query? I'd really rather not have to pull all records from this table and use code to individually parse the CSV string.
I would imagine searching the locationHierarchy string for locationID% or %,{locationID},% would work but be pretty slow.

I think you want LIKE, in either database. Something like this:
select l.*
from locations l
where l.locationHierarchy like @LocationHierarchy + ',%';
If you want the original location included, then one method is:
select l.*
from locations l
where l.locationHierarchy + ',' like @LocationHierarchy + ',%';
I should also note that SQL Server has proper support for recursive queries, so it has other options for hierarchies apart from hierarchy trees (which are still a very reasonable solution).

This finally worked for me:
SELECT * FROM locations WHERE locationHierarchy like CONCAT(@param,',%') OR
locationHierarchy like CONCAT('%,',@param,',%') OR
locationHierarchy like CONCAT('%,',@param)
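One common way to get find_in_set-like behaviour with a single LIKE is to wrap both the stored list and the search value in delimiters, so that the first, middle, last, and only positions all match the same pattern. The sketch below demonstrates the idea with Python's sqlite3 module on the sample rows; the helper name descendants_of is made up for the example. In SQL Server the same predicate would use + instead of || for concatenation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locations (locationID INTEGER PRIMARY KEY, locationHierarchy TEXT)"
)
conn.executemany(
    "INSERT INTO locations VALUES (?, ?)",
    [(1, "0"), (2, "1"), (3, "1,2"), (4, "1"), (5, "1,4"), (6, "1,4,5")],
)


def descendants_of(conn, location_id):
    """Return ids of all rows whose ancestor list contains location_id."""
    rows = conn.execute(
        """
        SELECT locationID
        FROM locations
        -- ',1,4,5,' LIKE '%,4,%' matches; ',1,14,' LIKE '%,4,%' does not
        WHERE ',' || locationHierarchy || ',' LIKE '%,' || ? || ',%'
        ORDER BY locationID
        """,
        (str(location_id),),
    ).fetchall()
    return [row[0] for row in rows]


print(descendants_of(conn, 4))
print(descendants_of(conn, 1))
```

As the question anticipates, a pattern with a leading wildcard cannot use an ordinary index, so this scans the table.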

Locating all reachable nodes using SQL

Suppose a table with two columns: From and To. Example:
From To
1 2
2 3
2 4
4 5
I would like to know the most effective way to locate all nodes that are reachable from a node using a SQL Query. Example: given 1 it would return 2,3,4 and 5. It is possible to use several queries united by UNION clauses but it would limit the number of levels that can be reached. Perhaps a different data structure would make the problem more tractable but this is what is available.
I am using Firebird, but I would like to have a solution that only uses standard SQL.

You can use a recursive common table expression in most brands of database. Older versions of MySQL (before 8.0) and SQLite (before 3.8.3) lack them, as do a few other obscure products (sorry, I do consider Firebird obscure). This syntax is ANSI SQL standard, but Firebird doesn't support it yet.
Correction: Firebird 2.1 does support recursive CTEs, as @Hugues Van Landeghem comments.
Otherwise see my presentation Models for Hierarchical Data with SQL for several different approaches.
For example, you could store additional rows for every path in your tree, not just the immediate parent/child paths. I call this design Closure Table.
From  To  Length
1     1   0
1     2   1
1     3   2
1     4   2
1     5   3
2     2   0
2     3   1
2     4   1
2     5   2
3     3   0
4     4   0
4     5   1
5     5   0
Now you can query SELECT * FROM MyTable WHERE From = 1 and get all the descendants of that node.
PS: I'd avoid naming a column From, because that's an SQL reserved word.
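To make the closure table idea concrete, here is a sketch that derives such a table from the four edges in the question and then queries it, using Python's sqlite3 module. The names edges, closure, src, dst, ancestor, and descendant are assumptions chosen to avoid the reserved word From; the derivation assumes the graph has no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL);
INSERT INTO edges VALUES (1, 2), (2, 3), (2, 4), (4, 5);

CREATE TABLE closure (
    ancestor   INTEGER NOT NULL,
    descendant INTEGER NOT NULL,
    length     INTEGER NOT NULL,
    PRIMARY KEY (ancestor, descendant)
);

-- Seed every node with a zero-length path to itself, then extend each
-- known path by one edge until nothing new appears (assumes no cycles).
WITH RECURSIVE paths(ancestor, descendant, length) AS (
    SELECT n, n, 0
    FROM (SELECT src AS n FROM edges UNION SELECT dst FROM edges)
    UNION ALL
    SELECT p.ancestor, e.dst, p.length + 1
    FROM paths p JOIN edges e ON e.src = p.descendant
)
INSERT INTO closure SELECT ancestor, descendant, length FROM paths;
""")

closure_rows = conn.execute(
    "SELECT ancestor, descendant, length FROM closure "
    "ORDER BY ancestor, descendant"
).fetchall()

# Everything reachable from node 1, excluding node 1 itself.
descendants_of_1 = [
    d for (d,) in conn.execute(
        "SELECT descendant FROM closure "
        "WHERE ancestor = ? AND length > 0 ORDER BY descendant",
        (1,),
    )
]
print(descendants_of_1)
```

In a real schema the closure rows would more likely be maintained incrementally as edges are added, rather than rebuilt from scratch as shown here.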

Unfortunately there isn't a good generic solution to this that will work for all situations on all databases.
I recommend that you look at these resources for a MySQL solution:
Managing Hierarchical Data in MySQL
Models for hierarchical data - presentation by Bill Karwin which discusses this subject, demonstrates different solutions, and compares the adjacency list model you are using with other alternative models.
For PostgreSQL and SQL Server you should take a look at recursive CTEs.
If you are using Oracle you should look at CONNECT BY which is a proprietary extension to SQL that makes dealing with tree structures much easier.
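The recursive CTE route mentioned in these answers could look roughly like the following for the two-column table in the question. It is a sketch run through Python's sqlite3 module (SQLite has supported WITH RECURSIVE since version 3.8.3); the table name edges, the column names src and dst, and the function name reachable are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)", [(1, 2), (2, 3), (2, 4), (4, 5)]
)


def reachable(conn, start):
    """Return every node reachable from `start`, at any depth."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT dst FROM edges WHERE src = ?
            UNION              -- UNION (not UNION ALL) drops repeats,
                               -- so the walk also terminates on cycles
            SELECT e.dst
            FROM edges e JOIN reach r ON e.src = r.node
        )
        SELECT node FROM reach ORDER BY node
        """,
        (start,),
    ).fetchall()
    return [node for (node,) in rows]


print(reachable(conn, 1))
```

Unlike a chain of hand-written UNION queries, the recursion follows the data for as many levels as exist.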

With standard SQL the only way to store a tree with acceptable read performance is by using a hack such as path enumeration. Note that this is very heavy on writes.
ID PATH
1 1
2 1;2
3 1;2;3
4 1;2;4
SELECT * FROM tree WHERE path LIKE '%2;%'
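One thing to watch with path enumeration is that a pattern such as '%2;%' also matches ids that merely end in 2 (12, 22, and so on). A variation, sketched below with Python's sqlite3 module, looks up the node's own path first and then matches it as a prefix. The extra rows for ids 12 and 13 and the function name descendants are hypothetical additions for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, path TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO tree VALUES (?, ?)",
    [(1, "1"), (2, "1;2"), (3, "1;2;3"), (4, "1;2;4"),
     (12, "1;12"), (13, "1;12;13")],  # hypothetical rows to show the pitfall
)


def descendants(conn, node_id):
    """Ids of all nodes below node_id, found by prefix match on its path."""
    (path,) = conn.execute(
        "SELECT path FROM tree WHERE id = ?", (node_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT id FROM tree WHERE path LIKE ? ORDER BY id", (path + ";%",)
    ).fetchall()
    return [row[0] for row in rows]


# The unanchored pattern picks up node 13, whose parent is 12, not 2.
naive = [row[0] for row in conn.execute(
    "SELECT id FROM tree WHERE path LIKE '%2;%' ORDER BY id"
)]

print(descendants(conn, 2), naive)
```

A prefix pattern without a leading wildcard also gives the database a chance to use an index on path, depending on the product and collation.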

Storing parameterized definitions of sets of elements and single pass queries to fetch them in SQL

Suppose a database table containing properties of some elements:
Table Element (let's say 1 000 000 rows):
ElementId Property_1 Property_2 Property_3
------- ---------- ---------- ----------
1 abc 1 1
2 bcd 1 2
3 def 2 4
...
The table is frequently updated. I'd like to store definitions of sets of these elements so that, using a single SQL statement, I would get e.g.
SetId Element
--- -------
A 2
B 1
B 3
C 2
C 3
...
I'd also like to change the definitions when needed. So far I have stored the definitions of the sets as unions of intersections like this:
Table Subset (~1 000 rows):
SubsetId Property Value Operator
-------- -------- ----- --------
1 1 bcd =
1 3 1 >
2 2 3 <=
...
and
Table Set (~300 rows):
SetId SubsetId
--- ------
...
E 3
E 4
F 7
F 9
...
In SQL I suppose I could generate lots of case expressions from the tables, but so far I've just loaded the tables and used an external tool to do essentially the same thing.
When I came up with this I was pleased (and also implemented it). Lately I've been wondering whether it is as wonderful as I thought. Is there a better way to store the definitions of the sets?
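For what it's worth, generating the SQL from the two definition tables might look something like the sketch below: each subset becomes a conjunction of predicates, and each set becomes a UNION of its subsets. It uses Python's sqlite3 module; the function name build_set_query, the operator whitelist, and the contents of set_rows are assumptions for the example, since the question elides the real set definitions.

```python
import sqlite3

ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}


def build_set_query(subset_rows, set_rows):
    """subset_rows: (SubsetId, Property, Value, Operator); set_rows: (SetId, SubsetId)."""
    predicates = {}  # SubsetId -> SQL predicates, ANDed together
    values = {}      # SubsetId -> bound values, in the same order
    for subset_id, prop, value, op in subset_rows:
        if op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {op!r}")
        predicates.setdefault(subset_id, []).append(f"Property_{int(prop)} {op} ?")
        values.setdefault(subset_id, []).append(value)

    branches, bound = [], []
    for set_id, subset_id in set_rows:
        branches.append(
            "SELECT ? AS SetId, ElementId FROM Element WHERE "
            + " AND ".join(predicates[subset_id])
        )
        bound.extend([set_id, *values[subset_id]])
    # UNION removes duplicates when two subsets of one set overlap.
    return " UNION ".join(branches) + " ORDER BY SetId, ElementId", bound


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Element (ElementId INTEGER PRIMARY KEY, "
    "Property_1 TEXT, Property_2 INTEGER, Property_3 INTEGER)"
)
conn.executemany(
    "INSERT INTO Element VALUES (?, ?, ?, ?)",
    [(1, "abc", 1, 1), (2, "bcd", 1, 2), (3, "def", 2, 4)],
)

subset_rows = [(1, 1, "bcd", "="), (1, 3, 1, ">"), (2, 2, 3, "<=")]
set_rows = [("A", 1), ("B", 2)]  # hypothetical set definitions

sql, bound = build_set_query(subset_rows, set_rows)
members = conn.execute(sql, bound).fetchall()
print(members)
```

Values are passed as bound parameters and operators are checked against a whitelist, so the definition tables cannot inject arbitrary SQL. Whether one large UNION performs acceptably over a million rows is something only measurement on the real data can answer.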

I would think using duck-typing may be intuitive here, as an alternative.
For example, all modern languages (C#, Java, Python) have the concept of sets. If you are going to "intersect" or "union" (set operators) via SQL, then you have to store them in a relational way. Otherwise, why not store them in a language-native way (as opposed to relational)? By native way, I mean that if it was done in Python and we used a Python set, then that is what I would persist. Same with Java or C#.
So if set-id 10 had the members 1,4,5,6, it would be persisted in the DB as follows:
SetId Set
______________________________________
10 1,4,5,6
11 2,3
12 null
Sure, this has the disadvantage that it could be proprietary, or maybe even non-performant, which you can perhaps judge since you have the complete problem definition. If you need SQL to analyze it, my suggestion may have further downsides.
In a sense, the set representation feature of each of these languages is like a DSL (domain-specific language): if you need to 'talk' a lot of set-stuff between your application classes / objects, then why not use the natural fit?

custom sorting or ordering a table without resorting the whole shebang

For ten years we've been using the same custom sorting on our tables. I'm wondering if there is another solution which involves fewer updates, especially since today we'd like to have a replication/publication date and wouldn't like our replication to replicate unnecessary entries. I had a look into nested sets, but they don't seem to do the job for us.
Base table:
id | a_sort
---+-------
1 10
2 20
3 30
After inserting:
insert into table (a_sort) values(15)
An entry at the second position.
id | a_sort
---+-------
1 10
2 20
3 30
4 15
Ordering the table with:
select * from table order by a_sort
and resorting all the a_sort entries, updating at least id=(2,3,4)
will of course produce the desired output:
id | a_sort
---+-------
1 10
4 20
2 30
3 40
The column names, the column count, datatypes, a possible join, possible triggers, and the way the resorting is done are irrelevant to the problem. Also, we've found some pretty neat ways to do this task fast.
The only question is: how the heck can we reduce the updates in the db to 1 or 2 max?
Seems like an awfully common problem.
The captain obvious in me once thought "use an a_sort float(53), insert using a fixed value of ordervaluefirstentry+abs(ordervaluefirstentry-ordervaluenextentry)/2".
But this would only allow around 1040 "in between" entries, so never resorting seems a bit problematic ;)
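How quickly the midpoint scheme runs out of room can be checked with a few lines of Python, whose float is the same 64-bit double as float(53). The function name count_midpoint_inserts is made up for this sketch, and it models the worst case where every new entry lands directly after the same row.

```python
def count_midpoint_inserts(lo: float, hi: float) -> int:
    """Count how many keys can be squeezed in directly after `lo`."""
    count = 0
    while True:
        mid = lo + abs(lo - hi) / 2  # the formula from the text
        if mid <= lo or mid >= hi:   # no representable value left in between
            return count
        hi = mid                     # worst case: next insert hits the same gap
        count += 1


between_10_and_20 = count_midpoint_inserts(10.0, 20.0)
next_to_zero = count_midpoint_inserts(0.0, 10.0)
print(between_10_and_20, next_to_zero)
```

The head-room depends heavily on the magnitude of the keys: between 10 and 20 only around fifty halvings are possible before the mantissa is exhausted, while right next to zero the exponent range allows about a thousand.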

You really didn't describe what you're doing with this data, so forgive me if this is a crazy idea for your situation:
You could make a sort of 'linked list' where instead of a column of values, you have a column for the 'next highest valued' id. This would decrease the number of updates to a maximum of 2.
You can make it doubly linked and also have a column for next lowest, which would bring the maximum number of updates to 3.
See:
http://en.wikipedia.org/wiki/Linked_list
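A singly linked version of this idea might look like the sketch below, using Python's sqlite3 module. The table name items, the column next_id, and both function names are assumptions for illustration. Inserting touches two rows (the new one and its predecessor); reading the rows back in order uses a recursive CTE to follow the links.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, "
    "next_id INTEGER REFERENCES items(id))"
)
# Initial order 1 -> 2 -> 3; the last row points at nothing.
conn.executemany(
    "INSERT INTO items (id, next_id) VALUES (?, ?)",
    [(1, 2), (2, 3), (3, None)],
)


def insert_after(conn, prev_id):
    """Insert a new row directly after prev_id: one INSERT plus one UPDATE."""
    with conn:  # commit both statements together, or neither
        (old_next,) = conn.execute(
            "SELECT next_id FROM items WHERE id = ?", (prev_id,)
        ).fetchone()
        new_id = conn.execute(
            "INSERT INTO items (next_id) VALUES (?)", (old_next,)
        ).lastrowid
        conn.execute(
            "UPDATE items SET next_id = ? WHERE id = ?", (new_id, prev_id)
        )
    return new_id


def ordered_ids(conn):
    """Follow the links from the head (the row nobody points at)."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, next_id, pos) AS (
            SELECT id, next_id, 0 FROM items
            WHERE id NOT IN (SELECT next_id FROM items WHERE next_id IS NOT NULL)
            UNION ALL
            SELECT i.id, i.next_id, c.pos + 1
            FROM items i JOIN chain c ON i.id = c.next_id
        )
        SELECT id FROM chain ORDER BY pos
        """
    ).fetchall()
    return [row[0] for row in rows]


new_id = insert_after(conn, 1)
print(ordered_ids(conn))
```

The trade-off is on the read side: ORDER BY on a plain column is no longer possible, so every ordered read has to walk the chain.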
