Select root node from subtree (PostgreSQL ltree) query, which returns several descendants

Is there a simple way to select the root node of a subtree (PostgreSQL ltree) from a query which returns (potentially) several descendant nodes of that same subtree? I've implemented a rather verbose algorithm for achieving the task (~40 lines, indented and formatted), but it would be awesome if I could leverage the fact that ltree data are in fact trees and have an easily accessible root node. It is important to note that several, distinct subtree roots may be returned from a single query, so I cannot merely sort the data and grab the top result.
June 07, 2012: I have updated the query to my most recent version, which cuts the time complexity in half. It uses a self-anti-join (if you will) to remove all nodes from the subtree which have ancestors in the subtree.
Essentially, my algorithm works as follows:
WITH roots AS
(
/* Place any query here, which returns a field "ancestry" of type ltree */
)
SELECT roots.*
FROM roots
WHERE NOT EXISTS
(
SELECT 1
FROM roots AS ancestors
WHERE ancestors.ancestry @> roots.ancestry
AND ancestors.id <> roots.id
);
(for more details, please see my gist, here: https://gist.github.com/1507368)
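To make the anti-join logic easier to follow outside the database, here is a minimal Python sketch of the same idea. It assumes ltree paths are dot-separated label strings; the helper names (`is_ancestor`, `subtree_roots`) and the sample rows are made up for illustration and are not part of the gist.

```python
def is_ancestor(a, b):
    """True when path a is an ancestor of (or equal to) path b, like ltree's a @> b."""
    a_labels, b_labels = a.split("."), b.split(".")
    return b_labels[:len(a_labels)] == a_labels


def subtree_roots(rows):
    """Keep only rows that have no *other* row as an ancestor (the NOT EXISTS part)."""
    return [
        row for row in rows
        if not any(
            other["id"] != row["id"] and is_ancestor(other["ancestry"], row["ancestry"])
            for other in rows
        )
    ]


# Hypothetical query result containing two distinct subtrees
rows = [
    {"id": 1, "ancestry": "a.b"},
    {"id": 2, "ancestry": "a.b.c"},
    {"id": 3, "ancestry": "a.b.c.d"},
    {"id": 4, "ancestry": "x.y"},
    {"id": 5, "ancestry": "x.y.z"},
]
roots = subtree_roots(rows)
print([r["ancestry"] for r in roots])  # ['a.b', 'x.y']
```

Like the SQL version, this compares every row against every other row, so it is quadratic in the size of the result set.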

Can't you just use the subpath() function?
SELECT
SUBPATH(ancestry, 0, 1)
FROM
some_table;

Related

OrientDB Time Search Query

In OrientDB I have set up a time series using this use case. However, instead of appending my Vertex as an embedded list to the respective hour, I have opted to just create an edge from the hour to the time-dependent Vertex.
For argument's sake, let's say that each hour has up to 60 time Vertexes, each identified by a timestamp. This means I can perform the following query to obtain a specific desired Vertex:
SELECT FROM ( SELECT expand( month[5].day[12].hour[0].out() ) FROM Year WHERE year = 2015) WHERE timestamp = 1434146922
The first part of this question is whether there are any advantages/disadvantages to storing the timestamp as a property of the Vertex (as above query) versus storing it as the edge (or an edge property). In which case I think the query would be:
SELECT expand( month[5].day[12].hour[0].out('1434146922') ) FROM Year WHERE year = 2015
Although this looks more elegant, I have two concerns: the overhead of creating new edge types, and the loss of flexibility if you don't actually know the exact timestamp.
The second part of the question relates to the fact that once I have isolated the individual Vertex based on time, this is only the head of a hierarchical tree of Vertexes.
I could of course get the #rid from the above query and then construct a new query, but I was wondering how I could adapt the above to do it all in one. For example, assume there is a boolean property called active in all the hierarchical Vertexes. How can I get all the Vertexes for which active is true?
Answer to the first part of your question:
As stated in the use case documentation:
If you need more granularity than the Hour you can go ahead until the Time unit you need:
Hour -> minute (map) -> Minute -> second (map) -> Second
You get more flexibility by adding more precision to the tree instead of storing a timestamp of the time in the hour.
The only advantage of adding more precision to the tree is that you can group by a smaller time unit really efficiently. If you don't need to group by a unit smaller than an hour, then you don't have to add more precision.
The timestamp should be stored as a vertex property because the filtering will be easy and efficient. Check out this blog post for the best way to filter on a vertex property when traversing:
Improved SQL filtering
Answer to the second part of your question:
Get the specific vertex and then run your query on the hierarchical tree of vertexes:
<your-hierarchical-tree-query> from (select out('edge')[property = "value"] from
(select expand(month[1].day[1].hour[1].min[1]) from Year where year = 2015))

Base condition in recursive query using CTE?

WITH CTE
AS(
SELECT ID,Name,ManagerID, 1 RecursiveCallNumber FROM Employee WHERE ID=2
UNION ALL
SELECT E.ID,E.Name,E.ManagerID,RecursiveCallNumber+1 RecursiveCallNumber FROM Employee E
INNER JOIN CTE ON E.ManagerID=CTE.ID
)
SELECT * FROM CTE
How does the above code work logically? Here is my explanation:
Execute the first select statement. [Now the temporary table is known as CTE]
Execute the next select statement and join with the above result. We join with a condition that reduces the steps/loops in the recursion, which in this case is Manager. [Now the entire thing is known as CTE]
What is the base condition here? If there are no results in the join, is that the base condition? Wouldn't that break if we had a 0th IDN record forming a circular reference?
https://technet.microsoft.com/en-us/library/ms186243(v=sql.105).aspx is a good resource.
Recursive definitions in SQL Server with CTE are different from recursive definitions in many other programming languages (such as functional, imperative, and logical languages) in that the "base condition" is what starts, not ends, the recursion.
In the recursion familiar to most programmers you start by asking what you want to know (say, "what's the factorial of five?"), then your recursive program gradually reduces the request to something simple, gets to the base case ("what's the factorial of one?"), and builds up your solution as it "unwinds" the recursive chain of invocations ("factorial of three is three times the factorial of two, factorial of four is four times the factorial of three, and so on").
Here, you start with a "seed data", and proceed by expanding the seed set for as long as you can discover more things to add to it. Once there's nothing else to add, you stop, and return the results.
In a sense, this is very similar to a breadth-first search implementation that uses a queue: you add the initial element to the queue, and then your loop takes an item from the queue and enqueues its related items. The loop stops once there is nothing more to add.
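The queue analogy can be sketched in a few lines of Python. This is only an illustration of the "seed, then expand until nothing new turns up" idea, not a description of how SQL Server evaluates the query internally; the employee rows and the function name `expand_cte` are invented for the example.

```python
from collections import deque

# Hypothetical Employee rows: (ID, Name, ManagerID)
EMPLOYEES = [
    (1, "Ann", None),
    (2, "Bob", 1),
    (3, "Cid", 2),
    (4, "Dee", 2),
    (5, "Eve", 3),
]


def expand_cte(employees, seed_id):
    """Mimic the recursive CTE: start from the anchor row, then keep joining."""
    result = []
    # Anchor member: SELECT ... WHERE ID = seed_id, with RecursiveCallNumber = 1
    queue = deque((e, 1) for e in employees if e[0] == seed_id)
    while queue:  # ends when the join produces no further rows
        (emp_id, name, manager_id), level = queue.popleft()
        result.append((emp_id, name, manager_id, level))
        # Recursive member: employees whose ManagerID matches a row already found
        for e in employees:
            if e[2] == emp_id:
                queue.append((e, level + 1))
    return result


rows = expand_cte(EMPLOYEES, seed_id=2)
for row in rows:
    print(row)
```

With circular data (an employee who is, directly or indirectly, their own manager) this loop would never run dry. The SQL query has the same problem, except that SQL Server aborts it with an error once the MAXRECURSION limit (100 levels by default) is exceeded.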

JCR: Get all ancestors of current node

Is it possible to select node ancestors by SQL2 query?
For example
I have: /content/categories/sport/football node
Want to select: /content, /content/categories, /content/categories/sport nodes
You can, but assuming you have other siblings at those levels it's not very easy or dynamic. Honestly, it'll probably be far easier and far more performant to just use the Node methods to walk up the ancestors. Remember that you can get the Node object(s) for each row in a JCR-SQL2 query result.
Alternatively, if you just want the paths to the ancestors, then you can implicitly get these from the path of a result node (e.g., the path /content/categories/sport/football already contains /content, /content/categories, and /content/categories/sport).
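Deriving the ancestor paths from a node's path is plain string work. A minimal sketch, assuming absolute slash-separated paths like the one in the question (the function name `ancestor_paths` is made up, and no JCR API is involved):

```python
def ancestor_paths(path):
    """Return every proper ancestor path of an absolute, slash-separated path."""
    parts = [p for p in path.split("/") if p]
    # Join the first 1, 2, ... len-1 segments; the full path itself is excluded.
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


paths = ancestor_paths("/content/categories/sport/football")
print(paths)  # ['/content', '/content/categories', '/content/categories/sport']
```

The resulting paths could then be used to look up each ancestor node individually, if the nodes themselves are needed.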

Remove a node in binary search tree

I am reading a book about removing a node from a binary search tree right now and the procedure described in the book seems unnecessarily complicated to me.
My question is specifically about removing a node that has both left and right subtrees. In my opinion, the node to remove should be replaced by the rightmost node in its left subtree, or by its left node if its left subtree only has one node.
In case No.1, if we remove 40, it will be replaced by 30; in case No.2, if we remove 40, it will be replaced by 35.
But in the book, it says the replacement should be found from node-to-remove's right subtree which could involve some complex manipulations.
Am I missing something here? Please point it out.
What you have pointed out is correct: the deleted node should be replaced either by its in-order successor, which is the leftmost node in the right subtree, or by its in-order predecessor, which is the rightmost node in the left subtree. Either choice preserves the ordering, so the tree can still be traversed correctly. Most binary search tree data structures allow the deletion to be performed either way, but in some special cases you might want to implement deletion such that the tree remains balanced.
More details and sample code are available on Wikipedia.
In case no.1, if you remove node 40, it will be replaced by 50.
In case no.2, if you remove node 40, it will be replaced by 50.
So basically, when we delete any node that has two children, the removal should work as follows.
We go to the right child of the node, and then to the extreme left of that child.
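As a rough illustration of the "right child, then extreme left" rule, here is a small Python sketch of deletion using the in-order successor. The tree keys are invented (the figures from the question are not available here), and swapping in the in-order predecessor would be equally valid.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def insert(root, key):
    """Standard unbalanced BST insert."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def delete(root, key):
    """Delete key; a node with two children takes its in-order successor's key."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Zero or one child: splice the node out.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Two children: go to the right child, then as far left as possible.
        successor = root.right
        while successor.left is not None:
            successor = successor.left
        root.key = successor.key
        root.right = delete(root.right, successor.key)
    return root


def in_order(root):
    return [] if root is None else in_order(root.left) + [root.key] + in_order(root.right)


root = None
for k in [40, 20, 60, 10, 30, 50, 70]:  # hypothetical keys
    root = insert(root, k)
root = delete(root, 40)
print(root.key, in_order(root))  # 50 [10, 20, 30, 50, 60, 70]
```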
The figures below show some examples of how to delete a node from a binary search tree. They are also taken from a book, but the explanation is clear.

Efficient management of hierarchyid values in MS SQL Server

With the hierarchyid datatype in SQL Server 2008 and onward, would there be any benefit to trying to optimize the issuing of the next child of /1/1/8/ [ /1/1/8/x/ ] such that x is the closest non-negative whole number to 1 possible?
An easy solution seems to be to find the maximum assigned child value and get the sibling to its right, but it seems like you'd eventually exhaust this (in theory if not in practice), since you're never reclaiming any of the values and, to my understanding, negatives and non-wholes consume more space.
EXAMPLE: If I've got a parent /1/1/8/ who has these children (and order of the children doesn't matter and reassignment of the values is ok):
/1/1/8/-400/
/1/1/8/1/
/1/1/8/4/
/1/1/8/40/
/1/1/8/18/
/1/1/8/9999999999/
wouldn't I want the next child to have /1/1/8/2/ ?
Here's the thing.
What you are saying will be "optimal" is not necessarily optimal.
When I am inserting values into a hierarchy, I generally do not care what the order is for the child nodes of a particular node.
If I do, that is why there are two parameters in GetDescendant.
If I want to prepend the node into the order (i.e., make it first), I use a first parameter of NULL and a second parameter that is the lowest value of the other children.
If I want to append the node into the order (i.e. make it last), I use a first parameter of the maximum value of the other children and a second parameter of NULL.
If I want to insert between two other child nodes, I need both the one that will be before and the one that will be after the node I am inserting.
In any case, generally the values in the hierarchy field don't really matter, because you will order by a different field like Name or something.
Ergo, the most "efficient" method of adding things into a hierarchy is to either prepend or append, since finding the MIN or MAX hierarchy value is easy, and doing what you are describing requires several queries to find the first "hole" in the tree.
In other words, don't put a lot of meaning onto the string representation of a hierarchy unless your application actually sorts by the hierarchy value.
Even in that case, you probably don't want to fill in hierarchy values as you describe, and probably want to append to the end anyway.
Hope this helped.
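To illustrate the cost difference between appending and hunting for the first hole, here is a simplified Python model. It treats the last path component of each child as a plain integer, which is an assumption made purely for illustration: it does not call or reproduce GetDescendant, and the function names are hypothetical.

```python
def append_ordinal(children):
    """Append: one MAX lookup, then take the next value to the right."""
    return max(children) + 1 if children else 1


def first_hole(children):
    """Gap filling: probe candidates 1, 2, 3, ... until an unused one is found."""
    used = set(children)
    candidate = 1
    while candidate in used:
        candidate += 1
    return candidate


# Last components of the children of /1/1/8/ from the question
children = [-400, 1, 4, 40, 18, 9999999999]
print(append_ordinal(children))  # 10000000000
print(first_hole(children))      # 2
```

The append case needs only the maximum, while the gap search has to inspect the existing children until it finds a free slot, which is the extra work the answer above argues is rarely worth it.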