Changing a parent in a nested set with SQL

I have a database structure that contains a parent/child hierarchy and am using a nested-set structure to represent it.
Each record has a parentkey, an lvalue, and an rvalue.
Inserting new children is easy. We can adjust all subsequent lvalues and rvalues easily.
But how do I re-adjust those values when I'm modifying the parent of a given node?
That is, I'm changing the parent a node belongs to.
Currently, I'm just recomputing the whole tree using a breadth-first traversal starting at the root nodes.
Doing this in SQL is time-consuming (about 5 minutes to process 50k records).
Is there any easier technique for updating those lvalue/rvalues?
I'm using SQL Server, if that makes any difference.
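One commonly described way to avoid recomputing the whole tree is to treat the move as three range updates: close the gap the subtree leaves behind, open a gap of the same width just inside the new parent, then shift the subtree into that gap. Below is a minimal Python sketch of that arithmetic, not a tested SQL Server implementation. It assumes the rows sit in a dict keyed by a hypothetical record key and reuses the parentkey/lvalue/rvalue names from the question; the function name move_subtree is made up for illustration, and the node is appended as the last child of its new parent.

```python
def move_subtree(rows, node_key, new_parent_key):
    """Re-parent node_key (with its whole subtree) as the last child of new_parent_key.

    rows maps key -> {"parentkey": ..., "lvalue": int, "rvalue": int}.
    """
    a, b = rows[node_key]["lvalue"], rows[node_key]["rvalue"]
    width = b - a + 1
    # Every node whose lvalue falls inside [a, b] belongs to the moving subtree.
    subtree = {k for k, r in rows.items() if a <= r["lvalue"] <= b}
    if new_parent_key in subtree:
        raise ValueError("cannot move a node underneath itself")

    # Step 1: close the gap the subtree leaves behind.
    for k, r in rows.items():
        if k in subtree:
            continue
        if r["lvalue"] > b:
            r["lvalue"] -= width
        if r["rvalue"] > b:
            r["rvalue"] -= width

    # Step 2: open a gap of the same width at the new parent's right edge.
    target = rows[new_parent_key]["rvalue"]
    for k, r in rows.items():
        if k in subtree:
            continue
        if r["lvalue"] >= target:
            r["lvalue"] += width
        if r["rvalue"] >= target:
            r["rvalue"] += width

    # Step 3: slide the subtree into the gap and record the new parent.
    offset = target - a
    for k in subtree:
        rows[k]["lvalue"] += offset
        rows[k]["rvalue"] += offset
    rows[node_key]["parentkey"] = new_parent_key


rows = {
    "root": {"parentkey": None, "lvalue": 1, "rvalue": 10},
    "A": {"parentkey": "root", "lvalue": 2, "rvalue": 5},
    "A1": {"parentkey": "A", "lvalue": 3, "rvalue": 4},
    "B": {"parentkey": "root", "lvalue": 6, "rvalue": 9},
    "B1": {"parentkey": "B", "lvalue": 7, "rvalue": 8},
}
move_subtree(rows, "B", "A")  # B and B1 now sit inside A, which becomes 2:9
```

In SQL each of the three loops would presumably become one set-based UPDATE with a range predicate, so no tree traversal is needed. The subtree rows have to be kept out of steps 1 and 2, for example by flagging them or temporarily negating their values.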

Related

Neo4j - Find node by ID - How to get the ID for querying?

I want to be able to find a specific node by its ID for performance reasons (IDs are more efficient than indexes).
In order to execute the following example:
MATCH (s)
WHERE ID(s) = 65110
RETURN s
I will need the ID of the node (65110 in this case).
But how do I get it? Since the ID is auto-generated, it's impossible to find the ID without querying the graph, which kind of defeats the purpose since I will already have the node.
Am I missing something?
TL;DR: use an indexed property for lookups unless you absolutely need to optimise and can measure the difference.
Typically you use an index lookup as an entry point to the graph, that is, to obtain the node that provides the start of an edge traversal. While the pointer-like nature of Neo4j node IDs means they are theoretically faster, index lookups are also very efficient so you should not discount them on performance grounds unless you are sure it will make a measurable difference.
You should also consider that Neo4j node IDs are not stable. If you delete a node it is possible for the same ID to be re-used in future. For this reason they should really be considered an internal implementation detail and not one that should be relied on as part of your application's external interface.
That said, I have an application that stores Neo4j IDs in a Solr index for looking up nodes in bulk, but this index is considered volatile and the nodes also contain an indexed, application-generated UUID property (with a unique constraint) that serves as their main "primary key".
Further reading and discussion: https://github.com/neo4j/neo4j/issues/258

Suggestions modelling nested data sets that change over time

I am looking for suggestions on creating a temporal nested data set model. I am trying to improve performance for reading sections. I have a node tree of ~1million nodes, with frequent depths of 20+ nodes. The tree stores categories that can change over time, with the ability to enter future changes.
The current data structure is a temporal adjacent-node model. Modelling changes to the node tree over time is trivial with this simple data structure:
Nodes
    nodeID
    [data]
Edges
    parentNodeId
    childNodeId
    validFromDate
    validToDate
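To make the read cost of this model concrete, here is a rough Python sketch of an "as of" lookup over the Edges structure above. It keeps the field names from the question but makes two assumptions that the question does not state: validToDate is exclusive, and None stands for an open-ended edge. The function names are hypothetical.

```python
from datetime import date


def children_as_of(edges, parent_id, when):
    """Return the child ids of parent_id whose edge is valid on `when`."""
    return [
        e["childNodeId"]
        for e in edges
        if e["parentNodeId"] == parent_id
        and e["validFromDate"] <= when
        and (e["validToDate"] is None or when < e["validToDate"])
    ]


def subtree_as_of(edges, root_id, when):
    """Depth-first walk of the tree as it stood on `when`."""
    result, stack = [], [root_id]
    while stack:
        node = stack.pop()
        result.append(node)
        # Reverse so that children are visited in their listed order.
        stack.extend(reversed(children_as_of(edges, node, when)))
    return result


edges = [
    {"parentNodeId": 1, "childNodeId": 2,
     "validFromDate": date(2020, 1, 1), "validToDate": None},
    # Node 3 hangs under node 1 during 2020, then moves under node 2.
    {"parentNodeId": 1, "childNodeId": 3,
     "validFromDate": date(2020, 1, 1), "validToDate": date(2021, 1, 1)},
    {"parentNodeId": 2, "childNodeId": 3,
     "validFromDate": date(2021, 1, 1), "validToDate": None},
]
print(children_as_of(edges, 1, date(2020, 6, 1)))
print(children_as_of(edges, 1, date(2021, 6, 1)))
```

Recording a move costs only one closed edge and one new edge, but every read has to walk the tree level by level, which is the slow part on deep trees.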
A nested data set makes for very fast read operations, but as far as I understand them, nested sets do not support changes to the tree over time:
Nodes
    nodeId
    left
    right
    [data]
One thought I had was to create a series of "nesting maps" which reflect the left/right values at given points in time, but this would mean recreating the entire node tree whenever a single change-over-time was modelled, which would make the size of the "Nests" dataset too large as changes are frequent.
Nests
    nodeId
    left
    right
    validFromDate
    validToDate
Has anyone created a temporal nested dataset model, or does anyone know of any good resources on the subject?
The following work has investigated archiving multiple versions of nested data (a simple form of XML, though the use of XML is not essential):
http://xarch.sourceforge.net/
See also this paper:
homepages.inf.ed.ac.uk/opb/papers/TODS2004.pdf

Hierarchical Database Structure SQL Server

I have a different kind of hierarchical structure.
Please see the structure below.
1. Parent 1
1.1 Child 1
1.2 Child 2
1.3 Child 3
1.3.1 Child 4
**1.3.2 Parent 2**
Now look at the tree above: here a child can also have a PARENT as its sub-child.
How can I achieve this? Keep in mind that I want the whole tree without a for-each loop.
Thanks in advance.
Generally, two approaches may fit your needs.
Version #1: The most obvious (but slow) approach is to simply create a table holding each node and a reference (foreign key) to its parent. A parent of NULL indicates a root node.
The disadvantage of this approach is that you need either a loop (which you want to avoid) or an RDBMS that can define and execute recursive queries (usually with a CTE).
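As an illustration of the recursive-query route, here is a small sketch that runs a recursive CTE from Python. It uses SQLite (via the standard sqlite3 module) as a stand-in because it needs no server; the table name node and its columns are assumptions made up for the example. SQLite requires the WITH RECURSIVE spelling, whereas in SQL Server the same CTE is written with plain WITH.

```python
import sqlite3


def whole_tree(conn):
    """Return (id, name, depth) for every node using a single recursive query."""
    return conn.execute(
        """
        WITH RECURSIVE tree(id, name, depth) AS (
            -- Anchor member: the root nodes have no parent.
            SELECT id, name, 0 FROM node WHERE parent_id IS NULL
            UNION ALL
            -- Recursive member: attach each node to its already-found parent.
            SELECT n.id, n.name, t.depth + 1
            FROM node AS n JOIN tree AS t ON n.parent_id = t.id
        )
        SELECT id, name, depth FROM tree ORDER BY id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [
        (1, None, "Parent 1"),
        (2, 1, "Child 1"),
        (3, 1, "Child 2"),
        (4, 1, "Child 3"),
        (5, 4, "Child 4"),
        (6, 4, "Parent 2"),
    ],
)
for row in whole_tree(conn):
    print(row)
```

The sample rows mirror the tree from the question, so "Parent 2" comes back at depth 2 under "Child 3" without any explicit loop in the calling code.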
Version #2: The second approach would be the choice in the real world. Whereas the first solution is able to store unlimited depth, such scenarios usually don't occur in hierarchical trees.
Again you create a table with one row per node, but instead of having a reference to the parent, you store the absolute path to that node within the tree in e.g. a VarChar column, just like the absolute path of a file in a filesystem. Here, the 'directory name' corresponds to e.g. the ID of the node.
Version #1 has the advantage of being very compact, but it takes quite an effort to prune the tree or retrieve a list of all nodes with their absolute path (RDBMSs are not very good at recursive structures). On the other hand, a lot of UI components expect exactly this structure to display the tree on screen. Questions like 'Which nodes are indirect children of node X' are both slow and quite difficult to answer.
Version #2 has the advantage of making it very easy to implement tree manipulation (deletion, pruning, moving nodes and subtrees). Also, the list you require is a simple SELECT. The question 'show all direct or indirect children of node X' is answered with a simple SELECT as well.
The caveat is the increased size due to redundant storage of paths and the limited depth of the tree that can be stored.
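A small Python sketch of the Version #2 idea, under the assumption that each path is stored as a slash-separated string of node IDs (the separator, the dict layout, and the function names are all illustrative choices). Finding descendants is a prefix match, which in SQL would be roughly a LIKE 'path/%' predicate, and moving a subtree is a prefix rewrite.

```python
def descendants(paths, node_id):
    """All direct and indirect children of node_id, found by prefix match."""
    prefix = paths[node_id] + "/"
    return sorted(k for k, p in paths.items() if p.startswith(prefix))


def move(paths, node_id, new_parent_id):
    """Re-parent node_id and its subtree by rewriting the path prefix."""
    old = paths[node_id]
    if paths[new_parent_id] == old or paths[new_parent_id].startswith(old + "/"):
        raise ValueError("cannot move a node underneath itself")
    new = paths[new_parent_id] + "/" + str(node_id)
    for k, p in list(paths.items()):
        if p == old or p.startswith(old + "/"):
            paths[k] = new + p[len(old):]


# The tree from the question: node 4 is "Child 3", nodes 5 and 6 sit below it.
paths = {1: "/1", 2: "/1/2", 3: "/1/3", 4: "/1/4", 5: "/1/4/5", 6: "/1/4/6"}
print(descendants(paths, 4))
move(paths, 4, 2)
print(paths[6])
```

Only the rows under the moved node are rewritten; nothing else in the tree changes.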

Are Neo4J node ids optimized for access?

I am building a large graph database using neo4j.
I have my own external indexes which give me identifiers for relevant nodes that I use for further neo4j graph traversal. In other words I already have my start node ids when I get to query the database.
My question is: can node lookups be faster if I use neo4j/lucene indexes to access relevant nodes?
Or are queries such as:
START n=node({ids})
already optimized for node access and nothing can be gained by using:
START n=node:nodeIndexName(key={value})
?
Thanks,
Yes. Neo4j is optimized for access by node ID: at the persistence level, every node is stored as a block, so accessing node 100 is like accessing block 100.
I will warn you though that Neo4j makes no guarantee about a node ID once you delete the node. Neo4j reclaims IDs. So if in the course of your DB's life you delete and add multiple nodes, your external entries may be "valid" but not what you'd expect.
//EDIT: Also, why not just use Lucene to perform your lookups? Of course accessing the node ID is faster, but that's what Lucene does under the covers when you do a lookup, so key:name, value:frank will return node id 5123 and neo4j will return the node that corresponds to that ID.

Improving scalability of the modified preorder tree traversal algorithm

I've been thinking about the modified preorder tree traversal algorithm for storing trees within a flat table (such as SQL).
One property I dislike about the standard approach is that to insert a node you have to touch (on average) N/2 of the nodes (everything with left or right higher than the insert point).
The implementations I've seen rely on sequentially numbered values. This leaves no room for updates.
This seems bad for concurrency and scaling. Imagine you have a tree rooted at the world containing user groups for every account in a large system. It's extremely large, to the point that you must store subsets of the tree on different servers. Touching half of all the nodes to add a node to the bottom of the tree is bad.
Here is the idea I was considering. Basically leave room for inserts by partitioning the keyspace and dividing at each level.
Here's an example with Nmax = 64 (this would normally be the MAX_INT of your DB)
                  0:64
          ________|________
         /                 \
       1:31               32:63
      /    \             /     \
   2:14   15:30      33:47    48:62
Here, a node is added to the left half of the tree.
                  0:64
          ________|________
         /                 \
       1:31               32:63
     /   |   \           /     \
  2:11 12:20 21:30    33:47   48:62
The algorithm must be extended so that the insert and removal processes recursively renumber the left/right indexes for the subtree. Since querying for the immediate children of a node is complicated, I think it makes sense to also store the parent id in the table. The algorithm can then select the subtree (using left > p.left && right < p.right), then use node.id and node.parent to work through the list, subdividing the indexes.
This is more complex than just incrementing all the indexes to make room for the insert (or decrementing for removal), but it has the potential to affect far fewer nodes (only descendants of the parent of the inserted/removed node).
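To make the subdivision step concrete, here is a rough Python sketch of one way it could work: each node splits the open part of its own interval into equal slots, one per child, and recurses. It is only an illustration of the idea described above and does not reproduce the exact numbers in the diagrams; the dict layout and the name renumber are assumptions.

```python
def renumber(children, bounds, node_id):
    """Evenly split node_id's interval among its children, recursively.

    children maps node id -> ordered list of child ids;
    bounds maps node id -> (left, right) and is updated in place.
    """
    left, right = bounds[node_id]
    kids = children.get(node_id, [])
    if not kids:
        return
    # Width of each child's slot inside the open interval (left, right).
    span = (right - left - 1) // len(kids)
    if span < 2:
        raise ValueError("interval exhausted; renumber from higher up the tree")
    for i, kid in enumerate(kids):
        lo = left + 1 + i * span
        bounds[kid] = (lo, lo + span - 1)
        renumber(children, bounds, kid)


children = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
bounds = {"root": (0, 64)}
renumber(children, bounds, "root")

# Insert a third child under "a": only the descendants of "a" are renumbered.
children["a"].append("a3")
renumber(children, bounds, "a")
print(bounds)
```

The ValueError branch marks the case the question has to plan for: once a node's interval is too narrow for its children, the renumbering has to start from an ancestor with more room.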
My question(s) are basically:
Has this idea been formalized or implemented?
Is this the same as nested intervals?
Yes, I have heard of people doing this before, for the same reasons.
Note that you do lose a couple of small advantages of the algorithm by doing this:
Normally, you can tell the number of descendants of a node by ((right - left - 1) div 2). This can occasionally be useful if, e.g., you're displaying a count in a treeview which should include the number of children to be found further down in the tree.
Flowing from the above, it's easy to select all leaf nodes: WHERE (right = left + 1).
These are fairly minor advantages and may not be useful to you anyway, though for some usage patterns they're obviously handy.
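A quick Python check of the two shortcuts, counting descendants only (excluding the node itself) as (right - left - 1) // 2. The function names are just for illustration. With contiguous numbering both work; with the gapped 2:14 leaf from the question's diagram, neither does.

```python
def descendant_count(left, right):
    """Number of descendants, valid only for contiguous nested-set numbering."""
    return (right - left - 1) // 2


def is_leaf(left, right):
    """Leaf test, valid only for contiguous nested-set numbering."""
    return right == left + 1


# Contiguous numbering: a root at 1:10 holds four other nodes.
print(descendant_count(1, 10))  # 4
print(is_leaf(3, 4))            # True

# Gapped numbering: 2:14 is a leaf in the question's first diagram,
# but both shortcuts now give the wrong answer.
print(descendant_count(2, 14))  # 5
print(is_leaf(2, 14))           # False
```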
That said, it does sound like materialized paths may be more useful to you, as suggested above.
I think you're better off looking at a different way of storing trees. If your tree is broad but not terribly deep (which seems likely for the case you suggested), you can store the complete list of ancestors up to the root against each node. That way, modifying a node doesn't require touching any nodes other than the node being modified.
You can split your table into two: the first is (node ID, node value), the second (node ID, child ID), which stores all the edges of the tree. Insertion and deletion then become O(tree depth) (you have to navigate to the element and fix what is below it).
The solution you propose looks like a B-tree. If you can estimate the total number of nodes in your tree, then you can choose the depth of the tree beforehand.