What is the difference between WAVL (weak AVL) and red-black trees? - binary-search-tree

What is the difference between WAVL (weak AVL) and red-black trees?
Is there a specific reason to use WAVL over RB?

A WAVL tree is an attempt to combine the best characteristics of AVL trees and red-black trees. If you only ever insert into a WAVL tree, it builds the same tree an AVL tree would: one that is more strictly balanced than a red-black tree. WAVL trees can therefore be expected to perform better in situations where red-black trees become more unbalanced. Deletion in a WAVL tree is slightly simpler than deletion in an AVL tree: a WAVL delete performs at most two rotations and then stops, whereas an AVL delete may need rotations all the way up to the root.
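The claim that insert-only WAVL trees are AVL trees can be checked with a small experiment. Below is a minimal sketch in Python. It assumes the usual rank-based formulation of WAVL trees: every node stores an integer rank, a missing child counts as rank -1, and every parent-child rank difference must be 1 or 2. Only insertion is implemented (no deletion). The names `WAVLTree`, `rank`, `is_avl` and so on are hypothetical helpers, not a library API.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.rank = 0            # a newly inserted node is a leaf of rank 0
        self.left = None
        self.right = None
        self.parent = parent

def rank(node):
    """Rank of a node; a missing child counts as rank -1."""
    return -1 if node is None else node.rank

class WAVLTree:
    def __init__(self):
        self.root = None

    def _replace_child(self, parent, old, new):
        """Put `new` where `old` used to hang under `parent` (or at the root)."""
        if parent is None:
            self.root = new
        elif parent.left is old:
            parent.left = new
        else:
            parent.right = new
        new.parent = parent

    def _rotate_up(self, x):
        """Single rotation that lifts x above its parent p."""
        p, g = x.parent, x.parent.parent
        if p.left is x:
            moved = x.right          # x's inner subtree changes sides
            p.left, x.right = moved, p
        else:
            moved = x.left
            p.right, x.left = moved, p
        if moved is not None:
            moved.parent = p
        self._replace_child(g, p, x)
        p.parent = x

    def insert(self, key):
        if self.root is None:
            self.root = Node(key)
            return
        p = self.root                # plain BST descent; equal keys go right
        while True:
            side = "left" if key < p.key else "right"
            child = getattr(p, side)
            if child is None:
                x = Node(key, p)
                setattr(p, side, x)
                break
            p = child
        self._rebalance(x)

    def _rebalance(self, x):
        p = x.parent
        while p is not None and p.rank == x.rank:    # x is a 0-child: violation
            sibling = p.right if p.left is x else p.left
            if p.rank - rank(sibling) == 1:
                p.rank += 1                           # promote p, retry one level up
                x, p = p, p.parent
                continue
            # Sibling is a 2-child: at most two rotations, then stop.
            inner = x.right if p.left is x else x.left
            if x.rank - rank(inner) == 2:
                self._rotate_up(x)                    # single rotation
                p.rank -= 1
            else:
                self._rotate_up(inner)                # double rotation
                self._rotate_up(inner)
                inner.rank += 1
                x.rank -= 1
                p.rank -= 1
            return

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

def height(node):
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

def is_avl(node):
    """True if, at every node, the two subtree heights differ by at most 1."""
    return node is None or (abs(height(node.left) - height(node.right)) <= 1
                            and is_avl(node.left) and is_avl(node.right))

def rank_rule_ok(node):
    """WAVL rank rule: differences of 1 or 2, and every leaf has rank 0."""
    if node is None:
        return True
    if any(node.rank - rank(c) not in (1, 2) for c in (node.left, node.right)):
        return False
    if node.left is None and node.right is None and node.rank != 0:
        return False
    return rank_rule_ok(node.left) and rank_rule_ok(node.right)
```

Inserting keys in any order and then calling `is_avl(t.root)` should report `True`. Without deletions, every node is a 1,1 or 1,2 node, so a node's rank equals its height and sibling heights differ by at most one.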

Related

LMDB multiple indexes

I am reading some high-level documentation about lmdb, and it seems that multiple indexes should be elegantly feasible because it is, at least internally, possible to have a "data item ... [as] the root node of another tree" (LDAP at Lightning Speed, Howard Chu 2014 p. 96). This tree of trees is not exposed in the API, and this nesting of trees goes only one level deep, as far as I can tell.
For the sake of clarity, let's suppose I want to do queries of the type: Give me the names of all members aged 35-40 who joined in the decade 2000-2009.
The duplicate key feature does not really help me, because I cannot efficiently search ranges in the 2nd key (the member joining date, say).
So I can only achieve multiple indexes by cobbling together multiple databases, is that correct? This leads to a possibly related question: What are these sub-databases? They are not mentioned in the API docs. Is this, again, a purely internal matter?
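The following is not LMDB code. It is a minimal sketch, in plain Python, of the "one sorted key space per index" pattern the question describes. There is a primary table keyed by member id, plus two secondary indexes whose keys are (secondary value, member id) pairs kept in sorted order, so each index supports a range scan. Mapping each index onto its own named database in one LMDB environment is an assumption on my part. The member data and the helper names (`build_index`, `range_ids`, `query`) are hypothetical.

```python
import bisect

# Hypothetical primary table: member_id -> (name, age, join_year)
members = {
    1: ("Alice", 36, 2003),
    2: ("Bob", 41, 2005),
    3: ("Carol", 38, 1998),
    4: ("Dave", 35, 2009),
    5: ("Eve", 39, 2012),
}

def build_index(field):
    """Secondary index: sorted (field value, member_id) composite keys."""
    return sorted((rec[field], mid) for mid, rec in members.items())

age_index = build_index(1)
joined_index = build_index(2)

def range_ids(index, lo, hi):
    """Member ids whose indexed value lies in [lo, hi], via one range scan."""
    i = bisect.bisect_left(index, (lo,))   # (lo,) sorts before every (lo, id)
    ids = set()
    while i < len(index) and index[i][0] <= hi:
        ids.add(index[i][1])
        i += 1
    return ids

def query(age_lo, age_hi, year_lo, year_hi):
    """Names of members in both ranges: scan each index, intersect the ids."""
    ids = range_ids(age_index, age_lo, age_hi) & range_ids(joined_index, year_lo, year_hi)
    return sorted(members[mid][0] for mid in ids)
```

For example, `query(35, 40, 2000, 2009)` asks for members aged 35 to 40 who joined between 2000 and 2009. An alternative is to scan only the more selective index and check the other field in the primary record; which is cheaper depends on the data.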

Best vs. average runtime on binary search trees

It is known that O(log n) is the average time complexity for search, insertion, and deletion in a binary search tree. My question is whether this is also the best case. If not, what are the best cases?
The best case, as with many other data structures, is O(1).
Two examples:
1.) The node that you're searching for is the root, and it is the only element in the BST.
2.) In a left- or right-skewed tree, the node that you want to delete is the root: it has at most one child, so it is simply replaced by that child.
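The root case does not actually require the tree to contain a single element: whenever the sought key sits at the root, the search stops after visiting one node. The minimal sketch below uses a plain, unbalanced BST, and all function names are hypothetical. It counts visited nodes during a search and shows the constant-time deletion of the root of a right-skewed tree.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Return (found, number of nodes visited)."""
    visited, node = 0, root
    while node is not None:
        visited += 1
        if key == node.key:
            return True, visited
        node = node.left if key < node.key else node.right
    return False, visited

def delete_root_with_one_child(root):
    """O(1): a root with at most one child is simply replaced by that child."""
    assert root.left is None or root.right is None
    return root.left if root.left is not None else root.right
```

With the keys 50, 30, 70, 20, 40 inserted in that order, `search(root, 50)` visits one node, while `search(root, 40)` visits three. Inserting 1, 2, 3, 4 produces a right-skewed chain, and deleting its root just returns the node holding 2.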

Risks and benefits of a modified closure table for hierarchical data

I am attempting to store hierarchical data in SQL and have resolved to use an object table, where all of the main data will be stored, and a closure table, defining the relationships between the objects (read more on closure tables here [slides 40 to 68]).
After quite a bit of research, a closure table seemed to suit my needs well. One thing that I kept reading, however, is that if you want to query the direct ancestor / descendant of a particular node, then you can use a depth column in your closure table (see slide 68 from the above link). I need this depth column to support exactly this type of query. This is all well and good, but one of the main attractions of the closure table in the first place was the ease with which one could both query and modify the data contained therein. And adding a depth column seems to completely destroy the ease with which one can modify data (imagine adding a new node and offsetting an entire branch of the tree).
So I'm considering modifying my closure table to define relations only between a node and its immediate ancestor / descendant. This still allows me to traverse the tree easily. Querying data seems relatively easy. Modifying data is not as easy as with the original closure table without the depth field, but significantly easier than with the depth field. It seems like a fair compromise (somewhere between a closure table and an adjacency list).
Am I overlooking something, though? Am I losing one of the key advantages of the closure table by doing it this way? Does anyone see any inherent risks in this approach that may come back to haunt me later?
I believe the key advantage you are losing is that if you want to know all of the descendants or ancestors of a node, you now have to do a lot more traversals.
For example, if you start with the following simple tree:
A->B->C->D
To get all descendants of A, you have to go A->B, then B->C, then C->D. That is three queries, as opposed to a single query with the normal closure table pattern.
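To make the difference concrete, here is a minimal sketch using Python's built-in `sqlite3` module and the A->B->C->D chain from the answer. The table names `closure` and `direct` are hypothetical. `closure` holds every ancestor/descendant pair with a depth column. `direct` holds only immediate parent/child links, as in the modified design. The traversal batches each level into one `IN (...)` query, so a branching tree costs one query per level rather than one per edge. It still needs a round trip per level, plus a final query that finds no further children.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE closure (ancestor TEXT, descendant TEXT, depth INTEGER);
    CREATE TABLE direct  (ancestor TEXT, descendant TEXT);
""")

chain = ["A", "B", "C", "D"]
# Full closure table: every (ancestor, descendant) pair, self-pairs at depth 0.
for i, anc in enumerate(chain):
    for j in range(i, len(chain)):
        conn.execute("INSERT INTO closure VALUES (?, ?, ?)", (anc, chain[j], j - i))
# Modified table: immediate parent -> child links only.
for parent, child in zip(chain, chain[1:]):
    conn.execute("INSERT INTO direct VALUES (?, ?)", (parent, child))

def descendants_full(node):
    """Full closure table: all descendants in a single query."""
    rows = conn.execute(
        "SELECT descendant FROM closure WHERE ancestor = ? AND depth > 0 ORDER BY depth",
        (node,))
    return [r[0] for r in rows]

def descendants_direct(node):
    """Immediate links only: one query per level. Returns (descendants, query count)."""
    found, frontier, queries = [], [node], 0
    while frontier:
        placeholders = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT descendant FROM direct WHERE ancestor IN ({placeholders})",
            frontier).fetchall()
        queries += 1
        frontier = [r[0] for r in rows]
        found.extend(frontier)
    return found, queries
```

For A, `descendants_full` returns B, C and D from one query. `descendants_direct` returns the same nodes after four queries: three that find children and one that finds D has none. Databases with recursive common table expressions (for example SQLite 3.8.3 and later) can fold that loop into a single `WITH RECURSIVE` statement, which narrows the gap.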

How should duplicates be handled in a BST?

How do I solve the problem of duplicates in a binary search tree?
I am not really sure what you are asking. But that won't stop me from posting an answer.
Usually, duplicate keys are disallowed in a BST. That tends to make things a lot easier, and it is a condition that is easy to avoid.
If you do want to allow duplicates, then insertions are not a problem. You can just stick it either in the left subtree or the right subtree.
The problem is that you can't count on the duplicates being on a particular side if it is a self-balancing tree like an AVL tree or a red-black tree. It seems like this might be a problem for deletions, but I once implemented an AVL tree that made no special provisions for duplicates, and it had no problems at all.
Deleting a node from an AVL tree involves (1) finding the node, (2) replacing that node's key with either the greatest key in the left subtree or the smallest key in the right subtree, and then recursively deleting the node that key came from, followed by the usual rebalancing. If the node has no subtrees, it is simply removed and nothing more needs to be done.
In practice, deleting a node with duplicates means that the node with the sought key nearest the root will be replaced with something, either a node with another key, or a node with the same key. Either way, the ordering constraints are not violated, and everything proceeds with no trouble.
I don't know about red-black trees or other sorts of BSTs.
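A quick way to check the deletion argument above is to try it. The sketch below uses a plain, unbalanced BST rather than an AVL tree, so rebalancing is left out. Because there are no rotations, the strict rule "left < node <= right" holds throughout. In a real AVL tree, rotations can move an equal key to either side, as the answer notes. Equal keys are sent to the right subtree, and deletion copies in the smallest key of the right subtree, one of the two replacement choices described. All names are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(node, key):
    """Equal keys go right, so the invariant is: left < node <= right."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node

def delete(node, key):
    """Remove one occurrence of key: the matching node nearest the root."""
    if node is None:
        return None                      # key not present
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    elif node.left is None:
        return node.right
    elif node.right is None:
        return node.left
    else:
        # Two children: copy in the smallest key of the right subtree,
        # then delete one occurrence of that key from the right subtree.
        succ = node.right
        while succ.left is not None:
            succ = succ.left
        node.key = succ.key
        node.right = delete(node.right, succ.key)
    return node

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

def is_valid(node, lo=None, hi=None):
    """Check left < node <= right at every node, duplicates allowed on the right."""
    if node is None:
        return True
    if lo is not None and node.key < lo:
        return False
    if hi is not None and node.key >= hi:
        return False
    return is_valid(node.left, lo, node.key) and is_valid(node.right, node.key, hi)
```

Building a tree from 5, 3, 8, 5, 5, 7, 3 and deleting 5 removes exactly one 5. The in-order listing stays sorted, and `is_valid` still holds.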
It's up to your comparison check: if "equal" is treated the same as "smaller", duplicates will be placed in the "smaller" (left) subtree; otherwise they go in the "larger" (right) subtree. Besides this, there shouldn't be an issue with duplicates, unless of course you want to avoid them, in which case you need an extra equality check.
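The comparison choice described above can be sketched as a single insertion routine with two hypothetical switches. `ties_go_left` selects whether equal keys are treated like smaller keys (`<=`, left) or like larger keys (`<`, right). `allow_duplicates=False` adds the extra equality check, which drops a duplicate instead of inserting it. This is a plain, unbalanced BST for illustration only.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(node, key, ties_go_left=False, allow_duplicates=True):
    if node is None:
        return Node(key)
    if key == node.key and not allow_duplicates:
        return node                          # extra equality check: ignore duplicate
    # The comparison decides which side equal keys end up on.
    goes_left = key <= node.key if ties_go_left else key < node.key
    if goes_left:
        node.left = insert(node.left, key, ties_go_left, allow_duplicates)
    else:
        node.right = insert(node.right, key, ties_go_left, allow_duplicates)
    return node
```

Inserting 10 twice with the defaults puts the second 10 in the right subtree. With `ties_go_left=True` it lands in the left subtree, and with `allow_duplicates=False` the second insert leaves the tree unchanged.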

A query to summarize data in a subtree?

My data fits a tree form naturally. Therefore, I have a simple SQL table to store the data: {id, parentid, data1, ..., dataN}
I want to be able to "zoom in" on the data and produce a report which summarizes the data found below the current branch.
That is, when standing at the root, I want the totals of all the data. When I have traveled down a certain branch of the tree, I want only the totals of the data for that node and the nodes below it.
How do I write such a query in SQL?
Thanks in advance!
/John
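SQLite does not support CONNECT BY, and versions before 3.8.3 do not support recursive common table expressions (WITH RECURSIVE) either. On those versions you will not be able to perform this calculation in a single query unless you use nested sets or materialized paths for your data.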
Alternatively, do it "the hard way" and traverse your tree recursively, one query for each child node starting at the parent-of-interest.
Also see:
Managing Hierarchical Data in MySQL
Recursive Hierarchies: The Relational Taboo!
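Below is a minimal sketch using Python's built-in `sqlite3` module and the question's {id, parentid, data1} columns. The table name `node` and the sample rows are made up. `subtree_total_hard_way` follows the one-query-per-node recursion described above. `subtree_total` shows the single-query alternative available in SQLite 3.8.3 and later: a recursive common table expression that first collects the ids in the subtree and then sums their data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parentid INTEGER, data1 INTEGER)")
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", [
    (1, None, 10),   # root
    (2, 1, 5),
    (3, 1, 7),
    (4, 2, 1),
    (5, 2, 2),
    (6, 3, 4),
])

SUBTREE_TOTAL = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM node WHERE id = ?
    UNION ALL
    SELECT node.id FROM node JOIN subtree ON node.parentid = subtree.id
)
SELECT SUM(data1) FROM node WHERE id IN (SELECT id FROM subtree)
"""

def subtree_total(node_id):
    """Single query (needs SQLite 3.8.3+): sum data1 over the node and its descendants."""
    return conn.execute(SUBTREE_TOTAL, (node_id,)).fetchone()[0]

def subtree_total_hard_way(node_id):
    """One query per node, recursing into the children from the application."""
    total = conn.execute("SELECT data1 FROM node WHERE id = ?", (node_id,)).fetchone()[0]
    children = conn.execute("SELECT id FROM node WHERE parentid = ?", (node_id,)).fetchall()
    for (child,) in children:
        total += subtree_total_hard_way(child)
    return total
```

Both functions give the same totals. Standing at the root (id 1) sums every row, while id 2 sums only rows 2, 4 and 5.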
Vlad's reference on nested sets looks pretty good. If you want something that covers trees and hierarchies in more detail, then you can also check out Joe Celko's book.
The "ID, ParentID" adjacency list model is really an "old time" way of looking at hierarchies in a relational database model.
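For the nested-sets approach mentioned above, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes the question's {id, parentid, data1} layout plus two hypothetical extra columns, `lft` and `rgt`, filled in once by a depth-first walk. After that, a subtree total is one ordinary (non-recursive) query, because a node's descendants are exactly the rows whose `lft` falls between its own `lft` and `rgt`. The table name `node` and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parentid INTEGER,"
             " data1 INTEGER, lft INTEGER, rgt INTEGER)")
conn.executemany("INSERT INTO node (id, parentid, data1) VALUES (?, ?, ?)",
                 [(1, None, 10), (2, 1, 5), (3, 1, 7), (4, 2, 1), (5, 2, 2), (6, 3, 4)])

def number_nested_sets(node_id, counter):
    """Assign lft/rgt by a depth-first walk; returns the next free number."""
    lft = counter
    counter += 1
    children = conn.execute("SELECT id FROM node WHERE parentid = ? ORDER BY id",
                            (node_id,)).fetchall()
    for (child,) in children:
        counter = number_nested_sets(child, counter)
    conn.execute("UPDATE node SET lft = ?, rgt = ? WHERE id = ?", (lft, counter, node_id))
    return counter + 1

number_nested_sets(1, 1)   # root is id 1

SUBTREE_TOTAL = """
SELECT SUM(child.data1)
FROM node AS parent
JOIN node AS child ON child.lft BETWEEN parent.lft AND parent.rgt
WHERE parent.id = ?
"""

def subtree_total(node_id):
    """Sum data1 over the node and all its descendants in one plain query."""
    return conn.execute(SUBTREE_TOTAL, (node_id,)).fetchone()[0]
```

The trade-off is that `lft` and `rgt` must be renumbered for many rows whenever the tree changes, so nested sets tend to suit hierarchies that are read far more often than they are modified.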