Best vs average runtime on binary search trees - binary-search-tree

It is known that O(log n) is the average time complexity for search, insertion, and deletion in a binary search tree. My question is whether this is also the best case. If not, what are the best cases?

The best case, as is the case with other data structures, is O(1).
Two examples:
1.) The node that you're searching for is the root, and it is the only element in the BST.
2.) In a left/right skewed tree, the node that you want to delete is at the root.
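
To make the gap between best and worst case concrete, here is a minimal sketch in Python. It assumes a plain, unbalanced BST; the names Node, insert and search_steps are made up for this illustration. Inserting keys in ascending order produces a right-skewed tree, and the search counts how many nodes it visits.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into an unbalanced BST and return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def search_steps(root, key):
    """Return the number of nodes visited while searching, or None if absent."""
    steps = 0
    node = root
    while node is not None:
        steps += 1
        if key == node.key:
            return steps
        node = node.left if key < node.key else node.right
    return None


# Right-skewed tree: keys inserted in ascending order.
root = None
for k in range(1, 8):
    root = insert(root, k)

best = search_steps(root, 1)   # key sits at the root
worst = search_steps(root, 7)  # key sits at the deepest node
```

Here best is a single visit no matter how large the tree is, while worst grows with the number of keys because the tree has degenerated into a list.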

Related

LMDB multiple indexes

I am reading some high-level documentation about LMDB, and it seems that multiple indexes should be elegantly feasible because it is, at least internally, possible to have a "data item ... [as] the root node of another tree" (LDAP at Lightning Speed, Howard Chu 2014 p. 96). This tree of trees is not exposed in the API, and the nesting of trees goes only one level deep, as far as I can tell.
For the sake of clarity, let's suppose I want to do queries of the type: Give me the names of all members aged 35-40 who joined in the decade 2000-2009.
The duplicate key feature does not really help me, because I cannot efficiently search ranges in the 2nd key (the member joining date, say).
So I can only achieve multiple indexes by cobbling together multiple databases, is that correct? This leads to a possibly related question: What are these sub-databases? They are not mentioned in the API docs. Is this, again, a purely internal matter?

Combine Lucene indexing and traversal in Neo4j to give a single result set

Is there any way to combine Lucene indexing and traversal in Neo4j, so that users indexed by their name are searched but the results are returned minimum depth first (i.e. in breadth-first order)?
For example, search all users with name "John*", but give nodes closer to a particular user node priority over the others.
Say the particular node is X; then the output should be in the following order:
X--JohnG
X------JohnM
X------------------JohnY
and so on...
I am not sure if I should use an evaluator to filter on names, since there may be thousands of nodes and that does not sound very efficient without indexing.
Thanks for any help!
I do not believe this is possible. I do not see anywhere in the REST traversal framework where you can define the Node by Index, only by Node ID. What you'd have to do is use the REST framework to perform the index lookup to get the Node ID, then perform the traversal on that.
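
The two-step idea (index lookup first, traversal second) can be sketched client-side in Python. This is only an illustration under stated assumptions: the graph is a plain in-memory adjacency dict standing in for the database, candidates stands in for whatever the index lookup for "John*" returned, and rank_by_distance is a hypothetical helper, not part of any Neo4j API.

```python
from collections import deque


def rank_by_distance(graph, start, candidates):
    """Return (depth, node) pairs for the candidates, nearest to start first."""
    remaining = set(candidates)
    remaining.discard(start)
    seen = {start}
    queue = deque([(start, 0)])
    ranked = []
    # Breadth-first search visits nodes in order of increasing depth,
    # so candidates are appended nearest first.
    while queue and remaining:
        node, depth = queue.popleft()
        if node in remaining:
            ranked.append((depth, node))
            remaining.discard(node)
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return ranked


# Hypothetical data: adjacency lists and the result of a name lookup.
graph = {
    "X": ["JohnG", "Alice"],
    "JohnG": ["X"],
    "Alice": ["X", "JohnM"],
    "JohnM": ["Alice", "Bob"],
    "Bob": ["JohnM", "JohnY"],
    "JohnY": ["Bob"],
}
candidates = {"JohnG", "JohnM", "JohnY"}
ordered = rank_by_distance(graph, "X", candidates)
```

The search stops as soon as every candidate has been reached, so the cost depends on how far the farthest match is from X rather than on the size of the whole graph.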

Risks and benefits of a modified closure table for hierarchical data

I am attempting to store hierarchical data in SQL and have resolved to use:
1.) an object table, where all of the main data will be, and
2.) a closure table, defining the relationships between the objects (read more on closure tables here [slides 40 to 68]).
After quite a bit of research, a closure table seemed to suit my needs well. One thing that I kept reading, however, is that if you want to query the direct ancestor / descendant of a particular node, you can use a depth column in your closure table (see slide 68 from the above link). I need this depth column to facilitate exactly this type of query. This is all well and good, but one of the main attractions of the closure table in the first place was the ease with which one could both query and modify the data contained therein. And adding a depth column seems to completely destroy the ease with which one can modify data (imagine adding a new node and offsetting an entire branch of the tree).
So - I'm considering modifying my closure table to define relations only between a node and its immediate ancestor / descendant. This allows me to still easily traverse the tree. Querying data seems relatively easy. Modifying data is not as easy as the original closure table without the depth field, but significantly easier than the one with the depth field. It seems like a fair compromise (almost between a closure table and an adjacency list).
Am I overlooking something though? Am I losing one of the key advantages of the closure table by doing it this way? Does anyone see any inherent risks in doing it this way that may come back to haunt me later?
I believe the key advantage you are losing is that if you want to know all of the descendants or ancestors of a node, you now have to do a lot more traversals.
For example, if you start with the following simple tree:
A->B->C->D
To get all descendants of A you have to go A->B then B->C then C->D. So, three queries, as opposed to a single query if following the normal pattern.
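
Here is a rough sketch of that difference using Python's built-in sqlite3 module. The table and column names (closure, ancestor, descendant, depth) are assumptions for the example, self-referencing rows are left out, and the direct-links-only variant is simulated by only ever reading rows with depth = 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE closure (ancestor TEXT, descendant TEXT, depth INTEGER)")

# Full closure table for the chain A->B->C->D: one row per ancestor/descendant pair.
chain = ["A", "B", "C", "D"]
rows = [
    (chain[i], chain[j], j - i)
    for i in range(len(chain))
    for j in range(i + 1, len(chain))
]
conn.executemany("INSERT INTO closure VALUES (?, ?, ?)", rows)


def descendants_full(conn, node):
    """Full closure table: every descendant comes back from one query."""
    cur = conn.execute(
        "SELECT descendant FROM closure WHERE ancestor = ? ORDER BY depth", (node,)
    )
    return [r[0] for r in cur]


def descendants_direct_only(conn, node):
    """Direct links only: walk level by level, one query per visited node."""
    found, frontier, queries = [], [node], 0
    while frontier:
        current = frontier.pop(0)
        cur = conn.execute(
            "SELECT descendant FROM closure WHERE ancestor = ? AND depth = 1",
            (current,),
        )
        queries += 1
        children = [r[0] for r in cur]
        found.extend(children)
        frontier.extend(children)
    return found, queries


all_at_once = descendants_full(conn, "A")
step_by_step, query_count = descendants_direct_only(conn, "A")
```

Both calls return B, C, D. The second one needs a query for each hop (A->B, B->C, C->D), and in this naive loop a fourth query just to learn that D has no children.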

Lucene: Query at least

I'm trying to find out if there's a way to search in Lucene for all documents that contain at least one word that does not match a particular word.
E.g. I want to find all documents where there is at least one word besides "test". That is, "test" may or may not be present, but there should be at least one word other than "test". Is there a way to do this in Lucene?
thanks,
Purushotham
Lucene could do this, but this wouldn't be a good idea.
The performance of query execution is bound by two factors:
1.) the time to intersect the query with the term dictionary,
2.) the time to retrieve the docs for every matching term.
Performant queries are the ones which can be quickly intersected with the term dictionary and which match only a few terms, so that the second step doesn't take too long. For example, to prevent overly complex Boolean queries, Lucene limits the number of clauses to 1024 by default.
With a TermQuery, intersecting the term dictionary requires (by default) O(log(n)) operations (where n is the size of the term dictionary) in memory and then one random access on disk plus the streaming of at most 16 terms. Another example is this blog entry from Lucene committer Mike McCandless which describes how FuzzyQuery performance improved when a brute-force implementation of the first step was replaced by something more clever.
However, the query you are describing would require examining every single term in the term dictionary and dismissing the documents that appear only in the "test" document set!
You should give more details about your use-case so that people can think about a more efficient solution to your problem.
If you need a query with a single negative condition, then use a BooleanQuery with the MatchAllDocsQuery and a TermQuery with occurs=MUST_NOT. There is no way to additionally enforce the existential constraint ("must contain at least one term that is not excluded"). You'll have to check that separately, once you retrieve Lucene's results. Depending on the ratio of favorable results to all the results returned from Lucene, this kind of solution can range from perfectly fine to a performance disaster.
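
The separate check on the retrieved results could look roughly like the Python sketch below. It deliberately does not call Lucene: hits is a hypothetical list of (document id, tokens) pairs, and how you obtain the tokens for each hit (stored fields, term vectors, re-analysis) is left open. Only the existential test itself is shown.

```python
def has_other_term(tokens, excluded):
    """True if at least one token differs from the excluded term."""
    return any(token != excluded for token in tokens)


def filter_hits(hits, excluded):
    """Keep the ids of hits that contain at least one non-excluded term."""
    return [doc_id for doc_id, tokens in hits if has_other_term(tokens, excluded)]


# Hypothetical results, already tokenized by the caller.
hits = [
    ("doc1", ["test"]),
    ("doc2", ["test", "lucene"]),
    ("doc3", ["query"]),
    ("doc4", []),
]
kept = filter_hits(hits, "test")
```

doc1 and doc4 are dropped because they contain nothing other than "test" (or nothing at all). The cost of this pass is proportional to the number of hits, which is why the ratio of good results to returned results matters.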

What's the case for duplications in BST?

How do I handle duplicate keys in a binary search tree?
I am not really sure what you are asking. But that won't stop me from posting an answer.
Usually, duplicate keys are disallowed in a BST. That tends to make things a lot easier, and it is a condition that is easy to avoid.
If you do want to allow duplicates, then insertions are not a problem. You can just put the duplicate in either the left subtree or the right subtree.
The problem is that you can't count on the duplicates being on a particular side if it is a self-balancing tree like an AVL tree or a red-black tree. It seems like this might be a problem for deletions, but I once implemented an AVL tree that made no special provisions for duplicates, and it had no problems at all.
Setting rebalancing aside, deleting a node from an AVL tree involves (1) finding the node, (2) replacing its key with either the greatest key in the left subtree or the smallest key in the right subtree, and then (3) recursively deleting the node that the replacement key came from. If the node has no subtrees, it is simply removed and nothing more needs to be done.
In practice, deleting a node with duplicates means that the node with the sought key nearest the root will be replaced with something, either a node with another key, or a node with the same key. Either way, the ordering constraints are not violated, and everything proceeds with no trouble.
I don't know about red-black trees or other sorts of BSTs.
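
As a rough illustration of why deletion keeps working with duplicates, here is a Python sketch. It assumes a plain BST with no rebalancing (so it is not an AVL tree), it sends duplicates to the right subtree, and the function names are invented for the example.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(node, key):
    """Insert key; duplicates go to the right subtree."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node


def delete(node, key):
    """Delete the occurrence of key nearest the root; return the new subtree root."""
    if node is None:
        return None
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    else:
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # Two children: copy the smallest key of the right subtree here,
        # then recursively delete that key from the right subtree.
        successor = node.right
        while successor.left is not None:
            successor = successor.left
        node.key = successor.key
        node.right = delete(node.right, successor.key)
    return node


def in_order(node):
    """Return the keys in sorted order."""
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)


root = None
for k in [5, 3, 5, 8, 5, 1]:
    root = insert(root, k)
root = delete(root, 5)
keys = in_order(root)
```

The root held one of the three 5s; after the delete its key is replaced by another 5 taken from the right subtree, and the in-order traversal is still sorted with two 5s left.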
It's up to your comparison check: if equal and smaller are treated the same, duplicates will be placed in the "smaller" subtree, otherwise they go in the "larger" subtree. Besides this, there shouldn't be an issue with duplicates, unless you want to avoid them of course, in which case you need an extra equality check.
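
A small Python sketch of that point, assuming a plain unbalanced BST. The ties_left flag and insert_unique are hypothetical names, used only to show how the choice of comparison decides where duplicates land and what the extra equality check looks like.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(node, key, ties_left=False):
    """Insert key; ties_left chooses between '<=' and '<' for going left."""
    if node is None:
        return Node(key)
    goes_left = key <= node.key if ties_left else key < node.key
    if goes_left:
        node.left = insert(node.left, key, ties_left)
    else:
        node.right = insert(node.right, key, ties_left)
    return node


def insert_unique(node, key):
    """Variant with an extra equality check that silently rejects duplicates."""
    if node is None:
        return Node(key)
    if key == node.key:
        return node
    if key < node.key:
        node.left = insert_unique(node.left, key)
    else:
        node.right = insert_unique(node.right, key)
    return node


def in_order(node):
    """Return the keys in sorted order."""
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)


left_tree = insert(insert(None, 2, ties_left=True), 2, ties_left=True)
right_tree = insert(insert(None, 2), 2)

unique_tree = None
for k in [2, 2, 1]:
    unique_tree = insert_unique(unique_tree, k)
```

In left_tree the second 2 hangs off the root's left child, in right_tree off its right child, and unique_tree ends up holding only 1 and 2.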