Data structure that inserts in O(log2 n), yet searches in O(1)

Currently working on an assignment that requires a data structure to insert in O(log2 n), yet be able to search for an element in O(1). I was thinking of a BST because of the O(log2 n) insert, but it can't search in O(1). A hash table can insert in O(n) at worst with a search of O(1), but unfortunately that doesn't fit the O(log2 n) insert requirement.
Anybody have any suggestions? thanks!

You can insert the element into a binary tree, and a pointer to that element into a hash table.
Or you can just insert the element into a hash table in O(1), search in O(1), and, to comply with the "insert in O(log2 N)" requirement, run log2(N) empty loop iterations.
Regarding "worst case/average case": both a hash table and a binary tree (a plain, non-balanced one, as opposed to a red-black tree) are O(N) in the worst case. I think your assignment is about the average case, which is usual, and there a hash table can provide O(1). To protect against an attack with a crafted dataset, use universal hashing.

Related

What benefit does a balanced search tree provide over a sorted key-value pair array?

public class Entry {
    int key;
    String value;
}
If you have an array of Entry:
Entry[]
You can do a binary search on this array to find, insert, or remove an Entry, all in O(log n). You can also do a range search in O(log n).
And this is very simple.
What does a comparatively complicated data structure like a red-black balanced search tree give me over a simple sorted key-value array?
If data is immutable, the tree has no benefit.
The only benefit of the array is locality of reference, i.e. the data is close together and the CPU may cache it.
Because the array is sorted, search is O(log n).
If you add / remove items, things change.
For a small number of elements, the array is better (faster) because of the locality of reference.
For a larger number of items a Red-Black Tree (or another self-balancing tree) will perform better, because the array needs to shift elements:
insert and delete take O(log n) for the search plus roughly n/2 element moves for the shift.
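To see where the shift cost comes from, here is a small sketch assuming a plain int[] kept sorted; the helper name insertSorted is illustrative:

import java.util.Arrays;

public class SortedArrayInsert {
    // Insert `key` into the first `size` elements of `a`, keeping them sorted.
    // The binary search is O(log n), but the arraycopy shifts up to n elements: O(n) overall,
    // which is exactly the cost a self-balancing tree avoids.
    static int insertSorted(int[] a, int size, int key) {
        int pos = Arrays.binarySearch(a, 0, size, key);
        if (pos < 0) pos = -pos - 1;                        // insertion point for a missing key
        System.arraycopy(a, pos, a, pos + 1, size - pos);   // the O(n) shift
        a[pos] = key;
        return size + 1;
    }

    public static void main(String[] args) {
        int[] a = new int[8];
        int size = 0;
        for (int k : new int[]{5, 1, 9, 3}) size = insertSorted(a, size, k);
        System.out.println(Arrays.toString(Arrays.copyOf(a, size))); // [1, 3, 5, 9]
    }
}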

Why is the worst-case insertion time complexity of a hash table not N log N?

Looking at the fundamental structure of a hash table, we know that it resizes with respect to the load factor or some other deterministic parameter. I get that if the resizing limit is reached during an insertion, we need to create a bigger hash table and insert everything there. Here is the thing I don't get.
Let's consider a hash table where each bucket contains an AVL tree (a balanced BST). If my hash function returns the same index for every key, then I would store everything in the same AVL tree. I know that this hash function would be a really bad function and would not be used, but I'm considering the worst-case scenario here. So after some time, let's say the resizing factor has been reached. In order to resize, I create a new hash table and try to insert every old element from my previous table. Since the hash function maps everything back into one AVL tree, I would need to insert all N elements into the same AVL tree. N insertions into an AVL tree take O(N log N). So why is the worst case of insertion for hash tables considered O(N)?
Here is the proof that adding N elements to an AVL tree is N log N:
Running time of adding N elements into an empty AVL tree
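For reference, the standard argument behind that bound: inserting the i-th element into an AVL tree that already holds i-1 elements costs O(log i), so the total over N insertions is

\sum_{i=1}^{N} O(\log i) = O(\log N!) = O(N \log N).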
In short: it depends on how the bucket is implemented. With a linked list, it can be done in O(n) under certain conditions. For an implementation with AVL trees as buckets, this can indeed, in the worst case, result in O(n log n). In order to calculate the time complexity, the implementation of the buckets has to be known.
Frequently a bucket is not implemented with an AVL tree, or a tree in general, but with a linked list. If there is a reference to the last entry of the list, appending can be done in O(1). Otherwise we can still reach O(1) by prepending to the linked list (in that case the buckets store the data in reverse insertion order).
The idea of using a linked list is that a dictionary with a reasonable hashing function should result in few collisions. Frequently a bucket has zero or one element, sometimes two or three, but rarely more. In that case, a simpler data structure can be faster, since it usually requires fewer cycles per iteration.
Some hash tables use open addressing, where buckets are not separate data structures; if a bucket is already taken, the next free bucket is used. In that case, a search iterates over the used buckets until it finds a matching entry or reaches an empty bucket.
The Wikipedia article on Hash tables discusses how the buckets can be implemented.
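As a minimal sketch of the linked-list-bucket approach (prepending, so insertion into a bucket is O(1) and each bucket stores its entries in reverse insertion order); the class and method names are illustrative:

// Separate chaining with singly linked buckets: put() prepends in O(1);
// only the occasional resize walks all n entries, hence worst-case O(n) per insertion.
public class ChainedHashTable<K, V> {
    private static class Node<K, V> {
        final K key; V value; Node<K, V> next;
        Node(K key, V value, Node<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    @SuppressWarnings("unchecked")
    private Node<K, V>[] buckets = (Node<K, V>[]) new Node[8];
    private int size = 0;

    private int indexFor(K key, int capacity) {
        return (key.hashCode() & 0x7fffffff) % capacity;
    }

    public void put(K key, V value) {
        if (size >= buckets.length * 3 / 4) resize();           // keep the load factor below 0.75
        int i = indexFor(key, buckets.length);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; } // overwrite an existing key
        }
        buckets[i] = new Node<>(key, value, buckets[i]);        // O(1) prepend
        size++;
    }

    public V get(K key) {
        for (Node<K, V> n = buckets[indexFor(key, buckets.length)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        Node<K, V>[] old = buckets;
        buckets = (Node<K, V>[]) new Node[old.length * 2];
        for (Node<K, V> head : old) {                           // rehash every entry: O(n)
            for (Node<K, V> n = head; n != null; n = n.next) {
                int i = indexFor(n.key, buckets.length);
                buckets[i] = new Node<>(n.key, n.value, buckets[i]);
            }
        }
    }
}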

Given an array of N integers how to find the largest element which appears an even number of times in the array with minimum time complexity

You are given an array of N integers. You are asked to find the largest element which appears an even number of times in the array. What is the time complexity of your algorithm? Can you do this without sorting the entire array?
You could do it in O(n log n) with a table lookup method. For each element in the list, look it up in the table. If it is missing, insert a key-value pair with the key being the element and the value being the number of appearances (starting at one); if it is present, increment the appearances. At the end, loop through the table in O(n) and look for the largest key with an even value.
In theory, for an ideal hash table, a lookup operation is O(1). So you can find and/or insert all n elements in O(n) time, making the total complexity O(n). In practice, however, you will have trouble with space allocation (you need much more space than the data set size) and with collisions (which is why you need the extra space). This makes the O(1) lookup difficult to achieve; in the worst-case scenario it can be as much as O(n) (though this is unlikely), making the total complexity O(n^2).
Instead you can be safer with a tree-based table, that is, one where the keys are stored in a binary tree. Lookup and insertion operations are both O(log n) in this case, provided that the tree is balanced; there is a wide range of tree structures to help ensure this, e.g. red-black trees, AVL trees, splay trees, B-trees, etc. (Google is your friend). This makes the total complexity a guaranteed O(n log n).
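A short sketch of the counting approach, assuming a HashMap for the counts (so O(n) expected; swapping in a TreeMap gives the guaranteed O(n log n) version); the method name is illustrative:

import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

public class LargestEvenCount {
    // Count occurrences, then scan for the largest key whose count is even.
    static OptionalInt largestWithEvenCount(int[] a) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int x : a) counts.merge(x, 1, Integer::sum);   // O(n) expected
        return counts.entrySet().stream()                   // O(n) scan of distinct keys
                .filter(e -> e.getValue() % 2 == 0)
                .mapToInt(Map.Entry::getKey)
                .max();
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 5, 2, 2, 7, 7, 7};
        System.out.println(largestWithEvenCount(a));        // OptionalInt[5]
    }
}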

Using REDIS sorted sets, what is the time complexity of certain special operations?

On REDIS documentation, it states that insert and update operations on sorted sets are O(log(n)).
On this question they specify more details about the underlying data structure, the skip list.
However, there are a few special cases that depend on the REDIS implementation, with which I'm not familiar.
1. Adding at the head or tail of the sorted set will probably not be an O(log(n)) operation, but O(1), right? This question seems to agree, with reservations.
2. Updating the score of a member, even if the ordering doesn't change, is still O(log(n)), either because you take the element out and insert it again with the slightly different score, or because you have to check that the ordering doesn't change, so the difference between insert and update-score is only in constant operations. Right? I really hope I'm wrong in this case.
Any insights will be most welcome.
Note: skip lists will be used once the list grows above a certain size (max_ziplist_entries), below that size a zip list is used.
Re. 1st question - I believe it would still be O(log(n)), since a skip list is searched much like a balanced binary tree, so there's no assurance where the head/tail nodes are.
Re. 2nd question - according to the source, changing the score is implemented with a removing and readding the member: https://github.com/antirez/redis/blob/209f266cc534471daa03501b2802f08e4fca4fe6/src/t_zset.c#L1233 & https://github.com/antirez/redis/blob/209f266cc534471daa03501b2802f08e4fca4fe6/src/t_zset.c#L1272
In a skip list, when you insert a new element at the head or tail, you still need to update O(log n) levels. The previous head or tail can have O(log n) levels, and each may have pointers which need to be updated.
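A compact sketch of why even a head or tail insertion touches O(log n) pointers, one per level. This is an illustrative integer-key skip list, not Redis's actual zskiplist (which uses p = 1/4 and stores spans for rank queries):

import java.util.Random;

public class SkipListSketch {
    static final int MAX_LEVEL = 16;
    static final Random RNG = new Random();

    static class Node {
        int key;
        Node[] forward;                    // forward[i] = next node at level i
        Node(int key, int level) { this.key = key; this.forward = new Node[level + 1]; }
    }

    Node head = new Node(Integer.MIN_VALUE, MAX_LEVEL);   // sentinel; its key is never compared
    int level = 0;                                         // highest level currently in use

    // Coin-flip level: each extra level with probability 1/2.
    static int randomLevel() {
        int lvl = 0;
        while (RNG.nextBoolean() && lvl < MAX_LEVEL) lvl++;
        return lvl;
    }

    void insert(int key) {
        Node[] update = new Node[MAX_LEVEL + 1];
        Node x = head;
        // Walk down from the top level; even for the smallest or largest key this loop
        // visits every active level, so the work is proportional to the list height: O(log n).
        for (int i = level; i >= 0; i--) {
            while (x.forward[i] != null && x.forward[i].key < key) x = x.forward[i];
            update[i] = x;
        }
        int newLevel = randomLevel();
        if (newLevel > level) {
            for (int i = level + 1; i <= newLevel; i++) update[i] = head;
            level = newLevel;
        }
        Node node = new Node(key, newLevel);
        for (int i = 0; i <= newLevel; i++) {              // rewire one predecessor pointer per level
            node.forward[i] = update[i].forward[i];
            update[i].forward[i] = node;
        }
    }
}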

Design a highly optimized data structure to perform three operations: insert, delete and getRandom

I just had a software interview. One of the questions was to design a data structure with three methods, insert, delete and getRandom, in a highly optimized way. The interviewer asked me to think of a combination of data structures to design a new one. Insert can be designed any way, but for getRandom and delete I need to get the position of a specific element. He gave me a hint to think about the data structure which takes minimum time for sorting.
Any answer or discussion is welcomed....
Let t be the type of the elements you want to store in the data structure.
Have an extensible array elements containing all the elements in no particular order. Have a hash table indices that maps elements of type t to their position in elements.
Inserting e means:
add e at the end of elements (i.e. push_back) and note its position i
insert the mapping (e, i) into indices
Deleting e means:
find the position i of e in elements thanks to indices
overwrite e with the last element f of elements, and drop the last slot of elements
update indices: remove the mapping (e, i) and replace f's old mapping with (f, i)
Drawing one element at random (leaving it in the data structure, i.e. it's a peek, not a pop) is simply drawing an integer i in [0, elements.size()) and returning elements[i].
Assuming the hash table is well suited for your elements of type t, all three operations are O(1).
Be careful about the cases where there are 0 or 1 elements in the data structure.
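A minimal Java sketch of this array-plus-hash-table combination (the name RandomizedSet is illustrative); delete swaps the target with the last slot so the array never leaves a hole:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Random;

// elements: all values in arbitrary order; indices: value -> its position in elements.
// insert, delete and getRandom are all O(1) on average.
public class RandomizedSet<T> {
    private final ArrayList<T> elements = new ArrayList<>();
    private final HashMap<T, Integer> indices = new HashMap<>();
    private final Random rng = new Random();

    public boolean insert(T e) {
        if (indices.containsKey(e)) return false;
        indices.put(e, elements.size());                 // record its position
        elements.add(e);                                  // push_back
        return true;
    }

    public boolean delete(T e) {
        Integer i = indices.remove(e);
        if (i == null) return false;
        T last = elements.remove(elements.size() - 1);    // pop the last element f
        if (!last.equals(e)) {                            // if e was not last, move f into slot i
            elements.set(i, last);
            indices.put(last, i);
        }
        return true;
    }

    public T getRandom() {                                // peek, not pop; throws if empty
        return elements.get(rng.nextInt(elements.size()));
    }
}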
A tree might work well here. Order log(n) insert and delete, and choose random could also be log(n): start at the root node and at each junction choose a child at random (weighted by the total number of leaf nodes per child) until you reach a leaf.
The data structure which takes the least time to sort is a sorted array.
get_random() is a binary search, so O(log n).
insert() and delete() involve adding/removing the element in question and then re-sorting, which is O(n log n), i.e. horrendous.
I think his hint was poor. You may have been in a bad interview.
What I feel is that you can use some balanced version of a tree, like a red-black tree. This will give O(log n) insertion and deletion time.
For getting a random element, maybe you can have an additional hash table to keep track of the elements which are in the tree structure.
It might be a Heap (data structure).