Given an array of N integers, how to find the largest element which appears an even number of times in the array with minimum time complexity

You are given an array of N integers. You are asked to find the largest element which appears an even number of times in the array. What is the time complexity of your algorithm? Can you do this without sorting the entire array?

You could do it in O(n log n) with a table lookup method. For each element in the list, look it up in the table. If it is missing, insert a key-value pair with the key being the element and the value as the number of appearances (starting at one); if it is present, increment the appearances. At the end just loop through the table in O(n) and look for the largest key with an even value.
In theory, for an ideal hash table a lookup operation is O(1), so you can find and/or insert all n elements in O(n) time, making the total complexity O(n). However, in practice you will have trouble with space allocation (you need much more space than the size of the data set) and with collisions (which is why you need the extra space). This makes the O(1) lookup difficult to achieve; in the worst case (though this is unlikely) a single lookup can cost as much as O(n), making the total complexity O(n^2).
Instead, you can be safer with a tree-based table, i.e. one where the keys are stored in a binary tree. Lookup and insertion operations are both O(log n) in this case, provided the tree is balanced; there is a wide range of tree structures to help ensure this, e.g. red-black trees, AVL trees, splay trees, B-trees, etc. (Google is your friend). This makes the total complexity a guaranteed O(n log n).
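As a rough sketch of the tree-based variant in Java (the class and method names are my own, not from the answer): a TreeMap counts occurrences in O(n log n), then the keys are scanned from largest to smallest for an even count. Swapping the TreeMap for a HashMap would give the expected-O(n) version discussed above.

import java.util.TreeMap;

public class LargestEvenCount {
    // Returns the largest value that appears an even number of times,
    // or throws if no such value exists.
    static int largestWithEvenCount(int[] a) {
        // TreeMap keeps its keys ordered; each lookup/insert is O(log n),
        // so counting all n elements costs O(n log n).
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int x : a) {
            counts.merge(x, 1, Integer::sum);
        }
        // Scan keys from largest to smallest; stop at the first even count.
        for (int key : counts.descendingKeySet()) {
            if (counts.get(key) % 2 == 0) {
                return key;
            }
        }
        throw new IllegalArgumentException("no element appears an even number of times");
    }

    public static void main(String[] args) {
        System.out.println(largestWithEvenCount(new int[]{5, 3, 5, 7, 3, 3})); // prints 5
    }
}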

Related

What benefit does a balanced search tree provide over a sorted key-value pair array?

public class Entry {
    int key;
    String value;
}
If you have an array of Entry,
Entry[]
you can do a binary search on this array to find, insert, or remove an Entry, all in O(log n). You can also do a range search in O(log n).
And this is very simple.
What does a comparatively complicated data structure like a red-black balanced search tree give me over a simple sorted key-value array?
If the data is immutable, the tree has no benefit.
The only benefit of the array is locality of reference, i.e. the data is close together and the CPU can cache it.
Because the array is sorted, search is O(log n).
If you add or remove items, things change.
For a small number of elements, the array is better (faster) because of the locality of reference.
For a larger number of items, a red-black tree (or another self-balancing tree) will perform better, because the array has to shift elements:
e.g. insert and delete take O(log n) for the search plus on the order of n/2 element moves for the shift.
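To make that shift cost concrete, here is a small sketch (my own names, not from the answer) of inserting into a sorted int array: the binary search that finds the position is O(log n), but System.arraycopy then has to move up to n elements.

import java.util.Arrays;

public class SortedArrayInsert {
    // Inserts 'key' into the sorted prefix a[0..size-1]; returns the new size.
    // Finding the position is O(log n), but shifting the tail is O(n).
    static int insert(int[] a, int size, int key) {
        int pos = Arrays.binarySearch(a, 0, size, key);
        if (pos < 0) pos = -pos - 1;                      // insertion point for a missing key
        System.arraycopy(a, pos, a, pos + 1, size - pos); // the O(n) shift
        a[pos] = key;
        return size + 1;
    }

    public static void main(String[] args) {
        int[] a = new int[8];
        int size = 0;
        for (int x : new int[]{30, 10, 20}) size = insert(a, size, x);
        System.out.println(Arrays.toString(Arrays.copyOf(a, size))); // [10, 20, 30]
    }
}

A balanced tree such as java.util.TreeMap avoids that shift entirely, which is why it wins once n gets large.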

Why is the worst-case time complexity of hash table insertion not N log N

Looking at the fundamental structure of a hash table, we know that it resizes with respect to the load factor or some other deterministic parameter. I get that if the resizing limit is reached during an insertion, we need to create a bigger hash table and insert everything there. Here is the thing I don't get.
Let's consider a hash table where each bucket contains an AVL tree (a balanced BST). If my hash function returns the same index for every key, then I would store everything in the same AVL tree. I know that this would be a really bad hash function and would never be used, but I'm considering the worst-case scenario here. So after some time, let's say the resizing threshold has been reached, and in order to resize I create a new hash table and try to insert every element from my previous table into it. Since the hash function maps everything back into one AVL tree, I would need to insert all N elements into the same AVL tree, and N insertions into an AVL tree take O(N log N). So why is the worst case of insertion for hash tables considered O(N)?
Here is the proof that adding N elements into an AVL tree takes N log N:
Running time of adding N elements into an empty AVL tree
In short: it depends on how the bucket is implemented. With a linked list, it can be done in O(n) under certain conditions. For an implementation with AVL trees as buckets, this can indeed, in the worst case, result in O(n log n). In order to calculate the time complexity, the implementation of the buckets has to be known.
Frequently a bucket is not implemented with an AVL tree, or a tree in general, but with a linked list. If there is a reference to the last entry of the list, appending can be done in O(1). Otherwise we can still reach O(1) by prepending to the linked list (in that case the bucket stores its data in reverse insertion order).
The idea of using a linked list is that a dictionary with a reasonable hashing function should produce few collisions. Frequently a bucket holds zero or one elements, sometimes two or three, but rarely much more. In that case a simple data structure can be faster, since a simpler data structure usually requires fewer cycles per iteration.
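A minimal sketch of such separate chaining in Java, assuming a fixed number of buckets and no resizing (the class and its fields are hypothetical, not taken from any particular library): insertion is just the O(1) prepend, and lookup walks a single bucket.

public class ChainedHashTable {
    // One node per stored entry; collisions are chained in a singly linked list.
    static class Node {
        final int key;
        String value;
        Node next;
        Node(int key, String value, Node next) { this.key = key; this.value = value; this.next = next; }
    }

    private final Node[] buckets = new Node[16];

    private int index(int key) {
        return Math.floorMod(Integer.hashCode(key), buckets.length);
    }

    // Prepend to the bucket's list: O(1) regardless of the bucket's length.
    // (This sketch does not check for an existing key, so a bucket stores its
    // entries in reverse insertion order, as described above.)
    public void put(int key, String value) {
        int i = index(key);
        buckets[i] = new Node(key, value, buckets[i]);
    }

    // Walk one bucket: O(1) expected with a good hash function, O(n) in the
    // worst case where every key lands in the same bucket.
    public String get(int key) {
        for (Node n = buckets[index(key)]; n != null; n = n.next) {
            if (n.key == key) return n.value;
        }
        return null;
    }
}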
Some hash tables use open addressing, where the buckets are not separate data structures: if a bucket is already taken, the next free bucket is used. In that case a search iterates over the used buckets until it finds a matching entry or reaches an empty bucket.
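And a comparable sketch of open addressing with linear probing, again with made-up names, a fixed table size and no resizing (so it assumes the table never fills up and that Integer.MIN_VALUE is never used as a key).

import java.util.Arrays;

public class LinearProbingTable {
    private static final int EMPTY = Integer.MIN_VALUE; // sentinel for a free slot
    private final int[] keys = new int[16];
    private final String[] values = new String[16];

    public LinearProbingTable() { Arrays.fill(keys, EMPTY); }

    private int index(int key) { return Math.floorMod(Integer.hashCode(key), keys.length); }

    // Probe forward until we find the key or an empty slot, then store there.
    public void put(int key, String value) {
        int i = index(key);
        while (keys[i] != EMPTY && keys[i] != key) i = (i + 1) % keys.length;
        keys[i] = key;
        values[i] = value;
    }

    // Stop at an empty slot: the key cannot be stored beyond it.
    public String get(int key) {
        int i = index(key);
        while (keys[i] != EMPTY) {
            if (keys[i] == key) return values[i];
            i = (i + 1) % keys.length;
        }
        return null;
    }
}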
The Wikipedia article on Hash tables discusses how the buckets can be implemented.

Data structure that inserts in O(log2n), yet searches in O(1)

Currently working on an assignment that requires a data structure that inserts in O(log2 n) yet can search for an element in O(1). I was thinking of a BST because of the O(log2 n) insert, but it cannot search in O(1). A hash table can insert in at worst O(n), with a search of O(1), but unfortunately this doesn't fit the O(log2 n) insert requirement.
Anybody have any suggestions? thanks!
You can insert the element into a binary tree, and a pointer to that element into a hash table.
Or you can just insert the element into a hash table in O(1) and search in O(1), and to satisfy the "insert in O(log2 N)" requirement, just run log2(N) empty loop iterations.
Regarding "worst case/average case": both a hash table and a plain binary tree (not self-balancing like a red-black tree) are O(N) in the worst case. I think your assignment is about the average case, as usual, and there a hash table can provide O(1). To guard against attacks with an adversarially chosen data set, use universal hashing.

Searching an item in a balanced binary tree

If I have a balanced binary tree and I want to search for an item in it, will the big-O time complexity be O(n)? Does searching for an item in a binary tree, regardless of whether it is balanced or not, change the big-O time complexity from O(n)? I understand that if we have a balanced BST then searching for an item is proportional to the height of the BST, so O(log n), but what about ordinary binary trees?
The O(log n) search time in a balanced BST is facilitated by two properties:
Elements in the tree are arranged by comparison
The tree is (approximately) balanced.
If you lose either of those properties, then you will no longer get O(log n) search time.
If you are searching a balanced binary tree that is not sorted (aka not a BST) for a specific value, then you will have to check every node in the tree to be guaranteed to find the value you are looking for, so it requires O(n) time.
For an unbalanced tree, it might help if you visualize the worst case of being out of balance in which every node has exactly one child except for the leaf—essentially a linked list. If you have a completely (or mostly) unbalanced BST, searching will take O(n) time, just like a linked list.
If the unsorted binary tree is unbalanced, it still has n nodes and they are still unsorted, so it still takes O(n) time.
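A small sketch to illustrate the difference (hypothetical names): searching an unsorted binary tree may have to explore both subtrees, so it is O(n), while the BST ordering lets the search discard one subtree at every step.

public class TreeSearch {
    static class Node {
        int value;
        Node left, right;
        Node(int value, Node left, Node right) { this.value = value; this.left = left; this.right = right; }
    }

    // Unsorted binary tree: nothing tells us which side the target is on,
    // so in the worst case every node is visited, O(n).
    static boolean containsUnsorted(Node node, int target) {
        if (node == null) return false;
        return node.value == target
                || containsUnsorted(node.left, target)
                || containsUnsorted(node.right, target);
    }

    // BST: the ordering property discards half of the remaining tree at each
    // step, O(log n) when the tree is balanced, O(n) when it degenerates.
    static boolean containsBst(Node node, int target) {
        if (node == null) return false;
        if (target == node.value) return true;
        return target < node.value
                ? containsBst(node.left, target)
                : containsBst(node.right, target);
    }
}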

Sorting array where many of the keys are identical - time complexity analysis

The question goes like this:
Given an array of n elements where n^(2001/2002) of the elements are the same. The worst-case time complexity of sorting the array (with RAM model assumptions) will be:
So, I thought to use a selection algorithm to find the element whose rank is n^(2001/2002), call it P. This should take O(n). Next, I take every element which doesn't equal P and put it in another array. In total I will have k = n - n^(2001/2002) elements. Sorting this array costs O(k log k), which equals O(n log n). Finally, I find the largest element which is smaller than P and the smallest element which is bigger than P, and then I can sort the array.
All of this takes O(n log n).
Note: if k = O(n/log n), then we can reduce the time to O(n).
I have two questions: is my analysis correct, and is there any way to reduce the time complexity? Also, what are the RAM model assumptions?
Thanks!
Your analysis is wrong - there is no guarantee that the n^(2001/2002)th-smallest element is actually one of the duplicates.
n^(2001/2002) duplicates simply don't constitute enough of the input to make things easier, at least in theory. Sorting the input is still at least as hard as sorting the n - n^(2001/2002) = Θ(n) other elements, and under standard comparison-sort assumptions in the RAM model, that takes Ω(n log n) worst-case time.
(For practical input sizes, n^(2001/2002) duplicates would be at least 98% of the input, so isolating the duplicates and sorting the rest would be both easy and highly efficient. This is one of those cases where the asymptotic analysis doesn't capture the behavior we care about in practice.)
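As a sketch of that practical approach (assuming one value dominates the input; all names are my own): find the dominant value with one counting pass, sort only the k remaining elements in O(k log k), and write everything back in order.

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class SortWithManyDuplicates {
    // Sorts 'a' in place, assuming one value accounts for most of the array.
    static void sort(int[] a) {
        if (a.length == 0) return;

        // One counting pass to find the most frequent value.
        Map<Integer, Integer> counts = new HashMap<>();
        for (int x : a) counts.merge(x, 1, Integer::sum);
        int best = a[0];
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() > counts.get(best)) best = e.getKey();
        }
        final int dup = best;

        // Sort only the k elements that are not the dominant value: O(k log k).
        int[] rest = Arrays.stream(a).filter(x -> x != dup).toArray();
        Arrays.sort(rest);

        // Write back: smaller elements, then the block of duplicates, then the rest.
        int i = 0, j = 0;
        while (j < rest.length && rest[j] < dup) a[i++] = rest[j++];
        for (int c = counts.get(dup); c > 0; c--) a[i++] = dup;
        while (j < rest.length) a[i++] = rest[j++];
    }

    public static void main(String[] args) {
        int[] a = {7, 7, 3, 7, 9, 7, 7, 1, 7};
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 3, 7, 7, 7, 7, 7, 7, 9]
    }
}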