Implement an iterator on a binary heap

I am looking for a way to implement an iterator on binary heaps (maximum or minimum).
That is, calling its nextNode() function for the i-th time returns the i-th largest (or smallest) element in the heap.
Note that this operation happens without actually extracting the heap’s root!
My initial thoughts were:
Actually extract i elements, push them into a stack, and then insert them back into the heap after getting the i-th value. This takes O(i*log(n)) for each function call.
Keep an auxiliary sorted data structure, which allows looking up the next value in O(1); however, updates would take O(n).
I understand these approaches eliminate the benefits of using heaps, so I’m looking for a better approach.

It's not clear what the use-case for this is, so it's hard to say what would make a solution viable, or better than any other solution.
That said, I suggest a small alteration to the general "extract and sort" ideas already thrown around: If we're fine making changes to the data structure, we can do our sorting in place.
The basic implementation suggested on Wikipedia is a partially sorted list under the hood. We can pay a (hopefully) one-time O(n log(n)) cost to sort our heap the first time next() is called, after which next() is O(1). Critically, a fully sorted list is still a valid heap.
Furthermore, if you consider the heapsort algorithm, you can start at stage two, because you're starting with a valid heap.
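Below is a minimal sketch of that idea in Java, assuming an array-backed min-heap whose backing array you control (java.util.PriorityQueue does not expose its array, so the class name LazySortedHeapIterator and the raw int[] heap are purely illustrative). The first call to next() sorts the backing array in place, which leaves it a valid min-heap, and every later call is O(1):

    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.NoSuchElementException;

    // Sketch only: iterate a min-heap in sorted order by lazily sorting its
    // backing array on the first call to next(). An ascending-sorted array
    // still satisfies the min-heap property, so the heap remains usable.
    public class LazySortedHeapIterator implements Iterator<Integer> {
        private final int[] heap;       // backing array shared with the heap
        private final int size;         // number of valid elements
        private boolean sorted = false;
        private int next = 0;

        public LazySortedHeapIterator(int[] backingArray, int size) {
            this.heap = backingArray;
            this.size = size;
        }

        @Override
        public boolean hasNext() {
            return next < size;
        }

        @Override
        public Integer next() {
            if (!hasNext()) throw new NoSuchElementException();
            if (!sorted) {
                Arrays.sort(heap, 0, size);  // one-time O(n log n) cost
                sorted = true;
            }
            return heap[next++];             // O(1) per element afterwards
        }
    }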

To what extent shall we optimize time complexity?

Theory vs practice here.
Regarding time complexity, I have a conceptual question that we didn't get to go deeper into in class.
Here it is:
There's a barbaric brute-force algorithm, O(n^3)... and we got it down to O(n), which was considered good enough. If we dive in deeper, it is actually O(n)+O(n), two separate iterations over the input. I came up with another way which was actually O(n/2). But those two algorithms are considered the same, since both are O(n) and, as n goes to infinity, the difference doesn't matter, so no further optimization is deemed necessary once we reach O(n).
My question is:
In reality, in practice, we always have a finite number of inputs (admittedly occasionally in the trillions). So following the time complexity logic, O(n/2) is four times as fast as O(2n). So if we can make it faster, why not?
Time complexity is not everything. As you already noticed, the Big-Oh can hide a lot and also assumes that all operations cost the same.
In practice you should always try to find a fast (or the fastest) solution for your problem. Sometimes this means using an algorithm with a worse complexity but good constants if you know your problem instances are always small. Depending on your use case, you may also want to implement optimizations that exploit hardware properties, such as cache-friendly access patterns.
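As a toy illustration of the constant-factor point (the original assignment problem isn't given here, so the example is made up): both methods below find the minimum and maximum of an array in O(n), but the two-pass version does roughly 2n comparisons while the pairwise version does roughly 1.5n and reads the array only once.

    // Sketch: same asymptotic complexity, different constants.
    public class MinMax {

        // Two separate passes over the input: ~2n comparisons.
        static int[] minMaxTwoPasses(int[] a) {
            int min = a[0], max = a[0];
            for (int x : a) min = Math.min(min, x);
            for (int x : a) max = Math.max(max, x);
            return new int[] {min, max};
        }

        // One pass over pairs: ~1.5n comparisons, single traversal.
        static int[] minMaxPairs(int[] a) {
            int min = a[0], max = a[0];
            int start = (a.length % 2 == 0) ? 0 : 1;  // a[0] already counted for odd lengths
            for (int i = start; i + 1 < a.length; i += 2) {
                int lo = Math.min(a[i], a[i + 1]);
                int hi = Math.max(a[i], a[i + 1]);
                if (lo < min) min = lo;
                if (hi > max) max = hi;
            }
            return new int[] {min, max};
        }
    }

Whether the difference is measurable in practice depends on the hardware and the data size; the point is only that the Big-O class does not capture it.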

Iterative deepening in minimax - sorting all legal moves, or just finding the PV-move then using MVV-LVA?

After reading the chessprogramming wiki and other sources, I've been confused about what the exact purpose of iterative deepening. My original understanding was the following:
It consisted of a minimax search performed at depth=1, depth=2, etc. until reaching the desired depth. After the minimax search at each depth, sort the root-node moves according to the results from that search, to get optimal move ordering in the next search at depth+1, so that in the next, deeper search the PV-move is searched first, then the next best move, then the next best move after that, and so on.
Is this correct? Doubts emerged when I read about MVV-LVA ordering, specifically about ordering captures, and additionally, using hash tables and such. For example, this page recommends a move ordering of:
1. PV-move of the principal variation from the previous iteration of an iterative deepening framework for the leftmost path, often implicitly done by 2.
2. Hash move from hash tables
3. Winning captures/promotions
4. Equal captures/promotions
5. Killer moves (non capture), often with mate killers first
6. Non-captures sorted by history heuristic and that like
7. Losing captures
If so, then what's the point of sorting the results of the minimax search at each depth, if only the PV-move is needed? On the other hand, if the whole point of ID is the PV-move, won't it be a waste to run a minimax search at every single depth up to the desired depth just to calculate the PV-move of each one?
What is the concrete purpose of ID, and how much computation does it save?
Correct me if I am wrong, but I think you are mixing 2 different concepts here.
Iterative deepening is mainly used to set a maximum search time for each move. The AI goes deeper and deeper, and when the allotted time is up it returns the move from the latest depth it finished searching. Since each increase in depth leads to exponentially longer search times, searching every depth from e.g. 1 to 12 takes almost the same time as only searching at depth 12.
Sorting the moves is done to maximize the effect of alpha-beta pruning. For optimal alpha-beta pruning you want to look at the best move first, which is of course impossible to know beforehand, but the ordering you listed above is a good guess. Just make sure the sorting itself doesn't slow down your recursive function so much that it cancels out the gains from alpha-beta.
Hope this helps and that I understood your question correctly.
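For what it's worth, here is a minimal, hedged sketch of that time-control loop in Java; the SearchResult record and the searchToDepth function are made-up placeholders for whatever your engine provides, and only the deepening/time-budget logic is shown.

    import java.util.function.IntFunction;

    // Sketch of iterative deepening with a time budget: keep the result of the
    // deepest fully completed search and stop when the budget runs out.
    public class IterativeDeepening {

        public record SearchResult(String bestMove, int score) {}

        public static SearchResult search(IntFunction<SearchResult> searchToDepth,
                                          int maxDepth, long timeBudgetMillis) {
            long deadline = System.currentTimeMillis() + timeBudgetMillis;
            SearchResult best = null;
            for (int depth = 1;
                 depth <= maxDepth && System.currentTimeMillis() < deadline;
                 depth++) {
                // A real engine would also abort mid-search when time runs out
                // and fall back to the previous depth's result.
                best = searchToDepth.apply(depth);
            }
            return best;
        }

        public static void main(String[] args) {
            // Dummy "search" that just labels the depth it reached.
            SearchResult r = search(d -> new SearchResult("move@depth" + d, d), 12, 50);
            System.out.println(r);
        }
    }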

Does the add method of LinkedList have better performance than that of ArrayList?

I am writing a program in Java for my application and I am concerned about speed performance. I have done some benchmarking tests and it seems to me the speed is not good enough. I think it has to do with the add and get methods of the ArrayList, since when I profile with the JVM and take a snapshot, it shows that most of the time is spent in the add and get methods of ArrayList.
I read some years ago, when I took the OCPJP test, that if you do a lot of adds and deletes you should use LinkedList, but if you want fast iteration you should use ArrayList. In other words, use ArrayList when you will mostly call get, and LinkedList when you will mostly call add, and I have done that.
I am not sure anymore if this is right or not.
I would appreciate any advice on whether I should stick with that, or whether there is another way to improve my performance.
I think it has to do with the add and get methods of the ArrayList, since when I profile with the JVM and take a snapshot, it shows that most of the time is spent in the add and get methods of ArrayList
It sounds like you have used a profiler to check what the actual issues are -- that's the first place to start! Are you able to post the results of the analysis that might, perhaps, hint at the calling context? The speed of some operations differs between the two implementations, as summarized in other questions. If the calls you see are really made from another method inside the List implementation, you might be chasing the wrong thing (e.g. frequent inserts near the front of an ArrayList, which can cause terrible performance).
In general, performance will depend on the implementation, but when running benchmarks myself under real-world conditions I have found that ArrayLists generally fit my use case better, provided I am able to size them appropriately on creation.
LinkedList may or may not keep a pool of pre-allocated memory for new nodes, but once the pool is empty (if present at all) it will have to go allocate more -- an expensive operation relative to CPU speed! That said, it only has to allocate at least enough space for one node and then tack it onto the tail; no copies of any of the data are made.
An ArrayList exposes the part of its implementation that pre-allocates more space than is actually required for the underlying array, growing the array as elements are added. If you create an ArrayList with the default constructor, it starts with an internal array size of 10 elements. The catch is that when the list outgrows that initially allocated size, it must allocate a new contiguous block of memory large enough for the old and the new elements, and then copy the elements from the old array into the new one.
In short, if you:
use ArrayList
do not specify an initial capacity that guarantees all items fit
proceed to grow the list far beyond its original capacity
you will incur a lot of overhead from copying items. If that is the problem, the cost is at least amortized over the long run, since once the list has grown it rarely needs to re-size again ... unless, of course, you repeat the whole process with a new list rather than re-using the original, which has now grown in size.
As for iteration, an array is composed of a contiguous chunk of memory. Since many items may be adjacent, fetches from main memory can end up being much faster than with the nodes of a LinkedList, which could be scattered all over depending on how things get laid out in memory. I'd strongly suggest trusting the profiler's numbers, trying the different implementations, and tracking down what is actually going on.
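To make the re-sizing cost concrete, here is a small sketch: the only difference between the two lists is the initial capacity, so any gap in the timings comes from the repeated grow-and-copy cycles of the default-capacity list. The numbers are illustrative only; a serious comparison should use a benchmark harness such as JMH.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch: default-capacity ArrayList vs. one pre-sized to the final element count.
    public class ArrayListCapacityDemo {
        public static void main(String[] args) {
            final int n = 5_000_000;

            long t0 = System.nanoTime();
            List<Integer> defaultCapacity = new ArrayList<>();  // grows (and copies) repeatedly
            for (int i = 0; i < n; i++) defaultCapacity.add(i);
            long t1 = System.nanoTime();

            List<Integer> presized = new ArrayList<>(n);        // single allocation, no copies
            for (int i = 0; i < n; i++) presized.add(i);
            long t2 = System.nanoTime();

            System.out.printf("default capacity: %d ms%n", (t1 - t0) / 1_000_000);
            System.out.printf("pre-sized:        %d ms%n", (t2 - t1) / 1_000_000);
        }
    }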

How is a hash map stored?

I have an upcoming interview and was looking through some technical interview questions, and I came across this one. It asks for the time complexity of the insertion and deletion functions of a hash map. The consensus seems to be that the time complexity is O(1) if the hash map is distributed evenly, but O(n) if all the entries end up in the same bucket.
I guess my question is how exactly are hash maps stored in memory? How would these 2 cases happen?
One answer on your linked page is:
insertion always would be O(1) if even not properly distributed (if we make linked list on collision) but Deletion would be O(n) in worst case.
This is not a good answer. A generalized answer to the time complexity of a hash map would arrive at a similar statement to the one in the Wikipedia article on hash tables:
Time complexity in big O notation:

             Average    Worst case
    Space    O(n)       O(n)
    Search   O(1)       O(n)
    Insert   O(1)       O(n)
    Delete   O(1)       O(n)
To address your question of how hash maps are stored in memory: there are a number of "buckets" that store the values; in the average case each bucket holds a single value, but it must be expanded into some kind of list when a hash collision occurs. Good explanations of hash tables are the Wikipedia article, this SO question, and this C++ example.
The time complexity table above looks like this because in the average case a hash map just looks up and stores single values, but collisions make everything O(n) in the worst case, where all your elements share one bucket and the behaviour is similar to that of the list implementation you chose for that case.
Note that there are specialized implementations that address the worst cases here, also described in the Wikipedia article, but each of them has other disadvantages, so you'll have to choose the best one for your use case.
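To make the bucket layout concrete, here is a minimal separate-chaining sketch in Java. The class is purely illustrative (java.util.HashMap is implemented differently; among other things it resizes, and since Java 8 it can turn long chains into trees), but it shows why the average case touches one bucket while the all-in-one-bucket worst case degrades to a linear scan.

    import java.util.LinkedList;

    // Sketch: fixed number of buckets, each a list of entries (separate chaining).
    public class ChainedHashMap<K, V> {

        private static final int BUCKET_COUNT = 16;  // fixed here; real maps resize

        private static final class Entry<K, V> {
            final K key;
            V value;
            Entry(K key, V value) { this.key = key; this.value = value; }
        }

        @SuppressWarnings("unchecked")
        private final LinkedList<Entry<K, V>>[] buckets = new LinkedList[BUCKET_COUNT];

        private int indexFor(K key) {
            // Map the hash code to a bucket index.
            return (key.hashCode() & 0x7fffffff) % BUCKET_COUNT;
        }

        public void put(K key, V value) {
            int i = indexFor(key);
            if (buckets[i] == null) buckets[i] = new LinkedList<>();
            for (Entry<K, V> e : buckets[i]) {        // collision: walk the chain
                if (e.key.equals(key)) { e.value = value; return; }
            }
            buckets[i].add(new Entry<>(key, value));  // average case: O(1)
        }

        public V get(K key) {
            int i = indexFor(key);
            if (buckets[i] == null) return null;
            for (Entry<K, V> e : buckets[i]) {        // worst case: every entry is here
                if (e.key.equals(key)) return e.value;
            }
            return null;
        }
    }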

Is NSMutableArray really a good backing store for stacks or queues?

I've read somewhere that NSMutableArray has O(1) performance instead of O(n) when elements are added/removed at the ends of the array (e.g. removeObjectAtIndex:0 or removeLastObject), which makes it suitable for use as a stack or queue, negating the need to create a linked-list implementation for those container types.
Is this really the case? If so, how did Apple manage to do this? If not, is there any evidence showing that the time taken to add/remove elements at either end of an NSMutableArray increases as the number of elements in the array increases?
PS: Since NSMutableArray is essentially CFArray (its "pure C" counterpart), and the source code to CFArray is open, it should be possible to inspect its inner workings.
_NSArrayM (which is used instead of CFArray for most NSArrays) is currently an array-deque, which does provide amortized O(1) push/pop at both ends.
(This is not guaranteed to stay that way on any past or future OS version; _NSArrayM itself is quite new, for example.)
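For intuition about how an array can support O(1) adds and removes at both ends, here is a minimal circular-buffer (ring) deque sketch in Java. It is only an analogy for the array-deque idea, not Apple's _NSArrayM code, and it omits the growing step that makes the O(1) amortized.

    // Sketch: fixed-capacity ring buffer; a head index plus a count, with indices
    // wrapping modulo the array length, gives O(1) operations at both ends.
    public class RingDeque<T> {
        private final Object[] buf;
        private int head;   // index of the first element
        private int size;

        public RingDeque(int capacity) { buf = new Object[capacity]; }

        public void addFirst(T x) {
            if (size == buf.length) throw new IllegalStateException("full");
            head = (head - 1 + buf.length) % buf.length;  // step head left, wrapping
            buf[head] = x;
            size++;
        }

        public void addLast(T x) {
            if (size == buf.length) throw new IllegalStateException("full");
            buf[(head + size) % buf.length] = x;          // slot just past the tail
            size++;
        }

        @SuppressWarnings("unchecked")
        public T removeFirst() {
            if (size == 0) throw new IllegalStateException("empty");
            T x = (T) buf[head];
            buf[head] = null;
            head = (head + 1) % buf.length;
            size--;
            return x;
        }

        @SuppressWarnings("unchecked")
        public T removeLast() {
            if (size == 0) throw new IllegalStateException("empty");
            int tail = (head + size - 1) % buf.length;
            T x = (T) buf[tail];
            buf[tail] = null;
            size--;
            return x;
        }
    }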
CFArray/CFMutableArray (and by extension, NSArray/NSMutableArray) have very loose performance guarantees; they certainly don't guarantee O(1) insert/delete performance.
From CFArray.h (emphasis added):
Computational Complexity
The access time for a value in the array is guaranteed to be at worst O(lg N) for any implementation, current and future, but will often be O(1) (constant time). Linear search operations similarly have a worst case complexity of O(N*lg N), though typically the bounds will be tighter, and so on. Insertion or deletion operations will typically be linear in the number of values in the array, but may be O(N*lg N) clearly in the worst case in some implementations. There are no favored positions within the array for performance; that is, it is not necessarily faster to access values with low indices, or to insert or delete values with high indices, or whatever.
Core Foundation/Foundation doesn't currently provide any data structures that model the performance of a linked list.
It might be worth using Objective-C++ and one of the STL/Boost containers if the data store is used on its own (i.e. not used as a backing store for tree/array controllers).