Is NSMutableArray really a good backing store for stacks or queues? - objective-c

I've read somewhere that NSMutableArray will have O(1) performance instead of O(n) when elements are added/removed at the ends of the array (e.g. removeObjectAtIndex:0 or removeLastObject), which makes it suitable for use as a stack or queue, negating the need to create a linked-list implementation for those container types.
Is it really the case? If so, how did Apple manage to do this? If not, is there any evidence showing that the time taken to add/remove elements at either end of NSMutableArray instances increases as the number of elements in the array increases?
PS: Since NSMutableArray is essentially CFArray (its "pure-C" counterpart), and the source code to CFArray is open, it should be possible to inspect its inner workings.

_NSArrayM (which is used instead of CFArray for most NSArrays these days) is currently an array deque, which does provide amortized O(1) push/pop at both ends.
(This is not guaranteed to hold on any past or future OS version; _NSArrayM itself is quite new, for example.)
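For intuition, here is roughly what such an array deque looks like. This is a minimal Java sketch (all names hypothetical, not Apple's implementation): a ring buffer whose head wraps around, doubled when full, giving amortized O(1) push/pop at both ends.

```java
// Minimal array-deque sketch (hypothetical, for illustration only):
// a ring buffer that doubles when full, so pushes/pops at either end
// are amortized O(1).
final class ArrayDequeSketch {
    private Object[] buf = new Object[4];
    private int head = 0;   // index of the first element
    private int size = 0;

    void pushFront(Object x) {
        growIfFull();
        head = (head - 1 + buf.length) % buf.length;  // step back, wrapping around
        buf[head] = x;
        size++;
    }

    void pushBack(Object x) {
        growIfFull();
        buf[(head + size) % buf.length] = x;
        size++;
    }

    Object popFront() {
        Object x = buf[head];
        buf[head] = null;
        head = (head + 1) % buf.length;
        size--;
        return x;
    }

    Object popBack() {
        int i = (head + size - 1) % buf.length;
        Object x = buf[i];
        buf[i] = null;
        size--;
        return x;
    }

    int size() { return size; }

    private void growIfFull() {
        if (size < buf.length) return;
        Object[] bigger = new Object[buf.length * 2];
        for (int i = 0; i < size; i++)   // copy elements contiguously into the new buffer
            bigger[i] = buf[(head + i) % buf.length];
        buf = bigger;
        head = 0;
    }
}
```

The occasional doubling copy is what makes the O(1) amortized rather than worst-case, which is consistent with the loose guarantees quoted below.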

CFArray/CFMutableArray (and by extension, NSArray/NSMutableArray) have very loose performance guarantees; they certainly don't guarantee O(1) insert/delete performance.
From CFArray.h (emphasis added):
Computational Complexity
The access time for a value in the array is
guaranteed to be at worst O(lg N) for any implementation, current and
future, but will often be O(1) (constant time). Linear search
operations similarly have a worst case complexity of O(N*lg N),
though typically the bounds will be tighter, and so on. Insertion or
deletion operations will typically be linear in the number of values
in the array, but may be O(N*lg N) clearly in the worst case in some
implementations. There are no favored positions within the array for
performance; that is, it is not necessarily faster to access values
with low indices, or to insert or delete values with high indices, or
whatever.
Core Foundation/Foundation doesn't currently provide any data structures that model the performance of a linked list.

It might be worth using Objective-C++ and one of the STL/Boost containers if the data store is used on its own (i.e. not used as a backing store for tree/array controllers).

Related

In CUDA programming, is atomic function faster than reducing after calculating the intermediate results?

Atomic functions (such as atomic_add) are widely used for counting or performing summation/aggregation in CUDA programming. However, I cannot find information about the speed of atomic functions compared with ordinary global memory reads/writes.
Consider the following task, where we want to calculate a floating-point array with 256K elements. Each element is the sum of 1,000 intermediate variables, which are calculated first. One approach is to use atomic_add; another approach is to use a 256K*1000 temporary array for the intermediate results and then reduce this array (by taking the summation).
Is the first approach using atomic function faster than the second?
In your specific case, even without you providing a concrete program, one does not need to know anything about the difference in latency or in bandwidth between atomic and non-atomic operations to rule out both your approaches: They are both quite inefficient.
You should have single blocks handling single output variables (or a small number of output variables), so that the sum of each 1,000 intermediate variables is not performed via global memory. You may want to read the "classic" presentation by Mark Harris:
Optimizing Parallel Reduction in CUDA
to get the basics. There have been improvements over this in recent years, due to newer hardware capabilities. For a more recent actual implementation, see the CUB library's block reduction primitive.
Also relevant: CUDA: how to sum all elements of an array into one number within the GPU?
If you implement it this way, each output element will only be written to once. And even if the computation of the 1,000 intermediates somehow needs to be distributed among multiple blocks (for whatever reason you have not shared in the question), you should still distribute it over a smaller number of blocks rather than 1,000, so that the global-memory writes of the result take up a small enough fraction of the total computation time that it is not worth bothering with anything other than an atomic addition.
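To get a feel for the indexing, the tree-reduction pattern at the heart of that presentation can be mocked up sequentially. This is a plain Java sketch (names hypothetical; a real kernel runs the inner loop in parallel across the threads of a block, and this sketch assumes a power-of-two length):

```java
// Sequential mock of the GPU tree-reduction pattern: at each step,
// element i accumulates element i + stride, halving the active range
// until index 0 holds the total. On a GPU each iteration of the inner
// loop would execute in parallel across the threads of a block.
// Assumes data.length is a power of two.
final class TreeReduce {
    static float sum(float[] data) {
        float[] a = java.util.Arrays.copyOf(data, data.length);
        for (int stride = a.length / 2; stride > 0; stride /= 2) {
            for (int i = 0; i < stride; i++) {
                a[i] += a[i + stride];   // pairwise partial sums
            }
        }
        return a[0];
    }
}
```

The point of the pattern is that partial sums stay in fast (shared) memory, and only the single final value per block touches global memory, whether via one plain write or one atomic add.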

Implement an iterator on a binary heap

I am looking for a way to implement an iterator on binary heaps (maximum or minimum).
That is, the i-th call to its nextNode() function gets the i-th (largest or smallest) element in the heap.
Note that this operation happens without actually extracting the heap’s root!
My initial thoughts were:
Actually extract i elements, push them into a stack, and then insert them back into the heap after getting the i-th value. This takes O(i*log(n)) for each function call.
Keep an auxiliary sorted data structure, which allows looking up the next value in O(1); however, updates would take O(n).
I understand these approaches eliminate the benefits of using heaps, so I’m looking for a better approach.
It's not clear what the use-case for this is, so it's hard to say what would make a solution viable, or better than any other solution.
That said, I suggest a small alteration to the general "extract and sort" ideas already thrown around: If we're fine making changes to the data structure, we can do our sorting in place.
The basic implementation suggested on Wikipedia is a partially sorted list under the hood. We can pay a (hopefully) one-time O(n log(n)) cost to sort our heap the first time next() is called, after which next() is O(1). Critically, a fully sorted list is still a valid heap.
Furthermore, if you consider the heapsort algorithm, you can start at stage two, because you're starting with a valid heap.
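A minimal sketch of that idea in Java (hypothetical API; assumes the input array is a valid binary max-heap):

```java
// Lazily-sorting heap iterator (hypothetical API; assumes the input
// is a valid binary max-heap stored as an array). The first next()
// call pays a one-time O(n log n) cost running only stage two of
// heapsort, since the array already satisfies the heap property;
// every call after that is O(1). The ascending result, read back to
// front, yields the elements largest-first.
final class HeapIterator {
    private final int[] a;       // backing array, initially a valid max-heap
    private boolean sorted = false;
    private int cursor;

    HeapIterator(int[] maxHeap) { this.a = maxHeap; }

    int next() {
        if (!sorted) {           // heapsort stage two: repeatedly move the max to the end
            for (int end = a.length - 1; end > 0; end--) {
                int t = a[0]; a[0] = a[end]; a[end] = t;
                siftDown(0, end);
            }
            sorted = true;
            cursor = a.length;   // a[] is now ascending; iterate from the back
        }
        return a[--cursor];      // the i-th call returns the i-th largest element
    }

    private void siftDown(int i, int end) {
        while (true) {
            int l = 2 * i + 1, r = l + 1, big = i;
            if (l < end && a[l] > a[big]) big = l;
            if (r < end && a[r] > a[big]) big = r;
            if (big == i) return;
            int t = a[i]; a[i] = a[big]; a[big] = t;
            i = big;
        }
    }
}
```

Note that a descending array is itself a valid max-heap (every parent precedes, and is no smaller than, its children), so the structure can keep serving as a heap after the sort.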

Does the add method of LinkedList have better performance than that of ArrayList?

I am writing a program in Java for my application and I am concerned about speed performance. I have done some benchmarking tests and it seems to me the speed is not good enough. I think it has to do with the add and get methods of the ArrayList, since when I use the JVM profiler and take a snapshot it tells me that the add and get methods of ArrayList take more seconds.
I read some years ago, when I took the OCPJP test, that if you want to do a lot of adds and deletes you should use LinkedList, but if you want fast iteration you should use ArrayList. In other words, use ArrayList when you will use the get method and LinkedList when you will use the add method, and I have done that.
I am not sure anymore if this is right or not.
I would like somebody to advise me on whether I should stick with that, or whether there is any other way I can improve my performance.
I think it has to do with the add and get methods of the ArrayList, since when I use the JVM profiler and take a snapshot it tells me that the add and get methods of ArrayList take more seconds
It sounds like you have used a profiler to check what the actual issues are -- that's the first place to start! Are you able to post the results of the analysis that might, perhaps, hint at the calling context? The speed of some operations differs between the two implementations, as summarized in other questions. If the calls you see are really made from another method in the List implementation, you might be chasing the wrong thing (e.g. frequently inserting near the front of an ArrayList, which can cause terrible performance).
In general performance will depend on the implementation, but when running benchmarks myself with real-world conditions I have found that ArrayList-s generally fit my use case better if able to size them appropriately on creation.
LinkedList may or may not keep a pool of pre-allocated memory for new nodes, but once the pool is empty (if present at all) it will have to go allocate more -- an expensive operation relative to CPU speed! That said, it only has to allocate at least enough space for one node and then tack it onto the tail; no copies of any of the data are made.
An ArrayList is backed by an internal array that pre-allocates more space than is actually required, growing it as elements are added. If you initialize an ArrayList with the default constructor, it starts with an internal array size of 10 elements. The catch is that when the list outgrows that initially-allocated size, it must allocate a contiguous block of memory large enough for the old and the new elements and then copy the elements from the old array into the new one.
In short, if you:
use ArrayList
do not specify an initial capacity that guarantees all items fit
proceed to grow the list far beyond its original capacity
you will incur a lot of overhead copying items. If that is the problem, over the long run that cost should be amortized, since later additions no longer trigger re-sizing ... unless, of course, you repeat the whole process with a new list rather than re-using the original that has now grown in size.
As for iteration, an array is composed of a contiguous chunk of memory. Since many items may be adjacent, fetches of data from main memory can end up being much faster than the nodes in a LinkedList that could be scattered all over depending on how things get laid out in memory. I'd strongly suggest trusting the numbers of the profiler using the different implementations and tracking down what might be going on.
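To make the pre-sizing point concrete, here is a small sketch (helper name hypothetical): both calls produce identical lists, but the pre-sized one never has to reallocate its backing array while filling.

```java
import java.util.ArrayList;
import java.util.List;

// Pre-sizing an ArrayList avoids the repeated grow-and-copy cycles
// described above. Both branches end up with identical contents; the
// pre-sized one simply never reallocates while filling.
final class Presize {
    static List<Integer> fill(int n, boolean presize) {
        // new ArrayList<>(n) allocates the backing array up front;
        // the default constructor starts small and grows repeatedly.
        List<Integer> list = presize ? new ArrayList<>(n) : new ArrayList<>();
        for (int i = 0; i < n; i++) list.add(i);
        return list;
    }
}
```

Whether this matters in practice depends on n and on how often the lists are built, which is exactly the kind of question the profiler should settle.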

Basic Queue Optimizations

How would one optimize a queue for the typical:
access / store
memory usage
I'm not sure of any way to reduce memory besides trying to run a compression algorithm on it, but that would take quite a deal of store time as a tradeoff - one would have to recompress everything, I think.
As such I'm thinking of the typical linked list with pointers... or a circular queue?
Any ideas?
Thanks
Edit: regardless of what is above: how does one make the fastest / least memory-intensive basic queue structure, essentially?
Linked lists are actually not very typical (except in functional languages or when newbies mistakenly think that a linked list is faster than a dynamic array). A dynamic circular buffer is more typical. The growing (and, optionally, shrinking) works slightly differently than in a dynamic array: if the "data holding part" crosses the end of the array, the data should be copied to the new space in such a way that it remains contiguous (simply extending the array would create a gap in the middle of the data).
As usual, it has some advantages and some drawbacks.
Drawbacks:
slightly more complicated implementation
not suitable for lock-free synchronization
Advantages:
more compact: in the worst case (when it has just grown, or is just about to shrink but hasn't yet) it has a space overhead of about 100%; a singly linked list almost always has an overhead of 100% or more (unless the data elements are larger than a pointer), and a doubly linked list is even worse.
cache efficient: reading happens close to previous reading, writing happens close to previous writing. So cache misses are rare, and when they do occur, they read data that is mostly relevant (or in the case of writing: they get a cache line that will probably be written to again soon). In a linked list, locality is poor and about half of every cache miss is wasted on the overhead (pointers to other nodes).
Usually these advantages outweigh the drawbacks.
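For what it's worth, Java's java.util.ArrayDeque is exactly this kind of dynamic circular buffer, and it behaves as described above. A minimal usage sketch (wrapper name hypothetical):

```java
import java.util.ArrayDeque;

// java.util.ArrayDeque is a growable circular buffer: enqueue at the
// tail, dequeue at the head, both amortized O(1), with the compact,
// cache-friendly layout described above.
final class QueueDemo {
    static int[] roundTrip(int[] items) {
        ArrayDeque<Integer> q = new ArrayDeque<>();
        for (int x : items) q.addLast(x);       // enqueue at the tail
        int[] out = new int[items.length];
        for (int i = 0; i < out.length; i++)
            out[i] = q.removeFirst();           // dequeue from the head, FIFO order
        return out;
    }
}
```

The same structure works as a stack (push/pop at one end), which is why a single array-backed deque often replaces both linked-list stacks and linked-list queues.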

What structure should I use to store these objects?

I am trying to implement something similar to the Flight Control game. There will be a set of objects representing planes that get spawned and removed 'randomly'. Individual planes can then get touched and will respond. The model should take a plane index as a parameter when something gets touched.
My storage requirements are:
Need fast iteration over all elements
Need fast insertion / deletion
Need to look up an item quickly by index
What should I use? NSMutableArray, NSMutableSet ?
Should I store each object in two places? (e.g. Set for fast iteration, Array for fast lookup)?
NSMutableArray is good enough if you want to look up only by index. The problem may be deletion, which takes O(n). When you do not need index persistence, you may delete in O(1) by moving the last item into the deleted item's slot and shortening the array by 1.
Storing at two places would be slow in this case, because it would not bring any advantage in searching speed, but would require maintaining two containers.
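The O(1) swap-with-last deletion described above can be sketched like so (Java for illustration; the NSMutableArray equivalent would combine replaceObjectAtIndex:withObject: with removeLastObject):

```java
import java.util.List;

// O(1) unordered removal: overwrite the doomed slot with the last
// element, then drop the last element. Order is not preserved, so
// this only works when indices need not stay stable across deletes.
final class SwapRemove {
    static <T> void removeAt(List<T> list, int index) {
        int last = list.size() - 1;
        list.set(index, list.get(last));  // move the last item into the hole
        list.remove(last);                // removing the tail is O(1); no shifting
    }
}
```

The trade-off is exactly the index-persistence caveat above: any stored index pointing at the old last element now points at the filled hole.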
Storing in two places seems silly. An array should be fine, with O(n) iteration and O(1) lookup by index. I am not familiar enough with Objective-C to know the deletion or insertion speed, but both should be plenty fast if some system-level array-copy facilities are used.