Does the add method of LinkedList have better performance than that of ArrayList?

I am writing a Java program for my application and I am concerned about its speed. I have done some benchmarking, and the performance does not seem good enough. I think it has to do with the add and get methods of ArrayList, because when I profile the JVM and take a snapshot, it tells me that most of the time is spent in ArrayList's add and get methods.
I read some years ago, when I took the OCPJP exam, that if you do a lot of adds and deletes you should use LinkedList, but if you want fast iteration you should use ArrayList. In other words, use ArrayList when you will mostly call get, and LinkedList when you will mostly call add, and that is what I have done.
I am not sure anymore whether this rule is right. I would appreciate advice on whether I should stick with it, or whether there is some other way to improve my performance.

I think it has to do with the add and get methods of ArrayList, because when I profile the JVM and take a snapshot, it tells me that most of the time is spent in ArrayList's add and get methods
It sounds like you have used a profiler to check what the actual issues are; that's the right place to start! Are you able to post the results of the analysis, which might hint at the calling context? The speed of some operations differs between the two implementations, as summarized in other questions. If the calls you see are really made from another method in the List implementation, you might be chasing the wrong thing (e.g. frequent inserts near the front of an ArrayList, which can cause terrible performance), as in the sketch below.
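For instance, here is a minimal sketch of that pathological pattern (the element count is arbitrary, and a serious benchmark should use a harness such as JMH):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class FrontInsertDemo {
    public static void main(String[] args) {
        // Inserting at index 0 forces an ArrayList to shift every existing
        // element one slot to the right, so N front inserts cost O(N^2).
        List<Integer> arrayList = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < 100_000; i++) {
            arrayList.add(0, i);
        }
        System.out.println("ArrayList front inserts: "
                + (System.nanoTime() - start) / 1_000_000 + " ms");

        // A LinkedList just links in a new head node: O(1) per insert.
        List<Integer> linkedList = new LinkedList<>();
        start = System.nanoTime();
        for (int i = 0; i < 100_000; i++) {
            linkedList.add(0, i);
        }
        System.out.println("LinkedList front inserts: "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```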
In general, performance will depend on the implementation, but when running benchmarks myself under real-world conditions, I have found that ArrayLists generally fit my use cases better when I am able to size them appropriately on creation.
A LinkedList may or may not keep a pool of pre-allocated memory for new nodes, but once the pool is empty (if it exists at all) it has to allocate more, which is an expensive operation relative to CPU speed. That said, it only has to allocate enough space for one node and link it onto the tail; none of the existing data is copied.
An ArrayList pre-allocates more space for its underlying array than is actually required, growing it as elements are added. A default-constructed ArrayList starts with an internal capacity of ten elements. The catch is that when the list outgrows its allocated size, it must allocate a new contiguous block of memory large enough for both the old and the new elements and then copy the elements from the old array into the new one.
In short, if you:
use ArrayList
do not specify an initial capacity that guarantees all items fit
proceed to grow the list far beyond its original capacity
you will incur a lot of overhead copying items. If that is the problem, the cost should be amortized over the long run, since future additions no longer trigger resizing... unless, of course, you repeat the whole process with a new list rather than re-using the original, which has now grown in size.
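Here is a hedged sketch of the difference (the element count is arbitrary); the pre-sized list never copies its backing array during the fill:

```java
import java.util.ArrayList;
import java.util.List;

public class CapacityDemo {
    public static void main(String[] args) {
        int expected = 1_000_000;

        // Default construction starts at capacity 10; the backing array
        // is reallocated and copied repeatedly as the list grows.
        List<Double> grown = new ArrayList<>();
        for (int i = 0; i < expected; i++) {
            grown.add((double) i);
        }

        // Pre-sizing allocates the backing array once, so no element
        // is ever copied while filling.
        List<Double> preSized = new ArrayList<>(expected);
        for (int i = 0; i < expected; i++) {
            preSized.add((double) i);
        }

        // An existing ArrayList can also be grown up front:
        ((ArrayList<Double>) grown).ensureCapacity(2 * expected);
    }
}
```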
As for iteration, an array occupies a contiguous chunk of memory. Since many items are adjacent, fetches from main memory can be much faster than chasing the nodes of a LinkedList, which may be scattered all over the heap depending on how memory gets laid out. I'd strongly suggest trusting the profiler's numbers while trying the different implementations, and tracking down what is actually going on.
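To see the iteration effect yourself, here is a rough sketch (again with arbitrary sizes; JMH or similar would give trustworthy numbers):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class IterationDemo {
    // Sums the list. With an ArrayList the elements sit in one contiguous
    // backing array, so the prefetcher keeps cache misses rare; LinkedList
    // nodes can be scattered across the heap.
    static long sum(List<Integer> list) {
        long total = 0;
        for (int value : list) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>();
        List<Integer> linkedList = new LinkedList<>();
        for (int i = 0; i < 1_000_000; i++) {
            arrayList.add(i);
            linkedList.add(i);
        }
        long start = System.nanoTime();
        System.out.println("ArrayList sum " + sum(arrayList) + " in "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
        start = System.nanoTime();
        System.out.println("LinkedList sum " + sum(linkedList) + " in "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```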

Related

Is NSMutableArray really a good backing store for stacks or queues?

I've read somewhere that NSMutableArray will have O(1) performance instead of O(n) when elements are added or removed at the ends of the array (e.g. removeObjectAtIndex:0 or removeLastObject), which makes it suitable for use as a stack or queue and negates the need to write a linked-list implementation for those container types.
Is it really the case? If so, how did Apple manage to do this? If not, is there any evidence showing that the time taken to add or remove elements at either end of an NSMutableArray increases as the number of elements in the array increases?
PS: Since NSMutableArray is essentially CFArray (its pure-C counterpart), and the source code to CFArray is open, it should be possible to inspect its inner workings.
_NSArrayM (which is used instead of CFArray for most NSArrays) is currently an array-deque, which does provide amortized O(1) push/pop at both ends.
(This is not guaranteed to stay that way on any past or future OS version; _NSArrayM itself is quite new, for example.)
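Apple's _NSArrayM can't be inspected directly, but the array-deque technique is visible in documented APIs elsewhere; for illustration (this is an analog, not Apple's implementation), Java's ArrayDeque offers the same amortized O(1) operations at both ends:

```java
import java.util.ArrayDeque;

public class DequeDemo {
    public static void main(String[] args) {
        // An array-deque keeps elements in a circular array, so both ends
        // support amortized O(1) push and pop.
        ArrayDeque<String> deque = new ArrayDeque<>();
        deque.addLast("queued-1");   // enqueue at the tail
        deque.addLast("queued-2");
        deque.addFirst("stack-top"); // push at the head
        System.out.println(deque.pollFirst()); // "stack-top" (stack pop)
        System.out.println(deque.pollFirst()); // "queued-1" (queue dequeue)
        System.out.println(deque.pollLast());  // "queued-2"
    }
}
```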
CFArray/CFMutableArray (and by extension, NSArray/NSMutableArray) have very loose performance guarantees; they certainly don't guarantee O(1) insert/delete performance.
From the Computational Complexity notes in CFArray.h:
The access time for a value in the array is guaranteed to be at worst O(lg N) for any implementation, current and future, but will often be O(1) (constant time). Linear search operations similarly have a worst case complexity of O(N*lg N), though typically the bounds will be tighter, and so on. Insertion or deletion operations will typically be linear in the number of values in the array, but may be O(N*lg N) clearly in the worst case in some implementations. There are no favored positions within the array for performance; that is, it is not necessarily faster to access values with low indices, or to insert or delete values with high indices, or whatever.
Core Foundation/Foundation doesn't currently provide any data structures that model the performance of a linked list.
It might be worth using Objective-C++ and one of the STL/Boost containers if the data store is used on its own (i.e. not as a backing store for tree/array controllers).

Basic Queue Optimizations

How would one optimize a queue for the typical:
access / store
memory usage
I'm not sure of any way to reduce memory besides running a compression algorithm over the contents, but that would cost a great deal of store time as a tradeoff; one would have to recompress everything, I think.
As such, I'm thinking of the typical linked list with pointers... or a circular queue?
Any ideas?
Edit: regardless of the above, how does one build the fastest / least memory-intensive basic queue structure?
Linked lists are actually not very typical (except in functional languages, or when newcomers mistakenly assume that a linked list is faster than a dynamic array). A dynamic circular buffer is more typical. Growing (and, optionally, shrinking) works slightly differently than in a dynamic array: if the data-holding part wraps past the end of the array, the data must be copied into the new space in such a way that it becomes contiguous again (simply extending the array would leave a gap in the middle of the data); see the sketch below.
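A minimal sketch of such a growable circular-buffer queue (single-threaded, with an arbitrary initial capacity); note how grow() copies the wrapped data back into contiguous order:

```java
// A minimal growable circular-buffer queue, assuming single-threaded use.
public class RingQueue<T> {
    private Object[] buffer = new Object[8];
    private int head = 0; // index of the oldest element
    private int size = 0;

    public void enqueue(T item) {
        if (size == buffer.length) {
            grow();
        }
        buffer[(head + size) % buffer.length] = item;
        size++;
    }

    @SuppressWarnings("unchecked")
    public T dequeue() {
        if (size == 0) {
            throw new IllegalStateException("empty");
        }
        T item = (T) buffer[head];
        buffer[head] = null; // let the GC reclaim the slot
        head = (head + 1) % buffer.length;
        size--;
        return item;
    }

    // Doubling the array: the data is copied so that it becomes contiguous
    // again starting at index 0. Simply extending the old array would leave
    // a gap in the middle of wrapped data.
    private void grow() {
        Object[] bigger = new Object[buffer.length * 2];
        for (int i = 0; i < size; i++) {
            bigger[i] = buffer[(head + i) % buffer.length];
        }
        buffer = bigger;
        head = 0;
    }

    public static void main(String[] args) {
        RingQueue<Integer> q = new RingQueue<>();
        for (int i = 0; i < 20; i++) {
            q.enqueue(i); // forces at least one grow() with wrapped data
        }
        System.out.println(q.dequeue()); // 0
    }
}
```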
As usual, it has some advantages and some drawbacks.
Drawbacks:
slightly more complicated implementation
not suitable for lock-free synchronization
Advantages:
more compact: in the worst case (when it has just grown, or is just about to shrink but hasn't yet) it has a space overhead of about 100%; a singly linked list almost always has an overhead of 100% or more (unless the data elements are larger than a pointer), and a doubly linked list is even worse.
cache efficient: each read happens close to the previous read, and each write close to the previous write, so cache misses are rare, and when they do occur they fetch data that is mostly relevant (or, in the case of writing, a cache line that will probably be written to again soon). In a linked list, locality is poor, and a good part of every cache line fetched is wasted on overhead (pointers to other nodes).
Usually these advantages outweigh the drawbacks.

How to efficiently gather data from threads in CUDA?

I have an application that solves a system of equations in CUDA. I know for sure that each thread can find up to 4 solutions, but how can I copy them back to the host?
I'm passing a huge array with enough space for all threads to store 4 solutions (4 doubles per solution), and another array with the number of solutions per thread. However, that's a naive solution, and it is the current bottleneck of my kernel.
I would really like to optimize this. The main problem is concatenating a variable number of solutions per thread into a single array.
The functionality you're looking for is called stream compaction.
You probably do need to provide an array with room for 4 solutions per thread, because attempting to store the results compactly right away is likely to create so many dependencies between the threads that the performance gained by copying less data back to the host is lost to a longer kernel execution time. The exception is when almost all of the threads find no solutions. In that case, you can use an atomic operation to maintain an index into a shared output array: before storing a result, the thread calls atomicAdd() to increase the index by one; atomicAdd() returns the old value, and the thread stores its result using that old value as the index.
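The real kernel would call CUDA's atomicAdd() on a device-side counter; the following sketch uses Java's AtomicInteger to show the same claim-a-slot pattern on the CPU (the thread count and fake results are arbitrary):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CompactStoreDemo {
    public static void main(String[] args) throws InterruptedException {
        int threads = 8;
        double[] results = new double[threads * 4]; // room for 4 each
        AtomicInteger nextSlot = new AtomicInteger(0);

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                // Pretend this thread found a variable number of solutions.
                int found = id % 3;
                for (int i = 0; i < found; i++) {
                    // getAndIncrement is the analog of CUDA's atomicAdd:
                    // it returns the old value, which this thread uses as
                    // its private index into the shared output array.
                    int slot = nextSlot.getAndIncrement();
                    results[slot] = id + i / 10.0;
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("solutions stored: " + nextSlot.get());
    }
}
```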
However, in the more common situation where there is a fair number of results, the best solution is to perform the compaction as a separate step. One way to do this is with thrust::copy_if. See this question for more background.

What structure should I use to store these objects?

I am trying to implement something similar to the Flight Control game. There will be a set of objects representing planes that get spawned and removed 'randomly'. Individual planes can then be touched and will respond. The model should take a plane index as a parameter when something gets touched.
My storage requirements are:
Need fast iteration over all elements
Need fast insertion / deletion
Need to look up an item quickly by index
What should I use? NSMutableArray, NSMutableSet ?
Should I store each object in two places? (e.g. Set for fast iteration, Array for fast lookup)?
NSMutableArray is good enough if you only want to look up by index. The problem may be deletion, which takes O(n). When you do not need indices to stay stable, you can delete in O(1) by moving the last item into the deleted item's slot and shortening the array by one; see the sketch below.
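For illustration, here is the swap-remove trick sketched in Java (with NSMutableArray the equivalent would be replaceObjectAtIndex:withObject: followed by removeLastObject):

```java
import java.util.ArrayList;
import java.util.List;

public class SwapRemoveDemo {
    // Removes the element at index in O(1) by overwriting it with the last
    // element and shrinking the list by one. Note that this does not
    // preserve element order, so existing indices are not stable.
    static <T> void swapRemove(List<T> list, int index) {
        int last = list.size() - 1;
        list.set(index, list.get(last));
        list.remove(last); // removing the last element is O(1)
    }

    public static void main(String[] args) {
        List<String> planes = new ArrayList<>(List.of("a", "b", "c", "d"));
        swapRemove(planes, 1);
        System.out.println(planes); // [a, d, c]
    }
}
```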
Storing the objects in two places would be slow in this case, because it would bring no advantage in lookup speed but would require maintaining two containers.
Storing in two places seems silly. An array should be fine, with O(n) iteration and O(1) lookup by index. I am not familiar enough with Objective-C to know the deletion and insertion speeds, but both should be plenty fast if system-level array-copy facilities are used.

What causes page fault and how to minimize them?

When examining a process in Process Explorer, what does it mean when there are a lot of page faults? The application is processing quite a bit of data and the UI is not very responsive. Are there optimizations to the code that could reduce or eliminate page faults? Would increasing the physical RAM of the system make a difference?
http://en.wikipedia.org/wiki/Page_fault
Increasing the physical RAM of your machine could result in fewer page faults, although design changes to your application will do much better than adding RAM. In general, a smaller memory footprint, and keeping things that are accessed around the same time on the same page, will decrease the number of page faults. It can also help to do everything you can with a piece of data while it is in memory, so that you don't need to access it at many different times, which may cause page faults (a.k.a. thrashing).
It might also be helpful to keep memory that is accessed together close together (e.g. if you have some objects, place them in an array). If those objects have a lot of data that is used very infrequently, move that data into another class and give the first class a reference to the second one. This way you will use less memory most of the time; a sketch of this hot/cold split follows.
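A hedged sketch of that hot/cold split (the Particle class and its fields are hypothetical, purely for illustration):

```java
// Hot/cold field splitting: fields touched on every update stay in the
// frequently-scanned object; rarely-used data sits behind a reference so
// the hot objects pack more densely per page.
public class Particle {
    // Hot data: read and written every frame.
    double x, y, vx, vy;

    // Cold data: loaded only when a particle is inspected.
    private Metadata metadata; // stays null until first needed

    static class Metadata {
        String name;
        String description;
        byte[] history; // large and infrequently accessed
    }

    Metadata metadata() {
        if (metadata == null) {
            metadata = new Metadata(); // create on demand
        }
        return metadata;
    }
}
```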
A design option would be to write a memory cache that creates memory lazily (on demand). Such a cache would hold a collection of pre-allocated memory chunks, accessed by their size: for example, an array of N free lists, each holding M buffers, where list i hands out memory of size up to 2^i (i = 0..N-1). Even if you want to use less than 2^i, you simply don't use the extra space in the buffer.
This trades a small amount of wasted memory for caching and fewer page faults; a sketch follows.
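A minimal sketch of such a size-class pool (the BufferPool name is hypothetical; real code would also cap how many buffers each free list retains):

```java
import java.util.ArrayDeque;

// N free lists, where list i hands out buffers of size 2^i. Buffers are
// created lazily and returned to their list for reuse instead of being
// reallocated, trading some wasted space for fewer allocations.
public class BufferPool {
    private final ArrayDeque<byte[]>[] freeLists;

    @SuppressWarnings("unchecked")
    public BufferPool(int sizeClasses) {
        freeLists = new ArrayDeque[sizeClasses];
        for (int i = 0; i < sizeClasses; i++) {
            freeLists[i] = new ArrayDeque<>();
        }
    }

    // Returns a buffer of at least `bytes` bytes; the caller simply
    // ignores the unused space at the end.
    public byte[] acquire(int bytes) {
        int i = sizeClass(bytes);
        byte[] buffer = freeLists[i].pollFirst();
        return buffer != null ? buffer : new byte[1 << i]; // lazy creation
    }

    public void release(byte[] buffer) {
        freeLists[sizeClass(buffer.length)].addFirst(buffer);
    }

    // Smallest i such that 2^i >= bytes.
    private int sizeClass(int bytes) {
        int i = 0;
        while ((1 << i) < bytes) {
            i++;
        }
        return i;
    }
}
```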
Another option is to use nedmalloc.