What is the time complexity of search in ArrayList?

This is an interview question that I couldn't answer and couldn't find any relevant answers to online.
I know that an ArrayList retrieves data in constant time when given an index.
Suppose an ArrayList holds 10,000 elements and the element we want is at index 5000 (but we are not given that location). To search for a particular value (say the integer 3, which happens to sit at index 5000), we would have to traverse the ArrayList, and that takes linear time, right?
In other words, if we have to walk through the ArrayList to find the data, the search is linear time, not constant time.
In short, I want to know the internal working of the contains method when I have to check for a particular value and don't have its index. It has to traverse the backing array to look for the value, so it takes O(n) time, right?
Thanks in advance.

I hope this is what you want to know about search in ArrayList:
Arrays are laid out sequentially in memory. This means that if it is an array of integers, each using 4 bytes, and it starts at memory address 1000, the next element will be at 1004, the next at 1008, and so forth. Thus, if I want the element at position 20 in my array, the code in get() will have to compute:
1000 + 20 * 4 = 1080
to get the exact memory address of the element. RAM got its name, Random Access Memory, because it is built in such a way that a hierarchy of hardware multiplexers allows access to any stored memory unit (byte) in constant time, given its address.
Thus, two simple arithmetic operations and one access to RAM: that is why get() is said to be O(1).
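As for contains: with no index to jump to, it must scan. Below is a simplified Java sketch of the linear scan that contains/indexOf performs internally (the real OpenJDK code walks the backing array in the same spirit; the indexOf helper here is illustrative, not the actual library source). The worst case inspects all n elements, so it is O(n).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class LinearScan {
    // Simplified version of what ArrayList.indexOf does internally:
    // walk the elements until the value is found. O(n) worst case.
    static <T> int indexOf(List<T> list, T value) {
        for (int i = 0; i < list.size(); i++) {
            if (Objects.equals(list.get(i), value)) {
                return i; // found after i + 1 comparisons
            }
        }
        return -1; // not found: all n elements were inspected
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) data.add(i);
        // get(5000) is O(1); finding the value 5000 without its index is O(n)
        System.out.println(indexOf(data, 5000)); // prints 5000
    }
}
```

So get(i) is O(1), while contains(value) and indexOf(value) are O(n): the reasoning in the question is correct.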

Related

Optimizing vb.net code that uses a very large string list, over 400,000 entries

I use a static list of unique strings T (a French dictionary) with 402,325 entries. My code plays Scrabble and uses a specialized construction called a GADDAG to build playable words and verify that the words are actually in the list. Is there a way faster than List.IndexOf(T) to find whether a word exists? I looked at HashSet.Contains(T), but it does not give me an index that I can use to retrieve a word. For example, for a given turn of play there could be thousands of valid solutions: I currently store only the index into the list, but with a HashSet I would not be able to do that and would have to store all the words, which increases memory usage. In most cases solutions are found in one or two seconds, but in some cases (i.e., with blanks) it takes up to 15 seconds, and I need to reduce that if at all possible with VB!
As Craig suggested, using List.BinarySearch(T) on a sorted list of T improves the speed around tenfold. A Scrabble play with a blank letter now takes no more than 1 or 2 seconds, compared to 15 to 20 seconds when I was using IndexOf.
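For illustration (the thread is VB.NET, but the sketches on this page use Java): the same idea with Collections.binarySearch, which is O(log n) per lookup on a sorted list and, unlike a hash set, still yields an index you can store and use to retrieve the word later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DictionaryLookup {
    public static void main(String[] args) {
        List<String> words = new ArrayList<>(List.of("maison", "mot", "scrabble", "zebre"));
        Collections.sort(words); // binary search requires sorted input

        int idx = Collections.binarySearch(words, "mot");
        if (idx >= 0) {
            // idx is a real index, so it can be stored and used to retrieve the word
            System.out.println("found at index " + idx + ": " + words.get(idx));
        } else {
            System.out.println("not in the dictionary"); // negative value encodes the insertion point
        }
    }
}
```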

Time Complexity of 1-pass lookup given input size N**2

Given a list of lists, i.e.
[[1,2,3],[4,5,6],[7,8,9]]:
What is the time complexity of using nested For loops to see if each numeral from 1-9 is used once and only once? Furthermore, what would be the time complexity if the input is now a singular combined list, i.e. [1,2,3,4,5,6,7,8,9]?
What really matters is the size of the input, not its format. Whether you have one list of 9 elements or 9 lists of 1 element each, you still have 9 elements to check in the worst case.
The answer to the question, as stated, would be O(1), because you have a constant-size input.
If what you mean is something like "given N elements, what is the time complexity of checking that every number between 1 and N is present?", then it takes linear time, i.e., O(N).
Indeed, one option is to use a hash table (e.g., a Python set) and, for each element, check whether it is already in the set, adding it if not. Note that this gives an expected (but not guaranteed, due to potential collisions) linear-time algorithm.
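A minimal Java sketch of that set-based check, with a HashSet playing the role of the Python set mentioned above (usesEachOnce and its inputs are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public class UniqueCheck {
    // Returns true if values contains each number 1..n exactly once.
    // One pass over the input: expected O(n) with a hash set.
    static boolean usesEachOnce(int[] values, int n) {
        Set<Integer> seen = new HashSet<>();
        for (int v : values) {
            if (v < 1 || v > n || !seen.add(v)) {
                return false; // out of range, or add() returned false: a duplicate
            }
        }
        return seen.size() == n;
    }

    public static void main(String[] args) {
        System.out.println(usesEachOnce(new int[]{1, 2, 3, 4, 5, 6, 7, 8, 9}, 9)); // true
        System.out.println(usesEachOnce(new int[]{1, 2, 3, 4, 5, 6, 7, 8, 8}, 9)); // false
    }
}
```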

Algorithmic complexity: iterating over small bounded list

My question is sort of about semantics, as well as a little bit about theory vs practice.
Let's say you have a table that could hold any number of items. And let's say you have an array of the visible items in the table (the items on screen). The size of the visible-cells array is limited by the size of the screen. This is a known value: it may vary from device to device and screen size to screen size, but it is safe to say it will be a small number, like 20 or less.
Now, if you were to iterate over the visible items, this is theoretically a linear algorithm (iterating over a list of items). My question is: from a practical software engineering standpoint, is it safe to consider/approximate this as a constant-time algorithm?
Basically, O(n) for n < 20 is bounded by 20 constant-time steps, which approximates O(1).
What do you all think?
It's perfectly correct to consider iterating over 20 or fewer items a constant-time operation.
Just notice that the number of visible items is independent of the total number of items in the whole table, which is the size of the input. Time complexity is always a function of the input size.
For example, if you double the size of the input, you still iterate over 20 visible items.
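To make that concrete, a small hypothetical sketch (MAX_VISIBLE and renderVisible are invented for illustration): the loop below touches at most 20 rows no matter how large the backing table grows, so doubling the input size does not change its running time.

```java
import java.util.Collections;
import java.util.List;

public class VisibleRows {
    static final int MAX_VISIBLE = 20; // bounded by the screen, not by the data

    // O(1) with respect to the table size: the loop bound is the constant MAX_VISIBLE.
    static void renderVisible(List<String> allRows, int firstVisible) {
        int end = Math.min(allRows.size(), firstVisible + MAX_VISIBLE);
        for (int i = firstVisible; i < end; i++) {
            System.out.println(allRows.get(i));
        }
    }

    public static void main(String[] args) {
        List<String> rows = Collections.nCopies(1_000_000, "row");
        renderVisible(rows, 500_000); // prints 20 rows regardless of table size
    }
}
```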

Using multiple threads for faster execution

Approximate program behavior:
I have a map image with data associated with it, indicated by RGB values. The data has been populated into an MS Access database. I imported the information from the database into my program as an array and sorted it in the order I want the program to run.
I want the program to find the nearest pixel that has a different color from the incumbent pixel being compared. (Colors are stored as string attributes of object Pixel)
First question: Should I use integers to represent my colors instead of string? Would this make the comparison function run significantly faster?
In order to find the nearest pixel of a different color, the program begins with all 8 adjacent pixels around the incumbent. If a nonMatch is not found, it continues on to the next "degree", and in this fashion it spirals out from the incumbent pixel until it hits a nonMatch. When found, the color of the nonMatch is saved as an attribute of the incumbent. After I find the nonMatch for each of the pixels, the data is re-inserted into the database.
The program accomplishes what I want in the manner I've written it, but it is very, very slow. After 24 hours, I am only about 3% through with execution.
Question Two: Does my program behavior sound about right? Is this algorithm you would use if you had to accomplish this task?
Question Three: Would it be appropriate for me to use threads in order to finish execution of the program faster? How exactly does that work? (I am brand new to threads, but know a little of the syntax)
Question Four: Would it be more "intelligent" for my program to find the nonMatch for each pixel and insert it into the database immediately after finding it? (I'm guessing this would be good with multi-threading, because while one thread is accessing the database (to insert), another can be accessing the array of pixels, a shared global variable in the program.)
Question Five: If threading is a good idea, I'm guessing I would split the records up into more manageable chunks (i.e. quarters), and have each thread run the same functions for their specified number of records? Am I close at all?
Please let me know if I can clarify or provide code samples, I just figured that this is more of a conceptual topic so do not want to overburden the post.
1.) Yes, integers compare much faster than strings. Additionally, they use much less memory.
2.) I would adapt the algorithm in this way:
E.g. #1: Let's say for pixel (87,23) you found the nearest nonMatch to be (88,24) at degree=1. You can immediately invert the relation and record that the nearest nonMatch to (88,24) is (87,23). At degree=1 you finished 2 pixels with 1 search.
E.g. #2: Let's say for pixel (17,18) you found the nearest nonMatch to be (17,20) at degree=2. You can immediately record that the pixels (16,19), (17,19) and (18,19), which all border on (17,20), have (17,20) as their nearest nonMatch at degree=1, and that one of them is the nearest nonMatch to (17,20). At degree=2 (or higher), you finished 5 pixels with 1 search.
3.) Using threads is a double-edged sword: you can do searches in parallel, but you need locking if you write to your array. So this depends on how many CPU cores you can throw at the problem; if it is 3 or more, threads will surely speed up the search.
4.) The results from 2.) make it necessary to mark a pixel as "done" in your array, as you might have finished up to 5 pixels with 1 search. I recommend you put finished pixels into a queue and use a dedicated thread to write the queue back into the database (see the sketch after this list): MS Access can't handle concurrent updates, so a single database-writer thread looks like a good idea.
5.) I recommend you NOT chunk up the array: you will run into problems with pixels on the edges of a chunk having their nearest nonMatch in a different chunk. Instead, if you use e.g. 4 threads, let them run: 1.) from the NW corner east, then south; 2.) from the SE corner west, then north; 3.) from the NE corner south, then west; 4.) from the SW corner north, then east.
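A minimal Java sketch of the single-writer idea from point 4.), assuming a hypothetical Result record and a writeToDatabase stand-in for the real MS Access update: search threads only enqueue results, and one dedicated thread drains the queue, so the database never sees concurrent updates.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleWriter {
    record Result(int x, int y, int nearestX, int nearestY) {} // hypothetical result record

    static final Result POISON = new Result(-1, -1, -1, -1); // sentinel to stop the writer

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Result> queue = new LinkedBlockingQueue<>();

        // The only thread that talks to the database, so no concurrent updates occur.
        Thread writer = new Thread(() -> {
            try {
                for (Result r = queue.take(); r != POISON; r = queue.take()) {
                    writeToDatabase(r);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // Search threads call queue.put(result) instead of writing to the DB themselves.
        queue.put(new Result(87, 23, 88, 24));
        queue.put(POISON); // shut the writer down once all searches are done
        writer.join();
    }

    static void writeToDatabase(Result r) { // stand-in for the real Access update
        System.out.println("UPDATE pixel " + r.x() + "," + r.y());
    }
}
```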
Yes, using an integer would make it much faster.
You can reuse the work you have done for the previous pixel. E.g., if (a,b) is the nearest non-equal pixel of (x,y), points around (x,y) are likely to also have (a,b) as their nearest non-equal pixel.
You can use different threads to work on different pixels instead of dividing up the search for a single pixel.
IMHO, steps 1&2 should make your program much faster and you might not need multi-threading.
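For reference, a hedged sketch of the degree-by-degree spiral search the question describes, assuming colors are packed into an int[][] (per point 1, integers instead of strings); findNonMatch is illustrative, not the original code.

```java
public class NearestNonMatch {
    // Returns {x, y} of the nearest pixel whose color differs from colors[y][x],
    // scanning square "rings" of growing radius (the "degree" in the question).
    static int[] findNonMatch(int[][] colors, int x, int y) {
        int h = colors.length, w = colors[0].length;
        int target = colors[y][x];
        for (int d = 1; d < Math.max(w, h); d++) {
            for (int dy = -d; dy <= d; dy++) {
                for (int dx = -d; dx <= d; dx++) {
                    if (Math.max(Math.abs(dx), Math.abs(dy)) != d) continue; // ring only
                    int nx = x + dx, ny = y + dy;
                    if (nx >= 0 && nx < w && ny >= 0 && ny < h && colors[ny][nx] != target) {
                        return new int[]{nx, ny};
                    }
                }
            }
        }
        return null; // the whole image is one color
    }

    public static void main(String[] args) {
        int[][] img = {{1, 1, 1}, {1, 1, 2}, {1, 1, 1}};
        int[] p = findNonMatch(img, 0, 0); // nearest different color from (0,0)
        System.out.println(p[0] + "," + p[1]); // prints 2,1
    }
}
```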
Yes, I'd convert colour strings to Integers for speed, or even Color structures if you intend to display them on the screen.
Don't work directly with the database if you can avoid it. Copy the necessary data out of the database into an array before you start, and copy your results back when you're finished.

Knapsack algorithm for time

I am using VB.NET and I am trying to come up with some algorithm or some pseudo-code, or some VB.NET code that will let me do the following (hopefully I can explain this well):
I have 2 collection objects, Cob1 and Cob2. These collection objects store objects that implement an interface called ICob. ICob has 3 properties. A boolean IsSelected property, a property called Length, which returns a TimeSpan, and a Rating property, which is a short integer.
OK, now Cob1 has about 100 objects stored in the collection and Cob2 is an empty collection. What I want to do is select objects from Cob1 and copy them over to Cob2. I want the following rules obeyed when selecting the objects though:
I want to be able to specify a timespan and have enough objects selected to fit into the timespan I specify (based on the Length property). So, for example, if I pass a 10-minute timespan to my function, it should pick enough objects to fill the entire 10-minute window, or come as close to filling it as possible.
No objects should be selected twice.
Objects that have a higher rating (via the Rating property) should have a better chance of being picked than other objects.
No object that has been selected in the last 30 minutes should be selected again (so that each object will eventually get selected at least once), regardless of rating.
Can anyone give me some tips on how to achieve this? The tips can be in the form of mental processes, VB.NET example code, Pseudo-code or just about anything else that might help me.
Thanks
EDIT:
Maybe It would help to everyone if I revealed what I'm trying to do in real life.
I am writing software for a radio station that will automatically select the music and advertisements to play, kind of like a computerized program manager.
The Length represents the length of the audio clip (either a song or an advertisement), and the Rating is just that: if a song is popular, it gets more airtime; if an advertiser pays more money, their ad also gets more airtime.
So my program should pick songs that play for 20 minutes or so, then pick some advertisements to play for about 5 minutes or so.
Hopefully this helps a little.
Thanks for the input from everyone!
Alan
Note that:
Restriction 1 is the classical knapsack problem, which works on sets, as required by restriction 2.
Restriction 3 is rather vague: is it better to have a higher rating or higher coverage of the timespan? If you don't specify an objective function to maximize (to be precise, there are two candidates: timespan coverage and rating), there are multiple Pareto-optimal solutions.
Restriction 4 can be implemented by keeping a map from object to the time it was last selected, used as a blacklist.
Long story short: first I'd filter the set by blacklisting objects under restriction 4, and then apply a knapsack algorithm.
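To illustrate the knapsack part, a minimal 0/1 knapsack sketch in Java, using whole seconds as the capacity unit and the Rating property as the value to maximize (one of the two possible objectives noted above; the method and its inputs are assumptions):

```java
public class TimeKnapsack {
    // Classic 0/1 knapsack: maximize total rating subject to total length <= capacity.
    // Lengths are in whole seconds; returns the best achievable total rating.
    static int bestRating(int[] lengthsSec, int[] ratings, int capacitySec) {
        int[] best = new int[capacitySec + 1];
        for (int i = 0; i < lengthsSec.length; i++) {
            for (int c = capacitySec; c >= lengthsSec[i]; c--) { // backwards: each item used once
                best[c] = Math.max(best[c], best[c - lengthsSec[i]] + ratings[i]);
            }
        }
        return best[capacitySec];
    }

    public static void main(String[] args) {
        // Three items: 4, 5 and 6 minutes long, ratings 3, 4 and 5; a 10-minute window.
        System.out.println(bestRating(new int[]{240, 300, 360}, new int[]{3, 4, 5}, 600)); // 8
    }
}
```

Iterating the capacities backwards ensures each object is selected at most once, which satisfies restriction 2.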
In order to implement 4., I believe you'll need to save the date/time when the Cob was last selected. Then, I'd do it in the following steps:
Filter out the ones that have been selected within the last 30 minutes.
Sort by rating and set your "cursor" on the first item in the list.
Check the item's Length. If it is short enough to fit in the remaining time, select it; if not, repeat this step with the next item.
Check whether your timespan has been filled. If yes, you are done; if not, go back to step 3 and proceed with the next item.
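A hedged sketch of those four steps, in Java for consistency with the other examples on this page, with an assumed Item record standing in for ICob (its Length, Rating, and a lastPlayed timestamp for the 30-minute rule). This is the greedy variant described above, not an optimal knapsack.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class Scheduler {
    // Stand-in for ICob: length of the clip, its rating, and when it last aired.
    record Item(String name, Duration length, int rating, Instant lastPlayed) {}

    static List<Item> select(List<Item> pool, Duration window, Instant now) {
        Duration remaining = window;
        List<Item> picked = new ArrayList<>();

        List<Item> candidates = pool.stream()
                // Step 1: drop anything played within the last 30 minutes.
                .filter(i -> Duration.between(i.lastPlayed(), now).toMinutes() >= 30)
                // Step 2: highest rating first.
                .sorted(Comparator.comparingInt(Item::rating).reversed())
                .toList();

        // Steps 3-4: walk the sorted list, taking every item that still fits.
        for (Item i : candidates) {
            if (i.length().compareTo(remaining) <= 0) {
                picked.add(i);
                remaining = remaining.minus(i.length());
            }
        }
        return picked; // fills the window as closely as the greedy order allows
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<Item> pool = List.of(
                new Item("song A", Duration.ofMinutes(4), 5, now.minus(Duration.ofHours(2))),
                new Item("song B", Duration.ofMinutes(5), 3, now.minus(Duration.ofMinutes(10))),
                new Item("ad C", Duration.ofMinutes(1), 4, now.minus(Duration.ofHours(1))));
        // song B is skipped (played 10 minutes ago); A and C fill 5 of the 10 minutes
        select(pool, Duration.ofMinutes(10), now).forEach(i -> System.out.println(i.name()));
    }
}
```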