Need Help Studying Running Times - binary-search

At the moment, I'm studying for a final exam for a computer science course. One of the questions will most likely be about how to combine running times, so I'll give an example.
I was wondering, if I created a program that preprocessed inputs using Insertion Sort, and then searched for a value "X" using Binary Search, how would I combine the running times to find the best, worst, and average case time complexities of the over-all program?
For example...
Insertion Sort
Worst Case O(n^2)
Best Case O(n)
Average Case O(n^2)
Binary Search
Worst Case O(logn)
Best Case O(1)
Average Case O(logn)
Would the Worst case be O(n^2 + logn), or would it be O(n^2), or neither?
Would the Best Case be O(n)?
Would the Average Case be O(nlogn), O(n+logn), O(logn), O(n^2+logn), or none of these?
I tend to over-think solutions, so if I can get any guidance on combining running times, it would be much appreciated.
Thank you very much.

You usually don't "combine" (as in add) the running times to determine the overall efficiency class; rather, you take the term that dominates (takes the longest) for each of the worst, average, and best cases.
So if you perform insertion sort and then binary search to find an element X in the array, the worst case is O(n^2), the average case is O(n^2), and the best case is O(n) -- all from insertion sort, since it takes the longest. Formally, O(n^2 + log n) = O(n^2), because n^2 dominates log n as n grows.
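For concreteness, here is a minimal Python sketch (not from the original question; the function names are just illustrative) of the two-phase program being described, with the complexities noted in comments:

    def insertion_sort(a):
        # Worst/average case O(n^2); best case O(n) when a is already sorted.
        for i in range(1, len(a)):
            key = a[i]
            j = i - 1
            while j >= 0 and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key
        return a

    def binary_search(a, x):
        # Worst/average case O(log n); best case O(1) if x is at the midpoint.
        lo, hi = 0, len(a) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if a[mid] == x:
                return mid
            if a[mid] < x:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1

    # Sorting then searching costs O(n^2) + O(log n) in the worst case,
    # and O(n^2 + log n) simplifies to O(n^2) because n^2 dominates log n.
    values = [5, 2, 9, 1, 7]
    insertion_sort(values)
    print(binary_search(values, 7))  # prints 3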

Based on my limited study (we haven't covered amortization yet, so that may be where Jim has the rest covered), you basically go by whichever part of the overall algorithm is slowest.
This seems to be a good book on the subject of algorithms (I don't have much to compare it to):
http://www.amazon.com/Introduction-Algorithms-Third-Thomas-Cormen/dp/0262033844/ref=sr_1_1?ie=UTF8&qid=1303528736&sr=8-1
MIT also has a full course on algorithms on their site; here is the link for that too:
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/
I've actually found it helpful; it might not answer your question specifically, but I think seeing some of the topics explained a few times will make you more confident.

Related

To what extent shall we optimize time complexity?

Theory vs. practice here.
Regarding time complexity, I have a conceptual question that we didn't get to go deeper into in class.
Here it is:
There's a barbaric brute-force algorithm, O(n^3), and we got it down to O(n), which was considered good enough. If we dig deeper, it is actually O(n) + O(n): two separate iterations over the input. I came up with another way which was actually O(n/2). But those two algorithms are considered the same, since both are O(n) and, as n goes to infinity, the constant makes no difference, so further improvement is considered unnecessary once we reach O(n).
My question is:
In reality, in practice, we always have a finite number of inputs (admittedly sometimes in the trillions). So following the operation-counting logic, an algorithm doing n/2 operations is four times as fast as one doing 2n operations. So if we can make it faster, why not?
Time complexity is not everything. As you already noticed, Big-Oh notation can hide a lot, and it also assumes that all operations cost the same.
In practice you should always try to find a fast (or the fastest) solution for your problem. Sometimes this means using an algorithm with a worse complexity but better constants, if you know your inputs are always small. Depending on your use case, you may also want to implement optimizations that exploit hardware properties, such as cache-friendly memory access.
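As a small, hypothetical illustration of "worse complexity but better constants": on a tiny, already-sorted input, a plain O(n) linear scan can beat an O(log n) binary search simply because it does less work per step. The numbers below depend entirely on the machine and are only meant as a sketch.

    import timeit
    from bisect import bisect_left

    data = list(range(8))            # a deliberately tiny, already-sorted input
    target = 6

    def linear_search(a, x):         # O(n), but almost no overhead per step
        for i, v in enumerate(a):
            if v == x:
                return i
        return -1

    def binary_search(a, x):         # O(log n), but with more work per step
        i = bisect_left(a, x)
        return i if i < len(a) and a[i] == x else -1

    # On inputs this small, the linear scan is often the faster of the two.
    print(timeit.timeit(lambda: linear_search(data, target), number=100_000))
    print(timeit.timeit(lambda: binary_search(data, target), number=100_000))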

Best performance approach to find all combinations of numbers from a given set (>80 elements) to reach a given final sum

Before I am directed to go and keep searching instead of asking this general question, please understand my question in detail.
We have an algorithm that does this in PL/SQL. However, it does not perform well when the given set has a large number of elements. For example, it works well when the set has around 22 elements, but after that the performance dies.
We are working with Oracle Database 12c. This combination search is part of one of our applications; the numbers are pulled from Oracle tables into associative arrays for finding the combinations. Example: required final sum = 30,
set of elements to choose from: {1, 2, 4, 6, 7, 2, 8, 10, 5}, and so forth.
My question, in gist:
Is PL/SQL realistically suited to writing such an algorithm? Should we be looking at another programming language, technology, server capacity, or tool to handle larger sets of more than 80 elements?
Oracle is not well suited to solving this problem, because databases in general are not. In fact, this is a variant of the subset sum problem, which is NP-complete, so no truly efficient solution is known.
The approach in a database is to generate all possible combinations up to a certain size, and then filter down to the ones that match your sum. This is inherently an exponential algorithm. There may be some heuristic algorithms that come close to solving the problem, but this is an inherently hard problem.
Unless you can find some special condition to shrink the problem you will never solve it. Don't worry about the language implementation until you know this problem is even theoretically possible.
As others have mentioned, this problem grows exponentially. Solving it for 22 elements is not even close to solving it for 80.
A dynamic programming algorithm may be able to quickly find if there is one solution to a subset sum problem. But finding all solutions requires testing 2^80 sets.
2^80 = 1,208,925,819,614,629,174,706,176. That's 1.2e24.
That's a big number. Let's make a wildly optimistic assumption that a processor can test one billion sets a second. Buy a million of them and you can find your answer in about 38 years. Maybe a quantum computer can solve it more quickly some day.
It might help to explain exactly what you're trying to do. Unless there is some special condition, some way to eliminate most of the processing and avoid a brute-force solution, I don't see any hope for solving this problem. Perhaps this is a question for the Theoretical Computer Science site.
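To make the dynamic-programming remark above concrete, here is a rough Python sketch of the decision version of subset sum. It answers whether some subset reaches the target in roughly O(n * target) work, but it deliberately does not enumerate all matching subsets, which is the part that explodes towards 2^n.

    def subset_sum_exists(numbers, target):
        # reachable holds every sum that some subset of the numbers seen so far
        # can produce; we never track *which* subsets, only the sums.
        reachable = {0}
        for x in numbers:
            reachable |= {s + x for s in reachable if s + x <= target}
            if target in reachable:
                return True
        return target in reachable

    # Example data from the question: can some subset reach 30? (Yes.)
    print(subset_sum_exists([1, 2, 4, 6, 7, 2, 8, 10, 5], 30))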

How is a hash map stored?

I have an upcoming interview and was looking through some technical interview questions when I came across this one. It asks for the time complexity of the insertion and deletion functions of a hash map. The consensus seems to be that the time complexity is O(1) if the hash map is distributed evenly, but O(n) if all the entries end up in the same bucket.
I guess my question is how exactly are hash maps stored in memory? How would these 2 cases happen?
One answer on your linked page is: "insertion always would be O(1) if even not properly distributed (if we make linked list on collision) but Deletion would be O(n) in worst case."
This is not a good answer. A general statement of the time complexity of a hash map would look much like the one in the Wikipedia article on hash tables:
Time complexity in big O notation

            Average    Worst case
    Space   O(n)       O(n)
    Search  O(1)       O(n)
    Insert  O(1)       O(n)
    Delete  O(1)       O(n)
To address your question of how hash maps are stored in memory: there are a number of "buckets" that each store a value in the average case, but which must be expanded into some kind of list when a hash collision occurs. Good explanations of hash tables are the Wikipedia article, this SO question and this C++ example.
The time complexity table above looks like this because, in the average case, a hash map just looks up and stores single values directly, but collisions make everything O(n) in the worst case, where all of your elements share one bucket and the behaviour is that of the list implementation you chose for that case.
Note that there are specialized implementations that address the worst cases here, also described in the Wikipedia article, but each of them has other disadvantages, so you'll have to choose the best for your use case.
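As a rough illustration of the bucket layout described above, here is a toy Python hash map using separate chaining. Real implementations are far more sophisticated, so treat this only as a sketch.

    class ChainedHashMap:
        def __init__(self, n_buckets=8):
            # each bucket is a plain list of (key, value) pairs (the "chain")
            self.buckets = [[] for _ in range(n_buckets)]

        def _bucket(self, key):
            return self.buckets[hash(key) % len(self.buckets)]

        def insert(self, key, value):
            bucket = self._bucket(key)
            for i, (k, _) in enumerate(bucket):
                if k == key:              # key already present: overwrite
                    bucket[i] = (key, value)
                    return
            bucket.append((key, value))   # average O(1); O(n) if every key collides

        def search(self, key):
            for k, v in self._bucket(key):  # walk the chain: O(1) average, O(n) worst
                if k == key:
                    return v
            return None

    m = ChainedHashMap()
    m.insert("a", 1)
    m.insert("b", 2)
    print(m.search("a"), m.search("missing"))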

Dynamic number of test cases in genetic programming?

When looking at genetic programming papers, it seems to me that the number of test cases is always fixed. However, most mutations should (?) at every stage of the execution be very deleterious, i.e. make it obvious after one test case that the mutated program performs much worse than the previous one. What happens if you, at first, only try very few (one?) test cases and check whether the mutation makes any sense?
Could it be that different test cases test for different features of the solutions, and one mutation will probably improve only one of those features?
I don't know if I agree with your assumption that most mutations should be very deleterious, but you shouldn't care even if they were. Your goal is not to optimize the individuals, but to optimize the population. So trying to determine if a "mutation makes any sense" is exactly what genetic programming is supposed to do: i.e. eliminate mutations that "don't make sense." Your only "guidance" for the algorithm should come through the fitness function.
I'm also not sure what you mean by "test case", but to me it sounds like you are looking for something related to multi-objective optimization (MOO). That means you try to optimize a solution with respect to different aspects of the problem; therefore you do not need to mutate/evaluate a population for a specific test case, but rather to find a multi-objective fitness function.
"The main idea in MOO is the notion of Pareto dominance" (http://www.gp-field-guide.org.uk)
I think this is a good idea in theory but tricky to put into practice. I can't remember seeing this approach actually used before but I wouldn't be surprised if it has.
I presume your motivation for doing this is to improve the efficiency of applying the fitness function: you can stop evaluation early and discard the individual (or set its fitness to 0) if the tests look like they're going to be terrible.
One challenge is to decide how many test cases to apply; discarding an individual after one random test case is surely not a good idea, as that test case could be a real outlier. Perhaps terminating evaluation after 50% of the test cases if the fitness of the individual is below 10% of the best would not discard any very good individuals; on the other hand, it might not be worth it, given that a lot of individuals will be of middle-of-the-road fitness, so it might only save a small proportion of the computation. You could adjust the numbers so you save more effort, but the more effort you try to save, the greater the chance of genuinely good individuals being discarded by accident.
Factor in the extra time taken to code this, possible bugs, etc., and I shouldn't think the benefit would be worthwhile (unless this is a research project, in which case it might be interesting to try it and see).
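For what it's worth, here is a rough Python sketch of the early-abandon idea discussed above (evaluate half the test cases, then bail out if the individual is clearly worse than some fraction of the best fitness seen so far). The per-case fitness function and the thresholds are made up purely for illustration.

    def evaluate(program, test_cases, best_so_far, abandon_frac=0.5, min_ratio=0.1):
        # Accumulate per-case fitness; after `abandon_frac` of the cases, give up
        # if the running score is below `min_ratio` of the best fitness seen so far.
        score = 0.0
        cutoff = max(1, int(len(test_cases) * abandon_frac))
        for i, (x, expected) in enumerate(test_cases, start=1):
            score += 1.0 / (1.0 + abs(program(x) - expected))  # made-up per-case fitness
            if i == cutoff and score < min_ratio * best_so_far:
                return 0.0                                      # early abandon
        return score

    cases = [(x, 2 * x + 1) for x in range(10)]
    good = lambda x: 2 * x + 1        # a "perfect" individual
    bad = lambda x: -100              # an obviously poor individual
    best = evaluate(good, cases, best_so_far=0.0)
    print(best, evaluate(bad, cases, best_so_far=best))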
I think it's a good idea. Fitness evaluation is the most computationally intensive process in GP, so estimating the fitness of individuals in order to reduce the expense of actually calculating it could be an important optimization.
Your idea is a form of fitness approximation, sometimes called lazy evaluation (try searching for these terms; there are some research papers).
There are also distinct but somewhat overlapping schemes, for instance:
Dynamic Subset Selection (Chris Gathercole, Peter Ross) is a method to select a small subset of the training data set on which to actually carry out the GP algorithm;
Segment-Based Genetic Programming (Nailah Al-Madi, Simone Ludwig) is a technique that reduces the execution time of GP by partitioning the dataset into segments and using the segments in the fitness evaluation process.
PS: also in Brood Recombination Crossover (Tackett), child programs are usually evaluated on a restricted number of test cases to speed up the crossover.

Optimization algorithm question

This may be a simple question for those in the know, but I cannot figure it out by myself.
Suppose there is a large number of objects from which I need to select some. Each object has two known variables: cost and benefit. I have a budget, say $1000. How can I find out which objects I should buy to maximize the total benefit within the given budget? I would like a numeric optimization solution. Thanks!
Your problem is called the "knapsack problem". You can read more on the Wikipedia page. Translating the nomenclature from your original question into that of the Wikipedia article, your problem's "cost" is the knapsack problem's "weight", and your problem's "benefit" is the knapsack problem's "value".
Finding an exact solution is an NP-complete problem, so be prepared for slow results if you have a lot of objects to choose from!
You might also look into Linear Programming. From MathWorld: "Simplistically, linear programming is the optimization of an outcome based on some set of constraints using a linear mathematical model."
Yes, as stated before, this is the knapsack problem, and I would solve it with dynamic programming.
The key to this problem is storing data so that you do not need to recompute things more than once (if enough memory is available). There are two general ways to go about dynamic programming: top-down and bottom-up. This one is a bottom-up problem.
In general: find base-case values, i.e. the best object to select for a small budget, then build on this. If we allow ourselves to spend a little more money, what is the best combination of objects for that small increment? The possibilities include taking more of what you previously had, taking one new object and replacing an old one, taking another small object that still keeps you under your budget, etc.
Like I said, the main idea is not to recompute values. If you follow this pattern up to larger budgets, you will find that the best way to buy X dollars' worth of goods is built by combining the solutions you already computed for smaller cases.
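As a sketch of the bottom-up dynamic-programming approach described above, here is a small Python example for the 0/1 version of the problem (each object bought at most once); the costs, benefits, and the $1000 budget are made-up illustrative data.

    def knapsack(costs, benefits, budget):
        # best[b] = maximum total benefit achievable with a budget of b
        best = [0] * (budget + 1)
        for cost, benefit in zip(costs, benefits):
            # iterate budgets downwards so each object is bought at most once
            for b in range(budget, cost - 1, -1):
                best[b] = max(best[b], best[b - cost] + benefit)
        return best[budget]

    # Made-up example data: five objects, a $1000 budget.
    costs = [450, 350, 250, 300, 200]
    benefits = [10, 7, 5, 8, 4]
    print(knapsack(costs, benefits, 1000))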