Comparing time complexities

Let's say we have an algorithm that goes through a list of n numbers twice: in one pass it counts the numbers above 50, and in the other pass it counts the numbers below 50, storing the results in two variables.
If we change it to do the same work in a single pass, incrementing one or the other variable at each step, do we change the time complexity of the algorithm? Do we consider the new one faster?
I know it will require fewer steps, but I'm not sure how that is expressed in time complexity notation.
EDIT:
Pseudocode 1:
for (i = 0; i < TOTAL_NUMBERS; i++) {
    if (numbers[i] >= 50) {
        greaterThan50++;
    }
}
for (i = 0; i < TOTAL_NUMBERS; i++) {
    if (numbers[i] < 50) {
        lessThan50++;
    }
}
Pseudocode 2:
for (i = 0; i < TOTAL_NUMBERS; i++) {
    if (numbers[i] >= 50) {
        greaterThan50++;
    }
    else {
        lessThan50++;
    }
}

An algorithm is simpler if it obtains the same result with the smallest possible expression. In this case, if you combine the updates of the two counters into the same loop, the algorithm will need fewer CPU cycles to execute. You can also save time on validating each number, because a single if/else statement does everything you need.
The pseudocode could look like this:
for (i = 0; i < TOTAL_NUMBERS; i++) {
    if (numbers[i] >= 50) {
        greaterThan50++;
    }
    else {
        lessThan50++;
    }
}
If you want to exclude the numbers that are equal to 50, you can do something like:
for (int i = 0; i < TOTAL_NUMBERS; i++) {
    if (numbers[i] > 50) {
        greaterThan50++;
    }
    else if (numbers[i] < 50) {
        lessThan50++;
    }
}
As you might notice, in this last example each iteration does an extra comparison, so it takes an extra step, but this algorithm will still be faster than looping through the list twice (and also simpler, since it requires fewer lines of code and is easier to read and understand).
I hope this helps with your question :)

Big O notation is a general statement of how a measurement (execution time, in your case) changes as the input grows. The first version takes about twice as long to execute, but both versions' execution times grow linearly with the input size, so they are both O(n) algorithms.
More reading here: http://en.wikipedia.org/wiki/Big_O_notation
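To make the constant-factor point concrete: if each loop iteration costs some constant c, the two-pass version does roughly 2cn work and the one-pass version roughly cn. Both are bounded by a constant multiple of n, and big O deliberately discards constant factors, so O(2n) = O(n). The one-pass version really is faster, just not asymptotically faster.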

Related

Time complexity of this for loop

Could anyone explain to me the time complexity of this method?
int sqrt = (int) Math.sqrt(n) + 1;
for (int i = 2; i < sqrt; i++) {
    if (n % i == 0) {
        return false;
    }
}
return true;
I'm not really good at this subject and tried to find some info online, but couldn't find any explanation.
It's O(√n): the loop counter i runs from 2 up to √n, so the body executes at most about √n times (or fewer, because of the early return when the modulus check finds a divisor). A single-level for loop is only O(n) when its bound grows in proportion to n itself; here the bound is √n, which is sublinear.
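For reference, here is the snippet wrapped into a complete, runnable method; the method name isPrime and the guard for n < 2 are my additions, since the question only shows the loop body:

public class PrimeCheck {
    // Trial division up to sqrt(n): the loop body runs O(sqrt(n)) times.
    static boolean isPrime(int n) {
        if (n < 2) return false; // guard added; the original snippet assumes n >= 2
        int sqrt = (int) Math.sqrt(n) + 1;
        for (int i = 2; i < sqrt; i++) {
            if (n % i == 0) {
                return false; // found a divisor, so n is composite
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrime(97));  // true
        System.out.println(isPrime(100)); // false
    }
}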

Big-O of fixed loops

I was discussing some code during an interview and don't think I did a good job articulating one of my blocks of code.
I know (high-level) we are taught that two nested for loops == O(n^2), but what happens when you make some assertions as part of the work that limit the work done to a constant amount?
The code I came up with was something like
String[] someVal = new String[]{"a", "b", "c", "d"}; // this was really - some other computation
if (someVal.length != 4) {
    return false;
}
for (int i = 0; i < someVal.length; i++) {
    String subString = someVal[i];
    if (subString.length() != 8) {
        return false;
    }
    for (int j = 0; j < subString.length(); j++) {
        // do some other stuff
    }
}
So there are two for loops, but the number of iterations becomes fixed because of the length check before proceeding.
for (int i = 0; i < 4; i++) {
    String subString = someVal[i];
    if (subString.length() != 8) { return false; }
    for (int j = 0; j < 8; j++) {
        // do some other stuff
    }
}
I tried to argue this made it constant, but didn't do a great job.
Was I completely wrong or off-base?
Your early exit condition inside the first for loop is if (subString.length() != 8), so the second for loop only executes when the length is exactly 8. That makes the complexity of the second loop constant, since it does not depend on the input size. And before the first for loop you have another early exit condition, if (someVal.length != 4), which makes the first loop constant as well: in the worst case the code performs 4 * 8 = 32 inner-loop iterations, no matter what input it receives.
So yes, I would follow your argument that the complete function has constant, O(1), time complexity. It may help to add to the explanation that big O describes an upper bound on growth that is never exceeded, and that constant factors can be reduced to 1.
But keep in mind that a function with constant complexity can still take longer on real-world input than an O(n) function, depending on the size of n. If it were a known precondition that n never grows beyond some (low) bound, I would argue not about the big-O complexity but about the overall expected runtime, where the constant inner loop could have a larger impact than a big-O analysis suggests.

Selection sort implementation, I am stuck at calculating time complexity for number of swaps

static int count = 0;
for (int i = 0; i < arr.length; i++) {
    for (int j = i + 1; j < arr.length; j++) {
        if (arr[i] > arr[j]) {
            swap(arr, i, j);
            count++;
        }
    }
}
Is this the correct implementation for selection sort? I am not getting O(n-1) complexity for swaps with this implementation.
Is this the correct implementation for selection sort?
It depends; logically, what you are doing is correct. It sorts by repeatedly finding the smallest value in the rest of the array. But in selection sort you should not need more than one swap per iteration of the outer loop: you just remember the position of the min value, and at the end of the inner loop you swap it with the i-th element.
I am not getting O(n-1) complexity for swaps
Did you mean n - 1 swaps? With your code you don't get that, because you swap every time you find a smaller value, not just once for the smallest one. You can rewrite your code like this:
static int count = 0;
for (int i = 0; i < arr.length - 1; i++) {
    int minIndex = i; // track the position of the smallest remaining element
    for (int j = i + 1; j < arr.length; j++) {
        if (arr[j] < arr[minIndex]) {
            minIndex = j;
        }
    }
    swap(arr, minIndex, i); // a single swap per outer iteration
    count++;
}
Also, note that the iteration over i has changed too: it stops at arr.length - 1, so the outer loop, and therefore the swap, runs exactly n - 1 times.
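For completeness, here is a minimal runnable version of the above; the swap helper is assumed by both snippets, so it is spelled out here:

public class SelectionSortDemo {
    static int count = 0;

    static void swap(int[] arr, int a, int b) {
        int tmp = arr[a];
        arr[a] = arr[b];
        arr[b] = tmp;
    }

    static void selectionSort(int[] arr) {
        for (int i = 0; i < arr.length - 1; i++) {
            int minIndex = i;
            for (int j = i + 1; j < arr.length; j++) {
                if (arr[j] < arr[minIndex]) {
                    minIndex = j;
                }
            }
            swap(arr, minIndex, i);
            count++; // exactly n - 1 swaps for an array of length n
        }
    }

    public static void main(String[] args) {
        int[] arr = {5, 2, 9, 1, 7};
        selectionSort(arr);
        System.out.println(java.util.Arrays.toString(arr) + " swaps: " + count);
        // prints [1, 2, 5, 7, 9] swaps: 4
    }
}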

What's the term for saving values of calculations instead of recalculating multiple times?

When you have code like this (written in java, but applicable to any similar language):
public static void main(String[] args) {
    int total = 0;
    for (int i = 0; i < 50; i++)
        total += i * doStuff(i % 2); // multiplies i times doStuff(remainder of i / 2)
}

public static int doStuff(int i) {
    // Lots of complicated calculations
}
You can see that there's room for improvement. doStuff(i % 2) only returns two different values: one for doStuff(0) on even numbers and one for doStuff(1) on odd numbers. Therefore you're wasting a lot of computation time/power recalculating those values on every call to doStuff(i % 2). You can improve it like this:
public static void main(String[] args) {
    int total = 0;
    boolean[] alreadyCalculated = new boolean[2];
    int[] results = new int[2];
    for (int i = 0; i < 50; i++) {
        if (!alreadyCalculated[i % 2]) {
            results[i % 2] = doStuff(i % 2);
            alreadyCalculated[i % 2] = true;
        }
        total += i * results[i % 2];
    }
}
Now it accesses a stored value instead of recalculating each time. It might seem silly to keep arrays like that, but for cases like looping from i = 0 while i < 500 and checking i % 32 each time, an array is an elegant approach.
Is there a term for this kind of code optimization? I'd like to read up more on the different forms and the conventions of it but I'm lacking a concise description.
Is there a term for this kind of code optimization?
Yes, there is:
In computing, memoization is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
https://en.wikipedia.org/wiki/Memoization
Common-subexpression-elimination (CSE) is related to this. This case is a combination of that and hoisting a loop-invariant calculation out of a loop.
I'd agree with CBroe that you could call this specific form of caching memoization, especially the way you're implementing it with the clunky alreadyCalculated array. You can optimize that away, since you know which calls will compute new values and which will be repeats. Normally you'd implement memoization with a static array inside the called function, for the benefit of all callers. Ideally there's a sentinel value you can use to mark entries that don't have a result computed yet, instead of maintaining a separate array for that. Or, for a sparse set of input values, just use a hash map (instead of e.g. an array with 2^32 entries).
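A minimal sketch of that idea, assuming doStuff never legitimately returns -1 so -1 can serve as the "not computed yet" sentinel (the doStuff body here is just a stand-in for the expensive calculation):

public class MemoDemo {
    // One cache slot per possible argument (0 or 1); -1 marks "not computed yet".
    private static final int[] cache = {-1, -1};

    public static int doStuff(int i) {
        return (i + 5) << 1; // stand-in for lots of complicated calculations
    }

    public static int doStuffMemo(int i) {
        if (cache[i] == -1) {
            cache[i] = doStuff(i); // computed at most once per distinct argument
        }
        return cache[i];
    }

    public static void main(String[] args) {
        int total = 0;
        for (int i = 0; i < 50; i++) {
            total += i * doStuffMemo(i % 2); // the caller no longer manages any cache state
        }
        System.out.println(total); // 13500 with this stand-in doStuff
    }
}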
You can also avoid the if in the main loop.
public class Optim
{
    public static int doStuff(int i) { return (i + 5) << 1; }

    public static void main(String[] args)
    {
        int total = 0;
        int[] results = new int[2];
        // More interesting if we pretend the loop count isn't known to be > 1,
        // so avoiding calling doStuff(1) for n=1 is useful.
        // Otherwise you'd just do int[] results = { doStuff(0), doStuff(1) };
        int n = 50;
        for (int i = 0; i < Math.min(n, 2); i++) {
            results[i] = doStuff(i);
            total += i * results[i];
        }
        for (int i = 2; i < n; i++) { // runs zero times if n < 2
            total += i * results[i % 2];
        }
        System.out.print(total);
    }
}
Of course, in this case we can optimize a lot further. sum(0..n) = n * (n+1) / 2, so we can get a closed-form (non-looping) solution in terms of doStuff(0) (multiplied by the sum of the even i) and doStuff(1) (multiplied by the sum of the odd i). Then we only need the two doStuff() results once each, avoiding any need to memoize.
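A sketch of that closed form for n = 50, extending the class above; the counting formulas are standard, and only two doStuff calls remain:

int n = 50;
int nEven = (n + 1) / 2;            // even i in [0, n): 25 of them
int nOdd = n / 2;                   // odd i in [0, n): 25 of them
int sumEven = nEven * (nEven - 1);  // 0 + 2 + ... + 48 = 2 * (0 + 1 + ... + 24) = 600
int sumOdd = nOdd * nOdd;           // 1 + 3 + ... + 49 = 25 * 25 = 625
int total = sumEven * doStuff(0) + sumOdd * doStuff(1); // two calls, no loop, no cache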

.Net Parallel.For strange behavior

I brute-forced the summing of all primes under 2000000. After that, just for fun, I tried to parallelize my for loop, but I was a little surprised when I saw that Parallel.For gives me an incorrect sum!
Here's my code (C#):
static class Problem
{
    public static long Solution()
    {
        long sum = 0;
        // Correct result is 142913828922
        //Parallel.For(2, 2000000, i =>
        //{
        //    if (IsPrime(i)) sum += i;
        //});
        for (int i = 2; i < 2000000; i++)
        {
            if (IsPrime(i)) sum += i;
        }
        return sum;
    }

    private static bool IsPrime(int value)
    {
        for (int i = 2; i <= (int)Math.Sqrt(value); i++)
        {
            if (value % i == 0) return false;
        }
        return true;
    }
}
I know that brute force is a pretty bad solution here, but this question is not about that. I think I've made some very stupid mistake, but I just can't locate it. The plain for calculates correctly, but Parallel.For does not.
You are accessing the variable sum from multiple threads without locking it, so it is possible for the read/write operations to overlap and for updates to be lost.
Adding a lock will correct the result (but you will be effectively serializing the computation, losing the benefit you were aiming for).
You should instead calculate a subtotal on each thread and add the sub-totals at the end. See the article How to: Write a Parallel.For Loop That Has Thread-Local Variables on MSDN for more details.
long total = 0;
// Use type parameter to make subtotal a long, not an int
Parallel.For<long>(0, nums.Length, () => 0, (j, loop, subtotal) =>
{
    subtotal += nums[j];
    return subtotal;
},
(x) => Interlocked.Add(ref total, x)
);
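(The nums array is from the MSDN sample. Adapted to the question's loop, the range would be (2, 2000000) and the per-thread body would add j to subtotal only when IsPrime(j) returns true; each thread accumulates its own subtotal, and only the final per-thread results are combined with Interlocked.Add.)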
Many thanks to all of you for your quick answers. I changed
sum += i;
to
Interlocked.Add(ref sum, i);
and now it works great.