Big-O notation of 8^log2(n) - time-complexity

I'm confused about what the big-O notation of 8^(log2(n)) would be.
Would you just be able to change it to O(8^n), since the log2 would somewhat act as a constant that just reduces the value of n? Or would it be something else?
In a somewhat similar case, what would be the big-O for log2(n^n)? Would this just be O(log2(n))?

We'll need the following properties of logs and exponents to understand this one:
b^(logb(x)) = x
k * logb(x) = logb(x^k)
(x^a)^b = x^(a*b)
For the first problem:
8^(log2(n))
= (2^3)^(log2(n)) (because 8 = 2^3)
= 2^(3 * log2(n)) (using Prop#3)
= 2^(log2(n^3)) (using Prop#2)
= n^3 (using Prop#1)
= O(n^3)
For the second problem:
log2(n^n)
= n * log2(n) (using Prop#2)
= O(n log n)
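If you want to sanity-check the algebra numerically, here is a quick Julia snippet (a sketch; the sampled values of n are arbitrary):

for n in (4, 16, 64)
    @assert isapprox(8.0^log2(n), float(n)^3)        # 8^(log2 n) == n^3
    @assert isapprox(log2(float(n)^n), n * log2(n))  # log2(n^n) == n*log2(n)
end
println("both identities hold for the sampled values of n")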

Big-O of 8^(log2(n)) will be O(n^3), since 8 = 2^3 and therefore 8^(log2(n)) = 2^(3*log2(n)) = (2^(log2(n)))^3 = n^3.
Also log(n^n) = n*log(n). So big-O for log2(n^n) would be O(n*log2(n)).

Why does the following algorithm have runtime log(log(n))?

I don't understand how the runtime of the algorithm can be log(log(n)). Can someone help me?
s=1
while s <= (log n)^2 do
s=3s
Notation note: log(n) indicates log2(n) throughout the solution.
Well, I suppose (log n)^2 indicates the square of log(n), which means log(n)*log(n). Let us try to analyze the algorithm.
It starts from s=1 and goes like 1,3,9,27...
Since s is multiplied by 3 each time, after m iterations s equals 3^m (with m counting the iterations, starting from 1).
We keep iterating until s becomes bigger than log(n)*log(n), so at some point 3^m will be approximately equal to log(n)*log(n).
Solve the equation:
3^m = log(n) * log(n)
m = log3(log(n) * log(n))
Time complexity of the algorithm can be shown as O(m). We have to express m in terms of n.
log3(log(n) * log(n)) = log3(log(n)) + log3(log(n))
= 2 * log3(log(n))
For Big-O notation, constants do not matter, so let us get rid of the 2.
Time complexity = O(log3(log(n)))
Well, OK, here is the deal: by the definition of Big-O notation, it represents an upper bound on the runtime of our function. Therefore O(n) ⊆ O(n^2).
Notice that log3(a) < log2(a) for a > 1 (they differ only by the constant factor log2(3)).
By the same logic we can conclude that O(log3(log(n))) ⊆ O(log(log(n))).
So the time complexity of the algorithm is O(log(log n)).
Not the most scientific explanation, but I hope you got the point.
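If it helps, here is a small Julia check (a sketch; loop_iterations is a hypothetical helper and the sampled values of n are arbitrary) that counts the loop's iterations and compares them with 2*log3(log2(n)):

function loop_iterations(n)
    s, iters = 1.0, 0
    while s <= log2(n)^2      # same condition as the loop above, with log = log2
        s *= 3
        iters += 1
    end
    return iters
end

for n in (2^10, 2^20, 2^40)
    println("log2(n) = $(Int(log2(n))): $(loop_iterations(n)) iterations, 2*log3(log2 n) ≈ $(round(2 * log(3, log2(n)), digits = 2))")
end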
This follows as a special case of a more general principle. Consider the following loop:
s = 1
while s < k:
s = 3s
How many times will this loop run? Well, the values taken on by s will be 1, 3, 9, 27, 81, ... = 3^0, 3^1, 3^2, 3^3, ... . And more generally, on the ith iteration of the loop, the value of s will be 3^i.
This loop stops running as soon as 3^i overshoots k. To figure out where that is, we can equate and solve:
3^i = k
i = log3 k
So this loop will run a total of log3 k times.
Now, what do you think would happen if we used this loop instead?
s = 1
while s < k:
s = 4s
Using similar logic, the number of loop iterations would be log4 k. And more generally, if we have the following loop:
s = 1
while s < k:
s = c * s
Then assuming c > 1, the number of iterations will be logc k.
Given this, let's look at your loop:
s = 1
while s <= (log n)^2 do
s = 3s
Using the reasoning from above, the number of iterations of this loop works out to log3 ((log n)^2). Using properties of logarithms, we can simplify this to
log3 ((log n)^2)
= 2 log3 (log n)
= O(log log n).
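To see this numerically, here is a small Julia sketch (geometric_iterations is a hypothetical helper; k and the values of c are arbitrary) that counts the iterations for a few multipliers c and compares them with logc k:

function geometric_iterations(k, c)
    s, iters = 1.0, 0
    while s < k
        s *= c
        iters += 1
    end
    return iters
end

k = 10_000.0
for c in (2, 3, 4)
    println("c = $c: $(geometric_iterations(k, c)) iterations, log_c(k) ≈ $(round(log(c, k), digits = 2))")
end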

Evaluating time complexity for the binomial coefficient

I'm new to Theoretical Computer Science, and I would like to calculate the time complexity of the following algorithm that evaluates the binomial coefficient, defined as n! / (k! * (n-k)!):
nf = 1;
for i = 2 to n do nf = nf * i;
kf = 1;
for i = 2 to k do kf = kf * i;
nkf = 1;
for i = 2 to n-k do nkf = nkf * i;
c = nf / (kf * nkf);
My textbook suggests using Stirling's approximation.
However, I can get the same result by considering that the loop for i = 2 to n do nf = nf * i; has complexity O(n-2) = O(n), which is the dominant term.
Stirling's approximation seems a little bit overkill. Is my approach wrong?
In your first approach you calculate n!, k! and (n-k)! separately and then combine them into the binomial coefficient. Since each of those terms can be calculated with at most n multiplications, you have O(n) time complexity.
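For reference, here is a direct Julia rendering of that three-loop approach (a sketch only; BigInt is used so the factorials don't overflow, and the number of multiplications is still O(n)):

function binomial_by_factorials(n, k)
    nf = big(1)
    for i in 2:n; nf *= i; end          # n!
    kf = big(1)
    for i in 2:k; kf *= i; end          # k!
    nkf = big(1)
    for i in 2:(n - k); nkf *= i; end   # (n-k)!
    return nf ÷ (kf * nkf)
end

@assert binomial_by_factorials(10, 3) == binomial(10, 3)   # 120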
However, you are wrong about the time complexity of calculating Stirling's formula: you only need about log2(n) multiplications to evaluate it. This is because when calculating the p-th power of some real number, instead of multiplying it p times, you can keep squaring intermediate results to compute it quickly. For example:
If you want to calculate 2^17, instead of doing 16 multiplications like this:
return 2*2*2*2*2*2*2*2*2*2*2*2*2*2*2*2*2
you can do this:
a = 2*2
b = a*a
c = b*b
d = c*c
return d * 2
which is only 5 multiplications.
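Here is a small Julia sketch of that repeated-squaring idea for a general non-negative integer exponent p (fast_power is a hypothetical helper name):

function fast_power(x, p)
    result = one(x)
    base = x
    while p > 0
        if isodd(p)
            result *= base   # this bit of p is set, so fold the current power in
        end
        base *= base         # square: base becomes x^2, x^4, x^8, ...
        p >>= 1
    end
    return result
end

@assert fast_power(2.0, 17) == 2.0^17   # about log2(p) squarings instead of p - 1 multiplications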
Note: However keep in mind that the Stirling's formula is not equal to the factorial. It is only an approximation but a good one.
Edit: Also you can consider a^n as e^(log(a)*n) and then calculate it by the quickly converging series expansion
1 + (log(a)n) + ((log(a)n)^2)/2! + ((log(a)n)^3)/3! + ...
Since the series converges very quickly you can get really close approximations in no time.
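If you want to experiment with that series, here is a minimal Julia sketch (exp_series is a hypothetical helper; the 50-term truncation is an arbitrary choice, and for large n*log(a) the argument gets large, so more terms would be needed):

function exp_series(x; terms = 50)
    total, term = 1.0, 1.0
    for i in 1:terms
        term *= x / i    # term is now x^i / i!
        total += term
    end
    return total
end

a, n = 1.5, 10
@assert isapprox(exp_series(n * log(a)), a^n; rtol = 1e-8)   # a^n via e^(n*log(a))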

Complexity class of a function with known dependency between its execution time and its input size

Suppose I am trying to find the complexity class of a function. My data set doubles every time I evaluate the function, and each time this happens, the time it takes to execute the function increases by a factor of (X).
If we know (X), how do we find the complexity class/ O notation of the function? For example, if X is slightly over 2, then the Big-O notation is O(N log N).
Let T(n) be the time complexity of the function you are talking about, where n is the size of input data. We can write recursive equation for T(n):
T(n) = X * T(n/2)
where X is your constant. Let's "unroll" this recursion:
T(n) = X * T(n/2) = X^2 * T(n/4) = X^3 * T(n/8) = ... = X^k * T(n/2^k)
This unrolling process should end when the parameter k becomes large enough to satisfy:
n/2^k = 1
which means that n = 2^k, or k = log(n) (logarithm in base 2). We can also assume that:
T(1) = C
where C is some other constant. Now we look at the unrolled equation and substitute k with log(n) and T(1) with C:
T(n) = X^log(n) * C
We can simplify this formula using logarithm properties:
T(n) = C * n^log(X)
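A quick numerical check of that last simplification, X^log2(n) == n^log2(X), in Julia (the sampled values are arbitrary):

for (X, n) in ((2.0, 1024), (3.0, 1024), (4.0, 1024))
    @assert isapprox(X^log2(n), float(n)^log2(X))
    println("X = $X  =>  T(n) grows like n^$(round(log2(X), digits = 3))")
end

So X = 2 gives O(n), X = 4 gives O(n^2), and an X slightly above 2 gives an exponent slightly above 1.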

Coordinate Descent Algorithm in Julia for Least Squares not converging

As a warm-up to writing my own elastic net solver, I'm trying to get a fast enough version of ordinary least squares implemented using coordinate descent.
I believe I've implemented the coordinate descent algorithm correctly, but when I use the "fast" version (see below), the algorithm is insanely unstable, outputting regression coefficients that routinely overflow a 64-bit float when the number of features is of moderate size compared to the number of samples.
Linear Regression and OLS
If A is a matrix, x a vector of the unknown regression coefficients, and b the vector of observed outputs, I want to find the x that minimizes
||b - Ax||^2
If A[j] is the jth column of A and A[-j] is A without column j, and the columns of A are normalized so that ||A[j]||^2 = 1 for all j, the coordinate-wise update is then
Coordinate Descent:
x[j] <-- A[j]^T * (b - A[-j] * x[-j])
I'm following along with these notes (page 9-10) but the derivation is simple calculus.
It's pointed out that instead of recomputing A[j]^T(b - A[-j] * x[-j]) all the time, a faster way to do it is with
Fast Coordinate Descent:
x[j] <-- A[j]^T*r + x[j]
where the total residual r = b - Ax is computed outside the loop over coordinates. The equivalence of these update rules follows from noting that Ax = A[j]*x[j] + A[-j]*x[-j] and rearranging terms.
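Spelling out that rearrangement (using r = b - Ax and the normalization ||A[j]||^2 = 1):

A[j]^T * (b - A[-j]*x[-j])
= A[j]^T * (b - A*x + A[j]*x[j])
= A[j]^T * r + (A[j]^T * A[j]) * x[j]
= A[j]^T * r + x[j]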
My problem is that while the second method is indeed faster, it's wildly numerically unstable for me whenever the number of features isn't small compared to the number of samples. I was wondering if anyone might have some insight as to why that's the case. I should note that the first method, which is more stable, still starts disagreeing with more standard methods as the number of features approaches the number of samples.
Julia code
Below is some Julia code for the two update rules:
function OLS_builtin(A,b)
x = A\b
return(x)
end
function OLS_coord_descent(A,b)
N,P = size(A)
x = zeros(P)
for cycle in 1:1000
for j = 1:P
x[j] = dot(A[:,j], b - A[:,1:P .!= j]*x[1:P .!= j])
end
end
return(x)
end
function OLS_coord_descent_fast(A,b)
N,P = size(A)
x = zeros(P)
for cycle in 1:1000
r = b - A*x
for j = 1:P
x[j] += dot(A[:,j],r)
end
end
return(x)
end
Example of the problem
I generate data with the following:
n = 100
p = 50
σ = 0.1
β_nz = float([i*(-1)^i for i in 1:10])
β = append!(β_nz,zeros(Float64,p-length(β_nz)))
X = randn(n,p); X .-= mean(X,1); X ./= sqrt(sum(abs2(X),1))
y = X*β + σ*randn(n); y .-= mean(y);
Here I use p=50, and I get good agreement between OLS_coord_descent(X,y) and OLS_builtin(X,y), whereas OLS_coord_descent_fast(X,y) returns exponentially large values for the regression coefficients.
When p is less than about 20, OLS_coord_descent_fast(X,y) agrees with the other two.
Conjecture
Since things agree in the regime p << n, I think the algorithm is formally correct, but numerically unstable. Does anyone have any thoughts on whether this guess is correct, and if so, how to correct for the instability while retaining most of the performance gains of the fast version of the algorithm?
The quick answer: you forgot to update r after each x[j] update. The following is the fixed function, which behaves like OLS_coord_descent:
function OLS_coord_descent_fast(A,b)
N,P = size(A)
x = zeros(P)
for cycle in 1:1000
r = b - A*x
for j = 1:P
x[j] += dot(A[:,j],r)
r -= A[:,j]*dot(A[:,j],r) # Add this line: it keeps the residual r in sync with the updated x[j]
end
end
return(x)
end
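As a quick sanity check (a sketch, assuming X and y from the data-generation snippet in the question are in scope), the fixed fast version should now track the built-in solver about as closely as the slow version does:

x_fast  = OLS_coord_descent_fast(X, y)
x_slow  = OLS_coord_descent(X, y)
x_exact = OLS_builtin(X, y)
println(maximum(abs.(x_fast - x_exact)))   # should now be small
println(maximum(abs.(x_slow - x_exact)))   # for comparison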

What's the time complexity of finding a max recursively?

I just wanted to make sure I'm going in the right direction. I want to find the max value of an array by recursively splitting it and finding the max of each half. Because I am splitting it, that gives 2*T(n/2), and because I have to make a comparison between the two halves at the end, I have an extra constant.
So would my recurrence relation be like this:
T(n) = 2*T(n/2) + 1, when n >= 2
T(n) = T(1), when n = 1
and therefore my complexity would be Theta(n lg n)?
The formula you composed seems about right, but your analysis isn't perfect.
T(n) = 2*T(n/2) + 1 = 2*(2*T(n/4) + 1) + 1 = 4*T(n/4) + 3 = ...
After the i-th expansion you get:
T(n) = 2^i * T(n/2^i) + (2^i - 1)
Now you want to know for which i the term n/2^i equals 1 (or just about any constant, if you like), so you reach the end condition of n = 1.
That is the solution of n/2^i = 1 -> i = log2(n). Plug it into the equation above:
T(n) = 2^(log2(n)) * T(1) + 2^(log2(n)) - 1 = n*1 + n - 1 = 2n - 1
and you get T(n) = O(2n - 1) = O(n) (just like #bdares said), so it is Theta(n), not Theta(n lg n).
No, no... you are doing O(1) work in each recursive call.
How many calls are there?
There are N leaves, so you know it's at least O(N).
How many comparisons do you need to combine them into the absolute maximum? One per internal node of the recursion tree, i.e. N - 1, which is another O(N).
Add the work together, don't multiply it by the recursion depth: O(N) is your time complexity.
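A small Julia sketch makes the count concrete: a recursive max over a[lo:hi] that tallies its comparisons performs N - 1 of them for an array of N elements, i.e. Theta(N) (recursive_max and counter are hypothetical names):

function recursive_max(a, lo, hi, counter)
    lo == hi && return a[lo]            # a single element is its own max
    mid = (lo + hi) ÷ 2
    left  = recursive_max(a, lo, mid, counter)
    right = recursive_max(a, mid + 1, hi, counter)
    counter[] += 1                      # one comparison to merge the two halves
    return max(left, right)
end

a = rand(100)
counter = Ref(0)
@assert recursive_max(a, 1, length(a), counter) == maximum(a)
println(counter[])                      # 99 comparisons for N = 100, i.e. N - 1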