How to calculate the big-theta time complexity of code fragments?

I'm doing exercises on analyzing the time complexity of code fragments. However, I'm having trouble figuring out how the following two fragments can have different time complexities:
for(a=1; a<=n; a++):
    for(b=1; b<=a; b++):
        c=c+1
The running time of this code can be expressed as θ(n^2).
Yet the running time of
for(a=1; a<=n; a=2*a):
    for(b=1; b<=a; b++):
        c=c+1
is expressed as θ(n).
I thought the second fragment had a running time of θ(n^2/2) = θ(n^2).
Apparently I was mistaken.
Could someone please give some hints on how to properly analyze the time complexity of these two fragments?
It would help a lot, thanks.

You can see it clearly by unrolling the loops. Let me try:
First fragment: suppose n = 8
a = 1, b = 1
a = 2, b = 1,2
a = 3, b = 1,2,3
a = 4, b = 1,2,3,4
a = 5, b = 1,2,3,4,5
a = 6, b = 1,2,3,4,5,6
a = 7, b = 1,2,3,4,5,6,7
a = 8, b = 1,2,3,4,5,6,7,8
Second fragment: suppose n = 8
a = 1, b = 1
a = 2, b = 1,2
a = 4, b = 1,2,3,4
a = 8, b = 1,2,3,4,5,6,7,8
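By simply counting the inner-loop iterations above, you can see the difference. In the first fragment the inner loop runs 1 + 2 + ... + n = n(n+1)/2 times in total (36 for n = 8), which is θ(n^2).
In the second fragment, a = 2*a means the outer loop runs only floor(log2(n)) + 1 times, and the inner loop runs 1 + 2 + 4 + ... + 2^k times in total, where 2^k is the largest power of two not exceeding n (15 for n = 8). This geometric series sums to 2^(k+1) - 1, which lies between n and 2n - 1, so the running time is θ(n), not θ(n log(n)).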
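To check the counting argument empirically, here is a minimal Python sketch that simulates both fragments and counts how often c = c+1 runs. The function names are hypothetical; the while loops simply mirror the pseudocode in the question.

```python
def count_increment(n):
    """First fragment: a = 1, 2, 3, ..., n."""
    c = 0
    a = 1
    while a <= n:
        for _ in range(1, a + 1):  # inner loop: b = 1..a
            c += 1
        a += 1
    return c


def count_doubling(n):
    """Second fragment: a = 1, 2, 4, 8, ... while a <= n."""
    c = 0
    a = 1
    while a <= n:
        for _ in range(1, a + 1):  # inner loop: b = 1..a
            c += 1
        a *= 2
    return c


for n in (8, 100, 1000):
    print(n, count_increment(n), count_doubling(n))
# 8 36 15
# 100 5050 127
# 1000 500500 1023
```

The first count grows like n^2/2, while the second always stays between n and 2n - 1.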
Hope this helps.

Related

MS-Access (SQL): Strange behaviour of the MID() function when using decimal arguments

I have noticed strange behavior of the MID() function in MS Access when used in combination with decimal numbers as arguments.
The data is as follows:
table: Test
ID Name Surname
1 Jamal Winstone
2 Joe Roan
3 Jake Tumble
4 Lea More
The SQL statement is:
SELECT MID(Surname, ID, LEN(Name)/2) FROM Test
The results are:
Expr1000
Wi
oa
mb
e
However, shouldn't it be as follows?
MID(Winstone, 1, LEN(Jamal)/2) = MID(Winstone, 1, 5/2) = MID(Winstone, 1, 2.5) = Wi (only 2 characters)
MID(Roan, 2, LEN(Joe)/2) = MID(Roan, 2, 3/2) = MID(Roan, 2, 1.5) = o (only 1 character)
MID(Tumble, 3, LEN(Jake)/2) = MID(Tumble, 3, 4/2) = MID(Tumble, 3, 2) = mb (2 characters)
MID(More, 4, LEN(Lea)/2) = MID(More, 4, 3/2) = MID(More, 4, 1.5) = e (only 1 character)
This is very strange. Any ideas why this is happening? Are the numbers with decimal places rounded?
Thanks
The logic here is very simple:
Mid takes a Long, so needs to cast a float/decimal/currency to a Long first.
Casting to a Long uses banker's rounding (round half to even) on halves (CLng(1.5) = CLng(2.5) = 2); see the docs.
So these results are entirely expected.
Use Int if you want the integer part (e.g. Int(1.99) = 1)
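To illustrate the rounding rule, here is a minimal Python sketch that emulates how Mid converts its arguments. It assumes, as the answer states, that the arguments are converted like CLng with halves rounded to even; Python's built-in round() happens to use the same round-half-to-even rule for these values. clng and access_mid are hypothetical helpers, not Access functions.

```python
def clng(x):
    """Emulate CLng for these inputs: round to nearest, halves to even.
    Python's built-in round() also rounds halves to even."""
    return int(round(x))


def access_mid(text, start, length):
    """Hypothetical emulation of Access Mid(): 1-based start, arguments converted with clng()."""
    start, length = clng(start), clng(length)
    return text[start - 1 : start - 1 + length]


rows = [(1, "Jamal", "Winstone"), (2, "Joe", "Roan"),
        (3, "Jake", "Tumble"), (4, "Lea", "More")]

for row_id, name, surname in rows:
    # Same expression as MID(Surname, ID, LEN(Name)/2)
    print(access_mid(surname, row_id, len(name) / 2))
# Wi
# oa
# mb
# e
```

Passing int(len(name) / 2) instead, which truncates like Int does for positive numbers, gives Wi, o, mb, e, which is what the question expected.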

generate maximum number of combinations [duplicate]

This question already has answers here:
How to generate a power set of a given set?
I am trying to find an algorithm that generates the full list of possible combinations of x given numbers.
Example: possible combinations of 3 numbers (a, b, c):
a, b, c, a+b, a+c, b+c, a+b+c
Many thanks in advance for your help!
Treat the binary representation of the numbers from 0 to 2^x-1 as set membership. E.g., for ABC:
0 = 000 = {}
1 = 001 = {C}
2 = 010 = {B}
3 = 011 = {B,C}
4 = 100 = {A}
etc...
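The bitmask idea above can be sketched in Python as follows. power_set is a hypothetical helper name, and the bit order (most significant bit for the first item) is chosen to match the table above.

```python
def power_set(items):
    """Return every subset of items, using the bits of 0 .. 2**x - 1 as membership flags."""
    x = len(items)
    subsets = []
    for mask in range(2 ** x):
        # The most significant bit stands for items[0], so 4 = 100 -> {A} as in the table.
        subsets.append([items[i] for i in range(x) if mask & (1 << (x - 1 - i))])
    return subsets


for mask, subset in enumerate(power_set(["A", "B", "C"])):
    print(mask, subset)
```

Dropping the empty subset at mask 0 leaves the 2^x - 1 non-empty combinations listed in the question.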
Do you mean generating every possible sum of the numbers?
Start with the set s = {0} (the sum of the empty combination).
For each number a, b, c:
duplicate the existing set s, add the number to each element of the duplicate, and add the results back to s.
Example:
s = {0}
for a:
duplicate s, s' = {0}
add a to each of s', s' = {a}
add s' back to s, s = {0,a}
for b:
duplicate s, s' = {0,a}
add b to each of s', s' = {b,a+b}
add s' back to s, s= {0,a,b,a+b}
for c:
duplicate s, s' = {0,a,b,a+b}
add c to each of s', s' = {c,a+c,b+c,a+b+c}
add s' back to s, s = {0,a,b,a+b,c,a+c,b+c,a+b+c}
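The duplicate-and-add steps above can be sketched in Python like this. subset_sums is a hypothetical name, and it uses a list rather than a set, on the assumption that you want one sum per combination: a Python set would merge equal sums such as 1+2 and 3.

```python
def subset_sums(numbers):
    """Sum of every combination, built by the duplicate-and-add steps above."""
    s = [0]  # the empty combination
    for x in numbers:
        duplicate = [value + x for value in s]  # add x to each element of the copy
        s = s + duplicate                       # add the results back to s
    return s


print(subset_sums([1, 2, 4]))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Each number doubles the size of s, so x numbers produce 2^x sums, one per combination.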

Hash function to iterate through a matrix

Given an NxN matrix and a (row,column) position, what is a method to select a different position in a random (or pseudo-random) order, trying to avoid collisions as much as possible?
For example: consider a 5x5 matrix and start from (1,2)
0 0 0 0 0
0 0 X 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
I'm looking for a method like
(x,y) hash (x,y);
to jump to a different position in the matrix, avoiding collisions as much as possible
(don't worry about how to return two values; it doesn't matter, just think of it as an array).
Of course, I can simply use
row = rand()%N;
column = rand()%N;
but it's not very good at avoiding collisions.
I thought I could apply a simple hash method twice, once for the row and once for the column, and use the results as the new coordinates, but I'm not sure this is a good solution.
Any ideas?
Can you determine the order of the walk before you start iterating? If your matrices are large, this approach isn't space-efficient, but it is straightforward and collision-free. I would do something like:
Generate an array of all of the coordinates. Remove the starting position from the list.
Shuffle the list (there's sample code for a Fisher-Yates shuffle here)
Use the shuffled list for your walk order.
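The three steps above can be sketched in Python as follows. shuffled_walk is a hypothetical name, and the shuffle is written out by hand to show the Fisher-Yates loop; random.shuffle(cells) would do the same job in one call.

```python
import random


def shuffled_walk(n, start, seed=None):
    """Return every cell of an n x n matrix except start, in a random order."""
    rng = random.Random(seed)
    # Step 1: all coordinates, minus the starting position
    cells = [(r, c) for r in range(n) for c in range(n) if (r, c) != start]
    # Step 2: Fisher-Yates shuffle, working backwards from the last element
    for i in range(len(cells) - 1, 0, -1):
        j = rng.randint(0, i)  # randint includes both endpoints
        cells[i], cells[j] = cells[j], cells[i]
    # Step 3: the shuffled list is the walk order
    return cells


walk = shuffled_walk(5, (1, 2), seed=7)
print(walk[:5])
```

Memory is O(N^2) for the coordinate list, which is the space cost the answer mentions.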
Edit 2 & 3: A modular approach: Given s array elements, choose a prime p of the form 2+3*n with p>s. For i=1 to p, use cell (i*i*i)%p whenever that value is in the range 0...s-1. (For row length r, cell #c has subscripts c%r, c/r.)
Effectively, this method uses H(i) = (i*i*i) mod p as a hash function. The reference shows that as i ranges from 1 to p, H(i) takes on each of the values from 0 to p-1 exactly once. (This works because p-1 = 3n+1 is not divisible by 3, so cubing is a one-to-one map modulo p.)
For example, with s=25 and p=29 or 47, this uses cells in the following order:
p=29: 1 8 6 9 13 24 19 4 14 17 22 18 11 7 12 3 15 10 5 16 20 23 2 21 0
p=47: 1 8 17 14 24 13 15 18 7 4 10 2 6 21 3 22 9 12 11 23 5 19 16 20 0
according to bc code like
s=25;p=29;for(i=1;i<=p;++i){t=(i^3)%p; if(t<s){print " ",t}}
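The same computation as the bc one-liner, written as a small Python sketch. The helper names are hypothetical, pow(i, 3, p) is Python's built-in modular exponentiation, and the check accepts p >= s (the companion answer below uses p >= s, which also works).

```python
def is_prime(n):
    """Trial division; adequate for the small p used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def cubic_walk(s, p):
    """Visit cells 0..s-1 in the order given by H(i) = i**3 mod p, i = 1..p."""
    if not (is_prime(p) and p % 3 == 2 and p >= s):
        raise ValueError("p must be a prime of the form 3n+2 with p >= s")
    # pow(i, 3, p) computes i**3 % p without building the full cube
    return [t for t in (pow(i, 3, p) for i in range(1, p + 1)) if t < s]


def cell_subscripts(c, r):
    """Row-major (row, column) subscripts of cell number c for row length r."""
    return c // r, c % r


order = cubic_walk(25, 29)
print(order)
print([cell_subscripts(c, 5) for c in order[:3]])  # [(0, 1), (1, 3), (1, 1)]
```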
The text above shows the suggestion I made in Edit 2 of my answer. The text below shows my first answer.
Edit 0 (this is the suggestion to which Seamus's comment applied): A simple method to go through a vector in a "random appearing" way is to repeatedly add d (d>1) to an index, modulo s. This will access all elements if d and s are coprime (where s is the vector length), i.e., if gcd(d,s)=1; if s is variable, you'd need gcd() code. Note that my example below is in terms of a vector; you could do the same thing independently on the other axis of your matrix, with a different delta for it, but that runs into the problem described below.
Example: Say s is 10. gcd(s,x) is 1 for x in {1,3,7,9} and is not 1 for x in {2,4,5,6,8,10}. Suppose we choose d=7 and start with i=0. i will take on the values 0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, which modulo 10 are 0, 7, 4, 1, 8, 5, 2, 9, 6, 3, 0.
Edit 1 & 3: Unfortunately this will have a problem in the two-axis case; for example, if you use d=7 for x axis, and e=3 for y-axis, while the first 21 hits will be distinct, it will then continue repeating the same 21 hits. To address this, treat the whole matrix as a vector, use d with gcd(d,s)=1, and convert cell numbers to subscripts as above.
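The whole-matrix version of the stride idea (treat the matrix as one vector of s cells, step by d with gcd(d, s) = 1, and convert cell numbers to subscripts) might look like this in Python. stride_walk is a hypothetical name and cells are numbered row-major.

```python
from math import gcd


def stride_walk(n_rows, n_cols, d, start=0):
    """Visit every cell once by stepping d cells at a time through the matrix,
    treated as one row-major vector of s = n_rows * n_cols cells."""
    s = n_rows * n_cols
    if gcd(d, s) != 1:
        raise ValueError("d must be coprime with the number of cells")
    cells = []
    i = start % s
    for _ in range(s):
        cells.append((i // n_cols, i % n_cols))  # (row, column)
        i = (i + d) % s
    return cells


# The 1-D example above: a vector of length 10 with d = 7
print([col for _, col in stride_walk(1, 10, 7)])  # [0, 7, 4, 1, 8, 5, 2, 9, 6, 3]
```

Because a single index cycles through all s cells, the two-axis repetition problem does not arise, although consecutive cells are still only "random appearing".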
If you just want to iterate through the matrix, what is wrong with row++; if (row == N) {row = 0; column++}?
If you iterate through the row and the column independently, and each cycles back to the beginning after N steps, then the (row, column) pair will iterate through only N of the N^2 cells of the matrix.
If you want to iterate through all of the cells of the matrix in pseudo-random order, you could look at questions here on random permutations.
This is a companion answer to address a question about my previous answer: How to find an appropriate prime p >= s (where s = the number of matrix elements) to use in the hash function H(i) = (i*i*i) mod p.
We need to find a prime of form 3n+2, where n is any odd integer such that 3*n+2 >= s. Note that n odd gives 3n+2 = 3(2k+1)+2 = 6k+5 where k need not be odd. In the example code below, p = 5+6*(s/6); initializes p to be a number of form 6k+5, and p += 6; maintains p in this form.
The code below shows that half a dozen lines of code are enough for the calculation. The timings, shown after the code, are reasonably fast: 12 us at s = half a million and 200 us at s = half a billion, where us denotes microseconds.
// timing how long to find primes of form 2+3*n by division
// jiw 20 Sep 2011
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
// Return the wall-clock time in seconds, minus base.
double ttime(double base) {
    struct timeval tod;
    gettimeofday(&tod, NULL);
    return tod.tv_sec + tod.tv_usec/1e6 - base;
}
int main(int argc, char *argv[]) {
    int d, s, p, par=0;
    double t0=ttime(0);
    ++par; s=5000; if (argc > par) s = atoi(argv[par]);
    p = 5+6*(s/6);             // smallest number of form 6k+5 that is >= s
    while (1) {
        // trial division by odd d; p is odd and never divisible by 3
        for (d=3; d*d<p; d+=2)
            if (p%d==0) break;
        if (d*d >= p) break;   // no divisor found: p is prime
        p += 6;                // next candidate of form 6k+5
    }
    printf ("p = %d after %.6f seconds\n", p, ttime(t0));
    return 0;
}
Timing results on 2.5GHz Athlon 5200+:
qili ~/px > for i in 0 00 000 0000 00000 000000; do ./divide-timing 500$i; done
p = 5003 after 0.000008 seconds
p = 50021 after 0.000010 seconds
p = 500009 after 0.000012 seconds
p = 5000081 after 0.000031 seconds
p = 50000021 after 0.000072 seconds
p = 500000003 after 0.000200 seconds
qili ~/px > factor 5003 50021 500009 5000081 50000021 500000003
5003: 5003
50021: 50021
500009: 500009
5000081: 5000081
50000021: 50000021
500000003: 500000003
Update 1: Of course, timing is not deterministic (i.e., it can vary substantially depending on the value of s, other processes on the machine, etc.); for example:
qili ~/px > time for i in 000 004 010 058 070 094 100 118 184; do ./divide-timing 500000$i; done
p = 500000003 after 0.000201 seconds
p = 500000009 after 0.000201 seconds
p = 500000057 after 0.000235 seconds
p = 500000069 after 0.000394 seconds
p = 500000093 after 0.000200 seconds
p = 500000099 after 0.000201 seconds
p = 500000117 after 0.000201 seconds
p = 500000183 after 0.000211 seconds
p = 500000201 after 0.000223 seconds
real 0m0.011s
user 0m0.002s
sys 0m0.004s
Consider using a double hash function to get a better distribution inside the matrix. But given that you cannot avoid collisions, I suggest using an array of sentinels to mark the positions you visit; this way you are sure to visit each cell only once.
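A minimal Python sketch of the sentinel idea, with one loudly stated substitution: it draws positions with Python's random.Random instead of the double hash function the answer suggests, so any position generator could be plugged in at that line. sentinel_walk is a hypothetical name.

```python
import random


def sentinel_walk(n, start, seed=None):
    """Visit every cell of an n x n matrix once: draw pseudo-random positions
    and skip any that the visited (sentinel) array already marks."""
    rng = random.Random(seed)
    visited = [[False] * n for _ in range(n)]
    row, col = start
    visited[row][col] = True
    order = []
    while len(order) < n * n - 1:
        # Stand-in position generator; the answer suggests a double hash here
        r, c = rng.randrange(n), rng.randrange(n)
        if visited[r][c]:
            continue  # collision: this cell was already used, draw again
        visited[r][c] = True
        order.append((r, c))
    return order


print(sentinel_walk(5, (1, 2), seed=42)[:5])
```

The sentinels guarantee each cell is used once, but with pure redraws most attempts collide once the matrix is nearly full, so the last few cells can take many draws.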

Comparing vectors

I am new to R and am trying to find a better solution for accomplishing this fairly simple task efficiently.
I have a data.frame M with 100,000 rows (and many columns, of which 2 are relevant to this problem; I'll call them M1 and M2). I have another data.frame, V, whose column V1 (about 10,000 elements) is essential to this task. My task is this:
For each element in V1, find where it occurs in M2 and pull out the corresponding M1. I am able to do this using a for loop, but it is terribly slow! I am used to Matlab and Perl, and this is taking forever in R! Surely there's a better way. I would appreciate any suggestions for accomplishing this task...
start <- numeric(length(V$V1))
for (x in seq_along(V$V1)) {
  start[x] = M$M1[M$M2 == V$V1[x]]
}
Only 1 element will match, so I can use the logical comparison to assign the element directly into the start vector. How can I vectorize this?
Thank you!
Here is another solution using the same example as @aix.
M[match(V$V1, M$M2),]
To benchmark performance, we can use the R package rbenchmark.
library(rbenchmark)
f_ramnath = function() M[match(V$V1, M$M2),]
f_aix = function() merge(V, M, by.x='V1', by.y='M2', sort=F)
f_chase = function() M[M$M2 %in% V$V1,] # modified to return full data frame
benchmark(f_ramnath(), f_aix(), f_chase(), replications = 10000)
test replications elapsed relative
2 f_aix() 10000 12.907 7.068456
3 f_chase() 10000 2.010 1.100767
1 f_ramnath() 10000 1.826 1.000000
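To make the semantics of match() concrete, here is a Python sketch of the same lookup, built on a dictionary that remembers the first position of each value. It illustrates what match() returns (position of the first occurrence, NA when absent), not how R implements it; match is a hypothetical Python helper and positions are 0-based, unlike R. The data is the small example from the merge answer.

```python
def match(values, table):
    """0-based position of the first occurrence of each value in table; None where R gives NA."""
    first_pos = {}
    for pos, v in enumerate(table):
        first_pos.setdefault(v, pos)  # keep only the first occurrence, like match()
    return [first_pos.get(v) for v in values]


M1 = [1, 2, 3, 4, 10, 3, 15]
M2 = [15, 6, 7, 8, -1, 12, 5]
V1 = [-1, 12, 5, 7]

idx = match(V1, M2)
start = [M1[i] if i is not None else None for i in idx]
print(idx)    # [4, 5, 6, 2]
print(start)  # [10, 3, 15, 3]
```

Building the dictionary takes one pass over M2, after which each lookup is constant time on average, instead of comparing every V1 element against all of M2 as the loop does.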
Another option is to use the %in% operator:
> set.seed(1)
> M <- data.frame(M1 = sample(1:20, 15, FALSE), M2 = sample(1:20, 15, FALSE))
> V <- data.frame(V1 = sample(1:20, 10, FALSE))
> M$M1[M$M2 %in% V$V1]
[1] 6 8 11 9 19 1 3 5
Sounds like you're looking for merge:
> M <- data.frame(M1=c(1,2,3,4,10,3,15), M2=c(15,6,7,8,-1,12,5))
> V <- data.frame(V1=c(-1,12,5,7))
> merge(V, M, by.x='V1', by.y='M2', sort=F)
V1 M1
1 -1 10
2 12 3
3 5 15
4 7 3
If V$V1 might contain values not present in M$M2, you may want to specify all.x=T. This will fill in the missing values with NAs instead of omitting them from the result.

How can I optimize this timeline-matching code in Matlab?

I currently have two timelines (timeline1 and timeline2) with matching data (data1 and data2). The timelines almost, but not quite, match (about 90% of the values are common).
I'm trying to find the values from data1 and data2 that correspond to identical timestamps (ignoring all other values).
My first, trivial implementation is as follows (it is obviously terribly slow, given that my timelines contain thousands of values). Any ideas on how to improve this? I'm sure there is a smart way of doing this that avoids the for loop or the find operation...
% We expect the common timeline to contain
% 0 1 4 5 9
timeline1 = [0 1 4 5 7 8 9 10];
timeline2 = [0 1 2 4 5 6 9];
% Some bogus data
data1 = timeline1*10;
data2 = timeline2*20;
reconstructedData1 = data1;
reconstructedData2 = zeros(size(data1));
currentSearchPosition = 1;
for t = 1:length(timeline1)
    % We only look beyond the previous matching location, to speed up find
    matchingIndex = find(timeline2(currentSearchPosition:end) == timeline1(t), 1);
    if isempty(matchingIndex)
        reconstructedData1(t) = nan;
        reconstructedData2(t) = nan;
    else
        reconstructedData2(t) = data2(matchingIndex+currentSearchPosition-1);
        currentSearchPosition = currentSearchPosition+matchingIndex;
    end
end
% Remove values from data1 for which no match was found in data2
reconstructedData1(isnan(reconstructedData1)) = [];
reconstructedData2(isnan(reconstructedData2)) = [];
You can use Matlab's intersect function:
c = intersect(A, B)
Couldn't you just call INTERSECT?
commonTimeline = intersect(timeline1,timeline2);
commonTimeline =
0 1 4 5 9
You need to use the indexes returned from intersect.
[~, ia, ib] = intersect(timeline1, timeline2);
recondata1 = data1(ia);
recondata2 = data2(ib);
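For comparison with the loop in the question, here is a Python sketch of a different technique, a two-pointer merge, which finds the common timestamps in a single linear pass. It assumes both timelines are sorted and free of duplicates, as in the example; match_timelines is a hypothetical name.

```python
def match_timelines(timeline1, data1, timeline2, data2):
    """Two-pointer merge over two sorted, duplicate-free timelines.
    Returns the common timestamps and the data values at those timestamps."""
    i = j = 0
    common, recon1, recon2 = [], [], []
    while i < len(timeline1) and j < len(timeline2):
        if timeline1[i] == timeline2[j]:
            common.append(timeline1[i])
            recon1.append(data1[i])
            recon2.append(data2[j])
            i += 1
            j += 1
        elif timeline1[i] < timeline2[j]:
            i += 1  # timestamp only in timeline1: skip it
        else:
            j += 1  # timestamp only in timeline2: skip it
    return common, recon1, recon2


timeline1 = [0, 1, 4, 5, 7, 8, 9, 10]
timeline2 = [0, 1, 2, 4, 5, 6, 9]
data1 = [10 * t for t in timeline1]
data2 = [20 * t for t in timeline2]
print(match_timelines(timeline1, data1, timeline2, data2))
# ([0, 1, 4, 5, 9], [0, 10, 40, 50, 90], [0, 20, 80, 100, 180])
```

The returned lists correspond to intersect's common values and to data1(ia) and data2(ib) in the answer above.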