Solve for q: (log(n))^q = log(log(n)) - time-complexity

I'm proving the big O runtime of an algorithm for an assignment but am unfortunately quite rusty when it comes to logs. Currently, I have:
(log(n))^q <= log(log(n))
I am trying to isolate q in terms of n (where I'm hoping n will cancel out). Can someone please explain to me how to do this (and not just provide an answer)? Thanks!

This would've been prettier on Math Stack Exchange (because we could use LaTeX), but you can just take the log of both sides to bring the exponent q down (since log(x^n) = n·log(x) is a property of logarithms over the positive reals):
q log(log(n)) <= log(log(log(n)))
Now you can divide both sides to isolate q:
q <= log(log(log(n)))/log(log(n))
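As a quick numerical sanity check (my own addition, not part of the original answer), here is a small Python sketch showing that with q = log(log(log(n)))/log(log(n)) the two sides agree exactly, so any smaller q satisfies the inequality; the variable names are mine:
import math

# Check that (log n)^q equals log(log n) when q = log(log(log n)) / log(log n).
# Natural logs are used throughout; the base only changes things by a constant factor.
for n in [10**3, 10**6, 10**12, 10**24]:
    q = math.log(math.log(math.log(n))) / math.log(math.log(n))
    lhs = math.log(n) ** q
    rhs = math.log(math.log(n))
    print(f"n = {n:.0e}: q = {q:.4f}, (log n)^q = {lhs:.4f}, log(log n) = {rhs:.4f}")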

Related

Randomly increasing sequence - Wolfram Mathematica

Good afternoon, I have a problem building a recurrence table with a randomly increasing sequence. I want it to return an increasing sequence with a random difference between consecutive elements. Right now I've got:
RecurrenceTable[{a[k+1]==a[k] + RandomInteger[{0,4}], a[1]==-12},a,{k,1,5}]
But it returns an arithmetic progression with a single chosen difference d used for every k (e.g. {-12,-8,-4,0,4,8,12,16,20,24}).
Also, I would be really grateful if you could explain why, when I replace every k in my code with n, I get:
RecurrenceTable[{4+a[n] == a[n],a[1] == -12},a,{n,1,10}]
Thank you very much for your time!
I don't believe that RecurrenceTable is what you are looking for.
Try this instead
FoldList[Plus,-12,RandomInteger[{0,4},5]]
which returns, on one run,
{-12,-8,-7,-3,1,2}
and, on another run,
{-12,-9,-5,-3,0,1}
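For comparison only, here is a rough Python translation of the same cumulative-sum idea (my own sketch, not Mathematica; the variable names are made up):
import random
from itertools import accumulate

# Start at -12 and add a random step from {0, ..., 4} five times,
# mirroring FoldList[Plus, -12, RandomInteger[{0, 4}, 5]].
steps = [random.randint(0, 4) for _ in range(5)]
sequence = list(accumulate(steps, initial=-12))
print(sequence)  # e.g. [-12, -8, -7, -3, 1, 2]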

Converting conditional subtour elimination of CVRP into normal ones

I am trying to eliminate the if condition constraints from the following CVRP formulation.
I tried some big M methods on paper but failed to come up with a proper reformulation. Can you please help me to find a solution?
Thanks!
You can split the equation into two inequalities and then apply the big-M method:
u_i + q_j <= u_j + M(1 - x_ij)
u_i + q_j >= u_j - M(1 - x_ij)
Models with big-M constants tend to be weak and numerically unstable, so I suggest selecting the constant as small as possible (i.e. make M depend on i and j, if possible). To learn more about that, have a look at the Perils of "Big M".
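To make the linearization concrete, here is a small Python check (my own illustration; M and all the numbers are made up) that the big-M pair enforces u_i + q_j = u_j when x_ij = 1 and becomes non-binding when x_ij = 0:
# Big-M linearization of the conditional constraint "if x_ij = 1 then u_i + q_j = u_j".
# M is a made-up bound here; it only needs to exceed any feasible |u_i + q_j - u_j|.
M = 100

def constraints_hold(u_i, u_j, q_j, x_ij):
    upper = u_i + q_j <= u_j + M * (1 - x_ij)  # binding only when x_ij = 1
    lower = u_i + q_j >= u_j - M * (1 - x_ij)  # binding only when x_ij = 1
    return upper and lower

print(constraints_hold(u_i=10, u_j=13, q_j=3, x_ij=1))  # True: load propagates exactly
print(constraints_hold(u_i=10, u_j=20, q_j=3, x_ij=1))  # False: equality violated on a used arc
print(constraints_hold(u_i=10, u_j=50, q_j=3, x_ij=0))  # True: both inequalities are slack on an unused arc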

Analyzing time complexity (Poly log vs polynomial)

Say an algorithm runs in time
5n^3 + 8n^2 (lg n)^4
Which is the first-order (dominant) term? Would it be the one with the poly log or the polynomial?
For any two constants a > 0 and b > 0, log(n)^a is in o(n^b) (note the little-o notation here).
One way to prove this claim is to examine what happens when we apply a monotonically increasing function to both sides: the log function.
log(log(n)^a) = a * log(log(n))
log(n^b) = b * log(n)
Since we can ignore constants in asymptotic notation, the question "which is bigger, log(n)^a or n^b?" reduces to "which is bigger, log(log(n)) or log(n)?" That question is much easier to answer intuitively: log(n) grows faster, so n^b dominates log(n)^a, and in your example 5n^3 is the dominant term.
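A quick numerical illustration of this (my own addition): in Python you can watch the ratio of the polylog term to the cubic term shrink toward zero as n grows.
import math

# Compare the two terms of 5n^3 + 8n^2 (lg n)^4 for growing n.
# The ratio polylog / cubic tends to 0, so 5n^3 eventually dominates.
for n in [2**10, 2**20, 2**40, 2**80]:
    cubic = 5 * n**3
    polylog = 8 * n**2 * math.log2(n)**4
    print(f"n = 2^{int(math.log2(n))}: polylog / cubic = {polylog / cubic:.3e}")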

Renaming variables to solve recursion method

I know the idea of renaming variables, that is, transforming the recurrence into one you have seen before.
I'm OK with the slide until line 4, where they rename T(2^m) as S(m). Doesn't this mean they made 2^m = m?
So S(m) should be:
S(m) = 2T(m^(0.5)) + m
Also, I think we shouldn't leave m as it is, because here it stands for 2^m, and those are not really the same.
Could anyone explain this to me?
And also, how can I know which variables I should use to make the substitution easy?
Everything you're saying is correct up to the point where you claim that since S(m) = T(2^m), then m = 2^m.
The step of defining S(m) = T(2^m) is similar to defining some new function g in terms of an old function f. For example, if you define a new function g(x) = 2f(5x), you're not saying that x = 5x. You're just defining a new function that's evaluated in terms of f.
So let's see what happens from here. We've defined S(m) = T(2^m). That means that
S(m) = T(2^m)
     = 2T(√(2^m)) + lg(2^m)
We can do some algebraic simplification to see that
S(m) = 2T(2^(m/2)) + m
And, using the connection between T and S, we see that
S(m) = 2S(m/2) + m
Notice that we ended up with the recurrence S(m) = 2S(m/2) + m not by just replacing T with S in the original recurrence, but by doing algebraic substitutions and simplifications.
Once we're here, we can use the master theorem to solve S(m) and get that S(m) = O(m log m), so
T(n) = S(lg n) = O(lg n lg lg n).
As for how you'd come up with this in the first place - that just takes practice. The key insight is that to use the master theorem you need to shrink the size of the problem by a constant factor each time, so you need to find a transformation that converts square roots into division by a constant. Square roots are a kind of exponentiation, and logarithms are specifically designed to convert exponentiation into multiplication and division, so it's reasonable to try a log or exponential substitution. Now that you know the trick, I suspect you'll see it in a lot more places.
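If it helps to see this concretely, here is a short Python check (my own sketch, with an assumed base case T(n) = 1 for n <= 2, which the question doesn't specify) that T(n) = 2T(√n) + lg n grows like lg n · lg lg n:
import math

def T(n):
    # T(n) = 2*T(sqrt(n)) + lg(n), with an assumed base case T(n) = 1 for n <= 2.
    if n <= 2:
        return 1.0
    return 2 * T(math.sqrt(n)) + math.log2(n)

# The ratio T(n) / (lg n * lg lg n) should settle near a constant.
for k in [16, 64, 256, 512]:  # n = 2^16, 2^64, 2^256, 2^512
    n = 2.0 ** k
    print(f"lg n = {k}: T(n) / (lg n * lg lg n) = {T(n) / (k * math.log2(k)):.3f}")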
You could, as an alternative, also just divide the original recurrence by log(n) to get
T(n)/log(n) = T(sqrt(n))/log(sqrt(n)) + 1
and then just use
S(n) = T(n)/log(n) with S(n) = S(sqrt(n)) + 1
or, in a different way,
S(k) = T(n^(2^(-k)))/log(n^(2^(-k)))
which then satisfies
S(k) = S(k+1) + 1,
again a well-known recurrence.
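Unrolling the S(n) = S(sqrt(n)) + 1 form (my own addition, in the same plain-text notation as above): each step halves the exponent, so
S(n) = S(n^(1/2)) + 1
     = S(n^(1/4)) + 2
     = ...
     = S(n^(2^(-k))) + k
Choosing k = lg(lg(n)) makes n^(2^(-k)) = 2, so S(n) = Θ(lg lg n), and therefore T(n) = S(n)·log(n) = Θ(lg n · lg lg n), matching the first answer.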

Circumventing R's `Error in if (nbins > .Machine$integer.max)`

This is a saga which began with the problem of how to do survey weighting. Now that I appear to be doing that correctly, I have hit a bit of a wall (see previous post for details on the import process and where the strata variable came from):
> require(foreign)
> ipums <- read.dta('/path/to/data.dta')
> require(survey)
> ipums.design <- svydesign(id=~serial, strata=~strata, data=ipums, weights=perwt)
Error in if (nbins > .Machine$integer.max) stop("attempt to make a table with >= 2^31 elements") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In pd * (as.integer(cat) - 1L) : NAs produced by integer overflow
2: In pd * nl : NAs produced by integer overflow
> traceback()
9: tabulate(bin, pd)
8: as.vector(data)
7: array(tabulate(bin, pd), dims, dimnames = dn)
6: table(ids[, 1], strata[, 1])
5: inherits(x, "data.frame")
4: is.data.frame(x)
3: rowSums(table(ids[, 1], strata[, 1]) > 0)
2: svydesign.default(id = ~serial, weights = ~perwt, strata = ~strata,
data = ipums)
1: svydesign(id = ~serial, weights = ~perwt, strata = ~strata, data = ipums)
This error seems to come from the tabulate function, which I hoped would be straightforward enough to circumvent, first by changing .Machine$integer.max
> .Machine$integer.max <- 2^40
and, when that didn't work, by changing the whole source code of tabulate:
> tabulate <- function(bin, nbins = max(1L, bin, na.rm = TRUE))
  {
      if (!is.numeric(bin) && !is.factor(bin))
          stop("'bin' must be numeric or a factor")
      # if (nbins > .Machine$integer.max)
      if (nbins > 2^40)  # replacement line
          stop("attempt to make a table with >= 2^31 elements")
      .C("R_tabulate",
         as.integer(bin),
         as.integer(length(bin)),
         as.integer(nbins),
         ans = integer(nbins),
         NAOK = TRUE,
         PACKAGE = "base")$ans
  }
Neither circumvented the problem. Apparently this is one reason why the ff package was created, but what worries me is the extent to which this is a problem I cannot avoid in R. This post seems to indicate that even if I were to use a package that would avoid this problem, I would only be able to access 2^31 elements at a time. My hope was to use SQL (either SQLite or PostgreSQL) to get around the memory problems, but I'm afraid I'll spend a while getting that to work, only to run into the same fundamental limit.
Attempting to switch back to Stata doesn't solve the problem either. Again see the previous post for how I use svyset, but the calculation I would like to run causes Stata to hang:
svy: mean age, over(strata)
Whether throwing more memory at it will solve the problem I don't know. I run R on my desktop which has 16 gigs, and I use Stata through a Windows server, currently setting memory allocation to 2000MB, but I could theoretically experiment with increasing that.
So in sum:
1. Is this a hard limit in R?
2. Would SQL solve my R problems?
3. If I split it up into many separate files, would that fix it (a lot of work...)?
4. Would throwing a lot of memory at Stata do it?
5. Am I seriously barking up the wrong tree somehow?
Yes, R uses 32-bit indexes for vectors, so they can contain no more than 2^31 - 1 entries, and you are trying to create something with 2^40 elements. There is talk of introducing 64-bit indexes, but that is some way off from appearing in R. Vectors have the stated hard limit, and that is it as far as base R is concerned.
I am too unfamiliar with the details of what you are doing to offer any further advice on the other parts of your question.
Why do you want to work with the full data set? Wouldn't a smaller sample that can fit in to the restrictions R places on you be just as useful? You could use SQL to store all the data and query it from R to return a random subset of more appropriate size.
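If it helps, here is a minimal sketch of that idea using Python's built-in sqlite3 module (my own illustration; the database file, table name, and sample size are hypothetical, and in practice you would issue the same query from R, e.g. via RSQLite):
import sqlite3

# Pull a random subset of rows from a (hypothetical) table holding the full data set,
# so only the sample has to fit in memory.
con = sqlite3.connect("ipums.db")
rows = con.execute(
    "SELECT * FROM ipums ORDER BY RANDOM() LIMIT 100000"
).fetchall()
con.close()
print(len(rows))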
Since this question was asked some time ago, I'd like to point out that my answer uses version 3.3 of the survey package.
If you check the code of svydesign, you can see that the function causing all the problems is inside a check step that determines whether you should set the nest parameter to TRUE or not. This step can be disabled by setting the option check.strata=FALSE.
Of course, you shouldn't disable a check step unless you know what you are doing. In this case, you should be able to decide yourself whether you need to set the nest option to TRUE or FALSE. nest should be set to TRUE when the same PSU (cluster) id is recycled in different strata.
Concretely for the IPUMS dataset, since you are using the serial variable for cluster identification and serial is unique for each household in a given sample, you may want to set nest to FALSE.
So, your survey design line would be:
ipums.design <- svydesign(id=~serial, strata=~strata, data=ipums, weights=perwt, check.strata=FALSE, nest=FALSE)
Extra advice: even after circumventing this problem you will find that the code is pretty slow unless you remap strata to a range from 1 to length(unique(ipums$strata)):
ipums$strata <- match(ipums$strata,unique(ipums$strata))
Both @Gavin and @Martin deserve credit for this answer, or at least for leading me in the right direction. I'm mostly answering it separately to make it easier to read.
In the order I asked:
1. Yes, 2^31 is a hard limit in R, though it seems to matter what type it is (which is a bit strange, given that it is the length of the vector, rather than the amount of memory (which I have plenty of), that is the stated problem). Do not convert strata or id variables to factors; that will just fix their length and nullify the effects of subsetting (which is the way to get around this problem).
2. SQL could probably help, provided I learn how to use it correctly. I did the following test:
library(multicore) # make svy fast!
ri.ny <- subset(ipums, statefips_num %in% c(36, 44))
ri.ny.design <- svydesign(id=~serial, weights=~perwt, strata=~strata, data=ri.ny)
svyby(~incwage, ~strata, ri.ny.design, svymean, data=ri.ny, na.rm=TRUE, multicore=TRUE)
ri <- subset(ri.ny, statefips_num==44)
ri.design <- svydesign(id=~serial, weights=~perwt, strata=~strata, data=ri)
ri.mean <- svymean(~incwage, ri.design, data=ri, na.rm=TRUE)
ny <- subset(ri.ny, statefips_num==36)
ny.design <- svydesign(id=~serial, weights=~perwt, strata=~strata, data=ny)
ny.mean <- svymean(~incwage, ny.design, data=ny, na.rm=TRUE, multicore=TRUE)
And found the means to be the same, which seems like a reasonable test.
So: in theory, provided I can split up the calculation, either using plyr or SQL, the results should still be fine.
3. See 2.
4. Throwing a lot of memory at Stata definitely helps, but now I'm running into annoying formatting issues. I seem to be able to perform most of the calculation I want (much quicker and with more stability as well) but I can't figure out how to get it into the form I want. Will probably ask a separate question on this. I think the short version here is that for big survey data, Stata is much better out of the box.
5. In many ways, yes. Trying to do analysis with data this big is not something I should have taken on lightly, and I'm far from figuring it out even now. I was using the svydesign function correctly, but I didn't really know what was going on. I have a (very slightly) better grasp now, and it's heartening to know I was generally correct about how to solve the problem. @Gavin's general suggestion of trying out small data with external results to compare against is invaluable, something I should have started doing ages ago. Many thanks to both @Gavin and @Martin.