Iterative solution to compute powers

Iterative solution to compute powers - iteration

I am working on developing an efficient iterative code to compute mn. After some thinking and googling I found this code;
public static int power(int n, int m)
// Efficiently calculates m to the power of n iteratively
{
int pow=m, acc=1, count=n;
while(count!=0)
{
if(count%2==1)
acc=acc*pow;
pow=pow*pow;
count=count/2;
}
return acc;
}
This logic makes sense to me except the fact that why are we squaring value of pow towards the end each time. I am familiar with similar recursive approach, but this squaring is not looking very intuitive to me. Can I kindly get some help her? An example with explanation will be really helpful.

The accumulator is being squared each iteration because count (which is the inverse cumulative power) is being halved each iteration.
If the count is odd, the accumulator is multiplied by the number. This algorithm relies on integer arithmetic, which discards the fractional part of a division, effectively further decrementing by 1 when the count is odd.

This is a very tricky solution to understand. I am solving this problem in leetcode and have found the iterative solution. I have spent a whole day to understand this beautiful iterative solution. The main problem is this iterative solution does not work as like its recursive solution.
Let's pick an example to demonstrate. But first I have to re-write your code by replacing some variable name as the name in your given code is very confusing.
// Find m^n
public static int power(int n, int m)
{
int pow=n, result=1, base=m;
while(pow > 0)
{
if(pow%2 == 1) result = result * base;
base = base * base;
pow = pow/2;
}
return result;
}
Let's understand the beauty step by step.
Let say, base = 2 and power = 10.
Calculation
Description
2^10= (2*2)^5 = 4^5
even
We have changed the base to 4 and power to 5. So it is now enough to find 4^5. [base multiplied with itself and power is half
4^5= 4*(4)^4 = 4^5
odd
We separate single 4 which is the base for current iteration. We store the base to result variable. We will now find the value of 4^4 and then multiply with the result variable.
4^4= (4*4)^2 = 16^2
even
We change the base to 16 and power to 2. It is now enough to find 16^2
16^2= (16*16)^1 = 256^1
even
We change the base to 256 and power to 1. It is now enough to find 256^1
256^1 = 256 * 256^0
odd
We separate single 256 which is the base for current iteration. This value comes from evaluation of 4^4.So, we have to multiply it with our previous result variable. And continue evaluating the rest value of 256^0.
256^0
zero
Power 0. So stop the iteration.
So, after translating the process in pseudo code it will be similar to this,
If power is even:
base = base * base
power /= 2
If power is odd:
result = result * base
power -= 1
Now, let have another observation. It is observed that floor(5 / 2) and (5-1) / 2 is same.
So, for odd power, we can directly set power / 2 instead of power -= 1. So, the pseudo code will be like the below,
if power is both odd or even:
base = base * base
power /= 2
If power is odd:
result = result * base
I hope you got the behind the scene.

Related

pyomo: minimal production time / BIG M

I am looking for a way to map a minimum necessary duty cycle in an optimization model.
After several attempts, however, I have now reached the end of my knowledge and hope for some inspiration here.
The idea is that a variable (binary) mdl.ontime is set so that the sum of successive ontime values is greater than or equal to the minimum duty cycle:
def ontime(mdl,t):
min_on_time = 3 # minimum on time in h
if t < min_on_time: return mdl.ontime[t] == 0
return sum(mdl.ontime[t-i] for i in range(min_on_time)) >= min_on_time
That works so far, if the variable mdl.ontime will not be recognized at all.
Then I tried three different constraints, unfortunately they all gave the same result: CPLEX only finds inf. results.
The first variant was:
def flag(mdl,t):
return mdl.ontime[t] + (mdl.production[t]>=0.1) >= 2
So if mdl.ontime is 1 and mdl.production is greater or equal 0.1 (the assumption is just exact enough), it should be greater or equal 2: a logical addition therm.
The second attemp was quite similar to the first:
def flag(mdl,t):
return (mdl.ontime[t]) >= (mdl.production[t] >= 0.1)
If mdl.ontime is 1, it should be greater or equal mdl.production compared with 0.1.
And the third with a big M variable:
def flag(mdl,t):
bigM = 10**6
return mdl.ontime[t] * bigM >= mdl.production[t]
bigM instead should be great enough in my case...
All of them do not work at all...and I have no idea, why CPLEX returns the error that there is only an infeasible solution.
Basically the model runs if I don't consider the ontime-integration.
Do you guys have any more ideas how I could implement this?
Many greetings,
Mathias

It isn't real clear what the desired relationship is between your variables/constraints. That said, I don't think this is legal. I'm surprised that it isn't popping an error....and if not popping an error, I'm pretty sure it isn't doing what you think:
def flag(mdl,t):
return mdl.ontime[t] + (mdl.production[t]>=0.1) >= 2
You are essentially burying an inferred binary variable in there with the test on mdl.production, which isn't going to work, I believe. You probably need to introduce another variable or such.

What is the time complexity of below function?

I was reading book about competitive programming and was encountered to problem where we have to count all possible paths in the n*n matrix.
Now the conditions are :
`
1. All cells must be visited for once (cells must not be unvisited or visited more than once)
2. Path should start from (1,1) and end at (n,n)
3. Possible moves are right, left, up, down from current cell
4. You cannot go out of the grid
Now this my code for the problem :
typedef long long ll;
ll path_count(ll n,vector<vector<bool>>& done,ll r,ll c){
ll count=0;
done[r][c] = true;
if(r==(n-1) && c==(n-1)){
for(ll i=0;i<n;i++){
for(ll j=0;j<n;j++) if(!done[i][j]) {
done[r][c]=false;
return 0;
}
}
count++;
}
else {
if((r+1)<n && !done[r+1][c]) count+=path_count(n,done,r+1,c);
if((r-1)>=0 && !done[r-1][c]) count+=path_count(n,done,r-1,c);
if((c+1)<n && !done[r][c+1]) count+=path_count(n,done,r,c+1);
if((c-1)>=0 && !done[r][c-1]) count+=path_count(n,done,r,c-1);
}
done[r][c] = false;
return count;
}
Here if we define recurrence relation then it can be like: T(n) = 4T(n-1)+n2
Is this recurrence relation true? I don't think so because if we use masters theorem then it would give us result as O(4n*n2) and I don't think it can be of this order.
The reason, why I am telling, is this because when I use it for 7*7 matrix it takes around 110.09 seconds and I don't think for n=7 O(4n*n2) should take that much time.
If we calculate it for n=7 the approx instructions can be 47*77 = 802816 ~ 106. For such amount of instruction it should not take that much time. So here I conclude that my recurrene relation is false.
This code generates output as 111712 for 7 and it is same as the book's output. So code is right.
So what is the correct time complexity??

No, the complexity is not O(4^n * n^2).
Consider the 4^n in your notation. This means, going to a depth of at most n - or 7 in your case, and having 4 choices at each level. But this is not the case. In the 8th, level you still have multiple choices where to go next. In fact, you are branching until you find the path, which is of depth n^2.
So, a non tight bound will give us O(4^(n^2) * n^2). This bound however is far from being tight, as it assumes you have 4 valid choices from each of your recursive calls. This is not the case.
I am not sure how much tighter it can be, but a first attempt will drop it to O(3^(n^2) * n^2), since you cannot go from the node you came from. This bound is still far from optimal.

Complexity of subset sum solver algorithm

I made a subset sum problem but I am still confused about its complexity.
Please find the algorithm here:
http://www.vinaysofs.com/P=NP%20proved%20with%20Subset%20Sum%20Problem%20solution%20manuscript.pdf
Basically the essence is:
i = 0;
func1(i)
{
if(func1(i + 1))
return true;
else
func1(i + 2, with few modifications in other arguments);
}
Here, i is index of an element in an integer set like {1,1,1,1,1,1}.
Worst case for above algorithm comes when all elements are 1 and we need a sum which is 1 more than the total sum of all elements.
I want to know the complexity of this algorithm as someone told me it is non polynomial(Exponential time - 2^n. If it is polynomial then it would be a big achievement.
In my view also it is not polynomial as T(n)=2T(n-1)+6 but that is in worst case that is every time the 1st recursive call fails.
Please help..
Thanks
Vinay

Efficient random permutation of n-set-bits

For the problem of producing a bit-pattern with exactly n set bits, I know of two practical methods, but they both have limitations I'm not happy with.
First, you can enumerate all of the possible word values which have that many bits set in a pre-computed table, and then generate a random index into that table to pick out a possible result. This has the problem that as the output size grows the list of candidate outputs eventually becomes impractically large.
Alternatively, you can pick n non-overlapping bit positions at random (for example, by using a partial Fisher-Yates shuffle) and set those bits only. This approach, however, computes a random state in a much larger space than the number of possible results. For example, it may choose the first and second bits out of three, or it might, separately, choose the second and first bits.
This second approach must consume more bits from the random number source than are strictly required. Since it is choosing n bits in a specific order when their order is unimportant, this means that it is making an arbitrary distinction between n! different ways of producing the same result, and consuming at least floor(log_2(n!)) more bits than are necessary.
Can this be avoided?
There is obviously a third approach of iteratively computing and counting off the legal permutations until a random index is reached, but that's simply a space-for-time trade-off on the first approach, and isn't directly helpful unless there is an efficient way to count off those n permutations.
clarification
The first approach requires picking a single random number between zero and (where w is the output size), as this is the number of possible solutions.
The second approach requires picking n random values between zero and w-1, zero and w-2, etc., and these have a product of , which is times larger than the first approach.
This means that the random number source has been forced to produce bits to distinguish n! different results which are all equivalent. I'd like to know if there's an efficient method to avoid relying on this superfluous randomness. Perhaps by using an algorithm which produces an un-ordered list of bit positions, or by directly computing the nth unique permutation of bits.

Seems like you want a variant of Floyd's algorithm:
Algorithm to select a single, random combination of values?
Should be especially useful in your case, because the containment test is a simple bitmask operation. This will require only k calls to the RNG. In the code below, I assume you have randint(limit) which produces a uniform random from 0 to limit-1, and that you want k bits set in a 32-bit int:
mask = 0;
for (j = 32 - k; j < 32; ++j) {
r = randint(j+1);
b = 1 << r;
if (mask & b) mask |= (1 << j);
else mask |= b;
}
How many bits of entropy you need here depends on how randint() is implemented. If k > 16, set it to 32 - k and negate the result.
Your alternative suggestion of generating a single random number representing one combination among the set (mathematicians would call this a rank of the combination) is simpler if you use colex order rather than lexicographic rank. This code, for example:
for (i = k; i >= 1; --i) {
while ((b = binomial(n, i)) > r) --n;
buf[i-1] = n;
r -= b;
}
will fill the array buf[] with indices from 0 to n-1 for the k-combination at colex rank r. In your case, you'd replace buf[i-1] = n with mask |= (1 << n). The binomial() function is binomial coefficient, which I do with a lookup table (see this). That would make the most efficient use of entropy, but I still think Floyd's algorithm would be a better compromise.

[Expanding my comment:] If you only have a little raw entropy available, then use a PRNG to stretch it further. You only need enough raw entropy to seed a PRNG. Use the PRNG to do the actual shuffle, not the raw entropy. For the next shuffle reseed the PRNG with some more raw entropy. That spreads out the raw entropy and makes less of a demand on your entropy source.
If you know exactly the range of numbers you need out of the PRNG, then you can, carefully, set up your own LCG PRNG to cover the appropriate range while needing the minimum entropy to seed it.
ETA: In C++there is a next_permutation() method. Try using that. See std::next_permutation Implementation Explanation for more.

Is this a theory problem or a practical problem?
You could still do the partial shuffle, but keep track of the order of the ones and forget the zeroes. There are log(k!) bits of unused entropy in their final order for your future consumption.
You could also just use the recurrence (n choose k) = (n-1 choose k-1) + (n-1 choose k) directly. Generate a random number between 0 and (n choose k)-1. Call it r. Iterate over all of the bits from the nth to the first. If we have to set j of the i remaining bits, set the ith if r < (i-1 choose j-1) and clear it, subtracting (i-1 choose j-1), otherwise.
Practically, I wouldn't worry about the couple of words of wasted entropy from the partial shuffle; generating a random 32-bit word with 16 bits set costs somewhere between 64 and 80 bits of entropy, and that's entirely acceptable. The growth rate of the required entropy is asymptotically worse than the theoretical bound, so I'd do something different for really big words.
For really big words, you might generate n independent bits that are 1 with probability k/n. This immediately blows your entropy budget (and then some), but it only uses linearly many bits. The number of set bits is tightly concentrated around k, though. For a further expected linear entropy cost, I can fix it up. This approach has much better memory locality than the partial shuffle approach, so I'd probably prefer it in practice.

I would use solution number 3, generate the i-th permutation.
But do you need to generate the first i-1 ones?
You can do it a bit faster than that with kind of divide and conquer method proposed here: Returning i-th combination of a bit array and maybe you can improve the solution a bit

Background
From the formula you have given - w! / ((w-n)! * n!) it looks like your problem set has to do with the binomial coefficient which deals with calculating the number of unique combinations and not permutations which deals with duplicates in different positions.
You said:
"There is obviously a third approach of iteratively computing and counting off the legal permutations until a random index is reached, but that's simply a space-for-time trade-off on the first approach, and isn't directly helpful unless there is an efficient way to count off those n permutations.
...
This means that the random number source has been forced to produce bits to distinguish n! different results which are all equivalent. I'd like to know if there's an efficient method to avoid relying on this superfluous randomness. Perhaps by using an algorithm which produces an un-ordered list of bit positions, or by directly computing the nth unique permutation of bits."
So, there is a way to efficiently compute the nth unique combination, or rank, from the k-indexes. The k-indexes refers to a unique combination. For example, lets say that the n choose k case of 4 choose 3 is taken. This means that there are a total of 4 numbers that can be selected (0, 1, 2, 3), which is represented by n, and they are taken in groups of 3, which is represented by k. The total number of unique combinations can be calculated as n! / ((k! * (n-k)!). The rank of zero corresponds to the k-index of (2, 1, 0). Rank one is represented by the k-index group of (3, 1, 0), and so forth.
Solution
There is a formula that can be used to very efficiently translate between a k-index group and the corresponding rank without iteration. Likewise, there is a formula for translating between the rank and corresponding k-index group.
I have written a paper on this formula and how it can be seen from Pascal's Triangle. The paper is called Tablizing The Binomial Coeffieicent.
I have written a C# class which is in the public domain that implements the formula described in the paper. It uses very little memory and can be downloaded from the site. It performs the following tasks:
Outputs all the k-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters.
Converts the k-index to the proper lexicographic index or rank of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle and is very efficient compared to iterating over the entire set.
Converts the index in a sorted binomial coefficient table to the corresponding k-index. The technique used is also much faster than older iterative solutions.
Uses Mark Dominus method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers. This version returns a long value. There is at least one other method that returns an int. Make sure that you use the method that returns a long value.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to use the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with at least 2 cases and there are no known bugs.
The following tested example code demonstrates how to use the class and will iterate through each unique combination:
public void Test10Choose5()
{
String S;
int Loop;
int N = 10; // Total number of elements in the set.
int K = 5; // Total number of elements in each group.
// Create the bin coeff object required to get all
// the combos for this N choose K combination.
BinCoeff<int> BC = new BinCoeff<int>(N, K, false);
int NumCombos = BinCoeff<int>.GetBinCoeff(N, K);
// The Kindexes array specifies the indexes for a lexigraphic element.
int[] KIndexes = new int[K];
StringBuilder SB = new StringBuilder();
// Loop thru all the combinations for this N choose K case.
for (int Combo = 0; Combo < NumCombos; Combo++)
{
// Get the k-indexes for this combination.
BC.GetKIndexes(Combo, KIndexes);
// Verify that the Kindexes returned can be used to retrive the
// rank or lexigraphic order of the KIndexes in the table.
int Val = BC.GetIndex(true, KIndexes);
if (Val != Combo)
{
S = "Val of " + Val.ToString() + " != Combo Value of " + Combo.ToString();
Console.WriteLine(S);
}
SB.Remove(0, SB.Length);
for (Loop = 0; Loop < K; Loop++)
{
SB.Append(KIndexes[Loop].ToString());
if (Loop < K - 1)
SB.Append(" ");
}
S = "KIndexes = " + SB.ToString();
Console.WriteLine(S);
}
}
So, the way to apply the class to your problem is by considering each bit in the word size as the total number of items. This would be n in the n!/((k! (n - k)!) formula. To obtain k, or the group size, simply count the number of bits set to 1. You would have to create a list or array of the class objects for each possible k, which in this case would be 32. Note that the class does not handle N choose N, N choose 0, or N choose 1 so the code would have to check for those cases and return 1 for both the 32 choose 0 case and 32 choose 32 case. For 32 choose 1, it would need to return 32.
If you need to use values not much larger than 32 choose 16 (the worst case for 32 items - yields 601,080,390 unique combinations), then you can use 32 bit integers, which is how the class is currently implemented. If you need to use 64 bit integers, then you will have to convert the class to use 64 bit longs. The largest value that a long can hold is 18,446,744,073,709,551,616 which is 2 ^ 64. The worst case for n choose k when n is 64 is 64 choose 32. 64 choose 32 is 1,832,624,140,942,590,534 - so a long value will work for all 64 choose k cases. If you need numbers bigger than that, then you will probably want to look into using some sort of big integer class. In C#, the .NET framework has a BigInteger class. If you are working in a different language, it should not be hard to port.
If you are looking for a very good PRNG, one of the fastest, lightweight, and high quality output is the Tiny Mersenne Twister or TinyMT for short . I ported the code over to C++ and C#. it can be found here, along with a link to the original author's C code.
Rather than using a shuffling algorithm like Fisher-Yates, you might consider doing something like the following example instead:
// Get 7 random cards.
ulong Card;
ulong SevenCardHand = 0;
for (int CardLoop = 0; CardLoop < 7; CardLoop++)
{
do
{
// The card has a value of between 0 and 51. So, get a random value and
// left shift it into the proper bit position.
Card = (1UL << RandObj.Next(CardsInDeck));
} while ((SevenCardHand & Card) != 0);
SevenCardHand |= Card;
}
The above code is faster than any shuffling algorithm (at least for obtaining a subset of random cards) since it only works on 7 cards instead of 52. It also packs the cards into individual bits within a single 64 bit word. It makes evaluating poker hands much more efficient as well.
As a side, note, the best binomial coefficient calculator I have found that works with very large numbers (it accurately calculated a case that yielded over 15,000 digits in the result) can be found here.

Is this considered memoisation?

In optimising some code recently, we ended up performing what I think is a "type" of memoisation but I'm not sure we should be calling it that. The pseudo-code below is not the actual algorithm (since we have little need for factorials in our application, and posting said code is a firing offence) but it should be adequate for explaining my question. This was the original:
def factorial (n):
if n == 1 return 1
return n * factorial (n-1)
Simple enough, but we added fixed points so that large numbers of calculations could be avoided for larger numbers, something like:
def factorial (n):
if n == 1 return 1
if n == 10 return 3628800
if n == 20 return 2432902008176640000
if n == 30 return 265252859812191058636308480000000
if n == 40 return 815915283247897734345611269596115894272000000000
# And so on.
return n * factorial (n-1)
This, of course, meant that 12! was calculated as 12 * 11 * 3628800 rather than the less efficient 12 * 11 * 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1.
But I'm wondering whether we should be calling this memoisation since that seems to be defined as remembering past results of calculations and using them. This is more about hard-coding calculations (not remembering) and using that information.
Is there a proper name for this process or can we claim that memoisation extends back not just to calculations done at run-time but also those done at compile-time and even back to those done in my head before I even start writing the code?

I'd call it pre-calculation rather than memoization. You're not really remembering any of the calculations you've done in the process of calculating a final answer for a given input, rather you're pre-calculating some fixed number of answers for specific inputs. Memoization as I understand it is really more akin to "caching" a set of results as you calculate them for later reuse. If you were to store each value calculated so that you didn't need to recalculate it again later, that would be memoization. Your solution differs in that you never store any "calculated" results from your program, only the fixed points that have been pre-calculated. With memoization if you reran the function with an input different than one of the pre-calculated ones it would not have to recalculate the result, it would simply reuse it.

Whether or not you are hard coding the results in, this is still memoization because you have already calculated results that you are expecting to calculate again. Now this may come in the form of run-time, or compile time.. but either way, it's memoization.

Memoization is done at run-time. You are optimizing at compile time. So, it is not.
See for example ... Wikipedia
Or ...
Memoization
The term memoization was coined by Donald Michie (1968) to refer to the process by which a function is made to automatically remember the results of previous computations. The idea has become more popular in recent years with the rise of functional languages; Field and Harrison (1988) devote a whole chapter to it. The basic idea is just to keep a table of previously computed input/result pairs.
Peter Norvig
University of California
(the bold is mine)
Link

def memoisation(f):
dct = {}
def myfunction(x):
if x not in dct:
dct[x] = f(x)
return dct[x]
return myfunction
#memoisation
def fibonacci(n):
if n == 0:
return 0
elif n == 1:
return 1
else:
return fibonacci(n-1) + fibonacci(n-2)
def nb_appels(n):
if n==0 or n==1:
return 0
else:
return 1 + nb_appels(n-1) + 1 + nb_appels(n-2)
print(fibonacci(13))
print ('nbappel',nb_appels(13))

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas