Select statement in Promela much slower than the equivalent if statement? - spin

So I used the following line in my Promela code.
select( cycles: 26..31 );
However, it was causing state explosion. I replaced it with the following if statement and suddenly the state explosion problem vanished. Isn't the select statement I showed above supposed to be the equivalent of the if statement below? What is going on here?
if
:: cycles = 26;
:: cycles = 27;
:: cycles = 28;
:: cycles = 29;
:: cycles = 30;
:: cycles = 31;
fi;

Your select statement is converted by Spin into
cycles = 26;
do
:: cycles < 31 -> cycles++
:: break
od
This means that every loop iteration offers two choices, i.e. two different successor states in the transition system. If break is not chosen, Spin has to perform a comparison and an assignment (two states) and then continue. To reach the value 31, you need 5 comparisons and 5 assignments first, whereas in the if version there is just a single non-deterministic choice of an assignment.
I visualized the two different versions with spinspider, which should make the problem better understandable.
The following picture depicts the state space generated from a program with the "if"-version, where there are clearly only 6 possibilities to choose among:
int cycles;
active proctype testWithIf() {
if
:: cycles = 26;
:: cycles = 27;
:: cycles = 28;
:: cycles = 29;
:: cycles = 30;
:: cycles = 31;
fi;
assert(cycles >= 26 && cycles <= 31);
}
Compare to this the image generated from a program obtained as a transformation of your select statement into a do-loop:
int cycles;
active proctype test1() {
cycles = 26;
do
:: cycles < 31 -> cycles++
:: break
od;
assert(cycles >= 26 && cycles <= 31);
}
Do you see the differences? As I said, I think the main problem is that, whereas in the if-version you just choose an assignment, in the do-version you have to do several things at each state where you do not choose break: do the comparison, increment the counter, and then continue. This clearly generates a larger state space.
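To make the difference countable, here is a rough Python sketch that enumerates the reachable states of both versions by breadth-first search. It is a deliberately simplified model, not Spin's real state vector: a state is just a (program location, cycles) pair, and the location names "choose", "init", "loop", "incr" and "done" are labels I made up for illustration.

```python
from collections import deque

def reachable(initial, successors):
    """Breadth-first search returning every state reachable from `initial`."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def if_successors(state):
    """if-version: one non-deterministic choice among six assignments."""
    pc, cycles = state
    if pc == "choose":
        return [("done", value) for value in range(26, 32)]
    return []

def do_successors(state):
    """do-version: cycles = 26, then either break or (guard, then increment)."""
    pc, cycles = state
    if pc == "init":
        return [("loop", 26)]
    if pc == "loop":
        nxt = [("done", cycles)]            # the break option
        if cycles < 31:
            nxt.append(("incr", cycles))    # guard cycles < 31 passed
        return nxt
    if pc == "incr":
        return [("loop", cycles + 1)]       # cycles++
    return []

if_states = reachable(("choose", 0), if_successors)
do_states = reachable(("init", 0), do_successors)
print(len(if_states), len(do_states))
```

In this toy model the if-version has 7 reachable states and the do-version has 18. In a real model the gap is multiplied by every other process that can interleave with these steps.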

Related

How to find the complexity of the following loop

What would be the worst case running time of the following code, where the input is 2 variables and the loop exits when the first variable becomes larger than the second one? My first guess was O(1), considering (x raised to 3) scales pretty quickly compared to (x raised to 2), but I don't know if it closes the gap quickly even when a is 1 and b is a very, very large integer.
i = 0;
cin >> a >> b;
while (a <= b)
{
i++;
a *= 3; b*= 2;
}
cout << i;
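I think you are solving for the equation (reading the loop body as cubing a and squaring b on each pass, as the question's prose describes):
$$a^{3^n} = b^{2^n}$$
So solving for n, you get:
$$n = \frac{\log\left(\frac{\log b}{\log a}\right)}{\log\frac{3}{2}}$$
Assuming that b > a > 1.
Even for large differences, where a = 1.0001 and b = 10^1000, you get a small n ≈ 41.8.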
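The figure n = 41.8 corresponds to reading the loop as cubing a and squaring b on every pass, i.e. solving a^(3^n) = b^(2^n). The loop as posted multiplies instead (a *= 3; b *= 2), so its exit condition is a * 3^n > b * 2^n, which gives n > log(b/a) / log(3/2) and a running time of O(log(b/a)) for a > 0. The Python sketch below checks both readings; the function names are my own and it assumes positive inputs.

```python
import math

def iterations_as_posted(a, b):
    """Simulate the loop exactly as written: a *= 3, b *= 2 until a > b."""
    if a <= 0:
        raise ValueError("a must be positive, otherwise the loop may never exit")
    i = 0
    while a <= b:
        i += 1
        a *= 3
        b *= 2
    return i

def crossover_for_powers(log_a, log_b):
    """Solve a^(3^n) = b^(2^n) for n, given natural logs of a and b (b > a > 1)."""
    return math.log(log_b / log_a) / math.log(1.5)

# Loop as posted: about log(b/a) / log(1.5) iterations.
count = iterations_as_posted(1, 10**6)
estimate = math.log(10**6 / 1) / math.log(1.5)

# Cube/square reading with a = 1.0001 and b = 10^1000 (passed as logarithms,
# because 10^1000 does not fit in a float).
n_powers = crossover_for_powers(math.log(1.0001), 1000 * math.log(10))
print(count, round(estimate, 2), round(n_powers, 1))
```

For a = 1 and b = 10^6 the posted loop runs 35 times, one more than the estimate of about 34.07 rounded down, so it is logarithmic rather than constant.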

Please explain this code for Merkle–Hellman knapsack cryptosystem?

This is the code snippet from a program that implements Merkle–Hellman knapsack cryptosystem.
// Generates keys based on input data size
private void generateKeys(int inputSize)
{
// Generating values for w
// This first value of the private key (w) is set to 1
w.addNode(new BigInteger("1"));
for (int i = 1; i < inputSize; i++)
{
w.addNode(nextSuperIncreasingNumber(w));
}
// Generate value for q
q = nextSuperIncreasingNumber(w);
// Generate value for r
Random random = new Random();
// Generate a value of r such that r and q are coprime
do
{
r = q.subtract(new BigInteger(random.nextInt(1000) + ""));
}
while ((r.compareTo(new BigInteger("0")) > 0) && (q.gcd(r).intValue() != 1));
// Generate b such that b = w * r mod q
for (int i = 0; i < inputSize; i++)
{
b.addNode(w.get(i).getData().multiply(r).mod(q));
}
}
Just tell me what is going on in the following lines:
do
{
r = q.subtract(new BigInteger(random.nextInt(1000) + ""));
}
while ((r.compareTo(new BigInteger("0")) > 0) && (q.gcd(r).intValue() != 1));
(1) Why is random number generated with upper bound 1000?
(2) Why is it subtracted from q?
The code is searching for a value that is co-prime with the already selected value q. In my opinion, it's doing so rather poorly, but you mention it's a simulator? I'm not sure what that means, but maybe it just means the code is quick and dirty rather than slow and secure.
Answering your questions directly:
Why is random number generated with upper bound 1000?
The Merkle-Hellman algorithm does indicate that r should be 'random'. The implementation for doing so is pretty haphazard; that might be what's thrown you off. The code is not technically an algorithm because the loop is not guaranteed to terminate. In theory, the pseudo-random candidate selection of r could produce an arbitrarily long sequence of numbers which aren't co-prime to q, resulting in an infinite loop.
The upper bound of 1000 could be to ensure that the chosen r is sufficiently large. In general, large keys are harder to break than small keys, so if q is large, then this code will only find large r.
A more deterministic way to get a random co-prime would be to test each number lower than q, generating a list of co-primes and select one at random. This would probably be more secure, as an attacker knowing that q and r are within 1000 of each other would have a significantly reduced search space.
Why is it subtracted from q?
The subtraction is important because r must be less than q. The Merkle-Hellman algorithm specifies it that way. I'm not convinced that it needs to be that way. The public key is generated by multiplying each element in w by r and taking the modulus q. If r were very large, larger than q, it seems like it would further obfuscate q and each element in w, although because every product is reduced modulo q, such an r yields the same public key as r mod q.
The decryption step of Merkle-Hellman, on the other hand, depends on multiplying each encrypted letter a by the modular inverse of r, a × r^−1 mod q. This operation might be hampered by having r > q; it seems like it could still work out.
Note that nextInt(1000) can return 0, in which case r = q and that iteration of the loop is wasted: gcd(q, q) is just q, so the candidate can never be co-prime with q.
Breaking down the code:
do
Try it at least once. r is probably null or undefined before the method is called.
r = q.subtract(new BigInteger(random.nextInt(1000) + ""));
Find a candidate value between q - 999 and q inclusive (nextInt(1000) returns a value from 0 to 999).
while ((r.compareTo(new BigInteger("0")) > 0) && (q.gcd(r).intValue() != 1));
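Keep going while r is both:
Greater than 0, r.compareTo(new BigInteger("0")) > 0, and
Not co-prime with q, q.gcd(r).intValue() != 1. In other words, the loop ends as soon as a candidate is co-prime with q. It also ends if r is zero or negative, which can only happen when q is smaller than 1000, and in that case r is left invalid. Obviously, a randomly selected number is not guaranteed to be co-prime with another number, so a randomly generated candidate might not work for this q.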
Does that clear it up? I have to admit that I'm not an expert on Merkle-Hellman.
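For comparison, here is a minimal Python sketch of the same search with the two rough edges removed: the offset is never 0, and a non-positive candidate is rejected rather than accepted. This is my own illustration, not the original program; pick_coprime and max_offset are hypothetical names, and drawing r from just below q is kept only to mirror the Java code, not because it is a good idea for security.

```python
import math
import random

def pick_coprime(q, rng, max_offset=1000):
    """Return r with 0 < r < q and gcd(q, r) == 1, searching just below q."""
    if q < 2:
        raise ValueError("q must be at least 2")
    while True:
        offset = rng.randrange(1, max_offset)   # 1..max_offset-1, never 0
        r = q - offset
        # Reject non-positive candidates instead of leaving the loop with them.
        if r > 0 and math.gcd(q, r) == 1:
            return r

q = 2 * 3 * 5 * 7 * 11 * 13        # 30030, a q with many small factors
r = pick_coprime(q, random.Random(0))
print(r, math.gcd(q, r))
```

Because q - 1 is always co-prime with q, at least one acceptable candidate exists for every q >= 2, although the random search still has no fixed bound on the number of tries.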

Optimization of "static" loops

I'm writing a compiled language for fun, and I've recently gotten on a kick for making my optimizing compiler very robust. I've figured out several ways to optimize some things, for instance, 2 + 2 is always 4, so we can do that math at compile time, if(false){ ... } can be removed entirely, etc, but now I've gotten to loops. After some research, I think that what I'm trying to do isn't exactly loop unrolling, but it is still an optimization technique. Let me explain.
Take the following code.
String s = "";
for(int i = 0; i < 5; i++){
s += "x";
}
output(s);
As a human, I can sit here and tell you that this is 100% of the time going to be equivalent to
output("xxxxx");
So, in other words, this loop can be "compiled out" entirely. It's not loop unrolling, but what I'm calling "fully static", that is, there are no inputs that would change the behavior of the segment. My idea is that anything that is fully static can be resolved to a single value, anything that relies on input or makes conditional output of course can't be optimized further. So, from the machine's point of view, what do I need to consider? What makes a loop "fully static?"
I've come up with three types of loops that I need to figure out how to categorize. Loops that will always end up with the same machine state after every run, regardless of inputs, loops that WILL NEVER complete, and loops that I can't figure out one way or the other. In the case that I can't figure it out (it conditionally changes how many times it will run based on dynamic inputs), I'm not worried about optimizing. Loops that are infinite will be a compile error/warning unless specifically suppressed by the programmer, and loops that are the same every time should just skip directly to putting the machine in the proper state, without looping.
The main case to optimize, of course, is static loop iterations, where all the function calls inside are also static. Determining if a loop has dynamic components is easy enough, and if it's not dynamic, I guess it has to be static. The thing I can't figure out is how to detect whether it's going to be infinite or not. Does anyone have any thoughts on this? I know this is a subset of the halting problem, but I feel it's solvable. The halting problem is a problem because, for some subsets of programs, you just can't tell: it may run forever, or it may not. I don't want to consider those cases; I just want to consider the cases where it WILL halt or it WILL NOT halt, but first I have to distinguish between the three states.
This looks like a kind of a symbolic solver that can be defined for several classes, but not generally.
Let's restrict the requirements a bit: no number overflow, just for loops (while can be sometimes transformed to full for loop, except when using continue etc.), no breaks, no modifications of the control variable inside the for loop.
for (var i = S; E(i); i = U(i)) ...
where E(i) and U(i) are expressions that can be symbolically manipulated. There are several classes that are relatively easy:
U(i) = i + CONSTANT : n-th cycle the value of i is S + n * CONSTANT
U(i) = i * CONSTANT : n-th cycle the value of i is S * CONSTANT^n
U(i) = i / CONSTANT : n-th cycle the value of i is S * CONSTANT^-n
U(i) = (i + CONSTANT) % M : n-th cycle the value of i is (S + n * CONSTANT) % M
and some other quite easy combinations (and some very difficult ones)
Determining whether the loop terminates is searching for n where E(i(n)) is false.
This can be done by some symbolic manipulation for a lot of cases, but there is a lot of work involved in making the solver.
E.g.
for(int i = 0; i < 5; i++),
i(n) = 0 + n * 1 = n, E(i(n)) => not(n < 5) =>
n >= 5 => stops for n = 5
for(int i = 0; i < 5; i--),
i(n) = 0 + n * -1 = -n, E(i(n)) => not(-n < 5) => -n >= 5 =>
n <= -5 - since n is a non-negative whole number this is never true - never stops
for(int i = 0; i < 5; i = (i + 1) % 3),
E(i(n)) => not(n % 3 < 5) => n % 3 >= 5 => this is never true => never stops
for(int i = 10; i + 10 < 500; i = i + 2 * i) =>
for(int i = 10; i < 490; i = 3 * i),
i(n) = 10 * 3^n,
E(i(n)) => not(10 * 3^n < 490) => 10 * 3^n >= 490 => 3^n >= 49 => n >= log3(49) => n >= 3.5... =>
since n is whole => it will stop for n = 4
for other cases it would be good if they can get transformed to the ones you can already solve...
Many tricks for symbolic manipulation come from the Lisp era and are not too difficult. Although the ones described (or variants) are the most common types in practice, there are many scenarios that are more difficult or impossible to solve.
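As a rough illustration of the two easiest classes above, the following Python sketch decides termination and the trip count for loops of the form for (i = S; i < B; i += C) and for (i = S; i < B; i *= C). It assumes unbounded integers (no overflow) and an unmodified control variable, as in the restrictions above; the function names are made up for this example.

```python
def trip_count_additive(start, step, bound):
    """Iterations of `for (i = start; i < bound; i += step)`; None if it never stops."""
    if start >= bound:
        return 0                      # condition is false on entry
    if step <= 0:
        return None                   # i never grows, so i < bound stays true
    return -(-(bound - start) // step)  # ceiling of (bound - start) / step

def trip_count_multiplicative(start, factor, bound):
    """Iterations of `for (i = start; i < bound; i *= factor)`; None if it never stops."""
    if factor < 0:
        raise NotImplementedError("sign-flipping updates are outside the easy classes")
    if start >= bound:
        return 0
    if start <= 0 or factor <= 1:
        return None                   # i stays at or below its start value
    count, i = 0, start
    while i < bound:                  # guaranteed to end: i at least doubles each pass
        i *= factor
        count += 1
    return count

print(trip_count_additive(0, 1, 5))            # for (i = 0; i < 5; i++)
print(trip_count_additive(0, -1, 5))           # for (i = 0; i < 5; i--)
print(trip_count_multiplicative(10, 3, 490))   # for (i = 10; i < 490; i = 3 * i)
```

A compiler could use a None result to raise the infinite-loop diagnostic and a numeric result to decide whether evaluating the loop at compile time is affordable.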

Fast FFT Bit Reversal, Can I Count Down Backwards Bit Reversed?

I'm using FFTs for audio processing, and I've come up with some potentially very fast ways of doing the bit reversal needed which might be of use to others, but because of the size of my FFTs (8192), I'm trying to reduce memory usage / cache flushing due to the size of lookup tables or code, and increase performance. I've seen lots of clever bit reversal routines; they all let you feed them any arbitrary value and get a bit reversed output, but FFTs don't need that flexibility since they go in a predictable sequence. First let me state what I have tried and/or figured out, since it may be the fastest to date and you can see the problem, then I'll ask the question.
1) I've written a program to generate straight through, unlooped x86 source code that can be pasted into my FFT code, which reads an audio sample, multiplies it by a window value (that's a lookup table itself) and then just places the resulting value in its proper bit reversed sorted position by absolute values within the x86 addressing modes like: movlps [edi+1876],xmm0. This is the absolute fastest way to do this for smaller FFT sizes. The problem is when I write straight through code to handle 8192 values, the code grows beyond the L1 instruction cache size and performance drops way down. Of course in contrast, a 32K bit reversal lookup table mixed with a 32K window table, plus other stuff, is also too big to fit the L1 data cache, and performance drops way down, but that's the way I'm currently doing it.
2) I've found patterns in the bit reversal sequence that can be exploited to reduce lookup table size, for example using 4 bit numbers (0..15) as an example, the bit reversal sequence looks like: 0,8,4,12,2,10,6,14|1,9,5,13,3,11,7,15. First thing that can be seen is that the last 8 numbers are the same as the first 8 +1, so I can chop my LUT in half. If I look at the difference between the numbers there is more redundancy, so if I start with a zero in a register and want to add values to it to get the next bit reversed number they would be: +0,+8,-4,+8,-10,+8,-4,+8 and the same for the second half. As can be seen, I could have a lookup table of just 0 and -10 because the +8's and -4's always show up in a predictable way. The code would be unrolled to handle 4 values per loop: one would be a lookup table read, and the other 3 would be straight code for +8, -4, +8, before looping around again. Then a second loop could handle the 1,9,5,13,3,11,7,15 sequence. This is great, because I can now chop down my lookup table by another factor of 4. This scales up the same way for an 8192 size FFT. I can now get by with a 4K size LUT instead of 32K. I can exploit the same pattern and double the size of my code and chop down the LUT by another half yet again, however far I want to go. But in order to eliminate the LUT altogether, I'm back to the prohibitive code size.
For large FFT sizes, I believe that this #2 solution is the absolute fastest to date, since a relatively small percentage of lookup table reads need to be done, and every algorithm I currently find on the web requires too many serial/dependency calculations which can't be vectorized.
The question is, is there an algorithm that can increment numbers so the MSB acts like the LSB, and so on? In other words (in binary): 0000, 1000, 0100, 1100, 0010, etc… I've tried to think up some way, and so far, short of a bunch of nested loops, I can't seem to find a way for a fast and simple algorithm that is a mirror image of simply adding 1 to the LSB of a number. Yet it seems like there should be a way.
One other approach to consider: take a well known bit reversal algorithm - typically a few masks, shifts, and ORs - then implement this with SSE, so you get e.g. 8 x 16 bit bit reversals for the price of one. For 16 bits you need 5*log2(N) = 20 instructions, so the aggregate throughput would be 2.5 instructions per bit reversal.
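For reference, the scalar form of that mask/shift/OR reversal looks like the Python sketch below; an SSE version would apply the same four steps to eight 16-bit lanes at once. The names reverse16 and reverse_bits are my own, and the 13-bit helper assumes an 8192-point FFT as in the question.

```python
def reverse16(x):
    """Reverse the bits of a 16-bit value with four swap steps."""
    x = ((x & 0x5555) << 1) | ((x >> 1) & 0x5555)   # swap adjacent bits
    x = ((x & 0x3333) << 2) | ((x >> 2) & 0x3333)   # swap bit pairs
    x = ((x & 0x0F0F) << 4) | ((x >> 4) & 0x0F0F)   # swap nibbles
    x = ((x & 0x00FF) << 8) | ((x >> 8) & 0x00FF)   # swap bytes
    return x

def reverse_bits(x, width):
    """Reverse the low `width` bits (width <= 16) by reversing 16 and shifting down."""
    return reverse16(x) >> (16 - width)

# First few indices of an 8192-point (13-bit) FFT in bit-reversed order.
print([reverse_bits(i, 13) for i in range(4)])
```

Each step is five operations (two masks, two shifts, one OR), which is where the 5 * log2(16) = 20 count comes from.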
This is the most trivial and straightforward solution (in C):
void BitReversedIncrement(unsigned *var, int bit)
{
unsigned c, one = 1u << bit;
do {
c = *var & one;
(*var) ^= one;
one >>= 1;
} while (one && c);
}
The main problem with this is the conditional branches, which are often costly on modern CPUs. You have one conditional branch per bit.
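To see that this really counts in bit-reversed order, here is a direct Python port of the routine (returning the new value instead of updating through a pointer), run over a 4-bit counter. The port is only meant for checking the sequence, not for speed.

```python
def bit_reversed_increment(value, top_bit):
    """Add 1 at the most significant bit and let the carry ripple downwards."""
    one = 1 << top_bit
    while one:
        was_set = value & one    # a set bit means the carry continues
        value ^= one             # flip the current bit
        one >>= 1                # move to the next lower bit
        if not was_set:
            break                # no carry, so the increment is complete
    return value

sequence = [0]
for _ in range(15):
    sequence.append(bit_reversed_increment(sequence[-1], 3))
print(sequence)
```

This produces 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15, and one more increment wraps back to 0.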
You can do reversed increments by working on several bits at a time, e.g. 3 if ints are 32-bit:
void BitReversedIncrement2(unsigned *var, int bit)
{
unsigned r = *var, t = 0;
while (bit >= 2 && !t)
{
unsigned tt = (r >> (bit - 2)) & 7;
t = (07351624 >> (tt * 3)) & 7;
r ^= ((tt ^ t) << (bit - 2));
bit -= 3;
}
if (bit >= 0 && !t)
{
t = r & ((1 << (bit + 1)) - 1);
r ^= t;
t <<= 2 - bit;
t = (07351624 >> (t * 3)) & 7;
t >>= 2 - bit;
r |= t;
}
*var = r;
}
This is better, you only have 1 conditional branch per 3 bits.
If your CPU supports 64-bit ints, you can work on 4 bits at a time:
void BitReversedIncrement3(unsigned *var, int bit)
{
unsigned r = *var, t = 0;
while (bit >= 3 && !t)
{
unsigned tt = (r >> (bit - 3)) & 0xF;
t = (0xF7B3D591E6A2C48ULL >> (tt * 4)) & 0xF;
r ^= ((tt ^ t) << (bit - 3));
bit -= 4;
}
if (bit >= 0 && !t)
{
t = r & ((1 << (bit + 1)) - 1);
r ^= t;
t <<= 3 - bit;
t = (0xF7B3D591E6A2C48ULL >> (t * 4)) & 0xF;
t >>= 3 - bit;
r |= t;
}
*var = r;
}
Which is even better. And the only look-up table (07351624 or 0xF7B3D591E6A2C48) is tiny and likely encoded as an immediate instruction operand.
You can further improve the code if the bit position for the reversed "1" is a known constant. Just unroll the while loop into nested ifs, substitute the reversed one bit position constant.
For larger FFTs, paying attention to cache blocking (minimizing total uncovered cache miss cycles) can have a far larger effect on performance than optimization of the cycle count taken by indexing bit reversal. Make sure not to de-optimize a bigger effect by a larger cycle count while optimizing the smaller effect. For small FFTs, where everything fits in cache, LUTs can be a good solution as long as you pay attention to any load-use hazards by making sure things are or can be pipelined appropriately.

MathProg (AMPL) - Variable Array Sized by Another Variable

I am writing my first GNU MathProg (AMPL) program to find the minimum switch (vertex) count instances of a HyperX topology (graph) for a given radix, number of hosts, and bisection bandwidth. This is a simple first program because all of the equations have been described in the following paper: http://cal.snu.ac.kr/files/2009.sc.hyperx.pdf
I have read the specification and example programs, but I am stuck on a very simple syntax error. I need to have the following two variables: L, the number of dimensions in the network, and an array S of length L, where each element of S is the number of switches in each dimension. In my MathProg program, I express this as:
var L >= 1, integer;
var S{1 .. L} >= 2, integer;
However, when I run $ glpsol --check --math hyperx.mod, I get the following error:
hyperx.mod:28: operand following .. has invalid type
Context: ...isec ; param radix ; var L >= 1 , integer ; var S { 1 .. L }
If anybody can help explain how I should properly express this relationship, I will be grateful. Also, I am including the entire program I have written for reference and extra help. I expect there to be many syntax errors in my program, but until I fix the first one, I have no way of finding the rest.
/*
* A MathProg linear program to find an optimal HyperX topology of a
* given network size, switch radix, and bisection bandwidth. Optimal
* is simplistically defined as minimum switch count network.
*
* A HyperX topology is a multi-dimensional network (graph) where, in
* each dimension, the switches are fully connected. Every switch
* (vertex) is a point in an L-dimensional integer lattice. Each switch
* is identified by a multi-index I = (I_1, ..., I_L) where 0 <= I_k <
* S_k for each k = 1..L, where S_k is the number of switches in each
* dimension. A switch connects to all others whose multi-index is the
* same in all but one coordinate.
*/
/* Network size in number of hosts. */
param hosts;
/* Desired bisection bandwidth. */
param bisec;
/* Maximum switch radix. */
param radix;
/* The number of dimensions in the HyperX. */
var L >= 1, integer;
/* The number of switches in each dimension. */
var S{1 .. L} >= 2, integer;
/*
* Relative bandwidth of the dimension, i.e., the number of links in a
* given dimension.
*/
var K{1 .. L} >= 1, integer;
/* The number Terminals (hosts) per switch */
var T >= 1, integer;
/* Minimize the total number of switches. */
minimize cost: prod{i in 1..L} S[i];
/* The total number of links must be less than the switch radix. */
s.t. Radix: T + sum{i in 1..L} K[i] * (S[i] - 1) <= radix;
/* There must be enough hosts in the network. */
s.t. Hosts: T * prod{i in 1..L} S[i] >= hosts;
/* There must be enough bandwidth. */
s.t. Bandwidth: min{K[i]*S[i]} / (2 * T) >= bisec;
/* The order of the dimensions doesn't matter, so constrain them */
s.t. SwitchDimen: forall{i in 1..(L-1)} S[i] <= S[i+1];
/*
* Bisection bandwidth depends on the smallest S_i * K_i, so we know
* that the smallest switch count dimension needs the most links.
*/
s.t. LinkDimen: forall{i in 1..(L-1)} K[i] >= K[i+1];
# TODO: I would like to constrain the search such that the number of
# terminals, T, is bounded to T >= (hosts / O), where O is the switch
# count of the smallest switch count topology discovered so far, but I
# don't know how to do this.
/* Data section */
data;
param hosts := 32
param bisec := 0.5
param radix := 64
end;
A fixed number of variables in a problem is a common assumption in solvers and algebraic modelling languages, including AMPL/MathProg. Therefore you can only use constant expressions (in particular parameters, not variables) in indexing expressions. One possible solution is to make L a parameter, re-solve your problem for different values of L, and select the one that gives the best objective value. This can be done with a simple AMPL script.
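The outer loop over L can be sketched as follows. This is only an illustration in Python, with a brute-force search standing in for the solver call so that it runs on its own; the constraints are copied from the model above, the function names are hypothetical, and the tiny instance (16 hosts, radix 8) is chosen so that exhaustive search stays fast.

```python
from itertools import combinations_with_replacement, product
from math import prod

def feasible(S, hosts, bisec, radix):
    """True if some K (non-increasing) and T satisfy the constraints for sizes S."""
    L = len(S)
    for K in product(range(1, radix + 1), repeat=L):
        if any(K[i] < K[i + 1] for i in range(L - 1)):
            continue                                   # LinkDimen ordering
        links = sum(k * (s - 1) for k, s in zip(K, S))
        for T in range(1, radix - links + 1):          # Radix: T + links <= radix
            enough_hosts = T * prod(S) >= hosts        # Hosts
            enough_bw = min(k * s for k, s in zip(K, S)) >= 2 * T * bisec  # Bandwidth
            if enough_hosts and enough_bw:
                return True
    return False

def best_for_fixed_L(L, hosts, bisec, radix):
    """Minimum switch count with L treated as a parameter; None if infeasible."""
    sizes = range(2, radix + 1)                        # S[i] >= 2, kept sorted
    costs = [prod(S) for S in combinations_with_replacement(sizes, L)
             if feasible(S, hosts, bisec, radix)]
    return min(costs, default=None)

def best_topology(hosts, bisec, radix, max_L):
    """Re-solve for each L in 1..max_L and keep the best objective value."""
    results = {L: best_for_fixed_L(L, hosts, bisec, radix) for L in range(1, max_L + 1)}
    solved = {L: cost for L, cost in results.items() if cost is not None}
    if not solved:
        return None
    best_L = min(solved, key=solved.get)
    return best_L, solved[best_L]

print(best_topology(hosts=16, bisec=0.5, radix=8, max_L=2))
```

In a real workflow, best_for_fixed_L would be replaced by a solver run with L supplied in the data section. Note that the product objective is not linear, so it may need reformulating before a linear solver accepts it.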