I'm new to CBMC and experimenting with it. The link here has a toy example for checking the function binsearch with CBMC. I ran the command they provided, only changing the number of times the loop is unwound:
cbmc binsearch.c --function binsearch --unwind 4 --bounds-check --unwinding-assertions
It returned the following:
** Results:
[binsearch.unwind.0] unwinding assertion loop 0: FAILURE
prog.c function binsearch
[binsearch.array_bounds.1] line 7 array `a' lower bound in a[(signed long int)middle]: SUCCESS
[binsearch.array_bounds.2] line 7 array `a' upper bound in a[(signed long int)middle]: SUCCESS
[binsearch.array_bounds.3] line 9 array `a' lower bound in a[(signed long int)middle]: SUCCESS
[binsearch.array_bounds.4] line 9 array `a' upper bound in a[(signed long int)middle]: SUCCESS
Is the fact that the unwinding assertion failed because there weren't enough iterations a bad thing? From my point of view the example seems bug-free, because the code didn't access portions of memory that it's not supposed to, but I'm not sure, given that one unwinding-assertion failure. Does that failure matter for safety?
Based on the --unwinding-assertions option, which checks the following:
Checks whether --unwind is large enough to cover all program paths. If the argument is too small, CBMC will detect that not enough unwinding was done and report that an unwinding assertion has failed.
I'd say it alerts you to the possibility that there weren't enough loop iterations to be sure the function never accesses the array out of bounds. In other words, while the function didn't violate any properties with 4 unwindings, we need to cover all paths before we can say for certain that it is safe.
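For intuition, a binary search has a loop whose iteration count is bounded by the array size; here is a sketch of the usual shape (my own code, not necessarily the exact example behind the link):

int binsearch(int a[], int size, int x)
{
    int low = 0, high = size;           /* search window is [low, high) */
    while (low < high) {                /* at most floor(log2(size)) + 1 iterations */
        int middle = low + ((high - low) >> 1);
        if (a[middle] < x)
            low = middle + 1;           /* discard the lower half */
        else if (a[middle] > x)
            high = middle;              /* discard the upper half */
        else
            return middle;              /* found */
    }
    return -1;                          /* not found */
}

For a 16-element array the body can run at most 5 times, so --unwind 4 is one iteration short and the unwinding assertion fails. Raising the bound to about 5 or 6 (depending on how CBMC counts the final exit check) should let the unwinding assertion pass, and only then do the successful bounds checks cover every path.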
I am working on a problem that requires optimizing a double variable. To try it out, I wrote a simple program that looks for a number below a given upper bound (maximize X while X < upper bound), but I get the following error, which I don't understand:
2022-11-13 09:59:07,003 [main] INFO Solving started: time spent (119), best score (-1init/0hard/0soft), environment mode (REPRODUCIBLE), move thread count (NONE), random (JDK with seed 0).
Exception in thread "main" java.lang.ClassCastException: class org.optaplanner.core.impl.domain.valuerange.buildin.primdouble.DoubleValueRange cannot be cast to class org.optaplanner.core.api.domain.valuerange.CountableValueRange (org.optaplanner.core.impl.domain.valuerange.buildin.primdouble.DoubleValueRange and org.optaplanner.core.api.domain.valuerange.CountableValueRange are in unnamed module of loader 'app')
at org.optaplanner.core.impl.heuristic.selector.value.FromSolutionPropertyValueSelector.iterator(FromSolutionPropertyValueSelector.java:127)
at org.optaplanner.core.impl.heuristic.selector.value.FromSolutionPropertyValueSelector.iterator(FromSolutionPropertyValueSelector.java:120)
at org.optaplanner.core.impl.heuristic.selector.value.decorator.ReinitializeVariableValueSelector.iterator(ReinitializeVariableValueSelector.java:58)
at org.optaplanner.core.impl.heuristic.selector.common.iterator.AbstractOriginalChangeIterator.createUpcomingSelection(AbstractOriginalChangeIterator.java:35)
at org.optaplanner.core.impl.heuristic.selector.common.iterator.AbstractOriginalChangeIterator.createUpcomingSelection(AbstractOriginalChangeIterator.java:10)
at org.optaplanner.core.impl.heuristic.selector.common.iterator.UpcomingSelectionIterator.hasNext(UpcomingSelectionIterator.java:27)
at org.optaplanner.core.impl.constructionheuristic.placer.QueuedEntityPlacer$QueuedEntityPlacingIterator.createUpcomingSelection(QueuedEntityPlacer.java:45)
at org.optaplanner.core.impl.constructionheuristic.placer.QueuedEntityPlacer$QueuedEntityPlacingIterator.createUpcomingSelection(QueuedEntityPlacer.java:31)
at org.optaplanner.core.impl.heuristic.selector.common.iterator.UpcomingSelectionIterator.hasNext(UpcomingSelectionIterator.java:27)
at org.optaplanner.core.impl.constructionheuristic.DefaultConstructionHeuristicPhase.solve(DefaultConstructionHeuristicPhase.java:45)
at org.optaplanner.core.impl.solver.AbstractSolver.runPhases(AbstractSolver.java:83)
at org.optaplanner.core.impl.solver.DefaultSolver.solve(DefaultSolver.java:193)
at SimpleApp.main(SimpleApp.java:42)
The variable X has a range between 1 and 300, and the upper bound is an arbitrary 10.548.
OptaPlanner intentionally avoids working with doubles. The documentation explains why, and also describes better ways of dealing with the issue.
That said, the exception you mention still shouldn't be happening, or there should be a more descriptive one. I'll eventually look into it. But my advice is to not count on doubles in your scoring function.
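One common way around it (a hypothetical sketch of mine, not code from the OptaPlanner docs) is to plan with a scaled long, whose value range is countable, and convert to double only when reading the result:

import org.optaplanner.core.api.domain.entity.PlanningEntity;
import org.optaplanner.core.api.domain.valuerange.CountableValueRange;
import org.optaplanner.core.api.domain.valuerange.ValueRangeFactory;
import org.optaplanner.core.api.domain.valuerange.ValueRangeProvider;
import org.optaplanner.core.api.domain.variable.PlanningVariable;

@PlanningEntity
public class XAssignment {

    // Plan in thousandths: 1.000 .. 300.000 becomes 1000 .. 300000.
    @PlanningVariable(valueRangeProviderRefs = "xRange")
    private Long scaledX;

    @ValueRangeProvider(id = "xRange")
    public CountableValueRange<Long> getXRange() {
        // Countable, unlike DoubleValueRange, so the construction heuristic can iterate it.
        return ValueRangeFactory.createLongValueRange(1_000L, 300_001L);
    }

    public double getX() {
        // Convert back to a double only when reading the solved value.
        return scaledX == null ? Double.NaN : scaledX / 1000.0;
    }
}

The score calculator would then compare scaledX against the scaled upper bound (10548 in these units), keeping all the arithmetic exact.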
A: if
:: q?a -> ...
:: else -> ...
fi
Note that a race condition is built-in to this type of code. How long
should the process wait, for instance, before deciding that the
message receive operation will not be executable? The problem can be
avoided by using message poll operations, for instance, as follows:
The above citation comes from http://spinroot.com/spin/Man/else.html
I cannot understand that argumentation. Spin can simply decide on q?a:
if q is empty then it is blocking; otherwise it is executable.
Yet the quoted text claims this raises a race condition.
But I can make the same argument:
byte x = 1;
A: if
:: x == 2 -> ...
:: else -> ...
fi
That is fine from Spin's point of view. But I can ask the same question: how long should the process wait, for instance, before deciding that the value of x will not be incremented by another process?
The argumentation is sound with respect to the semantics of Promela and the selection construct. Note that for selection, if multiple guard statements are executable, one of them is selected non-deterministically. This in turn implies that the selection (even though it may choose among executable guards non-deterministically) needs to determine which guards are executable at the point where the selection statement is invoked.
The question about the race condition makes more sense when considering the semantics of selection combined with message receives. A race condition here means that the outcome of the selection might depend on how long it waits on the receive (i.e. on whether it finishes at a moment when there is a message in the channel or not).
More specifically, for the selection statement there should be no ambiguity about which guards are feasible. Now, a message receive gets a message from the channel only if the channel is not empty (otherwise it cannot finish executing and waits). So, with respect to the semantics of receive, it is not clear whether it is executable before it is actually executed. In turn, else should execute only if the receive is not executable, so to know whether else is executable the program would need to know the future (or decide how long to wait before concluding this, which incurs exactly the race condition).
Note that the argument does not apply to your second example:
byte x = 1;
A: if
:: x == 2 -> ...
:: else -> ...
fi
since here, no waiting (and no knowledge of the future) is required to decide whether else is eligible: the program can at any point determine whether x == 2.
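As the man page says, the problem can be avoided with message poll operations. Here is a sketch of that variant (my own, using Promela's side-effect-free poll q?[a], which is true exactly when the receive q?a would be executable):

if
:: atomic { q?[a] -> q?a } /* poll, then receive with nothing interleaved between */
:: else -> skip            /* channel empty: decidable immediately, no waiting */
fi

Because q?[a] is just a boolean expression over the current channel contents, deciding whether else is eligible requires no waiting and no knowledge of the future, exactly as in the x == 2 case.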
I have a Fortran program which uses a routine in a module to resize a matrix like:
module resizemod
contains
  subroutine ResizeMatrix(A,newSize,lindx)
    integer,dimension(:,:),intent(inout),pointer :: A
    integer,intent(in) :: newSize(2)
    integer,dimension(:,:),allocatable :: B
    integer,optional,intent(in) :: lindx(2)
    integer :: i,j

    allocate(B(lbound(A,1):ubound(A,1),lbound(A,2):ubound(A,2)))
    forall (i=lbound(A,1):ubound(A,1),j=lbound(A,2):ubound(A,2))
      B(i,j)=A(i,j)
    end forall
    if (associated(A)) deallocate(A)
    if (present(lindx)) then
      allocate(A(lindx(1):lindx(1)+newSize(1)-1,lindx(2):lindx(2)+newSize(2)-1))
    else
      allocate(A(newSize(1),newSize(2)))
    end if
    do i=lbound(B,1),ubound(B,1)
      do j=lbound(B,2),ubound(B,2)
        A(i,j)=B(i,j)
      end do
    end do
    deallocate(B)
  end subroutine ResizeMatrix
end module resizemod
The main program looks like:
program resize
  use :: resizemod
  implicit none
  integer,pointer :: mtest(:,:)

  allocate(mtest(0:1,3))
  mtest(0,:)=[1,2,3]
  mtest(1,:)=[1,4,5]
  call ResizeMatrix(mtest,[3,3],lindx=[0,1])
  mtest(2,:)=0
  print *,mtest(0,:)
  print *,mtest(1,:)
  print *,mtest(2,:)
end program resize
I use ifort 14.0 to compile the code. The issue I am facing is that sometimes I don't get the desired result:
1 0 0
1 0 5
0 0 -677609912
Actually I couldn't reproduce the issue (which is present in my original program) using this minimal test code. But I noticed that when I remove the compiler option -fast, the problem disappears.
Then my questions would be:
1) Is the code that I use completely legal?
2) Is there a recommended method for resizing matrices that is better than the one presented here?
3) What is the relevance of the described issue to the compiler option -fast?
If I've read the code right it's legal but incorrect. In your example you've resized a 2x3 array into 3x3, but the routine ResizeMatrix doesn't do anything to set the values of the extra elements. The strange values you see, such as -677609912, are the interpretation, as integers, of whatever bits were lying around in the memory locations corresponding to the unset array elements when they were read (so that their values could be written out).
The relevance of -fast is that it is common for compilers to zero out memory locations in debug or low-optimisation modes, but not to bother when higher optimisation is switched on. The program is legal in the sense that it contains no compiler-determinable syntax errors. But it is incorrect in the sense that reading a variable whose value has never been initialised or assigned is not something you ought to do; doing so makes your program essentially non-deterministic.
As for your question 2, it raises the possibility that you are not familiar with the intrinsics reshape or (F2003) move_alloc. The latter is almost certainly what you want; the former may help too.
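A minimal sketch of the move_alloc route (my own, assuming A is declared allocatable rather than pointer and keeps its lower bounds):

subroutine ResizeMatrix(A, newSize)
  integer, allocatable, intent(inout) :: A(:,:)
  integer, intent(in) :: newSize(2)
  integer, allocatable :: B(:,:)
  integer :: lb(2), n(2)

  lb = lbound(A)
  allocate(B(lb(1):lb(1)+newSize(1)-1, lb(2):lb(2)+newSize(2)-1))
  B = 0                        ! define the new elements explicitly
  n = min(shape(A), newSize)   ! extents of the overlapping block
  B(lb(1):lb(1)+n(1)-1, lb(2):lb(2)+n(2)-1) = &
      A(lb(1):lb(1)+n(1)-1, lb(2):lb(2)+n(2)-1)
  call move_alloc(B, A)        ! no copy: B's allocation simply becomes A's
end subroutine ResizeMatrix

move_alloc leaves B deallocated, and because B is initialised before the copy there are no unset elements left to read.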
As an aside: these days I rarely use pointer on arrays, allocatable is much more useful and generally easier and safer too. But you may have requirements of which I wot not.
EDIT: I realized that I unfortunately overlooked a semicolon at the end of the while statement in the first example code and misinterpreted it myself. So there is in fact an empty loop for the threads with threadIdx.x != s, a convergence point after that loop, and a thread waiting at this point for all the others without incrementing the s variable. I am leaving the original (uncorrected) question below for anyone interested in it. Be aware that a semicolon is missing at the end of the second line in the first example, and thus s++ has nothing in common with the loop body.
--
We were studying serialization in our CUDA lesson and our teacher told us that code like this:
__shared__ int s = 0;
while (s != threadIdx.x)
    s++; // serialized code
would end up with a HW deadlock because the nvcc compiler puts a reconvergence point between the while (s != threadIdx.x) condition and the s++ statement. If I understand it correctly, this means that once the reconvergence point is reached by a thread, this thread stops execution and waits for the other threads until they reach the point too. In this example, however, this never happens, because thread #0 enters the body of the while loop, reaches the reconvergence point without incrementing the s variable, and the other threads get stuck in an endless loop.
A working solution should be the following:
__shared__ int s = 0;
while (s < blockDim.x)
    if (threadIdx.x == s)
        s++; // serialized code
Here, all threads within a block enter the body of the loop, all evaluate the condition, and only thread #0 increments the s variable in the first iteration (and the loop goes on).
My question is, why does the second example work if the first hangs? To be more specific, the if statement is just another point of divergence and, in terms of assembler, should be compiled into the same conditional jump instruction as the condition in the loop. So why isn't there any reconvergence point before s++ in the second example; has it in fact been placed immediately after the statement?
In other sources I have only found that divergent code is computed independently for every branch: e.g. in an if/else statement, first the if branch is computed with all else-branch threads masked within the same warp, and then the other threads compute the else branch while the first ones wait. There's a reconvergence point after the if/else statement. Why then does the first example freeze, not having the loop split into two branches (a true branch for one thread and a waiting false branch for all the others in a warp)?
Thank you.
It does not make sense to put the reconvergence point between the while (s != threadIdx.x) check and the s++; statement. That disrupts the program flow, since the reconvergence point for a piece of code should be reachable by all threads at compile time. The picture below shows the flowchart of your first piece of code with the possible and impossible points of reconvergence.
Regarding this answer about recording the convergence point via the SSY instruction, I created the simple kernel below, resembling your first piece of code,
__global__ void kernel_1() {
    __shared__ int s;
    if (threadIdx.x == 0)
        s = 0;
    __syncthreads();
    while (s == threadIdx.x)
        s++; // serialized code
}
and compiled it for CC=3.5 with -O3. I then used the cuobjdump binary utility on the output to observe the CUDA assembly. The result is:
I'm not an expert in reading CUDA assembly, but I can see the while loop condition checks at lines 0038 and 00a0. At line 00a8, it branches to 0x80 if the while loop condition is satisfied and executes the code block again. The reconvergence point is set up at line 0058, designating line 0xb8 as the reconvergence point, which is after the loop condition check near the exit.
Overall, it is not clear what you're trying to achieve with this piece of code. Also, in the second piece of code, the reconvergence point should again be after the while loop code block (I don't mean between while and if).
The reason why it "hangs" is neither a HW deadlock nor branching, at least not directly. You produce an endless loop for one or multiple threads (as already suspected).
In your example, there isn't really a convergence point. Since you do not use any synchronization, there aren't any threads that actually wait. What happens here with the while-loop is pretty much a busy-wait.
A kernel only finishes when all threads have returned. Since you have one (or multiple) endless loops (by accident maybe even none, though this is unlikely), the kernel will never finish.
You declared a shared variable s. This variable is known to all threads within a block.
With your while-statement you basically say (to each thread): increment s until it reaches the value of your (local) thread id. Since all threads are incrementing s in parallel, you introduce race conditions.
Example:
Thread 5 is looping and checking for s to become 5
s is 4
Two threads increment s, it becomes 6
At the same time thread 5 only reached the end of its loop.
Now it reaches the next loop iteration and checks for s and it's not 5.
Thread 5 will never be able to finish since you check via == and the value of s already exceeded the value of the thread id.
Also your solution is quite confusing, because each thread executes the serialized code consecutively (which probably was the intention after all - even though that actually is strange):
Thread 0 will execute the serialized code
After that, thread 1 will execute the serialized code
and so on
Most examples show a program where each thread works on some code, then all threads are synchronized, and only a single thread executes some more code (maybe it needs the results of all threads).
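Schematically, that usual pattern looks like this (a hypothetical sketch of mine, not code from your lesson):

__global__ void reduce_block(const int *in, int *out) {
    extern __shared__ int partial[];             // one slot per thread
    partial[threadIdx.x] = in[threadIdx.x] * 2;  // every thread does its share of work
    __syncthreads();                             // explicit convergence point for the block
    if (threadIdx.x == 0) {                      // a single thread finishes up
        int sum = 0;
        for (int i = 0; i < blockDim.x; ++i)
            sum += partial[i];
        out[blockIdx.x] = sum;
    }
}

Launched for one block of N threads as reduce_block<<<1, N, N * sizeof(int)>>>(d_in, d_out), the only serialized part is the final loop in thread 0, which actually needs the results of all threads.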
So, your second example "works" because no thread is stuck in an endless loop; however, I can't think of a reason why anyone would use such code, since it is confusing and, well, not parallel at all.
I am using Xilinx ISE 10.1 to run some Verilog code. In the code I want to write the values of 3 registers to a file, cipher.txt. The following is the code snippet:
if (clk_count == 528) begin
    f1 = $fopen("cipher.txt", "w");
    $fwrite(f1, "clk: %d", clk_count[11:0]);
    $fwrite(f1, "plain: %h", plain[31:0]);
    $fwrite(f1, "cipher: %h", cipher[31:0]);
    $fclose(f1);
end
At the end of execution, the contents of cipher.txt is found as:
clk: %dplain: %hcipher: %h
No other error is encountered, but a warning comes up corresponding to each of the 3 $fwrite calls:
Parameter 3 is not constant in call of system task $fwrite.
Parameter 3 is not constant in call of system task $fwrite.
Parameter 3 is not constant in call of system task $fwrite.
The values of the registers clk_count and cipher change on every clock cycle (the value of the register plain remains constant throughout), and the values are written to cipher.txt when clk_count equals 528 (as indicated by the if statement).
Can anybody provide some insight and/or help me get past this hurdle?
Thanks.
It appears that ISE expects the arguments to $fwrite to be constant. The warnings refer to clk_count[11:0], plain[31:0], and cipher[31:0], which are not constant: by definition they change each cycle, so they are not known at compile time. This also explains why the values are not printed and you see literal %d and %h in the output.
There is nothing to my knowledge in the Verilog spec that requires the arguments to $fwrite to be constant. The same code works as expected with Cadence Incisive. My guess is that it's a limitation of ISE, so you may want to check with Xilinx.
Possible work-arounds:
1) Use $swrite to create a string with the proper formatting, then write the string to the file (see the sketch after this list).
2) Try using an intermediate variable in the calls to $fwrite. Maybe the part-selects are throwing it off, e.g.
integer foo;
foo = clk_count[11:0];
$fwrite(... , foo , ...);
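For work-around 1, a self-contained sketch might look like this (my own, untested with ISE 10.1):

module fwrite_demo;
  reg [11:0] clk_count = 12'd528;
  reg [31:0] plain     = 32'hDEADBEEF;  // placeholder values for illustration
  reg [31:0] cipher    = 32'hCAFEF00D;
  reg [8*48-1:0] line;                  // 48-character buffer for the formatted text
  integer f1;
  initial begin
    f1 = $fopen("cipher.txt", "w");
    // $swrite (Verilog-2001) formats into a reg instead of a file.
    $swrite(line, "clk: %d plain: %h cipher: %h", clk_count, plain, cipher);
    $fwrite(f1, "%s", line);            // leading NUL padding, if any, is simulator-dependent
    $fclose(f1);
  end
endmodule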
Either of those might work, or not.
Out of curiosity, if you remove the part-selects and try to print clk_count without the [11:0], do you get the same warnings?