Receiving message from channel as guard - spin

A: if
:: q?a -> ...
:: else -> ...
fi
Note that a race condition is built-in to this type of code. How long
should the process wait, for instance, before deciding that the
message receive operation will not be executable? The problem can be
avoided by using message poll operations, for instance, as follows:
The above citation comes from http://spinroot.com/spin/Man/else.html
I cannot understand that argumentation. Just Spin can decide on q?a:
if q is empty then it is executable. Otherwise, it is blocking.
The given argument raised a race condition.
But, I can make the same argument:
byte x = 1;
A: if
:: x == 2 -> ...
:: else -> ...
fi
It is ok from point of Spin's view. But, I am asking, How long should the process wait, for instance, before deciding that the value of x will not be incremented by other process?

The argumentation is sound with respect to the semantics of Promela and the selection construct. Note that for selection, if multiple guard statements are executable, one of them will be selected non-deterministically. This in turns implies the semantics such that selection (even though it can non-deterministally execute guards) needs to determine which guards are executable at the point of invocation of the selection statement.
The question about the race condition might make more sense when considering the semantics of selection and message receives. Note that race condition in this case means that the output of the selection might depend on the time for which it needs to invoke the receive (i.e. whether it finishes at a point at which there is a message in the channel or not).
More specifically, for the selection statement, there should be no ambiguity in terms of feasible guards. Now, the message receive gets the message from the channel only if the channel is not empty (otherwise, it cannot finish executing and waits). Therefore, with respect to the semantics of receive, it is not clear whether it is executable before it is actually executed. In turn, else should execute if the receive is not executable. However, since else should execute only if ? is not executable, so to know if else is executable the program needs to know the future (or determine how much should it wait to know this, thus incurring the race condition).
Note that the argument does not apply to your second example:
byte x = 1;
A: if
:: x == 2 -> ...
:: else -> ...
fi
since here, to answer whether else is eligible no waiting is required (nor knowing the future), since the program can at any point determine if x == 2.

Related

Does the process get activated or suspended?

I am still having a little bit of trouble understanding the sensitivity list and it activating a process.
most textbooks say that a process is activated every time an event occurs on a signal inside the sensitivity list.
process(in)
begin
x <= in;
end process;
Now looking at this example, "in" is an input declared in the entity. Now if "in" starts off at 0 and changes to 1 then the process would activate and the value of x would take in value "in". Now suppose after in changed from 0 to 1 that it now stays at constant value of 1. Does this mean the process will not get activated? Will x still give output of '1'? I want to say that it wont get activated and will only activate once in changes back from 1 back to 0. Can someone please confirm?
Within the sensitivity list (I assume this is hardware language, VHDL has the same exact syntax and format), whenever there is some type of signal change (L-> H, 0 ->1, 1-> 0... any change in the variable you listed within the sensitivity list), it will activate the process and the process will execute until completion, which then the process will end. When the process end, signals/outputs (depends on how you interpret them) will be stored on a driver, which will update given signals after some propagation delay.
So from your second statement, yes. If it changes to 0 -> 1, the process activates, if its 1 -> 0, the process activates, and if in remains 1, the process will not be activated. So x's value remains.

Understanding CUDA serialization and reconvergence point

EDIT: I realized that I, unfortunately, overlooked a semicolon at the end of the while statement in the first example code and misinterpreted it myself. So there is in fact an empty loop for threads with threadIdx.x != s, a convergency point after that loop and a thread waiting at this point for all the others without incrementing the s variable. I am leaving the original (uncorrected) question below for anyone interested in it. Be aware, that there is a semicolon missing at the end of the second line in the first example and thus, s++ has nothing in common with the cycle body.
--
We were studying serialization in our CUDA lesson and our teacher told us that a code like this:
__shared__ int s = 0;
while (s != threadIdx.x)
s++; // serialized code
would end up with a HW deadlock because the nvcc compiler puts a reconvergence point between the while (s != threadIdx.x) and s++ statements. If I understand it correctly, this means that once the reconvergence point is reached by a thread, this thread stops execution and waits for the other threads until they reach the point too. In this example, however, this never happens, because thread #0 enters the body of the while loop, reaches the reconvergence point without incrementing the s variable and other threads get stuck in an endless loop.
A working solution should be the following:
__shared__ int s = 0;
while (s < blockDim.x)
if (threadIdx.x == s)
s++; // serialized code
Here, all threads within a block enter the body of the loop, all evaluate the condition and only thread #0 increments the s variable in the first iteration (and loop goes on).
My question is, why does the second example work if the first hangs? To be more specific, the if statement is just another point of divergence and in terms of the Assembler language should be compiled into the same conditional jump instruction as the condition in the loop. So why isn't there any reconvergence point before s++ in the second example and has it in fact gone immediately after the statement?
In other sources I have only found that a divergent code is computed independently for every branch - e.g. in an if/else statement, first the if branch is computed with all else-branched threads masked within the same warp and then the other threads compute the else branch while the first wait. There's a reconvergence point after the if/else statement. Why then does the first example freeze, not having the loop split into two branches (a true branch for one thread and a waiting false branch for all the others in a warp)?
Thank you.
It does not make sense to put the reconvergence point between the call to while (s != threadIdx.x) and s++;. It disrupts the program flow since the reconvergence point for a piece of code should be reachable by all threads at compile time. Below picture shows the flowchart of your first piece of code and possible and impossible points of reconvergence.
Regarding this answer about recording the convergence point via SSY instruction, I created below simple kernel resembling your first piece of code
__global__ void kernel_1() {
__shared__ int s;
if(threadIdx.x==0)
s = 0;
__syncthreads();
while (s == threadIdx.x)
s++; // serialized code
}
and compiled it for CC=3.5 with -O3. Below is the result of using cuobjdumbinary tool for the output to observe the CUDA assembly. The result is:
I'm not an expert in reading CUDA assembly but I can see while loop condition checks in lines 0038 and 00a0. At line 00a8, it branches to 0x80 if it satisfies the while loop condition and executes the code block again. The introduction of the reconvergence point is at line 0058 introducing line 0xb8 as the reconvergence point which is after the loop condition check near the exit.
Overall, it is not clear what you're trying to achieve with this piece of code. Also in the second piece of code, the reconvergence point should be again after while loop code block (I don't mean between while and if).
The reason why it "hangs" is neither a HW deadlock nor branching, at least not directly. You produce an endless loop for one or multiple threads (as already suspected).
In your example, there isn't really a convergence point. Since you do not use any synchronization, there aren't any threads that actually wait. What happens here with the while-loop is pretty much a busy-wait.
A kernel only finishes if all threads return. Since you have one (or multiple) endless loops (by accident maybe even none - this is unlikely however) the kernel will never finish.
You declared a shared variable s. This variable is known to all threads within a block.
With your while-statement you basically say (to each thread): increment s until it reaches the value of your (local) thread id. Since all threads are incrementing s in parallel, you introduce race conditions.
Example:
List item
Thread 5 is looping and checking for s to become 5
s is 4
Two threads increment s, it becomes 6
At the same time thread 5 only reached the end of its loop.
Now it reaches the next loop iteration and checks for s and it's not 5.
Thread 5 will never be able to finish since you check via == and the value of s already exceeded the value of the thread id.
Also your solution is quite confusing, because each thread executes the serialized code consecutively (which probably was the intention after all - even though that actually is strange):
Thread 0 will execute the serialized code
After that, thread 1 will execute the serialized code
and so on
Most examples show a program where each thread works on some code, then all threads are synchronized and only single thread executes some more code (maybe it needed the results of all threads).
So, your second example "works" because no thread is stuck in an endless loop, however I can't think of a reason why anyone would use such a code,
since it is confusing and, well, not parallel at all.

How to go about testing go routines?

An example of this problem is when a user creates a resource and deletes a resource. We will perform the operation and also increment (decrement) a counter cache.
In testing, there is sometimes a race condition where the counter cache has not been updated by the go routine.
EDIT: Sorry about the confusion, to clarify: the counter cache is not in memory, it is actually a field in the database. The race condition is not to a variable in memory, it is actually that the goroutine might be slow to write into the database itself!
I currently use a 1 second sleep after the operation to ensure that the counter cache has been updated before testing the counter cache. Is there another way to test go routine without the arbitrary 1 second sleep to wait for the go routine to finish?
Cheers
In testing, there is sometimes a race condition where the counter cache has not been updated by the go routine. I currently use a 1 second sleep after the operation to ensure that the counter cache has been updated before testing the counter cache.
Yikes, I hate to say it, but you're doing it wrong. Go has first-class features to make concurrency easy! If you use them correctly, it's impossible to have race conditions.
In fact, there's a tool that will detect races for you. I'll bet it complains about your program.
One simple solution:
Have the main routine create a goroutine for keeping track of the counter.
the goroutine will just do a select and get a message to increment/decrement or read the counter. (If reading, it will be passed in a channel to return the number)
when you create/delete resources, send an appropriate message to the goroutine counter via it's channel.
when you want to read the counter, send a message for read, and then read the return channel.
(Another alternative would be to use locks. It would be a tiny bit more performant, but much more cumbersome to write and ensure it's correct.)
One solution is to make to let your counter offer a channel which is updated as soon as the value
changes. In go it is common practice to synchronize by communicating the result. For example your
Couter could look like this:
type Counter struct {
value int
ValueChange chan int
}
func (c *Counter) Change(n int) {
c.value += n
c.ValueChange <- c.value
}
Whenever Change is called, the new value is passed through the channel and whoever is
waiting for the value unblocks and continues execution, therefore synchronizing with the
counter. With this code you can listen on ValueChange for changes like this:
v := <-c.ValueChange
Concurrently calling c.Change is no problem anymore.
There is a runnable example on play.

Techniques for controlling program order of execution

I'm wrestling with the concept of code "order of execution" and so far my research has come up short. I'm not sure if I'm phrasing it incorrectly, it's possible there is a more appropriate term for the concept. I'd appreciate it if someone could shed some light on my various stumbling blocks below.
I understand that if you call one method after another:
[self generateGrid1];
[self generateGrid2];
Both methods are run, but generateGrid1 doesn't necessarily wait for generateGrid2. But what if I need it to? Say generateGrid1 does some complex calculations (that take an unknown amount of time) and populate an array that generateGrid2 uses for it's calculations? This needs to be done every time an event is fired, it's not just a one time initialization.
I need a way to call methods sequentially, but have some methods wait for others. I've looked into call backs, but the concept is always married to delegates in all the examples I've seen.
I'm also not sure when to make the determinate that I can't reasonably expect a line of code to be parsed in time for it to be used. For example:
int myVar = [self complexFloatCalculation];
if (myVar <= 10.0f) {} else {}
How do I determine if something will take long enough to implement checks for "Is this other thing done before I start my thing". Just trial and error?
Or maybe I'm passing a method as parameter of another method? Does it wait for the arguments to be evaluated before executing the method?
[self getNameForValue:[self getIntValue]];
I understand that if you call one method after another:
[self generateGrid1];
[self generateGrid2];
Both methods are run, but generateGrid1 doesn't necessarily wait for generateGrid2. But what if I need it to?
False. generateGrid1 will run, and then generateGrid2 will run. This sequential execution is the very basis of procedural languages.
Technically, the compiler is allowed to rearrange statements, but only if the end result would be provably indistinguishable from the original. For example, look at the following code:
int x = 3;
int y = 4;
x = x + 6;
y = y - 1;
int z = x + y;
printf("z is %d", z);
It really doesn't matter whether the x+6 or the y-1 line happens first; the code as written does not make use of either of the intermediate values other than to calculate z, and that can happen in either order. So if the compiler can for some reason generate more efficient code by rearranging those lines, it is allowed to do so.
You'd never be able to see the effects of such rearranging, though, because as soon as you try to use one of those intermediate values (say, to log it), the compiler will recognize that the value is being used, and get rid of the optimization that would break your logging.
So really, the compiler is not required to execute your code in the order provided; it is only required to generate code that is functionally identical to the code you provided. This means that you actually can see the effects of these kinds of optimizations if you attach a debugger to a program that was compiled with optimizations in place. This leads to all sorts of confusing things, because the source code the debugger is tracking does not necessarily match up line-for-line with the code the compiled code the compiler generated. This is why optimizations are almost always turned off for debug builds of a program.
Anyway, the point is that the compiler can only do these sorts of tricks when it can prove that there will be no effect. Objective-c method calls are dynamically bound, meaning that the compiler has absolutely no guarantee about what will actually happen at runtime when that method is called. Since the compiler can't make any guarantees about what will happen, the compiler will never reorder Objective-C method calls. But again, this just falls back to the same principle I stated earlier: the compiler may change order of execution, but only if it is completely imperceptible to the user.
In other words, don't worry about it. Your code will always run top-to-bottom, each statement waiting for the one before it to complete.
In general, most method calls that you see in the style you described are synchronous, that means they'll have the effect you desire, running in the order the statements were coded, where the second call will only run after the first call finishes and returns.
Also, when a method takes parameters, its parameters are evaluated before the method is called.

Reliable clean-up in Mathematica

For better or worse, Mathematica provides a wealth of constructs that allow you to do non-local transfers of control, including Return, Catch/Throw, Abort and Goto. However, these kinds of non-local transfers of control often conflict with writing robust programs that need to ensure that clean-up code (like closing streams) gets run. Many languages provide ways of ensuring that clean-up code gets run in a wide variety of circumstances; Java has its finally blocks, C++ has destructors, Common Lisp has UNWIND-PROTECT, and so on.
In Mathematica, I don't know how to accomplish the same thing. I have a partial solution that looks like this:
Attributes[CleanUp] = {HoldAll};
CleanUp[body_, form_] :=
Module[{return, aborted = False},
Catch[
CheckAbort[
return = body,
aborted = True];
form;
If[aborted,
Abort[],
return],
_, (form; Throw[##]) &]];
This certainly isn't going to win any beauty contests, but it also only handles Abort and Throw. In particular, it fails in the presence of Return; I figure if you're using Goto to do this kind of non-local control in Mathematica you deserve what you get.
I don't see a good way around this. There's no CheckReturn for instance, and when you get right down to it, Return has pretty murky semantics. Is there a trick I'm missing?
EDIT: The problem with Return, and the vagueness in its definition, has to do with its interaction with conditionals (which somehow aren't "control structures" in Mathematica). An example, using my CleanUp form:
CleanUp[
If[2 == 2,
If[3 == 3,
Return["foo"]]];
Print["bar"],
Print["cleanup"]]
This will return "foo" without printing "cleanup". Likewise,
CleanUp[
baz /.
{bar :> Return["wongle"],
baz :> Return["bongle"]},
Print["cleanup"]]
will return "bongle" without printing cleanup. I don't see a way around this without tedious, error-prone and maybe impossible code-walking or somehow locally redefining Return using Block, which is heinously hacky and doesn't actually seem to work (though experimenting with it is a great way to totally wedge a kernel!)
Great question, but I don't agree that the semantics of Return are murky; They are documented in the link you provide. In short, Return exits the innermost construct (namely, a control structure or function definition) in which it is invoked.
The only case in which your CleanUp function above fails to cleanup from a Return is when you directly pass a single or CompoundExpression (e.g. (one;two;three) directly as input to it.
Return exits the function f:
In[28]:= f[] := Return["ret"]
In[29]:= CleanUp[f[], Print["cleaned"]]
During evaluation of In[29]:= cleaned
Out[29]= "ret"
Return exits x:
In[31]:= x = Return["foo"]
In[32]:= CleanUp[x, Print["cleaned"]]
During evaluation of In[32]:= cleaned
Out[32]= "foo"
Return exits the Do loop:
In[33]:= g[] := (x = 0; Do[x++; Return["blah"], {10}]; x)
In[34]:= CleanUp[g[], Print["cleaned"]]
During evaluation of In[34]:= cleaned
Out[34]= 1
Returns from the body of CleanUp at the point where body is evaluated (since CleanUp is HoldAll):
In[35]:= CleanUp[Return["ret"], Print["cleaned"]];
Out[35]= "ret"
In[36]:= CleanUp[(Print["before"]; Return["ret"]; Print["after"]),
Print["cleaned"]]
During evaluation of In[36]:= before
Out[36]= "ret"
As I noted above, the latter two examples are the only problematic cases I can contrive (although I could be wrong) but they can be handled by adding a definition to CleanUp:
In[44]:= CleanUp[CompoundExpression[before___, Return[ret_], ___], form_] :=
(before; form; ret)
In[45]:= CleanUp[Return["ret"], Print["cleaned"]]
During evaluation of In[46]:= cleaned
Out[45]= "ret"
In[46]:= CleanUp[(Print["before"]; Return["ret"]; Print["after"]),
Print["cleaned"]]
During evaluation of In[46]:= before
During evaluation of In[46]:= cleaned
Out[46]= "ret"
As you said, not going to win any beauty contests, but hopefully this helps solve your problem!
Response to your update
I would argue that using Return inside If is unnecessary, and even an abuse of Return, given that If already returns either the second or third argument based on the state of the condition in the first argument. While I realize your example is probably contrived, If[3==3, Return["Foo"]] is functionally identical to If[3==3, "foo"]
If you have a more complicated If statement, you're better off using Throw and Catch to break out of the evaluation and "return" something to the point you want it to be returned to.
That said, I realize you might not always have control over the code you have to clean up after, so you could always wrap the expression in CleanUp in a no-op control structure, such as:
ret1 = Do[ret2 = expr, {1}]
... by abusing Do to force a Return not contained within a control structure in expr to return out of the Do loop. The only tricky part (I think, not having tried this) is having to deal with two different return values above: ret1 will contain the value of an uncontained Return, but ret2 would have the value of any other evaluation of expr. There's probably a cleaner way to handle that, but I can't see it right now.
HTH!
Pillsy's later version of CleanUp is a good one. At the risk of being pedantic, I must point out a troublesome use case:
Catch[CleanUp[Throw[23], Print["cleanup"]]]
The problem is due to the fact that one cannot explicitly specify a tag pattern for Catch that will match an untagged Throw.
The following version of CleanUp addresses that problem:
SetAttributes[CleanUp, HoldAll]
CleanUp[expr_, cleanup_] :=
Module[{exprFn, result, abort = False, rethrow = True, seq},
exprFn[] := expr;
result = CheckAbort[
Catch[
Catch[result = exprFn[]; rethrow = False; result],
_,
seq[##]&
],
abort = True
];
cleanup;
If[abort, Abort[]];
If[rethrow, Throw[result /. seq -> Sequence]];
result
]
Alas, this code is even less likely to be competitive in a beauty contest. Furthermore, it wouldn't surprise me if someone jumped in with yet another non-local control flow that that this code will not handle. Even in the unlikely event that it handles all possible cases now, problematic cases could be introduced in Mathematica X (where X > 7.01).
I fear that there cannot be a definitive answer to this problem until Wolfram introduces a new control structure expressly for this purpose. UnwindProtect would be a fine name for such a facility.
Michael Pilat provided the key trick for "catching" returns, but I ended up using it in a slightly different way, using the fact that Return forces the return value of a named function as well as control structures like Do. I made the expression that is being cleaned up after into the down-value of a local symbol, like so:
Attributes[CleanUp] = {HoldAll};
CleanUp[expr_, form_] :=
Module[{body, value, aborted = False},
body[] := expr;
Catch[
CheckAbort[
value = body[],
aborted = True];
form;
If[aborted,
Abort[],
value],
_, (form; Throw[##]) &]];