blockIdx (and threadIdx) in CUDA - variables

Why is the CUDA variable 'blockIdx' called blockIdx instead of just blockId? It seems confusing, since you can have both blockIdx.x and blockIdx.y, and if it's just the ID of the block, what's the 'x' all about? Same with threadIdx.
I'm just starting to get into CUDA and was trying to explain to someone how blocks and threads work, and we both thought it was a weird/confusing naming convention.

Common shortcuts:
id - Identifier
idx - Index
In CUDA you talk about the "block index" and the "thread index", hence the shortcut Idx.
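The x, y and z components exist because grids and blocks can be multi-dimensional, so blockIdx really is an index (a coordinate within the grid), not a scalar ID. A minimal sketch of the usual 2-D indexing pattern:
__global__ void addOne(float *data, int width, int height)
{
    // Each thread derives its 2-D coordinate from its block index
    // (blockIdx) and its thread index within the block (threadIdx).
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // x component
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // y component
    if (col < width && row < height)
        data[row * width + col] += 1.0f;
}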

G_LLL_XD function in NTL library faulty

I am trying to use the G_LLL_XD function of the NTL library. Whenever I call the function in this form:
G_LLL_XD(B, delta);
the program works.
However, when I want to change the default deep or prune arguments and call the function in one of these ways:
G_LLL_XD(B, delta, deep, check, verbose);
G_LLL_XD(B, delta, prune, check, verbose);
I get this error at runtime:
R6010
- abort() has been called
and in the command prompt it says:
"sorry...deep insertions not implemented"
I find this very strange: whenever I pass prune, I get this crash, which I shouldn't, since the function should then be looking for pruning, not deep insertions; and when I do pass deep, I still get an error.
Can anybody help me understand what the problem is or how I can fix this? Thank you very much.
I couldn't find a prune argument for the LLL functions in NTL, but there is one for BKZ. Since both accept positive integers, it is probably just a naming confusion.
From the documentation:
NOTE: use of "deep" is obsolete, and has been "deprecated". It is
recommended to use BKZ_FP to achieve higher-quality reductions.
Moreover, the Givens versions do not support "deep", and setting
deep != 0 will raise an error in this case.
So you cannot use G_LLL_XD with deep != 0, but LLL_XD should work (although deep itself is deprecated).
But as mentioned, you should consider using BKZ_XD instead of LLL_XD.
A BKZ-reduced basis of a lattice is also LLL-reduced, so there should be no problem. BKZ is slower than LLL, but you can choose a small block size (10 or 20, or even 2 or 4) to speed the reduction up.
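For illustration, a minimal sketch of the switch, assuming NTL's documented signature BKZ_XD(B, delta, BlockSize, prune, check, verbose) with the trailing arguments left at their defaults; the basis values here are placeholders:
#include <NTL/mat_ZZ.h>
#include <NTL/LLL.h>

using namespace NTL;

int main()
{
    mat_ZZ B;
    B.SetDims(3, 3);              // placeholder basis; use your own lattice here
    B[0][0] = 7; B[0][1] = 1; B[0][2] = 0;
    B[1][0] = 3; B[1][1] = 5; B[1][2] = 1;
    B[2][0] = 2; B[2][1] = 0; B[2][2] = 9;

    double delta = 0.99;          // reduction quality parameter
    long blockSize = 10;          // a small block size keeps BKZ reasonably fast

    BKZ_XD(B, delta, blockSize);  // B is reduced in place
    return 0;
}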

Handling magic constants during 64-bit migration

I confess I did something dumb and it is now biting me. I used a magic-number constant, defined as NSUIntegerMax, to mark a special-case index. The value is normally used as an index to access the selected item in an NSArray; in the special case, denoted by the magic number, I get the value from elsewhere instead of from the array.
This index value is serialized in User Defaults as an NSNumber.
With Xcode 5.1 my iOS app gets compiled with the standard architectures, which now also include arm64. This changed the value of NSUIntegerMax, so after deserialization I get the 32-bit value of NSUIntegerMax, which no longer matches the magic number in comparisons, since that is now the 64-bit NSUIntegerMax. The result is an NSRangeException with reason: -[__NSArrayI objectAtIndex:]: index 4294967295 beyond bounds [0 .. 10].
It is a minor issue in my code; given that the normal range of that array is small, I might just get away with redefining my magic number as 4294967295. But it doesn't feel right. How should I have handled this issue properly?
I guess avoiding the magic number altogether would be the most robust approach?
Note
I think the problem with my magic number is roughly equivalent to what happened to the NSNotFound constant. Apple's 64-bit Transition Guide for Cocoa Touch says, in the section about Common Type-Conversion Problems in Cocoa Touch:
Working with constants defined in the framework as NSInteger. Of particular note is the NSNotFound constant. In the 64-bit runtime, its value is larger than the maximum range of an int type, so truncating its value often causes errors in your app.
… but it does not say what should be done, except to be careful ;-)
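For illustration, the failure mode the guide warns about looks roughly like this (variable names are hypothetical; NSNotFound is defined as NSIntegerMax):
NSUInteger i = [items indexOfObject:target];  // NSNotFound if absent
int truncated = (int)i;                       // truncation drops the upper 32 bits
if (truncated == NSNotFound) {
    // never true in the 64-bit runtime, where NSNotFound exceeds INT_MAX
}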
If you use NSInteger/NSUInteger, it is 4 bytes on a 32-bit OS and 8 bytes on a 64-bit OS.
If you want an integer of the same size on both, you should consider using int (4 bytes), long long (8 bytes), or the fixed-width int32_t/int64_t types, and then compare against the matching limit constant instead of NSUIntegerMax:
INT_MAX
// or LLONG_MAX for long long
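Putting that advice together, a minimal sketch of an architecture-independent approach (names are hypothetical): define your own sentinel instead of deriving it from NSUIntegerMax, and read and write the index as a fixed-width 64-bit value.
static const int64_t kSpecialItemIndex = -1;  // self-defined sentinel, never a valid index

// Writing:
[[NSUserDefaults standardUserDefaults] setObject:@(kSpecialItemIndex)
                                          forKey:@"selectedItemIndex"];

// Reading:
int64_t index = [[[NSUserDefaults standardUserDefaults]
                    objectForKey:@"selectedItemIndex"] longLongValue];
if (index == kSpecialItemIndex) {
    // special case: take the value from elsewhere
} else {
    id item = items[(NSUInteger)index];
}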

Creating robust real-time monitors for variables

We can create a real-time monitor for a variable like this:
CreatePalette@Panel@Row[{"x = ", Dynamic[x]}]
(This is more interesting and useful if x happens to be something like $Assumptions. It's so easy to set a value and then forget about it.)
Unfortunately this stops working if the kernel is re-launched (Quit[], then evaluate something). The palette won't show changes in the value of x any more.
Is there a way to do this so it keeps working even across kernel sessions? I find myself restarting the kernel quite often. (If the resulting palette causes the kernel to be automatically started after Quit that's fine.)
Update: As mentioned in the comments, it turns out that the palette ceases working only if we quit by evaluating Quit[]. When using Evaluation -> Quit Kernel -> Local, it will keep working.
Link to same question on MathGroup.
I can only guess, because on my Ubuntu machine the situation seems buggy. The trick with Quit from the menu that Leonid suggested did not work here. Another observation: in a fresh Mathematica session with only one notebook open,
Dynamic[x]
x = 1
Dynamic[x]
x = 2
gives as expected
2
1
2
2
Typing Quit on the next line, evaluating it, and then typing x = 3 updates only the first of the Dynamic[x] outputs.
Nevertheless, have you checked the command
Internal`GetTrackedSymbols[]
This gives not only the tracked symbols but also some kind of ID telling which dynamic content they belong to. If you can find out what exactly these numbers are, and investigate the other functions you find in the Internal` context, you may be able to re-add your palette's Dynamic content manually after restarting the kernel.
I thought I had something like that with
Internal`SetValueTrackExtra
but I'm currently not able to reproduce the behavior.
@halirutan's answer jarred my memory...
Have you ever come across: Experimental/ref/ValueFunction? (documentation address)
Although the documentation contains no examples, the 'more information' section provides the following tidbit:
The assignment ValueFunction[symb] = f specifies that whenever
symb gets a new value val, the expression f[symb,val] should be
evaluated.
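Based only on that description, a minimal sketch (the callback and the printed message are illustrative):
Experimental`ValueFunction[x] = Function[{sym, val}, Print["x is now ", val]];
x = 5   (* prints: x is now 5 *)
x = 6   (* prints: x is now 6 *)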

What is the 'accumulator' in HQ9+?

I was just reading a bit about the HQ9+ programming language:
https://esolangs.org/wiki/HQ9+,
https://en.wikipedia.org/wiki/HQ9+, and
https://cliffle.com/esoterica/hq9plus,
and it tells me something about a so-called “accumulator” which can be incremented but not accessed. Also, using + doesn't change the output, so the input
H+H
gives the result:
Hello World
Hello World
Can anyone explain to me how this works, what it does, and whether it makes any sense? Thanks.
Having recently completed an implementation in Clojure (which follows) I can safely say that the accumulator is absolutely central to a successful implementation of HQ9+. Without it one would be left with an implementation of HQ9 which, while doubtless worthy in and of itself, is clearly different; HQ9+ without an accumulator, and the instruction to increment it, would thus NOT be an implementation of HQ9+.
(Editor's note: Bob has taken his meds today but they haven't quite kicked in yet; thus, further explanation is perhaps needed. What I believe Bob is trying to say is that HQ9+ is useless as a programming language, per se; however, implementing it can actually be useful in the context of learning how to implement something successfully in a new language. OK, I'll just go and curl up quietly in the back of Bob's brain now and let him get back to doing...whatever it is he does when I'm not minding the store...).
Anyways...implementation in Clojure follows:
(defn hq9+
  "HQ9+ interpreter"
  [& args]
  (loop [program (apply concat args)
         accumulator 0]
    (when (seq program)
      (recur (rest program)
             (case (first program)
               \H (do (println "Hello, World!") accumulator)
               \Q (do (println (apply str args)) accumulator) ; Q prints the program's own source
               \9 (do (apply println
                             (map #(str % " bottles of beer on the wall, "
                                        % " bottles of beer, if one of those bottles should happen to fall, "
                                        (if (> % 0) (- % 1) 99) " bottles of beer on the wall")
                                  (reverse (range 100))))
                      accumulator)
               \+ (inc accumulator)                           ; the all-important increment
               (do (println "invalid instruction: " (first program)) accumulator)))))) ; default case
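For example, the H+H program from the question (assuming the function above has been loaded):
(hq9+ "H+H")
;; Hello, World!
;; Hello, World!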
Note that this implementation only accepts commands passed into the function as parameters; it doesn't read a file for its program. This may be remedied in future releases. Also note that this is a "strict" implementation of the language - the original page (at the Wayback Machine) clearly shows that only UPPER CASE 'H's and 'Q's should be accepted, although it implies that lower-case letters may also be accepted. Since part of the point of implementing any programming language is to strictly adhere to the specification as written, this version of HQ9+ accepts only upper-case letters. Should the need arise, I am fully prepared to found a religion, tentatively named the CONVOCATION OF THE HOLY CAPS LOCK, which will declare the use of upper-case to be COMMANDED BY FRED (our god - Fred - it seems like such a friendly name for a god, doesn't it?), and will deem the use of lower-case letters to be anathema...I MEAN, TO BE ANATHEMA!
Share and enjoy.
Having written an implementation, I think I can say without a doubt that it makes no sense at all. I advise you to not worry about it; it's a very silly language after all.
It's a joke.
There's also an object-oriented extension of HQ9+, called HQ9++. It has a new command ++ which instantiates an object, and, for reasons of backwards-compatibility, also increments the accumulator register twice. And again, since there is no way to store, retrieve, access, manipulate, print or otherwise affect an object, it's completely useless.
It increments something not accessible, not spec-defined, and apparently not really even used. I'd say you can implement it however you want or possibly not at all.
The right answer is one that has been hinted at by the other answers but not quite stated explicitly: the effect of incrementing the accumulator is undefined by the language specification and left as a choice of the implementation.
Actually, I am mistaken.
The accumulator is the register to which the result of the last calculation is stored. In an Intel x86, any register may be specified as the accumulator, except in the case of MUL.
Source:
http://en.wikipedia.org/wiki/Accumulator_(computing)
I was quite surprised the first time I visited the third site in your question to find out a schoolmate of mine wrote the OCaml implementation at the bottom of the page.
(updated site link)
I think there is, or there must be, a reason for this accumulator and for the most important operation on it (increment): future compatibility.
Very often we see that a language is invented, often inspired by some other language, with of course some salt (new concepts, or at least some improvement). Later, when the language spreads, problems arise, and modifications, additions or whatever are introduced. That is the same as saying "we were wrong; this thing was necessary, but we didn't think of it at the time".
Well, this accumulator idea in HQ9+ is exactly the opposite. In the future, when the language has spread, nobody will be able to say "we need an accumulator, but HQ9+ lacks one", because the standard of the language, even in its first draft, states that an accumulator is present and that it is even modifiable (otherwise it would be nonsense).

Performance overhead of perform: in Smalltalk (specifically Squeak)

How much slower can I reasonably expect perform: to be than a literal message send, on average? Should I avoid sending perform: in a loop, similar to the admonishment given to Perl/Python programmers to avoid calling eval("...") (Compiler evaluate: in Smalltalk) in a loop?
I'm concerned mainly with Squeak, but interested in other Smalltalks as well. Also, is the overhead greater with the perform:with: variants? Thank you
#perform: is not like eval(). The problem with eval() (performance-wise, anyway) is that it has to compile the code you're sending it at runtime, which is a very slow operation. Smalltalk's #perform:, on the other hand, is equivalent to Ruby's send() or Objective-C's performSelector: (in fact, both of these languages were strongly inspired by Smalltalk). Languages like these already look up methods based on their name — #perform: just lets you specify the name at runtime rather than write-time. It doesn't have to parse any syntax or compile anything like eval().
It will be a little slower (the cost of at least one extra method call), but it isn't like eval(). Also, the variants with more arguments shouldn't show any difference in speed compared with plain perform:. I can't speak from that much experience about Squeak specifically, but this is how it generally works.
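To make the equivalence concrete, here is a small sketch (standard selectors; the computed selector is only for illustration):
3 + 4.                       "literal send -> 7"
3 perform: #+ with: 4.       "same send, but the selector is supplied as a Symbol -> 7"
| sel |
sel := ('fact', 'orial') asSymbol.   "selector computed at runtime"
5 perform: sel.                      "-> 120"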
Here are some numbers from my machine (it is Smalltalk/X, but I guess the numbers are comparable - at least the ratios should be):
The called methods "foo" and "foo:" are no-ops (i.e. they consist of just ^self):
self foo ... 3.2 ns
self perform:#foo ... 3.3 ns
[self foo] value ... 12.5 ns (2 sends and 2 contexts)
[ ] value ... 3.1 ns (empty block)
Compiler evaluate:('TestClass foo') ... 1.15 ms
self foo:123 ... 3.3 ns
self perform:#foo: with:123 ... 3.6 ns
[self foo:123] value ... 15 ns (2 sends and 2 contexts)
[self foo:arg] value:123 ... 23 ns (2 sends and 2 contexts)
Compiler evaluate:('TestClass foo:123') ... 1.16 ms
Notice the big difference between "perform:" and "evaluate:"; evaluate: calls the compiler to parse the string, generate a throw-away method (bytecode), execute it (it is jitted on the first call) and finally discard it. The compiler is actually written to be used mainly by the IDE and to fileIn code from external streams; it has code for error reporting, warning messages etc.
In general, eval is not what you want when performance is critical.
Timings are from a Dell Vostro; your mileage may vary, but the ratios should not.
I tried to get the net execution times by measuring the empty-loop time and subtracting it; also, I ran the tests 10 times and took the best times, to eliminate OS/network/disk/email or whatever disturbances. However, I did not really care for a load-free machine.
The measurement code was (with the second timesRepeat: argument replaced by the code above):
callFoo2
    |t1 t2|
    t1 := TimeDuration toRun: [
        100000000 timesRepeat: []
    ].
    t2 := TimeDuration toRun: [
        100000000 timesRepeat: [self foo: 123]
    ].
    Transcript showCR: t2 - t1
EDIT:
PS: I forgot to mention: these are the times from within the IDE (i.e. bytecode-jitted execution). Statically compiled code (using the stc-compiler) will generally be a bit faster (20-30%) on these low-level micro benchmarks, due to a better register allocation algorithm.
EDIT: I tried to reproduce these numbers the other day, but got completely different results (8ns for the simple call, but 9ns for the perform). So be very careful with these micro-timings, as they run completely out of the first-level cache (and empty messages even omit the context setup, or get inlined) - they are usually not very representative of the overall performance.