HLSL 'technique' keyword documentation

Very simple question...
I have some example code:
technique Draw
{
    pass
    {
        vertex_shader = VertexShaderName(vec_in);
        pixel_shader = PixelShaderName(vec_in);
    }
}
Where can I find documentation for the technique keyword? There is no link with a description provided for such a statement...

Techniques are used by the (now deprecated) effects system.
It wraps a lot of the low-level API (Direct3D 9 originally worked with techniques, so this was created to ease the transition from Direct3D 9 to Direct3D 10/11).
It provides helpers to manage constant buffers and a reflection/variable API to assign data to the shaders.
So while with the low-level pipeline you would compile your vertex and pixel shaders independently and create your constant buffers and the structures that go with them, the effects system lets you build everything in a single block, breaks constant buffers down into a variable system, and has a pass API that performs all the binding in one go.
Both have their pros and cons: fx is really nice for authoring and prototyping (it's really easy to use its variable system to create an automatic GUI, for example), but since it manages a lot of boilerplate for you, it gets a bit awkward when it comes to efficient resource reuse or complex shader permutations.
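To make the "everything in a single block" workflow concrete, here is a rough C++ sketch of driving the Draw technique above through the FX11 wrapper linked at the end of this answer. The API names come from FX11's d3dx11effect.h; the draw.fx filename and the World variable are assumptions for illustration, and error handling is omitted.
    #include <d3d11.h>
    #include "d3dx11effect.h" // from the open-source FX11 project

    void drawWithEffect(ID3D11Device* device, ID3D11DeviceContext* context)
    {
        ID3DX11Effect* effect = nullptr;

        // One call compiles the shaders and builds the constant-buffer/variable system.
        D3DX11CompileEffectFromFile(L"draw.fx", nullptr, nullptr, 0, 0,
                                    device, &effect, nullptr);

        // The variable API replaces hand-written constant-buffer updates.
        float identity[16] = { 1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1 };
        effect->GetVariableByName("World")->AsMatrix()->SetMatrix(identity);

        // Apply() binds shaders, constant buffers and state for the pass in one go.
        effect->GetTechniqueByName("Draw")->GetPassByIndex(0)->Apply(0, context);
        context->Draw(3, 0);

        effect->Release();
    }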
One thing I particularly miss from effects is variable semantics and annotations; you could set things like:
    float2 tex : TARGETSIZE;
and by using the variable system detect that tex has a TARGETSIZE semantic, hide it from the UI, and auto-attach the render target size, for example.
A common old usage for annotations was to attach some metadata to values, like:
    float4 color <bool color=true;> = 1.0f;
Reflecting over the annotations lets you see that we consider this variable a color (and display a color picker in the editor instead of 4 raw channels).
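A hedged sketch of what that reflection could look like through FX11's C++ API (GetVariableByName, GetAnnotationByName and IsValid are part of d3dx11effect.h; the variable and annotation names simply mirror the snippet above):
    void checkColorAnnotation(ID3DX11Effect* effect)
    {
        ID3DX11EffectVariable* var = effect->GetVariableByName("color");
        ID3DX11EffectVariable* ann = var->GetAnnotationByName("color");
        if (ann->IsValid())
        {
            // The annotation is present: treat the float4 as a color and
            // show a color picker in the editor instead of 4 raw channels.
        }
    }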
While the fx_5_0 profile is deprecated in d3dcompiler_47, it is still possible to use it, and the wrapper is open source:
https://github.com/microsoft/FX11

Related

Can the Kotlin compiler optimize away wrapper functions?

I'm new to Kotlin, but I want to try using it for game development, targeting at least Android with OpenGL ES 2.0 and HTML5 with WebGL (with which I am reasonably familiar). Not having to have slightly different versions of my rendering engine's classes/functions for WebGL and GLES20 would obviously be a good thing, but is there a practical way to achieve this in Kotlin without overhead?
I think what I'll have to do is write a class that implements WebGLRenderingContextBase or a clone of it (if a clone is necessary I can just use a delegate for the WebGL implementation) in OpenGL ES 2.0, full of methods like this:
override fun bindBuffer(target: Int, buffer: Int) {
    GLES20.glBindBuffer(target, buffer)
}
I'll write a script to do the bulk of the work.
My question is, is the compiler smart enough to optimise away such wrappers and use GLES20.glBindBuffer etc. directly in my class's vtable, or whatever equivalent the JVM has? Presumably inline can't be of any use when calling an overridden method via a reference to an interface or base class.
The Kotlin compiler does not optimize the bytecode to this extent, and it does not need to: the JVM itself is quite good at optimizing the code.
Moreover, inline functions were not designed as a performance tool in Kotlin; instead, they are used for non-local control flow and code transformations that cannot be achieved without inlining.
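For example, here is a minimal sketch of the non-local control flow that inline enables (findFirstEven is a made-up name for illustration). The return n inside the lambda exits the enclosing function, which is only possible because the stdlib's forEach is declared inline:
    // Returns the first even number, or null if there is none.
    fun findFirstEven(numbers: List<Int>): Int? {
        numbers.forEach { n ->
            if (n % 2 == 0) return n // non-local return: exits findFirstEven
        }
        return null
    }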
Actually, the JVM performs a lot of optimizations itself, sparing compilers the need to heavily optimize the bytecode they generate. And inlining is one of the optimizations the JVM can do. (1) (2) (3)
Neither the compiler nor the JVM can inline native methods, though, because native code is of a completely different nature.
The Kotlin compiler, in turn, performs some local optimizations that do not affect the overall structure of the program. One more reason for this is the debugging experience, which is hard to preserve with heavy optimizations. To check the exact Kotlin optimizations, you can try disabling them by adding the -Xno-optimize flag to the free compiler arguments, then look through the generated bytecode or do some benchmarking.
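If you build with Gradle, a minimal sketch of where that flag goes (Kotlin DSL; adjust to your build setup):
    import org.jetbrains.kotlin.gradle.tasks.KotlinCompile

    tasks.withType<KotlinCompile> {
        kotlinOptions {
            // Disable the Kotlin compiler's local optimizations so you can
            // inspect the unoptimized bytecode.
            freeCompilerArgs += "-Xno-optimize"
        }
    }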

Programmatic introspection/reflection - easier in VMs?

What makes programmatic introspection/reflection easier in virtual machines rather than native code?
I read somewhere that VMs by nature allow for better introspection/reflection capabilities, but I cannot find more information about it online. I would like to know why.
I believe you mean higher-level languages vs lower-level languages instead of virtual machines.
Higher level languages like Java and C# have implemented reflection and introspection, so there are functions available to the developer to use this information.
Languages like C do not have any pre-built reflection capabilities.
Reflection is very expensive (time-consuming) for any language to run, and should not be used in code that needs to be extremely fast.
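To make that concrete, here is a minimal, self-contained sketch of the kind of introspection Java ships with (the class and field names are made up), using the standard java.lang.reflect API:
    import java.lang.reflect.Field;

    public class ReflectDemo {
        private final int answer = 42;
        private final String label = "demo";

        public static void main(String[] args) throws Exception {
            ReflectDemo demo = new ReflectDemo();
            // List the object's declared fields and read their values at
            // runtime -- something C has no built-in equivalent for.
            for (Field f : demo.getClass().getDeclaredFields()) {
                f.setAccessible(true);
                System.out.println(f.getName() + " = " + f.get(demo));
            }
        }
    }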
Programmatic introspection essentially means to examine & inspect the current call stack, or the current continuation. (Read Appel's book: Compiling with Continuations).
Few programming languages provide this ability. Scheme's call/cc reifies the current continuation, but gives no standard way to inspect it.
The current call stack might be inspectable (e.g. see GCC __builtin_return_address as an ad hoc example).
Most compilers (but not all) do not have an easy way to give information about the layout of the current call frame (however, the DWARF debugging format contains it).
And optimizing compilers (e.g. for C) usually don't give access to the offset of some local variable in the call frame (even if the compiler computes this offset). BTW, the same stack slot might be reused for different variables; read about register spilling.
See also J. Pitrat's CAIA system: the generated C code is able to organize the stack so that it can inspect it.
In a bytecode VM like the JVM, NekoVM or Parrot, introspection is easier because each local variable has a well-defined slot in the call frame. This is not the case for most compiled languages (e.g. C or C++), because the compiler is able to reuse (for optimization purposes) some slots, or even keep a variable only in some machine register, without allocating any call-stack slot to spill it.
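You can see those well-defined slots with the JDK's own tools. A minimal sketch: compile the class below with javac -g (which keeps the LocalVariableTable) and disassemble it with javap -l -c Slots; the table maps each local to a fixed frame slot (here a -> 0, b -> 1, sum -> 2):
    public class Slots {
        // Three locals, three fixed slots in the frame -- exactly the layout
        // information an optimizing C compiler is free to discard.
        static int add(int a, int b) {
            int sum = a + b;
            return sum;
        }
    }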

Why does Math.sin() delegate to StrictMath.sin()?

I started wondering why Math.sin(double) delegates to StrictMath.sin(double) when I came across the problem in a Reddit thread. The code fragment in question looks like this (JDK 7u25):
Math.java :
public static double sin(double a) {
    return StrictMath.sin(a); // default impl. delegates to StrictMath
}
StrictMath.java :
public static native double sin(double a);
The second declaration is native, which seems reasonable to me. The doc of Math states that:
Code generators are encouraged to use platform-specific native libraries or microprocessor instructions, where available (...)
And the question is: isn't the native library that implements StrictMath platform-specific enough? What more can a JIT know about the platform than an installed JRE (please concentrate only on this very case)? In other words, why isn't Math.sin() native already?
I'll try to wrap up the entire discussion in a single post.
Generally, Math delegates to StrictMath. Obviously, the call can be inlined so this is not a performance issue.
StrictMath is a final class with native methods backed by native libraries. One might think that native means optimal, but this doesn't necessarily have to be the case. Looking through the StrictMath javadoc, one can read the following:
(...) the definitions of some of the numeric functions in this package require that they produce the same results as certain published algorithms. These algorithms are available from the well-known network library netlib as the package "Freely Distributable Math Library," fdlibm. These algorithms, which are written in the C programming language, are then to be understood as executed with all floating-point operations following the rules of Java floating-point arithmetic.
The way I understand this doc is that the native library implementing StrictMath is implemented in terms of the fdlibm library, which is multi-platform and known to produce predictable results. Because it's multi-platform, it can't be expected to be an optimal implementation on every platform, and I believe that this is the place where a smart JIT can fine-tune the actual performance, e.g. by statistical analysis of input ranges and adjusting the algorithms/implementation accordingly.
Digging deeper into the implementation, it quickly turns out that the native library backing StrictMath actually uses fdlibm:
StrictMath.c source in OpenJDK 7 looks like this:
#include "fdlibm.h"
...
JNIEXPORT jdouble JNICALL
Java_java_lang_StrictMath_sin(JNIEnv *env, jclass unused, jdouble d)
{
    return (jdouble) jsin((double)d);
}
and the sine function is defined in fdlibm/src/s_sin.c, referring in a few places to the __kernel_sin function that comes directly from the header fdlibm.h.
While I'm temporarily accepting my own answer, I'd be glad to accept a more competent one when it comes up.
Why does Math.sin() delegate to StrictMath.sin()?
The JIT compiler should be able to inline the StrictMath.sin(a) call. So there's little point creating an extra native method for the Math.sin() case ... and adding extra JIT compiler smarts to optimize the calling sequence, etcetera.
In the light of that, your objection really boils down to an "elegance" issue. But the "pragmatic" viewpoint is more persuasive:
Fewer native calls makes the JVM core and JIT easier to maintain, less fragile, etcetera.
If it ain't broken, don't fix it.
At least, that's how I imagine how the Java team would view this.
The question assumes that the JVM actually runs the delegation code. On many JVMs it won't: calls to Math.sin(), etc. will potentially be replaced by the JIT with intrinsic function code (if suitable), transparently. This is typically done in a way that is unobservable to the end user. It is a common trick for JVM implementers, allowing interesting specializations to happen (even if the method is not tagged as native).
Note, however, that most platforms can't simply drop in a single processor instruction for sin, because the instruction is only sufficiently accurate over limited input ranges (e.g. see: Intel discussion).
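If you want to watch the intrinsic replacement happen yourself, here is a minimal sketch assuming a HotSpot JVM (MathSinDemo is a made-up name). Run it with java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining MathSinDemo and look for java.lang.Math::sin being reported as an intrinsic:
    public class MathSinDemo {
        public static void main(String[] args) {
            double sum = 0;
            // Enough iterations for the loop to get JIT-compiled.
            for (int i = 0; i < 5_000_000; i++) {
                sum += Math.sin(i * 1e-6);
            }
            System.out.println(sum); // use the result so the loop isn't dead code
        }
    }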
The Math API permits non-strict but better-performing implementations of its methods, but does not require them, and by default Math simply uses the StrictMath implementation.

How do I add grain to an image using the ImageJ API

I am new to ImageJ and I am seeking to add grain (as defined here: http://en.wikipedia.org/wiki/Film_grain) to an image using the programmatic API of ImageJ.
Is it possible? If so how?
Where is the relevant documentation/Javadocs regarding adding grain to an image using ImageJ?
I'd start in Process > Noise, described in the ImageJ User Guide: §29.6 Noise. You'll have to decide if the existing implementations can be made to meet your requirements.
Where can I find documentation on how to achieve this using the actual API instead of the UI?
As discussed in the ImageJ Macro Language documentation, one easy way is to start Plugins > Macros > Record and then operate the desired GUI command. This reveals the macro command name and any settings, for example:
run("Add Noise");
run("Add Specified Noise...", "standard=16");
You can apply such a macro to multiple files using the -batch command line option.
If you want to use a feature directly from Java, see ImageJ programming tutorials.
I saw that there was no language tag, so I chose to write an example in Scala. The code below reads the lena.png image twice, creates two ImagePlus objects, and adds noise to one of them.
I am guessing that the API comment refers to the ImageJ software library rather than the ImageJ graphical user interface/program.
An ImagePlus has a processor (of type ij.process.ImageProcessor) that you can get a reference to with the method getProcessor()
(getProcessor() is a method here that acts on the object lenaWithNoise and returns a reference to the current ImageProcessor attached to lenaWithNoise).
The method noise acts on the image that the ImageProcessor handles and has no return value (a void method, or Unit in Scala).
import ij._

object Noise {
  def main(args: Array[String]): Unit = {
    val lenaNoiseFree: ImagePlus = IJ.openImage("src/test/scala/images/lena.png")
    val lenaWithNoise: ImagePlus = IJ.openImage("src/test/scala/images/lena.png")
    lenaNoiseFree.show()
    lenaWithNoise.getProcessor().noise(10.0)
    lenaWithNoise.show()
  }
}

STM32 programming tips and questions

I could not find any good documents on the internet about STM32 programming. ST's own documents do not explain anything beyond register functions. I would greatly appreciate it if anyone could answer the following questions.
I noticed that in all the example programs STM provides, local variables for main() are always defined outside of the main() function (with occasional use of the static keyword). Is there any reason for that? Should I follow a similar practice? Should I avoid using local variables inside main()?
I have a global variable which is updated within the clock interrupt handler. I am using the same variable inside another function as a loop condition. Don't I need to access this variable using some form of atomic read operation? How can I know that a clock interrupt does not change its value in the middle of the function's execution? Should I cancel the clock interrupt every time I need to use this variable inside a function? (That seems extremely inefficient to me, as I use it as a loop condition; I believe there should be better ways of doing it.)
Keil automatically inserts startup code written in assembly (i.e. startup_stm32f4xx.s). This startup code has the following import statements:
    IMPORT SystemInit
    IMPORT __main
.In "C", it makes sense. However, in C++ both main and system_init have different names (e.g. _int_main__void). How can this startup code can still work in C++ even without using "extern "C" " (I tried and it worked). How can the c++ linker (armcc --cpp) can associate these statements with the correct functions?
You can use local or global variables. Using locals in embedded systems carries the risk of your stack colliding with your data; with globals you don't have that problem. But this is true no matter where you are: embedded microcontroller, desktop, etc.
I would make a copy of the global in the foreground task that uses it.
unsigned int myglobal;

void fun ( void )
{
    unsigned int myg;

    myg = myglobal; /* take a snapshot of the shared variable */
    /* ... use only myg from here on ... */
}
and then only use myg for the rest of the function. Basically, you are taking a snapshot and using the snapshot. You would want to do the same thing if you were reading a register: if you want to do multiple things based on a sample of something, take one sample of it and make all decisions on that one sample; otherwise the item can change between samples. If you are using one global to communicate back and forth with the interrupt handler, I would use two variables: one for foreground-to-interrupt, the other for interrupt-to-foreground. Yes, there are times when you need to carefully manage a shared resource like that; normally it has to do with cases where you need to do more than one thing. For example, if you had several items that all need to change as a group before the handler can see them change, then you would need to disable the interrupt handler until all the items have changed. Here again, there is nothing special about embedded microcontrollers; this is all basic stuff you would see on a desktop system with a full-blown operating system.
Keil knows what they are doing; if they support C++, then they have this worked out at a system level. I don't use Keil; I use gcc and llvm for microcontrollers like this one.
Edit:
Here is an example of what I am talking about:
https://github.com/dwelch67/stm32vld/tree/master/stm32f4d/blinker05
It is an stm32 using timer-based interrupts; the interrupt handler modifies a variable shared with the foreground task. The foreground task takes a single snapshot of the shared variable (per loop) and, if need be, uses the snapshot more than once in the loop rather than the shared variable, which can change. This is C, not C++, I understand that, and I am using gcc and llvm, not Keil. (Note: llvm has a known problem optimizing tight while loops, a very old bug; I don't know why they have no interest in fixing it, but llvm works for this example.)
Question 1: Local variables
The sample code provided by ST is not particularly efficient or elegant. It gets the job done, but sometimes there are no good reasons for the things they do.
In general, you always want your variables to have the smallest scope possible. If you only use a variable in one function, define it inside that function. Add the static keyword to local variables if and only if you need them to retain their value after the function is done.
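A tiny sketch of that static-local behaviour (count_calls is a made-up name):
    #include <stdio.h>

    unsigned int count_calls(void)
    {
        /* Initialised once; the value is retained between calls, but the
           variable's scope is still limited to this function. */
        static unsigned int counter = 0;
        return ++counter;
    }

    int main(void)
    {
        count_calls();
        count_calls();
        printf("%u\n", count_calls()); /* prints 3 */
        return 0;
    }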
In some embedded environments, like the PIC18 architecture with the C18 compiler, local variables are much more expensive (more program space, slower execution time) than globals. On the Cortex-M3, that is not true, so you should feel free to use local variables. Check the assembly listing and see for yourself.
Question 2: Sharing variables between interrupts and the main loop
People have written entire chapters explaining the answers to this group of questions. Whenever you share a variable between the main loop and an interrupt, you should definitely use the volatile keyword on it. Variables of 32 or fewer bits can be accessed atomically (unless they are misaligned).
If you need to access a larger variable, or two variables at the same time from the main loop, then you will have to disable the clock interrupt while you are accessing the variables. If your interrupt does not require precise timing, this will not be a problem. When you re-enable the interrupt, it will automatically fire if it needs to.
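A minimal sketch of that guard on a Cortex-M part, assuming the CMSIS intrinsics __disable_irq()/__enable_irq() that ST's standard headers pull in (the variable names are made up; note that __disable_irq() masks all interrupts, which is blunter than disabling just the one timer interrupt, but it keeps the sketch short):
    #include <stdint.h>
    #include "stm32f4xx.h" /* brings in the CMSIS __disable_irq()/__enable_irq() */

    /* 64 bits: too wide for a single atomic access on a 32-bit core. */
    volatile uint64_t shared_ticks;

    uint64_t read_ticks(void)
    {
        uint64_t snapshot;

        __disable_irq(); /* keep the ISR from updating mid-read */
        snapshot = shared_ticks;
        __enable_irq();

        return snapshot;
    }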
Question 3: main function in C++
I'm not sure. You can use arm-none-eabi-nm (or whatever nm is called in your toolchain) on your object file to see what symbol name the C++ compiler assigns to main(). I would bet that C++ compilers refrain from mangling the main function for this exact reason, but I'm not sure.
STM's sample code is not an exemplar of good coding practice; it is merely intended to exemplify the use of their standard peripheral library (assuming those are the examples you are talking about). In some cases it may be that variables are declared external to main() because they are accessed from an interrupt context (shared memory). There is also perhaps a possibility that it was done that way merely to allow the variables to be watched in the debugger from any context; but that is not a reason to copy the technique. My opinion of STM's example code is that it is generally pretty poor even as example code, let alone from a software engineering point of view.
In this case your clock-interrupt variable is atomic so long as it is 32 bits or less and you are not using read-modify-write semantics with multiple writers. You can safely have one writer and multiple readers regardless. This is true for this particular platform, but not necessarily universally; the answer may be different for 8- or 16-bit systems, or for multi-core systems, for example. The variable should be declared volatile in any case.
I am using C++ on STM32 with Keil, and there is no problem. I am not sure why you think that the C++ entry points are different; they are not here (Keil ARM-MDK v4.22a). The start-up code calls SystemInit(), which initialises the PLL and memory timing, for example, and then calls __main(), which performs global static initialisation and calls the C++ constructors for global static objects before calling main(). If in doubt, step through the code in the debugger. It is important to note that __main() is not the main() function you write for your application; it is a wrapper with different behaviour for C and C++ which ultimately calls your main() function.