Equivalent OpenGL ES 2.0 Method to void glBindFragDataLocation(GLuint program, GLuint colorNumber, const char * name);

The online documentation at http://www.khronos.org/opengles/sdk/docs/man/ gives no reference for the glBindFragDataLocation(GLuint program, GLuint colorNumber, const char * name); function. What is the equivalent in OpenGL ES 2.0?

There is no equivalent; read below.
OpenGL ES 2.0 does not allow a fragment shader to emit more than one output: you write either to gl_FragColor or to gl_FragData[0]. This is one of the things that makes deferred shading really slow with plain OpenGL ES 2.0, since you cannot render to multiple targets.
If you are on Tegra, you can slightly change your program to emit to gl_FragData[i] using the NV_draw_buffers extension, but you still cannot use user-defined out variables; gl_FragData[i] is the only output variable that can write to different attachments.
That being said, and trying to answer your question: you need to change your fragment shader to use gl_FragColor or gl_FragData[0], as there are no user-defined out variables.
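For illustration, here is a minimal sketch of the ES 2.0 approach, with the shader source held in a C++ string (the uniform and varying names are made up for this example):

// ES 2.0 fragment shader: the single output is the built-in gl_FragColor;
// there is no user-defined out variable to bind with glBindFragDataLocation.
const char* fragmentSrc =
    "precision mediump float;\n"
    "uniform sampler2D uTexture;\n"  // illustrative name
    "varying vec2 vTexCoord;\n"      // illustrative name
    "void main()\n"
    "{\n"
    "    gl_FragColor = texture2D(uTexture, vTexCoord);\n"
    "}\n";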

Related

What are the use cases of sparse VkDescriptorSetLayoutBinding?

I have trouble figuring out any use case for VkDescriptorSetLayoutBinding::binding. Here is the struct:
struct VkDescriptorSetLayoutBinding
{
    uint32_t              binding;
    VkDescriptorType      descriptorType;
    uint32_t              descriptorCount;
    VkShaderStageFlags    stageFlags;
    const VkSampler*      pImmutableSamplers;
};
used here to create a VkDescriptorSetLayout:
struct VkDescriptorSetLayoutCreateInfo
{
    VkStructureType                        sType;
    const void*                            pNext;
    VkDescriptorSetLayoutCreateFlags       flags;
    uint32_t                               bindingCount;
    const VkDescriptorSetLayoutBinding*    pBindings;
};
I was wondering why the "binding" member is not simply deduced from the index in the pBindings array.
After some research I found that the Vulkan spec says:
The above layout definition allows the descriptor bindings to be specified sparsely such that not all binding numbers between 0 and the maximum binding number need to be specified in the pBindings array. Bindings that are not specified have a descriptorCount and stageFlags of zero, and the value of descriptorType is undefined. However, all binding numbers between 0 and the maximum binding number in the VkDescriptorSetLayoutCreateInfo::pBindings array may consume memory in the descriptor set layout even if not all descriptor bindings are used, though it should not consume additional memory from the descriptor pool.
I can't see in which case you would use those sparse bindings. Why would you leave an empty, unused slot?
Binding indices are hard-coded into shaders (you can define binding indices via specialization constants, but otherwise, they're part of the shader code). So let's imagine that you have the code for a shader stage, and you want to use it in two different pipelines (A and B). And let's say that the descriptor set layouts for these pipelines are not meant to be compatible; we just want to reuse the shader.
Well, the binding indices in your shader didn't change; they can't change. So if this shader has a UBO in binding 3 of set 0, then any descriptor set layout it gets used with must have a UBO in binding 3 of set 0.
Maybe in pipeline A, some shader other than the one we reuse might use bindings 0, 1, and 2 from set 0. But what if none of the other shaders for pipeline B need binding index 2? Maybe the fragment shader in pipeline A used 3 descriptor resources, but the one in pipeline B only needs 2.
Having sparse descriptor bindings allows you to reuse compiled shader modules without having to reassign the binding indices within a shader. Oh yes, you have to make sure that all such shaders are compatible with each other (that they don't use the same set+binding index in different ways), but other than that, you can mix and match freely.
And it should be noted that contiguous bindings have almost never been a requirement of any API. In OpenGL, your shader pipeline could use texture units 2, 40, and 32, and that's 100% fine.
Why should it be different for Vulkan, just because its resource binding model is more abstract?
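As a minimal sketch of what a sparse layout looks like in practice (the device handle is assumed to exist, and the choice of bindings 0 and 3 is just for illustration):

#include <vulkan/vulkan.h>

// Declare only bindings 0 and 3; bindings 1 and 2 are simply omitted.
VkDescriptorSetLayout makeSparseLayout(VkDevice device) // hypothetical helper
{
    VkDescriptorSetLayoutBinding bindings[2] = {};

    bindings[0].binding         = 0; // sampler at binding 0
    bindings[0].descriptorType  = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
    bindings[0].descriptorCount = 1;
    bindings[0].stageFlags      = VK_SHADER_STAGE_FRAGMENT_BIT;

    bindings[1].binding         = 3; // UBO at binding 3, as in the example above
    bindings[1].descriptorType  = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
    bindings[1].descriptorCount = 1;
    bindings[1].stageFlags      = VK_SHADER_STAGE_VERTEX_BIT;

    VkDescriptorSetLayoutCreateInfo info = {};
    info.sType        = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
    info.bindingCount = 2; // two entries, even though the maximum binding number is 3
    info.pBindings    = bindings;

    VkDescriptorSetLayout layout = VK_NULL_HANDLE;
    vkCreateDescriptorSetLayout(device, &info, NULL, &layout);
    return layout;
}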

Code sharing between multiple independently compiled binaries/hex files

I'm looking for documentation/information on how to share information/code between multiple binaries compiled for Cortex-M0/M4/M7 architectures. The two binaries will be on the same chip and the same architecture. They are flashed at different locations, and one binary sets the main stack pointer and resets the program counter so that it "jumps" to the other binary. I want to share code between these two binaries.
I've done a simple proof of concept: copy an array of function pointers into a RAM section defined in the linker script, then in the other binary read that RAM back, cast it to an array, and call functions in the first binary by index. This works, but I think what I'm looking for is a bit more involved, since I want some way of describing compatibility between the two binaries. I essentially want the functionality of shared libraries, but I'm unsure whether I need position-independent code.
As an example, the current copy process is basically:
Source binary:
void copy_func()
{
    /* copy the function pointer table into the shared RAM section */
    memcpy(address_custom_ram_section, array_of_function_pointers, fixed_size);
}
Binary which is jumped to from the source binary:
array_fp_type get_funcs()
{
    /* read the table back out of the shared RAM section */
    memcpy(array_of_fp, address_custom_ram_section, fixed_size);
    return array_of_fp;
}
Then I can use the array_of_fp to call into functions residing in the source binary from the jump binary.
So what I'm looking for is resources or input from someone who has implemented a similar system. For example, I would like to avoid needing a custom RAM section to copy the function pointers into.
I would be fine with the compilation step of the source binary outputting something that can be included in the compilation step of the jump binary. However, it needs to be reproducible: recompiling the source binary shouldn't break compatibility with the jump binary (even if it included a different file from what is now output), as long as you don't change the interface.
To clarify: the source binary shouldn't require any specific knowledge about the jump binary. The code should not reside in both binaries, as that would defeat the purpose of this mechanism. The overall goal of this mechanism is to save space when creating multi-binary applications on Cortex-M processors.
Any ideas or links to resources are welcome. If you have any more questions feel free to comment on the question and I'll try to answer it.
It's very hard for me to picture what you want to do, but if you're interested in having an application link against your bootloader/ROM, then see Loading symbol file while linking for a hint on what you could do.
Build your "source"(?) image, scrape its mapfile and make a symbol file, then use that when you link your "jump"(?) image.
This does mean you need to link your "jump" image against a specific version of your "source" image.
If you need them to be semi-version independent (i.e. you define a set of functions that get exported, but you can rebuild on either side), then you need to export function pointers at known locations in your "source" image and link against those function pointers in your "jump" image. You can simplify the bookkeeping by making a structure of function pointers and accessing the functions through that on either side.
For example:
shared_functions.h:
struct FunctionPointerTable
{
    void (*function1)(int);
    void (*function2)(char);
};

extern struct FunctionPointerTable sharedFunctions;
Source file in "source" image:
#include <stdio.h>
#include "shared_functions.h"

void function2Implementation(char b);

void function1Implementation(int a)
{
    printf("You sent me an integer: %d\r\n", a);
    function2Implementation((char)(a % 256));   /* direct call: always the "source" version */
    sharedFunctions.function2((char)(a % 256)); /* through the table: overridable */
}

void function2Implementation(char b)
{
    printf("You sent me a char: %c\r\n", b);
}

struct FunctionPointerTable sharedFunctions =
{
    function1Implementation,
    function2Implementation,
};
Source file in "jump" image:
#include "shared_functions.h"

void useSharedFunctions(void) /* wrapper added so the calls are valid C */
{
    sharedFunctions.function1(1024);
    sharedFunctions.function2(100);
}
When you compile/link the "source" image, take its mapfile, extract the location of sharedFunctions, and create a symbol file that is linked with the source of the "jump" image.
Note: the printfs (or anything directly called by the shared functions) would come from the "source" image (and not the "jump" image).
If you need them to come from the "jump" image (or be overridable), then you need to access them through the same function pointer table, and the "jump" image needs to fix the function pointer table up with its version of the relevant function. I updated function1() to show this. The direct call to function2 will always be the "source" version. The shared function call version of it will go through the table and call the "source" version unless the "jump" image updates the function table to point to its own implementation.
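For instance, a hypothetical override in the "jump" image could look like the following (this assumes the table lives in RAM so it can be written at run time; the names are made up):

/* the "jump" image's own implementation of function2 */
void jumpFunction2Implementation(char b)
{
    /* ... "jump" version ... */
}

/* call once at startup to route table calls to the "jump" version */
void overrideSharedFunctions(void)
{
    sharedFunctions.function2 = jumpFunction2Implementation;
}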
You CAN get away from the structure, but then you need to export the function pointers one by one (not a big problem); however, you want to keep them in order and at a fixed location, which means explicitly putting them in the linker descriptor file, etc. I showed the structure method to distill it down to the easiest example.
As you can see, things get pretty hairy, and there is some penalty (calling through a function pointer is slower, because you need to load up the address to jump to).
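As a concrete sketch of that symbol-file step with GNU tools: scrape the address of sharedFunctions from the "source" image's mapfile and define it on the "jump" image's link line (the address and file names here are illustrative):

arm-none-eabi-gcc jump_main.o -o jump.elf -Wl,--defsym=sharedFunctions=0x08001234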
As explained in the comments, we could imagine an application and a bootloader relying on the same dynamic library: both depend on the library, and the application can be changed without impact on the library or the bootloader.
I did not find an easy way to build a shared library with arm-none-eabi-gcc. However, this document gives some alternatives to shared libraries. In your case, I would recommend the jump table solution.
Write a library with the functions that need to be used in the bootloader and in the application.
"library" code
#include <stdint.h>

typedef void (*genericFunctionPointer)(void);

void lib_f1(void);
uint8_t lib_f2(uint8_t param);

/* Use the linker script to place MySection at a known address.
   This could be a structure, as in Russ Schultz's solution, but a struct may
   or may not be laid out identically in the lib and in the boot image.
   A struct would be much easier, though, and would avoid many function
   pointer casts. */
const genericFunctionPointer FpointerArray[] __attribute__ ((section ("MySection"))) =
{
    (genericFunctionPointer)lib_f1,
    (genericFunctionPointer)lib_f2,
};

void lib_f1(void)
{
    /* some code */
}

uint8_t lib_f2(uint8_t param)
{
    /* some code */
    return param;
}
application and/or bootloader code
#include <stdint.h>

typedef void (*genericFunctionPointer)(void);
/* the correctly typed signatures used for the casts below */
typedef void (*correctCastF1)(void);
typedef uint8_t (*correctCastF2)(uint8_t);

enum
{
    lib_f1,
    lib_f2,
    NB_F,
};

/* Use the linker script to place MySection at the same address the library
   was compiled with. In the linker script, also mark this section as NOLOAD,
   because it is initialized by the library and not by our code.
   volatile is needed here because you read from flash memory and the
   compiler might otherwise assume this array contains NULL pointers. */
volatile const genericFunctionPointer FpointerArray[NB_F] __attribute__ ((section ("MySection")));

int main(void)
{
    ((correctCastF1)FpointerArray[lib_f1])();
    uint8_t a = ((correctCastF2)FpointerArray[lib_f2])(10);
    (void)a;
    return 0;
}
You can look into using linker sections. If you have your bootloader source code in folder bootloader, you can use
SECTIONS
{
    .bootloader :
    {
        build_output/bootloader/*.o(.text)
    } > flash_region1

    .binary1 :
    {
        build_output/binary1/*.o(.text)
    } > flash_region2

    .binary2 :
    {
        build_output/binary2/*.o(.text)
    } > flash_region3
}
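For completeness, a minimal sketch of the "jump" the question describes, assuming the other image's vector table sits at a known flash address (the address is illustrative):

#include <stdint.h>

#define OTHER_IMAGE_BASE 0x08020000u /* illustrative flash address */

static void jump_to_image(uint32_t base)
{
    uint32_t sp = ((const volatile uint32_t *)base)[0]; /* word 0: initial main stack pointer */
    uint32_t pc = ((const volatile uint32_t *)base)[1]; /* word 1: reset handler address */
    __asm volatile ("msr msp, %0" : : "r" (sp));        /* set the main stack pointer */
    ((void (*)(void))pc)();                             /* branch to the reset handler */
}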

Poor performance with C++/CLI interop with native headers

In an attempt to implement wrappers for parts of the CGAL library I have run into some performance issues, some of which seem to be linked to C++/CLI interop. In an effort to identify the issue I have created a simple example, making a few calls to some of the parts of CGAL I want to wrap. I don't think the specific calls are super important to the issue, but I included them for reference. My program looks as follows:
std::ifstream stream1("Sphere1.OFF");
std::ifstream stream2("Sphere2.OFF");
Polyhedron P1 = Polyhedron();
Polyhedron P2 = Polyhedron();
stream1 >> P1;
stream2 >> P2;
Nef_polyhedron N1(P1);
Nef_polyhedron N2(P2);
Nef_polyhedron N3 = N1 + N2;
Polyhedron and Nef_polyhedron are CGAL types defined as:
typedef CGAL::Simple_cartesian<CGAL::Lazy_exact_nt<CGAL::Gmpq> > Kernel;
typedef CGAL::Nef_polyhedron_3<Kernel> Nef_polyhedron;
typedef CGAL::Polyhedron_3<Kernel> Polyhedron;
If I compile the program without CLR support, it runs in around 10 seconds. This is not ideal, but it will have to do for now. The problem is that if I enable CLR support, the time to run this small sample program triples to over 30 seconds.
I suspect the issue has something to do with CGAL being largely implemented header-only. Thus, Kernel, Polyhedron and Nef_polyhedron are compiled with CLR support, making it unpredictable how often interop calls take place. The idea was that the Nef_polyhedron N3 = N1 + N2; call should be a single interop call into native code, to keep the overhead down.
I tried using #pragma managed(push, off) to force native compilation of the headers, but this just doubled the execution time again, to around 70 seconds.
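The pragma was wrapped around the native includes roughly like this (the exact header set is assumed from the typedefs above):

#pragma managed(push, off) // compile the CGAL headers as native code
#include <CGAL/Simple_cartesian.h>
#include <CGAL/Lazy_exact_nt.h>
#include <CGAL/Gmpq.h>
#include <CGAL/Polyhedron_3.h>
#include <CGAL/Nef_polyhedron_3.h>
#pragma managed(pop)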
Why am I getting such a big performance hit with CLR? Might it have something to do with marshalling of the Polyhedron and Nef_polyhedron types? And if so, would it help to specify the marshalling for these types?

Compile-time information in CUDA

I'm optimizing a very time-critical CUDA kernel. My application accepts a wide range of switches that affect its behavior (for instance, whether to use a 3rd- or 5th-order derivative). Consider as an approximation a set of 50 switches, where every switch is an integer variable (sometimes a bool, or a float, but that case is not so relevant for this question).
All these switches are constant during the execution of the application. Most of these switches are set at run time, and I store them in constant memory so as to exploit the caching mechanism. Some other switches can be set at compile time, and the customer is fine with having to recompile the application if he wants to change the value of a switch. A very simple example could be:
__global__ void mykernel(const float* in, float *out)
{
    for ( /* many many times */ )
        if (compile_time_switch)
            do_this(in, out);
        else
            do_that(in, out);
}
Assume that do_this and do_that are compute-bound and very cheap, that I optimize the for loop so that its overhead is negligible, and that I have to place the if inside the iteration. If the compiler recognizes that compile_time_switch is static information, it can optimize out the call to the "wrong" function and create code that is just as optimized as if the if weren't there. Now the real question:
In which ways can I provide the compiler with the static value of this switch? I see two such ways, listed below, but none of them work for me. What other possibilities remain?
Template parameters
Providing a template parameter enables this static optimization.
template<int compile_time_switch>
__global__ void mykernel(const float* in, float *out)
{
    for ( /* many many times */ )
        if (compile_time_switch)
            do_this(in, out);
        else
            do_that(in, out);
}
This simple solution does not work for me, since I don't have direct access to the code that calls the kernel.
Static members
Consider the following struct:
struct GlobalParameters
{
    static const bool compile_time_switch = true;
};
Now GlobalParameters::compile_time_switch contains the static information as I want it, and the compiler would be able to optimize the kernel. Unfortunately, CUDA does not support such static members.
EDIT: the last statement is apparently wrong. The definition of the struct is of course legitimate, and you are able to use the static member GlobalParameters::compile_time_switch in device code. The compiler inlines the variable, so that the final code directly contains the value, not a run-time variable access, which is the behavior you would expect from an optimizing compiler. So the second option is actually suitable.
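A minimal sketch of that usage, reusing the names from the example above:

struct GlobalParameters
{
    static const bool compile_time_switch = true;
};

__global__ void mykernel(const float* in, float *out)
{
    // the compiler folds the static member to a constant,
    // so the false branch is removed as dead code
    if (GlobalParameters::compile_time_switch)
        do_this(in, out);
    else
        do_that(in, out);
}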
I consider my problem solved thanks both to this fact and to kronos' answer. However, I'm still looking for alternative methods of providing compile-time information to the compiler.
Your third option is preprocessor definitions:
#define compile_time_switch 1

__global__ void mykernel(const float* in, float *out)
{
    for ( /* many many times */ )
        if (compile_time_switch)
            do_this(in, out);
        else
            do_that(in, out);
}
The preprocessor replaces compile_time_switch with 1 everywhere, so the compiler's dead code elimination pass trivially removes the else branch; there is effectively no dead code left to worry about.
Furthermore, you can specify the definition with the -D command line switch, and (I think) every compiler supported by nvcc accepts -D (MSVC may use a different switch).
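For example, assuming the kernel lives in a file called mykernel.cu (the file names are illustrative):

nvcc -D compile_time_switch=1 -o app mykernel.cu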

How to pass a struct parameter using TCOM in Tcl

I've inherited a piece of custom test equipment with a control library built as a COM object, and I'm trying to connect it to our Tcl test script library. I can connect to the DLL using TCOM and do some simple control operations with single int parameters. However, certain features are controlled by passing in a C/C++ struct that contains the control blocks, and attempting to use them in TCOM gives me the error 0x80020005 {Type mismatch.}. The struct is defined in the .idl file, so it's available for TCOM to use.
The simplest example is a particular call as follows:
C++ .idl file:
struct SourceScaleRange
{
    float MinVoltage;
    float MaxVoltage;
};

interface IAnalogIn : IDispatch {
    ...
    [id(4), helpstring("method GetAdcScaleRange")] HRESULT GetAdcScaleRange(
        [out] struct SourceScaleRange *scaleRange);
    ...
}
Tcl wrapper:
::tcom::import [file join $::libDir "PulseMeas.tlb"] ::char
set ::characterizer(AnalogIn) [::char::AnalogIn]
set scaleRange ""
set response [$::characterizer(AnalogIn) GetAdcScaleRange scaleRange]
Resulting error:
0x80020005 {Type mismatch.}
while executing
"$::characterizer(AnalogIn) GetAdcScaleRange scaleRange"
(procedure "charGetAdcScaleRange" line 4)
When I dump TCOM's methods, it at least knows the name of the struct, but it seems to have dropped the struct keyword. Some introspection code
set ifhandle [::tcom::info interface $::characterizer(AnalogIn)]
puts "methods: [$ifhandle methods]"
returns
methods: ... {4 VOID GetAdcScaleRange {{out {SourceScaleRange *} scaleRange}}} ...
I don't know if this is meaningful or not.
At this point, I'd be happy to get any ideas on where to look next. Is this a known TCOM limitation (undocumented, but known)? Is there a way to pre-process the parameter into an appropriate format using tcom? Do I need to force it into a correctly sized block of memory via binary format by manual construction? Do I need to take the DLL back to the original developer and have him pull out all the struct parameters? (Not likely to happen, in this reality.) Any input is good input.