Variable Name Efficiency in Shader (OpenGL ES 2) - opengl-es-2.0

Out of curiosity, will it be more efficient to write shader variables like this:
lowp vec4 tC = texture2D(uTexture, vTexCoord); // texture color
or
lowp vec4 textureColor = texture2D(uTexture, vTexCoord); // texture color
Note that I named the variable tC because it has fewer characters than textureColor.
I understand that in programming languages like C/Obj-C it doesn't matter, but what about shaders, since you can query the attribute/uniform names?

It shouldn't make a measurable difference. After linking your program during initialization, query the locations of attributes/uniforms, and keep the result around with the program handle. From then on, neither your app nor the driver will be touching the name strings, just the integer locations.
Even if you re-query locations every time you need to change an attrib binding or uniform value, the difference between a short and a moderately long name is unlikely to matter compared to the other costs of doing the lookup and the binding/value change.
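To make that concrete, here is a minimal host-side sketch of the cache-the-locations pattern; the attribute name aTexCoord and the helper functions are assumptions for illustration, not from the original code:
#include <GLES2/gl2.h>

// Locations are looked up once, right after glLinkProgram succeeds.
struct ProgramLocations {
    GLint uTexture;   // sampler uniform from the question
    GLint aTexCoord;  // vertex attribute feeding vTexCoord (name assumed)
};

static ProgramLocations cacheLocations(GLuint program) {
    ProgramLocations loc;
    loc.uTexture  = glGetUniformLocation(program, "uTexture");
    loc.aTexCoord = glGetAttribLocation(program, "aTexCoord");
    return loc;
}

// Per draw, only the cached integer locations are used, so the length of
// the name strings never enters the picture again.
static void bindForDraw(GLuint program, const ProgramLocations& loc) {
    glUseProgram(program);
    glUniform1i(loc.uTexture, 0);            // sample from texture unit 0
    glEnableVertexAttribArray(loc.aTexCoord);
}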

Related

Can I overlap two framebuffer attachments outputs in a fragment shader?

Right now I am writing out to a colour buffer in the fragment shader, which uses a float format.
layout (location = 0) out vec4 outColour;
I need to have a way to write the object's id to a framebuffer for picking. There are a number of ways I've thought about doing this. I could compile two versions of each shader: a normal one, and a picking one which basically only needs to do the vertex position transformations and skip everything else (lighting calculations, texturing, etc.). This probably isn't ideal because it essentially doubles the number of shaders I have to write.
An easier method I've thought of is to do a conditional branch (preferably on a specialisation constant), and for picking purposes compile a picking version of the graphics pipeline with the picking boolean set to true. This sounds better. For the ordinary passes I can write to multiple attachments. Would it be best to compile that picking pipeline with a new render pass that writes to only one framebuffer attachment, an integer one? If I swap the render pass for one that writes an integer at attachment 0 instead of the float4, can I alias this in the fragment shader?
layout (location = 0) out vec4 outColour;
layout (location = 0) out ivec4 out_id;
void main()
{
    vec4 colour;
    int object_id;
    if (bPicking) // bPicking: the proposed specialisation constant
        out_id = ivec4(object_id, 0, 0, 0); // y, z, w not used
    else
        outColour = colour;
}
I'm guessing I really need a different render pass, because instead of writing to an R32G32B32A32_SFLOAT image I'm writing to an R8_UINT image for the IDs. This is really confusing; what's the best way to do this?

Specifying push constant block offset in HLSL

I am trying to write a Vulkan renderer; I use glslangValidator with HLSL for shaders, and am trying to implement push constants.
[[vk::push_constant]]
cbuffer cbFragment {
    float4 imageColor;
    float4 aaaa;
};

[[vk::push_constant]]
cbuffer cbMatrices {
    float4 bbbb;
};
The annotation "[[vk::push_constant]]" works, I use spirv_reflect for reflection and both push constants show up and they work as intended.
The problem I'm having is that they seemingly overlap, if I assign "bbbb" a value, "imageColor" is affected in exactly the same way and vice versa. In the reflection data both push constant blocks have the offset 0, which explains the issue. However, I seem to be completely unable to change the offset of either of the push constants.
[[vk::offset(x)]] does not work at all, it neither affects the individual member offsets nor the offset of the push constants. The only offset that works at all is HLSL's built in "packoffset", which only applies to the buffer members. And although it might actually be a solution to just offset the members of one of the push constants to be outside the range of the other, I hardly believe that can be a sensible solution as it's also causing the validation layer to fail because offsetting the individual member simply increases the size of the push constant unnecessarily and the overlap itself is still present.
I would greatly appreciate any help on this matter and am willing to provide any necessary clarification, thank you very much!
Push constants live in a single chunk of contiguous memory. The compiler doesn't try to append multiple blocks into that memory; like with the GLSL syntax, it's intended to just have one block containing all the push constant data.
This is consistent with other places where the compiler has to pack variables in a block: it only packs within a block, not across multiple blocks. Two separate non-pushconstant cbuffers would refer to two distinct buffers in memory, with contents that begin at offset zero within their individual buffer. There's only one "push constant buffer", hence you should only decorate one cbuffer with vk::push_constant.
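To illustrate what "a single chunk of contiguous memory" means on the host side, here is a rough sketch; the merged struct, the stage flag and the helper names are assumptions for the example, not taken from the question's real pipeline:
#include <vulkan/vulkan.h>

// Host-side mirror of one merged push constant block. Member names come
// from the question; merging them into a single block is exactly what the
// shader side should do too, since there is only one push constant buffer.
struct PushData {
    float imageColor[4]; // offset 0
    float aaaa[4];       // offset 16
    float bbbb[4];       // offset 32
};

// A single range covering the whole block.
VkPushConstantRange makePushConstantRange() {
    VkPushConstantRange range{};
    range.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT; // assumed stage
    range.offset     = 0;
    range.size       = sizeof(PushData);
    return range;
}

// All of the data lands in the same contiguous push constant memory,
// uploaded with one call.
void uploadPushData(VkCommandBuffer cmd, VkPipelineLayout layout, const PushData& data) {
    vkCmdPushConstants(cmd, layout, VK_SHADER_STAGE_FRAGMENT_BIT,
                       0, sizeof(PushData), &data);
}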

Compiling GLSL written for OpenGL ES versions to Vulkan

My question is similar to this one, but part of the (useful) answer given there isn't compatible with compiling GLSL for Vulkan based on OpenGL ES ESSL 3.10.
In order to use a separate section of the push constant memory in the vertex shader and the fragment shader, the suggested solution is to use layout(offset = #) before the first member of the push constant structure.
Attempting to do this in GLSL ES 310 code leads to the error "'offset on block member': not supported with this profile: es".
Is there a supported way to declare such an offset that is compatible with es?
The only workaround I've found is to declare a bunch of dummy variables in the fragment shader. When I do so, I get validation layer errors if I don't declare the full range of the fragment shader's push constant buffer in VkPipelineLayoutCreateInfo. After fixing that, I get validation layer warnings about "vkCreatePipelineLayout() call has push constants with overlapping ranges".
Obviously I can ignore warnings, but if there's a tidier solution, then that would be much more preferable.
Simple example: this compiles successfully with VulkanSDK\1.0.13.0\Bin\glslangValidator.exe:
#version 430
#extension GL_ARB_enhanced_layouts: enable
layout(std140, push_constant) uniform PushConstants
{
    layout(offset = 64) mat4 matWorldViewProj;
} ubuf;
layout(location = 0) in vec4 i_Position;
void main() {
    gl_Position = ubuf.matWorldViewProj * i_Position;
}
Whereas this does not:
#version 310 es
#extension GL_ARB_enhanced_layouts: enable
layout(std140, push_constant) uniform PushConstants
{
    layout(offset = 64) mat4 matWorldViewProj;
} ubuf;
layout(location = 0) in vec4 i_Position;
void main() {
    gl_Position = ubuf.matWorldViewProj * i_Position;
}
Converting all my 310 ES shader code to 430 would solve my problem, but that wouldn't be ideal. GL_ARB_enhanced_layouts doesn't apply to 310 ES code, so my question is not about why it doesn't work, but rather, do I have any options in ES to achieve the same goal?
I would consider this an error in the GLSL compiler.
What's happening is this. There are some things which compiling GLSL for Vulkan adds to the language, as defined by KHR_vulkan_glsl. The push_constant layout, for example, is explicitly added to GLSL syntax.
However, there are certain things which it does not add to the language. Important to your use case is the ability to apply offsets to members of uniform blocks. Oh yes, KHR_vulkan_glsl uses that information when building the shader's block layout. But the grammar that allows you to say layout(offset=#) is defined by GLSL, not by KHR_vulkan_glsl.
And that grammar is not a part of any version of GLSL-ES. Nor is it provided by any ES extension I am aware of. So you can't use it.
I would say that the reference compiler should, when compiling a shader for Vulkan, either fail to compile any GLSL-ES-based version, or silently ignore any version and extension declarations, and just assume desktop GLSL 4.50.
As for what you can do about it... nothing. Short of hacking that solution into the compiler yourself, your primary solution is to write your code against versions of desktop OpenGL. Like 4.50.
If you compile SPIR-V for Vulkan there is a "VULKAN" define set in your shaders (see GL_KHR_vulkan_glsl), so you could do something like this:
#ifdef VULKAN
layout(push_constant) uniform pushConstants {
    layout(offset = 16) vec4 pos; // member offsets for a vec4 must be multiples of 16
} pushConstBlock;
#else
// GLES stuff
#endif
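For completeness, a rough host-side sketch of the pipeline layout such an offset has to line up with; the sizes, offsets and stage flags here are illustrative assumptions mirroring the offset = 64 example above, not the asker's actual setup:
#include <vulkan/vulkan.h>
#include <array>

// Two non-overlapping push constant ranges: the fragment stage owns
// bytes [0, 64) and the vertex stage owns one mat4 starting at byte 64,
// matching layout(offset = 64) in the vertex shader.
std::array<VkPushConstantRange, 2> makePushConstantRanges() {
    VkPushConstantRange fragRange{};
    fragRange.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT;
    fragRange.offset     = 0;
    fragRange.size       = 64;

    VkPushConstantRange vertRange{};
    vertRange.stageFlags = VK_SHADER_STAGE_VERTEX_BIT;
    vertRange.offset     = 64;                 // matches layout(offset = 64)
    vertRange.size       = sizeof(float) * 16; // one mat4

    std::array<VkPushConstantRange, 2> ranges = { fragRange, vertRange };
    return ranges;
}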

How do I access an integer array within a struct/class from in-line assembly (blackfin dialect) using gcc?

Not very familiar with in-line assembly to begin with, and much less with that of the Blackfin processor. I am in the process of migrating a legacy C application over to C++, and ran into a problem this morning regarding the following routine:
//
void clear_buffer ( short * buffer, int len ) {
    __asm__ (
        "/* clear_buffer */\n\t"
        "LSETUP (1f, 1f) LC0=%1;\n"
        "1:\n\t"
        "W [%0++] = %2;"
        :: "a" ( buffer ), "a" ( len ), "d" ( 0 )
        : "memory", "LC0", "LT0", "LB0"
    );
}
I have a class that contains an array of shorts that is used for audio processing:
class AudProc
{
    enum { buffer_size = 512 };
    short M_samples[ buffer_size * 2 ];
    // remaining part of class omitted for brevity
};
Within the AudProc class I have a method that calls clear_buffer, passing it the samples array:
clear_buffer ( M_samples, sizeof ( M_samples ) / 2 );
This generates a "Bus Error" and aborts the application.
I have tried making the array public, and that produces the same result. I have also tried making it static; that allows the call to go through without error, but no longer allows for multiple instances of my class as each needs its own buffer to work with. Now, my first thought is, it has something to do with where the buffer is in memory, or from where it is being accessed. Does something need to be changed in the in-line assembly to make this work, or in the way it is being called?
I thought this was similar to what I was trying to accomplish, but it uses a different dialect of asm, and I can't figure out whether it is the same problem I am experiencing or not:
GCC extended asm, struct element offset encoding
Anyone know why this is occurring and how to correct it?
Does anyone know where there is helpful documentation regarding the blackfin asm instruction set? I've tried looking on the ADSP site, but to no avail.
I would suggest defining your clear_buffer (with <string.h> included for memset) as
inline void clear_buffer (short * buffer, int len) {
    memset (buffer, 0, sizeof(short) * len);
}
and GCC is probably able to optimize that cleverly (when invoked with -O2 or -O3), because GCC knows about memset.
To understand assembly code, I suggest running gcc -S -O -fverbose-asm on some small C file, then looking inside the produced .s file.
I'll hazard a guess, because I don't know Blackfin assembler:
That LC0 sounds like "loop counter", and LSETUP looks like a macro/insn which, well, sets up a loop between two labels with a certain loop counter.
The "%0" operand is apparently the address to write to, and we can safely guess it's incremented in the loop; in other words, it's both an input and an output operand and should be described as such.
Thus, I suggest describing it as an input-output operand, using the "+" constraint modifier, as follows:
void clear_buffer ( short * buffer, int len ) {
    __asm__ (
        "/* clear_buffer */\n\t"
        "LSETUP (1f, 1f) LC0=%1;\n"
        "1:\n\t"
        "W [%0++] = %2;"
        : "+a" ( buffer )
        : "a" ( len ), "d" ( 0 )
        : "memory", "LC0", "LT0", "LB0"
    );
}
This is, of course, just a hypothesis, but you could disassemble the code and check if by any chance GCC allocated the same register for "%0" and "%2".
PS. Actually, just "+a" should be enough; early-clobber is irrelevant here.
For anyone else who runs into a similar circumstance, the problem here was not with the in-line assembly, nor with the way it was being called: it was with the classes / structs in the program. The class that I believed to be the offender was not the problem - there was another class that held an instance of it, and due to other members of that outer class, the inner one was not aligned on a word boundary. This was causing the "Bus Error" that I was experiencing. I had not come across this before because the classes were not declared with __attribute__((packed)) in other code, but they are in my implementation.
Giving Type Attributes - Using the GNU Compiler Collection (GCC) a read was what actually sparked the answer for me. Two particular attributes that affect memory alignment (and, thus, in-line assembly such as I am using) are packed and aligned.
As taken from the aforementioned link:
aligned (alignment)
This attribute specifies a minimum alignment (in bytes) for variables of the specified type. For example, the declarations:
struct S { short f[3]; } __attribute__ ((aligned (8)));
typedef int more_aligned_int __attribute__ ((aligned (8)));
force the compiler to ensure (as far as it can) that each variable whose type is struct S or more_aligned_int is allocated and aligned at least on a 8-byte boundary. On a SPARC, having all variables of type struct S aligned to 8-byte boundaries allows the compiler to use the ldd and std (doubleword load and store) instructions when copying one variable of type struct S to another, thus improving run-time efficiency.
Note that the alignment of any given struct or union type is required by the ISO C standard to be at least a perfect multiple of the lowest common multiple of the alignments of all of the members of the struct or union in question. This means that you can effectively adjust the alignment of a struct or union type by attaching an aligned attribute to any one of the members of such a type, but the notation illustrated in the example above is a more obvious, intuitive, and readable way to request the compiler to adjust the alignment of an entire struct or union type.
As in the preceding example, you can explicitly specify the alignment (in bytes) that you wish the compiler to use for a given struct or union type. Alternatively, you can leave out the alignment factor and just ask the compiler to align a type to the maximum useful alignment for the target machine you are compiling for. For example, you could write:
struct S { short f[3]; } __attribute__ ((aligned));
Whenever you leave out the alignment factor in an aligned attribute specification, the compiler automatically sets the alignment for the type to the largest alignment that is ever used for any data type on the target machine you are compiling for. Doing this can often make copy operations more efficient, because the compiler can use whatever instructions copy the biggest chunks of memory when performing copies to or from the variables that have types that you have aligned this way.
In the example above, if the size of each short is 2 bytes, then the size of the entire struct S type is 6 bytes. The smallest power of two that is greater than or equal to that is 8, so the compiler sets the alignment for the entire struct S type to 8 bytes.
Note that although you can ask the compiler to select a time-efficient alignment for a given type and then declare only individual stand-alone objects of that type, the compiler's ability to select a time-efficient alignment is primarily useful only when you plan to create arrays of variables having the relevant (efficiently aligned) type. If you declare or use arrays of variables of an efficiently-aligned type, then it is likely that your program also does pointer arithmetic (or subscripting, which amounts to the same thing) on pointers to the relevant type, and the code that the compiler generates for these pointer arithmetic operations is often more efficient for efficiently-aligned types than for other types.
The aligned attribute can only increase the alignment; but you can decrease it by specifying packed as well. See below.
Note that the effectiveness of aligned attributes may be limited by inherent limitations in your linker. On many systems, the linker is only able to arrange for variables to be aligned up to a certain maximum alignment. (For some linkers, the maximum supported alignment may be very very small.) If your linker is only able to align variables up to a maximum of 8-byte alignment, then specifying aligned(16) in an __attribute__ still only provides you with 8-byte alignment. See your linker documentation for further information.
packed
This attribute, attached to struct or union type definition, specifies that each member (other than zero-width bit-fields) of the structure or union is placed to minimize the memory required. When attached to an enum definition, it indicates that the smallest integral type should be used.
Specifying this attribute for struct and union types is equivalent to specifying the packed attribute on each of the structure or union members. Specifying the -fshort-enums flag on the command line is equivalent to specifying the packed attribute on all enum definitions.
In the following example struct my_packed_struct's members are packed closely together, but the internal layout of its s member is not packed—to do that, struct my_unpacked_struct needs to be packed too.
struct my_unpacked_struct
{
    char c;
    int i;
};

struct __attribute__ ((__packed__)) my_packed_struct
{
    char c;
    int i;
    struct my_unpacked_struct s;
};
You may only specify this attribute on the definition of an enum, struct or union, not on a typedef that does not also define the enumerated type, structure or union.
The problem which I was experiencing was specifically due to the use of packed. I attempted to simply add the aligned attribute to the structs and classes, but the error persisted. Only removing the packed attribute resolved the problem. For now, I am leaving the aligned attribute on them and testing to see if I find any improvements in the efficiency of the code as mentioned above, simply due to their being aligned on word boundaries. The application makes use of arrays of these structures, so perhaps there will be better performance, but only profiling the code will say for certain.
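To make the failure mode concrete, here is a small hypothetical sketch (the type names are made up, not from the original code base) of how a packed outer struct can leave an inner buffer of shorts misaligned, and how dropping packed, or requesting alignment explicitly, restores a word-aligned start address:
#include <stddef.h>

struct Inner {
    short samples[16]; // needs 2-byte alignment for 16-bit accesses
};

// With packed, no padding is inserted after 'tag', so 'inner' (and hence
// inner.samples) can start at an odd address: offsetof(struct BadOuter, inner) == 1.
struct __attribute__((packed)) BadOuter {
    char tag;
    struct Inner inner;
};

// Without packed (or with an explicit alignment request on the member),
// the usual padding returns and offsetof(struct GoodOuter, inner) == 2,
// so the shorts sit on a word boundary again.
struct GoodOuter {
    char tag;
    struct Inner inner __attribute__((aligned(2)));
};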

How to reset OpenGL program's uniform attribute value to default?

Let's say I have an OpenGL program that has a uniform attribute "diffuseColor". I have set it as follows:
GLint location = glGetUniformLocation(handle, "diffuseColor");
glUniform3f(location, 1, 0, 0);
Now I would like to return it to the default value, which is encoded in the shader code. I do not have access to the source code, but I can call OpenGL API functions on the compiled program. Is there a way to read the default value and set it with glUniform3f? Or even better, is there something like glResetUniform3f(GLint loc)?
Uniform initializers are applied upon linking the program. The value can then be read using glGetUniformfv/glGetUniformiv. There is no way to read the initial value of a uniform after you have changed it.
There is no way to reset a single uniform to its initial value, but relinking the program will reset all uniforms in it. Linking a program is a costly operation and should be avoided in between frames.
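If you can run a little code right after linking, a workable pattern is to capture the post-link value yourself and restore it later. A minimal sketch, assuming the program handle and uniform name from the question (the header choice and helper functions are mine, for illustration):
#include <GLES2/gl2.h> // or whatever GL header/loader the project already uses

static GLfloat defaultDiffuse[3]; // captured once, right after glLinkProgram

// Call this before any glUniform* call touches diffuseColor.
void captureDefaultDiffuse(GLuint handle) {
    GLint location = glGetUniformLocation(handle, "diffuseColor");
    glGetUniformfv(handle, location, defaultDiffuse); // reads the initializer value
}

// Later, "reset" the uniform by re-uploading the captured value.
void resetDiffuse(GLuint handle) {
    GLint location = glGetUniformLocation(handle, "diffuseColor");
    glUseProgram(handle);
    glUniform3f(location, defaultDiffuse[0], defaultDiffuse[1], defaultDiffuse[2]);
}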