Translating OpenGL ES 1.1 fixed function pipeline to programmable pipeline on the fly - opengl-es-2.0

Is it possible to emulate the complete fixed function pipeline with shaders on the fly? By on the fly I mean not rewriting the fixed function code to use shaders, but having a sort of intermediate driver which receives fixed function GLES calls (possibly caching them for one full frame, as there is no direct one-to-one translation from the fixed to the programmable pipeline) and outputs equivalent GLES 2.0 calls.
And if it is possible, how much work would it really be?

For most of ES 1.1, that looks pretty straightforward. All the typical fixed functionality like transformations, lights, and materials, translates directly into shader code.
For a complete replacement, you would obviously have to implement all the functionality. From skimming over the ES 1.1 entry points, I spotted a few items that would not directly translate to ES 2.0, where the last of these looks particularly problematic:
Arbitrary clipping planes. These are not available in ES 2.0, but they are not terribly hard to emulate in shaders by calculating a distance in the vertex shader, and then discarding the clipped fragments in the fragment shader.
ES 1.1 has something called "palette textures". From my understanding, it looks somewhat painful to implement in ES 2.0, but possible. You would probably need two textures, one for the indices, and one for the palette, with two levels of sampling in the fragment shader.
ES 1.1 supports logical operations (glLogicOp) as part of the per-fragment operations that are executed after the fragment shader. ES 2.0 does not have this, and I can't think of a good way to replicate it. The only thing that comes to mind is to render, read back the result, do the logical operation on the CPU, and then render the resulting image. And you would have to do that every time the operation is changed.
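The three items above can be sketched on the CPU to make the data flow concrete. This is only a rough illustration in Python, not shader code: all function names are hypothetical, the first two would live in GLSL in a real translation layer, and the last one stands in for the read-back fallback with GL_XOR picked as an example operation.

```python
def clip_distance(plane, position):
    """Vertex-shader side: signed distance term dot(plane, position).

    plane is (a, b, c, d); position is (x, y, z, w) in the same space.
    """
    return sum(p * v for p, v in zip(plane, position))


def fragment_survives(interpolated_distance):
    """Fragment-shader side: discard fragments on the negative side of the plane."""
    return interpolated_distance >= 0.0


def sample_paletted(index_texture, palette, x, y):
    """Two-level lookup: fetch an index, then fetch the colour from the palette."""
    index = index_texture[y][x]
    return palette[index]


def logic_op_xor(src_pixels, dst_pixels):
    """CPU fallback for a logical operation (XOR here) on read-back 8-bit values."""
    return [s ^ d for s, d in zip(src_pixels, dst_pixels)]
```

In a shader-based version the distance would be passed as a varying, and the palette would be a second texture sampled with the index as its coordinate.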

Related

Is programming a voxel based graphics API theoretically possible?

This is entirely a theoretical question, because I understand the time it would take to do such a thing would be ridiculous.
I've been working with "voxels" a lot lately, and the only ways I can display them to a user are to either triangulate the visible surfaces or write a CPU ray tracer, but both come with their own problems.
Simply put, if we disregard the storage space needed for voxel meshes and target a very specific GPU, could someone create a graphics API like OpenGL but with "true" voxel primitives that don't need to be converted? Or are GPUs designed specifically for triangles, with no way to introduce a new base primitive?
It's possible, and it has already been done many times:
games like Minecraft, Space Engineers...
3D printing tools and slicers
MRI/PET scans tools
Yes, rendering on the GPU is possible with the two base methods you mention. Games usually convert the voxels to a boundary-representation 3D geometry. With the rise of shaders even ray tracers are now possible; here is mine:
simple GLSL voxel ray tracer
using native OpenGL architecture and passing geometry as 3D texture. In order to obtain speed you need to add BVH or similar spatial subdivision of geometry...
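To give a feel for what such a ray tracer does per pixel, here is a minimal CPU sketch in Python. It is an assumption-laden illustration, not the linked GLSL code: the name `march_ray`, the fixed step size, and the nested-list grid (standing in for the 3D texture) are all hypothetical, and there is no BVH or other acceleration structure.

```python
import math


def march_ray(grid, origin, direction, step=0.25, max_distance=64.0):
    """Return the (x, y, z) index of the first solid voxel hit, or None.

    grid is indexed grid[z][y][x]; truthy values mark solid voxels.
    """
    length = math.sqrt(sum(c * c for c in direction))
    if length == 0.0:
        return None
    d = [c / length for c in direction]
    size_z, size_y, size_x = len(grid), len(grid[0]), len(grid[0][0])
    t = 0.0
    while t <= max_distance:
        # Sample the voxel that contains the current point on the ray
        x, y, z = (math.floor(origin[i] + d[i] * t) for i in range(3))
        if 0 <= x < size_x and 0 <= y < size_y and 0 <= z < size_z:
            if grid[z][y][x]:
                return (x, y, z)
        t += step
    return None
```

Fixed-step marching can skip over thin features if the step is too large; an exact grid traversal (a DDA) or a spatial hierarchy would be the usual next step.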
However voxel based tools have been here for quite some time. For example many isometric games/engines are voxel based (tile is a voxel) like this one:
Improving performance of click detection on a staggered column isometric grid
Also, do you remember UFO? It was playable on x286 and it was also "voxel/tile" based isometric.

How to organize opengl es 2.0 program?

I thought of two ways to write my OpenGL ES 2.0 code.
First, I issue many draw calls for the elements on screen, using either many VAOs and VBOs or only one VAO and many VBOs.
Second, I save the coordinates of all elements in one list, write all the vertices for these coordinates into a single VAO and a single VBO, and draw all the vertices at once.
Which way should I follow?
These are the ones I thought of; what other ways are there?
The VAO is meant to save you some setup calls when setting the vertex attributes pointers and enabling/disabling the pipeline states related to that setup. Having just one VAO isn't saving you anything, because you will repeatedly re-bind the vertex buffers and change some settings. So you should aim to have multiple VAOs, one per "static" rendering batch, but not necessarily one per object drawn.
As to having all vertices in a single VBO or in many VBOs - that really depends on the task.
Having all data in a single VBO has no benefit if you still draw it all in many calls. But there's also no point in allocating one VBO per sprite. It's always about the balance between the costs of the different calls that set up the pipeline, so ideally you try different approaches and decide what's best in your particular case.
There might be restrictions on the buffer sizes, and there are definitely "reasonable" sizes preferred by specific implementations. I remember some issues with old Intel drivers, where rendering a portion of the buffer would process the entire buffer, skipping unneeded vertices.
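As a rough illustration of the "one buffer, one draw call" end of that trade-off, the following Python sketch packs many sprite quads into a single vertex list and a single index list. The function name and the `(x, y, width, height)` sprite layout are assumptions made for the example, not anything from the question.

```python
def build_sprite_batch(sprites):
    """Pack axis-aligned sprite quads into one vertex list and one index list.

    sprites is a list of (x, y, width, height) tuples. The flat vertex list
    holds x, y pairs; the index list describes two triangles per quad.
    """
    vertices = []
    indices = []
    for n, (x, y, w, h) in enumerate(sprites):
        base = n * 4  # index of this quad's first vertex
        vertices += [x, y, x + w, y, x + w, y + h, x, y + h]
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return vertices, indices
```

The two lists would then be uploaded once with glBufferData and drawn with a single glDrawElements call. Note that with 16-bit indices a batch tops out at 65,536 vertices, so very large scenes would need several batches.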

D3D12 Use backbuffer surface as unordered access view (UAV)

I'm making a simple raytracer for a school project where a compute shader is supposed to be used to shade a triangle or some other primitive.
For this I'd like to write to a backbuffer surface directly in the compute shader, and then present the results immediately. I know for certain that this is possible in DX11, though I can't seem to get it to work in DX12.
I couldn't gather much information about this, but I found this gamedev thread discussing the exact same problem I'm trying to figure out, and they seem to come to the conclusion that was my go-to workaround: writing to an intermediate texture and then sampling it in a pipeline.
I can't fully accept that this would be impossible to achieve in DX12. Why would that feature be removed? Could it be that the queuing system removes some overhead, which makes this feature unnecessary?
Is there any way to build a raytracer without writing to a separate texture and then sampling it in a pipeline or copying it onto the back buffer? What are my best alternatives for achieving performance?
You will have to accept the answer. They removed the capability to create a UAV on the swapchain the same way they removed the capability to use a multisample surface in the swapchain.
The problem with allowing a UAV on the swapchain surface is that they would have to forfeit tracking of what is happening to it. DX12 relies on descriptor heaps that are 100% volatile at runtime for UAVs (render targets are CPU side only and can be tracked).
Microsoft needs to track the swapchain surface status strictly in order to guarantee behavior with the desktop presentation, and for that reason they chose to deny the UAV binding.

OpenGL lights, textures, etc. correct way?

Until now I've implemented all effects in GLSL shaders using only inputs, outputs and uniforms, except for a couple of really essential built-ins like gl_Position. I've read several tutorials and attended a lecture on computer graphics, and every time they implement things by looking at the physical model and calculating everything from input values and uniforms. That is how I thought it all works.
Now I've discovered that there are many more things, like the glLight* API functions and the gl_LightSource and gl_Texture built-ins in GLSL, with a big set of light types and lighting models predefined. This seems to be a different way of programming shaders.
I wonder if there are any advantages or disadvantages to using one way or the other. Did I miss something very important? It looks like I'm doing a lot of redundant work.
All the glLight* calls you might find in both GLSL and the OpenGL API are from the old and deprecated fixed-function pipeline!
Now you must do all the calculations yourself through Shaders, as I can guess you're already doing.
Why did they "remove" all the awesome stuff?
They "removed" (deprecated) the Matrix Stack, Light calls, Immediate Mode Rendering, etc., and the list goes on, for various reasons. But the overall reason is that it's better to implement and control those things yourself.
It requires more work on our side to implement and control all those things, but you're in total control of everything and of when you actually want to use something.
With the fixed-function pipeline, OpenGL would allocate and load various things you might never have wanted to use.
Also, taking the Matrix Stack as an example, you would usually (the lazy way) make OpenGL recalculate the Matrix Stack on each render call, using the old glPushMatrix(), glPopMatrix(), glTranslate*(), etc. functions. Now, because YOU HAVE TO, you are forced to do all those calculations and handle the matrices yourself. So now you realize that most of the matrices, and much more, could simply be allocated and calculated once, or at least not on every render call.
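A small sketch of the "calculate once" idea, written in Python for brevity. The class name, the dirty-flag approach and the 2D translation-only matrix are assumptions chosen to keep the example short; a real application would cache full model or model-view-projection matrices the same way.

```python
class CachedTransform:
    """Recomputes a 2D translation matrix only when the position changes."""

    def __init__(self, x=0.0, y=0.0):
        self._x, self._y = x, y
        self._matrix = None
        self.rebuild_count = 0  # for illustration: how often we recomputed

    def set_position(self, x, y):
        if (x, y) != (self._x, self._y):
            self._x, self._y = x, y
            self._matrix = None  # mark the cached matrix as stale

    def matrix(self):
        if self._matrix is None:
            # Row-major 3x3 translation matrix
            self._matrix = [[1.0, 0.0, self._x],
                            [0.0, 1.0, self._y],
                            [0.0, 0.0, 1.0]]
            self.rebuild_count += 1
        return self._matrix
```

Calling `matrix()` every frame for a static object then costs a lookup instead of a rebuild, and the result can be uploaded as a uniform.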
Of course, they didn't deprecate Immediate Mode Rendering just so that we would have to implement it ourselves; we now simply use buffers instead, because they are so much better in every way.
Extra
If you want a great spreadsheet that shows which functions are deprecated, which are core functions, which are extension functions, etc., then take a look here, though be aware that this spreadsheet is made by people who use OpenGL and not by the Khronos Group (current developers of OpenGL) nor Silicon Graphics (the creators of OpenGL).
Ignore glLightXXX functions, the related gl_LightXXX variables and all the documentation associated with them. It's all deprecated, and if you look closely at the docs, you'll probably find that they are several years old or specifically designed for versions of OpenGL <= 2.x. Instead continue to work with your own vertex attributes and set up lighting configuration in your own uniforms however you please, based on the model of lighting you want to implement. It's more work, but it's more flexible in the long run.
The OpenGL lighting model that uses glLight pre-dates the programmable shader pipeline, and represents a particular way of doing lighting in the fixed function pipeline.
Once GLSL entered the scene it was possible to use the OpenGL lighting model in conjunction with shaders. You could use the same glLight function and its related functions to set up your lighting parameters, but then write shaders that used the same information in different ways, allowing per-pixel lighting calculations.
Textures are a little more murky, because OpenGL still has a texture model and many of the GL functions relating to textures are still valid, though some are deprecated. However, any documentation that refers to GLSL variables like gl_Texture is similarly out of date. Current OpenGL uses sampler objects for texture access.
If you want to make sure you're doing it the 'modern' way, make sure you create a forward-compatible OpenGL profile of 3.3 or higher or 4.0 or higher, and make sure your shaders declare the appropriate version number as their first line like so:
#version 330
This will cause the use of any deprecated OpenGL function or deprecated shader variable to generate an error so that you know to avoid them.
Current graphics hardware offers an interface to customize any rendering step, e.g. vertex shading, tessellation, geometry shading, fragment shading and so on. GLSL is the language used to program or influence the rendering steps of the graphics hardware through this interface.
The predefined functions glLight, glTexture and so on belong to the deprecated fixed graphics pipeline of OpenGL. Modern OpenGL still supports the functions of this fixed pipeline, but it is strongly recommended to use GLSL for the different rendering steps.
The glLight function is a fixed function which only influences vertex processing. So you can only achieve per-vertex shading, which does not look very realistic.
When you program the lighting on your own within the fragment shader using GLSL, you can directly influence any pixel.
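The difference between the two approaches can be shown with a tiny CPU-side calculation. This Python sketch assumes a simple Lambert diffuse term and linear interpolation along one edge; the function names are hypothetical and it only mimics what the rasterizer and the shaders would do.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambert(normal, light_dir):
    """Diffuse term max(dot(N, L), 0) for unit vectors."""
    return max(sum(n * l for n, l in zip(normal, light_dir)), 0.0)


def lerp(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def per_vertex_intensity(n0, n1, light_dir, t):
    """Light each vertex first, then interpolate the resulting intensities."""
    i0, i1 = lambert(n0, light_dir), lambert(n1, light_dir)
    return i0 + (i1 - i0) * t


def per_fragment_intensity(n0, n1, light_dir, t):
    """Interpolate the normal, renormalize it, then light the fragment."""
    return lambert(normalize(lerp(n0, n1, t)), light_dir)
```

With two vertex normals tilted away from the light in opposite directions, the per-vertex version stays dim in the middle of the edge, while the per-fragment version reaches full brightness there, which is where the highlight belongs.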
So to summarize, the main advantage is that the programmer is more flexible and is able to influence every kind of rendering step, which makes it possible to achieve sophisticated and realistic 3D graphics. The main disadvantage is that you need much more knowledge (GLSL, graphics pipeline) and much more programming effort to achieve the same result as with fixed functions.
Best regards

At what phase in rendering does clipping occur?

I've got some OpenGL drawing code that I'm trying to optimize. It's currently testing all drawing objects for visibility client-side before deciding whether or not to send rendering data to OpenGL. (This is easier than it sounds. It's drawing a 2D scene so clipping is trivial: just test against the current coordinates of the viewport rectangle.)
It occurs to me that the entire model could be greatly simplified by passing the entire scene to OpenGL and letting the GPU take care of the clipping. But sometimes the total can be very, very complex, involving up to 100,000 total sprites, most of which never get rendered because they're off-camera, and I'd prefer to not end up killing the framerate in the name of simplicity.
I'm using OpenGL 2.0, and I've got a pretty simple vertex shader and a much more complicated fragment shader. Is there any guarantee that says that if the vertex shader runs and determines coordinates that are completely off-camera for all vertices of a polygon, that a clipping test will be applied somewhere between there and the fragment shader and prevent the fragment shader from ever running for that polygon? And if so, is this automatic or is there something I need to do to enable it? I've looked around online for information on this but I haven't found anything conclusive...
Clipping happens after the vertex transform stage, before and after the transition to NDC space; clip planes are applied in clip space, viewport clipping is done in NDC space. That is one step before rasterizing. Clipping means that a face which is only partially visible is "cut" by inserting new vertices at the visibility border, or that fragments outside the viewport are discarded. What you mean is usually called culling. Faces completely outside the viewport are culled, at the same stage as clipping.
From a performance point of view, the best code is code never executed, and the best data is data never accessed. So in your case sending off a single drawing call that makes the GPU process a large batch of vertices clearly takes load off the CPU, but it consumes GPU processing power. Culling those vertices before sending the drawing command consumes CPU power, but takes load off the GPU. The goal is to find the right balance. If the number of vertices is low, a simple brute force approach (just render the whole thing) may easily outperform every other scheme.
However, using a simple yet effective data management scheme can greatly improve performance on both ends. For example, a spatial subdivision structure like a Kd tree is easily built (you don't have to balance it). With the vertices sorted into the Kd tree, you can omit (cull) large portions of the tree if a branch near the root is completely outside the viewport. When preparing to draw a frame, you iterate through the visible parts of the tree, building the list of vertices to draw, then you pass this list to the rendering command. A balanced Kd tree can be built in O(n log n) time, and a viewport query then only has to visit the branches that overlap the viewport rather than all n elements.
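Here is a minimal sketch of that scheme in Python, under some simplifying assumptions: sprites are treated as 2D points (for example their centres), the tree is built by re-sorting at every level for brevity, and the names `build_kd` and `query_visible` are made up for the example.

```python
def build_kd(points, depth=0):
    """Build a 2D Kd tree from (x, y) points by splitting at the median."""
    if not points:
        return None
    axis = depth % 2  # alternate between splitting on x and on y
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kd(points[:mid], depth + 1),
        "right": build_kd(points[mid + 1:], depth + 1),
    }


def query_visible(node, lo, hi, out=None):
    """Collect points inside the rectangle [lo, hi], skipping whole subtrees."""
    if out is None:
        out = []
    if node is None:
        return out
    p, axis = node["point"], node["axis"]
    if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]:
        out.append(p)
    # Only descend into a side if the viewport can overlap it
    if lo[axis] <= p[axis]:
        query_visible(node["left"], lo, hi, out)
    if p[axis] <= hi[axis]:
        query_visible(node["right"], lo, hi, out)
    return out
```

For sprites with a real extent, one simple option is to grow the query rectangle by the largest sprite half-size so that partially visible sprites are not dropped.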
It's important to understand the difference between clipping and culling. You appear to be talking about the latter.
Clipping means taking a triangle and literally cutting it into pieces to fit into the viewport. The OpenGL specification defines this process to happen post-vertex shader, for any triangle that is only partially in view.
Culling means throwing something away entirely. If a triangle is entirely out of view, it can therefore be culled. OpenGL does not say that culling has to happen. Remember: the OpenGL specification defines behavior, not performance.
That being said, hardware makers are not stupid. Obvious efforts like not rasterizing triangles that are outside of the viewport are easily implemented and improve performance. Pretty much any hardware that exists will do this.
Similarly, clipping is typically implemented (where possible) with rasterizer tricks, rather than by creating new triangles. Fragments that would be outside of the viewport simply aren't generated by the rasterizer. This is also legal according to OpenGL, because the spec defines apparent behavior. It doesn't really care if you actually cut the triangle into pieces as long as it looks indistinguishable from if you did.
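To make the culling case concrete, here is a sketch of the kind of trivial-reject test that can be applied to a triangle after the vertex shader. It is a Python illustration with a hypothetical function name; what a given GPU actually does is implementation-specific, and the test is deliberately conservative.

```python
def outside_same_plane(clip_vertices):
    """True if every vertex lies outside the same plane of the clip volume.

    Each vertex is (x, y, z, w) in clip space; the visible volume is
    -w <= x, y, z <= w. A True result means no fragment can be generated.
    """
    for axis in range(3):
        if all(v[axis] > v[3] for v in clip_vertices):
            return True
        if all(v[axis] < -v[3] for v in clip_vertices):
            return True
    return False
```

A triangle whose vertices are outside different planes is not rejected here, since it may still cross the visible volume; that case is left to clipping or to the rasterizer.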
Your question is essentially one of, "How much work should I do to not render off-screen objects?" That really depends on what your scene is and how you're rendering it. You say you're rendering 100,000 sprites. Are you making 100,000 draw calls, or are these sprites part of larger structures that you render with larger granularity? Do you stream the vertex data to the GPU every frame, or is the vertex data static?
Clipping and culling happen before fragment processing. http://www.opengl.org/wiki/Rendering_Pipeline_Overview
However, you will still be passing 100000 * 4 vertices (assuming you're rendering the sprites with quads and not point sprites) to the card if you don't do culling yourself. Depending on the card's memory performance this can be an issue.