D3D12 Use backbuffer surface as unordered access view (UAV) - compute-shader

Im making a simple raytracer for a schoolproject were a compute shader is supposed to be used to shade a triangle or some other primitive.
For this I'd like to write to a backbuffer-surface directly in the compute shader, to then present the results imideatly. I know for certain that this is possible in DX11 though i can't seem to get it to work in DX12.
I couldn't gather that much information about this, but i found this gamedev thread discussing the exact same problem I try to figure out and they seem to come to the conclusion which was my go to workaround: writing to an intermediate texture and then sampling in a pipeline.
I can't fully accept that this would be impossible to achieve in dx12. Why would that feature be removed? Could it be that the queuing-systems removes some overhead that makes it unnecessary to have this feature?
Is there any way to achieve a raytracer without writing to a separate texture and then sampling in a pipeline or copy it onto the back-buffer? What are my best alternatives for achieving performance?

You will have to access the answer. They removed the capability to create an UAV the same way they removed the capability to use multisample surface in the swapchain.
The problem with authorizing UAV on the swapchain surface is that they would have to forfeit tracking of what is happening to it. DX12 rely on descriptor heaps that are 100% volatile at runtime for UAVs ( render targets are CPU side only and can be tracked ).
Microsoft need to track the swapchain surface status strongly in order to guarantee behavior with the desktop presentation and for that reason, they choose to deny the UAV binding.

Related

Custom rendering with GPU, Direct3D or OpenGL

I have a Windows application that currently renders graphics largely using MFC that I'd like to change to get better use out of the GPU. Most of the graphics are straightforward and could easily be built up into a scene graph, but some of the graphics could prove very difficult. Specifically, in addition to the normal mesh type objects, I'm also dealing with point clouds which are liable to contain billions of Cartesian stored in a very compact manner that use quite a lot of custom culling techniques to be displayed in real time (Example). What I'm looking for is a mechanism that does the bulk of the scene rendering to a buffer and then gives me access to that buffer, a z buffer, and camera parameters such that I can modify them before putting them out to the display. I'm wondering whether this is possible with Direct3D, OpenGL or possibly use a higher level framework like OpenSceneGraph, and what would be the best starting point? Given the software is Windows based, I'd probably prefer to use Direct3D as this is likely to lead to fewest driver issues which I'm eager to avoid. OpenSceneGraph seems to provide custom culling via octrees, which are close but not identical to what I'm using.
Edit: To clarify a bit more, currently I have the following;
A display list / scene in memory which will typically contain up to a few million triangles, lines, and pieces of text, which I cull in software and output to a bitmap using low performing drawing primitives
A point cloud in memory which may contain billions of points in a highly compressed format (~4.5 bytes per 3d point) which I cull and output to the same bitmap
Cursor information that gets added to the bitmap prior to output
A camera, z-buffer and attribute buffers for navigation and picking purposes
The slow bit is the highlighted part of section 1 which I'd like to replace with GPU rendering of some kind. The solution I envisage is to build a scene for the GPU, render it to a bitmap (with matching z-buffer) based on my current camera parameters and then add my point cloud prior to output.
Alternatively, I could move to a scene based framework that managed the cameras and navigation for me and provide points in view as spheres or splats based on volume and level of detail during the rendering loop. In this scenario I'd also need to be able add cursor information to the view.
In either scenario, the hosting application will be MFC C++ based on VS2017 which would require too much work to change for the purposes of this exercise.
It's hard to say exactly based on your description of a complex problem.
OSG can probably do what you're looking for.
Depending on your timeframe, I'd consider eschewing both OpenGL (OSG) and DirectX in favor of the newer Vulkan 3D API. It's a successor to both D3D and OGL, and is designed by the GPU manufacturers themselves to provide optimal performance exceeding both of its predecessors.
The OSG project is currently developing a Vulkan scenegraph known as VSG, which already demonstrates superior performance to OSG and will have more generalized culling ability.
I've worked a bunch with point clouds and am pretty experienced with them, but I'm not exactly clear on what you're proposing to do.
If you want to actually have a verbal discussion about the matter, I'm pretty easy to find (my company is AlphaPixel -- AlphaPixel.com) and you could call us. I'm in the European time zone right now, it's not clear from your question where you are but you sound US-based.

Does a 3D engine needs to analyse all the map objects before rendering

Does a 3D engine needs to analyse every single object on the map to see if it's gonna be rendered or not. My understanding is that a line from the center of projection to a pixel in the view plan, the engine will find the closest plan that intersect with it, but wouldn't that mean that for each pixel the engine needs to analyse all objects in the map, is there a way to limits the objects analysed.
Thanks for your help.
Such procedure are called frustum-culling algorithm.
You can also find more information about it here :-
https://en.wikipedia.org/wiki/Viewing_frustum (wiki)
http://www.lighthouse3d.com/tutorials/view-frustum-culling/
http://www.cse.chalmers.se/~uffe/vfc.pdf (better but hard to read)
IMHO, this last link is similar as what Nico Schertler mentioned in comment.
Beware, what you seek for is not the same as "occlusion culling" (another related link "Most efficient algorithm for mesh-level, optimal occlusion culling? ) ", which is another optimization when an object is totally hidden behind another one.
Note that most game-engine render by object (a pack of many triangles - via draw calls, roughly speaking ), not by tracing each pixel (ray-tracing) as you might understand.
Ray-tracing is too expensive in most real-time application.

When to use VK_IMAGE_LAYOUT_GENERAL

It isn't clear to me when it's a good idea to use VK_IMAGE_LAYOUT_GENERAL as opposed to transitioning to the optimal layout for whatever action I'm about to perform. Currently, my policy is to always transition to the optimal layout.
But VK_IMAGE_LAYOUT_GENERAL exists. Maybe I should be using it when I'm only going to use a given layout for a short period of time.
For example, right now, I'm writing code to generate mipmaps using vkCmdBlitImage. As I loop through the sub-resources performing the vkCmdBlitImage commands, should I transition to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL as I scale down into a mip, then transition to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL when I'll be the source for the next mip before finally transitioning to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL when I'm all done? It seems like a lot of transitioning, and maybe generating the mips in VK_IMAGE_LAYOUT_GENERAL is better.
I appreciate the answer might be to measure, but it's hard to measure on all my target GPUs (especially because I haven't got anything running on Android yet) so if anyone has any decent rule of thumb to apply it would be much appreciated.
FWIW, I'm writing Vulkan code that will run on desktop GPUs and Android, but I'm mainly concerned about performance on the latter.
You would use it when:
You are lazy
You need to map the memory to host (unless you can use PREINITIALIZED)
When you use the image as multiple incompatible attachments and you have no choice
For Store Images
( 5. Other cases when you would switch layouts too much (and you don't even need barriers) relatively to the work done on the images. Measurement needed to confirm GENERAL is better in that case. Most likely a premature optimalization even then.
)
PS: You could transition all the mip-maps together to TRANSFER_DST by a single command beforehand and then only the one you need to SRC. With a decent HDD, it should be even best to already have them stored with mip-maps, if that's a option (and perhaps even have a better quality using some sophisticated algorithm).
PS2: Too bad, there's not a mip-map creation command. The cmdBlit most likely does it anyway under the hood for Images smaller than half resolution....
If you read from mipmap[n] image for creating the mipmap[n+1] image then you should use the transfer image flags if you want your code to run on all Vulkan implementations and get the most performance across all implementations as the flags may be used by the GPU to optimize the image for reads or writes.
So if you want to go cross-vendor only use VK_IMAGE_LAYOUT_GENERAL for setting up the descriptor that uses the final image and not image reads or writes.
If you don't want to use that many transitions you may copy from a buffer instead of an image, though you obviously wouldn't get the format conversion, scaling and filtering that vkCmdBlitImage does for you for free.
Also don't forget to check if the target format actually supports the BLIT_SRC or BLIT_DST bits. This is independent of whether you use the transfer or general layout for copies.

OpenGL lights, textures, etc. correct way?

Until this moment I've only implemented all the effects in GLSL shaders using inputs, outputs and uniforms, except for a couple of really essential constants like gl_Position, etc. I've read several tutorials, had a lecture on computer graphics and everytime all they implement things by looking at physical model and calculating all the stuff using input values and uniforms. That is a kind of the way I thought it all works.
Now I faced the fact, that there are much more GLSL things, like glLight* API functions and gl_LightSource, gl_Texture constants in GLSL with a big set of light types and lighting models predefined. Seems to be a kind of different way of programming shaders.
I wonder if there are any advantages/disadvantages using one or other way? Did I miss something very important? It looks I'm doing a lot of redundant work.
All the glLight* calls you might find in both GLSL and the OpenGL API are from the old and deprecated fixed-function pipeline!
Now you must do all the calculations yourself through Shaders, as I can guess you're already doing.
Why did they "remove" all the awesome stuff?
They "removed" (deprecated) the Matrix Stack, Light calls, Immediate Mode Rendering, etc. etc. etc. and the list goes one for various reason. But the overall reason is that it's better to implement and control those things yourself.
It requires more work from our side implementing and controlling all those things, though you're in total control of everything and when you actually want to use something.
Using the fixed-function pipeline OpenGL would allocate and load various things you might never even wanted to use.
Also when talking about the Matrix Stack as an example, you would usually (the lazy way) make OpenGL re-calculate the Matrix Stack each render call, using the old glPushMatrix(), glPopMatrix(), glTranslate*(), etc. functions. Now because YOU HAVE TO, you are forced to do all those calculations and handling the Matrices yourself. So now you would realize that most of the Matrices and much more could simply be allocated and calculated once, or atleast not every render call.
Of course they didn't deprecated Immediate Mode Rendering, because we need to implement that ourselves, now we simply need to use Buffers, because they are so much better in every way.
Extra
If you want a great spreadsheet that shows which function are deprecated and which are core functions, and extension functions, etc. Then take a look here, though be aware that this spreadsheet is made by people who use OpenGL and not by the Khronos Group (current developers of OpenGL) nor Silicon Graphics (the creators of OpenGL).
Ignore glLightXXX functions, the related gl_LightXXX variables and all the documentation associated with them. It's all deprecated and if you look closely at the docs, you'll probably that it's several years old or specifically designed for versions of OpenGL <= 2.x. Instead continue to work with your own vertex attributes and set up lighting configuration in your own uniforms however you please based on the model of lighting you want to implement. It's more work, but it's more flexible in the long run.
The OpenGL lighting model that uses glLight pre-dates the programmable shader pipeline, and represent a particular way of doing lighting in the fixed function pipeline.
Once GLSL entered the scene it was possible to use the OpenGL lighting model in conjunction with shaders. You could use the same glLight function and it's related functions to set up your lighting parameters but then write shaders that used the same information in different ways, allowing per-pixel lighting calculations.
Textures are a little more murky, because OpenGL still has a texture model and many of the GL functions relating to textures are still valid, though some are deprecated. However, any documentation that refers to GLSL variables like gl_Texture is similarly out of date. Current OpenGL uses sampler objects for texture access.
If you want to make sure you're doing it the 'modern' way, make sure you create a forward-compatible OpenGL profile of 3.3 or higher or 4.0 or higher, and make sure your shaders declare the appropriate version number as their first line like so:
#version 330
This will cause the use of any deprecated OpenGL function or deprecated shader variable to generate an error so that you know to avoid them.
Current graphics hardware offers an interface to customize any rendering step e.g Vertex Shading, Tesselation, Geometry shading, fragment shading and so on. GLSL is the language to programm or influence the rendering steps of the graphics hardware leveraging this interface.
The predefined function glLight, glTexture and so on belong to the deprecated fixed
graphics pipeline of opengl. Modern OpenGL still supports the functions of this fixed pipeline but it ist strongly recommended to use GLSL for the different rendering steps.
The glLight function is a fixed function which just influences Vertex Processing. So you can just achieve a per vertex shading, which not looks very realistic.
When you programm the lighting on your own within the fragment shader using GLSL you can directly influence any pixel.
So to summarize the main advantage is that a programmer is more flexible and is able to influence every kind of rendering step, which enables you to achieve sophisticated and realistic 3d graphics. The main disadvantage is. You need much more knowledge and (GLSL, graphics pipeline) and much more programming effort to achieve the same result as with fixed functions.
Best regards

Is it possible to read from a VBO?

I'm trying to make an OpenGL renderer that mashes various shapes into one large mesh and stores these in two VBOs, one GL_ARRAY_BUFFER and one GL_ELEMENT_ARRAY_BUFFER. I'm aiming for it to work on both OpenGL ES 2 and OpenGL 3.2 core. I am currently trying to find the best way to handle deleting shapes from within this mesh and my current approach is to periodically rebuild the entire thing, possibly on a background thread.
The problem is that in order to rebuild the new and clean mesh, I need access to the vertices / indices that have been written to the buffers using glMapBuffer. According to the documentation for GL_OES_mapbuffer, WRITE_ONLY_OES is the only acceptable parameter for 'access'.
So, I don't think the data pointed at there is reliable to read from in order to create my new buffers. I know there are other functions in GL Core that allow you to copy the buffer data, but these also seem to be missing.
Can anyone verify that this is not possible on ES 2.0 or give some approach for achieving buffer reading? My current solution is to keep a shadow copy of all the data, which is obviously not ideal.
I think that keeping a shadow copy of GPU data in main memory is much better than reading these data from GPU memory. It is recommended to discard previous data before using glMapBuffer anyway. Read this for more information (It will not give you direct answer to your question, but it might be usefull).