Custom rendering with GPU, Direct3D or OpenGL - rendering

I have a Windows application that currently renders graphics largely using MFC that I'd like to change to get better use out of the GPU. Most of the graphics are straightforward and could easily be built up into a scene graph, but some of the graphics could prove very difficult. Specifically, in addition to the normal mesh type objects, I'm also dealing with point clouds which are liable to contain billions of Cartesian stored in a very compact manner that use quite a lot of custom culling techniques to be displayed in real time (Example). What I'm looking for is a mechanism that does the bulk of the scene rendering to a buffer and then gives me access to that buffer, a z buffer, and camera parameters such that I can modify them before putting them out to the display. I'm wondering whether this is possible with Direct3D, OpenGL or possibly use a higher level framework like OpenSceneGraph, and what would be the best starting point? Given the software is Windows based, I'd probably prefer to use Direct3D as this is likely to lead to fewest driver issues which I'm eager to avoid. OpenSceneGraph seems to provide custom culling via octrees, which are close but not identical to what I'm using.
Edit: To clarify a bit more, currently I have the following;
A display list / scene in memory which will typically contain up to a few million triangles, lines, and pieces of text, which I cull in software and output to a bitmap using low performing drawing primitives
A point cloud in memory which may contain billions of points in a highly compressed format (~4.5 bytes per 3d point) which I cull and output to the same bitmap
Cursor information that gets added to the bitmap prior to output
A camera, z-buffer and attribute buffers for navigation and picking purposes
The slow bit is the highlighted part of section 1 which I'd like to replace with GPU rendering of some kind. The solution I envisage is to build a scene for the GPU, render it to a bitmap (with matching z-buffer) based on my current camera parameters and then add my point cloud prior to output.
Alternatively, I could move to a scene based framework that managed the cameras and navigation for me and provide points in view as spheres or splats based on volume and level of detail during the rendering loop. In this scenario I'd also need to be able add cursor information to the view.
In either scenario, the hosting application will be MFC C++ based on VS2017 which would require too much work to change for the purposes of this exercise.

It's hard to say exactly based on your description of a complex problem.
OSG can probably do what you're looking for.
Depending on your timeframe, I'd consider eschewing both OpenGL (OSG) and DirectX in favor of the newer Vulkan 3D API. It's a successor to both D3D and OGL, and is designed by the GPU manufacturers themselves to provide optimal performance exceeding both of its predecessors.
The OSG project is currently developing a Vulkan scenegraph known as VSG, which already demonstrates superior performance to OSG and will have more generalized culling ability.
I've worked a bunch with point clouds and am pretty experienced with them, but I'm not exactly clear on what you're proposing to do.
If you want to actually have a verbal discussion about the matter, I'm pretty easy to find (my company is AlphaPixel -- AlphaPixel.com) and you could call us. I'm in the European time zone right now, it's not clear from your question where you are but you sound US-based.

Related

Is programming a voxel based graphics API theoretically possible?

This is entirely a theoretical question because I understand the time it would take to do such a thing would be ridiculous
I've been working with "voxels" a lot lately and the only way I can display them to a user is to either triangulate the visible surfaces or make a CPU ray-tracer but both come with their own problems.
Simply put, if we dismiss the storage space needed for voxel meshs and targeted a very specific GPU would someone who was wanting to create a graphics API like OpenGL but with "true" voxel primitives that don't need to be converted be able to make such thing or are GPUs designed specifically for triangles with no way to introduce a new base primitive?
Its possible and it was already done many times
games like Minecraft,SpaceEngineers...
3D printing tools and slicers
MRI/PET scans tools
Yes rendering on GPU is possible with the two base methods you mention. Games usually use the transform to boundary representation 3D geometry. With rise of shaders even ray tracers are now possible here mine:
simple GLSL voxel ray tracer
using native OpenGL architecture and passing geometry as 3D texture. In order to obtain speed you need to add BVH or similar spatial subdivision of geometry...
However voxel based tools have been here for quite some time. For example many isometric games/engines are voxel based (tile is a voxel) like this one:
Improving performance of click detection on a staggered column isometric grid
Also do you remember UFO ? It was playable on x286 and it was also "voxel/tile" based isometric.

D3D12 Use backbuffer surface as unordered access view (UAV)

Im making a simple raytracer for a schoolproject were a compute shader is supposed to be used to shade a triangle or some other primitive.
For this I'd like to write to a backbuffer-surface directly in the compute shader, to then present the results imideatly. I know for certain that this is possible in DX11 though i can't seem to get it to work in DX12.
I couldn't gather that much information about this, but i found this gamedev thread discussing the exact same problem I try to figure out and they seem to come to the conclusion which was my go to workaround: writing to an intermediate texture and then sampling in a pipeline.
I can't fully accept that this would be impossible to achieve in dx12. Why would that feature be removed? Could it be that the queuing-systems removes some overhead that makes it unnecessary to have this feature?
Is there any way to achieve a raytracer without writing to a separate texture and then sampling in a pipeline or copy it onto the back-buffer? What are my best alternatives for achieving performance?
You will have to access the answer. They removed the capability to create an UAV the same way they removed the capability to use multisample surface in the swapchain.
The problem with authorizing UAV on the swapchain surface is that they would have to forfeit tracking of what is happening to it. DX12 rely on descriptor heaps that are 100% volatile at runtime for UAVs ( render targets are CPU side only and can be tracked ).
Microsoft need to track the swapchain surface status strongly in order to guarantee behavior with the desktop presentation and for that reason, they choose to deny the UAV binding.

Rendering multiple models bug on DirectX 12

I am trying to render multiple models on DirectX 12 using only one graphic context, but the result is very weird and I have not much idea what is the reason. Rendering result of the sponza model from outside, the one on right is the correct result and the one on left has problem.
Rendering result of the left sponza (the one has problem) from inside.
Even the loaded two meshes are the same, each model has its own vertex buffer, index buffer and SRVs. In the process of creating graphics context, there is only one graphics context and set with each model's index and vertex buffer, and then I call the drawIndexed() function to render it. After the graphics context is created, we execute the graphics context once per frame. However, if we create an individual graphics context to each model and execute all graphics contexts per frame, the rendering works fine but the frame rate drops a lot.
It will be very helpful for you to provide any hints about what is the reason for the weird result, or providing a solution is even better. Thank you very much in advanced.
First, i would recommend you to stay away from dx12 and stick to dx11, unless you are a dx11 expert already and that you are the top 1% application case, like triple A games or very specific high demand on control over the gpu memory.
Without much details on your problems here, i can only give you a few basic advices :
Run with the debug layer and look at the console log with D3D12GetDebugInterface ( you will need to install the optional feature named graphic tools )
Use frame capture tools, like VSGD in visual studio or nsight from nVidia and inspect your frame step by step
Use Dx11, really

Effort Required to make 3D Game Engine?

For the sake of theory (and general understanding),
I would like to understand in a moderately exhaustive list of all the things that must be done in order to create a "modern" 3D Game Engine (from a coder's perspective)
I seem to have a hard time finding this information anywhere else, so I think that you guys at Stack overflow will have the knowledge I seek.
In terms of "moderately exhaustive", I mean such things as a general explanation of the design stages of such engine, such as Binary Space Partitioning, then actual implementation of such an engine, and the uses of the software ( it would be helpful if the means of rendering other than BSP could be explained).
I don't want to make a 3D Engine, but simply understand what sheer amount of effort is required to make one.
Focusing on 3D rendering alone:
Binary space partitioning, like many elements of 3d rendering, is optional. In this case, it is an optimization, allowing the computer to do less work to render a scene, by cutting out invisible sections.
At its core, rendering is simply a five stage process. First, a list of triangles is generated. Next, the triangles are converted from 3-space to 2-space using matrix multiplication. Next, the triangles are filled in with pixels and meta information. Finally, the pixels are shaded individually using the meta-information. Extra finally, the pixels are drawn to the screen.
Most of those steps are partially or wholly done by a graphics card, meaning the programmer's job is to tell the card which step to perform and provide the input data.
This bare bones engine is not even close to a modern engine, however. Modern engines will be filled with optimizations like binary space partitioning, mesh simplification, background loading and texture compression. They will also be filled with special features like shadows, mirrors, mist and particle effects.
Modern engines have to be able to load and interpret textures and meshes, and in some cases, deform and modify both at runtime. The most common example would be interpolating between keyframes.
Engines may need to interact with game logic modules in order to reuse data for collision detection. Collision detection being the thing that determines if bullets hit something and also the thing that makes makes walls and floors real.

Non power of two textures and memory consumption optimization

I read somewhere that XNA framework upscales a texture to nearest power of two size and then sends that to VRAM, which, provided it's how it really works, might be not efficient when loading many small (in my case 150×150) textures, which essentially waste memory with unused texture data resulting from upscaling.
So is there some automatic optimization, or should I make my own implementation of it, like loading all textures, figuring out where the "upscaled" space is big enough to hold some other texture and place it there, remembering sprite positions, thus using one texture instead of two (or more)?
It isn't always handy to do this manually for each texture (placing many small sprites in a single texture), because it's hard to work with later (essentially it becomes less human-oriented), and not always a sprite will be needed in some level of a game, so it would be better if sprites were in a different composition, so it should be done automatically.
There are tools available to create what are known as "sprite sheets" or "texture atlases". This XNA sample does this for you as part of a content pipeline extension.
Note that the padding of textures only happens on devices that do not support non-power-of-two textures. Windows Phone, for example. Modern GPUs won't waste the RAM. However this is still a useful optimisation to allow you to merge batches of sprites (see this answer for details).