Rendering multiple models bug on DirectX 12 - rendering

I am trying to render multiple models on DirectX 12 using only one graphic context, but the result is very weird and I have not much idea what is the reason. Rendering result of the sponza model from outside, the one on right is the correct result and the one on left has problem.
Rendering result of the left sponza (the one has problem) from inside.
Even the loaded two meshes are the same, each model has its own vertex buffer, index buffer and SRVs. In the process of creating graphics context, there is only one graphics context and set with each model's index and vertex buffer, and then I call the drawIndexed() function to render it. After the graphics context is created, we execute the graphics context once per frame. However, if we create an individual graphics context to each model and execute all graphics contexts per frame, the rendering works fine but the frame rate drops a lot.
It will be very helpful for you to provide any hints about what is the reason for the weird result, or providing a solution is even better. Thank you very much in advanced.

First, i would recommend you to stay away from dx12 and stick to dx11, unless you are a dx11 expert already and that you are the top 1% application case, like triple A games or very specific high demand on control over the gpu memory.
Without much details on your problems here, i can only give you a few basic advices :
Run with the debug layer and look at the console log with D3D12GetDebugInterface ( you will need to install the optional feature named graphic tools )
Use frame capture tools, like VSGD in visual studio or nsight from nVidia and inspect your frame step by step
Use Dx11, really

Related

FabricJS v3.4.0: Filters & maxTextureSize - performance/size limitations

Intro:
I've been messing with fabricJS image filtering features in an attempt to start using them in my webapp, but i've run into the following.
It seems fabricJS by default only sets the image size cap (textureSize) on filters to be 2048, meaning the largest image is 2048x2048 pixels.
I've attempted to raise the default by calling fabric.isWebGLSupported() and then setting fabric.textureSize = fabric.maxTextureSize, but that still caps it at 4096x4096 pixels, even though my maxTextureSize on my device is in the 16000~ range.
I realize that devices usually report the full value without accounting for current memory actually available, but that still seems like a hard limitation.
So I guess the main issues I'm looking at here to start effectively using this feature:
1- Render blocking applyFilters() method:
The current filter application function seems to be render blocking in the browser, is there a way call it without blocking the rendering, so I can show an indeterministic loading spinner or something?
is it as simple as making the apply filter method async and calling it from somewhere else in the app? (I'm using vue for context, with webpack/babel which polyfills async/await etc.)
2- Size limits:
Is there a way to bypass the size limit on images? I'm looking to filter images up to 4800x7200 pixels
I can think of one way atleast to do this, which is to "break up" the image into smaller images, apply the filters, and then stitch it back together. But I worry it might be a performance hit, as there will be a lot of canvas exports & canvas initializations in this process.
I'm surprised fabricjs doesn't do this "chunking" by default as its quite a comprehensive library, and I think they've already gone to the point where they use webGL shaders (which is a black box to me) for filtering under the hood for performance, is there a better way to do this?
My other solution would be to send the image to a service (one i handroll, or a pre-existing paid one) that applies the filters somewhere in the cloud and returns it to the user, but thats not a solution i prefer to resort to just yet.
For context, i'm mostly using fabric.Canvas and fabric.StaticCanvas to initialize canvases in my app.
Any insights/help with this would be great.
i wrote the filtering backend for fabricJS, with Mr. Scott Seaward (credits to him too), and i can give you some answers.
Hard block to 2048
A lot of macbook with intel integrated only videocard report a max texture size of 4096, but then they crash the webgl instance at anything higher of 2280. This was happening widely in 2017 when the webgl filtering was written. 4096 would have left uncovered by default a LOT of notebooks. Do not forget mobile phones too.
You know your userbase, you can up the limit to what your video card allows and what canvas allows in your browser. The final image, for how big the texture can be, must be copied in a canvas and displayed. ( canvas has a different max size depending on browser and device )
Render blocking applyFilters() method
Webgl is sync for what i understood.
Creating a parallel executing in a thread for filtering operations that are in the order of 20-30 ms ( sometimes just a couple of ms in chrome ) seems excessive.
Also consider that i tried it but when more than 4 webgl context were open in firefox, some would have been dropped. So i decided for one at time.
The non webgl filtering take longer of course, that could be done probably in a separate thread, but fabricJS is a generic library that does both vectors and filterings and serialization, it has already lot of things on the plate, filtering performances are not that bad. But i'm open to argue around it.
Chunking
Shutterstock editor uses fabricJS and is the main reason why a webgl backend was written. The editor has also chunking and can filter with tiles of 2048 pixels bigger images. We did not release that as opensource and i do not plan of asking. That kind of tiling limit the kind of filters you can write because the code has knowledge of a limited portion of the image at time, even just blurring becomes complicated.
Here there is a description of the process of tiling, is written for casual reader and not only software engineers, is just a blog post.
https://tech.shutterstock.com/2019/04/30/canvas-webgl-filtering-concepts
Generic render blocking consideration
So fabricJS has some pre-written filters made with shaders.
The timing i note here are from my memory and not reverified
The time that pass away filtering an image is:
Uploading the image in the GPU ( i do not know how many ms )
Compiling the shader ( up to 40 ms, depends )
Running the shader ( like 2 ms )
Downloading the result on the GPU ( like 0ms or 13 depends on what method is using )
Now the first time you run a filter on a single image:
The image gets uploaded
Filter compiled
Shader Run
Result downloaded
The second time you do this:
Shader Run
Result downloaded
When a new filter is added or filter is changed:
New filter compiled
Shader or both shader run
Result downloaded
Most common errors in application building with filtering that i have noticed are:
You forget to remove old filters, leaving them active with a value near 0 that does not produce visual changes, but adds up time
You connect the filter to a slider change event, without throttling, and that depending on the browser/device brings up to 120 filtering operation per second.
Look at the official simple demo:
http://fabricjs.com/image-filters
Use the sliders to filter, apply even more filters, everything seems pretty smooth to me.

Can we control progressive rendering in the Viewer based on the distance to the camera?

We have to work with very large models and we're hoping to use the first person camera to walk through them, and eventually do this in VR. The progressive rendering does wonders for improving perceived responsiveness, but it can be disorienting to have so many items around you disappear as you move.
Is there any way to turn progressive rendering off but only for objects closest to the camera? Maybe a number of objects up to a maximum number, or objects within a radius from the camera. Everything further away can load in later and flicker during motion, but it would be nice to keep the nearby objects rendered, especially structural objects like stairs. I've often walked towards stairs just to have them disappear in front of me, forcing me to fly to a platform with E and Q instead of walking.
So far I've only found a way to toggle progressive rendering for the entire model on or off with viewer.setProgressiveRendering(bool) but I haven't found a way to customize the rendering behavior.
Per our Engineering's recommendation can you try set rendering targets with
viewer.impl.setFPSTargets(1, 5, 15) //min, target, max
In fact we've been having similar reports from other developers requesting similar capability to fine tune rendering with large models so our Engineering is considering their options to extend on existing functions and even build them into extensions.

Custom rendering with GPU, Direct3D or OpenGL

I have a Windows application that currently renders graphics largely using MFC that I'd like to change to get better use out of the GPU. Most of the graphics are straightforward and could easily be built up into a scene graph, but some of the graphics could prove very difficult. Specifically, in addition to the normal mesh type objects, I'm also dealing with point clouds which are liable to contain billions of Cartesian stored in a very compact manner that use quite a lot of custom culling techniques to be displayed in real time (Example). What I'm looking for is a mechanism that does the bulk of the scene rendering to a buffer and then gives me access to that buffer, a z buffer, and camera parameters such that I can modify them before putting them out to the display. I'm wondering whether this is possible with Direct3D, OpenGL or possibly use a higher level framework like OpenSceneGraph, and what would be the best starting point? Given the software is Windows based, I'd probably prefer to use Direct3D as this is likely to lead to fewest driver issues which I'm eager to avoid. OpenSceneGraph seems to provide custom culling via octrees, which are close but not identical to what I'm using.
Edit: To clarify a bit more, currently I have the following;
A display list / scene in memory which will typically contain up to a few million triangles, lines, and pieces of text, which I cull in software and output to a bitmap using low performing drawing primitives
A point cloud in memory which may contain billions of points in a highly compressed format (~4.5 bytes per 3d point) which I cull and output to the same bitmap
Cursor information that gets added to the bitmap prior to output
A camera, z-buffer and attribute buffers for navigation and picking purposes
The slow bit is the highlighted part of section 1 which I'd like to replace with GPU rendering of some kind. The solution I envisage is to build a scene for the GPU, render it to a bitmap (with matching z-buffer) based on my current camera parameters and then add my point cloud prior to output.
Alternatively, I could move to a scene based framework that managed the cameras and navigation for me and provide points in view as spheres or splats based on volume and level of detail during the rendering loop. In this scenario I'd also need to be able add cursor information to the view.
In either scenario, the hosting application will be MFC C++ based on VS2017 which would require too much work to change for the purposes of this exercise.
It's hard to say exactly based on your description of a complex problem.
OSG can probably do what you're looking for.
Depending on your timeframe, I'd consider eschewing both OpenGL (OSG) and DirectX in favor of the newer Vulkan 3D API. It's a successor to both D3D and OGL, and is designed by the GPU manufacturers themselves to provide optimal performance exceeding both of its predecessors.
The OSG project is currently developing a Vulkan scenegraph known as VSG, which already demonstrates superior performance to OSG and will have more generalized culling ability.
I've worked a bunch with point clouds and am pretty experienced with them, but I'm not exactly clear on what you're proposing to do.
If you want to actually have a verbal discussion about the matter, I'm pretty easy to find (my company is AlphaPixel -- AlphaPixel.com) and you could call us. I'm in the European time zone right now, it's not clear from your question where you are but you sound US-based.

D3D12 Use backbuffer surface as unordered access view (UAV)

Im making a simple raytracer for a schoolproject were a compute shader is supposed to be used to shade a triangle or some other primitive.
For this I'd like to write to a backbuffer-surface directly in the compute shader, to then present the results imideatly. I know for certain that this is possible in DX11 though i can't seem to get it to work in DX12.
I couldn't gather that much information about this, but i found this gamedev thread discussing the exact same problem I try to figure out and they seem to come to the conclusion which was my go to workaround: writing to an intermediate texture and then sampling in a pipeline.
I can't fully accept that this would be impossible to achieve in dx12. Why would that feature be removed? Could it be that the queuing-systems removes some overhead that makes it unnecessary to have this feature?
Is there any way to achieve a raytracer without writing to a separate texture and then sampling in a pipeline or copy it onto the back-buffer? What are my best alternatives for achieving performance?
You will have to access the answer. They removed the capability to create an UAV the same way they removed the capability to use multisample surface in the swapchain.
The problem with authorizing UAV on the swapchain surface is that they would have to forfeit tracking of what is happening to it. DX12 rely on descriptor heaps that are 100% volatile at runtime for UAVs ( render targets are CPU side only and can be tracked ).
Microsoft need to track the swapchain surface status strongly in order to guarantee behavior with the desktop presentation and for that reason, they choose to deny the UAV binding.

Working around WebGL readPixels being slow

I'm trying to use WebGL to speed up computations in a simulation of a small quantum circuit, like what the Quantum Computing Playground does. The problem I'm running into is that readPixels takes ~10ms, but I want to call it several times per frame while animating in order to get information out of gpu-land and into javascript-land.
As an example, here's my exact use case. The following circuit animation was created by computing things about the state between each column of gates, in order to show the inline-with-the-wire probability-of-being-on graphing:
The way I'm computing those things now, I'd need to call readPixels eight times for the above circuit (once after each column of gates). This is waaaaay too slow at the moment, easily taking 50ms when I profile it (bleh).
What are some tricks for speeding up readPixels in this kind of use case?
Are there configuration options that significantly affect the speed of readPixels? (e.g. the pixel format, the size, not having a depth buffer)
Should I try to make the readPixel calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?
Should I try to aggregate all the textures I'm reading into a single megatexture and sort things out after a single big read?
Should I be using a different method to get the information back out of the textures?
Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?
Should I try to make the readPixel calls all happen at once, after all the render calls have been made (maybe allows some pipelining)?
Yes, yes, yes. readPixels is fundamentally a blocking, pipeline-stalling operation, and it is always going to kill your performance wherever it happens, because it's sending a request for data to the GPU and then waiting for it to respond, which normal draw calls don't have to do.
Do readPixels as few times as you can (use a single combined buffer to read from). Do it as late as you can. Everything else hardly matters.
Should I be avoiding getting the information out at all, and doing all the layout and rendering gpu-side (urgh...)?
This will get you immensely better performance.
If your graphics are all like you show above, you shouldn't need to do any “layout” at all (which is good, because it'd be very awkward to implement) — everything but the text is some kind of color or boundary animation which could easily be done in a shader, and all the layout can be just a static vertex buffer (each vertex has attributes which point at which simulation-state-texel it should be depending on).
The text will be more tedious merely because you need to load all the digits into a texture to use as a spritesheet and do the lookups into that, but that's a standard technique. (Oh, and divide/modulo to get the digits.)
I don't know enough about your use case but just guessing, Why do you need to readPixels at all?
First, you don't need to draw text or your the static parts of your diagram in WebGL. Put another canvas or svg or img over the WebGL canvas, set the css so they overlap. Let the browser composite them. Then you don't have to do it.
Second, let's assume you have a texture that has your computed results in it. Can't you just then make some geometry that matches the places in your diagram that needs to have colors and use texture coords to look up the results from the correct places in the results texture? Then you don't need to call readPixels at all. That shader can use a ramp texture lookup or any other technique to convert the results to other colors to shade the animated parts of your diagram.
If you want to draw numbers based on the result you can use a technique like this so you'd make a shader at references the result shader to look at a result value and then indexes glyphs from another texture based on that.
Am I making any sense?