Using Vulkan occlusion queries to inform future rendering decisions? - vulkan

In the Vulkan spec section 17.3 (Queries > Occlusion Queries) it says:
Occlusion queries track the number of samples that pass the per-fragment tests for a set of drawing commands. The application can then use these results to inform future rendering decisions.
In what way would you use the number of samples that pass the fragment tests to inform future rendering decisions?
What's an example of such a rendering decision that would use the result of an occlusion query?
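For instance, is the intent something like the following? (A rough sketch of what I imagine, assuming a query pool created with VK_QUERY_TYPE_OCCLUSION and a cheap bounding-box proxy drawn in an earlier frame; the helper names are mine.)

#include <vulkan/vulkan.h>

// Frame N: count the samples that pass while drawing a cheap proxy (e.g. a bounding box).
void recordProxyQuery(VkCommandBuffer cmd, VkQueryPool occlusionQueryPool, uint32_t queryIndex)
{
    vkCmdBeginQuery(cmd, occlusionQueryPool, queryIndex, 0 /* non-precise is enough */);
    // ... bind pipeline/buffers and draw the bounding-box proxy here ...
    vkCmdEndQuery(cmd, occlusionQueryPool, queryIndex);
}

// Frame N+1 (CPU side): read the result back and decide whether to record the expensive draw.
bool proxyWasVisible(VkDevice device, VkQueryPool occlusionQueryPool, uint32_t queryIndex)
{
    uint64_t samplesPassed = 0;
    vkGetQueryPoolResults(device, occlusionQueryPool, queryIndex, 1,
                          sizeof(samplesPassed), &samplesPassed, sizeof(samplesPassed),
                          VK_QUERY_RESULT_64_BIT | VK_QUERY_RESULT_WAIT_BIT);
    return samplesPassed > 0; // 0 samples passed => the proxy was fully occluded => skip the detailed model
}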

Related

In a 2d application where you're drawing a lot of individual sprites, will the rasterization stage inevitably become a bottleneck? [duplicate]

I'm in the processing of learning Vulkan, and I have just integrated ImGui into my code using the Vulkan-GLFW example in the original ImGui repo, and it works fine.
Now I want to render both the GUI and my 3D model on the screen at the same time, and since the GUI and the model definitely need different shaders, I need to use multiple pipelines and submit multiple commands. The GUI is partly transparent, so I would like it to be rendered after the model. The Vulkan spec states that the execution order of commands is not necessarily the order in which I record them, so I need synchronization of some kind. In this Reddit post several methods for achieving exactly this were proposed, and I once believed that I must use multiple subpasses (together with subpass dependencies), barriers, or other synchronization methods like that to solve this problem.
Then I had a look at SaschaWillems' Vulkan examples; in the ImGui example, though, I see no synchronization between the two draw calls. It just records the command to draw the model first, and then the command to draw the GUI.
I am confused. Is synchronization really needed in this case, or did I misunderstand something about command re-ordering or blending? Thanks.
Think about what you're doing for a second. Why do you think there needs to be synchronization between the two sets of commands? Because the second set of commands needs to blend with the data in the first set, right? And therefore, it needs to do a read/modify/write (RMW), which must be able to read data written by the previous set of commands. The data being read has to have been written, and that typically requires synchronization.
But think a bit more about what that means. Blending has to read from the framebuffer to do its job. But... so does the depth test, right? It has to read the existing sample's depth value, compare it with the incoming fragment, and then discard the fragment or not based on the depth test. So basically every draw call that uses a depth test contains a framebuffer read/modify/write.
And yet... your depth tests work. Not only do they work between draw calls without explicit synchronization, they also work within a draw call. If two triangles in a draw call overlap, you don't have any problem with seeing the bottom one through the top one, right? You don't have to do inter-triangle synchronization to make sure that the previous triangles' writes are finished before the reads.
So somehow, the depth test's RMW works without any explicit synchronization. So... why do you think that this is untrue of the blend stage's RMW?
The Vulkan specification states that commands, and stages within commands, will execute in a largely unordered way, with several exceptions. The most obvious being the presence of explicit execution barriers/dependencies. But it also says that the fixed-function per-sample testing and blending stages will always execute (as if) in submission order (within a subpass). Not only that, it requires that the triangles generated within a command also execute these stages (as if) in a specific, well-defined order.
That's why your depth test doesn't need synchronization; Vulkan requires that this is handled. This is also why your blending will not need synchronization (within a subpass).
So you have plenty of options (in order from fastest to slowest):
Render your UI in the same subpass as the non-UI. Just change pipelines as appropriate (see the sketch after this list).
Render your UI in a subpass with an explicit dependency on the framebuffer images of the non-UI subpass. While this is technically slower, it probably won't be slower by much if at all. Also, this is useful for deferred rendering, since your UI needs to happen after your lighting pass, which will undoubtedly be its own subpass.
Render your UI in a different render pass. This would only really be needed for cases where you need to do some full-screen work (SSAO) that would force your non-UI render pass to terminate anyway.
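For option 1, the recording can be as simple as the following sketch (modelPipeline, uiPipeline and the omitted draw calls are placeholders; assume both pipelines were created against the same render pass and subpass):

#include <vulkan/vulkan.h>

// One subpass, two pipelines: blending and depth testing are defined to behave
// as if draws execute in submission order, so no extra synchronization is needed.
void recordFrame(VkCommandBuffer cmd, const VkRenderPassBeginInfo* rpBegin,
                 VkPipeline modelPipeline, VkPipeline uiPipeline)
{
    vkCmdBeginRenderPass(cmd, rpBegin, VK_SUBPASS_CONTENTS_INLINE);

    vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, modelPipeline);
    // ... bind model descriptor sets / vertex buffers, vkCmdDrawIndexed(...) ...

    vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, uiPipeline);
    // ... bind the ImGui vertex/index buffers and record its draws last, so it blends on top ...

    vkCmdEndRenderPass(cmd);
}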

Custom rendering with GPU, Direct3D or OpenGL

I have a Windows application that currently renders graphics largely using MFC that I'd like to change to make better use of the GPU. Most of the graphics are straightforward and could easily be built up into a scene graph, but some of the graphics could prove very difficult. Specifically, in addition to the normal mesh-type objects, I'm also dealing with point clouds which are liable to contain billions of Cartesian points stored in a very compact manner and which use quite a lot of custom culling techniques to be displayed in real time (Example). What I'm looking for is a mechanism that does the bulk of the scene rendering to a buffer and then gives me access to that buffer, a z-buffer, and camera parameters such that I can modify them before putting them out to the display. I'm wondering whether this is possible with Direct3D, OpenGL or possibly a higher-level framework like OpenSceneGraph, and what would be the best starting point? Given the software is Windows based, I'd probably prefer to use Direct3D as this is likely to lead to the fewest driver issues, which I'm eager to avoid. OpenSceneGraph seems to provide custom culling via octrees, which are close but not identical to what I'm using.
Edit: To clarify a bit more, currently I have the following;
A display list / scene in memory which will typically contain up to a few million triangles, lines, and pieces of text, which I cull in software and output to a bitmap using low performing drawing primitives
A point cloud in memory which may contain billions of points in a highly compressed format (~4.5 bytes per 3d point) which I cull and output to the same bitmap
Cursor information that gets added to the bitmap prior to output
A camera, z-buffer and attribute buffers for navigation and picking purposes
The slow bit is the software culling and drawing in item 1, which I'd like to replace with GPU rendering of some kind. The solution I envisage is to build a scene for the GPU, render it to a bitmap (with matching z-buffer) based on my current camera parameters and then add my point cloud prior to output.
Alternatively, I could move to a scene-based framework that managed the cameras and navigation for me, and provide points in view as spheres or splats based on volume and level of detail during the rendering loop. In this scenario I'd also need to be able to add cursor information to the view.
In either scenario, the hosting application will be MFC C++ based on VS2017 which would require too much work to change for the purposes of this exercise.
It's hard to say exactly based on your description of a complex problem.
OSG can probably do what you're looking for.
Depending on your timeframe, I'd consider eschewing both OpenGL (OSG) and DirectX in favor of the newer Vulkan 3D API. It's the successor to OpenGL and a modern low-level API comparable to Direct3D 12, designed with heavy involvement from the GPU manufacturers themselves to provide performance exceeding both of the older APIs.
The OSG project is currently developing a Vulkan scenegraph known as VSG, which already demonstrates superior performance to OSG and will have more generalized culling ability.
I've worked a bunch with point clouds and am pretty experienced with them, but I'm not exactly clear on what you're proposing to do.
If you want to actually have a verbal discussion about the matter, I'm pretty easy to find (my company is AlphaPixel -- AlphaPixel.com) and you could call us. I'm in the European time zone right now, it's not clear from your question where you are but you sound US-based.

Does a 3D engine need to analyse all the map objects before rendering?

Does a 3D engine need to analyse every single object on the map to see if it's going to be rendered or not? My understanding is that, for a line from the center of projection through a pixel in the view plane, the engine will find the closest surface that intersects it. But wouldn't that mean that for each pixel the engine needs to analyse all objects in the map? Is there a way to limit the objects analysed?
Thanks for your help.
Such a procedure is called frustum culling.
You can also find more information about it here:
https://en.wikipedia.org/wiki/Viewing_frustum (wiki)
http://www.lighthouse3d.com/tutorials/view-frustum-culling/
http://www.cse.chalmers.se/~uffe/vfc.pdf (better but hard to read)
IMHO, this last link is similar to what Nico Schertler mentioned in the comments.
Beware, what you are looking for is not the same as "occlusion culling" (see the related question "Most efficient algorithm for mesh-level, optimal occlusion culling?"), which is a different optimization for when an object is totally hidden behind another one.
Note that most game engines render per object (a pack of many triangles, submitted via draw calls, roughly speaking), not by tracing each pixel (ray tracing) as you seem to assume.
Ray tracing is too expensive for most real-time applications.
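To give a rough idea of what per-object frustum culling looks like in code (a minimal sketch with my own little Plane/Sphere types, not taken from any particular engine):

#include <array>

struct Vec3   { float x, y, z; };
struct Plane  { Vec3 n; float d; };            // dot(n, p) + d >= 0 means "inside" this plane
struct Sphere { Vec3 center; float radius; };  // bounding sphere of one object

static float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// An object is skipped when its bounding sphere lies entirely outside
// any of the six frustum planes (left, right, top, bottom, near, far).
bool isVisible(const std::array<Plane, 6>& frustum, const Sphere& s)
{
    for (const Plane& p : frustum) {
        if (dot(p.n, s.center) + p.d < -s.radius)
            return false; // completely outside this plane -> cull it
    }
    return true; // inside or intersecting the frustum -> send it to the renderer
}

So the engine tests one bounding volume per object (or per node of a spatial structure such as an octree or BVH), not anything per pixel.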

OpenGL lights, textures, etc. correct way?

Until this moment I've only implemented all the effects in GLSL shaders using inputs, outputs and uniforms, except for a couple of really essential built-ins like gl_Position. I've read several tutorials and had a lecture on computer graphics, and every time they implement things by looking at the physical model and calculating all the stuff using input values and uniforms. That is the way I thought it all works.
Now I've run into the fact that there is much more in GLSL, like the glLight* API functions and the gl_LightSource and gl_Texture built-ins, with a big set of predefined light types and lighting models. This seems to be a different way of programming shaders.
I wonder if there are any advantages/disadvantages to using one way or the other? Did I miss something very important? It looks like I'm doing a lot of redundant work.
All the glLight* calls you might find in both GLSL and the OpenGL API are from the old and deprecated fixed-function pipeline!
Now you must do all the calculations yourself through shaders, which I guess you're already doing.
Why did they "remove" all the awesome stuff?
They "removed" (deprecated) the Matrix Stack, Light calls, Immediate Mode Rendering, etc. etc. etc. and the list goes one for various reason. But the overall reason is that it's better to implement and control those things yourself.
It requires more work from our side implementing and controlling all those things, though you're in total control of everything and when you actually want to use something.
Using the fixed-function pipeline OpenGL would allocate and load various things you might never even wanted to use.
Also when talking about the Matrix Stack as an example, you would usually (the lazy way) make OpenGL re-calculate the Matrix Stack each render call, using the old glPushMatrix(), glPopMatrix(), glTranslate*(), etc. functions. Now because YOU HAVE TO, you are forced to do all those calculations and handling the Matrices yourself. So now you would realize that most of the Matrices and much more could simply be allocated and calculated once, or atleast not every render call.
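As a rough illustration of that point, here is a sketch using the GLM math library (a common replacement for the old matrix stack; the uniform name u_mvp is just an example):

#include <glad/glad.h>                      // or any other OpenGL loader
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/type_ptr.hpp>

// Computed once, or only when the camera/object actually moves,
// instead of being rebuilt with glPushMatrix/glTranslate on every render call.
glm::mat4 buildMvp()
{
    glm::mat4 projection = glm::perspective(glm::radians(60.0f), 16.0f / 9.0f, 0.1f, 100.0f);
    glm::mat4 view       = glm::lookAt(glm::vec3(0.0f, 2.0f, 5.0f),  // eye
                                       glm::vec3(0.0f, 0.0f, 0.0f),  // target
                                       glm::vec3(0.0f, 1.0f, 0.0f)); // up
    glm::mat4 model      = glm::mat4(1.0f);
    return projection * view * model;
}

// Per draw call you only upload the cached result to your shader:
void uploadMvp(GLuint program, const glm::mat4& mvp)
{
    glUniformMatrix4fv(glGetUniformLocation(program, "u_mvp"), 1, GL_FALSE, glm::value_ptr(mvp));
}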
Of course, they didn't deprecate Immediate Mode Rendering because they want us to implement it ourselves; rather, we now simply use Buffers, because they are so much better in every way.
Extra
If you want a great spreadsheet that shows which functions are deprecated, which are core functions, which are extension functions, etc., then take a look here, though be aware that this spreadsheet is made by people who use OpenGL, not by the Khronos Group (the current developers of OpenGL) nor Silicon Graphics (the creators of OpenGL).
Ignore the glLightXXX functions, the related gl_LightXXX variables and all the documentation associated with them. It's all deprecated, and if you look closely at the docs you'll probably find that they're several years old or specifically written for OpenGL versions <= 2.x. Instead, continue to work with your own vertex attributes and set up your lighting configuration in your own uniforms however you please, based on the model of lighting you want to implement. It's more work, but it's more flexible in the long run.
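For example, instead of glLightfv(GL_LIGHT0, GL_POSITION, ...) you upload whatever your own lighting model needs as plain uniforms (a sketch; the uniform names are just examples):

#include <glad/glad.h>   // or any other OpenGL loader

// Your fragment shader decides what these values mean (Phong, Blinn-Phong, PBR, ...),
// instead of relying on OpenGL's fixed lighting model.
void setLightUniforms(GLuint program)
{
    const float lightPos[3]   = { 10.0f, 10.0f, 10.0f };
    const float lightColor[3] = { 1.0f, 1.0f, 1.0f };

    glUseProgram(program);
    glUniform3fv(glGetUniformLocation(program, "u_lightPos"),   1, lightPos);
    glUniform3fv(glGetUniformLocation(program, "u_lightColor"), 1, lightColor);
    glUniform1f (glGetUniformLocation(program, "u_ambient"),    0.1f);
}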
The OpenGL lighting model that uses glLight pre-dates the programmable shader pipeline and represents a particular way of doing lighting in the fixed-function pipeline.
Once GLSL entered the scene it was possible to use the OpenGL lighting model in conjunction with shaders. You could use the same glLight function and its related functions to set up your lighting parameters, but then write shaders that used the same information in different ways, allowing per-pixel lighting calculations.
Textures are a little more murky, because OpenGL still has a texture model and many of the GL functions relating to textures are still valid, though some are deprecated. However, any documentation that refers to GLSL variables like gl_Texture is similarly out of date. Current OpenGL accesses textures through sampler uniforms in the shader, optionally combined with sampler objects for the sampling state.
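For instance, in a core profile you bind a texture (plus, optionally, a separate sampler object, core since OpenGL 3.3) to a texture unit and point the shader's sampler2D uniform at that unit (a sketch; the names are examples):

#include <glad/glad.h>   // or any other OpenGL loader

// Bind a texture and a sampler object to texture unit 0, and tell the shader's
// `uniform sampler2D u_diffuse;` to read from that unit.
// `sampler` is assumed to have been created once with glGenSamplers(1, &sampler).
void bindDiffuseTexture(GLuint program, GLuint texture, GLuint sampler)
{
    glSamplerParameteri(sampler, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
    glSamplerParameteri(sampler, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glSamplerParameteri(sampler, GL_TEXTURE_WRAP_S, GL_REPEAT);
    glSamplerParameteri(sampler, GL_TEXTURE_WRAP_T, GL_REPEAT);

    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, texture);
    glBindSampler(0, sampler);                                  // unit 0 uses this sampler state

    glUseProgram(program);
    glUniform1i(glGetUniformLocation(program, "u_diffuse"), 0); // the sampler2D reads unit 0
}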
If you want to make sure you're doing it the 'modern' way, create a forward-compatible OpenGL core profile context of version 3.3 or higher, and make sure your shaders declare the appropriate version number as their first line, like so:
#version 330
This will cause the use of any deprecated OpenGL function or deprecated shader variable to generate an error so that you know to avoid them.
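If you happen to use GLFW for window and context creation, requesting such a context looks roughly like this (a sketch, not the only way to do it):

#include <GLFW/glfw3.h>

// Request a 3.3 core, forward-compatible context so that deprecated
// functions and shader variables are simply unavailable.
GLFWwindow* createModernContext()
{
    glfwInit();
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GLFW_TRUE);

    GLFWwindow* window = glfwCreateWindow(1280, 720, "Modern GL", nullptr, nullptr);
    glfwMakeContextCurrent(window);
    return window;
}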
Current graphics hardware offers an interface to customize each rendering step, e.g. vertex shading, tessellation, geometry shading, fragment shading and so on. GLSL is the language used to program or influence these rendering steps through that interface.
The predefined functions glLight, glTexture and so on belong to the deprecated fixed-function graphics pipeline of OpenGL. Modern OpenGL still supports the functions of this fixed pipeline (in the compatibility profile), but it is strongly recommended to use GLSL for the different rendering steps.
The glLight function is fixed functionality that only influences vertex processing, so you can only achieve per-vertex shading, which does not look very realistic.
When you program the lighting yourself within the fragment shader using GLSL, you can directly influence every pixel.
So to summarize: the main advantage is that a programmer is more flexible and can influence every rendering step, which enables sophisticated and realistic 3D graphics. The main disadvantage is that you need much more knowledge (GLSL, the graphics pipeline) and much more programming effort to achieve the same result as with the fixed functions.
Best regards

What's the fastest force-directed network graph engine for large data sets? [duplicate]

We currently have a dynamically updated network graph with around 1,500 nodes and 2,000 edges. It's ever-growing. Our current layout engine uses Prefuse - the force directed layout in particular - and it takes about 10 minutes with a hefty server to get a nice, stable layout.
I've looked a little at GraphViz's sfdp algorithm, but haven't tested it yet...
Are there faster alternatives I should look at?
I don't care about the visual appearance of the nodes and edges - we process that separately - just putting x, y on the nodes.
We do need to be able to tinker with the layout properties for specific parts of the graph, for instance, applying special tighter or looser springs for certain nodes.
Thanks in advance, and please comment if you need more specific information to answer!
EDIT: I'm particularly looking for speed comparisons between the layout engine options. Benchmarks, specific examples, or just personal experience would suffice!
I wrote a JavaScript-based graph drawing library, VivaGraph.js.
It calculates the layout and renders a graph with 2K+ vertices and 8.5K edges in ~10-15 seconds. If you don't need the rendering part, it should be even faster.
Here is a video demonstrating it in action: WebGL Graph Rendering With VivaGraphJS.
Online demo is available here. WebGL is required to view the demo but is not needed to calculate graph layouts. The library also works under node.js, and thus could be used as a service.
Example of API usage (layout only):
var graph = Viva.Graph.graph(),
    layout = Viva.Graph.Layout.forceDirected(graph);

graph.addLink(1, 2);
layout.run(50); // runs 50 iterations of graph layout

// print results:
graph.forEachNode(function(node) { console.log(node.position); });
Hope this helps :)
I would have a look at OGDF, specifically http://www.ogdf.net/doku.php/tech:howto:frcl
I have not used OGDF, but I do know that Fast Multipole Multilevel is a good, performant algorithm, and when you're dealing with the kinds of runtimes involved in force-directed layout at the number of nodes you want, that matters a lot.
One reason, among others, that the algorithm is awesome: the fast multipole method. The fast multipole method approximates the all-pairs (n-body) repulsion computation, reducing its roughly O(n²) cost to close to linear at the price of a small, controlled error. Ideally, you'd have code from something like this: http://mgarland.org/files/papers/layoutgpu.pdf, but I can't find it anywhere; maybe a CUDA solution isn't up your alley anyway.
Good luck.
The Gephi Toolkit might be what you need: some layouts are very fast yet of good quality: http://gephi.org/toolkit/
30 seconds to 2 minutes are enough to lay out such a graph, depending on your machine.
You can use the ForceAtlas layout, or the Yifan Hu Multilevel layout.
For very large graphs (+50K nodes and 500K links), the OpenOrd layout will be the better choice.
In a commercial scenario, you might also want to look at the family of yFiles graph layout and visualization libraries.
Even the JavaScript version of it can perform layouts for thousands of nodes and edges using different arrangement styles. The "organic" layout style is an implementation of a force-directed layout algorithm similar in nature to the one used in Neo4j's browser application. But there are a lot more layout algorithms available that can give better visualizations for certain types of graph structures and diagrams. Depending on the settings and structure of the problem, some of the algorithms take only seconds, while more complex implementations can also bring your JavaScript engine to its knees. The Java and .NET based variants still perform quite a bit better, as of today, but the JavaScript engines are catching up.
You can play with these algorithms and settings in this online demo.
Disclaimer: I work for yWorks, which is the maker of these libraries, but I do not represent my employer on SO.
I would take a look at http://neo4j.org/ - it's open source, which is beneficial in your case since you can customize it to your needs. The GitHub account can be found here.