Are Acceleration Structures Mandatory for A Ray Tracing Pipeline - vulkan

My understanding is that an acceleration structure is used as an optimization within a ray-tracing pipeline to speed up potential intersection calculations before actually needing to invoke your intersection shader. Basically, if you had a scene of some complex geometry you wrap each geometry up in an acceleration structure, then you wrap multiple of these acceleration structures together to minimize the number of geometry you actually need to test against.
If however the scene you are ray tracing against is completely dynamic and all geometry is changing locations every tick, and to go one step further, every geometry is a perfect sphere, this acceleration structure does nothing but add pointless overhead. Is it somehow possible to create a pipeline and omit the acceleration structure, or if not create a custom acceleration structure identical to the intersection shader (in the case of the perfect sphere) so that you aren't forced to use an AABB accelerator?

Related

Particles passing through each other

I am writing a code to simulate particle movement. (currently 2D soon 3D hopefully)
The thing is, if I use a relatively large timestep particles end up passing through each other.
Do you have any suggestion that would allow me to correct that without using a really small step?
(it is in C++ if that makes much difference).
The use of timestep to advance the clock introduces model artifacts which can destroy the model validity, as is happening in your case. Use discrete event scheduling instead. This paper from Winter Simulation Conference 2005 describes how to implement movement in a discrete event framework. Your model will not only be more accurate, it will probably run much faster as well.
So you will have to do some sort of collision detection to see if two objects would collide.
Depending on your data structure the detection could take many forms. If you just have a list of points you would have to check all against each other in N^2 each step for the particle (adding the movement vector to create a larger spacial foot print). This could be done by the GJK algorithm.
Using some spacial data structure could reduce the complexity by only running the GJK on a pruned set of particles, i.e. no need to check if they impossible could overlap.

CGAL: Simplify convex polyhedra in 3D

I have been using the cgal library to generate convex hulls which are further used for discrete element simulations. Currently, I am trying to make the polyhedral particles break, which is right now implemented as plane clipping of the polyhedron. The problem is that after several (sometimes even one) clipping, the polyhedrons start having "bad" attributes, such as nearly degenerate faces, nearly coplanar edges or nearly degenerate edges, which cause problems in the contact calculation. I have been looking at CGAL/Surface_mesh_simplification routines and used the edge_collapse function, but it does not preserve convexity of the particles. Is there any way to use routines from cgal for convex polyhedra simplifications while preserving convexity?
You can try using the function isotropic_remeshing(). While there is no guarantee that the output will stay convex, the points are guaranteed to be on the input mesh. If you have some sharp edges you want to preserve, you can specify it to the function and it will take them into account.

Optimization using VBO in OpenGL ES 2.0

My system is composed of several objects that represent quads. Each quad is represented by the same vertices and therefore, each object only stores matrices that represent the object's transformation through the world, and its own object space. During each render pass, after these matrices are updated with their frame transforms, they are multiplied with the current view and projection matrices to form the MVP matrix for that object. The objects vertices are then sent with the MVP matrix to the shader, where the vertices are multiplied by the MVP matrix. The inefficiency here is that each quad is drawn separately, meaning there is a separate call glDrawElements for each quad. At any given moment, there may be 50 or 60 quads in existence, some move out of scope and are destroyed or their animation may complete, so they're also destroyed, but more will randomly enter existence. Would there be a significant performance gain to storing all the necessary values in a VBO and just calling glDrawElements once during each pass?
Let's first reason about it with some simple mathematics:
At the moment you don't need to push any vertex data onto the GPU (each frame), but 12-16 floats matrix data per quad, and perform a matrix-matrix multiplication per quad on the CPU.
When putting all in one VBO, you have to transfer 4 vertices (~12 floats) per quad, but no matrix data (except for the global VP, of course) and you have to do 4 matrix-vector multiplies (~1 matrix-matrix multiply) on the CPU.
So the amount of work and data transferred doesn't really change much. But what changes is, that the transferred data is shifted from many many small uniform updates to a single large VBO update, which is very likely to be faster (both because a buffer update is likely to be faster from the hardware side than multiple uniform updates, but don't nail me on that, and second because of the much reduced driver overhead). And on top of that comes the even more reduced overhead by using a single large draw call instead of many smaller.
So yes, it will certainly be worth a try, though it has to be evaluated if it is really a "significant" improvement in your particular application.
Would there be a significant performance gain to storing all the
necessary values in a VBO and just calling glDrawElements once during
each pass?
Yes, it would be much faster. First reason, as you correctly identified, will be a single glDrawElements call. And second being the fact that VBO keeps the data in the GPU itself.
If quads move out of scope you can reuse their memory for the new quads. VBO's can be used to draw subregions of the buffer, so you can get big flexibility without memory allocations.
By using VBO's you are minimising interaction with the GPU and so getting the performance benefit.

Tweaking Heightmap Generation For Hexagon Grids

Currently I'm working on a little project just for a bit of fun. It is a C++, WinAPI application using OpenGL.
I hope it will turn into a RTS Game played on a hexagon grid and when I get the basic game engine done, I have plans to expand it further.
At the moment my application consists of a VBO that holds vertex and heightmap information. The heightmap is generated using a midpoint displacement algorithm (diamond-square).
In order to implement a hexagon grid I went with the idea explained here. It shifts down odd rows of a normal grid to allow relatively easy rendering of hexagons without too many further complications (I hope).
After a few days it is beginning to come together and I've added mouse picking, which is implemented by rendering each hex in the grid in a unique colour, and then sampling a given mouse position within this FBO to identify the ID of the selected cell (visible in the top right of the screenshot below).
In the next stage of my project I would like to look at generating more 'playable' terrains. To me this means that the shape of each hexagon should be more regular than those seen in the image above.
So finally coming to my point, is there:
A way of smoothing or adjusting the vertices in my current method
that would bring all point of a hexagon onto one plane (coplanar).
EDIT:
For anyone looking for information on how to make points coplanar here is a great explination.
A better approach to procedural terrain generation that would allow
for better control of this sort of thing.
A way to represent my vertex information in a different way that allows for this.
To be clear, I am not trying to achieve a flat hex grid with raised edges or platforms (as seen below).
)
I would like all the geometry to join and lead into the next bit.
I'm hope to achieve something similar to what I have now (relatively nice undulating hills & terrain) but with more controllable plateaus. This gives me the flexibility of cording off areas (unplayable tiles) later on, where I can add higher detail meshes if needed.
Any feedback is welcome, I'm using this as a learning exercise so please - all comments welcome!
It depends on what you actually want and what you mean by "more controlled".
Do you want to be able to say "there will be a mountain on coordinates [11, -127] with radius 20"? Complexity of this this depends on how far you want to go. If you want just mountains, then radial gradients are enough (just add the gradient values to the noise values). But if you want some more complex shapes, you are in for a treat.
I explore this idea to great depth in my project (please consider that the published version is just a prototype, which is currently undergoing major redesign, it is completely usable a map generator though).
Another way is to make the generation much more procedural - you just specify a sequence of mathematical functions, which you apply on the terrain. Even a simple value transformation can get you very far.
All of these methods should work just fine for hex grid. If artefacts occur because of the odd-row shift, then you could interpolate the odd rows instead (just calculate the height value for the vertex from the two vertices between which it is located with simple linear interpolation formula).
Consider a function, which maps the purple line into the blue curve - it emphasizes lower located heights as well as very high located heights, but makes the transition between them steeper (this example is just a cosine function, making the curve less smooth would make the transformation more prominent).
You could also only use bottom half of the curve, making peaks sharper and lower located areas flatter (thus more playable).
"sharpness" of the curve can be easily modulated with power (making the effect much more dramatic) or square root (decreasing the effect).
Implementation of this is actually extremely simple (especially if you use the cosine function) - just apply the function on each pixel in the map. If the function isn't so mathematically trivial, lookup tables work just fine (with cubic interpolation between the table values, linear interpolation creates artefacts).
Several more simple methods of "gamification" of random noise terrain can be found in this paper: "Realtime Synthesis of Eroded Fractal Terrain for Use in Computer Games".
Good luck with your project

At what phase in rendering does clipping occur?

I've got some OpenGL drawing code that I'm trying to optimize. It's currently testing all drawing objects for visibility client-side before deciding whether or not to send rendering data to OpenGL. (This is easier than it sounds. It's drawing a 2D scene so clipping is trivial: just test against the current coordinates of the viewport rectangle.)
It occurs to me that the entire model could be greatly simplified by passing the entire scene to OpenGL and letting the GPU take care of the clipping. But sometimes the total can be very, very complex, involving up to 100,000 total sprites, most of which never get rendered because they're off-camera, and I'd prefer to not end up killing the framerate in the name of simplicity.
I'm using OpenGL 2.0, and I've got a pretty simple vertex shader and a much more complicated fragment shader. Is there any guarantee that says that if the vertex shader runs and determines coordinates that are completely off-camera for all vertices of a polygon, that a clipping test will be applied somewhere between there and the fragment shader and prevent the fragment shader from ever running for that polygon? And if so, is this automatic or is there something I need to do to enable it? I've looked around online for information on this but I haven't found anything conclusive...
Clipping happens after the vertex transform stage before and after the NDC space; clip planes are applied in clip space, viewport clipping is done in NDC space. That is one step before rasterizing. Clipping means, that a face only partially visible is "cut" by inserting new vertices at the visibility border, or fragments outside the viewport discarded. What you mean is usually called culling. Faces completely outside the viewport are culled, at the same stage like clipping.
From a performance point of view, the best code is code never executed, and the best data is data never accessed. So in your case sending off a single drawing call that makes the GPU process a large batch of vertices clearly takes load off the CPU, but it consumes GPU processing power. Culling those vertices before sending the drawing command consumes CPU power, but takes load off the GPU. The goal is to find the right balance. If the number of vertices is low, a simple brute force approach (just render the whole thing) may easily outperform ever other scheme.
However using a simple, yet effective data management scheme can greatly improve performance on both ends. For example a spatial subdivision structure like a Kd tree is easily built (you don't have to balance it). Sorting the vertices into the Kd tree you can omit (cull) large portions of the tree if one branch near to the root is completely outside the viewport. Preparing drawing a frame you iterate through the visible parts of the tree, building the list of vertices to draw, then you pass this list to the rendering command. Kd trees can be traversed on average in O(n log n) time.
It's important to understand the difference between clipping and culling. You appear to be talking about the latter.
Clipping means taking a triangle and literally cutting it into pieces to fit into the viewport. The OpenGL specification defines this process to happen post-vertex shader, for any triangle that is only partially in view.
Culling means throwing something away entirely. If a triangle is not entirely in view, it can therefore be culled. OpenGL does not say that culling has to happen. Remember: the OpenGL specification defines behavior, not performance.
That being said, hardware makers are not stupid. Obvious efforts like not rasterizing triangles that are outside of the viewport are easily implemented and improve performance. Pretty much any hardware that exists will do this.
Similarly, clipping is typically implemented (where possible) with rasterizer tricks, rather than by creating new triangles. Fragments that would be outside of the viewport simply aren't generated by the rasterizer. This is also legal according to OpenGL, because the spec defines apparent behavior. It doesn't really care if you actually cut the triangle into pieces as long as it looks indistinguishable form if you did.
Your question is essentially one of, "How much work should I do to not render off-screen objects?" That really depends on what your scene is and how you're rendering it. You say you're rendering 100,000 sprites. Are you making 100,000 draw calls, or are these sprites part of larger structures that you render with larger granularity? Do you stream the vertex data to the GPU every frame, or is the vertex data static?
Clipping and culling happen before fragment processing. http://www.opengl.org/wiki/Rendering_Pipeline_Overview
However, you will still be passing 100000 * 4 vertices (assuming you're rendering the sprites with quads and not point sprites) to the card if you don't do culling yourself. Depending on the card's memory performance this can be an issue.