I'm learning Vulkan.
So far, in the sample programs that I've done, I've uploaded enough vertices to the GPU for drawing one or two quads. I also uploaded an image to display on the quad. It has all been very static.
I'm now interested in doing some dynamic tests -- specifically, creating and modifying sprites on the fly. I'm not sure how to go about it, so I'm hoping to get some pointers on possible techniques. I'm trying to create a toy 2D engine, for learning purposes.
Basically, I'm not sure of the best way to keep the vertex data up to date on the GPU. Below is the definition for one quad:
struct Vertex2d {
    glm::vec3 mPos;
    glm::vec2 mCoord;
};

// Vertices for one quad.
const std::vector<fmk::Vertex2d> quadVertices = {
    {{-0.5f, -0.5f, 0.0f}, {0.0f, 0.0f}}, // Vert 0: Top left
    {{ 0.5f, -0.5f, 0.0f}, {1.0f, 0.0f}}, // Vert 1: Top right
    {{ 0.5f,  0.5f, 0.0f}, {1.0f, 1.0f}}, // Vert 2: Bottom right
    {{-0.5f,  0.5f, 0.0f}, {0.0f, 1.0f}}, // Vert 3: Bottom left
};

const std::vector<uint16_t> quadIndexes = {
    0, 1, 2, 2, 3, 0,
};
The vertex data represents the quad's position, rotation, scale, and texture coordinates. Any of those properties could potentially change every frame. Also, new sprites can potentially be created or destroyed every frame.
Any pointers on data structures, techniques, functions, or any other info for managing sprites with Vulkan are appreciated.
EDIT:
I should add that I'm trying to avoid brute-force uploading all the vertices every frame. I'm currently implementing a brute-force approach so that I can compare it with a better solution, once I've learned of one.
If you want to change the positions of vertices stored in a buffer, you have two options:
You upload them from the CPU
You calculate them on the GPU
There are no other ways for the data to appear on the GPU: you either transfer it or generate it.
The first solution is the one you call brute force, but in many situations you cannot avoid it. One way or another, you have to transfer data to the GPU so it can be used for rendering. Besides, transfer rates are quite high on today's GPUs - it is possible to transfer several dozen gigabytes per second.
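For reference, here is a minimal sketch of what the brute-force upload can look like in Vulkan, assuming the vertex buffer's memory was allocated with the HOST_VISIBLE and HOST_COHERENT flags; the handles passed in are placeholders, and Vertex2d is the struct from the question:

#include <cstring>
#include <vector>
#include <vulkan/vulkan.h>

// Re-upload all sprite vertices each frame. HOST_COHERENT memory means
// no explicit vkFlushMappedMemoryRanges call is needed after the copy.
void uploadVertices(VkDevice device, VkDeviceMemory vertexMemory,
                    const std::vector<fmk::Vertex2d>& vertices) {
    const VkDeviceSize bytes = sizeof(fmk::Vertex2d) * vertices.size();
    void* mapped = nullptr;
    vkMapMemory(device, vertexMemory, 0, bytes, 0, &mapped);
    std::memcpy(mapped, vertices.data(), bytes);
    vkUnmapMemory(device, vertexMemory);
}

In practice you would map the memory once and keep it mapped (persistent mapping), and you need one buffer per frame in flight so you don't overwrite data the GPU is still reading.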
The second option is more procedural and doesn't require a data transfer between the CPU and the GPU. To do that, you can either create a formula and calculate positions on the fly in a vertex shader, based on time or some other parameter (without changing the original values), or - in a way similar to transform feedback - calculate positions in a compute shader, store them in a buffer, and then use that buffer for drawing. There is an example in the Vulkan Cookbook which does exactly that: it draws particles (sprites) whose positions are calculated in a compute shader.
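If you go the compute-shader route, the part that tends to trip people up is synchronizing the compute write with the vertex fetch. Here is a hedged sketch of the command-buffer recording, not taken from the Cookbook example; cmd, computePipeline, vertexBuffer, and spriteCount are placeholder names:

// Compute pass writes the positions buffer, then a barrier makes the
// writes visible to the vertex-input stage before drawing.
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, computePipeline);
// (descriptor sets pointing at the storage buffer are assumed bound)
vkCmdDispatch(cmd, (spriteCount + 63) / 64, 1, 1); // 64 threads per group

VkBufferMemoryBarrier barrier{};
barrier.sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;          // compute wrote
barrier.dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT; // vertex fetch reads
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.buffer = vertexBuffer;
barrier.offset = 0;
barrier.size = VK_WHOLE_SIZE;
vkCmdPipelineBarrier(cmd,
                     VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                     VK_PIPELINE_STAGE_VERTEX_INPUT_BIT,
                     0, 0, nullptr, 1, &barrier, 0, nullptr);
// ... begin the render pass and draw using vertexBuffer as usual ...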
And don't forget that you don't need to transfer all of the data with each vertex. To render a quad, you just need a single position (the center) and potentially a horizontal and vertical scale (the size of the quad in each dimension). Offsets, rotations, translations, and other operations don't need to be passed with each vertex, only once for the whole quad. So this also limits the amount of data you may need to transfer.
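To make that concrete, a compact per-sprite record might look something like this; the exact field layout is my assumption, not something from the post above:

#include <glm/glm.hpp>

// One record per sprite instead of four full vertices; the four quad
// corners are reconstructed in the vertex shader from this data.
struct SpriteInstance {
    glm::vec2 center;   // sprite position
    glm::vec2 halfSize; // horizontal/vertical scale
    float     rotation; // in radians
    glm::vec4 uvRect;   // texture sub-rectangle: (u0, v0, u1, v1)
};

At 36 bytes per sprite, this is less than half the 80 bytes of four Vertex2d vertices, so even the brute-force upload gets cheaper.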
I'm trying to optimize a falling sand simulation, implementing optimizations that the Noita devs talked about in their GDC talk. At around 10:45 they talk about how they use dirty rects, and I've started trying to implement a similar system.
Currently, I am able to create a dirty rect that covers the particles that need updating. Every time a valid particle (one that is not air or a solid like a wall) is set inside a chunk, I call a function to update the dirty rect, passing the placed particle's position as an argument. From there, I can easily calculate the new min/max of the rectangle from this position.
Here's a gif of that working.
And here's the code for updating the rect:
public void UpdateDirtyRect(int2 newPos)
{
    minX = Math.Min(minX, newPos.x);
    minY = Math.Min(minY, newPos.y);
    maxX = Math.Max(maxX, newPos.x);
    maxY = Math.Max(maxY, newPos.y);
    dirtyrect = .(.(minX, minY), .(maxX, maxY));
    // Inflate by two pixels. Not doing this will cause the rect to not change size as particles update.
    dirtyrect = dirtyrect.Inflate(2);
}
The problem, as can be seen in the gif, is that I currently have no way to shrink the dirty rect. I could do a few things, such as detecting when a particle on the boundary edge of the dirty rect is erased or replaced with an air/solid particle, but I'm unsure what to do from there.
Here's one approach that might work for you (a code sketch follows the steps).
1. Keep the dirty rectangle updated by the previous frame.
2. Compute the dirty rectangle updated by the current frame only.
3. Combine these two rectangles into a single one that contains both of them.
4. Use the rectangle from step 3 to update the screen.
5. Replace the previous frame's rectangle with the one you computed in step 2 - not the combined one from step 3, as doing so would cause the same problem you're describing.
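A minimal sketch of that bookkeeping in code; the Rect type, unionOf helper, and redrawRegion call are all hypothetical, just to illustrate the steps:

#include <algorithm>

// Hypothetical axis-aligned rect with min/max corners.
struct Rect { int minX, minY, maxX, maxY; };

// Smallest rect containing both inputs (step 3).
Rect unionOf(const Rect& a, const Rect& b) {
    return { std::min(a.minX, b.minX), std::min(a.minY, b.minY),
             std::max(a.maxX, b.maxX), std::max(a.maxY, b.maxY) };
}

void redrawRegion(const Rect& r); // hypothetical screen-update call

Rect prevFrameRect; // step 1: persists across frames

void endOfFrame(const Rect& currentFrameRect) { // step 2's result
    // Steps 3-4: redraw the union, so pixels the particles vacated
    // since the last frame get cleaned up too.
    redrawRegion(unionOf(prevFrameRect, currentFrameRect));
    // Step 5: carry over only the current frame's rect, which lets
    // the dirty area shrink once particles settle.
    prevFrameRect = currentFrameRect;
}

Because each frame's rect is rebuilt from scratch from the particles that actually moved, the union can shrink as soon as activity dies down.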
I'm developing an audio visualizer using libGDX.
I want to pass the audio spectrum data (an array containing the FFT of the audio sample) to a shader I took from Shadertoy: https://www.shadertoy.com/view/ttfGzH.
In the GLSL code, I expect a uniform containing the data as a texture:
uniform sampler2D iChannel0;
The problem is that I can't figure out how to pass an arbitrary array as a texture to a shader in libGDX.
I have already searched on SO and on the libGDX forum, but there isn't a satisfying answer to my problem.
Here is my Kotlin code (that obviously doesn't work xD):
val p = Pixmap(512, 1, Pixmap.Format.Alpha)
val t = Texture(p)
val map = p.pixels
map.putFloat(....) // fill the map with FFT data
[...]
t.bind(0)
shader.setUniformi("iChannel0", 0)
You could simply use the drawPixel method and store your data in the first channel of each pixel just like in the shadertoy example (they use the red channel).
float[] fftData = ...; // your data
Color tmpColor = new Color();
Pixmap pixmap = new Pixmap(fftData.length, 1, Pixmap.Format.RGBA8888);
for (int i = 0; i < fftData.length; i++)
{
    tmpColor.set(fftData[i], 0, 0, 0); // using only 1 channel per pixel
    pixmap.drawPixel(i, 0, Color.rgba8888(tmpColor));
}
// then create your texture and bind it to the shader
To be more efficient and use 4x less memory (and possibly fewer samples, depending on the shader), you could use 4 channels per pixel by splitting your data across the r, g, b, and a channels. However, this will complicate the shader a bit.
The data being passed in the shader example you provided is not arbitrary, though; it has pretty limited precision and ranges between 0 and 1. If you want to increase precision, you may want to store the floating-point value across multiple channels (although the IEEE recomposition in the shader may be painful) or pass an integer to be scaled down (fixed point). If you need data between -inf and inf, you can use sigmoid and anti-sigmoid functions, at the cost of greatly reducing precision again. I believe this technique will work for your example, though, as it seems to only require values between 0 and 1, and precision is not super important because the result is smoothed.
Intro
I am trying to render squares in DirectX 11 in the most efficient way. Each square has a color (float3) and a position (float3). The typical count of squares is about 5 million.
I tried 3 ways:
Render raw data
Use geometry shader
Use instanced rendering
Raw data means that each square is represented as 4 vertices in the vertex buffer and two triangles in the index buffer.
Geometry shader and instanced rendering mean that each square has just one vertex in the vertex buffer.
My results (on an NVIDIA GTX 960M) for 5M squares are:
Geometry shader 22 FPS
Instanced rendering 30 FPS
Raw data rendering 41 FPS
I expected that the geometry shader would not be the most efficient method. On the other hand, I am surprised that instanced rendering is slower than raw data. The computation in the vertex shader is exactly the same: just a multiplication with a transform matrix stored in a constant buffer, plus the addition of the Shift variable.
Raw data input
struct VSInput {
    float3 Position : POSITION0;
    float3 Color : COLOR0;
    float2 Shift : TEXCOORD0; // xy deviation from the square center
};
Instanced rendering input
struct VSInputPerVertex {
    float2 Shift : TEXCOORD0;
};

struct VSInputPerInstance {
    float3 Position : POSITION0;
    float3 Color : COLOR0;
};
Note
For bigger models (20M squares), instanced rendering is more efficient (evidently because of memory traffic).
Question
Why is instanced rendering slower than raw data rendering in the case of 5M squares? Is there another, more efficient way to accomplish this rendering task? Am I missing something?
Edit
StructuredBuffer method
One possible solution is to use a StructuredBuffer, as @galop1n suggested (for details, see his answer).
My results (on an NVIDIA GTX 960M) for 5M squares:
StructuredBuffer 48 FPS
Observations
Sometimes I observed the StructuredBuffer method oscillating between 30 FPS and 55 FPS (accumulated over 100 frames); the median is 48 FPS. It seems to be a little unstable. I did not observe this with the previous methods.
Consider the balance between draw calls and StructuredBuffer sizes. For smaller models, I reached the fastest behavior when I used buffers with 1K-4K points. When I tried to render the 5M square model, I had a big number of draw calls and it was not efficient (30 FPS). The best behavior I observed with 5M squares was with 16K points per buffer; 32K and 8K points per buffer seemed to be slower settings.
A small vertex count per instance is usually a good way to underutilize the hardware. I suggest the following variant instead; it should provide good performance on every vendor.
context->VSSetShaderResources(0, 1, &quadData);
context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
context->Draw(6 * quadCount, 0);
In the vertex shader, you have
struct Quad {
    float3 pos;
    float3 color;
};

StructuredBuffer<Quad> quads : register(t0);
And to rebuild your quads in the vertex shader:
// Shift for each of the 6 vertices (two triangles per quad); the
// corner values here are filled in as an illustration.
static const float2 shifts[6] = {
    float2(-1,-1), float2(1,-1), float2(-1,1),
    float2(-1,1),  float2(1,-1), float2(1,1)
};

void main(uint vtx : SV_VertexID, out YourStuff yourStuff) {
    Quad quad = quads[vtx / 6];
    float2 offs = shifts[vtx % 6];
    // ... expand quad.pos by offs, transform, and write the outputs ...
}
Then rebuild the vertex position and transform as usual. Note that because you bypass the input assembler stage, if you want to send colors as rgba8, you need to use a uint and unpack it manually yourself. Bandwidth usage will be lower if you have millions of quads to draw.
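For completeness, here is a hedged sketch of the CPU-side setup this draw path assumes - creating the structured buffer and the SRV bound as quadData above. This is my addition, not part of the original answer:

#include <d3d11.h>

struct Quad { float pos[3]; float color[3]; }; // mirrors the HLSL struct

// Creates an immutable structured buffer holding all quads, plus the
// shader resource view the vertex shader reads through register t0.
bool CreateQuadBuffer(ID3D11Device* device, const Quad* quads, UINT quadCount,
                      ID3D11Buffer** buffer, ID3D11ShaderResourceView** quadData)
{
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth = sizeof(Quad) * quadCount;
    desc.Usage = D3D11_USAGE_IMMUTABLE;
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
    desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
    desc.StructureByteStride = sizeof(Quad);

    D3D11_SUBRESOURCE_DATA init = {};
    init.pSysMem = quads;
    if (FAILED(device->CreateBuffer(&desc, &init, buffer)))
        return false;

    D3D11_SHADER_RESOURCE_VIEW_DESC srv = {};
    srv.Format = DXGI_FORMAT_UNKNOWN; // structured buffers must use UNKNOWN
    srv.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
    srv.Buffer.FirstElement = 0;
    srv.Buffer.NumElements = quadCount;
    return SUCCEEDED(device->CreateShaderResourceView(*buffer, &srv, quadData));
}

If the quads change every frame, D3D11_USAGE_DYNAMIC with D3D11_CPU_ACCESS_WRITE and a Map/Unmap per frame would be the usual variant.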
I am developing an iOS game that will need to render 500-800 particles at a time. I have learned that it is a good idea to batch render many sprites in OpenGL ES instead of calling glDrawArrays(..) on every sprite in the game in order to be able to render more sprites w/out a drastic reduction in frame rate.
My question is: how do I batch render 500+ particles that all have different alphas, rotations, and scales, but share the same texture atlas? The emphasis of this question is on the different alphas, rotations, and scales for each particle.
I realize this question is very similar to How do I draw 1000+ particles (w/ unique rotation, scale, and alpha) in iPhone OpenGL ES particle system without slowing down the game?, however, that question does not address batch rendering. Before I take advantage of vertex buffer objects, I want to understand batch rendering in OpenGL ES w/ unique alphas, rotations, and scales (but with the same texture). Therefore, while I plan on using VBOs eventually, I want to take this approach first.
Code examples would greatly be appreciated, and if you use an indices array as some examples do, please explain the structure and purpose of the indices array.
EDIT I am using OpenGL ES 1.1.
EDIT Below is a code example of how I render each particle in the scene. Assume that they share the same texture and that texture is already bound in OpenGL ES 1.1 before this code executes.
- (void)render {
    glPushMatrix();

    glTranslatef(translation.x, translation.y, translation.z);
    glRotatef(rotation.x, 1, 0, 0);
    glRotatef(rotation.y, 0, 1, 0);
    glRotatef(rotation.z, 0, 0, 1);
    glScalef(scale.x, scale.y, scale.z);

    // change alpha
    glColor4f(1.0, 1.0, 1.0, alpha);

    // glBindTexture(GL_TEXTURE_2D, texture[0]);
    glVertexPointer(2, GL_FLOAT, 0, texturedQuad.vertices);
    glEnableClientState(GL_VERTEX_ARRAY);
    glTexCoordPointer(2, GL_FLOAT, 0, texturedQuad.textureCoords);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);

    glPopMatrix();
}
A code alternative to this method would be greatly appreciated!
One possibility would be to include those values in a vertex attrib array - I think this is the best option, but if you're using OpenGL ES 1.1 instead of 2.0, this method isn't available to you. Vertex attrib arrays allow you to store values at each vertex; in this case you could store the alphas and rotations each in their own attrib array, pass them to the shader with glVertexAttribPointer, and have the shader do the rotation transformation and color processing with the alpha.
The other option would be to do the rotation transformation on the CPU and then batch particles with similar alpha values into several draw calls. This version would require a little more work and would not be a single draw call, but it would still help with optimization if a shader is not an option.
NOTE: The question you linked to also recommends the array solution
EDIT: Given that your code is OpenGL ES 1.1, here's a solution using glColorPointer:
// Allocate buffers to store an array of all particle data
verticesBuffer = ...
texCoordBuffer = ...
colorBuffer = ...

for (particle in allParticles)
{
    // Create a matrix from the rotation
    rotMatrix = matrix(particle.rotation.x, particle.rotation.y, particle.rotation.z)
    // Transform the particle's vertices by the matrix
    verticesBuffer[i] = particle.vertices * rotMatrix
    // Copy the other per-particle data
    texCoordBuffer[i] = particle.texCoords
    colorBuffer[i] = color(1.0, 1.0, 1.0, particle.alpha)
}

glVertexPointer(2, GL_FLOAT, 0, verticesBuffer)
glTexCoordPointer(2, GL_FLOAT, 0, texCoordBuffer)
glColorPointer(4, GL_FLOAT, 0, colorBuffer)
// One call for everything; batched quads can't share a single
// triangle strip, so use GL_TRIANGLES (6 vertices per quad) or an
// index array with glDrawElements
glDrawArrays(GL_TRIANGLES, 0, particleCount * 6)
A good optimization for this solution would be to reuse the buffers across renders so you don't have to reallocate them every frame.
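Since the question specifically asked about the structure and purpose of an indices array: when quads are batched, each particle contributes 4 unique vertices, and the index array repeats two of them per quad to form two triangles, avoiding duplicated vertex data. A hedged sketch of building such an array (function and variable names are mine, for illustration):

#include <cstdint>
#include <vector>

// 6 indices per quad referencing that quad's 4 vertices (at base q*4);
// the 0,1,2, 2,3,0 pattern splits the quad into two triangles.
std::vector<uint16_t> buildQuadIndices(size_t quadCount) {
    std::vector<uint16_t> indices;
    indices.reserve(quadCount * 6);
    for (size_t q = 0; q < quadCount; ++q) {
        const uint16_t base = static_cast<uint16_t>(q * 4);
        const uint16_t pattern[6] = { 0, 1, 2, 2, 3, 0 };
        for (uint16_t p : pattern)
            indices.push_back(static_cast<uint16_t>(base + p));
    }
    return indices;
}

// Usage with the batched buffers above:
//   glDrawElements(GL_TRIANGLES, quadCount * 6, GL_UNSIGNED_SHORT,
//                  indices.data());

Note that 16-bit indices cap a single batch at 16384 quads (65536 / 4 vertices); split into multiple draws beyond that.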
I am new to GLUT and OpenGL. I need to draw a scatterplot matrix for an n-dimensional array.
I have saved the data from a CSV into a vector of vectors, where each inner vector corresponds to a row. I have plotted just one scatterplot and used GL_LINES to draw the grid. My question:
1. How do I draw points in a particular grid cell? Using GL_POINTS, I can only draw points relative to the entire window.
Please let me know if you need any further info to answer this question.
Thanks
What you need to do is transform your data's (x, y) coordinates into screen coordinates. The most straightforward way to do this actually does not rely on OpenGL or GLUT; all you have to do is use a little math. Determine the screen (x, y) coordinates of the place where you want the datapoint (0,0) to appear on the screen, and then determine how far apart you want one increment to be. Then take your original data points, apply the offset, and scale them to get your screen coordinates, which you then pass into glVertex2f() (or whatever function you are using to specify points in your API).
For instance, you might decide you want point (0,0) in your data to be at location (100,100) on your screen, and the distance between 0 and 1 in your data to be 30 pixels on the screen. That operation looks like this:
int x = 0, y = 0; //Original data points
int scaleX = 30, scaleY = 30; //Scaling values for each component
int offsetX = 100, offsetY = 100; //Where you want the origin of your graph to be
// Apply the scaling values and offsets:
int screenX = x * scaleX + offsetX;
int screenY = y * scaleY + offsetY;
// Calls to your drawing functions using screenX and screenY as your coordinates
You will have to determine values that make sense for the scaling and offsets. You can also have your program use different values for different sets of data, so you can display multiple graphs on the same screen. But this is a simple way to do it.
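Since your end goal is a scatterplot matrix, the same offset-and-scale idea extends naturally: give each cell of the matrix its own offset. A hedged sketch, assuming data normalized to [0, 1] and a fixed window size (all names here are mine, for illustration):

#include <GL/glut.h>
#include <utility>
#include <vector>

// Draws one dataset into cell (row, col) of an n-by-n scatterplot
// matrix in a windowW x windowH window, using immediate mode.
void drawCell(int row, int col, int n, int windowW, int windowH,
              const std::vector<std::pair<float, float>>& points) {
    const float cellW = windowW / float(n);
    const float cellH = windowH / float(n);
    const float offsetX = col * cellW; // the cell's own origin offset
    const float offsetY = row * cellH;

    glBegin(GL_POINTS);
    for (const auto& p : points)
        glVertex2f(offsetX + p.first * cellW,   // scale into the cell
                   offsetY + p.second * cellH);
    glEnd();
}

This assumes an orthographic projection where one unit equals one pixel, e.g. set up with gluOrtho2D(0, windowW, 0, windowH).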
There are also other ways to go about this. OpenGL has very powerful coordinate transformation functions and matrix math capabilities, which may become more useful as you develop increasingly elaborate programs. They're most useful if you're going to be moving things around the screen in real time or operating on very large data sets, as they let your graphics hardware perform these calculations much faster than the CPU could. However, for simple calculations like these, performed once or infrequently on limited sets of data, the CPU time involved is not a problem on today's computers.