Intro
I am trying to render squares in DirectX 11 in the most efficient way. Each square has a color (float3) and a position (float3). A typical count is about 5 million squares.
I tried 3 ways:
Render raw data
Use geometry shader
Use instanced rendering
Raw data means that each square is represented as four vertices in the vertex buffer and two triangles in the index buffer.
Geometry shader and instanced rendering mean that each square has just one vertex in the vertex buffer.
My results (on nvidia GTX960M) for 5M squares are:
Geometry shader 22 FPS
Instanced rendering 30 FPS
Raw data rendering 41 FPS
I expected that the geometry shader would not be the most efficient method. On the other hand, I am surprised that instanced rendering is slower than raw data. The computation in the vertex shader is exactly the same: just a multiplication by a transform matrix stored in a constant buffer plus the addition of the Shift variable.
Raw data input
struct VSInput
{
    float3 Position : POSITION0;
    float3 Color : COLOR0;
    float2 Shift : TEXCOORD0; // xy deviation from the square center
};
Instanced rendering input
struct VSInputPerVertex
{
    float2 Shift : TEXCOORD0;
};
struct VSInputPerInstance
{
    float3 Position : POSITION0;
    float3 Color : COLOR0;
};
Note
For bigger models (20M squares), instanced rendering is more efficient (evidently because of memory traffic).
Question
Why is instanced rendering slower than raw data rendering in the case of 5M squares? Is there another, more efficient way to accomplish this rendering task? Am I missing something?
Edit
StructuredBuffer method
One possible solution is to use a StructuredBuffer, as #galop1n suggested (for details see his answer).
My results (on nvidia GTX960M) for 5M squares
StructuredBuffer 48 FPS
Observations
Sometimes I observed the StructuredBuffer method oscillating between 30 FPS and 55 FPS (accumulated over 100 frames). It seems to be a little unstable; the median is 48 FPS. I did not observe this with the previous methods.
Consider the balance between draw calls and StructuredBuffer sizes. For smaller models, I reached the fastest behavior with buffers of 1K - 4K points. When I tried to render the 5M square model with buffers that small, the large number of draw calls made it inefficient (30 FPS). The best behavior I observed with 5M squares was with 16K points per buffer; 32K and 8K points per buffer seemed to be slower settings.
A small vertex count per instance is usually a good way to underutilize the hardware. I suggest the following variant instead; it should provide good performance on every vendor.
// Bind the quad buffer's SRV to the vertex shader, set the topology, and draw six vertices per quad
// (quadData here is an ID3D11ShaderResourceView*, context an ID3D11DeviceContext*).
context->VSSetShaderResources(0, 1, &quadData);
context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
context->Draw(6 * quadCount, 0);
In the vertex shader, you have
struct Quad {
    float3 pos;
    float3 color;
};

StructuredBuffer<Quad> quads : register(t0);
And to rebuild your quads in the vertex shader:
// shift for each vertex
static const float2 shifts[6] = { float2(-1,-1), ..., float2(1,1) };

void main( uint vtx : SV_VertexID, out YourStuff yourStuff ) {
    Quad quad = quads[vtx / 6];
    float2 offs = shifts[vtx % 6];
}
Then rebuild the vertex position and transform as usual. Note that because you bypass the input assembler stage, if you want to send colors as rgba8 you need to use a uint and unpack it yourself manually. The bandwidth usage will be lower if you have millions of quads to draw.
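For completeness, here is a minimal C++ sketch of how the quad StructuredBuffer and its SRV could be created on the CPU side. The function and variable names are illustrative assumptions, not part of the original answer.

#include <d3d11.h>
#include <vector>

struct Quad { float pos[3]; float color[3]; };   // must match the HLSL Quad layout

// Creates an immutable structured buffer holding the quad data plus an SRV for it.
HRESULT CreateQuadBuffer(ID3D11Device* device,
                         const std::vector<Quad>& quads,
                         ID3D11Buffer** outBuffer,
                         ID3D11ShaderResourceView** outSRV)
{
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth           = UINT(quads.size() * sizeof(Quad));
    desc.Usage               = D3D11_USAGE_IMMUTABLE;
    desc.BindFlags           = D3D11_BIND_SHADER_RESOURCE;
    desc.MiscFlags           = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
    desc.StructureByteStride = sizeof(Quad);

    D3D11_SUBRESOURCE_DATA init = {};
    init.pSysMem = quads.data();

    HRESULT hr = device->CreateBuffer(&desc, &init, outBuffer);
    if (FAILED(hr)) return hr;

    D3D11_SHADER_RESOURCE_VIEW_DESC srv = {};
    srv.Format              = DXGI_FORMAT_UNKNOWN;            // required for structured buffers
    srv.ViewDimension       = D3D11_SRV_DIMENSION_BUFFER;
    srv.Buffer.FirstElement = 0;
    srv.Buffer.NumElements  = UINT(quads.size());
    return device->CreateShaderResourceView(*outBuffer, &srv, outSRV);
}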
Related
I'm developing an audio visualizer using libGDX.
I want to pass the audio spectrum data (an array containing the FFT of the audio sample) to a shader I took from Shadertoy: https://www.shadertoy.com/view/ttfGzH.
In the GLSL code I expect a uniform containing the data as a texture:
uniform sampler2D iChannel0;
The problem is that I can't figure out how to pass an arbitrary array as a texture to a shader in libGDX.
I already searched in SO and in libGDX's forum but there isn't a satisfying answer to my problem.
Here is my Kotlin code (that obviously doesn't work xD):
val p = Pixmap(512, 1, Pixmap.Format.Alpha)
val t = Texture(p)
val map = p.pixels
map.putFloat(....) // fill the map with FFT data
[...]
t.bind(0)
shader.setUniformi("iChannel0", 0)
You could simply use the drawPixel method and store your data in the first channel of each pixel just like in the shadertoy example (they use the red channel).
float[] fftData = // your data
Color tmpColor = new Color();
Pixmap pixmap = new Pixmap(fftData.length, 1, Pixmap.Format.RGBA8888);
for (int i = 0; i < fftData.length; i++)
{
    tmpColor.set(fftData[i], 0, 0, 0); // using only 1 channel per pixel
    pixmap.drawPixel(i, 0, Color.rgba8888(tmpColor));
}
// then create your texture and bind it to the shader
To be more efficient and use 4x less memory (and possibly fewer samples, depending on the shader), you could use 4 channels per pixel by splitting your data across the r, g, b and a channels. However, this will make the shader a bit more complex.
The data being passed to the shader in the example you provided is not arbitrary, though; it has fairly limited precision and ranges between 0 and 1. If you want to increase precision, you may want to store the floating-point value across multiple channels (although the IEEE recomposition in the shader may be painful) or pass an integer to be scaled down (fixed point). If you need data between -inf and +inf you could use sigmoid and inverse-sigmoid functions, at the cost of again greatly reducing the precision. I believe this technique will work for your example, though, as it only seems to require values between 0 and 1 and precision is not critical because the result is smoothed.
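As a language-agnostic illustration of the fixed-point idea (just a sketch in C++, not libGDX code), a value in [0, 1] can be quantized to 16 bits and split across two 8-bit channels, then recombined on the shader side:

#include <cstdint>

// Pack a value in [0, 1] into two 8-bit channels (16-bit fixed point).
void packFixed16(float value, uint8_t& hi, uint8_t& lo)
{
    uint32_t fixed = uint32_t(value * 65535.0f + 0.5f); // quantize to 0..65535
    hi = uint8_t(fixed >> 8);   // store in, e.g., the red channel
    lo = uint8_t(fixed & 0xFF); // store in, e.g., the green channel
}

// The shader recombines it as value = (hi * 256 + lo) / 65535,
// where hi and lo are the raw 0..255 channel values.
float unpackFixed16(uint8_t hi, uint8_t lo)
{
    return (hi * 256.0f + lo) / 65535.0f;
}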
I'm learning Vulkan.
So far, in the sample programs that I've done, I've uploaded enough vertices to the GPU for drawing one or two quads. I also uploaded an image to display on the quad. It has all been very static.
I'm now interested in doing some dynamic tests -- specifically, creating and modifying sprites on the fly. I'm not sure how to go about it, so I'm hoping to get some pointers about possible techniques. I'm trying to create a toy 2D engine, for learning purposes.
Basically, I'm not sure what's the best way to maintain the vertex data up-to-date on the GPU. Below is the definition for one quad:
struct Vertex2d {
    glm::vec3 mPos;
    glm::vec2 mCoord;
};
// Vertices for one quad.
const std::vector<fmk::Vertex2d> quadVertices = {
{{-0.5f, -0.5f, 0.0f}, {0.0f, 0.0f}}, // Vert 0: Top left
{{0.5f, -0.5f, 0.0f}, {1.0f, 0.0f}}, // Vert 1: Top Right
{{0.5f, 0.5f, 0.0f}, {1.0f, 1.0f}}, // Vert 2: Bottom Right
{{-0.5f, 0.5f, 0.0f}, {0.0f, 1.0f}}, // Vert 3: Bottom Left
};
const std::vector<uint16_t> quadIndexes = {
0, 1, 2, 2, 3, 0,
};
The vertex data represents the quad's position, rotation, scale, and texture coordinates. Any of those properties could potentially change every frame. Also, new sprites can potentially be created or destroyed every frame.
Any pointers on data structures, techniques, functions, or any other information about managing sprites with Vulkan are appreciated.
EDIT:
I should add that I'm trying to avoid brute-force uploading all the vertices every frame. I'm currently trying to implement a brute-force approach, so that I can compare it with a good solution, once I've learned of one.
If you want to change the positions of vertices stored in a buffer, you have two options:
You upload them from the CPU
You calculate them on the GPU
There are no other ways for the data to appear on the GPU: you either transfer it or generate it.
The first solution is the one you call brute force, but in many situations you cannot avoid it. One way or another you have to transfer the data to the GPU so it can be used for rendering. Besides, transfer rates are quite high on today's GPUs; it is possible to transfer several dozen gigabytes per second.
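As a sketch of that upload path (using the Vertex2d struct from the question; the handle names are assumptions and buffer/memory creation is omitted), a host-visible, host-coherent vertex buffer can be mapped once and simply rewritten each frame:

#include <vulkan/vulkan.h>
#include <glm/glm.hpp>
#include <cstring>
#include <vector>

struct Vertex2d { glm::vec3 mPos; glm::vec2 mCoord; };  // same layout as in the question

void* gMappedPtr = nullptr;

// Called once, after allocating 'memory' with HOST_VISIBLE | HOST_COHERENT properties.
void MapVertexMemoryOnce(VkDevice device, VkDeviceMemory memory, VkDeviceSize size)
{
    vkMapMemory(device, memory, 0, size, 0, &gMappedPtr);
}

// Called every frame: copy the freshly built vertex data into the mapped pointer.
// HOST_COHERENT memory needs no explicit vkFlushMappedMemoryRanges call.
void UploadVertices(const std::vector<Vertex2d>& vertices)
{
    std::memcpy(gMappedPtr, vertices.data(), vertices.size() * sizeof(Vertex2d));
}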
The second option is more procedural and doesn't require data transfers between the CPU and GPU. You can either create a formula that calculates positions on the fly in the vertex shader, based on time or some other parameter (without changing the original values), or, similar to transform feedback, calculate positions in a compute shader, store them in a buffer, and then use that buffer for drawing. There is an example in the Vulkan Cookbook which does exactly that: it draws particles (sprites) whose positions are calculated in a compute shader.
And don't forget that you don't need to transfer all the data with every vertex. To render a quad you just need a single position (its center) and potentially a horizontal and vertical scale (the size of the quad in each dimension). Offsets, rotations, translations and other operations don't need to be passed with each vertex, only once for the whole quad, as sketched below. So this also limits the amount of data you may need to transfer.
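For example, in Vulkan that per-quad data can be fed through a second vertex binding with VK_VERTEX_INPUT_RATE_INSTANCE, so it is fetched once per quad instead of once per vertex. A rough sketch (the QuadInstance struct and layout are assumptions for illustration):

#include <vulkan/vulkan.h>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical per-quad data: one entry per sprite.
struct QuadInstance {
    float center[3];
    float scale[2];
};

// Binding 0: the four shared corner vertices (position + uv), advanced per vertex.
// Binding 1: per-quad data, advanced once per instance.
const std::array<VkVertexInputBindingDescription, 2> bindings{{
    { 0, sizeof(float) * 5,    VK_VERTEX_INPUT_RATE_VERTEX },
    { 1, sizeof(QuadInstance), VK_VERTEX_INPUT_RATE_INSTANCE },
}};

const std::array<VkVertexInputAttributeDescription, 4> attributes{{
    { 0, 0, VK_FORMAT_R32G32B32_SFLOAT, 0 },                                   // corner position
    { 1, 0, VK_FORMAT_R32G32_SFLOAT,    sizeof(float) * 3 },                   // corner uv
    { 2, 1, VK_FORMAT_R32G32B32_SFLOAT, uint32_t(offsetof(QuadInstance, center)) }, // quad center
    { 3, 1, VK_FORMAT_R32G32_SFLOAT,    uint32_t(offsetof(QuadInstance, scale)) },  // quad scale
}};

// Drawing then becomes: vkCmdDrawIndexed(cmd, 6, quadCount, 0, 0, 0);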
I have been learning DirectX 11, and the book I am reading states that the rasterizer outputs fragments. My understanding is that these fragments are the output of the rasterizer (which takes geometric primitives as input) and are in fact just 2D positions (on your 2D render target view).
Here is what I think I understand; please correct me if I'm wrong.
The rasterizer takes geometric primitives (spheres, cubes or boxes, toroids, cylinders, pyramids, triangle meshes or polygon meshes; see https://en.wikipedia.org/wiki/Geometric_primitive). It then translates these primitives into pixels (or dots) that are mapped onto your render target view (which is 2D). This is what a fragment is. For each fragment, it executes the pixel shader to determine its color.
However, I am only assuming this, because I cannot find a simple explanation of what a fragment actually is.
So my questions are ...
1: What is a rasterizer? What are its inputs, and what is its output?
2: What is a fragment, in relation to the rasterizer's output?
3: Why is a fragment a float4 value (SV_Position) if it is just a 2D position in the render target's screen space?
4: How does it correlate to the render target output (the 2D screen texture)?
5: Is this why we clear the render target view (to whatever color)? Because the rasterizer and pixel shader will not execute on every X,Y location of the render target view?
Thank you!
I do not use DirectX but OpenGL instead; however, the terminology should be similar if not the same. My understanding is this:
(scene geometry)     -> [Vertex shader]                  -> (per-vertex data)
(per-vertex data)    -> [Geometry & Tessellation shader] -> (per-primitive data)
(per-primitive data) -> [Rasterizer]                     -> (per-fragment data)
(per-fragment data)  -> [Fragment shader]                -> (fragment)
(fragment)           -> [depth/stencil/alpha/blend ...]  -> (pixels)
So in the vertex shader you can perform any per-vertex operations, like coordinate system transforms, pre-computation of needed parameters, etc.
In the geometry and tessellation shaders you can compute normals from geometry, emit/convert primitives, and much more.
The rasterizer then converts the geometry (primitives) into fragments. This is done by interpolation: it basically divides the viewed part of any primitive into fragments; see convex polygon rasterizer.
Fragments are not pixels nor superpixels, but they are close to it. The difference is that they may or may not be output, depending on the circumstances and the pipeline configuration (pixels are the visible outputs). You can think of them as potential super-pixels.
The fragment shader converts per-fragment data into final fragments. Here you compute per-fragment/per-pixel lighting and shading, do all the texture work, compute colors, etc. The output is also a fragment, which is basically a pixel plus some additional information, so it does not have just a position and a color but can have other properties as well (like more colors, depth, alpha, stencil, etc.).
This goes into the final combiner, which performs the depth test and any other enabled tests or functionality, like blending. Only that output goes into the framebuffer as a pixel.
I think that answers #1, #2 and #4.
Now for #3 (I may be wrong here due to my lack of knowledge about DirectX): in the per-fragment data you often need the 3D position of the fragment for proper lighting or whatever other computations, and since homogeneous coordinates are used, we need a 4D (x,y,z,w) vector for it. The fragment itself has 2D coordinates, but the 3D position is an interpolated value from the geometry passed on by the vertex shader. So it may not contain the screen position but world coordinates instead (or any other space).
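To make the relation concrete, here is a small sketch of the standard perspective divide plus viewport transform that turns a homogeneous clip-space position into a 2D pixel location (the Direct3D-style y flip is an assumption about the convention in use):

struct Float4 { float x, y, z, w; };

// Maps a clip-space position (what SV_Position holds before rasterization)
// to pixel coordinates on a viewportWidth x viewportHeight render target.
void clipToScreen(Float4 clip, float viewportWidth, float viewportHeight,
                  float& screenX, float& screenY, float& depth)
{
    // Perspective divide: clip space -> normalized device coordinates (NDC).
    float ndcX = clip.x / clip.w;
    float ndcY = clip.y / clip.w;
    depth      = clip.z / clip.w;          // what the depth test uses

    // Viewport transform: NDC [-1, 1] -> pixels; y is flipped in Direct3D.
    screenX = (ndcX * 0.5f + 0.5f) * viewportWidth;
    screenY = (1.0f - (ndcY * 0.5f + 0.5f)) * viewportHeight;
}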
#5: Yes, the scene may not cover the whole screen, and you also need to preset buffers like depth, stencil and alpha so that rendering works as it should and is not invalidated by the previous frame's results. So we usually need to clear the framebuffers at the start of a frame. Some techniques require multiple clears per frame; others (like a glow effect) clear once per several frames ...
I've got a bunch of thumbnails/icons packed right up next to each other in a texture map / sprite sheet. In terms of the pixel-to-pixel relationship, they are being scaled up from 145 pixels square to 238 screen pixels square. I was expecting to get +-1 or 2 pixels of accuracy on the edges of the box when accessing the texture coordinates, so I'm also drawing a 4 pixel outline on top of the thumbnail to hide this probable artifact. But I'm seeing huge variations in accuracy. Sometimes it's off in one direction, sometimes the other.
I've checked over the math and I can't figure out what's happening.
The thumbnail is being scaled up about 1.64 times, so a single pixel off in the source texture coordinate could result in around 2 pixels off on the screen. The 4 pixel white frame over top is being drawn at a 1:1 pixel-to-fragment relationship and is supposed to cover about 2 pixels on either side of the edge of the box. That part is working. Here I've turned off the border to show how far off the texture coordinates are....
I can tweak the numbers manually to make it go away. But I have to shrink the texture coordinate width/height by several source pixels and in some cases add (or subtract) 5 or 6 pixels to the starting point. I really just want the math to work out or to figure out what I'm doing wrong here. This sort of stuff drives me nuts!
A bunch of crap to know.
The shader is doing the texture coordinate offsetting in the vertex shader...
v_fragmentTexCoord0 = vec2((a_vertexTexCoord0.x * u_texScale) + u_texOffset.s, (a_vertexTexCoord0.y * u_texScale) + u_texOffset.t);
gl_Position = u_modelViewProjectionMatrix * vec4(a_vertexPosition,1.0);
This object is a box which is a triangle strip with 2 tris.
Not that it should matter, but the matrix applied to the model isn't doing any scaling. The box is at screen scale. The scaling is happening only in the texture coordinates that are being supplied.
The texture coordinates of the object as seen above are 0.00 - 0.07, and then in the shader an offset amount is added, different per thumbnail. 0.07 out of 2048 is about 143 pixels. Originally I had it at 0.0708, which should be closer to 145, but it was worse and showed more like 148 pixels of the texture. To get it to show only 145 source pixels I have to make it 0.06835, which is 140 pixels.
I've tried doing the math in a calculator and typing in the numbers directly. I've also tried expressions like 1305.0/2048.0. These are going into GLfloats, not doubles.
This texture map image is PNG and is loaded with these settings:
glTexParameteri(GL_TEXTURE_2D,GL_TEXTURE_MIN_FILTER,GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D,GL_TEXTURE_MAG_FILTER,GL_NEAREST);
glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE );
glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE );
but I've also tried GL_LINEAR with no apparent difference.
I'm not having any accuracy problems on other textures (in the same texture map) where I'm not doing the texture scaling.
It doesn't get farther off as the coordinates get higher. In the image above, the NEG MAP thumb is right next to the HEAT MAP thumb, and they are off in different directions but correct at the seam.
Here's the offset data for those two:
filterTypes[FT_gradientMap20].thumbTexOffsetS = 0.63720703125;
filterTypes[FT_gradientMap20].thumbTexOffsetT = 0.1416015625;
filterTypes[FT_gradientMap21].thumbTexOffsetS = 0.7080078125;
filterTypes[FT_gradientMap21].thumbTexOffsetT = 0.1416015625;
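As a sanity check against the 2048-pixel-wide atlas, these offsets land exactly on pixel boundaries: 0.63720703125 × 2048 = 1305 and 0.7080078125 × 2048 = 1450, so the two S offsets are exactly 145 source pixels apart, and 0.1416015625 × 2048 = 290 for T.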
==== UPDATE ====
A couple of things I realized right off the bat that I was doing wrong, which are discussed over here: OpenGL Texture Coordinates in Pixel Space
The width of a single thumbnail is 145. But that would be 0-144, with 145 starting the next one. I was using a width of 145 so that's going to be 1 pixel too big. Using the above center of pixel type math, we should actually go from the center of 0 to the center of 144. 144.5 - 0.5 = 144.
Using his formula of (2i + 1)/(2N) I made new offset amounts for each of the starting points and used 144/2048 as the width. That made things better but still off in some areas, and again off in one direction sometimes and the other at other times, although consistent for each x or y position.
Using a width of 143 gives better results. But I can fix them all just by adjusting the numbers manually until they work. I want the math to make it work out right.
... or.. maybe it has something to do with min/mag filtering - although I read up on that and what I'm doing seems right for this case.
After a lot of experiments and having to create a grid-lined guide texture so I could see exactly how far off each texture was... I finally got it!
It's pretty simple actually.
uniform mat4 u_modelViewProjectionMatrix;
uniform mediump vec2 u_texOffset;
uniform mediump float u_texScale;
attribute vec3 a_vertexPosition;
attribute mediump vec2 a_vertexTexCoord0;
It was the precision of the texture coordinates. By specifying mediump, the problem just fixed itself. I suspect this would also help solve the problem I was having in this question:
Why is a texture coordinate of 1.0 getting beyond the edge of the texture?
Once I did that, I had to go back to my original 145 width (which still seems wrong, but oh well). And for what it's worth, I then ended up going back to all my original math on all the texture coordinates. The "center of pixel" method was showing more of the neighboring pixels than the straight /2048 did.
I already asked a question about texture mapping and these two are related (this question).
I'm working with Quartz Composer, which appears to be kind of particular about textures...
I have a complex polygon that I triangulate in a specific coordinate system (-1 -> 1 on x | -0.75 -> 0.75 on y). I obtain an array of triangle vertices in this coordinate system (triangles 1 to 6 on the left pic).
Then I render each polygon separately (it's necessary for my program) by applying a scale function to its vertices, from this coordinate system to the OpenGL one (0.0 -> 1.0). Here it is, even if for the 0 -> 1 range it's kind of pointless:
return (((1. - 0.) * (myVertexXorY - minTriangleBound)) / (maxTriangleBound - minTriangleBound)) + 0.;
But I want one image to be textured across these triangles (like in the picture above). So I begin by getting the whole polygon's bounds (1 on the right pic), then the triangle's bounds (2 on the right pic). I scale 1 to the picture coordinates (3 on the right pic) in pixels, then I get the triangle bounds (2) in pixels.
This gives me the bounds for locking my texture in OpenGL with Quartz:
NSRect myBounds = NSMakeRect(originXinPixels, originYinPixels, widthForTheTriangle, heightForTheTriangle);
And I lock my texture
[myImage lockTextureRepresentationWithColorSpace:space forBounds:myBounds];
Then, with OpenGL:
for (int32 i = 0; i < vertexCount; ++i)
{
    verts[i] = myTriangle.vertices[i];
    texcoord[0] = [self myScaleFunctionFor:XinQuartzCoordinateSystem From:0 To:1];
    texcoord[1] = [self myScaleFunctionFor:YinQuartzCoordinateSystem From:0 To:1];
    glTexCoord2fv(texcoord);
}
And I obtain what you can see: sometimes parts of the image fit, sometimes they don't (well, in fact with this particular polygon, it doesn't fit at all...).
I'm not really sure I understood your question, but:
What keeps you from directly supplying texture coordinates that match the topology of your source picture? That would be far easier than trying to find some per-triangle linear mapping that moves the picture in the right way.
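For illustration only (this is not Quartz Composer API code, and the names are assumptions): computing each vertex's texture coordinate directly from its position within the whole polygon's bounds keeps the image continuous across all triangles, instead of remapping it per triangle.

#include <vector>

struct Vec2 { float x, y; };

// Maps vertices given in the polygon's own coordinate system
// (e.g. x in [-1, 1], y in [-0.75, 0.75]) straight to [0, 1] UVs,
// using the *whole polygon's* bounds rather than per-triangle bounds.
std::vector<Vec2> texCoordsFromPolygonBounds(const std::vector<Vec2>& vertices,
                                             Vec2 polyMin, Vec2 polyMax)
{
    std::vector<Vec2> uvs;
    uvs.reserve(vertices.size());
    for (const Vec2& v : vertices) {
        uvs.push_back({ (v.x - polyMin.x) / (polyMax.x - polyMin.x),
                        (v.y - polyMin.y) / (polyMax.y - polyMin.y) });
    }
    return uvs;
}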