I am having some trouble understanding the "Image and image view parameter compatibility requirements" table in the VkImageViewCreateInfo documentation, and VkImageViewCreateInfo::viewType in particular. The VkImageViewCreateInfo properties seem flexible enough to create, for example, a single 1D or 1D array image view of a 2D image. I tried to create a 1D image view of a 2D image with validation layers enabled and got no warnings (if this is valid usage, I don't know exactly which row/column would be used).
Is it correct to assume that there is a one-to-one mapping between VkImageCreateInfo::imageType + VkImageCreateInfo::arrayLayers in the image and VkImageViewCreateInfo::viewType in the view, i.e. that VkImageViewType exists only to handle the special case of cube maps, and that otherwise viewType could have been inferred from the image type? If not, how does a 1D view of a 2D image work?
You can't create a 1D view of a 2D image, only the combinations listed in the table are valid.
It looks like the page you're looking at hasn't been regenerated recently, or doesn't include modifications made by the VK_KHR_maintenance1 extension.
Ignoring that extension and cubemaps for now, it's not quite true that there is a 1:1 correspondence between imageType+arrayLayers and viewType. A 2D image with multiple layers can be used with either 2D or 2D_ARRAY view types, and a 2D image with only one layer can still be used with a 2D_ARRAY view type. The view type corresponds to the SPIR-V resource types, and mostly determines how many coordinates are needed to identify a location in the view.
Then there is the cubemap complication, as you observed.
With VK_KHR_maintenance1, you can create 2D and 2D_ARRAY views of a subset of the slices in a 3D image. The extension adds two new rows to the table to describe that case.
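To make the table's logic concrete, here is a small Python model of the rows discussed above. It is a sketch under simplifying assumptions: the enum and function names are made up for illustration (they are not Vulkan API), `view_layers` stands for `subresourceRange.layerCount`, and it ignores mip-level limits, the `imageCubeArray` feature, and the check that the layer range fits inside the image. For a 3D image, the "layers" of a 2D/2D_ARRAY view are depth slices.

```python
from enum import Enum, auto

class ImageType(Enum):  # stand-in for VkImageType
    TYPE_1D = auto()
    TYPE_2D = auto()
    TYPE_3D = auto()

class ViewType(Enum):  # stand-in for VkImageViewType
    VIEW_1D = auto()
    VIEW_1D_ARRAY = auto()
    VIEW_2D = auto()
    VIEW_2D_ARRAY = auto()
    VIEW_3D = auto()
    VIEW_CUBE = auto()
    VIEW_CUBE_ARRAY = auto()

def view_allowed(image_type, view_type, view_layers,
                 cube_compatible=False, array_2d_compatible=False):
    """Simplified model of the image/view compatibility table.

    view_layers plays the role of subresourceRange.layerCount;
    cube_compatible / array_2d_compatible model the image's
    CUBE_COMPATIBLE and 2D_ARRAY_COMPATIBLE creation flags."""
    if view_layers < 1:
        return False
    if image_type is ImageType.TYPE_1D:
        if view_type is ViewType.VIEW_1D:
            return view_layers == 1
        return view_type is ViewType.VIEW_1D_ARRAY
    if image_type is ImageType.TYPE_2D:
        if view_type is ViewType.VIEW_2D:
            return view_layers == 1
        if view_type is ViewType.VIEW_2D_ARRAY:
            return True  # even a single-layer 2D image can be viewed as 2D_ARRAY
        if view_type is ViewType.VIEW_CUBE:
            return cube_compatible and view_layers == 6
        if view_type is ViewType.VIEW_CUBE_ARRAY:
            return cube_compatible and view_layers % 6 == 0
        return False  # no 1D or 3D views of a 2D image
    # 3D image: a 3D view, or (VK_KHR_maintenance1) 2D/2D_ARRAY views of depth slices
    if view_type is ViewType.VIEW_3D:
        return view_layers == 1
    if view_type is ViewType.VIEW_2D:
        return array_2d_compatible and view_layers == 1
    if view_type is ViewType.VIEW_2D_ARRAY:
        return array_2d_compatible
    return False
```

In this model, asking for a 1D view of a 2D image returns False, and a 2D image can be viewed as either 2D or 2D_ARRAY, which is why viewType cannot simply be inferred from the image.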
I created and merged an images SFrame with an Annotations SFrame.
I have verified that the coordinates of the annotation boxes match the locations of the features as measured in Photoshop.
However, the models I create are non-functional, so I inspected the merged data set with:
data['image_with_ground_truth'] = tc.object_detector.util.draw_bounding_boxes(data['image'], data['annotations'])
and I found that all the annotations are squashed into the top-left corner in Turi Create, even though they are actually widely distributed over the source image, as in the second image. The annotations column shows that the coordinates are read correctly into TC, but they are mapped incorrectly into what the model sees as bounding boxes.
Where should I look to find the scaling problem in Turi Create?
The version of ml-annotate I was using output coordinates with a different scale factor for each image in the set: some were close, some were off by as much as 3.3x.
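One way to repair such data is to rescale each image's annotations by the ratio between the image's real pixel size and the size the annotation tool assumed. The sketch below assumes Turi Create's annotation layout (a list of dicts with `label` and `coordinates` holding `x`, `y`, `width`, `height`); how you obtain each image's scale factor depends on what your annotation tool recorded, so `scale_factors` is a hypothetical helper.

```python
def scale_factors(actual_size, annotated_size):
    """Hypothetical helper: (sx, sy) mapping the tool's coordinate space to real pixels."""
    (actual_w, actual_h), (annotated_w, annotated_h) = actual_size, annotated_size
    return actual_w / annotated_w, actual_h / annotated_h

def rescale_annotations(annotations, sx, sy):
    """Return a rescaled copy of one image's Turi Create-style annotation list."""
    rescaled = []
    for ann in annotations:
        c = ann["coordinates"]
        rescaled.append({
            "label": ann["label"],
            "coordinates": {
                # Linear scaling, so it works whether x/y is the box centre or a corner.
                "x": c["x"] * sx,
                "y": c["y"] * sy,
                "width": c["width"] * sx,
                "height": c["height"] * sy,
            },
        })
    return rescaled
```

In Turi Create this could be applied per row (for example with SFrame.apply) before calling draw_bounding_boxes again to confirm the boxes now line up.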
I have been trying to locate objects detected in a 2D image in 3D space, using a single fixed camera mounted at a known height.
I went through similar questions, but none of them deal with the perspective view.
What I have:
The height of the camera
Calibration parameters
The exact location of one fixed object in view
I've written a set of solutions to this kind of problem. 3D point reconstruction from 2D coordinates (yes, "in perspective") is done by means of the extrinsics matrix. See https://github.com/rodolfoap/screen2world-k. Other methods are linked from there.
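With a single camera, a pixel only defines a ray, so an extra constraint is needed to get a 3D point; the known camera height suggests intersecting that ray with the ground plane. A minimal sketch, assuming an undistorted pinhole camera (intrinsics fx, fy, cx, cy), a world frame with Z pointing up and the ground at Z = 0, and a world-to-camera rotation R from calibration. The function name is hypothetical; the one fixed object with a known location is a handy check of R and the camera position.

```python
def pixel_to_ground(u, v, fx, fy, cx, cy, R, cam_pos):
    """Intersect the viewing ray through pixel (u, v) with the ground plane Z = 0.

    R: 3x3 world-to-camera rotation as a list of rows.
    cam_pos: camera centre in world coordinates, e.g. (0, 0, height).
    Assumes lens distortion has already been removed."""
    # Ray direction in camera coordinates (normalised image plane, z = 1).
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate into world coordinates: d_world = R^T @ d_cam.
    d_world = [sum(R[r][c] * d_cam[r] for r in range(3)) for c in range(3)]
    if d_world[2] >= 0:
        raise ValueError("ray does not hit the ground (at or above the horizon)")
    # Solve cam_pos.z + t * d_world.z == 0 for t, then walk along the ray.
    t = -cam_pos[2] / d_world[2]
    return tuple(cam_pos[i] + t * d_world[i] for i in range(3))
```

For example, with a camera 5 units up looking straight down, the principal point maps to the spot directly below the camera.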
I have some legacy DX11 code that renders to multiple 3D render targets. The destination target is selected via SV_TARGETxx, and the slice is set via SV_RenderTargetArrayIndex in the GS. Is there any way to do the same in Vulkan?
My plan is to create an individual view for each slice of each 3D target and pass them all together as attachments to a single framebuffer; then in the GS I can have something like gl_Layer = sliceNo + targetOffsets[xx]. Is there a better solution?
In Vulkan, the GS output SV_RenderTargetArrayIndex is called Layer in SPIR-V, or gl_Layer in GLSL. It behaves the same as in D3D. You create one view per 3D target and attach it to the framebuffer. The Layer output from the GS selects which layer the output primitive is drawn to, and that same layer is used in every attached target.
In Vulkan there are no "true" 3D framebuffer attachments, in the sense that after projection to screen-space coordinates everything exists in a 2D plane. So attachment image views can have 2D_ARRAY dimensionality, but not 3D. The "Image and image view parameter compatibility requirements" table says that, given a 3D image, you can create a 2D_ARRAY image view with layerCount >= 1. Note that you have to create the image with the VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT flag.
So if you want to have N 3D render target images:
Create your N 3D images, with the VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT flag.
Create one image view for each image, with VK_IMAGE_VIEW_TYPE_2D_ARRAY and layerCount equal to the number of slices you want to be able to render to.
Create a VkRenderPass with one VkAttachmentDescription per 3D render target, plus whatever others you need for depth/stencil, resolve target, etc.
Create a VkFramebuffer based on that VkRenderPass, and pass your image views in the VkFramebufferCreateInfo::pAttachments array. Set VkFramebufferCreateInfo::layers to the number of layers/slices you want to be able to render to.
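A rough sketch of the parameters for the steps above, written as plain Python dicts whose keys mirror the Vulkan struct fields. This is not a real Vulkan binding, the helper name `plan_3d_targets` is made up, and it assumes every target has the same size and that you render to the first `slices` depth slices of each.

```python
def plan_3d_targets(num_targets, width, height, depth, slices):
    """Plain-dict stand-ins for VkImageCreateInfo / VkImageViewCreateInfo /
    VkFramebufferCreateInfo (illustration only, not a Vulkan binding)."""
    if not 1 <= slices <= depth:
        raise ValueError("slices must be between 1 and the image depth")
    images, views = [], []
    for i in range(num_targets):
        images.append({
            "imageType": "VK_IMAGE_TYPE_3D",
            "flags": ["VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT"],
            "extent": {"width": width, "height": height, "depth": depth},
            "mipLevels": 1,
            "arrayLayers": 1,  # 3D images always have a single array layer
            "usage": ["VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT"],
        })
        views.append({
            "image": i,  # index into `images`
            "viewType": "VK_IMAGE_VIEW_TYPE_2D_ARRAY",
            "subresourceRange": {
                "aspectMask": ["VK_IMAGE_ASPECT_COLOR_BIT"],
                "baseMipLevel": 0, "levelCount": 1,
                "baseArrayLayer": 0,   # first depth slice
                "layerCount": slices,  # depth slices exposed as array layers
            },
        })
    framebuffer = {
        "attachmentCount": num_targets,
        "pAttachments": list(range(num_targets)),  # indices into `views`
        "width": width, "height": height,
        "layers": slices,  # must not exceed any attachment's layerCount
    }
    return images, views, framebuffer
```

With this one-view-per-target layout, writing gl_Layer = slice in the geometry shader routes a primitive to that slice of every attached 3D target.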
[Edit: The paragraph below can be ignored based on the first comment. I'm leaving it for transparency.]
I'm confused about what you're trying to do with SV_Target[n]. In both D3D and Vulkan, if you've got multiple render targets / color attachments, the fragment shader writes to all of them; if your fragment shader doesn't provide a value for a bound target, the value written is undefined. So SV_Target[n] tells which shader output variables go to which target, but it doesn't let you write to some targets without writing to others. Vulkan works similarly, using fragment output variables declared with layout(location = n) in GLSL (gl_FragData is not available in Vulkan GLSL).
If you're talking about having 1 draw call rendered from multiple points of view (but otherwise using the same pipeline) then you want VK_KHR_multiview. This is an extension in Vulkan 1.0, but core in 1.1.
There's an example of its usage here, and the corresponding shader functionality is described here. It works similarly to what you seem to describe: you attach multiple layers of a texture array to a single framebuffer ("render target" in D3D), and then in the vertex shader you can determine which layer you're rendering to via the gl_ViewIndex variable. There's no need for a geometry shader with this approach.
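With multiview, each subpass gets a view mask (VkRenderPassMultiviewCreateInfo::pViewMasks) in which bit i enables view i; view i is rendered into layer i of the attachments, and gl_ViewIndex holds i in the shader. The two helpers below are hypothetical and only illustrate that bit layout.

```python
def view_mask(num_views):
    """Mask with one bit per view, enabling views 0 .. num_views - 1."""
    return (1 << num_views) - 1

def views_in_mask(mask):
    """View indices (the values gl_ViewIndex takes) enabled by a view mask."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]
```

A stereo setup, for instance, would use view_mask(2), i.e. 0b11, rendering views 0 and 1 into layers 0 and 1.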
When I was learning to program simple 2D games, each object had a sprite sheet with small pictures of how it looks in every frame of every animation. 3D models don't seem to work this way, or we would need one image for every possible view of the object!
For example, a rotating cube would need a lot of images depicting how it looks from every side. So my question is: how are 3D model "images" represented and rendered by the engine when viewed from arbitrary perspectives?
Multiple methods
There are a number of methods for rendering and storing 3D graphics and models. There are even different methods for rendering 2D graphics! In addition to 2D bitmaps, you also have SVG. SVG uses numbers to define points in an image. These points make shapes, and they can also define curves. This allows you to make images without the need for pixels. The result can be smaller files, in addition to the ability to transform the image (scale and rotate) without distortion. Most 3D graphics use a similar technique, except in 3D. What these methods have in common, however, is that they all ultimately render the data to a 2D grid of pixels.
Projection
The most common method for rendering 3D models is projection. All of the shapes to be rendered are broken down into triangles before rendering. Why triangles? Because a triangle is always planar: its three vertices always lie in a single plane. That saves a lot of work for the renderer, since it doesn't have to worry about "coloring outside of the lines". One drawback is that most 3D projection renderers don't support perfect spheres or other round surfaces. You have to use approximations and other tricks to make round surfaces (although there are some renderers which support round surfaces). The next step is to convert, or project, all of the 3D points into 2D points on the screen (as seen below).
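As a sketch of the projection step, here is a minimal pinhole projection, assuming the camera sits at the origin looking down +Z and the focal length is given in pixels (real engines do the same thing with 4x4 view and projection matrices). Dividing by z is what makes distant points move toward the centre of the screen.

```python
def project_point(point, focal, width, height):
    """Perspective-project a camera-space point onto a width x height screen."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    sx = width / 2 + focal * x / z   # divide by depth: the perspective step
    sy = height / 2 - focal * y / z  # screen y grows downward
    return (sx, sy)

# Project the three corners of a triangle; the renderer then fills the 2D result.
triangle = [(-1, -1, 4), (1, -1, 4), (0, 1, 2)]
screen_triangle = [project_point(p, 100, 200, 200) for p in triangle]
```

The same point moved twice as far away, from (1, 1, 2) to (1, 1, 4), lands half as far from the screen centre.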
From there, you essentially "color in" the triangles to make everything look solid. While this is pretty fast, another downside is that you can't really have things like reflections and refractions. Anytime you see a refractive or reflective surface in a game, they are only using trickery to make it look like a reflective or refractive material. The same goes for lighting and shading.
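"Coloring in" a projected triangle means finding which pixels it covers. A common way is with edge functions: a pixel centre is inside if it lies on the same side of all three edges. This simplified sketch skips the tie-breaking fill rules that real GPUs apply to pixels lying exactly on an edge, and it only reports coverage rather than computing colors.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p): its sign says which side of ab p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(v0, v1, v2, width, height):
    """Return the (x, y) pixels whose centres lie inside the 2D triangle."""
    area = edge(v0, v1, v2)
    if area == 0:
        return []  # degenerate triangle
    s = 1 if area > 0 else -1  # handle both clockwise and counter-clockwise winding
    covered = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel centre
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            if w0 * s >= 0 and w1 * s >= 0 and w2 * s >= 0:
                covered.append((x, y))
    return covered
```

The w0, w1, w2 values divided by the area are barycentric weights, which is also how per-vertex colors or texture coordinates get blended across the triangle.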
Here is an example of special coloring being used to make a sphere approximation look smooth. Notice that you can still see straight lines around the smoothed version:
Ray tracing
You can also render polygons using ray tracing. With this method, you essentially trace the paths that light takes to reach the camera, which allows you to make realistic reflections and refractions. However, I won't go into detail, since it is currently too slow to use realistically in games. It is mainly used for 3D animation (like what Pixar makes). Simple scenes with low quality settings can be ray traced fairly quickly, but with complicated, realistic scenes, rendering can take several hours for a single frame (as is the case with Pixar movies). However, it does produce ultra-realistic images.
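At the heart of a ray tracer is a ray/object intersection test. The sketch below shows ray/sphere intersection (solving a quadratic for the distance t along the ray) plus the mirror-reflection formula used to spawn the next ray at a reflective surface; a full tracer would repeat this recursively and add shading. The function names are illustrative.

```python
import math

def ray_sphere(origin, direction, center, radius):
    """Distance t along the ray to the nearest hit in front of the origin, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    # |origin + t*direction - center|^2 = radius^2  ->  a t^2 + b t + c = 0
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # the ray misses the sphere
    sq = math.sqrt(disc)
    for t in ((-b - sq) / (2 * a), (-b + sq) / (2 * a)):
        if t > 1e-9:  # ignore hits behind (or exactly at) the origin
            return t
    return None

def reflect(d, n):
    """Mirror direction d about the unit surface normal n: r = d - 2 (d . n) n."""
    k = 2.0 * sum(di * ni for di, ni in zip(d, n))
    return tuple(di - k * ni for di, ni in zip(d, n))
```

Note that the sphere here is exact, not a triangle approximation: ray tracers can intersect curved surfaces directly.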
Ray casting
Ray casting is not to be confused with the above-mentioned ray tracing. Ray casting does not trace light paths, so you only get flat, non-reflective surfaces, and it does not produce realistic lighting. However, it can be done relatively quickly, since in most cases you don't even need to cast a ray for every pixel (Wolfenstein 3D cast one ray per screen column). This is the method that was used for early games such as Wolfenstein 3D (Doom, often mentioned alongside it, actually used a BSP-tree renderer). In those games, ray casting was used for the maps, and the characters and other items were rendered as 2D sprites that always faced the camera. The sprites were drawn from a few different angles to make them look 3D. Here is an image of Wolfenstein 3D:
Castle Wolfenstein with JavaScript and HTML5 Canvas: Image by Martin Kliehm
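The core of a Wolfenstein-style ray caster is a walk through a 2D tile map: for each screen column, a ray steps from grid line to grid line (the DDA technique) until it enters a wall cell, and the distance found determines how tall that wall column is drawn. A sketch, assuming a map given as a list of rows with 1 marking walls and enclosed by walls; fish-eye correction and texturing are left out, and the names are illustrative.

```python
import math

def cast_ray(grid, px, py, angle, max_steps=64):
    """Distance from (px, py) to the first wall cell along `angle`, via grid DDA."""
    dx, dy = math.cos(angle), math.sin(angle)
    mx, my = int(px), int(py)  # current map cell
    # Ray length needed to cross one whole cell in x / in y.
    delta_x = abs(1 / dx) if dx != 0 else math.inf
    delta_y = abs(1 / dy) if dy != 0 else math.inf
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1
    # Ray length to the first vertical / horizontal grid line.
    side_x = math.inf if dx == 0 else ((mx + 1 - px) if dx > 0 else (px - mx)) * delta_x
    side_y = math.inf if dy == 0 else ((my + 1 - py) if dy > 0 else (py - my)) * delta_y
    for _ in range(max_steps):
        # Advance to whichever grid line is closer along the ray.
        if side_x < side_y:
            dist, side_x, mx = side_x, side_x + delta_x, mx + step_x
        else:
            dist, side_y, my = side_y, side_y + delta_y, my + step_y
        if grid[my][mx] == 1:
            return dist
    return None

def column_height(dist, screen_height):
    """Nearer walls are drawn taller: height is inversely proportional to distance."""
    return int(screen_height / dist)
```

Calling cast_ray once per screen column, with angles spread across the field of view, and drawing a vertical strip of column_height pixels gives the classic look.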
Storing the data
3D data can be stored using multiple methods. It is not necessarily dependent on the rendering method that is used. The stored data doesn't mean anything by itself, so you have to render it using one of the methods that have already been mentioned.
Polygons
This is similar to SVG. It is also the most common method for storing model data. You define the geometry using 3D points. These points can have other properties, such as texture data (in the form of UV mapping), color data, and whatever else you might want.
The data can be stored using a number of file formats. A common one is COLLADA, an XML-based format that stores the 3D data. There are a lot of other formats, though. Fundamentally, however, they all store the same kind of 3D data.
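To make the "points plus properties" idea concrete, here is a tiny indexed-triangle mesh that can write itself out as Wavefront OBJ, a simple text format chosen here only because it is short (COLLADA stores the same kind of data as XML). The `Mesh` class is illustrative, and it assumes one UV per position so that the same index can be reused for both.

```python
from dataclasses import dataclass, field

@dataclass
class Mesh:
    positions: list = field(default_factory=list)  # (x, y, z) per vertex
    uvs: list = field(default_factory=list)        # (u, v) texture coordinate per vertex
    triangles: list = field(default_factory=list)  # (i, j, k) vertex indices

    def to_obj(self):
        """Serialise as Wavefront OBJ text."""
        lines = [f"v {x} {y} {z}" for x, y, z in self.positions]
        lines += [f"vt {u} {v}" for u, v in self.uvs]
        # OBJ indices are 1-based; "a/b" pairs a position index with a UV index.
        lines += ["f " + " ".join(f"{i + 1}/{i + 1}" for i in tri) for tri in self.triangles]
        return "\n".join(lines) + "\n"

# A unit square made of two triangles that share vertices 0 and 2.
quad = Mesh(
    positions=[(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],
    uvs=[(0, 0), (1, 0), (1, 1), (0, 1)],
    triangles=[(0, 1, 2), (0, 2, 3)],
)
```

Sharing vertices through indices is why meshes store a vertex list plus a triangle list, rather than three full points per triangle.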
Here is an example of a polygon model:
Voxels
This method is pretty simple. You can think of voxel models like bitmaps, except that many bitmaps are layered together to make a 3D bitmap. So you have a 3D grid of pixels. One way of rendering voxels is converting the voxel points to 3D cubes. Note that voxels do not have to be rendered as cubes, however. Like pixels, they are only points that may have color data, which can be interpreted in different ways. I won't go into much detail, since this isn't too common and you generally render voxels with polygon methods (like when you render them as cubes). Here is an example of a voxel model:
Image by Wikipedia user Vossman
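Turning voxels into cubes naively wastes work on faces hidden between neighbouring voxels. A common first step, sketched below with a hypothetical `exposed_faces` helper, is to emit a face only where the adjacent cell is empty; each returned (cell, normal) pair would then become a quad (two triangles) for the polygon renderer.

```python
# Unit normals of the six faces of a cube.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def exposed_faces(voxels):
    """voxels: set (or dict keyed by) of filled (x, y, z) cells.

    Returns (cell, normal) for every cube face whose neighbouring cell is empty."""
    faces = []
    for (x, y, z) in sorted(voxels):
        for n in NEIGHBOURS:
            if (x + n[0], y + n[1], z + n[2]) not in voxels:
                faces.append(((x, y, z), n))
    return faces
```

Two touching voxels produce 10 faces instead of 12, because the two faces they share are never visible.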
In the 2D world with sprite sheets, you are drawing one of the sprites depending on the state of the actor (visual representation of your object). In the 3D world you are rendering a model for your actor that is a series of polygons with a texture mapped to it. There are standardized model files (I am mostly familiar with Autodesk 3DS Max), in which the model and the assigned textures can be packaged together (a .3DS or .MAX file), providing everything your graphics library needs to render the object and its textures.
In a nutshell, you don't use images for each view of a 3D object, you have a model with a texture rendered on it, creating a dynamic view as it is rendered by the graphics library.
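Texture mapping is what replaces the per-view sprites: each vertex carries a UV coordinate, the rasterizer interpolates UVs across the triangle, and each pixel looks up its color in the texture. A minimal sketch with nearest-neighbour sampling; the names are made up, and it assumes v = 0 is the top row of the texture (conventions differ between APIs).

```python
def interpolate_uv(uv0, uv1, uv2, w0, w1, w2):
    """Blend per-vertex UVs with barycentric weights (w0 + w1 + w2 == 1)."""
    return (w0 * uv0[0] + w1 * uv1[0] + w2 * uv2[0],
            w0 * uv0[1] + w1 * uv1[1] + w2 * uv2[1])

def sample_nearest(texture, u, v):
    """Nearest-neighbour lookup; texture is a list of rows, u and v are in [0, 1]."""
    h, w = len(texture), len(texture[0])
    x = min(int(u * w), w - 1)  # clamp so u == 1.0 stays inside the texture
    y = min(int(v * h), h - 1)
    return texture[y][x]

# A 2x2 texture; the same four texels can cover a triangle seen from any angle.
texture = [["red", "green"],
           ["blue", "white"]]
```

Because the lookup happens per pixel after projection, one texture serves every viewpoint, which is exactly what a sprite sheet cannot do.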
I am using wxWidgets to design a GUI that draws multiple layers with transparency on top of each other.
Therefore I have one method for each layer that draws with wxGraphicsContext onto the "shared" wxImage, which is then plotted to the wxWindow in the paintEvent method.
I have the layer data in arrays of exactly the same dimensions as my wxImage, so of course I need to draw/manipulate pixel-wise. Currently I am doing that with the DrawRectangle routine, which I guess is quite inefficient.
Is there a clever way to manipulate wxImage's pixel data directly while still being able to use the transparency of each separate layer in the resulting image? Or is drawing 1x1 pixels with DrawRectangle sufficient?
Thanks for any thoughts on this!
You can manipulate wxImage pixels efficiently by accessing them directly: they are stored in two contiguous arrays, one for RGB and one for alpha (returned by wxImage::GetData() and wxImage::GetAlpha()), which you can work with directly.
The problem is usually converting the wxImage to a wxBitmap, which is what can be displayed. This is the expensive operation; to avoid it, raw bitmap access (wxPixelData) can be used to manipulate the wxBitmap directly instead.
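For the per-layer blending the question asks about, compositing can be done directly on byte buffers laid out like wxImage's (RGB triplets in one array, alpha in another). The Python sketch below applies the standard non-premultiplied "over" operator with 8-bit integer math; the function name is hypothetical, and in C++ the same loop would run over the pointers returned by wxImage::GetData() and wxImage::GetAlpha() (call InitAlpha() first if the image has no alpha channel).

```python
def composite_over(dst_rgb, dst_alpha, src_rgb, src_alpha):
    """Blend one layer onto the destination in place ("over" operator, 8-bit, non-premultiplied).

    *_rgb: bytearray of width*height*3 bytes (R, G, B per pixel)
    *_alpha: bytearray of width*height bytes, 0 = transparent, 255 = opaque."""
    for i, sa in enumerate(src_alpha):
        if sa == 0:
            continue  # fully transparent source pixel leaves the destination unchanged
        da = dst_alpha[i]
        # Resulting coverage: source alpha plus whatever of the destination shows through.
        out_a = sa + da * (255 - sa) // 255
        for c in range(3):
            s, d = src_rgb[3 * i + c], dst_rgb[3 * i + c]
            # Weighted average of source and attenuated destination color.
            dst_rgb[3 * i + c] = (s * sa + d * da * (255 - sa) // 255) // out_a
        dst_alpha[i] = out_a
```

Calling it once per layer, bottom to top, produces the combined image in a single pass per layer, without any per-pixel DrawRectangle calls.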