Cast MTLTexture from depth32Float to bgra8UNorm - metalkit

I would like to process information from the depth buffer using Metal Performance Shaders, e.g. a Gaussian blur or a Sobel filter.
I run into problems when using an MTLTexture with the depth32Float pixel format. Neither MPSImageGaussianBlur nor any other performance shader accepts it as a source texture.
I tried to convert it using depthBufferTexture.makeTextureView(pixelFormat: .bgra8Unorm), but got this error:
validateArgumentsForTextureViewOnDevice:1406: failed assertion source texture pixelFormat (MTLPixelFormatDepth32Float) not castable.
Is there any way to convert depth32Float to bgra8Unorm or any other pixel format?

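Converting from depth32Float to bgra8Unorm does not make much sense, in my opinion: the two formats have different channel counts and data types (one 32-bit float channel versus four 8-bit normalized channels). In your case, the best solution would be to use MTLPixelFormatR32Float.
To convert from depth32Float to MTLPixelFormatR32Float, use an MTLComputeCommandEncoder: run a compute pass that reads each depth value and writes it to a texture with the R32Float format.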


Vertex buffer with vertices of different formats

I want to draw a model that's composed of multiple meshes, where each mesh has different vertex formats. Is it possible to put all the various vertices within the same vertex buffer, and to point to the correct offset at vkCmdBindVertexBuffers time?
Or must all vertices within a buffer have the same format, thus necessitating multiple vbufs for such a model?
Looking at the manual for vkCmdBindVertexBuffers, it's not clear whether the offset is in bytes or in vertex-strides.
Your question really breaks down into three questions:
1. Does the pOffsets parameter for vkCmdBindVertexBuffers accept bytes or vertex strides?
2. Can I put more than one vertex format into a vertex buffer?
3. Should I put more than one vertex format into a vertex buffer?
The short version is:
1. Bytes
2. Yes
3. Probably not
Does the offsets parameter for vkCmdBindVertexBuffers accept bytes or vertex strides?
The function signature is:
void vkCmdBindVertexBuffers(
    VkCommandBuffer commandBuffer,
    uint32_t firstBinding,
    uint32_t bindingCount,
    const VkBuffer* pBuffers,
    const VkDeviceSize* pOffsets);
Note the VkDeviceSize type for pOffsets. This unambiguously means "bytes", not strides. Any VkDeviceSize means an offset or size in raw memory. Vertex strides aren't raw memory; a number of strides is simply a count, so its type would be a plain uint32_t or uint64_t.
Furthermore, there's nothing in that function signature that specifies the vertex format, so there would be no way to convert a vertex stride count to an actual memory size. Remember that unlike OpenGL, Vulkan is not a state machine, so this function doesn't have any "memory" of a rendering pipeline that you might have previously bound.
Can I put more than one vertex format into a vertex buffer?
As a consequence of the above answer, yes. You can put pretty much whatever you want into a vertex buffer, although I believe some hardware will have alignment restrictions on what are valid offsets for vertex buffers, so make sure you check that.
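As a rough illustration of byte offsets, here is a minimal sketch in plain C that computes where each mesh would start inside one shared buffer. It does not call Vulkan; DeviceSize is a stand-in for VkDeviceSize, the two vertex structs and the pack_meshes helper are hypothetical, and the alignment value is an assumption you would replace with whatever your hardware and formats require.

```c
#include <stdint.h>

/* Stand-in for VkDeviceSize (a 64-bit unsigned byte count). */
typedef uint64_t DeviceSize;

/* Two hypothetical vertex formats used by different meshes of one model. */
typedef struct { float position[3]; } VertexP;                            /* 12 bytes */
typedef struct { float position[3]; float normal[3]; float uv[2]; } VertexPNT; /* 32 bytes */

/* Round offset up to the next multiple of alignment (alignment must be non-zero). */
DeviceSize align_up(DeviceSize offset, DeviceSize alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

/* Compute the byte offset of each mesh inside one shared vertex buffer.
 * strides[i] is the size of one vertex of mesh i, vertex_counts[i] its vertex count.
 * Returns the total buffer size in bytes. */
DeviceSize pack_meshes(const DeviceSize *strides, const uint32_t *vertex_counts,
                       uint32_t mesh_count, DeviceSize alignment,
                       DeviceSize *out_offsets) {
    DeviceSize cursor = 0;
    for (uint32_t i = 0; i < mesh_count; ++i) {
        cursor = align_up(cursor, alignment);   /* assumed alignment requirement */
        out_offsets[i] = cursor;                /* this is a byte offset, not a vertex index */
        cursor += strides[i] * vertex_counts[i];
    }
    return cursor;
}
```

Each entry of out_offsets is the kind of value you would pass in pOffsets when binding the shared buffer for that mesh's draw call.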
Should I put more than one vertex format into a vertex buffer?
Generally speaking, you want to render your scene in as few draw calls as possible, and having lots of arbitrary vertex formats runs counter to that. I would argue that, if possible, the only time you want to change vertex formats is when you're switching to a different rendering pass, such as when switching from rendering opaque items to rendering transparent ones.
Instead you should try to make format normalization part of your asset pipeline, taking your source assets and converting them to a single consistent format. If that's not possible, then you could consider doing the normalization at load time. This adds complexity to the loading code, but should drastically reduce the complexity of the rendering code, since you now only have to think in terms of a single vertex format.

Surface format is B8G8R8A8_UNORM, but vkCmdClearColorImage takes float?

I use vkGetPhysicalDeviceSurfaceFormatsKHR to get supported image formats for the swapchain, and (on Linux+Nvidia, using SDL) I get VK_FORMAT_B8G8R8A8_UNORM as the first option and I go ahead and create the swapchain with that format:
VkSwapchainCreateInfoKHR swapchain_info = {
    .imageFormat = format, /* taken from vkGetPhysicalDeviceSurfaceFormatsKHR */
    /* ... remaining fields omitted ... */
};
So far, it all makes sense. The image format used to draw on the screen is the usual 8-bits-per-channel BGRA.
As part of my learning process, I have so far arrived at setting up a lot of stuff but not yet the graphics pipeline [1]. So I am trying the only command I can use that doesn't need a pipeline: vkCmdClearColorImage [2].
The VkClearColorValue used to define the clear color can take the color as float, uint32_t or int32_t, depending on the format of the image. I would have expected, based on the image format given to the swapchain, that I should give it uint32_t values, but that doesn't seem to be correct. I know because the screen color didn't change. I tried giving it floats and it works.
My question is, why does the clear color need to be specified in floats when the image format is VK_FORMAT_B8G8R8A8_UNORM?
[1] Actually I have, but thought I would try out the simpler case of no pipeline first. I'm trying to incrementally use Vulkan (given its verbosity) particularly because I'm also writing tutorials on it as I learn.
[2] Actually, it technically doesn't need a render pass, but I figured hey, I'm not using any pipeline stuff here, so let's try it without a pipeline and it worked.
My rendering loop is essentially the following:
1. acquire image from swapchain
2. create a command buffer with the following:
   - transition from VK_IMAGE_LAYOUT_UNDEFINED to VK_IMAGE_LAYOUT_GENERAL (because I'm clearing the image outside a render pass)
   - clear the image
3. submit command buffer to queue (taking care of synchronization with swapchain with semaphores)
4. submit for presentation
My question is, why does the clear color need to be specified in floats when the image format is VK_FORMAT_B8G8R8A8_UNORM?
Because the normalized, scaled, or sRGB image formats are really just various forms of floating-point compression. A normalized integer is a way of storing floating-point values on the range [0, 1] or [-1, 1], but using a much smaller amount of data than even a 16-bit float. A scaled integer is a way of storing floating-point values on the range [0, MAX] or [MIN, MAX]. And sRGB is just a compressed way of storing linear color values on the range [0, 1], but in a gamma-corrected color space that puts precision in different places than the linear color values would suggest.
You see the same thing with inputs to the vertex shader. A vec4 input type can be fed by normalized formats just as well as by floating-point formats.
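To make the "normalized integers are compressed floats" idea concrete, here is a small sketch in plain C of the usual 8-bit UNORM encode and decode. The function names are hypothetical and the code does not call Vulkan; it only illustrates why the float32 member of VkClearColorValue is the natural way to describe a color for a UNORM image.

```c
#include <stdint.h>

/* Decode an 8-bit UNORM value to the float in [0, 1] that it represents. */
float unorm8_to_float(uint8_t v) {
    return (float)v / 255.0f;
}

/* Encode a float as 8-bit UNORM: clamp to [0, 1], scale by 255, round to nearest. */
uint8_t float_to_unorm8(float f) {
    if (!(f > 0.0f)) return 0;      /* negative values and NaN map to 0 */
    if (f > 1.0f) f = 1.0f;
    return (uint8_t)(f * 255.0f + 0.5f);
}

/* Pack an RGBA float color into the byte order of a B8G8R8A8_UNORM texel. */
void pack_bgra8_unorm(const float rgba[4], uint8_t out[4]) {
    out[0] = float_to_unorm8(rgba[2]); /* B */
    out[1] = float_to_unorm8(rgba[1]); /* G */
    out[2] = float_to_unorm8(rgba[0]); /* R */
    out[3] = float_to_unorm8(rgba[3]); /* A */
}
```

The application always thinks in floats on [0, 1]; the bytes are just the storage form, which the implementation produces for you when it clears the image.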

Explain how premultiplied alpha works

Can somebody please explain why rendering with premultiplied alpha (and a corrected blending function) looks different from "normal" alpha when, mathematically speaking, those are the same?
I've looked into this post for understanding of premultiplied alpha:
The author also said that the end computation is the same:
"Look at the blend equations for conventional vs. premultiplied alpha. If you substitute this color format conversion into the premultiplied blend function, you get the conventional blend function, so either way produces the same end result. The difference is that premultiplied alpha applies the (source.rgb * source.a) computation as a preprocess rather than inside the blending hardware."
Am I missing something? Why is the result different then?
The difference is in filtering.
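Imagine that you have a texture with just two pixels and you are sampling it exactly in the middle between the two pixels. Also assume linear filtering, so the result is the average of the two texels. The first texel is opaque red and the second is fully transparent green:
R|G|B|A + R|G|B|A = R|G|B|A
Non-premultiplied: 1|0|0|1 + 0|1|0|0 = 0.5|0.5|0|0.5
Premultiplied: 1|0|0|1 + 0|0|0|0 = 0.5|0|0|0.5
Notice the difference in the green channel: without premultiplication, the color of a fully transparent texel bleeds into the filtered result.
Filtering premultiplied alpha produces correct results.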
Note that all this has nothing to do with blending.
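The two-texel example can be reproduced with a minimal sketch in plain C. The Rgba type and the function names are hypothetical; filter_midpoint stands in for what a linear sampler does exactly halfway between two texels, and the two blend functions use the conventional and premultiplied blend equations over a destination color.

```c
typedef struct { float r, g, b, a; } Rgba;

/* Linear filtering exactly halfway between two texels: a per-channel average. */
Rgba filter_midpoint(Rgba x, Rgba y) {
    Rgba out = { (x.r + y.r) * 0.5f, (x.g + y.g) * 0.5f,
                 (x.b + y.b) * 0.5f, (x.a + y.a) * 0.5f };
    return out;
}

/* Convert a straight-alpha color to premultiplied alpha. */
Rgba premultiply(Rgba c) {
    Rgba out = { c.r * c.a, c.g * c.a, c.b * c.a, c.a };
    return out;
}

/* Conventional blend: src.rgb * src.a + dst.rgb * (1 - src.a). */
Rgba blend_straight(Rgba src, Rgba dst) {
    float k = 1.0f - src.a;
    Rgba out = { src.r * src.a + dst.r * k, src.g * src.a + dst.g * k,
                 src.b * src.a + dst.b * k, src.a + dst.a * k };
    return out;
}

/* Premultiplied blend: src.rgb + dst.rgb * (1 - src.a). */
Rgba blend_premultiplied(Rgba src, Rgba dst) {
    float k = 1.0f - src.a;
    Rgba out = { src.r + dst.r * k, src.g + dst.g * k,
                 src.b + dst.b * k, src.a + dst.a * k };
    return out;
}
```

Filtering the straight-alpha texels first and blending afterwards leaves green in the final pixel, while premultiplying before filtering does not, even though the two blend equations agree for any single unfiltered texel.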
This is a guess, because there is not enough information yet to figure it out.
It should be the same. One common way of getting a different value is to use a different Gamma correction method between the premultiply and the rendering step.
I am going to guess that one of your stages, either the blending, or the premultiplying stage is being done with a different gamma value. If you generate your premultiplied textures with a tool like DirectXTex texconv and use the default srgb option for premultiplying alpha, then your sampler needs to be an _SRGB format and your render target should be _SRGB as well. If you are treating them linearly then you may not be able to render to an _SRGB target or sample the texture with gamma correction, even if you are doing the premultiply in the same shader that samples (depending on 3D API and render target setup differences). Doing so will cause the alpha to be significantly different between the two methods in the midtones.
See: The Importance of Being Linear.
If you are generating the alpha in Photoshop then you should know a couple of things. Photoshop does not save alpha in linear OR sRGB format. It saves it with a gamma value about halfway between linear and sRGB. If you premultiply in Photoshop it will compute the premultiply correctly but save the result with the wrong ramp. If you generate a normal alpha and then sample it as sRGB or linear in your 3D API, it will be close but will not match the values Photoshop shows in either case.
For a more in-depth reply, we would need the following information:
What 3D API are you using?
How are your textures generated and sampled?
When and how are you premultiplying the alpha?
Preferably, a code or shader example that shows the error.
I was researching why one would use premultiplied versus non-premultiplied alpha and found this interesting info from Nvidia.
It seems that in their specific case, premultiplied alpha gives more precision than post-multiplied alpha.
I also read (I believe on here, but I cannot find it) that premultiplying (that is, multiplying each RGB value by alpha ahead of time) saves time. I still need to find out whether that's true, but there seems to be a reason why premultiplied alpha is preferred.

UIImage to YUV 422 Colorspace

I have been trying to figure this out for a while to no avail. I was wondering if someone could help or point me in the right direction.
I need to convert a UIImage or a stored JPG to YUV422 data so I can apply some image enhancements, and then convert the result back to either a JPG or a UIImage.
I'm a bit stuck at the moment. At this point I am just trying to get it to YUV422.
Any help would be greatly appreciated.
Thanks in advance.
You must first read the JPEG markers to determine the metadata: the size, the sampling rate (usually 4:2:2, but not always), the quantization tables, and the Huffman tables.
You must then Huffman-decode the entropy-coded data segment. This gives you, for each block of each color channel, a DC coefficient followed by any AC coefficients in zigzag order. You must then de-zigzag the entries and multiply them by the corresponding quantization table. Finally, you must perform the inverse discrete cosine transform (IDCT) on the decoded macroblock.
This gives you three channels in YCbCr (YUV is for analog) at the sampling rate the JPEG was encoded at. If you need it to be 4:2:2, you will have to resample.
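If a library has already decoded the image to RGB pixels, the color conversion and the 4:2:2 resampling step can be sketched in plain C as follows. This is only an illustration under stated assumptions: it uses the full-range JFIF conversion coefficients, averages the chroma of each horizontal pixel pair, and writes a Y0 Cb Y1 Cr byte order. The function names and the output layout are hypothetical choices, not something the question or the decoder prescribes.

```c
#include <stddef.h>
#include <stdint.h>

/* Round to nearest and clamp to the 0..255 range of one byte. */
uint8_t clamp_u8(float v) {
    if (v < 0.0f) return 0;
    if (v > 255.0f) return 255;
    return (uint8_t)(v + 0.5f);
}

/* Full-range JFIF RGB -> YCbCr conversion for a single pixel. */
void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                  uint8_t *y, uint8_t *cb, uint8_t *cr) {
    *y  = clamp_u8(0.299f * r + 0.587f * g + 0.114f * b);
    *cb = clamp_u8(128.0f - 0.168736f * r - 0.331264f * g + 0.5f * b);
    *cr = clamp_u8(128.0f + 0.5f * r - 0.418688f * g - 0.081312f * b);
}

/* Convert one row of interleaved RGB pixels to 4:2:2.
 * width must be even; out receives 2 bytes per pixel as Y0 Cb Y1 Cr (assumed layout).
 * Chroma is shared by each horizontal pixel pair, luma is kept per pixel. */
void rgb_row_to_422(const uint8_t *rgb, size_t width, uint8_t *out) {
    for (size_t x = 0; x + 1 < width; x += 2) {
        uint8_t y0, cb0, cr0, y1, cb1, cr1;
        rgb_to_ycbcr(rgb[3 * x], rgb[3 * x + 1], rgb[3 * x + 2], &y0, &cb0, &cr0);
        rgb_to_ycbcr(rgb[3 * x + 3], rgb[3 * x + 4], rgb[3 * x + 5], &y1, &cb1, &cr1);
        out[2 * x]     = y0;
        out[2 * x + 1] = (uint8_t)((cb0 + cb1 + 1) / 2);  /* averaged Cb */
        out[2 * x + 2] = y1;
        out[2 * x + 3] = (uint8_t)((cr0 + cr1 + 1) / 2);  /* averaged Cr */
    }
}
```

Going back to RGB after the enhancement step would need the inverse conversion and a chroma upsampling choice, which are not shown here.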
Hopefully you have a library to do the actual JPEG decoding, since writing one that is compliant is a non-trivial task.
Here is a very basic and flawed JPEG decoder I started writing, to give you more technical details: Ruby JPEG decoder. It does not successfully implement the IDCT.
For a correct implementation in C, I suggest the IJG library (libjpeg).

Texture format for cellular automata in OpenGL ES 2.0

I need some quick advice.
I would like to simulate a cellular automaton (from "A Simple, Efficient Method for Realistic Animation of Clouds") on the GPU. However, I am limited to OpenGL ES 2.0 shaders (in WebGL), which do not support any bitwise operations.
Since every cell in this cellular automaton represents a boolean value, storing 1 bit per cell would have been ideal. So what is the most efficient way of representing this data in OpenGL's texture formats? Are there any tricks, or should I just stick with a straightforward RGBA texture?
EDIT: Here are my thoughts so far...
At the moment I'm thinking of going with either plain GL_RGBA8, GL_RGBA4 or GL_RGB5_A1:
Possibly I could pick GL_RGBA8, and try to extract the original bits using floating point ops. E.g. x*255.0 gives an approximate integer value. However, extracting the individual bits is a bit of a pain (i.e. dividing by 2 and rounding a couple times). Also I'm wary of precision problems.
If I pick GL_RGBA4, I could store 1.0 or 0.0 per component, but then I could probably also try the same trick as before with GL_RGBA8. In this case, it's only x*15.0. Not sure if it would be faster or not seeing as there should be fewer ops to extract the bits but less information per texture read.
Using GL_RGB5_A1 I could try and see if I can pack my cells together with some additional information like a color per voxel where the alpha channel stores the 1 bit cell state.
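The "dividing by 2 and rounding" idea from the GL_RGBA8 option can be sketched in plain C, using only operations that have direct GLSL ES 2.0 counterparts (multiply, divide, floor). The function names are hypothetical, and floor_nonneg stands in for the GLSL floor() built-in. Adding 0.5 before flooring is one way to absorb the precision error in the normalized channel value.

```c
/* floor() for non-negative inputs; GLSL ES 2.0 provides floor() as a built-in. */
float floor_nonneg(float x) {
    return (float)(int)x;
}

/* Extract bit `bit` (0 = least significant) from an 8-bit channel that was
 * sampled as a normalized float in [0, 1]. Returns 0.0 or 1.0. */
float extract_bit(float channel, int bit) {
    /* Recover the stored integer 0..255; the +0.5 rounds away sampling error. */
    float value = floor_nonneg(channel * 255.0f + 0.5f);

    /* Emulate value >> bit by dividing by 2^bit and flooring. */
    float divisor = 1.0f;
    for (int i = 0; i < bit; ++i) {
        divisor *= 2.0f;
    }
    float shifted = floor_nonneg(value / divisor);

    /* Emulate shifted & 1 as shifted mod 2. */
    return shifted - 2.0f * floor_nonneg(shifted * 0.5f);
}
```

With this approach one RGBA8 texel could hold 32 cells, at the cost of a few arithmetic operations per extracted bit.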
Create a second texture and use it as a lookup table. In each 256x256 block of the texture you can represent one boolean operation where the inputs are represented by the row/column and the output is the texture value. Actually in each RGBA texture you can represent four boolean operations per 256x256 region. Beware texture compression and MIP maps, though!
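A CPU-side sketch of that lookup table, in plain C, might look like the following. The table plays the role of the second texture, each LutTexel is one RGBA texel holding the results of four operations, and lookup mimics a nearest-neighbour texture fetch addressed by two normalized channel values. The choice of AND, OR, XOR and NOR and all names are hypothetical.

```c
#include <stdint.h>

#define LUT_SIZE 256

/* One RGBA texel of the lookup texture: four boolean operations per texel. */
typedef struct { uint8_t and_, or_, xor_, nor_; } LutTexel;

LutTexel lut[LUT_SIZE][LUT_SIZE];

/* Fill the table on the CPU, where bitwise operators are available. */
void build_lut(void) {
    for (int row = 0; row < LUT_SIZE; ++row) {
        for (int col = 0; col < LUT_SIZE; ++col) {
            lut[row][col].and_ = (uint8_t)(row & col);
            lut[row][col].or_  = (uint8_t)(row | col);
            lut[row][col].xor_ = (uint8_t)(row ^ col);
            lut[row][col].nor_ = (uint8_t)(~(row | col));
        }
    }
}

/* Emulate the shader-side fetch: two normalized channel values in [0, 1]
 * select the row and column, as a nearest-filtered texture lookup would. */
LutTexel lookup(float a, float b) {
    int row = (int)(a * 255.0f + 0.5f);
    int col = (int)(b * 255.0f + 0.5f);
    return lut[row][col];
}
```

The lookup only stays exact if the texture is sampled with nearest filtering and without compression or mipmapping, which is what the warning above is about.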