Seam Carving – Accessing pixel data in cocoa - objective-c

I want to implement the seam carving algorithm by Avidan and Shamir. The energy computation stage can be implemented as a Core Image filter. The next stage, computing the seams with the lowest energy, cannot, because it uses dynamic programming and the OpenGL Shading Language gives you no access to previously computed results.
So I need a way to access the pixel data of an image efficiently in Objective-C/Cocoa.
Pseudo code omitting boundary checks:
for y in 0..lines(image) do:
    for x in 0..columns(image) do:
        output[x][y] = value(image, x, y) +
            min{ output[x-1][y-1]; output[x][y-1]; output[x+1][y-1] }

The best way to get access to the pixel values of an image is to create a bitmap context (a CGContextRef) with CGBitmapContextCreate. The important part is that when you create the context, you pass in the pointer that will be used as the backing store for the bitmap's data. That buffer will hold the pixel values, and you can do whatever you want with them.
So the steps should be:
1. Allocate a buffer with malloc or another suitable allocator.
2. Pass that buffer as the first parameter to CGBitmapContextCreate.
3. Draw your image into the returned context.
4. Release the context.
Your original data pointer now refers to a buffer filled with pixels in the format specified in the call to CGBitmapContextCreate.
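Once the buffer is filled, the dynamic-programming pass from the question can run directly over the raw bytes. The following is a minimal sketch in plain C, not the answerer's code. It assumes one 8-bit energy value per pixel (for example an 8-bit grayscale context) and handles the image borders by reusing the pixel directly above. The names cumulative_energy and min3 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest of three values. */
static uint32_t min3(uint32_t a, uint32_t b, uint32_t c) {
    uint32_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Fills out[y * width + x] with the lowest cumulative energy of any
 * vertical seam that ends at (x, y). `energy` is a row-major buffer with
 * `bytes_per_row` bytes between row starts (assumption: 1 byte per pixel). */
void cumulative_energy(const uint8_t *energy, size_t width, size_t height,
                       size_t bytes_per_row, uint32_t *out) {
    assert(bytes_per_row >= width);
    if (height == 0) return;
    for (size_t x = 0; x < width; x++)
        out[x] = energy[x];                 /* first row has no predecessor */
    for (size_t y = 1; y < height; y++) {
        const uint32_t *prev = out + (y - 1) * width;
        for (size_t x = 0; x < width; x++) {
            /* At the borders, fall back to the pixel directly above. */
            uint32_t left  = x > 0         ? prev[x - 1] : prev[x];
            uint32_t right = x + 1 < width ? prev[x + 1] : prev[x];
            out[y * width + x] =
                energy[y * bytes_per_row + x] + min3(left, prev[x], right);
        }
    }
}
```

The minimum of the last row of out marks the end of the cheapest seam; walking back up through the smallest of the three neighbours recovers the seam itself.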

Vertex buffer with vertices of different formats

I want to draw a model that's composed of multiple meshes, where each mesh has different vertex formats. Is it possible to put all the various vertices within the same vertex buffer, and to point to the correct offset at vkCmdBindVertexBuffers time?
Or must all vertices within a buffer have the same format, thus necessitating multiple vbufs for such a model?
Looking at the manual for vkCmdBindVertexBuffers, it's not clear whether the offset is in bytes or in vertex-strides.
https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCmdBindVertexBuffers.html
Your question really breaks down into three questions:
1. Does the pOffsets parameter for vkCmdBindVertexBuffers accept bytes or vertex strides?
2. Can I put more than one vertex format into a vertex buffer?
3. Should I put more than one vertex format into a vertex buffer?
The short version is:
1. Bytes
2. Yes
3. Probably not
Does the offsets parameter for vkCmdBindVertexBuffers accept bytes or vertex strides?
The function signature is
void vkCmdBindVertexBuffers(
    VkCommandBuffer commandBuffer,
    uint32_t firstBinding,
    uint32_t bindingCount,
    const VkBuffer* pBuffers,
    const VkDeviceSize* pOffsets);
Note the VkDeviceSize type for pOffsets. This unambiguously means "bytes", not strides. Any VkDeviceSize is an offset or size in raw memory. A number of vertex strides isn't a memory quantity; it is simply a count, so its type would have to be a plain uint32_t or uint64_t.
Furthermore there's nothing in that function signature that specifies the vertex format so there would be no way to convert the vertex stride count to actual memory sizes. Remember that unlike OpenGL, Vulkan is not a state machine, so this function doesn't have any "memory" of a rendering pipeline that you might have previously bound.
Can I put more than one vertex format into a vertex buffer?
As a consequence of the above answer, yes. You can put pretty much whatever you want into a vertex buffer, although I believe some hardware will have alignment restrictions on what are valid offsets for vertex buffers, so make sure you check that.
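To make the byte-offset point concrete, here is a small sketch of computing where a second mesh with a different vertex format could start inside a shared buffer. It deliberately avoids the Vulkan headers: uint64_t stands in for VkDeviceSize, the two vertex structs are hypothetical, and the 16-byte alignment used in the usage note is a placeholder assumption, not a value taken from the specification or any device.

```c
#include <assert.h>
#include <stdint.h>

/* Two hypothetical vertex formats that share one vertex buffer. */
typedef struct { float pos[3]; float normal[3]; } VertexPN;
typedef struct { float pos[3]; float normal[3]; float uv[2]; } VertexPNT;

/* Rounds `offset` up to the next multiple of `alignment` (a power of two). */
static uint64_t align_up(uint64_t offset, uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Byte offset at which the next mesh may start, given the byte position
 * where the previous mesh ends. The result is a byte count, which is the
 * unit the pOffsets entries are expressed in. */
uint64_t next_mesh_offset(uint64_t prev_end, uint64_t alignment) {
    return align_up(prev_end, alignment);
}
```

For example, five VertexPN vertices occupy 120 bytes, so with an assumed 16-byte alignment a following VertexPNT mesh would start at byte 128, and 128 is the value that would go into its pOffsets entry.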
Should I put more than one vertex format into a vertex buffer?
Generally speaking you want to render your scene in as few draw calls as possible, and having lots of arbitrary vertex formats runs counter to that. I would argue that, if possible, the only time you want to change vertex formats is when you're switching to a different rendering pass, such as when switching from rendering opaque items to rendering transparent ones.
Instead you should try to make format normalization part of your asset pipeline, taking your source assets and converting them to a single consistent format. If that's not possible, then you could consider doing the normalization at load time. This adds complexity to the loading code, but should drastically reduce the complexity of the rendering code, since you now only have to think in terms of a single vertex format.

DirectX 11 What is a Fragment?

I have been learning DirectX 11, and the book I am reading states that the Rasterizer outputs Fragments. It is my understanding that these Fragments are the output of the Rasterizer (which takes geometric primitives as input) and are in fact just 2D positions (in your 2D Render Target View).
Here is what I think I understand, please correct me.
The Rasterizer takes geometric primitives (spheres, cubes or boxes, toroids, cylinders, pyramids, triangle meshes or polygon meshes) (https://en.wikipedia.org/wiki/Geometric_primitive). It then translates these primitives into pixels (or dots) that are mapped to your Render Target View (which is 2D). This is what a Fragment is. For each Fragment, it executes the Pixel Shader to determine its color.
However, I am only assuming, because there is no simple explanation of what it is (that I can find).
So my questions are ...
1: What is a Rasterizer? What are the inputs, and what is the output?
2: What is a fragment, in relation to Rasterizer output?
3: Why is a fragment a float4 value (SV_Position), if it is just a 2D screen-space position for the Render Target View?
4: How does it correlate to the Render Target output (the 2D screen texture)?
5: Is this why we clear the Render Target View (to whatever color), because the Rasterizer and Pixel Shader will not execute on all X,Y locations of the Render Target View?
Thank you!
I do not use DirectX but OpenGL instead, but the terminology should be similar if not the same. My understanding is this:
(scene geometry) -> [Vertex shader] -> (per vertex data)
(per vertex data) -> [Geometry & Tessellation shader] -> (per primitive data)
(per primitive data) -> [rasterizer] -> (per fragment data)
(per fragment data) -> [Fragment shader] -> (fragment)
(fragment) -> [depth/stencil/alpha/blend...]-> (pixels)
So in the vertex shader you can perform any per-vertex operations, such as coordinate system transforms, pre-computation of needed parameters, etc.
In the geometry and tessellation shaders you can compute normals from geometry, emit or convert primitives, and much more.
The rasterizer then converts geometry (primitives) into fragments. This is done by interpolation. It basically divides the visible part of each primitive into fragments; see convex polygon rasterizer.
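As an illustration of "primitive in, fragments out", here is a deliberately naive rasterizer sketch in C. It is not how any particular GPU or API works internally: it simply tests every pixel centre of the target against a triangle using edge functions and emits a fragment with its interpolation (barycentric) weights. The types Vec2 and Fragment and the function rasterize are hypothetical names; fill rules and clipping are ignored.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Vec2;
/* A fragment: target coordinates plus the weights used to interpolate
 * per-vertex data (position, color, ...) for this location. */
typedef struct { int x, y; float w0, w1, w2; } Fragment;

/* Twice the signed area of triangle (a, b, p). */
static float edge(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* Emits one fragment per pixel centre covered by triangle (v0, v1, v2)
 * inside a width x height target. Returns the number of covered pixels;
 * only the first `capacity` fragments are stored in `out`. */
size_t rasterize(Vec2 v0, Vec2 v1, Vec2 v2, int width, int height,
                 Fragment *out, size_t capacity) {
    float area = edge(v0, v1, v2);
    size_t count = 0;
    assert(area != 0.0f);            /* degenerate triangles are not handled */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            Vec2 p = { x + 0.5f, y + 0.5f };          /* pixel centre */
            float w0 = edge(v1, v2, p) / area;
            float w1 = edge(v2, v0, p) / area;
            float w2 = edge(v0, v1, p) / area;
            if (w0 < 0 || w1 < 0 || w2 < 0) continue; /* outside the triangle */
            if (count < capacity)
                out[count] = (Fragment){ x, y, w0, w1, w2 };
            count++;
        }
    }
    return count;
}
```

Pixels the triangle does not cover produce no fragment at all, which is one reason the render target is cleared before drawing.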
Fragments are neither pixels nor super-pixels, but they are close to them. The difference is that a fragment may or may not end up in the output, depending on the circumstances and pipeline configuration (pixels are the visible outputs). You can think of them as possible super-pixels.
The fragment shader converts per-fragment data into final fragments. Here you compute per-fragment/pixel lighting and shading, do all the texture work, compute colors, etc. The output is also a fragment, which is basically a pixel plus some additional information, so it has not just a position and color but can carry other properties as well (more colors, depth, alpha, stencil, etc.).
This goes into the final combiner, which performs the depth test and any other enabled tests or functionality such as blending. Only that output goes into the framebuffer as a pixel.
I think that answered #1,#2,#4.
Now #3 (I may be wrong here due to my lack of knowledge about DirectX): in per-fragment data you often need the 3D position of the fragment for proper lighting or whatever other computations, and since homogeneous coordinates are used, we need a 4D (x, y, z, w) vector for it. The fragment itself has 2D coordinates, but the 3D position is a value interpolated from the geometry passed from the vertex shader. So it may not contain the screen position but world coordinates instead (or any other).
#5: Yes. The scene may not cover the whole screen, and/or you need to preset buffers like depth, stencil, and alpha so that rendering works as it should and is not invalidated by the previous frame's results. So we usually need to clear the framebuffers at the start of a frame. Some techniques require multiple clears per frame; others (like a glow effect) clear once per multiple frames ...

Surface format is B8G8R8A8_UNORM, but vkCmdClearColorImage takes float?

I use vkGetPhysicalDeviceSurfaceFormatsKHR to get supported image formats for the swapchain, and (on Linux+Nvidia, using SDL) I get VK_FORMAT_B8G8R8A8_UNORM as the first option and I go ahead and create the swapchain with that format:
VkSwapchainCreateInfoKHR swapchain_info = {
...
.imageFormat = format, /* taken from vkGetPhysicalDeviceSurfaceFormatsKHR */
...
};
So far, it all makes sense. The image format used to draw on the screen is the usual 8-bits-per-channel BGRA.
As part of my learning process, I have so far arrived at setting up a lot of stuff but not yet the graphics pipeline [1]. So I am trying the only command I can use that doesn't need a pipeline: vkCmdClearColorImage [2].
The VkClearColorValue used to define the clear color can take the color as float, uint32_t or int32_t, depending on the format of the image. I would have expected, based on the image format given to the swapchain, that I should give it uint32_t values, but that doesn't seem to be correct. I know because the screen color didn't change. I tried giving it floats and it works.
My question is, why does the clear color need to be specified in floats when the image format is VK_FORMAT_B8G8R8A8_UNORM?
[1] Actually I have, but thought I would try out the simpler case of no pipeline first. I'm trying to incrementally use Vulkan (given its verbosity) particularly because I'm also writing tutorials on it as I learn.
[2] Actually, it technically doesn't need a render pass, but I figured hey, I'm not using any pipeline stuff here, so let's try it without a pipeline and it worked.
My rendering loop is essentially the following:
1. Acquire an image from the swapchain.
2. Create a command buffer with the following:
   - transition from VK_IMAGE_LAYOUT_UNDEFINED to VK_IMAGE_LAYOUT_GENERAL (because I'm clearing the image outside a render pass)
   - clear the image
   - transition from VK_IMAGE_LAYOUT_GENERAL to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
3. Submit the command buffer to the queue (taking care of synchronization with the swapchain using semaphores).
4. Submit for presentation.
My question is, why does the clear color need to be specified in floats when the image format is VK_FORMAT_B8G8R8A8_UNORM?
Because the normalized, scaled, or sRGB image formats are really just various forms of floating-point compression. A normalized integer is a way of storing floating-point values on the range [0, 1] or [-1, 1], but using a much smaller amount of data than even a 16-bit float. A scaled integer is a way of storing floating point values on the range [0, MAX] or [-MIN, MAX]. And sRGB is just a compressed way of storing linear color values on the range [0, 1], but in a gamma-corrected color space that puts precision in different places than the linear color values would suggest.
You see the same things with inputs to the vertex shader. A vec4 input type can be fed by normalized formats just as well as by floating-point formats.
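The idea that a UNORM texel is a compact encoding of a value in [0, 1] can be sketched in a few lines of C. This is an illustration only: the exact conversion and rounding rules are defined by the Vulkan specification and carried out by the implementation, and the function names here are hypothetical. It also hints at why integer clear values appeared to do nothing in the question: the clear color for a UNORM image is read as floats, and a small integer bit pattern reinterpreted as a float is a value extremely close to zero.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes a float in [0, 1] as an 8-bit UNORM value, clamping
 * out-of-range input. */
uint8_t float_to_unorm8(float c) {
    if (!(c > 0.0f)) return 0;              /* also catches NaN */
    if (c > 1.0f) c = 1.0f;
    return (uint8_t)(c * 255.0f + 0.5f);    /* round to nearest */
}

/* Decodes an 8-bit UNORM value back to a float in [0, 1]. */
float unorm8_to_float(uint8_t v) {
    return (float)v / 255.0f;
}

/* Packs an RGBA color given as floats into the byte order of a
 * B8G8R8A8 texel. */
void pack_b8g8r8a8(const float rgba[4], uint8_t texel[4]) {
    assert(rgba != 0 && texel != 0);
    texel[0] = float_to_unorm8(rgba[2]);    /* B */
    texel[1] = float_to_unorm8(rgba[1]);    /* G */
    texel[2] = float_to_unorm8(rgba[0]);    /* R */
    texel[3] = float_to_unorm8(rgba[3]);    /* A */
}
```

A clear color of {1.0, 0.0, 0.0, 1.0} (opaque red) therefore ends up as the bytes 0, 0, 255, 255 in this layout.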

Comparing Kinect depth to OpenGL depth efficiently

Background:
This problem is related to 3D tracking of an object.
My system projects objects/samples with known parameters (X, Y, Z) into OpenGL and tries to match them against the image and depth information obtained from a Kinect sensor to infer the object's 3D position.
Problem:
Kinect depth->process-> value in millimeters
OpenGL->depth buffer-> value between 0-1 (which is nonlinearly mapped between near and far)
I could recover the Z value from OpenGL using the method described at http://www.songho.ca/opengl/gl_projectionmatrix.html, but this would be very slow.
I am sure this is a common problem, so I hope some clever solution exists.
Question:
Efficient way to recover eye Z coordinate from OpenGL?
Or is there any other way around to solve above problem?
Now my problem is Kinect depth is in mm
No, it is not. Kinect reports its depth as a value in an 11-bit range of arbitrary units. Only after some calibration has been applied can the depth value be interpreted as a physical unit. You're right insofar as OpenGL perspective projection depth values are nonlinear.
So if I understand you correctly, you want to emulate a Kinect by retrieving the content of the depth buffer, right? Then the easiest solution would be to use a combination of vertex and fragment shader, in which the vertex shader passes the linear depth as an additional varying to the fragment shader, and the fragment shader then overwrites the fragment's depth value with the passed value. (You could also use an additional render target for this.)
Another method would be to use a 1D texture, projected into the depth range of the scene, where the texture values encode the depth value. Then the desired value would be in the color buffer.
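For reference, the relation between a nonlinear depth-buffer value and the linear eye-space distance can be written as a single expression. The sketch below is in C and assumes a standard perspective projection (as produced by glFrustum or gluPerspective) and the default depth range of [0, 1]; the function name and parameter names are hypothetical. Writing linear depth from the shaders, as described above, avoids evaluating this per pixel on the CPU after readback.

```c
#include <assert.h>

/* Converts a depth-buffer value d in [0, 1] to the positive distance along
 * the view axis in eye space, assuming a standard perspective projection
 * with clip planes near_z and far_z and the default depth range [0, 1]. */
double depth_to_eye_distance(double d, double near_z, double far_z) {
    assert(near_z > 0.0 && far_z > near_z);
    double z_ndc = 2.0 * d - 1.0;                 /* map [0, 1] back to [-1, 1] */
    return 2.0 * near_z * far_z /
           (far_z + near_z - z_ndc * (far_z - near_z));
}
```

A depth value of 0 maps to the near plane and 1 to the far plane; the result is in the same units as near_z and far_z, so comparing it with calibrated Kinect depth still requires both to be expressed in the same physical unit.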

Texture format for cellular automata in OpenGL ES 2.0

I need some quick advice.
I would like to simulate a cellular automaton (from A Simple, Efficient Method for Realistic Animation of Clouds) on the GPU. However, I am limited to OpenGL ES 2.0 shaders (in WebGL), which do not support any bitwise operations.
Since every cell in this cellular automaton represents a boolean value, storing 1 bit per cell would have been ideal. So what is the most efficient way of representing this data in OpenGL's texture formats? Are there any tricks, or should I just stick with a straightforward RGBA texture?
EDIT: Here's my thoughts so far...
At the moment I'm thinking of going with either plain GL_RGBA8, GL_RGBA4 or GL_RGB5_A1:
Possibly I could pick GL_RGBA8, and try to extract the original bits using floating point ops. E.g. x*255.0 gives an approximate integer value. However, extracting the individual bits is a bit of a pain (i.e. dividing by 2 and rounding a couple times). Also I'm wary of precision problems.
If I pick GL_RGBA4, I could store 1.0 or 0.0 per component, but then I could probably also try the same trick as before with GL_RGBA8. In this case, it's only x*15.0. Not sure if it would be faster or not seeing as there should be fewer ops to extract the bits but less information per texture read.
Using GL_RGB5_A1 I could try and see if I can pack my cells together with some additional information like a color per voxel where the alpha channel stores the 1 bit cell state.
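The "divide by 2 and round a couple of times" idea from the GL_RGBA8 option can be tried out on the CPU first. The following C sketch mirrors the arithmetic a shader without bitwise operators would have to do; it is an illustration, not tested shader code. The helper floor_nonneg is a hypothetical stand-in for the GLSL floor() function and is only valid for non-negative input.

```c
#include <assert.h>

/* Stand-in for GLSL floor(); only valid for non-negative input. */
static float floor_nonneg(float x) {
    return (float)(int)x;
}

/* Returns bit `k` (0 = least significant) of an 8-bit channel that was
 * sampled as a normalized float in [0, 1], using only multiplication,
 * subtraction and floor. The result is 0.0 or 1.0. */
float extract_bit(float normalized, int k) {
    assert(k >= 0 && k < 8);
    /* Recover the integer 0..255; the +0.5 guards against rounding error. */
    float value = floor_nonneg(normalized * 255.0f + 0.5f);
    for (int i = 0; i < k; i++)
        value = floor_nonneg(value * 0.5f);           /* value >> 1 */
    return value - 2.0f * floor_nonneg(value * 0.5f); /* value mod 2 */
}
```

Each extracted bit costs a handful of operations, which is the trade-off the question describes: denser storage per texture read against more arithmetic per cell.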
Create a second texture and use it as a lookup table. In each 256x256 block of the texture you can represent one boolean operation where the inputs are represented by the row/column and the output is the texture value. Actually in each RGBA texture you can represent four boolean operations per 256x256 region. Beware texture compression and MIP maps, though!
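A CPU-side sketch of such a lookup table, in C, may help when generating the texture data. The choice of the four operations (AND, OR, XOR, NAND), the row/column convention and all names are assumptions made for illustration. The lookup function emulates a nearest-filtered texture fetch with normalized coordinates, which is why the operands are addressed at texel centres.

```c
#include <assert.h>
#include <stdint.h>

enum { LUT_SIZE = 256, LUT_CHANNELS = 4 };

/* Fills a 256x256 RGBA table: row = first operand byte, column = second
 * operand byte, channels R,G,B,A = AND, OR, XOR, NAND of the two bytes.
 * `lut` must hold LUT_SIZE * LUT_SIZE * LUT_CHANNELS bytes. */
void build_lut(uint8_t *lut) {
    assert(lut != 0);
    for (int a = 0; a < LUT_SIZE; a++) {
        for (int b = 0; b < LUT_SIZE; b++) {
            uint8_t *texel = lut + (a * LUT_SIZE + b) * LUT_CHANNELS;
            texel[0] = (uint8_t)(a & b);
            texel[1] = (uint8_t)(a | b);
            texel[2] = (uint8_t)(a ^ b);
            texel[3] = (uint8_t)~(a & b);
        }
    }
}

/* Normalized coordinate of the centre of texel `index`. */
float texel_centre(int index) {
    return ((float)index + 0.5f) / (float)LUT_SIZE;
}

/* Emulates a nearest-filtered fetch at normalized coordinates (s, t),
 * where s selects the column and t selects the row. */
uint8_t lut_lookup(const uint8_t *lut, float s, float t, int channel) {
    assert(channel >= 0 && channel < LUT_CHANNELS);
    int col = (int)(s * LUT_SIZE);
    int row = (int)(t * LUT_SIZE);
    if (col > LUT_SIZE - 1) col = LUT_SIZE - 1;
    if (row > LUT_SIZE - 1) row = LUT_SIZE - 1;
    return lut[(row * LUT_SIZE + col) * LUT_CHANNELS + channel];
}
```

Sampling at texel centres with nearest filtering and no mipmaps or compression matters here, since any blending between neighbouring entries would corrupt the result, which is the caveat raised above.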