Skeletal animation bug with Assimp in DirectX 12 - fbx

I am using Assimp to load an FBX model with animation (created in Blender) into my DirectX 12 game, but I'm experiencing a very frustrating bug with the animation rendered by the game application.
The test model is a simple 'flagpole' containing four bones like so:
Bone0 -> Bone1 -> Bone2 -> Bone3
The model renders correctly in its rest pose when the keyframe animation is bypassed.
The model also renders and animates properly when the animation rotates the model only by the root bone (Bone0).
However, when importing a model that rotates at the first joint (i.e. at Bone1), the vertices clustered around each joint seem 'stuck' in their original positions, while the vertices surrounding the 'bones' proper appear to follow through with the correct animation.
The result is a crappy zigzag of stretched geometry like so:
Instead the model should resemble an 'allen-key' shape at the end of its animation pose, as shown by the same model rendered in the AssimpViewer utility tool:
Since the model is rendering correctly in AssimpViewer, it's reasonable to assume there are no issues with the FBX file exported by Blender. I then checked and confirmed that the vertices 'stuck' around the joints did indeed have their vertex weights correctly assigned by the game loading code.
The C++ model loading and animation code is based on the popular OGLDev tutorial: https://ogldev.org/www/tutorial38/tutorial38.html
Now the infuriating thing is, since the AssimpViewer tool was correctly rendering the model animation, I also copied in the SceneAnimator and AnimEvaluator classes from that tool to generate the final bone transforms via that code branch as well... only to end up with exactly the same zigzag bug in the game!
I'm reasonably confident there aren't any issues with finding the bone hierarchy structure at initialization, so here are the key functions that traverse the hierarchy and interpolate key frames each frame.
VOID Mesh::ReadNodeHeirarchy(FLOAT animationTime, CONST aiNode* pNode, CONST aiAnimation* pAnim, CONST aiMatrix4x4 parentTransform)
{
std::string nodeName(pNode->mName.data);
// nodeTransform is a relative transform to parent node space
aiMatrix4x4 nodeTransform = pNode->mTransformation;
CONST aiNodeAnim* pNodeAnim = FindNodeAnim(pAnim, nodeName);
if (pNodeAnim)
{
// Interpolate scaling and generate scaling transformation matrix
aiVector3D scaling(1.f, 1.f, 1.f);
CalcInterpolatedScaling(scaling, animationTime, pNodeAnim);
// Interpolate rotation and generate rotation transformation matrix
aiQuaternion rotationQ (1.f, 0.f, 0.f, 0.f);
CalcInterpolatedRotation(rotationQ, animationTime, pNodeAnim);
// Interpolate translation and generate translation transformation matrix
aiVector3D translat(0.f, 0.f, 0.f);
CalcInterpolatedPosition(translat, animationTime, pNodeAnim);
// build the SRT transform matrix
nodeTransform = aiMatrix4x4(rotationQ.GetMatrix());
nodeTransform.a1 *= scaling.x; nodeTransform.b1 *= scaling.x; nodeTransform.c1 *= scaling.x;
nodeTransform.a2 *= scaling.y; nodeTransform.b2 *= scaling.y; nodeTransform.c2 *= scaling.y;
nodeTransform.a3 *= scaling.z; nodeTransform.b3 *= scaling.z; nodeTransform.c3 *= scaling.z;
nodeTransform.a4 = translat.x; nodeTransform.b4 = translat.y; nodeTransform.c4 = translat.z;
}
aiMatrix4x4 globalTransform = parentTransform * nodeTransform;
if (m_boneMapping.find(nodeName) != m_boneMapping.end())
{
UINT boneIndex = m_boneMapping[nodeName];
// the global inverse transform returns us to mesh space!!!
m_boneInfo[boneIndex].FinalTransform = m_globalInverseTransform * globalTransform * m_boneInfo[boneIndex].BoneOffset;
//m_boneInfo[boneIndex].FinalTransform = m_boneInfo[boneIndex].BoneOffset * globalTransform * m_globalInverseTransform;
m_shaderTransforms[boneIndex] = aiMatrixToSimpleMatrix(m_boneInfo[boneIndex].FinalTransform);
}
for (UINT i = 0u; i < pNode->mNumChildren; i++)
{
ReadNodeHeirarchy(animationTime, pNode->mChildren[i], pAnim, globalTransform);
}
}
VOID Mesh::CalcInterpolatedRotation(aiQuaternion& out, FLOAT animationTime, CONST aiNodeAnim* pNodeAnim)
{
UINT rotationKeys = pNodeAnim->mNumRotationKeys;
// we need at least two values to interpolate...
if (rotationKeys == 1u)
{
CONST aiQuaternion& key = pNodeAnim->mRotationKeys[0u].mValue;
out = key;
return;
}
UINT rotationIndex = FindRotation(animationTime, pNodeAnim);
UINT nextRotationIndex = (rotationIndex + 1u) % rotationKeys;
assert(nextRotationIndex < rotationKeys);
CONST aiQuatKey& key = pNodeAnim->mRotationKeys[rotationIndex];
CONST aiQuatKey& nextKey = pNodeAnim->mRotationKeys[nextRotationIndex];
FLOAT deltaTime = FLOAT(nextKey.mTime) - FLOAT(key.mTime);
FLOAT factor = (animationTime - FLOAT(key.mTime)) / deltaTime;
assert(factor >= 0.f && factor <= 1.f);
aiQuaternion::Interpolate(out, key.mValue, nextKey.mValue, factor);
}
I've just included the rotation interpolation here, since the scaling and translation functions are identical. For those unaware, Assimp's aiMatrix4x4 type follows a column-vector math convention, so I haven't messed with original matrix multiplication order.
About the only deviation between my code and the two Assimp-based code branches I've adopted is the requirement to convert the final transforms from aiMatrix4x4 types into a DirectXTK SimpleMath Matrix (really an XMMATRIX) with this conversion function:
Matrix Mesh::aiMatrixToSimpleMatrix(CONST aiMatrix4x4 m)
{
return Matrix
(m.a1, m.a2, m.a3, m.a4,
m.b1, m.b2, m.b3, m.b4,
m.c1, m.c2, m.c3, m.c4,
m.d1, m.d2, m.d3, m.d4);
}
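The conversion copies the sixteen elements in the same row-major order, so the matrix keeps Assimp's column-vector layout (translation in the fourth column). Here is a minimal standalone sketch of why such a matrix can be uploaded untransposed, using plain arrays rather than the Assimp or SimpleMath types (`Mat4`, `Vec4`, and both function names are illustrative only). It assumes the shader keeps HLSL's default column-major constant-buffer packing, so the shader effectively reads the transpose of the uploaded memory:

```cpp
#include <array>

// Row-major 4x4 storage, the memory layout shared by aiMatrix4x4 and SimpleMath::Matrix.
using Mat4 = std::array<std::array<float, 4>, 4>;
using Vec4 = std::array<float, 4>;

// Column-vector convention (Assimp): result = M * v.
Vec4 mulColumnVector(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[row][col] * v[col];
    return r;
}

// Models mul(v, M) in a shader that reads the same memory as column-major,
// i.e. the shader's matrix is transpose(m) and the vector is a row vector.
Vec4 mulRowVectorTransposed(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            r[col] += v[row] * m[col][row];  // transpose(m)[row][col] == m[col][row]
    return r;
}
```

Both functions return the same vector for any input, which is the sense in which the untransposed upload and the shader's `mul(vector, matrix)` calls agree. If the shader were compiled with row-major packing instead, this assumption would not hold.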
Because of the column-vector orientation of aiMatrix4x4 Assimp matrices, the final bone transforms are not transposed for HLSL consumption. The array of final bone transforms is passed to the skinning vertex shader constant buffer as follows.
commandList->SetPipelineState(m_psoForwardSkinned.Get()); // set PSO
// Update vertex shader with current bone transforms
CONST std::vector<Matrix> transforms = m_assimpModel.GetShaderTransforms();
VSBonePassConstants vsBoneConstants{};
for (UINT i = 0; i < m_assimpModel.GetNumBones(); i++)
{
// We do not transpose bone matrices for HLSL because the original
// Assimp matrices are column-vector matrices.
vsBoneConstants.boneTransforms[i] = transforms[i];
//vsBoneConstants.boneTransforms[i] = transforms[i].Transpose();
//vsBoneConstants.boneTransforms[i] = Matrix::Identity;
}
GraphicsResource vsBoneCB = m_graphicsMemory->AllocateConstant(vsBoneConstants);
vsPerObjects.gWorld = m_assimp_world.Transpose(); // vertex shader per object constant
vsPerObjectCB = m_graphicsMemory->AllocateConstant(vsPerObjects);
commandList->SetGraphicsRootConstantBufferView(RootParameterIndex::VSBoneConstantBuffer, vsBoneCB.GpuAddress());
commandList->SetGraphicsRootConstantBufferView(RootParameterIndex::VSPerObjConstBuffer, vsPerObjectCB.GpuAddress());
//commandList->SetGraphicsRootDescriptorTable(RootParameterIndex::ObjectSRV, m_shaderTextureHeap->GetGpuHandle(ShaderTexDescriptors::SuzanneDiffuse));
commandList->SetGraphicsRootDescriptorTable(RootParameterIndex::ObjectSRV, m_shaderTextureHeap->GetGpuHandle(ShaderTexDescriptors::DefaultDiffuse));
for (UINT i = 0; i < m_assimpModel.GetMeshSize(); i++)
{
commandList->IASetVertexBuffers(0u, 1u, &m_assimpModel.meshEntries[i].GetVertexBufferView());
commandList->IASetIndexBuffer(&m_assimpModel.meshEntries[i].GetIndexBufferView());
commandList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
commandList->DrawIndexedInstanced(m_assimpModel.meshEntries[i].GetIndexCount(), 1u, 0u, 0u, 0u);
}
Please note I am using the Graphics Resource memory management helper object found in the DirectXTK12 library in the code above. Finally, here's the skinning vertex shader I'm using.
// Luna (2016) lighting model adapted from Moller
#define MAX_BONES 4
// vertex shader constant data that varies per object
cbuffer cbVSPerObject : register(b3)
{
float4x4 gWorld;
//float4x4 gTexTransform;
}
// vertex shader constant data that varies per frame
cbuffer cbVSPerFrame : register(b5)
{
float4x4 gViewProj;
float4x4 gShadowTransform;
}
// bone matrix constant data that varies per object
cbuffer cbVSBonesPerObject : register(b9)
{
float4x4 gBoneTransforms[MAX_BONES];
}
struct VertexIn
{
float3 posL : SV_POSITION;
float3 normalL : NORMAL;
float2 texCoord : TEXCOORD0;
float3 tangentU : TANGENT;
float4 boneWeights : BONEWEIGHT;
uint4 boneIndices : BONEINDEX;
};
struct VertexOut
{
float4 posH : SV_POSITION;
//float3 posW : POSITION;
float4 shadowPosH : POSITION0;
float3 posW : POSITION1;
float3 normalW : NORMAL;
float2 texCoord : TEXCOORD0;
float3 tangentW : TANGENT;
};
VertexOut VS_main(VertexIn vin)
{
VertexOut vout = (VertexOut)0.f;
// Perform vertex skinning.
// Ignore BoneWeights.w and instead calculate the last weight value
// to ensure all bone weights sum to unity.
float4 weights = vin.boneWeights;
//weights.w = 1.f - dot(weights.xyz, float3(1.f, 1.f, 1.f));
//float4 weights = { 0.f, 0.f, 0.f, 0.f };
//weights.x = vin.boneWeights.x;
//weights.y = vin.boneWeights.y;
//weights.z = vin.boneWeights.z;
weights.w = 1.f - (weights.x + weights.y + weights.z);
float4 localPos = float4(vin.posL, 1.f);
float3 localNrm = vin.normalL;
float3 localTan = vin.tangentU;
float3 objPos = mul(localPos, (float4x3)gBoneTransforms[vin.boneIndices.x]).xyz * weights.x;
objPos += mul(localPos, (float4x3)gBoneTransforms[vin.boneIndices.y]).xyz * weights.y;
objPos += mul(localPos, (float4x3)gBoneTransforms[vin.boneIndices.z]).xyz * weights.z;
objPos += mul(localPos, (float4x3)gBoneTransforms[vin.boneIndices.w]).xyz * weights.w;
float3 objNrm = mul(localNrm, (float3x3)gBoneTransforms[vin.boneIndices.x]) * weights.x;
objNrm += mul(localNrm, (float3x3)gBoneTransforms[vin.boneIndices.y]) * weights.y;
objNrm += mul(localNrm, (float3x3)gBoneTransforms[vin.boneIndices.z]) * weights.z;
objNrm += mul(localNrm, (float3x3)gBoneTransforms[vin.boneIndices.w]) * weights.w;
float3 objTan = mul(localTan, (float3x3)gBoneTransforms[vin.boneIndices.x]) * weights.x;
objTan += mul(localTan, (float3x3)gBoneTransforms[vin.boneIndices.y]) * weights.y;
objTan += mul(localTan, (float3x3)gBoneTransforms[vin.boneIndices.z]) * weights.z;
objTan += mul(localTan, (float3x3)gBoneTransforms[vin.boneIndices.w]) * weights.w;
vin.posL = objPos;
vin.normalL = objNrm;
vin.tangentU.xyz = objTan;
//vin.posL = posL;
//vin.normalL = normalL;
//vin.tangentU.xyz = tangentL;
// End vertex skinning
// transform to world space
float4 posW = mul(float4(vin.posL, 1.f), gWorld);
vout.posW = posW.xyz;
// assumes uniform scaling; nonuniform scaling needs the inverse-transpose of the world matrix
vout.normalW = mul(vin.normalL, (float3x3)gWorld);
vout.tangentW = mul(vin.tangentU, (float3x3)gWorld);
// transform to homogeneous clip space
vout.posH = mul(posW, gViewProj);
// pass texcoords to pixel shader
vout.texCoord = vin.texCoord;
//float4 texC = mul(float4(vin.TexC, 0.0f, 1.0f), gTexTransform);
//vout.TexC = mul(texC, gMatTransform).xy;
// generate projective tex-coords to project shadow map onto scene
vout.shadowPosH = mul(posW, gShadowTransform);
return vout;
}
Some last tests I tried before posting:
I tested the code with a Collada (DAE) model exported from Blender, only to observe the same distorted zigzagging in the Win32 desktop application.
I also confirmed the aiScene object for the loaded model returns an identity matrix for the global root transform (also verified in AssimpViewer).
I have stared at this code for about a week and am going out of my mind! Really hoping someone can spot what I have missed. If you need more code or info, please ask!

This seems to be a bug in the published code in the tutorials / documentation. It would be great if you could open an issue report on the Assimp project page on GitHub.

It's taken almost another two weeks of pain, but I finally found the bug. It was in my own code, and it was self-inflicted. Before I show the solution, I should explain the further troubleshooting I did to get there.
After losing faith with Assimp (even though the AssimpViewer tool was animating my model correctly), I turned to the FBX SDK. The FBX ViewScene command line utility tool that's available as part of the SDK was also showing and animating my model properly, so I had hope...
So after a few days reviewing the FBX SDK tutorials, and taking another week to write an FBX importer for my Windows desktop game, I loaded my model and... saw exactly the same zig-zag animation anomaly as the version loaded by Assimp!
This frustrating outcome meant I could at least eliminate Assimp and the FBX SDK as the source of the problem, and focus again on the vertex shader. The shader I'm using for vertex skinning was adopted from the 'Character Animation' chapter of Frank Luna's text. It was identical in every way, which led me to recheck the C++ vertex structure declared on the application side...
Here's the C++ vertex declaration for skinned vertices:
struct Vertex
{
// added constructors
Vertex() = default;
Vertex(FLOAT x, FLOAT y, FLOAT z,
FLOAT nx, FLOAT ny, FLOAT nz,
FLOAT u, FLOAT v,
FLOAT tx, FLOAT ty, FLOAT tz) :
Pos(x, y, z),
Normal(nx, ny, nz),
TexC(u, v),
Tangent(tx, ty, tz) {}
Vertex(DirectX::SimpleMath::Vector3 pos,
DirectX::SimpleMath::Vector3 normal,
DirectX::SimpleMath::Vector2 texC,
DirectX::SimpleMath::Vector3 tangent) :
Pos(pos), Normal(normal), TexC(texC), Tangent(tangent) {}
DirectX::SimpleMath::Vector3 Pos;
DirectX::SimpleMath::Vector3 Normal;
DirectX::SimpleMath::Vector2 TexC;
DirectX::SimpleMath::Vector3 Tangent;
FLOAT BoneWeights[4];
BYTE BoneIndices[4];
//UINT BoneIndices[4]; <--- YOU HAVE CAUSED ME A MONTH OF PAIN
};
Quite early on, being confused by Luna's use of BYTE to store the array of bone indices, I changed this structure element to UINT, figuring this still matched the input declaration shown here:
static CONST D3D12_INPUT_ELEMENT_DESC inputElementDescSkinned[] =
{
{ "SV_POSITION", 0u, DXGI_FORMAT_R32G32B32_FLOAT, 0u, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0u },
{ "NORMAL", 0u, DXGI_FORMAT_R32G32B32_FLOAT, 0u, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0u },
{ "TEXCOORD", 0u, DXGI_FORMAT_R32G32_FLOAT, 0u, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0u },
{ "TANGENT", 0u, DXGI_FORMAT_R32G32B32_FLOAT, 0u, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0u },
//{ "BINORMAL", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
{ "BONEWEIGHT", 0u, DXGI_FORMAT_R32G32B32A32_FLOAT, 0u, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0u },
{ "BONEINDEX", 0u, DXGI_FORMAT_R8G8B8A8_UINT, 0u, D3D12_APPEND_ALIGNED_ELEMENT, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0u },
};
Here was the bug. By declaring UINT in the vertex structure for bone indices, four bytes were being assigned to store each bone index. But in the vertex input declaration, the DXGI_FORMAT_R8G8B8A8_UINT format specified for the "BONEINDEX" was assigning one byte per index. I suspect this data type and format size mismatch was resulting in only one valid bone index being able to fit in the BONEINDEX element, and so only one index value was passed to the vertex shader each frame, instead of the whole array of four indices for correct bone transform lookups.
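A minimal sketch of the mismatch, assuming a little-endian CPU (as on x86/x64 Windows) and bone indices below 256. `fetchR8G8B8A8`, `IndicesAsBytes`, and `IndicesAsUints` are hypothetical names standing in for a 4-byte element read and for the two versions of the BoneIndices member; this is a CPU-side model, not D3D12 code. Under those assumptions the shader would still receive four components, but they all come from the four bytes of the first UINT, so it sees the first index followed by three zeros:

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Models what a 4-byte DXGI_FORMAT_R8G8B8A8_UINT element read sees:
// the first four bytes at the element's offset, one byte per component.
std::array<std::uint8_t, 4> fetchR8G8B8A8(const void* elementStart) {
    std::array<std::uint8_t, 4> components{};
    std::memcpy(components.data(), elementStart, components.size());
    return components;
}

// Hypothetical stand-ins for the two versions of the BoneIndices member.
struct IndicesAsBytes { std::uint8_t  v[4]; };  // BYTE BoneIndices[4]
struct IndicesAsUints { std::uint32_t v[4]; };  // UINT BoneIndices[4]

static_assert(sizeof(IndicesAsBytes) == 4,  "matches the 4-byte format");
static_assert(sizeof(IndicesAsUints) == 16, "four times wider than the format");
```

With indices {1, 2, 3, 0} stored as UINT, the model reads back {1, 0, 0, 0}: every influence after the first falls back to bone 0, which would be consistent with vertices near the joints staying put.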
So now I've learned... the hard way... why Luna had declared an array of BYTE for bone indices in the original C++ vertex structure.
I hope this experience will be of value to someone else, and always be careful changing code from your original learning sources.


How do I extract the output from CGAL::poisson_surface_reconstruction_delaunay?

I am trying to convert a point cloud to a trimesh using CGAL::poisson_surface_reconstruction_delaunay() and extract the data inside the trimesh to an OpenGL friendly format:
// The function below should set vertices and indices so that:
// triangle 0: (vertices[indices[0]],vertices[indices[1]],vertices[indices[2]]),
// triangle 1: (vertices[indices[3]],vertices[indices[4]],vertices[indices[5]])
// ...
// triangle n - 1
void reconstructPointsToSurfaceInOpenGLFormat(const std::list<std::pair<Kernel::Point_3, Kernel::Vector_3>>& points, // input: points and normals
std::vector<glm::vec3>& vertices, // output
std::vector<unsigned int>& indices) { // output
CGAL::Surface_mesh<Kernel::Point_3> trimesh;
double spacing = 10;
bool ok = CGAL::poisson_surface_reconstruction_delaunay(points.begin(), points.end(),
CGAL::First_of_pair_property_map<std::pair<Kernel::Point_3, Kernel::Vector_3>>(),
CGAL::Second_of_pair_property_map<std::pair<Kernel::Point_3, Kernel::Vector_3>>(),
trimesh, spacing);
// How do I set the vertices and indices values?
}
Please help me with iterating through the triangles in trimesh and setting the vertices and indices in the code above.
The class Polyhedron_3 is not index-based, so you need to provide an item class with ids, such as Polyhedron_items_with_id_3. You will then need to call CGAL::set_halfedgeds_items_id(trimesh) to initialize the ids. If you can't modify the Polyhedron type, you can use dynamic properties instead, and will still need to initialize the ids.
Note that Surface_mesh is index-based, so no particular handling is needed to get indices.
Based on sloriot's code from his answer:
typedef CGAL::Surface_mesh<Kernel::Point_3> Mesh;

void mesh2GLM(const Mesh& trimesh, std::vector<glm::vec3>& vertices, std::vector<unsigned int>& indices) {
    // Maps each mesh vertex index to its position in the output vertex array
    std::map<std::size_t, std::size_t> meshIndex2Index;
    // Loop over all vertices in mesh:
    std::size_t index = 0;
    for (Mesh::Vertex_index v : CGAL::vertices(trimesh)) {
        const Kernel::Point_3& point = trimesh.point(v);
        vertices.push_back(glm::vec3(point.x(), point.y(), point.z()));
        meshIndex2Index[std::size_t(v)] = index;
        index++;
    }
    // Loop over all triangles (faces):
    for (Mesh::Face_index f : CGAL::faces(trimesh)) {
        // Each face contributes the output indices of its three vertices
        for (Mesh::Vertex_index v : CGAL::vertices_around_face(CGAL::halfedge(f, trimesh), trimesh)) {
            indices.push_back(static_cast<unsigned int>(meshIndex2Index[std::size_t(v)]));
        }
    }
}
Seems to work fine.

Converting meters into pixel units

I am trying to convert distances from meters to pixels in a ROS node, using the PCL library and a Kinect for Xbox. I was using the code below to access the Euclidean coordinates of every point from the Kinect inside the ROS node, which are in meters. But I want these measurements in pixel units. What should I do?
void
cloud_cb (const sensor_msgs::PointCloud2ConstPtr& input)
{
pcl::PointCloud<pcl::PointXYZRGB> output;
pcl::fromROSMsg(*input,output );
for(int i=0;i<=400;i++)
{
for(int j=0;j<=400;j++)
{
p[i][j] = output.at(i,j);
ROS_INFO("\n p.z = %f \t p.x = %f \t p.y = %f",p[i][j].z,p[i][j].x,p[i][j].y);
}
}
sensor_msgs::PointCloud2 cloud;
pcl::toROSMsg(output,cloud);
pub.publish (cloud);
}
Here p[row][col] is a Point structure that contains the x, y, z coordinate values in meters, which I want to convert to pixel units. As far as I can see, the size of a pixel is not constant, so I can't use any value found on Google.
I found a similar question here: Kinect depth conversion from mm to pixels, but it has no solution.
There's a problem with trying to convert meters to pixels. Pixels aren't a standard unit. The physical size of 1 pixel varies on different devices depending on screen resolution and size of a screen.
If you know the resolution of the screen the conversion is still non-trivial.
const int L = 1920; //screen width
const int H = 1280; //screen height
for(int i=0;i<=L;i++){
for(int j=0;j<=H;j++){
p[i][j] = output.at(i*400/L,j*400/H);
}
}
Thus for every pixel you'll have a depth value corresponding to the depth value in the map. This will need some int conversion and improvement.
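The index scaling above can be sketched as a small nearest-neighbour resampler. This is only an illustration under assumptions: the depth values are taken to live in a flat row-major array rather than in the PCL cloud, and `sourceIndex` and `resampleDepth` are hypothetical helper names. Note the loops use `<` rather than `<=`, so the last index stays inside both grids.

```cpp
#include <cstddef>
#include <vector>

// Maps a destination index in [0, dstSize) to a source index in [0, srcSize),
// rounding down (nearest-neighbour style).
std::size_t sourceIndex(std::size_t dstIndex, std::size_t dstSize, std::size_t srcSize) {
    return dstIndex * srcSize / dstSize;
}

// Resamples a row-major srcW x srcH grid of depth values (meters) to dstW x dstH,
// so that every destination pixel gets the depth of the source cell it falls in.
std::vector<float> resampleDepth(const std::vector<float>& src,
                                 std::size_t srcW, std::size_t srcH,
                                 std::size_t dstW, std::size_t dstH) {
    std::vector<float> dst(dstW * dstH);
    for (std::size_t y = 0; y < dstH; ++y) {
        const std::size_t sy = sourceIndex(y, dstH, srcH);
        for (std::size_t x = 0; x < dstW; ++x) {
            dst[y * dstW + x] = src[sy * srcW + sourceIndex(x, dstW, srcW)];
        }
    }
    return dst;
}
```

The values stay in meters; only the grid they are sampled on changes, which matches the point that a pixel is not a physical unit.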

How-to convert an iOS camera image to greyscale using the Accelerate Framework?

It seems like this should be simpler than I'm finding it to be.
I have an AVFoundation frame coming back in the standard delegate method:
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
fromConnection:(AVCaptureConnection *)connection
where I would like to convert the frame to greyscale using the Accelerate.Framework.
There is a family of conversion methods in the framework, including vImageConvert_RGBA8888toPlanar8(), which looks like it might be what I would like to see, however, I can't find any examples of how to use them!
So far, I have the code:
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
fromConnection:(AVCaptureConnection *)connection
{
@autoreleasepool {
CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
/*Lock the image buffer*/
CVPixelBufferLockBaseAddress(imageBuffer,0);
/*Get information about the image*/
uint8_t *baseAddress = (uint8_t *)CVPixelBufferGetBaseAddress(imageBuffer);
size_t width = CVPixelBufferGetWidth(imageBuffer);
size_t height = CVPixelBufferGetHeight(imageBuffer);
size_t stride = CVPixelBufferGetBytesPerRow(imageBuffer);
// vImage In
Pixel_8 *bitmap = (Pixel_8 *)malloc(width * height * sizeof(Pixel_8));
const vImage_Buffer inImage = { bitmap, height, width, stride };
//How can I take this inImage and convert it to greyscale?????
//vImageConvert_RGBA8888toPlanar8()??? Is the correct starting format here??
}
}
So I have two questions:
(1) In the code above, is RGBA8888 the correct starting format?
(2) How can I actually make the Accelerate.Framework call to convert to greyscale?
There is an easier option here. If you change the camera acquire format to YUV, then you already have a greyscale frame that you can use as you like. When setting up your data output, use something like:
dataOutput.videoSettings = @{ (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange) };
You can then access the Y plane in your capture callback using:
CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
uint8_t *yPlane = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
... do stuff with your greyscale camera image ...
CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
The vImage method is to use vImageMatrixMultiply_Planar8 and a 1x3 matrix.
vImageConvert_RGBA8888toPlanar8 is the function you use to convert a RGBA8888 buffer into 4 planar buffers. These are used by vImageMatrixMultiply_Planar8. vImageMatrixMultiply_ARGB8888 will do it too in one pass, but your gray channel will be interleaved with three other channels in the result. vImageConvert_RGBA8888toPlanar8 itself doesn't do any math. All it does is separate your interleaved image into separate image planes.
If you need to adjust the gamma as well, then probably vImageConvert_AnyToAny() is the easy choice. It will do the fully color managed conversion from your RGB format to a grayscale colorspace. See vImage_Utilities.h.
I like Tark's answer better though. It just leaves you in a position of having to color manage the luminance manually (if you care).
Convert BGRA Image to Grayscale with Accelerate vImage
This method illustrates using Accelerate's vImage to convert BGRA images to grayscale. Your image may well be in RGBA format, in which case you'll need to adjust the matrix accordingly, but the camera outputs BGRA so I'm using it here. The values in the matrix are the same values OpenCV uses for cvtColor; there are other weightings you might try, such as luminosity. I assume you malloc the appropriate amount of memory for the result. For grayscale that is only one channel, or 1/4 of the memory used for BGRA. If anyone finds issues with this code, please leave a comment.
Performance note
Converting to grayscale in this way may NOT be the fastest. You should check the performance of any method in your environment. Brad Larson's GPUImage might be faster, or even OpenCV's cvtColor. In any case you will want to remove the calls to malloc and free for the intermediate buffers and manage them for the app lifecycle. Otherwise, the function call will be dominated by the malloc and free. Apple's docs recommend reusing the whole vImage_Buffer when possible.
You can also read about solving the same problem with NEON intrinsics.
Finally, the fastest method is not converting at all. If you're getting image data from the device camera, the camera's native format is kCVPixelFormatType_420YpCbCr8BiPlanarFullRange, meaning that grabbing the first plane's data (the Y channel, luma) is the fastest way to get grayscale.
BGRA to Grayscale
- (void)convertBGRAFrame:(const CLPBasicVideoFrame &)bgraFrame toGrayscale:(CLPBasicVideoFrame &)grayscaleFrame
{
vImage_Buffer bgraImageBuffer = {
.width = bgraFrame.width,
.height = bgraFrame.height,
.rowBytes = bgraFrame.bytesPerRow,
.data = bgraFrame.rawPixelData
};
void *intermediateBuffer = malloc(bgraFrame.totalBytes);
vImage_Buffer intermediateImageBuffer = {
.width = bgraFrame.width,
.height = bgraFrame.height,
.rowBytes = bgraFrame.bytesPerRow,
.data = intermediateBuffer
};
int32_t divisor = 256;
// int16_t a = (int16_t)roundf(1.0f * divisor);
int16_t r = (int16_t)roundf(0.299f * divisor);
int16_t g = (int16_t)roundf(0.587f * divisor);
int16_t b = (int16_t)roundf(0.114f * divisor);
const int16_t bgrToGray[4 * 4] = { b, 0, 0, 0,
g, 0, 0, 0,
r, 0, 0, 0,
0, 0, 0, 0 };
vImage_Error error;
error = vImageMatrixMultiply_ARGB8888(&bgraImageBuffer, &intermediateImageBuffer, bgrToGray, divisor, NULL, NULL, kvImageNoFlags);
if (error != kvImageNoError) {
NSLog(@"%s, vImage error %zd", __PRETTY_FUNCTION__, error);
}
vImage_Buffer grayscaleImageBuffer = {
.width = grayscaleFrame.width,
.height = grayscaleFrame.height,
.rowBytes = grayscaleFrame.bytesPerRow,
.data = grayscaleFrame.rawPixelData
};
void *scratchBuffer = malloc(grayscaleFrame.totalBytes);
vImage_Buffer scratchImageBuffer = {
.width = grayscaleFrame.width,
.height = grayscaleFrame.height,
.rowBytes = grayscaleFrame.bytesPerRow,
.data = scratchBuffer
};
error = vImageConvert_ARGB8888toPlanar8(&intermediateImageBuffer, &grayscaleImageBuffer, &scratchImageBuffer, &scratchImageBuffer, &scratchImageBuffer, kvImageNoFlags);
if (error != kvImageNoError) {
NSLog(@"%s, vImage error %zd", __PRETTY_FUNCTION__, error);
}
free(intermediateBuffer);
free(scratchBuffer);
}
CLPBasicVideoFrame.h - For reference
typedef struct
{
size_t width;
size_t height;
size_t bytesPerRow;
size_t totalBytes;
unsigned long pixelFormat;
void *rawPixelData;
} CLPBasicVideoFrame;
I got through the grayscale conversion, but was having trouble with the quality when I found this book on the web called Instant OpenCV for iOS. I personally picked up a copy and it has a number of gems, although the code is bit of a mess. On the bright-side it is a very reasonably priced eBook.
I'm very curious about that matrix. I toyed around with it for hours trying to figure out what the arrangement should be. I would have thought the values should be on the diagonal, but the Instant OpenCV guys put it as above.
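One way to make sense of the layout is a scalar model of the multiply. The sketch below is plain C++, not vImage itself, and it assumes a row-vector convention: each destination channel j is the sum over source channels i of src[i] * matrix[i * 4 + j], divided by the divisor. That reading is an assumption, though it is consistent with the code above, where the coefficients sit in the first column and the gray value is then taken from the first channel. `matrixMultiplyPixel` is a hypothetical name, and rounding and bias handling are simplified to truncating division.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Scalar model of a 4-channel interleaved matrix multiply (assumed convention):
// dst[j] = (sum over i of src[i] * matrix[i * 4 + j]) / divisor, clamped to 0..255.
std::array<std::uint8_t, 4> matrixMultiplyPixel(const std::array<std::uint8_t, 4>& src,
                                                const std::array<std::int16_t, 16>& matrix,
                                                std::int32_t divisor) {
    std::array<std::uint8_t, 4> dst{};
    for (int j = 0; j < 4; ++j) {
        std::int32_t sum = 0;
        for (int i = 0; i < 4; ++i) {
            sum += src[i] * matrix[i * 4 + j];
        }
        sum /= divisor;  // simplification: truncating division, no bias terms
        dst[j] = static_cast<std::uint8_t>(
            std::max<std::int32_t>(0, std::min<std::int32_t>(sum, 255)));
    }
    return dst;
}
```

Under this model, coefficients in the first column sum B, G, and R into destination channel 0, which is the weighted gray value. Coefficients on the diagonal would only scale each channel independently and never mix them.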
If you need to use BGRA video streams, you can use this excellent conversion
here
This is the function you'll need to take:
void neon_convert (uint8_t * __restrict dest, uint8_t * __restrict src, int numPixels)
{
int i;
uint8x8_t rfac = vdup_n_u8 (77);
uint8x8_t gfac = vdup_n_u8 (151);
uint8x8_t bfac = vdup_n_u8 (28);
int n = numPixels / 8;
// Convert per eight pixels
for (i=0; i < n; ++i)
{
uint16x8_t temp;
uint8x8x4_t rgb = vld4_u8 (src);
uint8x8_t result;
temp = vmull_u8 (rgb.val[0], bfac);
temp = vmlal_u8 (temp,rgb.val[1], gfac);
temp = vmlal_u8 (temp,rgb.val[2], rfac);
result = vshrn_n_u16 (temp, 8);
vst1_u8 (dest, result);
src += 8*4;
dest += 8;
}
}
more optimisations (using assembly) are in the link
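The NEON loop above handles pixels in groups of eight, so any remaining numPixels % 8 pixels are left unconverted. A minimal scalar sketch that uses the same weights (28, 151, 77, which sum to 256) could cover that tail or serve as a reference for testing. `scalar_convert` is a hypothetical name, and BGRA byte order is assumed as in the function above.

```cpp
#include <cstdint>

// Scalar counterpart of the NEON loop: converts numPixels BGRA pixels to
// 8-bit gray using fixed-point weights 28 (B), 151 (G), 77 (R) and a shift by 8.
void scalar_convert(std::uint8_t* dest, const std::uint8_t* src, int numPixels) {
    for (int i = 0; i < numPixels; ++i) {
        const unsigned b = src[0];
        const unsigned g = src[1];
        const unsigned r = src[2];
        *dest++ = static_cast<std::uint8_t>((b * 28u + g * 151u + r * 77u) >> 8);
        src += 4;  // step over the alpha byte to the next pixel
    }
}
```

To finish a frame after neon_convert, this could be called with src advanced by n * 8 * 4 bytes, dest advanced by n * 8 bytes, and a count of numPixels % 8, where n is numPixels / 8.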
(1) My experience with the iOS camera framework has been with images in the kCMPixelFormat_32BGRA format, which is compatible with the ARGB8888 family of functions. (It may be possible to use other formats as well.)
(2) The simplest way to convert from BGR to grayscale on iOS is to use vImageMatrixMultiply_ARGB8888ToPlanar8():
https://developer.apple.com/documentation/accelerate/1546979-vimagematrixmultiply_argb8888top
Here is a fairly complete example written in Swift. I'm assuming the Objective-C code would be similar.
guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
// TODO: report error
return
}
// Lock the image buffer
if (kCVReturnSuccess != CVPixelBufferLockBaseAddress(imageBuffer, CVPixelBufferLockFlags.readOnly)) {
// TODO: report error
return
}
defer {
CVPixelBufferUnlockBaseAddress(imageBuffer, CVPixelBufferLockFlags.readOnly)
}
// Create input vImage_Buffer
let baseAddress = CVPixelBufferGetBaseAddress(imageBuffer)
let width = CVPixelBufferGetWidth(imageBuffer)
let height = CVPixelBufferGetHeight(imageBuffer)
let stride = CVPixelBufferGetBytesPerRow(imageBuffer)
var inImage = vImage_Buffer(data: baseAddress, height: UInt(height), width: UInt(width), rowBytes: stride)
// Create output vImage_Buffer
let bitmap = malloc(width * height)
var outImage = vImage_Buffer(data: bitmap, height: UInt(height), width: UInt(width), rowBytes: width)
defer {
// Make sure to free unless the caller is responsible for this
free(bitmap)
}
// Arbitrary divisor to scale coefficients to integer values
let divisor: Int32 = 0x1000
let fDivisor = Float(divisor)
// Rec.709 coefficients
var coefficientsMatrix = [
Int16(0.0722 * fDivisor), // blue
Int16(0.7152 * fDivisor), // green
Int16(0.2126 * fDivisor), // red
0 // alpha
]
// Convert to greyscale
if (kvImageNoError != vImageMatrixMultiply_ARGB8888ToPlanar8(
&inImage, &outImage, &coefficientsMatrix, divisor, nil, 0, vImage_Flags(kvImageNoFlags))) {
// TODO: report error
return
}
The code above was inspired by a tutorial from Apple on grayscale conversion, which can be found at the following link. It also includes conversion to a CGImage if that is needed. Note that they assume RGB order instead of BGR, and they only provide 3 coefficients instead of 4 (a mistake?)
https://developer.apple.com/documentation/accelerate/vimage/converting_color_images_to_grayscale

Actionscript to Obj-C / Cocoa: How to represent BitmapData?

I'm trying to port the as3delaunay library to Obj-C (for fun and to learn some more Obj-C). It's going pretty well, but I don't really understand how to convert uses of BitmapData to Cocoa. Here are several of the relevant parts of the aforementioned library:
In Edge.as:
internal function makeDelaunayLineBmp():BitmapData
{
var p0:Point = leftSite.coord;
var p1:Point = rightSite.coord;
GRAPHICS.clear();
// clear() resets line style back to undefined!
GRAPHICS.lineStyle(0, 0, 1.0, false, LineScaleMode.NONE, CapsStyle.NONE);
GRAPHICS.moveTo(p0.x, p0.y);
GRAPHICS.lineTo(p1.x, p1.y);
var w:int = int(Math.ceil(Math.max(p0.x, p1.x)));
if (w < 1)
{
w = 1;
}
var h:int = int(Math.ceil(Math.max(p0.y, p1.y)));
if (h < 1)
{
h = 1;
}
var bmp:BitmapData = new BitmapData(w, h, true, 0);
bmp.draw(LINESPRITE);
return bmp;
}
That is called in the following function, in selectNonIntersectingEdges.as:
internal function selectNonIntersectingEdges(keepOutMask:BitmapData, edgesToTest:Vector.<Edge>):Vector.<Edge>
{
if (keepOutMask == null)
{
return edgesToTest;
}
var zeroPoint:Point = new Point();
return edgesToTest.filter(myTest);
function myTest(edge:Edge, index:int, vector:Vector.<Edge>):Boolean
{
var delaunayLineBmp:BitmapData = edge.makeDelaunayLineBmp();
var notIntersecting:Boolean = !(keepOutMask.hitTest(zeroPoint, 1, delaunayLineBmp, zeroPoint, 1));
delaunayLineBmp.dispose();
return notIntersecting;
}
}
It also appears as a parameter in SiteList.as:
public function nearestSitePoint(proximityMap:BitmapData, x:Number, y:Number):Point
{
var index:uint = proximityMap.getPixel(x, y);
if (index > _sites.length - 1)
{
return null;
}
return _sites[index].coord;
}
What would be a good way to represent this behavior and/or the use of BitmapData in Obj-C / Cocoa?
The Core Graphics library has very powerful graphics capabilities -- even though it's all C based. You can create an 8-bit RGBA (red, green, blue, alpha) bitmap context like this:
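// width and height are the bitmap dimensions in pixels
size_t const bytesPerRow = 4 * width; // 4 bytes per RGBA pixel
CGColorSpaceRef colorspace = CGColorSpaceCreateDeviceRGB();
CGBitmapInfo bitmapInfo = (CGBitmapInfo)kCGImageAlphaPremultipliedLast;
// Passing NULL lets Core Graphics allocate the pixel buffer
CGContextRef ctx = CGBitmapContextCreate(NULL, width, height, 8, bytesPerRow, colorspace, bitmapInfo);
CGColorSpaceRelease(colorspace);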
This roughly corresponds to a transparent AS3 BitmapData: 32 bits per pixel with an alpha channel, although the channel order in memory here is RGBA.
ctx now holds a reference to the bitmap context, which you must eventually release with CGContextRelease(ctx).
You can draw into the context with the various CGContext* functions. If you eventually need it as an image (e.g. to write JPEG data), use CGBitmapContextCreateImage() to get a CGImageRef.
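Core Graphics has no direct counterpart to hitTest() or getPixel(), so those would have to be written against the raw pixel buffer (which CGBitmapContextGetData() returns for a bitmap context). The following is only a platform-independent sketch of that idea in C++: RGBABitmap, alphaHitTest and pixelRGB are hypothetical names, the buffer is assumed to be tightly packed RGBA with both bitmaps anchored at the same origin (as with zeroPoint in selectNonIntersectingEdges), and pixelRGB assumes opaque pixels so that premultiplied and straight color values coincide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bitmap context's pixel buffer:
// tightly packed RGBA, 8 bits per channel, row-major.
struct RGBABitmap {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> pixels; // width * height * 4 bytes, initially transparent

    RGBABitmap(std::size_t w, std::size_t h)
        : width(w), height(h), pixels(w * h * 4, 0) {}

    std::uint8_t alphaAt(std::size_t x, std::size_t y) const {
        assert(x < width && y < height);
        return pixels[(y * width + x) * 4 + 3];
    }

    void setPixel(std::size_t x, std::size_t y,
                  std::uint8_t r, std::uint8_t g, std::uint8_t b, std::uint8_t a) {
        assert(x < width && y < height);
        std::uint8_t* p = &pixels[(y * width + x) * 4];
        p[0] = r; p[1] = g; p[2] = b; p[3] = a;
    }
};

// Rough analogue of keepOutMask.hitTest(zeroPoint, 1, other, zeroPoint, 1):
// true if some pixel reaches the alpha threshold in BOTH bitmaps.
bool alphaHitTest(const RGBABitmap& a, const RGBABitmap& b,
                  std::uint8_t threshold = 1) {
    const std::size_t w = std::min(a.width, b.width);
    const std::size_t h = std::min(a.height, b.height);
    for (std::size_t y = 0; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            if (a.alphaAt(x, y) >= threshold && b.alphaAt(x, y) >= threshold) {
                return true;
            }
        }
    }
    return false;
}

// Rough analogue of getPixel(x, y): packs the color as 0xRRGGBB.
std::uint32_t pixelRGB(const RGBABitmap& bmp, std::size_t x, std::size_t y) {
    assert(x < bmp.width && y < bmp.height);
    const std::uint8_t* p = &bmp.pixels[(y * bmp.width + x) * 4];
    return (std::uint32_t(p[0]) << 16) | (std::uint32_t(p[1]) << 8) | std::uint32_t(p[2]);
}
```

In a port, the edge line would be stroked into a small bitmap context and its buffer compared against the keep-out mask in this way; nearestSitePoint would read the site index with something like pixelRGB.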
It looks like you need a CGBitmapContext in place of the BitmapData type.
Things to look into:
Quartz 2D Programming Guide: read the "Graphics Context" section, especially the "Creating a Bitmap Graphics Context" part.
CGBitmapContext reference

GLSL shader generation of normals

Hi, I am writing a 3D modeling app and I want to speed up rendering in OpenGL. Currently I use glBegin/glEnd, which is slow and deprecated. I need to draw flat-shaded models very fast. I generate normals on the CPU every single frame, which is very slow. I tried to use glDrawElements with indexed geometry, but normal generation is a problem there, because normals are specified per vertex, not per triangle.
Another idea was to use GLSL to generate normals on the GPU in a geometry shader. I wrote this code for normal generation:
#version 120
#extension GL_EXT_geometry_shader4 : enable
vec3 NormalFromTriangleVertices(vec3 triangleVertices[3])
{
    // now is same as RedBook (OpenGL Programming Guide)
    vec3 u = triangleVertices[0] - triangleVertices[1];
    vec3 v = triangleVertices[1] - triangleVertices[2];
    return cross(v, u);
}
void main()
{
    // no change of position
    // computes normal from input triangle and front color for that triangle
    vec3 triangleVertices[3];
    vec3 computedNormal;
    vec3 normal, lightDir;
    vec4 diffuse;
    float NdotL;
    vec4 finalColor;
    for(int i = 0; i < gl_VerticesIn; i += 3)
    {
        for (int j = 0; j < 3; j++)
        {
            triangleVertices[j] = gl_PositionIn[i + j].xyz;
        }
        computedNormal = NormalFromTriangleVertices(triangleVertices);
        normal = normalize(gl_NormalMatrix * computedNormal);
        // hardcoded light direction
        vec4 light = gl_ModelViewMatrix * vec4(0.0, 0.0, 1.0, 0.0);
        lightDir = normalize(light.xyz);
        NdotL = max(dot(normal, lightDir), 0.0);
        // hardcoded
        diffuse = vec4(0.5, 0.5, 0.9, 1.0);
        finalColor = NdotL * diffuse;
        finalColor.a = 1.0; // final color ignores everything, except lighting
        for (int j = 0; j < 3; j++)
        {
            gl_FrontColor = finalColor;
            gl_Position = gl_PositionIn[i + j];
            EmitVertex();
        }
    }
    EndPrimitive();
}
When I integrated the shaders into my application, there was no speed improvement. It was actually slower than before. I am new to GLSL and shaders in general, so I don't know what I did wrong.
I tried this code on a MacBook with a GeForce 9400M.
To be clearer, this is the code I want to replace:
- (void)drawAsCommandsWithScale:(Vector3D)scale
{
    float frontDiffuse[4] = { 0.4, 0.4, 0.4, 1 };
    CGFloat components[4];
    [color getComponents:components];
    float backDiffuse[4];
    float selectedDiffuse[4] = { 1.0f, 0.0f, 0.0f, 1 };
    for (uint i = 0; i < 4; i++)
        backDiffuse[i] = components[i];
    glMaterialfv(GL_BACK, GL_DIFFUSE, backDiffuse);
    glMaterialfv(GL_FRONT, GL_DIFFUSE, frontDiffuse);
    Vector3D triangleVertices[3];
    float *lastDiffuse = frontDiffuse;
    BOOL flip = scale.x < 0.0f || scale.y < 0.0f || scale.z < 0.0f;
    glBegin(GL_TRIANGLES);
    for (uint i = 0; i < triangles->size(); i++)
    {
        if (selectionMode == MeshSelectionModeTriangles)
        {
            if (selected->at(i))
            {
                if (lastDiffuse == frontDiffuse)
                {
                    glMaterialfv(GL_FRONT_AND_BACK, GL_DIFFUSE, selectedDiffuse);
                    lastDiffuse = selectedDiffuse;
                }
            }
            else if (lastDiffuse == selectedDiffuse)
            {
                glMaterialfv(GL_BACK, GL_DIFFUSE, backDiffuse);
                glMaterialfv(GL_FRONT, GL_DIFFUSE, frontDiffuse);
                lastDiffuse = frontDiffuse;
            }
        }
        Triangle currentTriangle = [self triangleAtIndex:i];
        if (flip)
            currentTriangle = FlipTriangle(currentTriangle);
        [self getTriangleVertices:triangleVertices fromTriangle:currentTriangle];
        for (uint j = 0; j < 3; j++)
        {
            for (uint k = 0; k < 3; k++)
            {
                triangleVertices[j][k] *= scale[k];
            }
        }
        Vector3D n = NormalFromTriangleVertices(triangleVertices);
        n.Normalize();
        for (uint j = 0; j < 3; j++)
        {
            glNormal3f(n.x, n.y, n.z);
            glVertex3f(triangleVertices[j].x, triangleVertices[j].y, triangleVertices[j].z);
        }
    }
    glEnd();
}
As you can see, it is very inefficient, but it works. triangles is an array of indices into the vertices array.
I tried to use the following code for drawing, but I can only have one index array, not two (one for vertices and a second for normals).
glEnableClientState(GL_VERTEX_ARRAY);
uint *trianglePtr = (uint *)(&(*triangles)[0]);
float *vertexPtr = (float *)(&(*vertices)[0]);
glVertexPointer(3, GL_FLOAT, 0, vertexPtr);
glDrawElements(GL_TRIANGLES, triangles->size() * 3, GL_UNSIGNED_INT, trianglePtr);
glDisableClientState(GL_VERTEX_ARRAY);
Now, how can I specify a pointer to the normals when some vertices are shared by different triangles and therefore need different normals?
I finally managed to increase rendering speed. I now recalculate normals on the CPU only when vertices or triangles change, which happens only while editing a single mesh, not the whole scene.
It is not the solution I wanted, but in practice it is better than the previous approaches.
I cache the whole geometry in separate normal and vertex arrays. Indexed drawing cannot be used because I want flat shading (a similar problem to smoothing groups in 3ds Max).
I use a simple glDrawArrays call and a vertex shader for lighting, because in triangle selection mode I want one color for selected triangles and another for unselected ones, and there is no array of materials (I didn't find one).
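The "recalculate only when something changes" idea can be sketched with a dirty flag. This is a minimal illustration, not the code from the application: CachedMesh, setGeometry, moveVertex, faceNormals and rebuildCount are hypothetical names, and the normals are left unnormalized for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical mesh wrapper: every edit marks the cache dirty, and the
// per-triangle normals are rebuilt lazily the next time they are requested.
class CachedMesh {
public:
    void setGeometry(std::vector<Vec3> vertices, std::vector<unsigned> indices) {
        assert(indices.size() % 3 == 0);
        vertices_ = std::move(vertices);
        indices_ = std::move(indices);
        dirty_ = true;
    }

    void moveVertex(std::size_t i, Vec3 p) {
        assert(i < vertices_.size());
        vertices_[i] = p;
        dirty_ = true;
    }

    // Called from the draw path; recomputes only after an edit.
    const std::vector<Vec3>& faceNormals() {
        if (dirty_) {
            rebuild();
            dirty_ = false;
        }
        return normals_;
    }

    int rebuildCount() const { return rebuilds_; }

private:
    void rebuild() {
        normals_.clear();
        for (std::size_t i = 0; i + 2 < indices_.size(); i += 3) {
            const Vec3& a = vertices_[indices_[i]];
            const Vec3& b = vertices_[indices_[i + 1]];
            const Vec3& c = vertices_[indices_[i + 2]];
            const Vec3 e1{b.x - a.x, b.y - a.y, b.z - a.z};
            const Vec3 e2{c.x - a.x, c.y - a.y, c.z - a.z};
            // Unnormalized face normal: e1 x e2 (counter-clockwise winding assumed).
            normals_.push_back(Vec3{e1.y * e2.z - e1.z * e2.y,
                                    e1.z * e2.x - e1.x * e2.z,
                                    e1.x * e2.y - e1.y * e2.x});
        }
        ++rebuilds_;
    }

    std::vector<Vec3> vertices_;
    std::vector<unsigned> indices_;
    std::vector<Vec3> normals_;
    bool dirty_ = true;
    int rebuilds_ = 0;
};
```

Drawing many frames in a row then costs one rebuild per edit rather than one per frame.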
You wouldn't usually calculate the normals every frame, only when the geometry changes.
And to have one normal per triangle just set the same normal for each vertex in the triangle. That does mean you can't share vertices between adjacent triangles in your mesh but that's not unusual at all in this kind of thing.
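As a rough sketch of that approach, the indexed mesh can be expanded into unshared vertices so that all three corners of a triangle carry the same face normal. The names here (Vec3, FlatMesh, buildFlatMesh) are made up for illustration, and counter-clockwise winding is assumed; flip the cross product if your mesh uses the opposite convention.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 subtract(const Vec3& a, const Vec3& b) {
    return Vec3{a.x - b.x, a.y - b.y, a.z - b.z};
}

static Vec3 cross(const Vec3& a, const Vec3& b) {
    return Vec3{a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x};
}

static Vec3 normalized(const Vec3& v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (len == 0.0f) return v; // degenerate triangle: leave the zero vector
    return Vec3{v.x / len, v.y / len, v.z / len};
}

// Non-indexed geometry: three entries per triangle in both arrays.
struct FlatMesh {
    std::vector<Vec3> positions;
    std::vector<Vec3> normals;
};

// Duplicates shared vertices so every triangle owns its three corners,
// then assigns the same face normal to all three.
FlatMesh buildFlatMesh(const std::vector<Vec3>& vertices,
                       const std::vector<unsigned>& indices) {
    assert(indices.size() % 3 == 0);
    FlatMesh out;
    out.positions.reserve(indices.size());
    out.normals.reserve(indices.size());
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        const Vec3 corners[3] = {vertices[indices[i]],
                                 vertices[indices[i + 1]],
                                 vertices[indices[i + 2]]};
        const Vec3 n = normalized(cross(subtract(corners[1], corners[0]),
                                        subtract(corners[2], corners[0])));
        for (const Vec3& p : corners) {
            out.positions.push_back(p);
            out.normals.push_back(n);
        }
    }
    return out;
}
```

With arrays like these, the drawing code could presumably enable GL_VERTEX_ARRAY and GL_NORMAL_ARRAY, pass positions.data() to glVertexPointer and normals.data() to glNormalPointer, and call glDrawArrays(GL_TRIANGLES, 0, positions.size()) with no index array at all.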
Your question reminds me of this "Normals without Normals" blog post.