Simple algorithm for tracking a rectangular blob

I have created an experimental fast rectangular object tracking system; it will be used for head tracking and controlling objects in a 3D engine (Ogre3D).
For now I am able to show the webcam any kind of brightly colored rectangle (text markers are good objects) and the system registers basic properties of this object (hue/value/lightness and initial width and height at 0 degrees rotation).
After I have registered the trackable object, I do some simple frame processing to create a grayscale probability map.
So now I have 2 known things:
1) the 4 corners of the last object position (it's always a rectangle, but it may be rotated)
2) a pretty rectangular (but still far from perfect) blob which is the brightest region in the frame. I can get the coordinates of any point of the blob without problems; point detection is stable enough.
I can find a bounding rectangle of the object without problems, but I have trouble detecting the object's corners themselves.
I need the simplest possible (quick & dirty would be great) algorithm to scan the image starting from some known coordinates (a point inside the blob) and detect the 4 new x,y coordinates of the "blobish" rectangle's corners (not the corners of a bounding box, but the corners of the rectangular blob itself).
A ready-to-use C++ function would be awesome, but somehow Google doesn't like me today :(
I think it would be overkill to use some complicated function from the OpenCV library just to extract 4 points of a single rectangular blob. But if you know a quick and efficient way to do it using OpenCV (it must be real-time and light on the CPU because I'll be running the 3D engine at the same time) then I would be really grateful.

You can apply a Hough transform to the segmented image to detect lines. Using the detected lines you can calculate their intersections to find the corner coordinates of the blob.
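For reference, a rough sketch of that approach with OpenCV in C++ could look like the following. It assumes the probability map has already been thresholded and edge-detected into a binary image (for example with cv::Canny); the Hough accumulator threshold is a made-up starting value and the intersection helper is hand-rolled, so treat this as a sketch rather than a drop-in function:
#include <cmath>
#include <vector>
#include <opencv2/imgproc.hpp>

// Intersection of two lines given in (rho, theta) form, as returned by cv::HoughLines.
static cv::Point2f intersect(const cv::Vec2f &a, const cv::Vec2f &b)
{
    float r1 = a[0], t1 = a[1], r2 = b[0], t2 = b[1];
    // Solve [cos t1, sin t1; cos t2, sin t2] * [x, y]^T = [r1, r2]^T
    float det = std::cos(t1) * std::sin(t2) - std::sin(t1) * std::cos(t2);
    float x = (std::sin(t2) * r1 - std::sin(t1) * r2) / det;
    float y = (std::cos(t1) * r2 - std::cos(t2) * r1) / det;
    return cv::Point2f(x, y);
}

// Finds approximate corners of the rectangular blob in a binary edge image.
std::vector<cv::Point2f> findBlobCorners(const cv::Mat &edges)
{
    std::vector<cv::Vec2f> lines;
    // Tune the accumulator threshold (here 80) so only the four dominant edges survive.
    cv::HoughLines(edges, lines, 1, CV_PI / 180, 80);

    std::vector<cv::Point2f> corners;
    for (size_t i = 0; i < lines.size(); ++i)
        for (size_t j = i + 1; j < lines.size(); ++j)
        {
            // Skip near-parallel pairs (these are the opposite sides of the rectangle).
            float dt = std::fabs(lines[i][1] - lines[j][1]);
            if (dt < 0.3f || dt > CV_PI - 0.3f)
                continue;
            corners.push_back(intersect(lines[i], lines[j]));
        }
    return corners; // ideally 4 points; cluster or filter if more lines were detected
}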

Related

GODOT: What is an efficient calculation for the AABB of a simple 3D model from a camera's view

I am attempting to come up with a quick and efficient means of translating a 3D mesh into a projected AABB. In the end, I would like to accomplish something similar to figure 1, wherein only the area of the screen covered by the cube is located inside the bounding box highlighted in red. (If it is at all possible, getting the area as small as possible, highlighted in blue, would increase efficiency down the road.)
Figure 1. https://i.imgur.com/pd0E20C.png
Currently, I have tried:
Calculating the point position on the screen using camera.unproject_position(). This failed largely due to my inability to wrap my head around the pixel positions trending towards infinity (see the sketch after this list). I understand it has something to do with tan, but frankly, it is too late for my brain to function anymore.
Getting the area of collision between the view frustum and the AABB of the mesh instance. This method seems convoluted, and to get it in a usable format I would need to project the result into 2d coordinates again.
Using the MeshInstance VisualInstance to create a texture wherein a pixel is white if it contains the mesh instance, and black otherwise. Visual instances in general just baffle me, and I did not think it would be efficient to have another viewport just to output this texture.
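For what it's worth, the first approach (projecting the corner points) can be made to work once points behind the camera are handled before the perspective divide; that is where the "trending towards infinity" comes from. Below is a generic sketch of that idea, written here in C++ with GLM rather than GDScript; the clamp on w is a crude stand-in for proper near-plane clipping, and all names are illustrative assumptions rather than Godot API:
#include <algorithm>
#include <glm/glm.hpp>

struct Rect2D { glm::vec2 min, max; };

// Projects the 8 corners of a world-space AABB and returns their 2D bounding box in pixels.
Rect2D projectAabb(const glm::vec3 corners[8], const glm::mat4 &viewProj,
                   float screenW, float screenH)
{
    Rect2D r{ { screenW, screenH }, { 0.0f, 0.0f } };
    for (int i = 0; i < 8; ++i) {
        glm::vec4 clip = viewProj * glm::vec4(corners[i], 1.0f);
        // A corner behind the near plane would project towards +/- infinity;
        // clamping w is a rough substitute for clipping the box against the near plane.
        clip.w = std::max(clip.w, 1e-4f);
        glm::vec2 ndc = glm::vec2(clip) / clip.w;
        glm::vec2 px = { (ndc.x * 0.5f + 0.5f) * screenW,
                         (1.0f - (ndc.y * 0.5f + 0.5f)) * screenH };
        r.min = glm::min(r.min, px);
        r.max = glm::max(r.max, px);
    }
    return r;
}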
What I am looking for:
An output that can be passed to a shader telling it where to perform certain calculations. Right now this is set up to use a bounding box, but it could easily be rewritten to also use a texture. It could also be rewritten to use polygons, but I am trying to keep calculations in the shader to a minimum.
Certain solutions I have tried before have worked, slightly, but this must be robust. The camera interfacing with the 3d object will be able to move completely around and through it, meaning at times the view will be completely surrounded by the 3d model with points both in front, and behind.
Thank you for any help you can provide.
I will try my best to update this post with information if needed.

Rendering line art with constant screen width

I have a line art texture applied to an object in 3D space. The default behavior is for the object and the texture to receive perspective scaling based on the perspective model view projection matrix. Is there any established technique to keep the positioning and scaling of the 3D object, while keeping the line width constant relative to the screen? The desired effect is as though a pen (fixed screen width) were used to trace a path on the 3D object.
Would something like SDF-based font rendering help?
Or maybe some kind of projective texture mapping?
Or render the object and texture to a buffer and expand the lines using edge detection?
Unfortunately, I'm using OGL ES 2, so I can't use a geom shader or anything like that.
The solution I came up with is inspired by procedural SDF generation, as @Felipe suggested, combined with Chris Green's Improved Alpha-Tested Magnification for Vector Textures and Special Effects.
Basically I hand draw shapes into textures using pure red, green, and blue. Then I render the scene using those textures, and generate an SDF on the fly in a second render pass. The SDF generation uses Green's algorithm with a small spread to improve performance. The SDF is then passed to a final render pass that thresholds and antialiases the SDF per Green's approach, using fwidth to maintain a constant line weight regardless of the distance of the object to the camera.
Since the original question was just for the approach/concept, I'm not posting an example at the moment. But I'll see if I can put together a shadertoy sometime soon.
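For the curious, the threshold-and-antialias step boils down to a couple of lines. The sketch below writes it as plain C++ for illustration only; in the actual fragment shader the screen-space derivative would come from fwidth() on the sampled distance, and the names are assumptions rather than code from the project described above:
#include <algorithm>
#include <cmath>

// dist:        sampled SDF value, 0.5 at the shape edge (per Green's encoding)
// ddistPerPx:  how much `dist` changes over one screen pixel (fwidth in the shader)
// halfWidthPx: desired half line width, in screen pixels
float lineCoverage(float dist, float ddistPerPx, float halfWidthPx)
{
    // Convert the distance from the line centre into pixels, subtract the desired
    // half width, and fade over roughly one pixel for antialiasing.
    float pixelsFromEdge = std::fabs(dist - 0.5f) / ddistPerPx - halfWidthPx;
    return std::clamp(0.5f - pixelsFromEdge, 0.0f, 1.0f);
}
Because everything is expressed in screen pixels, the stroke keeps the same apparent width regardless of how far the object is from the camera.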
You could create the texture procedurally in a fragment shader and use the size of a pixel for interpolations.
See:
FabriceNeyret's blog

How to detect an image between shapes from camera

I've been searching around the web about how to do this and I know that it needs to be done with OpenCV. The problem is that all the tutorials and examples that I find are for separated shapes detection or template matching.
What I need is a way to detect the contents between 3 circles (which can be a photo or something else). From what I've searched, it's not too difficult to find the circles with the camera using contours, but how do I extract what is between them? The circles work like a pattern on the image to grab what is "inside the pattern".
Do I need to use the contours of each circle and measure the distance between them to grab my contents? If so, what if the image is a bit rotated/distorted on the camera?
I'm using Xamarin.iOS for this, but from what I've already seen, I believe I need to go native for this, and any Objective-C example is welcome too.
EDIT
Imagining that the image captured by the camera is this:
What I want is to match the 3 circles and get the following part of the image as result:
Since the images come from the camera, they can be rotated or scaled up/down.
The warpAffine function will let you map the desired area of the source image to a destination image, performing cropping, rotation and scaling in a single go.
Talking about rotation and scaling seems to indicate that you want to extract a rectangle of a given aspect ratio, hence perform a similarity transform. To define such a transform, three points are too many; two suffice. The construction of the affine matrix is a little tricky.
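Here is a rough illustration of that construction in C++ with OpenCV. It assumes you have already detected two reference points (for example two of the circle centres) in the camera image and know where they should land in the extracted image; the destination points and size are made-up values you would choose for your layout:
#include <opencv2/imgproc.hpp>

// Build a similarity transform (rotation + uniform scale + translation) from two
// point correspondences and warp the photo region out of the camera frame.
cv::Mat extractRegion(const cv::Mat &src,
                      cv::Point2f srcA, cv::Point2f srcB,   // detected circle centres
                      cv::Point2f dstA, cv::Point2f dstB,   // where they should land
                      cv::Size dstSize)
{
    // A similarity maps (x, y) to (a*x - b*y + tx, b*x + a*y + ty).
    // Two correspondences give four equations for the four unknowns a, b, tx, ty.
    cv::Point2f d1 = srcB - srcA, d2 = dstB - dstA;
    float len2 = d1.x * d1.x + d1.y * d1.y;
    float a = (d1.x * d2.x + d1.y * d2.y) / len2;   // scale * cos(theta)
    float b = (d1.x * d2.y - d1.y * d2.x) / len2;   // scale * sin(theta)
    float tx = dstA.x - (a * srcA.x - b * srcA.y);
    float ty = dstA.y - (b * srcA.x + a * srcA.y);

    cv::Mat M = (cv::Mat_<double>(2, 3) << a, -b, tx,
                                           b,  a, ty);
    cv::Mat dst;
    cv::warpAffine(src, dst, M, dstSize);
    return dst;
}
If you are on a recent OpenCV, cv::estimateAffinePartial2D can fit this kind of 4-degree-of-freedom transform for you, which is handy when you have more than two (possibly noisy) correspondences.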

How do I rotate an OpenGL view relative to the center of the view as opposed to the center of the object being displayed?

I'm working on a fork of Pleasant3D.
When rotating an object being displayed, the object always rotates around the same point relative to itself, even if that point is not at the center of the view (e.g. because the user has panned to move the object in the view).
I would like to change this so that the view always rotates the object around the point at the center of the view as it appears to the user instead of the center of the object.
Here is the core of the current code that rotates the object around its center (slightly simplified) (from here):
glLoadIdentity();
// midPlatform is the offset to reach the "middle" of the object (or more specifically the platform on which the object sits) in the x/y dimension.
// This is the point around which the view is currently rotated.
Vector3 *midPlatform = [self.currentMachine calcMidBuildPlatform];
glTranslatef((GLfloat)cameraTranslateX - midPlatform.x,
             (GLfloat)cameraTranslateY - midPlatform.y,
             (GLfloat)cameraOffset);
// trackBallRotation and worldRotation come from trackball.h/c which appears to be
// from an Apple OpenGL sample.
if (trackBallRotation[0] != 0.0f) {
    glRotatef(trackBallRotation[0], trackBallRotation[1], trackBallRotation[2], trackBallRotation[3]);
}
// accumulated world rotation via trackball
glRotatef(worldRotation[0], worldRotation[1], worldRotation[2], worldRotation[3]);
glTranslatef(midPlatform.x, midPlatform.y, 0.);
// Now draw object...
What transformations do I need to apply in what order to get the effect I desire?
Some of what I've tried so far
As I understand it this is what the current code does:
"OpenGL performs matrices multiplications in reverse order if multiple transforms are applied to a vertex" (from here). This means that the first transformation to be applied is actually the last one in the code above. It moves the center of the view (0,0) to the center of the object.
This point is then used as the center of rotation for the next two transformations (the rotations).
Finally the midPlatform translation is done in reverse to move the center back to the original location, and the XY translations (panning) done by the user are applied. Here also the "camera" is moved away from the object to the proper location (indicated by cameraOffset).
This seems straightforward enough. So what I need to change is instead of translating the center of the view to the center of the object (midPlatform) I need to translate it to the current center of the view as seen by the user, right?
Unfortunately this is where the transformations start affecting each other in interesting ways and I am running into trouble.
I tried changing the code to this:
glLoadIdentity();
glTranslatef(0,
             0,
             (GLfloat)cameraOffset);
if (trackBallRotation[0] != 0.0f) {
    glRotatef(trackBallRotation[0], trackBallRotation[1], trackBallRotation[2], trackBallRotation[3]);
}
// accumulated world rotation via trackball
glRotatef(worldRotation[0], worldRotation[1], worldRotation[2], worldRotation[3]);
glTranslatef(cameraTranslateX, cameraTranslateY, 0.);
In other words, I translate the center of the view to the previous center, rotate around that, and then apply the camera offset to move the camera away to the proper position. This makes the rotation behave exactly the way I want it to, but it introduces a new issue. Now any panning done by the user is relative to the object. For example if the object is rotated so that the camera is looking along the X axis end-on, if the user pans left to right the object appears to be moving closer/further from the user instead of left or right.
I think I can understand why this is (XY camera translations being applied before rotation), and I think what I need to do is figure out a way to cancel out the translation from before the rotation after the rotation (to avoid the weird panning effect), and then do another translation which translates relative to the viewer (eye coordinate space) instead of the object (object coordinate space), but I'm not sure exactly how to do this.
I found what I think are some clues in the OpenGL FAQ (http://www.opengl.org/resources/faq/technical/transformations.htm), for example:
9.070 How do I transform my objects around a fixed coordinate system rather than the object's local coordinate system?
If you rotate an object around its Y-axis, you'll find that the X- and Z-axes rotate with the object. A subsequent rotation around one of these axes rotates around the newly transformed axis and not the original axis. It's often desirable to perform transformations in a fixed coordinate system rather than the object’s local coordinate system.
The root cause of the problem is that OpenGL matrix operations postmultiply onto the matrix stack, thus causing transformations to occur in object space. To affect screen space transformations, you need to premultiply. OpenGL doesn't provide a mode switch for the order of matrix multiplication, so you need to premultiply by hand. An application might implement this by retrieving the current matrix after each frame. The application multiplies new transformations for the next frame on top of an identity matrix and multiplies the accumulated current transformations (from the last frame) onto those transformations using glMultMatrix().
You need to be aware that retrieving the ModelView matrix once per frame might have a detrimental impact on your application’s performance. However, you need to benchmark this operation, because the performance will vary from one implementation to the next.
And
9.120 How do I find the coordinates of a vertex transformed only by the ModelView matrix?
It's often useful to obtain the eye coordinate space value of a vertex (i.e., the object space vertex transformed by the ModelView matrix). You can obtain this by retrieving the current ModelView matrix and performing simple vector / matrix multiplication.
But I'm not sure how to apply these in my situation.
You need to translate the "center of view" point to the origin, rotate, then invert that translation before applying the object's transform. This is known as a change of basis in linear algebra.
This is way easier to work with if you have a proper 3d-math library (I'm assuming you do have one), and that also helps you stay away from the deprecated fixed-pipeline APIs (more on that later).
Here's how I'd do it:
Find the transform for the center-of-view point in world coordinates (figure it out, then draw it to make sure it's correct, with the x, y, z axes too, since the axes are supposed to be correct w.r.t. the view). If you use the center-of-view point and the rotation (usually the inverse of the camera's rotation), this will be a transform from the world origin to the view center. Store this in a 4x4 matrix transform.
Apply the inverse of the above transform, so that it becomes the origin: glMultMatrixf(center_of_view_tf.inverse());
Rotate about this point however you want (glRotate())
Transform everything back to world space (glMultMatrixf(center_of_view_tf);)
Apply object's own world transform (glTranslate/glRotate or glMultMatrix) and draw it.
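A minimal sketch of those steps in fixed-pipeline calls might look like this. Because OpenGL post-multiplies, the calls appear in reverse order of the conceptual steps (the question itself notes this); centerOfView (the world-space point currently in the middle of the view) and the Matrix4 type with translation()/inverse()/data() are assumptions standing in for whatever 3d-math library is available:
glLoadIdentity();
glTranslatef(0.0f, 0.0f, (GLfloat)cameraOffset);         // move the camera back

Matrix4 centerTf = Matrix4::translation(centerOfView);   // world origin -> view center
glMultMatrixf(centerTf.data());                          // step 4: back to world space
glRotatef(worldRotation[0], worldRotation[1],
          worldRotation[2], worldRotation[3]);           // step 3: rotate about that point
glMultMatrixf(centerTf.inverse().data());                // step 2: view center -> origin

glTranslatef(midPlatform.x, midPlatform.y, 0.);          // step 5: the object's own transform
// ... draw the object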
About the fixed function pipeline
Back in the old days, there were separate transistors for transforming a vertex (or its texture coordinates), computing where light was in relation to it, applying lights (up to 8), and texturing fragments in many different ways. Simply put, glEnable() enabled fixed blocks of silicon to do some computation in the hardware graphics pipeline. As performance grew, die sizes shrank and people demanded more features, the amount of dedicated silicon grew too, and much of it wasn't used.
Eventually, it got so advanced that you could program it in rather obscene ways (register combiners, anyone). And then, it became feasible to actually upload a small assembler program for all vertex-level transforms. Then, it made no sense to keep a lot of silicon there that just did one thing (especially as you could've used those transistors to make the programmable stuff faster), so everything became programmable. If "fixed function" rendering was called for, the driver just converted the state (X lights, texture projections, etc.) to shader code and uploaded that as a vertex shader.
So, currently, where even the fragment processing is programmable, there are just a lot of fixed-function options that are used by tons and tons of OpenGL applications, but the silicon on the GPU just runs shaders (and lots of them, in parallel).
...
To make OpenGL more efficient, the drivers less bulky, and the hardware simpler and usable on mobile/console devices, and to take full advantage of the programmable hardware that OpenGL runs on these days, many functions in the API are now marked deprecated. They are not available in OpenGL ES 2.0 and beyond (mobile), and you won't be getting the best performance out of them even on desktop systems (where they will still be in the driver for ages to come, serving equally ancient code bases originating back to the dawn of accelerated 3d graphics).
The fixed-functionness mostly concerns how transforms/lighting/texturing etc. are done by "default" in OpenGL (i.e. glEnable(GL_LIGHTING)), instead of you specifying these ops in your custom shaders.
In the new, programmable OpenGL, transform matrices are just uniforms in the shader. Any rotate/translate/mult/inverse (like the above) should be done by client code (your code) before being uploaded to OpenGL. (Using only glLoadMatrix is one way to start thinking about it, but instead of using gl_ModelViewProjectionMatrix and its ilk in your shader, use your own uniforms.)
It's a bit of a bother, since you have to implement quite a bit of what was done by the GL driver before, but if you already have your own object list/graph with transforms somewhere, it's not that much work. (OTOH, if you have a lot of glTranslate/glRotate in your code, it might be...). As I said, a good 3d-math library is indispensable here.
...
So, to change the above code to "programmable pipeline" style, you'd just do all these matrix multiplications in your own code (instead of the GL driver doing it, still on the CPU) and then send the resulting matrix to OpenGL as a uniform before you activate the shaders and draw your object from VBOs.
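As a rough sketch of what that looks like (Matrix4 and its helpers again stand in for your math library; program and mvpLocation are assumed to come from your shader setup and glGetUniformLocation, and projection/objectTransform are assumed to be built elsewhere):
Matrix4 view     = Matrix4::translation(0.0f, 0.0f, (float)cameraOffset);
Matrix4 centerTf = Matrix4::translation(centerOfView);
Matrix4 rotation = Matrix4::rotation(worldRotation[0],                    // angle
                                     worldRotation[1], worldRotation[2],  // axis
                                     worldRotation[3]);
Matrix4 model = centerTf * rotation * centerTf.inverse() * objectTransform;
Matrix4 mvp   = projection * view * model;

glUseProgram(program);
glUniformMatrix4fv(mvpLocation, 1, GL_FALSE, mvp.data());  // column-major 4x4
// ... bind VBOs, set attributes, and draw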
(Note that modern cards do not have fixed-function code, just a lot of code in the driver to compile fixed-function rendering state to a shader that does the job. No wonder "classic" GL drivers are huge...)
...
Some info about this process is available at Tom's Hardware Guide and probably Google too.

World space to screen space (perspective projection)

I'm using a 3d engine and need to translate between 3d world space and 2d screen space using perspective projection, so I can place 2d text labels on items in 3d space.
I've seen a few posts of various answers to this problem but they seem to use components I don't have.
I have a Camera object, and can only set its current position and lookat position; it cannot roll. The camera is moving along a path, and a certain target object may appear in its view then disappear.
I have only the following values
lookat position
position
vertical FOV
Z far
Z near
and obviously the position of the target object.
Can anyone please give me an algorithm that will do this using just these components?
Many thanks.
All graphics engines use matrices to transform between different coordinate systems. Indeed, OpenGL and DirectX use them, because they are the standard way.
Cameras usually construct the matrices using the parameters you have:
view matrix (transforms the world so you are looking at it from the camera position); it uses the lookat position and the camera position (and also the up vector, which is usually 0,1,0)
projection matrix (transforms from 3D coordinates to 2D coordinates); it uses the FOV, near, far and aspect ratio.
You can find information on how to construct these matrices on the internet by searching for the OpenGL functions that create them:
gluLookAt: creates a view matrix
gluPerspective: creates the projection matrix
But I can't imagine an engine that doesn't let you get these matrices, because I can assure you they are somewhere; the engine is using them.
Once you have those matrices, you multiply them to get the view-projection matrix. This matrix transforms from world coordinates to screen coordinates. So just multiply the matrix with the position you want to transform (in 4-component vector format, with the 4th component set to 1.0).
But wait: the result will be in homogeneous coordinates. You need to divide the X, Y, Z of the resulting vector by W, and then you have the position in normalized screen coordinates (0 means the center, 1 means right, -1 means left, etc.).
From here it is easy to get pixel coordinates by multiplying by the width and height.
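Putting the whole pipeline together, using only the values listed in the question, might look like the sketch below. It uses GLM here purely for illustration (any math library with lookAt/perspective equivalents would do), and the up vector is assumed to be (0,1,0) since the camera cannot roll:
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Returns screen coordinates in pixels; check `visible` before using the result.
glm::vec2 worldToScreen(const glm::vec3 &cameraPos, const glm::vec3 &lookAt,
                        float verticalFovRadians, float zNear, float zFar,
                        float screenW, float screenH,
                        const glm::vec3 &targetPos, bool &visible)
{
    glm::mat4 view = glm::lookAt(cameraPos, lookAt, glm::vec3(0, 1, 0));
    glm::mat4 proj = glm::perspective(verticalFovRadians, screenW / screenH, zNear, zFar);

    // World -> clip space (homogeneous coordinates).
    glm::vec4 clip = proj * view * glm::vec4(targetPos, 1.0f);

    // Behind the camera: w <= 0, the projected point is meaningless.
    visible = clip.w > 0.0f;

    // Perspective divide -> normalized device coordinates in [-1, 1].
    glm::vec3 ndc = glm::vec3(clip) / clip.w;

    // NDC -> pixels (y flipped so 0 is the top of the screen).
    return { (ndc.x * 0.5f + 0.5f) * screenW,
             (1.0f - (ndc.y * 0.5f + 0.5f)) * screenH };
}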
I have some slides explaining all this here: https://docs.google.com/presentation/d/13crrSCPonJcxAjGaS5HJOat3MpE0lmEtqxeVr4tVLDs/present?slide=id.i0
Good luck :)
P.S: when you work with 3D it is really important to understand the three matrices (model, view and projection), otherwise you will stumble every time.
so I can place 2d text labels on items in 3d space
Have you looked up "billboard" techniques? Sometimes just knowing the right term to search under is all you need. This refers to polygons (typically rectangles) that always face the camera, regardless of camera position or orientation.