Perspective distortions correction - camera

I'm searching for a methods of text recognition based on document borders.
Or the methods that can solve the problem of finding new viewpoint.
For exmp. the camera is in point (x1,y1,z1) and the result picture with perspective distortions, but we can find (x2,y2,z2) for camera to correct picture.
Thanks.

The usual approach, which assumes that the document's page is approximately flat in 3D space, is to warp the quadrangle encompassing the page into a rectangle. To do so you must estimate a homography, i.e. a (linear) projective transformation between the original image and its warped counterpart.
The estimation requires matching points (or lines) between the two images, and a common choice for documents is to map the page corners in the original images to the image corners of the warped image. This will in general produce a rectangle with an incorrect aspect ratio (i.e. the warped page will look "wider" or "taller" than the real one), but this can be easily corrected if you happen to know in advance what the real aspect ratio is (for example, because you know the type of paper used, whether letter, A4, etc.).
A simple algorithm to perform the estimation is the so-called Direct Linear Transformation.
The OpenCV library contains routines to help accomplishing all these tasks, look into it.

Related

GODOT: What is an efficient calculation for the AABB of a simple 3D model from a camera's view

I am attempting to come up with a quick and efficient means of translating a 3d mesh into a projected AABB. In the end, I would like to accomplish something similar to figure 1 wherein only the area of the screen covered by the cube is located inside the bounding box highlighted in red. ((if it is at all possible, getting the area as small as possible, highlighted in blue, would increase efficiency down the road.))
Figure 1. https://i.imgur.com/pd0E20C.png
Currently, I have tried:
Calculating the point position on the screen using camera.unproject_position(). this failed largely due to my inability to wrap my head around the pixel positions trending towards infinity. I understand it has something to do with Tan, but frankly, it is too late for my brain to function anymore.
Getting the area of collision between the view frustum and the AABB of the mesh instance. This method seems convoluted, and to get it in a usable format I would need to project the result into 2d coordinates again.
Using the MeshInstance VisualInstance to create a texture wherein a pixel is white if it contains the mesh instance, and black otherwise. Visual instances in general just baffle me, and I did not think it would be efficient to have another viewport just to output this texture.
What I am looking for:
An output that can be passed to a shader informing where to complete certain calculations. Right now this is set up to use a bounding box, but it could easily be rewritten to also use a texture. It also could be rewritten to use polygons, but I am trying to keep calculations to a minimum in the shader.
Certain solutions I have tried before have worked, slightly, but this must be robust. The camera interfacing with the 3d object will be able to move completely around and through it, meaning at times the view will be completely surrounded by the 3d model with points both in front, and behind.
Thank you for any help you can provide.
I will try my best to update this post with information if needed.

Vulkan: Framebuffer larger than Image dimensions

This question primarily relates to the dimension parameters (width, height, and layers) in the structure VkFramebufferCreateInfo.
The actual question:
In the case that one or more of the VkImageViews, used in creating a VkFrameBuffer, has dimensions that are larger than those specified in the VkFramebufferCreateInfo used to create the VkFrameBuffer, how does one control which part of that VkImageView is used during a render pass instance?
Alternatively worded question:
I am basically asking in the case that the image is larger (not the same dimensions) than the framebuffer, what defines which part of the image is used (read/write)?
Some Details:
The specification states this is a valid situation (I have seen many people state the attachments used by a framebuffer must match the dimensions of the framebuffer itself, but I can't find support for this in the specification):
Each element of pAttachments must have dimensions at least as large as the corresponding framebuffer dimension.
I want to be clear, that I understand that if I just wanted to draw to part of an image I can use a framebuffer that has the same dimensions as the image, and use viewports and scissors. But scissors and viewports are defined relative to the framebuffer's (0,0) as far as I can tell from the spec, although it is not clear to me.
I'm asking this question to help my understand of the framebuffer as I am certain I have misunderstood something. I feel it may well be the case that (x,y) in framebuffer space, is always (x,y) in image space (As in there is no way of controlling which part of the VkImageView is used).
I have been stuck on this for quite sometime (~4 days), and have tried both the Vulkan: Cookbook and the Vulkan Programming Guide, and read most of the specification, and searched online.
If the question needs clarification, please ask. I just didn't want to make it overly long.
Thank you for reading.
There isn't a way to control which part of the image is used by the framebuffer when the framebuffer is smaller than the image. The framebuffer origin always maps to the image origin.
Allowing attachments to be larger than the framebuffer is only meant to allow reusing memory/images/views for several purposes in a frame even when they don't all need the same dimensions. The typical example is reusing a depth buffer (but not it's contents) for several different render passes. You could accomplish the same thing with memory aliasing, but engines that have to support multiple APIs might find it easier to do it this way.
The way to control where you render to is by controlling the viewport. That is, you specify a framebuffer size that's actually big enough to cover the total area of the target images that you may want to render to, and use the viewport transform/scissoring to render to a specific area of those images.
There is no post-viewport transformation that goes from framebuffer space to image space. That would be decidedly redundant, since we already have a post-NDC transform. There's no point in having two of them.
Sure, VkRenderPassBeginInfo has the renderArea object, but that is more of a promise from the user rather than a guarantee for the system:
The application must ensure (using scissor if necessary) that all rendering is contained within the render area, otherwise the pixels outside of the render area become undefined and shader side effects may occur for fragments outside the render area.
So basically, the implementation doesn't do anything with renderArea. It doesn't set up a transformation or anything; you're just promising that no framebuffer pixels outside of that area will be impacted.
In any case, there's really little point to providing a framebuffer size that's smaller than the images sizes. That sort of thing is more the perview of the renderArea than the framebuffer specification.

How to detect an image between shapes from camera

I've been searching around the web about how to do this and I know that it needs to be done with OpenCV. The problem is that all the tutorials and examples that I find are for separated shapes detection or template matching.
What I need is a way to detect the contents between 3 circles (which can be a photo or something else). From what I searched, its not to difficult to find the circles with the camera using contours but, how do I extract what is between them? The circles work like a pattern on the image to grab what is "inside the pattern".
Do I need to use the contours of each circle and measure the distance between them to grab my contents? If so, what if the image is a bit rotated/distorted on the camera?
I'm using Xamarin.iOS for this but from what I already saw, I believe I need to go native for this and any Objective C example is welcome too.
EDIT
Imagining that the image captured by the camera is this:
What I want is to match the 3 circles and get the following part of the image as result:
Since the images come from the camera, they can be rotated or scaled up/down.
The warpAffine function will let you map the desired area of the source image to a destination image, performing cropping, rotation and scaling in a single go.
Talking about rotation and scaling seem to indicate that you want to extract a rectangle of a given aspect ratio, hence perform a similarity transform. To define such a transform, three points are too much, two suffice. The construction of the affine matrix is a little tricky.

What's the name of Polygon Masking?

When masking a bitmap by reducing the polygons it's displayed on, what is the name of doing this?
I often see this done for physics, sort of, to have an edge around an object for detection etc, but a long while ago remember seeing a similar approach being used with polygons for visual masking, but have completely forgotten what it was called or how to find it.
ADDITIONAL INFO:
In this example, the polygons are used to "mask" some of the image:
http://fancyratstudios.com/2010/02/programming/progresstimer-for-cocos2d/
What's the name of using polygons to do masking in this manner?
According to the definition you gave, the technique you're referring to could be Gouraud shading, whose definition is also reported in the Video-Based Rendering book on page 50:
Hole-filling: for polygons that are not depicted in any image,
determine appropriate vertex colors; during rendering, Gouraud shading
is used to mask the missing texture information.
Not sure what your talking about, but it's either rasterization (projecting polygons to a plane), or "apply a texture" (projecting an image to a set of polygons), or decimation (reducing the amount of polygons)

On-the-fly Terrain Generation Based on An Existing Terrain

This question is very similar to that posed here.
My problem is that I have a map, something like this:
This map is made using 2D Perlin noise, and then running through the created heightmap assigning types and color values to each element in the terrain based on the height or the slope of the corresponding element, so pretty standard. The map array is two dimensional and the exact dimensions of the screen size (pixel-per-pixel), so at 1200 by 800 generation takes about 2 seconds on my rig.
Now zooming in on the highlighted rectangle:
Obviously with increased size comes lost detail. And herein lies the problem. I want to create additional detail on the fly, and then write it to disk as the player moves around (the player would simply be a dot restricted to movement along the grid). I see two approaches for doing this, and the first one that came to mind I quickly implemented:
This is a zoomed-in view of a new biased local terrain created from a sampled element of the old terrain, which is highlighted by the yellow grid space (to the left of center) in the previous image. However this system would require a great deal of modification, as, for example, if you move one unit left and up of the yellow grid space, onto the beach tile, the terrain changes completely:
So for that to work properly you'd need to do an excessive amount of, I guess the word would be interpolation, to create a smooth transition as the player moved the 40 or so grid-spaces in the local world required to reach the next tile over in the over world. That seems complicated and very inelegant.
The second approach would be to break up the grid of the original map into smaller bits, maybe dividing each square by 4? I haven't implemented this and I'm not sure how I would in a way that would actually increase detail, but I think that would probably end up being the best solution.
Any ideas on how I could approach this? Keep in mind it has to be local and on-the-fly. Just increasing the resolution of the map is something I want to avoid at all costs.
Rewrite your Perlin noise to be a function of position. Then you can increase the octaves (and thus the detail level) and resample the area at a higher resolution.