ML Kit Text Recognition/Automatic Rotation - google-mlkit

The text recognition in ML Kit works well IF the orientation of the image is correct (not rotated 90 degrees or upside down). All common OCR engines have an auto-orientation function that automatically determines the orientation before performing the text recognition. I do not see anything in the documentation that states there's a flag that can be set to perform auto orientation. Does it exist in ML Kit?
I also see a TextRecognizerOptions class but no documentation on how to set the options. I assume there are options to be set here (like look for "english", etc.). Where is the detailed documentation on this class?

ML Kit OCR supports Latin-script languages. We do not provide an option to specify the languages of interest; instead, the returned result contains the detected languages.
As for image rotation, the underlying OCR can only handle upright images, but ML Kit can handle the rotation for you given the image orientation information. For example, with a bitmap input you also provide the image's rotation, so that ML Kit can rotate the image for you and then run detection, as sketched below.
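On Android this looks roughly like the following Kotlin sketch (the rotation value is assumed to come from CameraX or EXIF metadata, and the function name is just for illustration):

import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

// rotationDegrees (0, 90, 180 or 270) must come from your camera or the EXIF
// metadata; ML Kit does not auto-detect it.
fun recognizeText(bitmap: Bitmap, rotationDegrees: Int) {
    val image = InputImage.fromBitmap(bitmap, rotationDegrees)
    val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
    recognizer.process(image)
        .addOnSuccessListener { result ->
            // Each text block reports the language it detected as a BCP-47 code.
            for (block in result.textBlocks) {
                println("${block.recognizedLanguage}: ${block.text}")
            }
        }
        .addOnFailureListener { e -> e.printStackTrace() }
}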

Related

Control Depth of field focal distance on the default camera postprocessing?

I'm trying to implement a depth-of-field effect on Unreal Engine 5.1's default camera using a post process volume (not the cinematic camera). The official tutorial describes a focal region, shown in black in the image below. They describe that it's possible to increase the size of this black focal region, but from what I can see in the public variables it's only available for mobile depth of field or cinematic cameras. Does anybody know if it's possible?
https://docs.unrealengine.com/4.27/Images/RenderingAndGraphics/PostProcessEffects/DepthOfField/DOF_LayerImplementation1.webp
These are the only controls that I see available for the default camera: https://docs.unrealengine.com/5.0/Images/designing-visuals-rendering-and-graphics/post-process-effects/depth-of-field/DoFProperties.webp

Capture a live video of handwriting using pen and paper and replace the hand in video with some object or cursor

I want to process the captured video. I will capture video of handwriting/drawing on paper, but I do not want to show the hand or pen on the paper while live streaming via p5.js.
Can this be done using machine learning?
Any idea how to implement this?
If I understand you right, you want to detect where in the image the hand is and draw an overlay at that position, right?
If so, you can use YOLO to detect where the hand is.
There are some pre-trained networks that you can download; maybe they are good enough, or maybe you have to train your own just for hands.
There are also some libraries for YOLO in JS, e.g. https://github.com/ModelDepot/tfjs-yolo-tiny
You may not need to go the full ML object segmentation route.
If the paper's position and illumination are constant (or at least knowable), you could try a simple heuristic: compare the pixels in the current frame with a short history and keep the most constant pixel values. There might be some lag as new parts of your drawing 'become constant', so you could try modifying the accumulation, for example reacting faster when a pixel was white and is turning black (see the sketch below).
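A rough, hypothetical Kotlin sketch of that accumulation idea, assuming grayscale frames arrive as IntArrays (the history length and thresholds are made-up illustration values, not tuned numbers):

// The output canvas starts as blank paper and only accepts pixels that stay dark.
class StableCanvas(width: Int, height: Int, private val historySize: Int = 10) {
    private val history = ArrayDeque<IntArray>()
    val canvas = IntArray(width * height) { 255 } // 0 = ink, 255 = paper

    fun addFrame(frame: IntArray) {
        history.addLast(frame.copyOf())
        if (history.size > historySize) history.removeFirst()
        if (history.size < historySize) return
        for (i in canvas.indices) {
            var mn = 255
            var mx = 0
            for (f in history) {
                val v = f[i]
                if (v < mn) mn = v
                if (v > mx) mx = v
            }
            // A pixel that stayed consistently dark over the whole history is
            // treated as ink; a moving hand or pen never stays dark for long.
            if (mx - mn < 20 && mx < 100) canvas[i] = mn
        }
    }
}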

Train model with the same image in different orientations

Is it a good idea to train the model with the same images, but in different orientations? I have a small set of images for training; that's why I'm trying to cover all the mobile camera/gallery user scenarios.
For example, the image example.png with 3 copies, example90.png, example180.png and example270.png, with their different rotations. And also with different background colors, shadows, etc.
By the way, my task is to identify the type of animal.
Is that a good idea?
If you use Core ML with the Vision framework (and you probably should), Vision will automatically rotate the image so that "up" is really up. In that case it doesn't matter how the user held their camera when they took the picture (assuming the picture still has the EXIF data that describes its orientation).
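If you still want to generate the rotated copies described in the question (for example to augment a small training set on Android rather than relying on Vision), here is a minimal Kotlin sketch; the function name and the 90/180/270 list are just for illustration:

import android.graphics.Bitmap
import android.graphics.Matrix

// Create 90/180/270 degree copies of a training image, mirroring the
// example.png / example90.png / example180.png naming from the question.
fun rotatedCopies(original: Bitmap): Map<Int, Bitmap> =
    listOf(90, 180, 270).associateWith { degrees ->
        val matrix = Matrix().apply { postRotate(degrees.toFloat()) }
        Bitmap.createBitmap(original, 0, 0, original.width, original.height, matrix, true)
    }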

Cropping Using OpenGL ES 2.0 iOS (vs. using Core Image)

I'm having difficulties finding any documentation about cropping images using OpenGL ES on the iPhone or iPad.
Specifically, I am capturing video frames at a mildly rapid pace (20 FPS), and need something quick that will crop an image. Is it feasible to use OpenGL here? If so, will it perform faster than cropping using Core Image and its associated methods?
It seems that using Core Image methods, I can't achieve faster than about 10-12 FPS output, and I'm looking for a way to hit 20. Any suggestions or pointers to usage of OpenGL for this?
Using OpenGL ES will generally be faster than the Core Image framework. Cropping an image is done by setting the texture coordinates. Normally the texture coordinates for the full image look like this:
{
0.0f,1.0f,
1.0f,1.0f,
0.0f,0.0f,
1.0f,0.0f
}
The whole image will be drawn with the texture coordinates above. If you just want the upper-right part of an image, you can set the texture coordinates like this:
{
0.5f,1.0f,
1.0f,1.0f,
0.5f,0.5f,
1.0f,0.5f
}
This will get a quarter of the whole image, at the upper right. Never forget that the texture coordinate origin in OpenGL ES is at the lower-left corner.
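For reference, feeding such cropped coordinates to a shader looks roughly the same on any GLES 2.0 platform; here is a hedged Kotlin/Android sketch (the question targets iOS, but the GL calls are identical; the attribute name a_TexCoord and the helper name are assumptions):

import android.opengl.GLES20
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Texture coordinates for the upper-right quarter, matching the block above.
val cropTexCoords = floatArrayOf(
    0.5f, 1.0f,
    1.0f, 1.0f,
    0.5f, 0.5f,
    1.0f, 0.5f
)

fun bindCropTexCoords(program: Int) {
    val buffer = ByteBuffer
        .allocateDirect(cropTexCoords.size * 4)
        .order(ByteOrder.nativeOrder())
        .asFloatBuffer()
        .apply { put(cropTexCoords); position(0) }
    // "a_TexCoord" is an assumed attribute name in your vertex shader.
    val location = GLES20.glGetAttribLocation(program, "a_TexCoord")
    GLES20.glEnableVertexAttribArray(location)
    GLES20.glVertexAttribPointer(location, 2, GLES20.GL_FLOAT, false, 0, buffer)
}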

How to detect image orientation & skew in Objective-C

I want to detect image orientation & skew and rotate the image so it is ready for scanning with OCR. How can this be done in Objective-C?
You could use a library such as OpenCV, which supports the required operations.
First use erosion followed by a Hough transform to find the angle to the x-axis, as demonstrated here: OpenCV - Detect skew angle, in order to rotate the image (a sketch is shown below).
The orientation could be estimated by checking width/height (rotate 90° if needed) and doing a test run of the OCR library. In case of a low detection rate you could rotate by 180° and run OCR again.
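A hedged sketch of the deskew step using OpenCV's Java/Kotlin bindings (the question asks about Objective-C, but the calls map one-to-one to the C++ API; the threshold, kernel size and Hough parameters are illustrative guesses):

import org.opencv.core.Mat
import org.opencv.core.Point
import org.opencv.core.Size
import org.opencv.imgproc.Imgproc

// Estimate the skew angle of dark text on a light background and return a
// deskewed copy of the grayscale input.
fun deskew(gray: Mat): Mat {
    val bin = Mat()
    Imgproc.threshold(gray, bin, 0.0, 255.0, Imgproc.THRESH_BINARY_INV or Imgproc.THRESH_OTSU)

    // Dilate the inverted binary so characters of a line merge into long blobs
    // (this plays the role of the erosion step mentioned above).
    val kernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, Size(15.0, 3.0))
    Imgproc.dilate(bin, bin, kernel)

    // Probabilistic Hough transform; take the median angle of the detected lines.
    val lines = Mat()
    Imgproc.HoughLinesP(bin, lines, 1.0, Math.PI / 180, 100, gray.cols() / 4.0, 20.0)
    val angles = ArrayList<Double>()
    for (r in 0 until lines.rows()) {
        val l = lines.get(r, 0)
        angles.add(Math.toDegrees(Math.atan2(l[3] - l[1], l[2] - l[0])))
    }
    if (angles.isEmpty()) return gray.clone()
    angles.sort()

    // Rotate around the image centre by the median line angle to level the text.
    val center = Point(gray.cols() / 2.0, gray.rows() / 2.0)
    val rotation = Imgproc.getRotationMatrix2D(center, angles[angles.size / 2], 1.0)
    val out = Mat()
    Imgproc.warpAffine(gray, out, rotation, gray.size())
    return out
}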
You can use any OCR engine that can detect the text orientation; then you can find the correct image orientation...