Cloud Vision API poorly recognizes 7-segment numbers - google-vision

The simplest example of what I'm trying to recognize:
I use DOCUMENT_TEXT_DETECTION, but in the answer I get the hieroglyphics.
If I use Eng in the ImageContext parameter for the addAllLanguageHints method, then I have 111 in result. Better, but still bad.
Is there any way to indicate that the numbers are recognised or somehow improve the results?
Also, how is the setRepeatedField option in ImageContext is used? I could not find any examples of its use.
Thanks in advance.

Even if it doesn't work out of the box ... you'd need is to classify images using custom labels, when the default labels won't suffice. Cloud Auto ML Vision (select Vision from that blue drop-down menu) let's you train custom models, which can be used to recognize that font. And since the possible amount of shapes is quite limited with that 7-segment display, it shouldn't be too difficult to train it. If you'd get a calculator with a better display, it might also work better. The LCD above looks a little cheap, with those huge spaces and cut-off endings - but nevertheless, one can train it to read that.

Related

allocating category to a comment pandas

My task is to allocate broad and fine category to the text I have in a pandas dataframe.
My df is something like this:
Text
I like this pen
this is the worst light bulb ever
these pants fit me just fine
Desired output:
Text Broad_cat Fine_cat
I like this pen Stationary Pen
this is the worst light bulb ever Electrical light Bulb
these pants fit me just fine Clothing Pants
The text could be from any category, so I cant use a prepared dictionary. These are reviews that I can get from any source. I was hoping that there is an open source python package that can help me with the specific task of categorization of a comment. I already tried YAKE, RAKE, Summa and KeyBERT methods and while each of them are giving me key words, they dont always turn out to be the category. Is this even possible? Any help in this regard is much appreciated.
I presume you have a list of allowed categories?
This a multiclass classification problem.
A fiddly approach is you embed the sentences into some sort of vector space then use a somethign like the softmax function to select the class and then train your model based on training data. This post discusses this.
I think you might be more interested in zero-shot text classification. Hugging face has a pipeline (what of using models for certain tasks) for this with the property candidate_labels. So you should be able to use this with an appropriate model and specify candidate labels... though the underlying model would have support this in some way, but presumably some do. cross-encoder/nli-distilroberta-base appears to support this.
`

Methods of labeling human muscles on tensorflow

I want to be able to label all of the muscles on an athletes body. I got a lot of the images that the athletes are almost in the same body pose but the issue that I am running into is that drawing a box around them makes them inaccurate as it ends up overlapping other muscles. Drawing exact lines around them is a bit difficult as they are a lot of smaller muscles and creates inconsistently over 20-30images. I was wondering if there is a way to feed in a human anatomy and then have tensorflow go in and label all of the muscles in given pictures.
Or I was wondering if you all had a different idea on approach this problem that I'm running into.
I don't have anybody else to ask and I've been researching this for awhile so if I missed or overlooked something please forgive me
The way i see is you need to combine with some prepossessing steps to normalize your target object in the image such as:
identify the human,
identify the pose or skeleton (which nowadays many open-source such as openpose-plus),
the pose estimation results can label the limbs, or part of the body from which you can do something either by hand-crafted image processing or other segmentation model.

training images? Considerations for selection

I'm relatively new and am still learning the basics. I've used NVIDIA DIGITS in the past, and am now looking at Tensorflow. While I've been able to fumble my way around creating some models for a few projects I'm working on, I really want to start diving deeper into what I'm doing, how I'm doing it, and ultimately a better understanding of why.
One area that I would like to start with is the Images that I'm using for training and testing. Can anyone point me to a blog, an article, a paper, or give me some insight in what I need to consider when selecting images to train a new model on. Up until recently, I've been using datasets that have already been selected and that are available for download. Lets say I'm going to start working on a project that involves object detection of ships from a variety of distances and angles.
So my thoughts would be
1) I need a large quantity of images.
2) The images need to contain ships of the different types I would like to detect. (lets just say one class, ships, don't care what type of ships)
3) I also need to have images that have a great variety of distance perspective for the different types of ships.
Ultimately, my thoughts are that the images need to reflect the distance, perspective, and types of ships I would ideally want to identify from the video. Seems simple enough.
However, there are a number of questions
Does the images need to be the same/similar resolution as the camera I'll be using, for best results?
Does the images all need to be the same resolution?
Can I use a single image and just digitally zoom out on the image to give the illusion of different distances?
I'm sure there are a number of other questions that I'm not asking, or should be asking. Are there any guide lines available for creating a solid collection of images to use when creating the collection of images for training and validation?
I recommend thinking through end to end, like would you need to classify ship models as a next step? I recommend going through well known public datasets and actually work with the structure, how to store data, labels, how to handle preprocessing etc.
More importantly, what are you trying to achieve? Talking to experts in the topic does help greatly while preparing your own dataset.
Use open source images if you can, e.g. flickr, google, imagenet.
No, you don't need them to be the same resolution.
It is not ideal to zoom in/out images to use in different categories. Preprocessing images and data augmentation already does this to create more distant representations of the same class. This is why I would recommend hands on approach with an existing dataset first.
Yes, what you need is many, different representations of classes, and a roughly balanced dataset of classes. If you define your data structure well in the beginning, it will save you a ton of time as you won't have to make changes often.

What methods to recognize sentence handwriting?

I mean posts per sentence, not per letter. Such a doctor's prescription handwriting which hard to read. Not just a normal handwriting.
In example :
I use a data mining or machine learning for doing a training from
paper handwrited.
User scanning a paper with hard to read writing.
The application doing an image processing.
And the output is some sentence from paper.
And what device to use? (Scanner or webcam)
I am newbie. If could i need some example in vb.net with emguCV/openCV and researches journals.
Any help would be appreciated.
Welcome to stack overflow! The answer to your question is twofold:
a. If you want to recognize handwriting that has already happened i.e. it is presented to you as an image you are in trouble. Computer Vision is still not good enough to provide you with reasonable accuracy.
b. If you have a chance to recognize handwriting “as it's happening” - you are in luck. Download, for example, a Gesture Search app from Android play store and you are in business.
The difference between the two scenarios is subtle but significant. In the second case you have an extra piece of information that makes handwriting recognition possible. This piece is timing of each stroke. In other words, instead of an image with handwriting you have a bunch of strokes that are all labeled with their time stamps. You can think about it as a sequence of lines and curves or as image segmentation - in any way this provides a big hint for the system. Additional help comes from the dictionary on your phone but this is typically used by any handwriting system.
Android of course has an open source library for stroke recognition (find more on your own). If you still want to go for recognizing images though, you have to first detect text (e.g. as a bounding box) and second use any of the existing engines to process detected regions. For text detection I can recommend MSER. But be careful trying to implement even text detection on your own - you are entering a world of pain here ;). Here is an article that can help.
As for learning how to recognize text from images on the Internet - this can be your plan B or C or Z when you master above mentioned stages. Don’t try to abuse learning methods and make them do hard work for you - you will hit a wall if you don’t understand what’s going on under the hood.

How to give best chance of success to an OCR software?

I am using Tesseract OCR (via pytesser) and PIL (Python Image Library) for automated test of an application.
I am checking that the displayed text is ok by making a screenshot and getting the text thanks to tesseract.
I had some issues in the beginning and it seems to work better since I have increased the size of the screenshot thanks to the bicubic interpolation of PIL.
Unfortunatelly, I still have some mistakes like confusion between '0' and 'O'. I can imagine that I will have other similar issues in the future.
I would like to know if there are some techniques to prepare an image in order to help the OCR. Any idea is welcomed.
Thanks in advance
Shameless plug and disclaimer: my company packages Tesseract for use in .NET
Tesseract is an OK OCR engine. It can miss a lot and gets readily confused by non-text. The best thing you can do for it is to make sure it gets text only. The next best thing is to give it something sanely binarized (adaptive or dynamic threshold to get there) or grayscale and let it try to do binarization.
Train tesseract to recognize your font
Make image extra clean and with enough free space around characters
Profit :)
Here are few real world examples.
First image is original image (croped power meter numbers)
Second image is slightly cleaned up image in GIMP, around 50% OCR accuracy in tesseract
Third image is completely cleaned image - 100% OCR recognized without any training!
Even under the best conditions OCR variants will sneak up on you. Your best option will be to design your tests to be aware of them.
For distinguishing between 0 and O, one simple solution is to choose a font that distinguishes between both (eg: 0 has a dash or dot in its middle). Would that be acceptable in your application?
Another solution is to apply a dictionary-based step after the character-by-character analysis of the text - feeding the recognized text into some form of spell-checker or validator to differentiate between difficult characters.
For instance, a round symbol followed by other numbers is most likely to be a zero, while the same symbol followed by letters is most likely to be a capital o. It's a trivial example, but it shows how context is necessary to make a more reliable OCR system.