I am trying to build an OCR model using TensorFlow 2.0, and I want to start with a CNN. I know that I can give the image to the model in array format, but I am not able to understand what input format to use for my labels when training the model. The label length is 10 and it can contain any 4 letters and 6 digits. I looked for a solution on Google and found https://github.com/JackonYang/captcha-tensorflow/blob/master/captcha-solver-tf2-4digits-AlexNet-98.8.ipynb, but there the labels only contain digits, while mine also contain letters. Can anybody please throw some light on this?
I'm new to TensorFlow and played around with the handwritten digits MNIST set.
I'd like to do my own project that recognises text instead of numbers, but I can't find a good tutorial.
Is it the same principle as with digits, but instead of 10 output classes at the end I have to use 26? Or more, if I include upper and lower case and special characters?
If so, I'd have to first crop the words into individual characters, right? Or is there a way to recognise entire sentences?
I'd like to train on three different fonts, so no handwriting, and I don't care about upper or lower case.
Later I'd like to use the trained model on photographs, a printed article for example. Will the model still work if I align the image? Do I have to retrain it a little, or train it from scratch with the new data?
Where do I start? The Keras example is overwhelming.
You're looking for an OCR model. A simple CNN can't detect text in scanned images on its own; you first need to segment the text, and how you do that depends on the language script.
You can start with Tesseract; there is a Python wrapper named pytesseract.
import pytesseract
from PIL import Image

# --psm 10 treats the image as a single character; the whitelist restricts
# recognition to digits. Adjust both options for your own use case.
text = pytesseract.image_to_string(
    Image.open("temp.jpg"), lang='eng',
    config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
print(text)
For your own model, try CRNN models, e.g. https://github.com/qjadud1994/CRNN-Keras
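Regarding the original question about labels that mix letters and digits: one common approach (not the only one) is to treat each of the 10 positions as its own classification over a 36-character charset (10 digits + 26 letters), so each label becomes 10 integer ids or one-hot vectors. Below is a minimal sketch with Keras; the input shape, layer sizes and names are placeholders, not a definitive architecture.

import string
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

CHARSET = string.digits + string.ascii_uppercase   # 36 characters
CHAR_TO_ID = {c: i for i, c in enumerate(CHARSET)}
LABEL_LEN = 10

def encode_label(text):
    # Turn e.g. 'ABCD123456' into a (LABEL_LEN, 36) one-hot array.
    ids = [CHAR_TO_ID[c] for c in text.upper()]
    return tf.keras.utils.to_categorical(ids, num_classes=len(CHARSET))

# Tiny CNN with one softmax head per character position (placeholder sizes).
inputs = layers.Input(shape=(60, 160, 1))
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
x = layers.Dense(256, activation="relu")(x)
outputs = [layers.Dense(len(CHARSET), activation="softmax", name=f"char_{i}")(x)
           for i in range(LABEL_LEN)]
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")

# Labels for training: a list of LABEL_LEN arrays, one per output head.
y = encode_label("ABCD123456")
y_per_head = [y[i][np.newaxis, :] for i in range(LABEL_LEN)]

With this framing, feeding letters works exactly like feeding digits; only the charset and the number of output units per head change.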
I am trying to use DeepLabV3 for image segmentation and object detection/classification on Coral.
I was able to successfully run the semantic_segmentation.py example using DeepLabV3 on the Coral, but that only shows an image with an object segmented.
I see that it assigns labels to colors. How do I associate the labels.txt file that I made, based on the label info of the model, with these colors? (How do I know which color corresponds to which label?)
When I try to run
engine = DetectionEngine(args.model)
with the deeplab model, I get the error
ValueError: Dectection model should have 4 output tensors!This model has 1.
I guess this approach is wrong?
Thanks!
I believe you have reached out to us regarding the same query. I just wanted to paste the answer here for others to reference:
"The detection model usually have 4 output tensors to specifies the locations, classes, scores, and number and detections. You can read more about it here. In contrary, the segmentation model only have a single output tensor, so if you treat it the same way, you'll most likely segfault trying to access the wrong memory region. If you want to do all three tasks on the same image, my suggestion is to create 3 different engines and feed the image into each. The only problem with this is that each time you switch the model, there will likely be data transfer bottleneck for the model to get loaded onto the TPU. We have here an example on how you can run 2 models on a single TPU, you should be able to modify it to take 3 models."
On a final note, I just saw that you added:
how do i associate the labels.txt file that I made based off of the label info of the model to these colors
I just don't think this is something you can do for a segmentation model, but maybe I'm just confused by your query?
Take an object detection model, for example: there are 4 output tensors, and the second tensor gives you an array of ids, each associated with a class that you can map to a label file. Segmentation models only give you the pixels belonging to an object.
[EDIT]
Apologies, looks like I'm the one confused about segmentation models.
Quote from my colleague :)
"If you are interested in the name of the label, you can find the integer corresponding to that label in the result array in semantic_segmentation.py, where result is the classification data of each pixel.
For example:
if you print the result array with bird.jpg as input, you will find a few pixels with value 3, which corresponds to the 4th label in pascal_voc_segmentation_labels.txt (as indexing starts at 0)."
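To make the mapping concrete, here is a hedged sketch of looking up label names from the per-pixel class ids. It assumes `result` is the 2-D array of class ids produced by semantic_segmentation.py and that the labels file has one name per line, in class-id order.

import numpy as np

# Load label names; line i corresponds to class id i (indexing starts at 0).
with open("pascal_voc_segmentation_labels.txt") as f:
    labels = [line.strip() for line in f]

# Placeholder for the real per-pixel class-id array from the example script.
result = np.zeros((513, 513), dtype=np.int32)

# Print which labels actually appear in the segmented image.
for class_id in np.unique(result):
    print(class_id, "->", labels[class_id])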
I have DNA methylation data of lower-grade glioma samples obtained from the GDC data portal. The values in my data range from 0 to 1. I want to classify these samples into two classes, IDH WT and IDH mutant.
I want to use a CNN for classification, so I am doing image embedding here to give images as input. I am new to deep learning methods and need help with data preparation for the CNN. I am referring to this paper (reference paper: doi: http://dx.doi.org/10.1101/364323). My question is:
I have a data frame of 9203*513 (rows*columns), and I have saved each column into a separate file (an Excel file).
Then I reshaped each file into 767*12 (rows*columns) by inserting one more row, padded with a zero.
Then I did image embedding of each file and submitted all these files to the CNN as input (training set: 80% of images, test set: 20% of images).
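To make these steps concrete, here is a minimal sketch of the preparation I describe above, assuming pandas and numpy; the file and column names are placeholders.

import numpy as np
import pandas as pd

# Placeholder file name: the 9203 x 513 methylation matrix (beta values in [0, 1]).
df = pd.read_excel("methylation.xlsx")

for name in df.columns:
    col = df[name].to_numpy()        # 9203 values for one sample
    col = np.append(col, 0.0)        # pad with one zero -> 9204 = 767 * 12 values
    img = col.reshape(767, 12)       # "image" of shape 767 x 12
    np.save(f"{name}.npy", img)      # one file per sample to feed the CNN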
So, is this approach correct for giving input to a CNN for a classification problem?
I'm open to all suggestions,
Thank you for your time and consideration,
I want to apply attention-ocr to detect all the digits on car number plates.
I've read the README.md of attention_ocr on GitHub (https://github.com/tensorflow/models/tree/master/research/attention_ocr) and also the StackOverflow answer on how to train the model with my own image data (https://stackoverflow.com/a/44461910/743658).
However, I couldn't find any information on how to store the annotation or label of a picture, or what format it should be in.
For an object detection model, I was able to make my dataset with LabelImg, convert it into a csv file, and finally make a .tfrecord file.
Now I want to make a .tfrecord file in the FSNS dataset format.
Can you give me your advice on how to proceed with these training steps?
Please reread the mentioned answer; it has a section explaining how to store the annotation. It is stored in the three features image/text, image/class and image/unpadded_class. The image/text field is used for visualization; some models support unpadded sequences and use image/unpadded_class, while the default version relies on the text padded with null characters to a fixed length, stored in the feature image/class. Here is the excerpt that stores the text annotation:
char_ids_padded, char_ids_unpadded = encode_utf8_string(
    text, charset, length, null_char_id)
example = tf.train.Example(features=tf.train.Features(
    feature={
        'image/class': _int64_feature(char_ids_padded),
        'image/unpadded_class': _int64_feature(char_ids_unpadded),
        'image/text': _bytes_feature(text)
        ...
    }
))
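For reference, _int64_feature and _bytes_feature are the usual small wrappers around tf.train.Feature; a minimal sketch of how such helpers are typically defined is below (the attention_ocr code ships its own versions, so treat these as illustrative).

import tensorflow as tf

def _int64_feature(values):
    # `values` is an iterable of ints, e.g. the padded character ids.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def _bytes_feature(value):
    # `value` is a bytes object, e.g. the UTF-8 encoded ground-truth text.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))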
If you have worked with TensorFlow object detection, then this approach should be much easier for you.
You can create the annotation file (in .csv format) using labelImg or any other annotation tool.
However, before converting it into the TensorFlow format (.tfrecord), you should keep in mind the annotation format (the FSNS format in this case).
The format is: files text xmin ymin xmax ymax
So while annotating, don't bother much about the class (as you would have done in object detection); some random name should suffice.
Then convert it into .tfrecords.
And finally, the labelMap is the list of characters which you have annotated.
Hope it helps!
I'm trying to convert the NASNet TensorFlow model to a caffemodel. The cell_stem_0 block's output is right, but when it comes to cell_stem_1/1x1, the output feature map of my caffemodel is a little different from the TensorFlow model's:
All 22 feature maps' border pixels are different, but the other values are right.
Will a 1x1 conv cause this difference? Is there any difference between the 1x1 conv in cell_stem_0 and cell_stem_1? (In my caffemodel, the output of cell_stem_0/1x1 is right.)