Apart from image classification and similar applications, is there a way to extract text from images using TensorFlow? The input can be an image in any format, or a PDF.
With TensorFlow you would have to train a model to detect printed or handwritten characters. A simpler approach is to use OpenCV and pytesseract.
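A minimal sketch of the OpenCV plus pytesseract route, assuming the Tesseract binary is installed and the input is a raster image (PDF pages would first have to be rendered to images, for example with pdf2image). The names extract_text and clean_ocr_text are hypothetical, and Otsu thresholding is only one reasonable preprocessing choice:

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Collapse whitespace on each line and drop the empty lines OCR output tends to contain."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def extract_text(image_path: str, lang: str = "eng") -> str:
    """Binarize an image with OpenCV, then run Tesseract OCR on it."""
    # Third-party imports are kept local so the helper above works without them.
    import cv2          # pip install opencv-python
    import pytesseract  # pip install pytesseract (plus the Tesseract binary)

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the black/white threshold automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    return clean_ocr_text(pytesseract.image_to_string(binary, lang=lang))
```

Calling extract_text on a scanned page returns the recognized text as a string. Results depend heavily on resolution and contrast, so the preprocessing step is usually where tuning pays off.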
Related
Can we decode a WAV file into a tensor in React Native?
I have tried '@tensorflow/tfjs-react-native', which provides a method to decode a JPEG file into a tensor, but I could not find any method to decode a WAV file.
I tried to use decodeJpeg, but it did not work.
Any help would be appreciated.
I am writing a simple audio recognition app in React Native, using a pre-trained model that was trained with TensorFlow.
I am looking for an equivalent of tf.audio.decode_wav in React Native.
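For reference, here is a rough sketch of what a decode_wav equivalent has to do for 16-bit PCM input: read the header, unpack the little-endian samples, and scale them to floats in the range -1.0 to 1.0. It is written in Python with only the standard library, and the same steps should port to JavaScript. The name decode_wav_pcm16 is hypothetical, and dividing by 32768 is an assumption about the scaling convention:

```python
import array
import io
import sys
import wave


def decode_wav_pcm16(data: bytes):
    """Return (frames, sample_rate); each frame is a list with one float per channel."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    ints = array.array("h")  # signed 16-bit integers
    ints.frombytes(raw)
    if sys.byteorder == "big":
        ints.byteswap()  # WAV sample data is little-endian

    # Scale to floats (assumed convention: divide by 32768).
    floats = [value / 32768.0 for value in ints]
    # Group interleaved samples into frames of `channels` values each.
    frames = [floats[i:i + channels] for i in range(0, len(floats), channels)]
    return frames, rate
```

The resulting nested list has shape [samples, channels], which is the layout a model trained on tf.audio.decode_wav output would typically expect.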
Every tutorial I find uses a pre-made dataset, but my project is image segmentation on pictures of playing cards. I will create the dataset myself, but I'm finding little to no resources on creating the dataset and the image masks it needs. Any help would be great!
I use GIMP (https://www.gimp.org/) with layers. It has several useful tools, such as Bucket Fill, for quickly coloring a region. Then you just export the layers to a new file to obtain the mask. VGG Image Annotator is also useful (https://www.robots.ox.ac.uk/~vgg/software/via/via-1.0.6.html).
For 3D images you can use VTK and ITK-SNAP (http://www.itksnap.org/pmwiki/pmwiki.php) for volume identification, visualization, and exporting. MIPAV (https://mipav.cit.nih.gov/) is also useful.
VGG Image Annotator (VIA) is one option (a quick demo is available). There is also labelme.
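If the annotation tool exports polygon outlines rather than ready-made masks, the polygons still have to be rasterized. A minimal pure-Python sketch, assuming each region is given as parallel lists of x and y vertex coordinates. The helper polygon_to_mask is hypothetical, and libraries such as OpenCV or Pillow do the same job much faster:

```python
def polygon_to_mask(xs, ys, width, height):
    """Rasterize a polygon (vertex lists xs, ys) into a height x width mask of 0/1."""
    n = len(xs)
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        py = row + 0.5  # sample at pixel centers
        for col in range(width):
            px = col + 0.5
            inside = False
            j = n - 1
            for i in range(n):
                # Does edge (j -> i) cross the horizontal ray going right from (px, py)?
                if (ys[i] > py) != (ys[j] > py):
                    x_cross = xs[j] + (py - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j])
                    if px < x_cross:
                        inside = not inside  # even-odd rule
                j = i
            mask[row][col] = 1 if inside else 0
    return mask
```

One mask per card class (or one integer label per pixel) can then be saved alongside each photo to form the segmentation dataset.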
I am programming in Python, but if some tool/library exists in another language that would help me considerably, I am open to suggestions.
I have a large collection of pdf pages that live in a database, and I am trying to automate the collection of those pages to build some image recognition models with them.
These "pdfs" are actually just PNG images encased with a PDF wrapper (presumably so they can be read by PDF readers like Adobe Acrobat). I need the pdfs in image format to feed into the image recognition model pipeline. I am assuming they are PNG images, because when I save the images from the browser (i.e., right click and save image as), the resulting file is a PNG file.
After reading this question from 2010, and checking out this blog post from 2007, I've concluded that there must be a way to just extract the PNG byte array from the PDF instead of re-converting the PDF into a new image. Oddly though, I couldn't find the PNG file header with:
# Python 3.6
with open('path to the pdf file', 'rb') as f:
    data = f.read()
header = bytes([137, 80, 78, 71, 13, 10, 26, 10])
# the resulting header looks like this: b'\x89PNG\r\n\x1a\n'
data.find(header)  # returns -1 when the header is not present
Does that mean that the embedded image is not in fact a PNG image?
If there is no easy way to extract the embedded image byte array, what tool might I use to automate the conversion of each PDF file to some image format (preferably JPEG, PNG, or TIFF)?
Edit: I know tools like ImageMagick exist for format conversions, but I'd really rather do the extraction method for the sake of learning more about these file formats.
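One likely explanation, offered as a hedged note rather than a confirmed diagnosis: a PDF does not normally embed a PNG file as-is. Image data lives in a stream whose /Filter entry names the encoding, for example /FlateDecode (zlib-compressed raw samples) or /DCTDecode (JPEG data), so the PNG signature never appears even when a browser saves the rendered page as a PNG. The sketch below scans the raw bytes for /Filter entries to see which encodings a file uses. The function name list_stream_filters is hypothetical, and the scan will miss dictionaries stored inside compressed object streams:

```python
import re
from collections import Counter

# Matches "/Filter /Name" as well as "/Filter [/A /B]" in stream dictionaries.
_FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
_NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")


def list_stream_filters(pdf_bytes: bytes) -> Counter:
    """Count the filter names that appear in /Filter entries of a PDF's raw bytes."""
    counts = Counter()
    for match in _FILTER_RE.finditer(pdf_bytes):
        for name in _NAME_RE.findall(match.group(1)):
            counts[name.decode("ascii")] += 1
    return counts
```

If /DCTDecode shows up, the stream bytes are JPEG data and can often be copied out directly. /FlateDecode streams hold raw samples, which need the width, height, and color space from the same dictionary before they can be rebuilt into an image.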
pip install pdf2image
pip install pillow
pip install numpy
pip install opencv-python
Then,
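import cv2
import numpy as np
from pdf2image import convert_from_path  # requires poppler to be installed

# Render the PDF; each page comes back as a PIL image in RGB order
pages = convert_from_path('path to the pdf file')
# First page as a NumPy array to play around with in OpenCV or PIL
img = np.asarray(pages[0])
# OpenCV expects BGR channel order, so convert before writing
cv2.imwrite('path to save the image with the file extension', cv2.cvtColor(img, cv2.COLOR_RGB2BGR))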
I have a simple MNIST model from the TensorFlow tutorial. I want to see how the first convolutional layer's filters change over time. When I use tf.summary.image, only one of the steps is displayed, and the rest are ignored. Is there any way to work around this?
TensorFlow does not support video summaries, but you can generate an image at each step, save the images to a directory, and then create a video from them.
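A minimal sketch of the save-an-image-per-step idea, assuming the filter has already been fetched as a 2-D grid of floats (for example one slice of the first convolutional kernel). The helper rescales the values to 0-255 and writes a grayscale PGM file named after the training step. The name save_filter_pgm and the filter_frames directory are hypothetical, and PGM is used only because it needs no image library:

```python
import os


def save_filter_pgm(filt, step, out_dir="filter_frames"):
    """Write one 2-D filter as an 8-bit grayscale PGM file named by training step."""
    rows = [[float(v) for v in row] for row in filt]
    lo = min(min(r) for r in rows)
    hi = max(max(r) for r in rows)
    # Map [lo, hi] onto [0, 255]; a constant filter becomes all zeros.
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    pixels = bytes(int(round((v - lo) * scale)) for r in rows for v in r)

    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"step_{step:06d}.pgm")
    with open(path, "wb") as fh:
        # Binary PGM header: magic number, width, height, max gray value.
        fh.write(f"P5 {len(rows[0])} {len(rows)} 255\n".encode("ascii"))
        fh.write(pixels)
    return path
```

Calling it every N steps leaves a zero-padded, sortable frame sequence that a tool such as ffmpeg can assemble into a video.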
I am developing an Android app that needs to recognize captchas from a website.
I use tess-two for recognition and followed the TrainingTesseract3 instructions to train my own traineddata (using jTessBoxEditor to correct characters), but it recognizes the captchas incorrectly, or sometimes not at all.
Below is the TIFF image I used to train Tesseract; I collected many captchas and merged them into one image.
TIFF image
The image that I want to recognize
For example, the expected result of the above image should be k8666, but the actual result is only 66.
Can anyone help? Thanks.
I tried your images using a .NET wrapper for tesseract-ocr ("Tesseract-ocr .Net Wrapper" by Charliesw).
I got somewhat better results, such as K8EEE and K8656. I think you need to increase the font size and make the text bold. I also saved the image in TIFF format at 96 DPI; doing the same may give you better results than mine.