How to make up an FSNS dataset with my own images for the attention OCR TensorFlow model

I want to apply attention-ocr to detect all digits on the number plates of cars.
I've read the README.md of attention_ocr on GitHub (https://github.com/tensorflow/models/tree/master/research/attention_ocr), and also this Stack Overflow answer on how to use my own image data to train the model (https://stackoverflow.com/a/44461910/743658).
However, I couldn't find any information on how to store the annotation or label of a picture, or the format required for this problem.
For the object detection model, I was able to make my dataset by annotating with LabelImg, converting the result into a .csv file, and finally producing a .tfrecord file.
I want to make a .tfrecord file in the FSNS dataset format.
Can you give me your advice on how to proceed with these training steps?

Please reread the mentioned answer: it has a section explaining how to store the annotation. It is stored in the three features image/text, image/class and image/unpadded_class. The image/text field is used for visualization; some models support unpadded sequences and use image/unpadded_class, while the default version relies on the text, padded with null characters to a fixed length, stored in the feature image/class. Here is the excerpt that stores the text annotation:
char_ids_padded, char_ids_unpadded = encode_utf8_string(
    text, charset, length, null_char_id)
example = tf.train.Example(features=tf.train.Features(
    feature={
        'image/class': _int64_feature(char_ids_padded),
        'image/unpadded_class': _int64_feature(char_ids_unpadded),
        'image/text': _bytes_feature(text),
        ...
    }
))
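The helper functions _int64_feature and _bytes_feature are not part of the excerpt; the following is a minimal sketch of the usual definitions (assuming text is already UTF-8 encoded bytes):

import tensorflow as tf

def _int64_feature(values):
    # Wraps a list of integer character ids.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def _bytes_feature(value):
    # Wraps a single bytes value, e.g. the UTF-8 encoded transcription.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))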

If you have worked with TensorFlow object detection, then the approach should be much easier for you.
You can create the annotation file (in .csv format) using labelImg or any other annotation tool.
However, before converting it into the TensorFlow format (.tfrecord), you should keep in mind the annotation format (the FSNS format in this case).
The format is: files text xmin ymin xmax ymax
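For instance, a single row might look like this (the values are hypothetical, and the exact column layout is whatever your conversion script expects):

plate_001.png 1234 45 120 210 160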
So while annotating, don't bother much about the class (as you would have done in object detection; some random name should suffice).
Convert it into .tfrecords.
And finally, the labelMap is the list of characters which you have annotated.
Hope it helps!


How to load a downloaded TFRecord dataset into TensorFlow?

I am quite new to TensorFlow, and have never worked with TFRecords before.
I have downloaded a dataset of images online, and the download format was TFRecord.
This is the file structure in the downloaded dataset (folder screenshots omitted): top-level training, validation and test folders, and inside each of them (e.g. inside "test") a .tfrecord file and a .pbtxt label map.
What I want to do is load the training, validation and testing data into TensorFlow in a similar way to what happens when you load a built-in dataset. E.g. you might load the MNIST dataset like this, and get arrays containing pixel data and arrays containing the corresponding image labels:
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
However, I have no idea how to do so.
I know that I can use dataset = tf.data.TFRecordDataset(filename) somehow to open the dataset, but would this act on the entire dataset folder, one of the subfolders, or the actual files? If it is the actual files, would it be the .tfrecord file? And how do I use, and what do I do with, the .pbtxt file which contains a label map?
And even after opening the dataset, how can I extract the data and create the necessary arrays which I can then feed into a TensorFlow model?
It's mostly archaeology, plus a few tricks.
First, I'd read the README.dataset and README.roboflow files. Can you show us what's in them?
Second, .pbtxt files are text-formatted, so we may be able to understand what that file is if you just open it with a text editor. Can you show us what's in that?
Third, the TFRecord files. The thing to remember about a TFRecord file is that it's nothing but a sequence of binary records. tf.data.TFRecordDataset('balls.tfrecord') will give you a dataset that yields those records in order.
This third part is the hard one, because here you'll have binary blobs of data, but we don't have any clues yet about how they're encoded.
It's common for TFRecord files to contain serialized tf.train.Example protos.
So it would be worth a shot to try and decode a record as a tf.train.Example to see if that tells us what's inside:
for record in tf.data.TFRecordDataset('balls.tfrecord'):
    break  # grab just the first raw record

example = tf.train.Example()
example.ParseFromString(record.numpy())
print(example)
The Example object is just a representation of a dict. If you get something other than an error there, look for the dict keys and see if you can make sense of them.
Then to make a dataset that decodes them you'll want something like:

def decode(record):
    # key_dtypes maps each feature key to its dtype, based on what the printed Example showed.
    return tf.io.parse_single_example(
        record, {key: tf.io.RaggedFeature(dtype) for key, dtype in key_dtypes.items()})

ds = ds.map(decode)
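As a purely hypothetical follow-up, suppose the printed Example turned out to contain an encoded JPEG under 'image/encoded' and integer labels under 'image/object/class/label' (these key names are assumptions, not something we know about this dataset yet); the decode step could then look like:

import tensorflow as tf

# Assumed feature keys -- replace with whatever the printed Example actually showed.
features = {
    'image/encoded': tf.io.FixedLenFeature([], tf.string),
    'image/object/class/label': tf.io.VarLenFeature(tf.int64),
}

def decode(record):
    parsed = tf.io.parse_single_example(record, features)
    image = tf.io.decode_jpeg(parsed['image/encoded'])               # pixel array
    labels = tf.sparse.to_dense(parsed['image/object/class/label'])  # dense label ids
    return image, labels

ds = tf.data.TFRecordDataset('balls.tfrecord').map(decode)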

Convert a .npy file to .wav following Tacotron2 training

I am training the Tacotron2 model using TensorFlowTTS for a new language.
I managed to train the model (performed pre-processing and normalization, and decoded the few generated output files).
The files in the output directory are .npy files, which makes sense as they are mel-spectrograms.
I am trying to find a way to convert said files to .wav files in order to check whether my work has been fruitful.
I used this:

melspectrogram = librosa.feature.melspectrogram(
    "/content/prediction/tacotron2-0/paol_wavpaol_8-norm-feats.npy", sr=22050,
    window=scipy.signal.hanning, n_fft=1024, hop_length=256)
print('melspectrogram.shape', melspectrogram.shape)
print(melspectrogram)

audio_signal = librosa.feature.inverse.mel_to_audio(
    melspectrogram, sr=22050, n_fft=1024, hop_length=256, window=scipy.signal.hanning)
print(audio_signal, audio_signal.shape)

sf.write('test.wav', audio_signal, sample_rate)
But it gives me this error: Audio data must be of type numpy.ndarray.
Although I am already giving it a numpy file.
Does anyone know where the issue might be, and does anyone know a better way to do this?
I'm not sure what your error is, but the output of a Tacotron 2 system is log Mel spectral features, and you can't just apply the inverse Fourier transform to get a waveform, because you are missing the phase information and because the features are not directly invertible. You can learn about why this is at places like Speech.Zone (https://speech.zone/courses/).
Instead of using librosa as you are doing, you need to use a vocoder like HiFi-GAN (https://github.com/jik876/hifi-gan) that is trained to reconstruct a waveform from log Mel spectral features. You can use a pre-trained model, and most off-the-shelf vocoders will do, but make sure that the sample rate, Mel range, FFT size, hop size and window size all match between your Tacotron2 feature prediction network and whatever vocoder you choose, otherwise you'll just get noise!
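As a sketch of that workflow using TensorFlowTTS itself (the pretrained model name is an assumption, and the path is taken from the question; swap in whichever vocoder matches your training configuration):

import numpy as np
import soundfile as sf
from tensorflow_tts.inference import TFAutoModel

# Load the predicted mel spectrogram. Note: if these are normalized features
# (-norm-feats), the vocoder must have been trained on similarly normalized features.
mel = np.load("/content/prediction/tacotron2-0/paol_wavpaol_8-norm-feats.npy")

# A pretrained MelGAN vocoder from the TensorFlowTTS model hub (assumed name);
# its sample rate, Mel range, FFT, hop and window sizes must match your Tacotron2.
vocoder = TFAutoModel.from_pretrained("tensorspeech/tts-melgan-ljspeech-en")

# inference expects a batch dimension: (1, time, n_mels).
audio = vocoder.inference(mel[np.newaxis, ...])[0, :, 0]
sf.write("test.wav", audio.numpy(), 22050)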

Conversion of image annotation formats into TFRecords for the TensorFlow object detection API

Seeking help regarding image annotation formats for the object detection API.
Foreknow:
As we know, there are two common annotation formats for images, Pascal VOC and COCO. Both have their own specification; here's the main difference between the two:
Pascal VOC:
Stores annotations in .xml file format.
Bounding box format: [x-top-left, y-top-left, x-bottom-right, y-bottom-right].
Creates a separate .xml annotation file for each image in the dataset.
COCO:
Stores annotations in .json file format.
Bounding box format: [x-top-left, y-top-left, width, height].
Creates one annotation file each for training, testing and validation.
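Since the two box conventions differ only in the last two values, converting between them is a one-liner; a quick sketch (the helper name is mine, not from either specification):

def coco_to_voc(x, y, width, height):
    # COCO stores [x-top-left, y-top-left, width, height];
    # Pascal VOC stores [xmin, ymin, xmax, ymax].
    return [x, y, x + width, y + height]

print(coco_to_voc(45, 120, 165, 40))  # -> [45, 120, 210, 160]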
Current-issue:
I have two datasets to deal with, and this is how they are annotated.
Dataset-1:
File format: Pascal VOC (.xml)
Bounding box format: COCO.
File creation: As in Pascal VOC (a separate .xml annotation file for each image in the dataset).
Dataset-2:
File format: Pascal VOC (.xml)
Bounding box format: COCO.
File creation: As in COCO (one annotation file each for training, testing and validation).
The thing that I am not able to get past is which format (Pascal VOC or COCO) I should follow to convert my annotations into TFRecords (.xml to .records), as you can see the annotations of the datasets don't belong purely to either format.
For instance, in this link the author wrote a script to convert .xml into .records, but there it is dealing with the pure Pascal VOC format.
And in this link they are dealing with the pure COCO annotation format.
Which path should I follow as I am standing in the middle of both formats?
Use the Pascal VOC format for conversion of .xml into .records.
Make the following changes in the create_tf_example function of this link:
for index, row in group.TextLine.iterrows():
    xmin.append(row['X'] / imgwidth)
    xmax.append((row['X'] + row['Width']) / imgwidth)
    ymin.append(row['Y'] / imgheight)
    ymax.append((row['Y'] + row['Height']) / imgheight)
    classes_text.append(row['class'].encode('utf8'))
    classes.append(class_text_to_int(row['class']))
These changes apply in case your .xml annotations have X, Y, Width, Height instead of xmin, ymin, xmax, ymax.

Text recognition with tensorflow

I'm new to TensorFlow and played around with the handwritten-numbers MNIST set.
I'd like to do my own project that recognises text instead of numbers, but can't find a good tutorial.
Is it the same principle as with numbers, except that instead of 10 output classes at the end I have to use 26? Or should I include upper case, lower case and special characters?
If so, I'd have to first crop the words into individual characters, right? Or is there a way to recognise entire sentences?
I'd like to train on three different fonts, so no handwriting, and I don't care about upper or lower case.
Later I'd like to use the trained model on photographs, a printed article for example. Does the model work if I align the image, do I have to retrain it a little, or do I have to train it from scratch with the new data?
Where do I start? The Keras example is overwhelming.
You're looking for an OCR model. A simple CNN can't detect text from scanned images; you need to segment the text first, which can be done based on the language script.
You can start with tesseract. There is a Python wrapper named pytesseract.
import pytesseract
from PIL import Image

# --psm 10 treats the image as a single character; the whitelist restricts
# output to digits (drop it to recognise letters as well).
text = pytesseract.image_to_string(Image.open("temp.jpg"), lang='eng',
                                   config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
print(text)
For your own model, try CRNN models. https://github.com/qjadud1994/CRNN-Keras
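To give a feel for the CRNN idea (convolutional features read as a sequence, then a recurrent layer trained with a CTC loss), here is a minimal Keras sketch; the shapes and layer sizes are illustrative assumptions, not the linked repo's exact model:

import tensorflow as tf
from tensorflow.keras import layers

num_classes = 26 + 1                                   # a-z plus the CTC blank label

inputs = tf.keras.Input(shape=(32, 128, 1))            # grayscale word image (H, W, C)
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(2)(x)                          # -> (16, 64, 32)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(2)(x)                          # -> (8, 32, 64)

# Make width the time axis: each of the 32 columns becomes one timestep.
x = layers.Permute((2, 1, 3))(x)                       # -> (32, 8, 64)
x = layers.Reshape((32, 8 * 64))(x)                    # -> (32, 512)

x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
outputs = layers.Dense(num_classes, activation="softmax")(x)  # per-timestep character distribution

model = tf.keras.Model(inputs, outputs)
model.summary()
# Train with a CTC loss (e.g. tf.nn.ctc_loss), so no per-character cropping is needed.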

What is the best way to feed an image+vector dataset to TensorFlow

I am trying to do a deep learning project using TensorFlow.
Each of my data samples consists of 2 files (a PNG image file + a TXT vector file), which are put in different folders as follows:

./data/image/   # Folder containing images of different sizes
./data/vector/  # Folder containing the vector for each corresponding image
                # For example: apple.png + apple.txt

The example content of a vector file is as follows:
10.0,2.5,5,13
And since the image sizes differ, resizing, and some transformations applied to the vectors, are required. It is important that I can do this processing while TensorFlow is running. Is there a good way to manage this kind of dataset?
I have referred to a lot of basic tutorials, however most of them give very few details about arranging customized data input and output. Please give me some advice!
I recommend you take a look at TFRecords and queues. Basically the idea is the following: you resize all your images to the same size and store them together with your txt vectors in one TFRecord file. This is done separately, before you run your model.
When you create your model, you create a queue which reads data from the TFRecord file and feeds it to your model.
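A minimal sketch of that idea, using the modern tf.data API in place of queues (the paths, feature names and the 64x64 target size are assumptions; the 4-element vector matches the example above):

import numpy as np
import tensorflow as tf

def make_example(png_path, txt_path):
    # Decode, resize to a common size, and re-encode the image.
    image = tf.io.decode_png(tf.io.read_file(png_path))
    image = tf.cast(tf.image.resize(image, (64, 64)), tf.uint8)
    vector = np.loadtxt(txt_path, delimiter=',').astype(np.float32)
    return tf.train.Example(features=tf.train.Features(feature={
        'image': tf.train.Feature(bytes_list=tf.train.BytesList(
            value=[tf.io.encode_png(image).numpy()])),
        'vector': tf.train.Feature(float_list=tf.train.FloatList(value=vector)),
    }))

# Write one record per (image, vector) pair -- done once, before training.
with tf.io.TFRecordWriter('data.tfrecord') as writer:
    writer.write(make_example('./data/image/apple.png',
                              './data/vector/apple.txt').SerializeToString())

def parse(record):
    parsed = tf.io.parse_single_example(record, {
        'image': tf.io.FixedLenFeature([], tf.string),
        'vector': tf.io.FixedLenFeature([4], tf.float32),
    })
    return tf.io.decode_png(parsed['image']), parsed['vector']

# At run time, the dataset feeds decoded (image, vector) pairs to the model.
ds = tf.data.TFRecordDataset('data.tfrecord').map(parse)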