What is the best way to feed an image+vector dataset to TensorFlow?

I am working on a deep learning project with TensorFlow.
Each sample in my dataset consists of two files (a PNG image file and a TXT vector file), which are stored in separate folders as follows:
./data/image/ # folder containing images of different sizes
./data/vector/ # folder containing the vector for each corresponding image
# For example: apple.png + apple.txt
An example of a vector file's content:
10.0,2.5,5,13
Since the images have different sizes, they need to be resized, and some transformations need to be applied to the vectors as well. It is important that I can do this processing while TensorFlow is running. Is there a good way to manage this kind of dataset?
I have gone through a lot of basic tutorials, but most of them give few details on arranging customized data input and output. Please give me some advice!

I recommend taking a look at TFRecords and queues. The basic idea is the following: you resize all your images to the same size and store them together with your txt vectors in one TFRecord file. This is done separately, before you run your model.
When you create your model, you create a queue which reads data from the TFRecord file and feeds it to your model.
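The workflow described above could be sketched roughly as follows. This is a minimal illustration, not the answerer's code: it assumes TensorFlow 2.x, uses tf.data.TFRecordDataset in place of a queue-based reader, and the feature keys "image" and "vector", the 32x32 target size, and the helper names are all hypothetical. The synthetic image stands in for a decoded apple.png, and the vector transformation mentioned in the question is left out because its meaning is not specified.

```python
import os
import tempfile

import tensorflow as tf


def make_example(image_uint8, vector):
    """Pack one resized image and its float vector into a tf.train.Example."""
    png_bytes = tf.io.encode_png(image_uint8).numpy()
    feature = {
        # "image" and "vector" are hypothetical key names chosen for this sketch.
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[png_bytes])),
        "vector": tf.train.Feature(float_list=tf.train.FloatList(value=vector)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def parse_example(serialized):
    """Decode one serialized record back into (image, vector) tensors."""
    spec = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "vector": tf.io.FixedLenFeature([4], tf.float32),  # 4 values, as in the sample line
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    image = tf.io.decode_png(parsed["image"], channels=3)
    return image, parsed["vector"]


# Stand-in for one image/vector pair; a real script would read the files from disk.
raw_image = tf.zeros([50, 80, 3], dtype=tf.uint8)
vector = [float(v) for v in "10.0,2.5,5,13".split(",")]
resized = tf.cast(tf.image.resize(raw_image, [32, 32]), tf.uint8)

# Step 1 (offline): write the preprocessed pair to a TFRecord file.
path = os.path.join(tempfile.mkdtemp(), "data.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(make_example(resized, vector).SerializeToString())

# Step 2 (training time): stream the records back and batch them.
dataset = tf.data.TFRecordDataset(path).map(parse_example).batch(1)
images, vectors = next(iter(dataset))
print(images.shape, vectors.numpy())
```

Because all images are resized before writing, every record has the same shape and batching works without padding.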

Related

How to load in a downloaded tfrecord dataset into TensorFlow?

I am quite new to TensorFlow, and have never worked with TFRecords before.
I have downloaded a dataset of images from online and the download format was TFRecord.
This is the file structure in the downloaded dataset:
(Two screenshots of the folder structure, including the contents of the "test" folder, are not preserved.)
What I want to do is load in the training, validation and testing data into TensorFlow in a similar way to what happens when you load a built-in dataset, e.g. you might load in the MNIST dataset like this, and get arrays containing pixel data and arrays containing the corresponding image labels.
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
However, I have no idea how to do so.
I know that I can use dataset = tf.data.TFRecordDataset(filename) somehow to open the dataset, but would this act on the entire dataset folder, one of the subfolders, or the actual files? If it is the actual files, would it be on the .TFRecord file? How do I use/what do I do with the .PBTXT file which contains a label map?
And even after opening the dataset, how can I extract the data and create the necessary arrays which I can then feed into a TensorFlow model?
It's mostly archaeology, plus a few tricks.
First, I'd read the README.dataset and README.roboflow files. Can you show us what's in them?
Second, pbtxt files are text formatted, so we may be able to understand what that file is if you just open it with a text editor. Can you show us what's in it?
Third, the thing to remember about a TFRecord file is that it's nothing but a sequence of binary records. tf.data.TFRecordDataset('balls.tfrecord') will give you a dataset that yields those records in order.
This third step is the hard part, because here you'll have binary blobs of data, but we don't have any clues yet about how they're encoded.
It's common for TFRecord files to contain serialized tf.train.Example protos.
So it would be worth a shot to try to decode a record as a tf.train.Example to see if that tells us what's inside.
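    import tensorflow as tf

    # Grab the first raw record from the file.
    for record in tf.data.TFRecordDataset('balls.tfrecord'):
        break
    # Try to interpret it as a serialized tf.train.Example.
    example = tf.train.Example()
    example.ParseFromString(record.numpy())
    print(example)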
The Example object is just a representation of a dict. If you get something other than an error there, look for the dict keys and see if you can make sense of them.
Then to make a dataset that decodes them you'll want something like:
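    # key_dtypes is a dict you build from the keys printed above, mapping each key to its dtype.
    def decode(record):
        # Parses one serialized record at a time, as yielded by the unbatched dataset.
        return tf.io.parse_single_example(record, {key: tf.io.RaggedFeature(dtype) for key, dtype in key_dtypes.items()})
    ds = tf.data.TFRecordDataset('balls.tfrecord')
    ds = ds.map(decode)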
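To make the "sequence of binary records" point concrete, here is a small pure-Python sketch of the TFRecord framing: each record is stored as an 8-byte little-endian length, a 4-byte checksum of the length, the payload, and a 4-byte checksum of the payload. The function name is hypothetical, and the sketch deliberately skips checksum verification, so it is only meant for inspecting a file, not as a replacement for tf.data.TFRecordDataset. The demo file uses placeholder checksums and would likely be rejected by TensorFlow itself.

```python
import os
import struct
import tempfile


def read_tfrecord_payloads(path):
    """Yield the raw payload bytes of each record (checksums are NOT verified)."""
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # uint64 length + uint32 checksum of the length
            if len(header) < 12:
                break  # end of file
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            f.read(4)  # uint32 checksum of the payload, ignored in this sketch
            yield payload


# Demo on a hand-built file with placeholder (all-zero) checksums.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")
with open(demo_path, "wb") as f:
    for payload in (b"first", b"second record"):
        f.write(struct.pack("<Q", len(payload)))  # record length
        f.write(b"\x00" * 4)                      # placeholder length checksum
        f.write(payload)                          # record bytes
        f.write(b"\x00" * 4)                      # placeholder payload checksum

payloads = list(read_tfrecord_payloads(demo_path))
print(payloads)
```

Each payload is an opaque blob at this level; whether it is a serialized tf.train.Example is a separate question, which is why the decoding attempt above is needed.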

Convert a .npy file to wav following tacotron2 training

I am training the Tacotron2 model using TensorFlowTTS for a new language.
I managed to train the model (I performed pre-processing and normalization, and decoded the few generated output files).
The files in the output directory are .npy files, which makes sense as they are mel-spectrograms.
I am trying to find a way to convert these files to .wav files in order to check whether my work has been fruitful.
I used this:
melspectrogram = librosa.feature.melspectrogram(
"/content/prediction/tacotron2-0/paol_wavpaol_8-norm-feats.npy", sr=22050,
window=scipy.signal.hanning, n_fft=1024, hop_length=256)
print('melspectrogram.shape', melspectrogram.shape)
print(melspectrogram)
audio_signal = librosa.feature.inverse.mel_to_audio(
melspectrogram, sr22050, n_fft=1024, hop_length=256, window=scipy.signal.hanning)
print(audio_signal, audio_signal.shape)
sf.write('test.wav', audio_signal, sample_rate)
But it gives me this error: Audio data must be of type numpy.ndarray.
This happens although I am already giving it a numpy.ndarray file.
Does anyone know where the issue might be, or know a better way to do this?
I'm not sure what your error is, but the output of a Tacotron 2 system is log Mel spectral features, and you can't just apply the inverse Fourier transform to get a waveform, because you are missing the phase information and because the features are not invertible. You can learn about why this is at places like Speech.Zone (https://speech.zone/courses/).
Instead of using librosa like you are doing, you need to use a vocoder like HiFi-GAN (https://github.com/jik876/hifi-gan) that is trained to reconstruct a waveform from log Mel spectral features. You can use a pre-trained model, and most off-the-shelf vocoders will do, but make sure that the sample rate, Mel range, FFT size, hop size and window size are all the same between your Tacotron2 feature prediction network and whatever vocoder you choose, otherwise you'll just get noise!
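As a side note on the error itself: the message is consistent with a file path string being passed where librosa expects an array, since the snippet in the question hands the .npy path directly to librosa.feature.melspectrogram. A minimal sketch of loading the features first is shown below. The helper name, the demo file, and its (120, 80) shape are hypothetical stand-ins, not values from the question. Note also that the file already holds Mel features, so it should not be passed through melspectrogram again.

```python
import os
import tempfile

import numpy as np


def load_mel_features(path):
    """Load predicted Mel features from a .npy file and check they are an array."""
    mel = np.load(path)
    if not isinstance(mel, np.ndarray):
        raise TypeError("expected a numpy.ndarray, got %r" % type(mel))
    return mel


# Hypothetical stand-in for a "...-norm-feats.npy" file: 120 frames x 80 Mel bins.
demo_path = os.path.join(tempfile.mkdtemp(), "demo-norm-feats.npy")
np.save(demo_path, np.zeros((120, 80), dtype=np.float32))

mel = load_mel_features(demo_path)
print(mel.shape, mel.dtype)  # inspect before handing the array to a vocoder
```

Checking the shape and value range at this point also helps confirm that the features match what the chosen vocoder was trained on.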

What is the difference between TFRecordDataset and FixedLengthRecordDataset?

It would be great to get a use case, possibly from a project, with an explanation of how each is used. Thanks in advance.
TFRecordDataset, FixedLengthRecordDataset and TextLineDataset are all subclasses of Dataset.
Dataset is a base class containing methods to create and transform datasets. It also allows you to initialize a dataset from data in memory, or from a Python generator.
Since release 1.4, Datasets are the new way to create input pipelines for TensorFlow models. This API is much more performant than using feed_dict or the queue-based pipelines, and it's cleaner and easier to use.
As a use case, you can think of pre-processing data to feed it into a model for training (the examples in the links below are pretty self-explanatory).
TFRecordDataset: Reads records from TFRecord files (Example 1, Example 2).
    # Python
    dataset = tf.data.TFRecordDataset("/path/to/file.tfrecord")
FixedLengthRecordDataset: Reads fixed size records from binary files (Example).
    # Python
    images = tf.data.FixedLengthRecordDataset(
        images_file, 28 * 28, header_bytes=16).map(decode_image)
TextLineDataset: Reads lines from text files.
See this documentation (TextLineDataset example included)
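The practical difference can be illustrated without TensorFlow. In a fixed-length file, every record has the same byte size, so the reader only needs a header size and a record size, as in the 28 * 28 example above. In a TFRecord file, each record carries its own length, so records may vary in size. The sketch below mimics the fixed-length case in plain Python. The function name and the fake file contents are hypothetical, and raising an error on a partial trailing record is a choice made for this sketch.

```python
def split_fixed_length_records(data, record_bytes, header_bytes=0, footer_bytes=0):
    """Split raw file bytes into equal-sized records, skipping header and footer."""
    body = data[header_bytes:len(data) - footer_bytes]
    if len(body) % record_bytes != 0:
        raise ValueError("body length is not a multiple of record_bytes")
    return [body[i:i + record_bytes] for i in range(0, len(body), record_bytes)]


# Fake file: a 16-byte header followed by two 28x28 single-byte-per-pixel images.
fake_file = bytes(16) + bytes([1]) * (28 * 28) + bytes([2]) * (28 * 28)
records = split_fixed_length_records(fake_file, 28 * 28, header_bytes=16)
print(len(records), len(records[0]))
```

A rule of thumb that follows from this: FixedLengthRecordDataset suits existing binary formats with uniform records, while TFRecordDataset suits data you serialize yourself, where record sizes differ.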

Using a subset of tfrecord

Is it possible to use an existing tfrecord with only one label, or a subset of the labels, that were used to generate it?
I'm training several models with the same data, and each requires only one label or a subset of the labels originally used to create the tfrecord. The tfrecord is quite large, so I want to avoid creating a separate one for each model's subset of labels.
tf.data.Datasets have filter, skip and take methods which you may find useful. Alternatively you could split your original dataset across multiple tfrecord files and create a Dataset based on a subset of those files.
If you are happy to recreate the data using tensorflow_datasets, splits may also give you what you want.
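A minimal sketch of the filter approach is given below, assuming TensorFlow 2.x and assuming that "subset of labels" means keeping only the examples whose label is in a chosen set. The in-memory dataset stands in for a parsed TFRecordDataset, and the feature names "x" and "label" are hypothetical.

```python
import tensorflow as tf

# Stand-in for a parsed TFRecord dataset: five examples with integer labels.
features = tf.constant([[0.0], [1.0], [2.0], [3.0], [4.0]])
labels = tf.constant([0, 1, 2, 1, 0], dtype=tf.int64)
ds = tf.data.Dataset.from_tensor_slices({"x": features, "label": labels})

# Labels that this particular model should be trained on.
wanted = tf.constant([1, 2], dtype=tf.int64)


def keep(example):
    """Return True when the example's label is one of the wanted labels."""
    return tf.reduce_any(tf.equal(example["label"], wanted))


subset = ds.filter(keep)
kept_labels = [int(example["label"]) for example in subset]
print(kept_labels)
```

The filtering happens while the data is streamed, so the original file stays untouched and each model can apply its own predicate.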

How to load images for classification problem using Keras

I am working on an image classification problem using the Keras framework. This is a binary classification problem, and I have 2 folders, a training set and a test set, each of which contains images of both classes. I don't have a separate folder for each class (say cat vs. dog). Keras ImageDataGenerator works when we have separate folders for each class (a cat folder and a dog folder), but I have all the images in a single training set folder and I don't understand how to proceed. Kindly suggest how to load the images.
I also have 2 CSV files: train.csv and test.csv. train.csv contains 2 columns, namely image_id and class_name. test.csv contains image_id. Note that image_id matches the names of the files in the image folders.
The latest versions of ImageDataGenerator have a method called flow_from_dataframe, which does exactly what you want.
Basically, you first load your CSV file into a pandas DataFrame, instantiate an ImageDataGenerator, and then call flow_from_dataframe with three important parameters:
directory: Folder where your data lives.
x_col: Column in the DataFrame that contains the filenames inside the directory that correspond to your training/testing data.
y_col: Column in the DataFrame corresponding to the labels that will be output by the generator.
Then you use this generator like any other, by calling fit_generator (in newer TensorFlow releases fit_generator is deprecated and fit accepts generators directly). More information and examples are available here.