How to get a single weights file from the TensorFlow.js converter

I'm using this command:
tfjs.converters.save_keras_model(model,'jsmodels')
but I get a model.json and 3 weight files:
group1-shard1of3.bin
group1-shard2of3.bin
group1-shard3of3.bin
I want only one .bin file. How can I do that?

I am not sure this is possible using save_keras_model, but from the command line with tensorflowjs_converter I would do the following, specifying --weight_shard_size_bytes to be at least the size of your model. If your model is <= 30 MB, then setting it to 30000000 bytes will result in a single file, group1-shard1of1.bin.
tensorflowjs_converter --input_format keras --weight_shard_size_bytes 30000000 'model.h5' 'output_dir'

The reason is that your model is larger than the default shard size of 4 MB, so it is broken into multiple chunks of at most 4 MB each. Set the weight_shard_size_bytes argument of tfjs.converters.save_keras_model(model, name_of_folder) to a size larger than your model, and you will get just one group1-shard1of1.bin file.
tfjs.converters.save_keras_model(model, name_of_folder, weight_shard_size_bytes=1024 * 1024 * N)  # where N MB is larger than the model size
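You can also size the shard from the model itself. A minimal sketch, assuming model is your in-memory Keras model and that your tensorflowjs version exposes weight_shard_size_bytes on save_keras_model as described above:

import tensorflowjs as tfjs

# Total size of the weights in bytes (nbytes accounts for the dtype),
# plus one so a model of exactly the shard size still fits in one shard.
weight_bytes = sum(w.nbytes for w in model.get_weights()) + 1
tfjs.converters.save_keras_model(model, 'jsmodels',
                                 weight_shard_size_bytes=weight_bytes)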

Related

Convert a .npy file to .wav following Tacotron2 training

I am training the Tacotron2 model using TensorflowTTS for a new language.
I managed to train the model (performed pre-processing and normalization, and decoded the few generated output files).
The files in the output directory are .npy files, which makes sense as they are mel-spectrograms.
I am trying to find a way to convert these files to .wav in order to check whether my work has been fruitful.
I used this:
melspectrogram = librosa.feature.melspectrogram(
    "/content/prediction/tacotron2-0/paol_wavpaol_8-norm-feats.npy", sr=22050,
    window=scipy.signal.hanning, n_fft=1024, hop_length=256)
print('melspectrogram.shape', melspectrogram.shape)
print(melspectrogram)
audio_signal = librosa.feature.inverse.mel_to_audio(
    melspectrogram, sr=22050, n_fft=1024, hop_length=256,
    window=scipy.signal.hanning)
print(audio_signal, audio_signal.shape)
sf.write('test.wav', audio_signal, sample_rate)
But it gives me this error: Audio data must be of type numpy.ndarray.
Although I am already giving it a numpy.ndarray file.
Does anyone know where the issue might be, or of a better way to do this?
I'm not sure what your error is (though note that librosa.feature.melspectrogram expects an audio array, not a file path, so a .npy file would first need to be read with np.load), but the output of a Tacotron 2 system is log-Mel spectral features, and you can't just apply the inverse Fourier transform to get a waveform, because you are missing the phase information and because the features are not invertible. You can learn about why this is at places like Speech.Zone (https://speech.zone/courses/).
Instead of using librosa as you are doing, you need to use a vocoder like HiFi-GAN (https://github.com/jik876/hifi-gan) that is trained to reconstruct a waveform from log-Mel spectral features. You can use a pre-trained model and most off-the-shelf vocoders, but make sure that the sample rate, Mel range, FFT size, hop size, and window size are all the same between your Tacotron2 feature-prediction network and whatever vocoder you choose, otherwise you'll just get noise!
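For reference, a minimal sketch of loading the predicted features before handing them to a vocoder (the path is the one from the question; the (num_frames, num_mels) layout is an assumption about what TensorflowTTS writes):

import numpy as np

# .npy files must be read with np.load; they cannot be passed to
# librosa as a path string.
mel = np.load("/content/prediction/tacotron2-0/paol_wavpaol_8-norm-feats.npy")
print(mel.shape, mel.dtype)  # expect something like (num_frames, 80), float32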

Error: in the file data/coco.names number of names 80 that isn't equal to classes=13

I was using Google Colab to train YOLOv3 to detect custom objects. I'm new to Colab and darknet.
I used the following command for training:
!./darknet detector train "/content/gdrive/My Drive/darknet/obj.data" "/content/gdrive/My Drive/darknet/cfg/yolov3-PID.cfg" "/content/gdrive/My Drive/darknet/backup/yolov3-PID_final.weights" -dont_show
The training finished as follows, without displaying any details of the epochs (I don't know how many epochs actually ran). It actually took a very short time until it displayed Done! and saved the weights.
Then, I tried to detect a test image with the following command:
!./darknet detect "/content/gdrive/My Drive/darknet/cfg/yolov3-PID.cfg" "/content/gdrive/My Drive/darknet/backup/yolov3-PID_final.weights" "/content/gdrive/My Drive/darknet/img/MN 111-0-515 (45).jpg" -dont_show
However, I got the following error:
Error: in the file data/coco.names number of names 80 that isn't equal to classes=13 in the file /content/gdrive/My Drive/darknet/cfg/yolov3-PID.cfg
Also, the resulting image didn't contain any bounding boxes, so I don't know if the training worked or not.
Could you please advise what might be wrong with the training, and why the error refers to coco.names while I'm using other files for the names and configuration?
You did not share yolov3-PID.cfg, obj.data, or coco.names, so I am assuming coco.names contains the 80 classes as in the repo.
The error is likely in obj.data: it seems your goal is to detect 13 custom objects. If so, set classes=13 and replace names=data/coco.names with names=data/obj.names, where obj.names contains 13 lines, one custom class name per line. Also modify yolov3-PID.cfg to use the same number of classes.
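For illustration, a typical obj.data for 13 classes might look like the following (the train/valid paths are assumptions; use your own). Remember that in the .cfg every [yolo] layer needs classes=13 and the [convolutional] layer just above it needs filters=(classes + 5) * 3 = 54:

classes = 13
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = backup/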
I suggest using this repo below if you are not already; it contains Google Colab training and inference scripts for YOLOv3 and YOLOv4.
Here are the instructions for custom object detection training.
Nice work coming this far!!! Everything is fine; you just need to edit the data folder of darknet. By default it uses the COCO labels: go to the darknet folder, find the data folder, open the coco.names file, and edit it by removing the 80 classes (in Colab, just double-click to edit and Ctrl+S to save). Put down your desired classes and it's done!!!
I was having the same problem when training a custom model in Colab.
I just cloned darknet again in another folder, edited coco.names, and moved it to my training folder, and it worked!

Multiple ckpt/meta files into a single .pb file?

In TensorFlow, training produced the following files:
checkpoint
model.ckpt-10000.meta
model.ckpt-10000.data-00000-of-00001
model.ckpt-10000.index
model.ckpt-11000.meta
model.ckpt-11000.data-00000-of-00001
model.ckpt-11000.index
model.ckpt-12000.meta
model.ckpt-12000.data-00000-of-00001
model.ckpt-12000.index
model.ckpt-8000.meta
model.ckpt-8000.data-00000-of-00001
model.ckpt-8000.index
model.ckpt-9000.meta
model.ckpt-9000.data-00000-of-00001
model.ckpt-9000.index
I am interested in creating a .pb file from the output generated by training; however, from the examples I have seen, this requires a single set of intermediate output files. How do I merge all the sets of output files into a single .pb?
What you are trying to do does not make sense (at least to me). I recommend you read about these checkpoint files here and here first.
In short, the checkpoint file just tells you which model is the latest. The .meta file stores info about your graph structure, .data stores the values of the variables, and .index stores key/value pairs recording where the value of each parameter can be found in the .data files.
All your files look like model.ckpt-xxxx, where xxxx is the step number, so what you have are snapshots of training at different steps. That is why it does not make sense to combine the value of a variable at step 9000 with its value at step 11000. Also, the .meta files are probably all the same.
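If the goal is a .pb, you freeze one snapshot rather than merging them. A minimal TF1-style sketch (the output node name is a placeholder you must replace with your graph's actual output op):

import tensorflow as tf  # TF 1.x-style API

# Rebuild the graph from the latest snapshot and restore its variables.
saver = tf.train.import_meta_graph('model.ckpt-12000.meta')
with tf.Session() as sess:
    saver.restore(sess, 'model.ckpt-12000')
    # Bake the variable values into constants, keeping only the
    # subgraph needed to compute the named output.
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ['output_node_name'])
    with open('model.pb', 'wb') as f:
        f.write(frozen.SerializeToString())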

How to add speech training data to TensorFlow

I have labelled .wav files for training a convolutional neural network. They are for Bengali phones, for which no standard dataset is available. I want to input these .wav files to TensorFlow to train my CNN model. Specifically, I want to create grayscale spectrograms from the .wav files as the input to the model, and I need help with how to do that. If there is more than one alternative, what are their strengths and weaknesses?
Also, the files are of variable length, e.g. some are 70 ms and some are 160 ms. Is there a way to divide them into 20 ms segments?
I have done something similar in my research. I used the Linux utility SoX to do the audio wave-file manipulation and create the spectrograms.
For the audio file length, you can use the trim option within SoX to split the file into 20 ms segments, along the lines of the following (the output name is the template for the numbered chunk files):
sox myaudio.wav chunk.wav trim 0 0.02 : newfile : restart
Using the "spectrogram" option of SOX, you can then create the spectrogram.
sox myaudio.wav -n spectrogram -m -x 256 -y 256 -o myspectrogram.png
The command will create a monochrome spectrogram of size 256x256 and store it in the file "myspectrogram.png".
In my research, I did not split the files into smaller chunks. I found that using the whole wave file of the word was sufficient to get good recognition, but it depends on what your long-term goal is.
You can also look at the ffmpeg ops in TensorFlow for loading audio files, though we don't yet have a built-in spectrogram:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/ffmpeg
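If you would rather stay in Python end to end, here is a minimal sketch with SciPy and Matplotlib (file names are placeholders; assumes a mono .wav) that computes a log-magnitude spectrogram and saves it as a grayscale image:

import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
import matplotlib.pyplot as plt

sr, audio = wavfile.read('myaudio.wav')  # mono samples, shape (n,)
f, t, Sxx = spectrogram(audio, fs=sr, nperseg=256)
# Log scale so quiet components stay visible; save as a grayscale PNG.
plt.imsave('myspectrogram.png', 10 * np.log10(Sxx + 1e-10), cmap='gray')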

Training a custom dataset with the translate model

Running the model out of the box generates these files in the data dir :
ls
dev-v2.tgz
giga-fren.release2.fixed.en
giga-fren.release2.fixed.en.gz
giga-fren.release2.fixed.en.ids40000
giga-fren.release2.fixed.fr
giga-fren.release2.fixed.fr.gz
giga-fren.release2.fixed.fr.ids40000
newstest2013.en
newstest2013.en.ids40000
newstest2013.fr
newstest2013.fr.ids40000
training-giga-fren.tar
vocab40000.from
vocab40000.to
Reading the source of translate.py:
https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/translate.py
tf.app.flags.DEFINE_string("from_train_data", None, "Training data.")
tf.app.flags.DEFINE_string("to_train_data", None, "Training data.")
To use my own training data I created the dirs my-from-train-data & my-to-train-data and added my own training data to each of these dirs; the training data is contained in the files mydata.from & mydata.to.
my-to-train-data contains mydata.from
my-from-train-data contains mydata.to
I could not find documentation on using your own training data or what format it should take, so I inferred this from the translate.py source and from the contents of the data dir created when running the translate model out of the box.
Contents of mydata.from :
Is this a question
Contents of mydata.to :
Yes!
I then attempt to train the model using :
python translate.py --from_train_data my-from-train-data --to_train_data my-to-train-data
This returns with an error:
tensorflow.python.framework.errors_impl.NotFoundError: my-from-train-data.ids40000
It appears I need to create the file my-from-train-data.ids40000. What should its contents be? Is there an example of how to train this model using custom data?
Great question! Training a model on your own data is way more fun than using the standard data. An example of what you could put in the terminal:
python translate.py --from_train_data mydatadir/to_translate.in --to_train_data mydatadir/to_translate.out --from_dev_data mydatadir/test_to_translate.in --to_dev_data mydatadir/test_to_translate.out --train_dir train_dir_model --data_dir mydatadir
What goes wrong in your example is that you are not pointing to a file but to a folder. from_train_data should always point to a plain-text file whose rows are aligned with those in the to_train_data file.
Also: as soon as you run the script with sensible data (more than one line ;) ), translate.py will generate your ids files (40,000 entries if from_vocab_size and to_vocab_size are not set). It is important to know that these files are created in the folder specified by data_dir; if you do not specify one, they are generated in /tmp (I prefer them in the same place as my data).
Hope this helps!
A quick answer to:
It appears I need to create the file my-from-train-data.ids40000. What should its contents be? Is there an example of how to train this model using custom data?
Yes, that's the vocab/word-id file that is missing; it is generated when preparing the data.
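To illustrate the format (with hypothetical ids): data_utils.py maps every token to an integer id from the vocab file, so each line of mydata.from becomes a line of space-separated ids in mydata.from.ids40000:

Is this a question    ->    61 27 8 296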
Here is a tutorial from the Tensorflow documentation.
A quick overview of the files and why you might be confused by the files outputted vs. what to use:
python/ops/seq2seq.py: >> Library for building sequence-to-sequence models.
models/rnn/translate/seq2seq_model.py: >> Neural translation sequence-to-sequence model.
models/rnn/translate/data_utils.py: >> Helper functions for preparing translation data.
models/rnn/translate/translate.py: >> Binary that trains and runs the translation model.
The Tensorflow translate.py file requires several files to be generated when using your own corpus for translation.
The data needs to be aligned, meaning: line 1 of the source-language file corresponds to line 1 of the target-language file. This is what allows the model to do the encoding and decoding.
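As a tiny hypothetical illustration of aligned files (extending the one-line example from the question):

mydata.from:
Is this a question
How are you

mydata.to:
Yes!
Fine, thanks!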
You want to make sure the vocabulary has been generated from the dataset, using the data_utils.py helper listed above:
Check these steps:
python translate.py --data_dir [your_data_directory] --train_dir [checkpoints_directory] --en_vocab_size=40000 --fr_vocab_size=40000
Note: if your vocabulary is smaller than 40000, lower these values accordingly.
There is a longer discussion here: tensorflow/issues/600
If all else fails, check out this ByteNet implementation in Tensorflow, which does the translation task as well.