How to use tensorflow-wavenet - tensorflow

I am trying to use the tensorflow-wavenet program for text to speech.
These are the steps:
Download TensorFlow
Download librosa
Install requirements: pip install -r requirements.txt
Download a corpus and put it into a directory named "corpus"
Train the model: python train.py --data_dir=corpus
Generate audio: python generate.py --wav_out_path=generated.wav --samples 16000 model.ckpt-1000
After doing this, how can I generate a voice read-out of a text file?

According to the tensorflow-wavenet page:
Currently there is no local conditioning on extra information which would allow context stacks or controlling what speech is generated.
You can find more information about current development of the project by reading the issues on the repository (local conditioning is a desired feature!)
The Wavenet paper compares Wavenet to two TTS baselines, one of which appears to have code for training available online: http://hts.sp.nitech.ac.jp

A recent paper from Google (Tacotron 2) describes one approach to going from text to speech using WaveNet, which I have not tried to implement but which at least states the method they use: they first train one network to predict a spectrogram from text, then train WaveNet to use the same sort of spectrogram as an additional conditional input to produce speech. It's a neat idea, especially since you can train the WaveNet part on some huge database of voice-only data, for which you can extract the spectrogram, and then train the text-to-spectrogram part using a different dataset where you have text.
https://google.github.io/tacotron/publications/tacotron2/index.html has the paper and some example outputs.
There seems to be a bunch of unintuitive engineering around the spectrogram prediction part (no doubt because of the nature of text-to-time learning), but there's some detail in the paper at least. The dataset is proprietary so I've no idea how hard it would be to get any results using other datasets.
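To make the "extract the spectrogram from voice-only data" step a bit more concrete, here is a minimal sketch using only numpy. It computes a plain log-magnitude STFT rather than the mel-scaled features a real system would more likely use, and the function name, frame length, hop size, and the synthetic test tone are all assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160):
    """Log-magnitude STFT of a 1-D signal, shape (num_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-6)  # small offset avoids log(0)

sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone as stand-in audio
spec = log_spectrogram(audio)
```

In the two-stage setup described above, an array like spec would be the conditioning input for the WaveNet part and the prediction target for the text-to-spectrogram part.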

For those who may come across this question, there is a newer Python implementation, ForwardTacotron, that makes text-to-speech readily available.

Related

I want to use hidden markov model for data prediction

I am new to machine learning models and data science libraries. I want to use a Hidden Markov Model for on-the-fly statistical prediction: read data from Kafka, build the model, use it to predict data at run time, and keep doing the same for a continuous stream.
Currently I can only find the Hidden Markov Model implementation in TensorFlow for Python (the tensorflow_probability distributions). Is there any other library available that can help me achieve the above scenario?
Suggestions can involve Java and Python libraries.
Please feel free to add any resource links that can help me understand how to use TensorFlow for Hidden Markov Models.
This might be a nice place to start: https://hmmlearn.readthedocs.io/en/latest/tutorial.html
Other alternatives I found are:
Java:
The Mallet library, and its extension GRMM in particular.
Python:
pomegranate, with its HMM support.
That said, my impression is that TensorFlow is a much better known, more actively maintained, and better supported library, so I'd try that first.
I'm searching for a library that supports Hierarchical HMMs (HHMM). That would probably require some tweaking of one of the libraries listed above.
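Whichever library estimates the model, the run-time prediction part of the question comes down to the HMM forward (filtering) recursion, which can be sketched in plain numpy. This is a hedged illustration, not the API of any library named above: the class name StreamingHMM, the two-state parameters, and the Python list standing in for messages consumed from Kafka are all hypothetical, and the parameters are assumed to have been estimated beforehand.

```python
import numpy as np

class StreamingHMM:
    """Discrete HMM with fixed, already-estimated parameters, filtered one observation at a time."""

    def __init__(self, start, trans, emit):
        self.trans = np.asarray(trans, dtype=float)  # trans[i, j] = P(state j next | state i now)
        self.emit = np.asarray(emit, dtype=float)    # emit[i, k] = P(symbol k | state i)
        # Belief over the hidden state at the upcoming time step.
        self.predicted_state = np.asarray(start, dtype=float)

    def predict_next(self):
        """Distribution over the next observed symbol."""
        return self.predicted_state @ self.emit

    def update(self, symbol):
        """Fold one new observation into the belief (one forward-algorithm step)."""
        posterior = self.predicted_state * self.emit[:, symbol]
        posterior /= posterior.sum()
        self.predicted_state = posterior @ self.trans
        return posterior

# Hypothetical parameters for a 2-state, 2-symbol model.
hmm = StreamingHMM(
    start=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.2, 0.8]],
    emit=[[0.8, 0.2], [0.3, 0.7]],
)
before = hmm.predict_next()
for symbol in [0, 0, 0]:  # stand-in for messages arriving from a stream
    hmm.update(symbol)
after = hmm.predict_next()
```

Each update costs a single matrix-vector product, so it can keep up with a stream; re-estimating the parameters themselves as new data arrives is a separate step that this sketch does not cover.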

Which model (GPT2, BERT, XLNet and etc) would you use for a text classification task? Why?

I'm trying to train a model for a sentence classification task. The input is a sentence (a vector of integers) and the output is a label (0 or 1). I've seen some articles here and there about using BERT and GPT2 for text classification tasks. However, I'm not sure which one I should pick to start with. Which of these recent NLP models, such as the original Transformer, BERT, GPT2, or XLNet, would you start with, and why? I'd rather implement it in TensorFlow, but I'm flexible enough to go with PyTorch too.
Thanks!
It depends heavily on your dataset, and it is part of the data scientist's job to find which model is most suitable for a particular task in terms of the selected performance metric, training cost, model complexity, etc.
When you work on the problem you will probably test all of the above models and compare them. Which one should you choose first? Andrew Ng in "Machine Learning Yearning" suggests starting with a simple model so you can quickly iterate and test your idea, data preprocessing pipeline, etc.
Don’t start off trying to design and build the perfect system.
Instead, build and train a basic system quickly—perhaps in just a few
days
According to this suggestion, you can start with a simpler model such as ULMFiT as a baseline, verify your ideas and then move on to more complex models and see how they can improve your results.
Note that modern NLP models contain a large number of parameters, and it is difficult to train them from scratch without a large dataset. That's why you may want to use transfer learning: you can download a pre-trained model, use it as a basis, and fine-tune it on your task-specific dataset to achieve better performance and reduce training time.
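As one concrete reading of the "basic system first" advice, here is a sketch of an even simpler baseline than ULMFiT: a bag-of-words logistic regression written in plain numpy. This is a swapped-in technique, not one of the models named in the question, and the toy sentences, learning rate, and iteration count are hypothetical values chosen only so the example runs on its own.

```python
import numpy as np

# Hypothetical toy data: 1 = positive sentence, 0 = negative sentence.
sentences = ["good movie", "great film", "loved it", "bad movie", "awful film", "hated it"]
labels = np.array([1, 1, 1, 0, 0, 0], dtype=float)

vocab = sorted({word for s in sentences for word in s.split()})
index = {word: i for i, word in enumerate(vocab)}

def featurize(sentence):
    """Bag-of-words count vector; words outside the vocabulary are ignored."""
    vec = np.zeros(len(vocab))
    for word in sentence.split():
        if word in index:
            vec[index[word]] += 1.0
    return vec

X = np.stack([featurize(s) for s in sentences])
weights = np.zeros(len(vocab))
bias = 0.0

# Plain batch gradient descent on the logistic loss.
for _ in range(500):
    probs = 1.0 / (1.0 + np.exp(-(X @ weights + bias)))
    error = probs - labels
    weights -= 0.5 * (X.T @ error) / len(labels)
    bias -= 0.5 * error.mean()

def predict(sentence):
    """Return 1 if the sentence is classified as positive, else 0."""
    score = featurize(sentence) @ weights + bias
    return int(score > 0)
```

A baseline like this mainly exercises the data pipeline and the evaluation code; its score is the number a fine-tuned pretrained model then has to beat.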
I agree with Max's answer, but if the constraint is to use a state-of-the-art large pretrained model, there is a really easy way to do this: the library by HuggingFace called pytorch-transformers. Whether you choose BERT, XLNet, or whatever, they're easy to swap out. Here is a detailed tutorial on using that library for text classification.
EDIT: I just came across this repo, pytorch-transformers-classification (Apache 2.0 license), which is a tool for doing exactly what you want.
As others mentioned, it depends on the dataset; you should try multiple models and choose the best one.
However, in my experience, XLNet has beaten all the other models so far by a good margin. Hence, if learning is not the objective, I would simply start with XLNet, then try a few more down the line and conclude. It saves time in exploring.
The repo below is excellent for doing all this quickly. Kudos to them.
https://github.com/microsoft/nlp-recipes
It uses hugging face transformers and makes them dead simple. 😃
I have used XLNet, BERT, and GPT2 for summarization tasks (English only). Based on my experience, GPT2 works the best among all 3 on short paragraph-size notes, while BERT performs better for longer texts (up to 2-3 pages). You can use XLNet as a benchmark.

Tensorflow Stored Learning

I haven't tried TensorFlow yet, but I'm curious: how, and in what form (data type, file type), does it store what a machine learning program has learned for later use?
For example, TensorFlow was used to sort cucumbers in Japan. The computer took a long time to learn from the example images what good cucumbers look like. In what form was that learning saved for future use?
I ask because it would be inefficient if the program had to re-learn from the images every time it needs to sort cucumbers.
Ultimately, a high level way to think about a machine learning model is three components - the code for the model, the data for that model, and metadata needed to make this model run.
In TensorFlow, the code for this model is written in Python and is saved in what is known as a GraphDef. This uses a serialization format created at Google called Protobuf. Other libraries commonly use other serialization formats, such as Python's native pickle.
The main reason you write this code is to "learn" from some training data - which is ultimately a large set of matrices, full of numbers. These are the "weights" of the model - and this too is stored using ProtoBuf, although other formats like HDF5 exist.
Tensorflow also stores Metadata associated with this model - for instance, what should the input look like (eg: an image? some text?), and the output (eg: a class of image aka - cucumber1, or 2? with scores, or without?). This too is stored in Protobuf.
During prediction time, your code loads up the graph, the weights and the meta - and takes some input data to give out an output. More information here.
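The three components can be illustrated with a small stand-in that uses numpy and JSON files. This is deliberately not TensorFlow's GraphDef/Protobuf format; the file names, the tiny linear "model", and the class labels are all hypothetical and only show the save-once, load-later pattern.

```python
import json
import os
import tempfile
import numpy as np

def predict(weights, features):
    """The 'code' part: a one-layer linear scorer followed by argmax."""
    scores = features @ weights["kernel"] + weights["bias"]
    return int(np.argmax(scores))

# Pretend these values came out of a long training run.
trained = {"kernel": np.array([[1.0, -1.0], [0.5, 2.0]]), "bias": np.array([0.1, -0.1])}
metadata = {"input": "2-element feature vector", "classes": ["grade_1", "grade_2"]}

# Save the weights and the metadata once, after training.
export_dir = tempfile.mkdtemp()
np.savez(os.path.join(export_dir, "weights.npz"), **trained)
with open(os.path.join(export_dir, "meta.json"), "w") as f:
    json.dump(metadata, f)

# Later, for example in another process: load instead of retraining.
with np.load(os.path.join(export_dir, "weights.npz")) as data:
    restored = {name: data[name] for name in data.files}
with open(os.path.join(export_dir, "meta.json")) as f:
    restored_meta = json.load(f)

label = restored_meta["classes"][predict(restored, np.array([1.0, 0.0]))]
```

The expensive learning lives entirely in the saved arrays, so prediction only needs to read them back, which is why the cucumber sorter does not have to look at the training images again.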
Are you talking about the symbolic math library, or the idea of tensor flow in general? Please be more specific here.
Here are some resources that discuss the library and tensor flow
These are some tutorials
And here is some background on the field
And this is the github page
If you want a more specific answer, please give more details as to what sort of work you are interested in.
Edit: So I'm presuming your question is more related to the general field of tensor flow than any particular application. Your question still is too vague for this website, but I'll try to point you toward a few resources you might find interesting.
When TensorFlow is used for image recognition, it usually operates on an ANN (Artificial Neural Network). What this means is that the TensorFlow library does the number crunching for the neural network, which I'm sure you can read all about with a quick Google search.
The point is that TensorFlow isn't a form of machine learning itself; it serves as a number-crunching library for large-scale deep learning, similar to something like numpy in Python. You should read more here.

Visualizing the detection process in Mask-RCNN

I am working on a project that aims to detect objects in certain difficult circumstances. I ran a test with Mask_RCNN on a dataset that contains that specific type of difficult examples and it did a pretty good job in some of them.
But, surprisingly, some other examples were not detected, for no obvious reason. To understand the reason behind this performance difference, I've been advised to use TensorBoard. But then I realized that it's mostly used for the training phase, as I understood from this video.
At the end of the video, however, they mention a TensorBoard integration project, namely the TensorFlow Debugger integration. Unfortunately, I could not find any further information about what became of that feature.
Is there any way to visualize weights and activation maps inside a CNN during inference/evaluation phase?
The main difference between training and inference time for tensorboard will be the global_step value. Most graphs display global step as the x-axis. You can supply your own global step counter if you like, but you'll have to decide what the x-axis should represent to you in this case since "time" isn't really a logical construct during inference. Other tabs such as the images tab don't have a time component, so using them should be the same as during training.
The tensorflow debugger is a nice terminal debugger, but wouldn't really be related to what you're trying to do here. It's certainly not a visualization tool.
Another approach might be to simply generate your own plots and output a set of PDFs with the various visualizations you need using standard tools like matplotlib for each test image. I've found tools like XnView make it really easy to look through a lot of PDF visualizations to understand what's going on. I've used this approach quite effectively. If you want to view many hundreds or thousands of results quickly you might have an easier time if all the visuals are just dumped out to a directory.
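The "dump one PDF per test image into a directory" approach could look roughly like the sketch below. The random arrays are placeholders for real layer outputs, which would have to be fetched from the network separately, and the function name, array shape, and file naming are assumptions made for illustration.

```python
import os
import tempfile
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt

def save_activation_pdf(activations, path, title):
    """Plot each channel of an (H, W, C) activation array side by side into one PDF."""
    channels = activations.shape[-1]
    fig, axes = plt.subplots(1, channels, figsize=(3 * channels, 3), squeeze=False)
    for c in range(channels):
        axes[0, c].imshow(activations[:, :, c], cmap="viridis")
        axes[0, c].set_title(f"channel {c}")
        axes[0, c].axis("off")
    fig.suptitle(title)
    fig.savefig(path)
    plt.close(fig)  # free memory when looping over many images

out_dir = tempfile.mkdtemp()
rng = np.random.default_rng(0)
for image_id in range(3):
    fake_activations = rng.random((16, 16, 4))  # placeholder for a real layer output
    save_activation_pdf(
        fake_activations,
        os.path.join(out_dir, f"image_{image_id}.pdf"),
        f"test image {image_id}",
    )

written = sorted(os.listdir(out_dir))
```

Naming each file after its test image makes it easy to flip between detected and missed examples in an image viewer.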

Deep Learning with TensorFlow on Compute Engine VM

I'm new to machine learning, but the topic is very interesting to me, so I'm using TensorFlow to classify images from the MNIST dataset. I run the code on a Compute Engine VM on Google Cloud because my own computer is too weak for this. The code runs well, but the problem is that every time I log in to my VM and run the same code, I have to wait while my CNN model trains before I can run tests or experiment with my data, for example plotting or importing external images to improve my accuracy.
Is there some way to save the trained model just once, somewhere, so that when I log in to the same VM tomorrow, for example, I don't have to wait for the model to train again? Is that possible?
Or is there maybe another way to do something similar?
You can save a trained model in TensorFlow and then load it later; that way you only have to train your model once and can use it as many times as you want. To do that, follow the TensorFlow documentation on the topic, which explains how to save and load a model. In short, you use the SavedModelBuilder class to define the type and location of your saved model, and then add the MetaGraphs and variables you want to save. Loading the saved model for later use is even easier, as you only have to run a command pointing to the location to which the model was exported.
On the other hand, I would strongly recommend changing your working environment to one that may be more cost-effective for you. Google Cloud has the Cloud ML Engine service, which might be a good fit for the type of work you are doing. It lets you train your models and run predictions without needing an instance that runs all the required software. I happen to have worked a little with TensorFlow recently, and at first I was also working with a virtualized instance, but after following some tutorials I was able to save some money by migrating my work to ML Engine, as you are only charged for usage. If you are using your VM only for that purpose, take a look at it.
You can of course consult all the available documentation, but as a first quickstart, if you are interested in ML Engine, I recommend having a look at how to train your models and how to get your predictions.