I would like to create a custom hierarchical, many-to-one RNN architecture in TensorFlow. The idea is to process the posts of each user on a discussion forum with an inner RNN and hand the last hidden state of each post over to an outer RNN that encodes the order of the posts. The goal is to end up with a latent representation that encodes both content and order and is used to predict a specified binary target. I would like to implement this architecture, but I cannot wrap my head around how to express the nesting in TensorFlow. Can you please give me some advice? Many thanks. Please see the attached diagram. (https://i.stack.imgur.com/lUqqX.png)
I tried reading the TensorFlow documentation on nested inputs to RNNs and on composing layers from layers, but I think it does not describe what I want to do... or it simply does not click for me.
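For concreteness, the kind of nesting I have in mind could perhaps be sketched with the Keras functional API as below. All layer types, sizes, and the fixed padding are placeholder assumptions on my part, not something I have working; padding/masking handling is omitted for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

max_posts = 20      # posts per user (padded to a fixed length)
max_tokens = 50     # tokens per post (padded to a fixed length)
vocab_size = 10000
embed_dim = 128
inner_units = 64
outer_units = 64

# One sample = all posts of one user, as a (posts, tokens) matrix of token ids.
inputs = layers.Input(shape=(max_posts, max_tokens), dtype="int32")

# Embed tokens: (batch, posts, tokens, embed_dim).
embedded = layers.Embedding(vocab_size, embed_dim)(inputs)

# Inner RNN, applied to each post independently via TimeDistributed;
# it returns the last hidden state of every post: (batch, posts, inner_units).
post_encodings = layers.TimeDistributed(layers.GRU(inner_units))(embedded)

# Outer RNN over the chronologically ordered post encodings: (batch, outer_units).
user_encoding = layers.GRU(outer_units)(post_encodings)

# Binary target predicted from the joint content + order representation.
output = layers.Dense(1, activation="sigmoid")(user_encoding)

model = Model(inputs, output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```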
Related
I'm quite new to TensorFlow and machine learning, so sorry if I haven't asked the question accurately or am not making sense somewhere. I have recently started reading about and trying to understand the transformer model, given its reputation in NLP, and thankfully the TensorFlow website has detailed code and explanations.
https://www.tensorflow.org/text/tutorials/transformer#training_and_checkpointing
I have no problem understanding the code: the attention layer, positional encoding, encoder, decoder, masking etc.
When training the model, the inputs are the sentence to be translated and the corresponding sentence in the target language, where the target sentence is shifted and masked.
My problem is with evaluation: the trained model has to translate an unseen sentence into the target language, so the target input would be an empty token. How would this empty tensor interact with the trained model within the attention layers, given that it is empty? And in the first place, what would be the effect of neglecting it?
To be more precise, please look at the screenshot below:
tar_inp is fed into the transformer, and the loss is computed between the prediction and tar_real. But when evaluating the model, what does an empty tar_inp do in the layers? Thank you very much, and sorry if it's a dumb question; could you please provide some intuition for understanding this?
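For reference, my current understanding of the evaluation step is a greedy decoding loop roughly like the sketch below. The `transformer` callable, start/end token ids, and `max_length` are placeholders standing in for the objects built in the tutorial, and the exact call signature and masking differ between tutorial versions.

```python
import tensorflow as tf

def greedy_translate(encoder_input, transformer, start_id, end_id, max_length=40):
    # The decoder input is never truly empty: it starts with the [START] token.
    output = tf.constant([[start_id]], dtype=tf.int64)   # shape (1, 1)

    for _ in range(max_length):
        # The decoder attends over everything predicted so far.
        logits = transformer([encoder_input, output], training=False)
        # Only the logits for the last position matter at this step.
        next_id = tf.argmax(logits[:, -1, :], axis=-1)   # shape (1,)
        # Append the prediction; it becomes part of the next step's input.
        output = tf.concat([output, next_id[:, tf.newaxis]], axis=-1)
        if int(next_id[0]) == end_id:
            break

    return output   # [START], t1, t2, ..., possibly [END]
```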
I want to ask how we can effectively re-train a trained seq2seq model to remove or mitigate a specific observed error in its output. I'm going to give an example from Speech Synthesis, but ideas from other domains that use seq2seq models, such as Machine Translation and Speech Recognition, will be appreciated.
I have learned the basics of the seq2seq-with-attention model, especially for Speech Synthesis, e.g. Tacotron-2.
Using a distributed, well-trained model showed me how naturally our computer could speak with the seq2seq (end-to-end) model (you can listen to some audio samples here). But still, the model fails to read some words properly, e.g., it misreads "obey [əˈbā]" in multiple ways, like [əˈbī] and [əˈbē].
The reason is obvious: the word "obey" appears only three times among the 225,715 words in our dataset (LJ Speech), so the model had no luck.
So, how can we re-train the model to overcome the error? Adding extra audio clips containing the "obey" pronunciation sounds impractical, but reusing the three existing clips risks overfitting. And since we start from a well-trained model, "simply training more" is not an effective solution either.
Now, this is one of the drawbacks of the seq2seq model that is not talked about much. The model successfully simplified the pipelines of traditional systems; for Speech Synthesis, for example, it replaced an acoustic model, a text-analysis frontend, etc. with a single neural network. But we lost controllability over the model entirely: it's impossible to make the system read a word in a specific way.
Again, if you use a seq2seq model in any field and get an undesirable output, how do you fix that? Is there a data-scientific workaround to this problem, or maybe a cutting-edge neural network mechanism that gives more controllability in seq2seq models?
Thanks.
I found an answer to my own question in Section 3.2 of the Deep Voice 3 paper.
They trained both a phoneme-based model and a character-based model, using phoneme inputs mainly, with the character-based model used only when a word cannot be converted to its phoneme representation.
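In code, the idea boils down to something like the sketch below; the tiny lexicon and symbols are placeholders, and a real system would use a full pronunciation dictionary such as CMUdict. This also gives back some of the controllability mentioned above: a mispronounced word like "obey" can be fixed by adding or correcting its lexicon entry instead of collecting more audio.

```python
# Mixed phoneme/character input: use phonemes where a lexicon entry exists,
# fall back to raw characters otherwise. PHONEME_DICT is a toy placeholder.
PHONEME_DICT = {
    "obey": ["OW0", "B", "EY1"],
    "hello": ["HH", "AH0", "L", "OW1"],
}

def text_to_mixed_tokens(sentence):
    tokens = []
    for word in sentence.lower().split():
        if word in PHONEME_DICT:
            tokens.extend(PHONEME_DICT[word])   # phoneme symbols
        else:
            tokens.extend(list(word))           # character fallback
        tokens.append(" ")                      # word-boundary marker
    return tokens

print(text_to_mixed_tokens("hello obey gizmo"))
# ['HH', 'AH0', 'L', 'OW1', ' ', 'OW0', 'B', 'EY1', ' ', 'g', 'i', 'z', 'm', 'o', ' ']
```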
I'm new to TensorFlow and would like to know whether there is any tutorial or example of multi-label classification with multiple network outputs.
I'm asking this because I have a collection of articles in which each article can have several tags.
Out of the box, tensorflow supports binary multi-label classification via tf.nn.sigmoid_cross_entropy_with_logits loss function or the like (see the complete list in this question). If your tags are binary, in other words there's a predefined set of possible tags and each one can either be present or not, you can safely go with that. A single model to classify all labels at once. There are a lot of examples of such networks, e.g. one from this question.
Unfortunately, multinomial multi-label classification is not supported in tensorflow. If this is your case, you'd have to build a separate classifier for each label, each using tf.nn.softmax_cross_entropy_with_logits or a similar loss.
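For the supported binary multi-label case, a minimal sketch could look like this; the feature size, tag count, and hidden layer are illustrative placeholders, and labels are assumed to be 0/1 vectors of length num_tags.

```python
import tensorflow as tf

num_features = 300   # e.g. a bag-of-words or embedding of one article
num_tags = 20        # size of the predefined tag set

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(num_features,)),
    # One independent logit per tag; no softmax across tags.
    tf.keras.layers.Dense(num_tags),
])

def multi_label_loss(y_true, logits):
    # Each tag is an independent present/absent decision.
    return tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            labels=tf.cast(y_true, tf.float32), logits=logits))

model.compile(optimizer="adam", loss=multi_label_loss)
# At prediction time, apply a sigmoid and threshold each tag, e.g. at 0.5.
```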
I would very much like to understand how I can enrich a CNN with provided meta information. As I understand it, a CNN 'just' looks at the images and classifies them into objects without looking at possibly existing meta-parameters such as time, weather conditions, etc.
To be more precise, I am using a Keras CNN with TensorFlow as the backend. I have the typical Conv2D and MaxPooling layers and a fully connected model at the end of the pipeline. It works nicely and gives me good accuracy. However, I do have additional meta information for each image (the manufacturer of the camera with which the image was taken) that is unused so far.
What is a recommended way to incorporate this meta information into the model? I could not yet come out with a good solution by myself.
Thanks for any help!
Usually this is done by adding the information to one of the fully connected layers before the prediction. The fully connected layer gives you K features representing your image; you just concatenate them with the additional information you have, as in the sketch below.
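Roughly, with the Keras functional API (the image size, the one-hot manufacturer vector, and all layer sizes below are illustrative placeholders):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

image_in = layers.Input(shape=(128, 128, 3), name="image")
meta_in = layers.Input(shape=(5,), name="camera_manufacturer")  # e.g. one-hot

# Typical Conv2D / MaxPooling stack producing K image features.
x = layers.Conv2D(32, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
x = layers.Dense(128, activation="relu")(x)      # K = 128 image features

# Concatenate the image features with the metadata before the classifier head.
merged = layers.Concatenate()([x, meta_in])
out = layers.Dense(1, activation="sigmoid")(merged)

model = Model(inputs=[image_in, meta_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```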
I'm trying to work out what's the best model to adapt for an open named entity recognition problem (biology/chemistry, so no dictionary of entities exists but they have to be identified by context).
Currently my best guess is to adapt Syntaxnet so that instead of tagging words as N, V, ADJ, etc., it learns to tag them as BEGINNING, INSIDE, OUTSIDE (IOB notation).
However, I am not sure which of these approaches is best:
Syntaxnet
word2vec
seq2seq (I think this is not the right one as I need it to learn on two aligned sequences, whereas seq2seq is designed for sequences of differing lengths as in translation)
I would be grateful for a pointer to the right method! Thanks!
Syntaxnet can be used for named entity recognition, e.g. see: Named Entity Recognition with Syntaxnet
word2vec alone isn't very effective for named entity recognition. I don't think seq2seq is commonly used either for that task.
As drpng mentions, you may want to look at tensorflow/tree/master/tensorflow/contrib/crf. Adding an LSTM before the CRF layer would help a bit, which gives something like:
LSTM+CRF code in TensorFlow: https://github.com/Franck-Dernoncourt/NeuroNER
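For reference, a rough TF 2.x sketch of that LSTM+CRF setup; the contrib CRF ops now live in tensorflow_addons, all sizes here are placeholders, and a real tagger like NeuroNER also adds character-level features, masking, etc.

```python
import tensorflow as tf
import tensorflow_addons as tfa  # tf.contrib.crf moved here in TF 2.x

vocab_size, embed_dim, lstm_units, num_tags, max_len = 10000, 100, 128, 3, 50

# Bidirectional LSTM producing per-token tag scores (logits).
word_ids = tf.keras.layers.Input(shape=(max_len,), dtype="int32")
emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(word_ids)
lstm_out = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(lstm_units, return_sequences=True))(emb)
logits = tf.keras.layers.Dense(num_tags)(lstm_out)
model = tf.keras.Model(word_ids, logits)

# CRF layer on top: negative log-likelihood as the training loss,
# Viterbi decoding for the final IOB tag sequence.
transition_params = tf.Variable(tf.random.normal((num_tags, num_tags)))

def crf_loss(tag_ids, logits, seq_lengths):
    log_likelihood, _ = tfa.text.crf_log_likelihood(
        logits, tag_ids, seq_lengths, transition_params)
    return -tf.reduce_mean(log_likelihood)

def crf_predict(logits, seq_lengths):
    tags, _ = tfa.text.crf_decode(logits, transition_params, seq_lengths)
    return tags
```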