Tensorflow: attention decoder

In TensorFlow 1.0, the seq2seq API changed substantially and is no longer compatible with previous seq2seq examples. In particular, I find attention decoders quite a bit more challenging to build: the old attention_decoder function has been removed; instead, the new API expects the user to provide dynamic_rnn_decoder with different attention functions for training and prediction, which in turn rely on a prepare_attention function.
Has anybody got an example of how to build an attention decoder, providing only the inputs and the final encoder state?

This seq2seq with attention model can translate number pronunciations into digits, e.g. "one hundred and twenty seven" -> "127". See if it helps.
https://acv.aixon.co/attention_word_to_number.html
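In case it helps as a starting point: in later 1.x releases the prepare_attention-style functions were replaced by the AttentionWrapper API. A rough sketch of wiring an attention decoder from the encoder outputs and final encoder state might look like the following. All names such as encoder_outputs, encoder_state, decoder_inputs, decoder_lengths and vocab_size are placeholders, and exact keyword arguments can vary across 1.x versions, so treat this as a sketch rather than a drop-in solution.

import tensorflow as tf

# Hypothetical sketch for TF 1.x (AttentionWrapper API); encoder_outputs,
# encoder_state, decoder_inputs, decoder_lengths and vocab_size are assumed
# to already exist elsewhere in your graph.
attention = tf.contrib.seq2seq.BahdanauAttention(
    num_units=256, memory=encoder_outputs)          # encoder_outputs: [batch, time, units]
decoder_cell = tf.contrib.seq2seq.AttentionWrapper(
    tf.contrib.rnn.GRUCell(256), attention, attention_layer_size=256)
batch_size = tf.shape(encoder_outputs)[0]
initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(
    cell_state=encoder_state)                        # seed with the final encoder state
helper = tf.contrib.seq2seq.TrainingHelper(
    decoder_inputs, decoder_lengths)                 # decoder_inputs: embedded targets [batch, time, emb]
decoder = tf.contrib.seq2seq.BasicDecoder(
    decoder_cell, helper, initial_state,
    output_layer=tf.layers.Dense(vocab_size))
outputs, final_state, _ = tf.contrib.seq2seq.dynamic_decode(decoder)
logits = outputs.rnn_output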

Related

Tensorflow's quantization for RNN and LSTM

In the guide for Quantization Aware Training, I noticed that RNN and LSTM were listed in the roadmap under "future support". Does anyone know if they are supported now?
Is Post-Training Quantization also an option for quantizing RNNs and LSTMs? I don't see much information or discussion about it, so I wonder whether it is possible now or still in development.
Thank you.
I am currently trying to implement a speech enhancement model in 8-bit integer based on DTLN (https://github.com/breizhn/DTLN). However, when I run inference with the quantized model on no audio at all (an empty array), it adds a weird waveform on top of the result: a constant signal every 125 Hz. I have checked other places in the code and found no problems; it boils down to the quantization process with the RNN/LSTM.
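For reference, the post-training quantization path I mean follows roughly this pattern. This is a minimal sketch with a toy stand-in model (not DTLN itself), and the open question above is exactly whether the LSTM kernels are fully supported by this flow.

import tensorflow as tf

# Toy stand-in model (not DTLN); shapes are illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100, 257)),
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.Dense(257, activation="sigmoid"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # dynamic-range quantization
# For full int8 one would additionally set converter.representative_dataset and
# converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8];
# whether the fused LSTM kernels accept that is exactly what I am unsure about.
tflite_model = converter.convert()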

Tensorflow : Transformer Model Decoder Target Input

I'm quite new to TensorFlow and machine learning, so sorry if I haven't asked the question accurately or if something doesn't make sense. I have recently been reading about and trying to understand the transformer model, given its reputation in NLP, and thankfully the TensorFlow website has detailed code and explanations.
https://www.tensorflow.org/text/tutorials/transformer#training_and_checkpointing
I have no problem understanding the code: the attention layer, positional encoding, encoder, decoder, masking etc.
When training the model, the inputs are the sentence to be translated and the corresponding sentence in the target language, where the target-language input is shifted and masked.
My problem is with evaluation: the task is to translate an unseen sentence into the target language, so the target input would be an empty token. How would this empty tensor interact with the trained model within the attention layer? Is it really empty? And in the first place, what is the effect of neglecting it?
To be more precise, please look at the screenshot below:
tar_inp is the input to the transformer, and the loss is computed between the prediction and tar_real. But when evaluating the model, what does an empty tar_inp do inside the layer? Thank you very much, and sorry if it's a dumb question; could you please provide some intuition?
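For context, the evaluation loop in the linked tutorial builds the target input up token by token rather than feeding a truly empty tensor. A simplified sketch of that loop is below; transformer, start_id and end_id are placeholders assumed to exist, and the exact call signature differs between tutorial versions.

import tensorflow as tf

# Simplified sketch of autoregressive (greedy) decoding at evaluation time.
def greedy_translate(transformer, encoder_input, start_id, end_id, max_length=40):
    output = tf.expand_dims([start_id], 0)            # decoder input starts as just [START]
    for _ in range(max_length):
        # the model sees everything predicted so far, not an empty tensor
        predictions = transformer([encoder_input, output], training=False)
        next_id = tf.argmax(predictions[:, -1, :], axis=-1, output_type=tf.int32)
        output = tf.concat([output, tf.expand_dims(next_id, 0)], axis=-1)
        if int(next_id[0]) == end_id:                 # stop once END is predicted
            break
    return output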

Why is it discouraged to use softmax as activation function in last layer according to Tensorflow documenation?

I was following TensorFlow's Quickstart guide and noticed they discourage using the softmax function as the activation function in the last layer. The explanation follows:
While this can make the model output more directly interpretable, this approach is discouraged as it's impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output.
Can anyone expand upon this explanation? Everything I have been able to find on the topic recommends using the softmax function in the last layer, counter to TensorFlow's documentation. Has something happened recently that would render this guidance outdated and incorrect now?
Thanks for any insight.
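For context, the pattern the Quickstart recommends instead is to leave the last layer as raw logits and push the softmax into the loss; a minimal sketch:

import tensorflow as tf

# Minimal sketch: no softmax on the last layer; the loss works on raw logits,
# which lets it use a numerically stable log-softmax internally.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),                        # logits, no softmax here
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])

# For human-readable predictions you can still apply softmax afterwards:
# probs = tf.nn.softmax(model(x_batch))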

Differences between different attention layers for Keras

I am trying to add an attention layer for my text classification model. The inputs are texts (e.g. movie review), the output is a binary outcome (e.g. positive vs negative).
# assuming standalone Keras 2.x with a TF backend (CuDNNGRU lives in keras.layers there)
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, CuDNNGRU, Dense

model = Sequential()
model.add(Embedding(max_features, 32, input_length=maxlen))
model.add(Bidirectional(CuDNNGRU(16, return_sequences=True)))
##### add attention layer here #####
model.add(Dense(1, activation='sigmoid'))
After some searching, I found a couple of ready-to-use attention layers for Keras. There is the keras.layers.Attention layer that is built into Keras. There are also the SeqWeightedAttention and SeqSelfAttention layers in the keras-self-attention package. As a person who is relatively new to the deep learning field, I have a hard time understanding the mechanism behind these layers.
What does each of these layers do? Which one would be best for my model?
Thank you very much!
If you are using an RNN, I would not recommend using the keras.layers.Attention class.
While analysing the tf.keras.layers.Attention GitHub code to better understand how to use it, the first line I came across was: "This class is suitable for Dense or CNN networks, and not for RNN networks".
There is another open-source version maintained by CyberZHG called keras-self-attention. To the best of my knowledge this is NOT part of the Keras or TensorFlow library and seems to be an independent piece of code. It contains the two classes you mentioned, SeqWeightedAttention and SeqSelfAttention. The former returns a 2D value and the latter a 3D value, so SeqWeightedAttention should work for your situation. The former seems to be loosely based on Raffel et al. and can be used for sequence classification; the latter seems to be a variation of Bahdanau attention.
In general, I would suggest you write your own sequence-to-classification model. The attention piece can be added in less than half a dozen lines of code (bare-bones essence); see the sketch below. That is far less effort than you would spend integrating, debugging, or understanding the code in these external libraries.
Please refer to: Create an LSTM layer with Attention in Keras for multi-label text classification neural network
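To give an idea of the bare-bones essence I mean, a weighted-sum (Raffel-style) attention layer for sequence classification can be sketched roughly like this. It is written against tf.keras, and the class and variable names are my own, not from any of the libraries mentioned above.

import tensorflow as tf
from tensorflow.keras import layers

# Bare-bones weighted-sum attention for sequence classification.
class SimpleAttention(layers.Layer):
    def build(self, input_shape):
        # one score per timestep, computed from the RNN features
        self.w = self.add_weight(name="att_weight",
                                 shape=(int(input_shape[-1]), 1),
                                 initializer="glorot_uniform")
        super().build(input_shape)

    def call(self, inputs):
        # inputs: (batch, timesteps, features) from an RNN with return_sequences=True
        scores = tf.squeeze(tf.matmul(inputs, self.w), axis=-1)     # (batch, timesteps)
        weights = tf.nn.softmax(scores, axis=-1)                    # attention distribution
        return tf.reduce_sum(inputs * tf.expand_dims(weights, -1), axis=1)  # (batch, features)

In the model above it would slot in where the placeholder comment sits, i.e. model.add(SimpleAttention()) between the Bidirectional GRU (with return_sequences=True) and the final Dense layer.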

API Reference for RNN and Seq2Seq models in tensorflow

Where can I find the API reference that specifies the available functions in the RNN and Seq2Seq models?
On the GitHub page it was mentioned that rnn and seq2seq were moved to tf.nn.
[NOTE: this answer is updated for r1.0 ... but explains legacy_seq2seq instead of tensorflow/tensorflow/contrib/seq2seq/]
The good news is that the seq2seq models provided in TensorFlow are pretty sophisticated, including embeddings, buckets, attention mechanisms, one-to-many multi-task models, etc.
The bad news is that there is a lot of complexity and many layers of abstraction in the Python code, and that the code itself is the best available "documentation" of the higher-level RNN and seq2seq "API" as far as I can tell... thankfully the code is well docstring'd.
Practically speaking, I think the examples and helper functions pointed to below are mainly useful as references for understanding coding patterns, and in most cases you'll need to re-implement what you need using the basic functions in the lower-level Python API.
Here is a breakdown of the RNN seq2seq code from the top down as of version r1.0:
models/tutorials/rnn/translate/translate.py
...provides main(), train(), decode() that work out of the box to translate English to French... but you can adapt this code to other data sets
models/tutorials/rnn/translate/seq2seq_model.py
...class Seq2SeqModel() sets up a sophisticated RNN encoder-decoder with embeddings, buckets, and an attention mechanism... if you don't need embeddings, buckets, or attention, you'll need to implement a similar class of your own.
tensorflow/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py
...main entry point for seq2seq models via helper functions. See model_with_buckets(), embedding_attention_seq2seq(), embedding_attention_decoder(), attention_decoder(), sequence_loss(), etc.
Examples include one2many_rnn_seq2seq, and models without embeddings/attention are also provided, like basic_rnn_seq2seq. If you can jam your data into the tensors these functions accept, this could be your best entry point for building your own model; a skeletal call is sketched below.
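A hedged sketch of calling into the legacy helpers (TF r1.0-era contrib); all sizes and placeholder names are illustrative, and keyword arguments may differ slightly in your exact version:

import tensorflow as tf

# Illustrative only: toy vocabulary sizes and placeholder inputs.
seq_len, vocab_in, vocab_out, emb_size = 10, 1000, 1000, 64

encoder_inputs = [tf.placeholder(tf.int32, [None], name="enc%d" % t) for t in range(seq_len)]
decoder_inputs = [tf.placeholder(tf.int32, [None], name="dec%d" % t) for t in range(seq_len)]
targets = decoder_inputs[1:] + [tf.zeros_like(decoder_inputs[0])]
weights = [tf.ones_like(t, dtype=tf.float32) for t in targets]

cell = tf.contrib.rnn.GRUCell(128)
outputs, state = tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols=vocab_in, num_decoder_symbols=vocab_out,
    embedding_size=emb_size, feed_previous=False)
loss = tf.contrib.legacy_seq2seq.sequence_loss(outputs, targets, weights)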
tensorflow/tensorflow/contrib/rnn/python/ops/core_rnn.py
...provides wrappers for RNN networks like static_rnn() with some bells and whistles I usually don't need, so I just use code like this instead:
from tensorflow.python.ops import array_ops, variable_scope

def simple_rnn(cell, inputs, dtype, scope=None):
    with variable_scope.variable_scope(scope or "simple_RNN") as varscope1:
        if varscope1.caching_device is None:
            varscope1.set_caching_device(lambda op: op.device)
        batch_size = array_ops.shape(inputs[0])[0]
        outputs = []
        state = cell.zero_state(batch_size, dtype)
        for time, input_t in enumerate(inputs):
            if time > 0:
                variable_scope.get_variable_scope().reuse_variables()
            (output, state) = cell(input_t, state)
            outputs.append(output)
        return outputs, state
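Hypothetical usage of the function above (TF 1.x; the placeholder and sizes are mine), mirroring how static_rnn expects a Python list of per-timestep tensors:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 10, 64])   # [batch, time, input_dim]
inputs = tf.unstack(x, axis=1)                   # list of [batch, 64] tensors, one per step
cell = tf.contrib.rnn.GRUCell(128)
outputs, final_state = simple_rnn(cell, inputs, dtype=tf.float32, scope="encoder")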
So far I also can't find API references for the rnn functions on their site.
However, I believe you can read the docstrings for each function on GitHub as a function reference.
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/rnn.py
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/rnn_cell.py
RNN docs for the current/master version of TensorFlow:
https://www.tensorflow.org/versions/master/api_docs/python/nn.html#recurrent-neural-networks
RNN docs for a specific version of TensorFlow:
https://www.tensorflow.org/versions/r0.10/api_docs/python/nn.html#recurrent-neural-networks
For the curious, here are some notes about why the RNN docs weren't available initially:
API docs does not list RNNs