Is it possible to create a trainable variable in keras like in tensorflow? - variables

Good morning everyone;
I'm trying to implement this model where the neural network's inputs are based on a trainable vocabulary matrix (each row in the matrix represents a word entry in the vocabulary). I'm using keras (tensorflow backend), I was wondering if it's possible to define a trainable variable (without adding a custom layer), such that this variable will be trained as well as the neural network? like a tensorflow variable.
Could you please give a short example of how I can do it?
Thanks in advance.

The neural network's inputs are based on a trainable vocabulary matrix (each row in the matrix represents a word entry in the vocabulary)
This is the definition of a Word Embedding
There is already an embedding layer in Keras, you don't have to reimplement it.
You can find an easy example of how to use it here.

Related

Deploying tensorflow RNN models (other than keras LSTM) to a microcontroller without unrolling the network?

Goal
I want to compare different types of RNN tflite-micro models, built using tensorflow, on a microcontroller based on their accuracy, model size and inference time. I have also created my own custom RNN cell that I want to compare with the LSTM cell, GRU cell, and SimpleRNN cell. I create the tensorflow model using tf.keras.layers.RNN(Cell(...)).
Problem
I have successfully deployed a keras LSTM-RNN using tf.keras.layers.LSTM(...) but when I create the same model using tf.keras.layers.RNN(tf.keras.layers.LSTMCell(...)) and deploy it to the microcontroller, then it does not work. I trained both networks on a batch size of 64, but then I copy the weights and biases to a model where the batch_size is fixed to 1 as tflite-micro does not support dynamic batch sizes.
When the keras LSTM layer is converted to a tflite model it creates a fused operator called UnidirectionalSequenceLSTM but the network created with an RNN layer using the LSTMCell does not have that UnidirectionalSequenceLSTM operator, instead it has a reshape and while operator. The first network has only 1 subgraph but the second has 3 subgraphs.
When I run that second model on the microcontroller, two things go wrong:
the interpreter returns the same result for different inputs
the interpreter fails on some inputs reporting an error with the while loop saying that int32 is not supported (which is in the while operator, and can't be quantized to int8)
LSTM tflite-model vizualized with Netron
RNN(LSTMCell) tflite-model vizualized with Netron
Bad solution (10x model size)
I figured out that by unrolling the second network I can successfully deploy it and get correct results on the microcontroller. However, that increases the model size 10x which is really bad as we are trying to deploy the model on a resource constrained device.
Better solution?
I have explained the problem using the example of the LSTM layer (works) and LSTM cell in an RNN layer (does not work), but I want to be able to deploy a model using the GRU cell, SimpleRNN cell, and of course the custom cell that I have created. And all those have the same problem as the network created with the LSTM cell.
What can I do?
Do I have to create a special fused operator? Maybe even one for each cell I want to compare? How would I do that?
Can I use the interface into the conversion infrastructure for user-defined RNN implementations mentioned here: https://www.tensorflow.org/lite/models/convert/rnn. How I understand the documentation, is that this would only work for user-defined LSTM implementations, not user-defined RNN implemenations like the title suggests.

Trainable USE-lite-based classifier with SentencePiece input

I have heard that it is possible to use the pretrained Universal Sentence Encoder (USE) (neural language model) from TF-hub as part of a trainable model, e.g. a sentence classifier. Some versions of USE rely on SentencePiece sub-word tokenizer, which I also need. There are minimal instructions online for how to do this.
Here is how to use USE-lite with SentencePiece:
- https://tfhub.dev/google/universal-sentence-encoder-lite/2
Here is how to train a classifier based on a pretrained USE model:
- http://hunterheidenreich.com/blog/google-universal-sentence-encoder-in-keras/
- https://www.youtube.com/watch?v=gnz1CUzb5qo
And here is how to measure sentence similarity using both USE-lite and SentencePiece:
- https://github.com/tensorflow/hub/blob/master/examples/colab/semantic_similarity_with_tf_hub_universal_encoder_lite.ipynb
I have successfully reproduced the above pieces separately. I have then tried to combine the above ideas into a single POC that will build a classifier model based on USE-lite and SentencePiece, but I cannot see how to do it. I am currently stuck on the part where I modify the trainable classifier's first layer(s). I have tried to make it accept either (1) SentencePiece token IDs (in which I tokenize the text outide of the Tensorflow graph) or (2) raw text (using SentencePiece as an Op inside the Tensorflow graph). After that point, it should feed tokenized text forward into the USE-lite model, either in a lambda or in some other way. Finally, the output of USE-lite should be fed into a dense layer (or two?) ending in softmax for computing class probabilities.
I am relatively new to Tensorflow. I imagine that the above sources would be sufficient for a more experienced Tensorflow developer to merge and make work for my use-case. Let me know if you can provide any pointers. Thanks.

Extract the output of the embedding layer

I am trying to build a regression model, for which I have a nominal variable with very high cardinality. I am trying to get the categorical embedding of the column.
Input:
df["nominal_column"]
Output:
the embeddings of the column.
I want to use the op of the embedding column alone since I would require that as a input to my traditional regression model. Is there a way to extract that output alone.
P.S I am not asking for code, any suggestion on the approach would be great.
If the embedding is part of the model and you train it, then you can use functional API of keras to get output of any intermediate operation in your graph:
x=Input((number_of_categories,))
y=Embedding(parameters_of_your_embeddings)(x)
output=Rest_of_your_model()(y)
model=Model(inputs=[x],outputs=[output,y])
if you do it before you train the model, you'll have to define custom loss function, that deals only with part of the output. The other way is to train the model with just one output, then create identical model with two outputs and set the weights of the second model from the trained one.
If you want to get the embedding matrix from your model, you can just use method get_weights of the embedding layer which returns the weights in numpy array.

What's the relationship between Tensorflow's dataflow graph and DNN?

As we know, a DNN is comprised of many layers which consist of many neurons applying the same function to different parts of the input. Meanwhile, if we use Tensorflow to execute a DNN task, we will get a dataflow graph generated by Tensorflow automatically and we can use Tensorboard to visualize the dataflow graph as blow. But there is no neuron in the layer. So I wonder what is the relationship between Tensorflow dataflow graph and a DNN? When a neuron of DNN's layer map into dataflow graph, how is it represented?What is the relationship of neuron in DNN and node in tensorflow(representing an operation)? I just started to learn DNN and Tensorflow, please help me arrange thoughts in order. Thanks:) enter image description here
You have to differentiate between the metaphoric representation of a DNN and it's mathematic description. The math behind a classic neuron is the sum of the weighted inputs + a bias (usually calling a activation function on this result)
So in this case you have an input vector mutplied by a weight vector (containing trainable variables) and then summed up with a bias scalar (also trainable)
If you now consider a layer of neurons instead of one, the weights will become a matrix and the bias a vector. So calculating a feed forward layer is nothing more then a matrix multiplication follow by a sum of vectors.
This is the operation you can see in your tensorflow graph.
You can actually build your Neural Network this way without any use of the so called High Level API which use the Layer abstraction. (Many have done this in the early days of tensorflow)
The actual "magic", which tensorflow does for you is calculating and executing the derivatives of this foreword pass in order to calculate the updates for the weights.

Tensorflow embeddings

I know what embeddings are and how they are trained. Precisely, while referring to the tensorflow's documentation, I came across two different articles. I wish to know what exactly is the difference between them.
link 1: Tensorflow | Vector Representations of words
In the first tutorial, they have explicitly trained embeddings on a specific dataset. There is a distinct session run to train those embeddings. I can then later on save the learnt embeddings as a numpy object and use the
tf.nn.embedding_lookup() function while training an LSTM network.
link 2: Tensorflow | Embeddings
In this second article however, I couldn't understand what is happening.
word_embeddings = tf.get_variable(“word_embeddings”,
[vocabulary_size, embedding_size])
embedded_word_ids = tf.gather(word_embeddings, word_ids)
This is given under the training embeddings sections. My doubt is: does the gather function train the embeddings automatically? I am not sure since this op ran very fast on my pc.
Generally: What is the right way to convert words into vectors (link1 or link2) in tensorflow for training a seq2seq model? Also, how to train the embeddings for a seq2seq dataset, since the data is in the form of separate sequences for my task unlike (a continuous sequence of words refer: link 1 dataset)
Alright! anyway, I have found the answer to this question and I am posting it so that others might benefit from it.
The first link is more of a tutorial that steps you through the process of exactly how the embeddings are learnt.
In practical cases, such as training seq2seq models or Any other encoder-decoder models, we use the second approach where the embedding matrix gets tuned appropriately while the model gets trained.