Tensorflow Embedding for training and inference - tensorflow

I am trying to code a simple Neural machine translation using tensorflow. But I am a little stuck regarding the understanding of the embedding on tensorflow :
I do not understand the difference between tf.contrib.layers.embed_sequence(inputs, vocab_size=target_vocab_size,embed_dim=decoding_embedding_size)
and
dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)
In which case should I use one to another ?
The second thing I do not understand is about tf.contrib.seq2seq.TrainingHelper and tf.contrib.seq2seq.GreedyEmbeddingHelper. I know that in the case of translation, we use mainly TrainingHelper for the training step (use the previous target to predict the next target) and GreedyEmbeddingHelper for the inference step (use the previous timestep to predict the next target).
But I do not understand how does it work. In particular the different parameters used. For example why do we need a sequence length in the case of TrainingHelper (why do we not used an EOS)? Why both of them do not use the embedding_lookup or embedding_sequence as input ?

I suppose that you're coming from this seq2seq tutorial. Even though this question is starting to get old, I'll try to answer for the people passing by like me:
For the first question, I looked at the source code behind tf.contrib.layers.embed_sequence, and it is actually using tf.nn.embedding_lookup. So it just wraps it, and creates the embedding matrix (tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))) for you. Although this is convenient and less verbose, by using embed_sequence there doesn't seem to a direct way to access the embeddings. So if you want to, you have to query for the internal variable used as the embedding matrix by using the same name space. I have to admit that the code in the tutorial above is confusing. I even suspect he's using different embeddings in the encoder and the decoder.
For the second question:
I guess it is equivalent to use a sequence length or an embedding.
The TrainingHelper doesn't need the embedding_lookup as it only forwards the inputs to the decoder, GreedyEmbeddingHelper does take as a first input the embedding_lookup as mentioned in the documentation.

If I understand you correctly, the first question is about the differences between tf.contrib.layers.embed_sequence and tf.nn.embedding_lookup.
According to the official docs (https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence),
Typical use case would be reusing embeddings between an encoder and decoder.
I think tf.contrib.layers.embed_sequence is designed for seq2seq models.
I found the following post:
https://github.com/tensorflow/tensorflow/issues/17417
where #ispirmustafa mentioned:
embedding_lookup doesn't support invalid ids.
Also, in another post: tf.contrib.layers.embed_sequence() is for what?
#user1930402 said:
When building a neural network model that has multiple gates that take features as input, by using tensorflow.contrib.layers.embed_sequence, you can reduce the number of parameters in your network while preserving depth. For example, it eliminates the need for each gates of the LSTM to perform its own linear projection of features.
It allows for arbitrary input shapes, which helps the implementation be simple and flexible.
For the second question, sorry that I didn't use TrainingHelper and can't answer your question.

Related

How to make a keras model take a (None,) tensor as Input

I am using the tf.keras API and I want my Model to take input with shape (None,), None is batch_size.
The shape of keras.layers.Input() doesn't include batch_size, so I think it can't be used.
Is there a way to achieve my goal? I prefer a solution without tf.placeholder since it is deprecated
By the way, my model is a sentence embedding model, so I want the input is something like ['How are you.','Good morning.']
======================
Update:
Currently, I can create an input layer with layers.Input(dtype=tf.string,shape=1), but this need my input to be something like [['How are you.'],['Good morning.']]. I want my input to have only one dimension.
Have you tried tf.keras.layers.Input(dtype=tf.string, shape=())?
If you wanted to set a specific batch size, tf.keras.Input() does actually include a batch_size parameter. But the batch size is presumed to be None by default, so you shouldn't even need to change anything.
Now, it seems like what you actually want is to be able to provide samples (sentences) of variable length. Good news! The tf.keras.layers.Embedding layer allows you to do this, although you'll have to generate an encoding for your sentences first. The Tensorflow website has a good tutorial on the process.

Loss function variational Autoencoder in Tensorflow example

I have a question regarding the loss function in variational autoencoder. I followed the tensorflow example https://www.tensorflow.org/tutorials/generative/cvae to create a LSTM-VAE, for sampling a sinus function.
My encoder-input is a set of points (x_i,sin(x_i)) for a specific range (randomly sampled), and as output of the decoder I expect similar values.
In the tensorflow guide, there is cross-entropy used to compare the encoder input with the decoder output.
cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=x)
This makes sense, because the input and output are treated as probabilities. But in reality these probabily functions represent the sets of my sinus function.
Can't I simply use a mean-squared-error instead of the cross-entropy (I tried it and it works well) or causes this a wrong behaviour of the architecture at some point?
Best regards and thanks for your help!
Well, such questions happen when you work too much and stop thinking properly. For the sake of solving this, it makes sense to think about what I'm trying to do.
p(x|z) is the decoder reconstruction, what means, that by sampling from z the value x is generated with the probability of p. In the tensorflow-example image-classification/generation is used, in that case crossentropy makes sense. I simply want to minimize the distance between my input and output. The use of mse is kind of logical.
Hope that helps someone at some point.
Regards.

How to input scalar (non-image) values to CNNs?

The classification task is based on a image and a scalar value.
If I encoded the scalar value as image pixels with that value (or a normalized version of the same) and append it as another layer in the input image, I would be wasting convolutional computation cycles over the encoding to get this information into the network.
On the other hand, I can send this as another neuron to the layer where flattening of conved feature maps occurs. Another option would be adding just before the output layer. (But how do I implement such a network in Keras or tensorflow?)
Which is the best method to send in scalar values?
PS: Although this question is not specific to any framework, Keras examples would be great in a way that they are simple enough for most people to understand... Links to blogs addressing the same are welcome too.
See this question and answer on the Cross Validated site: Combining image and scalar inputs into a neural network
In addition to the "bias" method suggested in the paper mentioned there(when the scalar is being used as a bias to some convolution layer), and the other option suggested in the answer to append the scalar to some flattened layer, you can also use an inner product (fully connected, "Dense" in Keras) layer to find the connectivity pattern between the ND input to the scalar.

Is there any way to get variable importance with Keras?

I am looking for a proper or best way to get variable importance in a Neural Network created with Keras. The way I currently do it is I just take the weights (not the biases) of the variables in the first layer with the assumption that more important variables will have higher weights in the first layer. Is there another/better way of doing it?
Since everything will be mixed up along the network, the first layer alone can't tell you about the importance of each variable. The following layers can also increase or decrease their importance, and even make one variable affect the importance of another variable. Every single neuron in the first layer itself will give each variable a different importance too, so it's not something that straightforward.
I suggest you do model.predict(inputs) using inputs containing arrays of zeros, making only the variable you want to study be 1 in the input.
That way, you see the result for each variable alone. Even though, this will still not help you with the cases where one variable increases the importance of another variable.
*Edited to include relevant code to implement permutation importance.
I answered a similar question at Feature Importance Chart in neural network using Keras in Python. It does implement what Teque5 mentioned above, namely shuffling the variable among your sample or permutation importance using the ELI5 package.
from keras.wrappers.scikit_learn import KerasClassifier, KerasRegressor
import eli5
from eli5.sklearn import PermutationImportance
def base_model():
model = Sequential()
...
return model
X = ...
y = ...
my_model = KerasRegressor(build_fn=basemodel, **sk_params)
my_model.fit(X,y)
perm = PermutationImportance(my_model, random_state=1).fit(X,y)
eli5.show_weights(perm, feature_names = X.columns.tolist())
It is not that simple. For example, in later stages the variable could be reduced to 0.
I'd have a look at LIME (Local Interpretable Model-Agnostic Explanations). The basic idea is to set some inputs to zero, pass it through the model and see if the result is similar. If yes, then that variable might not be that important. But there is more about it and if you want to know it, then you should read the paper.
See marcotcr/lime on GitHub.
This is a relatively old post with relatively old answers, so I would like to offer another suggestion of using SHAP to determine feature importance for your Keras models. SHAP also allows you to process Keras models using layers requiring 3d input like LSTM and GRU while eli5 cannot.
To avoid double-posting, I would like to offer my answer to a similar question on Stackoverflow on using SHAP.

How should I structure my labels for TensorFlow?

I'm trying to use TensorFlow to train output servo commands given an input image.
I plan on using a file as #mrry suggested in this question, with the images like so:
../some/path/some_img.JPG *some_label*
My question is, what are the label formats I can provide to TensorFlow and what structures are suggested?
My data is basically n servo commands from 0-10 seconds. A vector would work great:
[0,2,4,3]
or similarly:
[0,.25,.4,.3]
I couldn't find much about labels in the docs. Can anyone shed any light on TensorFlow labels?
And a very related question is what is the best way to structure these for TensorFlow to properly learn from them?
In Tensorflow Labels are just generic tensor. You can use any kind of tensor to store your labels. In your case a 1-D tensor with shape (4,) seems to be desired.
Labels do only differ from the rest of the data by its use in the computational graph. (Usually) labels should only be used inside the loss function while you propagate the other data through the whole network. For your problem a 4-d regression function should work.
Also, look at my newest comment to the (old) question. Using the slice_input_producer seems to be preferable in your case.