How to feed parameter 'dy' of 'cudnnSoftmaxBackward()' in cuDNN API?

I want to implement LeNet-5 with cuDNN and train the net on the MNIST data set.
The last layer of the net is a softmax, and I use the function 'cudnnSoftmaxForward()' in the forward pass. In the backward pass I want to use 'cudnnSoftmaxBackward()', but I am not sure about one of its parameters: 'dy'.
The function 'cudnnSoftmaxBackward()' as documented by NVIDIA is:
[Image: cudnnSoftmaxBackward parameters]
From the API I know that 'dy' means the 'input_diff', but the softmax layer is the last layer, so how do I obtain the 'input_diff' to pass to 'cudnnSoftmaxBackward()'? Can I just feed it the difference between the network's target output and its actual output?
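To make the question concrete, here is a minimal sketch in plain Python (no cuDNN calls). It assumes the loss is cross-entropy on the softmax output and that the backward pass applies the standard softmax Jacobian, dx_i = y_i * (dy_i - sum_j y_j * dy_j). Under those assumptions 'dy' is the gradient of the loss with respect to the softmax output (-t/y), and the difference between actual and target output is what comes out as 'dx' rather than what goes in as 'dy'. The helper names and the sample values are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_backward(y, dy):
    """Apply the softmax Jacobian to dy: dx_i = y_i * (dy_i - sum_j y_j * dy_j)."""
    dot = sum(yi * dyi for yi, dyi in zip(y, dy))
    return [yi * (dyi - dot) for yi, dyi in zip(y, dy)]

logits = [2.0, 1.0, 0.1]          # hypothetical pre-softmax activations
target = [1.0, 0.0, 0.0]          # hypothetical one-hot label
y = softmax(logits)

# Cross-entropy loss L = -sum_i t_i * log(y_i), so dL/dy_i = -t_i / y_i.
dy = [-t / yi for t, yi in zip(target, y)]
dx = softmax_backward(y, dy)

# For this particular loss the result collapses to y - target.
shortcut = [yi - t for yi, t in zip(y, target)]
```

If the softmax and the cross-entropy are fused into a single loss layer, y - target can be used directly as the gradient with respect to the softmax input, and the separate softmax backward call is not needed.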


How does TensorFlow calculate the gradients of an FFT layer?

If I insert the function, e.g., tf.fft(input, name=None), into a neural network, how does TensorFlow calculate the gradients in backpropagation?
I didn't find any documentation about this.
I am using TensorFlow 1.0.
If you're just inserting tf.fft(...) in the middle of a model, I'm not certain TensorFlow will even be able to handle a forward pass. The docs for tf.signal.fft (and the tf.fft function header) both require inputs with dtype=tf.complex64 or dtype=tf.complex128. Perhaps TensorFlow will cast float32 inputs to complex and back again, allowing you to complete a forward pass; I'm not sure. From what I can gather from the TensorFlow gradient documentation, though, casting values causes a disconnect between the error gradient and the model parameters, meaning a backward pass won't work. You could try implementing a custom FFT function that doesn't cast values and see if that works, though it's not easy.
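Setting the dtype question aside, the transform itself is linear, so backpropagating through it amounts to multiplying the incoming gradient by the conjugate transpose of the DFT matrix (n times the inverse DFT). The sketch below shows this in plain Python with a naive O(n^2) DFT. It illustrates the math only and makes no claim about how TensorFlow implements it; the function names and the sample loss are hypothetical.

```python
import cmath

def dft(x):
    """Naive forward DFT: y_k = sum_m x_m * exp(-2*pi*i*k*m/n)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def dft_backward(grad_y):
    """Multiply by the conjugate transpose of the DFT matrix (n * inverse DFT)."""
    n = len(grad_y)
    return [sum(grad_y[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]

def loss(x):
    """Hypothetical real-valued loss on the spectrum: L = sum_k |y_k|^2."""
    return sum(abs(v) ** 2 for v in dft(x))

x = [1 + 2j, -0.5 + 0.3j, 0.25 - 1j, 2 + 0j]

# Gradient of L w.r.t. y, packed as dL/dRe(y) + i * dL/dIm(y) = 2 * y.
grad_y = [2 * v for v in dft(x)]

# Gradient w.r.t. x in the same packed form.
grad_x = dft_backward(grad_y)
```

By Parseval's theorem this loss equals n * sum |x_m|^2, so grad_x should come out as 2 * n * x, which gives a quick sanity check on the backward rule.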

Tensorflow Estimator API: How to pass parameter from input function

I'm trying to add class weights as a hyperparameter for my model, but to calculate the weights I need to read the input data. This happens inside input_fn, which is then passed to the estimator. The outputs of input_fn are only features and labels, which should have the same shape num_examples * num_features. My question: is there any way to propagate data from input_fn to model_fn's hyperparameter map? Alternatively, is there a wrapper for the input_fn dataset that can oversample the minority class or undersample the majority class along with batching? In that case I would not need to propagate any parameter.
Both features and labels can be a dictionary of tensors (not just one tensor). The tensors can be any shape you want, though it is common for the shape to be num_examples * ...
If you don't use any of the predefined estimators, the easiest way is to add another feature containing what you need to compute the weights, compute the weights in the model, and then use them (multiply the loss by them or pass them as a parameter).
You also have access to the hyperparameters inside input_fn, so you can compute the weights there and add them as a separate column.
If you use a canned estimator, check the documentation. Most of them appear to support a weight_column_name; in that case just give it the name you used in the features dictionary for the weight values.
Alternatively, if all else fails, you can sample the data the way you want before you feed it to TensorFlow.
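The "separate column" idea can be sketched in plain Python, without TensorFlow. This is only an illustration under assumptions: inverse-frequency weighting is one possible scheme, and class_weights, add_weight_column and the column name "weight" are hypothetical names. In a real input_fn the lists would be tensors.

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights; a perfectly balanced dataset gives 1.0 per class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def add_weight_column(features, labels, column="weight"):
    """Return a copy of the features dict with a per-example weight column added."""
    weights = class_weights(labels)
    out = dict(features)
    out[column] = [weights[y] for y in labels]
    return out

# Hypothetical toy data: four examples of class 0, two of class 1.
features = {"x": [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]}
labels = [0, 0, 0, 0, 1, 1]
features = add_weight_column(features, labels)
```

The model (or a canned estimator's weight column argument) can then read the weights from features["weight"] and scale each example's loss by them.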

Seq2Seq Models for Chatbots

I am building a chatbot with a sequence-to-sequence encoder-decoder model, as in NMT. From the material provided, I understand that during training the decoder outputs are fed into the decoder inputs along with the encoder cell states. What I cannot figure out is what I should feed into the decoder when the chatbot is actually deployed in real time, since at that point the output is exactly what I have to predict. Can someone help me out with this, please?
The exact answer depends on which building blocks you take from the Neural Machine Translation (NMT) model and which ones you replace with your own. I assume the graph structure is exactly as in NMT.
If so, at inference time, you can feed just a vector of zeros to the decoder.
Internal details: NMT uses an entity called Helper to determine the next input to the decoder (see the tf.contrib.seq2seq.Helper documentation).
In particular, tf.contrib.seq2seq.BasicDecoder relies solely on the helper when it performs a step: the next_inputs that are fed into the subsequent cell are exactly the return value of Helper.next_inputs().
There are different implementations of the Helper interface, e.g.:
tf.contrib.seq2seq.TrainingHelper returns the next decoder input (which is usually the ground truth). This helper is used in training, as indicated in the tutorial.
tf.contrib.seq2seq.GreedyEmbeddingHelper discards the inputs and returns the argmax token from the previous output. NMT uses this helper in inference when the sampling_temperature hyperparameter is 0.
tf.contrib.seq2seq.SampleEmbeddingHelper does the same, but samples the token according to a categorical (a.k.a. generalized Bernoulli) distribution. NMT uses this helper in inference when sampling_temperature > 0.
The code is in BaseModel._build_decoder method.
Note that both GreedyEmbeddingHelper and SampleEmbeddingHelper don't care what the decoder input is. So in fact you can feed anything, but the zero tensor is the standard choice.
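The feedback loop that the greedy helper performs can be sketched without TensorFlow. This is a toy illustration, not the tf.contrib.seq2seq API: TRANSITIONS stands in for a trained decoder cell, and decoder_step, greedy_decode and the token ids are hypothetical.

```python
def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)

# Toy stand-in for a trained decoder cell: row i holds the scores for the
# token that follows token i. Token 0 is <start>, token 3 is <end>.
TRANSITIONS = [
    [0.0, 2.0, 1.0, 0.1],
    [0.0, 0.1, 3.0, 0.5],
    [0.0, 0.2, 0.3, 4.0],
    [0.0, 0.0, 0.0, 1.0],
]

def decoder_step(token, state):
    """Return (scores, new_state). The state passes through untouched in this toy."""
    return TRANSITIONS[token], state

def greedy_decode(encoder_state, start_token=0, end_token=3, max_len=10):
    """Feed each predicted token back in as the next decoder input."""
    token, state, output = start_token, encoder_state, []
    for _ in range(max_len):
        scores, state = decoder_step(token, state)
        token = argmax(scores)      # previous prediction becomes the next input
        if token == end_token:
            break
        output.append(token)
    return output

decoded = greedy_decode(encoder_state=None)
```

The only external inputs are the encoder state and the start token; every later decoder input is produced by the loop itself, which is why no ground-truth decoder input is needed at inference time.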

How to get value of a tensor from a Tensorflow Model

I am using the following implementation of the Seq2Seq model. Now, if I want to pass some inputs and get the corresponding values of encoder's hidden state (self.encoder_last_state), how can I do it?
You first need to assemble input_feed, as in the predict routine. Once you have that, just execute sess.run over the required hidden layer.
To assemble the input_feed:
input_feed = self.check_feeds(encoder_inputs, encoder_inputs_length, decoder_inputs=None, decoder_inputs_length=None, decode=True)
input_feed[] = 1.0
Then execute sess.run over self.encoder_last_state:
encoder_last_state_activations = sess.run(self.encoder_last_state, input_feed)

How to use pre-trained model as non trainable sub network in tensorflow?

I'd like to train a network that contains a sub-network that needs to stay fixed during training. The basic idea is to prepend and append some layers to the pre-trained network (inceptionV3)
new_layers -> pre-trained and fixed sub-net (inceptionv3) -> new_layers
and run the training process for the task I have without changing the pre-trained one.
I also need to branch directly on some layer of the pre-trained network. For example, with inceptionV3 I would like to use it from the conv 299x299 to the last pool layer, or from the conv 79x79 to the last pool layer.
Whether or not a "layer" is trained is determined by whether the variables used in that layer get updated with gradients. If you are using the Optimizer interface to optimize your network, then you can simply not pass the variables used in the layers that you want to keep fixed to the minimize function, i.e.,
opt.minimize(loss, var_list=<subset of variables you want to train>)
If you are using the tf.gradients function directly, then remove the variables that you want to keep fixed from the second argument to tf.gradients.
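The effect of restricting the variable list can be illustrated with a toy update rule in plain Python. This is a sketch under assumptions, not TensorFlow code: sgd_step, the parameter names and the "inception/" scope prefix are hypothetical, and the gradients are made-up numbers.

```python
def sgd_step(params, grads, trainable, lr=0.1):
    """Update only the parameters named in `trainable`; leave the rest untouched."""
    return {
        name: value - lr * grads[name] if name in trainable else value
        for name, value in params.items()
    }

# Hypothetical variables: new layers before and after a frozen pre-trained block.
params = {"new_in/w": 1.0, "inception/conv1/w": 2.0, "new_out/w": 3.0}
grads = {"new_in/w": 0.5, "inception/conv1/w": 0.5, "new_out/w": 0.5}

# Select everything outside the pre-trained scope as trainable.
trainable = {name for name in params if not name.startswith("inception/")}
updated = sgd_step(params, grads, trainable)
```

Gradients still flow through the frozen block to reach the layers in front of it; the block's own variables simply never receive an update.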
Now, how you "branch directly" to a layer of a pre-trained network depends on how that network is implemented. I would simply locate the convolution call (e.g. tf.nn.conv2d) for the 299x299 layer you are talking about and pass the output of your new layer as its input. On the output side, locate the 79x79 layer and use its output as the input to your new layer.