Feed input to an intermediate layer and then do backpropagation in Keras/TensorFlow

I have looked everywhere but could not find a way to do this.
Basically, I want to feed input to some intermediate layer in a Keras model and do backpropagation through the full graph (i.e. including the layers before the intermediate layer). To illustrate, see the figure in the paper "Multi-view Convolutional Neural Networks for 3D Shape Recognition".
In the figure you can see that the features are max-pooled in a view-pooling layer and the resulting vector is passed to the rest of the network.
According to the paper, they then perform backpropagation through the view-pooling features.
To achieve this I am trying a simple approach. There will not be any view-pooling layer in my model. Instead, I will do the pooling offline by extracting the features for multiple views and taking their element-wise max. Finally, the aggregated feature will be passed to the rest of the network. However, I cannot figure out how to backpropagate through the full network when the input is passed directly to an intermediate layer.
Any help would be appreciated. Thanks

If you have the code of the TensorFlow model, then this is quite simple. The model would probably look like this:
    def model(cnns):
        viewpool_output = f(cnns)  # f: the view-pooling step over the per-view CNN outputs
        cnn2_output = cnn2(viewpool_output)
        return cnn2_output
You would just need to change the model to:
    def model(viewpool_output):
        cnn2_output = cnn2(viewpool_output)
        return cnn2_output
and instead of passing a "real" view pool output, you just pass whatever image you want. But you haven't given any code, so we can only guess at what it looks like.
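One point worth making explicit: if the max over views is computed offline (e.g. in NumPy), the layers before the pooling step are not part of the graph, so no gradient can reach them. To backpropagate through the full network, the pooling has to sit inside the model. The minimal NumPy sketch below (function names and numbers are hypothetical) shows what such a view-pooling step does: the forward pass takes an element-wise max across views, and the backward pass routes each gradient entry only to the view that produced the maximum. When there are no ties, this is what autodiff in Keras/TensorFlow does for a max reduction placed inside the model.

```python
import numpy as np

def view_pool_forward(view_features):
    """Element-wise max over the view axis.

    view_features: array of shape (num_views, num_features), one row per view.
    Returns the pooled vector and the index of the winning view per feature.
    """
    winners = np.argmax(view_features, axis=0)
    pooled = view_features.max(axis=0)
    return pooled, winners

def view_pool_backward(grad_pooled, winners, num_views):
    """Route the upstream gradient back to the view that produced each max."""
    num_features = grad_pooled.shape[0]
    grad_views = np.zeros((num_views, num_features))
    grad_views[winners, np.arange(num_features)] = grad_pooled
    return grad_views

# Three views, four features each (made-up numbers for illustration).
views = np.array([[0.1, 0.9, 0.3, 0.0],
                  [0.5, 0.2, 0.8, 0.4],
                  [0.7, 0.1, 0.2, 0.6]])
pooled, winners = view_pool_forward(views)
grad = view_pool_backward(np.ones(4), winners, num_views=3)
print(pooled)  # [0.7 0.9 0.8 0.6]
print(grad)    # non-zero only where a view won the max
```

Non-winning views receive zero gradient for that feature, which is why the per-view CNN still gets trained end to end once the max is part of the graph.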


Training TensorFlow to recognize only one object

Following the TensorFlow documentation, I trained a model on 3 objects and got results (it can recognize these objects). When I show it other objects (not one of the 3), it doesn't work correctly.
I want to train on only one object (for example, a cup) and recognize only this object. Is it possible to do this with TensorFlow?
Your question doesn't provide enough details, but I guess you trained the network with a softmax activation and a categorical or sparse categorical cross-entropy loss. If my guess is right, such a network always predicts one of the three classes regardless of the actual input, i.e. there is no "none of these" option.
To train the network to recognize only one class of object, use a single output with one unit and a sigmoid activation. Use a binary cross-entropy loss (BinaryCrossentropy in Keras) to train your model for the specific object, and provide a dataset that includes examples both with this object and without it.
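To make the softmax-vs-sigmoid point concrete, here is a small NumPy sketch (all numbers are made up, and the 0.5 threshold is an arbitrary choice). Softmax outputs always sum to 1, so the argmax names one of the known classes even for an unrelated object, whereas a single sigmoid output can say "no" by falling below the threshold. The loss function is the standard binary cross-entropy formula; in Keras the corresponding pieces would be a final Dense(1, activation="sigmoid") layer and the BinaryCrossentropy loss.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, p, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Low, nearly equal logits, as you might get for an unrelated object.
logits = np.array([0.2, 0.1, 0.15])
probs = softmax(logits)
# Softmax still sums to 1, so argmax always names one of the three classes.
print(probs.sum(), probs.argmax())

# Single sigmoid output: "is this a cup?" with a 0.5 decision threshold.
cup_logit = -2.0  # hypothetical output of the final one-unit layer
p_cup = sigmoid(cup_logit)
print(p_cup > 0.5)  # False -> "not a cup"

# Binary cross-entropy on a tiny labelled batch (1 = cup, 0 = not a cup).
labels = np.array([1.0, 0.0, 0.0, 1.0])
scores = sigmoid(np.array([3.0, -2.5, -1.0, 0.5]))
print(binary_cross_entropy(labels, scores))
```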

How can I use dropout in Conv Layer to drop activation maps in tensorflow?

I am trying to add dropout to convolutional layers (although it seems people don't do this a lot).
CS231n recommends dropping entire activation maps instead of individual units within them (I think this makes sense, because each activation map extracts the same feature at different positions).
In TensorFlow, I can't find an API that does this directly, so how can I do it?
This is my first question on Stack Overflow, and I would appreciate any advice and answers.
You can actually do this with the available dropout functions via the noise_shape argument. E.g. using the layers API:
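    # is_training: a Python bool or bool tensor you define; tf.layers.dropout is a no-op when training=False (the default)
    x = tf.layers.dropout(x, rate=0.5, noise_shape=[batch_size, 1, 1, features], training=is_training)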
This is for 2D convolution in channels_last format. Only one noise value is generated across the image height and width, and it is broadcast over those dimensions; a separate noise value is still generated for each example and each feature/activation map.
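A NumPy sketch of the same idea (the helper name is hypothetical): one Bernoulli draw per (example, channel) with shape (N, 1, 1, C), broadcast over height and width, with the kept maps rescaled by 1/keep_prob as in TensorFlow's inverted dropout. For what it's worth, Keras also ships this behaviour as tf.keras.layers.SpatialDropout2D.

```python
import numpy as np

def channel_dropout(x, rate, rng):
    """Drop whole feature maps of an NHWC array (inverted dropout)."""
    batch, _, _, channels = x.shape
    keep_prob = 1.0 - rate
    # One Bernoulli draw per (example, channel): shape (N, 1, 1, C),
    # the same role noise_shape=[batch_size, 1, 1, features] plays above.
    mask = rng.random((batch, 1, 1, channels)) < keep_prob
    # Broadcasting copies each draw over height and width; kept maps are
    # rescaled by 1/keep_prob so the expected activation is unchanged.
    return x * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((2, 4, 4, 8))
y = channel_dropout(x, rate=0.5, rng=rng)

# Every feature map is now either all zeros or all 2.0.
per_map = y.reshape(2, 16, 8)  # merge H and W, keep (example, channel)
print(bool((per_map.min(axis=1) == per_map.max(axis=1)).all()))  # True
```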

How to input scalar (non-image) values to CNNs?

The classification task is based on an image and a scalar value.
If I encoded the scalar value as image pixels with that value (or a normalized version of it) and appended it as another channel of the input image, I would be wasting convolutional computation on that encoding just to get this information into the network.
On the other hand, I could feed it as an extra neuron in the layer where the convolutional feature maps are flattened. Another option would be to add it just before the output layer. (But how do I implement such a network in Keras or TensorFlow?)
Which is the best way to feed in scalar values?
PS: Although this question is not specific to any framework, Keras examples would be great, since they are simple enough for most people to understand. Links to blogs addressing the same topic are welcome too.
See this question and answer on the Cross Validated site: Combining image and scalar inputs into a neural network
In addition to the "bias" method suggested in the paper mentioned there (where the scalar is used as a bias for some convolutional layer), and the option suggested in the answer of appending the scalar to some flattened layer, you can also use an inner-product (fully connected, "Dense" in Keras) layer to learn the connectivity pattern between the N-D input and the scalar.
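Here is a minimal tf.keras functional-API sketch of the "append the scalar to the flattened layer" option. The input shapes, layer sizes, and the sigmoid output for a binary task are all assumptions for illustration; the key pieces are two Input layers and a Concatenate layer that joins the flattened conv features with the scalar.

```python
import numpy as np
import tensorflow as tf

# Two inputs: an image and one scalar per example (shapes are assumptions).
image_in = tf.keras.Input(shape=(32, 32, 3), name="image")
scalar_in = tf.keras.Input(shape=(1,), name="scalar")

# The convolutional branch sees only the image.
x = tf.keras.layers.Conv2D(16, 3, activation="relu")(image_in)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Flatten()(x)

# Append the scalar to the flattened conv features, then continue with Dense layers.
x = tf.keras.layers.Concatenate()([x, scalar_in])
x = tf.keras.layers.Dense(32, activation="relu")(x)
out = tf.keras.layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model(inputs=[image_in, scalar_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Random data, only to show how the two inputs are passed.
images = np.random.rand(4, 32, 32, 3).astype("float32")
scalars = np.random.rand(4, 1).astype("float32")
preds = model.predict([images, scalars], verbose=0)
print(preds.shape)  # (4, 1)
```

Moving the Concatenate further down, just before the output Dense layer, gives the other option mentioned in the question; the conv layers never see the scalar either way.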

Tensorflow: jointly training CNN + LSTM

There are quite a few examples on how to use LSTMs alone in TF, but I couldn't find any good examples on how to train CNN + LSTM jointly.
From what I see, it is not quite straightforward how to do such training, and I can think of a few options here:
First, I believe the simplest (or most primitive) solution would be to train the CNN independently to learn features and then train the LSTM on the CNN features without updating the CNN part, since one would probably have to extract and save these features in NumPy and then feed them to the LSTM in TF. But in that scenario, one would probably have to use a differently labeled dataset to pretrain the CNN, which eliminates the advantage of end-to-end training, i.e. learning features for the final objective targeted by the LSTM (besides the fact that one needs these additional labels in the first place).
The second option would be to concatenate all time slices in the batch dimension (4-d tensor), feed that to the CNN, then somehow repack those features into a 5-d tensor again as needed for training the LSTM, and then apply a cost function. My main concern is whether it is possible to do such a thing. Also, handling variable-length sequences becomes a bit tricky; for example, at prediction time you would feed only a single frame at a time. So I would be really happy to see some examples if this is the right way to do joint training. Besides that, this solution looks more like a hack, so if there is a better way, it would be great if someone could share it.
Thank you in advance!
For joint training, you can consider using tf.map_fn as described in the documentation https://www.tensorflow.org/api_docs/python/tf/map_fn.
Let's assume that the CNN is built along the lines of the one described here https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10.py.
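    import tensorflow as tf  # TensorFlow 1.x: tf.contrib was removed in 2.x

    # `inference` is the per-batch CNN from cifar10.py; `num_classes` is defined elsewhere.
    def joint_inference(sequence):
        # sequence: [time, batch, height, width, channels]; map_fn runs the CNN once per time step
        logit_sequence = tf.map_fn(inference, sequence, dtype=tf.float32, swap_memory=True)
        lstm_cell = tf.contrib.rnn.LSTMCell(128)
        # map_fn output is time-major; dynamic_rnn needs a dtype when no initial_state is given
        outputs, final_state = tf.nn.dynamic_rnn(cell=lstm_cell, inputs=logit_sequence, dtype=tf.float32, time_major=True)
        projection_function = lambda output: tf.contrib.layers.linear(output, num_outputs=num_classes, activation_fn=tf.nn.sigmoid)
        projection_logits = tf.map_fn(projection_function, outputs)  # one projection per time step
        return projection_logits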
Warning: if your model is larger than what GPU memory can hold, you might have to look into device placement as described here https://www.tensorflow.org/tutorials/using_gpu.
An alternative would be to flatten the video batch into an image batch, do a forward pass through the CNN, and reshape the features for the LSTM.
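The flatten/reshape alternative can be sketched in NumPy as follows; frame_features is a hypothetical stand-in for the CNN (here just per-channel means, so the result is easy to check). In TensorFlow the same two tf.reshape calls work inside the graph, and since reshape is differentiable, the LSTM loss backpropagates into the CNN weights, which is what makes the training joint.

```python
import numpy as np

def frame_features(images):
    """Hypothetical stand-in for a CNN: maps (N, H, W, C) images to (N, F).

    Here F = C (per-channel means), so each feature row can be traced back
    to the frame it came from.
    """
    return images.mean(axis=(1, 2))

batch, steps, h, w, c = 2, 5, 8, 8, 3
videos = np.random.default_rng(0).random((batch, steps, h, w, c))

# 1. Fold time into the batch axis: (B, T, H, W, C) -> (B*T, H, W, C).
frames = videos.reshape(batch * steps, h, w, c)
# 2. One CNN forward pass over every frame of every video.
feats = frame_features(frames)              # (B*T, F)
# 3. Unfold back into one sequence per video for the LSTM: (B, T, F).
sequences = feats.reshape(batch, steps, -1)

# Row (b, t) of the sequence is the feature vector of frame t of video b.
print(np.allclose(sequences[1, 3], frame_features(videos[1, 3][None])[0]))  # True
```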

How to use pre-trained model as non trainable sub network in tensorflow?

I'd like to train a network that contains a sub-network that must stay fixed during training. The basic idea is to prepend and append some layers to the pre-trained network (InceptionV3):
new_layers -> pre-trained and fixed sub-net (InceptionV3) -> new_layers
and run the training process for my task without changing the pre-trained part.
I also need to branch directly into some layer of the pre-trained network. For example, with InceptionV3 I'd like to use it from the 299x299 conv to the last pool layer, or from the 79x79 conv to the last pool layer.
Whether or not a "layer" is trained is determined by whether the variables used in that layer get updated with gradients. If you are using the Optimizer interface to optimize your network, then you can simply leave the variables of the layers you want to keep fixed out of the var_list argument of minimize, i.e.,
    opt.minimize(loss, var_list=<subset of variables you want to train>)
(passing the list positionally does not work, because the second positional parameter of minimize is global_step). If you are using the tf.gradients function directly, remove the variables you want to keep fixed from its second argument (xs).
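The effect of restricting the variable list can be illustrated with a hypothetical two-stage linear model in NumPy: the loss depends on both weight matrices, but only the names listed in trainable are updated, so the "pre-trained" matrix stays exactly as it was. The model, data, and learning rate are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical model: frozen "pre-trained" layer, then a trainable head.
X = rng.normal(size=(16, 4))
y = rng.normal(size=(16, 1))
params = {"frozen": 0.5 * rng.normal(size=(4, 3)),
          "head": rng.normal(size=(3, 1))}
trainable = ["head"]  # the analogue of var_list / the xs argument

def loss_and_grads(params):
    h = X @ params["frozen"]        # output of the frozen sub-network
    err = h @ params["head"] - y
    loss = np.mean(err ** 2)        # mean squared error
    d_out = 2 * err / len(X)        # dLoss/dPrediction
    grads = {"head": h.T @ d_out,
             "frozen": X.T @ (d_out @ params["head"].T)}
    return loss, grads

frozen_before = params["frozen"].copy()
head_before = params["head"].copy()
initial_loss, _ = loss_and_grads(params)
for _ in range(100):
    _, g = loss_and_grads(params)
    for name in trainable:          # gradients exist for both, only these are applied
        params[name] = params[name] - 0.01 * g[name]
final_loss, _ = loss_and_grads(params)
print(final_loss < initial_loss)    # True
```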
Now, how you "branch directly" into a layer of a pre-trained network depends on how that network is implemented. I would simply locate the tf.nn.conv2d call (or equivalent convolution layer) for the 299x299 layer you mention and pass the output of your new layer as its input; on the output side, locate the 79x79 layer and use its output as the input to your new layer.