When we add a bidirectional RNN layer, I understand that we have to concatenate the hidden states. If we use a bidirectional RNN layer in an encoder-decoder model, do we have to train the bidirectional RNN layer separately?
No. To quote from the abstract of Bidirectional Recurrent Neural Networks by Schuster and Paliwal:
> The BRNN can be trained without the limitation of using input information just up to a preset future frame. This is accomplished by training it simultaneously in positive and negative time direction.
I guess you are talking about tf.nn.static_bidirectional_rnn.
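In other words, no separate training pass is needed: both directions are part of the same graph, and one optimizer step updates them together with everything downstream. A minimal sketch of a jointly trained encoder-decoder, using the Keras `Bidirectional` wrapper rather than the older `tf.nn.static_bidirectional_rnn` (the vocabulary sizes and unit counts below are made up):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_in, vocab_out, units = 5000, 5000, 128  # illustrative sizes

# Encoder: a bidirectional LSTM, trained jointly with the decoder.
enc_inputs = layers.Input(shape=(None,))
enc_emb = layers.Embedding(vocab_in, units)(enc_inputs)
_, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(units, return_state=True))(enc_emb)

# Concatenate forward/backward states so the decoder sees both directions.
state_h = layers.Concatenate()([fh, bh])
state_c = layers.Concatenate()([fc, bc])

# Decoder: its LSTM is 2*units wide to accept the concatenated states.
dec_inputs = layers.Input(shape=(None,))
dec_emb = layers.Embedding(vocab_out, units)(dec_inputs)
dec_out = layers.LSTM(2 * units, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])
logits = layers.Dense(vocab_out)(dec_out)

model = Model([enc_inputs, dec_inputs], logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# A single model.fit call trains both directions of the encoder
# and the decoder in one backward pass.
```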
Related
I have multi-source remote sensing data that I would like to use as multiple input layers, with only one output layer; the model is a CNN or DBN. Can this be achieved?
Please help me with an idea.
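A multi-input model with a single output head is achievable. A minimal Keras sketch, assuming two hypothetical sources (the input shapes, layer sizes, and class count are made up):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Two hypothetical remote sensing sources.
optical = layers.Input(shape=(64, 64, 3), name="optical")
radar = layers.Input(shape=(64, 64, 1), name="radar")

def cnn_branch(x):
    # Small per-source CNN; depths are illustrative only.
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    return layers.GlobalAveragePooling2D()(x)

# Merge the per-source features, then one shared output layer.
merged = layers.Concatenate()([cnn_branch(optical), cnn_branch(radar)])
output = layers.Dense(10, activation="softmax")(merged)  # e.g. 10 classes

model = Model(inputs=[optical, radar], outputs=output)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```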
I have a question about the architecture of Keras layers, about how to unfold what a layer actually computes. For example, I would like to see the main equation used by the Dense layer in Keras: how the number of neurons, the activation function, and the bias are tied to each other. This seems logically simple to understand for the Dense layer. However, finding the multiplication performed inside the tfa.ESN layer, for instance, would be very helpful for me.
Thanks to everyone,
J
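For the Dense case mentioned above, Keras computes `output = activation(dot(input, kernel) + bias)`; one general way to see any layer's exact math is to read its `call()` method in the source, which works for the ESN layer in TensorFlow Addons as well. A small sketch verifying the Dense equation by hand:

```python
import numpy as np
import tensorflow as tf

# Dense layer: output = activation(inputs @ kernel + bias)
layer = tf.keras.layers.Dense(4, activation="relu")
x = tf.random.normal((2, 3))  # batch of 2 samples, 3 features each
y = layer(x)                  # first call builds the kernel and bias

# Reproduce the same computation by hand from the layer's weights.
kernel, bias = layer.get_weights()                     # shapes (3, 4), (4,)
y_manual = np.maximum(x.numpy() @ kernel + bias, 0.0)  # relu by hand
print(np.allclose(y.numpy(), y_manual))                # True
```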
I am trying to apply Grad-CAM to my pre-trained CNN model to generate heat maps of its layers. My custom CNN design is as follows:
- It adopts all the convolution layers and the pre-trained weights from the VGG16 model.
- Lower-level features (early convolution layers) are extracted from VGG16.
- Fully connected layers are trained on both the normal/high-level and the lower-level features from VGG16.
- The outputs of the normal/high-level and lower-level f.c. layers are concatenated, and more f.c. layers are trained before the final prediction.
[figure: model design]
I want to use Grad-CAM to visualize the feature maps of the low-level route and the normal/high-level route. I have already produced such heat maps on a non-concatenated, fine-tuned VGG using the last convolutional layers. My question is: on a concatenated CNN model, can the Grad-CAM method still work, using the gradient of the prediction with respect to the low-level and high-level feature maps respectively? If not, are there other methods that can visualize heat maps for such a model? Is using the shared fully connected layer an option?
Any idea and suggestions are much appreciated!
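Grad-CAM only needs the gradient of the class score with respect to one chosen feature map, so in principle it extends to a multi-branch model by running it once per route. A minimal sketch, assuming a single-input, single-output Keras model and hypothetical layer names:

```python
import tensorflow as tf

def grad_cam(model, image, conv_layer_name):
    """Grad-CAM heat map for one conv layer; the tape follows the
    actual graph, so branches and concatenations are handled."""
    conv_layer = model.get_layer(conv_layer_name)
    grad_model = tf.keras.Model(model.inputs,
                                [conv_layer.output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])    # add batch dim
        score = tf.gather(preds[0], tf.argmax(preds[0]))  # top-class score
    grads = tape.gradient(score, conv_out)        # d(score)/d(feature map)
    weights = tf.reduce_mean(grads, axis=(1, 2))  # global-average-pool grads
    cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)
    cam = tf.nn.relu(cam)[0]
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # normalize to [0, 1]

# One call per route; the layer names here are hypothetical:
# low_map  = grad_cam(model, img, "low_level_last_conv")
# high_map = grad_cam(model, img, "high_level_last_conv")
```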
I am trying to use Tensorflow for transfer learning using a pre-trained VGG16 model.
However, the input to the model in my problem is an RGB image with an extra channel functioning as a binary mask. This is different from the original input on which the model was trained (224x224 RGB images).
I think that using the pretrained model is still possible in this case. How do I assign weights to the connections between the first convolutional layer and the extra channel? Is transfer learning still applicable in such a scenario?
Thanks!
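One common approach, sketched here under assumptions rather than as the only option, is to rebuild the network with a 4-channel input, copy the pretrained RGB kernels, and initialize the fourth channel's kernel slice separately, e.g. with zeros or with the mean of the RGB kernels:

```python
import numpy as np
import tensorflow as tf

# Build VGG16 twice: once to read the pretrained RGB weights, once
# with a 4-channel input (weights=None, since the shapes differ).
rgb_vgg = tf.keras.applications.VGG16(weights="imagenet", include_top=False)
new_vgg = tf.keras.applications.VGG16(weights=None, include_top=False,
                                      input_shape=(224, 224, 4))

for rgb_layer, new_layer in zip(rgb_vgg.layers, new_vgg.layers):
    if not rgb_layer.get_weights():
        continue  # input/pooling layers carry no weights
    if rgb_layer.name == "block1_conv1":
        kernel, bias = rgb_layer.get_weights()      # kernel: (3, 3, 3, 64)
        # Heuristic: initialize the mask channel with the mean of the
        # RGB kernels; zeros are another common choice.
        extra = kernel.mean(axis=2, keepdims=True)  # (3, 3, 1, 64)
        new_layer.set_weights([np.concatenate([kernel, extra], axis=2), bias])
    else:
        new_layer.set_weights(rgb_layer.get_weights())
```

After copying, the network can be fine-tuned as usual; the first conv layer then learns how much weight to give the mask channel during training.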
I am working on a project to localize an object in an image. The method I am going to adopt is based on the localization algorithm in CS231n-8.
The network structure has two optimization heads, a classification head and a regression head. How can I minimize both of them when training the network?
One idea I have is to sum both of them into a single loss. But the problem is that the classification loss is a softmax (cross-entropy) loss and the regression loss is an L2 loss, which means they have different ranges. I don't think this is the best way.
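For reference, the single-loss idea can be made workable by weighting the two losses. A minimal Keras sketch with per-head losses and `loss_weights` (the model, layer names, and the 0.1 weight below are all illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Toy backbone with two heads; sizes are illustrative only.
inputs = layers.Input(shape=(64, 64, 3))
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.GlobalAveragePooling2D()(x)
cls = layers.Dense(10, activation="softmax", name="cls")(x)
reg = layers.Dense(4, name="reg")(x)  # e.g. box coordinates

model = Model(inputs, [cls, reg])
model.compile(
    optimizer="adam",
    loss={"cls": "sparse_categorical_crossentropy", "reg": "mse"},
    # loss_weights rescale the two losses so neither dominates;
    # 0.1 is an illustrative value, not a recommendation.
    loss_weights={"cls": 1.0, "reg": 0.1},
)
```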
It depends on the state of your network.
If your network is already able to extract features (i.e., you're reusing weights kept from some other net), you can set these weights to be constants and then train the two heads separately, since the gradient will not flow through the constants.
If you're not using weights from a pre-trained model, you have to:
1. Train the network to extract features: train it using the classification head and let the gradient flow from the classification head down to the first convolutional filter. In this way your network can classify objects by combining the extracted features.
2. Convert the learned weights of the convolutional filters and the classification head to constant tensors, then train the regression head. The regression head will learn to combine the features extracted by the convolutional layers, adapting its parameters in order to minimize the L2 loss.
Tl;dr:
1. Train the network for classification first.
2. Convert every learned parameter to a constant tensor, using graph_util.convert_variables_to_constants as shown in the `freeze_graph` script.
3. Train the regression head.
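A minimal sketch of the same two-phase idea in Keras, using `trainable = False` instead of converting variables to constant tensors (the toy model matches the one above; the `fit` calls are commented out since the data is hypothetical):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(64, 64, 3))
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.GlobalAveragePooling2D()(x)
cls_head = layers.Dense(10, activation="softmax", name="cls")(x)
reg_head = layers.Dense(4, name="reg")(x)

# Phase 1: train the backbone and the classification head.
cls_model = Model(inputs, cls_head)
cls_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# cls_model.fit(images, labels, epochs=...)

# Phase 2: freeze everything trained so far; the gradient then stops
# at the frozen layers, much like training against constants.
for layer in cls_model.layers:
    layer.trainable = False  # 'reg' is not in cls_model, so it stays trainable

reg_model = Model(inputs, reg_head)
reg_model.compile(optimizer="adam", loss="mse")  # L2-style loss
# reg_model.fit(images, boxes, epochs=...)
```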