Is it possible to freeze one LSTM layer and build another LSTM layer on top of it?
The idea is quite simple, but the mechanics of TensorFlow make it harder. All I need to do is build one LSTM layer and save the model. Then I restore the model to get the kernel (weight) matrix and biases using:
tf.get_default_graph().get_tensor_by_name("rnn/multi_rnn_cell/cell_0/LSTM_cell/kernel:0")
tf.get_default_graph().get_tensor_by_name("rnn/multi_rnn_cell/cell_0/LSTM_cell/bias:0")
Then I want to take these two tensors, keep them untrained, build another LSTM layer on top of them, and fetch only the variables of the second layer.
However, the only related topic I have come across is here, and it says this is not possible because I cannot set the weights manually.
Doing this is super simple in a feed-forward neural network.
Does anyone have any idea?
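For what it's worth, in the Keras API (TF 2.x) the stacking-with-a-frozen-first-layer part is directly supported via the `trainable` flag. A minimal sketch; the layer sizes and input shape are illustrative, not taken from the question:

```python
# Sketch: freeze the first LSTM layer and stack a new, trainable one on top.
import tensorflow as tf

frozen_lstm = tf.keras.layers.LSTM(32, return_sequences=True)
frozen_lstm.trainable = False  # its kernel/recurrent_kernel/bias stay fixed

inputs = tf.keras.Input(shape=(10, 8))   # (timesteps, features)
x = frozen_lstm(inputs)                  # frozen first layer
x = tf.keras.layers.LSTM(16)(x)          # new, trainable second layer
outputs = tf.keras.layers.Dense(1)(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```

In a real run you would also copy the restored kernel and bias into `frozen_lstm` with `set_weights()` before training, so the frozen layer actually carries the pre-trained values.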
Related
I have been playing around with neural networks for quite a while now, and recently came across the terms "freezing" and "unfreezing" layers before training a neural network while reading about transfer learning, and I am struggling to understand their usage.
When is one supposed to use freezing/unfreezing?
Which layers are to be frozen/unfrozen? For instance, when I import a pre-trained model and train it on my data, is my entire neural net except the output layer frozen?
How do I determine if I need to unfreeze?
If so, how do I determine which layers to unfreeze and train to improve model performance?
I would just add to the other answer that this is most commonly used with CNNs, and that the number of layers you want to freeze (not train) is "given" by the degree of similarity between the task you are solving and the original one (the task the original network was solving).
If the tasks are very similar, say you are using a CNN pretrained on ImageNet and you just want the network to recognize some additional "general" objects, then you might get away with training just the dense top of the network.
The more dissimilar the tasks are, the more layers of the original network you will need to unfreeze during the training.
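A minimal Keras sketch of the "train just the dense top" setup described above, with a tiny stand-in for the pretrained base (a real case would use e.g. an ImageNet model from `tf.keras.applications`):

```python
import tensorflow as tf

# Tiny stand-in for a pretrained convolutional base.
conv_base = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
])
conv_base.trainable = False  # freeze the whole base in one go

inputs = tf.keras.Input(shape=(32, 32, 3))
x = conv_base(inputs)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)  # new dense top

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Only the new dense layer's two weight tensors end up trainable; the base's convolutional weights stay fixed.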
Freezing a layer means that it will not be trained, so its weights will not change.
Why do we need to freeze such layers?
Sometimes we want a sufficiently deep NN but don't have enough time to train it, so we use pretrained models that already have useful weights. Good practice is to freeze layers starting from the input side; for example, you can freeze the first 10 layers, and so on.
For instance, when I import a pre-trained model & train it on my data, is my entire neural-net except the output layer freezed?
- Yes, that may be the case, but you can also leave a few layers just above the last one unfrozen.
How do I freeze and unfreeze layers?
- In Keras, if you want to freeze a layer, use: layer.trainable = False
And to unfreeze: layer.trainable = True
If so how do I determine which layers to unfreeze & train to improve model performance?
- As I said, good practice is to go from top to bottom. You should tune the number of frozen layers yourself, but take into account that the more unfrozen layers you have, the slower training will be.
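A concrete sketch of the freeze/unfreeze pattern above (layer sizes are arbitrary): freeze everything first, then unfreeze the last few layers, and recompile so the change takes effect:

```python
import tensorflow as tf

model = tf.keras.Sequential(
    [tf.keras.layers.Dense(32, activation="relu") for _ in range(6)]
)
model.build((None, 16))

for layer in model.layers:        # freeze everything...
    layer.trainable = False
for layer in model.layers[-2:]:   # ...then unfreeze the last two layers
    layer.trainable = True

# Recompile after changing `trainable`, otherwise training won't pick it up.
model.compile(optimizer="adam", loss="mse")
```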
During transfer learning, we freeze certain layers for several reasons: they might have already converged, or we want to train only the newly added layers on top of an already pre-trained model. This is a really basic concept of transfer learning, and I suggest you go through this article if you have no idea about transfer learning.
I am using Google's Dopamine framework to train a specific reinforcement-learning use case. I am using an autoencoder to pre-train the convolutional layers of the Deep Q-Network and then transfer those pre-trained weights to the final network.
To that end, I have created a separate model (in this case an auto-encoder) which I train and save the resulting model and weights.
The DQN model is created using Keras's model sub-classing method, and the model used to save the trained convolutional-layer weights was built using the Sequential API. My issue arises when loading the pre-trained weights into the final DQN model. Depending on whether I use the load_model() or load_weights() functionality from TensorFlow's API, I get two different overall behaviors from my network, and I would like to understand why. Specifically, I have the following two scenarios:
1. Loading the weights into the final model with the load_weights() method. The weights are those of the encoder plus one additional layer (added just before saving the weights) to fit the architecture of the final network implemented in Dopamine, where they are loaded.
2. First load the saved model with load_model(), and then, when defining the new model in the __init__() method, extract the relevant layers from the loaded model and use them in the final model.
Overall, I would expect the two approaches to yield similar results with regard to the average reward achieved per episode, since I use the same pre-trained weights. However, the two approaches differ (1. yields a higher average reward than 2., although both use the same pre-trained weights) and I don't understand why.
Furthermore, to validate this behavior, I tried loading random weights with the two aforementioned approaches to see whether the behavior changes. In both cases, each loading method produces behavior very similar to its respective case with the trained weights. It seems as if the pre-trained weights have no effect on the overall training behavior in either case. This might be irrelevant to the issue I am investigating here, as it might simply be that the pre-trained weights offer no benefit at all, which is also possible.
Any thoughts and ideas on this would be much appreciated.
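One way to narrow this down is to verify that the weights really are identical after a given loading path, before comparing average rewards. A minimal sketch, with a stand-in conv stack in place of the real encoder/DQN and an illustrative file name:

```python
import numpy as np
import tensorflow as tf

def make_conv_stack():
    """Stand-in for the encoder's convolutional layers."""
    m = tf.keras.Sequential([
        tf.keras.layers.Conv2D(8, 3, activation="relu"),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
    ])
    m.build((None, 28, 28, 1))
    return m

pretrained = make_conv_stack()
pretrained.save_weights("encoder.weights.h5")  # illustrative file name

# Approach 1: a fresh model of identical architecture + load_weights().
fresh = make_conv_stack()
fresh.load_weights("encoder.weights.h5")

# Sanity check: did the weights actually arrive?  A silent mismatch in
# names, shapes, or layer order would show up here as False.
transferred = all(
    np.allclose(a, b)
    for a, b in zip(pretrained.get_weights(), fresh.get_weights())
)
print(transferred)
```

Running the same comparison after the load_model()-and-extract-layers route would show whether the two approaches really start from identical weights, which would isolate whether the reward difference comes from the loading step at all.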
I am looking to train a large face identification network (ResNet or VGG-16/19) with TensorFlow 1.14.
My question is: if I run out of GPU memory, is it a valid strategy to train sets of layers one by one?
For example, train two convolutional layers and a max-pooling layer as one set, then "freeze the weights" somehow and train the next set, etc.
I know I can train on multiple GPUs in TensorFlow, but what if I want to stick to just one GPU?
The usual approach is to use transfer learning: use a pretrained model and fine-tune it for the task.
For fine-tuning in computer vision, a well-known approach is re-training only the last couple of layers. See, for example:
https://www.learnopencv.com/keras-tutorial-fine-tuning-using-pre-trained-models/
I may be wrong, but even if you freeze your weights, they still need to be loaded into memory (you need to do the whole forward pass in order to compute the loss).
Comments on this are appreciated.
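The question targets TF 1.14, where one would pass a var_list to optimizer.minimize(); a sketch of the same idea in TF 2.x eager style, with made-up layer sizes. It also illustrates the memory point above: the frozen layers still run in the forward pass, they just receive no updates.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.build((None, 8))

x = tf.random.normal((4, 8))
y = tf.random.normal((4, 1))
opt = tf.keras.optimizers.SGD(0.1)

# Snapshot the first layer's weights to verify later that it stayed fixed.
before = [v.numpy().copy() for v in model.layers[0].trainable_variables]

# Pass only the last layer's variables to the optimizer; everything else
# still computes activations (and so still occupies memory) but is frozen.
train_vars = model.layers[-1].trainable_variables

with tf.GradientTape() as tape:
    loss = tf.reduce_mean((model(x) - y) ** 2)
grads = tape.gradient(loss, train_vars)
opt.apply_gradients(zip(grads, train_vars))
```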
I am aiming to build an RNN in Keras/TensorFlow that consists of layers of recurrent units (GRU, LSTM, etc.) as well as a loop from the bottom of the network to the top, to add an attention mechanism or special memory types. I am not familiar with symbolic loops, so first I tried to build an unrolled model along these lines:
As far as I can see, what I would need is an RNN layer with two input tensors and two output tensors, as I would have to "route" the internal input/output of the RNN layers (green) myself in order to unroll these connections at the same time as the big loop (blue).
I can handle implementing the unrolled big loop with a concat layer and a custom splitting layer ( https://github.com/keras-team/keras/issues/890 ), but with the RNN layers I ran into a problem, as I don't seem to be able to simulate them using more primitive layers (Dense, Activation, etc.). Before reimplementing them, including the backprop step, in a way that lets me specify separate tensors as their external and internal inputs, is there a better way to do this, possibly by reusing existing code?
The project at https://github.com/csirmaz/superloop allows implementing RNNs with such a big loop. It seems to use its own implementation of RNN layers to get two inputs and two outputs.
I want to be able to do inferential analysis on my neural network by accessing the weights and making decisions based on what I find. In other words, given the list of weights of a particular neuron in a hidden layer, I want to be able to manipulate that neuron in any way I want and mess with its output. I'm using TensorFlow for my neural network.
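With the Keras API this is straightforward via get_weights()/set_weights(). A sketch with a tiny made-up model: read one hidden neuron's incoming weights, then silence that neuron by zeroing them:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),  # hidden layer
    tf.keras.layers.Dense(1),
])
model.build((None, 3))

hidden = model.layers[0]
kernel, bias = hidden.get_weights()  # numpy copies: kernel (3, 4), bias (4,)

incoming = kernel[:, 2]  # weights feeding hidden neuron 2; inspect freely

# Silence neuron 2 by zeroing its incoming weights and bias, then write
# the modified arrays back into the layer.
kernel[:, 2] = 0.0
bias[2] = 0.0
hidden.set_weights([kernel, bias])
```

Since get_weights() returns NumPy copies, the modify-then-set_weights() round trip is the standard way to edit weights by hand; after this, the neuron's output is exactly zero for any input.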