Combine layers from different neural networks - TensorFlow

I am using TensorFlow to train two instances of the same neural network on two different datasets. The network itself is quite simple: an input layer, an output layer and 6 hidden layers (each hidden layer has 20 neurons followed by a non-linear activation function).
I can train the network on the two different datasets, and that works fine. Now I want to create a new network that combines these two trained networks. In particular, I want the input layer and the first 3 hidden layers to come from one of the trained networks, and the last 3 hidden layers and the output layer to come from the other. I am very new to TensorFlow and have not found a way to do this. Can someone point me to the API, or some other way, to build this sort of hybrid network?

Constructing your networks with Keras makes this easy; see the Keras documentation on the functional API, which covers sharing and reusing layers across models.
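A minimal sketch of that idea with the Keras functional API, assuming both networks were built as tf.keras.Sequential models with the same layout. The input/output sizes, the ReLU activation and the make_net helper are hypothetical. Calling existing layer objects on a new input reuses them, trained weights included:

```python
import numpy as np
import tensorflow as tf

N_IN, N_OUT = 4, 1  # hypothetical input/output sizes


def make_net():
    # Six hidden layers of 20 units, each followed by a non-linearity (ReLU assumed).
    return tf.keras.Sequential(
        [tf.keras.Input(shape=(N_IN,))]
        + [tf.keras.layers.Dense(20, activation="relu") for _ in range(6)]
        + [tf.keras.layers.Dense(N_OUT)]
    )


net_a = make_net()  # assume: already trained on dataset A
net_b = make_net()  # assume: already trained on dataset B

# Sequential.layers omits the Input: indices 0-5 are the hidden layers, 6 is the output.
inputs = tf.keras.Input(shape=(N_IN,))
x = inputs
for layer in net_a.layers[:3]:  # first 3 hidden layers taken from net A
    x = layer(x)
for layer in net_b.layers[3:]:  # last 3 hidden layers + output layer taken from net B
    x = layer(x)
hybrid = tf.keras.Model(inputs, x)
```

Because the layers are shared rather than copied, further training of hybrid also changes net_a and net_b. If you want independent copies, you can duplicate each network first with tf.keras.models.clone_model(net) followed by copy.set_weights(net.get_weights()), and take the layers from the copies instead.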

You might be asking about the multitask-learning aspect. That can be simplified by separating the weight matrices of the variables trained on each dataset, summing the corresponding weight layers of the trained networks a and b individually into a shared_weight_layer variable, and finally evaluating the summed network as a multitask model.
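Read literally, this means combining the two trained networks weight array by weight array. A minimal sketch, assuming both networks have exactly the same architecture (make_net and its sizes are hypothetical). Independently trained networks have no guarantee that their hidden units line up, so the summed network should be evaluated on both datasets before you rely on it:

```python
import numpy as np
import tensorflow as tf


def make_net():
    # Hypothetical architecture; both networks must be built identically.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(20, activation="relu"),
        tf.keras.layers.Dense(1),
    ])


net_a, net_b = make_net(), make_net()  # assume: trained on datasets a and b

# Element-wise sum of corresponding weight arrays (kernels and biases, in layer order).
shared_weights = [wa + wb for wa, wb in zip(net_a.get_weights(), net_b.get_weights())]

# A fresh network with the same layout receives the combined weights.
summed_net = make_net()
summed_net.set_weights(shared_weights)
```

Averaging, (wa + wb) / 2, instead of summing is a common variant that keeps the weights on the same scale as the originals.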

Related

Create neural network in Keras given a certain architecture

I want to create a neural network in Keras based on a given architecture, for example:
As you can see in the image, I have a table with all the neurons, the connections between them and the weight of each connection. This means that I don't need to train the neural network; I just want to build it with the values from the table and test it.
Is there a way of doing this in Keras?
I'm new to Keras and TensorFlow, so I'm not sure whether this is possible.
Thank you.
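This is possible: build the architecture, then copy the table's values into each layer with set_weights() instead of training. A minimal sketch, assuming a fully connected 2-3-1 network with sigmoid hidden units; all the numbers are hypothetical placeholders for the values in your table:

```python
import numpy as np
import tensorflow as tf

# Hypothetical values from a weight table: 2 inputs -> 3 hidden -> 1 output.
# A Keras Dense kernel has shape (inputs, units): entry [i, j] is the weight of
# the connection from neuron i in the previous layer to neuron j in this layer.
W_hidden = np.array([[0.5, -1.0, 0.25],
                     [1.5, 0.0, -0.5]], dtype="float32")
b_hidden = np.array([0.1, 0.0, -0.2], dtype="float32")
W_out = np.array([[1.0], [-2.0], [0.5]], dtype="float32")
b_out = np.array([0.3], dtype="float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(3, activation="sigmoid"),
    tf.keras.layers.Dense(1),
])
model.layers[0].set_weights([W_hidden, b_hidden])
model.layers[1].set_weights([W_out, b_out])

# No training needed: the model can be evaluated right away.
x = np.array([[1.0, 2.0]], dtype="float32")
print(model.predict(x, verbose=0))
```

A connection that is missing from the table can be represented by a weight of 0. Connections that skip layers do not fit a plain Sequential stack and would need the functional API.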

What is freezing/unfreezing a layer in neural networks?

I have been playing around with neural networks for quite a while now, and recently, while reading about transfer learning, I came across the terms "freezing" and "unfreezing" layers before training a neural network. I am struggling to understand their usage.
When is one supposed to use freezing/unfreezing?
Which layers should be frozen/unfrozen? For instance, when I import a pre-trained model and train it on my data, is my entire neural net frozen except the output layer?
How do I determine if I need to unfreeze?
If so, how do I determine which layers to unfreeze and train to improve model performance?
I would just add to the other answer that this is most commonly used with CNNs, and that the number of layers you want to freeze (not train) is "given" by how similar the task you are solving is to the original one (the one the original network was trained to solve).
If the tasks are very similar, say you are using a CNN pretrained on ImageNet and you just want to add some other "general" objects that the network should recognize, then you might get away with training just the dense top of the network.
The more dissimilar the tasks are, the more layers of the original network you will need to unfreeze during the training.
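A minimal sketch of training only the dense top. In practice the base would be a real pretrained network, for example tf.keras.applications.MobileNetV2(include_top=False, weights="imagenet", pooling="avg"); here a small randomly initialised CNN stands in for it so the example runs without downloading weights. The 5-class head and the random data are hypothetical:

```python
import numpy as np
import tensorflow as tf

# Stand-in for a pretrained convolutional base (an ImageNet model without its top).
base = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
], name="base")
base.trainable = False  # freeze every layer of the base

# New dense "top" for the new classes; only these weights are trained.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),  # hypothetical: 5 classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = np.random.rand(8, 32, 32, 3).astype("float32")
y = np.random.randint(0, 5, size=(8,))
frozen_before = [w.copy() for w in base.get_weights()]
model.fit(x, y, epochs=1, verbose=0)
```

For more dissimilar tasks you would set trainable back to True on some of the base's later layers and recompile before training again.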
Freezing a layer means that it will not be trained, so its weights will not change.
Why do we need to freeze such layers?
Sometimes we want a deep enough network, but we don't have enough time to train it. That's why we use pretrained models that already have useful weights. Good practice is to freeze layers starting from the first ones (closest to the input); for example, you can freeze the first 10 layers.
For instance, when I import a pre-trained model and train it on my data, is my entire neural net frozen except the output layer?
- Yes, that may be the case. But you can also leave a few of the layers just before the output layer unfrozen.
How do I freeze and unfreeze layers?
- In Keras, to freeze a layer use: layer.trainable = False
And to unfreeze it: layer.trainable = True
After changing trainable, call model.compile() again so that the change takes effect during training.
If so, how do I determine which layers to unfreeze and train to improve model performance?
- As I said, good practice is to freeze from the input side, so unfreeze starting from the layers closest to the output. You should tune the number of frozen layers yourself, but keep in mind that the more unfrozen layers you have, the slower training is.
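A sketch of that workflow on a hypothetical 5-layer model. set_frozen_prefix is a helper name made up for this example: it freezes the first k layers, unfreezes the rest and recompiles. The example first trains only the output layer, then unfreezes two more layers with a smaller learning rate, a common choice when fine-tuning:

```python
import numpy as np
import tensorflow as tf


def set_frozen_prefix(model, k, learning_rate=1e-3):
    """Freeze the first k layers (input side), unfreeze the rest, then recompile.

    Recompiling is needed for changes to `trainable` to take effect in fit().
    """
    for i, layer in enumerate(model.layers):
        layer.trainable = i >= k
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="mse")


# Hypothetical "pretrained" model: 4 hidden Dense layers + 1 output layer.
model = tf.keras.Sequential(
    [tf.keras.Input(shape=(10,))]
    + [tf.keras.layers.Dense(16, activation="relu") for _ in range(4)]
    + [tf.keras.layers.Dense(1)]
)
x = np.random.rand(32, 10).astype("float32")
y = np.random.rand(32, 1).astype("float32")

set_frozen_prefix(model, k=4)  # step 1: only the output layer trains
model.fit(x, y, epochs=1, verbose=0)

set_frozen_prefix(model, k=2, learning_rate=1e-4)  # step 2: unfreeze two more layers
frozen_before = [w.copy() for w in model.layers[0].get_weights()]
model.fit(x, y, epochs=1, verbose=0)
```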
When training a model with transfer learning, we freeze certain layers for several reasons: they may already have converged, or we want to train only the layers newly added to an already pre-trained model. This is a really basic concept of transfer learning, and I suggest you go through this article if you have no idea about transfer learning.

How best to apply transfer learning with Dopamine for reinforcement learning?

I am using Google's Dopamine framework to train a specific reinforcement learning use case. I am using an autoencoder to pre-train the convolutional layers of the Deep Q-Network and then transfer those pre-trained weights to the final network.
To that end, I have created a separate model (in this case an autoencoder), which I train, and then I save the resulting model and weights.
The DQN model is created using Keras's model subclassing method, and the model used to save the trained convolutional layer weights was built using the Sequential API. My issue arises when loading the pre-trained weights into my final DQN model. Depending on whether I use the load_model() or the load_weights() functionality from TensorFlow's API, I get two different overall behaviors of my network, and I would like to understand why. Specifically, I have the following two scenarios:
1. Loading the weights into the final model with the load_weights() method. The weights are those of the encoder plus one additional layer (added just before saving the weights) to fit the architecture of the final network implemented in Dopamine, into which they are loaded.
2. First loading the saved model with load_model() and then, when defining the new model in the __init__() method, extracting the relevant layers from the loaded model and using them in the final model.
Overall, I would expect the two approaches to yield similar results with regard to the average reward achieved per episode when I use the same pre-trained weights. However, the two approaches differ (1. yields a higher average reward than 2., although both use the same pre-trained weights), and I don't understand why.
Furthermore, to validate this behavior I have tried loading random weights with the two approaches above to see whether the behavior changes. In both cases, depending on which of the two loading methods I use, I end up with behavior very similar to the respective case with the trained weights. It seems that in each case the pre-trained weights have no effect on the overall training behavior. That might be irrelevant to the issue I am investigating here, though, as it is also possible that the pre-trained weights simply don't offer any benefit overall.
Any thoughts and ideas on this would be much appreciated.
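One way to narrow this down is to check whether the two routes actually put identical numbers into the layers. A minimal sketch, using a hypothetical small Sequential encoder in place of the real one and the .keras / .weights.h5 file formats (a recent TensorFlow/Keras version is assumed). If the weights match, the difference in rewards likely comes from elsewhere, for example from how the loaded layers are wired into or tracked by the subclassed DQN, or from which of them remain trainable:

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def make_encoder():
    # Hypothetical stand-in for the pre-trained convolutional stack (Sequential API).
    return tf.keras.Sequential([
        tf.keras.Input(shape=(16, 16, 1)),
        tf.keras.layers.Conv2D(4, 3, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(8, 3, strides=2, activation="relu"),
    ])


tmp = tempfile.mkdtemp()
model_path = os.path.join(tmp, "encoder.keras")
weights_path = os.path.join(tmp, "encoder.weights.h5")

pretrained = make_encoder()  # assume: trained as part of the autoencoder
pretrained.save(model_path)
pretrained.save_weights(weights_path)

# Route 1: rebuild the architecture and load only the weights.
route1 = make_encoder()
route1.load_weights(weights_path)

# Route 2: load the full saved model and reuse its layers.
route2_layers = tf.keras.models.load_model(model_path).layers


def max_weight_diff(layers_a, layers_b):
    """Largest absolute difference between corresponding weight arrays."""
    diffs = [np.max(np.abs(wa - wb))
             for la, lb in zip(layers_a, layers_b)
             for wa, wb in zip(la.get_weights(), lb.get_weights())]
    return max(diffs)


print("route 1 vs route 2:", max_weight_diff(route1.layers, route2_layers))
```

The same max_weight_diff check can be pointed at the conv layers of the two final DQN instances right after construction.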

Is it possible to train a single Keras Model on Two different GPU on two different systems and combine the training?

I'm using Google Colab to train my models, but training is still slow.
So is there a way I can train from two different accounts and combine the training later?
No, you cannot train the same model from two accounts on Colab. Google Colab is meant for research purposes, not for training large-scale production models. Colab also disconnects the kernel every 12 hours.
You can instead train the model using multiple GPUs on a single computer; Keras supports multi-GPU training when using TensorFlow as the backend. But you cannot simply train on two separate computers/VMs (such as two Colab sessions) and combine the results afterwards: how would the gradients flow between them during backpropagation?
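A minimal sketch of single-machine multi-GPU training with TensorFlow's tf.distribute.MirroredStrategy; the model and the random data are hypothetical:

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy places a replica of the model on every visible GPU of this
# machine (falling back to the CPU if there is none) and keeps the replicas in
# sync by aggregating their gradients with an all-reduce at each step.
strategy = tf.distribute.MirroredStrategy()
print("replicas:", strategy.num_replicas_in_sync)

# The model and its optimizer must be created inside the strategy scope.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

x = np.random.rand(256, 10).astype("float32")
y = np.random.rand(256, 1).astype("float32")
# The global batch is split across replicas, so it is often scaled with their number.
model.fit(x, y, batch_size=32 * strategy.num_replicas_in_sync, epochs=1, verbose=0)
```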
There is a workaround, though it is not an end-to-end approach. You can split your model into two different models, where the output of the first model becomes the input of the second, and the second produces the final output. For this you need a separate training set for each model.
Take this example.
Suppose you are building a face recogniser where the model takes in a raw camera picture and recognises the face as yes/no.
Instead of training one big network, you could split it into two different nets: the task of the first net is to crop the face and remove the other irrelevant parts of the image, and the task of the second is to recognise the face from the cropped image.
This is a non-end-to-end model: you can train the two models separately, on different machines with different datasets, and then merge them. This is often more powerful and easier to train.
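A minimal sketch of the merge step, assuming both stages are Keras models and that stage 1's output shape matches stage 2's input shape. The vector sizes are hypothetical stand-ins for the image-cropping and recognition networks in the example:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stage 1: raw input -> intermediate representation
# (standing in for the "crop the face" network).
stage1 = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(16, activation="relu"),
], name="stage1")

# Hypothetical stage 2: stage 1's output -> yes/no probability.
stage2 = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
], name="stage2")

# Each stage can be compiled, fitted and saved separately (possibly on different
# machines and datasets); afterwards the trained stages are chained into one model.
inputs = tf.keras.Input(shape=(64,))
outputs = stage2(stage1(inputs))
pipeline = tf.keras.Model(inputs, outputs, name="pipeline")
```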
See also the question "Tensorflow Combining Two Models End to End".
Another possibility is to ensemble the two trained models. You'd have to make sure, however, that the data for both models come from the same distribution.
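For the ensemble route, a minimal sketch that averages the predictions of two separately trained Keras models with the tf.keras.layers.Average layer; the architecture and sizes are hypothetical:

```python
import numpy as np
import tensorflow as tf


def make_model():
    # Hypothetical architecture; ensemble members need not be identical, but they
    # must accept the same input and produce outputs of the same shape.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])


model_a, model_b = make_model(), make_model()  # assume: trained separately

inputs = tf.keras.Input(shape=(10,))
averaged = tf.keras.layers.Average()([model_a(inputs), model_b(inputs)])
ensemble = tf.keras.Model(inputs, averaged)
```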

TensorFlow's Neurons

I want to be able to perform inferential analysis on my neural network by accessing the weights and making decisions based on what I find. In other words, once I have the list of weights of a particular neuron in a hidden layer, I want to be able to manipulate that neuron in any way I want, including messing with the neuron's output. I'm using TensorFlow for my neural network.
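A minimal sketch of reading and manipulating one neuron, assuming a functional Keras model with a Dense hidden layer named "hidden" (all sizes and the neuron index are hypothetical). In a Dense layer, column j of the kernel holds the incoming weights of neuron j, and row j of the next layer's kernel holds its outgoing weights:

```python
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(3,))
hidden = tf.keras.layers.Dense(5, activation="relu", name="hidden")(inputs)
outputs = tf.keras.layers.Dense(1, name="out")(hidden)
model = tf.keras.Model(inputs, outputs)

# Side model that exposes the hidden layer's activations for inspection.
probe = tf.keras.Model(inputs, hidden)

layer = model.get_layer("hidden")
kernel, bias = [w.copy() for w in layer.get_weights()]  # kernel: (3, 5), bias: (5,)
j = 2  # hypothetical neuron index
print("incoming weights of neuron", j, ":", kernel[:, j], "bias:", bias[j])
print("outgoing weights:", model.get_layer("out").get_weights()[0][j, :])

# Example manipulation: silence ("ablate") neuron j by zeroing its incoming
# weights and bias; with ReLU its output is then always 0.
kernel[:, j] = 0.0
bias[j] = 0.0
layer.set_weights([kernel, bias])
```

Comparing model.predict() before and after such an edit shows how much the output depends on that neuron.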