Can we create a recursive model in keras? - tensorflow

I have several models to build in Keras. The output of one model has to be fed as input to other models.
Input -> say a batch of 64 X 64 images
First model -> three outputs, splitting some of the input images of the batch into 32 x 32, 64 x 32 and 64 x 16 images.
Each of these images of different sizes will be input to three different models which will further split them. This will continue six times in a recursive fashion.
See the picture for a better understanding (image not included here).
There are 6 stages; in each stage there are three choices from the parent model.
In this way a ternary tree structure of models is formed.
Each model has its own loss and optimizers.
How do I implement such a model during training? Should we use recursion? Is recursion allowed in model training in this manner in Keras?

Will the sizes/number change during training, or will you define the setup and keep it like that? If you are keeping it the same throughout, but just changing it to test different model setups, you can easily create a function that generates the model tree. For example:
import tensorflow as tf

def create_model(tree_depth):
    models = []
    for i in range(tree_depth):
        model = ...  # might be nice to have a function for defining a single model
        models.append(model)

    top_level_inputs = tf.keras.layers.Input((64, 64))
    x = models[0](top_level_inputs)  # using the functional model format here
    # if you want different parts of the input to go to different models, you may struggle.
    # Look into tf.strided_slice if necessary
    for mod in models[1:]:
        x = mod(x)  # you will need to code the true tree structure here, rather than this one-level for loop
    total_model = tf.keras.models.Model(top_level_inputs, x)
    return total_model

my_model = create_model(my_depth)
The biggest challenge will be automating the shapes if you don't have each layer get the same sized inputs, and making some sort of nested for-loop to handle the recursions/splitting.
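For illustration, a recursive builder along those lines might look like the sketch below. The per-node submodel, the crop sizes and the depth are placeholders (the question does not specify how the images are split), so treat it as a shape-handling skeleton rather than the actual architecture:
import tensorflow as tf

def make_node_model(input_shape, name):
    # placeholder single-node model; swap in the real architecture here
    inp = tf.keras.layers.Input(shape=input_shape)
    out = tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu")(inp)
    return tf.keras.models.Model(inp, out, name=name)

def build_tree(x, depth, name="node"):
    # x is a symbolic tensor; recurse until the requested depth is reached
    y = make_node_model(tuple(x.shape[1:]), name)(x)
    if depth == 1:
        return [y]
    # hypothetical three-way split via cropping; the real sizes come from your problem
    h, w = y.shape[1], y.shape[2]
    children = [
        tf.keras.layers.Cropping2D(((0, h // 2), (0, 0)))(y),
        tf.keras.layers.Cropping2D(((0, 0), (0, w // 2)))(y),
        tf.keras.layers.Cropping2D(((0, h // 2), (0, w // 2)))(y),
    ]
    outputs = []
    for i, child in enumerate(children):
        outputs += build_tree(child, depth - 1, name=f"{name}_{i}")
    return outputs

inputs = tf.keras.layers.Input((64, 64, 1))
tree_model = tf.keras.models.Model(inputs, build_tree(inputs, depth=3))
With everything wrapped into one Model like this, each leaf output can be given its own loss via compile(loss=[...], loss_weights=[...]); if every submodel really needs its own optimizer, you would instead write a custom training loop that updates each submodel's variables separately.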

Related

Switch between the heads of a model during inference

I have 200 neural networks which I trained using transfer learning on text. They all share the same weights except for their heads, which are trained on different tasks. Is it possible to merge those networks into a single model to use with Tensorflow, such that when I call it with input (text, i) it returns the prediction for task i? The idea here is to only store the shared weights once to save on model size, and to only evaluate the head of the task we want to predict in order to save on computations. The important bit is to wrap all of that into a Tensorflow model, as I want to make it easier to serve it on google-ai-platform.
Note: It is fine to train all the heads independently, I just want to put all of them together into a single model for the inference part
You probably have a model like the following:
from tensorflow.keras import layers, models
from tensorflow.keras.layers import Input

# Create the model
inputs = Input(shape=(height, width, channels), name='data')
x = layers.Conv2D(...)(inputs)
# ...
x = layers.GlobalAveragePooling2D(name='penultimate_layer')(x)
x = layers.Dense(num_class, name='task0', ...)(x)
model = models.Model(inputs=inputs, outputs=[x])
Until now the model only has one output. You can add multiple outputs at model creation, or later on. You can add a new head like this:
last_layer = model.get_layer('penultimate_layer').output
output_heads = []
taskID = 0
while True:
    try:
        head = model.get_layer("task" + str(taskID))
        output_heads.append(head.output)
        taskID += 1
    except ValueError:
        # get_layer raises ValueError once no layer with that name exists
        break
# add new head
new_head = layers.Dense(num_class, name='task' + str(taskID), ...)(last_layer)
output_heads.append(new_head)
model = models.Model(inputs=model.input, outputs=output_heads)
Now since every head has a name you can load your specific weights, calling the head by name. The weights to load are the weights of the last layer of (an)other_model. You should have something like this:
model.get_layer("task0").set_weights(other_model.layers[-1].get_weights())
When you want to obtain predictions, all you need to know is the task ID of the head you want to look at:
taskID=0 # obtain predictions from head 0
outputs = model(test_data, training=False)
predictions = outputs[taskID]
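If you only ever need one head at a time (the computational saving mentioned in the question), another option is to build a per-task sub-model that shares the backbone but exposes a single output. A sketch, reusing the head-naming convention above (task 3 is just an example):
def single_head_model(full_model, task_id):
    # shares the backbone weights, but only the chosen head is evaluated
    head_output = full_model.get_layer("task" + str(task_id)).output
    return models.Model(inputs=full_model.input, outputs=head_output)

task_model = single_head_model(model, 3)
predictions = task_model(test_data, training=False)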
If you want to train new heads later on, while still sharing the same backbone, you just have to freeze the other heads, otherwise even those will be trained, and you don't want that:
for layer in model.layers:
    if "task" in layer.name:
        layer.trainable = False
# code to add the new head ...
Training new tasks (a new set of classes) at a later point is called task-incremental learning. The major issue with this is catastrophic forgetting: it is quite easy to lose prior knowledge while training new tasks. Even if the heads are frozen, the backbone obviously isn't. If you do this, you'll have to apply some technique to avoid it.

Siamese Twin Network: Merging of data streams with a custom function

Since I am not very experienced, I am struggling with a Siamese twin network.
I have 2 images which run through the same CNN and each generate a distinct feature vector. I would like to train a further network to interpret these two image vectors (each with 32 elements). In an intermediate step I would like to use these vectors as input for a function NCC, which sits as a layer between the CNN and the NN and is defined in the following snippet (i.e. its output should be used for the next NN):
import tensorflow as tf
from tensorflow.keras.layers import Flatten

def NCC(a, b):
    l = a.shape[1]
    av_a = tf.math.reduce_mean(a)
    av_b = tf.math.reduce_mean(b)
    a = a - av_a
    b = b - av_b
    norm_a = tf.math.sqrt(tf.math.reduce_sum(a * a))
    norm_b = tf.math.sqrt(tf.math.reduce_sum(b * b))
    a = a / norm_a
    b = b / norm_b
    A = tf.reshape(tf.repeat(a, axis=0, repeats=l), (l, l))
    B = tf.reshape(tf.repeat(b, axis=0, repeats=l), (l, l))
    ncc = Flatten()(A * tf.transpose(B))
    return ncc
The output vector (for batch size 1) should have 32x32=1024 elements. It seems to work for a batch size of 1. If I increase the batch size I run into trouble, because the input vectors are now tensors with shape=(batch_size, 32). I think this is a very stupid question, but how can I circumvent this issue? (It should be noted that I also want an output tensor with shape=(batch_size, 1024).)
Thanks in advance
Mike
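One way to make NCC batch-aware is to reduce along the feature axis only, so that every sample keeps its own mean, norm and outer product (for a batch of one this coincides with the version above). A sketch, assuming inputs of shape (batch_size, 32):
import tensorflow as tf

def ncc_batched(a, b):
    # a, b: shape (batch_size, 32); returns shape (batch_size, 1024)
    a = a - tf.reduce_mean(a, axis=1, keepdims=True)   # per-sample mean removal
    b = b - tf.reduce_mean(b, axis=1, keepdims=True)
    a = a / tf.norm(a, axis=1, keepdims=True)          # per-sample L2 normalisation
    b = b / tf.norm(b, axis=1, keepdims=True)
    # outer[i] is the outer product of b_i and a_i, matching A * transpose(B) above
    outer = tf.einsum('bi,bj->bij', b, a)
    return tf.reshape(outer, (-1, a.shape[1] * a.shape[1]))
To place this between the CNN and the following network, you can wrap it in a tf.keras.layers.Lambda layer (or a small custom layer) that takes the two 32-element vectors as inputs.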

Keras Looping LSTM layers

I am trying to build a model which is basically a sequence-to-sequence model, but I have a special encoder, namely a "Secondary Encoder".
Timesteps in Secondary Encoder = 300
This encoder has a special property: in essence it is a GRU, but at each timestep the hidden state produced by the GRUCell needs to be altered. It needs to be added to another variable, and then this combination (the new hidden state) is passed on to the next GRUCell, which uses it as its initial_state. This is repeated 300 times.
As 300 GRUCells are required (one for each time step) it is not feasible to hard code each of the 300 layers and create the model.
So, I need help to figure out how to write a loop to implement this thing in keras or maybe how to create a custom Layer (if this is a better choice).
What I thought (pseudocode):
Here alpha is the variable I mentioned that I want to add:
x = Input(shape=...)
encoder_cell = GRU(10, return_state=True)
init_state = xxxx  # some value to give as initialiser to the first GRU cell
for t in range(300):
    _, hstate = encoder_cell(x[t], initial_state=init_state)
    init_state = hstate + alpha
model = Model(inputs=x, outputs=init_state)
Will this work? Will the model be able to interpret that it needs to loop 300 times for each training example?
The model is quite big; it has skip connections and lots of other things. That's why I need your help to figure out this subset of my problem before I implement the rest. Please ignore the syntax, this is just pseudocode.
Also, I need to call this model again and again, so I think the iterative way will slow down the process by quite a lot, right?
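One way to express the "add alpha to the hidden state at every timestep" idea without hard-coding 300 layers is a custom RNN cell that wraps a GRUCell; the RNN layer then runs the 300-step loop internally. The sketch below treats alpha as a trainable per-unit variable, which is an assumption since the question does not say where alpha comes from:
import tensorflow as tf

class ShiftedGRUCell(tf.keras.layers.Layer):
    # hypothetical cell: a GRUCell whose hidden state is shifted by `alpha` after every step
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.cell = tf.keras.layers.GRUCell(units)
        self.state_size = units
        self.output_size = units

    def build(self, input_shape):
        # alpha as a trainable per-unit variable; replace with however alpha is really defined
        self.alpha = self.add_weight("alpha", shape=(self.state_size,), initializer="zeros")
        self.cell.build(input_shape)

    def call(self, inputs, states):
        h, _ = self.cell(inputs, states)
        shifted = h + self.alpha          # altered hidden state fed to the next timestep
        return shifted, [shifted]

x = tf.keras.layers.Input(shape=(300, 8))          # 300 timesteps; feature size 8 is assumed
h = tf.keras.layers.RNN(ShiftedGRUCell(10))(x)     # final (shifted) state, shape (batch, 10)
model = tf.keras.models.Model(x, h)
Because the loop lives inside the RNN layer, the graph is built once and reused for every example, which is much faster than calling a GRU layer 300 times in a Python loop.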

Subsection of grid as input to cnn

I have two huge grids (input and output) representing some spatial data of the same area. I want to be able to generate the output pixel-by-pixel by feeding a neural network a small part of the input grid, around the pixel of interest.
The naive way of training and evaluating the CNN would be to extract sections separately and give those to the fit() function. But if the sub-grid the CNN operates on is e.g. a 256×256 area of the input, then I would copy each data point 65536 (!!!) times per epoch.
So is there any way to have Keras just use subsections of a bigger data structure for training?
To me, this sounds a bit like training RNNs on sequential sections of a data series, instead of copying each section separately.
The performance consideration is mainly in the case of evaluating the model. I want to use this model to generate an output grid of a huge geographical area (Denmark) with a resolution of 12.5 cm.
It seems to me that you are looking for a fully convolutional network (FCN).
By using only layers that scale in size with their inputs (banishing the use of dense layers specifically), an FCN is able to produce an output with a spatial range that grows proportionally with that of the input; typically, the output has the same resolution as the input, as in your case.
If your inputs are very large, you can still train an FCN on subimages. Then for inference, you can
run the network on your entire image: indeed, sometimes the inputs are too big to be batched together during training, but can be fed one at a time for inference,
or split your input into subimages and tile the results back. In that case, I would probably use overlapping tiles to avoid potential border effects.
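For illustration, a minimal fully convolutional Keras model might look like the sketch below; the layer widths are placeholders, the point being that only convolutional layers are used, so the spatial size of the output follows the spatial size of the input:
import tensorflow as tf

# spatial dimensions left as None so the same network accepts 256x256 training
# crops as well as arbitrarily large tiles at inference time
inputs = tf.keras.layers.Input(shape=(None, None, 1))
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
outputs = tf.keras.layers.Conv2D(1, 1, padding="same")(x)   # one predicted value per pixel
fcn = tf.keras.models.Model(inputs, outputs)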
You can probably go well with a Sequence generator.
You will still have to create slices for each batch, but taking slices isn't slow at all compared with the CNN operations.
And by using a keras.utils.Sequence, the generation of the batches is parallel with the model's execution, so no penalty:
from tensorflow import keras

class GridGenerator(keras.utils.Sequence):
    def __init__(self, originalGrid_maybeFileName, outputGrid, subGridSize):
        self.originalGrid = originalGrid_maybeFileName
        self.outputGrid = outputGrid
        self.subgridSize = subGridSize
        # naive implementation: assumes square grids whose side is a multiple of subgridSize
        self.divs = self.originalGrid.shape[0] // self.subgridSize

    def __len__(self):
        return self.divs * self.divs

    def __getitem__(self, i):
        row, column = divmod(i, self.divs)
        r0 = row * self.subgridSize
        c0 = column * self.subgridSize
        # using channels_last: grids are (height, width, channels)
        x = self.originalGrid[r0:r0 + self.subgridSize, c0:c0 + self.subgridSize]
        y = self.outputGrid[r0:r0 + self.subgridSize, c0:c0 + self.subgridSize]
        # add the batch dimension Keras expects (batches of one sub-grid here)
        return x[None], y[None]
If the full grid doesn't fit your PC's memory, then you should find ways of loading parts of the grid at a time. (Use the generator to load these parts)
Create the generator and train with fit_generator:
generator = GridGenerator(xGrid, yGrid, subSize)
#you can create additional generators to take a part of that as training and another part as validation
model.fit_generator(generator, len(generator), ...., workers = 4)
The workers argument determines how many worker threads/processes prepare batches in parallel before they are sent to the model.
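As a side note, in recent versions of tf.keras fit_generator is deprecated and a keras.utils.Sequence can be passed directly to fit; a sketch, assuming a TF 2.x setup where fit still accepts the workers argument:
generator = GridGenerator(xGrid, yGrid, subSize)
model.fit(generator, epochs=10, workers=4)   # epochs value is only an example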

CNTK Transfer Learning with LSTM: appending pretrained network to another network

I have a pretrained seq-to-seq slot tagger network which in its simplest form is as follows:
Network_1 = Sequential([
    Embedding(emb_dim),
    Recurrence(LSTM(LSTM_dim)),
    Dense(num_labels)
])
I would like to use the output of this as initial layers in another network. Basically I would like to concatenate the embeddings from the network_1 (pretrained) to an embedding layer in the network_2 as follows:
Network_2 = Sequential([
    Concat_embeddings(Embedding(emb_dim), Network_1_embed()),
    Recurrence(LSTM(LSTM_dim)),
    (Label('encoded_h'), Label('encoded_c'))
])

def Network_1_embed():
    loaded_model = load_model(path_to_network_1_saved_model)
    cloned_model = loaded_model.clone(CloneMethod.freeze)
    return cloned_model

def Concat_embeddings(emb1, emb2):
    X = Placeholder()
    return splice(emb1(X), emb2(X))
This is giving me the following error
ValueError: Times: The 1 leading dimensions of the right operand with shape '[50360]' do not match the left operand's trailing dimensions with shape '[293]'
For reference, we get [293] since emb_dim=256, and num_network_1_labels=37, while [50360] is the vocabulary size of the network_2 input. The Network_1 also had the same vocabulary mapping when being trained, so it can take the same input, and output a 37 dimensional vector for each token.
How do I make this work?
Thanks
I think your problem is that you are using the entire Network_1 as the embedding, instead of just its embedding layer.
One way would be to define embed separately and train it through Network_1:
embed = Embedding(emb_dim)

Network_1 = Sequential([
    embed,
    Recurrence(LSTM(LSTM_dim)),
    Dense(num_labels)
])
Then train Network_1, but save embed:
embed.save(EMBED_PATH)
Explanation: Since Network_1 just invokes embed, they share parameters, so that training Network_1 will train embed's parameters. Saving embed then gives you the embedding layer trained by Network_1. Quite straight-forward, actually.
Then, to train your second model (in a second script), load embed from disk and just use it:
Network_1_embed = load_model(EMBED_PATH)

Network_2 = Sequential([
    (Embedding(emb_dim), Network_1_embed()),
    splice,
    Recurrence(LSTM(LSTM_dim)),
    (Label('encoded_h'), Label('encoded_c'))
])
Note the use of a function tuple as the first item passed to Sequential(). The tuple means to apply both functions to the same input, and generates two outputs, which are then the input to the subsequent function, splice.
To keep embed constant, clone it with the Freeze option, as you already did in your example.
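In code, that could look like the following sketch, reusing the clone call from the question:
from cntk import load_model
from cntk.ops.functions import CloneMethod

# load the saved embedding and freeze its parameters so Network_2 cannot update them
frozen_embed = load_model(EMBED_PATH).clone(CloneMethod.freeze)
frozen_embed can then be used in place of Network_1_embed() inside Network_2 above.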
(I am not in front of a computer with the latest CNTK and cannot test this, so it is possible that I made a mistake.)