I am trying to train a neural net with some parameters fixed in a layer.
I have a layer with 30 inputs and 30 outputs, so it has 930 parameters/weights (30×30 weights plus 30 biases) if it is a normal dense layer. I want to fix around 800 of these parameters at zero so that they don't train, while the remaining ~130 are trained.
Is there a way to freeze some parameters in tensorflow or pytorch?
P.S. - Not freezing the whole layer, only partially.
In PyTorch, what you can do is copy the parameters of the layer before 'optimizer.step()' and copy them back into the layer afterwards, effectively resetting the frozen weights.
You could do something like:
layer_to_freeze = 0  # change as you need

for i, param in enumerate(model.parameters()):  # iteration needed since model.parameters() returns a generator
    if i == layer_to_freeze:  # first layer
        frozen_params = param.detach().clone()  # clone, so the saved values are not changed by the update

optimizer.step()  # apply the parameter updates

for i, param in enumerate(model.parameters()):
    if i == layer_to_freeze:  # first layer
        with torch.no_grad():  # required for the in-place copy
            param[0:30, 0:27] = frozen_params[0:30, 0:27]  # restores 810 parameters; adjust the slice to your needs
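Alternatively, you can zero the gradients of the frozen entries before the optimizer applies them, so they are never updated in the first place. A minimal sketch, assuming the frozen entries are described by a boolean mask (the mask pattern below is only an illustration):

import torch
import torch.nn as nn

layer = nn.Linear(30, 30)

# Hypothetical mask: True where a weight should stay frozen at zero.
freeze_mask = torch.zeros(30, 30, dtype=torch.bool)
freeze_mask[:, :27] = True  # ~810 entries; adjust to your pattern

with torch.no_grad():
    layer.weight[freeze_mask] = 0.0  # pin the frozen entries at zero

# Zero their gradients on every backward pass so optimizer.step() leaves them untouched.
layer.weight.register_hook(lambda grad: grad.masked_fill(freeze_mask, 0.0))

One caveat: optimizers with weight decay can still move zero-gradient weights, so either disable weight decay for this parameter or fall back to the copy-back approach above.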
Hello everyone, I am trying to train a model using a CNN and Keras, but the training doesn't finish: I get the warning below and it stops training. I don't know why and I don't understand where the problem is. Can anyone give me advice on what I should change in the code?
def myModel():
    no_Of_Filters = 60
    size_of_Filter = (5, 5)  # THIS IS THE KERNEL THAT MOVES AROUND THE IMAGE TO GET THE FEATURES.
                             # THIS WOULD REMOVE 2 PIXELS FROM EACH BORDER WHEN USING A 32x32 IMAGE
    size_of_Filter2 = (3, 3)
    size_of_pool = (2, 2)  # SCALE DOWN ALL FEATURE MAPS TO GENERALIZE MORE, TO REDUCE OVERFITTING
    no_Of_Nodes = 500  # NO. OF NODES IN HIDDEN LAYERS
    model = Sequential()
    model.add(Conv2D(no_Of_Filters, size_of_Filter, input_shape=(imageDimesions[0], imageDimesions[1], 1), activation='relu'))  # ADDING MORE CONVOLUTION LAYERS = FEWER FEATURES BUT CAN CAUSE ACCURACY TO INCREASE
    model.add(Conv2D(no_Of_Filters, size_of_Filter, activation='relu'))
    model.add(MaxPooling2D(pool_size=size_of_pool))  # DOES NOT AFFECT THE DEPTH/NO. OF FILTERS
    model.add(Conv2D(no_Of_Filters // 2, size_of_Filter2, activation='relu'))
    model.add(Conv2D(no_Of_Filters // 2, size_of_Filter2, activation='relu'))
    model.add(MaxPooling2D(pool_size=size_of_pool))
    model.add(Dropout(0.5))
    model.add(Flatten())
    model.add(Dense(no_Of_Nodes, activation='relu'))
    model.add(Dropout(0.5))  # FRACTION OF INPUT NODES TO DROP WITH EACH UPDATE: 1 = ALL, 0 = NONE
    model.add(Dense(noOfClasses, activation='softmax'))  # OUTPUT LAYER
    # COMPILE MODEL
    model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

############################### TRAIN
model = myModel()
print(model.summary())
history = model.fit_generator(dataGen.flow(X_train, y_train, batch_size=batch_size_val),
                              steps_per_epoch=steps_per_epoch_val,
                              epochs=epochs_val,
                              validation_data=(X_validation, y_validation),
                              shuffle=1)
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 20000 batches). You may need to use the repeat() function when building your dataset.
While using generators, you can either run the model without the steps_per_epoch parameter and let the model figure out how many steps there are per epoch:
history = model.fit_generator(dataGen.flow(X_train, y_train, batch_size=batch_size_val),
                              epochs=epochs_val,
                              validation_data=(X_validation, y_validation),
                              shuffle=1)
OR
you'll have to calculate steps_per_epoch yourself and use it while training as follows:
history = model.fit_generator(dataGen.flow(X_train, y_train, batch_size=batch_size_val),
                              steps_per_epoch=(data_samples // batch_size),  # must be an integer
                              epochs=epochs_val,
                              validation_data=(X_validation, y_validation),
                              shuffle=1)
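For example (a sketch, assuming X_train holds your training samples), steps_per_epoch can be computed as:

import math
steps_per_epoch_val = math.ceil(len(X_train) / batch_size_val)  # one full pass over the data per epoch

With the generator able to supply steps_per_epoch * epochs batches, the warning above goes away.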
Let us know if the issue still persists. Thanks!
I'm working on the "trigger word detection" model, and I decided to deploy the model to my phone.
The input shape of the model is (None, 5511, 101).
The output shape is (None, 1375, 1).
But in a real deployed app, the model can't get all 5511 timesteps at once; instead, the audio frames produced by the phone's sensor arrive one by one.
How can I feed these pieces of data to the model one at a time and get the output at each timestep?
The model is a recurrent one, but "model.predict()" takes a first parameter of shape (None, 5511, 101), and what I intend to do is:
output = []
for i in range(5511):
    a = model.func(i, (None, 1, 101))  # pseudocode: feed one timestep at a time
    output.append(a)
Structure of the model: (shown as an image in the original post)
This problem can be solved by making the timesteps axis dynamic. In other words, when you define the model, the number of timesteps should be set to None. Here is an example illustrating how it would work for a simplified version of your model:
from keras.layers import GRU, Input, Conv1D
from keras.models import Model
import numpy as np
x = Input(shape=(None, 101))
h = Conv1D(196, 15, strides=4)(x)
h = GRU(1, return_sequences=True)(h)
model = Model(x, h)
# The model works for the original number of timesteps (5511)
batch_size = 2
out = model.predict(np.random.rand(batch_size, 5511, 101))
print(out.shape)
# ... but also for fewer timesteps (say 32)
out = model.predict(np.random.rand(batch_size, 32, 101))
print(out.shape)
# However, it will not work if timesteps < the Conv1D kernel size (15);
# the following line would raise an error:
# out = model.predict(np.random.rand(batch_size, 14, 101))
Note, however, that you will not be able to feed fewer than 15 timesteps (the Conv1D kernel size) unless you pad the input sequences up to length 15.
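If you do need to run on a shorter chunk, here is a minimal sketch of that padding, zero-padding along the time axis up to the kernel size:

import numpy as np

chunk = np.random.rand(batch_size, 14, 101)  # too short for the Conv1D kernel
pad = max(0, 15 - chunk.shape[1])
padded = np.pad(chunk, ((0, 0), (pad, 0), (0, 0)), mode='constant')  # left-pad the time axis with zeros
out = model.predict(padded)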
You should either change your model into a recurrent one that you can feed pieces of data one at a time, or you should think about changing the model and using something that works on (overlapping) windows in time, where you apply the model every few pieces of data and get a partial output.
Still, depending on the model, you might get the output you want only at the end. You should design it accordingly.
Here is an example: https://hacks.mozilla.org/2018/09/speech-recognition-deepspeech/
For passing inputs step by step, you need recurrent layers with stateful=True.
The convolutional layer will certainly prevent you from achieving what you want. Either you remove it or you pass inputs in groups of 15 steps (where 15 is your kernel size for the convolution).
You would need to coordinate these 15 steps with the stride of 4, and might need padding. If I may suggest, to avoid mathematical difficulties, you could use kernel_size=16, stride=4 and input_steps=5512, which is a multiple of your stride value of 4. This avoids padding, keeps the calculations easy, and your output will be exactly 1375 steps.
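As a quick sanity check, with no padding a 1D convolution produces floor((steps - kernel_size) / stride) + 1 output steps:

steps, kernel_size, stride = 5512, 16, 4
out_steps = (steps - kernel_size) // stride + 1
print(out_steps)  # 1375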
Then your model would be like:
inputs = Input(batch_shape=(batch_size, None, 101))  # you will always feed inputs of shape (batch_size, 16, 101)
out = Conv1D(196, 16, strides=4)(inputs)
...
...
out = GRU(..., stateful=True)(out)
...
out = GRU(..., stateful=True)(out)
...
...
model = Model(inputs, out)
It's necessary to have a fixed batch size with a stateful=True model. It can be 1, but to optimize your processing speed, if you have more than one sequence to process in parallel (and independently from each other), use a bigger batch size.
To work step by step, you need, first of all, to reset states: whenever you use a stateful=True model, you must reset states every time you are going to feed a new sequence (or a new batch of parallel sequences).
So:
#will start a new batch containing a number of sequences equal to batch_size:
model.reset_states()
#received 16 steps from batch_size sequences:
steps = an_array_shaped((batch_size, 16, 101))
#for training:
model.train_on_batch(steps, something_for_y_shaped((batch_size, 1, 1)), ...)
#I don't recommend training like this because of the batch normalizations.
#If you can train on the entire length at once, do it.
#Never forget: for full-length training, you need model.reset_states() every batch.

#for predicting:
predictions = model.predict_on_batch(steps)

#received 4 new steps from the same batch_size sequences:
steps = np.concatenate([steps[:, 4:], new_steps], axis=1)
#these new steps belong to the "same" batch_size sequences! Don't call reset_states()!

#repeat one of the above for training or predicting:
new_predictions = model.predict_on_batch(steps)
predictions = np.concatenate([predictions, new_predictions], axis=1)
#keep repeating this loop until you reach the last step
Finally, when you have reached the last step, call `model.reset_states()` again for safety, so that everything you input afterwards is treated as "new" sequences, not new "steps" of the previous sequences.
------------
# Training hint
If you are able to train with the full sequences (not step by step), use a `stateful=False` model, train normally with `model.fit(...)`, and later recreate the model exactly but with `stateful=True`, copy the weights with `new_model.set_weights(old_model.get_weights())`, and use the new model for predicting as above.
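A minimal sketch of that weight transfer, assuming a hypothetical build_model(stateful, batch_size) helper that builds the same architecture both times:

# build_model is a hypothetical helper; only stateful/batch_size differ between the two calls.
trained = build_model(stateful=False, batch_size=None)
trained.fit(X_train, y_train, epochs=10)

deploy = build_model(stateful=True, batch_size=1)
deploy.set_weights(trained.get_weights())  # identical architectures, so the weights are compatible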
I'm trying to compute the gradient of the output layer with respect to the input layer. My neural network is relatively small (the input layer is composed of 9 activation units and the output layer of 1), and training went fine, as the test set showed very good accuracy. I made the NN model using Keras.
In order to solve my problem, I need to compute the gradient of the output with respect to the input. That is, I need to obtain the Jacobian, which has dimension [1x9]. The gradients function in tensorflow should provide me with everything I need, but when I run the code below I obtain a different solution every time.
output_v = model.output
input_v = model.input
gradients = tf.gradients(output_v, input_v)

sess = tf.Session()
sess.run(tf.initialize_all_variables())
print(sess.run(model.input, feed_dict={model.input: x_test_N[0:1, :]}))
evaluated_gradients = sess.run(gradients, feed_dict={model.input: x_test_N[0:1, :]})
print(evaluated_gradients)
sess.close()
The first print command shows this value every time I run it (just to make sure that the input values are not modified):
[[-1.4306372 -0.1272892 0.7145787 1.338818 -1.2957293 -0.5402862 -0.7771702 -0.5787912 -0.9157122]]
But the second print shows different ones:
[[ 0.00175761, -0.0490326 , -0.05413761, 0.09952173, 0.06112418, -0.04772799, 0.06557006, -0.02473242, 0.05542536]]
[[-0.00416433, 0.08235116, -0.00930298, 0.04440641, 0.03752216, 0.06378302, 0.03508484, -0.01903783, -0.0538374 ]]
Using finite differences, evaluated_gradients[0,0] = 0.03565103, which isn't close to any of the first values previously printed.
Thanks for your time!
Alberto
Solved by creating a specific session just before training my model:
sess = tf.Session()
sess.run(tf.global_variables_initializer())
K.set_session(sess)
history = model.fit(x_train_N, y_train_N, epochs=n_epochs,
                    validation_split=split, verbose=1, batch_size=n_batch_size,
                    shuffle='true', callbacks=[early_stop, tensorboard])
And evaluating the gradient after training, while the tf.Session is still open:
evaluated_gradients = sess.run(K.gradients(model.output, model.input), feed_dict={model.input: x_test_N})
Presumably your network is set up to initialize weights to random values. When you run sess.run(tf.initialize_all_variables()), you are initializing your variables to new random values. Therefore you get different values for output_v in every run, and hence different gradients. If you want to use a model you trained before, you should replace the initialize_all_variables() call with a restore command. I am not familiar with how this is done in Keras since I usually work directly with tensorflow, but I would try this.
Also note that initialize_all_variables is deprecated and you should use global_variables_initializer instead.
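In plain TensorFlow 1.x, the restore would look roughly like this (a sketch; the checkpoint path is hypothetical and assumes you saved one with tf.train.Saver during training):

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, "/path/to/model.ckpt")  # load trained weights instead of re-initializing
    gradients = tf.gradients(model.output, model.input)
    evaluated_gradients = sess.run(gradients, feed_dict={model.input: x_test_N[0:1, :]})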
my problem is the following:
I am working on an object detection problem and would like to use dropout during test time to obtain a distribution of outputs. The object detection network consists of a training model and a prediction model, which wraps around the training model. I would like to perform several stochastic forward passes using the training model and combine these e.g. by averaging the predictions in the prediction wrapper. Is there a way of doing this in a keras model instead of requiring an intermediate processing step using numpy?
Note that this question is not about how to enable dropout during test time
def prediction_wrapper(model):
    # Example code.
    # Arguments
    #     model: the training model
    regression = model.outputs[0]
    classification = model.outputs[1]

    predictions = # TODO: perform several stochastic forward passes (dropout during train and test time) here
    avg_predictions = # TODO: combine predictions here, e.g. by computing the mean
    outputs = # TODO: do some processing on avg_predictions

    return keras.models.Model(inputs=model.inputs, outputs=outputs, name=name)
I use keras with a tensorflow backend.
I appreciate any help!
The way I understand it, you're trying to average the weight updates for a single sample while dropout is enabled. Since dropout is random, you would get different weight updates for the same sample.
If this understanding is correct, then you could create a batch by duplicating the same sample. Here I am assuming that the dropout mask is different for each sample in a batch. Since backpropagation averages the weight updates over a batch anyway, you would get your desired behavior.
If that does not work, then you could write a custom loss function and train with a batch size of one. You could update a global counter inside your custom loss function and return a non-zero loss only once you've averaged them the way you want. I don't know if this would work; it's just an idea.
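A sketch of the sample-duplication idea on the prediction side (sample and mc_model are assumptions: one input array, and a model whose dropout layers stay active at inference, e.g. called with training=True):

import numpy as np

n_passes = 30
batch = np.repeat(sample[np.newaxis, ...], n_passes, axis=0)  # one sample duplicated n_passes times
preds = mc_model.predict(batch)  # each row in the batch gets an independent dropout mask
avg_pred = preds.mean(axis=0)    # Monte Carlo average over the stochastic passes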
I am trying to train an LSTM to behave like a controller. Essentially, this is a many-to-many problem. I have 7 input features, with each feature being a sequence of 40 values. My output has two features, also being a sequence of 40 values.
I have 2 layers. The first layer has four LSTM cells and the second has two LSTM cells. The code is given below.
The code runs and produces output as expected, but I am unable to reduce the training error (mean squared error). The error just stops improving after the first 1000 epochs.
I tried using different batch sizes, but I am getting a high error even if the batch size is one. I tried the same network with a simple sine function, and it works properly, i.e. the error decreases. Is this because my sequence length is too large, so that the vanishing gradient problem is occurring? What can I do to improve the training error?
#Specify input and output features
Xfeatures = 7  #Number of input features
Yfeatures = 2  #Number of output features
num_steps = 40

# reset everything to rerun in jupyter
tf.reset_default_graph()

# Placeholders for the inputs in a given iteration.
u_opt = tf.placeholder(tf.float32, [train_batch_size, num_steps, Xfeatures])
u_NN = tf.placeholder(tf.float32, [train_batch_size, num_steps, Yfeatures])

with tf.name_scope('Normalization'):
    #L2 normalization for input data
    Xnorm = tf.nn.l2_normalize(u_opt, 0, epsilon=1e-12, name='Normalize')

lstm1 = tf.contrib.rnn.BasicLSTMCell(lstm1_size)
lstm2 = tf.contrib.rnn.BasicLSTMCell(lstm2_size)
stacked_lstm = tf.contrib.rnn.MultiRNNCell([lstm1, lstm2])
print(lstm1.output_size)
print(stacked_lstm.output_size)

LSTM_outputs, states = tf.nn.dynamic_rnn(stacked_lstm, Xnorm, dtype=tf.float32)

#Loss
mean_square_error = tf.losses.mean_squared_error(u_NN, LSTM_outputs)
train_step = tf.train.AdamOptimizer(learning_rate).minimize(mean_square_error)

#Initialization and training session
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    #print(sess.run([LSTM_outputs], feed_dict={u_opt: InputX1}))
    print(sess.run([mean_square_error], feed_dict={u_opt: InputX1, u_NN: InputY1}))
    for i in range(training_epochs):
        sess.run([train_step], feed_dict={u_opt: InputX1, u_NN: InputY1})
        if i % display_epoch == 0:
            print("Training loss is:", sess.run([mean_square_error], feed_dict={u_opt: InputX1, u_NN: InputY1}), "at iteration:", i)
    print(sess.run([mean_square_error], feed_dict={u_opt: InputX1, u_NN: InputY1}))
    print(sess.run([LSTM_outputs], feed_dict={u_opt: InputX1}))
What do you mean by "First layer has four LSTM cells, and second has two LSTM cells"? You probably mean the sizes of the cell states.
Your code is not complete, but I can try to give you some advice.
If your training error is not going down, one possibility is that your net is not well dimensioned. Your lstm1_size and lstm2_size may not be large enough to capture the characteristics of your data.
LSTMs help you accumulate the past of a given sequence in a state vector. Usually, the state vector is not used itself as the predictor but is projected to the output space using a standard feedforward layer. You can probably keep a single recurrent layer (a single LSTM layer) and then project the outputs of that layer using a feedforward layer (i.e. g(W*LSTM_outputs + b), where g is a non-linear activation).
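A sketch of that idea in the same TF1-style API as your code (assuming your existing Xnorm, u_NN, lstm1_size and Yfeatures): one recurrent layer followed by a dense projection shared across timesteps:

lstm = tf.contrib.rnn.BasicLSTMCell(lstm1_size)
lstm_outputs, state = tf.nn.dynamic_rnn(lstm, Xnorm, dtype=tf.float32)

# tf.layers.dense acts on the last axis, so the same projection
# g(W*LSTM_outputs + b) is applied independently at every timestep.
predictions = tf.layers.dense(lstm_outputs, Yfeatures, activation=tf.nn.tanh)
mean_square_error = tf.losses.mean_squared_error(u_NN, predictions)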