How can I join two CNN models M1 and M2 at the training phase? The input of M2 is the feature map generated by M1 together with a parameter k (also generated from M1):
output = M2(M1(x), k), where k depends on the feature map generated by M1: k = f(M1(x))
Normally we wouldn't build two models at all, but a single model trained once. The problem here is that at a certain level we need to generate k, which is not a trainable parameter.
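One way to handle this (a sketch, not from the question; the toy layers, shapes, and choice of f are assumptions) is to build M1 and M2 as a single Keras graph and compute k inside it, wrapping f in tf.stop_gradient so that k is produced by M1 but treated as a constant by backpropagation:
import tensorflow as tf
from tensorflow import keras

inp = keras.Input(shape=(32, 32, 3))

# M1: produces the feature map
feature_map = keras.layers.Conv2D(16, 3, activation='relu')(inp)

# k = f(feature map): a derived, non-trainable quantity; stop_gradient
# keeps backprop from flowing through the k-path
k = keras.layers.Lambda(
    lambda t: tf.stop_gradient(tf.reduce_mean(t, axis=[1, 2])))(feature_map)

# M2: consumes both the feature map and k
x = keras.layers.Conv2D(16, 3, activation='relu')(feature_map)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Concatenate()([x, k])
out = keras.layers.Dense(10, activation='softmax')(x)

# one model, trained once; gradients still reach M1 through the
# feature-map path, just not through k
model = keras.Model(inp, out)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')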
I am referring to this study https://proceedings.neurips.cc/paper/2020/file/288cd2567953f06e460a33951f55daaf-Paper.pdf, "On Warm-Starting Neural Network Training". The authors propose a shrink-and-perturb technique for retraining models on newly arriving data. In a warm restart, the model is initialized with the weights previously trained on the old data and is then retrained on the new data. In the proposed technique, the weights and biases of the existing model are first shrunk towards zero and then perturbed with random noise. To shrink a weight, it is multiplied by a value between 0 and 1, typically about 0.5. Their official PyTorch code is available at https://github.com/JordanAsh/warm_start/blob/main/run.py. A simple explanation of this study is given at https://pureai.com/articles/2021/02/01/warm-start-ml.aspx, where the writer gives a simple PyTorch function to perform shrink-and-perturb on an existing model, as shown below:
import torch as T

def shrink_perturb(model, lamda=0.5, sigma=0.01):
    # shrink each weight toward zero, then add Gaussian noise
    # (assumes 2-D weight matrices, e.g. from nn.Linear layers)
    for (name, param) in model.named_parameters():
        if 'weight' in name:  # just weights, not biases
            nr = param.shape[0]  # rows
            nc = param.shape[1]  # cols
            for i in range(nr):
                for j in range(nc):
                    param.data[i][j] = \
                        (lamda * param.data[i][j]) + \
                        T.normal(0.0, sigma, size=(1, 1))
    return
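The element-by-element loop above is slow for large layers. An equivalent vectorized sketch (shrink_perturb_fast is a name introduced here, not from the article; it handles weight tensors of any shape):
def shrink_perturb_fast(model, lamda=0.5, sigma=0.01):
    # same semantics, vectorized: shrink each weight tensor toward zero
    # and add Gaussian noise; works for parameters of any shape
    with T.no_grad():
        for name, param in model.named_parameters():
            if 'weight' in name:  # just weights, as above
                param.mul_(lamda).add_(T.randn_like(param) * sigma)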
With shrink_perturb defined, a prediction model can be initialized with the shrink-perturb technique using code like this:
net = Net().to(device)
fn = ".\\Models\\employee_model_first_100.pth"
net.load_state_dict(T.load(fn))
shrink_perturb(net, lamda=0.5, sigma=0.01)
# now train net as usual
Is there a Keras-compatible version of this function, where we can shrink the weights of an existing model and add random Gaussian noise, like this?
model = load_model('weights/model.h5')
model.summary()

shrunk_model = shrink_perturb(model, lamda=0.5, sigma=0.01)
shrunk_model.summary()
Maybe something like this (note the noise needs stddev=sigma, otherwise the default stddev of 1.0 would swamp the weights):
ws = [w * 0.5 + tf.random.normal(w.shape, stddev=0.01).numpy() for w in model.get_weights()]
model.set_weights(ws)
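That is essentially the whole trick. Wrapped as a function mirroring the PyTorch signature (a sketch, assuming a tf.keras model; note that, unlike the loop version above, this perturbs biases as well):
import numpy as np

def shrink_perturb(model, lamda=0.5, sigma=0.01):
    # shrink every parameter toward zero and add Gaussian noise, in place;
    # filter by weight name/shape if biases should be left untouched
    ws = [w * lamda + np.random.normal(0.0, sigma, size=w.shape)
          for w in model.get_weights()]
    model.set_weights(ws)
    return model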
Problem: I have two pretrained models with variables W1, b1 and W2, b2, saved as NumPy arrays.
I want to set a mixture of these two pretrained models as the variables of my model, and only update the mixture weights alpha1 and alpha2 during training.
To do that, I create the two variables alpha1 and alpha2, load the NumPy arrays, and create the mixture nodes W_new and b_new.
I want to replace W and b in the computation graph with W_new and b_new and then train only the alpha1 and alpha2 parameters via opt.minimize(loss, var_list=[alpha1, alpha2]).
I don't know how to replace W and b with W_new and b_new in the computation graph. I tried assigning tf.trainable_variables()[0] = W_new, but this doesn't work.
I'd appreciate it if anyone could give me some clues.
Note 1: I don't want to assign values to W and b (that would disconnect the graph from alpha1 and alpha2); I want the mixture of parameters to be part of the graph.
Note 2: You might say I could just compute y using the new variables, but the code here is only a toy sample to simplify things. In reality, instead of linear regression I have several BiLSTMs with a CRF, so I can't manually rewrite the formula; I have to replace these variables in the graph.
import tensorflow as tf
import numpy as np
np.random.seed(7)
tf.set_random_seed(7)
#define a linear regression model with 10 params and 1 bias
with tf.variable_scope('main'):
    X = tf.placeholder(tf.float32, shape=(None, 10), name='input')
    y_gold = tf.placeholder(tf.float32, shape=(None, 1), name='output')
    W = tf.get_variable('W', shape=(10, 1))
    b = tf.get_variable('b', shape=(1,))
    y = tf.matmul(X, W) + b
    #loss = tf.losses.mean_squared_error(y_gold, y)
#numpy matrices saved from two different trained models with the exact same architecture
W1 = np.random.rand(10, 1).astype(np.float32)
W2 = np.random.rand(10, 1).astype(np.float32)
b1 = np.random.rand(1).astype(np.float32)
b2 = np.random.rand(1).astype(np.float32)
with tf.variable_scope('mixture'):
    alpha1 = tf.get_variable('alpha1', shape=(1,))
    alpha2 = tf.get_variable('alpha2', shape=(1,))
    W_new = alpha1 * W1 + alpha2 * W2
    b_new = alpha1 * b1 + alpha2 * b2
all_trainable_vars = tf.trainable_variables()
print(all_trainable_vars)
#replace the original W and b with the new mixture variables in the computation graph (**doesn't do what I want**)
all_trainable_vars[0] = W_new
all_trainable_vars[1] = b_new
#this doesn't work: it only replaces items in the Python list, not nodes in the graph
#note that I could just do the computation for y using the new variables as y = tf.matmul(X, W_new) + b_new
#but the problem is this is just a toy example. In the real world, my model has a big architecture with several
#BiLSTMs whose variables I want to replace with these new ones.
#What I need is to replace the W and b trainable parameters (items 0 and 1 in all_trainable_vars)
#with W_new and b_new in the computation graph.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter('./graph', sess.graph)
    #print(sess.run([W, b]))
    #give the model 3 samples and predict on them
    print(sess.run(y, feed_dict={X: np.random.rand(3, 10)}))
Why do I want to do this?
Assume you have several pretrained models (from different domains), but you don't have access to any of their data.
You also have a little training data from another domain. On its own it doesn't give you much performance, but if you could train jointly with the data you don't have, you could reach good performance.
Assuming that data is somehow represented in the trained models, we want to learn a mixture of the pretrained models by learning the mixing coefficients, using the little labelled data we have as supervision.
We don't want to pretrain any parameters; we only want to learn a mix of the pretrained models. What are the mixture weights? We need to learn that from the little supervision we have.
Update 1:
I realised I could set the parameters of the model when I create it:
model = Model(W_new, b_new)
But as I said, my real model uses several tf.contrib.rnn.LSTMCell objects, so I'll need to pass the new variables to the LSTMCell class instead of letting it create its own. The problem now is how to set the variables of LSTMCell instead of letting it create them. I guess I'll need to subclass LSTMCell and make the changes; my question now is whether there's an easy way to do this. Maybe I should ask this as a new question.
What I want to do:
W = tf.get_variable(...)
b = tf.get_variable(...)
cell_fw = tf.contrib.rnn.LSTMCell(W, b, state_is_tuple=True)
I created a separate question for this here because it might be useful to others for different reasons.
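For reference, one TF1 mechanism that can intercept variable creation (including inside cells like tf.contrib.rnn.LSTMCell, which builds its weights through tf.get_variable) is a custom_getter on the enclosing variable scope. A minimal sketch; the pretrained dictionary and the mixture_getter name are illustrative, not from the original post:
import numpy as np
import tensorflow as tf

W1 = np.random.rand(10, 1).astype(np.float32)
W2 = np.random.rand(10, 1).astype(np.float32)

alpha1 = tf.get_variable('alpha1', shape=(), initializer=tf.constant_initializer(0.5))
alpha2 = tf.get_variable('alpha2', shape=(), initializer=tf.constant_initializer(0.5))

# map full variable names to the pair of pretrained arrays to mix
pretrained = {'main/W': (W1, W2)}

def mixture_getter(getter, name, *args, **kwargs):
    # if we have pretrained values for this variable, return the
    # alpha-weighted mixture tensor instead of creating a variable
    if name in pretrained:
        v1, v2 = pretrained[name]
        return alpha1 * v1 + alpha2 * v2
    return getter(name, *args, **kwargs)

with tf.variable_scope('main', custom_getter=mixture_getter):
    X = tf.placeholder(tf.float32, shape=(None, 10), name='input')
    W = tf.get_variable('W', shape=(10, 1))  # actually the mixture tensor
    y = tf.matmul(X, W)

# only alpha1 and alpha2 are trainable; W1 and W2 are baked-in constants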
I have a model M1 whose data input is a placeholder M1.input and whose weights are trained.
My goal is to build a new model M2 which computes the output o of M1 (with its trained weights) from an input w in the form of a tf.Variable (instead of feeding actual values to M1.input). In other words, I use the trained model M1 as a black-box function to build a new model o = M1(w); in the new model, w is to be learned and the weights of M1 are fixed as constants. The problem is that M1 only accepts input through M1.input, which must be fed actual values, not a tf.Variable like w.
As a naive solution to building M2, I could manually rebuild M1 within M2, initialize M1's weights with the pretrained values, and keep them non-trainable within M2. However, in practice M1 is complicated and I don't want to build it again by hand. I am looking for a more elegant solution: a workaround or a direct way to replace the input placeholder M1.input of M1 with a tf.Variable w.
Thank you for your time.
This is possible. What about:
import tensorflow as tf

def M1(input, reuse=False):
    with tf.variable_scope('model_1', reuse=reuse):
        param = tf.get_variable('param', [1])
        o = input + param
    return o

w = tf.get_variable('some_w', [1])
plhdr = tf.placeholder_with_default(w, [1])
output_m1 = M1(plhdr)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(w.assign([42]))
    print(sess.run(output_m1, {plhdr: [0]}))  # direct from placeholder
    print(sess.run(output_m1))                # direct from variable
So when the feed_dict contains a value for the placeholder, that value is used; otherwise, the fallback to the variable w kicks in.
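If the goal is then to learn w while keeping M1's weights fixed, restricting var_list to [w] does it. A sketch continuing the snippet above (the toy target is an assumption):
# hypothetical continuation: optimize only w; 'param' inside M1 stays
# fixed because it is excluded from var_list
target = tf.constant([100.0])
loss = tf.reduce_sum(tf.square(output_m1 - target))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss, var_list=[w])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)  # no feed: the placeholder falls back to w
    print(sess.run(output_m1))  # close to [100.]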
I'm trying to train an SVM classifier to do prediction. When I try to use the trained model, I get this error: "test data does not match model". I am not sure why this is happening. This is my code:
# to prepare the training and testing data
dat = data.frame(x = rbind(tmp1, tmp2), y = as.factor(c(rep(1, 300), rep(-1, 300))))
set.seed(1)
train_ind = sample(seq_len(nrow(dat)), size = 500)
train = dat[train_ind, ]
test = dat[-train_ind, ]
# training and prediction
library('e1071')
svmfit = svm(y ~ ., data = train, kernel = 'linear', cost = 10, scale = FALSE)
ypred = predict(svmfit, test)
table(predict=ypred, truth = test$y)
The reason behind this error is that I included the IDs of the observations in the training and testing data, which confused the SVM classifier. The observation IDs are in the first column, so when I removed the first column from the training and testing sets, it worked.
If you have a categorical predictor (independent variable) in your training dataset, only the categories present in the training data can appear in the test data. If that is your case, check whether every category in your test dataset is also present in the training dataset. Sometimes svm() treats an integer variable with a short range, such as months represented as the numbers 1:12, as categorical.
If I have two neural networks A and B, and I use the output of network A to feed the input (placeholder) of network B, and I use an optimizer to minimize the loss of network B, can network A's parameters be updated by backpropagation?
Yes, if "feed" is done in TensorFlow; no, if you do it manually.
Specifically, if you evaluate A, then train B with those outputs manually fed in (say, as a feed dict), A will not change, because it is not involved in the training stage.
If you set the input of the B network to be the output of an op in A (instead of a tf.placeholder, for instance), then you can train the combined network, which will update A's parameters. In this case, though, you're really just training a combined network "AB", not two separate networks.
A concrete example:
import numpy as np
import tensorflow as tf
# A network
A_input = tf.placeholder(tf.float32, [None, 100])
A_weights = tf.Variable(tf.random_normal([100, 10]))
A_output = tf.matmul(A_input, A_weights)

# B network
B_input = tf.placeholder(tf.float32, [None, 10])
B_weights = tf.Variable(tf.random_normal([10, 5]))
B_output = tf.matmul(B_input, B_weights)

# AB network
AB_input = A_output
AB_weights = tf.Variable(tf.random_normal([10, 5]))
AB_output = tf.matmul(AB_input, AB_weights)
test_inputs = np.random.rand(17, 100)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

A_out = sess.run(A_output, feed_dict={A_input: test_inputs})
print('A output shape:', A_out.shape)

B_out = sess.run(B_output, feed_dict={B_input: A_out})
print('B output shape:', B_out.shape)

AB_out = sess.run(AB_output, feed_dict={A_input: test_inputs})
print('AB output shape:', AB_out.shape)
In the first case, we've fed network B with the outputs of network A by using a feed_dict. This evaluates network A in TensorFlow, pulls the results back into Python, then evaluates network B in TensorFlow. If you train network B in this fashion, you'll only update the parameters of network B.
In the second case, we've fed the "B" part of network AB by directly connecting the output of network A to the input of network AB. Evaluating network AB never pulls the intermediate results of network A back into Python, so if you train network AB in this fashion, you can update the parameters of the combined network. (Note: your training inputs are fed to A_input of network AB, not to the intermediate tensor AB_input.)
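To see this concretely, one can attach a loss to AB_output and inspect the gradients; a hypothetical continuation of the example above (targets and the squared-error loss are assumptions):
# training on AB_output updates A_weights (it is on the path),
# while B_weights receives no gradient at all
targets = tf.placeholder(tf.float32, [None, 5])
loss = tf.reduce_mean(tf.square(AB_output - targets))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

grads = tf.gradients(loss, [A_weights, B_weights])
print(grads)  # [<gradient tensor>, None]: B_weights is not connected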