tensorflow - initialize duplicate nets with the same weights - tensorflow

I want to create two models with the same architecture and exactly the same weights using the tf.layers.dense
m1 = tf.layers.dense(input, units=10, activation='relu', name='m1')
m2 = tf.layers.dense(input, units=10, activation='relu', name='m2')
How can I set m2 weights and biases to be the same as m1? (same values, not shared weights, so when training m1, m2 weights will not change and vice versa)

You should firstly create the weights and bias initializers:
import tensorflow as tf
tf.enable_eager_execution()
init_kernel = tf.constant_initializer([[1,2,3],[4,5,6]])
init_bias = tf.constant_initializer([7,8,9])
Then you can create the two dense layers and pass the same initializers to kernel_initializer and bias_initializer.
layer1 = tf.layers.dense(inputs=inputs, units=3, kernel_initializer=init_kernel, bias_initializer=init_bias)
layer2 = tf.layers.dense(inputs=inputs, units=3, kernel_initializer=init_kernel, bias_initializer=init_bias)
If I understood your question correctly this is what you need.

gorjan answer is correct, but I also found another solution that can be easier to use in more complex scenarios:
with tf.Session() as sess:
with tf.variable_scope("m1", reuse=True):
weights_m1 = sess.run(tf.get_variable("kernel"))
with tf.variable_scope("m2", reuse=True):
sess.run(tf.get_variable("kernel").assign(weights_m1))
weights_m2 = sess.run(tf.get_variable("kernel"))
print(np.array_equal(weights_m1, weights_m2) # True

Related

Multiple loss functions on (somewhat) overlapping sub-models in Keras

I have a model in Keras where I would like to use two loss functions. The model consists of an autoencoder and a classifier on top of it. I would like to have one loss function that makes sure the autoencoder is fitted reasonably well (for example, it can be mse) and another loss function that evaluates the classifier (for example, categorical_crossentropy). I would like to fit my model and use a loss function that would be a linear combination of the two loss functions.
# loss functions
def ae_mse_loss(x_true, x_pred):
ae_loss = K.mean(K.square(x_true - x_pred), axis=1)
return ae_loss
def clf_loss(y_true, y_pred):
return K.sum(K.categorical_crossentropy(y_true, y_pred), axis=-1)
def combined_loss(y_true, y_pred):
???
return ae_loss + w1*clf_loss
where w1 is some weight that defines "importance of clf_loss" in the final combined loss.
# autoencoder
ae_in_layer = Input(shape=in_dim, name='ae_in_layer')
ae_interm_layer1 = Dense(interm_dim, activation='relu', name='ae_interm_layer1')(ae_in_layer)
ae_mid_layer = Dense(latent_dim, activation='relu', name='ae_mid_layer')(ae_interm_layer1)
ae_interm_layer2 = Dense(interm_dim, activation='relu', name='ae_interm_layer2')(ae_mid_layer)
ae_out_layer = Dense(in_dim, activation='linear', name='ae_out_layer')(ae_interm_layer2)
ae_model=Model(ae_input_layer, ae_out_layer)
ae_model.compile(optimizer='adam', loss = ae_mse_loss)
# classifier
clf_in_layer = Dense(interm_dim, activation='sigmoid', name='clf_in_layer')(ae_out_layer)
clf_out_layer = Dense(3, activation='softmax', name='clf_out_layer')(clf_in_layer)
clf_model = Model(clf_in_layer, clf_out_layer)
clf_model.compile(optimizer='adam', loss = combined_loss, metrics = [ae_mse_loss, clf_loss])
What I'm not sure about is how to distinguish y_true and y_pred in the two loss functions (since they refer to true and predicted data at different stages in the model). What I had in mind is something like this (I'm not sure how to implement it since obviously I need to pass only one set of arguments y_true & y_pred):
def combined_loss(y_true, y_pred):
ae_loss = ae_mse_loss(x_true_ae, x_pred_ae)
clf_loss = clf_loss(y_true_clf, y_pred_clf)
return ae_loss + w1*clf_loss
I could define this problem as two separate models and train each model separately but I would really prefer if I could do this all at once if possible (since it would optimize both problems simultaneously). I realize, this model doesn't make much sense but it demonstrates the (much more complicated) problem I'm trying to solve in a simple way.
Any suggestions would be appreciated.
All you need is simply available in native keras
you can automatically combine multiple losses using loss_weights parameter
In the example below I tried to reproduce your example where I combined an mse loss for the regression task and a categorical_crossentropy for the classification task
in_dim = 10
interm_dim = 64
latent_dim = 32
n_class = 3
n_sample = 100
X = np.random.uniform(0,1, (n_sample,in_dim))
y = tf.keras.utils.to_categorical(np.random.randint(0,n_class, n_sample))
# autoencoder
ae_in_layer = Input(shape=in_dim, name='ae_in_layer')
ae_interm_layer1 = Dense(interm_dim, activation='relu', name='ae_interm_layer1')(ae_in_layer)
ae_mid_layer = Dense(latent_dim, activation='relu', name='ae_mid_layer')(ae_interm_layer1)
ae_interm_layer2 = Dense(interm_dim, activation='relu', name='ae_interm_layer2')(ae_mid_layer)
ae_out_layer = Dense(in_dim, activation='linear', name='ae_out_layer')(ae_interm_layer2)
# classifier
clf_in_layer = Dense(interm_dim, activation='sigmoid', name='clf_in_layer')(ae_out_layer)
clf_out_layer = Dense(n_class, activation='softmax', name='clf_out_layer')(clf_in_layer)
model = Model(ae_in_layer, [ae_out_layer,clf_out_layer])
model.compile(optimizer='adam',
loss = {'ae_out_layer':'mse', 'clf_out_layer':'categorical_crossentropy'},
loss_weights = {'ae_out_layer':1., 'clf_out_layer':0.5})
model.fit(X, [X,y], epochs=10)
In this specific case, the loss is the result of 1*ae_out_layer_loss + 0.5*clf_out_layer_loss

Shared weight matrices between dense layers in keras

I am trying to reimplement this tensorflow code into keras, I have noted other tickets submitted here that do not share the sentiment I am trying to recreate. The goal is to share a weight matrix across multiple dense layers.
import tensorflow as tf
# define input and weight matrices
x = tf.placeholder(shape=[None, 4], dtype=tf.float32)
w1 = tf.Variable(tf.truncated_normal(stddev=.1, shape=[4, 12]),
dtype=tf.float32)
w2 = tf.Variable(tf.truncated_normal(stddev=.1, shape=[12, 2]),
dtype=tf.float32)
# neural network
hidden_1 = tf.nn.tanh(tf.matmul(x, w1))
projection = tf.matmul(hidden_1, w2)
hidden_2 = tf.nn.tanh(projection)
hidden_3 = tf.nn.tanh(tf.matmul(hidden_2, tf.transpose(w2)))
y = tf.matmul(hidden_3, tf.transpose(w1))
# loss function and optimizer
loss = tf.reduce_mean(tf.reduce_sum((x - y) * (x - y), 1))
optimize = tf.train.AdamOptimizer().minimize(loss)
init = tf.initialize_all_variables()
The issue is reimplementing these weight layers in keras as the transpose of original layers. I am currently implementing my own network using keras functional API
Start by defining your two dense layers:
from keras.layers import Dense, Lambda
import keras.backend as K
dense1 = Dense(12, use_bias=False, activation='tanh')
dense2 = Dense(2, use_bias=False, activation='tanh')
You can then access the weights from your layers with for example dense1.weights[0]. You can wrap this in a lambda layer that also transposes your weights:
h3 = Lambda(lambda x: K.dot(x, K.transpose(dense2.weights[0])))(h2)

Tensorflow dense layers worse than keras sequential

I try to train an agent on the inverse-pendulum (similar to cart-pole) problem, which is a benchmark of reinforcement learning. I use neural-fitted-Q-iteration algorithm which uses a multi-layer neural network to evaluate the Q function.
I use Keras.Sequential and tf.layers.dense to build the neural network repectively, and leave all other things to be the same. However, Keras gives me a good results and tensorflow does not. In fact, tensorflow doesn't work at all with its loss being increasing and the agent learns nothing from the training.
Here I present the code for Keras as follows
def build_model():
model = Sequential()
model.add(Dense(5, input_dim=3))
model.add(Activation('sigmoid'))
model.add(Dense(5))
model.add(Activation('sigmoid'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
adam = Adam(lr=1E-3)
model.compile(loss='mean_squared_error', optimizer=adam)
return model
and the tensorflow version is
class NFQ_fit(object):
"""
neural network approximator for NFQ iteration
"""
def __init__(self, sess, N_feature, learning_rate=1E-3, batch_size=100):
self.sess = sess
self.N_feature = N_feature
self.learning_rate = learning_rate
self.batch_size = batch_size
# DNN structure
self.inputs = tf.placeholder(tf.float32, [None, N_feature], 'inputs')
self.labels = tf.placeholder(tf.float32, [None, 1], 'labels')
self.l1 = tf.layers.dense(inputs=self.inputs,
units=5,
activation=tf.sigmoid,
use_bias=True,
kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
bias_initializer=tf.constant_initializer(0.0),
kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
name='hidden-layer-1')
self.l2 = tf.layers.dense(inputs=self.l1,
units=5,
activation=tf.sigmoid,
use_bias=True,
kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
bias_initializer=tf.constant_initializer(0.0),
kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
name='hidden-layer-2')
self.outputs = tf.layers.dense(inputs=self.l2,
units=1,
activation=tf.sigmoid,
use_bias=True,
kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
bias_initializer=tf.constant_initializer(0.0),
kernel_regularizer=tf.contrib.layers.l2_regularizer(1E-4),
name='outputs')
# optimization
# self.mean_loss = tf.losses.mean_squared_error(self.labels, self.outputs)
self.mean_loss = tf.reduce_mean(tf.square(self.labels-self.outputs))
self.regularization_loss = tf.losses.get_regularization_loss()
self.loss = self.mean_loss # + self.regularization_loss
self.train_op = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.loss)
The two models are the same. Both of them has two hidden layers with the same dimension. I expect that the problems may come from the kernel initialization but I don't know how to fix it.
Using Keras is great. If you want better TensorFlow integration check out tf.keras. There's no particular reason to use tf.layers if the Keras (or tf.keras) defaults work better.
In this case glorot_uniform looks like the default initializer. This is also the global TensorFlow default, so consider removing the kernel_initializer argument instead of the explicit truncated normal initialization in your question (or passing Glorot explicitly).

How can i create a model in Keras and train it using Tensorflow?

Is it possible to create a model with Keras and without using compile and fit functions in Keras, use Tensorflow to train the model?
Sure. From Keras documentation:
Useful attributes of Model
model.layers is a flattened list of the layers comprising the model graph.
model.inputs is the list of input tensors.
model.outputs is the list of output tensors.
If you use Tensorflow backend, inputs and outputs are Tensorflow tensors, so you can use them without using Keras.
You can use keras to define a complicated graph:
import tensorflow as tf
sess = tf.Session()
from keras import backend as K
K.set_session(sess)
from keras.layers import Dense
from keras.objectives import categorical_crossentropy
img = Input(shape=(784,))
labels = Input(shape=(10,)) #one-hot vector
x = Dense(128, activation='relu')(img)
x = Dense(128, activation='relu')(x)
preds = Dense(10, activation='softmax')(x)
Then use tensorflow to config complicated optimization and training procedure:
loss = tf.reduce_mean(categorical_crossentropy(labels, preds))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
init_op = tf.global_variables_initializer()
sess.run(init_op)
# Run training loop
with sess.as_default():
for i in range(100):
batch = mnist_data.train.next_batch(50)
train_step.run(feed_dict={img: batch[0],
labels: batch[1]})
Ref: https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html

tensorflow RNN implementation

I'm building a RNN model to do the image classification. I used a pipeline to feed in the data. However it returns
ValueError: Variable rnn/rnn/basic_rnn_cell/weights already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:
I wonder what can I do to fix this since there are not many examples of implementing RNN with an input pipeline. I know it would work if I use the placeholder, but my data is already in the form of tensors. Unless I can feed the placeholder with tensors, I prefer just to use the pipeline.
def RNN(inputs):
with tf.variable_scope('cells', reuse=True):
basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=batch_size)
with tf.variable_scope('rnn'):
outputs, states = tf.nn.dynamic_rnn(basic_cell, inputs, dtype=tf.float32)
fc_drop = tf.nn.dropout(states, keep_prob)
logits = tf.contrib.layers.fully_connected(fc_drop, batch_size, activation_fn=None)
return logits
#Training
with tf.name_scope("cost_function") as scope:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=train_label_batch, logits=RNN(train_batch)))
train_step = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(cost)
#Accuracy
with tf.name_scope("accuracy") as scope:
correct_prediction = tf.equal(tf.argmax(RNN(test_image), 1), tf.argmax(test_image_label, 0))
accuracy = tf.cast(correct_prediction, tf.float32)
You need to use the reuse option correctly. following changes would solve it. For prediction you need to use the already existed variables in the graph.
def RNN(inputs, reuse):
with tf.variable_scope('cells', reuse=reuse):
basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=batch_size, reuse=reuse)
...
...
#Training
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=train_label_batch, logits=RNN(train_batch, reuse=None)))
#Accuracy
...
correct_prediction = tf.equal(tf.argmax(RNN(test_image, reuse=True), 1), tf.argmax(test_image_label, 0))