Greedy Initialization in Keras - tensorflow

I'm currently coding up a feed-forward network in TensorFlow and I want to make a custom initializer which initializes each layer using an (externally defined) function of the point with the highest MSE.
Pseudo-code:
Given any input data being passed to my current layer:
- Identify datapoint *X* with highest MSE from target
- Initialize at *f(X)*
Sorry for only posting pseudocode, but I have absolutely no idea how to go about this in Keras (Python, not R).

Since you want your weights to be updated with respect to a maximized loss rather than a minimized loss (from the comment), this can be achieved by passing the negated loss, as shown in the code below:
Tensorflow Version 2.x:
def negated_mse(y_true, y_pred):
    # minimizing the negated MSE maximizes the original MSE
    return -tf.keras.losses.mean_squared_error(y_true, y_pred)

opt = tf.keras.optimizers.Adam(learning_rate=0.01)
model.compile(optimizer=opt, loss=negated_mse)
Tensorflow Version 1.x:
loss = tf.reduce_mean(tf.keras.losses.MSE(y_true, y_pred))
trainm = tf.train.GradientDescentOptimizer(0.01).minimize(-loss)
Hope this helps.

Related

Wrong gradient from tf custom gradient - even though gradient is implemented using the inbuilt Jacobian

I'm trying to write a wrapper around a model, such that the TF model can be called as a function of its weights (and input). However, this wrapper returns different gradients than the gradients from the original model. Details are in the code below (including a Colab notebook to reproduce directly), but at the core I'm using the custom gradient decorator; the respective gradient is computed directly as the upstream 'gradient' multiplied (via tensordot) by the respective Jacobian.
To make this clear: I'm computing the gradient for a model, once directly, once by using my custom wrapper. In both cases the parameters in the model are the same. The Jacobian is implemented by TF, so nothing should be wrong there. Still the resulting gradient seems to be wrong.
I'm not sure whether this is a coding mistake I made somewhere, or possibly just a numeric problem stemming from the Jacobian matmul; however, my tests regarding correlation of the gradients suggest this is more than a numeric issue. Code of the function is provided below; a link to a Colab notebook reproducing the problem can be found here: Colab Notebook reproducing the problem
Why: This is important for a bunch of meta-learning approaches, for which I'm currently trying to build a small library.
My current 'wrapper' looks something like this:
# calls model on input x but replaces internal weights with the weights argument
# critically supposed to compute the respective gradient for the weights tensor argument!
def call_model_with_weights(model, x, weights, dim_output=2):
    @tf.custom_gradient
    def _call_with_weights(x_and_w):
        x, weights = x_and_w
        # be careful; this assigns weights to the model as a side effect, can ignore for dummy version
        ctrls = [var.assign(val) for var, val in zip(model.trainable_weights, weights)]
        with tf.control_dependencies(ctrls):
            with tf.GradientTape() as tape:
                y = model(x)
            jacobians = tape.jacobian(y, model.trainable_weights)

        def grad(upstream, variables):
            assert len(variables) == len(weights)
            # gradient for each weight should be upstream dot-product respective jacobian
            dy_dw = [tf.tensordot(upstream, j, axes=[list(range(dim_output)), list(range(dim_output))])
                     for j in jacobians]
            dy_dw_weights = dy_dw
            # returning None as the derivative of x is wrong, but not important here right now
            return (None, dy_dw_weights), [None for _ in dy_dw]

        return y, grad

    y = _call_with_weights((x, weights))
    return y
Thanks a lot for any help (including suggestions for how this could be done in a more elegant way); helping out means you are contributing to a package that aims to mimic PyTorch 'higher' for TF, which I hope helps some more people <3

Keras: Custom loss function with training data not directly related to model

I am trying to convert my CNN written with tensorflow layers to use the keras api in tensorflow (I am using the keras api provided by TF 1.x), and I am having trouble writing a custom loss function to train the model.
According to this guide, when defining a loss function it expects the arguments (y_true, y_pred)
https://www.tensorflow.org/guide/keras/train_and_evaluate#custom_losses
def basic_loss_function(y_true, y_pred):
return ...
However, in every example I have seen, y_true is somehow directly related to the model (in the simple case it is the output of the network). In my problem, this is not the case. How do I implement this if my loss function depends on some training data that is unrelated to the tensors of the model?
To be concrete, here is my problem:
I am trying to learn an image embedding trained on pairs of images. My training data includes image pairs and annotations of matching points between the image pairs (image coordinates). The input feature is only the image pairs, and the network is trained in a siamese configuration.
I am able to implement this successfully with tensorflow layers and train it successfully with tensorflow estimators.
My current implementation builds a tf Dataset from a large database of tf Records, where the features are a dictionary containing the images and arrays of matching points. Before, I could easily feed these arrays of image coordinates to the loss function, but here it is unclear how to do so.
There is a hack I often use, which is to calculate the loss within the model by means of Lambda layers. (This is useful when the loss is independent of the true data, for instance, and the model doesn't really have an output to be compared.)
In a functional API model:
def loss_calc(x):
    loss_input_1, loss_input_2 = x  # arbitrary inputs, you choose
                                    # according to what you gave to the Lambda layer

    # here you use some external data that doesn't relate to the samples
    externalData = K.constant(external_numpy_data)

    # calculate the loss (example calculation; replace with your own)
    loss = K.mean(K.square(loss_input_1 - loss_input_2), axis=-1)
    return loss
Using the outputs of the model itself (the tensor(s) that are used in your loss)
loss = Lambda(loss_calc)([model_output_1, model_output_2])
Create the model outputting the loss instead of the outputs:
model = Model(inputs, loss)
Create a dummy keras loss function for compilation:
def dummy_loss(y_true, y_pred):
    return y_pred  # where y_pred is the loss itself, the output of the model above
model.compile(loss = dummy_loss, ....)
Use any dummy array correctly sized regarding number of samples for training, it will be ignored:
model.fit(your_inputs, np.zeros((number_of_samples,)), ...)
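Putting the pieces together, here is a minimal sketch of the whole pattern under toy assumptions: the input sizes, the shared Dense embedding layer, the placeholder loss calculation and external_numpy_data are stand-ins you would replace with your own image pairs and matching-point arrays:
import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Dense, Lambda
from tensorflow.keras.models import Model

external_numpy_data = np.random.rand(8).astype('float32')  # stand-in for data unrelated to the model

def loss_calc(x):
    emb_1, emb_2 = x
    external = K.constant(external_numpy_data)  # available for use inside the loss
    # placeholder per-sample loss; replace with your matching-point loss
    return K.mean(K.square(emb_1 - emb_2), axis=-1, keepdims=True)

inp_1 = Input(shape=(16,))
inp_2 = Input(shape=(16,))
shared = Dense(8, activation='relu')  # shared (siamese) embedding layer
loss_tensor = Lambda(loss_calc)([shared(inp_1), shared(inp_2)])

model = Model([inp_1, inp_2], loss_tensor)

def dummy_loss(y_true, y_pred):
    return y_pred  # the model output already is the loss

model.compile(optimizer='adam', loss=dummy_loss)
model.fit([np.random.rand(32, 16), np.random.rand(32, 16)], np.zeros((32, 1)), epochs=1)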
Another way of doing it, is using a custom training loop.
This is much more work, though.
Although you're using TF1, you can still turn eager execution on at the very beginning of your code and do stuff like it's done in TF2. (tf.enable_eager_execution())
Follow the tutorial for custom training loops: https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough
Here, you calculate the gradients yourself, of any result regarding whatever you want. This means you don't need to follow Keras standards of training.
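A rough sketch of what such a loop can look like in eager mode; the model, the my_matching_point_loss function and the dataset of image pairs plus annotations are placeholders for your own pipeline:
import tensorflow as tf
tf.enable_eager_execution()  # TF 1.x only; TF 2.x is eager by default

optimizer = tf.train.AdamOptimizer()  # tf.keras.optimizers.Adam in TF 2.x

for images_1, images_2, match_points in dataset:  # your tf.data.Dataset of pairs + annotations
    with tf.GradientTape() as tape:
        emb_1 = model(images_1)
        emb_2 = model(images_2)
        # the loss can depend on any tensors, not just (y_true, y_pred)
        loss = my_matching_point_loss(emb_1, emb_2, match_points)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))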
Finally, you can use the approach you suggested of model.add_loss.
In this case, you calculate the loss exactly the same way I did in the first part of this answer, and pass this loss tensor to add_loss.
You can probably compile a model with loss=None then (not sure), because you're going to use other losses, not the standard one.
In this case, your model's output will probably be None too, and you should fit with y=None.
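A minimal sketch of that variant, reusing the loss_calc idea from above and reducing it to a scalar with K.mean (the model outputs and inputs are placeholders):
# register a scalar loss tensor with the model instead of making it the model output
loss_tensor = Lambda(loss_calc)([model_output_1, model_output_2])

model = Model(inputs, [model_output_1, model_output_2])
model.add_loss(K.mean(loss_tensor))

model.compile(optimizer='adam', loss=None)  # no standard (y_true, y_pred) loss
model.fit(your_inputs, y=None, epochs=10)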

tf.Estimator.predict() issue when using a Tensorflow Hub module as the basis of a custom tf.Estimator

I am trying to create a custom tensorflow tf.Estimator. In the model_fn passed to the tf.Estimator, I am importing the Inception_V3 module from Tensorflow Hub.
Problem: After fine-tuning the model (using tf.Estimator.train), the results obtained using tf.Estimator.predict are not as good as expected based on tf.Estimator.evaluate (This is for a regression problem.)
I am new to Tensorflow and Tensorflow Hub, so I could be making lots of rookie mistakes.
When I run tf.Estimator.evaluate() on my validation data, the reported loss is in the same ball park as the loss after tf.Estimator.train() was used to train the model. The problem comes in when I try to use tf.Estimator.predict() on the same validation data.
tf.Estimator.predict() returns predictions which I then use to calculate the same loss metric (mean_squared_error) which is computed by tf.Estimator.evaluate(). I am using the same set of data to feed to the predict function as the evaluate function. But I do not get the same result for the mean_squared_error -- not remotely close! (The mse I calculate from predict is much worse.)
Here is what I have done (edited out some details)...
Define a model_fn with Tensorflow Hub module. Then call the tf.Estimator functions to train, evaluate and predict.
def my_model_fun(features, labels, mode, params):
    # Load InceptionV3 Module from Tensorflow Hub
    iv3_module = hub.Module("https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1",
                            trainable=True, tags={'train'})

    # Gather the variables for fine-tuning
    var_list = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='CustomeLayer')
    var_list.extend(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='module/InceptionV3/Mixed_5b'))

    predictions = {"the_prediction": final_output}

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    # Define loss, optimizer, and evaluation metrics
    loss = tf.losses.mean_squared_error(labels=labels, predictions=final_output)
    optimizer = tf.train.AdadeltaOptimizer(learning_rate=learn_rate).minimize(
        loss, var_list=var_list, global_step=tf.train.get_global_step())
    rms_error = tf.metrics.root_mean_squared_error(labels=labels, predictions=predictions["the_prediction"])
    eval_metric_ops = {"rms_error": rms_error}

    if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=optimizer)

    if mode == tf.estimator.ModeKeys.EVAL:
        tf.summary.scalar('rms_error', rms_error)
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)


iv3_estimator = tf.estimator.Estimator(model_fn=iv3_model_fn)
iv3_estimator.train(input_fn=train_input_fn, steps=TRAIN_STEPS)
iv3_estimator.evaluate(input_fn=val_input_fn)

ii = 0
totalSqErr = 0
for ans in iv3_estimator.predict(input_fn=test_input_fn):
    sqErr = np.square(label[ii] - ans['the_prediction'][0])
    totalSqErr += sqErr
    ii += 1

mse = totalSqErr / ii
I expect that the mse loss reported by tf.Estimator.evaluate() should be the same as when I calculate the mse from the known labels and the output of tf.Estimator.predict().
Do I need to import the Tensorflow Hub model differently when I use predict (use trainable=False in the call to hub.Module())?
Are the weights obtained from training being used when tf.Estimator.evaluate() runs, but not when tf.Estimator.predict() runs?
Something else?
There are a few things that seem to be missing from the code snippet. How is final_output computed from iv3_module? Also, mean squared error is an unusual choice of loss function for a classification problem; the common approach is to pass the image features from the module into a linear output layer with scores for each class ("logits") and a "softmax cross-entropy loss". For an explanation of these terms, you can review online tutorials like https://developers.google.com/machine-learning/crash-course/ (all the way to multi-class neural nets).
Regarding TF-Hub technicalities:
- The variables of a Hub module are automatically added to the GLOBAL_VARIABLES and TRAINABLE_VARIABLES collections (if trainable=True, as you already do). No manual extension of those collections should be needed.
- hub.Module(..., tags=...) should be set to {"train"} for mode == TRAIN and to None or the empty set otherwise (see the sketch below).
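For instance, a minimal sketch of switching the tags inside the model_fn (assuming the rest of the model_fn stays as in the question):
module_tags = {"train"} if mode == tf.estimator.ModeKeys.TRAIN else None
iv3_module = hub.Module("https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1",
                        trainable=True, tags=module_tags)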
In general, it's useful to get a solution working end-to-end for your problem without fine-tuning as a baseline, and then add fine-tuning.
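As a hypothetical sketch of such a baseline (the regression head and the images tensor are assumptions, not the code that was edited out of the question):
# baseline: use the module as a fixed feature extractor, no fine-tuning
iv3_module = hub.Module("https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1",
                        trainable=False)
features = iv3_module(images)                       # [batch_size, 2048] feature vectors
final_output = tf.layers.dense(features, units=1)   # hypothetical regression head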

BatchNormalization in Keras

How do I update moving mean and moving variance in keras BatchNormalization?
I found this in tensorflow documentation, but I don't know where to put train_op or how to work it with keras models:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
No posts I found say what to do with train_op and whether you can use it in model.compile.
You do not need to manually update the moving mean and variances if you are using the BatchNormalization layer. Keras takes care of updating these parameters during training and of keeping them fixed during testing (when using model.predict and model.evaluate, the same as with model.fit_generator and friends).
Keras also keeps track of the learning phase so different codepaths run during training and validation/testing.
If you just need to update the weights of an existing model with some new values, then you can do the following:
w = model.get_layer('batchnorm_layer_name').get_weights()
# Order: [gamma, beta, mean, std]
for j in range(len(w[0])):
    gamma = w[0][j]
    beta = w[1][j]
    run_mean = w[2][j]
    run_std = w[3][j]
    w[2][j] = new_run_mean_value1
    w[3][j] = new_run_std_value2
model.get_layer('batchnorm_layer_name').set_weights(w)
There are two interpretations of the question: the first assumes that the goal is to use the high-level training API, and this question was answered by Matias Valdenegro.
The second - as discussed in the comments - is whether it is possible to use batch normalization with a standard tensorflow optimizer, as discussed in keras as a simplified tensorflow interface, in the section "Collecting trainable weights and state updates". As mentioned there, the update ops are accessible in layer.updates and not in tf.GraphKeys.UPDATE_OPS. In fact, if you have a keras model in tensorflow you can optimize with a standard tensorflow optimizer and batch normalization like this:
update_ops = model.updates
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
and then use a tensorflow session to fetch the train_op. To distinguish training and evaluation modes of the batch normalization layer you need to feed the
learning phase state of the keras engine (see "Different behaviors during training and testing" on the same tutorial page as given above). This would work for example like this
...
# train
lo, _ = tf_sess.run(fetches=[loss, train_op],
                    feed_dict={tf_batch_data: bd,
                               tf_batch_labels: bl,
                               tensorflow.keras.backend.learning_phase(): 1})
...
# eval
lo = tf_sess.run(fetches=[loss],
                 feed_dict={tf_batch_data: bd,
                            tf_batch_labels: bl,
                            tensorflow.keras.backend.learning_phase(): 0})
I tried this in tensorflow 1.12 and it works with models containing batch normalization. Given my existing tensorflow code and the approaching tensorflow version 2.0, I was tempted to use this approach myself, but since it is not mentioned in the tensorflow documentation I am not sure it will be supported in the long term, so I finally decided not to use it and to invest a little more effort in changing the code to use the high-level api.

How to get weights in tf.layers.dense?

I want to draw the weights of tf.layers.dense in a TensorBoard histogram, but they do not show up in the parameters. How can I do that?
The weights are added as a variable named kernel, so you could use
x = tf.layers.dense(...)
weights = tf.get_default_graph().get_tensor_by_name(
    os.path.split(x.name)[0] + '/kernel:0')
You can obviously replace tf.get_default_graph() by any other graph you are working in.
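Since the original goal was a TensorBoard histogram, here is a minimal sketch of how the tensor retrieved above could be logged; the summary-writer setup is the usual TF 1.x pattern and the log directory is a placeholder:
tf.summary.histogram('dense_kernel', weights)  # 'weights' obtained as shown above
merged = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('./logs', sess.graph)
    summary = sess.run(merged)  # the histogram depends only on the variable, so no feed_dict is needed here
    writer.add_summary(summary, global_step=0)
    writer.close()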
I came across this problem and just solved it. The name of tf.layers.dense is not necessarily the same as the prefix of its kernel's name. My tensor is "dense_2/xxx" but its kernel is "dense_1/kernel:0". To make sure tf.get_variable works, you had better set name=xxx in the tf.layers.dense call so that the two names share the same prefix. It works as in the demo below:
l = tf.layers.dense(input_tf_xxx, 300, name='ip1')
with tf.variable_scope('ip1', reuse=True):
    w = tf.get_variable('kernel')
By the way, my tf version is 1.3.
The latest tensorflow layers api creates all the variables using the tf.get_variable call. This ensures that if you wish to use the variable again, you can just use the tf.get_variable function and provide the name of the variable that you wish to obtain.
In the case of a tf.layers.dense, the variable is created as: layer_name/kernel. So, you can obtain the variable by saying:
with tf.variable_scope("layer_name", reuse=True):
weights = tf.get_variable("kernel") # do not specify
# the shape here or it will confuse tensorflow into creating a new one.
[Edit]: The new version of Tensorflow now has both Functional and Object-Oriented interfaces to the layers api. If you need the layers only for computational purposes, then using the functional api is a good choice; the function names start with lowercase letters, for instance tf.layers.dense(...). Layer objects are created with capitalized names, e.g. tf.layers.Dense(...). Once you have a handle to this layer object, you can use all of its functionality. For obtaining the weights, just use obj.trainable_weights; this returns a list of all the trainable variables found in that layer's scope.
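A small sketch of the object-oriented variant (the layer size and the input_tensor are placeholders):
dense_layer = tf.layers.Dense(units=300)      # object-oriented interface
y = dense_layer(input_tensor)                 # the layer's variables are built on the first call

print(dense_layer.trainable_weights)          # [kernel variable, bias variable]
print(dense_layer.kernel, dense_layer.bias)   # direct handles to the individual variables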
I am going crazy with tensorflow.
I run this:
sess.run(x.kernel)
after training, and I get the weights.
This comes from the properties described here.
I am saying that I am going crazy because it seems that there are a million slightly different ways to do something in tf, which fragments the tutorials.
Is there anything wrong with
model.get_weights()
After I create a model, compile it, and run fit, this function returns the weights as numpy arrays for me.
In TF 2, if you're inside a @tf.function (graph mode):
weights = optimizer.weights
If you're in eager mode (the default in TF2 except inside @tf.function-decorated functions):
weights = optimizer.get_weights()
In TF2, weights will output a list of length 2:
weights_out[0] = kernel weights
weights_out[1] = bias weights
For example, here are the weights of the second layer (layers[0] is the input layer and has no weights) in a model with layer size 50 and input size 784:
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(50, activation="relu", name="dense_1")(inputs)
x = layers.Dense(50, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(...)
model.fit(...)
kernel_weight = model.layers[1].weights[0]
bias_weight = model.layers[1].weights[1]
all_weight = model.layers[1].weights
print(len(all_weight)) # 2
print(kernel_weight.shape) # (784,50)
print(bias_weight.shape) # (50,)
Try making a loop to get the weights of each layer in your sequential network, printing the name of the layer first, which you can get from:
model.summary()
Then you can get the weights of each layer by running this code:
for layer in model.layers:
    print(layer.name)
    print(layer.get_weights())