I'm writing a custom Tensorflow loss function for Keras, and I tried debugging it by using Tensorflow assertions, but these don't seem to raise errors anywhere even when I'm sure they ought to. I can boil it down to the following example:
from keras.models import Sequential
from keras.layers import Dense
import tensorflow as tf
import numpy as np
def demo_loss(y_true, y_pred):
    tf.assert_negative(tf.ones([1,1]))
    return tf.square(y_true - y_pred)
model = Sequential()
model.add(Dense(1, input_dim=1, activation='linear'))
model.compile(optimizer='rmsprop', loss=demo_loss)
model.fit(np.ones((1000,1)), np.ones((1000,1)), epochs=10, batch_size=100)
This really seems to me like it should emit an InvalidArgumentError. Why doesn't it?
(Alternately, what's the more sensible way to debug my custom loss functions?)
Your TensorFlow code is not working because nothing forces the assertion to be executed. To make it work, you need to add a control dependency on it, something like:
def demo_loss(y_true, y_pred):
    with tf.control_dependencies([tf.assert_negative(tf.ones([1,1]))]):
        return tf.square(y_true - y_pred)
I'm not sure the code should stop at all: your loss function gets compiled together with your model into a single graph, and that tf.assert op is completely disconnected from everything else, so it is never executed.
These functions are not meant to be debugged directly. They are built for the highest possible performance, which is why the graph is constructed first and the data is only fed in afterwards.
When I want to debug, I go for a little model and predict:
from keras.layers import Input, Lambda
from keras.models import Model

trueInput = Input(outputShape)
predInput = Input(outputShape)
output = Lambda(lambda x: demo_loss(x[0], x[1]))([trueInput, predInput])
debugModel = Model([trueInput, predInput], output)
Now use this model to predict:
results = debugModel.predict([someNumpyTrue, someNumpyPred])
You can also divide the function into smaller functions, each in its own Lambda layer, and inspect each output separately, as in the sketch below.
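For example (a rough sketch reusing the pieces of demo_loss from the question; the split into a "difference" step and a "square" step is just illustrative):

from keras.layers import Input, Lambda
from keras.models import Model

trueInput = Input((1,))
predInput = Input((1,))
diff = Lambda(lambda x: x[0] - x[1])([trueInput, predInput])  # first sub-step
sq = Lambda(lambda x: x ** 2)(diff)                           # second sub-step
debugModel = Model([trueInput, predInput], [diff, sq])
# predict() now returns each intermediate result separately
diff_out, sq_out = debugModel.predict([someNumpyTrue, someNumpyPred])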
I have a very basic piece of code that creates a single-layer Dense neural net and predicts the output for a deterministic input. The code is as follows:
import tensorflow as tf
from tensorflow.keras import layers
model = tf.keras.models.Sequential()
model.add(layers.Dense(units = 10))
import numpy as np
inp = np.ones((1,10))
model.predict(inp)
But the output I am getting isn't deterministic. I think it is related to the initialization of the weights and biases. So, how do I fix this without writing the initialization from scratch?
Set the global seed before initializing the model: tf.random.set_seed(42)
You can also set the seed for specific parts of the model, e.g. the kernel_initializer in the Dense layer, but with that approach you may miss initializers that are still nondeterministic. In your case, setting it globally is the best solution.
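A minimal sketch of both options (the seed value 42 is arbitrary):

import tensorflow as tf
from tensorflow.keras import layers

# Option 1: set the global seed once, before building the model
tf.random.set_seed(42)
model = tf.keras.models.Sequential()
model.add(layers.Dense(units=10))

# Option 2: seed an individual initializer (easy to miss other sources of randomness)
model2 = tf.keras.models.Sequential()
model2.add(layers.Dense(units=10, kernel_initializer=tf.keras.initializers.GlorotUniform(seed=42)))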
I want to implement a custom optimization algorithm for TF models.
I have read the following sources
tf documentation on custom optimizers
tf SGD implementation
keras documentation on custom models
towardsdatascience guide on custom optimizers
However, a lot of questions remain.
It seems like it is not possible to evaluate the loss function multiple times (for different weight settings) before applying a gradient step when using the custom optimizer API, yet this is exactly what a line-search type of algorithm needs.
I tried to do all the steps manually.
Assume I have set up my model and my optimization problem like this:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import losses
from tensorflow.keras import models
model = models.Sequential()
model.add(layers.Dense(15, input_dim=10))
model.add(layers.Dense(20))
model.add(layers.Dense(1))
x_train, y_train = get_train_data()
loss = losses.MeanSquaredError()
def val_and_grads(weights):
    model.set_weights(weights)
    with tf.GradientTape() as tape:
        val = loss(y_train, model(x_train))
    grads = tape.gradient(val, model.trainable_variables)
    return val, grads
initial_weights = model.get_weights()
optimal_weights = my_fancy_optimization_algorithm(val_and_grads, initial_weights)
However, my function val_and_grads takes a list of weight arrays and returns a list of gradients, which seems unnatural from my_fancy_optimization_algorithm's point of view.
I could wrap val_and_grads to "stack" the returned gradients and "split" the passed weights, like this:
def wrapped_val_and_grad(weights):
    val, grads = val_and_grads(split_weights(weights))
    return val, stack_grads(grads)
However, that seems very inefficient.
Anyway, I do not like this approach, since it seems I would lose out on a lot of the surrounding TensorFlow infrastructure (printing of current loss values and metrics during training, TensorBoard, ...).
I could also pack the above into a custom model with a tailored train_step, like this:
class CustomModel(keras.Model):
    def train_step(self, data):
        x_train, y_train = data

        def val_and_grads(weights):
            self.set_weights(weights)
            with tf.GradientTape() as tape:
                val = loss(y_train, self(x_train))
            grads = tape.gradient(val, self.trainable_variables)
            return val, grads

        trainable_vars = self.trainable_variables
        old_weights = self.get_weights()
        update = my_fancy_update_finding_algorithm(val_and_grads, self.get_weights())  # this can do multiple evaluations of the model
        self.set_weights(old_weights)  # restore the weights
        self.optimizer.apply_gradients(zip(update, trainable_vars))
Here I would need an accompanying custom optimizer that does nothing other than update the current weights by adding the update (new_weights = current_weights + update).
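For illustration, a minimal sketch of such an optimizer, assuming the pre-2.11 tf.keras optimizer subclassing API (the class name AdditiveUpdateOptimizer is made up, and the base-class API differs in newer Keras releases):

import tensorflow as tf

class AdditiveUpdateOptimizer(tf.keras.optimizers.Optimizer):
    # Treats the incoming "gradients" as precomputed updates and simply adds them.
    def __init__(self, name="AdditiveUpdateOptimizer", **kwargs):
        super().__init__(name, **kwargs)

    def _resource_apply_dense(self, grad, var, apply_state=None):
        # grad is really the update computed by my_fancy_update_finding_algorithm
        return var.assign_add(grad)

    def get_config(self):
        return super().get_config()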
I am still unsure if this is the best way to go.
If someone can comment on the snippets and ideas above, guide me to any other resource that I should consider or provide new approaches and other feedback I would be very glad.
Thanks all.
Franz
EDIT:
Sadly, I have not gotten any response so far. Maybe my question is not concrete enough. As a first, smaller question:
Given the model and val_and_grads from the first listing, how would I efficiently calculate the norm of the WHOLE gradient? What I do so far is:
import numpy as np
_, grads = val_and_grads(model.get_weights())
norm_grads = np.linalg.norm(np.concatenate([grad.numpy().flatten() for grad in grads]))
This surely cannot be the "right" way.
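For what it's worth, one option that stays entirely in TensorFlow (avoiding the NumPy round-trip) would be tf.linalg.global_norm, which computes exactly this concatenated 2-norm; a sketch under that assumption:

import tensorflow as tf

_, grads = val_and_grads(model.get_weights())
norm_grads = tf.linalg.global_norm(grads)  # sqrt of the sum of squared entries over all gradient tensors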
I would like to use TFP to write a neural network where the outputs are the probabilities of a categorical variable with 3 classes, and train it using the negative log-likelihood.
As I'm taking my first steps with TF and TFP, I started with a toy model where the input layer has only 1 unit receiving a null input, and the output layer has 3 units with a softmax activation function. The idea is that the biases should learn (up to an additive constant) the log of the probabilities.
Below is my code; true_p are the true parameters I use to generate the data and would like to learn, while learned_p is what I get from the NN.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from functions import nll
from tensorflow.keras.optimizers import SGD
import tensorflow.keras.layers as layers
import tensorflow_probability as tfp
tfd = tfp.distributions
# params
true_p = np.array([0.1, 0.7, 0.2])
n_train = 1000
# training data
x_train = np.array(np.zeros(n_train)).reshape((n_train,))
y_train = np.array(np.random.choice(len(true_p), size=n_train, p=true_p)).reshape((n_train,))
# model
input_layer = layers.Input(shape=(1,))
p_layer = layers.Dense(len(true_p), activation=tf.nn.softmax)(input_layer)
p_y = tfp.layers.DistributionLambda(tfd.Categorical)(p_layer)
model_p = keras.models.Model(inputs=input_layer, outputs=p_y)
model_p.compile(SGD(), loss=nll)
# training
hist_p = model_p.fit(x=x_train, y=y_train, batch_size=100, epochs=3000, verbose=0)
# check result
learned_p = np.round(model_p.layers[1].call(tf.constant([0], shape=(1, 1))).numpy(), 3)
learned_p
With this setup, I get the result:
>>> learned_p
array([[0.005, 0.989, 0.006]], dtype=float32)
I over-estimate the second category and can't really distinguish between the first and the third one. What's worse, if I plot the probabilities at the end of each epoch, it looks like they are converging monotonically to the vector [0,1,0], which doesn't make sense (it seems to me the gradient should push in the opposite direction once I start to over-estimate).
I really can't figure out what's going on here, but have the feeling I'm doing something plain wrong. Any idea? Thank you for your help!
For the record, I also tried other optimizers like Adam or Adagrad, playing with the hyper-params, but with no luck.
I'm using Python 3.7.9, TensorFlow 2.3.1 and TensorFlow probability 0.11.1
I believe the default argument to Categorical is not the vector of probabilities, but the vector of logits (values you'd take softmax of to get probabilities). This is to help maintain precision in internal Categorical computations like log_prob. I think you can simply eliminate the softmax activation function and it should work. Please update if it doesn't!
EDIT: alternatively you can replace the tfd.Categorical with
lambda p: tfd.Categorical(probs=p)
but you'll lose the aforementioned precision gains. Just wanted to clarify that passing probs is an option, just not the default.
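A sketch of the two variants described above, applied to the model from the question (the rest of the setup is assumed unchanged):

# Variant 1: drop the softmax and let Categorical interpret the Dense outputs as logits
p_layer = layers.Dense(len(true_p))(input_layer)
p_y = tfp.layers.DistributionLambda(tfd.Categorical)(p_layer)

# Variant 2: keep the softmax but tell Categorical the inputs are probabilities
p_layer = layers.Dense(len(true_p), activation=tf.nn.softmax)(input_layer)
p_y = tfp.layers.DistributionLambda(lambda p: tfd.Categorical(probs=p))(p_layer)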
I can't find a simple way to convert a tensor to a NumPy array without enabling eager mode, which gives a nice .numpy() method, but also slows down my model training.
I'd be super grateful for your suggestions. For context, I'm writing a custom metric for my TensorFlow model that relies on a scikit-learn function, which only takes NumPy arrays.
I've tried wrapping the tensors with np.array(), which throws a NotImplementedError. I also gave sessions and .eval() a go, but didn't get them to work either, and they seemed like too much for this simple job.
My specific error:
NotImplementedError: Cannot convert a symbolic Tensor (model_17/dense_17/Sigmoid:0) to a numpy array.
# Custom metric
from sklearn.metrics import accuracy_score  # the scikit-learn function the metric relies on

def accuracy_ml(y_true, y_pred):
    return accuracy_score(y_true, np.round(y_pred))  # ERROR here: feeding a tensor to the sklearn function
# Model
cnn = simple_model(input_shape=(224, 224, 3),
                   num_classes=10,
                   base_model=base_ResNet101)
lr = 1e-2
loss_fn = tf.keras.losses.BinaryCrossentropy()
metrics = [accuracy_ml]
cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
            loss=loss_fn,
            metrics=metrics)
# Simple baseline eval that fails
validation_steps=17
loss0, accuracy0 = cnn.evaluate(validation_batches, steps = validation_steps)
Wrapping my NumPy metric with tf.numpy_function() solved it. https://www.tensorflow.org/api_docs/python/tf/numpy_function
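For reference, a minimal sketch of that wrapping applied to the metric from the question (the inner helper name _np_accuracy is made up):

import numpy as np
import tensorflow as tf
from sklearn.metrics import accuracy_score

def accuracy_ml(y_true, y_pred):
    def _np_accuracy(y_true_np, y_pred_np):
        # runs as a plain Python/NumPy function, so scikit-learn is happy
        return np.float32(accuracy_score(y_true_np, np.round(y_pred_np)))
    return tf.numpy_function(_np_accuracy, [y_true, y_pred], tf.float32)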
I'm trying to build a Wavelet Neural Network using Keras/TensorFlow. For this neural network I am supposed to use a wavelet function as my activation function.
I have tried doing this by simply creating a custom activation function. However, there seems to be an issue with the backpropagation.
import numpy as np
import pandas as pd
import pywt
import matplotlib.pyplot as plt
import tensorflow as tf
from keras.models import Model, Sequential
import keras.layers as kl
from keras.layers import Input, Dense
import keras as kr
from keras.layers import Activation
from keras import backend as K
from keras.utils.generic_utils import get_custom_objects
def custom_activation(x):
    return pywt.dwt(x, 'db1') - 1
get_custom_objects().update({'custom_activation':Activation(custom_activation)})
model = Sequential()
model.add(Dense(12, input_dim=8, activation=custom_activation))
model.add(Dense(8, activation=custom_activation)
model.add(Dense(1, activation=custom_activation)
I get the following error when running the code in its entirety:
SyntaxError: invalid syntax
If I run
model = Sequential()
model.add(Dense(12, input_dim=8, activation=custom_activation))
model.add(Dense(8, activation=custom_activation)
I get the following error:
SyntaxError: unexpected EOF while parsing
And if I run
model = Sequential()
model.add(Dense(12, input_dim=8, activation=custom_activation))
I get the following error
TypeError: Cannot convert DType to numpy.dtype
model.add(...) is a function call. You must close the parentheses, otherwise it is a syntax error.
These two lines in your code example will cause a syntax error.
model.add(Dense(8, activation=custom_activation)
model.add(Dense(1, activation=custom_activation)
Regarding the 2nd question:
I get the following error
TypeError: Cannot convert DType to numpy.dtype
This looks like a NumPy function was invoked with incorrect arguments. Perhaps you can try to figure out which line in the script causes the error.
Also, an activation function must be written using Keras backend (or TensorFlow) operations, or you need to compute its gradients manually. Neural network training needs to compute the gradient of every operation on the backward pass in order to adjust the weights. As far as I understand, you can't just call an arbitrary Python library like pywt as an activation function: you either re-implement its operations as tensor operations, or use Python operations on eager tensors and supply the gradients yourself. A sketch of the first option follows.
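For illustration only, a minimal sketch of the first option: a differentiable wavelet-shaped activation built purely from TensorFlow ops (the Ricker/Mexican-hat form below is an arbitrary choice, not the db1 wavelet from the question):

import tensorflow as tf

def ricker_activation(x):
    # (1 - x^2) * exp(-x^2 / 2): every op here has a registered gradient,
    # so backpropagation works without any manual gradient code
    return (1.0 - tf.square(x)) * tf.exp(-tf.square(x) / 2.0)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(12, input_shape=(8,), activation=ricker_activation),
    tf.keras.layers.Dense(8, activation=ricker_activation),
    tf.keras.layers.Dense(1, activation=ricker_activation),
])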