Exploding LOSS in TensorFlow 2.0 Linear Regression Example using GradientTape - tensorflow

I'm trying to construct a little educational example for multivariate linear regression, but the loss keeps increasing until it explodes rather than getting smaller. Any idea?
import tensorflow as tf
tf.__version__

import numpy as np

data = np.array(
    [
        [100, 35, 35, 12, 0.32],
        [101, 46, 35, 21, 0.34],
        [130, 56, 46, 3412, 12.42],
        [131, 58, 48, 3542, 13.43]
    ]
)

x = data[:, 1:-1]
y_target = data[:, -1]

def loss_function(y, pred):
    return tf.reduce_mean(tf.square(y - pred))

def train(b, w, x, y, lr=0.012):
    with tf.GradientTape() as t:
        current_loss = loss_function(y, linear_model(x))
    lr_weight, lr_bias = t.gradient(current_loss, [w, b])
    w.assign_sub(lr * lr_weight)
    b.assign_sub(lr * lr_bias)

epochs = 80
for epoch_count in range(epochs):
    real_loss = loss_function(y_target, linear_model(x))
    train(b, w, x, y_target, lr=0.12)
    print(f"Epoch count {epoch_count}: Loss value: {real_loss.numpy()}")
This even happens if I initialize the weights with the "correct" values (found via a scikit-learn regressor):
w = tf.Variable([-1.76770250e-04, 3.46688912e-01, 2.43827475e-03], dtype=tf.float64)
b = tf.Variable(-11.837184241807234, dtype=tf.float64)

Here's how you might use a TF2 optimizer for a toy example (as per the comment). I know this is not the answer, but I didn't want to post it in the comments section, as that would mess up the indentation and all that.
tf_x = tf.Variable(tf.constant(2.0, dtype=tf.float32), name='x')
optimizer = tf.optimizers.SGD(learning_rate=0.1)

# Optimizing tf_x using gradient tape
x_series, y_series = [], []
for step in range(5):
    x_series.append(tf_x.numpy().item())
    with tf.GradientTape() as tape:
        tf_y = tf_x**2
    gradients = tape.gradient(tf_y, tf_x)
    optimizer.apply_gradients(zip([gradients], [tf_x]))
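A quick sanity check (my note, not part of the original answer): with learning_rate=0.1 on y = x**2, each update is x <- x - 0.1 * 2x = 0.8x, so x_series should come out as roughly [2.0, 1.6, 1.28, 1.024, 0.8192], shrinking toward the minimum at 0.

print(x_series)  # expected: approximately [2.0, 1.6, 1.28, 1.024, 0.8192]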

Based on @thushv89's input, I'm providing here an intermediate solution using a TF2 optimizer which works, although it doesn't 100% answer my question:
import tensorflow as tf
tf.__version__

import numpy as np

data = np.array(
    [
        [100, 35, 35, 12, 0.32],
        [101, 46, 35, 21, 0.34],
        [130, 56, 46, 3412, 12.42],
        [131, 58, 48, 3542, 13.43]
    ]
)

x = data[:, 1:-1]
y_target = data[:, -1]

w = tf.Variable([1, 1, 1], dtype=tf.float64)
b = tf.Variable(1, dtype=tf.float64)

def linear_model(x):
    return b + tf.tensordot(x, w, axes=1)

optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.MeanSquaredLogarithmicError()

def train_step(x, y):
    with tf.GradientTape() as tape:
        predicted = linear_model(x)
        loss_value = loss_object(y, predicted)
        print(f"Loss Value: {loss_value}")
    grads = tape.gradient(loss_value, [b, w])
    optimizer.apply_gradients(zip(grads, [b, w]))

def train(epochs):
    for epoch in range(epochs):
        train_step(x, y_target)
        print('Epoch {} finished'.format(epoch))

train(epochs=1000)
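For what it's worth, a plausible reason the plain gradient-descent version explodes (an editorial observation, not a confirmed answer): the last feature column reaches ~3542 while the others stay below 100, and the MSE gradient scales with the feature values, so lr=0.12 overshoots badly. A minimal sketch, assuming it is acceptable to standardize the features before running the original train() loop:

# Hypothetical variant: standardize each feature column so the gradient
# components have comparable scale, then reuse train() from the question.
x_scaled = (x - x.mean(axis=0)) / x.std(axis=0)
w = tf.Variable([1., 1., 1.], dtype=tf.float64)
b = tf.Variable(1., dtype=tf.float64)
for epoch_count in range(80):
    train(b, w, x_scaled, y_target, lr=0.012)
    print(loss_function(y_target, linear_model(x_scaled)).numpy())

With the features on a common scale, the loss should decrease instead of exploding.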

Related

How can I print the loss value obtained from model.add_loss() function in Tensorflow 2 (with Keras API)?

I have a classification model in the Keras API of TensorFlow 2. The model has two losses in the example below: categorical cross-entropy and KL-divergence. Categorical cross-entropy is added when the model is compiled and hence is printed while training. KL-divergence is added separately using model.add_loss(). However, it does not display during training. Is there a way to print it while training? The sample code is shown below. Since it runs on simulated data, the loss value might come out as nan.
import tensorflow as tf
import numpy as np

keras = tf.keras
K = tf.keras.backend

def ohc(y):
    # one-hot encode the labels
    y = np.random.randint(4, size=100)
    b = np.zeros((y.size, y.max() + 1))
    b[np.arange(y.size), y] = 1
    return b

xtrain = np.random.rand(100, 50)
y = np.random.randint(4, size=100)
y_one_hot = ohc(y)

def my_kld(y_true, y_pred):
    y_pred2 = tf.keras.layers.Lambda(lambda x: x + 0.00001)(y_pred)
    LR = y_true / y_pred2
    logLR = K.log(LR)
    kld = y_true * logLR
    loss = K.mean(kld)
    loss = tf.keras.layers.Lambda(lambda x: x * 0.2)(loss)
    return loss

x_t = keras.layers.Input((50,))
y_t = keras.layers.Input((4,))
x1 = keras.layers.Dense(100, activation='relu')(x_t)
x2 = keras.layers.Dense(4, activation='softmax')(x1)

model = keras.models.Model([x_t, y_t], x2)
model.add_loss(my_kld(y_t, x2))
optim = keras.optimizers.Nadam(0.00006)
model.compile(loss=['categorical_crossentropy'], optimizer=optim, metrics=['accuracy'])
model.fit(x=[xtrain, y_one_hot], y=y_one_hot, epochs=100)
The code I have tried is included in the question in an implementable way. However, there does not seem to be any way to print the loss from model.add_loss().
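One option worth trying (my suggestion, not from the thread): Keras will report any tensor during training if you register it as a metric, so you can surface the same value you pass to add_loss() via model.add_metric():

kld = my_kld(y_t, x2)
model.add_loss(kld)
# Shows up in the fit() progress bar next to the compiled loss.
model.add_metric(kld, name='kl_divergence', aggregation='mean')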

How can I compile batched training of a gpflow GPR into a tf.function?

I need to train a GPR model in multiple batches per epoch using a custom loss function. I would like to do this using GPflow, and I would like to compile my training with tf.function to increase efficiency. However, gpflow.GPR must be re-instantiated each time you supply new data, so the tf.function has to re-trace each time. This makes the code slower rather than faster.
This is the initial setup:
import numpy as np
from itertools import islice
import tensorflow as tf
import tensorflow_probability as tfp
tfb = tfp.bijectors
from sklearn.model_selection import train_test_split
import gpflow
from gpflow.kernels import SquaredExponential
import time

data_size = 1000
train_fract = 0.8
batch_size = 250
n_epochs = 3
iterations_per_epoch = int(train_fract * data_size / batch_size)
tf.random.set_seed(3)

# Generate dummy data
x = np.arange(data_size)
y = np.arange(data_size) + np.random.rand(data_size)

# Slice into train and validate sets
x_train, x_validate, y_train, y_validate = train_test_split(x, y, random_state=1, test_size=1 - train_fract)

# Convert data into tensorflow constants
x_train = tf.constant(x_train[:, np.newaxis], dtype=np.float64)
x_validate = tf.constant(x_validate[:, np.newaxis], dtype=np.float64)
y_train = tf.constant(y_train[:, np.newaxis], dtype=np.float64)
y_validate = tf.constant(y_validate[:, np.newaxis], dtype=np.float64)

# Batch data
batched_dataset = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(buffer_size=len(x_train), seed=1)
    .repeat(count=None)
    .batch(batch_size)
)

# Create kernel
constrain_positive = tfb.Shift(np.finfo(np.float64).tiny)(tfb.Exp())
amplitude = tfp.util.TransformedVariable(initial_value=1, bijector=constrain_positive, dtype=np.float64, name="amplitude")
len_scale = tfp.util.TransformedVariable(initial_value=10, bijector=constrain_positive, dtype=np.float64, name="len_scale")
kernel = SquaredExponential(variance=amplitude, lengthscales=len_scale, name="squared_exponential_kernel")
obs_noise = tfp.util.TransformedVariable(initial_value=1e-3, bijector=constrain_positive, dtype=np.float64, name="observation_noise")
# Define custom loss function
@tf.function(autograph=False, experimental_compile=False)
def my_custom_loss(y_predict, y_true):
    return tf.math.reduce_mean(tf.math.squared_difference(y_predict, y_true))

# optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
This is how I train without a tf.function:
gpr_model_j_i = gpflow.models.GPR(data=(x_train, y_train), kernel=kernel, noise_variance=obs_noise)

# Start training loop
for j in range(n_epochs):
    for i, (x_train_j_i, y_train_j_i) in enumerate(islice(batched_dataset, iterations_per_epoch)):
        with tf.GradientTape() as tape:
            gpr_model_j_i = gpflow.models.GPR(data=(x_train_j_i, y_train_j_i), kernel=kernel, noise_variance=gpr_model_j_i.likelihood.variance)
            y_predict_j_i = gpr_model_j_i.predict_f(x_validate)[0]
            loss_j_i = my_custom_loss(y_predict_j_i, y_validate)
        grads_j_i = tape.gradient(loss_j_i, gpr_model_j_i.trainable_variables)
        optimizer.apply_gradients(zip(grads_j_i, gpr_model_j_i.trainable_variables))
This is how I train with a tf.function:
@tf.function(autograph=False, experimental_compile=False)
def tf_function_attempt_3(model):  # , optimizer):
    with tf.GradientTape() as tape:
        y_predict_j_i = model.predict_f(x_validate)[0]
        loss_j_i = my_custom_loss(y_predict_j_i, y_validate)
    grads_j_i = tape.gradient(loss_j_i, model.trainable_variables)
    optimizer.apply_gradients(zip(grads_j_i, model.trainable_variables))
    print("TRACING...", end="")

for j in range(n_epochs):
    for i, (x_train_j_i, y_train_j_i) in enumerate(islice(batched_dataset, iterations_per_epoch)):
        gpr_model_j_i = gpflow.models.GPR(data=(x_train_j_i, y_train_j_i), kernel=kernel, noise_variance=gpr_model_j_i.likelihood.variance)
        tf_function_attempt_3(gpr_model_j_i)  # , optimizer)
The tf.function retraces for each batch and is significantly slower than the normal training.
Is there a way to speed up the batched training of my GPR model with tf.function while using a custom loss function and GPflow? If not, I am open to suggestions for an alternative approach.
You don't have to re-instantiate GPR each time. You can construct tf.Variable holders with unconstrained shape and then .assign to them:
import gpflow
import numpy as np
import tensorflow as tf

input_dim = 1
initial_x, initial_y = np.zeros((0, input_dim)), np.zeros((0, 1))  # or your first batch
x_var = tf.Variable(initial_x, shape=(None, input_dim), dtype=tf.float64)
y_var = tf.Variable(initial_y, shape=(None, 1), dtype=tf.float64)
# in principle you could also set shape=(None, None)...

m = gpflow.models.GPR((x_var, y_var), gpflow.kernels.SquaredExponential())
loss = m.training_loss_closure()  # compile=True default wraps in tf.function()

N1 = 3
x1, y1 = np.random.randn(N1, input_dim), np.random.randn(N1, 1)
m.data[0].assign(x1)
m.data[1].assign(y1)
loss()  # traces the first time

N2 = 7
x2, y2 = np.random.randn(N2, input_dim), np.random.randn(N2, 1)
m.data[0].assign(x2)
m.data[1].assign(y2)
loss()  # does not trace again
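To connect this back to the batched loop (a sketch under the assumption that the model's built-in training loss is acceptable in place of the custom validation loss): assign each batch into the variables and reuse the same compiled closure:

opt = tf.keras.optimizers.SGD(learning_rate=0.1)
for x_batch, y_batch in batched_dataset.take(iterations_per_epoch):
    m.data[0].assign(x_batch)
    m.data[1].assign(y_batch)
    # minimize() accepts a callable loss in TF2; the traced graph is
    # reused across batches since only the variable contents change.
    opt.minimize(loss, var_list=m.trainable_variables)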

Why a pure TensorFlow autoencoder cannot converge, when the Keras one fits well

I'm a newbie at machine learning (and at Stack Overflow too). I want to ask for help.
I have two implementations of the same two-layer autoencoder for MNIST.
The first one fits well:
import tensorflow as tf, numpy as np

def in_pics(pics):
    return np.array(pics, np.float32) / 255.

def out_pics(pics):
    npformed = np.array(pics, np.float32)
    samples = np.shape(npformed)[0]
    return np.reshape(npformed, (samples, 784)) / 255.

(train_data, test_data) = tf.keras.datasets.mnist.load_data()
train_dataset = tf.data.Dataset.from_tensor_slices((in_pics(train_data[0]), out_pics(train_data[0]))).batch(100)
test_dataset = tf.data.Dataset.from_tensor_slices((in_pics(test_data[0]), out_pics(test_data[0]))).batch(100)

model = tf.keras.Sequential(
    [
        tf.keras.layers.InputLayer(input_shape=(28, 28)),
        tf.keras.layers.Reshape(target_shape=(784,)),
        tf.keras.layers.Dense(128, activation="sigmoid",
                              kernel_initializer=tf.keras.initializers.truncated_normal(stddev=0.1),
                              bias_initializer=tf.keras.initializers.truncated_normal(stddev=0.1)),
        tf.keras.layers.Dense(784, activation="sigmoid",
                              kernel_initializer=tf.keras.initializers.truncated_normal(stddev=0.1),
                              bias_initializer=tf.keras.initializers.truncated_normal(stddev=0.1)),
    ]
)

model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), optimizer="adam")
model.fit(x=train_dataset, epochs=20, verbose=1, validation_data=test_dataset)
[image: Keras training result]
And I have the same model with the same learning procedure, but written in terms of pure TensorFlow (almost):
import tensorflow as tf, numpy as np

def prep_pics(pics):
    npformed = np.array(pics, np.float32)
    samples = np.shape(npformed)[0]
    return np.reshape(npformed, (samples, 784)) / 255.

batch_size, num_batches = 100, 12000

(train_data, test_data) = tf.keras.datasets.mnist.load_data()
(train_pics_prepared, test_pics_prepared) = (prep_pics(train_data[0]), prep_pics(test_data[0]))
train_dataset = tf.data.Dataset.from_tensor_slices((train_pics_prepared, train_pics_prepared))
test_dataset = tf.data.Dataset.from_tensor_slices((test_pics_prepared, test_pics_prepared)).batch(batch_size)

encoder_W = tf.Variable(tf.keras.initializers.truncated_normal(stddev=0.1)(shape=[784, 128], dtype=tf.float32))
encoder_b = tf.Variable(tf.keras.initializers.truncated_normal(stddev=0.1)(shape=[128], dtype=tf.float32))
decoder_W = tf.Variable(tf.keras.initializers.truncated_normal(stddev=0.1)(shape=[128, 784], dtype=tf.float32))
decoder_b = tf.Variable(tf.keras.initializers.truncated_normal(stddev=0.1)(shape=[784], dtype=tf.float32))

optimizer = tf.keras.optimizers.Adam()
bin_cross = tf.keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def trainstep(X_pack, Y_pack):
    with tf.GradientTape() as tape:
        tape.watch([encoder_W, encoder_b, decoder_W, decoder_b])
        encoded = tf.nn.sigmoid(tf.matmul(X_pack, encoder_W) + encoder_b)
        decoded = tf.nn.sigmoid(tf.matmul(encoded, decoder_W) + decoder_b)
        loss = bin_cross(decoded, Y_pack)
    gradients = tape.gradient(loss, [encoder_W, encoder_b, decoder_W, decoder_b])
    optimizer.apply_gradients(zip(gradients, [encoder_W, encoder_b, decoder_W, decoder_b]))

num_samples = len(train_data[0])
epochs = num_batches // (num_samples // batch_size)
for i in range(epochs):
    print(f"Epoch num {i+1}:")
    dataset = train_dataset.shuffle(num_samples).batch(batch_size)
    for x, y in dataset:
        trainstep(x, y)
The loss never goes below 0.64 (the first one ended training at 0.07).
[image: TensorFlow training result]
What I tried:
1.) Changing the optimizer to Adagrad and even to SGD.
2.) Using another loss - sigmoid_cross_entropy_with_logits.
3.) Training longer, especially for SGD.
I haven't been able to find the mistake or typo for almost a week. Please help!
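One thing worth double-checking (an editorial observation, not an answer from the thread): Keras losses take their arguments as (y_true, y_pred), but trainstep calls bin_cross(decoded, Y_pack) with the arguments reversed, and from_logits=True expects raw logits while decoded has already been through a sigmoid. A corrected inner step might look like:

with tf.GradientTape() as tape:
    encoded = tf.nn.sigmoid(tf.matmul(X_pack, encoder_W) + encoder_b)
    logits = tf.matmul(encoded, decoder_W) + decoder_b  # keep raw logits
    loss = bin_cross(Y_pack, logits)  # y_true first, prediction second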

Loss function with derivative in TensorFlow 2

I am using a TF2 (2.3.0) NN to approximate the function y which solves the ODE y' + 3y = 0.
I have defined a custom loss class and function in which I am trying to differentiate the single output with respect to the single input so that the equation holds, provided that y_true is zero:
from tensorflow.keras.losses import Loss
import tensorflow as tf

class CustomLossOde(Loss):
    def __init__(self, x, model, name='ode_loss'):
        super().__init__(name=name)
        self.x = x
        self.model = model

    def call(self, y_true, y_pred):
        with tf.GradientTape() as tape:
            tape.watch(self.x)
            y_p = self.model(self.x)
        dy_dx = tape.gradient(y_p, self.x)
        loss = tf.math.reduce_mean(tf.square(dy_dx + 3 * y_pred - y_true))
        return loss
but running the following NN:
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras import Input
from custom_loss_ode import CustomLossOde

num_samples = 1024
x_train = 4 * (tf.random.uniform((num_samples, )) - 0.5)
y_train = tf.zeros((num_samples, ))

inputs = Input(shape=(1,))
x = Dense(16, 'tanh')(inputs)
x = Dense(8, 'tanh')(x)
x = Dense(4)(x)
y = Dense(1)(x)
model = Model(inputs=inputs, outputs=y)

loss = CustomLossOde(model.input, model)
model.compile(optimizer=Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.99), loss=loss)
model.run_eagerly = True
model.fit(x_train, y_train, batch_size=16, epochs=30)
For now I am getting 0 loss from the first epoch, which doesn't make any sense.
I have printed both y_true and y_pred from within the function and they seem OK, so I suspect that the problem is in the gradient, which I didn't manage to print.
I'd appreciate any help.
Defining a custom loss with the high-level Keras API is a bit difficult in this case. I would instead write the training loop from scratch, as it allows finer-grained control over what you can do.
I took inspiration from these two guides:
Advanced Automatic Differentiation
Writing a training loop from scratch
Basically, I used the fact that multiple tapes can be nested seamlessly. I use one to compute the loss function and the other to calculate the gradients to be propagated by the optimizer.
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras import Input

num_samples = 1024
x_train = 4 * (tf.random.uniform((num_samples, )) - 0.5)
y_train = tf.zeros((num_samples, ))

inputs = Input(shape=(1,))
x = Dense(16, 'tanh')(inputs)
x = Dense(8, 'tanh')(x)
x = Dense(4)(x)
y = Dense(1)(x)
model = Model(inputs=inputs, outputs=y)

# using the high level tf.data API for data handling
x_train = tf.reshape(x_train, (-1, 1))
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(1)

opt = Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.99)
for step, (x, y_true) in enumerate(dataset):
    # we need to convert x to a variable if we want the tape to be
    # able to compute the gradient according to x
    x_variable = tf.Variable(x)
    with tf.GradientTape() as model_tape:
        with tf.GradientTape() as loss_tape:
            loss_tape.watch(x_variable)
            y_pred = model(x_variable)
        dy_dx = loss_tape.gradient(y_pred, x_variable)
        loss = tf.math.reduce_mean(tf.square(dy_dx + 3 * y_pred - y_true))
    grad = model_tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grad, model.trainable_variables))
    if step % 20 == 0:
        print(f"Step {step}: loss={loss.numpy()}")

Keras Loss Function with Additional Dynamic Parameter

I'm working on implementing prioritized experience replay for a deep-q network, and part of the specification is to multiply gradients by what's known as importance sampling (IS) weights. The gradient modification is discussed in section 3.4 of the following paper: https://arxiv.org/pdf/1511.05952.pdf I'm struggling with creating a custom loss function that takes in an array of IS weights in addition to y_true and y_pred.
Here's a simplified version of my model:
import numpy as np
import tensorflow as tf

# Input is RAM, each byte in the range of [0, 255].
in_obs = tf.keras.layers.Input(shape=(4,))
# Normalize the observation to the range of [0, 1].
norm = tf.keras.layers.Lambda(lambda x: x / 255.0)(in_obs)
# Hidden layers.
dense1 = tf.keras.layers.Dense(128, activation="relu")(norm)
dense2 = tf.keras.layers.Dense(128, activation="relu")(dense1)
dense3 = tf.keras.layers.Dense(128, activation="relu")(dense2)
dense4 = tf.keras.layers.Dense(128, activation="relu")(dense3)
# Output prediction, which is an action to take.
out_pred = tf.keras.layers.Dense(2, activation="linear")(dense4)

opt = tf.keras.optimizers.Adam(lr=5e-5)
network = tf.keras.models.Model(inputs=in_obs, outputs=out_pred)
network.compile(optimizer=opt, loss=huber_loss_mean_weighted)
Here's my custom loss function, which is just an implementation of Huber Loss multiplied by the IS weights:
'''
' Huber loss: https://en.wikipedia.org/wiki/Huber_loss
'''
def huber_loss(y_true, y_pred):
    error = y_true - y_pred
    cond = tf.keras.backend.abs(error) < 1.0
    squared_loss = 0.5 * tf.keras.backend.square(error)
    linear_loss = tf.keras.backend.abs(error) - 0.5
    return tf.where(cond, squared_loss, linear_loss)

'''
' Importance Sampling weighted huber loss.
'''
def huber_loss_mean_weighted(y_true, y_pred, is_weights):
    error = huber_loss(y_true, y_pred)
    return tf.keras.backend.mean(error * is_weights)
The important bit is that is_weights is dynamic, i.e. it's different each time fit() is called. As such, I cannot simply close over is_weights as described here: Make a custom loss function in keras
I found this code online, which appears to use a Lambda layer to compute the loss: https://github.com/keras-team/keras/blob/master/examples/image_ocr.py#L475 It looks promising, but I'm struggling to understand it/adapt it to my particular problem. Any help is appreciated.
OK. Here is an example.
from keras.layers import Input, Dense, Conv2D, MaxPool2D, Flatten
from keras.models import Model
from keras.losses import categorical_crossentropy

def sample_loss(y_true, y_pred, is_weight):
    return is_weight * categorical_crossentropy(y_true, y_pred)

x = Input(shape=(32, 32, 3), name='image_in')
y_true = Input(shape=(10,), name='y_true')
is_weight = Input(shape=(1,), name='is_weight')
f = Conv2D(16, (3, 3), padding='same')(x)
f = MaxPool2D((2, 2), padding='same')(f)
f = Conv2D(32, (3, 3), padding='same')(f)
f = MaxPool2D((2, 2), padding='same')(f)
f = Conv2D(64, (3, 3), padding='same')(f)
f = MaxPool2D((2, 2), padding='same')(f)
f = Flatten()(f)
y_pred = Dense(10, activation='softmax', name='y_pred')(f)

model = Model(inputs=[x, y_true, is_weight], outputs=y_pred, name='train_only')
model.add_loss(sample_loss(y_true, y_pred, is_weight))
model.compile(loss=None, optimizer='sgd')
print(model.summary())
Note: since you've added the loss through add_loss(), you don't have to pass it through compile(loss=...).
As for training the model, nothing is special except that you move y_true to the input end. See below:
import numpy as np

a = np.random.randn(8, 32, 32, 3)
a_true = np.random.randn(8, 10)
a_is_weight = np.random.randint(0, 2, size=(8, 1))
model.fit([a, a_true, a_is_weight])
Finally, you can make a testing model (which shares all weights with model) for easier use, i.e.
test_model = Model(inputs=x, outputs=y_pred, name='test_only')
a_pred = test_model.predict(a)
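As a side note (my addition, not part of the original answer): if the IS weights only need to scale the per-sample loss, fit() also accepts a sample_weight array that Keras multiplies into the loss, which avoids the extra input entirely. A hedged sketch against the question's network, where x_batch, y_batch, and is_weights_batch are hypothetical placeholders for one sampled replay batch:

# Built-in per-sample weighting instead of a custom three-input model.
network.compile(optimizer=opt, loss=tf.keras.losses.Huber())
network.fit(x_batch, y_batch, sample_weight=is_weights_batch)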