How to use a custom loss function in GPflow 2? - tensorflow

I am new to GPflow and I am trying to figure out how to write a custom loss function to optimise the model. For my purpose, I need to manipulate the predicted output of the GP through different data treatments, and it is the output of these treatments that I would like to optimise the GP model against. For that purpose I would like to use the root mean square error as the loss function.
Workflow:
Input -> GP model -> GP_output -> Data treatment -> Predicted_output -> RMSE(Predicted_output, Observations)
I hope this makes sense.
Normally models are optimised doing something like this:
import gpflow as gf
import numpy as np
# GPflow expects 2-D arrays of shape [N, D], hence the extra axis
X = np.linspace(0, 100, num=100)[:, None]
n = np.random.normal(scale=8, size=X.shape)
y_obs = 10 * np.sin(X) + n
model = gf.models.GPR(
    data=(X, y_obs),
    kernel=gf.kernels.SquaredExponential(),
)
optimizer_config = dict(maxiter=100)  # example options passed on to SciPy
gf.optimizers.Scipy().minimize(
    model.training_loss, model.trainable_variables, options=optimizer_config
)
I have figured out how to do a workaround using the scipy minimize function to optimise using RMSE, but I would like to stay within the GPflow framework, where I can just input model.trainable_variables as argument, and have a general function that also works if I have multiple input/output dimensions.
def objective_func(params):
    model.kernel.lengthscales.assign(params[0])
    model.kernel.variance.assign(params[1])
    model.likelihood.variance.assign(params[2])
    GP_output = model.predict_y(X)[0]
    GP_output = GP_output.numpy()
    Predicted_output = data_treatment_func(GP_output)
    return np.sqrt(np.square(np.subtract(Predicted_output, y_obs)).mean())
from scipy.optimize import minimize
res = minimize(objective_func,
               x0=(1.0, 1.0, 1.0),)

I found the answer myself.
If you write your objective_func using TensorFlow instead of NumPy (e.g. tf.math.sqrt, tf.reduce_mean) you can simply pass that to gf.optimizers.Scipy().minimize(...) instead of model.training_loss:
import tensorflow as tf
def objective_func():
    GP_output = model.predict_y(X)[0]
    Predicted_output = data_treatment_func(GP_output)
    return tf.sqrt(tf.reduce_mean(tf.square(Predicted_output - y_obs)))
gf.optimizers.Scipy().minimize(
    objective_func, model.trainable_variables, options=optimizer_config
)

Related

Convert a tensorflow script to pytorch (TransformedDistribution)

I am trying to rewrite a tensorflow script in pytorch. I have a problem finding the equivalent part in torch for the following line from this script:
import tensorflow_probability as tfp
tfd = tfp.distributions
a_distribution = tfd.TransformedDistribution(
distribution=tfd.Normal(loc=0.0, scale=1.0),
bijector=tfp.bijectors.Chain([
tfp.bijectors.AffineScalar(shift=self._means,
scale=self._mags),
tfp.bijectors.Tanh(),
tfp.bijectors.AffineScalar(shift=mean, scale=std),
]),
event_shape=[mean.shape[-1]],
batch_shape=[mean.shape[0]])
In particular, I am having a lot of trouble replacing the tfp.bijectors.Chain component.
I wrote the following lines in torch, but I am wondering whether these PyTorch lines are compatible with the above TensorFlow code, and whether I can specify the batch_shape somewhere.
base_distribution = torch.normal(0.0, 1.0)
transforms = torch.distributions.transforms.ComposeTransform([torch.distributions.transforms.AffineTransform(loc=self._action_means, scale=self._action_mag, event_dim=mean.shape[-1]), torch.nn.Tanh(),torch.distributions.transforms.AffineTransform(loc=mean, scale=std, event_dim=mean.shape[-1])])
a_distribution = torch.distributions.transformed_distribution.TransformedDistribution(base_distribution, transforms)
Any solution?
In PyTorch, the base class Distribution expects both a batch_shape and an event_shape parameter. Notice that the subclass TransformedDistribution takes no such parameters: they are inferred from the base distribution provided on initialization.
You already found out about AffineTransform and ComposeTransform. Keep in mind you must stick with classes from torch.distributions.
This applies to torch.normal, which should be replaced with Normal. With this class, the shape is inferred from the provided loc and scale tensors.
It also applies to nn.Tanh, which should be replaced with TanhTransform.
Here is a minimal example using your transformation pipeline:
Imports:
import torch
from torch.distributions.normal import Normal
from torch.distributions import transforms as tT
from torch.distributions.transformed_distribution import TransformedDistribution
Parameters:
mean = torch.rand(2, 2)
std = 1
_action_means, _action_mag = 0, 1
# event_dim is the number of trailing event dimensions (1 for vectors), not their size
event_dim = 1
Distribution definition:
a_distribution = TransformedDistribution(
    base_distribution=Normal(loc=torch.full_like(mean, 0),
                             scale=torch.full_like(mean, 1)),
    transforms=tT.ComposeTransform([
        tT.AffineTransform(loc=_action_means, scale=_action_mag, event_dim=event_dim),
        tT.TanhTransform(),
        tT.AffineTransform(loc=mean, scale=std, event_dim=event_dim)]))
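One detail worth double-checking when porting: tfp.bijectors.Chain applies its list from last to first, whereas ComposeTransform applies its list from first to last. So a list copied over in the same order describes a different function. The sketch below illustrates this with plain Python functions standing in for the bijectors; the helper names (affine, chain_tfp_style, compose_torch_style) and the numbers are hypothetical and only serve the illustration.

```python
import math
from functools import reduce

def affine(shift, scale):
    """Stand-in for an affine bijector: x -> shift + scale * x."""
    return lambda x: shift + scale * x

def chain_tfp_style(fns):
    """Mimic tfp.bijectors.Chain: the LAST function in the list runs first."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(fns), x)

def compose_torch_style(fns):
    """Mimic ComposeTransform: the FIRST function in the list runs first."""
    return lambda x: reduce(lambda acc, f: f(acc), fns, x)

# Same list of steps, in the same order as written in the TensorFlow code.
steps = [affine(0.5, 2.0), math.tanh, affine(1.0, 3.0)]
x = 0.1

tfp_value = chain_tfp_style(steps)(x)
torch_same_list = compose_torch_style(steps)(x)
torch_reversed = compose_torch_style(list(reversed(steps)))(x)

print(tfp_value, torch_same_list, torch_reversed)
```

If the goal is to reproduce the TensorFlow distribution exactly, this suggests reversing the list passed to ComposeTransform, so that the (mean, std) affine step runs first and the (_action_means, _action_mag) step runs last.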

Streamlit with Tensorflow to analyse image and return the probability if is positive or negative

I'm trying to use TensorFlow to analyze an image and return the probability that it is positive or negative, based on a previously created model (.h5 file). I couldn't find documentation or a repository covering exactly this, so even a link to read would be awesome.
Link for the application: https://share.streamlit.io/felipelx/hackathon/IDC_Detector.py
Libraries that I'm trying to use.
import numpy as np
import streamlit as st
import tensorflow as tf
from keras.models import load_model
The function to load the model.
#st.cache(allow_output_mutation=True)
def loadIDCModel():
    model_idc = load_model('models/IDC_model.h5', compile=False)
    model_idc.summary()
    return model_idc
The code that processes the image is below. I can see the result of model.predict, but the percentage does not update: regardless of the image, the value is always the same.
if uploaded_file is not None:
    # transform image to numpy array
    file_bytes = tf.keras.preprocessing.image.load_img(uploaded_file, target_size=(96,96), grayscale = False, interpolation = 'nearest', color_mode = 'rgb', keep_aspect_ratio = False)
    c.image(file_bytes, channels="RGB")
    Genrate_pred = st.button("Generate Prediction")
    if Genrate_pred:
        model = loadMetModel()
        input_arr = tf.keras.preprocessing.image.img_to_array(file_bytes)
        input_arr = np.array([input_arr])
        probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
        prediction = probability_model.predict(input_arr)
        dict_pred = {0: 'Benigno/Normal', 1: 'Maligno'}
        result = dict_pred[np.argmax(prediction)]
        value = 0
        if result == 'Benigno/Normal':
            value = str(((prediction[0][0])*100).round(2)) + '%'
        else:
            value = str(((prediction[0][1])*100).round(2)) + '%'
        c.metric('Predição', result, delta=value, delta_color='normal')
Thank you in advance to any help.
The first thing I'm noticing is that your function for loading the model is named loadIDCModel, but then the function you call for loading the model is loadMetModel. When I check your source code, though, it looks like you've already addressed this issue. I'd recommend updating your question to reflect this.
Playing around with your application, I think the issue is your model itself. I tried various images — images containing carcinomas, and even a picture of a cat — and each gave me a probability around 73%. The lowest score I got was 72.74%, and the highest was 73.11% (this one was the cat). It seems that the output percentage is varying slightly, hinting that rather than something being wrong in the code, your model itself is likely at fault. You might need to retrain your model, as it seems to have learned to always return a value of approximately 0.73.
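One further thing that may be worth checking, offered as an assumption since the question does not show the model's final layer: if IDC_model.h5 already ends in a softmax (or otherwise outputs values between 0 and 1), wrapping it in another tf.keras.layers.Softmax compresses the result. For two classes, a softmax over values that already lie in [0, 1] cannot exceed roughly 0.7311, which is close to the highest score observed above. The sketch below shows this with a plain-Python softmax; the input lists are made-up examples, not outputs of the actual model.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical model outputs that are ALREADY probabilities summing to 1.
already_probabilities = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.0, 1.0]]

for probs in already_probabilities:
    squashed = softmax(probs)
    print(probs, "->", [round(p, 4) for p in squashed])

# The most confident possible input, [1, 0], gives the upper bound.
upper_bound = softmax([1.0, 0.0])[0]
print("largest value a second softmax can return:", round(upper_bound, 4))
```

If that is what is happening here, calling model.predict directly, without the extra Softmax layer, would be the thing to try before retraining.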

Why does keras (SGD) optimizer.minimize() not reach global minimum in this example?

I'm in the process of completing a TensorFlow tutorial via DataCamp and am transcribing/replicating the code examples I am working through in my own Jupyter notebook.
Here are the original instructions from the coding problem :
I'm running the following snippet of code and am not able to arrive at the same result that I am generating within the tutorial, which I have confirmed are the correct values via a connected scatterplot of x vs. loss_function(x) as seen a bit further below.
# imports
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import Variable, keras
def loss_function(x):
    import math
    return 4.0*math.cos(x-1)+np.divide(math.cos(2.0*math.pi*x),x)
# Initialize x_1 and x_2
x_1 = Variable(6.0, np.float32)
x_2 = Variable(0.3, np.float32)
# Define the optimization operation
opt = keras.optimizers.SGD(learning_rate=0.01)
for j in range(100):
    # Perform minimization using the loss function and x_1
    opt.minimize(lambda: loss_function(x_1), var_list=[x_1])
    # Perform minimization using the loss function and x_2
    opt.minimize(lambda: loss_function(x_2), var_list=[x_2])
# Print x_1 and x_2 as numpy arrays
print(x_1.numpy(), x_2.numpy())
I drew a quick connected scatterplot to confirm (successfully) that the loss function I am using reproduces the graph provided by the example (seen in the screenshot above).
# Generate loss_function(x) values for given range of x-values
losses = []
for p in np.linspace(0.1, 6.0, 60):
    losses.append(loss_function(p))
# Define x,y coordinates
x_coordinates = list(np.linspace(0.1, 6.0, 60))
y_coordinates = losses
# Plot
plt.scatter(x_coordinates, y_coordinates)
plt.plot(x_coordinates, y_coordinates)
plt.title('Plot of Input values (x) vs. Losses')
plt.xlabel('x')
plt.ylabel('loss_function(x)')
plt.show()
Here are the resulting global and local minima, respectively, as per the DataCamp environment :
4.38 is the correct global minimum, and 0.42 indeed corresponds to the first local minimum on the graph's RHS (when starting from x_2 = 0.3).
And here are the results from my environment, both of which move opposite the direction that they should be moving towards when seeking to minimize the loss value:
I've spent the better part of the last 90 minutes trying to sort out why my results disagree with those of the DataCamp console / why the optimizer fails to minimize this loss for this simple toy example...?
I appreciate any suggestions that you might have after you've run the provided code in your own environments, many thanks in advance!!!
As it turned out, the difference in outputs arose from the default precision of tf.divide() (vs np.divide()) and tf.cos() (vs math.cos()), operations that were specified in my transcribed, "custom" definition of loss_function().
The loss_function() had been predefined in the body of the tutorial, and when I "inspected" it using the inspect package (using inspect.getsourcelines(loss_function)) in order to redefine it in my own environment, the output of that inspection didn't clearly indicate that tf.divide and tf.cos had been used instead of their NumPy and math counterparts (which my version of the code had used).
The actual difference is quite small, but is apparently sufficient to push the optimizer in the opposite direction (away from the two respective minima).
After swapping in tf.divide() and tf.cos() (as seen below) I was able to arrive at the same results as seen in the DC console.
Here is the code for the loss_function that reproduces the results seen in the console (screenshot):
def loss_function(x):
    import math
    return 4.0*tf.cos(x-1)+tf.divide(tf.cos(2.0*math.pi*x),x)
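As a sanity check that does not depend on TensorFlow at all, the same descent can be sketched in plain Python with a hand-derived gradient. The learning rate, step count and starting points are taken from the question; the function names loss_gradient and gradient_descent, and the derivative itself, are my own and worth re-deriving before relying on them.

```python
import math

def loss_function(x):
    """Same loss as above, written with the math module only."""
    return 4.0 * math.cos(x - 1.0) + math.cos(2.0 * math.pi * x) / x

def loss_gradient(x):
    """Hand-derived derivative of loss_function.

    d/dx [4 cos(x - 1)]     = -4 sin(x - 1)
    d/dx [cos(2 pi x) / x]  = -2 pi sin(2 pi x) / x - cos(2 pi x) / x**2
    """
    return (-4.0 * math.sin(x - 1.0)
            - 2.0 * math.pi * math.sin(2.0 * math.pi * x) / x
            - math.cos(2.0 * math.pi * x) / x ** 2)

def gradient_descent(x, learning_rate=0.01, steps=100):
    """Plain gradient descent: repeatedly step against the gradient."""
    for _ in range(steps):
        x -= learning_rate * loss_gradient(x)
    return x

x_1 = gradient_descent(6.0)
x_2 = gradient_descent(0.3)
print(round(x_1, 2), round(x_2, 2))
```

Both starting points should settle near the minima quoted in the question (about 4.38 and 0.42), which gives a reference for judging whether an optimizer run is moving in the right direction.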

How to use torch to speed up some common computations?

I am trying to do some common computations, like matrix multiplication, but without gradient computation. An example of my computation is:
import numpy as np
from scipy.special import logsumexp
var = 1e-8
a = np.random.randint(0,10,(128,20))
result = logsumexp(a, axis=1) / 2. + np.log(np.pi * var)
I want to use torch (gpu) to speed up the computation. Here is the code
import numpy as np
import torch
var = 1e-8
a = np.random.randint(0,10,(128,20))
a = torch.from_numpy(a).cuda()
result = torch.logsumexp(a, dim=1)/ 2. + np.log(np.pi*var)
but i have some questions:
Could the above code speed up the computation? I don't know if it works.
Do I need to convert all values into torch.tensor, like from var to torch.tensor(var).cuda() and from np.log(np.pi*var) to a torch.tensor?
Do I need to convert all tensors into gpu by myself, especially for some intermediate variable?
If the above code doesn't work, how can I speed up the computation with gpu?
You could use torch only to do the computations.
import torch
# optimization by passing device argument, tensor is created on gpu and hence move operation is saved
# convert to float to use with logsumexp
a = torch.randint(0,10, (128,20), device="cuda").float()
result = torch.logsumexp(a, dim=1)/ 2.
Answers to some of your questions:
Could the above code speed up the computation?
It depends. If you have many matrix multiplications, using the GPU can give a speedup.
Do I need to convert all values into torch.tensor, like from var to torch.tensor(var).cuda() and from np.log(np.pi*var) to a torch.tensor?
Yes
Do I need to convert all tensors into gpu by myself, especially for some intermediate variable?
Only leaf variables need to be converted; intermediate variables will be placed on the device on which the operations are done. For example, if a and b are on the GPU, then the result of the operation c=a+b will also be on the GPU.
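When moving a computation like this to the GPU, it can help to keep a small CPU reference to compare against. The sketch below writes out the quantity from the question (row-wise logsumexp, halved, plus log(pi * var)) using only the standard library. The function names logsumexp and reference_result and the tiny input are illustrative and not part of the original code.

```python
import math

def logsumexp(row):
    """Numerically stable log(sum(exp(v) for v in row))."""
    peak = max(row)
    return peak + math.log(sum(math.exp(v - peak) for v in row))

def reference_result(rows, var=1e-8):
    """Row-wise logsumexp / 2 + log(pi * var), as in the question."""
    offset = math.log(math.pi * var)
    return [logsumexp(row) / 2.0 + offset for row in rows]

# Tiny made-up input; the question uses a 128 x 20 integer array.
rows = [[0, 0], [1, 2, 3]]
print(reference_result(rows))
```

Comparing a few rows of the GPU output against such a reference, within a floating-point tolerance, is a cheap way to confirm that the port computes the same thing.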

pybrain LSTM sequence to predict sequential data

I have written some simple code using pybrain to predict simple sequential data.
For example, a sequence of 0,1,2,3,4 is supposed to get an output of 5 from the network. The dataset specifies the remaining sequences.
Below is my implementation:
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.datasets import SequentialDataSet
from pybrain.structure import SigmoidLayer, LinearLayer
from pybrain.structure import LSTMLayer
import itertools
import numpy as np
INPUTS = 5
OUTPUTS = 1
HIDDEN = 40
net = buildNetwork(INPUTS, HIDDEN, OUTPUTS, hiddenclass=LSTMLayer, outclass=LinearLayer, recurrent=True, bias=True)
ds = SequentialDataSet(INPUTS, OUTPUTS)
ds.addSample([0,1,2,3,4],[5])
ds.addSample([5,6,7,8,9],[10])
ds.addSample([10,11,12,13,14],[15])
ds.addSample([16,17,18,19,20],[21])
net.randomize()
trainer = BackpropTrainer(net, ds)
for _ in range(1000):
    print trainer.train()
x=net.activate([0,1,2,3,4])
print x
The output on my screen keeps showing [0.99999999 0.99999999 0.9999999 0.99999999] every single time. What am I missing? Is the training not sufficient? I ask because trainer.train() shows an output of 86.625..
The pybrain SigmoidLayer implements the sigmoid squashing function.
The relevant part of its code is this:
def sigmoid(x):
    """ Logistic sigmoid function. """
    return 1. / (1. + safeExp(-x))
So, no matter what the value of x, it will only ever return values between 0 and 1. For this reason, and for others, it is a good idea to scale your input and output values to between 0 and 1. For example, divide all your inputs by the maximum value (assuming the minimum is no lower than 0), and the same for your outputs. Then do the reverse with the result (e.g. multiply by 25 if you were dividing by 25 at the beginning).
Also, I'm no expert on pybrain, but I wonder if you need OUTPUTS = 4? It looks like you have only one output in your data, so I'm wondering if you could just use OUTPUTS = 1.
You may also try scaling the inputs and outputs to a particular part of the sigmoid curve (e.g. between 0.1 and 0.9) to make pybrain's job easier, but that makes the scaling before and after a little more complex.
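The scaling described above can be sketched in a few lines of plain Python. The helper names scale and unscale are hypothetical, and the data bounds of 0 and 25 are an assumption taken from the "divide by 25" example; they should be replaced by the real minimum and maximum of the data.

```python
def scale(values, lo, hi, new_lo=0.0, new_hi=1.0):
    """Map values from [lo, hi] linearly onto [new_lo, new_hi]."""
    return [new_lo + (v - lo) * (new_hi - new_lo) / (hi - lo) for v in values]

def unscale(values, lo, hi, new_lo=0.0, new_hi=1.0):
    """Inverse of scale: map values from [new_lo, new_hi] back onto [lo, hi]."""
    return [lo + (v - new_lo) * (hi - lo) / (new_hi - new_lo) for v in values]

# Assumed bounds of the raw data (hypothetical, see the note above).
DATA_MIN, DATA_MAX = 0.0, 25.0

inputs = [0, 1, 2, 3, 4]
scaled_inputs = scale(inputs, DATA_MIN, DATA_MAX)          # values in [0, 1]

# Targets squeezed into [0.1, 0.9] to stay off the flat ends of the sigmoid.
scaled_target = scale([5], DATA_MIN, DATA_MAX, 0.1, 0.9)

# After prediction, apply the inverse with the same bounds to get real units.
recovered = unscale(scaled_target, DATA_MIN, DATA_MAX, 0.1, 0.9)
print(scaled_inputs, scaled_target, recovered)
```

The scaled values would be what goes into ds.addSample and net.activate, with unscale applied to whatever the network returns.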