Keras Loss Function with Additional Dynamic Parameter - tensorflow

I'm working on implementing prioritized experience replay for a deep-q network, and part of the specification is to multiply gradients by what's know as importance sampling (IS) weights. The gradient modification is discussed in section 3.4 of the following paper: https://arxiv.org/pdf/1511.05952.pdf I'm struggling with creating a custom loss function that takes in an array of IS weights in addition to y_true and y_pred.
Here's a simplified version of my model:
import numpy as np
import tensorflow as tf
# Input is RAM, each byte in the range of [0, 255].
in_obs = tf.keras.layers.Input(shape=(4,))
# Normalize the observation to the range of [0, 1].
norm = tf.keras.layers.Lambda(lambda x: x / 255.0)(in_obs)
# Hidden layers.
dense1 = tf.keras.layers.Dense(128, activation="relu")(norm)
dense2 = tf.keras.layers.Dense(128, activation="relu")(dense1)
dense3 = tf.keras.layers.Dense(128, activation="relu")(dense2)
dense4 = tf.keras.layers.Dense(128, activation="relu")(dense3)
# Output prediction, which is an action to take.
out_pred = tf.keras.layers.Dense(2, activation="linear")(dense4)
opt = tf.keras.optimizers.Adam(lr=5e-5)
network = tf.keras.models.Model(inputs=in_obs, outputs=out_pred)
network.compile(optimizer=opt, loss=huber_loss_mean_weighted)
Here's my custom loss function, which is just an implementation of Huber Loss multiplied by the IS weights:
'''
' Huber loss: https://en.wikipedia.org/wiki/Huber_loss
'''
def huber_loss(y_true, y_pred):
error = y_true - y_pred
cond = tf.keras.backend.abs(error) < 1.0
squared_loss = 0.5 * tf.keras.backend.square(error)
linear_loss = tf.keras.backend.abs(error) - 0.5
return tf.where(cond, squared_loss, linear_loss)
'''
' Importance Sampling weighted huber loss.
'''
def huber_loss_mean_weighted(y_true, y_pred, is_weights):
error = huber_loss(y_true, y_pred)
return tf.keras.backend.mean(error * is_weights)
The important bit is that is_weights is dynamic, i.e. it's different each time fit() is called. As such, I cannot simply close over is_weights as described here: Make a custom loss function in keras
I found this code online, which appears to use a Lambda layer to compute the loss: https://github.com/keras-team/keras/blob/master/examples/image_ocr.py#L475 It looks promising, but I'm struggling to understand it/adapt it to my particular problem. Any help is appreciated.

OK. Here is an example.
from keras.layers import Input, Dense, Conv2D, MaxPool2D, Flatten
from keras.models import Model
from keras.losses import categorical_crossentropy
def sample_loss( y_true, y_pred, is_weight ) :
return is_weight * categorical_crossentropy( y_true, y_pred )
x = Input(shape=(32,32,3), name='image_in')
y_true = Input( shape=(10,), name='y_true' )
is_weight = Input(shape=(1,), name='is_weight')
f = Conv2D(16,(3,3),padding='same')(x)
f = MaxPool2D((2,2),padding='same')(f)
f = Conv2D(32,(3,3),padding='same')(f)
f = MaxPool2D((2,2),padding='same')(f)
f = Conv2D(64,(3,3),padding='same')(f)
f = MaxPool2D((2,2),padding='same')(f)
f = Flatten()(f)
y_pred = Dense(10, activation='softmax', name='y_pred' )(f)
model = Model( inputs=[x, y_true, is_weight], outputs=y_pred, name='train_only' )
model.add_loss( sample_loss( y_true, y_pred, is_weight ) )
model.compile( loss=None, optimizer='sgd' )
print model.summary()
Note, since you've add loss through add_loss(), you don't have to do it through compile( loss=xxx ).
With regards to train a model, nothing is special except you move y_true to your input end. See below
import numpy as np
a = np.random.randn(8,32,32,3)
a_true = np.random.randn(8,10)
a_is_weight = np.random.randint(0,2,size=(8,1))
model.fit( [a, a_true, a_is_weight] )
Finally, you can make a testing model (which share all weights in model) for easier use, i.e.
test_model = Model( inputs=x, outputs=y_pred, name='test_only' )
a_pred = test_model.predict( a )

Related

How to apply Triplet Loss for a ResNet50 based Siamese Network in Keras or Tf 2

I have a ResNet based siamese network which uses the idea that you try to minimize the l-2 distance between 2 images and then apply a sigmoid so that it gives you {0:'same',1:'different'} output and based on how far the prediction is, you just flow the gradients back to network but there is a problem that updation of gradients is too little as we're changing the distance between {0,1} so I thought of using the same architecture but based on Triplet Loss.
I1 = Input(shape=image_shape)
I2 = Input(shape=image_shape)
res_m_1 = ResNet50(include_top=False, weights='imagenet', input_tensor=I1, pooling='avg')
res_m_2 = ResNet50(include_top=False, weights='imagenet', input_tensor=I2, pooling='avg')
x1 = res_m_1.output
x2 = res_m_2.output
# x = Flatten()(x) or use this one if not using any pooling layer
distance = Lambda( lambda tensors : K.abs( tensors[0] - tensors[1] )) ([x1,x2] )
final_output = Dense(1,activation='sigmoid')(distance)
siamese_model = Model(inputs=[I1,I2], outputs=final_output)
siamese_model.compile(loss='binary_crossentropy',optimizer=Adam(),metrics['acc'])
siamese_model.fit_generator(train_gen,steps_per_epoch=1000,epochs=10,validation_data=validation_data)
So how can I change it to use the Triplet Loss function? What adjustments should be done here in order to get this done? One change will be that I'll have to calculate
res_m_3 = ResNet50(include_top=False, weights='imagenet', input_tensor=I2, pooling='avg')
x3 = res_m_3.output
One thing found in tf docs is triplet-semi-hard-loss and is given as:
tfa.losses.TripletSemiHardLoss()
As shown in the paper, the best results are from triplets known as "Semi-Hard". These are defined as triplets where the negative is farther from the anchor than the positive, but still produces a positive loss. To efficiently find these triplets we utilize online learning and only train from the Semi-Hard examples in each batch.
Another implementation of Triplet Loss which I found on Kaggle is: Triplet Loss Keras
Which one should I use and most importantly, HOW?
P.S: People also use something like: x = Lambda(lambda x: K.l2_normalize(x,axis=1))(x) after model.output. Why is that? What is this doing?
Following this answer of mine, and with role of TripletSemiHardLoss in mind, we could do following:
import tensorflow as tf
import tensorflow_addons as tfa
import tensorflow_datasets as tfds
from tensorflow.keras import models, layers
BATCH_SIZE = 32
LATENT_DEM = 128
def _normalize_img(img, label):
img = tf.cast(img, tf.float32) / 255.
return (img, label)
train_dataset, test_dataset = tfds.load(name="mnist", split=['train', 'test'], as_supervised=True)
# Build your input pipelines
train_dataset = train_dataset.shuffle(1024).batch(BATCH_SIZE)
train_dataset = train_dataset.map(_normalize_img)
test_dataset = test_dataset.batch(BATCH_SIZE)
test_dataset = test_dataset.map(_normalize_img)
inputs = layers.Input(shape=(28, 28, 1))
resNet50 = tf.keras.applications.ResNet50(include_top=False, weights=None, input_tensor=inputs, pooling='avg')
outputs = layers.Dense(LATENT_DEM, activation=None)(resNet50.output) # No activation on final dense layer
outputs = layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=1))(outputs) # L2 normalize embedding
siamese_model = models.Model(inputs=inputs, outputs=outputs)
# Compile the model
siamese_model.compile(
optimizer=tf.keras.optimizers.Adam(0.001),
loss=tfa.losses.TripletSemiHardLoss())
# Train the network
history = siamese_model.fit(
train_dataset,
epochs=3)

Loss function with derivative in TensorFlow 2

I am using TF2 (2.3.0) NN to approximate the function y which solves the ODE: y'+3y=0
I have defined cutsom loss class and function in which I am trying to differentiate the single output with respect to the single input so the equation holds, provided that y_true is zero:
from tensorflow.keras.losses import Loss
import tensorflow as tf
class CustomLossOde(Loss):
def __init__(self, x, model, name='ode_loss'):
super().__init__(name=name)
self.x = x
self.model = model
def call(self, y_true, y_pred):
with tf.GradientTape() as tape:
tape.watch(self.x)
y_p = self.model(self.x)
dy_dx = tape.gradient(y_p, self.x)
loss = tf.math.reduce_mean(tf.square(dy_dx + 3 * y_pred - y_true))
return loss
but running the following NN:
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras import Input
from custom_loss_ode import CustomLossOde
num_samples = 1024
x_train = 4 * (tf.random.uniform((num_samples, )) - 0.5)
y_train = tf.zeros((num_samples, ))
inputs = Input(shape=(1,))
x = Dense(16, 'tanh')(inputs)
x = Dense(8, 'tanh')(x)
x = Dense(4)(x)
y = Dense(1)(x)
model = Model(inputs=inputs, outputs=y)
loss = CustomLossOde(model.input, model)
model.compile(optimizer=Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.99),loss=loss)
model.run_eagerly = True
model.fit(x_train, y_train, batch_size=16, epochs=30)
for now I am getting 0 loss from the fisrt epoch, which doesn't make any sense.
I have printed both y_true and y_test from within the function and they seem OK so I suspect that the problem is in the gradien which I didn't succeed to print.
Apprecitate any help
Defining a custom loss with the high level Keras API is a bit difficult in that case. I would instead write the training loop from scracth, as it allows a finer grained control over what you can do.
I took inspiration from those two guides :
Advanced Automatic Differentiation
Writing a training loop from scratch
Basically, I used the fact that multiple tape can interact seamlessly. I use one to compute the loss function, the other to calculate the gradients to be propagated by the optimizer.
import tensorflow as tf
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras import Input
num_samples = 1024
x_train = 4 * (tf.random.uniform((num_samples, )) - 0.5)
y_train = tf.zeros((num_samples, ))
inputs = Input(shape=(1,))
x = Dense(16, 'tanh')(inputs)
x = Dense(8, 'tanh')(x)
x = Dense(4)(x)
y = Dense(1)(x)
model = Model(inputs=inputs, outputs=y)
# using the high level tf.data API for data handling
x_train = tf.reshape(x_train,(-1,1))
dataset = tf.data.Dataset.from_tensor_slices((x_train,y_train)).batch(1)
opt = Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.99)
for step, (x,y_true) in enumerate(dataset):
# we need to convert x to a variable if we want the tape to be
# able to compute the gradient according to x
x_variable = tf.Variable(x)
with tf.GradientTape() as model_tape:
with tf.GradientTape() as loss_tape:
loss_tape.watch(x_variable)
y_pred = model(x_variable)
dy_dx = loss_tape.gradient(y_pred, x_variable)
loss = tf.math.reduce_mean(tf.square(dy_dx + 3 * y_pred - y_true))
grad = model_tape.gradient(loss, model.trainable_variables)
opt.apply_gradients(zip(grad, model.trainable_variables))
if step%20==0:
print(f"Step {step}: loss={loss.numpy()}")

How to solve "KeyError: '/conv2d_1/kernel:0'"

I am trying to use the colab to run the gym package with pacman, since the spec in colab is more powerful than my notebook. This program is successful simulate in Jupyter in my notebook, which using tensorflow 1.14. However, the errors keep appears when I put in google colab to simulate, and I also debug and change part of the code, so that the code can be used in tensor flow 2.0. Below is my code
#First we import all the necessary libraries
import numpy as np
import gym
import tensorflow as tf
from tensorflow import keras
from keras.layers import Flatten, Conv2D, Dense
#from tensorflow.contrib.layers import Flatten, conv2d, Dense
from collections import deque, Counter
import random
from datetime import datetime
#Now we define a function called preprocess_observation for preprocessing our input game screen.
#We reduce the image size and convert the image into greyscale.
color = np.array([210, 164, 74]).mean()
def preprocess_observation(obs):
# Crop and resize the image
img = obs[1:176:2, ::2]
# Convert the image to greyscale
img = img.mean(axis=2)
# Improve image contrast
img[img==color] = 0
# Next we normalize the image from -1 to +1
img = (img - 128) / 128-1
return img.reshape(88,80,1)
#Let us initialize our gym environment
env = gym.make('MsPacman-v0')
n_outputs = env.action_space.n
print(n_outputs)
print(env.env.get_action_meanings())
observation = env.reset()
import tensorflow as tf
import matplotlib.pyplot as plt
for i in range(22):
if i > 20:
plt.imshow(observation)
plt.show()
observation, _, _, _ = env.step(1)
#Okay, Now we define a function called q_network for building our Q network. We input the game state to the Q network
#and get the Q values for all the actions in that state.
#We build Q network with three convolutional layers with same padding followed by a fully connected layer.
tf.compat.v1.reset_default_graph()
def q_network(X, name_scope):
# Initialize layers
initializer = tf.compat.v1.keras.initializers.VarianceScaling(scale=2.0)
with tf.compat.v1.variable_scope(name_scope) as scope:
# initialize the convolutional layers
#layer_1 = tf.keras.layers.Conv2D(X, 32, kernel_size=(8,8), stride=4, padding='SAME', weights_initializer=initializer)
layer_1_set = Conv2D(32, (8,8), strides=4, padding="SAME", kernel_initializer=initializer)
layer_1= layer_1_set(X)
tf.compat.v1.summary.histogram('layer_1',layer_1)
#layer_2 = tf.keras.layers.Conv2D(layer_1, 64, kernel_size=(4,4), stride=2, padding='SAME', weights_initializer=initializer)
layer_2_set = Conv2D(64, (4,4), strides=2, padding="SAME", kernel_initializer=initializer)
layer_2= layer_2_set(layer_1)
tf.compat.v1.summary.histogram('layer_2',layer_2)
#layer_3 = tf.keras.layers.Conv2D(layer_2, 64, kernel_size=(3,3), stride=1, padding='SAME', weights_initializer=initializer)
layer_3_set = Conv2D(64, (3,3), strides=1, padding="SAME", kernel_initializer=initializer)
layer_3= layer_3_set(layer_2)
tf.compat.v1.summary.histogram('layer_3',layer_3)
flatten_layer = Flatten() # instantiate the layer
flat = flatten_layer(layer_3)
fc_set = Dense(128, kernel_initializer=initializer)
fc=fc_set(flat)
tf.compat.v1.summary.histogram('fc',fc)
#Add final output layer
output_set = Dense(n_outputs, activation= None, kernel_initializer=initializer)
output= output_set(fc)
tf.compat.v1.summary.histogram('output',output)
vars = {v.name[len(scope.name):]: v for v in tf.compat.v1.get_collection(key=tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope=scope.name)}
#Return both variables and outputs together
return vars, output
#Next we define a function called epsilon_greedy for performing epsilon greedy policy. In epsilon greedy policy we either select the best action
#with probability 1 - epsilon or a random action with probability epsilon.
#We use decaying epsilon greedy policy where value of epsilon will be decaying over time
#as we don't want to explore forever. So over time our policy will be exploiting only good actions.
epsilon = 0.5
eps_min = 0.05
eps_max = 1.0
eps_decay_steps = 500000
def epsilon_greedy(action, step):
p = np.random.random(1).squeeze()
epsilon = max(eps_min, eps_max - (eps_max-eps_min) * step/eps_decay_steps)
if np.random.rand() < epsilon:
return np.random.randint(n_outputs)
else:
return action
#Now, we initialize our experience replay buffer of length 20000 which holds the experience.
#We store all the agent's experience i.e (state, action, rewards) in the
#experience replay buffer and we sample from this minibatch of experience for training the network.
buffer_len = 20000
exp_buffer = deque(maxlen=buffer_len)
# Now we define our network hyperparameters,
num_episodes = 800
batch_size = 48
input_shape = (None, 88, 80, 1)
learning_rate = 0.001
X_shape = (None, 88, 80, 1)
discount_factor = 0.97
global_step = 0
copy_steps = 100
steps_train = 4
start_steps = 2000
logdir = 'logs'
tf.compat.v1.reset_default_graph()
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
# Now we define the placeholder for our input i.e game state
X = tf.placeholder(tf.float32, shape=X_shape)
#X = tf.Variable(tf.float32, tf.ones(shape=X_shape))
# we define a boolean called in_training_model to toggle the training
in_training_mode = tf.placeholder(tf.bool)
# we build our Q network, which takes the input X and generates Q values for all the actions in the state
mainQ, mainQ_outputs = q_network(X, 'mainQ')
# similarly we build our target Q network, for policy evaluation
targetQ, targetQ_outputs = q_network(X, 'targetQ')
# define the placeholder for our action values
X_action = tf.placeholder(tf.int32, shape=(None,))
Q_action = tf.reduce_sum(targetQ_outputs * tf.one_hot(X_action, n_outputs), axis=-1, keepdims=True)
#Copy the primary Q network parameters to the target Q network
copy_op = [tf.compat.v1.assign(main_name, targetQ[var_name]) for var_name, main_name in mainQ.items()]
copy_target_to_main = tf.group(*copy_op)
#Compute and optimize loss using gradient descent optimizer
# define a placeholder for our output i.e action
y = tf.placeholder(tf.float32, shape=(None,1))
# now we calculate the loss which is the difference between actual value and predicted value
loss = tf.reduce_mean(tf.square(y - Q_action))
# we use adam optimizer for minimizing the loss
optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(loss)
init = tf.global_variables_initializer()
loss_summary = tf.summary.scalar('LOSS', loss)
merge_summary = tf.summary.merge_all()
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())
Ok up to here, the error come out when i run this cell in colab :
#Copy the primary Q network parameters to the target Q network
copy_op = [tf.compat.v1.assign(main_name, targetQ[var_name]) for var_name, main_name in mainQ.items()]
copy_target_to_main = tf.group(*copy_op)
The error gives:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-13-58715282cea8> in <module>()
----> 1 copy_op = [tf.compat.v1.assign(main_name, targetQ[var_name]) for var_name, main_name in mainQ.items()]
2 copy_target_to_main = tf.group(*copy_op)
<ipython-input-13-58715282cea8> in <listcomp>(.0)
----> 1 copy_op = [tf.compat.v1.assign(main_name, targetQ[var_name]) for var_name, main_name in mainQ.items()]
2 copy_target_to_main = tf.group(*copy_op)
KeyError: '/conv2d_1/kernel:0'
I have two question?
First, how to solve the question that already stated above.
Second, in tensor-flow 2.0 above,the placeholder command is replaced by tf.Variable, i rewrite the code:
X = tf.placeholder(tf.float32, shape=X_shape) to become
X = tf.Variable(tf.float32, tf.ones(shape=X_shape))
and still get error, and i have to use command below:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
X = tf.placeholder(tf.float32, shape=X_shape)
but get warning like this:
WARNING:tensorflow:From /usr/local/lib/python3.6/dist- packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating: non-resource variables are not supported in the long term
I doing intensive searching in the Stack overflow website by using keyword, yet i can't find solution. Really looking forward to any advise. Thank you very much.

Bayesian Model does not learn with tensorflow probability and keras

I want to estimate epistemic uncertainty of my model. So I converted all layers into tensorflow probability layers. The model gives no errors back, but it also not learning anything. The model has two outputs and the losses of both outputs do not change at all. On the other hand, the overall loss of the model is shrinking, but seems not be related to the other losses at all, which I cant explain.
import numpy as np
from tensorflow import keras
import tensorflow_probability as tfp
import tensorflow as tf
from plot.plot_utils import plot_model_metrics
from Custom_Keras_layers.ProbSqueezeExcite import squeeze_excite_block
inp = keras.layers.Input(shape=[self.timesteps, self.features])
# left side
# 1 Conv1D block
l = tfp.layers.Convolution1DFlipout(filters=2*self.features, kernel_size=2, padding='same', activation=tf.nn.relu)(inp)
l = keras.layers.BatchNormalization()(l)
if squeeze_excite == 1:
l = squeeze_excite_block(l)
l = keras.layers.Dropout(dropout_rate)(l, training=True)
# 1 Conv1D block
l = tfp.layers.Convolution1DFlipout(filters=4 * self.features, kernel_size=4, padding='same', activation=tf.nn.relu)(l)
l = keras.layers.BatchNormalization()(l)
if squeeze_excite == 1:
l = squeeze_excite_block(l)
l = keras.layers.Dropout(dropout_rate)(l, training=True)
# 1 lstm bock
l = keras.layers.LSTM(32, recurrent_dropout=dropout_rate, dropout=dropout_rate)(l, training=True)
# letf output layer
l = tfp.layers.DenseFlipout(self.classes, activation=tf.nn.softmax, name='left')(l)
# right side
# 1 Conv1D block
r = tfp.layers.Convolution1DFlipout(filters=2 * self.features, kernel_size=2, padding='same', activation=tf.nn.relu)(inp)
r = keras.layers.BatchNormalization()(r)
if squeeze_excite == 1:
r = squeeze_excite_block(r)
r = keras.layers.Dropout(dropout_rate)(r, training=True)
# 1 Conv1D block
r = tfp.layers.Convolution1DFlipout(filters=4 * self.features, kernel_size=4, padding='same', activation=tf.nn.relu)(r)
r = keras.layers.BatchNormalization()(r)
if squeeze_excite == 1:
r = squeeze_excite_block(r)
r = keras.layers.Dropout(dropout_rate)(r, training=True)
# 1 lstm bock
r = keras.layers.LSTM(32, recurrent_dropout=dropout_rate, dropout=dropout_rate)(r, training=True)
# letf output layer
r = tfp.layers.DenseFlipout(self.classes, activation=tf.nn.softmax, name='right')(r)
model = keras.models.Model(inputs=inp, outputs=[l, r])
optimizer = tf.train.AdamOptimizer(learning_rate=lr)
losses = {
"left": self._neg_log_likelihood_bayesian,
"right": self._neg_log_likelihood_bayesian}
model.compile(optimizer=optimizer, loss=losses, metrics=['accuracy'])
self.model = model
and the loss function is defined as follows:
def _neg_log_likelihood_bayesian(self, y_true, y_pred):
labels_distribution = tfp.distributions.Categorical(logits=y_pred)
neg_log_likelihood = -tf.reduce_mean(labels_distribution.log_prob(tf.argmax(y_true, axis=-1)))
kl = sum(self.model.losses) / self.trainNUM
loss = neg_log_likelihood + kl
return loss
Any help would be appreciated. The overall loss starts at 45000, whereas the two outputs losses are at around 1,3. It is very strange to me.
Thanks to this post on the issues forum of github tensorflow, I found out how to solve it https://github.com/tensorflow/probability/issues/282
You have to scale the KL sum within each tfp layer:
kernel_divergence_fn=lambda q, p, _: tfp.distributions.kl_divergence(q, p) / tf.to_float(train.num_examples))
Furthermore I changed the loss function to :
neg_log_likelihood =
tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_true,
logits=y_pred)
That did it for me, now my model is properly training.
The overall loss might already include the prior loss kl(sampled weights||prior), so could be doubke counted? (I'm not sure how Keras handles this.) Another thought is to try using reduce_sum instead of reduce_mean.

Translating Pytorch program into Keras: different results

I have translated a pytorch program into keras.
A working Pytorch program:
import numpy as np
import cv2
import torch
import torch.nn as nn
from skimage import segmentation
np.random.seed(1)
torch.manual_seed(1)
fi = "in.jpg"
class MyNet(nn.Module):
def __init__(self, n_inChannel, n_outChannel):
super(MyNet, self).__init__()
self.seq = nn.Sequential(
nn.Conv2d(n_inChannel, n_outChannel, kernel_size=3, stride=1, padding=1),
nn.ReLU(inplace=True),
nn.BatchNorm2d(n_outChannel),
nn.Conv2d(n_outChannel, n_outChannel, kernel_size=3, stride=1, padding=1),
nn.ReLU(inplace=True),
nn.BatchNorm2d(n_outChannel),
nn.Conv2d(n_outChannel, n_outChannel, kernel_size=1, stride=1, padding=0),
nn.BatchNorm2d(n_outChannel)
)
def forward(self, x):
return self.seq(x)
im = cv2.imread(fi)
data = torch.from_numpy(np.array([im.transpose((2, 0, 1)).astype('float32')/255.]))
data = data.cuda()
labels = segmentation.slic(im, compactness=100, n_segments=10000)
labels = labels.flatten()
u_labels = np.unique(labels)
label_indexes = np.array([np.where(labels == u_label)[0] for u_label in u_labels])
n_inChannel = 3
n_outChannel = 100
model = MyNet(n_inChannel, n_outChannel)
model.cuda()
model.train()
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
label_colours = np.random.randint(255,size=(100,3))
for batch_idx in range(100):
optimizer.zero_grad()
output = model( data )[ 0 ]
output = output.permute( 1, 2, 0 ).view(-1, n_outChannel)
ignore, target = torch.max( output, 1 )
im_target = target.data.cpu().numpy()
nLabels = len(np.unique(im_target))
im_target_rgb = np.array([label_colours[ c % 100 ] for c in im_target]) # correct position of "im_target"
im_target_rgb = im_target_rgb.reshape( im.shape ).astype( np.uint8 )
for inds in label_indexes:
u_labels_, hist = np.unique(im_target[inds], return_counts=True)
im_target[inds] = u_labels_[np.argmax(hist, 0)]
target = torch.from_numpy(im_target)
target = target.cuda()
loss = loss_fn(output, target)
loss.backward()
optimizer.step()
print (batch_idx, '/', 100, ':', nLabels, loss.item())
if nLabels <= 3:
break
fo = "out.jpg"
cv2.imwrite(fo, im_target_rgb)
(source: https://github.com/kanezaki/pytorch-unsupervised-segmentation/blob/master/demo.py)
My translation into Keras:
import cv2
import numpy as np
from skimage import segmentation
from keras.layers import Conv2D, BatchNormalization, Input, Reshape
from keras.models import Model
import keras.backend as k
from keras.optimizers import SGD, Adam
from skimage.util import img_as_float
from skimage import io
from keras.models import Sequential
np.random.seed(0)
fi = "in.jpg"
im = cv2.imread(fi).astype(float)/255.
labels = segmentation.slic(im, compactness=100, n_segments=10000)
labels = labels.flatten()
print (labels.shape)
u_labels = np.unique(labels)
label_indexes = [np.where(labels == u_label)[0] for u_label in np.unique(labels)]
n_channels = 100
model = Sequential()
model.add ( Conv2D(n_channels, kernel_size=3, activation='relu', input_shape=im.shape, padding='same'))
model.add( BatchNormalization())
model.add( Conv2D(n_channels, kernel_size=3, activation='relu', padding='same'))
model.add( BatchNormalization())
model.add( Conv2D(n_channels, kernel_size=1, padding='same'))
model.add( BatchNormalization())
model.add( Reshape((im.shape[0] * im.shape[1], n_channels)))
img = np.expand_dims(im,0)
print (img.shape)
output = model.predict(img)
print (output.shape)
im_target = np.argmax(output[0], 1)
print (im_target.shape)
for inds in label_indexes:
u_labels_, hist = np.unique(im_target[inds], return_counts=True)
im_target[inds] = u_labels_[np.argmax(hist, 0)]
def custom_loss(loss_target, loss_output):
return k.categorical_crossentropy(target=k.stack(loss_target), output=k.stack(loss_output), from_logits=True)
model.compile(optimizer=SGD(lr=0.1, momentum=0.9), loss=custom_loss)
model.fit(img, output, epochs=100, batch_size=1, verbose=1)
pred_result = model.predict(x=[img])[0]
print (pred_result.shape)
target = np.argmax(pred_result, 1)
print (target.shape)
nLabels = len(np.unique(target))
label_colours = np.random.randint(255, size=(100, 3))
im_target_rgb = np.array([label_colours[c % 100] for c in im_target])
im_target_rgb = im_target_rgb.reshape(im.shape).astype(np.uint8)
cv2.imwrite("out.jpg", im_target_rgb)
However, Keras output is really different than of pytorch
Input image:
Pytorch result:
Keras result:
Could someone help me for this translation?
Edit 1:
I corrected two errors as advised by #sebrockm
1. removed `relu` from last conv layer
2. added `from_logits = True` in the loss function
Also, changed the no. of conv layers from 4 to 3 to match with the original code.
However, output image did not improve than before and the `loss` are resulted in negative:
Epoch 99/100
1/1 [==============================] - 0s 92ms/step - loss: -22.8380
Epoch 100/100
1/1 [==============================] - 0s 99ms/step - loss: -23.039
I think that the Keras code lacks connection between model and output. However, could not figure out to make this connection.
Two major mistakes that I see (likely related):
The last convolutional layer in the original model does not have an activation function, while your translation uses relu.
The original model uses CrossEntropyLoss as loss function, while your model uses categorical_crossentropy with logits=False (a default argument). Without mathematical background the difference is tricky to explain, but in short: CrossEntropyLoss has a softmax built in, that's why the model doesn't have one on the last layer. To do the same in keras, use k.categorical_crossentropy(..., logits=True). "logits" means the input values are expected not to be "softmaxed", i.e. all values can be arbitrary. Currently, your loss function expects the output values to be "softmaxed", i.e. all values must be between 0 and 1 (and sum up to 1).
Update:
One other mistake, likely a huge one: In Keras, you calculate the output once in the beginning and never change it from there on. Then you train your model to fit on this initially generated output.
In the original pytorch code, target (which is the variable being trained on) gets updated in every training loop.
So, you cannot use Keras' fit method which is designed for doing the entire training for you (given fixed training data). You will have to replicate the training loop manually, just as it is done in the pytorch code. I'm not sure if this is easily doable with the API Keras provides. train_on_batch is one method you surely will need in your manual loop. You will have to do some more work, I'm afraid...