Related
I have created custom loss (Weighted Absolute error) in keras but implementation doesn't work - I get an error ValueError: No gradients provided for any variable: ['my_model/conv2d/kernel:0', 'my_model/conv2d/bias:0'].
I want to apply different weight for each pixel.
class WeightedMeanAbsoluteError(tf.keras.metrics.Metric):
def __init__(self, name='weighted_mean_absolute_error'):
super(WeightedMeanAbsoluteError, self).__init__(name=name)
self.wmae = self.add_weight(name='wmae', initializer='zeros')
def update_state(self, y_true, y_pred, loss_weights):
values = tf.math.abs(y_true - y_pred) * loss_weights
return self.wmae.assign_add(tf.reduce_sum(values))
def result(self):
return self.wmae
def reset_states(self):
# The state of the metric will be reset at the start of each epoch.
self.wmae.assign(0.)
loss_object = WeightedMeanAbsoluteError()
train_loss = WeightedMeanAbsoluteError()
I use the following code to implement a training step:
#tf.function
def train_step(input_images, output_images):
with tf.GradientTape() as tape:
# training=True is only needed if there are layers with different
# behavior during training versus inference (e.g. Dropout).
result_images = model(input_images, training=True)
loss = loss_object(output_images, result_images)
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
Also my code works just fine if I use
loss_object = tf.keras.losses.MeanAbsoluteError()
train_loss = tf.keras.metrics.MeanAbsoluteError()
The best and simple way to minimize a weighted standard loss (such mae) is using the sample_weights parameter in fit method where we pass an array with the desired weight of each sample
X = np.random.uniform(0,1, (1000,50))
y = np.random.uniform(0,1, 1000)
W = np.random.randint(1,10, 1000)
inp = Input((50))
x = Dense(64, activation='relu')(inp)
out = Dense(10)(x)
model = Model(inp, out)
model.compile('adam','mae')
model.fit(X,y, epochs=100, sample_weights=W)
I am using Tensorflow 2.1 to create custom models and custom training loops. My aim is to compare the accuracy of different configurations of my neural network. Specifically, in this case, I am comparing the reconstruction error of an AutoEncoder with varying latent dimension. Hence, I am training my network for one latent dimension then computing the test error and then I redo this process for another latent dimension, and so on. With this process I want to create plots like this:
Plot example:
To speed up the training I want to use the #tf.function decorator for the BackPropagation part of my training loop. However, when I try to train several different networks, looping over the latent dimension I get an error. See below:
ValueError: in converted code:
<ipython-input-19-78bafad21717>:41 grad *
loss_value = tf.losses.mean_squared_error(inputs, model(inputs))
/tensorflow-2.1.0/python3.6/tensorflow_core/python/keras/engine/base_layer.py:778 __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
<ipython-input-19-78bafad21717>:33 call *
x_enc = self.encoder(inp)
/tensorflow-2.1.0/python3.6/tensorflow_core/python/keras/engine/base_layer.py:778 __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
<ipython-input-19-78bafad21717>:9 call *
x = self.dense1(inp)
/tensorflow-2.1.0/python3.6/tensorflow_core/python/keras/engine/base_layer.py:748 __call__
self._maybe_build(inputs)
/tensorflow-2.1.0/python3.6/tensorflow_core/python/keras/engine/base_layer.py:2116 _maybe_build
self.build(input_shapes)
/tensorflow-2.1.0/python3.6/tensorflow_core/python/keras/layers/core.py:1113 build
trainable=True)
/tensorflow-2.1.0/python3.6/tensorflow_core/python/keras/engine/base_layer.py:446 add_weight
caching_device=caching_device)
/tensorflow-2.1.0/python3.6/tensorflow_core/python/training/tracking/base.py:744 _add_variable_with_custom_getter
**kwargs_for_getter)
/tensorflow-2.1.0/python3.6/tensorflow_core/python/keras/engine/base_layer_utils.py:142 make_variable
shape=variable_shape if variable_shape else None)
/tensorflow-2.1.0/python3.6/tensorflow_core/python/ops/variables.py:258 __call__
return cls._variable_v1_call(*args, **kwargs)
/tensorflow-2.1.0/python3.6/tensorflow_core/python/ops/variables.py:219 _variable_v1_call
shape=shape)
/tensorflow-2.1.0/python3.6/tensorflow_core/python/ops/variables.py:65 getter
return captured_getter(captured_previous, **kwargs)
/tensorflow-2.1.0/python3.6/tensorflow_core/python/eager/def_function.py:502 invalid_creator_scope
"tf.function-decorated function tried to create "
ValueError: tf.function-decorated function tried to create variables on non-first call.
I do not get this error when I remove #tf.function decorator. I believe if it has something to do with Tensorflow creating a computational graph when I use the decorator and this graph remains when I create another instance of my network. Thus, sparking an error since the old graph does not match the new instance of the network. But I am not sure about this at all, since I believe I am missing something fundamental about Tensorflow here!
Below is a very simply version of my code recreating the error. I have tried to remove all the unnecessary parts of the code to make it easier to read and debug. Furthermore, I am generating a very simply training and test set just for the sake of this question.
I have already tried the tf.keras.backend.clear_session() function without any luck.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Encoder
class build_encoder(tf.keras.Model):
def __init__(self,latent_dim):
super(build_encoder, self).__init__()
self.dense1 = tf.keras.layers.Dense(32, activation='relu',use_bias=True)
self.dense2 = tf.keras.layers.Dense(latent_dim, activation='relu',use_bias=True)
def call(self, inp):
x = self.dense1(inp)
x = self.dense2(x)
return x
# Decoder
class build_decoder(tf.keras.Model):
def __init__(self,):
super(build_decoder, self).__init__()
self.dense1 = tf.keras.layers.Dense(32, activation='relu',use_bias=True)
self.dense2 = tf.keras.layers.Dense(10, activation='relu',use_bias=True)
def call(self, inp):
x = self.dense1(inp)
x = self.dense2(x)
return x
# Full Autoencoder
class Autoencoder(tf.keras.Model):
def __init__(self,latent_dim=5):
super(Autoencoder, self).__init__()
self.encoder = build_encoder(latent_dim)
self.decoder = build_decoder()
def call(self, inp):
x_enc = self.encoder(inp)
x_dec = self.decoder(x_enc)
return x_dec
#### Here is the backpropagation with #tf.function decorator ####
#tf.function
def grad(model, inputs):
with tf.GradientTape() as tape:
loss_value = tf.losses.mean_squared_error(inputs, model(inputs))
return loss_value, tape.gradient(loss_value, model.trainable_variables)
# Training loop function
def train(x_train, model, num_epochs, batch_size,optimizer):
train_loss = []
for epoch in range(num_epochs):
tf.random.shuffle(x_train)
for i in range(0, len(x_train), batch_size):
x_inp = x_train[i: i + batch_size]
loss_value, grads = grad(model, x_inp)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
train_loss.append(tf.reduce_mean(tf.losses.mean_squared_error(x_train, model(x_train))).numpy())
if epoch % 100 == 0:
print("Epoch: {}, Train loss: {:.9f}".format(epoch, train_loss[epoch]))
return train_loss
#### Generating simple training and test data
num_train = 10000
num_test = 1000
x_train = s = np.random.uniform(0,1,(num_train,10)).astype(np.float32)
x_train[:,6:10] = 0
x_test = s = np.random.uniform(0,1,(num_test,10)).astype(np.float32)
x_test[:,6:10] = 0
###
batch_size = 8
num_epochs = 10000
test_loss = []
# Looping over the latent dimensions
for latent_dim in range(1,10):
model = Autoencoder(latent_dim=3) # Creating an instance of my Autoencoder
optimizer = tf.keras.optimizers.Adam(learning_rate=0.00005) # Defining an optimizer
train_loss = train(x_train, model=model, num_epochs=num_epochs, batch_size=batch_size, optimizer=optimizer) # Training the network
test_loss.append(tf.reduce_mean(tf.losses.mean_squared_error(x_test, model(x_test))).numpy())
plt.figure()
plt.plot(test_loss,linewidth=1.5)
plt.grid(True)
plt.show()
There's an error in the code snippet you provided.
I changed last Dense layer unit from 6 to 10.
# Decoder
class build_decoder(tf.keras.Model):
def __init__(self,):
super(build_decoder, self).__init__()
self.dense1 = tf.keras.layers.Dense(32, activation='relu',use_bias=True)
self.dense2 = tf.keras.layers.Dense(10, activation='relu',use_bias=True)
def call(self, inp):
x = self.dense1(inp)
x = self.dense2(x)
return x
As for your question on training multiple model.
The error message "ValueError: tf.function-decorated function tried to create variables on non-first call" means that the function decorated by #tf.function is creating a new variable on its next iteration, this is not allowed as this function is turned into a graph.
I have modified your back propagation method, I commented out your original code to observe the difference.
#### Here is the backpropagation with #tf.function decorator ####
# #tf.function
# def grad(model, inputs):
# with tf.GradientTape() as tape:
# loss_value = tf.losses.mean_squared_error(inputs, model(inputs))
# return loss_value, tape.gradient(loss_value, model.trainable_variables)
#tf.function
def MSE(y_true, y_pred):
return tf.keras.losses.MSE(y_true, y_pred)
def backprop(inputs, model):
with tf.GradientTape() as tape:
loss_value = MSE(inputs, model(inputs))
return loss_value, tape.gradient(loss_value, model.trainable_variables)
def gradient_func(model, inputs):
return backprop(inputs, model)
The main culprit of your original code was the calling of model(inputs) as an input in the Loss Function, when you decorate #tf.function in a function it is inherited on all the functions inside, this means the Loss function is optimized.
Also a way to train multiple model without rewriting single variable, is to put them into array.
model_array = [0]
# Looping over the latent dimensions
for latent_dim in range(1,10):
model_array.append(Autoencoder(latent_dim))
# Creating an instance of my Autoencoder
optimizer = tf.keras.optimizers.Adam(learning_rate=0.00005) # Defining an optimizer
train_loss = train(x_train, model=model_array[latent_dim], num_epochs=num_epochs, batch_size=batch_size, optimizer=optimizer) # Training the network
test_loss.append(tf.reduce_mean(tf.losses.mean_squared_error(x_test, model_array[latent_dim](x_test))).numpy())
This will arrange model into array, easier to be accessed and debugged.
Here is the complete modified code.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Encoder
class build_encoder(tf.keras.Model):
def __init__(self,latent_dim):
super(build_encoder, self).__init__()
self.dense1 = tf.keras.layers.Dense(32, activation='relu',use_bias=True)
self.dense2 = tf.keras.layers.Dense(latent_dim, activation='relu',use_bias=True)
def call(self, inp):
x = self.dense1(inp)
x = self.dense2(x)
return x
# Decoder
class build_decoder(tf.keras.Model):
def __init__(self,):
super(build_decoder, self).__init__()
self.dense1 = tf.keras.layers.Dense(32, activation='relu',use_bias=True)
self.dense2 = tf.keras.layers.Dense(10, activation='relu',use_bias=True)
def call(self, inp):
x = self.dense1(inp)
x = self.dense2(x)
return x
# Full Autoencoder
class Autoencoder(tf.keras.Model):
def __init__(self,latent_dim=5):
super(Autoencoder, self).__init__()
self.encoder = build_encoder(latent_dim)
self.decoder = build_decoder()
def call(self, inp):
x_enc = self.encoder(inp)
x_dec = self.decoder(x_enc)
return x_dec
#### Here is the backpropagation with #tf.function decorator ####
# #tf.function
# def grad(model, inputs):
# with tf.GradientTape() as tape:
# loss_value = tf.losses.mean_squared_error(inputs, model(inputs))
# return loss_value, tape.gradient(loss_value, model.trainable_variables)
#tf.function
def MSE(y_true, y_pred):
return tf.keras.losses.MSE(y_true, y_pred)
def backprop(inputs, model):
with tf.GradientTape() as tape:
loss_value = MSE(inputs, model(inputs))
return loss_value, tape.gradient(loss_value, model.trainable_variables)
def gradient_func(model, inputs):
return backprop(inputs, model)
# Training loop function
def train(x_train, model, num_epochs, batch_size,optimizer):
train_loss = []
for epoch in range(num_epochs):
tf.random.shuffle(x_train)
for i in range(0, len(x_train), batch_size):
x_inp = x_train[i: i + batch_size]
loss_value, grads = gradient_func(model, x_inp)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
train_loss.append(tf.reduce_mean(tf.losses.mean_squared_error(x_train, model(x_train))).numpy())
if epoch % 100 == 0:
print("Epoch: {}, Train loss: {:.9f}".format(epoch, train_loss[epoch]))
return train_loss
#### Generating simple training and test data
num_train = 10000
num_test = 1000
x_train = s = np.random.uniform(0,1,(num_train,10)).astype(np.float32)
x_train[:,6:10] = 0
x_test = s = np.random.uniform(0,1,(num_test,10)).astype(np.float32)
x_test[:,6:10] = 0
###
batch_size = 8
num_epochs = 10000
test_loss = []
model_array = [0]
# Looping over the latent dimensions
for latent_dim in range(1,10):
model_array.append(Autoencoder(latent_dim))
# Creating an instance of my Autoencoder
optimizer = tf.keras.optimizers.Adam(learning_rate=0.00005) # Defining an optimizer
train_loss = train(x_train, model=model_array[latent_dim], num_epochs=num_epochs, batch_size=batch_size, optimizer=optimizer) # Training the network
test_loss.append(tf.reduce_mean(tf.losses.mean_squared_error(x_test, model_array[latent_dim](x_test))).numpy())
plt.figure()
plt.plot(range(1,10),test_loss,linewidth=1.5)
plt.grid(True)
plt.show()
There is also a brief discussion about #tf.function and AutoGraphs in TF Documentation in this link.
Feel free to ask questions and hope this helps you.
For example, this is trivial but is there a layer for this? Is not really a convolution ... there is one "Dense layer" (weights) per data point.
In [266]: X = np.random.randn(10, 3); W = np.random.randn(10, 3, 4); (X[:, :, None] * W).sum(axis=1).shape
Out[266]: (10, 4)
Create your own layer:
Warning: works only with fixed batch size, you need to define batch_shape or batch_input_shape in your models!!!!
class SampleDense(Layer):
def __init__(self, units, **kwargs):
self.units = units
super(SampleDense, self).__init__(**kwargs)
def build(self, input_shape):
weight_shape = input_shape + (self.units,)
self.kernel = self.add_weight(name='kernel',
shape=weight_shape,
initializer='uniform',
trainable=True)
self.built = True
def call(self, inputs):
inputs = K.expand_dims(inputs, axis=-1)
outputs = inputs * self.kernel
outputs = K.sum(outputs, axis=-2)
return outputs
def compute_output_shape(self, input_shape):
return input_shape[:-1] + (self.units,)
I have a model which is composed of custom layers. Each custom layer contains many tf.keras.layers. The problem is that if I want to freeze those layers after defining my model, the loop:
for i, layer in enumerate(model.layers):
print(i, layer.name)
only prints the "outer" custom layers and not those who exist inside. Is there any way to access the inner layers so I can freeze them?
an example of a custom layer from the official tf docs:
class MLPBlock(layers.Layer):
def __init__(self):
super(MLPBlock, self).__init__()
self.linear_1 = Linear(32)
self.linear_2 = Linear(32)
self.linear_3 = Linear(1)
def call(self, inputs):
x = self.linear_1(inputs)
x = tf.nn.relu(x)
x = self.linear_2(x)
x = tf.nn.relu(x)
return self.linear_3(x)
You can use keras callbacks. If you want to freeze your first layer after some certain amount of epochs, add this callback
class FreezeCallback(tf.keras.callbacks.Callback):
def __init__(self, n_epochs=10):
super().__init__()
self.n_epochs = n_epochs
def on_epoch_end(self, epoch, logs=None):
if epoch == self.n_epochs:
l = self.model.get_layer('first')
l.trainable = False
What you are doing in your update function is to replace the first Dense() layer with another Dense() layer, this time setting trainable = false.
While this works, I would update the 'update' function as following:
def updt(self):
self.dense1.trainable = False
Ok i came up with a solution.
An "update" function must be implemented inside the custom layer, which updates the inner layers so that they become non trainable.
Here is a sample code:
import tensorflow as tf
import numpy as np
layers = tf.keras.layers
seq_model = tf.keras.models.Sequential
class MDBlock(layers.Layer):
def __init__(self):
super(MDBlock, self).__init__()
self.dense1 = layers.Dense(784, name="first")
self.dense2 = layers.Dense(32, name="second")
self.dense3 = layers.Dense(32, name="third")
self.dense4 = layers.Dense(1, activation='sigmoid', name="outp")
def call(self, inputs):
x = self.dense1(inputs)
x = tf.nn.relu(x)
x = self.dense2(x)
x = tf.nn.relu(x)
x = self.dense3(x)
x = tf.nn.relu(x)
x = self.dense4(x)
return x
def updt(self):
self.dense1.trainable = False
def __str__(self):
return "\nd1:{0}\nd2:{1}\nd3:{2}\nd4:{3}".format(self.dense1.trainable, self.dense2.trainable,
self.dense3.trainable, self.dense4.trainable)
# define layer block
layer = MDBlock()
model = seq_model()
model.add(layers.Input(shape=(784,)))
model.add(layer)
# Use updt function to make layers non-trainable
for i, layer in enumerate(model.layers):
layer.updt()
model.compile(optimizer='rmsprop',
loss='binary_crossentropy',
metrics=['accuracy'])
# Generate dummy data
data = np.random.random((1000, 784))
labels = np.random.randint(2, size=(1000, 1))
# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)
# print block's layers state
for i, layer in enumerate(model.layers):
print(i, layer)
Are there any code examples for using Tensorflow's sampled_softmax_loss or nce_loss functions with multi-label problems? That is, where num_true is more than one?
What follows is my attempt to create a wrapper for nce_loss() and sampled_softmax_loss() based Jeff Chao's work (https://github.com/joelthchao/keras). In the following code, if you change num_true to 1, both samplers work. But with num_true > 1, both samplers throw slightly different exceptions involving tensor shape.
The main program is a simple autoencoder that replicates the class of problem I'm trying to solve: multi-label testing with a huge number of output classes, with a Zipfian distribution. Comments and stack trace at the end.
import tensorflow as tf
import numpy as np
import keras.layers as layers
from keras.models import Model
from keras import backend as K
from keras import initializers,regularizers,constraints
from keras.models import Model
from keras.layers import Dense
from keras.engine.base_layer import InputSpec
from keras.engine.topology import Layer
from keras.engine.input_layer import Input
from tensorflow.keras.optimizers import Nadam, Adam
np.random.seed(10)
import random
def nce_loss_function(weights, biases, labels, inputs, num_sampled, num_classes, num_true):
if K.learning_phase() == 1:
loss = tf.nn.nce_loss(weights, biases, labels, inputs, num_sampled, num_classes, num_true,
partition_strategy="div")
else:
logits = tf.matmul(inputs, tf.transpose(weights))
logits = tf.nn.bias_add(logits, biases)
labels_one_hot = tf.one_hot(labels, num_classes)
loss = tf.nn.sigmoid_cross_entropy_with_logits(
labels=labels_one_hot[:][0][:],
logits=logits)
loss = tf.reduce_sum(loss, axis=1)
return loss
def sampled_softmax_loss_function(weights, biases, labels, inputs, num_sampled, num_classes, num_true):
if K.learning_phase() == 1:
return tf.nn.sampled_softmax_loss(weights, biases, labels, inputs, num_sampled, num_classes, num_true,
partition_strategy="div")
else:
logits = tf.matmul(inputs, tf.transpose(weights))
logits = tf.nn.bias_add(logits, biases)
labels_one_hot = tf.one_hot(labels, num_classes)
loss = tf.nn.softmax_cross_entropy_with_logits_v2(
labels=labels_one_hot,
logits=logits)
return loss
class Sampling(Layer):
"""Regular densely-connected NN layer with various sampling Loss.
`Sampling` implements the operation:
`output = dot(input, kernel) + bias`
`kernel` is a weights matrix created by the layer, and `bias` is a bias vector
created by the layer. Also, it adds a sampling Loss to the model.
See [reference](http://proceedings.mlr.press/v9/gutmann10a/gutmann10a.pdf).
# Example
```python
inputs = Input(shape=(4,))
target = Input(shape=(1,)) # sparse format, e.g. [1, 3, 2, 6, ...]
net = Dense(8)(inputs)
net = Sampling(units=128, num_sampled=32)([net, target])
model = Model(inputs=[inputs, target], outputs=net)
model.compile(optimizer='adam', loss=None)
x = np.random.rand(1000, 4)
y = np.random.randint(128, size=1000)
model.fit([x, y], None)
```
# Arguments
units: Positive integer, dimensionality of the output space (num classes).
num_sampled: Positive integer, number of classes to sample in Sampling Loss.
type: 'sampled_softmax', 'nce'
num_true: Max # of positive classes, pad to this for variable inputs
kernel_initializer: Initializer for the `kernel` weights matrix
(see [initializers](../initializers.md)).
bias_initializer: Initializer for the bias vector
(see [initializers](../initializers.md)).
kernel_regularizer: Regularizer function applied to
the `kernel` weights matrix
(see [regularizer](../regularizers.md)).
bias_regularizer: Regularizer function applied to the bias vector
(see [regularizer](../regularizers.md)).
activity_regularizer: Regularizer function applied to
the output of the layer (its "activation").
(see [regularizer](../regularizers.md)).
kernel_constraint: Constraint function applied to
the `kernel` weights matrix
(see [constraints](../constraints.md)).
bias_constraint: Constraint function applied to the bias vector
(see [constraints](../constraints.md)).
# Input shape
Two tensors. First one is 2D tensor with shape: `(batch_size, input_dim)`.
Second one is 1D tensor with length `batch_size`
# Output shape
2D tensor with shape: `(batch_size, units)`.
For instance, for a 2D input with shape `(batch_size, input_dim)`,
the output would have shape `(batch_size, units)`.
"""
def __init__(self,
units,
num_sampled,
type='sampled_softmax',
num_true=1,
kernel_initializer='glorot_uniform',
bias_initializer='zeros',
kernel_regularizer=None,
bias_regularizer=None,
activity_regularizer=None,
kernel_constraint=None,
bias_constraint=None,
**kwargs):
if 'input_shape' not in kwargs and 'input_dim' in kwargs:
kwargs['input_shape'] = (kwargs.pop('input_dim'),)
super(Sampling, self).__init__(**kwargs)
self.units = units
self.num_sampled = num_sampled
if self.num_sampled > self.units:
raise Exception('num_sample: {} cannot be greater than units: {}'.format(
num_sampled, units))
self.type = type
if not (self.type == 'nce' or self.type == 'sampled_softmax'):
raise Exception('type {} is not a valid sampling loss type'.format(type))
self.num_true = num_true
self.kernel_initializer = initializers.get(kernel_initializer)
self.bias_initializer = initializers.get(bias_initializer)
self.kernel_regularizer = regularizers.get(kernel_regularizer)
self.bias_regularizer = regularizers.get(bias_regularizer)
self.activity_regularizer = regularizers.get(activity_regularizer)
self.kernel_constraint = constraints.get(kernel_constraint)
self.bias_constraint = constraints.get(bias_constraint)
self.input_spec = [InputSpec(min_ndim=2), InputSpec(min_ndim=1)]
self.supports_masking = True
def build(self, input_shape):
assert len(input_shape) == 2
input_dim = input_shape[0][-1]
self.kernel = self.add_weight(shape=(input_dim, self.units),
initializer=self.kernel_initializer,
name='kernel',
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
self.bias = self.add_weight(shape=(self.units,),
initializer=self.bias_initializer,
name='bias',
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
self.input_spec[0] = InputSpec(min_ndim=2, axes={-1: input_dim})
self.built = True
def call(self, inputs):
pred, target = inputs
output = K.dot(pred, self.kernel)
output = K.bias_add(output, self.bias, data_format='channels_last')
# TODO : check train or test mode
if self.type == 'nce':
nce_loss = nce_loss_function(
K.transpose(self.kernel), self.bias, target, pred, self.num_sampled, self.units, self.num_true)
self.add_loss(K.mean(nce_loss))
else:
sampled_softmax_loss = sampled_softmax_loss_function(
K.transpose(self.kernel), self.bias, target, pred, self.num_sampled, self.units, self.num_true)
self.add_loss(K.mean(sampled_softmax_loss))
return output
def compute_output_shape(self, input_shape):
assert input_shape and len(input_shape) == 2
assert input_shape[0][-1]
output_shape = list(input_shape[0])
output_shape[-1] = self.units
return tuple(output_shape)
def get_config(self):
config = {
'units': self.units,
'num_sampled': self.num_sampled,
'kernel_initializer': initializers.serialize(self.kernel_initializer),
'bias_initializer': initializers.serialize(self.bias_initializer),
'kernel_regularizer': regularizers.serialize(self.kernel_regularizer),
'bias_regularizer': regularizers.serialize(self.bias_regularizer),
'activity_regularizer': regularizers.serialize(self.activity_regularizer),
'kernel_constraint': constraints.serialize(self.kernel_constraint),
'bias_constraint': constraints.serialize(self.bias_constraint)
}
base_config = super(Sampling, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
def fill_zipf(length, num_classes, num_true=1):
data_onehot = np.zeros((length, num_classes), dtype='float32')
data_labels = np.zeros((length, num_true), dtype='int32')
# all indexes outside of num_classes scattered in existing space
rand = np.random.zipf(1.3, length * num_true) % num_classes
for i in range(length):
for j in range(num_true):
k = rand[i]
data_onehot[i][k] = 1.0
data_labels[i][j] = k
return data_onehot, data_labels
# number of test samples
num_train = 32*500
num_test = 32*500
num_valid = 100
num_epochs = 5
num_hidden = 10
# number of classes
num_classes = 2000
# number of samples for NCE
num_sampled = 24
# number of labels
num_true = 1
# type of negative sampler
sampler_type='sampled_softmax'
inputs = Input(shape=(num_classes,))
target = Input(shape=(num_true,), dtype=tf.int32) # sparse format, e.g. [1, 3, 2, 6, ...]
net = Dense(num_classes)(inputs)
net = Dense(num_hidden, activation='relu')(net)
net = Sampling(units=num_classes, num_sampled=num_sampled, type=sampler_type)([net, target])
model = Model(inputs=[inputs, target], outputs=net)
model.compile(optimizer='adam', loss=None, metrics=['binary_crossentropy'])
model.summary()
train_input, train_output = fill_zipf(num_train, num_classes, num_true)
valid_input, valid_output = fill_zipf(num_valid, num_classes, num_true)
history = model.fit([train_input, train_output], None,
validation_data=([valid_input, valid_output], None),
epochs=num_epochs, verbose=2)
test_input, test_output = fill_zipf(num_test, num_classes, num_true)
predicts = model.predict([test_input, test_output], batch_size=32)
count = 0
for test in range(num_test):
pred = predicts[test]
imax = np.argmax(pred)
if imax == test_output[test]:
count += 1
print("Found {0} out of {1}".format(count/num_true, num_test))
This test works for the single-label case, both 'nce' and 'sampled_softmax'. But, when I set num_true to greater than one, both NCE and Sampled Softmax throw a tensor mismatch exception.
num_true=3
width=2000
sampler_type='sampled_softmax'
With these parameters, for Sampled Softmax, the code throws this exception trace:
File "postable_sampling_tests.py", line 220, in <module>
epochs=num_epochs, verbose=2)
File "/opt/ds/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/opt/ds/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
outs = f(ins_batch)
File "/opt/ds/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/opt/ds/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/opt/ds/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1399, in __call__
run_metadata_ptr)
File "/opt/ds/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 526, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must be broadcastable: logits_size=[32,2000] labels_size=[96,2000]
[[{{node sampling_1/softmax_cross_entropy_with_logits}} = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _class=["loc:#train...s_grad/mul"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](sampling_1/BiasAdd_1, sampling_1/softmax_cross_entropy_with_logits/Reshape_1)]]
32 is the batch_size. Clearly, something is num_true * batch_size but I don't know how to fix this.
If we change the sampler to NCE:
num_true=3
width=2000
sampler_type='nce'
The final two lines of the exception stack:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [32,2000] vs. [3,2000]
[[{{node sampling_1/logistic_loss/mul}} = Mul[T=DT_FLOAT, _class=["loc:#training/Adam/gradients/sampling_1/logistic_loss/mul_grad/Reshape"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](sampling_1/BiasAdd_1, sampling_1/strided_slice_2)]]
In this case, the labels have not been multiplied by batch_size.
What am I doing wrong? How can I get this wrapper system working for multi-label cases?
You can also use samples softmax with multiple labels, you just have to take the mean of each samples softmax
embeddings = tf.get_variable( 'embeddings',
initializer= tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
softmax_weights = tf.get_variable( 'softmax_weights',
initializer= tf.truncated_normal([vocabulary_size, embedding_size],
stddev=1.0 / math.sqrt(embedding_size)))
softmax_biases = tf.get_variable('softmax_biases',
initializer= tf.zeros([vocabulary_size]), trainable=False )
embed = tf.nn.embedding_lookup(embeddings, train_dataset) #train data set is
embed_reshaped = tf.reshape( embed, [batch_size*num_inputs, embedding_size] )
segments= np.arange(batch_size).repeat(num_inputs)
averaged_embeds = tf.segment_mean(embed_reshaped, segments, name=None)
loss = tf.reduce_mean(
tf.nn.sampled_softmax_loss(weights=softmax_weights, biases=softmax_biases, inputs=averaged_embeds,
labels=train_labels, num_sampled=num_sampled, num_classes=vocabulary_size))
optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss) #Original learning rate was 1.0
from
https://github.com/Santosh-Gupta/Research2Vec/blob/master/Research2VecTraining2.ipynb