Stateful RNN (LSTM) in keras - tensorflow

Imagine the following data:
X = [x1, x2, x3, x4, x5, x6, ...]
and
Y = [y1, y2, y3, y4, ...]
The labels map to the inputs in the following manner:
[x1,x2] -> y1
[x2,x3] -> y2
.
.
.
I am trying to build a model in Keras such that, when classification takes place, the model remembers what it classified the previous step as, and is causal in the sense that the next prediction depends directly on the previous one, somewhat like other methods such as HMMs. So something like this:
Y2 = f( [x2,x3] , y1)
I have read this page, where each batch is divided into sub-batches (if that's the correct term?) and the state is reset between main batches, but what I want is to not shuffle the batches and to introduce that causality into the model.
My question is how can you do this with stateful LSTMs?
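For reference, the stateful setup on that page looks roughly like this (my sketch, not my actual model; stateful=True requires a fixed batch size, and shuffle=False keeps the batches in order):
import tensorflow as tf

# Minimal stateful LSTM: the cell/hidden state carries over from one batch
# to the next instead of being reset, so batch order matters.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, stateful=True,
                         batch_input_shape=(1, 2, 1)),  # (batch, timesteps, features)
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
# model.fit(X, Y, batch_size=1, shuffle=False)
# model.reset_states()  # reset manually between independent sequences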

One way is to write a custom layer that inherits from the LSTM class.
[ Sample ]:
import tensorflow as tf
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Class / Definition
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
class MyLSTMLayer( tf.keras.layers.LSTM ):
    def __init__(self, units, return_sequences, return_state):
        super(MyLSTMLayer, self).__init__( units, return_sequences=return_sequences, return_state=return_state )
        self.num_units = units

    def build(self, input_shape):
        # A single trainable kernel mapping the input features to num_units.
        self.kernel = self.add_weight("kernel",
                                      shape=[int(input_shape[-1]), self.num_units])

    def call(self, inputs):
        # Overriding call() replaces the LSTM computation with a plain matmul.
        return tf.matmul(inputs, self.kernel)
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Variables
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
start = 3
limit = 12
delta = 3
sample = tf.range( start, limit, delta )
sample = tf.cast( sample, dtype=tf.float32 )
sample = tf.constant( sample, shape=( sample.shape[0], 1, 1 ) )
layer = MyLSTMLayer( sample.shape[0], True, False )
print( sample )
print( layer(sample) )
[ Output ]:
tf.Tensor(
[[[3.]]
[[6.]]
[[9.]]], shape=(3, 1, 1), dtype=float32)
tf.Tensor(
[[[-1.8635211 2.6157026 -1.6650987]]
[[-3.7270422 5.2314053 -3.3301973]]
[[-5.5905633 7.8471084 -4.995296 ]]], shape=(3, 1, 3), dtype=float32)
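Another way to get the causality asked about, Y2 = f([x2, x3], y1), is to feed the previous prediction back in as a second input. A sketch with hypothetical shapes, not tied to the layer above:
import numpy as np
import tensorflow as tf

# Two inputs: the current window [x_t, x_t+1] and the previous label y_t-1.
window = tf.keras.Input(shape=(2,))
prev_y = tf.keras.Input(shape=(1,))
h = tf.keras.layers.Concatenate()([window, prev_y])
h = tf.keras.layers.Dense(16, activation='relu')(h)
out = tf.keras.layers.Dense(1, activation='sigmoid')(h)
model = tf.keras.Model([window, prev_y], out)

# At prediction time each output is fed back as the next step's prev_y.
y_prev = np.zeros((1, 1), dtype=np.float32)
for pair in np.array([[0.1, 0.2], [0.2, 0.3]], dtype=np.float32):
    y_prev = model.predict([pair[None, :], y_prev], verbose=0)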

Related

Gradient = nan when using DenseVariational layer

I work with binary data (called mimic) and I want to build a Bayesian model that reproduces this data. To do so, I define this model:
import tensorflow as tf
import tensorflow_probability as tfp

tfkl = tf.keras.layers
tfpl = tfp.layers
tfd = tfp.distributions

def prior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    prior_model = tf.keras.Sequential(
        [
            tfp.layers.DistributionLambda(
                lambda t: tfp.distributions.MultivariateNormalDiag(
                    loc=tf.zeros(n), scale_diag=tf.ones(n)
                )
            )
        ]
    )
    return prior_model

def posterior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    posterior_model = tf.keras.Sequential(
        [
            tfp.layers.VariableLayer(
                tfp.layers.MultivariateNormalTriL.params_size(n), dtype=dtype
            ),
            tfp.layers.MultivariateNormalTriL(n),
        ]
    )
    return posterior_model

# latentNormal, batchSize and inputDim are defined elsewhere in my code.
model = tf.keras.Sequential([
    tfkl.Input(shape=(), name='dummy_input'),
    tfpl.DistributionLambda(lambda t: latentNormal,
                            convert_to_tensor_fn=lambda x: x.sample(batchSize)),
    tfp.layers.DenseVariational(units=inputDim, make_prior_fn=prior,
                                make_posterior_fn=posterior,
                                activation="sigmoid", use_bias=False),
    tfpl.DistributionLambda(lambda t: tfd.Bernoulli(probs=t))
])
Then I train the model:
import numpy as np
from tqdm import trange

negloglik = lambda data: -model(69).log_prob(data)
optimizer = tf.keras.optimizers.Adam()

loo = []
kls = []
for epoch in trange(100):
    # model.fit(mimic[:1453*32], mimic[:1453*32], epochs=1, batch_size=batchSize, verbose=0)
    idx = np.random.choice(np.arange(mimic.shape[0]), size=3*batchSize, replace=False)
    shuffled_ds = mimic.numpy()[idx]
    for nBatch in range(3):
        batch = shuffled_ds[nBatch*batchSize:(1+nBatch)*batchSize]
        with tf.GradientTape() as tape:
            tape.watch(model.trainable_variables)
            loss = negloglik(batch)
        loo.append(loss)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    kl = tf.reduce_mean(tfd.kl_divergence(model(0), real_dist))
    kls.append(kl.numpy())
More precisely, when I run for one epoch and one batch, my model's weights are full of NaN, and the gradient is also NaN.
Do you have any idea how I can solve this?
I tried replacing the DenseVariational layer with a plain Dense layer and everything works well. I don't get why DenseVariational is a problem here.
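Not a full answer, but one way to find where the NaNs first appear (a sketch; tf.debugging.enable_check_numerics is standard TensorFlow, the helper function below is hypothetical):
import tensorflow as tf

# Fail fast at the first op that produces a NaN or Inf, with a stack trace
# pointing at the offending operation. Call once before training.
tf.debugging.enable_check_numerics()

def report_nonfinite_grads(grads, variables):
    # Print which variables received non-finite gradients.
    for var, grad in zip(variables, grads):
        if grad is not None and not bool(tf.reduce_all(tf.math.is_finite(grad))):
            print("non-finite gradient for", var.name)

# In the training loop, after grads = tape.gradient(loss, model.trainable_variables):
# report_nonfinite_grads(grads, model.trainable_variables)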

Model with normalized binary cross entropy loss does not converge

I'm trying to implement normalized binary cross entropy for a classification task following this paper: Normalized Loss Functions for Deep Learning with Noisy Labels.
The math is as follows:
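In symbols (my transcription, reconstructed from the implementation below rather than the paper's exact notation), with target $y$ and predicted probability $p$:

$$\mathrm{NBCE}(y, p) = \frac{-\big(y \log p + (1-y)\log(1-p)\big)}{-\big(\log p + \log(1-p)\big)}$$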
Here is my implementation:
import tensorflow as tf
from keras.utils import losses_utils

class NormalizedBinaryCrossentropy(tf.keras.losses.Loss):
    def __init__(
        self,
        from_logits=False,
        label_smoothing=0.0,
        axis=-1,
        reduction=tf.keras.losses.Reduction.NONE,
        name="normalized_binary_crossentropy",
        **kwargs
    ):
        super().__init__(reduction=reduction, name=name)
        self.from_logits = from_logits
        self._epsilon = tf.keras.backend.epsilon()

    def call(self, target, logits):
        if tf.is_tensor(logits) and tf.is_tensor(target):
            logits, target = losses_utils.squeeze_or_expand_dimensions(logits, target)
        logits = tf.convert_to_tensor(logits)
        target = tf.cast(target, logits.dtype)

        if self.from_logits:
            logits = tf.math.sigmoid(logits)
        logits = tf.clip_by_value(logits, self._epsilon, 1.0 - self._epsilon)

        numer = target * tf.math.log(logits) + (1 - target) * tf.math.log(1 - logits)
        denom = -(tf.math.log(logits) + tf.math.log(1 - logits))
        return -numer / denom

    def get_config(self):
        config = super().get_config()
        config.update({"from_logits": self.from_logits})
        return config
I'm using this loss to train a binary classifier (a CTR predictor), but the model's loss does not decrease and ROC-AUC remains at ~0.49-0.5. To verify the implementation of the numerator, I tried training with the denominator removed, and it works fine.
# Example Usage
import numpy as np

labels = np.array([[0], [1], [0], [0], [0]]).astype(np.int64)
logits = np.array([[-1.024], [2.506], [1.43], [0.004], [-2.0]]).astype(np.float64)

tf_nce = NormalizedBinaryCrossentropy(
    reduction=tf.keras.losses.Reduction.NONE,
    from_logits=True
)
tf_nce(labels, logits)
#<tf.Tensor: shape=(5, 1), dtype=float64, numpy=
# array([[0.18737159],
# [0.02945536],
# [0.88459308],
# [0.50144269],
# [0.05631594]])>
I checked manually with some extreme values, and the loss doesn't hit NaNs or 0s.
Can anyone help me debug why the model is not able to converge with this loss? Is there something wrong with my understanding of the loss function or its implementation?
Edit 1: The model architecture is a Multi-gate Mixture-of-Experts with 6 tasks. All 6 tasks are binary classification, and the losses from all tasks are added together to get the final loss.
One thing mentioned in the paper, as described above, is that the norm of the loss should lie inclusively in [0, 1], but your loss violates this condition of normalized binary cross-entropy. The other reason is that you are dividing by the wrong denominator: you have to divide by the cross-entropy of your logits, i.e. take the BinaryCrossentropy() of your logits. These can be the reasons your loss is not decreasing. I have made some changes to your code so that it satisfies this norm property:
import tensorflow as tf
from keras.utils import losses_utils

class NormalizedBinaryCrossentropy(tf.keras.losses.Loss):
    def __init__(
        self,
        from_logits=False,
        label_smoothing=0.0,
        axis=-1,
        reduction=tf.keras.losses.Reduction.NONE,
        name="normalized_binary_crossentropy",
        **kwargs
    ):
        super().__init__(reduction=reduction, name=name)
        self.from_logits = from_logits
        self._epsilon = tf.keras.backend.epsilon()

    def call(self, target, logits):
        if tf.is_tensor(logits) and tf.is_tensor(target):
            logits, target = losses_utils.squeeze_or_expand_dimensions(logits, target)
        logits = tf.convert_to_tensor(logits)
        target = tf.cast(target, logits.dtype)
        logits = tf.clip_by_value(logits, self._epsilon, 1.0 - self._epsilon)

        if self.from_logits:
            # Numerator: the ordinary BCE of the raw logits.
            numer = tf.keras.losses.binary_crossentropy(target, logits, from_logits=True)[:, tf.newaxis]
            denom = -(tf.math.log(logits) + tf.math.log(1 - logits))
            return numer * denom / tf.reduce_sum(denom)
        else:
            logits = tf.nn.log_softmax(logits)
            num = -tf.math.reduce_sum(tf.multiply(target, logits), axis=1)
            denom = -tf.math.reduce_sum(logits, axis=1)
            return num / denom

    def get_config(self):
        config = super().get_config()
        config.update({"from_logits": self.from_logits})
        return config
I have updated the solution: there are two ways of computing the BCE. If your targets are one-hot encoded, set from_logits=False; otherwise set it to True.
I would try to avoid log-sigmoid stability issues and implement the above model as a two-class problem with softmax cross-entropy.
The NormalizedCrossEntropy is defined as:
import numpy as np
import tensorflow as tf
from tensorflow import keras

class NormalizedCrossEntropy(keras.layers.Layer):
    def __init__(self, num_classes):
        super(NormalizedCrossEntropy, self).__init__()
        self.num_classes = num_classes

    def call(self, pred, labels):
        pred = tf.nn.log_softmax(pred, axis=1)
        label_one_hot = tf.one_hot(labels, self.num_classes)
        numer = -1 * tf.reduce_sum(label_one_hot * pred, axis=1)
        denom = -1 * tf.reduce_sum(pred, axis=1)
        nce = numer / denom
        return nce
Example usage:
NormalizedCrossEntropy(num_classes=2)(np.array([[-1.024, 0.5], [0.1, 2.506], [1, .0], [0., 1.], [-0.89, -2.0]]), np.array([0, 1, 0, 0, 0]) )
#array([0.89725673, 0.03348167, 0.19259584, 0.80740416, 0.16958274]

tf.reshape(self.normalized_price(prce), (-1, 1)), ValueError: Shape must be rank 1 but is rank 2

I am getting the following error when calling the model subclass. My guess is that I am not passing the two parameters correctly, or that the reshape is not producing the correct value.
ValueError: Shape must be rank 1 but is rank 2 for '{{node base_stock_model/concat}} = ConcatV2[N=3, T=DT_FLOAT, Tidx=DT_INT32](base_stock_model/sequential_2/embedding_2/embedding_lookup/Identity_1, base_stock_model/sequential_3/embedding_3/embedding_lookup/Identity_1, base_stock_model/Reshape, base_stock_model/concat/axis)' with input shapes: [32], [32], [1,1], [].
Here is the main model class:
class StockModel(tfrs.models.Model):
    def __init__(self, rating_weight: float, retrieval_weight: float) -> None:
        super().__init__()
        embedding_dimension = 32
        self.user_model = UserModel()
        self.stock_model = base_stockModel()
        self.rating_model = tf.keras.Sequential([
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(1),
        ])
        # The tasks.
        self.rating_task: tf.keras.layers.Layer = tfrs.tasks.Ranking(
            loss=tf.keras.losses.MeanSquaredError(),
            metrics=[tf.keras.metrics.RootMeanSquaredError()],
        )
        self.retrieval_task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=stocks.batch(1).map(self.stock_model)
            )
        )
        # The loss weights.
        self.rating_weight = rating_weight
        self.retrieval_weight = retrieval_weight

    def call(self, features: Dict[Text, tf.Tensor]) -> tf.Tensor:
        user_embeddings = self.user_model(features['username'])
        price = tf.as_string(features["price"])
        stock_embeddings = self.stock_model([features["name"], price])
        return (
            user_embeddings,
            stock_embeddings,
            self.rating_model(
                tf.concat([user_embeddings, stock_embeddings], axis=1)
            ),
        )

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
        ratings = features.pop("Rating")
        print("features", features)
        user_embeddings, stock_embeddings, rating_predictions = self(features)
        # We compute the loss for each task.
        rating_loss = self.rating_task(
            labels=ratings,
            predictions=rating_predictions,
        )
        retrieval_loss = self.retrieval_task(user_embeddings, stock_embeddings)
        # And combine them using the loss weights.
        return (self.rating_weight * rating_loss
                + self.retrieval_weight * retrieval_loss)
The main model class above calls the base_stockModel class, which is what causes the error.
class base_stockModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        embedding_dimension = 32
        self.stock_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_stock_titles, mask_token=None),
            tf.keras.layers.Embedding(len(unique_stock_titles) + 1, embedding_dimension)
        ])
        self.price_embedding = tf.keras.Sequential([
            tf.keras.layers.Discretization(prices_bucket.tolist()),
            tf.keras.layers.Embedding(len(prices_bucket) + 2, 32)
        ])
        self.normalized_price = tf.keras.layers.Normalization(axis=None)
        self.normalized_price.adapt(prices)

    def call(self, input, *args, **kwargs):
        print(input.get_shape(), kwargs)
        nme = input[0]
        prce = tf.strings.to_number(input[1], out_type=tf.dtypes.float32)
        return tf.concat([
            self.stock_embedding(nme),
            self.price_embedding(prce),
            tf.reshape(self.normalized_price(prce), (-1, 1)),
        ], axis=1)
This code is a variant of the official TensorFlow Recommenders tutorials:
https://www.tensorflow.org/recommenders/examples/multitask/
https://www.tensorflow.org/recommenders/examples/context_features
Any help is much appreciated.
First, analyze the ranks of all the input tensors. If a tensor does not have the rank the model wants, use the tf.reshape() command or adjust the input to match the model's expectation. Note that tf.shape() gives you the shape while the model is running.
Here is the documentation on it:
https://www.tensorflow.org/api_docs/python/tf/reshape
https://www.tensorflow.org/api_docs/python/tf/shape
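For example (a toy sketch matching the shapes in the error message, not the asker's actual tensors):
import tensorflow as tf

# Two rank-1 embeddings and one rank-2 value, as in the error message:
a = tf.zeros([32])        # stock embedding, shape (32,)
b = tf.zeros([32])        # price embedding, shape (32,)
c = tf.zeros([1, 1])      # normalized price, shape (1, 1)

# tf.concat requires all inputs to have the same rank, so mixing ranks fails.
# Bring everything to rank 2 with a leading batch dimension, then concat:
out = tf.concat([
    tf.reshape(a, (1, -1)),   # (1, 32)
    tf.reshape(b, (1, -1)),   # (1, 32)
    tf.reshape(c, (1, -1)),   # (1, 1)
], axis=1)
print(out.shape)              # (1, 65)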

TensorFlow 2 custom loss returns nan

I have a model that I compile using binary_crossentropy; the training process goes well and the loss is printed.
model = MyModel()
model.compile(optimizer="adadelta", loss="binary_crossentropy")
data1, data2, y = get_random_data(4, 3)  # returns data1: (1000, 4), data2: (1000, 3), y: (1000,)
model.fit([data1, data2], y, batch_size=4)
Then I write a custom loss function, and the loss becomes nan:
import tensorflow.keras.backend as K

class MyModel():
    ...
    def batch_loss(self, y_true, y_pred_batch):
        bottom = K.sum(K.exp(y_pred_batch))
        batch_softmax = K.exp(y_pred_batch) / bottom
        batch_log_likelihood = K.log(batch_softmax)
        loss = K.sum(batch_log_likelihood)
        return loss
model.compile(optimizer="adadelta", loss=model.batch_loss) # change above compile code to this
I tested my loss function with batch_loss(tf.ones((1,))), and it seems to return the correct result.
But when it runs during training, it becomes nan. Where should I start debugging?
Model and data code (for those who need it to reproduce):
class MyModel(tf.keras.models.Model):
    def __init__(self):
        super().__init__()
        self.t1A = tf.keras.layers.Dense(300, activation='relu', input_dim=1)
        self.t1B = tf.keras.layers.Dense(300, activation='relu', input_dim=1)
        self.t1v = tf.keras.layers.Dense(128, activation='relu')
        self.t2A = tf.keras.layers.Dense(300, activation='relu')
        self.t2B = tf.keras.layers.Dense(300, activation='relu')
        self.t2v = tf.keras.layers.Dense(128, activation='relu')
        self.out = tf.keras.layers.Dot(axes=1)

    def call(self, inputs, training=None, mask=None):
        u, i = inputs[0], inputs[1]
        u = self.t1A(u)
        u = self.t1B(u)
        u = self.t1v(u)
        i = self.t2A(i)
        i = self.t2B(i)
        i = self.t2v(i)
        out = self.out([u, i])
        return out

def get_random_data(user_feature_num, item_feature_num):
    def get_random_ndarray(data_size, dis_list, feature_num):
        data_list = []
        for i in range(feature_num):
            arr = np.random.randint(dis_list[i], size=data_size)
            data_list.append(arr)
        data = np.array(data_list)
        return np.transpose(data, axes=(1, 0))

    uf_dis, if_dis, data_size = [1000, 2, 10, 20], [10000, 50, 60], 1000
    y = np.zeros(data_size)
    for i in range(int(data_size/10)):
        y[i] = 1
    return get_random_ndarray(data_size, uf_dis, feature_num=user_feature_num), \
           get_random_ndarray(data_size, if_dis, feature_num=item_feature_num), y
The values output by your model are quite big. Combined with the call to exp in your loss function, they quickly grow to nan. You might consider applying an activation function such as a sigmoid to keep the values between 0 and 1.
I think your error is caused by calling exp(): this function grows very quickly and overflows to inf, which then turns into nan.
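A numerically stable variant of the loss above (a sketch; tf.reduce_logsumexp subtracts the max internally, so the exp never overflows) computes the log-softmax directly instead of exponentiating first:
import tensorflow as tf

def batch_loss_stable(y_true, y_pred_batch):
    # log(softmax(x)) == x - logsumexp(x); reduce_logsumexp shifts by the max
    # before exponentiating, so large outputs do not overflow to inf/nan.
    batch_log_likelihood = y_pred_batch - tf.reduce_logsumexp(y_pred_batch)
    return tf.reduce_sum(batch_log_likelihood)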

Using TF Estimator with TFRecord generator

I am trying to create a simple feed-forward NN that reads in a folder of tfrecords. Each record has a 1024-value 'mean_rgb' vector and a category label; the network should learn the categories from this feature vector.
def generate(dir, shuffle, batch_size):
    def parse(serialized):
        features = {
            'mean_rgb': tf.FixedLenFeature([1024], tf.float32),
            'category': tf.FixedLenFeature([], tf.int64)
        }
        parsed_example = tf.parse_single_example(serialized=serialized, features=features)
        vrv = parsed_example['mean_rgb']
        label = parsed_example['category']
        d = dict(zip(['mean_rgb'], [vrv])), label
        return d

    dataset = tf.data.TFRecordDataset(dir).repeat(1)
    dataset = dataset.map(parse)
    if shuffle:
        dataset = dataset.shuffle(8000)
    dataset = dataset.batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    next = iterator.get_next()
    print(next)
    return next

def batch_generator(dir, shuffle=False, batch_size=64):
    sess = K.get_session()
    while True:
        yield sess.run(generate(dir, shuffle, batch_size))

num_classes = 29
batch_size = 64

yt8m_train = [os.path.join(yt8m_dir_train, x) for x in read_all_file_names(yt8m_dir_train) if '.tfrecord' in x]
yt8m_test = [os.path.join(yt8m_dir_test, x) for x in read_all_file_names(yt8m_dir_test) if '.tfrecord' in x]

feature_columns = [tf.feature_column.numeric_column(k) for k in ['mean_rgb']]

classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[1024, 1024],
    n_classes=num_classes,
    model_dir=model_dir)

classifier.train(
    input_fn=lambda: generate(yt8m_train, True, batch_size))
However, I get the following error:
InvalidArgumentError (see above for traceback): Input to reshape is a
tensor with 65536 values, but the requested shape has 64
I am not sure why it sees the input as a 64x1024=65536 vector instead of a (64, 1024) matrix. When I print the next item from the generator, I get:
({'mean_rgb': <tf.Tensor: id=23, shape=(64, 1024), dtype=float32, numpy=
array([[ 0.9243997 , 0.28990048, -0.4130672 , ..., -0.096692 ,
0.27225342, 0.13346168],
[ 0.5853526 , 0.67050666, -0.24683481, ..., -0.6999033 ,
-0.4100128 , -0.00349384],
[ 0.49572858, 0.5231492 , -0.53445834, ..., 0.0449002 ,
0.10582132, -0.37333965],
...,
[ 0.5776026 , -0.07128889, -0.61762846, ..., 0.22194198,
0.61441416, -0.27355513],
[-0.01848815, 0.20132884, 1.1023484 , ..., 0.06496283,
0.29560333, 0.09157721],
[-0.25877073, -1.9552246 , 0.10309827, ..., 0.22032814,
-0.6812989 , -0.23649289]], dtype=float32)>}
which has the correct (64, 1024) shape.
The problem is in how feature_columns works. I had a similar problem, which I solved by doing a reshape; here is the relevant part of my code to help you understand.
Defining the feature_columns:
feature_columns = {
    'images': tf.feature_column.numeric_column('images', self.shape),
}
Then, to create the input for the model:
with tf.name_scope('input'):
    feature_columns = list(self._features_columns().values())
    input_layer = tf.feature_column.input_layer(
        features=features, feature_columns=feature_columns)
    input_layer = tf.reshape(
        input_layer,
        shape=(-1, self.parameters.size, self.parameters.size,
               self.parameters.channels))
Pay attention to the last part: I had to reshape the tensor, and the -1 lets TensorFlow figure out the batch size.
I believe the issue was that feature_columns = [tf.feature_column.numeric_column(k) for k in ['mean_rgb']] assumes the column is a scalar, when it is actually a 1024-vector. I had to add shape=1024 to the numeric_column call. I also had to remove the existing saved checkpoint.
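In code, the fix described above would look like this (a sketch):
import tensorflow as tf

# Declare the column as a 1024-vector instead of a scalar, so the estimator
# feeds (batch, 1024) rather than trying to squeeze 64*1024 values into (64,).
feature_columns = [
    tf.feature_column.numeric_column('mean_rgb', shape=(1024,))
]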