TensorFlow tf.gather_nd does not work as expected - tensorflow

When I use tf.gather_nd I run into a problem. My toy code is below; when I run it, it does not give the expected result:
x = tf.constant([[0.22, 0.3,0.1,0.11],[0.4,0.5, 0.6,0.99],[0.8, 0.9,0.43,0.21]])
indices = tf.constant([[1,2],[0,1],[2,3]])
b=tf.gather_nd(x, indices)
sess = tf.InteractiveSession()
cc=sess.run([b], feed_dict={})
print cc
I expect float values in the result, but I get all zeros instead:
[array([ 0., 0., 0.], dtype=float32)]

Try it without tf.constant():
x = [[0.22, 0.3,0.1,0.11],[0.4,0.5, 0.6,0.99],[0.8, 0.9,0.43,0.21]]
indices = [[1,2],[0,1],[2,3]]
b = tf.gather_nd(x, indices)
sess = tf.Session()
print (sess.run(b))
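As a cross-check that does not depend on TensorFlow, the values tf.gather_nd is expected to return for these indices can be sketched with plain NumPy indexing. This is only an illustration of the expected result, assuming each row of indices is a full (row, column) coordinate into x; the name `expected` is introduced here and is not part of the original code.

```python
import numpy as np

x = np.array([[0.22, 0.3, 0.1, 0.11],
              [0.4, 0.5, 0.6, 0.99],
              [0.8, 0.9, 0.43, 0.21]])
indices = np.array([[1, 2], [0, 1], [2, 3]])

# Each row of `indices` is a (row, column) pair, so pick x[row, column] for each pair.
expected = x[indices[:, 0], indices[:, 1]]
print(expected)  # x[1, 2], x[0, 1], x[2, 3] -> 0.6, 0.3, 0.21
```

If the TensorFlow result differs from these three values, the problem is in the environment or session setup rather than in the indices.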

Related

tf.keras.losses.CategoricalCrossentropy gives different values than plain implementation

Does anyone know why a raw implementation of categorical cross-entropy gives such different values from the tf.keras API function?
import tensorflow as tf
import numpy as np
tf.enable_eager_execution()
y_true =np.array( [[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
y_pred = np.array([[.9, .05, .05], [.5, .89, .6], [.05, .01, .94]])
ce = tf.keras.losses.CategoricalCrossentropy()
res = ce(y_true, y_pred).numpy()
print("use api:")
print(res)
print()
print("implementation:")
step1 = -y_true * np.log(y_pred )
step2 = np.sum(step1, axis=1)
print("step1.shape:", step1.shape)
print(step1)
print("sum step1:", np.sum(step1, ))
print("mean step1", np.mean(step1))
print()
print("step2.shape:", step2.shape)
print(step2)
print("sum step2:", np.sum(step2, ))
print("mean step2", np.mean(step2))
Above gives:
use api:
0.3239681124687195
implementation:
step1.shape: (3, 3)
[[0.10536052 0. 0. ]
[0. 0.11653382 0. ]
[0. 0. 0.0618754 ]]
sum step1: 0.2837697356318653
mean step1 0.031529970625762814
step2.shape: (3,)
[0.10536052 0.11653382 0.0618754 ]
sum step2: 0.2837697356318653
mean step2 0.09458991187728844
If now with another y_true and y_pred:
y_true = np.array([[0, 1]])
y_pred = np.array([[0.99999999999, 0.00000000001]])
It gives:
use api:
16.11809539794922
implementation:
step1.shape: (1, 2)
[[-0. 25.32843602]]
sum step1: 25.328436022934504
mean step1 12.664218011467252
step2.shape: (1,)
[25.32843602]
sum step2: 25.328436022934504
mean step2 25.328436022934504
The difference comes from these values: [.5, .89, .6], since their sum is not equal to 1. I think you made a mistake and meant this instead: [.05, .89, .06].
If you provide values that sum to 1, both formulas give the same result:
import tensorflow as tf
import numpy as np
y_true = np.array( [[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
y_pred = np.array([[.9, .05, .05], [.05, .89, .06], [.05, .01, .94]])
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
print(np.sum(-y_true * np.log(y_pred), axis=1))
#output
#[0.10536052 0.11653382 0.0618754 ]
#[0.10536052 0.11653382 0.0618754 ]
However, let's explore how the loss is calculated when y_pred is not scaled (its values do not sum to 1). If you look at the source code of categorical cross-entropy, you will see that it scales y_pred so that the class probabilities of each sample sum to 1:
if not from_logits:
    # scale preds so that the class probas of each sample sum to 1
    output /= tf.reduce_sum(output,
                            reduction_indices=len(output.get_shape()) - 1,
                            keep_dims=True)
Since we passed a prediction whose probabilities do not sum to 1, let's see how this operation changes our tensor [.5, .89, .6]:
output = tf.constant([.5, .89, .6])
output /= tf.reduce_sum(output,
                        axis=len(output.get_shape()) - 1,
                        keepdims=True)
print(output.numpy())
# array([0.2512563 , 0.44723618, 0.30150756], dtype=float32)
So the two results should match if we pass the scaled y_pred (the output of the operation above) to your own categorical cross-entropy implementation, and the unscaled y_pred to the TensorFlow implementation:
y_true =np.array( [[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
#unscaled y_pred
y_pred = np.array([[.9, .05, .05], [.5, .89, .6], [.05, .01, .94]])
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
#scaled y_pred (categorical_crossentropy scales above tensor to this internally)
y_pred = np.array([[.9, .05, .05], [0.2512563 , 0.44723618, 0.30150756], [.05, .01, .94]])
print(np.sum(-y_true * np.log(y_pred), axis=1))
Output:
[0.10536052 0.80466845 0.0618754 ]
[0.10536052 0.80466846 0.0618754 ]
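The scaling step can also be reproduced directly in NumPy for the whole batch. The sketch below is not the library's code; it only mirrors the row normalisation described above, and the names `scaled` and `per_sample` are introduced here for illustration. It also suggests where the single "use api" number in the question comes from: it appears to be the mean of the per-sample losses, which is consistent with the loss class averaging over the batch by default.

```python
import numpy as np

y_true = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
y_pred = np.array([[.9, .05, .05], [.5, .89, .6], [.05, .01, .94]])

# Rescale every row so that its class probabilities sum to 1.
scaled = y_pred / y_pred.sum(axis=-1, keepdims=True)

# Per-sample cross-entropy on the rescaled predictions.
per_sample = -np.sum(y_true * np.log(scaled), axis=-1)
print(per_sample)         # approximately [0.10536052 0.80466845 0.0618754]
print(per_sample.mean())  # approximately 0.3239681, the "use api" value in the question
```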
Now let's look at your second example. Why does it show a different output?
If you check the source code again, you will see this line:
output = tf.clip_by_value(output, epsilon, 1. - epsilon)
which clips values that fall outside the range [epsilon, 1 - epsilon]. Your input [0.99999999999, 0.00000000001] is converted to [0.9999999, 0.0000001] by this line, so it gives you a different result:
y_true = np.array([[0, 1]])
y_pred = np.array([[0.99999999999, 0.00000000001]])
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
print(np.sum(-y_true * np.log(y_pred), axis=1))
#now let's first clip the values less than epsilon, then compare loss
epsilon=1e-7
y_pred = tf.clip_by_value(y_pred, epsilon, 1. - epsilon)
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())
print(np.sum(-y_true * np.log(y_pred), axis=1))
Output:
#results without clipping values
[16.11809565]
[25.32843602]
#results after clipping values if there is a value less than epsilon (1e-7)
[16.11809565]
[16.11809565]
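The effect of clipping can be checked without TensorFlow as well. This is a minimal NumPy sketch, assuming the same epsilon of 1e-7 used above; `clipped_crossentropy` is a hypothetical helper name, not a library function.

```python
import numpy as np

def clipped_crossentropy(y_true, y_pred, epsilon=1e-7):
    """Cross-entropy after clipping predictions to [epsilon, 1 - epsilon]."""
    y_pred = np.clip(y_pred, epsilon, 1.0 - epsilon)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[0., 1.]])
y_pred = np.array([[0.99999999999, 0.00000000001]])

unclipped = -np.sum(y_true * np.log(y_pred), axis=-1)
clipped = clipped_crossentropy(y_true, y_pred)
print(unclipped)  # about 25.328, i.e. -log(1e-11)
print(clipped)    # about 16.118, i.e. -log(1e-7)
```

In other words, with this epsilon the loss for a single sample cannot exceed -log(1e-7), which is why the API reports roughly 16.118 instead of 25.328.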

How to specify spearman rank correlation as a loss function in keras?

I wanted to write a loss function that maximizes the spearman rank correlation between two vectors in keras. Unfortunately I could not find an existing implementation, nor a good method to calculate the rank of a vector in keras, so that I could use the formula to implement it myself
def rank_correlation(y_true, y_pred):
    pass
model = tensorflow.keras.Sequential()
#### More model code
model.compile(loss=rank_correlation)
Can anyone please help me implement rank_correlation?
You can try something like the following:
import numpy as np
from scipy.stats import spearmanr
def compute_spearmanr(y, y_pred):
    spearsum = 0
    cnt = 0
    for col in range(y_pred.shape[1]):
        v = spearmanr(y_pred[:,col], y[:,col]).correlation
        # skip columns where the correlation is undefined (e.g. constant input)
        if np.isnan(v):
            continue
        spearsum += v
        cnt += 1
    res = spearsum / cnt
    return res
a = np.array([[2., 1., 2., 3.],[3., 3., 4., 5.]] )
b = np.array([[1., 0., 0., 3.], [1., 0., 3., 3.]])
compute_spearmanr(a, b)
0.9999999999999999
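Since the question also asks how to compute the rank of a vector, here is a small NumPy-only sketch of that step: Spearman's coefficient is the Pearson correlation of the ranks, with tied values sharing their average rank. The helper names `average_ranks` and `spearman_np` are made up for this illustration. Note that ranks are piecewise constant in the inputs, so a function like this (or the scipy version above) provides no useful gradient; it is better suited to evaluation than to a loss passed to model.compile.

```python
import numpy as np

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    values = np.asarray(values, dtype=float).ravel()
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)    # last ordinal rank of each group of equal values
    first = last - counts + 1   # first ordinal rank of each group
    return ((first + last) / 2.0)[inverse]

def spearman_np(a, b):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]

rho = spearman_np([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
print(rho)  # about 0.8208
```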

T5 Encoder model output all zeros?

I am trying out a project where I use the T5EncoderModel from HuggingFace in order to obtain hidden representations of my input sentences. I have 100K sentences which I tokenize and pad as follows:
for sentence in dataset[original]:
    sentence = tokenizer(sentence, max_length=40, padding='max_length', return_tensors='tf', truncation= True)
    original_sentences.append(sentence.input_ids)
    org_mask.append(sentence.attention_mask)
This gives me the right outputs and tokenizes everything decently. The problem appears when I try to actually train the model. The setup is a bit complex and is taken from https://keras.io/examples/vision/semantic_image_clustering/, which I am trying to apply to text.
The set-up for training is as follows:
def create_encoder(rep_dim):
    encoder = TFT5EncoderModel.from_pretrained('t5-small', output_hidden_states=True)
    encoder.trainable = True
    original_input = Input(shape=(max_length), name = 'originalIn', dtype=tf.int32)
    augmented_input = Input(shape=(max_length), name = 'originalIn', dtype=tf.int32)
    concat = keras.layers.Concatenate(axis=1)([original_input, augmented_input])
    #Take 0-index because it returns a TFBERTmodel type, and 0 returns a tensor
    encoded = encoder(input_ids=concat)[0]
    #This outputs shape: [sentences, max_length, encoded_dims]
    output = Dense(rep_dim, activation='relu')(encoded)
    return encoder
The result of this function is passed to the RepresentationLearner class from the above link, as follows:
class RepresentationLearner(keras.Model):
    def __init__(
        self,
        encoder,
        projection_units,
        temperature=0.8,
        dropout_rate=0.1,
        l2_normalize=False,
        **kwargs
    ):
        super(RepresentationLearner, self).__init__(**kwargs)
        self.encoder = encoder
        # Create projection head.
        self.projector = keras.Sequential(
            [
                layers.Dropout(dropout_rate),
                layers.Dense(units=projection_units, use_bias=False),
                layers.BatchNormalization(),
                layers.ReLU(),
            ]
        )
        self.temperature = temperature
        self.l2_normalize = l2_normalize
        self.loss_tracker = keras.metrics.Mean(name="loss")

    @property
    def metrics(self):
        return [self.loss_tracker]

    def compute_contrastive_loss(self, feature_vectors, batch_size):
        num_augmentations = tf.shape(feature_vectors)[0] // batch_size
        if self.l2_normalize:
            feature_vectors = tf.math.l2_normalize(feature_vectors, -1)
        # The logits shape is [num_augmentations * batch_size, num_augmentations * batch_size].
        logits = (
            tf.linalg.matmul(feature_vectors, feature_vectors, transpose_b=True)
            / self.temperature
        )
        # Apply log-max trick for numerical stability.
        logits_max = tf.math.reduce_max(logits, axis=1)
        logits = logits - logits_max
        # The shape of targets is [num_augmentations * batch_size, num_augmentations * batch_size].
        # targets is a matrix consisting of num_augmentations submatrices of shape [batch_size * batch_size].
        # Each [batch_size * batch_size] submatrix is an identity matrix (diagonal entries are ones).
        targets = tf.tile(tf.eye(batch_size), [num_augmentations, num_augmentations])
        # Compute cross entropy loss
        return keras.losses.categorical_crossentropy(
            y_true=targets, y_pred=logits, from_logits=True
        )

    def call(self, inputs):
        features = self.encoder(inputs[0])[0]
        # Apply projection head.
        return self.projector(features[0])

    def train_step(self, inputs):
        batch_size = tf.shape(inputs)[0]
        # Run the forward pass and compute the contrastive loss
        with tf.GradientTape() as tape:
            feature_vectors = self(inputs, training=True)
            loss = self.compute_contrastive_loss(feature_vectors, batch_size)
        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        # Update loss tracker metric
        self.loss_tracker.update_state(loss)
        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}

    def test_step(self, inputs):
        batch_size = tf.shape(inputs)[0]
        feature_vectors = self(inputs, training=False)
        loss = self.compute_contrastive_loss(feature_vectors, batch_size)
        self.loss_tracker.update_state(loss)
        return {"loss": self.loss_tracker.result()}
In order to train it, I use the Colab TPU and train it as such:
with strategy.scope():
    encoder = create_encoder(rep_dim)
    training_model = RepresentationLearner(encoder=encoder, projection_units=128, temperature=0.1)
    lr_scheduler = keras.experimental.CosineDecay(initial_learning_rate=0.001, decay_steps=500, alpha=0.1)
    training_model.compile(optimizer=tfa.optimizers.AdamW(learning_rate=lr_scheduler, weight_decay=0.0001))
history = training_model.fit(x = [original_train, augmented_train], batch_size=32*8, epocs = 10)
training_model.save_weights('representation_learner.h5', overwrite=True)
Note that I am giving my model two inputs. When I predict on my input data, I get all zeros, and I can not seem to understand why. I predict as follows:
training_model.load_weights('representation_learner.h5')
feature_vectors= training_model.predict([[original_train, augmented_train]], verbose = 1)
And the output is:
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
The output also has a shape of (1000000, 128), which is far too large.

tf how to restore two variables from the same variable

I have saved a model and now I am trying to restore it in two branches, like this:
I wrote this code, and it raises ValueError: The same saveable will be restored with two names.
How do I restore two variables from the same variable?
restore_variables = {}
for varr in tf.global_variables():
    if varr.op.name in checkpoint_variables:
        restore_variables[varr.op.name.split("_red")[0]] = varr
        restore_variables[varr.op.name.split("_blue")[0]] = varr
init_saver = tf.train.Saver(restore_variables, max_to_keep=0)
Tested on TF 1.15
Basically the error is saying that it's finding multiple references to the same variable in the restore_variables dict. The fix is simple. Create a copy of your variable using tf.Variable(varr) as follows for one of the references.
I think it's safe to assume that you're not looking for multiple references to the same variable here, rather two separate variables. (I'm assuming this because, if you want to use the same variable multiple times, you can just use the single variable multiple times).
with tf.Session() as sess:
    saver.restore(sess, './vars/vars.ckpt-0')
    restore_variables = {}
    checkpoint_variables=['b']
    for varr in tf.global_variables():
        if varr.op.name in checkpoint_variables:
            restore_variables[varr.op.name.split("_red")[0]] = varr
            restore_variables[varr.op.name.split("_blue")[0]] = tf.Variable(varr)
    print(restore_variables)
    init_saver = tf.train.Saver(restore_variables, max_to_keep=0)
Below you can find the full code to replicate the issue using a toy example. Essentially, we have two variables a and b and out of that, we are creating b_red and b_blue variables.
# Saving the variables
import tensorflow as tf
import numpy as np
a = tf.placeholder(shape=[None, 3], dtype=tf.float64)
w1 = tf.Variable(np.random.normal(size=[3,2]), name='a')
out = tf.matmul(a, w1)
w2 = tf.Variable(np.random.normal(size=[2,3]), name='b')
out = tf.matmul(out, w2)
saver = tf.train.Saver([w1, w2])
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    saved_path = saver.save(sess, './vars/vars.ckpt', global_step=0)
# Restoring the variables
with tf.Session() as sess:
    saver.restore(sess, './vars/vars.ckpt-0')
    restore_variables = {}
    checkpoint_variables=['b']
    for varr in tf.global_variables():
        if varr.op.name in checkpoint_variables:
            restore_variables[varr.op.name+"_red"] = varr
            # Fixing the issue: Instead of varr, do tf.Variable(varr)
            restore_variables[varr.op.name+"_blue"] = varr
    print(restore_variables)
    init_saver = tf.train.Saver(restore_variables, max_to_keep=0)
I may not be understanding the problem correctly, but can't you just make two saver objects? Something like this:
import tensorflow as tf
# Make checkpoint
with tf.Graph().as_default(), tf.Session() as sess:
    a = tf.Variable([1., 2.], name='a')
    sess.run(a.initializer)
    b = tf.Variable([3., 4., 5.], name='b')
    sess.run(b.initializer)
    saver = tf.train.Saver([a, b])
    saver.save(sess, 'tmp/vars.ckpt')
# Restore checkpoint
with tf.Graph().as_default(), tf.Session() as sess:
    # Red
    a_red = tf.Variable([0., 0.], name='a_red')
    b_red = tf.Variable([0., 0., 0.], name='b_red')
    saver_red = tf.train.Saver({'a': a_red, 'b': b_red})
    # same checkpoint path as the one saved above
    saver_red.restore(sess, 'tmp/vars.ckpt')
    print(a_red.eval())
    # [1. 2.]
    print(b_red.eval())
    # [3. 4. 5.]
    # Blue
    a_blue = tf.Variable([0., 0.], name='a_blue')
    b_blue = tf.Variable([0., 0., 0.], name='b_blue')
    saver_blue = tf.train.Saver({'a': a_blue, 'b': b_blue})
    saver_blue.restore(sess, 'tmp/vars.ckpt')
    print(a_blue.eval())
    # [1. 2.]
    print(b_blue.eval())
    # [3. 4. 5.]
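For more than a handful of variables, the per-branch dictionaries passed to each saver could be built from the naming convention instead of being written by hand. The sketch below is plain Python bookkeeping with no TensorFlow calls; `build_restore_maps` is a hypothetical helper, and it works on variable names only. In real code the dictionary values would be the variable objects themselves, with one dictionary handed to each tf.train.Saver.

```python
def build_restore_maps(graph_names, checkpoint_names, suffixes=("_red", "_blue")):
    """Return one {checkpoint_name: graph_name} dict per branch suffix."""
    maps = {suffix: {} for suffix in suffixes}
    for name in graph_names:
        for suffix in suffixes:
            base = name[: -len(suffix)]
            # Keep the entry only if the name carries this suffix and the
            # stripped name actually exists in the checkpoint.
            if name.endswith(suffix) and base in checkpoint_names:
                maps[suffix][base] = name
    return maps

maps = build_restore_maps(["a_red", "b_red", "a_blue", "b_blue"], {"a", "b"})
print(maps["_red"])   # {'a': 'a_red', 'b': 'b_red'}
print(maps["_blue"])  # {'a': 'a_blue', 'b': 'b_blue'}
```

Because each dictionary maps a checkpoint name to exactly one variable, no variable ends up registered under two names within a single saver.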

numpy divide along axis

Is there a numpy function to divide an array along an axis with elements from another array? For example, suppose I have an array a with shape (l,m,n) and an array b with shape (m,); I'm looking for something equivalent to:
def divide_along_axis(a,b,axis=None):
    if axis is None:
        return a/b
    c = a.copy()
    for i, x in enumerate(c.swapaxes(0,axis)):
        x /= b[i]
    return c
For example, this is useful when normalizing an array of vectors:
>>> a = np.random.randn(4,3)
array([[ 1.03116167, -0.60862215, -0.29191449],
[-1.27040355, 1.9943905 , 1.13515384],
[-0.47916874, 0.05495749, -0.58450632],
[ 2.08792161, -1.35591814, -0.9900364 ]])
>>> np.apply_along_axis(np.linalg.norm,1,a)
array([ 1.23244853, 2.62299312, 0.75780647, 2.67919815])
>>> c = divide_along_axis(a,np.apply_along_axis(np.linalg.norm,1,a),0)
>>> np.apply_along_axis(np.linalg.norm,1,c)
array([ 1., 1., 1., 1.])
For the specific example you've given, dividing an (l,m,n) array by an (m,) array, you can use np.newaxis:
a = np.arange(1,61, dtype=float).reshape((3,4,5)) # Create a 3d array
a.shape # (3,4,5)
b = np.array([1.0, 2.0, 3.0, 4.0]) # Create a 1-d array
b.shape # (4,)
a / b # Gives a ValueError
a / b[:, np.newaxis] # The result you want
The NumPy broadcasting rules describe this in detail. You can also use newaxis more than once if required (e.g. to divide a shape (3,4,5,6) array by a shape (3,5) array).
From my understanding of the docs, using newaxis with broadcasting also avoids any unnecessary array copying.
Indexing, newaxis, etc. are now described more fully in the NumPy indexing documentation. (The documentation has been reorganised since this answer was first posted.)
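The newaxis idea can be generalised to an arbitrary axis by reshaping b so that it lines up with the chosen axis of a. This is a sketch of one way to write the divide_along_axis function from the question without a Python loop, assuming b is one-dimensional and has the same length as a along that axis.

```python
import numpy as np

def divide_along_axis(a, b, axis):
    """Divide a by the 1-d array b, aligning b with the given axis of a."""
    a = np.asarray(a, dtype=float)
    shape = [1] * a.ndim   # length-1 axes everywhere...
    shape[axis] = -1       # ...except the axis that b runs along
    return a / np.reshape(b, shape)

a = np.arange(1, 61, dtype=float).reshape((3, 4, 5))
b = np.array([1.0, 2.0, 3.0, 4.0])
result = divide_along_axis(a, b, axis=1)
print(np.array_equal(result, a / b[:, np.newaxis]))  # True
```

Unlike the version in the question, this returns a new array and does not modify a copy in place.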
I think you can get this behavior with numpy's usual broadcasting behavior:
In [9]: a = np.array([[1., 2.], [3., 4.]])
In [10]: a / np.sum(a, axis=0)
Out[10]:
array([[ 0.25 , 0.33333333],
[ 0.75 , 0.66666667]])
That is, if I've interpreted the question correctly.
If you want the other axis you could transpose everything:
> a = np.random.randn(4,3).transpose()
> norms = np.apply_along_axis(np.linalg.norm,0,a)
> c = a / norms
> np.apply_along_axis(np.linalg.norm,0,c)
array([ 1., 1., 1., 1.])
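For the normalisation use case specifically, the transpose can be avoided by asking np.linalg.norm to keep the reduced axis, so the norms broadcast against the rows directly. A minimal sketch, using a seeded random array in place of the one from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 3))

# keepdims=True leaves the norms with shape (4, 1), which broadcasts over each row.
norms = np.linalg.norm(a, axis=1, keepdims=True)
c = a / norms

print(norms.shape)                # (4, 1)
print(np.linalg.norm(c, axis=1))  # every row now has norm 1, up to rounding
```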