I need to write a custom loss for my Keras model. Since the function has to be built from Keras/TensorFlow ops so that backpropagation works automatically, I am not sure how to implement it, because it seems to require some looping operations:
Target[1*300] - [...0 0 0 1 0 0 0 0 0 1 0 0 0...]
Output[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]
What I need is that, while calculating the loss, I do not require an exact match: even if my output is off by up to +/- three positions, I want it to count as a correct prediction.
For example, both of these should be considered correct predictions:
Output[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]
Output[1*300] - [...0 1 0 0 0 0 0 0 0 0 0 1 0...]
The code I have written so far:
import numpy as np
import tensorflow as tf

tar = tf.placeholder(tf.float32, shape=(1, 10))
tar_unpacked = tf.unstack(tar)

pred = tf.placeholder(tf.float32, shape=(1, 10))
pred_unpacked = tf.unstack(pred)

for t in tar_unpacked:
    result_tensor = tf.equal(t, 1)
    tar_ind = tf.where(result_tensor)

with tf.Session() as sess:
    print(sess.run([tar_ind],
                   feed_dict={tar: np.asarray([[0, 0, 1, 0, 0, 0, 1, 0, 0, 0]]),
                              pred: np.asarray([[0, 0, 1, 0, 0, 0, 1, 0, 0, 0]])}))
Next, I want to generate valid indices by adding each offset from
[-3, -2, -1, 0, 1, 2, 3]
to the elements of tar_ind and then compare those indices with pred_unpacked.
My naive loss would be 1 - (NUM_MATCHED/TOTAL)
But the problem is that tar_ind is a variably sized tensor, and I cannot loop over it for the next operation.
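For reference, the matching rule I have in mind, written in plain numpy outside the graph (assuming TOTAL is the number of 1s in the target), is roughly:

import numpy as np

def naive_loss(target, output, tolerance=3):
    # A 1 in the target counts as matched if the output has a 1
    # within +/- `tolerance` positions of it.
    tar_ind = np.where(target == 1)[0]
    pred_ind = np.where(output == 1)[0]
    matched = sum(1 for t in tar_ind
                  if pred_ind.size and np.min(np.abs(pred_ind - t)) <= tolerance)
    return 1.0 - matched / max(len(tar_ind), 1)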
Update 1:
As suggested by @user36624, I tried the alternative approach of using tf.py_func to produce an updated y_pred and then feeding the updated values into binary cross-entropy.
Because I have implemented the function with py_func, I get the error: ValueError: An operation has `None` for the gradient. Please make sure that all of your ops have a gradient defined (i.e., are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
He also suggested that I need to manually stop gradients, which I do not know how to do. My current attempt:
def specificity_loss_wrapper():
    def specificity_loss(y_true, y_pred):
        y_pred = tf.py_func(best_match, [y_true, y_pred], tf.float32)
        y_pred = tf.stop_gradient(y_pred)
        y_pred.set_shape(y_true.get_shape())
        return K.binary_crossentropy(y_true, y_pred)
    return specificity_loss
spec_loss = specificity_loss_wrapper()
and
...
model.compile(loss=spec_loss, optimizer='adam', metrics=['accuracy'])
...
In my understanding, binary_crossentropy should be differentiable.
Thanks
What you are suggesting is to compute
1. offsets = compute_index_offsets( y_true, y_pred )
2. loss = 1 - num(offsets <= 3)/total
I suggest solving it in an alternative way:
1. y_true_mod = DP_best_match( y_true, y_pred )
2. loss = 1 - num(y_true_mod==y_pred)/total
The advantage of modifying y_true is that it is equivalent to providing a new target value, so it is not part of the model graph that gets optimized; no gradients need to flow through it.
What DP_best_match( y_true, y_pred ) should do is to modify y_true according to y_pred,
e.g. given
y_true[1*300] - [...0 0 0 1 0 0 0 0 0 1 0 0 0...]
y_pred[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]
then DP_best_match( y_true, y_pred ) should give the new target
y_true_mod[1*300] - [...0 0 1 0 0 0 0 0 0 0 1 0 0...]
Note that DP_best_match( y_true, y_pred ) only modifies y_true to best match y_pred, so it is deterministic and there is nothing to optimize, hence no need for backpropagation through it. This means you need to manually stop gradients if you implement DP_best_match( y_true, y_pred ) in TF. Alternatively, you can implement it in numpy and wrap the function via tf.py_func, which might be easier.
As a final remark, you should make sure the proposed loss function makes sense. To me, it makes more sense to use binary_crossentropy or mse after finding the best y_true_mod.
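For concreteness, here is a rough sketch of the numpy + tf.py_func route (TF 1.x style, matching the code in the question). The best_match helper and its greedy nearest-index matching are only placeholders for whatever matching strategy you end up choosing; the key point is that the modified target, not y_pred, goes through stop_gradient, so the loss stays differentiable with respect to y_pred.

import numpy as np
import tensorflow as tf

def best_match(y_true, y_pred, tolerance=3):
    # Greedy placeholder for DP_best_match: move each 1 in y_true to the
    # nearest 1 in y_pred if it lies within +/- `tolerance` positions.
    y_true_mod = y_true.copy()
    for row in range(y_true.shape[0]):
        true_idx = np.where(y_true[row] == 1)[0]
        pred_idx = np.where(y_pred[row] >= 0.5)[0]
        for t in true_idx:
            if pred_idx.size == 0:
                continue
            nearest = pred_idx[np.argmin(np.abs(pred_idx - t))]
            if abs(nearest - t) <= tolerance:
                y_true_mod[row, t] = 0
                y_true_mod[row, nearest] = 1
    return y_true_mod.astype(np.float32)

def tolerant_loss(y_true, y_pred):
    y_true_mod = tf.py_func(best_match, [y_true, y_pred], tf.float32)
    # The modified target is a constant as far as optimization is concerned,
    # so gradients are stopped on it, not on y_pred.
    y_true_mod = tf.stop_gradient(y_true_mod)
    y_true_mod.set_shape(y_true.get_shape())
    return tf.keras.backend.binary_crossentropy(y_true_mod, y_pred)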
Related
I've been back and forth with this for ages, without being able to find a solution anywhere. I have a HuggingFace model ('bert-base-cased') that I'm using with TensorFlow and a custom dataset. I've (1) tokenized my data, (2) split the data, (3) converted the data to TF dataset format, and (4) instantiated, compiled and fit the model.
During training, it behaves as you'd expect: training and validation accuracy go up. But when I evaluate the model on the test dataset using TF's model.evaluate and model.predict, the results are very different. The accuracy as reported by model.evaluate is higher (and more or less in line with the validation accuracy); the accuracy as reported by model.predict is about 10% lower. (Maybe it's just a coincidence, but it's similar to the reported training accuracy after the single epoch of fine-tuning.)
Can anyone figure out what's causing this? I include snippets of my code below.
# tokenize the dataset
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path="bert-base-cased",use_fast=False)
def tokenize_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True)
tokenized_datasets = dataset.map(tokenize_function, batched=True)
# splitting dataset
trainSize = 0.7
valTestSize = 1 - trainSize
train_testvalid = tokenized_datasets.train_test_split(test_size=valTestSize,stratify_by_column='class')
valid_test = train_testvalid['test'].train_test_split(test_size=0.5,stratify_by_column='class')
# renaming each of the datasets for convenience
train_set = train_testvalid['train']
val_set = valid_test['train']
test_set = valid_test['test']
# converting the tokenized datasets to TensorFlow datasets
data_collator = DefaultDataCollator(return_tensors="tf")
tf_train_dataset = train_set.to_tf_dataset(
    columns=["attention_mask", "input_ids", "token_type_ids"],
    label_cols=['class'],
    shuffle=True,
    collate_fn=data_collator,
    batch_size=8)
tf_validation_dataset = val_set.to_tf_dataset(
    columns=["attention_mask", "input_ids", "token_type_ids"],
    label_cols=['class'],
    shuffle=False,
    collate_fn=data_collator,
    batch_size=8)
tf_test_dataset = test_set.to_tf_dataset(
    columns=["attention_mask", "input_ids", "token_type_ids"],
    label_cols=['class'],
    shuffle=False,
    collate_fn=data_collator,
    batch_size=8)
# loading tensorflow model
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=1)
# compiling the model
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-6),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=[tf.metrics.BinaryAccuracy()])
# fitting model
history = model.fit(tf_train_dataset,
                    validation_data=tf_validation_dataset,
                    epochs=1)
# Evaluating the model on the test data using `evaluate`
results = model.evaluate(x=tf_test_dataset,verbose=2) # reports binary_accuracy: 0.9152
# first attempt at using model.predict method
hits = 0
misses = 0
for x, y in tf_test_dataset:
    logits = tf.keras.backend.get_value(model(x, training=False).logits)
    labels = tf.keras.backend.get_value(y)
    for i in range(len(logits)):
        if logits[i][0] < 0:
            z = 0
        else:
            z = 1
        if z == labels[i]:
            hits += 1
        else:
            misses += 1
print(hits / (hits + misses))  # reports binary_accuracy: 0.8187
# second attempt at using model.predict method
modelPredictions = model.predict(tf_test_dataset).logits
testDataLabels = np.concatenate([y for x, y in tf_test_dataset], axis=0)
hits = 0
misses = 0
for i in range(len(modelPredictions)):
    if modelPredictions[i][0] >= 0:
        z = 1
    else:
        z = 0
    if z == testDataLabels[i]:
        hits += 1
    else:
        misses += 1
print(hits / (hits + misses))  # reports binary_accuracy: 0.8187
Things I've tried include:
different loss functions (it's a binary classification problem with the label column of the dataset filled with either a zero or a one for each row);
different ways of unpacking the test dataset and feeding it to model.predict;
altering the 'num_labels' parameter between 1 and 2.
I fixed the problem by changing the num_labels parameter to two and the loss function to sparse categorical cross entropy. (I then had to change my model.predict loop by taking the argmax of the two logits produced by the model.)
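A sketch of what that change could look like, building on the snippets above (the from_logits flag and the exact metric choice are my assumptions about the final code, not a verbatim copy of it):

# load the model with two output logits instead of one
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-6),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

history = model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=1)

# predict by taking the argmax over the two logits
logits = model.predict(tf_test_dataset).logits
predictedClasses = np.argmax(logits, axis=-1)
testDataLabels = np.concatenate([y for x, y in tf_test_dataset], axis=0)
print(np.mean(predictedClasses == testDataLabels))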
I have followed the tutorial available at: https://www.tensorflow.org/quantum/tutorials/mnist. I have modified this tutorial to the simplest example I could think of: an input set in which x increases linearly from 0 to 1 and y = x < 0.3. I then use a PQC with a single Rx gate with a symbol, and a readout using a Z gate.
When retrieving the optimized symbol and adjusting it manually, I can easily find a value that provides 100% accuracy, but when I let the Adam optimizer run, it converges to either always predict 1 or always predict -1. Does anybody spot what I do wrong? (and I apologize for not being able to break down the code to a smaller example)
import tensorflow as tf
import tensorflow_quantum as tfq
import cirq
import sympy
import numpy as np
# used to embed classical data in quantum circuits
def convert_to_circuit_cont(image):
    """Encode truncated classical image into quantum datapoint."""
    values = np.ndarray.flatten(image)
    qubits = cirq.GridQubit.rect(4, 1)
    circuit = cirq.Circuit()
    for i, value in enumerate(values):
        if value:
            circuit.append(cirq.rx(value).on(qubits[i]))
    return circuit
# define classical dataset
length = 1000
np.random.seed(42)
# create a linearly increasing set for x from 0 to 1 in 1/length steps
x_train_sorted = np.asarray([[x/length] for x in range(0,length)], dtype=np.float32)
# p is used to shuffle x and y similarly
p = np.random.permutation(len(x_train_sorted))
x_train = x_train_sorted[p]
# y = x < 0.3 in {-1, 1} for Hinge loss
y_train_sorted = np.asarray([1 if (x/length)<0.30 else -1 for x in range(0,length)])
y_train = y_train_sorted[p]
# test == train for this example
x_test = x_train_sorted[:]
y_test = y_train_sorted[:]
# convert classical data into quantum circuits
x_train_circ = [convert_to_circuit_cont(x) for x in x_train]
x_test_circ = [convert_to_circuit_cont(x) for x in x_test]
x_train_tfcirc = tfq.convert_to_tensor(x_train_circ)
x_test_tfcirc = tfq.convert_to_tensor(x_test_circ)
# define the PQC circuit, consisting out of 1 qubit with 1 gate (Rx) and 1 parameter
def create_quantum_model():
    data_qubits = cirq.GridQubit.rect(1, 1)
    circuit = cirq.Circuit()
    a = sympy.Symbol("a")
    circuit.append(cirq.rx(a).on(data_qubits[0]))
    return circuit, cirq.Z(data_qubits[0])
model_circuit, model_readout = create_quantum_model()
# Build the Keras model.
model = tf.keras.Sequential([
    # The input is the data-circuit, encoded as a tf.string
    tf.keras.layers.Input(shape=(), dtype=tf.string),
    # The PQC layer returns the expected value of the readout gate, range [-1, 1].
    tfq.layers.PQC(model_circuit, model_readout),
])
# used for logging progress during optimization
def hinge_accuracy(y_true, y_pred):
    y_true = tf.squeeze(y_true) > 0.0
    y_pred = tf.squeeze(y_pred) > 0.0
    result = tf.cast(y_true == y_pred, tf.float32)
    return tf.reduce_mean(result)
# compile the model with Hinge loss and Adam, as done in the example. Have tried with various learning_rates
model.compile(
    loss=tf.keras.losses.Hinge(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
    metrics=[hinge_accuracy])
EPOCHS = 20
BATCH_SIZE = 32
NUM_EXAMPLES = 1000
# fit the model
qnn_history = model.fit(
    x_train_tfcirc, y_train,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    verbose=1,
    validation_data=(x_test_tfcirc, y_test),
    use_multiprocessing=False)
results = model.predict(x_test_tfcirc)
results_mapped = [-1 if x<=0 else 1 for x in results[:,0]]
print(np.sum(np.equal(results_mapped, y_test)))
After 20 epochs of optimization, I get the following:
1000/1000 [==============================] - 0s 410us/sample - loss: 0.5589 - hinge_accuracy: 0.6982 - val_loss: 0.5530 - val_hinge_accuracy: 0.7070
This results in 700 samples out of 1000 predicted correctly. When looking at the mapped results, this is because all results are predicted as -1. When looking at the raw results, they linearly increase from -0.5484014 to -0.99996257.
When retrieving the weight with w = model.layers[0].get_weights(), subtracting 0.8, and setting it again with model.layers[0].set_weights(w), I get 920/1000 correct. Fine-tuning this process allows me to achieve 1000/1000.
Update 1:
I have also printed the value of the weight over the epochs:
4.916246, 4.242602, 3.3765688, 2.6855211, 2.3405066, 2.206207, 2.1734586, 2.1656137, 2.1510274, 2.1634471, 2.1683235, 2.188944, 2.1510284, 2.1591303, 2.1632445, 2.1542525, 2.1677444, 2.1702878, 2.163104, 2.1635907
I set the weight to 1.36, a value which gives 908/1000 (as opposed to 700/1000). The optimizer moves away from it:
1.7992111, 2.0727847, 2.1370323, 2.15711, 2.1686404, 2.1603785, 2.183334, 2.1563332, 2.156857, 2.169908, 2.1658351, 2.170673, 2.1575692, 2.1505954, 2.1561477, 2.1754034, 2.1545155, 2.1635509, 2.1464484, 2.1707492
One thing I noticed is that the hinge accuracy was 0.75 with the weight at 1.36, which is higher than the 0.7 at 2.17. If that is the case, I am either in an unlucky part of the optimization landscape, where the maximum of the accuracy does not correspond to the minimum of the loss landscape, or the loss value is computed incorrectly. This is what I will be investigating next.
The minimum of the Hinge loss function for this example does not correspond to the maximum number of correctly classified examples; see the plot of both quantities with respect to the value of the parameter. Given that the optimizer works toward the minimum of the loss, not the maximum of the number of correctly classified examples, the code (and framework/optimizer) does what it is supposed to do. Alternatively, one could use a different loss function to try to find a better fit, for example a binarized L1 loss. This function would have the same global optimum, but would likely have a very flat landscape.
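For reference, the plot mentioned above can be reproduced with a parameter sweep like the following sketch, reusing model, x_test_tfcirc, and y_test from the question (the sweep range and number of steps are arbitrary choices):

import numpy as np
import tensorflow as tf

hinge = tf.keras.losses.Hinge()
sweep = np.linspace(0.0, 2.0 * np.pi, 100, dtype=np.float32)
losses, accuracies = [], []
w = model.layers[0].get_weights()
for value in sweep:
    # overwrite the single PQC parameter and re-evaluate loss and accuracy
    w[0][0] = value
    model.layers[0].set_weights(w)
    preds = model.predict(x_test_tfcirc)[:, 0]
    losses.append(float(hinge(y_test.astype(np.float32), preds)))
    accuracies.append(np.mean(np.where(preds > 0, 1, -1) == y_test))
# losses and accuracies can now be plotted against sweep to see that their
# optima do not coincide for this loss function.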
I ran into a problem when using tf.gradients to compute a gradient.
My x is a tf.constant() holding a vector v of shape (4, 1),
and my y is the sigmoid of v, also of shape (4, 1), so the gradient of y with respect to x should be a diagonal matrix of shape (4, 4).
My code:
c = tf.constant(sigmoid(x_0 @ w_0))
d = tf.constant(x_0 @ w_0)
Omega = tf.gradients(c, d)
_Omega = sess.run(Omega)
The error is:
Fetch argument None has invalid type <class 'NoneType'>.
In addition, I think using tf.gradients might be the wrong approach; there may be other functions that can compute this.
My question:
point out where I am wrong and how to fix it, either using tf.gradients
or using another function.
Edit:
I want to compute the derivative as in the vector-by-vector section of https://en.wikipedia.org/wiki/Matrix_calculus#Vector-by-vector,
and the result Omega would look like the following:
[[s1(1-s1) 0 0 0 ]
[0 s2(1-s2) 0 0 ]
[0 0 s3(1-s3) 0 ]
[0 0 0 s4(1-s4)]]
where si = sigmoid(x_0i @ w_0), and x_0i is the ith row of x_0.
In general, the derivative of one vector with respect to another vector should be a matrix.
First of all, you can't calculate gradients for constants; you'll get None for the gradient op, which is the reason for your error. One way to calculate gradients is through the TF graph (see the code below); another is to use tf.GradientTape in eager execution mode:
import tensorflow as tf
import numpy as np
arr = np.random.rand(4, 1)
ip = tf.Variable(initial_value=arr)
sess = tf.Session()
c_var = tf.math.sigmoid(ip)
Omega = tf.gradients(c_var, ip)
sess.run(tf.global_variables_initializer())
_Omega = sess.run(Omega)
print(_Omega)
Now you can pass a vector of any size. Still, I am not sure how you will get a (4, 4) diagonal matrix from this, since tf.gradients sums over the outputs and returns a tensor with the same shape as the input.
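If the full (4, 4) Jacobian is what you are after, a sketch using tf.GradientTape.jacobian in eager mode (TF 2.x assumed) would be:

import numpy as np
import tensorflow as tf

x = tf.Variable(np.random.rand(4, 1))    # the input vector
with tf.GradientTape() as tape:
    y = tf.math.sigmoid(x)
jac = tape.jacobian(y, x)                # shape (4, 1, 4, 1)
jac = tf.reshape(jac, (4, 4))            # diagonal matrix with entries s_i * (1 - s_i)
print(jac.numpy())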
I am training a neural network for classification. In the context of my research, I would like to zero out the k highest losses in each minibatch. I could not figure out a simple way to perform this without relying on numpy at some level.
I have tried the following procedure:
1. Compute the argsort indices of the losses tensor -- it returns a tf Tensor
2. Slice the losses tensor with that indices array
The issue is that the slicing cannot be performed using a tf Tensor.
# losses is a tf.Tensor
ind_sorted = tf.argsort(losses)
losses_sorted = losses[ind_sorted]  # Error mentioned above
# The issue is that ind_sorted depends on the output of the neural network.
# I couldn't find an equivalent of the detach method in PyTorch.
k_smallest_losses = losses_sorted[:k]  # Keeping only the k smallest losses
loss = tf.reduce_sum(k_smallest_losses)  # Summing the k smallest losses
You probably want to use tf.nn.top_k, which returns both the values and the indices of the top k items. (Note: to get the smallest losses, I negate the losses and flip the sign back when done.)
batch = 2
max_len = 6
losses = tf.random.uniform(shape=[batch, max_len], minval=0, maxval=2, dtype=tf.float32)
bottom_losses_values, bottom_losses_indices = tf.nn.top_k(-losses, k=3)
total = tf.reduce_sum(-bottom_losses_values, axis=-1)
with tf.Session() as sess:
    losses, bottom_losses_values, bottom_losses_indices, total = sess.run(
        [losses, bottom_losses_values, bottom_losses_indices, total])
    print('original losses\n', losses)
    print('bottom 3 loss values\n', -bottom_losses_values)
    print('bottom 3 loss indices\n', bottom_losses_indices)
    print('total\n', total)
Results:
original losses
[[ 1.45301318 1.65069246 1.31003475 1.71488905 1.71400714 0.0543921 ]
[ 0.09954047 0.12081003 0.24793792 1.51561213 1.73758292 1.43859148]]
bottom 3 loss values
[[ 0.0543921 1.31003475 1.45301318]
[ 0.09954047 0.12081003 0.24793792]]
bottom 3 loss indices
[[5 2 0]
[0 1 2]]
total
[ 2.81744003 0.46828842]
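To connect this back to the original goal of zeroing out the k highest losses, here is a hedged sketch along the same lines, assuming losses is a 1-D tensor of per-example losses (the helper name is mine):

import tensorflow as tf

def sum_without_top_k(losses, k):
    # Keep the n - k smallest losses and sum them; gradients only flow
    # through the kept entries, so the dropped losses are effectively detached.
    n = tf.shape(losses)[-1]
    bottom_values, _ = tf.nn.top_k(-losses, k=n - k)
    return tf.reduce_sum(-bottom_values)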
Thanks for your help tensorflow community!
I have a question regarding understanding and visualizing the output of the estimator's evaluate function.
I have a DNNClassifier and have trained it on data with 10 output ranges predictions can go into.
After training and running
accuracy = classifier.evaluate(input_fn = test_input_fn)['accuracy']
I see my accuracy as 33.8%. Who knows how good that is (probably not good).
How can I see the output of each of the comparisons?
As the test data is run, I would like to see what the estimate is and what the actual value is, basically a side-by-side of y and y',
something like: [0 0 0 0 0 0 0 0 0 1] vs [0 0 0 0 0 0 0 0 1 0] 'false',
rather than just the aggregated overall accuracy.
Thanks!
So, in the event that someone reads the question above and understands what I was trying to do (view the output of the predictions), I have a solution.
The solution is to utilize the .predict() method.
A good example is here:
https://www.tensorflow.org/get_started/estimator#classify_new_samples
My code ended up looking like:
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(predict_set.data)},
    num_epochs=1,
    shuffle=False)

predictions = list(classifier.predict(input_fn=predict_input_fn))

print("\n Predictions:")
print(len(predictions))
for p in predictions:
    print(int(p['classes'][0]))
This outputs the predictions in a column that I can copy/paste into a spreadsheet program to examine my data.
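And if you want the side-by-side of y and y' directly, a small sketch (assuming predict_set.target holds the true labels, mirroring the naming above):

# Compare each prediction against its true label, one row at a time
for p, actual in zip(predictions, predict_set.target):
    predicted = int(p['classes'][0])
    print(predicted, "vs", int(actual), "correct" if predicted == int(actual) else "false")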