How to train a model with two kinds of loss functions for object detection? - tensorflow

I'm trying to implement the model described by Professor Andrew Ng for object detection (explanation starts at 10:00).
He describes the first element of the output vector as the probability that an object was detected, followed by the coordinates of the bounding box of the object matched (when one is matched). The last part of the output vector is a softmax of all the classes your model knows.
As he explains it, a simple squared error is fine: when there is a detection, the squared differences of all elements are used, and when there is none, just the squared difference (y^[0] - y[0])^2. I get that this is a naive approach; I just want to implement it for the learning experience.
My questions:
How do I implement this conditional loss in tensorflow?
How do I handle this conditional about y^[0] when dealing with a batch?

How do I implement this conditional loss in tensorflow?
You can convert the loss function to:
Error = mask[0]*(y^[0]-y[0])**2 + mask[1]*(y^[1]-y[1])**2 + ... + mask[n]*(y^[n]-y[n])**2,
where mask = [1, 1, ..., 1] when y[0] = 1 and mask = [1, 0, ..., 0] when y[0] = 0.
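For reference, the case-by-case loss from the question and the masked form can be written side by side (same indexing: y[0] is the detection flag, n the last output index). With mask[0] = 1 and mask[i] = y[0] for i >= 1, the single masked sum reduces to each case:

```latex
\mathrm{Error}(\hat{y}, y) =
\begin{cases}
\sum_{i=0}^{n} (\hat{y}_i - y_i)^2, & y_0 = 1 \\
(\hat{y}_0 - y_0)^2, & y_0 = 0
\end{cases}
= \sum_{i=0}^{n} \mathrm{mask}_i \, (\hat{y}_i - y_i)^2,
\qquad
\mathrm{mask}_i =
\begin{cases}
1, & i = 0 \\
y_0, & i \ge 1
\end{cases}
```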
How do I handle this conditional about y^[0] when dealing with a batch?
For a batch, you can construct the mask on the fly by broadcasting each row's y[0] across the remaining columns:
mask = tf.concat([tf.ones_like(y[:, :1]), y[:, :1] * tf.ones_like(y[:, 1:])], axis=1)
Code:
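import numpy as np
# TF1-style graph code; on TensorFlow 2.x it runs through the compat.v1 API
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

y_hat_n = np.array([[3, 3, 3, 3], [3, 3, 3, 3]])
y_1 = np.array([[1, 1, 1, 1], [1, 1, 1, 1]])  # object present: y[0] = 1
y_0 = np.array([[0, 1, 1, 1], [0, 1, 1, 1]])  # no object: y[0] = 0
y = tf.placeholder(tf.float32, [None, 4])
y_hat = tf.placeholder(tf.float32, [None, 4])
# column 0 always counts; columns 1: count only where y[:, 0] == 1
mask = tf.concat([tf.ones_like(y[:, :1]), y[:, :1] * tf.ones_like(y[:, 1:])], axis=1)
error = tf.losses.mean_squared_error(mask * y, mask * y_hat)
with tf.Session() as sess:
    print(sess.run([mask, error], {y: y_0, y_hat: y_hat_n}))
    print(sess.run([mask, error], {y: y_1, y_hat: y_hat_n}))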
# Mask and error
#[array([[1., 0., 0., 0.],
# [1., 0., 0., 0.]], dtype=float32), 2.25]
#[array([[1., 1., 1., 1.],
# [1., 1., 1., 1.]], dtype=float32), 4.0]
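The same masked loss can also be written as a plain function for TensorFlow 2.x eager mode, which Keras can take as a custom loss. This is a minimal sketch assuming the output layout from the question ([p_c, box coordinates..., class scores...]) and plain squared error for every element; `masked_detection_loss` is a hypothetical name. The last check uses fractional box values, where the mask must still be exactly 1 for every column when y[0] = 1:

```python
import numpy as np
import tensorflow as tf


def masked_detection_loss(y_true, y_pred):
    """Squared error on y[0] always, and on y[1:] only where y[0] == 1.

    Hypothetical helper; assumes column 0 is the detection flag.
    """
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)
    # 1 for column 0; y_true[:, 0] broadcast over the remaining columns
    mask = tf.concat([tf.ones_like(y_true[:, :1]),
                      y_true[:, :1] * tf.ones_like(y_true[:, 1:])], axis=1)
    # mean over all elements, like tf.losses.mean_squared_error above
    return tf.reduce_mean(tf.square(mask * (y_true - y_pred)))


y_hat_n = np.array([[3, 3, 3, 3], [3, 3, 3, 3]])
y_0 = np.array([[0, 1, 1, 1], [0, 1, 1, 1]])
y_1 = np.array([[1, 1, 1, 1], [1, 1, 1, 1]])
print(masked_detection_loss(y_0, y_hat_n).numpy())  # 2.25
print(masked_detection_loss(y_1, y_hat_n).numpy())  # 4.0
print(masked_detection_loss([[1, 0.5, 0.0, 0.2]], [[1, 0.5, 1.0, 0.2]]).numpy())  # 0.25
```

It can be passed directly as `model.compile(optimizer=..., loss=masked_detection_loss)`; Keras calls it with `(y_true, y_pred)` for each batch.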

Related

How does the output layer of this network, which has 10 nodes, correspond to an integer?

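from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout

# X_train, y_train, X_test, y_test, batch_size, epochs, checkpointer and
# early_stopping are defined elsewhere in the notebook
ffnn = Sequential([
    Flatten(input_shape=X_train.shape[1:]),
    Dense(512, activation='relu'),
    Dropout(0.2),
    Dense(512, activation='relu'),
    Dropout(0.2),
    Dense(10, activation='softmax')
])
# the ffnn.compile(...) call is not part of this excerpt
ffnn_history = ffnn.fit(X_train,
                        y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_split=0.2,
                        callbacks=[checkpointer, early_stopping],
                        verbose=1,
                        shuffle=True)
ffnn_accuracy = ffnn.evaluate(X_test, y_test, verbose=0)[1]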
This code is from https://github.com/stefan-jansen/machine-learning-for-trading/blob/main/18_convolutional_neural_nets/02_digit_classification_with_lenet5.ipynb.
I understand this network and how the softmax function works. My question is: the output layer has 10 nodes, so the output should be a vector of length 10 (summing to 1). How does it match the label y, which is an integer, during training and evaluation (shouldn't the output vector be transformed into the corresponding integer first)?
Does tensorflow automatically interpret the length-10 output vector to the corresponding integer or what?
In your case, the integer labels are handled by the loss function sparse_categorical_crossentropy(), which treats each label as the index of the correct class (equivalent to one-hot encoding it):
>>> y_true = [1, 2]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred).numpy()
array([0.05129344, 2.3025851 ], dtype=float32)
The output softmax(x) can be interpreted as a probability distribution (Σ softmax(x) = 1.0), so argmax(softmax(x)) returns the index of the most probable class.
Hence, the target for your network is effectively a 10-dimensional vector in which each integer 0, 1, ..., 9 corresponds to one node of the softmax output.
With that being said, the target vector you're trying to predict is simply going to be one-hot encoded:
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0] # == 0
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0] # == 1
..
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1] # == 9
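A quick check of this equivalence, reusing the three-class example above and assuming TF 2.x: integer labels passed to sparse_categorical_crossentropy give the same values as their one-hot encoding (built with tf.one_hot) passed to categorical_crossentropy, and argmax turns the outputs back into integers. The second sample is predicted as class 1 although its label is 2, which is why its loss is large:

```python
import numpy as np
import tensorflow as tf

# integer class labels, as used with sparse_categorical_crossentropy
y_int = tf.constant([1, 2])
# softmax-style outputs for 3 classes; each row sums to 1
y_pred = tf.constant([[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]])

sparse = tf.keras.losses.sparse_categorical_crossentropy(y_int, y_pred)
# the same labels, explicitly one-hot encoded
dense = tf.keras.losses.categorical_crossentropy(tf.one_hot(y_int, depth=3), y_pred)
# decode the outputs back to integer class ids
predicted_ids = np.argmax(y_pred.numpy(), axis=1)

print(sparse.numpy(), dense.numpy())  # both approximately [0.0513 2.3026]
print(predicted_ids)                  # [1 1]
```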
In other words: if you feed a batch of n images to your network, the output has shape (n, num_classes) (here num_classes is 10), and it is up to you to interpret it, e.g. by using np.argmax to get your final predictions.
import numpy as np
# model is the trained network (e.g. ffnn above), images a batch of inputs
predictions = model(images)  # shape (n, 10)
predicted_ids = np.argmax(predictions, axis=1)
# each index is the predicted integer (digit)
print(predicted_ids)
Also, note the following example:
>>> tf.one_hot([1, 2, 9], depth=10)
<tf.Tensor: shape=(3, 10), dtype=float32, numpy=
array([[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]], dtype=float32)>

keras custom metrics for multi-label classification without all()

I'm using sigmoid and binary_crossentropy for multi-label classification. A very similar question was asked here, and the following custom metric was suggested:
from keras import backend as K

def full_multi_label_metric(y_true, y_pred):
    comp = K.equal(y_true, K.round(y_pred))
    return K.cast(K.all(comp, axis=-1), K.floatx())
But I do not want to use all(), because for a single sample with a true label of [1, 0, 0, 1, 1] and a predicted label of [0, 0, 0, 1, 1], I do not consider the prediction accuracy to be zero (the labels for the last four classes were predicted correctly).
Here is my model:
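from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense
from keras.optimizers import SGD

# expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential()
model.add(Masking(mask_value=-9999, input_shape=(197, 203)))
model.add(LSTM(512, return_sequences=True))
model.add(Dense(20, activation='sigmoid'))
# lr/decay are the older Keras argument names (newer releases use learning_rate=)
model.compile(loss='binary_crossentropy',
              optimizer=SGD(lr=1e-3, decay=1e-4, momentum=0.9, nesterov=True),
              metrics=['accuracy'])
model.summary()  # summary() prints the table itself and returns None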
Here is my y_pred for one example:
pred = model.predict(X_test)
y_pred = pred[0,196,:]
y_pred
array([2.6081860e-01, 9.9079555e-01, 1.4816311e-01, 8.6009043e-01,
2.6759505e-04, 3.0792636e-01, 2.6738405e-02, 8.5339689e-01,
5.1105350e-02, 1.5427300e-01, 6.7039116e-05, 1.7909735e-02,
6.4140558e-04, 3.5133284e-01, 5.3054303e-02, 1.2765944e-01,
2.9298663e-04, 6.3041472e-01, 5.8620870e-03, 5.9656668e-01],
dtype=float32)
Here is my y_true for one example:
y_true = Y_test[0,0,:]
y_true
array([1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 0., 0.,
0., 0., 1.])
My question is: how can I write a Keras custom metric function that compares each element of y_pred with the corresponding element of y_true and reports an accuracy measure during training? I want to use this metric as metrics = [X].
Unless I'm mistaken, the built-in binary_crossentropy metric/loss already does what you need. Taking your example:
import tensorflow as tf
from tensorflow import keras
y_true = tf.constant([[1, 0, 0, 1, 1]], dtype=tf.int32)
y_pred = tf.constant([[0.6, 0, 0, 1, 1]], dtype=tf.float32)
m = keras.metrics.binary_crossentropy(y_true, y_pred)
m.numpy()
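The output is approximately [-log(0.6) / 5], i.e. about [0.102].
That is, the metric/loss takes every one of the model's 20 outputs into account: binary_crossentropy averages the per-label losses over the last axis (the 20 sigmoid units, one per label) instead of requiring all labels to be correct.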
As a metric, it is much more common to use binary_accuracy.
Example:
y_true = tf.constant([[1, 0, 0, 1, 1]], dtype=tf.int32)
y_pred = tf.constant([[0.1, 0, 0, 1, 1]], dtype=tf.float32)
keras.metrics.binary_accuracy(tf.cast(y_true, tf.float32), y_pred)
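If you still want your own metric, the all() version only needs all() replaced by a mean over the label axis. A minimal sketch assuming TF 2.x; `elementwise_accuracy` is a hypothetical name, and it thresholds at 0.5 like binary_accuracy, so on this example both return 0.8 (4 of 5 labels correct):

```python
import numpy as np
import tensorflow as tf


def elementwise_accuracy(y_true, y_pred, threshold=0.5):
    """Per-sample fraction of labels predicted correctly (mean instead of all())."""
    y_true = tf.cast(y_true, y_pred.dtype)
    # binarize predictions at the threshold, as binary_accuracy does
    y_hat = tf.cast(y_pred > threshold, y_pred.dtype)
    matches = tf.cast(tf.equal(y_true, y_hat), y_pred.dtype)
    # averaging over the label axis gives partial credit instead of 0/1
    return tf.reduce_mean(matches, axis=-1)


y_true = tf.constant([[1, 0, 0, 1, 1]])
y_pred = tf.constant([[0.1, 0.0, 0.0, 1.0, 1.0]])
print(elementwise_accuracy(y_true, y_pred).numpy())  # [0.8]
```

It is used like any other metric, e.g. `model.compile(loss='binary_crossentropy', optimizer=..., metrics=[elementwise_accuracy])`.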
You can get a better picture of the model's performance with the ROC AUC metric (https://www.tensorflow.org/api_docs/python/tf/keras/metrics/AUC), which evaluates the model over a range of decision thresholds; see an explanation at https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5. Personally, I tend to use an accuracy metric while training and look at the precision/recall curve after the model is trained, in order to check that it behaves as expected and to select the prediction threshold.

How to shift a tensor like pandas.shift in tensorflow / keras? (Without shifting the last row to the first row, as tf.roll does)

I want to shift a tensor in a given axis. It's easy to do this in pandas or numpy. Like this:
import numpy as np
import pandas as pd
data = np.arange(0, 6).reshape(-1, 2)
pd.DataFrame(data).shift(1).fillna(0).values
Output is:
array([[0., 0.],
[0., 1.],
[2., 3.]])
But in tensorflow, the closest solution I found is tf.roll, and it shifts the last row to the first row (I don't want that). So I have to use something like
tf.roll + tf.slice(remove the last row) + tf.concat(add tf.zeros to the first row).
It's really ugly.
Is there a better way to handle shift in tensorflow or keras?
Thanks.
I think I found a better way to handle this problem.
We can use tf.roll, then tf.math.multiply with a mask of zeros and ones to set the first row (the one that wrapped around) to zeros.
Sample code is as follows:
Original tensor:
import tensorflow as tf
A = tf.cast(tf.reshape(tf.range(27), (-1, 3, 3)), dtype=tf.float32)
A
Output:
<tf.Tensor: id=117, shape=(3, 3, 3), dtype=float32, numpy=
array([[[ 0., 1., 2.],
[ 3., 4., 5.],
[ 6., 7., 8.]],
[[ 9., 10., 11.],
[12., 13., 14.],
[15., 16., 17.]],
[[18., 19., 20.],
[21., 22., 23.],
[24., 25., 26.]]], dtype=float32)>
Shift (like pd.shift):
B = tf.concat((tf.zeros((1, 3)), tf.ones((2, 3))), axis=0)
C = tf.expand_dims(B, axis=0)
tf.math.multiply(tf.roll(A, 1, axis=1), C)
Output:
<tf.Tensor: id=128, shape=(3, 3, 3), dtype=float32, numpy=
array([[[ 0., 0., 0.],
[ 0., 1., 2.],
[ 3., 4., 5.]],
[[ 0., 0., 0.],
[ 9., 10., 11.],
[12., 13., 14.]],
[[ 0., 0., 0.],
[18., 19., 20.],
[21., 22., 23.]]], dtype=float32)>
Try this:
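import tensorflow as tf
x = tf.constant([[0, 1, 3], [4, 5, 6], [7, 8, 9]])
# slicing drops elements instead of padding, so each result is smaller than x
shifted_0dim = x[1:]     # drops the first row -> shape (2, 3)
shifted_1dim = x[:, 1:]  # drops the first column -> shape (3, 2)
shifted2 = x[2:]         # drops the first two rows -> shape (1, 3)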
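Slicing alone shortens the tensor. To keep the original shape, as pd.DataFrame(...).shift(n).fillna(0) does, one option (not from the answers here) is to pad with a fill value on one side using tf.pad and then slice back to the original length. A minimal sketch for axis 0, assuming TF 2.x eager mode; `shift_rows` is a hypothetical helper name:

```python
import tensorflow as tf


def shift_rows(x, shift=1, fill_value=0):
    """Shift along axis 0 like pandas shift(shift).fillna(fill_value).

    Hypothetical helper: pads abs(shift) rows of fill_value on one side,
    then slices back to the original number of rows.
    """
    n = tf.shape(x)[0]
    k = abs(shift)
    rest = [[0, 0]] * (x.shape.rank - 1)  # no padding on the other axes
    if shift >= 0:
        # fill rows at the top, keep the first n rows
        return tf.pad(x, [[k, 0]] + rest, constant_values=fill_value)[:n]
    # fill rows at the bottom, drop the first k rows
    return tf.pad(x, [[0, k]] + rest, constant_values=fill_value)[k:]


x = tf.reshape(tf.range(6), (3, 2))
print(shift_rows(x, 1).numpy())   # same values as the pandas output in the question
print(shift_rows(x, -1).numpy())
```

Unlike the roll-and-mask approach, nothing wraps around, so no mask is needed and negative shifts work the same way.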
Generalizing the accepted answer to arbitrary tensor shapes, desired shift, and axis to shift:
import tensorflow as tf

def tf_shift(tensor, shift=1, axis=0):
    dim = len(tensor.shape)
    if axis >= dim:
        raise ValueError(
            f'Value of axis ({axis}) must be < number of tensor axes ({dim})'
        )
    # the mask covers the dimensions from `axis` onward
    mask_dim = dim - axis
    mask_shape = tensor.shape[-mask_dim:]
    zero_dim = min(shift, mask_shape[0])
    # zeros for the first `shift` positions along `axis`, ones elsewhere
    mask = tf.concat(
        [tf.zeros(tf.TensorShape(zero_dim) + mask_shape[1:], dtype=tensor.dtype),
         tf.ones(tf.TensorShape(mask_shape[0] - zero_dim) + mask_shape[1:], dtype=tensor.dtype)],
        axis=0
    )
    # add leading axes so the mask broadcasts against the tensor
    for i in range(dim - mask_dim):
        mask = tf.expand_dims(mask, axis=0)
    return tf.multiply(
        tf.roll(tensor, shift, axis),
        mask
    )
EDIT:
The code above doesn't allow negative shift values and is pretty slow. Here is a more efficient version that uses tf.slice and tf.concat, without creating a mask and multiplying the tensor of interest by it.
import tensorflow as tf

def tf_shift(values: tf.Tensor, shift: int = 1, axis: int = 0):
    # zeros with the shape of `values`, except abs(shift) along `axis`
    pad = tf.zeros([val if i != axis else abs(shift) for i, val in enumerate(values.shape)],
                   dtype=values.dtype)
    # slice size: everything, except (length - abs(shift)) along `axis`
    size = [-1 if i != axis else val - abs(shift) for i, val in enumerate(values.shape)]
    if shift > 0:
        # keep the leading part and prepend the zeros
        shifted = tf.concat(
            [pad, tf.slice(values, [0] * len(values.shape), size)],
            axis=axis
        )
    elif shift < 0:
        # keep the trailing part and append the zeros
        shifted = tf.concat(
            [tf.slice(values, [0 if i != axis else abs(shift) for i, _ in enumerate(values.shape)], size), pad],
            axis=axis
        )
    else:
        shifted = values
    return shifted
Assuming a 2-D tensor, this function should mimic DataFrame.shift (including negative periods):
import tensorflow as tf

def shift_tensor(tensor, periods, fill_value):
    num_row = len(tensor)
    num_col = len(tensor[0])
    # fill_value must have the same dtype as tensor
    pad = tf.fill([abs(periods), num_col], fill_value)
    if periods > 0:
        shifted_tensor = tf.concat((pad, tensor[:(num_row - periods), :]), axis=0)
    else:
        # negative periods: drop the first rows and pad at the end
        shifted_tensor = tf.concat((tensor[-periods:, :], pad), axis=0)
    return shifted_tensor

How do I create a binary (0 or 1 valued) tensor according to known index in tensorflow?

input: length(placeholder), index(1D tensor)
output: 0-1 1D tensor
example: length 5, index [0,1,3], output tensor should be [1,1,0,1,0]
I have tried scatter_add, which requires a Variable (and therefore a known shape), and embedding_lookup on a [length, length] matrix, which is inefficient when length is large.
Any ideas?
Try tf.sparse_to_dense:
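# TF1 graph API; also runs on TensorFlow 2.x through compat.v1
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

output_size = tf.placeholder(tf.int32, [1])
# by default (validate_indices=True) indices must be sorted and unique
index = tf.constant([0, 1, 3])
ones = tf.ones([tf.size(index)])
result = tf.sparse_to_dense(index, output_size, ones)
with tf.Session() as sess:
    sess.run(result, feed_dict={output_size: [5]})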
Outputs: array([ 1., 1., 0., 1., 0.], dtype=float32)
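In TensorFlow 2.x, tf.sparse_to_dense is only available as tf.compat.v1.sparse_to_dense. A sketch of the same result with tf.scatter_nd, which needs no Variable and accepts a length only known at run time; `indices_to_binary` is a hypothetical name. Because scatter_nd sums duplicate indices, the result is clipped back to 0/1, so unsorted or repeated indices also work:

```python
import tensorflow as tf


def indices_to_binary(index, length):
    """1-D int32 tensor of size `length` with ones at the given positions."""
    index = tf.convert_to_tensor(index, dtype=tf.int32)
    # scatter_nd expects indices of shape (k, 1) for a 1-D output
    ones = tf.ones_like(index)
    dense = tf.scatter_nd(tf.expand_dims(index, 1), ones, shape=[length])
    # duplicate indices are summed, so clip back to 0/1
    return tf.minimum(dense, 1)


print(indices_to_binary([0, 1, 3], 5).numpy())  # [1 1 0 1 0]
```

Wrap the result in `tf.cast(..., tf.float32)` if a float tensor is needed, as in the sparse_to_dense output.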

how does tensorflow indexing work

I'm having trouble understanding a basic concept with tensorflow. How does indexing work for tensor read/write operations? In order to make this specific, how can the following numpy examples be translated to tensorflow (using tensors for the arrays, indices and values being assigned):
import numpy as np
x = np.zeros((3, 4))
row_indices = np.array([1, 1, 2])
col_indices = np.array([0, 2, 3])
x[row_indices, col_indices] = 2
x
with output:
array([[ 0., 0., 0., 0.],
[ 2., 0., 2., 0.],
[ 0., 0., 0., 2.]])
... and ...
x[row_indices, col_indices] = np.array([5, 4, 3])
x
with output:
array([[ 0., 0., 0., 0.],
[ 5., 0., 4., 0.],
[ 0., 0., 0., 3.]])
... and finally ...
y = x[row_indices, col_indices]
y
with output:
array([ 5., 4., 3.])
There's GitHub issue #206 to support this nicely; meanwhile, you have to resort to verbose workarounds.
The first example can be done with tf.select (renamed tf.where in TensorFlow 1.0), which combines two same-shaped tensors by selecting each element from one or the other:
# originally written for pre-1.0 TensorFlow; updated to the compat.v1 names
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

tf.reset_default_graph()
row_indices = tf.constant([1, 1, 2])
col_indices = tf.constant([0, 2, 3])
x = tf.zeros((3, 4))
sess = tf.InteractiveSession()
# get list of ((row1, col1), (row2, col2), ..); tf.pack was renamed tf.stack
coords = tf.transpose(tf.stack([row_indices, col_indices]))
# get tensor with 1's at positions (row1, col1),...
binary_mask = tf.sparse_to_dense(coords, x.get_shape(), 1)
# convert 1/0 to True/False
binary_mask = tf.cast(binary_mask, tf.bool)
twos = 2*tf.ones(x.get_shape())
# make new x out of old values or 2, depending on mask (tf.select is now tf.where)
x = tf.where(binary_mask, twos, x)
print(x.eval())
gives
[[ 0. 0. 0. 0.]
[ 2. 0. 2. 0.]
[ 0. 0. 0. 2.]]
The second one could be done with scatter_update, except that scatter_update only supports linear indices and only works on variables. So you could create a temporary variable and use reshaping like this (to avoid variables, you could use dynamic_stitch; see the end):
# get linear indices (row-major: row * num_cols + col)
linear_indices = row_indices*x.get_shape().as_list()[1]+col_indices
# turn 'x' into 1d variable since "scatter_update" supports linear indexing only
x_flat = tf.Variable(tf.reshape(x, [-1]))
# no automatic promotion, so make updates float32 to match x
updates = tf.constant([5, 4, 3], dtype=tf.float32)
# formerly tf.initialize_all_variables()
sess.run(tf.global_variables_initializer())
sess.run(tf.scatter_update(x_flat, linear_indices, updates))
# convert back into original shape
x = tf.reshape(x_flat, x.get_shape())
print(x.eval())
gives
[[ 0. 0. 0. 0.]
[ 5. 0. 4. 0.]
[ 0. 0. 0. 3.]]
Finally, the third example is already supported by gather_nd; you write
print(tf.gather_nd(x, coords).eval())
To get
[ 5. 4. 3.]
Edit, May 6
The update x[rows, cols] = newvals can be done without using Variables (which occupy memory between session run calls), either by using select (tf.where) with a sparse_to_dense call that takes a vector of sparse values, or by relying on dynamic_stitch:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

sess = tf.InteractiveSession()
x = tf.zeros((3, 4))
row_indices = tf.constant([1, 1, 2])
col_indices = tf.constant([0, 2, 3])
# no automatic promotion, so specify float type
replacement_vals = tf.constant([5, 4, 3], dtype=tf.float32)
# convert to linear indexing in row-major form
linear_indices = row_indices*x.get_shape().as_list()[1]+col_indices
x_flat = tf.reshape(x, [-1])
# use dynamic stitch, it merges the array by taking value either
# from array1[index1] or array2[index2], if indices conflict,
# the later one is used
unchanged_indices = tf.range(tf.size(x_flat))
changed_indices = linear_indices
x_flat = tf.dynamic_stitch([unchanged_indices, changed_indices],
                           [x_flat, replacement_vals])
x = tf.reshape(x_flat, x.get_shape())
print(x.eval())
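In TensorFlow 2.x, these workarounds are no longer necessary: tf.tensor_scatter_nd_update returns a new tensor with the listed (row, col) positions replaced, and tf.gather_nd reads them back. A minimal eager-mode sketch of the three numpy examples from the question, assuming TF 2.x:

```python
import tensorflow as tf

x = tf.zeros((3, 4))
row_indices = tf.constant([1, 1, 2])
col_indices = tf.constant([0, 2, 3])
# one (row, col) pair per update, shape (3, 2)
coords = tf.stack([row_indices, col_indices], axis=1)

# x[row_indices, col_indices] = 2
x_twos = tf.tensor_scatter_nd_update(x, coords, tf.fill([3], 2.0))
# x[row_indices, col_indices] = [5, 4, 3]
x_new = tf.tensor_scatter_nd_update(x, coords, tf.constant([5.0, 4.0, 3.0]))
# y = x[row_indices, col_indices]
y = tf.gather_nd(x_new, coords)

print(x_new.numpy())
print(y.numpy())  # [5. 4. 3.]
```

Unlike the numpy assignment, x itself is not modified; each call produces a new tensor.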