Understanding keras.backend.max usage with tf.random_normal - tensorflow

import numpy as np
import tensorflow as tf
from keras import backend as K
sess = tf.InteractiveSession()
box_scores1 = tf.constant([[[ 9.188682, 11.484599 ],
                            [10.06533,   7.557296 ]],
                           [[10.099248, 10.591225 ],
                            [10.592823,  7.8770704]]])
box_scores2 = tf.random_normal([2,2,2], mean=10, stddev=1, dtype=tf.float32, seed=1)
box_class_scores1 = K.max(box_scores1, axis=-1)
box_class_scores2 = K.max(box_scores2, axis=-1)
print(box_scores1.eval())
print(box_scores2.eval())
print(box_class_scores1.eval())
print(box_class_scores2.eval())
Output:
[[[ 9.188682 11.484599 ]
[10.06533 7.557296 ]]
[[10.099248 10.591225 ]
[10.592823 7.8770704]]]
[[[ 9.188682 11.484599 ]
[10.06533 7.557296 ]]
[[10.099248 10.591225 ]
[10.592823 7.8770704]]]
[[11.484599 10.06533 ]
[10.591225 10.592823]]
[[10.242094 10.515779]
[12.083789 11.397354]]
As we can see, the values in box_scores1 and box_scores2 are the same, but the results obtained after applying the max operation differ. How can the values of box_class_scores1 and box_class_scores2 be different?

Your problem has nothing to do with the max function, but with a misunderstanding of TensorFlow: most of its operations are symbolic, so tf.random_normal does not produce random numbers, but a symbolic normal distribution with the given mean and standard deviation.
Each time you evaluate this tensor, it generates a fresh sample. Your first eval therefore looks fine, but the second eval draws a different sample, and that sample is what max is applied to, so the result differs from applying max to a constant tensor.
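A minimal sketch of how to get consistent results, reusing box_scores2 and box_class_scores2 from above: fetch the random tensor and its max in a single sess.run call, so both values come from one draw.
scores_val, max_val = sess.run([box_scores2, box_class_scores2])  # one random draw for both fetches
print(scores_val)
print(max_val)  # now equal to np.max(scores_val, axis=-1)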

Related

Simple element-wise multiplication with Keras over TF

I am trying to implement the following in TensorFlow:
Input * const, an element-wise multiplication over tensors of shape 6x640x800
Here is the code:
ssValues = np.zeros(shape=(6,640,800), dtype=np.float16)
inputPlaceHolder = tf.compat.v1.placeholder(shape=(6,640,800), name='InputTensor', dtype=tf.dtypes.float16)
inputLayer = tf.keras.Input(shape=(6,640,800,),
                            batch_size=1,
                            name='inputLayer',
                            dtype=tf.dtypes.float16,
                            tensor=inputPlaceHolder)
ssConstant = tf.constant(ssValues, dtype=tf.dtypes.float16, shape=(6,640,800), name='ss')
ssm = tf.keras.layers.Multiply()([inputPlaceHolder, inputPlaceHolder])
model = tf.keras.models.Model(inputs=inputLayer, outputs=ssm)
input = np.zeros(shape=(6,640,800), dtype=np.float16)
output = model.predict(input)
I get the following error:
ValueError: ('Error when checking model input: expected no data, but got:', array([[[1., 1., 1., ..., 1., 1., 1.],
How can I overcome this error and run the predict function?
Why doesn't tf.keras.layers.multiply return a Layer object?
Your issue comes from the fact that you declared your operation on a v1 placeholder, when it should simply use the inputLayer (which already acts as a placeholder for inputs following the provided specification).
Additionally, you wrote a multiplication that returns $x \times x$, when I think you wanted $x \times constant$; so here is the corrected code:
inputLayer = tf.keras.Input(shape=(6,640,800,),
                            batch_size=1,
                            name='inputLayer',
                            dtype=tf.dtypes.float16)
ssConstant = tf.constant(  # also fixed a shape issue here
    ssValues, dtype=tf.dtypes.float16, shape=(1,6,640,800), name='ss'
)
ssm = tf.keras.layers.Multiply(dtype=tf.dtypes.float16)([inputLayer, ssConstant])
model = tf.keras.models.Model(inputs=inputLayer, outputs=ssm)
inputs = np.zeros(shape=(1,6,640,800), dtype=np.float16)
output = model.predict(inputs)
Furthermore, since this is not an actual model, in the sense that it uses a constant and no learnable weights, you might want to use tf.keras.backend.function instead of tf.keras.Model (but that is really up to you).
Note that the shapes are probably not suited to what you actually want, given the batch size of 1... Please consider using a batch size of 6 to remove the useless dimension.
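As a usage sketch of that tf.keras.backend.function alternative (assuming the inputLayer and ssm tensors from the corrected code above are in scope):
multiply_fn = tf.keras.backend.function(inputs=[inputLayer], outputs=[ssm])
result = multiply_fn([np.zeros(shape=(1,6,640,800), dtype=np.float16)])[0]  # same product as model.predict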
When you use Input(shape), you already have a placeholder. It doesn't make sense to create a placeholder and then pass it to Input(tensor=placeholder), because this is not how Keras works.
You must:
inputs = Input(shape=(6,640,800))
ssm = Multiply()([inputs, inputs])
model = Model(inputs, ssm)
Since you always have a batch size with Keras:
input = np.zeros(shape=(1,6,640,800))

Is it possible to write a custom loss function that is based on the difference of sample outputs in a batch in Keras?

I am trying to implement a loss function in Keras, which can do the following:
Suppose y0, y1, ..., yn is the model's batch output for a batch input x0, x1, ..., xn (so batch_size is n+1), and the output yi for each xi is a scalar. I want the loss function to compute the whole loss for this batch as:
K.log(K.sigmoid(y1-y0)) + K.log(K.sigmoid(y2-y1)) + ... + K.log(K.sigmoid(yn-y(n-1))), i.e. $\sum_{i=1}^{n} \log(\mathrm{sigmoid}(y_i - y_{i-1}))$
I was thinking of using a Lambda layer to first convert the batch output [y0, y1, ..., yn] to [y1-y0, y2-y1, ..., yn-y(n-1)], and then using a custom loss function on the transformed output.
However, I am not sure whether Keras understands that there are no weights to update in the Lambda layer, and I am not clear on how Keras would propagate the gradient back through it, since Keras usually requires each layer/loss function to operate on a single sample's input, while my layer takes the whole output of a batch of samples. Has anyone solved similar issues before? Thanks!
Does slicing like the example below work for you (though I am not using Keras)?
batch = 4
num_classes = 6
logits = tf.random.uniform(shape=[batch, num_classes])
logits1 = tf.slice(logits, (0, 0), [batch, num_classes-1])
logits2 = tf.slice(logits, (0, 1), [batch, num_classes-1])
delta = logits2 - logits1
loss = tf.reduce_sum(tf.log(tf.nn.sigmoid(delta)), axis=-1)  # tf.log is the TF 1.x name
with tf.Session() as sess:
    logits, logits1, logits2, delta, loss = sess.run([logits, logits1, logits2,
                                                      delta, loss])
    print('logits\n', logits)
    print('logits2\n', logits2)
    print('logits1\n', logits1)
    print('delta\n', delta)
    print('loss\n', loss)
The result:
logits
[[ 0.61241663 0.70075285 0.98333454 0.4117974 0.5943476 0.84245574]
[ 0.02499413 0.22279179 0.70742595 0.34853518 0.7837007 0.88074362]
[ 0.35030317 0.36670768 0.64244425 0.87957716 0.22823489 0.45076978]
[ 0.38116801 0.39040041 0.82510674 0.64789391 0.45415008 0.03520513]]
logits2
[[ 0.70075285 0.98333454 0.4117974 0.5943476 0.84245574]
[ 0.22279179 0.70742595 0.34853518 0.7837007 0.88074362]
[ 0.36670768 0.64244425 0.87957716 0.22823489 0.45076978]
[ 0.39040041 0.82510674 0.64789391 0.45415008 0.03520513]]
logits1
[[ 0.61241663 0.70075285 0.98333454 0.4117974 0.5943476 ]
[ 0.02499413 0.22279179 0.70742595 0.34853518 0.7837007 ]
[ 0.35030317 0.36670768 0.64244425 0.87957716 0.22823489]
[ 0.38116801 0.39040041 0.82510674 0.64789391 0.45415008]]
delta
[[ 0.08833623 0.28258169 -0.57153714 0.18255019 0.24810815]
[ 0.19779766 0.48463416 -0.35889077 0.43516552 0.09704292]
[ 0.01640451 0.27573657 0.23713291 -0.65134227 0.22253489]
[ 0.0092324 0.43470633 -0.17721283 -0.19374382 -0.41894495]]
loss
[-3.41376281 -3.11249781 -3.49031925 -3.69255161]
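For the Keras part of the question, here is a minimal sketch of the same idea written directly as a custom loss, assuming the model outputs one scalar per sample (y_pred of shape (batch_size, 1)); note the sign flip, since Keras minimizes the loss, and y_true is simply ignored:
from keras import backend as K

def pairwise_rank_loss(y_true, y_pred):
    y = K.flatten(y_pred)            # shape (batch_size,)
    delta = y[1:] - y[:-1]           # [y1-y0, y2-y1, ..., yn-y(n-1)]
    return -K.sum(K.log(K.sigmoid(delta)))

model.compile(optimizer='sgd', loss=pairwise_rank_loss)  # attach to your model
This avoids the Lambda layer entirely; gradients flow through the slicing and subtraction just like through any other backend op.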

Possible tensorflow cholesky_solve inconsistency?

I am trying to solve a linear system of equations using tensorflow.cholesky_solve and I'm getting some unexpected results.
I wrote a script to compare the output of a very simple linear system with simple matrix inversion a la tensorflow.matrix_inverse, the non-cholesky based matrix equation solver tensorflow.matrix_solve, and tensorflow.cholesky_solve.
According to my understanding of the docs I've linked, these three cases should all yield a solution of the identity matrix divided by 2, but this is not the case for tensorflow.cholesky_solve. Perhaps I'm misunderstanding the docs?
import tensorflow as tf
I = tf.eye(2, dtype=tf.float32)
X = 2 * tf.eye(2, dtype=tf.float32)
X_inv = tf.matrix_inverse(X)
X_solve = tf.matrix_solve(X, I)
X_chol_solve = tf.cholesky_solve(tf.cholesky(X), I)
with tf.Session() as sess:
    for x in [X_inv, X_solve, X_chol_solve]:
        print('{}:\n{}'.format(x.name, sess.run(x)))
        print()
yielding output:
MatrixInverse:0:
[[ 0.5 0. ]
[ 0. 0.5]]
MatrixSolve:0:
[[ 0.5 0. ]
[ 0. 0.5]]
cholesky_solve/MatrixTriangularSolve_1:0:
[[ 1. 0.]
[ 0. 1.]]
Process finished with exit code 0
I think it's a bug. Notice how the result doesn't even depend on the RHS, unless RHS = 0, in which case you get nan instead of 0. Please report it on GitHub.
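Until it is fixed, a minimal workaround sketch: cholesky_solve is mathematically two triangular solves, L z = rhs followed by L^T x = z, which can be spelled out with tf.matrix_triangular_solve (TF 1.x API, reusing X and I from above):
chol = tf.cholesky(X)
z = tf.matrix_triangular_solve(chol, I, lower=True)  # solves L z = I
X_chol_manual = tf.matrix_triangular_solve(chol, z, lower=True, adjoint=True)  # solves L^T x = z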

How to use tf.cond for batch processing

I want to use tf.cond(pred, fn1, fn2, name=None) for conditional branching. Say I have two tensors: x, y. Each tensor is a batch of 0/1 values, and I want to use the element-wise comparison x < y as the source for
tf.cond's pred argument:
pred: A scalar determining whether to return the result of fn1 or fn2.
But if I am working with batches, then it looks like I need to iterate over the source tensor inside the graph, make slices for every item in the batch, and apply tf.cond to each one. That looks suspicious to me. Why does tf.cond accept only a scalar and not a batch? Can you advise on the right way to use it with a batch?
tf.where sounds like what you want: a vectorized selection between Tensors.
tf.cond is a control flow modifier: it determines which ops are executed, and so it's difficult to think of useful batch semantics.
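A minimal sketch of that vectorized selection, with hypothetical x and y; both branch values are computed, and tf.where picks between them element-wise:
x = tf.constant([1.0, 5.0, 2.0])
y = tf.constant([4.0, 3.0, 6.0])
result = tf.where(x < y, x * 2.0, y * 2.0)  # per-element choice, no tf.cond needed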
We can also put together a mixture of these operations: an operation which slices based on a condition and passes those slices to two branches.
import tensorflow as tf
from tensorflow.python.util import nest

def slicing_where(condition, full_input, true_branch, false_branch):
  """Split `full_input` between `true_branch` and `false_branch` on `condition`.

  Args:
    condition: A boolean Tensor with shape [B_1, ..., B_N].
    full_input: A Tensor or nested tuple of Tensors of any dtype, each with
      shape [B_1, ..., B_N, ...], to be split between `true_branch` and
      `false_branch` based on `condition`.
    true_branch: A function taking a single argument, that argument having the
      same structure and number of batch dimensions as `full_input`. Receives
      slices of `full_input` corresponding to the True entries of
      `condition`. Returns a Tensor or nested tuple of Tensors, each with
      batch dimensions matching its inputs.
    false_branch: Like `true_branch`, but receives inputs corresponding to the
      false elements of `condition`. Returns a Tensor or nested tuple of
      Tensors (with the same structure as the return value of `true_branch`),
      but with batch dimensions matching its inputs.
  Returns:
    Interleaved outputs from `true_branch` and `false_branch`, each Tensor
    having shape [B_1, ..., B_N, ...].
  """
  full_input_flat = nest.flatten(full_input)
  true_indices = tf.where(condition)
  false_indices = tf.where(tf.logical_not(condition))
  true_branch_inputs = nest.pack_sequence_as(
      structure=full_input,
      flat_sequence=[tf.gather_nd(params=input_tensor, indices=true_indices)
                     for input_tensor in full_input_flat])
  false_branch_inputs = nest.pack_sequence_as(
      structure=full_input,
      flat_sequence=[tf.gather_nd(params=input_tensor, indices=false_indices)
                     for input_tensor in full_input_flat])
  true_outputs = true_branch(true_branch_inputs)
  false_outputs = false_branch(false_branch_inputs)
  nest.assert_same_structure(true_outputs, false_outputs)
  def scatter_outputs(true_output, false_output):
    batch_shape = tf.shape(condition)
    scattered_shape = tf.concat(
        [batch_shape, tf.shape(true_output)[tf.rank(batch_shape):]],
        0)
    true_scatter = tf.scatter_nd(
        indices=tf.cast(true_indices, tf.int32),
        updates=true_output,
        shape=scattered_shape)
    false_scatter = tf.scatter_nd(
        indices=tf.cast(false_indices, tf.int32),
        updates=false_output,
        shape=scattered_shape)
    return true_scatter + false_scatter
  result = nest.pack_sequence_as(
      structure=true_outputs,
      flat_sequence=[
          scatter_outputs(true_single_output, false_single_output)
          for true_single_output, false_single_output
          in zip(nest.flatten(true_outputs), nest.flatten(false_outputs))])
  return result
Some examples:
vector_test = slicing_where(
    condition=tf.equal(tf.range(10) % 2, 0),
    full_input=tf.range(10, dtype=tf.float32),
    true_branch=lambda x: 0.2 + x,
    false_branch=lambda x: 0.1 + x)
cross_range = (tf.range(10, dtype=tf.float32)[:, None]
               * tf.range(10, dtype=tf.float32)[None, :])
matrix_test = slicing_where(
    condition=tf.equal(tf.range(10) % 3, 0),
    full_input=cross_range,
    true_branch=lambda x: -x,
    false_branch=lambda x: x + 0.1)
with tf.Session():
  print(vector_test.eval())
  print(matrix_test.eval())
Prints:
[ 0.2 1.10000002 2.20000005 3.0999999 4.19999981 5.0999999
6.19999981 7.0999999 8.19999981 9.10000038]
[[ 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. ]
[ 0.1 1.10000002 2.0999999 3.0999999 4.0999999
5.0999999 6.0999999 7.0999999 8.10000038 9.10000038]
[ 0.1 2.0999999 4.0999999 6.0999999 8.10000038
10.10000038 12.10000038 14.10000038 16.10000038 18.10000038]
[ 0. -3. -6. -9. -12. -15.
-18. -21. -24. -27. ]
[ 0.1 4.0999999 8.10000038 12.10000038 16.10000038
20.10000038 24.10000038 28.10000038 32.09999847 36.09999847]
[ 0.1 5.0999999 10.10000038 15.10000038 20.10000038
25.10000038 30.10000038 35.09999847 40.09999847 45.09999847]
[ 0. -6. -12. -18. -24. -30.
-36. -42. -48. -54. ]
[ 0.1 7.0999999 14.10000038 21.10000038 28.10000038
35.09999847 42.09999847 49.09999847 56.09999847 63.09999847]
[ 0.1 8.10000038 16.10000038 24.10000038 32.09999847
40.09999847 48.09999847 56.09999847 64.09999847 72.09999847]
[ 0. -9. -18. -27. -36. -45.
-54. -63. -72. -81. ]]

How to work with probabilistic classification with scikit-learn SVCs?

Firstly my data looks like this:
label | instances (sentences)
5     | 1190
4     | 839
3     | 239
2     | 204
1     | 127
Then I cross validated:
from sklearn import cross_validation
kf = cross_validation.KFold(n=len(y), n_folds=10)
for train_index, test_index in kf:
    print "\nTRAIN:\n", train_index, "\n TEST:\n", test_index
    X_train, X_test = X_combined_features[train_index], X_combined_features[test_index]
    y_train, y_test = y[train_index], y[test_index]
From the documentation I know that probability estimates can be turned on as follows:
svm = SVC(probability=True)
I would like to work with probabilistic classification and SVMs, so let's assume that I read the data and then do the following:
from sklearn.svm import SVC
svm = SVC(kernel='linear', probability=True)
svm.fit(reduced_training_matrix, y)
output_proba = svm.predict_proba(reduced_testing_matrix)
print output_proba
Then I got this:
[[ 0.06351278 0.05312154 0.07709772 ..., 0.41958171 0.00076087
0.00076095]
[ 0.05813505 0.05373973 0.08617775 ..., 0.47467149 0.00082695
0.00082701]
[ 0.05576647 0.04756668 0.08216568 ..., 0.47984425 0.00077685
0.00077693]
...,
[ 0.05983482 0.03972051 0.07636607 ..., 0.4853006 0.00070774
0.00070783]
[ 0.05813505 0.05373973 0.08617775 ..., 0.47467149 0.00082695
0.00082701]
[ 0.05989075 0.04822012 0.07795987 ..., 0.48084117 0.00073095
0.00073101]]
Several questions arose from the above exercise: what is that array output (i.e. what does it mean)? Am I doing things the right way? If not, how should I proceed in order to use probabilistic classification with SVC?
Update:
vector_of_probabilities_for_sample= reduced_training_matrix[j,:]
print vector_of_probabilities_for_sample.toarray()
[[ 0. 0. 0. 0. 0. 0.]]
probability_of_corresponding_class = reduced_training_matrix[j,:]
print probability_of_corresponding_class.toarray()
[[ 0. 0. 0. 0. 0. 0.]]
What is that array output
The probability of each label for each corresponding sample passed to predict_proba. Every i-th column holds the probability of the corresponding class svm.classes_[i], and every j-th row is the vector of probabilities for the j-th sample of reduced_testing_matrix. Consequently, the sum of each row equals 1.
Am I doing things in the right way?
Yes.
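As a usage sketch (assuming the fitted svm and reduced_testing_matrix from the question), the probability rows can be turned back into hard labels, and the row-sum property can be checked directly:
import numpy as np
proba = svm.predict_proba(reduced_testing_matrix)
predicted_labels = svm.classes_[np.argmax(proba, axis=1)]  # most probable class per row
print(proba.sum(axis=1))  # each row sums to 1
Note that predict_proba for SVC is based on Platt scaling, so its argmax may occasionally disagree with svm.predict; that is a documented quirk, not a bug in your code.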