How to use tf.cond for batch processing - tensorflow

I want to use tf.cond(pred, fn1, fn2, name=None) for conditional branching. Let's say I have two tensors, x and y, each a batch of 0/1 values, and I want to use the elementwise comparison x < y as the source for tf.cond's pred argument:
pred: A scalar determining whether to return the result of fn1 or fn2.
But if I am working with batches, it looks like I need to iterate over the source tensor inside the graph, slice out every item in the batch, and apply tf.cond to each item separately. That looks suspicious to me. Why does tf.cond accept only a scalar and not a batch? Can you advise on the right way to use it with batches?

tf.where sounds like what you want: a vectorized selection between Tensors.
tf.cond is a control flow modifier: it determines which ops are executed, and so it's difficult to think of useful batch semantics.
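To illustrate for the question as asked, a minimal sketch (the x, y values and the branch arithmetic here are made up):
import tensorflow as tf

x = tf.constant([0., 1., 1., 0.])
y = tf.constant([1., 0., 1., 0.])
# Vectorized selection: where x < y take the first value tensor, else the second.
batched = tf.where(x < y, x + 10., y - 10.)
with tf.Session() as sess:
    print(sess.run(batched))  # [ 10. -10.  -9. -10.]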
We can also put together a mixture of these operations: an operation which slices based on a condition and passes those slices to two branches.
import tensorflow as tf
from tensorflow.python.util import nest

def slicing_where(condition, full_input, true_branch, false_branch):
  """Split `full_input` between `true_branch` and `false_branch` on `condition`.

  Args:
    condition: A boolean Tensor with shape [B_1, ..., B_N].
    full_input: A Tensor or nested tuple of Tensors of any dtype, each with
      shape [B_1, ..., B_N, ...], to be split between `true_branch` and
      `false_branch` based on `condition`.
    true_branch: A function taking a single argument, that argument having the
      same structure and number of batch dimensions as `full_input`. Receives
      slices of `full_input` corresponding to the True entries of
      `condition`. Returns a Tensor or nested tuple of Tensors, each with
      batch dimensions matching its inputs.
    false_branch: Like `true_branch`, but receives inputs corresponding to the
      false elements of `condition`. Returns a Tensor or nested tuple of
      Tensors (with the same structure as the return value of `true_branch`),
      but with batch dimensions matching its inputs.
  Returns:
    Interleaved outputs from `true_branch` and `false_branch`, each Tensor
    having shape [B_1, ..., B_N, ...].
  """
  full_input_flat = nest.flatten(full_input)
  true_indices = tf.where(condition)
  false_indices = tf.where(tf.logical_not(condition))
  true_branch_inputs = nest.pack_sequence_as(
      structure=full_input,
      flat_sequence=[tf.gather_nd(params=input_tensor, indices=true_indices)
                     for input_tensor in full_input_flat])
  false_branch_inputs = nest.pack_sequence_as(
      structure=full_input,
      flat_sequence=[tf.gather_nd(params=input_tensor, indices=false_indices)
                     for input_tensor in full_input_flat])
  true_outputs = true_branch(true_branch_inputs)
  false_outputs = false_branch(false_branch_inputs)
  nest.assert_same_structure(true_outputs, false_outputs)
  def scatter_outputs(true_output, false_output):
    batch_shape = tf.shape(condition)
    scattered_shape = tf.concat(
        [batch_shape, tf.shape(true_output)[tf.rank(batch_shape):]], 0)
    true_scatter = tf.scatter_nd(
        indices=tf.cast(true_indices, tf.int32),
        updates=true_output,
        shape=scattered_shape)
    false_scatter = tf.scatter_nd(
        indices=tf.cast(false_indices, tf.int32),
        updates=false_output,
        shape=scattered_shape)
    return true_scatter + false_scatter
  result = nest.pack_sequence_as(
      structure=true_outputs,
      flat_sequence=[
          scatter_outputs(true_single_output, false_single_output)
          for true_single_output, false_single_output
          in zip(nest.flatten(true_outputs), nest.flatten(false_outputs))])
  return result
Some examples:
vector_test = slicing_where(
    condition=tf.equal(tf.range(10) % 2, 0),
    full_input=tf.range(10, dtype=tf.float32),
    true_branch=lambda x: 0.2 + x,
    false_branch=lambda x: 0.1 + x)
cross_range = (tf.range(10, dtype=tf.float32)[:, None]
               * tf.range(10, dtype=tf.float32)[None, :])
matrix_test = slicing_where(
    condition=tf.equal(tf.range(10) % 3, 0),
    full_input=cross_range,
    true_branch=lambda x: -x,
    false_branch=lambda x: x + 0.1)
with tf.Session():
    print(vector_test.eval())
    print(matrix_test.eval())
Prints:
[ 0.2 1.10000002 2.20000005 3.0999999 4.19999981 5.0999999
6.19999981 7.0999999 8.19999981 9.10000038]
[[ 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. ]
[ 0.1 1.10000002 2.0999999 3.0999999 4.0999999
5.0999999 6.0999999 7.0999999 8.10000038 9.10000038]
[ 0.1 2.0999999 4.0999999 6.0999999 8.10000038
10.10000038 12.10000038 14.10000038 16.10000038 18.10000038]
[ 0. -3. -6. -9. -12. -15.
-18. -21. -24. -27. ]
[ 0.1 4.0999999 8.10000038 12.10000038 16.10000038
20.10000038 24.10000038 28.10000038 32.09999847 36.09999847]
[ 0.1 5.0999999 10.10000038 15.10000038 20.10000038
25.10000038 30.10000038 35.09999847 40.09999847 45.09999847]
[ 0. -6. -12. -18. -24. -30.
-36. -42. -48. -54. ]
[ 0.1 7.0999999 14.10000038 21.10000038 28.10000038
35.09999847 42.09999847 49.09999847 56.09999847 63.09999847]
[ 0.1 8.10000038 16.10000038 24.10000038 32.09999847
40.09999847 48.09999847 56.09999847 64.09999847 72.09999847]
[ 0. -9. -18. -27. -36. -45.
-54. -63. -72. -81. ]]
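A note on the trade-off: tf.where(condition, a, b) requires both candidate tensors to be materialized for every element, whereas slicing_where runs each branch only on its own slice. A sketch of where that matters (hypothetical values; under plain tf.where, tf.log would also be evaluated on the non-positive entries):
x = tf.constant([-1., 0.5, 2.])
safe_log = slicing_where(
    condition=x > 0.,
    full_input=x,
    true_branch=lambda v: tf.log(v),        # Sees only the positive entries.
    false_branch=lambda v: tf.zeros_like(v))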

Related

Tensorflow: How to randomly select elements according to condition without np.where?

I have 3 TensorFlow tensors (a, b, valid_entries) which share their first two dimensions [T, N, ?]. One of these tensors, valid_entries, has shape [T, N, 1] with boolean values. I want to randomly sample T*M 2-tuples of indices (M < N) such that valid_entries[t, m] == 1 for all of these indices.
In other words, for each time step, I want to randomly select M valid entries from a and b.
I presume that in NumPy, this task would be solved with something like the following (let's skip the first dimension T for simplicity):
import numpy as np

M = 3
N = 5
valid_entries = np.array([[0], [1], [0], [1], [0]])
valid_indices = np.where(valid_entries[:, 0] == 1)[0]
valid_indices = np.random.choice(valid_indices, min(len(valid_indices), M), replace=False)
a_new = a[valid_indices]
b_new = b[valid_indices]
valid_new = valid_entries[valid_indices]
However, all this needs to happen in Tensorflow.
Thanks a ton in advance for any help!
Here is a function that does that:
import tensorflow as tf

def sample_indices(valid, m, seed=None):
    valid = tf.convert_to_tensor(valid)
    n = tf.size(valid)
    # Flatten boolean tensor
    valid_flat = tf.reshape(valid, [n])
    # Get flat indices where the tensor is true
    valid_idx = tf.boolean_mask(tf.range(n), valid_flat)
    # Shuffle valid indices
    valid_idx_shuffled = tf.random.shuffle(valid_idx, seed=seed)
    # Pick sample from shuffled indices
    valid_idx_sample = valid_idx_shuffled[:m]
    # Unravel indices
    return tf.transpose(tf.unravel_index(valid_idx_sample, tf.shape(valid)))

with tf.Graph().as_default(), tf.Session() as sess:
    valid = [[ True,  True, False,  True],
             [False,  True,  True, False],
             [False,  True, False, False]]
    m = 4
    print(sess.run(sample_indices(valid, m, seed=0)))
    # [[1 1]
    #  [1 2]
    #  [0 1]
    #  [2 1]]
This sample_indices function is generic for any shape of boolean tensor. If in your case valid_entries has shape (T, N, 1), then you will get a tensor with shape (M, 3) as output, although you can ignore the last column since it is always going to be zero (or you can pass tf.squeeze(valid_entries, axis=2) instead, as sketched below).
Note: The last tf.transpose is just to have as output a tensor with shape (sample_size, num_dimensions) instead of the other way around. However, if m is rather big and you don't mind the order of the dimensions, you may skip it to save a bit of time and memory, since (unlike its NumPy counterpart) tf.transpose produces a whole new tensor.
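For instance, a small sketch of that squeezed variant (the valid_entries values here are made up):
# Hypothetical (T, N, 1) input: drop the trailing axis before sampling so the
# returned index pairs are (t, n) rather than (t, n, 0).
valid_entries = tf.constant([[[True], [False], [True]],
                             [[False], [True], [True]]])
idx = sample_indices(tf.squeeze(valid_entries, axis=2), m=2, seed=0)
# Run as above: idx has shape (2, 2), each row a (t, n) pair of a valid entry.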

Is it possible to write a custom loss function that is based on the difference of sample outputs in a batch in Keras?

I am trying to implement a loss function in Keras, which can do the following:
Suppose y0, y1, ..., yn is the model's batch output for a batch input x0, x1, ..., xn (so batch_size is n+1), and the output yi for each xi is a scalar value. I want the loss function to compute the loss for the whole batch as:
K.log(K.sigmoid(y1 - y0)) + K.log(K.sigmoid(y2 - y1)) + ... + K.log(K.sigmoid(yn - y(n-1)))
I was thinking of using a Lambda layer to first convert the batch output [y0, y1, ..., yn] to [y1 - y0, y2 - y1, ..., yn - y(n-1)], and then using a custom loss function on the transformed output.
However, I am not sure whether Keras can understand that there are no weights to update in the Lambda layer, and I am not clear on how Keras will propagate the gradient back through it, since Keras usually expects each layer/loss function to operate on a single sample, while my layer takes the whole output of a batch of samples. Has anyone solved a similar issue before? Thanks!
Does slicing like the below work for you? (Though I am not using Keras.)
batch = 4
num_classes = 6
logits = tf.random.uniform(shape=[batch, num_classes])
logits1 = tf.slice(logits, (0, 0), [batch, num_classes - 1])
logits2 = tf.slice(logits, (0, 1), [batch, num_classes - 1])
delta = logits2 - logits1
loss = tf.reduce_sum(tf.log(tf.nn.sigmoid(delta)), axis=-1)
with tf.Session() as sess:
    logits, logits1, logits2, delta, loss = sess.run(
        [logits, logits1, logits2, delta, loss])
    print('logits\n', logits)
    print('logits2\n', logits2)
    print('logits1\n', logits1)
    print('delta\n', delta)
    print('loss\n', loss)
The result:
logits
[[ 0.61241663 0.70075285 0.98333454 0.4117974 0.5943476 0.84245574]
[ 0.02499413 0.22279179 0.70742595 0.34853518 0.7837007 0.88074362]
[ 0.35030317 0.36670768 0.64244425 0.87957716 0.22823489 0.45076978]
[ 0.38116801 0.39040041 0.82510674 0.64789391 0.45415008 0.03520513]]
logits2
[[ 0.70075285 0.98333454 0.4117974 0.5943476 0.84245574]
[ 0.22279179 0.70742595 0.34853518 0.7837007 0.88074362]
[ 0.36670768 0.64244425 0.87957716 0.22823489 0.45076978]
[ 0.39040041 0.82510674 0.64789391 0.45415008 0.03520513]]
logits1
[[ 0.61241663 0.70075285 0.98333454 0.4117974 0.5943476 ]
[ 0.02499413 0.22279179 0.70742595 0.34853518 0.7837007 ]
[ 0.35030317 0.36670768 0.64244425 0.87957716 0.22823489]
[ 0.38116801 0.39040041 0.82510674 0.64789391 0.45415008]]
delta
[[ 0.08833623 0.28258169 -0.57153714 0.18255019 0.24810815]
[ 0.19779766 0.48463416 -0.35889077 0.43516552 0.09704292]
[ 0.01640451 0.27573657 0.23713291 -0.65134227 0.22253489]
[ 0.0092324 0.43470633 -0.17721283 -0.19374382 -0.41894495]]
loss
[-3.41376281 -3.11249781 -3.49031925 -3.69255161]
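For the Keras part of the question, here is a hedged sketch of expressing the same pairwise term directly as a custom loss, without a Lambda layer. It assumes the model outputs one scalar per sample and that the batch ordering is meaningful; y_true is simply ignored, and pairwise_rank_loss is a made-up name:
from keras import backend as K

def pairwise_rank_loss(y_true, y_pred):
    # y_pred has shape (batch_size, 1): flatten, then take consecutive differences.
    y = K.flatten(y_pred)
    delta = y[1:] - y[:-1]
    # Negate so that minimizing the loss maximizes sum(log(sigmoid(yi - y(i-1)))).
    return -K.sum(K.log(K.sigmoid(delta)))

# model.compile(optimizer='adam', loss=pairwise_rank_loss)
Since the loss itself consumes the whole batch, no trainable weights are involved and the gradient flows through the slicing and sigmoid just like any other op.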

Understanding keras.backend.max usage with tf.random_normal

import numpy as np
import tensorflow as tf
from keras import backend as K

sess = tf.InteractiveSession()
box_scores1 = tf.constant([[[ 9.188682, 11.484599 ],
                            [10.06533 ,  7.557296 ]],
                           [[10.099248, 10.591225 ],
                            [10.592823,  7.8770704]]])
box_scores2 = tf.random_normal([2, 2, 2], mean=10, stddev=1, dtype=tf.float32, seed=1)
box_class_scores1 = K.max(box_scores1, axis=-1)
box_class_scores2 = K.max(box_scores2, axis=-1)
print(box_scores1.eval())
print(box_scores2.eval())
print(box_class_scores1.eval())
print(box_class_scores2.eval())
Output:
[[[ 9.188682 11.484599 ]
[10.06533 7.557296 ]]
[[10.099248 10.591225 ]
[10.592823 7.8770704]]]
[[[ 9.188682 11.484599 ]
[10.06533 7.557296 ]]
[[10.099248 10.591225 ]
[10.592823 7.8770704]]]
[[11.484599 10.06533 ]
[10.591225 10.592823]]
[[10.242094 10.515779]
[12.083789 11.397354]]
As we can see, the values in box_scores1 and box_scores2 are the same, but the results obtained after applying the max operation differ. How can the values of box_class_scores1 and box_class_scores2 be different?
Your problem has nothing to do with the max function, but with a misunderstanding of TensorFlow: most of its operations are symbolic, so tf.random_normal does not produce random numbers; it produces a symbolic tensor that draws from a normal distribution with the given mean and standard deviation.
Each time you evaluate that tensor, it generates a new sample. Your first eval looks fine, but the second eval draws a different sample, and that is what max is applied to, so it produces a different output than it would for a constant tensor.
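One way to see consistent numbers (a sketch reusing the session from the question): fetch the random tensor and its max in a single sess.run call, so both are computed from the same draw.
# Within one run, the random op executes once, so both fetches share its sample.
scores, class_scores = sess.run([box_scores2, box_class_scores2])
print(scores)
print(class_scores)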

Possible tensorflow cholesky_solve inconsistency?

I am trying to solve a linear system of equations using tensorflow.cholesky_solve and I'm getting some unexpected results.
I wrote a script to compare the output of a very simple linear system with simple matrix inversion a la tensorflow.matrix_inverse, the non-cholesky based matrix equation solver tensorflow.matrix_solve, and tensorflow.cholesky_solve.
According to my understanding of the docs I've linked, these three cases should all yield a solution of the identity matrix divided by 2, but this is not the case for tensorflow.cholesky_solve. Perhaps I'm misunderstanding the docs?
import tensorflow as tf

I = tf.eye(2, dtype=tf.float32)
X = 2 * tf.eye(2, dtype=tf.float32)
X_inv = tf.matrix_inverse(X)
X_solve = tf.matrix_solve(X, I)
X_chol_solve = tf.cholesky_solve(tf.cholesky(X), I)
with tf.Session() as sess:
    for x in [X_inv, X_solve, X_chol_solve]:
        print('{}:\n{}'.format(x.name, sess.run(x)))
        print()
yielding output:
MatrixInverse:0:
[[ 0.5 0. ]
[ 0. 0.5]]
MatrixSolve:0:
[[ 0.5 0. ]
[ 0. 0.5]]
cholesky_solve/MatrixTriangularSolve_1:0:
[[ 1. 0.]
[ 0. 1.]]
I think it's a bug. Notice how the result doesn't even depend on the RHS, unless RHS = 0, in which case you get nan instead of 0. Please report it on GitHub.
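Before filing it, a quick sketch to confirm the suspicion (reusing X from the script above; a correct solver's output must change with the right-hand side):
rhs = tf.constant([[1., 2.], [3., 4.]])
chol = tf.cholesky(X)
with tf.Session() as sess:
    # Both lines should print rhs / 2 for X = 2 * I; a mismatch confirms the bug.
    print(sess.run(tf.cholesky_solve(chol, rhs)))
    print(sess.run(tf.matrix_solve(X, rhs)))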

How to work with probabilistic classification with scikit learn SVC's?

Firstly my data looks like this:
label | instances (sentences)
------+----------------------
    5 | 1190
    4 |  839
    3 |  239
    2 |  204
    1 |  127
Then I cross-validated:
from sklearn import cross_validation

kf = cross_validation.KFold(n=len(y), n_folds=10)
for train_index, test_index in kf:
    print("\nTRAIN:\n", train_index, "\n TEST:\n", test_index)
    X_train, X_test = X_combined_features[train_index], X_combined_features[test_index]
    y_train, y_test = y[train_index], y[test_index]
From the documentation I know that probability estimates can be turned on as follows:
svm = SVC(probability=True)
I would like to work with probabilistic classification and SVMs, so let's assume that I read the data, then I do the following:
from sklearn.svm import SVC
svm = SVC(kernel='linear', probability=True)
svm.fit(reduced_training_matrix, y)
output_proba = svm.predict_proba(reduced_testing_matrix)
print(output_proba)
Then I got this:
[[ 0.06351278 0.05312154 0.07709772 ..., 0.41958171 0.00076087
0.00076095]
[ 0.05813505 0.05373973 0.08617775 ..., 0.47467149 0.00082695
0.00082701]
[ 0.05576647 0.04756668 0.08216568 ..., 0.47984425 0.00077685
0.00077693]
...,
[ 0.05983482 0.03972051 0.07636607 ..., 0.4853006 0.00070774
0.00070783]
[ 0.05813505 0.05373973 0.08617775 ..., 0.47467149 0.00082695
0.00082701]
[ 0.05989075 0.04822012 0.07795987 ..., 0.48084117 0.00073095
0.00073101]]
Several questions arose from the above exercise: what is that array output (i.e., what does it mean)? Am I doing things the right way? If not, how should I proceed in order to use probabilistic classification with SVC?
Update:
vector_of_probabilities_for_sample = reduced_training_matrix[j, :]
print(vector_of_probabilities_for_sample.toarray())
[[ 0. 0. 0. 0. 0. 0.]]
probability_of_corresponding_class = reduced_training_matrix[j, :]
print(probability_of_corresponding_class.toarray())
[[ 0. 0. 0. 0. 0. 0.]]
What is that array output
It is the probability of each label for every sample you passed to predict_proba (here, reduced_testing_matrix). The i-th column holds the probability of the corresponding class svm.classes_[i], and the j-th row is the vector of probabilities for sample reduced_testing_matrix[j, :]. Naturally, each row sums to 1.
Am I doing things in the right way?
Yes.
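To make the column-to-label mapping concrete, a short sketch reusing svm and output_proba from the question (nothing here is specific to your data):
import numpy as np

# Column i of output_proba corresponds to the label svm.classes_[i].
predicted_labels = svm.classes_[np.argmax(output_proba, axis=1)]
print(output_proba.sum(axis=1))  # Each row sums to (approximately) 1.
One caveat: SVC's probabilities come from Platt scaling fitted with internal cross-validation, so the argmax of predict_proba can occasionally disagree with svm.predict.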