Possible tensorflow cholesky_solve inconsistency? - tensorflow

I am trying to solve a linear system of equations using tensorflow.cholesky_solve and I'm getting some unexpected results.
I wrote a script comparing the solution of a very simple linear system computed three ways: plain matrix inversion via tensorflow.matrix_inverse, the non-Cholesky-based solver tensorflow.matrix_solve, and tensorflow.cholesky_solve.
According to my understanding of the docs I've linked, these three cases should all yield a solution of the identity matrix divided by 2, but this is not the case for tensorflow.cholesky_solve. Perhaps I'm misunderstanding the docs?
import tensorflow as tf
I = tf.eye(2, dtype=tf.float32)
X = 2 * tf.eye(2, dtype=tf.float32)
X_inv = tf.matrix_inverse(X)
X_solve = tf.matrix_solve(X, I)
X_chol_solve = tf.cholesky_solve(tf.cholesky(X), I)
with tf.Session() as sess:
    for x in [X_inv, X_solve, X_chol_solve]:
        print('{}:\n{}'.format(x.name, sess.run(x)))
        print()
yielding output:
MatrixInverse:0:
[[ 0.5 0. ]
[ 0. 0.5]]
MatrixSolve:0:
[[ 0.5 0. ]
[ 0. 0.5]]
cholesky_solve/MatrixTriangularSolve_1:0:
[[ 1. 0.]
[ 0. 1.]]
Process finished with exit code 0

I think it's a bug. Notice how the result doesn't even depend on the RHS, unless RHS = 0, in which case you get nan instead of 0. Please report it on GitHub.
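For reference, here is a minimal cross-check of the expected solution using SciPy's Cholesky solver (an assumption on my part, not part of the original question; it requires scipy to be installed):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

X = 2 * np.eye(2, dtype=np.float32)
I = np.eye(2, dtype=np.float32)

c, low = cho_factor(X)
print(cho_solve((c, low), I))  # [[0.5 0. ] [0.  0.5]], the expected answer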

Related

Understanding keras.backend.max usage with tf.random_normal

import numpy as np
import tensorflow as tf
from keras import backend as K
sess = tf.InteractiveSession()
box_scores1 = tf.constant([[[ 9.188682, 11.484599 ],
                            [10.06533,   7.557296 ]],
                           [[10.099248, 10.591225 ],
                            [10.592823,  7.8770704]]])
box_scores2 = tf.random_normal([2, 2, 2], mean=10, stddev=1, dtype=tf.float32, seed=1)
box_class_scores1 = K.max(box_scores1, axis=-1)
box_class_scores2 = K.max(box_scores2, axis=-1)
print(box_scores1.eval())
print(box_scores2.eval())
print(box_class_scores1.eval())
print(box_class_scores2.eval())
Output:
[[[ 9.188682 11.484599 ]
[10.06533 7.557296 ]]
[[10.099248 10.591225 ]
[10.592823 7.8770704]]]
[[[ 9.188682 11.484599 ]
[10.06533 7.557296 ]]
[[10.099248 10.591225 ]
[10.592823 7.8770704]]]
[[11.484599 10.06533 ]
[10.591225 10.592823]]
[[10.242094 10.515779]
[12.083789 11.397354]]
As we can see, the values in box_scores1 and box_scores2 are the same, but the results obtained after applying the max operation differ. How can the values of box_class_scores1 and box_class_scores2 be different?
Your problem has nothing to do with the max function; it comes from a misunderstanding of TensorFlow. Most of its operations are symbolic, so tf.random_normal does not produce random numbers, but a symbolic normal distribution with the given mean and standard deviation.
Each time you evaluate that tensor, it generates a different sample. Your first eval looks fine, but the eval of box_class_scores2 draws a fresh sample before applying max, so its result differs from the one computed on the constant tensor.
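A minimal sketch of one way to get consistent values (assuming the same TF 1.x interactive session as above): fetch the sample and its max in a single run so both see the same draw, or freeze one sample as a constant first.

# Evaluate the random tensor and its max together: one sample feeds both.
scores_val, max_val = sess.run([box_scores2, box_class_scores2])
print(scores_val)
print(max_val)  # row-wise max of scores_val

# Alternatively, materialize one sample and build max on top of it.
frozen = tf.constant(sess.run(box_scores2))
print(K.max(frozen, axis=-1).eval())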

ValueError: Floating point image RGB values must be in the 0..1 range. while using matplotlib

I want to visualize weights of the layer of a neural network. I'm using pytorch.
import torch
import torchvision.models as models
from matplotlib import pyplot as plt
def plot_kernels(tensor, num_cols=6):
    if not tensor.ndim == 4:
        raise Exception("assumes a 4D tensor")
    if not tensor.shape[-1] == 3:
        raise Exception("last dim needs to be 3 to plot")
    num_kernels = tensor.shape[0]
    num_rows = 1 + num_kernels // num_cols
    fig = plt.figure(figsize=(num_cols, num_rows))
    for i in range(tensor.shape[0]):
        ax1 = fig.add_subplot(num_rows, num_cols, i + 1)
        ax1.imshow(tensor[i])
        ax1.axis('off')
        ax1.set_xticklabels([])
        ax1.set_yticklabels([])
    plt.subplots_adjust(wspace=0.1, hspace=0.1)
    plt.show()
vgg = models.vgg16(pretrained=True)
mm = vgg.double()
filters = mm.modules
body_model = [i for i in mm.children()][0]
layer1 = body_model[0]
tensor = layer1.weight.data.numpy()
plot_kernels(tensor)
The above gives this error ValueError: Floating point image RGB values must be in the 0..1 range.
My question is: should I normalize and take the absolute value of the weights to overcome this error, or is there any other way?
If I normalize and use absolute values, I think the meaning of the graphs changes.
[[[[ 0.02240197 -1.22057354 -0.55051649]
[-0.50310904 0.00891289 0.15427093]
[ 0.42360783 -0.23392732 -0.56789106]]
[[ 1.12248898 0.99013627 1.6526649 ]
[ 1.09936976 2.39608836 1.83921957]
[ 1.64557672 1.4093554 0.76332706]]
[[ 0.26969245 -1.2997849 -0.64577204]
[-1.88377869 -2.0100112 -1.43068039]
[-0.44531786 -1.67845118 -1.33723605]]]
[[[ 0.71286005 1.45265901 0.64986968]
[ 0.75984162 1.8061738 1.06934202]
[-0.08650422 0.83452386 -0.04468433]]
[[-1.36591709 -2.01630116 -1.54488969]
[-1.46221244 -2.5365622 -1.91758668]
[-0.88827479 -1.59151018 -1.47308767]]
[[ 0.93600738 0.98174071 1.12213969]
[ 1.03908169 0.83749604 1.09565806]
[ 0.71188802 0.85773659 0.86840987]]]
[[[-0.48592842 0.2971966 1.3365227 ]
[ 0.47920835 -0.18186836 0.59673625]
[-0.81358945 1.23862112 0.13635623]]
[[-0.75361633 -1.074965 0.70477796]
[ 1.24439156 -1.53563368 -1.03012812]
[ 0.97597247 0.83084011 -1.81764793]]
[[-0.80762428 -0.62829626 1.37428832]
[ 1.01448071 -0.81775147 -0.41943246]
[ 1.02848887 1.39178836 -1.36779451]]]
...,
[[[ 1.28134537 -0.00482408 0.71610934]
[ 0.95264435 -0.09291686 -0.28001019]
[ 1.34494913 0.64477581 0.96984017]]
[[-0.34442815 -1.40002513 1.66856039]
[-2.21281362 -3.24513769 -1.17751861]
[-0.93520379 -1.99811196 0.72937071]]
[[ 0.63388056 -0.17022935 2.06905985]
[-0.7285465 -1.24722099 0.30488953]
[ 0.24900314 -0.19559766 1.45432627]]]
[[[-0.80684513 2.1764245 -0.73765725]
[-1.35886598 1.71875226 -1.73327696]
[-0.75233924 2.14700699 -0.71064663]]
[[-0.79627383 2.21598244 -0.57396138]
[-1.81044972 1.88310981 -1.63758397]
[-0.6589964 2.013237 -0.48532376]]
[[-0.3710472 1.4949851 -0.30245575]
[-1.25448656 1.20453358 -1.29454732]
[-0.56755757 1.30994892 -0.39370224]]]
[[[-0.67361742 -3.69201088 -1.23768616]
[ 3.12674141 1.70414758 -1.76272404]
[-0.22565465 1.66484773 1.38172317]]
[[ 0.28095332 -2.03035069 0.69989491]
[ 1.97936332 1.76992691 -1.09842575]
[-2.22433758 0.52577412 0.18292744]]
[[ 0.48471382 -1.1984663 1.57565165]
[ 1.09911084 1.31910467 -0.51982772]
[-2.76202297 -0.47073677 0.03936549]]]]
It sounds as if you already know your values are not in that range. Yes, you must re-scale them to the range 0.0 - 1.0. I suggest that you want to retain visibility of negative vs. positive, but let 0.5 be your new "neutral" point. Scale such that current 0.0 values map to 0.5, and your most extreme value (largest magnitude) maps to 0.0 (if negative) or 1.0 (if positive).
Thanks for the vectors. It looks like your values are in the range -2.25 to +2.0. I suggest the rescaling new = (1/(2*2.25)) * old + 0.5
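A minimal sketch of that rescaling, reusing tensor and plot_kernels from the question (here the largest magnitude is computed from the array rather than hard-coded as 2.25):

import numpy as np

max_mag = np.abs(tensor).max()         # ~2.25 for the values shown above
scaled = tensor / (2 * max_mag) + 0.5  # 0 -> 0.5, -max_mag -> 0.0, +max_mag -> 1.0
plot_kernels(scaled)                   # values now lie in [0, 1]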

How to use tf.cond for batch processing

I want to use tf.cond(pred, fn1, fn2, name=None) for conditional branching. Let's say I have two tensors, x and y. Each tensor is a batch of 0/1 values, and I want to use the comparison x < y as the source for tf.cond's pred argument:
pred: A scalar determining whether to return the result of fn1 or fn2.
But if I am working with batches, it looks like I need to iterate over the source tensor inside the graph, slice out every item in the batch, and apply tf.cond to each one. That looks suspicious to me. Why does tf.cond accept only a scalar and not a batch? Can you advise on the right way to use it with batches?
tf.where sounds like what you want: a vectorized selection between Tensors.
tf.cond is a control flow modifier: it determines which ops are executed, and so it's difficult to think of useful batch semantics.
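For example, a minimal sketch (TF 1.x graph mode, same style as the rest of this answer) of element-wise selection between two expressions based on a batched condition; note that, unlike tf.cond, both branches are computed and tf.where only picks per element:

import tensorflow as tf

x = tf.constant([0., 1., 0., 1.])
y = tf.constant([1., 0., 0., 1.])
# Per element: take x * 10 where x < y, otherwise y - 1.
result = tf.where(x < y, x * 10., y - 1.)

with tf.Session() as sess:
    print(sess.run(result))  # [ 0. -1. -1.  0.]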
We can also put together a mixture of these operations: an operation which slices based on a condition and passes those slices to two branches.
import tensorflow as tf
from tensorflow.python.util import nest

def slicing_where(condition, full_input, true_branch, false_branch):
  """Split `full_input` between `true_branch` and `false_branch` on `condition`.

  Args:
    condition: A boolean Tensor with shape [B_1, ..., B_N].
    full_input: A Tensor or nested tuple of Tensors of any dtype, each with
      shape [B_1, ..., B_N, ...], to be split between `true_branch` and
      `false_branch` based on `condition`.
    true_branch: A function taking a single argument, that argument having the
      same structure and number of batch dimensions as `full_input`. Receives
      slices of `full_input` corresponding to the True entries of
      `condition`. Returns a Tensor or nested tuple of Tensors, each with batch
      dimensions matching its inputs.
    false_branch: Like `true_branch`, but receives inputs corresponding to the
      false elements of `condition`. Returns a Tensor or nested tuple of Tensors
      (with the same structure as the return value of `true_branch`), but with
      batch dimensions matching its inputs.
  Returns:
    Interleaved outputs from `true_branch` and `false_branch`, each Tensor
    having shape [B_1, ..., B_N, ...].
  """
  full_input_flat = nest.flatten(full_input)
  true_indices = tf.where(condition)
  false_indices = tf.where(tf.logical_not(condition))
  true_branch_inputs = nest.pack_sequence_as(
      structure=full_input,
      flat_sequence=[tf.gather_nd(params=input_tensor, indices=true_indices)
                     for input_tensor in full_input_flat])
  false_branch_inputs = nest.pack_sequence_as(
      structure=full_input,
      flat_sequence=[tf.gather_nd(params=input_tensor, indices=false_indices)
                     for input_tensor in full_input_flat])
  true_outputs = true_branch(true_branch_inputs)
  false_outputs = false_branch(false_branch_inputs)
  nest.assert_same_structure(true_outputs, false_outputs)

  def scatter_outputs(true_output, false_output):
    batch_shape = tf.shape(condition)
    scattered_shape = tf.concat(
        [batch_shape, tf.shape(true_output)[tf.rank(batch_shape):]],
        0)
    true_scatter = tf.scatter_nd(
        indices=tf.cast(true_indices, tf.int32),
        updates=true_output,
        shape=scattered_shape)
    false_scatter = tf.scatter_nd(
        indices=tf.cast(false_indices, tf.int32),
        updates=false_output,
        shape=scattered_shape)
    return true_scatter + false_scatter

  result = nest.pack_sequence_as(
      structure=true_outputs,
      flat_sequence=[
          scatter_outputs(true_single_output, false_single_output)
          for true_single_output, false_single_output
          in zip(nest.flatten(true_outputs), nest.flatten(false_outputs))])
  return result
Some examples:
vector_test = slicing_where(
    condition=tf.equal(tf.range(10) % 2, 0),
    full_input=tf.range(10, dtype=tf.float32),
    true_branch=lambda x: 0.2 + x,
    false_branch=lambda x: 0.1 + x)
cross_range = (tf.range(10, dtype=tf.float32)[:, None]
               * tf.range(10, dtype=tf.float32)[None, :])
matrix_test = slicing_where(
    condition=tf.equal(tf.range(10) % 3, 0),
    full_input=cross_range,
    true_branch=lambda x: -x,
    false_branch=lambda x: x + 0.1)
with tf.Session():
  print(vector_test.eval())
  print(matrix_test.eval())
Prints:
[ 0.2 1.10000002 2.20000005 3.0999999 4.19999981 5.0999999
6.19999981 7.0999999 8.19999981 9.10000038]
[[ 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. ]
[ 0.1 1.10000002 2.0999999 3.0999999 4.0999999
5.0999999 6.0999999 7.0999999 8.10000038 9.10000038]
[ 0.1 2.0999999 4.0999999 6.0999999 8.10000038
10.10000038 12.10000038 14.10000038 16.10000038 18.10000038]
[ 0. -3. -6. -9. -12. -15.
-18. -21. -24. -27. ]
[ 0.1 4.0999999 8.10000038 12.10000038 16.10000038
20.10000038 24.10000038 28.10000038 32.09999847 36.09999847]
[ 0.1 5.0999999 10.10000038 15.10000038 20.10000038
25.10000038 30.10000038 35.09999847 40.09999847 45.09999847]
[ 0. -6. -12. -18. -24. -30.
-36. -42. -48. -54. ]
[ 0.1 7.0999999 14.10000038 21.10000038 28.10000038
35.09999847 42.09999847 49.09999847 56.09999847 63.09999847]
[ 0.1 8.10000038 16.10000038 24.10000038 32.09999847
40.09999847 48.09999847 56.09999847 64.09999847 72.09999847]
[ 0. -9. -18. -27. -36. -45.
-54. -63. -72. -81. ]]

With Numba's `guvectorize` targeted to CUDA, how do I specify a variable as both input and output?

I want to use Numba's guvectorize method to run code on my CUDA card. I first defined a CPU method
from numba import guvectorize
import numpy as np
@guvectorize(['float32[:,:], float32[:,:]',
              'float64[:,:], float64[:,:]'],
             '(n,m)->(n,m)', nopython=True, target='cpu')
def update_a_cpu(A, Anew):
    n, m = A.shape
    for j in range(1, n-1):
        for i in range(1, m-1):
            Anew[j, i] = 0.25 * (A[j, i+1] + A[j, i-1] + A[j-1, i] + A[j+1, i])
which gives the expected output for a test matrix
>>> A = np.arange(16, dtype=np.float32).reshape(4,4) # single precision for GTX card
>>> Anew = np.zeros((4,4), dtype=np.float32)
>>> res_cpu = update_a_cpu(A, Anew)
>>> print(res_cpu)
[[ 0. 0. 0. 0.]
[ 0. 5. 6. 0.]
[ 0. 9. 10. 0.]
[ 0. 0. 0. 0.]]
Actually, when targeting the CPU, Anew is mutated in place so there was no need to assign the output to res_cpu
>>> res_cpu is Anew
True
Changing the target to 'cuda' drastically changes the guvectorize behavior in a manner not documented for Generalized CUDA ufuncs. Here is the modified ufunc definition
@guvectorize(['float32[:,:], float32[:,:]',
              'float64[:,:], float64[:,:]'],
             '(n,m)->(n,m)', nopython=True, target='cuda')
def update_a_cuda(A, Anew):
    n, m = A.shape
    for j in range(1, n-1):
        for i in range(1, m-1):
            Anew[j, i] = 0.25 * (A[j, i+1] + A[j, i-1] + A[j-1, i] + A[j+1, i])
Now the function does not accept the second input matrix
>>> res_cuda = update_a_cuda(A, Anew)
...
TypeError: invalid number of input argument
and instead allocates an output matrix itself to put the values into
>>> res_cuda = update_a_cuda(A)
>>> print(res_cuda)
array([[ 1.55011636e-41, 1.55011636e-41, 1.55011636e-41, 1.55011636e-41],
[ 1.55011636e-41, 5.00000000e+00, 6.00000000e+00, 1.55011636e-41],
[ 1.55011636e-41, 9.00000000e+00, 1.00000000e+01, 1.55011636e-41],
[ 1.55011636e-41, 1.55011636e-41, 1.55011636e-41, 1.55011636e-41]], dtype=float32)
I would like the generalized ufunc to update the appropriate values of an input matrix rather than populating an empty matrix. When targeting a CUDA device, is there a way to specify a variable as both input and output?
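I am not aware of a documented way to hand the output array to a target='cuda' gufunc. As a hedged workaround sketch (my assumption, not a confirmed Numba feature, and update_a_cuda_full is a hypothetical name), you can at least write every element of the allocated output inside the kernel, copying the input first so the untouched border entries hold A's values instead of uninitialized memory:

from numba import guvectorize
import numpy as np

@guvectorize(['float32[:,:], float32[:,:]',
              'float64[:,:], float64[:,:]'],
             '(n,m)->(n,m)', target='cuda')
def update_a_cuda_full(A, Anew):       # hypothetical name, for illustration
    n, m = A.shape
    for j in range(n):                 # copy input so every output entry is defined
        for i in range(m):
            Anew[j, i] = A[j, i]
    for j in range(1, n-1):            # then apply the stencil to the interior
        for i in range(1, m-1):
            Anew[j, i] = 0.25 * (A[j, i+1] + A[j, i-1] + A[j-1, i] + A[j+1, i])

A = np.arange(16, dtype=np.float32).reshape(4, 4)
print(update_a_cuda_full(A))  # borders now carry A's original values

This does not achieve a true in-place update of A, but it avoids the garbage border values shown above.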

How to work with probabilistic classification with scikit learn SVC's?

Firstly my data looks like this:
label | instances (sentences)
    5 | 1190
    4 |  839
    3 |  239
    2 |  204
    1 |  127
Then I cross validated:
from sklearn import cross_validation

kf = cross_validation.KFold(n=len(y), n_folds=10)
for train_index, test_index in kf:
    print "\nTRAIN:\n", train_index, "\n TEST:\n", test_index
    X_train, X_test = X_combined_features[train_index], X_combined_features[test_index]
    y_train, y_test = y[train_index], y[test_index]
From the documentation I know that probabilistic metrics can be turned on as follows:
svm = SVC(probability=True)
I would like to work with probabilistic classification and SVMs, so let's assume that I read the data, then I do the following:
from sklearn.svm import SVC
svm = SVC(kernel='linear', probability=True)
svm.fit(reduced_training_matrix, y)
output_proba = svm.predict_proba(reduced_testing_matrix)
print output_proba
Then I got this:
[[ 0.06351278 0.05312154 0.07709772 ..., 0.41958171 0.00076087
0.00076095]
[ 0.05813505 0.05373973 0.08617775 ..., 0.47467149 0.00082695
0.00082701]
[ 0.05576647 0.04756668 0.08216568 ..., 0.47984425 0.00077685
0.00077693]
...,
[ 0.05983482 0.03972051 0.07636607 ..., 0.4853006 0.00070774
0.00070783]
[ 0.05813505 0.05373973 0.08617775 ..., 0.47467149 0.00082695
0.00082701]
[ 0.05989075 0.04822012 0.07795987 ..., 0.48084117 0.00073095
0.00073101]]
Several questions arose from the above exercise: What is that array output (i.e. what does it mean)? Am I doing things in the right way? If not, how should I proceed in order to use probabilistic classification with SVC?
Update:
vector_of_probabilities_for_sample= reduced_training_matrix[j,:]
print vector_of_probabilities_for_sample.toarray()
[[ 0. 0. 0. 0. 0. 0.]]
probability_of_corresponding_class = reduced_training_matrix[j,:]
print probability_of_corresponding_class.toarray()
[[ 0. 0. 0. 0. 0. 0.]]
What is that array output
The probability of each label for the corresponding sample passed to predict_proba. Every i-th column here is the probability of the corresponding class svm.classes_[i], and every j-th row is the vector of probabilities for sample reduced_testing_matrix[j,:]. Naturally, each row sums to 1.
Am I doing things in the right way?
Yes.
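As a follow-up sketch (assuming the fitted svm and output_proba from the question), you can map each row of probabilities back to the class with the highest estimated probability:

import numpy as np

# Index the classifier's class labels with the per-row argmax of the probabilities.
predicted_classes = svm.classes_[np.argmax(output_proba, axis=1)]
print(predicted_classes)           # one label per row of reduced_testing_matrix
print(output_proba.sum(axis=1))    # each row sums to 1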