Scatter operation for middle dimension of a tensor - tensorflow

I have a 3d tensor where I need to preserve vectors at certain positions in the second dimension, and zero out the remaining vectors. The positions are specified as a 1d array. I'm thinking the best way to do this is to multiply the tensor with a binary mask.
Here's a simple Numpy version:
A.shape: (b, n, m)
indices.shape: (b)
mask = np.zeros(A.shape)
for i in range(b):
    mask[i][indices[i]] = 1
result = A*mask
So for each nxm matrix in A, I need to preserve rows specified by indices, and zero out the rest.
I'm trying to do this in TensorFlow using tf.scatter_nd op, but I can't figure out the correct shape of indices:
shape = tf.constant([3,5,4])
A = tf.random_normal(shape)
indices = tf.constant([2,1,4]) #???
updates = tf.ones((3,4))
mask = tf.scatter_nd(indices, updates, shape)
result = A*mask

Here's one way to do it, creating a mask and using tf.where:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()
shape = tf.constant([3,5,4])
A = tf.random_normal(shape)
array_shape = tf.shape(A)
indices = tf.constant([2,1,4])
non_zero_indices = tf.stack((tf.range(array_shape[0]), indices), axis=1)
should_keep_row = tf.scatter_nd(non_zero_indices, tf.ones_like(indices),
                                shape=[array_shape[0], array_shape[1]])
print("should_keep_row", should_keep_row)
masked = tf.where(tf.cast(tf.tile(should_keep_row[:, :, None],
                                  [1, 1, array_shape[2]]), tf.bool),
                  A,
                  tf.zeros_like(A))
print("masked", masked)
Prints:
should_keep_row tf.Tensor(
[[0 0 1 0 0]
 [0 1 0 0 0]
 [0 0 0 0 1]], shape=(3, 5), dtype=int32)
masked tf.Tensor(
[[[ 0.          0.          0.          0.        ]
  [ 0.          0.          0.          0.        ]
  [ 0.02036316 -0.07163608 -3.16707373  1.31406844]
  [ 0.          0.          0.          0.        ]
  [ 0.          0.          0.          0.        ]]

 [[ 0.          0.          0.          0.        ]
  [-0.76696759 -0.28313264  0.87965059 -1.28844094]
  [ 0.          0.          0.          0.        ]
  [ 0.          0.          0.          0.        ]
  [ 0.          0.          0.          0.        ]]

 [[ 0.          0.          0.          0.        ]
  [ 0.          0.          0.          0.        ]
  [ 0.          0.          0.          0.        ]
  [ 0.          0.          0.          0.        ]
  [ 1.03188455  0.44305769  0.71291149  1.59758031]]], shape=(3, 5, 4), dtype=float32)
(The example uses eager execution, but the same ops work with graph execution in a Session.)
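As an aside, here is a minimal sketch (not from the original answer, reusing the tensors defined above) of the multiplicative-mask route the question originally aimed at: tf.scatter_nd wants indices of shape (b, 2), pairing each batch index with the row to keep, and the resulting (b, n) mask then broadcasts against A:
batch_ids = tf.range(array_shape[0])
scatter_idx = tf.stack([batch_ids, indices], axis=1)              # shape (b, 2)
updates = tf.ones_like(indices, dtype=A.dtype)                    # one 1 per kept row
row_mask = tf.scatter_nd(scatter_idx, updates,
                         shape=[array_shape[0], array_shape[1]])  # shape (b, n)
result = A * row_mask[:, :, None]                                 # broadcast over the last dim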

Related

How can the numpy computing sequence influence the result?

Why do the following two lines of code compute the same thing, but give different results?
kernel1 = np.diag(np.exp(-scale*eigen_values))
kernel2 = np.exp(-scale*np.diag(eigen_values))
The check
np.all(kernel1 == kernel2)
outputs
False
Look at the values! Then you'll see the problem: when given a 1-d array, numpy.diag creates a 2-d array with zeros in the off-diagonal positions. In kernel1, you do diag last, so the off-diagonal values are 0. In kernel2, you apply exp after diag, and exp(0) is 1, so in kernel2, the off-diagonal terms are all 1. (Remember that numpy.exp is applied element-wise; it is not the matrix exponential.)
In [19]: eigen_values = np.array([1, 0.5, 0.1])
In [20]: scale = 1.0
In [21]: np.diag(np.exp(-scale*eigen_values))
Out[21]:
array([[0.36787944, 0.        , 0.        ],
       [0.        , 0.60653066, 0.        ],
       [0.        , 0.        , 0.90483742]])
In [22]: np.exp(-scale*np.diag(eigen_values))
Out[22]:
array([[0.36787944, 1.        , 1.        ],
       [1.        , 0.60653066, 1.        ],
       [1.        , 1.        , 0.90483742]])
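A quick follow-up check (an addition to the example above) confirms that the diagonals agree and only the off-diagonal entries differ:
In [23]: kernel1 = np.diag(np.exp(-scale*eigen_values))
In [24]: kernel2 = np.exp(-scale*np.diag(eigen_values))
In [25]: np.allclose(np.diag(kernel1), np.diag(kernel2))
Out[25]: True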

Data standardization, across samples or across features?

I have 4 samples with 5 features each, as an array data.
import numpy as np
data = np.array([[1,1,1,1,0],
                 [0,0,0,0,0],
                 [1,1,1,1,0],
                 [1,0,0,0,0]])
print (data)
n_samples, n_features = data.shape  # (4, 5)
When I apply StandardScaler on it as follows, does it standardize the data across features or across samples?
from sklearn.preprocessing import StandardScaler, MinMaxScaler
result = StandardScaler().fit_transform(data)
print (result)
[[ 0.57735027  1.          1.          1.          0.        ]
 [-1.73205081 -1.         -1.         -1.          0.        ]
 [ 0.57735027  1.          1.          1.          0.        ]
 [ 0.57735027 -1.         -1.         -1.          0.        ]]
What's the best practice of data standardization in machine learning, across samples or across features?
In the case of StandardScaler/MinMaxScaler the data are scaled across features (each column independently), and this is the common best practice.
import numpy as np
from sklearn.preprocessing import StandardScaler
data = np.array([[1,1,1,1,0],
                 [0,0,0,0,0],
                 [1,1,1,1,0],
                 [1,0,0,0,0]])
result = StandardScaler().fit_transform(data)
result
array([[ 0.57735027,  1.        ,  1.        ,  1.        ,  0.        ],
       [-1.73205081, -1.        , -1.        , -1.        ,  0.        ],
       [ 0.57735027,  1.        ,  1.        ,  1.        ,  0.        ],
       [ 0.57735027, -1.        , -1.        , -1.        ,  0.        ]])
You can verify it yourself:
(data - data.mean(0))/data.std(0).clip(1e-5)
array([[ 0.57735027,  1.        ,  1.        ,  1.        ,  0.        ],
       [-1.73205081, -1.        , -1.        , -1.        ,  0.        ],
       [ 0.57735027,  1.        ,  1.        ,  1.        ,  0.        ],
       [ 0.57735027, -1.        , -1.        , -1.        ,  0.        ]])
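For contrast (a sketch, not part of the original answer), standardizing across samples would normalize each row instead, which is rarely what you want for tabular data because each feature would lose its own scale:
(data - data.mean(1, keepdims=True)) / data.std(1, keepdims=True).clip(1e-5)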

Possible tensorflow cholesky_solve inconsistency?

I am trying to solve a linear system of equations using tensorflow.cholesky_solve and I'm getting some unexpected results.
I wrote a script to compare the output of a very simple linear system with simple matrix inversion a la tensorflow.matrix_inverse, the non-cholesky based matrix equation solver tensorflow.matrix_solve, and tensorflow.cholesky_solve.
According to my understanding of the docs I've linked, these three cases should all yield a solution of the identity matrix divided by 2, but this is not the case for tensorflow.cholesky_solve. Perhaps I'm misunderstanding the docs?
import tensorflow as tf
I = tf.eye(2, dtype=tf.float32)
X = 2 * tf.eye(2, dtype=tf.float32)
X_inv = tf.matrix_inverse(X)
X_solve = tf.matrix_solve(X, I)
X_chol_solve = tf.cholesky_solve(tf.cholesky(X), I)
with tf.Session() as sess:
    for x in [X_inv, X_solve, X_chol_solve]:
        print('{}:\n{}'.format(x.name, sess.run(x)))
        print
yielding output:
MatrixInverse:0:
[[ 0.5  0. ]
 [ 0.   0.5]]
MatrixSolve:0:
[[ 0.5  0. ]
 [ 0.   0.5]]
cholesky_solve/MatrixTriangularSolve_1:0:
[[ 1.  0.]
 [ 0.  1.]]
Process finished with exit code 0
I think it's a bug. Notice how the result doesn't even depend on the RHS, unless RHS = 0, in which case you get nan instead of 0. Please report it on GitHub.
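For reference (not part of the original answer), the expected solution can be cross-checked with SciPy's Cholesky solver:
import numpy as np
from scipy.linalg import cho_factor, cho_solve

X = 2 * np.eye(2, dtype=np.float32)
I = np.eye(2, dtype=np.float32)
print(cho_solve(cho_factor(X), I))
# [[0.5 0. ]
#  [0.  0.5]]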

Removing all but last non-zero sequence from numpy array

The problem
I have a 1-dimensional numpy array filled mostly with zeros but also containing some groups of non-zero values.
>> import numpy as np
>> a = np.zeros(10)
>> a[2:4] = 2
>> a[6:9] = 3
>> print a
[ 0. 0. 2. 2. 0. 0. 3. 3. 3. 0.]
I want to get the array that contains only the last non-zero group. In other words, all but the last non-zero group should be replaced by zeros. (The groups could be only 1 element long). Like so:
[ 0. 0. 0. 0. 0. 0. 3. 3. 3. 0.]
Non-robust solution
This seems to do the trick. Reverse the array and find the first index where the change between elements is negative. Then replace all subsequent elements with zero. Then flip back. It's a bit long-winded:
>> b = a[::-1]
>> b[np.where(np.ediff1d(b) < 0)[0][0] + 1:] = 0
>> c = b[::-1]
>> print c
[ 0. 0. 0. 0. 0. 0. 3. 3. 3. 0.]
Fails for a specific case
However, it is not robust and fails in the following case (because the where command returns an empty list of indices):
>> a = np.zeros(10)
>> a[0:4] = 2
>> print a
[ 2. 2. 2. 2. 0. 0. 0. 0. 0. 0.]
>> b = a[::-1]
>> b[np.where(np.ediff1d(b) < 0)[0][0] + 1:] = 0
>> c = b[::-1]
>> print c
Traceback (most recent call last):
File "<ipython-input-81-8cba57558ba8>", line 1, in <module>
runfile('C:/Users/name/test1.py', wdir='C:/Users/name')
File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/name/test1.py", line 21, in <module>
b[np.where(np.ediff1d(b) < 0)[0][0] + 1:] = 0
IndexError: index 0 is out of bounds for axis 0 with size 0
Fix
So I need to introduce an if clause:
>> b = a[::-1]
>> if len(np.where(np.ediff1d(b) < 0)[0]) > 0:
>>     b[np.where(np.ediff1d(b) < 0)[0][0] + 1:] = 0
>> c = b[::-1]
>> print c
[ 2. 2. 2. 2. 0. 0. 0. 0. 0. 0.]
Is there a more elegant way to do it?
UPDATE
Following on from Divakar's excellent answer and mtrw's question, I would like to extend the specification. The method should also work if the input array has non-zero values that are negative and for groups of non-zero numbers that change within the grouping.
e.g. np.array([1, 0, 0, 4, 5, 4, 5, 0, 0])
This means methods where we check for a positive or negative difference between elements, in order to find the group boundaries, would not work so well.
Approach #1
Since we are after elegance, let's feed ourselves a one-liner -
a[:(a[1:] > a[:-1]).cumsum().argmax()] = 0
Sample run -
In [605]: a
Out[605]: array([ 0., 0., 2., 2., 0., 0., 3., 3., 3., 0.])
In [606]: a[:(a[1:] > a[:-1]).cumsum().argmax()] = 0
In [607]: a
Out[607]: array([ 0., 0., 0., 0., 0., 0., 3., 3., 3., 0.])
Approach #2
The above approach assumes that the numbers in the last group are greater than 0. If that's not the case, or if a non-zero group contains varying values, let's feed one more line to have a generic solution -
mask = a != 0
a[:(mask[1:] > mask[:-1]).cumsum().argmax()] = 0
Sample run -
In [667]: a
Out[667]: array([-1, 0, 0, -4, -5, 4, -5, 0, 0])
In [668]: mask = a != 0
In [669]: a[:(mask[1:] > mask[:-1]).cumsum().argmax()] = 0
In [670]: a
Out[670]: array([ 0, 0, 0, -4, -5, 4, -5, 0, 0])
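As a quick sanity check (added here, not from the original answer), the generic version also handles the single-group edge case that broke the ediff1d approach; since the only group is already the last one, nothing gets zeroed out:
In [671]: a = np.array([2., 2., 2., 2., 0., 0., 0., 0., 0., 0.])
In [672]: mask = a != 0
In [673]: a[:(mask[1:] > mask[:-1]).cumsum().argmax()] = 0
In [674]: a
Out[674]: array([ 2., 2., 2., 2., 0., 0., 0., 0., 0., 0.])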

Why are the convolution outputs calculated with theano and numpy not the same?

I made a simple example ipython notebook to calculate convolution with theano and with numpy, however the results are different. Does anybody know where the mistake is?
import theano
import numpy
from theano.sandbox.cuda import dnn
import theano.tensor as T
Define the input image x0:
x0 = numpy.array([[[[  7.61323881,   0.        ,   0.        ,   0.        ,   0.        ,   0.        ],
                    [ 25.58142853,   0.        ,   0.        ,   0.        ,   0.        ,   0.        ],
                    [  7.51445341,   0.        ,   0.        ,   0.        ,   0.        ,   0.        ],
                    [  0.        ,  12.74498367,   4.96315479,   0.        ,   0.        ,   0.        ],
                    [  0.        ,   0.        ,   0.        ,   0.        ,   0.        ,   0.        ],
                    [  0.        ,   0.        ,   0.        ,   0.        ,   0.        ,   0.        ]]]],
                  dtype='float32')
x0.shape
# (1, 1, 6, 6)
Define the convolution kernel:
w0 = numpy.array([[[[-0.0015835 , -0.00088091,  0.00226375,  0.00378434,  0.00032208, -0.00396959],
                    [-0.000179  ,  0.00030951,  0.00113849,  0.00012536, -0.00017198, -0.00318825],
                    [-0.00263921, -0.00383847, -0.00225416, -0.00250589, -0.00149073, -0.00287099],
                    [-0.00149283, -0.00312137, -0.00431571, -0.00394508, -0.00165113, -0.0012118 ],
                    [-0.00167376, -0.00169753, -0.00373235, -0.00337372, -0.00025546,  0.00072154],
                    [-0.00141197, -0.00099017, -0.00091934, -0.00226817, -0.0024105 , -0.00333713]]]],
                  dtype='float32')
w0.shape
# (1, 1, 6, 6)
Calculate the convolution with theano and cudnn:
X = T.tensor4('input')
W = T.tensor4('W')
conv_out = dnn.dnn_conv(img=X, kerns=W)
convolution = theano.function([X, W], conv_out)
numpy.array(convolution(x0, w0))
# array([[[[-0.04749081]]]], dtype=float32)
Calculate convolution with numpy (note the result is different):
numpy.sum(x0 * w0)
# -0.097668208
I'm not exactly sure what kind of convolution you are trying to compute, but it seems to me that numpy.sum(x0*w0) might not be the way to do it. Does this help?
import numpy as np
# ... define x0 and w0 like in your example ...
np_convolution = np.fft.irfftn(np.fft.rfftn(x0) * np.fft.rfftn(w0))
The last element of the resulting array, i.e. np_convolution[-1,-1,-1,-1] is -0.047490807560833327, which seems to be the answer you're looking for in your notebook.
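To make the kernel flip explicit (a sketch added here, assuming SciPy is available; it is not part of the original answer): a true 2-D convolution flips the kernel before summing, whereas numpy.sum(x0 * w0) is the unflipped sum of products, i.e. a correlation:
from scipy.signal import convolve2d, correlate2d

# 'valid' mode on two equal-sized 6x6 arrays yields a single value each.
conv_val = convolve2d(x0[0, 0], w0[0, 0], mode='valid')   # kernel flipped: ~ -0.0475, matches the Theano/cuDNN output
corr_val = correlate2d(x0[0, 0], w0[0, 0], mode='valid')  # no flip: ~ -0.0977, equals numpy.sum(x0 * w0)
print(conv_val, corr_val)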