The tf.sparse_to_dense() function in TensorFlow only supports the ((data, (row_ind, col_ind)), [shape=(M, N)]) format. How can I convert a standard CSR tensor (((data, indices, indptr), [shape=(M, N)])) to a dense representation in TensorFlow?
For example, given data, indices and indptr, the function should return the dense tensor.
e.g., inputs:
indices = [1 3 3 0 1 2 2 3]
indptr = [0 2 3 6 8]
data = [2 4 1 3 2 1 1 5]
expected output:
[[0, 2, 0, 4],
 [0, 0, 0, 1],
 [3, 2, 1, 0],
 [0, 0, 1, 5]]
According to the SciPy documentation, we can convert it back as follows:
the column indices for row i are stored in indices[indptr[i]:indptr[i+1]] and their
corresponding values are stored in data[indptr[i]:indptr[i+1]].
If the shape parameter is not supplied, the matrix dimensions are
inferred from the index arrays.
It is relatively easy to convert from the CSR format to COO by expanding the indptr argument to get the row indices. Here is an example using a subtraction, tf.repeat and tf.range. The shape of the final sparse tensor is inferred from the max indices in the rows and columns respectively (but can also be provided explicitly).
import tensorflow as tf

def csr_to_sparse(data, indices, indptr, dense_shape=None):
    # Number of non-zeros per row, expanded into one row index per entry.
    rep = tf.math.subtract(indptr[1:], indptr[:-1])
    row_indices = tf.repeat(tf.range(tf.size(rep)), rep)
    sparse_indices = tf.cast(tf.stack((row_indices, indices), axis=-1), tf.int64)
    if dense_shape is None:
        # Infer the shape from the largest row/column index
        # (dense_shape must be int64 for tf.SparseTensor).
        max_row = tf.math.reduce_max(row_indices)
        max_col = tf.math.reduce_max(indices)
        dense_shape = tf.cast(tf.stack([max_row + 1, max_col + 1]), tf.int64)
    return tf.SparseTensor(indices=sparse_indices, values=data, dense_shape=dense_shape)
With your example:
>>> indices = [1, 3, 3, 0, 1, 2, 2, 3]
>>> indptr = [0, 2, 3, 6, 8]
>>> data = [2, 4, 1, 3, 2, 1, 1, 5]
>>> tf.sparse.to_dense(csr_to_sparse(data, indices, indptr))
<tf.Tensor: shape=(4, 4), dtype=int32, numpy=
array([[0, 2, 0, 4],
       [0, 0, 0, 1],
       [3, 2, 1, 0],
       [0, 0, 1, 5]], dtype=int32)>
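For reference, the example can also be sanity-checked with SciPy itself (not TensorFlow, just to confirm the expected dense output above):
import numpy as np
from scipy.sparse import csr_matrix

indices = np.array([1, 3, 3, 0, 1, 2, 2, 3])
indptr = np.array([0, 2, 3, 6, 8])
data = np.array([2, 4, 1, 3, 2, 1, 1, 5])

print(csr_matrix((data, indices, indptr), shape=(4, 4)).toarray())
# [[0 2 0 4]
#  [0 0 0 1]
#  [3 2 1 0]
#  [0 0 1 5]]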
I have two NumPy arrays, each with n rows:
a = [[X1a, Y1a], [X2a, Y2a], .. , [Xna, Yna]]
b = [[X1b, Y1b], [X2b, Y2b], .. , [Xnb, Ynb]]
How can I get a new array with the Euclidean distance between each pair of corresponding rows?
c = [dis(1a, 1b), dis(2a, 2b), .. , dis(na, nb)]
or maybe
c = [[dis(1a, 1b)], [dis(2a, 2b)], .. , [dis(na, nb)]]
import math

c = []
for i in range(a.shape[0]):
    c.append(math.sqrt((a[i][0] - b[i][0])**2 + (a[i][1] - b[i][1])**2))
This will work; a.shape[0] gives the value of n.
For inputs
a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])
You will get
c = [1.4142135623730951, 2.8284271247461903, 2.8284271247461903]
I find vectorizing to be more Pythonic and faster:
import numpy as np

a = np.array(a)
b = np.array(b)
np.sqrt(np.sum((a - b)**2, axis=1))
There are loads of examples of using SciPy's cdist or pdist, or just NumPy's einsum, to calculate distances. They scale to multiple dimensions as well.
from scipy.spatial.distance import cdist
import numpy as np

a = np.array([[1., 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])
cdist(a, b)
Out[14]:
array([[ 1.414, 0.000, 2.828],
       [ 3.162, 2.828, 0.000],
       [ 5.831, 5.657, 2.828]])
or
a = np.array([[1., 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])
b = b[:, np.newaxis]  # shape (3, 1, 2), so a - b broadcasts to (3, 3, 2)
diff = a - b
np.sqrt(np.einsum('ijk,ijk->ij', diff, diff))  # sum of squares over the last axis
array([[ 1.414, 3.162, 5.831],
       [ 0.000, 2.828, 5.657],
       [ 2.828, 0.000, 2.828]])
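Note that both of the above compute the full pairwise distance matrix; if you only want the distances between corresponding rows, as asked in the question, its diagonal is what you are after:
np.diag(cdist(a, b))
# approximately array([1.414, 2.828, 2.828])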
The Euclidean distance is also known as the 2-norm. numpy.linalg.norm will calculate this efficiently across your vectors:
import numpy as np
import numpy.linalg as la

a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[2, 1], [1, 2], [3, 4]])
c = la.norm(a - b, axis=1)
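For the example arrays above, c comes out as approximately [1.414, 2.828, 2.828], matching the loop-based and vectorized results.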
Say v1 and v2 have the same number of columns. Is it possible in TensorFlow to concat v1 and the transposed version of v2 using broadcasting semantics?
For example,
v1 = tf.constant([[1,1,1,1],[3,3,3,3],[5,5,5,5]])
v2 = tf.constant([[2,2,2,2],[4,4,4,4]])
I want to produce something like
[[[[1,1,1,1], [2,2,2,2]],
  [[1,1,1,1], [4,4,4,4]]],
 [[[3,3,3,3], [2,2,2,2]],
  [[3,3,3,3], [4,4,4,4]]],
 [[[5,5,5,5], [2,2,2,2]],
  [[5,5,5,5], [4,4,4,4]]]]
that is, with v1 of shape [3, 4] and v2 of shape [2, 4], I want to do
tf.concat([v1, tf.transpose(v2)], axis=0)
and produce a [3, 2, 2, 4] tensor.
Is there any trick for doing that?
If by trick you mean an elegant solution, I don't think so. However, a working solution is to tile and repeat the incoming v1 and v2:
import tensorflow as tf

v1 = tf.constant([[1, 1, 1, 1],
                  [3, 3, 3, 3],
                  [7, 7, 7, 7],
                  [5, 5, 5, 5]])
v2 = tf.constant([[2, 2, 2, 2],
                  [6, 6, 6, 6],
                  [4, 4, 4, 4]])

def my_concat(v1, v2):
    v1_m, v1_n = v1.shape.as_list()
    v2_m, v2_n = v2.shape.as_list()
    # Repeat every row of v1 v2_m times, and tile the whole of v2 v1_m times.
    v1 = tf.concat([v1 for i in range(v2_m)], axis=-1)
    v1 = tf.reshape(v1, [v2_m * v1_m, -1])
    v2 = tf.tile(v2, [v1_m, 1])
    # Pair them up and reshape into [v1_m, v2_m, 2, v2_n].
    v1v2 = tf.concat([v1, v2], axis=-1)
    return tf.reshape(v1v2, [v1_m, v2_m, 2, v2_n])

with tf.Session() as sess:
    ret = sess.run(my_concat(v1, v2))
    print(ret.shape)
    print(ret)
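With these (4, 4) and (3, 4) example inputs, ret should come out with shape (4, 3, 2, 4), i.e. every row of v1 paired with every row of v2.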
Here's my attempt to add two more elegant solutions to this Cartesian product problem (both tested); the first one uses tf.map_fn():
import tensorflow as tf

v1 = tf.constant([[1, 1, 1, 1],
                  [3, 3, 3, 3],
                  [5, 5, 5, 5]])
v2 = tf.constant([[2, 2, 2, 2],
                  [4, 4, 4, 4]])

# For every row x of v1, stack it with every row y of v2.
cartesian_product = tf.map_fn(lambda x: tf.map_fn(lambda y: tf.stack([x, y]), v2), v1)

with tf.Session() as sess:
    print(sess.run(cartesian_product))
or this one, taking advantage of the implicit broadcasting of addition:
import tensorflow as tf

v1 = tf.constant([[1, 1, 1, 1],
                  [3, 3, 3, 3],
                  [5, 5, 5, 5]])
v2 = tf.constant([[2, 2, 2, 2],
                  [4, 4, 4, 4]])

# Expand to [3, 1, 1, 4] and [1, 2, 1, 4]; broadcasting against zeros does the pairing.
v1, v2 = v1[:, None, None, :], v2[None, :, None, :]
cartesian_product = tf.concat([v1 + tf.zeros_like(v2),
                               tf.zeros_like(v1) + v2], axis=2)

with tf.Session() as sess:
    print(sess.run(cartesian_product))
both output:
[[[[1 1 1 1]
   [2 2 2 2]]
  [[1 1 1 1]
   [4 4 4 4]]]
 [[[3 3 3 3]
   [2 2 2 2]]
  [[3 3 3 3]
   [4 4 4 4]]]
 [[[5 5 5 5]
   [2 2 2 2]]
  [[5 5 5 5]
   [4 4 4 4]]]]
as desired.
I have a tensor of shape (2, 3), say input = [[1 2 3] [4 5 6]], and an index tensor of shape (2, 2) that I want to use to retrieve values from input, index = [[1 0] [2 0]]. My expected result is result = [[2 1] [6 4]]. However, simply using tf.gather(input, index) does not seem to work.
If you want to extract elements from the array, you can use gather_nd, and the index should be of the form (i, j) for each element. In your example, the index should be:
import tensorflow as tf

inputs = tf.Variable([[1, 2, 3], [4, 5, 6]])
index = tf.Variable([[[0, 1], [0, 0]], [[1, 2], [1, 0]]])
result = tf.gather_nd(inputs, index)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
print(result.eval())
# output
# [[2 1]
#  [6 4]]
If you want to generate that index from the form you mentioned, you can do:
index = tf.Variable([[1, 0], [2, 0]])
# Prepend the row number to every column index to get (i, j) pairs.
r = tf.tile(tf.expand_dims(tf.range(tf.shape(index)[0]), 1), [1, 2])
new_index = tf.concat([tf.expand_dims(r, -1), tf.expand_dims(index, -1)], axis=2)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
print(new_index.eval())
# output
# [[[0 1]
#   [0 0]]
#  [[1 2]
#   [1 0]]]
The problem is with index: you can only use the values 0 or 1 there, because your input array has shape (2, 3) and tf.gather selects along the first axis. If you add an additional row to the input array, everything works fine:
import tensorflow as tf

input = tf.Variable([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
index = tf.Variable([[1, 0], [2, 0]])
result = tf.gather(input, index)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(result))
# results [[[4 5 6] [1 2 3]] [[7 8 9] [1 2 3]]]
Anyway, index describes which slices tf.gather takes from the input array, not which elements.
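As a side note, on more recent TensorFlow versions the original (2, 3) example can also be solved directly with the batch_dims argument of tf.gather; a minimal sketch, assuming TF 1.14+ / 2.x:
import tensorflow as tf

inputs = tf.constant([[1, 2, 3], [4, 5, 6]])
index = tf.constant([[1, 0], [2, 0]])

# batch_dims=1: row i of index picks columns out of row i of inputs.
result = tf.gather(inputs, index, axis=1, batch_dims=1)
# result == [[2, 1], [6, 4]]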
For example:
array = [[1, 2, 3], [4, 5, 6]]
slice = [[0, 0, 1], [0, 1, 2]]
output = [[1, 1, 2], [4, 5, 6]]
I've tried array[slice], but that didn't work. I also couldn't get tf.gather or tf.gather_nd to work, although these initially seemed like the correct functions to use. Note that these are all tensors in-graph.
How can I select these values in my array according to slice?
You need to add a dimension to your slice tensor, which you can do with tf.pack, and then we can use tf.gather_nd no problem.
import tensorflow as tf
tensor = tf.constant([[1, 2, 3], [4, 5, 6]])
old_slice = tf.constant([[0, 0, 1], [0, 1, 2]])
# We need to add a dimension - we need a tensor of shape (2, 3, 2) instead of (2, 3)
dims = tf.constant([[0, 0, 0], [1, 1, 1]])
new_slice = tf.pack([dims, old_slice], 2)
out = tf.gather_nd(tensor, new_slice)
If we run the following code:
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    run_tensor, run_slice, run_out = sess.run([tensor, new_slice, out])
    print('Input tensor:')
    print(run_tensor)
    print('Correct param for gather_nd:')
    print(run_slice)
    print('Output:')
    print(run_out)
This should give the correct output:
Input tensor:
[[1 2 3]
 [4 5 6]]
Correct param for gather_nd:
[[[0 0]
  [0 0]
  [0 1]]

 [[1 0]
  [1 1]
  [1 2]]]
Output:
[[1 1 2]
 [4 5 6]]
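(As an aside, tf.pack was renamed to tf.stack in TensorFlow 1.0, so on newer versions the line building new_slice would read:)
new_slice = tf.stack([dims, old_slice], 2)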
An even easier way to calculate the result, which is also more general, is to directly leverage the batch_dims argument of tf.gather:
>>> array = tf.constant([[1, 2, 3], [4, 5, 6]])
>>> slice = tf.constant([[0, 0, 1], [0, 1, 2]])
>>> output = tf.constant([[1, 1, 2], [4, 5, 6]])  # expected result, for reference
>>> tf.gather(array, slice, batch_dims=1, axis=1)
<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[1, 1, 2],
       [4, 5, 6]], dtype=int32)>