What is the best/shortest way to get the TWO columns holding the max values, after selecting the THREE row maxima of the specified rows? Given this array:
ary = np.array([[0. , 0. , 0. ],
[0.19, 0. , 0. ],
[0. , 0.29, 0. ],
[0.42, 0. , 0.13],
[0.12, 0.12, 0.13],
[0.13, 0.1 , 0.26],
[0. , 0. , 0. ],
[0. , 0.12, 0. ],
[0.25, 0. , 0.48],
[0. , 0. , 0.21]])
So the three max values at rows 3, 4, 5 are:
In [132]: np.max(ary[[3,4,5],:],axis=1)
Out[132]: array([0.42, 0.13, 0.26])
Now I have to select the columns of the two max values:
In [133]: np.argmax(ary[[3,4,5],:],axis=1)
Out[133]: array([0, 2, 2])
In this case that is element[0]=0 and element[2]=2, ignoring element[1]=2.
Is there a quicker way of getting the column indices of the max-of-max? There doesn't seem to be a direct argmax-of-max function; you always have to do max + argmax (storing the intermediate result) and then argmax on that again.
Is this correct:
np.argsort(np.max(ary[[3,4,5],:],axis=1))[::-1][:2]
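As it happens the result here is [0, 2], which matches the wanted columns, but note that this expression returns positions of rows within the subset, not column indices. A minimal two-step sketch (variable names are illustrative) that produces the column indices themselves:
rows = [3, 4, 5]
sub = ary[rows, :]
row_max = sub.max(axis=1)             # array([0.42, 0.13, 0.26])
top2 = np.argsort(row_max)[::-1][:2]  # positions (within sub) of the two largest maxima
cols = sub[top2].argmax(axis=1)       # their column indices -> array([0, 2])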
What numpy function to use for mathematical dot product in the case below?
Backpropagation for a Linear Layer
Define sample (2,3) array:
In [299]: dldx = np.arange(6).reshape(2,3)
In [300]: w
Out[300]:
array([[0.1, 0.2, 0.3],
[0. , 0. , 0. ]])
Element-wise multiplication:
In [301]: dldx*w
Out[301]:
array([[0. , 0.2, 0.6],
[0. , 0. , 0. ]])
and summing on the last axis (size 3) produces a 2 element array:
In [302]: (dldx*w).sum(axis=1)
Out[302]: array([0.8, 0. ])
Your (6) is the first term, dropping the 0. One might argue that the use of a dot/inner in (5) is a bit sloppy.
np.einsum borrows ideas from physics, where dimensions may be higher. This case can be expressed as
In [303]: np.einsum('ij,ij->i',dldx,w)
Out[303]: array([0.8, 0. ])
inner and dot do more calculations than we want. We just want the diagonal:
In [304]: np.dot(dldx,w.T)
Out[304]:
array([[0.8, 0. ],
[2.6, 0. ]])
In [305]: np.inner(dldx,w)
Out[305]:
array([[0.8, 0. ],
[2.6, 0. ]])
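Taking the diagonal explicitly would work too, at the cost of computing the off-diagonal terms we then throw away (illustrative, not part of the original session):
np.dot(dldx, w.T).diagonal()    # -> array([0.8, 0. ])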
In matmul/@ terms, the size 2 dimension is a 'batch' one, so we have to add dimensions:
In [306]: dldx[:,None,:]@w[:,:,None]
Out[306]:
array([[[0.8]],
[[0. ]]])
This is (2,1,1), so we need to squeeze out the 1s.
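For example (not part of the original session; the values follow from the batched dot above):
(dldx[:,None,:] @ w[:,:,None]).squeeze()    # -> array([0.8, 0. ])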
I want to create an M×N tensor where all elements are zeros except one random element per row, which shall be one, but I don't know how.
This is one way to do that:
import tensorflow as tf

m = 4
n = 6
dt = tf.float32

# one random column index per row: shape (m, 1), values in [0, n)
random_idx = tf.random_uniform((m, 1), maxval=n, dtype=tf.int32)
# broadcast-compare against the column range (shape (1, n)) -> boolean (m, n), then cast
result = tf.cast(tf.equal(tf.range(n)[tf.newaxis], random_idx), dtype=dt)

with tf.Session() as sess:
    print(sess.run(result))
Output:
[[ 0. 0. 0. 0. 0. 1.]
[ 0. 0. 1. 0. 0. 0.]
[ 0. 1. 0. 0. 0. 0.]
[ 0. 1. 0. 0. 0. 0.]]
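For comparison (not from the original answer), the same broadcast-and-compare trick in plain numpy:
import numpy as np
m, n = 4, 6
random_idx = np.random.randint(n, size=(m, 1))                    # one column index per row
result = (np.arange(n)[None, :] == random_idx).astype(np.float32)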
I have a list of K (x_i, y_i) pairs where 0 <= x_i < X and 0 <= y_i < Y represented as a tensor of shape [K, 2].
I want to create a tensor T of shape [K, X, Y], where T[i, x, y] = 1 if x = x_i and y = y_i, 0 otherwise.
I know that for a list of indices I can use tf.one_hot, but I am not sure if I can reuse it here; something like tf.one_hot(pairs, depth=(X,Y))?
From this SO post we get a slick way to do this in numpy:
(np.arange(a.max()) == a[...,None]-1).astype(int)
Fully using that trick, now we just have to port this to tensorflow:
# for the numpy, full credit to @Divakar and https://stackoverflow.com/questions/34987509/tensorflow-max-of-a-tensor-along-an-axis
print('first an awesome way to do it in numpy...')
a = np.array([[1,2,4],[3,1,0]])
print((np.arange(a.max()) == a[...,None]-1).astype(int))
# porting this to tensorflow...
print('\nnow in tensorflow...')
b = tf.constant([[1,2,4],[3,1,0]])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(tf.cast(tf.equal(tf.range(tf.reduce_max(b)), tf.reshape(b,[2,3,1])-1), tf.int32)))
Returns:
first an awesome way to do it in numpy...
[[[1 0 0 0]
[0 1 0 0]
[0 0 0 1]]
[[0 0 1 0]
[1 0 0 0]
[0 0 0 0]]]
now in tensorflow...
[[[1 0 0 0]
[0 1 0 0]
[0 0 0 1]]
[[0 0 1 0]
[1 0 0 0]
[0 0 0 0]]]
That was fun.
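As an aside (not from the original answer), tf.one_hot can express the same trick in one call, since out-of-range indices, like the -1 coming from the 0 entry, map to all-zero rows:
with tf.Session() as sess:
    print(sess.run(tf.one_hot(b - 1, depth=tf.reduce_max(b), dtype=tf.int32)))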
I think the best solution uses tf.sparse_to_dense. For example, if we want ones in positions (6,2), (3,4), (4,5) of a 10x8 matrix:
indices = sorted([[6,2],[3,4],[4,5]])
one_hot_encoded = tf.sparse_to_dense(sparse_indices=indices, output_shape=[10,8], sparse_values=1)
with tf.Session() as session:
    tf.global_variables_initializer().run()
    print(one_hot_encoded.eval())
This returns the following (ones at rows 3, 4 and 6, as given by the index pairs):
[[0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0]
 [0 0 0 0 1 0 0 0]
 [0 0 0 0 0 1 0 0]
 [0 0 0 0 0 0 0 0]
 [0 0 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0]]
Furthermore, the inputs (e.g. indices) might be a tf.Variable object, no need for it to be constant.
It has a couple of restrictions, namely indices must be sorted (hence the sorted above) and not repeated. You can also use tf.one_hot directly. In that case, you need the indices as two vectors of all the x before and all the y after, i.e. list(zip(*indices)). Then one can do:
import numpy as np

im_size = [10, 8]                            # output shape, as above
new_indices = np.array(list(zip(*indices)))  # shape (2, K): all the x's, then all the y's
# ij (row, column) convention, matching the example above:
flat_indices = new_indices[0] * im_size[1] + new_indices[1]
# or, for the xy convention:
# flat_indices = new_indices[1] * im_size[1] + new_indices[0]
# Apply tf.one_hot to the flattened vector, then sum along the newly created dimension
one_hot_flat = tf.reduce_sum(tf.one_hot(flat_indices, depth=np.prod(im_size)), axis=0)
# Finally reshape back to the 2D shape
one_hot_encoded = tf.reshape(one_hot_flat, im_size)

with tf.Session() as session:
    tf.global_variables_initializer().run()
    print(one_hot_encoded.eval())
This returns the same as the above. However, indices don't need to be sorted, and they can be repeated (in which case, the corresponding entry will be the number of appearances; for a simple "1" everywhere, replace tf.reduce_sum with tf.reduce_max). Also this supports variables.
However, for large indices / depths, memory consumption may be a problem. It creates a temporary N x W x H tensor, where N is the number of indices tuples, and that might get problematic. Therefore, the first solution is probably preferable, when possible.
Actually, if one is okay with using a sparse tensor, the most memory-efficient way is probably just:
sparse = tf.SparseTensor(indices=indices, values=[1]*len(indices), dense_shape=[10, 8])
When run (via session.run(sparse)), this returns a more cryptic result:
SparseTensorValue(indices=array([[3, 4],
[4, 5],
[6, 2]]), values=array([1, 1, 1], dtype=int32), dense_shape=array([10, 8]))
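If the dense form is needed after all, the TF1 API can convert it back (a follow-up sketch, not in the original answer):
dense = tf.sparse_tensor_to_dense(sparse)   # same 10x8 matrix as the first snippet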
I have a tensor named y, which has values from one-hot encoding over class labels:
y = [[ 0. 0. 1. ..., 0. 0. 0.],[ 1. 0. 0. ..., 0. 0. 0.],[ 0. 0. 0. ..., 0. 1. 0.],
...,[ 0. 0. 0. ..., 0. 0. 0.],[ 0. 0. 0. ..., 0. 0. 1.],[ 0. 0. 1. ..., 0. 0. 0.]]
So here the first row has its third element as '1', so it represents the class label
for that image.
I am trying to get all class labels from the given one-hot encoded array;
for the given example it should be something like this:
y = [2,0,8,...,9,2]
I think the simplest way is:
import numpy as np
y = np.argmax(y, axis=1)
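A quick sanity check on a small example (illustrative values):
import numpy as np
y = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
print(np.argmax(y, axis=1))   # -> [2 0 1]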
I am trying to apply DBSCAN on a dataset of (Lon, Lat) points. The algorithm is very sensitive to the parameters eps and MinPts.
I would like to have a look at a histogram over the data to determine the proper values. Unfortunately, Matplotlib's hist() takes only a 1D array.
Passing a 2D matrix as argument, hist() treats each column as a separate input.
[Figure: scatter plot and histograms]
Does anyone have a way to solve this?
If you follow the DBSCAN article, you only need the 4-nearest-neighbor distance for each object, not all pairwise distances. I.e., a 1 dimensional array.
Instead of doing a histogram, they sort the values, and try to choose a knee in this plot.
find the 4 nearest neighbor of each object
collect all 4NN distances in one array
sort this array in descending order
plot the resulting curve
look for a knee, often best at around 5%-10% of your x axis (so 95%-90% of objects are core points).
For details, see the original DBSCAN publication!
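A minimal sketch of that recipe, assuming scikit-learn is available (the post itself names no library) and with `data` standing in for the real (Lon, Lat) array:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

data = np.random.random((200, 2))               # stand-in for the real coordinates
nn = NearestNeighbors(n_neighbors=5).fit(data)  # 5 = the point itself plus 4 neighbors
dists, _ = nn.kneighbors(data)                  # dists[:, 0] is the point itself (0.0)
k_dist = np.sort(dists[:, 4])[::-1]             # 4-NN distances, sorted descending

plt.plot(k_dist)
plt.xlabel('points sorted by 4-NN distance')
plt.ylabel('4-NN distance')
plt.show()                                      # pick eps near the knee of this curve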
You could use numpy.histogram2d:
import numpy as np
np.random.seed(2016)
N = 100
arr = np.random.random((N, 2))
xedges = np.linspace(0, 1, 10)
yedges = np.linspace(0, 1, 10)
lat = arr[:, 0]
lng = arr[:, 1]
hist, xedges, yedges = np.histogram2d(lat, lng, (xedges, yedges))
print(hist)
yields
[[ 0. 0. 5. 0. 3. 0. 0. 0. 3.]
[ 0. 3. 0. 3. 0. 0. 4. 0. 2.]
[ 2. 2. 1. 1. 1. 1. 3. 0. 1.]
[ 2. 1. 0. 3. 1. 2. 1. 1. 3.]
[ 3. 0. 3. 2. 0. 1. 0. 2. 0.]
[ 3. 2. 3. 1. 1. 2. 1. 1. 0.]
[ 2. 3. 0. 1. 0. 1. 3. 0. 0.]
[ 1. 1. 1. 1. 2. 0. 2. 1. 1.]
[ 0. 1. 1. 0. 1. 1. 2. 0. 0.]]
Or to visualize the histogram:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.imshow(hist)
plt.show()
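To get the axes labeled with the actual bin edges rather than array indices, one option (a sketch using the same hist and edges as above) is pcolormesh:
fig, ax = plt.subplots()
ax.pcolormesh(xedges, yedges, hist.T)   # transpose: pcolormesh expects rows to run along y
plt.show()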