Hi, I have two arrays: y_pred with shape (64, 7, 5), which holds the predictions of 5 classifiers for the 7 neighbours of each of the 64 samples in the dataset, and y_true with shape (64, 7), which holds the true labels of those 7 neighbours. I need to compute the Jaccard index using sklearn.metrics.jaccard_score. The Jaccard index needs to be calculated and saved into an array:
jacc_index = np.empty((np.size(neighbors, axis=0), self.n_classifiers))  # shape: (64, 5)
I am using two nested for loops to calculate the Jaccard index:
for i in np.arange(np.size(neighbors, axis=0)):
    for index_clf, clf in enumerate(self.classifiers_):
        jacc_index[i, index_clf] = jaccard_score(y_true[i], y_pred[i, :, index_clf])
Question
With NumPy's advanced array programming it should be possible to remove the for loops. I am not an expert in Python programming and am wondering whether these nested loops can be replaced with a specific NumPy command.
Your expert suggestion is really appreciated.
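One possible way to drop both loops is sketched below. It assumes the labels are binary (0/1) and that jaccard_score is used with its default average='binary' setting, in which case the index is the number of positions where both arrays are 1 divided by the number of positions where either is 1. Both counts can be computed for all samples and classifiers at once by broadcasting y_true against the classifier axis. The function name jaccard_binary_batched and the random test data are hypothetical, and returning 0 for an empty union is a choice made here to mirror the default behaviour of jaccard_score.

```python
import numpy as np

def jaccard_binary_batched(y_true, y_pred):
    """Jaccard index of the positive class (label 1), without Python loops.

    y_true: (n_samples, n_neighbors)
    y_pred: (n_samples, n_neighbors, n_classifiers)
    returns: (n_samples, n_classifiers)
    """
    t = (y_true == 1)[:, :, None]        # (n, k, 1), broadcasts over classifiers
    p = (y_pred == 1)                    # (n, k, c)
    intersection = (t & p).sum(axis=1)   # (n, c)
    union = (t | p).sum(axis=1)          # (n, c)
    out = np.zeros(union.shape, dtype=float)
    # Divide only where the union is non-empty; other entries stay 0.
    np.divide(intersection, union, out=out, where=union > 0)
    return out

# Made-up data with the shapes from the question.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(64, 7))
y_pred = rng.integers(0, 2, size=(64, 7, 5))
jacc_index = jaccard_binary_batched(y_true, y_pred)
print(jacc_index.shape)  # (64, 5)
```

For multiclass labels or other averaging modes the counts would have to be built differently, so this is only a starting point.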
Related
I find myself doing the following quite frequently and am wondering if there's a "canonical" way of doing it.
I have an ndarray, say with shape (100, 4, 6), and I want to reduce it to (100, 24) by concatenating the 4 vectors of length 6 into one vector.
I can use reshape to do this but I've been manually computing the new shape
i.e.
np.reshape(x, (x.shape[0], x.shape[1] * x.shape[2]))
ideally I'd simply supply the dimension I want to reduce on
np.concatenate(x,dim=-1)
but np.concatenate operates on an enumerable of ndarray. I've wondered if it's possible to supply an iterator over an ndarray axis but haven't looked further. What is the usual pattern here?
You can avoid calculating one dimension by using -1 like:
x.reshape(x.shape[0], -1)
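As a quick check of the -1 idiom, here is a small sketch using a made-up array with the shape from the question. It also compares one flattened row against np.concatenate applied to the corresponding (4, 6) block, to confirm that the row order is what the question asks for.

```python
import numpy as np

# Made-up data with the shape from the question.
x = np.arange(100 * 4 * 6).reshape(100, 4, 6)

# -1 tells NumPy to infer the remaining dimension (here 4 * 6 = 24).
flat = x.reshape(x.shape[0], -1)
print(flat.shape)  # (100, 24)

# Each row is the 4 length-6 vectors laid end to end.
same = np.array_equal(flat[0], np.concatenate(x[0]))
print(same)  # True
```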
I'm confused by the dimension of a tensor created with tf.zeros(n). For instance, if I write tf.zeros(6).eval().shape, this returns (6,). What dimension is this? Is this a matrix of 6 rows and an arbitrary number of columns? Or is this a matrix of 6 columns with an arbitrary number of rows?
weights = tf.random_uniform([3, 6], minval=-1, maxval=1, seed=1) - this is a 3x6 matrix
b = tf.zeros(6).eval() - I'm not sure what dimension this is.
Why am I able to add the two like weights + b? If I understand correctly, in order for the two to be added, b needs to have dimension 3x1.
Why am I able to add the two like weights + b?
The + operator is the same as using tf.add() (<obj>.__add__() calls tf.add(), also available as tf.math.add()), and the documentation says:
NOTE: math.add supports broadcasting. AddN does not. More about broadcasting here
Now I'm quoting from numpy broadcasting rules (which are the same for tensorflow):
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when
they are equal, or
one of them is 1
So you are able to add two tensors with different shapes because their trailing dimensions match (both are 6), and b is broadcast across the 3 rows of weights. If you change the shape of your weights tensor to, say, [3, 5], you will get an InvalidArgumentError exception because the trailing dimensions differ (5 vs. 6).
(6,) is Python syntax for a tuple with 6 as its single element. Hence the shape here is that of a one-dimensional vector of length 6, which has neither rows nor columns.
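The same behaviour can be reproduced in plain NumPy, which the answer above says follows the same broadcasting rules. The sketch below uses NumPy arrays as stand-ins for the tensors in the question (an assumption made to keep it self-contained). Note that NumPy reports the shape mismatch as a ValueError rather than a TensorFlow exception.

```python
import numpy as np

weights = np.ones((3, 6))
b = np.zeros(6)            # shape (6,): a single axis of length 6
print(b.shape, b.ndim)     # (6,) 1

# Trailing dimensions match (6 and 6), so b is stretched across the 3 rows.
total = weights + b
print(total.shape)         # (3, 6)

# Trailing dimensions 5 and 6 are neither equal nor 1, so this fails.
try:
    np.ones((3, 5)) + b
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)  # ValueError
```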
I have an array a with shape [3,x,y,z,n] (three 4d-images). And a second array b with shape [x,y,z] which contains the indices I want to choose from the first dimension of a (so the values of b are in the range 0 to 2).
The results I want to have would be of shape [x,y,z,n]. How can I do that in numpy?
Using advanced-indexing -
a[b,np.arange(x)[:,None,None],np.arange(y)[:,None],np.arange(z)]
A shorter way to express that would be -
a[(b, *np.ogrid[:x,:y,:z])]
Using the NumPy builtin np.take_along_axis, which performs the advanced indexing and does the dirty work under the hood -
np.take_along_axis(a,b[None,...,None],axis=0)[0]
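To check that these indexing expressions select the same elements, the following sketch compares them against an explicit triple loop. The sizes and random data are made up for illustration, and the ogrid variant unpacks the grids into a tuple so that it does not depend on whether np.ogrid returns a list or a tuple.

```python
import numpy as np

# Made-up sizes and data for illustration.
x, y, z, n = 2, 3, 4, 5
rng = np.random.default_rng(0)
a = rng.random((3, x, y, z, n))
b = rng.integers(0, 3, size=(x, y, z))

# Reference result: pick a[b[i, j, k], i, j, k, :] one voxel at a time.
expected = np.empty((x, y, z, n))
for i in range(x):
    for j in range(y):
        for k in range(z):
            expected[i, j, k] = a[b[i, j, k], i, j, k]

out_index = a[b, np.arange(x)[:, None, None], np.arange(y)[:, None], np.arange(z)]
out_ogrid = a[(b, *np.ogrid[:x, :y, :z])]
out_take = np.take_along_axis(a, b[None, ..., None], axis=0)[0]

print(out_index.shape)  # (2, 3, 4, 5)
```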
I have Variables lengths_X of size (10L,) and A of size (10L, 16L, 5L).
I want to use lengths_X to index along the second axis of A. In other words, I want to get a new tensor predicted_Y of size (10L, 5L) that indexes axis 1 at i for all entries with index i in axis 0.
What is the best way to do this in PyTorch?
What you are looking for is actually called batched_index_select and I looked for such functionality before but couldn't find any native function in PyTorch that can do the job. But we can simply use:
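import numpy
import torch
A = torch.randn(10, 16, 5)
index = torch.from_numpy(numpy.random.randint(0, 16, size=10))
# Pick row index[i] from the i-th (16, 5) slice, then stack into shape (10, 5)
B = torch.stack([a[i] for a, i in zip(A, index)])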
You can see the discussion here. You can also check out the function batched_index_select provided in the AllenNLP library. I would be happy to know if there is a better solution.
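Another option is to pair a range over axis 0 with the index array, so that entry i of the result is A[i, index[i]]. The sketch below shows the pattern in NumPy, swapped in here to keep the example dependency-light, with made-up data. PyTorch tensors support the same style of integer-array indexing, so the analogous expression there would be A[torch.arange(A.size(0)), index].

```python
import numpy as np

# Made-up data with the shapes from the question.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 16, 5))
index = rng.integers(0, 16, size=10)

# Integer-array indexing: the two index arrays are paired element by element,
# so B[i] == A[i, index[i]] and B has shape (10, 5).
B = A[np.arange(A.shape[0]), index]

# Loop-and-stack version from the answer above, for comparison.
B_loop = np.stack([a[i] for a, i in zip(A, index)])
print(B.shape)  # (10, 5)
```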
Silly question: I am going through the third week of Andrew Ng's newest Deep Learning course and am stuck on a fairly simple NumPy function (I think?).
The exercise is to find how many training examples, m, we have.
Any idea which NumPy function returns the size of a preloaded training set?
Thanks!
shape_X = X.shape
shape_Y = Y.shape
m = ?
print ('The shape of X is: ' + str(shape_X))
print ('The shape of Y is: ' + str(shape_Y))
print ('I have m = %d training examples!' % (m))
It depends on what kind of storage-approach you use.
Most python-based tools use the [n_samples, n_features] approach where the first dimension is the sample-dimension, the second dimension is the feature-dimension (like in scikit-learn and co.). Alternatively expressed: samples are rows and features are columns.
So:
#     feature  1  2  3  4
x = np.array([[1, 2, 3, 4],   # first sample
              [2, 3, 4, 5],   # second sample
              [3, 4, 5, 6]])  # third sample
is a training-set of 3 samples with 4 features each.
The sizes M,N (again: interpretation might be different for others) you can get with:
M, N = x.shape
because NumPy's first dimension indexes rows and its second dimension indexes columns, as in matrix algebra.
For the above example, the target-array is of shape (M) = n_samples.
If you want the total size of an array, you can use
m = X.size
This gives the total number of elements in the array (the product of all its dimensions), which equals the number of examples only when each example holds a single value. In this case, the number of examples is 400.
The above is therefore not the best way to find the number of examples, since it counts every element rather than the length of a single axis.
A better way, when the examples are stored as columns, is
m = X.shape[1]
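To illustrate the difference between size and shape, here is a small sketch. The shapes below are assumptions chosen for illustration (2 features and 400 examples stored as columns, with one label per example); the actual arrays in the exercise may differ.

```python
import numpy as np

# Assumed layout for illustration: one column per training example.
X = np.zeros((2, 400))   # 2 features, 400 examples
Y = np.zeros((1, 400))   # 1 label per example

print(X.size)      # 800  (2 * 400, every element counted)
print(Y.size)      # 400  (matches only because Y has a single row)
print(X.shape[1])  # 400  (length of the example axis)

m = X.shape[1]
print('I have m = %d training examples!' % m)
```

With the [n_samples, n_features] layout described earlier, the example axis would be the first one, so the count would be X.shape[0] instead.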