I find myself doing the following quite frequently and am wondering if there's a "canonical" way of doing it.
I have an ndarray, say of shape (100, 4, 6), and I want to reduce it to (100, 24) by concatenating the 4 vectors of length 6 into one vector.
I can use reshape to do this, but I've been manually computing the new shape,
i.e.
np.reshape(x, (x.shape[0], x.shape[1] * x.shape[2]))
Ideally I'd simply supply the dimension I want to reduce over, something like
np.concatenate(x, dim=-1)
but np.concatenate operates on a sequence of ndarrays. I've wondered if it's possible to supply an iterator over an ndarray axis but haven't looked further. What is the usual pattern here?
You can avoid computing one dimension yourself by passing -1, which tells reshape to infer it:
x.reshape(x.shape[0], -1)
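For concreteness, a minimal sketch using the shape from the question:

import numpy as np

x = np.arange(100 * 4 * 6).reshape(100, 4, 6)

# -1 tells reshape to infer the remaining dimension (4 * 6 = 24 here).
y = x.reshape(x.shape[0], -1)
print(y.shape)  # (100, 24)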
I have a NumPy tensor,
X = np.arange(64).reshape((4,4,4))
I wish to grab the 2nd, 3rd, and 4th entries of the first dimension of this tensor, which you can do with
Y = X[[1,2,3],:,:]
Is there a simpler way of writing this instead of explicitly writing out the indices [1,2,3]? I tried something like [1,:], which gave me an error.
Context: for my real application, the shape of the tensor is something like (30000, 100, 100), and I would like to grab everything from index 10000 to 30000 along the first dimension.
The simplest way in your case is to use X[1:4]. This selects the same entries as X[[1,2,3]], but notice that with X[1:4] you only need one pair of brackets, because 1:4 already represents a range of values.
For an N-dimensional array in NumPy, if you specify indices for fewer than N dimensions you get all elements of the remaining dimensions. That is, for N equal to 3, X[1:4] is the same as X[1:4, :, :] or X[1:4, :]. You only actually need to pass : when you want all elements of a dimension that comes before one you are indexing, as in X[:, 2:4].
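A quick check of the equivalence, as a minimal sketch with the X from the question:

import numpy as np

X = np.arange(64).reshape(4, 4, 4)

# Slicing and fancy indexing select the same entries along the first axis.
print(np.array_equal(X[1:4], X[[1, 2, 3]]))  # True
print(X[1:4].shape)  # (3, 4, 4)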
If you wish to select from some row to the end of the array, simply use Python slicing notation as below:
X[10000:, :, :]
This will select all rows from 10000 to the end of the array, and all columns and depths for them.
I'm confused by the dimensions of a tensor created with tf.zeros(n). For instance, if I write tf.zeros(6).eval().shape, this returns (6,). What dimension is this? Is this a matrix of 6 rows and an arbitrary number of columns? Or is this a matrix of 6 columns with an arbitrary number of rows?
weights = tf.random_uniform([3, 6], minval=-1, maxval=1, seed=1) - this is a 3x6 matrix
b = tf.zeros(6).eval() - I'm not sure what dimension this is.
Why am I able to add the two like weights + b? If I understand correctly, in order for the two to be added, b needs to have dimension 3x1.
Why am I able to add the two like weights + b?
The + operator is the same as using tf.add() (<obj>.__add__() calls tf.add() or tf.math.add()), and if you read the documentation it says:
NOTE: math.add supports broadcasting. AddN does not. More about broadcasting here
Now I'm quoting from the numpy broadcasting rules (which are the same for tensorflow):
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when
they are equal, or
one of them is 1
So you're able to add two tensors with different shapes because they have compatible trailing dimensions. If you change the shape of your weights tensor to, say, [3, 5], you will get an InvalidArgumentError because the trailing dimensions (5 and 6) differ.
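A minimal sketch of the broadcast, written in TensorFlow 2 eager style rather than the question's session/.eval() style:

import tensorflow as tf

weights = tf.random.uniform([3, 6], minval=-1, maxval=1, seed=1)
b = tf.zeros(6)  # shape (6,)

# b is broadcast across the 3 rows because the trailing dimensions match (6 and 6).
print((weights + b).shape)  # (3, 6)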
(6,) is Python syntax for a tuple with 6 as its single element. Hence the shape here describes a one-dimensional vector of length 6.
I am trying to implement an algorithm that iteratively removes some rows and columns of a matrix and continues processing the remaining submatrix. However, I would like to know the index of a value in the original matrix rather than the remaining submatrix.
For example, assume that a matrix x is built using
x = np.arange(9).reshape(3, 3)
Now, I would like to find the index of the element that is equal to 8 in the submatrix defined below:
np.where(x[1:, 1:] == 8)
By default, numpy returns (array([1]), array([1])) because it finds the element in the sliced submatrix. What I would like to be returned instead is (array([2]), array([2])), which is the index of 8 in the original matrix.
What is an efficient solution to this problem?
P.S.
The submatrix may be built arbitrarily. For example, I may need to keep rows 0 and 1, but columns 0 and 2.
Each submatrix may be sliced in subsequent iterations to make a smaller submatrix. I would still like to have access to the index in the original matrix. In other words, I am looking for a solution that works on submatrices of submatrices as well.
I recently learned about indexing with arrays, where submatrices of a matrix can be selected using other numpy arrays. I think what I can do to solve the problem is to map indices of the submatrix to elements of the indexing arrays.
For example, in the example above, the submatrix can be defined like this:
row_idx = np.array([1, 2])
col_idx = np.array([1, 2])
np.where(x[row_idx[:, None], col_idx] == 8)
This still returns the same (array([1]), array([1])) output, but I can use these indices to look up the elements of row_idx and col_idx in order to find the corresponding indices in the original matrix, i.e. row_idx[1] and col_idx[1].
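Putting that together, a minimal sketch of the mapping just described:

import numpy as np

x = np.arange(9).reshape(3, 3)
row_idx = np.array([1, 2])
col_idx = np.array([1, 2])

# Indices within the submatrix...
sub_rows, sub_cols = np.where(x[row_idx[:, None], col_idx] == 8)

# ...mapped back to the original matrix through the indexing arrays.
print(row_idx[sub_rows], col_idx[sub_cols])  # [2] [2]

Because the lookup composes, the same trick should work for submatrices of submatrices: index row_idx and col_idx again with the next level's selections.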
Let T be a tensor of shape [n,f], which represents a batch. Now I want to slice T into m tensors along axis=0. The value of m depends on the current batch. I have another tensor I of shape [m,2] which stores pairs of indices which indicate where the slices should occur.
I am not really sure how to "iterate" over the indices to apply tf.slice. Any ideas?
Can this somehow be achieved using tf.scan?
I suppose you are looking for the tf.split function.
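A minimal sketch, assuming the index pairs in I describe contiguous, non-overlapping slices that together cover all n rows (the names T, I, n, and m are from the question; the concrete values are made up):

import tensorflow as tf

T = tf.reshape(tf.range(12.0), [6, 2])     # n = 6, f = 2
I = tf.constant([[0, 2], [2, 3], [3, 6]])  # m = 3 index pairs

# tf.split accepts a 1-D tensor of slice sizes, so no explicit
# iteration over the index pairs is needed.
sizes = I[:, 1] - I[:, 0]
parts = tf.split(T, sizes, axis=0)         # list of m tensors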
If I have indices of shape (D_0, ..., D_k) and params of shape (D_0, ..., D_k, I, F) (with 0 ≤ indices[i_0, ..., i_k] < I), what is the fastest/most elegant way to get the array output of shape (D_0, ..., D_k, F) with
output[i_0, ..., i_k, f] = params[i_0, ..., i_k, indices[i_0, ..., i_k], f]?
If k=0, then we can use gather. So, in the past, I had a solution based on flattening. Is there a nicer solution now that tensorflow has matured?
Most of the time, when I want this type of gathering, indices is obtained via indices = tf.argmax(params[:,...,:,:,0]). For every (i_0, ..., i_k), I have I vectors of size (F,) and I want to keep only the one with the maximal value for one of the features. A solution that would only work for this special case (a kind of reduce_max that uses just one feature to decide how to reduce) would satisfy me.
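Newer TensorFlow releases support a batch_dims argument on tf.gather, which, if I'm not mistaken, covers this without flattening. A minimal sketch with made-up sizes and k = 1, including the argmax special case:

import tensorflow as tf

D0, D1, I, F = 2, 3, 4, 5  # k = 1
params = tf.random.uniform([D0, D1, I, F])

# The special case from the question: per (i_0, i_1), pick the vector
# whose first feature is maximal.
indices = tf.argmax(params[..., 0], axis=-1)  # shape (D0, D1)

# batch_dims=2 treats the leading (D0, D1) axes as batch dimensions,
# gathering one of the I vectors of size F per batch element.
output = tf.gather(params, indices, axis=2, batch_dims=2)
print(output.shape)  # (2, 3, 5)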