Multidimensional indexing in a numpy array

Let's suppose I have an image such that
img.shape = (280, 280, 3)
If I do img[[[1,2],[1,2],[1,2]]].shape, I obtain (2,), but I expected to obtain (2, 2, 2).
How can slicing be performed simultaneously on several dimensions in numpy?

You are using the array-of-indices (fancy indexing) syntax, which picks out individual elements, when you probably want slices.
Try something like this:
img[1:3, 1:3, 1:3]
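A minimal sketch contrasting the two behaviours (the array shape is taken from the question; np.ix_ is mentioned as an alternative in case the wanted indices are not contiguous):

import numpy as np

img = np.zeros((280, 280, 3))

# Slicing selects a rectangular block along each axis:
print(img[1:3, 1:3, 1:3].shape)            # (2, 2, 2)

# Fancy indexing with three index arrays picks individual elements,
# here the points (1, 1, 1) and (2, 2, 2):
print(img[[1, 2], [1, 2], [1, 2]].shape)   # (2,)

# np.ix_ forms the outer product of the index lists, which also
# yields the (2, 2, 2) block without relying on contiguous slices:
print(img[np.ix_([1, 2], [1, 2], [1, 2])].shape)  # (2, 2, 2)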

How to vectorize this operation in numpy?

I have a 2D array s and I want to calculate pairwise differences elementwise, i.e. d[i, j] = s[i] - s[j] for every pair of rows i and j.
Since this cannot be written as a single matrix multiplication, I was wondering what the proper way to vectorize it is.
You can use broadcasting for that: d = s[:, None, :] - s[None, :, :]. Note that None inserts a new axis of length 1; NumPy then implicitly broadcasts the two arrays against each other before subtracting.
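A small runnable sketch of the broadcasting trick (the 4x3 shape is arbitrary):

import numpy as np

s = np.arange(12.0).reshape(4, 3)

# s[:, None, :] has shape (4, 1, 3) and s[None, :, :] has shape (1, 4, 3);
# broadcasting stretches both to (4, 4, 3) before the subtraction.
d = s[:, None, :] - s[None, :, :]

print(d.shape)                            # (4, 4, 3)
print(np.allclose(d[1, 2], s[1] - s[2]))  # True: d[i, j] == s[i] - s[j]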

Why are numpy arrays called homogeneous?

Why are numpy arrays called homogeneous when you can have elements of different types in the same numpy array, like this?
np.array([1,2,3,4,"a"])
I understand that I cannot perform some broadcasting operations on it; for example, np1 * 4 results in an error here.
But my question really is: when it can have elements of different types, why is it called homogeneous?
NumPy automatically converts all the elements to the most general applicable datatype, e.g.:
>>> np.array([1,2,3,4,"a"]).dtype.type
numpy.str_
In short, this means all the elements are stored as strings.
>>> np.array([1,2,3,4]).dtype.type
numpy.int64
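To illustrate the consequence (a minimal sketch; recent NumPy versions raise a UFuncTypeError here, older versions a plain TypeError):

import numpy as np

mixed = np.array([1, 2, 3, 4, "a"])
print(mixed.dtype)   # <U21: every element is stored as a unicode string

nums = np.array([1, 2, 3, 4])
print(nums * 4)      # [ 4  8 12 16]: arithmetic works on the int array

# mixed * 4 raises an error, because multiplication is not defined
# for NumPy string dtypes.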

Check if a numpy array is a numpy masked array

As output of a script, I have a numpy masked array and a standard numpy array. How do I easily check, while running the script, whether an array is a masked one (i.e. has data and mask attributes) or not?
You can check explicitly whether it is a masked array with isinstance(arr, np.ma.MaskedArray), or you can check for the mask attribute with hasattr(arr, 'mask'). I'd generally recommend the first approach.
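A quick sketch of both checks (the example arrays are made up):

import numpy as np

plain = np.array([1.0, 2.0, 3.0])
masked = np.ma.masked_array(plain, mask=[False, True, False])

print(isinstance(masked, np.ma.MaskedArray))  # True
print(isinstance(plain, np.ma.MaskedArray))   # False

# The attribute check also works, since only masked arrays carry .mask:
print(hasattr(masked, 'mask'))  # True
print(hasattr(plain, 'mask'))   # False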

Save a numpy sparse matrix into file

I want to save the result of TfidfVectorizer from sklearn.feature_extraction.text into a text file for future use. As I found, it is a scipy.sparse matrix. However, when I try to save it using the following code
np.savetxt('Feature_TfIdf.txt', X_Tfidf, fmt='%2.6f')
I get this error:
IndexError: tuple index out of range
Use joblib.dump for this (in older scikit-learn versions it was available as sklearn.externals.joblib.dump). NumPy's own savers don't understand SciPy sparse matrices.
Simple example:
joblib.dump(tfidf, 'TfIdf.pkl')
I managed to solve the problem by converting the sparse matrix to a full matrix and then saving that. However, this approach is not practical for large arrays, so it is better to save the matrix in .pkl format as above.
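A minimal round-trip sketch, assuming joblib is installed (the corpus and file name are invented for illustration; in newer SciPy, scipy.sparse.save_npz/load_npz is another option):

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["the cat sat", "the dog barked"]
X_tfidf = TfidfVectorizer().fit_transform(corpus)  # scipy sparse matrix

# joblib serialises the sparse matrix as-is, without densifying it.
joblib.dump(X_tfidf, 'Feature_TfIdf.pkl')
X_loaded = joblib.load('Feature_TfIdf.pkl')
print((X_loaded != X_tfidf).nnz)  # 0: the round trip preserved every entry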

Should a pandas dataframe column be converted in some way before passing it to a scikit learn regressor?

I have a pandas dataframe and I am passing df[list_of_columns] as X and df[[single_column]] as Y to a random forest regressor.
What does the following warning mean, and what should be done to resolve it?
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
probas = cfr.fit(trainset_X, trainset_Y).predict(testset_X)
Simply check the shape of your Y variable: it should be a one-dimensional object, and you are probably passing something with more (possibly trivial) dimensions. Reshape it into a 1D array.
You can use df.single_column.values or df['single_column'].values to get the underlying numpy array of your series (which, in this case, should also have the correct 1D shape, as mentioned by lejlot).
Actually, the warning tells you exactly what the problem is:
you passed a 2D array that happens to have the shape (n, 1), but the method expects a 1D array of shape (n,).
Moreover, the warning tells you how to transform it to the form you need: y.values.ravel().
Using Y = df[[single_column]].values.ravel() solved the DataConversionWarning for me.
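A self-contained sketch of the fix (column names and data are invented for illustration):

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.DataFrame({'x1': np.random.rand(20),
                   'x2': np.random.rand(20),
                   'target': np.random.rand(20)})

X = df[['x1', 'x2']]                 # 2D features, shape (20, 2)
y = df['target'].values              # 1D target, shape (20,): no warning
# y = df[['target']].values.ravel()  # equivalent: flatten the (20, 1) column

RandomForestRegressor(n_estimators=10).fit(X, y)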