numpy array.shape behaviour - numpy

For the following:
d = np.array([[0,1,4,3,2],[10,18,4,7,5]])
print(d.shape)
Output is:
(2, 5)
This is expected.
But for this (the rows have different numbers of elements):
d = np.array([[0,1,4,3,2],[10,18,4,7]])
print(d.shape)
Output is:
(2,)
How to explain this behaviour?

Short answer: It parses it as an array of two objects: two lists.
NumPy is designed to process "rectangular" data. If you pass it non-rectangular data, the np.array(..) function falls back to treating the input as a one-dimensional array of objects.
Indeed, take a look at the dtype of the array here:
>>> d
array([list([0, 1, 4, 3, 2]), list([10, 18, 4, 7])], dtype=object)
It is a one-dimensional array that contains two items: two lists. These lists are simply objects, so the shape is (2,).
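On recent NumPy releases (1.24 and later, as far as I know), ragged input like this raises a ValueError unless dtype=object is requested explicitly. Here is a minimal sketch of both the object array and one way to get rectangular data back. The names rows, ragged and padded, and the fill value -1, are illustrative choices, not part of the question:

```python
import numpy as np

rows = [[0, 1, 4, 3, 2], [10, 18, 4, 7]]

# Ask for an object array explicitly; newer NumPy versions refuse to
# build one implicitly from ragged input.
ragged = np.array(rows, dtype=object)
print(ragged.shape)   # (2,)
print(ragged.dtype)   # object

# One possible way back to rectangular data: pad the short rows with a
# fill value (-1 here is an arbitrary assumption).
width = max(len(r) for r in rows)
padded = np.array([r + [-1] * (width - len(r)) for r in rows])
print(padded.shape)   # (2, 5)
```

Whether padding is appropriate depends on what the missing entries should mean in your data.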

Related

Numpy multi-dimensional array slicing

I have a 3-D NumPy array with shape (100, 50, 20). I was trying to slice the third dimension of the array by using the index, e.g., from 1 to 6 and from 8 to 10.
I tried the following code, but it kept reporting a syntax error.
newarr [:,:,1:10] = oldarr[:,:,[1:7,8:11]]
You can use np.r_ to concatenate slice objects:
newarr [:,:,1:10] = oldarr[:,:,np.r_[1:7,8:11]]
Example:
np.r_[1:4,6:8]
array([1, 2, 3, 6, 7])
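Applied to an array of the shape from the question, this could look like the sketch below. The contents of oldarr and the zero-filled newarr are made up for illustration:

```python
import numpy as np

# Stand-in data with the shape from the question
oldarr = np.arange(100 * 50 * 20).reshape(100, 50, 20)

# np.r_ expands the slices into a single integer index array
idx = np.r_[1:7, 8:11]
print(idx)                      # [ 1  2  3  4  5  6  8  9 10]

# Integer-array indexing on the last axis returns a copy with 9 entries
picked = oldarr[:, :, idx]
print(picked.shape)             # (100, 50, 9)

newarr = np.zeros_like(oldarr)
newarr[:, :, 1:10] = picked     # 1:10 also spans 9 positions
```

Note that the target slice must have the same length as the index array (9 here), otherwise the assignment fails with a broadcasting error.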

How to find matrix common members of matrices in Numpy

I have a 2D matrix A and a vector B. I want to find all row indices of elements in A that are also contained in B.
A = np.array([[1,9,5], [8,4,9], [4,9,3], [6,7,5]], dtype=int)
B = np.array([2, 4, 8, 10, 12, 18], dtype=int)
My current solution is only to compare A to one element of B at a time but that is horribly slow:
res = np.array([], dtype=int)
for i in range(B.shape[0]):
    cres, _ = (B[i] == A).nonzero()
    res = np.append(res, cres)
res = np.unique(res)
The following Matlab statement would solve my issue:
find(any(reshape(any(reshape(A, prod(size(A)), 1) == B, 2),size(A, 1),size(A, 2)), 2))
However, comparing a row and a column vector in Numpy does not create a Boolean intersection matrix as it does in Matlab.
Is there a proper way to do this in Numpy?
We can use np.isin masking.
To get all the row numbers, it would be -
np.where(np.isin(A,B).T)[1]
If you need them split based on each element's occurrence -
[np.flatnonzero(i) for i in np.isin(A,B).T if i.any()]
Posted MATLAB code seems to be doing broadcasting. So, an equivalent one would be -
np.where(B[:,None,None]==A)[1]
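The expressions above return a row index once per match, so a row can appear several times. If only the distinct row indices are wanted (as in the np.unique step of the question), a sketch along these lines should work. The names mask, rows and bcast are illustrative:

```python
import numpy as np

A = np.array([[1, 9, 5], [8, 4, 9], [4, 9, 3], [6, 7, 5]], dtype=int)
B = np.array([2, 4, 8, 10, 12, 18], dtype=int)

# True wherever an element of A also occurs in B (same shape as A)
mask = np.isin(A, B)

# Rows of A containing at least one such element, each listed once
rows = np.flatnonzero(mask.any(axis=1))
print(rows)   # [1 2]

# The same mask built with explicit broadcasting, for comparison
bcast = (A[:, :, None] == B).any(axis=-1)
```

The broadcasting variant allocates a temporary of shape A.shape + (len(B),), so np.isin is usually the safer choice for large inputs.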

Slicing a numpy array and passing the slice to a function

I want to have a function that can operate on either a row or a column of a 2D ndarray. Assume the array has C order. The function changes values in the 2D data.
Inside the function I want to have identical index syntax whether it is called with a row or column. A row slice is [n,:] and column slice [:,n] so they have different shapes. Inside the function this requires different indexing expressions.
Is there a way to do this that does not require moving or allocating memory? I am under the impression that using reshape will force a copy of the data to make it contiguous. Is there a way to use nditer in the function?
Do you mean like this:
In [74]: def foo(arr, n):
    ...:     arr += n
    ...:
In [75]: arr = np.ones((2,3),int)
In [76]: foo(arr[0,:],1)
In [77]: arr
Out[77]:
array([[2, 2, 2],
       [1, 1, 1]])
In [78]: foo(arr[:,1],[100,200])
In [79]: arr
Out[79]:
array([[  2, 102,   2],
       [  1, 201,   1]])
In the first case I'm adding 1 to one row of the array, i.e. a row slice. In the second case I'm adding a list (converted to an array) to a column. In that case n has to have the right length.
Usually we don't worry about whether the values are C contiguous. Striding takes care of access either way.
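To check that no copy is involved, one can inspect the views directly. In this sketch, scale_line is a hypothetical helper and the array contents are made up:

```python
import numpy as np

def scale_line(line, factor):
    """Hypothetical helper: modify a 1-D view in place."""
    line *= factor

arr = np.arange(6).reshape(2, 3)   # C-ordered 2-D array
row = arr[0, :]                    # basic slicing returns views
col = arr[:, 1]

# Both slices share memory with arr; only the strides differ
print(np.shares_memory(arr, row), np.shares_memory(arr, col))  # True True
print(row.strides, col.strides)

# The same 1-D indexing works inside the function for either slice
scale_line(row, 10)
scale_line(col, -1)
print(arr.tolist())   # [[0, -10, 20], [3, -4, 5]]
```

The column view is not contiguous, but in-place operations on it still write straight into the original buffer.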

Numpy index array of unknown dimensions?

I need to compare a bunch of numpy arrays with different dimensions, say:
a = np.array([1,2,3])
b = np.array([1,2,3],[4,5,6])
assert(a == b[0])
How can I do this if I do not know either the shape of a and b, besides that
len(shape(a)) == len(shape(b)) - 1
and neither do I know which dimension to skip from b. I'd like to use np.index_exp, but that does not seem to help me ...
def compare_arrays(a,b,skip_row):
    u = np.index_exp[ ... ]
    assert(a[:] == b[u])
Edit
Or to put it otherwise, I want to construct the slicing if I know the shape of the array and the dimension I want to skip. How do I dynamically create the np.index_exp if I know the number of dimensions and the positions where to put ":" and where to put "0"?
I was just looking at the code for apply_along_axis and apply_over_axes, studying how they construct indexing objects.
Let's make a 4d array:
In [355]: b=np.ones((2,3,4,3),int)
Make a list of slices (replicating with list * n)
In [356]: ind=[slice(None)]*b.ndim
In [357]: b[tuple(ind)].shape # same as b[:,:,:,:]
Out[357]: (2, 3, 4, 3)
In [358]: ind[2]=2 # replace one slice with index
In [359]: b[tuple(ind)].shape # a slice, indexing on the third dim
Out[359]: (2, 3, 3)
Or with your example
In [361]: b = np.array([1,2,3],[4,5,6]) # missing []
...
TypeError: data type not understood
In [362]: b = np.array([[1,2,3],[4,5,6]])
In [366]: ind=[slice(None)]*b.ndim
In [367]: ind[0]=0
In [368]: a==b[tuple(ind)]
Out[368]: array([ True, True, True], dtype=bool)
This indexing is basically the same as np.take, but the same idea can be extended to other cases.
I don't quite follow your questions about the use of :. Note that when building an indexing list I use slice(None). The interpreter translates all indexing : into slice objects: [start:stop:step] => slice(start, stop, step).
Usually you don't need to use a[:]==b[0]; a==b[0] is sufficient. With lists alist[:] makes a copy; with arrays it just returns a view (unless used on the left-hand side of an assignment, a[:]=...).
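The same idea can be wrapped in a small function. This is only a sketch: index_along is a hypothetical helper name, and the index is converted to a tuple because recent NumPy versions no longer accept a plain list of slices as a multidimensional index:

```python
import numpy as np

def index_along(arr, axis, i):
    """Hypothetical helper: take index i on `axis`, keep all other axes whole."""
    ind = [slice(None)] * arr.ndim   # equivalent to ':' on every axis
    ind[axis] = i                    # replace one ':' with an integer
    return arr[tuple(ind)]           # index with a tuple, not a list

b = np.arange(2 * 3 * 4).reshape(2, 3, 4)
print(index_along(b, 1, 2).shape)    # (2, 4)

# The example from the question, with the missing brackets added
a = np.array([1, 2, 3])
b2 = np.array([[1, 2, 3], [4, 5, 6]])
print(np.array_equal(a, index_along(b2, 0, 0)))   # True
```

np.array_equal is used for the comparison because assert(a == b[0]) on arrays with more than one element raises an "ambiguous truth value" error.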

Numpy Integer to String Based on Lookup Table

I have a numpy array called "landuse" that's a series of numbers 1-3 representing different landuse categories. I want to convert this to a string based on a lookup table.
ids = [0,1,2,3]
lookup_table = ['None', 'Forest', 'Water', 'Urban']
First, let me explain why your loop isn't working. In Python, an assignment such as a = 1 takes the object 1 and gives it the name a. When you do name = "Water", name forgets what it was pointing to before and now points to "Water", but that doesn't mean the previous object that was assigned to name gets replaced with "Water".
That's the problem, and now for a fix. If you have your landuse as an array of integer codes you can just use a lookup table. The table should be big enough so you don't get an indexing error when you do lookup_table[landuse.max()]
import numpy as np
landuse = np.array([1,2,3,1,2,4])
lookup_table = np.array(['None', 'Forest', 'Water', 'Urban', 'Other'])
landuse_title = lookup_table[landuse]
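If the table should be built from the ids and names in the question rather than typed out by hand, something like the following sketch could be used. Using 'Other' as the default for codes without a name is an assumption carried over from the table above, and names and size are illustrative:

```python
import numpy as np

ids = [0, 1, 2, 3]
names = ['None', 'Forest', 'Water', 'Urban']
landuse = np.array([1, 2, 3, 1, 2, 4])

# Make the table large enough for the biggest code that can occur,
# pre-filled with a default label (an assumption, not from the question)
size = max(max(ids), landuse.max()) + 1
lookup_table = np.full(size, 'Other', dtype='U10')
lookup_table[ids] = names

# Integer-array indexing maps every code to its label in one step
landuse_title = lookup_table[landuse]
print(landuse_title.tolist())
# ['Forest', 'Water', 'Urban', 'Forest', 'Water', 'Other']
```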
And for the final part of your question: the numpy ndarray is a homogeneous data structure, meaning everything in the array must have the same data type. With that limitation in mind, it should be clear that you cannot take a row of integers and replace it with a row of strings. Numpy does have "flexible dtypes" (structured dtypes) which allow you to do something like:
>>> dt = np.dtype([('name', 'S4'), ('age', 'int'), ('height', 'float')])
>>> array = np.array([('Mark', 25, 70.5),('Ben',40,72.75)], dtype=dt)
>>> array
array([('Mark', 25, 70.5), ('Ben', 40, 72.75)],
dtype=[('name', '|S4'), ('age', '<i4'), ('height', '<f8')])
>>> array.shape
(2,)
>>> array['name']
array(['Mark', 'Ben'],
dtype='|S4')
We've created an array that holds a name, age and height for each person, but notice that the shape of the array is (2,) because we have two "people" in the array. I'm not sure exactly what your needs are, but you could try using the flexible dtype to hold all the information in one array if that's what you need. Depending on my end goal, I often find it's easier to just use a few separate arrays, or a list of arrays. Hope that helps.
I am not entirely clear what your question is, but it seems you could use a dictionary for this:
import numpy as np
landuse = np.array([1, 2, 3, 1, 2, 4], dtype=int)
a = {1: 'Forest', 2: 'Water'}
# Codes missing from the dictionary fall back to 'Urban'
print([a.get(i, 'Urban') for i in landuse])
which will emit a list containing the strings you are interested in:
['Forest', 'Water', 'Urban', 'Forest', 'Water', 'Urban']
If your objective is to have the final result in a numpy array of strings, you can do this:
name = np.array([a.get(i, 'Urban') for i in landuse], dtype='U10')