I am having a problem in my code with non-contiguous arrays.
In particular I get the following warning message:
C:\Program Files\Anaconda2\lib\site-packages\skimage\util\shape.py:247: RuntimeWarning: Cannot provide views on a non-contiguous input array without copying.
warn(RuntimeWarning("Cannot provide views on a non-contiguous input "
I am using np.column_stack:
import numpy as np
x = np.array([1,2,3,4])
y = np.array([5,6,7,8])
stack = np.column_stack((x,y))
stack.flags.f_contiguous
Out[2]: False
but I get a non-contiguous array.
How can I get a contiguous array? Should I always call ascontiguousarray after column_stack?
np.stack([x, y]) is not F-contiguous. However, np.stack([x, y]).T is.
np.stack([x, y]) # Transpose of what you want and not F-contiguous
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
Instead:
stack = np.stack([x, y]).T
In [276]: xy=np.column_stack((x,y))
In [277]: np.info(xy)
class: ndarray
shape: (4, 2)
strides: (8, 4)
itemsize: 4
aligned: True
contiguous: True
fortran: False
data pointer: 0xa836ec0
byteorder: little
byteswap: False
type: int32
The skimage code, https://github.com/scikit-image/scikit-image/blob/master/skimage/util/shape.py
# -- build rolling window view
if not arr_in.flags.contiguous:
    warn(RuntimeWarning("Cannot provide views on a non-contiguous input "
                        "array without copying."))
arr_in = np.ascontiguousarray(arr_in)
That test passes for the column_stack result:
In [278]: xy.flags.contiguous
Out[278]: True
In [279]: xy.T.flags.contiguous
Out[279]: False
Normally constructed 2d arrays are C-contiguous, but their transpose is F-contiguous. The warning tells you that np.ascontiguousarray will produce a copy, which could be a problem for very large arrays.
If this warning comes up often, you could either suppress it or routinely call ascontiguousarray before calling this function.
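To make the flag behavior above concrete, here is a minimal sketch using only NumPy. The names xy_t, same and fixed are just illustrative; the point is that ascontiguousarray copies only when the input is not already C-contiguous.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
xy = np.column_stack((x, y))

# column_stack returns a C-contiguous array, so the skimage test passes.
print(xy.flags.c_contiguous, xy.flags.f_contiguous)

# The transpose is a view with swapped strides, so it is F-contiguous only.
xy_t = xy.T
print(xy_t.flags.c_contiguous, xy_t.flags.f_contiguous)

# ascontiguousarray copies only when it has to.
same = np.ascontiguousarray(xy)     # already C-contiguous: shares the buffer
fixed = np.ascontiguousarray(xy_t)  # not C-contiguous: gets a new buffer
print(np.shares_memory(same, xy), np.shares_memory(fixed, xy))
```

So calling ascontiguousarray "just in case" costs essentially nothing for arrays that are already in C order.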
Here's an MWE that illustrates the issue I have:
import numpy as np
arr = np.full((3, 3), -1, dtype="i,i")
doesnt_work = arr == (-1, -1)
n_arr = np.full((3, 3), -1, dtype=int)
works = n_arr == 10
arr is supposed to be an array of tuples, but it doesn't behave as expected.
works is an array of booleans, as expected, but doesnt_work is False. Is there a way to get numpy to do elementwise comparisons on more complex types, or do I have to resort to list comprehension, flatten and reshape?
There's a second problem:
f = arr[(0, 0)] == (-1, -1)
f is False, because arr[(0,0)] is of type numpy.void rather than a tuple. So even if the componentwise comparison worked, it would give the wrong result. Is there a clever numpy way to do this or should I just resort to list comprehension?
Both problems are actually the same problem! Both are related to the custom data type you created when you specified dtype="i,i".
If you run arr.dtype you will get dtype([('f0', '<i4'), ('f1', '<i4')]). That is, two signed integers placed in one contiguous block of memory. This is not a Python tuple. So it is clear why the naive comparison fails: (-1,-1) is a Python tuple and is not represented in memory the same way as the numpy data type.
However, if you compare with a_comp = np.array((-1,-1), dtype="i,i") you get exactly the behavior you are expecting!
You can read more about how the custom dtype stuff works on the numpy docs:
https://numpy.org/doc/stable/reference/arrays.dtypes.html
Oh, and to address what np.void is: the name comes from the idea of a void C pointer, which is essentially an address of a contiguous block of memory of unspecified type. Provided you (the programmer) know what is stored in that memory (in this case two back-to-back integers), it's fine as long as you are careful (compare with the same custom data type).
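As a rough sketch of the comparison described in this answer (the array is filled field by field here, and one record is changed so the mask is not trivially all True; the names target, mask and field_mask are only for illustration):

```python
import numpy as np

arr = np.zeros((3, 3), dtype="i,i")
arr["f0"] = -1
arr["f1"] = -1
arr[1, 1] = (7, -1)  # one record that should not match

# Compare against a value of the same structured dtype, not a Python tuple.
target = np.array((-1, -1), dtype=arr.dtype)
mask = arr == target
print(mask)

# Equivalent field-by-field comparison.
field_mask = (arr["f0"] == -1) & (arr["f1"] == -1)

# A single element is a numpy.void; .item() converts it to a plain tuple.
first = arr[0, 0].item()
print(first == (-1, -1))
```

The field-by-field version is more verbose but makes it explicit which fields take part in the comparison.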
I have the following arrays:
a = np.array([[1,2,3],[4,5,6]])
b = np.array([[1,5,10]])
and want to add the values in b to each row of a, giving
np.array([[2,7,13],[5,10,16]])
What is the best approach, performance-wise, to achieve this?
Thanks
Broadcasting does that for you, so:
>>> a+b
just works:
array([[ 2, 7, 13],
[ 5, 10, 16]])
And it can also be done with
>>> a + np.tile(b,(2,1))
which gives the result
array([[ 2, 7, 13],
[ 5, 10, 16]])
Depending on the size of the inputs and your time constraints, either method might be worth considering.
Method 1: Numpy Broadcasting
Operations on two arrays are possible if their shapes are compatible.
Such operations are generally carried out through broadcasting.
In layman's terms, broadcasting means repeating elements along a given axis.
Conditions for broadcasting:
Arrays need to be compatible.
Compatibility is decided based on their shapes.
Shapes are compared from right to left.
At each position, the two dimensions must either be equal or one of them must be 1.
The smaller array is broadcast (repeated) over the bigger array.
a.shape, b.shape
((2, 3), (1, 3))
From the rules they are compatible, so they can be added. b is smaller, so it is repeated along the first dimension and can be treated as [[1, 5, 10], [1, 5, 10]]. Note that numpy does not allocate new memory for this; the repeated b is just a view.
a + b
array([[ 2, 7, 13],
[ 5, 10, 16]])
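The "repeated but not copied" behavior can be made visible with np.broadcast_to, which returns the view that broadcasting conceptually uses. This is only an illustrative sketch; b_view is a name introduced here.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[1, 5, 10]])

# Shapes are compared right to left: 3 == 3, then 2 against 1.
print(np.broadcast(a, b).shape)

# The broadcast view repeats b's single row without copying it:
# the stride along the repeated axis is 0.
b_view = np.broadcast_to(b, a.shape)
print(b_view)
print(b_view.strides[0], np.shares_memory(b_view, b))

# Broadcasting and explicit tiling give the same sum,
# but np.tile allocates a full (2, 3) copy of b first.
print(np.array_equal(a + b, a + np.tile(b, (2, 1))))
```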
Method 2: Numba
Numba compiles Python functions to optimized machine code.
It makes parallelization easy.
The reason to consider it is that NumPy broadcasting is sometimes not good enough: ufuncs (np.add, np.matmul, etc.) allocate temporary memory during operations, which can be time-consuming if you are already near memory limits.
Depending on your requirements, with Numba you might not need the temporary memory allocation or the various checks that NumPy performs, which can speed up code for huge inputs. For an example, see "Why are np.hypot and np.subtract.outer very fast?".
import numba as nb
import numpy as np

@nb.njit(parallel=True)
def sum(a, b):
    s = np.empty(a.shape, dtype=a.dtype)
    # nb.prange hints to Numba which loop to parallelize
    for i in nb.prange(a.shape[0]):
        # b has shape (1, 3), so b[0] is its single row
        s[i] = a[i] + b[0]
    return s

sum(a, b)
I have a regression model that I fit with scikit-learn's LinearRegression:
To extract the coefficients, I used the code:
coefficients = model.coef_
It produced the following array with a shape of (1, 10):
[-4.72307152e-05 1.29731143e-04 8.75483702e-05 -6.28749019e-04
1.75096740e-04 -3.30209379e-06 1.35937650e-03 3.89048429e-11
8.48406857e-03 -1.36499030e-05]
Now, I would like to save the array to a pd.Series. I am taking the following approach:
features = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9", "f10"]
model_coefs = pd.Series(coefficients, index=features)
And, the system gives me the following error:
ValueError: Length of passed values is 1, index implies 10.
What I have tried:
Transposing the underlying array, coefficients, to give it a length of 10.
Reshaping the array to give it a shape of (10,1).
But nothing seems to work. I am not sure where I am going wrong.
For your case you want to flatten the array, so .ravel should do the trick. For example:
pd.Series(np.zeros((1, 10)).ravel(), index=features)
It's strange that your coefficients have shape (1, 10); when I run the basic sklearn example (with multiple features), my coefficients are 1-D:
In [27]: regr.coef_
Out[27]:
array([ 3.03499549e-01, -2.37639315e+02, 5.10530605e+02, 3.27736980e+02,
-8.14131709e+02, 4.92814588e+02, 1.02848452e+02, 1.84606489e+02,
7.43519617e+02, 7.60951722e+01])
In [28]: regr.coef_.shape
Out[28]: (10,)
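A likely explanation for the (1, 10) shape: scikit-learn's LinearRegression returns a 2-D coef_ of shape (n_targets, n_features) when y is passed as a 2-D array, even with a single target column. The NumPy-only sketch below uses a stand-in array instead of a fitted model (an assumption for illustration) to show why transposing or reshaping to (10, 1) does not help: the data stays 2-D, while a Series needs 1-D data.

```python
import numpy as np

features = [f"f{i}" for i in range(1, 11)]

# Stand-in for model.coef_ with shape (1, 10); not real coefficients.
coefficients = np.arange(10, dtype=float).reshape(1, 10)

print(len(coefficients))     # length along the first axis only
print(coefficients.T.shape)  # transposed: ten rows, but still 2-D

# ravel() gives a genuinely 1-D array with one value per feature.
flat = coefficients.ravel()
print(flat.shape)

# Each feature name now pairs with exactly one coefficient.
by_name = dict(zip(features, flat))
```

With flat in hand, pd.Series(flat, index=features) is the call from the answer above.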
In Numpy, it appears that the matrix can simply be a nested list of anything not limited to numbers. For example
import numpy as np
a = [[1,2,5],[3,'r']]
b = np.matrix(a)
generates no complaints.
What is the purpose of this tolerance, when a plain list can already hold objects that do not form a matrix in the strict mathematical sense?
What you've created is an object dtype array:
In [302]: b=np.array([[1,2,5],[3,'r']])
In [303]: b
Out[303]: array([[1, 2, 5], [3, 'r']], dtype=object)
In [304]: b.shape
Out[304]: (2,)
In [305]: b[0]
Out[305]: [1, 2, 5]
In [306]: b[1]=None
In [307]: b
Out[307]: array([[1, 2, 5], None], dtype=object)
The elements of this array are pointers to objects elsewhere in memory. It has a data buffer just like other arrays. In this case 2 pointers, 2
In [308]: b.__array_interface__
Out[308]:
{'data': (169809984, False),
'descr': [('', '|O')],
'shape': (2,),
'strides': None,
'typestr': '|O',
'version': 3}
In [309]: b.nbytes
Out[309]: 8
In [310]: b.itemsize
Out[310]: 4
It is very much like a list - which also stores object pointers in a buffer. But it differs in that it doesn't have an append method, but does have all the array ones like .reshape.
And for many operations, numpy treats such an array like a list - iterating over the pointers, etc. Many of the math operations that work with numeric values fail with object dtypes.
Why allow this? Partly it's just a generalization, expanding the concept of element values/dtypes beyond the simple numeric and string ones. numpy also allows compound dtypes (structured arrays). MATLAB expanded their matrix class to include cells, which are similar.
I see a lot of questions on SO about object arrays. Sometimes they are produced in error, as in the question "Creating numpy array from list gives wrong shape".
Sometimes they are created intentionally. pandas readily changes a data series to object dtype to accommodate a mix of values (string, nan, int).
np.array() tries to create as high a dimension array as it can, resorting to object dtype only when it can't, for example when the sublists differ in length. In fact you have to resort to special construction methods to create an object array when the sublists are all the same.
This is still an object array, but the dimension is higher:
In [316]: np.array([[1,2,5],[3,'r',None]])
Out[316]:
array([[1, 2, 5],
[3, 'r', None]], dtype=object)
In [317]: _.shape
Out[317]: (2, 3)
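The "special construction" mentioned above for equal-length sublists can be sketched as follows. Note that newer NumPy releases (1.24 and later) raise a ValueError for ragged input unless dtype=object is passed explicitly, so the sketch always passes it. The names ragged, auto and boxed are illustrative.

```python
import numpy as np

# Ragged input: ask for object dtype explicitly.
ragged = np.array([[1, 2, 5], [3, 'r']], dtype=object)
print(ragged.shape, type(ragged[0]))

# Equal-length sublists normally become an ordinary 2-D array.
same_len = [[1, 2, 5], [3, 4, 6]]
auto = np.array(same_len)
print(auto.shape, auto.dtype)

# To keep each sublist as a single element, preallocate an object
# array and assign the lists one by one.
boxed = np.empty(len(same_len), dtype=object)
for i, row in enumerate(same_len):
    boxed[i] = row
print(boxed.shape, boxed[1])
```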
Is it possible to trim zero 'records' of a structured numpy array without copying it; i.e. free allocated memory for the 'unused' zero entries at the beginning or the end; actually, I am only interested in trimming zeros at the end.
There is a builtin function numpy.trim_zeros() for 1d arrays. Its return value:
Returns:
trimmed : 1-D array or sequence
The result of trimming the input. The input data type is preserved.
However, I can't say from this whether this does not create a copy and only frees memory. I am not proficient enough to tell from its source code its behaviour.
More specifically, I have following code:
import numpy
edges = numpy.zeros(3, dtype=[('i', 'i4'), ('j', 'i4'), ('length', 'f4')])
# fill the first two records with sensible data:
edges[0]['i'] = 0
edges[0]['j'] = 1
edges[0]['length'] = 2.0
edges[1]['i'] = 1
edges[1]['j'] = 2
edges[1]['length'] = 2.0
# list memory address and size
edges.__array_interface__
edges = numpy.trim_zeros(edges) # does not work for structured array
edges.__array_interface__
UPDATE
My question is somewhat 'twofold':
1) Does the builtin function simply free memory, or does it copy the array?
Answer: it creates a slice (a view) rather than a copy; this can be checked in an IPython console with ?? (see also "Resize NumPy array to smaller size without copy" and "View onto a numpy array?").
2) What would be a solution that gives similar functionality for structured arrays?
Answer:
begin=(edges!=numpy.zeros(1,edges.dtype)).argmax()
end=len(edges)-(edges!=numpy.zeros(1,edges.dtype))[::-1].argmax()
# 1) create a slice without copying, but no memory is freed
goodedges=edges[begin:end]
# 2) or copy and free memory (temporarily both arrays exist)
goodedges=edges[begin:end].copy()
del edges
IMHO, there are two problems.
First, the trim_zeros function doesn't recognize zeros in a composite dtype.
You can locate them with begin=(edges!=zeros(1,edges.dtype)).argmax()
and end=len(edges)-(edges!=zeros(1,edges.dtype))[::-1].argmax(). Then goodedges=edges[begin:end] is the interesting data.
Second, the trim_zeros function doesn't free memory:
Returns:
trimmed : 1-D array or sequence
The result of trimming the input. The input data type is preserved.
So I think you must do it manually: goodedges=edges[begin:end].copy(); del edges.
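The begin/end recipe above can be wrapped in a small helper. This is only a sketch: trim_zero_records is a hypothetical name, and the all-zero guard is an addition, since argmax on an all-False mask returns 0 and the plain recipe would then keep the whole array.

```python
import numpy as np

def trim_zero_records(records):
    """Return a view of `records` without leading/trailing all-zero records."""
    nonzero = records != np.zeros(1, dtype=records.dtype)
    if not nonzero.any():
        return records[:0]  # nothing but zeros: empty view
    begin = nonzero.argmax()
    end = len(records) - nonzero[::-1].argmax()
    return records[begin:end]

edges = np.zeros(5, dtype=[('i', 'i4'), ('j', 'i4'), ('length', 'f4')])
edges[:2] = [(0, 1, 2.0), (1, 2, 2.0)]

view = trim_zero_records(edges)  # no copy, no memory freed
compact = view.copy()            # independent, smaller buffer
print(len(view), np.shares_memory(view, edges))
print(compact.nbytes, np.shares_memory(compact, edges))
```

Only after compact exists and every reference to edges (including view) is gone can the larger buffer be released.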
To expand on my comment, let's try trim_zeros on a simple integer array:
In [252]: arr = np.zeros(10,int)
In [253]: arr[3:8]=np.ones(5)
In [254]: arr
Out[254]: array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])
In [255]: arr1=np.trim_zeros(arr)
In [256]: arr1
Out[256]: array([1, 1, 1, 1, 1])
Now compare the __array_interface__ dictionaries:
In [257]: arr.__array_interface__
Out[257]:
{'descr': [('', '<i4')],
'shape': (10,),
'version': 3,
'strides': None,
'data': (150760432, False),
'typestr': '<i4'}
In [258]: arr1.__array_interface__
Out[258]:
{'descr': [('', '<i4')],
'shape': (5,),
'version': 3,
'strides': None,
'data': (150760444, False),
'typestr': '<i4'}
shape reflects the change we want. But look at the data pointer, ...432, and ...444. arr1 just points to 12 bytes (3 ints) further along the same buffer.
If I delete arr or reassign it (even arr=arr1), arr1 continues to point to this data buffer. numpy keeps some sort of reference count, and recycles a data buffer only when all references are gone.
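The same observation can be checked without reading addresses by eye. A minimal sketch, assuming trim_zeros returns a basic slice of its input as in the source shown in this answer (offset and shared are names introduced here):

```python
import numpy as np

arr = np.zeros(10, dtype=int)
arr[3:8] = 1
arr1 = np.trim_zeros(arr)

# The trimmed array starts three items further along the same buffer.
offset = (arr1.__array_interface__['data'][0]
          - arr.__array_interface__['data'][0])
shared = np.shares_memory(arr, arr1)
print(offset == 3 * arr.itemsize, shared)

# Writing through the view changes the original.
arr1[0] = 99
print(arr[3])

# Dropping the name `arr` does not free the buffer: the view keeps
# a reference to the original array through its .base attribute.
del arr
print(arr1.base.shape)
```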
The code for trim_zeros is (fetched in ipython with '??')
File: /usr/lib/python3/dist-packages/numpy/lib/function_base.py
def trim_zeros(filt, trim='fb'):
    first = 0
    trim = trim.upper()
    if 'F' in trim:
        for i in filt:
            if i != 0.: break
            else: first = first + 1
    last = len(filt)
    if 'B' in trim:
        for i in filt[::-1]:
            if i != 0.: break
            else: last = last - 1
    return filt[first:last]
The work is in the last line, and clearly returns a slice, a view. Most of the code handles the 2 trim options (F and B). Notice that it uses iteration to find the first and last non-zeros. That should be fine for arrays with just a few extra 0s at beginning or end. But it isn't the 'vectorized' kind of operation that SO questions often seek.
Before this question I didn't even know that trim_zeros existed, but I'm not at all surprised by its code and action.
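For comparison, a vectorized variant for plain numeric 1-D arrays could look like the sketch below. It is not NumPy's implementation; trim_zeros_vectorized is a hypothetical helper that finds the first and last non-zero positions with np.flatnonzero and still returns a view.

```python
import numpy as np

def trim_zeros_vectorized(arr):
    """Trim leading and trailing zeros of a 1-D array, returning a view."""
    nz = np.flatnonzero(arr)
    if nz.size == 0:
        return arr[:0]  # all zeros: empty view
    return arr[nz[0]:nz[-1] + 1]

arr = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])
trimmed = trim_zeros_vectorized(arr)
print(trimmed, np.shares_memory(trimmed, arr))

# Embedded zeros are kept, as with np.trim_zeros.
print(trim_zeros_vectorized(np.array([0, 1, 0, 2, 0])))
```

The trade-off is that flatnonzero scans the whole array and allocates an index array, while the loop stops at the first non-zero from each end, so the loop can win when only a few zeros pad a long array.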
On a side issue, here's a more compact way of creating your edges array.
In [259]: edges =np.zeros(3, dtype=[('i', 'i4'), ('j', 'i4'), ('length', 'f4')])
In [260]: edges[:2]=[(0,1,2.0),(1,2,2.0)]
To remove all the zero elements you could just use:
edges[edges!=numpy.zeros(1,edges.dtype)]
This is a copy. It does remove 'embedded' zeros as well, but that might not be an issue if the only zeros are those left at the end after filling in the earlier slots.
You may not need this trimming at all if you collect the edges data in a list, and build the array at the end:
edges1 = np.array([(0,1,2.0),(1,2,2.0)], dtype=edges.dtype)