numpy.array(value) evaluates to a zero-dimensional array if value is an int, float, or complex. The result seems to be a shapeless array (numpy.array(value).shape returns ()).
Reshaping it with numpy.array(value).reshape(1) works fine, and numpy.array(value).reshape(1).squeeze() reverses this, again resulting in a shapeless array.
What is the rationale behind this behavior? Which use cases exist for it?
When you create a zero-dimensional array like np.array(3), you get an object that behaves as an array in 99.99% of situations. You can inspect the basic properties:
>>> x = np.array(3)
>>> x
array(3)
>>> x.ndim
0
>>> x.shape
()
>>> x[None]
array([3])
>>> type(x)
numpy.ndarray
>>> x.dtype
dtype('int32')
So far so good. The logic behind this is simple: you can process any array-like object the same way, regardless of whether it is a number, a list, or an array, just by wrapping it in a call to np.array.
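For instance, here is a sketch of one function handling a scalar, a list, and an array uniformly (using np.asarray, which skips the copy when the input is already an array):

```python
import numpy as np

def total(data):
    # np.asarray accepts a number, a list, or an ndarray alike,
    # wrapping without copying when data is already an array.
    return np.asarray(data).sum()

print(total(3))             # scalar input
print(total([1, 2, 3]))     # list input
print(total(np.arange(4)))  # array input
```

All three calls go through the same code path; the scalar case works precisely because np.asarray(3) yields a perfectly usable 0-d array.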
One thing to keep in mind is that when you index an array, the index tuple must have ndim or fewer elements. So you can't do:
>>> x[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: too many indices for array
Instead, you have to use an empty tuple (since x[] is invalid syntax):
>>> x[()]
3
You can also use the array as a scalar instead:
>>> y = x + 3
>>> y
6
>>> type(y)
numpy.int32
Adding two scalars produces a scalar instance of the dtype, not another array. That being said, you can use y from this example in exactly the same way you would x, 99.99% of the time, since NumPy scalar types support most of the ndarray interface. It does not matter that 3 is a Python int, since np.add will wrap it in an array regardless. y = x + x will yield identical results.
One difference between x and y in these examples is that x is not officially considered to be a scalar:
>>> np.isscalar(x)
False
>>> np.isscalar(y)
True
The indexing issue can potentially throw a monkey wrench in your plans to index any array-like object. You can easily get around it by supplying ndmin=1 as an argument to the constructor, or by using a reshape:
>>> x1 = np.array(3, ndmin=1)
>>> x1
array([3])
>>> x2 = np.array(3).reshape(-1)
>>> x2
array([3])
I generally recommend the former method, as it requires no prior knowledge of the dimensionality of the input.
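Another option is np.atleast_1d, which promotes 0-d input to 1-d while leaving higher-dimensional arrays untouched (a sketch):

```python
import numpy as np

# np.atleast_1d promotes a 0-d array to shape (1,) and leaves the rest alone
a = np.atleast_1d(np.array(3))
b = np.atleast_1d(np.array([1, 2]))
print(a.shape)  # (1,)
print(b.shape)  # (2,)
```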
Further reading:
Why are 0d arrays in Numpy not considered scalar?
Maybe there is a better way to do this, but I want to take an array of values from -80 to 0, substitute each of them for the missing variable T in an equation, and collect the results in a new array. See the code below:
T = np.arange(-80, 2, 2)
empty = []
esw = 6.11*np.exp(53.49*(6808/T)-5.09*np.log(T))
i = 0
for i in range(T):
    6.11*np.exp(53.49*(6808/T[i])-5.09*np.log(T[i]))
    x = np.append(empty)
    i = I+1
I know this is probably some miserable code, any help would be appreciated, thanks!
First of all, I have taken the liberty of doing a bit of math to hypothetically make your esw possible with negative numbers when the exponent is an integer, though it still won't work when T = 0 because of the division in the exponent. Note that because your exponent is 5.09 (not an integer), the code still doesn't work with negative T values. We now have:
esw = 6.11 * np.exp(53.49*6808/t) / t**(5.09)
where t is some value in your T array of values.
If you're trying to get esw for each value in T, you can structure your code in 2 main ways. The non-vectorised way, which is to loop through every value of T with a for loop and is a safer method, looks like this:
# If the calculation cannot be done, None will be returned.
def func(t):
    try:
        esw = 6.11 * np.exp(53.49*6808/t) / t**(5.09)
    except Exception:
        esw = None
    return esw

# new_arr is the same size as T. Each value in new_arr is the corresponding value of T
# put through the esw calculation.
new_arr = np.apply_along_axis(func, 0, T)
Note that for the values of T you chose (between -80 and 2), all esw values are either None or infinity.
If your calculations were possible (i.e. all your T values were > 0), you could vectorise your code (a good idea because it's easier to read and also faster), like so:
new_arr = 6.11 * np.exp(53.49*6808/T) / T**(5.09)
This method is less safe because as soon as an error is encountered, the program crashes instead of returning None for that value in T. With your T values, this code crashes.
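If you still want the vectorised form but with invalid inputs flagged rather than crashing, one possible sketch (assuming NaN is an acceptable stand-in for None) uses np.errstate and np.where:

```python
import numpy as np

T = np.arange(-80, 2, 2).astype(float)

# Compute only where the formula is defined (T > 0); NaN elsewhere.
# errstate silences the divide/invalid/overflow warnings coming from
# the branch that np.where discards.
with np.errstate(divide='ignore', invalid='ignore', over='ignore'):
    new_arr = np.where(T > 0, 6.11 * np.exp(53.49 * 6808 / T) / T**5.09, np.nan)

# For this particular T range (no positive values) every entry is NaN.
print(new_arr.shape)
```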
There are some basic Python errors here, suggesting that you haven't read much of a Python intro, and haven't learned to test your code (step by step).
You create an array, e.g.:
In [123]: T = np.arange(-3,2)
In [124]: T
Out[124]: array([-3, -2, -1, 0, 1])
and try to iterate:
In [125]: for i in range(T):print(i)
Traceback (most recent call last):
Input In [125] in <cell line: 1>
for i in range(T):print(i)
TypeError: only integer scalar arrays can be converted to a scalar index
In [126]: range(T)
Traceback (most recent call last):
Input In [126] in <cell line: 1>
range(T)
TypeError: only integer scalar arrays can be converted to a scalar index
range takes a number, not an array. It's a basic Python function that you should know, and use correctly. You can get a number by taking the length of the array or list, len(T):
In [127]: range(len(T))
Out[127]: range(0, 5)
In [128]: list(_)
Out[128]: [0, 1, 2, 3, 4]
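When the loop needs both the index and the value, enumerate is the idiomatic alternative to range(len(T)) (a sketch):

```python
import numpy as np

T = np.arange(-3, 2)

# enumerate yields (index, value) pairs directly
for i, t in enumerate(T):
    print(i, t)
```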
You do an i = 0 before the loop, and some sort of assignment to i inside it, which suggests you don't understand (or care) how the loop assigns i.
In [129]: for i in range(4):
...: print(i)
...: i = i+10
...:
0
1
2
3
Adding 10 to i did nothing; the for assigns the next value from the range. Again this is basic Python iteration.
As for the empty and np.append:
In [130]: empty=[]
In [131]: np.append(empty)
Traceback (most recent call last):
Input In [131] in <cell line: 1>
np.append(empty)
File <__array_function__ internals>:179 in append
TypeError: _append_dispatcher() missing 1 required positional argument: 'values'
The correct way to use list append is:
In [132]: alist = []
...: for i in T:
...: alist.append(i*2)
...:
In [133]: alist
Out[133]: [-6, -4, -2, 0, 2]
List append works in place, and it is reasonably fast. np.append is a poorly named function that should not exist; it is not a clone of list append.
Since T is an array, we don't need to iterate.
In [134]: T*2
Out[134]: array([-6, -4, -2, 0, 2])
An alternative to the list append loop is a list comprehension. It's a bit faster than the iteration, though not as fast as the direct array calculation.
In [135]: [i*2 for i in T]
Out[135]: [-6, -4, -2, 0, 2]
Finally, that line
6.11*np.exp(53.49*(6808/T[i])-5.09*np.log(T[i]))
in the loop does nothing; it does not even assign a value to a variable (as you did outside the loop with esw=...). Did you really think it did something? Or was this just a careless mistake?
Here's an example of behavior I cannot understand, maybe someone can share the insight into the logic behind it:
ccn = np.ones(1)
bbb = 7
bbn = np.array(bbb)
bbn * ccn # this is OK
array([7.])
np.prod((bbn,ccn)) # but this is NOT
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2.2\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<input>", line 1, in <module>
File "<__array_function__ internals>", line 5, in prod
File "C:\Users\...\venv\lib\site-packages\numpy\core\fromnumeric.py", line 2999, in prod
return _wrapreduction(a, np.multiply, 'prod', axis, dtype, out,
File "C:\Users\...\venv\lib\site-packages\numpy\core\fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
numpy.VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
Why? Why would a simple multiplication of two numbers be a problem? As far as formal algebra goes there are no dimensional problems and no datatype problems. The result is invariably also a single number; there's no chance it would "suddenly" turn into a vector or object or anything alike. prod(a,b) for a and b being scalars or 1-by-1 "matrices" is something MATLAB or Octave would eat no problem.
I know I can turn this error off, but why is it even an error?
In [346]: ccn = np.ones(1)
...: bbb = 7
...: bbn = np.array(bbb)
In [347]: ccn.shape
Out[347]: (1,)
In [348]: bbn.shape
Out[348]: ()
In [349]: np.array((bbn,ccn))
<ipython-input-349-997419ba7a2f>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
np.array((bbn,ccn))
Out[349]: array([array(7), array([1.])], dtype=object)
You have arrays with different dimensions, which can't be combined into one numeric array.
That np.prod expression is actually
np.multiply.reduce(np.array([bbn,ccn]))
as can be deduced from your traceback.
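Once the 0-d array is reshaped so the two shapes agree, the tuple stacks cleanly and np.prod behaves as expected (a sketch):

```python
import numpy as np

ccn = np.ones(1)
bbn = np.array(7).reshape(1)  # shape (1,) now matches ccn's shape

# (bbn, ccn) stacks cleanly into a (2, 1) float array,
# and np.prod multiplies all its elements.
print(np.prod((bbn, ccn)))  # 7.0
```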
In Octave both objects have shape (1,1), i.e. 2d:
>> ccn = ones(1)
ccn = 1
>> ccn = ones(1);
>> size(ccn)
ans =
1 1
>> bbn = 7;
>> size(bbn)
ans =
1 1
>> [bbn,ccn]
ans =
7 1
It doesn't have true scalars; everything is 2d (even 3d is a fudge on the last dimension).
And with 'raw' Python inputs:
In [350]: np.array([1,[1]])
<ipython-input-350-f17372e1b22d>:1: VisibleDeprecationWarning: ...
np.array([1,[1]])
Out[350]: array([1, list([1])], dtype=object)
The object dtype array preserves the type of the inputs.
edit
prod isn't a simple multiplication. It's a reduction operation, like the big Π (product) in math. Even in Octave it isn't:
>> prod([[2,3],[3;4]])
error: horizontal dimensions mismatch (1x2 vs 2x1)
>> [2,3]*[3;4]
ans = 18
>> [2,3].*[3;4]
ans =
6 9
8 12
The numpy equivalent:
In [97]: np.prod((np.array([2,3]),np.array([[3],[4]])))
/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py:87: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences...
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: could not broadcast input array from shape (2,1) into shape (2,)
In [98]: np.array([2,3])@np.array([[3],[4]])
Out[98]: array([18])
In [99]: np.array([2,3])*np.array([[3],[4]])
Out[99]:
array([[ 6, 9],
[ 8, 12]])
The warning, and here the error, is produced by trying to make ONE array from (np.array([2,3]),np.array([[3],[4]])).
I wish to locate the index of the closest higher value to a query over a sorted numpy array (where the query value is not in the array), similar to bisect_right in the Python standard library, without converting the numpy array to a Python list, and leveraging the fact that the array is sorted (i.e. the runtime should be O(log N), like numpy's searchsorted).
Pandas has this option using get_loc with the 'bfill' option, but it seems a bit of overkill to include it as a dependency just for this... I might have to resort to holding this array as both a Python list and a numpy array, but I wanted to hear if there's a more reasonable solution.
Edit: It seems searchsorted does exactly what I need.
We can see the code for bisect_right on github:
def bisect_right(a, x, lo=0, hi=None):
    """Return the index where to insert item x in list a, assuming a is sorted.

    The return value i is such that all e in a[:i] have e <= x, and all e in
    a[i:] have e > x.  So if x already appears in the list, a.insert(i, x) will
    insert just after the rightmost x already there.

    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.
    """
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        # Use __lt__ to match the logic in list.sort() and in heapq
        if x < a[mid]: hi = mid
        else: lo = mid+1
    return lo
This is all numpy compliant:
import numpy as np
array = np.array([1,2,3,4,5,6])
print(bisect_right(array, 7))
>>> 6
print(bisect_right(array, 0))
>>> 0
To find the index of the closest higher value to a number given:
def closest_higher_value(array, value):
    if bisect_right(array, value) < len(array):
        return bisect_right(array, value)
    print("value too large:", value, "is bigger than all elements of:")
    print(array)
print(closest_higher_value(array, 3))
>>> 3
print(closest_higher_value(array, 7))
>>> value too large: 7 is bigger than all elements of:
>>> [1 2 3 4 5 6]
>>> None
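As the question's edit notes, np.searchsorted does this directly; with side='right' it matches bisect_right and runs in O(log N) (a sketch):

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5, 6])

# side='right' matches bisect_right: the first index whose element
# is strictly greater than the query value.
print(np.searchsorted(array, 3, side='right'))    # 3
print(np.searchsorted(array, 2.5, side='right'))  # 2
print(np.searchsorted(array, 7, side='right'))    # 6 (past the end)
```

As with the pure-Python version, a result equal to len(array) means the query is larger than every element.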
I know that a numpy array has an attribute called shape that returns (No. of rows, No. of columns): shape[0] gives you the number of rows and shape[1] gives you the number of columns.
a = numpy.array([[1,2,3,4], [2,3,4,5]])
a.shape
>> (2, 4)
a.shape[0]
>> 2
a.shape[1]
>> 4
However, if my array has only one row, then it returns (No. of columns,), and shape[1] will be out of index. For example:
a = numpy.array([1,2,3,4])
a.shape
>> (4,)
a.shape[0]
>> 4  # this is the number of columns
a.shape[1]
>> IndexError: tuple index out of range
Now how do I get the number of rows of an numpy array if the array may have only one row?
Thank you
The concept of rows and columns applies when you have a 2D array. However, the array numpy.array([1,2,3,4]) is a 1D array and so has only one dimension; therefore shape rightly returns a tuple with a single value.
For a 2D version of the same array, consider the following instead:
>>> a = numpy.array([[1,2,3,4]]) # notice the extra square braces
>>> a.shape
(1, 4)
Rather than converting this to a 2d array, which may not be an option every time, one could either check the len() of the tuple returned by shape, or just catch the index error, as such:
import numpy
a = numpy.array([1,2,3,4])
print(a.shape)
# (4,)
print(a.shape[0])
try:
    print(a.shape[1])
except IndexError:
    print("only 1 column")
Or you could just try to assign this to a variable for later use (or return it, or what have you) if you know you will only have 1- or 2-dimensional shapes:
try:
    shape = (a.shape[0], a.shape[1])
except IndexError:
    shape = (1, a.shape[0])
print(shape)
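If the array is known to be at most 2-D, np.atleast_2d is a shorter route: it promotes a 1-D array to shape (1, n) and leaves 2-D arrays alone, so shape always has two entries (a sketch):

```python
import numpy as np

# np.atleast_2d gives every input at least two dimensions
a = np.atleast_2d(np.array([1, 2, 3, 4]))
b = np.atleast_2d(np.array([[1, 2, 3, 4], [2, 3, 4, 5]]))
print(a.shape)  # (1, 4)
print(b.shape)  # (2, 4)
```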
For two numpy arrays
a = [1,2,3], b = [4,5,6]
I want to change the entries of a that are less than 2.5 to the corresponding entries of b. So I tried
a[a<2.5]=b
hoping to get a = [4,5,3], but this raises an error:
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
a[a<2.5]=b
ValueError: NumPy boolean array indexing assignment cannot assign 3 input values to the 2 output values where the mask is true
What is the problem?
The issue you're seeing is a result of how masks work on numpy arrays.
When you write
a[a < 2.5]
you get back the elements of a which match the mask a < 2.5. In this case, that will be the first two elements only.
Attempting to do
a[a < 2.5] = b
is an error because b has three elements, but a[a < 2.5] has only two.
An easy way to achieve the result you're after in numpy is to use np.where.
The syntax of this is np.where(condition, valuesWhereTrue, valuesWhereFalse).
In your case, you could write
newArray = np.where(a < 2.5, b, a)
Alternatively, if you don't want the overhead of a new array, you could perform the replacement in-place (as you're trying to do in the question). To achieve this, you can write:
idxs = a < 2.5
a[idxs] = b[idxs]