numpy.VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences - numpy

Here's an example of behavior I cannot understand, maybe someone can share the insight into the logic behind it:
ccn = np.ones(1)
bbb = 7
bbn = np.array(bbb)
bbn * ccn # this is OK
array([7.])
np.prod((bbn,ccn)) # but this is NOT
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2.2\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<input>", line 1, in <module>
File "<__array_function__ internals>", line 5, in prod
File "C:\Users\...\venv\lib\site-packages\numpy\core\fromnumeric.py", line 2999, in prod
return _wrapreduction(a, np.multiply, 'prod', axis, dtype, out,
File "C:\Users\...\venv\lib\site-packages\numpy\core\fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
numpy.VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
Why? Why would a simple multiplication of two numbers be a problem? As far as formal algebra goes, there are no dimensional problems and no datatype problems. The result is invariably a single number; there is no chance it "suddenly" turns into a vector, an object, or anything like that. prod(a,b), for a and b being scalars or 1-by-1 "matrices", is something MATLAB or Octave would accept without complaint.
I know I can turn this error off and such, but why is it even an error?

In [346]: ccn = np.ones(1)
...: bbb = 7
...: bbn = np.array(bbb)
In [347]: ccn.shape
Out[347]: (1,)
In [348]: bbn.shape
Out[348]: ()
In [349]: np.array((bbn,ccn))
<ipython-input-349-997419ba7a2f>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
np.array((bbn,ccn))
Out[349]: array([array(7), array([1.])], dtype=object)
You have arrays with different dimensions, which can't be combined into one numeric array.
As can be deduced from your traceback, that np.prod expression is actually:
np.multiply.reduce(np.array([bbn,ccn]))
In Octave both objects have shape (1,1), 2d
>> ccn = ones(1)
ccn = 1
>> ccn = ones(1);
>> size(ccn)
ans =
1 1
>> bbn = 7;
>> size(bbn)
ans =
1 1
>> [bbn,ccn]
ans =
7 1
It doesn't have true scalars; everything is 2d (even 3d is a fudge on the last dimension).
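Octave's behaviour can be imitated in NumPy by promoting both operands to 2-D before combining them. The following is only a sketch and is not taken from the original answer; np.atleast_2d and np.hstack are standard NumPy functions, while the names bbn2, ccn2 and row are illustrative.

```python
import numpy as np

bbn = np.array(7)    # 0-d array, shape ()
ccn = np.ones(1)     # 1-d array, shape (1,)

# Promote both operands to 2-D, which Octave does implicitly.
bbn2, ccn2 = np.atleast_2d(bbn, ccn)   # both now have shape (1, 1)

# Equivalent of Octave's [bbn, ccn]: one regular array of shape (1, 2).
row = np.hstack([bbn2, ccn2])

# The reduction now sees a single rectangular array.
result = np.prod(row)
print(result)
```

Once both inputs share the same number of dimensions, np.prod no longer has to build an array from mismatched pieces.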
And with 'raw' Python inputs:
In [350]: np.array([1,[1]])
<ipython-input-350-f17372e1b22d>:1: VisibleDeprecationWarning: ...
np.array([1,[1]])
Out[350]: array([1, list([1])], dtype=object)
The object dtype array preserves the type of the inputs.
edit
prod isn't a simple multiplication. It's a reduction operation, like the big Pi in math. Even in Octave it isn't:
>> prod([[2,3],[3;4]])
error: horizontal dimensions mismatch (1x2 vs 2x1)
>> [2,3]*[3;4]
ans = 18
>> [2,3].*[3;4]
ans =
6 9
8 12
The numpy equivalent:
In [97]: np.prod((np.array([2,3]),np.array([[3],[4]])))
/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py:87: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences...
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: could not broadcast input array from shape (2,1) into shape (2,)
In [98]: np.array([2,3]) @ np.array([[3],[4]])
Out[98]: array([18])
In [99]: np.array([2,3])*np.array([[3],[4]])
Out[99]:
array([[ 6, 9],
[ 8, 12]])
The warning, and here the error, is produced by trying to make ONE array from (np.array([2,3]),np.array([[3],[4]])).
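If what is wanted is an elementwise product of several arrays that broadcast against each other, rather than a reduction over one stacked array, the binary ufunc can be folded over the operands instead. A small sketch under that assumption; functools.reduce comes from the standard library and the name operands is illustrative:

```python
from functools import reduce

import numpy as np

operands = (np.array([2, 3]), np.array([[3], [4]]))

# Fold np.multiply over the operands pairwise. Each step broadcasts,
# so no single array ever has to be built from the mismatched inputs.
product = reduce(np.multiply, operands)
print(product)
```

For two operands this is the same as the plain * expression shown above; the fold only matters when there are more of them.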

Related

What is the meaning of `numpy.array(value)`?

numpy.array(value) succeeds if value is an int, float or complex. The result seems to be a shapeless array (numpy.array(value).shape returns ()).
Reshaping the above like so numpy.array(value).reshape(1) works fine and numpy.array(value).reshape(1).squeeze() reverses this and again results in a shapeless array.
What is the rationale behind this behavior? Which use-cases exist for this behaviour?
When you create a zero-dimensional array like np.array(3), you get an object that behaves as an array in 99.99% of situations. You can inspect the basic properties:
>>> x = np.array(3)
>>> x
array(3)
>>> x.ndim
0
>>> x.shape
()
>>> x[None]
array([3])
>>> type(x)
numpy.ndarray
>>> x.dtype
dtype('int32')
So far so good. The logic behind this is simple: you can process any array-like object the same way, regardless of whether it is a number, list or array, just by wrapping it in a call to np.array.
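As a rough illustration of that idea, here is a function that accepts a plain number, a list, or an array without any special-casing. The function name sum_of_squares is made up for this sketch; np.asarray is used instead of np.array only to avoid copying existing arrays.

```python
import numpy as np

def sum_of_squares(values):
    # np.asarray accepts a number, a (nested) list, or an existing array.
    arr = np.asarray(values, dtype=float)
    return float(np.sum(arr ** 2))

from_number = sum_of_squares(3)                          # 0-d input
from_list = sum_of_squares([1, 2])                       # 1-d input
from_array = sum_of_squares(np.array([[1, 2], [3, 4]]))  # 2-d input
print(from_number, from_list, from_array)
```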
One thing to keep in mind is that when you index an array, the index tuple must have ndim or fewer elements. So you can't do:
>>> x[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: too many indices for array
Instead, you have to use a zero-sized tuple (since x[] is invalid syntax):
>>> x[()]
3
You can also use the array as a scalar instead:
>>> y = x + 3
>>> y
6
>>> type(y)
numpy.int32
Adding two scalars produces a scalar instance of the dtype, not another array. That being said, you can use y from this example in exactly the same way you would x, 99.99% of the time, since NumPy scalar types expose nearly the same attributes and methods as ndarray (they derive from np.generic, not from ndarray itself). It does not matter that 3 is a Python int, since np.add will wrap it in an array regardless. y = x + x will yield identical results.
One difference between x and y in these examples is that x is not officially considered to be a scalar:
>>> np.isscalar(x)
False
>>> np.isscalar(y)
True
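When code needs to treat 0-d arrays and scalars alike, checking the number of dimensions is one option that sidesteps np.isscalar. A short sketch, using only np.ndim and isinstance; the variable names are illustrative:

```python
import numpy as np

x = np.array(3)   # 0-d array
y = x + 3         # NumPy scalar

# np.ndim treats 0-d arrays, NumPy scalars and Python numbers alike.
ndims = [np.ndim(v) for v in (x, y, 3)]

# The NumPy scalar is an np.generic instance, not an ndarray.
y_is_generic = isinstance(y, np.generic)
y_is_ndarray = isinstance(y, np.ndarray)
print(ndims, y_is_generic, y_is_ndarray)
```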
The indexing issue can potentially throw a monkey wrench in your plans to index any array-like object. You can easily get around it by supplying ndmin=1 as an argument to the constructor, or using a reshape:
>>> x1 = np.array(3, ndmin=1)
>>> x1
array([3])
>>> x2 = np.array(3).reshape(-1)
>>> x2
array([3])
I generally recommend the former method, as it requires no prior knowledge of the dimensionality of the input.
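The difference between the two approaches shows up once the input has more than one dimension. In the sketch below, as_indexable is a hypothetical helper name, not a NumPy function:

```python
import numpy as np

def as_indexable(value):
    # ndmin=1 only adds an axis when the input has fewer than one dimension.
    return np.array(value, ndmin=1)

scalar_case = as_indexable(3)              # shape (1,)
matrix = [[1, 2], [3, 4]]
kept = as_indexable(matrix)                # shape (2, 2), layout preserved
flattened = np.array(matrix).reshape(-1)   # shape (4,), layout lost
print(scalar_case.shape, kept.shape, flattened.shape)
```

With ndmin=1 a 2-D input keeps its shape, while reshape(-1) flattens it.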
Further reading:
Why are 0d arrays in Numpy not considered scalar?

Isn't this a row vector? [duplicate]

I know that a numpy array has an attribute called shape that returns (no. of rows, no. of columns), so shape[0] gives you the number of rows and shape[1] gives you the number of columns.
>>> a = numpy.array([[1,2,3,4], [2,3,4,5]])
>>> a.shape
(2, 4)
>>> a.shape[0]
2
>>> a.shape[1]
4
However, if my array has only one row, then shape returns (no. of columns,), and shape[1] is out of range. For example:
>>> a = numpy.array([1,2,3,4])
>>> a.shape
(4,)
>>> a.shape[0]  # this is the number of columns
4
>>> a.shape[1]
IndexError: tuple index out of range
Now how do I get the number of rows of a numpy array if the array may have only one row?
Thank you
The concept of rows and columns applies when you have a 2D array. However, the array numpy.array([1,2,3,4]) is a 1D array and so has only one dimension; therefore shape rightly returns a one-element tuple.
For a 2D version of the same array, consider the following instead:
>>> a = numpy.array([[1,2,3,4]]) # notice the extra square braces
>>> a.shape
(1, 4)
Rather than converting this to a 2D array, which may not always be an option, one could either check the len() of the tuple returned by shape or just catch the index error:
import numpy
a = numpy.array([1,2,3,4])
print(a.shape)
# (4,)
print(a.shape[0])
try:
    print(a.shape[1])
except IndexError:
    print("only 1 column")
Or you could just try and assign this to a variable for later use (or return or what have you) if you know you will only have 1 or 2 dimension shapes:
try:
    shape = (a.shape[0], a.shape[1])
except IndexError:
    shape = (1, a.shape[0])
print(shape)
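Another way to get the same result without try/except, assuming the input has one or two dimensions, is np.atleast_2d, which views a 1D array as a single row. The helper name rows_and_columns is made up for this sketch:

```python
import numpy as np

def rows_and_columns(a):
    # np.atleast_2d turns shape (n,) into (1, n) and leaves 2-D input alone.
    return np.atleast_2d(a).shape

one_d = rows_and_columns(np.array([1, 2, 3, 4]))
two_d = rows_and_columns(np.array([[1, 2, 3, 4], [2, 3, 4, 5]]))
print(one_d, two_d)
```

For inputs with more than two dimensions the returned tuple has more than two entries, so this only fits the 1D/2D case discussed here.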

Getting Error while performing Undersampling for Sklearn

I am trying to build a random forest classifier for binary classification. My data is imbalanced, hence I am performing undersampling.
train = data.drop(['Co_Name','Cust_ID','Phone','Shpr_ID','Resi_Cnt','Buz_Cnt','Nearby_Cnt','parseNumber','removeString','Qty','bins','Adj_Addr','Resi','Weight','Resi_Area','Lat','Lng'], axis=1)
Y = data['Resi']
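from imblearn.under_sampling import RandomUnderSampler
from sklearn import metrics
rus = RandomUnderSampler(random_state=42)
# newer imbalanced-learn releases name this method fit_resample
X_train_res, y_train_res = rus.fit_sample(train, Y)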
I am getting the below error
446 # make sure we actually converted to numeric:
447 if dtype_numeric and array.dtype.kind == "O":
--> 448 array = array.astype(np.float64)
449 if not allow_nd and array.ndim >= 3:
450 raise ValueError("Found array with dim %d. %s expected <= 2."
ValueError: setting an array element with a sequence.
How can I fix this?
Can you share the dataframe, or a sample of it?
This error can be a lot of things, for example:
If you try:
np.asarray(
    [
        [1, 2],
        [2, 3, 4]
    ],
    dtype=np.float64)
You will get:
ValueError: setting an array element with a sequence.
This happens because the rows have different lengths (2 and 3 elements), so they cannot be combined into a rectangular numeric array.
Your error, however, is probably related to the shape of train versus Y, or to the types stored in train (data). The under-sampler's fit function converts its input to a numeric array, and that conversion is what throws this error. Confirm that train (data) contains only appropriate numeric types before calling RandomUnderSampler.
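One way to narrow this down is to look for cells that hold a sequence instead of a single number. The sketch below uses plain nested lists and NumPy only; find_sequence_cells is a hypothetical helper, and the sample rows are invented for illustration rather than taken from the asker's data.

```python
import numpy as np

def find_sequence_cells(table):
    """Return (row, col) positions whose value is a list, tuple or ndarray."""
    bad = []
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            if isinstance(cell, (list, tuple, np.ndarray)):
                bad.append((r, c))
    return bad

# Invented sample: the last cell holds a list instead of a number.
rows = [[1.0, 2.0], [3.0, [4.0, 5.0]]]
bad_cells = find_sequence_cells(rows)

# The same data fails the float conversion, as in the traceback.
try:
    np.asarray(rows, dtype=np.float64)
    raised = False
except ValueError:
    raised = True
print(bad_cells, raised)
```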

Is there a difference between the input parameters of numpy.random.choice and random.choice?

Why does numpy.random.choice not work the same as random.choice? When I do this :
>>> random.choice([(1,2),(4,3)])
(1, 2)
It works.
But when I do this:
>>> np.random.choice([(1,2), (3,4)])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "mtrand.pyx", line 1393, in mtrand.RandomState.choice
(numpy/random/mtrand/mtrand.c:15450)
ValueError: a must be 1-dimensional
How do I achieve the same behavior as random.choice() in numpy.random.choice()?
As noted in the docs, np.random.choice expects a 1D array, and your input, when expressed as an array, would be 2D. So it won't work directly.
To make it work, we can pass in the length of the input and let it select one index; indexing into the input with that index gives the equivalent of random.choice, as shown below -
out = a[np.random.choice(len(a))] # a is input
Sample run -
In [74]: a = [(1,2),(4,3),(6,9)]
In [75]: a[np.random.choice(len(a))]
Out[75]: (6, 9)
In [76]: a[np.random.choice(len(a))]
Out[76]: (1, 2)
Alternatively, we can convert the input to a 1D array of object dtype and that would allow us to directly use np.random.choice, as shown below -
In [131]: a0 = np.empty(len(a),dtype=object)
In [132]: a0[:] = a
In [133]: a0.shape
Out[133]: (3,) # 1D array
In [134]: np.random.choice(a0)
Out[134]: (6, 9)
In [135]: np.random.choice(a0)
Out[135]: (4, 3)
Relatedly, if you want to randomly sample rows of a 2D matrix like this
x = np.array([[1, 100], [2, 200], [3, 300], [4, 400]])
then you can do something like this:
n_rows = x.shape[0]
x[np.random.choice(n_rows, size=n_rows, replace=True), :]
Should work for a 2D matrix with any number of columns, and you can of course sample however many times you want with the size kwarg, etc.
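The same index-based approach can also be written with NumPy's newer Generator interface. This is a sketch, not part of the original answers; it assumes a NumPy version that provides np.random.default_rng, and the seed value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Pick one tuple from a list by drawing a random index.
a = [(1, 2), (4, 3), (6, 9)]
picked = a[rng.integers(len(a))]

# Resample the rows of a 2D array with replacement.
x = np.array([[1, 100], [2, 200], [3, 300], [4, 400]])
row_idx = rng.integers(x.shape[0], size=x.shape[0])
resampled = x[row_idx, :]
print(picked, resampled.shape)
```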

tensorflow: ValueError: setting an array element with a sequence

I am playing with the fixed code from this question. I am getting the above error. Googling suggests it might be some kind of dimension mismatch, though my diagnostics do not show any:
with tf.Session() as sess:
    sess.run(init)
    # Fit all training data
    for epoch in range(training_epochs):
        for (_x_, _y_) in getb(train_X, train_Y):
            print("y data raw", _y_.shape )
            _y_ = tf.reshape(_y_, [-1, 1])
            print( "y data ", _y_.get_shape().as_list())
            print("y place holder", yy.get_shape().as_list())
            print("x data", _x_.shape )
            print("x place holder", xx.get_shape().as_list() )
            sess.run(optimizer, feed_dict={xx: _x_, yy: _y_})
Looking at the dimensions, everything is alright:
y data raw (20,)
y data [20, 1]
y place holder [20, 1]
x data (20, 10)
x place holder [20, 10]
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-131-00e0bdc140b2> in <module>()
16 print("x place holder", xx.get_shape().as_list() )
17
---> 18 sess.run(optimizer, feed_dict={xx: _x_, yy: _y_})
19
20 # # Display logs per epoch step
/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict)
355 e.args = (e.message,)
356 raise e
--> 357 np_val = np.array(subfeed_val, dtype=subfeed_t.dtype.as_numpy_dtype)
358 if subfeed_t.op.type == 'Placeholder':
359 if not subfeed_t.get_shape().is_compatible_with(np_val.shape):
ValueError: setting an array element with a sequence.
Any debugging tips?
This error, which is not very helpful, is raised when one of the values in the feed_dict argument to tf.Session.run() is a tf.Tensor object (in this case, the result of tf.reshape()).
The values in feed_dict must be numpy arrays, or some value x that can be implicitly converted to a numpy array using numpy.array(x). tf.Tensor objects cannot be implicitly converted, because doing so might require a lot of work: instead you have to call sess.run(t) to convert a tensor t to a numpy array.
As you noticed in your answer, using np.reshape(_y_, [-1, 1]) works, because it produces a numpy array (and because _y_ is a numpy array to begin with). In general, you should always prepare data to be fed using numpy and other pure-Python operations.
Replacing tf.reshape with the plain NumPy version helped:
_y_ = np.reshape(_y_, [-1, 1])
The actual reason is still unclear to me, but it works.
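The preparation step can be sketched with NumPy alone, independent of TensorFlow. The helper name to_column is hypothetical, and the batch of 20 labels simply mirrors the shapes printed in the question.

```python
import numpy as np

def to_column(y):
    # Returns a NumPy array of shape (n, 1); no tf.Tensor is involved,
    # so the result is suitable as a feed_dict value.
    return np.asarray(y).reshape(-1, 1)

batch_y = to_column(np.arange(20.0))   # raw labels of shape (20,)
print(type(batch_y).__name__, batch_y.shape)
```

Because the result is an ordinary ndarray, np.array(batch_y, dtype=...) inside Session.run() can convert it without raising the ValueError.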