Why does np.multiply.reduce([], axis=None) return 1?

Why does multiplying and reducing over the non-existent elements of an empty array-like result in 1?
np.multiply.reduce([], axis=None)
---
1

A ufunc may have an identity attribute:
In [200]: np.multiply.identity
Out[200]: 1
In [201]: np.multiply.reduce([])
Out[201]: 1.0
which can be overridden in a reduce call with the initial parameter:
In [202]: np.multiply.reduce([], initial=10)
Out[202]: 10.0
In [203]: np.multiply.reduce([1,2,3], initial=10)
Out[203]: 60
In [204]: np.multiply.reduce([1,2,3], initial=None)
Out[204]: 6
and if initial is None, reducing an empty array raises an error:
In [205]: np.multiply.reduce([], initial=None)
Traceback (most recent call last):
File "<ipython-input-205-1c3b1c890fd6>", line 1, in <module>
np.multiply.reduce([], initial=None)
ValueError: zero-size array to reduction operation multiply which has no identity
np.max reduces with np.maximum, a ufunc that has no identity, so there is no default initial value:
In [211]: np.max([])
Traceback (most recent call last):
File "<ipython-input-211-93f3814168a1>", line 1, in <module>
np.max([])
File "<__array_function__ internals>", line 5, in amax
File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 2733, in amax
return _wrapreduction(a, np.maximum, 'max', axis, None, out,
File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
In [212]: np.max([], initial=-1)
Out[212]: -1.0
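To check which ufuncs carry an identity, and so which reductions accept an empty input, the identity attribute can be inspected directly. This is only a sketch; the helper name reduce_or_none is made up for illustration and is not part of NumPy:

```python
import numpy as np

def reduce_or_none(ufunc, values):
    """Reduce values with ufunc; return None when an empty input has no identity to fall back on."""
    try:
        return ufunc.reduce(values)
    except ValueError:
        # Raised for zero-size input when ufunc.identity is None
        return None

# Print each ufunc's identity next to the result of reducing an empty list
for ufunc in (np.add, np.multiply, np.logical_and, np.maximum):
    print(ufunc.__name__, ufunc.identity, reduce_or_none(ufunc, []))
```

np.add and np.multiply fall back to their identities (0 and 1), while np.maximum has identity None and therefore needs an explicit initial for empty input.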
Compare with Python's functools.reduce:
In [222]: from functools import reduce
In [223]: reduce?
Docstring:
reduce(function, sequence[, initial]) -> value
Apply a function of two arguments cumulatively to the items of a sequence,
from left to right, so as to reduce the sequence to a single value.
For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
((((1+2)+3)+4)+5). If initial is present, it is placed before the items
of the sequence in the calculation, and serves as a default when the
sequence is empty.
Type: builtin_function_or_method
So with a multiply lambda:
In [224]: reduce(lambda x,y: x*y,[1,2,3])
Out[224]: 6
For an empty list, raising an error is the default behavior:
In [225]: reduce(lambda x,y: x*y,[])
Traceback (most recent call last):
File "<ipython-input-225-780706778563>", line 1, in <module>
reduce(lambda x,y: x*y,[])
TypeError: reduce() of empty sequence with no initial value
But with a supplied initial value:
In [227]: reduce(lambda x,y: x*y,[],1)
Out[227]: 1
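One way to see why 1 is the natural result for an empty product: splitting a sequence into two pieces should not change the overall product, and that only holds for an empty piece if its product is the multiplicative identity. A small sketch using only the standard library; the helper name product is illustrative:

```python
from functools import reduce
import operator

def product(values):
    """Multiply all values together, starting from the multiplicative identity 1."""
    return reduce(operator.mul, values, 1)

a, b = [2, 3], [4]

# Splitting a sequence into two pieces must not change its product...
assert product(a + b) == product(a) * product(b)
# ...and that stays true for an empty piece only if product([]) is 1.
assert product(a + []) == product(a) * product([])

print(product([]))   # 1
print(sum([]))       # 0, the identity for addition
```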

Related

Why the dtype parameter of pd.Series cannot be used to convert integer strings to ints, but can convert to floats?

Can someone please explain the following ValueError for me:
>>> import pandas as pd
>>> pd.__version__
'1.3.5'
>>> pd.Series(['1', '2'], dtype='float64')
0 1.0
1 2.0
dtype: float64
>>> pd.Series(['1', '2'], dtype='int64')
C:\Users\a\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\core\numeric.py:2446: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
return bool(asarray(a1 == a2).all())
Traceback (most recent call last):
File "C:\Users\a\AppData\Local\Programs\Python\Python310\lib\site-packages\IPython\core\interactiveshell.py", line 3444, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-20-171b1b2dceb0>", line 1, in <module>
pd.Series(['1', '2'], dtype='int64')
File "C:\Users\a\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\series.py", line 439, in __init__
data = sanitize_array(data, index, dtype, copy)
File "C:\Users\a\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\construction.py", line 569, in sanitize_array
subarr = _try_cast(data, dtype, copy, raise_cast_failure)
File "C:\Users\a\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\construction.py", line 754, in _try_cast
subarr = maybe_cast_to_integer_array(arr, dtype)
File "C:\Users\a\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\dtypes\cast.py", line 2094, in maybe_cast_to_integer_array
raise ValueError(f"values cannot be losslessly cast to {dtype}")
ValueError: values cannot be losslessly cast to int64

model.predict(sample) returning TypeError: cannot perform reduce with flexible type

I keep running into this error when I try to predict based on a fitted model.
training, testing = train_test_split(gesture, test_size = 0.2, random_state = 0)
x = training.drop('CLASS', axis = 1) # remove the Class column from Training dataframe
y = testing.drop('CLASS', axis = 1) # remove the Class column from Testing dataframe
f_train = x.values.tolist()
l_train = training['CLASS'].values.tolist() # make a list of class identifiers from Training dataframe
f_test = y.values.tolist()
knn = KNeighborsRegressor(n_neighbors = 5)
knn.fit(f_train, l_train)
predictions = knn.predict(f_test)
The error occurs in the last line of the above code and the error message is given below:
Traceback (most recent call last):
File "C:\Users\Umair Khan\Dropbox\`Shift betweeen PCs\Work\EMG Hand Gesture\Codes\ML_on_CSV.py", line 39, in <module>
predictions = knn.predict(f_test)
File "C:\Users\Umair Khan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\neighbors\_regression.py", line 185, in predict
y_pred = np.mean(_y[neigh_ind], axis=1)
File "<__array_function__ internals>", line 6, in mean
File "C:\Users\Umair Khan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\numpy\core\fromnumeric.py", line 3335, in mean
out=out, **kwargs)
File "C:\Users\Umair Khan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\numpy\core\_methods.py", line 151, in _mean
ret = umr_sum(arr, axis, dtype, out, keepdims)
TypeError: cannot perform reduce with flexible type
f_test is a list of lists such as [[16, 30, 35, 250, -1, 0.5, 35, 0.03, 0.02], [16, 30, 35, 250, -1, 0.5, 35, 0.03, 0.02]]
I have also tried passing an array in predict(sample) but the issue still remains.
predictions = knn.predict(np.array(f_test).astype(float))
We need to see more of the error traceback, and information about the function inputs, particularly their shape and dtype.
I've seen this error message when working with structured arrays. But it's not obvious where those might arise in your code.
In [15]: np.ones((2,), dtype='i,i')
Out[15]: array([(1, 1), (1, 1)], dtype=[('f0', '<i4'), ('f1', '<i4')])
In [16]: np.sum(np.ones((2,), dtype='i,i'))
....
---> 87 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
88
89
TypeError: cannot perform reduce with flexible type
Solved:
I changed the dtype of l_train from string to float and the error disappeared. f_train and f_test were already of type float.
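A quick way to spot this kind of problem is to look at the dtype of the labels before fitting. The sketch below uses made-up string labels (the real contents of l_train are not shown in the question) and converts them to float:

```python
import numpy as np

# Hypothetical labels read in as strings, e.g. from a CSV file
l_train = ['0', '1', '1', '2']

labels = np.asarray(l_train)
print(labels.dtype.kind)        # 'U': a flexible (string) dtype

# Convert to a numeric dtype so that reductions such as mean can run
numeric_labels = labels.astype(float)
print(numeric_labels.mean())    # 1.0
```

A dtype kind of 'U' or 'S' (or a structured dtype, as in the answer above) is what NumPy calls a flexible type.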

How can I output the "Name" value in pandas?

After calling df.iloc[3], I get an output with all the column names and their values for the given row.
What should the code be if I only want to output the "Name" value for the same row?
Eg:
Columns 1 Value 1
Columns 2 Value 2
Name: Row 1, dtype: object
So, in this case "Row 1".
>>> df = pd.DataFrame({'Name': ['Uncle', 'Sam', 'Martin', 'Jacob'], 'Salary': [1000, 2000, 3000, 1500]})
>>> df
     Name  Salary
0   Uncle    1000
1     Sam    2000
2  Martin    3000
3   Jacob    1500
df.iloc[3] gives the following:
>>> df.iloc[3]
Name      Jacob
Salary     1500
Name: 3, dtype: object
However, df.iloc[3, 'Name'] throws the following exception:
>>> df.iloc[3, 'Name']
Traceback (most recent call last):
File "/home/nikhil/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 235, in _has_valid_tuple
self._validate_key(k, i)
File "/home/nikhil/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 2035, in _validate_key
"a [{types}]".format(types=self._valid_types)
ValueError: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/nikhil/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1418, in __getitem__
return self._getitem_tuple(key)
File "/home/nikhil/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 2092, in _getitem_tuple
self._has_valid_tuple(tup)
File "/home/nikhil/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 239, in _has_valid_tuple
"[{types}] types".format(types=self._valid_types)
ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types
Use df.loc[3, 'Name'] instead:
>>> df.loc[3, 'Name']
'Jacob'
df.iloc[3] is a Series, and its name attribute holds the index label of that row.
df.iloc[3].name will return the name.
Example:
>>> df = pd.DataFrame({'data': [100, 200]})
>>> df = df.set_index(pd.Index(['A', 'B']))
>>> df.iloc[1]
data    200
Name: B, dtype: int64
>>> df.iloc[1].name
'B'
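Putting the two answers side by side on the question's DataFrame, here is a short sketch that distinguishes the row's label (the "Name:" shown in the Series footer) from the value in the 'Name' column. The variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Uncle', 'Sam', 'Martin', 'Jacob'],
                   'Salary': [1000, 2000, 3000, 1500]})

row_label = df.iloc[3].name          # index label of the 4th row
same_label = df.index[3]             # same label, without building the row Series
name_by_label = df.loc[3, 'Name']    # label-based lookup of the 'Name' column
name_by_position = df.iloc[3, df.columns.get_loc('Name')]  # purely positional lookup

print(row_label, same_label, name_by_label, name_by_position)
```

iloc accepts only integer positions, which is why the column label has to be translated with df.columns.get_loc first.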

Python 3.5 Trying to plot PCA with sklearn and matplotlib

Using the following code generates the error: TypeError: float() argument must be a string or a number, not 'Pred':
I am struggling to figure out what is causing this error to be thrown.
self.features is a list of arrays, each composed of three floats, e.g. [1.1, 1.2, 1.3].
An example of self.features:
[array([-1.67191985, 0.1 , 9.69981494]), array([-0.68486623, 0.05 , 9.99085024]), array([ -1.36 , 0.1 , 10.44720459]), array([-2.46918915, 0. , 3.5483372 ]), array([-0.835 , 0.1 , 4.02740479])]
This is the method where the error is being thrown.
def pca(self):
    pca = PCA(n_components=2)
    x_np = np.asarray(self.features)
    pca.fit(x_np)
    X_reduced = pca.transform(x_np)
    plt.figure(figsize=(10, 8))
    plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap='RdBu')
    plt.xlabel('First component')
    plt.ylabel('Second component')
The full trace back is:
Traceback (most recent call last):
File "/Users/user/PycharmProjects/Post-Translational-Modification-Prediction/pred.py", line 244, in <module>
y.generate_pca()
File "/Users/user/PycharmProjects/Post-Translational-Modification-Prediction/pred.py", line 222, in generate_pca
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap='RdBu')
File "/usr/local/lib/python3.5/site-packages/matplotlib/pyplot.py", line 3435, in scatter
edgecolors=edgecolors, data=data, **kwargs)
File "/usr/local/lib/python3.5/site-packages/matplotlib/__init__.py", line 1892, in inner
return func(ax, *args, **kwargs)
File "/usr/local/lib/python3.5/site-packages/matplotlib/axes/_axes.py", line 3976, in scatter
c_array = np.asanyarray(c, dtype=float)
File "/usr/local/lib/python3.5/site-packages/numpy/core/numeric.py", line 583, in asanyarray
return array(a, dtype, copy=False, order=order, subok=True)
TypeError: float() argument must be a string or a number, not 'Pred'
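The fix suggested by @WhoIsJack is to define y = np.arange(len(self.features)) inside the method. In the original code y was not defined locally, so c=y resolved to the module-level y, the Pred instance on which generate_pca() is called (see y.generate_pca() in the traceback).
The working code, for those who run into similar issues: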
def generate_pca(self):
    y = np.arange(len(self.features))
    pca = PCA(n_components=2)
    x_np = np.asarray(self.features)
    pca.fit(x_np)
    X_reduced = pca.transform(x_np)
    plt.figure(figsize=(10, 8))
    plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap='RdBu')
    plt.xlabel('First component')
    plt.ylabel('Second component')
    plt.show()
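The failure can be reproduced without matplotlib. The traceback shows scatter converting c with np.asanyarray(c, dtype=float), so the sketch below applies the same conversion to an arbitrary object and to a numeric array. The empty Pred class is only a stand-in (the asker's class is not shown) and as_color_values is an illustrative name:

```python
import numpy as np

class Pred:
    """Stand-in for the asker's class; its real contents are not shown in the question."""

def as_color_values(c):
    # Same conversion as in the traceback: np.asanyarray(c, dtype=float)
    return np.asanyarray(c, dtype=float)

features = [np.array([-1.67, 0.1, 9.70]), np.array([-0.68, 0.05, 9.99])]

# One number per point converts cleanly
colors = as_color_values(np.arange(len(features)))

# An arbitrary object cannot be converted to float
try:
    as_color_values(Pred())
    conversion_failed = False
except TypeError as exc:
    conversion_failed = True
    print(exc)
```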

Incompatible indexer with Series

Why do I get an error:
import pandas as pd
a = pd.Series(index=[4,5,6], data=0)
print(a.loc[4:5])
a.loc[4:5] += 1
Output:
4    0
5    0
dtype: int64
Traceback (most recent call last):
File "temp1.py", line 9, in <module>
a.loc[4:5] += 1
File "lib\site-packages\pandas\core\indexing.py", line 88, in __setitem__
self._setitem_with_indexer(indexer, value)
File "lib\site-packages\pandas\core\indexing.py", line 177, in _setitem_with_indexer
value = self._align_series(indexer, value)
File "lib\site-packages\pandas\core\indexing.py", line 206, in _align_series
raise ValueError('Incompatible indexer with Series')
ValueError: Incompatible indexer with Series
Pandas 0.12.
I think this is a bug. You can work around it by using a tuple index:
import pandas as pd
a = pd.Series(index=[4,5,6], data=0)
print(a.loc[4:5])
a.loc[4:5,] += 1
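The report above is against pandas 0.12. As far as I know, current pandas releases accept the plain slice assignment, so the tuple workaround should only matter on very old versions. A minimal check, assuming a recent pandas install:

```python
import pandas as pd

a = pd.Series(index=[4, 5, 6], data=0)

# .loc slices by label and includes both endpoints, so labels 4 and 5 are incremented
a.loc[4:5] += 1

print(a.tolist())
```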