model.predict(sample) returning TypeError: cannot perform reduce with flexible type - numpy

I keep running into this error when I try to predict with a fitted model.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

training, testing = train_test_split(gesture, test_size=0.2, random_state=0)
x = training.drop('CLASS', axis=1)  # remove the CLASS column from the training dataframe
y = testing.drop('CLASS', axis=1)   # remove the CLASS column from the testing dataframe
f_train = x.values.tolist()
l_train = training['CLASS'].values.tolist()  # list of class identifiers from the training dataframe
f_test = y.values.tolist()
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(f_train, l_train)
predictions = knn.predict(f_test)
The error occurs in the last line of the above code and the error message is given below:
Traceback (most recent call last):
  File "C:\Users\Umair Khan\Dropbox\`Shift betweeen PCs\Work\EMG Hand Gesture\Codes\ML_on_CSV.py", line 39, in <module>
    predictions = knn.predict(f_test)
  File "C:\Users\Umair Khan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\neighbors\_regression.py", line 185, in predict
    y_pred = np.mean(_y[neigh_ind], axis=1)
  File "<__array_function__ internals>", line 6, in mean
  File "C:\Users\Umair Khan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\numpy\core\fromnumeric.py", line 3335, in mean
    out=out, **kwargs)
  File "C:\Users\Umair Khan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\numpy\core\_methods.py", line 151, in _mean
    ret = umr_sum(arr, axis, dtype, out, keepdims)
TypeError: cannot perform reduce with flexible type
f_test is a list of lists, like this: [[16, 30, 35, 250, -1, 0.5, 35, 0.03, 0.02], [16, 30, 35, 250, -1, 0.5, 35, 0.03, 0.02]]
I have also tried passing an array to predict(), but the issue remains:
predictions = knn.predict(np.array(f_test).astype(np.float))

We need to see more of the error traceback, and info on the function inputs, particularly their shape and dtype.
I've seen this error message when working with structured arrays. But it's not obvious where those might arise in your code.
In [15]: np.ones((2,), dtype='i,i')
Out[15]: array([(1, 1), (1, 1)], dtype=[('f0', '<i4'), ('f1', '<i4')])
In [16]: np.sum(np.ones((2,), dtype='i,i'))
....
---> 87     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
TypeError: cannot perform reduce with flexible type

Solved:
I changed the dtype of l_train from string to float and the error disappeared. f_train and f_test were already of type float.
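For reference, a minimal sketch of that fix, assuming the CLASS column was read in as strings (this continues the snippet from the question):
# Cast the label column to float before building the training labels.
l_train = training['CLASS'].astype(float).values.tolist()
knn.fit(f_train, l_train)
predictions = knn.predict(f_test)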

Related

Populating a numpy array with maximum performance

I have a few arrays a, b, c and d, as shown below, and would like to populate a matrix by evaluating a function f(...) that consumes a, b, c and d.
With a nested for loop this is obviously possible, but I'm looking for a more Pythonic and faster way to do this.
So far I tried np.fromfunction, with no luck.
Thanks
PS: This function f has a conditional. I can still consider approaches that do not support conditionals, but if the solution supports conditionals that would be fantastic.
Example function, in case it's helpful:
def fun(a, b, c, d): return a + b + c + d if a == b else a * b * c * d
Why fromfunction failed is shown below:
>>> a = np.array([1,2,3,4,5])
>>> b = np.array([10,20,30])
>>> def fun(i,j): return a[i] * b[j]
>>> np.fromfunction(fun, (3,5))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Anaconda3\lib\site-packages\numpy\core\numeric.py", line 1853, in fromfunction
    return function(*args, **kwargs)
  File "<stdin>", line 1, in fun
IndexError: arrays used as indices must be of integer (or boolean) type
The reason the function fails is that np.fromfunction passes floating-point values, which are not valid as indices. You can modify your function like this to make it work:
def fun(i, j):
    return a[j.astype(int)] * b[i.astype(int)]

print(np.fromfunction(fun, (3, 5)))
[[ 10  20  30  40  50]
 [ 20  40  60  80 100]
 [ 30  60  90 120 150]]
Jake has explained why your fromfunction approach fails. However, you don't need fromfunction for your example. You could simply add an axis to b and have numpy broadcast the shapes:
a = np.array([1,2,3,4,5])
b = np.array([10,20,30])
def fun(i,j): return a[j.astype(int)] * b[i.astype(int)]
f1 = np.fromfunction(fun, (3, 5))
f2 = b[:, None] * a
(f1 == f2).all() # True
Extending this to the function you showed that contains an if condition, you can split the conditional into two sequential operations: first compute an array from the else expression (the product), then overwrite the entries where the condition holds (here i == j, the diagonal) with the if expression (the sum).
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
c = np.array([100, 200, 300, 400, 500])
d = np.array([0, 1, 2, 3])
# Calculate the values at all indices as the product
result = d[:, None] * (a * b * c)
# array([[   0,    0,    0,    0,    0],
#        [ 500, 1600, 2700, 3200, 2500],
#        [1000, 3200, 5400, 6400, 5000],
#        [1500, 4800, 8100, 9600, 7500]])
# Calculate the sum
sum_arr = d[:, None] + (a + b + c)
# array([[106, 206, 306, 406, 506],
#        [107, 207, 307, 407, 507],
#        [108, 208, 308, 408, 508],
#        [109, 209, 309, 409, 509]])
# Set diagonal elements (i==j) to sum:
np.fill_diagonal(result, np.diag(sum_arr))
which gives the following result:
array([[ 106,    0,    0,    0,    0],
       [ 500,  207, 2700, 3200, 2500],
       [1000, 3200,  308, 6400, 5000],
       [1500, 4800, 8100,  409, 7500]])
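Equivalently, you could evaluate both branches and select elementwise with np.where; this is a sketch of mine (not from the original answer) that produces the same array as the fill_diagonal approach:
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
c = np.array([100, 200, 300, 400, 500])
d = np.array([0, 1, 2, 3])

prod = d[:, None] * (a * b * c)            # the "else" branch, everywhere
summ = d[:, None] + (a + b + c)            # the "if" branch, everywhere
mask = np.eye(len(d), len(a), dtype=bool)  # True where i == j
result = np.where(mask, summ, prod)        # pick per element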

Per-channel normalization of RGB images

I would like to know how I can manually normalize an RGB image.
I tried:
img_name = 'example/abc/myfile.png'
img = np.asarray(Image.open(img_name))
img = np.transpose(img, (2,0,1))
(img/255.0 - mean)/std
mean and std have shape (3,).
When I run the code above I get this error:
ValueError: operands could not be broadcast together with shapes (3,512,512) (3,)
How can we normalize each channel individually?
Normalization means to transform to zero mean and unit variance. This is done by subtracting the mean and dividing the result by the standard deviation.
You can do it per channel by specifying the axes as x.mean((1,2)) instead of just x.mean(). In order to be able to broadcast you need to transpose the image first and then transpose back.
import numpy as np
np.random.seed(0)
c = 3 # number of channels
h = 2 # height
w = 4 # width
img = np.random.randint(0, 255, (c, h, w))
img_n = ((img.T - img.mean((1,2))) / img.std((1,2))).T
Original image:
array([[[172,  47, 117, 192],
        [ 67, 251, 195, 103]],

       [[  9, 211,  21, 242],
        [ 36,  87,  70, 216]],

       [[ 88, 140,  58, 193],
        [230,  39,  87, 174]]])
Normalized image:
array([[[ 0.43920493, -1.45391976, -0.39376994,  0.74210488],
        [-1.15101981,  1.63565973,  0.78753987, -0.6057999 ]],

       [[-1.14091546,  1.10752281, -1.00734487,  1.45258017],
        [-0.84038163, -0.27270662, -0.46193163,  1.16317723]],

       [[-0.59393963,  0.21615508, -1.06130196,  1.04182853],
        [ 1.61824207, -1.35729811, -0.60951837,  0.74583239]]])
Verify zero mean and unit standard deviation per channel with print(img_n.mean((1,2)), img_n.std((1,2))):
[-1.38777878e-17 0.00000000e+00 0.00000000e+00]
[1. 1. 1.]
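The same idea applies directly to the question's (3, 512, 512) image without any transposing: give mean and std trailing singleton axes so they broadcast over height and width. A minimal sketch (the mean/std numbers are placeholder values, not from the post):
import numpy as np

img = np.random.rand(3, 512, 512)        # channels-first RGB image
mean = np.array([0.485, 0.456, 0.406])   # placeholder per-channel mean
std = np.array([0.229, 0.224, 0.225])    # placeholder per-channel std

# Reshape (3,) -> (3, 1, 1) so the stats broadcast across H and W.
img_n = (img - mean[:, None, None]) / std[:, None, None]
print(img_n.shape)  # (3, 512, 512)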

numpy - why np.multiply.reduce([], axis=None) results in 1?

Why multiply & reduce non-existing element in an empty array-like results in 1?
np.multiply.reduce([], axis=None)
---
1
A ufunc may have an identity attribute:
In [200]: np.multiply.identity
Out[200]: 1
In [201]: np.multiply.reduce([])
Out[201]: 1.0
which can be overridden via the initial keyword of reduce:
In [202]: np.multiply.reduce([], initial=10)
Out[202]: 10.0
In [203]: np.multiply.reduce([1,2,3], initial=10)
Out[203]: 60
In [204]: np.multiply.reduce([1,2,3], initial=None)
Out[204]: 6
and if initial is None with an empty array, it produces an error:
In [205]: np.multiply.reduce([], initial=None)
Traceback (most recent call last):
  File "<ipython-input-205-1c3b1c890fd6>", line 1, in <module>
    np.multiply.reduce([], initial=None)
ValueError: zero-size array to reduction operation multiply which has no identity
np.max reduces with np.maximum, a ufunc without an identity:
In [211]: np.max([])
Traceback (most recent call last):
  File "<ipython-input-211-93f3814168a1>", line 1, in <module>
    np.max([])
  File "<__array_function__ internals>", line 5, in amax
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 2733, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
In [212]: np.max([], initial=-1)
Out[212]: -1.0
Python reduce
In [222]: from functools import reduce
In [223]: reduce?
Docstring:
reduce(function, sequence[, initial]) -> value
Apply a function of two arguments cumulatively to the items of a sequence,
from left to right, so as to reduce the sequence to a single value.
For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
((((1+2)+3)+4)+5). If initial is present, it is placed before the items
of the sequence in the calculation, and serves as a default when the
sequence is empty.
Type: builtin_function_or_method
So with a multiply lambda:
In [224]: reduce(lambda x,y: x*y,[1,2,3])
Out[224]: 6
For an empty list, an error is the default behavior:
In [225]: reduce(lambda x,y: x*y,[])
Traceback (most recent call last):
  File "<ipython-input-225-780706778563>", line 1, in <module>
    reduce(lambda x,y: x*y,[])
TypeError: reduce() of empty sequence with no initial value
But with a supplied initial value:
In [227]: reduce(lambda x,y: x*y,[],1)
Out[227]: 1
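To tie this together, here is a quick sketch checking the identity attribute of a few ufuncs; a reduction over an empty array succeeds exactly when the ufunc has an identity:
import numpy as np

print(np.multiply.identity)  # 1    -> np.multiply.reduce([]) == 1.0
print(np.add.identity)       # 0    -> np.add.reduce([]) == 0.0
print(np.maximum.identity)   # None -> np.maximum.reduce([]) raises ValueError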

Minimizing the peak difference of elements of two lists given some constraints

I want to minimize the peak difference of list1[i] - list2[i] using the scipy.optimize.minimize method.
The elements in list1 and list2 are floats.
For example:
list1 = [50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61]
How do I minimize list1[i] - list2[i] given that I have two constraints:
1. list2[0] = list1[0]
2. list2[i+1] - list2[i] <= 1.5
Basically, two consecutive elements in list2 can not be separated by more than 1.5 and the first element of list2 is the first element of list1.
Maybe there is another way, other than scipy.optimize.minimize, but I don't know how to do this.
I think the optimal values for list2 might be:
list2 = [50, 50.5, 52, 53, 54.5, 55.5, 56, 57, 58.5, 60]
In this case the peak difference is 1.5.
But maybe the algorithm can find a better solution, with smaller differences between the elements of list1 and list2.
Here is what I have tried, but it failed:
import numpy as np
from scipy.optimize import minimize

list1 = [50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61]
list2 = [list1[0]]

# Define objective
def peakDifference(*args):
    global list2
    peak_error = []
    for list1_i, list2_i in zip(list1, list2):
        peak_error.append(list1_i - list2_i)
    return max(peak_error)

peak_error = peakDifference()

# Define constraints
def constraint1(*args):
    for x in range(len(list2) - 1):
        return list2[x+1] - list2[x] - 1.5

con1 = {'type': 'ineq', 'fun': constraint1}

# Optimize
sol = minimize(peakDifference, list2, constraints=con1)
Traceback (most recent call last):
  File "C:/Users/JumpStart/PycharmProjects/anglesimulation/venv/asdfgh.py", line 27, in <module>
    sol = minimize(peakDifference,list2, constraints=con1)
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\_minimize.py", line 625, in minimize
    return _minimize_slsqp(fun, x0, args, jac, bounds,
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\slsqp.py", line 412, in _minimize_slsqp
    a = _eval_con_normals(x, cons, la, n, m, meq, mieq)
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\slsqp.py", line 486, in _eval_con_normals
    a_ieq = vstack([con['jac'](x, *con['args'])
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\slsqp.py", line 486, in <listcomp>
    a_ieq = vstack([con['jac'](x, *con['args'])
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\slsqp.py", line 284, in cjac
    return approx_derivative(fun, x, method='2-point',
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\_numdiff.py", line 426, in approx_derivative
    return _dense_difference(fun_wrapped, x0, f0, h,
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\_numdiff.py", line 497, in _dense_difference
    df = fun(x) - f0
TypeError: unsupported operand type(s) for -: 'NoneType' and 'NoneType'

Process finished with exit code 1
The immediate TypeError happens because list2 has only one element, so constraint1's loop body never runs and the function implicitly returns None, which SciPy's numerical differentiation then tries to subtract. Beyond that, the formulation itself needs rework.
NLP model
Here is a "working" version of the NLP model. The model cannot be solved reliably this way, as it is non-differentiable.
import numpy as np
from scipy.optimize import minimize

list1 = np.array([50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61])
list1_0 = list1[0]
n = len(list1)

# our x variable will have one element less (first element is fixed)
list1x = np.delete(list1, 0)  # version with 1st element dropped
nx = len(list1x)

# objective function
# minimize the maximum difference
# Notes:
# - x excludes the first element (they are the same by definition)
# - this is non-differentiable, so likely to be non-optimal
def maxDifference(x):
    return np.max(np.abs(list1x - x))

# n-1 constraints
def constraint1(x):
    return 1.5 - np.diff(np.insert(x, 0, list1_0), 1)

con1 = {'type': 'ineq', 'fun': constraint1}

# Optimize
x = 55 * np.ones(nx)  # initial value
sol = minimize(maxDifference, x, constraints=con1)
sol
# optimal is: x = [51.25, 51.25, 52.75, 54.25, 54.75, 56.25, 57.75, 59.25, 60.25]
# obj = 0.75
The result is:
     fun: 5.0
     jac: array([0., 0., 0., 0., 0., 0., 0., 0., 0.])
 message: 'Optimization terminated successfully'
    nfev: 20
     nit: 2
    njev: 2
  status: 0
 success: True
       x: array([51.5, 53. , 54.5, 55. , 55. , 55. , 55. , 55. , 56. ])
This is non-optimal: the objective is 5 (instead of 0.75).
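A standard way to make this solvable with scipy.optimize.minimize is the epigraph reformulation (my sketch, not part of the original answer): add a variable z, minimize z, and turn the max of absolute differences into elementwise linear inequality constraints, which are differentiable.
import numpy as np
from scipy.optimize import minimize

list1 = np.array([50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61])
list1_0 = list1[0]
list1x = np.delete(list1, 0)   # first element is fixed
nx = len(list1x)

# Variable vector v = [x_1, ..., x_{n-1}, z]; minimize z.
def objective(v):
    return v[-1]

cons = [
    # z >= list1x[i] - x[i]  and  z >= x[i] - list1x[i]
    {'type': 'ineq', 'fun': lambda v: v[-1] - (list1x - v[:-1])},
    {'type': 'ineq', 'fun': lambda v: v[-1] - (v[:-1] - list1x)},
    # consecutive steps of at most 1.5 (including the fixed first element)
    {'type': 'ineq', 'fun': lambda v: 1.5 - np.diff(np.insert(v[:-1], 0, list1_0))},
]

v0 = np.append(55 * np.ones(nx), 10.0)  # initial guess
sol = minimize(objective, v0, constraints=cons)
print(sol.fun)      # should be close to 0.75
print(sol.x[:-1])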
LP model
An LP model will find a proven optimal solution. That is much more reliable. E.g.:
import pulp as lp

list1 = [50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61]
n = len(list1)

model = lp.LpProblem("Min_difference", lp.LpMinimize)
x = lp.LpVariable.dicts("x", (i for i in range(n)))
z = lp.LpVariable("z")

# objective
model += z

# constraints
for i in range(n):
    model += z >= x[i] - list1[i]
    model += z >= list1[i] - x[i]
for i in range(n - 1):
    model += x[i+1] - x[i] <= 1.5
model += x[0] == list1[0]

model.solve()
print(lp.LpStatus[model.status])
print("obj:", z.varValue)
print([x[i].varValue for i in range(n)])
This shows:
Optimal
obj: 0.75
[50.0, 51.25, 52.75, 53.75, 55.25, 56.25, 56.75, 57.75, 59.25, 60.25]
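If you would rather stay within SciPy instead of adding a pulp dependency, the same LP can be written with scipy.optimize.linprog (my translation of the model above, so treat it as a sketch):
import numpy as np
from scipy.optimize import linprog

list1 = np.array([50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61])
n = len(list1)

# Variable vector: [x_0, ..., x_{n-1}, z]; minimize z.
c = np.zeros(n + 1)
c[-1] = 1.0

A_ub, b_ub = [], []
for i in range(n):
    row = np.zeros(n + 1); row[i] = 1.0; row[-1] = -1.0   # x[i] - z <= list1[i]
    A_ub.append(row); b_ub.append(list1[i])
    row = np.zeros(n + 1); row[i] = -1.0; row[-1] = -1.0  # -x[i] - z <= -list1[i]
    A_ub.append(row); b_ub.append(-list1[i])
for i in range(n - 1):
    row = np.zeros(n + 1); row[i] = -1.0; row[i + 1] = 1.0  # x[i+1] - x[i] <= 1.5
    A_ub.append(row); b_ub.append(1.5)

A_eq = np.zeros((1, n + 1)); A_eq[0, 0] = 1.0   # x[0] == list1[0]
b_eq = [list1[0]]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(None, None)] * (n + 1))
print(res.fun)     # expected: 0.75
print(res.x[:-1])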

Python 3.5 Trying to plot PCA with sklearn and matplotlib

Using the following code generates the error TypeError: float() argument must be a string or a number, not 'Pred'.
I am struggling to figure out what is causing this error to be thrown.
self.features is a list of arrays, each composed of three floats, e.g. [1.1, 1.2, 1.3].
An example of self.features:
[array([-1.67191985, 0.1 , 9.69981494]), array([-0.68486623, 0.05 , 9.99085024]), array([ -1.36 , 0.1 , 10.44720459]), array([-2.46918915, 0. , 3.5483372 ]), array([-0.835 , 0.1 , 4.02740479])]
This is the method where the error is being thrown.
def pca(self):
    pca = PCA(n_components=2)
    x_np = np.asarray(self.features)
    pca.fit(x_np)
    X_reduced = pca.transform(x_np)
    plt.figure(figsize=(10, 8))
    plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap='RdBu')
    plt.xlabel('First component')
    plt.ylabel('Second component')
The full traceback is:
Traceback (most recent call last):
  File "/Users/user/PycharmProjects/Post-Translational-Modification-Prediction/pred.py", line 244, in <module>
    y.generate_pca()
  File "/Users/user/PycharmProjects/Post-Translational-Modification-Prediction/pred.py", line 222, in generate_pca
    plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap='RdBu')
  File "/usr/local/lib/python3.5/site-packages/matplotlib/pyplot.py", line 3435, in scatter
    edgecolors=edgecolors, data=data, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/matplotlib/__init__.py", line 1892, in inner
    return func(ax, *args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/matplotlib/axes/_axes.py", line 3976, in scatter
    c_array = np.asanyarray(c, dtype=float)
  File "/usr/local/lib/python3.5/site-packages/numpy/core/numeric.py", line 583, in asanyarray
    return array(a, dtype, copy=False, order=order, subok=True)
TypeError: float() argument must be a string or a number, not 'Pred'
The fix suggested by WhoIsJack is to add y = np.arange(len(self.features)), so that the color argument is a numeric array.
The functional code, for those who run into similar issues, is:
def generate_pca(self):
    y = np.arange(len(self.features))
    pca = PCA(n_components=2)
    x_np = np.asarray(self.features)
    pca.fit(x_np)
    X_reduced = pca.transform(x_np)
    plt.figure(figsize=(10, 8))
    plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap='RdBu')
    plt.xlabel('First component')
    plt.ylabel('Second component')
    plt.show()
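For context, c=y failed because at that point y was the surrounding Pred instance rather than numbers; scatter's c argument must be numeric values or colors. A sketch (with hypothetical labels, not from the original post) showing coloring by actual class labels instead of an index:
import numpy as np
import matplotlib.pyplot as plt

X_reduced = np.random.rand(5, 2)      # stand-in for the PCA output
labels = np.array([0, 0, 1, 1, 2])    # hypothetical integer class labels
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=labels, cmap='RdBu')
plt.colorbar()
plt.show()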