I have a few arrays a, b, c and d as shown below and would like to populate a matrix by evaluating a function f(...) which consumes a, b, c and d.
With nested for loops this is obviously possible, but I'm looking for a more Pythonic and faster way to do this.
So far I have tried np.fromfunction, with no luck.
Thanks
PS: This function f has a conditional. I can still consider approaches that do not support conditionals, but if the solution supports conditionals, that would be fantastic.
Example function, in case it's helpful:
def fun(a, b, c, d): return a + b + c + d if a == b else a * b * c * d
Also, why fromfunction failed is shown below:
>>> a = np.array([1,2,3,4,5])
>>> b = np.array([10,20,30])
>>> def fun(i,j): return a[i] * b[j]
>>> np.fromfunction(fun, (3,5))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Anaconda3\lib\site-packages\numpy\core\numeric.py", line 1853, in fromfunction
return function(*args, **kwargs)
File "<stdin>", line 1, in fun
IndexError: arrays used as indices must be of integer (or boolean) type
The reason the function fails is that np.fromfunction passes floating-point values, which are not valid as indices. You can modify your function like this to make it work:
def fun(i, j):
    return a[j.astype(int)] * b[i.astype(int)]

print(np.fromfunction(fun, (3, 5)))
[[ 10 20 30 40 50]
[ 20 40 60 80 100]
[ 30 60 90 120 150]]
Jake has explained why your fromfunction approach fails. However, you don't need fromfunction for your example. You could simply add an axis to b and have numpy broadcast the shapes:
a = np.array([1,2,3,4,5])
b = np.array([10,20,30])
def fun(i,j): return a[j.astype(int)] * b[i.astype(int)]
f1 = np.fromfunction(fun, (3, 5))
f2 = b[:, None] * a
(f1 == f2).all() # True
Extending this to the function you showed that contains an if condition, you could just split the if into two operations in sequence: first create an array from the else expression, then overwrite the relevant parts with the if expression.
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
c = np.array([100, 200, 300, 400, 500])
d = np.array([0, 1, 2, 3])
# Calculate the values at all indices as the product
result = d[:, None] * (a * b * c)
# array([[ 0, 0, 0, 0, 0],
# [ 500, 1600, 2700, 3200, 2500],
# [1000, 3200, 5400, 6400, 5000],
# [1500, 4800, 8100, 9600, 7500]])
# Calculate sum
sum_arr = d[:, None] + (a + b + c)
# array([[106, 206, 306, 406, 506],
# [107, 207, 307, 407, 507],
# [108, 208, 308, 408, 508],
# [109, 209, 309, 409, 509]])
# Set diagonal elements (i==j) to sum:
np.fill_diagonal(result, np.diag(sum_arr))
which gives the following result:
array([[ 106, 0, 0, 0, 0],
[ 500, 207, 2700, 3200, 2500],
[1000, 3200, 308, 6400, 5000],
[1500, 4800, 8100, 409, 7500]])
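If the condition isn't restricted to the diagonal (the question's literal condition is a == b, not i == j), a minimal sketch using np.where handles arbitrary elementwise conditions: evaluate both branches over the broadcast grid, then select per element. Note that both branches are computed everywhere, which is usually fine for cheap arithmetic.

import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
c = np.array([100, 200, 300, 400, 500])
d = np.array([0, 1, 2, 3])

# evaluate both branches on the full (len(d), len(a)) grid
prod = d[:, None] * (a * b * c)   # else branch
summ = d[:, None] + (a + b + c)   # if branch
# the condition broadcasts along the rows: True wherever a[i] == b[i]
result = np.where(a == b, summ, prod)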
I want to minimize the peak difference of list1[i] - list2[i] using the scipy.optimize.minimize method.
The elements in list1 and list2 are floats.
For example:
list1 = [50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61]
How do I minimize list1[i] - list2[i], given that I have two constraints:
1. list2[0] = list1[0]
2. list2[i+1] - list2[i] <= 1.5
Basically, two consecutive elements in list2 cannot be separated by more than 1.5, and the first element of list2 is the first element of list1.
Maybe there is a way other than scipy.optimize.minimize, but I don't know how to do this.
I think the optimum values for list2 might be:
list2 = [50, 50.5, 52, 53, 54.5, 55.5, 56, 57, 58.5, 60]
In this case the peak difference is 1.5.
But maybe the algorithm finds a more optimum solution where there is less difference between the elements of list1 and list2.
Here is what I have tried, but it failed:
import numpy as np
from scipy.optimize import minimize

list1 = [50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61]
list2 = [list1[0]]

# Define objective
def peakDifference(*args):
    global list2
    peak_error = []
    for list1_i, list2_i in zip(list1, list2):
        peak_error.append(list1_i - list2_i)
    return max(peak_error)

peak_error = peakDifference()

# Define constraints
def constraint1(*args):
    for x in range(len(list2) - 1):
        return list2[x+1] - list2[x] - 1.5

con1 = {'type': 'ineq', 'fun': constraint1}

# Optimize
sol = minimize(peakDifference, list2, constraints=con1)
Traceback (most recent call last):
  File "C:/Users/JumpStart/PycharmProjects/anglesimulation/venv/asdfgh.py", line 27, in <module>
    sol = minimize(peakDifference,list2, constraints=con1)
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\_minimize.py", line 625, in minimize
    return _minimize_slsqp(fun, x0, args, jac, bounds,
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\slsqp.py", line 412, in _minimize_slsqp
    a = _eval_con_normals(x, cons, la, n, m, meq, mieq)
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\slsqp.py", line 486, in _eval_con_normals
    a_ieq = vstack([con['jac'](x, *con['args'])
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\slsqp.py", line 486, in <listcomp>
    a_ieq = vstack([con['jac'](x, *con['args'])
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\slsqp.py", line 284, in cjac
    return approx_derivative(fun, x, method='2-point',
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\_numdiff.py", line 426, in approx_derivative
    return _dense_difference(fun_wrapped, x0, f0, h,
  File "C:\Users\JumpStart\anaconda3\lib\site-packages\scipy\optimize\_numdiff.py", line 497, in _dense_difference
    df = fun(x) - f0
TypeError: unsupported operand type(s) for -: 'NoneType' and 'NoneType'

Process finished with exit code 1
NLP model
Here is a "working" version of the NLP model. The model cannot be solved reliably this way, as it is non-differentiable.
import numpy as np
from scipy.optimize import minimize

list1 = np.array([50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61])
list1_0 = list1[0]
n = len(list1)

# our x variable will have one element less (first element is fixed)
list1x = np.delete(list1, 0)  # version with 1st element dropped
nx = len(list1x)

# objective function
# minimize the maximum difference
# Notes:
# - x excludes the first element (they are the same by definition)
# - this is non-differentiable, so likely to be non-optimal
def maxDifference(x):
    return np.max(np.abs(list1x - x))

# n-1 constraints
def constraint1(x):
    return 1.5 - np.diff(np.insert(x, 0, list1_0), 1)

con1 = {'type': 'ineq', 'fun': constraint1}

# Optimize
x = 55*np.ones(nx)  # initial value
sol = minimize(maxDifference, x, constraints=con1)
sol

# optimal is: x = [51.25,51.25,52.75,54.25,54.75,56.25,57.75,59.25,60.25]
# obj = 0.75
The result is:
fun: 5.0
jac: array([0., 0., 0., 0., 0., 0., 0., 0., 0.])
message: 'Optimization terminated successfully'
nfev: 20
nit: 2
njev: 2
status: 0
success: True
x: array([51.5, 53. , 54.5, 55. , 55. , 55. , 55. , 55. , 56. ])
This is non-optimal: the objective is 5 (instead of 0.75).
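A minimal sketch of a workaround, if you want to stay with scipy.optimize.minimize: the standard epigraph reformulation replaces the non-smooth max with an auxiliary variable t that is minimized subject to t >= ±(list1x - x), which keeps everything differentiable for SLSQP. This is my own addition, following the same variable setup as the NLP model above.

import numpy as np
from scipy.optimize import minimize

list1 = np.array([50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61])
list1x = list1[1:]   # first element of list2 is fixed to list1[0]
nx = len(list1x)

# decision vector v = [x_1, ..., x_{n-1}, t]; we minimize t
def objective(v):
    return v[-1]

cons = [
    # t >= list1x - x  and  t >= x - list1x  (together: t >= |list1x - x|)
    {'type': 'ineq', 'fun': lambda v: v[-1] - (list1x - v[:-1])},
    {'type': 'ineq', 'fun': lambda v: v[-1] - (v[:-1] - list1x)},
    # consecutive differences at most 1.5, including the fixed first element
    {'type': 'ineq', 'fun': lambda v: 1.5 - np.diff(np.insert(v[:-1], 0, list1[0]))},
]

v0 = np.append(55 * np.ones(nx), 1.0)  # initial guess
sol = minimize(objective, v0, constraints=cons)
print(sol.fun)     # should approach the LP optimum of 0.75
print(sol.x[:-1])  # optimal list2[1:]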
LP model
An LP model will find a proven optimal solution. That is much more reliable. E.g.:
import pulp as lp

list1 = [50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61]
n = len(list1)

model = lp.LpProblem("Min_difference", lp.LpMinimize)
x = lp.LpVariable.dicts("x", (i for i in range(n)))
z = lp.LpVariable("z")

# objective: minimize the largest absolute deviation
model += z

# constraints: z >= |x[i] - list1[i]| for all i
for i in range(n):
    model += z >= x[i] - list1[i]
    model += z >= list1[i] - x[i]
for i in range(n-1):
    model += x[i+1] - x[i] <= 1.5
model += x[0] == list1[0]

model.solve()
print(lp.LpStatus[model.status])
print("obj:", z.varValue)
print([x[i].varValue for i in range(n)])
This shows:
Optimal
obj: 0.75
[50.0, 51.25, 52.75, 53.75, 55.25, 56.25, 56.75, 57.75, 59.25, 60.25]
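The trick used here is the usual min-max linearization: z is minimized while constrained to sit above both x[i] - list1[i] and list1[i] - x[i]. If you prefer to avoid the PuLP dependency, here is a sketch of the same model with scipy.optimize.linprog (my translation of the model above, not part of the original answer):

import numpy as np
from scipy.optimize import linprog

list1 = np.array([50, 50.5, 52, 53, 55, 55.5, 56, 57, 60, 61])
n = len(list1)

# variables: x[0..n-1] and z; minimize z
c = np.zeros(n + 1)
c[-1] = 1.0

A_ub, b_ub = [], []
for i in range(n):
    # x[i] - z <= list1[i]   (i.e. z >= x[i] - list1[i])
    row = np.zeros(n + 1); row[i] = 1; row[-1] = -1
    A_ub.append(row); b_ub.append(list1[i])
    # -x[i] - z <= -list1[i] (i.e. z >= list1[i] - x[i])
    row = np.zeros(n + 1); row[i] = -1; row[-1] = -1
    A_ub.append(row); b_ub.append(-list1[i])
for i in range(n - 1):
    # x[i+1] - x[i] <= 1.5
    row = np.zeros(n + 1); row[i] = -1; row[i + 1] = 1
    A_ub.append(row); b_ub.append(1.5)

# x[0] == list1[0]
A_eq = np.zeros((1, n + 1)); A_eq[0, 0] = 1
b_eq = [list1[0]]

# all variables are free (linprog defaults to x >= 0)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(None, None)] * (n + 1))
print(res.fun)      # should match the PuLP objective of 0.75
print(res.x[:-1])   # optimal list2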
I keep running into this error when I try to predict based on a fitted model.
training, testing = train_test_split(gesture, test_size = 0.2, random_state = 0)
x = training.drop('CLASS', axis = 1) # remove the Class column from Training dataframe
y = testing.drop('CLASS', axis = 1) # remove the Class column from Testing dataframe
f_train = x.values.tolist()
l_train = training['CLASS'].values.tolist() # make a list of class identifiers from Training dataframe
f_test = y.values.tolist()
knn = KNeighborsRegressor(n_neighbors = 5)
knn.fit(f_train, l_train)
predictions = knn.predict(f_test)
The error occurs in the last line of the above code and the error message is given below:
Traceback (most recent call last):
File "C:\Users\Umair Khan\Dropbox\`Shift betweeen PCs\Work\EMG Hand Gesture\Codes\ML_on_CSV.py", line 39, in <module>
predictions = knn.predict(f_test)
File "C:\Users\Umair Khan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\neighbors\_regression.py", line 185, in predict
y_pred = np.mean(_y[neigh_ind], axis=1)
File "<__array_function__ internals>", line 6, in mean
File "C:\Users\Umair Khan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\numpy\core\fromnumeric.py", line 3335, in mean
out=out, **kwargs)
File "C:\Users\Umair Khan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\numpy\core\_methods.py", line 151, in _mean
ret = umr_sum(arr, axis, dtype, out, keepdims)
TypeError: cannot perform reduce with flexible type
f_test is a list of lists, like this: [[16, 30, 35, 250, -1, 0.5, 35, 0.03, 0.02], [16, 30, 35, 250, -1, 0.5, 35, 0.03, 0.02]].
I have also tried passing an array to predict(), but the issue still remains:
predictions = knn.predict(np.array(f_test).astype(np.float))
We need to see more of the error traceback, and info on the function inputs, particularly their shape and dtype.
I've seen this error message when working with structured arrays, but it's not obvious where those might arise in your code.
In [15]: np.ones((2,), dtype='i,i')
Out[15]: array([(1, 1), (1, 1)], dtype=[('f0', '<i4'), ('f1', '<i4')])
In [16]: np.sum(np.ones((2,), dtype='i,i'))
....
---> 87 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
88
89
TypeError: cannot perform reduce with flexible type
Solved: I changed the dtype of l_train from string to float and the error disappeared. f_train and f_test were already of type float.
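For reference, a minimal sketch of that fix, assuming the CLASS column holds numeric values stored as strings (reusing the variables from the question):

# cast the labels to float before fitting; the features were already float
l_train = training['CLASS'].astype(float).values.tolist()
knn.fit(f_train, l_train)
predictions = knn.predict(f_test)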
I'm trying to get to grips with scikit-learn for some simple machine learning projects, but I'm coming unstuck with Pipelines and wonder what I've done wrong...
I'm trying to work through a tutorial on Kaggle.
Here's my code:
import pandas as pd
from numpy import arange  # needed for the parameter grids below

train = pd.read_csv(local path to training data)
train_labels = pd.read_csv(local path to labels)

from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC
from sklearn.grid_search import GridSearchCV

pca = PCA()
clf = LinearSVC()

n_components = arange(1, 39)
loss = ['l1', 'l2']
penalty = ['l1', 'l2']
C = arange(0, 1, .1)
whiten = [True, False]
from sklearn.pipeline import Pipeline
#set up pipeline
pipe = Pipeline(steps=[('pca', pca), ('clf', clf)])
#set up GridsearchCV
estimator = GridSearchCV(pipe, dict(pca__n_components = n_components, pca__whiten = whiten,
                                    clf__loss = loss, clf__penalty = penalty, clf__C = C))
estimator
Returns:
GridSearchCV(cv=None,
estimator=Pipeline(steps=[('pca', PCA(copy=True, n_components=None, whiten=False)), ('clf', LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
intercept_scaling=1, loss='l2', multi_class='ovr', penalty='l2',
random_state=None, tol=0.0001, verbose=0))]),
fit_params={}, iid=True, loss_func=None, n_jobs=1,
param_grid={'clf__penalty': ['l1', 'l2'], 'clf__loss': ['l1', 'l2'], 'clf__C': array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]), 'pca__n_components': array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38]), 'pca__whiten': [True, False]},
pre_dispatch='2*n_jobs', refit=True, score_func=None, scoring=None,
verbose=0)
But when I try to train data:
estimator.fit(train, train_labels)
The error is:
428 for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
429 for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 430 label_test_folds = test_folds[y == label]
431 # the test split can be too big because we used
432 # KFold(max(c, self.n_folds), self.n_folds) instead of
IndexError: too many indices for array
Can anyone point me in the right direction?
It turns out that the Pandas dataframe is the wrong shape.
estimator.fit(train.values, train_labels[0].values)
works, although I also had to drop the penalty term.
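A likely explanation (an assumption on my part, not verified against the tutorial's data): the stratified CV splitter inside GridSearchCV expects the labels as a 1-D array, and a one-column DataFrame is 2-D, which triggers the IndexError. Flattening the labels explicitly would also work:

import numpy as np

print(train_labels.shape)  # e.g. (n, 1): two-dimensional
estimator.fit(train.values, np.ravel(train_labels.values))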
I am a newbie and just want to implement hierarchical agglomerative clustering for RGB images. For this I extract all the RGB values from an image and process the image. Next I compute the pairwise distances and then the linkage. Now, from the linkage, I want to extract my original data (i.e. the RGB values) at the specified indices, with their index ids. Here is the code I have so far.
from PIL import Image
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, inconsistent, fclusterdata, dendrogram

image = Image.open('image.jpg')
image = image.convert('RGB')
im = np.array(image).reshape((-1, 3))
rgb = list(image.getdata())
X = pdist(im)
Y = linkage(X)
I = inconsistent(Y)
Based on the 4th column of the inconsistency matrix, I pick the minimum value of the cutoff in order to get the maximum number of clusters.
cutoff = 0.7
cluster_assignments = fclusterdata(Y, cutoff)

# Print the indices of the data points in each cluster.
num_clusters = cluster_assignments.max()
print "%d clusters" % num_clusters
indices = cluster_indices(cluster_assignments)
ind = np.array(enumerate(rgb))
for k, ind in enumerate(indices):
    print "cluster", k + 1, "is", ind
dendrogram(Y)
I got results like this
cluster 6 is [ 6 11]
cluster 7 is [ 9 12]
cluster 8 is [15]
This means cluster 6 contains the indices of leaves 6 and 11. At this point I am stuck on how to map these indices back to the original data (i.e. the RGB values), and from the RGB values to each pixel in the image. Then I have to generate a codebook to implement agglomerative clustering. I have no idea how to approach this task. I have read a lot of material, but nothing gave me a clue.
Here is my solution:
import numpy as np
from scipy.cluster import hierarchy
im = np.array([[54,101,9],[ 67,89,27],[ 67,85,25],[ 55,106,1],[ 52,108,0],
[ 55,78,24],[ 19,57,8],[ 19,46,0],[ 95,110,15],[112,159,57],
[ 67,118,26],[ 76,127,35],[ 74,128,30],[ 25,62,0],[100,120,9],
[127,145,61],[ 48,112,25],[198,25,21],[203,11,10],[127,171,60],
[124,173,45],[120,133,19],[109,137,18],[ 60,85,0],[ 37,0,0],
[187,47,20],[127,170,52],[ 30,56,0]])
groups = hierarchy.fclusterdata(im, 0.7)             # cluster label for each RGB row
idx_sorted = np.argsort(groups)                      # indices that sort rows by cluster
group_sorted = groups[idx_sorted]
im_sorted = im[idx_sorted]                           # RGB rows reordered by cluster
split_idx = np.where(np.diff(group_sorted) != 0)[0] + 1  # boundaries between clusters
np.split(im_sorted, split_idx)
output:
[array([[203, 11, 10],
[198, 25, 21]]),
array([[187, 47, 20]]),
array([[127, 171, 60],
[127, 170, 52]]),
array([[124, 173, 45]]),
array([[112, 159, 57]]),
array([[127, 145, 61]]),
array([[25, 62, 0],
[30, 56, 0]]),
array([[19, 57, 8]]),
array([[19, 46, 0]]),
array([[109, 137, 18],
[120, 133, 19]]),
array([[100, 120, 9],
[ 95, 110, 15]]),
array([[67, 89, 27],
[67, 85, 25]]),
array([[55, 78, 24]]),
array([[ 52, 108, 0],
[ 55, 106, 1]]),
array([[ 54, 101, 9]]),
array([[60, 85, 0]]),
array([[ 74, 128, 30],
[ 76, 127, 35]]),
array([[ 67, 118, 26]]),
array([[ 48, 112, 25]]),
array([[37, 0, 0]])]
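To build the codebook mentioned in the question, here is a minimal sketch, assuming a codebook entry is simply the mean RGB value of its cluster (this continues from the groups and im variables above):

# mean RGB per cluster; each row is one codebook entry
num_clusters = groups.max()
codebook = np.array([im[groups == k].mean(axis=0)
                     for k in range(1, num_clusters + 1)])

# quantize: replace every point by its cluster's codebook entry
# (fclusterdata labels start at 1, hence the -1)
quantized = codebook[groups - 1]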