suggest_int() missing 1 required positional argument: 'high' error on Optuna - xgboost

I have the following Optuna code to do hyperparameter tuning for an XGBoost classifier.
import optuna
from optuna import Trial, visualization
from optuna.samplers import TPESampler
from xgboost import XGBClassifier
def objective(trial: Trial,X_train,y_train,X_test,y_test):
    param = {
        "n_estimators" : Trial.suggest_int("n_estimators", 0, 1000),
        'max_depth':Trial.suggest_int('max_depth', 2, 25),
        'reg_alpha':Trial.suggest_int('reg_alpha', 0, 5),
        'reg_lambda':Trial.suggest_int('reg_lambda', 0, 5),
        'min_child_weight':Trial.suggest_int('min_child_weight', 0, 5),
        'gamma':Trial.suggest_int('gamma', 0, 5),
        'learning_rate':Trial.suggest_loguniform('learning_rate',0.005,0.5),
        'colsample_bytree':Trial.suggest_discrete_uniform('colsample_bytree',0.1,1,0.01),
        'nthread' : -1
    }
    model = XGBClassifier(**param)
    model.fit(X_train,y_train)
    return cross_val_score(model,X_test,y_test).mean()
study = optuna.create_study(direction='maximize',sampler=TPESampler())
study.optimize(lambda trial : objective(trial,X_train,y_train,X_test,y_test),n_trials= 50)
It keeps giving me the following error:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\JaneStreet\lib\site-packages\optuna\_optimize.py", line 217, in _run_trial
value_or_values = func(trial)
File "<ipython-input-74-c1454daaa53e>", line 2, in <lambda>
study.optimize(lambda trial : objective(trial,X_train,y_train,X_test,y_test),n_trials= 50)
File "<ipython-input-73-4438e1db47ef>", line 4, in objective
"n_estimators" : Trial.suggest_int("n_estimators", 0, 1000),
TypeError: suggest_int() missing 1 required positional argument: 'high'
Thanks so much

The problem is that you are calling suggest_int on the class Trial as if it were a class/static method. suggest_int is a regular method and should be called on an object, in this case trial. Changing Trial.suggest_int to trial.suggest_int should get rid of the error.
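To see why the message complains about 'high' specifically, here is a minimal sketch in plain Python. FakeTrial is a hypothetical stand-in written for this illustration, not Optuna's real Trial class; it only mirrors the (name, low, high) signature. When the method is called on the class, the first argument is bound to self and every other argument shifts one position to the left, so nothing is left over for high.

```python
class FakeTrial:
    """Hypothetical stand-in for a trial object; not part of Optuna."""

    def suggest_int(self, name, low, high):
        # Return the midpoint just so the call yields something deterministic.
        return (low + high) // 2


# Called on the class: "n_estimators" binds to self, 0 to name, 1000 to low,
# and high receives nothing, which triggers the TypeError.
try:
    FakeTrial.suggest_int("n_estimators", 0, 1000)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Called on an instance: self is supplied automatically and all three
# arguments land where they are expected.
trial = FakeTrial()
value = trial.suggest_int("n_estimators", 0, 1000)

print(error_message)
print(value)
```

The same shift happens with the real class, which is why replacing Trial with the lowercase trial argument resolves the error.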

What about the version below? I only changed the signature of objective and replaced Trial with trial.
def objective(trial,X_train,y_train,X_test,y_test):
    param = {
        "n_estimators" : trial.suggest_int("n_estimators", 0, 1000),
        'max_depth':trial.suggest_int('max_depth', 2, 25),
        'reg_alpha':trial.suggest_int('reg_alpha', 0, 5),
        'reg_lambda':trial.suggest_int('reg_lambda', 0, 5),
        'min_child_weight':trial.suggest_int('min_child_weight', 0, 5),
        'gamma':trial.suggest_int('gamma', 0, 5),
        'learning_rate':trial.suggest_loguniform('learning_rate',0.005,0.5),
        'colsample_bytree':trial.suggest_discrete_uniform('colsample_bytree',0.1,1,0.01),
        'nthread' : -1
    }

"n_estimators" : trial.suggest_int("n_estimators", 0, 1000, 20) where
0 is the lower bound of the range,
1000 is the upper bound (inclusive), and
20 is the step between successive candidate values

Populating numpy array with most performance

I have a few arrays a, b, c and d as shown below and would like to populate a matrix by evaluating a function f(...) which consumes a, b, c and d.
With a nested for loop this is obviously possible, but I'm looking for a more pythonic and faster way to do this.
So far I have tried np.fromfunction with no luck.
Thanks
PS: This function f has a conditional. I can still consider approaches which do not support conditionals, but if the solution supports conditionals that would be fantastic.
Example function, in case it is helpful:
def fun(a,b,c,d): return a+b+c+d if a==b else a*b*c*d
Why fromfunction failed is shown below:
>>> a = np.array([1,2,3,4,5])
>>> b = np.array([10,20,30])
>>> def fun(i,j): return a[i] * b[j]
>>> np.fromfunction(fun, (3,5))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Anaconda3\lib\site-packages\numpy\core\numeric.py", line 1853, in fromfunction
return function(*args, **kwargs)
File "<stdin>", line 1, in fun
IndexError: arrays used as indices must be of integer (or boolean) type
The reason the function fails is that np.fromfunction passes floating-point values, which are not valid as indices. You can modify your function like this to make it work:
def fun(i,j):
    return a[j.astype(int)] * b[i.astype(int)]
print(np.fromfunction(fun, (3,5)))
[[ 10  20  30  40  50]
 [ 20  40  60  80 100]
 [ 30  60  90 120 150]]
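An alternative to casting inside the function is to ask np.fromfunction for integer index grids directly through its dtype keyword (the default is float). A minimal sketch, using the same a and b as in the question:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30])


def fun(i, j):
    # i and j arrive as integer index grids of shape (3, 5),
    # so they can be used for fancy indexing without astype.
    return a[j] * b[i]


# dtype=int makes the coordinate arrays integers instead of floats.
table = np.fromfunction(fun, (3, 5), dtype=int)
print(table)
```

Note that the function is still called once with whole index arrays, not once per cell, so it has to be written in terms of array operations either way.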
Jake has explained why your fromfunction approach fails. However, you don't need fromfunction for your example. You could simply add an axis to b and have numpy broadcast the shapes:
a = np.array([1,2,3,4,5])
b = np.array([10,20,30])
def fun(i,j): return a[j.astype(int)] * b[i.astype(int)]
f1 = np.fromfunction(fun, (3, 5))
f2 = b[:, None] * a
(f1 == f2).all() # True
Extending this to the function you showed that contains an if condition, you could just split the if into two operations in sequence: creating an array given by the if expression, and overwriting the relevant parts by the else expression.
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
c = np.array([100, 200, 300, 400, 500])
d = np.array([0, 1, 2, 3])
# Calculate the values at all indices as the product
result = d[:, None] * (a * b * c)
# array([[   0,    0,    0,    0,    0],
#        [ 500, 1600, 2700, 3200, 2500],
#        [1000, 3200, 5400, 6400, 5000],
#        [1500, 4800, 8100, 9600, 7500]])
# Calculate sum
sum_arr = d[:, None] + (a + b + c)
# array([[106, 206, 306, 406, 506],
#        [107, 207, 307, 407, 507],
#        [108, 208, 308, 408, 508],
#        [109, 209, 309, 409, 509]])
# Set diagonal elements (i==j) to sum:
np.fill_diagonal(result, np.diag(sum_arr))
which gives the following result:
array([[ 106,    0,    0,    0,    0],
       [ 500,  207, 2700, 3200, 2500],
       [1000, 3200,  308, 6400, 5000],
       [1500, 4800, 8100,  409, 7500]])
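The code above reads the condition as an index test (i == j). If the condition is instead meant on the values, as in the question's example function (a == b), np.where can pick between the two branches element by element. This is a minimal sketch under that assumption, reusing the same arrays; both branches are computed in full before the selection, which is fine for cheap arithmetic like this.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
c = np.array([100, 200, 300, 400, 500])
d = np.array([0, 1, 2, 3])

# Give d a trailing axis so it broadcasts against the length-5 arrays,
# producing a (4, 5) result.
d_col = d[:, None]

# Where a == b take the sum, elsewhere take the product.
result = np.where(a == b, d_col + a + b + c, d_col * a * b * c)
print(result)
```

Here only the third column satisfies a == b (3 == 3), so that column holds sums and every other column holds products.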

model.predict(sample) returning TypeError: cannot perform reduce with flexible type

I keep running into this error when I try to predict based on fitted model.
training, testing = train_test_split(gesture, test_size = 0.2, random_state = 0)
x = training.drop('CLASS', axis = 1) # remove the Class column from Training dataframe
y = testing.drop('CLASS', axis = 1) # remove the Class column from Testing dataframe
f_train = x.values.tolist()
l_train = training['CLASS'].values.tolist() # make a list of class identifiers from Training dataframe
f_test = y.values.tolist()
knn = KNeighborsRegressor(n_neighbors = 5)
knn.fit(f_train, l_train)
predictions = knn.predict(f_test)
The error occurs in the last line of the above code and the error message is given below:
Traceback (most recent call last):
File "C:\Users\Umair Khan\Dropbox\`Shift betweeen PCs\Work\EMG Hand Gesture\Codes\ML_on_CSV.py", line 39, in <module>
predictions = knn.predict(f_test)
File "C:\Users\Umair Khan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\neighbors\_regression.py", line 185, in predict
y_pred = np.mean(_y[neigh_ind], axis=1)
File "<__array_function__ internals>", line 6, in mean
File "C:\Users\Umair Khan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\numpy\core\fromnumeric.py", line 3335, in mean
out=out, **kwargs)
File "C:\Users\Umair Khan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\numpy\core\_methods.py", line 151, in _mean
ret = umr_sum(arr, axis, dtype, out, keepdims)
TypeError: cannot perform reduce with flexible type
f_test is a list of lists like such [[16, 30, 35, 250, -1, 0.5, 35, 0.03, 0.02], [16, 30, 35, 250, -1, 0.5, 35, 0.03, 0.02]]
I have also tried passing an array in predict(sample) but the issue still remains.
predictions = knn.predict(np.array(f_test).astype(np.float))
We need to see more of the error traceback. And info on the function inputs, particularly shape and dtype.
I've seen this error message when working with structured arrays. But it's not obvious where those might arise in your code.
In [15]: np.ones((2,), dtype='i,i')
Out[15]: array([(1, 1), (1, 1)], dtype=[('f0', '<i4'), ('f1', '<i4')])
In [16]: np.sum(np.ones((2,), dtype='i,i'))
....
---> 87 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
88
89
TypeError: cannot perform reduce with flexible type
Solved:
I changed the dtype of l_train from string to float and the error disappeared. f_train and f_test were already of type float.
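The failure can be reproduced without scikit-learn, because the traceback ends in np.mean being applied to the stored training labels. A minimal sketch, assuming the labels were read in as strings (the values below are made up for illustration); the exact wording of the TypeError may differ between NumPy versions.

```python
import numpy as np

# Hypothetical class labels that were loaded as text rather than numbers.
labels = np.array(["1", "2", "2", "3"])
print(labels.dtype)  # a Unicode string dtype, i.e. a "flexible" type

# Averaging strings is not defined, so the reduction raises TypeError.
try:
    np.mean(labels)
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

# Converting the labels to float before fitting avoids the problem.
numeric_labels = labels.astype(float)
mean_label = np.mean(numeric_labels)
print(mean_label)
```

If the labels are genuinely categorical rather than numeric, a classifier may be a better fit than a regressor, since a regressor has to average the neighbours' targets.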

How to implement ufunc 'matmul' using Numpy-C API?

I am trying to implement a custom datatype with built-in support for numpy using the NumPy C API.
My code is based on the quaternion implementation at https://github.com/moble/quaternion.
Currently that project supports ufuncs such as 'add', 'multiply', 'divide' and 'power'.
>>> a = np.random.rand(2, 2).astype(np.quaternion)
>>> a
array([[quaternion(0.135325974754691, 0, 0, 0),
quaternion(0.536005227432872, 0, 0, 0)],
[quaternion(0.86691238892986, 0, 0, 0),
quaternion(0.306838076780839, 0, 0, 0)]], dtype=quaternion)
>>> a * a # multiply per element
array([[quaternion(0.0183131194433073, 0, 0, 0),
quaternion(0.287301603835365, 0, 0, 0)],
[quaternion(0.751537090080076, 0, 0, 0),
quaternion(0.0941496053625638, 0, 0, 0)]], dtype=quaternion)
But I have found it really difficult to support matmul (@) using the NumPy C API, and the quaternion project does not support the matmul ufunc:
>>> a @ a
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: ufunc 'matmul' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Is there any demo code or documentation that can help me write support for np.matmul?

matplotlib error: ValueError: x and y must have same first dimension

I am trying to graph two lists with matplotlib but I am getting an error regarding the dimension of x and y. One of the lists contains dates and the other numbers, you can see the content of the lists, I have printed them below.
I have tried checking the length of the lists with len() and they seem to be equal, so I am a bit lost. I have checked several threads on this error without much luck.
Note: "query" contains my SQL query which I have not included for simplicity.
My code:
t = 0
for row in query:
    data = query[t]
    date.append(data[0])
    close.append(data[1])
    t = t + 1
print "date = ", date
print "close = ", close
print "date length = ", len(date)
print "close length = ", len(close)
def plot2():
    plt.plot(date, close)
    plt.show()
plot2()
Output of my script:
date = [datetime.datetime(2010, 1, 31, 22, 0), datetime.datetime(2010, 1, 31, 22, 1), datetime.datetime(2010, 1, 31, 22, 2), datetime.datetime(2010, 1, 31, 22, 3), datetime.datetime(2010, 1, 31, 22, 4), datetime.datetime(2010, 1, 31, 22, 5), datetime.datetime(2010, 1, 31, 22, 6), datetime.datetime(2010, 1, 31, 22, 7), datetime.datetime(2010, 1, 31, 22, 8), datetime.datetime(2010, 1, 31, 22, 9), datetime.datetime(2010, 1, 31, 22, 10)]
close = [1.5945, 1.5946, 1.59465, 1.59505, 1.59525, 1.59425, 1.5938, 1.59425, 1.59425, 1.5939, 1.5939]
date length = 11
close length = 11
Traceback (most recent call last):
File "script.py", line 234, in <module>
plot2()
File "script.py", line 231, in plot2
plt.plot(date, close)
File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 2467, in plot
ret = ax.plot(*args, **kwargs)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 3893, in plot
for line in self._get_lines(*args, **kwargs):
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 322, in _grab_next_args
for seg in self._plot_args(remaining, kwargs):
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 300, in _plot_args
x, y = self._xy_from_xy(x, y)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 240, in _xy_from_xy
raise ValueError("x and y must have same first dimension")
ValueError: x and y must have same first dimension
Thanks in advance.
Works for me with your data.
Change your code and put the print statements inside the function.
def plot2():
    print "date = ", date
    print "close = ", close
    print "date length = ", len(date)
    print "close length = ", len(close)
    plt.plot(date, close)
    plt.show()
There must be something happening that your code does not show.
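The point of moving the checks inside the function is that the lists are inspected at the moment they are plotted, not at some earlier point in the script. A minimal sketch of that idea follows; check_xy is a hypothetical helper introduced here, the sample values are taken from the question, and the extra append only simulates the lists drifting apart between the early print and the plot call.

```python
import datetime


def check_xy(x, y):
    """Hypothetical helper: fail loudly if x and y differ in length."""
    if len(x) != len(y):
        raise ValueError("x has %d items but y has %d" % (len(x), len(y)))
    return len(x)


date = [datetime.datetime(2010, 1, 31, 22, m) for m in range(11)]
close = [1.5945, 1.5946, 1.59465, 1.59505, 1.59525, 1.59425,
         1.5938, 1.59425, 1.59425, 1.5939, 1.5939]

# Run the check immediately before plt.plot(date, close).
n_points = check_xy(date, close)

# Simulate other code changing one of the module-level lists afterwards.
date.append(datetime.datetime(2010, 1, 31, 22, 11))
try:
    check_xy(date, close)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Passing the two lists to the plotting function as arguments, rather than reading them from module-level names, also makes this kind of drift harder to introduce.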

how to vectorize an operation on a 1 dimensional array to produce 2 dimensional matrix in numpy

I have a 1d array of values
i = np.arange(0,7,1)
and a function
# Returns a column matrix
def fn(i):
    return np.matrix([[i*2,i*3]]).T
fnv = np.vectorize(fn)
then writing
fnv(i)
gives me an error
File "<stdin>", line 1, in <module>
File "c:\Python33\lib\site-packages\numpy\lib\function_base.py",
line 1872, in __call__
return self._vectorize_call(func=func, args=vargs)
File "c:\Python33\lib\site-packages\numpy\lib\function_base.py",
line 1942, in _vectorize_call
copy=False, subok=True, dtype=otypes[0])
ValueError: setting an array element with a sequence.
The result I am looking for is a matrix with two rows and as many columns as in the input array. What is the best notation in numpy to achieve this?
For example i would equal
[1,2,3,4,5,6]
and the output would equal
[[2,4,6,8,10,12],
[3,6,9,12,15,18]]
EDIT
You should try to avoid using vectorize, because it gives the illusion of numpy efficiency, but inside it's all python loops.
If you really have to deal with user supplied functions that take ints and return a matrix of shape (2, 1) then there probably isn't much you can do. But that seems like a really weird use case. If you can replace that with a list of functions that take an int and return an int, and that use ufuncs when needed, i.e. np.sin instead of math.sin, you can do the following
def vectorize2(funcs) :
    def fnv(arr) :
        return np.vstack([f(arr) for f in funcs])
    return fnv
f2 = vectorize2((lambda x : 2 * x, lambda x : 3 * x))
>>> f2(np.arange(10))
array([[ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18],
[ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27]])
Just for your reference, I have timed this vectorization against your proposed one:
f = vectorize(fn)
>>> timeit.timeit('f(np.arange(10))', 'from __main__ import np, f', number=1000)
0.28073329263679625
>>> timeit.timeit('f2(np.arange(10))', 'from __main__ import np, f2', number=1000)
0.023139129945661807
>>> timeit.timeit('f(np.arange(10000))', 'from __main__ import np, f', number=10)
2.3620706288432984
>>> timeit.timeit('f2(np.arange(10000))', 'from __main__ import np, f2', number=10)
0.002757072593169596
So the speed-up is about an order of magnitude even for small arrays, and it grows to roughly x1000 for larger arrays, available almost for free.
ORIGINAL ANSWER
Don't use vectorize unless there is no way around it, because it is slow. See the following examples:
>>> a = np.array(range(7))
>>> a
array([0, 1, 2, 3, 4, 5, 6])
>>> np.vstack((a, a+1))
array([[0, 1, 2, 3, 4, 5, 6],
[1, 2, 3, 4, 5, 6, 7]])
>>> np.vstack((a, a**2))
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 0, 1, 4, 9, 16, 25, 36]])
Whatever your function is, if it can be constructed with numpy's ufuncs, you can do something like np.vstack((a, f(a))) and get what you want.
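For the specific function in the question, where each output row is the input scaled by a constant, broadcasting alone is enough. A minimal sketch, assuming the input 1 to 6 from the question's example; the name factors is introduced here for illustration.

```python
import numpy as np

i = np.arange(1, 7)           # [1, 2, 3, 4, 5, 6]
factors = np.array([2, 3])    # one multiplier per output row

# factors[:, None] has shape (2, 1); multiplying by the length-6 array
# broadcasts to a (2, 6) result without any Python-level loop.
out = factors[:, None] * i
print(out)
```

np.outer(factors, i) produces the same array and may read more naturally if you think of the operation as an outer product.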
A simple reimplementation of vectorize gives me what I want:
def vectorize(fn):
    def do_it(array):
        # column_stack needs a sequence, so build a list rather than a generator
        return np.column_stack([fn(p) for p in array])
    return do_it
If this is not performant or there is a better way, let me know.