I have a pd.multiindex which looks like this:
However, when I run check_raise(df_train, mtype="pd-multiindex")
I get the following error:
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/sktime/datatypes/_check.py:252, in check_raise(obj, mtype, scitype, var_name)
250 return True
251 else:
--> 252 raise TypeError(msg)
TypeError: input.loc[i] must be Series of mtype pd.DataFrame, not at i=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
I believe this means I am meant to convert each row into a pandas Series, but I am not sure whether this is correct.
Any help would be appreciated.
I had a similar issue. Check whether your index has duplicate keys; in your case:
df_train.reset_index(['sbj', 'system_time_stamp'])[['sbj', 'system_time_stamp']].duplicated(keep=False)
Removing the duplicated index entries worked for me.
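A minimal sketch of the same check done directly on the index, assuming a hypothetical df_train whose MultiIndex has the levels sbj and system_time_stamp. Keeping the first row per key is only one possible policy; averaging the duplicates would be another.

```python
import pandas as pd

# Hypothetical data: subject "a" has two rows sharing the timestamp 1.
index = pd.MultiIndex.from_tuples(
    [("a", 0), ("a", 1), ("a", 1), ("b", 0), ("b", 1)],
    names=["sbj", "system_time_stamp"],
)
df_train = pd.DataFrame({"value": [1.0, 2.0, 2.5, 3.0, 4.0]}, index=index)

# True for every row whose (sbj, system_time_stamp) key appears more than once.
dup_mask = df_train.index.duplicated(keep=False)
print(df_train[dup_mask])

# Keep the first row for each key and drop the later duplicates.
df_dedup = df_train[~df_train.index.duplicated(keep="first")]
print(df_dedup.index.is_unique)  # True
```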
Suppose I have a 1d array arr. I want to slice this array at every nth element but beyond the normal array bounds. An example is below:
import numpy as np
arr = np.arange(20)
n = 5
print(arr[0:n*20:n]) # not how it works just what I thought it would be like
#Desired output
#[0, 5, 10, 15, 0, 5, 10, 15, 0, 5, 10, 15, 0, 5, 10, 15, 0, 5, 10, 15]
#Actual output
#[0, 5, 10, 15]
How do I go beyond the regular bounds of the array such that the new index is basically just index%arr.size?
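One way to get the wrap-around behaviour is to build the out-of-range indices explicitly and reduce them modulo arr.size, exactly as the question suggests. A sketch, assuming 20 output elements as in the desired output; np.take with mode="wrap" performs the same modulo internally.

```python
import numpy as np

arr = np.arange(20)
n = 5
count = 20  # number of output elements, matching the question's n*20 stop

# "Virtual" indices 0, n, 2n, ..., wrapped back into range with modulo.
idx = np.arange(0, n * count, n) % arr.size
result = arr[idx]

# np.take can do the wrapping itself.
result_take = np.take(arr, np.arange(0, n * count, n), mode="wrap")
print(result)
```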
I have the following Optuna code to do hyperparameter tuning for an XGBoost classifier.
import optuna
from optuna import Trial, visualization
from optuna.samplers import TPESampler
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier
def objective(trial: Trial,X_train,y_train,X_test,y_test):
    param = {
        "n_estimators" : Trial.suggest_int("n_estimators", 0, 1000),
        'max_depth':Trial.suggest_int('max_depth', 2, 25),
        'reg_alpha':Trial.suggest_int('reg_alpha', 0, 5),
        'reg_lambda':Trial.suggest_int('reg_lambda', 0, 5),
        'min_child_weight':Trial.suggest_int('min_child_weight', 0, 5),
        'gamma':Trial.suggest_int('gamma', 0, 5),
        'learning_rate':Trial.suggest_loguniform('learning_rate',0.005,0.5),
        'colsample_bytree':Trial.suggest_discrete_uniform('colsample_bytree',0.1,1,0.01),
        'nthread' : -1
    }
    model = XGBClassifier(**param)
    model.fit(X_train,y_train)
    return cross_val_score(model,X_test,y_test).mean()
study = optuna.create_study(direction='maximize',sampler=TPESampler())
study.optimize(lambda trial : objective(trial,X_train,y_train,X_test,y_test),n_trials= 50)
It keeps giving me the following error:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\JaneStreet\lib\site-packages\optuna\_optimize.py", line 217, in _run_trial
value_or_values = func(trial)
File "<ipython-input-74-c1454daaa53e>", line 2, in <lambda>
study.optimize(lambda trial : objective(trial,X_train,y_train,X_test,y_test),n_trials= 50)
File "<ipython-input-73-4438e1db47ef>", line 4, in objective
"n_estimators" : Trial.suggest_int("n_estimators", 0, 1000),
TypeError: suggest_int() missing 1 required positional argument: 'high'
Thanks so much
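The problem is that you are calling suggest_int on the class Trial as if it were a class or static method. suggest_int is a regular instance method, so it must be called on an object, in this case the trial argument passed to objective. When it is called on the class, "n_estimators" is bound to self, 0 to name and 1000 to low, which leaves high missing. Changing Trial.suggest_int to trial.suggest_int should get rid of the error.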
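The argument shift can be reproduced without Optuna. The sketch below uses a hypothetical FakeTrial class whose suggest_int merely mirrors the name, low, high parameters (it does not sample anything); only the method-binding behaviour is the point.

```python
class FakeTrial:
    """Hypothetical stand-in for optuna's Trial; only the method shape matters."""

    def suggest_int(self, name, low, high):
        return low  # placeholder; the real method samples a value


error_message = ""
try:
    # Called on the class: "n_estimators" -> self, 0 -> name, 1000 -> low.
    FakeTrial.suggest_int("n_estimators", 0, 1000)
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # ... missing 1 required positional argument: 'high'

# Called on an instance, self is supplied automatically and the arguments line up.
trial = FakeTrial()
value = trial.suggest_int("n_estimators", 0, 1000)
```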
What about the version below? I removed the Trial type annotation from the objective's parameters and changed Trial to trial.
def objective(trial,X_train,y_train,X_test,y_test):
    param = {
        "n_estimators" : trial.suggest_int("n_estimators", 0, 1000),
        'max_depth':trial.suggest_int('max_depth', 2, 25),
        'reg_alpha':trial.suggest_int('reg_alpha', 0, 5),
        'reg_lambda':trial.suggest_int('reg_lambda', 0, 5),
        'min_child_weight':trial.suggest_int('min_child_weight', 0, 5),
        'gamma':trial.suggest_int('gamma', 0, 5),
        'learning_rate':trial.suggest_loguniform('learning_rate',0.005,0.5),
        'colsample_bytree':trial.suggest_discrete_uniform('colsample_bytree',0.1,1,0.01),
        'nthread' : -1
    }
    model = XGBClassifier(**param)
    model.fit(X_train,y_train)
    return cross_val_score(model,X_test,y_test).mean()
To sample in fixed steps, pass a step size:
"n_estimators" : trial.suggest_int("n_estimators", 0, 1000, step=20)
where 0 is the lower bound, 1000 is the upper bound (inclusive), and 20 is the step between candidate values.
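To see which values that search space contains, the grid can be enumerated with plain Python. This assumes both bounds are inclusive and that (high - low) is a multiple of step, which holds here; Optuna itself is not needed for the illustration.

```python
# Candidate values for suggest_int("n_estimators", 0, 1000, step=20),
# assuming inclusive bounds and (high - low) divisible by step.
low, high, step = 0, 1000, 20
candidates = list(range(low, high + 1, step))
print(len(candidates), candidates[:3], candidates[-1])  # 51 [0, 20, 40] 1000
```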
I have a frequency analysis of words said in episodes of my favorite show. I'm making a horizontal bar chart with plot.barh(s1e1_y, s1e1_x), but it is sorted by word instead of by value.
The two lists are:
>>> s1e1_y
['know', 'go', 'now', 'here', 'gonna', 'can', 'them', 'think', 'come', 'time', 'got', 'elliot', 'talk', 'out', 'night', 'been', 'then', 'need', 'world', "what's"]
>>> s1e1_x
[42, 30, 26, 25, 24, 22, 20, 19, 19, 18, 18, 18, 17, 17, 15, 15, 14, 14, 13, 13]
When the plot is drawn, the y-axis ticks are sorted alphabetically, even though the lists passed to barh are in descending order of count:
import matplotlib.pyplot as plot
s1e1_wordlist = []
s1e1_count = []
for word, count in s1e01:
    if word[:-1] not in excluded_words:
        s1e1_wordlist.append(word[:-1])
        s1e1_count.append(int(count))
s1e1_sorted = sorted(list(sorted(zip(s1e1_count, s1e1_wordlist))),
                     reverse=True)
s1e1_20 = []
for i in range(0,20):
    s1e1_20.append(s1e1_sorted[i])
s1e1_x = []
s1e1_y = []
for count, word in s1e1_20:
    s1e1_x.append(word)
    s1e1_y.append(count)
plot.figure(1, figsize=(20,20))
plot.subplot(341)
plot.title('Season1 : Episode 1')
plot.tick_params(axis='y',labelsize=8)
plot.barh(s1e1_x, s1e1_y)
From matplotlib 2.1 on you can plot categorical variables, which allows calls like plt.bar(["apple","cherry","banana"], [1,2,3]). However, in matplotlib 2.1 the output is sorted by category, i.e. alphabetically. This was considered a bug and was changed in matplotlib 2.2 (see this PR).
In matplotlib 2.2 and later, the bar plot therefore preserves the order of the input.
In matplotlib 2.1, you would instead plot the data as numeric data, as in any version prior to 2.1: plot the numbers against their index and set the tick labels accordingly.
import matplotlib.pyplot as plt
w = ['know', 'go', 'now', 'here', 'gonna', 'can', 'them', 'think', 'come',
     'time', 'got', 'elliot', 'talk', 'out', 'night', 'been', 'then', 'need',
     'world', "what's"]
n = [42, 30, 26, 25, 24, 22, 20, 19, 19, 18, 18, 18, 17, 17, 15, 15, 14, 14, 13, 13]
plt.barh(range(len(w)), n)
plt.yticks(range(len(w)), w)
plt.show()
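With matplotlib 2.2 or later, the categorical call can be used directly and keeps the given order. A minimal sketch with the same lists, assuming a recent matplotlib and the non-interactive Agg backend; barh draws the first category at the bottom, so invert_yaxis is an optional extra that puts the largest count at the top.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

w = ['know', 'go', 'now', 'here', 'gonna', 'can', 'them', 'think', 'come',
     'time', 'got', 'elliot', 'talk', 'out', 'night', 'been', 'then', 'need',
     'world', "what's"]
n = [42, 30, 26, 25, 24, 22, 20, 19, 19, 18, 18, 18, 17, 17, 15, 15, 14, 14, 13, 13]

fig, ax = plt.subplots()
ax.barh(w, n)      # categories are placed in input order (matplotlib >= 2.2)
ax.invert_yaxis()  # optional: first (largest) entry at the top
fig.canvas.draw()

# Tick labels follow the input order, not the alphabet.
labels = [t.get_text() for t in ax.get_yticklabels()]
print(labels[:3])
```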
OK, your example contains a lot of code that isn't relevant to the problem as you've described it, but assuming you don't want the y axis sorted alphabetically, you can zip your two lists into a DataFrame and then plot the DataFrame as follows:
import pandas as pd
df = pd.DataFrame(list(zip(s1e1_y,s1e1_x))).set_index(1)
df.plot.barh()
This then produces the following
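A close variant swaps the DataFrame for a pandas Series indexed by word; the bars then follow the index order rather than the alphabet. This sketch assumes a shortened sample of the question's data and the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

words = ['know', 'go', 'now', 'here', 'gonna']  # shortened sample from the question
counts = [42, 30, 26, 25, 24]

s = pd.Series(counts, index=words)
ax = s.plot.barh()  # one bar per index entry, in index order
ax.invert_yaxis()   # optional: largest count at the top
```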
I am using version 1.5.1 of numpy and Python 2.6.6.
I am reading a binary file into a numpy array:
>>> dt = np.dtype('<u4,<i2,<i2,<i2,<i2,<i2,<i2,<i2,<i2,u1,u1,u1,u1')
>>> file_data = np.fromfile(os.path.join(folder,f), dtype=dt)
This works just fine. Examining the result:
>>> type(file_data)
<type 'numpy.ndarray'>
>>> file_data
array([(3571121L, -54, 103, 1, 50, 48, 469, 588, -10, 0, 102, 0, 0),
(3571122L, -78, 20, 25, 45, 44, 495, 397, -211, 0, 102, 0, 0),
(3571123L, -69, -48, 23, 60, 19, 317, -26, -151, 0, 102, 0, 0), ...,
(3691138L, -53, 52, -2, -11, 76, 988, 288, -101, 1, 102, 0, 0),
(3691139L, -11, 21, -27, 25, 47, 986, 253, 176, 1, 102, 0, 0),
(3691140L, -30, -19, -63, 59, 12, 729, 23, 302, 1, 102, 0, 0)],
dtype=[('f0', '<u4'), ('f1', '<i2'), ('f2', '<i2'), ... , ('f12', '|u1')])
>>> file_data[0]
(3571121L, -54, 103, 1, 50, 48, 469, 588, -10, 0, 102, 0, 0)
>>> file_data[0][0]
3571121
>>> len(file_data)
120020
When I try to slice the first column:
>>> file_data[:,0]
I get:
IndexError: invalid index.
I have looked at simple examples and was able to do the slicing:
>>> a = np.array([(1,2,3),(4,5,6)])
>>> a[:,0]
array([1, 4])
The only difference I can see between my case and the simple example is that I am using a dtype. What am I doing wrong?
When you set the dtype like that, you are creating a structured array. NumPy treats it as a 1D array whose elements are records of your dtype. There's a fundamental difference between
file_data[0][0]
and
file_data[0,0]
In the first, you are asking for the first element of a 1D array and then retrieving the first field of that returned record. In the second, you are asking for the element in row 0, column 0 of a 2D array. Since file_data has only one dimension, the second index (and likewise the second index in file_data[:,0]) is invalid. That's why you are getting the IndexError.
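For a structured array, the "first column" is the first field, and fields are selected by name. A minimal sketch using the question's dtype and two rows copied from its output; the default field names f0 to f12 come from the comma-separated dtype string, as shown in the printed dtype above.

```python
import numpy as np

# Same field layout as the question; two records copied from its output.
dt = np.dtype('<u4,<i2,<i2,<i2,<i2,<i2,<i2,<i2,<i2,u1,u1,u1,u1')
file_data = np.array(
    [(3571121, -54, 103, 1, 50, 48, 469, 588, -10, 0, 102, 0, 0),
     (3571122, -78, 20, 25, 45, 44, 495, 397, -211, 0, 102, 0, 0)],
    dtype=dt,
)

print(file_data.ndim)  # 1: one element per record

# "Column" access on a structured array is by field name.
first_column = file_data['f0']
print(first_column)
```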
If you want to access an individual element using 2D notation, you can create a view and work with that. Unfortunately, AFAIK if you want to treat your object like a 2D array, all fields have to have the same dtype.
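A sketch of such a view, assuming a hypothetical structured array whose fields all share one dtype ('<i2'): the same memory is reinterpreted as plain int16 values and reshaped to one row per record, so no data is copied. Newer NumPy versions (1.16 and later, not the 1.5.1 used in the question) also provide numpy.lib.recfunctions.structured_to_unstructured for this.

```python
import numpy as np

# Hypothetical structured array where every field is '<i2'.
a = np.array([(1, 2, 3), (4, 5, 6)], dtype='<i2,<i2,<i2')

# Reinterpret the records as int16 values, then give them two dimensions.
a2d = a.view('<i2').reshape(len(a), -1)
print(a2d[:, 0])  # [1 4]

# a2d shares memory with a, so writes through the view change a as well.
```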