I have a problem with the polyfit function. My data is:
value_to_cycle_slip_x_1 = [0.0, 30.0, 60.0, 90.0, 120.0, 150.0, 180.0, 210.0, 240.0, 270.0]
value_to_cycle_slip_y_1 = [1.4108499772846699, 1.410405956208706, 1.4104186482727528, 1.4109007231891155, 1.4058293923735619, 1.4069204106926918, 1.4082905240356922, 1.4050713926553726, 1.405217282474041, 1.4059784598648548]
And my function call is:
a_coef_cycle_slip, b_coef_cycle_slip, c_coef_cycle_slip = polyfit(value_to_cycle_slip_x_1, value_to_cycle_slip_y_1, 2)
When I run it in the Python console everything is OK, but when I run it in my script (executable) I get an error:
"numpy.linalg.LinAlgError: SVD did not converge in Linear Least Squares".
I also tried it on three different computers. On two of them (laptops) it works normally, but on a desktop PC it does not work.
Has anyone already encountered this problem and knows how to solve it?
I have experienced the same issue. I've found that after excluding rows of zeroes, the error message disappears.
df = df[df.column > 0]
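The same filtering idea can be sketched directly on the arrays from the question. This is only a minimal sketch: the helper name safe_polyfit is hypothetical, and it assumes the failure comes from non-finite values (NaN or inf) reaching the fit, which is a frequently reported trigger for this error but may not be the cause on the failing machine.

```python
import numpy as np

x = [0.0, 30.0, 60.0, 90.0, 120.0, 150.0, 180.0, 210.0, 240.0, 270.0]
y = [1.4108499772846699, 1.410405956208706, 1.4104186482727528,
     1.4109007231891155, 1.4058293923735619, 1.4069204106926918,
     1.4082905240356922, 1.4050713926553726, 1.405217282474041,
     1.4059784598648548]


def safe_polyfit(x, y, deg):
    """Fit a polynomial after dropping samples that are NaN or infinite.

    Hypothetical helper, not part of NumPy.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.isfinite(x) & np.isfinite(y)
    if mask.sum() <= deg:
        raise ValueError("not enough finite points for the requested degree")
    return np.polyfit(x[mask], y[mask], deg)


a_coef, b_coef, c_coef = safe_polyfit(x, y, 2)
print(a_coef, b_coef, c_coef)
```

If the script still fails with clean input, comparing the NumPy versions between the working and failing machines would be a reasonable next step.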
I ran into some problems while practicing with xgboost.
As far as I know, the "DMatrix" is a special internal data structure that makes the model run faster.
Here's the problem:
To tune the model, (I guess) GridSearchCV or RandomizedSearchCV are worth considering.
With the code below:
params = {
    'min_child_weight': [1, 5, 10],
    'gamma': [0.5, 1, 1.5, 2, 5],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'max_depth': [3, 4, 5]
}
random_search = RandomizedSearchCV(xgb, param_distributions=params, n_iter=param_comb, scoring='roc_auc', n_jobs=4, cv=skf.split(X,Y), verbose=3, random_state=1001)
I can also do the cross validation by passing cv. That was great.
However, it really takes time (almost 40 minutes with big data and a Colab GPU) and I really want to speed it up.
After I transform my training data to a DMatrix:
xgbtrain = xgb.DMatrix(train_x, train_y)
I don't know what to do next because .fit requires X and y.
How can I do that? Or is there any other way to make it faster?
Thanks
This question is pretty old, so I suspect you may have already found an answer. XGBoost's different options can be tricky to navigate when incorporating CV or parameter tuning.
Instead of using xgb.fit() you can use xgb.train() to utilize the DMatrix object. Additionally, XGB has xgb.cv() for performing a cross validation. I myself am hoping to find an alternative to GridSearchCV, but I don't think there is one. The best method may be to create a loop of xgb.cv() to compare evaluation results and identify the best performing parameters.
XGB has really helpful documentation; you may want to check out XGB Python Intro: Training and Cross Validation Demo
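The loop of xgb.cv() calls described above could look roughly like the sketch below. The function names random_search and xgb_cv_auc, the binary:logistic objective, the round counts, and the early-stopping setting are all assumptions made for illustration. The search loop itself is plain Python, and xgboost is imported only inside the scoring function.

```python
import random


def random_search(score_fn, grid, n_iter, seed=0):
    """Score n_iter randomly drawn parameter sets; return (best_score, best_params).

    The same combination may be drawn more than once; that is harmless here.
    """
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_iter):
        # Draw one candidate value per parameter from the grid.
        params = {name: rng.choice(values) for name, values in grid.items()}
        score = score_fn(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


def xgb_cv_auc(params, dtrain, num_boost_round=200, nfold=5):
    """Score one parameter set with xgb.cv and return the best mean test AUC.

    Assumes a binary classification task; adjust objective and metric otherwise.
    """
    import xgboost as xgb  # imported lazily so the loop above has no hard dependency

    full_params = {"objective": "binary:logistic", **params}
    results = xgb.cv(
        full_params,
        dtrain,
        num_boost_round=num_boost_round,
        nfold=nfold,
        stratified=True,
        metrics="auc",
        early_stopping_rounds=20,
        seed=1001,
    )
    # xgb.cv returns a DataFrame with one row per boosting round.
    return results["test-auc-mean"].max()
```

With the params dictionary and the xgbtrain DMatrix from the question, a call would be something like random_search(lambda p: xgb_cv_auc(p, xgbtrain), params, n_iter=5). Because the DMatrix is built once and reused for every candidate, the data conversion cost is not paid again on each iteration.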
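Try Optuna for hyperparameter tuning of XGBoost; it can be much faster. Also use the GPU (tree_method = gpu_hist; in XGBoost 2.0 and later this is written as tree_method = hist together with device = cuda). Kaggle offers free GPU time every week.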
I'm working on a multilabel classification problem, using Keras, scikit-learn, etc.
My dataframe contains 4000 microscopic oil samples, with images and 13 different labels indicating which problems were found in those samples.
Currently I convert all images and labels to NumPy arrays.
Example of one labeled image:
[ 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0 ]
In this label, a position equal to 1 means the current sample has a specific problem, such as particles in the oil, and as you can see, a sample can have more than one output.
The problem is that my dataframe is imbalanced and I need to apply the class weight method, but first, looking at the labels, I think I need to use something like [ 0, 1, 0, 0, ... ], not like the example I gave above.
For the record, I can run my neural network code without class weights and it works well, but I can't train the whole model with such imbalanced data.
I already tried working with lists, unsuccessfully!
Of course, I have problems with shapes: the images have shape (1000, 100, 200, 3), for example, and the labels (1000, 13). That's why I can't apply class weights either.
There are a few problems I'm trying to fix.
I will post my code, because I'm stuck and I don't know what to do.
class_weight_list = compute_class_weight('balanced',np.unique(Y_train), Y_train)
class_weight = dict(zip(np.unique(Y_train), class_weight_list))
Y_train = to_categorical(Y_train,num_classes=len(np.unique(Y_train)))
main.py
dataset.py
models.py
What is the best strategy to work with labels in this case?
I would appreciate it if someone could help me.
Thanks in advance!!
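One possible direction, sketched here with NumPy only and not taken from the original post: with multi-hot labels of shape (N, 13), each column can be treated as its own binary problem, and a weight can be computed per label from how rarely it is positive. The helper name per_label_pos_weights and the small demo array are hypothetical. How such weights are fed into the loss (for example a weighted binary cross-entropy) depends on the model code, which is not shown here.

```python
import numpy as np


def per_label_pos_weights(Y):
    """Return one weight per label column: (#negatives / #positives).

    Y is a multi-hot array of shape (n_samples, n_labels).
    Rare labels get weights above 1, frequent labels get weights below 1.
    """
    Y = np.asarray(Y, dtype=float)
    n_pos = Y.sum(axis=0)
    n_neg = Y.shape[0] - n_pos
    # Guard against labels that never occur, to avoid division by zero.
    return n_neg / np.maximum(n_pos, 1.0)


# Hypothetical demo data: 4 samples, 3 labels.
Y_demo = np.array([
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
])
print(per_label_pos_weights(Y_demo))
```

Note that the multi-hot rows stay as they are in this approach. np.unique and to_categorical assume exactly one class per sample, which does not hold for these labels.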
bullets' trajectory comparison
I'm a new Python user. I'm using this powerful language to do scientific research and data analysis.
I'm writing my thesis in physics, and I'm trying to describe and analyze the external ballistics of bullet flight.
I'm using matplotlib to draw graphs representing the bullet's parabolic path and the related crossing points. Given that, I'd like to know whether there is a way to smooth the lines drawn from the real experimental data, so that the graph is not made up of many linear segments.
Thanks a lot to all of you!
Francesco
All right Francesco, thanks for uploading the image. Now, let's have some fun with coding.
First, I suggest using the NumPy function np.polyfit() to fit a polynomial curve of a given degree to a set of values. Be careful with the degree you choose, as the results can change widely. For more information, please take a look at this documentation: https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.polyfit.html
Then, to smooth the curve, increase the number of points used to draw the function with np.linspace() and evaluate the polynomial built by np.poly1d() on this new set (it calculates the y coordinates from the fit you did with polyfit).
import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline  (uncomment only when running inside a Jupyter notebook)
x = [0, 50, 100, 150, 200, 250]
y = [-1, 0.8, 1.9, 1.6, 0, -3]
z = np.polyfit(x, y, 2)
p = np.poly1d(z)
xp = np.linspace(-2, 255)
plt.plot(x, y, '.', xp, p(xp), '-')
plt.show()
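To see how much the chosen degree matters, here is a small sketch that fits the same sample points with degrees 1, 2 and 3 and prints the root-mean-square residual of each fit. The helper name fit_rms is made up for this example.

```python
import numpy as np

x = np.array([0, 50, 100, 150, 200, 250], dtype=float)
y = np.array([-1, 0.8, 1.9, 1.6, 0, -3], dtype=float)


def fit_rms(x, y, deg):
    """Fit a polynomial of the given degree and return the RMS residual on the data."""
    coeffs = np.polyfit(x, y, deg)
    residuals = y - np.polyval(coeffs, x)
    return np.sqrt(np.mean(residuals ** 2))


for deg in (1, 2, 3):
    print(deg, fit_rms(x, y, deg))
```

A smaller residual is not automatically a better model: a high degree can follow the measurement noise instead of the trajectory. For a ballistic path, degree 2 is the natural starting point.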
I have the following issue with finding the roots of a non-linear equation. The equation is the following:
tanh[ 5* log [ (2/t)^(0.00990099) (1+x)^(0.990099) (1-x)^(-1) ] ]-x = 0
Solving this with NSolve for {t, 0, 100} in Mathematica returns the following:
This is what I was expecting when plotting the resulting roots versus the time parameter within this range. I have tried to replicate this result in Python using scipy.optimize.root, but my code seems to return as the solution whatever value I use as the initial guess, so it is nothing more than the identity map. This can also be seen in the picture below, where I used the initial guess 0.7:
I have provided the code below:
import math
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import root
# Setting up the function
def delta(v, t):
    epsilon = 10**(-20)
    return np.tanh( 5*np.log( (2/(1.0*t+epsilon))**(0.00990099)*(1+v+epsilon)**(0.990099)*(1-v+epsilon)**(-1)))-v
# Setting up the time parameter
time = np.linspace(0, 101)
res = [root(delta, 0.7, args=(t, )).x[0] for t in time]
print(res)
plt.plot(time, res)
plt.savefig("plot.png")
I am not really sure whether I am using scipy.optimize.root correctly, since the function looks OK as far as the behaviour I expect from it. Perhaps there is a mistake in the way I pass the args?
Root-finding methods that start from a bracketing interval [a, b] (one where f(a) and f(b) have opposite signs) are generally more robust than methods that start from a single initial point x0. The reason is that the former have a definite interval to work with and can refine it iteratively. The bisection method is the classical example, but it is slow. SciPy implements more sophisticated methods such as brentq. It works fine here with the bracket [-0.1, 0.1] (which should be enough, judging from the Mathematica plot).
Also, t=0 is problematic, as the equation is not even defined there. Use a small positive number like 0.01 instead.
from scipy.optimize import brentq
time = np.linspace(0.01, 101, 500)
res = [brentq(delta, -0.1, 0.1, args=(t, )) for t in time]
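For completeness, here is a self-contained sketch of the bracketing approach that also checks the sign change before each call, since brentq raises an error when f(a) and f(b) have the same sign. It rewrites delta without the epsilon terms, which assumes t stays strictly positive and v stays strictly inside (-1, 1). The variable names a, b and roots are chosen just for this example.

```python
import numpy as np
from scipy.optimize import brentq


def delta(v, t):
    """Left-hand side of the equation, as a function of v for a fixed t > 0."""
    inner = (2.0 / t) ** 0.00990099 * (1.0 + v) ** 0.990099 / (1.0 - v)
    return np.tanh(5.0 * np.log(inner)) - v


time = np.linspace(0.01, 101, 500)
a, b = -0.1, 0.1  # bracket taken from the answer above

roots = []
for t in time:
    # brentq needs opposite signs at the two ends of the bracket.
    if delta(a, t) * delta(b, t) > 0:
        roots.append(np.nan)  # no sign change: mark as missing instead of failing
    else:
        roots.append(brentq(delta, a, b, args=(t,)))
roots = np.array(roots)

print(roots[:5])
```

Plotting roots against time with plt.plot(time, roots) should then give a curve to compare with the Mathematica result.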
I'm using pandas to work with a data set and am trying to use a simple line plot with error bars to show the end results. It's all working great except that the plot looks funny.
By default, it puts my 2 data groups at the far left and right of the plot, which obscures the error bars to the point that they are not useful (the error bars in this case are key to interpretation, so I want them plainly visible).
Now, I fix that problem by setting xlim to open up some space on either end of the x axis so that the error bars are plainly visible, but then I have an offset from where the x labels are to where the actual x data is.
Here is a simplified example that shows the problem:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df6 = pd.DataFrame( [-0.07,0.08] , index = ['A','B'])
df6.plot(kind='line', linewidth=2, yerr = [ [0.1,0.1],[0.1,0.1 ] ], elinewidth=2,ecolor='green')
plt.xlim(-0.2,1.2) # Make some room at ends to see error bars
plt.show()
I tried to include a plot (image) showing the problem, but having just joined I do not have enough points yet to post images.
What I want to know is: How do I shift these labels over one tick to the right?
Thanks in advance.
Well, it turns out I found a solution, which I will just post here in case anyone else has this same issue in the future.
Basically, it all seems to work better for a line plot if you specify both the labels and the ticks in the same place at the same time. At least that was helpful for me. It forces you to keep those two lists the same length, which makes the assignment between ticks and labels better behaved (a simple 1:1 in this case).
So I could fix my problem by including something like this:
plt.xticks([0, 1], ['A','B'] )
right after the xlim statement in code from original question. Now the A and B align perfectly with the place where the data is plotted, not offset from it.
The above solution works, but it is less good-looking since the x grid is now very coarse (this is purely an aesthetic consideration). I could fix that by using a different xticks statement, with one label per tick, like:
plt.xticks([-0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2], ['','A','','','','','B',''])
This gives me nice looking grid and the data where I need it, but of course is very contrived-looking here. In the actual program I'd find a way to make that less clunky.
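A less clunky variant might build the tick list and the labels programmatically, as in the sketch below. It uses matplotlib's errorbar directly instead of the pandas plot call from the question, and the names values, names, positions, ticks and labels are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

values = [-0.07, 0.08]
names = ["A", "B"]
positions = np.arange(len(values))  # the data sit at x = 0 and x = 1

fig, ax = plt.subplots()
ax.errorbar(positions, values, yerr=0.1, linewidth=2, elinewidth=2, ecolor="green")
ax.set_xlim(-0.2, 1.2)  # leave room at both ends so the error bars are visible

# Fine grid of ticks; only ticks that coincide with a data position get a label.
ticks = np.round(np.arange(-0.2, 1.2 + 1e-9, 0.2), 1)
labels = [names[int(round(t))] if np.isclose(t, positions).any() else "" for t in ticks]

ax.set_xticks(ticks)
ax.set_xticklabels(labels)
ax.grid(True)
# Call fig.savefig("plot.png") or plt.show() (with an interactive backend) to view it.
```

Since ticks and labels are generated from the same array, the two lists always have the same length, which is the property that made the manual version work.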
Hope that is of some help to fellow seekers....