Drawing random data from a given two dimensional distribution function [closed] - numpy

I have a theoretical distribution, and I want to randomly sample points (z, m) in 2D space from the following distribution:
import numpy as np

def p(z, m):
    E = {'ft': 0.55, 'alpha': 2.99, 'z0': 0.191, 'km': 0.089, 'kt': 0.25}
    S = {'ft': 0.39, 'alpha': 2.15, 'z0': 0.121, 'km': 0.093, 'kt': -0.175}
    I = {'ft': 0.06, 'alpha': 1.77, 'z0': 0.045, 'km': 0.096, 'kt': 0.0}
    Evalue = E['ft']*np.exp(-1*E['kt']*(m-20))*z**E['alpha']*np.exp(-1*(z/(E['z0']+E['km']*(m-20)))**E['alpha'])
    Svalue = S['ft']*np.exp(-1*S['kt']*(m-20))*z**S['alpha']*np.exp(-1*(z/(S['z0']+S['km']*(m-20)))**S['alpha'])
    Ivalue = I['ft']*np.exp(-1*I['kt']*(m-20))*z**I['alpha']*np.exp(-1*(z/(I['z0']+I['km']*(m-20)))**I['alpha'])
    value = Evalue + Svalue + Ivalue
    return value
Update:
I figured out that inverse transform sampling is the appropriate approach to sample data from a probability distribution.
How could I implement this method in Python for 2D data, or is there a library I can use?

Take a look at Markov chain Monte Carlo (MCMC) methods. Basically you jump around the space of (z, m) points. From wherever you are, you always accept a jump that increases p(z, m), and you accept a jump that decreases p(z, m) with some probability. The Python library PyMC can carry out that process.
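For illustration, here is a minimal Metropolis-style sketch (hand-rolled, not PyMC itself) that wanders around (z, m) space using the p(z, m) from the question; the proposal step sizes and the sampling ranges are assumptions you would want to tune.
import numpy as np

def metropolis(p, n_samples=10000, z_start=0.3, m_start=22.0):
    # Random-walk Metropolis: always accept uphill moves, accept downhill
    # moves with probability p_new / p_current (symmetric Gaussian proposals).
    samples = []
    z, m = z_start, m_start
    p_cur = p(z, m)
    for _ in range(n_samples):
        z_new = z + np.random.normal(scale=0.05)
        m_new = m + np.random.normal(scale=0.5)
        # Only consider proposals inside an assumed sensible range (z > 0, 20 <= m <= 30).
        if z_new > 0 and 20 <= m_new <= 30:
            p_new = p(z_new, m_new)
            if np.random.random() < p_new / p_cur:
                z, m, p_cur = z_new, m_new, p_new
        samples.append((z, m))
    return np.array(samples)

chain = metropolis(p)               # discard an initial burn-in portion before use
z_samples, m_samples = chain[5000:].T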

If you want to randomly sample a value from p(z,m), then a simple way to implement this would be to use the random module in Python. I show the idea using NumPy's version of random:
import numpy as np
import matplotlib.pyplot as plt
def p(z, m):
    E = {'ft': 0.55, 'alpha': 2.99, 'z0': 0.191, 'km': 0.089, 'kt': 0.25}
    S = {'ft': 0.39, 'alpha': 2.15, 'z0': 0.121, 'km': 0.093, 'kt': -0.175}
    I = {'ft': 0.06, 'alpha': 1.77, 'z0': 0.045, 'km': 0.096, 'kt': 0.0}
    Evalue = E['ft']*np.exp(-1*E['kt']*(m-20))*z**E['alpha']*np.exp(-1*(z/(E['z0']+E['km']*(m-20)))**E['alpha'])
    Svalue = S['ft']*np.exp(-1*S['kt']*(m-20))*z**S['alpha']*np.exp(-1*(z/(S['z0']+S['km']*(m-20)))**S['alpha'])
    Ivalue = I['ft']*np.exp(-1*I['kt']*(m-20))*z**I['alpha']*np.exp(-1*(z/(I['z0']+I['km']*(m-20)))**I['alpha'])
    value = Evalue + Svalue + Ivalue
    return value
# Define the number of iterations you want for each variable
num_iter_m = 50
num_iter_z = 50
# I then set rand_m to go from 20 to 30, as your function fails for <20
rand_m = (np.random.random(num_iter_m)*10)+20
# z goes from the range 0 - 1
rand_z = (np.random.random(num_iter_z))
# Note: I am sampling m and z from uniform distributions. You could use more complicated distributions, e.g., Gaussian/normal shapes or even user-defined ones.
# Fill a grid with the random p(z,m) values.
# Note: contourf expects Z with shape (len(y), len(x)), i.e. (len(rand_m), len(rand_z)).
rand_p = np.zeros((len(rand_m), len(rand_z)))
for i in range(len(rand_z)):
    for j in range(len(rand_m)):
        rand_p[j][i] = p(rand_z[i], rand_m[j])
# Plot
fig = plt.figure(0)
ax1 = fig.add_subplot(211)
ax1.scatter(rand_z, rand_m)
ax1.set_xlabel("z")
ax1.set_ylabel("m")
ax2 = fig.add_subplot(212)
cf = ax2.contourf(rand_z, rand_m, rand_p)
ax2.set_xlabel("z")
ax2.set_ylabel("m")
colbar = plt.colorbar(cf)
colbar.set_label("p(z,m)")
plt.show()
Specific modules for doing this in a more sophisticated way would be, e.g., PyMC (https://github.com/pymc-devs/pymc) or emcee (http://dan.iel.fm/emcee/current/).
If you wanted to sample z and m weighted by the 2D function p(z,m), this is a little more complicated; one approach is sketched below.
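One option is rejection sampling: draw uniform (z, m) candidates as above plus a uniform vertical coordinate, and keep a candidate only if the vertical draw falls below p(z, m). A minimal sketch, assuming the same ranges as above (z in (0, 1], m in [20, 30]) and that p_max bounds the maximum of p over that region:
import numpy as np

def sample_pzm(n_samples, p_max):
    # Keep drawing uniform candidates until enough are accepted.
    accepted = []
    while len(accepted) < n_samples:
        z = np.random.random()
        m = 20 + 10 * np.random.random()
        if np.random.random() * p_max < p(z, m):
            accepted.append((z, m))
    return np.array(accepted)

# Estimate p_max from a coarse grid (with a safety margin), then sample.
zz, mm = np.meshgrid(np.linspace(1e-3, 1, 100), np.linspace(20, 30, 100))
p_max = 1.1 * p(zz, mm).max()
samples = sample_pzm(1000, p_max)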

Related

How to get seaborn or matplotlib bar plots to flip direction of bars at 1 instead of the origin 0? [closed]

In the data I'm working with, it's more valuable to see which values are greater than or less than one, and I want to show this with bars that go in opposite directions. That happens naturally around zero; how do I make the bars flip at one instead?
So far I have a visual solution, but the axis values are then incorrect.
plt.barh(weights_df['Variable'],weights_df['Odds Ratio']-1, color="Purple", align='edge', label='Odds Ratio')
plt.xlabel('Odds Ratio')
plt.ylabel('Variable')
plt.title("Odds Ratios")
plt.show()
Sample data:
weights = {
    'Age': 0.42,
    'Location': 1.5,
    'Smoke': 2.9,
    'Lesion': 0.22,
}
Given that the data naturally plots around zero, one way to get the plot you want is to do what you have done above and then edit the tick labels: use get_xticks() and set_xticklabels() after adding 1 to each tick value. See if this works; code below.
import pandas as pd
import matplotlib.pyplot as plt

weights_df = pd.read_excel('myinput.xlsx', 'Sheet56')
fig, ax1 = plt.subplots()
ax1.barh(weights_df['Variable'],weights_df['Odds Ratio'] - 1, color="Purple", align='edge', label='Odds Ratio')
labels = ax1.get_xticks().tolist() ## Get the x-axis labels
labels = [x + 1 for x in labels] ## Add 1 to each label
ax1.set_xticks(ax1.get_xticks().tolist())
ax1.set_xticklabels(labels) ## Set the x-axis labels to new values
plt.xlabel('Odds Ratio')
plt.ylabel('Variable')
plt.title("Odds Ratios")
plt.show()

Is there a python tensorflow/pytorch function like the x.toPixels tensorflowjs function? [closed]

I need a replacement for the tf.browser.toPixels() TensorFlow.js function. I'm trying to port some code to Python and I'm wondering if there is a quick way around this.
In the browser this is really simple: we just get new frames in a callback and draw them into a canvas. But in Python, say with matplotlib or tkinter, I guess I'm going to need some tricks.
Is there a (not super big) solution for this?
Thanks
Say you have a 2D tensor img that you would render in your browser with tf.browser.toPixels(img). You can draw a similar image using OpenCV or matplotlib:
Using Pytorch
import cv2
import numpy as np
import matplotlib.pyplot as plt

# If your tensor is on the GPU, move it to the CPU first:
img_np = img.cpu().numpy()
# Using OpenCV (imwrite takes the filename first, then the array):
cv2.imwrite("image.png", img_np.astype(np.uint8))
# Using matplotlib:
plt.imshow(img_np)
plt.show()
Tensorflow
import cv2
import numpy as np
import matplotlib.pyplot as plt

img_np = img.numpy()
# Using OpenCV (filename first, then the array):
cv2.imwrite("image.png", img_np.astype(np.uint8))
# Using matplotlib:
plt.imshow(img_np)
plt.show()
Also, if you have a 3D tensor (i.e., n x m x 3) you can still average the bands to make a 2D tensor out of it and plot it the same way.
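For example, a minimal sketch, assuming img_np is an n x m x 3 NumPy array as above:
# Average the three bands into a single 2D array and show it as a grayscale image.
img_2d = img_np.mean(axis=-1)
plt.imshow(img_2d, cmap='gray')
plt.show()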

How does GEKKO optimization with bounded variables work?

I am using GEKKO to estimate the parameters of a differential equation and I have bounded one of the variables between 0 and 1. However, when I solve the ODE, I get values outside of the bounds for this variable, so I was wondering if somebody knew how GEKKO finds the solution, as this might help me resolve the issue.
Here is the code I use to fit the data. This gives me a solution x and u where x is between 0 and 1.
However, afterwards I try to solve the ODE using scipy.integrate.solve_ivp, with the initial value of u that I got, and the solution I get for u is not between these bounds. Since the solution should be unique, I am wondering what process GEKKO follows to find it (does it project the values onto the bounds, or how does it deal with this?). Any comment is very appreciated.
Here is an MVCE. If you run it you can see that with GEKKO, I get a solution between the bounds and then, when I solve the ODE with solve_ivp, I don't get the same solution. Can you explain why this happens and how can I deal with it? I want to use solve_ivp to predict the next values.
from scipy.integrate import solve_ivp
from gekko import GEKKO
import matplotlib.pyplot as plt
time=[0.0, 0.11784511784511785, 0.18855218855218855,\
0.2356902356902357]
m = GEKKO(remote=False)
m.time= [0.0, 0.11784511784511785, 0.18855218855218855,\
0.2356902356902357]
x_data= [0.0003777630481280617, 0.002024573836061331,\
0.0008954383363035536, 0.005331749410182463]
x = m.CV(value=x_data, lb=0); x.FSTATUS = 1 # fit to measurement
x.SPLO = 0
sigma = m.FV(value=0.5, lb= 0, ub=1); sigma.STATUS=1
d = m.Param(0.05)
k = m.Param(0.001)
b = m.Param(0.5)
r = m.FV(value=0.5, lb= 0); r.STATUS=1
m_param = m.Param(1)
u = m.Var(value=0.1, lb=0, ub=1)
m.free(u)
a = m.Param(0.999)
Kmax= m.Param(100000)
m.Equations([x.dt()==x*(r*(1-a*u**2)*(1-x/(Kmax*(1-a*u**2)))-\
m_param/(k+b*u)-d), u.dt() == \
sigma*((-2*a*(b**2)*r*(u**3)+4*a*b*k*r*(u**2)\
+2*a*(k**2)*r*u-b*m_param)/((b*u+k)**2))])
m.options.IMODE = 5 # dynamic estimation
m.options.NODES = 5 # collocation nodes
m.options.EV_TYPE = 1 # linear error (2 for squared)
m.solve(disp=False, debug=False) # display solver output
def model_case_3(t, z, r, k, b, Kmax, sigma):
    m = 1
    a = 0.999
    x, u = z
    dxdt = x*(r*(1-a*u**2)*(1-x/(Kmax*(1-a*u**2)))-m/(k+b*u)-0.05)
    dudt = sigma*((-2*a*(b**2)*r*(u**3)+4*a*b*k*r*(u**2)\
        +2*a*(k**2)*r*u-b*m)/((b*u+k)**2))
    return [dxdt, dudt]
sol = solve_ivp(fun=model_case_3, t_span=[0.0, 0.2356902356902357],\
y0=[0.0003777630481280617, u.value[0]],\
t_eval=[0.0, 0.11784511784511785, 0.18855218855218855,\
0.2356902356902357], \
args=(r.value[0], 0.001, 0.5,1000000 , sigma.value[0]))
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,3), constrained_layout=True)
ax1.set_title('x')
ax1.plot(time, x.value, time, sol['y'][0])
ax2.set_title('u')
ax2.plot(time, u.value, time, sol['y'][1])
plt.show()
It is not an issue with the version of Gekko, as I have Gekko 0.2.8, so I am wondering if it has anything to do with the initialization of the variables. I ran the example I posted in Spyder (I was using Google Colab before) and got the correct solution, but when I ran the rest of my cases I again got negative values for u (when solving with solve_ivp), which is quite strange.
You can add a bound to a variable when it is created by setting lb (lower bound) and ub (upper bound).
z = m.Var(lb=0,ub=10)
After you create the variable, the bounds can be adjusted with .LOWER and .UPPER.
z.LOWER = 1
z.UPPER = 9
Here is an example problem that shows the use of bounds where x is constrained to be greater than 0.5.
from gekko import GEKKO
t_data = [0, 0.1, 0.2, 0.4, 0.8, 1]
x_data = [2.0, 1.6, 1.2, 0.7, 0.3, 0.15]
m = GEKKO(remote=False)
m.time = t_data
x = m.CV(value=x_data,lb=0.5,ub=3); x.FSTATUS = 1 # fit to measurement
k = m.FV(); k.STATUS = 1 # adjustable parameter
m.Equation(x.dt()== -k * x) # differential equation
m.options.IMODE = 5 # dynamic estimation
m.options.NODES = 5 # collocation nodes
m.solve(disp=False) # display solver output
k = k.value[0]; print(k)
A plot of the results shows that the bounds are enforced but the model prediction does not fit the data because of the lower bound constraint (x>=0.5).
import numpy as np
import matplotlib.pyplot as plt # plot solution
plt.plot(m.time,x.value,'bo',\
label='Predicted (k='+str(np.round(k,2))+')')
plt.plot(m.time,x_data,'rx',label='Measured')
# plot exact solution
t = np.linspace(0,1); xe = 2*np.exp(-k*t)
plt.plot(t,xe,'k:',label='Exact Solution')
plt.legend()
plt.xlabel('Time'), plt.ylabel('Value')
plt.show()
Without the restrictive lower bound, the solver optimizes to best fit the points.
x = m.CV(value=x_data,lb=0.0,ub=3)
Response 1 to Question Edit
The only way that a variable (such as u) is outside of the bounds is if the solver did not report a successful solution. To report a successful solution, the solver must satisfy the Karush Kuhn Tucker conditions for optimality. I recommend that you check that it satisfied all of the equations by checking that m.options.APPSTATUS==1 after the m.solve() command. If you can include an MVCE (https://stackoverflow.com/help/minimal-reproducible-example) that has sample data so the script can run, we can help you check it.
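For example, a minimal check along these lines (using the objects from your script):
m.solve(disp=False, debug=False)
if m.options.APPSTATUS == 1:
    print('Successful solution: the equations and bounds are satisfied')
else:
    print('Solver failed: the reported values may violate the bounds')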
Response 2 to Question Edit
Thanks for including a minimal reproducible example. Here are the results that I get with Gekko 0.2.8. If you are using an earlier version, I recommend that you upgrade with pip install gekko --upgrade.
The solver reports a successful solution.
EXIT: Optimal Solution Found.
The solution was found.
The final value of the objective function is 0.03164650667928192
---------------------------------------------------
Solver : IPOPT (v3.12)
Solution time : 0.23339999999999997 sec
Objective : 0.0316473666078486
Successful solution
---------------------------------------------------
The constraints x>=0 and 0<=u<=1 are satisfied. Could it just be an issue with an older version of Gekko?

How to perform linear regression with numpy.polyfit and print error statistics? [closed]

I am figuring out how to use the np.polyfit function and the documentation confuses me. In particular, I am trying to perform linear regression and print related statistics like the sum of squared errors (SSE). Can someone provide clear and concise explanations, possibly with a minimal working example?
np.polyfit returns an array containing the coefficients of the best-fitting polynomial of degree deg. To fit a line, use deg = 1. You can also have it return the residual (the sum of squared errors) by passing full = True as an argument. Note that with this argument, polyfit will also return some other information about the fit, which we can simply discard.
Altogether, then, we might have something like
import matplotlib.pyplot as plt
import numpy as np
# Generate some toy data.
x = np.random.rand(25)
y = 2 * x + 0.5 + np.random.normal(scale=0.05, size=x.size)
# Fit the trend line.
(m, b), (SSE,), *_ = np.polyfit(x, y, deg=1, full=True)
# Plot the original data.
plt.scatter(x, y, color='k')
# Plot the trend line.
line_x = np.linspace(0, 1, 200)
plt.plot(line_x, m * line_x + b, color='r')
plt.title(f'slope = {round(m, 3)}, int = {round(b, 3)}, SSE = {round(SSE, 3)}')
plt.show()
The *_ notation in the call to polyfit just tells Python to discard however many additional values the function returns; the documentation describes these extra values if you're interested. We unpack the SSE as (SSE,) because polyfit returns it as a length-one array. This code produces a scatter plot of the data with the fitted trend line.
You might also like to know about np.polyval, which will take tuples of polynomial coefficients and evaluate the corresponding function at input points.
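For instance, a short sketch reusing the fit above:
# Evaluate the fitted line at the original x values and recompute the SSE by hand;
# it should match the residual returned by polyfit.
y_fit = np.polyval((m, b), x)
print(np.sum((y - y_fit)**2))  # ~ SSE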

Exponential decay curve fitting in numpy and scipy

I'm having a bit of trouble with fitting a curve to some data, but can't work out where I am going wrong.
In the past I have done this with numpy.linalg.lstsq for exponential functions and scipy.optimize.curve_fit for sigmoid functions. This time I wished to create a script that would let me specify various functions, determine parameters and test their fit against the data. While doing this I noticed that Scipy leastsq and Numpy lstsq seem to provide different answers for the same set of data and the same function. The function is simply y = e^(l*x) and is constrained such that y=1 at x=0.
Excel trend line agrees with the Numpy lstsq result, but as Scipy leastsq is able to take any function, it would be good to work out what the problem is.
import scipy.optimize as optimize
import numpy as np
import matplotlib.pyplot as plt
## Sampled data
x = np.array([0, 14, 37, 975, 2013, 2095, 2147])
y = np.array([1.0, 0.764317544, 0.647136491, 0.070803763, 0.003630962, 0.001485394, 0.000495131])
# function
fp = lambda p, x: np.exp(p*x)
# error function
e = lambda p, x, y: (fp(p, x) - y)
# using scipy least squares
l1, s = optimize.leastsq(e, -0.004, args=(x,y))
print(l1)
# [-0.0132281]
# using numpy least squares
l2 = np.linalg.lstsq(np.vstack([x, np.zeros(len(x))]).T,np.log(y))[0][0]
print(l2)
# -0.00313461628963 (same answer as Excel trend line)
# smooth x for plotting
x_ = np.arange(0, x[-1], 0.2)
plt.figure()
plt.plot(x, y, 'rx', x_, fp(l1, x_), 'b-', x_, fp(l2, x_), 'g-')
plt.show()
Edit - additional information
The MWE above includes a small sample of the dataset. When fitting the actual data the scipy.optimize.curve_fit curve presents an R^2 of 0.82, while the numpy.linalg.lstsq curve, which is the same as that calculated by Excel, has an R^2 of 0.41.
You are minimizing different error functions.
When you use numpy.linalg.lstsq, the error function being minimized is
np.sum((np.log(y) - p * x)**2)
while scipy.optimize.leastsq minimizes the function
np.sum((y - np.exp(p * x))**2)
The first approach requires a linear dependence between the transformed dependent variable (log y) and the independent variable, but the solution is known analytically, while the second can handle any dependence but relies on an iterative method.
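To see this concretely, here is a small sketch (reusing x, y, l1, l2 from the question's script) that evaluates both error functions at both solutions; each fit is only optimal under its own criterion:
# Error minimized by np.linalg.lstsq (linear fit in log space)
log_err = lambda p: np.sum((np.log(y) - p * x)**2)
# Error minimized by scipy.optimize.leastsq (fit in linear space)
lin_err = lambda p: np.sum((y - np.exp(p * x))**2)
print(log_err(l1), log_err(l2))  # l2 gives the smaller log-space error
print(lin_err(l1), lin_err(l2))  # l1 gives the smaller linear-space error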
On a separate note, I cannot test it right now, but when using numpy.linalg.lstsq, you don't need to vstack a row of zeros; the following works as well:
l2 = np.linalg.lstsq(x[:, None], np.log(y))[0][0]
To expound a bit on Jaime's point, any non-linear transformation of the data will lead to a different error function and hence to different solutions. These will lead to different confidence intervals for the fitting parameters. So you have three possible criteria to use to make a decision: which error you want to minimize, which parameters you want more confidence in, and finally, if you are using the fitting to predict some value, which method yields less error in the interesting predicted value. Playing around a bit analytically and in Excel suggests that different kinds of noise in the data (e.g. if the noise function scales the amplitude, affects the time-constant or is additive) leads to different choices of solution.
I'll also add that while this trick "works" for exponential decay to 0, it can't be used in the more general (and common) case of damped exponentials (rising or falling) to values that cannot be assumed to be 0.
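In that more general case you can fit the offset explicitly with scipy.optimize.curve_fit; a minimal sketch with synthetic data and assumed initial guesses:
import numpy as np
from scipy.optimize import curve_fit

# Exponential decay to a nonzero asymptote C; the log-linearization trick no
# longer applies because computing log(y - C) would require knowing C already.
def damped(x, A, k, C):
    return A * np.exp(-k * x) + C

x = np.linspace(0, 10, 50)
y = 3.0 * np.exp(-0.7 * x) + 1.5 + np.random.normal(scale=0.05, size=x.size)
popt, pcov = curve_fit(damped, x, y, p0=[1.0, 0.1, 1.0])
print(popt)  # should be close to [3.0, 0.7, 1.5]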