Getting "ValueError: data type <class 'numpy.object_'> not inexact" error while trying to linear fit a dataset using uncertainities - numpy

I am very new to Python, so I am struggling with what I want to do and figured I could ask here.
I have an Excel sheet with data columns such as period, pdot, and flux values, each with an associated error column. I want to plot these in Python and do a linear fit that takes the errors into account, then obtain quantities like the standard deviation or p-value to judge the goodness of the fit. Using that fit, I will then try to predict values for objects with a missing parameter. I managed to do all of this without the errors, but now that I am trying to propagate the measurement errors, I am running into problems.
My working code that does not take the errors into account is this:
# Convert distance from kpc to cm, then build 4*pi*d^2
dist_array1 = np.multiply(3.08567758128*10**21, dist_array)
dist_array2 = np.multiply(dist_array1, dist_array1)
e1 = np.multiply(4*math.pi, dist_array2)
# Gamma-ray luminosity from the flux, and the efficiency eta = L_gamma / Edot
L_gamma = np.multiply(e1, flux_array)
Gamma_Eff = np.divide(L_gamma, edot_array)
# Tau = P / Pdot
Tau = np.divide(period_array, pdot_array)
# Light-cylinder magnetic field: B_LC = 2.94e8 * sqrt(Pdot * P**-5)
constant = 2.94*10**8
t1 = np.power(period_array, -5)
t2 = np.multiply(t1, pdot_array)
t3 = np.power(t2, 1/2)
B_LC = np.multiply(constant, t3)
# Zeta parameters built from log(P) and log(1e15 * Pdot)
c1 = np.multiply(10**15, pdot_array)
c2 = np.log(c1)
c3 = np.log(period_array)
c4 = 1 - np.multiply(11/7, c3) + np.multiply(4/7, c2)
c5 = 3.56 - c3 - c2
Zeta1 = 1 + np.divide(c4, c5)
c6 = 0.8 - np.multiply(2/7, c3) + np.multiply(2/7, c2)
Zeta2 = 1 + np.divide(c6, 1.3)
c8 = 0.6 - np.multiply(11/14, c3) + np.multiply(2/7, c2)
Zeta3 = 1 + np.divide(c8, 1.3)
# Here I defined the variables I will work with; now I will try to fit them.
x1 = np.log(period_array)
y1 = np.log(Gamma_Eff)
coef1, V1 = np.polyfit(x1, y1, 1, cov=True)
poly1d_fn1 = np.poly1d(coef1)
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(30, 10))
fig.suptitle('Figure 1')
ax1.plot(x1, y1, 'yo', x1, poly1d_fn1(x1), '-k')
x2 = np.log(Tau)
coef2, V2 = np.polyfit(x2, y1, 1, cov=True)
poly1d_fn2 = np.poly1d(coef2)
ax2.plot(x2, y1, 'yo', x2, poly1d_fn2(x2), '-k')
x3 = np.log(B_LC)
coef3, V3 = np.polyfit(x3, y1, 1, cov=True)
poly1d_fn3 = np.poly1d(coef3)
ax3.plot(x3, y1, 'yo', x3, poly1d_fn3(x3), '-k')
ax1.set(xlabel='log P (s)', ylabel='log η')
ax2.set(xlabel='log τ (yr)', ylabel='log η')
ax3.set(xlabel='log B_LC (G)', ylabel='log η')
# And then obtain the uncertainties (1-sigma errors on slope and intercept from the covariance diagonals)
sigma_period_1 = np.sqrt(V1[0][0])
sigma_period_2 = np.sqrt(V1[1][1])
sigma_Tau_1 = np.sqrt(V2[0][0])
sigma_Tau_2 = np.sqrt(V2[1][1])
sigma_B_LC_1 = np.sqrt(V3[0][0])
sigma_B_LC_2 = np.sqrt(V3[1][1])
Now this works well and I can fit the data; the problem is that I cannot get things like the p-value or the standard deviation of the fit out of np.polyfit. I think I need to use statsmodels for that.
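A minimal statsmodels sketch of what I am aiming for (assuming x1 and y1 stay plain float arrays, as above) would be something like:
import statsmodels.api as sm
X1 = sm.add_constant(x1)          # intercept column plus the regressor
ols_fit = sm.OLS(y1, X1).fit()
print(ols_fit.summary())          # coefficients, standard errors, p-values, R^2
print(ols_fit.bse)                # standard errors of intercept and slope
print(ols_fit.pvalues)            # p-values of intercept and slope
(statsmodels also has WLS, which accepts per-point weights, so that might be one way to let the measurement errors enter the fit.)
I also need to feed the measurement errors into the formulas themselves to be more accurate. What I have changed so far to get there is the following: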
period_array = unumpy.uarray(period_array, perioderr_array)  # Combine value and error so the error gets propagated
pdot_array = unumpy.uarray(pdot_array, pdoterr_array)  # Same for the second quantity
flux_array = unumpy.uarray(flux_array, flux_err_array)  # And the third
c2 = unumpy.log(c1)  # Had to use unumpy instead of np here because np.log raised errors on uarrays
c3 = unumpy.log(period_array)  # Same here
Then I tried to fit with polyfit to see whether it works; after that I will try to get the same fit with statsmodels.
x1 = unumpy.log(period_array)  # same log issue again
y1 = unumpy.log(Gamma_Eff)
coef1, V1 = np.polyfit(x1, y1, 1, cov=True)
The last line gives me the error "ValueError: data type <class 'numpy.object_'> not inexact". I did some digging and understood the problem as "my values are not floats, which is why I am getting the error, so I need to turn them into floats". To do this I tried many things, including x = list(x), but to no avail.
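One direction I have been considering (just a sketch, I have not verified it) is to pull the nominal values and propagated errors back out with unumpy and hand the errors to polyfit as weights:
import numpy as np
from uncertainties import unumpy
x1_val = unumpy.nominal_values(x1)   # plain float arrays of central values
y1_val = unumpy.nominal_values(y1)
y1_err = unumpy.std_devs(y1)         # propagated 1-sigma errors on y
# polyfit's w weights multiply the residuals, so 1/sigma is the usual choice;
# note this only uses the y-errors, polyfit has no notion of x-errors
coef1, V1 = np.polyfit(x1_val, y1_val, 1, w=1.0/y1_err, cov=True)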
So what am I doing wrong?

Related

Extract ggplot smoothing function and save in dataframe

I am trying to extract my smoothing function from a ggplot and save it as a dataframe (hourly data points). Plot shown here.
What I have tried:
I have already tried different interpolation techniques, but the results are not satisfying.
Linear interpolation causes a zig-zag pattern.
na_spline causes a weird curved pattern.
The real data behaves more like the geom_smooth() of ggplot. I have tried to reproduce it with the following functions:
loess.data <- stats::loess(Hallwil2018_2019$Avgstemp~as.numeric(Hallwil2018_2019$datetime), span = 0.5)
loess.predict <- predict(loess.data, se = T)
But it creates a list that misses the NA values and is much shorter.
You can pass a newdata argument to predict() to get it to predict a value for every time period you give it. For example (from randomly generated data):
df <- data.frame(date = sample(seq(as.Date('2021/01/01'),
                                   as.Date('2022/01/01'),
                                   by = "day"), 40),
                 var = rnorm(40, 100, 10))
mod <- loess(df$var ~ as.numeric(df$date), span = 0.5)
predict(mod, newdata = seq(as.Date('2021/01/01'), as.Date('2022/01/01'), by = "day"))

Zooming a spherical projection in matplotlib

I need to display a catalogue of galaxies projected on the sky. Not all of the sky is relevant here, so I need to center and zoom on the relevant part. I am OK with more or less any projection, like Lambert, Mollweide, etc. Here are mock data and a code sample, using Mollweide:
# Generating mock data
np.random.seed(1234)
(RA,Dec)=(np.random.rand(100)*60 for _ in range(2))
# Creating projection
projection='mollweide'
fig = plt.figure(figsize=(20, 10));
ax = fig.add_subplot(111, projection=projection);
ax.scatter(np.radians(RA),np.radians(Dec));
# Creating axes
xtick_labels = ["$150^{\circ}$", "$120^{\circ}$", "$90^{\circ}$", "$60^{\circ}$", "$30^{\circ}$", "$0^{\circ}$",
"$330^{\circ}$", "$300^{\circ}$", "$270^{\circ}$", "$240^{\circ}$", "$210^{\circ}$"]
labels = ax.set_xticklabels(xtick_labels, fontsize=15);
ytick_labels = ["$-75^{\circ}$", "$-60^{\circ}$", "$-45^{\circ}$", "$-30^{\circ}$", "$-15^{\circ}$",
"$0^{\circ}$","$15^{\circ}$", "$30^{\circ}$", "$45^{\circ}$", "$60^{\circ}$",
"$75^{\circ}$", "$90^{\circ}$"]
ax.set_yticklabels(ytick_labels,fontsize=15);
ax.set_xlabel("RA");
ax.xaxis.label.set_fontsize(20);
ax.set_ylabel("Dec");
ax.yaxis.label.set_fontsize(20);
ax.grid(True);
The result is the following:
I have tried various set_whateverlim, set_extent, clip_box and so on, as well as importing cartopy and passing ccrs.LambertConformal(central_longitude=...,central_latitude=...) as arguments. I was unable to get a result.
Furthermore, I would like to shift RA tick labels down, as they are difficult to read with real data. Unfortunately, ax.tick_params(pad=-5) doesn't do anything.

Visualizing Data, Tracking Specific SD Values

BLUF: I want to track a specific Std Dev range, e.g. 1.0 to 1.25, by color coding it and making a separate KDE or other probability-density graph for it.
What I want from this is to be able to pick out other Std Dev ranges and get back new graphs that I can then use to predict outcomes within that specific range.
Data: https://www.dropbox.com/s/y78pynq9onyw9iu/Data.csv?dl=0
What I have so far is normalized data that looks like a shotgun blast:
Code used to produce it:
data = pd.read_csv("Data.csv")
sns.jointplot(data.x,data.y, space=0.2, size=10, ratio=2, kind="reg");
What I want to achieve here looks like what I have marked up below:
I kind of know how to do this in RStudio using RidgePlot-type functions, but I'm at a loss here in Python, even while using Seaborn. Any/All help appreciated!
The following code might point you in the right direction; you can tweak the appearance of the plot as you please from there.
tips = sns.load_dataset("tips")
g = sns.jointplot(x="total_bill", y="tip", data=tips)
top_lim = 4
bottom_lim = 2
temp = tips.loc[(tips.tip>=bottom_lim)&(tips.tip<top_lim)]
g.ax_joint.axhline(top_lim, c='k', lw=2)
g.ax_joint.axhline(bottom_lim, c='k', lw=2)
# we have to create a secondary y-axis to the joint-plot, otherwise the
# kde might be very small compared to the scale of the original y-axis
ax_joint_2 = g.ax_joint.twinx()
sns.kdeplot(temp.total_bill, shade=True, color='red', ax=ax_joint_2, legend=False)
ax_joint_2.spines['right'].set_visible(False)
ax_joint_2.spines['top'].set_visible(False)
ax_joint_2.yaxis.set_visible(False)

Convert date/time index of external dataset so that pandas would plot clearly

When you already have a time-series dataset and index it with pandas' own date/time dtype, the index seems to plot cleanly, as here.
But when I already have data files with a date & time column in its own format, such as [2009-01-01T00:00], is there a way to convert it into an object that the plot can read? Currently my plot looks like the following.
Code:
dir = sorted(glob.glob("bsrn_txt_0100/*.txt"))
gen_raw = (pd.read_csv(file, sep='\t', encoding = "utf-8") for file in dir)
gen = pd.concat(gen_raw, ignore_index=True)
gen.drop(gen.columns[[1,2]], axis=1, inplace=True)
#gen['Date/Time'] = gen['Date/Time'][11:]  # -> caused an error, didn't work
filter = gen[gen['Date/Time'].str.endswith('00') | gen['Date/Time'].str.endswith('30')]
filter['rad_tot'] = filter['Direct radiation [W/m**2]'] + filter['Diffuse radiation [W/m**2]']
lis = np.arange(35040)  # the number of rows, checked by printing; this covers 2009-2010
plt.xticks(lis, filter['Date/Time'])
plt.plot(lis, filter['rad_tot'], '.')
plt.title('test of generation 2009')
plt.xlabel('Date/Time')
plt.ylabel('radiation total [W/m**2]')
plt.show()
My other idea was to use plotly, but its main purpose seems to be feeding data to the web. Ideally I would already be familiar with all the modules and could try everything myself, but I am learning pandas and matplotlib as I go.
So I would like to ask whether anyone has experienced a similar issue.
I think you need to set the labels to not visible in a loop:
ax = df.plot(...)
spacing = 10
visible = ax.xaxis.get_ticklabels()[::spacing]
for label in ax.xaxis.get_ticklabels():
    if label not in visible:
        label.set_visible(False)
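Separately, a minimal sketch (assuming the 'Date/Time' strings really do parse with pd.to_datetime, e.g. 2009-01-01T00:00) of converting the column to a proper datetime index, so matplotlib can space and format the ticks itself:
import pandas as pd
import matplotlib.pyplot as plt

filter['Date/Time'] = pd.to_datetime(filter['Date/Time'])  # parse the ISO-like strings
rad = filter.set_index('Date/Time')['rad_tot']              # datetime index, radiation total
rad.plot(style='.')                                         # ticks are now chosen and labelled automatically
plt.ylabel('radiation total [W/m**2]')
plt.show()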

Numpy - AttributeError: 'Zero' object has no attribute 'exp'

I'm having trouble with a discrepancy: something breaks at runtime, but running the exact same data and operations in the Python console works fine.
# f_err - currently has value 1.11819388872025
# l_scales - currently a numpy array [1.17840183376334 1.13456764589809]
sq_euc_dists = self.se_term(x1, x2, l_scales) # this is fine. It calls cdists on x1/l_scales, x2/l_scales vectors
return (f_err**2) * np.exp(-0.5 * sq_euc_dists) # <-- errors on this line
The error that I get is
AttributeError: 'Zero' object has no attribute 'exp'
However, calling those exact same lines, with the same f_err, l_scales, and x1, x2 in the console right after it errors out, somehow does not produce errors.
I was not able to find a post referring to the 'Zero' object error specifically, and the non-'Zero' ones I found didn't seem to apply to my case here.
EDIT: The question was a bit lacking in info, so here's an actual (extracted) runnable example with sample data taken straight out of a failed run. Run in isolation it works fine, i.e. I can't reproduce the error outside the real run.
Note that the sqeucl_dist function below is quite bad and I should be using scipy's cdist instead. However, because I'm using sympy's symbols for element-wise matrix gradients with over 15 partial derivatives in my real data, cdist is not an option, as it doesn't deal with arbitrary objects.
import numpy as np

def se_term(x1, x2, l):
    return sqeucl_dist(x1/l, x2/l)

def sqeucl_dist(x, xs):
    return np.sum([(i-j)**2 for i in x for j in xs], axis=1).reshape(x.shape[0], xs.shape[0])
x = np.array([[-0.29932052, 0.40997373], [0.40203481, 2.19895326], [-0.37679417, -1.11028267], [-2.53012051, 1.09819485], [0.59390005, 0.9735], [0.78276777, -1.18787904], [-0.9300892, 1.18802775], [0.44852545, -1.57954101], [1.33285028, -0.58594779], [0.7401607, 2.69842268], [-2.04258086, 0.43581565], [0.17353396, -1.34430191], [0.97214259, -1.29342284], [-0.11103534, -0.15112815], [0.41541759, -1.51803154], [-0.59852383, 0.78442389], [2.01323359, -0.85283772], [-0.14074266, -0.63457529], [-0.49504797, -1.06690869], [-0.18028754, -0.70835799], [-1.3794126, 0.20592016], [-0.49685373, -1.46109525], [-1.41276934, -0.66472598], [-1.44173868, 0.42678815], [0.64623684, 1.19927771], [-0.5945761, -0.10417961]])
f_err = 1.11466725760716
l = [1.18388412685279, 1.02290811104357]
result = (f_err**2) * np.exp(-0.5 * se_term(x, x, l)) # This runs fine, but fails with the exact same calls and data during runtime
Any help greatly appreciated!
Here is how to reproduce the error you are seeing:
import sympy
import numpy
zero = sympy.sympify('0')
numpy.exp(zero)
You will see the same exception you are seeing.
You can fix this (inefficiently) by changing your code to the following to make things floating point.
def sqeucl_dist(x, xs):
    return np.sum([np.vectorize(float)(i-j)**2 for i in x for j in xs],
                  axis=1).reshape(x.shape[0], xs.shape[0])
It would be better to fix your gradient function using lambdify.
Here's an example of how lambdify can be used on partial derivatives:
import sympy
from sympy.abc import x, y, z
expression = x**2 + sympy.sin(y) + z
derivatives = [expression.diff(var, 1) for var in [x, y, z]]
derivatives is now [2*x, cos(y), 1], a list of Sympy expressions. To create a function which will evaluate this numerically at a particular set of values, we use lambdify as follows (passing 'numpy' as an argument like that means to use numpy.cos rather than sympy.cos):
derivative_calc = sympy.lambdify((x, y, z), derivatives, 'numpy')
Now derivative_calc(1, 2, 3) will return [2, -0.41614683654714241, 1]. These are ints and numpy.float64s.
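A small self-contained follow-on sketch (same names as the example above) showing that the lambdified function also evaluates element-wise over whole NumPy arrays, which is what avoids the object-dtype problem in the original code:
import numpy as np
import sympy
from sympy.abc import x, y, z

expression = x**2 + sympy.sin(y) + z
derivatives = [expression.diff(var, 1) for var in (x, y, z)]
derivative_calc = sympy.lambdify((x, y, z), derivatives, 'numpy')

xs = np.linspace(0.0, 1.0, 5)
ys = np.linspace(0.0, np.pi, 5)
zs = np.ones(5)
# Returns [2*xs, np.cos(ys), 1]: ordinary float arrays (the constant derivative stays a scalar)
print(derivative_calc(xs, ys, zs))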
A side note: np.exp(M) calculates the element-wise exponential of the elements of M. If you are trying to compute a matrix exponential, you need scipy.linalg.expm instead.
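For completeness, a minimal sketch of the element-wise vs. matrix exponential distinction (assuming SciPy is available):
import numpy as np
from scipy.linalg import expm

M = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
print(np.exp(M))   # element-wise: exp applied to each entry separately
print(expm(M))     # matrix exponential: sum of M**k / k!, a rotation matrix for this M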