How can I fit my plots from measured data? - matplotlib

How can I fit my plots?
Up to now, I've got the following code, which plots a variety of graphs from an array (data from an experiment) as it is placed in a loop:
import matplotlib.pyplot as plt
plt.figure(6)
plt.semilogx(Tau_Array, Correlation_Array, '+-')
plt.ylabel('Correlation')
plt.xlabel('Tau')
plt.title("APD" + str(detector) + "_Correlations_log_graph")
plt.savefig(DataFolder + "/APD" + str(detector) + "_Correlations_log_graph.png")
This works so far with a logarithmic plot, but I am wondering how I could fit the data here. In the end I would like to have a formula and/or a curve that best describes the data I measured.
I would be pleased if someone could help me!

You can use curve_fit from scipy.optimize for this. Here is an example:
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x,a):
    return np.exp(a*x)
x,y,z = np.loadtxt("fit3.dat",unpack=True)
popt,pcov = curve_fit(func,x,y)
y_fit = np.exp(popt[0]*x)
plt.plot(x,y,'o')
plt.errorbar(x,y,yerr=z)
plt.plot(x,y_fit)
plt.savefig("fit3_plot.png")
plt.show()
In your case, you can define func to match the model you expect for your data. popt is an array containing the fitted parameter values, and pcov is their estimated covariance matrix.
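To make this a bit more concrete for correlation-versus-tau data, here is a minimal sketch. The model (exponential decay plus a constant offset), the parameter names amplitude, tau_c and offset, and the synthetic data are all assumptions for illustration; the right model depends on the experiment.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit


def model(tau, amplitude, tau_c, offset):
    """Hypothetical model: exponential decay plus a constant offset."""
    return amplitude * np.exp(-tau / tau_c) + offset


# Synthetic stand-in for Tau_Array / Correlation_Array (assumed values).
rng = np.random.default_rng(0)
tau = np.logspace(-2, 2, 60)
true_params = (0.8, 3.0, 1.0)
correlation = model(tau, *true_params) + rng.normal(scale=0.01, size=tau.size)

# p0 is the initial guess; a rough guess usually helps convergence.
popt, pcov = curve_fit(model, tau, correlation, p0=(1.0, 1.0, 1.0))
perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainties of the parameters

# Evaluate the fitted model on a fine grid so the curve looks smooth.
tau_fine = np.logspace(-2, 2, 400)
fig, ax = plt.subplots()
ax.semilogx(tau, correlation, '+', label='data')
ax.semilogx(tau_fine, model(tau_fine, *popt), '-', label='fit')
ax.set_xlabel('Tau')
ax.set_ylabel('Correlation')
ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards to display or store the figure. The entries of popt, together with perr, give you the formula that describes the data under the assumed model.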

Related

Unable to generate plot using matplotlib

I am a beginner to Python and am experimenting with a plot. The script runs fine, but the plot does not show up.
The matplotlib and numpy libraries are installed.
import numpy as np
import h5py
f= h5py.File('3DIMG_05JUN2021_0000_L3B_HEM_DLY.h5','r')
#Studying the structure of the file by printing what HDF5 groups are present
for key in f.keys():
    print(key) #Names of the groups in HDF5 file.
# will print the variables in the file
#Get the HDF5 group
ls=list(f.keys())
print("ls")
print(ls)
tsurf = f['HEM_DLY'][:]
print("tsurf")
print(tsurf)
tsurf1=np.squeeze(tsurf)
print(tsurf1.shape)
import matplotlib.pyplot as plt
im= plt.plot(tsurf1)
#plt.colorbar()
plt.imshow(im)
Python version is 3 running on Ubuntu
Difficult to give you the exact answer without the dataset (please update the question with the dataset), but for sure, plt.plot does not return an object that can be plotted with plt.imshow
Try instead:
ax = plt.plot(tsurf1)
plt.show()
Probably the error was in the final plot call. Try this:
import numpy as np
import h5py
import matplotlib.pyplot as plt
f= h5py.File('/path','r')
ls=list(f.keys())
tsurf = f['your_key_str'][:]
tsurf1=np.squeeze(tsurf)
im= plt.plot(tsurf1)
plt.show() # <-- plt.show() NOT plt.imshow(); plt.show() takes no plot object
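The commented-out colorbar in the question suggests an image may have been the goal. If so, here is a rough sketch of choosing between a line plot and an image based on the array's dimensionality. The small synthetic array stands in for the HDF5 dataset, which is not available here, so its shape is purely an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for f['HEM_DLY'][:] (assumed shape: one time step, 4 x 6 grid).
tsurf = np.arange(24, dtype=float).reshape(1, 4, 6)
tsurf1 = np.squeeze(tsurf)  # drops the length-1 axis -> shape (4, 6)

fig, ax = plt.subplots()
if tsurf1.ndim == 2:
    # 2-D data: imshow expects the array itself, not the result of plt.plot.
    image = ax.imshow(tsurf1, origin='lower')
    fig.colorbar(image, ax=ax)
else:
    # 1-D data: a plain line plot.
    lines = ax.plot(tsurf1)
```

Either way, finish with plt.show() to open the window (or fig.savefig(...) on a machine without a display).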

How to avoid this split regression curve in matplotlib?

I had difficulty producing a continuous regression curve, with no splits as is currently the case.
from numpy import *
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.optimize import curve_fit
from sklearn.metrics import mean_squared_error
from matplotlib.pyplot import *
from scipy.interpolate import *
%matplotlib inline
x = array([28,47,43,40,42,37,64,48,47,45,58,38,45,38,50,42,59,40,49,38,52])
y = array([64,44,46,48,43,55,41,50,52,58,44,56,47,59,40,50,43,58,47,55,40])
p2 = polyfit(x,y,2) # Polynomial curve, second order
plot (x,y,'o')
plot (x,polyval(p2,x),'r-')
The printed plot is:
But all I need is just a normal curve, the red curve should be just one continuous curve, and not be split or divided as it is now.
How can this be done?
(In the code snippet above this is printing a second order regression curve, but the problem also exists in a higher order (and in the first order))
I would appreciate any help.
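One likely cause: the x values are not sorted, and plot connects the points in array order, so the red line jumps back and forth along the same parabola. A minimal sketch of one way around this is to evaluate the fitted polynomial on an evenly spaced, increasing grid instead of on the raw x values. The names x_line and y_line are just illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([28,47,43,40,42,37,64,48,47,45,58,38,45,38,50,42,59,40,49,38,52])
y = np.array([64,44,46,48,43,55,41,50,52,58,44,56,47,59,40,50,43,58,47,55,40])

p2 = np.polyfit(x, y, 2)  # second-order polynomial coefficients

# Increasing grid over the data range, so the line is drawn left to right.
x_line = np.linspace(x.min(), x.max(), 200)
y_line = np.polyval(p2, x_line)

fig, ax = plt.subplots()
ax.plot(x, y, 'o')
ax.plot(x_line, y_line, 'r-')
```

Sorting the original points (for example with np.argsort(x)) before plotting would also give a single curve, but the dense grid looks smoother for higher orders.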

matplotlib - seaborn - the numbers on the correlation plots are not readable

The plot below shows the correlation for one column. The problem is that the numbers are not readable, because there are many columns in it.
How is it possible to show only 5 or 6 most important columns and not all of them with very low importance?
plt.figure(figsize=(20,3))
sns.heatmap(df.corr()[['price']].sort_values('price', ascending=False).iloc[1:].T, annot=True,
            cmap='Spectral_r', vmax=0.9, vmin=-0.31)
You can limit the cells shown via .iloc[1:7]. If you also want to show the highest negative values, you could create a second plot with .iloc[-6:]. To have both together, you could use numpy's slicing function and write .iloc[np.r_[1:4, -3:0]].
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.random.rand(7, 27), columns=['price'] + [*'abcdefghijklmnopqrstuvwxyz'])
plt.figure(figsize=(20, 3))
sns.heatmap(df.corr()[['price']].sort_values('price', ascending=False).iloc[1:7].T,
            annot=True, annot_kws={'rotation':90, 'size': 20},
            cmap='Spectral_r', vmax=0.9, vmin=-0.31)
plt.show()
annot can also be a list of labels. Using this, you can define a string matrix that you use to display the desired numbers and set the others to an empty string.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(0)
import seaborn as sns; sns.set_theme()
import pandas as pd
from string import ascii_letters
# generate random data
rs = np.random.RandomState(33)
df = pd.DataFrame(data=rs.normal(size=(100, 26)),
                  columns=list(ascii_letters[26:]))
importance_index = 5 # until which idx to hide values
data = df.corr()[['A']].sort_values('A', ascending=False).iloc[1:].T
labels = data.astype(str) # make a str-copy
labels.iloc[0,:importance_index] = ' ' # mask columns that you want to hide
sns.heatmap(data, annot=labels, cmap='Spectral_r', vmax=0.9, vmin=-0.31, fmt='', annot_kws={'rotation':90})
plt.show()
The output on some random data:
This works but it has its limits, particularly with setting fmt='' (you can't use it to conveniently format decimals anymore and need to do that manually). I would also question whether your approach is even the best one to take here. I think consistency in plots is quite important. I would rather evaluate whether we can rotate the heatmap labels (I've included that above) or leave them out completely, since they are technically redundant due to the color-coding. Alternatively, you could plot only the cells with the "important" values.
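As a sketch of that last alternative, the columns could be picked by absolute correlation, so that strong negative correlations count as important too. The random data, the column names and the cut-off n_top are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Random stand-in data (assumed); 'price' is the target column.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 8)), columns=['price', *'abcdefg'])

n_top = 5  # hypothetical cut-off: how many columns to keep
corr = df.corr()['price'].drop('price')

# Rank by absolute value, then restore the signed values for display.
top_names = corr.abs().nlargest(n_top).index
top = corr.loc[top_names].sort_values(ascending=False)

# One-row frame in the layout used by the heatmap calls above, e.g.
# sns.heatmap(heatmap_data, annot=True, cmap='Spectral_r')
heatmap_data = top.to_frame().T
```

With only n_top cells left, the annotations should stay readable without rotating or masking them.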

Using multiple sliders to manipulate curves in a single graph

I created the following Jupyter Notebook, in which three functions are shifted using three sliders. In the future I would like to generalise it to an arbitrary number of curves (i.e. n curves). However, right now the graph updates very slowly, and the graph itself doesn't seem to stay fixed in its corresponding cell. I didn't receive any error message, but I'm pretty sure that there is a mistake in the update function.
Here is the code:
from ipywidgets import interact
import ipywidgets as widgets
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display
x = np.linspace(0, 2*np.pi, 2000)
y1=np.exp(0.3*x)*np.sin(5*x)
y2=5*np.exp(-x**2)*np.sin(20*x)
y3=np.sin(2*x)
m=[y1,y2,y3]
num_curve=3
def shift(v_X):
    v_T=v_X
    vector=np.transpose(m)
    print(' ')
    print(v_T)
    print(' ')
    curve=vector+v_T
    return curve
controls=[]
o='vertical'
for i in range(num_curve):
    title="x%i" % (i%num_curve+1)
    sl=widgets.FloatSlider(description=title,min=-2.0, max=2.0, step=0.1,orientation=o)
    controls.append(sl)
Dict = {}
for c in controls:
    Dict[c.description] = c
uif = widgets.HBox(tuple(controls))
def update_N(**xvalor):
    xvalor=[]
    for i in range(num_curve):
        xvalor.append(controls[i].value)
    curve=shift(xvalor)
    new_curve=pd.DataFrame(curve)
    new_curve.plot()
    plt.show()
outf = widgets.interactive_output(update_N,Dict)
display(uif, outf)
Your function is running on every single value the slider moves through, which is probably giving you the long times to run you are seeing. You can change this by adding continuous_update=False into your FloatSlider call (line 32).
sl=widgets.FloatSlider(description=title,
                       min=-2.0,
                       max=2.0,
                       step=0.1,
                       orientation=o,
                       continuous_update=False)
This got me much better performance, and the chart doesn't flicker as much as there are vastly fewer redraws. Does this help?
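Another thing that may help, especially with n curves: create the lines once and only update their y data, instead of building a new DataFrame plot on every change. Below is a minimal sketch of that idea without the widget wiring. The function name apply_shifts is hypothetical, and in a notebook this approach assumes an interactive backend (for example %matplotlib widget from ipympl), since the inline backend does not redraw an existing figure.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2*np.pi, 2000)
base_curves = [np.exp(0.3*x)*np.sin(5*x),
               5*np.exp(-x**2)*np.sin(20*x),
               np.sin(2*x)]

# Draw every curve once and keep the Line2D handles.
fig, ax = plt.subplots()
lines = [ax.plot(x, y, label="x%i" % (i + 1))[0]
         for i, y in enumerate(base_curves)]
ax.legend()


def apply_shifts(shifts):
    """Shift each curve vertically by updating the existing lines."""
    for line, y, shift in zip(lines, base_curves, shifts):
        line.set_ydata(y + shift)
    ax.relim()            # recompute data limits from the new y data
    ax.autoscale_view()   # rescale the axes to those limits
    fig.canvas.draw_idle()


apply_shifts([0.5, -1.0, 2.0])
```

In the notebook, the update function would then read the slider values (as update_N does) and pass them to apply_shifts. This works for any number of curves, because lines and base_curves are plain lists.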

Basic axis malfunction in matplotlib

When plotting using matplotlib, I ran into an interesting issue where the y axis is scaled by a very inconvenient quantity. Here's a MWE that demonstrates the problem:
import numpy as np
import matplotlib.pyplot as plt
l = np.linspace(0.5,2,2**10)
a = (0.696*l**2)/(l**2 - 9896.2e-9**2)
plt.plot(l,a)
plt.show()
When I run this, I get a figure that looks like this picture
The y-axis clearly is scaled by a silly quantity even though the y data are all between 1 and 2.
This is similar to the question:
Axis numerical offset in matplotlib
I'm not satisfied with the answer to this question, in that it makes no sense to me why I need to go through the convoluted process of changing axis settings when the data are between 1 and 2 (EDIT: between 0 and 1). Why does this happen? Why does matplotlib use such a bizarre scaling?
The data in the plot are all between 0.696000000017 and 0.696000000273. For such cases it makes sense to use some kind of offset.
If you don't want that, you can use your own formatter:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
l = np.linspace(0.5,2,2**10)
a = (0.696*l**2)/(l**2 - 9896.2e-9**2)
plt.plot(l,a)
fmt = matplotlib.ticker.StrMethodFormatter("{x:.12f}")
plt.gca().yaxis.set_major_formatter(fmt)
plt.show()
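If you just want to check the range quoted above and switch the offset off without writing a formatter, something along these lines should work. It relies on the default ScalarFormatter of a linear axis, which is what ticklabel_format configures. Note that with the offset disabled the tick labels become long and may look nearly identical at the default precision.

```python
import numpy as np
import matplotlib.pyplot as plt

l = np.linspace(0.5, 2, 2**10)
a = (0.696*l**2)/(l**2 - 9896.2e-9**2)

# The data span only a few 1e-10 just above 0.696, hence the offset notation.
span = a.max() - a.min()
print("min = %.12f, max = %.12f, span = %.3e" % (a.min(), a.max(), span))

fig, ax = plt.subplots()
ax.plot(l, a)
# Disable the additive offset on the y axis (default ScalarFormatter only).
ax.ticklabel_format(axis='y', useOffset=False)
```

Call plt.show() afterwards to display the figure.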