How to fit a Gaussian best-fit curve to data - matplotlib

I have a data set for which I am plotting a graph of time vs intensity at a particular frequency.
On the x-axis is the time data set, which is in a numpy array, and on the y-axis is the intensity array.
time = np.array([ 0.3,  1.3,  2.3,  3.3,  4.3,  5.3,  6.3,  7.3,  8.3,  9.3, 10.3, 11.3, 12.3, 13.3,
                 14.3, 15.3, 16.3, 17.3, 18.3, 19.3, 20.3, 21.3, 22.3, 23.3, 24.3, 25.3, 26.3, 27.3,
                 28.3, 29.3, 30.3, 31.3, 32.3, 33.3, 34.3, 35.3, 36.3, 37.3, 38.3, 39.3, 40.3, 41.3,
                 42.3, 43.3, 44.3, 45.3, 46.3, 47.3, 48.3, 49.3, 50.3, 51.3, 52.3, 53.3, 54.3, 55.3,
                 56.3, 57.3, 58.3, 59.3])
intensity = np.array([1.03587, 1.03187, 1.03561, 1.02893, 1.04659, 1.03633, 1.0481,
                      1.04156, 1.02164, 1.02741, 1.02675, 1.03651, 1.03713, 1.0252,
                      1.02853, 1.0378, 1.04374, 1.01427, 1.0387, 1.03389, 1.03148,
                      1.04334, 1.042, 1.04154, 1.0161, 1.0469, 1.03152, 1.22406,
                      5.4362, 7.92132, 6.50259, 4.7227, 3.32571, 2.46484, 1.74615,
                      1.51446, 1.2711, 1.15098, 1.09623, 1.0697, 1.06085, 1.05837,
                      1.04151, 1.0358, 1.03574, 1.05095, 1.03382, 1.04629, 1.03636,
                      1.03219, 1.03555, 1.02886, 1.04652, 1.02617, 1.04363, 1.03591,
                      1.04199, 1.03726, 1.03246, 1.0408])
When I plot this with matplotlib, using:
plt.figure(figsize=(15,6))
plt.title('Single frequency graph at 636 kHz', fontsize=18)
plt.plot(time,intensity)
plt.xticks(time[::3], fontsize=12)
plt.yticks(fontsize=12)
plt.xlabel('Elapsed time (minutes:seconds)', fontsize=18)
plt.ylabel('Intensity at 1020 kHz', fontsize=18)
plt.savefig('WIND_Single_frequency_graph_1020_kHz')
plt.show()
The graph looks like this: a flat baseline near 1.03 with a sharp, asymmetric peak around t = 29.
I want to fit a gaussian curve for this data, and this is the code I used,
def Gauss(x, A, B):
    y = A*np.exp(-1*B*x**2)
    return y
parameters, covariance = curve_fit(Gauss, time, intensity)
fit_A = parameters[0]
fit_B = parameters[1]
fit_y = Gauss(time, fit_A, fit_B)
plt.figure(figsize=(15,6))
plt.plot(time, intensity, 'o', label = 'data')
plt.plot(time, fit_y, '-', label ='fit')
plt.legend()
And the best fit I obtain looks like this -
Where am I going wrong? How can I make the best fit curve fit the data better?

By inspection one observes that the curve isn't symmetrical. This is even more visible on a logarithmic y-scale, zoomed into the range close to the peak.
This leads one to think that the Gaussian model (which is symmetrical) cannot be fitted correctly.
One also observes that the part of the curve around the peak isn't far from linear. Thus a better model might be made from a combination of two exponential functions, for example as sketched below.
I suppose that you can code this function in your nonlinear regression software. Rough values of the parameters, read from the plot, can be used as starting values for the iterative calculus.
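For instance, a minimal sketch of one such two-exponential model with scipy's curve_fit, assuming a piecewise rise/decay around the peak on top of a constant baseline (the two_exp form and the starting values are illustrative guesses read off the plot, not a canonical formula):
import numpy as np
from scipy.optimize import curve_fit
def two_exp(x, c, A, mu, b1, b2):
    # constant baseline c plus an exponential rise (rate b1) before the
    # peak position mu and an exponential decay (rate b2) after it,
    # giving an asymmetric peak
    return c + A * np.where(x < mu, np.exp(b1 * (x - mu)), np.exp(-b2 * (x - mu)))
# rough starting values read off the plot: baseline ~1.03, peak ~8 near t ~ 29
p0 = [1.03, 7.0, 29.3, 1.0, 0.5]
popt, pcov = curve_fit(two_exp, time, intensity, p0=p0)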

You need to define a more flexible model (more parameters) and to define reasonable initial values for them: here the constant a captures the baseline of about 1.03, mu centres the peak near t = 30, and sigma sets its width:
def f(x, a, b, mu, sigma):
    return a + b * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
popt, pcov = curve_fit(f, time, intensity, p0=[1, 1, 30.3, 2])
plt.plot(time, intensity)
plt.plot(time, f(time, *popt))
plt.show()
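If you also want the fitted values and rough uncertainties, the covariance matrix returned by curve_fit provides them; a small usage sketch:
import numpy as np
a, b, mu, sigma = popt
perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainties for [a, b, mu, sigma]
print('baseline a =', a, ', amplitude b =', b)
print('mu =', mu, '+/-', perr[2], ', sigma =', sigma, '+/-', perr[3])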

Related

How to calculate approximate fourier coefficients using np.trapz

I have a dataset which looks roughly as follows (and is sinusoidal in nature):
TW-240-run1.txt
Point Number Temperature
0 51.504781
1 51.487722
2 51.487722
3 51.828893
4 51.828893
5 51.436547
6 51.368312
7 51.726542
8 51.368312
9 51.317137
10 51.317137
11 51.283020
12 51.590073
.
.
.
9599 51.675366
I am tasked with finding the fundamental/first Fourier coefficients, a_n and b_n, for this dataset by means of a numerical integration technique. In this case, I am simply using numpy.trapz from numpy, which implements the trapezium rule. The Fourier coefficients a_n and b_n can be calculated with the following formulae:
a_n = (2/𝛕) ∫₀^𝛕 T(t) cos(2πnt/𝛕) dt
b_n = (2/𝛕) ∫₀^𝛕 T(t) sin(2πnt/𝛕) dt
where tau (𝛕) is the time period of the sine function. For my case, 𝛕 = 240 seconds (referring to point number 240 on the data sheet), and thus the bounds of integration are 0 to 240. T(t) in the above formulae is the data set, and n = 1.
My current code for trying to calculate the fourier coefficients is as follows:
# Packages
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
# input data from datasheet; the loadtxt below takes in the data from t = 0 s to t = 240 s
x1, y1 = np.loadtxt(r'C:\Users\Sidharth\Documents\y2python\y2python\thermal_4min_a.txt', unpack=True, skiprows=3)
tau_4min = 240.0
def cosine(period, t, n):
    return np.cos((2*np.pi*n*t)/(period)) # defines the cos term for the a_n formula
def sine(period, t, n): # defines the sin term for the b_n formula
    return np.sin((2*np.pi*n*t)/(period))
cos_term_4min = cosine(tau_4min, x1, 1) # evaluate the cos term at the sample times, n = 1
sin_term_4min = sine(tau_4min, x1, 1)   # evaluate the sin term at the sample times, n = 1
a_1_4min = (2/tau_4min)*np.trapz((y1*cos_term_4min), x1) # implement a_n formula (trapezium rule for T(t)*cos)
print('a_1 is', a_1_4min)
b_1_4min = (2/tau_4min)*np.trapz((y1*sin_term_4min), x1) # implement b_n formula (trapezium rule for T(t)*sin)
print('b_1 is', b_1_4min)
Essentially what this does is take in the data, but only up to row index 241 (point number 240), and multiply it by the sine/cosine term from each of the above formulae. However, I realise this isn't calculating the Fourier coefficients properly.
My question is as follows:
Will my code work if I can find a way to set limits of integration for np.trapz and then import the entire data set, instead of importing only the data points from 0 to 240, multiplying them by the cos or sine term, and then using np.trapz on that product, as I am currently doing? (0 and 240 are supposed to be my limits of integration.)
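For what it's worth, np.trapz has no built-in limits-of-integration arguments, so one common workaround (a sketch reusing the x1, y1, cosine and sine names from the code above) is to load the whole file and restrict the range with a boolean mask:
import numpy as np
# keep only the samples inside the integration limits 0 <= t <= 240,
# then apply the trapezium rule to that slice
mask = (x1 >= 0) & (x1 <= tau_4min)
t, T = x1[mask], y1[mask]
a_1_4min = (2/tau_4min)*np.trapz(T*cosine(tau_4min, t, 1), t)
b_1_4min = (2/tau_4min)*np.trapz(T*sine(tau_4min, t, 1), t)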

Percentage weighting given two variables to equal a target

I have a target of target = 11.82 with two variables
x = 9
y = 15
How do I find the percentage weighting that would blend x & y to equal my target (i.e. 55% of x and 45% of y)? What function is the most efficient way to calculate a weighting that obtains my target?
Looking at it again, what I think you want is really two equations:
9x + 15y = 11.82
x + y = 1
Solving that system of equations is pretty fast on pen and paper (just do linear combination). Or you could use sympy to solve the system of linear equations:
>>> from sympy import *
>>> from sympy.solvers.solveset import linsolve
>>> x, y = symbols('x, y')
>>> linsolve([x + y - 1, 9 * x + 15 * y - 11.82], (x, y)) # make 0 on right by subtraction
FiniteSet((0.53, 0.47))
We can confirm this by substitution:
>>> 9 * 0.53 + 15 * 0.47
11.82
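For this particular problem a solver isn't strictly necessary: with the weights constrained to sum to 1, substitution collapses the system to a one-line formula (a small sketch, with w_x and w_y as my names for the weights):
# with w_x + w_y = 1 and w_x*x + w_y*y = target,
# substitution gives w_y = (target - x) / (y - x)
x, y, target = 9, 15, 11.82
w_y = (target - x) / (y - x)
w_x = 1 - w_y
print(w_x, w_y) # approximately 0.53 0.47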

How to do pairwise addition in tensorflow

I am new to tensorflow, so this might be an easy question, but it really has me stuck.
I am trying to implement this paper in keras, with a tensorflow backend.
In the first stage of training, the author uses softmax_pair.
Suppose we get this output from the last fc layer (the vertical axis is the batch size, which is None):
x11 x12 x13 x14...
x21 x22 x23 x24...
x31 x32 x33 x34...
...
and we take the exponential of each entry, so we have
e11 e12 e13 e14...
e21 e22 e23 e24...
e31 e32 e33 e34...
...
and then I want the pairwise softmax below, which is where I am stuck:
e11/(e11+e12) e12/(e11+e12) e13/(e13+e14) e14/(e13+e14)...
e21/(e21+e22) e22/(e21+e22) e23/(e23+e24) e24/(e23+e24)...
e31/(e31+e32) e32/(e31+e32) e33/(e33+e34) e34/(e33+e34)...
...
I don't know how to do the pairwise addition.
tf.transpose and tf.segment_sum might work well, but after some research I found that transpose is expensive. Furthermore, after tf.segment_sum I would only have a tensor of half the size, and I don't know how to double it. I am also wondering how to produce the segment_ids.
So how can I do this calculation? Thanks!!
---------- update
The part of the paper I am talking about is Fig. 3. The fc output is P_(2c-1) and P_(2c), which are the probabilities of class c appearing or not appearing in the image, for c = 1, 2, 3, ..., number of classes.
Is transpose not expensive? Sometimes I see this claimed, e.g. in the comment; perhaps I misunderstood this?
The tensorflow docs for tf.transpose state that, unlike numpy, tensorflow returns a new tensor, which costs memory.
Assuming X is your tensor of size R x C:
_, C = X.get_shape()
X_split = tf.split(1, C/2, X)
Y_split = [tf.nn.softmax(slice) for slice in X_split]
Y = tf.concat(1, Y_split)
C will be the number of columns, X_split will be a list of sub-tensors, each having two columns, Y_split will calculate a regular softmax for each of those tensors, and Y will join the results of the softmaxes.
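For what it's worth, later TensorFlow releases reversed the argument order of tf.split and tf.concat; a minimal sketch of the same idea against the TF 1.x-style API (the 8-column placeholder is just an illustration):
import tensorflow as tf
# tf.split(value, num_or_size_splits, axis) and tf.concat(values, axis)
# in the TF 1.x argument order
X = tf.placeholder(tf.float32, shape=[None, 8])      # batch of rows with C = 8 columns
C = X.get_shape().as_list()[1]
X_split = tf.split(X, C // 2, axis=1)                # list of [None, 2] sub-tensors
Y_split = [tf.nn.softmax(pair) for pair in X_split]  # softmax over each adjacent pair
Y = tf.concat(Y_split, axis=1)                       # back to a [None, C] tensor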

Stata - Multiple rotated plots on graph (including distributions on sides of axes)

I would like to produce a single graph containing both: (1) a scatter plot, and (2) either histograms or kernel density functions of the Y and X variables, placed to the left of the Y axis and below the X axis.
I found a graph that does this in MATLAB -- I would just like to produce something similar in Stata:
That graph was produced using the following MATLAB code:
n = 1000;
rho = .7;
Z = mvnrnd([0 0], [1 rho; rho 1], n);
U = normcdf(Z);
X = [gaminv(U(:,1),2,1) tinv(U(:,2),5)];
[n1,ctr1] = hist(X(:,1),20);
[n2,ctr2] = hist(X(:,2),20);
subplot(2,2,2); plot(X(:,1),X(:,2),'.'); axis([0 12 -8 8]); h1 = gca;
title('1000 Simulated Dependent t and Gamma Values');
xlabel('X1 ~ Gamma(2,1)'); ylabel('X2 ~ t(5)');
subplot(2,2,4); bar(ctr1,-n1,1); axis([0 12 -max(n1)*1.1 0]); axis('off'); h2 = gca;
subplot(2,2,1); barh(ctr2,-n2,1); axis([-max(n2)*1.1 0 -8 8]); axis('off'); h3 = gca;
set(h1,'Position',[0.35 0.35 0.55 0.55]);
set(h2,'Position',[.35 .1 .55 .15]);
set(h3,'Position',[.1 .35 .15 .55]);
colormap([.8 .8 1]);
UPDATE: The Stata13 manual entry for "graph combine" has precisely this example (http://www.stata.com/manuals13/g-2graphcombine.pdf). Here is the code:
use http://www.stata-press.com/data/r13/lifeexp, clear
generate loggnp = log10(gnppc)
label var loggnp "Log base 10 of GNP per capita"
scatter lexp loggnp, ysca(alt) xsca(alt) xlabel(, grid gmax) fysize(25) saving(yx)
twoway histogram lexp, fraction xsca(alt reverse) horiz fxsize(25) saving(hy)
twoway histogram loggnp, fraction ysca(alt reverse) ylabel(,nogrid) xlabel(,grid gmax) saving(hx)
graph combine hy.gph yx.gph hx.gph, hole(3) imargin(0 0 0 0) graphregion(margin(l=22 r=22)) title("Life expectancy at birth vs. GNP per capita") note("Source: 1998 data from The World Bank Group")
There's probably a better way to do it, but this is my quick attempt to take up the challenge.
sysuse auto,clear
set obs 1000
twoway scatter mpg price, saving(sct,replace) ///
xsc(r(0(5000)20000) off ) ysc(r(10(10)50) off) ///
xti("") yti("") xlab(,nolab) ylab(,nolab)
kdensity mpg, n(1000) k(gauss) gen(x0 d0)
line x0 d0, xsc(rev off) ysc(alt) xlab(,nolab) xtick(,notick) saving(hist0, replace)
kdensity price, n(1000) k(gauss) gen(x1 d1)
line d1 x1, xsc(alt) ysc(rev off) ylab(,nolab) ytick(,notick) saving(hist1, replace)
graph combine hist0.gph sct.gph hist1.gph, cols(2) holes(3)
I'd also like to know if there are ways to improve on it. The code is not very neat, and I had trouble properly aligning the line plots and the scatter plot without removing the ticks and labels of the scatter plot (xcommon and ycommon did not really do the job for me).
Credit to this post on Statalist.

How to calculate the interior angles total from giving 4 points in order?

I am using Objective-C, and I would like to calculate the total of the interior angles, given 4 points in order. Does Objective-C have a maths library to do so? Thanks.
It is 180*(n-2), where n is the number of sides (= number of vertices) of the polygon. For your 4 points in order, that gives 180*(4-2) = 360 degrees.
Reference is here.
Objective-C uses the standard C maths library math.h. This has the trig and sqrt functions you would be likely to need.
I have just recently solved this problem in Java. There must be a good library for this. However, if you are looking to calculate the angle between three points, then you simply need to use the dot product of the two vectors, which would be produced thus: for points
x_1, y_1, x_2, y_2, x_3, y_3
define
a_x = x_2 - x_1
a_y = y_2 - y_1
b_x = x_3 - x_2
b_y = y_3 - y_2
Then
dot_product = a_x * b_x + a_y * b_y
This allows you to calculate the value of cos_theta via the relation
cos_theta = dot_product / sqrt((a_x * a_x + a_y * a_y) * (b_x * b_x + b_y * b_y))
When you calculate the inverse cos of cos_theta you will get the smaller of the two possible solutions, i.e. the value which is less than or equal to 180 degrees or PI radians.
I am not sure what you mean by the sum of the interior angles but if you sum the values derived from the above algorithm I think you will get what you want.
If you need to get the "angles on the left" or "the angles on the right" you will need to add a cross product to this algorithm.
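To make the algorithm concrete, here is a minimal Python sketch of the dot-product approach described above (the maths maps directly onto acos and sqrt from C's math.h; the unit-square test points are my own illustration):
import math
def turning_angle(p1, p2, p3):
    # angle between edge vectors a = p2 - p1 and b = p3 - p2, in degrees;
    # for a convex polygon the interior angle at p2 is 180 minus this value
    a_x, a_y = p2[0] - p1[0], p2[1] - p1[1]
    b_x, b_y = p3[0] - p2[0], p3[1] - p2[1]
    dot_product = a_x * b_x + a_y * b_y
    cos_theta = dot_product / math.sqrt((a_x**2 + a_y**2) * (b_x**2 + b_y**2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
# four points of a unit square, in order: the interior angles sum to
# 4*180 - (sum of turning angles) = 360, matching 180*(n-2)
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
turns = [turning_angle(pts[i - 1], pts[i], pts[(i + 1) % 4]) for i in range(4)]
print(sum(180 - t for t in turns))  # 360.0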