Finding the shape and scale of a gamma distribution given home range - bayesian

Pasted below is supplemental material from an article. After finding home range values, the authors say, "Gamma(13, 10) covers this nicely". How did the authors find 13 and 10?
# Moldenhauer and Regelski (1996) state that home range size typically
# ranges from 0.08-0.65 ha, which equates to a radius of
sqrt(0.08/pi*10000) # 15.96 m or
sqrt(0.65/pi*10000) # 45.49 m
# Since parulas were detected by song, let's be safe
# and add a minimum and maximum
# distance at which they could be heard, 50 and 250 m, based upon
# Simons et al. (2009).
# So now we have an area between
(15.96+50)^2*pi / 10000 # 1.37 ha, and
(45.49+250)^2*pi / 10000 # 27.4 ha
# We note that this is a very large range.
# Following Royle et al. (2011), and assuming a
# chi-squared distribution with 2 degrees of freedom,
# the range of sigma is given by
sqrt(1.37*10000/pi)/sqrt(5.99) # 27 m
sqrt(27.4*10000/pi)/sqrt(5.99) # 120 m
# In our grid spacing, 1 unit = 50m, so we want a prior with most
# of the density between:
27/50 # 0.54
121/50 # 2.42
# Gamma(13, 10) covers this nicely
qgamma(c(0.001, 0.5, 0.999), 13, 10)
plot(function(x) dgamma(x, 13, 10), 0, 5, xlim=c(0, 3), ylim=c(0, 1.5))

Honestly, I think the answer to this may just be a bunch of hand waving. If I were to choose a gamma prior and had some intuition about the range I would find the median value and make it the mean of the gamma distribution. So for this example I would do something like:
# Find median of range (which is 1.48)
gamma_med <- median(seq(0.54, 2.42, length.out = 1e6))
The gamma distribution has two parameters, gamma(a,b). In the parameterization R uses here, a is the shape and b is the rate, so the first moment, or the mean, is just a/b. Therefore, if we want our mean to be 1.48 we just need to choose a shape (a) and rate (b) whose ratio equals 1.48. The simplest one would be gamma(14.8, 10), but we could increase or decrease our variance by changing those parameters (variance of gamma = a/(b^2)).
Comparing the two priors, gamma(13,10) is slightly more precise (variance 0.13 versus 0.148 for gamma(14.8,10)). At the end of the day, though, it really comes down to what you feel is defensible for your priors (or, preferably, you should use multiple priors to see how they influence your posterior).
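One way to make the choice a little less hand-wavy is moment matching: decide on a prior mean and standard deviation and solve for the shape and rate. The sketch below is only an illustration in Python (the function names are made up for this example, and nothing here claims to be how the authors arrived at 13 and 10). It uses mean = a/b and variance = a/b^2 in the shape/rate form.

```python
import math

def gamma_from_moments(mean, sd):
    """Return (shape, rate) of a gamma distribution with the given mean and sd.

    Solves mean = shape / rate and variance = shape / rate**2.
    """
    rate = mean / sd**2
    shape = mean * rate
    return shape, rate

def gamma_moments(shape, rate):
    """Return (mean, sd) of a gamma distribution in shape/rate form."""
    return shape / rate, math.sqrt(shape) / rate

# Moments implied by the article's prior, then recovered parameters
mean_13_10, sd_13_10 = gamma_moments(13, 10)
shape, rate = gamma_from_moments(mean_13_10, sd_13_10)

# Moments of the alternative prior suggested above
mean_alt, sd_alt = gamma_moments(14.8, 10)
```

Going the other way, picking a mean near the middle of the 0.54 to 2.42 interval and a standard deviation that keeps most of the mass inside it would give a shape and rate that can then be checked with qgamma.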

Parameters for numpy.random.lognormal function

I need to create a fictitious log-normal distribution of household income in a particular area. The data I have are: Average: 13,600 and Standard Deviation 7,900.
What should be the parameters in the function numpy.random.lognormal?
When I set the mean and the standard deviation as they are, most of the values in the distribution are "inf", and the values also don't make sense when I set the parameters to the log of the mean and standard deviation.
If someone can help me to figure out what the parameters are it would be great.
Thanks!
This is indeed a nontrivial task, as the moment equations of the log-normal distribution have to be solved for the unknown parameters. Looking at, say, Wikipedia, you will find the mean and variance of the log-normal distribution to be exp(mu + sigma**2/2) and [exp(sigma**2)-1]*exp(2*mu+sigma**2), respectively.
The choice of mu and sigma should solve exp(mu + sigma**2/2) = 13600 and [exp(sigma**2)-1]*exp(2*mu+sigma**2) = 7900**2. This can be solved analytically because the first equation squared gives exactly exp(2*mu+sigma**2), which eliminates the variable mu from the second equation.
A sample code is provided below. I took a large sample size to explicitly show that the mean and standard deviation of the simulated data are close to the desired numbers.
import numpy as np
# Input characteristics
DataAverage = 13600
DataStdDev = 7900
# Sample size
SampleSize = 100000
# Parameters (mu and sigma) of the underlying normal distribution
SigmaLogNormal = np.sqrt( np.log(1+(DataStdDev/DataAverage)**2))
MeanLogNormal = np.log( DataAverage ) - SigmaLogNormal**2/2
print(MeanLogNormal, SigmaLogNormal)
# Obtain draw from log-normal distribution
Draw = np.random.lognormal(mean=MeanLogNormal, sigma=SigmaLogNormal, size=SampleSize)
# Check
print( np.mean(Draw), np.std(Draw))
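The simulation check above is subject to sampling noise. As a complement, the same parameters can be pushed back through the closed-form moment formulas, which should return the target mean and standard deviation up to rounding. This is a small sketch using only the standard library; the function names are illustrative.

```python
import math

def lognormal_params(mean, std):
    """mu and sigma of the underlying normal for a target mean and std."""
    sigma2 = math.log(1.0 + (std / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

def lognormal_moments(mu, sigma):
    """Mean and std of a log-normal with underlying normal parameters mu, sigma."""
    mean = math.exp(mu + sigma**2 / 2.0)
    var = (math.exp(sigma**2) - 1.0) * math.exp(2.0 * mu + sigma**2)
    return mean, math.sqrt(var)

mu, sigma = lognormal_params(13600, 7900)
check_mean, check_std = lognormal_moments(mu, sigma)
print(mu, sigma, check_mean, check_std)
```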

Low-pass Chebyshev type-I filter with Scipy

I am reading a paper, trying to reproduce the results of the paper. In this paper, they use a low-pass Chebyshev type-I filter on the raw data. And they give those parameters.
Sampling frequency = 32Hz, Fcut=0.25Hz, Apass = 0.001dB, Astop = -100dB, Fstop = 2Hz, Order of the filter = 5. I found some material that helped me understand these parameters.
But when I look at scipy.signal.cheby1, the parameters required by this function are different.
cheby1(N, rp, Wn, btype='low', analog=False, output='ba')
Here N:The order of the filter; btype: type of filter, in my case, it is 'lowpass'; analog=False, because the data is sampled, so it is digital; output: specifies the type of output. But I am not sure about rp, Wn.
In the documentation, it says:
rp : float
The maximum ripple allowed below unity gain in the passband. Specified in decibels, as a positive number.
Wn : array_like
A scalar or length-2 sequence giving the critical frequencies. For Type I filters, this is the point in the transition band at which the gain first drops below -rp. For digital filters, Wn is normalized from 0 to 1, where 1 is the Nyquist frequency, pi radians/sample. (Wn is thus in half-cycles / sample.) For analog filters, Wn is an angular frequency (e.g. rad/s).
According to this question:
How To apply a filter to a signal in python
I know how I can use the filter. But I don't know how to create a filter which has the same parameters as mentioned above. I don't know how to convert these parameters and provide them to the function in Scipy.
Take a look at the wikipedia page on the Type I Chebyshev filter. Note that your plot illustrates the characteristics of a general filter. A lowpass Type I Chebyshev filter, however, has no ripple in the stop band.
You have three available parameters for the design of a Type I Chebyshev filter: the filter order, the ripple factor, and the cutoff frequency. These are the first three parameters of scipy.signal.cheby1:
The first argument of cheby1 is the order of the filter.
The second argument, rp, corresponds to δ in the wikipedia page, and is apparently what you called Apass.
The third argument is wn, the cutoff frequency expressed as a fraction of the Nyquist frequency. In your case, you could write something like
fs = 32 # Sample rate (Hz)
fcut = 0.25 # Desired filter cutoff frequency (Hz)
# Cutoff frequency relative to the Nyquist
wn = fcut / (0.5*fs)
Once those three parameters are chosen, all the other characteristics (e.g. transition band width, Astop, Fstop, etc) are determined. So it appears that the specifications that you give, "Sampling frequency = 32Hz, Fcut=0.25Hz, Apass = 0.001dB, Astop = -100dB, Fstop = 2Hz, Order of the filter = 5", are not compatible with a Type I Chebyshev filter. In particular, I get a gain of approximately -78 dB at 2 Hz.
(If you increase the order to 6, then the gain at 2 Hz is approximately -103 dB.)
Here's a complete script, followed by the plot that it generates. The plot shows just the pass band, but you can change the arguments of the xlim and ylim functions to see more.
import numpy as np
from scipy.signal import cheby1, freqz
import matplotlib.pyplot as plt
# Sampling parameters
fs = 32 # Hz
# Desired filter parameters
order = 5
Apass = 0.001 # dB
fcut = 0.25 # Hz
# Normalized frequency argument for cheby1
wn = fcut / (0.5*fs)
b, a = cheby1(order, Apass, wn)
w, h = freqz(b, a, worN=8000)
plt.figure(1)
plt.plot(0.5*fs*w/np.pi, 20*np.log10(np.abs(h)))
plt.axvline(fcut, color='r', alpha=0.2)
plt.plot([0, fcut], [-Apass, -Apass], color='r', alpha=0.2)
plt.xlim(0, 0.3)
plt.xlabel('Frequency (Hz)')
plt.ylim(-5*Apass, Apass)
plt.ylabel('Gain (dB)')
plt.grid()
plt.title("Chebyshev Type I Lowpass Filter")
plt.tight_layout()
plt.show()
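The stop-band gains quoted above can also be cross-checked without designing the filter, using the closed-form Type I Chebyshev magnitude response. The sketch below assumes the digital filter comes from a bilinear transform with the passband edge prewarped to fcut, so the frequency ratio is a ratio of tangents. The function name is made up for this example, and the result should land close to, not exactly on, the numbers read off a freqz plot.

```python
import math

def cheby1_gain_db(f, fcut, fs, order, rp_db):
    """Gain in dB at frequency f (Hz) of a digital Type I Chebyshev lowpass.

    Assumes a bilinear-transform design with the passband edge prewarped to fcut.
    """
    eps = math.sqrt(10.0 ** (rp_db / 10.0) - 1.0)  # ripple factor
    # Frequency relative to the passband edge, after bilinear prewarping
    x = math.tan(math.pi * f / fs) / math.tan(math.pi * fcut / fs)
    # Chebyshev polynomial T_n(x): cos form inside the passband, cosh form outside
    if x <= 1.0:
        t = math.cos(order * math.acos(x))
    else:
        t = math.cosh(order * math.acosh(x))
    return -10.0 * math.log10(1.0 + (eps * t) ** 2)

gain_order5 = cheby1_gain_db(2.0, 0.25, 32.0, 5, 0.001)
gain_order6 = cheby1_gain_db(2.0, 0.25, 32.0, 6, 0.001)
gain_at_edge = cheby1_gain_db(0.25, 0.25, 32.0, 5, 0.001)
print(gain_order5, gain_order6, gain_at_edge)
```

At the passband edge the formula returns -Apass by construction, which is a quick sanity check on the ripple factor.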

Laplace distribution sampling

Anyone know how to draw multiple times from a Laplace distribution in Stata? I want to run some Monte Carlo analysis and know that my data fits a Laplace distribution.
Here's a sample script. Naturally your scale parameter will be whatever it is. The location parameter is here zero by implication; if not just add it in.
clear
version 10: set seed 2803
set obs 10000
scalar sigma = 1
gen P = runiform()
gen y = sigma * cond(P <= 0.5, log(2 * P), -log(2 * (1 - P)))
We can use a normal quantile plot as reference showing that the tail behaviour is quite different from the normal or Gaussian.
qnorm y
Many people prefer to see some kind of density estimate
kdensity y, biweight bw(0.2)
but the most critical graph is a dedicated quantile-quantile plot. This one uses qplot which you must install from the Stata Journal archive after a search qplot in Stata. Note that # is not a typo here: it is a placeholder for whatever would otherwise be plotted on the x axis.
qplot y, trscale(cond(# <= 0.5, log(2 * #), -log(2 * (1 - #))))
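For readers outside Stata, the same inverse-CDF idea can be sketched in Python with only the standard library: draw a uniform P and push it through the Laplace quantile function. The function names, the default seed, and the explicit location argument are choices made for this illustration.

```python
import math
import random

def laplace_quantile(p, scale=1.0, loc=0.0):
    """Inverse CDF of the Laplace distribution for 0 < p < 1."""
    if p <= 0.5:
        return loc + scale * math.log(2.0 * p)
    return loc - scale * math.log(2.0 * (1.0 - p))

def laplace_draws(n, scale=1.0, loc=0.0, seed=2803):
    """Draw n Laplace variates by transforming uniform random numbers."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        p = rng.random()
        if p > 0.0:  # random() can return 0.0, where the log is undefined
            draws.append(laplace_quantile(p, scale, loc))
    return draws

y = laplace_draws(10000)
```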

comparing two frequency spectra

I'm trying to compare two frequency spectra but I am confused over a number of points.
One device samples at 40 Hz the other at 100 Hz and so I'm not sure if I need to take this into account. Anyway I have produced frequency spectra from both devices and now I wish to compare these. How can I do correlation at each point so that I get pearson correlations at each point. I know how to do an overall one of course but I want to see points of strong correlation and those less strong?
If you are calculating power spectral densities P(f), then it doesn't matter how your original signal x(t) is sampled. You can directly and quantitatively compare both spectra. To make sure that you have calculated the spectral densities you can explicitly check Parseval's theorem:
$ \int P(f) df = \int x(t)^2 dt $
Of course you have to think about which frequencies are actually evaluated. Remember that an FFT gives you frequencies from f = 1/T up to or just below the Nyquist frequency f_ny = 1/(2 dt), depending on whether the number of samples in x(t) is even or odd.
Here's a python example code for psd
import numpy as np

def psd(x, dt=1.):
    """Computes one-sided power spectral density of x.
    PSD estimated via abs**2 of Fourier transform of x
    Takes care of even or odd number of elements in x:
    - if x is even both f=0 and Nyquist freq. appear once
    - if x is odd f=0 appears once and Nyquist freq. does not appear
    Note that there are no tapers applied: This may lead to leakage!
    Parseval's theorem (Variance of time series equal to integral over PSD) holds
    for zero-mean x and can be checked via
    print ( np.var(x), sum(Px*f[1]) )
    Accordingly, the estimated PSD is independent of time series length
    Author/date: M. von Papen / 16.03.2017
    """
    N = np.size(x)
    xf = np.fft.fft(x)
    Px = abs(xf)**2./N*dt
    # Integer division (//) keeps the indices valid in Python 3
    f = np.arange(N//2+1)/(N*dt)
    if np.mod(N,2) == 0:
        Px[1:N//2] = 2.*Px[1:N//2]
    else:
        Px[1:N//2+1] = 2.*Px[1:N//2+1]
    # Take one-sided spectrum
    Px = Px[0:N//2+1]
    return Px, f
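On the point about which frequencies are evaluated: a point-by-point comparison only makes sense where both spectra are defined on the same frequencies. The sketch below assumes, purely for illustration, that both devices recorded for the same duration (10 s here, a hypothetical value). In that case the frequency spacing 1/T is identical and the bins coincide up to the lower Nyquist frequency. With unequal durations, interpolation or averaging into common bands would be needed instead.

```python
def fft_frequencies(n_samples, fs):
    """One-sided FFT frequencies (Hz) for n_samples taken at sampling rate fs."""
    return [k * fs / n_samples for k in range(n_samples // 2 + 1)]

duration = 10                                  # seconds (hypothetical)
f_slow = fft_frequencies(40 * duration, 40)    # device sampling at 40 Hz
f_fast = fft_frequencies(100 * duration, 100)  # device sampling at 100 Hz

# Both grids have spacing 1/duration, so the first len(f_slow) bins line up
n_common = len(f_slow)
shared = f_fast[:n_common]
print(n_common, f_slow[-1], f_fast[-1])
```

Only the first n_common PSD values of the faster device would then enter the comparison with the slower one.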

parameters for low pass fir filter using scipy

I am trying to write a simple low pass filter using scipy, but I need help defining the parameters.
I have 3.5 million records in the time series data that needs to be filtered, and the data is sampled at 1000 hz.
I am using signal.firwin and signal.lfilter from the scipy library.
The parameters I am choosing in the code below do not filter my data at all. Instead, the code below simply produces something that graphically looks like the same exact data except for a time phase distortion that shifts the graph to the right by slightly less than 1000 data points (1 second).
In another software program, running a low pass fir filter through graphical user interface commands produces output that has similar means for each 10 second (10,000 data point) segment, but that has drastically lower standard deviations so that we essentially lose the noise in this particular data file and replace it with something that retains the mean value while showing longer term trends that are not polluted by higher frequency noise. The other software's parameters dialog box contains a check box that allows you to select the number of coefficients so that it "optimizes based on sample size and sampling frequency." (Mine are 3.5 million samples collected at 1000 hz, but I would like a function that uses these inputs as variables.)
*Can anyone show me how to adjust the code below so that it removes all frequencies above 0.05 hz?* I would like to see smooth waves in the graph rather than just the time distortion of the same identical graph that I am getting from the code below now.
class FilterTheZ0():
    def __init__(self,ZSmoothedPylab):
        #------------------------------------------------------
        # Set the order and cutoff of the filter
        #------------------------------------------------------
        self.n = 1000
        self.ZSmoothedPylab=ZSmoothedPylab
        self.l = len(ZSmoothedPylab)
        self.x = arange(0,self.l)
        self.cutoffFreq = 0.05
        #------------------------------------------------------
        # Run the filter
        #------------------------------------------------------
        self.RunLowPassFIR_Filter(self.ZSmoothedPylab, self.n, self.l
                                  , self.x, self.cutoffFreq)
    def RunLowPassFIR_Filter(self,data, order, l, x, cutoffFreq):
        #------------------------------------------------------
        # Set a to be the denominator coefficient vector
        #------------------------------------------------------
        a = 1
        #----------------------------------------------------
        # Create the low pass FIR filter
        #----------------------------------------------------
        b = signal.firwin(self.n, cutoff = self.cutoffFreq, window = "hamming")
        #---------------------------------------------------
        # Run the same data set through each of the various
        # filters that were created above.
        #---------------------------------------------------
        response = signal.lfilter(b,a,data)
        responsePylab=p.array(response)
        #--------------------------------------------------
        # Plot the input and the various outputs that are
        # produced by running each of the various filters
        # on the same inputs.
        #--------------------------------------------------
        plot(x[10000:20000],data[10000:20000])
        plot(x[10000:20000],responsePylab[10000:20000])
        show()
        return
Cutoff is normalized to the Nyquist frequency, which is half the sampling rate. So with FS = 1000 and FC = 0.05, you want cutoff = 0.05/500 = 1e-4.
from scipy import signal
import numpy as np
import matplotlib.pyplot as plt
FS = 1000.0 # sampling rate (Hz)
FC = 0.05/(0.5*FS) # cutoff of 0.05 Hz, normalized to the Nyquist frequency
N = 1001 # number of filter taps
a = 1 # filter denominator
b = signal.firwin(N, cutoff=FC, window='hamming') # filter numerator
M = int(FS*60) # number of samples (60 seconds)
n = np.arange(M) # time index
x1 = np.cos(2*np.pi*n*0.025/FS) # signal at 0.025 Hz
x = x1 + 2*np.random.rand(M) # signal + noise
y = signal.lfilter(b, a, x) # filtered output
plt.plot(n/FS, x); plt.plot(n/FS, y, 'r') # output in red
plt.grid()
plt.show()
The filter output is delayed half a second (the filter is centered on tap 500). Note that the DC offset added by the noise is preserved by the low-pass filter. Also, 0.025 Hz is well within the pass range, so the output swing from peak to peak is approximately 2.
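Since the question asks for something that takes the sampling rate as a variable, the two numbers used above (the normalized cutoff and the half-second delay) can be wrapped in small helpers. This is only a sketch with made-up function names. It assumes firwin's default convention, where the cutoff is a fraction of the Nyquist frequency, and a symmetric (linear-phase) filter such as the one firwin returns.

```python
def normalized_cutoff(fc_hz, fs_hz):
    """Cutoff as a fraction of the Nyquist frequency (fs/2)."""
    nyquist = 0.5 * fs_hz
    if not 0.0 < fc_hz < nyquist:
        raise ValueError("cutoff must lie strictly between 0 and the Nyquist frequency")
    return fc_hz / nyquist

def fir_delay_seconds(num_taps, fs_hz):
    """Group delay of a symmetric (linear-phase) FIR filter, in seconds."""
    return (num_taps - 1) / 2.0 / fs_hz

cutoff = normalized_cutoff(0.05, 1000.0)  # value to pass as firwin's cutoff
delay = fir_delay_seconds(1001, 1000.0)   # shift to expect in the lfilter output
print(cutoff, delay)
```

Knowing the delay makes it possible to shift the filtered trace back by that amount when overlaying it on the raw data.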
The units of cutoff freq are normalized to [0,1), where 1.0 is equivalent to the Nyquist frequency (FS/2, half the sampling frequency). So if you really mean 0.05Hz and FS=1000Hz, you'd want to pass cutoffFreq / 500. You may need a longer filter to get such a low cutoff.
(BTW you are passing some arguments but then using the object attributes instead, but I don't see that introducing any obvious bugs yet...)