How do I bandpass-filter a signal using a Gaussian function in Python (Numpy/Scipy) - numpy

I have a time series (more specifically a correlation function). I want to bandpass-filter this signal using a Gaussian function H:
H(w) = e^(-alpha((w-wn)/wn)^2),
where wn is the central frequency in my bandpass filter and alpha is a certain constant value that I know.
I apply a (inverse) FFT to my H function:
H = np.e ** (-alfa * ((w - wn) / wn) ** 2)
H = np.fft.ifft(H)
HH = np.asarray([i1 for i1 in itertools.chain(H[len(H)//2:len(H)], H[0:len(H)//2])])
And what I do then is to use fftconvolve:
filtered = fftconvolve(data, HH.real, mode='same'),
but the "filtered signal" that I see seems to contain frequencies centered at 2 times wn.
What is the correct way of doing this? Is there a restriction in the length of my filter with respect to the length of my time series?

Perhaps what you are looking for is the Gaussian filter from Scipy,
from scipy.ndimage import gaussian_filter
output = gaussian_filter(input, sigma )
where sigma is the standard deviation of the Gaussian kernel. See the Scipy documentation for more details. https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.gaussian_filter.html
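Note that gaussian_filter smooths the signal in the time domain, so it acts as a low-pass filter. If the goal is the bandpass H(w) from the question, one possible approach is to multiply the spectrum of the data by H directly instead of convolving with an inverse-transformed kernel. The following is only a sketch under assumptions: the signal is real and uniformly sampled, dt (the sampling interval) and the function name gaussian_bandpass are names introduced here for illustration, and w is an angular frequency.

```python
import numpy as np

def gaussian_bandpass(data, dt, wn, alpha):
    """Apply H(w) = exp(-alpha * ((w - wn) / wn)**2) in the frequency domain."""
    n = len(data)
    # Non-negative angular frequencies matching the bins returned by np.fft.rfft
    w = 2.0 * np.pi * np.fft.rfftfreq(n, d=dt)
    H = np.exp(-alpha * ((w - wn) / wn) ** 2)
    # rfft/irfft handle the negative frequencies by symmetry, so the output is real
    return np.fft.irfft(np.fft.rfft(data) * H, n=n)

# Synthetic usage example: keep the 5 Hz component, suppress the 20 Hz one
dt = 0.001
t = np.arange(1000) * dt
data = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 20 * t)
filtered = gaussian_bandpass(data, dt, wn=2 * np.pi * 5, alpha=20.0)
```

Because the multiplication happens on the FFT of the full record, the filter has the same length as the data and the filtering is circular; padding the data first is one way to reduce wrap-around effects at the ends.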

How to calculate approximate fourier coefficients using np.trapz

I have a dataset which looks roughly as follows (and is sinusoidal in nature):
TW-240-run1.txt
Point Number Temperature
0 51.504781
1 51.487722
2 51.487722
3 51.828893
4 51.828893
5 51.436547
6 51.368312
7 51.726542
8 51.368312
9 51.317137
10 51.317137
11 51.283020
12 51.590073
.
.
.
9599 51.675366
I am tasked with finding the fundamental (first) Fourier coefficients, a_n and b_n, for this dataset by means of a numerical integration technique. In this case, I am simply using numpy.trapz, which implements the trapezium rule. The Fourier coefficients a_n and b_n can be calculated with the following formulae:
$$a_n = \frac{2}{\tau}\int_0^{\tau} T(t)\cos\left(\frac{2\pi n t}{\tau}\right)\,dt, \qquad b_n = \frac{2}{\tau}\int_0^{\tau} T(t)\sin\left(\frac{2\pi n t}{\tau}\right)\,dt$$
where tau (𝛕) is the time period of the sine function. In my case, 𝛕 = 240 seconds (corresponding to point number 240 on the data sheet), and thus the bounds of integration are 0 to 240. T(t) in the above formulae is the data set, and n = 1.
My current code for trying to calculate the fourier coefficients is as follows:
# Packages
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp

# Input data from the datasheet; the loadtxt below takes in the data from t = 0s to t = 240s
x1, y1 = np.loadtxt(r'C:\Users\Sidharth\Documents\y2python\y2python\thermal_4min_a.txt', unpack=True, skiprows=3)
tau_4min = 240.0

def cosine(period, t, n):
    return np.cos((2*np.pi*n*t)/(period))  # defines the cos term for the a_n formula

def sine(period, t, n):
    return np.sin((2*np.pi*n*t)/(period))  # defines the sin term for the b_n formula

# Evaluate the cos and sin terms at the sample times for n = 1
cos_term_4min = cosine(tau_4min, x1, 1)
sin_term_4min = sine(tau_4min, x1, 1)

a_1_4min = (2/tau_4min)*np.trapz((y1*cos_term_4min), x1)  # implement a_n formula (trapezium rule for T(t)*cos)
print('a_1 is', a_1_4min)
b_1_4min = (2/tau_4min)*np.trapz((y1*sin_term_4min), x1)  # implement b_n formula (trapezium rule for T(t)*sin)
print('b_1 is', b_1_4min)
Essentially what this is doing is, it takes in the data, but only up to the row index 241 (point number 240), and then multiplies it by the sine/cosine term from each of the above formulae. However, I realise this isn't calculating the fourier coefficients properly.
My question(s) are as follows:
Will my code work if I import the entire data set and find a way to set the limits of integration for np.trapz (0 and 240), instead of importing only the data points from 0 to 240, multiplying them by the cos or sine term, and then using np.trapz on that product, as I am currently doing?
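For what it's worth, np.trapz takes no integration limits as arguments; the limits are determined by which samples are passed to it. A minimal sketch of selecting one period from the full data set with a boolean mask follows. It assumes the first column is time in seconds; the synthetic data and the helper name first_fourier_coefficients are made up for illustration and do not come from the question.

```python
import numpy as np

# np.trapz was renamed np.trapezoid in NumPy 2.0; use whichever is available
integrate = np.trapezoid if hasattr(np, "trapezoid") else np.trapz

def first_fourier_coefficients(t, T, period, n=1, t_start=0.0):
    """Trapezium-rule estimate of a_n, b_n over [t_start, t_start + period]."""
    # The mask plays the role of the integration limits
    mask = (t >= t_start) & (t <= t_start + period)
    t_sel, T_sel = t[mask], T[mask]
    arg = 2.0 * np.pi * n * t_sel / period
    a_n = (2.0 / period) * integrate(T_sel * np.cos(arg), t_sel)
    b_n = (2.0 / period) * integrate(T_sel * np.sin(arg), t_sel)
    return a_n, b_n

# Synthetic stand-in for the full 9600-point file (one sample per second)
t = np.arange(9600, dtype=float)
T = 51.5 + 0.3 * np.cos(2 * np.pi * t / 240) + 0.2 * np.sin(2 * np.pi * t / 240)
a_1, b_1 = first_fourier_coefficients(t, T, period=240.0)
```

On this synthetic signal the estimates recover the amplitudes 0.3 and 0.2 that were put in, which gives a quick way to check the procedure before running it on measured data.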

How to get better Kriging result graphs in openturns?

I performed spherical Kriging, but I can't seem to get good output graphs.
The coordinates (x and y) are around 51 in latitude and around 6.5 in longitude.
My observations range from -70 to +10.
Here is my code:
import openturns as ot
import pandas as pd
# your input / output data can be easily formatted as samples for openturns
df = pd.read_csv("kreuzkerpenutm.csv")
inputdata = ot.Sample(df[['x','y']].values)
outputdata = ot.Sample(df[['z']].values)
dimension = 2 # dimension of your input (x,y)
basis = ot.ConstantBasisFactory(dimension).build()
covarianceModel = ot.SphericalModel(dimension)
algo = ot.KrigingAlgorithm(inputdata, outputdata, covarianceModel, basis)
algo.run()
result = algo.getResult()
metamodel = result.getMetaModel()
lower = [-10.0] * 2 # lower bound of the 2D window
upper = [50.0] * 2 # upper bound of the 2D window
graph = metamodel.draw(lower, upper)
graph.setBoundingBox(ot.Interval(lower, upper))
graph.add(ot.Cloud(inputdata)) # overlay a scatter plot of the observation points
graph.setTitle("Kriging metamodel")
# A View object allows us to interact with the underlying matplotlib figure
from openturns.viewer import View
view = View(graph, legend_kw={'bbox_to_anchor':(1,1), 'loc':"upper left"})
view.getFigure().tight_layout()
Here is my output:
kriging metamodel graph
I don't know why my graph won't show my inputs as well as my kriging results.
Thanks for any ideas and help.
If the input data is not scaled in [-1,1]^d, the kriging metamodel may have trouble identifying the scale parameters using maximum likelihood optimization. To help with this, we may:
provide a better starting point for the scale parameters of the covariance model (this is trick "A" below),
set the bounds of the optimization algorithm so that the interval where the parameters are searched for corresponds to the data at hand (this is trick "B" below).
This is what the following script does, using simulated data instead of a csv data file. In the script, I create the data using a g function which is scaled so that it produces results in the [-10, 70] range, as in your problem. Please look carefully at the setScale() method, which sets the initial value of the covariance model: this is the starting point of the optimization algorithm. Then look at the setOptimizationBounds() method, which sets the bounds of the optimization algorithm.
import openturns as ot
dimension = 2 # dimension of your input (x,y)
distribution = ot.ComposedDistribution([ot.Uniform(-10.0, 50.0)] * dimension)
inputdata = distribution.getSample(100)
g = ot.SymbolicFunction(["x", "y"], ["30 + 3.0 * sin(x / 10.0) * (y / 10.0) ^ 2"])
outputdata = g(inputdata)
basis = ot.ConstantBasisFactory(dimension).build()
covarianceModel = ot.SphericalModel(dimension)
covarianceModel.setScale(inputdata.getMax()) # Trick A
algo = ot.KrigingAlgorithm(inputdata, outputdata, covarianceModel, basis)
# Trick B, v2
x_range = inputdata.getMax() - inputdata.getMin()
scale_max_factor = 2.0 # Must be > 1, tune this to match your problem
scale_min_factor = 0.1 # Must be < 1, tune this to match your problem
maximum_scale_bounds = scale_max_factor * x_range
minimum_scale_bounds = scale_min_factor * x_range
scaleOptimizationBounds = ot.Interval(minimum_scale_bounds, maximum_scale_bounds)
algo.setOptimizationBounds(scaleOptimizationBounds)
algo.run()
result = algo.getResult()
metamodel = result.getMetaModel()
metamodel.setInputDescription(["x", "y"])
metamodel.setOutputDescription(["z"])
lower = [-10.0] * 2 # lower bound of the 2D window
upper = [50.0] * 2 # upper bound of the 2D window
graph = metamodel.draw(lower, upper)
graph.setBoundingBox(ot.Interval(lower, upper))
graph.add(ot.Cloud(inputdata)) # overlay a scatter plot of the observation points
graph.setTitle("Kriging metamodel")
# A View object allows us to interact with the underlying matplotlib figure
from openturns.viewer import View
view = View(graph, legend_kw={"bbox_to_anchor": (1, 1), "loc": "upper left"})
view.getFigure().tight_layout()
The previous script produces the following figure.
There are other ways to implement trick B. Here is one provided by J.Pelamatti:
# Trick B, v3
import numpy as np
import scipy.spatial

for d in range(X_train.getDimension()):
    dist = scipy.spatial.distance.pdist(X_train[:,d])
    scale_max_factor = 2.0 # Must be > 1, tune this to match your problem
    scale_min_factor = 0.1 # Must be < 1, tune this to match your problem
    maximum_scale_bounds = scale_max_factor * np.max(dist)
    minimum_scale_bounds = scale_min_factor * np.min(dist)
This topic is discussed in this particular thread in OT's forum.
Sorry for the late answer.
Which version of openturns are you using?
You probably have an embedded transformation of the (input) data, which makes the data range approximately between -3 and 3 (standard scaling). In such a case, the kriging result should contain the transformation.
In more recent openturns versions, this feature has been removed.
Hope this can help.
Cheers

How to implement a method to generate Poincaré sections for a non-linear system of ODEs?

I have been trying to work out how to calculate Poincaré sections for a system of non-linear ODEs, using a paper on the exact system as reference, and have been wrestling with numpy to try and make it run better. This is intended to run within a bounded domain.
Currently, I have the following code
import numpy as np
from scipy.integrate import odeint
X = 0
Y = 1
Z = 2
def generate_poincare_map(function, initial, plane, iterations, delta):
    intersections = []
    p_i = odeint(function, initial.flatten(), [0, delta])[-1]
    for i in range(1, iterations):
        p_f = odeint(function, p_i, [i * delta, (i+1) * delta])[-1]
        if (p_f[Z] > plane) and (p_i[Z] < plane):
            intersections.append(p_i[:2])
        if (p_f[Z] > plane) and (p_i[Z] < plane):
            intersections.append(p_i[:2])
        p_i = p_f
    return np.stack(intersections)
This is pretty wasteful due to the integration solely between successive time steps, and seems to produce incorrect results. The original reference includes sections along the lines of
whereas mine tend to result in something along the lines of
Do you have any advice on how to proceed to make this more correct, and perhaps a little faster?
To get a Poincaré map of the ABC flow
import numpy as np
import matplotlib.pyplot as plt
from numpy import pi
from scipy.integrate import odeint

def ABC_ode(u,t):
    A, B, C = 0.75, 1, 1 # matlab parameters
    x, y, z = u
    return np.array([
        A*np.sin(z)+C*np.cos(y),
        B*np.sin(x)+A*np.cos(z),
        C*np.sin(y)+B*np.cos(x)
    ])

def mysolver(u0, tspan): return odeint(ABC_ode, u0, tspan, atol=1e-10, rtol=1e-11)
you first have to understand that the dynamical system is really about the points (cos(x),sin(x)) etc. on the unit circle, so values that differ by multiples of 2*pi represent the same point. The computation of the section has to reflect this, either by computing it on the Cartesian product of the 3 circles or by reducing the angles to a fundamental period. Let's stay with the second variant, and choose [-pi,pi] as the fundamental period so that the zero location sits well in the center. Keep in mind that jumps larger than pi come from the angle reduction, not from a real crossing of that interval.
def find_crosssections(x0,y0):
    u0 = [x0,y0,0]
    px = []
    py = []
    u = mysolver(u0, np.arange(0, 4000, 0.5)); u0 = u[-1]
    u = np.mod(u+pi,2*pi)-pi
    x,y,z = u.T
    for k in range(len(z)-1):
        if z[k]<=0 and z[k+1]>=0 and z[k+1]-z[k]<pi:
            # find a more exact intersection location by linear interpolation
            s = -z[k]/(z[k+1]-z[k]) # 0 = z[k] + s*(z[k+1]-z[k])
            rx, ry = (1-s)*x[k]+s*x[k+1], (1-s)*y[k]+s*y[k+1]
            px.append(rx)
            py.append(ry)
    return px,py
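The crossing test and the linear interpolation can also be written without an explicit Python loop. The following is only a sketch of that idea on a toy trajectory, not a drop-in replacement: the name upward_zero_crossings and the max_jump parameter are introduced here, and the extra dz > 0 condition (which avoids a division by zero when two successive samples are both exactly zero) is an addition that the loop version does not have.

```python
import numpy as np

def upward_zero_crossings(x, y, z, max_jump=np.pi):
    """Return interpolated (x, y) where z passes through 0 going upward."""
    z0, z1 = z[:-1], z[1:]
    dz = z1 - z0
    # Same test as the loop: sign change upward, and no wrap-around jump
    hit = (z0 <= 0) & (z1 >= 0) & (dz > 0) & (dz < max_jump)
    # Solve 0 = z0 + s * (z1 - z0) for the interpolation weight s
    s = -z0[hit] / dz[hit]
    px = (1 - s) * x[:-1][hit] + s * x[1:][hit]
    py = (1 - s) * y[:-1][hit] + s * y[1:][hit]
    return px, py

# Toy trajectory: z = sin(t) crosses zero upward at t = 2*pi, 4*pi, 6*pi
t = np.linspace(0.1, 20.0, 2001)
px, py = upward_zero_crossings(t, 2.0 * t, np.sin(t))
```

With x set to t in the toy example, the returned px values are the estimated crossing times, which makes the interpolation easy to sanity-check.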
To get a full picture of the Poincaré cross-section and avoid duplicate work, use a grid of squares and mark each square in which an intersection has already fallen. Only start new iterations from the centers of free squares.
N=20
grid = np.zeros([N,N], dtype=int)
for i in range(N):
    for j in range(N):
        if grid[i,j]>0: continue
        x0, y0 = (2*i+1)*pi/N-pi, (2*j+1)*pi/N-pi
        px, py = find_crosssections(x0,y0)
        for rx,ry in zip(px,py):
            m, n = int((rx+pi)*N/(2*pi)), int((ry+pi)*N/(2*pi))
            grid[m,n]=1
        plt.plot(px, py, '.', ms=2)
You can now play with the density of the grid and the length of the integration interval to get the plot a little more filled out, but all characteristic features are already here. But I'd recommend re-programming this in a compiled language, as the computation will take some time.

numpy function giving incorrect results - checked by hand and excel

I'm writing some functions in numpy for rock physics modelling and have noticed that one of my functions gives erroneous results. The function is my implementation of Hertz-Mindlin sphere modelling:
Summary of the Hertz-Mindlin model
Here is my function currently:
# Hertz-Mindlin sphere pack model:
import numpy as np
def hertzmindlin(K0, G0, PHIC, P, f=1.0):
    '''
    Hertz-Mindlin sphere-pack model, adapted from:
    "Dvorkin, J. and Nur, A., 1996. Elasticity of high-porosity sandstones:
    Theory for two North Sea data sets. Geophysics, 61(5), pp.1363-1370."
    Arguments:
    K0 = Bulk modulus of mineral in GPa
    G0 = Shear modulus of mineral in GPa
    PHIC = Critical porosity for mineral-fluid mixture. Calculate using Dvorkin-Nur (1995) or use literature
    P = Confining pressure in GPa
    f = Shear modulus correction factor. Default = 1
    Results:
    V0 = Theoretical poissons ratio of mineral
    n = Coordination number of sphere-pack, calculated from Murphy's (1982) empirical relation
    K_HM = Hertz-Mindlin effective dry Bulk modulus at pressure, P, in GPa
    G_HM = Hertz-Mindlin effective dry Shear modulus at pressure, P, in GPa
    '''
    V0 = (3*K0-2*G0)/(6*K0+2*G0) # Calculated theoretical poissons ratio of bulk rock
    n = 20-(34*PHIC)+(14*(PHIC**2)) # Coordination number at critical porosity (Murphy 1982)
    K_HM = (P*(n**2*(1-PHIC)**2*G0**2) / (18*np.pi**2*(1-V0)**2))**(1/3)
    G_HM = ((2+3*f-V0*(1+3*f))/(5*(2-V0))) * ((P*(3*n**2*(1-PHIC)**2*G0**2)/(2*np.pi**2*(1-V0)**2)))**(1/3)
    return K_HM, G_HM
The problem is that when I run this function for inputs of:
K, G, = 36, 45
PHIC = 0.4
P = 0.001
I get a result of K_HM = 1.0, G_HM = 0.49009009009009
The hand calculated and excel calculated values show this is incorrect, I should be outputting K_HM = 0.763265313, G_HM = 1.081083984
I am fairly certain something is going wrong in the function based on the fact that for the inputs K, G, the output G should be larger than K (it is currently smaller)
Any help would be appreciated! I can do this in excel, but ideally want everything running in python.
In Python2, division of integers (using /) returns an integer. For example, 1/3 = 0.
In Python3, division of integers (using /) always returns a float.
It appears you are using Python2. To get floating-point division (in both Python2 and Python3), ensure each division operation involves at least one float: for example, change 1/3 to 1.0/3 or 1/3.0 or (acceptable but perhaps less readable) 1/3.:
import numpy as np
def hertzmindlin(K0, G0, PHIC, P, f=1.0):
    K0, G0 = map(float, (K0, G0))
    V0 = (3*K0-2*G0)/(6*K0+2*G0) # Calculated theoretical poissons ratio of bulk rock
    n = 20-(34*PHIC)+(14*(PHIC**2)) # Coordination number at critical porosity (Murphy 1982)
    K_HM = (P*(n**2*(1-PHIC)**2*G0**2) / (18*np.pi**2*(1-V0)**2))**(1/3.0)
    G_HM = ((2+3*f-V0*(1+3*f))/(5*(2-V0))) * ((P*(3*n**2*(1-PHIC)**2*G0**2)/(2*np.pi**2*(1-V0)**2)))**(1/3.0)
    return K_HM, G_HM
K, G, = 36, 45
PHIC = 0.4
P = 0.001
print(hertzmindlin(K, G, PHIC, P))
Alternatively, in later versions of Python2 (such as Python2.7) you could place
from __future__ import division
at the top of your script (before all other import statements) to activate Python3-style floating-point division.
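To see why the original function returned K_HM = 1.0, the Python2 behaviour can be imitated in Python3 with floor division (//). This is only an illustrative sketch; the two helper names are made up here.

```python
def cube_root_floor(x):
    # 1 // 3 is floor division and equals 0, like 1 / 3 on integers in Python2,
    # so this computes x ** 0
    return x ** (1 // 3)

def cube_root_true(x):
    # At least one float operand gives true division in both Python2 and Python3
    return x ** (1.0 / 3)

print(cube_root_floor(0.5))   # exponent is 0, so the result is 1.0
print(cube_root_true(27.0))   # approximately 3
```

Raising anything to the power 0 gives 1, which is why the cube-root factor silently collapsed to 1.0 regardless of the pressure and moduli passed in.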

Is it possible to optimize this Matlab code for doing vector quantization with centroids from k-means?

I've created a codebook using k-means of size 4000x300 (4000 centroids, each with 300 features). Using the codebook, I then want to label an input vector (for purposes of binning later on). The input vector is of size Nx300, where N is the total number of input instances I receive.
To compute the labels, I calculate the closest centroid for each of the input vectors. To do so, I compare each input vector against all centroids and pick the centroid with the minimum distance. The label is then just the index of that centroid.
My current Matlab code looks like:
function labels = assign_labels(centroids, X)
    labels = zeros(size(X, 1), 1);
    % for each X, calculate the distance from each centroid
    for i = 1:size(X, 1)
        % distance of X_i from all j centroids is: sum((X_i - centroid_j)^2)
        % note: we leave off the sqrt as an optimization
        distances = sum(bsxfun(@minus, centroids, X(i, :)) .^ 2, 2);
        [value, label] = min(distances);
        labels(i) = label;
    end
However, this code is still fairly slow (for my purposes), and I was hoping there might be a way to optimize the code further.
One obvious issue is that there is a for-loop, which is the bane of good performance in Matlab. I've been trying to come up with a way to get rid of it, but with no luck (I looked into using arrayfun in conjunction with bsxfun, but haven't gotten that to work). Alternatively, if someone knows of any other way to speed this up, I would greatly appreciate it.
Update
After doing some searching, I couldn't find a great solution using Matlab, so I decided to look at what is used in Python's scikits.learn package for 'euclidean_distance' (shortened):
XX = sum(X * X, axis=1)[:, newaxis]
YY = Y.copy()
YY **= 2
YY = sum(YY, axis=1)[newaxis, :]
distances = XX + YY
distances -= 2 * dot(X, Y.T)
distances = maximum(distances, 0)
which uses the binomial form of the euclidean distance ((x-y)^2 -> x^2 + y^2 - 2xy), which from what I've read usually runs faster. My completely untested Matlab translation is:
XX = sum(data .* data, 2);
YY = sum(center .^ 2, 2);
[val, ~] = max(XX + YY - 2*data*center');
Use the following function to calculate your distances. You should see an order-of-magnitude speedup.
The two matrices A and B have the dimensions as columns and one point per row.
A is your matrix of centroids. B is your matrix of data points.
function D=getSim(A,B)
    Qa=repmat(dot(A,A,2),1,size(B,1));
    Qb=repmat(dot(B,B,2),1,size(A,1));
    D=Qa+Qb'-2*A*B';
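For comparison, the same expansion ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x.c can be sketched in NumPy, including the final label assignment. This is a hedged illustration rather than the scikits.learn implementation: the function name assign_labels mirrors the Matlab function in the question, the random data is made up, and the labels are 0-based, unlike Matlab's 1-based indices.

```python
import numpy as np

def assign_labels(centroids, X):
    """Index of the nearest centroid (squared Euclidean distance) per row of X."""
    xx = np.sum(X * X, axis=1)[:, np.newaxis]                   # shape (N, 1)
    cc = np.sum(centroids * centroids, axis=1)[np.newaxis, :]   # shape (1, K)
    # Broadcasting builds the full (N, K) matrix of squared distances
    d2 = xx + cc - 2.0 * (X @ centroids.T)
    # Rounding can leave tiny negative values; clip them to zero
    np.maximum(d2, 0.0, out=d2)
    return np.argmin(d2, axis=1)

# Small synthetic example (hypothetical sizes, not the 4000x300 codebook)
rng = np.random.default_rng(0)
centroids = rng.normal(size=(10, 8))
X = rng.normal(size=(50, 8))
labels = assign_labels(centroids, X)
```

The (N, K) distance matrix is held in memory all at once, so for large N it may be worth processing X in chunks.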
You can vectorize it by converting to cells and using cellfun:
[nRows,nCols]=size(X);
XCell=num2cell(X,2);
dist=reshape(cell2mat(cellfun(@(x)(sum(bsxfun(@minus,centroids,x).^2,2)),XCell,'UniformOutput',false)),nRows,nRows);
[~,labels]=min(dist);
Explanation:
We assign each row of X to its own cell in the second line
The piece @(x)(sum(bsxfun(@minus,centroids,x).^2,2)) is an anonymous function which is the same as your distances=... line; using cellfun, we apply it to each row of X, and cell2mat gathers the results into a matrix.
The labels are then the indices of the minimum row along each column.
For a true matrix implementation, you may consider trying something along the lines of:
P2 = kron(centroids, ones(size(X,1),1));
Q2 = kron(ones(size(centroids,1),1), X);
distances = reshape(sum((Q2-P2).^2,2), size(X,1), size(centroids,1));
Note
This assumes the data is organized as [x1 y1 ...; x2 y2 ...;...]
You can use a more efficient algorithm for nearest-neighbor search than brute force.
The most popular approach is the k-d tree, which has O(log(n)) average query time instead of the O(n) complexity of brute force.
Regarding a Matlab implementation of k-d trees, you can have a look here
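As an illustration of the idea in Python rather than Matlab, SciPy ships a k-d tree as scipy.spatial.cKDTree. The sketch below uses made-up low-dimensional data; k-d trees tend to lose their advantage as the number of dimensions grows, so with 300 features it is worth benchmarking against the brute-force approach before committing to it.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical low-dimensional data: 200 centroids and 1000 query points in 3-D
rng = np.random.default_rng(0)
centroids = rng.normal(size=(200, 3))
X = rng.normal(size=(1000, 3))

# Build the tree once on the centroids, then query every input vector
tree = cKDTree(centroids)
distances, labels = tree.query(X)  # nearest-centroid distance and 0-based index
```

The tree is built once from the codebook and can be reused for every new batch of input vectors.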