I have a dataset which looks roughly as follows (and is sinusoidal in nature):
TW-240-run1.txt
Point Number Temperature
0 51.504781
1 51.487722
2 51.487722
3 51.828893
4 51.828893
5 51.436547
6 51.368312
7 51.726542
8 51.368312
9 51.317137
10 51.317137
11 51.283020
12 51.590073
.
.
.
9599 51.675366
I am tasked with finding the fundamental (first) Fourier coefficients, a_n and b_n, for this dataset by means of a numerical integration technique. In this case, I am simply using numpy.trapz, which implements the trapezium rule. The Fourier coefficients a_n and b_n can be calculated with the following formulae:
$a_n = \frac{2}{\tau}\int_0^{\tau} T(t)\cos\left(\frac{2\pi n t}{\tau}\right)dt \qquad b_n = \frac{2}{\tau}\int_0^{\tau} T(t)\sin\left(\frac{2\pi n t}{\tau}\right)dt$
where tau ($\tau$) is the time period of the sine function. In my case $\tau$ = 240 seconds (referring to point number 240 on the data sheet), and thus the bounds of integration are 0 to 240. T(t) in the above formulae is the data set, and n = 1.
My current code for trying to calculate the fourier coefficients is as follows:
# Packages
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp

# Input data from the datasheet; the loadtxt below reads the data from t = 0 s to t = 240 s
x1, y1 = np.loadtxt(r'C:\Users\Sidharth\Documents\y2python\y2python\thermal_4min_a.txt', unpack=True, skiprows=3)

tau_4min = 240.0

def cosine(period, t, n):
    return np.cos((2*np.pi*n*t)/period)  # cos term for the a_n formula

def sine(period, t, n):
    return np.sin((2*np.pi*n*t)/period)  # sin term for the b_n formula

cos_term_4min = cosine(tau_4min, x1, 1)  # evaluate the cos term at each sample time, n = 1
sin_term_4min = sine(tau_4min, x1, 1)    # evaluate the sin term at each sample time, n = 1

a_1_4min = (2/tau_4min)*np.trapz(y1*cos_term_4min, x1)  # a_n formula (trapezium rule on T(t)*cos)
print('a_1 is', a_1_4min)
b_1_4min = (2/tau_4min)*np.trapz(y1*sin_term_4min, x1)  # b_n formula (trapezium rule on T(t)*sin)
print('b_1 is', b_1_4min)
Essentially, this takes in the data (but only up to row index 241, i.e. point number 240) and multiplies it by the sine/cosine term from each of the above formulae. However, I realise this isn't calculating the Fourier coefficients properly.
My question is as follows:
Would my code work if I could set limits of integration for np.trapz and import the entire data set, instead of importing only the data points from 0 to 240, multiplying them by the cos or sine term, and then applying np.trapz to that product, as I am currently doing? (0 and 240 are supposed to be my limits of integration.)
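A minimal sketch of what I mean, assuming the full file can be loaded and then restricted to the 0 to 240 s window with a boolean mask before calling np.trapz (the file path, column layout and skiprows are taken from the code above):
import numpy as np

# Load the full data set (all 9600 points)
t, T = np.loadtxt(r'C:\Users\Sidharth\Documents\y2python\y2python\thermal_4min_a.txt',
                  unpack=True, skiprows=3)

tau = 240.0   # period in seconds
n = 1         # fundamental

# Keep only the samples inside the integration limits 0 <= t <= tau
mask = (t >= 0) & (t <= tau)
t_w, T_w = t[mask], T[mask]

a_1 = (2/tau) * np.trapz(T_w * np.cos(2*np.pi*n*t_w/tau), t_w)
b_1 = (2/tau) * np.trapz(T_w * np.sin(2*np.pi*n*t_w/tau), t_w)
print('a_1 =', a_1, 'b_1 =', b_1)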
Related
I'm trying to implement the gradient descent method for solving $Ax = b$ for a symmetric positive definite matrix $A$ of size about $9600 \times 9600$. I thought my code was relatively simple:
# Solves the problem Ax = b for x within epsilon tolerance or until MAX_ITERATION is reached
def GradientDescent(Amat, target, epsilon=.01, MAX_ITERATION=100, x=np.zeros(9604)):
    CurrentRes = target - np.matmul(Amat, x)
    count = 0
    while np.linalg.norm(CurrentRes) > epsilon and count < MAX_ITERATION:
        Ar = np.matmul(Amat, CurrentRes)
        alpha = CurrentRes.T.dot(CurrentRes) / CurrentRes.T.dot(Ar)
        x = x + alpha*CurrentRes
        Ax = np.matmul(Amat, x)
        CurrentRes = target - Ax
        count = count + 1
    return (x, count, np.linalg.norm(CurrentRes))

# A is a square matrix about 9600x9600 and b is about 9600x1
GDSum = GradientDescent(A, b)
but the above takes almost 3 minutes to run a single iteration of the main while loop.
I didn't think that $9600 \times 9600$ was too big for NumPy to handle effectively, but even the step of computing alpha, which is just the quotient of two dot products, is taking over 30 seconds.
I tried troubleshooting by timing each operation in the while loop, and they are all running much slower than expected. A single matrix multiplication takes almost a minute. At least the vector addition and subtraction steps seem to run quickly.
Perhaps the most relevant bit of information is missing.
Your function is fast when A and b are Numpy arrays, but it's terribly slow when they are lists.
Is that your case?
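If so, a minimal sketch of the fix is to convert the inputs to NumPy arrays before calling the function above (A_list and b_list are hypothetical names for the list-based inputs):
import numpy as np

A = np.asarray(A_list, dtype=float)   # (n, n) matrix as a contiguous float array
b = np.asarray(b_list, dtype=float)   # (n,) right-hand side as a float array

GDSum = GradientDescent(A, b, x=np.zeros(len(b)))  # np.matmul now runs on real arrays, not lists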
I have been trying to work out how to calculate Poincaré sections for a system of non-linear ODEs, using a paper on the exact system as reference, and have been wrestling with numpy to try and make it run better. This is intended to run within a bounded domain.
Currently, I have the following code
import numpy as np
from scipy.integrate import odeint
X = 0
Y = 1
Z = 2
def generate_poincare_map(function, initial, plane, iterations, delta):
    intersections = []
    p_i = odeint(function, initial.flatten(), [0, delta])[-1]
    for i in range(1, iterations):
        p_f = odeint(function, p_i, [i * delta, (i+1) * delta])[-1]
        if (p_f[Z] > plane) and (p_i[Z] < plane):
            intersections.append(p_i[:2])
        p_i = p_f
    return np.stack(intersections)
This is pretty wasteful because it integrates separately between successive time steps, and it seems to produce incorrect results: the sections shown in the original reference look qualitatively different from the ones my code produces.
Do you have any advice on how to proceed to make this more correct, and perhaps a little faster?
To get a Poincaré map of the ABC flow
def ABC_ode(u, t):
    A, B, C = 0.75, 1, 1  # matlab parameters
    x, y, z = u
    return np.array([
        A*np.sin(z) + C*np.cos(y),
        B*np.sin(x) + A*np.cos(z),
        C*np.sin(y) + B*np.cos(x)
    ])

def mysolver(u0, tspan):
    return odeint(ABC_ode, u0, tspan, atol=1e-10, rtol=1e-11)
you first have to understand that the dynamical system is really about the points (cos(x), sin(x)) etc. on the unit circle, so values that differ by multiples of 2*pi represent the same point. The computation of the section has to reflect this, either by working on the Cartesian product of the three circles, or by reducing the angles modulo 2*pi to a fundamental period. Let's stay with the second variant, and choose [-pi, pi] as the fundamental period so that the zero location sits well in the center. Keep in mind that jumps larger than pi come from the angle reduction, not from a real crossing of that interval.
def find_crosssections(x0, y0):
    u0 = [x0, y0, 0]
    px = []
    py = []
    u = mysolver(u0, np.arange(0, 4000, 0.5)); u0 = u[-1]
    u = np.mod(u + np.pi, 2*np.pi) - np.pi
    x, y, z = u.T
    for k in range(len(z)-1):
        if z[k] <= 0 and z[k+1] >= 0 and z[k+1]-z[k] < np.pi:
            # find a more exact intersection location by linear interpolation
            s = -z[k]/(z[k+1]-z[k])  # 0 = z[k] + s*(z[k+1]-z[k])
            rx, ry = (1-s)*x[k] + s*x[k+1], (1-s)*y[k] + s*y[k+1]
            px.append(rx)
            py.append(ry)
    return px, py
To get a full picture of the Poincaré cross-section and avoid duplicate work, use a grid of squares and mark whether one of the intersections has already fallen into each square. Only start new iterations from the centers of squares that are still free.
import matplotlib.pyplot as plt  # needed for the plot below

N = 20
grid = np.zeros([N, N], dtype=int)
for i in range(N):
    for j in range(N):
        if grid[i, j] > 0: continue
        x0, y0 = (2*i+1)*np.pi/N - np.pi, (2*j+1)*np.pi/N - np.pi
        px, py = find_crosssections(x0, y0)
        for rx, ry in zip(px, py):
            m, n = int((rx+np.pi)*N/(2*np.pi)), int((ry+np.pi)*N/(2*np.pi))
            grid[m, n] = 1
        plt.plot(px, py, '.', ms=2)
You can now play with the density of the grid and the length of the integration interval to get the plot a little more filled out, but all characteristic features are already here. But I'd recommend re-programming this in a compiled language, as the computation will take some time.
I'm writing some functions in numpy for rock physics modelling and have noticed that one of my functions gives erroneous results. The function is my implementation of Hertz-Mindlin sphere modelling:
Summary of the Hertz-Mindlin model
Here is my function currently:
# Hertz-Mindlin sphere pack model:
import numpy as np

def hertzmindlin(K0, G0, PHIC, P, f=1.0):
    '''
    Hertz-Mindlin sphere-pack model, adapted from:
    "Dvorkin, J. and Nur, A., 1996. Elasticity of high-porosity sandstones:
    Theory for two North Sea data sets. Geophysics, 61(5), pp.1363-1370."
    Arguments:
    K0 = Bulk modulus of mineral in GPa
    G0 = Shear modulus of mineral in GPa
    PHIC = Critical porosity for mineral-fluid mixture. Calculate using Dvorkin-Nur (1995) or use literature
    P = Confining pressure in GPa
    f = Shear modulus correction factor. Default = 1
    Results:
    V0 = Theoretical Poisson's ratio of mineral
    n = Coordination number of sphere-pack, calculated from Murphy's (1982) empirical relation
    K_HM = Hertz-Mindlin effective dry bulk modulus at pressure P, in GPa
    G_HM = Hertz-Mindlin effective dry shear modulus at pressure P, in GPa
    '''
    V0 = (3*K0-2*G0)/(6*K0+2*G0)  # theoretical Poisson's ratio of the bulk rock
    n = 20-(34*PHIC)+(14*(PHIC**2))  # coordination number at critical porosity (Murphy 1982)
    K_HM = (P*(n**2*(1-PHIC)**2*G0**2) / (18*np.pi**2*(1-V0)**2))**(1/3)
    G_HM = ((2+3*f-V0*(1+3*f))/(5*(2-V0))) * ((P*(3*n**2*(1-PHIC)**2*G0**2)/(2*np.pi**2*(1-V0)**2)))**(1/3)
    return K_HM, G_HM
The problem is that when I run this function for inputs of:
K, G = 36, 45
PHIC = 0.4
P = 0.001
I get a result of K_HM = 1.0, G_HM = 0.49009009009009
The hand-calculated and Excel-calculated values show this is incorrect; I should be getting K_HM = 0.763265313, G_HM = 1.081083984.
I am fairly certain something is going wrong in the function, because for these inputs the output G_HM should be larger than K_HM (it is currently smaller).
Any help would be appreciated! I can do this in excel, but ideally want everything running in python.
In Python2, division of two integers (using /) returns an integer; for example, 1/3 evaluates to 0.
In Python3, division of two integers (using /) returns a float; 1/3 evaluates to 0.333....
It appears you are using Python2. To get floating-point division (in both Python2 and Python3), make sure each division involves at least one float: for example, change 1/3 to 1.0/3, 1/3.0, or the (acceptable but perhaps less readable) 1/3.:
import numpy as np

def hertzmindlin(K0, G0, PHIC, P, f=1.0):
    K0, G0 = map(float, (K0, G0))
    V0 = (3*K0-2*G0)/(6*K0+2*G0)  # theoretical Poisson's ratio of the bulk rock
    n = 20-(34*PHIC)+(14*(PHIC**2))  # coordination number at critical porosity (Murphy 1982)
    K_HM = (P*(n**2*(1-PHIC)**2*G0**2) / (18*np.pi**2*(1-V0)**2))**(1/3.0)
    G_HM = ((2+3*f-V0*(1+3*f))/(5*(2-V0))) * ((P*(3*n**2*(1-PHIC)**2*G0**2)/(2*np.pi**2*(1-V0)**2)))**(1/3.0)
    return K_HM, G_HM

K, G = 36, 45
PHIC = 0.4
P = 0.001
print(hertzmindlin(K, G, PHIC, P))
Alternatively, in later versions of Python2 (such as Python2.7) you could place
from __future__ import division
at the top of your script (before any other statements) to activate Python3-style floating-point division.
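For illustration, a minimal sketch of that (the behaviour shown in the comments is standard Python, not specific to the function above):
from __future__ import division  # must come before any other statements

print(1/3)    # 0.333... under Python 2.7 with the future import, and under Python 3
print(1//3)   # 0 -- floor division is still available via //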
I have a time series (more specifically a correlation function). I want to bandpass-filter this signal using a Gaussian function H:
$H(w) = e^{-\alpha((w - w_n)/w_n)^2}$,
where $w_n$ is the central frequency of my bandpass filter and $\alpha$ is a constant whose value I know.
I apply an (inverse) FFT to my H function:
import itertools
import numpy as np

H = np.e ** (-alfa * ((w - wn) / wn) ** 2)
H = np.fft.ifft(H)
HH = np.asarray([i1 for i1 in itertools.chain(H[len(H)//2:len(H)], H[0:len(H)//2])])  # swap the two halves of H
And what I do then is to use scipy.signal.fftconvolve:
filtered = fftconvolve(data, HH.real, mode='same')
but the "filtered" signal that I see seems to be filtered around frequencies centered at 2 times $w_n$.
What is the correct way of doing this? Is there a restriction in the length of my filter with respect to the length of my time series?
Perhaps what you are looking for is the Gaussian filter from SciPy:
from scipy.ndimage import gaussian_filter
output = gaussian_filter(input, sigma)
where sigma is the standard deviation of the Gaussian kernel. See the SciPy documentation for more details: https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.gaussian_filter.html
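For reference, a minimal usage sketch of that suggestion (the signal and sigma value here are invented for illustration; note that gaussian_filter smooths in the time domain, i.e. it acts as a low-pass rather than a band-pass):
import numpy as np
from scipy.ndimage import gaussian_filter

# Made-up example signal: a slow oscillation plus noise
t = np.arange(2000)
signal = np.cos(2 * np.pi * t / 240.0) + 0.2 * np.random.default_rng(0).standard_normal(t.size)

smoothed = gaussian_filter(signal, sigma=5)  # sigma is in samples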
I've created a codebook using k-means of size 4000x300 (4000 centroids, each with 300 features). Using the codebook, I then want to label an input vector (for purposes of binning later on). The input vector is of size Nx300, where N is the total number of input instances I receive.
To compute the labels, I calculate the closest centroid for each of the input vectors. To do so, I compare each input vector against all centroids and pick the centroid with the minimum distance. The label is then just the index of that centroid.
My current Matlab code looks like:
function labels = assign_labels(centroids, X)
labels = zeros(size(X, 1), 1);
% for each X, calculate the distance from each centroid
for i = 1:size(X, 1)
    % distance of X_i from all j centroids is: sum((X_i - centroid_j)^2)
    % note: we leave off the sqrt as an optimization
    distances = sum(bsxfun(@minus, centroids, X(i, :)) .^ 2, 2);
    [value, label] = min(distances);
    labels(i) = label;
end
However, this code is still fairly slow (for my purposes), and I was hoping there might be a way to optimize the code further.
One obvious issue is that there is a for-loop, which is the bane of good performance in Matlab. I've been trying to come up with a way to get rid of it, but with no luck (I looked into using arrayfun in conjunction with bsxfun, but haven't gotten that to work). Alternatively, if someone knows of any other way to speed this up, I would greatly appreciate it.
Update
After doing some searching, I couldn't find a great solution using Matlab, so I decided to look at what is used in Python's scikits.learn package for 'euclidean_distance' (shortened):
XX = sum(X * X, axis=1)[:, newaxis]
YY = Y.copy()
YY **= 2
YY = sum(YY, axis=1)[newaxis, :]
distances = XX + YY
distances -= 2 * dot(X, Y.T)
distances = maximum(distances, 0)
which uses the expanded (binomial) form of the Euclidean distance, $(x-y)^2 = x^2 + y^2 - 2xy$, and from what I've read this usually runs faster. My completely untested Matlab translation is:
XX = sum(data .* data, 2);
YY = sum(center .^ 2, 2);
[val, ~] = max(XX + YY - 2*data*center');
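For comparison, a NumPy sketch of the same expanded-distance trick, vectorised over all points and centroids at once, might look like the following (X and centroids are assumed to have shapes (N, 300) and (4000, 300) as described above):
import numpy as np

def assign_labels(centroids, X):
    # Squared Euclidean distances via the expansion (x - c)^2 = x^2 + c^2 - 2*x.c
    XX = np.sum(X * X, axis=1)[:, np.newaxis]                  # shape (N, 1)
    CC = np.sum(centroids * centroids, axis=1)[np.newaxis, :]  # shape (1, K)
    distances = XX + CC - 2.0 * X @ centroids.T                # shape (N, K)
    np.maximum(distances, 0, out=distances)                    # guard against tiny negative values
    return np.argmin(distances, axis=1)                        # index of the nearest centroid per point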
Use the following function to calculate your distances; you should see an order-of-magnitude speed-up.
The two matrices A and B have the dimensions as columns and one point per row.
A is your matrix of centroids. B is your matrix of data points.
function D=getSim(A,B)
    Qa=repmat(dot(A,A,2),1,size(B,1));
    Qb=repmat(dot(B,B,2),1,size(A,1));
    D=Qa+Qb'-2*A*B';
You can vectorize it by converting to cells and using cellfun:
[nRows,nCols]=size(X);
XCell=num2cell(X,2);
dist=reshape(cell2mat(cellfun(@(x)(sum(bsxfun(@minus,centroids,x).^2,2)),XCell,'UniformOutput',false)),size(centroids,1),nRows);
[~,labels]=min(dist);
Explanation:
We assign each row of X to its own cell in the second line
This piece @(x)(sum(bsxfun(@minus,centroids,x).^2,2)) is an anonymous function that is the same as your distances=... line; cellfun applies it to each row of X, and cell2mat combines the results.
The labels are then the indices of the minimum row along each column.
For a true matrix implementation, you may consider trying something along the lines of:
P2 = kron(centroids, ones(size(X,1),1));
Q2 = kron(ones(size(centroids,1),1), X);
distances = reshape(sum((Q2-P2).^2,2), size(X,1), size(centroids,1));
Note
This assumes the data is organized as [x1 y1 ...; x2 y2 ...;...]
You can use a more efficient algorithm for nearest neighbor search than brute force.
The most popular approach is the k-d tree, which gives O(log n) average query time instead of the O(n) complexity of brute force.
Regarding a Matlab implementation of k-d trees, you can have a look here
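Since the thread already mixes Matlab and NumPy, here is a minimal Python sketch of the same idea using SciPy's k-d tree (centroids and X stand in for the 4000x300 codebook and the Nx300 inputs; note that k-d trees tend to lose their advantage in very high dimensions such as 300):
import numpy as np
from scipy.spatial import cKDTree

def assign_labels_kdtree(centroids, X):
    # Build the tree once over the centroids, then query all points at once
    tree = cKDTree(centroids)
    _, labels = tree.query(X, k=1)  # labels[i] = index of the nearest centroid to X[i]
    return labels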