fft find low frequencies in short time history - numpy

I have 1 time unit of signal history. My dominant frequency is 1/100 time units. When I use numpy's fft function, I am limited in resolution by the extent of the signal history. How can I increase the resolution of my frequency comb without corrupting my signal?
import numpy as np
import matplotlib.pyplot as plt
'''
I need to capture a low-frequency oscillation with only 1 time unit of data.
So far, I have not been able to find a way to make the fft resolution < 1.
'''
timeResolution = 10000
mytimes = np.linspace(0, 1, timeResolution)
mypressures = np.sin(2 * np.pi * mytimes / 100)
fft = np.fft.fft(mypressures[:])
T = mytimes[1] - mytimes[0]
N = mypressures.size
# fft of original signal is limited by the maximum time
f = np.linspace(0, 1 / T, N)
filteredidx = f > 0.001
freq = f[filteredidx][np.argmax(np.abs(fft[filteredidx][:N//2]))]
print('freq bin is ', f[1] - f[0]) # 1.0
print('frequency is ', freq) # 1.0
print('(real frequency is 0.01)')
I thought that I could artificially increase the time history length (and thus decrease the width of the frequency comb) by pasting the signal end-to-end and doing the fft. That didn't work for me for some reason I don't understand:
import numpy as np
import matplotlib.pyplot as plt
timeResolution = 10000
mytimes = np.linspace(0, 1, timeResolution)
mypressures = np.sin(2 * np.pi * mytimes / 100)
# glue data to itself to make signal artificially longer
timesby = 1000
newtimes = np.concatenate([mytimes * ii for ii in range(1, timesby + 1)])
newpressures = np.concatenate([mypressures] * timesby)
fft = np.fft.fft(newpressures[:])
T = newtimes[1] - newtimes[0]
N = newpressures.size
# fft of the lengthened signal is limited by the maximum time
f = np.linspace(0, 1 / T, N)
filteredidx = f > 0.001
freq = f[filteredidx][np.argmax(np.abs(fft[filteredidx][:N//2]))]
print('freq bin is ', f[1] - f[0]) # 0.001
print('frequency is ', freq) # 1.0
print('(real frequency is 0.01)')

Your goal, recovering spectral information from a "too short" window, i.e. one with << sample_rate / frequency_of_interest samples, seems ambitious.
Even in the most simple case (clean sine wave, your example) the data look pretty much like a straight line (left panel below). Only after detrending we can see a tiny bit of curvature (right panel below, note the very small y-values) and that is all any hypothetical algorithm can go by. In particular, FT---as far as I can see---will not work.
If we are very lucky there is one way out: comparing derivatives.
If you have a sinusoidal signal with an offset, like f = c + sin(om * t), then the 1st and 3rd derivatives will be om * cos(om * t) and -om^3 * cos(om * t). Their ratio is -om^2, so om = sqrt(-f''' / f') and the frequency is om / (2 * pi).
If the signal is simple and clean enough this together with robust numerical differentiation can be used to recover the frequency omega.
In the demo code below I use a SavGol filter to obtain the derivatives while getting rid of some high frequency noise (blue curve below) that had been added to the signal (orange curve). Other (better) methods of numerical differentiation may exist.
Sample run:
Estimated freq clean signal: 0.009998
Estimated freq noisy signal: 0.009871
We can see that in this very simple case the frequency is recovered ok.
It may be possible to recover multiple frequencies using more derivatives and some linear decomposition voodoo, but I'm not going to explore this here.
Code:
import numpy as np
import matplotlib.pyplot as plt
'''
I need to capture a low-frequency oscillation with only 1 time unit of data.
So far, I have not been able to find a way to make the fft resolution < 1.
'''
timeResolution = 10000
mytimes = np.linspace(0, 1, timeResolution)
mypressures = np.sin(2 * np.pi * mytimes / 100)
fft = np.fft.fft(mypressures[:])
T = mytimes[1] - mytimes[0]
N = mypressures.size
# fft of original signal is limited by the maximum time
f = np.linspace(0, 1 / T, N)
filteredidx = f > 0.001
freq = f[filteredidx][np.argmax(np.abs(fft[filteredidx][:N//2]))]
print('freq bin is ', f[1] - f[0]) # 1.0
print('frequency is ', freq) # 1.0
print('(real frequency is 0.01)')
import scipy.signal as ss
plt.figure(1)
plt.subplot(121)
plt.plot(mytimes, mypressures)
plt.subplot(122)
plt.plot(mytimes, ss.detrend(mypressures))
plt.figure(2)
mycorrupted = mypressures + 0.00001 * np.random.normal(size=mypressures.shape)
plt.plot(mytimes, ss.detrend(mycorrupted))
plt.plot(mytimes, ss.detrend(mypressures))
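# Savitzky-Golay filter: cubic fit over a window spanning most of the record
width, order = 8999, 3
# drop samples within half a window of either end; keep only the central part
hw = (width+3) // 2
# 1st and 3rd derivatives of the clean signal (last argument is the sample spacing)
dsdt = ss.savgol_filter(mypressures, width, order, 1, 1/timeResolution)[hw:-hw]
d3sdt3 = ss.savgol_filter(mypressures, width, order, 3, 1/timeResolution)[hw:-hw]
# for s = c + sin(om*t): s'''/s' = -om**2, so freq = sqrt(-s'''/s') / (2*pi)
est_freq_clean = np.nanmean(np.sqrt(-d3sdt3/dsdt) / (2 * np.pi))
# same estimate for the noisy signal
dsdt = ss.savgol_filter(mycorrupted, width, order, 1, 1/timeResolution)[hw:-hw]
d3sdt3 = ss.savgol_filter(mycorrupted, width, order, 3, 1/timeResolution)[hw:-hw]
est_freq_noisy = np.nanmean(np.sqrt(-d3sdt3/dsdt) / (2 * np.pi))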
print(f"Estimated freq clean signal: {est_freq_clean:10.6f}")
print(f"Estimated freq noisy signal: {est_freq_noisy:10.6f}")
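A stripped-down variant of the same derivative-ratio idea, as a rough sketch only. Here a single global cubic fit via np.polyfit stands in for the SavGol filter (that substitution is my assumption, not what the code above does), and the names u, c0..c3 and est_freq are made up for illustration. It only covers the clean signal.

```python
import numpy as np

# Same synthetic signal as above: one time unit of a sine with frequency 0.01.
t = np.linspace(0, 1, 10000)
s = np.sin(2 * np.pi * t / 100)

# Fit one cubic centred on the middle of the window (assumption: a global
# polynomial fit replaces the Savitzky-Golay filter used in the answer).
u = t - 0.5
c3, c2, c1, c0 = np.polyfit(u, s, 3)

# At u = 0 the fitted derivatives are s' = c1 and s''' = 6 * c3.
# For s = c + sin(om * t) the ratio s''' / s' equals -om**2.
om = np.sqrt(-6 * c3 / c1)
est_freq = om / (2 * np.pi)
print(f"Estimated freq: {est_freq:.6f}")
```

With noise added, the third-derivative coefficient is the fragile part, so this is likely to degrade faster than the filtered version.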

Related

Solve motion equations for first ODE using scipy

I would like to solve motion first order ODE equations using scipy solve_ivp function. I can see that I'm doing something wrong because this should be an ellipse but I'm plotting only four points. Are you able to spot the mistake?
import math
import matplotlib.pyplot as plt
import numpy as np
import scipy.integrate
gim = 4*(math.pi**2)
x0 = 1 #x-position of the center or h
y0 = 0 #y-position of the center or k
vx0 = 0 #vx position
vy0 = 1.1* 2* math.pi #vy position
initial = [x0, y0, vx0, vy0] #initial state of the system
time = np.arange(0, 1000, 0.01) #period
def motion(t, Z):
    dx = Z[2] # vx
    dy = Z[3] # vy
    dvx = -gim/(x**2+y**2)**(3/2) * x * Z[2]
    dvy = -gim/(x**2+y**2)**(3/2) * y * Z[3]
    return [dx, dy, dvx, dvy]
sol = scipy.integrate.solve_ivp(motion, t_span=time, y0= initial, method='RK45')
plt.plot(sol.y[0],sol.y[1],"x", label="Scipy RK45 solution")
plt.show()
The code should not be able to run from a fresh workspace. The variables x,y that are used in the gravitation formula are not declared anywhere. So insert the line x,y = Z[:2] or similar.
The gravitation formula usually does not contain the velocity components. Remove Z[2], Z[3].
Check again what the time span and evaluation times arguments expect. The time span takes the first two values from the array. So change to t_span=time[[0,-1]] to build the pair of first and last time value.
The second plot suffers from insufficient evaluation points; the line segments used are too large. Passing your time array as t_eval should resolve that.
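Putting the points above together, a corrected version might look roughly like the sketch below. The shortened time span, the rtol/atol values and the name GM are my own assumptions for illustration; plotting is left out so the snippet runs headless.

```python
import numpy as np
from scipy.integrate import solve_ivp

GM = 4 * np.pi**2  # same value as `gim` in the question


def motion(t, Z):
    # unpack the state so x and y are actually defined
    x, y, vx, vy = Z
    r3 = (x**2 + y**2) ** 1.5
    # gravitational acceleration depends on position only, not on velocity
    return [vx, vy, -GM * x / r3, -GM * y / r3]


initial = [1.0, 0.0, 0.0, 1.1 * 2 * np.pi]
# assumption: 10 time units instead of 1000, to keep the run short
time = np.arange(0, 10, 0.01)

sol = solve_ivp(
    motion,
    t_span=(time[0], time[-1]),  # a (start, end) pair, not the whole array
    y0=initial,
    t_eval=time,                 # where the solution should be reported
    method="RK45",
    rtol=1e-9,                   # assumed tolerances, tighter than the defaults
    atol=1e-12,
)

r = np.hypot(sol.y[0], sol.y[1])
print(sol.success, r.min(), r.max())
```

sol.y[0] and sol.y[1] can then be passed to plt.plot as in the question; with t_eval set, there is one point per entry of time rather than one per internal solver step.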

changing range causes a distribution not normal

A post gives some code to plot this figure
import scipy.stats as ss
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(-10, 11)
xU, xL = x + 0.5, x - 0.5
prob = ss.norm.cdf(xU, scale = 3) - ss.norm.cdf(xL, scale = 3)
prob = prob / prob.sum() #normalize the probabilities so their sum is 1
nums = np.random.choice(x, size = 10000, p = prob)
plt.hist(nums, bins = len(x))
I modified this line
x = np.arange(-10, 11)
to this line
x = np.arange(10, 31)
I got this figure
How to fix that?
Given what you're asking Python to do, there's no error in this plot: it's a histogram of 10,000 samples from the tail (anything that rounds to between 10 and 30) of a normal distribution with mean 0 and standard deviation 3. Since probabilities drop off steeply in the tail of a normal, it happens that none of the 10,000 exceeded 17, which is why you didn't get the full range up to 30.
If you just want the x-axis of the plot to cover your full intended range, you could add plt.xlim(9.5, 30.5) after plt.hist.
If you want a histogram with support over this entire range, then you'll need to adjust the mean and/or variance of the distribution. For instance, if you specify that your normal distribution has mean 20 rather than mean 0 when you obtain prob, i.e.
prob = ss.norm.cdf(xU, loc=20, scale=3) - ss.norm.cdf(xL, loc=20, scale=3)
then you'll recover a similar-looking histogram, just translated to the right by 20.
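For reference, a minimal sketch of the shifted version with the plotting left out. The seeded generator (np.random.default_rng(0)) is my addition for reproducibility and is not in the original post.

```python
import numpy as np
import scipy.stats as ss

rng = np.random.default_rng(0)  # assumption: seeded generator, for repeatable draws

x = np.arange(10, 31)           # integers 10..30
xU, xL = x + 0.5, x - 0.5
# centre the normal on the middle of the new range instead of on 0
prob = ss.norm.cdf(xU, loc=20, scale=3) - ss.norm.cdf(xL, loc=20, scale=3)
prob = prob / prob.sum()        # normalize so the probabilities sum to 1
nums = rng.choice(x, size=10000, p=prob)

print(nums.min(), nums.max(), nums.mean())
```

Passing nums to plt.hist(nums, bins=len(x)) should then give the familiar bell shape, centred on 20.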

Numpy FFT give unexpected result when signal frequency falls exactly on a fft bin

When a signal's frequency falls exactly on an FFT bin, the amplitude becomes 0!
But if I offset the signal frequency a little bit off, the result is ok.
Reproducing code:
Here the signal's frequency is 30
import numpy as np
import matplotlib.pyplot as plt
N = 1024
Freq = 30
t = np.arange(N)
x = np.sin(2*np.pi*Freq/N*t)
f = np.fft.fft(x)
plt.plot(t, x)
plt.plot(t, f)
I would expect the output to have a huge spike in the 30th bin, but it's flat, as in the following figure.
However, if I change the frequency slightly to 30.1 so that it does not fall exactly on a bin,
import numpy as np
import matplotlib.pyplot as plt
N = 1024
Freq = 30.1
t = np.arange(N)
x = np.sin(2*np.pi*Freq/N*t)
f = np.fft.fft(x)
plt.plot(t, x)
plt.plot(t, f)
The result is correct as in the following figure:
WHY? Is this a numpy FFT implementation issue? Or is it a limitation of the standard FFT algorithm?
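To get the spectrum, you need to take the magnitude of the Fourier coefficients (squared, if you want power). Plotting the Fourier coefficients directly discards the imaginary component and only plots the real component. For a sine that falls exactly on a bin, the coefficient at that bin is purely imaginary, so its real part is zero up to rounding error and the plot looks flat. At 30.1 the energy leaks into neighbouring bins with nonzero real parts, which is why a spike showed up there.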
Technically x and f shouldn't be plotted on the same x-axis, since they have different meanings.
import numpy as np
import matplotlib.pyplot as plt
T = 1 # Total signal duration (s)
N = 1024 # samples over signal duration
Freq = 30 # frequency: (Hz)
t = np.arange(N)/N*T # time array
df = 1.0/T # frequency resolution (Hz)
f = np.arange(N)*df
x = np.sin(2*np.pi*Freq*t)
xhat = np.fft.fft(x) # discrete Fourier transform of x
plt.plot(t, x)
plt.xlabel("t (s)")
plt.ylabel("x")
plt.savefig("fig1.png")
plt.cla()
plt.plot(f, np.abs(xhat))
plt.xlabel("f (Hz)")
plt.ylabel("|fft(x)|")
plt.savefig("fig2.png")
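To see numerically what happened in the original plot, here is a small check, offered as a sketch (the name peak_bin is just for illustration). It inspects the coefficient at bin 30 directly instead of plotting it.

```python
import numpy as np

N = 1024
Freq = 30
t = np.arange(N)
x = np.sin(2 * np.pi * Freq / N * t)
xhat = np.fft.fft(x)

# The magnitude peaks at the expected bin ...
peak_bin = int(np.argmax(np.abs(xhat[:N // 2])))
print(peak_bin)

# ... but the coefficient there is (numerically) purely imaginary,
# so plotting only the real part shows nothing.
print(xhat[Freq].real, xhat[Freq].imag)
```

The imaginary part at bin 30 comes out close to -N/2, while the real part is at rounding-error level.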
f is a complex number; I should be using abs(f) for plotting.
It had slipped my mind :P

Financial time series: python Matplotlib "specgram" y-axis displaying Period instead of Frequency

python Matplotlib's "specgram" display of a heatmap showing frequency (y-axis) vs. time (x-axis) is useful for time series analysis, but I would like to have the y-axis displayed in terms of Period (= 1/frequency), rather than frequency. I am still asking if anyone has a complete working solution to achieve this?
The Python code immediately below generates the author's original plot using "specgram" and (currently commented out) a comparison with the suggested solution that was offered using "mlab.specgram". This suggested solution succeeds with the easy conversion from frequency to period = 1/frequency, but does not generate a viable plot for the author's example.
from __future__ import division
from datetime import datetime
import numpy as np
from pandas import DataFrame, Series
import pandas.io.data as web
import pandas as pd
from pylab import plot,show,subplot,specgram
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
################################################
# obtain data:
ticker = "SPY"
source = "google"
start_date = datetime(1999,1,1)
end_date = datetime(2012,1,1)
qt = web.DataReader(ticker, source, start_date, end_date)
qtC = qt.Close
################################################
data = qtC
fs = 1 # 1 sample / day
nfft = 128
# display the time-series data
fig = plt.figure()
ax1 = fig.add_subplot(311)
ax1.plot(range(len(data)),data)
#----------------
# Original version
##################
# specgram (NOT mlab.specgram) --> gives direct plot, but in Frequency space (want plot in Period, not freq).
ax2 = fig.add_subplot(212)
spec, freq, t = specgram(data, NFFT=nfft, Fs=fs, noverlap=0)
#----------------
"""
# StackOverflow version (with minor changes to axis titles)
########################
# calculate the spectrogram
spec, freq, t = mlab.specgram(data, NFFT=nfft, Fs=fs, noverlap=0)
# calculate the bin limits in time (x dir)
# note that there are n+1 fence posts
dt = t[1] - t[0]
t_edge = np.empty(len(t) + 1)
t_edge[:-1] = t - dt / 2.
# however, due to the way the spectrogram is calculated, the first and last bins
# are a bit different:
t_edge[0] = 0
t_edge[-1] = t_edge[0] + len(data) / fs
# calculate the frequency bin limits:
df = freq[1] - freq[0]
freq_edge = np.empty(len(freq) + 1)
freq_edge[:-1] = freq - df / 2.
freq_edge[-1] = freq_edge[-2] + df
# calculate the period bin limits, omit the zero frequency bin
p_edge = 1. / freq_edge[1:]
# we'll plot both
ax2 = fig.add_subplot(312)
ax2.pcolormesh(t_edge, freq_edge, spec)
ax2.set_ylim(0, fs/2)
ax2.set_ylabel('freq.[day^-1]')
ax3 = fig.add_subplot(313)
# note that the period has to be inverted both in the vector and the spectrum,
# as pcolormesh wants to have a positive difference between samples
ax3.pcolormesh(t_edge, p_edge[::-1], spec[:0:-1])
#ax3.set_ylim(0, 100/fs)
ax3.set_ylim(0, nfft)
ax3.set_xlabel('t [days]')
ax3.set_ylabel('period [days]')
"""
If you are only asking how to display the spectrogram differently, then it is actually rather straightforward.
One thing to note is that there are two functions called specgram: matplotlib.pyplot.specgram and matplotlib.mlab.specgram. The difference between these two is that the former draws a spectrogram whereas the latter only calculates one (and that's what we want).
The only slightly tricky thing is to calculate the colour mesh rectangle edge positions. We get the following from the specgram:
t: centerpoints in time
freq: frequency centers of the bins
For the time dimension it is easy to calculate the bin limits by the centers:
t_edge[n] = t[0] + (n - .5) * dt, where dt is the time difference of two consecutive bins
It would be similarly simple for frequencies:
f_edge[n] = freq[0] + (n - .5) * df
but we want to use the period instead of frequency. This makes the first bin unusable, and we'll have to toss the DC component away.
A bit of code:
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import numpy as np
# create some data: (fs = sampling frequency)
fs = 2000.
ts = np.arange(10000) / fs
sig = np.sin(500 * np.pi * ts)
sig[5000:8000] += np.sin(200 * np.pi * (ts[5000:8000] + 0.0005 * np.random.random(3000)))
# calculate the spectrogram
spec, freq, t = mlab.specgram(sig, Fs=fs)
# calculate the bin limits in time (x dir)
# note that there are n+1 fence posts
dt = t[1] - t[0]
t_edge = np.empty(len(t) + 1)
t_edge[:-1] = t - dt / 2.
# however, due to the way the spectrogram is calculated, the first and last bins
# are a bit different:
t_edge[0] = 0
t_edge[-1] = t_edge[0] + len(sig) / fs
# calculate the frequency bin limits:
df = freq[1] - freq[0]
freq_edge = np.empty(len(freq) + 1)
freq_edge[:-1] = freq - df / 2.
freq_edge[-1] = freq_edge[-2] + df
# calculate the period bin limits, omit the zero frequency bin
p_edge = 1. / freq_edge[1:]
# we'll plot both
fig = plt.figure()
ax1 = fig.add_subplot(211)
ax1.pcolormesh(t_edge, freq_edge, spec)
ax1.set_ylim(0, fs/2)
ax1.set_ylabel('frequency [Hz]')
ax2 = fig.add_subplot(212)
# note that the period has to be inverted both in the vector and the spectrum,
# as pcolormesh wants to have a positive difference between samples
ax2.pcolormesh(t_edge, p_edge[::-1], spec[:0:-1])
ax2.set_ylim(0, 100/fs)
ax2.set_xlabel('t [s]')
ax2.set_ylabel('period [s]')
This gives:
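The frequency-to-period edge conversion can be pulled out into a small helper, sketched below. The function name period_edges is hypothetical, and it assumes uniformly spaced frequency centers starting at 0, as mlab.specgram returns for real input.

```python
import numpy as np


def period_edges(freq):
    """Ascending period bin edges for uniformly spaced frequency centers.

    The zero-frequency (DC) bin is dropped, since its period is infinite.
    """
    df = freq[1] - freq[0]
    freq_edge = np.empty(len(freq) + 1)
    freq_edge[:-1] = freq - df / 2.0       # lower edge of every bin
    freq_edge[-1] = freq_edge[-2] + df     # upper edge of the last bin
    # skip the edge below DC, invert, and reverse so the values increase
    return (1.0 / freq_edge[1:])[::-1]


freq = np.arange(5) * 0.25                 # toy centers: 0, 0.25, ..., 1.0
p_edge = period_edges(freq)
print(p_edge)
```

The result has len(freq) entries, which matches the len(freq) - 1 rows of spec[:0:-1] passed to pcolormesh above.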

inverse of FFT not the same as original function

I don't understand why the ifft(fft(myFunction)) is not the same as my function. It seems to be the same shape but a factor of 2 out (ignoring the constant y-offset). All the documentation I can see says there is some normalisation that fft doesn't do, but that ifft should take care of that. Here's some example code below - you can see where I've bodged the factor of 2 to give me the right answer. Thanks for any help - it's driving me nuts.
import numpy as np
import scipy.fftpack as fftp
import matplotlib.pyplot as plt
def fourier_series(x, y, wn, n=None):
    # get FFT
    myfft = fftp.fft(y, n)
    # kill higher freqs above wavenumber wn
    myfft[wn:] = 0
    # make new series
    y2 = fftp.ifft(myfft).real
    # find constant y offset
    myfft[1:]=0
    c = fftp.ifft(myfft)[0]
    # remove c, apply factor of 2 and re apply c
    y2 = (y2-c)*2 + c
    plt.figure(num=None)
    plt.plot(x, y, x, y2)
    plt.show()
if __name__=='__main__':
    x = np.array([float(i) for i in range(0,360)])
    y = np.sin(2*np.pi/360*x) + np.sin(2*2*np.pi/360*x) + 5
    fourier_series(x, y, 3, 360)
You're removing half the spectrum when you do myfft[wn:] = 0. The negative frequencies are those in the top half of the array and are required.
You have a second fudge to get your results which is taking the real part to find y2: y2 = fftp.ifft(myfft).real (fftp.ifft(myfft) has a non-negligible imaginary part due to the asymmetry in the spectrum).
Fix it with myfft[wn:-wn] = 0 instead of myfft[wn:] = 0, and remove the fudges. So the fixed code looks something like:
import numpy as np
import scipy.fftpack as fftp
import matplotlib.pyplot as plt
def fourier_series(x, y, wn, n=None):
    # get FFT
    myfft = fftp.fft(y, n)
    # kill higher freqs above wavenumber wn
    myfft[wn:-wn] = 0
    # make new series
    y2 = fftp.ifft(myfft)
    plt.figure(num=None)
    plt.plot(x, y, x, y2)
    plt.show()
if __name__=='__main__':
    x = np.array([float(i) for i in range(0,360)])
    y = np.sin(2*np.pi/360*x) + np.sin(2*2*np.pi/360*x) + 5
    fourier_series(x, y, 3, 360)
It's really worth paying attention to the interim arrays that you are creating when trying to do signal processing. Invariably, there are clues as to what is going wrong that should direct you to the problem. In this case, you taking the real part masked the problem and made your task more difficult.
Just to add another quick point: Sometimes taking the real part of the resultant array is exactly the correct thing to do. It's often the case that you end up with an imaginary part to the signal output which is just down to numerical errors in the input to the inverse FFT. Typically this manifests itself as very small imaginary values, so taking the real part is basically the same array.
You are killing the negative frequencies between 0 and -wn.
I think what you mean to do is to set myfft to 0 for all frequencies outside [-wn, wn].
Change the following line:
myfft[wn:] = 0
to:
myfft[wn:-wn] = 0
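A quick way to convince yourself that the two-sided mask is enough, sketched with numpy.fft rather than scipy.fftpack (my substitution) and without plotting:

```python
import numpy as np

N = 360
x = np.arange(N, dtype=float)
y = np.sin(2 * np.pi / N * x) + np.sin(2 * 2 * np.pi / N * x) + 5

wn = 3
spec = np.fft.fft(y)
# keep bins 0..wn-1 (DC and positive frequencies) and the last wn bins
# (the matching negative frequencies); zero everything in between
spec[wn:-wn] = 0
y2 = np.fft.ifft(spec)

print(np.abs(y2.imag).max())      # rounding-error level
print(np.abs(y2.real - y).max())  # the signal is recovered without a fudge factor
```

Note that the slice keeps bin -wn while zeroing bin +wn. That does not matter here because the test signal has no content at wavenumber 3.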