I have time series data and I am trying to compute its FFT, but the call raises KeyError: Aligned when it tries to get the values.
My data looks like this:
This is the code:
import datetime
import numpy as np
import scipy as sp
import scipy.fftpack
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
temp_fft = sp.fftpack.fft(data3)
It looks like your data is a pandas Series. fft works on NumPy arrays rather than on Series.
The easy fix is to convert your Series into a NumPy array, either via
data3.values
or
np.array(data3)
You can then pass that array into the fft function. So the end result is:
temp_fft = sp.fftpack.fft(data3.values)
This should work for you now.
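As a quick check of the conversion above, here is a minimal sketch on synthetic data. The name data3 follows the question; the date index, the 5-cycle sine, and the use of numpy.fft in place of scipy.fftpack are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data3: 100 daily samples of a sine with 5 full cycles
index = pd.date_range("2020-01-01", periods=100, freq="D")
data3 = pd.Series(np.sin(2 * np.pi * 5 * np.arange(100) / 100), index=index)

# Convert the Series to a plain NumPy array before calling the FFT
values = data3.to_numpy()
temp_fft = np.fft.fft(values)

# Index of the strongest bin in the positive-frequency half
peak_bin = int(np.argmax(np.abs(temp_fft[:50])))
print(peak_bin)
```

The date index is dropped by to_numpy(), so the FFT only sees the sample values.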
I have a set of simulation data on which I want to perform an FFT. I am using scipy.fftpack for the transform and matplotlib for plotting. However, the FFT looks strange, so I don't know if I am missing something in my code. I would appreciate any help.
Original data:
time-varying data
FFT:
FFT
Code for the FFT calculation:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.fftpack as fftpack
data = pd.read_csv('table.txt',header=0,sep="\t")
fig, ax = plt.subplots()
mz_res=data[['mz ()']].to_numpy()
time=data[['# t (s)']].to_numpy()
ax.plot(time[:300],mz_res[:300])
ax.set_title("Time-varying mz component")
ax.set_xlabel('time')
ax.set_ylabel('mz amplitude')
fft_res=fftpack.fft(mz_res[:300])
power=np.abs(fft_res)
frequencies=fftpack.fftfreq(fft_res.size)
fig2, ax_fft=plt.subplots()
ax_fft.plot(frequencies[:150],power[:150]) # taking just half of the frequency range
I am just plotting the first 300 datapoints because the rest is not important.
Am I doing something wrong here? I was expecting single frequency peaks, not what I got. Thanks!
Link for the input file:
Pastebin
EDIT
It turns out the mistake was in the conversion of the dataframe to a numpy array. Selecting with double brackets, data[['mz ()']], returns a one-column dataframe, so to_numpy() gives a 2-D array of shape (n, 1), i.e., each element of the resulting array is itself an array of a single element. When I change the code to:
mz_res=data['mz ()'].to_numpy()
so that it is a conversion from a pandas series to a numpy array, then the FFT behaves as expected and I get single frequency peaks from the FFT.
So I just put this here in case someone else finds it useful. Lesson learned: the conversion from a pandas series to a numpy array yields a different result than the conversion from a pandas dataframe.
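The difference is easy to see on a small example. This is only a sketch: the 8-sample cosine stands in for the real table.txt data, and numpy.fft is used here instead of scipy.fftpack (both transform along the last axis by default).

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for table.txt: 8 samples, 2 full cycles
n = 8
data = pd.DataFrame({"mz ()": np.cos(2 * np.pi * 2 * np.arange(n) / n)})

as_frame = data[["mz ()"]].to_numpy()   # double brackets: DataFrame -> 2-D array
as_series = data["mz ()"].to_numpy()    # single brackets: Series -> 1-D array
print(as_frame.shape, as_series.shape)

# The FFT runs along the last axis. For the (n, 1) array that axis has length 1,
# so every "transform" just returns the sample itself.
fft_frame = np.fft.fft(as_frame)
fft_series = np.fft.fft(as_series)
print(np.allclose(fft_frame, as_frame))
print(np.abs(fft_series).round(6))
```

This is why the "FFT" of the 2-D array looked like the magnitude of the original signal rather than a spectrum.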
Solution:
Using the conversion from pandas series to numpy array instead of pandas dataframe to numpy array.
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack as fftpack
data = pd.read_csv('table.txt',header=0,sep="\t")
fig, ax = plt.subplots()
mz_res=data['mz ()'].to_numpy() #series to array
time=data[['# t (s)']].to_numpy() #dataframe to array
ax.plot(time,mz_res)
ax.set_title("Time-varying mz component")
ax.set_xlabel('time')
ax.set_ylabel('mz amplitude')
fft_res=fftpack.fft(mz_res)
power=np.abs(fft_res)
frequencies=fftpack.fftfreq(fft_res.size)
indices=np.where(frequencies>0)
freq_pos=frequencies[indices]
power_pos=power[indices]
fig2, ax_fft=plt.subplots()
ax_fft.plot(freq_pos,power_pos) # taking just half of the frequency range
ax_fft.set_title("FFT")
ax_fft.set_xlabel('Frequency (Hz)')
ax_fft.set_ylabel('FFT Amplitude')
ax_fft.set_yscale('linear')
Yields:
Time-dependence
FFT
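One caveat about the frequency axis: fftfreq is called above without a sample spacing, so the values it returns are in cycles per sample, not Hz. A minimal sketch of getting Hz, assuming uniformly spaced samples; the 1 ms spacing and the 50 Hz test signal are made up for illustration, and numpy.fft is used in place of scipy.fftpack.

```python
import numpy as np

dt = 0.001                      # assumed sample spacing in seconds
n = 1000                        # number of samples
t = np.arange(n) * dt
signal = np.sin(2 * np.pi * 50.0 * t)   # hypothetical 50 Hz test signal

# rfft/rfftfreq return only the non-negative frequencies of a real signal
amplitude = np.abs(np.fft.rfft(signal))
freq_hz = np.fft.rfftfreq(n, d=dt)       # d=dt converts bins to Hz

peak_hz = freq_hz[np.argmax(amplitude)]
print(peak_hz)
```

With real data, dt could be taken from the difference between consecutive values in the time column, provided the sampling is uniform.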
In the table below, I have values and their frequencies. I'd like to draw a box plot from them in a Jupyter Notebook. I googled it but was not able to find an answer.
My idea is to create a column, 2,2,2,2,4,4,4,4,4,4,4,...
But I think there must be a better way.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
value=np.array([2,4,6,7,10])
freq=np.array([4,7,8,5,2])
# do something here
plt.boxplot(newdata)
plt.show()
Use NumPy's repeat:
newdata = np.repeat(value,freq)
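A short sketch of what that call produces with the arrays from the question; the plotting step is left out so the snippet stays self-contained.

```python
import numpy as np

value = np.array([2, 4, 6, 7, 10])
freq = np.array([4, 7, 8, 5, 2])

# Repeat each value as many times as its frequency says
newdata = np.repeat(value, freq)
print(newdata)
print(len(newdata), np.median(newdata))
```

The expanded array can be passed to plt.boxplot(newdata) exactly as in the question.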
I tried to make a categorical plot of my data. This is my code and the resulting graph.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
sns.set_style("ticks")
sns.set_context("paper", font_scale=1, rc={"lines.linewidth": 6})
sns.catplot(y = "Region",x = "Interest by subregion",data = sample)
Image:
How can I make the y-labels more spread out and have a bigger font?
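catplot creates its own figure, so set its size through the height and aspect arguments, e.g. sns.catplot(..., height=8, aspect=1.5), and enlarge the labels with sns.set_context("paper", font_scale=1.5).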
Try different values for these parameters to get the best results.
Let's say, I have the following code.
import numpy as np
import pandas as pd
x = pd.DataFrame(np.random.randn(100, 3)).rolling(window=10, center=True).cov()
For each index, I have a 3x3 matrix. I would like to calculate its eigenvalues and then some function of those eigenvalues, or perhaps some function of both the eigenvalues and the eigenvectors. If I take x.loc[0], I have no problem computing anything from that matrix. How do I do this in a rolling fashion for all the matrices?
Thanks!
You can use the eigenvalue/eigenvector routines in scipy.linalg.
import numpy as np
import pandas as pd
from scipy import linalg as LA
x = pd.DataFrame(np.random.randn(100, 3)).rolling(window=10, center=True).cov()
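# The rolling covariance has a (row, column) MultiIndex; loop over the row labels
for i in x.index.get_level_values(0).unique():
    try:
        e_vals, e_vec = LA.eig(x.loc[i].to_numpy())
        print(e_vals, e_vec)
    except ValueError:
        # LA.eig rejects matrices that contain NaN (incomplete windows)
        continue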
If no NaN values are present, you can drop the try/except and use the plain for loop.
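The loop can also be avoided. The following is a sketch of an alternative, not the method above: it reshapes the rolling covariance into a stack of 3x3 arrays and uses numpy.linalg.eigh, which accepts stacked symmetric matrices. The "largest eigenvalue" is only an example of a function of the eigenvalues, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

n_rows, n_cols = 100, 3
df = pd.DataFrame(np.random.randn(n_rows, n_cols))
x = df.rolling(window=10, center=True).cov()

# x has one 3x3 block per row of df, stacked along a (row, column) MultiIndex
mats = x.to_numpy().reshape(n_rows, n_cols, n_cols)

# Keep only complete windows (no NaN in the block)
valid = ~np.isnan(mats).any(axis=(1, 2))

# eigh handles a whole stack of symmetric matrices in one call
e_vals, e_vecs = np.linalg.eigh(mats[valid])

# Example function of the eigenvalues: the largest one per window
largest = pd.Series(e_vals[:, -1], index=df.index[valid])
print(largest.head())
```

eigh returns real eigenvalues in ascending order, which suits covariance matrices; for general non-symmetric matrices numpy.linalg.eig would be the counterpart.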
Assume we want to plot a time series, e.g.:
import pandas as pd
import numpy as np
a=pd.date_range(start='2010-01-01',end='2014-01-01' , freq='D')
b=pd.Series(np.random.randn(len(a)), index=a)
b.plot()
The result is a figure whose x-axis has years as labels, but I would like month-year labels. Is there a quick way to do this (ideally without tens of lines of complex matplotlib code)?
Pandas does some really weird stuff to the Axes objects, making it hard to avoid matplotlib calls.
Here's how I would do it
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
a = pd.date_range(start='2010-01-01',end='2014-01-01' , freq='D')
b = pd.Series(np.random.randn(len(a)), index=a)
fig, ax = plt.subplots()
ax.plot(b.index, b)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
which gives me:
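If the default tick spacing turns out too dense or too sparse, a locator can be set alongside the formatter. A minimal sketch, assuming ticks every January and July are wanted; that choice, the Agg backend, and the use of pd.date_range to build the index are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

a = pd.date_range(start="2010-01-01", end="2014-01-01", freq="D")
b = pd.Series(np.random.randn(len(a)), index=a)

fig, ax = plt.subplots()
ax.plot(b.index, b.to_numpy())

# One major tick on the first day of every January and July
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=(1, 7)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
```

In a notebook the Agg line can be dropped and the figure will display inline.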