I am trying to recreate the result of a full fft2 by manipulating the result of an rfft2. The documentation states that rfft2 only computes the non-negative frequency coefficients, since for real input the negative ones are redundant by symmetry. This would be extremely useful for large arrays, since computing the rfft2 is much faster than the full fft2.
The code below is my attempt to recreate the fft2 from the rfft2 output. I have tried all kinds of manipulations of the "left" array and can't quite get "same" to be true everywhere. Any ideas?
import numpy as np
import matplotlib.pyplot as plt
from skimage.data import camera
frame = camera()
full_fft = np.fft.fft2(frame)
real_fft = np.fft.rfft2(frame)
# attempt: mirror the positive-frequency half to rebuild the negative half
left = real_fft[:, :-1].copy()
right = np.flipud(left[:, ::-1])
sim_fft2 = np.hstack((left, right))
same = np.isclose(full_fft, sim_fft2)
plt.figure()
plt.imshow(same)
plt.figure()
plt.imshow(np.log(np.abs(full_fft)))
plt.figure()
plt.imshow(np.log(np.abs(sim_fft2)))
I figured out the symmetry by running fft2 on a 6x6 array, which made it straightforward to write a function that converts the output of an rfft2 into the output of a fft2. Below is that function.
def _rfft2_to_fft2(im_shape, rfft):
    fcols = im_shape[-1]        # number of columns in the full fft2
    fft_cols = rfft.shape[-1]   # number of columns returned by rfft2
    result = np.zeros(im_shape, dtype=rfft.dtype)
    result[:, :fft_cols] = rfft
    top = rfft[0, 1:]           # first row, excluding the DC term
    if fcols % 2 == 0:
        # even width: the Nyquist column is shared between both halves
        result[0, fft_cols-1:] = top[::-1].conj()
        mid = rfft[1:, 1:]
        mid = np.hstack((mid, mid[::-1, ::-1][:, 1:].conj()))
    else:
        result[0, fft_cols:] = top[::-1].conj()
        mid = rfft[1:, 1:]
        mid = np.hstack((mid, mid[::-1, ::-1].conj()))
    result[1:, 1:] = mid
    return result
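As a quick check, converting the rfft2 output from the question should now match the full transform everywhere (this assumes the frame, full_fft and real_fft variables from the snippet above are still in scope):

sim_fft2 = _rfft2_to_fft2(frame.shape, real_fft)
print(np.allclose(full_fft, sim_fft2))  # expected: True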
I currently need to include STFT layers in a neural network, and I am using TensorFlow's STFT.
To test it, I applied STFT and iSTFT transforms several times in succession to a signal, and the amplitude of the signal grows with each round trip.
The parameters I used are the following (both for STFT and iSTFT):
STFT = tf.signal.stft(np.array(sig), frame_length=174, frame_step=43, fft_length=174)
I tried librosa's STFT with the exact same parameters and this doesn't happen (the signal is unchanged after each transform). Is this normal behaviour? Am I missing something?
Edit: Here is an example of what I mean:
This is what a simple sine looks like after three consecutive STFT/iSTFT round trips with librosa (you only see one curve because they overlap exactly, as they should).
This is what the same curve looks like after three consecutive STFT/iSTFT round trips with tensorflow. The original sine wave is in blue; I get the yellow curve after one STFT/iSTFT, then green, then red.
Here is the exact code:
import librosa
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
LENGTH = 88

def get_STFT_librosa(sig):
    fft_size = (LENGTH-1)*2
    hop_size = int(fft_size/4)
    spec_librosa = librosa.core.stft(np.array(sig), n_fft=fft_size,
                                     hop_length=hop_size, win_length=fft_size)
    return spec_librosa

def get_iSTFT_librosa(spec):
    fft_size = (LENGTH-1)*2
    hop_size = int(fft_size/4)
    inverse = librosa.core.istft(spec, hop_length=hop_size, win_length=fft_size)
    return inverse

def get_STFT_tf(sig):
    STFT = tf.signal.stft(np.array(sig), frame_length=(LENGTH-1)*2,
                          frame_step=int((LENGTH-1)*2/4), fft_length=(LENGTH-1)*2)
    STFT = STFT.numpy()
    spec_tf = STFT.T
    return spec_tf

def get_iSTFT_tf(spec):
    STFT = tf.convert_to_tensor(spec.T)
    sig = tf.signal.inverse_stft(STFT, frame_length=(LENGTH-1)*2,
                                 frame_step=int((LENGTH-1)*2/4), fft_length=(LENGTH-1)*2)
    return sig.numpy()
def librosa_tf_comparison():
    time = np.arange(0, 1000, 0.1)
    sig = np.sin(time)
    # STFT level 1
    tf_STFT = get_STFT_tf(sig)
    lib_STFT = get_STFT_librosa(sig)
    tf_iSTFT = get_iSTFT_tf(tf_STFT)
    lib_iSTFT = get_iSTFT_librosa(lib_STFT)
    # STFT level 2
    tf_STFT2 = get_STFT_tf(tf_iSTFT)
    lib_STFT2 = get_STFT_librosa(lib_iSTFT)
    tf_iSTFT2 = get_iSTFT_tf(tf_STFT2)
    lib_iSTFT2 = get_iSTFT_librosa(lib_STFT2)
    # STFT level 3
    tf_STFT3 = get_STFT_tf(tf_iSTFT2)
    lib_STFT3 = get_STFT_librosa(lib_iSTFT2)
    tf_iSTFT3 = get_iSTFT_tf(tf_STFT3)
    lib_iSTFT3 = get_iSTFT_librosa(lib_STFT3)
    plt.plot(sig[:800])
    plt.plot(tf_iSTFT[:800])
    plt.plot(tf_iSTFT2[:800])
    plt.plot(tf_iSTFT3[:800])
    plt.show()
    plt.plot(sig[:800])
    plt.plot(lib_iSTFT[:800])
    plt.plot(lib_iSTFT2[:800])
    plt.plot(lib_iSTFT3[:800])
    plt.show()
librosa_tf_comparison()
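One thing worth checking is the synthesis window: tf.signal.inverse_stft applies a window on reconstruction, and TensorFlow provides tf.signal.inverse_stft_window_fn to generate a window that normalizes the overlap-add gain of the forward (default Hann) window. A minimal sketch of the inverse with that window, assuming the same frame parameters as above:

def get_iSTFT_tf_normalized(spec):
    # hypothetical variant of get_iSTFT_tf: identical parameters, plus a
    # synthesis window that compensates for the analysis window's
    # overlap-add gain
    frame_length = (LENGTH-1)*2
    frame_step = frame_length//4
    STFT = tf.convert_to_tensor(spec.T)
    sig = tf.signal.inverse_stft(STFT, frame_length=frame_length,
                                 frame_step=frame_step, fft_length=frame_length,
                                 window_fn=tf.signal.inverse_stft_window_fn(frame_step))
    return sig.numpy()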
I have been trying to smooth curves with a Savitzky-Golay filter (SciPy's savgol_filter) and, in several of my attempts, raising the polynomial degree resulted in "drops" like the one in the sample output shown below. This example uses Google Trends data, but I had similar problems with stock data and electricity consumption data. Any lead as to why it behaves like this, or how to solve it (and still be able to raise the polynomial degree), would be highly appreciated.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
from pytrends.request import TrendReq
pytrends = TrendReq(hl='en-US', tz=360)
from scipy.signal import savgol_filter
kw_list = ["Carbon footprint"]
pytrends.build_payload(kw_list, timeframe='2004-12-14 2019-12-25', geo='', gprop='')
da1 = pytrends.interest_over_time()
# Savgol needs an odd window length, so drop one record if the count is even (this series used to have 196 records)
Y3 = da1["Carbon footprint"]
fig = plt.figure(figsize=(18,9))
l = Y3.shape[0]
l = l if l%2 == 1 else l-1
# window = odd number closest to size of data
ax1 = plt.subplot(2,1,1)
ax1 = sns.lineplot(data=Y3, color="navy")
#Savgol with polynomial order = 7 is fine (but misses the initial plateau)
Y3_smooth = savgol_filter(Y3,l, 7)
ax1 = sns.lineplot(x=da1.index.to_pydatetime(),y=Y3_smooth, color="red")
plt.title(f"red = with Savgol, polynomial order = 7, window = {l}", fontsize=18)
ax2 = plt.subplot(2,1,2)
ax2 = sns.lineplot(data=Y3, color="navy")
#Savgol with polynomial order = 9 or more has a weird drop
Y3_smooth = savgol_filter(Y3,l, 10)
ax2 = sns.lineplot(x=da1.index.to_pydatetime(),y=Y3_smooth, color="red")
plt.title(f"red = with Savgol, polynomial order = 10, window = {l}", fontsize=18)
Sample output
If anyone is interested, I found this workaround using a different smoothing method. It works well, including at the beginning and end of the series, and allows fine tuning of the degree of smoothing.
from scipy.ndimage import gaussian_filter1d

def smooth(y, sigma=2):
    # sigma controls the degree of smoothing (larger = smoother)
    y_smooth = gaussian_filter1d(y, sigma)
    return y_smooth
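For example, applied to the trends series from above (the sigma value here is just an illustrative pick):

Y3_smooth = smooth(Y3.values.astype(float), sigma=3)
ax = sns.lineplot(x=da1.index.to_pydatetime(), y=Y3_smooth, color="red")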
I have a vector of 5 different values that I use as my sample, and the label is a single integer: 0, 1, or 3. The machine learning algorithms work when I pass an array as a sample, but I get the warning below. How do I pass feature vectors without getting this warning?
import numpy as np
from numpy import random
from sklearn import neighbors
from sklearn.model_selection import train_test_split
import pandas as pd
filepath = 'test.csv'
# example label values
index = [0,1,3,1,1,1,0,0]
# example sample arrays
data = []
for i in range(len(index)):
    d = []
    for j in range(6):  # six feature values per sample
        d.append(random.randint(50, 200))
    data.append(d)
feat1 = 'brightness'
feat2, feat3, feat4 = ['h', 's', 'v']
feat5 = 'median hue'
feat6 = 'median value'
features = [feat1, feat2, feat3, feat4, feat5, feat6]
df = pd.DataFrame(data, columns=features, index=index)
df.index.name = 'state'
with open(filepath, 'a') as f:
    df.to_csv(f, header=f.tell() == 0)
states = pd.read_csv(filepath, usecols=['state'])
df_partial = pd.read_csv(filepath, usecols=features)
states = states.astype(np.float32)
states = states.values
labels = states
samples = np.array([])
for i, row in df_partial.iterrows():
    r = row.values
    samples = np.vstack((samples, r)) if samples.size else r
n_neighbors = 5
test_size = .3
labels, test_labels, samples, test_samples = train_test_split(labels, samples, test_size=test_size)
clf1 = neighbors.KNeighborsClassifier(n_neighbors, weights='distance')
clf1 = clf1.fit(samples, labels)
score1 = clf1.score(test_samples, test_labels)
print("Here's how the models performed \nknn: %d %%" %(score1 * 100))
Warning:
"DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). clf1 = clf1.fit(samples, labels)"
See the sklearn documentation for fit(self, X, y).
Try replacing
states = states.values with states = states.values.flatten()
OR
clf1 = clf1.fit(samples, labels) with clf1 = clf1.fit(samples, labels.flatten()).
states = states.values holds the correct labels that were stored in your pandas DataFrame, but they are stored as a column vector, one per row. Using .flatten() puts all those labels into a single 1-D array. (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.ndarray.flatten.html)
In sklearn's KNeighborsClassifier documentation (https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html), the example shows that the labels must be stored in a single 1-D array: y = [0, 0, 1, 1].
When you retrieve the data from the states DataFrame, it comes back as a column vector (one value per row), whereas fit expects the values in a single row.
You can also use the ravel() function, which creates a contiguous flattened array.
numpy.ravel(array, order='C'): returns a contiguous flattened array (a 1-D array with all the input-array elements, of the same type).
Try:
states = states.values.ravel() in place of states = states.values
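A minimal illustration of the shape difference (with made-up label values):

import numpy as np

col = np.array([[0], [1], [3]])  # column vector, shape (3, 1) -- triggers the warning
flat = col.ravel()               # 1-D array, shape (3,): array([0, 1, 3])
print(col.shape, flat.shape)     # (3, 1) (3,)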
In matplotlib, it's easy to draw a line from data points with plt.plot(xs, ys, '-'+marker). This gets you an undirected line: you can't tell from the resulting diagram which end corresponds to the beginning of the arrays of data points and which to the end. For what I'm doing, it's important to be able to tell which end is which, or equivalently, which direction the line is going. What is the recommended way to plot a line so as to obtain that visual distinction?
One option would be to add some arrow heads along the line. This can be done using a FancyArrowPatch.
import numpy as np ; np.random.seed(7)
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch

class RL(object):
    """A smooth 2D random walk of n points with step length d."""
    def __init__(self, n, d, s=0.1):
        a = np.random.randn(n)*s
        a[0] = np.random.rand(1)*np.pi*2
        self.xy = np.random.rand(n, 2)*5
        self.xy[1, :] = self.xy[0, :] + np.array([d*np.cos(a[0]), d*np.sin(a[0])])
        for i in range(2, n):
            (x, y), = np.diff(self.xy[i-2:i, :], axis=0)
            na = np.arctan2(y, x) + a[i]
            self.xy[i, :] = self.xy[i-1, :] + np.array([d*np.cos(na), d*np.sin(na)])
        self.x = self.xy[:, 0]; self.y = self.xy[:, 1]

l1 = RL(1000, 0.005)
l2 = RL(1000, 0.007)
l3 = RL(1000, 0.005)

fig, ax = plt.subplots()
ax.set_aspect("equal")
ax.plot(l1.x, l1.y)
ax.plot(l2.x, l2.y)
ax.plot(l3.x, l3.y)
ax.plot(l1.x[0], l1.y[0], marker="o")

def arrow(x, y, ax, n):
    # place n arrow heads at evenly spaced points along the line
    d = len(x)//(n+1)
    ind = np.arange(d, len(x), d)
    for i in ind:
        ar = FancyArrowPatch((x[i-1], y[i-1]), (x[i], y[i]),
                             arrowstyle='->', mutation_scale=20)
        ax.add_patch(ar)

arrow(l1.x, l1.y, ax, 3)
arrow(l2.x, l2.y, ax, 6)
arrow(l3.x, l3.y, ax, 10)

plt.show()
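If a single arrow head at the end of each line would be enough, ax.annotate with an empty label can draw one directly, without the helper function (a minimal sketch for the first line above):

# arrow head pointing from the second-to-last point to the last one
ax.annotate("", xy=(l1.x[-1], l1.y[-1]), xytext=(l1.x[-2], l1.y[-2]),
            arrowprops=dict(arrowstyle="->"))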
I'd like to segment my image using scipy.ndimage's label and then, based on the number of pixels in each labeled region, remove the regions which satisfy my criterion. For example, suppose an image with regions in it were created and segmented like this:
from numpy import ones, zeros
from numpy.random import randint
from scipy.ndimage import label

image = zeros((512, 512), dtype='int')
regionator = ones((11, 11), dtype='int')
# randint's upper bound is exclusive (the deprecated random_integers was inclusive)
xs = randint(5, 507, size=500)
ys = randint(5, 507, size=500)
for x, y in zip(xs, ys):
    image[x-5:x+6, y-5:y+6] = regionator
labels, n_labels = label(image)
Now I'd like to retrieve the indices for each region which has a size greater than 121 pixels (or one regionator size). I'd then like to take those indices and set them to zero so they are no longer part of the labeled image. What is the most efficient way to accomplish this task?
Essentially something similar to MATLAB's regionprops or utilizing IDL's reverse_indices output from its histogram function.
I would use bincount and threshold the result to make a lookup table:
import numpy as np
threshold = 121
size = np.bincount(labels.ravel())
keep_labels = size <= threshold
# Make sure the background is left as 0/False
keep_labels[0] = 0
filtered_labels = keep_labels[labels]
In the last line above, I index the array keep_labels with the array labels. This is called advanced indexing in numpy, and it requires that labels be an integer array. Numpy then uses the elements of labels as indices into keep_labels and produces an array of the same shape as labels.
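A tiny, self-contained illustration of that lookup-table indexing (with made-up values):

import numpy as np

lut = np.array([False, True, False])      # keep-table indexed by label value
labels_demo = np.array([[0, 1], [2, 1]])  # a miniature label image
print(lut[labels_demo])                   # [[False  True]
                                          #  [False  True]]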
Here's what I've found to work for me so far, with good performance even for large datasets.
Using the get-indices approach taken from here, I've come to this:
from numpy import argsort, histogram, reshape, where
import bisect

h = histogram(labels, bins=n_labels)
h_inds = where(h[0] > 121)[0]

labels_f = labels.flatten()
sortedind = argsort(labels_f)
sorted_labels_f = labels_f[sortedind]

inds = []
for i in range(1, len(h_inds)):
    i1 = bisect.bisect_left(sorted_labels_f, h[1][h_inds[i]])
    i2 = bisect.bisect_right(sorted_labels_f, h[1][h_inds[i]])
    inds.extend(sortedind[i1:i2])

# Now get rid of all of those indices that were part of a label
# larger than 121 pixels
labels_f[inds] = 0
filtered_labels = reshape(labels_f, (512, 512))