I have a graph like this, and since I want to use a log scale on the frequency axis, I want to resample the data so the points are evenly distributed. When I use NumPy's interpolation function like this:
f_new = np.geomspace(f[0], f[-1], points)
mag_new = np.interp(f_new, f, mag)
it just interpolates between the neighboring points, but I want to take the average of the nearest points.
Is there an elegant NumPy-ish way to do this using some aggregation function?
Thanks in advance!
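One possible approach (just a sketch, not from the question; it assumes f is sorted, strictly positive, and the same length as mag) is to build log-spaced bin edges and average within each bin via np.histogram with weights:

import numpy as np

edges = np.geomspace(f[0], f[-1], points + 1)        # log-spaced bin edges
counts, _ = np.histogram(f, bins=edges)              # number of samples per bin
sums, _ = np.histogram(f, bins=edges, weights=mag)   # magnitude sum per bin
valid = counts > 0                                   # skip empty bins
f_new = np.sqrt(edges[:-1] * edges[1:])[valid]       # geometric bin centres
mag_new = sums[valid] / counts[valid]                # per-bin average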
I have spectrogram data from an audio analysis which looks like this:
On one axis I have frequencies in Hz and on the other, times in seconds. I added the grid over the map to show the actual data points. Due to the nature of the frequency analysis used, the best results never give evenly spaced time and frequency values.
To allow comparison of data from multiple sources, I would like to normalize this data. For this reason, I would like to calculate the peak values (maximum and minimum values) for specified areas in the map.
The second visualization shows the areas where I would like to calculate the peak values. I marked an area with a green rectangle to visualize this.
While for the time values I would like to use equally spaced ranges (e.g. 0.0-10.0, 10.0-20.0, 20.0-30.0), the frequency ranges are unevenly distributed. At higher frequencies, they will be something like 450-550, 550-1500, 1500-2500, ...
You can download an example dataset here: data.zip. You can unpack it like this:
import numpy as np

with np.load(DATA_PATH) as data:
    frequency_labels = data['frequency_labels']
    time_labels = data['time_labels']
    spectrogram_data = data['data']
DATA_PATH has to point to the path of the .npz data file.
As input, I would provide an array of frequency and time ranges. The result should be another 2d NumPy ndarray with either the maximum or the minimum values. As the amount of data is huge, I would like to rely on NumPy as much as possible to speed up the calculations.
How do I calculate the maximum/minimum values of defined areas from a 2d data map?
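For what it's worth, here is a minimal sketch of one way to do this (all names except those loaded above are hypothetical; it assumes frequency_labels and time_labels are sorted and index the rows and columns of spectrogram_data, and that each area contains at least one sample):

import numpy as np

def area_peaks(spectrogram_data, frequency_labels, time_labels,
               freq_ranges, time_ranges, reduce=np.max):
    # One value per (frequency range, time range) cell; pass reduce=np.min for minima
    result = np.empty((len(freq_ranges), len(time_ranges)))
    for i, (f_lo, f_hi) in enumerate(freq_ranges):
        f0, f1 = np.searchsorted(frequency_labels, [f_lo, f_hi])
        for j, (t_lo, t_hi) in enumerate(time_ranges):
            t0, t1 = np.searchsorted(time_labels, [t_lo, t_hi])
            result[i, j] = reduce(spectrogram_data[f0:f1, t0:t1])
    return result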
I am trying to do a time series prediction with ARIMA.
So, as the first step, I am doing some series transformation
# Taking log transform
dflog = np.log(df)
# Taking exponentially weighted mean
df_expwighted_mean = dflog.ewm(span=12).mean()
# Subtracting the exponentially weighted mean
df_expwighted_mean_diff = dflog - df_expwighted_mean
# Differencing
df_diff = df_expwighted_mean_diff - df_expwighted_mean_diff.shift()
# Filling NaN with zero
df_diff = df_diff.fillna(0)
And with the code below I am able to get back to the original series:
# Take the cumulative sum to reverse the differencing
bdf_expwighted_mean_diff = df_diff.cumsum()
# Add back the exponentially weighted mean, as we originally subtracted it
bdf_log = bdf_expwighted_mean_diff + df_expwighted_mean
# Exponentiate to reverse the log transform
bdf = np.exp(bdf_log)
But the problem comes when I do this on the predicted series.
It fails because I do not have the EWM of the predicted series (pdf_expwighted_mean).
So basically, I want some way to reverse the exponentially weighted mean:
df_expwighted_mean = dflog.ewm(span=12).mean()
Any thoughts?
It doesn't make sense to reverse the exponentially weighted mean in time series prediction. The exponentially weighted mean is used to smooth a time series; basically, you are trying to remove noise from the series that would otherwise make it hard to predict.
For example: let the red series be your actual data, the blue the EWMA series, and the green the series predicted from the EWMA series in the following image.
Once you use the smoothed series to predict, reversing the EWMA would mean adding noise back to it. You were able to do it on the source data because you kept the noise from your original data. Usually you just use the predictions on the EWMA as is, i.e. no reversing of the EWMA is required.
In your case, just do cumsum and exp (to reverse the differencing and the log).
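A minimal sketch of that last step (not from the answer; pred_diff and last_value are hypothetical names for the predicted differenced series and the last observed value of df_expwighted_mean_diff):

import numpy as np

# Reverse the differencing: cumulative sum anchored at the last observed value
pred_level = last_value + pred_diff.cumsum()
# Reverse the log transform
pred = np.exp(pred_level)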
I have a dataset with three columns: lat, lon, and wind speed. My goal is to have a 2-dimensional lat/lon gridded array that sums the wind speed observations that fall within each grid box. It seems like that should be possible with groupby or cut in pandas, but I can't puzzle out how to do that.
Here is an example of what I'm trying to replicate from another language: https://www.ncl.ucar.edu/Document/Functions/Built-in/bin_sum.shtml
It sounds like you are using pandas. Are the data already binned? If so, something like this should work:
data.groupby(["lat_bins", "lon_bins"]).sum()
If the lat and lon data are not binned yet, you can use pandas.cut to create a binned value column like this:
data["lat_bins"] = pandas.cut(x = data["lat"], bins=[...some binning...])
I have a 3D point distribution in a NumPy array of 1,000,000 points; let's call it points. I would like to take a 10% uniform sample so that the points are evenly distributed (e.g. every 10th point).
I think this is what I'm looking for, but this generates data; how do I sample existing data?
numpy.random.uniform(low=0.0, high=1.0, size=None)
In case I understood the problem right, you would just need to do this:
points[::10]
to get every 10th element of points.
If that is not what you wanted, please clarify.
Simple indexing will do it:
import numpy

# example data: 300 random 3-D points
x = numpy.random.uniform(low=0.0, high=1.0, size=(300, 3))
# sampled result: every 10th point
sample_step = 10
y = x[::sample_step]
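If a random 10% subset of the existing points is wanted instead of a strided one, one option (a sketch, assuming points is the (1000000, 3) array from the question) is NumPy's choice:

import numpy as np

rng = np.random.default_rng()
# Pick 10% of the row indices without replacement, then index the array
idx = rng.choice(len(points), size=len(points) // 10, replace=False)
sample = points[idx]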
I have sampled data and plotted it with imshow():
I would like to interpolate along just the horizontal axis so that I can more easily distinguish samples and spot features.
Is it possible to do interpolation in just one direction with MPL?
Update:
SciPy has a whole package with various interpolation methods.
I used the simplest one, interp1d, as suggested by tcaswell:
import numpy as np
from scipy import interpolate

def smooth_inter_fun(r):
    # Interpolate one row onto a 10x finer grid along the horizontal axis
    s = interpolate.interp1d(np.arange(len(r)), r)
    xnew = np.arange(0, len(r) - 1, 0.1)
    return s(xnew)

new_data = np.vstack([smooth_inter_fun(r) for r in data])
Linear and cubic results:
As expected :)
This tutorial covers a range of interpolation methods available in NumPy/SciPy. If you want just one direction, I would work on each row independently and then re-assemble the results. You might also be interested in simply smoothing your data (for example, Python Smooth Time Series Data, Using strides for an efficient moving average filter).
def smooth_inter_fun(r):
    # whatever per-row process you want to use
    ...

new_data = np.vstack([smooth_inter_fun(r) for r in data])