How to specify the number of knot points when using scipy.splprep - numpy

I have the following piece of code. It generates a 3-D cubic spline of a 3-D curve given in parametric form. I pretty much adapted this to my case using the online documentation for splprep and splev.
But there are some things I don't understand. Here is the code:
%matplotlib inline
from numpy import arange, cos, linspace, pi, sin, random
from scipy.interpolate import splprep, splev
import matplotlib.pyplot as plt
# make ascending spiral in 3-space
t=linspace(0,1.75*2*pi,100)
x = sin(t)
y = cos(t)
z = t
# spline parameters
s=3.0 # smoothness parameter
k=3 # spline order
nest=-1 # estimate of number of knots needed (-1 = maximal)
# find the knot points
tck,u = splprep([x,y,z],s=s,k=k,nest=nest)
# evaluate spline, including interpolated points
xnew,ynew,znew = splev(linspace(0,1,400),tck)
I have a few questions regarding this implementation.
What exactly does the (t,c,k) tuple contain in this case? I read the documentation and it says it returns the knot points, the coefficients and the degree of the spline. Don't the knot points have to be coordinates of the form (x, y, z)? So we should get "number of knots" such coordinate points, but that's not what gets returned; we simply get an array of length 11.
What does u return? (The documentation says it returns the values of the parameter.) What does that mean? The values of the parameter t?
When I use nest=-1 (this is the default) it uses the maximal number of knot points needed (in this case 11 knot points). But how do I specify my own number of knot points, say 50 or 80?
I am completely misunderstanding the documentation here. Can someone enlighten me, maybe with examples?

Values of parameter, u
The idea is that your points [x,y,z] are values of some parameterized curve, the original parameter being t in your example. Knowing the values of the parameter t helps in figuring out how to interpolate between these points. For this reason, you are given an option to pass the values of parameter as optional argument u (it would be u=t in this example). But if you choose not to do so, the method will make a guess for the values of the parameter, based on the distances between given points (the parameter interval will be from 0 to 1). This guess is then returned to you as variable u, so you know how the data was interpreted. If you do pass u=t as the argument, the u you get back is exactly the same.
You don't really need this u to use the spline. However, if it is desirable to compare the location of original [x,y,z] points to the values of the spline, then you may want to pass u as an argument to splev. A shorter way to explain the meaning of u is: it's what splev would need to reproduce the [x,y,z] coordinates that you started with, with some deviations due to smoothing.
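For a quick check, here is a small sketch reusing tck, u, x, y, z from the question's code (nothing new is computed, it just evaluates the spline at the returned parameter values):
# tck and u as returned by the splprep call in the question
xs, ys, zs = splev(u, tck)
# this approximately reproduces the input points; the residual is due to the smoothing (s=3)
print(max(abs(xs - x)), max(abs(ys - y)), max(abs(zs - z)))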
tck values
The knots of a spline, t, are the points in the parameter interval, not in the 3D space. Since in your example the parameter interval is [0,1], chosen by default, the values of t are in this range. A knot is a place on the parameter interval where some coefficients of the spline change. The endpoints 0 and 1 are technically multiple knots, so they are listed several times.
The 3D nature of the curve is expressed by coefficients c. You may notice it's a list of three arrays, one for each coordinate.
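To see this concretely, you can unpack the tuple (again reusing tck from the question; the names are chosen so they don't clash with the original t):
knots, coeffs, degree = tck
print(len(knots))    # 11 values, all in the parameter interval [0, 1] (endpoints repeated)
print(len(coeffs))   # 3 arrays of coefficients, one per coordinate (x, y, z)
print(degree)        # 3, the degree of the spline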
Number of knot points
With this interpolation routine, you have two choices:
tell it exactly what the knot points are (by giving task=-1 and providing the t argument with the knots). To avoid confusion: this t is not necessarily the t from which you got the points [x,y,z]; one doesn't necessarily want every sample point to be a knot point.
leave the determination of the knots, including their number, up to the routine.
In the second case the number of knots cannot be set directly, but it depends on the value of the smoothness parameter s, so it can be influenced indirectly. For example, with your data there are 11 knots with s=3, but 12 knots with s=1 and 14 knots with s=0.1.
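A rough sketch of both routes, reusing the arrays and imports from the question (the task=-1 part is left as comments because the exact knot convention should be checked against your SciPy version's docs):
# the smaller s is, the more knots the routine inserts
for s_val in (3.0, 1.0, 0.1):
    tck_s, u_s = splprep([x, y, z], s=s_val, k=3)
    print(s_val, len(tck_s[0]))   # 11, 12, 14 knots for this data
# to impose your own knots instead, the task=-1 route looks roughly like this
# (I believe t takes the interior knots, as for splrep, but check your version's docs):
# my_knots = linspace(0, 1, 22)[1:-1]
# tck_fixed, u_fixed = splprep([x, y, z], task=-1, t=my_knots, k=3)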

Related

numpy: Different results between diff and gradient for finite differences

I want to calculate the numerical derivative of two arrays a and b.
If I do
c = diff(a) / diff(b)
I get what I want, but I lose the edge (the last point), so c.shape != a.shape.
If I do
c = gradient(a, b)
then c.shape = a.shape, but I get a completely different result.
I have read how gradient is calculated in numpy and I guess it does a completely different thing, although I don't understand the difference quite well yet. But is there a way, or another function, to calculate the differential which also gives the values at the edges?
And why is the result so different between gradient and diff?
These functions, although related, do different things.
np.diff simply takes the differences between consecutive elements along a given axis; used for the n-th difference, it returns an array smaller by n along that axis (which is what you observed in the n=1 case). Please see: https://docs.scipy.org/doc/numpy/reference/generated/numpy.diff.html
np.gradient produces a set of gradients of an array along all its dimensions while preserving its shape: https://docs.scipy.org/doc/numpy/reference/generated/numpy.gradient.html. Please also observe that np.gradient expects a single input array; your second argument b does not do what you think here (it is interpreted as the first non-keyword argument from *varargs, which is meant to describe the spacing between the values of the first argument), hence the results that don't match your intuition.
I would simply use c = diff(a) / diff(b) and append values to c if you really need c.shape to match a.shape. For instance, you might append zeros if you expect the gradient to vanish close to the edges of your window.
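A minimal sketch of that padding idea (a and b below are just small placeholder arrays, not your data):
import numpy as np

a = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
b = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

c = np.diff(a) / np.diff(b)        # shape (len(a) - 1,)
# pad to a.shape, e.g. by repeating the last slope,
# or with a zero if the derivative should vanish at the edge
c_padded = np.append(c, c[-1])
print(c_padded.shape == a.shape)   # True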

Bad result plotting windowing FFT

I'm playing with Python and SciPy to understand windowing. I made a plot to see how windowing behaves under the FFT, but the result is not what I was expecting.
The plot is:
The middle plots are the pure FFT plots; this is where I get weird things.
Then I changed the trig function to get leakage, setting the first 300 items of the array straight to 1, with this result:
The code:
import numpy as np
import matplotlib.pyplot as plt
from numpy import pi, sqrt
sign_freq=80
sample_freq=3000
num=np.linspace(0,1,num=sample_freq)
i=0
#wave data:
sin=np.sin(2*pi*num*sign_freq)+np.sin(2*pi*num*sign_freq*2)
while i<1000:
    sin[i]=1
    i=i+1
#wave fft:
fft_sin=np.fft.fft(sin)
fft_freq_axis=np.fft.fftfreq(len(num),d=1/sample_freq)
#wave Linear Spectrum (Rms)
lin_spec=sqrt(2)*np.abs(np.fft.rfft(sin))/len(num)
lin_spec_freq_axis=np.fft.rfftfreq(len(num),d=1/sample_freq)
#window data:
hann=np.hanning(len(num))
#window fft:
fft_hann=np.fft.fft(hann)
#window fft Linear Spectrum:
wlin_spec=sqrt(2)*np.abs(np.fft.rfft(hann))/len(num)
#window + sin
wsin=hann*sin
#window + sin fft:
wsin_spec=sqrt(2)*np.abs(np.fft.rfft(wsin))/len(num)
wsin_spec_freq_axis=np.fft.rfftfreq(len(num),d=1/sample_freq)
fig=plt.figure()
ax1 = fig.add_subplot(431)
ax2 = fig.add_subplot(432)
ax3 = fig.add_subplot(433)
ax4 = fig.add_subplot(434)
ax5 = fig.add_subplot(435)
ax6 = fig.add_subplot(436)
ax7 = fig.add_subplot(413)
ax8 = fig.add_subplot(414)
ax1.plot(num,sin,'r')
ax2.plot(fft_freq_axis,abs(fft_sin),'r')
ax3.plot(lin_spec_freq_axis,lin_spec,'r')
ax4.plot(num,hann,'b')
ax5.plot(fft_freq_axis,fft_hann)
ax6.plot(lin_spec_freq_axis,wlin_spec)
ax7.plot(num,wsin,'c')
ax8.plot(wsin_spec_freq_axis,wsin_spec)
plt.show()
EDIT: as asked in the comments, I plotted the functions on a dB scale, obtaining much clearer plots. Thanks a lot @SleuthEye!
It appears the plot which is problematic is the one generated by:
ax5.plot(fft_freq_axis,fft_hann)
resulting in the graph:
instead of the expected graph from Wikipedia.
There are a number of issues with the way the plot is constructed. The first is that this command essentially attempts to plot a complex-valued array (fft_hann). You may in fact be getting the warning ComplexWarning: Casting complex values to real discards the imaginary part as a result. To generate a graph which looks like the one from Wikipedia, you would have to take the magnitude (instead of the real part) with:
ax5.plot(fft_freq_axis,abs(fft_hann))
Then we notice that there is still a line striking through our plot. Looking at np.fft.fft's documentation:
The values in the result follow so-called “standard” order: If A = fft(a, n), then A[0] contains the zero-frequency term (the sum of the signal), which is always purely real for real inputs. Then A[1:n/2] contains the positive-frequency terms, and A[n/2+1:] contains the negative-frequency terms, in order of decreasingly negative frequency.
[...]
The routine np.fft.fftfreq(n) returns an array giving the frequencies of corresponding elements in the output.
Indeed, if we print the fft_freq_axis we can see that the result is:
[ 0. 1. 2. ..., -3. -2. -1.]
To get around this problem we simply need to swap the lower and upper parts of the arrays with np.fft.fftshift:
ax5.plot(np.fft.fftshift(fft_freq_axis),np.fft.fftshift(abs(fft_hann)))
Then you should note that the graph on Wikipedia is actually shown with amplitudes in decibels. You would then need to do the same with:
ax5.plot(np.fft.fftshift(fft_freq_axis),np.fft.fftshift(20*np.log10(abs(fft_hann))))
We should then be getting closer, but the result is not quite the same as can be seen from the following figure:
This is due to the fact that the plot on Wikipedia actually has a higher frequency resolution and captures the value of the frequency spectrum as it oscillates, whereas your plot samples the spectrum at fewer points, and a lot of those points have near-zero amplitudes. To resolve this problem, we need to get the frequency spectrum of the window at more frequency points.
This can be done by zero padding the input to the FFT, or more simply setting the parameter n (desired length of the output) to a value much larger than the input size:
N = 8*len(num)
fft_freq_axis=np.fft.fftfreq(N,d=1/sample_freq)
fft_hann=np.fft.fft(hann, N)
ax5.plot(np.fft.fftshift(fft_freq_axis),np.fft.fftshift(20*np.log10(abs(fft_hann))))
ax5.set_xlim([-40, 40])
ax5.set_ylim([-50, 80])

Interpolating data onto a line of points

I have some irregularly spaced data and need to analyze it. I can successfully interpolate this data onto a regular grid using mlab.griddata (or rather, the natgrid implementation of it). This allows me to use pcolormesh and contour to generate plots, extract levels, etc. Using plt.contour, I then extract a certain level using get_paths from the contour object's CS.collections.
Now, what I'd like to do is then, with my original irregularly spaced data, interpolate some quantities onto this specific contour line (i.e., NOT onto a regular grid). The similarly named griddata function from Scipy allows for this behavior, and it almost works. However, I find that as I increase the number of original points, I can get odd erratic behavior in the interpolation. I'm wondering if there's a way around this, i.e., another way to interpolate irregularly spaced (or regularly spaced data for that matter, since I can use my regularly spaced data from mlab.griddata) onto a specific line.
Let me show some numerical examples of what I'm talking about. Take a look at this figure:
The top left shows my data as points, and the line shows an extracted level of level=0 from some data D that I have at those points (x,y) [note, I have data 'D', 'Energy', and 'Pressure', all defined in this (x,y) space]. Once I have this curve, I can plot the interpolated quantities of D, Energy, and Pressure onto my specific line. First, note the plot of D (middle, right). It should be zero at all points, but it's not quite zero at all points. The likely cause of this is that the line that corresponds to the 0 level is generated from a uniform set of points that came from mlab.griddata, whereas the plot of 'D' is generated from my ORIGINAL data interpolated onto that level curve. You can also see some unphysical wiggles in 'Energy' and 'Pressure'.
Okay, seems easy enough, right? Maybe I should just get more original data points along my level=0 curve. Getting some more of these points, I then generate the following plots:
First look at the top left. You can see that I've sampled the hell out of the (x,y) space in the vicinity of my level=0 curve. Furthermore, you can see that my new "D" plot (middle, right) now correctly interpolates to zero in the region where it originally didn't. But now I get some wiggles at the start of the curve, as well as some other wiggles in the 'Energy' and 'Pressure' in this space! It is far from obvious to me that this should occur, since my original data points are still there and I've only supplemented additional points. Furthermore, some regions where my interpolation is going bad aren't even near the points that I added in the second run -- they are exclusively neighbored by my original points.
So this brings me to my original question. I'm worried that the interpolation that produces the 'Energy', 'D', and 'Pressure' curves is not working correctly (this is scipy's griddata, called scigrid in the code below). Mlab's griddata only interpolates to a regular grid, whereas I want to interpolate onto the specific line shown in the top left plot. What's another way for me to do this?
Thanks for your time!
After posting this, I decided to try scipy.interpolate.SmoothBivariateSpline, which produced the following result:
You can now see that my line is smoothed, so it seems like this will work. I'll mark this as the answer unless someone posts something soon that hints that there may be an even better solution.
Edit: As requested, below is some of the code used to generate these plots. I don't have a minimally working example, and the above plots were generated in a larger framework of code, but I'll write the important parts schematically below with comments.
# x,y,z are lists of data where the first point is x[0],y[0],z[0], and so on
minx=min(x)
maxx=max(x)
miny=min(y)
maxy=max(y)
# convert to numpy arrays
x=np.array(x)
y=np.array(y)
z=np.array(z)
# here we are creating a fine grid to interpolate the data onto
xi=np.linspace(minx,maxx,100)
yi=np.linspace(miny,maxy,100)
# here we interpolate our data from the original x,y,z unstructured grid to the new
# fine, regular grid in xi,yi, returning the values zi
zi=griddata(x,y,z,xi,yi)
# now let's do some plotting
plt.figure()
# returns the CS contour object, from which we'll be able to get the path for the
# level=0 curve
CS=plt.contour(xi,yi,zi,levels=[0])
# can plot the original data if we want
plt.scatter(x,y,alpha=0.5,marker='x')
# now let's get the level=0 curve
for c in CS.collections:
    data=c.get_paths()[0].vertices
# lineX,lineY are simply the x,y coordinates for our level=0 curve, expressed as arrays
lineX=data[:,0]
lineY=data[:,1]
# so it's easy to plot this too
plt.plot(lineX,lineY)
# now what to do if we want to interpolate some other data we have, say z2
# (also at our original x,y positions), onto
# this level=0 curve?
# well, first I tried using scipy.interpolate.griddata == scigrid like so
origdata=np.transpose(np.vstack((x,y))) # just organizing this data like the
# scigrid routine expects
lineZ2=scigrid(origdata,z2,data,method='linear')
# plotting the above curve (as plt.plot(lineZ2)) gave me really bad results, so
# trying a spline approach
Z2spline=SmoothBivariateSpline(x,y,z2)
# the above creates a spline object on our original data. notice we haven't EVALUATED
# it anywhere yet (we'll want to evaluate it on our level curve)
Z2Line=[]
# here we evaluate the spline along all our points on the level curve, and store the
# result as a new list
for i in range(0,len(lineX)):
    Z2Line.append(Z2spline(lineX[i],lineY[i])[0][0]) # the [0][0] just extracts the scalar value, which is otherwise wrapped in an array structure
# you can then easily plot this
plt.plot(Z2Line)
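As a side note, a small sketch building on the variables above: BivariateSpline objects also have an ev method that evaluates the spline at arrays of points, which avoids the Python loop.
# equivalent to the loop above, but vectorized
Z2Line = Z2spline.ev(lineX, lineY)
plt.plot(Z2Line)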
Hope this helps someone!

Fitting curves to a set of points

Basically, I have a set of up to 100 co-ordinates, along with the desired tangents to the curve at the first and last point.
I have looked into various methods of curve-fitting, by which I mean an algorithm which takes the inputted data points and tangents and outputs the equation of the curve, such as the Gaussian method and interpolation, but I really struggled to understand them.
I am not asking for code (if you choose to give it, that's acceptable though :) ); I am simply looking for help with this algorithm. It will eventually be converted to Objective-C for an iPhone app, if that changes anything.
EDIT:
I know the order of all of the points. They are not too close together, so passing through all points is necessary - aka interpolation (unless anyone can suggest something else). And as far as I know, an algebraic curve is what I'm looking for. This is all being done on a 2D plane, by the way.
I'd recommend considering cubic splines. There is some explanation and code to calculate them in plain C in the Numerical Recipes book (chapter 3.3).
Most interpolation methods originally work with functions: given a set of x and y values, they compute a function which yields a y value for every x value, meeting the specified constraints. Since a function can only ever produce a single y value for every x value, such a curve cannot loop back on itself.
To turn this into a real 2D setup, you want two functions which compute x resp. y values based on some parameter that is conventionally called t. So the first step is computing t values for your input data. You can usually get a good approximation by summing over Euclidean distances: think of a polyline connecting all your points with straight segments; the parameter would then be the distance along this line for every input pair.
So now you have two interpolation problems: one to compute x from t and the other y from t. You can formulate each as a spline interpolation, e.g. using cubic splines. That gives you a large system of linear equations which you can solve iteratively up to the desired precision.
The result of a spline interpolation will be a piecewise description of a suitable curve. If you wanted a single equation, then a Lagrange interpolation would fit that bill, but the result might have odd twists and turns for many sets of input data.
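For what it's worth, here is a rough Python/SciPy sketch of that approach (chord-length parameter t, one clamped cubic spline per coordinate; the points and tangent values below are made-up placeholders, not the asker's data):
import numpy as np
from scipy.interpolate import CubicSpline

# ordered 2D points and the desired tangents (dx/dt, dy/dt) at the two ends
pts = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
tan_start = np.array([1.0, 0.0])
tan_end = np.array([0.0, -1.0])

# parameter t: cumulative euclidean distance along the polyline through the points
t = np.concatenate(([0.0], np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))))

# one spline per coordinate; bc_type clamps the first derivative at both ends
sx = CubicSpline(t, pts[:, 0], bc_type=((1, tan_start[0]), (1, tan_end[0])))
sy = CubicSpline(t, pts[:, 1], bc_type=((1, tan_start[1]), (1, tan_end[1])))

# evaluate the curve densely in t to draw it
tt = np.linspace(t[0], t[-1], 200)
curve = np.column_stack((sx(tt), sy(tt)))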

Creating grid and interpolating (x,y,z) for contour plot sagemath

I have values in the form (x, y, z). By creating a list_plot3d plot I can clearly see that they are not quite evenly spaced; they usually form little "blobs" of 3 to 5 points on the xy plane. So, for the interpolation and the final "contour" plot to be better, or should I say smoother(?), do I have to create a rectangular grid (like the squares on a chess board) so that the blobs of data are somehow "smoothed"? I understand that this might be trivial to some people, but I am trying this for the first time and I am struggling a bit. I have been looking at the scipy packages like scipy.interpolate.interp2d, but the graphs produced at the end are really bad. Maybe a brief tutorial on 2D interpolation in sagemath for an amateur like me? Some advice? Thank you.
EDIT:
https://docs.google.com/file/d/0Bxv8ab9PeMQVUFhBYWlldU9ib0E/edit?pli=1
This is mostly the kind of graph it produces, along with this message:
Warning: No more knots can be added because the number of B-spline coefficients already exceeds the number of data points m. Probably causes: either s or m too small. (fp>s)
kx,ky=3,3 nx,ny=17,20 m=200 fp=4696.972223 s=0.000000
To get this graph I just run this command:
f_interpolation = scipy.interpolate.interp2d(*zip(*matrix(C)),kind='cubic')
plot_interpolation = contour_plot(lambda x,y: f_interpolation(x,y)[0], (22.419,22.439), (37.06,37.08), cmap='jet', contours=numpy.arange(0,1400,100), colorbar=True)
plot_all = plot_interpolation
plot_all.show(axes_labels=["m", "m"])
Here matrix(C) can be a huge matrix like 10000 x 3 or even a lot more, like 1000000 x 3. The problem of bad graphs persists even with less data, like the picture I attached now where matrix(C) was only 200 x 3. That's why I'm beginning to think that, apart from a possible glitch with the program, my approach to using this command might be totally wrong, hence my asking for advice about using a grid and not just "throwing" my data into a command.
I've had a similar problem using the scipy.interpolate.interp2d function. My understanding is that the issue arises because the interp1d/interp2d and related functions use an older wrapping of FITPACK for the underlying calculations. I was able to get a problem similar to yours to work using the spline functions, which rely on a newer wrapping of FITPACK. The spline functions can be identified because they seem to all have capital letters in their names here http://docs.scipy.org/doc/scipy/reference/interpolate.html. Within the scipy installation, these newer functions appear to be located in scipy/interpolate/fitpack2.py, while the functions using the older wrappings are in fitpack.py.
For your purposes, RectBivariateSpline is what I believe you want. Here is some sample code for implementing RectBivariateSpline:
import numpy as np
from scipy import interpolate
# Generate unevenly spaced x/y data for axes
npoints = 25
maxaxis = 100
x = (np.random.rand(npoints)*maxaxis) - maxaxis/2.
y = (np.random.rand(npoints)*maxaxis) - maxaxis/2.
xsort = np.sort(x)
ysort = np.sort(y)
# Generate the z-data, which first requires converting
# x/y data into grids
xg, yg = np.meshgrid(xsort,ysort,indexing='ij') # 'ij' indexing so z[i,j] = f(xsort[i], ysort[j]), the layout RectBivariateSpline expects
z = xg**2 - yg**2
# Generate the interpolated, evenly spaced data
# Note that the min/max of x/y isn't necessarily 0 and 100 since
# randomly chosen points were used. If we want to avoid extrapolation,
# the explicit min/max must be found
interppoints = 100
xinterp = np.linspace(xsort[0],xsort[-1],interppoints)
yinterp = np.linspace(ysort[0],ysort[-1],interppoints)
# Generate the kernel that will be used for interpolation
# Note that the default spline degrees are kx=3 and ky=3, i.e. cubic
# interpolation. Other orders can be used by setting kx and ky to
# different integers, e.g.
# interpolate.RectBivariateSpline(xsort,ysort,z,kx=5,ky=5)
kernel = interpolate.RectBivariateSpline(xsort,ysort,z)
# Now calculate the linear, interpolated data
zinterp = kernel(xinterp, yinterp)
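To sanity-check the result, a quick follow-up sketch is to contour-plot the interpolated grid (the transpose is only there because contourf expects Z indexed as Z[y, x]):
import matplotlib.pyplot as plt

# zinterp[i, j] is the value at (xinterp[i], yinterp[j]), so transpose for plotting
plt.contourf(xinterp, yinterp, zinterp.T, 20, cmap='jet')
plt.colorbar()
plt.show()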