In a previous question (fastest way to use numpy.interp on a 2-D array) someone asked for the fastest way to implement the following:
np.array([np.interp(X[i], x, Y[i]) for i in range(len(X))])
Assume X and Y are matrices with many rows, so the for loop is costly. There is a nice solution in that case that avoids the for loop (see the linked answer above).
I am faced with a very similar problem, but I am unclear on whether the for loop can be avoided in this case:
np.array([np.interp(x, X[i], Y[i]) for i in range(len(X))])
In other words, I want to use linear interpolation to upsample a large number of signals stored in the rows of two matrices X and Y.
I was hoping to find a function in numpy or scipy (e.g. scipy.interpolate.interp1d) that supported this operation via broadcasting semantics, but so far I can't seem to find one.
Other points:
If it helps, the rows X[i] and x are pre-sorted in my application. Also, in my case len(x) is quite a bit larger than len(X[i]).
The function scipy.signal.resample almost does what I want, but it doesn't use linear interpolation...
This is a vectorized approach that directly implements linear interpolation. First, for each x value and each i, j, compute the weight w expressing how much of the interval (X[i, j], X[i, j+1]) lies to the left of x.
If the entire interval is to the left of x, the weight of that interval is 1.
If none of the interval is to the left of x, the weight is 0.
Otherwise, the weight is a number between 0 and 1, expressing the proportion of that interval lying to the left of x.
The value of the piecewise-linear interpolant is then computed as Y[i, 0] plus the sum of the differences dY[i, j], each multiplied by the corresponding weight. The logic is to track how much the interpolant changes from interval to interval: the differences dY = np.diff(Y, axis=1) give the change over each full interval, and multiplying by the weight prorates that change.
Setup, with some small data arrays
import numpy as np
X = np.array([[0, 2, 5, 6, 9], [1, 3, 4, 7, 8]])
Y = np.array([[3, 5, 2, 4, 1], [8, 6, 9, 5, 4]])
x = np.linspace(1, 8, 20)
The computation
dX = np.diff(X, axis=1)   # interval widths, shape (n_rows, n_intervals)
dY = np.diff(Y, axis=1)   # change in Y over each interval
# w[i, j, k]: fraction of interval (X[i, j], X[i, j+1]) lying to the left of x[k]
w = np.clip((x - X[:, :-1, None])/dX[:, :, None], 0, 1)
# start from Y[i, 0] and add the prorated change contributed by each interval
y = Y[:, [0]] + np.sum(w*dY[:, :, None], axis=1)
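As a quick check, this reproduces the per-row np.interp loop from the question:
y_loop = np.array([np.interp(x, X[i], Y[i]) for i in range(len(X))])
assert np.allclose(y, y_loop)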
Demonstration
This is only to show that the interpolation is correct. Blue points are the original data; red ones are the computed values.
import matplotlib.pyplot as plt
plt.plot(x, y[0], 'ro')
plt.plot(X[0], Y[0], 'bo')
plt.plot(x, y[1], 'rd')
plt.plot(X[1], Y[1], 'bd')
plt.show()
Related
I've got a 3D tensor x (e.g. 4x4x100). I want to obtain a subset of this by explicitly choosing elements across the last dimension. This would have been easy if I were choosing the same elements across the last dimension (e.g. x[:,:,30:50]), but I want to target different elements across that dimension using the 2D tensor indices, which specifies the starting idx along the third dimension. Is there an easy way to do this in numpy?
A simpler 2D example:
x = [[1,2,3,4,5,6],[10,20,30,40,50,60]]
indices = [1,3]
Let's say I want to grab two elements across the last dimension of x, starting from the points specified by indices. So my desired output is:
[[2,3],[40,50]]
Update: I think I could use a combination of take() and ravel_multi_index(), but some of the platforms that are inspired by numpy (like PyTorch) don't seem to have ravel_multi_index, so I'm looking for alternative solutions.
Iterating over the idx and collecting the slices is not a bad option if the number of 'rows' isn't too large (and the slices are relatively long).
In [55]: x = np.array([[1,2,3,4,5,6],[10,20,30,40,50,60]])
In [56]: idx = [1,3]
In [57]: np.array([x[j,i:i+2] for j,i in enumerate(idx)])
Out[57]:
array([[ 2, 3],
[40, 50]])
Joining the slices like this only works if they all are the same size.
An alternative is to collect the indices into an array, and do one indexing.
For example with a similar iteration:
idxs = np.array([np.arange(i,i+2) for i in idx])
But broadcasted addition may be better:
In [58]: idxs = np.array(idx)[:,None]+np.arange(2)
In [59]: idxs
Out[59]:
array([[1, 2],
[3, 4]])
In [60]: x[np.arange(2)[:,None], idxs]
Out[60]:
array([[ 2, 3],
[40, 50]])
ravel_multi_index is not hard to replicate (if you don't need clipping etc):
In [65]: np.ravel_multi_index((np.arange(2)[:,None],idxs),x.shape)
Out[65]:
array([[ 1, 2],
[ 9, 10]])
In [66]: x.flat[_]
Out[66]:
array([[ 2, 3],
[40, 50]])
In [67]: np.arange(2)[:,None]*x.shape[1]+idxs
Out[67]:
array([[ 1, 2],
[ 9, 10]])
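Since the question mentions take: the same flat indices can also be passed to np.take, which flattens its input by default and gives the same result:
np.take(x, np.arange(2)[:,None]*x.shape[1] + idxs)
# array([[ 2,  3],
#        [40, 50]])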
To slice along the third axis (note that after indexing with x[:, i] the original dim 2 becomes the last axis of a 2D tensor, hence narrow(-1, ...)):
x = [x[:, i].narrow(-1, index, 2) for i, index in enumerate(indices)]
x = torch.stack(x, dim=1)
By enumerating you get both the index along the axis and the start index for the slice in one go.
narrow gives you a zero-copy slice of a given length from a starting index along a given axis.
You said you wanted:
dim = 2
start = index
length = 2
Then you simply stack these tensors back into a single 3D tensor.
This is the least work-intensive approach I can think of in PyTorch.
EDIT
If you just want different indices along a different axis and indices is a 2D tensor, you can do:
x = [x[:,i,index] for i,index in enumerate(indices)]
x = torch.stack(x,dim=1)
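A self-contained sketch of the narrow approach with illustrative data (the names and shapes below are assumptions, not from the question):
import torch

x = torch.arange(2 * 3 * 10).reshape(2, 3, 10)
indices = [1, 4, 7]   # one start index per entry along dim 1

# take a length-2 slice of the last axis, starting at a different index per entry
out = [x[:, i].narrow(-1, start, 2) for i, start in enumerate(indices)]
out = torch.stack(out, dim=1)
print(out.shape)   # torch.Size([2, 3, 2])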
You really should have given a proper, working example; without one this is unnecessarily confusing.
Here is how to do it in numpy (no clue about torch, though).
The following picks a slice of length n along the third dimension starting from points idx depending on the other two dimensions:
# example
a = np.arange(60).reshape(2, 3, 10)
idx = [(1,2,3),(4,3,2)]
n = 4
# build auxiliary 4D array where the last two dimensions represent
# a sliding n-window of the original last dimension
j,k,l = a.shape
s,t,u = a.strides
aux = np.lib.stride_tricks.as_strided(a, (j,k,l-n+1,n), (s,t,u,u))
# pick desired offsets from sliding windows
aux[(*np.ogrid[:j, :k], idx)]
# array([[[ 1, 2, 3, 4],
# [12, 13, 14, 15],
# [23, 24, 25, 26]],
# [[34, 35, 36, 37],
# [43, 44, 45, 46],
# [52, 53, 54, 55]]])
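On NumPy 1.20+ the same auxiliary array can be built with sliding_window_view, which avoids computing the strides by hand (using the same a, idx and n as above):
# equivalent to the as_strided construction above
aux = np.lib.stride_tricks.sliding_window_view(a, n, axis=-1)   # shape (j, k, l-n+1, n)
aux[(*np.ogrid[:j, :k], idx)]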
I came up with the following using broadcasting:
x = np.array([[1,2,3,4,5,6,7,8,9,10],[10,20,30,40,50,60,70,80,90,100]])
i = np.array([1,5])
N = 2 # number of elements I want to extract along the last dimension; starting points specified in i
r = np.arange(x.shape[-1])
r = np.broadcast_to(r, x.shape)
ii = i[:, np.newaxis]
ii = np.broadcast_to(ii, x.shape)
mask = np.logical_and(r-ii>=0, r-ii<N)  # offsets 0 .. N-1, i.e. exactly N elements per row
output = x[mask].reshape(len(i), N)
Does this look alright?
I want to split a long vector into smaller unequal pieces, do a summation on each piece and gather the results into a new vector.
I need to do this in pytorch but I am also interested to see how this is done with numpy.
This can easily be accomplished by splitting the vector.
sizes = [3, 7, 5, 9]
X = torch.ones(sum(sizes))
Y = torch.tensor([s.sum() for s in torch.split(X, sizes)])
or with np.ones and np.split.
Is there a more efficient way to do this?
Edit:
Inspired by the first comment:
indices = np.cumsum([0]+sizes)[:-1]
Y = np.add.reduceat(X, indices.tolist())
solves it for numpy. I am still looking for a solution with pytorch.
index_add_ is your friend!
# inputs
sizes = torch.tensor([3, 7, 5, 9], dtype=torch.long)
x = torch.ones(sizes.sum())
# prepare an index vector for summation (what elements of x are summed to each element of y)
ind = torch.zeros(sizes.sum(), dtype=torch.long)
ind[torch.cumsum(sizes, dim=0)[:-1]] = 1
ind = torch.cumsum(ind, dim=0)
# prepare the output
y = torch.zeros(len(sizes))
# do the actual summation
y.index_add_(0, ind, x)
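Since x is all ones here, each segment sum equals the segment length, which gives a quick sanity check:
print(ind)  # tensor([0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3])
print(y)    # tensor([3., 7., 5., 9.])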
Suppose I have a very basic function in Python:
def f(x, y):
    return x + y
Then I can call this with scalars, f(1, 5.4) == 6.4 or with numpy vectors of arbitrary (but the same) shape. E.g. this works:
x = np.arange(3)
y = np.array([1,4,2.3])
f(x, y)
which gives an array with entries 1, 5, 4.3.
But what if f is more complicated? For example, xx and yy are 1D numpy arrays here.
def g(x, y):
    return np.sum((xx - x)**2 + (yy - y)**2)
(I hasten to add that I'm not interested in this specific g, but in general strategies...) Then g(5, 6) works fine, but if I want to pass numpy arrays, I seem to have to write a very different function with explicit broadcasting etc. For example:
def gg(x, y):
    xfull = np.stack([x]*len(xx),axis=-1)
    yfull = np.stack([y]*len(xx),axis=-1)
    return np.sum((xfull - xx)**2 + (yfull - yy)**2, axis=-1)
This does now work with scalars and arrays. But it seems like a mess, and is hard to read.
Is there a better way?
Given:
def g(x, y):
    return np.sum((xx - x)**2 + (yy - y)**2)
my first questions are:
this is written with scalar x and y in mind?
what are xx and yy? You say 1d arrays. Same length?
why aren't they parameters? Because in this context they are fixed?
in words, this offsets xx and yy by constant amounts and takes the sum of their squares, returning a single value?
My next step is to explore the 'broadcasting' limits of this expression. For example, it runs for any x that can be used in xx-x. That could be a 0d array, a one-element 1d array, an array with the same shape as xx, or anything else that can 'broadcast' with xx. That's where a thorough understanding of 'broadcasting' is essential.
g(1,2)
g(xx,xx)
g(xx[:,None],yy[None,:])
xx-xx[:,None], though, produces a 2d array. np.sum as written takes the sum over all values, i.e. over the flattened array. Your gg suggests you want to sum over the last axis; if so, go ahead and put that in g:
def g(x, y):
    return np.sum((xx - x)**2 + (yy - y)**2, axis=-1)
Your use of stack in gg produces:
In [101]: xx
Out[101]: array([0, 1, 2, 3, 4])
In [103]: np.stack([np.arange(3)]*len(xx), axis=-1)
Out[103]:
array([[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2]])
I would have written that as x[:,None]
In [104]: xx-_
Out[104]:
array([[ 0, 1, 2, 3, 4],
[-1, 0, 1, 2, 3],
[-2, -1, 0, 1, 2]])
In [105]: xx-np.arange(3)[:,None]
Out[105]:
array([[ 0, 1, 2, 3, 4],
[-1, 0, 1, 2, 3],
[-2, -1, 0, 1, 2]])
That does not work with scalar x; but this does
xx-np.asarray(x)[...,None]
np.array or np.asarray is commonly used at the start of numpy functions to accommodate scalar or list inputs. The ellipsis (...) is handy when dealing with a variable number of dimensions; reshape(..., -1) and [..., None] are widely used to expand or generalize dimensions.
I've learned a lot by looking at the Python code of numpy functions. I've also learned from years of work with MATLAB to be pedantic about dimensions. Keep track of intended and actual array shapes. It helps to use test shapes that will highlight errors: test with a (2,3) array instead of an ambiguous (3,3) one.
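Putting these pieces together, one way to write g so it accepts either scalars or arrays (a sketch; xx and yy are assumed to be the module-level 1d arrays from the question):
def g(x, y):
    # promote scalars/lists to arrays and add a trailing axis so the
    # subtraction broadcasts against the 1d arrays xx and yy
    x = np.asarray(x)[..., None]
    y = np.asarray(y)[..., None]
    return np.sum((xx - x)**2 + (yy - y)**2, axis=-1)

g(5, 6)                                    # scalar in, scalar (0d) out
g(np.arange(3), np.array([1, 4, 2.3]))     # arrays in, shape (3,) out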
In NumPy:
A = np.array([[1,3,5],[2,4,6]])
array([[1, 3, 5],
       [2, 4, 6]])
B = np.array([[1,2],[3,4],[5,6]])
array([[1, 2],
[3, 4],
[5, 6]])
A.dot(B)
array([[35, 44],
[44, 56]])
I only care about getting A.dot(B).diagonal() = array([35, 56])
Is there a way I can get array([35, 56]) without having to compute the inner products of all the rows and columns? I.e. the inner product of the ith row with ith column?
I ask because the performance difference becomes more significant for larger matrices.
This is just matrix multiplication for 2D arrays:
C[i, j] = sum(A[i, :] * B[:, j])
So since you just want the diagonal elements, looks like you're after
sum(A[i, :] * B[:, i])  # for each i
So you could just use list comprehension:
[np.dot(A[i,:], B[:, i]) for i in range(A.shape[0])]
# [35, 56]
OR (and this only works because you want the diagonal, so it assumes that if A's dimensions are n x m, B's dimensions are m x n):
np.sum(A * B.T, axis=1)
# array([35, 56])
(no fancy numpy tricks going on here, just playing around with the maths).
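The same per-row/per-column sum can also be written with np.einsum, which may be worth benchmarking on your sizes:
np.einsum('ij,ji->i', A, B)
# array([35, 56])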
Can you simply leave out the rows of the parameter you don't care about?
The 2x3 times 3x2 product gives you a 2x2 result.
A 1x3 times 3x2 product gives you only the top row of [A][B], a 1x2 matrix.
EDIT: I misread the question. Still, each value in the result matrix is produced by the dot product of a row of A with a column of B.
I'm trying to understand how to build arrays for use in plot_surface (in Axes3d).
I tried to build a simple surface by manipulating the data in these arrays:
In [106]: x
Out[106]:
array([[0, 0],
[0, 1],
[0, 0]])
In [107]: y
Out[107]:
array([[0, 0],
[1, 1],
[0, 0]])
In [108]: z
Out[108]:
array([[0, 0],
[1, 1],
[2, 2]])
But I can't figure out how they are interpreted - for example, there is nothing at z=2 on my plot.
Could anybody please explain exactly which values are taken to make a point, which to make a line, and finally the surface?
For example I would like to build a surface that would connect with lines points:
[0,0,0]->[1,1,1]->[0,0,2]
[0,0,0]->[1,-1,1]->[0,0,2]
and a surface between those lines.
What should arrays for plot_surface look like to get something like this?
Understanding how the grids in plot_surface work is not easy. So first I'll give a general explanation, and then I'll explain how to convert the data in your case.
If you have an array of N x values and an array of M y values, you need to create two grids of x and y values of dimension (M,N) each. Fortunately numpy.meshgrid will help. Confused? See an example:
x = np.arange(3)
y=np.arange(1,5)
X, Y = np.meshgrid(x,y)
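Printing the grids shows how the two 1d arrays are laid out:
print(X)
# [[0 1 2]
#  [0 1 2]
#  [0 1 2]
#  [0 1 2]]
print(Y)
# [[1 1 1]
#  [2 2 2]
#  [3 3 3]
#  [4 4 4]]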
The element (x[i], y[j]) is accessed as (X[j,i], Y[j,i]). And its Z value is, of course, Z[j,i], which you also need to define.
Having said that, your data does produce a point of the surface at (0,0,2), as expected. In fact, there are two points at that position, coming from the grid indices (2,0) and (2,1).
I attach the result of plotting your arrays with:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # needed for the 3d projection on older matplotlib
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
surf = ax.plot_surface(x, y, z)
If I understand you correctly, you're trying to interpolate a surface through a set of points. I don't think plot_surface is the correct function for this (but correct me if I'm wrong); I think you should look at interpolation tools, probably those in scipy.interpolate. The result of the interpolation can then be plotted using plot_surface.
plot_surface is able to plot a grid (with z values) in 3D space based on x, y coordinates. The arrays of x and y are those created by numpy.meshgrid.
example of plot_surface:
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
plt.ion()
x = np.arange(0,np.pi, 0.1)
y = x.copy()
z = np.sin(x).repeat(32).reshape(32,32)
X, Y = np.meshgrid(x,y)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X,Y,z, cmap=plt.cm.jet, cstride=1, rstride=1)