NumPy: How to calculate piecewise linear interpolant on multiple axes - numpy

Given the following ndarray t -
In [26]: t.shape
Out[26]: (3, 3, 2)
In [27]: t
Out[27]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15],
[16, 17]]])
the piecewise linear interpolant for the points t[:, 0, 0] can be evaluated at [0, 0.66666667, 1.33333333, 2.] using numpy.interp as follows -
In [38]: x = np.linspace(0, t.shape[0]-1, 4)
In [39]: x
Out[39]: array([0. , 0.66666667, 1.33333333, 2. ])
In [30]: xp = np.arange(t.shape[0])
In [31]: xp
Out[31]: array([0, 1, 2])
In [32]: fp = t[:,0,0]
In [33]: fp
Out[33]: array([ 0, 6, 12])
In [40]: np.interp(x, xp, fp)
Out[40]: array([ 0., 4., 8., 12.])
How can the interpolants for every slice t[:, i, j] be calculated efficiently and returned together, like this -
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 4, 5],
[ 6, 7],
[ 8, 9]],
[[ 8, 9],
[10, 11],
[12, 13]],
[[12, 13],
[14, 15],
[16, 17]]])

np.interp only accepts 1d fp values, so with changing y values it has to be run once for each 1d slice of t. It's probably faster to loop explicitly, but neater to loop using np.apply_along_axis:
import numpy as np
t = np.arange( 18 ).reshape(3,3,2)
x = np.linspace( 0, t.shape[0]-1, 4)
xp = np.arange(t.shape[0])
def interfunc( arr ):
    """ Function interpolates a 1d array. """
    return np.interp( x, xp, arr )
np.apply_along_axis( interfunc, 0, t ) # apply function along axis 0
""" Result
array([[[ 0., 1.],
[ 2., 3.],
[ 4., 5.]],
[[ 4., 5.],
[ 6., 7.],
[ 8., 9.]],
[[ 8., 9.],
[10., 11.],
[12., 13.]],
[[12., 13.],
[14., 15.],
[16., 17.]]]) """
With explicit loops
result = np.zeros((4,3,2))
for c in range(t.shape[1]):
    for p in range(t.shape[2]):
        result[:,c,p] = np.interp( x, xp, t[:,c,p])
On my machine the second option runs in half the time.
Edit to use np.nditer
As the result and the parameter have different shapes, I seem to have to create two np.nditer objects, one for the parameter and one for the result. This is my first attempt to use nditer for anything, so it could be overcomplicated.
def test( t ):
    ts = t.shape
    result = np.zeros((len(x),ts[1],ts[2]))  # one output row per point in x
    param = np.nditer( [t], ['external_loop'], ['readonly'], order = 'F')
    with np.nditer( [result], ['external_loop'], ['writeonly'], order = 'F') as res:
        for p, r in zip( param, res ):
            r[:] = interfunc(p)
    return result
It's slightly slower than the explicit loops and less easy to follow than either of the other solutions.
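The loops above can also be avoided entirely by blending neighbouring slices with broadcast weights. This is only a sketch, not part of the original answer: the helper name interp_axis0 is made up for illustration, and it assumes the sample positions are 0, 1, ..., t.shape[0]-1 (as xp is above), that t has at least two entries along axis 0, and that every value of x lies inside that range.

```python
import numpy as np

def interp_axis0(t, x):
    """Piecewise linear interpolation of t along axis 0 (hypothetical helper).

    Assumes sample positions 0, 1, ..., t.shape[0]-1 and x within that range.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    # Index of the left neighbour, clipped so that lo + 1 stays in bounds
    lo = np.clip(np.floor(x).astype(int), 0, t.shape[0] - 2)
    # Fractional distance from the left neighbour, reshaped so it
    # broadcasts against the trailing axes of t
    w = (x - lo).reshape((-1,) + (1,) * (t.ndim - 1))
    return (1 - w) * t[lo] + w * t[lo + 1]

t = np.arange(18).reshape(3, 3, 2)
x = np.linspace(0, t.shape[0] - 1, 4)
result = interp_axis0(t, x)  # shape (4, 3, 2)
```

Whether this is actually faster than the explicit loops depends on the array sizes, so it is worth timing on real data.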

As requested by @Tis Chris, here is a solution using np.nditer with the multi_index flag, but I prefer the explicit nested for loops method above because it is 10% faster.
In [29]: t = np.arange( 18 ).reshape(3,3,2)
In [30]: ax0old = np.arange(t.shape[0])
In [31]: ax0new = np.linspace(0, t.shape[0]-1, 4)
In [32]: tnew = np.zeros((len(ax0new), t.shape[1], t.shape[2]))
In [33]: it = np.nditer(t[0], flags=['multi_index'])
In [34]: for _ in it:
    ...:     tnew[:, it.multi_index[0], it.multi_index[1]] = np.interp(ax0new, ax0old, t[:, it.multi_index[0], it.multi_index[1]])
    ...:
In [35]: tnew
Out[35]:
array([[[ 0., 1.],
[ 2., 3.],
[ 4., 5.]],
[[ 4., 5.],
[ 6., 7.],
[ 8., 9.]],
[[ 8., 9.],
[10., 11.],
[12., 13.]],
[[12., 13.],
[14., 15.],
[16., 17.]]])

You could try scipy.interpolate.interp1d:
from scipy.interpolate import interp1d
import numpy as np
t = np.array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15],
[16, 17]]])
# for the first slice
f = interp1d(np.arange(t.shape[0]), t[..., 0], axis=0)
# returns a function which you call with values within range np.arange(t.shape[0])
# data used for interpolation
t[..., 0]
>>> array([[ 0, 2, 4],
[ 6, 8, 10],
[12, 14, 16]])
f(1)
>>> array([ 6., 8., 10.])
f(1.5)
>>> array([ 9., 11., 13.])
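The same idea should extend to the whole array at once, since interp1d accepts N-dimensional y values together with an axis argument. A minimal sketch, assuming SciPy is installed and reusing the x from the question (the names f_all and tnew are just illustrative):

```python
import numpy as np
from scipy.interpolate import interp1d

t = np.arange(18).reshape(3, 3, 2)
x = np.linspace(0, t.shape[0] - 1, 4)

# One interpolant for the full array; axis=0 is the axis being interpolated
f_all = interp1d(np.arange(t.shape[0]), t, axis=0)
tnew = f_all(x)  # shape (4, 3, 2)
```

The default kind is linear, so the values should agree with the np.interp results shown in the question.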

Related

How to build a one hot encoder from a vector of words in TF

I have data where each row in the batch is a fixed sized vector of mutually exclusive "words". Ex:
batch:
[1, 2, 3]
[4, 5, 6]
[4, 7, 8]
global dictionary:
{
0: {1, 4}
1: {2, 5, 7}
2: {3, 6, 8}
}
In the above example, for cols 0, 1, and 2...we have vocabs of [1, 4], [2, 5, 7], and [3, 6, 8] which are mutually exclusive. I also have a giant dictionary of all vocab words for all the columns.
How do I use that dictionary of dict(column_idx) -> {vocab set} to build a one hot encoder in tensorflow?
For the above example I would want an output of:
[1, 0, 1, 0, 0, 1, 0, 0]
[0, 1, 0, 1, 0, 0, 1, 0]
[0, 1, 0, 0, 1, 0, 0, 1]
with the one hot mappings being:
[1, 4, 2, 5, 7, 3, 6, 8]
The tricky part is that each column needs to be encoded differently. If you break the problem down to encoding one column at a time then it is mostly about deconstructing, encoding and recombining the batched input. Here is a working example in numpy:
import numpy as np
batch = np.array([
[1, 2, 3],
[4, 5, 6],
[4, 7, 8]])
vocab = {
0: {1, 4},
1: {2, 5, 7},
2: {3, 6, 8}
}
# Construct a one hot encoding matrix for each vocabulary
# Here we are using identity matrix as the getter
vocab_eye = {k: np.eye(len(v)) for k, v in vocab.items()}
# Construct a converter to indices, a map from value to index if you will.
# Sets are unordered, so sort each vocabulary to get a deterministic mapping
vocab_map = {k: np.vectorize(sorted(v).index) for k, v in vocab.items()}
# Deconstruct, encode and merge each column
encoded_cols = [vocab_eye[i][vocab_map[i](col[:,0])]
                for i, col in enumerate(np.split(batch, batch.shape[1], axis=1))]
encoded_batch = np.concatenate(encoded_cols, axis=1)
# Outputs
# array([[1., 0., 1., 0., 0., 1., 0., 0.],
#        [0., 1., 0., 1., 0., 0., 1., 0.],
#        [0., 1., 0., 0., 1., 0., 0., 1.]])
You can perhaps optimise the encoding by feeding the data already split. Instead of batching everything and then splitting, give 3 separate inputs to the data pipeline in TensorFlow. You then have 3 encoding paths, one for each feature set, whose outputs get merged.
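An alternative way to lay this out in NumPy is to give each column's vocabulary a block of positions in one wide output and write the ones directly. This is a sketch under two assumptions that are not in the question: each vocabulary is ordered by sorting (which happens to reproduce the mapping [1, 4, 2, 5, 7, 3, 6, 8]), and every word in the batch is present in its column's vocabulary.

```python
import numpy as np

batch = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [4, 7, 8]])
vocab = {0: {1, 4}, 1: {2, 5, 7}, 2: {3, 6, 8}}

# Fix an explicit order per column and place the vocabularies end to end
ordered = [sorted(vocab[c]) for c in range(batch.shape[1])]
sizes = [len(words) for words in ordered]
offsets = np.concatenate(([0], np.cumsum(sizes)[:-1]))

encoded = np.zeros((batch.shape[0], sum(sizes)), dtype=int)
rows = np.arange(batch.shape[0])
for c, words in enumerate(ordered):
    # Position of each word inside its column's sorted vocabulary
    local = np.searchsorted(words, batch[:, c])
    encoded[rows, offsets[c] + local] = 1
```

For the example batch this gives the rows [1, 0, 1, 0, 0, 1, 0, 0], [0, 1, 0, 1, 0, 0, 1, 0] and [0, 1, 0, 0, 1, 0, 0, 1], which is the output asked for in the question.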

Sparse Matrix One 1 for Every Row

I would like to generate a random MxN matrix where every row has just a single one in a random position.
For example, I would like a matrix like this:
Out[3]:
array([[1, 0, 0],
[0, 1, 0],
[0, 1, 0],
[1, 0, 0],
[0, 0, 1]])
I tried with
M = 5
N = 3
arr = np.array([1] + [0] * (N-1))
arr = np.tile(arr,(M,1))
np.random.shuffle(arr)
But it gives:
Out[75]:
array([[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0]])
There may be a more elegant way to do this, but this works:
def randOne():
    M = 5
    N = 3
    arr = np.zeros((M, N))
    for row in range(M):
        arr[row, np.random.randint(N)] = 1
    return arr
>>> randOne()
array([[ 0., 0., 1.],
       [ 1., 0., 0.],
       [ 0., 0., 1.],
       [ 0., 1., 0.],
       [ 1., 0., 0.]])
OR
Yup, there is a more elegant way to do this ;)
def randOne2(M=5, N=3):
    arr = np.zeros((M, N), dtype=np.int8)
    arr[np.arange(M),np.random.randint(0,N,M)] = 1
    return arr
>>> randOne2()
array([[0, 0, 1],
[1, 0, 0],
[1, 0, 0],
[0, 1, 0],
[1, 0, 0]], dtype=int8)
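For what it's worth, the attempt in the question fails because np.random.shuffle only permutes a 2-D array along its first axis, i.e. it reorders whole rows, and all the rows were identical. Another compact option is to pick random rows of an identity matrix. A sketch, assuming NumPy 1.17 or later for np.random.default_rng; the function name and the seed parameter are made up for illustration:

```python
import numpy as np

def rand_one_hot_rows(M=5, N=3, seed=None):
    """Return an M x N int8 matrix with exactly one 1 per row (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # One random column index per row
    cols = rng.integers(0, N, size=M)
    # Row k of the identity matrix is the one-hot vector for column k
    return np.eye(N, dtype=np.int8)[cols]

arr = rand_one_hot_rows(5, 3, seed=0)
```

Passing a seed makes the result reproducible; leave it as None for a different matrix on each call.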

Sum rows of a 2D array with a specific stepsize - NumPy

This is a quick one. I am wondering if there is a better way to express the following lines (besides using a short loop):
energy = np.zeros((4, signal.shape[1]))
energy[0::4, 0:] = np.sum(signal[0::4, :], axis=0)
energy[1::4, 0:] = np.sum(signal[1::4, :], axis=0)
energy[2::4, 0:] = np.sum(signal[2::4, :], axis=0)
energy[3::4, 0:] = np.sum(signal[3::4, :], axis=0)
Reshape to split the first axis into two and then sum along the first of those two, like so -
energy = signal.reshape(-1,4,signal.shape[1]).sum(0)
Sample run -
In [327]: np.random.seed(0)
In [328]: signal = np.random.randint(0,9,(8,5))
In [329]: energy = np.zeros((4, signal.shape[1]))
...: energy[0::4, 0:] = np.sum(signal[0::4, :], axis=0)
...: energy[1::4, 0:] = np.sum(signal[1::4, :], axis=0)
...: energy[2::4, 0:] = np.sum(signal[2::4, :], axis=0)
...: energy[3::4, 0:] = np.sum(signal[3::4, :], axis=0)
In [330]: energy
Out[330]:
array([[ 13., 4., 6., 3., 10.],
[ 8., 5., 4., 7., 15.],
[ 7., 11., 11., 4., 13.],
[ 7., 8., 8., 5., 12.]])
In [331]: signal.reshape(-1,4,signal.shape[1]).sum(0)
Out[331]:
array([[13, 4, 6, 3, 10],
[ 8, 5, 4, 7, 15],
[ 7, 11, 11, 4, 13],
[ 7, 8, 8, 5, 12]])
For arrays with number of rows not necessarily a multiple of 4, here's the generic version -
m = signal.shape[0]
n = m//4
energy = signal[:n*4].reshape(n,4,-1).sum(0)
energy[:m%4] += signal[n*4:]
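Another way to handle a row count that is not a multiple of 4 is to pad with zero rows first, so the reshape always works. This is a sketch rather than the method above; the function name strided_row_sums and the step parameter are illustrative only:

```python
import numpy as np

def strided_row_sums(signal, step=4):
    """Sum rows i, i+step, i+2*step, ... for each i in range(step)."""
    m, ncols = signal.shape
    # Number of zero rows needed to reach the next multiple of step
    pad = (-m) % step
    padded = np.pad(signal, ((0, pad), (0, 0)), mode="constant")
    # Axis 0 of the reshaped array runs over the groups of `step` rows
    return padded.reshape(-1, step, ncols).sum(axis=0)

rng = np.random.default_rng(0)
signal = rng.integers(0, 9, size=(10, 5))
energy = strided_row_sums(signal)  # shape (4, 5)
```

The padding copies the array, so for very large inputs the slice-and-add version above may use less memory.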

Select indices in tensorflow that fulfils a certain condition

I wish to select elements of a matrix where the coordinates of the elements in the matrix fulfil a certain condition. For example, a condition could be : (y_coordinate-x_coordinate) == -4
So, those elements whose coordinates fulfil this condition will be selected. How can I do this efficiently without looping through every element?
Perhaps you need tf.gather_nd:
iterSession = tf.InteractiveSession()
vals = tf.constant([[1,2,3], [4,5,6], [7,8,9]])
arr = tf.constant([[x, y] for x in range(3) for y in range(3) if -1 <= x - y <= 1])
arr.eval()
# >> array([[0, 0],
# >> [0, 1],
# >> [1, 0],
# >> [1, 1],
# >> [1, 2],
# >> [2, 1],
# >> [2, 2]], dtype=int32)
tf.gather_nd(vals, arr).eval()
# >> array([1, 2, 4, 5, 6, 8, 9], dtype=int32)
Or tf.boolean_mask:
iterSession = tf.InteractiveSession()
vals = tf.constant([[1,2,3], [4,5,6], [7,8,9]])
arr = tf.constant([[-1 <= x - y <= 1 for x in range(3)] for y in range(3)])
arr.eval()
# array([[ True, True, False],
# [ True, True, True],
# [False, True, True]], dtype=bool)
tf.boolean_mask(vals, arr).eval()
# array([1, 2, 4, 5, 6, 8, 9], dtype=int32)
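In both variants the index list or mask is still built with a Python comprehension over every coordinate. The mask itself can be computed without a loop from coordinate grids. The sketch below uses NumPy to show the idea, and assumes (this is not stated in the question) that y is the row index and x is the column index; a boolean array like this could then be passed to tf.boolean_mask.

```python
import numpy as np

vals = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# rows[i, j] == i and cols[i, j] == j for every element
rows, cols = np.indices(vals.shape)

# Same condition as the answer above: -1 <= x - y <= 1
mask = np.abs(rows - cols) <= 1
selected = vals[mask]

# The condition from the question, (y - x) == -4, on a larger matrix
big = np.arange(36).reshape(6, 6)
big_rows, big_cols = np.indices(big.shape)
off_diagonal = big[(big_rows - big_cols) == -4]
```

With y as the row index, the condition (y - x) == -4 picks out the fourth diagonal above the main one, which is also what np.diagonal(big, offset=4) returns.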

Correct use of Tensorflow tf.split function in SKFlow

There is a minimal example of an RNN in the Skflow documentation. The input data is a matrix with shape (4,5). Why is the data split according to the following function for input?:
def input_fn(X):
    return tf.split(1, 5, X)
This function returns a list of 5 arrays with shape (4, 1):
[array([[ 2.],
[ 2.],
[ 3.],
[ 2.]], dtype=float32), array([[ 1.],
[ 2.],
[ 3.],
[ 4.]], dtype=float32), array([[ 2.],
[ 3.],
[ 1.],
[ 5.]], dtype=float32), array([[ 2.],
[ 4.],
[ 2.],
[ 4.]], dtype=float32), array([[ 3.],
[ 5.],
[ 1.],
[ 1.]], dtype=f
And what is the difference in impact on the RNN between the above function and defining the function like this? Both input functions run.
def input_fn(X):
    return tf.split(1, 1, X)
Which returns the following:
[[[ 1., 3., 3., 2., 1.],
[ 2., 3., 4., 5., 6.]]
Presented here:
def testRNN(self):
    random.seed(42)
    import numpy as np
    data = np.array(list([[2, 1, 2, 2, 3],
                          [2, 2, 3, 4, 5],
                          [3, 3, 1, 2, 1],
                          [2, 4, 5, 4, 1]]), dtype=np.float32)
    # labels for classification
    labels = np.array(list([1, 0, 1, 0]), dtype=np.float32)
    # targets for regression
    targets = np.array(list([10, 16, 10, 16]), dtype=np.float32)
    test_data = np.array(list([[1, 3, 3, 2, 1], [2, 3, 4, 5, 6]]))
    def input_fn(X):
        return tf.split(1, 5, X)
    # Classification
    classifier = skflow.TensorFlowRNNClassifier(
        rnn_size=2, cell_type='lstm', n_classes=2, input_op_fn=input_fn)
    classifier.fit(data, labels)
    classifier.weights_
    classifier.bias_
    predictions = classifier.predict(test_data)
    self.assertAllClose(predictions, np.array([1, 0]))
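To make the shapes in the question concrete, here is a NumPy sketch of the two splits. It assumes that tf.split(1, n, X), in the argument order used above, divides X into n equal pieces along axis 1, which is consistent with the printed output in the question; np.split is used here only as a stand-in for illustration.

```python
import numpy as np

X = np.array([[2, 1, 2, 2, 3],
              [2, 2, 3, 4, 5],
              [3, 3, 1, 2, 1],
              [2, 4, 5, 4, 1]], dtype=np.float32)

# Five pieces along axis 1: one (4, 1) column per list element
five = np.split(X, 5, axis=1)
# One piece along axis 1: a single-element list holding the whole (4, 5) matrix
one = np.split(X, 1, axis=1)

print(len(five), five[0].shape)
print(len(one), one[0].shape)
```

If the RNN treats each list element as one time step (an assumption, not something the question confirms), the first form would present 5 steps of one feature each, and the second a single step with 5 features.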