Python - multidimensional array syntax - numpy

I came across numpy and am trying to understand the proper syntax for building multidimensional arrays. For instance:
numpy.asarray([[1.,2], [3,4], [5, 6]])
prints:
[[ 1.  2.]
 [ 3.  4.]
 [ 5.  6.]]
while:
numpy.asarray([[1 ,2], [3, 4], [5, 6]])
prints:
[[1 2]
 [3 4]
 [5 6]]
That . is an odd syntax element.
What is it doing exactly?

np.array deduces the array shape from the nesting of the [], and dtype from the nature of the elements. If at least one element is a Python float, the whole array is float:
In [178]: x=np.array([1, 2, 3.0]) # 1d float
In [179]: x.shape
Out[179]: (3,)
In [180]: x.dtype
Out[180]: dtype('float64')
If all elements are integers, the array is also int:
In [182]: x=np.array([[1, 2],[3, 4]]) # 2d int
In [183]: x.shape
Out[183]: (2, 2)
In [184]: x.dtype
Out[184]: dtype('int32')
You can also set the dtype explicitly, e.g.
In [185]: x=np.array([[1, 2],[3, 4]], dtype=np.float32)
In [186]: x
Out[186]:
array([[ 1.,  2.],
       [ 3.,  4.]], dtype=float32)
In [187]: x.dtype
Out[187]: dtype('float32')
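To answer the question directly: the . is not numpy syntax at all; it is Python's float-literal notation, so 1. is just the float 1.0. A quick plain-Python check:
type(1.)   # <class 'float'> -- the trailing dot makes this the float 1.0
type(1)    # <class 'int'>
That single float in [[1., 2], ...] is what promotes the whole array to float64 in the first example.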

Related

Vectorized LU decomposition solve for multiple b

I'm preprocessing a square matrix A of shape (n, n) with SciPy's LU decomposition and then solving over and over again for multiple right-hand sides B of shape (..., n). But scipy.linalg.lu_solve only accepts a vector for b, not a matrix like (m, n) or (k, m, n).
How can I wrap lu_solve to work for arguments of shape (..., n)? NumPy's linalg.solve would accept multiple b, but does not allow for separate LU factorization and solve steps.
It is not mentioned in the documentation of lu_solve, but in fact b can contain multiple vectors. If A has shape (n, n), then b can have shape (n, m). For example,
In [44]: A
Out[44]:
array([[ 1.01,  0.02, -0.01],
       [ 0.02,  1.04, -0.02],
       [-0.01, -0.02,  1.01]])
In [45]: b
Out[45]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [46]: lu = lu_factor(A)
In [47]: x = lu_solve(lu, b)
In [48]: x
Out[48]:
array([[ 0.        ,  0.98113208,  1.96226415,  2.94339623],
       [ 4.        ,  4.96226415,  5.9245283 ,  6.88679245],
       [ 8.        ,  9.01886792, 10.03773585, 11.05660377]])
In [49]: A.dot(x)
Out[49]:
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]])
Higher dimensional b must have shape (n, ...). Note that for shapes with more than two dimensions, testing the result with A.dot(x) will not work, because the shape of x will not be compatible with NumPy's matrix multiplication. For example, here B has shape (3, 2, 5):
In [40]: A
Out[40]:
array([[ 1.01,  0.02, -0.01],
       [ 0.02,  1.04, -0.02],
       [-0.01, -0.02,  1.01]])
In [41]: B = np.random.rand(3, 2, 5)
In [42]: lu = lu_factor(A)
In [43]: x = lu_solve(lu, B)
In [44]: x.shape
Out[44]: (3, 2, 5)
In [45]: xx = np.moveaxis(x, 0, 1)
In [46]: np.allclose(A.dot(xx), B)
Out[46]: True
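Building on that, one way to wrap lu_solve for the question's (..., n) convention is to move the trailing n axis to the front, solve, and move it back. A minimal sketch (the wrapper name lu_solve_rhs_last is made up here):
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def lu_solve_rhs_last(lu_piv, B):
    """Solve A x = b for every vector along the last axis of B, shape (..., n)."""
    Bf = np.moveaxis(B, -1, 0)        # lu_solve wants the n axis first: (n, ...)
    Xf = lu_solve(lu_piv, Bf)         # solves all right-hand sides in one call
    return np.moveaxis(Xf, 0, -1)     # restore shape (..., n)

A = np.eye(3) + 0.01 * np.random.rand(3, 3)
B = np.random.rand(5, 4, 3)           # batch of right-hand sides, n = 3 last
X = lu_solve_rhs_last(lu_factor(A), B)
print(np.allclose(np.einsum('ij,abj->abi', A, X), B))  # True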

How to make a mask for variable-length sequences, which are then padded, in TensorFlow 2 for an RNN

Trying to implement a mask for a sequence of time periods, with zero-padding, for an LSTM network.
Each sequence of time periods is of varying length, hence requiring padding and masking.
I am trying to model sequences of length 96 (time periods) with 33 features. Simplified data (7 time periods and 3 features) are shown:
example state at a timeperiod = [4, 2, 9] at time0(t0)
example sequence = [[2, 3, 6], [1, 6, 8], [2, 9, 4], [2, 7, 3]] at t(0), t(1), t(2), t(3)
example_padded1 = [[2, 3, 6], [1, 6, 8], [2, 9, 4], [2, 7, 3], 0, 0, 0] at t(0) to t(6)
example_padded2 = [[2, 6, 0], [1, 6, 3], [2, 9, 7], [2, 7, 3], 0., 0., 0.]
example_padded3 = [[2, 6, 0], [5, 8, 3], [9, 4, 7], [2, 5, 3], [0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]
Submitting each example sequence to:
seq = example_padded1
masking = layers.Masking(mask_value=0)
masked_output = masking(seq)
print(masked_output._keras_mask)
Gives Errors:
padded1 error: InvalidArgumentError: cannot compute Pack as input #2 (zero-based) was expected to be a float tensor but is a int32 tensor [Op:Pack] name: packed
padded2 error: InvalidArgumentError: Shapes of all inputs must match: values[0].shape = [3] != values[2].shape = [] [Op:Pack] name: packed
padded3 error: 'list' object has no attribute 'dtype', raised when checking for mask_value = 0
I then added an input layer to define the shape of a sequence:
seq_len, n_features = 7, 3
inp = Input(shape=(seq_len, n_features))
masking = layers.Masking(mask_value=0, input_shape=inp)
masked_output2 = masking(seq)
print(masked_output2._keras_mask)
But got error:
TypeError: Cannot iterate over a tensor with unknown first dimension.
(Python 3.8, TF2)
Have also been trying Embedding, but that seems even more problematic.
How to implement a mask for variable length sequences, which are then padded?
Have possibly resolved this. The problem lies in how I was building the array of sequences.
Each of my sequences consists of:
[array([1,2,3]), array([2,3,4]), array([4,5,6]), array([0,0,0]), array([0,0,0]), array([0,0,0]), array([0,0,0])]
a list of arrays...
when it should be a single array, like:
array([[1,2,3], [2,3,4], [4,5,6], [0,0,0], [0,0,0], [0,0,0], [0,0,0]])
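For completeness, a minimal sketch of the fixed setup (assuming TF 2.x; the toy 7-timestep, 3-feature values are illustrative):
import numpy as np
from tensorflow.keras import layers

# One sequence of 4 real timesteps zero-padded to length 7, built as a single array.
seq = np.array([[2., 3., 6.], [1., 6., 8.], [2., 9., 4.], [2., 7., 3.],
                [0., 0., 0.], [0., 0., 0.], [0., 0., 0.]], dtype=np.float32)
batch = seq[np.newaxis, ...]                   # Masking expects (batch, timesteps, features)
masked = layers.Masking(mask_value=0.)(batch)
print(masked._keras_mask)                      # [[ True  True  True  True False False False]]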

NumPy: How to calculate piecewise linear interpolant on multiple axes

Given the following ndarray t -
In [26]: t.shape
Out[26]: (3, 3, 2)
In [27]: t
Out[27]:
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]],

       [[12, 13],
        [14, 15],
        [16, 17]]])
this piecewise linear interpolant for the points t[:, 0, 0] can be evaluated at [0, 0.66666667, 1.33333333, 2.] as follows using numpy.interp -
In [38]: x = np.linspace(0, t.shape[0]-1, 4)
In [39]: x
Out[39]: array([0. , 0.66666667, 1.33333333, 2. ])
In [30]: xp = np.arange(t.shape[0])
In [31]: xp
Out[31]: array([0, 1, 2])
In [32]: fp = t[:,0,0]
In [33]: fp
Out[33]: array([ 0, 6, 12])
In [40]: np.interp(x, xp, fp)
Out[40]: array([ 0., 4., 8., 12.])
How can all the interpolants be efficiently calculated and returned together for all values of fp -
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 4,  5],
        [ 6,  7],
        [ 8,  9]],

       [[ 8,  9],
        [10, 11],
        [12, 13]],

       [[12, 13],
        [14, 15],
        [16, 17]]])
As the interpolation is 1d with changing y values, it must be run for each 1d slice of t. It's probably faster to loop explicitly, but neater to loop using np.apply_along_axis.
import numpy as np

t = np.arange( 18 ).reshape(3,3,2)
x = np.linspace( 0, t.shape[0]-1, 4)
xp = np.arange(t.shape[0])

def interfunc( arr ):
    """ Function interpolates a 1d array. """
    return np.interp( x, xp, arr )

np.apply_along_axis( interfunc, 0, t )  # apply function along axis 0
""" Result
array([[[ 0., 1.],
[ 2., 3.],
[ 4., 5.]],
[[ 4., 5.],
[ 6., 7.],
[ 8., 9.]],
[[ 8., 9.],
[10., 11.],
[12., 13.]],
[[12., 13.],
[14., 15.],
[16., 17.]]]) """
With explicit loops:
result = np.zeros((4,3,2))
for c in range(t.shape[1]):
    for p in range(t.shape[2]):
        result[:,c,p] = np.interp( x, xp, t[:,c,p])
On my machine the second option runs in half the time.
Edit to use np.nditer
As the result and the parameter have different shapes, I seem to have to create two np.nditer objects, one for the parameter and one for the result. This is my first attempt to use nditer for anything, so it could be overcomplicated.
def test( t ):
    ts = t.shape
    result = np.zeros((ts[0]+1,ts[1],ts[2]))
    param = np.nditer( [t], ['external_loop'], ['readonly'], order = 'F')
    with np.nditer( [result], ['external_loop'], ['writeonly'], order = 'F') as res:
        for p, r in zip( param, res ):
            r[:] = interfunc(p)
    return result
It's slightly slower than the explicit loops and less easy to follow than either of the other solutions.
As requested by @Tis Chris, here is a solution using np.nditer with the multi_index flag, but I prefer the explicit nested for loops method above because it is 10% faster.
In [29]: t = np.arange( 18 ).reshape(3,3,2)
In [30]: ax0old = np.arange(t.shape[0])
In [31]: ax0new = np.linspace(0, t.shape[0]-1, 4)
In [32]: tnew = np.zeros((len(ax0new), t.shape[1], t.shape[2]))
In [33]: it = np.nditer(t[0], flags=['multi_index'])
In [34]: for _ in it:
    ...:     tnew[:, it.multi_index[0], it.multi_index[1]] = np.interp(ax0new, ax0old, t[:, it.multi_index[0], it.multi_index[1]])
    ...:
In [35]: tnew
Out[35]:
array([[[ 0.,  1.],
        [ 2.,  3.],
        [ 4.,  5.]],

       [[ 4.,  5.],
        [ 6.,  7.],
        [ 8.,  9.]],

       [[ 8.,  9.],
        [10., 11.],
        [12., 13.]],

       [[12., 13.],
        [14., 15.],
        [16., 17.]]])
You could try scipy.interpolate.interp1d:
from scipy.interpolate import interp1d
import numpy as np
t = np.array([[[ 0,  1],
               [ 2,  3],
               [ 4,  5]],
              [[ 6,  7],
               [ 8,  9],
               [10, 11]],
              [[12, 13],
               [14, 15],
               [16, 17]]])
# for the first slice
f = interp1d(np.arange(t.shape[0]), t[..., 0], axis=0)
# returns a function which you call with values within range np.arange(t.shape[0])
# data used for interpolation
t[..., 0]
>>> array([[ 0,  2,  4],
           [ 6,  8, 10],
           [12, 14, 16]])
f(1)
>>> array([ 6., 8., 10.])
f(1.5)
>>> array([ 9., 11., 13.])
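Extending that, interp1d with axis=0 broadcasts over all remaining axes, so the whole array can be interpolated in one call; a short sketch of that idea:
from scipy.interpolate import interp1d
import numpy as np

t = np.arange(18).reshape(3, 3, 2)
x = np.linspace(0, t.shape[0] - 1, 4)

f = interp1d(np.arange(t.shape[0]), t, axis=0)  # interpolant along axis 0
f(x)  # shape (4, 3, 2), matching the expected result above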

How to enlarge a tensor (duplicate values) in TensorFlow?

I am new to TensorFlow. I am trying to implement the global_context extraction in this paper https://arxiv.org/abs/1506.04579, which is actually an average pooling over the whole feature map, then duplicating the 1x1 feature map back to the original size.
Specifically, the expected operation is as follows.
input: an [N, 1, 1, C] tensor, where N is the batch size and C is the number of channels
output: an [N, H, W, C] tensor, where H and W are the height and width of the original feature map, and all H * W values of the output are the same as the 1x1 input.
For example,
      [[1, 1, 1],
1 ->   [1, 1, 1],
       [1, 1, 1]]
I have no idea how to do this using TensorFlow. tf.image.resize_images requires 3 channels, and tf.pad cannot pad with a constant value other than zero.
tf.tile may help you
x = tf.constant([[1, 2, 3]]) # shape (1, 3)
y = tf.tile(x, [3, 1]) # shape (3, 3)
y_ = tf.tile(x, [3, 2]) # shape (3, 6)
with tf.Session() as sess:
    a, b, c = sess.run([x, y, y_])
>>>a
array([[1, 2, 3]], dtype=int32)
>>>b
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]], dtype=int32)
>>>c
array([[1, 2, 3, 1, 2, 3],
       [1, 2, 3, 1, 2, 3],
       [1, 2, 3, 1, 2, 3]], dtype=int32)
tf.tile(input, multiples, name=None)
multiples specifies how many times to repeat along each axis:
in y, axis 0 is repeated 3 times
in y_, axis 0 is repeated 3 times and axis 1 is repeated 2 times
You may need to use tf.expand_dims first.
Yes, it accepts dynamic shapes:
import numpy as np

x = tf.placeholder(dtype=tf.float32, shape=[None, 4])
x_shape = tf.shape(x)
y = tf.tile(x, [3 * x_shape[0], 1])
with tf.Session() as sess:
    x_ = np.array([[1, 2, 3, 4]])
    a = sess.run(y, feed_dict={x: x_})
>>>a
array([[ 1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.]], dtype=float32)
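Putting the pieces together for the question's [N, 1, 1, C] -> [N, H, W, C] case, a minimal sketch in the same TF1-style API used above (H, W, and C are assumed known at graph-build time):
import tensorflow as tf

H, W, C = 4, 4, 8                                      # assumed feature-map size
pooled = tf.placeholder(tf.float32, [None, 1, 1, C])   # the [N, 1, 1, C] global pool
global_context = tf.tile(pooled, [1, H, W, 1])         # -> [N, H, W, C], each value copied H*W times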

Fill a tensor with a value that is not a scalar

I am trying to move a list of points to the origin using TensorFlow. The best way to do it mathematically is to find the centroid of the list of points, then subtract that centroid from every point.
The problem: the number of rows contained in the point list is unknown until runtime.
Code so far:
import tensorflow as tf
example_point_list = tf.constant([[3., 3.], [.2, .2], [.1, .1]])  # but with any number of points
centroid = tf.reduce_mean(example_point_list, 0)
# subtract???
origin_point_list = tf.sub(example_point_list, centroid)
The problem is that subtraction works element by element, so I have to create a centroid tensor with the same number of rows as the point list, but there are no methods that do that.
(to put it in math terms)
A = [[1, 1],
     [2, 2],
     [3, 3]]
B = avg(A)  // [2, 2]
// step I need to do but do not know how to do it
B -> B1  // [[2, 2], [2, 2], [2, 2]]
Result = A - B1
Any help is appreciated!
Because of broadcasting, you don't need to tile the rows. In fact, it's more efficient not to tile them and to subtract the vector from the matrix directly. In your case it would look like this:
import numpy as np
import tensorflow as tf

tf.reset_default_graph()
example_points = np.array([[1, 1], [2, 2], [3, 3]], dtype=np.float32)
example_point_list = tf.placeholder(tf.float32)
centroid = tf.reduce_mean(example_point_list, 0)
result = example_point_list - centroid
sess = tf.InteractiveSession()
sess.run(result, feed_dict={example_point_list: example_points})
# output:
array([[-1., -1.],
       [ 0.,  0.],
       [ 1.,  1.]], dtype=float32)
If you really want to tile the centroid vector explicitly, you could do it using the shape operator, which can get the shape at runtime:
tf.reset_default_graph()
example_point_list0 = np.array([[1, 1], [2, 2], [3, 3]], dtype=np.float32)
example_point_list = tf.placeholder(tf.float32)
# get number of examples from the placeholder's runtime shape: [3]
num_examples = tf.slice(tf.shape(example_point_list), [0], [1])
# reshape [3] into 3
num_examples_flat = tf.reshape(num_examples, ())
centroid = tf.reduce_mean(example_point_list, 0)
# reshape centroid vector [2, 2] into matrix [[2, 2]]
centroid_matrix = tf.reshape(centroid, [1, -1])
# assemble 3 into vector of dimensions to tile: [3, 1]
tile_shape = tf.pack([num_examples_flat, 1])
# tile [[2, 2]] into [[2, 2], [2, 2], [2, 2]]
centroid_tiled = tf.tile(centroid_matrix, tile_shape)
sess = tf.InteractiveSession()
sess.run(centroid_tiled, feed_dict={example_point_list: example_point_list0})
# output:
array([[ 2.,  2.],
       [ 2.,  2.],
       [ 2.,  2.]], dtype=float32)
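As an aside, tf.pack was later renamed tf.stack, and in TensorFlow 2 the broadcasting version runs eagerly with no placeholders; a sketch assuming TF 2.x:
import tensorflow as tf

points = tf.constant([[1., 1.], [2., 2.], [3., 3.]])
centroid = tf.reduce_mean(points, axis=0)  # shape (2,)
centered = points - centroid               # broadcast subtract, no tiling needed
print(centered.numpy())                    # [[-1. -1.], [ 0.  0.], [ 1.  1.]]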