numpy: Broadcasting a vector horizontally - numpy

I have a 1-D numpy array v. I'd like to copy it to make a matrix with each row being a copy of v. That's easy: np.broadcast_to(v, desired_shape).
However, if I'd like to treat v as a column vector, and copy it to make a matrix with each column being a copy of v, I can't find a simple way to do it. Through trial and error, I'm able to do this:
np.broadcast_to(v.reshape(v.shape[0], 1), desired_shape)
While that works, I can't claim to understand it (even though I wrote it!).
Part of the problem is that numpy doesn't seem to have a concept of a column vector (hence the reshape hack instead of what in math would just be .T).
But, a deeper part of the problem seems to be that broadcasting only works vertically, not horizontally. Or perhaps a more correct way to say it would be: broadcasting only works on the higher dimensions, not the lower dimensions. I'm not even sure if that's correct.
In short, while I understand the concept of broadcasting in general, when I try to use it for particular applications, like copying the col vector to make a matrix, I get lost.
Can you help me understand or improve the readability of this code?

https://en.wikipedia.org/wiki/Transpose - this article on Transpose talks only of transposing a matrix.
https://en.wikipedia.org/wiki/Row_and_column_vectors -
a column vector or column matrix is an m × 1 matrix
a row vector or row matrix is a 1 × m matrix
You can easily create row or column vectors(matrix):
In [464]: np.array([[1],[2],[3]]) # column vector
Out[464]:
array([[1],
       [2],
       [3]])
In [465]: _.shape
Out[465]: (3, 1)
In [466]: np.array([[1,2,3]]) # row vector
Out[466]: array([[1, 2, 3]])
In [467]: _.shape
Out[467]: (1, 3)
But in numpy the basic structure is an array, not a vector or matrix.
[Array in Computer Science] - Generally, a collection of data items that can be selected by indices computed at run-time
A numpy array can have 0 or more dimensions. In contrast, a MATLAB matrix has 2 or more dimensions. Originally a 2d matrix was all that MATLAB had.
To talk meaningfully about a transpose you have to have at least 2 dimensions. One may have size one, and map onto a 1d vector, but it is still a matrix, a 2d object.
So adding a dimension to a 1d array, whether done with reshape or [:,None], is NOT a hack. It is a perfectly valid and normal numpy operation.
The basic broadcasting rules are:
a dimension of size 1 can be changed to match the corresponding dimension of the other array
a dimension of size 1 can be added automatically on the left (front) to match the number of dimensions.
In this example, both steps apply: (5,)=>(1,5)=>(3,5)
In [458]: np.broadcast_to(np.arange(5), (3,5))
Out[458]:
array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
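The two rules above can be written out as a short function. This is only a sketch: broadcast_shape is a hypothetical helper made up for illustration, not a numpy function, and it only computes the resulting shape without touching any data.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: result shape under the two broadcasting rules."""
    result = []
    # Walk both shapes from the right; a missing leading axis counts as size 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(reversed(result))

print(broadcast_shape((5,), (3, 5)))    # (3, 5)
print(broadcast_shape((5, 1), (5, 3)))  # (5, 3)
```

Because missing axes are only ever filled in on the left, (5,) against (5, 3) fails in this sketch, just as it does in numpy.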
In this, we have to explicitly add the size one dimension on the right (end):
In [459]: np.broadcast_to(np.arange(5)[:,None], (5,3))
Out[459]:
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4]])
np.broadcast_arrays(np.arange(5)[:,None], np.arange(3)) produces two (5,3) arrays.
np.broadcast_arrays(np.arange(5), np.arange(3)[:,None]) makes (3,5).
np.broadcast_arrays(np.arange(5), np.arange(3)) produces an error because it has no way of determining whether you want (5,3) or (3,5) or something else.
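The cases above can be checked in one short script. This is a sketch only; the variable names are chosen for illustration.

```python
import numpy as np

v = np.arange(5)

# Rows are copies of v: the size-1 axis is added on the left automatically.
rows = np.broadcast_to(v, (3, 5))

# Columns are copies of v: the size-1 axis on the right must be explicit.
cols = np.broadcast_to(v[:, None], (5, 3))

# v[:, None] and v.reshape(-1, 1) both produce shape (5, 1).
print(v[:, None].shape, v.reshape(-1, 1).shape)  # (5, 1) (5, 1)

a, b = np.broadcast_arrays(np.arange(5)[:, None], np.arange(3))
print(a.shape, b.shape)  # (5, 3) (5, 3)

try:
    np.broadcast_arrays(np.arange(5), np.arange(3))
except ValueError as exc:
    # Trailing dimensions 5 and 3 differ and neither is 1.
    print("incompatible shapes:", exc)
```

Note that np.broadcast_to returns a read-only view, so rows and cols share memory with v instead of copying it.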

Broadcasting always adds new dimensions to the left because it'd be ambiguous and bug-prone to try to guess when you want new dimensions on the right. You can make a function to broadcast to the right by reversing the axes, broadcasting, and reversing back:
def broadcast_rightward(arr, shape):
    return np.broadcast_to(arr.T, shape[::-1]).T
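As a quick usage check, the function is repeated here so the snippet runs on its own. The test arrays are chosen for illustration.

```python
import numpy as np

def broadcast_rightward(arr, shape):
    # Reverse the axes, broadcast (which pads on the left), then reverse back.
    return np.broadcast_to(arr.T, shape[::-1]).T

v = np.arange(5)
cols = broadcast_rightward(v, (5, 3))
print(cols.shape)  # (5, 3)

# Same result as adding the trailing axis by hand.
print(np.array_equal(cols, np.broadcast_to(v[:, None], (5, 3))))  # True

# It also works for higher dimensions: (4, 5) -> (4, 5, 3).
m = np.arange(20).reshape(4, 5)
print(broadcast_rightward(m, (4, 5, 3)).shape)  # (4, 5, 3)
```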

Related

Why is broadcasting done by aligning axes backwards

Numpy's broadcasting rules have bitten me once again and I'm starting to feel there may be a way of thinking about this topic that I'm missing.
I'm often in situations as follows: the first axis of my arrays is reserved for something fixed, like the number of samples. The second axis could represent different independent variables of each sample, for some arrays, or it could be nonexistent when it feels natural that there only be one quantity attached to each sample in an array. For example, if the array is called price, I'd probably only use one axis, representing the price of each sample. On the other hand, a second axis is sometimes much more natural. For example, I could use a neural network to compute a quantity for each sample, and since neural networks can in general compute arbitrary multivalued functions, the library I use would in general return a 2d array and make the second axis singleton if I use it to compute a single dependent variable. I found this approach of using 2d arrays is also more amenable to future extensions of my code.
Long story short, I need to make decisions in various places of my codebase whether to store an array as (1000,) or (1000,1), and changes of requirements occasionally make it necessary to switch from one format to the other.
Usually, these arrays live alongside arrays with up to 4 axes, which further increases the pressure to sometimes introduce a singleton second axis, and then have the third axis represent a consistent semantic quality for all arrays that use it.
The problem now occurs when I add a (1000,) array to a (1000,1) array, expecting to get (1000,1), but get (1000,1000) because of implicit broadcasting.
I feel like this prevents giving semantic meaning to axes. Of course I could always use at least two axes, but that leads to the question where to stop: To be fail safe, continuing this logic, I'd have to always use arrays of at least 6 axes to represent everything.
I'm aware this is maybe not the most technically well-defined question, but does anyone have a modus operandi that helps them avoid this kind of bug?
Does anyone know the motivations of the numpy developers to align axes in reverse order for broadcasting? Was computational efficiency or another technical reason behind this, or a model of thinking that I don't understand?
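The surprise described above can be reproduced in a few lines. This is a minimal sketch: the names price and prediction and the helper add_strict are made up for illustration and are not part of numpy. The guard is just one possible convention, not a recommended standard.

```python
import numpy as np

price = np.ones(1000)            # shape (1000,)
prediction = np.ones((1000, 1))  # shape (1000, 1)

# (1000,) is padded on the left to (1, 1000), so the sum becomes (1000, 1000).
print((price + prediction).shape)  # (1000, 1000)

def add_strict(a, b):
    """Hypothetical guard: add only when both shapes already match exactly."""
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return a + b

# Making the singleton axis explicit gives the intended (1000, 1) result.
print(add_strict(price[:, None], prediction).shape)  # (1000, 1)
```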
MATLAB's broadcasting, a Johnny-come-lately to this game, expands trailing dimensions. But there the trailing dimensions are outermost, that is order='F'. And since everything starts as 2d, this expansion only occurs when one array is 3d (or larger).
https://blogs.mathworks.com/loren/2016/10/24/matlab-arithmetic-expands-in-r2016b/
explains, and gives a bit of history. My own history with the language is old enough that the ma_expanded = ma(ones(3,1),:) style of expansion is familiar. Octave added broadcasting before MATLAB.
To avoid ambiguity, broadcasting expansion can only occur in one direction. Expanding in the direction of the outermost dimension seems logical.
Compare (3,) expanded to (1,3) versus (3,1) - viewed as nested lists:
In [198]: np.array([1,2,3])
Out[198]: array([1, 2, 3])
In [199]: np.array([[1,2,3]])
Out[199]: array([[1, 2, 3]])
In [200]: (np.array([[1,2,3]]).T).tolist()
Out[200]: [[1], [2], [3]]
I don't know if there are significant implementation advantages. With the striding mechanism, adding a new dimension anywhere is easy. Just change the shape and strides, using a stride of 0 for the dimension that needs to be 'replicated'.
In [203]: np.broadcast_arrays(np.array([1,2,3]),np.array([[1],[2],[3]]),np.ones((3,3)))
Out[203]:
[array([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]]),
 array([[1, 1, 1],
        [2, 2, 2],
        [3, 3, 3]]),
 array([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])]
In [204]: [x.strides for x in _]
Out[204]: [(0, 8), (8, 0), (24, 8)]
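The zero-stride idea can also be reproduced by hand with as_strided. This is only a sketch to show the mechanism; in normal code np.broadcast_to is the safer choice, since as_strided does no bounds checking.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

v = np.array([1, 2, 3])
step = v.strides[0]  # bytes between consecutive elements (platform dependent)

# Stride 0 on the first axis: every row re-reads the same 3 elements.
rows = as_strided(v, shape=(3, 3), strides=(0, step), writeable=False)

# Stride 0 on the second axis: every column re-reads the same element.
cols = as_strided(v, shape=(3, 3), strides=(step, 0), writeable=False)

print(np.array_equal(rows, np.broadcast_to(v, (3, 3))))           # True
print(np.array_equal(cols, np.broadcast_to(v[:, None], (3, 3))))  # True
```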

dimension of a tensor created by tf.zeros(n)

I'm confused by the dimension of a tensor created with tf.zeros(n). For instance, if I write tf.zeros(6).eval().shape, this returns (6,). What dimension is this? Is this a matrix of 6 rows and an arbitrary number of columns? Or is this a matrix of 6 columns with an arbitrary number of rows?
weights = tf.random_uniform([3, 6], minval=-1, maxval=1, seed=1)- this is a 3x6 matrix
b=tf.zeros(6).eval()- I'm not sure what dimension this is.
Why am I able to add the two like weights+b? If I understand correctly, in order for the two to be added, b needs to have dimension 3x1.
Why am I able to add the two like weights+b?
Operator + is the same as using tf.add() (<obj>.__add__() calls the tf.add() or tf.math.add()) and if you read the documentation it says:
NOTE: math.add supports broadcasting. AddN does not. More about broadcasting here
Now I'm quoting from numpy broadcasting rules (which are the same for tensorflow):
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when
they are equal, or
one of them is 1
So you're able to add two tensors with different shapes because they have the same trailing dimensions. If you change the dimension of your weights tensor to, let's say [3, 5], you will get InvalidArgumentError exception because trailing dimensions differ.
(6,) is Python syntax for a tuple with 6 as its single element. Hence the shape here is that of a one-dimensional vector of length 6.
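Since the answer above says the rules are the same as numpy's, the behaviour can be sketched with numpy alone. The arrays here are stand-ins for the tensors in the question, not TensorFlow objects, and numpy raises ValueError where the answer reports TensorFlow's InvalidArgumentError.

```python
import numpy as np

weights = np.random.uniform(-1, 1, size=(3, 6))  # stand-in for the [3, 6] tensor
b = np.zeros(6)                                  # shape (6,): one axis of length 6

# Trailing dimensions match (6 and 6), so b acts like (1, 6) repeated over 3 rows.
print((weights + b).shape)  # (3, 6)

try:
    np.zeros((3, 5)) + b    # trailing dimensions 5 and 6 differ
except ValueError as exc:
    print("cannot broadcast:", exc)
```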

Why the second dimension in a Numpy array is empty?

Why is the output here
array = np.arange(3)
array.shape
is
(3,)
and not
(1,3)
What does the missing dimension mean?
In case there's confusion, (3,) doesn't mean there's a missing dimension. The comma is part of the standard Python notation for a single element tuple. Shapes (1,3), (3,), and (3,1) are distinct.
While they can contain the same 3 elements, their use in calculations (broadcasting) is different, their print format is different, and their list equivalent is different:
In [21]: np.array([1,2,3])
Out[21]: array([1, 2, 3])
In [22]: np.array([1,2,3]).tolist()
Out[22]: [1, 2, 3]
In [23]: np.array([1,2,3]).reshape(1,3).tolist()
Out[23]: [[1, 2, 3]]
In [24]: np.array([1,2,3]).reshape(3,1).tolist()
Out[24]: [[1], [2], [3]]
And we don't have to stop at adding just one singleton dimension:
In [25]: np.array([1,2,3]).reshape(1,3,1).tolist()
Out[25]: [[[1], [2], [3]]]
In [26]: np.array([1,2,3]).reshape(1,3,1,1).tolist()
Out[26]: [[[[1]], [[2]], [[3]]]]
In numpy an array can have 0, 1, 2 or more dimensions. 1 dimension is just as logical as 2.
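A short sketch makes the point concrete by printing the number of dimensions and shape of a few arrays, and by showing that the three 3-element shapes broadcast differently. The variable names are chosen for illustration.

```python
import numpy as np

a0 = np.array(7)                   # 0 dimensions, shape ()
a1 = np.arange(3)                  # 1 dimension,  shape (3,)
row = np.arange(3).reshape(1, 3)   # 2 dimensions, shape (1, 3)
col = np.arange(3).reshape(3, 1)   # 2 dimensions, shape (3, 1)

for a in (a0, a1, row, col):
    print(a.ndim, a.shape)

# Same 3 elements, different behaviour under broadcasting:
print((a1 + row).shape)  # (1, 3)
print((a1 + col).shape)  # (3, 3)
```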
In MATLAB a matrix always has 2 dim (or more), but it doesn't have to be that way. Strictly speaking MATLAB doesn't even have scalars. An array with shape (3,) is missing a dimension only if MATLAB is taken as the standard.
numpy is built on Python, which has scalars and lists (which can nest). How many dimensions does a Python list have?
If you want to get into history, MATLAB was developed as a front end to a set of Fortran linear algebra routines. Given the problems those routines solved, the concept of a matrix with 2 dimensions, and of row vs column vectors, made sense. It wasn't until version 5, in the late 1990s, that MATLAB was generalized to allow more than 2 dimensions.
numpy is based on several attempts to provide arrays to Python (e.g. Numeric). Those developers took a more general approach to arrays, one where 2d was an artificial constraint. That has precedent in computer languages and mathematics (and physics). APL was developed in the 1960s, first as a mathematical notation, and then as a computer language. Like numpy, its arrays can be 0d or higher. (Since I used APL before I used MATLAB, the numpy approach feels quite natural.)
In APL there aren't separate lists or tuples. So the shape of an array, rho A is itself an array, and rho rho A is the number of dimensions of A, also called the rank.
http://docs.dyalog.com/14.0/Dyalog%20APL%20Idioms.pdf

How to retain indices of a matrix while working on one of its submatrices?

I am trying to implement an algorithm that iteratively removes some rows and columns of a matrix and continues processing the remaining submatrix. However, I would like to know the index of a value in the original matrix rather than the remaining submatrix.
For example, assume that a matrix x is built using
x = np.arange(9).reshape(3, 3)
Now, I would like to find the index of the element that is equal to 8 in the submatrix defined below:
np.where(x[1:, 1:] == 8)
By default, numpy returns (array([1]), array([1])) because it is finding the element in the sliced submatrix. What I would like returned instead is (array([2]), array([2])), which is the index of 8 in the original matrix.
What is an efficient solution to this problem?
P.S.
The submatrix may be built arbitrarily. For example, I may need to keep rows 0 and 1, but columns 0 and 2.
Each submatrix may be sliced in next iterations to make a smaller submatrix. I still would like to have access to the index in the original matrix. In other words, I am looking for a solution that works on submatrices of submatrices as well.
I recently learned about indexing with arrays where submatrices of a matrix can be selected using another numpy array. I think what I can do to solve the problem is to map indices of the submatrix to elements of the indexing array.
For example, in the example above, the submatrix can be defined like this:
row_idx = np.array([1, 2])
col_idx = np.array([1, 2])
np.where(x[row_idx[:, None], col_idx] == 8)
This will still return the same (array([1]), array([1])) output, but I can use these indices to look up the elements of row_idx and col_idx in order to find the corresponding indices in the original matrix, i.e. row_idx[1] and col_idx[1].
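The lookup described above can be sketched as follows, including one further level of slicing. The names keep_rows, keep_cols, row_idx2 and col_idx2 are made up for illustration; the idea is that subsetting the index arrays themselves keeps them expressed in original coordinates.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
row_idx = np.array([1, 2])
col_idx = np.array([1, 2])

# Positions inside the submatrix, then mapped back to the original matrix.
sub = x[row_idx[:, None], col_idx]
r, c = np.where(sub == 8)
print(row_idx[r], col_idx[c])  # [2] [2]

# Submatrix of the submatrix: subset the index arrays, not the data.
keep_rows = np.array([1])  # positions within the current submatrix
keep_cols = np.array([1])
row_idx2, col_idx2 = row_idx[keep_rows], col_idx[keep_cols]

sub2 = x[row_idx2[:, None], col_idx2]
r2, c2 = np.where(sub2 == 8)
print(row_idx2[r2], col_idx2[c2])  # [2] [2]
```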

How does the gradient of the sum trick work to get maxpooling positions in keras?

The keras examples directory contains a lightweight version of a stacked what-where autoencoder (SWWAE) which they train on MNIST data. (https://github.com/fchollet/keras/blob/master/examples/mnist_swwae.py)
In the original SWWAE paper, the authors compute the what and where using soft functions. However, in the keras implementation, they use a trick to get these locations. I would like to understand this trick.
Here is the code of the trick.
def getwhere(x):
    ''' Calculate the 'where' mask that contains switches indicating which
    index contained the max value when MaxPool2D was applied. Using the
    gradient of the sum is a nice trick to keep everything high level.'''
    y_prepool, y_postpool = x
    return K.gradients(K.sum(y_postpool), y_prepool) # How exactly does this line work?
Here y_prepool is an M x N matrix and y_postpool is an M/2 x N/2 matrix (let's assume canonical pooling with a size of 2 pixels).
I have verified that the output of getwhere() is a bed of nails matrix where the nails indicate the position of the max (the local argmax if you will).
Can someone construct a small example demonstrating how getwhere works using this "trick"?
Let's focus on the simplest example, without really talking about convolutions. Say we have a vector
x = [1 4 2]
which we max-pool over (with a single, big window), we get
mx = 4
mathematically speaking, it is:
mx = x[argmax(x)]
now, the "trick" to recover one hot mask used by pooling is to do
magic = d mx / dx
there is no gradient for argmax itself, however it "passes" the corresponding gradient to the element of the vector at the location of the maximum, so:
d mx / dx = [d mx/dx[1], d mx/dx[2], d mx/dx[3]] = [0, dx[2]/dx[2], 0] = [0 1 0]
as you can see, the gradients for all non-maximum elements are zero (due to argmax), and "1" appears at the maximum value because dx[2]/dx[2] = 1.
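This can be checked numerically without any deep learning library. The sketch below uses a finite difference instead of Keras gradients, so it is a sanity check on the maths and not how Keras computes the result. The step size eps is an arbitrary choice.

```python
import numpy as np

x = np.array([1.0, 4.0, 2.0])
eps = 1e-6

# Nudge one element at a time and measure how much max(x) moves.
grad = np.array([(np.max(x + eps * e) - np.max(x)) / eps
                 for e in np.eye(len(x))])

print(np.round(grad, 3))  # [0. 1. 0.]
```

Only the nudge applied to the maximum element changes the result, which is why the gradient is a one-hot mask at the argmax.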
Now for "proper" maxpool you have many pooling regions, connected to many input locations, thus taking analogous gradient of sum of pooled values, will recover all the indices.
Note, however, that this trick will not work if you have heavily overlapping kernels: you might end up with values bigger than "1". Basically, if a pixel is the max for K kernels, then it will have value K, not 1. For example:
    [ 1, 2, 3]
x = [13, 3, 1]
    [ 4, 2, 9]
if we max pool with 2x2 window we get
mx = [13, 3]
     [13, 9]
and the gradient trick gives you
        [0, 0, 1]
magic = [2, 0, 0]
        [0, 0, 1]
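The overlapping case can be imitated in plain numpy by counting, for each input cell, how many pooling windows it wins. This is a sketch under stated assumptions: pooling_where is a hypothetical helper, it uses stride 1 to make the 2x2 windows overlap, and ties are resolved by np.argmax (first maximum wins), which need not match what a given backend does.

```python
import numpy as np

def pooling_where(x, size=2, stride=1):
    """Hypothetical helper: per input cell, count the windows where it is the max."""
    mask = np.zeros_like(x)
    for i in range(0, x.shape[0] - size + 1, stride):
        for j in range(0, x.shape[1] - size + 1, stride):
            window = x[i:i + size, j:j + size]
            # np.argmax returns the first maximum, so ties go to the earliest cell.
            di, dj = np.unravel_index(np.argmax(window), window.shape)
            mask[i + di, j + dj] += 1
    return mask

x = np.array([[1, 2, 3],
              [13, 3, 1],
              [4, 2, 9]])

print(pooling_where(x))
# [[0 0 1]
#  [2 0 0]
#  [0 0 1]]
```

The value 13 is the maximum of two windows, so it is counted twice, matching the 2 in the example above.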