How to change each element in an array to the mean of the array using NumPy? - numpy

I am new to Python. In one of my assignment questions, part of the question requires us to compute the average of the elements in each sub-matrix and replace each element with that mean, using the operations available in NumPy.
An example of the matrix could be
M = [[[1,2,3],[2,3,4]],[[3,4,5],[4,5,6]]]
Through some operations, it is expected to get a matrix like the following:
M = [[[2,2,2],[3,3,3]],[[4,4,4],[5,5,5]]]
I have looked at some of the NumPy documentation and still haven't figured it out; I would really appreciate it if someone could help.

You have a few different options here. All of them follow the same general idea. You have an MxNxL array and you want to apply a reduction operation along the last axis that will leave you with an MxN result by default. However, you want to broadcast that result across the same MxNxL shape you started with.
NumPy has a parameter in most reduction operations that allows you to keep the reduced dimension present in the output array, which lets you easily broadcast that result back to the correctly sized matrix. The parameter is called keepdims; you can read more about it in the documentation for numpy.mean.
Here are a few approaches that all take advantage of this.
Setup
import numpy as np

M = np.array([[[1, 2, 3], [2, 3, 4]], [[3, 4, 5], [4, 5, 6]]])
avg = M.mean(-1, keepdims=1)
# array([[[2.],
#         [3.]],
#
#        [[4.],
#         [5.]]])
Option 1
Assign to a view of the array. However, this will coerce the float averages back to int, so cast your array to float first if you want to keep the precision.
M[:] = avg
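For example, casting first (reusing M and avg from the setup above) keeps the means from being truncated to ints:
M = M.astype(float)  # float copy, so the assignment below does not truncate the means
M[:] = avg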
Option 2
An efficient read-only view using np.broadcast_to
np.broadcast_to(avg, M.shape)
Option 3
Broadcasted multiplication, more for demonstration than anything.
avg * np.ones(M.shape)
All will produce the same result (except possibly for the dtype):
array([[[2., 2., 2.],
        [3., 3., 3.]],

       [[4., 4., 4.],
        [5., 5., 5.]]])

In one line of code:
M.mean(-1, keepdims=1) * np.ones(M.shape)

Related

Combining feature matrices of different shapes into a single feature

A library that I am using only supports 1 feature matrix as an input.
Therefore, I would like to merge my two features into a single feature.
Feature #1: a simple float, e.g. tensor([1.9])
Feature #2: a categorical feature that I would like to one-hot encode, e.g. tensor([0., 1., 0.])
tensor([
    [1.9, 0., 0.],  # row 1 for float
    [0.,  1., 0.]   # row 2 for OHE
])
My plan would be to take the 1x1 feature and the 3x1 feature and merge them into the 2x3 tensor shown above.
For the float row, I would always have the 2nd and 3rd entries zeroed out. <-- is there a better approach? E.g. should I use three 1.9's?
Would this method give the effect of training on both features simultaneously?
Yes, what you propose would work, in that the model would just learn to ignore the second and third indices. But since those entries are never used, you can just concatenate the two features directly, i.e.
tensor([1.9, 0., 1., 0.])
you don't need to "indicate" in any way to the model that the first value is a scalar and the rest operate as a one-hot encoding. The model will figure out the relevant features for the task you care about.
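As a minimal sketch, the concatenation in PyTorch could look like this (the variable names are just illustrative):
import torch

scalar_feature = torch.tensor([1.9])             # feature #1
one_hot = torch.tensor([0., 1., 0.])             # feature #2, already one-hot encoded
combined = torch.cat([scalar_feature, one_hot])  # tensor([1.9000, 0.0000, 1.0000, 0.0000])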

Why is broadcasting done by aligning axes backwards

NumPy's broadcasting rules have bitten me once again and I'm starting to feel there may be a way of thinking about this topic that I'm missing.
I'm often in situations like the following: the first axis of my arrays is reserved for something fixed, like the number of samples. The second axis could represent different independent variables of each sample for some arrays, or it might not exist at all when it feels natural for there to be only one quantity attached to each sample. For example, if the array is called price, I'd probably only use one axis, representing the price of each sample. On the other hand, a second axis is sometimes much more natural. For example, I could use a neural network to compute a quantity for each sample, and since neural networks can in general compute arbitrary multi-valued functions, the library I use would in general return a 2d array, with a singleton second axis if I use it to compute a single dependent variable. I have also found this approach of using 2d arrays to be more amenable to future extensions of my code.
Long story short, I need to decide in various places of my codebase whether to store an array as (1000,) or (1000, 1), and changes in requirements occasionally make it necessary to switch from one format to the other.
Usually, these arrays live alongside arrays with up to 4 axes, which further increases the pressure to sometimes introduce a singleton second axis and then have the third axis carry a consistent semantic meaning for all arrays that use it.
The problem now occurs when I add my (1000,) or (1000,1) arrays, expecting to get (1000,1), but get (1000,1000) because of implicit broadcasting.
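For example, a minimal reproduction (array names chosen just for illustration):
import numpy as np

a = np.zeros(1000)        # shape (1000,)
b = np.zeros((1000, 1))   # shape (1000, 1)
(a + b).shape             # (1000, 1000): the (1000,) array is aligned as (1, 1000)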
I feel like this prevents giving semantic meaning to axes. Of course I could always use at least two axes, but that leads to the question of where to stop: to be fail-safe, continuing this logic, I'd have to always use arrays of at least 6 axes to represent everything.
I'm aware this is maybe not the most technically well-defined question, but does anyone have a modus operandi that helps them avoid these kinds of bugs?
Does anyone know the numpy developers' motivation for aligning axes in reverse order for broadcasting? Was it computational efficiency or another technical reason, or a model of thinking that I don't understand?
In MATLAB, broadcasting, a johnny-come-lately to this game, expands trailing dimensions. But there the trailing dimensions are the outermost ones, that is, order='F'. And since everything starts as 2d, this expansion only occurs when one array is 3d (or larger).
https://blogs.mathworks.com/loren/2016/10/24/matlab-arithmetic-expands-in-r2016b/
explains this, and gives a bit of history. My own history with the language is old enough that the ma_expanded = ma(ones(3,1),:) style of expansion is familiar. Octave added broadcasting before MATLAB.
To avoid ambiguity, broadcasting expansion can only occur in one direction. Expanding in the direction of the outermost dimension seems logical.
Compare (3,) expanded to (1,3) versus (3,1) - viewed as nested lists:
In [198]: np.array([1,2,3])
Out[198]: array([1, 2, 3])
In [199]: np.array([[1,2,3]])
Out[199]: array([[1, 2, 3]])
In [200]: (np.array([[1,2,3]]).T).tolist()
Out[200]: [[1], [2], [3]]
I don't know if there are significant implementation advantages. With the striding mechanism, adding a new dimension anywhere is easy. Just change the shape and strides, adding a 0 for the dimension that needs to be 'replicated'.
In [203]: np.broadcast_arrays(np.array([1,2,3]),np.array([[1],[2],[3]]),np.ones((3,3)))
Out[203]:
[array([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]]),
 array([[1, 1, 1],
        [2, 2, 2],
        [3, 3, 3]]),
 array([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])]
In [204]: [x.strides for x in _]
Out[204]: [(0, 8), (8, 0), (24, 8)]
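For illustration, the same zero-stride idea can be spelled out with np.lib.stride_tricks.as_strided (just a sketch; np.broadcast_to and np.broadcast_arrays do this for you):
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([1, 2, 3])
# Give the new leading axis a stride of 0: every "row" re-reads the same three
# elements, so the array is replicated without copying any data.
rows = as_strided(a, shape=(3, 3), strides=(0, a.strides[0]))
# rows.strides == (0, 8) for a 64-bit integer array, matching Out[204] above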

Freezing specific values in a weight matrix in tensorflow

Assuming I have a weight matrix that looks like [[a, b], [c, d]], is it possible in TensorFlow to fix the values of b and c to zero so that they don't change during optimization?
Some sample code:
import tensorflow as tf

A = tf.Variable([[1., 0.], [3., 0.]])
A1 = A[:, 0:1]                               # just some slicing of your variable
A2 = A[:, 1:2]                               # the column you want to freeze
A2_stop = tf.stop_gradient(tf.identity(A2))  # block gradients from flowing into A2
A = tf.concat((A1, A2_stop), axis=1)         # use this tensor in the rest of the graph
Actually, tf.identity is needed to stop the gradient before A2.
There are three ways to do this; you can:
1. Break apart your weight matrix into multiple variables, and make only some of them trainable.
2. Hack the gradient calculation to be zero for the constant elements (sketched below).
3. Hack the gradient application to reset the values of the constant elements.
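A minimal sketch of the second approach, using a mask together with tf.stop_gradient (the mask and variable names here are illustrative, not from the original post):
import tensorflow as tf

A = tf.Variable([[1., 0.], [3., 0.]])      # [[a, b], [c, d]]
mask = tf.constant([[1., 0.], [0., 1.]])   # 1 = trainable (a, d), 0 = frozen (b, c)

# A_used has the same values as A, but gradients only flow through the masked
# entries; the stop_gradient branch contributes the frozen values with no gradient.
A_used = A * mask + tf.stop_gradient(A * (1. - mask))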

numpy dot product with inner dimension zero gives unexpected result

My calculation consists of putting many matrices in one big block matrix. Some of these matrices can be empty in certain cases. These empty matrices give unexpected results.
The problem comes down to this:
b
Out[117]: array([], dtype=int32)
X = A[b, :]
X
Out[118]: array([], shape=(0, 3), dtype=float64)
X is the empty matrix. The matrix it gets multiplied by is also empty due to the code.
Y = array([]).dot(X)
Y
Out[119]: array([ 0.,  0.,  0.])
I realise that the size of Y is correct according to linear algebra: (1x0).(0x3) = (1x3). But I was expecting an empty matrix as the result, since the inner dimension of the matrices is zero (not one).
I would rather not check whether these matrices are empty, because the code that assembles the block matrix would then have to be rewritten for every combination of possibly empty matrices.
Is there a solution to this problem? I was thinking of wrapping the dot function and only proceeding if the inner dimension is not zero, but I feel like there is a cleaner solution.
Edit:
I should clarify a bit more what I mean by saying I would rather not check for zero dimensions. The equations that I put into the block matrix consist of hundreds of these dot products. Each dot product represents a component in an electrical network; X being empty means that no such component is present in the network. If I had to compose the final (block) matrix depending on which elements are present, that would mean thousands of lines of code. The problem is that the [ 0., 0., 0.] result adds an incorrect equation to the system, which I would rather avoid.
The bad news is that the shape of the result is both expected and correct.
The good news is that there is a nearly trivial check for whether a matrix is empty, which works in all cases: the total number of elements in the result, provided by the size attribute:
b = ...
X = ...
Y = array([]).dot(X)
if Y.size:
    # You have a non-empty result
    ...
EDIT
You can use the same logic to filter your input vectors. Since you want to do calculations only for non-empty vectors, you may want to try something like:
if b.size and X.size:
    Y = b.dot(X)
    # Add Y to your block matrix, knowing that it is of the expected size
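A small self-contained illustration of both points (the arrays here are made up for demonstration):
import numpy as np

A = np.arange(12, dtype=float).reshape(4, 3)
idx = np.array([], dtype=np.intp)   # empty index array
X = A[idx, :]                       # shape (0, 3) -- an "empty" matrix
w = np.array([])                    # shape (0,)

w.dot(X)                            # array([0., 0., 0.]) -- algebraically valid, but all zeros

if w.size and X.size:               # skip the product when either operand is empty
    Y = w.dot(X)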

What's the purpose of numpy.empty() over numpy.ndarray()?

It seems that anything that numpy.empty() can do can be done just as easily using numpy.ndarray(), e.g.:
>>> np.empty(shape=(2, 2), dtype=np.dtype('double'))
array([[ 0.,  0.],
       [ 0.,  0.]])
>>> np.ndarray(shape=(2, 2), dtype=np.dtype('double'))
array([[ 0.,  0.],
       [ 0.,  0.]])
>>>
Why do we need numpy.empty()? Can it do something that numpy.ndarray() cannot do just as simply? Is it just serving an annotational purpose to emphasize to the code reader that you are creating an uninitialized array?
Always use np.empty. np.ndarray is the low-level way to construct an array. It is used by np.empty or np.array. np.ndarray exposes some details you should not (accidentally) use yourself.
From the docstring:
Docstring:
ndarray(shape, dtype=float, buffer=None, offset=0,
strides=None, order=None)
An array object represents a multidimensional, homogeneous array
of fixed-size items. An associated data-type object describes the
format of each element in the array (its byte-order, how many bytes it
occupies in memory, whether it is an integer, a floating point number,
or something else, etc.)
Arrays should be constructed using array, zeros or empty (refer
to the See Also section below). The parameters given here refer to
a low-level method (ndarray(...)) for instantiating an array.
For more information, refer to the numpy module and examine the
methods and attributes of an array.
Get the docstring with:
>>> help(np.ndarray)
or in IPython:
In [1]: np.ndarray?
EDIT
And as @hpaulj pointed out in a comment, it is useful to read all of the relevant documentation. Always prefer zeros over empty, unless you have a strong reason to do otherwise. From the docstring of empty:
Notes
empty, unlike zeros, does not set the array values to zero,
and may therefore be marginally faster. On the other hand, it requires
the user to manually set all the values in the array, and should be
used with caution.
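A small illustration of the difference (the array names are just for the example):
import numpy as np

a = np.zeros((2, 2))   # guaranteed to contain zeros
b = np.empty((2, 2))   # uninitialized: contents are whatever happened to be in memory
b[:] = 1.0             # with empty you must set every value yourself before reading it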