numpy dot product with inner dimension zero gives unexpected result - numpy

My calculation consists of putting many matrices in one big block matrix. Some of these matrices can be empty in certain cases. These empty matrices give unexpected results.
The problem comes down to this:
b
Out[117]: array([], dtype=int32)
X = A[b,:]
Out[118]: array([], shape=(0, 3), dtype=float64)
X is the empty matrix. The matrix it gets multiplied by is also empty due to the code.
Y = array([]).dot(X)
Out[119]: array([ 0., 0., 0.])
I realise that the size of Y is correct according to algebra: (1x0).(0x3)=(1x3). But I was expecting an empty matrix to be the result, since the inner dimension of the matrices is zero (not one).
I would rather not check whether these matrices are empty, because the code that assembles the block matrix would then have to be rewritten for every combination of possibly empty matrices.
Is there a solution to this problem? I was thinking of wrapping the dot function and only proceeding if the inner dimension is not zero, but I feel like there is a cleaner solution.
edit:
I should clarify a bit more what I mean by not wanting to check for zero dimensions. The equations that I put into the block matrix consist of hundreds of these dot products. Each dot product represents a component in an electrical network; X being empty means that no such component is present in the network. If I had to compose the final (block) matrix differently depending on which elements are present, that would mean thousands of lines of code, because the [ 0., 0., 0.] result adds an incorrect equation. I would rather avoid that.

The bad news is that the shape of the result is both expected and correct.
The good news is that there is a nearly trivial check to see if a matrix is empty or not for all cases using the total number of elements in the result, provided by the size attribute:
b = ...
X = ...
Y = array([]).dot(X)
if Y.size:
    # You have a non-empty result
EDIT
You can use the same logic to filter your input vectors. Since you want to do calculations only for non-empty vectors, you may want to try something like:
if b.size and X.size:
    Y = b.dot(X)
    # Add Y to your block matrix, knowing that it is of the expected size
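If you do prefer the wrapper idea from the question, the same size check can live inside it. A minimal sketch (the name safe_dot is made up here, and callers must handle the None result themselves):
import numpy as np

def safe_dot(a, b):
    # Skip the product entirely when either operand is empty
    if a.size and b.size:
        return a.dot(b)
    return None  # the caller leaves this block out of the big matrix

Y = safe_dot(np.array([]), np.empty((0, 3)))
# Y is None, so no spurious [ 0., 0., 0.] row enters the block matrix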


How to do 2D Convolution only at a specific location?

This question has been asked multiple times but still I could not get what I was looking for. Imagine
data=np.random.rand(N,N) #shape N x N
kernel=np.random.rand(3,3) #shape M x M
I know convolution typically means placing the kernel all over the data. But in my case N and M are of the order of 10000, so I wish to get the value of the convolution at a specific location in the data, say at (10,37), without doing unnecessary calculations at all locations. The output will be just a number. The main goal is to reduce the computation and memory expenses. Is there any inbuilt function that does this with minimal adjustments?
Indeed, applying the convolution at a particular position amounts to summing the entries of a (pointwise) multiplication of the corresponding submatrix of data with the flipped kernel. Here is a reproducible example.
Code
import numpy as np

N = 1000
M = 3
np.random.seed(777)
data = np.random.rand(N,N)    # shape N x N
kernel = np.random.rand(M,M)  # shape M x M
# Pointwise convolution = pointwise product
data[10:10+M,37:37+M]*kernel[::-1, ::-1]
>array([[0.70980514, 0.37426475, 0.02392947],
        [0.24387766, 0.1985901 , 0.01103323],
        [0.06321042, 0.57352696, 0.25606805]])
with output
conv = np.sum(data[10:10+M,37:37+M]*kernel[::-1, ::-1])
conv
>2.45430578
The kernel is flipped by the definition of convolution, as explained here, and this was kindly pointed out by Warren Weckesser. Thanks!
The key is to make sense of the index you provided. I assumed it refers to the upper left corner of the sub-matrix in data. However, it can refer to the midpoint as well when M is odd.
Concept
A different example with N=7 and M=3 exemplifies the idea, and is presented here for the kernel
kernel = np.array([[3,0,-1], [2,0,1], [4,4,3]])
which, when flipped, yields
kernel[::-1,::-1]
> array([[ 3,  4,  4],
         [ 1,  0,  2],
         [-1,  0,  3]])
EDIT 1:
Please note that the lecturer in this video does not explicitly mention that flipping the kernel is required before the pointwise multiplication to adhere to the mathematically proper definition of convolution.
EDIT 2:
For large M and a target index close to the boundary of data, a ValueError: operands could not be broadcast together with shapes ... might be thrown. Padding the matrix data with zeros prevents this (although it blows up the memory requirement), i.e.
data = np.pad(data, pad_width=M, mode='constant')
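If the value is needed at several positions, the same recipe can be wrapped in a small helper. This is only a sketch, assuming the index refers to the upper-left corner of the window (the name conv_at is made up here):
import numpy as np

def conv_at(data, kernel, i, j):
    # Convolution value with (i, j) as the upper-left corner of the window
    M = kernel.shape[0]
    window = data[i:i+M, j:j+M]
    if window.shape != kernel.shape:
        raise ValueError("window extends past the data boundary; pad data first")
    return np.sum(window * kernel[::-1, ::-1])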

Why is broadcasting done by aligning axes backwards

Numpy's broadcasting rules have bitten me once again and I'm starting to feel there may be a way of thinking about this topic that I'm missing.
I'm often in situations like the following: the first axis of my arrays is reserved for something fixed, like the number of samples. The second axis could represent different independent variables of each sample for some arrays, or it could be absent when it feels natural for there to be only one quantity attached to each sample in an array. For example, if the array is called price, I'd probably only use one axis, representing the price of each sample. On the other hand, a second axis is sometimes much more natural. For example, I could use a neural network to compute a quantity for each sample, and since neural networks can in general compute arbitrary multi-valued functions, the library I use would in general return a 2d array, making the second axis a singleton if I use it to compute a single dependent variable. I found that this approach of using 2d arrays is also more amenable to future extensions of my code.
Long story short, I need to make decisions in various places of my codebase whether to store arrays as (1000,) or (1000,1), and changes of requirements occasionally make it necessary to switch from one format to the other.
Usually, these arrays live alongside arrays with up to 4 axes, which further increases the pressure to sometimes introduce a singleton second axis, and then have the third axis represent a consistent semantic quality for all arrays that use it.
The problem now occurs when I add my (1000,) or (1000,1) arrays, expecting to get (1000,1), but get (1000,1000) because of implicit broadcasting.
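For example, a minimal demonstration of the pitfall:
import numpy as np

a = np.zeros(1000)         # shape (1000,)
b = np.zeros((1000, 1))    # shape (1000, 1)
(a + b).shape              # (1000, 1000): a is aligned as (1, 1000) and broadcast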
I feel like this prevents giving semantic meaning to axes. Of course I could always use at least two axes, but that leads to the question where to stop: To be fail safe, continuing this logic, I'd have to always use arrays of at least 6 axes to represent everything.
I'm aware this is maybe not the best technically well-defined question, but does anyone have a modus operandi that helps them avoid these kinds of bugs?
Does anyone know the motivations of the numpy developers to align axes in reverse order for broadcasting? Was computational efficiency or another technical reason behind this, or a model of thinking that I don't understand?
In MATLAB, broadcasting, a johnny-come-lately to this game, expands trailing dimensions. But there the trailing dimensions are the outermost ones, that is, order='F'. And since everything starts as 2d, this expansion only occurs when one array is 3d (or larger).
https://blogs.mathworks.com/loren/2016/10/24/matlab-arithmetic-expands-in-r2016b/
explains this, and gives a bit of history. My own history with the language is old enough that the ma_expanded = ma(ones(3,1),:) style of expansion is familiar. Octave added broadcasting before MATLAB.
To avoid ambiguity, broadcasting expansion can only occur in one direction. Expanding in the direction of the outermost dimension seems logical.
Compare (3,) expanded to (1,3) versus (3,1) - viewed as nested lists:
In [198]: np.array([1,2,3])
Out[198]: array([1, 2, 3])
In [199]: np.array([[1,2,3]])
Out[199]: array([[1, 2, 3]])
In [200]: (np.array([[1,2,3]]).T).tolist()
Out[200]: [[1], [2], [3]]
I don't know if there are significant implementation advantages. With the striding mechanism, adding a new dimension anywhere is easy. Just change the shape and strides, adding a 0 for the dimension that needs to be 'replicated'.
In [203]: np.broadcast_arrays(np.array([1,2,3]),np.array([[1],[2],[3]]),np.ones((3,3)))
Out[203]:
[array([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]]),
 array([[1, 1, 1],
        [2, 2, 2],
        [3, 3, 3]]),
 array([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])]
In [204]: [x.strides for x in _]
Out[204]: [(0, 8), (8, 0), (24, 8)]
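The same replication can be produced by hand with as_strided, by giving the broadcast axis a stride of 0 (a sketch; all the "rows" share memory, so treat the result as read-only):
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([1, 2, 3])
a3 = as_strided(a, shape=(3, 3), strides=(0, a.strides[0]))
# array([[1, 2, 3],
#        [1, 2, 3],
#        [1, 2, 3]])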

How to change each element in an array to the mean of the array using NumPy?

I am new to Python. In one of my assignment questions, part of the task requires us to compute the average of the elements in each sub-matrix and replace each element with that mean, using operators available in NumPy.
An example of the matrix could be
M = [[[1,2,3],[2,3,4]],[[3,4,5],[4,5,6]]]
Through some operations, it is expected to get a matrix like the following:
M = [[[2,2,2],[3,3,3]],[[4,4,4],[5,5,5]]]
I have looked at some numpy documentations and still haven't figured out, would really appreciate if someone can help.
You have a few different options here. All of them follow the same general idea. You have an MxNxL array and you want to apply a reduction operation along the last axis that will leave you with an MxN result by default. However, you want to broadcast that result across the same MxNxL shape you started with.
Numpy has a parameter in most reduction operations that allows you to keep the reduced dimension present in the output array, which will allow you to easily broadcast that result into the correct sized matrix. The parameter is called keepdims; you can read more in the documentation for numpy.mean.
Here are a few approaches that all take advantage of this.
Setup
import numpy as np

M = np.array([[[1, 2, 3], [2, 3, 4]],
              [[3, 4, 5], [4, 5, 6]]])

avg = M.mean(-1, keepdims=1)
# array([[[2.],
#         [3.]],
#
#        [[4.],
#         [5.]]])
Option 1
Assign the result back into the array in place. However, since M has an integer dtype here, this will coerce the float averages to int, so cast your array to float first if you want to keep precision.
M[:] = avg
Option 2
An efficient read-only view using np.broadcast_to.
np.broadcast_to(avg, M.shape)
Option 3
Broadcasted multiplication, more for demonstration than anything.
avg * np.ones(M.shape)
All will produce the same result (except possibly for the dtype):
array([[[2., 2., 2.],
        [3., 3., 3.]],

       [[4., 4., 4.],
        [5., 5., 5.]]])
In one line of code:
M.mean(-1, keepdims=1) * np.ones(M.shape)

Freezing specific values in a weight matrix in tensorflow

Assuming I have a weight matrix that looks like [[a, b], [c, d]], is it possible in TensorFlow to fix the values of b and c to zero such that they don't change during optimization?
Some sample code:
import tensorflow as tf

A = tf.Variable([[1., 0.], [3., 0.]])
A1 = A[:, 0:1]  # just some slicing of your variable
A2 = A[:, 1:2]
A2_stop = tf.stop_gradient(tf.identity(A2))
A = tf.concat((A1, A2_stop), axis=1)
Actually, tf.identity is needed to stop the gradient before A2.
There are three ways to do this; you can:
1. Break apart your weight matrix into multiple variables, and make only some of them trainable.
2. Hack the gradient calculation to be zero for the constant elements (sketched below).
3. Hack the gradient application to reset the values of the constant elements.
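A minimal sketch of option 2, assuming the TF 1.x graph API; the loss is a dummy here and the mask simply marks which entries are allowed to change:
import tensorflow as tf

A = tf.Variable([[1., 0.], [3., 0.]])
mask = tf.constant([[1., 0.], [0., 1.]])   # 1 = trainable entry, 0 = frozen
loss = tf.reduce_sum(tf.square(A))         # dummy loss, just for illustration

optimizer = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = optimizer.compute_gradients(loss, var_list=[A])
masked = [(g * mask, v) for g, v in grads_and_vars]
train_op = optimizer.apply_gradients(masked)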

Verify that points lie on a grid of specified pitch

While I am trying to solve this problem in a context where numpy is used heavily (and therefore an elegant numpy-based solution would be particularly welcome), the fundamental problem has nothing to do with numpy (or even Python) as such.
The task is to create an automated test for an algorithm which is supposed to produce points distributed on a grid whose pitch is specified as an input to the algorithm. The absolute positions of the points do not matter, but their relative positions do. For example, following
collection_of_points = algorithm(data, pitch=[1.3, 1.5, 2])
collection_of_points should contain only points whose x-coordinates differ by multiples of 1.3, whose y-coordinates differ by multiples of 1.5 and whose z-coordinates differ by multiples of 2.
The test should verify that this condition is satisfied.
One thing that I have tried, which doesn't seem too ugly, but doesn't work is
import itertools
import numpy as np

points = algo(data, pitch=requested_pitch)
for p1, p2 in itertools.combinations(points, 2):
    distance_between_points = np.array(p2) - np.array(p1)
    assert np.allclose(distance_between_points % requested_pitch, 0)
[ Aside for those unfamiliar with Python or numpy:
- itertools.combinations(points, 2) is a simple way of iterating through all pairs of points
- arithmetic operations on np.arrays are performed elementwise, so np.array([5,6,7]) % np.array([2,3,4]) evaluates to np.array([1, 0, 3]) via np.array([5%2, 6%3, 7%4])
- np.allclose checks whether all corresponding elements of the two input arrays are approximately equal, and numpy automatically treats the 0 passed in as the second argument as an all-zero array of the correct size
]
To see why the idea shown above fails, consider a desired pitch of 3 and two points which are separated by 8.9999999 in the relevant dimension. 8.9999999 % 3 is around 2.9999999, which is nowhere near the required 0.
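For example:
import numpy as np

d = 8.9999999            # essentially three grid steps of pitch 3
d % 3                    # ~2.9999999, nowhere near 0
np.allclose(d % 3, 0)    # False, so the naive assertion fails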
In all of this, I can't help feeling that I'm missing something obvious or that I'm re-inventing some wheel.
Can you suggest an elegant way of writing such a check?
Change your assertion to (where x is the separation vector and y is the requested pitch):
np.all(np.logical_or(np.isclose(x % y, 0), np.isclose((x % y) - y, 0)))
If you want to make it more readable, you should functionalize the statement. Something like:
def is_multiple(x, y, rtol=1e-05, atol=1e-08):
    """
    Test if x is a multiple of y.
    """
    remainder = x % y
    is_zero = np.isclose(remainder, 0., rtol, atol)
    is_y = np.isclose(remainder, y, rtol, atol)
    return np.logical_or(is_zero, is_y)
And then:
assert np.all(is_multiple(distance_between_points, requested_pitch))