How to do 2D Convolution only at a specific location? - numpy

This question has been asked multiple times but still I could not get what I was looking for. Imagine
data=np.random.rand(N,N) #shape N x N
kernel=np.random.rand(3,3) #shape M x M
I know convolution typically means placing the kernel all over the data. But in my case N and M are of the order of 10000. So I wish to get the value of the convolution at a specific location in the data, say at (10,37), without doing unnecessary calculations at all the other locations. The output will then be just a number. The main goal is to reduce the computation and memory costs. Is there any built-in function that does this with minimal adjustments?

Indeed, the convolution at a particular position is simply the sum over the entries of the pointwise product of the corresponding submatrix of data and the flipped kernel. Here is a reproducible example.
Code
import numpy as np
N = 1000
M = 3
np.random.seed(777)
data = np.random.rand(N,N) #shape N x N
kernel= np.random.rand(M,M) #shape M x M
# Pointwise convolution = pointwise product
data[10:10+M,37:37+M]*kernel[::-1, ::-1]
>array([[0.70980514, 0.37426475, 0.02392947],
[0.24387766, 0.1985901 , 0.01103323],
[0.06321042, 0.57352696, 0.25606805]])
with output
conv = np.sum(data[10:10+M,37:37+M]*kernel[::-1, ::-1])
conv
>2.45430578
The kernel is flipped by the definition of convolution, as explained here and as was kindly pointed out by Warren Weckesser. Thanks!
The key is to make sense of the index you provided. I assumed it refers to the upper-left corner of the sub-matrix in data. However, it can refer to the midpoint as well when M is odd.
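The idea above can be wrapped in a small helper. This is only a sketch: the names conv_at and conv_at_reference are made up for illustration, and (i, j) is assumed to be the upper-left corner of the window, as in the example. The second function evaluates the textbook double sum so the two can be compared.

```python
import numpy as np

def conv_at(data, kernel, i, j):
    """Convolution value for the window whose upper-left corner is (i, j)."""
    m_rows, m_cols = kernel.shape
    window = data[i:i + m_rows, j:j + m_cols]
    if window.shape != kernel.shape:
        raise ValueError("window runs past the edge of data")
    # Flip the kernel along both axes, multiply pointwise, and sum.
    return float(np.sum(window * kernel[::-1, ::-1]))

def conv_at_reference(data, kernel, i, j):
    """Same value from the explicit double sum, for checking only."""
    m_rows, m_cols = kernel.shape
    total = 0.0
    for a in range(m_rows):
        for b in range(m_cols):
            total += kernel[a, b] * data[i + m_rows - 1 - a, j + m_cols - 1 - b]
    return total

rng = np.random.default_rng(0)
data = rng.random((50, 50))
kernel = rng.random((3, 3))
value = conv_at(data, kernel, 10, 37)
reference = conv_at_reference(data, kernel, 10, 37)
```

Only M x M entries of data are touched, so the cost does not depend on N. If the index is meant as the midpoint of an odd-sized kernel instead, the same helper can be called with (i - M//2, j - M//2).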
Concept
A different example with N=7 and M=3 illustrates the idea
and is presented here for the kernel
kernel = np.array([[3,0,-1], [2,0,1], [4,4,3]])
which, when flipped, yields
kernel[::-1, ::-1]
> array([[ 3, 4, 4],
[ 1, 0, 2],
[-1, 0, 3]])
EDIT 1:
Please note that the lecturer in this video does not explicitly mention that flipping the kernel is required before the pointwise multiplication to adhere to the mathematically proper definition of convolution.
EDIT 2:
For large M and a target index close to the boundary of data, a ValueError: operands could not be broadcast together with shapes ... might be thrown. Padding the matrix data with zeros prevents this (although it increases the memory requirement). Note that padding shifts every index into data by the pad width. I.e.
data = np.pad(data, pad_width=M, mode='constant')
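A minimal sketch of the padded variant follows. It makes two assumptions that are not in the answer: the index (i, j) is taken as the midpoint of the kernel, and M is odd, in which case a pad of M//2 is enough. The name conv_at_center is made up for illustration.

```python
import numpy as np

def conv_at_center(data, kernel, i, j):
    """Convolution value with the kernel centred on data[i, j], zero-padded.

    Assumes a square kernel with odd side length M.
    """
    m = kernel.shape[0]
    c = m // 2
    padded = np.pad(data, pad_width=c, mode="constant")
    # After padding, data[i, j] lives at padded[i + c, j + c], so the window
    # centred there starts at padded[i, j].
    window = padded[i:i + m, j:j + m]
    return float(np.sum(window * kernel[::-1, ::-1]))

data = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[3, 0, -1], [2, 0, 1], [4, 4, 3]], dtype=float)
corner = conv_at_center(data, kernel, 0, 0)      # window hangs over the edge
interior = conv_at_center(data, kernel, 2, 2)    # window fully inside data
```

Padding the whole array copies all of data just to read one window, so for very large N it may be preferable to pad only the slice around the target index.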

Related

How to convert this numpy one-liner into Tensorflow backend code?

I have multiple depthmaps which show a car from different angles. I need to calculate how well they match together in my loss function, so I have to reproject them into a different view. The depthmaps live in a cube that is relative to the length of the vehicle. The images have the shape (256,256). I already wrote the code to convert them to a pointcloud with backend functions (256*256,3). I can reproject this pointcloud to the side view with numpy like this:
reProj = np.zeros((256, 256), np.float32)
reProj[pointCloud[:, 1], pointCloud[:, 2]] = pointCloud[:, 0]
How can I convert this into keras backend code? I suspect there should be a gather somewhere in there, but I just cannot get it working.
Example: [image: source depth image] [image: reprojected view]
Thanks for your help!
Edit: Minimal working example with data: https://easyupload.io/rwutwa
You can do this with tf.matmul(). The first input will be your point cloud; from the dimensions, I am assuming you store a 3D vector (x, y, z) for every pixel. The second input will be the 3D rotation matrix corresponding to the projection you need. Keep in mind that this works for any angle you want; you just need to define the 3x3 matrix.
If I understand your data correctly, you need to rotate by 90 degrees about the x-axis, so the matrix would be
1 0 0
0 0 -1
0 1 0
Read more on rotation matrices here: https://en.wikipedia.org/wiki/Rotation_matrix
Just go to the section on three dimensions and see what you need.
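As a rough illustration of the matrix product, here is the same rotation in numpy. The three points are made up, and the cloud is assumed to hold one (x, y, z) row per point. In TensorFlow the corresponding call would be tf.matmul(points, R_x90, transpose_b=True).

```python
import numpy as np

# Rotation by 90 degrees about the x-axis (the matrix given above).
R_x90 = np.array([[1, 0, 0],
                  [0, 0, -1],
                  [0, 1, 0]], dtype=np.float32)

# Hypothetical point cloud: one (x, y, z) row per point.
points = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=np.float32)

# For row vectors, rotating every point is a single matrix product with R^T.
rotated = points @ R_x90.T
```

The x-axis stays where it is, the y-axis is carried onto z, and the z-axis onto -y.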
So I finally figured it out; I was just thinking about it wrong. It is not a gather operation, it is a scatter. This works perfectly now!
indices = K.stack([p[:, 1], p[:, 2]], -1)
indices = K.reshape(indices, (256, 256, 2))
indices = K.clip(indices, 0, 256 - 1)
updates = K.reshape(p[:,0], (256,256))
reProj = tf.tensor_scatter_nd_max(tf.zeros((256, 256), tf.int32), indices, updates)
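To see what the max-reducing scatter adds over the original numpy one-liner, here is a small numpy sketch. The 3x3 image and the three points are made up; np.maximum.at is used as the numpy counterpart of a scatter with a max reduction.

```python
import numpy as np

# Hypothetical tiny point cloud: columns are (depth, row, col).
point_cloud = np.array([[9, 0, 1],
                        [5, 0, 1],   # same pixel as the row above
                        [3, 2, 2]], dtype=np.int32)

depth = point_cloud[:, 0]
rows = point_cloud[:, 1]
cols = point_cloud[:, 2]

# Plain fancy-index assignment: when a pixel is hit more than once,
# a single value survives (in practice the last one written).
assigned = np.zeros((3, 3), np.int32)
assigned[rows, cols] = depth

# Scatter with a max reduction: the largest depth wins at each pixel.
scattered = np.zeros((3, 3), np.int32)
np.maximum.at(scattered, (rows, cols), depth)
```

Whether the maximum or the minimum is the right reduction depends on how depth is encoded in the maps; the answer above uses the maximum.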

How does the gradient of the sum trick work to get maxpooling positions in keras?

The keras examples directory contains a lightweight version of a stacked what-where autoencoder (SWWAE) which they train on MNIST data. (https://github.com/fchollet/keras/blob/master/examples/mnist_swwae.py)
In the original SWWAE paper, the authors compute the what and where using soft functions. However, in the keras implementation, they use a trick to get these locations. I would like to understand this trick.
Here is the code of the trick.
def getwhere(x):
    ''' Calculate the 'where' mask that contains switches indicating which
    index contained the max value when MaxPool2D was applied. Using the
    gradient of the sum is a nice trick to keep everything high level.'''
    y_prepool, y_postpool = x
    return K.gradients(K.sum(y_postpool), y_prepool)  # How exactly does this line work?
Here y_prepool is an MxN matrix and y_postpool is an M/2 x N/2 matrix (let's assume canonical pooling with a size of 2 pixels).
I have verified that the output of getwhere() is a bed of nails matrix where the nails indicate the position of the max (the local argmax if you will).
Can someone construct a small example demonstrating how getwhere works using this "trick"?
Let's focus on the simplest example, without really talking about convolutions. Say we have a vector
x = [1 4 2]
which we max-pool over (with a single, big window), we get
mx = 4
mathematically speaking, it is:
mx = x[argmax(x)]
now, the "trick" to recover the one-hot mask used by pooling is to do
magic = d mx / dx
there is no gradient for argmax itself; however, it "passes" the corresponding gradient to the element of the vector at the location of the maximum, so:
d mx / dx = [d mx/dx[1], d mx/dx[2], d mx/dx[3]] = [0, dx[2]/dx[2], 0] = [0 1 0]
as you can see, the gradients for all non-maximum elements are zero (due to argmax), and "1" appears at the maximum value because dx/dx = 1.
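The same result can be checked numerically without any deep learning framework. The sketch below estimates d mx / dx with a forward difference; numerical_gradient is a made-up helper, not part of keras.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Forward-difference estimate of df/dx for a scalar function f."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        bumped = x.astype(float)   # astype returns a fresh copy
        bumped[i] += eps
        grad[i] = (f(bumped) - f(x)) / eps
    return grad

x = np.array([1.0, 4.0, 2.0])
magic = numerical_gradient(np.max, x)
mask = np.round(magic).astype(int)
```

Nudging a non-maximum element does not change the maximum, so its entry is zero; nudging the maximum changes the output by the same amount, which gives the single 1.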
Now for "proper" maxpool you have many pooling regions, connected to many input locations, thus taking analogous gradient of sum of pooled values, will recover all the indices.
Note, however, that this trick will not work if you have heavily overlapping kernels: you might end up with values bigger than "1". Basically, if a pixel is max-pooled by K kernels, then it will have value K, not 1, for example:
[1 ,2, 3]
x = [13,3, 1]
[4, 2, 9]
if we max pool with a 2x2 window (stride 1, so the windows overlap) we get
mx = [13,3]
[13,9]
and the gradient trick gives you
[0, 0, 1]
magic = [2, 0, 0]
[0, 0, 1]
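The overlapping case can be reproduced with a short numpy emulation of what the gradient of the summed pooling output computes. This is a sketch, not the keras implementation: maxpool_and_where is a made-up name, and ties are assumed to go to the first maximum in each window, which is what np.argmax does.

```python
import numpy as np

def maxpool_and_where(x, size=2, stride=1):
    """Max-pool x and count, per input pixel, how many windows selected it.

    The count array plays the role of the gradient of sum(pooled) with
    respect to x, assuming ties go to the first maximum in the window.
    """
    out_rows = (x.shape[0] - size) // stride + 1
    out_cols = (x.shape[1] - size) // stride + 1
    pooled = np.zeros((out_rows, out_cols), dtype=x.dtype)
    where = np.zeros(x.shape, dtype=int)
    for r in range(out_rows):
        for c in range(out_cols):
            r0, c0 = r * stride, c * stride
            window = x[r0:r0 + size, c0:c0 + size]
            # Position of the (first) maximum inside this window.
            dr, dc = np.unravel_index(np.argmax(window), window.shape)
            pooled[r, c] = window[dr, dc]
            # Each window contributes a gradient of 1 at its argmax.
            where[r0 + dr, c0 + dc] += 1
    return pooled, where

x = np.array([[1, 2, 3],
              [13, 3, 1],
              [4, 2, 9]])
pooled, where = maxpool_and_where(x, size=2, stride=1)
```

The entry 13 is the maximum of two overlapping windows, so it collects a count of 2, while non-overlapping pooling (stride equal to the window size) can only ever produce 0 or 1.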

C-Support Vector Classification Comprehension

I have a question regarding a code snippet which I found in a book.
The author creates two categories of sample points. Next, the author fits a model and plots the SVC model onto the "blobs".
This is the code snippet:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# create 50 separable points
X, y = make_blobs(n_samples=50, centers=2,
                  random_state=0, cluster_std=0.60)
# fit the support vector classifier model
clf = SVC(kernel='linear')
clf.fit(X, y)
# plot the data
fig, ax = plt.subplots(figsize=(8, 6))
point_style = dict(cmap='Paired', s=50)
ax.scatter(X[:, 0], X[:, 1], c=y, **point_style)
# format plot (format_plot is a helper that is not shown in this snippet)
format_plot(ax, 'Input Data')
ax.axis([-1, 4, -2, 7])
# Get contours describing the model
xx = np.linspace(-1, 4, 10)
yy = np.linspace(-2, 7, 10)
xy1, xy2 = np.meshgrid(xx, yy)
Z = np.array([clf.decision_function([t])
              for t in zip(xy1.flat, xy2.flat)]).reshape(xy1.shape)
line_style = dict(levels = [-1.0, 0.0, 1.0],
                  linestyles = ['dashed', 'solid', 'dashed'],
                  colors = 'gray', linewidths=1)
ax.contour(xy1, xy2, Z, **line_style)
The result is the following:
My question is now: why do we create "xx" and "yy" as well as "xy1" and "xy2"? What we actually want is to show the SVC "function" for the X and y data, and if we pass xy1 and xy2 as well as Z (which is also created from xy1 and xy2) to ax.contour to plot the contours, there is no connection to the data with which the SVC model was learned, is there?
Can anybody explain this to me please or give a recommendation how to solve this more easily?
Thanks for your answers
I'll start with short broad answers. ax.contour() is just one way to plot the separating hyperplane and its "parallel" planes. You can certainly plot it by calculating the plane, like this example.
To answer your last question, in my opinion it's already a relatively simple (in math and logic) and easy (in coding) way to plot your model. And it is especially useful when your separating hyperplane is not mathematically easy to describe (such as polynomial and RBF kernel for non-linear separation), like this example.
To address your second question and comments, and to answer your first question: yes, you're right, xx, yy, xy1, xy2 and Z all have a very limited connection to your (simulated blobs of) data. They are used for drawing the hyperplanes that describe your model.
That should answer your questions. But please allow me to give some more details here in case others are not as familiar with the topic as you are. The only connection between your data and xx, yy, xy1, xy2, Z is:
xx, yy, xy1 and xy2 sample an area surrounding the simulated data. Specifically, the simulated data is centered around 2; xx spans the range (-1, 4) and yy spans the range (-2, 7). One can check the "meshgrid" with ax.scatter(xy1, xy2).
Z is computed for every sample point in the "meshgrid". It is the value of the decision function at that point, whose magnitude grows with the distance from the point to the separating hyperplane. Z provides the levels on the contour plot.
ax.contour then uses the "meshgrid" and Z to plot contour lines. Here are some key points:
xy1 and xy2 are both 2-D arrays specifying the (x, y) coordinates of the surface. They list the sample points in the area row by row.
Z is a 2-D array with the same shape as xy1 and xy2. It defines the level at each point so that the program can "understand" the shape of the 3-dimensional surface.
levels = [-1.0, 0.0, 1.0] indicates that there are 3 curves (lines in this case) to draw at the corresponding levels. In relation to SVC, level 0 is the separating hyperplane; levels -1 and 1 are very close (differ by a ζi) to the maximum margin separating hyperplane.
linestyles = ['dashed', 'solid', 'dashed'] indicates that the separating hyperplane is drawn as a solid line and the two planes on either side are drawn as dashed lines.
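The role of the grid can be sketched without a fitted model. In the snippet below the weights w and the offset b are made-up values standing in for a trained linear classifier; the point is only how xx, yy, xy1, xy2 and Z relate in shape.

```python
import numpy as np

# Hypothetical linear model: decision value f(x) = w . x + b.
w = np.array([1.0, -1.0])
b = 0.5

xx = np.linspace(-1, 4, 10)
yy = np.linspace(-2, 7, 10)
xy1, xy2 = np.meshgrid(xx, yy)   # both have shape (10, 10)

# Stack the grid into a (100, 2) list of points, evaluate all of them at
# once, then fold the values back into the grid shape.
grid_points = np.c_[xy1.ravel(), xy2.ravel()]
Z = (grid_points @ w + b).reshape(xy1.shape)
```

With a fitted scikit-learn classifier, the list comprehension in the question could likewise be replaced by one call, clf.decision_function(grid_points).reshape(xy1.shape), since decision_function accepts an array of shape (n_samples, n_features).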
Edit (in response to the comment):
Mathematically, the decision function should be a sign function which tells us whether a point belongs to class 0 or 1, as you said. However, when you check the values in Z, you will find that they are continuous. decision_function(X) works in such a way that the sign of the value indicates the classification, while the absolute value is the "Distance of the samples X to the separating hyperplane", which reflects (kind of) the confidence/significance of the predicted classification. This is critical for plotting the model. If Z were categorical, you would get contour lines that enclose an area like a mesh rather than a single contour line. It would look like the colormesh in the example; but you won't see that with ax.contour() since it is not the correct behavior for a contour plot.
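The difference between the continuous value and the class label can be made concrete for a linear model. The weights, offset and sample points below are made up; for a linear kernel the decision value is w . x + b, and dividing its absolute value by the norm of w gives the geometric distance to the hyperplane.

```python
import numpy as np

# Hypothetical linear decision function f(x) = w . x + b.
w = np.array([3.0, 4.0])
b = -5.0

samples = np.array([[3.0, 4.0],    # on the positive side
                    [1.0, 0.5],    # exactly on the hyperplane
                    [0.0, 0.0]])   # on the negative side

values = samples @ w + b                         # continuous decision values
labels = np.sign(values)                         # only the side of the hyperplane
distances = np.abs(values) / np.linalg.norm(w)   # geometric distance
```

A contour plot needs the continuous values: the level-0 line is where they change sign, and the levels -1 and 1 only exist because the values vary smoothly across the grid.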

what is the best way to multiply tensors in tensorflow

Suppose that I have tensors x[i,j,k] and y[p,q] in a graph. What is the correct way to specify the tensor z[i,j,k,p,q] = x[i,j,k]y[p,q]? This is the coordinate representation of the tensor product of x and y. I can get the job done using a combination of tf.expand_dims, tf.mult and tf.tile, but I feel like there should be a better way...
I think you can get away without the tile operation using broadcasting.
x_reshaped = tf.reshape(x, (i, j, k, 1, 1))
y_reshaped = tf.reshape(y, (1, 1, 1, p, q))
z = x_reshaped * y_reshaped
When a dimension has size 1 and does not match the size of the corresponding dimension of the other tensor, it is copied / broadcast automatically along that dimension and the product is carried out. Tile is often unnecessary. I actually don't think I have ever even used tile in tensorflow. Here I also used reshape rather than expand_dims, but the result is the same either way.
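The broadcasting rule can be tried out in numpy, whose broadcasting semantics TensorFlow follows. The shapes below are arbitrary. The sketch also compares the result with two other ways of writing the same tensor product; TensorFlow offers tf.einsum with the same subscript notation and tf.tensordot(x, y, axes=0).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((2, 3, 4))   # x[i, j, k]
y = rng.random((5, 6))      # y[p, q]

# Append two size-1 axes to x; y is broadcast against the trailing axes.
z = x[:, :, :, None, None] * y

# Two equivalent formulations of the same tensor product.
z_einsum = np.einsum("ijk,pq->ijkpq", x, y)
z_outer = np.multiply.outer(x, y)
```

All three give an array of shape (2, 3, 4, 5, 6) with z[i, j, k, p, q] = x[i, j, k] * y[p, q], and none of them needs an explicit tile.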

Tensorflow extract_glimpse offset

I am trying to use the extract_glimpse function of tensorflow but I encounter some difficulties with the offset parameter.
Let's assume that I have a batch of one single channel 5x5 matrix called M and that I want to extract a 3x3 matrix of it.
When I call extract_glimpse([M], [3,3], [[1,1]], centered=False, normalized=False), it returns the result I am expecting: the 3x3 matrix centered at the position (1,1) in M.
But when I call extract_glimpse([M], [3,3], [[2,1]], centered=False, normalized=False), it doesn't return the 3x3 matrix centered at the position (2,1) in M but it returns the same as in the first call.
What is the point that I don't get?
The pixel coordinates actually have a range of 2 times the size (not documented - so it is a bug indeed). This is at least true in the case of centered=True and normalized=False. With those settings, the offsets range from minus the size to plus the size of the tensor. I therefore wrote a wrapper that is more intuitive to numpy users, using pixel coordinates starting at (0,0). This wrapper and more details about the problem are available on the tensorflow GitHub page.
For your particular case, I would try something like:
offsets1 = [-5 + 3,
-5 + 3]
extract_glimpse([M], [3,3], [offsets1], centered=True, normalized=False)
offsets2 = [-5 + 3 + 2,
-5 + 3]
extract_glimpse([M], [3,3], [offsets2], centered=True, normalized=False)
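For comparison, the result the question expects can be written with plain numpy slicing, using pixel coordinates that start at (0, 0). This sketch does not call extract_glimpse and says nothing about how TensorFlow interprets its offsets; glimpse_centered is a made-up helper and the 5x5 matrix is example data.

```python
import numpy as np

def glimpse_centered(image, size, center):
    """Return the size x size patch of image centred on pixel `center`.

    Pixel coordinates start at (0, 0); the patch must lie fully inside
    image. Assumes an odd patch size.
    """
    half = size // 2
    r, c = center
    inside_rows = r - half >= 0 and r + half < image.shape[0]
    inside_cols = c - half >= 0 and c + half < image.shape[1]
    if not (inside_rows and inside_cols):
        raise ValueError("patch would extend past the image border")
    return image[r - half:r + half + 1, c - half:c + half + 1]

M = np.arange(25).reshape(5, 5)
patch_11 = glimpse_centered(M, 3, (1, 1))
patch_21 = glimpse_centered(M, 3, (2, 1))
```

Checking the output of extract_glimpse against patches like these is one way to confirm that a given choice of offsets selects the intended region.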