Freezing specific values in a weight matrix in TensorFlow

Assuming I have a weight matrix that looks like [[a, b], [c, d]], is it possible in TensorFlow to fix the values of b and c to zero so that they don't change during optimization?

Some sample code:
A = tf.Variable([[1., 0.], [3., 0.]])
A1 = A[:, 0:1]  # just some slicing of your variable: the trainable column
A2 = A[:, 1:2]  # the column whose values should stay frozen
A2_stop = tf.stop_gradient(tf.identity(A2))
A = tf.concat((A1, A2_stop), axis=1)  # reassemble the full matrix for later use
Actually, tf.identity is needed so that the gradient is stopped before A2.

There are three ways to do this; you can:
Break apart your weight matrix into multiple variables, and make only some of them trainable.
Hack the gradient calculation to be zero for the constant elements (a sketch of this follows below).
Hack the gradient application to reset the values of the constant elements.
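As a rough illustration of the second approach, here is a minimal sketch assuming TF1-style graph mode and a made-up loss: the computed gradient is multiplied by a constant 0/1 mask before it is applied, so the frozen entries never move.
import tensorflow as tf

A = tf.Variable([[1., 0.], [3., 0.]])
mask = tf.constant([[1., 0.], [1., 0.]])   # 1 = trainable entry, 0 = frozen entry

loss = tf.reduce_sum(tf.square(A))         # hypothetical loss, just for illustration
opt = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = opt.compute_gradients(loss, var_list=[A])
masked = [(g * mask, v) for g, v in grads_and_vars]   # zero the gradients of frozen entries
train_op = opt.apply_gradients(masked)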

Related

How to change each element in an array to the mean of the array using NumPy?

I am new to Python. In one of my assignment questions, part of the question requires us to compute the average of each element in a sub-matrix and replace each element with that mean, using operators that are available in NumPy.
An example of the matrix could be
M = [[[1,2,3],[2,3,4]],[[3,4,5],[4,5,6]]]
Through some operations, it is expected to get a matrix like the following:
M = [[[2,2,2],[3,3,3]],[[4,4,4],[5,5,5]]]
I have looked at some of the NumPy documentation and still haven't figured it out; I would really appreciate it if someone could help.
You have a few different options here. All of them follow the same general idea. You have an MxNxL array and you want to apply a reduction operation along the last axis that will leave you with an MxN result by default. However, you want to broadcast that result across the same MxNxL shape you started with.
Numpy has a parameter in most reduction operations that allows you to keep the reduced dimension present in the output array, which will allow you to easily broadcast that result into the correctly sized matrix. The parameter is called keepdims; you can read more in the documentation for numpy.mean.
Here are a few approaches that all take advantage of this.
Setup
import numpy as np

M = np.array([[[1, 2, 3], [2, 3, 4]],
              [[3, 4, 5], [4, 5, 6]]])

avg = M.mean(-1, keepdims=1)
# array([[[2.],
#         [3.]],
#
#        [[4.],
#         [5.]]])
Option 1
Assign to a view of the array. However, if M has an integer dtype, this will also coerce the float averages to int, so cast your array to float first if you want that precision.
M[:] = avg
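For instance, a tiny sketch of that cast (the means in this particular example happen to be whole numbers, but in general the cast avoids truncation):
M = M.astype(float)  # work on a float copy so the means are not truncated to int
M[:] = avg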
Option 2
An efficient read-only view using np.broadcast_to:
np.broadcast_to(avg, M.shape)
Option 3
Broadcasted multiplication, more for demonstration than anything.
avg * np.ones(M.shape)
All will produce (same except for possibly the dtype):
array([[[2., 2., 2.],
        [3., 3., 3.]],

       [[4., 4., 4.],
        [5., 5., 5.]]])
In one line of code:
M.mean(-1, keepdims=1) * np.ones(M.shape)

Compact and natural way to write matrix product of vectors in Numpy

In scientific computing I often want to do vector multiplications like
a x b^T
with a and b being column vectors and b^T the transpose of b. So if a and b have shapes [n, 1] and [m, 1], the resulting matrix has shape [n, m].
Is there a good and straightforward way to write this multiplication in numpy?
Example:
a = np.array([1,2,3])
b = np.array([4,5,6,7])
Adding axes manually works:
a[:,np.newaxis] @ b[np.newaxis,:]
and gives the correct result:
[[ 4 5 6 7]
[ 8 10 12 14]
[12 15 18 21]]
Einstein notation would be another way, but still somewhat weird.
np.einsum('a,b->ab', a,b)
What I was hoping to work, but doesn't work, is the following:
a @ b.T
Any other approaches to do this?
In MATLAB, matrix multiplication is the norm, using *. Element-wise multiplication uses the .* operator. Also, matrices are at least 2d.
In numpy, element-wise multiplication uses *. Matrix multiplication is done with np.dot (or its method), and more recently with the @ operator (np.matmul). numpy adds broadcasting, which gives element-wise multiplication a lot more expressiveness.
With your two example arrays, of shape (3,) and (4,), the options for making a (3,4) outer product https://en.wikipedia.org/wiki/Outer_product include:
np.outer(a,b)
np.einsum('i,j->ij', a, b) # matching einstein index notation
a[:,None] * b # the most idiomatic numpy expression
This last one works because of broadcasting. a[:, None], like a.reshape(-1,1), turns the (3,) array into a (3,1). b[None, :] turns a (4,) into a (1,4). But broadcasting can perform this upgrade automatically (and unambiguously).
(3,1) * (4,) => (3,1) * (1,4) => (3,4)
Broadcasting does not work with np.dot. So we need
a[:, None].dot(b[None, :]) # (3,1) dot with (1,4)
The key with dot is that the last dim of a pairs with the 2nd to last of b. (np.dot also works with 2 matching 1d arrays, performing the conventional vector dot product).
@ (matmul) introduces an operator that works like dot, at least in the 2d with 2d case. With higher-dimensional arrays they work differently.
a[:,None].dot(b[None,:])
np.dot(a[:,None], b[None,:])
a[:,None] @ b[None,:]
a[:,None] @ b[:,None].T
and the reshape equivalents all create the desired (3,4) array.
np.tensordot can handle other dimensions combinations, but it works by reshaping and transposing the inputs, so in the end it can pass them to dot. It then transforms the result back into desired shape.
Quick time tests show that the np.dot versions tend to be fastest, because they delegate the action to fast BLAS-like libraries. For the other versions the delegation is a bit more indirect, or they use numpy's own compiled code.
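As a rough way to check this on your own machine, here is a hedged timing sketch using timeit (the actual numbers depend on array sizes and the BLAS build):
import timeit
import numpy as np

a = np.arange(300.)
b = np.arange(400.)

# Compare a few equivalent ways of forming the (300, 400) outer product.
for label, stmt in [('np.outer', 'np.outer(a, b)'),
                    ('broadcast *', 'a[:, None] * b'),
                    ('einsum', "np.einsum('i,j->ij', a, b)"),
                    ('dot', 'a[:, None].dot(b[None, :])')]:
    t = timeit.timeit(stmt, number=1000, globals={'np': np, 'a': a, 'b': b})
    print(label, t)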
In the comments, multiple solutions were proposed, which I summarize here:
np.outer(a, b), which basically reformulates this multiplication as a set problem (thanks to Brenlla)
a[:,np.newaxis]*b (thanks to Divakar)
a.reshape((-1,1)) @ b.reshape((-1,1)).T, or just as well a.reshape((-1,1)) @ b.reshape((1,-1)). It is a bit long, but shows that these numpy matrix operations actually need matrices as inputs, not only vectors (thanks to Warren Weckesser and heltonbiker)
For completeness, my previous already working examples:
a[:,np.newaxis] @ b[np.newaxis,:]
np.einsum('a,b->ab', a, b)
Remark: To reduce the number of characters even more, one can use None instead of np.newaxis.

How to construct a tensor weights whose certain elements are zero?

I want to construct a weight matrix in which certain elements are zero and never change, while the other elements are trainable variables. For example:
[[0,0,a,0],[0,0,b,0],[0,0,0,c],[0,0,0,d]]
This is a tf variable, and all the zeros stay unchanged; only a, b, c, d are tuned using gradient descent.
Does anyone know how to define such a matrix?
You should look into SparseTensor. It is highly optimised for operations where the tensor consists of many zeros.
So, in your case, to initialise a SparseTensor:
a, b, c, d = 10, 20, 30, 40
sparse = tf.SparseTensor([[0,2], [1,2], [2,3], [3,3]], [a,b,c,d], [4,4])  # indices, values, dense_shape
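As a hedged sketch of how this might tie into training (assuming TF1-style variables; tf.sparse.to_dense may be tf.sparse_tensor_to_dense in older releases), one can keep the non-zero entries in a trainable variable and scatter them into the fixed sparsity pattern, so the zeros can never change:
import tensorflow as tf

# Trainable values a, b, c, d; the index pattern itself stays constant.
values = tf.get_variable('values', initializer=tf.constant([10., 20., 30., 40.]))
indices = [[0, 2], [1, 2], [2, 3], [3, 3]]          # fixed positions of a, b, c, d
sparse = tf.SparseTensor(indices, values, dense_shape=[4, 4])
dense = tf.sparse.to_dense(sparse)                  # [[0,0,a,0],[0,0,b,0],[0,0,0,c],[0,0,0,d]]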

numpy dot product with inner dimension zero gives unexpected result

My calculation consists of putting many matrices in one big block matrix. Some of these matrices can be empty in certain cases. These empty matrices give unexpected results.
The problem comes down to this:
b
Out[117]: array([], dtype=int32)
X = A[b,:]
Out[118]: array([], shape=(0, 3), dtype=float64)
X is the empty matrix. The matrix it gets multiplied by is also empty due to the code.
Y = array([]).dot(X)
Out[119]: array([ 0., 0., 0.])
I realise that the size of Y is correct according to algebra: (1x0).(0x3) = (1x3). But I was expecting an empty matrix as the result, since the inner dimension of the matrices is zero (not one).
I would rather not check whether these matrices are empty, because the code that puts the block matrix together would have to be rewritten for every combination of the possibly empty matrices.
Is there a solution to this problem? I was thinking of wrapping the dot function and only proceeding if the inner dimension is not zero, but I feel like there is a cleaner solution.
Edit:
I should clarify a bit more what I mean by saying that I would rather not check for zero dimensions. The equations that I put into the block matrix consist of hundreds of these dot products. Each dot product represents a component in an electrical network; X being empty means that no such component is present in the network. If I had to compose the final (block) matrix depending on which elements are present, that would mean thousands of lines of code, because the [ 0., 0., 0.] result adds an incorrect equation, which I would rather avoid.
The bad news is that the shape of the result is both expected and correct.
The good news is that there is a nearly trivial check to see if a matrix is empty or not for all cases using the total number of elements in the result, provided by the size attribute:
b = ...
X = ...
Y = array([]).dot(X)
if Y.size:
    ...  # You have a non-empty result
EDIT
You can use the same logic to filter your input vectors. Since you want to do calculations only for non-empty vectors, you may want to try something like:
if b.size and X.size:
    Y = b.dot(X)
    # Add Y to your block matrix, knowing that it is of the expected size
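If you do prefer the wrapper idea mentioned in the question, here is a minimal sketch (the helper name dot_if_nonempty is made up for illustration):
import numpy as np

def dot_if_nonempty(u, v):
    """Return u.dot(v) only when both operands have elements; otherwise None."""
    if u.size and v.size:
        return u.dot(v)
    return None  # caller skips adding this block to the big matrix

# Usage: only append blocks that actually exist.
block = dot_if_nonempty(np.array([]), np.array([]).reshape(0, 3))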

How to fetch gradients with respect to certain occurrences of variables in tensorflow?

Since tensorflow supports variable reuse, some parts of the compute graph may occur multiple times in both the forward and backward passes. So my question is: is it possible to update variables with respect to particular occurrences of them in the compute graph?
For example, in X_A->Y_B->Y_A->Y_B, Y_B occurs twice; how can each occurrence be updated separately? I mean, first we treat the latter occurrence as a constant and update the former one, then do the opposite.
A simpler example: say X_A, Y_B, Y_A are all scalar variables, and let Z = X_A * Y_B * Y_A * Y_B. Here the gradient of Z with respect to each individual occurrence of Y_B is X_A * Y_B * Y_A, but the gradient of Z with respect to Y_B as a whole is 2 * X_A * Y_B * Y_A. In this example computing the gradients separately may seem unnecessary, but those computations are not always commutative.
In the first example, gradients to the latter occurrence may be computed by calling tf.stop_gradient on X_A->Y_B. But I could not think of a way to fetch the previous one. Is there a way to do it in tensorflow's python API?
Edit:
@Seven provided an example of how to deal with this when reusing a single variable. However, often it is a variable scope that is reused, which contains many variables and the functions that manage them. As far as I know, there is no way to reuse a variable scope while applying tf.stop_gradient to all the variables it contains.
As I understand it, when you use A = tf.stop_gradient(A), A will be treated as a constant. I have an example here; maybe it can help you.
import tensorflow as tf

wa = tf.get_variable('a', shape=(), dtype=tf.float32,
                     initializer=tf.constant_initializer(1.5))
b = tf.get_variable('b', shape=(), dtype=tf.float32,
                    initializer=tf.constant_initializer(7))
x = tf.placeholder(tf.float32, shape=())

# The first wa*x is wrapped in tf.stop_gradient, so only the second factor
# contributes to the gradient with respect to x.
l = tf.stop_gradient(wa*x) * (wa*x+b)
op_gradient = tf.gradients(l, x)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run([op_gradient], feed_dict={x: 11}))
I have a workaround for this question. Define a custom getter for the variable scope in question, which wraps the default getter with tf.stop_gradient. This makes every variable returned in that scope a tensor that contributes no gradients, though sometimes things get complicated because it returns a Tensor instead of a Variable, such as when using tf.nn.batch_norm.
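A minimal sketch of that workaround (assuming TF1-style variable scopes; the scope name and variable below are made up for illustration):
import tensorflow as tf

def stop_gradient_getter(getter, name, *args, **kwargs):
    # Wrap the default getter so every variable requested in the scope is
    # returned as a gradient-stopped tensor (note: a Tensor, not a Variable).
    return tf.stop_gradient(getter(name, *args, **kwargs))

with tf.variable_scope('reused_scope', reuse=tf.AUTO_REUSE,
                       custom_getter=stop_gradient_getter):
    w = tf.get_variable('w', shape=(), initializer=tf.constant_initializer(1.5))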