Updating variable values in tensorflow - tensorflow

I've have a basic question about updating the values of tensors via the tensorflow python api.
Consider the code snippet:
x = tf.placeholder(shape=(None,10), ... )
y = tf.placeholder(shape=(None,), ... )
W = tf.Variable( randn(10,10), dtype=tf.float32 )
yhat = tf.matmul(x, W)
Now let's assume I want to implement some sort of algorithm that iteratively updates the value of W (e.g. some optimization algo). This will involve steps like:
for i in range(max_its):
resid = y_hat - y
W = f(W , resid) # some update
the problem here is that W on the LHS is a new tensor, not the W that is used in yhat = tf.matmul(x, W)! That is, a new variable is created and the value of W used in my "model" doesn't update.
Now one way around this would be
for i in range(max_its):
resid = y_hat - y
W = f(W , resid) # some update
yhat = tf.matmul( x, W)
which results in the creation of a new "model" for each iteration of my loop !
Is there a better way to implement this (in python) without creating a whole bunch of new models for each iteration of the loop - but instead updating the original tensor W "in-place" so to speak?

Variables have an assign method. Try:W.assign(f(W,resid))

#aarbelle's terse answer is correct, I'll expand it a bit in case someone needs more info. The last 2 lines below is used for updating W.
x = tf.placeholder(shape=(None,10), ... )
y = tf.placeholder(shape=(None,), ... )
W = tf.Variable(randn(10,10), dtype=tf.float32 )
yhat = tf.matmul(x, W)
...
for i in range(max_its):
resid = y_hat - y
update = W.assign(f(W , resid)) # do not forget to initialize tf variables.
# "update" above is just a tf op, you need to run the op to update W.
sess.run(update)

Precisely, the answer should be sess.run(W.assign(f(W,resid))). Then use sess.run(W) to show the change.

Related

LSTM from scratch in tensorflow 2

I'm trying to make LSTM in tensorflow 2.1 from scratch, without using the one already supplied with keras (tf.keras.layers.LSTM), just to learn and code something. To do so, I've defined a class "Model" that when called (like with model(input)) it computes the matrix multiplications of the LSTM. I'm pasting here part of my code, the other parts are on github (link)
class Model(object):
[...]
def __call__(self, inputs):
assert inputs.shape == (vocab_size, T_steps)
outputs = []
for time_step in range(T_steps):
x = inputs[:,time_step]
x = tf.expand_dims(x,axis=1)
z = tf.concat([self.h_prev,x],axis=0)
f = tf.matmul(self.W_f, z) + self.b_f
f = tf.sigmoid(f)
i = tf.matmul(self.W_i, z) + self.b_i
i = tf.sigmoid(i)
o = tf.matmul(self.W_o, z) + self.b_o
o = tf.sigmoid(o)
C_bar = tf.matmul(self.W_C, z) + self.b_C
C_bar = tf.tanh(C_bar)
C = (f * self.C_prev) + (i * C_bar)
h = o * tf.tanh(C)
v = tf.matmul(self.W_v, h) + self.b_v
v = tf.sigmoid(v)
y = tf.math.softmax(v, axis=0)
self.h_prev = h
self.C_prev = C
outputs.append(y)
outputs = tf.squeeze(tf.stack(outputs,axis=1))
return outputs
But this neural netoworks has three problems:
1) it is way slow during training. In comparison a model that uses tf.keras.layers.LSTM() is trained more than 10 times faster. Why is this? Maybe because I didn't use a minibatch training, but a stochastic one?
2) the NN seems to not learn anything at all. After just some (very few!) training examples, the loss seems to settle down and it won't decrease anymore, but rather it oscillates around the reached value. After training, I tested the NN making it generate some text, but it just outputs non-sense gibberish. Why isn't learning anything?
3) the loss function outputs very high values. I've coded a categorical cross-entropy loss function but, with 100 characters long sequence, the value of the function is over 370 per training example. Shouldn't it be way lower than this?
I've wrote the loss function like this:
def compute_loss(predictions, desired_outputs):
l = 0
for i in range(T_steps):
l -= tf.math.log(predictions[desired_outputs[i], i])
return l
I know they're open questions, but unfortunately I can't make it works. So any answer, even a short answer that help me to make myself solve the problem, is fine :)

matmul function for vector with tensor multiplication in tensorflow

In general when we multiply a vector v of dimension 1*n with a tensor T of dimension m*n*k, we expect to get a matrix/tensor of dimension m*k/m*1*k. This means that our tensor has m slices of matrices with dimension n*k, and v is multiplied to each matrix and the resulting vectors are stacked together. In order to do this multiplication in tensorflow, I came up with the following formulation. I am just wondering if there is any built-in function that does this standard multiplication straightforward?
T = tf.Variable(tf.random_normal((m,n,k)), name="tensor")
v = tf.Variable(tf.random_normal((1,n)), name="vector")
c = tf.stack([v,v]) # m times, here set m=2
output = tf.matmul(c,T)
You can do it with:
tf.reduce_sum(tf.expand_dims(v,2)*T,1)
Code:
m, n, k = 2, 3, 4
T = tf.Variable(tf.random_normal((m,n,k)), name="tensor")
v = tf.Variable(tf.random_normal((1,n)), name="vector")
c = tf.stack([v,v]) # m times, here set m=2
out1 = tf.matmul(c,T)
out2 = tf.reduce_sum(tf.expand_dims(v,2)*T,1)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
n_out1 = sess.run(out1)
n_out2 = sess.run(out2)
#both n_out1 and n_out2 matches
Not sure if there is a better way, but it sounds like you could use tf.map_fn like this:
output = tf.map_fn(lambda x: tf.matmul(v, x), T)

Reevaluate dependencies of a while loop

I am trying to understand how while loops work in tensorflow. In particular I have a variable, x say, that I update in the while loop, and then I have some values that depends on x, but when running the while loop the values does not seem to be updated when x changes.
The following code where I have tried to implement a simple gradient decent optimizer might illustrate what I mean:
import tensorflow as tf
x = tf.Variable(initial_value=4, dtype=tf.float32, trainable=False)
y = tf.multiply(x,x)
grad = tf.gradients(y, x)
def update_g():
with tf.control_dependencies(grad):
return tf.identity(grad[0])
iterations = tf.placeholder(tf.int32)
i = tf.constant(0, dtype=tf.int32)
g = tf.Variable(initial_value=grad[0], dtype=tf.float32, trainable=False)
c = lambda i_loop, x_loop, g_loop: i_loop < iterations
b = lambda i_loop, x_loop, g_loop: [i_loop+1, tf.assign(x, x_loop - 10*g_loop), update_g()]
l = tf.while_loop(c, b, [i, x, g], back_prop=False, parallel_iterations=1)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
res_g = sess.run(grad)
res_l = sess.run(l, feed_dict={iterations: 10})
res_x = sess.run(x)
print(res_g)
print(res_l)
print(res_x)
Running this on tensorflow 1.0 gives this result for me:
[8.0]
[10, -796.0, 8.0]
-796.0
and the issue is that the value of the gradient is not updated as x changes.
I have tried various variations on the above code, but can not seem to find a version that works. Basically my question is if the above can be made to work, or do I need to rethink the approach.
(Maybe I should add that I am not interested in writing a gradient decent optimizer, I just built this to have something simple and understandable to work with.)
With some help from the other answer I managed to get this working. Posting the complete code here as a second answer:
x = tf.constant(4, dtype=tf.float32)
y = tf.multiply(x,x)
grad = tf.gradients(y, x)
def loop_grad(x_loop):
y2 = tf.multiply(x_loop, x_loop)
return tf.gradients(y2, x_loop)[0]
iterations = tf.placeholder(tf.int32)
i = tf.constant(0, dtype=tf.int32)
c = lambda i_loop, x_loop: i_loop < iterations
b = lambda i_loop, x_loop: [i_loop+1, x_loop - 0.1*loop_grad(x_loop)]
l = tf.while_loop(c, b, [i, x], back_prop=False, parallel_iterations=1)
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.05)
with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
sess.run(tf.global_variables_initializer())
res_g = sess.run(grad)
res_l = sess.run(l, feed_dict={iterations: 100000})
res_x = sess.run(x)
print(res_g)
print(res_l)
print(res_x)
changing the learning rate from the code in the question and increasing the number of iterations gives the output:
[8.0]
[100000, 5.1315068e-38]
4.0
Which seems to be working. It runs reasonably fast even with high iteration count, so there does not seem to be something really horrible going on with updating the graph in each iteration of the while loop, a fear of which probably was one reason why I didn't opt for this approach from the start.
Having tf.Variable objects as loop variables for while loops is not supported, and will behave in weird nondeterministic ways. Always use tf.assign and friends to update the value of a tf.Variable.

How to use `sparse_softmax_cross_entropy_with_logits`: without getting Incompatible Shapes Error

I would like to use the sparse_softmax_cross_entropy_with_logits
with the julia TensorFlow wrapper.
The operations is defined in the code here.
Basically, as I understand it the first argument should be logits, that would normally be fed to softmax to get them to be category probabilities (~1hot output).
And the second should be the correct labels as label ids.
I have adjusted the example code from the TensorFlow.jl readme
See below:
using Distributions
using TensorFlow
# Generate some synthetic data
x = randn(100, 50)
w = randn(50, 10)
y_prob = exp(x*w)
y_prob ./= sum(y_prob,2)
function draw(probs)
y = zeros(size(probs))
for i in 1:size(probs, 1)
idx = rand(Categorical(probs[i, :]))
y[i, idx] = 1
end
return y
end
y = draw(y_prob)
# Build the model
sess = Session(Graph())
X = placeholder(Float64)
Y_obs = placeholder(Float64)
Y_obs_lbl = indmax(Y_obs, 2)
variable_scope("logisitic_model", initializer=Normal(0, .001)) do
global W = get_variable("weights", [50, 10], Float64)
global B = get_variable("bias", [10], Float64)
end
L = X*W + B
Y=nn.softmax(L)
#costs = log(Y).*Y_obs #Dense (Orginal) way
costs = nn.sparse_softmax_cross_entropy_with_logits(L, Y_obs_lbl+1) #sparse way
Loss = -reduce_sum(costs)
optimizer = train.AdamOptimizer()
minimize_op = train.minimize(optimizer, Loss)
saver = train.Saver()
# Run training
run(sess, initialize_all_variables())
cur_loss, _ = run(sess, [Loss, minimize_op], Dict(X=>x, Y_obs=>y))
When I run it however, I get an error:
Tensorflow error: Status: Incompatible shapes: [1,100] vs. [100,10]
[[Node: gradients/SparseSoftmaxCrossEntropyWithLogits_10_grad/mul = Mul[T=DT_DOUBLE, _class=[], _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/SparseSoftmaxCrossEntropyWithLogits_10_grad/ExpandDims, SparseSoftmaxCrossEntropyWithLogits_10:1)]]
in check_status(::TensorFlow.Status) at /home/ubuntu/.julia/v0.5/TensorFlow/src/core.jl:101
in run(::TensorFlow.Session, ::Array{TensorFlow.Port,1}, ::Array{Any,1}, ::Array{TensorFlow.Port,1}, ::Array{Ptr{Void},1}) at /home/ubuntu/.julia/v0.5/TensorFlow/src/run.jl:96
in run(::TensorFlow.Session, ::Array{TensorFlow.Tensor,1}, ::Dict{TensorFlow.Tensor,Array{Float64,2}}) at /home/ubuntu/.julia/v0.5/TensorFlow/src/run.jl:143
This only happens when I try to train it.
If I don't include an optimise function/output then it works fine.
So I am doing something that screws up the gradient math.

What does opt.apply_gradients() do in TensorFlow?

The documentation is not quite clear about this. I suppose the gradients one can obtain by opt.compute_gradients(E, [v]) contain the ∂E/∂x = g(x) for each element x of the tensor that v stores. Does opt.apply_gradients(grads_and_vars) essentially execute x ← -η·g(x), where η is the learning rate? That would imply that if I want to add a positive additive change p to the variable, I would need to need to change g(x) ← g(x) - (1/η)p, e.g. like this:
opt = tf.train.GradientDescentOptimizer(learning_rate=l)
grads_and_vars = opt.compute_gradients(loss, var_list)
for l, gv in enumerate(grads_and_vars):
grads_and_vars[l] = (gv[0] - (1/l) * p, gv[1])
train_op = opt.apply_gradients(grads_and_vars)
Is there a better way to do this?
The update rule that the apply_gradients method actually applies depends on the specific optimizer. Take a look at the implementation of apply_gradients in the tf.train.Optimizer class here. It relies on the derived classes implementing the update rule in the methods _apply_dense and _apply_spares. The update rule you are referring to is implemented by the GradientDescentOptimizer.
Regarding your desired positive additive update: If what you are calling opt is an instantiation of GradientDescentOptimizer, then you could indeed achieve what you want to do by
grads_and_vars = opt.compute_gradients(E, [v])
eta = opt._learning_rate
my_grads_and_vars = [(g-(1/eta)*p, v) for g, v in grads_and_vars]
opt.apply_gradients(my_grads_and_vars)
The more elegant way to do this is probably to write a new optimizer (inheriting from tf.train.Optimizer) that implements your desired update rule directly.
You can also use eager execution API.
import tensorflow as tf
tf.enable_eager_execution()
tfe = tf.contrib.eager
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
grad = tfe.implicit_gradients(loss)
optimizer.apply_gradients(grad(model_fn, val_list))
I will make an instance for it as follow:
import tensorflow as tf
tf.enable_eager_exeuction()
tfe = tf.contrib.eager
W = tfe.Variable(np.random.randn())
b = tfe.Variable(np.random.randn())
def linear_regression(inputs):
return inputs * W + b;
def MSE(model_fn, inputs, labels):
return tf.reduce_sum(tf.pow(model_fn(inputs) - labels, 2)) / (2 * n_samples)
optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.001)
grad = tfe.implicit_gradients(MSE)
optimizer.apply_gradients(grad(linear_regression, train_X, train_Y)) # train_X and train_Y are your input data and label