I want to use L2 regularization with dynamic_rnn in TensorFlow, but it seems this is not handled gracefully at the moment. The while loop is the source of the error. Below is a sample code snippet to reproduce the problem:
import numpy as np
import tensorflow as tf
tf.reset_default_graph()
batch = 2
dim = 3
hidden = 4
with tf.variable_scope('test', regularizer=tf.contrib.layers.l2_regularizer(0.001)):
    lengths = tf.placeholder(dtype=tf.int32, shape=[batch])
    inputs = tf.placeholder(dtype=tf.float32, shape=[batch, None, dim])
    cell = tf.nn.rnn_cell.GRUCell(hidden)
    cell_state = cell.zero_state(batch, tf.float32)
    output, _ = tf.nn.dynamic_rnn(cell, inputs, lengths, initial_state=cell_state)

inputs_ = np.asarray([[[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]],
                      [[6, 6, 6], [7, 7, 7], [8, 8, 8], [9, 9, 9]]],
                     dtype=np.int32)
lengths_ = np.asarray([3, 1], dtype=np.int32)

this_throws_error = tf.losses.get_regularization_loss()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    output_ = sess.run(output, {inputs: inputs_, lengths: lengths_})
    print(output_)
INFO:tensorflow:Cannot use 'test/rnn/gru_cell/gates/kernel/Regularizer/l2_regularizer' as input to 'total_regularization_loss' because 'test/rnn/gru_cell/gates/kernel/Regularizer/l2_regularizer' is in a while loop.
total_regularization_loss while context: None
test/rnn/gru_cell/gates/kernel/Regularizer/l2_regularizer while context: test/rnn/while/while_context
How can I add L2 regularization if I have dynamic_rnn in my network? Currently I could go ahead with fetching the trainable-variable collection at loss-calculation time and adding an L2 loss over it there, but I also have word vectors among my trainable parameters, which I don't want to regularize.
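A minimal sketch of that workaround (not from the original code above) would be to collect the trainable variables at loss time and add an L2 term for everything except the word vectors; the substring 'embedding' and the tensor task_loss below are only placeholders for whatever naming and loss you actually use:

import tensorflow as tf

l2_scale = 0.001
# Sum an L2 penalty over trainable variables, skipping the word-vector matrix
# (identified here by an assumed 'embedding' substring in its name).
reg_terms = [tf.nn.l2_loss(v) for v in tf.trainable_variables()
             if 'embedding' not in v.name]
manual_reg_loss = l2_scale * tf.add_n(reg_terms)
total_loss = task_loss + manual_reg_loss  # task_loss: your existing training loss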
I've encountered the same issue, and I've been trying to solve it with tensorflow==1.9.0.
Code:
import numpy as np
import tensorflow as tf
tf.reset_default_graph()
batch = 2
dim = 3
hidden = 4
with tf.variable_scope('test', regularizer=tf.contrib.layers.l2_regularizer(0.001)):
    lengths = tf.placeholder(dtype=tf.int32, shape=[batch])
    inputs = tf.placeholder(dtype=tf.float32, shape=[batch, None, dim])
    cell = tf.nn.rnn_cell.GRUCell(hidden)
    cell_state = cell.zero_state(batch, tf.float32)
    output, _ = tf.nn.dynamic_rnn(cell, inputs, lengths, initial_state=cell_state)

inputs_ = np.asarray([[[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]],
                      [[6, 6, 6], [7, 7, 7], [8, 8, 8], [9, 9, 9]]],
                     dtype=np.int32)
lengths_ = np.asarray([3, 1], dtype=np.int32)

this_throws_error = tf.losses.get_regularization_loss()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    output_ = sess.run(output, {inputs: inputs_, lengths: lengths_})
    print(output_)
    print(sess.run(this_throws_error))
This is the result of running the code:
...
File "/Users/piero/Development/mlenv3/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_util.py", line 314, in CheckInputFromValidContext
raise ValueError(error_msg + " See info log for more details.")
ValueError: Cannot use 'test/rnn/gru_cell/gates/kernel/Regularizer/l2_regularizer' as input to 'total_regularization_loss' because 'test/rnn/gru_cell/gates/kernel/Regularizer/l2_regularizer' is in a while loop. See info log for more details.
Then I tried to put the dynamic_rnn call outside of the variable scope:
import numpy as np
import tensorflow as tf
tf.reset_default_graph()
batch = 2
dim = 3
hidden = 4
with tf.variable_scope('test', regularizer=tf.contrib.layers.l2_regularizer(0.001)):
    lengths = tf.placeholder(dtype=tf.int32, shape=[batch])
    inputs = tf.placeholder(dtype=tf.float32, shape=[batch, None, dim])
    cell = tf.nn.rnn_cell.GRUCell(hidden)
    cell_state = cell.zero_state(batch, tf.float32)

output, _ = tf.nn.dynamic_rnn(cell, inputs, lengths, initial_state=cell_state)

inputs_ = np.asarray([[[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]],
                      [[6, 6, 6], [7, 7, 7], [8, 8, 8], [9, 9, 9]]],
                     dtype=np.int32)
lengths_ = np.asarray([3, 1], dtype=np.int32)

this_throws_error = tf.losses.get_regularization_loss()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    output_ = sess.run(output, {inputs: inputs_, lengths: lengths_})
    print(output_)
    print(sess.run(this_throws_error))
In theory this should be fine, as the regularization applies to the weights of the RNN, and those variables should be created when the RNN cell is constructed inside the scope.
This is the output:
[[[ 0. 0. 0. 0. ]
[ 0.1526176 0.33048663 -0.02288104 -0.1016309 ]
[ 0.24402776 0.68280864 -0.04888818 -0.26671126]
[ 0. 0. 0. 0. ]]
[[ 0.01998052 0.82368904 -0.00891946 -0.38874635]
[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]]]
0.0
So placing the dynamic_rnn call outside the variable scope works, in the sense that it does not raise errors, but the value of the loss is 0, suggesting that it is not actually considering any weight from the RNN when computing the L2 loss.
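One quick way to check this (my own diagnostic sketch, not part of the original run) is to inspect the regularization-loss collection after the graph is built; if the GRU kernels were created outside the scope's regularizer, the collection is empty and get_regularization_loss() simply returns 0:

# Diagnostic sketch: assumes the graph built above is the current default graph.
reg_ops = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
print(reg_ops)  # expected to be [] when dynamic_rnn is called outside the 'test' scope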
Then I tried with tensorflow==1.12.0.
This is the output for the first script with dynamic_rnn inside the scope:
[[[ 0. 0. 0. 0. ]
[-0.17653276 0.06490126 0.02065791 -0.05175343]
[-0.413078 0.14486027 0.03922977 -0.1465032 ]
[ 0. 0. 0. 0. ]]
[[-0.5176822 0.03947531 0.00206934 -0.5542746 ]
[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]]]
0.010403235
And this is the output with dynamic_rnn outside the scope:
[[[ 0. 0. 0. 0. ]
[ 0.04208181 0.03031874 -0.1749279 0.04617848]
[ 0.12169671 0.09322995 -0.29029205 0.08247502]
[ 0. 0. 0. 0. ]]
[[ 0.09673716 0.13300316 -0.02427006 0.00156245]
[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]]]
0.0
The fact that the version with dynamic_rnn inside the scope returns a non-zero value suggests that it is working correctly, while in the other case the returned value of 0 suggests that it is still not picking up any RNN weights.
So the bottom line is: this was a bug in TensorFlow that was fixed between versions 1.9.0 and 1.12.0.
Is there a way to directly update the elements in a tf.Variable X at given indices without creating a new tensor having the same shape as X?
tf.tensor_scatter_nd_update creates a new tensor, hence it appears not to update the original tf.Variable:
This operation creates a new tensor by applying sparse updates to the input tensor.
tf.Variable.assign apparently needs a new tensor value which has the same shape as X to update the tf.Variable X:
assign(
    value, use_locking=False, name=None, read_value=True
)
value: A Tensor. The new value for this variable.
Regarding tf.tensor_scatter_nd_update, you're right that it returns a new tf.Tensor (and not a tf.Variable). But regarding assign, which is a method of tf.Variable, I think you somewhat misread the documentation; the value is just the new item that you want to assign at particular indices of your old variable.
AFAIK, in TensorFlow all tensors are immutable, like Python numbers and strings; you can never update the contents of a tensor, only create a new one (source). Directly updating or manipulating a tf.Tensor or tf.Variable with NumPy-like item assignment is still not supported. Check the following GitHub issues to follow up on the discussions: #33131, #14132.
In NumPy, we can do the in-place item assignment that you showed in the comment box.
import numpy as np
a = np.array([1,2,3])
print(a) # [1 2 3]
a[1] = 0
print(a) # [1 0 3]
A similar result can be achieved on a tf.Variable with the assign method.
import tensorflow as tf
b = tf.Variable([1,2,3])
b.numpy() # array([1, 2, 3], dtype=int32)
b[1].assign(0)
b.numpy() # array([1, 0, 3], dtype=int32)
Later, we can convert it to a tf.Tensor as follows.
b_ten = tf.convert_to_tensor(b)
b_ten.numpy() # array([1, 0, 3], dtype=int32)
We can do such item assignment on a tf.Tensor too, but we need to convert it to a tf.Variable first (I know, not very intuitive).
tensor = [[1, 1], [1, 1], [1, 1]] # tf.rank(tensor) == 2
indices = [[0, 1], [2, 0]] # num_updates == 2, index_depth == 2
updates = [5, 10] # num_updates == 2
x = tf.tensor_scatter_nd_update(tensor, indices, updates)
x
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[ 1, 5],
[ 1, 1],
[10, 1]], dtype=int32)>
x = tf.Variable(x)
x
<tf.Variable 'Variable:0' shape=(3, 2) dtype=int32, numpy=
array([[ 1, 5],
[ 1, 1],
[10, 1]], dtype=int32)>
x[0].assign([5,1])
x
<tf.Variable 'Variable:0' shape=(3, 2) dtype=int32, numpy=
array([[ 5, 1],
[ 1, 1],
[10, 1]], dtype=int32)>
x = tf.convert_to_tensor(x)
x
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[ 5, 1],
[ 1, 1],
[10, 1]], dtype=int32)>
Trying to implement a mask for sequences of time periods, with zero-padding, for an LSTM network.
Each sequence of time periods is of varying length, hence requiring padding and masking.
I am trying to model sequences of length 96 (time periods) with 33 features. Simplified data (7 time periods and 3 features) are shown:
example state at a timeperiod = [4, 2, 9] at time0(t0)
example sequence = [[2, 3, 6], [1, 6, 8], [2, 9, 4], [2, 7, 3]] at t(0), t(1), t(2), t(3)
example_padded1 = [[2, 3, 6], [1, 6, 8], [2, 9, 4], [2, 7, 3], 0, 0, 0] at t(0) to t(6)
example_padded2 = [[2, 6, 0], [1, 6, 3], [2, 9, 7], [2, 7, 3], 0., 0., 0.]
example_padded3 = [[2, 6, 0], [5, 8, 3], [9, 4, 7], [2, 5, 3], [0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]
Submitting each example sequence to:
seq = example_padded1
masking = layers.Masking(mask_value=0)
masked_output = masking(seq)
print(masked_output._keras_mask)
Gives Errors:
padded1 error: InvalidArgumentError: cannot compute Pack as input #2 (zero-based) was expected to be a float tensor but is a int32 tensor [Op:Pack] name: packed
padded2 error: InvalidArgumentError: Shapes of all inputs must match: values[0].shape = [3] != values[2].shape = [] [Op:Pack] name: packed
padded3 error: Error: 'list' object has no attribute 'dtype' when checking for mask_value = 0
I then added an input layer to define the shape of a sequence:
seq_len, n_features = 7, 3
inp = Input(shape=(seq_len, n_features))
masking = layers.Masking(mask_value=0, input_shape=inp)
masked_output2 = masking(seq)
print(masked_output2._keras_mask)
But got error:
TypeError: Cannot iterate over a tensor with unknown first dimension.
(Python 3.8, TF2)
I have also been trying Embedding, but that seems even more problematic.
How to implement a mask for variable length sequences, which are then padded?
I have possibly resolved this. The problem lies in how I am building the array of sequences.
Each of my sequences consists of:
[array([1,2,3]), array([2,3,4]), array([4,5,6]), array([0,0,0]), array([0,0,0]), array([0,0,0]), array([0,0,0])]
a list of arrays...
when it should be a single array, like:
array([[1,2,3], [2,3,4], [4,5,6], [0,0,0], [0,0,0], [0,0,0], [0,0,0]])
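As a small sanity check (a sketch assuming TF 2.x eager mode and float data, not the original 96x33 setup), with each padded example stored as one array and a leading batch dimension, Masking produces the expected mask:

import numpy as np
import tensorflow as tf

# One padded example as a single float array: 7 time steps x 3 features.
seq = np.array([[1, 2, 3], [2, 3, 4], [4, 5, 6],
                [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]], dtype=np.float32)
batch = seq[np.newaxis, ...]  # shape (1, 7, 3)

masking = tf.keras.layers.Masking(mask_value=0.0)
masked = masking(batch)
print(masked._keras_mask.numpy())  # [[ True  True  True False False False False]]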
Given a 2D tensor
T = [[1, 2, 3],
     [4, 5, 6]]
and a 1D tensor containing horizontal shifts, say, s = [0, -2, 1], how can I obtain the following 3D tensor R?
R[0] = T
R[1] = [[3, 0, 0], # shifted two to the left,
[6, 0, 0]] # padding the rest with zeros
R[2] = [[0, 1, 2], # shifted one to the right,
[0, 4, 5]] # padding the rest with zeros
I know about tf.contrib.image.translate, but that isn't differentiable, so I am looking for some elegant combination of padding/slicing/looping/concatenating operations that accomplishes the same thing.
I have only come up with two ways, both using tf.map_fn(). The first method is to pad T with zeros and then slice it.
import tensorflow as tf
T = tf.constant([[1, 2, 3],[4, 5, 6]],dtype=tf.float32)
s = tf.constant([0, -2, 1])
left = tf.reduce_max(s)
right = tf.reduce_min(s)
left_mask = tf.zeros(shape=(tf.shape(T)[0],left))
right_mask = tf.zeros(shape=(tf.shape(T)[0],tf.abs(right)))
tmp_slice = tf.concat([left_mask,T,right_mask],axis=-1)
result = tf.map_fn(lambda x: tmp_slice[:,left-x:left-x+tf.shape(T)[1]],s,dtype=T.dtype)
grads = tf.gradients(ys=result,xs=T)
with tf.Session() as sess:
    print(sess.run(result))
    print(sess.run(grads))
# print
[[[1. 2. 3.]
[4. 5. 6.]]
[[3. 0. 0.]
[6. 0. 0.]]
[[0. 1. 2.]
[0. 4. 5.]]]
[array([[2., 2., 2.],
[2., 2., 2.]], dtype=float32)]
The second method is to compute a corresponding mask matrix with tf.sequence_mask() and tf.roll(), then take the values with tf.where().
import tensorflow as tf
T = tf.constant([[1, 2, 3],[4, 5, 6]],dtype=tf.float32)
s = tf.constant([0, -2, 1])
def mask_f(x):
    indices = tf.tile([x], (tf.shape(T)[0],))
    mask = tf.sequence_mask(tf.shape(T)[1]-tf.abs(indices),tf.shape(T)[1])
    mask = tf.roll(mask,shift=tf.maximum(0,x),axis=-1)
    return tf.where(mask,tf.roll(T,shift=x,axis=-1),tf.zeros_like(T))
result = tf.map_fn(lambda x:mask_f(x),s,dtype=T.dtype)
grads = tf.gradients(ys=result,xs=T)
with tf.Session() as sess:
    print(sess.run(result))
    print(sess.run(grads))
# print
[[[1. 2. 3.]
[4. 5. 6.]]
[[3. 0. 0.]
[6. 0. 0.]]
[[0. 1. 2.]
[0. 4. 5.]]]
[array([[2., 2., 2.],
[2., 2., 2.]], dtype=float32)]
Update
I found a new method to achieve it. In essence, a horizontal shift is T multiplied by an offset identity matrix, so we can use np.eye() to create that factor.
import tensorflow as tf
import numpy as np
T = tf.constant([[1, 2, 3],[4, 5, 6]],dtype=tf.float32)
s = tf.constant([0, -2, 1])
new_T = tf.tile(tf.expand_dims(T,axis=0),[tf.shape(s)[0],1,1])
s_factor = tf.map_fn(lambda x: tf.py_func(lambda y: np.eye(T.get_shape().as_list()[-1],k=y),[x],tf.float64),s,tf.float64)
result = tf.matmul(new_T,tf.cast(s_factor,new_T.dtype))
grads = tf.gradients(ys=result,xs=T)
with tf.Session() as sess:
    print(sess.run(result))
    print(sess.run(grads))
# print
[[[1. 2. 3.]
[4. 5. 6.]]
[[3. 0. 0.]
[6. 0. 0.]]
[[0. 1. 2.]
[0. 4. 5.]]]
[array([[2., 2., 2.],
[2., 2., 2.]], dtype=float32)]
Sorry for the inaccurate title. Here is the detailed description of the problem: assume a tensor of shape (?, 2), e.g., a tensor T = [[0,1], [0,2], [0,0], [1,4], [1,3], [2,0], [2,0], [2,0], [2,0]]. How do I count, over the distinct values of T[:, 0], how many of them appear with a zero in the second column? For the example above, because there is [0,0] and [2,0], the answer is 2.
More examples:
[[0,1], [0,2], [0,1], [1, 4], [1, 3], [2,0], [2,0], [2,0],[2,0]] (Answer: 1, because of [2,0])
[[0,1], [0,2], [0,1], [1, 4], [1, 3], [2,0], [2,0], [2,0],[2,0],[3,0]] (Answer: 2, because of [2, 0] and [3,0])
If I got what you are looking for, the question is how many unique "[X, 0]-pairs" you have in the data. If so, this should do it:
import tensorflow as tf
x = tf.placeholder(shape=(None, 2), dtype=tf.int32)
# Rows whose second column is 0, then the unique first-column values among them.
indices = tf.where(tf.equal(x[:, 1], tf.constant(0, dtype=tf.int32)))
unique_values, _ = tf.unique(tf.squeeze(tf.gather(x[:, 0], indices)))
# Note: tf.shape returns a 1-element vector here; use tf.size or index [0] for a scalar.
no_unique_values = tf.shape(unique_values, out_type=tf.int32)

data = [ .... ]

with tf.Session() as sess:
    no_unique = sess.run(fetches=[no_unique_values], feed_dict={x: data})
Here is a solution I came up with myself:
def get_unique(ts):
    ts_part = ts[:, 1]
    # Rows where the second column is 0.
    where = tf.where(tf.equal(0, ts_part))
    gather_nd = tf.gather_nd(ts, where)
    # The second column of these rows is all zeros, so this sum is just the first column.
    gather_plus = gather_nd[:, 0] + gather_nd[:, 1]
    unique_values, _ = tf.unique(gather_plus)
    return tf.shape(unique_values)[0]
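A small usage sketch with the first example tensor from the question (TF 1.x graph mode, as in the rest of this thread):

import tensorflow as tf

ts = tf.constant([[0, 1], [0, 2], [0, 0], [1, 4], [1, 3],
                  [2, 0], [2, 0], [2, 0], [2, 0]], dtype=tf.int32)
count = get_unique(ts)

with tf.Session() as sess:
    print(sess.run(count))  # 2, from the [0, 0] and [2, 0] rows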
For example, there is a tensor
a = [[1, 2, 3, 4, 5],
     [2, 3, 4, 5, 6]]
indices = [[1, 0, 1, 0, 0],
           [0, 1, 0, 0, 0]]
I would only like to apply an activation to the elements of a whose corresponding entry in indices is 1. For example, I only want to apply the activation function to the elements of a at indices [0,0], [0,2], [1,1].
Thanks!
You can use tf.where:
tf.where(tf.cast(indices, dtype=tf.bool), tf.nn.sigmoid(a), a)
For your example:
import tensorflow as tf
a = tf.constant([[1,2,3,4,5], [2,3,4,5,6]], dtype=tf.float32)
indices = tf.constant([[1, 0, 1, 0, 0], [0, 1, 0, 0, 0]],
                      dtype=tf.int32)
result = tf.where(tf.cast(indices, dtype=tf.bool), tf.nn.sigmoid(a), a)
with tf.Session() as sess:
    print(sess.run(result))
This prints:
[[ 0.7310586 2. 0.95257413 4. 5. ]
[ 2. 0.95257413 4. 5. 6 ]]