optimize variable with dynamic shape - tensorflow

I want to use a variable where the shape is unknown in advance and it will change from time to time (although ndim is known and fixed).
I declare it like:
initializer = tf.random_uniform_initializer()
shape = (s0, s1, s2) # these are symbolic vars
foo_var = tf.Variable(initializer(shape=shape), name="foo", validate_shape=False)
This seems to work when I create the computation graph up to the point where I want to optimize w.r.t. this variable, i.e.:
optimizer = tf.train.AdamOptimizer(learning_rate=0.1, epsilon=1e-4)
optim = optimizer.minimize(loss, var_list=[foo_var])
That fails in the optimizer in some function create_zeros_slot where it seems to depend on the static shape information (it uses primary.get_shape().as_list()). (I reported this upstream here.)
So, using the optimizer works only with variables with static shape?
I.e. for every change of the shape of the variable, I need to rebuild the computation graph?
Or is there any way to avoid the recreation?

What you are doing does not make any sense. How would you optimize the values of a dynamic variable if its shape keeps changing? Sometimes a value would be there and sometimes it would not. When you go to save the graph, which shape would the variable be in? The Adam optimizer also needs to store information about each parameter in a variable between executions, which it cannot do without knowing the size.
Now, if what you are looking to do is only use part of the variable at a time, you can take slices of it and use them. This will work fine as long as the variable has a fixed shape corresponding to the maximum bounds of the slice. For instance...
initializer = tf.random_uniform_initializer()
shape = (S0, S1, S2) # these are now constants for the max bounds of si
foo_var = tf.Variable(initializer(shape=shape), name="foo")
foo_var_sub = foo_var[:s0, :s1, :s2]
In this case the optimizer will only act on the parts of foo_var which contributed to the slice.
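As a rough sanity check (a sketch only, assuming s0, s1 and s2 are plain Python ints here and using a toy loss on the slice), you can look at the gradient with respect to the full variable and see that it is zero outside the slice:
# Toy loss defined on the slice only.
loss = tf.reduce_sum(tf.square(foo_var_sub))
grad = tf.gradients(loss, [foo_var])[0]  # dense gradient w.r.t. the full variable

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    g = sess.run(grad)
    print(g.shape)       # (S0, S1, S2)
    print(g[s0:, :, :])  # all zeros: these entries get no update from the optimizer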

My solution at the moment is a bit ugly but works.
import tensorflow as tf

def _tf_create_slot_var(primary, val, scope):
    """Helper function for creating a slot variable."""
    from tensorflow.python.ops import variables
    slot = variables.Variable(val, name=scope, trainable=False,
                              validate_shape=primary.get_shape().is_fully_defined())
    # pylint: disable=protected-access
    if isinstance(primary, variables.Variable) and primary._save_slice_info:
        # Primary is a partitioned variable, so we need to also indicate that
        # the slot is a partitioned variable. Slots have the same partitioning
        # as their primaries.
        real_slot_name = scope[len(primary.op.name + "/"):-1]
        slice_info = primary._save_slice_info
        slot._set_save_slice_info(variables.Variable.SaveSliceInfo(
            slice_info.full_name + "/" + real_slot_name,
            slice_info.full_shape[:],
            slice_info.var_offset[:],
            slice_info.var_shape[:]))
    # pylint: enable=protected-access
    return slot

def _tf_create_zeros_slot(primary, name, dtype=None, colocate_with_primary=True):
    """Create a slot initialized to 0 with same shape as the primary object.

    Args:
        primary: The primary `Variable` or `Tensor`.
        name: Name to use for the slot variable.
        dtype: Type of the slot variable. Defaults to the type of `primary`.
        colocate_with_primary: Boolean. If True the slot is located
            on the same device as `primary`.

    Returns:
        A `Variable` object.
    """
    if dtype is None:
        dtype = primary.dtype
    from tensorflow.python.ops import array_ops
    val = array_ops.zeros(
        primary.get_shape().as_list() if primary.get_shape().is_fully_defined()
        else tf.shape(primary),
        dtype=dtype)
    from tensorflow.python.training import slot_creator
    return slot_creator.create_slot(primary, val, name,
                                    colocate_with_primary=colocate_with_primary)

def monkey_patch_tf_slot_creator():
    """
    The TensorFlow optimizers cannot handle variables with unknown shape.
    We hack this by patching the slot creation functions.
    """
    from tensorflow.python.training import slot_creator
    slot_creator._create_slot_var = _tf_create_slot_var
    slot_creator.create_zeros_slot = _tf_create_zeros_slot
Then I need to call monkey_patch_tf_slot_creator() at some point.
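For completeness, a minimal usage sketch with the variable and optimizer from the question: the patch has to be applied before the optimizer builds its slots, i.e. before minimize() is called.
monkey_patch_tf_slot_creator()

optimizer = tf.train.AdamOptimizer(learning_rate=0.1, epsilon=1e-4)
optim = optimizer.minimize(loss, var_list=[foo_var])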

Related

In OpenMDAO's ExecComp, is shape_by_conn compatible with has_diag_partials?

I have an om.ExecComp that performs a simple operation:
"d_sq = x**2 + y**2"
where x, y, and d_sq are always 1D np.arrays. I'd like to be able to use this with large arrays without allocating a large dense matrix. I'd also like the length of the array to be configured based on the shape of the connections.
However, if I specify x={"shape_by_conn": True} rather than x={"shape":100000}, even if I also have has_diag_partials=True, it attempts to allocate a 100000^2 array. Is there a way to make these two options compatible?
First, I'll note that you're using ExecComp a bit outside its intended design purpose. That's not to say that you're doing something totally invalid, but generally speaking ExecComp was designed for small, cheap calculations. Passing it giant arrays is not something we test for.
That being said, I think what you want will work. When you use shape_by_conn in this way, you need to be sure to size both your inputs and outputs. I've provided an example below, along with a manually defined component that does the same thing. Since your equations are pretty simple, the manual component would be a little faster overall.
import numpy as np
import openmdao.api as om


class SparseCalc(om.ExplicitComponent):

    def setup(self):
        self.add_input('x', shape_by_conn=True)
        self.add_input('y', shape_by_conn=True)
        self.add_output('d_sq', shape_by_conn=True, copy_shape='x')

    def setup_partials(self):
        # when using shape_by_conn, you need to declare partials
        # in this secondary method
        md = self.get_io_metadata(iotypes='input')
        # everything should be the same shape, so just need this one
        x_shape = md['x']['shape']
        row_col = np.arange(x_shape[0])
        self.declare_partials('d_sq', 'x', rows=row_col, cols=row_col)
        self.declare_partials('d_sq', 'y', rows=row_col, cols=row_col)

    def compute(self, inputs, outputs):
        outputs['d_sq'] = inputs['x']**2 + inputs['y']**2

    def compute_partials(self, inputs, J):
        J['d_sq', 'x'] = 2*inputs['x']
        J['d_sq', 'y'] = 2*inputs['y']


if __name__ == "__main__":
    p = om.Problem()

    # use IVC here, because you have to have something connected to
    # in order to use shape_by_conn. Normally IVC is not needed
    ivc = p.model.add_subsystem('ivc', om.IndepVarComp(), promotes=['*'])
    ivc.add_output('x', 3*np.ones(10))
    ivc.add_output('y', 2*np.ones(10))

    # p.model.add_subsystem('sparse_calc', SparseCalc(), promotes=['*'])
    p.model.add_subsystem('sparse_exec_calc',
                          om.ExecComp('d_sq = x**2 + y**2',
                                      x={'shape_by_conn': True},
                                      y={'shape_by_conn': True},
                                      d_sq={'shape_by_conn': True,
                                            'copy_shape': 'x'},
                                      has_diag_partials=True),
                          promotes=['*'])

    p.setup(force_alloc_complex=True)
    p.run_model()
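As a side note, since setup() above is called with force_alloc_complex=True, you can also verify the analytic partials (mainly useful for the manually defined SparseCalc component) with the complex-step method. This is just an optional check, not part of the original example; it would go at the end of the __main__ block:
    p.check_partials(method='cs', compact_print=True)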
If you still find this isn't working as expected, please feel free to submit a bug report with a test case that shows the problem clearly (i.e. will raise the error you're seeing). In this case, the provided manual component can serve as a workaround.

How to keep vectors in a dictionary in tensorflow?

It seems that tf.lookup.experimental.DenseHashTable cannot hold vectors and I could not find examples of how to use it.
Below you can find a simple implementation of a dictionary of vectors in TensorFlow. It is also an example of the usage of tf.lookup.experimental.DenseHashTable and tf.TensorArray.
As said, vectors cannot be kept in tf.lookup.experimental.DenseHashTable, and therefore tf.TensorArray is used to keep the actual vectors.
Of course, this is a simple example, and it does not include deletion of entries in the dictionary - an operation that would require some management of the free cells of the array. Also, you should read the respective API pages of tf.lookup.experimental.DenseHashTable and tf.TensorArray to see how to tune them for your needs.
import tensorflow as tf


class DictionaryOfVectors:
    def __init__(self, dtype):
        empty_key = tf.constant('')
        deleted_key = tf.constant('deleted')
        self.ht = tf.lookup.experimental.DenseHashTable(key_dtype=tf.string,
                                                        value_dtype=tf.int32,
                                                        default_value=-1,
                                                        empty_key=empty_key,
                                                        deleted_key=deleted_key)
        self.ta = tf.TensorArray(dtype, size=0, dynamic_size=True, clear_after_read=False)
        self.inserts_counter = 0

    @tf.function
    def insertOrAssign(self, key, vec):
        # Insert the vector to the TensorArray. The write() method returns a new
        # TensorArray object with flow that ensures the write occurs. It should be
        # used for subsequent operations.
        with tf.init_scope():
            self.ta = self.ta.write(self.inserts_counter, vec)
            # Insert the same counter value to the hash table
            self.ht.insert_or_assign(key, self.inserts_counter)
            self.inserts_counter += 1

    @tf.function
    def lookup(self, key):
        with tf.init_scope():
            index = self.ht.lookup(key)
            return self.ta.read(index)


dictionary_of_vectors = DictionaryOfVectors(dtype=tf.float32)
dictionary_of_vectors.insertOrAssign('first', [1, 2, 3, 4, 5])
print(dictionary_of_vectors.lookup('first'))
The example is a bit more sophisticated, as the insertOrAssign and lookup methods are decorated with @tf.function. Because the methods change variables defined outside of them, tf.init_scope() is used. You might ask what is changed in the lookup() method, as it actually only reads from the hash table and the array. The reason is that in graph mode, the index returned from the lookup() call is a Tensor, and the TensorArray implementation contains a line if index < 0: which fails with:
OperatorNotAllowedInGraphError: using a tf.Tensor as a Python bool is not allowed.
When we use tf.init_scope(), as explained in its API documentation, "code inside an init_scope block runs with eager execution enabled even when tracing a tf.function". So in that case the index is not a Tensor but a scalar.
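If you want to see that effect in isolation, here is a small illustrative sketch (not part of the dictionary code above); the exact printed class names depend on your TensorFlow version:
import tensorflow as tf

@tf.function
def traced_fn():
    with tf.init_scope():
        # Runs eagerly even while the function is being traced,
        # so this is a concrete eager tensor with a usable value.
        eager_index = tf.constant(0)
        print('inside init_scope:', type(eager_index))
    # Outside init_scope this is a symbolic graph tensor during tracing,
    # which is why "if index < 0:" style checks fail there.
    graph_index = tf.constant(0)
    print('outside init_scope:', type(graph_index))
    return graph_index

traced_fn()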

Select variable_scope dynamically at runtime

I want to change the variable_scope by the value of some tensors. For an easy example, I defined a very simple code like this:
import tensorflow as tf
def calculate_variable(scope):
    with tf.variable_scope(scope or type(self).__name__, reuse=tf.AUTO_REUSE):
        w = tf.get_variable('ww', shape=[5],
                            initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.1))
        return w
w = calculate_variable('in_first')
w1 = calculate_variable('in_second')
The function is very simple. It just returns a value defined in a certain variable scope. Now, w and w1 would have different values.
What I want to do is to select this variable scope based on some condition on tensors. Assuming I have two tensors x and y: if their values are the same, I want to get the value from the function above with one variable scope, otherwise with the other.
x = tf.constant(3)
y = tf.constant(3)
condi = tf.cond(tf.equal(x, y), lambda: 'in_first', lambda: 'in_second')
w_cond = calculate_variable(condi)
I tried many other methods and searched the Internet. However, whenever I try to determine the variable_scope from a condition on tensors in a way similar to this example, it shows an error.
TypeError: Using a `tf.Tensor` as a Python `bool` is not allowed. Use `if t is not None:` instead of `if t:` to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.
Is there any good workaround?
The way you have stated it, this is not possible: the variable_scope class explicitly checks that the name_or_scope argument is either a string or a VariableScope instance:
...
if not isinstance(self._name_or_scope,
                  (VariableScope,) + six.string_types):
    raise TypeError("VariableScope: name_or_scope must be a string or "
                    "VariableScope.")
It does not accept a Tensor. This is reasonable, because the variable scope is part of the graph definition; one can't define variables dynamically.
The closest supported expression is this:
x = tf.constant(3)
y = tf.constant(3)
w_cond = tf.cond(tf.equal(x, y),
                 lambda: calculate_variable('in_first'),
                 lambda: calculate_variable('in_second'))
... where you can select any of the two variables at runtime.
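As a quick check (a TF1-style session sketch, building on the calculate_variable function above):
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Both variables exist in the graph; tf.cond only selects which one
    # is returned at runtime. Here x == y, so the 'in_first' value is printed.
    print(sess.run(w_cond))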

What's the difference between `tf.random_normal` and `tf.random_normal_initializer`?

Using the two functions seems to get the same result.
t4 = tf.get_variable('t4', initializer=tf.random_normal((2,), seed=0))
t5 = tf.get_variable('t5', shape=(2,), initializer=tf.random_normal_initializer(seed=0))
And I find that random_normal_initializer() also uses random_normal() internally.
I only vaguely understand the difference between them. As far as I can tell, random_normal returns a constant tensor, while random_normal_initializer only produces a value once initialization is run.
I want to know more about how to use these two functions at the right time.
If I use random_normal to initialize a variable, is it actually initialized twice (once when the tensor is created and again when the variable is initialized)? In other words, are there performance differences between them?
Maxim's answer to this question is excellent, but I want to answer a slightly simpler question (with a few examples) that the OP might be asking:
Most basic answer: tf.random_normal is a Tensor, but tf.random_normal_initializer is a RandomNormal (an initializer object), not a Tensor. I think simple code best clarifies the difference between these two:
# Simple examples to clarify tf.random_normal from tf.random_normal_initializer
tf.reset_default_graph()

# OP's code
t4 = tf.get_variable('t4', initializer=tf.random_normal((2,), seed=0))
t5 = tf.get_variable('t5', shape=(2,), initializer=tf.random_normal_initializer(seed=0))

# clarifying Tensor vs Initializer outside the context of get_variable.
t6 = tf.random_normal((2,), seed=0)
t7 = tf.random_normal_initializer(seed=0)

# types
print(type(t6))  # <class 'tensorflow.python.framework.ops.Tensor'>
print(type(t7))  # <class 'tensorflow.python.ops.init_ops.RandomNormal'>

# run the graph...
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # OP's code
    print(sess.run(t4))  # [-0.39915761  2.10443926]
    print(sess.run(t5))  # [-0.39915761  2.10443926]

    # tf.random_normal is a Tensor
    print(sess.run(t6))  # [-0.39915761  2.10443926]

    # tf.random_normal_initializer returns a tf.RandomNormal, not a Tensor or Op, so can't be sess.run()!
    try:
        print(sess.run(t7))  # Exception!
    except:
        print("Exception!")

    # But notice that you don't need to initialize an initializer, just a variable.
    t8 = tf.random_normal_initializer(seed=0)
    t9 = tf.get_variable('t9', shape=(2,), initializer=t8)
    sess.run(t9.initializer)  # still need to initialize the variable
    print(sess.run(t9))  # [-0.39915761  2.10443926]
In your setting: as far as the code you are calling goes, there is no real difference; the initializer keyword is overloaded to accept both and will behave as Maxim indicates. From the tf/ops/variable_scope source:
if initializer is None:
    init, initializing_from_value = self._get_default_initializer(
        name=name, shape=shape, dtype=dtype)
    if initializing_from_value:
        init_shape = None
    else:
        init_shape = var_shape
elif callable(initializer):
    init = initializer
    init_shape = var_shape
elif isinstance(initializer, ops.Tensor):
    init = array_ops.slice(initializer, var_offset, var_shape)
    # Use the dtype of the given tensor.
    dtype = init.dtype.base_dtype
    init_shape = None
else:
    init = ops.convert_to_tensor(initializer, dtype=dtype)
    init = array_ops.slice(init, var_offset, var_shape)
    init_shape = None
tf.random_normal returns a tensor of the specified shape filled with random normal values. In addition, it creates a number of under-the-hood ops to compute the value:
random_normal/shape
random_normal/mean
random_normal/stddev
random_normal/RandomStandardNormal
random_normal/mul
At runtime, consecutive evaluations of this tensor produce a new value, but no other nodes are added.
tf.random_normal_initializer is an Initializer instance, which invokes tf.random_normal when it is called. So there is no big difference between tf.random_normal_initializer and tf.random_normal. Even if you call the initializer twice, neither of them will add new nodes to the graph at runtime; both, however, add 6 additional nodes at graph-construction time.
Another alternative (that may be even more efficient in some cases) is initialization with numpy.random.normal array, like this:
t1 = tf.Variable(name='t1', initial_value=np.random.normal(size=(2,)))
This way no random_normal nodes are added to the graph, neither at graph-construction time nor at runtime.
UPD: tensorflow adds the const op .../initial_value in this case and the whole numpy array is going to be present in the graph, which may be a problem if the array is large.
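If the array really is large, one common TF1-era workaround (a sketch, not part of the original answer) is to feed the initial value through a placeholder, so the array never ends up as a const node in the graph definition:
import numpy as np
import tensorflow as tf

# The big numpy array is passed only at initialization time via feed_dict,
# so no large const node is stored in the graph definition.
init_ph = tf.placeholder(tf.float32, shape=(2,))
t2 = tf.Variable(init_ph, name='t2')

with tf.Session() as sess:
    sess.run(t2.initializer, feed_dict={init_ph: np.random.normal(size=(2,))})
    print(sess.run(t2))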

Accessing learned weights of a DNN in CNTK

How can one access the learned weights of a DNN saved as follows:
lstm_network_output.save(model_path)
The weights/parameters of a network can be accessed through lstm_network_output.parameters, which returns a list of Parameter variable objects. The value of a Parameter can be obtained as a numpy array via the value property of the Parameter object, and it can be updated by assigning a new numpy array to that same value property.
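For example, a minimal sketch (assuming model_path points to the model saved above):
import cntk as C

# Reload the saved network and inspect its learned parameters.
model = C.load_model(model_path)
for p in model.parameters:
    print(p.name, p.shape)  # name and shape of each Parameter
    w = p.value             # numpy array holding the learned weights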
If you used name= properties in creating your model, you can also identify layers by name. For example:
model = Sequential([Embedding(300, name='embed'), Recurrence(LSTM(500)), Dense(10)])
E = model.embed.E # accesses the embedding matrix of the embed layer
To know that the parameter is .E, please consult the docstring of the respective function (e.g. help(Embedding)). (In Dense and Convolution, the parameters would be .W and .b.)
The pattern above is for named layers, which are created using as_block(). You can also name intermediate variables, and access them in the same way. E.g.:
W = Parameter((13,42), init=0, name='W')
x = Input(13)
y = times(x, W, name='times1')
W_recovered = y.times1.W
# e.g. check the shape to see that they are the same
W_recovered.shape # --> (13, 42)
W.shape # --> (13, 42)
Technically, this will search all parameters that feed y. In the case of a more complex network, you may end up having multiple parameters of the same name, and an error will be thrown due to the ambiguity. In that case, you must work with the .parameters tuple mentioned in Anna's response.
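If you do run into that ambiguity, one possible (hypothetical) way to disambiguate is to filter the .parameters tuple by name yourself:
# Pick out the parameter(s) named 'W' among all parameters that feed y.
w_candidates = [p for p in y.parameters if p.name == 'W']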
This Python code worked for me to visualize some weights:
import numpy as np
import cntk as C
dnnFile = C.cntk_py.Function.load('Models\ConvNet_MNIST_5.dnn') # load model from MS example
layer8 = dnnFile.parameters()[8].value()
filter_num = 0
sliced = layer8.asarray()[ filter_num ][ 0 ] # shows filter works on input image
print(sliced)