How do you create a dynamic_rnn with dynamic "zero_state" (Fails with Inference) - tensorflow

I have been working with the "dynamic_rnn" to create a model.
The model is based upon an 80-time-period signal, and I want to zero the "initial_state" before each run, so I have set up the following code fragment to accomplish this:
state = cell_L1.zero_state(self.BatchSize,Xinputs.dtype)
outputs, outState = rnn.dynamic_rnn(cell_L1,Xinputs,initial_state=state, dtype=tf.float32)
This works great for the training process. The problem is that once I go to inference, where my BatchSize = 1, I get an error because the rnn "state" doesn't match the new Xinputs shape. So I figured I need to make "self.BatchSize" based upon the input batch size rather than hard-coding it. I tried many different approaches, and none of them have worked. I would rather not pass a bunch of zeros through the feed_dict, as it is a constant based upon the batch size.
Here are some of my attempts. They all generally fail since the input size is unknown upon building the graph:
state = cell_L1.zero_state(Xinputs.get_shape()[0],Xinputs.dtype)
.....
state = tf.zeros([Xinputs.get_shape()[0], self.state_size], Xinputs.dtype, name="RnnInitializer")
Another approach, thinking the initializer might not get called until run-time, but still failed at graph build:
init = lambda shape, dtype: np.zeros(*shape)
state = tf.get_variable("state", shape=[Xinputs.get_shape()[0], self.state_size],initializer=init)
Is there a way to get this constant initial state to be created dynamically, or do I need to reset it through the feed_dict with tensor-serving code? Is there a clever way to do this only once within the graph, maybe with a tf.Variable.assign?

The solution to the problem was how to obtain the "batch_size" such that the value is not hard coded.
The approach attempted in the original example was:
Xinputs = tf.placeholder(tf.int32, (None, self.sequence_size, self.num_params), name="input")
state = cell_L1.zero_state(Xinputs.get_shape()[0],Xinputs.dtype)
The problem is the use of "get_shape()[0]": this returns the static "shape" of the tensor and takes the batch_size value at [0]. The documentation doesn't make this very clear, but the value is a constant fixed when the graph is built, so when you load the graph for inference this value is still hard coded.
Using the "tf.shape()" function, seems to do the trick. This doesn't return the shape, but a tensor. So this seems to be updated more at run-time. Using this code fragment solved the problem of a training batch of 128 and then loading the graph into TensorFlow-Service inference handling a batch of just 1.
Xinputs = tf.placeholder(tf.int32, (None, self.sequence_size, self.num_params), name="input")
batch_size = tf.shape(Xinputs)[0]
state = self.cell_L1.zero_state(batch_size,Xinputs.dtype)
Here is a good link to the TensorFlow FAQ, which describes this approach under 'How do I build a graph that works with variable batch sizes?':
https://www.tensorflow.org/resources/faq
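To see the difference between the two calls in isolation, here is a minimal sketch (the placeholder shape is a hypothetical stand-in for self.sequence_size and self.num_params):
import tensorflow as tf

Xinputs = tf.placeholder(tf.int32, (None, 80, 10), name="input")

static_batch = Xinputs.get_shape()[0]    # Dimension(None): frozen at graph-construction time
dynamic_batch = tf.shape(Xinputs)[0]     # a scalar int32 tensor, evaluated at run time

print(static_batch)     # prints '?' -- unusable as a concrete batch size
print(dynamic_batch)    # prints a Tensor; its value is only known inside sess.run()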

Related

How do I explicitly split a Dataset tuple in Tensorflow's functional API into two separate layers from just one input layer?

Context
The input to my model is a BatchDataset object called dataset_train, and it is batched to yield (training_data, label).
For some of the machinery in my model, I need to be able to split the Dataset tuple inside the model and independently access both the data and the label. This is a single input model with multiple outputs, so I am using Tensorflow's Functional API. For the sake of reproducibility, I am working with timeseries, so a toy dataset would look like this:
time = np.arange(1000)
data = np.random.randn(1000)
label = np.random.randn(1000)
training_data = np.zeros(shape=(time.size,2))
training_data[:,0] = time
training_data[:,1] = data
dataset_train = tf.keras.utils.timeseries_dataset_from_array(
    data=training_data,
    targets=label,
    batch_size=batch_size,
    sequence_length=sequence_length,
    sequence_stride=1,
)
Note: sequence_length and batch_size are additional semi-arbitrary hyperparameters that are not important for the purposes of this question.
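For reference, a quick sanity check of what the dataset yields (assuming the code above has run with hypothetical values such as sequence_length = 10 and batch_size = 32) is to print its element_spec:
# Each element is a (training_data, label) tuple of batched tensors, roughly:
print(dataset_train.element_spec)
# (TensorSpec(shape=(None, 10, 2), dtype=tf.float64, name=None),
#  TensorSpec(shape=(None,), dtype=tf.float64, name=None))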
Question
How do I split apart the Dataset in Tensorflow's Functional API into the training data element and the label element?
Here is pseudocode of what I am looking for:
input = Single Input Layer that defines something capable of accepting dataset_train
training_data = input.element_spec[0]
label = input.element_spec[1]
After that point, my model can perform its actions on training_data and label independently.
First Solution I tried:
I first started by trying to define two input layers, pass each element of the dataset tuple to its own input layer, and then act on each input layer independently.
training_data = tf.keras.Input(shape=(sequence_length,2))
label = tf.keras.Input(shape = sequence_length)
#model machinery
model = tf.keras.Model(
    inputs=[training_data, label],
    outputs=[output_1, output_2]
)
#model machinery
history = model.fit(dataset_train, epochs = 500)
The first problem I had with this is that I got the following error:
ValueError: Layer "model_5" expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None, 2) dtype=float64>]
This is a problem because, to satisfy the two inputs, I would have to pass the model something like a dictionary of datasets (never mind that this isn't supported), which introduces a circular dependency: model.predict would then expect labels among its inputs, meaning I need the answers in order to get the answers. Because TensorFlow implicitly assumes that the second element in a Dataset is the label and doesn't require Datasets with labels for model.predict, I need to pass it only a single Dataset to avoid this circular dependency, so I decided to abandon this strategy in favor of unpacking the Input layer directly within the functional API.
Second Solution I tried:
I thought maybe I could unpack the Dataset using the .get_single_element() method in the following code excerpt
input = tf.keras.Input(shape = (sequence_length, 2))
training_dataset, label = input.get_single_element()
This gave the following error:
AttributeError: 'KerasTensor' object has no attribute 'get_single_element'
I then thought the problem was that the symbolic tensor wasn't of type Dataset, so I needed to define the input layer to expect a Dataset. After reading through the documentation and spending ~9 hours messing around, I realized that tf.keras.Input takes an argument called type_spec, which lets the user specify exactly the type of symbolic tensor to create (I think - I'm still a little shaky on understanding exactly what's going on, and I'm more than a little sleep deprived, which isn't helping). As it turns out, there is a way to generate the type_spec from the dataset itself, so I did that to make sure I wasn't making a mistake in generating it.
input = tf.keras.Input(tensor = dataset_train)
training_dataset, label = input.get_single_element()
Which gives the following error:
AttributeError: 'BatchDataset' object has no attribute 'dtype'
I'm not really sure why I get this error, but I tried to circumvent it by explicitly defining the type_spec in the Input layer
input = tf.keras.Input(type_spec=tf.data.DatasetSpec.from_value(dataset_train))
training_dataset, label = input.get_single_element()
Which gives the following error:
ValueError: KerasTensor only supports TypeSpecs that have a shape field; got DatasetSpec, which does not have a shape.
I had also tried to make the DatasetSpec manually instead of generating it using .from_value earlier, and had gotten the same error. I thought then it was just because I was messing it up, but now that I've gotten this error from .from_value, I'm beginning to suspect that this line of solutions won't work because DatasetSpec is implicitly missing a shape. I might also be confused, because inspecting dataset_train.element_spec clearly reveals that the dataset does have a shape, so I'm not sure why TensorFlow can't infer it from that.
Any help in furthering either of those non-functional solutions so that I can explicitly access the training_data and label separately from an input Dataset inside the Functional API would be much appreciated!

Using tf.train.Saver() on convolutional layers in tensorflow

I'm attempting to use tf.train.Saver() to apply transfer learning between two convolutional neural network graphs in tensorflow and I'd like to validate that my methods are working as expected. Is there a way to inspect the trainable features in a tf.layers.conv2d() layer?
My methods
1. initialize layer
conv1 = tf.layers.conv2d(inputs=X_reshaped, filters=conv1_fmaps, kernel_size=conv1_ksize,
                         strides=conv1_stride, padding=conv1_pad,
                         activation=tf.nn.relu,
                         kernel_initializer=tf.contrib.layers.xavier_initializer(),
                         bias_initializer=tf.zeros_initializer(), trainable=True,
                         name="conv1")
2. {Train the network}
3. Save current graph
tf.train.Saver().save(sess, "./my_model_final.ckpt")
4. Build new graph that includes the same layer, load specified weights with Saver()
reuse_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                               scope="conv[1]")
reuse_vars_dict = dict([(var.op.name, var) for var in reuse_vars])
restore_saver = tf.train.Saver(reuse_vars_dict)
...
restore_saver.restore(sess, "./my_model_final.ckpt")
5. {Train and evaluate the new graph}
My Question:
1) My code works 'as expected' and without error, but I'm not 100% confident it's working like I think it is. Is there a way to print the trainable features from a layer to ensure that I'm loading and saving weights correctly? Is there a "better" way to save/load parameters with the tf.layers API? I noticed a request on GitHub related to this. Ideally, I'd like to check these values on the first graph a) after initialization b) after training and on the new graph i) after loading the weights ii) after training/evaluation.
Is there a way to print the trainable features from a layer to ensure that I'm loading and saving weights correctly?
Yes, you first need to get a handle on the layer's variables. There are several ways to do that, but arguably the simplest is using the get_collection() function:
conv1_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                               scope="conv1")
Note that the scope here is treated as a regular expression, so you can write things like conv[123] if you want all variables from scopes conv1, conv2 and conv3.
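For example, a hypothetical call that grabs the variables of scopes conv1, conv2 and conv3 in one go:
convs_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="conv[123]")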
If you just want trainable variables, you can replace GLOBAL_VARIABLES with TRAINABLE_VARIABLES.
If you just want to check a single variable, such as the layer's kernel, then you can use get_tensor_by_name() like this:
graph = tf.get_default_graph()
kernel_var = graph.get_tensor_by_name("conv1/kernel:0")
Yet another option is to just iterate on all variables and filter based on their names:
conv1_vars = [var for var in tf.global_variables()
              if var.op.name.startswith("conv1/")]
Once you have a handle on these variables, you can just evaluate them at different points, e.g. just after initialization, just after restoring the graph, just after training, and so on, and compare the values. For example, this is how you would get the values just after initialization:
with tf.Session() as sess:
    init.run()
    conv1_var_values_after_init = sess.run(conv1_vars)
Then once you have captured the variable values at the various points that you are interested in, you can check whether or not they are equal (or close enough, taking into account tiny floating point imprecisions) like so:
same = np.allclose(conv1_var_values_after_training,
                   conv1_var_values_after_restore)
Is there a "better" way to save/load parameters with the tf.layers API?
Not that I'm aware of. The feature request you point to is not really about saving/loading the parameters to disk, but rather to be able to easily get a handle on a layer's variables, and to easily create an assignment node to set their values.
For example, it will be possible (in TF 1.4) to get a handle on a layer's kernel and get its value very simply, like this:
conv1_kernel_value = conv1.kernel.eval()
Of course, you can use this to get/set a variable's value and load/save it to disk, like this:
conv1 = tf.layers.conv2d(...)
new_kernel = tf.placeholder(...)
assign_kernel = conv1.kernel.assign(new_kernel)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    loaded_kernel = my_function_to_load_kernel_value_from_disk(...)
    assign_kernel.run(feed_dict={new_kernel: loaded_kernel})
    ...
It's not pretty. It might be useful if you want to load/save to a database (instead of a flat file), but in general I would recommend using a Saver.
I hope this helps.

Using tf.cond() to feed my graph for training and validation

In my TensorFlow code I want my network to take inputs from one of the two StagingArea objects depending upon whether I want to do training or testing.
A part of the graph construction code I wrote is as follows:
with tf.device("/gpu:0"):
for i in range(numgpus):
with tf.variable_scope(tf.get_variable_scope(), reuse=i>0) as vscope:
with tf.device('/gpu:{}'.format(i)):
with tf.name_scope('GPU-Tower-{}'.format(i)) as scope:
phase = tf.get_variable("phase", [], initializer=tf.zeros_initializer(),dtype=tf.uint8, trainable=False)
phaseassigntest = phase.assign(1)
phaseassigntrain = phase.assign(0)
phasetest = tf.equal(phase, 0)
is_training = tf.cond(phasetest, lambda: tf.constant(True), lambda: tf.constant(False))
trainstagingarea = tf.contrib.staging.StagingArea([tf.float32, tf.int32], shapes=[[trainbatchsize, 3, 221, 221], [trainbatchsize]], capacity=20)
putoptrain = trainstagingarea.put(train_iterator.get_next())
trainputop.append(putoptrain)
getoptrain = trainstagingarea.get()
traingetop.append(getoptrain)
trainclearop = trainstagingarea.clear()
trainstageclear.append(trainclearop)
trainsizeop = trainstagingarea.size()
trainstagesize.append(trainsizeop)
valstagingarea = tf.contrib.staging.StagingArea([tf.float32, tf.int32], shapes=[[valbatchsize, 3, 221, 221], [valbatchsize]], capacity=20)
putopval = valstagingarea.put(val_iterator.get_next())
valputop.append(putopval)
getopval = valstagingarea.get()
valgetop.append(getopval)
valclearop = valstagingarea.clear()
valstageclear.append(valclearop)
valsizeop = valstagingarea.size()
valstagesize.append(valsizeop)
#elem = valgetop[i]
elem = tf.cond(is_training,lambda: traingetop[i],lambda: valgetop[i])
img = elem[0]
label = elem[1]
labelonehot = tf.one_hot(label, depth=numclasses)
net, networksummaries = overfeataccurate(img,numclasses=numclasses, phase=is_training)
I have used tf.cond to make sure that the network is fed by one of the two StagingArea objects. One is meant for training and the other one is meant for validation.
Now, when I try to execute the graph as follows, I do not get any result; in fact the code just hangs and I have to kill the process.
with tf.Session(graph=g, config=config) as sess:
    sess.run(init_op)
    sess.run(tf.local_variables_initializer())
    sess.run(val_initialize)
    for i in range(20):
        sess.run(valputop)
        print(sess.run(valstagesize))
    writer = tf.summary.FileWriter('.', graph=tf.get_default_graph())
    epoch = 0
    iter = 0
    print("Performing Validation")
    sess.run(phaseassigntest)
    saver = tf.train.Saver()
    while(epoch < 10):
        time_init = time.time()
        while True:
            try:
                [val_accu, _, summaries] = sess.run([towervalidation, towervalidationupdateop, validation_summary_op])
                print(val_accu)
When, instead of using tf.cond(), I directly assign elem = valgetop[i], the code works just fine.
Am I missing something over here ?
What is the right way to feed my network based on whether I want to do training or testing ?
NOTE: The error does not go away even if I set numgpus to 1.
Your problem
What you think tf.cond does
Based on the flag, execute what is required to put either traingetop[i] or valgetop[i] into your elem tensor.
What tf.cond actually does
Executes what is required to get both traingetop[i] and valgetop[i], then passes one of them into your elem tensor.
So
The reason it is hanging forever is because it's waiting for an element to be added to your training staging area (so that it can get that element and discard it). You're forgiven for not realising this is what it's doing; it's actually very counter-intuitive. The documentation is awfully unclear on how to deal with this.
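To see this behaviour in isolation, here is a minimal TF 1.x sketch (toy variables, nothing to do with your staging areas). The stateful op is created outside the branch functions, so it is an input to the cond and runs no matter which branch is selected:
import tensorflow as tf

counter = tf.Variable(0)
bump = tf.assign_add(counter, 1)      # created OUTSIDE the cond branches

flag = tf.placeholder(tf.bool, [])
# Only the chosen branch's result is returned, but 'bump' still executes
# unconditionally because it feeds the cond through a Switch op.
value = tf.cond(flag, lambda: tf.identity(bump), lambda: tf.constant(0))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(value, feed_dict={flag: False})
    print(sess.run(counter))          # 1 -- the "unused" branch's input still ran
Your traingetop[i] is exactly such an externally created op (a StagingArea.get()), which is why the session blocks waiting on the training staging area even when you only want validation.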
Recommended Solution (by Tensorflow documentation)
If you really need the queues to be in the same graph, then you need to make two copies of your ENTIRE graph, one that is fed by your training staging area, and one that is fed by your validation staging area. Then you just use the relevant tensor in your sess.run call. I recommend creating a function that takes a queue output tensor, and returns a model_output tensor. Now you have a train_time_output tensor and a validation_time_output tensor, and you can choose which one you want to execute in your sess.run.
Warning
You need to make sure that you aren't actually creating new variables to go along with these new ops. To do that take a look at the latest documentation on variables. It looks like they've simplified it from v0.12, and it essentially boils down to using tf.get_variable instead of tf.Variable to create your variables.
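Putting the recommendation and the warning together, a rough sketch (the layer sizes and the build_model function are hypothetical, not your overfeataccurate model): one model function, called twice, with tf.get_variable plus an explicit reuse flag guaranteeing that the second copy of the graph shares the first copy's weights.
def build_model(images, numclasses, reuse):
    # Every variable below is created via tf.get_variable (tf.layers does this
    # internally), so the reuse=True call returns the SAME weights.
    with tf.variable_scope("model", reuse=reuse):
        net = tf.layers.conv2d(images, 64, 3, activation=tf.nn.relu,
                               data_format="channels_first", name="conv1")
        net = tf.reduce_mean(net, axis=[2, 3])   # crude global pooling, just for the sketch
        return tf.layers.dense(net, numclasses, name="logits")

train_img, train_label = traingetop[i]   # labels would feed your loss in the same duplicated fashion
val_img, val_label = valgetop[i]

train_time_output = build_model(train_img, numclasses, reuse=False)
validation_time_output = build_model(val_img, numclasses, reuse=True)

# At execution time, run only the tensor you need; the other staging area is never touched:
#   sess.run(train_time_output)         # training step
#   sess.run(validation_time_output)    # validation step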
My preferred work around
Although that is the recommended solution (AFAIK), it is extremely unsatisfying to me; you're creating a whole other set of operations on the graph that just happen to use the same weights. It seems like there's a lot of potential for programmer error in abusing the separation between train time and test/validation time (resulting in the model acting unexpectedly differently at these times). Worse, it doesn't solve the problem of tf.cond demanding values for the inputs to both branches; it just forces you to copy your whole graph, which is not always possible.
I prefer to just not have my queues in the graph like that, and treat the model as a function which can be fed an example without caring where it's from. That is, I would instantiate the model with a tf.placeholder as the input, and at execution time I would use feed_dict to actually provide the value. It would function something like this
# inside main training loop
if time_to_train:
    example = sess.run(traingetop)
else:
    example = sess.run(valgetop)
result = sess.run(model_output, {input_placeholder: example})
It's very useful to note that you can use the feed_dict to feed any value for any tensor anywhere in your model. So, you can change any model definition that, due to tf.cond, would always require an input, like:
a = tf.constant(some_value)
b = tf.placeholder(tf.float32)
flag = tf.placeholder(tf.bool, [])
one_of_them = tf.cond(flag, lambda: a, lambda: b)
model_output = build_graph(one_of_them)
Into a definition that doesn't, like:
a = tf.constant(some_value)
model_output = build_graph(a)
Remembering that you can always overwrite what a is at execution time:
# In main training loop,
sess.run(train_op, {a: some_other_value})
This essentially pushes the conditional into native python land. In your code you might end up with something like:
if condition_satisfied:
    sess.run(train_op, {a: some_other_value})
else:
    sess.run(train_op)
Performance concerns
If you are using TensorFlow on a single machine, then there is practically no performance cost to this solution, as the NumPy arrays put into the example Python variable are actually still stored on the GPU.
If you are using TensorFlow in a distributed fashion, then this solution would kill your performance; it would require sending the example from whatever machine it's on to the master so that it can send it back.

How to initialize a keras tensor employed in an API model

I am trying to implement a memory-augmented neural network, in which the memory and the read/write/usage weight vectors are updated according to a combination of their previous values. These weights are different from the classic weight matrices between layers that are automatically updated with the fit() function! My problem is the following: how can I correctly initialize these weights as Keras tensors and use them in the model? I explain it better with the following simplified example.
My API model is something like:
input = Input(shape=(5,6))
controller = LSTM(20, activation='tanh',stateful=False, return_sequences=True)(input)
write_key = Dense(4,activation='tanh')(controller)
read_key = Dense(4,activation='tanh')(controller)
w_w = Add()([w_u, w_r]) #<---- UPDATE OF WRITE WEIGHTS
to_write = Dot()([w_w, write_key])
M = Add()([M,to_write])
cos_sim = Dot()([M,read_key])
w_r = Lambda(lambda x: softmax(x,axis=1))(cos_sim) #<---- UPDATE OF READ WEIGHTS
w_u = Add()([w_u,w_r,w_w]) #<---- UPDATE OF USAGE WEIGHTS
retrieved_memory = Dot()([w_r,M])
controller_output = concatenate([controller,retrieved_memory])
final_output = Dense(6,activation='sigmoid')(controller_output)
You can see that, in order to compute w_w^t, I have to have first defined w_r^{t-1} and w_u^{t-1}. So, at the beginning I have to provide a valid initialization for these vectors. What is the best way to do it? The initializations I would like to have are:
M = K.variable(numpy.zeros((10,4))) # MEMORY
w_r = K.variable(numpy.zeros((1,10))) # READ WEIGHTS
w_u = K.variable(numpy.zeros((1,10))) # USAGE WEIGHTS
But, analogously to what is said in #2486 (entron), these commands do not return a Keras tensor with all the needed metadata, and so I get the following error:
AttributeError: 'NoneType' object has no attribute 'inbound_nodes'
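For what it's worth, a rough way to see the distinction (the attribute name is taken from the Keras internals, so treat this as an assumption): tensors produced by layers carry a _keras_history record of their inbound nodes, while raw backend variables do not, which is what the error above is complaining about.
import numpy as np
from keras import backend as K
from keras.layers import Input

M = K.variable(np.zeros((10, 4)))
x = Input(shape=(5, 6))

print(hasattr(M, '_keras_history'))   # False -- no layer metadata, unusable in the functional graph
print(hasattr(x, '_keras_history'))   # True  -- created by an Input layer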
I also thought of using the old M, w_r and w_u as further inputs at each iteration and, analogously, getting the same variables as outputs to close the loop. But this means that I have to use the fit() function to train the model online with just the target as the final output (Model 1), and employ the predict() function on the model with all the secondary outputs (Model 2) to get the variables to use at the next iteration. I also have to pass the weight matrices from Model 1 to Model 2 using get_weights() and set_weights(). As you can see, it becomes a little bit messy and too slow.
Do you have any suggestions for this problem?
P.S. Please, do not focus too much on the API model above because it is a simplified (almost meaningless) version of the complete one where I skipped several key steps.

Keras - coding a custom optimizer and attempting to compute a second gradient inside get_updates

I am a researcher in optimization and I am trying to write a custom optimizer. I have come across a problem. I have asked in many places and so far have had no response.
Take any optimizer code, say just copy SGD. In the beginning of get_updates, you see
grads = self.get_gradients(loss, params)
Now add the following line right after it:
gradsb = self.get_gradients(loss, [tf.Variable(a) for a in params])
This should compute the gradients at a new tensor, with all the values the same as before.
Now try to see what you get:
for a in gradsb:
    print(a)
You get a list of Nones (but if you print the list grads, you see that they are still Tensors).
Why?
And how to circumvent this problem? This is important as I'd like to compute the gradients at another point for my algorithm.
When you write gradsb = self.get_gradients(loss, [tf.Variable(a) for a in params]) you are defining a new tf.Variable for each a in params. Because the loss does not depend on these new variables, your gradients are None.
If you want to compute a second gradient you need to make sure that you're computing it with respect to Tensors that the objective does depend on.
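A stripped-down illustration of that point, with a toy loss instead of an optimizer (nothing Keras-specific here):
import tensorflow as tf

x = tf.Variable(3.0)
loss = tf.square(x)                    # the loss is built from x

print(tf.gradients(loss, [x]))         # [<tf.Tensor ...>]  -- a real gradient
x_copy = tf.Variable(x)                # a brand-new variable, merely initialized from x
print(tf.gradients(loss, [x_copy]))    # [None]             -- the loss never uses x_copy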
Apparently even replacing the current vector of parameters is not OK!! If I type this in the code:
grads = self.get_gradients(loss, params)
tempparam = [tf.Variable(a) for a in params]
params = [tf.add(a,a) for a in params]
gradsn = self.get_gradients(loss, params)
for a in gradsn:
    print(a)
params = [tf.Variable(a) for a in tempparam]
The result is still that None is printed!!
I know you understand what I am trying to do: at each iteration of get_updates, I would like to compute the gradients at a (slightly) different value of the parameter tensors, and use that to construct the update to the parameters for optimization and training. Is there any way to do this within the Keras package?