Loop in tensorflow

I changed my question to explain my issue better:
I have a function, output_image = my_func(x), where x should have a shape like (1, 4, 4, 1).
Please help me to fix the error in this part:
out = tf.Variable(tf.zeros([1, 4, 4, 3]))
index = tf.constant(0)

def condition(index):
    return tf.less(index, tf.subtract(tf.shape(x)[3], 1))

def body(index):
    out[:, :, :, index].assign(my_func(x[:, :, :, index]))
    return tf.add(index, 1), out

out = tf.while_loop(condition, body, [index])
ValueError: The two structures don't have the same nested structure.
First structure: type=list str=[]
Second structure: type=list str=[<tf.Tensor 'while_10/Add_3:0' shape=() dtype=int32>, <tf.Variable 'Variable_2:0' shape=(1, 4, 4, 3) dtype=float32_ref>]
More specifically: The two structures don't have the same number of elements. First structure: type=list str=[<tf.Tensor 'while_10/Identity:0' shape=() dtype=int32>]. Second structure: type=list str=[<tf.Tensor 'while_10/Add_3:0' shape=() dtype=int32>, <tf.Variable 'Variable_2:0' shape=(1, 4, 4, 3) dtype=float32_ref>]
I tested my code: I can get a result from out = my_func(x[:, :, :, i]) with different values of i, and the while_loop works when I comment out the line out[:, :, :, index].assign(my_func(x[:, :, :, index])). Something is wrong in that line.
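The immediate cause of the ValueError is that condition and body are declared with a single loop variable (index) while body returns two values (the new index and out), so the structures no longer match. Below is a minimal sketch of one way to keep the structures aligned, accumulating the per-channel results in a tf.TensorArray instead of assigning into a tf.Variable; x and my_func are assumed from the question, and whether my_func can be called inside the loop body is an assumption:

num_channels = tf.shape(x)[3]
index = tf.constant(0)
results = tf.TensorArray(dtype=tf.float32, size=num_channels)

def condition(index, results):
    return tf.less(index, num_channels)

def body(index, results):
    # keep the channel dimension with index:index + 1 so the slice stays (1, 4, 4, 1)
    channel_out = my_func(x[:, :, :, index:index + 1])
    results = results.write(index, channel_out)
    # return the loop variables in the same structure they were passed in
    return tf.add(index, 1), results

index, results = tf.while_loop(condition, body, [index, results])
# stack -> (C, 1, 4, 4, 1); squeeze and transpose back to (1, 4, 4, C)
out = tf.transpose(tf.squeeze(results.stack(), axis=4), perm=[1, 2, 3, 0])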

I understand that there is no for-loop and so on, just while. Why?
Control structures are hard to get right and hard to optimize. In your case, what if the next example in the same batch has 5 channels. You would need to run 5 loop iterations and either mess up or waste compute resources for the first example with only 3 channels.
You need to think what exactly you are trying to achieve. Commonly you would have different weights for each channel so the system can't just create them out of thin air, they need to be trained properly.
If you just want to apply the same logic 3 times, just re-arrange your tensor to be (3, 4, 4, 1). You get 3 results and you do what you want with them.
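For instance, a minimal sketch of that re-arrangement, assuming x has shape (1, 4, 4, 3) as in the question and that my_func can handle a batch dimension:

# move the channel axis to the batch axis: (1, 4, 4, 3) -> (3, 4, 4, 1)
x_batched = tf.transpose(x, perm=[3, 1, 2, 0])
out_batched = my_func(x_batched)                  # 3 independent results
# move the results back to channels if needed: (3, 4, 4, 1) -> (1, 4, 4, 3)
out = tf.transpose(out_batched, perm=[3, 1, 2, 0])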
Usually when you actually need for-loops (when handling sequences) you pad the examples so that they all have the same length and generate a model where the loop is unrolled (you would have 3 different operations, one for each iteration of the loop). Look for dynamic_rnn or static_rnn (the first one can handle different lengths for each batch).

I understand that there is no for-loop and so on, just while. Why?
According to Implementation of Control Flow in TensorFlow:
They should fit well with the dataflow model of TensorFlow, and should be amenable to parallel and distributed execution and automatic differentiation.
I think distributed dataflow graphs and automatic differentiation across devices could have been the constraints that led to the introduction of so few loop primitives.
There are several diagrams in that document which distributed-computing experts can understand better; a more thorough explanation is beyond me.

Related

Using tf.where() and tf.gather_nd() with None dimension

I am tackling a machine learning problem in which I feed my network with data of shape (batch_size, n_objects, n_features). So, each training instance comes with a given number of objects, each of them having a given number of features. Among these features I have electric charge, and while writing a custom loss function I would like to use only the neutral objects to compute it. Thus, starting from a tensor of shape (batch_size, n_objects, n_features) I would like to get a tensor of shape (batch_size, n_neutral_objects, n_features). In doing this, I'm facing a couple of problems.
First of all, I made a try by creating a tensor by hand. I have 3 training instances, each one having 2 objects, each of them having 3 features. I try to get the neutral objects using the tf.where() and tf.gather_nd() methods in the following way (suppose that electric charge is the 2nd feature):
a = tf.constant([[[3.5, 0, 6], [2.1, 1, 2.9]], [[1.5, 1, 4.5], [2.0, 0, 4.2]], [[6.2, 0, 6.1], [4.8, 1, 3.4]]]) #toy input tensor
b = tf.where(a[:,:,1] == 0) #find neutral objects (charge is 2nd feature)
c = tf.gather_nd(a,b) #gather them
print(c)
This kind of works, as I get
tf.Tensor(
[[3.5 0.  6. ]
 [2.  0.  4.2]
 [6.2 0.  6.1]], shape=(3, 3), dtype=float32)
as an output, which are the desired objects. But I've somehow lost the first dimension, as I don't want a tensor of shape (3, 3), but rather one of shape (3, 1, 3), namely still 3 input instances, each one having only one neutral object, each of them having 3 features.
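Just to make the desired shape concrete for this toy example, a one-line sketch; it relies on the assumption that every instance has exactly one neutral object, so it is not a general fix:

c_per_instance = tf.reshape(c, (3, 1, 3))  # (batch_size, n_neutral_objects, n_features)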
Things get worse if I plug my approach into my TF model. In this real-life case, my batch size is None and I am thus dealing with tensors of shape (None, 4000, 14) (so 4000 objects for each training instance, 14 features each). This is the code I tried
def get_neutrals(tensor):
    print("tensor.get_shape()", tensor.get_shape())
    charges = tensor[:, :, 4]  # charge is the 5th feature in this case
    print("charges.get_shape()", charges.get_shape())
    where_neutrals = tf.where(charges == 0)  # get the neutrals only
    print("where_neutrals.get_shape()", where_neutrals.get_shape())
    print("tf.gather_nd(tensor, where_neutrals).get_shape()", tf.gather_nd(tensor, where_neutrals).get_shape())
    return tf.gather_nd(tensor, where_neutrals)
and this is what I get printed if I call my method:
tensor.get_shape() (None, 4000, 14)
charges.get_shape() (None, 4000)
where_neutrals.get_shape() (None, 2)
tf.gather_nd(tensor, where_neutrals).get_shape() (None, 14)
The last two shapes are completely unexpected and I don't know why they look like this. Can anyone here help with this?
Thanks a lot, cheers,
F.
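For what it's worth, those last two shapes follow directly from how the ops are defined: on a 2-D boolean mask, tf.where returns one (row, column) index pair per True entry, hence (None, 2), and tf.gather_nd with such full-coordinate indices pulls out one 14-feature vector per pair, hence (None, 14); the None appears because the number of matches is only known at run time. A hedged sketch of one way to keep a per-instance dimension when the number of neutral objects can differ between instances, assuming TF 2.x with ragged tensors:

def get_neutrals_ragged(tensor):
    # tensor: (batch, n_objects, n_features); charge is the 5th feature
    neutral_mask = tf.equal(tensor[:, :, 4], 0)           # (batch, n_objects)
    # one variable-length list of neutral objects per instance
    return tf.ragged.boolean_mask(tensor, neutral_mask)   # (batch, None, n_features)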

Tensorflow custom Estimator with Dataset API: embedding lookup (feature_column) NMT task

My question is close in nature to Feature Columns Embedding lookup; however, I was unable to comment on the answer given there (not enough rep), and I think the answerer either did not fully understand the question, or the answer was not exactly what was asked.
GOAL
To serve a custom Estimator which uses Dataset API to feed in data. The task is NMT (seq2seq).
Issue
Estimator requires feature_columns as input for serving. My NMT task has only one feature, the input sentence to translate (or possibly each word in the sentence is a feature?). And so I am unsure how to build a feature_column (and thus an embedding_column and finally an input_layer), using my input sentences as a feature, that can be fed into an RNN (which expects an embedding lookup of shape [batch_size, max_sequence_len, embedding_dim]) and will finally allow me to serve the Estimator.
Background
I am attempting to utilize a custom estimator to feed a seq2seq style NMT implementation. I need to be able to serve the model via tf-serving, which estimators seem to make relatively easy.
However I hit a road block with 'how' to serve the model. From what I can tell I need 'feature_columns' that will serve as the input into the model.
https://github.com/MtDersvan/tf_playground/blob/master/wide_and_deep_tutorial/wide_and_deep_basic_serving.md
Shows that you need to have an export_input_fn which uses a feature_spec which needs a feature_column(s) as input. This makes sense, however, for my use case I do not have a bunch of (different) features, instead I have input sentences (where each word is a feature) that need to be looked up via embeddings and used as features...
So I know I need the input into my model to be feature column(s). My input for NMT is simply a tensor of [batch_size, max_sequence_len] which is filled with the indices of the words from the sentences (e.g., for batch_size=1 [3, 17, 132, 2, 1, 0, ...] where each index should map to an embedding vector). Typically I would feed this into a embedding_lookup via
embs = tf.get_variable('embedding', [vocab_size, embedding_dim])
tf.nn.embedding_lookup(embs, inputs)
and I would be good to go, I could feed this to an RNN as inputs and the rest is history, not a problem.
BUT, this is where I hit the issue: I need to use feature_columns (so I can serve the model). The answer given to the question I mentioned at the beginning shows how to use embedding_column, but it suggests that the embedding should look up an entire sentence as one single feature, whereas traditionally you would look up each word in the sentence and get its embedding.
Similarly, https://www.damienpontifex.com/2018/01/02/using-tensorflow-feature-columns-in-your-custom-estimator-model/
shows 'how to implement a feature-column in a custom estimator', and indeed his 'Before' code is exactly right (as I wrote out): a tf.get_variable into a tf.nn.embedding_lookup. But his 'After' code, again, only takes in one feature (the entire sentence?).
I have verified this by using their code and feeding my data of shape [batch_size, max_seq_len] to tf.feature_column.categorical_column_with_identity, and the output tensor is [batch_size, embedding_dim].
Is the sequence information lost, or does it simply get flattened? When I print the output, its size is (?, embedding_dim), where ? is typically my batch_size.
EDIT: I have verified the shape is [batch_size, embedding_dim], so it is not just flattened... the sequence info is lost.
I'm guessing it must be treating the input as one single feature, so the batch_size=1 example [3, 17, 132, 2, 1, 0, ...] (where each index should map to an embedding vector) maps to a single feature, which is not what is wanted; we want each index to map to its own embedding, and the needed output is [batch_size, max_seq_len, embedding_dim].
It sounds like what I need instead is not one categorical_column_with_*, but max_seq_len of them (one for each word in my sequence). Does this sound right? Each word would be a feature for my model, so I am leaning toward this being the correct approach, but it also has issues. I am using the Dataset API, so in my input_train_fn() I load my data from a file and then use tf.data.Dataset.from_tensor_slices(data, labels) to split the data into tensors, which I can then feed into my Estimator with dataset.batch(batch_size).make_one_shot_iterator().get_next(). I cannot iterate over each batch (Tensors are not iterable), so I cannot simply make 100 feature_columns for each input batch...
Does anyone have any idea how to do this? This embedding lookup is a very straightforward thing to do with simple placeholders or variables (and a common approach in NLP tasks). But when I venture into Dataset API and Estimators I run into a wall with very little in the way of information (that is not a basic example).
I admit I may have gaps in my understanding, custom estimators and dataset API are new to me and finding information on them can be difficult at times. So feel free to fire off information at me.
Thanks for reading my wall of text and hopefully helping me (and the others I've seen ask a similar question but get no answer: https://groups.google.com/a/tensorflow.org/forum/#!searchin/discuss/embeddings$20in$20custom$20estimator/discuss/U3vFQF_jeaY/EjgwRQ3RDQAJ). I feel bad for that person; their question was not really answered (for the same reason outlined here), and their thread got hijacked...
If I understand correctly, you want to use the Estimator API to build a seq2seq model. A good place to start is here; look into the Problems-Solutions/text folder.
To answer your question on how to use embedding lookup, here is one example:
# map raw words to integer ids using the vocabulary file
vocab_table = lookup.index_table_from_file(vocabulary_file='data/vocab.csv', num_oov_buckets=1, default_value=-1)
text = features[commons.FEATURE_COL]
words = tf.string_split(text)
dense_words = tf.sparse_tensor_to_dense(words, default_value=commons.PAD_WORD)
word_ids = vocab_table.lookup(dense_words)

padding = tf.constant([[0, 0], [0, commons.MAX_DOCUMENT_LENGTH]])
# Pad all the word_ids entries to the maximum document length
word_ids_padded = tf.pad(word_ids, padding)
word_id_vector = tf.slice(word_ids_padded, [0, 0], [-1, commons.MAX_DOCUMENT_LENGTH])
word_id_vector = {commons.FEATURE_COL: word_id_vector}

# embed the padded word ids through a feature column and feed the result to a dense layer
bow_column = tf.feature_column.categorical_column_with_identity(commons.FEATURE_COL, num_buckets=params.N_WORDS)
bow_embedding_column = tf.feature_column.embedding_column(bow_column, dimension=50, combiner='sqrtn')
bow = tf.feature_column.input_layer(word_id_vector, feature_columns=[bow_embedding_column])
logits = tf.layers.dense(bow, 2, activation=None)
The above code can be wrapped in an Estimator model_fn. The repo mentioned above contains this code; please take a look at it.
So the way I ended up making this work is: I made each word an input feature, then I simply do the wrd_2_idx conversion, pass those in as features in numeric_columns (you have max_seq_len of these), and then pass those columns to input_layer. Then in my graph I use these features and look up the embedding as normal. Basically I circumvent the embedding_column lookup, since I can't figure out how to make it act the way I want. This is probably not optimal, but it works and trains...
I'll leave this as the accepted answer and hope sometime in the future either I figure out a better way to do it, or someone else can enlighten me to the best way to approach this.
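For concreteness, here is a minimal, hypothetical sketch of the workaround described above; the feature names, MAX_SEQ_LEN, vocab_size and embedding_dim are illustrative, not taken from the original code:

import tensorflow as tf

MAX_SEQ_LEN = 100      # illustrative values
vocab_size = 10000
embedding_dim = 128

# one numeric column per word position, each holding an already-converted word index;
# zero-padded names because input_layer concatenates columns in name order
word_columns = [tf.feature_column.numeric_column('word_%03d' % i) for i in range(MAX_SEQ_LEN)]

def model_fn(features, labels, mode, params):
    # input_layer concatenates the columns into a [batch_size, MAX_SEQ_LEN] float tensor
    word_ids = tf.cast(tf.feature_column.input_layer(features, word_columns), tf.int32)
    # do the embedding lookup "as normal", bypassing embedding_column
    embeddings = tf.get_variable('embedding', [vocab_size, embedding_dim])
    inputs = tf.nn.embedding_lookup(embeddings, word_ids)  # [batch_size, MAX_SEQ_LEN, embedding_dim]
    # ... feed `inputs` into the RNN / seq2seq model as usual ...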
I managed to get this working ... also got derailed by the fact that the RNN did not consume an embedding.
What I did to get this working (in the simplest case):
# features[VALUES_FEATURE_NAME] is shape (?, 200), i.e. 200 words per row
inputs = tf.contrib.layers.embed_sequence(
    features[VALUES_FEATURE_NAME], 3000, 5)

# create an LSTM cell of size 200
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(200)

# create the complete LSTM
_, final_states = tf.nn.dynamic_rnn(
    lstm_cell, inputs, dtype=tf.float32)

outputs = final_states.h
So I guess the answer lies in the TensorFlow docs for dynamic_rnn:
Creates a recurrent neural network specified by RNNCell cell.
Performs fully dynamic unrolling of inputs.
So the unrolling here means that the RNN consumes [batch, time_steps, values] as an input.
Bests
You can use tf.contrib.feature_column.sequence_categorical_column_with_vocabulary_list and tf.contrib.feature_column.sequence_input_layer to solve it.
The demo code is as follows:
import tensorflow as tf

if __name__ == "__main__":
    # tf.enable_eager_execution()
    feature = {
        'aa': [['1', '2', '3'],
               ['-1', '4', '-1'],
               ['2', '-1', '-1'],
               ['4', '5', '6']]
    }

    aa_id = tf.contrib.feature_column.sequence_categorical_column_with_vocabulary_list(
        'aa', ['1', '2', '3', '4', '5'])
    seq_emb_matrix = tf.feature_column.embedding_column(aa_id, 2)
    seq_tensor, seq_length = tf.contrib.feature_column.sequence_input_layer(feature, [seq_emb_matrix])
    # feature1: a second input dict (one id per row), not defined in this snippet
    seq_tensor1, seq_length1 = tf.contrib.feature_column.sequence_input_layer(feature1, [seq_emb_matrix])
    seq_tensor2 = tf.squeeze(seq_tensor1)
    # print(tensor)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.tables_initializer())
        a, a_len = sess.run([seq_tensor, seq_length])
        b, b_len = sess.run([seq_tensor1, seq_length1])
        print(a)
        print('a_len', a_len)
        print(a.shape)
        print('-' * 50)
        print(b)
        print('b_len', b_len)
        print(b.shape)
        print(sess.run([seq_tensor2]))
The printed results are as follows:
[[[ 0.5333682 -0.39895234]
[ 0.5335079 0.64998794]
[-1.0432893 -0.8391434 ]]
[[ 0. 0. ]
[-0.29623085 -0.17570129]
[ 0. 0. ]]
[[ 0.5335079 0.64998794]
[ 0. 0. ]
[ 0. 0. ]]
[[-0.29623085 -0.17570129]
[ 0.7100604 0.9935588 ]
[ 0. 0. ]]]
('a_len', array([3, 3, 3, 3]))
(4, 3, 2)
--------------------------------------------------
[[[-0.24147142 -0.37740025]]
[[-0.6222648 1.3862932 ]]
[[ 1.2226609 -0.68292266]]]
('b_len', array([1, 1, 1]))
(3, 1, 2)
[array([[-0.24147142, -0.37740025],
[-0.6222648 , 1.3862932 ],
[ 1.2226609 , -0.68292266]], dtype=float32)]

Reusing layer weights in Tensorflow

I am using tf.slim to implement an autoencoder. It's fully convolutional with the following architecture:
[conv, outputs = 1] => [conv, outputs = 15] => [conv, outputs = 25] =>
=> [conv_transpose, outputs = 25] => [conv_transpose, outputs = 15] =>
[conv_transpose, outputs = 1]
It has to be fully convolutional and I cannot do pooling (limitations of the larger problem). I want to use tied weights, so
encoder_W_3 = decoder_W_1_Transposed
(so the weights of the first decoder layer are the ones of the last encoder layer, transposed).
If I reuse weights the regular way tf.slim lets you reuse them, i.e. reuse=True and then just provide the scope name of the layer you want to reuse, I get a size issue:
ValueError: Trying to share variable cnn_block_3/weights, but specified shape (21, 11, 25, 25) and found shape (21, 11, 15, 25).
This makes sense, if you do not transpose the weights of the previous model. Does anyone have an idea on how I can transpose those weights?
PS: I know this is very abstract and hand-waving, but I am working with a custom api, on top of tfslim, so I can't post code examples here.
Does anyone have an idea on how I can transpose those weights?
Transposition is simple:
new_weights = tf.transpose(weights, perm=[0, 1, 3, 2])
will swap the last two axes.
However, as @Seven mentioned, that wouldn't be enough to address the error, as the total number of weights changed.
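One way around the shape mismatch (a sketch, not the tf.slim reuse mechanism, and assuming you can drop down to raw ops for the decoder) is to skip variable reuse entirely and feed the encoder's kernel straight into tf.nn.conv2d_transpose. Its filter layout is [height, width, output_channels, input_channels], so the (21, 11, 15, 25) encoder kernel already has the right shape for a 25-to-15-channel decoder layer; the names below are illustrative:

# encoder_w3: kernel of the last encoder conv, shape [21, 11, 15, 25] (in=15, out=25);
# `code` is the encoder output, `pre_code` the 15-channel tensor that fed that conv
encoder_w3 = tf.get_variable('encoder_w3', [21, 11, 15, 25])

decoded = tf.nn.conv2d_transpose(
    code,                              # [batch, h, w, 25]
    encoder_w3,                        # reused as-is: filter layout is [h, w, out_channels, in_channels]
    output_shape=tf.shape(pre_code),   # [batch, h, w, 15]
    strides=[1, 1, 1, 1],
    padding='SAME')                    # result is tied to the encoder weights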

How to freeze/lock weights of one TensorFlow variable (e.g., one CNN kernel of one layer)

I have a TensorFlow CNN model that is performing well and we would like to implement this model in hardware; i.e., an FPGA. It's a relatively small network but it would be ideal if it were smaller. With that goal, I've examined the kernels and find that there are some where the weights are quite strong and there are others that aren't doing much at all (the kernel values are all close to zero). This occurs specifically in layer 2, corresponding to the tf.Variable() named, "W_conv2". W_conv2 has shape [3, 3, 32, 32]. I would like to freeze/lock the values of W_conv2[:, :, 29, 13] and set them to zero so that the rest of the network can be trained to compensate. Setting the values of this kernel to zero effectively removes/prunes the kernel from the hardware implementation thus achieving the goal stated above.
I have found similar questions with suggestions that generally revolve around one of two approaches:
Suggestion #1:
tf.Variable(some_initial_value, trainable = False)
Implementing this suggestion freezes the entire variable. I want to freeze just a slice, specifically W_conv2[:, :, 29, 13].
Suggestion #2:
Optimizer = tf.train.RMSPropOptimizer(0.001).minimize(loss, var_list)
Again, implementing this suggestion does not allow the use of slices. For instance, if I try the inverse of my stated goal (optimize only a single kernel of a single variable) as follows:
Optimizer = tf.train.RMSPropOptimizer(0.001).minimize(loss, var_list = W_conv2[:,:,0,0])
I get the following error:
NotImplementedError: ('Trying to optimize unsupported type ', <tf.Tensor 'strided_slice_2228:0' shape=(3, 3) dtype=float32>)
Slicing tf.Variables() isn't possible in the way that I've tried it here. The only thing that I've tried which comes close to doing what I want is using .assign() but this is extremely inefficient, cumbersome, and caveman-like as I've implemented it as follows (after the model is trained):
for _ in range(10000):
    # get a new batch of data

    # reset the values of W_conv2[:, :, 29, 13] = 0 each time through
    for m in range(3):
        for n in range(3):
            assign_op = W_conv2[m, n, 29, 13].assign(0)
            sess.run(assign_op)

    # re-train the rest of the network
    _, loss_val = sess.run([optimizer, loss], feed_dict={
        dict_stuff_here
    })
    print(loss_val)
The model was started in Keras then moved to TensorFlow since Keras didn't seem to have a mechanism to achieve the desired results. I'm starting to think that TensorFlow doesn't allow for pruning but find this hard to believe; it just needs the correct implementation.
A possible approach is to initialize these specific weights with zeros, and modify the minimization process such that gradients won't be applied to them. It can be done by replacing the call to minimize() with something like:
import numpy as np
import tensorflow as tf

# mask that is 1 everywhere except the kernel we want to freeze
W_conv2_weights = np.ones((3, 3, 32, 32), dtype=np.float32)
W_conv2_weights[:, :, 29, 13] = 0
W_conv2_weights_const = tf.constant(W_conv2_weights)

optimizer = tf.train.RMSPropOptimizer(0.001)

# gradients for W_conv2 only, with the frozen slice zeroed out before applying
W_conv2_orig_grads = tf.gradients(loss, [W_conv2])[0]
W_conv2_grads = tf.multiply(W_conv2_weights_const, W_conv2_orig_grads)
W_conv2_train_op = optimizer.apply_gradients([(W_conv2_grads, W_conv2)])

# ordinary gradients for all remaining variables
rest_grads = tf.gradients(loss, rest_of_vars)
rest_train_op = optimizer.apply_gradients(zip(rest_grads, rest_of_vars))

train_op = tf.group(rest_train_op, W_conv2_train_op)
That is:
Prepare a constant tensor for canceling the appropriate gradients.
Compute gradients only for W_conv2, then multiply element-wise with the constant W_conv2_weights to zero the appropriate gradients and only then apply gradients.
Compute and apply gradients "normally" to the rest of the variables.
Group the 2 train ops to a single training op.
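With the grouped op assigned to train_op as above, a training step then looks like an ordinary one; batch_feed here is an illustrative placeholder for whatever feed the training loop already uses:

_, loss_val = sess.run([train_op, loss], feed_dict=batch_feed)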

tensorflow shape of a tiled tensor

I have a variable a of dimension (1, 5) which I want to 'tile' as many times as the size of my mini-batch. For example, if the mini-batch size is 32 then I want to construct a tensor c of dimension (32, 5) where each row has values the same as the original (1, 5) variable a.
But I only know the mini-batch size at run time: it's the size of dimension 0 of a placeholder b: tf.shape(b)[0]
Here's my code to construct c:
a = tf.Variable(np.random.uniform(size=(1,5)))
b = tf.placeholder(shape=[None, 12], dtype=tf.float32)
batch_size = tf.shape(b)[0]
c = tf.tile(a, tf.pack([batch_size, 1]))
This runs fine. However, c.get_shape() returns (?, ?). I don't understand why this doesn't return (?, 5) instead.
This is causing an issue later in my code when I construct a matrix variable W whose number of columns is c.get_shape()[1], which I expect to be 5 rather than ?.
Any help would be appreciated. Thanks.
[EDIT: This was fixed in a commit to TensorFlow on August 10, 2016.]
This is a known limitation of TensorFlow's shape inference: when the multiples argument to tf.tile() is a computed value (such as the result of tf.pack() here), and its value is not trivially computable at graph construction time (in this case, because it depends on a tf.placeholder(), which has no value until it is fed), the current shape inference will throw its hands up and declare that the shape is unknown (but with the same rank as the input, a).
The current workaround is to use Tensor.set_shape(), which allows you as the programmer to provide additional shape information when you know more than the shape inference does. For example, you could do:
a = tf.Variable(np.random.uniform(size=(1, 5)))
b = tf.placeholder(shape=[None, 12], dtype=tf.float32)
batch_size = tf.shape(b)[0]
c = tf.tile(a, tf.pack([batch_size, 1]))
c.set_shape([None, a.get_shape()[1]]) # or `c.set_shape([None, 5])`
However, we recently added some features that make it possible to propagate partially computed values that may be used as shapes, and this can be adapted to aid the shape function for tf.tile(). I have created a GitHub issue to track this, and I have a fix being tested right now.