Reusing layer weights in Tensorflow

I am using tf.slim to implement an autoencoder. It's fully convolutional with the following architecture:
[conv, outputs = 1] => [conv, outputs = 15] => [conv, outputs = 25] =>
[conv_transpose, outputs = 25] => [conv_transpose, outputs = 15] => [conv_transpose, outputs = 1]
It has to be fully convolutional and I cannot do pooling (limitations of the larger problem). I want to use tied weights, so
encoder_W_3 = decoder_W_1_Transposed
(so the weights of the first decoder layer are the ones of the last encoder layer, transposed).
If I reuse weights the regular way tf.slim lets you reuse them, i.e. setting reuse = True and providing the scope name of the layer you want to reuse, I get a size issue:
ValueError: Trying to share variable cnn_block_3/weights, but specified shape (21, 11, 25, 25) and found shape (21, 11, 15, 25).
This makes sense if you do not transpose the weights of the previous layer. Does anyone have an idea of how I can transpose those weights?
PS: I know this is very abstract and hand-waving, but I am working with a custom api, on top of tfslim, so I can't post code examples here.

Does anyone have an idea on how I can transpose those weights?
Transposition is simple:
new_weights = tf.transpose(weights, perm=[0, 1, 3, 2])
will swap the last two axes.
However, as @Seven mentioned, that wouldn't be enough to address the error, since the total number of weights is different as well.
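For what it's worth, here is a minimal sketch of fetching the encoder kernel yourself and building the tied decoder kernel from it, instead of relying on slim's reuse machinery. The scope and variable names are assumptions taken from the error message, and how the resulting kernel is consumed depends on which op your custom API uses for the decoder:
import tensorflow as tf

# Fetch the last encoder kernel under variable reuse; the scope and variable
# names here ('cnn_block_3', 'weights') are assumptions based on the error message.
with tf.variable_scope('cnn_block_3', reuse=True):
    enc_w = tf.get_variable('weights')            # e.g. shape (21, 11, 15, 25) = (h, w, in, out)

# Tied decoder kernel: swap the channel axes.
dec_w = tf.transpose(enc_w, perm=[0, 1, 3, 2])    # shape (21, 11, 25, 15)

# Caveat: tf.nn.conv2d_transpose expects its filter laid out as
# [h, w, out_channels, in_channels], i.e. the encoder kernel *without* the swap,
# so check which layout the op behind your custom API actually expects.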

Related

How to multiply tensors with different shapes/dimensions?

I have a convolutional autoencoder model. While an autoencoder typically focuses on reconstructing the input without using any label information, I want to use the class label to perform class conditional scaling/shifting after convolutions. I am curious if utilizing the label in this way might help produce better reconstructions.
from tensorflow.keras import layers

num_filters = 32
input_img = layers.Input(shape=(28, 28, 1))  # input image
label = layers.Input(shape=(10,))  # label
# separate scale value for each of the filter dimensions
scale = layers.Dense(num_filters, activation=None)(label)
# conv_0 produces something of shape (None, 14, 14, 32)
conv_0 = layers.Conv2D(num_filters, (3, 3), strides=2, activation=None, padding='same')(input_img)
# TODO: Need help here. Multiply conv_0 by scale along each of the filter dimensions.
# This still outputs something of shape (None, 14, 14, 32)
# Essentially each 14x14x1 slice has its own scalar multiplier
In the example above, the output of the convolutional layer is (14,14,32) and the scale layer is of shape (32,). I want the convolutional output to be multiplied by the corresponding scale value along each filter dimension. For example, if these were numpy arrays I could do something like conv_0[:, :, i] * scale[i] for i in range(32).
I looked at tf.keras.layers.Multiply which can be found here, but based on the documentation I believe that takes in tensors of the same size as input. How do I work around this?
You don't have to loop. Simply do the following by making two tensors broadcast-compatible,
out = layers.Multiply()([conv_0, tf.expand_dims(tf.expand_dims(scale,axis=1), axis=1)])
I don't know if I actually understood what you are trying to achieve, but I did a quick numpy test. I believe it should hold in TensorFlow as well:
import numpy as np

conv_0 = np.ones([14, 14, 32])
scale = np.array([i + 1 for i in range(32)])
result = conv_0 * scale
Check that each channel-wise slice really was scaled element-wise; in this case the slice at index 1 should be multiplied by the element at index 1 of scale, which is 2:
conv_0_slice_1 = conv_0[:, :, 1]
result_slice_1 = result[:, :, 1]
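For reference, here is a self-contained Keras sketch of the class-conditional scaling under the question's shapes (28x28x1 inputs, 10 classes, 32 filters). It uses layers.Reshape instead of tf.expand_dims, but the broadcasting is the same:
import tensorflow as tf
from tensorflow.keras import layers

num_filters = 32
input_img = layers.Input(shape=(28, 28, 1))
label = layers.Input(shape=(10,))

scale = layers.Dense(num_filters, activation=None)(label)                 # (None, 32)
conv_0 = layers.Conv2D(num_filters, (3, 3), strides=2,
                       activation=None, padding='same')(input_img)        # (None, 14, 14, 32)

# Reshape the per-class scale to (None, 1, 1, 32) so it broadcasts over height and width.
scale_b = layers.Reshape((1, 1, num_filters))(scale)
scaled = layers.Multiply()([conv_0, scale_b])                             # (None, 14, 14, 32)

model = tf.keras.Model([input_img, label], scaled)
print(model.output_shape)                                                 # (None, 14, 14, 32)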

TensorflowJS : What is the parameter filters?

I'm learning TensorflowJS and I'm working on CNN.
I'm following this tutorial, and in it you have to set up the first layer like this:
// In the first layer of our convolutional neural network we have
// to specify the input shape. Then we specify some parameters for
// the convolution operation that takes place in this layer.
model.add(tf.layers.conv2d({
  inputShape: [IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS],
  kernelSize: 5,
  filters: 8,
  strides: 1,
  activation: 'relu',
  kernelInitializer: 'varianceScaling'
}));
filters. The number of filter windows of size kernelSize to apply to the input data. Here, we will apply 8 filters to the data.
Despite that short explanation I still don't understand what the filters are :( Can somebody explain it to me?
Thank you.
It's not a strictly correct definition, but perhaps it will help your intuition. Filters are like channels. If you have a 28 x 28 px image that holds RGB colors, you can say the picture has dimensions 28 x 28 x 3, where 3 = [red, green, blue]. If you set filters to 10 (and let's assume the first two dimensions stay the same), you will get a 28 x 28 x 10 output from your original input. It's very useful for feature detection, but it's also expensive to compute.
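To make the effect of filters concrete, here is the same layer configuration in Python/Keras (TF 2.x syntax; the 28x28x1 grayscale input is an assumption based on the MNIST-style tutorial):
import tensorflow as tf

x = tf.random.normal([1, 28, 28, 1])   # one grayscale 28x28 image
conv = tf.keras.layers.Conv2D(filters=8, kernel_size=5, strides=1, activation='relu')
print(conv(x).shape)                   # (1, 24, 24, 8): 8 feature maps, one per filter
Each of the 8 filters slides its own 5x5 kernel over the input and produces one output channel, so filters controls how many feature maps the layer outputs.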

Tensorflow custom Estimator with Dataset API: embedding lookup (feature_column) NMT task

My question is close in nature to Feature Columns Embedding lookup; however, I was unable to comment on the answer given there (not enough rep), and I think the answerer either did not fully understand the question or the answer was not exactly what was asked.
GOAL
To serve a custom Estimator which uses Dataset API to feed in data. The task is NMT (seq2seq).
Issue
Estimator requires feature_columns as input for serving. My NMT task has only one feature, the input sentence to translate (or possibly each word in the sentence is a feature?). And so I am unsure how to build a feature_column (and thus an embedding_column and finally an input_layer) using my input sentences as a feature that can be fed into an RNN (which expects an embedding_lookup of shape [batch_size, max_sequence_len, embedding_dim]), which will finally allow me to serve the Estimator.
Background
I am attempting to utilize a custom estimator to feed a seq2seq style NMT implementation. I need to be able to serve the model via tf-serving, which estimators seem to make relatively easy.
However, I hit a roadblock with 'how' to serve the model. From what I can tell, I need 'feature_columns' that will serve as the input into the model.
https://github.com/MtDersvan/tf_playground/blob/master/wide_and_deep_tutorial/wide_and_deep_basic_serving.md
That shows that you need to have an export_input_fn which uses a feature_spec, which in turn needs feature_column(s) as input. This makes sense; however, for my use case I do not have a bunch of (different) features. Instead I have input sentences (where each word is a feature) that need to be looked up via embeddings and used as features...
So I know I need the input into my model to be feature column(s). My input for NMT is simply a tensor of shape [batch_size, max_sequence_len] which is filled with the indices of the words from the sentences (e.g., for batch_size=1: [3, 17, 132, 2, 1, 0, ...], where each index should map to an embedding vector). Typically I would feed this into an embedding_lookup via
embs = tf.get_variable('embedding', [vocab_size, embedding_dim])
tf.nn.embedding_lookup(embs, inputs)
and I would be good to go, I could feed this to an RNN as inputs and the rest is history, not a problem.
BUT, this is where I hit the issue: I need to use feature_columns (so I can serve the model). The answer given to the question I mentioned at the beginning shows how to use embedding_column, but it suggests that the embedding should look up an entire sentence as one single feature, whereas traditionally you would look up each word in the sentence and get its embedding.
Similarly, https://www.damienpontifex.com/2018/01/02/using-tensorflow-feature-columns-in-your-custom-estimator-model/
shows 'how to implement a feature column in a custom estimator', and indeed his 'before' code is exactly what I wrote out: a tf.get_variable into a tf.nn.embedding_lookup. But his 'after' code, again, only takes in one feature (the entire sentence?).
I have verified this by using their code and feeding my data of shape [batch_size, max_seq_len] into tf.feature_column.categorical_column_with_identity, and the output tensor is [batch_size, embedding_dim].
Is the sequence information lost? Or does it simply get flattened? When I print the output, its size is (?, embedding_dim), where ? is typically my batch_size.
EDIT: I have verified the shape is [batch_size, embedding_dim]; it is not just flattened... So the sequence info is lost.
I'm guessing it must be treating the input as a single input feature (so the batch_size=1 example [3, 17, 132, 2, 1, 0, ...], where each index maps to an embedding vector, would map to one single feature), which is not what is wanted. We want each index to map to an embedding, and the needed output is [batch_size, max_seq_len, embedding_dim].
It sounds like what I need instead is not one categorical_column_with_*, but max_seq_len of them (one for each word in my sequence). Does that sound right? Each word would be a feature for my model, so I am leaning toward this being the correct approach, but it also has issues. I am using the Dataset API, so in my input_train_fn() I load my data from a file and then use tf.data.Dataset.from_tensor_slices(data, labels) to split the data into tensors, which I then batch with dataset.batch(batch_size).make_one_shot_iterator().get_next() and feed into my Estimator. I cannot iterate over each batch (Tensors are not iterable), so I cannot simply make 100 feature_columns for each input batch...
Does anyone have any idea how to do this? This embedding lookup is a very straightforward thing to do with simple placeholders or variables (and a common approach in NLP tasks). But when I venture into Dataset API and Estimators I run into a wall with very little in the way of information (that is not a basic example).
I admit I may have gaps in my understanding, custom estimators and dataset API are new to me and finding information on them can be difficult at times. So feel free to fire off information at me.
Thanks for reading my wall of text and hopefully helping me (and the others I've seen ask a similar question but get no answer: https://groups.google.com/a/tensorflow.org/forum/#!searchin/discuss/embeddings$20in$20custom$20estimator/discuss/U3vFQF_jeaY/EjgwRQ3RDQAJ). I feel bad for that guy; his question was not really answered (for the same reasons outlined here), and his thread got hijacked...
If I understand correctly, you want to use the Estimator API to build a seq2seq model. A good place to start is here; look into the Problems-Solutions/text folder.
To answer your question on how to use the embedding lookup, here is one example:
import tensorflow as tf
from tensorflow.contrib import lookup

# `commons` and `params` are modules/objects defined in the linked repo.
vocab_table = lookup.index_table_from_file(vocabulary_file='data/vocab.csv', num_oov_buckets=1, default_value=-1)
text = features[commons.FEATURE_COL]
words = tf.string_split(text)
dense_words = tf.sparse_tensor_to_dense(words, default_value=commons.PAD_WORD)
word_ids = vocab_table.lookup(dense_words)
padding = tf.constant([[0, 0], [0, commons.MAX_DOCUMENT_LENGTH]])
# Pad all the word_ids entries to the maximum document length
word_ids_padded = tf.pad(word_ids, padding)
word_id_vector = tf.slice(word_ids_padded, [0, 0], [-1, commons.MAX_DOCUMENT_LENGTH])
word_id_vector = {commons.FEATURE_COL: word_id_vector}
bow_column = tf.feature_column.categorical_column_with_identity(commons.FEATURE_COL, num_buckets=params.N_WORDS)
bow_embedding_column = tf.feature_column.embedding_column(bow_column, dimension=50, combiner='sqrtn')
bow = tf.feature_column.input_layer(word_id_vector, feature_columns=[bow_embedding_column])
logits = tf.layers.dense(bow, 2, activation=None)
The above code can be wrapped in an Estimator model_fn. The repo above contains this code; please take a look at it.
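In case it helps, here is a hedged sketch of what that wrapping could look like. build_bow_logits is a hypothetical helper standing in for the snippet above, and the loss/optimizer choices are only placeholders:
import tensorflow as tf

def model_fn(features, labels, mode, params):
    # `build_bow_logits` is a hypothetical wrapper around the snippet above.
    logits = build_bow_logits(features, params)
    predictions = tf.argmax(logits, axis=1)

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn=model_fn)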
So the way I ended up making this work is: I made each word an input feature, then I simply do the wrd_2_idx conversion, pass that in as a feature in numerical_column(s) (you have max_seq_lens of these), and then pass those columns to input_layer. Then in my graph I use these features and look up the embedding as normal. Basically I am circumventing the embedding_column lookup, since I can't figure out how to make it act the way I want. This is probably not optimal, but it works and trains...
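A rough sketch of that workaround, with placeholder sizes (MAX_SEQ_LEN, VOCAB_SIZE, EMBED_DIM and the 'word_###' feature names are all assumptions):
import tensorflow as tf

MAX_SEQ_LEN = 100
VOCAB_SIZE = 10000
EMBED_DIM = 128

# One numeric column per word position. Names are zero-padded because
# input_layer orders columns by name, and that ordering must match the sequence.
feature_columns = [
    tf.feature_column.numeric_column('word_%03d' % i, dtype=tf.int64)
    for i in range(MAX_SEQ_LEN)
]

def model_fn(features, labels, mode, params):
    # input_layer concatenates the columns into a float [batch_size, MAX_SEQ_LEN]
    # tensor, so cast back to integer ids before the lookup.
    word_ids = tf.cast(
        tf.feature_column.input_layer(features, feature_columns), tf.int32)

    # Ordinary embedding lookup, bypassing embedding_column entirely.
    embs = tf.get_variable('embedding', [VOCAB_SIZE, EMBED_DIM])
    inputs = tf.nn.embedding_lookup(embs, word_ids)  # [batch_size, MAX_SEQ_LEN, EMBED_DIM]
    # ... feed `inputs` into the encoder RNN as usual ...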
I'll leave this as the accepted answer and hope sometime in the future either I figure out a better way to do it, or someone else can enlighten me to the best way to approach this.
I managed to get this working ... also got derailed by the fact that the RNN did not consume an embedding.
What I did to get this working (in the simplest case):
import tensorflow as tf

# features[VALUES_FEATURE_NAME] has shape (?, 200), i.e. 200 words per row
inputs = tf.contrib.layers.embed_sequence(
    features[VALUES_FEATURE_NAME], 3000, 5,
)
# create an LSTM cell of size 200
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(200)
# run the complete LSTM over the sequence
_, final_states = tf.nn.dynamic_rnn(
    lstm_cell, inputs, dtype=tf.float32)
outputs = final_states.h
So I guess the answer lies in the TensorFlow docs for dynamic_rnn:
Creates a recurrent neural network specified by RNNCell cell.
Performs fully dynamic unrolling of inputs.
So the unrolling here means that the RNN consumes [batch, time_steps, values] as its input, as the quick shape check below illustrates.
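A quick shape check of that pipeline with made-up sizes (4 rows of 200 word ids, the same vocab/embedding sizes as above):
import tensorflow as tf

word_ids = tf.zeros([4, 200], dtype=tf.int32)                 # [batch, time_steps]
inputs = tf.contrib.layers.embed_sequence(word_ids, vocab_size=3000, embed_dim=5)
print(inputs.shape)                                           # (4, 200, 5) = [batch, time_steps, values]

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(200)
_, final_states = tf.nn.dynamic_rnn(lstm_cell, inputs, dtype=tf.float32)
print(final_states.h.shape)                                   # (4, 200) = [batch, cell_size]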
Best
You can use tf.contrib.feature_column.sequence_categorical_column_with_vocabulary_list and tf.contrib.feature_column.sequence_input_layer to solve it.
The demo code is as follows:
import tensorflow as tf

if __name__ == "__main__":
    # tf.enable_eager_execution()
    feature = {
        'aa': [['1', '2', '3'],
               ['-1', '4', '-1'],
               ['2', '-1', '-1'],
               ['4', '5', '6']]
    }
    # `feature1` was omitted in the original post; this is a hypothetical
    # single-step batch that matches the printed shapes below.
    feature1 = {
        'aa': [['1'], ['2'], ['4']]
    }
    aa_id = tf.contrib.feature_column.sequence_categorical_column_with_vocabulary_list(
        'aa', ['1', '2', '3', '4', '5']
    )
    seq_emb_matrix = tf.feature_column.embedding_column(aa_id, 2)
    seq_tensor, seq_length = tf.contrib.feature_column.sequence_input_layer(feature, [seq_emb_matrix])
    seq_tensor1, seq_length1 = tf.contrib.feature_column.sequence_input_layer(feature1, [seq_emb_matrix])
    seq_tensor2 = tf.squeeze(seq_tensor1)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.tables_initializer())
        a, a_len = sess.run([seq_tensor, seq_length])
        b, b_len = sess.run([seq_tensor1, seq_length1])
        print(a)
        print('a_len', a_len)
        print(a.shape)
        print('-' * 50)
        print(b)
        print('b_len', b_len)
        print(b.shape)
        print(sess.run([seq_tensor2]))
The printed results are as follows:
[[[ 0.5333682 -0.39895234]
[ 0.5335079 0.64998794]
[-1.0432893 -0.8391434 ]]
[[ 0. 0. ]
[-0.29623085 -0.17570129]
[ 0. 0. ]]
[[ 0.5335079 0.64998794]
[ 0. 0. ]
[ 0. 0. ]]
[[-0.29623085 -0.17570129]
[ 0.7100604 0.9935588 ]
[ 0. 0. ]]]
('a_len', array([3, 3, 3, 3]))
(4, 3, 2)
--------------------------------------------------
[[[-0.24147142 -0.37740025]]
[[-0.6222648 1.3862932 ]]
[[ 1.2226609 -0.68292266]]]
('b_len', array([1, 1, 1]))
(3, 1, 2)
[array([[-0.24147142, -0.37740025],
[-0.6222648 , 1.3862932 ],
[ 1.2226609 , -0.68292266]], dtype=float32)]

How to freeze/lock weights of one TensorFlow variable (e.g., one CNN kernel of one layer)

I have a TensorFlow CNN model that is performing well and we would like to implement this model in hardware; i.e., an FPGA. It's a relatively small network but it would be ideal if it were smaller. With that goal, I've examined the kernels and find that there are some where the weights are quite strong and there are others that aren't doing much at all (the kernel values are all close to zero). This occurs specifically in layer 2, corresponding to the tf.Variable() named, "W_conv2". W_conv2 has shape [3, 3, 32, 32]. I would like to freeze/lock the values of W_conv2[:, :, 29, 13] and set them to zero so that the rest of the network can be trained to compensate. Setting the values of this kernel to zero effectively removes/prunes the kernel from the hardware implementation thus achieving the goal stated above.
I have found similar questions with suggestions that generally revolve around one of two approaches:
Suggestion #1:
tf.Variable(some_initial_value, trainable = False)
Implementing this suggestion freezes the entire variable. I want to freeze just a slice, specifically W_conv2[:, :, 29, 13].
Suggestion #2:
Optimizer = tf.train.RMSPropOptimizer(0.001).minimize(loss, var_list)
Again, implementing this suggestion does not allow the use of slices. For instance, if I try the inverse of my stated goal (optimize only a single kernel of a single variable) as follows:
Optimizer = tf.train.RMSPropOptimizer(0.001).minimize(loss, var_list = W_conv2[:, :, 0, 0])
I get the following error:
NotImplementedError: ('Trying to optimize unsupported type ', <tf.Tensor 'strided_slice_2228:0' shape=(3, 3) dtype=float32>)
Slicing tf.Variables() isn't possible in the way that I've tried it here. The only thing that I've tried which comes close to doing what I want is using .assign() but this is extremely inefficient, cumbersome, and caveman-like as I've implemented it as follows (after the model is trained):
for _ in range(10000):
    # get a new batch of data
    # reset the values of W_conv2[:, :, 29, 13] to 0 each time through
    for m in range(3):
        for n in range(3):
            assign_op = W_conv2[m, n, 29, 13].assign(0)
            sess.run(assign_op)
    # re-train the rest of the network
    _, loss_val = sess.run([optimizer, loss], feed_dict = {
        dict_stuff_here
    })
    print(loss_val)
The model was started in Keras then moved to TensorFlow since Keras didn't seem to have a mechanism to achieve the desired results. I'm starting to think that TensorFlow doesn't allow for pruning but find this hard to believe; it just needs the correct implementation.
A possible approach is to initialize these specific weights with zeros, and modify the minimization process such that gradients won't be applied to them. It can be done by replacing the call to minimize() with something like:
import numpy as np
import tensorflow as tf

# Mask that zeroes the gradient of the pruned kernel W_conv2[:, :, 29, 13]
W_conv2_weights = np.ones((3, 3, 32, 32), dtype=np.float32)
W_conv2_weights[:, :, 29, 13] = 0
W_conv2_weights_const = tf.constant(W_conv2_weights)

optimizer = tf.train.RMSPropOptimizer(0.001)
# Gradient for W_conv2 only, masked element-wise before it is applied
W_conv2_orig_grads = tf.gradients(loss, [W_conv2])[0]
W_conv2_grads = tf.multiply(W_conv2_weights_const, W_conv2_orig_grads)
W_conv2_train_op = optimizer.apply_gradients([(W_conv2_grads, W_conv2)])

# Ordinary gradients for the rest of the variables
rest_grads = tf.gradients(loss, rest_of_vars)
rest_train_op = optimizer.apply_gradients(zip(rest_grads, rest_of_vars))

train_op = tf.group([rest_train_op, W_conv2_train_op])
That is:
Prepare a constant tensor for canceling the appropriate gradients.
Compute gradients only for W_conv2, then multiply them element-wise with the constant W_conv2_weights to zero out the appropriate gradients, and only then apply them.
Compute and apply gradients "normally" to the rest of the variables.
Group the two train ops into a single training op.
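For usage, here is a minimal training-loop sketch with the grouped train_op from the snippet above (next_batch_feed is a hypothetical helper that builds the feed dict, as in the question's own loop):
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(10000):
        # Every variable gets updated, but the masked gradient keeps
        # W_conv2[:, :, 29, 13] frozen at its zero initialization.
        _, loss_val = sess.run([train_op, loss], feed_dict=next_batch_feed())
        if step % 100 == 0:
            print(step, loss_val)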

tensorflow shape of a tiled tensor

I have a variable a of dimension (1, 5) which I want to 'tile' as many times as the size of my mini-batch. For example, if the mini-batch size is 32 then I want to construct a tensor c of dimension (32, 5) where each row has values the same as the original (1, 5) variable a.
But I only know the mini-batch size at run time: it's the size of dimension 0 of a placeholder b: tf.shape(b)[0]
Here's my code to construct c:
import numpy as np
import tensorflow as tf

a = tf.Variable(np.random.uniform(size=(1,5)))
b = tf.placeholder(shape=[None, 12], dtype=tf.float32)
batch_size = tf.shape(b)[0]
c = tf.tile(a, tf.pack([batch_size, 1]))
This runs fine. However, c.get_shape() returns (?, ?). I don't understand why this doesn't return (?, 5) instead.
This is causing an issue later in my code when I construct a matrix variable W with number of columns c.get_shape()[1] which I expect to return 5 rather than ?.
Any help would be appreciated. Thanks.
[EDIT: This was fixed in a commit to TensorFlow on August 10, 2016.]
This is a known limitation of TensorFlow's shape inference: when the multiples argument to tf.tile() is a computed value (such as the result of tf.pack() here), and its value is not trivially computable at graph construction time (in this case, because it depends on a tf.placeholder(), which has no value until it is fed), the current shape inference will throw its hands up and declare that the shape is unknown (but with the same rank as the input, a).
The current workaround is to use Tensor.set_shape(), which allows you as the programmer to provide additional shape information when you know more than the shape inference does. For example, you could do:
a = tf.Variable(np.random.uniform(size=(1, 5)))
b = tf.placeholder(shape=[None, 12], dtype=tf.float32)
batch_size = tf.shape(b)[0]
c = tf.tile(a, tf.pack([batch_size, 1]))
c.set_shape([None, a.get_shape()[1]]) # or `c.set_shape([None, 5])`
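After the set_shape() call, downstream code can size variables from the static column count; a quick check continuing the snippet above (W here is just a hypothetical example):
print(c.get_shape())                      # (?, 5)

num_cols = c.get_shape()[1].value         # 5, known at graph construction time
W = tf.Variable(tf.zeros([num_cols, 7]))  # hypothetical weight matrix with 7 output columns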
However, we recently added some features that make it possible to propagate partially computed values that may be used as shapes, and this can be adapted to aid the shape function for tf.tile(). I have created a GitHub issue to track this, and I have a fix being tested right now.