Retrain tensorflow model for additional categories - tensorflow

I'm building a program using TensorFlow image classification. I got TensorFlow from GitHub, and pretty much all I know is how to run classify_image.py!
What I want to do is have an option to train the model in a simple manner. For example, the model knows "keys", but I want to train it for "HouseKeys" which have a fancy keyfob or something. Is there some sort of script I can use to say "take these 20 images and learn HouseKeys" so the model can distinguish "keys" from "HouseKeys"?
Excuse my noobness, and thank you in advance!
Edit: Obviously, it is very important that the model keeps its knowledge of all the other categories it knew previously, since being able to recognize only "HouseKeys" is absolutely useless.

You can do this, but it will probably need some adjustments.
I don't know exactly which script you are referring to, but I'm going to assume you have at least two Python files: one is the actual neural network and the other handles training and evaluation.
The first thing you need to do is make sure the neural network can handle new classes. Look for something like this:
input_y = tf.placeholder(tf.float32, [None, classes], name="input_y")
A lot of the time, if you see tensors whose name contains x (input_x for example), they refer to the data, the training input.
Tensors that have y in their name, like the example above, usually refer to the labels.
The code above says input_y is a tensor (think array for the moment) of type float32, of variable length (None from [None, classes]) but with each element of dimension classes.
If classes was 3, input_y could look like this:
[[0, 0, 1], [1, 0, 0], [0, 1, 0]]
Just as well, it could look like this:
[[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
Although the length can vary, each element must always have size 3 (classes).
As for the meaning, [0, 0, 1] for example means this is a label for class 2, because we have a 1 at index 2 (look up one-hot notation).
The point of this is that a neural network with this sort of input can learn up to 3 classes. Each input in the tensor x has an associated label in the tensor y, and the labels in y can be either 0, 1 or 2 in one-hot notation.
With something like this, you can learn for example "keys", "HouseKeys" and "CarKeys" but you will not be able to add "OfficeKeys" for example.
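For example, a minimal sketch of sizing the label placeholder for every class you ever intend to learn (the image shape and class count here are made-up assumptions, not from the question):
import tensorflow as tf

MAX_CLASSES = 4  # hypothetical: "keys", "HouseKeys", "CarKeys", "OfficeKeys"

# Hypothetical image input; the shape depends on your actual network
input_x = tf.placeholder(tf.float32, [None, 224, 224, 3], name="input_x")
# Label placeholder sized for the maximum number of classes
input_y = tf.placeholder(tf.float32, [None, MAX_CLASSES], name="input_y")

# A data set that currently only contains "keys" and "HouseKeys" still uses
# 4-wide one-hot labels, e.g. [1, 0, 0, 0] and [0, 1, 0, 0].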
So, the first step is to make sure your network can learn the maximum number of labels you want.
It does not have to learn them all at once. This brings us to point 2:
Take a look here. This is the documentation for the Tensorflow Saver class. This will allow you to save and load models.
For your problem, this translates into training the model on a 2-class data set, saving it, generating a 3-class data set, loading the previously saved model, and training on the new data set. It will have the same "knowledge" (weights) as the model you saved, but it will start to adjust them to fit the third class.
But for this, you will need to make sure the network can, from the beginning, handle 3 classes.
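A rough sketch of that save-then-retrain workflow (the checkpoint path is made up, and the single dummy variable stands in for your network's actual weights):
import tensorflow as tf

# The graph must already be sized for the maximum number of classes (see above).
weights = tf.get_variable("weights", shape=[10, 3])
saver = tf.train.Saver()

# Phase 1: train on the 2-class data set, then save.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... run your training ops on the 2-class data here ...
    saver.save(sess, "./keys_model.ckpt")  # hypothetical checkpoint path

# Phase 2: restore the same weights and continue training on the 3-class data set.
with tf.Session() as sess:
    saver.restore(sess, "./keys_model.ckpt")
    # ... run your training ops on the 3-class data here ...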
Hope this helps!

Related

Tensorflow difference between tf.losses.softmax and tf.nn.softmax

What is the difference between tf.nn.softmax_cross_entropy_with_logits and tf.losses.softmax_cross_entropy, and when should you use which function?
Been a while since I last used Tensorflow, but I'm pretty sure the latter is a wrapper for the former.
The tf.losses.softmax_cross_entropy simply uses tf.nn.softmax_cross_entropy_with_logits for calculating the loss, while allowing you to use extra features, like class weights, label smoothing etc.
tf.losses.softmax_cross_entropy also adds the loss to the tf.GraphKeys.LOSSES collection, so that you don't need to "route" it all the way to the top; you can simply collect your losses:
losses = tf.get_collection(tf.GraphKeys.LOSSES)
loss = tf.reduce_sum(losses)
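A minimal sketch with made-up logits and labels, showing the extra features and the automatic collection registration described above:
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
onehot_labels = tf.constant([[0.0, 0.0, 1.0]])

# Registered in tf.GraphKeys.LOSSES automatically; class weights and label
# smoothing are available as extra arguments.
loss = tf.losses.softmax_cross_entropy(onehot_labels, logits,
                                       weights=1.0, label_smoothing=0.1)

# The same tensor can now be retrieved via the collection shown above,
# or summed up with tf.losses.get_total_loss().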
I personally liked being able to personalize my loss functions, so I used the "nn" one, but if you are not going to do anything outside the box, use tf.losses.
I also like to use sparse_softmax_cross_entropy, since most often I was working with mutually exclusive classes, and this way I could avoid converting the labels to one hot encoding myself.
The difference is that for tf.nn.softmax_cross_entropy_with_logits you need to use one-hot encoding for your labels, while for tf.nn.sparse_softmax_cross_entropy_with_logits you can use integers to specify which class a label belongs to, i.e. the difference is
labels = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
vs
labels = [2, 0, 1]
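To make that concrete, a small sketch with made-up logits; both calls exist under tf.nn, and only the label format differs:
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3],
                      [0.1, 0.2, 3.0]])

# One-hot labels for tf.nn.softmax_cross_entropy_with_logits
onehot = tf.constant([[0.0, 0.0, 1.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
dense_loss = tf.nn.softmax_cross_entropy_with_logits(labels=onehot, logits=logits)

# Integer class ids for the sparse variant
sparse = tf.constant([2, 0, 1])
sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=sparse, logits=logits)
# Both produce one loss value per row; dense_loss equals sparse_loss here.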
Finally, if you are not planning on doing much low-level stuff, I recommend you take a look at Keras.

Tensorflow custom Estimator with Dataset API: embedding lookup (feature_column) NMT task

My question is close in nature to Feature Columns Embedding lookup; however, I was unable to comment on the answer given there (not enough rep), and I think the answerer either did not fully understand the question or the answer was not exactly what was asked.
GOAL
To serve a custom Estimator which uses Dataset API to feed in data. The task is NMT (seq2seq).
Issue
Estimator requires feature_columns as input for serving. My NMT task has only one feature, the input sentence to translate (or possibly each word in the sentence is a feature?). And so I am unsure how to build a feature_column (and thus an embedding_column and finally an input_layer) using my input sentences as a feature that can be fed into an RNN (which expects an embedding lookup of shape [batch_size, max_sequence_len, embedding_dim]), which will finally allow me to serve the Estimator.
Background
I am attempting to utilize a custom estimator to feed a seq2seq style NMT implementation. I need to be able to serve the model via tf-serving, which estimators seem to make relatively easy.
However I hit a road block with 'how' to serve the model. From what I can tell I need 'feature_columns' that will serve as the input into the model.
https://github.com/MtDersvan/tf_playground/blob/master/wide_and_deep_tutorial/wide_and_deep_basic_serving.md
Shows that you need to have an export_input_fn which uses a feature_spec which needs a feature_column(s) as input. This makes sense, however, for my use case I do not have a bunch of (different) features, instead I have input sentences (where each word is a feature) that need to be looked up via embeddings and used as features...
So I know I need the input into my model to be feature column(s). My input for NMT is simply a tensor of [batch_size, max_sequence_len] which is filled with the indices of the words from the sentences (e.g., for batch_size=1 [3, 17, 132, 2, 1, 0, ...] where each index should map to an embedding vector). Typically I would feed this into a embedding_lookup via
embs = tf.get_variable('embedding', [vocab_size, embedding_dim])
tf.nn.embedding_lookup(embs, inputs)
and I would be good to go, I could feed this to an RNN as inputs and the rest is history, not a problem.
BUT, this is where I hit the issue, I need to use feature_columns (so I can serve the model). The answer given to the question I mentioned at the beginning of this shows how to use embedding_column, but instead he is suggesting that embedding should look up an entire sentence as one single feature, but traditionally you would look up each word in the sentence and get its embedding.
Similarly, https://www.damienpontifex.com/2018/01/02/using-tensorflow-feature-columns-in-your-custom-estimator-model/
Shows 'how to implement a feature-column in a custom estimator' and indeed his 'Before' code is exactly right (as I wrote out), a tf.get_variable into a tf.nn.embedding_lookup, but his 'after' code, again, only takes in 1 feature (the entire sentence?).
I have verified this by using their code and feeding my data in [batch_size, max_seq_len] to the tf.feature_column.categorical_column_with_identity, and the output tensor is [batch_size, embedding_dim]
Is the sequence information lost, or does it simply get flattened? When I print the output, its size is (?, embedding_dim), where ? is typically my batch_size.
EDIT: I have verified the shape is [batch_size, embedding_dim]; it is not just flattened... so the sequence info is lost.
I'm guessing it must be treating the input as one single input feature (so the batch_size=1 example [3, 17, 132, 2, 1, 0, ...], where each index should map to an embedding vector, maps to a single feature), which is not what is wanted; we want each index to map to an embedding, and the needed output is [batch_size, max_seq_len, embedding_dim].
It sounds like what I need instead is not one categorical_column_with_*, but max_seq_len of them (one for each word in my sequence). Does this sound right? Each word would be a feature for my model, so I am leaning toward this being the correct approach, but this also has issues. I am using the Dataset API, so in my input_train_fn() I load my data from a file and then use tf.data.Dataset.from_tensor_slices(data, labels) to split the data into tensors, which I can then dataset.batch(batch_size).make_one_shot_iterator().get_next() to feed into my Estimator. I cannot iterate over each batch (Tensors are not iterable), so I cannot simply make 100 feature_columns for each input batch...
Does anyone have any idea how to do this? This embedding lookup is a very straightforward thing to do with simple placeholders or variables (and a common approach in NLP tasks). But when I venture into Dataset API and Estimators I run into a wall with very little in the way of information (that is not a basic example).
I admit I may have gaps in my understanding, custom estimators and dataset API are new to me and finding information on them can be difficult at times. So feel free to fire off information at me.
Thanks for reading my wall of text and hopefully helping me (and the others I've seen ask a question similar to this but get no answer: https://groups.google.com/a/tensorflow.org/forum/#!searchin/discuss/embeddings$20in$20custom$20estimator/discuss/U3vFQF_jeaY/EjgwRQ3RDQAJ). I feel bad for this guy; his question was not really answered (for the same reason outlined here), and his thread got hijacked...
If I understand correctly, you want to use the Estimator API to build a seq2seq model. A good place to start is here; look into the Problems-Solutions/text folder.
To answer your question on how to use embedding lookup, here is one example:
vocab_table = lookup.index_table_from_file(vocabulary_file='data/vocab.csv', num_oov_buckets=1, default_value=-1)
text = features[commons.FEATURE_COL]
# Split each sentence into words and densify the resulting sparse tensor
words = tf.string_split(text)
dense_words = tf.sparse_tensor_to_dense(words, default_value=commons.PAD_WORD)
# Map words to integer ids via the vocabulary table
word_ids = vocab_table.lookup(dense_words)
padding = tf.constant([[0, 0], [0, commons.MAX_DOCUMENT_LENGTH]])
# Pad all the word_ids entries to the maximum document length
word_ids_padded = tf.pad(word_ids, padding)
word_id_vector = tf.slice(word_ids_padded, [0, 0], [-1, commons.MAX_DOCUMENT_LENGTH])
word_id_vector = {commons.FEATURE_COL: word_id_vector}
# Treat the ids as a categorical column, embed them, and combine ('sqrtn') into one vector per example
bow_column = tf.feature_column.categorical_column_with_identity(commons.FEATURE_COL, num_buckets=params.N_WORDS)
bow_embedding_column = tf.feature_column.embedding_column(bow_column, dimension=50, combiner='sqrtn')
bow = tf.feature_column.input_layer(word_id_vector, feature_columns=[bow_embedding_column])
logits = tf.layers.dense(bow, 2, activation=None)
The above code can be wrapped in an Estimator model_fn. The above repo contains this code. Please take a look at it.
So the way I ended up making this work is: I made each word an input feature, then I simply do the wrd_2_idx conversion, pass that in as a feature in numerical_columns (you have max_seq_len of these), and then pass those columns to input_layer. Then, in my graph, I use these features and look up the embedding as normal. Basically I am circumventing the embedding_column lookup, since I can't figure out how to make it act the way I want. This is probably not optimal, but it works and trains...
I'll leave this as the accepted answer and hope that sometime in the future either I figure out a better way to do it, or someone else can enlighten me as to the best way to approach this.
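A rough sketch of that workaround under stated assumptions (MAX_SEQ_LEN, VOCAB_SIZE, EMBED_DIM and the per-position feature keys are hypothetical; the cast back to integers reflects that input_layer emits floats):
import tensorflow as tf

MAX_SEQ_LEN = 100   # hypothetical
VOCAB_SIZE = 10000  # hypothetical
EMBED_DIM = 128     # hypothetical

# One numeric column per position of the already integer-encoded sentence;
# zero-padded names keep input_layer's name-sorted order aligned with position.
word_columns = [
    tf.feature_column.numeric_column('word_%03d' % i, dtype=tf.int64)
    for i in range(MAX_SEQ_LEN)
]

def model_fn_fragment(features):
    # Rebuild the [batch_size, MAX_SEQ_LEN] id matrix from the columns
    word_ids = tf.cast(
        tf.feature_column.input_layer(features, word_columns), tf.int64)
    # Ordinary lookup inside the graph, circumventing embedding_column
    embeddings = tf.get_variable('embedding', [VOCAB_SIZE, EMBED_DIM])
    return tf.nn.embedding_lookup(embeddings, word_ids)  # [batch, seq, dim]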
I managed to get this working ... also got derailed by the fact that the RNN did not consume an embedding.
What I did to get this working (in the simplest case):
# features[VALUES_FEATURE_NAME] is shape (?, 200), i.e. 200 words per row
inputs = tf.contrib.layers.embed_sequence(
    features[VALUES_FEATURE_NAME], vocab_size=3000, embed_dim=5)
# create an LSTM cell of size 200
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(200)
# run the LSTM over the whole sequence
_, final_states = tf.nn.dynamic_rnn(
    lstm_cell, inputs, dtype=tf.float32)
outputs = final_states.h
So I guess the answer lies in the TensorFlow docs for dynamic_rnn:
Creates a recurrent neural network specified by RNNCell cell.
Performs fully dynamic unrolling of inputs.
So the unrolling here means that the RNN consumes [batch,time_steps,values] as an input.
Bests
You can use tf.contrib.feature_column.sequence_categorical_column_with_vocabulary_list and tf.contrib.feature_column.sequence_input_layer to solve it.
The demo code is as follows:
import tensorflow as tf

if __name__ == "__main__":
    # tf.enable_eager_execution()
    feature = {
        'aa': [['1', '2', '3'],
               ['-1', '4', '-1'],
               ['2', '-1', '-1'],
               ['4', '5', '6']]
    }
    # Note: feature1 was missing from the original snippet; this is a plausible
    # stand-in that reproduces the (3, 1, 2) shape printed below.
    feature1 = {
        'aa': [['1'], ['2'], ['4']]
    }
    aa_id = tf.contrib.feature_column.sequence_categorical_column_with_vocabulary_list(
        'aa', ['1', '2', '3', '4', '5'])
    seq_emb_matrix = tf.feature_column.embedding_column(aa_id, 2)
    seq_tensor, seq_length = tf.contrib.feature_column.sequence_input_layer(feature, [seq_emb_matrix])
    seq_tensor1, seq_length1 = tf.contrib.feature_column.sequence_input_layer(feature1, [seq_emb_matrix])
    seq_tensor2 = tf.squeeze(seq_tensor1)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.tables_initializer())
        a, a_len = sess.run([seq_tensor, seq_length])
        b, b_len = sess.run([seq_tensor1, seq_length1])
        print(a)
        print('a_len', a_len)
        print(a.shape)
        print('-' * 50)
        print(b)
        print('b_len', b_len)
        print(b.shape)
        print(sess.run([seq_tensor2]))
The printed results are as follows:
[[[ 0.5333682 -0.39895234]
[ 0.5335079 0.64998794]
[-1.0432893 -0.8391434 ]]
[[ 0. 0. ]
[-0.29623085 -0.17570129]
[ 0. 0. ]]
[[ 0.5335079 0.64998794]
[ 0. 0. ]
[ 0. 0. ]]
[[-0.29623085 -0.17570129]
[ 0.7100604 0.9935588 ]
[ 0. 0. ]]]
('a_len', array([3, 3, 3, 3]))
(4, 3, 2)
--------------------------------------------------
[[[-0.24147142 -0.37740025]]
[[-0.6222648 1.3862932 ]]
[[ 1.2226609 -0.68292266]]]
('b_len', array([1, 1, 1]))
(3, 1, 2)
[array([[-0.24147142, -0.37740025],
[-0.6222648 , 1.3862932 ],
[ 1.2226609 , -0.68292266]], dtype=float32)]

What are the parameters of TensorFlow's dynamic_rnn for this simple data set?

I want to train an RNN language model using TensorFlow.
My training data is a sequence of 5 tokens represented with integers like so
x = [0, 1, 2, 3, 4]
I want the unrolled length of the RNN to be 4, and the training batch size to be 2. (I chose these values in order to require padding.)
Each token has an embedding of length 3 like so
0 -> [0, 0 ,0]
1 -> [10, 10, 10]
2 -> [20, 20, 20]
3 -> [30, 30, 30]
4 -> [40, 40, 40]
What should I pass as parameters to tf.nn.dynamic_rnn?
This is mostly a repost of "How is the input tensor for TensorFlow's tf.nn.dynamic_rnn operator structured?".
That was helpfully answered by Eugene Brevdo. However he slightly misunderstood my question because I didn't have enough TensorFlow knowledge to ask it clearly. (Specifically he thought I meant the batch size to be 1.) Rather than risk additional confusion by editing the original question, I think it is clearest if I just rephrase it here.
I'm trying to figure this out for myself by writing an Example TensorFlow RNN Language Model.
Most RNN cells require floating-point inputs, so you should first do an embedding lookup on your integer tensor to go from the categorical values to floating-point vectors in your dictionary/embedding. I believe the function is tf.nn.embedding_lookup. The output of that should be a 3-tensor shaped batch x time x embedding_depth (in your case, embedding depth is 3).
You can feed embedding_lookup an integer tensor shaped batch_size x time.
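A sketch that puts this together for the numbers in the question (how the five tokens are split into two length-4 rows, and the cell size, are assumptions):
import tensorflow as tf

# x = [0, 1, 2, 3, 4] split into batch_size=2 rows of max_time=4 (second row padded)
ids = tf.constant([[0, 1, 2, 3],
                   [4, 0, 0, 0]], dtype=tf.int32)

# The embedding table from the question: token i -> [10*i, 10*i, 10*i], depth e=3
embeddings = tf.constant([[0., 0., 0.],
                          [10., 10., 10.],
                          [20., 20., 20.],
                          [30., 30., 30.],
                          [40., 40., 40.]])

inputs = tf.nn.embedding_lookup(embeddings, ids)  # shape [2, 4, 3]

cell = tf.nn.rnn_cell.BasicRNNCell(num_units=8)   # hypothetical cell size
outputs, state = tf.nn.dynamic_rnn(
    cell, inputs,
    sequence_length=[4, 1],  # true lengths before padding
    dtype=tf.float32)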

How is the input tensor for TensorFlow's tf.nn.dynamic_rnn operator structured?

I am trying to write a language model using word embeddings and recurrent neural networks in TensorFlow 0.9.0 using the tf.nn.dynamic_rnn graph operation, but I don't understand how the input tensor is structured.
Let's say I have a corpus of n words. I embed each word in a vector of length e, and I want my RNN to unroll to t time steps. Assuming I use the default time_major = False parameter, what shape would my input tensor [batch_size, max_time, input_size] have?
Maybe a specific tiny example will make this question clearer. Say I have a corpus consisting of n=8 words that looks like this.
1, 2, 3, 3, 2, 1, 1, 2
Say I embed it in a vector of size e=3 with the embeddings 1 -> [10, 10, 10], 2 -> [20, 20, 20], and 3 -> [30, 30, 30], what would my input tensor look like?
I've read the TensorFlow Recurrent Neural Network tutorial, but that doesn't use tf.nn.dynamic_rnn. I've also read the documentation for tf.nn.dynamic_rnn, but find it confusing. In particular I'm not sure what "max_time" and "input_size" mean here.
Can anyone give the shape of the input tensor in terms of n, t, and e, and/or an example of what that tensor would look like initialized with data from the small corpus I describe?
TensorFlow 0.9.0, Python 3.5.1, OS X 10.11.5
In your case, it looks like batch_size = 1, since you're looking at a single example. So max_time is n=8 and input_size is the input depth, in your case e=3. So you would want to construct an input tensor which is shaped [1, 8, 3]. It's batch_major, so the first dimension (the batch dimension) is 1. If, say, you had another input at the same time, with n=6 words, then you would combine the two by padding this second example to 8 words (by padding zeros for the last 2 word embeddings) and you would have an inputs size of [2, 8, 3].
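A minimal sketch for the corpus above (the cell size is an arbitrary assumption; index 0 of the embedding table is unused here):
import tensorflow as tf

# Embeddings from the question: 1 -> [10,10,10], 2 -> [20,20,20], 3 -> [30,30,30]
embeddings = tf.constant([[ 0.,  0.,  0.],   # index 0, unused / padding
                          [10., 10., 10.],
                          [20., 20., 20.],
                          [30., 30., 30.]])

corpus = tf.constant([[1, 2, 3, 3, 2, 1, 1, 2]])     # [batch_size=1, max_time=8]
inputs = tf.nn.embedding_lookup(embeddings, corpus)  # [1, 8, 3]: batch x time x e

cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=4)     # hypothetical cell size
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)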

TensorFlow Embedding Lookup

I am trying to learn how to build RNN for Speech Recognition using TensorFlow. As a start, I wanted to try out some example models put up on TensorFlow page TF-RNN
As per what was advised, I had taken some time to understand how word IDs are embedded into a dense representation (Vector Representation) by working through the basic version of word2vec model code. I had an understanding of what tf.nn.embedding_lookup actually does, until I actually encountered the same function being used with two dimensional array in TF-RNN ptb_word_lm.py, when it did not make sense any more.
What I thought tf.nn.embedding_lookup does:
Given a 2-d array params, and a 1-d array ids, function tf.nn.embedding_lookup fetches rows from params, corresponding to the indices given in ids, which holds with the dimension of output it is returning.
What I am confused about:
When tried with the same params and a 2-d array of ids, tf.nn.embedding_lookup returns a 3-d array instead of a 2-d one, and I do not understand why.
I looked up the manual for Embedding Lookup, but I still find it difficult to understand how the partitioning works and what result is returned. I recently tried some simple examples with tf.nn.embedding_lookup, and it appears that it returns different values each time. Is this behaviour due to the randomness involved in partitioning?
Please help me understand how tf.nn.embedding_lookup works, and why it is used in both word2vec_basic.py and ptb_word_lm.py, i.e., what is the purpose of even using it?
There is already an answer here on what tf.nn.embedding_lookup does.
When tried with the same params and a 2-d array of ids, tf.nn.embedding_lookup returns a 3-d array instead of a 2-d one, and I do not understand why.
When you had a 1-D list of ids [0, 1], the function would return a list of embeddings [embedding_0, embedding_1] where embedding_0 is an array of shape embedding_size. For instance the list of ids could be a batch of words.
Now, you have a matrix of ids, or a list of list of ids. For instance, you now have a batch of sentences, i.e. a batch of list of words, i.e. a list of list of words.
If your list of sentences is: [[0, 1], [0, 3]] (sentence 1 is [0, 1], sentence 2 is [0, 3]), the function will compute a matrix of embeddings, which will be of shape [2, 2, embedding_size] and will look like:
[[embedding_0, embedding_1],
[embedding_0, embedding_3]]
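A small sketch of both cases (the embedding values are made up; only the shapes matter):
import tensorflow as tf

params = tf.constant([[1., 1.],    # embedding_0
                      [2., 2.],    # embedding_1
                      [3., 3.],    # embedding_2
                      [4., 4.]])   # embedding_3

ids_1d = tf.constant([0, 1])             # a batch of words
ids_2d = tf.constant([[0, 1], [0, 3]])   # a batch of sentences

with tf.Session() as sess:
    print(sess.run(tf.nn.embedding_lookup(params, ids_1d)).shape)  # (2, 2)
    print(sess.run(tf.nn.embedding_lookup(params, ids_2d)).shape)  # (2, 2, 2)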
Concerning the partition_strategy argument, you don't have to bother about it. Basically, it allows you to give a list of embedding matrices as params instead of 1 matrix, if you have limitations in computation.
So, you could split your embedding matrix of shape [1000, embedding_size] in ten matrices of shape [100, embedding_size] and pass this list of Variables as params. The argument partition_strategy handles the distribution of the vocabulary (the 1000 words) among the 10 matrices.