What is the difference between tf.nn.softmax_cross_entropy_with_logits and tf.losses.softmax_cross_entropy, and when should each be used?
Been a while since I last used Tensorflow, but I'm pretty sure the latter is a wrapper for the former.
tf.losses.softmax_cross_entropy simply uses tf.nn.softmax_cross_entropy_with_logits to calculate the loss, while adding extra features such as class weights and label smoothing.
tf.losses.softmax_cross_entropy also adds the loss to the tf.GraphKeys.LOSSES collection, so you don't need to "route" it all the way to the top; you can simply collect your losses:
losses = tf.get_collection(tf.GraphKeys.LOSSES)
loss = tf.reduce_sum(losses)
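For example, a minimal sketch of the wrapper with one of those extra features (the logits and labels are illustrative values I made up):

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])   # raw network outputs
labels = tf.constant([[0.0, 0.0, 1.0]])   # one-hot labels
# label_smoothing is one of the extras the wrapper adds on top of the nn op;
# the resulting loss is also added to tf.GraphKeys.LOSSES automatically
loss = tf.losses.softmax_cross_entropy(
    onehot_labels=labels, logits=logits, label_smoothing=0.1)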
I personally liked being able to customize my loss functions, so I used the "nn" one, but if you are not going to do anything outside the box, use tf.losses.
I also liked to use sparse_softmax_cross_entropy, since I was most often working with mutually exclusive classes, and this way I could avoid converting the labels to one-hot encoding myself.
The difference is that with tf.nn.softmax_cross_entropy_with_logits you need to use one-hot encoding for your labels, while with tf.nn.sparse_softmax_cross_entropy_with_logits you can use integers to specify which class the label belongs to, i.e. the difference is
labels = [
[0, 0, 1],
[1, 0, 0],
[0, 1, 0],
]
vs
labels = [2, 0, 1]
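To make that concrete, here is a minimal sketch of the two calls side by side (TF 1.x; the logits are illustrative):

import tensorflow as tf

logits = tf.constant([[0.1, 0.2, 2.0],
                      [3.0, 0.1, 0.2],
                      [0.1, 2.5, 0.3]])

# one-hot labels for the dense version
onehot_labels = tf.constant([[0., 0., 1.], [1., 0., 0.], [0., 1., 0.]])
dense_loss = tf.nn.softmax_cross_entropy_with_logits(
    labels=onehot_labels, logits=logits)

# integer class indices for the sparse version
sparse_labels = tf.constant([2, 0, 1])
sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sparse_labels, logits=logits)
# dense_loss and sparse_loss hold the same per-example values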
Finally, if you are not planning on doing much low-level stuff, I recommend you take a look at Keras.
In Tensorflow 2.0, I'm trying to build a model that classifies my objects into two categories: positives and negatives.
I want to use tf.keras.metrics.FalsePositives and tf.keras.metrics.FalseNegatives metrics to see how the model improves with every epoch. Both of these metrics have assertions stipulating: [predictions must be >= 0] and [predictions must be <= 1].
The problem is that an untrained model can generate an arbitrary number as a prediction. But even a trained model can sometimes produce an output slightly above 1 or slightly below 0.
Is there any way to disable these assertions?
Alternatively, is there any suitable activation function that forces the model outputs into [0, 1] range without causing any problems with the learning rate?
The sigmoid activation function is a suitable choice here, since its output always lies in the range (0, 1).
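For instance, a minimal sketch (the layer sizes and input shape are made up) that keeps predictions inside the range the metrics assert on:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
    # sigmoid squashes the output into (0, 1), satisfying the assertions
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[tf.keras.metrics.FalsePositives(),
                       tf.keras.metrics.FalseNegatives()])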
My question is close in nature to Feature Columns Embedding lookup, however I was unable to comment on the answer given there (not enough rep), and I think the answerer either did not fully understand the question, or the answer was not exactly what was asked.
GOAL
To serve a custom Estimator which uses Dataset API to feed in data. The task is NMT (seq2seq).
Issue
Estimator requires feature_columns as input for serving. My NMT task has only one feature: the input sentence to translate (or possibly each word in the sentence is a feature?). And so I am unsure how to build a feature_column (and thus an embedding_column and finally an input_layer) from my input sentences that can be fed into an RNN (which expects an embedding lookup of shape [batch_size, max_sequence_len, embedding_dim]), which would finally allow me to serve the Estimator.
Background
I am attempting to utilize a custom estimator to feed a seq2seq style NMT implementation. I need to be able to serve the model via tf-serving, which estimators seem to make relatively easy.
However, I hit a roadblock with 'how' to serve the model. From what I can tell, I need 'feature_columns' that will serve as the input to the model.
https://github.com/MtDersvan/tf_playground/blob/master/wide_and_deep_tutorial/wide_and_deep_basic_serving.md
shows that you need an export_input_fn which uses a feature_spec, which in turn needs feature_column(s) as input. This makes sense; however, for my use case I do not have a bunch of (different) features. Instead I have input sentences (where each word is a feature) that need to be looked up via embeddings and used as features...
So I know I need the input into my model to be feature column(s). My input for NMT is simply a tensor of [batch_size, max_sequence_len] which is filled with the indices of the words from the sentences (e.g., for batch_size=1: [3, 17, 132, 2, 1, 0, ...], where each index should map to an embedding vector). Typically I would feed this into an embedding_lookup via
# Embedding matrix: one embedding_dim-dimensional vector per vocabulary entry
embs = tf.get_variable('embedding', [vocab_size, embedding_dim])
# [batch_size, max_sequence_len] -> [batch_size, max_sequence_len, embedding_dim]
inputs = tf.nn.embedding_lookup(embs, inputs)
and I would be good to go; I could feed this to an RNN as inputs and the rest is history, not a problem.
BUT, this is where I hit the issue: I need to use feature_columns (so I can serve the model). The answer given to the question I mentioned at the beginning shows how to use embedding_column, but it suggests that the embedding should look up an entire sentence as one single feature, whereas traditionally you would look up each word in the sentence and get its embedding.
Similarly, https://www.damienpontifex.com/2018/01/02/using-tensorflow-feature-columns-in-your-custom-estimator-model/
shows 'how to implement a feature-column in a custom estimator', and indeed his 'before' code is exactly right (as I wrote out above): a tf.get_variable into a tf.nn.embedding_lookup. But his 'after' code, again, only takes in one feature (the entire sentence?).
I have verified this by using their code and feeding my data of shape [batch_size, max_seq_len] into tf.feature_column.categorical_column_with_identity, and the output tensor is [batch_size, embedding_dim].
Is the sequence information lost, or does it simply get flattened? When I print the output, its size is (?, embedding_dim), where ? is typically my batch_size.
EDIT: I have verified the shape is [batch_size, embedding_dim]; it is not just flattened... So the sequence info is lost.
I'm guessing it must be treating the input as one single input feature (thus in the batch_size=1 example [3, 17, 132, 2, 1, 0, ...], where each index should map to an embedding vector, the whole sentence maps to a single feature), which is not what is wanted. We want each index to map to an embedding, and the needed output is [batch_size, max_seq_len, embedding_dim].
It sounds like what I instead need is not one categorical_column_with_*, but max_seq_len of them (one for each word in my sequence). Does this sound right? Each word would be a feature for my model, so I am leaning toward this being the correct approach, but this also has issues. I am using the Dataset API, so in my input_train_fn() I load my data from a file and then use tf.data.Dataset.from_tensor_slices(data, labels) to split the data into tensors, which I can then dataset.batch(batch_size).make_one_shot_iterator().get_next() to feed into my Estimator. I cannot iterate over each batch (tensors are not iterable), so I cannot simply make 100 feature_columns for each input batch...
Does anyone have any idea how to do this? This embedding lookup is a very straightforward thing to do with simple placeholders or variables (and a common approach in NLP tasks). But when I venture into the Dataset API and Estimators, I run into a wall, with very little information available (that is not a basic example).
I admit I may have gaps in my understanding; custom Estimators and the Dataset API are new to me, and finding information on them can be difficult at times. So feel free to fire off information at me.
Thanks for reading my wall of text and hopefully helping me (and the others I've seen ask a similar question and get no answer: https://groups.google.com/a/tensorflow.org/forum/#!searchin/discuss/embeddings$20in$20custom$20estimator/discuss/U3vFQF_jeaY/EjgwRQ3RDQAJ). I feel bad for that guy; his question was not really answered (for the same reason outlined here), and his thread got hijacked...
If I understand correctly, you want to use the Estimator API to build a seq2seq model. A good place to start is here; look into the Problems-Solutions/text folder.
To answer your question on how to use embedding lookup, here is one example:
# `lookup` is tf.contrib.lookup; `commons` and `params` hold project constants
vocab_table = lookup.index_table_from_file(vocabulary_file='data/vocab.csv', num_oov_buckets=1, default_value=-1)
text = features[commons.FEATURE_COL]
# Split each sentence into words (yields a SparseTensor)
words = tf.string_split(text)
dense_words = tf.sparse_tensor_to_dense(words, default_value=commons.PAD_WORD)
# Map each word to its vocabulary id
word_ids = vocab_table.lookup(dense_words)
# Pad all the word_ids entries to the maximum document length, then slice back down
padding = tf.constant([[0, 0], [0, commons.MAX_DOCUMENT_LENGTH]])
word_ids_padded = tf.pad(word_ids, padding)
word_id_vector = tf.slice(word_ids_padded, [0, 0], [-1, commons.MAX_DOCUMENT_LENGTH])
word_id_vector = {commons.FEATURE_COL: word_id_vector}
# Embed each id and combine them into one bag-of-words vector per document
bow_column = tf.feature_column.categorical_column_with_identity(commons.FEATURE_COL, num_buckets=params.N_WORDS)
bow_embedding_column = tf.feature_column.embedding_column(bow_column, dimension=50, combiner='sqrtn')
bow = tf.feature_column.input_layer(word_id_vector, feature_columns=[bow_embedding_column])
logits = tf.layers.dense(bow, 2, activation=None)
The above code can be wrapped in an Estimator model_fn. The repo mentioned above contains this code. Please take a look at it.
So the way I ended up making this work: I made each word an input feature, then I simply do the wrd_2_idx conversion, pass that in as a feature via numerical_column(s) (you have max_seq_len of these), and then pass those columns to input_layer. Then in my graph I use these features and look up the embedding as normal, basically circumventing the embedding_column lookup since I can't figure out how to make it act the way I want. This is probably not optimal, but it works and trains...
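A rough, hypothetical sketch of that workaround (the column names, sizes, and helper function are mine, not from the original code):

import tensorflow as tf

max_seq_len = 100    # assumed sequence length
vocab_size = 10000   # assumed vocabulary size
embedding_dim = 128  # assumed embedding size

# One numeric column per word position; features look like {'w0': ..., 'w1': ...}
feature_columns = [tf.feature_column.numeric_column('w%d' % i, dtype=tf.int64)
                   for i in range(max_seq_len)]

def embed_inputs(features):
    # input_layer returns float32, so cast back to ids before the lookup
    word_ids = tf.cast(tf.feature_column.input_layer(features, feature_columns), tf.int64)
    embs = tf.get_variable('embedding', [vocab_size, embedding_dim])
    # [batch_size, max_seq_len] -> [batch_size, max_seq_len, embedding_dim]
    return tf.nn.embedding_lookup(embs, word_ids)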
I'll leave this as the accepted answer and hope sometime in the future either I figure out a better way to do it, or someone else can enlighten me to the best way to approach this.
I managed to get this working... I also got derailed by the fact that the RNN did not consume an embedding.
What I did to get this working (in the simplest case):
# features[VALUES_FEATURE_NAME] has shape (?, 200), i.e. 200 word ids per row
inputs = tf.contrib.layers.embed_sequence(
    features[VALUES_FEATURE_NAME], vocab_size=3000, embed_dim=5)
# create an LSTM cell of size 200
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(200)
# run the complete LSTM over the sequence
_, final_states = tf.nn.dynamic_rnn(
    lstm_cell, inputs, dtype=tf.float32)
outputs = final_states.h
So I guess the answer lies in the TensorFlow docs for dynamic_rnn:
"Creates a recurrent neural network specified by RNNCell cell. Performs fully dynamic unrolling of inputs."
So the unrolling here means that the RNN consumes [batch, time_steps, values] as its input.
You can use tf.contrib.feature_column.sequence_categorical_column_with_vocabulary_list and tf.contrib.feature_column.sequence_input_layer to solve it.
The demo code is as follows:
import tensorflow as tf

if __name__ == "__main__":
    # tf.enable_eager_execution()
    feature = {
        'aa': [['1', '2', '3'],
               ['-1', '4', '-1'],
               ['2', '-1', '-1'],
               ['4', '5', '6']]
    }
    # feature1 was missing from the original snippet; an assumed definition
    # with length-1 sequences, matching the (3, 1, 2) output shape below
    feature1 = {
        'aa': [['1'], ['2'], ['4']]
    }
    aa_id = tf.contrib.feature_column.sequence_categorical_column_with_vocabulary_list(
        'aa', ['1', '2', '3', '4', '5'])
    seq_emb_matrix = tf.feature_column.embedding_column(aa_id, 2)
    seq_tensor, seq_length = tf.contrib.feature_column.sequence_input_layer(feature, [seq_emb_matrix])
    seq_tensor1, seq_length1 = tf.contrib.feature_column.sequence_input_layer(feature1, [seq_emb_matrix])
    seq_tensor2 = tf.squeeze(seq_tensor1)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.tables_initializer())
        a, a_len = sess.run([seq_tensor, seq_length])
        b, b_len = sess.run([seq_tensor1, seq_length1])
        print(a)
        print('a_len', a_len)
        print(a.shape)
        print('-' * 50)
        print(b)
        print('b_len', b_len)
        print(b.shape)
        print(sess.run([seq_tensor2]))
The printed results are as follows:
[[[ 0.5333682 -0.39895234]
[ 0.5335079 0.64998794]
[-1.0432893 -0.8391434 ]]
[[ 0. 0. ]
[-0.29623085 -0.17570129]
[ 0. 0. ]]
[[ 0.5335079 0.64998794]
[ 0. 0. ]
[ 0. 0. ]]
[[-0.29623085 -0.17570129]
[ 0.7100604 0.9935588 ]
[ 0. 0. ]]]
('a_len', array([3, 3, 3, 3]))
(4, 3, 2)
--------------------------------------------------
[[[-0.24147142 -0.37740025]]
[[-0.6222648 1.3862932 ]]
[[ 1.2226609 -0.68292266]]]
('b_len', array([1, 1, 1]))
(3, 1, 2)
[array([[-0.24147142, -0.37740025],
[-0.6222648 , 1.3862932 ],
[ 1.2226609 , -0.68292266]], dtype=float32)]
I'm building a program using TensorFlow image classification. I got TensorFlow from GitHub, and what I know is pretty much how to run classify_image.py!
What I want to do is have an option to train the model in a simple manner. For example, the model knows "keys", but I want to train it on "HouseKeys", which have a fancy keyfob or something. Is there some sort of script I can use to say "take these 20 images and learn HouseKeys", so the model can distinguish "keys" from "HouseKeys"?
Excuse my noobness, and thank you in advance!
Edit: Obviously, it is very important that the model keeps its knowledge of all the other categories it knew previously, since being able to recognize only "HouseKeys" is absolutely useless.
You can do this. However, it will probably need some adjustments.
I don't know exactly which script you are referring to, but I'm going to assume you have at least two Python files. One is the actual neural network and the other one handles training and evaluation.
The first thing you need to do is make sure the neural network can handle new classes. Look for something like this:
input_y = tf.placeholder(tf.float32, [None, classes], name="input_y")
A lot of the time, if you see tensors whose name contains x (input_x for example), they refer to the data, the training input.
Tensors that have y in their name, like the example above, usually refer to the labels.
The code above says input_y is a tensor (think of it as an array for the moment) of type float32, of variable length (the None in [None, classes]), but with each element of dimension classes.
If classes was 3, input_y could look like this:
[[0, 0, 1], [1, 0, 0], [0, 1, 0]]
Just as well, it could look like this:
[[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
Although the length can vary, each element must always have size 3 (classes).
As for the meaning, [0, 0, 1] for example is a label for class 2, because we have a 1 at index 2 (look up one-hot notation).
The point of this is that a neural network with this sort of input can learn up to 3 classes. Each input in the tensor x has an associated label in the tensor y, and the labels in y can be either 0, 1 or 2 in one-hot notation.
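As a quick illustration, integer class ids can be turned into this notation with tf.one_hot (a minimal sketch):

import tensorflow as tf

class_ids = tf.constant([2, 0, 1])
one_hot_labels = tf.one_hot(class_ids, depth=3)
# -> [[0, 0, 1], [1, 0, 0], [0, 1, 0]] (as floats)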
With something like this, you can learn for example "keys", "HouseKeys" and "CarKeys" but you will not be able to add "OfficeKeys" for example.
So, the first step is to make sure your network can learn the maximum number of labels you want.
It does not have to learn them all at once. This brings us to point 2:
Take a look here: the documentation for the TensorFlow Saver class. It will allow you to save and load models.
For your problem, this translates into training the model on a 2 class data set, saving it, generating a 3 class data set, loading the previously saved model and train on the new data set. It will have the same "knowledge" (weights) as the model you saved, but it will start to adjust them to fit the third class.
But for this, you will need to make sure the network can, from the beginning, handle 3 classes.
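A minimal sketch of that save/restore cycle (the variable, checkpoint path, and training steps are placeholders, not your actual model):

import tensorflow as tf

# A stand-in variable; in practice this is the full 3-class network
w = tf.get_variable('w', shape=[10, 3])
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... train on the 2-class data set here ...
    saver.save(sess, './model.ckpt')    # persist the learned weights

with tf.Session() as sess:
    saver.restore(sess, './model.ckpt')  # reload the "knowledge"
    # ... continue training on the 3-class data set here ...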
Hope this helps!
I am trying to learn how to build RNN for Speech Recognition using TensorFlow. As a start, I wanted to try out some example models put up on TensorFlow page TF-RNN
As advised, I had taken some time to understand how word IDs are embedded into a dense representation (vector representation) by working through the basic version of the word2vec model code. I had an understanding of what tf.nn.embedding_lookup actually does, until I encountered the same function being used with a two-dimensional array in TF-RNN ptb_word_lm.py, when it did not make sense any more.
What I thought tf.nn.embedding_lookup does:
Given a 2-D array params and a 1-D array ids, the function tf.nn.embedding_lookup fetches rows from params corresponding to the indices given in ids, which matches the dimension of the output it returns.
What I am confused about:
When tried with the same params and a 2-D array ids, tf.nn.embedding_lookup returns a 3-D array instead of 2-D, which I do not understand.
I looked up the manual for embedding_lookup, but I still find it difficult to understand how the partitioning works and what result is returned. I recently tried a simple example with tf.nn.embedding_lookup and it appears that it returns different values each time. Is this behaviour due to the randomness involved in partitioning?
Please help me understand how tf.nn.embedding_lookup works, and why it is used in both word2vec_basic.py and ptb_word_lm.py, i.e., what is the purpose of even using it?
There is already an answer here on what tf.nn.embedding_lookup does.
When tried with the same params and a 2-D array ids, tf.nn.embedding_lookup returns a 3-D array instead of 2-D, which I do not understand.
When you had a 1-D list of ids [0, 1], the function would return a list of embeddings [embedding_0, embedding_1], where embedding_0 is an array of shape embedding_size. For instance, the list of ids could be a batch of words.
Now, you have a matrix of ids, or a list of lists of ids. For instance, you now have a batch of sentences, i.e. a batch of lists of words, i.e. a list of lists of words.
If your list of sentences is [[0, 1], [0, 3]] (sentence 1 is [0, 1], sentence 2 is [0, 3]), the function will compute a matrix of embeddings, which will be of shape [2, 2, embedding_size] and will look like:
[[embedding_0, embedding_1],
[embedding_0, embedding_3]]
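A small runnable check of this behaviour (the embedding values here are illustrative constants rather than a trained variable):

import tensorflow as tf

params = tf.constant([[0.0, 0.0],    # embedding_0
                      [1.0, 1.1],    # embedding_1
                      [2.0, 2.2],    # embedding_2
                      [3.0, 3.3]])   # embedding_3
ids = tf.constant([[0, 1], [0, 3]])  # a batch of two "sentences"
looked_up = tf.nn.embedding_lookup(params, ids)

with tf.Session() as sess:
    print(sess.run(looked_up))  # shape (2, 2, 2): [batch, seq_len, embedding_size]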
Concerning the partition_strategy argument, you don't have to worry about it. Basically, it allows you to pass a list of embedding matrices as params instead of a single matrix, if you have computational limitations.
So, you could split your embedding matrix of shape [1000, embedding_size] into ten matrices of shape [100, embedding_size] and pass this list of Variables as params. The partition_strategy argument handles the distribution of the vocabulary (the 1000 words) among the 10 matrices.
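For example, a minimal sketch of such sharding (the shapes and ids are illustrative):

import tensorflow as tf

embedding_size = 32
# Ten shards of 100 rows each stand in for one [1000, embedding_size] matrix
shards = [tf.get_variable('emb_shard_%d' % i, [100, embedding_size])
          for i in range(10)]
ids = tf.constant([3, 17, 132])
# partition_strategy decides which shard holds which row ('mod' is the default)
vectors = tf.nn.embedding_lookup(shards, ids, partition_strategy='mod')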
I have been using Tensorflow with the l-bfgs optimizer from openopt. It was pretty easy to setup callbacks to allow Tensorflow to compute gradients and loss evaluations for the l-bfgs, however, I am having some trouble figuring out how to introduce stochastic elements like dropout into the training procedure.
During the line search, l-bfgs performs multiple evaluations of the loss function, which need to operate on the same network as the prior gradient evaluation. However, it seems that each evaluation of the tf.nn.dropout function creates a new dropout mask. I am looking for a way to fix the dropout mask over multiple evaluations of the loss function, and then allow it to change between the gradient steps of the l-bfgs. I'm assuming this has something to do with the control flow ops in TensorFlow, but there isn't really a good tutorial on how to use these and they are a little enigmatic to me.
Thanks for your help!
Dropout uses random_uniform, which is a stateful op, and I don't see a way to reset it. However, you can hack around it by substituting your own random numbers and feeding them to the same input point as random_uniform, replacing the generated values.
Take the following code:
tf.reset_default_graph()
a = tf.constant([1, 1, 1, 1, 1], dtype=tf.float32)
graph_level_seed = 1
operation_level_seed = 1
tf.set_random_seed(graph_level_seed)
b = tf.nn.dropout(a, 0.5, seed=operation_level_seed, name="mydropout")
Visualize the graph to see where random_uniform is connected.
You can see dropout takes the input of random_uniform through the Add op, which has the name mydropout/random_uniform/(random_uniform). Actually the /(random_uniform) suffix is there for UI reasons; the true name is mydropout/random_uniform, as you can see by printing tf.get_default_graph().as_graph_def(). That gives you the shortened tensor name. Now append :0 to get the actual tensor name. (Side note: an operation can produce multiple tensors, which correspond to suffixes :0, :1, etc. Since having one output is the most common case, :0 is implicit in GraphDef and a node input is equivalent to node:0. However, :0 is not implicit when using feed_dict, so you have to explicitly write node:0.)
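If you want to check those names yourself, a quick sketch:

# List every op name in the default graph to find the dropout's
# random_uniform node
for node in tf.get_default_graph().as_graph_def().node:
    print(node.name)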
So now you can fix the seed by generating your own random numbers (of the same shape as the incoming tensor) and reusing them between invocations.
import numpy as np
import tensorflow as tf

tf.reset_default_graph()
a = tf.constant([1, 1, 1, 1, 1], dtype=tf.float32)
graph_level_seed = 1
operation_level_seed = 1
tf.set_random_seed(graph_level_seed)
b = tf.nn.dropout(a, 0.5, seed=operation_level_seed, name="mydropout")
# Generate our own random numbers with the same shape as the incoming tensor
random_numbers = np.random.random(a.get_shape()).astype(dtype=np.float32)
sess = tf.Session()
# Feed the same numbers into the dropout's random_uniform input both times
print(sess.run(b, feed_dict={"mydropout/random_uniform:0": random_numbers}))
print(sess.run(b, feed_dict={"mydropout/random_uniform:0": random_numbers}))
You should see the same set of numbers for both run calls.