Tensorflow - How to ignore certain labels - tensorflow

I'm trying to implement a fully convolutional network and train it on the Pascal VOC dataset, however after reading up on the labels in the set, I see that I need to somehow ignore the "void" label. In Caffe their softmax function has an argument to ignore labels, so I'm wondering what the mechanic is, so I can implement something similar in tensorflow.
Thanks

In tensorflow you're feeding the data in feed_dict right? Generally you'd want to just pre-process the data and remove the unwanted samples - don't give them to tensorflow for processing.
My prefered approach is a producer-consumer model where you fire up a tensorflow queue and load it with samples from a loader thread which just skips enqueuing your void samples.
In training your model dequeue samples in the model (you don't use feed_dict in the optimize step). This way you're not bothering to write out a whole new dataset with the specific preprocessing step you're interested in today (tomorrow you're likely to find you want to do some other preprocessing step).
As a side comment, I think tensorflow is a little more do-it-yourself than some other frameworks. But I tend to like that, it abstracts enough to be convenient, but not so much that you don't understand what's happening. When you implement it you understand it, that's the motto that comes to mind with tensorflow.

Related

Using dynamically generated data with keras

I'm training a neural network using keras but I'm not sure how to feed the training data into the model in the way that I want.
My training data set is effectively infinite, I have some code to generate training examples as needed, so I just want to pipe a continuous stream of novel data into the network. keras seems to want me to specify my entire dataset in advance by creating a numpy array with everything in it, but this obviously wont work with my approach.
I've experimented with creating a generator class based on keras.utils.Sequence which seems like a better fit, but it still requires me to specify a length via the __len__ method which makes me think it will only create that many examples before recycling them. Can someone suggest a better approach?

Reusing transformations between training and predictions

I'd like to apply stemming to my training data set. I can do this outside of tensorflow as part of training data prep, but I then need to do the same process on prediction request data before calling the (stored) model.
Is there a way of implementing this transformation in tensorflow itself so the transformation is used for both training and predictions?
This problem becomes more annoying if the transformation requires knowledge of the whole dataset, normalisation for example.
Can you easily express your processing (e.g. stemming) as a tensorflow operation? If yes, then you can build your graph in a way that both your inputs and predictions can make use of the same set of operations. Otherwise, there isn't much harm in calling the same (non tensorflow) function for both pre-processing and for predictions.
Re normalisation: you would find the dataset statistics (means, variance, etc. depending on how exactly you are normalizing) and then hardcode them for the pre/post-processing so I don't think that's really an annoying case.

Tensorflow: How to create new neuron (Not perceptron neuron)

So tensorflow is extremely useful at creating neural networks that involve perceptron neurons. However, if one wanted to use a new type of neuron instead of the classic perceptron neuron, is this possible through augmenting tensorflow code? I can't seem to find an answer. I understand this would change the forward propagation, and more mathematical calculations, and I am willing to change all the necessary areas.
I am also aware that I can just code from scratch the layers I need, and the neurons I had in mind, but tensorflow nevertheless has GPU integration, so one can see its more ideal to manipulate their code as opposed to creating my own from scratch.
Has anyone experimented with this? My goal is to create neural network structures that use a different type of neuron than the classic perceptron.
If someone who knows where in tensorflow I could look to see where they initialize the perceptron neurons, I would very much appreciate it!
Edit:
To be more specific, is it possible to alter code in tensorflow to use a different neuron type rather than the perceptron to invoke the tensorlfow Module: tf.layers for example? Or tf.nn? (conv2D, batch-norm, max-pool, etc). I can figure out the details. I just need to know where (I'm sure they're a few locations) I would go about changing code for this.
However, if one wanted to use a new type of neuron instead of the classic perceptron neuron, is this possible through augmenting tensorflow code?
Yes. Tensorflow provides you the possibility to define a computational graph. It then can automatically calculate the gradient for that. No need to do it yourself. This is the reason why you define it symbolically. You might want to read the whitepaper or start with a tutorial.

Looking for clarification on "running" Tensorflow models

from Tensorflow's documentation, there seems to be a large array of options for "running", serving, testing, and predicting using a Tensorflow model. I've made a model very similar to MNIST, where it outputs a distribution from an image. For a beginner, what would be the easiest way to take one or a few images, and send them through the model, getting an output prediction? It is mostly for experimentation purposes. Sorry if this is too redundant, but all my research has led me to so many different ways of doing this and the documentation doesn't really give any info on the pros and cons of the different methods. Thanks
I guess you are using placeholders for your model input and then using feed_dict to feed values into your model.
If that's the case the simplest way would be after you have a trained model you save it using tf.saver. Then you can have a test script where you restore your model and then sess.run on your output variable with a feed_dict of whatever you want your input to be.

Sequence Labeling in TensorFlow

I have managed to train a word2vec with tensorflow and I want to feed those results into an rnn with lstm cells for sequence labeling.
1) It's not really clear on how to use your trained word2vec model for a rnn. (How to feed the result?)
2) I don't find much documentation on how to implement a sequence labeling lstm. (How do I bring in my labels?)
Could someone point me in the right direction on how to start with this task?
I suggest you start by reading the RNN tutorial and sequence-to-sequence tutorial. They explain how to build LSTMs in TensorFlow. Once you're comfortable with that, you'll have to find the right embedding Variable and assign it using your pre-trained word2vec model.
I realize this was posted a while ago, but I found this Gist about sequence labeling and this Gist about variable sequence labeling really helpful for figuring out sequence labeling. The basic outline (the gist of the Gist):
Use dynamic_rnn to handle unrolling your network for training and prediction. This method has moved around some in the API, so you may have to find it for your version, but just Google it.
Arrange your data into batches of size [batch_size, sequence_length, num_features], and your labels into batches of size [batch_size, sequence_length, num_classes]. Note that you want a label for every time step in your sequence.
For variable-length sequences, pass a value to the sequence_length argument of the dynamic_rnn wrapper for each sequence in your batch.
Training the RNN is very similar to training any other neural network once you have the network structure defined: feed it training data and target labels and watch it learn!
And some caveats:
With variable-length sequences, you will need to build masks for calculating your error metrics and stuff. It's all in the second link above, but don't forget when you make your own error metrics! I ran in to this a couple of times and it made my networks look like they were doing much worse on variable-length sequences.
You might want to add a regularization term to your loss function. I had some convergence issues without this.
I recommend using tf.train.AdamOptimizer with the default settings at first. Depending on your data, this may not converge and you will need to adjust the settings. This article does a good job of explaining what the different knobs do. Start reading from the beginning, some of the knobs are explained before the Adam section.
Hopefully these links are helpful to others in the future!