How to perform a multi label classification with tensorflow? [closed] - tensorflow

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm new to TensorFlow and would like to know if there is a tutorial or example of multi-label classification with multiple network outputs.
I'm asking because I have a collection of images in which each image can belong to several classes, and my output needs to have a score for each class.
I also don't know whether TensorFlow expects a particular file pattern for the images and classes, so an example would help a lot.
Thank you.

You can also try transforming your problem from multi-label to multi-class classification using a Label Powerset approach. The Label Powerset transformation treats every label combination attested in the training set as a different class and constructs one instance of a multi-class classifier; after prediction, it converts the assigned classes back to the multi-label case. It is provided in scikit-multilearn. The base classifier needs a scikit-learn-compatible interface, so wrap the TensorFlow Estimator (for example via an input_fn) or use skflow, then just plug it into an instance of LabelPowerset.
The code could go as follows:
from skmultilearn.problem_transform import LabelPowerset
import tensorflow.contrib.learn as skflow
# assume data is loaded using
# and is available in X_train/X_test, y_train/y_test
# initialize LabelPowerset multi-label classifier
# with tensor flow DNN base classifier
classifier = LabelPowerset(skflow.TensorFlowDNNClassifier(OPTIONS))
# train
classifier.fit(X_train, y_train)
# predict
predictions = classifier.predict(X_test)
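To make the transformation itself concrete, here is a minimal NumPy sketch of what a Label Powerset step does, independent of scikit-multilearn. The helper names (`label_powerset_fit`, `label_powerset_inverse`) and the toy label matrix are made up for illustration; this is not the library's implementation.

```python
import numpy as np

def label_powerset_fit(y):
    """Map every distinct label combination (row of y) to one class id."""
    y = np.asarray(y)
    combos, class_ids = np.unique(y, axis=0, return_inverse=True)
    return combos, class_ids.ravel()

def label_powerset_inverse(combos, class_ids):
    """Turn predicted class ids back into multi-hot label rows."""
    return combos[np.asarray(class_ids)]

# Toy multi-hot labels: 4 samples, 3 possible labels (illustrative data).
y_train = np.array([[0, 1, 0],
                    [1, 1, 0],
                    [0, 1, 0],
                    [0, 0, 1]])

combos, class_ids = label_powerset_fit(y_train)
recovered = label_powerset_inverse(combos, class_ids)
print("distinct combinations:", len(combos))
print("class id per sample:", class_ids)
```

Any multi-class classifier can then be trained on `class_ids`. Note that a label combination never seen in training cannot be predicted by this scheme.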

The most naive (and reasonable) approach would be to train a classification network, and remove the softmax layer and replace it with a vector of sigmoids. This way you can have multiple units with an activation of 1.
You can look at the TF-slim examples for classification networks. Under the datasets path you will find examples of how to prepare the TFExample "file pattern" for images and classes.
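As a rough illustration of why sigmoids fit the multi-label case, the NumPy sketch below compares a softmax with independent sigmoids on the same logits, and computes the per-label cross-entropy in its numerically stable form, max(x, 0) - x*z + log(1 + exp(-|x|)). The logits, labels, and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid_cross_entropy(labels, logits):
    # Numerically stable per-label binary cross-entropy on raw logits.
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

logits = np.array([[4.0, 3.5, -2.0]])   # one image, three classes
labels = np.array([[1.0, 1.0, 0.0]])    # multi-hot target: classes 0 and 1

probs_sigmoid = sigmoid(logits)         # each class scored independently
probs_softmax = softmax(logits)         # classes compete, scores sum to 1

predicted = (probs_sigmoid >= 0.5).astype(int)
loss = sigmoid_cross_entropy(labels, logits).mean()
print("sigmoid scores:", probs_sigmoid)
print("softmax scores:", probs_softmax)
print("predicted labels:", predicted, "loss:", loss)
```

With the softmax, at most one class can exceed 0.5, whereas the sigmoid outputs let several classes be active at once.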

Most solutions refer to sigmoid loss, and sigmoid does solve multi-label classification well in my case, via tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits) in TensorFlow.
However, when I dealt with a class imbalance problem, where negative cases far outnumber positive cases, I found that my modified softsign loss worked much better than sigmoid. An adjustment coefficient gamma, derived from the label, lowers the negative class's gradient by 3/4.
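import tensorflow as tf
def unbalance_softsign_loss(labels, logits):
    # labels must be floats; gamma is 1 for positive labels and -0.25 for negative labels
    gamma = 1.25 * labels - 0.25
    # logits / (1 + |logits|) is the softsign, which lies in (-1, 1)
    # tf.math.log1p is the TensorFlow 2.x name for tf.log1p
    res = 1 - tf.math.log1p(gamma * logits / (1 + tf.abs(logits)))
    return res
where labels are multi-hot encoded vectors like [0, 1, 0, 1, 0] and logits range over (-inf, inf).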
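To see how this loss treats the two classes differently, here is a NumPy mirror of the same formula evaluated on a few hand-picked logits. The function name `unbalance_softsign_loss_np` and the sample values are illustrative only.

```python
import numpy as np

def unbalance_softsign_loss_np(labels, logits):
    gamma = 1.25 * labels - 0.25                 # 1 for positives, -0.25 for negatives
    softsign = logits / (1.0 + np.abs(logits))   # in (-1, 1)
    return 1.0 - np.log1p(gamma * softsign)

labels = np.array([1.0, 1.0, 0.0, 0.0])
logits = np.array([3.0, -3.0, 3.0, -3.0])
losses = unbalance_softsign_loss_np(labels, logits)

# Gap between a wrong and a right prediction, per class.
positive_gap = losses[1] - losses[0]
negative_gap = losses[2] - losses[3]
print("losses:", losses)
print("positive gap:", positive_gap, "negative gap:", negative_gap)
```

A wrong prediction on a positive label is penalized much more strongly than a wrong prediction on a negative label, which is the intended effect when negatives dominate the data.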

Related

How to avoid overfitting with keras? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
import numpy as np
from tensorflow import keras  # assumed import; the original snippet does not show it
def build_model():
    model = keras.models.Sequential()
    model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))
    # Dropout only takes effect when the layer is added to the model;
    # a bare keras.layers.Dropout(...) call on its own line does nothing.
    model.add(keras.layers.Dropout(rate=0.2))
    model.add(keras.layers.Dense(500, activation="relu"))
    model.add(keras.layers.Dropout(rate=0.2))
    model.add(keras.layers.Dense(300, activation="relu"))
    model.add(keras.layers.Dropout(rate=0.2))
    model.add(keras.layers.Dense(10, activation="softmax"))
    model.compile(loss='sparse_categorical_crossentropy', optimizer=keras.optimizers.SGD(), metrics=['accuracy'])
    return model
keras_clf = keras.wrappers.scikit_learn.KerasClassifier(build_model)
def exponential_decay_fn(epoch):
    # Learning rate starts at 0.05 and drops by a factor of 10 every 20 epochs
    return 0.05 * 0.1**(epoch / 20)
lr_scheduler = keras.callbacks.LearningRateScheduler(exponential_decay_fn)
history = keras_clf.fit(np.array(X_train_new), np.array(y_train_new), epochs=100,
                        validation_data=(np.array(X_validation), np.array(y_validation)),
                        callbacks=[keras.callbacks.EarlyStopping(patience=10), lr_scheduler])
I use dropout, early stopping, and a learning rate scheduler. The results still seem to overfit. I tried reducing the hidden layers to (300, 100) neurons, but then the model underfit: the accuracy on the training set was only around 0.5.
Are there any suggestions?
In dealing with these issues, I first start out with a simple model, such as a few dense layers without many nodes. I run the model and look at the resulting training accuracy. The first step in modelling is to get a high training accuracy. You can add more layers and/or more nodes in each layer until you reach a satisfactory level of accuracy.
Once that is achieved, start evaluating the validation loss. If after a certain number of epochs the training loss continues to decrease but the validation loss starts to TREND upward, then you are overfitting. The word TREND is important here. I can't tell from your graphs whether you are really overfitting, but it looks to me like the validation loss has reached its minimum and is probably oscillating around it. This is normal and is NOT overfitting. If you have an adjustable learning-rate callback that monitors validation loss, or alternatively a learning rate scheduler, lowering the learning rate may get you to a lower minimum loss, but at some point (provided you run for enough epochs) continually reducing the learning rate won't get you any lower. The model has just done the best it can.
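One simple way to separate a trend from oscillation, sketched here under the assumption that you have the per-epoch validation losses as a list (for example from history.history["val_loss"] in Keras), is to fit a straight line to the last few epochs and look at its slope. The function name `val_loss_slope`, the window size, and the two loss curves are made up for illustration.

```python
import numpy as np

def val_loss_slope(val_losses, window=5):
    """Slope of a least-squares line through the last `window` losses."""
    recent = np.asarray(val_losses[-window:], dtype=float)
    epochs = np.arange(len(recent))
    slope, _intercept = np.polyfit(epochs, recent, deg=1)
    return slope

# Loss that has bottomed out and is oscillating around its minimum.
oscillating = [0.90, 0.70, 0.60, 0.50, 0.52, 0.50, 0.52, 0.50]
# Loss that bottomed out and is now trending upward (overfitting).
rising = [0.90, 0.70, 0.60, 0.50, 0.55, 0.60, 0.65, 0.70]

slope_oscillating = val_loss_slope(oscillating)
slope_rising = val_loss_slope(rising)
print("oscillating slope:", slope_oscillating)
print("rising slope:", slope_rising)
```

A slope near zero suggests the loss is hovering around its minimum, while a clearly positive slope over several epochs points to overfitting. What counts as "clearly positive" depends on the scale of your loss.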
Now, if you are REALLY overfitting, you can take remedial action. One option is to add more dropout, at the cost of potentially reduced training accuracy. Another is to add L1 and/or L2 regularization; both are described in the Keras documentation on regularizers. If your training accuracy is high but your validation accuracy is poor, it usually implies you need more training samples, because the samples you have are not fully representative of the underlying data distribution. More training data almost always helps.
I notice you have 10 classes. Look at the balance of your dataset. If the classes have significantly different numbers of samples, this can cause problems. There are a number of methods to handle that, like over-sampling under-represented classes, under-sampling over-represented classes, or a combination of both. An easy method is to use the class_weight parameter in model.fit. Look at your validation set and make sure it is not using too many samples from under-represented classes. It is always best to select the validation set randomly from the overall data set.
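For the class_weight suggestion, one common heuristic (an assumption here, not something prescribed above) is to weight each class inversely to its frequency: n_samples / (n_classes * count). A minimal NumPy sketch, with a hypothetical helper name and toy labels:

```python
import numpy as np

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Toy integer labels: class 0 is over-represented.
y_train = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
class_weight = balanced_class_weights(y_train)
print(class_weight)
```

The resulting dictionary, which maps class index to weight, can be passed as the class_weight argument of model.fit so that errors on rare classes count for more in the loss.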

Keras - Regularization & custom loss [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
I have built a custom Keras model which consists of various layers. Since I wanted to add L2 regularization to these layers, I've passed an instance of keras.regularizers.l2 as the argument for the kernel_regularizer parameter of those layers (as an example, see the constructor of keras.layers.Conv2D). Now, if I were to train this model using, say, Keras's implementation of the binary cross-entropy loss (keras.losses.BinaryCrossentropy), I would be sure that the L2 regularization that I've specified would be taken into consideration when computing the loss.
In my case, however, I have a custom loss function that requires several other parameters aside from y_true and y_pred, meaning that there's no way I can pass this function as the argument for the loss parameter of model.compile(...) (in fact, I don't even call model.compile(...)). As a result, I also had to write a custom training loop. In other words, instead of simply running model.fit(...), I had to:
1. Perform forward propagation by calling model(x)
2. Compute the loss
3. Compute the gradients of the loss with respect to the model's weights (that is, model.trainable_variables) with tf.GradientTape
4. Apply the gradients
5. Repeat
My question is: in which phase is regularization accounted for?
During forward propagation?
During the computation/application of the gradients?
Keep in mind that my custom loss function does NOT account for regularization, so if it's not accounted for in any of the two phases I've mentioned above, then I'm actually training a model with no regularization whatsoever (even though I've provided a value for the kernel_regularizer argument in each layer that my network is made of). In that case, would I be forced to compute the regularization term by hand and add it to the loss?
Regularization losses are computed on the forward pass of the model, and their gradients are applied on the backward pass. I don't think that your training step is applying any weight regularization, and consequently your model isn't regularized. One way to check this would be to actually look at the weights of a trained model - if they're sparse, it means you've regularized the weights in some way. L1 regularization will actually push some weights to 0. L2 regularization does a similar thing, but often results in less sparse weights.
This post outlines writing a training loop from scratch in Keras and has a section on model regularization. The author adds the loss from regularization layers in his training step with the following command:
loss += sum(model.losses)
I think this may be what you need. If you are still unsure, I would train a model with the line above in the training loop, and another model without that line. Inspecting the weights of the trained models will give you some input on whether or not the weight regularization is working as expected.
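Putting that line into context, a custom training step might look roughly like the sketch below. The tiny two-layer model, the mean-squared-error stand-in for the custom loss, the random data, and the name `train_step` are all assumptions for illustration; only the use of model.losses inside the tf.GradientTape block is the point.

```python
import tensorflow as tf

# Small model whose layers carry L2 kernel regularizers (illustrative sizes).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(1,
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

def train_step(x, y):
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)                   # forward pass
        data_loss = tf.reduce_mean(tf.square(y - y_pred))  # stand-in custom loss
        reg_loss = tf.add_n(model.losses)                  # regularization penalties
        loss = data_loss + reg_loss
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return data_loss, reg_loss

x = tf.random.normal((8, 4))
y = tf.random.normal((8, 1))
data_loss, reg_loss = train_step(x, y)
print("data loss:", float(data_loss), "regularization loss:", float(reg_loss))
```

If the reg_loss term is left out of loss, the regularizers attached to the layers have no influence on the gradients.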

word embeddings in tensorflow (no pre_trained) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I am new to tensorflow and trying to look at different examples of tensorflow to understand it better.
Now I have seen this line being used in many tensorflow examples without mentioning of any specific embedding algorithm being used for getting the words embeddings.
embeddings = tf.Variable(tf.random_uniform((vocab_size, embed_dim), -1, 1))
embed = tf.nn.embedding_lookup(embeddings, input_data)
Here are some examples:
https://github.com/Decalogue/dlnd_tv_script_generation/blob/master/dlnd_tv_script_generation.py
https://github.com/ajmaradiaga/cervantes-text-generation/blob/master/cervants_nn.py
I understand that the first line initializes the word embeddings from a random uniform distribution. Will the embedding vectors be trained further in the model to give a more accurate representation of the words (changing the initial random values to more accurate numbers)? If so, what method is actually being used when the code makes no mention of an explicit embedding method such as word2vec or GloVe (or of feeding in pre-trained vectors from these methods instead of random numbers)?
Yes, those embeddings are trained further, just like weights and biases; otherwise, representing words with random values wouldn't make any sense. The embeddings are updated during training the same way you would update a weight matrix, that is, by an optimization method such as gradient descent or the Adam optimizer.
When we use pre-trained embeddings like word2vec, they have already been trained on very large datasets and are already quite accurate representations of words, so they often don't need further training. If you are asking how those are trained, there are two main training algorithms that can be used to learn the embedding from text: Continuous Bag of Words (CBOW) and Skip-gram. Explaining them completely is not possible here, but I would suggest searching for them. This article might get you started.
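To illustrate the first point, that a randomly initialized embedding table is just another weight matrix updated by gradient descent, here is a small NumPy sketch. The lookup is plain row indexing, and the gradient flows only into the rows that were looked up. The toy objective (pull the looked-up vectors toward zero), the sizes, and the learning rate are arbitrary assumptions; in a real model the gradient would come from the task loss.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 6, 3
embeddings = rng.uniform(-1.0, 1.0, size=(vocab_size, embed_dim))
initial = embeddings.copy()

input_ids = np.array([1, 4, 1])            # word ids in one training example
targets = np.zeros((len(input_ids), embed_dim))
learning_rate = 0.1

for _ in range(50):
    embed = embeddings[input_ids]          # embedding lookup = row indexing
    grad_embed = 2.0 * (embed - targets)   # gradient of the summed squared error
    grad_table = np.zeros_like(embeddings)
    # Scatter-add gradients back to the table; repeated ids accumulate.
    np.add.at(grad_table, input_ids, grad_embed)
    embeddings -= learning_rate * grad_table

changed_rows = np.where(np.any(embeddings != initial, axis=1))[0]
print("rows updated by training:", changed_rows)
```

Only the rows for words that actually appear in the training data move away from their random initialization.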

RNN with simultaneous POS tagging and sentiment classification? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm working on a problem where I need to perform simultaneous part-of-speech (POS) tagging and sentiment analysis. I'm using Tensorflow and am considering Keras.
I have a large data set of English sentences that have been labelled with both POS tags and with sentiment (negative, neutral, positive).
Is it possible to train a recurrent neural network (vanilla RNN, GRU, or LSTM) to learn both POS tagging and sentiment classification? Of course, during test time, I'd like to enter a sentence and have the RNN generate predictions for both the POS tags and the sentiment together.
I was thinking of the following RNN architecture. I'm not sure if it's possible with Tensorflow (which I've been using) or with Keras (which I'm just learning now). I've previously implemented RNNs that do one task, not two.
Thanks for any help.
A really simple Keras model that might work for POS-tagging might look like this:
from keras.layers import Dense, LSTM
from keras.models import Model, Sequential
model = Sequential()
model.add(
    LSTM(
        hidden_layer_size,
        return_sequences=True,
        input_shape=(seq_length, nb_words),
        unroll=True
    )
)
model.add(Dense(nb_pos_types, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
Where I assume various parameters:
hidden_layer_size: whatever dimension of the internal recurrent layer.
seq_length: the input sequence length.
nb_words: vocabulary size, for the one-hot encoded inputs detailing which word for which sequence position.
nb_pos_types: number of different possible POS labels (for the one-hot encoded labels).
The goal is to modify a simple network like this so that it also makes a prediction about sentiment (not clear if your sentiment is a score or a category label, but I will assume a category label), and so that the loss function includes a penalty term for that sentiment prediction.
There are many ways to do this, but one common way is to "fork" a new spoke of the model off of some early layer, and have this spoke produce the additional prediction (often referred to as "multi-task" or "joint-task" learning).
To do this, we'll start off the same with Sequential, but rename it as base_model to make it clear that it serves as a base set of layers before branching for multiple tasks. Then we'll use Keras's functional syntax to do what we need with each branch before combining them together as multiple outputs of a final_model, in which we can express part of the overall loss function for each output.
Here's how we could modify the above example to do it:
base_model = Sequential()
base_model.add(
    LSTM(
        hidden_layer_size,
        return_sequences=True,
        input_shape=(seq_length, nb_words),
        unroll=True
    )
)
# Get a handle to the output of the recurrent layer.
rec_output = base_model.outputs[0]
# Create a layer representing the POS prediction.
pos_spoke = Dense(nb_pos_types, activation="softmax",
                  name="pos")(rec_output)
# Create a layer representing the sentiment prediction.
# I assume `nb_sentiments` is the number of sentiment categories.
sentiment_spoke = Dense(nb_sentiments, activation="softmax",
                        name="sentiment")(rec_output)
# Reunify into a single model which takes the same inputs as
# determined for `base_model`, and provides a list of 2 outputs,
# one for each spoke (POS and sentiment).
final_model = Model(inputs=base_model.inputs,
                    outputs=[pos_spoke, sentiment_spoke])
# Finally, use a dictionary for the loss function to specify the
# loss for each output, and optionally separate weights for when
# the losses are added as a weighted sum for the total loss.
final_model.compile(
    optimizer='rmsprop',
    loss={'pos': 'categorical_crossentropy',
          'sentiment': 'categorical_crossentropy'},
    loss_weights={'pos': 1.0, 'sentiment': 1.0}
)
And finally when calling final_model.fit, you'll supply a list for the labels, containing two tensors or arrays of labels, associated with each output.
You can read more about multi-output losses and architectures at the Keras docs on multi-input and multi-output models.
Finally, note that this is an exceedingly simple model (and would likely not perform well-- it's only meant for illustration). You could use the spokes we created, pos_spoke and sentiment_spoke to have additional layers with more sophisticated network topologies if you have particular POS-specific or sentiment-specific architectures.
Instead of defining them straight away as Dense, they could be additional recurrent layers, perhaps even convolutional, etc., followed by some eventual Dense layer whose variable name and layer name would be used for the appropriate places in the outputs and losses.
Also be aware of the use of return_sequences=True here. This allows for POS and sentiment prediction at each step in the sequence, even though you likely would only care about sentiment prediction at the end. One likely option would be to modify sentiment_spoke to operate only on the final sequence element from rec_output, or another (less likely) option would be to repeat the sentence's overall sentiment label for every word in the input sequence.
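As a rough sketch of the first option, the sentiment spoke can be fed only the last time step of the recurrent output, so that the model predicts one POS tag per word but a single sentiment per sentence. This version uses tf.keras and the functional API throughout, a Lambda layer for the slicing, a plain list of losses, and small random arrays in place of real data; all sizes and the random labels are assumptions for illustration.

```python
import numpy as np
from tensorflow.keras import layers, Model

seq_length, nb_words = 5, 20
hidden_layer_size, nb_pos_types, nb_sentiments = 8, 4, 3

inputs = layers.Input(shape=(seq_length, nb_words))
rec_output = layers.LSTM(hidden_layer_size, return_sequences=True)(inputs)

# POS spoke: one prediction per position in the sequence.
pos_spoke = layers.Dense(nb_pos_types, activation="softmax",
                         name="pos")(rec_output)

# Sentiment spoke: keep only the final time step, one prediction per sentence.
last_step = layers.Lambda(lambda t: t[:, -1, :])(rec_output)
sentiment_spoke = layers.Dense(nb_sentiments, activation="softmax",
                               name="sentiment")(last_step)

final_model = Model(inputs=inputs, outputs=[pos_spoke, sentiment_spoke])
final_model.compile(optimizer="rmsprop",
                    loss=["categorical_crossentropy",
                          "categorical_crossentropy"])

# Random stand-in data: one-hot inputs and one-hot labels for both tasks.
rng = np.random.default_rng(0)
n = 16
x = np.eye(nb_words)[rng.integers(0, nb_words, size=(n, seq_length))]
y_pos = np.eye(nb_pos_types)[rng.integers(0, nb_pos_types, size=(n, seq_length))]
y_sentiment = np.eye(nb_sentiments)[rng.integers(0, nb_sentiments, size=n)]

# Labels are supplied as a list, in the same order as the model outputs.
final_model.fit(x, [y_pos, y_sentiment], epochs=1, verbose=0)
pos_pred, sentiment_pred = final_model.predict(x, verbose=0)
print(pos_pred.shape, sentiment_pred.shape)
```

The POS labels have shape (batch, seq_length, nb_pos_types), while the sentiment labels have shape (batch, nb_sentiments), matching the two outputs.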

DeepLearning Anomaly Detection for images

I am still relatively new to the world of Deep Learning. I wanted to create a Deep Learning model (preferably using Tensorflow/Keras) for image anomaly detection. By anomaly detection I mean, essentially a OneClassSVM.
I have already tried sklearn's OneClassSVM using HOG features from the image. I was wondering if there is some example of how I can do this in deep learning. I looked up but couldn't find one single code piece that handles this case.
The way to do this in Keras is with the KerasRegressor wrapper (it wraps a Keras model in scikit-learn's regressor interface). Useful information can also be found in the source code of that module. Basically, you first have to define your network model, for example:
from keras.layers import Input, Dense
from keras.models import Model
def simple_model():
    # Input layer
    data_in = Input(shape=(13,))
    # First layer, fully connected, ReLU activation
    layer_1 = Dense(13, activation='relu', kernel_initializer='normal')(data_in)
    # Second layer... etc.
    layer_2 = Dense(6, activation='relu', kernel_initializer='normal')(layer_1)
    # Output, single node without activation
    data_out = Dense(1, kernel_initializer='normal')(layer_2)
    # Build and compile the model
    model = Model(inputs=data_in, outputs=data_out)
    # You may choose any loss or optimizer; be careful which you choose
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
Then, pass it to the KerasRegressor builder and fit with your data:
from keras.wrappers.scikit_learn import KerasRegressor
# Choose your epochs and batch size (Keras 2 uses `epochs`, not `nb_epoch`)
regressor = KerasRegressor(build_fn=simple_model, epochs=100, batch_size=64)
# Fit with your data
regressor.fit(data, labels)
For which you can now do predictions or obtain its score:
p = regressor.predict(data_test) #obtain predicted value
score = regressor.score(data_test, labels_test) #obtain test score
In your case, since you need to distinguish anomalous images from ones that are OK, one approach is to train your regressor with anomalous images labeled 1 and OK images labeled 0.
This will make your model return a value closer to 1 when the input is an anomalous image, so you can threshold the output to get the desired result. You can think of this output as a score for how closely the input matches the "anomalous" class, with 1 being a perfect match.
Also, as you mentioned, Autoencoders are another way to do anomaly detection. For this I suggest you take a look at the Keras Blog post Building Autoencoders in Keras, where they explain in detail about the implementation of them with the Keras library.
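The idea behind autoencoder-based anomaly detection is to train on normal data only and flag inputs that reconstruct poorly. The sketch below shows that scoring logic with a linear stand-in: a PCA projection computed with NumPy's SVD instead of a Keras autoencoder. The helper names, the synthetic 3-dimensional "images" (real images would be flattened to vectors), and the 99th-percentile threshold are all assumptions for illustration.

```python
import numpy as np

def fit_linear_codec(normal_data, n_components):
    """Learn a low-dimensional linear subspace from normal samples only."""
    mean = normal_data.mean(axis=0)
    _, _, vt = np.linalg.svd(normal_data - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(x, mean, components):
    """Encode, decode, and return the mean squared error per sample."""
    centered = x - mean
    reconstructed = centered @ components.T @ components
    return np.mean((centered - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(0)
direction = np.array([1.0, 2.0, -1.0])
# "Normal" samples lie close to a single direction, plus a little noise.
normal = rng.normal(size=(200, 1)) * direction + 0.01 * rng.normal(size=(200, 3))

mean, components = fit_linear_codec(normal, n_components=1)
normal_errors = reconstruction_error(normal, mean, components)
threshold = np.percentile(normal_errors, 99)

# A sample far from the learned subspace reconstructs badly.
anomaly = np.array([[2.0, -1.0, 0.0]])
anomaly_error = reconstruction_error(anomaly, mean, components)[0]
print("threshold:", threshold, "anomaly error:", anomaly_error)
print("flagged as anomaly:", anomaly_error > threshold)
```

With a trained autoencoder, `reconstruction_error` would be replaced by the error between the input and the model's prediction for it; the thresholding step stays the same.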
It is worth noting that single-class classification with a 0/1 target like this can be treated as a regression problem.
Classification tries to find a probability distribution over the N possible classes, and you usually pick the most probable class as the output (that is why classification networks typically use a sigmoid or softmax activation on their outputs, as both have range [0, 1]). Its output is discrete/categorical.
Regression, by contrast, tries to find the model that best represents your data by minimizing the error or optimizing some other metric (like the well-known R^2 metric, or coefficient of determination). Its output is a continuous real number (which is why most regression networks don't use an activation on their outputs). I hope this helps, good luck with your coding.