I am somewhat new to the metrics MAE and RMSE. I know that using these metrics instead of accuracy is recommended since I am doing regression rather than classification. I am wondering how to measure the true accuracy of my model: the labels are either -1 or 1 depending on the specified inputs, and my model outputs both negative and positive numbers on a linear scale. Here are the graphs returned during training:
My model doesn't appear to be overfitted when comparing the training and testing curves. Also, what does it signify that the RMSE is 0.5 and cannot go any lower? Thank you.
Mean squared error (MSE) is the average of the squared differences between the predicted labels and the true labels.
Root mean squared error (RMSE) starts from the same squared differences as MSE, but then takes the square root of the result. RMSE is therefore on the same scale as the labels and can be read as a typical absolute distance between the predicted labels and the true labels.
For example, if your model predicts 1 but the true label is -1, then,
MSE = {1-(-1)}^2 = 4
RMSE = √MSE = √4 = 2
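As a quick sanity check, here is a minimal sketch (the arrays are made-up examples, not from the question) of computing both metrics with NumPy:
import numpy as np
# hypothetical example: y_true holds -1/1 labels, y_pred holds raw model outputs
y_true = np.array([-1, 1, 1, -1])
y_pred = np.array([1.0, 0.5, 0.9, -0.8])
mse = np.mean((y_pred - y_true) ** 2)   # mean of squared differences
rmse = np.sqrt(mse)                     # square root brings it back to the label scale
print(mse, rmse)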
It looks like most predicted values are close to 0.5. How can I get the predicted values to follow the original values more closely?
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

normalizer = layers.Normalization()
normalizer.adapt(np.array(X_train))
model = keras.Sequential([
    normalizer,
    layers.Dense(8, activation='relu'),
    layers.Dense(1, activation='linear'),
    layers.Normalization()
])
There might be many issues here, but you definitely cannot normalize data at the output. You are literally saying "on average, I am expecting my output to be 0 and have unit variance". This makes sense iff your target is a standard, normalised Gaussian, but from the plot you can tell clearly it is not. Normalising inputs or internal activations is fine, as there is always a final layer left to apply the final affine mapping. But if you normalise at the end of the network, you make it impossible to learn most targets/signals.
Once this is solved, note that a network with 8 hidden neurons is extremely tiny and there is absolutely no guarantee it can learn anything. Your training loss is very far from 0, so you should make the model much, much more expressive and try to drive the training loss to 0. If you can't do this, you have a bug somewhere else in the code (or the model is still not expressive enough).
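As a rough sketch of those two fixes (assuming the same X_train and a scalar regression target; the layer widths are only illustrative, not a guarantee of expressiveness):
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

normalizer = layers.Normalization()
normalizer.adapt(np.array(X_train))  # normalize the inputs only
model = keras.Sequential([
    normalizer,
    layers.Dense(64, activation='relu'),  # wider than the original 8 units
    layers.Dense(64, activation='relu'),
    layers.Dense(1)                       # plain linear output, no Normalization at the end
])
model.compile(optimizer='adam', loss='mse')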
def build_model():
    model = keras.models.Sequential()
    model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))
    keras.layers.Dropout(rate=0.2)  # note: created but never passed to model.add, so it has no effect
    model.add(keras.layers.Dense(500, activation="relu"))
    keras.layers.Dropout(rate=0.2)  # same here
    model.add(keras.layers.Dense(300, activation="relu"))
    keras.layers.Dropout(rate=0.2)  # and here
    model.add(keras.layers.Dense(10, activation="softmax"))
    model.compile(loss='sparse_categorical_crossentropy', optimizer=keras.optimizers.SGD(), metrics=['accuracy'])
    return model

keras_clf = keras.wrappers.scikit_learn.KerasClassifier(build_model)

def exponential_decay_fn(epoch):
    return 0.05 * 0.1**(epoch / 20)

lr_scheduler = keras.callbacks.LearningRateScheduler(exponential_decay_fn)

history = keras_clf.fit(np.array(X_train_new), np.array(y_train_new), epochs=100,
                        validation_data=(np.array(X_validation), np.array(y_validation)),
                        callbacks=[keras.callbacks.EarlyStopping(patience=10), lr_scheduler])
I use dropout, early stopping, and an LR scheduler. The results seem to be overfitting, so I tried reducing the number of neurons in the hidden layers to (300, 100). The results were then underfitting: the accuracy on the training set was only around 0.5.
Are there any suggestions?
When dealing with this issue, I first start out with a simple model, for example just a few dense layers without a lot of nodes. I run the model and look at the resultant training accuracy. The first step in modelling is to get a high training accuracy. You can add more layers and/or more nodes in each layer until you get a satisfactory level of accuracy. Once that is achieved, start to evaluate the validation loss. If, after a certain number of epochs, the training loss continues to decrease but the validation loss starts to TREND upward, then you are in an overfitting condition. The word TREND is important. I can't tell from your graphs if you are really overfitting, but it looks to me as if the validation loss has reached its minimum and is probably oscillating around that minimum. This is normal and is NOT overfitting. If you have an adjustable-LR callback that monitors validation loss, or alternatively a learning rate scheduler, lowering the learning rate may get you to a lower minimum loss, but at some point (provided you run for enough epochs) continually reducing the learning rate no longer gets you to a lower minimum loss. The model has just done the best it can.
Now, if you are REALLY overfitting, you can take remedial actions. One is to add more dropout, at the potential cost of reduced training accuracy. Another is to add L1 and/or L2 regularization; documentation for that is here. If your training accuracy is high but your validation accuracy is poor, it usually implies you need more training samples, because the samples you have are not fully representative of the data probability distribution. More training data is always better. I notice you have 10 classes. Look at the balance of your dataset: if the classes have a significantly different number of samples, this can cause problems. There are a bunch of methods to handle that, like over-sampling under-represented classes, under-sampling over-represented classes, or a combination of both. An easy method is to use the class_weight parameter in model.fit, as sketched below. Also look at your validation set and make sure it is not using too many samples from under-represented classes; it is always best to select the validation set randomly from the overall dataset.
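A minimal sketch of the class_weight idea (assuming integer labels in y_train_new as in the original fit call, and that the scikit-learn wrapper passes fit keyword arguments through to model.fit):
import numpy as np
# weight each class inversely to its frequency (the standard "balanced" scheme);
# assumes every class appears at least once in y_train_new
y = np.array(y_train_new)
counts = np.bincount(y)
class_weight = {i: len(y) / (len(counts) * c) for i, c in enumerate(counts)}
history = keras_clf.fit(np.array(X_train_new), y, epochs=100,
                        validation_data=(np.array(X_validation), np.array(y_validation)),
                        class_weight=class_weight,
                        callbacks=[keras.callbacks.EarlyStopping(patience=10), lr_scheduler])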
In TensorFlow we have tf.nn.softmax_cross_entropy_with_logits, which seems to only allow you to use your predicted logits together with one-hot gold labels. However, sometimes we want to compute the cross-entropy of two distributions, i.e., the gold standard is not one-hot. How can I achieve this?
Actually tf.nn.softmax_cross_entropy_with_logits does not impose the restriction that labels must be one-hot encoded, so you can go ahead and use non-one-hot label vectors. You might be confusing it with tf.nn.sparse_softmax_cross_entropy_with_logits, which does impose this restriction.
As for the other part of your question: if you want to compute the cross-entropy between two normalized distributions stored in tensors p and q, you can use the formula yourself, as long as you use tf.math.xlogy so that you get zero whenever x = 0 (even if y = 0 as well). So, letting p and q be two tensors representing distributions normalized across axis 1, you would have:
ce = - tf.reduce_sum(tf.math.xlogy(p, q), axis=1)
On the other hand, it's likely that you actually have some logits that are output by a model (rather than a normalized distribution q computed from those logits). In this case it would be better to compute the cross-entropy by applying log-softmax to your logits:
ce = - tf.reduce_sum(p * tf.nn.log_softmax(logits, axis=1), axis=1)
(thereby avoiding the numerical instability of explicitly computing a softmax distribution and then immediately taking its log). In the typical ML setting, p would be your "labels" and q (or the logits) would be the output of your model. Note that this works fine for non-one-hot labels p.
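As a quick check (assuming TensorFlow 2.x, with made-up p and logits), the log-softmax formula agrees with the built-in op, which also accepts soft labels:
import tensorflow as tf
# hypothetical soft labels (each row sums to 1) and arbitrary logits
p = tf.constant([[0.2, 0.5, 0.3],
                 [0.0, 1.0, 0.0]])
logits = tf.constant([[1.0, 2.0, 0.5],
                      [0.1, -0.3, 2.2]])
ce_manual = -tf.reduce_sum(p * tf.nn.log_softmax(logits, axis=1), axis=1)
ce_builtin = tf.nn.softmax_cross_entropy_with_logits(labels=p, logits=logits)
print(ce_manual.numpy(), ce_builtin.numpy())  # the two should match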
I am new to TensorFlow and am looking at different TensorFlow examples to understand it better.
Now, I have seen these lines used in many TensorFlow examples without any mention of a specific embedding algorithm being used to obtain the word embeddings.
embeddings = tf.Variable(tf.random_uniform((vocab_size, embed_dim), -1, 1))
embed = tf.nn.embedding_lookup(embeddings, input_data)
Here are some examples:
https://github.com/Decalogue/dlnd_tv_script_generation/blob/master/dlnd_tv_script_generation.py
https://github.com/ajmaradiaga/cervantes-text-generation/blob/master/cervants_nn.py
I understand that the first line will initialize the embeddings of the words from a random uniform distribution, but will the embedding vectors be trained further in the model to give a more accurate representation of the words (changing the initial random values to more accurate numbers)? And if yes, what is the actual method being used, given that there is no mention of any obvious embedding method such as word2vec or GloVe in the code (or of feeding the pre-trained vectors from these methods instead of random numbers at the beginning)?
Yes, those embeddings are trained further, just like weights and biases; otherwise, representing words with random values wouldn't make any sense. The embeddings are updated during training the same way you would update a weight matrix, that is, by using optimization methods like gradient descent or the Adam optimizer.
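For illustration, here is a minimal sketch using the Keras Embedding layer (a more modern API than the tf.nn.embedding_lookup snippet in the question; the sizes and task are illustrative assumptions), where the randomly initialized embedding matrix is just another trainable weight:
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embed_dim = 10000, 64  # illustrative sizes
model = keras.Sequential([
    layers.Embedding(vocab_size, embed_dim),   # randomly initialized lookup table
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy')
# model.fit(...) updates the Embedding weights by backpropagation,
# exactly like the Dense layer's weights and biases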
When we use pre-trained embeddings like word2vec, they're already trained on very large datasets and are already quite accurate representations of words, hence they don't need any further training. If you are asking how those are trained: there are two main training algorithms that can be used to learn an embedding from text, Continuous Bag of Words (CBOW) and Skip-Gram. Explaining them completely is not possible here, but I would suggest searching for them; this article might get you started.
I'm new to TensorFlow and would like to know if there is any tutorial or example of multi-label classification with multiple network outputs.
I'm asking this because I have a collection of images in which each image can belong to several classes, and my output needs to have a score for each class.
I also do not know whether TensorFlow expects a particular file pattern for the images and the classes, so if someone has an example it would help a lot.
Thank you.
You can also try transforming your problem from multi-label to multi-class classification using a Label Powerset approach. The Label Powerset transformation treats every label combination attested in the training set as a different class, trains a single multi-class classifier on it, and after prediction converts the assigned classes back to the multi-label case. It is provided in scikit-multilearn; wrap your TensorFlow model in a scikit-learn-compatible classifier (for example via an Estimator with an input_fn, or with skflow) and then just plug it into an instance of LabelPowerset.
The code could go as follows:
from skmultilearn.problem_transform import LabelPowerset
import tensorflow.contrib.learn as skflow
# assume data is loaded and available in X_train/X_test, y_train/y_test
# initialize the LabelPowerset multi-label classifier
# with a TensorFlow DNN base classifier
classifier = LabelPowerset(skflow.TensorFlowDNNClassifier(OPTIONS))
# train
classifier.fit(X_train, y_train)
# predict
predictions = classifier.predict(X_test)
The most naive (and reasonable) approach would be to take a classification network, remove the softmax layer, and replace it with a vector of sigmoids. This way multiple units can have an activation close to 1 at the same time.
You can look at the TF-slim examples for classification networks. Under the datasets path you will find examples of how to prepare the TFExample "file pattern" for images and classes.
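A minimal Keras sketch of that sigmoid-output idea (using a more modern API than the TF-slim examples above; the input shape, layer sizes, and num_classes are illustrative assumptions):
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 5  # illustrative
model = keras.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(64, 64, 3)),
    layers.GlobalAveragePooling2D(),
    layers.Dense(num_classes, activation='sigmoid')  # one independent score per class
])
# binary cross-entropy treats each class as its own yes/no decision
model.compile(optimizer='adam', loss='binary_crossentropy')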
Most solutions refer to a sigmoid loss, and the sigmoid does solve multi-label classification well in my case, via tf.nn.sigmoid_cross_entropy_with_logits(labels, logits) in TensorFlow.
However, when I handled a class-imbalance problem, where there were many more negative cases than positive cases, I found that my edited softsign loss worked much better than the sigmoid. The adjustment coefficient gamma is applied to the labels to lower the negative class's gradient by 3/4.
def unbalance_softsign_loss(labels, logits):
    gamma = 1.25 * labels - 0.25
    res = 1 - tf.log1p(gamma * logits / (1 + tf.abs(logits)))
    return res
where labels are multi-hot encoded vectors like [0, 1, 0, 1, 0] and the logits range over (-inf, inf).
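If it helps, a hypothetical usage sketch with made-up tensors (note that tf.log1p is spelled tf.math.log1p in TF 2.x), reducing the per-element result to a scalar loss:
import tensorflow as tf
labels = tf.constant([[0., 1., 0., 1., 0.]])          # multi-hot targets
logits = tf.constant([[-2.3, 1.7, -0.4, 0.9, -3.1]])  # raw model outputs
loss = tf.reduce_mean(unbalance_softsign_loss(labels, logits))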