I am trying to train YOLOv4 in Colab using weights I have already saved, but the training process stops abruptly after loading them. Below is the log; after "Create 6 permanent cpu-threads", execution stops:
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Total BFLOPS 59.563
avg_outputs = 489778
Allocate additional workspace_size = 52.43 MB
Loading weights from
/content/drive/MyDrive/YOLOv4_weight/backup/yolov4_custom_train_final.weights...
seen 64, trained: 192 K-images (3 Kilo-batches_64)
Done! Loaded 162 layers from weights-file
Learning Rate: 0.01, Momentum: 0.949, Decay: 0.0005
Detection layer: 139 - type = 27
Detection layer: 150 - type = 27
Detection layer: 161 - type = 27
Saving weights to backup/yolov4_custom_train_final.weights
Create 6 permanent cpu-threads
The weights I am trying to use were generated after 3000 epochs, with yolov4.conv.137 as the initial weights; since the accuracy was low, I want to continue training for 5000 epochs with the saved weights.
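For reference, this is roughly how I resume training in Colab; the obj.data and .cfg names below are placeholders for my own files, and only the weights path is taken from the log above:

# resume training from the previously saved weights
# (max_batches in the .cfg controls how many iterations darknet trains for in total)
!./darknet detector train data/obj.data cfg/yolov4_custom_train.cfg \
    /content/drive/MyDrive/YOLOv4_weight/backup/yolov4_custom_train_final.weights -dont_show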
Hi everyone,
I just ran the Transformer from the TensorFlow tutorial on Google Colaboratory. After training for 20 epochs, I got this model performance:
Epoch 20 Batch 700 Loss 0.5509 Accuracy 0.3449
Saving checkpoint for epoch 20 at ./checkpoints/train/ckpt-4
Epoch 20 Loss 0.5510 Accuracy 0.3449
Time taken for 1 epoch: 63.83616662025452 secs
Why is the accuracy so low, and how can I improve it?
You can train the model for 100 epochs to improve the accuracy.
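Assuming your notebook follows the tutorial's custom training loop, which is driven by an EPOCHS constant, the change is just the following (a sketch, not the tutorial's full loop):

EPOCHS = 100   # the tutorial uses 20; this simply makes more passes over the data
for epoch in range(EPOCHS):
    ...        # the tutorial's existing batch loop and checkpointing go here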
I am new to machine learning and LSTMs. I am referring to this link, LSTM for multistep forecasting, specifically the Encoder-Decoder LSTM Model With Multivariate Input section.
Here is my dataset description after reshaping the train and test sets.
print(dataset.shape)
print(train_x.shape, train_y.shape)
print(test.shape)
(2192, 15)
(1806, 14, 14) (1806, 7, 1)
(364, 15)
In the above I have n_input=14 and n_out=7.
Here is my LSTM model definition:
# imports needed for this model (Keras with the TensorFlow backend)
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

def build_model(train, n_input):
    # prepare data
    train_x, train_y = to_supervised(train, n_input)
    # define parameters
    verbose, epochs, batch_size = 2, 100, 16
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam')
    # fit network
    model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model
On evaluating the model, I get the following output:
Epoch 98/100
- 8s - loss: 64.6554
Epoch 99/100
- 7s - loss: 64.4012
Epoch 100/100
- 7s - loss: 63.9625
According to my understanding (please correct me if I am wrong):
Here my model accuracy is 63.9625 (looking at the last epoch, 100). Also, it is not stable, since there is a gap between epoch 99 and epoch 100.
Here are my questions:
How are the epochs and batch size defined above related to model accuracy? How does increasing or decreasing them affect model accuracy?
Are the epochs, batch size, and n_input defined above correct for this model?
How can I increase my model accuracy? Is the above dataset size good enough for this model?
I am not able to connect all these parameters; kindly help me understand how to achieve better accuracy with the above factors.
Having a very large number of epochs will not necessarily improve your accuracy. More epochs can increase the accuracy up to a certain limit, beyond which you begin to overfit your model; having very few will result in underfitting. See this. So looking at the huge difference between epoch 99 and epoch 100, you can already tell that you are overfitting the model. As a rule of thumb, when you notice the accuracy stops increasing, that is the ideal number of epochs you should have, usually between 1 and 10; 100 already seems too much.
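One way to find that stopping point automatically is Keras' EarlyStopping callback; a minimal sketch, where the validation_split and patience values are assumptions rather than something from your code:

from keras.callbacks import EarlyStopping

# stop once the validation loss has not improved for 5 epochs and keep the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
model.fit(train_x, train_y, epochs=100, batch_size=16, verbose=2,
          validation_split=0.2, callbacks=[early_stop])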
Batch size does not affect your accuracy. It is mainly used to control the speed of training based on the memory of your GPU: if you have a lot of memory, you can use a large batch size so training is faster.
What you can do to increase your accuracy is:
1. Increase the size of your training dataset.
2. Try using convolutional networks instead (a minimal sketch follows after this list). You can find more on convolutional networks from this YouTube channel; in a nutshell, CNNs help you identify which features to focus on while training your model.
3. Try other algorithms.
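Here is the sketch mentioned in point 2, reusing the shapes from your question (train_x of shape (samples, 14, 14), train_y of shape (samples, 7, 1)); the filter sizes and epoch count are assumptions:

from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

def build_cnn_model(train_x, train_y):
    n_timesteps, n_features = train_x.shape[1], train_x.shape[2]
    n_outputs = train_y.shape[1]
    model = Sequential()
    # 1D convolutions over the 14 time steps, treating the 14 variables as channels
    model.add(Conv1D(64, kernel_size=3, activation='relu',
                     input_shape=(n_timesteps, n_features)))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(100, activation='relu'))
    model.add(Dense(n_outputs))  # one value per forecast step
    model.compile(loss='mse', optimizer='adam')
    # flatten the 3D targets (samples, 7, 1) to (samples, 7) for this output head
    model.fit(train_x, train_y.reshape((train_y.shape[0], n_outputs)),
              epochs=50, batch_size=16, verbose=2)
    return model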
There is no well-defined formula for batch size. Typically a larger batch size will run faster, but may compromise your accuracy. You will have to play around with the number.
However, one component with regard to epochs that you are missing is validation. It is normal to have a validation dataset and observe whether the accuracy on this dataset goes up or down. When it stops improving, you can multiply your learning rate by 0.8. See this link: https://machinelearningmastery.com/difference-test-validation-datasets/
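A minimal sketch of adding a validation split and that learning-rate decay to the fit call from the question (the 0.2 split and the patience value are assumptions):

from keras.callbacks import ReduceLROnPlateau

# hold out 20% of the training data for validation and multiply the learning rate
# by 0.8 whenever the validation loss stops improving
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.8, patience=5)
model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose,
          validation_split=0.2, callbacks=[reduce_lr])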
I am adding my RNN text-classification model. I am using the last state to classify the text. The dataset is small, and I am using GloVe vectors for the embedding.
def rnn_inputs(FLAGS, input_data):
    with tf.variable_scope('rnn_inputs', reuse=True):
        W_input = tf.get_variable("W_input", [FLAGS.en_vocab_size, FLAGS.num_hidden_units])
        embeddings = tf.nn.embedding_lookup(W_input, input_data)
    return embeddings

self.inputs_X = tf.placeholder(tf.int32, shape=[None, None, FLAGS.num_dim_input], name='inputs_X')
self.targets_y = tf.placeholder(tf.float32, shape=[None, None], name='targets_y')
self.dropout = tf.placeholder(tf.float32, name='dropout')
self.seq_leng = tf.placeholder(tf.int32, shape=[None, ], name='seq_leng')

with tf.name_scope("RNNcell"):
    stacked_cell = rnn_cell(FLAGS, self.dropout)

with tf.name_scope("Inputs"):
    with tf.variable_scope('rnn_inputs'):
        W_input = tf.get_variable("W_input", [FLAGS.en_vocab_size, FLAGS.num_hidden_units],
                                  initializer=tf.truncated_normal_initializer(stddev=0.1))
    inputs = rnn_inputs(FLAGS, self.inputs_X)
    # initial_state = stacked_cell.zero_state(FLAGS.batch_size, tf.float32)

with tf.name_scope("DynamicRnn"):
    # flat_inputs = tf.reshape(inputs, [FLAGS.batch_size, -1, FLAGS.num_hidden_units])
    flat_inputs = tf.transpose(tf.reshape(inputs, [-1, FLAGS.batch_size, FLAGS.num_hidden_units]), perm=[1, 0, 2])
    all_outputs, state = tf.nn.dynamic_rnn(cell=stacked_cell, inputs=flat_inputs,
                                           sequence_length=self.seq_leng, dtype=tf.float32)
    outputs = state[0]

with tf.name_scope("Logits"):
    with tf.variable_scope('rnn_softmax'):
        W_softmax = tf.get_variable("W_softmax", [FLAGS.num_hidden_units, FLAGS.num_classes])
        b_softmax = tf.get_variable("b_softmax", [FLAGS.num_classes])
    logits = rnn_softmax(FLAGS, outputs)
    probabilities = tf.nn.softmax(logits, name="probabilities")
    self.accuracy = tf.equal(tf.argmax(self.targets_y, 1), tf.argmax(logits, 1))

with tf.name_scope("Loss"):
    self.loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=self.targets_y))

with tf.name_scope("Grad"):
    self.lr = tf.Variable(0.0, trainable=False)
    trainable_vars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(self.loss, trainable_vars), FLAGS.max_gradient_norm)
    optimizer = tf.train.AdamOptimizer(self.lr)
    self.train_optimizer = optimizer.apply_gradients(zip(grads, trainable_vars))

sampling_outputs = all_outputs[0]
sampling_logits = rnn_softmax(FLAGS, sampling_outputs)
self.sampling_probabilities = tf.nn.softmax(sampling_logits)
Output printed:
EPOCH 7 SUMMARY 40 STEP
Training loss 0.439
Training accuracy 0.247
----------------------
Validation loss 0.452
Validation accuracy 0.234
----------------------
Saving the model.
EPOCH 8 SUMMARY 45 STEP
Training loss 0.429
Training accuracy 0.281
----------------------
Validation loss 0.462
Validation accuracy 0.203
----------------------
Saving the model.
EPOCH 9 SUMMARY 50 STEP
Training loss 0.428
Training accuracy 0.268
----------------------
Validation loss 0.465
Validation accuracy 0.188
----------------------
Saving the model.
EPOCH 10 SUMMARY 55 STEP
Training loss 0.424
Training accuracy 0.284
----------------------
Validation loss 0.455
Validation accuracy 0.172
----------------------
Saving the model.
EPOCH 11 SUMMARY 60 STEP
Training loss 0.421
Training accuracy 0.305
----------------------
Validation loss 0.461
Validation accuracy 0.156
----------------------
Saving the model.
EPOCH 12 SUMMARY 65 STEP
Training loss 0.418
Training accuracy 0.299
----------------------
Validation loss 0.462
Validation accuracy 0.141
----------------------
Saving the model.
EPOCH 13 SUMMARY 70 STEP
Training loss 0.416
Training accuracy 0.286
----------------------
Validation loss 0.462
Validation accuracy 0.156
----------------------
Saving the model.
EPOCH 14 SUMMARY 75 STEP
Training loss 0.413
Training accuracy 0.323
----------------------
Validation loss 0.468
Validation accuracy 0.141
----------------------
Saving the model.
After 165 epochs:
EPOCH 165 SUMMARY 830 STEP
Training loss 0.306
Training accuracy 0.544
----------------------
Validation loss 0.547
Validation accuracy 0.109
----------------------
Saving the model.
If the training loss goes down but the validation loss goes up, it is likely that you are running into the problem of overfitting. This means: generally speaking, it is not that hard for a machine learning algorithm to perform exceptionally well on the training set (i.e. the training loss is very low); if the algorithm just memorizes the training data set, it will produce a perfect score.
The challenge in machine learning, however, is to devise a model that performs well on unseen data, i.e. data that was not presented to the algorithm during training. This is what your validation set represents. If a model performs well on unseen data, we say that it generalizes well; if it performs well only on the training data, we call this overfitting. A model that does not generalize well is essentially useless, as it has not learned anything about the underlying structure of the data but has just memorized the training set. That makes it useless in practice, because a trained model will be used on new data and practically never on data seen during training.
So how can you prevent that?
Reduce your model's capacity, i.e. take a simpler model and see if it can still accomplish the task. A less powerful model is less susceptible to simply memorizing the data. Cf. also Occam's razor.
Use regularization: for example, add dropout to your model, or add the L1 or L2 norm of your trainable parameters to your loss function (a short sketch follows below).
To get more information about this, search online for regularization, overfitting, etc.
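A minimal sketch of the second option in the same TF 1.x style as your graph; the l2_lambda value and the exact placement (right after outputs = state[0]) are assumptions, not part of your original model:

# apply dropout to the last RNN state before the softmax layer
outputs = tf.nn.dropout(outputs, keep_prob=self.dropout)

# add an L2 penalty on the trainable parameters to the existing loss
l2_lambda = 1e-4  # assumed regularization strength
l2_penalty = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
self.loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=self.targets_y)
) + l2_lambda * l2_penalty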
Firstly, I would like to apologise, as I am not allowed to post much code because it is for a university project, but I am seriously stuck.
I am trying to train a ConvNet using the CIFAR-10 dataset with TensorFlow using the following model:
Image: [32,32,3]
conv1: 5,5,3,64 + bias[64](initialised to 0.0's)
norm1: depth_radius=4, bias=1.0, alpha=0.001/9.0, beta=0.75
pool1: ksize=[1,3,3,1], strides=[1,2,2,1], padding=SAME
conv2: 5,5,64,64 + bias[64](initialised to 0.1's)
pool2: ksize=[1,3,3,1], strides=[1,2,2,1], padding=SAME
norm2: depth_radius=4, bias=1.0, alpha=0.001/9.0, beta=0.75
local1: 8*8*64, 384 + bias[384](initialised to 0.1's)
local2: 384, 192 + bias[192](initialised to 0.1's)
dropout: keep_prob=0.5
softmax: [192,10] + bias[10](initialised to 0.0's)
However, the results I'm getting are (with batches of 1000):
step 0, training accuracy 0.09
step 1, training accuracy 0.096
step 2, training accuracy 0.1
step 3, training accuracy 0.108
step 4, training accuracy 0.122
step 5, training accuracy 0.094
step 6, training accuracy 0.086
step 7, training accuracy 0.082
step 8, training accuracy 0.104
step 9, training accuracy 0.09
I'm using the following to update weights:
cross_entropy = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(y_conv + 1e-10, y_))
train_step = tf.train.AdamOptimizer(0.0001).minimize(cross_entropy)
This is the guide I've been reading: https://www.tensorflow.org/versions/r0.11/tutorials/deep_cnn/index.html#convolutional-neural-networks
I have tried varying the learning rate from 1e-1 to 1e-8, but no luck.
Any help is greatly appreciated. Thanks in advance.
Use tf.nn.sparse_softmax_cross_entropy_with_logits instead of tf.nn.softmax_cross_entropy_with_logits.
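A minimal sketch of that swap, assuming y_ holds integer class indices of shape [batch_size] rather than one-hot vectors:

# labels are class indices (0-9); logits are passed unmodified, so the + 1e-10 is no longer needed
cross_entropy = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y_conv, labels=y_))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)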
You could also try to do more things with your dataset (a short sketch follows this list):
normalize your images
shuffle your training dataset so that the batches are closer to i.i.d. (independent and identically distributed)
try grayscale images to get a baseline for your model
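A minimal sketch of the first two points, assuming the CIFAR-10 images and labels sit in NumPy arrays called train_x and train_y (the names are illustrative):

import numpy as np

# scale pixels to [0, 1], then normalize each channel to zero mean and unit variance
train_x = train_x.astype(np.float32) / 255.0
train_x = (train_x - train_x.mean(axis=(0, 1, 2))) / train_x.std(axis=(0, 1, 2))

# reshuffle the training set (repeat this at the start of every epoch)
perm = np.random.permutation(len(train_x))
train_x, train_y = train_x[perm], train_y[perm]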
I am training a bi-directional LSTM network, but when I train it, I get the output below:
"
Iter 3456, Minibatch Loss= 10.305597, Training Accuracy= 0.25000
Iter 3840, Minibatch Loss= 22.018646, Training Accuracy= 0.00000
Iter 4224, Minibatch Loss= 34.245750, Training Accuracy= 0.00000
Iter 4608, Minibatch Loss= 13.833059, Training Accuracy= 0.12500
Iter 4992, Minibatch Loss= 19.687658, Training Accuracy= 0.00000
"
Even when the iteration count reaches 500,000, the loss and accuracy stay nearly the same. My settings are below:
# Parameters
learning_rate = 0.0001
training_iters = 20000#120000
batch_size = tf.placeholder(dtype=tf.int32)#24,128
display_step = 16#10
test_num = 275#245
keep_prob = tf.placeholder("float") #probability for dropout
kp = 1.0
# Network Parameters
n_input = 240*160 #28 # MNIST data input (img shape: 28*28)
n_steps = 16 #28 # timesteps
n_hidden = 500 # hidden layer num of features
n_classes = 20
Is this a problem with the technique or the training scheme?
The first thing I would try is varying the learning rate to see if you can get the loss to decrease.
It may also be helpful to compare the accuracy with some baselines (e.g. are you better than predicting the most frequent class in a classification problem).
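A minimal sketch of the most-frequent-class baseline, assuming an array of integer training labels called train_labels (that name is not in your code):

import numpy as np

values, counts = np.unique(train_labels, return_counts=True)
majority_class = values[np.argmax(counts)]
baseline_accuracy = counts.max() / float(counts.sum())
print("Always predicting class %d gives accuracy %.3f" % (majority_class, baseline_accuracy))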
If your loss is not decreasing at all for a wide range of learning rates I would start looking for bugs in the code (e.g. is the training op that updates weights actually run, do features and labels match, is your data randomized properly, ...).
Whether there is a problem with the technique (a bidirectional LSTM) depends on what task you are trying to accomplish. If you are actually applying this to MNIST (based on the comment in your code), then I would recommend some convolutional and max-pooling layers rather than an RNN.