TensorFlow Keras fit - validation accuracy and loss both increasing drastically - tensorflow
Ubuntu 20.04
TensorFlow 2.2
Dataset used: MNIST
I am testing TensorFlow and I notice that the validation sparse_categorical_accuracy (accuracy) and the validation SparseCategoricalCrossentropy (loss) are both increasing together, which does not make sense to me. I would expect validation loss to go down and validation accuracy to go up as training progresses, or, in case of overfitting, validation loss to go up and validation accuracy to go down. But here both validation loss and validation accuracy increase as training progresses. The training side, however, behaves as expected: training loss goes down and training accuracy goes up.
Here is the code and the output:
#testing without preprocess monsoon
import tensorflow as tf
from tensorflow import keras as k
from tensorflow.keras import layers as l
import tensorflow_addons as tfa

# Load MNIST and flatten the 28x28 images to 784-dimensional vectors
mnist = tf.keras.datasets.mnist
(x_t, y_t), (x_te, y_te) = mnist.load_data()
x_t = x_t.reshape(60000, -1)
x_te = x_te.reshape(10000, -1)

# Build tf.data pipelines for the training and test splits
d_x_t = tf.data.Dataset.from_tensor_slices(x_t)
d_y_t = tf.data.Dataset.from_tensor_slices(y_t)
dataset = tf.data.Dataset.zip((d_x_t, d_y_t)).shuffle(1000).batch(32)
d_x_te = tf.data.Dataset.from_tensor_slices(x_te)
d_y_te = tf.data.Dataset.from_tensor_slices(y_te)
dataset_test = tf.data.Dataset.zip((d_x_te, d_y_te)).shuffle(1000, seed=42).batch(32)

# Functional model: two 512-unit branches merged with a '+' skip connection, logits output
inp = k.Input((784,))
x = l.BatchNormalization()(inp)
x1 = l.Dense(1024, activation='relu', name='dense_1')(x)
x1 = l.Dropout(0.5)(x1)
x1 = l.BatchNormalization()(x1)
x2 = l.Dense(512, activation='relu', name='dense_2')(x1)
x3 = l.Dense(512, activation='relu', name='dense_3')(x)
x = x3 + x2
x = l.Dropout(0.5)(x)
x = l.BatchNormalization()(x)
x = l.Dense(10, activation='relu', name='dense_4')(x)
predictions = l.Dense(10, activation=None, name='preds')(x)
model = k.Model(inputs=inp, outputs=predictions)

# Adam wrapped in tfa's MovingAverage (weight averaging) with gradient clipping
opt = tfa.optimizers.MovingAverage(
    k.optimizers.Adam(),
    True,
    0.99,
    None,
    'MovingAverage',
    clipnorm=5
)

model.compile(optimizer=opt,
              loss=k.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['sparse_categorical_accuracy'])

print('# Fit model on training data')
history = model.fit(dataset,
                    epochs=30,
                    steps_per_epoch=1875,
                    validation_data=dataset_test,
                    validation_steps=313)
print('\nhistory dict:', history.history)

model.evaluate(dataset_test, batch_size=32, steps=331)
The learning evolution that I am getting is:
# Fit model on training data
Epoch 1/30
WARNING:tensorflow:From /home/nitin/anaconda3/envs/tensorflow/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
1875/1875 [==============================] - 49s 26ms/step - loss: 0.3614 - sparse_categorical_accuracy: 0.8913 - val_loss: 0.3355 - val_sparse_categorical_accuracy: 0.9548
Epoch 2/30
1875/1875 [==============================] - 49s 26ms/step - loss: 0.1899 - sparse_categorical_accuracy: 0.9427 - val_loss: 1.2028 - val_sparse_categorical_accuracy: 0.9641
Epoch 3/30
1875/1875 [==============================] - 51s 27ms/step - loss: 0.1546 - sparse_categorical_accuracy: 0.9521 - val_loss: 1.6385 - val_sparse_categorical_accuracy: 0.9673
Epoch 4/30
1875/1875 [==============================] - 38s 20ms/step - loss: 0.1357 - sparse_categorical_accuracy: 0.9585 - val_loss: 2.8285 - val_sparse_categorical_accuracy: 0.9697
Epoch 5/30
1875/1875 [==============================] - 38s 20ms/step - loss: 0.1253 - sparse_categorical_accuracy: 0.9608 - val_loss: 3.8489 - val_sparse_categorical_accuracy: 0.9697
Epoch 6/30
1875/1875 [==============================] - 29s 16ms/step - loss: 0.1149 - sparse_categorical_accuracy: 0.9646 - val_loss: 2.1872 - val_sparse_categorical_accuracy: 0.9699
Epoch 7/30
1875/1875 [==============================] - 29s 16ms/step - loss: 0.1094 - sparse_categorical_accuracy: 0.9646 - val_loss: 2.9429 - val_sparse_categorical_accuracy: 0.9695
Epoch 8/30
1875/1875 [==============================] - 29s 16ms/step - loss: 0.1066 - sparse_categorical_accuracy: 0.9667 - val_loss: 5.6166 - val_sparse_categorical_accuracy: 0.9710
Epoch 9/30
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0991 - sparse_categorical_accuracy: 0.9688 - val_loss: 3.9547 - val_sparse_categorical_accuracy: 0.9710
Epoch 10/30
1875/1875 [==============================] - 29s 16ms/step - loss: 0.0948 - sparse_categorical_accuracy: 0.9701 - val_loss: 4.8149 - val_sparse_categorical_accuracy: 0.9713
Epoch 11/30
1875/1875 [==============================] - 29s 16ms/step - loss: 0.0850 - sparse_categorical_accuracy: 0.9727 - val_loss: 7.4974 - val_sparse_categorical_accuracy: 0.9712
Epoch 12/30
1875/1875 [==============================] - 29s 16ms/step - loss: 0.0879 - sparse_categorical_accuracy: 0.9719 - val_loss: 4.3669 - val_sparse_categorical_accuracy: 0.9714
Epoch 13/30
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0817 - sparse_categorical_accuracy: 0.9743 - val_loss: 9.2499 - val_sparse_categorical_accuracy: 0.9725
Epoch 14/30
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0805 - sparse_categorical_accuracy: 0.9737 - val_loss: 7.5436 - val_sparse_categorical_accuracy: 0.9716
Epoch 15/30
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0798 - sparse_categorical_accuracy: 0.9751 - val_loss: 14.2331 - val_sparse_categorical_accuracy: 0.9712
Epoch 16/30
1875/1875 [==============================] - 29s 16ms/step - loss: 0.0745 - sparse_categorical_accuracy: 0.9757 - val_loss: 7.9517 - val_sparse_categorical_accuracy: 0.9715
Epoch 17/30
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0745 - sparse_categorical_accuracy: 0.9761 - val_loss: 7.9719 - val_sparse_categorical_accuracy: 0.9702
Epoch 18/30
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0741 - sparse_categorical_accuracy: 0.9763 - val_loss: 13.8696 - val_sparse_categorical_accuracy: 0.9665
Epoch 19/30
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0728 - sparse_categorical_accuracy: 0.9760 - val_loss: 20.2949 - val_sparse_categorical_accuracy: 0.9688
Epoch 20/30
1875/1875 [==============================] - 45s 24ms/step - loss: 0.0699 - sparse_categorical_accuracy: 0.9775 - val_loss: 8.8696 - val_sparse_categorical_accuracy: 0.9713
Epoch 21/30
1875/1875 [==============================] - 29s 16ms/step - loss: 0.0699 - sparse_categorical_accuracy: 0.9777 - val_loss: 12.9682 - val_sparse_categorical_accuracy: 0.9723
Epoch 22/30
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0674 - sparse_categorical_accuracy: 0.9781 - val_loss: 61.1677 - val_sparse_categorical_accuracy: 0.9692
Epoch 23/30
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0651 - sparse_categorical_accuracy: 0.9798 - val_loss: 21.3270 - val_sparse_categorical_accuracy: 0.9697
Epoch 24/30
1875/1875 [==============================] - 31s 16ms/step - loss: 0.0624 - sparse_categorical_accuracy: 0.9800 - val_loss: 62.2778 - val_sparse_categorical_accuracy: 0.9685
Epoch 25/30
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0665 - sparse_categorical_accuracy: 0.9792 - val_loss: 24.9327 - val_sparse_categorical_accuracy: 0.9687
Epoch 26/30
1875/1875 [==============================] - 46s 24ms/step - loss: 0.0605 - sparse_categorical_accuracy: 0.9805 - val_loss: 42.0141 - val_sparse_categorical_accuracy: 0.9700
Epoch 27/30
1875/1875 [==============================] - 29s 16ms/step - loss: 0.0601 - sparse_categorical_accuracy: 0.9806 - val_loss: 54.8586 - val_sparse_categorical_accuracy: 0.9695
Epoch 28/30
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0583 - sparse_categorical_accuracy: 0.9811 - val_loss: 25.3613 - val_sparse_categorical_accuracy: 0.9680
Epoch 29/30
1875/1875 [==============================] - 29s 16ms/step - loss: 0.0576 - sparse_categorical_accuracy: 0.9811 - val_loss: 23.2299 - val_sparse_categorical_accuracy: 0.9710
Epoch 30/30
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0566 - sparse_categorical_accuracy: 0.9817 - val_loss: 16.5671 - val_sparse_categorical_accuracy: 0.9728
history dict: {'loss': [0.36135926842689514, 0.1898646354675293, 0.15456895530223846, 0.13569727540016174, 0.12525275349617004, 0.1148592159152031, 0.10943067818880081, 0.1066298857331276, 0.09912335127592087, 0.09476170688867569, 0.08501157909631729, 0.0879492461681366, 0.08170024305582047, 0.08047273010015488, 0.07976552098989487, 0.07453753799200058, 0.07450901716947556, 0.07413797080516815, 0.07278618961572647, 0.0698995441198349, 0.06988336145877838, 0.06740442663431168, 0.06507138162851334, 0.06242847815155983, 0.0665266141295433, 0.06050613150000572, 0.06005210056900978, 0.05830719694495201, 0.05763527378439903, 0.05664650723338127], 'sparse_categorical_accuracy': [0.8913000226020813, 0.9427499771118164, 0.9521499872207642, 0.9585333466529846, 0.9607999920845032, 0.9645500183105469, 0.9645666480064392, 0.9666833281517029, 0.9687666893005371, 0.9701166749000549, 0.9726999998092651, 0.9719499945640564, 0.9742666482925415, 0.9736999869346619, 0.9750999808311462, 0.9757000207901001, 0.9760833382606506, 0.9763166904449463, 0.9759833216667175, 0.977483332157135, 0.9777166843414307, 0.9780833125114441, 0.9798333048820496, 0.9800000190734863, 0.9792333245277405, 0.9805499911308289, 0.9805999994277954, 0.9810666441917419, 0.9810666441917419, 0.9816833138465881], 'val_loss': [0.33551061153411865, 1.2028071880340576, 1.6384732723236084, 2.828489065170288, 3.8488738536834717, 2.187160015106201, 2.9428975582122803, 5.6166462898254395, 3.954725503921509, 4.814915657043457, 7.4974141120910645, 4.366909503936768, 9.24986457824707, 7.543578147888184, 14.233136177062988, 7.951717853546143, 7.971870422363281, 13.869564056396484, 20.29490089416504, 8.869643211364746, 12.968180656433105, 61.167701721191406, 21.327049255371094, 62.27778625488281, 24.932708740234375, 42.01411437988281, 54.85857009887695, 25.361297607421875, 23.229896545410156, 16.56712532043457], 'val_sparse_categorical_accuracy': [0.954800009727478, 0.9641000032424927, 0.9672999978065491, 0.9696999788284302, 0.9696999788284302, 0.9699000120162964, 0.9695000052452087, 0.9710000157356262, 0.9710000157356262, 0.9713000059127808, 0.9711999893188477, 0.9714000225067139, 0.9725000262260437, 0.9715999960899353, 0.9711999893188477, 0.9714999794960022, 0.9702000021934509, 0.9664999842643738, 0.9688000082969666, 0.9713000059127808, 0.9722999930381775, 0.9692000150680542, 0.9696999788284302, 0.968500018119812, 0.9686999917030334, 0.9700000286102295, 0.9695000052452087, 0.9679999947547913, 0.9710000157356262, 0.9728000164031982]}
302/331 [==========================>...] - ETA: 0s - loss: 17.1192 - sparse_categorical_accuracy: 0.9725WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 331 batches). You may need to use the repeat() function when building your dataset.
313/331 [===========================>..] - 1s 3ms/step - loss: 16.5671 - sparse_categorical_accuracy: 0.9728
[16.567113876342773, 0.9728000164031982]
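Side note on the "ran out of data" warning above: dataset_test holds 10000 samples in batches of 32, so it can only produce 313 batches, while evaluate was asked for steps=331. A minimal sketch of the fix the warning itself suggests, reusing the variables from the code above (either let Keras infer the step count, or make the dataset repeat):
# Option 1: drop the explicit step count and let Keras infer it (10000 / 32 -> 313 batches)
model.evaluate(dataset_test)
# Option 2: repeat the test dataset so it can serve any requested number of steps
model.evaluate(dataset_test.repeat(), steps=313)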
If training loss is decreasing while validation loss is increasing, it is likely you have overfitted the model.
I also have some doubts about this line:
x = x3+x2
As I understand it, you want to create a shortcut (skip) connection. But in Keras you should use the Add layer to do this.
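For illustration, a minimal sketch of that skip connection written with the Add layer, reusing the tensor names from the question:
from tensorflow.keras import layers as l

# Merge the two 512-unit branches with an explicit Add layer
# instead of the raw Python '+' operator
x = l.Add(name='skip_add')([x3, x2])
x = l.Dropout(0.5)(x)
x = l.BatchNormalization()(x)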
Related
What is the meaning of loss < 0?
I am comparing two models: one uses binary_crossentropy as the loss function (Model A), the other uses mean_squared_error (Model B).
Model A)
self.seq_len = 2
in_out_neurons = 50
n_hidden = 500
model = Sequential()
model.add(LSTM(n_hidden, batch_input_shape=(None, self.seq_len, in_out_neurons), return_sequences=True))
model.add(Dense(in_out_neurons, activation="relu"))
optimizer = Adam(learning_rate=0.001)
#model.compile(loss="mean_squared_error", optimizer=optimizer)
model.compile(loss='binary_crossentropy', optimizer=optimizer)
Epoch 1/10 718/718 [==============================] - 32s 42ms/step - loss: -0.0633 - val_loss: -0.0649 Epoch 2/10 718/718 [==============================] - 33s 46ms/step - loss: -0.0632 - val_loss: -0.0572 Epoch 3/10 718/718 [==============================] - 43s 60ms/step - loss: -0.0592 - val_loss: -0.0570 Epoch 4/10 718/718 [==============================] - 51s 71ms/step - loss: -0.0522 - val_loss: -0.0431 Epoch 5/10 718/718 [==============================] - 50s 69ms/step - loss: -0.0566 - val_loss: -0.0535 Epoch 6/10 718/718 [==============================] - 49s 68ms/step - loss: -0.0567 - val_loss: -0.0537 Epoch 7/10 718/718 [==============================] - 48s 67ms/step - loss: -0.0627 - val_loss: -0.0499 Epoch 8/10 718/718 [==============================] - 51s 71ms/step - loss: -0.0621 - val_loss: -0.0614 Epoch 9/10 718/718 [==============================] - 47s 65ms/step - loss: -0.0645 - val_loss: -0.0653 Epoch 10/10 718/718 [==============================] - 43s 60ms/step - loss: -0.0661 - val_loss: -0.0622
Model B)
self.seq_len = 2
in_out_neurons = 50
n_hidden = 500
model = Sequential()
model.add(LSTM(n_hidden, batch_input_shape=(None, self.seq_len, in_out_neurons), return_sequences=True))
model.add(Dense(in_out_neurons, activation="relu"))
optimizer = Adam(learning_rate=0.001)
model.compile(loss="mean_squared_error", optimizer=optimizer)
#model.compile(loss='binary_crossentropy', optimizer=optimizer)
Epoch 1/10 718/718 [==============================] - 36s 48ms/step - loss: 0.0189 - val_loss: 0.0190 Epoch 2/10 718/718 [==============================] - 46s 64ms/step - loss: 0.0188 - val_loss: 0.0189 Epoch 3/10 718/718 [==============================] - 48s 67ms/step - loss: 0.0187 - val_loss: 0.0189 Epoch 4/10 718/718 [==============================] - 58s 81ms/step - loss: 0.0187 - val_loss: 0.0188 Epoch 5/10 718/718 [==============================] - 62s 87ms/step - loss: 0.0186 - val_loss: 0.0188 Epoch 6/10 718/718 [==============================] - 72s 100ms/step - loss: 0.0186 - val_loss: 0.0188 Epoch 7/10 718/718 [==============================] - 73s 102ms/step - loss: 0.0185 - val_loss: 0.0187 Epoch 8/10 718/718 [==============================] - 60s 84ms/step - loss: 0.0185 - val_loss: 0.0187 Epoch 9/10 718/718 [==============================] - 64s 89ms/step - loss: 0.0185 - val_loss: 0.0187 Epoch 10/10 718/718 [==============================] - 64s 89ms/step - loss: 0.0185 - val_loss: 0.0187
Model B's loss is greater than 0, so it can be understood. However, Model A's loss is less than 0. What does that mean?
Cross-entropy is calculated as minus the expected value of the logarithm of the result. Usually it is used after a sigmoid or softmax activation, where all values are <= 1, so their logarithms are <= 0 and the result is >= 0. But you use it after a relu activation, which can give values > 1; that is why you obtain a result < 0. The moral is that the output-layer activation and the loss should correspond to each other and must make sense from the point of view of the task you are trying to solve. Otherwise you may obtain senseless results.
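To make the point concrete, here is a small NumPy illustration of the cross-entropy formula itself (not the exact Keras implementation): once a "probability" produced by relu exceeds 1, its logarithm turns positive and the cross-entropy goes negative.
import numpy as np

y_true = np.array([1.0, 0.0, 1.0])

# After sigmoid/softmax the predictions stay in (0, 1), so -sum(y * log(p)) >= 0
p_ok = np.array([0.9, 0.2, 0.7])
print(-np.sum(y_true * np.log(p_ok)))    # ~0.462, non-negative

# After relu nothing stops a "prediction" from exceeding 1
p_relu = np.array([2.5, 0.2, 3.0])
print(-np.sum(y_true * np.log(p_relu)))  # ~-2.015, negative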
Unclear difference between progress output in TensorFlow 2.3.0 and 1.15.0 in same code
I am new to ML. I have installed two TensorFlow versions, 1.15.0 and 2.3.0, in two anaconda environments (1.15.0 so that I can use my old GTX 660 video card) and saw a difference in the progress output when training the same model. The code is from the book "Deep Learning with Python" by François Chollet:
import numpy as np
import os

data_dir = 'C:/Users/Username/_JupyterDocs/sund/data'
fname = os.path.join(data_dir, 'jena_climate_2009_2016.csv')

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

f = open(fname)
data = f.read()
f.close()

lines = data.split('\n')
header = lines[0].split(',')
lines = lines[1:]

float_data = np.zeros((len(lines), len(header) - 1))
for i, line in enumerate(lines):
    values = [float(x) for x in line.split(',')[1:]]
    float_data[i, :] = values

mean = float_data[:200000].mean(axis=0)
float_data -= mean
std = float_data[:200000].std(axis=0)
float_data /= std

def generator(data, lookback, delay, min_index, max_index, shuffle=False, batch_size=128, step=6):
    if max_index is None:
        max_index = len(data) - delay - 1
    i = min_index + lookback
    while 1:
        if shuffle:
            rows = np.random.randint(min_index + lookback, max_index, size=batch_size)
        else:
            if i + batch_size >= max_index:
                i = min_index + lookback
            rows = np.arange(i, min(i + batch_size, max_index))
            i += len(rows)
        samples = np.zeros((len(rows), lookback // step, data.shape[-1]))
        targets = np.zeros((len(rows),))
        for j, row in enumerate(rows):
            indices = range(rows[j] - lookback, rows[j], step)
            samples[j] = data[indices]
            targets[j] = data[rows[j] + delay][1]
        yield samples, targets

lookback = 1440
step = 6
delay = 144
batch_size = 128

train_gen = generator(float_data, lookback=lookback, delay=delay, min_index=0, max_index=200000, shuffle=True, step=step, batch_size=batch_size)
val_gen = generator(float_data, lookback=lookback, delay=delay, min_index=200001, max_index=300000, step=step, batch_size=batch_size)
test_gen = generator(float_data, lookback=lookback, delay=delay, min_index=300001, max_index=None, step=step, batch_size=batch_size)

val_steps = (300000 - 200001 - lookback) // batch_size
test_steps = (len(float_data) - 300001 - lookback) // batch_size

import time
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import RMSprop

model = Sequential()
model.add(layers.GRU(32, input_shape=(None, float_data.shape[-1])))
model.add(layers.Dense(1))
model.compile(optimizer=RMSprop(), loss='mae')

start = time.perf_counter()
history = model.fit_generator(train_gen,
                              steps_per_epoch=500,
                              epochs=20,
                              validation_data=val_gen,
                              validation_steps=val_steps,
                              verbose=1)
elapsed = time.perf_counter() - start

f = open("C:/Users/Username/Desktop/log1.txt", "a")
f.write('Elapsed %.3f seconds.' % elapsed)
f.close()
print('Elapsed %.3f seconds.' % elapsed)
TF 2.3.0 progress output:
-Warning about deprecation in the output: WARNING:tensorflow:From C:\Users\Username\AppData\Local\Temp/ipykernel_10804/2601851929.py:13: Model.fit_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version. Instructions for updating: Please use Model.fit, which supports generators.
-Output: Epoch 1/20 500/500 [==============================] - 45s 89ms/step - loss: 0.3050 - val_loss: 0.2686 Epoch 2/20 500/500 [==============================] - 45s 90ms/step - loss: 0.2841 - val_loss: 0.2658 Epoch 3/20 500/500 [==============================] - 46s 92ms/step - loss: 0.2771 - val_loss: 0.2653 Epoch 4/20 500/500 [==============================] - 46s 91ms/step - loss: 0.2729 - val_loss: 0.2795 Epoch 5/20 500/500 [==============================] - 45s 90ms/step - loss: 0.2690 - val_loss: 0.2644 Epoch 6/20 500/500 [==============================] - 45s 90ms/step - loss: 0.2632 - val_loss: 0.2673 Epoch 7/20 500/500 [==============================] - 45s 90ms/step - loss: 0.2602 - val_loss: 0.2641 Epoch 8/20 500/500 [==============================] - 45s 90ms/step - loss: 0.2549 - val_loss: 0.2667 Epoch 9/20 500/500 [==============================] - 45s 91ms/step - loss: 0.2507 - val_loss: 0.2768 Epoch 10/20 500/500 [==============================] - 45s 90ms/step - loss: 0.2447 - val_loss: 0.2785 Epoch 11/20 500/500 [==============================] - 45s 90ms/step - loss: 0.2422 - val_loss: 0.2763 Epoch 12/20 500/500 [==============================] - 45s 90ms/step - loss: 0.2354 - val_loss: 0.2794 Epoch 13/20 500/500 [==============================] - 46s 92ms/step - loss: 0.2320 - val_loss: 0.2807 Epoch 14/20 500/500 [==============================] - 45s 89ms/step - loss: 0.2277 - val_loss: 0.2848 Epoch 15/20 500/500 [==============================] - 45s 90ms/step - loss: 0.2222 - val_loss: 0.2909 Epoch 16/20 500/500 [==============================] - 45s 90ms/step - loss: 0.2178 - val_loss: 0.2910 Epoch 17/20 500/500 [==============================] - 45s 89ms/step - loss: 0.2152 - val_loss: 0.2918 Epoch 18/20 500/500 [==============================] - 45s 90ms/step - loss: 0.2112 - val_loss: 0.2917 Epoch 19/20 500/500 [==============================] - 44s 89ms/step - loss: 0.2103 - val_loss: 0.2979 Epoch 20/20 500/500 [==============================] - 45s 89ms/step - loss: 0.2068 - val_loss: 0.2986 Elapsed 904.779 seconds. TF 1.15.0 progress output: -Warning about deprecated in output: WARNING:tensorflow:From C:\Users\Username\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.init (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version. Instructions for updating: If using Keras pass *_constraint arguments to layers. Output: Epoch 1/20 WARNING:tensorflow:From C:\Users\Username\anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\ops\math_grad.py:1424: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.where in 2.0, which has the same broadcast rule as np.where 499/500 [============================>.] - ETA: 0s - loss: 0.3014Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.2285 500/500 [==============================] - 63s 126ms/step - loss: 0.3014 - val_loss: 0.2686 Epoch 2/20 499/500 [============================>.] - ETA: 0s - loss: 0.2836Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.2225 500/500 [==============================] - 62s 123ms/step - loss: 0.2836 - val_loss: 0.2667 Epoch 3/20 499/500 [============================>.] 
- ETA: 0s - loss: 0.2761Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.3162 500/500 [==============================] - 62s 123ms/step - loss: 0.2762 - val_loss: 0.2721 Epoch 4/20 499/500 [============================>.] - ETA: 0s - loss: 0.2731Epoch 1/20 769/500 [==============================================] - 16s 21ms/step - loss: 0.2422 500/500 [==============================] - 62s 124ms/step - loss: 0.2730 - val_loss: 0.2667 Epoch 5/20 499/500 [============================>.] - ETA: 0s - loss: 0.2667Epoch 1/20 769/500 [==============================================] - 16s 21ms/step - loss: 0.3732 500/500 [==============================] - 61s 122ms/step - loss: 0.2667 - val_loss: 0.2663 Epoch 6/20 499/500 [============================>.] - ETA: 0s - loss: 0.2613Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.2088 500/500 [==============================] - 62s 124ms/step - loss: 0.2613 - val_loss: 0.2648 Epoch 7/20 499/500 [============================>.] - ETA: 0s - loss: 0.2544Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.3043 500/500 [==============================] - 62s 125ms/step - loss: 0.2544 - val_loss: 0.2710 Epoch 8/20 499/500 [============================>.] - ETA: 0s - loss: 0.2493Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.2767 500/500 [==============================] - 63s 127ms/step - loss: 0.2493 - val_loss: 0.2717 Epoch 9/20 499/500 [============================>.] - ETA: 0s - loss: 0.2455Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.2336 500/500 [==============================] - 62s 124ms/step - loss: 0.2455 - val_loss: 0.2743 Epoch 10/20 499/500 [============================>.] - ETA: 0s - loss: 0.2406Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.3041 500/500 [==============================] - 63s 126ms/step - loss: 0.2406 - val_loss: 0.2776 Epoch 11/20 499/500 [============================>.] - ETA: 0s - loss: 0.2345Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.2655 500/500 [==============================] - 62s 124ms/step - loss: 0.2344 - val_loss: 0.2779 Epoch 12/20 499/500 [============================>.] - ETA: 0s - loss: 0.2310Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.3085 500/500 [==============================] - 62s 124ms/step - loss: 0.2310 - val_loss: 0.2800 Epoch 13/20 499/500 [============================>.] - ETA: 0s - loss: 0.2271Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.3029 500/500 [==============================] - 64s 127ms/step - loss: 0.2271 - val_loss: 0.2839 Epoch 14/20 499/500 [============================>.] - ETA: 0s - loss: 0.2226Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.3110 500/500 [==============================] - 62s 125ms/step - loss: 0.2226 - val_loss: 0.2886 Epoch 15/20 499/500 [============================>.] - ETA: 0s - loss: 0.2190Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.3329 500/500 [==============================] - 62s 123ms/step - loss: 0.2190 - val_loss: 0.2919 Epoch 16/20 499/500 [============================>.] 
- ETA: 0s - loss: 0.2170Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.3022 500/500 [==============================] - 62s 125ms/step - loss: 0.2170 - val_loss: 0.2937 Epoch 17/20 499/500 [============================>.] - ETA: 0s - loss: 0.2132Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.2463 500/500 [==============================] - 62s 124ms/step - loss: 0.2132 - val_loss: 0.3004 Epoch 18/20 499/500 [============================>.] - ETA: 0s - loss: 0.2101Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.3423 500/500 [==============================] - 62s 124ms/step - loss: 0.2101 - val_loss: 0.3018 Epoch 19/20 499/500 [============================>.] - ETA: 0s - loss: 0.2072Epoch 1/20 769/500 [==============================================] - 17s 23ms/step - loss: 0.2689 500/500 [==============================] - 62s 125ms/step - loss: 0.2073 - val_loss: 0.3045 Epoch 20/20 499/500 [============================>.] - ETA: 0s - loss: 0.2066Epoch 1/20 769/500 [==============================================] - 17s 22ms/step - loss: 0.2809 500/500 [==============================] - 62s 124ms/step - loss: 0.2066 - val_loss: 0.2978 Elapsed 1245.008 seconds. What the two additional progress bar in each epoch in TF 1.15.0 output?
From the documentation:
verbose: Integer. 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch. Default is 1.
The warning is an internal TensorFlow deprecation notice about future versions of TensorFlow; you can safely ignore it, and no action is needed on your side.
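For reference, a minimal sketch of how the verbosity mode is chosen in the fit call from the question (reusing the question's generator and step variables; verbose is the only knob changed here):
# 0 = silent, 1 = progress bar, 2 = one line per epoch
history = model.fit_generator(train_gen,
                              steps_per_epoch=500,
                              epochs=20,
                              validation_data=val_gen,
                              validation_steps=val_steps,
                              verbose=2)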
Keras ReduceLROnPlateau - How to check if it is applied
Please advise how to confirm whether ReduceLROnPlateau has actually been applied and what learning rate was used at each epoch. The patience of ReduceLROnPlateau is set to 2 and it monitors val_loss.
self._history = self.model.fit(
    self.X.shuffle(1000).batch(self.batch_size).prefetch(1),
    epochs=self.num_epochs,
    batch_size=self.batch_size,
    validation_data=self.V.shuffle(1000).batch(self.batch_size).prefetch(1),
    callbacks=[
        EaryStoppingCallback(patience=self.early_stop_patience),
        ReduceLRCallback(patience=self.reduce_lr_patience),  # <---- set to 2
        TensorBoardCallback(self.log_directory)
    ]
)

class ReduceLRCallback(tf.keras.callbacks.ReduceLROnPlateau):
    """Reduce learning rate when a metric has stopped improving.
    See https://keras.io/api/callbacks/reduce_lr_on_plateau/
    """
    def __init__(self, patience=3):
        assert patience > 0
        super().__init__(
            monitor="val_loss",
            factor=0.3,
            patience=patience,
        )
The validation loss increased more than twice during training, but I have not seen any information about whether ReduceLROnPlateau was applied.
_________________________________________________________________
Epoch 1/20 3990/3990 [==============================] - 860s 214ms/step - loss: 0.1705 - accuracy: 0.9386 - val_loss: 0.1626 - val_accuracy: 0.9456 Epoch 2/20 3990/3990 [==============================] - 847s 212ms/step - loss: 0.1618 - accuracy: 0.9412 - val_loss: 0.1433 - val_accuracy: 0.9456 Epoch 3/20 3990/3990 [==============================] - 846s 212ms/step - loss: 0.1593 - accuracy: 0.9425 - val_loss: 0.1478 - val_accuracy: 0.9438 Epoch 4/20 3990/3990 [==============================] - 846s 212ms/step - loss: 0.1567 - accuracy: 0.9427 - val_loss: 0.1428 - val_accuracy: 0.9468 Epoch 5/20 3990/3990 [==============================] - 846s 212ms/step - loss: 0.1558 - accuracy: 0.9425 - val_loss: 0.1502 - val_accuracy: 0.9425 Epoch 6/20 3990/3990 [==============================] - 843s 211ms/step - loss: 0.1554 - accuracy: 0.9433 - val_loss: 0.1453 - val_accuracy: 0.9456 Epoch 7/20 3990/3990 [==============================] - 843s 211ms/step - loss: 0.1482 - accuracy: 0.9454 - val_loss: 0.1362 - val_accuracy: 0.9477 Epoch 8/20 3990/3990 [==============================] - 843s 211ms/step - loss: 0.1475 - accuracy: 0.9449 - val_loss: 0.1373 - val_accuracy: 0.9471 Epoch 9/20 3990/3990 [==============================] - 845s 212ms/step - loss: 0.1468 - accuracy: 0.9460 - val_loss: 0.1362 - val_accuracy: 0.9485 Epoch 10/20 3990/3990 [==============================] - 843s 211ms/step - loss: 0.1448 - accuracy: 0.9462 - val_loss: 0.1344 - val_accuracy: 0.9489 Epoch 11/20 3990/3990 [==============================] - 846s 212ms/step - loss: 0.1447 - accuracy: 0.9458 - val_loss: 0.1346 - val_accuracy: 0.9483 Epoch 12/20 3990/3990 [==============================] - 843s 211ms/step - loss: 0.1444 - accuracy: 0.9460 - val_loss: 0.1342 - val_accuracy: 0.9483
Marco Cerliani's answer will probably solve it for you. That gives console output, so you can verify that the LR was actually reduced. If you need to check the model's learning rate in code, you can use tf.keras.backend.get_value(model.optimizer.lr).
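For example, a minimal sketch (assuming a compiled model as in the question) that logs the learning rate at the end of every epoch, combined with the callback's built-in verbose flag, which prints a message whenever the rate is reduced:
import tensorflow as tf

class LRLogger(tf.keras.callbacks.Callback):
    """Print the optimizer's current learning rate after each epoch."""
    def on_epoch_end(self, epoch, logs=None):
        lr = tf.keras.backend.get_value(self.model.optimizer.lr)
        print(f"Epoch {epoch + 1}: learning rate = {lr:.6g}")

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.3, patience=2,
    verbose=1)  # verbose=1 prints a message each time the LR is reduced

# model.fit(..., callbacks=[reduce_lr, LRLogger()])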
tf.distribute.MirroredStrategy.scope: setting a mismatched vocab_size does not report an error
I am using tf.distribute.MirroredStrategy() to train a TextCNN model, but when I set vocab_size=0 or other wrong values, no error is reported in this mode. When tf.distribute.MirroredStrategy() is not used, a wrong vocab_size immediately reports an error.
Using a wrong value for vocab_size:
model = TextCNN(padding_size, vocab_size-10, embed_size, filter_num, num_classes)
model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(), metrics=['accuracy'])
model.fit(train_dataset, epochs=epoch, validation_data=valid_dataset, callbacks=callbacks)
Error:
2 root error(s) found. (0) Invalid argument: indices[63,10] = 4726 is not in [0, 4726) [[node text_cnn_1/embedding/embedding_lookup (defined at <ipython-input-7-6ef8a4397184>:37) ]] [[Adam/Adam/update/AssignSubVariableOp/_45]] (1) Invalid argument: indices[63,10] = 4726 is not in [0, 4726) [[node text_cnn_1/embedding/embedding_lookup (defined at <ipython-input-7-6ef8a4397184>:37) ]] 0 successful operations. 0 derived errors ignored. [Op:__inference_train_function_234431]
But with strategy.scope() there is no error and it works well:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    print(vocab_size)
    model = TextCNN(padding_size, vocab_size-1000, embed_size, filter_num, num_classes)
    model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(), metrics=['accuracy'])
    model.fit(train_dataset, epochs=epoch, validation_data=valid_dataset, callbacks=callbacks)
The log looks like this (looks very good):
Learning rate for epoch 1 is 0.0010000000474974513 2813/2813 [==============================] - 16s 6ms/step - loss: 0.8097 - accuracy: 0.7418 - val_loss: 0.4567 - val_accuracy: 0.8586 - lr: 0.0010 Epoch 2/15 2813/2813 [==============================] - ETA: 0s - loss: 0.4583 - accuracy: 0.8560 Learning rate for epoch 2 is 0.0010000000474974513 2813/2813 [==============================] - 14s 5ms/step - loss: 0.4583 - accuracy: 0.8560 - val_loss: 0.4051 - val_accuracy: 0.8756 - lr: 0.0010 Epoch 3/15 2810/2813 [============================>.] - ETA: 0s - loss: 0.3909 - accuracy: 0.8768 Learning rate for epoch 3 is 0.0010000000474974513 2813/2813 [==============================] - 14s 5ms/step - loss: 0.3909 - accuracy: 0.8767 - val_loss: 0.3853 - val_accuracy: 0.8844 - lr: 0.0010 Epoch 4/15 2811/2813 [============================>.] - ETA: 0s - loss: 0.2999 - accuracy: 0.9047 Learning rate for epoch 4 is 9.999999747378752e-05 2813/2813 [==============================] - 14s 5ms/step - loss: 0.2998 - accuracy: 0.9047 - val_loss: 0.3700 - val_accuracy: 0.8865 - lr: 1.0000e-04 Epoch 5/15 2807/2813 [============================>.] - ETA: 0s - loss: 0.2803 - accuracy: 0.9114 Learning rate for epoch 5 is 9.999999747378752e-05 2813/2813 [==============================] - 15s 5ms/step - loss: 0.2803 - accuracy: 0.9114 - val_loss: 0.3644 - val_accuracy: 0.8888 - lr: 1.0000e-04 Epoch 6/15 2803/2813 [============================>.] - ETA: 0s - loss: 0.2639 - accuracy: 0.9162 Learning rate for epoch 6 is 9.999999747378752e-05 2813/2813 [==============================] - 14s 5ms/step - loss: 0.2636 - accuracy: 0.9163 - val_loss: 0.3615 - val_accuracy: 0.8896 - lr: 1.0000e-04 Epoch 7/15 2805/2813 [============================>.] - ETA: 0s - loss: 0.2528 - accuracy: 0.9188 Learning rate for epoch 7 is 9.999999747378752e-05 2813/2813 [==============================] - 14s 5ms/step - loss: 0.2526 - accuracy: 0.9189 - val_loss: 0.3607 - val_accuracy: 0.8909 - lr: 1.0000e-04
More simply, like this, it runs with no error:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = Sequential()
    model.add(Embedding(1000, 64, input_length=20))
    test_array = np.random.randint(10000, size=(32, 20))
    model.predict(test_array)
Why???
KerasLayer vs tf.keras.applications performance
I've trained some networks with ResNetV2 50 ( https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/4 ) and it works very well for my datasets. Then I tried tf.keras.applications.ResNet50 and the accuracy is much lower than the other. Here are the two models:
The first (with hub):
base_model = hub.KerasLayer('https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/4', input_shape=(IMAGE_H, IMAGE_W, 3))
base_model.trainable = False
model = tf.keras.Sequential([
    base_model,
    Dense(num_classes, activation='softmax')
])
The second (with keras.applications):
base_model = tf.keras.applications.ResNet50V2(input_shape=(IMAGE_H, IMAGE_W, 3), include_top=False, weights='imagenet', pooling='avg')
base_model.trainable = False
model = tf.keras.Sequential([
    base_model,
    Dense(num_classes, activation='softmax')
])
The optimizer is the same (Adam); the epochs, steps, dataset (train and validation) and learning rate are the same as well. But the first starts with a val_accuracy near 80% and ends with an accuracy near 99%, while the second stays around 85% val_accuracy from the first to the last epoch, as if it were overfitting. I got the same behavior when changing the dataset and parameters for each model. What am I doing wrong?
Both tf.keras.applications.ResNet50V2 and the hub.KerasLayer version of ResNet50V2 are trained on the same ImageNet dataset and share the same weights; there is no difference between the two. Coming to the accuracy difference, I tried both APIs to load the base model and ran each for 10 epochs. I saw a minor difference in both training and validation accuracy. Then, after running the TensorFlow Hub base model a second time, I again found a minor change in accuracy. This may be due to the order of computation and could also be floating-point precision error. Below are the outputs:
tf.keras.applications.ResNet50V2 as base model:
base_model = tf.keras.applications.ResNet50V2(input_shape=(96, 96, 3), include_top=False, weights='imagenet', pooling='avg')
base_model.trainable = False
model = tf.keras.Sequential([
    base_model,
    Dense(10, activation='softmax')
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
H = model.fit_generator(aug.flow(trainX, trainY, batch_size=16),
                        validation_data=(testX, testY),
                        steps_per_epoch=len(trainX) // 16,
                        epochs=10)
Output 1:
Epoch 1/10 103/103 [==============================] - 55s 535ms/step - loss: 1.5820 - accuracy: 0.4789 - val_loss: 0.9162 - val_accuracy: 0.6949 Epoch 2/10 103/103 [==============================] - 57s 554ms/step - loss: 0.9539 - accuracy: 0.6534 - val_loss: 0.8376 - val_accuracy: 0.6852 Epoch 3/10 103/103 [==============================] - 55s 532ms/step - loss: 0.8610 - accuracy: 0.6944 - val_loss: 0.7104 - val_accuracy: 0.7240 Epoch 4/10 103/103 [==============================] - 55s 533ms/step - loss: 0.7671 - accuracy: 0.7214 - val_loss: 0.5988 - val_accuracy: 0.7918 Epoch 5/10 103/103 [==============================] - 55s 536ms/step - loss: 0.6994 - accuracy: 0.7526 - val_loss: 0.6029 - val_accuracy: 0.7676 Epoch 6/10 103/103 [==============================] - 55s 537ms/step - loss: 0.6880 - accuracy: 0.7508 - val_loss: 0.6121 - val_accuracy: 0.7724 Epoch 7/10 103/103 [==============================] - 55s 533ms/step - loss: 0.6588 - accuracy: 0.7593 - val_loss: 0.5486 - val_accuracy: 0.8015 Epoch 8/10 103/103 [==============================] - 55s 534ms/step - loss: 0.6640 - accuracy: 0.7630 - val_loss: 0.5287 - val_accuracy: 0.8232 Epoch 9/10 103/103 [==============================] - 54s 528ms/step - loss: 0.6004 - accuracy: 0.7881 - val_loss: 0.4598 - val_accuracy: 0.8426 Epoch 10/10 103/103 [==============================] - 55s 530ms/step - loss: 0.5583 - accuracy: 0.8016 - val_loss: 0.4605 - val_accuracy: 0.8426
TensorFlow Hub as base model:
tf.keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)
base_model = hub.KerasLayer('https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/4', input_shape=(96, 96, 3))
base_model.trainable = False
model = tf.keras.Sequential([base_model, Dense(10, activation='softmax')])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
H = model.fit_generator(aug.flow(trainX, trainY, batch_size=16),
                        validation_data=(testX, testY),
                        steps_per_epoch=len(trainX) // 16,
                        epochs=10)
Output 2:
Epoch 1/10 103/103 [==============================] - 54s 526ms/step - loss: 1.7543 - accuracy: 0.4464 - val_loss: 1.0185 - val_accuracy: 0.5981 Epoch 2/10 103/103 [==============================] - 53s 519ms/step - loss: 0.9827 - accuracy: 0.6283 - val_loss: 1.0067 - val_accuracy: 0.6416 Epoch 3/10 103/103 [==============================] - 56s 548ms/step - loss: 0.8719 - accuracy: 0.6944 - val_loss: 0.7195 - val_accuracy: 0.7240 Epoch 4/10 103/103 [==============================] - 54s 521ms/step - loss: 0.8177 - accuracy: 0.7208 - val_loss: 0.7490 - val_accuracy: 0.7385 Epoch 5/10 103/103 [==============================] - 54s 522ms/step - loss: 0.7641 - accuracy: 0.7379 - val_loss: 0.6325 - val_accuracy: 0.7797 Epoch 6/10 103/103 [==============================] - 54s 523ms/step - loss: 0.6551 - accuracy: 0.7655 - val_loss: 0.6431 - val_accuracy: 0.7579 Epoch 7/10 103/103 [==============================] - 55s 530ms/step - loss: 0.6538 - accuracy: 0.7734 - val_loss: 0.5824 - val_accuracy: 0.7797 Epoch 8/10 103/103 [==============================] - 55s 536ms/step - loss: 0.6387 - accuracy: 0.7691 - val_loss: 0.6254 - val_accuracy: 0.7772 Epoch 9/10 103/103 [==============================] - 56s 540ms/step - loss: 0.6394 - accuracy: 0.7685 - val_loss: 0.6539 - val_accuracy: 0.7554 Epoch 10/10 103/103 [==============================] - 55s 536ms/step - loss: 0.5816 - accuracy: 0.7955 - val_loss: 0.5703 - val_accuracy: 0.7990
Now if I run the TensorFlow Hub model again:
Output 3:
Epoch 1/10 103/103 [==============================] - 55s 534ms/step - loss: 1.6412 - accuracy: 0.4764 - val_loss: 1.0697 - val_accuracy: 0.5738 Epoch 2/10 103/103 [==============================] - 54s 528ms/step - loss: 1.0312 - accuracy: 0.6412 - val_loss: 1.0196 - val_accuracy: 0.6077 Epoch 3/10 103/103 [==============================] - 59s 570ms/step - loss: 0.8710 - accuracy: 0.6975 - val_loss: 0.7088 - val_accuracy: 0.7240 Epoch 4/10 103/103 [==============================] - 54s 529ms/step - loss: 0.8108 - accuracy: 0.7128 - val_loss: 0.6539 - val_accuracy: 0.7458 Epoch 5/10 103/103 [==============================] - 54s 522ms/step - loss: 0.7311 - accuracy: 0.7440 - val_loss: 0.6029 - val_accuracy: 0.7676 Epoch 6/10 103/103 [==============================] - 54s 523ms/step - loss: 0.6683 - accuracy: 0.7612 - val_loss: 0.6621 - val_accuracy: 0.7506 Epoch 7/10 103/103 [==============================] - 54s 527ms/step - loss: 0.6518 - accuracy: 0.7753 - val_loss: 0.6166 - val_accuracy: 0.7700 Epoch 8/10 103/103 [==============================] - 54s 524ms/step - loss: 0.6147 - accuracy: 0.7795 - val_loss: 0.5611 - val_accuracy: 0.7797 Epoch 9/10 103/103 [==============================] - 54s 522ms/step - loss: 0.6671 - accuracy: 0.7667 - val_loss: 0.5126 - val_accuracy: 0.8087 Epoch 10/10 103/103 [==============================] - 54s 525ms/step - loss: 0.6090 - accuracy: 0.7832 - val_loss: 0.6355 - val_accuracy: 0.7627
Hope this answers your question.