What is wrong with this model? I am new to this. Did I compile the model incorrectly, or is it the structure itself?
Here's what the code looked like:
y_set was defined as a list of floats, [54.7, 52.5, 51.4, 51.5, 50.5], and so was x_set, [0, 1.5, 2, 2.5, 3.5].
Here's the code and the results for the training:
model = Sequential([
    Dense(units=1, input_shape=[1]),
    Dense(units=60, activation='relu'),
    Dense(units=1)])
model.compile(optimizer='sgd', loss='mean_squared_error', metrics=['mae'])
model.fit(x_set, y_set, epochs=10)
This was the output it gave me:
Epoch 1/10
1/1 [==============================] - 0s 8ms/step - loss: 1519.2493 - mae: 37.8005
Epoch 2/10
1/1 [==============================] - 0s 9ms/step - loss: 577948.8750 - mae: 674.4330
Epoch 3/10
1/1 [==============================] - 0s 8ms/step - loss: 159614431746567700480.0000 - mae: 11284396032.0000
Epoch 4/10
1/1 [==============================] - 0s 8ms/step - loss: inf - mae: inf
Epoch 5/10
1/1 [==============================] - 0s 9ms/step - loss: nan - mae: nan
Epoch 6/10
1/1 [==============================] - 0s 8ms/step - loss: nan - mae: nan
Epoch 7/10
1/1 [==============================] - 0s 8ms/step - loss: nan - mae: nan
Epoch 8/10
1/1 [==============================] - 0s 9ms/step - loss: nan - mae: nan
Epoch 9/10
1/1 [==============================] - 0s 10ms/step - loss: nan - mae: nan
Epoch 10/10
1/1 [==============================] - 0s 8ms/step - loss: nan - mae: nan
Obviously, it does not work.
You need to apply activation functions to the layers in the model definition to fix this issue.
Please check this fixed code:
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
x_set = np.array([54.7, 52.5, 51.4, 51.5, 50.5])
y_set = np.array([0, 1.5, 2, 2.5, 3.5])
model = Sequential([
    Dense(units=1, input_shape=[1,], activation='relu'),
    Dense(units=60, activation='relu'),
    Dense(units=1, activation='sigmoid')])
model.compile(optimizer = 'sgd', loss = 'mean_squared_error', metrics = ['mae'])
model.fit(x_set, y_set, epochs = 10)
Output:
Epoch 1/10
1/1 [==============================] - 1s 1s/step - loss: 2.1748 - mae: 1.3081
Epoch 2/10
1/1 [==============================] - 0s 13ms/step - loss: 2.1692 - mae: 1.3063
Epoch 3/10
1/1 [==============================] - 0s 17ms/step - loss: 2.1656 - mae: 1.3051
Epoch 4/10
1/1 [==============================] - 0s 17ms/step - loss: 2.1632 - mae: 1.3043
Epoch 5/10
1/1 [==============================] - 0s 18ms/step - loss: 2.1614 - mae: 1.3037
Epoch 6/10
1/1 [==============================] - 0s 29ms/step - loss: 2.1600 - mae: 1.3033
Epoch 7/10
1/1 [==============================] - 0s 30ms/step - loss: 2.1589 - mae: 1.3029
Epoch 8/10
1/1 [==============================] - 0s 10ms/step - loss: 2.1581 - mae: 1.3027
Epoch 9/10
1/1 [==============================] - 0s 12ms/step - loss: 2.1573 - mae: 1.3024
Epoch 10/10
1/1 [==============================] - 0s 12ms/step - loss: 2.1567 - mae: 1.3022
<keras.callbacks.History at 0x7f3b865a75d0>
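For completeness, a small hedged usage sketch (the input value below is illustrative, not from the original post): once fit has finished, the model can be queried with model.predict.
import numpy as np
# Illustrative only: query the fitted model with one new input value.
# With a sigmoid on the output layer, the prediction always lies in (0, 1).
print(model.predict(np.array([[2.0]])))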
I am comparing two models: one uses binary_crossentropy as the loss (Model A), the other uses mean_squared_error (Model B).
Model A)
self.seq_len = 2
in_out_neurons = 50
n_hidden = 500
model = Sequential()
model.add(LSTM(n_hidden, batch_input_shape=(None, self.seq_len, in_out_neurons), return_sequences=True))
model.add(Dense(in_out_neurons, activation="relu"))
optimizer = Adam(learning_rate=0.001)
#model.compile(loss="mean_squared_error", optimizer=optimizer)
model.compile(loss='binary_crossentropy', optimizer=optimizer)
Epoch 1/10
718/718 [==============================] - 32s 42ms/step - loss: -0.0633 - val_loss: -0.0649
Epoch 2/10
718/718 [==============================] - 33s 46ms/step - loss: -0.0632 - val_loss: -0.0572
Epoch 3/10
718/718 [==============================] - 43s 60ms/step - loss: -0.0592 - val_loss: -0.0570
Epoch 4/10
718/718 [==============================] - 51s 71ms/step - loss: -0.0522 - val_loss: -0.0431
Epoch 5/10
718/718 [==============================] - 50s 69ms/step - loss: -0.0566 - val_loss: -0.0535
Epoch 6/10
718/718 [==============================] - 49s 68ms/step - loss: -0.0567 - val_loss: -0.0537
Epoch 7/10
718/718 [==============================] - 48s 67ms/step - loss: -0.0627 - val_loss: -0.0499
Epoch 8/10
718/718 [==============================] - 51s 71ms/step - loss: -0.0621 - val_loss: -0.0614
Epoch 9/10
718/718 [==============================] - 47s 65ms/step - loss: -0.0645 - val_loss: -0.0653
Epoch 10/10
718/718 [==============================] - 43s 60ms/step - loss: -0.0661 - val_loss: -0.0622
Model B)
self.seq_len = 2
in_out_neurons = 50
n_hidden = 500
model = Sequential()
model.add(LSTM(n_hidden, batch_input_shape=(None, self.seq_len, in_out_neurons), return_sequences=True))
model.add(Dense(in_out_neurons, activation="relu"))
optimizer = Adam(learning_rate=0.001)
model.compile(loss="mean_squared_error", optimizer=optimizer)
#model.compile(loss='binary_crossentropy', optimizer=optimizer)
Epoch 1/10
718/718 [==============================] - 36s 48ms/step - loss: 0.0189 - val_loss: 0.0190
Epoch 2/10
718/718 [==============================] - 46s 64ms/step - loss: 0.0188 - val_loss: 0.0189
Epoch 3/10
718/718 [==============================] - 48s 67ms/step - loss: 0.0187 - val_loss: 0.0189
Epoch 4/10
718/718 [==============================] - 58s 81ms/step - loss: 0.0187 - val_loss: 0.0188
Epoch 5/10
718/718 [==============================] - 62s 87ms/step - loss: 0.0186 - val_loss: 0.0188
Epoch 6/10
718/718 [==============================] - 72s 100ms/step - loss: 0.0186 - val_loss: 0.0188
Epoch 7/10
718/718 [==============================] - 73s 102ms/step - loss: 0.0185 - val_loss: 0.0187
Epoch 8/10
718/718 [==============================] - 60s 84ms/step - loss: 0.0185 - val_loss: 0.0187
Epoch 9/10
718/718 [==============================] - 64s 89ms/step - loss: 0.0185 - val_loss: 0.0187
Epoch 10/10
718/718 [==============================] - 64s 89ms/step - loss: 0.0185 - val_loss: 0.0187
Model B's loss is greater than 0, so it can be understood.
However, Model A's loss is less than 0. What does that mean?
Cross entropy is calculated as the negative expected value of the logarithm of the predicted probabilities. It is usually applied after a sigmoid or softmax activation, where all values are <= 1, so their logarithms are <= 0 and the result is >= 0. But you use it after a relu activation, which can produce values > 1; that is why you obtain a result < 0. The moral is that the output-layer activation and the loss must correspond to each other and make sense for the task you are trying to solve. Otherwise you may obtain meaningless results.
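A quick numeric sketch of that point in plain NumPy (this is the textbook formula, not the exact Keras implementation):
import numpy as np
# Binary cross-entropy: -(y*log(p) + (1-y)*log(1-p)).
# With a sigmoid output p <= 1, so log(p) <= 0 and the loss is >= 0.
# A relu output can exceed 1, so log(p) > 0 and the "loss" turns negative.
def bce(y, p, eps=1e-7):
    p = np.clip(p, eps, None)
    return -(y * np.log(p) + (1 - y) * np.log(np.clip(1 - p, eps, None)))
print(bce(1.0, 0.8))  # sigmoid-style output in (0, 1): positive loss, ~0.22
print(bce(1.0, 3.0))  # relu-style output > 1: negative loss, ~-1.10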
I made a TensorFlow sample that takes an integer, divides it by 5, and classifies it by the remainder.
https://colab.research.google.com/drive/1CQ5IKymDCuCzWNfgKQrZZSL3ifyzRJrA?usp=sharing
import numpy as np
from keras import models
from keras import optimizers
from keras import layers
from keras.utils import to_categorical
num_of_rows = 1500
num_of_classes = 5
X = np.abs(np.floor(np.random.randn(num_of_rows, 1)* 10000))
y = X % (num_of_classes)
X = X /100000
model = models.Sequential()
model.add(layers.Dense(25, input_dim=1))
model.add(layers.Dense(25, activation='relu'))
model.add(layers.Dense(25))
model.add(layers.Dense(num_of_classes, activation='softmax'))
model.compile(optimizer=optimizers.Adam(lr=0.001), loss='categorical_crossentropy')
model.summary()
X_normal = X.astype('float')
print(X_normal.shape)
test_label = to_categorical(y,num_classes=num_of_classes)
print(y[:5])
model.fit(X_normal, test_label, epochs=100, batch_size=10)
The loss does not decrease when running the notebook at the link above.
Epoch 1/100
150/150 [==============================] - 0s 2ms/step - loss: 1.6094
Epoch 2/100
150/150 [==============================] - 0s 2ms/step - loss: 1.6102
Epoch 3/100
150/150 [==============================] - 0s 1ms/step - loss: 1.6091
Epoch 4/100
150/150 [==============================] - 0s 1ms/step - loss: 1.6090
Epoch 5/100
150/150 [==============================] - 0s 1ms/step - loss: 1.6089
Epoch 6/100
150/150 [==============================] - 0s 1ms/step - loss: 1.6089
Epoch 7/100
150/150 [==============================] - 0s 2ms/step - loss: 1.6089
Epoch 8/100
150/150 [==============================] - 0s 2ms/step - loss: 1.6090
Epoch 9/100
150/150 [==============================] - 0s 1ms/step - loss: 1.6091
Epoch 10/100
150/150 [==============================] - 0s 1ms/step - loss: 1.6084
I need advice on what else I need to do.
After some trial and error, I found a solution.
The input is just an integer, so I think it is one-dimensional. It looks like this:
X = [100, 15, 48, 17, 22]
When I reshape the test input like this instead:
X = [[1,0,0], [0,1,5], [0,4,8], [0,1,7], [0,2,2]]
the loss decreases as expected.
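A hedged sketch of that reshaping (the helper name and the fixed width of 3 are mine, not from the original post): split each integer into its decimal digits so every sample becomes a small fixed-length vector instead of a single scaled scalar.
import numpy as np
def to_digit_features(values, width=3):
    # Zero-pad each integer to a fixed width and split it into its digits.
    return np.array([[int(ch) for ch in str(int(v)).zfill(width)] for v in values])
X = [100, 15, 48, 17, 22]
print(to_digit_features(X))
# [[1 0 0]
#  [0 1 5]
#  [0 4 8]
#  [0 1 7]
#  [0 2 2]]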
I have built an LSTM model which performs pretty well on my time-series problem. What I am trying to do now is incorporate dropout and recurrent dropout to model the uncertainty around my predictions. From the literature I have read, this should add noise to my predictions, but the mean of those predictions shouldn't be too far from the values obtained without dropout during inference. In my case, when using dropout, even at only 0.01, my prediction error is approximately an order of magnitude larger than without dropout. For example, if I evaluate a single batch of shape (16, 20, 14) without dropout, the results are identical across runs:
for x, y in test.take(1):
    for i in range(10):
        encoder_decoder_dropout.evaluate(x, y)
1/1 [==============================] - 0s 2ms/step - loss: 0.0584 - mean_absolute_error: 0.0584
1/1 [==============================] - 0s 1ms/step - loss: 0.0584 - mean_absolute_error: 0.0584
1/1 [==============================] - 0s 1ms/step - loss: 0.0584 - mean_absolute_error: 0.0584
1/1 [==============================] - 0s 1ms/step - loss: 0.0584 - mean_absolute_error: 0.0584
1/1 [==============================] - 0s 2ms/step - loss: 0.0584 - mean_absolute_error: 0.0584
1/1 [==============================] - 0s 1ms/step - loss: 0.0584 - mean_absolute_error: 0.0584
1/1 [==============================] - 0s 1ms/step - loss: 0.0584 - mean_absolute_error: 0.0584
1/1 [==============================] - 0s 1ms/step - loss: 0.0584 - mean_absolute_error: 0.0584
1/1 [==============================] - 0s 1ms/step - loss: 0.0584 - mean_absolute_error: 0.0584
1/1 [==============================] - 0s 1ms/step - loss: 0.0584 - mean_absolute_error: 0.0584
However, with dropout of 0.1, I do see the stochasticity in my results, but the accuracy is terrible:
for x, y in test.take(1):
    for i in range(10):
        encoder_decoder_dropout.evaluate(x, y)
1/1 [==============================] - 0s 2ms/step - loss: 0.5677 - mean_absolute_error: 0.5677
1/1 [==============================] - 0s 1ms/step - loss: 0.5512 - mean_absolute_error: 0.5512
1/1 [==============================] - 0s 1ms/step - loss: 0.6085 - mean_absolute_error: 0.6085
1/1 [==============================] - 0s 1ms/step - loss: 0.6091 - mean_absolute_error: 0.6091
1/1 [==============================] - 0s 1ms/step - loss: 0.6503 - mean_absolute_error: 0.6503
1/1 [==============================] - 0s 1ms/step - loss: 0.5438 - mean_absolute_error: 0.5438
1/1 [==============================] - 0s 1ms/step - loss: 0.5882 - mean_absolute_error: 0.5882
1/1 [==============================] - 0s 1ms/step - loss: 0.6500 - mean_absolute_error: 0.6500
1/1 [==============================] - 0s 1ms/step - loss: 0.6235 - mean_absolute_error: 0.6235
1/1 [==============================] - 0s 1ms/step - loss: 0.6119 - mean_absolute_error: 0.6119
My model is below. I am wondering if this is expected behaviour, or if I have implemented this poorly for MC dropout.
input = keras.Input(shape=(60,))
x = layers.RepeatVector(30, input_shape=[60])(input)
x = layers.Bidirectional(tf.keras.layers.LSTM(100, dropout=0.1,
                                              recurrent_dropout=0.1,
                                              return_sequences=False))(x, training=True)
x = layers.Dense(look_forward*num_features,
                 kernel_initializer=tf.initializers.glorot_normal())(x)
output = layers.Reshape([look_forward, num_features])(x)
decoder_dropout = keras.Model(input, output, name="decoder_dropout")
encoder_decoder_dropout = tf.keras.Sequential([encoder, decoder_dropout])
encoder_decoder_dropout.set_weights(encoder_decoder_dropout_trained.get_weights())
encoder_decoder_dropout.compile(loss=tf.keras.losses.MeanAbsoluteError(),
                                optimizer=tf.optimizers.Adam(),
                                metrics=[tf.metrics.MeanAbsoluteError()])
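If it helps, here is a hedged sketch of how the MC dropout samples could be summarised (it reuses encoder_decoder_dropout, x and y from the code above; n_samples is an arbitrary choice of mine). The literature's claim is that the mean over such samples should stay close to the deterministic prediction:
import numpy as np
n_samples = 50
# Each call is stochastic because the dropout layers were built with training=True.
preds = np.stack([encoder_decoder_dropout(x, training=True).numpy()
                  for _ in range(n_samples)])   # (n_samples, batch, look_forward, num_features)
mean_pred = preds.mean(axis=0)                  # point estimate from the MC samples
std_pred = preds.std(axis=0)                    # per-step predictive uncertainty
print(np.mean(np.abs(mean_pred - y.numpy())))   # MAE of the averaged prediction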
Following is the reference code:
Xtrain2 = df2.iloc[:,:-1].values
ytrain2 = df2['L'].values.reshape((200,1))
print(Xtrain2.shape, ytrain2.shape)
#--------------------------
lrelu = lambda x: tf.keras.activations.relu(x, alpha=0.1)
model2 = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1501, input_dim=1501, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    #tf.keras.layers.Dense(1),
    #tf.keras.layers.Dense(1, activation=lrelu)
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model2.compile(loss='binary_crossentropy',
               optimizer='adam',
               metrics=['accuracy'])
#--------------------------
model2.fit(Xtrain2, ytrain2, epochs=50)#, verbose=0)
This is just a simple attempt at a classifier. The last layer is sigmoid since it's a binary classifier, and the loss is appropriate for the problem. The dimension of the input is 1500 and the number of samples is 200. I get the following output:
(200, 1501) (200, 1)
Train on 200 samples
Epoch 1/50
200/200 [==============================] - 0s 2ms/sample - loss: 0.4201 - accuracy: 0.0300
Epoch 2/50
200/200 [==============================] - 0s 359us/sample - loss: -1.1114 - accuracy: 0.0000e+00
Epoch 3/50
200/200 [==============================] - 0s 339us/sample - loss: -4.6102 - accuracy: 0.0000e+00
Epoch 4/50
200/200 [==============================] - 0s 344us/sample - loss: -13.7864 - accuracy: 0.0000e+00
Epoch 5/50
200/200 [==============================] - 0s 342us/sample - loss: -34.7789 - accuracy: 0.0000e+00
.
.
.
Epoch 40/50
200/200 [==============================] - 0s 348us/sample - loss: -905166.4000 - accuracy: 0.3750
Epoch 41/50
200/200 [==============================] - 0s 344us/sample - loss: -1010177.5300 - accuracy: 0.3400
Epoch 42/50
200/200 [==============================] - 0s 354us/sample - loss: -1129819.1825 - accuracy: 0.3450
Epoch 43/50
200/200 [==============================] - 0s 379us/sample - loss: -1263355.3200 - accuracy: 0.3900
Epoch 44/50
200/200 [==============================] - 0s 359us/sample - loss: -1408803.0400 - accuracy: 0.3750
Epoch 45/50
200/200 [==============================] - 0s 355us/sample - loss: -1566850.5900 - accuracy: 0.3300
Epoch 46/50
200/200 [==============================] - 0s 359us/sample - loss: -1728280.7550 - accuracy: 0.3550
Epoch 47/50
200/200 [==============================] - 0s 354us/sample - loss: -1909759.2400 - accuracy: 0.3400
Epoch 48/50
200/200 [==============================] - 0s 379us/sample - loss: -2108889.7200 - accuracy: 0.3750
Epoch 49/50
200/200 [==============================] - 0s 369us/sample - loss: -2305491.9800 - accuracy: 0.3700
Epoch 50/50
200/200 [==============================] - 0s 374us/sample - loss: -2524282.6300 - accuracy: 0.3050
I don't see where I'm going wrong in the above code. Any help would be appreciated!
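One hedged diagnostic rather than a confirmed fix, echoing the cross-entropy explanation earlier on this page: binary_crossentropy assumes targets in {0, 1}, and a loss that keeps sinking below zero is the usual symptom of values outside that range, so it is worth printing the labels first.
import numpy as np
# Sanity check on the labels before blaming the architecture.
print(np.unique(ytrain2))            # a binary target should give [0 1]
print(ytrain2.min(), ytrain2.max())  # anything outside [0, 1] can drive the loss negative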
I am using tf.distribute.MirroredStrategy() to train a TextCNN model, but when I set vocab_size=0 or another wrong value, no error is reported in this mode. When tf.distribute.MirroredStrategy() is not used, the wrong vocab_size immediately raises an error.
Using a wrong value for vocab_size:
model=TextCNN(padding_size,vocab_size-10,embed_size,filter_num,num_classes)
model.compile(loss='sparse_categorical_crossentropy',optimizer=tf.keras.optimizers.Adam(),metrics=['accuracy'])
model.fit(train_dataset, epochs=epoch,validation_data=valid_dataset, callbacks=callbacks)
Error:
2 root error(s) found.
(0) Invalid argument: indices[63,10] = 4726 is not in [0, 4726)
[[node text_cnn_1/embedding/embedding_lookup (defined at <ipython-input-7-6ef8a4397184>:37) ]]
[[Adam/Adam/update/AssignSubVariableOp/_45]]
(1) Invalid argument: indices[63,10] = 4726 is not in [0, 4726)
[[node text_cnn_1/embedding/embedding_lookup (defined at <ipython-input-7-6ef8a4397184>:37) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_234431]
But inside strategy.scope() there is no error and it appears to work well:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    print(vocab_size)
    model = TextCNN(padding_size, vocab_size-1000, embed_size, filter_num, num_classes)
    model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(), metrics=['accuracy'])
    model.fit(train_dataset, epochs=epoch, validation_data=valid_dataset, callbacks=callbacks)
The log looks like this (it looks very good):
Learning rate for epoch 1 is 0.0010000000474974513
2813/2813 [==============================] - 16s 6ms/step - loss: 0.8097 - accuracy: 0.7418 - val_loss: 0.4567 - val_accuracy: 0.8586 - lr: 0.0010
Epoch 2/15
2813/2813 [==============================] - ETA: 0s - loss: 0.4583 - accuracy: 0.8560
Learning rate for epoch 2 is 0.0010000000474974513
2813/2813 [==============================] - 14s 5ms/step - loss: 0.4583 - accuracy: 0.8560 - val_loss: 0.4051 - val_accuracy: 0.8756 - lr: 0.0010
Epoch 3/15
2810/2813 [============================>.] - ETA: 0s - loss: 0.3909 - accuracy: 0.8768
Learning rate for epoch 3 is 0.0010000000474974513
2813/2813 [==============================] - 14s 5ms/step - loss: 0.3909 - accuracy: 0.8767 - val_loss: 0.3853 - val_accuracy: 0.8844 - lr: 0.0010
Epoch 4/15
2811/2813 [============================>.] - ETA: 0s - loss: 0.2999 - accuracy: 0.9047
Learning rate for epoch 4 is 9.999999747378752e-05
2813/2813 [==============================] - 14s 5ms/step - loss: 0.2998 - accuracy: 0.9047 - val_loss: 0.3700 - val_accuracy: 0.8865 - lr: 1.0000e-04
Epoch 5/15
2807/2813 [============================>.] - ETA: 0s - loss: 0.2803 - accuracy: 0.9114
Learning rate for epoch 5 is 9.999999747378752e-05
2813/2813 [==============================] - 15s 5ms/step - loss: 0.2803 - accuracy: 0.9114 - val_loss: 0.3644 - val_accuracy: 0.8888 - lr: 1.0000e-04
Epoch 6/15
2803/2813 [============================>.] - ETA: 0s - loss: 0.2639 - accuracy: 0.9162
Learning rate for epoch 6 is 9.999999747378752e-05
2813/2813 [==============================] - 14s 5ms/step - loss: 0.2636 - accuracy: 0.9163 - val_loss: 0.3615 - val_accuracy: 0.8896 - lr: 1.0000e-04
Epoch 7/15
2805/2813 [============================>.] - ETA: 0s - loss: 0.2528 - accuracy: 0.9188
Learning rate for epoch 7 is 9.999999747378752e-05
2813/2813 [==============================] - 14s 5ms/step - loss: 0.2526 - accuracy: 0.9189 - val_loss: 0.3607 - val_accuracy: 0.8909 - lr: 1.0000e-04
More simply, like this, it runs with no error:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = Sequential()
    model.add(Embedding(1000, 64, input_length=20))
test_array = np.random.randint(10000, size=(32, 20))
model.predict(test_array)
Why does this happen?
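Whatever the underlying reason, a hedged sanity check that is independent of the distribution strategy (using the toy tensors from the last snippet) is to confirm that every id handed to the Embedding layer is inside its vocabulary, since out-of-range indices are exactly what the non-distributed run complains about.
# The Embedding above was built with a vocabulary of 1000, but the random
# test ids go up to 9999, so this prints False and exposes the mismatch
# even when the distributed run silently accepts it.
vocab_size_used = 1000
print(test_array.max(), int(test_array.max()) < vocab_size_used)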