I'm currently training a Deep Q Network with the gradient tape, as outlined in the below code.
with tf.GradientTape() as tape:
    q_values_current_state_dqn = self.dqn_architecture(states)
    one_hot_actions = tf.keras.utils.to_categorical(actions, self.num_legal_actions, dtype=np.float32)  # e.g. [[0,0,1,0],[1,0,0,0],...]
    q_values_current_state_dqn = tf.reduce_sum(tf.multiply(q_values_current_state_dqn, one_hot_actions), axis=1)
    error = q_values_current_state_dqn - target_q_values
    loss = tf.keras.losses.Huber()(target_q_values, q_values_current_state_dqn)

dqn_architecture_gradients = tape.gradient(loss, self.dqn_architecture.trainable_variables)  # Computes the gradient using operations recorded in the context of this tape.
self.dqn_architecture.optimizer.apply_gradients(zip(dqn_architecture_gradients, self.dqn_architecture.trainable_variables))
But I'd like to disable the logging of the training progress, as shown below:
1/1 [==============================] - 0s 34ms/step
1/1 [==============================] - 0s 10ms/step
1/1 [==============================] - 0s 11ms/step
1/1 [==============================] - 0s 10ms/step
1/1 [==============================] - 0s 11ms/step
1/1 [==============================] - 0s 10ms/step
1/1 [==============================] - 0s 13ms/step
1/1 [==============================] - 0s 10ms/step
1/1 [==============================] - 0s 10ms/step
1/1 [==============================] - 0s 10ms/step
1/1 [==============================] - 0s 11ms/step
1/1 [==============================] - 0s 10ms/step
1/1 [==============================] - 0s 11ms/step
1/1 [==============================] - 0s 10ms/step
1/1 [==============================] - 0s 10ms/step
1/1 [==============================] - 0s 10ms/step
1/1 [==============================] - 0s 10ms/step
1/1 [==============================] - 0s 12ms/step
I understand that you can set verbose equal to 0 when using model.fit(), but I'm unsure how to go about it when using gradient tape.
Any help would be appreciated.
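For context, GradientTape itself does not print these progress bars; lines like "1/1 [==============================] - 0s 10ms/step" normally come from Keras predict()/fit()/evaluate() calls elsewhere in the loop, for example when selecting actions or querying a target network. A minimal hedged sketch of silencing them, assuming the logs come from a predict() call in the surrounding code:

# Hypothetical: wherever the loop currently calls predict(), either pass verbose=0
q_values = self.dqn_architecture.predict(states, verbose=0)
# or call the model directly, which produces no progress bar at all
q_values = self.dqn_architecture(states, training=False)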
I built a model based on this architecture to make a binary classification ["0" for "Delay-insensitive", "1" for "Interactive"] using 5 features. The target column is vmcategory. When I train the model, the accuracy remains at zero.
You can check my Colab here, please.
Epoch 1/100
1/1 [==============================] - 29s 29s/step - loss: 0.6931 - accuracy: 0.0000e+00
Epoch 2/100
1/1 [==============================] - 7s 7s/step - loss: 0.6893 - accuracy: 0.0000e+00
Epoch 3/100
1/1 [==============================] - 7s 7s/step - loss: 0.6808 - accuracy: 0.0000e+00
Epoch 4/100
1/1 [==============================] - 7s 7s/step - loss: 0.6571 - accuracy: 0.0000e+00
Epoch 5/100
1/1 [==============================] - 7s 7s/step - loss: 0.5957 - accuracy: 0.0000e+00
Epoch 6/100
1/1 [==============================] - 7s 7s/step - loss: 0.5372 - accuracy: 0.0000e+00
Epoch 7/100
1/1 [==============================] - 7s 7s/step - loss: 0.3760 - accuracy: 0.0000e+00
Epoch 8/100
1/1 [==============================] - 7s 7s/step - loss: 0.2411 - accuracy: 0.0000e+00
Epoch 9/100
1/1 [==============================] - 7s 7s/step - loss: 0.1913 - accuracy: 0.0000e+00
Epoch 10/100
1/1 [==============================] - 7s 7s/step - loss: 0.0571 - accuracy: 0.0000e+00
Epoch 11/100
1/1 [==============================] - 7s 7s/step - loss: 0.0483 - accuracy: 0.0000e+00
Epoch 12/100
1/1 [==============================] - 7s 7s/step - loss: 0.0088 - accuracy: 0.0000e+00
Epoch 13/100
1/1 [==============================] - 7s 7s/step - loss: 6.1697e-04 - accuracy: 0.0000e+00
Epoch 14/100
1/1 [==============================] - 6s 6s/step - loss: 3.2386e-04 - accuracy: 0.0000e+00
Epoch 15/100
1/1 [==============================] - 6s 6s/step - loss: 6.8086e-06 - accuracy: 0.0000e+00
Epoch 16/100
1/1 [==============================] - 6s 6s/step - loss: 7.7796e-05 - accuracy: 0.0000e+00
Epoch 17/100
1/1 [==============================] - 7s 7s/step - loss: 1.1021e-06 - accuracy: 0.0000e+00
Epoch 18/100
1/1 [==============================] - 6s 6s/step - loss: 2.7273e-07 - accuracy: 0.0000e+00
Epoch 87/100
1/1 [==============================] - 6s 6s/step - loss: 1.0003e-13 - accuracy: 0.0000e+00
Epoch 88/100
1/1 [==============================] - 6s 6s/step - loss: 2.6685e-14 - accuracy: 0.0000e+00
Epoch 89/100
1/1 [==============================] - 7s 7s/step - loss: 2.4792e-12 - accuracy: 0.0000e+00
Epoch 90/100
1/1 [==============================] - 7s 7s/step - loss: 1.2417e-13 - accuracy: 0.0000e+00
Epoch 91/100
1/1 [==============================] - 7s 7s/step - loss: 1.4707e-11 - accuracy: 0.0000e+00
Epoch 92/100
1/1 [==============================] - 7s 7s/step - loss: 4.9625e-14 - accuracy: 0.0000e+00
Epoch 93/100
1/1 [==============================] - 7s 7s/step - loss: 3.7239e-13 - accuracy: 0.0000e+00
Epoch 94/100
1/1 [==============================] - 7s 7s/step - loss: 6.0243e-13 - accuracy: 0.0000e+00
Epoch 95/100
1/1 [==============================] - 6s 6s/step - loss: 1.4047e-11 - accuracy: 0.0000e+00
Epoch 96/100
1/1 [==============================] - 7s 7s/step - loss: 1.0687e-14 - accuracy: 0.0000e+00
Epoch 97/100
1/1 [==============================] - 7s 7s/step - loss: 3.4614e-16 - accuracy: 0.0000e+00
Epoch 98/100
1/1 [==============================] - 7s 7s/step - loss: 4.5617e-11 - accuracy: 0.0000e+00
Epoch 99/100
1/1 [==============================] - 7s 7s/step - loss: 1.5913e-14 - accuracy: 0.0000e+00
Epoch 100/100
1/1 [==============================] - 7s 7s/step - loss: 3.0236e-10 - accuracy: 0.0000e+00
You are using accuracy as the metric, which expects class labels as input, but you are providing class logits (or confidences). Please replace accuracy with tf.keras.metrics.CategoricalAccuracy().
--Edit--
So, there is a different problem which I just noticed. You have 223461 samples, where each input is a feature vector of length 5, and the aim is to do binary classification.
You are treating the input samples as feature vectors, and because of that you end up trying to predict 223461 classes. To fix this you would need to make the following changes.
Make the following changes to the architecture:
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(n_outputs, activation='sigmoid')) # here n_outputs should be 1
Remove the Conv1D and GRU layers; your input features are tabular and do not need convolutional operations.
Replace the softmax function with sigmoid as you are doing binary classification.
Replace the CategoricalAccuracy with 'accuracy'
Ensure your data is of shape [X_batch, 5]
Ensure your y is of shape [X_batch, 1]
Here, X_batch could be of shape [223461, 5], and in more complex models you might not process the whole dataset in a single pass and would instead use a small batch size, as in the sketch below.
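Putting those changes together, a minimal sketch of the suggested setup (the layer sizes, optimizer, and batch size are assumptions, not taken from the original Colab):

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Sketch under the assumptions above: tabular input of 5 features,
# one sigmoid output unit for binary classification.
model = Sequential([
    Dense(128, activation='relu', input_shape=(5,)),
    Dropout(0.4),
    Dense(1, activation='sigmoid'),  # n_outputs = 1
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# X should have shape [n_samples, 5] and y shape [n_samples, 1]
# model.fit(X, y, batch_size=32, epochs=100)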
Here is my complete code:
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.applications.vgg16 import decode_predictions
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import load_model
from tensorflow.keras.layers import Dense, Flatten, Dropout, Input
from tensorflow.keras.models import Model
import numpy as np
from os import listdir
from os.path import isfile, join
import matplotlib.pyplot as plt
train_cats_path = r"C:\Users\jrand\Downloads\dogscats\train\cats"
train_cats = [train_cats_path+"\\"+f for f in listdir(train_cats_path) if isfile(join(train_cats_path, f))]
train_cat_array = []
for tmp_cat in train_cats:
    img = load_img(tmp_cat, target_size=(224, 224))  # Load image in with desired dimensions
    img = img_to_array(img)  # Convert the image to a numpy array
    img = img.reshape(224, 224, 3)
    img = preprocess_input(img)
    train_cat_array.append(img)
x_train_cat_labels = np.array([1.0]*len(train_cat_array))
x_train_cat_images = np.array(train_cat_array)
train_dogs_path = r"C:\Users\jrand\Downloads\dogscats\train\dogs"
train_dogs = [train_dogs_path+"\\"+f for f in listdir(train_dogs_path) if isfile(join(train_dogs_path, f))]
train_dog_array = []
for tmp_dog in train_dogs:
    img = load_img(tmp_dog, target_size=(224, 224))  # Load image in with desired dimensions
    img = img_to_array(img)  # Convert the image to a numpy array
    img = img.reshape(224, 224, 3)
    img = preprocess_input(img)
    train_dog_array.append(img)
x_train_dog_labels = np.array([0.0]*len(train_dog_array))
x_train_dog_images = np.array(train_dog_array)
print("len of dog images", len(x_train_dog_images), "len of cat images", len(x_train_cat_images))
print("len of dog labels", len(x_train_dog_labels), "len of cat labels", len(x_train_cat_labels))
x_train_images = np.concatenate([x_train_dog_images, x_train_cat_images])[0:1000]
x_train_labels = np.concatenate([x_train_dog_labels, x_train_cat_labels])[0:1000]
model = VGG16(weights = 'imagenet', include_top = False, input_shape = (224, 224, 3))
for layer in model.layers:
    layer.trainable = False
x = Flatten()(model.output)
predictions = Dense(1, activation="sigmoid")(x)
new_model = Model(model.input, predictions)
# compile model
model.compile(optimizer="Adam", loss="binary_crossentropy", metrics=["accuracy"])
# train model
history = model.fit(x_train_images, x_train_labels, batch_size=1, epochs=100)
So, as you can see, I take the images of cats and dogs, generate labels for them, then I concatenate the two arrays together using numpy so that I can then train on them.
I guess my problems are as follows:
I have to reduce the size of my training set by slicing the arrays to the first 1000 images, which doesn't help things
I have to use a batch_size of 1, also doesn't help things
Can anyone give me some tips for improving this code and the NN performance?
Here is sample output as things currently stand:
len of dog images 11500 len of cat images 11500
len of dog labels 11500 len of cat labels 11500
2022-03-04 11:46:54.410085: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-04 11:46:55.170021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6613 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1
2022-03-04 11:46:55.170948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 6613 MB memory: -> device: 1, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:04:00.0, compute capability: 6.1
2022-03-04 11:46:56.198632: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/100
2022-03-04 11:46:56.725619: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8201
1000/1000 [==============================] - 9s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 2/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 3/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 4/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 5/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 6/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 7/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 8/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 9/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 10/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 11/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 12/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 13/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 14/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 15/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 16/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 17/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 18/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 19/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 20/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 21/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 22/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 23/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 24/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 25/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 26/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 27/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 28/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 29/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 30/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 31/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 32/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 33/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 34/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 35/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 36/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 37/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 38/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 39/100
1000/1000 [==============================] - 7s 7ms/step - loss: 2.1102 - accuracy: 1.0204e-04
Epoch 40/100
The reason I wrote this program is to get some experience with transfer learning. There may be better ways to accomplish this but this is the way I chose.
I updated the code and removed a loop I was messing with. Sorry about that.
I don't think the issue is the small dataset, since transfer learning is used to deal with smaller datasets.
The issue is that you are freezing all the layers of the pre-trained model (VGG) without adding any new Dense layer. Then you call model.fit, but none of the layers are trainable, so nothing is allowed to change. In fact, your problem is not that you are getting very low accuracy, but that the accuracy doesn't change at all across epochs. This should be a red flag that something in your code is broken!
Try to add at least another Dense layer before the last.
EDIT:
You are also compiling and calling fit() on model instead of new_model.
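A minimal sketch of those two fixes together (the 256-unit Dense layer, batch size, and epoch count are assumptions, not taken from the original post):

# Keep the frozen VGG16 base, add a trainable Dense head, and train new_model.
x = Flatten()(model.output)
x = Dense(256, activation="relu")(x)  # new trainable layer; 256 units is an assumed size
predictions = Dense(1, activation="sigmoid")(x)
new_model = Model(model.input, predictions)

# compile and fit the new model, not the frozen VGG base
new_model.compile(optimizer="Adam", loss="binary_crossentropy", metrics=["accuracy"])
history = new_model.fit(x_train_images, x_train_labels, batch_size=32, epochs=10)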
I hope I've been helpful.
I am trying to find the nodes of a graph which belong to a particular structure (for example a clique). The output must therefore be a vector [0,0,1,0,1,1,0, ...] where a 1 marks a node belonging to a clique.
My inputs are graphs where each node of the graph is represented by an embedding vector; the input is in this form:
[[-1.548624, 2.6481668, 0.21574, -0.324527 ........]
[.....] ...[.....]].
The problem is that my model only learns one of the two classes, either 1 or 0, depending on which is more frequent in the dataset. After rebalancing the data, the accuracy stays around 0.5.
I have tried rebalancing the data and changing the embedding method, but the result remains the same.
Does anyone have any idea what is causing the problem?
here is the code:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPool1D, LSTM, Flatten, Dense

def model(input_shape):
    model = Sequential()
    model.add(Conv1D(30, 3, input_shape=input_shape, activation="sigmoid"))
    model.add(MaxPool1D(9))
    model.add(LSTM(50, return_sequences=True))
    model.add(Flatten())
    model.add(Dense(889, activation="sigmoid"))
    # model.summary()
    opt = tf.keras.optimizers.SGD(learning_rate=0.01)
    model.compile(loss='mse', optimizer=opt, metrics=['binary_accuracy'])
    return model

train_x, train_y, test_x, test_y, val_x, val_y = load_data()
model = model((889, 64))
model.fit(train_x, train_y, validation_data=(val_x, val_y), epochs=500, batch_size=4)
I am not sure about my choice of activation function, loss, and metric, even though these gave the best results so far.
120/120 [==============================] - 4s 13ms/step - loss: 0.7967 - binary_accuracy: 0.3721 - val_loss: 0.4342 - val_binary_accuracy: 0.3979
Epoch 2/8
120/120 [==============================] - 1s 8ms/step - loss: 0.3795 - binary_accuracy: 0.4164 - val_loss: 0.2758 - val_binary_accuracy: 0.4871
Epoch 3/8
120/120 [==============================] - 1s 8ms/step - loss: 0.2594 - binary_accuracy: 0.5262 - val_loss: 0.2304 - val_binary_accuracy: 0.6379
Epoch 4/8
120/120 [==============================] - 1s 8ms/step - loss: 0.2255 - binary_accuracy: 0.6643 - val_loss: 0.2181 - val_binary_accuracy: 0.6910
Epoch 5/8
120/120 [==============================] - 1s 8ms/step - loss: 0.2161 - binary_accuracy: 0.6914 - val_loss: 0.2148 - val_binary_accuracy: 0.6921
Epoch 6/8
120/120 [==============================] - 1s 8ms/step - loss: 0.2136 - binary_accuracy: 0.6922 - val_loss: 0.2139 - val_binary_accuracy: 0.6921
Epoch 7/8
120/120 [==============================] - 1s 8ms/step - loss: 0.2132 - binary_accuracy: 0.6917 - val_loss: 0.2137 - val_binary_accuracy: 0.6921
Epoch 8/8
120/120 [==============================] - 1s 8ms/step - loss: 0.2129 - binary_accuracy: 0.6919 - val_loss: 0.2136 - val_binary_accuracy: 0.6921
5/5 [==============================] - 0s 11ms/step - loss: 0.2137 - binary_accuracy: 0.6915
[0.21371755003929138, 0.6915410757064819]
Thank you in advance for your feedback. ;)
First of all, 1D convs are quite different from 2D convs. For 2D convs you want to keep the kernel size low because the number of calculations scales as the square of the kernel size, C*K^2 (for 3D convs it gets cubed!), but for 1D convs it scales linearly. You really want to increase that kernel size.
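To make that scaling concrete, here is a rough count of multiply-accumulates per output element for a kernel of size K, with channel counts held fixed (purely illustrative arithmetic):

K = 9
print("1D conv:", K)       # linear in K    -> 9
print("2D conv:", K ** 2)  # quadratic in K -> 81
print("3D conv:", K ** 3)  # cubic in K     -> 729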
Secondly, you need to normalize your inputs for certain ML approaches, neural networks being one of them.
Finally, I'm a bit worried a CNN is the wrong approach for this anyway. On top of that you have an LSTM, which I don't really follow. Maybe start off with just a couple of dense layers as a good baseline.
One debugging tip: make a really small dataset and make sure you can overfit it, i.e. memorize the training set.
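As a rough illustration of that kind of baseline (not the original model: the layer sizes, the Normalization layer, and the binary cross-entropy loss are assumptions), reusing the load_data() loader and the (889, 64) input shape from the question:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

train_x, train_y, test_x, test_y, val_x, val_y = load_data()  # loader from the question

norm = layers.Normalization()  # standardizes each embedding dimension
norm.adapt(train_x)            # compute feature means/variances from the training inputs

baseline = keras.Sequential([
    keras.Input(shape=(889, 64)),           # 889 nodes, 64-dim embedding each
    norm,
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # per-node clique-membership probability
    layers.Reshape((889,)),                 # one output per node
])
baseline.compile(loss="binary_crossentropy", optimizer="adam", metrics=["binary_accuracy"])
baseline.fit(train_x, train_y, validation_data=(val_x, val_y), epochs=20, batch_size=4)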
We are implementing the following architecture in TensorFlow 2. RNN means LSTM in this model.
This model is used to realize a retrieval-based chatbot found in this paper and this blog. The blog author has a working model written in TF1.
Our Colab notebook can be accessed here. And the model architecture output from our code is
The ? symbol is the batch_size, which we can predefine. 160 is the sentence length, i.e. the number of time steps the LSTM unrolls over, since we feed the LSTM one word at a time. 256 is the output dimension of the LSTM. The training dataset looks like this:
This model was claimed to work when trained on a Ubuntu Corpus dataset with 1M records. However, when we train our model, the loss remains at 0.6931 after 2~3 iterations, and we assume that the parameters, especially M, are not being learned during training. We had the model run for 12 hours and the loss did not decrease at all. Here is the output from training:
Now training the model...
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6938(195600,)
1000000/1000000 [==============================] - 2111s 2ms/sample - loss: 0.6938
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.7739(195600,)
1000000/1000000 [==============================] - 2106s 2ms/sample - loss: 0.7739
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6939(195600,)
1000000/1000000 [==============================] - 2163s 2ms/sample - loss: 0.6939
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6932(195600,)
1000000/1000000 [==============================] - 2101s 2ms/sample - loss: 0.6932
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6931(195600,)
1000000/1000000 [==============================] - 2099s 2ms/sample - loss: 0.6931
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6932(195600,)
1000000/1000000 [==============================] - 2098s 2ms/sample - loss: 0.6932
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6931(195600,)
1000000/1000000 [==============================] - 2102s 2ms/sample - loss: 0.6931
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6931(195600,)
1000000/1000000 [==============================] - 2097s 2ms/sample - loss: 0.6931
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6931(195600,)
1000000/1000000 [==============================] - 2107s 2ms/sample - loss: 0.6931
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6931(195600,)
1000000/1000000 [==============================] - 2103s 2ms/sample - loss: 0.6931
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6931(195600,)
1000000/1000000 [==============================] - 2103s 2ms/sample - loss: 0.6931
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6931(195600,)
1000000/1000000 [==============================] - 2092s 2ms/sample - loss: 0.6931
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6931(195600,)
1000000/1000000 [==============================] - 2099s 2ms/sample - loss: 0.6931
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6931(195600,)
1000000/1000000 [==============================] - 2099s 2ms/sample - loss: 0.6931
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6931(195600,)
1000000/1000000 [==============================] - 2096s 2ms/sample - loss: 0.6931
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6931(195600,)
1000000/1000000 [==============================] - 2125s 2ms/sample - loss: 0.6931
Train on 1000000 samples
999936/1000000 [============================>.] - ETA: 0s - loss: 0.6931(195600,)
1000000/1000000 [==============================] - 2132s 2ms/sample - loss: 0.6931
Train on 1000000 samples
33792/1000000 [>.............................] - ETA: 24:23 - loss: 0.6931
However, our model does work if we truncate our dataset to 100,000 records, in which case the loss decreases steadily.
What could be the reason that this model could train on a small dataset but not on a large dataset?
Thanks.
I'm pretty new to Keras. I have built a simple network to try:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation

data = np.genfromtxt("./kerastests/mydata.csv", delimiter=';')

x_target = data[:, 29]
x_training = np.delete(data, 6, axis=1)
x_training = np.delete(x_training, 28, axis=1)

model = Sequential()
model.add(Dense(20, activation='relu', input_dim=x_training.shape[1]))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
model.fit(x_training, x_target)
From my source data, I have removed 2 columns, as you can see. One is a column that came with dates in string format (in the dataset, besides it, I have a column for the day, another for the month, and another for the year, so I don't need that column), and the other is the column I use as the target for the model.
When I train this model I get this output:
32/816 [>.............................] - ETA: 23s - loss: 13541942.0000 - acc: 0.0000e+00
800/816 [============================>.] - ETA: 0s - loss: 11575466.0400 - acc: 0.0000e+00
816/816 [==============================] - 1s - loss: 11536905.2353 - acc: 0.0000e+00
Epoch 2/10
32/816 [>.............................] - ETA: 0s - loss: 6794785.0000 - acc: 0.0000e+00
816/816 [==============================] - 0s - loss: 5381360.4314 - acc: 0.0000e+00
Epoch 3/10
32/816 [>.............................] - ETA: 0s - loss: 6235184.0000 - acc: 0.0000e+00
800/816 [============================>.] - ETA: 0s - loss: 5199512.8700 - acc: 0.0000e+00
816/816 [==============================] - 0s - loss: 5192977.4216 - acc: 0.0000e+00
Epoch 4/10
32/816 [>.............................] - ETA: 0s - loss: 4680165.5000 - acc: 0.0000e+00
736/816 [==========================>...] - ETA: 0s - loss: 5050110.3043 - acc: 0.0000e+00
816/816 [==============================] - 0s - loss: 5168771.5490 - acc: 0.0000e+00
Epoch 5/10
32/816 [>.............................] - ETA: 0s - loss: 5932391.0000 - acc: 0.0000e+00
768/816 [===========================>..] - ETA: 0s - loss: 5198882.9167 - acc: 0.0000e+00
816/816 [==============================] - 0s - loss: 5159585.9020 - acc: 0.0000e+00
Epoch 6/10
32/816 [>.............................] - ETA: 0s - loss: 4488318.0000 - acc: 0.0000e+00
768/816 [===========================>..] - ETA: 0s - loss: 5144843.8333 - acc: 0.0000e+00
816/816 [==============================] - 0s - loss: 5151492.1765 - acc: 0.0000e+00
Epoch 7/10
32/816 [>.............................] - ETA: 0s - loss: 6920405.0000 - acc: 0.0000e+00
800/816 [============================>.] - ETA: 0s - loss: 5139358.5000 - acc: 0.0000e+00
816/816 [==============================] - 0s - loss: 5169839.2941 - acc: 0.0000e+00
Epoch 8/10
32/816 [>.............................] - ETA: 0s - loss: 3973038.7500 - acc: 0.0000e+00
672/816 [=======================>......] - ETA: 0s - loss: 5183285.3690 - acc: 0.0000e+00
816/816 [==============================] - 0s - loss: 5141417.0000 - acc: 0.0000e+00
Epoch 9/10
32/816 [>.............................] - ETA: 0s - loss: 4969548.5000 - acc: 0.0000e+00
768/816 [===========================>..] - ETA: 0s - loss: 5126550.1667 - acc: 0.0000e+00
816/816 [==============================] - 0s - loss: 5136524.5098 - acc: 0.0000e+00
Epoch 10/10
32/816 [>.............................] - ETA: 0s - loss: 6334703.5000 - acc: 0.0000e+00
768/816 [===========================>..] - ETA: 0s - loss: 5197778.8229 - acc: 0.0000e+00
816/816 [==============================] - 0s - loss: 5141391.2059 - acc: 0.0000e+00
Why is this happening? My data is a time series. I know that for time series people do not usually use Dense neurons, but it is just a test. What really puzzles me is that the accuracy is always 0. And, in other tests, the loss even reached a NaN value.
Could anybody help here?
Your model seems to correspond to a regression model for the following reasons:
You are using linear (the default one) as an activation function in the output layer (and relu in the layer before).
Your loss is loss='mean_squared_error'.
However, the metric that you use, metrics=['accuracy'], corresponds to a classification problem. If you want to do regression, remove metrics=['accuracy']. That is, use
model.compile(optimizer='adam',loss='mean_squared_error')
Here is a list of keras metrics for regression and classification (taken from this blog post):
Keras Regression Metrics
• Mean Squared Error: mean_squared_error, MSE or mse
• Mean Absolute Error: mean_absolute_error, MAE, mae
• Mean Absolute Percentage Error: mean_absolute_percentage_error, MAPE, mape
• Cosine Proximity: cosine_proximity, cosine
Keras Classification Metrics
• Binary Accuracy: binary_accuracy, acc
• Categorical Accuracy: categorical_accuracy, acc
• Sparse Categorical Accuracy: sparse_categorical_accuracy
• Top k Categorical Accuracy: top_k_categorical_accuracy (requires you to specify a k parameter)
• Sparse Top k Categorical Accuracy: sparse_top_k_categorical_accuracy (requires you to specify a k parameter)
Add the following to get metrics:
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_squared_error'])
# OR
model.compile(optimizer='adam', loss='mean_absolute_error', metrics=['mean_absolute_error'])
history = model.fit(x_training, x_target)  # the History object is returned by fit(), not by compile()
history.history.keys()
history.history
I would like to point out something that is very important and has unfortunately been neglected: mean_squared_error is not an invalid loss function for classification.
The mathematical properties of cross-entropy, in conjunction with the assumptions behind mean_squared_error (neither of which I will expand upon in this comment), make the latter a poorer choice than cross-entropy when training on classification problems.
Try this one.
While trying to solve the Titanic problem from Kaggle, I forgot to fill the missing data in the DataFrame, because of which the missing values remained as NaN.
The model produced similar output:
#------------------------------------------------------
Epoch 1/50
891/891 [==============================] - 3s 3ms/step - loss: 9.8239 - acc: 0.0000e+00
Epoch 2/50
891/891 [==============================] - 1s 2ms/step - loss: 9.8231 - acc: 0.0000e+00
Epoch 3/50
891/891 [==============================] - 1s 1ms/step - loss: 9.8231 - acc: 0.0000e+00
Epoch 4/50
891/891 [==============================] - 1s 1ms/step - loss: 9.8231 - acc: 0.0000e+00
Epoch 5/50
891/891 [==============================] - 1s 1ms/step - loss: 9.8231 - acc: 0.0000e+00
#------------------------------------------------------
Make sure you prepare your data before feeding it to the model.
In my case I had to make the following changes:
+++++++++++++++++++++++++++++++++++
dataset[['Age']] = dataset[['Age']].fillna(value=dataset[['Age']].mean())
dataset[['Fare']] = dataset[['Fare']].fillna(value=dataset[['Fare']].mean())
dataset[['Embarked']] = dataset[['Embarked']].fillna(value=dataset['Embarked'].value_counts().idxmax())