Binary Classification for binary dataset with DNN - tensorflow

I have a dataset of binary data like this:
age (0-9)
age (10-19)
age (20-59)
age (10-19)
gender (male)
gender (female)
...
desired (very much)
desired (moderate)
desired (little)
desired (None)
1
0
0
0
0
1
...
0
1
0
0
0
0
1
0
1
0
...
1
0
0
0
The features here are the first few columns, and the target is the final 4 columns.
I'm trying to fit a DNN implemented with TensorFlow/Keras on this data.
Here's my model and code:
input_layer = Input(shape=(len(x_training)))
x = Dense(30, activation="relu")(input_layer)
x = Dense(20, activation="relu")(x)
x = Dense(10, activation="relu")(x)
x = Dense(5, activation="relu")(x)
output_layer = Dense(4, activation="softmax")(x)
model = Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer="sgd",
              loss="categorical_crossentropy",
              metrics=['accuracy'])
model.fit(x=x_train,
          y=y_train,
          batch_size=128,
          epochs=10,
          validation_data=(x_validate, y_validate))
and this is the history of the training:
Epoch 1/10
2005/2005 [==============================] - 9s 4ms/step - loss: 1.3864 - accuracy: 0.2525 - val_loss: 1.3863 - val_accuracy: 0.2533
Epoch 2/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2518 - val_loss: 1.3864 - val_accuracy: 0.2486
Epoch 3/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2499 - val_loss: 1.3863 - val_accuracy: 0.2487
Epoch 4/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2515 - val_loss: 1.3863 - val_accuracy: 0.2539
Epoch 5/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2511 - val_loss: 1.3863 - val_accuracy: 0.2504
Epoch 6/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2501 - val_loss: 1.3863 - val_accuracy: 0.2484
Epoch 7/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2511 - val_loss: 1.3863 - val_accuracy: 0.2468
Epoch 8/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2509 - val_loss: 1.3863 - val_accuracy: 0.2519
Epoch 9/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2505 - val_loss: 1.3863 - val_accuracy: 0.2463
Epoch 10/10
2005/2005 [==============================] - 6s 3ms/step - loss: 1.3863 - accuracy: 0.2512 - val_loss: 1.3863 - val_accuracy: 0.2474
<tensorflow.python.keras.callbacks.History at 0x7f6893c61e90>
The accuracy and the loss don't change at all. I have tried the following experiments, and all gave the same result:
changed the hidden layers' activation to sigmoid and tanh
changed the final layer to a single node, labeled y_train with (1,2,3) instead of one-hot encoding, and changed the loss function to sparse categorical cross-entropy
changed the optimizer to Adam
changed the data to be in (-1,1) instead of (0,1)
What am I missing here?

I figured out a method to solve this problem. I don't think it's very scientific, but it actually worked in my case.
First, I replaced every "1" in the training dataset with "0.8" and every "0" with "0.2".
Then I combined each group of related features with a weight vector. For example, if the age is "18", the age features are first [0,1,0,0]; after applying the first step they become [0.2,0.8,0.2,0.2]; multiplying this array element-wise by [0.1,0.2,0.3,0.4] and summing gives 0.32, which represents the age "18" in a single value.
By applying the previous steps to the features, I got an array of length 15 instead of 22.
The third step was to apply dimensionality reduction using PCA to reduce the number of features to 10.
This method is like extracting new features from the existing ones by giving them a continuous domain instead of the binary domain.
This gave me an accuracy of about 85%, which was very satisfying to me.
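For reference, a minimal sketch of this transformation with numpy and scikit-learn (not the exact code I used; the column groups and weight vectors below are just illustrative placeholders):

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical column groups of the binary features; the boundaries and
# weight vectors are placeholders, not the original values.
groups = [(0, 4), (4, 6)]                      # e.g. 4 age columns, 2 gender columns, ...
weights = {0: np.array([0.1, 0.2, 0.3, 0.4]),
           4: np.array([0.3, 0.7])}

def collapse_group(block, w):
    # Replace every 1 with 0.8 and every 0 with 0.2, then reduce the block
    # to a single column with a weighted sum, as described above.
    soft = np.where(block == 1, 0.8, 0.2)
    return soft @ w

def transform(x_binary):
    cols = [collapse_group(x_binary[:, a:b], weights[a]) for a, b in groups]
    x_collapsed = np.column_stack(cols)        # 22 columns -> 15 in my case
    return PCA(n_components=min(10, x_collapsed.shape[1])).fit_transform(x_collapsed)

# Demo with random binary data standing in for the real dataset:
x_demo = np.random.randint(0, 2, size=(100, 6))
print(transform(x_demo).shape)                 # (100, 2) for this toy grouping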

Related

What is the meaning of loss < 0?

I am comparing two models: one uses binary_crossentropy as the loss (Model A), the other uses mean_squared_error (Model B).
Model A)
self.seq_len = 2
in_out_neurons = 50
n_hidden = 500
model = Sequential()
model.add(LSTM(n_hidden, batch_input_shape=(None, self.seq_len, in_out_neurons), return_sequences=True))
model.add(Dense(in_out_neurons, activation="relu"))
optimizer = Adam(learning_rate=0.001)
#model.compile(loss="mean_squared_error", optimizer=optimizer)
model.compile(loss='binary_crossentropy', optimizer=optimizer)
Epoch 1/10
718/718 [==============================] - 32s 42ms/step - loss: -0.0633 - val_loss: -0.0649
Epoch 2/10
718/718 [==============================] - 33s 46ms/step - loss: -0.0632 - val_loss: -0.0572
Epoch 3/10
718/718 [==============================] - 43s 60ms/step - loss: -0.0592 - val_loss: -0.0570
Epoch 4/10
718/718 [==============================] - 51s 71ms/step - loss: -0.0522 - val_loss: -0.0431
Epoch 5/10
718/718 [==============================] - 50s 69ms/step - loss: -0.0566 - val_loss: -0.0535
Epoch 6/10
718/718 [==============================] - 49s 68ms/step - loss: -0.0567 - val_loss: -0.0537
Epoch 7/10
718/718 [==============================] - 48s 67ms/step - loss: -0.0627 - val_loss: -0.0499
Epoch 8/10
718/718 [==============================] - 51s 71ms/step - loss: -0.0621 - val_loss: -0.0614
Epoch 9/10
718/718 [==============================] - 47s 65ms/step - loss: -0.0645 - val_loss: -0.0653
Epoch 10/10
718/718 [==============================] - 43s 60ms/step - loss: -0.0661 - val_loss: -0.0622
Model B)
self.seq_len = 2
in_out_neurons = 50
n_hidden = 500
model = Sequential()
model.add(LSTM(n_hidden, batch_input_shape=(None, self.seq_len, in_out_neurons), return_sequences=True))
model.add(Dense(in_out_neurons, activation="relu"))
optimizer = Adam(learning_rate=0.001)
model.compile(loss="mean_squared_error", optimizer=optimizer)
#model.compile(loss='binary_crossentropy', optimizer=optimizer)
Epoch 1/10
718/718 [==============================] - 36s 48ms/step - loss: 0.0189 - val_loss: 0.0190
Epoch 2/10
718/718 [==============================] - 46s 64ms/step - loss: 0.0188 - val_loss: 0.0189
Epoch 3/10
718/718 [==============================] - 48s 67ms/step - loss: 0.0187 - val_loss: 0.0189
Epoch 4/10
718/718 [==============================] - 58s 81ms/step - loss: 0.0187 - val_loss: 0.0188
Epoch 5/10
718/718 [==============================] - 62s 87ms/step - loss: 0.0186 - val_loss: 0.0188
Epoch 6/10
718/718 [==============================] - 72s 100ms/step - loss: 0.0186 - val_loss: 0.0188
Epoch 7/10
718/718 [==============================] - 73s 102ms/step - loss: 0.0185 - val_loss: 0.0187
Epoch 8/10
718/718 [==============================] - 60s 84ms/step - loss: 0.0185 - val_loss: 0.0187
Epoch 9/10
718/718 [==============================] - 64s 89ms/step - loss: 0.0185 - val_loss: 0.0187
Epoch 10/10
718/718 [==============================] - 64s 89ms/step - loss: 0.0185 - val_loss: 0.0187
Model B's loss is greater than 0, so it can be understood.
However, Model A's loss is less than 0. What does it mean?
Cross-entropy is calculated as minus the expected value of the logarithm of the predicted values. It is usually used after a sigmoid or softmax activation, where all values are <= 1, so their logarithms are <= 0 and the result is >= 0. But you use it after a relu activation, which can give values > 1, and that is why you obtain a result < 0. The moral is that the output-layer activation and the loss should correspond to each other and must make sense for the task you are trying to solve; otherwise you may obtain senseless results.
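For example, a minimal sketch of making the output activation and the loss agree (illustrative only; which option is right depends on what the 50 target values actually are):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam

seq_len, in_out_neurons, n_hidden = 2, 50, 500

# Option 1: targets are 0/1 (or probabilities) -> sigmoid output + binary_crossentropy
model = Sequential([
    LSTM(n_hidden, batch_input_shape=(None, seq_len, in_out_neurons), return_sequences=True),
    Dense(in_out_neurons, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer=Adam(learning_rate=0.001))

# Option 2: targets are unbounded real values -> linear output + mean_squared_error
model_reg = Sequential([
    LSTM(n_hidden, batch_input_shape=(None, seq_len, in_out_neurons), return_sequences=True),
    Dense(in_out_neurons),   # linear activation
])
model_reg.compile(loss="mean_squared_error", optimizer=Adam(learning_rate=0.001))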

Validation accuracy zero and Loss is higher. Intent classification Using LSTM

I'm trying to build an LSTM model for intent classification using TensorFlow and Keras. But whenever I train the model for 30 or 40 epochs, the validation accuracy is zero for the first 20 epochs and the loss is higher than the accuracy; and if I try to change the code a little bit, the validation accuracy ends up lower than the loss.
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, 100, 16) 16000
_________________________________________________________________
bidirectional (Bidirectional (None, 64) 12544
_________________________________________________________________
dense (Dense) (None, 24) 1560
_________________________________________________________________
dense_1 (Dense) (None, 3) 75
=================================================================
Total params: 30,179
Trainable params: 30,179
Non-trainable params: 0
_________________________________________________________________
Train on 200 samples, validate on 79 samples
loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy']
Epoch 1/40
2020-08-16 14:08:17.986786: W tensorflow/core/grappler/optimizers/implementation_selector.cc:310] Skipping optimization due to error while loading function libraries: Invalid argument: Functions '__inference___backward_standard_lstm_4932_5417' and '__inference___backward_standard_lstm_4932_5417_specialized_for_StatefulPartitionedCall_at___inference_distributed_function_6117' both implement 'lstm_6b97c168-2c7b-4dc3-93dd-5e68cddc574f' but their signatures do not match.
2020-08-16 14:08:20.550894: W tensorflow/core/grappler/optimizers/implementation_selector.cc:310] Skipping optimization due to error while loading function libraries: Invalid argument: Functions '__inference_standard_lstm_6338' and '__inference_standard_lstm_6338_specialized_for_sequential_bidirectional_forward_lstm_StatefulPartitionedCall_at___inference_distributed_function_7185' both implement 'lstm_62f19cc7-a1f0-447b-a17f-84a70fc095cd' but their signatures do not match.
200/200 - 10s - loss: 1.0798 - accuracy: 0.7850 - val_loss: 1.1327 - val_accuracy: 0.0000e+00
Epoch 2/40
200/200 - 1s - loss: 1.0286 - accuracy: 0.7850 - val_loss: 1.1956 - val_accuracy: 0.0000e+00
Epoch 3/40
200/200 - 1s - loss: 0.9294 - accuracy: 0.7850 - val_loss: 1.4287 - val_accuracy: 0.0000e+00
Epoch 4/40
200/200 - 1s - loss: 0.7026 - accuracy: 0.7850 - val_loss: 2.2190 - val_accuracy: 0.0000e+00
Epoch 5/40
200/200 - 1s - loss: 0.6183 - accuracy: 0.7850 - val_loss: 1.8499 - val_accuracy: 0.0000e+00
Epoch 6/40
200/200 - 1s - loss: 0.5980 - accuracy: 0.7850 - val_loss: 1.5809 - val_accuracy: 0.0000e+00
Epoch 7/40
200/200 - 1s - loss: 0.5927 - accuracy: 0.7850 - val_loss: 1.5118 - val_accuracy: 0.0000e+00
Epoch 8/40
200/200 - 1s - loss: 0.5861 - accuracy: 0.7850 - val_loss: 1.5711 - val_accuracy: 0.0000e+00
Epoch 9/40
200/200 - 1s - loss: 0.5728 - accuracy: 0.7850 - val_loss: 1.5106 - val_accuracy: 0.0000e+00
Epoch 10/40
200/200 - 1s - loss: 0.5509 - accuracy: 0.7850 - val_loss: 1.6389 - val_accuracy: 0.0000e+00
Epoch 11/40
200/200 - 1s - loss: 0.5239 - accuracy: 0.7850 - val_loss: 1.5991 - val_accuracy: 0.0000e+00
Epoch 12/40
200/200 - 1s - loss: 0.4860 - accuracy: 0.7850 - val_loss: 1.4903 - val_accuracy: 0.0000e+00
Epoch 13/40
200/200 - 1s - loss: 0.4388 - accuracy: 0.7850 - val_loss: 1.3937 - val_accuracy: 0.0000e+00
Epoch 14/40
200/200 - 1s - loss: 0.3859 - accuracy: 0.7850 - val_loss: 1.2329 - val_accuracy: 0.0000e+00
Epoch 15/40
200/200 - 1s - loss: 0.3460 - accuracy: 0.7850 - val_loss: 1.1700 - val_accuracy: 0.0000e+00
Epoch 16/40
200/200 - 1s - loss: 0.3323 - accuracy: 0.7850 - val_loss: 1.0077 - val_accuracy: 0.0127
Epoch 17/40
200/200 - 1s - loss: 0.3007 - accuracy: 0.8150 - val_loss: 1.2465 - val_accuracy: 0.2278
Epoch 18/40
200/200 - 0s - loss: 0.2752 - accuracy: 0.9200 - val_loss: 0.8890 - val_accuracy: 0.6329
Epoch 19/40
200/200 - 1s - loss: 0.2613 - accuracy: 0.9700 - val_loss: 0.9181 - val_accuracy: 0.6582
Epoch 20/40
200/200 - 1s - loss: 0.2447 - accuracy: 0.9600 - val_loss: 0.8786 - val_accuracy: 0.7468
Epoch 21/40
200/200 - 1s - loss: 0.2171 - accuracy: 0.9700 - val_loss: 0.7162 - val_accuracy: 0.8481
Epoch 22/40
200/200 - 1s - loss: 0.1949 - accuracy: 0.9700 - val_loss: 0.8051 - val_accuracy: 0.7848
Epoch 23/40
200/200 - 1s - loss: 0.1654 - accuracy: 0.9700 - val_loss: 0.4710 - val_accuracy: 0.8861
Epoch 24/40
200/200 - 1s - loss: 0.1481 - accuracy: 0.9700 - val_loss: 0.4209 - val_accuracy: 0.8861
Epoch 25/40
200/200 - 1s - loss: 0.1192 - accuracy: 0.9700 - val_loss: 0.3792 - val_accuracy: 0.8861
Epoch 26/40
200/200 - 1s - loss: 0.1022 - accuracy: 0.9700 - val_loss: 0.7279 - val_accuracy: 0.8101
Epoch 27/40
200/200 - 1s - loss: 0.0995 - accuracy: 0.9700 - val_loss: 1.3112 - val_accuracy: 0.6582
Epoch 28/40
200/200 - 1s - loss: 0.1161 - accuracy: 0.9650 - val_loss: 0.1435 - val_accuracy: 0.9747
Epoch 29/40
200/200 - 1s - loss: 0.0889 - accuracy: 0.9700 - val_loss: 0.3896 - val_accuracy: 0.8608
Epoch 30/40
200/200 - 1s - loss: 0.0830 - accuracy: 0.9700 - val_loss: 0.3840 - val_accuracy: 0.8608
Epoch 31/40
200/200 - 1s - loss: 0.0688 - accuracy: 0.9700 - val_loss: 0.3100 - val_accuracy: 0.9241
Epoch 32/40
200/200 - 1s - loss: 0.0611 - accuracy: 0.9700 - val_loss: 0.3524 - val_accuracy: 0.8987
Epoch 33/40
200/200 - 1s - loss: 0.0518 - accuracy: 0.9750 - val_loss: 0.4621 - val_accuracy: 0.8481
Epoch 34/40
200/200 - 1s - loss: 0.0457 - accuracy: 0.9900 - val_loss: 0.4344 - val_accuracy: 0.8481
Epoch 35/40
200/200 - 1s - loss: 0.0423 - accuracy: 0.9900 - val_loss: 0.4417 - val_accuracy: 0.8608
Epoch 36/40
200/200 - 1s - loss: 0.0372 - accuracy: 0.9900 - val_loss: 0.4701 - val_accuracy: 0.8481
Epoch 37/40
200/200 - 1s - loss: 0.0319 - accuracy: 0.9950 - val_loss: 0.3913 - val_accuracy: 0.8608
Epoch 38/40
200/200 - 1s - loss: 0.0309 - accuracy: 0.9950 - val_loss: 0.5739 - val_accuracy: 0.7975
Epoch 39/40
200/200 - 1s - loss: 0.0290 - accuracy: 0.9950 - val_loss: 0.5416 - val_accuracy: 0.8228
Epoch 40/40
200/200 - 1s - loss: 0.0292 - accuracy: 1.0000 - val_loss: 0.3162 - val_accuracy: 0.8861
There can be multiple reasons for the validation accuracy to be zero; you can check the points below and make changes accordingly.
The number of samples is very small (training on 200 samples and validating on 79); you can try increasing it through some upsampling methods.
It is possible that the validation set contains data unseen during training, so the learned weights are not useful for it.
For example, if you have digits from 1-9 to classify and you keep 1-7 for training and 8-9 for validation, the validation data becomes an unseen scenario and val_acc will be 0.
To fix this, shuffle the data before sending it to model.fit (see the sketch after the snippet below).
This can be a silly mistake: check whether your validation data is mapped properly to its labels.
You can also give the validation data explicitly, like below.
model.fit(X_train, y_train, batch_size=32, epochs=30,
          shuffle=True, verbose=1, callbacks=[remote, early_stopping],
          validation_data=(X_validation, y_validation))
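For the shuffling point above, a minimal sketch (assuming X_train and y_train are numpy arrays, matching the names in the snippet):

import numpy as np

# Shuffle features and labels with the same permutation before fitting,
# so the validation portion does not consist only of classes unseen in training.
perm = np.random.permutation(len(X_train))
X_train, y_train = X_train[perm], y_train[perm]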

Issue with Tensorflow classification - loss not decreasing

Following is the reference code:
Xtrain2 = df2.iloc[:, :-1].values
ytrain2 = df2['L'].values.reshape((200, 1))
print(Xtrain2.shape, ytrain2.shape)
#--------------------------
lrelu = lambda x: tf.keras.activations.relu(x, alpha=0.1)
model2 = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1501, input_dim=1501, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    #tf.keras.layers.Dense(1),
    #tf.keras.layers.Dense(1, activation=lrelu)
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model2.compile(loss='binary_crossentropy',
               optimizer='adam',
               metrics=['accuracy'])
#--------------------------
model2.fit(Xtrain2, ytrain2, epochs=50)  #, verbose=0)
Just a simple attempt at a classifier. The last layer is sigmoid since it's just a binary classifier. The loss is also appropriate for the problem. The input dimension is 1500 and the number of samples is 200. I get the following output:
(200, 1501) (200, 1)
Train on 200 samples
Epoch 1/50
200/200 [==============================] - 0s 2ms/sample - loss: 0.4201 - accuracy: 0.0300
Epoch 2/50
200/200 [==============================] - 0s 359us/sample - loss: -1.1114 - accuracy: 0.0000e+00
Epoch 3/50
200/200 [==============================] - 0s 339us/sample - loss: -4.6102 - accuracy: 0.0000e+00
Epoch 4/50
200/200 [==============================] - 0s 344us/sample - loss: -13.7864 - accuracy: 0.0000e+00
Epoch 5/50
200/200 [==============================] - 0s 342us/sample - loss: -34.7789 - accuracy: 0.0000e+00
.
.
.
Epoch 40/50
200/200 [==============================] - 0s 348us/sample - loss: -905166.4000 - accuracy: 0.3750
Epoch 41/50
200/200 [==============================] - 0s 344us/sample - loss: -1010177.5300 - accuracy: 0.3400
Epoch 42/50
200/200 [==============================] - 0s 354us/sample - loss: -1129819.1825 - accuracy: 0.3450
Epoch 43/50
200/200 [==============================] - 0s 379us/sample - loss: -1263355.3200 - accuracy: 0.3900
Epoch 44/50
200/200 [==============================] - 0s 359us/sample - loss: -1408803.0400 - accuracy: 0.3750
Epoch 45/50
200/200 [==============================] - 0s 355us/sample - loss: -1566850.5900 - accuracy: 0.3300
Epoch 46/50
200/200 [==============================] - 0s 359us/sample - loss: -1728280.7550 - accuracy: 0.3550
Epoch 47/50
200/200 [==============================] - 0s 354us/sample - loss: -1909759.2400 - accuracy: 0.3400
Epoch 48/50
200/200 [==============================] - 0s 379us/sample - loss: -2108889.7200 - accuracy: 0.3750
Epoch 49/50
200/200 [==============================] - 0s 369us/sample - loss: -2305491.9800 - accuracy: 0.3700
Epoch 50/50
200/200 [==============================] - 0s 374us/sample - loss: -2524282.6300 - accuracy: 0.3050
I don't see where I'm going wrong in the above code. Any help would be appreciated!
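One thing I have not verified above is whether ytrain2 really contains only 0s and 1s; binary_crossentropy with a sigmoid output assumes labels in the [0, 1] range (typically exactly 0 or 1), and labels outside that range can drive the loss below zero. A quick check would be:

import numpy as np

# binary_crossentropy with sigmoid assumes labels in [0, 1];
# values like -1/1 or class ids greater than 1 can make the loss negative.
print(np.unique(ytrain2))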

tf.distribute.MirroredStrategy.scope mode setting vocab_size does not match but does not report an error

I am using tf.distribute.MirroredStrategy() to train a TextCNN model, but when I set vocab_size=0 or another wrong value, no error is reported in this mode. When tf.distribute.MirroredStrategy() is not used, a wrong vocab_size immediately raises an error.
Using a wrong value for vocab_size:
model = TextCNN(padding_size, vocab_size-10, embed_size, filter_num, num_classes)
model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(), metrics=['accuracy'])
model.fit(train_dataset, epochs=epoch, validation_data=valid_dataset, callbacks=callbacks)
Error:
2 root error(s) found.
(0) Invalid argument: indices[63,10] = 4726 is not in [0, 4726)
[[node text_cnn_1/embedding/embedding_lookup (defined at <ipython-input-7-6ef8a4397184>:37) ]]
[[Adam/Adam/update/AssignSubVariableOp/_45]]
(1) Invalid argument: indices[63,10] = 4726 is not in [0, 4726)
[[node text_cnn_1/embedding/embedding_lookup (defined at <ipython-input-7-6ef8a4397184>:37) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_234431]
But with strategy.scope() there is no error, and it works well:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    print(vocab_size)
    model = TextCNN(padding_size, vocab_size-1000, embed_size, filter_num, num_classes)
    model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(), metrics=['accuracy'])
    model.fit(train_dataset, epochs=epoch, validation_data=valid_dataset, callbacks=callbacks)
The log looks like this (looks very good):
Learning rate for epoch 1 is 0.0010000000474974513
2813/2813 [==============================] - 16s 6ms/step - loss: 0.8097 - accuracy: 0.7418 - val_loss: 0.4567 - val_accuracy: 0.8586 - lr: 0.0010
Epoch 2/15
2813/2813 [==============================] - ETA: 0s - loss: 0.4583 - accuracy: 0.8560
Learning rate for epoch 2 is 0.0010000000474974513
2813/2813 [==============================] - 14s 5ms/step - loss: 0.4583 - accuracy: 0.8560 - val_loss: 0.4051 - val_accuracy: 0.8756 - lr: 0.0010
Epoch 3/15
2810/2813 [============================>.] - ETA: 0s - loss: 0.3909 - accuracy: 0.8768
Learning rate for epoch 3 is 0.0010000000474974513
2813/2813 [==============================] - 14s 5ms/step - loss: 0.3909 - accuracy: 0.8767 - val_loss: 0.3853 - val_accuracy: 0.8844 - lr: 0.0010
Epoch 4/15
2811/2813 [============================>.] - ETA: 0s - loss: 0.2999 - accuracy: 0.9047
Learning rate for epoch 4 is 9.999999747378752e-05
2813/2813 [==============================] - 14s 5ms/step - loss: 0.2998 - accuracy: 0.9047 - val_loss: 0.3700 - val_accuracy: 0.8865 - lr: 1.0000e-04
Epoch 5/15
2807/2813 [============================>.] - ETA: 0s - loss: 0.2803 - accuracy: 0.9114
Learning rate for epoch 5 is 9.999999747378752e-05
2813/2813 [==============================] - 15s 5ms/step - loss: 0.2803 - accuracy: 0.9114 - val_loss: 0.3644 - val_accuracy: 0.8888 - lr: 1.0000e-04
Epoch 6/15
2803/2813 [============================>.] - ETA: 0s - loss: 0.2639 - accuracy: 0.9162
Learning rate for epoch 6 is 9.999999747378752e-05
2813/2813 [==============================] - 14s 5ms/step - loss: 0.2636 - accuracy: 0.9163 - val_loss: 0.3615 - val_accuracy: 0.8896 - lr: 1.0000e-04
Epoch 7/15
2805/2813 [============================>.] - ETA: 0s - loss: 0.2528 - accuracy: 0.9188
Learning rate for epoch 7 is 9.999999747378752e-05
2813/2813 [==============================] - 14s 5ms/step - loss: 0.2526 - accuracy: 0.9189 - val_loss: 0.3607 - val_accuracy: 0.8909 - lr: 1.0000e-04
More simply, like this, it runs with no error:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = Sequential()
    model.add(Embedding(1000, 64, input_length=20))
    test_array = np.random.randint(10000, size=(32, 20))
    model.predict(test_array)
why???

KerasLayer vs tf.keras.applications performance

I've trained some networks with ResNetV2 50 ( https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/4 ) and it works very well on my datasets.
Then I tried tf.keras.applications.ResNet50V2 and the accuracy is much lower than with the other.
Here are the two models:
The first (with hub)
base_model = hub.KerasLayer('https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/4',
                            input_shape=(IMAGE_H, IMAGE_W, 3))
base_model.trainable = False
model = tf.keras.Sequential([
    base_model,
    Dense(num_classes, activation='softmax')
])
The second (with keras.applications)
base_model = tf.keras.applications.ResNet50V2(input_shape=(IMAGE_H, IMAGE_W, 3),
                                              include_top=False, weights='imagenet', pooling='avg')
base_model.trainable = False
model = tf.keras.Sequential([
    base_model,
    Dense(num_classes, activation='softmax')
])
The optimizer is the same (Adam), and the epochs, steps, dataset (train and validation), and learning rate are the same as well. But the first starts with a val_accuracy near 80% and ends with an accuracy near 99%, while the second stays around 85% val_accuracy from the first to the last epoch, as if it's overfitting. I got the same behavior when changing the dataset and parameters for each model.
What am I doing wrong?
Both tf.keras.applications.ResNet50V2 and hub.KerasLayer for ResNet50V2 are trained on the same ImageNet dataset and share the same weights; there is no difference between the two.
Coming to the accuracy difference, I tried both APIs to load the base model and trained for 10 epochs.
I saw a minor difference in the accuracy of both training and validation.
Then, after running the TensorFlow Hub base model a second time, I again found a minor change in accuracy. This may be due to the order of the calculations, and it could also be floating-point precision error.
Below are the outputs:
tf.keras.applications.ResNet50V2 as base model.
base_model = tf.keras.applications.ResNet50V2(input_shape=(96, 96, 3), include_top=False,
                                              weights='imagenet', pooling='avg')
base_model.trainable = False
model = tf.keras.Sequential([
    base_model,
    Dense(10, activation='softmax')
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
H = model.fit_generator(aug.flow(trainX, trainY, batch_size=16),
                        validation_data=(testX, testY), steps_per_epoch=len(trainX) // 16,
                        epochs=10)
Output 1:
Epoch 1/10
103/103 [==============================] - 55s 535ms/step - loss: 1.5820 - accuracy: 0.4789 - val_loss: 0.9162 - val_accuracy: 0.6949
Epoch 2/10
103/103 [==============================] - 57s 554ms/step - loss: 0.9539 - accuracy: 0.6534 - val_loss: 0.8376 - val_accuracy: 0.6852
Epoch 3/10
103/103 [==============================] - 55s 532ms/step - loss: 0.8610 - accuracy: 0.6944 - val_loss: 0.7104 - val_accuracy: 0.7240
Epoch 4/10
103/103 [==============================] - 55s 533ms/step - loss: 0.7671 - accuracy: 0.7214 - val_loss: 0.5988 - val_accuracy: 0.7918
Epoch 5/10
103/103 [==============================] - 55s 536ms/step - loss: 0.6994 - accuracy: 0.7526 - val_loss: 0.6029 - val_accuracy: 0.7676
Epoch 6/10
103/103 [==============================] - 55s 537ms/step - loss: 0.6880 - accuracy: 0.7508 - val_loss: 0.6121 - val_accuracy: 0.7724
Epoch 7/10
103/103 [==============================] - 55s 533ms/step - loss: 0.6588 - accuracy: 0.7593 - val_loss: 0.5486 - val_accuracy: 0.8015
Epoch 8/10
103/103 [==============================] - 55s 534ms/step - loss: 0.6640 - accuracy: 0.7630 - val_loss: 0.5287 - val_accuracy: 0.8232
Epoch 9/10
103/103 [==============================] - 54s 528ms/step - loss: 0.6004 - accuracy: 0.7881 - val_loss: 0.4598 - val_accuracy: 0.8426
Epoch 10/10
103/103 [==============================] - 55s 530ms/step - loss: 0.5583 - accuracy: 0.8016 - val_loss: 0.4605 - val_accuracy: 0.8426
Tensorflow Hub as base model.
tf.keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)
base_model = hub.KerasLayer('https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/4',
                            input_shape=(96, 96, 3))
base_model.trainable = False
model = tf.keras.Sequential([base_model, Dense(10, activation='softmax')])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
H = model.fit_generator(aug.flow(trainX, trainY, batch_size=16),
                        validation_data=(testX, testY), steps_per_epoch=len(trainX) // 16,
                        epochs=10)
Output 2:
Epoch 1/10
103/103 [==============================] - 54s 526ms/step - loss: 1.7543 - accuracy: 0.4464 - val_loss: 1.0185 - val_accuracy: 0.5981
Epoch 2/10
103/103 [==============================] - 53s 519ms/step - loss: 0.9827 - accuracy: 0.6283 - val_loss: 1.0067 - val_accuracy: 0.6416
Epoch 3/10
103/103 [==============================] - 56s 548ms/step - loss: 0.8719 - accuracy: 0.6944 - val_loss: 0.7195 - val_accuracy: 0.7240
Epoch 4/10
103/103 [==============================] - 54s 521ms/step - loss: 0.8177 - accuracy: 0.7208 - val_loss: 0.7490 - val_accuracy: 0.7385
Epoch 5/10
103/103 [==============================] - 54s 522ms/step - loss: 0.7641 - accuracy: 0.7379 - val_loss: 0.6325 - val_accuracy: 0.7797
Epoch 6/10
103/103 [==============================] - 54s 523ms/step - loss: 0.6551 - accuracy: 0.7655 - val_loss: 0.6431 - val_accuracy: 0.7579
Epoch 7/10
103/103 [==============================] - 55s 530ms/step - loss: 0.6538 - accuracy: 0.7734 - val_loss: 0.5824 - val_accuracy: 0.7797
Epoch 8/10
103/103 [==============================] - 55s 536ms/step - loss: 0.6387 - accuracy: 0.7691 - val_loss: 0.6254 - val_accuracy: 0.7772
Epoch 9/10
103/103 [==============================] - 56s 540ms/step - loss: 0.6394 - accuracy: 0.7685 - val_loss: 0.6539 - val_accuracy: 0.7554
Epoch 10/10
103/103 [==============================] - 55s 536ms/step - loss: 0.5816 - accuracy: 0.7955 - val_loss: 0.5703 - val_accuracy: 0.7990
Now, if I run the TensorFlow Hub model again:
Output 3:
Epoch 1/10
103/103 [==============================] - 55s 534ms/step - loss: 1.6412 - accuracy: 0.4764 - val_loss: 1.0697 - val_accuracy: 0.5738
Epoch 2/10
103/103 [==============================] - 54s 528ms/step - loss: 1.0312 - accuracy: 0.6412 - val_loss: 1.0196 - val_accuracy: 0.6077
Epoch 3/10
103/103 [==============================] - 59s 570ms/step - loss: 0.8710 - accuracy: 0.6975 - val_loss: 0.7088 - val_accuracy: 0.7240
Epoch 4/10
103/103 [==============================] - 54s 529ms/step - loss: 0.8108 - accuracy: 0.7128 - val_loss: 0.6539 - val_accuracy: 0.7458
Epoch 5/10
103/103 [==============================] - 54s 522ms/step - loss: 0.7311 - accuracy: 0.7440 - val_loss: 0.6029 - val_accuracy: 0.7676
Epoch 6/10
103/103 [==============================] - 54s 523ms/step - loss: 0.6683 - accuracy: 0.7612 - val_loss: 0.6621 - val_accuracy: 0.7506
Epoch 7/10
103/103 [==============================] - 54s 527ms/step - loss: 0.6518 - accuracy: 0.7753 - val_loss: 0.6166 - val_accuracy: 0.7700
Epoch 8/10
103/103 [==============================] - 54s 524ms/step - loss: 0.6147 - accuracy: 0.7795 - val_loss: 0.5611 - val_accuracy: 0.7797
Epoch 9/10
103/103 [==============================] - 54s 522ms/step - loss: 0.6671 - accuracy: 0.7667 - val_loss: 0.5126 - val_accuracy: 0.8087
Epoch 10/10
103/103 [==============================] - 54s 525ms/step - loss: 0.6090 - accuracy: 0.7832 - val_loss: 0.6355 - val_accuracy: 0.7627
Hope this answers your question.