Keras : Simple 1 variable training to get the mean - tensorflow

I wrote a very basic model to train a single variable to approximate the mean value of a vector. But for some reason, it's not working properly.
I used this page describing a linear fit (2 variables):
https://www.tensorflow.org/guide/basic_training_loops
My code is as follows:
import tensorflow as tf
import numpy as np
class MyModel(tf.keras.Model):
    def __init__(self, **kwargs):
        super().__init__()
        self.b = tf.Variable(1.0, trainable=True)
    def call(self, x):
        return x - self.b
model = MyModel()
model.compile(optimizer=tf.optimizers.Adam(learning_rate=1e-3), loss='mae')
X = np.random.random((10000,1))
Y = np.zeros(X.shape)
model.fit(X, Y, batch_size=10, epochs=10)
b should be optimized so that sum(abs(X - b)) is as small as possible (which I expect to be the mean). However, when I fit the model it does not seem to train at all and always ends up at the solution b=0 (the real mean is around 0.5).
What am I doing wrong?

This code works fine. Please check the execution below and its output:
import tensorflow as tf
import numpy as np
class MyModel(tf.keras.Model):
    def __init__(self, **kwargs):
        super().__init__()
        self.b = tf.Variable(1.0, trainable=True)
    def call(self, x):
        return x - self.b
model = MyModel()
model.compile(optimizer=tf.optimizers.Adam(learning_rate=1e-3), loss='mae')
X = np.random.random((10000,1))
Y = np.zeros(X.shape)
model.fit(X, Y, batch_size=10, epochs=10)
Output:
Epoch 1/10
1000/1000 [==============================] - 2s 1ms/step - loss: 0.2991
Epoch 2/10
1000/1000 [==============================] - 1s 1ms/step - loss: 0.2476
Epoch 3/10
1000/1000 [==============================] - 1s 1ms/step - loss: 0.2476
Epoch 4/10
1000/1000 [==============================] - 1s 1ms/step - loss: 0.2476
Epoch 5/10
1000/1000 [==============================] - 1s 1ms/step - loss: 0.2476
Epoch 6/10
1000/1000 [==============================] - 1s 1ms/step - loss: 0.2476
Epoch 7/10
1000/1000 [==============================] - 2s 2ms/step - loss: 0.2476
Epoch 8/10
1000/1000 [==============================] - 2s 2ms/step - loss: 0.2476
Epoch 9/10
1000/1000 [==============================] - 2s 2ms/step - loss: 0.2476
Epoch 10/10
1000/1000 [==============================] - 2s 2ms/step - loss: 0.2477
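The plateau at a loss of about 0.2476 is what one would expect if b has settled near 0.5: for X uniform on [0, 1), the mean of |X - 0.5| is 0.25. As a side note, minimizing the mean absolute error drives b toward the median of X rather than the mean; for this symmetric distribution the two nearly coincide. A minimal sketch in plain Python (no Keras; the grid search and the helper name mae are illustrative assumptions, not part of the original code) that checks both points:

```python
import random
import statistics

random.seed(0)
# Same kind of data as in the question: uniform samples in [0, 1).
xs = [random.random() for _ in range(2000)]

def mae(b, xs):
    """Mean absolute error of the constant prediction b against xs."""
    return sum(abs(x - b) for x in xs) / len(xs)

# Brute-force search over candidate values of b on a fine grid.
grid = [i / 200 for i in range(201)]
best_b = min(grid, key=lambda b: mae(b, xs))

print("best b on grid:", best_b)
print("median of xs:  ", statistics.median(xs))
print("mean of xs:    ", statistics.mean(xs))
print("MAE at b=0.5:  ", mae(0.5, xs))
print("MAE at b=0.0:  ", mae(0.0, xs))
```

With b = 0 the MAE would be about 0.5, so the reported 0.2476 suggests that the variable did move to roughly 0.5. Printing model.b after fit should confirm this.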

Related

Training a Keras model for object location

I am trying to use Keras to train a model for locating a certain mountain peak in the images of a moving wide-angle camera. This is my training data; I have 329 labeled images:
The position does not change, so except for lens distortion and rotation, it always looks the same. Piece of cake for a human.
These plots are generated from my training data generator function to rule out any issues in that regard:
train_generator = traingui_generator([], frames_labeled, labels, batch_size)
im,la = next(train_generator)
fig = plt.figure(figsize=(16,10))
for n, (i,l) in enumerate(zip(im,la)):
    a = plt.subplot(2,2,n+1)
    plt.imshow(i)
    plt.plot(l[0],l[1],'r+')
    plt.xlim((1440,2880))
    plt.ylim((0,1440))
    a.invert_yaxis()
plt.show()
With the help of the Keras documentation examples, I came up with this model:
from keras import Input
from keras.models import Model
from keras.optimizers import Adam
from keras.layers import Cropping2D, Rescaling, Conv2D, MaxPooling2D, Flatten, Dense, AveragePooling2D,Resizing
inputs = Input(shape=input_shape)
x = Cropping2D(((0,0),(1440,0)))(inputs)
x = Rescaling(scale=1.0 / 255)(x)
x = Resizing(720,720)(x)
x = Conv2D(32, (3, 3), activation='relu')(x)
x = MaxPooling2D((2, 2))(x)
x = Conv2D(64, (3, 3), activation='relu')(x)
x = MaxPooling2D((2, 2))(x)
x = Flatten()(x)
x = Dense(64, activation='relu')(x)
location = Dense(2)(x)
# Define the model with the input and output layers
model = Model(inputs=inputs, outputs=location)
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='mse', metrics=['accuracy'])
batch_size = 4
num_train_samples = 329
num_epochs = 10
train_generator = traingui_generator([], frames_labeled, labels, batch_size)
model.fit(train_generator, steps_per_epoch=num_train_samples // batch_size, epochs=num_epochs)
This is my learning process:
27/27 [==============================] - 87s 3s/step - loss: 1126492.5000 - accuracy: 0.9877
Epoch 2/20
27/27 [==============================] - 87s 3s/step - loss: 750418.5000 - accuracy: 1.0000
Epoch 3/20
27/27 [==============================] - 87s 3s/step - loss: 702527.0625 - accuracy: 1.0000
Epoch 4/20
27/27 [==============================] - 87s 3s/step - loss: 651334.3125 - accuracy: 1.0000
Epoch 5/20
27/27 [==============================] - 87s 3s/step - loss: 591387.7500 - accuracy: 1.0000
Epoch 6/20
27/27 [==============================] - 88s 3s/step - loss: 495730.0625 - accuracy: 0.9722
Epoch 7/20
27/27 [==============================] - 87s 3s/step - loss: 322107.7500 - accuracy: 0.8981
Epoch 8/20
27/27 [==============================] - 87s 3s/step - loss: 213287.5312 - accuracy: 0.8981
Epoch 9/20
27/27 [==============================] - 87s 3s/step - loss: 151553.3281 - accuracy: 0.9475
Epoch 10/20
27/27 [==============================] - 87s 3s/step - loss: 114601.3828 - accuracy: 0.9506
Epoch 11/20
27/27 [==============================] - 87s 3s/step - loss: 96194.7031 - accuracy: 0.9475
Epoch 12/20
27/27 [==============================] - 88s 3s/step - loss: 69348.9922 - accuracy: 0.9321
Epoch 13/20
27/27 [==============================] - 87s 3s/step - loss: 65372.2852 - accuracy: 0.9475
Epoch 14/20
27/27 [==============================] - 88s 3s/step - loss: 58215.0547 - accuracy: 0.9043
Epoch 15/20
27/27 [==============================] - 87s 3s/step - loss: 57038.0078 - accuracy: 0.9475
Epoch 16/20
27/27 [==============================] - 87s 3s/step - loss: 47969.5234 - accuracy: 0.9660
Epoch 17/20
27/27 [==============================] - 87s 3s/step - loss: 45780.5820 - accuracy: 0.9383
Epoch 18/20
27/27 [==============================] - 88s 3s/step - loss: 39562.1836 - accuracy: 0.9660
Epoch 19/20
27/27 [==============================] - 87s 3s/step - loss: 51684.8164 - accuracy: 0.9537
Epoch 20/20
27/27 [==============================] - 88s 3s/step - loss: 45646.8398 - accuracy: 0.9815
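For scale: with loss='mse' on raw pixel coordinates, the square root of the loss is a root-mean-square error in pixels. A small sketch of that arithmetic, using the first and last loss values from the log above (the variable names are illustrative):

```python
import math

# Loss values copied from the first and last epochs of the log.
first_loss = 1126492.5
last_loss = 45646.8398

# MSE is in squared pixels, so its square root is an RMS error in pixels.
rms_first = math.sqrt(first_loss)
rms_last = math.sqrt(last_loss)

print(f"RMS error after the first epoch: {rms_first:.0f} px")
print(f"RMS error after the last epoch:  {rms_last:.0f} px")
```

By this measure the final predictions are still off by a couple of hundred pixels per coordinate. The accuracy metric is less informative here, because it is designed for classification rather than coordinate regression.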
I notice that the loss function is quite often increasing during the epochs and the accuracy isn't stable, so I already went down with the learning_rate by one order of magnitude.
Unfortunately, the result is not particularly good:
These plots are generated with the same piece of code as above, just replacing la with the output of model.predict(im).
Note that this is on the training data, so I would rather rule out overfitting, if I understand correctly.
I have also tried just two convolutional layers, without success.
Now I read that I should modify the model, but I am lacking guidance in what direction.
Should I just play around randomly, each time waiting for it to finish? Where would I start to adjust? Filter number? Kernel size of convolution or pooling? Number of layers? Number of epochs? Is the training data even enough?
Or is there something fundamentally wrong in what I am doing?
The size of the image is unfortunately not really open for discussion. I would even like to remove the resize, because there will be other, smaller features that I would like to detect, and add the other half-sphere, such that my final data would have 1440x2880 pixels.
If necessary, I would have access to quite powerful hardware, though, if RAM is an issue, and I would not have too much of a problem if the training took hours or more, if the problem REALLY requires it.
I would really appreciate it if someone experienced could give me a push in the right direction for this concrete problem. I have no idea what to expect.
Edit: Given that my images are much larger than most examples and I read that the first layers are meant to capture low-frequency features, I increased the kernel size of the first layers:
x = Conv2D(32, (15, 15), activation='relu')(x)
x = MaxPooling2D((2, 2))(x)
x = Conv2D(64, (5, 5), activation='relu')(x)
x = MaxPooling2D((2, 2))(x)
x = Conv2D(128, (3, 3), activation='relu')(x)
x = MaxPooling2D((2, 2))(x)
If anything, this made things worse.

Validation loss not changing in Resnet

I have data of shape (25000, 178, 178, 3): 25000 samples, each with 3 different channels (not RGB). Around 21k samples have label 0 and the remaining 4k have label 1. Here is one of my samples:
array([[[[1.79844797e-01, 1.73587397e-01, 1.73587397e-01, ...,
4.84393053e-02, 5.15680127e-02, 5.46967126e-02],
[1.76716089e-01, 1.79844797e-01, 1.82973504e-01, ...,
5.15680127e-02, 5.31323589e-02, 5.15680127e-02],
[1.81409150e-01, 1.86102197e-01, 1.81409150e-01, ...,
5.15680127e-02, 5.31323589e-02, 5.15680127e-02]]],
[[[2.51065755e+00, 2.53197193e+00, 2.53197193e+00, ...,
1.88543844e+00, 1.89964795e+00, 1.90675282e+00],
[2.51776242e+00, 2.52486706e+00, 2.53197193e+00, ...,
1.89964795e+00, 1.90675282e+00, 1.90675282e+00],
[2.53197193e+00, 2.51776242e+00, 2.52486706e+00, ...,
1.91385746e+00, 1.90675282e+00, 1.90675282e+00]]],
[[[7.13270283e+00, 7.11016369e+00, 7.13270283e+00, ...,
4.85625362e+00, 4.90133190e+00, 4.94641018e+00],
[7.08762503e+00, 7.08762503e+00, 7.08762503e+00, ...,
4.92387104e+00, 4.96894932e+00, 4.96894932e+00],
[7.08762503e+00, 7.08762503e+00, 7.06508589e+00, ...,
4.99148846e+00, 4.96894932e+00, 4.96894932e+00]]],
dtype=float32)
First, I'm normalizing by color channel. Since each color channel is completely different, I normalize each one separately as follows (data_array is my whole dataset):
def nan(index):
    data_array[:, :, :, index] = (data_array[:, :, :, index] - np.min(data_array[:, :, :, index]))/(np.max(data_array[:, :, :, index]) - np.min(data_array[:, :, :, index]))
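A minimal sketch of the same per-channel min-max scaling, written as a function that returns a new array instead of modifying the global data_array in place. The name normalize_channels and the small random array are assumptions for illustration; the guard for a constant channel is an addition that is not present in the original.

```python
import numpy as np

def normalize_channels(data):
    """Min-max scale each channel (last axis) of a (N, H, W, C) array to [0, 1]."""
    data = data.astype(np.float32)
    # Minimum and maximum over samples, height and width, kept per channel.
    c_min = data.min(axis=(0, 1, 2), keepdims=True)
    c_max = data.max(axis=(0, 1, 2), keepdims=True)
    span = c_max - c_min
    # Assumption: a constant channel is mapped to zeros instead of dividing by zero.
    span = np.where(span == 0, 1.0, span)
    return (data - c_min) / span

# Hypothetical stand-in for the real dataset: 8 samples, 4x4 pixels, 3 channels
# with very different value ranges, as in the question.
rng = np.random.default_rng(0)
demo = rng.random((8, 4, 4, 3)) * np.array([0.2, 3.0, 8.0])
scaled = normalize_channels(demo)
print(scaled.min(axis=(0, 1, 2)), scaled.max(axis=(0, 1, 2)))
```

Computing the minimum and maximum once per channel also avoids scanning the full array three times per channel, which may matter for 25000 images.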
Splitting for training, validation and testing:
rand_indices = np.random.permutation(len(data))
train_indices = rand_indices[0:19000]
valid_indices = rand_indices[19000:21000]
test_indices = rand_indices[21000:len(data)]
x_val = data_array[valid_indices, :]
y_val = EDR[[valid_indices]].astype('float')
x_train = data_array[train_indices, :]
y_train = EDR[[train_indices]].astype('float')
x_test = data_array[test_indices, :]
y_test = EDR[[test_indices]].astype('float')
Then I'm using ImageDataGenerator to fit the training data like this:
gen = ImageDataGenerator(
    rotation_range=40,
    zoom_range=0.2,
    shear_range=0.2,
    width_shift_range=0.2,
    height_shift_range=0.2,
    fill_mode='nearest',
    horizontal_flip=True,
)
gen.fit(x_train)
Then I'm using ResNet to train on the data as follows:
img_height,img_width = 178, 178
num_classes = 2
base_model = applications.resnet.ResNet101(weights= None, include_top=False, input_shape= (img_height,img_width,3))
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.7)(x)
predictions = Dense(1, activation= 'sigmoid')(x)
model = Model(inputs = base_model.input, outputs = predictions)
initial_learning_rate = 0.001
def lr_step_decay(epoch, lr):
    drop_rate = 0.5
    epochs_drop = 10.0
    return initial_learning_rate * math.pow(drop_rate, math.floor(epoch/epochs_drop))
sgd = tf.keras.optimizers.SGD(lr = 0.001, momentum = 0.9, decay = 1e-6, nesterov=False)
opt_rms = optimizers.RMSprop(lr=0.001,decay=1e-6)
model.compile(loss = 'binary_crossentropy', optimizer = sgd, metrics = ['accuracy'])
history = model.fit_generator(gen.flow(x_train, y_train, batch_size = 64), 64, epochs = 30, verbose=1, validation_data=(x_val, y_val),
                              callbacks=[LearningRateScheduler(lr_step_decay)])
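To make the schedule concrete, here is the same step-decay formula evaluated on its own, outside Keras. Keras passes a 0-based epoch index to LearningRateScheduler, so the line labelled Epoch 11/30 in the progress output corresponds to epoch=10. This is only a sketch of the arithmetic; the lr argument is dropped because the original function ignores it.

```python
import math

initial_learning_rate = 0.001

def lr_step_decay(epoch):
    """Halve the initial learning rate every 10 epochs (epoch is 0-based)."""
    drop_rate = 0.5
    epochs_drop = 10.0
    return initial_learning_rate * math.pow(drop_rate, math.floor(epoch / epochs_drop))

# Evaluate the schedule at the first and last epoch index of each 10-epoch step.
for epoch in (0, 9, 10, 19, 20, 29):
    print(f"epoch index {epoch:2d} -> lr = {lr_step_decay(epoch):.6f}")
```

Over 30 epochs the rate therefore takes only three values: 0.001, 0.0005 and 0.00025.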
And here's how my model is training:
Epoch 1/30
64/64 [==============================] - 46s 713ms/step - loss: 0.5535 - accuracy: 0.8364 - val_loss: 6.0887 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 2/30
64/64 [==============================] - 43s 671ms/step - loss: 0.4661 - accuracy: 0.8562 - val_loss: 0.6467 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 3/30
64/64 [==============================] - 43s 673ms/step - loss: 0.4430 - accuracy: 0.8640 - val_loss: 0.4231 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 4/30
64/64 [==============================] - 45s 699ms/step - loss: 0.4327 - accuracy: 0.8674 - val_loss: 0.3895 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 5/30
64/64 [==============================] - 43s 677ms/step - loss: 0.4482 - accuracy: 0.8559 - val_loss: 0.3607 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 6/30
64/64 [==============================] - 43s 678ms/step - loss: 0.3857 - accuracy: 0.8677 - val_loss: 0.4244 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 7/30
64/64 [==============================] - 43s 677ms/step - loss: 0.4308 - accuracy: 0.8623 - val_loss: 0.4049 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 8/30
64/64 [==============================] - 43s 677ms/step - loss: 0.3776 - accuracy: 0.8711 - val_loss: 0.3580 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 9/30
64/64 [==============================] - 43s 677ms/step - loss: 0.4005 - accuracy: 0.8672 - val_loss: 0.3689 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 10/30
64/64 [==============================] - 43s 676ms/step - loss: 0.3977 - accuracy: 0.8828 - val_loss: 0.3513 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 11/30
64/64 [==============================] - 43s 675ms/step - loss: 0.4394 - accuracy: 0.8682 - val_loss: 0.3491 - val_accuracy: 0.8760 - lr: 5.0000e-04
Epoch 12/30
64/64 [==============================] - 43s 676ms/step - loss: 0.3702 - accuracy: 0.8779 - val_loss: 0.3676 - val_accuracy: 0.8760 - lr: 5.0000e-04
Epoch 13/30
64/64 [==============================] - 43s 678ms/step - loss: 0.3904 - accuracy: 0.8706 - val_loss: 0.3621 - val_accuracy: 0.8760 - lr: 5.0000e-04
Epoch 14/30
64/64 [==============================] - 43s 677ms/step - loss: 0.3579 - accuracy: 0.8765 - val_loss: 0.3483 - val_accuracy: 0.8760 - lr: 5.0000e-04
My validation accuracy is not changing at all; it remains constant. The model is probably predicting everything as 0, because that is exactly the validation accuracy you would get from predicting all 0s given the split (248 1's out of 2k validation records in total). Can someone tell me what I'm doing wrong here?
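The arithmetic behind that suspicion, as a quick sketch: with 248 positives among 2000 validation samples (the numbers quoted above), a model that always predicts 0 scores exactly the reported val_accuracy.

```python
# Numbers quoted in the question: 248 samples of class 1 in a validation set of 2000.
n_val = 2000
n_positive = 248

# Accuracy of a trivial classifier that predicts class 0 for every sample.
all_zero_accuracy = (n_val - n_positive) / n_val
print(f"accuracy when always predicting 0: {all_zero_accuracy:.4f}")
```

The result, 0.8760, matches the val_accuracy shown in every epoch of the log.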
Sample plot of one file with 5 time dim(I'm just using 1 for training) and 1 channel from data:
Your observation is indeed correct: the network is not learning anything.
Ensure that your dataset is properly labelled and that you feed your data correctly. At the same time, ask and answer the following question: is 178x178 a sufficient resolution for the "other" class that I am trying to detect? If you have already gone through those checks, proceed to the following suggestions.
I would start by decreasing the learning rate to 0.0001 or 0.00001 (although at that point the learning could converge too slowly).
At the same time, could you remove the Dropout() altogether to see whether your network is able to learn anything at all? At this point of the investigation Dropout() is not needed; it actually hampers the learning due to the high dropout value used.

Binary vs Multiclass Classification using TPU

I am using EfficientNetB7 and EfficientNetB0 models for training on my dataset, and am facing a major anomaly.
EfficientNetB7 gave 96.4 percent accuracy with 40 epochs, lr_callback, nb_classes=4, and imagenet weights.
GCS_DS_PATH = KaggleDatasets().get_gcs_path('plant-pathology-2020-fgvc7')
path='../input/plant-pathology-2020-fgvc7/'
train = pd.read_csv(path + 'train.csv')
test = pd.read_csv(path + 'test.csv')
sub = pd.read_csv(path + 'sample_submission.csv')
train_paths = train.image_id.apply(lambda x : GCS_DS_PATH + '/images/' + x + '.jpg').values
test_paths = test.image_id.apply(lambda x : GCS_DS_PATH + '/images/' + x + '.jpg').values
train_labels = train.loc[:,'healthy':].values.astype(int)
train_labels_healthy = train.loc[:,'healthy'].values.astype(int)
train_labels_multiple_diseases = train.loc[:,'multiple_diseases'].values.astype(int)
train_labels_rust = train.loc[:,'rust'].values.astype(int)
train_labels_scab = train.loc[:,'scab'].values.astype(int)
train_dataset = (
    tf.data.Dataset
    .from_tensor_slices((train_paths, train_labels))
    .map(decode_image, num_parallel_calls=AUTO)
    .map(data_augment, num_parallel_calls=AUTO)
    .repeat()
    .shuffle(512)
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
)
train_dataset1 = (
    tf.data.Dataset
    .from_tensor_slices((train_paths, train_labels_healthy_one_hot))
    .map(decode_image, num_parallel_calls=AUTO)
    .map(data_augment, num_parallel_calls=AUTO)
    .repeat()
    .shuffle(512)
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
)
nb_classes=4
def get_model():
    base_model = efn.EfficientNetB7(weights='imagenet', include_top=False, pooling='avg', input_shape=(img_size, img_size, 3))
    x = base_model.output
    predictions = Dense(nb_classes, activation="softmax")(x)
    return Model(inputs=base_model.input, outputs=predictions)
with strategy.scope():
    model = get_model()
    model.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy'])
model.summary()
model.fit(
    train_dataset,
    steps_per_epoch=train_labels.shape[0] // BATCH_SIZE,
    epochs=10
)
Output: Train for 28 steps
Epoch 1/10
28/28 [==============================] - 253s 9s/step - loss: 0.2862 - accuracy: 0.8951
Epoch 2/10
28/28 [==============================] - 15s 535ms/step - loss: 0.1453 - accuracy: 0.9520
Epoch 3/10
28/28 [==============================] - 34s 1s/step - loss: 0.1450 - accuracy: 0.9554
Epoch 4/10
28/28 [==============================] - 35s 1s/step - loss: 0.1271 - accuracy: 0.9587
Epoch 5/10
28/28 [==============================] - 35s 1s/step - loss: 0.0935 - accuracy: 0.9621
Epoch 6/10
28/28 [==============================] - 35s 1s/step - loss: 0.0951 - accuracy: 0.9621
Epoch 7/10
28/28 [==============================] - 35s 1s/step - loss: 0.0615 - accuracy: 0.9721
Epoch 8/10
28/28 [==============================] - 35s 1s/step - loss: 0.0674 - accuracy: 0.9833
Epoch 9/10
28/28 [==============================] - 35s 1s/step - loss: 0.0654 - accuracy: 0.9743
Epoch 10/10
28/28 [==============================] - 35s 1s/step - loss: 0.0435 - accuracy: 0.9821
So, I tried improving the accuracy by using 4 EfficientNetB0 models to predict the 4 classes independently, but the accuracy got stuck at 50 per cent. I tried varying the learning rate to see if it is stuck in a local minimum, but the accuracy is the same.
nb_classes=1
def get_model():
    base_model = efn.EfficientNetB0(weights='imagenet', include_top=False, pooling='avg', input_shape=(img_size, img_size, 3))
    x = base_model.output
    predictions = Dense(nb_classes, activation="softmax")(x)
    return Model(inputs=base_model.input, outputs=predictions)
adam = Adam(learning_rate=0.05) #Tried 0.0001,0.001,0.01,0.05
with strategy.scope():
    model1 = get_model()
    #print('1')
    # model2 = get_model()
    # print('2')
    # model3 = get_model()
    # print('3')
    # model4 = get_model()
    # print('4')
    model1.compile(optimizer=adam, loss='binary_crossentropy',metrics=['accuracy'])
    #model2.compile(optimizer='adam', loss='binary_crossentropy',metrics=['accuracy'])
    #model3.compile(optimizer='adam', loss='binary_crossentropy',metrics=['accuracy'])
    #model4.compile(optimizer='adam', loss='binary_crossentropy',metrics=['accuracy'])
model1.summary()
#model2.summary()
#model3.summary()
#model4.summary()
model1.fit(
    train_dataset1,
    steps_per_epoch=train_labels_rust.shape[0] // BATCH_SIZE,
    epochs=10
)
Output: Train for 28 steps
Epoch 1/10
28/28 [==============================] - 77s 3s/step - loss: 7.6666 - accuracy: 0.5000
Epoch 2/10
28/28 [==============================] - 32s 1s/step - loss: 7.6666 - accuracy: 0.5000
Epoch 3/10
28/28 [==============================] - 33s 1s/step - loss: 7.6666 - accuracy: 0.5000
Epoch 4/10
28/28 [==============================] - 33s 1s/step - loss: 7.6666 - accuracy: 0.5000
Epoch 5/10
28/28 [==============================] - 33s 1s/step - loss: 7.6666 - accuracy: 0.5000
Epoch 6/10
28/28 [==============================] - 33s 1s/step - loss: 7.6666 - accuracy: 0.5000
Epoch 7/10
28/28 [==============================] - 33s 1s/step - loss: 7.6666 - accuracy: 0.5000
Epoch 8/10
28/28 [==============================] - 33s 1s/step - loss: 7.6666 - accuracy: 0.5000
Epoch 9/10
28/28 [==============================] - 33s 1s/step - loss: 7.6666 - accuracy: 0.5000
Epoch 10/10
28/28 [==============================] - 34s 1s/step - loss: 7.6666 - accuracy: 0.5000
I also tried other neural networks like ResNet50, but the accuracy remained stuck at 50 per cent. Can anyone please tell me where I am making a mistake?
To predict more than 2 classes, use 'categorical_crossentropy' or 'sparse_categorical_crossentropy' as the loss, with softmax as the activation.
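A related point that may explain the stuck 50 per cent, offered as a hedged observation rather than a confirmed diagnosis: the binary model in the question ends in Dense(1, activation="softmax"), and a softmax over a single unit always returns 1.0 whatever the input, so the output cannot change during training. A sigmoid on one unit does vary with its input. The sketch below uses plain Python stand-ins for the two activations (the helper names are illustrative, not Keras APIs):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic sigmoid of a single logit."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-5.0, 0.0, 5.0):
    # With a single logit, softmax normalizes the value by itself.
    print(f"logit {z:+.1f}: softmax -> {softmax([z])[0]:.3f}, sigmoid -> {sigmoid(z):.3f}")
```

If that is the cause, a single-unit output would normally use a sigmoid with binary_crossentropy, while softmax with categorical_crossentropy fits outputs with two or more units.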

Why is the output layer simply zero at the end of the network?

I am trying to train a model that takes a 15x15 image and classifies each pixel into two classes (1/0).
This is my loss function:
smooth = 1
def tversky(y_true, y_pred):
    y_true_pos = K.flatten(y_true)
    y_pred_pos = K.flatten(y_pred)
    true_pos = K.sum(y_true_pos * y_pred_pos)
    false_neg = K.sum(y_true_pos * (1-y_pred_pos))
    false_pos = K.sum((1-y_true_pos)*y_pred_pos)
    alpha = 0.5
    return (true_pos + smooth)/(true_pos + alpha*false_neg + (1-alpha)*false_pos + smooth)
def tversky_loss2(y_true, y_pred):
    return 1 - tversky(y_true,y_pred)
This is the model:
input_image = layers.Input(shape=(size, size, 1))
b2 = layers.Conv2D(128, (3,3), padding='same', activation='relu')(input_image)
b2 = layers.Conv2D(128, (3,3), padding='same', activation='relu')(b2)
b2 = layers.Conv2D(128, (3,3), padding='same', activation='relu')(b2)
output = layers.Conv2D(1, (1,1), activation='sigmoid', padding='same')(b2)
model = models.Model(input_image, output)
model.compile(optimizer='adam', loss=tversky_loss2, metrics=['accuracy'])
The left column is the input, the middle column is the label, and the right column is the prediction, which is always zero:
The training performs really poorly:
Epoch 1/10
100/100 [==============================] - 4s 38ms/step - loss: 0.9269 - acc: 0.1825
Epoch 2/10
100/100 [==============================] - 3s 29ms/step - loss: 0.9277 - acc: 0.0238
Epoch 3/10
100/100 [==============================] - 3s 29ms/step - loss: 0.9276 - acc: 0.0239
Epoch 4/10
100/100 [==============================] - 3s 29ms/step - loss: 0.9270 - acc: 0.0241
Epoch 5/10
100/100 [==============================] - 3s 30ms/step - loss: 0.9274 - acc: 0.0240
Epoch 6/10
100/100 [==============================] - 3s 29ms/step - loss: 0.9269 - acc: 0.0242
Epoch 7/10
100/100 [==============================] - 3s 29ms/step - loss: 0.9270 - acc: 0.0241
Epoch 8/10
100/100 [==============================] - 3s 29ms/step - loss: 0.9271 - acc: 0.0241
Epoch 9/10
100/100 [==============================] - 3s 29ms/step - loss: 0.9276 - acc: 0.0239
Epoch 10/10
100/100 [==============================] - 3s 29ms/step - loss: 0.9266 - acc: 0.0242
This sounds like a very imbalanced dataset with very tiny true regions. This might indeed be hard to train.
You may want to increase alpha to penalize false negatives more than false positives. Anyway, unless alpha is big enough, it's very normal that in the beginning your model first goes to all negatives, because that is definitely an easy way to decrease the loss.
Now, there is a conceptual mistake in that loss regarding how Keras works. You need to keep the "samples" separate. Otherwise you are calculating a loss as if all images were one image. (Thus, it's probable that images with many positives have a reasonable result while images with few positives don't, and this will still count as a good solution.)
Fix the loss as:
def tversky(y_true, y_pred):
    y_true_pos = K.batch_flatten(y_true) #keep the batch dimension
    y_pred_pos = K.batch_flatten(y_pred)
    true_pos = K.sum(y_true_pos * y_pred_pos, axis=-1) #don't sum over the batch dimension
    false_neg = K.sum(y_true_pos * (1-y_pred_pos), axis=-1)
    false_pos = K.sum((1-y_true_pos)*y_pred_pos, axis=-1)
    alpha = 0.5
    return (true_pos + smooth)/(true_pos + alpha*false_neg + (1-alpha)*false_pos + smooth)
This way you have an individual loss value for each image, so the existence of images with many positives doesn't affect the results of images with few positives.
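To see the difference between the two versions, here is a NumPy sketch (not Keras backend code) that evaluates the Tversky index once over the whole flattened batch and once per image. The toy batch is hypothetical: one image with 100 positive pixels predicted perfectly, and one image with 2 positive pixels predicted as all zeros.

```python
import numpy as np

SMOOTH = 1.0
ALPHA = 0.5

def tversky_whole_batch(y_true, y_pred):
    """Tversky index with every pixel of every image pooled together."""
    t = y_true.reshape(-1)
    p = y_pred.reshape(-1)
    tp = np.sum(t * p)
    fn = np.sum(t * (1 - p))
    fp = np.sum((1 - t) * p)
    return (tp + SMOOTH) / (tp + ALPHA * fn + (1 - ALPHA) * fp + SMOOTH)

def tversky_per_image(y_true, y_pred):
    """Tversky index computed separately for each image in the batch."""
    t = y_true.reshape(y_true.shape[0], -1)  # keep the batch dimension
    p = y_pred.reshape(y_pred.shape[0], -1)
    tp = np.sum(t * p, axis=-1)              # sum over pixels only
    fn = np.sum(t * (1 - p), axis=-1)
    fp = np.sum((1 - t) * p, axis=-1)
    return (tp + SMOOTH) / (tp + ALPHA * fn + (1 - ALPHA) * fp + SMOOTH)

# Toy batch of two 15x15 single-channel images.
y_true = np.zeros((2, 15, 15, 1))
y_true[0, :10, :10, 0] = 1.0   # image 0: 100 positive pixels
y_true[1, 0, :2, 0] = 1.0      # image 1: 2 positive pixels
y_pred = np.zeros_like(y_true)
y_pred[0] = y_true[0]          # image 0 predicted perfectly, image 1 all zeros

whole = tversky_whole_batch(y_true, y_pred)
per_image = tversky_per_image(y_true, y_pred)
print("whole batch:", whole)
print("per image:  ", per_image)
```

In the pooled version the well-predicted image dominates, so the index is about 0.99 even though the second image is missed entirely. Per image the values are 1.0 and 0.5, so the missed image contributes a clear loss.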

Weird accuracy metric in verbose when using Tensorflow keras loss=tf.losses.sparse_softmax_cross_entropy

While testing training on the MNIST dataset using TensorFlow's Keras API, I see a weird accuracy when specifying loss=tf.losses.sparse_softmax_cross_entropy in the compile statement. I am simply trying 3 different ways of specifying the loss function, viz.
loss='sparse_categorical_crossentropy'
loss=tf.losses.sparse_softmax_cross_entropy
loss=tf.losses.softmax_cross_entropy
Google Colab link to explain the point better
Here is the sample code
import tensorflow as tf
import numpy as np
from tensorflow import keras
from keras.datasets import mnist
from sklearn.metrics import f1_score
(x_train,y_train),(x_test,y_test) = mnist.load_data()
model = tf.keras.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(units=128,activation=tf.nn.relu),
    keras.layers.Dense(10, activation = tf.nn.softmax)
])
# loss='sparse_categorical_crossentropy'
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
hist = model.fit(x_train/255,y_train,epochs=10, verbose=0 )
y_pred1 = model.predict(x_test/255)
y_pred1 = np.argmax(y_pred1, axis=1)
results1 = (np.array([y_pred1 == y_test]).astype(int).reshape(-1,1))
acc1 = np.asscalar(sum(results1)/results1.shape[0])
print("case 1: loss='sparse_categorical_crossentropy' : "+ str(hist.history['acc'][9]))
print("calculated test acc : "+ str(acc1))
print("_________________________________________________")
# loss=tf.losses.sparse_softmax_cross_entropy
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss=tf.losses.sparse_softmax_cross_entropy,
              metrics=['accuracy'])
hist = model.fit(x_train/255,y_train,epochs=10,verbose=1)
y_pred2 = model.predict(x_test/255)
y_pred2 = np.argmax(y_pred2, axis=1)
results2 = (np.array([y_pred2 == y_test]).astype(int).reshape(-1,1))
acc2 = np.asscalar(sum(results2)/results2.shape[0])
print("case 2: loss=tf.losses.sparse_softmax_cross_entropy : "+ str(hist.history['acc'][9]))
print("calculated test acc : "+ str(acc2))
print("_________________________________________________")
# loss=tf.losses.softmax_cross_entropy
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss=tf.losses.softmax_cross_entropy,
              metrics=['accuracy'])
from keras.utils import to_categorical
y_train_onehot = to_categorical(y_train)
hist = model.fit(x_train/255,y_train_onehot,epochs=10,verbose=0)
y_pred3 = model.predict(x_test/255)
y_pred3 = np.argmax(y_pred3, axis=1)
results3 = (np.array([y_pred3 == y_test]).astype(int).reshape(-1,1))
acc3 = np.asscalar(sum(results3)/results3.shape[0])
print("case 3: loss=tf.losses.softmax_cross_entropy : "+ str(hist.history['acc'][9]))
print("calculated test acc : "+ str(acc3))
The output is as shown below
case 1: loss='sparse_categorical_crossentropy' : 0.99495
calculated test acc : 0.978
_________________________________________________
Epoch 1/10
60000/60000 [==============================] - 5s 79us/sample - loss: 1.4690 - acc: 0.0988
Epoch 2/10
60000/60000 [==============================] - 5s 79us/sample - loss: 1.4675 - acc: 0.0987
Epoch 3/10
60000/60000 [==============================] - 4s 75us/sample - loss: 1.4661 - acc: 0.0988
Epoch 4/10
60000/60000 [==============================] - 4s 74us/sample - loss: 1.4656 - acc: 0.0987
Epoch 5/10
60000/60000 [==============================] - 5s 77us/sample - loss: 1.4652 - acc: 0.0987
Epoch 6/10
60000/60000 [==============================] - 5s 78us/sample - loss: 1.4648 - acc: 0.0988
Epoch 7/10
60000/60000 [==============================] - 4s 75us/sample - loss: 1.4644 - acc: 0.0987
Epoch 8/10
60000/60000 [==============================] - 5s 76us/sample - loss: 1.4641 - acc: 0.0988
Epoch 9/10
60000/60000 [==============================] - 5s 79us/sample - loss: 1.4639 - acc: 0.0987
Epoch 10/10
60000/60000 [==============================] - 5s 76us/sample - loss: 1.4639 - acc: 0.0988
case 2: loss=tf.losses.sparse_softmax_cross_entropy : 0.09876667
calculated test acc : 0.9791
_________________________________________________
case 3: loss=tf.losses.softmax_cross_entropy : 0.99883336
calculated test acc : 0.9784
The accuracy shown in the verbose output for the second case, loss=tf.losses.sparse_softmax_cross_entropy, is 0.0987, which doesn't make sense, since evaluating the model on both training and test data shows an accuracy over 0.97.
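One plausible explanation, not confirmed in this thread: when metrics=['accuracy'] is given as a string, Keras chooses the concrete accuracy function from the loss. With a loss it does not recognize as sparse, it may fall back to categorical accuracy, which takes an argmax over the label axis. For integer labels of shape (N, 1) that argmax is always 0, so the reported value becomes the fraction of samples predicted as class 0 (roughly a tenth for MNIST digits), while the model itself trains normally. A plain-Python sketch of the effect with made-up labels:

```python
def argmax(values):
    """Index of the largest element in a list."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical integer labels and a model that predicts every one correctly.
labels = [3, 0, 7, 0, 1, 9, 0, 4, 2, 5]
predicted_classes = list(labels)

# Sparse accuracy: compare the predicted class with the integer label itself.
sparse_acc = sum(p == y for p, y in zip(predicted_classes, labels)) / len(labels)

# Categorical accuracy misapplied to labels of shape (N, 1):
# argmax over a one-element row is always 0, whatever the label value.
label_argmax = [argmax([y]) for y in labels]
categorical_acc = sum(p == a for p, a in zip(predicted_classes, label_argmax)) / len(labels)

print("sparse accuracy:     ", sparse_acc)
print("categorical accuracy:", categorical_acc)
```

If this is what happens here, passing metrics=['sparse_categorical_accuracy'] explicitly should make the progress output agree with the manually computed accuracy.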