The interaction between two networks in a TensorFlow custom loss function

Assume you have two TensorFlow models, model_A and model_B, and a training loop that looks something like this:
with tf.GradientTape() as tape:
    output_A = model_A(input)
    output_B = model_B(input)
    loss = loss_fn(output_A, output_B, true_output_A, true_output_B)
grads = tape.gradient(loss, model_A.trainable_variables)
optimizer.apply_gradients(zip(grads, model_A.trainable_variables))
and you define the loss function as:
def loss_fn(output_A, output_B, true_output_A, true_output_B):
    loss = (output_A + output_B) - (true_output_A + true_output_B)
    return loss
The loss function being used to update model_A contains the output of another network (output_B). How does TensorFlow handle this situation?
Does it use the weights of model_B when computing the gradient, or does it treat output_B as a constant and not try to trace its origins?

TensorFlow won't use model_B's weights here: tape.gradient is only asked for gradients with respect to model_A.trainable_variables, so only model_A's weights will be updated.
For example:
import tensorflow as tf
# Model 1
cnnin = tf.keras.layers.Input(shape=(10, 10, 1))
x = tf.keras.layers.Conv2D(8, 4)(cnnin)
x = tf.keras.layers.Conv2D(16, 4)(x)
x = tf.keras.layers.Conv2D(32, 2)(x)
x = tf.keras.layers.Conv2D(64, 2)(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(4)(x)
x = tf.keras.layers.Dense(4, activation="relu")(x)
cnnout = tf.keras.layers.Dense(1, activation="linear")(x)
# Model 2
mlpin = tf.keras.layers.Input(shape=(10, 10, 1), name="mlp_input")
z = tf.keras.layers.Dense(4, activation="sigmoid")(mlpin)
z = tf.keras.layers.Dense(4, activation="softmax")(z)
z = tf.keras.layers.Flatten()(z)
z = tf.keras.layers.Dense(4)(z)
mlpout = tf.keras.layers.Dense(1, activation="linear")(z)
Loss function
def loss_fn(output_A, output_B, true_output_A, true_output_B):
    output_A = tf.reshape(output_A, [-1])
    output_B = tf.reshape(output_B, [-1])
    pred = tf.reduce_sum(output_A + output_B)
    inputs = tf.reduce_sum(true_output_A + true_output_B)
    loss = inputs - pred
    return loss
Customize what happens in Model.fit
loss_tracker = tf.keras.metrics.Mean(name="custom_loss")
class TestModel(tf.keras.Model):
    def __init__(self, model1, model2):
        super(TestModel, self).__init__()
        self.model1 = model1
        self.model2 = model2

    def compile(self, optimizer):
        super(TestModel, self).compile()
        self.optimizer = optimizer

    def train_step(self, data):
        x, (y1, y2) = data
        with tf.GradientTape() as tape:
            ypred1 = self.model1([x], training=True)
            ypred2 = self.model2([x], training=True)
            loss_value = loss_fn(ypred1, ypred2, y1, y2)
        # Compute gradients
        trainable_vars = self.model1.trainable_variables
        gradients = tape.gradient(loss_value, trainable_vars)
        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        loss_tracker.update_state(loss_value)
        return {"loss": loss_tracker.result()}
Define model1 and model2 and save them, so you can check the weights after training:
model1 = tf.keras.models.Model(cnnin, cnnout, name="model1")
model2 = tf.keras.models.Model(mlpin, mlpout, name="model2")
model1.save('test_model1.h5')
model2.save('test_model2.h5')
import numpy as np
x = np.random.rand(6, 10, 10, 1)
y1 = np.random.rand(6, 1)
y2 = np.random.rand(6, 1)
trainable_model = TestModel(model1, model2)
trainable_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001))
trainable_model.fit(x=x, y=(y1, y2), epochs=10)
Gives the following output:
Epoch 1/10
1/1 [==============================] - 0s 375ms/step - loss: 7.9465
Epoch 2/10
1/1 [==============================] - 0s 6ms/step - loss: 7.8509
Epoch 3/10
1/1 [==============================] - 0s 6ms/step - loss: 7.7547
Epoch 4/10
1/1 [==============================] - 0s 6ms/step - loss: 7.6577
Epoch 5/10
1/1 [==============================] - 0s 5ms/step - loss: 7.5600
Epoch 6/10
1/1 [==============================] - 0s 4ms/step - loss: 7.4608
Epoch 7/10
1/1 [==============================] - 0s 4ms/step - loss: 7.3574
Epoch 8/10
1/1 [==============================] - 0s 6ms/step - loss: 7.2514
Epoch 9/10
1/1 [==============================] - 0s 5ms/step - loss: 7.1429
Epoch 10/10
1/1 [==============================] - 0s 5ms/step - loss: 7.0323
Then load saved models and check the trainable_weights:
test_model1 = tf.keras.models.load_model('test_model1.h5')
test_model2 = tf.keras.models.load_model('test_model2.h5')
Compare model1 trainable_weights before and after training (they should all change):
model1_weights = [i for i in model1.trainable_weights]
for i in range(len(model1_weights)):
    print(np.array_equal(model1.trainable_weights[i], test_model1.trainable_weights[i]))
Outputs:
False
False
False
False
False
False
False
False
False
False
False
False
False
False
Compare model2 trainable_weights before and after training (they should all be the same):
model2_weights = [i for i in model2.trainable_weights]
for i in range(len(model2_weights)):
    print(np.array_equal(model2.trainable_weights[i], test_model2.trainable_weights[i]))
Outputs:
True
True
True
True
True
True
True
True
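Note that the tape does trace ypred2 back through model2's weights; model2 stays fixed above only because tape.gradient is never asked for gradients with respect to its variables. If you want the second model's output treated as an explicit constant inside the loss, you can cut the dependency with tf.stop_gradient. A minimal sketch reusing the names from the example above:
with tf.GradientTape() as tape:
    ypred1 = model1([x], training=True)
    # Block the gradient path: ypred2 is now a constant as far as the tape is concerned
    ypred2 = tf.stop_gradient(model2([x], training=True))
    loss_value = loss_fn(ypred1, ypred2, y1, y2)
gradients = tape.gradient(loss_value, model1.trainable_variables)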

Related

Is the loss function wrong in the following code for binary classification of images using soft labels? Or is there some other problem?

We are using a CNN in TensorFlow to classify images with labels 0 and 1.
However, in reality, images have probability values between 0 and 1, not one-hot labels of 0 and 1. Images with probabilities in the range [0, 0.5) are labeled 0, and images in the range [0.5, 1.0] are labeled 1. I want to check whether classification performance is better if binary classification is performed using soft labels between 0 and 1 instead of one-hot labels.
The code below is an example of binary classification using only the data labeled 0 and 1 in the cifar10 dataset.
In the code below, the accuracy is about 98% without the 'making soft labels' part, but about 48% with it.
Should I modify the 'BinaryCrossEntropy_custom' function, which is the loss function, to solve the problem? Or is something else wrong?
This answer says that using logits solves it. I understand the soft_labels argument, but what value should I put in the logits argument in this example code?
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.keras import optimizers
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications import vgg16
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
# Rewrite the binary cross entropy function. We will modify this function to return a loss that fits the soft label later.
def BinaryCrossEntropy_custom(y_true, y_pred):
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    term_0 = (1 - y_true) * K.log(1 - y_pred + K.epsilon())
    term_1 = y_true * K.log(y_pred + K.epsilon())
    return -K.mean(term_0 + term_1, axis=0)
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
# Only data with labels 0 and 1 are used.
train_ind01 = np.where((train_labels == 0) | (train_labels == 1))[0]
test_ind01 = np.where((test_labels == 0) | (test_labels == 1))[0]
train_images = train_images[train_ind01, :, :, :]
test_images = test_images[test_ind01, :, :, :]
train_labels = train_labels[train_ind01, :]
test_labels = test_labels[test_ind01, :]
train_labels = np.array(train_labels).astype('float64')
test_labels = np.array(test_labels).astype('float64')
# making soft labels part start
# Samples with label 0 are replaced with labels in the range [0,0.2],
# and samples with label 1 are replaced by labels in the range [0.8, 1.0].
sampl_train = np.random.uniform(low=-0.2, high=0.2, size=train_labels.shape)
sampl_test = np.random.uniform(low=-0.2, high=0.2, size=test_labels.shape)
train_labels = train_labels + sampl_train
test_labels = test_labels + sampl_test
train_labels = np.clip(train_labels, 0.0, 1.0)
test_labels = np.clip(test_labels, 0.0, 1.0)
# making soft labels part end
vgg = vgg16.VGG16(include_top=False, weights='imagenet', input_shape=(32, 32, 3))
output = vgg.layers[-1].output
output = layers.Flatten()(output)
output = layers.Dense(512, activation='relu')(output)
output = layers.Dropout(0.2)(output)
output = layers.Dense(256, activation='relu')(output)
output = layers.Dropout(0.2)(output)
predictions = layers.Dense(units=1, activation="sigmoid")(output)
model = Model(inputs=vgg.input, outputs=predictions)
model.compile(optimizer=Adam(learning_rate=.0001), loss=BinaryCrossEntropy_custom, metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=100,
                    validation_data=(test_images, test_labels))
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(test_acc)
The console output with the 'making soft labels' part is:
Epoch 1/100
2022-09-16 15:29:29.136931: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8101
313/313 [==============================] - 17s 42ms/step - loss: 0.2951 - accuracy: 0.4779 - val_loss: 0.2775 - val_accuracy: 0.4650
Epoch 2/100
313/313 [==============================] - 12s 38ms/step - loss: 0.2419 - accuracy: 0.4931 - val_loss: 0.2488 - val_accuracy: 0.4695
Epoch 3/100
313/313 [==============================] - 12s 39ms/step - loss: 0.2290 - accuracy: 0.4978 - val_loss: 0.2424 - val_accuracy: 0.4740
Epoch 4/100
313/313 [==============================] - 12s 39ms/step - loss: 0.2161 - accuracy: 0.5002 - val_loss: 0.2404 - val_accuracy: 0.4765
Epoch 5/100
313/313 [==============================] - 12s 39ms/step - loss: 0.2139 - accuracy: 0.5007 - val_loss: 0.2620 - val_accuracy: 0.4730
Epoch 6/100
313/313 [==============================] - 12s 38ms/step - loss: 0.2118 - accuracy: 0.5023 - val_loss: 0.2480 - val_accuracy: 0.4745
Epoch 7/100
313/313 [==============================] - 12s 38ms/step - loss: 0.2097 - accuracy: 0.5019 - val_loss: 0.2350 - val_accuracy: 0.4775
Epoch 8/100
313/313 [==============================] - 12s 39ms/step - loss: 0.2098 - accuracy: 0.5024 - val_loss: 0.2289 - val_accuracy: 0.4780
Epoch 9/100
313/313 [==============================] - 12s 38ms/step - loss: 0.2034 - accuracy: 0.5039 - val_loss: 0.2364 - val_accuracy: 0.4780
Epoch 10/100
313/313 [==============================] - 12s 39ms/step - loss: 0.2025 - accuracy: 0.5040 - val_loss: 0.2481 - val_accuracy: 0.4720
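For reference, the from-logits route the linked answer points to computes the cross entropy from raw pre-sigmoid scores. A minimal sketch, assuming the final Dense layer is changed to activation=None so that y_pred is a logit (BinaryCrossEntropy_from_logits is a hypothetical name, not from the post):
import tensorflow as tf
def BinaryCrossEntropy_from_logits(y_true, y_pred):
    # y_pred holds raw (pre-sigmoid) logits; soft labels in [0, 1] are valid targets
    y_true = tf.cast(y_true, y_pred.dtype)
    return tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=y_pred))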

Error in module.fit_generator with Keras. Wrong number of channels given

I'm trying to train a kidney segmentation network on medical images.
Whenever I try to compile my network model, it gives me this warning: "NumpyArrayIterator is set to use the data format convention "channels_last" (channels on axis 3), i.e. expected either 1, 3, or 4 channels on axis 3. However, it was passed an array with shape (1947, 352, 352, 2) (2 channels)."
The shape of my X_train is (n_slice_train, IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS).
The shape of my Y_train (mask) is (n_slice_train, IMG_HEIGHT, IMG_WIDTH, NUM_CLASSES).
IMG_CHANNELS = 3
NUM_CLASSES = 2
dataset_name = "data"
# I want to train a network that takes a 512x512x3 (rgb) image as input
# and the respective segmentation mask (0: background, 255: lesion)
IMG_WIDTH = 352
IMG_HEIGHT = 352
IMG_CHANNELS = 3
NUM_CLASSES = 2
hu_min = -100
hu_max = 400
n_slice_train = sum(volumes_train_info["Num_slice_selected"])
n_slice_val = sum(volumes_val_info["Num_slice_selected"])
train_vol = os.listdir(path_train_vol)
val_vol = os.listdir(path_val_vol)
train_mask = os.listdir(path_train_mask)
val_mask = os.listdir(path_val_mask)
X_train = np.zeros((n_slice_train, IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS), dtype=np.uint8)
Y_train = np.zeros((n_slice_train, IMG_HEIGHT, IMG_WIDTH, NUM_CLASSES), dtype=np.float32)
desired_spacing = 1 # mm/px
volumes_train_info["New_PixelSpacing"] = {}
i = 0
for n, id_ in tqdm(enumerate(train_vol), total=len(train_vol)):
    case_name = str(id_)
    Vol = np.load(path_train_vol + '/' + case_name)
    old_spacing = volumes_train_info["PixelSpacing_x"][n]
    Vol, new_spacing = preprocess_vol(Vol, desired_spacing, old_spacing, IMG_HEIGHT, IMG_WIDTH, hu_min, hu_max)
    volumes_train_info["New_PixelSpacing"][n] = new_spacing
    for s in range(len(Vol)):
        X_train[i] = Vol[s, :, :, :]
        i += 1
j = 0
for n, id_ in tqdm(enumerate(train_mask), total=len(train_mask)):
    case_name = str(id_)
    Mask = np.load(path_train_mask + '/' + case_name)
    old_spacing = volumes_train_info["PixelSpacing_x"][n]
    Mask = preprocess_seg(Mask, desired_spacing, old_spacing, IMG_HEIGHT, IMG_WIDTH)
    for s in range(len(Mask)):
        mask = Mask[s, :, :]
        mask = to_categorical(mask, num_classes=NUM_CLASSES, dtype='float32')
        Y_train[j] = mask
        j += 1
X_val = np.zeros((n_slice_val, IMG_WIDTH, IMG_HEIGHT, IMG_CHANNELS), dtype=np.uint8)
Y_val = np.zeros((n_slice_val, IMG_WIDTH, IMG_HEIGHT, NUM_CLASSES), dtype=np.float32)
i = 0
for n, id_ in tqdm(enumerate(val_vol), total=len(val_vol)):
    case_name = str(id_)
    Vol = np.load(path_val_vol + '/' + case_name)
    old_spacing = volumes_val_info["PixelSpacing_x"][n]
    Vol, new_spacing = preprocess_vol(Vol, desired_spacing, old_spacing, IMG_HEIGHT, IMG_WIDTH, hu_min, hu_max)
    volumes_val_info["New_PixelSpacing"][n] = new_spacing
    for s in range(len(Vol)):
        X_val[i] = Vol[s, :, :, :]
        i = i + 1
j = 0
for n, id_ in tqdm(enumerate(val_mask), total=len(val_mask)):
    case_name = str(id_)
    Mask = np.load(path_val_mask + '/' + case_name)
    old_spacing = volumes_val_info["PixelSpacing_x"][n]
    Mask = preprocess_seg(Mask, desired_spacing, old_spacing, IMG_HEIGHT, IMG_WIDTH)
    for s in range(len(Mask)):
        mask = Mask[s, :, :]
        mask = to_categorical(mask, num_classes=NUM_CLASSES, dtype='float32')
        Y_val[j] = mask
        j = j + 1
# Data augmentation (training set)
image_datagen = ImageDataGenerator(rotation_range = 3,
width_shift_range = 0.2,
height_shift_range = 0.2,
horizontal_flip = True,
vertical_flip = False,
fill_mode = 'nearest')
# Data augmentation (validation set)
val_datagen = ImageDataGenerator()
# Generator
seed = 1
def XYaugmentGenerator(X1, y, seed, batch_size):
genX1 = image_datagen.flow(X1, y, batch_size=batch_size, seed=seed)
genX2 = image_datagen.flow(y, X1, batch_size=batch_size, seed=seed)
while True:
X1i = genX1.next()
X2i = genX2.next()
yield X1i[0], X2i[0]
BACKBONE = 'resnet34'
model = Unet(backbone_name=BACKBONE,
             input_shape=(IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS),
             encoder_weights='imagenet',
             encoder_freeze=False,
             decoder_block_type='transpose',
             classes=NUM_CLASSES,
             activation='sigmoid')
model.compile(optimizer=Adam(lr=0.00001), loss='binary_crossentropy', metrics=['binary_accuracy'])  # we choose a learning rate of 10e-5
n_slice_train = X_train.shape[0]
n_slice_val = X_val.shape[0]
batch_size = 1
n_epochs = 20
# Checkpoint definition
csv_logger = CSVLogger('./log.out', append=True, separator=';')
earlystopping = EarlyStopping(monitor='val_loss',
                              min_delta=0,
                              patience=2,
                              verbose=0,
                              mode='auto')
filepath = "model_2D_v5-{epoch:02d}-{val_loss:.2f}.hdf5"
modelcheckpoint = ModelCheckpoint(filepath,
                                  monitor="val_loss",
                                  verbose=1,
                                  save_best_only=True)
callbacks_list = [csv_logger, earlystopping, modelcheckpoint]
# Train model
results = model.fit_generator(XYaugmentGenerator(X_train, Y_train, seed, batch_size),
                              steps_per_epoch=np.ceil(float(n_slice_train) / float(batch_size)),
                              validation_data=val_datagen.flow(X_val, Y_val, batch_size),
                              validation_steps=np.ceil(float(n_slice_val) / float(batch_size)),
                              shuffle=True,
                              epochs=n_epochs,
                              callbacks=callbacks_list)
Output:
Epoch 1/20
/usr/local/lib/python3.7/dist-packages/keras_preprocessing/image/numpy_array_iterator.py:136: UserWarning:
NumpyArrayIterator is set to use the data format convention "channels_last" (channels on axis 3), i.e. expected either 1, 3, or 4 channels on axis 3. However, it was passed an array with shape (1947, 352, 352, 2) (2 channels).
1947/1947 [==============================] - 204s 105ms/step - loss: 0.5873 - binary_accuracy: 0.8081 - val_loss: 0.3755 - val_binary_accuracy: 0.9714
Epoch 00001: val_loss improved from inf to 0.37554, saving model to model_2D_v4-01-0.38.hdf5
Epoch 2/20
1947/1947 [==============================] - 184s 94ms/step - loss: 0.2442 - binary_accuracy: 0.9804 - val_loss: 0.1517 - val_binary_accuracy: 0.9843
Epoch 00002: val_loss improved from 0.37554 to 0.15170, saving model to model_2D_v4-02-0.15.hdf5
Epoch 3/20
1947/1947 [==============================] - 182s 94ms/step - loss: 0.1039 - binary_accuracy: 0.9907 - val_loss: 0.0701 - val_binary_accuracy: 0.9929
Epoch 00003: val_loss improved from 0.15170 to 0.07014, saving model to model_2D_v4-03-0.07.hdf5
Epoch 4/20
1947/1947 [==============================] - 183s 94ms/step - loss: 0.0492 - binary_accuracy: 0.9950 - val_loss: 0.0311 - val_binary_accuracy: 0.9950
Epoch 00004: val_loss improved from 0.07014 to 0.03110, saving model to model_2D_v4-04-0.03.hdf5
Epoch 5/20
1947/1947 [==============================] - 182s 94ms/step - loss: 0.0260 - binary_accuracy: 0.9957 - val_loss: 0.0202 - val_binary_accuracy: 0.9960
Epoch 00005: val_loss improved from 0.03110 to 0.02020, saving model to model_2D_v4-05-0.02.hdf5
Epoch 6/20
1947/1947 [==============================] - 183s 94ms/step - loss: 0.0153 - binary_accuracy: 0.9959 - val_loss: 0.0412 - val_binary_accuracy: 0.9943
Epoch 00006: val_loss did not improve from 0.02020
Epoch 7/20
1947/1947 [==============================] - 183s 94ms/step - loss: 0.0104 - binary_accuracy: 0.9961 - val_loss: 0.0200 - val_binary_accuracy: 0.9955
Epoch 00007: val_loss improved from 0.02020 to 0.02004, saving model to model_2D_v4-07-0.02.hdf5
Epoch 8/20
1947/1947 [==============================] - 186s 96ms/step - loss: 0.0075 - binary_accuracy: 0.9963 - val_loss: 0.0048 - val_binary_accuracy: 0.9958
Epoch 00008: val_loss improved from 0.02004 to 0.00476, saving model to model_2D_v4-08-0.00.hdf5
Epoch 9/20
1947/1947 [==============================] - 193s 99ms/step - loss: 0.0060 - binary_accuracy: 0.9964 - val_loss: 6.1250e-04 - val_binary_accuracy: 0.9962
Epoch 00009: val_loss improved from 0.00476 to 0.00061, saving model to model_2D_v4-09-0.00.hdf5
Epoch 10/20
1947/1947 [==============================] - 192s 99ms/step - loss: 0.0051 - binary_accuracy: 0.9965 - val_loss: 0.0119 - val_binary_accuracy: 0.9961
Epoch 00010: val_loss did not improve from 0.00061
Epoch 11/20
1947/1947 [==============================] - 192s 99ms/step - loss: 0.0047 - binary_accuracy: 0.9966 - val_loss: 0.0083 - val_binary_accuracy: 0.9968
Epoch 00011: val_loss did not improve from 0.00061
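The shape quoted in the warning, (1947, 352, 352, 2), is exactly Y_train's shape: XYaugmentGenerator also flows the 2-channel mask array through image_datagen.flow as image data (genX2 = image_datagen.flow(y, X1, ...)), which is what emits the message. A quick check against the arrays built above:
print(X_train.shape)  # (1947, 352, 352, 3) -- 1, 3 or 4 channels, accepted silently
print(Y_train.shape)  # (1947, 352, 352, 2) -- 2 channels, the shape the warning quotes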

Classification with PyTorch is much slower than Tensorflow: 42min vs. 11min

I have been a TensorFlow user and have started to use PyTorch. As a trial, I implemented simple classification tasks with both libraries.
However, PyTorch is much slower than TensorFlow: PyTorch takes 42 min while TensorFlow takes 11 min. I referred to the official PyTorch tutorial and made only small changes to it.
Could anyone share some advice on this problem?
Here is a summary of what I tried.
environment: Colab Pro+
dataset: Cifar10
classifier: VGG16
optimizer: Adam
loss: crossentropy
batch size: 32
PyTorch
Code:
import torch, torchvision
from torch import nn
from torchvision import transforms, models
from tqdm import tqdm
import time, copy
trans = transforms.Compose([transforms.Resize((224, 224)),
                            transforms.ToTensor()])
data = {phase: torchvision.datasets.CIFAR10('./', train=(phase == 'train'), transform=trans, download=True) for phase in ['train', 'test']}
dataloaders = {phase: torch.utils.data.DataLoader(data[phase], batch_size=32, shuffle=True) for phase in ['train', 'test']}
def train_model(model, criterion, optimizer, dataloaders, device, num_epochs=5):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        # Each epoch has a training and validation phase
        for phase in ['train', 'test']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()  # Set model to evaluate mode
            running_loss = 0.0
            running_corrects = 0
            # Iterate over data.
            for inputs, labels in tqdm(iter(dataloaders[phase])):
                inputs = inputs.to(device)
                labels = labels.to(device)
                # zero the parameter gradients
                optimizer.zero_grad()
                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            epoch_loss = running_loss / len(dataloaders[phase])
            epoch_acc = running_corrects.double() / len(dataloaders[phase])
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            # deep copy the model
            if phase == 'test' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
        print()
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = models.vgg16(pretrained=False)
model = model.to(device)
model = train_model(model=model,
                    criterion=nn.CrossEntropyLoss(),
                    optimizer=torch.optim.Adam(model.parameters(), lr=0.001),
                    dataloaders=dataloaders,
                    device=device)
Result:
Epoch 0/4
----------
0%| | 0/1563 [00:00<?, ?it/s]/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
100%|██████████| 1563/1563 [07:50<00:00, 3.32it/s]
train Loss: 75.5199 Acc: 3.2809
100%|██████████| 313/313 [00:38<00:00, 8.11it/s]
test Loss: 73.7274 Acc: 3.1949
Epoch 1/4
----------
100%|██████████| 1563/1563 [07:50<00:00, 3.33it/s]
train Loss: 73.8162 Acc: 3.2514
100%|██████████| 313/313 [00:38<00:00, 8.13it/s]
test Loss: 73.6114 Acc: 3.1949
Epoch 2/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7741 Acc: 3.1369
100%|██████████| 313/313 [00:38<00:00, 8.11it/s]
test Loss: 73.5873 Acc: 3.1949
Epoch 3/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7493 Acc: 3.1331
100%|██████████| 313/313 [00:38<00:00, 8.12it/s]
test Loss: 73.6191 Acc: 3.1949
Epoch 4/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7289 Acc: 3.1939
100%|██████████| 313/313 [00:38<00:00, 8.13it/s]
test Loss: 73.5955 Acc: 3.1949
Training complete in 42m 22s
Best val Acc: 3.194888
TensorFlow
Code:
import tensorflow_datasets as tfds
from tensorflow.keras import applications, models
import tensorflow as tf
import time
ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])
def resize(ip):
    image = ip['image']
    label = ip['label']
    image = tf.image.resize(image, (224, 224))
    image = tf.expand_dims(image, 0)
    label = tf.one_hot(label, 10)
    label = tf.expand_dims(label, 0)
    return (image, label)
ds_train_ = ds_train.map(resize)
ds_test_ = ds_test.map(resize)
model = applications.vgg16.VGG16(input_shape=(224, 224, 3), weights=None, classes=10)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
batch_size = 32
since = time.time()
history = model.fit(ds_train_,
                    batch_size=batch_size,
                    steps_per_epoch=len(ds_train) // batch_size,
                    epochs=5,
                    validation_steps=len(ds_test),
                    validation_data=ds_test_,
                    shuffle=True)
time_elapsed = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
Result:
Epoch 1/5
1562/1562 [==============================] - 125s 69ms/step - loss: 36.9022 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 2/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3031 - accuracy: 0.1005 - val_loss: 2.3033 - val_accuracy: 0.1000
Epoch 3/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3035 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 4/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3038 - accuracy: 0.1024 - val_loss: 2.3030 - val_accuracy: 0.1000
Epoch 5/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3028 - accuracy: 0.1024 - val_loss: 2.3033 - val_accuracy: 0.1000
Training complete in 11m 23s
This is because in your TensorFlow code, the data pipeline feeds a batch of 1 image into the model per step instead of a batch of 32 images.
Passing batch_size into model.fit does not actually control the batch size when the data comes in the form of a dataset. The reason the log showed a seemingly correct number of steps per epoch is that you passed steps_per_epoch into model.fit.
To correctly set the batch size:
ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])
def resize(ip):
    image = ip['image']
    label = ip['label']
    image = tf.image.resize(image, (224, 224))
    label = tf.one_hot(label, 10)
    return (image, label)
train_size = len(ds_train)
test_size = len(ds_test)
ds_train_ = ds_train.shuffle(train_size).batch(32).map(resize)
ds_test_ = ds_test.shuffle(test_size).batch(32).map(resize)
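You can verify the batch dimension directly (a quick check, not part of the original answer):
images, labels = next(iter(ds_train_))
print(images.shape)  # expected: (32, 224, 224, 3), i.e. 32 images per step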
model.fit call:
history = model.fit(ds_train_,
                    epochs=1,
                    validation_data=ds_test_)
After fixing the problem, TensorFlow achieved speed similar to PyTorch's. On my machine, PyTorch took ~27 minutes per epoch while TensorFlow took ~24 minutes per epoch.
According to benchmarks from NVIDIA, PyTorch and TensorFlow have similar speed in most popular deep learning applications with real-world datasets and problem sizes. (Reference: https://developer.nvidia.com/deep-learning-performance-training-inference)

Tensorflow 2 Metrics produce wrong results with 2 GPUs

I took this piece of code from the TensorFlow documentation about distributed training with a custom loop, https://www.tensorflow.org/tutorials/distribute/custom_training, and I just fixed it to work with tf.keras.metrics.AUC and ran it with 2 GPUs (2 Nvidia V100s from a DGX machine).
# Import TensorFlow
import tensorflow as tf
# Helper libraries
import numpy as np
print(tf.__version__)
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# Adding a dimension to the array -> new shape == (28, 28, 1)
# We are doing this because the first layer in our model is a convolutional
# layer and it requires a 4D input (batch_size, height, width, channels).
# batch_size dimension will be added later on.
train_images = train_images[..., None]
test_images = test_images[..., None]
# One hot
train_labels = tf.keras.utils.to_categorical(train_labels, 10)
test_labels = tf.keras.utils.to_categorical(test_labels, 10)
# Getting the images in [0, 1] range.
train_images = train_images / np.float32(255)
test_images = test_images / np.float32(255)
# If the list of devices is not specified in the
# `tf.distribute.MirroredStrategy` constructor, it will be auto-detected.
GPUS = [0, 1]
devices = ["/gpu:" + str(gpu_id) for gpu_id in GPUS]
strategy = tf.distribute.MirroredStrategy(devices=devices)
print ('Number of devices: {}'.format(strategy.num_replicas_in_sync))
BUFFER_SIZE = len(train_images)
BATCH_SIZE_PER_REPLICA = 64
GLOBAL_BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
EPOCHS = 10
train_dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels)).shuffle(BUFFER_SIZE).batch(GLOBAL_BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices((test_images, test_labels)).batch(GLOBAL_BATCH_SIZE)
train_dist_dataset = strategy.experimental_distribute_dataset(train_dataset)
test_dist_dataset = strategy.experimental_distribute_dataset(test_dataset)
def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    return model
with strategy.scope():
    # Set reduction to `none` so we can do the reduction afterwards and divide by
    # global batch size.
    loss_object = tf.keras.losses.CategoricalCrossentropy(
        from_logits=True,
        reduction=tf.keras.losses.Reduction.NONE)
    def compute_loss(labels, predictions):
        per_example_loss = loss_object(labels, predictions)
        return tf.nn.compute_average_loss(per_example_loss, global_batch_size=GLOBAL_BATCH_SIZE)
with strategy.scope():
    test_loss = tf.keras.metrics.Mean(name='test_loss')
    train_accuracy = tf.keras.metrics.CategoricalAccuracy(name='train_accuracy')
    test_accuracy = tf.keras.metrics.CategoricalAccuracy(name='test_accuracy')
    train_auc = tf.keras.metrics.AUC(name='train_auc')
    test_auc = tf.keras.metrics.AUC(name='test_auc')
# model, optimizer, and checkpoint must be created under `strategy.scope`.
with strategy.scope():
    model = create_model()
    optimizer = tf.keras.optimizers.Adam()
def train_step(inputs):
    images, labels = inputs
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)
        loss = compute_loss(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_accuracy(labels, predictions)
    train_auc(labels, predictions)
    return loss
def test_step(inputs):
    images, labels = inputs
    predictions = model(images, training=False)
    t_loss = loss_object(labels, predictions)
    test_loss.update_state(t_loss)
    test_accuracy(labels, predictions)
    test_auc(labels, predictions)
# `run` replicates the provided computation and runs it
# with the distributed input.
@tf.function
def distributed_train_step(dataset_inputs):
    per_replica_losses = strategy.run(train_step, args=(dataset_inputs,))
    return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses,
                           axis=None)
@tf.function
def distributed_test_step(dataset_inputs):
    return strategy.run(test_step, args=(dataset_inputs,))
for epoch in range(EPOCHS):
    # TRAIN LOOP
    total_loss = 0.0
    num_batches = 0
    for x in train_dist_dataset:
        total_loss += distributed_train_step(x)
        num_batches += 1
    train_loss = total_loss / num_batches
    # TEST LOOP
    for x in test_dist_dataset:
        distributed_test_step(x)
    template = ("Epoch {}, Loss: {}, Accuracy: {}, AUC: {},"
                "Test Loss: {}, Test Accuracy: {}, Test AUC: {}")
    print(template.format(epoch + 1,
                          train_loss, train_accuracy.result() * 100, train_auc.result() * 100,
                          test_loss.result(), test_accuracy.result() * 100, test_auc.result() * 100))
    test_loss.reset_states()
    train_accuracy.reset_states()
    test_accuracy.reset_states()
    train_auc.reset_states()
    test_auc.reset_states()
The problem is that the AUC evaluation is definitely wrong, because it exceeds its range (it should be between 0 and 100 here). These are the results from running the above code once:
Epoch 1, Loss: 1.8061423301696777, Accuracy: 66.00833892822266, AUC: 321.8688659667969,Test Loss: 1.742477536201477, Test Accuracy: 72.0999984741211, Test AUC: 331.33709716796875
Epoch 2, Loss: 1.7129968404769897, Accuracy: 74.9816665649414, AUC: 337.37017822265625,Test Loss: 1.7084736824035645, Test Accuracy: 75.52999877929688, Test AUC: 337.1878967285156
Epoch 3, Loss: 1.643971562385559, Accuracy: 81.83333587646484, AUC: 355.96209716796875,Test Loss: 1.6072628498077393, Test Accuracy: 85.3499984741211, Test AUC: 370.603759765625
Epoch 4, Loss: 1.5887378454208374, Accuracy: 87.27833557128906, AUC: 373.6204528808594,Test Loss: 1.5906082391738892, Test Accuracy: 87.13999938964844, Test AUC: 371.9998474121094
Epoch 5, Loss: 1.581775426864624, Accuracy: 88.0, AUC: 373.9468994140625,Test Loss: 1.5964380502700806, Test Accuracy: 86.68000030517578, Test AUC: 371.0227355957031
Epoch 6, Loss: 1.5764907598495483, Accuracy: 88.49166870117188, AUC: 375.2404479980469,Test Loss: 1.5832056999206543, Test Accuracy: 87.94000244140625, Test AUC: 373.41998291015625
Epoch 7, Loss: 1.5698528289794922, Accuracy: 89.19166564941406, AUC: 376.473876953125,Test Loss: 1.5770654678344727, Test Accuracy: 88.58000183105469, Test AUC: 375.5516662597656
Epoch 8, Loss: 1.564456820487976, Accuracy: 89.71833801269531, AUC: 377.8564758300781,Test Loss: 1.5792100429534912, Test Accuracy: 88.27000427246094, Test AUC: 373.1791687011719
Epoch 9, Loss: 1.5612279176712036, Accuracy: 90.02000427246094, AUC: 377.9949645996094,Test Loss: 1.5729509592056274, Test Accuracy: 88.9800033569336, Test AUC: 375.5257263183594
Epoch 10, Loss: 1.5562015771865845, Accuracy: 90.54000091552734, AUC: 378.9789123535156,Test Loss: 1.56815767288208, Test Accuracy: 89.3499984741211, Test AUC: 375.8636474609375
Accuracy is OK, but it seems to be the only metric that behaves nicely. I tried other metrics too, and they are not evaluated correctly either. The problems seem to come from using more than one GPU, because when I run this code with one GPU it produces the right results.
When you use a distributed strategy, the metric must be constructed and used inside the strategy.scope() block. So when you want to call the metric.result() method, remember to put it inside a with strategy.scope() block.
with strategy.scope():
    print(metric.result())
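Applied to the training loop above, that means reading the metric results inside the scope when printing, e.g. (a sketch reusing the names from the question):
with strategy.scope():
    print(template.format(epoch + 1,
                          train_loss, train_accuracy.result() * 100, train_auc.result() * 100,
                          test_loss.result(), test_accuracy.result() * 100, test_auc.result() * 100))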

Jacobian matrix of logits with respect to image using tf.GradientTape

I am trying to find the Jacobian of the logits with respect to the input, but I get None and I cannot figure out why.
Let's say I have a model; I trained it and saved it.
import tensorflow as tf
print("TensorFlow version: ", tf.__version__)
tf.keras.backend.set_floatx('float64')
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
#Normalize the images, between 0-1
x_train, x_test = x_train / 255.0, x_test / 255.0
# Add a channels dimension
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]
print(x_train.shape)
#(60000, 28, 28, 1)
print(y_train.shape)
#(60000,)
print(x_test.shape)
#(10000, 28, 28, 1)
print(y_test.shape)
#(10000,)
num_class = 10
# Convert labels to one hot encoded vectors.
y_train_oh, y_test_oh = tf.keras.utils.to_categorical(y_train, num_classes= num_class, dtype='float32'), tf.keras.utils.to_categorical(y_test, num_classes= num_class, dtype='float32')
print(y_train_oh.shape)
#(60000, 10)
print(y_test_oh.shape)
#(10000, 10)
batch_size = 32
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train_oh)).shuffle(10000).batch(batch_size)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test_oh)).batch(batch_size)
IMG_SIZE = (28, 28, 1)
input_img = tf.keras.layers.Input(shape=IMG_SIZE)
hidden_layer_1 = tf.keras.layers.Conv2D(filters = 16, kernel_size = (3, 3), strides=(1, 1), padding='same', activation=tf.nn.relu)(input_img)
hidden_layer_2 = tf.keras.layers.Conv2D(filters = 32, kernel_size = (3, 3), strides=(2, 2), padding='same', activation=tf.nn.relu)(hidden_layer_1)
hidden_layer_3 = tf.keras.layers.Conv2D(filters = 64, kernel_size = (3, 3), strides=(2, 2), padding='same', activation=tf.nn.relu)(hidden_layer_2)
flatten_layer = tf.keras.layers.Flatten()(hidden_layer_3)
output_img = tf.keras.layers.Dense(num_class)(flatten_layer)
#NO SOFTMAX LAYER IN THE END, WE WILL DO IT LATER
#predictions = tf.nn.softmax(logits)
model = tf.keras.Model(input_img, output_img)
model.summary()
loss_object = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
# This function accepts one-hot encoded labels
optimizer = tf.keras.optimizers.Adam()
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.CategoricalAccuracy(name='train_accuracy')
test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.CategoricalAccuracy(name='test_accuracy')
@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        # training=True is only needed if there are layers with different
        # behavior during training versus inference (e.g. Dropout).
        predictions = model(images, training=True)
        loss = loss_object(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)
    train_accuracy(labels, predictions)
@tf.function
def test_step(images, labels):
    # training=False is only needed if there are layers with different
    # behavior during training versus inference (e.g. Dropout).
    predictions = model(images, training=False)
    t_loss = loss_object(labels, predictions)
    test_loss(t_loss)
    test_accuracy(labels, predictions)
# Train the model for 15 epochs.
num_epochs = 15
train_loss_results = []
train_accuracy_results = []
test_loss_results = []
test_accuracy_results = []
for epoch in range(num_epochs):
    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()
    for images, labels in train_ds:
        train_step(images, labels)
    for test_images, test_labels in test_ds:
        test_step(test_images, test_labels)
    train_loss_results.append(train_loss.result())
    train_accuracy_results.append(train_accuracy.result())
    test_loss_results.append(test_loss.result())
    test_accuracy_results.append(test_accuracy.result())
    template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
    print(template.format(epoch + 1,
                          train_loss.result(),
                          train_accuracy.result() * 100,
                          test_loss.result(),
                          test_accuracy.result() * 100))
tf.keras.models.save_model(model = model, filepath = 'model.h5', overwrite=True, include_optimizer=True)
# Epoch 1, Loss: 0.1654163608489558, Accuracy: 95.22, Test Loss: 0.061988271648914496, Test Accuracy: 97.88
# Epoch 2, Loss: 0.060983153790452826, Accuracy: 98.15833333333333, Test Loss: 0.044874734015780696, Test Accuracy: 98.53
# Epoch 3, Loss: 0.042541984771347297, Accuracy: 98.69, Test Loss: 0.042536806688480366, Test Accuracy: 98.57000000000001
# Epoch 4, Loss: 0.03330485398344463, Accuracy: 98.98166666666667, Test Loss: 0.039308084282613225, Test Accuracy: 98.64
# Epoch 5, Loss: 0.024959077225852524, Accuracy: 99.205, Test Loss: 0.04370295960736327, Test Accuracy: 98.67
# Epoch 6, Loss: 0.020565333928674955, Accuracy: 99.33666666666666, Test Loss: 0.04245114839809372, Test Accuracy: 98.69
# Epoch 7, Loss: 0.01639637468442185, Accuracy: 99.47666666666667, Test Loss: 0.04561551753656099, Test Accuracy: 98.72999999999999
# Epoch 8, Loss: 0.013642370500962534, Accuracy: 99.56333333333333, Test Loss: 0.04333075060614142, Test Accuracy: 98.83
# Epoch 9, Loss: 0.010697861799085589, Accuracy: 99.655, Test Loss: 0.05918524164135248, Test Accuracy: 98.48
# Epoch 10, Loss: 0.011164671695055153, Accuracy: 99.61666666666666, Test Loss: 0.05492968221334442, Test Accuracy: 98.64
# Epoch 11, Loss: 0.008642793950046499, Accuracy: 99.69833333333334, Test Loss: 0.05367191278261649, Test Accuracy: 98.74000000000001
# Epoch 12, Loss: 0.00788155746288626, Accuracy: 99.74499999999999, Test Loss: 0.06254112380584512, Test Accuracy: 98.68
# Epoch 13, Loss: 0.006521700676742724, Accuracy: 99.77000000000001, Test Loss: 0.06381602274510409, Test Accuracy: 98.7
# Epoch 14, Loss: 0.007104389384812846, Accuracy: 99.75166666666667, Test Loss: 0.05241271737958395, Test Accuracy: 98.87
# Epoch 15, Loss: 0.006479600550850722, Accuracy: 99.77833333333334, Test Loss: 0.06816933916442823, Test Accuracy: 98.74000000000001
You can find the saved model in h5 format in this link, if you do not want to train it.
It works well so far; I can make predictions on some samples:
predictions = model(mnist_twos, training=False)
for i, logits in enumerate(predictions):
    class_idx = tf.argmax(logits).numpy()
    p = tf.nn.softmax(logits)[class_idx]  # probability
    print("Example {} prediction: {} ({:4.1f}%)".format(i, class_idx, 100 * p))
Example 0 prediction: 2 (100.0%)
Example 1 prediction: 2 (100.0%)
Example 2 prediction: 2 (100.0%)
Example 3 prediction: 2 (100.0%)
Example 4 prediction: 2 (100.0%)
Example 5 prediction: 2 (100.0%)
Example 6 prediction: 2 (100.0%)
Example 7 prediction: 2 (100.0%)
Example 8 prediction: 2 (100.0%)
Example 9 prediction: 2 (100.0%)
What I want to do now is find the Jacobian matrix of the logits with respect to the input image. Since I have 10 selected images, I will have a Jacobian matrix of size (10, 28, 28, 1), since the shape of an MNIST sample is (28, 28, 1). I can do this with TensorFlow 1.x like:
for i in range(n_class):
    if i == 0:
        j = tf.gradients(tf.reshape(logits, (-1,))[i], X_p)
    else:
        j = tf.concat([j, tf.gradients(tf.reshape(logits, (-1,))[i], X_p)], axis=0)
where X_p is the placeholder for the image I am feeding in.
X_p = tf.placeholder(shape=[28, 28, 1], dtype=tf.float32)
However, I am currently using TensorFlow 2.0 and I cannot make it work using tf.GradientTape; it always ends up None. This seems to be a common problem for everyone, and I followed the examples here, but to no avail. Can someone help me with it?
Please check the batch_jacobian method of GradientTape: https://www.tensorflow.org/api_docs/python/tf/GradientTape#batch_jacobian
Convert your input to a tf.Variable if you are still getting None gradients even after batch_jacobian.
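A minimal sketch of that in TF 2, reusing model and mnist_twos from the question (the shape comments assume the 10 selected images):
import tensorflow as tf
images = tf.convert_to_tensor(mnist_twos)       # shape (10, 28, 28, 1)
with tf.GradientTape() as tape:
    tape.watch(images)                          # plain tensors are not watched by default
    logits = model(images, training=False)      # shape (10, 10)
jacobian = tape.batch_jacobian(logits, images)  # shape (10, 10, 28, 28, 1)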