Use different activation functions on forward- and backward pass in Keras - tensorflow

I have a non-differentiable activation function I want to use on the forward-pass. On the backward-pass I want to use the ReLU activation function. I started with a small example, but unfortunately can not find approaches to incur the second activation function into my keras code. Probably something like this isn't even possible with keras? I would be very happy if someone has an idea how to approach this and can give me a hint. Thank you!
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
test_images_discrete = test_images
train_images = train_images / 255.0
test_images = test_images / 255.0
#my_test_model
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(filters=16, kernel_size=5, strides=2, padding="valid",
activation="relu", input_shape=(28,28,1), use_bias=True),
tf.keras.layers.Conv2D(filters=32, kernel_size=3, strides=2, padding="valid",
activation="relu", use_bias=True),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(32, activation="relu"),
tf.keras.layers.Dense(10)
])
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=["accuracy"])
model.fit(train_images,
train_labels,
epochs=5,
validation_data=(test_images, test_labels))

Related

Why is VGG-16 performing poor on CIFAR-10 dataset?

I am trying to implement VGG-16 Convolutional Neural Network for the CIFAR-10 dataset with Tensorflow. But I am getting near about 10% of training accuracy. What is wrong with my code?
import tensorflow as tf
from tensorflow.keras import datasets
(X_train, y_train), (X_test, y_test) = datasets.cifar10.load_data()
X_train.shape, y_train.shape, X_test.shape, y_test.shape
X_train = X_train/255
X_test = X_test/255
y_train = y_train.reshape(-1,)
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(filters=64, kernel_size=(3,3), activation="relu", input_shape=
(32,32,3),padding="same"),
tf.keras.layers.Conv2D(filters=64, kernel_size=(3,3), activation="relu",
padding="same"),
tf.keras.layers.MaxPool2D(pool_size=(2,2), strides=(2,2)),
tf.keras.layers.Conv2D(filters=128, kernel_size=(3,3), activation="relu",
padding="same"),
tf.keras.layers.Conv2D(filters=128, kernel_size=(3,3), activation="relu",
padding="same"),
tf.keras.layers.MaxPool2D(pool_size=(2,2), strides=(2,2)),
tf.keras.layers.Conv2D(filters=256, kernel_size=(3,3), activation="relu",
padding="same"),
tf.keras.layers.Conv2D(filters=256, kernel_size=(3,3), activation="relu",
padding="same"),
tf.keras.layers.Conv2D(filters=256, kernel_size=(3,3), activation="relu",
padding="same"),
tf.keras.layers.MaxPool2D(pool_size=(2,2), strides=(2,2)),
tf.keras.layers.Conv2D(filters=512, kernel_size=(3,3), activation="relu",
padding="same"),
tf.keras.layers.Conv2D(filters=512, kernel_size=(3,3), activation="relu",
padding="same"),
tf.keras.layers.Conv2D(filters=512, kernel_size=(3,3), activation="relu",
padding="same"),
tf.keras.layers.MaxPool2D(pool_size=(2,2), strides=(2,2)),
tf.keras.layers.Conv2D(filters=512, kernel_size=(3,3), activation="relu",
padding="same"),
tf.keras.layers.Conv2D(filters=512, kernel_size=(3,3), activation="relu",
padding="same"),
tf.keras.layers.Conv2D(filters=512, kernel_size=(3,3), activation="relu",
padding="same"),
tf.keras.layers.MaxPool2D(pool_size=(2,2), strides=(2,2)),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(4096, activation="relu"),
tf.keras.layers.Dense(4096, activation="relu"),
tf.keras.layers.Dense(10, activation="softmax")
])
model.summary()
model.compile(loss=tf.keras.losses.sparse_categorical_crossentropy,
optimizer=tf.keras.optimizers.Adam(),
metrics=["accuracy"])
X_train[0].shape, y_train[0].shape
model.fit(X_train, y_train, epochs = 100)
It seems like you haven't found the right training schedule yet.
If you don't mind changing the model a bit, I recommend using Batchnorm after each convolution layer. In general, a model is easier to train with Batchnorm.
Furthermore, did you decrease the learning rate after a certain amount of iterations? At some point, a too-large learning rate may not decrease your training error anymore. The ResNet, for example, is trained with an initial learning rate of 0.1 for 100 epochs then with 0.01 for another 50 epochs and 0.001 for another 50 epochs.

TF2 code 10 times slower than equivalent PyTorch code for a Conv1D network

I've been trying to translate some PyTorch code to TensorFlow 2, but the TF2 code is around 10 times slower. I've tried looking at where this might come from, and as far as I can tell it comes from the tape.gradient call (performance was the same with keras' .fit function). I've tried to use different data loaders, ways of declaring the model, installations, etc... and the results have been consistent.
Any explanation / solution as to why this is happening would be much appreciated.
Here is a minimalist version of the TF2 code:
import time
import tensorflow as tf
from tensorflow.keras import layers
import numpy as np
# Generate some fake data
train_labels = np.random.randint(10, size=1000)
train_data = np.random.rand(1000, 120, 18, 1)
train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_labels))
train_dataset = train_dataset.batch(256)
# Create a small model
model = tf.keras.Sequential([
layers.Conv1D(64, kernel_size=7, strides=3, padding="same", activation="relu"),
layers.Conv1D(64, kernel_size=5, strides=2, padding="same", activation="relu"),
layers.Conv1D(128, kernel_size=5, strides=2, padding="same", activation="relu"),
layers.Conv1D(128, kernel_size=3, strides=1, padding="same", activation="relu"),
layers.Conv1D(128, kernel_size=3, strides=1, padding="same", activation="relu"),
layers.Conv1D(256, kernel_size=1, strides=1, padding="same", activation="relu"),
layers.GlobalAveragePooling2D(),
layers.Flatten(),
layers.Dense(128, use_bias=True, activation="relu"),
layers.Dense(32, use_bias=True, activation="relu"),
layers.Dense(1, activation='sigmoid', use_bias=True),
])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, decay=5e-4)
#tf.function
def train_step(data_batch, label_batch):
with tf.GradientTape() as tape:
y_pred = model(data_batch)
loss = tf.keras.losses.MSE(labels_batch, y_pred)
gradients = tape.gradient(loss, model.trainable_weights)
optimizer.apply_gradients(zip(gradients, model.trainable_weights))
step_times = []
for epoch in range(20):
for data_batch, labels_batch in train_dataset:
step_start_time = time.perf_counter()
train_step(data_batch, labels_batch)
if epoch != 0:
step_times.append(time.perf_counter()-step_start_time)
print(f"Average training step time: {np.mean(step_times):.3f}s.")
And the PyTorch equivalent:
import time
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
torch.backends.cudnn.benchmark = True
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Generate some fake data
train_labels = np.random.randint(10, size=1000)
train_data = np.random.rand(1000, 18, 120)
# Create a small model
class Model(torch.nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv1d(18, 64, kernel_size=7, stride=3, padding=3)
self.conv2 = nn.Conv1d(64, 64, kernel_size=5, stride=2, padding=2)
self.conv3 = nn.Conv1d(64, 128, kernel_size=5, stride=2, padding=2)
self.conv4 = nn.Conv1d(128, 128, kernel_size=3, stride=1, padding=1)
self.conv5 = nn.Conv1d(128, 128, kernel_size=3, stride=1, padding=1)
self.conv6 = nn.Conv1d(128, 256, kernel_size=3, stride=1, padding=1)
self.fc1 = nn.Linear(256, 128)
self.fc2 = nn.Linear(128, 32)
self.fc3 = nn.Linear(32, 1)
def forward(self, inputs):
x = F.relu(self.conv1(inputs))
x = F.relu(self.conv2(x))
x = F.relu(self.conv3(x))
x = F.relu(self.conv4(x))
x = F.relu(self.conv5(x))
x = F.relu(self.conv6(x))
x = x.mean(2)
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = torch.sigmoid(self.fc3(x))
return x
model = Model()
model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)
loss_fn = torch.nn.MSELoss()
batch_size = 256
train_steps_per_epoch = train_data.shape[0] // batch_size
step_times = []
for epoch in range(20):
for step in range(train_steps_per_epoch):
batch_start, batch_end = step * batch_size, (step+1) * batch_size
data_batch = torch.FloatTensor(train_data[batch_start:batch_end]).to(device)
labels_batch = torch.FloatTensor(train_labels[batch_start:batch_end]).to(device)
step_start_time = time.perf_counter()
optimizer.zero_grad()
y_pred = model(data_batch)
loss = loss_fn(labels_batch, torch.squeeze(y_pred))
loss.backward()
optimizer.step()
if epoch != 0:
step_times.append(time.perf_counter()-step_start_time)
print(f"Average training step time: {np.mean(step_times):.3f}s.")
You're using tf.GradientTape correctly, but both your models and data are different in the snippets you provided.
Here is the TF code that uses the same data and model architecture as your Pytorch model.
import time
import tensorflow as tf
from tensorflow.keras import layers
import numpy as np
# Generate some fake data
train_labels = np.random.randint(10, size=1000)
train_data = np.random.rand(1000, 120, 18)
train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_labels))
train_dataset = train_dataset.batch(256)
model = tf.keras.Sequential([
layers.Conv1D(64, kernel_size=7, strides=3, padding="same", activation="relu"),
layers.Conv1D(64, kernel_size=5, strides=2, padding="same", activation="relu"),
layers.Conv1D(128, kernel_size=5, strides=2, padding="same", activation="relu"),
layers.Conv1D(128, kernel_size=3, strides=1, padding="same", activation="relu"),
layers.Conv1D(128, kernel_size=3, strides=1, padding="same", activation="relu"),
layers.Conv1D(256, kernel_size=3, strides=1, padding="same", activation="relu"),
layers.GlobalAveragePooling1D(),
layers.Dense(128, use_bias=True, activation="relu"),
layers.Dense(32, use_bias=True, activation="relu"),
layers.Dense(1, activation='sigmoid', use_bias=True),
])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, decay=5e-4)
#tf.function
def train_step(data_batch, label_batch, model):
with tf.GradientTape() as tape:
y_pred = model(data_batch, training=True)
loss = tf.keras.losses.MSE(labels_batch, y_pred)
gradients = tape.gradient(loss, model.trainable_weights)
optimizer.apply_gradients(zip(gradients, model.trainable_weights))
step_times = []
for epoch in range(20):
for data_batch, labels_batch in train_dataset:
step_start_time = time.perf_counter()
train_step(data_batch, labels_batch, model)
if epoch != 0:
step_times.append(time.perf_counter()-step_start_time)
print(f"Average training step time: {np.mean(step_times):.3f}s.")
So, in reality, TF is 3 times faster than Pytorch: 0.035s vs 0.112s

How can I pass logits to sigmoid_cross_entropy_with_logits before I fit and predict model?

Since I need to train a model with multiple labels, I need to use loss function tf.nn.sigmoid_cross_entropy_with_logits. This function has two parameters: logits and loss.
Is parameter logitsis the value of predicted y? How can I pass this value before I compile model? I cannot predict y before I compile and fit model, right?
This is my code:
import tensorflow as tf
from tensorflow import keras
model = keras.Sequential([keras.layers.Dense(50, activation='tanh', input_shape=[100]),
keras.layers.Dense(30, activation='relu'),
keras.layers.Dense(50, activation='tanh'),
keras.layers.Dense(100, activation='relu'),
keras.layers.Dense(8)])
model.compile(optimizer='rmsprop',
loss=tf.nn.sigmoid_cross_entropy_with_logits(logits=y_pred), labels=y), # <---How to figure out y_pred here?
metrics=['accuracy'])
model.fit(x, y, epochs=10, batch_size=32)
y_pred = model.predict(x) # <--- Now I got y_pred after compile, fit and predict
I'm using tensorflow v2.1.0
These arguments (labels and logits) are passed to the loss function within Keras' implementation. To make your code work do like this:
import tensorflow as tf
from tensorflow import keras
def loss_fn(y_true, y_pred):
return tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=y_pred)
model = keras.Sequential([keras.layers.Dense(50, activation='tanh', input_shape=[100]),
keras.layers.Dense(30, activation='relu'),
keras.layers.Dense(50, activation='tanh'),
keras.layers.Dense(100, activation='relu'),
keras.layers.Dense(8)])
model.compile(optimizer='rmsprop',
loss=loss_fn,
metrics=['accuracy'])
x = np.random.normal(0, 1, (64, 100))
y = np.random.randint(0, 2, (64, 8)).astype('float32')
model.fit(x, y, epochs=10, batch_size=32)
y_pred = model.predict(x)
The suggested way, though, is to use Keras' loss implementation instead. In your case it would be:
model = keras.Sequential([keras.layers.Dense(50, activation='tanh', input_shape=[100]),
keras.layers.Dense(30, activation='relu'),
keras.layers.Dense(50, activation='tanh'),
keras.layers.Dense(100, activation='relu'),
keras.layers.Dense(8)])
model.compile(optimizer='rmsprop',
loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
metrics=['accuracy'])
x = np.random.normal(0, 1, (64, 100))
y = np.random.randint(0, 2, (64, 8)).astype('float32')
model.fit(x, y, epochs=10, batch_size=32)
y_pred = model.predict(x)

Keras at TF2 metrics not added

I'm using Tensorflow 2.0 nightly build, on google colab.
I made simple CNN model, and than compiled it, and fit it.
Here's code:
model = tf.keras.models.Sequential([
tf.keras.layers.Reshape((28, 28, 1)),
tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), padding='SAME',
activation=tf.nn.relu),
tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), padding='SAME',
activation=tf.nn.relu),
tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(256, activation=tf.nn.relu),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Dense(10),
tf.keras.layers.Softmax()
])
optimizer = tf.keras.optimizers.Adam(0.001)
model.compile(optimizer=optimizer,
loss=tf.keras.losses.CategoricalCrossentropy(),
matrics=['accuracy'])
log_dir='./logs'
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir,
histogram_freq=2,
write_images=True,
update_freq='batch',
profile_batch=0)
model.fit(x=x_train, y=y_train, batch_size=100, epochs=15,
callbacks=[tensorboard_callback], validation_data=(x_test, y_test))
And it don't give me accuracy information.
I evaluated model, and it supposed to give me accuracy information, but it only gives me loss information.
I printed model.metrics, and it was just [].
Is it bug? Or I missed something?
You misspelled metrics as matrics. Change matrics=['accuracy'] to metrics=['accuracy'].

Trying to understand how to add more hidden layers into a neural network using a for loop

I'm trying to figure out how would I make a simple for loop to add more hidden layers to this neural network for the basic tensorflow neural network from the code below:
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
normally I would go ahead and change the following code:
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation='softmax')
])
and add more layers.Dense as follow:
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(256, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation='softmax')
])
Is it possible to create a simple for loop where I can input the number of hidden layers I want?
First, create an list object and add a Flatten layer to it.
layers = list()
layers.add( tf.keras.layers.Flatten() )
Now, we use a loop statement to add n number of Dense layers.
units = [ 64 , 128 , 256 ]
for i in range( n ):
layers.add( tf.keras.layers.Dense( units[i] , activation='relu' ) )
Where n could be any positive integer.