Can I use the output of tf.keras.utils.image_dataset_from_directory to train an autoencoder?

To put it simply, I'd like to be able to use a Keras dataset created from a local image directory to train an autoencoder. To clarify, this is a model that approximates the Identity function for images: ideally, the output is exactly equal to the input.
The dataset is too large to fit in memory, so converting the dataset to a numpy array with np.concatenate will not help me here.
Or in other words, I'd like an Identity image dataset, where the label for each image in the dataset is exactly equal to the image itself.
Here's my (non-working) sample code:
train_ds, validate_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    labels=None,
    validation_split=0.1,
    subset="both",
    shuffle=True,
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size,
    crop_to_aspect_ratio=True)

history = autoencoder.fit(
    x=train_ds,
    y=train_ds,
    validation_data=(validate_ds, validate_ds),
    epochs=epochs,
    batch_size=16
)
The image_dataset_from_directory function gives me a dataset of images with no labels. So far so good.
The second command fails with the error message:
ValueError: `y` argument is not supported when using dataset as input.
On the other hand, if I exclude the y variable I get this error:
ValueError: Target data is missing. Your model was compiled with loss=binary_crossentropy, and therefore expects target data to be provided in `fit()`.
Which is not at all surprising, because there are no labels, as I requested none. Yet it won't let me use the dataset itself as the labels, which is what I need to do.
Any help would be appreciated.

While there are ways to modify the dataset, I think the best option is to write a custom model class. This is modified from the official tutorial:
class Autoencoder(tf.keras.Model):
    def train_step(self, data):
        # Unpack the data. Its structure depends on your model and
        # on what you pass to `fit()`.
        x = data  # CHANGE 1: changed from x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            # Compute the loss value
            # (the loss function is configured in `compile()`)
            loss = self.compiled_loss(x, y_pred, regularization_losses=self.losses)  # CHANGE 2: replaced y by x as the label
        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        # Update metrics (includes the metric that tracks the loss)
        self.compiled_metrics.update_state(x, y_pred)  # CHANGE 3: like change 2
        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}

    def test_step(self, data):
        # CHANGED in the same way
        x = data
        # Compute predictions
        y_pred = self(x, training=False)
        # Updates the metrics tracking the loss
        self.compiled_loss(x, y_pred, regularization_losses=self.losses)
        # Update the metrics.
        self.compiled_metrics.update_state(x, y_pred)
        # Return a dict mapping metric names to current value.
        # Note that it will include the loss (tracked in self.metrics).
        return {m.name: m.result() for m in self.metrics}
This is for the functional API (tf.keras.Model). If you are using a Sequential model, you should inherit from that instead. You can use this class as a direct replacement for the normal model constructor.
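As a quick illustration, here is a minimal usage sketch; the layer sizes and the rescaling step are my own assumptions, while img_height, img_width, train_ds, validate_ds, and epochs come from the question's setup:

# Scale pixels to [0, 1] in the pipeline so that the reconstruction target
# (which train_step takes from the batch itself) is scaled the same way.
train_ds = train_ds.map(lambda x: x / 255.0)
validate_ds = validate_ds.map(lambda x: x / 255.0)

inputs = tf.keras.Input(shape=(img_height, img_width, 3))
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(64, activation="relu")(x)  # hypothetical bottleneck size
x = tf.keras.layers.Dense(img_height * img_width * 3, activation="sigmoid")(x)
outputs = tf.keras.layers.Reshape((img_height, img_width, 3))(x)

autoencoder = Autoencoder(inputs, outputs)  # drop-in for tf.keras.Model(...)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# No y argument and no (x, y) tuples needed: the overridden train_step and
# test_step use each batch x as its own target.
history = autoencoder.fit(train_ds, validation_data=validate_ds, epochs=epochs)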
Another option could be to use train_zipped = tf.data.Dataset.zip((train_ds, train_ds)) to create an (input, target) dataset that you can feed directly into the usual model and loss function. Personally, I don't like the duplication. Also, I'm not sure whether this behaves correctly for shuffled data (will both copies of train_ds be shuffled in the same way?).
You could circumvent this by setting shuffle=False in image_dataset_from_directory and then using train_zipped = train_zipped.shuffle(buffer_size) instead. However, in my experience this is very slow.
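For reference, a minimal sketch of that zipped variant under those assumptions (buffer_size is a hypothetical tuning knob, and autoencoder is an ordinary compiled model):

# Load without shuffling so the two copies line up element for element.
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir, labels=None, shuffle=False,
    image_size=(img_height, img_width), batch_size=batch_size)

# Zip the dataset with itself to get (input, target) pairs with target == input.
train_zipped = tf.data.Dataset.zip((train_ds, train_ds))
train_zipped = train_zipped.shuffle(buffer_size)  # note: this shuffles whole batches

# An alternative not mentioned above: map each batch to an (x, x) pair, which
# avoids both the duplication and the shuffle-alignment question.
# train_zipped = train_ds.map(lambda x: (x, x))

history = autoencoder.fit(train_zipped, epochs=epochs)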

Related

Using training weights on non-training data to design a new loss function

I would like to access the training point(s) at a training iteration and incorporate a soft constraint into my loss function by using data points not included in the training set. I will use this post as a reference.
import numpy as np
import keras.backend as K
from keras.layers import Dense, Input
from keras.models import Model

# Some random training data and labels
# (shapes are made consistent with the Input((20,)) / Dense(3) network below,
# which the snippet in the question did not do)
features = np.random.rand(100, 20)
labels = np.random.rand(100, 3)

# Simple neural net with three outputs
input_layer = Input((20,))
hidden_layer = Dense(16)(input_layer)
output_layer = Dense(3)(hidden_layer)

# Model
model = Model(inputs=input_layer, outputs=output_layer)

# Each training point has another data pair. In the real example, I will have
# multiple supporters. That is why I am using a dict.
holder = np.random.rand(100, 20)
indices = np.arange(start=1, stop=features.shape[0], step=1)
supporters = {}
for i, j in zip(indices, holder):  # i represents the ith training data point
    supporters[i] = j

# Write a custom loss function
def custom_loss(y_true, y_pred):
    # Normal MSE loss
    mse = K.mean(K.square(y_true - y_pred), axis=-1)
    new_constraint = ....
    return (mse + new_constraint)

model.compile(loss=custom_loss, optimizer='sgd')
model.fit(features, labels, epochs=1, batch_size=1)
For simplicity, let us assume that I'd like to minimize the minimum absolute value difference between the prediction value and the prediction of the pair data stored in supporters, using the fixed network weights. Also, assume that I pass one training point at each batch. However, I could not figure out how to perform this operation. I've tried something shown below, but clearly, it is not correct.
new_constraint = K.sum(y_pred - model.fit(supporters))
Fit is the procedure of training the model, not of evaluating it. I think that it would be better for your problem to load a new instance of your model with your current weights and evaluate the batch loss in order to calculate the loss of the main model.
import tensorflow as tf
import keras.backend as K

main_model = Model()  # This is your main training model

def custom_loss_1(y_true, y_pred):  # Avoid recursive calls
    mse = K.mean(K.square(y_true - y_pred), axis=-1)
    return mse

def custom_loss(y_true, y_pred):
    support_model = tf.keras.models.clone_model(main_model)  # You copy the main model, but the weights are uninitialized
    support_model.build((None, 20))  # You build with inputs of the same shape as your support data (batch dimension included)
    support_model.compile(loss=custom_loss_1, optimizer='sgd')
    support_model.set_weights(main_model.get_weights())  # You load the weights of the main model
    mse = custom_loss_1(y_true, y_pred)
    # You just want to evaluate the model, not to train it. If you have more
    # metrics than just the loss, use support_model.evaluate(supporters)[0]
    new_constraint = K.sum(y_pred - support_model.predict(supporters))  # predict to get the output, evaluate to get the metrics
    return (mse + new_constraint)

How can I isolate why my tensorflow model has such a high loss and low accuracy?

The Context:
I am creating a test application that largely replicates the functionality described here.
I was able to run the code found in the tutorial linked above, and I see losses and accuracies that are reasonable, even after just a couple of epochs.
Tutorial Code: Early into the training of the two-headed CNN, losses and accuracy look good
This is because the code starts with the VGG16 model and the already trained weights, and it freezes those layers so that no learning is required for the core classification.
My test code largely replicates the tutorial structure. It uses the exact same dataset and the already-trained VGG16 weights. However, I load the image dataset using generators (rather than pulling all data into memory, as the tutorial does).
You can find how I created those generators in the answer provided here. I had struggled for a while, before I finally got it to a point that I think is correct.
The Problem:
When I train my model, the classification loss and accuracy are as expected; however, the bounding box loss grows and the bounding box accuracy does not improve over the epochs.
My Code: Even after just a couple epochs you see the bounding box loss starting to grow
Further Details:
I've spent a lot of time looking at the (image, target) tuples yielded by the generator, and I think I am handling the yielded data properly (including the unitrect).
A pycharm view of the images and target tuples yielded by generator
In fact I've also added a debug mode that allows me to display the images and rectangles fed into the training session.
A motorcycle with the bounding box as computed from the unit rectangle bounding box loaded from CSV into the dataframe (df); df is an input to flow_from_dataframe
The model I am using:
imodel = tf.keras.applications.vgg16.VGG16(weights=None, include_top=False,
                                           input_tensor=Input(shape=(224, 224, 3)))
imodel.load_weights(weights, by_name=True)
imodel.trainable = False
# flatten the max-pooling output of VGG
flatten = imodel.output
flatten = Flatten()(flatten)
# construct a fully-connected layer head to output the predicted
# bounding box coordinates
bboxHead = Dense(128, activation="relu")(flatten)
bboxHead = Dense(64, activation="relu")(bboxHead)
bboxHead = Dense(32, activation="relu")(bboxHead)
bboxHead = Dense(4, activation="sigmoid",
                 name="bounding_box")(bboxHead)
# construct a second fully-connected layer head, this one to predict
# the class label
softmaxHead = Dense(512, activation="relu")(flatten)
softmaxHead = Dropout(0.5)(softmaxHead)
softmaxHead = Dense(512, activation="relu")(softmaxHead)
softmaxHead = Dropout(0.5)(softmaxHead)
softmaxHead = Dense(len(classes), activation="softmax",
                    name="class_label")(softmaxHead)
# put together our model, which accepts an input image and then outputs
# bounding box coordinates and a class label
model = Model(
    inputs=imodel.input,
    outputs=(bboxHead, softmaxHead))
# define a dictionary to set the loss methods -- categorical
# cross-entropy for the class label head and mean squared error
# for the bounding box head
losses = {
    "class_label": "categorical_crossentropy",
    "bounding_box": "mean_squared_error",
}
# define a dictionary that specifies the weights per loss (both the
# class label and bounding box outputs will receive equal weight)
lossWeights = {
    "class_label": 1.0,
    "bounding_box": 1.0
}
# initialize the optimizer, compile the model, and show the model
# summary
opt = Adam(learning_rate=learning_rate)
model.compile(loss=losses, optimizer=opt, metrics=["accuracy"], loss_weights=lossWeights)
My call to "fit"
model.fit(x=train_generator[0], steps_per_epoch=train_generator[1],
          validation_data=validation_generator[0], validation_steps=validation_generator[1],
          epochs=epochs, verbose=1)
The weights that I load I've used in other experiments; I downloaded them from kaggle (see vgg16_weights_tf_dim_ordering_tf_kernels.h5).
My Generator:
def generate_image_generator(generator, data_directory, df, subset, target_size, batch_size, shuffle, seed):
    genImages = generator.flow_from_dataframe(dataframe=df, directory=data_directory, target_size=target_size,
                                              x_col="file",
                                              y_col=['cls_onehot', 'bbox'],
                                              subset=subset,
                                              class_mode="multi_output",
                                              batch_size=batch_size, shuffle=shuffle, seed=seed)
    while True:
        images, labels = genImages.next()
        targets = {
            'class_label': labels[0],
            'bounding_box': np.array(labels[1], dtype="float32")
        }
        yield images, targets

def get_train_and_validate_generators(self, data_directory, files, max_images, validation_split, shuffle, seed, target_size):
    generator = ImageDataGenerator(validation_split=validation_split,
                                   rescale=1./255.)
    df = get_dataframe(data_directory, files)
    if max_images:
        df = df.head(max_images)
    train_generator = generate_image_generator(generator, data_directory, df, "training",
                                               target_size,
                                               self.batch_size,
                                               shuffle, seed)
    valid_generator = generate_image_generator(generator, data_directory, df, "validation",
                                               target_size,
                                               self.batch_size,
                                               shuffle, seed)
    # (computing the steps per epoch and returning the (generator, steps)
    # pairs used by the fit() call above is omitted here)
Loading the dataframe from a list of CSV files:
def get_dataframe(data_directory, files):
    frames = []
    for di in files:
        df = pd.read_csv(data_directory + di["file"])
        frames.append(df)
    df = pd.concat(frames)
    df['cls_onehot'] = df['cls'].str.get_dummies().values.tolist()
    df['bbox'] = df[['sxu', 'syu', 'exu', 'eyu']].values.tolist()
    return df
A snippet of the CSV:
id,file,sx,sy,ex,ey,cls,sxu,syu,exu,eyu,w,h
0,motorcycle.0001.jpg,31,19,233,141,motorcycle,0.1183206106870229,0.11801242236024845,0.8893129770992366,0.8757763975155279,262,161
1,motorcycle.0002.jpg,32,15,232,142,motorcycle,0.12167300380228137,0.09259259259259259,0.8821292775665399,0.8765432098765432,263,162
2,motorcycle.0003.jpg,30,20,234,143,motorcycle,0.11406844106463879,0.12269938650306748,0.8897338403041825,0.8773006134969326,263,163
3,motorcycle.0004.jpg,30,15,231,132,motorcycle,0.11450381679389313,0.1,0.8816793893129771,0.88,262,150
4,motorcycle.0005.jpg,31,19,232,145,motorcycle,0.1183206106870229,0.1144578313253012,0.8854961832061069,0.8734939759036144,262,166
When I load weights from "imagenet", rather than using those I received from kaggle, I see the very same increase in bounding box loss:
imodel = tf.keras.applications.vgg16.VGG16(weights="imagenet", include_top=False,
                                           input_tensor=Input(shape=(224, 224, 3)))
The Question:
Please provide suggestions on how to isolate this bounding box loss growth problem.
OK. It looks like the problem was not with my generator at all. The code was fine except for one silly oversight: I still had an old call to compile running. I had called compile correctly the first time with the composite loss function, then called it a second time strictly with categorical cross-entropy as the loss, effectively ignoring my bounding boxes.
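For anyone who hits the same thing, the pitfall reconstructed in miniature (not my exact code): the second compile() silently replaces the loss configuration from the first.

# First compile: the intended composite loss for both heads.
model.compile(loss=losses, optimizer=opt, metrics=["accuracy"],
              loss_weights=lossWeights)

# Leftover second compile from an earlier experiment: this one wins, so the
# bounding-box head is no longer trained against MSE and gets no useful
# gradient signal.
model.compile(loss="categorical_crossentropy", optimizer=opt,
              metrics=["accuracy"])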
Anyways, if someone stumbles on this post, I hope they find the complete view of how to do classification and object detection, with a generator function, useful.
I edited the question above with the correct details, so it now reflects the right answer.
I'd still like to get the perspective of the experts who have had to dig into the workings of a model to better understand the underlying details that lead to the loss calculation.
Now that I'm starting to understand TensorFlow at a high level, it's clear how to recognize when things are working; it's not clear how to diagnose when things aren't.

How to use Attention or AdditiveAttention Layers which are given in tensorflow (Keras) for NER task?

I'm making an NER model with a Bi-LSTM. I want to use Attention layers with it, and I want to know the right way to fit in that Attention layer. There are two layers given: tf.keras.layers.Attention and tf.keras.layers.AdditiveAttention. I think attention uses all hidden states from the LSTM as well as the last output, but I'm not quite sure. Below is the code. Please tell me where I have to put that Attention layer. The documentation was not helpful for me; all the other answers have used their own CustomAttention() layer.
def build_model(vocab_size:int, n_tags:int, max_len:int, emb_dim:int=300, emb_weights=False, use_elmo:bool=False, use_crf:bool=False, train_embedding:bool=False):
    '''
    Build and return a Keras model based on the given inputs
    args:
        n_tags: No of unique 'y' tags present in the data
        max_len: Maximum length of sentence to use
        emb_dim: Size of embedding dimension
        emb_weights: pretrained Embedding Weights for Embedding Layer. if False, use default
        use_elmo: Whether to use Elmo Embeddings
        use_crf: Whether to use the CRF layer
        train_embedding: Whether to train the embeddings weights
    out:
        Keras model. See comments for each type of loss function and metric to use
    '''
    assert not (isinstance(emb_weights, np.ndarray) and use_elmo), "Either provide embedding weights or use ELMO. Not both"
    inputs = Input(shape=(max_len,))
    if isinstance(emb_weights, np.ndarray):
        x = Embedding(trainable=train_embedding, input_dim=vocab_size, output_dim=emb_dim, input_length=max_len, mask_zero=True, embeddings_initializer=keras.initializers.Constant(emb_weights))(inputs)
    elif use_elmo:
        x = Lambda(ElmoEmbedding, output_shape=(max_len, 1024))(inputs)  # Lambda will create a layer based on the function defined
    else:  # use default Embeddings
        x = Embedding(input_dim=vocab_size, output_dim=emb_dim, input_length=max_len, mask_zero=True)(inputs)  # n_words = vocab_size
    x = Bidirectional(LSTM(units=50, return_sequences=True, recurrent_dropout=0.1))(x)
    # I think the attention layer will come here, but I'm not sure exactly how to implement it.
    if use_crf:
        try:  # If you can not modify your crf.py file, it'll use the second package
            x = Dense(50, activation="relu")(x)  # use TimeDistributed(Dense(50, activation="relu"))(x) in older versions instead
            crf = CRF(n_tags)  # Instantiate CRF layer
            out = crf(x)
            model = Model(inputs, out)
            return model  # use crf_loss and crf_accuracy at compile time
        except:
            output = Dense(n_tags, activation=None)(x)
            crf = CRF_TF2(dtype='float32')  # it does not take any n_tags. See the documentation.
            output = crf(output)
            base_model = Model(inputs, output)
            model = ModelWithCRFLoss(base_model)  # It has Loss and Metric already. Change the model if you want to use DiceLoss.
            return model  # Do not use any metric or loss with this model's compile(); just pass an optimizer and run training
    else:
        out = Dense(n_tags, activation="softmax")(x)  # Wrap it in TimeDistributed(Dense()) if you have old versions
        model = Model(inputs, out)
        return model  # use "sparse_categorical_crossentropy", "accuracy"
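No answer is reproduced here, but purely as an illustrative sketch (not an authoritative recipe): one common placement is self-attention over the Bi-LSTM's sequence of hidden states, passing the same tensor as query and value to tf.keras.layers.Attention (tf.keras.layers.AdditiveAttention drops in the same way), at the commented spot in build_model:

x = Bidirectional(LSTM(units=50, return_sequences=True, recurrent_dropout=0.1))(x)
# Self-attention: the layer expects [query, value] (the key defaults to value).
# With x as both, the output keeps the shape (batch, max_len, 2 * units),
# so the downstream Dense/CRF heads work unchanged.
x = tf.keras.layers.Attention()([x, x])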

How to perform early stopping when writing our own custom training loops in TensorFlow 2.0?

To perform early stopping in TensorFlow, tf.keras provides a very convenient mechanism, tf.keras.callbacks, which can be passed to model.fit() to execute it. When writing a custom training loop, I couldn't understand how to make use of tf.keras.callbacks. Can someone provide a basic tutorial on how to do it?
https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch
https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/
You have two approaches to creating custom training loops.
One is the common pattern of two nested for loops.
Or you can do this: override train_step, in which case all the callbacks and other features are still available.
Tip: the code below is just a slice; the model structure is not implemented, and you should fill that in on your own.
More info? Check here
class CustomModel(keras.Model):
    def train_step(self, data):
        # Unpack the data. Its structure depends on your model and
        # on what you pass to `fit()`.
        print(data)
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            # Compute the loss value
            # (the loss function is configured in `compile()`)
            loss = self.compiled_loss(y, y_pred,
                                      regularization_losses=self.losses)
        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        # Update metrics (includes the metric that tracks the loss)
        self.compiled_metrics.update_state(y, y_pred)
        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}

# Construct and compile an instance of CustomModel
inputs = keras.Input(shape=(32,))
outputs = keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['...'])

earlystopping_cb = keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)

# Just use `fit` as usual
model.fit(train_ds, epochs=3, callbacks=[earlystopping_cb])
more info: https://keras.io/getting_started/intro_to_keras_for_engineers/#using-fit-with-a-custom-training-step
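For the first approach (two nested for loops), early stopping is bookkeeping you write yourself rather than a callback. A minimal sketch, assuming model, train_ds, val_ds, loss_fn, optimizer, and max_epochs already exist:

import numpy as np
import tensorflow as tf

best_val_loss, wait, patience = np.inf, 0, 10
best_weights = None

for epoch in range(max_epochs):
    for x, y in train_ds:  # inner training loop
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # one validation pass per epoch
    val_loss = np.mean([loss_fn(y, model(x, training=False)).numpy()
                        for x, y in val_ds])

    if val_loss < best_val_loss:
        best_val_loss, wait = val_loss, 0
        best_weights = model.get_weights()  # mimic restore_best_weights
    else:
        wait += 1
        if wait >= patience:  # no improvement for `patience` epochs
            print(f"Early stopping at epoch {epoch}")
            model.set_weights(best_weights)
            break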

Load multiple models in TensorFlow

I have written the following convolutional neural network (CNN) class in TensorFlow. [I have tried to omit some lines of code for clarity.]
class CNN:
    def __init__(self,
                 num_filters=16,      # initial number of convolution filters
                 num_layers=5,        # number of convolution layers
                 num_input=2,         # number of channels in input
                 num_output=5,        # number of channels in output
                 learning_rate=1e-4,  # learning rate for the optimizer
                 display_step=5000,   # displays training results every display_step epochs
                 num_epoch=10000,     # number of epochs for training
                 batch_size=64,       # batch size for mini-batch processing
                 restore_file=None,   # restore file (default: None)
                 ):
        # define placeholders
        self.image = tf.placeholder(tf.float32, shape=(None, None, None, self.num_input))
        self.groundtruth = tf.placeholder(tf.float32, shape=(None, None, None, self.num_output))

        # builds the CNN and computes the prediction
        self.pred = self._build()

        # I have already created tensorflow session and saver objects
        self.sess = tf.Session()
        self.saver = tf.train.Saver()

        # also, I have defined the loss function and optimizer as
        self.loss = self._loss_function()
        self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.loss)

        if restore_file is not None:
            print("model exists...loading from the model")
            self.saver.restore(self.sess, restore_file)
        else:
            print("model does not exist...initializing")
            self.sess.run(tf.initialize_all_variables())

    def _build(self):
        # builds the CNN (omitted)
        ...

    def _loss_function(self):
        # computes the loss (omitted)
        ...

    def train(self, train_x, train_y, val_x, val_y):
        # uses mini-batches to minimize the loss
        self.sess.run(self.optimizer, feed_dict={self.image: sample, self.groundtruth: gt})
        # I save the session after n=10 epochs as:
        if epoch % n == 0:
            self.saver.save(self.sess, 'snapshot', global_step=epoch)

    # finally, my predict function is
    def predict(self, X):
        return self.sess.run(self.pred, feed_dict={self.image: X})
I have trained two CNNs for two separate tasks independently. Each took around 1 day. Say, model1 and model2 are saved as 'snapshot-model1-10000' and 'snapshot-model2-10000' (with their corresponding meta files) respectively. I can test each model and compute its performance separately.
Now, I want to load these two models in a single script. I would naturally try to do as below:
cnn1 = CNN(..., restore_file='snapshot-model1-10000',..........)
cnn2 = CNN(..., restore_file='snapshot-model2-10000',..........)
I encounter the error [The error message is long. I just copied/pasted a snippet of it.]
NotFoundError: Tensor name "Variable_26/Adam_1" not found in checkpoint files /home/amitkrkc/codes/A549_models/snapshot-hela-95000
[[Node: save_1/restore_slice_85 = RestoreSlice[dt=DT_FLOAT, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save_1/Const_0, save_1/restore_slice_85/tensor_name, save_1/restore_slice_85/shape_and_slice)]]
Is there a way to load from these two files two separate CNNs? Any suggestion/comment/feedback is welcome.
Thank you,
Yes there is. Use separate graphs.
g1 = tf.Graph()
g2 = tf.Graph()
with g1.as_default():
cnn1 = CNN(..., restore_file='snapshot-model1-10000',..........)
with g2.as_default():
cnn2 = CNN(..., restore_file='snapshot-model2-10000',..........)
EDIT:
If you want them in the same graph, you'll have to rename some variables. One idea is to have each CNN in a separate scope and let the saver handle the variables in that scope, e.g.:

saver = tf.train.Saver(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='model1'))  # note: scope is an argument of get_collection, not of Saver

and in the CNN, wrap all of your construction in that scope:

with tf.variable_scope('model1'):
    ...
EDIT2:
Another idea is to rename the variables that the saver manages (since I assume you want to use your saved checkpoints without retraining everything). The Saver allows different variable names in the graph and in the checkpoint; have a look at the documentation for its initialization.
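As a minimal TF1-style sketch of that name-mapping form of tf.train.Saver (the variable name, checkpoint path, and sess here are assumptions): the dict keys are the names as stored in the checkpoint, and the values are the variables in the current graph.

# The current graph builds the variable under a scope...
with tf.variable_scope('model1'):
    weights = tf.get_variable('Weights', shape=[64])

# ...but the checkpoint stored it without the scope prefix, so map the
# checkpoint name to the graph variable explicitly.
saver = tf.train.Saver({'Weights': weights})
saver.restore(sess, 'snapshot-model1-10000')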
This should be a comment to the most up-voted answer, but I do not have enough reputation to do that.
Anyway.
If you (or anyone who searched and got to this point) are still having trouble with the solution provided by lpp and you are using Keras, check the following quote from GitHub:
This is because Keras shares a global session if no default TF session is provided.
When model1 is created, it is on graph1.
When model1 loads weights, the weights are on the Keras global session, which is associated with graph1.
When model2 is created, it is on graph2.
When model2 loads weights, the global session does not know about graph2.
A solution below may help,
# TF1-style imports, matching the snippet below
from tensorflow import Graph, Session
from keras.models import model_from_json

graph1 = Graph()
with graph1.as_default():
    session1 = Session()
    with session1.as_default():
        with open('model1_arch.json') as arch_file:
            model1 = model_from_json(arch_file.read())
        model1.load_weights('model1_weights.h5')
        # K.get_session() is session1

# do the same for graph2, session2, model2
You need to create 2 sessions and restore the 2 models separately. In order for this to work you need to do the following:
1a. When you're saving the models you need to add scopes to the variable names. That way you will know which variables belong to which model:
# The first model
tf.Variable(tf.zeros([self.batch_size]), name="model_1/Weights")
...
# The second model
tf.Variable(tf.zeros([self.batch_size]), name="model_2/Weights")
...
1b. Alternatively, if you already saved the models you can rename the variables by adding scope with this script.
2. When you restore the different models, you need to filter by variable name like this:
# The first model
sess_1 = tf.Session()
sess_1.run(tf.initialize_all_variables())
saver_1 = tf.train.Saver([v for v in tf.all_variables() if 'model_1' in v.name])
saver_1.restore(sess_1, weights_1_file)
sess_1.run(pred, feed_dict={image: X})

# The second model
sess_2 = tf.Session()
sess_2.run(tf.initialize_all_variables())
saver_2 = tf.train.Saver([v for v in tf.all_variables() if 'model_2' in v.name])
saver_2.restore(sess_2, weights_2_file)
sess_2.run(pred, feed_dict={image: X})
I encountered the same problem and could not solve it (without retraining) with any solution I found on the internet. So what I did was load each model in two separate threads that communicate with the main thread. It is simple enough to write the code; you just have to be careful when you synchronize the threads.
In my case, each thread received the input for its problem and returned its output to the main thread. It works without any observable overhead.
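A rough sketch of that arrangement (the names are hypothetical and the model-loading details are elided): each worker thread owns its own graph and session and talks to the main thread through queues.

import threading
import queue
import tensorflow as tf

def model_worker(restore_file, in_q, out_q):
    # Each worker gets its own graph so the two checkpoints never collide.
    with tf.Graph().as_default():
        cnn = CNN(restore_file=restore_file)  # the CNN class from the question
        while True:
            X = in_q.get()
            if X is None:  # sentinel: shut the worker down
                break
            out_q.put(cnn.predict(X))

in_q1, out_q1 = queue.Queue(), queue.Queue()
t1 = threading.Thread(target=model_worker,
                      args=('snapshot-model1-10000', in_q1, out_q1),
                      daemon=True)
t1.start()
# ...start a second worker the same way for 'snapshot-model2-10000'...

in_q1.put(X_batch)         # main thread: send input...
prediction = out_q1.get()  # ...and block on the result
in_q1.put(None)            # stop the worker when done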
One way is to clear your session if you want to train or load multiple models in succession. You can easily do this using
from keras import backend as K

# load and use model 1
K.clear_session()

# load and use model 2
K.clear_session()
K.clear_session() destroys the current TF graph and creates a new one.
Useful to avoid clutter from old models / layers.