Transfer Learning with TensorFlow (MobileNet) - tensorflow

In transfer learning, I think my model.fit_generator goes into an infinite loop, and I don't know why. Here is my Colab notebook link: https://colab.research.google.com/drive/1o9GNCQdMeh4HZdiZ5QAjiDDkixn-OsXx
Here is the image of model.fit_generator

If you update the last line as follows, it takes around 40 seconds for 5 epochs.
from
model.fit_generator(train_generator, epochs=5, validation_data=valid_generator)
to
model.fit_generator(train_generator, epochs=5, steps_per_epoch=len(train_generator), validation_data=valid_generator, validation_steps=len(valid_generator))
Please check the description below of what is expected when the input to model.fit is a generator. When steps_per_epoch is None, the epoch will run until the input dataset is exhausted, so the epoch never ends if your dataset is an infinitely repeating one, which is what a generator behaves like here.
steps_per_epoch: Integer or None. Total number of steps (batches of
samples) before declaring one epoch finished and starting the next
epoch. When training with input tensors such as TensorFlow data
tensors, the default None is equal to the number of samples in your
dataset divided by the batch size, or 1 if that cannot be determined.
If x is a tf.data dataset, and 'steps_per_epoch' is None, the epoch
will run until the input dataset is exhausted. When passing an
infinitely repeating dataset, you must specify the steps_per_epoch
argument. This argument is not supported with array inputs.
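As a side note on why passing len(train_generator) works: a Keras directory iterator yields batches forever when you loop over it, but it still defines a length of ceil(samples / batch_size), i.e. exactly one pass over the data. A minimal sketch (the directory path is a placeholder):
import math
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Placeholder directory; substitute your own dataset path.
train_generator = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    'data/train', target_size=(224, 224), batch_size=32)

# One epoch = one full pass over the samples found on disk.
assert len(train_generator) == math.ceil(train_generator.samples / 32)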

Related

What to do if I don't want to waste samples by flooring steps_per_epoch in model.fit?

I have a model that has 536 training samples and would like to run through all the samples per epoch. The batch size is 32 and the number of epochs is 50. Here is the code and its error:
results = model.fit(train_X, train_y, batch_size = 32, epochs = 50, validation_data=(val_X, val_y), callbacks=callbacks)
The dataset you passed contains 837 batches, but you passed epochs=50 and steps_per_epoch=17, which is a total of 850 steps. We cannot draw that many steps from this dataset. We suggest to set steps_per_epoch=16.
Total number of samples / batch size = steps per epoch = 536 / 32 = 16.75. model.fit would work if I set steps_per_epoch = 16. Doesn't this mean I'm discarding 24 samples (0.75 * 32) per epoch?
If yes, how can I avoid discarding these samples? One way would be adjusting the batch size so that there is no remainder when dividing the number of samples by it.
If there are other ways, please enlighten me.
If you do not want to discard any training samples, it is better not to set steps_per_epoch once you have defined the batch_size, because the model can calculate steps_per_epoch itself from the provided training samples and the defined batch_size.
Please go through the attached gist, where I have tried to explain these terms in more detail.
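As an illustration, here is a minimal sketch (the data and model are made up to mirror the question's 536 samples and batch size 32): with array inputs and no steps_per_epoch, Keras runs 17 steps per epoch and the last step simply uses the remaining 24 samples, so nothing is discarded.
import numpy as np
import tensorflow as tf

# Hypothetical data shaped like the question: 536 samples, batch size 32.
train_X = np.random.rand(536, 10).astype('float32')
train_y = np.random.randint(0, 2, size=(536, 1)).astype('float32')

model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation='sigmoid')])
model.compile(optimizer='adam', loss='binary_crossentropy')

# 17 steps per epoch: 16 full batches of 32 plus one final batch of 24 samples.
model.fit(train_X, train_y, batch_size=32, epochs=2)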

How to only iterate through the first 20 batches per epoch using Keras

I would like to fit a Keras model, but instead of running through all the batches in each epoch, I want to shuffle the whole training dataset (at the start of each epoch) and then train only on the first 20 batches.
I have tried using steps_per_epoch but that restarts the next epoch where the previous one left off. What I want to do is, for every epoch:
shuffle the training dataset
split into batches of size 100
train on the first 20 batches
Can I achieve this using Keras? I suspect I may need to use TensorFlow directly rather than Keras, but I'm not sure how to go about doing that.
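One possible way to get this behaviour (a sketch only, with made-up arrays; Keras can consume such a dataset directly) is a tf.data pipeline that reshuffles on every pass and is truncated to 20 batches:
import numpy as np
import tensorflow as tf

# Hypothetical training arrays; replace with your own data.
train_X = np.random.rand(5000, 32).astype('float32')
train_y = np.random.randint(0, 2, size=(5000, 1)).astype('float32')

dataset = (tf.data.Dataset.from_tensor_slices((train_X, train_y))
           .shuffle(len(train_X), reshuffle_each_iteration=True)  # reshuffle at the start of each epoch
           .batch(100)                                            # batches of size 100
           .take(20))                                             # keep only the first 20 batches

model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation='sigmoid')])
model.compile(optimizer='adam', loss='binary_crossentropy')

# Each epoch: fresh shuffle, then exactly 20 batches.
model.fit(dataset, epochs=10)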

Unknown number of steps - Training a convolutional neural network on Google Colab Pro

I am trying to train my CNN on Google Colab Pro. When I run my code, everything seems all right, but it does not know the number of steps, so an infinite loop is created.
Mounted at /content/drive
2.2.0-rc3
Found 10018 images belonging to 2 classes.
Found 1336 images belonging to 2 classes.
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
Epoch 1/300
8/Unknown - 364s 45s/step - loss: 54.9278 - accuracy: 0.5410
I am using ImageDataGenerator() for loading images. How can I fix it?
An iterator does not store anything; it generates the data dynamically, so its length is unknown until you iterate through it. When you are using a dataset or dataset iterator, you must therefore provide steps_per_epoch yourself, for example by explicitly passing something like len(datafiles) to the fit function. So you need to provide steps_per_epoch as shown below.
model.fit_generator(
    train_data_gen,
    steps_per_epoch=total_train // batch_size,
    epochs=epochs,
    validation_data=val_data_gen,
    validation_steps=total_val // batch_size
)
More details are mentioned here
steps_per_epoch: Integer or None. Total number of steps (batches of
samples) before declaring one epoch finished and starting the next
epoch. When training with input tensors such as TensorFlow data
tensors, the default None is equal to the number of samples in your
dataset divided by the batch size, or 1 if that cannot be determined.
If x is a tf.data dataset, and 'steps_per_epoch' is None, the epoch
will run until the input dataset is exhausted. This argument is not
supported with array inputs.
I notice you are using binary classification. One more thing to remember when you use ImageDataGenerator is to set class_mode as shown below; otherwise you may hit a bug (in keras) or get accuracy stuck at 50% (in tf.keras).
train_data_gen = train_image_generator.flow_from_directory(batch_size=batch_size,
                                                            directory=train_dir,
                                                            shuffle=True,
                                                            target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                            class_mode='binary')
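If you do not keep total_train and total_val counters yourself, note that the counts reported by flow_from_directory ("Found 10018 images ...") are also stored on the generators, so the steps can be taken from them directly; a sketch reusing the names from the answer above:
model.fit_generator(
    train_data_gen,
    steps_per_epoch=len(train_data_gen),      # == ceil(train_data_gen.samples / batch_size)
    epochs=epochs,
    validation_data=val_data_gen,
    validation_steps=len(val_data_gen))       # == ceil(val_data_gen.samples / batch_size)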

TensorBoard Callback in Keras does not respect initial_epoch of fit?

I'm trying to train multiple models in parallel on a single graphics card. To achieve that, I need to resume training of models from saved weights, which is not a problem. The model.fit() method even has a parameter initial_epoch that lets me tell the model which epoch the loaded model is on. However, when I pass a TensorBoard callback to the fit() method in order to monitor the training of the models, all data is shown at x=0 on TensorBoard.
Is there a way to overcome this and adjust the epoch on TensorBoard?
By the way: I'm running Keras 2.0.6 and TensorFlow 1.3.0.
self.callbacks = [TensorBoardCallback(log_dir='./../logs/'+self.model_name, histogram_freq=0, write_graph=True, write_images=False, start_epoch=self.step_num)]
self.model.fit(x=self.data['X_train'], y=self.data['y_train'], batch_size=self.input_params[-1]['batch_size'], epochs=1, validation_data=(self.data['X_test'], self.data['y_test']), verbose=verbose, callbacks=self.callbacks, shuffle=self.hyperparameters['shuffle_data'], initial_epoch=self.step_num)
self.model.save_weights('./weights/%s.hdf5'%(self.model_name))
self.model.load_weights('./weights/%s.hdf5'%(self.model_name))
self.model.fit(x=self.data['X_train'], y=self.data['y_train'], batch_size=self.input_params[-1]['batch_size'], epochs=1, validation_data=(self.data['X_test'], self.data['y_test']), verbose=verbose, callbacks=self.callbacks, shuffle=self.hyperparameters['shuffle_data'], initial_epoch=self.step_num)
self.model.save_weights('./weights/%s.hdf5'%(self.model_name))
The resulting graph on TensorBoard looks like this, which is not what I was hoping for:
Update:
When passing epochs=10 to the first model.fit(), the results for all 10 epochs are displayed in TensorBoard (see picture).
However, when reloading the model and running it (with the same callback attached), the on_epoch_end method of the callback never gets called.
Turns out that when I pass the number of epochs to model.fit() to tell it how long to train, it has to be counted FROM the specified initial_epoch. So if initial_epoch=self.step_num, then epochs=self.step_num+10 if I want to train for 10 more epochs.
Say we have just started fitting our model and our first run's epoch count is 30
(please ignore the other parameters; just look at epochs and initial_epoch):
model.fit(train_dataloader,validation_data = test_dataloader,epochs =30,steps_per_epoch = len(train_dataloader),callbacks = callback_list)
Now say, after 30 epochs, we want to start again from the 31st epoch (you can see this in TensorBoard) after changing the learning rate of our Adam optimizer (or any other optimizer).
So what we can do is:
model.optimizer.learning_rate = 0.0005
model.fit(train_dataloader,validation_data = test_dataloader,initial_epoch=30,epochs =55,steps_per_epoch = len(train_dataloader),callbacks = callback_list)
So here, initial_epoch is where we left off training last time, and epochs is initial_epoch plus the number of epochs we want to run in this second fit.
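Putting this together, here is a minimal resume sketch (model, train_dataloader, test_dataloader and the paths are placeholders assumed from the thread above): keep one TensorBoard callback pointed at the same log_dir across both runs, and count epochs from initial_epoch.
from tensorflow.keras.callbacks import TensorBoard

tb = TensorBoard(log_dir='./logs/my_model')   # same log_dir for both runs

# First run: epochs 0..29
model.fit(train_dataloader, validation_data=test_dataloader, epochs=30, callbacks=[tb])
model.save_weights('./weights/my_model.hdf5')

# Later: resume at epoch 30 and train up to epoch 55 (epochs is the absolute end point)
model.load_weights('./weights/my_model.hdf5')
model.fit(train_dataloader, validation_data=test_dataloader,
          initial_epoch=30, epochs=55, callbacks=[tb])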

Keras: why does the entire epoch take longer when it shows all batches are complete?

I am trying to train a VGG16 model and have 4k training samples and 2k validation samples.
In the above image, even though the first 134 batches complete very fast, the last batch waits for a long time and only finishes after ~8 minutes, which I think is too long. Is there something wrong with what I am doing? I am using the following to start the training process:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',metrics=['accuracy'])
model.fit_generator(train_generator, samples_per_epoch=4320, epochs=50, validation_data=validation_generator, nb_val_samples=2880)
When an epoch finishes, since you are providing validation data, Keras has to evaluate your model on the validation set; that is what takes 8 minutes in your case.
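If that post-epoch validation pass is too costly, one option (a sketch, not from the original answer; the argument names are from the newer tf.keras Model.fit API, and the step counts assume batch size 32 as implied by the 135 training batches) is to validate on fewer batches and/or less often:
model.fit(train_generator,
          steps_per_epoch=135,              # 4320 samples / batch size 32
          epochs=50,
          validation_data=validation_generator,
          validation_steps=20,              # evaluate on only 20 validation batches
          validation_freq=2)                # ...and only every 2nd epoch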