TensorFlow model.fit use of steps_per_epoch

model.fit(x,y,steps_per_epoch=1,epochs=n_iters,batch_size=batch_size,shuffle=False)
If I use model.fit to train a model and set steps_per_epoch=1 and epochs=n_iters, does that mean that only the first batch of the training data will be used repeatedly?
Or is it the same as setting steps_per_epoch=num_of_batches and epochs=n_iters//num_of_batches?
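One way to check this empirically is a sketch like the following (toy data; all names are illustrative, not from the question). With a repeated tf.data pipeline, recent TF versions do not reset the dataset iterator between epochs, so steps_per_epoch=1 advances through successive batches rather than reusing the first one; with plain NumPy arrays, recent versions reject steps_per_epoch altogether (see the documentation quoted further down this page).

import numpy as np
import tensorflow as tf

# Toy data where each sample's value identifies it.
x = np.arange(8, dtype=np.float32).reshape(8, 1)
y = np.zeros((8, 1), dtype=np.float32)

def log_batch(bx, by):
    tf.print('training on samples:', bx[:, 0])  # shows which rows this batch uses
    return bx, by

ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(2).map(log_batch).repeat()

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer='sgd', loss='mse')

# Four 1-step "epochs": the printed batches advance (0,1 / 2,3 / 4,5 / 6,7)
# because the iterator is not reset between epochs, matching the second
# interpretation in the question.
model.fit(ds, steps_per_epoch=1, epochs=4, verbose=0)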

Related

Transfer Learning with Tensorflow (MobileNet)

In transfer learning, I think my model.fit_generator goes into an infinite loop, and I don't know why. Here is my Colab notebook link: https://colab.research.google.com/drive/1o9GNCQdMeh4HZdiZ5QAjiDDkixn-OsXx
If you update the last line as follows, it takes around 40 seconds for 5 epochs.
from
model.fit_generator(train_generator, epochs=5, validation_data=valid_generator)
to
model.fit_generator(train_generator, epochs=5, steps_per_epoch=len(train_generator), validation_data=valid_generator, validation_steps=len(valid_generator))
Please check the documented behavior when the input to model.fit is a generator: when steps_per_epoch is None, the epoch runs until the input dataset is exhausted, so the generator runs forever if your dataset repeats infinitely.
steps_per_epoch: Integer or None. Total number of steps (batches of
samples) before declaring one epoch finished and starting the next
epoch. When training with input tensors such as TensorFlow data
tensors, the default None is equal to the number of samples in your
dataset divided by the batch size, or 1 if that cannot be determined.
If x is a tf.data dataset, and 'steps_per_epoch' is None, the epoch
will run until the input dataset is exhausted. When passing an
infinitely repeating dataset, you must specify the steps_per_epoch
argument. This argument is not supported with array inputs.
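To illustrate the quoted behavior, here is a minimal sketch (toy data, illustrative names): a .repeat()ed tf.data dataset never signals exhaustion, so fit needs steps_per_epoch to know where each epoch ends.

import tensorflow as tf

# Toy data: 100 samples, batch size 10 -> 10 batches per pass.
x = tf.random.normal((100, 4))
y = tf.random.uniform((100, 1))

ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(10).repeat()  # infinite

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')

# Without steps_per_epoch the first epoch would never end on this
# infinitely repeating dataset; with it, each epoch is exactly 10 batches.
model.fit(ds, epochs=5, steps_per_epoch=10)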

Unknown number of steps - Training a convolutional neural network on Google Colab Pro

I am trying to train my CNN on Google Colab Pro. When I run my code everything works, but it does not know the number of steps, so an infinite loop is created.
Mounted at /content/drive
2.2.0-rc3
Found 10018 images belonging to 2 classes.
Found 1336 images belonging to 2 classes.
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
Epoch 1/300
8/Unknown - 364s 45s/step - loss: 54.9278 - accuracy: 0.5410
I am using ImageDataGenerator() for loading images. How can I fix it?
An iterator does not store anything; it generates the data dynamically, so the length of an iterator is unknown until you iterate through it. When you are using a dataset or dataset iterator, you must provide steps_per_epoch yourself (you could, for example, derive it from len(datafiles)). So you need to provide steps_per_epoch as shown below.
model.fit_generator(
    train_data_gen,
    steps_per_epoch=total_train // batch_size,
    epochs=epochs,
    validation_data=val_data_gen,
    validation_steps=total_val // batch_size
)
More details are mentioned in the documentation for steps_per_epoch:
steps_per_epoch: Integer or None. Total number of steps (batches of
samples) before declaring one epoch finished and starting the next
epoch. When training with input tensors such as TensorFlow data
tensors, the default None is equal to the number of samples in your
dataset divided by the batch size, or 1 if that cannot be determined.
If x is a tf.data dataset, and 'steps_per_epoch' is None, the epoch
will run until the input dataset is exhausted. This argument is not
supported with array inputs.
I notice you are using binary classification. One more thing to remember when using ImageDataGenerator is to provide class_mode as shown below; otherwise you will hit a bug (in keras) or get stuck at 50% accuracy (in tf.keras).
train_data_gen = train_image_generator.flow_from_directory(batch_size=batch_size,
                                                           directory=train_dir,
                                                           shuffle=True,
                                                           target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                           class_mode='binary')
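As a side note, these step counts can also be read off the generators themselves: flow_from_directory returns a DirectoryIterator whose .samples attribute is the image count and whose len() is the number of batches per epoch. A sketch based on the snippet above (assuming val_data_gen was created the same way):

total_train = train_data_gen.samples      # 10018 in the log above
steps_per_epoch = len(train_data_gen)     # == ceil(total_train / batch_size)

model.fit_generator(train_data_gen,
                    steps_per_epoch=len(train_data_gen),
                    epochs=epochs,
                    validation_data=val_data_gen,
                    validation_steps=len(val_data_gen))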

How to calculate TF object detection API accuracy over custom dataset?

I am using the TF object detection API to detect objects on a custom dataset, but when it comes to accuracy I have no idea how to calculate it.
How do I calculate the accuracy of the object detection model over a custom dataset, and find the confidence score of the model over the test dataset?
I tried to use eval.py but it is not helpful.
Are you talking about training accuracy, validation accuracy, or test accuracy? As the names suggest, there are three different values for accuracy:
Training accuracy: accuracy of the model on the training set
Validation accuracy: accuracy of the model on the validation set
Test accuracy: accuracy of the model on the test set
Training and validation accuracy are outputs of the training, for the test accuracy you need to run the model on the test set.
Did you retrain the model (from a checkpoint, fine-tuning...) or did you use the model as you got it? If you have retrained the model, you have training and validation accuracy readily available; in fact, you have those values for each epoch.
If you haven't retrained the model, you can only check the test accuracy, given that the test dataset is labelled.
This link helped me to run eval.py and get the mAP value for the training data.
You just need to run it like this (CUDA_VISIBLE_DEVICES="" hides the GPUs, so eval.py runs on the CPU):
CUDA_VISIBLE_DEVICES="" python3 eval.py --logtostderr --pipeline_config_path=pre-trained-model/ssd_inception_v2_coco.config --checkpoint_dir=training/ --eval_dir=eval/
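Once eval.py finishes, the metrics it writes to --eval_dir can be viewed in TensorBoard (assuming the eval/ directory from the command above):

tensorboard --logdir=eval/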

Keras model.fit with validation data - which batch_size is used to evaluate the validation data?

I want to validate my model with validation data inside model.fit:
model.fit(x_train, y_train, batch_size=50, epochs=1, validation_data=(x_test, y_test))
Now, I want to train with batch_size=50, and my validation data x_test has a length of 1000.
As I read from the docs, the validation data is used to evaluate after each epoch. So I assume the model.evaluate method is used? But what batch size is used?
My validation data is larger than the batch_size in the fit method.
How is this handled?
What is the result if just the training batch_size is used but the validation data is larger? Is val_acc averaged over each batch?
I want to validate on all my data in one batch.
Keras uses the same batch_size parameter for both training and validation in model.fit(). See discussion here.
If you intend to evaluate on the entire validation data in one go, you can write a callback and run model.evaluate() on the entire validation data after every epoch.
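A minimal sketch of such a callback (the class name is mine; x_test/y_test come from the question, and the model is assumed to be compiled with an accuracy metric). Passing batch_size=len(x_val) to model.evaluate() makes the whole validation set a single batch:

import tensorflow as tf

class FullValidationCallback(tf.keras.callbacks.Callback):
    """Runs model.evaluate() on the entire validation set after each epoch."""
    def __init__(self, x_val, y_val):
        super().__init__()
        self.x_val = x_val
        self.y_val = y_val

    def on_epoch_end(self, epoch, logs=None):
        # One batch covering all validation samples.
        loss, acc = self.model.evaluate(self.x_val, self.y_val,
                                        batch_size=len(self.x_val), verbose=0)
        print(f'epoch {epoch + 1}: full-batch val_loss={loss:.4f}, val_acc={acc:.4f}')

model.fit(x_train, y_train, batch_size=50, epochs=1,
          callbacks=[FullValidationCallback(x_test, y_test)])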

TensorBoard Callback in Keras does not respect initial_epoch of fit?

I'm trying to train multiple models in parallel on a single graphics card. To achieve that I need to resume training of models from saved weights, which is not a problem. The model.fit() method even has a parameter, initial_epoch, that lets me tell the model which epoch the loaded model is on. However, when I pass a TensorBoard callback to the fit() method in order to monitor the training, all data is shown at x=0 in TensorBoard.
Is there a way to overcome this and adjust the epoch on TensorBoard?
By the way: I'm running Keras 2.0.6 and TensorFlow 1.3.0.
self.callbacks = [TensorBoardCallback(log_dir='./../logs/' + self.model_name,
                                      histogram_freq=0, write_graph=True,
                                      write_images=False,
                                      start_epoch=self.step_num)]
self.model.fit(x=self.data['X_train'], y=self.data['y_train'],
               batch_size=self.input_params[-1]['batch_size'], epochs=1,
               validation_data=(self.data['X_test'], self.data['y_test']),
               verbose=verbose, callbacks=self.callbacks,
               shuffle=self.hyperparameters['shuffle_data'],
               initial_epoch=self.step_num)
self.model.save_weights('./weights/%s.hdf5' % self.model_name)
self.model.load_weights('./weights/%s.hdf5' % self.model_name)
self.model.fit(x=self.data['X_train'], y=self.data['y_train'],
               batch_size=self.input_params[-1]['batch_size'], epochs=1,
               validation_data=(self.data['X_test'], self.data['y_test']),
               verbose=verbose, callbacks=self.callbacks,
               shuffle=self.hyperparameters['shuffle_data'],
               initial_epoch=self.step_num)
self.model.save_weights('./weights/%s.hdf5' % self.model_name)
The resulting graph on TensorBoard shows everything at x=0, which is not what I was hoping for.
Update:
When passing epochs=10 to the first model.fit(), the 10 epoch results are displayed in TensorBoard.
However, when reloading the model and running it (with the same callback attached), the on_epoch_end method of the callback never gets called.
It turns out that the epochs argument I pass to model.fit() to tell it how long to train is treated as the index of the final epoch, not as a number of epochs counted from initial_epoch. So if initial_epoch=self.step_num, then I need epochs=self.step_num + 10 if I want to train for 10 more epochs.
Say we have just started fitting our model and our first run is for 30 epochs
(please ignore the other parameters; just look at epochs and initial_epoch):
model.fit(train_dataloader, validation_data=test_dataloader, epochs=30, steps_per_epoch=len(train_dataloader), callbacks=callback_list)
Now say that after 30 epochs we want to start again from the 31st epoch (you can see this in TensorBoard), changing the learning rate of our Adam optimizer (or any optimizer).
So what we can do is:
model.optimizer.learning_rate = 0.0005
model.fit(train_dataloader, validation_data=test_dataloader, initial_epoch=30, epochs=55, steps_per_epoch=len(train_dataloader), callbacks=callback_list)
So here initial_epoch is where we left off training last time, and epochs is initial_epoch plus the number of epochs we want this second fit to run for.