Does the training step control the number of iterations in a convolutional neural network? - tensorflow

Using a convolutional neural network, I have to train on 100,000 samples with a batch size of 100, whereas the training step is 4000. If I pass 100 samples through the network the first time, that counts as one iteration. I want to run the code for 10,000 iterations. If I set the training step to 1000, does that mean I complete 10,000 iterations?

The answer depends on how you are training your model, but most setups increment the global step once per session.run call on your training op. So setting the training step to X and the batch size to Y means you perform X steps, each over Y examples, for a total of X*Y examples processed.
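To make that concrete, here is a minimal TF1-style sketch (the toy variable and loss are assumptions for illustration, not from the question): the global step advances by one per run of the training op, independent of the batch size.

import tensorflow as tf

# Toy variable and loss, purely for illustration.
x = tf.Variable(5.0)
loss = tf.square(x)

# minimize() increments global_step each time train_op is run.
global_step = tf.train.get_or_create_global_step()
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        _, step = sess.run([train_op, global_step])
        print("global step:", step)  # prints 1, 2, 3 -- one increment per session.run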

Related

Tensorboard training data visualization starting at 1000

I am training my model for 5000 timesteps. The training should start from timestep 0, but in TensorBoard it starts from 1000 timesteps instead. How can I fix this?
Thanks!
The training should start from zero, not 1000 timesteps.

Is the loss printed by tensorflow a batch/sample wise loss or is it a running average loss?

When I train a TensorFlow model, it usually prints information similar to the line below at each iteration:
INFO:tensorflow:loss = 1.9433185, step = 11 (0.300 sec)
Is the loss being printed the loss of the batch that the model saw currently, or is it the running average loss over all the previous batches of the training?
If I use a batch size of 1, i.e. only one training sample in each batch, will the loss printed be for each sample separately, or will it be a running average loss?
The loss reported in the progress bar of Keras/TensorFlow is always a running mean of the batches seen so far; it is not a per-batch value.
I do not think there is a way to see the per-batch values during training.
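As an illustration of what that running mean looks like, here is a small sketch (the per-batch loss values are made up, and this is not Keras's internal code): the value shown after each step is the average of all batch losses seen so far, not the loss of the latest batch.

# Hypothetical per-batch losses, purely for illustration.
batch_losses = [1.9, 1.3, 1.1, 1.0]

running_total = 0.0
for step, batch_loss in enumerate(batch_losses, start=1):
    running_total += batch_loss
    print(f"step {step}: running mean loss = {running_total / step:.4f}")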

Running training the discriminator with more examples

As I understand it, one of the differences between a regular GAN and a WGAN is that we train the discriminator/critic on more examples in each epoch. If in a regular GAN we use one batch for both modules per epoch, in a WGAN we use 5 batches (or more) for the discriminator and one for the generator.
So basically we have another inner loop for the discriminator:
real_images_labels = np.ones((BATCH_SIZE, 1))
fake_images_labels = -real_images_labels
for epoch in range(epochs):
    for batch in range(NUM_BACHES):
        for critic_iter in range(n_critic):
            random_batches_idx = np.random.randint(0, NUM_BACHES)  # Choose random batch from dataset
            imgs_data = dataset_list[random_batches_idx]
            c_loss_real = critic.train_on_batch(imgs_data, real_images_labels)  # update the weights after 1 batch
            noise = tf.random.normal([imgs_data.shape[0], noise_dim])  # Generate noise data
            generated_images = generator(noise, training=True)
            c_loss_fake = critic.train_on_batch(generated_images, fake_images_labels)  # update the weights after 1 batch
        imgs_data = dataset_list[batch]
        noise = tf.random.normal([imgs_data.shape[0], noise_dim])  # Generate noise data
        gen_loss_batch = gen_loss_batch + gan.train_on_batch(noise, real_images_labels)
The training is taking a lot of time, about 3 minutes per epoch. My idea for decreasing the training time is that, instead of running a forward pass for each batch n_critic times, I could increase the batch_size for the discriminator and run forward once with that bigger batch_size.
I am seeking feedback: does this sound reasonable?
(I didn't paste my entire code, this is only part of it.)
Yes, it does sound reasonable. Increasing batch_size during training typically decreases the training time, at the cost of using more memory and possibly lower accuracy (lower generalization ability).
Having said this, you should always do trial and error with regard to batching, as extreme values may or may not increase the training time.
For further discussion, you can refer to this question.
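For reference, here is a minimal sketch of the proposed change, reusing the names from the question (critic, generator, dataset_list, NUM_BACHES, noise_dim, n_critic); it replaces the n_critic separate critic updates with a single update on a batch that is n_critic times larger. It is a sketch of the idea, not a drop-in replacement for the full training loop.

# Draw n_critic random batches and stack them into one large batch.
big_idx = np.random.randint(0, NUM_BACHES, size=n_critic)
big_real = np.concatenate([dataset_list[i] for i in big_idx], axis=0)
big_real_labels = np.ones((big_real.shape[0], 1))

# One critic update on the large real batch...
c_loss_real = critic.train_on_batch(big_real, big_real_labels)

# ...and one on an equally large fake batch.
noise = tf.random.normal([big_real.shape[0], noise_dim])
big_fake = generator(noise, training=True)
c_loss_fake = critic.train_on_batch(big_fake, -big_real_labels)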

What does steps mean in the train method of tf.estimator.Estimator?

I'm completely confused about the meaning of epochs and steps. I also read the question What is the difference between steps and epochs in TensorFlow?, but I'm not sure about the answer. Consider this part of the code:
EVAL_EVERY_N_STEPS = 100
MAX_STEPS = 10000

nn = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir=args.model_path,
    params={"learning_rate": 0.001},
    config=tf.estimator.RunConfig())

for _ in range(MAX_STEPS // EVAL_EVERY_N_STEPS):
    print(_)
    nn.train(input_fn=train_input_fn,
             hooks=[train_qinit_hook, step_cnt_hook],
             steps=EVAL_EVERY_N_STEPS)
    if args.run_validation:
        results_val = nn.evaluate(input_fn=val_input_fn,
                                  hooks=[val_qinit_hook, val_summary_hook],
                                  steps=EVAL_STEPS)
        print('Step = {}; val loss = {:.5f};'.format(
            results_val['global_step'], results_val['loss']))
Also, the number of training samples is 400. I consider MAX_STEPS // EVAL_EVERY_N_STEPS to be the number of epochs (or iterations); indeed, that number is 100. What does steps mean in nn.train?
In Deep Learning:
an epoch means one pass over the entire training set.
a step or iteration corresponds to one forward pass and one backward pass.
If your dataset is not divided and passed as is to your algorithm, each step corresponds to one epoch, but usually, a training set is divided into N mini-batches. Then, each step goes through one batch and you need N steps to complete a full epoch.
Here, if batch_size == 4 then 100 steps are indeed equal to one epoch.
epochs = batch_size * steps // n_training_samples
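Plugging in the numbers from the question (batch_size of 4 is the assumption made above; steps and the sample count come from the code):

batch_size = 4
steps = 10000                 # MAX_STEPS
n_training_samples = 400

epochs = batch_size * steps // n_training_samples
print(epochs)                 # 100, matching the "number of epochs is 100" in the question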

Difference of training steps or complete run through

On tensorflow.org, in the beginner MNIST tutorial they train with 1000 steps of 100 examples each, which is more than the training set, which only includes 55,000 points. In the expert MNIST tutorial they train with 20,000 steps of 50 examples.
I think the training is split into steps so that at every training step one can print the loss and/or accuracy obtained so far, without having to wait until the end of processing.
But could one also simply pipe all examples through the train_operation in one step and then look at the outcome, or is that not possible?
Training on the whole dataset at each iteration is called batch gradient descent. Training on minibatches (e.g. 100 samples at a time) is called stochastic gradient descent. You can read more about the two and the reasons for choosing larger or smaller batch sizes in this question on Cross Validated.
Batch gradient descent typically isn't feasible because it requires too much RAM. Each iteration will also take significantly longer and the tradeoff often isn't worth it even if you have the computational resources.
That said, the batch size is a hyperparameter that you can play around with to find a value that works well.
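As a small illustration of the difference, here is a sketch using tf.data (the random arrays stand in for the MNIST data and are assumptions, not the tutorial's code); the only change between the two regimes is the batch size passed to .batch().

import numpy as np
import tensorflow as tf

# Dummy stand-ins for the 55,000 MNIST training points (hypothetical shapes).
features = np.random.rand(55000, 784).astype("float32")
labels = np.random.randint(0, 10, size=(55000,))

dataset = tf.data.Dataset.from_tensor_slices((features, labels))

full_batch = dataset.batch(55000)  # batch gradient descent: 1 step per epoch
minibatch = dataset.batch(100)     # minibatch SGD: 550 steps per epoch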