Tensorboard training data visualization starting at 1000 - tensorflow

I am training my model for 5000 timesteps and it should start the training from timestep 0, but it is starting from timestep 1000 instead. How can I fix this?
Thanks!
The training should start from zero, not 1000 timesteps.


Validation loss (mae) decreasing but validation metric (mae) not decreasing

Issue that I am facing: the training MAE loss, training MAE error (metric), and validation MAE loss are decreasing, but the validation MAE error fluctuates and its overall trend is not decreasing.
Model description: the model consists of two networks (a TimeDistributed CNN and TimeDistributed Dense layers, respectively applied to each temporal slice of the input sample) whose outputs are merged and then fed to the first LSTM layer as a sequence in time. The output of this LSTM goes to a second LSTM layer, which produces the final output. A rough sketch of this layout is given after the example below.
Model loss and metric:
tensorflow model loss in model.compile method = MAE
tensorflow model metric in model.compile method = MAE
Model output: forecasts of the target label at 6 future timestamps spaced 10 minutes apart (i.e. 6 values covering the next hour).
Model input: data for the preceding hour. Images go to the CNN branch; tabular features specific to each image's timestamp go to the Dense branch. The interval between data points, for both the images and their corresponding tabular features, is 10 minutes.
Training size: 76,600 samples (from year 2018 to 2021)
Validation size: 13,500 samples (year 2022)
Regularization used: L2 and Dropout in CNN & dense branches and LSTM.
Learning rate used: 2e-4, decayed by a factor of 0.4 every 5 epochs down to 1e-5.
Epochs: 20
e.g. input data from 9:00am to 9:50am will consist of 6 timestamps at 10-minute intervals. At each of these timestamps we will have an image and the corresponding tabular features. The output will be 6 values of the target label, at 10:00am, 10:10am, 10:20am, 10:30am, 10:40am and 10:50am.
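For reference, a minimal Keras sketch of this layout follows; the layer sizes, image resolution, number of tabular features, and LSTM widths are assumptions chosen for illustration, not the values actually used:

import tensorflow as tf
from tensorflow.keras import layers, models

TIMESTEPS = 6              # one hour of data at 10-minute intervals
IMG_SHAPE = (64, 64, 3)    # assumed image size
N_TABULAR = 8              # assumed number of tabular features
HORIZON = 6                # forecast 6 future values

# CNN branch, applied to each of the 6 images via TimeDistributed
cnn = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", kernel_regularizer="l2"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu", kernel_regularizer="l2"),
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
])

img_in = layers.Input(shape=(TIMESTEPS, *IMG_SHAPE))
tab_in = layers.Input(shape=(TIMESTEPS, N_TABULAR))

img_feat = layers.TimeDistributed(cnn)(img_in)
tab_feat = layers.TimeDistributed(
    layers.Dense(32, activation="relu", kernel_regularizer="l2"))(tab_in)

# Merge the two branches per timestep, then run the sequence through two LSTMs
merged = layers.Concatenate()([img_feat, tab_feat])
x = layers.LSTM(64, return_sequences=True, dropout=0.3)(merged)
x = layers.LSTM(64, dropout=0.3)(x)
out = layers.Dense(HORIZON)(x)   # 6 forecasts at 10-minute intervals

model = models.Model([img_in, tab_in], out)
model.compile(optimizer=tf.keras.optimizers.Adam(2e-4), loss="mae", metrics=["mae"])

# Step decay: multiply the learning rate by 0.4 every 5 epochs, floored at 1e-5
def schedule(epoch, lr):
    return max(lr * 0.4, 1e-5) if epoch > 0 and epoch % 5 == 0 else lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule)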
What have I tried: I have played around with the learning rate and the number of layers in the model, which helped me decrease everything except the validation MAE error. If I apply more regularization, then the training MAE loss, training MAE error, and validation MAE loss do not decrease as much as they do with little regularization.
I am not able to understand what it would mean if everything except the validation MAE is decreasing.
Also, if it is important to know: the training MAE loss is decreasing faster than the validation MAE loss.
Pasting links to the relevant images (Stack Overflow does not allow me to post inline images):
The graph of train vs validation metric (MAE)
The graph of train vs validation loss (MAE)
Thank you for all the help in advance.

how does batch size work in TimeDistributed

I'm a beginner in AI and am trying to implement a CRNN model in Keras.
model.add(TimeDistributed(base_model, input_shape=(3,32,32,3)))
I understand that the above code creates 3 timesteps, each taking a 32x32 RGB image.
Then, if I have 90 training images and set the batch size to 30, how does it work? Are the images
grouped into batches of 30 and fed into the timesteps,
or
fed into the timesteps one at a time, in order,
or am I misunderstanding batch size?
If you have 90 images and want a batch size of 30, the input_shape stays (3, 32, 32, 3), i.e. (timesteps, height, width, channels); the batch dimension is never part of input_shape. Your training array as a whole would have shape (90, 3, 32, 32, 3), and Keras slices it along the first axis into batches of 30. Source: the docs https://keras.io/api/layers/recurrent_layers/time_distributed/
Batch size is the number of samples used in one iteration of learning, before your CRNN updates its internal parameters.
As your screenshots show, you trained your model for one epoch made up of 3 steps (batches): an epoch is one pass through the entire dataset, and 30 times 3 makes 90, the whole dataset.
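As a rough illustration (the base_model below is a stand-in CNN, since the actual base_model is not shown in the question):

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# stand-in for base_model: a tiny CNN over one 32x32 RGB frame
base_model = models.Sequential([
    layers.Conv2D(8, 3, activation="relu", input_shape=(32, 32, 3)),
    layers.GlobalAveragePooling2D(),
])

model = models.Sequential([
    # input_shape = (timesteps, height, width, channels); no batch dimension here
    layers.TimeDistributed(base_model, input_shape=(3, 32, 32, 3)),
    layers.LSTM(16),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# 90 training samples, each a sequence of 3 RGB 32x32 frames
x_train = np.random.rand(90, 3, 32, 32, 3).astype("float32")
y_train = np.random.rand(90, 1).astype("float32")

# batch_size=30 -> Keras slices the 90 samples into 3 batches of 30,
# so one epoch = 3 gradient-update steps
model.fit(x_train, y_train, batch_size=30, epochs=1)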

Is the loss printed by tensorflow a batch/sample wise loss or is it a running average loss?

When I train a TensorFlow model, it usually prints information similar to the below line at each iteration
INFO:tensorflow:loss = 1.9433185, step = 11 (0.300 sec)
Is the printed loss the loss of the batch the model has just seen, or is it the running average loss over all the previous batches of the training?
If I use a batch size of 1, i.e. only one training sample in each batch, will the printed loss be that of every sample separately, or will it be a running average loss?
The loss reported in the progress bar of Keras/TensorFlow is always a running mean of the batches seen so far; it is not a per-batch value.
I do not think there is a way to see the per-batch values during training.
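A small, self-contained illustration of that running-mean behaviour (toy model and random data, just to show what a callback sees at each batch):

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="sgd", loss="mse")

class PrintRunningLoss(tf.keras.callbacks.Callback):
    def on_train_batch_end(self, batch, logs=None):
        # logs["loss"] is the cumulative mean over the epoch so far,
        # not the loss of this batch alone
        print(f"batch {batch}: running mean loss = {logs['loss']:.4f}")

x = np.random.rand(8, 4)
y = np.random.rand(8, 1)
model.fit(x, y, batch_size=1, epochs=1, verbose=0, callbacks=[PrintRunningLoss()])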

Tensorflow accuracy spikes in every epoch

I'm running TensorFlow on a GPU for training. I have a 1-layer GRU cell, a batch size of 800, and I train for 10 epochs. I see these spikes in the accuracy graph from TensorBoard and I do not understand why. See the image.
If you count the spikes, there are 10 of them, the same as the number of epochs. I tried this with different configurations, reducing the batch size and increasing the number of layers, but the spikes are still there.
You can find the code here if it helps.
I use tf.RandomShuffleQueue for the data with infinite epochs, and I calculate how many steps it should run. I do not think the problem is in how I calculate the accuracy (here). Do you have any suggestions as to why this happens?
EDIT
min_after_dequeue=2000

Does the training step control the number of iterations in a convolutional neural network?

Using a convolutional neural network, I have to train on 100,000 samples with a batch size of 100, whereas the training step is 4000. If I pass 100 samples the first time, it is considered one iteration. I want to run the code for 10,000 iterations. If I set the training step to 1000, does that mean I complete 10,000 iterations?
The answer depends on how you are training your model, but most setups increment the global step once per session.run call on your training op, so setting the training step to X and the batch size to Y means you do X steps, each over Y examples, for a total of X*Y examples processed.
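A rough TF1-style sketch of that relationship, using placeholder toy data and a trivial model rather than anything from the question:

import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# toy data: 100,000 samples, as in the question
data_x = np.random.rand(100000, 10).astype("float32")
data_y = np.random.rand(100000, 1).astype("float32")

x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.zeros([10, 1]))
b = tf.Variable(tf.zeros([1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, W) + b - y))

global_step = tf.train.get_or_create_global_step()
# minimize() increments global_step once per session.run of train_op
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss, global_step=global_step)

TRAIN_STEPS = 4000   # X iterations
BATCH_SIZE = 100     # Y samples per iteration

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(TRAIN_STEPS):
        start = (step * BATCH_SIZE) % len(data_x)
        feed = {x: data_x[start:start + BATCH_SIZE],
                y: data_y[start:start + BATCH_SIZE]}
        sess.run(train_op, feed_dict=feed)
    # global_step is now 4000; total examples processed = 4000 * 100 = 400,000
    print(sess.run(global_step))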