Do data augmentations by the TensorFlow Object Detection API result in more samples than the original?

So let's say my original raw dataset has 100 images, and I apply the random_horizontal_flip data augmentation, which by default flips horizontally with 50% probability. Just for the sake of example, let's say it flips 50 of the 100 images. So:
1. Does that mean my algorithm will now be trained with 150 images (100 originals and 50 flipped versions), or will it still be trained with 100 images, 50 of which are the flipped versions of the originals?
2. Is the answer to question #1 generalizable to all data augmentation options provided by the TensorFlow Object Detection API?
I read as much official documentation as possible and looked into the preprocessor code, but couldn't find my answer.

The default augmentation probability of 50% is applied independently to each image, every time that image passes through the input pipeline. The number of images your model is trained on depends on how long you train.
Let's say your batch size is 1 and you train for one epoch (100 steps):
Your algorithm will be trained on 100 images, roughly 50 of which will be flipped versions of the originals. In this case the model never sees the original orientation of those 50 images, because the run is too short.
Let's say your batch size is 1 and you train for two epochs (200 steps):
Your algorithm will be trained on 200 images, roughly 100 of which will be flipped versions of the originals.
As a result, as long as the length of the training run is not the limiting factor, a 50% flip probability has the same effect as doubling your dataset by flipping each item.
In addition to the horizontal flip, if you also add the vertical flip (random_vertical_flip), each image can show up in four orientations (original, horizontally flipped, vertically flipped, or both), so you effectively quadruple your dataset.
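A minimal sketch of that behavior, using tf.data and tf.image rather than the Object Detection API's own preprocessor (which wraps the same idea), so the details here are illustrative:

import tensorflow as tf

# Hypothetical stand-in dataset: 100 random 64x64 RGB images.
images = tf.random.uniform((100, 64, 64, 3))
dataset = tf.data.Dataset.from_tensor_slices(images)

def augment(image):
    # The flip decision is re-sampled with 50% probability every time
    # the image passes through the pipeline, so across epochs the model
    # eventually sees both orientations of most images.
    return tf.image.random_flip_left_right(image)

# repeat() re-runs the pipeline (and the random flip) each epoch.
train_ds = dataset.map(augment).repeat().batch(1)

The key point: augmentation happens on the fly per step, so the dataset on disk never grows; only the variety the model sees over time does.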

Related

Poor classification accuracy when varying input image size with convolutional neural network (CNN)

I'm using Keras and TensorFlow to perform image classification, and I obtain very high accuracy with image sets of a fixed size (where all training images have the same dimensions). However, accuracy is very poor when I let image height and width vary by specifying an input_shape of (None, None, 3). It's not that surprising that performance drops when image dimensions vary, but what surprised me is the effect on training time: when all images have the same dimensions, each training epoch takes about 20 minutes, whereas with images of varying sizes each epoch takes less than 5 minutes (everything else, including the GPU and CNN architecture, being the same).
Why would varying the input image size result in such a short training time? This is likely tied to why classification performance decreases so significantly, and I'd like a better understanding of why this is occurring.
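For concreteness, a variable-size Keras model is typically declared along these lines (a minimal sketch, not the asker's actual architecture; GlobalAveragePooling2D is one common way to bridge variable spatial sizes to a fixed-size classifier head):

import tensorflow as tf
from tensorflow.keras import layers, models

# (None, None, 3) leaves height and width unspecified; only the
# channel count is fixed.
model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(None, None, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    # Collapses whatever spatial size arrives into a fixed-length vector.
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

One practical consequence of such a model is that differently sized images cannot share a dense batch, which constrains how the input pipeline can feed data to it.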

How does batch size work in TimeDistributed?

I'm a beginner in AI and I'm trying to implement a CRNN model in Keras.
model.add(TimeDistributed(base_model, input_shape=(3,32,32,3)))
I understand that the above code creates 3 timesteps and uses a 32x32 RGB image.
Then, if I have 90 training images and set the batch size to 30, how does it work?
- Are 30 images grouped together and entered into the timesteps,
- or are they entered into the timesteps in order,
- or am I misunderstanding batch size?
If you have 90 images, 3 timesteps, and a batch size of 30, then each batch your model receives has shape (30, 3, 32, 32, 3); the input_shape you pass to TimeDistributed stays (3, 32, 32, 3), since Keras input shapes exclude the batch dimension. Source: the docs https://keras.io/api/layers/recurrent_layers/time_distributed/
Batch size is the number of samples used in one iteration of learning, before your CRNN updates its internal parameters.
As you can see in your screenshots, you've trained your model for one epoch in 3 iterations (when training a learning model, an epoch is one pass through the entire dataset; 30 times 3 makes 90, the whole dataset).
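A hedged sketch of how those shapes fit together; the base_model below is a tiny stand-in, not the asker's actual CRNN:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Stand-in base model: a small CNN over one 32x32 RGB frame.
base_model = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(32, 32, 3)),
    layers.GlobalAveragePooling2D(),
])

model = models.Sequential([
    # input_shape excludes the batch dimension: 3 timesteps of 32x32x3.
    layers.TimeDistributed(base_model, input_shape=(3, 32, 32, 3)),
    layers.LSTM(32),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# 90 samples with batch_size=30 -> 3 batches per epoch,
# each batch of shape (30, 3, 32, 32, 3).
x = np.random.rand(90, 3, 32, 32, 3).astype("float32")
y = np.random.randint(0, 10, size=(90,))
model.fit(x, y, batch_size=30, epochs=1)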

How many training samples should I take for an object detection model with 62 classes?

I'm trying to train a YOLOv3 model for 62 classes using https://github.com/wizyoung/YOLOv3_TensorFlow.
How many samples should I take for each class?
I'm using an Nvidia GTX 1050 Ti GPU, so what should my batch size be with images of size 300*300?
Is 80-20 train/test split ideal?
The 80/20 train/test(val) split depends on the number of samples, not on the number of classes. The more data you have, the larger the gap between the train and test(val) percentages can be (with millions of samples you can use a 95%/5% split).
Normally, a minimum of 200 bounding-box annotations per object should be present; that is, each of your classes should have at least 200 annotations.
The 1050 Ti has only 4 GB of VRAM. Depending on your image size, you can increase or decrease the batch size, but keep in mind that you do not have much VRAM available: a batch size of 2 for 300x300 images is most likely the maximum you can achieve (decrease it to 1 if you hit OOM issues).
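As a side note, the 80/20 split itself can be done along these lines (a generic sketch with scikit-learn; the file paths are hypothetical, and YOLOv3_TensorFlow has its own annotation files to split alongside the images):

from sklearn.model_selection import train_test_split

# Hypothetical list of annotated image paths.
image_paths = ["data/img_%05d.jpg" % i for i in range(10000)]

# 80% train, 20% validation; a fixed seed keeps the split reproducible.
train_paths, val_paths = train_test_split(
    image_paths, test_size=0.2, random_state=42)

print(len(train_paths), len(val_paths))  # 8000 2000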

Why do we have a target_size for DeepLab when a CNN can accept any size?

There is one concept I still have not understood. One reason we use fully convolutional layers at the end of a CNN is to handle different image sizes during training. My question is: if this is the case, why do we always crop or squeeze images to square sizes at the input? Please do not say the question is a repeat, that we use square images to make things easier, to check pyramid pooling, and so on.
For example, Here's a link
DeepLab can accept images of different sizes, but in its code there is a target_size of 513. Now, if a CNN can accept images of different sizes, why do we need a target_size? If this is for converting images into a standard format, why 513?
During training, we must specify a batch size. What is our batch size in this case: (5, None, None, None)? Is it possible to have images of different sizes in one batch?
I have read many posts and I am still confused by these questions:
- How can we train a model on images of different sizes (imagine that the sizes are standard)? I have seen some code use a batch size of one; I don't think that is a solution.
- Is there any snippet of code that shows how to define batches for a model like FCN so that it accepts a dataset with images of different sizes?
- In this paper (Here's a link) my problem was explained, but the authors again resized images to a square format; if we can use batches comprising images of different sizes, why did they propose the idea of using square images between 180 by 180 and 224 by 224?
Has DeepLab used this part (link) to bring images into a standard format, or for another reason?
# Scale so that the longer side becomes 513 px, preserving the aspect ratio.
width, height = image.size
resize_ratio = 1.0 * 513 / max(width, height)
target_size = (int(resize_ratio * width), int(resize_ratio * height))
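For context, applied with PIL this scaling caps the longer side at 513, DeepLab's default crop size, while keeping the aspect ratio (a hedged sketch; the input path is hypothetical):

from PIL import Image

image = Image.open("example.jpg")  # hypothetical input image

width, height = image.size
resize_ratio = 1.0 * 513 / max(width, height)
target_size = (int(resize_ratio * width), int(resize_ratio * height))

# e.g. a 1024x768 image becomes 513x384: a standardized scale, not a square.
resized = image.convert("RGB").resize(target_size, Image.LANCZOS)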
I could not find the place in their code where they train the model on the PASCAL dataset.
I expected to find simple Keras or TensorFlow code showing that we can apply a CNN model such as FCN or DeepLab to a dataset such as PASCAL VOC2012 (for segmentation) with images of different sizes, without any resizing or cropping. I am still looking.
Thank you in advance for detailed answers. Please do not repeat answers like "you can use batch size one", "square images are common and better", "you can add black margins to the images", "the fully connected layer is the problem", "you can use global max pooling", and so on. I am looking for code that works on images of different sizes.
I could not find the place in the DeepLab model on TensorFlow's GitHub where it accepts batches of different sizes (here).
Also, in FCN (here) the model is trained on the COCO dataset with a target_size of 320 by 320. Why? FCN should accept any size.
Also, could someone explain how we can have a batch of images of different sizes? Can we have an np array of differently sized images? Batch = [5, None, None, 3], each of the 5 with a different size.
I also found another confusing part in semantic segmentation: using Keras augmentation, we cannot augment an image with more than 4 channels, which means we cannot train on the PASCAL dataset with 21 channels using Keras augmentation.
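For what it's worth, the batching mechanics asked about above can be illustrated with tf.data (a hedged sketch; per-batch padding is close to the "margins" idea the question rules out, so this shows the mechanics rather than a full answer):

import numpy as np
import tensorflow as tf

# Hypothetical images of two different sizes.
a = np.zeros((100, 80, 3), dtype=np.float32)
b = np.zeros((60, 120, 3), dtype=np.float32)

# A dense np array of differently sized images is impossible:
# np.stack([a, b]) raises "all input arrays must have the same shape".

# tf.data can still batch them by padding each batch to its largest
# height and width, yielding a dense (batch, H, W, 3) tensor.
dataset = tf.data.Dataset.from_generator(
    lambda: iter([a, b]),
    output_signature=tf.TensorSpec(shape=(None, None, 3), dtype=tf.float32))
batched = dataset.padded_batch(2, padded_shapes=[None, None, 3])

for batch in batched:
    print(batch.shape)  # (2, 100, 120, 3)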

After training accuracy reaches 100%, it suddenly decreases and later returns to 100%; this occurs regularly

I use my own simulated images as a dataset, covering 0-9, A-Z, and nac:
- 37 categories in total,
- fifteen kinds of fonts,
- 1000 images per character for each font,
- a total of 509,000 images (some fonts lack some characters),
- of which 70% form the training set and 30% the test set.
The images are 28x28 grayscale, with a black background and white characters.
I use the TensorFlow MNIST handwritten-recognition demo network (2 conv layers) and tf.nn.softmax_cross_entropy_with_logits to compute the loss.
As shown in the figures below (results after 10000 and 20000 iterations, respectively), the accuracy suddenly falls at regular intervals. Why is there such a strange situation?
(figure: accuracy curve at iteration 10000)
(figure: accuracy curve at iteration 20000)
I think this is related to my question:
Loss increases after restoring checkpoint
Take a look at this TensorBoard chart.
On my side, every time the model is restored from a checkpoint, I see a drop in performance.
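For reference, the usual TF1-style save/restore cycle looks like the sketch below (the variable is illustrative, not the asker's model). One possible explanation for the regular dips, hedged: tf.train.Saver restores variables, but anything not stored in variables restarts on restore.

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Illustrative variable; a real model has many.
w = tf.get_variable("w", shape=[10], initializer=tf.zeros_initializer())
saver = tf.train.Saver()  # saves all global variables by default

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, "/tmp/model.ckpt")     # checkpoint mid-training

with tf.Session() as sess:
    saver.restore(sess, "/tmp/model.ckpt")  # variables come back intact;
    # state that lives outside variables (input-pipeline shuffle position,
    # Python-side counters) does not, which can produce a dip right after
    # each restore point.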