I am training a model that accepts variable input sizes (it's a fully convolutional network) and has quite a complex input pipeline.
That's why I have to use the Dataset API's from_generator method to handle all the logic.
However, I want to be able to train the network on image batches of different sizes.
E.g. for the first batch, the input images may be of size 200x200, but for the next one they may be 300x300.
I want to randomise this process for a variety of size ranges (e.g. from 100x100 to 2000x2000).
This would be quite trivial using feed_dict: I would prepare a batch with the specific image size on each train step.
Is there any way to do this using the (high-performance) Dataset API so that I can leverage multithreading/prefetching without much work?
Your best bet is to start with Datasets for each different minibatch size you want to support, do batching within each such dataset, and then interleave them before building an iterator.
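Something along these lines, written against the newer tf.data API (the sizes, batch size, and the toy generator are placeholder assumptions; substitute your own from_generator logic):

import numpy as np
import tensorflow as tf

def make_dataset(size, batch_size):
    # Stand-in for your from_generator logic; yields (image, target) pairs
    # of one fixed spatial size.
    def gen():
        for _ in range(100):
            yield (np.random.rand(size, size, 3).astype("float32"),
                   np.random.rand(size, size, 1).astype("float32"))

    ds = tf.data.Dataset.from_generator(
        gen,
        output_signature=(
            tf.TensorSpec((size, size, 3), tf.float32),
            tf.TensorSpec((size, size, 1), tf.float32),
        ),
    )
    # Batch within each fixed-size dataset so every batch has a uniform shape.
    return ds.batch(batch_size)

sizes = [100, 200, 300]                            # example sizes, not exhaustive
per_size = [make_dataset(s, batch_size=8) for s in sizes]

# Randomly interleave whole batches from the per-size datasets and prefetch so
# the pipeline runs ahead of training. On older TF versions this function lives
# at tf.data.experimental.sample_from_datasets.
dataset = tf.data.Dataset.sample_from_datasets(per_size).prefetch(tf.data.AUTOTUNE)

Because batching happens before the interleaving, each emitted batch has a single uniform image size, while consecutive batches can differ.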
Related
I have to train on 70,000 images for my face verification project on Google Colab (free tier).
First, it gets stuck on the 1st epoch, and then even if it starts training, after some time it throws an out-of-RAM error.
The code I use is:
<https://nbviewer.org/github/nicknochnack/FaceRecognition/blob/main/Facial%20Verification%20with%20a%20Siamese%20Network%20-%20Final.ipynb>
If I have to make mini-batches of my dataset to fit it in Colab's GPU memory, how can I do that?
Also, I want to train on the whole dataset, because it contains the images of 5 different people as anchors and positives.
You can try the following options to train on larger datasets.
Add more pooling layers to the model.
Lower the input size of your model.
Use a binary format for the images with a lower image size for image classification models.
Lower the batch size while training and validating your model.
You can also use the tf.data API for various operations like batching, slicing, processing, and shuffling to create a data pipeline (see the sketch below). You can further constrain GPU usage to avoid out-of-memory issues.
Attaching a sample Colab notebook below: https://colab.sandbox.google.com/github/tensorflow/docs/blob/master/site/en/guide/data.ipynb
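As a rough sketch of such a pipeline (the directory name, image size, and batch size are assumptions you would replace with your own values):

import tensorflow as tf

# Assumed values; adjust to your dataset layout and available memory.
DATA_DIR = "data/faces"
IMG_SIZE = (105, 105)
BATCH_SIZE = 16

# Stream images from disk in small batches instead of loading all 70,000 at once.
dataset = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR,
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    shuffle=True,
)

# Scale pixels and prefetch so preprocessing overlaps with training on the GPU.
dataset = (dataset
           .map(lambda x, y: (x / 255.0, y), num_parallel_calls=tf.data.AUTOTUNE)
           .prefetch(tf.data.AUTOTUNE))

# model.fit(dataset, epochs=10)  # the pipeline can be passed straight to fit()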
I want to see the effect of batch size on generalization for which I want to run my .fit() method with all the possible batch sizes.
But I was wondering: what could the constraints on choosing batch sizes be?
What does it depend on, the machine? The dataset?
Any help is highly appreciated.
It depends on the size of each sample and your GPU memory, if you're using one, or your RAM otherwise. Keep in mind that various other things are loaded into memory as well, like the model's parameters, the graph, etc. But strictly for the size of a batch: NUM_SAMPLES * SIZE_OF_SAMPLE.
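For example, a quick back-of-the-envelope estimate for a batch of float32 images (the sample shape is an assumption):

# Rough size of one batch of inputs; activations, parameters and optimizer
# state come on top of this.
batch_size = 32
height, width, channels = 224, 224, 3
bytes_per_value = 4  # float32

batch_bytes = batch_size * height * width * channels * bytes_per_value
print(f"{batch_bytes / 1024**2:.1f} MiB per batch")  # ~18.4 MiB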
The batch size you choose is affected by several parameters:
Resources - You need to choose a batch size small enough to fit inside your CPU / GPU RAM.
Normalization - If you use BatchNorm you should probably use a large batch size, as the BatchNorm layers learn the mean and variance of your batch. The smaller the batches are, the larger the deviation between them will be.
Personally, I usually use the largest batch size my resources allow. In case the possible batch size is small (<16), I swap BatchNorm for other normalization methods such as LayerNorm / InstanceNorm (see the sketch below).
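A minimal sketch of that swap in Keras (the batch-size flag and layer sizes are illustrative only):

import tensorflow as tf

small_batch = True  # e.g. forced below 16 by GPU memory

def norm_layer():
    # With tiny batches the per-batch statistics of BatchNorm get noisy,
    # so fall back to a normalization that does not depend on batch size.
    if small_batch:
        return tf.keras.layers.LayerNormalization()
    return tf.keras.layers.BatchNormalization()

block = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 3, padding="same"),
    norm_layer(),
    tf.keras.layers.ReLU(),
])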
The machine's memory.
The training batch size has a huge impact on the required GPU memory for training a neural network.
The GPU memory holds the parameters, the optimizer's variables, intermediate calculations, and workspace variables. So the larger the batch size, the more samples are propagated through the neural network in the forward pass. This results in larger intermediate calculations (e.g. layer activation outputs) that need to be stored in GPU memory. Technically speaking, the size of the activations is linearly dependent on the batch size.
You can use some workarounds to push past this limitation:
Data parallelism — use multiple GPUs to train all mini-batches in parallel, each on a single GPU.
Gradient accumulation — run the mini-batches sequentially while accumulating the gradients; the accumulated results are used to update the model variables at the end of the last mini-batch (see the sketch below).
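A minimal gradient-accumulation sketch with a custom training step (the toy model, optimizer, loss, and accumulation factor are all assumptions, not a built-in library feature):

import tensorflow as tf

ACCUM_STEPS = 4  # effective batch size = ACCUM_STEPS * mini-batch size

# Toy model just to make the sketch self-contained; use your own.
model = tf.keras.Sequential([tf.keras.Input((32,)), tf.keras.layers.Dense(10)])
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# One zero-initialized accumulator per trainable variable.
accum = [tf.Variable(tf.zeros_like(v), trainable=False)
         for v in model.trainable_variables]

def train_step(x, y, step):
    with tf.GradientTape() as tape:
        # Scale the loss so the accumulated gradient matches one large batch.
        loss = loss_fn(y, model(x, training=True)) / ACCUM_STEPS
    grads = tape.gradient(loss, model.trainable_variables)
    for a, g in zip(accum, grads):
        a.assign_add(g)
    # Apply and reset only once every ACCUM_STEPS mini-batches.
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.apply_gradients(zip(accum, model.trainable_variables))
        for a in accum:
            a.assign(tf.zeros_like(a))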
I'm trying to consume this tutorial by Google to use TensorFlow Estimator to train and recognise images: https://www.tensorflow.org/tutorials/estimators/cnn
The data I can see in the tutorial are: train_data, train_labels, eval_data, eval_labels:
(train_data, train_labels), (eval_data, eval_labels) = tf.keras.datasets.mnist.load_data()
In the convolutional layers, shouldn't there be feature-filter image data to multiply with the input image data? I don't see it in the code.
As described in this guide, the input image data is matmul'd with filter image data to check for low-level features (curves, edges, etc.), so there should be filter image data too (the right matrix in the image below): https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks
The filters are the weight matrices of the Conv2D layers used in the model; they are not pre-loaded images like the "butt curve" in your example. If that were the case, we would need to provide the CNN with all possible types of shapes, curves, and colours, and hope that any unseen data we feed the model contains this finite set of images somewhere within it, which the model could then recognise.
Instead, we allow the CNN to learn the filters it requires to successfully classify from the data itself, and hope it can generalise to new data. Through multitudes of iterations and data (of which they require a lot), the model iteratively crafts the best set of filters for it to successfully classify the images. The random initialisation at the start of training ensures that all filters per layer learn to identify a different feature in the input image.
The fact that earlier layers usually correspond to colour and edges (as above) is not predefined; the network has realised that looking for edges in the input is the only way to create context in the rest of the image, and thereby classify (humans initially do the same).
The network uses these primitive filters in earlier layers to generate more complex interpretations in deeper layers. This is the power of distributed learning: representing complex functions through multiple applications of much simpler functions.
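You can see this directly in Keras: the "filter images" are nothing more than the kernel weights of each Conv2D layer, randomly initialized and then shaped by training (a small standalone sketch, not the tutorial's exact model):

import tensorflow as tf

# A single Conv2D layer: its "filters" are just this randomly initialized
# weight tensor, which gradient descent reshapes during training.
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=5, activation="relu")
conv.build(input_shape=(None, 28, 28, 1))

kernel, bias = conv.get_weights()
print(kernel.shape)  # (5, 5, 1, 32): 32 learned 5x5 filters over 1 input channel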
My question is about finding an efficient (mostly in terms of parameter count) way to implement a sliding window in TensorFlow (1.4) in order to apply a neural network across an image and produce a 2-D map in which each pixel (or region) represents the network output for the corresponding receptive field (which in this case is the sliding window itself).
In practice, I'm trying to implement either an MTANN or a PatchGAN using TensorFlow, but I cannot understand the implementation I found.
The two architectures can be briefly described as:
MTANN: A linear neural network with input size [1,N,N,1] and output size [ ] is applied to an image of size [1,M,M,1] to produce a map of size [1,G,G,1], in which every pixel of the generated map corresponds to the likelihood of the corresponding NxN patch belonging to a certain class.
PatchGAN Discriminator: A more general architecture; as far as I understand, the network that is strided across the image outputs a map itself instead of a single value, and that map is then combined with adjacent maps to produce the final map.
While I cannot find any TensorFlow implementation of the MTANN, I did find a PatchGAN implementation, which is treated as a convolutional network, but I couldn't figure out how to implement it in practice.
Let's say I have a pre-trained network of which I have the output tensor. I understand that convolution is the way to go, since a convolutional layer operates over a local region of the input, and what I'm trying to do can clearly be represented as a convolutional network. However, what if I already have the network that generates the sub-maps from a given fixed-size window?
E.g. I got a tensor
sub_map = network(input_patch)
which returns a [1,2,2,1] map from a [1,8,8,3] image (corresponding to a 3-layer FCN with input size 8 and filter size 3x3).
How can I sweep this network on [1,64,64,3] images, in order to produce a [1,64,64,1] map composed of each spatial contribution, like it happens in a convolution?
I've considered these solutions:
Using tf.image.extract_image_patches, which explicitly extracts all the image patches into the depth dimension, but I think it would consume too many resources, as I'm switching to the PatchGAN discriminator from a fully convolutional network precisely because of memory constraints - also, the composition of the final map is not so straightforward.
Adding a convolutional layer before the network I have, but I cannot figure out what the filter (and its size) should be in this case in order to keep the pretrained model working on 8x8 images while integrating it into a model that works on bigger images.
From what I can tell, it should be something like whole_map = tf.nn.convolution(input=x64_images, filter=sub_map, ...), but I don't think this would work, as the filter is an operator that depends on the receptive field itself.
The ultimate goal is to apply this small network to big images (e.g. 1024x1024) in an efficient way, since my current model downscales the images progressively and doesn't fit in memory due to the huge number of parameters.
Can anyone help me to get a better understanding of what I am missing?
Thank you
I found an interesting video by Andrew Ng exactly on how to implement a sliding window using a convolutional layer.
The problem here was that I was thinking of the number of layers as a variable dependent on a fixed input/output shape, while it should be the opposite.
In principle, a saved model only contains the learned filters for each layer, and as long as the filter shapes are compatible with the layers' input/output depths, feeding the network an input with a different (i.e. bigger) spatial resolution produces a different output shape. This can be seen as applying the neural network to a sliding window sweeping across the input image.
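A hedged sketch of that idea: a small fully convolutional stack whose spatial input size is left unspecified, so the same weights can be applied to a patch or to a whole image (the layer sizes are illustrative, not the actual MTANN/PatchGAN architecture):

import tensorflow as tf

# Purely convolutional; only filter shapes and channel depths are fixed.
# Three 3x3 'valid' convolutions match the 8x8 -> 2x2 example above.
network = tf.keras.Sequential([
    tf.keras.Input(shape=(None, None, 3)),
    tf.keras.layers.Conv2D(16, 3, padding="valid", activation="relu"),
    tf.keras.layers.Conv2D(16, 3, padding="valid", activation="relu"),
    tf.keras.layers.Conv2D(1, 3, padding="valid"),
])

patch_out = network(tf.zeros((1, 8, 8, 3)))    # (1, 2, 2, 1) on a single patch
full_out = network(tf.zeros((1, 64, 64, 3)))   # (1, 58, 58, 1) on a larger image
print(patch_out.shape, full_out.shape)

Each pixel of full_out corresponds to one 8x8 receptive field swept with stride 1; if the output map has to match the input resolution exactly, you would additionally pad the input accordingly.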
I have to build a network that receives images of different sizes. I don't want to resize or crop, so I am using a fully convolutional network.
The problem is that I can't pre-create mini-batches because of the different size of each image.
One solution would be to take the biggest image in the intended mini-batch and zero-pad all the other images to the same size. However, that is not efficient in time or memory, especially since the images vary in size significantly (from 30px up to 3000px).
Another solution, which I am using right now, is to create mini-batches of size 1. That of course solves the problem of different sizes, but it is not good for convergence.
So the question is: does Keras offer some method to collect gradients from several inputs and only then take a learning step?