Training Chip and Target Image format in TensorFlow

I am trying to build a land cover classification model for Sentinel imagery. The image channels (bands) I am using are 32-bit float.
I need to understand how best to format the image data, both the chips/patches for training and the target image for classification. I have a few questions:
Do I need to convert my original image and training chips from 32-bit to another depth?
Do I need to ensure that both the training chips/patches and the target have the same depth (32-bit, 16-bit, or other)?
Do I need to rescale my data? I saw some papers where the data was rescaled to 0-1 or 0-255.
Does data depth affect the performance of learning and prediction?
Many thanks.
Maz

The best precision to use on a PC is float32, for several reasons: more precision makes the calculations more accurate, which is better, but float16 is usually slower than float32 on a PC (most CPUs have no fast native float16 arithmetic), and float64 is unusably slow on regular machines.
So, taking your questions in order:
You usually need float32 as input anyway, so if your data is float32 in the first place, just use it as it is.
You do, but I think everything gets converted to the configured precision during fit or predict in Keras anyway; the default float type is set in $HOME/.keras/keras.json.
It's not strictly required, but rescaling to zero mean and unit standard deviation helps convergence; Google models often simply rescale to the range -1 to 1.
It does: as said above, more precision gives better accuracy, but it is slower.
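To make the rescaling point concrete, here is a minimal sketch for float32 bands (the array shape, the 0-10000 reflectance range, and the variable names are placeholder assumptions, not taken from the question):

import numpy as np

# chips: float32 training patches, shape (N, H, W, bands) -- placeholder data
chips = np.random.rand(8, 64, 64, 4).astype(np.float32) * 10000.0

# Option 1: per-band standardization (zero mean, unit variance),
# with statistics computed on the training set only
mean = chips.mean(axis=(0, 1, 2), keepdims=True)
std = chips.std(axis=(0, 1, 2), keepdims=True)
chips_std = (chips - mean) / (std + 1e-8)

# Option 2: simple rescale to [-1, 1] given a known valid range (here assumed to be 0..10000)
chips_scaled = chips / 10000.0 * 2.0 - 1.0

Whichever you choose, apply exactly the same statistics and transform to the target image at prediction time.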

Related

Does tf.image.convert_image_dtype(img, tf.float32) influence NN performance?

I am starting to adapt a neural network to a particular problem and am getting into image preprocessing at the moment. After finding out that resizing differs between libraries, OpenCV vs. Pillow vs. TensorFlow (source), I am wondering whether converting my image to float32 influences the training performance.
As far as I have understood it, I need to convert the image to float32 to display it using matplotlib.pyplot.
From a theoretical point of view it shouldn't change anything, as 245 and 245.0 should be the same.
Looking forward to your reply and thank you very much in advance!
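As a side note on what tf.image.convert_image_dtype actually does with integer inputs (a quick sketch; the sample values are arbitrary):

import tensorflow as tf

img_u8 = tf.constant([[0, 128, 245, 255]], dtype=tf.uint8)

# convert_image_dtype rescales when converting between integer and float types:
# uint8 values in [0, 255] become floats in [0.0, 1.0], so 245 becomes ~0.96, not 245.0.
img_f32 = tf.image.convert_image_dtype(img_u8, tf.float32)
print(img_f32.numpy())

# A plain cast keeps the raw values instead (245 -> 245.0).
img_cast = tf.cast(img_u8, tf.float32)
print(img_cast.numpy())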

What is the difference between Floating point 16 and 8 bit quantized in Deep Learning Model

Currently, I am reading this website to understand face detection models. The article mentions a floating point 16 (FP16) version and an 8-bit quantized version of the model.
I would like to ask:
What is the difference between the two of them?
What is the application of the different types of DL model? In which cases do we need to use each?
Link to website:
https://www.learnopencv.com/face-detection-opencv-dlib-and-deep-learning-c-python/
What is the difference between the two of them?
As you can see in the article, the 8-bit model is smaller than the FP16 one; FP16 and 8-bit here refer to the precision and data type of the model's weight values.
What is the application of the different types of DL model? In which cases do we need to use each?
Normally, higher precision makes the model larger but potentially more accurate. However, when we need to run the model with low latency or on a lightweight device, we may want to reduce the model size and accept a slight loss of accuracy.
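To make that concrete, here is a rough sketch of producing an FP16 and an 8-bit quantized model with the TensorFlow Lite converter (the saved-model path, input shape, and calibration generator are placeholder assumptions):

import tensorflow as tf

def representative_data_gen():
    # Placeholder: yield a few real input batches so the converter can calibrate int8 ranges.
    for _ in range(100):
        yield [tf.random.uniform((1, 224, 224, 3))]

# FP16 quantization: weights stored as float16, roughly half the size of float32.
conv_fp16 = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # placeholder path
conv_fp16.optimizations = [tf.lite.Optimize.DEFAULT]
conv_fp16.target_spec.supported_types = [tf.float16]
model_fp16 = conv_fp16.convert()

# 8-bit quantization: weights (and, with calibration, activations) as int8,
# roughly a quarter of the float32 size.
conv_int8 = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
conv_int8.optimizations = [tf.lite.Optimize.DEFAULT]
conv_int8.representative_dataset = representative_data_gen
model_int8 = conv_int8.convert()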

Do I need every class in a training image for object detection?

I am just trying to dive into TensorFlow's Object Detection. So far I have a very small training set of about 40 images. Each image can contain up to 3 classes. Now a question came to my mind: does every training image need every class? Is that important for efficient training? Or is it okay if an image contains only one of the object classes?
I get a very high total loss of ~8.0 and thought this might be the reason, but I couldn't find an answer.
In general, machine learning systems can cope with some amount of noise.
An image missing labels or having the wrong labels is fine as long as, overall, you have sufficient data for the model to figure it out.
40 examples for image classification sounds very small. It might work if you start with a pre-trained image network and there are only a few classes that are very easy to distinguish.
Ignore the absolute loss value; on its own it doesn't mean much. Look at the curve to see that the loss is decreasing and stop training when the curve flattens out. Compare the training loss to the loss on a held-out test dataset to check that the values are sufficiently similar (i.e. you are not overfitting). You might also compare to another run of the exact same system (to check that training is stable, for example).
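As a rough sketch of that kind of loss-curve check with Keras (using a placeholder classifier and dataset rather than the asker's object-detection setup):

import matplotlib.pyplot as plt
import tensorflow as tf

# Placeholder data and model: substitute your own dataset and detector.
(x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

history = model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10)

# Compare the shapes of the two curves rather than the absolute numbers:
# both should decrease, and a widening gap suggests overfitting.
plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="val loss")
plt.legend()
plt.show()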

How does tensorflow handle quantized networks

I have been reading about TensorFlow's conversion of neural networks from floats to 8-bit values. Reading the matrix multiplication code in their repository seems to indicate that they are using 8-bit integers rather than the fixed-point representation their documentation might have suggested.
I want to understand exactly how it performs the transformation. From what I have read, I am guessing that it scales the weights to 0-255. For instance, take a convolution on an input image whose range is 0 to 255: the result of the convolution would then be 32-bit integers, which are then scaled back to 0-255 using the min and max statistics of the output. Is that correct?
If so, why does this work?
Repository I checked for their code
https://github.com/google/gemmlowp/blob/master/standalone/neon-gemm-kernel-benchmark.cc#L573
I know I'm a year late to answer this question, but this answer may help someone else.
Quantization
First, quantization is the process of converting a continuous range of values (float numbers) into a finite range of discrete values (quantized integers, qint). Quantized datatypes are pretty common in embedded systems because most of them have limited resources, and loading a trained network (which can be more than 200 MB) onto a microcontroller is unachievable. So we have to find a way to reduce the size of these trained networks.
Almost all of the size of a trained neural network is taken up by the weights. Because the weights are floating-point numbers, simple compression formats like zip don't compress them well. So another way was needed, and that is quantization.
How is it done?
Quantization is done by storing the minimum and maximum value of each layer's weights and then compressing each float value to an eight-bit integer representing the closest real number.
For example, assume that the weights of a certain layer in our neural network vary from -4.85 to 2.35, which are the min and max respectively. Quantization is then done with the following formula:
quantized = round(255 * (value - min) / (max - min))
Then, for example, the numbers 1.3 and 0 become:
1.3 -> round(255 * (1.3 + 4.85) / 7.2) = 218
0 -> round(255 * (0 + 4.85) / 7.2) = 172
This simple scheme shrinks the size by 75%, and as you can see it's reversible: you can convert back to float after loading, so your existing floating-point code can work without any changes. Moving calculations over to eight bits also makes trained models run faster and use less power, which is essential on embedded systems and mobile devices.
Quantization vs. Precision
Won't that affect the precision of the model? Apparently its effect isn't that big, and in this article we can see why. In short: when we train a network, the aim is to have it understand the patterns and discard the noise. That means we expect the network to produce good results despite a lot of noise in the input. The networks that emerge from this process have to be numerically very robust, with a lot of redundancy in their calculations, so that small differences in input samples don't affect the results. That is what makes neural networks robust to noise, and it is why we can treat the quantization error as just another kind of noise that a well-trained neural network can handle.
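Here is a minimal NumPy sketch of that min/max scheme (illustrative only; TensorFlow's real quantized kernels also track zero-points and accumulate in 32-bit integers):

import numpy as np

def quantize(weights):
    # Store min and scale so the mapping can be (approximately) inverted later.
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / 255.0
    q = np.round((weights - w_min) / scale).astype(np.uint8)
    return q, w_min, scale

def dequantize(q, w_min, scale):
    return q.astype(np.float32) * scale + w_min

w = np.array([-4.85, 0.0, 1.3, 2.35], dtype=np.float32)
q, w_min, scale = quantize(w)
print(q)                             # [  0 172 218 255]
print(dequantize(q, w_min, scale))   # close to the original weights, with small rounding error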

Neural network weights explode in linear unit

I am currently implementing a simple neural network and the backprop algorithm in Python with numpy. I have already tested my backprop method using central differences, and the resulting gradients match.
However, the network fails to approximate a simple sine curve. The network has one hidden layer (100 neurons) with tanh activation functions and an output layer with a linear activation function. Each unit also has a bias input. Training is done by simple gradient descent with a learning rate of 0.2.
The problem is that the gradient gets larger with every epoch, but I don't know why. Furthermore, the problem is unchanged if I decrease the learning rate.
EDIT: I have uploaded the code to pastebin: http://pastebin.com/R7tviZUJ
There are two things you can try, maybe in combination:
Use a smaller learning rate. If it is too high, you may be overshooting the minimum in the current direction by a lot, and so your weights will keep getting larger.
Use smaller initial weights. This is related to the first item. A smaller learning rate would fix this as well.
I had a similar problem (with a different library, DL4J), even in the case of extremely simple target functions. In my case, the issue turned out to be the cost function. When I changed from negative log likelihood to Poisson or L2, I started to get decent results. (And my results got MUCH better once I added exponential learning rate decay.)
It looks like you don't use regularization. If you train your network long enough, it will start to learn the exact data rather than the abstract pattern.
There are a couple of methods to regularize your network, such as early stopping, putting a high cost on large weights (weight decay), or more complex ones such as dropout. If you search the web or books you will probably find many options for this.
A learning rate that is too big can fail to converge, and can even DIVERGE; that is the point.
The gradient can diverge for this reason: when a step overshoots the position of the minimum, the resulting point can end up not just a bit past it, but at a greater distance from the minimum than where it started, on the other side. Repeat the process and it will continue to diverge. In other words, the slope around the optimal position can be just too big compared to the learning rate.
Source: my understanding of the following video (watch near 7:30).
https://www.youtube.com/watch?v=Fn8qXpIcdnI&list=PLLH73N9cB21V_O2JqILVX557BST2cqJw4&index=10
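A tiny illustration of that overshooting effect on a 1-D quadratic (the function and learning rates are arbitrary choices for the demo):

# Minimize f(x) = x^2 with plain gradient descent; the gradient is 2x.
def run(lr, steps=10, x0=1.0):
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

print(run(lr=0.1))   # converges toward 0: each step multiplies x by 0.8
print(run(lr=1.1))   # diverges: each step multiplies x by -1.2, so |x| grows and the sign flips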