2D image decomposition - frequency

I have a matrix and I want to decompose it into several matrices ranging from low- to high-frequency content. As far as I can tell, this can be done using a wavelet transform. I found something like the figure below for a 1D signal, and I want to do a similar procedure for my 2D matrix using MATLAB: decompose it into different matrices containing the low- to high-frequency components at different levels.
I used the toolbox; however, I have problems extracting the data.
How can I do this using MATLAB?

You are looking for the wavedec2 function.
There's a basic example with the function documentation here.
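For reference (and only as a sketch, since your question is about MATLAB): Python's PyWavelets package exposes a similarly named pywt.wavedec2 whose output mirrors the per-level approximation/detail coefficients, so it may help illustrate what the decomposition gives you.

    import numpy as np
    import pywt  # PyWavelets

    X = np.random.rand(64, 64)                # stand-in for your 2D matrix

    # Two-level 2D wavelet decomposition with the Haar ('db1') wavelet.
    # coeffs = [cA2, (cH2, cV2, cD2), (cH1, cV1, cD1)]
    #   cA2              : low-frequency approximation at the coarsest level
    #   (cHn, cVn, cDn)  : horizontal, vertical and diagonal details (higher frequency)
    coeffs = pywt.wavedec2(X, wavelet='db1', level=2)

    cA2 = coeffs[0]                           # lowest-frequency component
    cH1, cV1, cD1 = coeffs[-1]                # finest (highest-frequency) details

    # The decomposition is invertible:
    X_rec = pywt.waverec2(coeffs, wavelet='db1')
    assert np.allclose(X, X_rec)

MATLAB's wavedec2 returns the same information as a coefficient vector C plus a bookkeeping matrix S, and appcoef2/detcoef2 extract the per-level matrices from them.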

Related

Multivariate Gaussian likelihood without matrix inversion

There are several tricks available for sampling from a multivariate Gaussian without matrix inversion, Cholesky/LU decomposition among them. Are there any tricks for calculating the likelihood of a multivariate Gaussian without doing the full matrix inversion?
I'm working in python, using numpy arrays. scipy.stats.multivariate_normal is absurdly slow for the task, taking significantly longer than just doing the matrix inversion directly with numpy.linalg.inv.
So at this point I'm trying to understand what is best practice.
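One standard trick, sketched here under the assumption of a positive-definite covariance: factor the covariance once with a Cholesky decomposition and reuse it for both the log-determinant and the quadratic form, so no explicit inverse is ever formed.

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def mvn_logpdf(x, mean, cov):
        # Log-density of a multivariate Gaussian computed via Cholesky, no inversion.
        k = mean.shape[0]
        diff = x - mean
        c, low = cho_factor(cov)                      # cov = L @ L.T (triangular factor in c)
        logdet = 2.0 * np.sum(np.log(np.diag(c)))     # log|cov| from the factor's diagonal
        quad = diff @ cho_solve((c, low), diff)       # diff^T cov^{-1} diff, solved, not inverted
        return -0.5 * (k * np.log(2.0 * np.pi) + logdet + quad)

    # Example usage
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    cov = A @ A.T + 5.0 * np.eye(5)                   # positive-definite covariance
    x, mean = rng.standard_normal(5), np.zeros(5)
    print(mvn_logpdf(x, mean, cov))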

What does it mean to say convolution implementation is based on GEMM (matrix multiply) or it is based on 1x1 kernels?

I have been trying to understand (but miserably failing) how convolutions on images (with height, width, channels) are implemented in software.
I've heard people say their convolution implementation is done using GEMM, or done using "Direct convolution" or done using 1x1 kernels.
I find it very confusing and can't wrap my head around the many different ways it is described. I thought I understood a typical convolution (like PyTorch's conv2d) as a mathematical operation on an image, but what do people mean when they say they do conv2d in one of the following ways?
1x1 kernels or 1x1 convolution (what does kernel even mean here)
GEMM
"direct convolution"
For doing convolution using GEMM, what I understand based on this paper is that the input image and the filters are each converted to 2D matrices using im2col and im2row ops, and then these two matrices are simply multiplied.
The 3D input image (height, width, input-channels) is converted to a 2D matrix, and the 4D kernel (output-channels, input-channels, kernel-height, kernel-width) is converted to a 2D matrix. Or does "GEMM-based implementation of convolution" mean something else? If that's what it means, how is it different from doing "convolution using 1x1 kernels"?
1x1 kernels or 1x1 convolution (what does kernel even mean here)
You can have a 3x3 convolution, where a square containing 9 elements slides over the image (with some specified stride, dilation, etc.). In a 1x1 convolution the kernel is instead a single element (here with stride=1 as well and no dilation).
So instead of a sliding window with summation, you simply apply a linear projection to each pixel with this single-element kernel.
It is a cheap operation and is used as part of the depthwise separable convolutions found in many modern architectures to increase or decrease the number of channels.
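To make the "linear projection per pixel" concrete, here is a minimal NumPy sketch (shapes and names are illustrative): a 1x1 convolution over an (H, W, C_in) feature map with C_out output channels is exactly a matrix multiply applied to every pixel's channel vector.

    import numpy as np

    H, W, C_in, C_out = 4, 5, 3, 8
    x = np.random.rand(H, W, C_in)        # input feature map
    w = np.random.rand(C_in, C_out)       # 1x1 kernel: one weight per (in, out) channel pair

    # "Sliding" the 1x1 window: every pixel's channel vector is projected by w
    y = (x.reshape(-1, C_in) @ w).reshape(H, W, C_out)

    # The same thing written as an explicit loop over pixels, for clarity
    y_loop = np.empty((H, W, C_out))
    for i in range(H):
        for j in range(W):
            y_loop[i, j] = x[i, j] @ w
    assert np.allclose(y, y_loop)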
GEMM
In the article you provided, it says at the top:
[...] function called GEMM. It’s part of the BLAS (Basic Linear Algebra Subprograms)
So BLAS is a specification which describes a set of low-level algebraic operations and how they should be performed on a computer.
Now, there are many implementations of BLAS tailored to specific architectures or having traits that are useful in a particular context. For example, there is cuBLAS, which is written and optimized for GPUs (and used heavily by higher-level deep learning libraries like PyTorch), or Intel's MKL for Intel CPUs (you can read more about BLAS on the web).
Usually these are written in low-level languages (Fortran, C, assembly, C++) for maximum performance.
GEMM is the GEneral Matrix Multiplication routine, which is provided by the various BLAS implementations and is used to implement fully connected layers and convolutions.
It has nothing to do with deep-learning convolution per se; it is a fast matrix-multiplication routine (taking into account things like cache behaviour).
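As a concrete illustration of the im2col + GEMM idea described in the question (a sketch under simplifying assumptions: stride 1, no padding, invented variable names), the code below unrolls each receptive-field patch into a row, reshapes the kernels into a matrix, and lets one matrix multiply compute the whole convolution.

    import numpy as np

    def im2col(x, kh, kw):
        # Unroll (H, W, C_in) into (out_h*out_w, kh*kw*C_in); stride 1, no padding.
        H, W, C = x.shape
        out_h, out_w = H - kh + 1, W - kw + 1
        cols = np.empty((out_h * out_w, kh * kw * C))
        for i in range(out_h):
            for j in range(out_w):
                cols[i * out_w + j] = x[i:i + kh, j:j + kw, :].ravel()
        return cols, out_h, out_w

    x = np.random.rand(8, 8, 3)                     # input image (H, W, C_in)
    k = np.random.rand(4, 3, 3, 3)                  # kernels (C_out, kh, kw, C_in)

    cols, oh, ow = im2col(x, 3, 3)                  # (36, 27)
    k_mat = k.transpose(1, 2, 3, 0).reshape(-1, 4)  # (27, 4): one column per output channel

    y = (cols @ k_mat).reshape(oh, ow, 4)           # a single GEMM does the whole convolution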
Direct convolutions
This is the naive approach with O(n^2) complexity: you simply multiply the items with each other and sum. There is a more efficient approach using the Fast Fourier Transform, which is O(n log n). Some info is presented in this answer, and questions about this part would be better suited for the math-related Stack Exchanges.
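A quick illustrative check (using SciPy's existing routines on a 1D signal) that the FFT route computes the same result as direct convolution:

    import numpy as np
    from scipy.signal import convolve, fftconvolve

    signal = np.random.rand(1000)
    kernel = np.random.rand(31)

    direct = convolve(signal, kernel, method='direct')  # sliding multiply-and-sum
    viafft = fftconvolve(signal, kernel)                # FFT-based, O(n log n)

    assert np.allclose(direct, viafft)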

Build a dataset for TensorFlow

I have a large number of JPGs representing vehicles. I want to create a dataset for TensorFlow with a categorization such that every vehicle image is labelled with the side, the angle or the roof it shows, i.e. I want to create nine subsets of images (front, back, driver side, driver front angle, driver back angle, passenger side, passenger front angle, passenger back angle, roof). At the moment the filename of each JPG describes the desired viewpoint.
How can I turn this set into a dataset that TensorFlow can easily manipulate? Also, should I run a procedure which crops each JPG to extract only the vehicle portion? How could I do that using TensorFlow?
I apologize in advance for not providing details and examples with this question, but I don't really know how to find an entry point for this problem. The tutorials I'm following all assume an already created dataset ready to use.
Okay, I'm going to try to answer this as well as I can, but producing and pre-processing data for use in ML algorithms is laborious and often expensive (hence the repeated use of well known data sets for testing algorithm designs).
To address a few straight-forward questions first:
should I run a procedure which crop the JPG to extract only the vehicle portion?
No. This isn't necessary. The neural network will sort the relevant information in the images from the irrelevant by itself, and having a diverse set of images will help to build a robust classifier. Also, cropping would likely make life a lot more difficult for yourself later on when you come to resize the images (see point 1 below for more).
How could I do that using TensorFlow?
You wouldn't. TensorFlow is designed to build and test ML models and does not have tools for pre-processing data (well, perhaps TensorFlow Extended does, but that shouldn't be necessary here).
Now a rough guideline for how you would go about creating a data set from the files described:
1) The first thing you will need to do is to load your .jpg images into Python and resize them all to be identical. A neural network needs the same number of inputs (pixels, in this case) in every training example, so having differently sized images will not work.
There is a good answer detailing how to load images using the Python Imaging Library (PIL) on Stack Overflow here.
The PIL image instances (elements of the list loadedImages in the example above) can then be converted to NumPy arrays using data = np.asarray(image), which TensorFlow can work with.
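A minimal sketch of that loading and resizing step (the folder path, target size and variable names below are placeholders, not taken from the linked answer):

    import os
    import numpy as np
    from PIL import Image

    image_dir = "vehicle_jpgs"                  # placeholder path to your JPGs
    target_size = (128, 128)                    # every example must end up the same shape

    images, filenames = [], []
    for fname in sorted(os.listdir(image_dir)):
        if not fname.lower().endswith(".jpg"):
            continue
        img = Image.open(os.path.join(image_dir, fname)).convert("RGB")
        img = img.resize(target_size)           # identical size for every example
        images.append(np.asarray(img))          # (h, w, 3) uint8 array
        filenames.append(fname)

    images = np.stack(images)                   # (n, h, w, 3)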
In addition to building a set of NumPy arrays of your data, you will also need a second NumPy array of labels for this data. A typical way to encode this is as a NumPy array the same length as your number of images, with an integer value for each entry representing the class to which that image belongs (0-8 for your 9 classes). You could input these by hand, but that would be labour intensive; I would suggest using Python's built-in string find method to locate keywords within the filenames and automate determining their class. This could be done within the
for image in imagesList:
loop in the above link, as image should be a string containing the image filename.
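A hedged sketch of that idea, assuming each filename contains one of the class names below (the keyword spellings are invented for illustration and must match your actual naming scheme; filenames is the list collected while loading the images):

    import numpy as np

    class_names = ["front", "back", "driver_side", "driver_front_angle",
                   "driver_back_angle", "passenger_side", "passenger_front_angle",
                   "passenger_back_angle", "roof"]     # index = integer label 0-8

    labels = []
    for fname in filenames:
        # str.find returns -1 when the keyword is absent; keep every match and
        # pick the longest one so "driver_front_angle" is not mistaken for "front".
        matches = [name for name in class_names if fname.lower().find(name) != -1]
        if not matches:
            raise ValueError("No class keyword found in " + fname)
        labels.append(class_names.index(max(matches, key=len)))

    labels = np.array(labels)                          # shape (n,), integers 0-8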
As I mentioned above, resizing the images is necessary to make sure they are all identical. You could do this with NumPy, using indexing to choose a subsection of each image array, or using PIL's resize function before converting to NumPy. There is no single right answer here, and many methods have been used to resize images for this purpose, from padding to stretching to cropping.
The end result here should be two NumPy arrays. One of image data, which has shape [w,h,3,n] where w = image width, h = image height, 3 = the three RGB layers (provided the images are in colour) and n = the number of images you have. The second of labels associated with these images, of shape [n,], where every element of the length-n array is an integer from 0-8 specifying its class.
At this point it would be a good idea to save the dataset in this format using numpy.save() so that you don't have to go through this process again.
2) Once you have your images in this format, TensorFlow has a class called tf.data.Dataset into which you can load the image and label data described above, and which will allow you to shuffle and sample data from it.
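A minimal sketch of that final step, assuming the batch-first (n, h, w, 3) image array and the (n,) label array from the sketches above were saved with numpy.save() (file names are placeholders). If you instead stored the [w, h, 3, n] layout described in step 1, np.transpose(images, (3, 1, 0, 2)) converts it to the batch-first layout TensorFlow expects.

    import numpy as np
    import tensorflow as tf

    images = np.load("images.npy")                     # (n, h, w, 3)
    labels = np.load("labels.npy")                     # (n,)

    images = images.astype("float32") / 255.0          # scale pixels to [0, 1]

    dataset = tf.data.Dataset.from_tensor_slices((images, labels))
    dataset = dataset.shuffle(buffer_size=len(labels)).batch(32)

    for batch_images, batch_labels in dataset.take(1):
        print(batch_images.shape, batch_labels.shape)  # (32, h, w, 3) and (32,)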
I hope that was helpful, and I am sorry that there is no quick-fix solution to this (at least not one I am aware of). Good luck.

How does tensorflow handle quantized networks

I have been reading about TensorFlow's conversion of neural networks from floats to 8-bit values. Reading the matrix multiplication code in their repository seems to indicate that they are using 8-bit integers rather than fixed-point values, as their documentation might have suggested.
I want to understand how exactly it performs the transformation. From what I have read, I am guessing that it scales the weights from 0 to 255. For instance, suppose we are talking about convolution on an input image which has a range of 0 to 255. The result of the convolution would then be 32-bit integers, which are then scaled back to 0 to 255 using the min and max statistics of the output. Is that correct?
If so, Why does this work ?
Repository I checked for their code
https://github.com/google/gemmlowp/blob/master/standalone/neon-gemm-kernel-benchmark.cc#L573
I know I'm one year late to answer this question, but this answer may help someone else
Quantization
First, quantization is the process of converting a continuous range of values (float numbers) to a finite range of discrete values (quantized integers, qint). Quantized datatypes are pretty common in embedded systems because most embedded systems have limited resources, and loading a trained network (which could be more than 200 MB) onto a microcontroller is unachievable. So, we have to find a way to reduce the size of these trained networks.
Almost all of the size of a trained neural network is taken up by the weights. Because all of the weights are floating-point numbers, simple compression formats like zip don’t compress them well. So, we had to find another way, which is quantization.
How is it done?
Quantization is done by storing the minimum value and the maximum value for each layer's weights and then compressing each float value to an eight-bit integer representing the closest real number.
For example, assume that the weights of a certain layer in our neural network vary from -4.85 to 2.35, which are the min and max respectively. Quantization is then done with the standard min/max linear mapping:

    quantized = round( 255 * (value - min) / (max - min) )

Then, for example, the numbers 1.3 and 0 become:

    round( 255 * (1.3 - (-4.85)) / (2.35 - (-4.85)) ) = round(217.8) = 218
    round( 255 * (0.0 - (-4.85)) / (2.35 - (-4.85)) ) = round(171.8) = 172
This simple scheme shrinks the size by 75%, and as you can see, it is reversible (up to the rounding error): after loading you can convert back to float so that your existing floating-point code works without any changes. Moving calculations over to eight bits will also make trained models run faster and use less power, which is essential on embedded systems and mobile devices.
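A small NumPy sketch of this min/max scheme (illustrative only, not TensorFlow's actual kernels), showing the round trip and the size of the error it introduces:

    import numpy as np

    def quantize(w):
        w_min, w_max = w.min(), w.max()
        q = np.round(255 * (w - w_min) / (w_max - w_min)).astype(np.uint8)
        return q, w_min, w_max                   # min/max are stored alongside the uint8 data

    def dequantize(q, w_min, w_max):
        return w_min + (q.astype(np.float32) / 255.0) * (w_max - w_min)

    weights = np.random.uniform(-4.85, 2.35, size=10_000).astype(np.float32)
    q, lo, hi = quantize(weights)
    restored = dequantize(q, lo, hi)

    print(q.nbytes / weights.nbytes)             # 0.25 -> 75% smaller
    print(np.abs(weights - restored).max())      # worst case about (hi - lo) / 255 / 2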
Quantization Vs Precision
Won’t that affect the precision of the model? Apparently, its effect isn’t that big and in this article we can see why. But in short, when we are trying to teach a network, the aim is to have it understand the patterns and discard noise. That means we expect the network to be able to produce good results despite a lot of noise. The networks that emerge from this process have to be very robust numerically, with a lot of redundancy in their calculations so that small differences in input samples don’t affect the results. And that’s what makes neural networks robust when it comes to noise. So, we can consider the quantization error as some kind of noise that well-trained neural networks can handle.

plotting a function like ezplot in matlab in python using matplotlib

I have a classification function that classifies a data point into one of two classes. The problem is, I need a way to plot the decision boundary between the two classes. While this is easy for a linear function, it is cumbersome to find the equation of the boundary in general. The ezplot function in MATLAB seems to be able to do it: it plots the result automatically, it works with linear and quadratic functions, and it doesn't require you to provide the coordinates. In matplotlib, you can only plot if you are given the coordinates. Does anyone know how to do this with matplotlib?
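A common way to emulate ezplot-style implicit plotting in matplotlib is to evaluate the decision function on a grid and draw its zero contour. A minimal sketch, using an invented quadratic decision function f as a stand-in for your classifier:

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical decision function: f(x, y) > 0 -> class 1, f(x, y) < 0 -> class 0.
    # Replace with your own classifier's score function.
    def f(x, y):
        return x**2 + 2 * y**2 - 3

    xs = np.linspace(-3, 3, 400)
    ys = np.linspace(-3, 3, 400)
    X, Y = np.meshgrid(xs, ys)

    # The decision boundary is the level set f(x, y) = 0, drawn without
    # ever solving for y as a function of x.
    plt.contour(X, Y, f(X, Y), levels=[0], colors="k")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.title("Decision boundary f(x, y) = 0")
    plt.show()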