I am trying to build a machine learning model, and in the first step, I plan to convert my data matrix (consisting of real numbers, both positive and negative values) into RGB images. All of these values are smaller than 255. I know we can do that with the PIL package, but I wonder if the original negative values can still be retained if we make them into images? Or they will be all rounded to zero?
I went through so many google examples, but still confused. So I am asking to be certain.
Related
I am evaluating a couple of object detection models on a data set and was planning on performing pre-processing on the data using standardization to zero mean and unit variance. But I don't know how to store the images when they have been pre-processed. Currently they are in jpg format, but what format can be used after I have pre-processed them? Some of the models I evaluate are yolov4, yolov5, and SSD.
If i instead scaled the pixel values from 0-255 to 0-1, what image format could I then use?
Also, if I train the object detector on pre-processed images and then want to apply it to a video, I assume I need to somehow pre-process the video to get decent results. How would I go about doing that?
I have calculated mean and std on my data set using the python module cv2. I read the images using imread which returns a numpy array. Then I subtract mean and divide with std. This gives me a numpy array with both negative and positive floating point values. But when I try to save this numpy array as an image using the function imwrite(filename, array), it doesn't work. I assume because the numpy array isn't allowed to contain negative values.
I'm trying to understand the difference between numpy fft and rfft. I've read the doc, and it only says rfft is meant for real inputs.
I've tested their performance on a large real array and found out rfft is faster than fft by about a third. My question is: why is rfft fast? Thanks!
An RFFT has half the degrees of freedom on the input, and half the number of complex outputs, compared to an FFT. Thus the FFT computation tree can be pruned to remove those adds and multiplies not needed for the non-existent inputs and/or those unnecessary since there are a lesser number of independant output values that need to be computed.
This is because an FFT of a strictly real input (e.g. all the input value imaginary components zero) produces a complex conjugate mirrored result, where each half can be trivially derived from the other half.
I have a multi-dimensional, hyper-spectral image (channels, width, height = 15, 2500, 2500). I want to compress its 15 channel dimensions into 5 channels.So, the output would be (channels, width, height = 5, 2500, 2500). One simple way to do is to apply PCA. However, performance is not so good. Thus, I want to use Variational AutoEncoder(VAE).
When I saw the available solution in Tensorflow or keras library, it shows an example of clustering the whole images using Convolutional Variational AutoEncoder(CVAE).
https://www.tensorflow.org/tutorials/generative/cvae
https://keras.io/examples/generative/vae/
However, I have a single image. What is the best practice to implement CVAE? Is it by generating sample images by moving window approach?
One way of doing it would be to have a CVAE that takes as input (and output) values of all the spectral features for each of the spatial coordinates (the stacks circled in red in the picture). So, in the case of your image, you would have 2500*2500 = 6250000 input data samples, which are all vectors of length 15. And then the dimension of the middle layer would be a vector of length 5. And, instead of 2D convolutions that are normally used along the spatial domain of images, in this case it would make sense to use 1D convolution over the spectral domain (since the values of neighbouring wavelengths are also correlated). But I think using only fully-connected layers would also make sense.
As a disclaimer, I haven’t seen CVAEs used in this way before, but like this, you would also get many data samples, which is needed in order for the learning generalise well.
Another option would be indeed what you suggested -- to just generate the samples (patches) using a moving window (maybe with a stride that is the half size of the patch). Even though you wouldn't necessarily get enough data samples for the CVAE to generalise really well on all HSI images, I guess it doesn't matter (if it overfits), since you want to use it on that same image.
I have dataset of images which are half black in a upper triangular fashion, i.e. all pixels below the main diagonal are black.
Is there a way in Tensorflow to give such an image to a conv2d layer and mask or limit the convolution to only the relevant pixels?
If the black translates to 0 then you don't need to do anything. The convolution will multiply the 0 by whatever weight it has so it's not going to contribute to the result. If it's not you can multiply the data with a binary mask to make them 0.
For all black pixels you will still get any bias term if you have any.
You could multiply the result with a binary mask to 0 out the areas you don't want populated. This way you can also decide to drop results that have too many black cells, like around the diagonal.
You can also write your own custom operation that does what you want. I would recommend against it because you only get a speedup of at most 2 (the other operations will lower it). You probably get more performance by running on a GPU.
I have been getting seemingly unacceptably high inaccuracies when computing matrix inverses (solving a linear system) in numpy.
Is this a normal level of inaccuracy?
How can I improve the accuracy of this computation?
Also, is there a way to solve this system more efficiently in numpy or scipy (scipy.linalg.cho_solve seemed promising but does not do what I want)?
In the code below, cholM is a 128 x 128 matrix. The matrix data is too large to include here but is located on pastebin: cholM.txt.
Also, the original vector, ovec, is being randomly selected, so for different ovec's the accuracy varies, but, for most cases, the error still seems unacceptably high.
Edit Solving the system using the singular value decomposition produces significantly lower error than the other methods.
import numpy.random as rnd
import numpy.linalg as lin
import numpy as np
cholM=np.loadtxt('cholM.txt')
dims=len(cholM)
print 'Dimensions',dims
ovec=rnd.normal(size=dims)
rvec=np.dot(cholM.T,ovec)
invCholM=lin.inv(cholM.T)
svec=np.dot(invCholM,rvec)
svec1=lin.solve(cholM.T,rvec)
def back_substitute(M,v):
r=np.zeros(len(v))
k=len(v)-1
r[k]=v[k]/M[k,k]
for k in xrange(len(v)-2,-1,-1):
r[k]=(v[k]-np.dot(M[k,k+1:],r[k+1:]))/M[k,k]
return r
svec2=back_substitute(cholM.T,rvec)
u,s,v=lin.svd(cholM)
svec3=np.dot(u,np.dot(np.diag(1./s),np.dot(v,rvec)))
for k in xrange(dims):
print '%20.3f%20.3f%20.3f%20.3f'%(ovec[k]-svec[k],ovec[k]-svec1[k],ovec[k]-svec2[k],ovec[k]-svec3[k])
assert np.all( np.abs(ovec-svec)<1e-5 )
assert np.all( np.abs(ovec-svec1)<1e-5 )
As noted by #Craig J Copi and #pv, the condition number of the cholM matrix is high, around 10^16, indicating that to achieve higher accuracy in the inverse, much greater numerical precision may be required.
Condition number can be determined by the ratio of maximum singular value to minimum singular value. In this instance, this ratio is not the same as the ratio of eigenvalues.
http://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html
We could find the solution vector using a matrix inverse:
...
However, it is better to use the linalg.solve command which can be faster and more numerically stable
edit - from Steve Lord at MATLAB
http://www.mathworks.com/matlabcentral/newsreader/view_thread/63130
Why are you inverting? If you're inverting to solve a system, don't --
generally you would want to use backslash instead.
However, for a system with a condition number around 1e17 (condition numbers
must be greater than or equal to 1, so I assume that the 1e-17 figure in
your post is the reciprocal condition number from RCOND) you're not going to
get a very accurate result in any case.