How to convert BGR TensorFlow Lite model to RGB? - kotlin

I have a tflite model trained on BGR data. How can I make it work properly with RGB images?
UPDATE
I want to use it with the material-showcase app: https://github.com/googlesamples/mlkit/tree/master/android/material-showcase
#Farmaker #JaredJunyoungLim . Thank you very much for your answers. I've updated the question. At first I was thinking about converting the model itself, so it wouldn't require any changes in the code. For example, the converter to the OpenVINO format has an option to reverse input channels. I have also tried to set the BGR ColorSpace in the metadata, but have found out that it's most probably not possible.
I guess I'll go with your suggestion then. In the linked code, there is indeed the ByteBuffer (FrameProcessorBase.kt). I guess this is the place to change the order of the channels (after the line 70):
val frame = processingFrame ?: return
However, how can I change the order of channels, if this is just a ByteBuffer? Do I need to figure out the way data is stored in it? For example there is R,G,B,R,G,B,R,G,B,... etc. for every pixel? Or maybe there is some more elegant way to that?
I can see that the format is set to IMAGE_FORMAT_NV21, which is YCrCb
UPDATE 2
For what I've tested (
Log.d("ByteBuffer", frame.toString())
), it seems that the ByteBuffer takes 1.5 bytes per pixel:
java.nio.HeapByteBuffer[pos=0 lim=3110401 cap=3110401]
(Resolution: 1920x1080; 3110400/1920/1080=1.5)
So it uses 12 bits per pixel, which means 4 bits per channel per pixel. That's a bit strange, because I would suspect at least 8 bits per channel per pixel (0-255).
So I guess that maybe it's compressed.

Related

How to Modify Data Augmentation in Darknet's YOLOv4

I built a digital scale reader using Darknet's YOLOv4Tiny. It is having trouble confusing 2's and 5's which leads me to believe that I am doing some unwanted data augmentation during training. (The results are mostly correct, and glare could be a factor, but I am expecting better results).
I have referenced this post:
Understanding darknet's yolo.cfg config files
and the darknet github:
https://github.com/AlexeyAB/darknet/wiki/CFG-Parameters-in-the-%5Bnet%5D-section
Below is a link to the yolov4-tiny.cfg that I modified for my model:
https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4-tiny.cfg
And a snippet from the link above:
[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=1
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
Am I correct that angle=0 means that there is no rotation?
Are there any other possible ways I might be augmenting my data that could cause an issue?
Edit: If I wanted, how could I eliminate all data augmentation?
Or do I just need more data (currently 2484 images for 10 digit classes)?
horizontal flip is applied by default, add "flip=0" to disable.
https://github.com/AlexeyAB/darknet
Here, the angle, saturation, exposure, and hue all are part of data augmentation. You can eliminate all data augmentation by setting the value to 0. As data argumentation's values are hyperparameters, modifying the value of these could lead the accuracy to better or worse both. Here I suggest you keep the value given by darknet the same as they found these values are good to get good accuracy. And by all these 4 types of data augmentation darknet initially generated all images. If you add more images without replication it is always good to add more images for the deep learning model to learn the necessary complexity during training.

Implement CVAE for a single image

I have a multi-dimensional, hyper-spectral image (channels, width, height = 15, 2500, 2500). I want to compress its 15 channel dimensions into 5 channels.So, the output would be (channels, width, height = 5, 2500, 2500). One simple way to do is to apply PCA. However, performance is not so good. Thus, I want to use Variational AutoEncoder(VAE).
When I saw the available solution in Tensorflow or keras library, it shows an example of clustering the whole images using Convolutional Variational AutoEncoder(CVAE).
https://www.tensorflow.org/tutorials/generative/cvae
https://keras.io/examples/generative/vae/
However, I have a single image. What is the best practice to implement CVAE? Is it by generating sample images by moving window approach?
One way of doing it would be to have a CVAE that takes as input (and output) values of all the spectral features for each of the spatial coordinates (the stacks circled in red in the picture). So, in the case of your image, you would have 2500*2500 = 6250000 input data samples, which are all vectors of length 15. And then the dimension of the middle layer would be a vector of length 5. And, instead of 2D convolutions that are normally used along the spatial domain of images, in this case it would make sense to use 1D convolution over the spectral domain (since the values of neighbouring wavelengths are also correlated). But I think using only fully-connected layers would also make sense.
As a disclaimer, I haven’t seen CVAEs used in this way before, but like this, you would also get many data samples, which is needed in order for the learning generalise well.
Another option would be indeed what you suggested -- to just generate the samples (patches) using a moving window (maybe with a stride that is the half size of the patch). Even though you wouldn't necessarily get enough data samples for the CVAE to generalise really well on all HSI images, I guess it doesn't matter (if it overfits), since you want to use it on that same image.

Surface format is B8G8R8A8_UNORM, but vkCmdClearColorImage takes float?

I use vkGetPhysicalDeviceSurfaceFormatsKHR to get supported image formats for the swapchain, and (on Linux+Nvidia, using SDL) I get VK_FORMAT_B8G8R8A8_UNORM as the first option and I go ahead and create the swapchain with that format:
VkSwapchainCreateInfoKHR swapchain_info = {
...
.imageFormat = format, /* taken from vkGetPhysicalDeviceSurfaceFormatsKHR */
...
};
So far, it all makes sense. The image format used to draw on the screen is the usual 8-bits-per-channel BGRA.
As part of my learning process, I have so far arrived at setting up a lot of stuff but not yet the graphics pipeline1. So I am trying the only command I can use that doesn't need a pipeline: vkCmdClearColorImage2.
The VkClearColorValue used to define the clear color can take the color as float, uint32_t or int32_t, depending on the format of the image. I would have expected, based on the image format given to the swapchain, that I should give it uint32_t values, but that doesn't seem to be correct. I know because the screen color didn't change. I tried giving it floats and it works.
My question is, why does the clear color need to be specified in floats when the image format is VK_FORMAT_B8G8R8A8_UNORM?
1 Actually I have, but thought I would try out the simpler case of no pipeline first. I'm trying to incrementally use Vulkan (given its verbosity) particularly because I'm also writing tutorials on it as I learn.
2 Actually, it technically doesn't need a render pass, but I figured hey, I'm not using any pipeline stuff here, so let's try it without a pipeline and it worked.
My rendering loop is essentially the following:
acquire image from swapchain
create a command buffer with the following:
transition from VK_IMAGE_LAYOUT_UNDEFINED to VK_IMAGE_LAYOUT_GENERAL (because I'm clearing the image outside a render pass)
clear the image
transition from VK_IMAGE_LAYOUT_GENERAL to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
submit command buffer to queue (taking care of synchronization with swapchain with semaphores)
submit for presentation
My question is, why does the clear color need to be specified in floats when the image format is VK_FORMAT_B8G8R8A8_UNORM?
Because the normalized, scaled, or sRGB image formats are really just various forms of floating-point compression. A normalized integer is a way of storing floating-point values on the range [0, 1] or [-1, 1], but using a much smaller amount of data than even a 16-bit float. A scaled integer is a way of storing floating point values on the range [0, MAX] or [-MIN, MAX]. And sRGB is just a compressed way of storing linear color values on the range [0, 1], but in a gamma-corrected color space that puts precision in different places than the linear color values would suggest.
You see the same things with inputs to the vertex shader. A vec4 input type can be fed by normalized formats just as well as by floating-point formats.

Explain how premultiplied alpha works

Can somebody please explain why rendering with premultiplied alpha (and corrected blending function) looks differently than "normal" alpha when, mathematically speaking, those are the same?
I've looked into this post for understanding of premultiplied alpha:
http://blogs.msdn.com/b/shawnhar/archive/2009/11/06/premultiplied-alpha.aspx
The author also said that the end computation is the same:
"Look at the blend equations for conventional vs. premultiplied alpha. If you substitute this color format conversion into the premultiplied blend function, you get the conventional blend function, so either way produces the same end result. The difference is that premultiplied alpha applies the (source.rgb * source.a) computation as a preprocess rather than inside the blending hardware."
Am I missing something? Why is the result different then?
neshone
The difference is in filtering.
Imagine that you have a texture with just two pixels and you are sampling it exactly in the middle between the two pixels. Also assume linear filtering.
Schematically:
R|G|B|A + R|G|B|A = R|G|B|A
non-premultiplied:
1|0|0|1 + 0|1|0|0 = 0.5|0.5|0|0.5
premultiplied:
1|0|0|1 + 0|0|0|0 = 0.5|0|0|0.5
Notice the difference in green channel.
Filtering premultiplied alpha produces correct results.
Note that all this has nothing to do with blending.
This is a guess, because there is not enough information yet to figure it out.
It should be the same. One common way of getting a different value is to use a different Gamma correction method between the premultiply and the rendering step.
I am going to guess that one of your stages, either the blending, or the premultiplying stage is being done with a different gamma value. If you generate your premultiplied textures with a tool like DirectXTex texconv and use the default srgb option for premultiplying alpha, then your sampler needs to be an _SRGB format and your render target should be _SRGB as well. If you are treating them linearly then you may not be able to render to an _SRGB target or sample the texture with gamma correction, even if you are doing the premultiply in the same shader that samples (depending on 3D API and render target setup differences). Doing so will cause the alpha to be significantly different between the two methods in the midtones.
See: The Importance of Being Linear.
If you are generating the alpha in Photoshop then you should know a couple things. Photoshop does not save alpha in linear OR sRGB format. It saves it as a Gamma value about half way between linear and sRGB. If you premultiply in Photoshop it will compute the premultiply correctly but save the result with the wrong ramp. If you generate a normal alpha then sample it as sRGB or LINEAR in your 3d API it will be close but will not match the values Photoshop shows in either case.
For a more in depth reply the information we would need would be.
What 3d API are you using.
How are your textures generated and sampled
When and how are you premultiplying the alpha.
and preferably a code or shader example that shows the error.
I was researching why one would use Pre vs non-Pre and found this interesting info from Nvidia
https://developer.nvidia.com/content/alpha-blending-pre-or-not-pre
It seems that their specific case has more precision when using Pre, over Post-Alpha.
I also read (I believe on here but cannot find it), that doing pre-alpha (which is multiplying Alpha to each RGB value), you will save time. I still need to find out if that's true or not, but there seems to be a reason why pre-alpha is preferred.

Texture format for cellular automata in OpenGL ES 2.0

I need some quick advice.
I would like to simulate a cellular automata (from A Simple, Efficient Method
for Realistic Animation of Clouds) on the GPU. However, I am limited to OpenGL ES 2.0 shaders (in WebGL) which does not support any bitwise operations.
Since every cell in this cellular automata represents a boolean value, storing 1 bit per cell would have been the ideal. So what is the most efficient way of representing this data in OpenGL's texture formats? Are there any tricks or should I just stick with a straight-forward RGBA texture?
EDIT: Here's my thoughts so far...
At the moment I'm thinking of going with either plain GL_RGBA8, GL_RGBA4 or GL_RGB5_A1:
Possibly I could pick GL_RGBA8, and try to extract the original bits using floating point ops. E.g. x*255.0 gives an approximate integer value. However, extracting the individual bits is a bit of a pain (i.e. dividing by 2 and rounding a couple times). Also I'm wary of precision problems.
If I pick GL_RGBA4, I could store 1.0 or 0.0 per component, but then I could probably also try the same trick as before with GL_RGBA8. In this case, it's only x*15.0. Not sure if it would be faster or not seeing as there should be fewer ops to extract the bits but less information per texture read.
Using GL_RGB5_A1 I could try and see if I can pack my cells together with some additional information like a color per voxel where the alpha channel stores the 1 bit cell state.
Create a second texture and use it as a lookup table. In each 256x256 block of the texture you can represent one boolean operation where the inputs are represented by the row/column and the output is the texture value. Actually in each RGBA texture you can represent four boolean operations per 256x256 region. Beware texture compression and MIP maps, though!