Using VK_IMAGE_TILING_LINEAR for sRGB formats? - vulkan

So I'm still trying to figure out color spaces for render textures and how to create images without color banding. Within my gbuffer I use VK_FORMAT_R8G8B8A8_UNORM with VK_IMAGE_TILING_OPTIMAL for my albedo texture, and VK_FORMAT_A2B10G10R10_UNORM_PACK32 with VK_IMAGE_TILING_OPTIMAL for my normal and emission render textures. For my brightness texture (which holds pixel values that are considered "bright" within a scene) and my glow render texture (the final texture for bloom effects to be added later onto the final scene), I use VK_FORMAT_R8G8B8A8_SRGB and VK_IMAGE_TILING_OPTIMAL (although looking at this, I should probably make my bright and final textures R16G16B16A16 float formats instead). What I got was definitely not what I had in mind:
Changing the tiling for my normal, emission, glow, and brightness textures to VK_IMAGE_TILING_LINEAR, however, got me nice results instead, but at the cost of performance:
The nice thing about it, though (and sorry about the weird border on the top left; that's from cropping in MS Paint), is that the image doesn't suffer from the color banding I get when I instead use VK_FORMAT_R8G8B8A8_UNORM with VK_IMAGE_TILING_OPTIMAL for these textures:
Where you can see banding occurring on the top left of the helmet, as well as underneath it (where the black tubes are). Of course, I've heard that VK_IMAGE_TILING_LINEAR should be avoided, from this post.
In general, I'm having trouble figuring out the best way to avoid VK_IMAGE_TILING_LINEAR when using sRGB textures. I would still like to keep the nice crisp images that sRGB gives me, but I'm unsure how to solve this issue. The linked post might actually have the solution, but I'm not sure whether there's a way to apply it to my gbuffer.
I would also like to clarify that VK_IMAGE_TILING_OPTIMAL works fine on Nvidia GPUs (well, tested on a GTX 870M), which instead complain about using VK_IMAGE_TILING_LINEAR for sRGB formats; Intel GPUs, however, work fine with VK_IMAGE_TILING_LINEAR and produce garbage like the first image at the top of this post when using VK_IMAGE_TILING_OPTIMAL.
The engine is custom-made; feel free to check it out at this link.
If you fancy some code, I use a function called SetUpRenderTextures() inside the Engine/Renderer/Private/Renderer.cpp file, around line 1396:
VkImageCreateInfo cImageInfo = { };
VkImageViewCreateInfo cViewInfo = { };
// TODO(): Need to make this more adaptable, as intel chips have trouble with srgb optimal tiling.
cImageInfo.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
cImageInfo.usage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT | VK_IMAGE_USAGE_SAMPLED_BIT;
cImageInfo.imageType = VK_IMAGE_TYPE_2D;
cImageInfo.format = VK_FORMAT_R8G8B8A8_UNORM;
cImageInfo.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
cImageInfo.mipLevels = 1;
cImageInfo.extent.depth = 1;
cImageInfo.arrayLayers = 1;
cImageInfo.extent.width = m_pWindow->Width();
cImageInfo.extent.height = m_pWindow->Height();
cImageInfo.samples = VK_SAMPLE_COUNT_1_BIT;
cImageInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
cImageInfo.tiling = VK_IMAGE_TILING_OPTIMAL;
cViewInfo.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
cViewInfo.format = VK_FORMAT_R8G8B8A8_UNORM;
cViewInfo.image = nullptr; // No need to set the image, texture->Initialize() handles this for us.
cViewInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;
cViewInfo.subresourceRange = { };
cViewInfo.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
cViewInfo.subresourceRange.baseArrayLayer = 0;
cViewInfo.subresourceRange.baseMipLevel = 0;
cViewInfo.subresourceRange.layerCount = 1;
cViewInfo.subresourceRange.levelCount = 1;
gbuffer_Albedo->Initialize(cImageInfo, cViewInfo);
gbuffer_Emission->Initialize(cImageInfo, cViewInfo);
cImageInfo.format = VK_FORMAT_R8G8B8A8_SRGB;
cViewInfo.format = VK_FORMAT_R8G8B8A8_SRGB;
cImageInfo.tiling = VK_IMAGE_TILING_OPTIMAL;
GlowTarget->Initialize(cImageInfo, cViewInfo);
// It's probably best that these be 64bit float formats as well...
pbr_Bright->Initialize(cImageInfo, cViewInfo);
pbr_Final->Initialize(cImageInfo, cViewInfo);
cImageInfo.format = VK_FORMAT_A2B10G10R10_UNORM_PACK32;
cImageInfo.tiling = VK_IMAGE_TILING_OPTIMAL;
cViewInfo.format = VK_FORMAT_A2B10G10R10_UNORM_PACK32;
gbuffer_Normal->Initialize(cImageInfo, cViewInfo);
gbuffer_Position->Initialize(cImageInfo, cViewInfo);
So yes, the rundown: how do I avoid using linear image tiling for sRGB textures? Is this a hardware-specific thing, and is supporting it mandatory? Also, I apologize for any ignorance I have on this subject.
Thank you :3

Support for this format/tiling combination is mandatory, so the corruption in your first image is either an application bug or a driver bug.
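If you want to double-check what a given device actually reports, you can query the format's capabilities. A minimal sketch (physicalDevice here is a placeholder for whatever handle your engine already holds):
VkFormatProperties props = { };
vkGetPhysicalDeviceFormatProperties(physicalDevice, VK_FORMAT_R8G8B8A8_SRGB, &props);
// For rendering to and sampling from an optimally tiled image, both bits should be set:
bool usableAsRenderTarget =
    (props.optimalTilingFeatures & VK_FORMAT_FEATURE_COLOR_ATTACHMENT_BIT) &&
    (props.optimalTilingFeatures & VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT);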
So VK_FORMAT_R8G8B8A8_SRGB is working with VK_IMAGE_TILING_OPTIMAL on your Nvidia GPU but not on your Intel GPU? But VK_FORMAT_R8G8B8A8_SRGB does work with VK_IMAGE_TILING_LINEAR on the Intel GPU?
If so, that sounds like you've got some missing or incorrect image layout transitions. Intel is more sensitive to getting those right than Nvidia is. Do the validation layers complain about anything? You need to make sure the image is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL when rendering to it, and VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL when sampling from it.
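To illustrate what such a transition looks like, here is a minimal sketch of an explicit barrier between rendering to the attachment and sampling it; cmdBuf and gbufferImage are placeholders, and a render pass's initialLayout/finalLayout can achieve the same thing implicitly:
VkImageMemoryBarrier barrier = { };
barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;   // layout used while rendering to it
barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;   // layout required for sampling
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image = gbufferImage;                                   // the attachment's VkImage
barrier.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
barrier.subresourceRange.baseMipLevel = 0;
barrier.subresourceRange.levelCount = 1;
barrier.subresourceRange.baseArrayLayer = 0;
barrier.subresourceRange.layerCount = 1;
vkCmdPipelineBarrier(cmdBuf,
                     VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,  // producer: the pass that writes the gbuffer
                     VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,          // consumer: the pass that samples it
                     0, 0, nullptr, 0, nullptr, 1, &barrier);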

Related

Simple Captcha Solving

I'm trying to solve some simple captchas using OpenCV and pytesseract. Some of the captcha samples are:
I tried to remove the noisy dots with some filters:
import cv2
import numpy as np
import pytesseract
img = cv2.imread(image_path)
_, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, np.ones((4, 4), np.uint8), iterations=1)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.GaussianBlur(img, (5, 5), 0)
cv2.imwrite('res.png', img)
print(pytesseract.image_to_string('res.png'))
The resulting transformed images are:
Unfortunately, pytesseract only recognizes the first captcha correctly. Is there any better transformation?
Final Update:
As @Neil suggested, I tried to remove the noise by detecting connected pixels. To find connected pixels, I found a function named connectedComponentsWithStats, which detects connected pixels and assigns each group (component) a label. By finding the connected components and removing the ones with a small number of pixels, I managed to get better overall detection accuracy with pytesseract.
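A rough sketch of that filtering step, for anyone curious (shown here with OpenCV's C++ API; cv2.connectedComponentsWithStats behaves the same way in Python, and the minArea threshold is just something to tune):
#include <opencv2/opencv.hpp>

cv::Mat removeSmallComponents(const cv::Mat &binary, int minArea) {
    // binary: 8-bit image with the characters white (non-zero) on a black background.
    cv::Mat labels, stats, centroids;
    int count = cv::connectedComponentsWithStats(binary, labels, stats, centroids, 8);

    cv::Mat cleaned = cv::Mat::zeros(binary.size(), CV_8UC1);
    for (int label = 1; label < count; ++label) {             // label 0 is the background
        if (stats.at<int>(label, cv::CC_STAT_AREA) >= minArea)
            cleaned.setTo(255, labels == label);              // keep components large enough to be letters
    }
    return cleaned;
}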
And here are the new resulting images:
I've taken a much more direct approach to filtering ink splotches from PDF documents. I won't share the whole thing since it's a lot of code, but here is the general strategy I adopted:
Use Python Pillow library to get an image object where you can manipulate pixels directly.
Binarize the image.
Find all connected pixels and how many pixels are in each group of connected pixels. You can do this using the minesweeper algorithm, which is easy to search for.
Set some threshold pixel count that all legitimate letters are expected to exceed. This will depend on your image resolution.
Replace all black pixels in groups below the threshold with white pixels.
Convert back to image.
Your final output image is too blurry. To enhance the performance of pytesseract you need to sharpen it.
Sharpening is not as easy as blurring, but there exist a few code snippets / tutorials (e.g. http://datahacker.rs/004-how-to-smooth-and-sharpen-an-image-in-opencv/).
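For example, a simple unsharp mask built from a single Gaussian blur often does the trick (the sketch below uses OpenCV's C++ API; the same two calls exist in the Python cv2 module, and the weights are just starting values to experiment with):
#include <opencv2/opencv.hpp>

int main() {
    cv::Mat src = cv::imread("res.png", cv::IMREAD_GRAYSCALE);
    cv::Mat blurred, sharpened;
    cv::GaussianBlur(src, blurred, cv::Size(0, 0), 3.0);        // low-pass copy of the image
    cv::addWeighted(src, 1.5, blurred, -0.5, 0.0, sharpened);   // src + 0.5 * (src - blurred)
    cv::imwrite("sharpened.png", sharpened);
    return 0;
}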
Rather than chaining blurs, blur once using either a Gaussian or a median blur, and experiment with the parameters to get the amount of blur you need. You could try one method after the other, but there is no reason to chain blurs of the same method.
There is an OCR example in Python that detects the characters. Save several images, apply the filter, and train an SVM; that may help you. I trained an algorithm with only a few images and the results were still acceptable. Check this link.
Wish you luck
I know the post is a bit old, but I suggest you try a library I developed some time ago. If you have a set of labelled captchas, that service would suit you. Take a look: https://github.com/punkerpunker/captcha_solver
In the README there is a section, "Train model on external data", that you might be interested in.

How to find PixelFormat for decoded bitmap using SkiaSharp

In System.Drawing we retrieve the PixelFormat from the Image object, but SkiaSharp's SKImage does not provide an API to find the PixelFormat of a decoded image. Is there any workaround to find the PixelFormat of decoded images, and how can we create an image with a PixelFormat value, equivalent to what System.Drawing.Image offers?
Short answer
There is an equivalent, but it is split into two types: SKColorType and SKAlphaType. These properties are found on the SKImageInfo type.
SKImage vs SKBitmap
Before I share the real answer, a brief intro to the differences between the two image types.
SKBitmap (raster only)
An SKBitmap is a raster-only graphic, which means that the color and alpha type information is readily available through its Info property.
SKBitmap bitmap = ...;
SKColorType colorType = bitmap.Info.ColorType;
SKAlphaType alphaType = bitmap.Info.AlphaType;
SKImage (raster or texture)
SKImage is a bit different in that it may not actually be a raster bitmap. An SKImage could be a GPU object, such as an OpenGL texture, in which case a color type does not apply. So in your case, where you are using raster bitmaps, use SKBitmap instead of SKImage.
However, there is still hope with SKImage, because a raster-based SKImage actually uses an SKBitmap under the hood. If you know your SKImage is a raster image, you can use the PeekPixels() method to get an SKPixmap object, which will also have an Info property. SKPixmap is a thin type that contains a reference to the image data, the info, and a few other properties.
To check if a certain SKImage is a texture image or a raster image, you can use the IsTextureBacked property.
SKImage image = ...;
if (!image.IsTextureBacked) {
    using (SKPixmap pixmap = image.PeekPixels()) {
        SKColorType colorType = pixmap.ColorType;
        SKAlphaType alphaType = pixmap.AlphaType;
    }
} else {
    // this is a texture on the GPU and can't be determined easily
}
Longer answer
So now the longer answer... The SKColorType and SKAlphaType types together form the equivalent of the PixelFormat.
For example:
PixelFormat.Format16bppRgb565 is equivalent to:
SKColorType.Rgb565 and SKAlphaType.Opaque
PixelFormat.Format32bppArgb is equivalent to:
SKColorType.Rgba8888 (or SKColorType.Bgra8888 on Windows) and SKAlphaType.Unpremul
PixelFormat.Format32bppPArgb is equivalent to:
SKColorType.Rgba8888 (or SKColorType.Bgra8888 on Windows) and SKAlphaType.Premul
The main difference between Rgba8888 and Bgra8888 is the platform that the app is running on. Typically, you would check the color type against the SKImageInfo.PlatformColorType as this will know what the native color type is supposed to be.

tint() on createGraphics() buffer in p5

I'm trying to change the opacity of an offscreen graphics buffer in p5, and can't figure it out.
consider:
var pg = createGraphics(...)
var img = loadImage(...)
tint(255, 50)
image(img, 0, 0) <----- this works
image(pg, 0, 0) <----- but this doesn't
working example here
tint(255, x) should leave the coloring unchanged and set the opacity to x, but it seems to have no effect. It works fine on images, though... what am I missing?
Update: It seems as though in P5 (though not in Processing), tint() does in fact work only on p5.Image objects. So, a workaround is to create a p5.Image from the offscreen graphics buffer using get() to grab pixels from the buffer (which returns an image). Confusingly, the reference article for get() also uses images, making it hard to understand what's actually happening.
An updated (working) example is here.
To reiterate, the reason this is worth doing at all is to render complex shapes (like text) only once to a buffer, then draw / manipulate that buffer as needed. This drastically reduces CPU load and speeds up the sketch.
(credit for figuring this out goes to Ian)

Comparing two images - Detect egg in a nest

I have a webcam directly over a chicken nest. This camera takes images and uploads them to a folder on a server. I'd like to detect if an egg has been laid from this image.
I'm thinking the best method would be to compare the contrast, as the egg will be much more reflective than the straw nest. (The camera has infrared, so the image is partly greyscale.)
I'd like to do this in .NET if possible.
Try resizing your image to a smaller size, maybe 10 x 10 pixels. This averages out any small disturbing details.
Const N As Integer = 10
Dim newImage As New Bitmap(N, N)
Dim fromCamera As Image = Nothing ' Get image from camera here
Using gr As Graphics = Graphics.FromImage(newImage)
    gr.SmoothingMode = SmoothingMode.HighSpeed
    gr.InterpolationMode = InterpolationMode.Bilinear
    gr.PixelOffsetMode = PixelOffsetMode.HighSpeed
    gr.DrawImage(fromCamera, New Rectangle(0, 0, N, N))
End Using
Note: you do not need high quality, but you do need good averaging. Maybe you will have to test different quality settings.
Since a pixel now covers a large area of your original image, a bright pixel is very likely part of an egg. It might also be a good idea to compare the brightness of the brightest pixel to the average image brightness, since that would reduce problems due to global illumination changes.
EDIT (in response to comment):
Your code is well structured and makes sense. Here some thoughts:
Calculate the gray value from the color value with:
Dim grayValue = c.R * 0.3 + c.G * 0.59 + c.B * 0.11
... instead of comparing the three color components separately. The different weights reflect the fact that we perceive green more strongly than red, and red more strongly than blue. Again, we do not want a beautiful thumbnail, we want good contrast, so you might want to do some experiments here as well. Maybe it is sufficient to use only the red component; depending on the lighting conditions, one color component might yield better contrast than the others. I would recommend making the gray conversion part of the thumbnail creation and writing the thumbnails to a file or to the screen. This would allow you to play with the different settings (size of the thumbnail, resizing parameters, color-to-gray conversion, etc.) and compare the (intermediate) results visually. Creating a bitmap (bmp) with the (end) result is a very good idea.
The Using statement calls Dispose() for you, even if an exception occurs before End Using (there is a hidden Try/Finally involved).

OpenGL models show grid-like lines

Shading problem solved; grid lines persist. See Update 5
I'm working on an .obj file loader for OpenGL using Objective-C.
I'm trying to get objects to load and render them with shading. I'm not using any textures, materials, etc. to modify the model besides a single light source. When I render any model, the shading is not distributed properly (as seen in the pictures below). I believe this has something to do with the normals, but I'm not sure.
These are my results:
And this is the type of effect I'm trying to achieve:
I thought the problem was that the normals I parsed from the file were incorrect, but after calculating them myself and getting the same results, I found that this wasn't true. I also thought not having GL_SMOOTH enabled was the issue, but I was wrong there too.
So I have no idea what I'm doing wrong here, so sorry if the question seems vague. If any more info is needed, I'll add it.
Update:
Link to larger picture of broken monkey head: http://oi52.tinypic.com/2re5y69.jpg
Update 2: In case there's a mistake in how I'm calculating the normals, this is what I'm doing:
Create a triangle for each group of indices.
Calculate the normals for the triangle and store it in a vector.
Ensure the vector is normalized with the following function:
static inline void normalizeVector(Vector3f *vector) {
    GLfloat vecMag = VectorMagnitude(*vector);
    if (vecMag == 0.0) {
        vector->x /= 1.0;
        vector->y /= 0.0;
        vector->z /= 0.0;
    }
    vector->x /= vecMag;
    vector->y /= vecMag;
    vector->z /= vecMag;
}
Update 3: Here's the code I'm using to create the normals:
- (void)calculateNormals {
    for (int i = 0; i < numOfIndices; i += 3) {
        Triangle triangle;
        triangle.v1.x = modelData.vertices[modelData.indices[i]*3];
        triangle.v1.y = modelData.vertices[modelData.indices[i]*3+1];
        triangle.v1.z = modelData.vertices[modelData.indices[i]*3+2];
        triangle.v2.x = modelData.vertices[modelData.indices[i+1]*3];
        triangle.v2.y = modelData.vertices[modelData.indices[i+1]*3+1];
        triangle.v2.z = modelData.vertices[modelData.indices[i+1]*3+2];
        triangle.v3.x = modelData.vertices[modelData.indices[i+2]*3];
        triangle.v3.y = modelData.vertices[modelData.indices[i+2]*3+1];
        triangle.v3.z = modelData.vertices[modelData.indices[i+2]*3+2];
        Vector3f normals = calculateNormal(triangle);
        normalizeVector(&normals);
        modelData.normals[modelData.surfaceNormals[i]*3] = normals.x;
        modelData.normals[modelData.surfaceNormals[i]*3+1] = normals.y;
        modelData.normals[modelData.surfaceNormals[i]*3+2] = normals.z;
        modelData.normals[modelData.surfaceNormals[i+1]*3] = normals.x;
        modelData.normals[modelData.surfaceNormals[i+1]*3+1] = normals.y;
        modelData.normals[modelData.surfaceNormals[i+1]*3+2] = normals.z;
        modelData.normals[modelData.surfaceNormals[i+2]*3] = normals.x;
        modelData.normals[modelData.surfaceNormals[i+2]*3+1] = normals.y;
        modelData.normals[modelData.surfaceNormals[i+2]*3+2] = normals.z;
    }
}
Update 4: Looking further into this, it seems like the .obj file's normals are surface normals, while I need the vertex normals. (Maybe)
If vertex normals are what I need, it'd be great if anybody could explain the theory behind calculating them. I tried looking it up, but I only found recipes, not the theory (e.g. "get the cross product of each face and normalize it"). If I know what I have to do, I can look up each individual step if I get stuck and won't have to keep updating this.
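(In case it helps anyone with the same question: the usual recipe for smooth vertex normals is to compute one face normal per triangle with a cross product, add that face normal to each of the triangle's three vertices, and normalize the per-vertex sums at the end. A rough, self-contained sketch of that idea, with placeholder types and names rather than the modelData structures above:)
#include <math.h>

typedef struct { float x, y, z; } Vec3;

static Vec3 vecSub(Vec3 a, Vec3 b) {
    Vec3 r; r.x = a.x - b.x; r.y = a.y - b.y; r.z = a.z - b.z; return r;
}
static Vec3 vecCross(Vec3 a, Vec3 b) {
    Vec3 r;
    r.x = a.y * b.z - a.z * b.y;
    r.y = a.z * b.x - a.x * b.z;
    r.z = a.x * b.y - a.y * b.x;
    return r;
}

void computeVertexNormals(const Vec3 *positions, const unsigned *indices,
                          int numIndices, int numVertices, Vec3 *outNormals) {
    for (int v = 0; v < numVertices; ++v) {
        outNormals[v].x = 0.0f; outNormals[v].y = 0.0f; outNormals[v].z = 0.0f;
    }
    for (int i = 0; i < numIndices; i += 3) {
        Vec3 a = positions[indices[i]];
        Vec3 b = positions[indices[i + 1]];
        Vec3 c = positions[indices[i + 2]];
        // Un-normalized face normal: larger triangles contribute more, which is usually fine.
        Vec3 faceNormal = vecCross(vecSub(b, a), vecSub(c, a));
        for (int k = 0; k < 3; ++k) {
            Vec3 *n = &outNormals[indices[i + k]];
            n->x += faceNormal.x; n->y += faceNormal.y; n->z += faceNormal.z;
        }
    }
    for (int v = 0; v < numVertices; ++v) {
        float len = sqrtf(outNormals[v].x * outNormals[v].x +
                          outNormals[v].y * outNormals[v].y +
                          outNormals[v].z * outNormals[v].z);
        if (len > 0.0f) {   // guard against unused or degenerate vertices
            outNormals[v].x /= len; outNormals[v].y /= len; outNormals[v].z /= len;
        }
    }
}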
Update 5: I re-wrote my whole loader, and got it to work, somehow. Although it shades properly, I still have those grid-like lines that you can see on my original results.
Your normalizeVector function is clearly wrong; dividing by zero is never a good idea. It should work when vecMag != 0.0, though. How are you calculating the normals? Using the cross product? What happens if you let OpenGL calculate the normals?
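For example, a guarded version might look like this (just a sketch: bail out early instead of dividing by zero):
static inline void normalizeVector(Vector3f *vector) {
    GLfloat vecMag = VectorMagnitude(*vector);
    if (vecMag == 0.0f) {
        // Degenerate vector: leave it untouched (or pick a default) rather than dividing by zero.
        return;
    }
    vector->x /= vecMag;
    vector->y /= vecMag;
    vector->z /= vecMag;
}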
It may be the direction of your normals that's at fault - if they're the wrong way around then you'll only see the polygons that are supposed to be pointing away from you.
Try calling:
glDisable(GL_CULL_FACE);
and see whether your output changes.
Given that we're seeing polygon edges I'd also check for:
glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);