In System.Drawing we retrieve the PixelFormat from the Image object, but SkiaSharp's SKImage does not provide an API to find the PixelFormat of a decoded image. Is there any workaround to find the PixelFormat of decoded images, and how can we create an image with a PixelFormat value equivalent to System.Drawing.Image?
Short answer
There is an equivalent, but it is split across two types: SKColorType and SKAlphaType. These properties are found on the SKImageInfo type.
SKImage vs SKBitmap
Before I share the real answer, a brief intro to the differences between the two image types.
SKBitmap (raster only)
An SKBitmap is a raster-only graphic, which means the color and alpha type information is readily available through its Info property.
SKBitmap bitmap = ...;
SKColorType colorType = bitmap.Info.ColorType;
SKAlphaType alphaType = bitmap.Info.AlphaType;
SKImage (raster or texture)
SKImage is a bit different in that it may not actually be a raster bitmap. An SKImage could be a GPU object, such as an OpenGL texture, in which case the color type does not apply. So in your case, where you are using raster bitmaps, use SKBitmap instead of SKImage.
However, there is still hope with SKImage, because a raster-based SKImage actually uses an SKBitmap under the hood. If you know your SKImage is a raster image, you can use the PeekPixels() method to get an SKPixmap object, which will also have an Info property. SKPixmap is a thin type that contains a reference to the image data, the info, and a few other properties.
To check if a certain SKImage is a texture image or a raster image, you can use the IsTextureBacked property.
SKImage image = ...;
if (!image.IsTextureBacked) {
    using (SKPixmap pixmap = image.PeekPixels()) {
        SKColorType colorType = pixmap.ColorType;
        SKAlphaType alphaType = pixmap.AlphaType;
    }
} else {
    // this is a texture on the GPU and can't be determined easily
}
Longer answer
So now the longer answer... The SKColorType and SKAlphaType types together form the equivalent of the PixelFormat.
For example:
PixelFormat.Format16bppRgb565 is equivalent to:
SKColorType.Rgb565 and SKAlphaType.Opaque
PixelFormat.Format32bppArgb is equivalent to:
SKColorType.Rgba8888 (or SKColorType.Bgra8888 on Windows) and SKAlphaType.Unpremul
PixelFormat.Format32bppPArgb is equivalent to:
SKColorType.Rgba8888 (or SKColorType.Bgra8888 on Windows) and SKAlphaType.Premul
The main difference between Rgba8888 and Bgra8888 is the platform the app is running on. Typically, you would check the color type against SKImageInfo.PlatformColorType, which reflects what the native color type is supposed to be.
Related
I'm trying to solve some simple captchas using OpenCV and pytesseract. Some of the captcha samples are:
I tried to remove the noisy dots with some filters:
import cv2
import numpy as np
import pytesseract
img = cv2.imread(image_path)
_, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, np.ones((4, 4), np.uint8), iterations=1)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.GaussianBlur(img, (5, 5), 0)
cv2.imwrite('res.png', img)
print(pytesseract.image_to_string('res.png'))
The resulting transformed images are:
Unfortunately, pytesseract recognizes only the first captcha correctly. Is there a better transformation?
Final Update:
As @Neil suggested, I tried to remove noise by detecting connected pixels. To find connected pixels, I found a function named connectedComponentsWithStats, which detects connected pixels and assigns each group (component) a label. By finding connected components and removing the ones with a small number of pixels, I managed to get better overall detection accuracy with pytesseract.
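For reference, a minimal sketch of that approach (the inversion threshold of 127 and the min_size cutoff are assumptions to tune per captcha set):
import cv2
import numpy as np

img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
# Invert so the dark glyphs become the foreground components
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)

cleaned = np.zeros_like(binary)
min_size = 20  # hypothetical noise cutoff, tune per resolution
for label in range(1, num_labels):  # label 0 is the background
    if stats[label, cv2.CC_STAT_AREA] >= min_size:
        cleaned[labels == label] = 255

# Back to black-on-white before handing the result to pytesseract
cv2.imwrite('res.png', cv2.bitwise_not(cleaned))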
And here are the new resulting images:
I've taken a much more direct approach to filtering ink splotches from PDF documents. I won't share the whole thing, as it's a lot of code, but here is the general strategy I adopted (a sketch follows the list):
Use Python Pillow library to get an image object where you can manipulate pixels directly.
Binarize the image.
Find all connected pixels and count how many pixels are in each group. You can do this using the flood-fill ("minesweeper") algorithm, which is easy to search for.
Set some threshold number of pixels that all legitimate letters are expected to exceed. This will depend on your image resolution.
Replace all black pixels in groups below the threshold with white pixels.
Convert back to image.
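A rough Pillow sketch of that strategy (the binarization threshold of 128 and min_size=20 are placeholder values, and black ink on a white background is assumed):
from collections import deque
from PIL import Image

def remove_small_blobs(path, min_size=20):
    img = Image.open(path).convert('L')  # grayscale
    w, h = img.size
    px = img.load()
    ink = [[px[x, y] < 128 for x in range(w)] for y in range(h)]  # binarize
    seen = [[False] * w for _ in range(h)]
    out = Image.new('L', (w, h), 255)  # start all white
    for y in range(h):
        for x in range(w):
            if not ink[y][x] or seen[y][x]:
                continue
            # Flood fill (the "minesweeper" step) to collect one connected group
            group, queue = [], deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                group.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and ink[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(group) >= min_size:  # keep only letter-sized groups
                for gx, gy in group:
                    out.putpixel((gx, gy), 0)
    return out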
Your final output image is too blurry. To enhance the performance of pytesseract you need to sharpen it.
Sharpening is not as easy as blurring, but there exist a few code snippets / tutorials (e.g. http://datahacker.rs/004-how-to-smooth-and-sharpen-an-image-in-opencv/).
Rather than chaining blurs, blur once using either a Gaussian or a median blur, and experiment with the parameters to get the blur amount you need. Perhaps try one method after the other, but there is no reason to chain blurs of the same method.
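For example, one conventional 3x3 sharpening kernel applied after a single blur pass (a sketch; the file name is taken from the question's script):
import cv2
import numpy as np

img = cv2.imread('res.png')
blurred = cv2.medianBlur(img, 3)  # one blur pass, not a chain
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=np.float32)
sharpened = cv2.filter2D(blurred, -1, kernel)  # re-sharpen the edges
cv2.imwrite('sharpened.png', sharpened)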
There is an OCR example in Python that detects the characters. Save several images, apply the filter, and train an SVM algorithm; that may help you. I trained an algorithm with only a few images, and the results were acceptable. Check this link.
Wish you luck
I know the post is a bit old, but I suggest you try this library I developed some time ago. If you have a set of labelled captchas, that service would suit you. Take a look: https://github.com/punkerpunker/captcha_solver
In README there is a section "Train model on external data" that you might be interested in.
I am new to machine learning.
I want to prepare a document with a signature at the bottom of it.
For this purpose I am taking a photo of the user's signature for placement in the document.
How can I use machine learning to extract only the signature part from the image and place it on the document?
Input example:
Output expected in gif format:
Extract the green image plane. Then take the complement of the gray value of every pixel as the transparency coefficient. Then you can perform compositing onto the destination.
https://en.wikipedia.org/wiki/Alpha_compositing
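A sketch of that idea with OpenCV/NumPy, assuming the signature photo and the destination document crop have the same size (the file names are placeholders):
import cv2
import numpy as np

sig = cv2.imread('signature.jpg').astype(np.float32) / 255.0
doc = cv2.imread('document.jpg').astype(np.float32) / 255.0

alpha = 1.0 - sig[:, :, 1]       # complement of the green plane: dark ink -> opaque
alpha = alpha[:, :, np.newaxis]  # broadcast over the B, G, R channels

ink = np.zeros_like(doc)                 # composite the signature as black ink
out = alpha * ink + (1.0 - alpha) * doc  # standard "over" compositing
cv2.imwrite('composited.png', (out * 255).astype(np.uint8))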
A simple image-processing technique using OpenCV should work. The idea is to obtain a binary image, then bitwise-and the image to remove the non-signature details. Here are the results:
Input image
Binary image
Result
Code
import cv2
# Load image, convert to grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Bitwise-and and color background white
result = cv2.bitwise_and(image, image, mask=thresh)
result[thresh==0] = [255,255,255]
cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
Please do research before posting questions like these. A simple Google search for "extract signature from image python" gives many results.
Git Repo
Git Repo
Stack Overflow
There are many other such alternatives. Please have a look and try a few approaches.
If you still have some questions or doubts, then post the approach you have taken and a discussion is warranted.
So I'm still trying to figure out color spaces for render textures and how to create images without color banding. Within my gbuffer I use VK_FORMAT_R8G8B8A8_UNORM with VK_IMAGE_TILING_OPTIMAL for my albedo texture, and VK_FORMAT_A2B10G10R10_UNORM_PACK32 with VK_IMAGE_TILING_OPTIMAL for my normal and emission render textures. For my brightness texture (which holds pixel values that are considered "bright" within a scene) and my glow render texture (the final texture for bloom effects to be added later onto the final scene), I use VK_FORMAT_R8G8B8A8_SRGB with VK_IMAGE_TILING_OPTIMAL (although looking at this, I should probably make my bright and final textures R16G16B16A16 float formats instead). What I got was definitely not what I had in mind:
Changing the tiling for my normal, emission, glow, and brightness textures to VK_IMAGE_TILING_LINEAR, however, got me nice results instead, but at the cost of performance:
The nice thing about it though (and sorry about the weird border on the top left, was cropping over on MS paint...), is that the image doesn't suffer from color banding, such as when instead of using these formats for my textures, I use VK_FORMAT_R8G8B8A8_UNORM with VK_IMAGE_TILING_OPTIMAL:
Here you can see banding occurring on the top left of the helmet, as well as underneath it (where the black tubes are). Of course, I've heard about avoiding VK_IMAGE_TILING_LINEAR from this post.
In general, I'm having trouble figuring out the best way to avoid using VK_IMAGE_TILING_LINEAR with sRGB textures. I would still like to keep the nice crisp images that sRGB gives me, but I am unsure how to solve this issue. The link might actually have the solution, but I'm not sure whether there's a way to apply it to my gbuffer.
I would also like to clarify that VK_IMAGE_TILING_OPTIMAL works fine on Nvidia-based GPUs (well, tested on a GTX 870M), which complain about using VK_IMAGE_TILING_LINEAR with the sRGB format; Intel-based GPUs, however, work fine with VK_IMAGE_TILING_LINEAR and produce output like the first image at the top of this post when using VK_IMAGE_TILING_OPTIMAL.
The engine is custom made, feel free to check it out in this link
If you fancy some code, I use a function called SetUpRenderTextures() inside Engine/Renderer/Private/Renderer.cpp file, under line 1396:
VkImageCreateInfo cImageInfo = { };
VkImageViewCreateInfo cViewInfo = { };
// TODO(): Need to make this more adaptable, as intel chips have trouble with srgb optimal tiling.
cImageInfo.sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
cImageInfo.usage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT | VK_IMAGE_USAGE_SAMPLED_BIT;
cImageInfo.imageType = VK_IMAGE_TYPE_2D;
cImageInfo.format = VK_FORMAT_R8G8B8A8_UNORM;
cImageInfo.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
cImageInfo.mipLevels = 1;
cImageInfo.extent.depth = 1;
cImageInfo.arrayLayers = 1;
cImageInfo.extent.width = m_pWindow->Width();
cImageInfo.extent.height = m_pWindow->Height();
cImageInfo.samples = VK_SAMPLE_COUNT_1_BIT;
cImageInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
cImageInfo.tiling = VK_IMAGE_TILING_OPTIMAL;
cViewInfo.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
cViewInfo.format = VK_FORMAT_R8G8B8A8_UNORM;
cViewInfo.image = nullptr; // No need to set the image, texture->Initialize() handles this for us.
cViewInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;
cViewInfo.subresourceRange = { };
cViewInfo.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
cViewInfo.subresourceRange.baseArrayLayer = 0;
cViewInfo.subresourceRange.baseMipLevel = 0;
cViewInfo.subresourceRange.layerCount = 1;
cViewInfo.subresourceRange.levelCount = 1;
gbuffer_Albedo->Initialize(cImageInfo, cViewInfo);
gbuffer_Emission->Initialize(cImageInfo, cViewInfo);
cImageInfo.format = VK_FORMAT_R8G8B8A8_SRGB;
cViewInfo.format = VK_FORMAT_R8G8B8A8_SRGB;
cImageInfo.tiling = VK_IMAGE_TILING_OPTIMAL;
GlowTarget->Initialize(cImageInfo, cViewInfo);
// It's probably best that these be 64bit float formats as well...
pbr_Bright->Initialize(cImageInfo, cViewInfo);
pbr_Final->Initialize(cImageInfo, cViewInfo);
cImageInfo.format = VK_FORMAT_A2B10G10R10_UNORM_PACK32;
cImageInfo.tiling = VK_IMAGE_TILING_OPTIMAL;
cViewInfo.format = VK_FORMAT_A2B10G10R10_UNORM_PACK32;
gbuffer_Normal->Initialize(cImageInfo, cViewInfo);
gbuffer_Position->Initialize(cImageInfo, cViewInfo);
So yes, the rundown: how do I avoid using linear image tiling for sRGB textures? Is this a hardware-specific thing, and is it mandatory? Also, I apologize for any ignorance on my part about this subject.
Thank you :3
Support for this combination is mandatory, so the corruption in your first image is either an application or a driver bug.
So VK_FORMAT_R8G8B8A8_SRGB is working with VK_IMAGE_TILING_OPTIMAL on your Nvidia GPU but not on your Intel GPU? But VK_FORMAT_R8G8B8A8_SRGB does work with VK_IMAGE_TILING_LINEAR on the Intel GPU?
If so, that sounds like you've got some missing or incorrect image layout transitions. Intel is more sensitive to getting those right than Nvidia is. Do the validation layers complain about anything? You need to make sure the image is VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL when rendering to it, and VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL when sampling from it.
I want to create an "array_like" QImage subclass that can be passed to numpy.array().
I'd like to avoid using PIL as a substitute; the whole point of this is to avoid the dependency on PIL. Besides, constantly converting between QImage and the PIL Image is impractical for my program.
I find the documentation cryptic, and after reading it I'm still confused about how to emulate the array interface. As the numpy documentation states, to qualify as an "array_like" object, it needs the __array_interface__ attribute, which is a dictionary with five keys. However, I've never dealt with types, buffers, and memory before; if someone could explain how to solve this problem it would be much appreciated.
I'm using Python 3.3 and PySide 1.1.2.
Thanks to all who reply!
It's easier to just use the buffer object returned by QImage.bits() together with np.frombuffer():
import numpy as np

def qimage2array(q_image):
    height = q_image.height()
    width = q_image.width()
    channels = q_image.depth() // 8  # bytes per pixel
    # QImage rows are padded to 32-bit boundaries: reshape by bytesPerLine, then crop
    arr = np.frombuffer(q_image.bits(), dtype=np.uint8)
    arr = arr.reshape(height, q_image.bytesPerLine())[:, :width * channels]
    return arr.reshape(height, width, channels)
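A quick usage sketch (the file name is a placeholder; converting to a 32-bit format first guarantees four bytes per pixel):
from PySide.QtGui import QImage

q_image = QImage('photo.png').convertToFormat(QImage.Format_ARGB32)
arr = qimage2array(q_image)
print(arr.shape)  # (height, width, 4)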
How can I do a basic face alignment on a 2-dimensional image with the assumption that I have the position/coordinates of the mouth and eyes.
Is there any algorithm that I could implement to correct the face alignment on images?
Face (or image) alignment refers to aligning one image (or face in your case) with respect to another (or a reference image/face). It is also referred to as image registration. You can do that using either appearance (intensity-based registration) or key-point locations (feature-based registration). The second category stems from image motion models where one image is considered a displaced version of the other.
In your case the landmark locations (3 points for the eyes and mouth?) provide a good reference set for straightforward feature-based registration. Assuming you have the locations of a set of points in both of the 2D images, x_1 and x_2, you can estimate a similarity transform (rotation, translation, scaling), i.e. a planar 2D transform S that maps x_1 to x_2. You can additionally add reflection to that, though for faces this will most likely be unnecessary.
Estimation can be done by forming the normal equations and solving a linear least-squares (LS) problem for the x_1 = S x_2 system using linear regression. A 2D similarity transform has 4 unknown parameters (1 rotation, 2 translation, 1 scaling), and each point correspondence provides 2 equations, so 2 points suffice to solve for them. The solution to the above LS problem can be obtained through a Direct Linear Transform (e.g. by applying SVD or a matrix pseudo-inverse). For a sufficiently large number of reference points (e.g. automatically detected ones), a RANSAC-type method can be used for point filtering and outlier removal (though this is not your case here).
After estimating S, apply image warping on the second image to get the transformed grid (pixel) coordinates of the entire image 2. The transform will change pixel locations but not their appearance. Unavoidably some of the transformed regions of image 2 will lie outside the grid of image 1, and you can decide on the values for those null locations (e.g. 0, NaN etc.).
For more details: R. Szeliski, "Image Alignment and Stitching: A Tutorial" (Section 4.3 "Geometric Registration")
In OpenCV see: Geometric Image Transformations, e.g. cv::getRotationMatrix2D cv::getAffineTransform and cv::warpAffine. Note though that you should estimate and apply a similarity transform (special case of an affine) in order to preserve angles and shapes.
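A sketch of this feature-based route with OpenCV (the landmark coordinates, file names, and output size below are made-up placeholders):
import cv2
import numpy as np

# Made-up landmarks; replace with your detected eye/mouth positions.
x1 = np.float32([[38, 45], [72, 44], [55, 70]])  # landmarks in the reference image
x2 = np.float32([[41, 52], [76, 48], [60, 76]])  # same landmarks in the image to align

# 4-DOF similarity (rotation, uniform scale, translation), least-squares fit
S, _ = cv2.estimateAffinePartial2D(x2, x1)

img2 = cv2.imread('face2.jpg')
ref_w, ref_h = 120, 120  # grid size of the reference image (assumed)
# Out-of-grid pixels are filled with the border value (0 by default)
aligned = cv2.warpAffine(img2, S, (ref_w, ref_h))
cv2.imwrite('aligned.png', aligned)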
For the face there is a lot of variability in feature points, so it won't be possible to fit all feature points perfectly with just an affine transform. The only way to align all the points perfectly is to warp the image given the points. Basically, you can triangulate the image given the points and do an affine warp of each triangle to get a warped image where all the points are aligned (see the sketch below).
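A sketch of that triangulate-and-warp idea using scikit-image's PiecewiseAffineTransform, which triangulates the control points internally (the landmark arrays and file names are placeholders; the warp is only defined inside the convex hull of the points, so include boundary points):
import numpy as np
from skimage import io
from skimage.transform import PiecewiseAffineTransform, warp

# Placeholder (x, y) landmarks, including corner-ish points to cover the face region
points_ref = np.array([[38, 45], [72, 44], [55, 70], [20, 20], [100, 100]], dtype=float)
points_src = np.array([[41, 52], [76, 48], [60, 76], [22, 24], [104, 103]], dtype=float)

tform = PiecewiseAffineTransform()
tform.estimate(points_ref, points_src)  # maps reference coords to source coords

img = io.imread('face2.jpg')
# warp() uses tform as the inverse map: output pixels pull from the source image
aligned = warp(img, tform, output_shape=img.shape[:2])
io.imsave('aligned.png', (aligned * 255).astype(np.uint8))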
Face alignment could be handled based on just the eye positions.
Here, OpenCV, Dlib and MTCNN offer face and eye detection. Besides, deepface is a Python-based framework that wraps those detectors and offers out-of-the-box detection and alignment functions.
The detectFace function applies detection and alignment, respectively, in the background.
#!pip install deepface
from deepface import DeepFace
backends = ['opencv', 'ssd', 'dlib', 'mtcnn']
DeepFace.detectFace("img.jpg", detector_backend = backends[0])
Besides, you can apply detection and alignment manually.
from deepface.commons import functions
import matplotlib.pyplot as plt
img = functions.load_image("img.jpg")
backends = ['opencv', 'ssd', 'dlib', 'mtcnn']
detected_face = functions.detect_face(img = img, detector_backend = backends[3])
plt.imshow(detected_face)
aligned_face = functions.align_face(img = img, detector_backend = backends[3])
plt.imshow(aligned_face)
processed_img = functions.detect_face(img = aligned_face, detector_backend = backends[3])
plt.imshow(processed_img)
There's a section Aligning Face Images in OpenCV's Face Recognition guide:
http://docs.opencv.org/trunk/modules/contrib/doc/facerec/facerec_tutorial.html#aligning-face-images
The script aligns given images at the eyes. It's written in Python, but should be easy to translate to other languages. I know of a C# implementation by Sorin Miron:
http://code.google.com/p/stereo-face-recognition/