Grayscale image using OpenCV from numpy array fails - numpy

I have a numpy array holding a black-and-white image with the following shape:
print(img.shape)
(28, 112)
When I try to grayscale the image with OpenCV (in order to extract contours afterwards) using the following steps:
#grayscale the image
grayed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
#threshold the image
thresh = cv2.threshold(grayed, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
I get the following error:
<ipython-input-178-7ebff17d1c18> in get_digits(img)
6
7 #grayscale the image
----> 8 grayed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
9
10
error: C:\projects\opencv-python\opencv\modules\imgproc\src\color.cpp:11073: error: (-215) depth == 0 || depth == 2 || depth == 5 in function cv::cvtColor
The OpenCV error message contains too little information to tell what is wrong.

Here is working code for the approach you were trying:
img = np.stack((img,) * 3, -1)  # replicate the single channel to get a 3-channel image
img = img.astype(np.uint8)      # cvtColor and Otsu both require 8-bit data
grayed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(grayed, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
A simpler way of getting the same result is to invert the image yourself:
img = 255 - img  # invert the grayscale image directly
thresh = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU)[1]
As you discovered, different operations require the image to be in different formats.
cv2.cvtColor with COLOR_BGR2GRAY expects a three-channel image, which is why the stacking step above is needed before calling it.
cv2.THRESH_OTSU works on grayscale images, so a single channel is fine for the thresholding step.
Since your image was already grayscale from the start, you couldn't convert it from color to grayscale, nor did you really need to. I assume you were trying to invert the image, but that is easy enough to do yourself with 255 - img.
At one point you tried cv2.THRESH_OTSU with floating-point values, but Otsu thresholding requires an 8-bit image with integer values between 0 and 255 (hence the astype(np.uint8) above).
If OpenCV had more user-friendly error messages, it would really help with issues like these.
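Putting it together, here is a minimal end-to-end sketch of the simpler route, assuming img is the (28, 112) grayscale array from the question (scale it first if your floats are in [0, 1]) and OpenCV 4.x, where findContours returns two values:
import cv2
import numpy as np
# Otsu thresholding requires a single-channel 8-bit image
img8 = np.clip(img, 0, 255).astype(np.uint8)
# invert so the dark digits become white foreground, then Otsu-threshold
thresh = cv2.threshold(255 - img8, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# contours can now be extracted from the binary image
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)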

Related

remove background using u2net produced mask

I am trying to remove the background from an image. For this purpose I am using U2NET. I am writing the network structure in Tensorflow by following this repository. I have changed the model architecture according to my needs. It takes a 96x96 image and produces 7 masks. I take the 1st mask (out of 7) and multiply it against all channels of the original 96x96 image.
The code that predicts 7 masks is:
import os
import numpy as np
from copy import deepcopy
from PIL import Image

img = Image.open(os.path.join('DUTS-TE', 'DUTS-TE-Image', test_x_names[90]))
copied = deepcopy(img)
copied = copied.resize((96, 96))
copied = np.expand_dims(copied, axis=0)
preds = model.predict(copied)
preds = np.squeeze(preds)
"preds[0]" is:
predicted mask
Multiplying the mask against the original image produces:
masked image and corresponding code is ("img2" is original image):
from matplotlib import pyplot as plt

img2 = np.asarray(img2)
immg = np.zeros((96, 96, 3), np.uint8)
for i in range(0, 3):
    immg[:, :, i] = img2[:, :, i] * preds[0]
plt.imshow(immg)
plt.show()
If I binarize the mask and then multiply it against the original image, it produces:
[binarized-mask result image]
The corresponding code is:
from sklearn.preprocessing import binarize  # assuming scikit-learn's binarize

frame = binarize(preds[0, :, :], threshold=0.5)
img2 = np.asarray(img2)
immg = np.zeros((96, 96, 3), np.uint8)
for i in range(0, 3):
    immg[:, :, i] = img2[:, :, i] * frame
plt.imshow(immg)
plt.show()
Multiplying the original image with the mask or the binarized mask does not segment the foreground properly from the background. So, what can be done? Am I missing something?
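As an aside, the per-channel loops above can be replaced by NumPy broadcasting; a minimal sketch, assuming preds[0] is a 96x96 float mask in [0, 1] and img2 a 96x96x3 uint8 image:
import numpy as np
# broadcast the mask across the channel axis instead of looping
immg = (np.asarray(img2) * preds[0][:, :, None]).astype(np.uint8)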

Pytesseract OCR with different colors

I am trying to read this type of image with pytesseract, but I have an issue with the part in yellow: the color transformation that works for the other characters does not work for those in the yellow boxes. I also want to keep the numbers of each row well separated.
Any idea how I could manage that?
Thanks
import cv2
import pytesseract

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# invert = 255 - thresh

# OCR
data = pytesseract.image_to_string(thresh, config="--psm 6")
print(data)

cv2.imshow("thresh", thresh)
# cv2.imshow("invert", invert)
cv2.waitKey()
Returns: '> SKAPOVALOY 4 (15\nRINDERKNECH 6 [EY 15\n'
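One possible direction (not from the original thread): isolate the yellow boxes with an HSV range mask and flip the polarity of just those regions before thresholding. A sketch, assuming image is the BGR screenshot, with hue bounds that are rough guesses and need tuning:
import cv2
import numpy as np
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
# approximate yellow range; these bounds are assumptions
yellow = cv2.inRange(hsv, (20, 100, 100), (35, 255, 255))
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray[yellow > 0] = 255 - gray[yellow > 0]  # invert only inside the yellow boxes
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]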

Is focal length in pixel units a linear measurement

I have a pan-tilt-zoom camera (its focal length changes over time). Its base focal length (e.g., the focal length at time point 0) is unknown. However, it is possible to track the change in focal length from frame to frame based on some known constraints and assumptions (doing SLAM).
If I assume an arbitrary focal length (in pixel units), for example 1000 pixels, and then track the new focal lengths frame by frame, would I get relatively correct results? Would the resulting focal length in each frame be correct up to scale with respect to the ground-truth focal length?
For pan and tilt, assuming 0 at the start would be valid: although it is not correct, the estimated pan-tilt values will be correct up to an offset. However, I suspect the estimated focal length will not be correct even up to scale or offset. Is that right?
For a quick short answer: if the pan-tilt-zoom camera is approximated as a thin lens, then the relation between object distance z, image distance z', and focal length f is the thin-lens equation 1/f = 1/z + 1/z'.
This is just an approximation, not fully correct. For more precise calculations, see the camera matrix: focal length is an intrinsic parameter of the camera matrix. Even if it is not known, it can be estimated with a camera-calibration method such as DLT, Zhang's method, or RANSAC. Once you have the camera matrix, the focal length is just a small part of it, and you get many more useful things along with it.
OpenCV has a built-in implementation of Zhang's method. (Look at this documentation for explanations, but its code is old and unusable; new, up-to-date code is below.) You need to take some pictures of a chessboard with your camera. Here is some helper code:
import cv2
from matplotlib import pyplot as plt
import numpy as np
from glob import glob

x, y = np.meshgrid(range(6), range(8))
world_points = np.hstack((x.reshape(48, 1), y.reshape(48, 1), np.zeros((48, 1)))).astype(np.float32)

_3d_points = []
_2d_points = []

img_paths = glob('./*.JPG')  # get paths of all checkerboard images
for path in img_paths:
    im = cv2.imread(path)
    ret, corners = cv2.findChessboardCorners(im, (6, 8))
    if ret:  # add points only if the checkerboard was correctly detected
        _2d_points.append(corners)       # append current 2D points
        _3d_points.append(world_points)  # 3D points are always the same

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(_3d_points, _2d_points, (im.shape[1], im.shape[0]), None, None)

print("Ret:\n", ret)
print("Mtx:\n", mtx)
print("Dist:\n", dist)
You might also want undistortion (correcting for radial distortion):
# termination criteria for corner refinement
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

# prepare object points, like (0,0,0), (1,0,0), (2,0,0) ...., (5,7,0)
objp = np.zeros((6*8, 3), np.float32)
objp[:, :2] = np.mgrid[0:6, 0:8].T.reshape(-1, 2)

# arrays to store object points and image points from all the images
objpoints = []  # 3d points in real world space
imgpoints = []  # 2d points in image plane

for fname in img_paths:
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # find the chessboard corners
    ret, corners = cv2.findChessboardCorners(gray, (6, 8), None)
    # if found, add object points and image points (after refining them)
    if ret == True:
        objpoints.append(objp)
        corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        imgpoints.append(corners)
        if 'IMG_5456.JPG' in fname:
            plt.figure(figsize=(20, 10))
            img_vis = img.copy()
            cv2.drawChessboardCorners(img_vis, (6, 8), corners, ret)
            plt.imshow(img_vis)
            plt.show()

# calibration
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

# reprojection error
tot_error = 0
for i in range(len(objpoints)):
    imgpoints2, _ = cv2.projectPoints(objpoints[i], rvecs[i], tvecs[i], mtx, dist)
    error = cv2.norm(imgpoints[i], imgpoints2, cv2.NORM_L2) / len(imgpoints2)
    tot_error += error
print("Mean reprojection error: ", tot_error / len(objpoints))

# undistort; the refined camera matrix and valid ROI come from getOptimalNewCameraMatrix
h, w = img.shape[:2]
newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w, h), 1, (w, h))
mapx, mapy = cv2.initUndistortRectifyMap(mtx, dist, None, newcameramtx, (w, h), 5)
dst = cv2.remap(img, mapx, mapy, cv2.INTER_LINEAR)

# crop the image to the valid region of interest
x, y, w, h = roi
dst = dst[y:y+h, x:x+w]

plt.figure(figsize=(20, 10))
#cv2.drawChessboardCorners(dst, (6,8), corners, ret)
plt.imshow(dst)
plt.show()

How to fix this issue? cv2.error: OpenCV(4.1.2) ... error: (-215:Assertion failed) !ssize.empty() in function 'cv::resize'

I am trying to rotate multiple images in a folder, but I get this error when I set fx and fy greater than 0.2 in the resize function:
(cv2.error: OpenCV(4.1.2) ... error: (-215:Assertion failed) !ssize.empty() in function 'cv::resize')
However, when I rotate a single image with fx and fy equal to 0.5, it works perfectly fine.
Is there a way to fix this issue? It is very tedious to augment images one by one. Also, the images that the attached code rotates with fx and fy equal to 0.2 come out with undesirable dimensions: the photos are very small and their quality is reduced.
The part of the code that rotates multiple images is given below:
import os
import cv2

a = 0  # running index for output filenames
for imag in os.listdir(source_folder):
    img = cv2.imread(os.path.join(source_folder, imag))
    img = cv2.resize(img, (0, 0), fx=0.5, fy=0.5)
    width = img.shape[1]
    height = img.shape[0]
    M = cv2.getRotationMatrix2D((width/2, height/2), 5, 1.0)
    rotated_img = cv2.warpAffine(img, M, (width, height))
    cv2.imwrite(os.path.join(destination_right_folder, "v4rl" + str(a) + '.jpg'), rotated_img)
    #cv2.imshow("rotated_right", rotated_img)
    #cv2.waitKey(0)
    a += 1
Add a check after you read the image to see if it is None:
img = cv2.imread(os.path.join(source_folder,imag))
if img is None: continue
The error happens when cv2.resize() is called on an empty image: some of the files being read are probably not images, so cv2.imread() returns None for them.
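A related guard, sketched here with a hypothetical extension whitelist, is to skip non-image files before reading them at all:
import os
valid_ext = ('.jpg', '.jpeg', '.png')  # hypothetical whitelist; extend as needed
image_names = [f for f in os.listdir(source_folder) if f.lower().endswith(valid_ext)]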

How to exchange colors between 2 images?

I have an image of spectacles with a black background that I need to overlay onto a face image. To do so, I take the part of the face image with the same shape as the spectacles image, and copy the face colors onto the black parts of the spectacles image; then this small patch can be put back. But I am not able to pick the correct colors from the face image for the spectacles image. I tried this:
specs[np.where((hmd == [0,0,0,0]).all(axis=2))] = sub_face
specs image: [image]
face image: [image]
I need to put a resized specs image onto the face. I have resized the specs image and I also know the position where I will place the specs on the face image. I just need to remove the black background from the specs and fill in the relevant face colors so that it looks like the specs are on the face in a natural way.
The code I am using:
import cv2
import numpy as np

specs = cv2.imread("rot_h0v0z0.png")
face = cv2.imread("~/Downloads/celebA/000001.png")
specs = cv2.resize(specs, None, fx=0.3, fy=0.3, interpolation=cv2.INTER_AREA)
sub_face = face[0:specs.shape[0], 0:specs.shape[1]]
specs[np.where((hmd == [0, 0, 0, 0]).all(axis=2))] = sub_face  # this is the line that fails to pick the right colors
Was able to solve it; it turned out pretty simple :P
(b, g, r) = cv2.split(specs)
indices = np.where(b == [0])
for i, j in zip(indices[0], indices[1]):
    specs[i, j] = sub_face[i, j]
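An equivalent vectorized variant, as a sketch assuming specs and sub_face have the same height and width (this one checks all three channels for black rather than just the blue one):
import numpy as np
# copy face pixels into every location where the specs image is pure black
mask = (specs == 0).all(axis=2)
specs[mask] = sub_face[mask]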