I am trying to remove the background from an image using U2NET. I wrote the network in TensorFlow by following this repository and changed the model architecture to suit my needs: it takes a 96x96 image and produces 7 masks. I take the first of the 7 masks and multiply it with every channel of the original 96x96 image.
The code that predicts the 7 masks is:
img = Image.open(os.path.join('DUTS-TE','DUTS-TE-Image', test_x_names[90]))
copied = deepcopy(img)
copied = copied.resize((96,96))
copied = np.expand_dims(copied,axis=0)
preds = model.predict(copied)
preds = np.squeeze(preds)
"preds[0]" is:
(image: predicted mask)
Multiplying the mask with the original image produces:
(image: masked image)
The corresponding code is ("img2" is the original image):
img2 = np.asarray(img2)
immg = np.zeros((96,96,3), np.uint8)
for i in range(0,3):
    immg[:,:,i] = img2[:,:,i] * preds[0]
If I binarize the mask and then multiply it with the original image, it produces:
(image: result with the binarized mask)
The corresponding code is:
frame = binarize(preds[0,:,:], threshold = 0.5)
img2 = np.asarray(img2)
immg = np.zeros((96,96,3), np.uint8)
for i in range(0,3):
    immg[:,:,i] = img2[:,:,i] * frame
Multiplying the original image with either the mask or the binarized mask does not separate the foreground from the background properly. What can be done? Am I missing something?
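One thing worth ruling out is a dtype or shape problem in the multiplication itself. The sketch below does not fix the model; it only shows a broadcast-based way to apply a soft or binarized mask, assuming the mask is a float array in [0, 1] with the same height and width as the image. The function name `apply_mask` and the synthetic arrays are made up for illustration.

```python
import numpy as np

def apply_mask(image, mask, threshold=None):
    """Multiply an HxWx3 uint8 image by an HxW mask with values in [0, 1].

    If threshold is given, the mask is binarized first (hypothetical helper).
    """
    image = np.asarray(image)
    mask = np.asarray(mask, dtype=np.float32)
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must have the same height and width")
    if threshold is not None:
        mask = (mask > threshold).astype(np.float32)
    # Add a trailing axis so the mask broadcasts over the 3 colour channels.
    masked = image.astype(np.float32) * mask[..., np.newaxis]
    return np.clip(masked, 0, 255).astype(np.uint8)

# Synthetic stand-ins for the resized image and for preds[0].
image = np.full((96, 96, 3), 200, dtype=np.uint8)
mask = np.zeros((96, 96), dtype=np.float32)
mask[24:72, 24:72] = 0.5

soft = apply_mask(image, mask)                   # soft mask
hard = apply_mask(image, mask, threshold=0.25)   # binarized mask
```

With real data, `image` would be the 96x96 version of the original picture, so that its size matches the predicted mask.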
I am trying to read this type of image with pytesseract, but I have an issue with the part in yellow: the color transformation that works for the other characters does not work for those in the yellow boxes. I also want to keep the numbers of each row well separated.
Any idea how I could manage that?
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# invert = 255 - thresh
data = pytesseract.image_to_string(thresh, config="--psm 6")
cv2.imshow("thresh", thresh)
# cv2.imshow("invert", invert)
Returns: '> SKAPOVALOY 4 (15\nRINDERKNECH 6 [EY 15\n'
I have a pan-tilt-zoom camera whose focal length changes over time. Its base focal length (e.g. the focal length at time 0) is unknown. However, the change in focal length from one frame to the next can be tracked using some known constraints and assumptions (by running SLAM).
Suppose I assume an arbitrary initial focal length (in pixels), for example 1000 pixels, and then track the new focal lengths frame by frame. Would the results be correct in a relative sense? That is, would the focal length estimated in each frame be correct up to scale with respect to the ground-truth focal length?
For pan and tilt, assuming 0 at the start would be valid: although it is not the true value, the estimated pan and tilt values will be correct up to an offset. However, I suspect the estimated focal length will not be correct even up to a scale or an offset. Is that right?
For a quick short answer - if pan-tilt-zoom camera is approximated as a thin lens, then this is the relation between distance (z) and focal length (f):
This is only an approximation. For more precise calculations, see the camera matrix, in which the focal length is an intrinsic parameter. Even if it is not known, it can be estimated with a camera calibration method such as DLT, Zhang's method, and RANSAC. Once you have the camera matrix, the focal length is just a small part of it; you get many more useful things along with it.
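To make the role of the focal length in the camera matrix concrete, here is a minimal pinhole-model sketch. It assumes square pixels, no distortion, a fixed principal point, and a pure zoom with no rotation; the 3D points, the image centre, and the two focal lengths are invented for illustration.

```python
import numpy as np

def camera_matrix(f, cx, cy):
    """Intrinsic matrix of an ideal pinhole camera with square pixels."""
    return np.array([[f, 0.0, cx],
                     [0.0, f, cy],
                     [0.0, 0.0, 1.0]])

def project(K, points_3d):
    """Project Nx3 points given in the camera frame to Nx2 pixel coordinates."""
    uvw = points_3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Hypothetical scene points (camera frame) and principal point.
points_3d = np.array([[0.5, -0.2, 4.0],
                      [-1.0, 0.3, 5.0],
                      [0.2, 0.8, 6.0]])
center = np.array([640.0, 360.0])

before = project(camera_matrix(1000.0, *center), points_3d)
after = project(camera_matrix(1500.0, *center), points_3d)

# Distance of each projection from the principal point, after vs. before.
zoom_ratio = (np.linalg.norm(after - center, axis=1)
              / np.linalg.norm(before - center, axis=1))
print(zoom_ratio)
```

Under these assumptions every entry of `zoom_ratio` equals the ratio of the two focal lengths (1.5 here, up to rounding), because the projected points scale about the principal point in proportion to f.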
OpenCV has an inbuilt implementation of Zhang's method. (Look at this documentation for explanations, but code is old and unusable. New up-to-date code below.) You need to take some pictures of a chess board through your camera. Here is some helper code:
import cv2
from matplotlib import pyplot as plt
import numpy as np
from glob import glob
from scipy import linalg
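# 3D corner coordinates in the checkerboard's own frame (Z = 0, one square = one unit)
x,y = np.meshgrid(range(6),range(8))
world_points = np.hstack((x.reshape(48,1), y.reshape(48,1), np.zeros((48,1)))).astype(np.float32)
_3d_points = [] #3D points in world space
_2d_points = [] #matching 2D points in the image plane
img_paths=glob('./*.JPG') #get paths of all checkerboard images
for path in img_paths:
    im = cv2.imread(path)
    ret, corners = cv2.findChessboardCorners(im, (6,8))
    if ret: #add points only if checkerboard was correctly detected:
        _2d_points.append(corners) #append current 2D points
        _3d_points.append(world_points) #3D points are always the same
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(_3d_points, _2d_points, (im.shape[1],im.shape[0]), None, None)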
print ("Ret:\n",ret)
print ("Mtx:\n",mtx)
print ("Dist:\n",dist)
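As a small follow-up sketch, this is how the focal length can be read out of a 3x3 camera matrix such as the `mtx` returned by `cv2.calibrateCamera`. The matrix values, image width, and sensor width below are hypothetical, and `describe_intrinsics` is a helper name invented here, not an OpenCV function.

```python
import numpy as np

def describe_intrinsics(mtx, image_width_px, sensor_width_mm=None):
    """Pull focal lengths and principal point out of a 3x3 camera matrix."""
    info = {
        "fx_px": float(mtx[0, 0]),  # focal length along x, in pixels
        "fy_px": float(mtx[1, 1]),  # focal length along y, in pixels
        "cx_px": float(mtx[0, 2]),  # principal point x
        "cy_px": float(mtx[1, 2]),  # principal point y
    }
    if sensor_width_mm is not None:
        # Convert pixels to millimetres using the physical sensor width.
        info["f_mm"] = info["fx_px"] * sensor_width_mm / image_width_px
    return info

# Hypothetical calibration result for a 1920-pixel-wide image.
example_mtx = np.array([[1200.0, 0.0, 960.0],
                        [0.0, 1190.0, 540.0],
                        [0.0, 0.0, 1.0]])
info = describe_intrinsics(example_mtx, image_width_px=1920, sensor_width_mm=6.4)
print(info)
```

The conversion to millimetres only makes sense if the physical sensor width is known; otherwise the focal length stays in pixel units.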
You might also want undistortion, to correct for radial distortion:
# termination criteria
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# prepare object points, like (0,0,0), (1,0,0), (2,0,0) ....,(6,5,0)
objp = np.zeros((6*8,3), np.float32)
objp[:,:2] = np.mgrid[0:6,0:8].T.reshape(-1,2)
# Arrays to store object points and image points from all the images.
objpoints = [] # 3d point in real world space
imgpoints = [] # 2d points in image plane.
for fname in img_paths:
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
    # Find the chess board corners
    ret, corners = cv2.findChessboardCorners(gray, (6,8),None)
    # If found, add object points, image points (after refining them)
    if ret == True:
        objpoints.append(objp)
        corners = cv2.cornerSubPix(gray, corners, (11,11), (-1,-1), criteria)
        imgpoints.append(corners)
        if 'IMG_5456.JPG' in fname:
            # Draw the detected corners on a copy for visual inspection
            img_vis = cv2.drawChessboardCorners(img.copy(), (6,8), corners, ret)
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1],None,None)
# Reprojection Error
tot_error = 0
for i in range(len(objpoints)):
    imgpoints2, _ = cv2.projectPoints(objpoints[i], rvecs[i], tvecs[i], mtx, dist)
    error = cv2.norm(imgpoints[i],imgpoints2, cv2.NORM_L2)/len(imgpoints2)
    tot_error += error
print ("Mean Reprojection error: ", tot_error/len(objpoints))
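# undistort (uses the last image read in the loop above)
h, w = img.shape[:2]
newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx,dist,(w,h),1,(w,h))
mapx,mapy = cv2.initUndistortRectifyMap(mtx,dist,None,newcameramtx,(w,h),5)
dst = cv2.remap(img,mapx,mapy,cv2.INTER_LINEAR)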
# crop the image
x,y,w,h = roi
dst = dst[y:y+h, x:x+w]
#cv2.drawChessboardCorners(dst, (6,8), corners, ret)
# Reprojection Error
tot_error = 0
for i in range(len(objpoints)):
    imgpoints2, _ = cv2.projectPoints(objpoints[i], rvecs[i], tvecs[i], mtx, dist)
    error = cv2.norm(imgpoints[i],imgpoints2, cv2.NORM_L2)/len(imgpoints2)
    tot_error += error
print ("Mean Reprojection error: ", tot_error/len(objpoints))
I am trying to rotate multiple images in a folder, but I get this error when I pass values of fx and fy greater than 0.2 to the resize function:
(cv2.error: OpenCV(4.1.2) ... error: (-215:Assertion failed) !ssize.empty() in function 'cv::resize')
However, when I rotate a single image with fx and fy equal to 0.5, it works perfectly fine.
Is there a way to fix this? Augmenting images one by one is very tedious. Also, the images rotated by the code attached here with fx and fy equal to 0.2 have undesirable dimensions, i.e. the photos are very small and their quality is reduced.
The part of the code that rotates multiple images is given below:
for imag in os.listdir(source_folder):
    img = cv2.imread(os.path.join(source_folder,imag))
    img = cv2.resize(img, (0,0), fx=0.5, fy=0.5)
    width = img.shape[1]
    height = img.shape[0]
    M = cv2.getRotationMatrix2D((width/2,height/2),5,1.0)
    rotated_img = cv2.warpAffine(img,M,(img.shape[1],img.shape[0]))
    cv2.imwrite(os.path.join(destination_right_folder, "v4rl" + str(a) + '.jpg') , rotated_img)
    a += 1
Add a check after you read the image to see if it is None:
img = cv2.imread(os.path.join(source_folder,imag))
if img is None: continue
The error is raised when cv2.resize() is called on an empty image: cv2.imread() returns None for any file it cannot decode, so the folder probably contains files that are not images.
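A complementary sketch, using only the standard library: filter the folder listing by file extension before reading anything. The extension set and the helper name `list_image_files` are assumptions made for illustration; the None check on the value returned by cv2.imread() is still worth keeping, since a file with an image extension can be corrupt.

```python
import os
import tempfile

# Extensions treated as images here; adjust to the formats actually in use.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}

def list_image_files(folder):
    """Return sorted paths of regular files in folder that look like images."""
    paths = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        ext = os.path.splitext(name)[1].lower()
        if os.path.isfile(path) and ext in IMAGE_EXTENSIONS:
            paths.append(path)
    return paths

# Demo on a throwaway folder containing images, stray files and a subfolder.
with tempfile.TemporaryDirectory() as demo_folder:
    for name in ("a.jpg", "b.PNG", "Thumbs.db", "notes.txt"):
        open(os.path.join(demo_folder, name), "w").close()
    os.mkdir(os.path.join(demo_folder, "subdir"))
    found = [os.path.basename(p) for p in list_image_files(demo_folder)]

print(found)
```

In the rotation loop, iterating over `list_image_files(source_folder)` instead of `os.listdir(source_folder)` would skip hidden system files and subfolders before they reach the resize call.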
I have an image of spectacles on a black background that I need to overlay onto a face image. To do so, I take the part of the face image that has the same shape as the spectacles image and put the face colors onto the black parts of the spectacles image. This small patch can then be put back into the face image. However, I am not able to take the correct colors from the face image for the spectacles image. I tried this:
specs[np.where((specs == [0,0,0,0]).all(axis=2))] = sub_face
specs image:
face image:
I need to put a resized specs image onto the face. I have already resized the specs image and I know the position where it will be placed on the face image. I just need to remove the black background from the specs and fill it with the corresponding face colors, so that the specs look natural on the face.
The code I am using:
import cv2
import numpy as np
specs = cv2.imread("rot_h0v0z0.png")
face = cv2.imread("~/Downloads/celebA/000001.png")
specs = cv2.resize(specs, None, fx=0.3, fy=0.3, interpolation=cv2.INTER_AREA)
sub_face = face[0:specs.shape[0], 0:specs.shape[1]]
specs[np.where((specs == [0,0,0,0]).all(axis=2))] = sub_face
Was able to solve it, turned out pretty simple :P
(b,g,r) = cv2.split(specs)
indices = np.where(b == [0])
for i,j in zip(indices[0], indices[1]):
    specs[i,j] = sub_face[i,j]
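The same idea can be sketched without the Python loop by using a boolean mask. Unlike the snippet above, this variant treats a pixel as background only when all three channels are 0, which is an assumption about what counts as black; the function name and the tiny synthetic arrays are made up for illustration.

```python
import numpy as np

def fill_black_with_background(overlay, background):
    """Replace pure-black pixels of overlay with the matching background pixels."""
    if overlay.shape != background.shape:
        raise ValueError("overlay and background must have the same shape")
    black = (overlay == 0).all(axis=2)   # HxW boolean mask of black pixels
    result = overlay.copy()
    result[black] = background[black]    # copy colours only where overlay is black
    return result, black

# Synthetic stand-ins for the resized specs image and the face patch.
specs_demo = np.zeros((4, 4, 3), dtype=np.uint8)
specs_demo[1:3, 1:3] = (0, 60, 120)      # frame pixels; blue channel is 0 on purpose
face_demo = np.full((4, 4, 3), 180, dtype=np.uint8)

blended, black = fill_black_with_background(specs_demo, face_demo)
```

The frame pixels in the demo have a zero blue channel but are not black, so they are kept rather than overwritten by the face colours.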
