I am trying to remove the background from an image using U2NET. I wrote the network in TensorFlow by following this repository and changed the model architecture to suit my needs: it takes a 96x96 image and produces 7 masks. I take the first mask (out of 7) and multiply it against every channel of the original 96x96 image.
The code that predicts the 7 masks is:
img = Image.open(os.path.join('DUTS-TE','DUTS-TE-Image', test_x_names[90]))
copied = deepcopy(img)
copied = copied.resize((96,96))
copied = np.expand_dims(copied,axis=0)
preds = model.predict(copied)
preds = np.squeeze(preds)
"preds[0]" is:
(image: predicted mask)
Multiplying the mask against the original image produces:
(image: masked image)
The corresponding code is ("img2" is the original image):
img2 = np.asarray(img2)
immg = np.zeros((96,96,3), np.uint8)
for i in range(0,3):
    immg[:,:,i] = img2[:,:,i] * preds[0]
If I binarize the mask and then multiply it against the original image, it produces the following (image omitted). The corresponding code is:
frame = binarize(preds[0,:,:], threshold = 0.5)
img2 = np.asarray(img2)
immg = np.zeros((96,96,3), np.uint8)
for i in range(0,3):
    immg[:,:,i] = img2[:,:,i] * frame
Multiplying the original image by the mask or by the binarized mask does not separate the foreground from the background properly. What can be done? Am I missing something?
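The per-channel loops above can also be written with NumPy broadcasting. The following is a minimal sketch under stated assumptions, not a fix for a poor mask: `apply_mask` is a hypothetical helper, the image is an (H, W, 3) uint8 array already resized to the mask's size, and the mask holds floats in [0, 1]. It multiplies in float and casts back to uint8 once at the end.

```python
import numpy as np


def apply_mask(image, mask, threshold=None):
    """Multiply every channel of an (H, W, 3) uint8 image by an (H, W) mask.

    Hypothetical helper: assumes mask values are floats in [0, 1].
    If threshold is given, values above it become 1 and the rest become 0.
    """
    image = np.asarray(image)
    mask = np.asarray(mask, dtype=np.float32)
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must have the same height and width")
    if threshold is not None:
        mask = (mask > threshold).astype(np.float32)
    # mask[..., None] has shape (H, W, 1), so it broadcasts over the channels.
    masked = image.astype(np.float32) * mask[..., None]
    return np.clip(masked, 0, 255).astype(np.uint8)


# Tiny usage example with made-up values.
demo_image = np.full((2, 2, 3), 200, dtype=np.uint8)
demo_mask = np.array([[1.0, 0.0], [0.5, 0.25]])
soft = apply_mask(demo_image, demo_mask)
hard = apply_mask(demo_image, demo_mask, threshold=0.5)
```

Checking that `image.shape[:2]` matches `mask.shape` before multiplying makes a size mismatch between the resized input and the original image fail loudly instead of silently.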
My code is:
randomScale = random.uniform(0.08, 1.0)
CPtransform = transforms.RandomResizedCrop((self.height, self.width), scale=(randomScale, randomScale), ratio=(1,1), interpolation=2)
toImage = T.ToPILImage()
padImage= CPtransform(toImage(image).convert("L"))
padMask = CPtransform(toImage(mask).convert("L"))
return TF.to_tensor(padImage), TF.to_tensor(padMask)
But after augmentation the mask no longer corresponds to its image, as the figure shows. I apply the same transform to both, yet the results differ.
Can anybody help? Thanks!
You can concatenate the image and the mask into a single tensor and apply the transformation once, so that both receive the same random crop.
image = T.PILToTensor()(pil_image)
mask = T.PILToTensor()(pil_mask)
# concatenate the images and apply transform:
both_images = torch.cat((image, mask),0)
# Apply the transformations to both images simultaneously
transformed_images = CPtransform(both_images)
#get transformed image and mask separately
image_trans = transformed_images[:image.shape[0]]
mask_trans = transformed_images[image.shape[0]:]
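The reason this works is that the random parameters are sampled once and then applied to every channel. The sketch below shows that idea with NumPy only. It is a simplified illustration, not torchvision's implementation: `random_crop_pair` is a hypothetical helper, and it only crops (no resizing).

```python
import numpy as np


def random_crop_pair(image, mask, crop_h, crop_w, rng):
    """Cut the same random window out of an (H, W, C) image and an (H, W) mask.

    Hypothetical helper for illustration; rng is a numpy.random.Generator.
    """
    if image.shape[:2] != mask.shape[:2]:
        raise ValueError("image and mask must share height and width")
    h, w = mask.shape[:2]
    if crop_h > h or crop_w > w:
        raise ValueError("crop size must not exceed the input size")
    # Sample the window ONCE so both arrays see identical parameters.
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    window = (slice(top, top + crop_h), slice(left, left + crop_w))
    return image[window], mask[window]


# Usage with made-up data: the mask marks the even-valued pixels of the image.
base = np.arange(64).reshape(8, 8)
image = np.stack([base] * 3, axis=-1)
mask = base % 2 == 0
rng = np.random.default_rng(0)
image_crop, mask_crop = random_crop_pair(image, mask, 4, 4, rng)
```

If you prefer to keep the image and mask as separate objects in torchvision, the same pattern is to call `transforms.RandomResizedCrop.get_params` once and pass the returned values to `TF.resized_crop` for both inputs.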
I am using this function to predict the class of previously unseen images:
def predictor(img, model):
    image = cv2.imread(img)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (224, 224))
    image = np.array(image, dtype = 'float32')/255.0
    image = image.reshape(1, 224,224,3)
    clas = model.predict(image).argmax()
    name = dict_class[clas]
    print('The given image is of \nClass: {0} \nSpecies: {1}'.format(clas, name))
How can I change it if I want the top 2 (or top k) predictions with their probabilities? For example:
70% chance it's a dog
15% chance it's a bear
If you are using TensorFlow + Keras and probably doing multi-class classification, then the output of model.predict() is a tensor representing either the logits or already the probabilities (softmax on top of logits).
I am taking this example from here and slightly modifying it: https://www.tensorflow.org/api_docs/python/tf/math/top_k.
import tensorflow as tf
# Softmax output: the probabilities add up to 1
network_predictions = [0.7,0.2,0.05,0.05]
prediction_probabilities = tf.math.top_k(network_predictions, k=2)
top_2_scores = prediction_probabilities.values.numpy()
dict_class_entries = prediction_probabilities.indices.numpy()
Here dict_class_entries holds the class indices, sorted by descending probability (i.e. dict_class_entries[0] = 0, which corresponds to top_2_scores[0] = 0.7, and so on).
You just need to replace network_predictions with model.predict(image).
Notice I removed the argmax() in order to send an array of probabilities instead of the index of the max score/probability position (that is, argmax()).
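The lookup from indices back to class names can be sketched as follows. This is a NumPy-only illustration under assumptions: `top_k_classes` is a hypothetical helper, the probabilities come from a batch containing a single image, and `class_names` plays the role of `dict_class` (an index-to-name mapping).

```python
import numpy as np


def top_k_classes(probabilities, class_names, k=2):
    """Return the k most probable (name, probability) pairs, best first.

    Hypothetical helper: assumes `probabilities` holds the scores of a
    single image, e.g. the result of model.predict() on a batch of one.
    """
    probs = np.asarray(probabilities, dtype=np.float64).ravel()
    k = min(k, probs.size)
    # argsort is ascending, so reverse it to get the largest scores first.
    order = np.argsort(probs)[::-1][:k]
    return [(class_names[int(i)], float(probs[i])) for i in order]


# Usage with made-up class names and scores.
example_names = {0: "dog", 1: "bear", 2: "cat", 3: "fox"}
example_scores = [[0.7, 0.15, 0.1, 0.05]]
for name, prob in top_k_classes(example_scores, example_names, k=2):
    print(f"{prob:.0%} chance it is a {name}")
```

Inside `predictor`, the call would take `model.predict(image)` in place of `example_scores`.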
I am trying to run this: https://github.com/xamyzhao/timecraft/blob/master/make_timelapse.py
Colab does not support cv2.imshow, so I changed this part:
for i in range(n_samples):
    pred_vid = video_predictor_model.predict(
        [im[np.newaxis], np.ones((1,) + im.shape), np.zeros((1, 5))])
    print(f'Predicted video shape: {pred_vid.shape}')
    pred_vid_im = vis_utils.visualize_video(
        pred_vid[0], normalized=True)
    cv2.imshow(f'Video sample {i+1}', pred_vid_im)
to this:
from google.colab.patches import cv2_imshow
for i in range(n_samples):
    pred_vid = video_predictor_model.predict(
        [im[np.newaxis], np.ones((1,) + im.shape), np.zeros((1, 5))])
    print(f'Predicted video shape: {pred_vid.shape}')
    pred_vid_im = vis_utils.visualize_video(
        pred_vid[0], normalized=True)
    #cv2.imshow(f'Video sample {i+1}', pred_vid_im)
    cv2.imwrite(f'Video sample {i+1}.jpg', pred_vid_im)
but the saved frames are just black and I'm not sure why.
This is not a Google Colab issue. pred_vid_im holds normalized values, so every value lies between 0 and 1. On the 0-255 scale of the saved image those values are nearly black.
So you need to multiply pred_vid_im by 255.0. Change the code to this:
pred_vid_im = vis_utils.visualize_video(pred_vid[0], normalized=True) * 255.0
Now the saved image will be correct.
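If you would rather make the conversion explicit, the scaling can be sketched as a small function. This is only an illustration: `to_uint8` is a hypothetical helper, and it assumes (as described above) that the frame holds floats in [0, 1].

```python
import numpy as np


def to_uint8(frame):
    """Scale a float image from [0, 1] to uint8 values in [0, 255].

    Hypothetical helper: values outside [0, 1] are clipped after scaling.
    """
    scaled = np.asarray(frame, dtype=np.float64) * 255.0
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


# Usage with made-up values; the result is what you would pass to cv2.imwrite.
example_frame = np.array([[0.0, 0.2, 1.0], [1.5, -0.1, 0.4]])
frame_u8 = to_uint8(example_frame)
```

Clipping before the cast avoids wrap-around if the normalized values slightly overshoot the [0, 1] range.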
I have a NumPy array that holds a black-and-white image with the following shape:
(28, 112)
I try to convert the image to grayscale so that I can find contours with OpenCV, using the following steps:
#grayscale the image
grayed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
#threshold the image
thresh = cv2.threshold(grayed, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
I get the following error:
<ipython-input-178-7ebff17d1c18> in get_digits(img)
7 #grayscale the image
----> 8 grayed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
error: C:\projects\opencv-python\opencv\modules\imgproc\src\color.cpp:11073: error: (-215) depth == 0 || depth == 2 || depth == 5 in function cv::cvtColor
The OpenCV error message gives no information that helps me work out what is wrong.
Here is the working code for how you were trying it:
img = np.stack((img,) * 3,-1)
img = img.astype(np.uint8)
grayed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(grayed, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
A simpler way of getting the same result is to invert the image yourself:
img = (255-img)
thresh = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU)[1]
As you discovered, different image operations require the image to be in different formats.
cv2.cvtColor with cv2.COLOR_BGR2GRAY expects a color image, so it needs a three-channel representation, and it only accepts 8-bit unsigned, 16-bit unsigned, or 32-bit float data (that is what the depth == 0 || depth == 2 || depth == 5 assertion in the error checks). cv2.THRESH_BINARY and cv2.THRESH_BINARY_INV themselves do not need a color image.
cv2.THRESH_OTSU works on single-channel 8-bit grayscale images, so one channel is fine for that.
Since your image was already grayscale from the start, you weren't able to convert it from color to grayscale, nor did you really need to. I assume you were trying to invert the image, but that's easy enough to do yourself (255-img).
At one point you tried to apply cv2.THRESH_OTSU to floating point values, but cv2.THRESH_OTSU requires 8-bit integers between 0 and 255.
If OpenCV had more user-friendly error messages it would really help with issues like these.
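The numbers in the assertion are OpenCV depth codes (0 is CV_8U, 2 is CV_16U, 5 is CV_32F). A quick way to see why an array is rejected is to inspect its dtype and channel count before calling OpenCV. The sketch below does that with NumPy only; `describe_for_opencv` is a hypothetical helper, and the allowed set is taken from the error message above.

```python
import numpy as np

# OpenCV depth codes (CV_8U ... CV_64F) keyed by the matching NumPy dtype name.
OPENCV_DEPTH = {
    "uint8": 0,
    "int8": 1,
    "uint16": 2,
    "int16": 3,
    "int32": 4,
    "float32": 5,
    "float64": 6,
}
# Depths accepted by the cvtColor assertion quoted in the error message.
CVTCOLOR_DEPTHS = {0, 2, 5}


def describe_for_opencv(img):
    """Report dtype, depth code and channel count of an image array."""
    arr = np.asarray(img)
    depth = OPENCV_DEPTH.get(arr.dtype.name)  # None if there is no matching code
    channels = arr.shape[2] if arr.ndim == 3 else 1
    return {
        "dtype": arr.dtype.name,
        "depth": depth,
        "channels": channels,
        "cvtcolor_depth_ok": depth in CVTCOLOR_DEPTHS,
    }


# Usage with made-up arrays of the shape from the question.
report_float = describe_for_opencv(np.zeros((28, 112)))
report_uint8 = describe_for_opencv(np.zeros((28, 112, 3), dtype=np.uint8))
```

A float64 or int64 array fails the depth check, which is why casting with `astype(np.uint8)` makes the error go away.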
This question is based on: Tensorflow image reading & display
Following their code we have the following:
string = ['/home/user/test.jpg']
filepath_queue = tf.train.string_input_producer(string)
self.reader = tf.WholeFileReader()
key, value = self.reader.read(filepath_queue)
# Output: Tensor("ReaderRead:1", shape=TensorShape([]), dtype=string)
my_img = tf.image.decode_jpeg(value, channels=3)
# Output: Tensor("DecodeJpeg:0", shape=TensorShape([Dimension(None), Dimension(None), Dimension(3)]), dtype=uint8)
Why does my_img have no dimensions? (Dimension(3) is only because of the argument channels=3)
Does this mean that the image is not properly loaded? (img = misc.imread('/home/user/test.jpg') does load that image).
The image will be properly loaded, but TensorFlow doesn't have enough information to infer the image's shape until the op is run. This arises because tf.image.decode_jpeg() can produce tensors of different shapes (heights and widths), depending on the contents of the string tensor value. This enables you to build input pipelines using a collection of images with different sizes.
The Dimension(None) in the shape means "unknown" rather than "empty".
If you happen to know that all images read by this operation will have the same size, you can use Tensor.set_shape() to provide this information, and doing so will help to validate the shapes of later parts of the graph:
my_img = tf.image.decode_jpeg(value, channels=3)
my_img.set_shape([KNOWN_HEIGHT, KNOWN_WIDTH, 3])
# Output: Tensor("DecodeJpeg:0", shape=TensorShape([Dimension(28), Dimension(28), Dimension(3)]), dtype=uint8)
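To illustrate what "unknown rather than empty" means, here is a pure-Python analogy of how a declared shape refines a partially known one. It is not TensorFlow's implementation: `merge_shape` is a hypothetical helper, with `None` standing in for `Dimension(None)`.

```python
def merge_shape(static_shape, declared_shape):
    """Combine a partially known shape with a declared one.

    Hypothetical helper: None means "unknown". Known dimensions must agree,
    unknown ones are filled in from the other shape.
    """
    if len(static_shape) != len(declared_shape):
        raise ValueError("shapes must have the same rank")
    merged = []
    for known, declared in zip(static_shape, declared_shape):
        if known is not None and declared is not None and known != declared:
            raise ValueError(f"incompatible dimensions: {known} vs {declared}")
        merged.append(declared if known is None else known)
    return tuple(merged)


# The decoded JPEG only knows its channel count until a shape is declared.
decoded_shape = (None, None, 3)
refined_shape = merge_shape(decoded_shape, (28, 28, 3))
```

Declaring a shape that contradicts a known dimension, such as one channel instead of three, raises an error here, which mirrors how setting a shape helps validate later parts of the graph.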