My PyTorch model outputs a segmented image with values (0,1,2) for each one of the three classes. During the preparation of the set, I mapped black to 0, red to 1 and white to 2. I have two questions:
1. How can I show what each class represents? For example, take a look at this image:
I am currently using the following method to show each class:
output = net(input)
input = input.cpu().squeeze()
input = transforms.ToPILImage()(input)
probs = F.softmax(output, dim=1)
probs = probs.squeeze(0)
full_mask = probs.squeeze().cpu().numpy()
fig, (ax0, ax1, ax2, ax3, ax4) = plt.subplots(1, 5, figsize=(20,10), sharey=True)
ax0.set_title('Input Image')
ax1.set_title('Background Class')
ax2.set_title('Neuron Class')
ax3.set_title('Dendrite Class')
ax4.set_title('Predicted Mask')
ax1.imshow(full_mask[0, :, :].squeeze())
ax2.imshow(full_mask[1, :, :].squeeze())
ax3.imshow(full_mask[2, :, :].squeeze())
full_mask = np.argmax(full_mask, 0)
img = mask_to_image(full_mask)
But there appear to be shared pixels between the classes. Is there a better way to show this? (I want the first image to show only the background class, the second only the neuron class and the third only the dendrite class.)
2. My second question is about generating a black, red and white image from the mask. Currently the mask has shape (512, 512) and the following values:
[[0 0 0 ... 0 0 0]
[0 0 0 ... 2 0 0]
[0 0 0 ... 2 2 0]
[2 1 2 ... 2 2 2]
[2 1 2 ... 2 2 2]
[0 2 0 ... 2 2 2]]
And the results look like this:
Since I am using this code to convert to image:
def mask_to_image(mask):
    return Image.fromarray((mask).astype(np.uint8))
But there appear to be shared pixels between the classes. Is there a
better way to show this? (I want the first image to show only the
background class, the second only the neuron class and the
third only the dendrite class.)
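Yes. Take the argmax along the class dimension to get the predicted class index for each pixel, then compare it with a class index. The result is True where that class has the highest logit (unnormalized probability) and False everywhere else:
output = net(input)  # shape (1, 3, H, W)
labels = torch.argmax(output, dim=1).squeeze(0).cpu().numpy()  # (H, W), values in {0, 1, 2}
binary_mask = labels == 1  # neuron is class 1; use 0 for background, 2 for dendrite
fig, ax = plt.subplots()
ax.set_title('Neuron Class')
ax.imshow(binary_mask)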
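To get all three non-overlapping masks at once, the comparison can be broadcast over the class indices. This is a minimal NumPy sketch, assuming the scores are already on the CPU as a (C, H, W) array; `per_class_masks` is a hypothetical helper name, not something from the question.

```python
import numpy as np

def per_class_masks(scores):
    """scores: (C, H, W) logits or probabilities. Returns (C, H, W) boolean masks."""
    labels = np.argmax(scores, axis=0)        # (H, W) winning class per pixel
    classes = np.arange(scores.shape[0])      # [0, 1, ..., C-1]
    # Broadcast (1, H, W) against (C, 1, 1): mask c is True where class c wins
    return labels[None, :, :] == classes[:, None, None]

# Random scores stand in for the network output here
scores = np.random.default_rng(0).random((3, 4, 5))
masks = per_class_masks(scores)
```

Each pixel is True in exactly one of the masks, so `masks[0]`, `masks[1]` and `masks[2]` can be passed to `imshow` on the three class axes without any shared pixels.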
My second question is about generating a black, red and white image
from the mask, currently the mask is of shape (512,512) and has the
following values
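You can spread the [0, 1, 2] values across a channel axis at dimension 0. A pixel with [0, 0, 0] across the channels is black, [255, 255, 255] is white and [255, 0, 0] is red (PIL uses RGB order):
import torch
tensor = torch.randint(high=3, size=(512, 512))
red = tensor == 1    # class 1 was mapped to red
white = tensor == 2  # class 2 was mapped to white
zero_channel = red | white  # the R channel is on for both red and white pixels
# Stack as (R, G, B); G and B are on only for white pixels
image = torch.stack([zero_channel, white, white]).int().numpy() * 255
# image has shape (3, 512, 512); transpose to (512, 512, 3) and cast to uint8 before Image.fromarray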
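The same conversion can be written as a palette lookup, which scales to more classes without extra boolean logic. This is a sketch in NumPy rather than PyTorch, assuming the mask is an integer array of class indices; `PALETTE` and `mask_to_rgb` are illustrative names.

```python
import numpy as np

# One RGB row per class index: 0 -> black, 1 -> red, 2 -> white
PALETTE = np.array([[0, 0, 0],
                    [255, 0, 0],
                    [255, 255, 255]], dtype=np.uint8)

def mask_to_rgb(mask):
    """mask: (H, W) integer class indices. Returns an (H, W, 3) uint8 image."""
    return PALETTE[mask]  # fancy indexing replaces each index by its RGB row

mask = np.array([[0, 1],
                 [2, 0]])
rgb = mask_to_rgb(mask)
```

The result is already channel-last and uint8, which is the layout `Image.fromarray` expects for an RGB image.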
Suppose I have a segmented image as a NumPy array, where each entry is a number from 1, ..., C, C+1, where C is the number of segmentation classes and class C+1 is a background class. I want to find an efficient way to convert this to a contour image (a binary image where a contour pixel has value 1 and all other pixels have value 0), so that any pixel that has a neighbour of a different class in its 8-neighbourhood (or 4-neighbourhood) is a contour pixel.
The inefficient way would be something like:
def isValidLocation(i, j, image_height, image_width):
    if i<0:
        return False
    if i>image_height-1:
        return False
    if j<0:
        return False
    if j>image_width-1:
        return False
    return True
def get8Neighbourhood(i, j, image_height, image_width):
    nbd = []
    for height_offset in [-1, 0, 1]:
        for width_offset in [-1, 0, 1]:
            if isValidLocation(i+height_offset, j+width_offset, image_height, image_width):
                nbd.append((i+height_offset, j+width_offset))
    return nbd
def getContourImage(seg_image):
    seg_image_height = seg_image.shape[0]
    seg_image_width = seg_image.shape[1]
    contour_image = np.zeros([seg_image_height, seg_image_width], dtype=np.uint8)
    for i in range(seg_image_height):
        for j in range(seg_image_width):
            nbd = get8Neighbourhood(i, j, seg_image_height, seg_image_width)
            for (m,n) in nbd:
                if seg_image[m][n] != seg_image[i][j]:
                    contour_image[i][j] = 1
    return contour_image
I'm looking for a more efficient "vectorized" way of achieving this, as I need to be able to compute this at run time on batches of 8 images at a time in a deep learning context. Any insights appreciated. Visual Example Below. The first image is the original image overlaid over the ground truth segmentation mask (not the best segmentation admittedly...), the second is the output of my code, which looks good, but is way too slow. Takes me about 10 seconds per image with an intel 9900K cpu.
Image Credit from SUN RGBD dataset.
This might work but it might have some limitations which I cannot be sure of without testing on the actual data, so I'll be relying on your feedback.
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt
# some sample data with few rectangular segments spread out
seg = np.ones((100, 100), dtype=np.int8)
seg[3:10, 3:10] = 20
seg[24:50, 40:70] = 30
seg[55:80, 62:79] = 40
seg[40:70, 10:20] = 50
Now to find the contours, we will convolve the image with a kernel which should give 0 values when convolved within the same segment of the image and <0 or >0 values when convolved over image regions with multiple segments.
# kernel for convolving
k = np.array([[1, -1, -1],
[1, 0, -1],
[1, 1, -1]])
convolved = ndimage.convolve(seg, k)
# contour pixels
non_zeros = np.argwhere(convolved != 0)
plt.scatter(non_zeros[:, 1], non_zeros[:, 0], c='r', marker='.')
As you can see, on this sample data the kernel has a small limitation: it misses two contour pixels because of the symmetric nature of the data (which I think would be rare in actual segmentation outputs).
For better understanding, this is the scenario (it occurs at the top-left and bottom-right corners of the rectangle) where the convolution fails to identify the contour, i.e. misses one pixel:
[ 1, 1, 1]
[ 1, 1, 1]
[ 1, 20, 20]
Based on @sai's idea I came up with this snippet, which yielded the same result much, much faster than my original code. It runs in 0.039 seconds, which, compared to close to 8-10 seconds for the original, is quite a speed-up!
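import numpy as np
from scipy import ndimage

filters = []
for i in [0, 1, 2]:
    for j in [0, 1, 2]:
        if i == 1 and j == 1:
            continue  # skip the centre: a pixel is not its own neighbour
        # each filter computes (centre pixel - one neighbour)
        filter = np.zeros([3, 3], dtype=np.int64)
        filter[i][j] = -1
        filter[1][1] = 1
        filters.append(filter)

def getCountourImage2(seg_image):
    seg_image = seg_image.astype(np.int64)  # avoid overflow for small integer dtypes
    convolved_images = []
    for filter in filters:
        convolved_image = ndimage.correlate(seg_image, filter, mode='reflect')
        # take the absolute value so that differences of opposite sign cannot cancel
        convolved_images.append(np.abs(convolved_image))
    convolved_images = np.add.reduce(convolved_images)
    seg_image = np.where(convolved_images != 0, 255, 0)
    return seg_image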
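The neighbour test can also be done without any filter arithmetic, by comparing the image against shifted copies of itself. This is a sketch under stated assumptions, not the method used in either answer: `contour_by_shifts` is a hypothetical name, and edge padding is chosen so that image borders are not marked on their own, matching the behaviour of the original loop.

```python
import numpy as np

def contour_by_shifts(seg, connectivity=8):
    """Return a uint8 image with 1 where a pixel has a differently labelled neighbour."""
    h, w = seg.shape
    padded = np.pad(seg, 1, mode="edge")  # replicate border labels
    contour = np.zeros((h, w), dtype=bool)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue  # a pixel is not its own neighbour
            if connectivity == 4 and di != 0 and dj != 0:
                continue  # skip diagonals for the 4-neighbourhood
            # View of the neighbour at offset (di, dj) for every pixel
            shifted = padded[1 + di:1 + di + h, 1 + dj:1 + dj + w]
            contour |= shifted != seg
    return contour.astype(np.uint8)

seg = np.ones((6, 6), dtype=np.int32)
seg[2:4, 2:4] = 20  # a 2x2 segment inside a background of ones
contour = contour_by_shifts(seg)
```

Because only equality is tested, label values cannot cancel or overflow, and the eight comparisons are plain array slices, so the cost stays close to that of the convolution approach.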
I have been experimenting with tensorflow Datasets but I cannot figure out how to efficiently create RLE-masks.
FYI, I am using data from the Airbus Ship Detection Challenge in Kaggle:
I know my RLE-decoding function (borrowed from one of the kernels) works:
def rle_decode(mask_rle, shape=(768, 768)):
    '''
    mask_rle: run-length as string formatted (start length)
    shape: (height,width) of array to return
    Returns numpy array, 1 - mask, 0 - background
    '''
    if not isinstance(mask_rle, str):
        img = np.zeros(shape[0]*shape[1], dtype=np.uint8)
        return img.reshape(shape).T
    s = mask_rle.split()
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0:][::2], s[1:][::2])]
    starts -= 1
    ends = starts + lengths
    img = np.zeros(shape[0]*shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    return img.reshape(shape).T
.... BUT it does not seem to play nicely with the pipeline:
list_ds =
ds =
With the following parse function, everything works fine:
def parse_img(file_path,new_size=[128,128]):
    img_content =
    img = tf.image.decode_jpeg(img_content)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(img,new_size)
    return img
But things go rogue if I include the mask:
def parse_img(file_path,new_size=[128,128]):
    # Image
    img_content =
    img = tf.image.decode_jpeg(img_content)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(img,new_size)
    # Mask
    file_id = tf.strings.split(file_path,'/')[-1]
    objects = [rle_decode(m) for m in df2[df.ImageId==file_id]]
    mask = np.sum(objects,axis=0)
    mask = np.expand_dims(mask,3) # Force mask to have 3 channels, necessary for resize step
    mask = tf.image.convert_image_dtype(mask, tf.int8)
    mask = tf.clip_by_value(mask,0,1)
    mask = tf.image.resize(mask,new_size)
    mask = tf.squeeze(mask) # squeeze back
    mask = tf.image.convert_image_dtype(mask, tf.int8)
    return img, mask
Although my parse_img function works fine (I have checked it on a sample, it takes 271 µs ± 67.9 µs per run); the step takes forever (>5 minutes) before hanging.
I can't figure out what's wrong and it drives me crazy!
Any idea?
You can rewrite the function rle_decode with tensorflow like this (here I do not do the final transposition to keep it more general, but you can do it later):
import tensorflow as tf
def rle_decode_tf(mask_rle, shape):
    shape = tf.convert_to_tensor(shape, tf.int64)
    size = tf.math.reduce_prod(shape)
    # Split string
    s = tf.strings.split(mask_rle)
    s = tf.strings.to_number(s, tf.int64)
    # Get starts and lengths
    starts = s[::2] - 1
    lens = s[1::2]
    # Make ones to be scattered
    total_ones = tf.reduce_sum(lens)
    ones = tf.ones([total_ones], tf.uint8)
    # Make scattering indices
    r = tf.range(total_ones)
    lens_cum = tf.math.cumsum(lens)
    s = tf.searchsorted(lens_cum, r, 'right')
    idx = r + tf.gather(starts - tf.pad(lens_cum[:-1], [(1, 0)]), s)
    # Scatter ones into flattened mask
    mask_flat = tf.scatter_nd(tf.expand_dims(idx, 1), ones, [size])
    # Reshape into mask
    return tf.reshape(mask_flat, shape)
A small test (TensorFlow 2.0):
mask_rle = '1 2 4 3 9 4 15 5'
shape = [4, 6]
# Original NumPy function
print(rle_decode(mask_rle, shape))
# [[1 0 0 1]
# [1 0 0 0]
# [0 1 1 0]
# [1 1 1 0]
# [1 1 1 0]
# [1 1 1 0]]
# TensorFlow function (transposing is done out of the function)
tf.print(tf.transpose(rle_decode_tf(mask_rle, shape)))
# [[1 0 0 1]
# [1 0 0 0]
# [0 1 1 0]
# [1 1 1 0]
# [1 1 1 0]
# [1 1 1 0]]
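The index arithmetic in rle_decode_tf may be easier to follow in NumPy, where each intermediate array can be printed. This is a sketch of the same cumsum/searchsorted idea, not part of the answer above; `rle_decode_np`, `run` and `run_offset` are names introduced here for illustration.

```python
import numpy as np

def rle_decode_np(mask_rle, shape):
    """Loop-free RLE decode; like rle_decode_tf, it does not transpose the result."""
    s = np.array(mask_rle.split(), dtype=np.int64)
    starts, lens = s[0::2] - 1, s[1::2]        # RLE starts are 1-based
    r = np.arange(lens.sum())                  # one slot per mask pixel to set
    lens_cum = np.cumsum(lens)
    # For each slot, the index of the run it belongs to
    run = np.searchsorted(lens_cum, r, side="right")
    # Number of slots that precede each run
    run_offset = np.concatenate(([0], lens_cum[:-1]))
    # Flat pixel index = start of the run + position inside the run
    idx = r + (starts - run_offset)[run]
    flat = np.zeros(int(np.prod(shape)), dtype=np.uint8)
    flat[idx] = 1
    return flat.reshape(shape)

mask = rle_decode_np('1 2 4 3 9 4 15 5', (4, 6)).T
```

With the test string above, `mask` matches the 6x4 array printed by both functions in the small test.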
I am trying to understand Harris detector, using the explanation here. As per explanation, I understand, if we calculate the eigen values, then,
However, when I calculate them, the eigenvalues are always high. Below is the main image from which I extract parts to calculate the eigenvalues.
For a flat area with no visible features, I get this distribution (rightmost plot), which looks good, but the eigenvalues are still large
For a linear edge, also I get high eigen values: 16290305.45393251 567780.54606749
For corner, it is expected to get high values, but now I am doubtful if these high values are correct due to above cases.
8958127.80563239 10986758.19436761
Here is my method, translated from the MATLAB code here. It's the vals value that I get directly from NumPy's linear algebra library.
import cv2
import numpy as np
import matplotlib.pyplot as plt
from numpy import linalg
from scipy import signal

def plot_derivatives_1(img_rgb, mode=1):
    '''
    img_rgb = image in rgb color space (3 channeled)
    '''
    # the input is RGB, so convert with RGB2GRAY rather than BGR2GRAY
    img_1c = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2GRAY)
    if mode == 1: # method 1 derivative
        Ix = cv2.Sobel(img_1c, cv2.CV_64F, 1, 0, ksize=3)
        Iy = cv2.Sobel(img_1c, cv2.CV_64F, 0, 1, ksize=3)
    else:
        # another method of derivatives
        dx = np.array([
            [-1, 0, 1],
            [-1, 0, 1],
            [-1, 0, 1]
        ])
        dy = np.transpose(dx)
        Ix = signal.convolve2d(img_1c, dx, mode='valid')
        Iy = signal.convolve2d(img_1c, dy, mode='valid')
    Ix, Iy = Ix.astype(np.float64), Iy.astype(np.float64) # else gaussian blur later is failing
    # yet to solve why we need A and eigen outputs
    A = np.array([
        [ np.sum(Ix*Ix), np.sum(Ix*Iy) ],
        [ np.sum(Ix*Iy), np.sum(Iy*Iy) ]
    ])
    vals, V = linalg.eig(A)
    lamb = vals/np.max(vals)
    print('lambda values:{}'.format(vals))
    fig, ax = plt.subplots(1,4, figsize=(20,5))
    ax[0].imshow(img_rgb);ax[0].set_title('Input Image')
    ax[1].imshow(Ix, cmap='gray');ax[1].set_title(r'$I_x = \dfrac{\partial I}{\partial x}$')
    ax[2].imshow(Iy, cmap='gray');ax[2].set_title(r'$I_y = \dfrac{\partial I}{\partial y}$')
    ax[3].scatter(Ix, Iy);ax[3].set_xlim([-200,200]);ax[3].set_ylim([-200,200]);
    ax[3].set_aspect('equal');ax[3].set_title('Derivatives Distribution');
    ax[3].axvline(x=0, color = 'r');ax[3].axhline(y=0, color ='r')
    return Ix, Iy
A sample call for a case (here shown for corner).
img = cv2.imread(SRC_FOLDER + 'checkersandbooksmall_sample_6.jpg')
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
Ix, Iy = plot_derivatives_1(img_rgb, mode=1)
I use jupyter notebook and the code is just built as I try to understand the concept.
What am I doing wrong to get high eigen values always for all cases?
The sample images used for above cases could be found here
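One thing worth checking is the scale of A itself: its entries are sums of squared gradients over every pixel in the patch, so they grow with patch size and with the gain of the derivative filter, and even small noise gradients add up. The sketch below is a sanity check on synthetic patches, not on the images from the question. It assumes `np.gradient` in place of Sobel and averages instead of summing, and `structure_tensor_eigs` is a hypothetical helper name.

```python
import numpy as np

def structure_tensor_eigs(patch):
    """Eigenvalues (ascending) of the patch-averaged structure tensor."""
    Iy, Ix = np.gradient(patch.astype(np.float64))  # derivatives along rows, then columns
    # Mean instead of sum, so the values do not grow with the patch size
    A = np.array([[np.mean(Ix * Ix), np.mean(Ix * Iy)],
                  [np.mean(Ix * Iy), np.mean(Iy * Iy)]])
    return np.linalg.eigvalsh(A)  # A is symmetric, so eigvalsh applies

flat = np.full((20, 20), 100.0)   # constant intensity
edge = np.zeros((20, 20))
edge[:, 10:] = 255.0              # vertical step edge
corner = np.zeros((20, 20))
corner[10:, 10:] = 255.0          # one bright quadrant

eigs_flat = structure_tensor_eigs(flat)
eigs_edge = structure_tensor_eigs(edge)
eigs_corner = structure_tensor_eigs(corner)
```

On these noise-free patches the flat region gives two zero eigenvalues, the edge gives one zero and one positive, and the corner gives two positive ones. On real patches the absolute numbers will still be large, so comparing the two eigenvalues relative to each other (or relative to other patches computed the same way) is likely more informative than their raw magnitude.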
dataset = slim.dataset.Dataset(...)
provider = slim.dataset_data_provider.DatasetDataProvider(dataset, ...)
image, labels = provider.get(['image', 'label'])
Let's say, for an example in a dataset A, labels could be [1, 2, 1, 3]. However, for some reason (e.g, due to dataset B), I would like to map the label IDs to other values. The mapping could be like below.
# {old_label: target_label}
mapping = {0: 0, 1: 2, 2: 2, 3: 2, 4: 2, 5: 3, 6: 1}
For now, I am guessing two ways:
-- seems to have a map(map_func) function that every example passes through, which could be the solution. However, I am more familiar with slim.dataset.Dataset. Is there a similar trick for slim.dataset.Dataset?
-- I was wondering if I can simply apply some mapping function to a tensor label such as:
new_labels = tf.map_fn(lambda x: x+1, labels, dtype=tf.int32)
# labels = [1 2 1 3] --> new_labels = [2 3 2 4]. This works.
new_labels = tf.map_fn(lambda x: mapping[x], labels, dtype=tf.int32)
# I wished but this does not work!
However, the second call, which is the one I need, does not work. Could anyone please advise?
I think you can try tf.contrib.lookup:
keys = list(mapping.keys())
values = [mapping[k] for k in keys]
table = tf.contrib.lookup.HashTable(
    tf.contrib.lookup.KeyValueTensorInitializer(keys, values, key_dtype=tf.int64, value_dtype=tf.int64), -1
)  # -1 is returned for labels that are missing from the mapping
new_labels = table.lookup(labels)
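When the old label IDs are small non-negative integers, as in the mapping from the question, a dense lookup array does the same job as a hash table. This is a NumPy sketch of the idea rather than TensorFlow code; `lut` is a name introduced here, and -1 marks IDs that are not in the mapping, mirroring the default value used above.

```python
import numpy as np

mapping = {0: 0, 1: 2, 2: 2, 3: 2, 4: 2, 5: 3, 6: 1}

# Dense lookup table indexed by the old label; -1 marks unmapped IDs
lut = np.full(max(mapping) + 1, -1, dtype=np.int64)
for old_label, target_label in mapping.items():
    lut[old_label] = target_label

labels = np.array([1, 2, 1, 3])
new_labels = lut[labels]  # one vectorized gather, no per-element Python call
```

For the example labels `[1, 2, 1, 3]` this yields `[2, 2, 2, 2]`. In TensorFlow the analogous operation would be `tf.gather` on a constant tensor holding the same table.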
I want to create a tensor whose upper triangular part holds the values from a vector. In MATLAB this can be done with
a = [1 2 3 4 5 6 7 8 9 10];
b = triu(ones(5),1);
b = b'
b(b==1) = a
b = b'
My tensorflow implementation so far
b = tf.matrix_band_part(tf.ones([dim,dim]), 0, -1) # make upper triangular part 1
b = tf.transpose(b)
b = tf.transpose(b)
Who can help me?
I haven't seen a fantastic way to do it, but it's certainly possible. Here's one way (expanding the last dimension of a tensor into a matrix; the preceding dimensions may be batch dimensions):
import tensorflow as tf
def matrix_with_upper_values(upper_values):
    # Check that the input is at least a vector
    upper_values = tf.convert_to_tensor(upper_values)
    upper_values.get_shape().with_rank_at_least(1)
    # Put the batch dimensions last
    upper_values = tf.transpose(
        upper_values,
        tf.concat(0, [[tf.rank(upper_values) - 1],
                      tf.range(tf.rank(upper_values) - 1)]))
    input_shape = tf.shape(upper_values)[0]
    # Compute the size of the matrix that would have this upper triangle
    matrix_size = (1 + tf.cast(tf.sqrt(tf.cast(input_shape * 8 + 1, tf.float32)),
                               tf.int32)) // 2
    # Check that the upper triangle size is valid
    check_size_op = tf.Assert(
        tf.equal(matrix_size ** 2, input_shape * 2 + matrix_size),
        ["Not a valid upper triangle size: ", input_shape])
    with tf.control_dependencies([check_size_op]):
        matrix_size = tf.identity(matrix_size)
    # Compute indices for the whole matrix and the upper diagonal
    index_matrix = tf.reshape(tf.range(matrix_size ** 2),
                              [matrix_size, matrix_size])
    diagonal_indicies = (matrix_size * tf.range(matrix_size)
                         + tf.range(matrix_size))
    upper_triangular_indices, _ = tf.unique(tf.reshape(
        tf.matrix_band_part(
            index_matrix, 0, -1)  # upper triangular part
        - tf.diag(diagonal_indicies),  # remove diagonal
        [-1]))
    batch_dimensions = tf.shape(upper_values)[1:]
    return_shape_transposed = tf.concat(0, [[matrix_size, matrix_size],
                                            batch_dimensions])
    # Fill everything else with zeros; later entries get priority
    # in dynamic_stitch
    result_transposed = tf.reshape(
        tf.dynamic_stitch(
            [index_matrix,
             upper_triangular_indices[1:]],  # discard 0
            [tf.zeros(return_shape_transposed, dtype=upper_values.dtype),
             upper_values]),
        return_shape_transposed)
    # Transpose the batch dimensions to be first again
    return tf.transpose(
        result_transposed,
        tf.concat(0, [tf.range(2, tf.rank(upper_values) + 1), [0, 1]]))
with tf.Session():
    print(matrix_with_upper_values(tf.zeros([0, 3])).eval())
[[0 1]
[0 0]]
[[0 2 7]
[0 0 1]
[0 0 0]]
[[0 3 1 4]
[0 0 1 5]
[0 0 0 9]
[0 0 0 0]]
[[ 0.]]
[[[0 2 7]
[0 0 1]
[0 0 0]]
[[0 4 3]
[0 0 5]
[0 0 0]]]
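For comparison, the same fill is short in NumPy, which can be handy for checking the TensorFlow output. This is a sketch and not a translation of the function above: `matrix_with_upper_values_np` is a hypothetical name, and it relies on `np.triu_indices` returning the strict upper triangle in row-major order, which matches the MATLAB snippet in the question.

```python
import numpy as np

def matrix_with_upper_values_np(values):
    """Expand the last axis (length n*(n-1)/2) into a strictly upper triangular n x n matrix."""
    values = np.asarray(values)
    m = values.shape[-1]
    # Solve n*(n-1)/2 = m for the matrix size n
    n = (1 + int(round(np.sqrt(8 * m + 1)))) // 2
    if n * (n - 1) // 2 != m:
        raise ValueError("Not a valid upper triangle size: %d" % m)
    rows, cols = np.triu_indices(n, k=1)  # positions above the diagonal, row by row
    out = np.zeros(values.shape[:-1] + (n, n), dtype=values.dtype)
    out[..., rows, cols] = values         # leading axes are treated as batch dimensions
    return out

single = matrix_with_upper_values_np([2, 7, 1])
batch = matrix_with_upper_values_np([[2, 7, 1], [4, 3, 5]])
```

Here `single` is the 3x3 matrix and `batch` the pair of 3x3 matrices shown in the printed output above.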