cv2.imshow and write video frames in colab - tensorflow

I am trying to run this, and Colab does not support cv2.imshow, so I changed this part:
for i in range(n_samples):
    pred_vid = video_predictor_model.predict(
        [im[np.newaxis], np.ones((1,) + im.shape), np.zeros((1, 5))])
    print(f'Predicted video shape: {pred_vid.shape}')
    pred_vid_im = vis_utils.visualize_video(
        pred_vid[0], normalized=True)
    cv2.imshow(f'Video sample {i+1}', pred_vid_im)
to this:
from google.colab.patches import cv2_imshow
for i in range(n_samples):
    pred_vid = video_predictor_model.predict(
        [im[np.newaxis], np.ones((1,) + im.shape), np.zeros((1, 5))])
    print(f'Predicted video shape: {pred_vid.shape}')
    pred_vid_im = vis_utils.visualize_video(
        pred_vid[0], normalized=True)
    #cv2.imshow(f'Video sample {i+1}', pred_vid_im)
    cv2.imwrite(f'Video sample {i+1}.jpg', pred_vid_im)
but the saved frames are just black and I'm not sure why.

This is not a Google Colab issue. pred_vid_im holds normalized values, so every value lies between 0 and 1. cv2.imwrite expects 8-bit pixel values in the range 0 to 255 when writing a JPEG, so values that small come out as (almost) black pixels.
So you need to multiply pred_vid_im by 255.0. Change the code to this:
pred_vid_im = vis_utils.visualize_video(pred_vid[0], normalized=True) * 255.0
Now the saved image will be correct.
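To make the scaling step concrete, here is a minimal sketch that uses only NumPy. The helper name to_uint8_frame and the dummy frame are made up for illustration; the sketch assumes the frame is a float array with values in [0, 1], as described above. It also rounds and casts to uint8, which is a little more explicit than multiplying alone.

```python
import numpy as np

def to_uint8_frame(frame):
    """Scale a float frame with values in [0, 1] to 8-bit values in [0, 255]."""
    frame = np.asarray(frame, dtype=np.float32)
    # Round first, then clip so that slightly out-of-range values cannot wrap around.
    return np.clip(np.round(frame * 255.0), 0, 255).astype(np.uint8)

# Dummy 2x2 RGB frame standing in for pred_vid_im (hypothetical data).
normalized_frame = np.linspace(0.0, 1.0, 12, dtype=np.float32).reshape(2, 2, 3)
frame_8bit = to_uint8_frame(normalized_frame)
```

The result can then be passed to cv2.imwrite in place of the normalized array.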


remove background using u2net produced mask

I am trying to remove the background from an image. For this purpose I am using U2NET. I am writing the network structure in TensorFlow by following this repository, and I have changed the model architecture according to my needs. It takes a 96x96 image and produces 7 masks. I take the 1st mask (out of 7) and multiply it against all channels of the original 96x96 image.
The code that predicts the 7 masks is:
img ='DUTS-TE','DUTS-TE-Image', test_x_names[90]))
copied = deepcopy(img)
copied = copied.resize((96,96))
copied = np.expand_dims(copied,axis=0)
preds = model.predict(copied)
preds = np.squeeze(preds)
"preds[0]" is:
[image: predicted mask]
Multiplying the mask against the original image produces:
[image: masked image]
The corresponding code is ("img2" is the original image):
img2 = np.asarray(img2)
immg = np.zeros((96,96,3), np.uint8)
for i in range(0,3):
    immg[:,:,i] = img2[:,:,i] * preds[0]
If I binarize the mask and then multiply it against the original image, it produces:
[image: image masked with the binarized mask]
The corresponding code is:
frame = binarize(preds[0,:,:], threshold = 0.5)
img2 = np.asarray(img2)
immg = np.zeros((96,96,3), np.uint8)
for i in range(0,3):
    immg[:,:,i] = img2[:,:,i] * frame
Multiplying the original image by the mask or by the binarized mask does not segment the foreground properly from the background. So, what can be done? Am I missing something?
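For reference, the per-channel loop can also be written with NumPy broadcasting. This is only a sketch of the masking step, not a fix for the segmentation quality: it assumes the image is already a 96x96x3 uint8 array that matches the mask pixel for pixel, and that the mask holds floats in [0, 1]. The helper name apply_mask and the dummy data are hypothetical.

```python
import numpy as np

def apply_mask(image, mask, threshold=None):
    """Multiply every channel of an (H, W, 3) uint8 image by an (H, W) mask in [0, 1]."""
    mask = np.asarray(mask, dtype=np.float32)
    if threshold is not None:
        # Optional binarization: pixels above the threshold become 1, the rest 0.
        mask = (mask > threshold).astype(np.float32)
    # The trailing axis lets the (H, W, 1) mask broadcast over the 3 colour channels.
    masked = image.astype(np.float32) * mask[..., np.newaxis]
    return np.clip(masked, 0, 255).astype(np.uint8)

# Dummy data standing in for img2 and preds[0] (hypothetical).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(96, 96, 3), dtype=np.uint8)
mask = np.zeros((96, 96), dtype=np.float32)
mask[24:72, 24:72] = 1.0
masked = apply_mask(image, mask, threshold=0.5)
```

If the image passed in is not the same 96x96 resize that was fed to the network, the mask and the pixels will not line up, so that is worth checking first.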

How to get top k predictions for a new Image

I am using this function to predict the class of previously unseen images:
def predictor(img, model):
    image = cv2.imread(img)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (224, 224))
    image = np.array(image, dtype = 'float32')/255.0
    image = image.reshape(1, 224,224,3)
    clas = model.predict(image).argmax()
    name = dict_class[clas]
    print('The given image is of \nClass: {0} \nSpecies: {1}'.format(clas, name))
How can I change it if I want the top 2 (or k) predictions, for example:
70% chance it's a dog
15% it's a bear
If you are using TensorFlow + Keras and doing multi-class classification, then the output of model.predict() is an array holding either the logits or the probabilities (softmax applied to the logits).
I am taking this example from here and slightly modifying it:
import tensorflow as tf
# Softmax output: the probabilities add up to 1
network_predictions = [0.7, 0.2, 0.05, 0.05]
prediction_probabilities = tf.math.top_k(network_predictions, k=2)
top_2_scores = prediction_probabilities.values.numpy()
dict_class_entries = prediction_probabilities.indices.numpy()
dict_class_entries then holds the class indices, sorted in descending order of probability (i.e. dict_class_entries[0] = 0, which corresponds to 0.7, and top_2_scores[0] = 0.7, etc.).
You just need to replace network_predictions with model.predict(image).
Notice that I removed argmax() so that the whole array of probabilities is passed on, rather than only the index of the maximum score/probability.
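The same idea can be sketched with plain NumPy, which may be handy inside the predictor function. This is an illustration only: the helper name top_k_predictions, the class names in dict_class and the fake model output are all made up, and the sketch assumes model.predict returns probabilities of shape (1, n_classes).

```python
import numpy as np

def top_k_predictions(probabilities, class_names, k=2):
    """Return the k most probable (class name, probability) pairs, best first."""
    probabilities = np.asarray(probabilities, dtype=np.float64).reshape(-1)
    # argsort is ascending, so reverse it and keep the first k indices.
    top_indices = np.argsort(probabilities)[::-1][:k]
    return [(class_names[int(i)], float(probabilities[i])) for i in top_indices]

# Hypothetical class mapping and a fake stand-in for model.predict(image).
dict_class = {0: 'dog', 1: 'bear', 2: 'cat', 3: 'fox'}
fake_output = np.array([[0.70, 0.15, 0.10, 0.05]])
results = top_k_predictions(fake_output, dict_class, k=2)
for name, score in results:
    print(f'{score:.0%} chance it is a {name}')
```

If the model outputs logits instead of probabilities, the ranking is the same, but the printed percentages would only be meaningful after a softmax.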

tflite uint8 quantization model input and output float conversion

I have successfully converted an object detection model to a quantized 8-bit tflite model. My model was originally trained on images normalized by dividing by 255, so the original input range is [0, 1]. Since my quantized tflite model requires uint8 input, how can I convert my image (originally in [0, 255]) so that it is correct for my network?
Also how can I convert output to float to compare the results with floating point model?
The following code does not give me the right result.
im = cv2.imread(image_path)
im = im.astype(np.float32, copy=False)
input_image = im
input_image = np.array(input_image, dtype=np.uint8)
input_image = np.expand_dims(input_image, axis=0)
interpreter.set_tensor(input_details[0]['index'], input_image)
output_data = interpreter.get_tensor(output_details[0]['index'])
output_data2 = interpreter.get_tensor(output_details[1]['index'])
output_data3 = interpreter.get_tensor(output_details[2]['index'])
min_1 = -8.198164939880371
max_1 = 8.798029899597168
scale = (max_1 - min_1)/ 255.0
min_2 = -9.77856159210205
max_2 = 10.169703483581543
scale_2 = (max_2 - min_2) / 255.0
min_3 = -14.382895469665527
max_3 = 11.445544242858887
scale_3 = (max_3 - min_3) / 255.0
output_data = (output_data ) * scale + min_1
output_data2 = (output_data2) * scale_2 + min_2
output_data3 = (output_data3) * scale_3 + min_3
I ran into the same problem, but in pose estimation.
Have you solved it yet?
Did you use quantization-aware training?
I think you can get a q (scale) and z (zero point) value for your input image, because you have to supply a mean and standard deviation when you use the tflite API or the toco command to produce a quantized 8-bit tflite model.
Try this code, where each line maps a quantized uint8 value back to the real value it represents:
image = q_input * (image - z_input)
output_data = q_output * (output_data - z_output)
(Each tensor has its own q and z, so look them up per layer.)
Let me know if you tried this.
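To spell out the q and z arithmetic in both directions, here is a minimal NumPy sketch. It assumes the usual affine scheme real = scale * (quantized - zero_point); the helper names and the example scale and zero point are illustrative, not values read from the model in the question. As far as I know, the Python interpreter exposes each tensor's (scale, zero_point) pair under the 'quantization' key of the dicts returned by get_input_details() and get_output_details().

```python
import numpy as np

def quantize(real_values, scale, zero_point):
    """Map real values to uint8 using q = round(real / scale + zero_point)."""
    q = np.round(np.asarray(real_values, dtype=np.float32) / scale + zero_point)
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(quantized_values, scale, zero_point):
    """Map uint8 values back to real values using real = scale * (q - zero_point)."""
    return scale * (np.asarray(quantized_values, dtype=np.float32) - zero_point)

# Assumed input quantization for a model trained on [0, 1] images (hypothetical values).
input_scale, input_zero_point = 1.0 / 255.0, 0
pixels = np.array([0, 128, 255], dtype=np.uint8)
normalized = pixels.astype(np.float32) / 255.0
requantized = quantize(normalized, input_scale, input_zero_point)
recovered = dequantize(requantized, input_scale, input_zero_point)
```

With a scale of 1/255 and a zero point of 0, the quantized input is just the original 8-bit image, so under that assumption no extra conversion is needed on the input side. The outputs are turned back into floats with dequantize and their own scale and zero point.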
I've converted the image via OpenCV to "CV_8UC3" and this worked for me:
// Convert to RGB color space
if (image.channels() == 1) {
    cv::cvtColor(image, image, cv::COLOR_GRAY2RGB);
} else {
    cv::cvtColor(image, image, cv::COLOR_BGR2RGB);
}
// Make sure the pixel data is 8-bit, 3-channel
image.convertTo(image, CV_8UC3);

How to export adversarial examples for Facenet in Cleverhans?

I am trying to follow this blog to generate adversarial face images against Facenet. The code is here and works fine! My question is how I can export these adversarial images. Perhaps the question is too straightforward, so the blog doesn't cover it and only shows some sample pictures.
I thought it would not be a hard problem, since I know the generated adversarial samples are in "adv". But adv (float32) came from faces1 after being prewhitened and normalized. To restore 8-bit integer images from adv (float32), I have to reverse the normalization and the prewhitening. It seems that if we want to output images from Facenet, we have to go through this process.
I am new to Facenet and Cleverhans, and I am not sure whether this is the best way to do it, or whether there is a common way (such as existing functions) to export images from Facenet.
This is where we finally get the adversarial samples. I need to export adv to plain integer images.
adv =, feed_dict=feed_dict)
There is some kind of normalization in load_testset:
def load_testset(size):
    # Load images paths and labels
    pairs = lfw.read_pairs(pairs_path)
    paths, labels = lfw.get_paths(testset_path, pairs, file_extension)
    # Random choice
    permutation = np.random.choice(len(labels), size, replace=False)
    paths_batch_1 = []
    paths_batch_2 = []
    for index in permutation:
        paths_batch_1.append(paths[index * 2])
        paths_batch_2.append(paths[index * 2 + 1])
    labels = np.asarray(labels)[permutation]
    paths_batch_1 = np.asarray(paths_batch_1)
    paths_batch_2 = np.asarray(paths_batch_2)
    # Load images
    faces1 = facenet.load_data(paths_batch_1, False, False, image_size)
    faces2 = facenet.load_data(paths_batch_2, False, False, image_size)
    # Change pixel values to 0 to 1 values
    min_pixel = min(np.min(faces1), np.min(faces2))
    max_pixel = max(np.max(faces1), np.max(faces2))
    faces1 = (faces1 - min_pixel) / (max_pixel - min_pixel)
    faces2 = (faces2 - min_pixel) / (max_pixel - min_pixel)
In the load_data function, there is a prewhiten process.
nrof_samples = len(image_paths)
images = np.zeros((nrof_samples, image_size, image_size, 3))
for i in range(nrof_samples):
    img = misc.imread(image_paths[i])
    if img.ndim == 2:
        img = to_rgb(img)
    if do_prewhiten:
        img = prewhiten(img)
    img = crop(img, do_random_crop, image_size)
    img = flip(img, do_random_flip)
    images[i,:,:,:] = img
return images
I hope some expert can point me to a hidden function in Facenet or Cleverhans that can directly export the adv images; otherwise, reversing the normalization and prewhitening seems awkward. Thank you very much.
I don't know much about the Facenet code. From your discussion, it seems like you will have to save the values of min_pixel and max_pixel to reverse the normalization, and then look at the prewhiten function to see how you can reverse it. I'll email Bruno to see if he has any further comments to help you out.
EDIT: Image exporting is now included in the Facenet example of Cleverhans.
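For anyone who still wants to do it by hand, here is a rough sketch of the reversal described above. It assumes prewhiten subtracts the image mean and divides by an adjusted standard deviation (that is how I remember the Facenet helper, so check it against your copy), and that you kept min_pixel, max_pixel and the per-image statistics from the forward pass. All function names below are hypothetical.

```python
import numpy as np

def prewhiten_with_stats(x):
    """Prewhiten an image and also return the statistics needed to undo it (assumed formula)."""
    mean = np.mean(x)
    std_adj = np.maximum(np.std(x), 1.0 / np.sqrt(x.size))
    return (x - mean) / std_adj, mean, std_adj

def undo_min_max(x, min_pixel, max_pixel):
    """Reverse (x - min_pixel) / (max_pixel - min_pixel)."""
    return x * (max_pixel - min_pixel) + min_pixel

def undo_prewhiten(x, mean, std_adj):
    """Reverse (x - mean) / std_adj."""
    return x * std_adj + mean

def to_uint8(x):
    """Round and clip to the valid 8-bit pixel range."""
    return np.clip(np.round(x), 0, 255).astype(np.uint8)

# Round trip on a dummy image standing in for one face (hypothetical data).
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(8, 8, 3)).astype(np.float64)
whitened, mean, std_adj = prewhiten_with_stats(original)
min_pixel, max_pixel = whitened.min(), whitened.max()
normalized = (whitened - min_pixel) / (max_pixel - min_pixel)
restored = to_uint8(undo_prewhiten(undo_min_max(normalized, min_pixel, max_pixel), mean, std_adj))
```

For a real adversarial batch you would apply the two undo steps to adv instead of normalized. The perturbation can push values slightly outside [0, 255], which is why the last step clips.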

Can we visualize the embedding with multiple sprite images in tensorflow?

What I mean is, can I, for example, construct 2 different sprite images and be able to choose one of them while viewing embeddings in 2D/3D space using TSNE/PCA?
In other words, when using the following code:
embedding.sprite.image_path = "Path/to/the/sprite_image.jpg"
Is there a way to add another sprite image?
So, when training a ConvNet to distinguish between MNIST digits, I don't just want to see the digit images (1, 2, ..., 9 and 0) in the 3D/2D space; I would like to see where the ones gather in that space, and likewise the 2s, the 3s and so on. So I need one unique color for the 1s, another for the 2s, and so on. I need to view this as in the following image:
Any help is much appreciated!
There is an easier way to do this with filtering. You can just select the labels with a regex syntax:
If this is not what you are looking for, you could create a sprite image that assigns the same plain color image to each of your labels!
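A sprite of plain color tiles could be built roughly like this. It is only a sketch: it assumes the projector reads the sprite as a square grid of equally sized tiles in row-major order, and the function name make_color_sprite, the palette and the labels are made up for illustration.

```python
import math
import numpy as np

def make_color_sprite(labels, palette, tile_size=8):
    """Build a square sprite with one plain-colour tile per data point, in row-major order."""
    grid = math.ceil(math.sqrt(len(labels)))
    # Unused tiles at the end of the grid stay white.
    sprite = np.full((grid * tile_size, grid * tile_size, 3), 255, dtype=np.uint8)
    for idx, label in enumerate(labels):
        row, col = divmod(idx, grid)
        sprite[row * tile_size:(row + 1) * tile_size,
               col * tile_size:(col + 1) * tile_size] = palette[label]
    return sprite

# Hypothetical palette (one RGB colour per label) and labels for five data points.
palette = {0: (255, 0, 0), 1: (0, 255, 0), 2: (0, 0, 255)}
labels = [0, 1, 2, 1, 0]
sprite = make_color_sprite(labels, palette)
```

The array can then be saved as an image with any image library and used as the sprite image path.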
This functionality should come out of the box (without additional sprite images). See 'colour by' in the left sidepanel. You can toggle the A to switch sprite images on and off.
This run was produced with the example on the front page of the tensorboardX projector GitHub repo.
You can also see a live demo with MNIST dataset (images and colours) at
import torch
import torchvision.utils as vutils
import numpy as np
import torchvision.models as models
from torchvision import datasets
from tensorboardX import SummaryWriter
resnet18 = models.resnet18(False)
writer = SummaryWriter()
sample_rate = 44100
freqs = [262, 294, 330, 349, 392, 440, 440, 440, 440, 440, 440]
for n_iter in range(100):
    dummy_s1 = torch.rand(1)
    dummy_s2 = torch.rand(1)
    # data grouping by `slash`
    writer.add_scalar('data/scalar1', dummy_s1[0], n_iter)
    writer.add_scalar('data/scalar2', dummy_s2[0], n_iter)
    writer.add_scalars('data/scalar_group', {'xsinx': n_iter * np.sin(n_iter),
                                             'xcosx': n_iter * np.cos(n_iter),
                                             'arctanx': np.arctan(n_iter)}, n_iter)
    dummy_img = torch.rand(32, 3, 64, 64)  # output from network
    if n_iter % 10 == 0:
        x = vutils.make_grid(dummy_img, normalize=True, scale_each=True)
        writer.add_image('Image', x, n_iter)
        dummy_audio = torch.zeros(sample_rate * 2)
        for i in range(x.size(0)):
            # amplitude of sound should be in [-1, 1]
            dummy_audio[i] = np.cos(freqs[n_iter // 10] * np.pi * float(i) / float(sample_rate))
        writer.add_audio('myAudio', dummy_audio, n_iter, sample_rate=sample_rate)
        writer.add_text('Text', 'text logged at step:' + str(n_iter), n_iter)
        for name, param in resnet18.named_parameters():
            writer.add_histogram(name, param.clone().cpu().data.numpy(), n_iter)
        # needs tensorboard 0.4RC or later
        writer.add_pr_curve('xoxo', np.random.randint(2, size=100), np.random.rand(100), n_iter)
# log the first 100 MNIST test images as an embedding, with labels and thumbnails
dataset = datasets.MNIST('mnist', train=False, download=True)
images = dataset.test_data[:100].float()
label = dataset.test_labels[:100]
features = images.view(100, 784)
writer.add_embedding(features, metadata=label, label_img=images.unsqueeze(1))
# export scalar data to JSON for external processing
writer.export_scalars_to_json("./all_scalars.json")
writer.close()
There are some threads mentioning that this currently fails beyond a threshold number of datapoints.