I am trying to do image colorization. I have 5000 images (256x256x3) and would prefer not to load all the data into my program at once (for memory reasons). I have found that it is possible to use ImageDataGenerator.flow_from_directory(), but I use LAB images and I would like to feed my model with a numpy array of the L component (256, 256, 1). My targets are the A and B components (256, 256, 2). To obtain my final image I then merge the input and output into a LAB image (256, 256, 3). The problem is that ImageDataGenerator.flow_from_directory() only works with image-type files (i.e. a 256x256x3 image), and I would like to know if there is a way to do the same thing with numpy arrays.
I tried using tf.data.Dataset.list_files(); I got all my files, but I could not find out how to load my numpy arrays to feed my model. I guess I need to use some sort of generator, but I do not really understand how to use one. This is what I have for now:
HEIGHT = 256
WIDTH = HEIGHT
Batch_size = 50
dir_X_train = 'data/X_train_np/train_black_resized/*.npy'
dir_X_test = 'data/X_test/test_black_resized/*.npy'
dir_y_train = 'data/y_train_np/train_color_resized/*.npy'
dir_y_test = 'data/y_test/test_color_resized/*.npy'
X_train_dataset = tf.data.Dataset.list_files(dir_X_train, shuffle=False).batch(Batch_size)
y_train_dataset = tf.data.Dataset.list_files(dir_y_train, shuffle=False).batch(Batch_size)
def process_path(file_path):
    return tf.io.read_file(file_path[0])
X_train_dataset = X_train_dataset.map(process_path)
y_train_dataset = y_train_dataset.map(process_path)
train_dataset = tf.data.Dataset.zip((X_train_dataset, y_train_dataset))
for image_black, image_color in train_dataset.take(1):
    print(image_black.numpy()[:100])
    print(type(image_black))
    print(image_color.numpy()[:100])
    print(type(image_color))
Output:
b"\x93NUMPY\x01\x00v\x00{'descr': '<f4', 'fortran_order': False, 'shape': (256, 256), } "
<class 'tensorflow.python.framework.ops.EagerTensor'>
b"\x93NUMPY\x01\x00v\x00{'descr': '<f4', 'fortran_order': False, 'shape': (256, 256, 2), } "
<class 'tensorflow.python.framework.ops.EagerTensor'>
The shape seems to be correct, but I don't know how to get the numpy array back: tf.io.read_file only returns the raw bytes of the .npy file.
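One possible sketch (not from the question): wrap np.load in tf.numpy_function so each file is decoded lazily inside the pipeline. The reshape targets and the AUTOTUNE constant are assumptions (on older TF 2.x use tf.data.experimental.AUTOTUNE), and the zip relies on the two shuffle=False listings staying aligned:

import numpy as np
import tensorflow as tf

def load_npy(path):
    # path arrives as a byte string; decode it and load the array from disk
    return np.load(path.decode()).astype(np.float32)

def load_pair(x_path, y_path):
    x = tf.numpy_function(load_npy, [x_path], tf.float32)
    y = tf.numpy_function(load_npy, [y_path], tf.float32)
    x = tf.reshape(x, (HEIGHT, WIDTH, 1))  # L component
    y = tf.reshape(y, (HEIGHT, WIDTH, 2))  # A and B components
    return x, y

train_dataset = (
    tf.data.Dataset.zip((
        tf.data.Dataset.list_files(dir_X_train, shuffle=False),
        tf.data.Dataset.list_files(dir_y_train, shuffle=False)))
    .map(load_pair, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(Batch_size))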
I am using this function to predict the output for previously unseen images:
import cv2
import numpy as np
import matplotlib.pyplot as plt

def predictor(img, model):
    image = cv2.imread(img)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (224, 224))
    image = np.array(image, dtype='float32') / 255.0
    plt.imshow(image)
    image = image.reshape(1, 224, 224, 3)
    clas = model.predict(image).argmax()
    name = dict_class[clas]
    print('The given image is of \nClass: {0} \nSpecies: {1}'.format(clas, name))
How do I change it if I want the top 2 (or top k) predictions, i.e.:
70% chance it's a dog
15% chance it's a bear
If you are using TensorFlow + Keras and doing multi-class classification, the output of model.predict() is a tensor representing either the logits or, if the model ends in a softmax layer, the class probabilities.
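If your model's last layer emits raw logits, one hedged one-liner to convert them to probabilities first (model and image are the question's objects):

import tensorflow as tf
probs = tf.nn.softmax(model.predict(image), axis=-1).numpy()  # each row now sums to 1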
I am taking this example from here and slightly modifying it: https://www.tensorflow.org/api_docs/python/tf/math/top_k.
# After the softmax, the probabilities add up to 1
network_predictions = [0.7, 0.2, 0.05, 0.05]
prediction_probabilities = tf.math.top_k(network_predictions, k=2)
top_2_scores = prediction_probabilities.values.numpy()
dict_class_entries = prediction_probabilities.indices.numpy()
Here dict_class_entries holds the indices, ordered to match the descending scores (i.e. dict_class_entries[0] = 0, which corresponds to 0.7, and top_2_scores[0] = 0.7, etc.).
You just need to replace network_predictions with model.predict(image).
Notice I removed the argmax() so that an array of probabilities is passed in, rather than only the index of the maximum probability.
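Applied to the predictor above, a hedged sketch (the k parameter is an addition; dict_class and model are the question's own objects, and a softmax output is assumed):

import cv2
import numpy as np
import tensorflow as tf

def predictor_top_k(img, model, k=2):
    image = cv2.imread(img)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (224, 224))
    image = np.array(image, dtype='float32') / 255.0
    image = image.reshape(1, 224, 224, 3)
    probs = model.predict(image)[0]          # shape: (num_classes,)
    top = tf.math.top_k(probs, k=k)
    for score, idx in zip(top.values.numpy(), top.indices.numpy()):
        print('{:.0%} chance it is {}'.format(score, dict_class[int(idx)]))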
I applied the method from this GitHub repository to write JPEG files into .tfrecords, but I have issues when parsing them.
Here's my code for writing the tfrecords; each x_img is a numpy array, and each x_img[i] contains a fixed number of img_bytes:
img_bytes = open(join(frames_path, vid, img_list[current]),'rb').read()
...
"x_img": tf.train.Feature( bytes_list = tf.train.BytesList( value= x_img[i])),
When parsing, I did this:
def parse_func(example_proto):
    # FEATURES
    feature_description = {
        "x_img": tf.io.VarLenFeature(tf.string),
    }
    feat = tf.io.parse_single_example(example_proto, feature_description)
    x = {}
    x_img = tf.sparse.to_dense(feat["x_img"])
    x_img = tf.io.decode_jpeg(x_img, channels=3)
    x["x_img"] = x_img / 255
    return x
But it returns an error:
ValueError: Shape must be rank 0 but is rank 1 for 'DecodeJpeg' (op: 'DecodeJpeg') with input shapes: [?].
What is the right way to decode a JPEG which was previously stored in bytes?
Full answer:
tf.io.decode_jpeg works fine. The reason I got the error is that I handed it all n encoded images at once, expecting an output of shape (n, width, height, 3), but decode_jpeg only works on a single image, not on n images.
By writing:
x_img = tf.stack([
    tf.io.decode_jpeg(x_images[0], channels=3),
    tf.io.decode_jpeg(x_images[1], channels=3),
    tf.io.decode_jpeg(x_images[2], channels=3),
])
I could recover the JPEGs from the bytes. A more efficient way would be a list comprehension, but unfortunately list comprehensions over tensors are not supported at the moment (see here).
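A possible alternative sketch (my addition, not part of the answer): tf.map_fn can decode a variable number of images inside the graph, assuming every frame decodes to the same shape:

import tensorflow as tf

x_img = tf.map_fn(
    lambda b: tf.io.decode_jpeg(b, channels=3),  # decode one frame at a time
    x_images,                                    # rank-1 tensor of JPEG byte strings
    fn_output_signature=tf.uint8)                # on TF < 2.3 use dtype=tf.uint8 instead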
The reason for writing bytes into .tfrecords instead of using plt.imread() or cv2.imread() is that this does not decompress the images, so writing is much faster and more space-efficient. I didn't measure it precisely, but decompressing the JPEG images led to roughly a 6x increase in disk space.
I want to concatenate three images of size [1024, 1024, 3] to make a batch of size [3, 1024, 1024, 3]. I wrote this code with TensorFlow but it doesn't work; it returns the error "InaccessibleTensorError: The tensor 'Tensor("truediv:0", shape=(1024, 1024, 3), dtype=float32)' cannot be accessed here: it is defined in another function or code block. Use return values, explicit Python locals or TensorFlow collections to access it.".
def decode_img(filename):
    image = tf.ones((3, 1024, 1024, 3), dtype=tf.dtypes.float32)
    cnt = 0
    slices = []
    for fi in filename:
        bits = tf.io.read_file(fi)
        img = tf.image.decode_jpeg(bits, channels=3)
        img = tf.image.resize(img, (1024, 1024))
        slices.append(tf.cast(img, tf.float32) / 255.0)
        cnt += 1
    image = tf.stack(slices)
    return image
#-----------------------
filenames = ['img1.png', 'img2.png', 'img3.png']
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.map(decode_img, num_parallel_calls=AUTO)
In general, TensorFlow does not support item assignment on tensors. Instead, generate all the image slices you want and then combine them with tf.stack() or tf.concat().
filename = ['img1.png', 'img2.png', 'img3.png']
slices = []
for fi in filename:
    bits = tf.io.read_file(fi)
    img = tf.image.decode_jpeg(bits, channels=3)
    img = tf.image.resize(img, (1024, 1024))
    slices.append(tf.cast(img, tf.float32) / 255.0)
image = tf.stack(slices)  # shape: (3, 1024, 1024, 3)
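tf.stack joins the three (1024, 1024, 3) slices along a new leading axis, which yields exactly the desired (3, 1024, 1024, 3) batch; tf.concat would only be appropriate if each slice already carried that extra axis.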
I am trying to read an image dataset for a (single-class) segmentation problem by following this link. My main folder contains two subfolders: (a) img and (b) mask. img contains the image samples and mask contains the corresponding masks. My approach was to generate the path for each image and then derive the mask path by string replacement (i.e. img -> mask). I modified the code provided there, which now looks like:
def process_path(file_path):
    file_path_str = str(file_path)
    file_path_mask = file_path_str.replace('img', 'mask')
    # load the raw data from the file as a string
    img = tf.io.read_file(file_path)
    img = decode_img(img)
    mask = tf.io.read_file(str(file_path_mask))
    mask = decode_mask(mask)
    return img, mask
However, when I try to check the shape of my samples using:
for image, mask in labeled_ds.take(1):
    print("Image shape: ", image.numpy().shape)
    print("Mask shape: ", mask.numpy().shape)
I am getting the following error:
InvalidArgumentError: NewRandomAccessFile failed to Create/Open: Tensor("arg0:0", shape=(), dtype=string) : The filename, directory name, or volume label syntax is incorrect.
; Unknown error
[[{{node ReadFile_1}}]] [Op:IteratorGetNextSync]
Question: Any suggestion on how to read both the image and the mask from a given folder without the above error?
We can use tf.regex_replace to rewrite a string tensor. So, in place of the Python string replacement, use file_path_mask = tf.regex_replace(file_path, "img", "mask"). For TF 2.x, use tf.strings.regex_replace instead.
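Applied to the question's process_path, that gives (a sketch, assuming TF 2.x names; decode_img and decode_mask are the question's own helpers):

import tensorflow as tf

def process_path(file_path):
    # derive the mask path from the image path inside the TF graph
    file_path_mask = tf.strings.regex_replace(file_path, 'img', 'mask')
    img = tf.io.read_file(file_path)
    img = decode_img(img)        # the question's own decoder
    mask = tf.io.read_file(file_path_mask)
    mask = decode_mask(mask)     # the question's own decoder
    return img, mask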
Alternative workaround for a similar problem. I have 200 (nb_of_images = 200) grayscale images of shape (512, 512) loaded as np.array, and 200 binary masks, also of shape (512, 512), loaded as np.array. Within a for loop, I take each image, convert it to an EagerTensor (with tf.convert_to_tensor), cast it to tf.float32 through the dtype arg, and add one dimension with:
img = img[:, :, tf.newaxis]
so that my images are now EagerTensors of shape (512, 512, 1); finally, I append them to an external list called images.
Within the same loop, I do exactly the same operations for the masks and append them to an external list called masks.
Once the for loop is finished, I basically have two lists of EagerTensors, with
len(images) == len(masks) == nb_of_images
Lastly, I re-convert the two lists to tf.Tensor with:
images_tf = tf.convert_to_tensor(images) # convert list back to tf.Tensor
masks_tf = tf.convert_to_tensor(masks) # convert list back to tf.Tensor
and finally I create the tf.data.Dataset with:
dataset = tf.data.Dataset.from_tensor_slices((images_tf, masks_tf)) # create tf.data.Dataset
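Put together, the whole workaround reads roughly as follows (a sketch; images_np and masks_np stand in for the 200 loaded arrays and are assumptions):

import tensorflow as tf

images, masks = [], []
for img_np, mask_np in zip(images_np, masks_np):    # hypothetical loaded arrays
    img = tf.convert_to_tensor(img_np, dtype=tf.float32)
    images.append(img[:, :, tf.newaxis])            # (512, 512) -> (512, 512, 1)
    mask = tf.convert_to_tensor(mask_np, dtype=tf.float32)
    masks.append(mask[:, :, tf.newaxis])

images_tf = tf.convert_to_tensor(images)            # shape (200, 512, 512, 1)
masks_tf = tf.convert_to_tensor(masks)
dataset = tf.data.Dataset.from_tensor_slices((images_tf, masks_tf))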
What I mean is: can I, for example, construct 2 different sprite images and choose between them while viewing embeddings in 2D/3D space using t-SNE/PCA?
In other words, when using the following code:
embedding.sprite.image_path = "Path/to/the/sprite_image.jpg"
Is there a way to add another sprite image?
So, when training a ConvNet to distinguish between MNIST digits, I not only want to view the 1, 2, ..., 9 and 0 in 3D/2D space; I would also like to see where the 1s cluster in that space, and the same for the 2s, 3s, and so on. So I need a unique color for the 1s, another for the 2s, etc. I would like to view this as in the following image:
[image: embedding visualization with points colored by class] (source)
Any help is much appreciated!
There is an easier way to do this with filtering: you can just select the labels with regex syntax in the projector's search field.
If this is not what you are looking for, you could create a sprite image that assigns the same plain-color image to each of your labels!
This functionality should come out of the box (without additional sprite images). See 'color by' in the left side panel; you can toggle the A icon to switch sprite images on and off.
This run was produced with the example on the front page of the tensorboardX GitHub repo: https://github.com/lanpa/tensorboardX
You can also see a live demo with MNIST dataset (images and colours) at http://projector.tensorflow.org/
import torch
import torchvision.utils as vutils
import numpy as np
import torchvision.models as models
from torchvision import datasets
from tensorboardX import SummaryWriter

resnet18 = models.resnet18(False)
writer = SummaryWriter()
sample_rate = 44100
freqs = [262, 294, 330, 349, 392, 440, 440, 440, 440, 440, 440]

for n_iter in range(100):
    dummy_s1 = torch.rand(1)
    dummy_s2 = torch.rand(1)
    # data grouping by `slash`
    writer.add_scalar('data/scalar1', dummy_s1[0], n_iter)
    writer.add_scalar('data/scalar2', dummy_s2[0], n_iter)
    writer.add_scalars('data/scalar_group', {'xsinx': n_iter * np.sin(n_iter),
                                             'xcosx': n_iter * np.cos(n_iter),
                                             'arctanx': np.arctan(n_iter)}, n_iter)
    dummy_img = torch.rand(32, 3, 64, 64)  # output from network
    if n_iter % 10 == 0:
        x = vutils.make_grid(dummy_img, normalize=True, scale_each=True)
        writer.add_image('Image', x, n_iter)
        dummy_audio = torch.zeros(sample_rate * 2)
        for i in range(x.size(0)):
            # the amplitude of the sound should be in [-1, 1]
            dummy_audio[i] = np.cos(freqs[n_iter // 10] * np.pi * float(i) / float(sample_rate))
        writer.add_audio('myAudio', dummy_audio, n_iter, sample_rate=sample_rate)
        writer.add_text('Text', 'text logged at step:' + str(n_iter), n_iter)
        for name, param in resnet18.named_parameters():
            writer.add_histogram(name, param.clone().cpu().data.numpy(), n_iter)
        # needs tensorboard 0.4RC or later
        writer.add_pr_curve('xoxo', np.random.randint(2, size=100), np.random.rand(100), n_iter)

dataset = datasets.MNIST('mnist', train=False, download=True)
images = dataset.test_data[:100].float()
label = dataset.test_labels[:100]
features = images.view(100, 784)
writer.add_embedding(features, metadata=label, label_img=images.unsqueeze(1))

# export scalar data to JSON for external processing
writer.export_scalars_to_json("./all_scalars.json")
writer.close()
There are some threads mentioning that this currently fails beyond a threshold number of datapoints. https://github.com/lanpa/tensorboardX