I want to do the same numpy operation as follows to make a custom layer:
img=cv2.imread('img.jpg') # img.shape =>(600,600,3)
mask=np.random.randint(0,2,size=img.shape[:2],dtype='bool')
img2=np.expand_dims(img,axis=0) # img2.shape => (1,600,600,3)
img2[:,mask,:].shape # => (1, 204030, 3)
This is my first attempt, but it failed; I can't do the same operation for TensorFlow tensors:
class Sampling_layer(keras.layers.Layer):
    def __init__(self, sampling_matrix):
        super(Sampling_layer, self).__init__()
        self.sampling_matrix = sampling_matrix

    def call(self, input_img):
        return input_img[:, self.sampling_matrix, :]
More explanation:
I want to define a Keras layer so that, given a batch of images, it uses a sampling matrix and gives me a batch of sampled vectors for the images. The sampling matrix is a random boolean matrix the same size as the image. The slicing operation I used is straightforward for numpy arrays and works perfectly, but I can't get it done with tensors in TensorFlow. I tried to use loops to perform the operation manually, but I failed.
You can do the following.
import numpy as np
import tensorflow as tf
# Batch of images
img=np.random.normal(size=[2,600,600,3]) # img.shape => (2,600,600,3)
# You'll need to match the first 3 dimensions of mask with the img
# for that we'll repeat the first axis twice
mask=np.random.randint(0,2,size=img.shape[1:3],dtype='bool')
mask = np.repeat(np.expand_dims(mask, axis=0), 2, axis=0)
# Defining input layers
inp1 = tf.keras.layers.Input(shape=(600,600,3))
mask_inp = tf.keras.layers.Input(shape=(600,600), dtype=tf.bool)
# The layer you're looking for
out = tf.keras.layers.Lambda(lambda x: tf.boolean_mask(x[0], x[1]))([inp1, mask_inp])
model = tf.keras.models.Model([inp1, mask_inp], out)
# Predict on sample data
toy_out = model.predict([img, mask])
Note that both your images and your mask need to have the same batch size. I couldn't find a solution that works without repeating the mask on the batch axis to match the batch size of the images. This is the only solution that came to my mind (assuming that your mask changes for every batch of data).
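If, on the other hand, the sampling matrix is fixed (the same for every sample), a possible alternative is to bake it into the layer as a constant and let tf.boolean_mask match it against the spatial axes only, so no mask input has to be fed or repeated at all. A rough sketch under that assumption (shapes and names are illustrative, not from the original post):
import numpy as np
import tensorflow as tf

# Assumption: one fixed boolean sampling matrix shared by every image in every batch.
fixed_mask = np.random.randint(0, 2, size=(600, 600)).astype(bool)

inp = tf.keras.layers.Input(shape=(600, 600, 3))
# axis=1 makes boolean_mask match the (600, 600) mask against the H and W axes,
# giving an output of shape (batch, num_selected_pixels, 3)
out = tf.keras.layers.Lambda(lambda x: tf.boolean_mask(x, fixed_mask, axis=1))(inp)
model = tf.keras.models.Model(inp, out)

imgs = np.random.normal(size=[2, 600, 600, 3]).astype(np.float32)
print(model.predict(imgs).shape)  # (2, num_selected_pixels, 3)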
Related
This question is about ensuring that the prediction-time input images are in the same range as the images fed during training. I know that it's the usual practice to repeat the same steps that were done during training to process an image at prediction time. But in my case, I apply the random_transform() function inside a custom data generator during training, which wouldn't make sense to add at prediction time.
import cv2
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns
To simplify my problem, assume I'm making the following changes to a grayscale image that I read in a custom data generator.
img_1 is an output of the data generator, which is supposed to be the input to a VGG19 model.
# using a simple augmenter
augmenter = tf.keras.preprocessing.image.ImageDataGenerator(
    brightness_range=(0.75, 1.25),
    preprocessing_function=tf.keras.applications.vgg19.preprocess_input  # preprocessing function of VGG19
)
# read the image
img = cv2.imread('sphx_glr_plot_camera_001.png')
# add a random transform
img_1 = augmenter.random_transform(img)/255
The above random_transform() has made the grayscale value distribution as follows (between [0, 1]):
plt.imshow(img_1); plt.show();
sns.histplot(img_1[:, :, 0].ravel()); # select the 0th layer and ravel because the augmenter stacks 3 layers of the grayscale image to make it an RGB image
Now, I want to do the same at prediction time, but I don't want a random transform applied to the image, so I just pass the input image through the preprocessing_function().
# read image
img = cv2.imread('sphx_glr_plot_camera_001.png')
# pass through the preprocessing function
img_2 = tf.keras.applications.vgg19.preprocess_input(img)/255
But I'm unable to get the input into the [0, 1] range as was done during training.
plt.imshow(img_2); plt.show();
sns.histplot(img_2[:, :, 0].ravel());
This makes the predictions completely incorrect. How can I make sure that the inputs to the model at prediction time undergo the same steps, so that they end up with a distribution similar to the inputs fed during training? I don't want to add a random_transform() at prediction time as well.
I recommend adding per-image standardization in your model. This will ensure that the mean of each image is 0 and its standard deviation is 1, both on your training set and at inference time.
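For instance, a minimal sketch of what that could look like, using tf.image.per_image_standardization inside a Lambda layer (the input size and the rest of the model are placeholders, not taken from the question):
import tensorflow as tf

# Standardize each image to zero mean and unit variance inside the model itself,
# so training and inference apply exactly the same normalization.
inputs = tf.keras.layers.Input(shape=(224, 224, 3))
standardized = tf.keras.layers.Lambda(tf.image.per_image_standardization)(inputs)
# ... the rest of the model (e.g. a VGG19 backbone) would consume `standardized`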
Problem Setup
1. I have a collection of float32 numpy arrays (say: (100, 100) each) that belong to several classes.
2. I've created an image dataset from them by saving them to the disk (DATA_SET_PATH) using matplotlib.image.imsave(<save_path.jpg>, array, cmap='gray')
3. Then, I've trained a pretrained VGG model on that image dataset using the following.
from tensorflow.keras.applications.vgg19 import preprocess_input
augmenter = tf.keras.preprocessing.image.ImageDataGenerator(preprocessing_function=preprocess_input)
train_generator = augmenter.flow_from_directory(
    <DATA_SET_PATH>,
    target_size=(224, 224),  # this is the input size that the VGG model expects
    color_mode='rgb',
    ...  # other parameters
)
model = tf.keras.applications.VGG19(include_top=False,
                                    weights='imagenet',
                                    input_shape=(224, 224, 3))
# other model configurations ...
# ...
model.fit(train_generator , ...)
Now, in production, I am receiving a numpy array in the same format as in (1) above, and I want to obtain the prediction for that single numpy array using model.predict().
Question
So, in this setup, how can I ensure a single numpy array (input) would transform to the state of a model input tensor during training?
What I tried:
import numpy as np
import cv2
input = np.random.randn(100, 100).astype(np.float32)  # sample input array
# first resize the array
input = cv2.resize(input, (224, 224), interpolation=cv2.INTER_AREA)
# make this have three channels (because the VGG model expected so during training)
input = np.stack([input] * 3, axis=-1)
# pass through the respective preprocessing function
input = preprocess_input(input)
When I pass this to model.predict() after expanding the dimensions, the predictions are obviously wrong, despite having good performance during training.
I think this is because the above input is different from what model.input received during training. If needed, I can save the input array to an image as in (2) above, but I want to know the next steps that Keras would apply to it.
Edit:
Based on the insight by @Lescurel in a comment and looking at the source of tf.keras.preprocessing.image, I've used the load_img() function and got this working by saving the array to an image and then loading it (to reproduce step 2 above and to make sure the preprocessing_function gets values in the range 0-255).
Here's how I got it to work:
input = np.random.randn(100, 100).astype(np.float32)  # sample input array
# save `input` to an image and load it
temp_path = "temp_path.jpg"
matplotlib.image.imsave(temp_path, input, cmap='gray')
img = tf.keras.preprocessing.image.load_img(temp_path,
                                            color_mode='rgb',
                                            target_size=(224, 224)
                                            )
# convert to an array
input = tf.keras.preprocessing.image.img_to_array(img)
input = preprocess_input(input)
# the above `input` is passed to the model after adding the extra dimension.
# ...
For my use case, I would still prefer to avoid saving this to an image and instead pass the numpy array directly to the preprocessing_function by ensuring its values are in the (0, 255) range, but that will be the scope of another question :)
I thought I would share something that took me a while to figure out: easily wrapping an existing Keras Sequence class with a TF Dataset object. After following tutorials and migrating from TF 1.X and Keras to TF 2.X, I finally figured out how to do it with minimal code. Hopefully I'm not the only one who struggled with this, and others will find this helpful :)
A few assumptions:
Sequence class loads data and labels
Labels have the same shape (apart from channels) as the source data (i.e. this is something I use for training U-Nets)
Data format is channels last
import tensorflow as tf

def DatasetFromSequenceClass(sequenceClass, stepsPerEpoch, nEpochs, batchSize, dims=[512,512,3], n_classes=2, data_type=tf.float32, label_type=tf.float32):
    # eager execution wrapper
    def DatasetFromSequenceClassEagerContext(func):
        def DatasetFromSequenceClassEagerContextWrapper(batchIndexTensor):
            # Use a tf.py_function to prevent auto-graph from compiling the method
            tensors = tf.py_function(
                func,
                inp=[batchIndexTensor],
                Tout=[data_type, label_type]
            )
            # set the shape of the tensors - assuming channels last
            tensors[0].set_shape([batchSize, dims[0], dims[1], dims[2]])    # [samples, height, width, nChannels]
            tensors[1].set_shape([batchSize, dims[0], dims[1], n_classes])  # [samples, height, width, nClasses for one hot]
            return tensors
        return DatasetFromSequenceClassEagerContextWrapper

    # TF dataset wrapper that indexes our sequence class
    @DatasetFromSequenceClassEagerContext
    def LoadBatchFromSequenceClass(batchIndexTensor):
        # get our index as numpy value - we can use .numpy() because we have wrapped our function
        batchIndex = batchIndexTensor.numpy()

        # zero-based index for what batch of data to load; i.e. goes to 0 at stepsPerEpoch and starts counting over
        zeroBatch = batchIndex % stepsPerEpoch

        # load data
        data, labels = sequenceClass[zeroBatch]

        # convert to tensors and return
        return tf.convert_to_tensor(data), tf.convert_to_tensor(labels)

    # create our data set for how many total steps of training we have
    dataset = tf.data.Dataset.range(stepsPerEpoch*nEpochs)

    # return dataset using map to load our batches of data, use TF to specify number of parallel calls
    return dataset.map(LoadBatchFromSequenceClass, num_parallel_calls=tf.data.experimental.AUTOTUNE)
With that function, you can then update your training to look something like this:
# load our data as tensorflow datasets
training = DatasetFromSequenceClass(trainingSequence, training_steps, nEpochs, batchSize, dims=shp, n_classes=nClasses)
validation = DatasetFromSequenceClass(validationSequence, validation_steps, nEpochs, batchSize, dims=shp, n_classes=nClasses)
# train
model_object.fit(training,
                 steps_per_epoch=training_steps,
                 validation_data=validation,
                 validation_steps=validation_steps,
                 epochs=nEpochs,
                 callbacks=callbacks,
                 verbose=1)
From here there are lots of other options for the Dataset API (like prefetch), but this should be a good starting point.
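For example, prefetching can simply be chained onto the datasets returned above before calling fit, so batch loading overlaps with training (a small illustrative addition, not part of the original snippet):
# overlap data loading with training; let tf.data pick the buffer size
training = training.prefetch(tf.data.experimental.AUTOTUNE)
validation = validation.prefetch(tf.data.experimental.AUTOTUNE)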
I have been going through the implementation of the neural network in OpenAI's code for the Vanilla Policy Gradient (as a matter of fact, this part is used nearly everywhere). The code looks something like this:
def mlp_categorical_policy(x, a, hidden_sizes, activation, output_activation, action_space):
    act_dim = action_space.n
    logits = mlp(x, list(hidden_sizes) + [act_dim], activation, None)
    logp_all = tf.nn.log_softmax(logits)
    pi = tf.squeeze(tf.random.categorical(logits, 1), axis=1)
    logp = tf.reduce_sum(tf.one_hot(a, depth=act_dim) * logp_all, axis=1)
    logp_pi = tf.reduce_sum(tf.one_hot(pi, depth=act_dim) * logp_all, axis=1)
    return pi, logp, logp_pi
and this multi-layer perceptron network is defined as follows:
def mlp(x, hidden_sizes=(32,), activation=tf.tanh, output_activation=None):
    for h in hidden_sizes[:-1]:
        x = tf.layers.dense(inputs=x, units=h, activation=activation)
    return tf.layers.dense(inputs=x, units=hidden_sizes[-1], activation=output_activation)
My question is: what is the return from this mlp function? I mean the structure or shape. Is it an N-dimensional tensor? If so, how is it given as an input to tf.random.categorical? If not, and it just has the shape [hidden_layer2, output], then what happened to the other layers? As per the documentation, tf.random.categorical only takes a 2-D input. The complete code of OpenAI's VPG algorithm can be found here. The mlp is implemented here. I would be highly grateful if someone could just tell me what this mlp_categorical_policy() is doing?
Note: The hidden size is [64, 64], the action dimension is 3
Thanks and cheers
Note that this is a discrete action space - there are action_space.n different possible actions at every step, and the agent chooses one.
To do this the MLP is returning the logits (which are a function of the probabilities) of the different actions. This is specified in the code by + [act_dim], which appends the action_space count as the size of the final MLP layer. Note that the last layer of an MLP is the output layer. The input layer is not specified in TensorFlow; it is inferred from the inputs.
tf.random.categorical takes the logits and samples a policy action pi from them, which is returned as a number.
mlp_categorical_policy also returns logp, the log probability of the action a (used to assign credit), and logp_pi, the log probability of the policy action pi.
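To make those three quantities concrete, here is a small standalone sketch with made-up logits for a batch of one and act_dim = 3 (the numbers are only illustrative):
import tensorflow as tf

act_dim = 3
logits = tf.constant([[2.0, 0.5, -1.0]])   # what the mlp would return: shape [batch, act_dim]
logp_all = tf.nn.log_softmax(logits)       # log-probabilities of all three actions

# sample one action per batch element from the categorical distribution over the logits
pi = tf.squeeze(tf.random.categorical(logits, 1), axis=1)   # shape [batch]

# pick out the log-probability of the sampled action with a one-hot mask
logp_pi = tf.reduce_sum(tf.one_hot(pi, depth=act_dim) * logp_all, axis=1)
print(pi.numpy(), logp_pi.numpy())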
It seems your question is more about the return from the mlp.
The mlp creates a series of fully connected layers in a loop. In each iteration of the loop, the mlp creates a new layer using the previous layer x as an input and assigns its output to overwrite x, with this line: x = tf.layers.dense(inputs=x, units=h, activation=activation).
So the output is not the same as the input; on each iteration x is overwritten with the value of the new layer. This is the same kind of coding trick as x = x + 1, which increments x by 1. This effectively chains the layers together.
The output of tf.layers.dense is a tensor of size [:, h] where : is the batch dimension (and can usually be ignored). The creation of the last layer happens outside the loop; it can be seen that the number of nodes in this layer is act_dim (so the shape is [:, 3]). You can check the shape by doing this:
import tensorflow.compat.v1 as tf
import numpy as np

def mlp(x, hidden_sizes=(32,), activation=tf.tanh, output_activation=None):
    for h in hidden_sizes[:-1]:
        x = tf.layers.dense(x, units=h, activation=activation)
    return tf.layers.dense(x, units=hidden_sizes[-1], activation=output_activation)

obs = np.array([[1.0, 2.0]])
logits = mlp(obs, [64, 64, 3], tf.nn.relu, None)
print(logits.shape)
result: TensorShape([1, 3])
Note that the observation in this case is [1., 2.]; it is nested inside a batch of size 1.
I would like to resize every element in a ragged tensor. For example, if I have a ragged tensor of variously sized images, how can I resize each one so that the dimensions are the same?
For example,
digits = tf.ragged.constant([np.zeros((1,60,60,1)), np.zeros((1,46,75,1))])
resize_lambda = lambda x: tf.image.resize(x, (60,60))
res = tf.ragged.map_flat_values(resize_lambda, digits)
I want res to be a tensor of shape (2, 60, 60, 1). How can I achieve this?
To clarify, this would be useful if, within a custom layer, we wanted to slice or crop sections from a single image to batch for inference in the next layer. In my case, I am attempting to combine two models (a model to segment an image into multiple cropped images of varying size and a classifier to predict each sub-image). I am also using TF 2.0.
You should be able to do the following.
import tensorflow as tf
import numpy as np
digits = tf.ragged.constant([np.zeros((1,60,60,1)), np.zeros((1,46,75,1))])
res = tf.concat(
    [tf.image.resize(digits[i].to_tensor(), (60, 60)) for i in tf.range(digits.nrows())],
    axis=0)