I am following the Tensorflow notebook for Few shot learning ( https://colab.research.google.com/github/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tf2_colab.ipynb#scrollTo=RW1FrT2iNnpy )
In it, I saw that they were annotating the images using colab_utils.annotate(). I can't tell which annotation format they are using (e.g. YOLO or COCO). Another problem is that you can't specify the classes while drawing the bounding boxes, so I have to remember the order in which I annotated the different images and classes and add them in code later on.
If someone can tell me what that format is, I can annotate the images locally on my PC rather than on Colab, which will save a lot of time.
Any help would be appreciated.
Regards
The colab_utils annotation tool is only practical for a single class. Below is the format, from the source code:
[
  // stuff for image 1
  [
    // stuff for rect 1
    {x, y, w, h},
    // stuff for rect 2
    {x, y, w, h},
    ...
  ],
  // stuff for image 2
  [
    // stuff for rect 1
    {x, y, w, h},
    // stuff for rect 2
    {x, y, w, h},
    ...
  ],
  ...
]
As the annotations don't include any reference ID to the source image, order matters: you have to match the order of the box array with the order of your images, so this tool is probably not practical for a large training set. The gt_boxes example from the colab you provided, shown below, is thus the one to follow.
gt_boxes = [
    np.array([[0.436, 0.591, 0.629, 0.712]], dtype=np.float32),
    np.array([[0.539, 0.583, 0.73, 0.71]], dtype=np.float32),
    np.array([[0.464, 0.414, 0.626, 0.548]], dtype=np.float32),
    np.array([[0.313, 0.308, 0.648, 0.526]], dtype=np.float32),
    np.array([[0.256, 0.444, 0.484, 0.629]], dtype=np.float32)
]
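If you want to annotate locally instead, converting the tool's format into this gt_boxes layout is straightforward. Below is a minimal sketch, assuming the {x, y, w, h} values are pixel coordinates and that the model expects normalized [ymin, xmin, ymax, xmax] rows as above; rects_to_gt_boxes and annotations are illustrative names, not part of the tool:

import numpy as np

def rects_to_gt_boxes(rects, image_height, image_width):
    # rects is one image's list of {x, y, w, h} dicts;
    # returns an (N, 4) array of normalized [ymin, xmin, ymax, xmax] rows.
    boxes = [[r['y'] / image_height,
              r['x'] / image_width,
              (r['y'] + r['h']) / image_height,
              (r['x'] + r['w']) / image_width] for r in rects]
    return np.array(boxes, dtype=np.float32)

# order matters: one entry per image, in the same order as your image list
gt_boxes = [rects_to_gt_boxes(rects, 480, 640) for rects in annotations]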
Related
Torchvision's RandomResizedCrop is a tool I've found extremely handy when working with datasets of high-resolution images at different sizes and aspect ratios that need to be resized down to a uniform size and aspect ratio without squashing or stretching.
Is there an equivalent to this in Tensorflow that can be mapped across a Tensorflow dataset, or a lambda function using tensorflow operations that would achieve the same effective result?
I couldn't find any equivalent in the library, but a somewhat "official" solution is present in the NNCLR tutorial. It relies on the tf.image.crop_and_resize function, as @TFer pointed out.
I modified it a bit so it also has the equivalent of the size argument from the PyTorch implementation, which I named crop_shape as I found that clearer:
import tensorflow as tf

class RandomResizedCrop(tf.keras.layers.Layer):
    # taken from
    # https://keras.io/examples/vision/nnclr/#random-resized-crops
    def __init__(self, scale, ratio, crop_shape):
        super(RandomResizedCrop, self).__init__()
        self.scale = scale
        self.log_ratio = (tf.math.log(ratio[0]), tf.math.log(ratio[1]))
        self.crop_shape = crop_shape

    def call(self, images):
        batch_size = tf.shape(images)[0]

        random_scales = tf.random.uniform(
            (batch_size,),
            self.scale[0],
            self.scale[1]
        )
        random_ratios = tf.exp(tf.random.uniform(
            (batch_size,),
            self.log_ratio[0],
            self.log_ratio[1]
        ))

        new_heights = tf.clip_by_value(
            tf.sqrt(random_scales / random_ratios),
            0,
            1,
        )
        new_widths = tf.clip_by_value(
            tf.sqrt(random_scales * random_ratios),
            0,
            1,
        )
        height_offsets = tf.random.uniform(
            (batch_size,),
            0,
            1 - new_heights,
        )
        width_offsets = tf.random.uniform(
            (batch_size,),
            0,
            1 - new_widths,
        )

        bounding_boxes = tf.stack(
            [
                height_offsets,
                width_offsets,
                height_offsets + new_heights,
                width_offsets + new_widths,
            ],
            axis=1,
        )
        images = tf.image.crop_and_resize(
            images,
            bounding_boxes,
            tf.range(batch_size),
            self.crop_shape,
        )
        return images
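To answer the "mapped across a Tensorflow dataset" part of the question: since this is a Keras layer that operates on batches, one option (a sketch, assuming batched float images in [0, 1]; the scale/ratio values mirror the torchvision defaults) is:

crop = RandomResizedCrop(scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), crop_shape=(224, 224))

dataset = (
    tf.data.Dataset.from_tensor_slices(tf.random.uniform((16, 256, 320, 3)))
    .batch(8)  # the layer expects a batch dimension
    .map(crop, num_parallel_calls=tf.data.AUTOTUNE)
)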
You can use tf.image.crop_and_resize to crop and resize the image. The TensorFlow documentation describes it as follows:

Returns a tensor with crops from the input image at positions defined at the bounding box locations in boxes. The cropped boxes are all resized (with bilinear or nearest neighbor interpolation) to a fixed size = [crop_height, crop_width].

For more details, please see the documentation here. Thanks
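For illustration, a minimal call looks like this (a sketch: each box is a normalized [ymin, xmin, ymax, xmax], and this one takes the top-left quarter of each image and resizes it to 64x64):

import tensorflow as tf

images = tf.random.uniform((2, 128, 128, 3))  # batch of 2 images
boxes = [[0.0, 0.0, 0.5, 0.5],
         [0.0, 0.0, 0.5, 0.5]]
box_indices = [0, 1]  # which image in the batch each box crops from
crops = tf.image.crop_and_resize(images, boxes, box_indices, crop_size=(64, 64))
print(crops.shape)  # (2, 64, 64, 3)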
I've been using GEE to export some training patches from Sentinel-2 to be used in Python.
I could make it work by following the GEE guide https://developers.google.com/earth-engine/tfrecord, using the Export.image.toDrive function and then parsing the exported TFRecord file to reconstruct my tiles.
var image_export_options = {
  'patchDimensions': [366, 366],
  'maxFileSize': 104857600,
  // 'kernelSize': [366, 366],
  'compressed': true
}

Export.image.toDrive({
  image: clipped_img.select(bands.concat(['classes'])),
  description: 'PatchesExport',
  fileNamePrefix: 'Oros_1',
  scale: 10,
  folder: 'myExportFolder',
  fileFormat: 'TFRecord',
  region: export_area,
  formatOptions: image_export_options,
})
However, when I try to specify kernelSize in formatOptions (which is supposed to "overlap adjacent tiles by [kernelSize[0]/2, kernelSize[1]/2]", according to the guide), the files are exported but the '*mixer.json' doesn't reflect the increased number of patches and I am not able to iterate through the patches afterwards. The following command crashes the Google Colab session:
image_dataset = tf.data.TFRecordDataset(str(path/(file_prefix+'-00000.tfrecord.gz')), compression_type='GZIP')
first = next(iter(image_dataset))
first
The weird thing is that the problem happens only when I add the kernelSize to the formatOptions.
After some time trying to overcome this issue, I came across a poorly documented behavior when using the kernel size to export patches from GEE.
Bundled with the exported TFRecord, there is one JSON file called the mixer.
It doesn't matter if we use:
'patchDimensions': [184, 184],
'kernelSize': [1, 1], #default for no overlapping
or
'patchDimensions': [184, 184],
'kernelSize': [184, 184], #half patch overlapping
The mixer file remains the same, with no mention of the kernel/overlap size:
{'patchDimensions': [184, 184],
 'patchesPerRow': 8,
 'projection': {'affine': {'doubleMatrix': [10.0, 0.0, 493460.0, 0.0, -10.0, 9313540.0]},
  'crs': 'EPSG:32724'},
 'totalPatches': 40}
In the second case, if we try to parse the patches using tf.io.parse_single_example(example_proto, image_features_dict), where image_features_dict equals something like:
{'B2': FixedLenFeature(shape=[184, 184], dtype=tf.float32, default_value=None),
 'B3': FixedLenFeature(shape=[184, 184], dtype=tf.float32, default_value=None),
 'B4': FixedLenFeature(shape=[184, 184], dtype=tf.float32, default_value=None)}
it will raise the error:
_FallbackException: This function does not handle the case of the path where all inputs are not already EagerTensors.
Can't parse serialized Example. [Op:ParseExampleV2]
Instead, to parse these records which have kernelSize > 1, we have to consider patchDimensions + kernelSize as the resulting patch size, even though the mixer.json file says the contrary. In this example, the resulting patch size would be 368 (original patch size + kernelSize). Be aware that for odd kernel sizes, the number to add to the original patch size is kernelSize - 1.
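A sketch of what that means for the features dict from above (same band names; only the shape changes):

# patchDimensions [184, 184] + kernelSize [184, 184] -> effective patch of 368 x 368
patch_size = 184 + 184

image_features_dict = {
    'B2': tf.io.FixedLenFeature([patch_size, patch_size], dtype=tf.float32),
    'B3': tf.io.FixedLenFeature([patch_size, patch_size], dtype=tf.float32),
    'B4': tf.io.FixedLenFeature([patch_size, patch_size], dtype=tf.float32),
}
parsed = tf.io.parse_single_example(example_proto, image_features_dict)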
I am serving up the Inception model using TensorFlow Serving. I am doing this on Azure Kubernetes, so not via the more standard and well-documented Google Cloud.
In any event, this is all working, however the bit I am confused about is that the predictions come back as an array of floats. These values map to the original labels passed in during training, but without the original labels file there is no way to reverse engineer what each probability relates to.
Before I moved to Serving, I was simply using an inference script that cross-references against the labels file which I stored along with the frozen model at training time. But with Serving this does not work.
So my question is: how can I get the labels associated with the model and, ideally, get the prediction to return the labels and probabilities?
I tried the approach suggested by @user1371314 but I couldn't get it to work. Another solution that worked is creating a tensor (instead of a constant) and mapping it with only the first element of the output layer when saving the model. Put together, it looks like this:
# get label names and create a tensor from them
....
label_names_tensor = tf.convert_to_tensor(label_names)

# save the model and map the labels to the output layer
tf.saved_model.simple_save(
    sess,
    "./saved_models",
    inputs={'image': model.input},
    outputs={'label': label_names_tensor, 'prediction': model.output[0]})
When you make a prediction after serving your model you will get the following result:
{
    "predictions": [
        {
            "label": "label-name",
            "prediction": 0.114107
        },
        {
            "label": "label-name",
            "prediction": 0.288598
        },
        {
            "label": "label-name",
            "prediction": 0.17436
        },
        {
            "label": "label-name",
            "prediction": 0.186366
        },
        {
            "label": "label-name",
            "prediction": 0.236568
        }
    ]
}
I am sure there is a way to return a mapping directly for this using the various TF ops; however, I have managed to at least package the labels into the model and return them in the prediction along with the probabilities.
What I did was create a tf.constant from the labels array and then add that tensor to the array of output tensors in tf.saved_model.signature_def_utils.build_signature_def.
Now when I get a prediction, I get the float array and also an array of labels, and I can match them up on the client side.
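A sketch of that approach (TF 1.x API; model and label_names are assumed to already exist, as in the answer above):

labels_tensor = tf.constant(label_names)  # the labels saved at training time

signature = tf.saved_model.signature_def_utils.build_signature_def(
    inputs={'image': tf.saved_model.utils.build_tensor_info(model.input)},
    outputs={
        'prediction': tf.saved_model.utils.build_tensor_info(model.output),
        'labels': tf.saved_model.utils.build_tensor_info(labels_tensor),
    },
    method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME,
)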
I want to create TensorFlow records to feed my model; so far I use the following code to store a uint8 NumPy array in TFRecord format:
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def convert_to_record(name, image, label, map):
    filename = os.path.join(params.TRAINING_RECORDS_DATA_DIR, name + '.' + params.DATA_EXT)
    writer = tf.python_io.TFRecordWriter(filename)

    image_raw = image.tostring()
    map_raw = map.tostring()
    label_raw = label.tostring()

    example = tf.train.Example(features=tf.train.Features(feature={
        'image_raw': _bytes_feature(image_raw),
        'map_raw': _bytes_feature(map_raw),
        'label_raw': _bytes_feature(label_raw)
    }))
    writer.write(example.SerializeToString())
    writer.close()
which I read with this example code
features = tf.parse_single_example(example, features={
    'image_raw': tf.FixedLenFeature([], tf.string),
    'map_raw': tf.FixedLenFeature([], tf.string),
    'label_raw': tf.FixedLenFeature([], tf.string),
})

image = tf.decode_raw(features['image_raw'], tf.uint8)
image.set_shape(params.IMAGE_HEIGHT*params.IMAGE_WIDTH*3)
image = tf.reshape(image, (params.IMAGE_HEIGHT, params.IMAGE_WIDTH, 3))

map = tf.decode_raw(features['map_raw'], tf.uint8)
map.set_shape(params.MAP_HEIGHT*params.MAP_WIDTH*params.MAP_DEPTH)
map = tf.reshape(map, (params.MAP_HEIGHT, params.MAP_WIDTH, params.MAP_DEPTH))

label = tf.decode_raw(features['label_raw'], tf.uint8)
label.set_shape(params.NUM_CLASSES)
and that's working fine. Now I want to do the same with my array "map" being a float NumPy array instead of uint8, and I could not find examples of how to do it.
I tried the function _floats_feature, which works if I pass a scalar to it, but not arrays.
With uint8, serialization can be done with the method tostring().
How can I serialize a float NumPy array, and how can I read it back?
FloatList and BytesList expect an iterable. So you need to pass them a list of floats. Remove the extra brackets in your _floats_feature, i.e.

def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

numpy_arr = np.ones((3,)).astype(float)
example = tf.train.Example(features=tf.train.Features(feature={"bytes": _floats_feature(numpy_arr)}))
print(example)
features {
  feature {
    key: "bytes"
    value {
      float_list {
        value: 1.0
        value: 1.0
        value: 1.0
      }
    }
  }
}
I will expand on Yaroslav's answer.
Int64List, BytesList and FloatList expect an iterable of the underlying elements (a repeated field). In your case you can use a list as that iterable.
You mentioned: it works if I pass a scalar to it, but not with arrays. And this is expected, because when you pass a scalar, your _floats_feature creates an array of one float element in it (exactly as expected). But when you pass an array you create a list of arrays and pass it to a function which expects a list of floats.
So just remove construction of the array from your function: float_list=tf.train.FloatList(value=value)
I've stumbled across this while working on a similar problem. Since part of the original question was how to read back the float32 feature from tfrecords, I'll leave this here in case it helps anyone:
If map.ravel() was used to input map of dimensions [x, y, z] into _floats_feature:
features = {
    ...
    'map': tf.FixedLenFeature([x, y, z], dtype=tf.float32)
    ...
}

parsed_example = tf.parse_single_example(serialized=serialized, features=features)
map = parsed_example['map']
Yaroslav's example failed when an n-dimensional array was the input:

numpy_arr = np.ones((3, 3)).astype(float)
I found that it worked when I used numpy_arr.ravel() as the input. But is there a better way to do it?
First of all, many thanks to Yaroslav and Salvador for their enlightening answers.
In my experience, their methods only work when the input is a 1D NumPy array of shape (n,). When the input is a NumPy array with two or more dimensions, the following error appears:
def _float_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

numpy_arr = np.arange(12).reshape(2, 2, 3).astype(float)
example = tf.train.Example(features=tf.train.Features(feature={"bytes": _float_feature(numpy_arr)}))
print(example)
TypeError: array([[0., 1., 2.],
[3., 4., 5.]]) has type numpy.ndarray, but expected one of: int, long, float
So, I'd like to expand on Tsuan's answer: flatten the input before feeding it into the TF example. The modified code is as follows:
def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

numpy_arr = np.arange(12).reshape(2, 2, 3).astype(float).flatten()
example = tf.train.Example(features=tf.train.Features(feature={"bytes": _floats_feature(numpy_arr)}))
print(example)
In addition, ndarray.flatten() is more applicable here than np.ravel().
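For completeness, reading such a flattened array back and restoring its shape could look like this (a sketch using the shapes from this example; FixedLenFeature reshapes the 12 stored floats back to (2, 2, 3)):

features = {'bytes': tf.io.FixedLenFeature([2, 2, 3], dtype=tf.float32)}
parsed = tf.io.parse_single_example(example.SerializeToString(), features)
restored = parsed['bytes']  # shape (2, 2, 3), as before flattening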
Use tfrmaker, a TFRecord utility package. You can install the package with pip:
pip install tfrmaker
Then you could create tfrecords like this:
from tfrmaker import images

# mapping label names to integer encodings.
LABELS = {"bishop": 0, "knight": 1, "pawn": 2, "queen": 3, "rook": 4}

# specifying the data and output directories.
DATA_DIR = "datasets/chess/"
OUTPUT_DIR = "tfrecords/chess/"

# create tfrecords from the images present in the given data directory.
info = images.create(DATA_DIR, LABELS, OUTPUT_DIR)

# info contains a list of information (path: relative path, size: number of images in the tfrecord) about the created tfrecords.
print(info)
The package also has some cool features like:
dynamic resizing
splitting tfrecords into optimal shards
splitting tfrecords into training, validation and testing sets
counting the number of images in tfrecords
asynchronous tfrecord creation
NOTE: This package currently supports image datasets that are organised as directories, with class names as sub-directory names.
I am trying to display images from the CIFAR-10 TensorFlow tutorial. The images have been transformed so that the values read are floats roughly between -1 and 3. I'm not sure what kind of transformation has been applied. How can I display them to see the original content?
Here is what the part of the image output looks like:
array([[ 1.24836731,  0.04940184, -1.49835348],
       [ 1.117571  ,  0.02760247, -1.56375158],
       [ 1.24836731,  0.18019807, -1.41115606],
       [ 1.18296909,  0.09300058, -1.47655416],
       [ 1.13937044,  0.02760247, -1.54195225],
       [ 1.13937044,  0.09300058, -1.52015293],
...
np.max(image)
2.9269187
np.min(image)
-1.759946
This is the link to the tutorial:
https://www.tensorflow.org/tutorials/deep_cnn/
Edit:
Rescaling does not seem to work for me:
Try scaling the image to be between 0 and 255? Subtract the min and divide by its new max.
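For reference, that suggestion amounts to something like the following (a sketch; image is the float array from the question):

import numpy as np
import matplotlib.pyplot as plt

img = image - np.min(image)    # shift so the minimum is 0
img = img / np.max(img) * 255  # scale so the new maximum is 255
plt.imshow(img.astype('uint8'))
plt.show()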
A couple of ways to do this, for the greyscale MNIST images:
import matplotlib.pyplot as plt
from matplotlib import cm

tmp = mnist.train.images[0]
tmp = tmp.reshape((28, 28))
plt.imshow(tmp, cmap=cm.Greys)
plt.show()
Or, for CIFAR-10 images:
Code below taken from this tutorial
import numpy as np
import matplotlib.pyplot as plt

def visualize_sample(X_train, y_train, classes, samples_per_class=7):
    """visualize some samples in the training datasets """
    num_classes = len(classes)
    for y, cls in enumerate(classes):
        idxs = np.flatnonzero(y_train == y)  # get all the indexes of cls
        idxs = np.random.choice(idxs, samples_per_class, replace=False)
        for i, idx in enumerate(idxs):  # plot the images one by one
            # i*num_classes and y+1 determine the row and column respectively
            plt_idx = i * num_classes + y + 1
            plt.subplot(samples_per_class, num_classes, plt_idx)
            plt.imshow(X_train[idx].astype('uint8'))
            plt.axis('off')
            if i == 0:
                plt.title(cls)
    plt.show()