Re-using created dataset for different task (object detection - image classification) - object-detection

I have created a large dataset in Amazon SageMaker and labeled it using bounding boxes. I used this dataset for object detection and everything worked fine.
Later, I wanted to use this dataset for simple image classification. But every time I try to run it, I get an error: Customer Error: Label was not a float.
I think the problem is probably the bounding boxes, since the image classification algorithm does not expect them, but is there any way to change this? My goal is to use the parts of the image that lie inside the bounding boxes for image classification training. Is there any way to set the parameters so that the algorithm accepts the information in the bounding boxes as input?
Below is a snippet from a log file that was generated when I tried to run image classification on the dataset with bounding boxes.
[14:42:27] /opt/brazil-pkg-cache/packages/AIApplicationsPipeIterators/AIApplicationsPipeIterators-1.0.1145.0/AL2012/generic-flavor/src/data_iter/src/ease_image_iter.cpp:452: JSON Logic Error while parsing
{
  "annotations": [
    {
      "class_id": 0,
      "height": 194,
      "left": 34,
      "top": 16,
      "width": 150
    }
  ],
  "image_size": [
    {
      "depth": 3,
      "height": 256,
      "width": 185
    }
  ]
}
: Value is not convertible to float.
PS: The dataset is an augmented manifest file.
I would be very grateful for any help.

Thank you for reaching out to us. SageMaker training algorithms each expect a specific label format. For example, see https://docs.aws.amazon.com/sagemaker/latest/dg/image-classification.html: the Image Classification algorithm expects a single numeric class label per image, not bounding-box annotations. Hence, you cannot feed the bounding boxes to the Image Classification training algorithm.
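If the goal is to train a classifier on the box contents themselves, one workaround (not a SageMaker feature, just local preprocessing) is to crop the labeled regions out of the source images and use the crops as an ordinary classification dataset. Below is a rough sketch under those assumptions; the attribute name "bounding-box", the manifest file name, and the local file layout are placeholders that would need to match your own augmented manifest.
import json
import os
from PIL import Image  # pip install pillow

# Rough sketch: crop every labeled box out of its source image so the crops
# can be used for image-classification training. "bounding-box" is a placeholder
# for whatever attribute name your labeling job wrote into the manifest.
with open("output.manifest") as f:              # augmented manifest, one JSON object per line
    for line_no, line in enumerate(f):
        item = json.loads(line)
        filename = item["source-ref"].split("/")[-1]
        image = Image.open(filename).convert("RGB")   # assumes the images were downloaded locally
        for i, ann in enumerate(item["bounding-box"]["annotations"]):
            box = (ann["left"], ann["top"],
                   ann["left"] + ann["width"], ann["top"] + ann["height"])
            out_dir = f"class_{ann['class_id']}"
            os.makedirs(out_dir, exist_ok=True)
            image.crop(box).save(os.path.join(out_dir, f"{line_no}_{i}.jpg"))
The resulting per-class folders (or a new manifest with one numeric label per crop) can then be fed to the Image Classification algorithm in the format its documentation describes.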

Related

Data Augmentation for Object Detection - Polygon Region Shape

I'm looking to run a Mask RCNN code on my dataset of about 2700 images. The images are too large and I would like to resize them, and I would also like to add some shear, scale and zoom augmentations.
Since this is an object detection task, it requires augmenting the annotated images together with their bounding boxes. Most of the resources I found deal with rectangular bounding boxes, which seems relatively straightforward.
However, my images have polygon bounding boxes.
I'm currently using the VGG annotator, and the bounding box values are stored in a JSON file. How do I go about doing this?
You can use a library such as imgaug (https://imgaug.readthedocs.io/en/latest/source/installation.html).
Here is a link to a notebook where you can practice using the augmentations along with the polygons.
https://nbviewer.org/github/aleju/imgaug-doc/blob/master/notebooks/B03%20-%20Augment%20Polygons.ipynb
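For instance, here is a minimal sketch (using a dummy image and a made-up polygon, not your actual data) of how imgaug augments an image together with its polygon annotations; the same idea carries over once you load your polygons from the VGG JSON file.
import numpy as np
import imgaug.augmenters as iaa
from imgaug.augmentables.polys import Polygon, PolygonsOnImage

image = np.zeros((256, 256, 3), dtype=np.uint8)        # dummy image for illustration
polygon = Polygon([(50, 50), (200, 60), (180, 200), (60, 180)])
psoi = PolygonsOnImage([polygon], shape=image.shape)

aug = iaa.Sequential([
    iaa.Affine(scale=(0.8, 1.2), shear=(-8, 8)),       # zoom/scale and shear
    iaa.Fliplr(0.5),                                   # horizontal flips
])

# The polygon coordinates are transformed together with the image.
image_aug, psoi_aug = aug(image=image, polygons=psoi)
print(psoi_aug.polygons[0].exterior)                   # augmented polygon points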
Once you are happy that you know how you want to augment your images, you can use it with MaskRCNN in the following manner.
import imgaug.augmenters as iaa

augmentation = iaa.Sequential([
    iaa.Fliplr(0.5),  # horizontal flips
    # Small gaussian blur with random sigma between 0 and 0.5.
    # But we only blur about 50% of all images.
    iaa.Sometimes(0.5,
        iaa.GaussianBlur(sigma=(0, 0.5))
    ),
    # Strengthen or weaken the contrast in each image.
    iaa.ContrastNormalization((0.75, 1.5)),
    # Apply affine transformations to each image.
    # Scale/zoom them, translate/move them, rotate them and shear them.
    iaa.Affine(
        scale={"x": (0.8, 1.2), "y": (0.8, 1.2)},
        translate_percent={"x": (-0.2, 0.2), "y": (-0.2, 0.2)},
        rotate=(-10, 10),
        shear=(-2, 2))
], random_order=True)  # apply augmenters in random order

model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=20,
            layers='heads',
            augmentation=augmentation)

Non Max Suppression settings and postprocessing for EfficientDet

I've downloaded and installed the TensorFlow Object Detection API and downloaded one of the EfficientDet models. As I want to do some work on the raw scores directly before Non-Max Suppression reduces them to the class outputs, my first goal was to try and get the same final outputs from the raw scores, using the downloaded model config as a guide.
post_processing {
  batch_non_max_suppression {
    score_threshold: 9.99999993922529e-09
    iou_threshold: 0.5
    max_detections_per_class: 100
    max_total_detections: 100
  }
  score_converter: SIGMOID
}
As the Object Detection API has no score converter method under postprocessing, I'm not sure what this does, but the only batch NMS method in utils seems to be batch_multiclass_non_max_suppression.
So, having fed an image into the network and obtained an output dictionary detections, I tried to replicate its results:
result = post_processing.batch_multiclass_non_max_suppression(
    tf.expand_dims(detections['raw_detection_boxes'], 2),
    detections['raw_detection_scores'],
    9.99999993922529e-09,  # score_threshold
    0.5,                   # iou_threshold
    100,                   # max_detections_per_class
    max_total_size=100)
detections['detection_boxes'] = result[0]
detections['detection_scores'] = result[1]
detections['detection_classes'] = result[2]
i.e., substitute the relevant entries in detections with the output of NMS, and insert the extra dimension needed for the batch function to work. The result is then visualised following the TensorFlow Hub colab.
The problem is that the input image (taken from the MSCOCO dataset) does not produce the expected detections. The bounding boxes are all (seemingly) shifted upwards and the categories are simply off, which suggests there is more processing happening between the raw scores, NMS, and the output, but it is entirely unclear what. The scores themselves are correct, so the pruning appears to be working.
Edit: After looking at the SSD model template, I suspect the misaligned bounding boxes are caused by my not passing the resized image dimensions (produced by the preprocessing step) along to NMS, which should be easy enough to address by reproducing the image resize function. However, applying the slice operation to remove a background class does not fix the incorrect labels.
Instead, it seems to have lost the person class entirely--this makes sense; it isn't configured to include a background class of any sort and if Person (id 1) is instead coming out as index 0, then this would cut them off.
Edit 2: I looked at the original meta-architecture further and copied the image-resizing function, i.e.:
import tensorflow as tf
from object_detection.builders import image_resizer_builder
from object_detection.utils import config_util as c
from object_detection.utils import shape_utils

config = c.get_configs_from_pipeline_file(
    r"C:\Users\Person\.keras\datasets\efficientdet_d7_coco17_tpu-32\pipeline.config")
image_config = c.get_image_resizer_config(config['model'])
resize = image_resizer_builder.build(image_config)

def compute_clip_window(preprocessed_images, true_image_shapes):
    # identical to the meta-arch definition
    ...

# image resizing (input_tensor is the image batch fed to the network)
im = tf.cast(input_tensor, tf.float32)
channel_offset = [0.485, 0.456, 0.406]
channel_scale = [0.229, 0.224, 0.225]
im = ((im / 255.0) - [[channel_offset]]) / [[channel_scale]]

resized = shape_utils.resize_images_and_return_shapes(im, resize)
clip = compute_clip_window(resized[0], resized[1])
This allows the clip argument to be supplied to NMS. However, it doesn't change anything, and the call still returns the same misaligned boxes as in the second image. This is incredibly confusing, as it seems like this should replicate everything the model needs in both the preprocessing and postprocessing steps to generate its own output: the image is normalized and resized; the true image size is retained alongside the resized image; and no further processing of the raw boxes or raw scores happens before they are passed to NMS (the returned versions of the raw values are identical to the values passed to NMS, apart from the extra dimension). The model itself doesn't interfere with the post-processing at all, and its call signature invokes preprocessing, prediction, and postprocessing in turn, so nothing else should be happening in the interim.
Edit 3: I added another line (to no effect): setting the multiclass scores in the NMS additional fields to the detection scores with backgrounds (i.e., the raw scores). By adding +1 to all the label classes, the labels come out correctly.
Whilst this is correct, this only corrects for the earlier parts of the dataset, i.e. where the only empty class is the 0th. It still appears that there must be some mapping step I'm not following, alongside whatever is causing the image misalignment.
The easiest solution in my case was to load the model from the checkpoint and configs, rather than use the saved model directly, in order to access the original preprocess, predict, and postprocess methods, rather than having a single function call.
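As a rough sketch of that approach (the config and checkpoint paths are placeholders, and the stand-in image tensor is just for illustration), assuming the standard TF2 Object Detection API workflow of building the model from its pipeline config and restoring the checkpoint:
import tensorflow as tf
from object_detection.builders import model_builder
from object_detection.utils import config_util

# Placeholders -- point these at your own pipeline.config and checkpoint.
configs = config_util.get_configs_from_pipeline_file("pipeline.config")
detection_model = model_builder.build(model_config=configs['model'], is_training=False)
ckpt = tf.train.Checkpoint(model=detection_model)
ckpt.restore("checkpoint/ckpt-0").expect_partial()

# With the model object you can call the three stages separately, instead of a
# single opaque saved-model function, and inspect or modify the raw scores in between.
input_tensor = tf.zeros([1, 512, 512, 3], dtype=tf.float32)   # stand-in image batch
image, shapes = detection_model.preprocess(input_tensor)
prediction_dict = detection_model.predict(image, shapes)
detections = detection_model.postprocess(prediction_dict, shapes)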

Vega Edge bundling (directed) - vary thickness of each edge to show strength of connection

I am looking to use an edge bundle visualisation as per:
https://vega.github.io/editor/#/examples/vega/edge-bundling
However, in this example, all the directed edges are of uniform thickness.
Instead, I need to generate edges of varying thickness to illustrate the strength of the relationship between nodes.
I envisaged passing that thickness into the model such that the edges defined by the JSON at
https://github.com/vega/vega-datasets/blob/master/data/flare-dependencies.json
would be adjusted so an edge currently defined as:
{
  "source": 190,
  "target": 4
},
would instead be defined as say:
{
  "source": 190,
  "target": 4,
  "edgeWeight": 23
},
Is this possible? I did try experimenting by passing two simplified JSON datasets using value but couldn't figure out how to feed in that "edgeWeight" variable to the line definition in 'marks'.
Do you know how I might do that?
Regards,
Simon
I was provided an answer to this as follows:
First add a formula to append the size value from the flare.json dataset as a field 'strokeWidth' by writing this at line 90 in the example:
{
  "type": "formula",
  "expr": "datum.size/10000",
  "as": "strokeWidth"
},
Next, in the marks, set the strokeWidth value for each edge in the edgebundle to the associated 'strokeWidth' in the column now created by writing this at what becomes line 176 after the above change:
"strokeWidth": {"field": "strokeWidth"}
After this, the diagram should render with edges of a thickness defined by that 'size' variable.
Note that in this example, I had to scale the 'size' value in the original dataset by 10,000 to set the lines at a reasonable thickness.
In practice, I would scale the data prior to presenting it to Vega.

Managing datasets for a branch/tangled network

I'm trying to set up two networks that meet in a couple of fully connected layers leading to a single output layer. I think I know how to get the branches set up (though additional resources for that would be nice), but I'm unclear on how to manage my dataset. One branch works on a dataset of text documents that I gather through the preprocessing.text_dataset_from_directory() function and then run some convolutions over, while the other branch takes a set of corresponding numbers, stored as a .csv, that I'd like to feed into it. I'm not entirely sure how to make sure the numeric values and the text files are input into the network simultaneously; any help on this would be greatly appreciated.
(For additional context: I currently have the network that works on the text documents working; I'm trying to add a branch of supplementary data.)
This is very well explained in the TensorFlow documentation; the following style of code will work.
https://www.tensorflow.org/guide/keras/functional
model.fit(
    {"title": title_data, "body": body_data, "tags": tags_data},
    {"priority": priority_targets, "department": dept_targets},
    epochs=2,
    batch_size=32,
)
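Applied to your case, a minimal two-branch sketch might look like the following. The shapes, vocabulary size, and random stand-in arrays are assumptions; in practice the "text" input would receive your vectorized documents and the "numeric" input the values from your .csv, keyed by the input names so both branches are fed simultaneously.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical shapes: text already vectorized to 200-token sequences,
# and 5 numeric features per sample read from the CSV.
text_in = tf.keras.Input(shape=(200,), name="text")
num_in = tf.keras.Input(shape=(5,), name="numeric")

x = layers.Embedding(20000, 64)(text_in)
x = layers.Conv1D(64, 5, activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)

y = layers.Dense(16, activation="relu")(num_in)

merged = layers.concatenate([x, y])
out = layers.Dense(1, activation="sigmoid", name="label")(merged)

model = tf.keras.Model(inputs=[text_in, num_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Feed both branches at once by keying on the input names.
model.fit(
    {"text": np.random.randint(0, 20000, (100, 200)),
     "numeric": np.random.rand(100, 5).astype("float32")},
    np.random.randint(0, 2, (100, 1)),
    epochs=2, batch_size=32,
)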

Tensorflow: training on JSON data to generate similar output

Assume one has JSON data containing instructions for generating the following 10x5 cell patterns, and that each cell can contain one of the following characters: _ 0 x y z
Also assume that each character can be displayed in various colors.
pattern 1:
_yx_0zzyxx
_0__yz_0y_
x0_0x000yx
_y__x000zx
zyyzx_z_0y
pattern 2:
xx0z00yy_z
zzx_0000_x
_yxy0y__yx
_xz0z__0_y
y__x0_0_y_
pattern 3:
yx0x_xz0_z
xz_x0_xxxz
_yy0x_0z00
zyy0__0zyx
z_xy0_0xz0
These were randomly generated, and are all black, but assume they were devised according to some set of rules, and in color.
The JSON for the first pattern would look something like:
{
  "width": 10,
  "height": 5,
  "cells": [
    {
      "value": "_",
      "color": "red"
    },
    {
      "value": "y",
      "color": "blue"
    }, ...
  ]
}
If one wanted to train on this data in order to generate new yet similar patterns (again, assuming these were not randomly generated), what is the recommended approach for:
reading the data in (I'd imagine putting the JSON into an Example protobuf, serializing the buffer to string with tf.parse_example, and then writing that to TFRecord files)
training on that data
generating new patterns based on the trained model
supplying seed data for the generated patterns, e.g. first cell is the character "x' with the color blue.
I want to achieve something similar to what I've seen in style transfer with art/photos, and with music/MIDI data (see: Google Magenta). In those cases, the model is trained on a distinctive set of artwork or a melodic style, and a seed in the form of a photograph or primer melody is supplied in order to generate content similar to the data used in training.
Thanks!
I dislike preprocessing the dataset into new forms; it makes things difficult to change later on and slows future development. It's like technical debt, in my opinion.
My approach would be to keep your JSON as-is and write some simple Python code (specifically a generator, which means you use yield instead of return statements) to read the JSON file and spit out samples in sequence.
Then use the TensorFlow Dataset input pipeline with Dataset.from_generator(...) to take data from your generator function.
https://www.tensorflow.org/programmers_guide/datasets
The Dataset pipeline provides everything you need to manage the various transformations you'll want to apply: you can buffer, shuffle, batch, prefetch, and map functions onto your data trivially, in a nice modular, testable framework that feeds naturally into your TensorFlow model.
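As a small sketch of that approach with the current tf.data API, assuming a patterns.json file containing a list of objects shaped like the example above (the file name and the integer encodings for characters and colors are placeholders you would choose yourself):
import json
import tensorflow as tf

CHARS = {'_': 0, '0': 1, 'x': 2, 'y': 3, 'z': 4}   # assumed integer encodings
COLORS = {'red': 0, 'blue': 1, 'black': 2}

def pattern_generator():
    with open("patterns.json") as f:               # placeholder file of pattern objects
        patterns = json.load(f)
    for p in patterns:
        # one (char_id, color_id) pair per cell; 10x5 = 50 cells per pattern
        yield [(CHARS[c['value']], COLORS[c['color']]) for c in p['cells']]

dataset = (tf.data.Dataset
           .from_generator(pattern_generator,
                           output_signature=tf.TensorSpec(shape=(50, 2), dtype=tf.int32))
           .shuffle(100)
           .batch(8)
           .prefetch(tf.data.AUTOTUNE))
The batched dataset can then be passed straight to whatever generative model you train, and the same generator can be reused unchanged if the JSON schema grows.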