How to know what augmentation has been carried out by tf.image.stateless_random_flip_up_down or similar APIs - tensorflow

In TF 2.x there is a whole set of image augmentation APIs, take tf.image.stateless_random_flip_up_down for example. Most of these perform the stated operation at random. What I would like to find out is whether there's a way to interrogate what exactly has been performed for a specific image in a specific batch. This info is critical if the target prediction involves localization such as points, bounding boxes, etc. When an affine transform (like a translation) is performed on the image, the same operation should be applied to "augment" the targets (y) in a consistent manner.
As far as I can tell, none of the image transform APIs in TF 2.x return this piece of info. I would like to see if there's an easier way than creating custom ones of my own. I have done this for the older Keras data augmentation API in the past by subclassing, and would prefer not to repeat the tedium if possible.
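For reference, here is a minimal sketch of one workaround (not an official TF API): draw the random flip decision yourself with a stateless op, then apply the deterministic flip to both the image and the keypoints. The function name and the (y, x) pixel-coordinate layout are illustrative assumptions.

import tensorflow as tf

def random_flip_up_down_with_points(image, points, seed):
    # image: [H, W, C] tensor; points: [N, 2] float tensor as (y, x) pixel coords (assumed layout)
    # seed: a shape-[2] integer tensor, as the stateless random ops expect
    height = tf.cast(tf.shape(image)[0], points.dtype)
    # Draw the coin flip ourselves so the decision is visible and reusable for the targets
    do_flip = tf.random.stateless_uniform([], seed=seed) > 0.5

    def flipped():
        img = tf.image.flip_up_down(image)
        pts = tf.stack([height - 1 - points[:, 0], points[:, 1]], axis=-1)
        return img, pts

    return tf.cond(do_flip, flipped, lambda: (image, points))

This can then be mapped over a tf.data pipeline with per-example seeds, mirroring how the stateless ops are meant to be driven.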

I guess your main goal is to write a data augmentation pipeline that changes both the image and its labels.
In that case, I would recommend using the Albumentations library instead. It is a very popular open-source library that can easily be integrated with the TensorFlow and PyTorch frameworks.
Here is its documentation: https://albumentations.ai/
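As a hedged sketch of what that looks like for keypoints (the specific transforms and the "xy" format below are illustrative choices, not requirements):

import albumentations as A
import numpy as np

transform = A.Compose(
    [
        A.VerticalFlip(p=0.5),
        A.ShiftScaleRotate(p=0.5),
    ],
    keypoint_params=A.KeypointParams(format="xy", remove_invisible=False),
)

image = np.zeros((256, 256, 3), dtype=np.uint8)  # dummy image
keypoints = [(30.0, 40.0), (120.0, 200.0)]       # (x, y) pairs

out = transform(image=image, keypoints=keypoints)
aug_image, aug_keypoints = out["image"], out["keypoints"]  # labels move with the image

Bounding boxes work the same way via bbox_params, and the callable can be wrapped in tf.numpy_function if you want it inside a tf.data pipeline.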
Let me know if it helps!

Related

Programmatic Hyperparameter Tuning for TensorFlow Object Detection API

I am using the TF Object Detection API with a custom dataset. I am training via SLURM jobs and calling the API scripts from there. I am looking to tune the hyperparameters found in the pipeline.config files. Unfortunately, this kind of process is not outlined in the documentation. It seems like the expected workflow is to either use the sample configs or tune the hyperparameters by hand.
Tuning by hand is somewhat feasible; for example, sweeping two parameters (batch size and steps) over three values each results in nine different .config files, but adding another hyperparameter boosts that to twenty-seven files I need to keep track of. This does not seem like a good way to do it, particularly because it limits the values I can try and is clumsy.
It seems like there are libraries out there that hook into Keras and other more high-level frameworks, but I have found nothing that looks like it can take the results of the Object Detection API and actually optimize it.
Is it possible to do this with a pre-built library I don't know about? I would like to avoid having to edit the API implementation or code this myself, to minimize errors.
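For concreteness, here is an untested sketch of how the grid of configs could at least be generated programmatically instead of edited by hand; it assumes the Object Detection API's standard TrainEvalPipelineConfig proto and train_config fields, and it still leaves the actual optimization loop open:

import itertools
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

batch_sizes = [8, 16, 32]
num_steps = [50000, 100000, 200000]

# Parse the base pipeline.config once
with open("pipeline.config") as f:
    base = text_format.Parse(f.read(), pipeline_pb2.TrainEvalPipelineConfig())

# Write one variant per point in the hyperparameter grid
for i, (bs, steps) in enumerate(itertools.product(batch_sizes, num_steps)):
    cfg = pipeline_pb2.TrainEvalPipelineConfig()
    cfg.CopyFrom(base)
    cfg.train_config.batch_size = bs
    cfg.train_config.num_steps = steps
    with open(f"pipeline_{i}.config", "w") as f:
        f.write(text_format.MessageToString(cfg))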

Missing tool in Deep Learning: data augmentation with geometrical landmarks

Data augmentation can easily be achieved using ad hoc modules in, e.g., TensorFlow. This works perfectly for classification problems; however, when the objective of the network is the prediction of a geometrical feature, e.g. a landmark, a problem arises: as the image is modified, e.g. flipped or distorted, the corresponding labels also need to be adapted.
1 - Is there any tool to do this? I am sure that this is a common problem.
2 - Would it be useful to create a data augmentation script for neural networks that predict geometrical features?
I want to understand whether I need to code all of this myself or whether I am missing something that already exists. If I do need to do it and it could be useful to others, I would just release it as open source.
You can use the imgaug library: https://github.com/aleju/imgaug
An example of augmenting images and keypoints together with imgaug can be found here: https://github.com/aleju/imgaug#example-augment-images-and-keypoints
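A condensed version of that linked example (image size and keypoint coordinates are made up for illustration):

import numpy as np
import imgaug.augmenters as iaa
from imgaug.augmentables.kps import Keypoint, KeypointsOnImage

image = np.zeros((128, 128, 3), dtype=np.uint8)
kps = KeypointsOnImage([Keypoint(x=30, y=40), Keypoint(x=90, y=100)],
                       shape=image.shape)

seq = iaa.Sequential([iaa.Flipud(0.5), iaa.Affine(rotate=(-10, 10))])
image_aug, kps_aug = seq(image=image, keypoints=kps)  # image and keypoints are transformed consistently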

Tensorflow object detection api: how to use imgaug for augmentation?

I've been hand-rolling augmenters using imgaug, as I really like some of the options that are not available in the tf object detection api. For instance, I use motion blur because so much of my data has fast-moving, blurry objects.
How can I best integrate my augmentation sequence with the api for on-the-fly training?
E.g., say I have an augmenter:
aug = iaa.SomeOf((0, 2),
                 [iaa.Fliplr(0.5), iaa.Flipud(0.5), iaa.Affine(rotate=(-10, 10))])
Is there some way to configure the object detection api to work with this?
What I am currently doing is using imgaug to generate (augmented) training data, and then creating tfrecord files from each iteration of this augmentation pipeline. This is very inefficient as I am saving large amounts of data to disk rather than running augmentation on the fly, during training.
Someone has made a repo for this:
https://github.com/JinLuckyboy/TensorFlowObjectDetectionAPI-with-imgaug
Sorry, this is not a code answer and I have not actually looked into it, so I will not mark this as officially answered. If I ever get a chance to test it, I will let people know.
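In the meantime, one untested alternative to writing augmented tfrecords is to run the imgaug sequence on the fly inside a generic tf.data input pipeline via tf.numpy_function. Note this does not hook into the Object Detection API's own input readers, and the (x1, y1, x2, y2) box layout is an assumption:

import numpy as np
import tensorflow as tf
import imgaug.augmenters as iaa
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage

aug = iaa.SomeOf((0, 2), [iaa.Fliplr(0.5), iaa.Flipud(0.5),
                          iaa.Affine(rotate=(-10, 10))])

def augment_np(image, boxes):
    # boxes: float32 array of shape [N, 4] as (x1, y1, x2, y2) in pixels (assumed)
    bbs = BoundingBoxesOnImage(
        [BoundingBox(x1=b[0], y1=b[1], x2=b[2], y2=b[3]) for b in boxes],
        shape=image.shape)
    image_aug, bbs_aug = aug(image=image, bounding_boxes=bbs)
    boxes_aug = np.array([[bb.x1, bb.y1, bb.x2, bb.y2]
                          for bb in bbs_aug.bounding_boxes],
                         dtype=np.float32).reshape(-1, 4)
    return image_aug, boxes_aug

def augment_tf(image, boxes):
    # Wrap the numpy augmentation so it can run inside dataset.map
    image_aug, boxes_aug = tf.numpy_function(
        augment_np, [image, boxes], [image.dtype, tf.float32])
    image_aug.set_shape(image.shape)
    boxes_aug.set_shape([None, 4])
    return image_aug, boxes_aug

# dataset = dataset.map(augment_tf, num_parallel_calls=tf.data.AUTOTUNE)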

Tensorflow Stored Learning

I haven't tried TensorFlow yet but I'm still curious: how, and in what form (data type, file type), does it store the acquired learning of a machine learning program for later use?
For example, TensorFlow was used to sort cucumbers in Japan. The computer took a long time to learn from the example images what good cucumbers look like. In what form was that learning saved for future use?
I think it would be inefficient if the program had to re-learn from the images every time it needs to sort cucumbers.
Ultimately, a high-level way to think about a machine learning model is as three components: the code for the model, the data for that model, and the metadata needed to make the model run.
In TensorFlow, the code for this model is written in Python and is saved in what is known as a GraphDef. This uses a serialization format created at Google called Protobuf. Other libraries commonly use serialization formats such as Python's native Pickle.
The main reason you write this code is to "learn" from some training data, which is ultimately a large set of matrices full of numbers. These are the "weights" of the model, and they too are stored using Protobuf, although other formats like HDF5 exist.
TensorFlow also stores metadata associated with this model: for instance, what the input should look like (e.g. an image, or some text) and what the output should look like (e.g. an image class such as cucumber 1 or 2, with or without scores). This too is stored in Protobuf.
At prediction time, your code loads up the graph, the weights and the metadata, and takes some input data to produce an output.
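As a minimal sketch of how this looks in practice with the current Keras API (the HDF5 file below is one of the formats mentioned above; exact defaults vary by TF release, and the tiny model and random data are stand-ins):

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Stand-in training data; in the cucumber example this would be image features
x = np.random.rand(100, 4).astype("float32")
y = np.random.randint(0, 3, size=(100,))
model.fit(x, y, epochs=1, verbose=0)

model.save("cucumber_sorter.h5")                              # architecture + learned weights on disk
restored = tf.keras.models.load_model("cucumber_sorter.h5")   # reload later, no re-training needed
print(restored.predict(x[:1]))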
Are you talking about the symbolic math library, or the idea of tensor flow in general? Please be more specific here.
Here are some resources that discuss the library and tensor flow
These are some tutorials
And here is some background on the field
And this is the github page
If you want a more specific answer, please give more details as to what sort of work you are interested in.
Edit: So I'm presuming your question is more related to the general field of tensor flow than any particular application. Your question still is too vague for this website, but I'll try to point you toward a few resources you might find interesting.
TensorFlow as used in image recognition typically operates on an ANN (Artificial Neural Network). What this means is that the TensorFlow library helps with the number crunching for the neural network, which you can read all about with a quick Google search.
The point is that TensorFlow isn't a form of machine learning itself; it serves as a useful number-crunching library, similar to something like numpy in Python, for large-scale deep learning workloads.
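A tiny illustration of that point, the same matrix product written with numpy and with TensorFlow:

import numpy as np
import tensorflow as tf

a = np.random.rand(3, 4).astype("float32")
b = np.random.rand(4, 2).astype("float32")

print(np.matmul(a, b))           # plain numpy
print(tf.matmul(a, b).numpy())   # TensorFlow doing the same number crunching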

Visualizing the detection process in Mask-RCNN

I am working on a project that aims to detect objects in certain difficult circumstances. I ran a test with Mask_RCNN on a dataset that contains that specific type of difficult example, and it did a pretty good job on some of them.
Surprisingly, though, some other examples didn't get detected, even when there is no obvious reason. To understand the cause of this performance difference, I've been advised to use Tensorboard. But then I realized that it's mostly used during the training phase, as I understood from this video.
At the end of the video, however, they mention an integration project for Tensorboard, namely the Tensorflow Debugger Integration. Unfortunately, I could not find further information about the continuation of that feature.
Is there any way to visualize weights and activation maps inside a CNN during inference/evaluation phase?
The main difference between training and inference time for tensorboard will be the global_step value. Most graphs display global step as the x-axis. You can supply your own global step counter if you like, but you'll have to decide what the x-axis should represent to you in this case since "time" isn't really a logical construct during inference. Other tabs such as the images tab don't have a time component, so using them should be the same as during training.
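As a rough sketch of that idea with TF 2.x-style summaries (the dummy data and the chosen step counter are placeholders for your own inference loop and activations):

import numpy as np
import tensorflow as tf

writer = tf.summary.create_file_writer("logs/inference")
test_images = np.random.rand(5, 64, 64, 3).astype("float32")  # stand-in for real test data

for step, image in enumerate(test_images):
    # 'feature_map' stands in for whatever activation you pull out of the network
    feature_map = tf.nn.relu(tf.random.normal([1, 16, 16, 8]))
    with writer.as_default():
        tf.summary.image("input", image[None, ...], step=step)
        tf.summary.histogram("feature_map", feature_map, step=step)
writer.flush()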
The tensorflow debugger is a nice terminal debugger, but wouldn't really be related to what you're trying to do here. It's certainly not a visualization tool.
Another approach might be to simply generate your own plots and output a set of PDFs with the various visualizations you need using standard tools like matplotlib for each test image. I've found tools like XnView make it really easy to look through a lot of PDF visualizations to understand what's going on. I've used this approach quite effectively. If you want to view many hundreds or thousands of results quickly you might have an easier time if all the visuals are just dumped out to a directory.
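A bare-bones sketch of that dump-to-directory approach (the random images and scores are stand-ins for real detections; the overlay logic is whatever visualization you need):

import os
import numpy as np
import matplotlib.pyplot as plt

out_dir = "inference_vis"
os.makedirs(out_dir, exist_ok=True)
test_images = np.random.rand(3, 64, 64, 3)   # stand-in for real test images
scores = [0.91, 0.42, 0.13]                  # stand-in for top detection scores

for i, (img, score) in enumerate(zip(test_images, scores)):
    fig, ax = plt.subplots(figsize=(4, 4))
    ax.imshow(img)                           # overlay masks/boxes/activation maps here as needed
    ax.set_title(f"image {i} - top score {score:.2f}")
    ax.axis("off")
    fig.savefig(os.path.join(out_dir, f"image_{i:04d}.pdf"))
    plt.close(fig)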