Data augmentation in Tensorflow using Estimator API and TFRecords dataset - tensorflow

I'm using the Estimator API in TensorFlow 1.3 to perform some image classification. Since I have a considerable amount of data, I gave TFRecords a go. I saved the file and can read the examples into a Dataset using a parser function inside the input_fn of the estimator model. So far so good.
The issue is when I want to do some image augmentation (rotating and shearing in this case).
1) I tried using tf.contrib.keras.preprocessing.image.random_shear and the like. It turns out Keras doesn't like the format of TF's shape ('Dimension'), and I can't cast it to a list because its arguments are the axis indexes, not the actual values.
2) Then I tried using tf.contrib.image.rotate and tf.contrib.image.transform with random values in my chosen range. This time I get the error NotFoundError: Op type not registered 'ImageProjectiveTransform' in binary running on MYPC. Make sure the Op and Kernel are registered in the binary running in this process. This is an open issue (https://github.com/tensorflow/tensorflow/issues/9672). At the moment I can't move away from Windows, so I would be very interested in possible alternatives.
3) I searched for a way to read the TFRecords, convert them to NumPy arrays and do the augmentation with other tools, but I can't find a way to do that from within the input_fn, where I can't access the session.
Thanks!

Have you tried using the function from the answer to this question: "tensorflow: how to rotate an image for data augmentation?"
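Regarding point 3 of the question, one possible workaround is to do the rotation and shear in plain NumPy and wrap that function with tf.py_func inside the dataset's map call, which should let it run without direct access to the session. What follows is only a sketch under assumptions: the helper names (affine_warp, random_rotate_shear) and the parameter ranges are hypothetical, and nearest-neighbour sampling is used purely to keep it short.

```python
import numpy as np


def affine_warp(image, angle, shear):
    """Rotate by `angle` (radians) and shear horizontally about the image centre.

    image: H x W or H x W x C array. Nearest-neighbour sampling is used, and
    output pixels whose source falls outside the image are left at 0.
    """
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle), np.cos(angle)]])
    shr = np.array([[1.0, shear],
                    [0.0, 1.0]])
    # Inverse mapping: for every output pixel, find the source pixel.
    inv = np.linalg.inv(rot.dot(shr))
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs - cx, ys - cy]).reshape(2, -1)  # rows are (x, y)
    src = inv.dot(coords)
    sx = np.rint(src[0] + cx).astype(int)
    sy = np.rint(src[1] + cy).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)

    out = np.zeros(image.shape, dtype=image.dtype)
    out_flat = out.reshape(h * w, -1)      # view onto `out`
    src_flat = image.reshape(h * w, -1)
    out_flat[valid] = src_flat[sy[valid] * w + sx[valid]]
    return out


def random_rotate_shear(image, max_angle_deg=15.0, max_shear=0.2, rng=None):
    """Draw a random angle and shear factor, then warp the image."""
    rng = np.random.RandomState() if rng is None else rng
    angle = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
    shear = rng.uniform(-max_shear, max_shear)
    return affine_warp(image, angle, shear)
```

Because the function returns an array with the same shape and dtype as its input, the static shape lost by tf.py_func can be restored with set_shape on the resulting tensor.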

Related

how to manage batches for model.provide_groundtruth

I'm trying to use the TensorFlow 2 Object Detection API with a custom multi-class dataset to train an SSD. I took as a base the example provided by the documentation: https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tf2_colab.ipynb
My current problem is when I start the fine tuning:
InvalidArgumentError: The first dimension of paddings must be the rank
of inputs[2,2] [6] [Op:Pad]
That seems to be related to the model.provide_groundtruth call in train_step_fn. As I mentioned, I took my data from a TensorFlow record, mapped it to a dataset and divided it into batches using padded_batch on the tf.data.TFRecordDataset. This seems to be the correct way to feed the training with the images, but now my problem is the ground truth, because it is now also converted to batches of shape [batch_size, num_detections, coordinate_bbox]. Is this the problem? Any idea how to fix this issue?
Thanks
P.S. I tried the older approach of modifying the pipeline.config file and running model_main_tf2.py, as was done in the past with TensorFlow 1, but this method is buggy.
Just to share with everyone what resolved my issue: I had managed to split the images and ground truth into batches correctly, but I never converted my labels to one-hot encoding.
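For anyone hitting the same error, the fix described above can be sketched roughly as follows. This is a NumPy illustration, not the tutorial's code: it assumes class ids start at 1 (hence the offset), and the helper names to_one_hot and strip_padding are made up for this example. As far as I understand, provide_groundtruth takes lists with one entry per image, so a padded batch also has to be split back into per-image arrays.

```python
import numpy as np


def to_one_hot(class_ids, num_classes, label_id_offset=1):
    """Turn the class ids of one image into a [num_boxes, num_classes] array.

    Assumes ids start at `label_id_offset` (1 here), so they are shifted to
    zero-based indices before encoding.
    """
    zero_indexed = np.asarray(class_ids, dtype=np.int64) - label_id_offset
    if zero_indexed.size and (zero_indexed.min() < 0
                              or zero_indexed.max() >= num_classes):
        raise ValueError("class id out of range")
    return np.eye(num_classes, dtype=np.float32)[zero_indexed]


def strip_padding(padded_boxes, num_boxes):
    """Split a padded [batch, max_boxes, 4] array into per-image [n_i, 4] arrays."""
    return [boxes[:n] for boxes, n in zip(padded_boxes, num_boxes)]
```

In the training step, each per-image array would then be converted to a tensor before being placed in the lists passed to provide_groundtruth.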

Tensorflow Hub Image Modules: Clarity on Preprocessing and Output values

Many thanks for support!
I currently use TF Slim - and TF Hub seems like a very useful addition for transfer learning. However the following things are not clear from the documentation:
1. Is preprocessing done implicitly? Is this based on the "trainable=True/False" parameter in the constructor of the module?
module = hub.Module("https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1", trainable=True)
When I use TF-Slim I use the preprocess method:
inception_preprocessing.preprocess_image(image, img_height, img_width, is_training)
2. How do I get access to AuxLogits for an Inception model? They seem to be missing:
import tensorflow_hub as hub
import tensorflow as tf
img = tf.random_uniform([10,299,299,3])
module = hub.Module("https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1", trainable=True)
outputs = module(dict(images=img), signature="image_feature_vector", as_dict=True)
The output is
dict_keys(['InceptionV3/Mixed_6b', 'InceptionV3/MaxPool_5a_3x3', 'InceptionV3/Mixed_6c', 'InceptionV3/Mixed_6d', 'InceptionV3/Mixed_6e', 'InceptionV3/Mixed_7a', 'InceptionV3/Mixed_7b', 'InceptionV3/Conv2d_2a_3x3', 'InceptionV3/Mixed_7c', 'InceptionV3/Conv2d_4a_3x3', 'InceptionV3/Conv2d_1a_3x3', 'InceptionV3/global_pool', 'InceptionV3/MaxPool_3a_3x3', 'InceptionV3/Conv2d_2b_3x3', 'InceptionV3/Conv2d_3b_1x1', 'default', 'InceptionV3/Mixed_5b', 'InceptionV3/Mixed_5c', 'InceptionV3/Mixed_5d', 'InceptionV3/Mixed_6a'])
These are excellent questions; let me try to give good answers also for readers less familiar with TF-Slim.
1. Preprocessing is not done by the module, because it is a lot about your data, and not so much about the CNN architecture within the module. The module only handles transforming input values from the canonical [0,1] range into whatever the pre-trained CNN within the module expects.
Lengthy rationale: Preprocessing of images for CNN training usually consists of decoding the input JPEG (or whatever), selecting a (reasonably large) random crop from it, random photometric and geometric transformations (distort colors, flip left/right, etc.), and resizing to the common image size for a batch of training inputs. The TensorFlow Hub modules that implement https://tensorflow.org/hub/common_signatures/images leave all of that to your code around the module.
The primary reason is that the suitable random transformations depend a lot on your training task, but not on the architecture or trained weights of the module. For example, color distortions will help if you classify cars vs dogs, but probably not for ripe vs unripe bananas, and so on.
Also, a batch of images that have been decoded but not yet cropped/resized is hard to represent as a single tensor (unless you make it a 1-D tensor of encoded strings, but that brings other problems, such as breaking backprop into module inputs for advanced uses).
Bottom line: The Python code using the module needs to do image preprocessing (except scaling values), for example, as in https://github.com/tensorflow/hub/blob/master/examples/image_retraining/retrain.py
The slim preprocessing methods conflate the dataset-specific random transformations (tuned for Imagenet!) with the re-scaling to the architecture's value range (which the Hub module does for you). That means they are not directly applicable here.
2. Indeed, auxiliary heads are missing from the initial set of modules published under tfhub.dev/google/..., but I expect them to work fine for re-training anyways.
More details: Not all architectures have auxiliary heads, and even the original Inception paper says their effect was "relatively minor" [Szegedy&al. 2015; §5]. Using an image feature vector module for a custom classification task would burden the module consumer code with checking for aux features and, if found, putting aux logits and a loss term on top.
This complication did not seem to pull its weight, but more experiments might refute that assessment. (Please share in a GitHub issue if you know of any.)
For now, the only way to put an aux head onto https://tfhub.dev/google/imagenet/inception_v3/feature_vector/1 is to copy&paste some lines from https://github.com/tensorflow/models/blob/master/research/slim/nets/inception_v3.py (search "Auxiliary head logits") and apply that to the "InceptionV3/Mixed_6e" output that you saw.
3. You didn't ask, but: For training, the module's documentation recommends passing hub.Module(..., tags={"train"}), or else batch norm operates in inference mode (and dropout, if the module had any).
Hope this explains how and why things are.
Arno (from the TensorFlow Hub developers)
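To illustrate the division of labour from point 1 of the answer, here is a minimal sketch of caller-side preprocessing: a task-specific crop and flip, followed by scaling to the canonical [0,1] range that the module takes over from. It is written in NumPy for brevity and is not taken from retrain.py; the function name preprocess_for_module and its parameters are hypothetical, and resizing is left out.

```python
import numpy as np


def preprocess_for_module(image_uint8, crop_size, training, rng=None):
    """Crop, optionally flip, and scale an H x W x 3 uint8 image to [0, 1].

    Training: random crop plus a random left/right flip.
    Evaluation: deterministic centre crop, no flip.
    """
    rng = np.random.RandomState() if rng is None else rng
    h, w = image_uint8.shape[:2]
    if h < crop_size or w < crop_size:
        raise ValueError("image is smaller than the crop size")
    if training:
        top = rng.randint(0, h - crop_size + 1)
        left = rng.randint(0, w - crop_size + 1)
    else:
        top = (h - crop_size) // 2
        left = (w - crop_size) // 2
    crop = image_uint8[top:top + crop_size, left:left + crop_size]
    if training and rng.rand() < 0.5:
        crop = crop[:, ::-1]  # horizontal flip
    # The module expects float values in [0, 1]; it rescales further itself.
    return crop.astype(np.float32) / 255.0
```

Which random transformations to include is exactly the dataset-specific choice the answer describes, so the crop and flip here are placeholders rather than a recommendation.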

Hyperparameter tune for Tensorflow with hyper-engine

I found the hyper-engine Python tool at
https://github.com/maxim5/hyper-engine.
The examples only use MNIST:
https://github.com/maxim5/hyper-engine/tree/master/hyperengine/examples.
How can I feed it my own data, as in the example below?
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/5_DataManagement/build_an_image_dataset.ipynb
HyperEngine supports custom data providers. The closest example is this one: it generates word pairs from text, not images, but the API is more or less clear. Basically, you only need to implement the next_batch method:
def next_batch(self, batch_size):
    pass
So if you want to train your network on a set of images on a disk, you simply need to write an iterator over files and yield numpy arrays upon calling the next batch.
But there is a catch. Currently, HyperEngine accepts only numpy arrays from next_batch. The example you refer to works with the TF queue API, and its read_images function produces tensors, so you can't simply copy the code. Hopefully, there will be better support for various TensorFlow APIs, including estimators, the dataset API, queues, etc.
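As a rough idea of what such a provider could look like, here is a standalone sketch that serves shuffled numpy batches. It is not derived from HyperEngine's own base classes, which may require more properties than shown; the class name ArrayBatchProvider and the epochs_completed counter are assumptions for illustration. For simplicity it holds the images in memory, whereas a disk-based version would load files inside next_batch instead.

```python
import numpy as np


class ArrayBatchProvider(object):
    """Hypothetical provider that serves shuffled numpy batches."""

    def __init__(self, images, labels, seed=0):
        assert len(images) == len(labels)
        self._images = np.asarray(images)
        self._labels = np.asarray(labels)
        self._rng = np.random.RandomState(seed)
        self._order = self._rng.permutation(len(self._images))
        self._pos = 0
        self.epochs_completed = 0

    def next_batch(self, batch_size):
        """Return (images, labels) as numpy arrays of length batch_size."""
        if self._pos + batch_size > len(self._order):
            # Not enough samples left: reshuffle and start a new epoch.
            self._order = self._rng.permutation(len(self._images))
            self._pos = 0
            self.epochs_completed += 1
        idx = self._order[self._pos:self._pos + batch_size]
        self._pos += batch_size
        return self._images[idx], self._labels[idx]
```

The leftover samples at the end of an epoch are dropped here rather than returned as a short batch, which keeps every batch the same size.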

TensorFlow input pipeline for deployment on CloudML

I'm relatively new to TensorFlow and I'm having trouble modifying some of the examples to use batch/stream processing with input functions. More specifically, what is the 'best' way to modify this script to make it suitable for training and serving deployment on Google Cloud ML?
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/text_classification.py
Something akin to this example:
https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census/estimator/trainer
I can package it up and train it in the cloud, but I can't figure out how to apply even the simple vocab_processor transformations to an input tensor. I know how to do it with pandas, but there I can't apply the transformation to batches (using the chunk_size parameter). I would be very happy if I could reuse my pandas preprocessing pipelines in TensorFlow.
I think you have 3 options:
1) You cannot reuse pandas preprocessing pipelines in TF. However, you could start TF with the output of your pandas preprocessing. So you could build a vocab and convert the text words to integers, and save a new preprocessed dataset to disk. Then read the integer data (which is encoding your text) in TF to do training.
2) You could build a vocab outside of TF in pandas. Then inside TF, after reading the words, you can make a table to map the text to integers. But if you are going to build a vocab outside of TF, you might as well do the transformation at the same time outside of TF, which is option 1.
3) Use tensorflow_transform. You can call tft.string_to_int() on the text column to automatically build the vocab and convert to integers. The output of tensorflow_transform is preprocessed data in tf.example format. Then training can start from the tf.example files. This is again option 1 but with tf.example files. If you want to run prediction on raw text data, this option allows you to make an exported graph that has the same text preprocessing built in, so you don't have to manage the preprocessing step at prediction time. However, this option is the most complicated as it introduces two additional ideas: tf.example files and beam pipelines.
For examples of tensorflow_transform see https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/criteo_tft
and
https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/reddit_tft
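To make options 1 and 2 a bit more concrete, here is a minimal sketch of building a vocab outside of TF and converting text to padded integer ids. It uses only the standard library; the function names, the whitespace tokenization and the reserved "<pad>"/"<unk>" ids are assumptions for illustration, not something prescribed by TensorFlow.

```python
from collections import Counter


def build_vocab(texts, min_count=1):
    """Map each word seen at least `min_count` times to an integer id.

    Ids 0 and 1 are reserved for padding and unknown words.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    vocab = {"<pad>": 0, "<unk>": 1}
    for word in sorted(w for w, c in counts.items() if c >= min_count):
        vocab[word] = len(vocab)
    return vocab


def encode(text, vocab, max_len):
    """Convert text to a fixed-length list of ids, truncating or padding."""
    ids = [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]
    ids = ids[:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))
```

With option 1 the encoded rows would be written to disk and read back for training; with option 2 the same vocab would be saved and loaded into a lookup table inside the graph.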

How should I structure my labels for TensorFlow?

I'm trying to use TensorFlow to train output servo commands given an input image.
I plan on using a file as @mrry suggested in this question, with the images like so:
../some/path/some_img.JPG *some_label*
My question is, what are the label formats I can provide to TensorFlow and what structures are suggested?
My data is basically n servo commands from 0-10 seconds. A vector would work great:
[0,2,4,3]
or similarly:
[0,.25,.4,.3]
I couldn't find much about labels in the docs. Can anyone shed any light on TensorFlow labels?
And a very related question is what is the best way to structure these for TensorFlow to properly learn from them?
In TensorFlow, labels are just generic tensors. You can use any kind of tensor to store your labels. In your case a 1-D tensor with shape (4,) seems to be what you want.
Labels differ from the rest of the data only in how they are used in the computational graph. (Usually) labels should only be used inside the loss function, while you propagate the other data through the whole network. For your problem a 4-d regression function should work.
Also, look at my newest comment to the (old) question. Using the slice_input_producer seems to be preferable in your case.
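As a small illustration of the points above, the sketch below reads one line of such a file into a path and a label vector of shape (4,), and computes a mean squared error, which is one common choice of regression loss. It is NumPy only; the comma-separated label format and both function names are assumptions, since the question leaves the label format open.

```python
import numpy as np


def parse_line(line):
    """Split 'path v1,v2,...' into the image path and a float32 label vector."""
    path, raw = line.strip().rsplit(" ", 1)
    label = np.array([float(v) for v in raw.split(",")], dtype=np.float32)
    return path, label


def mse_loss(predictions, labels):
    """Mean squared error between predicted and target servo commands."""
    predictions = np.asarray(predictions, dtype=np.float32)
    labels = np.asarray(labels, dtype=np.float32)
    return float(np.mean((predictions - labels) ** 2))
```

The label only enters through the loss, so the network itself just needs four output units to match the label shape.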