Early stopping in TensorFlow using SessionRunHook() or a validation monitor

I have a deep neural network which runs fine. However, adding the following code to institute early stopping results in an error:
validation_metrics = {
    "accuracy":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_accuracy,
            prediction_key=tf.contrib.learn.prediction_key.PredictionKey.CLASSES)}
validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(
    x=X_test, y=y_test, early_stopping_rounds=50, metrics=validation_metrics)
Output:
prediction_key=tf.contrib.learn.prediction_key.PredictionKey.CLASSES)}
AttributeError: module 'tensorflow.contrib.learn' has no attribute 'prediction_key'

You can try the following:
prediction_key=tf.contrib.learn.PredictionKey.CLASSES
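In context, the corrected snippet would look something like this (untested; it keeps the rest of the question's TF 1.x contrib.learn code as-is):

validation_metrics = {
    "accuracy":
        tf.contrib.learn.MetricSpec(
            metric_fn=tf.contrib.metrics.streaming_accuracy,
            # PredictionKey is exposed directly on tf.contrib.learn,
            # not under a prediction_key submodule
            prediction_key=tf.contrib.learn.PredictionKey.CLASSES)}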

All monitors have been deprecated.
Your best bet is to look at training hooks. Only vanilla hooks are available as of this writing, so you will need to implement your own.
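As a starting point, a minimal early-stopping hook could look like the sketch below (untested; it assumes you have a scalar loss tensor to monitor and uses the TF 1.x tf.train.SessionRunHook API):

import numpy as np
import tensorflow as tf

class EarlyStoppingHook(tf.train.SessionRunHook):
    """Requests a stop when the monitored loss has not improved for `patience` steps."""

    def __init__(self, loss_tensor, patience=50):
        self._loss_tensor = loss_tensor
        self._patience = patience
        self._best = np.inf
        self._bad_steps = 0

    def before_run(self, run_context):
        # Fetch the loss value alongside the regular training fetches.
        return tf.train.SessionRunArgs(self._loss_tensor)

    def after_run(self, run_context, run_values):
        loss = run_values.results
        if loss < self._best:
            self._best = loss
            self._bad_steps = 0
        else:
            self._bad_steps += 1
        if self._bad_steps >= self._patience:
            run_context.request_stop()

The hook instance can then be passed in the hooks/monitors list of the estimator's fit or train call.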


How to use tf.data.Dataset.ignore_errors to ignore errors in a Tensorflow Dataset?

When loading images from a directory in Tensorflow, you use something like:
dataset = tf.keras.utils.image_dataset_from_directory(
    "S:\\Images",
    batch_size=32,
    image_size=(128, 128),
    label_mode=None,
    validation_split=0.20,  # Reserve 20% of images for validation
    subset='training',      # If we specify a validation_split, we *must* specify subset
    seed=619                # If using validation_split we *must* specify a seed to ensure there is no overlap between training and validation data
)
But of course some of the images (.jpg, .png, .gif, .bmp) will be invalid. So we want to ignore those errors; just skip them (and ideally log the filenames so they can be repaired, removed, or deleted).
There have been some ideas along the way of how to ignore invalid images:
Method 1: tf.contrib.data.ignore_errors (Tensorflow 1.x only)
Warning: The tf.contrib module will not be included in TensorFlow 2.0.
Sample usage:
dataset = dataset.apply(tf.contrib.data.ignore_errors())
The only downside of this method is that it was only available in TensorFlow 1. Trying to use it today simply won't work, as the tf.contrib namespace no longer exists. That led to a built-in method:
Method 2: tf.data.experimental.ignore_errors(log_warning=False) (deprecated)
From the documentation:
Creates a Dataset from another Dataset and silently ignores any errors. (deprecated)
Deprecated: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Use tf.data.Dataset.ignore_errors instead.
Sample usage:
dataset = dataset.apply(tf.data.experimental.ignore_errors(log_warning=True))
And this method works. It works great. And it has the advantage of working.
But it's apparently deprecated, and the documentation says we should use method 3:
Method 3 - tf.data.Dataset.ignore_errors(log_warning=False, name=None)
Drops elements that cause errors.
Sample usage:
dataset = dataset.ignore_errors(log_warning=True, name="Loading images from directory")
Except it doesn't work
The dataset.ignore_errors attribute doesn't work, and gives the error:
AttributeError: 'BatchDataset' object has no attribute 'ignore_errors'
Which means:
the thing that works is deprecated
they tell us to use this other thing instead
and "provide the instructions for updating"
but the other thing doesn't work
So we ask Stack Overflow:
How do I use tf.data.Dataset.ignore_errors to ignore errors?
Bonus Reading
TensorFlow Dataset `.map` - Is it possible to ignore errors?
TensorFlow: How to skip broken data
Untested Workaround
Not only is it not what I was asking, but people are not allowed to read this:
It looks like the tf.data.Dataset.ignore_errors() method is not
available in the BatchDataset object, which is what you are using in
your code. You can try using tf.data.Dataset.filter() to filter out
elements that cause errors when loading the images. You can use a
try-except block inside the lambda function passed to filter() to
catch the errors and return False for elements that cause errors,
which will filter them out. Here's an example of how you can use
filter() to achieve this:
def filter_fn(x):
    try:
        # Load the image and do some processing
        # Return True if the image is valid, False otherwise
        return True
    except:
        return False

dataset = dataset.filter(filter_fn)
Alternatively, you can use the tf.data.experimental.ignore_errors()
method, which is currently available in TensorFlow 2.x. This method
will silently ignore any errors that occur while processing the
elements of the dataset. However, keep in mind that this method is
experimental and may be removed or changed in a future version.
tf.data.Dataset.ignore_errors was introduced in TensorFlow 2.11. You can use tf.data.experimental.ignore_errors for older versions like so:
dataset.apply(tf.data.experimental.ignore_errors())
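Putting it together, a version-aware sketch might look like this (untested; it reuses the directory-loading code from the question and simply picks whichever ignore_errors variant the installed TensorFlow provides):

import tensorflow as tf

dataset = tf.keras.utils.image_dataset_from_directory(
    "S:\\Images",
    batch_size=32,
    image_size=(128, 128),
    label_mode=None,
    validation_split=0.20,
    subset='training',
    seed=619)

# tf.data.Dataset.ignore_errors only exists from TF 2.11 onwards;
# fall back to the experimental op on older 2.x releases.
if hasattr(dataset, "ignore_errors"):
    dataset = dataset.ignore_errors(log_warning=True)
else:
    dataset = dataset.apply(tf.data.experimental.ignore_errors(log_warning=True))

Note that the elements here are whole batches, so an error will likely drop the entire offending batch rather than a single image.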

Running GluonCV object detection model on Android

I need to run a custom GluonCV object detection module on Android.
I already fine-tuned the model (ssd_512_mobilenet1.0_custom) on a custom dataset and tried running inference with it (loading the .params file produced during training), and everything works perfectly on my computer. Now, I need to export this to Android.
I was referring to this answer to figure out the procedure; there are 3 suggested options:
You can use ONNX to convert models to other runtimes, for example [...] NNAPI for Android
You can use TVM
You can use SageMaker Neo + DLR runtime [...]
Regarding the first one, I converted my model to ONNX.
However, in order to use it with NNAPI, it is necessary to convert it to daq. In the repository, they provide a precompiled AppImage of onnx2daq to make the conversion, but the script returns an error. I checked the issues section, and they report that "It actually fails for all onnx object detection models".
Then, I gave DLR a try, since it's suggested to be the easiest way.
As I understand, in order to use my custom model with DLR, I would first need to compile it with TVM (which also covers the second point mentioned in the linked post). In the repo, they provide a Docker image with some conversion scripts for different frameworks.
I modified the 'compile_gluoncv.py' script, and now I have:
#!/usr/bin/env python3

from tvm import relay
import mxnet as mx
from mxnet.gluon.model_zoo.vision import get_model
from tvm_compiler_utils import tvm_compile

shape_dict = {'data': (1, 3, 300, 300)}
dtype = 'float32'
ctx = [mx.cpu(0)]

classes_custom = ["CML_mug"]
block = get_model('ssd_512_mobilenet1.0_custom', classes=classes_custom, pretrained_base=False, ctx=ctx)
block.load_parameters("ep_035.params", ctx=ctx)  # this is the file produced by training on the custom dataset

for arch in ["arm64-v8a", "armeabi-v7a", "x86_64", "x86"]:
    sym, params = relay.frontend.from_mxnet(block, shape=shape_dict, dtype=dtype)
    func = sym["main"]
    func = relay.Function(func.params, relay.nn.softmax(func.body), None, func.type_params, func.attrs)
    tvm_compile(func, params, arch, dlr_model_name)
However, when I run the script it returns the error:
ValueError: Model ssd_512_mobilenet1.0_custom is not supported. Available options are
alexnet
densenet121
densenet161
densenet169
densenet201
inceptionv3
mobilenet0.25
mobilenet0.5
mobilenet0.75
mobilenet1.0
mobilenetv2_0.25
mobilenetv2_0.5
mobilenetv2_0.75
mobilenetv2_1.0
resnet101_v1
resnet101_v2
resnet152_v1
resnet152_v2
resnet18_v1
resnet18_v2
resnet34_v1
resnet34_v2
resnet50_v1
resnet50_v2
squeezenet1.0
squeezenet1.1
vgg11
vgg11_bn
vgg13
vgg13_bn
vgg16
vgg16_bn
vgg19
vgg19_bn
Am I doing something wrong? Is this thing even possible?
As a side note, after this I'd need to deploy on Android a pose detection model (simple_pose_resnet18_v1b) and an activity recognition one (i3d_nl10_resnet101_v1_kinetics400) as well.
You actually can run a GluonCV model directly on Android with the Deep Java Library (DJL).
What you need to do is:
Hybridize your GluonCV model and save it as an MXNet model (see the sketch after this list)
Build the MXNet engine for Android; MXNet already supports Android builds
Include the MXNet shared library in your Android project
Use DJL in your Android project; you can follow the DJL Android demo for PyTorch
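For the first step, a rough sketch of exporting the fine-tuned model could look like this (untested; the model name and ep_035.params file are the ones from the question):

import mxnet as mx
import gluoncv

classes_custom = ["CML_mug"]
ctx = [mx.cpu(0)]

net = gluoncv.model_zoo.get_model('ssd_512_mobilenet1.0_custom',
                                  classes=classes_custom, pretrained_base=False, ctx=ctx)
net.load_parameters("ep_035.params", ctx=ctx)

# Hybridize and run one forward pass so the symbolic graph gets cached,
# then export the *-symbol.json / *-0000.params pair that MXNet (and DJL) load.
net.hybridize()
net(mx.nd.ones((1, 3, 512, 512), ctx=ctx[0]))
net.export("ssd_512_mobilenet1.0_custom", epoch=0)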
The error message is self-explanatory - there is no model "ssd_512_mobilenet1.0_custom" supported by mxnet.gluon.model_zoo.vision.get_model. You are confusing GluonCV's get_model with MXNet Gluon's get_model.
Replace
block = get_model('ssd_512_mobilenet1.0_custom',
                  classes=classes_custom, pretrained_base=False, ctx=ctx)
with
import gluoncv
block = gluoncv.model_zoo.get_model('ssd_512_mobilenet1.0_custom',
                                    classes=classes_custom, pretrained_base=False, ctx=ctx)

Tensorflow 2.x Agents(TF-Agents, Reinforcement Learning Module) & PySC2

There are examples combining pysc2 (https://github.com/deepmind/pysc2) with TensorFlow (1.x) and OpenAI Baselines (https://github.com/openai/baselines), like the following:
https://github.com/chris-chris/pysc2-examples
https://github.com/llSourcell/A-Guide-to-DeepMinds-StarCraft-AI-Environment
The TF team has recently come up with a set of RL implementations (an alternative to OpenAI Baselines) called TF-Agents (https://github.com/tensorflow/agents).
Examples :
https://github.com/tensorflow/agents/blob/master/docs/tutorials/1_dqn_tutorial.ipynb
https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_12_05_apply_rl.ipynb
https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_12_04_atari.ipynb
For TF-Agents, you do
env_name = 'CartPole-v0'
train_py_env = suite_gym.load(env_name)
eval_py_env = suite_gym.load(env_name)
# wrap the Python environments so the agent can consume them as TF environments
train_env = tf_py_environment.TFPyEnvironment(train_py_env)
eval_env = tf_py_environment.TFPyEnvironment(eval_py_env)

q_net = q_network.QNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=fc_layer_params)

optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate)

agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=q_net,
    optimizer=optimizer,
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=train_step_counter)

agent.initialize()
For pysc2,
from pysc2.env import environment
from pysc2.env import sc2_env
from pysc2.lib import actions
from pysc2.lib import actions as sc2_actions
from pysc2.lib import features

mineral_env = sc2_env.SC2Env(
    map_name="CollectMineralShards",
    step_mul=step_mul,
    agent_interface_format=AGENT_INTERFACE_FORMAT,
    visualize=True)
How do I combine TF-Agents and PySC2?
They are both Google products.
I've recently stumbled on a very similar situation where I wanted to use the hanabi-learning-environment developed by DeepMind with TF-Agents. I'm afraid I have to tell you that there is no nice solution to this.
What you must do is fork the DeepMind repo and modify the environment wrapper to be compatible with what TF-Agents requires. It's going to be quite some work, especially if you are not familiar with how environments are defined in TF-Agents, but this is definitely something that can be done in about a week of work.
If you want to get an idea of what I did, you can look at the original rl_env.py code in the Hanabi repo from DeepMind, and what I modified it into in my repo.
I have no idea why DeepMind sticks to its own structure instead of making the code more compatible, but this is how it is.
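To give a flavour of what that wrapper involves, here is a heavily simplified sketch of a TF-Agents PyEnvironment around SC2Env (untested; the observation is reduced to the flat "player" stats vector and every action is mapped to a no-op, purely to illustrate the structure TF-Agents expects):

import numpy as np
from pysc2.lib import actions
from tf_agents.environments import py_environment
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts


class SC2PyEnvironment(py_environment.PyEnvironment):
    """Minimal TF-Agents wrapper around a single-agent SC2Env."""

    def __init__(self, sc2_environment):
        super().__init__()
        self._env = sc2_environment
        # Illustrative specs only: the 11 scalar "player" stats as observation,
        # and a dummy binary action.
        self._observation_spec = array_spec.ArraySpec(
            shape=(11,), dtype=np.float32, name='observation')
        self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=1, name='action')

    def observation_spec(self):
        return self._observation_spec

    def action_spec(self):
        return self._action_spec

    def _to_observation(self, sc2_timestep):
        return np.asarray(sc2_timestep.observation["player"], dtype=np.float32)

    def _reset(self):
        sc2_timestep = self._env.reset()[0]
        return ts.restart(self._to_observation(sc2_timestep))

    def _step(self, action):
        if self.current_time_step().is_last():
            return self.reset()
        # A real wrapper would map `action` onto the SC2 action space;
        # here every action becomes a no-op function call.
        sc2_timestep = self._env.step(
            [actions.FunctionCall(actions.FUNCTIONS.no_op.id, [])])[0]
        observation = self._to_observation(sc2_timestep)
        if sc2_timestep.last():
            return ts.termination(observation, reward=sc2_timestep.reward)
        return ts.transition(observation, reward=sc2_timestep.reward, discount=1.0)

The wrapped environment can then be passed to tf_py_environment.TFPyEnvironment and fed to DqnAgent just like the CartPole snippet above; the real work is in defining observation and action specs that match the agent_interface_format you use.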

NameError: global name 'linear' is not defined

I am trying to run an implementation of an attention mechanism by Google DeepMind. However, it is based on an older version of TensorFlow, and I am getting this error:
from tensorflow.models.rnn.rnn_cell import RNNCell, linear
concat = linear([inputs, h, self.c], 4 * self._num_units, True)
NameError: global name 'linear' is not defined
I couldn't find the linear model/function in the new tensorflow documentation. Can anyone help me with this? Thanks!
You need to use the tf.nn.rnn_cell._linear function to make the code work. Have a look at this tutorial for a sample usage.
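Concretely, that means dropping linear from the old import and calling the helper through its new (private) location instead, e.g. (a sketch, assuming a TF 1.x release that still exposes the helper at the path mentioned above):

# old: from tensorflow.models.rnn.rnn_cell import RNNCell, linear
# new: keep using RNNCell from tf.nn.rnn_cell and call the private helper directly
concat = tf.nn.rnn_cell._linear([inputs, h, self.c], 4 * self._num_units, True)

Since _linear is a private helper, it may move again between releases, so either pin your TensorFlow version or copy the function into your own code.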

Multiple GPU training with tensorflow.slim.learning

The new learning module in tensorflow.contrib.slim looks very promising:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/slim/python/slim/learning.py
I'm trying to figure out how I can reproduce the CIFAR-10 multi-GPU example (or the ImageNet example) using this new module on a configuration where I have only a single worker node but with several GPUs on it.
I've had some success using https://github.com/tensorflow/models/tree/master/slim/deployment
When creating the config object, you would set num_clones to the number of GPUs.
For example,
config = model_deploy.DeploymentConfig(num_clones=2)
Check out the example in the model_deploy.py file.
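For reference, the overall usage pattern looks roughly like this (a sketch adapted from the model_deploy.py docstring; load_data and create_network are placeholders for your own input pipeline and model):

import tensorflow as tf
from deployment import model_deploy

slim = tf.contrib.slim

g = tf.Graph()
with g.as_default():
    # Two clones = the model is replicated on two GPUs of this single worker.
    config = model_deploy.DeploymentConfig(num_clones=2)

    # Variables and the global step live on the device the config chooses.
    with tf.device(config.variables_device()):
        global_step = slim.create_global_step()

    # The input pipeline sits on the inputs device and is shared by all clones.
    with tf.device(config.inputs_device()):
        images, labels = load_data()  # placeholder: your input pipeline

    def model_fn(images, labels):
        predictions = create_network(images)  # placeholder: your model
        slim.losses.log_loss(predictions, labels)

    with tf.device(config.optimizer_device()):
        optimizer = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)

    # Builds one clone per GPU and a single train_op that aggregates their gradients.
    model_dp = model_deploy.deploy(config, model_fn, [images, labels],
                                   optimizer=optimizer)

    slim.learning.train(model_dp.train_op, logdir='/tmp/train_logs',
                        summary_op=model_dp.summary_op)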