Problem Description
I'm using tensorflow Estimator API, and have encountered a weird phenomenon.
I'm passing the exact same input_fn to both training and evaluation, and for some reason the images which are provided to the network are not identical.
They seem similar, but after taking a closer look, it seems that evaluation images are ok, but train images are somewhat distorted.
After loading them both, I noticed that for some reason the training images go through some kind of ReLu. I affirmed it with this code, which operates on mat_eval and mat_train, which are tensors that input_fn provides in evaluation and train mode:
special_relu = lambda mat: ((mat - 0.5) / 0.5) * ((mat - 0.5) / 0.5 > 0)
np.allclose(mat_train, special_relu(mat_eval))
>>> True
What I thought and tried
My initial thought was that it is some form of BatchNormalization. But BatchNormalization is supposed to happen within the network, and not as some preprocess, shouldn't it?
What I recorded (using tf.summary.image) was the features['image'] object, passed to my model_fn. And if I understand correctly, the features object is passed to model_fn by the input_fn called by the Estimator object.
Regardless, I tried to remove the parts in the code which are supposed to call the BatchNormalization. This had no effect. Of course, I might have not done that in the right way, but as I said it I don't really think it is BatchNormalization.
Code
from datetime import datetime
from pathlib import Path
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.python.platform import tf_logging as logging
from dcnn import modeling
from dcnn.dv_constants import BATCH_SIZE, BATCHES_PER_EPOCH
from dcnn.variant_io import get_input_fn, num_variants_in_ds
logging.set_verbosity(logging.INFO)
new_checkpoint_name = lambda: f'./train_dir/' \
f'{datetime.now().strftime("%d-%m %H:%M:%S")}'
if __name__ == '__main__':
model_name = 'small_inception'
start_from_checkpoint = ''
# start_from_checkpoint = '/home/yonatan/Desktop/yonas_code/dcnn/train_dir' \
# '/2111132905/model.ckpt-256'
model_dir = str(Path(start_from_checkpoint).parent) if \
start_from_checkpoint else new_checkpoint_name()
test = False
train = True
predict = False
epochs = 1
train_dataset_name = 'same_example'
val_dataset_name = 'same_example'
test_dataset_name = 'same_example'
predict_dataset_name = 'same_example'
model = modeling.get_model(model_name=model_name)
estimator = model.make_estimator( \
batch_size=BATCH_SIZE,
model_dir=model_dir,
params=dict(batches_per_epoch=BATCHES_PER_EPOCH),
use_tpu=False,
master='',
# The target of the TensorFlow standard server to use. Can be the empty string to run locally using an inprocess server.
start_from_checkpoint=start_from_checkpoint)
if train:
train_input_fn = get_input_fn(train_dataset_name, repeat=True)
val_input_fn = get_input_fn(val_dataset_name, repeat=False)
steps = (epochs * num_variants_in_ds(train_dataset_name)) / \
BATCH_SIZE
train_spec = tf.estimator.TrainSpec(input_fn=val_input_fn,
max_steps=steps)
eval_spec = tf.estimator.EvalSpec(input_fn=val_input_fn,
throttle_secs=1)
metrics = tf.estimator.train_and_evaluate(estimator, train_spec,
eval_spec)
print(metrics)
I have plenty of more code to share, but I tried to be concise. If anyone has any idea why this behavior happens, or needs more information, let me know.
Related
When I run following code, I don't get global steps until 5000.
First,
When I run this code, output is what I expect.
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import pandas as pd
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']
train_path = tf.keras.utils.get_file(
"iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file(
"iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")
train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)
train_y = train.pop('Species')
test_y = test.pop('Species')
def input_fn(features, labels, training=True, batch_size=256):
# Convert the inputs to a Dataset.
dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
# Shuffle and repeat if you are in training mode.
if training:
dataset = dataset.shuffle(1000).repeat()
return dataset.batch(batch_size)
my_feature_columns = []
for key in train.keys():
my_feature_columns.append(tf.feature_column.numeric_column(key=key))
#print(my_feature_columns)
But,
After adding classifier code, I don't get global steps.
This is addition of code after writing code above:
classifier = tf.estimator.DNNClassifier(
feature_columns=my_feature_columns,
# Two hidden layers of 30 and 10 nodes respectively.
hidden_units=[30, 10],
# The model must choose between 3 classes.
n_classes=3)
classifier.train(
input_fn=lambda: input_fn(train, train_y, training=True),
steps=5000)
Instead of getting global steps until 5000, I get this output
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpfl0bqu11
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 1061 vs previous value: 1061. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize
I write the same code that my teacher demo but he could run this code. I can't.
How could I get global steps?
I have trained an image multi classification model based on MobileNet-V2(Only the Dense layer has been added), and have carried out full integer quantization(INT8), and then exported model.tflite file, using TF Class () to call this model.
Here is my code to quantify it:
import tensorflow as tf
import numpy as np
import pathlib
def representative_dataset():
for _ in range(100):
data = np.random.rand(1, 96, 96, 3) // random tensor for test
yield [data.astype(np.float32)]
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model/my_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_quant_model = converter.convert()
tflite_models_dir = pathlib.Path("/tmp/mnist_tflite_models/")
tflite_models_dir.mkdir(exist_ok=True, parents=True)
tflite_model_quant_file = tflite_models_dir/"mnist_model_quant.tflite"
tflite_model_quant_file.write_bytes(tflite_quant_model)
The accuracy of this model is quite good in the test while training. However, when tested on openmv, the same label is output for all objects (although the probability is slightly different).
I looked up some materials, one of them mentioned TF Classify() has offset and scale parameters, which is related to compressing RGB values to [- 1,0] or [0,1] during training, but this parameter is not available in the official API document.
for obj in tf.classify(self.net , img1, min_scale=1.0, scale_mul=0.5, x_overlap=0.0, y_overlap=0.0):
print("**********\nTop 1 Detections at [x=%d,y=%d,w=%d,h=%d]" % obj.rect())
sorted_list = sorted(zip(self.labels, obj.output()), key = lambda x: x[1], reverse = True)
for i in range(1):
print("%s = %f" % (sorted_list[i][0], sorted_list[i][1]))
return sorted_list[i][0]
So are there any examples of workflow from tensorflow training model to deployment to openmv?
I want to do a neural network training in Tensorflow/Keras but prefer to use python multiprocessing module to maximize use of system resources and save time. What I do is simply like this (I want to run this code on a system without GPU or with one or more GPUs):
import ... (required modules)
from multiprocessing import Pool
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
tf.keras.backend.set_session(sess)
... some tf and non-tf variable initializations...
... some functions to facilitate reading tensorflow datasets in TFRecord format...
... function defining keras model...
# Main worker function
def doWork(args):
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.models import load_model
train_data = read_datasets(...)
val_data = read_datasets(...)
test_data = read_datasets(...)
if (NumGPUs > 1):
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
model = keras_model(...)
model.compile(...)
else:
model = keras_model(...)
model.compile(...)
model.fit(train_data, epochs=epochs, steps_per_epoch=train_steps, ...)
_, test_acc = model.evaluate(test_data, steps=test_steps)
...log results...
if __name__ == '__main__':
pool = Pool(processes=2)
a1 = <set of parameters for the first run>
a1 = <set of parameters for the second run>
pool.map(doWork, (a1, a2))
I can run this code on different computers and get my results, but some times I face system hangups (especially if I want to abort execution by pressing CTRL+C) or program termination with different errors, and I guess the above is not the right style of combining Tensorflow/Keras and Python multiprocessing. What is the correct style of writing the above code?
For a Deep learning model I defined with tf2.0 keras I need to write a custom loss function.
As this will depend on stuff like entropy and normal log_prob, it would really make my life less misrable if I could use tf.distributions.Normal and use two model outpus as mu and sigma respectivly.
However, as soon as I put this into my loss function, I get the Keras error that no gradient is defined for this function.
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
I tried encalpsulating the call in a tf.contrib.eager.Variable as I read somewhere. Did not help.
What is the trick to use them? I don't see a reason from the fundamental arcitecture why I should not be able to use them in a mixed form.
#this is just an example which does not really give a meaningful result.
import tensorflow as tf
import tensorflow.keras as K
import numpy as np
def custom_loss_fkt(extra_output):
def loss(y_true,y_pred):
dist = tf.distributions.Normal(loc=y_pred,scale=extra_output)
d = dist.entropy()
return K.backend.mean(d)
return loss
input_node = K.layers.Input(shape=(1,))
dense = K.layers.Dense(8,activation='relu')(input_node)
#dense = K.layers.Dense(4,activation='relu')(dense)
out1 = K.layers.Dense(4,activation='linear')(dense)
out2 = K.layers.Dense(4,activation ='linear')(dense)
model = K.Model(inputs = input_node, outputs = [out1,out2])
model.compile(optimizer = 'adam', loss = [custom_loss_fkt(out2),custom_loss_fkt(out1)])
model.summary()
x = np.zeros((1,1))
y1 = np.array([[0.,0.1,0.2,0.3]])
y2 = np.array([[0.1,0.1,0.1,0.1]])
model.fit(x,[y1,y2],epochs=1000,verbose=0)
print(model.predict(x))
I'm trying to benchmark the performance in the inference phase of my Keras model build with the TensorFlow backend. I was thinking that the the Tensorflow Benchmark tool was the proper way to go.
I've managed to build and run the example on Desktop with the tensorflow_inception_graph.pb and everything seems to work fine.
What I can't seem to figure out is how to save the Keras model as a proper .pbmodel. I'm able to get the TensorFlow Graph from the Keras model as follows:
import keras.backend as K
K.set_learning_phase(0)
trained_model = function_that_returns_compiled_model()
sess = K.get_session()
sess.graph # This works
# Get the input tensor name for TF Benchmark
trained_model.input
> <tf.Tensor 'input_1:0' shape=(?, 360, 480, 3) dtype=float32>
# Get the output tensor name for TF Benchmark
trained_model.output
> <tf.Tensor 'reshape_2/Reshape:0' shape=(?, 360, 480, 12) dtype=float32>
I've now been trying to save the model in a couple of different ways.
import tensorflow as tf
from tensorflow.contrib.session_bundle import exporter
model = trained_model
export_path = "path/to/folder" # where to save the exported graph
export_version = 1 # version number (integer)
saver = tf.train.Saver(sharded=True)
model_exporter = exporter.Exporter(saver)
signature = exporter.classification_signature(input_tensor=model.input, scores_tensor=model.output)
model_exporter.init(sess.graph.as_graph_def(), default_graph_signature=signature)
model_exporter.export(export_path, tf.constant(export_version), sess)
Which produces a folder with some files I don't know what to do with.
I would now run the Benchmark tool with something like this
bazel-bin/tensorflow/tools/benchmark/benchmark_model \
--graph=tensorflow/tools/benchmark/what_file.pb \
--input_layer="input_1:0" \
--input_layer_shape="1,360,480,3" \
--input_layer_type="float" \
--output_layer="reshape_2/Reshape:0"
But no matter which file I'm trying to use as the what_file.pb I'm getting a Error during inference: Invalid argument: Session was not created with a graph before Run()!
So I got this to work. Just needed to convert all variables in the tensorflow graph to constants and then save graph definition.
Here's a small example:
import tensorflow as tf
from keras import backend as K
from tensorflow.python.framework import graph_util
K.set_learning_phase(0)
model = function_that_returns_your_keras_model()
sess = K.get_session()
output_node_name = "my_output_node" # Name of your output node
with sess as sess:
init_op = tf.global_variables_initializer()
sess.run(init_op)
graph_def = sess.graph.as_graph_def()
output_graph_def = graph_util.convert_variables_to_constants(
sess,
sess.graph.as_graph_def(),
output_node_name.split(","))
tf.train.write_graph(output_graph_def,
logdir="my_dir",
name="my_model.pb",
as_text=False)
Now just call the TensorFlow Benchmark tool with my_model.pb as the graph.
You're saving the parameters of this model and not the graph definition; to save that use tf.get_default_graph().as_graph_def().SerializeToString() and then save that to a file.
That said I don't think the benchmark tool will work since it has no way to initialize the variables your model depends on.