TFLiteConverter Segmentation Fault when running integer quantization - tensorflow

I'm using tensorflow==1.15.3 and I'm hitting a segmentation fault attempting int8 post-training quantization. The documentation for the 1.15 version of the TFLiteConverter can be found here.
I found a similar issue on GitHub, but the fix suggested there, passing --add_postprocessing_op=true, has not solved the segmentation fault.
I've debugged it using PDB and found exactly where it crashes. It never reaches my representative_dataset function. It faults when running CreateWrapperCPPFromBuffer(model_content):
> .../python3.6/site-packages/tensorflow_core/lite/python/optimize/calibrator.py(51)__init__()
-> .CreateWrapperCPPFromBuffer(model_content))
(Pdb) s
Fatal Python error: Segmentation fault
Current thread 0x00007ff40ee9f740 (most recent call first):
File ".../python3.6/site-packages/tensorflow_core/lite/python/optimize/calibrator.py", line 51 in __init__
File ".../python3.6/site-packages/tensorflow_core/lite/python/lite.py", line 236 in _calibrate_quantize_model
File ".../python3.6/site-packages/tensorflow_core/lite/python/lite.py", line 993 in convert
File ".../convert_model_to_tflite_int8.py", line 97 in <module>
File "<string>", line 1 in <module>
File "/usr/lib/python3.6/bdb.py", line 434 in run
File "/usr/lib/python3.6/pdb.py", line 1548 in _runscript
File "/usr/lib/python3.6/pdb.py", line 1667 in main
File "/usr/lib/python3.6/pdb.py", line 1694 in <module>
File "/usr/lib/python3.6/runpy.py", line 85 in _run_code
File "/usr/lib/python3.6/runpy.py", line 193 in _run_module_as_main
[1] 17668 segmentation fault (core dumped) python -m pdb convert_model_to_tflite_int8.py --add_postprocessing_op=true
Here is my conversion code:
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file=pb_model_path,
    input_arrays=["device_0/input_node_name:1"],
    output_arrays=["device_0/output_node_name"],
    input_shapes={"device_0/input_node_name:1": [100, 16384]}
)
converter.allow_custom_ops = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

def test():
    pdb.set_trace()
    print(' ! ! ! representative_dataset_gen ! ! ! ')
    zeros = np.zeros(shape=(1, 100, 16384), dtype='int8')
    ds = tf.data.Dataset.from_tensor_slices((zeros)).batch(1)
    for input_value in ds.take(1):
        yield [input_value]

converter.representative_dataset = test
pdb.set_trace()
tflite_model = converter.convert()
tflite_model_size = open(model_name, 'wb').write(tflite_model)
print('TFLite Model is %d bytes' % tflite_model_size)
FWIW my model conversion works for tf.float16 (not using representative_dataset there, though).

Upgrading my tf version to 2.3 solved the segmentation fault. My model code isn't compatible with tf==2.x yet, but luckily the conversion code is independent from that so the upgrade went smoothly.
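For anyone landing here on TF 2.3, here is a rough sketch of the same conversion. It assumes the model takes float32 input with the shape given in input_shapes, and that the frozen-graph converter is reached through tf.compat.v1.lite.TFLiteConverter. The function names, the random calibration data and the num_samples default are placeholders of mine, not part of the original script; real calibration samples should replace the random ones.

```python
import numpy as np

INPUT_SHAPE = (100, 16384)  # same shape as input_shapes in the question


def representative_dataset(num_samples=10, shape=INPUT_SHAPE, seed=0):
    """Yield calibration samples as single-element lists of float32 arrays.

    Random data is only a stand-in (assumption); use real inputs in practice.
    """
    rng = np.random.default_rng(seed)
    for _ in range(num_samples):
        yield [rng.random(shape, dtype=np.float32)]


def convert_to_int8(pb_model_path, input_name, output_name, shape=INPUT_SHAPE):
    """Sketch of full-integer post-training quantization of a frozen graph."""
    # Imported here so the generator above can be used without TensorFlow.
    import tensorflow as tf

    converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
        graph_def_file=pb_model_path,
        input_arrays=[input_name],
        output_arrays=[output_name],
        input_shapes={input_name: list(shape)},
    )
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    converter.representative_dataset = representative_dataset
    return converter.convert()
```

The converter calls representative_dataset with no arguments, which is why the generator only has defaulted parameters.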

Related

Calling `Model.predict` in graph mode is not supported when the `Model` instance was constructed with eager mode enabled

I was following someone else's project and got as far as this point when I hit this error:
[2020-10-12 15:33:21,128] ERROR in app: Exception on /predict/ [POST]
Traceback (most recent call last):
File "c:\users\mr777\anaconda3\envs\gpu\lib\site-packages\flask\app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "c:\users\mr777\anaconda3\envs\gpu\lib\site-packages\flask\app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "c:\users\mr777\anaconda3\envs\gpu\lib\site-packages\flask\app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "c:\users\mr777\anaconda3\envs\gpu\lib\site-packages\flask\_compat.py", line 39, in reraise
raise value
File "c:\users\mr777\anaconda3\envs\gpu\lib\site-packages\flask\app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "c:\users\mr777\anaconda3\envs\gpu\lib\site-packages\flask\app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "D:\Ngoding Python\Skripsi\deploy\app.py", line 70, in predict
out = model.predict(img)
File "c:\users\mr777\anaconda3\envs\gpu\lib\site-packages\tensorflow\python\keras\engine\training.py", line 130, in _method_wrapper
return method(self, *args, **kwargs)
File "c:\users\mr777\anaconda3\envs\gpu\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1562, in predict
version_utils.disallow_legacy_graph('Model', 'predict')
File "c:\users\mr777\anaconda3\envs\gpu\lib\site-packages\tensorflow\python\keras\utils\version_utils.py", line 122, in disallow_legacy_graph
raise ValueError(error_msg)
ValueError: Calling `Model.predict` in graph mode is not supported when the `Model` instance was constructed with eager mode enabled. Please construct your `Model` instance in graph mode or call `Model.predict` with eager mode enabled.
Here's the code I wrote:
with graph.as_default():
    # perform the prediction
    out = model.predict(img)
    print(out)
    print(class_names[np.argmax(out)])
    # convert the response to a string
    response = class_names[np.argmax(out)]
    return str(response)
Any idea what causes this? I found the same question here
The answer is simple: load your model inside the graph, like this:
with graph.as_default():
    json_file = open('models/model.json','r')
    loaded_model_json = json_file.read()
    json_file.close()
    loaded_model = model_from_json(loaded_model_json)
    #load weights into new model
    loaded_model.load_weights("models/model.h5")
    print("Loaded Model from disk")
    #compile and evaluate loaded model
    loaded_model.compile(loss='sparse_categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
    # perform the prediction
    out = loaded_model.predict(img)
    print(out)
    print(class_names[np.argmax(out)])
    # convert the response to a string
    response = class_names[np.argmax(out)]
    return str(response)
@Ilham: Try wrapping the call method in a tf.function right after defining your network. Something like this:
model = Sequential()
model.call = tf.function(model.call)
I had an issue similar to yours. I solved it just by adding that second line of code.
See the following link for more details: https://www.tensorflow.org/guide/intro_to_graphs
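Another option, following the second half of the error message ("call Model.predict with eager mode enabled"), is to load the model once and call predict outside any `with graph.as_default():` block. This is only a sketch: it assumes TF 2.x with eager execution left at its default, and it reuses the model paths from the answer above. get_model and predict_label are names I made up.

```python
import functools

import numpy as np


@functools.lru_cache(maxsize=1)
def get_model(json_path="models/model.json", weights_path="models/model.h5"):
    """Load the Keras model once and reuse it across requests."""
    # Imported here so predict_label can be exercised without TensorFlow.
    from tensorflow.keras.models import model_from_json

    with open(json_path) as json_file:
        model = model_from_json(json_file.read())
    model.load_weights(weights_path)
    return model


def predict_label(model, img, class_names):
    """Run the prediction eagerly (no graph context) and return the label."""
    out = model.predict(img)
    return str(class_names[int(np.argmax(out))])
```

Inside the Flask view this would be called as `predict_label(get_model(), img, class_names)`, so the model is not rebuilt on every request.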

AttributeError: module 'tensorflow.contrib.seq2seq' has no attribute 'prepare_attention'

I am trying to run my code, and it throws the following error:
AttributeError: module 'tensorflow.contrib.seq2seq' has no attribute 'prepare_attention'
I updated my TensorFlow version to 1.0.0, but the upgrade did not solve my problem. I also searched Google for this error, but I did not find a working solution.
Here is the relevant part of the code:
# Getting the training and test predictions
training_predictions, test_predictions = seq2seq_model(tf.reverse(inputs, [-1]),
                                                       targets,
                                                       keep_prob,
                                                       batch_size,
                                                       sequence_length,
                                                       len(answerswords2int),
                                                       len(questionswords2int),
                                                       encoding_embedding_size,
                                                       decoding_embedding_size,
                                                       rnn_size,
                                                       num_layers,
                                                       questionswords2int)
C:\Users\Maniech\Anaconda3\lib\site-packages\tensorflow_core\python\client\session.py:1750: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).
warnings.warn('An interactive session is already active. This can '
Traceback (most recent call last):
File "<ipython-input-8-aecd893a8ef5>", line 37, in <module>
questionswords2int)
File "C:/Users/Maniech/Desktop/Deep NLP AZ/chatbot.py", line 292, in seq2seq_model
batch_size)
File "C:/Users/Maniech/Desktop/Deep NLP AZ/chatbot.py", line 258, in decoder_rnn
batch_size)
File "C:/Users/Maniech/Desktop/Deep NLP AZ/chatbot.py", line 201, in decode_training_set
attention_keys, attention_values, attention_score_function, attention_construct_function = tf.contrib.seq2seq.prepare_attention(attention_states, attention_option = "bahdanau", num_units = decoder_cell.output_size)
AttributeError: module 'tensorflow.contrib.seq2seq' has no attribute 'prepare_attention'
Any help is appreciated.
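One thing worth checking first: the warning in the output comes from a path containing tensorflow_core, which suggests the interpreter may be importing a newer TensorFlow than the 1.0.0 that was installed. A small diagnostic sketch (inspect_attribute is a name I made up) that reports which version and file are actually imported and whether the attribute is there:

```python
import importlib


def inspect_attribute(module_name, attr_name):
    """Report where a module is loaded from and whether it has an attribute."""
    module = importlib.import_module(module_name)
    top_level = importlib.import_module(module_name.split(".")[0])
    return {
        "version": getattr(top_level, "__version__", None),
        "file": getattr(module, "__file__", None),
        "has_attr": hasattr(module, attr_name),
    }


# Intended use in the failing environment (assumes the submodule imports cleanly):
# print(inspect_attribute("tensorflow.contrib.seq2seq", "prepare_attention"))
```

If the reported version is not the one you expected, the environment running chatbot.py is probably not the one that was upgraded.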

TPUEstimator error -- AttributeError: module 'tensorflow.contrib.tpu.python.ops.tpu_ops' has no attribute 'cross_replica_sum'

I have written TensorFlow code using the TPUEstimator, but I am having problems running it in use_tpu=False mode. I would like to run it on my local computer to make sure that all the operations are TPU-compatible. The code works fine with the normal Estimator. Here is my master code:
import logging
from tensorflow.contrib.tpu.python.tpu import tpu_config, tpu_estimator, tpu_optimizer
from tensorflow.contrib.cluster_resolver import TPUClusterResolver
from capser_7_model_fn import *
from capser_7_input_fn import *
import subprocess
from absl import flags
flags.DEFINE_bool(
    'use_tpu', False,
    'Use TPUs rather than plain CPUs')
tf.flags.DEFINE_string(
    "tpu", default='$TPU_NAME',
    help="The Cloud TPU to use for training. This should be either the name "
         "used when creating the Cloud TPU, or a grpc://ip.address.of.tpu:8470 "
         "url.")
tf.flags.DEFINE_string("model_dir", LOGDIR, "Estimator model_dir")
flags.DEFINE_integer(
    'save_checkpoints_secs', 1000,
    'Interval (in seconds) at which the model data '
    'should be checkpointed. Set to 0 to disable.')
flags.DEFINE_integer(
    'save_summary_steps', 100,
    'Number of steps which must have run before showing summaries.')
tf.flags.DEFINE_integer("iterations", 1000,
                        "Number of iterations per TPU training loop.")
tf.flags.DEFINE_integer("num_shards", 8, "Number of shards (TPU chips).")
tf.flags.DEFINE_integer("batch_size", 1024,
                        "Mini-batch size for the training. Note that this "
                        "is the global batch size and not the per-shard batch.")
FLAGS = tf.flags.FLAGS
if FLAGS.use_tpu:
    my_project_name = subprocess.check_output(['gcloud', 'config', 'get-value', 'project'])
    my_zone = subprocess.check_output(['gcloud', 'config', 'get-value', 'compute/zone'])
    cluster_resolver = TPUClusterResolver(
        tpu=[FLAGS.tpu],
        zone=my_zone,
        project=my_project_name)
    master = TPUClusterResolver(tpu=[os.environ['TPU_NAME']]).get_master()
else:
    master = ''
my_tpu_run_config = tpu_config.RunConfig(
    master=master,
    model_dir=FLAGS.model_dir,
    save_checkpoints_secs=FLAGS.save_checkpoints_secs,
    save_summary_steps=FLAGS.save_summary_steps,
    session_config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True),
    tpu_config=tpu_config.TPUConfig(iterations_per_loop=FLAGS.iterations, num_shards=FLAGS.num_shards),
)
# create estimator for model (the model is described in capser_7_model_fn)
capser = tpu_estimator.TPUEstimator(model_fn=model_fn_tpu,
                                    config=my_tpu_run_config,
                                    use_tpu=FLAGS.use_tpu,
                                    train_batch_size=batch_size,
                                    params={'model_batch_size': batch_size_per_shard})
# train model
logging.getLogger().setLevel(logging.INFO)  # to show info about training progress
capser.train(input_fn=train_input_fn_tpu, steps=n_steps)
I have a capsule network defined in model_fn_tpu, which returns the TPUEstimator spec. The optimizer is a standard AdamOptimizer. I have made all the changes explained here https://www.tensorflow.org/guide/using_tpu#optimizer to make my code compatible with TPUEstimator. I get the following error:
Traceback (most recent call last):
File "C:/Users/doerig/PycharmProjects/capser/TPU_playground.py", line 85, in <module>
capser.train(input_fn=train_input_fn_tpu, steps=n_steps)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 363, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 843, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 856, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 831, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\contrib\tpu\python\tpu\tpu_estimator.py", line 2016, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\contrib\tpu\python\tpu\tpu_estimator.py", line 1121, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\contrib\tpu\python\tpu\tpu_estimator.py", line 1317, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "C:\Users\doerig\PycharmProjects\capser\capser_7_model_fn.py", line 101, in model_fn_tpu
**output_decoder_deconv_params)
File "C:\Users\doerig\PycharmProjects\capser\capser_model.py", line 341, in capser_model
loss_training_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step(), name="training_op")
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\training\optimizer.py", line 424, in minimize
name=name)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\contrib\tpu\python\tpu\tpu_optimizer.py", line 113, in apply_gradients
summed_grads_and_vars.append((tpu_ops.cross_replica_sum(grad), var))
AttributeError: module 'tensorflow.contrib.tpu.python.ops.tpu_ops' has no attribute 'cross_replica_sum'
Any ideas to solve this problem? Thank you in advance!
I suspect this is either a bug in the version of TensorFlow you are using + Windows, or else an issue with your build of TensorFlow.
For example, when I chase down the file tensorflow\contrib\tpu\python\tpu\tpu_optimizer.py in the TF 1.4 branch, I see that tpu_ops is imported as:
from tensorflow.contrib.tpu.python.ops import tpu_ops
and if you chase that to the relevant file, you see:
if platform.system() != "Windows":
    # pylint: disable=wildcard-import,unused-import,g-import-not-at-top
    from tensorflow.contrib.tpu.ops.gen_tpu_ops import *
    from tensorflow.contrib.util import loader
    from tensorflow.python.platform import resource_loader
    # pylint: enable=wildcard-import,unused-import,g-import-not-at-top
    _tpu_ops = loader.load_op_library(
        resource_loader.get_path_to_datafile("_tpu_ops.so"))
else:
    # We have already built the appropriate libraries into the binary via CMake
    # if we have built contrib, so we don't need this
    pass
Following up with the other TF branches that existed at the time of this posting, we see similar comments in 1.5, in 1.6, in 1.7, in 1.8, and in 1.9.
I strongly suspect this would not occur under Linux, but I might test this later and edit this answer.
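Note also that the traceback goes through tpu_optimizer.py's apply_gradients, so the optimizer is being wrapped in the cross-shard optimizer even though use_tpu is False. A possible workaround, sketched below, is to wrap only when actually running on a TPU. maybe_cross_shard is a hypothetical helper of mine, and passing tf.contrib.tpu.CrossShardOptimizer as the wrapper assumes TF 1.x.

```python
def maybe_cross_shard(optimizer, use_tpu, wrapper):
    """Return wrapper(optimizer) on TPU, and the plain optimizer otherwise.

    On CPU the wrapper is never called, so tpu_ops.cross_replica_sum is not
    touched when use_tpu is False.
    """
    if not use_tpu:
        return optimizer
    return wrapper(optimizer)


# Intended use inside model_fn (TF 1.x, names from the question):
# optimizer = tf.train.AdamOptimizer(learning_rate)
# optimizer = maybe_cross_shard(optimizer, FLAGS.use_tpu,
#                               tf.contrib.tpu.CrossShardOptimizer)
```

This does not make the missing op appear on Windows; it only keeps the local use_tpu=False run from reaching it.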

How to use tf.train.Saver in SessionRunHook?

I have trained many sub-models, each of which is a part of the final model. I now want to use those pretrained sub-models to initialize the final model's parameters. I am trying to use a SessionRunHook to load the parameters from the other checkpoint files into the final model.
I tried the following code, but it failed. Any advice would be appreciated. Thanks!
The error info is:
Traceback (most recent call last):
File "train_high_api_local.py", line 282, in <module>
tf.app.run()
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
_sys.exit(main(argv))
File "train_high_api_local.py", line 266, in main
clf_.train(input_fn=lambda: read_file([tables[0]], epochs_per_eval), steps=None, hooks=[hook_test]) # input yield: x, y
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 314, in train
.......
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 674, in create_session
hook.after_create_session(self.tf_sess, self.coord)
File "train_high_api_local.py", line 102, in after_create_session
saver = tf.train.Saver([ti]) # TODO: ERROR INFO: Graph is finalized and cannot be modified.
.......
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3135, in create_op
self._check_not_finalized()
File "/Users/zhouliaoming/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2788, in _check_not_finalized
raise RuntimeError("Graph is finalized and cannot be modified.")
RuntimeError: Graph is finalized and cannot be modified.
and the code detail is:
class SetTensor(session_run_hook.SessionRunHook):
    """ like tf.train.LoggingTensorHook """
    def after_create_session(self, session, coord):
        """ Called when new TensorFlow session is created: graph is finalized and ops can no longer be added. """
        graph = tf.get_default_graph()
        ti = graph.get_tensor_by_name("h_1_15/bias:0")
        with session.as_default():
            with tf.name_scope("rewrite"):
                saver = tf.train.Saver([ti]) # TODO: ERROR INFO: Graph is finalized and cannot be modified.
                saver.restore(session, "/Users/zhouliaoming/data/credit_dnn/model_retrain/rm_gene_v2_sall/model.ckpt-2102")
        pass

def main(unused_argv):
    """ train """
    norm_all_func = lambda x: tf.cond(x>1, lambda: tf.log(x), lambda: tf.identity(x))
    feature_columns=[[tf.feature_column.numeric_column(COLUMNS[i], shape=fi, normalizer_fn=lambda x: tf.py_func(weight_norm2, [x], tf.float32) )] for i, fi in enumerate(FEA_DIM)] # normlized: running OK!
    ## use self-defined model
    param = {"learning_rate": 0.0001, "feature_columns": feature_columns, "isanalysis": FLAGS.isanalysis, "isall": False}
    clf_ = tf.estimator.Estimator(model_fn=model_fn_wide2deep, params=param, model_dir=ckpt_dir)
    hook_test = SetTensor(["h_1_15/bias", "h_1_15/kernel"])
    epochs_per_eval = 1
    for n in range(int(FLAGS.num_epochs/epochs_per_eval)):
        # train num_epochs
        clf_.train(input_fn=lambda: read_file([tables[0]], epochs_per_eval), steps=None, hooks=[hook_test]) # input yield: x, y
SessionRunHook is not meant for this use case. As the error says, you cannot change the graph once sess.run() has been invoked.
You can assign variables using saver.restore() in your "normal code". You don't have to be inside any hooks.
Also, if you want to restore many variables and can match them to their names and shapes in a checkpoint, you might want to take a look at https://gist.github.com/iganichev/d2d8a0b1abc6b15d4a07de83171163d4. It shows some example code to restore a subset of variables.
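You can do this:
class SaveAtEnd(tf.train.SessionRunHook):
    def __init__(self, save_path):
        self._save_path = save_path  # where the checkpoint will be written

    def begin(self):
        # The graph can still be modified here, so create your saver now.
        self._saver = tf.train.Saver()

    def end(self, session):
        self._saver.save(session, self._save_path)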
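The same begin() trick can be turned around for restoring, which is what the question asks for. The sketch below is untested against the asker's model and assumes TF 1.x. make_restore_hook and select_variables are names I made up, and variables are matched purely by name, so the checkpoint must contain variables with the same names and shapes.

```python
def select_variables(variables, wanted_names):
    """Pick variables whose name (without the ':0' suffix) is in wanted_names."""
    wanted = set(wanted_names)
    return [v for v in variables if v.name.split(":")[0] in wanted]


def make_restore_hook(checkpoint_path, wanted_names):
    """Build a hook that restores a subset of variables from a checkpoint."""
    # TF 1.x API; imported here so select_variables stays usable without TF.
    import tensorflow as tf

    class RestoreSubsetHook(tf.train.SessionRunHook):
        def begin(self):
            # begin() runs before the graph is finalized, so the Saver
            # (and its restore ops) can still be added to the graph here.
            variables = select_variables(tf.global_variables(), wanted_names)
            self._saver = tf.train.Saver(var_list=variables)

        def after_create_session(self, session, coord):
            # Only runs ops created in begin(); nothing new is added.
            self._saver.restore(session, checkpoint_path)

    return RestoreSubsetHook()
```

Usage would look like `hooks=[make_restore_hook(ckpt_path, ["h_1_15/bias", "h_1_15/kernel"])]`. Keep in mind the restore runs every time a session is created, so in a loop of train() calls it will overwrite those variables on each call.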

Multiple outputs in Keras gives value error

I am implementing a modification of the U-net for semantic segmentation.
I have two outputs from the network :
model = Model(input=inputs, output= [conv10, dense3])
model.compile(optimizer=Adam(lr=1e-5), loss=common_loss, metrics=[common_loss])
where common loss is defined as :
def common_loss(y_true, y_pred):
    segmentation_loss = categorical_crossentropy(y_true[0], y_pred[0])
    classifiction_loss = categorical_crossentropy(y_true[1], y_pred[1])
    return segmentation_loss + alpha * classifiction_loss
When I run this I get a ValueError:
File "y-net.py", line 138, in <module>
train_and_predict()
File "y-net.py", line 133, in train_and_predict
callbacks=[model_checkpoint], validation_data=(X_val, [y_img_val, y_class_val]))
File "/home/gpu_users/meetshah/miniconda2/envs/check/lib/python2.7/site-packages/keras/engine/training.py", line 1124, in fit
callback_metrics=callback_metrics)
File "/home/gpu_users/meetshah/miniconda2/envs/check/lib/python2.7/site-packages/keras/engine/training.py", line 848, in _fit_loop
callbacks.on_batch_end(batch_index, batch_logs)
File "/home/gpu_users/meetshah/miniconda2/envs/check/lib/python2.7/site-packages/keras/callbacks.py", line 63, in on_batch_end
callback.on_batch_end(batch, logs)
File "/home/gpu_users/meetshah/miniconda2/envs/check/lib/python2.7/site-packages/keras/callbacks.py", line 191, in on_batch_end
self.progbar.update(self.seen, self.log_values)
File "/home/gpu_users/meetshah/miniconda2/envs/check/lib/python2.7/site-packages/keras/utils/generic_utils.py", line 147, in update
if abs(avg) > 1e-3:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
My implementation and the entire trace can be found here :
https://gist.github.com/meetshah1995/19d54270e8d1b20f814e6c1495facc6a
You can see how to implement multiple metrics with multiple outputs here: https://github.com/EdwardTyantov/ultrasound-nerve-segmentation/blob/master/u_model.py.
model.compile(optimizer=optimizer,
              loss={'main_output': dice_coef_loss, 'aux_output': 'binary_crossentropy'},
              metrics={'main_output': dice_coef, 'aux_output': 'acc'},
              loss_weights={'main_output': 1., 'aux_output': 0.5})
I am not sure whether combined output metrics are supported yet.
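For context on why the per-output form works: as far as I know, Keras calls the loss function once per output, so inside common_loss the expressions y_true[0] and y_true[1] pick the first two samples of a batch, not the two outputs. With per-output losses and loss_weights, the total is a weighted sum of the individual losses. A small NumPy sketch of that arithmetic (the function names and the example numbers are mine; alpha is taken as 0.5 purely for illustration):

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch of one-hot targets."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))


def combined_loss(per_output_losses, loss_weights):
    """Weighted sum of per-output losses, mirroring loss_weights."""
    return float(sum(w * l for w, l in zip(loss_weights, per_output_losses)))


# Rough Keras equivalent for the model in the question (not run here):
# model.compile(optimizer=Adam(lr=1e-5),
#               loss=['categorical_crossentropy', 'categorical_crossentropy'],
#               loss_weights=[1.0, alpha])

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.full((2, 2), 0.5)
segmentation_loss = categorical_crossentropy(y_true, y_pred)
classification_loss = categorical_crossentropy(y_true, y_pred)
total = combined_loss([segmentation_loss, classification_loss], [1.0, 0.5])
```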