Alphapose running into error RuntimeError: shape '[32, 3, 3, 3]' is invalid for input of size 255 - google-colaboratory

I try the below code:
https://colab.research.google.com/github/tugstugi/dl-colab-notebooks/blob/master/notebooks/AlphaPose.ipynb#scrollTo=tacNMM2Ue2Rt
but it wont work when I run it. It shows
Traceback (most recent call last):
File "demo.py", line 50, in <module>
det_loader = DetectionLoader(data_loader, batchSize=args.detbatch).start()
File "/content/AlphaPose/dataloader.py", line 277, in __init__
self.det_model.load_weights('models/yolo/yolov3-spp.weights')
File "/content/AlphaPose/yolo/darknet.py", line 488, in load_weights
conv_weights = conv_weights.view_as(conv.weight.data)
RuntimeError: shape '[32, 3, 3, 3]' is invalid for input of size 255

Related

tf.image.ssim() not accepting 'return_index_map' argument

The documentation for Tensorflow's tf.image.ssim() says that if you want to return an element-wise structural similarity map, you can set "return_index_map" to True. It is returning an error when I define the argument as either True or False. I am running Tensorflow 2.10 on an Apple M1 Max.
When I run:
import tensorflow as tf
a = tf.ones((20, 20, 20, 20))
b = tf.zeros((20, 20, 20, 20))
sim = tf.image.ssim(a, b, max_val=1, return_index_map=True)
I get the error:
Traceback (most recent call last):
File "/Users/miguel/opt/miniconda3/envs/py39_tensorflow/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3433, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-4-967be677c86c>", line 1, in <module>
sim = tf.image.ssim(a, b, max_val=1, return_index_map=True)
File "/Users/miguel/opt/miniconda3/envs/py39_tensorflow/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Users/miguel/opt/miniconda3/envs/py39_tensorflow/lib/python3.9/site-packages/tensorflow/python/util/dispatch.py", line 1170, in op_dispatch_handler
result = api_dispatcher.Dispatch(args, kwargs)
TypeError: Got an unexpected keyword argument 'return_index_map'

Tensorflow restore() function throwing incompatible shape error

I am doing tensorflow source code modification, wherein I am modigying the call() function of the source code. Interestingly, the checkpoint restore() is giving the error of mismatch in shape.
But from my understanding the flow of code is: build()->restore()->call().
Although my change in tensorflow code is expected to alter shape in call phase, is it expected that restoring checkpoint which happens before it will give shape mismatch error? Or, there must be another reason for this error?
Traceback (most recent call last):
File "run_classifier.py", line 549, in <module>
app.run(main)
File "/home/arpitj/projects/lib/python3.8/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/home/arpitj/projects/lib/python3.8/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "run_classifier.py", line 542, in main
custom_main(custom_callbacks=None, custom_metrics=None)
File "run_classifier.py", line 485, in custom_main
checkpoint.restore(
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 2354, in restore
status = self.read(save_path, options=options)
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 2229, in read
result = self._saver.restore(save_path=save_path, options=options)
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 1366, in restore
base.CheckpointPosition(
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 254, in restore
restore_ops = trackable._restore_from_checkpoint_position(self) # pylint: disable=protected-access
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 983, in _restore_from_checkpoint_position
current_position.checkpoint.restore_saveables(
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 329, in restore_saveables
new_restore_ops = functional_saver.MultiDeviceSaver(
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/saving/functional_saver.py", line 339, in restore
restore_ops = restore_fn()
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/saving/functional_saver.py", line 323, in restore_fn
restore_ops.update(saver.restore(file_prefix, options))
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/saving/functional_saver.py", line 115, in restore
restore_ops[saveable.name] = saveable.restore(
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 133, in restore
return resource_variable_ops.shape_safe_assign_variable_handle(
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 309, in shape_safe_assign_variable_handle
shape.assert_is_compatible_with(value_tensor.shape)
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/framework/tensor_shape.py", line 1171, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (10, 64, 768) and (12, 64, 768) are incompatible

tensorflow.python.framework.errors_impl.NotFoundError while running deep learning model on Google Colaboratory

I'm trying to run on cloud this deep learning model:
https://github.com/razvanmarinescu/brgm#image-reconstruction-with-pre-trained-stylegan2-generators
What I do is simply utilizing their Colab Notebook: https://colab.research.google.com/drive/1G7_CGPHZVGFWIkHOAke4HFg06-tNHIZ4?usp=sharing#scrollTo=qMgE6QFiHuSL
When I try to exectute:
!python recon.py recon-real-images --input=/content/drive/MyDrive/boeing/EDGEconnect/val_imgs --masks=/content/drive/MyDrive/boeing/EDGEconnect/val_masks --tag=brains --network=dropbox:brains.pkl --recontype=inpaint --num-steps=1000 --num-snapshots=1
I receive this error:
args: Namespace(command='recon-real-images', input='/content/drive/MyDrive/boeing/EDGEconnect/val_imgs', masks='/content/drive/MyDrive/boeing/EDGEconnect/val_masks', network_pkl='dropbox:brains.pkl', num_snapshots=1, num_steps=1000, recontype='inpaint', superres_factor=4, tag='brains')
Local submit - run_dir: results/00004-brains-inpaint
dnnlib: Running recon.recon_real_images() on localhost...
Processing image 1/4
Loading networks from "dropbox:brains.pkl"...
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Failed!
Traceback (most recent call last):
File "recon.py", line 270, in <module>
main()
File "recon.py", line 263, in main
dnnlib.submit_run(sc, func_name_map[subcmd], **kwargs)
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/submission/submit.py", line 343, in submit_run
return farm.submit(submit_config, host_run_dir)
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/submission/internal/local.py", line 22, in submit
return run_wrapper(submit_config)
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/submission/submit.py", line 280, in run_wrapper
run_func_obj(**submit_config.run_func_kwargs)
File "/content/drive/MyDrive/boeing/brgm/brgm/recon.py", line 189, in recon_real_images
recon_real_one_img(network_pkl, img_list[image_idx], masks, num_snapshots, recontype, superres_factor, num_steps)
File "/content/drive/MyDrive/boeing/brgm/brgm/recon.py", line 132, in recon_real_one_img
_G, _D, Gs = pretrained_networks.load_networks(network_pkl)
File "/content/drive/MyDrive/boeing/brgm/brgm/pretrained_networks.py", line 83, in load_networks
G, D, Gs = pickle.load(stream, encoding='latin1')
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/network.py", line 297, in __setstate__
self._init_graph()
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/network.py", line 154, in _init_graph
out_expr = self._build_func(*self.input_templates, **build_kwargs)
File "<string>", line 395, in G_synthesis_stylegan2
File "<string>", line 359, in layer
File "<string>", line 106, in modulated_conv2d_layer
File "<string>", line 75, in apply_bias_act
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/ops/fused_bias_act.py", line 68, in fused_bias_act
return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain)
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/ops/fused_bias_act.py", line 122, in _fused_bias_act_cuda
cuda_kernel = _get_plugin().fused_bias_act
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/ops/fused_bias_act.py", line 16, in _get_plugin
return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/custom_ops.py", line 156, in get_plugin
plugin = tf.load_op_library(bin_file)
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/load_library.py", line 61, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/_cudacache/fused_bias_act_237d55aca3e3c3ec0547da06888d8e66.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
I found that the very last part of an error:
tensorflow.python.framework.errors_impl.NotFoundError: /content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/_cudacache/fused_bias_act_237d55aca3e3c3ec0547da06888d8e66.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Can be solved by changing a flag in Cuda Makefile: https://github.com/mgharbi/hdrnet_legacy/issues/2 or by installing tf 1.14(colab runs on 1.15.2 and this change made no positive effect).
My question is, how can I get rid of this error, is there an option to change smth inside Google Colab's Cuda Makefile?

Use my word_counts.txt to generate image captions

Use my word_counts.txt to generate image captions.
Here's my error code:
Traceback (most recent call last):
File "/root/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 85, in
tf.app.run ()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit (main (_sys.argv [: 1] + flags_passthrough))
File "/root/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 55, in main
vocab = vocabulary.Vocabulary (FLAGS.vocab_file)
File "/root/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/vocabulary.py", line 49, in init
reverse_vocab = [line.split () [0] for line in reverse_vocab]
IndexError: list index out of range
Can someone help me?

Error while testing Tensorflow Object Detection

I've tried all steps of object_detection model installation mentioned #
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md
While testing the installation as mentioned in last step of article, I am getting below error.
ERROR: test_create_ssd_mobilenet_v1_model_from_config (__main__.ModelBuilderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder_test.py", line 193, in test_create_ssd_mobilenet_v1_model_from_config
model = self.create_model(model_proto)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder_test.py", line 53, in create_model
return model_builder.build(model_config, is_training=False)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder.py", line 73, in build
return _build_ssd_model(model_config.ssd, is_training)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder.py", line 126, in _build_ssd_model
is_training)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder.py", line 98, in _build_ssd_feature_extractor
feature_extractor_config.conv_hyperparams, is_training)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/hyperparams_builder.py", line 70, in build
hyperparams_config.regularizer),
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/hyperparams_builder.py", line 119, in _build_regularizer
return slim.l2_regularizer(scale=regularizer.l2_regularizer.weight)
File "/Users/RonakBhavsar/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/regularizers.py", line 92, in l2_regularizer
raise ValueError('scale cannot be an integer: %s' % (scale,))
ValueError: scale cannot be an integer: 1
I get this error for all the models mentioned in the test script. Anyone has any ideas?
We have a pull request out that should fix this issue. Please give that a try.