'MemoryError' when padding sequences using tensorflow - tensorflow

I am trying to train my model on an AWS 'g2.2xlarge' instance, but I get a 'MemoryError' when adding padding to my sequences:
content_array = keras.preprocessing.sequence.pad_sequences(
    content_array, maxlen=max_sequence_length, padding='post')
Getting this error:
Traceback (most recent call last):
File "trainer.py", line 185, in <module>
train()
File "trainer.py", line 52, in train
padding='post')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/preprocessing/sequence.py", line 94, in pad_sequences
x = (np.ones((num_samples, maxlen) + sample_shape) * value).astype(dtype)
MemoryError
Any idea why? I haven't even started training the model.

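I was calculating the maximum sequence length incorrectly, which led to a huge number. pad_sequences allocates a dense array of num_samples x maxlen elements, so an oversized maxlen exhausts memory before training starts. After correcting the calculation I no longer have any issues.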
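To see how quickly a wrong maxlen blows up, here is a rough back-of-the-envelope sketch. It assumes 8 bytes per element, which matches the float64 array that np.ones creates by default in the traceback line above; the helper names and the toy numbers are made up for illustration and are not from the original post.

```python
def padded_array_bytes(num_samples, maxlen, bytes_per_element=8):
    """Bytes needed for one dense (num_samples, maxlen) array.

    8 bytes per element is an assumption matching a float64 array.
    """
    return num_samples * maxlen * bytes_per_element


def longest_sequence(sequences):
    """Derive maxlen from the data itself instead of computing it by hand."""
    return max(len(seq) for seq in sequences)


# Hypothetical toy data standing in for content_array.
sequences = [[4, 8, 15], [16, 23], [42, 7, 1, 9]]
max_sequence_length = longest_sequence(sequences)

reasonable = padded_array_bytes(len(sequences), max_sequence_length)
# A miscalculated length (made-up value) inflates the allocation.
inflated = padded_array_bytes(100000, 5000000)

print(max_sequence_length)
print(reasonable, "bytes")
print(inflated / 1024.0 ** 3, "GiB")
```

Printing the estimate before calling pad_sequences is a cheap sanity check: if the number is far beyond the RAM of the instance, the length calculation is the first thing to look at.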

Related

Can't save YOLOv4 model because of array shape mismatch

I am able to run transfer learning on YOLOv4 and my custom dataset with the following command (which runs successfully and can identify test images I present to the model):
!./darknet detector train /content/darknet/build/darknet/x64/data/obj.data /content/darknet/build/darknet/x64/cfg/yolov4_train.cfg /content/darknet/build/darknet/x64/yolov4.conv.137 -dont_show
I am using the save_model.py tool from this GitHub repository:
!git clone https://github.com/hunglc007/tensorflow-yolov4-tflite
When I enter the following command to save the model it fails:
!python3 save_model.py --weights /content/darknet/build/darknet/x64/backup/yolov4_train_final.weights --output ./checkpoints/yolov4-224 --input_size 224
The failure is a mismatch between the weights saved in training and the expected array shape in the core/utility module utils.py (line 63):
Traceback (most recent call last):
File "save_model.py", line 58, in <module>
app.run(main)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "save_model.py", line 54, in main
save_tf()
File "save_model.py", line 49, in save_tf
utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
File "/content/tensorflow-yolov4-tflite/core/utils.py", line 65, in load_weights
conv_weights = conv_weights.reshape(conv_shape).transpose([2, 3, 1, 0])
ValueError: cannot reshape array of size 4554552 into shape (1024,512,3,3)
I added a debug print, and it looks like it gets all the way to the last layer before choking. In other words, all the previous layers get through this line of code in utils.py with a match between the saved weights and the array shape. I think this is somehow related to the fact that I'm using image sizes of 224,224,3 instead of 416,416,3, but I did specify that in input_size. For completeness, here are the last couple of debug prints before the Traceback above:
layer (out_dim, in_dim, height, width) 107 512 1024 1 1
layer (out_dim, in_dim, height, width) 108 1024 512 3 3
If anyone has any ideas, that would be great!
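One way to read the ValueError is to compare how many values the reshape needs with how many were actually read. The sketch below only does that arithmetic with the two numbers from the error message; the helper name is made up. The shortfall suggests the loader reached the end of the .weights file early, which would happen if the layer layout it assumes differs from the network that was trained. That cause is an assumption, not something the post confirms. Note that convolution kernel shapes do not depend on the input image size, so 224 versus 416 is unlikely to explain the mismatch by itself.

```python
from functools import reduce
from operator import mul


def element_count(shape):
    """Number of values an array of the given shape holds."""
    return reduce(mul, shape, 1)


conv_shape = (1024, 512, 3, 3)  # target shape from the ValueError
available = 4554552             # array size from the ValueError

needed = element_count(conv_shape)
shortfall = needed - available

print("needed:   ", needed)
print("available:", available)
print("shortfall:", shortfall)
```

A non-zero shortfall at the last large layer means earlier layers consumed more values than the file was written with, so the place to look is the model definition and class count the loader uses, not the layer where the exception is raised.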

Camera Digit Prediction stopped working after moving to python 3.7, anyone know why?

I moved my code from an interpreter running Python 3.9 and tensorflow to one running Python 3.7 and tensorflow-directml (so I could use my AMD GPU). The training part worked fine after I copied the code over, but when running the model I suddenly get an error complaining about the size of the input array to my neural network. The error does not occur with the first interpreter but does with the second, even though the code is identical.
(The shape of the digit array is the same in both versions: (1, 28, 28), a binary image.)
def cam_predict_digits(cam):
    dig = np.zeros((1, 28, 28))
    dig[0, :, :] = np.array(cam)
    digit = np.array(dig)
    print("predict input shape: " + str(digit.shape))
    # Make prediction
    prediction = model.predict(digit)
    print(prediction)
    print(f'Detected is probably: {np.argmax(prediction)}')
Traceback (most recent call last):
File "C:/Z_Uni/Individual_Project/Python_Projects/NeuralNet_GPU/Conv_NN_GPU_Model.py", line 123, in <module>
cam_predict_digits(Processed_Frame)
File "C:/Z_Uni/Individual_Project/Python_Projects/NeuralNet_GPU/Conv_NN_GPU_Model.py", line 74, in cam_predict_digits
prediction = model.predict(digit)
File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet_GPU\source\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 908, in predict
use_multiprocessing=use_multiprocessing)
File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet_GPU\source\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 716, in predict
x, check_steps=True, steps_name='steps', steps=steps)
File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet_GPU\source\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 2471, in _standardize_user_data
exception_prefix='input')
File "C:\Z_Uni\Individual_Project\Python_Projects\NeuralNet_GPU\source\lib\site-packages\tensorflow_core\python\keras\engine\training_utils.py", line 563, in standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking input: expected conv2d_input to have 4 dimensions, but got array with shape (1, 28, 28)
Process finished with exit code 1
Could anyone explain why this is happening and what I can do to fix it? Thanks
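The error message says the first Conv2D layer wants a 4-dimensional input (batch, height, width, channels), while the array passed in has only three. Below is a minimal sketch of adding the missing channel axis. It assumes the model was trained on single-channel, channels-last images of shape (28, 28, 1); the helper name to_conv2d_batch is made up for illustration, and the zero array is only a stand-in for the processed camera frame.

```python
import numpy as np


def to_conv2d_batch(image_2d):
    """Turn one (H, W) image into a (1, H, W, 1) channels-last batch."""
    image = np.asarray(image_2d, dtype=np.float32)
    if image.ndim != 2:
        raise ValueError("expected a 2-D image, got shape %s" % (image.shape,))
    # Add the batch axis in front and the channel axis at the end.
    return image[np.newaxis, :, :, np.newaxis]


cam = np.zeros((28, 28))      # stand-in for the processed camera frame
digit = to_conv2d_batch(cam)
print("predict input shape:", digit.shape)
```

With an array of this shape, model.predict(digit) passes the dimension check that raised the ValueError above, provided the assumption about the training input shape holds.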

How to convert YOLOv4-CSP darknet weight to Tensorflow format?

How to convert YOLOv4-CSP darknet weights to Tensorflow (tf) format?
I have tried using this repo but it didn't work.
I had this error message:
Traceback (most recent call last):
File "save_model.py", line 58, in <module>
app.run(main)
File "C:\Python37\lib\site-packages\absl\app.py", line 303, in run
_run_main(main, args)
File "C:\Python37\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "save_model.py", line 54, in main
save_tf()
File "save_model.py", line 49, in save_tf
utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
File "D:\swap\20210319\tensorflow-yolov4-tflite\core\utils.py", line 63, in load_weights
conv_weights = conv_weights.reshape(conv_shape).transpose([2, 3, 1, 0])
ValueError: cannot reshape array of size 3791890 into shape (1024,512,3,3)
The repository that you are using doesn't support conversion of Scaled-YOLOv4 or YOLOv4-CSP yet. It's still a feature request according to this issue.
Luckily, there is a workaround. I found this repository that does the same thing; the only difference is that it converts the model to .h5 (Keras format) before converting it to TensorFlow format. It also supports YOLOv4-CSP.
I made a Google Colab notebook that does the conversion, which can be found here.

TF object detection API - Compute evaluation measures failed

I successfully trained a model on my own dataset, exported the inference graph and did the inference on my test dataset.
I now have the detections as a TFRecord file, specified in the input config, and an eval_config file with the metrics set specified.
When I try to compute the measures as in the new object detector inference and evaluation measure computation tutorial with
python object_detection/metrics/offline_eval_map_corloc.py --eval_dir=/media/sf_shared --eval_config_path=/media/sf_shared/eval_config.pbtxt --input_config_path=/media/sf_shared/input_config.pbtxt
it returns this AttributeError:
INFO:tensorflow:Processing file: /media/sf_shared/detections.record
INFO:tensorflow:Processed 0 images...
Traceback (most recent call last):
File "object_detection/metrics/offline_eval_map_corloc.py", line 173, in <module>
tf.app.run(main)
File "/home/chrza/anaconda2/envs/tf27/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/metrics/offline_eval_map_corloc.py", line 166, in main
metrics = read_data_and_evaluate(input_config, eval_config)
File "object_detection/metrics/offline_eval_map_corloc.py", line 124, in read_data_and_evaluate
decoded_dict)
File "/home/chrza/anaconda2/envs/tf27/lib/python2.7/site-packages/tensorflow/models/research/object_detection/utils/object_detection_evaluation.py", line 174, in add_single_ground_truth_image_info
(groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult]
AttributeError: 'NoneType' object has no attribute 'size'
Any hints?
I fixed it (temporarily) as follows:
# Compare against None explicitly: the truth value of a multi-element NumPy array is ambiguous.
if (standard_fields.InputDataFields.groundtruth_difficult in groundtruth_dict
    and groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult] is not None):
  if groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult].size or not groundtruth_classes.size:
    groundtruth_difficult = groundtruth_dict[standard_fields.InputDataFields.groundtruth_difficult]
This replaces the existing lines (195-198) in
object_detection/utils/object_detection_evaluation.py
The error occurs because the size of the groundtruth_difficult entry is checked even when no difficulty flag was passed, in which case the entry is None.
This fails if you skipped that field in your TFRecords.
Perhaps this was the intent of the developers, but the clarity of the documentation certainly leaves a lot to be desired.

tensorflow: ValueError: GraphDef cannot be larger than 2GB

This is the error I got:
Traceback (most recent call last):
File "fully_connected_feed.py", line 387, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/home/-/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "fully_connected_feed.py", line 289, in main
run_training()
File "fully_connected_feed.py", line 256, in run_training
saver.save(sess, checkpoint_file, global_step=step)
File "/home/-/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1386, in save
self.export_meta_graph(meta_graph_filename)
File "/home/-/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1414, in export_meta_graph
graph_def=ops.get_default_graph().as_graph_def(add_shapes=True),
File "/home/-/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2257, in as_graph_def
result, _ = self._as_graph_def(from_version, add_shapes)
File "/home/-/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2220, in _as_graph_def
raise ValueError("GraphDef cannot be larger than 2GB.")
ValueError: GraphDef cannot be larger than 2GB.
I believe it is caused by this code:
weights = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="hidden1")[0]
weights = tf.scatter_nd_update(weights,indices, updates)
weights = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="hidden2")[0]
weights = tf.scatter_nd_update(weights,indices, updates)
I am not sure why my model is getting so big (15k steps and 240MB). Any thoughts? Thanks!
It's hard to say what is happening without seeing the code, but in general TensorFlow model sizes do not increase with the number of steps; they should be fixed.
If the model size is increasing with number of steps, it suggests that the computation graph is being added to on every step. For example, something like:
import tensorflow as tf
with tf.Session() as sess:
    for i in range(1000):
        sess.run(tf.add(1, 2))
        # or perhaps sess.run(tf.scatter_nd_update(...)) in your case
will create 3000 nodes in the graph (one for add, one for '1', and one for '2' on every iteration). Instead, you want to define your computational graph once and run it repeatedly, with something like:
import tensorflow as tf
x = tf.add(1, 2)
# or perhaps x = tf.scatter_nd_update(...) in your case
with tf.Session() as sess:
    for i in range(1000):
        sess.run(x)
This has a fixed graph of 3 nodes for all 1000 iterations (and any more). Hope that helps.
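To make the node counting concrete without needing TensorFlow installed, here is a toy model of the two patterns. ToyGraph is a hypothetical stand-in that only records nodes; it is not TensorFlow's API, and the one-add-plus-two-constants accounting is taken from the explanation above.

```python
class ToyGraph:
    """Hypothetical stand-in for a TF1 graph: it only records nodes."""

    def __init__(self):
        self.nodes = []

    def constant(self, value):
        self.nodes.append(("const", value))
        return value

    def add(self, a, b):
        # Mirrors tf.add(1, 2): two constant nodes plus one add node.
        x = self.constant(a)
        y = self.constant(b)
        self.nodes.append(("add", x, y))
        return lambda: x + y  # calling this "runs" the op


def build_inside_loop(steps):
    graph = ToyGraph()
    for _ in range(steps):
        op = graph.add(1, 2)  # new nodes are created on every iteration
        op()
    return len(graph.nodes)


def build_once(steps):
    graph = ToyGraph()
    op = graph.add(1, 2)      # nodes are created a single time
    for _ in range(steps):
        op()
    return len(graph.nodes)


print("built inside loop:", build_inside_loop(1000))
print("built once:       ", build_once(1000))
```

In real TF1 code, one way to catch the first pattern early is to call tf.get_default_graph().finalize() once the graph is built; a finalized graph is read-only, so any later attempt to add an op raises an error instead of silently growing the graph.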