Required libraries work only from the second run on Colab when using TF2 for Mask R-CNN - GPU

Could you please help me with the following problem: I am using TF2 for Mask R-CNN. The first cell gives the error below; I set it up following the instructions here.
First cell:
!apt-get update
!pip3 install scikit-image==0.16.2
!pip3 install opencv-python
!pip3 install tensorflow==2.2.0
!pip3 install keras==2.3.1
ERROR
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
xarray-einstats 0.4.0 requires scipy>=1.6, but you have scipy 1.4.1 which is incompatible.
plotnine 0.8.0 requires scipy>=1.5.0, but you have scipy 1.4.1 which is incompatible.
jaxlib 0.3.25+cuda11.cudnn805 requires scipy>=1.5, but you have scipy 1.4.1 which is incompatible.
jax 0.3.25 requires scipy>=1.5, but you have scipy 1.4.1 which is incompatible.
google-api-core 2.11.0 requires google-auth<3.0dev,>=2.14.1, but you have google-auth 1.35.0 which is incompatible.
Then I run the same cell again without changing anything and it works without error. I can't understand why it fails the first time but works the second.
By the way, I tried installing the necessary packages with their versions pinned, and the cell still only worked without error after a second restart. I also tried not installing the packages at all, relying on the versions Colab provides by default; same story, the cell only worked without error after the second restart.
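A plausible explanation (my assumption; it is not confirmed in the question): the first run downgrades packages such as scipy while the already-running interpreter still holds the preinstalled versions, so pip's resolver reports the conflicts; on the second run the pinned versions are already in place and nothing conflicts. Restarting the runtime after the installs forces Python to pick up the new versions. A common Colab idiom for doing that programmatically:
import os
# Kill the Colab kernel process; Colab restarts it automatically, so the
# freshly installed package versions are loaded on the next cell run.
os.kill(os.getpid(), 9)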
In the same project I have another problem; if the two are related, please tell me. I can't figure out why Google Colab doesn't use the GPU to train the model: epoch 2 of training took 2 hours, and Colab uses only RAM and no GPU, although I set the runtime type to GPU.
The configuration is as follows, and I have 300 training images of size 640×480.
import mrcnn.config

class CbcConfig(mrcnn.config.Config):
    NAME = "cbc_cfg"
    # Train on 1 GPU and 2 images per GPU. We can put multiple images on each
    # GPU because the images are small. Batch size is 2 (GPUs * images/GPU).
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2
    # Number of classes (background + 3 blood cell types)
    NUM_CLASSES = 1 + 3
    LEARNING_RATE = 0.001
    STEPS_PER_EPOCH = 100
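A sanity check worth running first (not part of the original question) is whether TensorFlow can see the GPU at all:
import tensorflow as tf
# An empty list means training will silently fall back to the CPU.
print(tf.config.list_physical_devices('GPU'))
# False means the installed wheel was built without CUDA support.
print(tf.test.is_built_with_cuda())
If the list is empty even with the GPU runtime selected, the pinned tensorflow==2.2.0 wheel may not match the CUDA libraries Colab currently ships, which would also explain the CPU-only training.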

Related

Using Huggingface pipeline transformers on Mac M1, fresh PyTorch install errors

I am running a very basic sentiment analysis pipeline utilising the XLM-Roberta model on Huggingface. I am trying to ensure I am utilising the M1 chip as I will be looping over ~10e7 entries.
To be consistent, I am running a fresh install of PyTorch following the yml file and steps outlined in this (very useful) video; I subsequently pip installed sentencepiece and protobuf (version 3.2.0) to deal with a few subsequent errors. When running a simple pipeline model, however, I am faced with the below:
# Imports
import pandas as pd
import datetime as dt
import itertools
from transformers import pipeline, AutoTokenizer
sentiment_model = pipeline(model="cardiffnlp/twitter-xlm-roberta-base-sentiment", return_all_scores=True)
ValueError: google.__spec__ is None
Interestingly, following the install methods for TensorFlow from the same channel runs fine, but it does not access the M1 chip and simply runs on the CPU.
Has anyone faced this prior or have a method such that I can run PyTorch?
Many thanks in advance.
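Not part of the original question, but a quick way to check whether this PyTorch build can use the M1 GPU at all is the MPS backend check (available from PyTorch 1.12 onward):
import torch
# True only if this PyTorch build includes the MPS (Metal) backend
print(torch.backends.mps.is_built())
# True only if macOS also exposes the Apple Silicon GPU to that backend
print(torch.backends.mps.is_available())
If both print True, recent transformers releases should accept device="mps" in the pipeline call to move inference onto the M1 GPU.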

Error when saving model with tensorflow-agents

I am trying to save a model with tensorflow-agents. First I define the following:
from tf_agents.policies.policy_saver import PolicySaver

collect_policy = tf_agent.collect_policy
saver = PolicySaver(collect_policy, batch_size=None)
and then save the model like this:
saver.save('my_directory/')
This works OK in Google Colab, but I get the following error on my local PC.
AttributeError: module 'tensorflow.python.saved_model.nested_structure_coder' has no attribute 'StructureCoder'
These are the library versions I am using:
tensorflow 2.9.1
tf-agents 0.11.0
TL;DR
Make sure you have a tensorflow-probability version that is compatible with TensorFlow 2.9.x and tf-agents 0.11.0:
pip uninstall tensorflow-probability
pip install tensorflow-probability==0.17.0
(0.19.0 for TF 2.11, 0.18.0 for TF 2.10; or check the release notes)
Also make sure to restart your kernel from the notebook.
What the problem was
StructureCoder has been moved into the TensorFlow API itself. Dependent libraries have therefore made matching changes, like this in tf-agents and like this in tensorflow-probability. Your machine is somehow picking up an older version that still depends on the previous location of nested_structure_coder.
For me, I was using
tensorflow 2.9.0
tf-agents 0.13.0
tensorflow-probability 0.17.0
Try making an explicit import in your notebook:
import tensorflow_probability
print(tensorflow_probability.__version__) # was 0.17.0 for me
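Once the versions line up, the save from the question should succeed, and the saved policy can be loaded back with plain TensorFlow. A minimal sketch, reusing the directory name from the question:
import tensorflow as tf
# The loaded object exposes action() like the original tf-agents policy.
loaded_policy = tf.saved_model.load('my_directory/')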

How to fix Illegal instruction (core dumped)

Hi, I am trying to fix this issue. When I run python3 brain.py (below), I get this error:
Illegal instruction (core dumped)
from imageai.Prediction import ImagePrediction
import os
execution_path=os.getcwd()
prediction = ImagePrediction()
prediction.setModelTypeAsSqueezeNet()
prediction.setModelPath(os.path.join(execution_path, "squeezenet_weights_tf_dim_ordering_tf_kernels.h5"))
prediction.loadModel()
predictions, probabilities = prediction.predictImage(os.path.join(execution_path, "giraffe.jpg"), result_count=5 )
for eachPrediction, eachProbability in zip(predictions, probabilities):
    print(eachPrediction, " : ", eachProbability)
I have tried to downgrade TensorFlow to 1.5.0, but when I run that I get these errors:
[ons mar 25 23:11:45] Jonathan#Whats next?:~/ReallySmartBrain$ pip3 install tensorflow==1.5.0
Defaulting to user installation because normal site-packages is not writeable
ERROR: Could not find a version that satisfies the requirement tensorflow==1.5.0 (from versions: 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 1.15.0rc0, 1.15.0rc1, 1.15.0rc2, 1.15.0rc3, 1.15.0, 1.15.2, 2.0.0a0, 2.0.0b0, 2.0.0b1, 2.0.0rc0, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.0.1, 2.1.0rc0, 2.1.0rc1, 2.1.0rc2, 2.1.0, 2.2.0rc0, 2.2.0rc1)
ERROR: No matching distribution found for tensorflow==1.5.0
The other solution is to compile it from source code, but I have no idea how to do that.
Can I fix this in any way?
I had the same problem. It seems this problem affects older CPUs: prebuilt TensorFlow binaries after 1.5 require CPU instructions (such as AVX) that older processors lack. As you said, one solution is to downgrade to tensorflow 1.5.0.
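A quick way to confirm the CPU is the culprit on Linux (not in the original answer): prebuilt TensorFlow wheels from 1.6 onward require AVX, so if 'avx' is missing from the CPU flags they crash with exactly this "Illegal instruction" error.
# Assumes a Linux system where /proc/cpuinfo exists.
with open('/proc/cpuinfo') as f:
    flags = next(line for line in f if line.startswith('flags'))
print('avx' in flags.split())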
The other solution (the one that worked for me) is to build TensorFlow from source.
I compiled version 2.1.0; it took around 25 hours on an Intel(R) Pentium(R) Dual CPU T2370 @ 1.73GHz with 2 GB RAM.
You would need to install the proper version of Bazel. Find below the complete instructions from tensorflow:
https://www.tensorflow.org/install/source
I needed to add a 4 GB swap file; otherwise you will run out of memory during the compilation.
Anyway, I have uploaded my .whl file in case you don't want to spend 25 hours (or more) compiling your own:
https://drive.google.com/open?id=1ISgMcDiCw5W5MFvS5Zbme6pNBbA7xWMH
I had a similar problem with an old CPU on a vintage Mac, now running Linux because no recent macOS can run on it. I tried to compile from source and ran into a bunch of problems (Bazel, compiler, flags, dependencies, ...). In the end I lost a few hours only to learn that it can be a real nightmare. Good advice: don't even try it!

module 'tensorflow' has no attribute 'logging'

I'm trying to run TensorFlow code on v2.0 and I'm getting the following error:
AttributeError: module 'tensorflow' has no attribute 'logging'
I don't want to simply remove it from the code.
Why has this been removed?
What should I do instead?
tf.logging was for Logging and Summary Operations. In TF 2.0 it was removed in favor of the open-source absl-py, and to keep the main tf.* namespace limited to functions that will be used more often.
In TF 2, lesser-used functions are gone or have moved into sub-packages like tf.math.
So instead of tf.logging you could do one of the following (sketched below):
Run tf_upgrade_v2 to upgrade your scripts; it rewrites tf.logging to tf.compat.v1.logging
Use the Python logging module instead
Import the absl-py library
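A minimal sketch of these options (assuming TensorFlow 2.x, which bundles absl-py):
import logging
import tensorflow as tf
from absl import logging as absl_logging
# Option 1: the compat shim, which is what tf_upgrade_v2 rewrites calls to
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)
tf.compat.v1.logging.info("via tf.compat.v1.logging")
# Option 2: the standard library logging module
logging.basicConfig(level=logging.INFO)
logging.info("via the stdlib logging module")
# Option 3: absl logging, which TF 2 uses internally
absl_logging.set_verbosity(absl_logging.INFO)
absl_logging.info("via absl logging")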
If you are using someone else's code, it's better to install the same TensorFlow version the author used, or to downgrade your TensorFlow version. You may want to try this:
pip install tensorflow==1.15.0
Or if you have gpu:
pip install tensorflow-gpu==1.15.0
You may still get deprecation warnings, but you won't need to modify several files replacing tf with tf.compat.v1.

MirroredStrategy without NCCL

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 x64
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): 1.8.0
Python version: 3.6
Bazel version (if compiling from source): -
GCC/Compiler version (if compiling from source): -
CUDA/cuDNN version: 9.0
GPU model and memory: 3.5
Exact command to reproduce: simple_tfkeras_example.py
I would like to use MirroredStrategy to use multiple GPUs in the same machine. I tried one of the examples:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/distribute/python/examples/simple_tfkeras_example.py
The result is:
ValueError: Op type not registered 'NcclAllReduce' in binary running on RAID. Make sure the Op and Kernel are registered in the binary running in this process. while building NodeDef 'NcclAllReduce'
I am using Windows, so NCCL is not available. Is it possible to force TensorFlow not to use this library?
There are some binaries for NCCL on Windows, but they can be quite annoying to deal with.
As an alternative, TensorFlow gives you three other options in MirroredStrategy that are compatible with Windows natively: Hierarchical Copy, Reduce to First GPU, and Reduce to CPU. What you are most likely looking for is Hierarchical Copy, but you can test each of them to see which gives you the best result.
If you are using TensorFlow versions older than 2.0, use tf.contrib.distribute:
# Hierarchical Copy
cross_tower_ops = tf.contrib.distribute.AllReduceCrossTowerOps(
    'hierarchical_copy', num_packs=number_of_gpus)
strategy = tf.contrib.distribute.MirroredStrategy(cross_tower_ops=cross_tower_ops)
# Reduce to First GPU
cross_tower_ops = tf.contrib.distribute.ReductionToOneDeviceCrossTowerOps()
strategy = tf.contrib.distribute.MirroredStrategy(cross_tower_ops=cross_tower_ops)
# Reduce to CPU
cross_tower_ops = tf.contrib.distribute.ReductionToOneDeviceCrossTowerOps(
    reduce_to_device="/device:CPU:0")
strategy = tf.contrib.distribute.MirroredStrategy(cross_tower_ops=cross_tower_ops)
After 2.0, you only need to use tf.distribute! Here is an example setting up an Xception model with 2 GPUs:
strategy = tf.distribute.MirroredStrategy(
    devices=["/gpu:0", "/gpu:1"],
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())
with strategy.scope():
    parallel_model = Xception(weights=None,
                              input_shape=(299, 299, 3),
                              classes=number_of_classes)
    parallel_model.compile(loss='categorical_crossentropy', optimizer='rmsprop')