Google Colab Crash When Loading Pre-Trained Transformer - google-colaboratory

I am new to transformers. I have fine-tuned my MT5 model in Google Colab and saved it with save_pretrained(PATH). However, when I try to load the model with MT5ForConditionalGeneration.from_pretrained(PATH), Google Colab keeps crashing with this error in the log:
terminate called after throwing an instance of 'std::__ios_failure'
what(): basic_filebuf::underflow error reading the file: iostream error
I suspect that it is an out-of-memory problem, but I have no clue how to fix it.
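One thing worth ruling out before assuming an out-of-memory condition is an incomplete or unreadable checkpoint, since the message itself is a C++ file-read failure. The sketch below is only a diagnostic under that assumption; list_checkpoint_files and find_empty_files are hypothetical helpers, not part of transformers, and they use only the standard library.

```python
import os

def list_checkpoint_files(path):
    """Return (name, size_in_bytes) for every regular file directly under path."""
    entries = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            entries.append((name, os.path.getsize(full)))
    return entries

def find_empty_files(path):
    """Return names of zero-byte files, which would suggest an interrupted save."""
    return [name for name, size in list_checkpoint_files(path) if size == 0]
```

Calling list_checkpoint_files(PATH) on the directory passed to save_pretrained shows whether the weight file was fully written. It does not explain the crash by itself, and it says nothing about how much RAM loading the model needs.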

Related

When trying to convert StyleGAN2 pickle file, getting pickle.load() error: "No module named 'torch_utils.persistence'"

I trained a StyleGan2-ADA model on a custom dataset which generated a .pkl file. I'm now trying to load the .pkl file so that I can convert it to a .pt file, but when I load the .pkl file using:
pickle.load(f)
I'm getting a ModuleNotFoundError: No module named 'torch_utils.persistence'
I've installed torch_utils and other dependencies, but for loading the file I'm not sure how to fix this issue. If anyone has had this issue in loading a .pkl file any help would be greatly appreciated!!
The same issue is reported on GitHub, but with no clear solution.
I have tried installing torch_utils multiple times, but the error still persists.
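For what it's worth, the torch_utils that the pickle refers to appears to be a package directory inside the StyleGAN2-ADA PyTorch source tree, not the unrelated package that pip installs under a similar name, and pickle has to import it while unpickling. A minimal sketch, assuming the repository has been cloned locally; repo_dir and the helper name load_pickle_with_repo are hypothetical:

```python
import pickle
import sys

def load_pickle_with_repo(pkl_path, repo_dir):
    """Unpickle pkl_path after making modules under repo_dir importable."""
    if repo_dir not in sys.path:
        # Put the repo first so its torch_utils shadows any pip-installed one.
        sys.path.insert(0, repo_dir)
    with open(pkl_path, "rb") as f:
        return pickle.load(f)
```

If a different torch_utils was already imported in the same session, restart the runtime before trying this, because Python reuses the module it has already loaded.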

GANs with TensorFlow

Recently, I started one of the tutorials for GANs created by TensorFlow.
See link: https://github.com/PacktPublishing/Advanced-Deep-Learning-with-Keras/blob/master/chapter4-gan/dcgan-mnist-4.2.1.py
But in the entry point (i.e. if __name__ == '__main__') I am having issues running it, since I run it from Colab and it is a .py program. It reports this error:
usage: ipykernel_launcher.py [-h] [-g GENERATOR]
ipykernel_launcher.py: error: unrecognized arguments: -f
/root/.local/share/jupyter/runtime/kernel-bcd12960-b051-4b8d-b6b0-3ff02367dbcc.json
An exception has occurred, use %tb to see the full traceback.
Would anyone know how to debug it for .ipynb files?
Thank you in advance
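The "unrecognized arguments: -f" message comes from argparse reading the notebook kernel's own command line instead of the script's. A minimal sketch of two common workarounds, assuming the script defines a -g/--generator option as the usage line suggests; the parser here is a stand-in, not the tutorial's exact code:

```python
import argparse

def build_parser():
    # Stand-in for the script's parser; only -g is inferred from the usage line.
    parser = argparse.ArgumentParser()
    parser.add_argument("-g", "--generator", help="path to a saved generator model")
    return parser

# Simulated notebook command line (the kernel passes -f <connection file>).
notebook_argv = ["-f", "/root/.local/share/jupyter/runtime/kernel.json"]

# Option 1: keep the recognized options and collect everything else.
args, unknown = build_parser().parse_known_args(notebook_argv)

# Option 2: ignore the command line entirely and use the defaults.
default_args = build_parser().parse_args(args=[])
```

In a notebook, either replace parser.parse_args() with parser.parse_args(args=[]) or switch to parse_known_args(), which tolerates the extra -f argument.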

Create Version Failed. Bad model detected with error: "Error loading the model" - AI Platform Prediction

I created a model through the AI Platform UI that uses a global endpoint. I am trying to deploy a basic TensorFlow 1.15.0 model I exported using the SavedModel builder. When I try to deploy this model I get a Create Version Failed. Bad model detected with error: "Error loading the model" error in the UI, and then I see the following in the logs:
ERROR:root:Failed to import GA GRPC module. This is OK if the runtime version is 1.x
Failure: Could not reach metadata service: Internal Server Error.
ERROR:root:Command '['/tools/google-cloud-sdk/bin/gsutil', '-o', 'GoogleCompute:service_account=default', 'cp', '-R', 'gs://cml-365057443918-1608667078774578/models/xsqr_global/v6/7349456410861999293/model/*', '/tmp/model/0001']' returned non-zero exit status 1.
ERROR:root:Error loading model: 'generator' object has no attribute 'next'
ERROR:root:Error loading the model
Framework/ML runtime version: Tensorflow 1.15.0
Python: 3.7.3
What is strange is that the gcloud ai-platform local predict works correctly with this exported model, and I can deploy this exact same model on a regional endpoint with no issues. It only gives this error if I try to use a global endpoint model. But I need the global endpoint because I plan on using a custom prediction routine (if I can get this basic model working first).
The logs seem to suggest an issue with copying the model from storage? I've tried giving various IAM roles additional viewer permissions, but I still get the same errors.
Thanks for the help.
I think it's the same issue as https://issuetracker.google.com/issues/175316320
The comment in the issue says the fix is now rolling out.
Today I faced the same error (ERROR: (gcloud.ai-platform.versions.create) Create Version failed. Bad model detected with error: "Error loading the model"). For those who want a summary:
The recommendation is to use n1* machine types (for example, n1-standard-4) via regional endpoints (for example, us-central1) instead of mls1* machines when deploying a version. I also made sure to specify the same region (us-central1) when creating the model itself, using the command below, which resolved the error.
!gcloud ai-platform models create $model_name --region=$REGION
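The recommendation above can be sketched as a version-creation command, built here as a Python argument list so it can be inspected before running. The version name, model name, and bucket path are placeholders; the runtime and Python versions are taken from the question, and build_version_create_cmd is a hypothetical helper.

```python
import subprocess

def build_version_create_cmd(version, model, origin, region="us-central1",
                             machine_type="n1-standard-4"):
    """Build a gcloud command that creates a version on a regional endpoint."""
    return [
        "gcloud", "ai-platform", "versions", "create", version,
        f"--model={model}",
        f"--region={region}",
        f"--machine-type={machine_type}",
        f"--origin={origin}",
        "--framework=tensorflow",
        "--runtime-version=1.15",
        "--python-version=3.7",
    ]

# Placeholder names; substitute your own model, version, and bucket.
cmd = build_version_create_cmd("v1", "my_model", "gs://my-bucket/model/")
# subprocess.run(cmd, check=True)  # uncomment to actually deploy
```

The region passed here should match the one used when the model was created.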

Create New Python 3 notebook in Colaboratory Fails with error message

I have recently started to use Colaboratory and I am trying to create a new notebook.
But when I try to open a new notebook (New Python 3 notebook), it fails with this error message:
Notebook loading error
There was an error loading this notebook. Ensure that the file is accessible and try again.
https://drive.google.com/drive/?action=locate&id=1Hfx8Cl68kYnKZu90U5TADO0XqKsBq_fw&authuser=0
[object Object]
Error: [object Object]
at d (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab_20180222_085323-RC01_186629092:1135:347)
at Object.next (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab_20180222_085323-RC01_186629092:1135:493)
at b (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab_20180222_085323-RC01_186629092:522:42)
at
I saved a copy of the Welcome to Colaboratory notebook to my Google Drive.
When I try Open Drive notebook and select the notebook copy, it crashes with the same error message as above.
What else can I try to create a new notebook?
Thanks
Pause your ad-blocking extension for the Colab page and it will work. Ad blockers may block some of the JS files.
You need to enable "All cookies" under Cookie Control in your browser.

Python Configuration Error when building retrain.py with Bazel, following the Google doc

I am learning transfer learning by following How to Retrain Inception's Final Layer for New Categories. However, when I build 'retrain.py' using Bazel, the following error occurs:
The error message is:
python configuration error:'PYTHON_BIN_PATH' environment variable is not set and referenced by '//third_party/py/numpy:headers'
I am sorry, I did my best to include the error image, but unfortunately I failed.
I use Python 2.7, Anaconda2, Bazel 0.6.1, and TensorFlow 1.3.
I would appreciate any reply!
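The variable named in the error is normally set by running ./configure in the TensorFlow source tree before building. As a hedged alternative, the sketch below sets it for a single Bazel invocation from Python. The build target is the one I recall from the retraining tutorial and should be treated as an assumption, and bazel_env is a hypothetical helper.

```python
import os
import subprocess
import sys

def bazel_env(python_bin=sys.executable):
    """Copy the current environment and point PYTHON_BIN_PATH at an interpreter."""
    env = dict(os.environ)
    env["PYTHON_BIN_PATH"] = python_bin
    return env

# Assumed target; adjust to whatever the tutorial you follow specifies.
cmd = ["bazel", "build", "tensorflow/examples/image_retraining:retrain"]
# subprocess.run(cmd, env=bazel_env(), check=True)  # run from the TensorFlow root
```

With Anaconda, the interpreter passed as python_bin should be the one that has numpy installed, since the failing rule is the numpy headers target.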