SentencePiece in Google Colab - google-colaboratory

I want to use sentencepiece, from https://github.com/google/sentencepiece in a Google Colab project where I am training an OpenNMT model. I'm a little confused with how to set up the sentencepiece binaries in Google Colab. Do I need to build with cmake?
When I try and install using pip install sentencepiece and try to include sentencepiece in my "transforms" in my script, I get this following error
After running this script (matched from the OpenNMT translation tutorial)
!onmt_build_vocab -config en-sp.yaml -n_sample -1
I get:
Traceback (most recent call last):
File "/usr/local/bin/onmt_build_vocab", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/onmt/bin/build_vocab.py", line 63, in main
build_vocab_main(opts)
File "/usr/local/lib/python3.7/dist-packages/onmt/bin/build_vocab.py", line 32, in build_vocab_main
transforms = make_transforms(opts, transforms_cls, fields)
File "/usr/local/lib/python3.7/dist-packages/onmt/transforms/transform.py", line 176, in make_transforms
transform_obj.warm_up(vocabs)
File "/usr/local/lib/python3.7/dist-packages/onmt/transforms/tokenize.py", line 110, in warm_up
load_src_model.Load(self.src_subword_model)
File "/usr/local/lib/python3.7/dist-packages/sentencepiece/__init__.py", line 367, in Load
return self.LoadFromFile(model_file)
File "/usr/local/lib/python3.7/dist-packages/sentencepiece/__init__.py", line 171, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
Below is how my script is written. I'm not sure what the not a string is coming from.
## Where the samples will be written
save_data: en-sp/run/example
## Where the vocab(s) will be written
src_vocab: en-sp/run/example.vocab.src
tgt_vocab: en-sp/run/example.vocab.tgt
## Where the model will be saved
save_model: drive/MyDrive/Europarl/model/model
# Prevent overwriting existing files in the folder
overwrite: False
# Corpus opts:
data:
europarl:
path_src: train_europarl-v7.es-en.es
path_tgt: train_europarl-v7.es-en.en
transforms: [sentencepiece, filtertoolong]
weight: 1
valid:
path_src: dev_europarl-v7.es-en.es
path_tgt: dev_europarl-v7.es-en.en
transforms: [sentencepiece]
skip_empty_level: silent
world_size: 1
gpu_ranks: [0]
...
EDIT: So I went ahead and Googled the issue more and found a google colab project that built sentencepiece using cmake here https://colab.research.google.com/github/mymusise/gpt2-quickly/blob/main/examples/gpt2_quickly.ipynb#scrollTo=dDAup5dxDXZW. However, even after building using cmake, I'm still getting this issue.

To fix this issue, I had to filter and tokenize my dataset and then train with sentencepiece. I used the scripts from this helpful source: https://github.com/ymoslem/MT-Preparation to do everything and now my model is training!

Related

Python libraries being incompatible with each other

I am trying to reproduce a piece of code source for Image Segmentation which involves the use of the
keras
and
segmentation-models
libraries.
I was able to run it in another PC running Python 3.9 and tensorflow==2.10.0, keras==2.10.0, segmentation-models==1.0.0. However, when I am trying to reproduce it on my PC with tensorflow==2.11.0, keras==2.11.0, and segmentation-models==1.0.1, I get this error:
File "/home/dev/anaconda3/lib/python3.9/site-packages/efficientnet/__init__.py", line 71, in init_keras_custom_objects
keras.utils.generic_utils.get_custom_objects().update(custom_objects)
AttributeError: module 'keras.utils.generic_utils' has no attribute 'get_custom_objects'
Library versions seem to be consistent with what is mentioned in requirements of source.
I tried downgrading keras, tensorflow, and segmentation-models versions to the older ones (which I reckoned had worked on the earlier PC), but that again shows up further compatibility issues. On further digging, I found a link which seemed to give the right requirements. I created a new conda environment with Python==3.7.0, and have again downgraded the versions into those suggested in link. I still am getting newer issues:
(sm) dev#Turbo:/media/dev/WinD/Curious Dev B/PROJECT STAGE - II$ python3 debug.py
Using TensorFlow backend.
Segmentation Models: using `keras` framework.
Traceback (most recent call last):
File "debug.py", line 150, in <module>
model = sm.Unet(BACKBONE, classes=n_classes, activation=activation)
File "/home/dev/anaconda3/envs/sm/lib/python3.7/site-packages/segmentation_models/__init__.py", line 34, in wrapper
return func(*args, **kwargs)
File "/home/dev/anaconda3/envs/sm/lib/python3.7/site-packages/segmentation_models/models/unet.py", line 226, in Unet
**kwargs,
File "/home/dev/anaconda3/envs/sm/lib/python3.7/site-packages/segmentation_models/backbones/backbones_factory.py", line 103, in get_backbone
model = model_fn(*args, **kwargs)
File "/home/dev/anaconda3/envs/sm/lib/python3.7/site-packages/classification_models/models_factory.py", line 78, in wrapper
return func(*args, **new_kwargs)
File "/home/dev/anaconda3/envs/sm/lib/python3.7/site-packages/efficientnet/model.py", line 538, in EfficientNetB3
**kwargs)
File "/home/dev/anaconda3/envs/sm/lib/python3.7/site-packages/efficientnet/model.py", line 368, in EfficientNet
img_input = layers.Input(shape=input_shape)
File "/home/dev/anaconda3/envs/sm/lib/python3.7/site-packages/keras/engine/topology.py", line 1439, in Input
input_tensor=tensor)
File "/home/dev/anaconda3/envs/sm/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
return func(*args, **kwargs)
File "/home/dev/anaconda3/envs/sm/lib/python3.7/site-packages/keras/engine/topology.py", line 1301, in __init__
name = prefix + '_' + str(K.get_uid(prefix))
File "/home/dev/anaconda3/envs/sm/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 63, in get_uid
graph = tf.get_default_graph()
AttributeError: module 'tensorflow' has no attribute 'get_default_graph'
I am unsure as to what went wrong here, the code worked six months ago. Now I am unable to run the same code, probably due to version changes of libraries. But at least it should have worked with the older versions that I myself used earlier, or at least with what link says.

onnxruntime: Given model could not be parsed while creating inference session. Error message: Protobuf parsing failed

According to the example code mentioned below the library. I have followed the example code but it didn't work.
[Library] https://github.com/notAI-tech/NudeNet/
Code
from nudenet import NudeClassifier
import onnxruntime
classifier = NudeClassifier()
classifier.classify('/home/coremax/Downloads/DETECTOR_AUTO_GENERATED_DATA/IMAGES/3FEF7B75-3823-4153-8490-87483AAC6ABC'
'.jpg')
I have also followed the previous solution on StackOverflow but it didn't work
Error on running Super Resolution Model from ONNX
Traceback (most recent call last):
File "/snap/pycharm-community/276/plugins/python-ce/helpers/pydev/pydevd.py", line 1491, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/snap/pycharm-community/276/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/coremax/Documents/NudeNet/main.py", line 3, in <module>
classifier = NudeClassifier()
File "/home/coremax/Documents/NudeNet/nudenet/classifier.py", line 37, in __init__
self.nsfw_model = onnxruntime.InferenceSession(model_path)
File "/home/coremax/anaconda3/envs/AdultNET/lib/python3.6/site-packages/onnxruntime/capi/session.py", line 158, in __init__
self._load_model(providers or [])
File "/home/coremax/anaconda3/envs/AdultNET/lib/python3.6/site-packages/onnxruntime/capi/session.py", line 166, in _load_model
True)
RuntimeError: /onnxruntime_src/onnxruntime/core/session/inference_session.cc:238 onnxruntime::InferenceSession::InferenceSession(const onnxruntime::SessionOptions&, const onnxruntime::Environment&, const string&) status.IsOK() was false. Given model could not be parsed while creating inference session. Error message: Protobuf parsing failed.
I know it is too late but hope this helps someone build a very useful software.
why it fails
the error is that for NudeClassifier to work it has to download the onnx model from this link
but github now requires you to be logged in to download any file so the constructor for the NudeClassifier fails as it tries to download this model
Solution
create a folder in your user's home folder with the name .NudeNet/
download the model from this link
save the model in the folder you created in step one
you should now have the model at the following path ~/.NudeNet/classifier_model.onnx
now you're ready to go good luck!

Problem converting tensorflow saved_model to tensorflowjs

I want to convert trained python model (.pb) to tensorflowjs model. To accomplish this, first I saved the model with estimator.export_savedmodel function, then I run the tensorflowjs_converter command on Google Colab. However, no file is created for tensorflowjs. The conversion also gives a lot of warning and ends with an error.
Here is the full code, and please run to see the full output:
https://colab.research.google.com/drive/19k2s8eHpQY9Trps9dyaxPp0HqHWp5qpb
What is the reason of the problem and how can I fix it?
Part of the output:
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
Traceback (most recent call last):
File "/usr/local/bin/tensorflowjs_converter", line 8, in <module>
sys.exit(pip_main())
File "/usr/local/lib/python3.6/dist-packages/tensorflowjs/converters/converter.py", line 638, in pip_main
main([' '.join(sys.argv[1:])])
File "/usr/local/lib/python3.6/dist-packages/tensorflowjs/converters/converter.py", line 642, in main
convert(argv[0].split(' '))
File "/usr/local/lib/python3.6/dist-packages/tensorflowjs/converters/converter.py", line 591, in convert
strip_debug_ops=args.strip_debug_ops)
File "/usr/local/lib/python3.6/dist-packages/tensorflowjs/converters/tf_saved_model_conversion_v2.py", line 435, in convert_tf_saved_model
strip_debug_ops=strip_debug_ops)
File "/usr/local/lib/python3.6/dist-packages/tensorflowjs/converters/tf_saved_model_conversion_v2.py", line 141, in optimize_graph
', '.join(unsupported))
ValueError: Unsupported Ops in the model before optimization
ParallelDynamicStitch, StringSplit, Unique, RegexReplace, DynamicPartition, StringToHashBucketFast, ParseExample, LookupTableFindV2, LookupTableSizeV2, SparseFillEmptyRows, StringJoin, AsString, SparseSegmentSqrtN, HashTableV2
Edit:
Seems like it isn't supported:
https://github.com/tensorflow/tfjs/issues/2322
This is because your model has ops that are not supported by tensorflow.js yet. And seems like you missed the missing op name in the output you pasted. Please feel free to update the output with missing op name or file a feature request in the tensorflow.js repo with more details.

undefined symbol: _ZTIN10tensorflow8OpKernelE in building a library

Trying to build roi-pooling and so file is able to be produced.
The issue is when run the sample program.
File "roi_pooling_test.py", line 3, in <module>
from roi_pooling_ops import roi_pooling
File "/home/Data/Softwares/pixel_link/roi-pooling/roi_pooling/roi_pooling_ops.py", line 8, in <module>
roi_pooling_module = tf.load_op_library(lib_path)
File "/home/Data/Softwares/venv_p2_7_buildfromsource/local/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/Data/Softwares/pixel_link/roi-pooling/roi_pooling/roi_pooling.so: undefined symbol: _ZTIN10tensorflow8OpKernelE
Makefile:15: recipe for target 'test' failed
There is one discusion. But no correct answer was approved by OP.
Made sure tensorflow was built from source with gcc/g++-4.8 and roi-pooling was also built with same gcc/g++ version.
My tensorflow is 1.14, gcc/g++-4.8.
How to solve the issue?
Added the following flag to makefile and all are ok now.
-L/home//lib/python2.7/site-packages/tensorflow -ltensorflow_framework

Keras: Error when downloading Fashion_MNIST Data

I am trying to download data from Fashion MNIST, but it produces an error. Originally, it was downloading and working properly, but I had to terminate it because I had to turn off my computer. Once I opened the file up again, it gives me an error. I'm not sure what the problem is, but is it because I already downloaded some parts of the data once, and keras doesn't recognize that? I am using Jupyter notebook in a conda environment
Here is the link to the image:
https://i.stack.imgur.com/wLGDm.png
You have missed adding tf. to the line
fashion_mnist = keras.datasets.fashion_mnist
The below code works perfectly for me. Importing the fashion_mnist dataset has been outlined in tensorflow documention here.
Change your code to:
import tensorflow as tf
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
or, use the better way to do it below. This avoids creating an extra variable fashion_mnist:
import tensorflow as tf
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.fashion_mnist.load_data()
I am using tensorflow 1.9.0, keras 2.2.2 and python 3.6.6 on Windows 10 x64 OS.
I know my pc well, I can't download anything larger than 2.7 MB (in terminal), due to WinError 8.
So I manually downloaded all packs from storage.google (since some packs are 25 MB).
Check the packs:
then I paste all packs to \datasets\fashion-mnist
The next time u run your code, it should be fixed.
Note : If u have VScode then just CTRL and click the link, then you can download it easily.
I had an error regarding the cURL connection, and by looking into the error message I was able to track the file where the URL was declared. In my case it was:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow_core/python/keras/datasets/fashion_mnist.py
At line 44 I have commented out the line:
# base = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/'
And declared a different base URL, which I had found looking into the documentation of the original dataset:
base = 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/'
The download started immediately and gave no errors. Hope this helps.
This is because for some reason you have an incomplete download for the MNIST dataset.
You will have to manually delete the downloaded folder which usually resides in ~/.keras/datasets or any path specified by you relative to this path, in your case MNIST_data.
Go to : C:\Users\Username.keras\datasets
and then Delete the Dataset that you want to redownload or has the error
You should be good to go!
You can also manually add print for the path from which it is taking dataset ..
Ex: print(paths) in file fashion_mnist.py
with gzip.open(paths[3], 'rb') as imgpath:
print(paths) #debug print in fashion_mnist.py
x_test = np.frombuffer(
imgpath.read(), np.uint8, offset=16).reshape(len(y_test), 28, 28)
& from this path, remove the files & this will start to download fresh data ..
Change The base address with 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/' as described previously. It works for me.
I was getting error of Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Traceback (most recent call last):
File "C:\Users\AsadA\AppData\Local\Programs\Python\Python38\lib\site-packages\numpy\lib\npyio.py", line 448, in load
return pickle.load(fid, **pickle_kwargs)
EOFError: Ran out of input
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\AsadA\AppData\Local\Programs\Python\Python38\lib\site-packages\numpy\lib\npyio.py", line 450, in load
raise IOError(
OSError: Failed to interpret file 'C:\\Users\\AsadA\\.keras\\datasets\\mnist.npz' as a pickle"**
GO TO FILE C:\Users\AsadA\AppData\Local\Programs\Python\Python38\Lib\site-packages\tensorflow\python\keras\datasets (In my Case) and follow the instructions: