Issues while using Snappy for tensorflow preprocessing using BeamIO - tensorflow

While using Apache beamIO for preprocessing data, snappy library was a good to have module for compression but looks like the file transformation doesnt seems to work as it cannot find the crc32 compress function in the library! Im using snappy-0.5.2 version
the error looks like this -
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
ERROR:root:Exception at bundle <apache_beam.runners.direct.bundle_factory._Bundle object at 0x7f1dd1d60e50>, due to an exception.
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/direct/executor.py", line 312, in call
side_input_values)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/direct/executor.py", line 347, in attempt_call
evaluator.process_element(value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/direct/transform_evaluator.py", line 551, in process_element
self.runner.process(element)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/common.py", line 390, in process
self._reraise_augmented(exn)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/common.py", line 388, in process
self.do_fn_invoker.invoke_process(windowed_value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/common.py", line 281, in invoke_process
self._invoke_per_window(windowed_value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/common.py", line 307, in _invoke_per_window
windowed_value, self.process_method(*args_for_process))
File "/usr/local/lib/python2.7/dist-packages/apache_beam/typehints/typecheck.py", line 63, in process
return self.wrapper(self.dofn.process, args, kwargs)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/typehints/typecheck.py", line 81, in wrapper
result = method(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/iobase.py", line 965, in process
self.writer.write(element)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsink.py", line 299, in write
self.sink.write_record(self.temp_handle, value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsink.py", line 129, in write_record
self.write_encoded_record(file_handle, self.coder.encode(value))
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/tfrecordio.py", line 235, in write_encoded_record
_TFRecordUtil.write_record(file_handle, value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/tfrecordio.py", line 97, in write_record
struct.pack('<I', cls._masked_crc32c(encoded_length)), #
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/tfrecordio.py", line 77, in _masked_crc32c
crc = crc32c_fn(value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/tfrecordio.py", line 43, in _default_crc32c_fn
_default_crc32c_fn.fn = snappy._crc32c # pylint: disable=protected-access
AttributeError: 'module' object has no attribute '_crc32c' [while running 'WriteTrainData/Write/WriteImpl/WriteBundles']
If any one could help me to use snappy with tensorflow correctly!
Thank you

I just hit this issue; I think it is due to Beam being a little careless about versions of optional test-dependencies (in this case, tensorflow and python-snappy).
The problematic code:
import snappy
snappy._crc32c
works in python-snappy version 0.5.1 but not in 0.5.2 (the latest version).
I got these Beam tests passing by installing python-snappy 0.5.1 via:
pip install \
--upgrade --ignore-installed \
python-snappy==0.5.1 \
--global-option=build_ext \
--global-option="-I/usr/local/include" \
--global-option="-L/usr/local/lib"
On OSX I need the three --global-option flags otherwise it doesn't find my snappy headers (symptom: errors about #include <snappy-c.h>) and library files, which brew install snappy placed in /usr/local/include and /usr/local/lib, respectively.
The bits before that seem necessary to override pip's default of wanting to give me the latest version.

Related

How could I use GPU in google colab to train spacy relation extraction model, [E002] Can't find factory for 'transformer' for language English (en)

I am running a relation extraction spacy model on google colab , It works when I use !spacy project run all or !spacy project run train_cpu but when I run !spacy project run train_gpu it returns following error:
================================= train_gpu =================================
Running command: /usr/bin/python3 -m spacy train configs/rel_trf.cfg --output training --paths.train data/train.spacy --paths.dev data/dev.spacy -c ./scripts/custom_functions.py --gpu-id 0
ℹ Saving to output directory: training
ℹ Using GPU: 0
=========================== Initializing pipeline ===========================
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.7/dist-packages/spacy/__main__.py", line 4, in <module>
setup_cli()
File "/usr/local/lib/python3.7/dist-packages/spacy/cli/_util.py", line 71, in setup_cli
command(prog_name=COMMAND)
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/typer/main.py", line 497, in wrapper
return callback(**use_params) # type: ignore
File "/usr/local/lib/python3.7/dist-packages/spacy/cli/train.py", line 45, in train_cli
train(config_path, output_path, use_gpu=use_gpu, overrides=overrides)
File "/usr/local/lib/python3.7/dist-packages/spacy/cli/train.py", line 72, in train
nlp = init_nlp(config, use_gpu=use_gpu)
File "/usr/local/lib/python3.7/dist-packages/spacy/training/initialize.py", line 41, in init_nlp
nlp = load_model_from_config(raw_config, auto_fill=True)
File "/usr/local/lib/python3.7/dist-packages/spacy/util.py", line 531, in load_model_from_config
validate=validate,
File "/usr/local/lib/python3.7/dist-packages/spacy/language.py", line 1784, in from_config
raw_config=raw_config,
File "/usr/local/lib/python3.7/dist-packages/spacy/language.py", line 794, in add_pipe
validate=validate,
File "/usr/local/lib/python3.7/dist-packages/spacy/language.py", line 652, in create_pipe
raise ValueError(err)
ValueError: [E002] Can't find factory for 'transformer' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator `#Language.component` (for function components) or `#Language.factory` (for class components).
Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, doc_cleaner, parser, beam_parser, entity_linker, ner, beam_ner, entity_ruler, lemmatizer, tagger, morphologizer, senter, sentencizer, textcat, spancat, textcat_multilabel, relation_extractor, en.lemmatizer
I used both following installations (interchangeably) in case the GPU wasn't called correctly:
!pip install -U spacy[cuda101]
#!pip install -U spacy-nightly --pre
You haven't installed spacy-transformers. The easiest way to do this is probably to spacy download en_core_web_trf.
I would recommend you check the install quickstart again - I don't think spacy-nightly has been updated since v3 was released almost a year ago. Also check the Discussions FAQ - it's been a while since we've heard reports of it, but a while ago you had to specifically not install cupy (that is, not use pip install spacy[cuda101]) in order to get GPU support on Colab.

Linking the French Spacy model but failing to load it

I successfully downloaded SpaCy and the French model to apply it to the Rasa starter pack. Yet when running the rasa_nlu training command it seems the OS can't find the French model.
C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
2019-05-02 19:14:58 INFO rasa_nlu.utils.spacy_utils - Trying to load spacy model with name 'fr'
Traceback (most recent call last):
File "C:\Python36\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Python36\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\rasa_nlu\train.py", line 184, in <module>
num_threads=cmdline_args.num_threads)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\rasa_nlu\train.py", line 148, in do_train
trainer = Trainer(cfg, component_builder)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\rasa_nlu\model.py", line 155, in __init__
self.pipeline = self._build_pipeline(cfg, component_builder)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\rasa_nlu\model.py", line 166, in _build_pipeline
component_name, cfg)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\rasa_nlu\components.py", line 441, in create_component
cfg)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\rasa_nlu\registry.py", line 142, in create_component_by_name
return component_clz.create(config)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\rasa_nlu\utils\spacy_utils.py", line 73, in create
nlp = spacy.load(spacy_model_name, parser=False)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\spacy\__init__.py", line 27, in load
return util.load_model(name, **overrides)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\spacy\util.py", line 136, in load_model
raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'fr'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
(staenv) C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack>python -m spacy download fr
Requirement already satisfied: fr_core_news_sm==2.1.0 from https://github.com/explosion/spacy-models/releases/download/fr_core_news_sm-2.1.0/fr_core_news_sm-2.1.0.tar.gz#egg=fr_core_news_sm==2.1.0 in c:\users\antoi\documents\programming\paco\starter-pack-rasa-stack\staenv\lib\site-packages (2.1.0)
You are using pip version 10.0.1, however version 19.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
✔ Download and installation successful
You can now load the model via spacy.load('fr_core_news_sm')
You do not have sufficient privilege to perform this operation.
✔ Linking successful
C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\fr_core_news_sm
-->
C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\spacy\data\fr
You can now load the model via spacy.load('fr')
My spacy version is 2.1.3
I think this is actually a SpaCy issue.
Due to the line You do not have sufficient privilege to perform this operation. I think you have to run the Windows command line as administrator.
This spaCy issues describes the same problem and gives further recommendations.

TypeError: __init__() got an unexpected keyword argument 'repeated'

I am using the google cloud vm instance for developing my custom object detector- TENSORFLOW object detection API. I am using pretrained model
:faster_rcnn_inception_resnet_v2_atrous_coco.
After creating all the necessary TFrecord files for input and configuring the object_detection pipeline config files, i used the following command for training:
python train.py --logtostderr --train_dir=training /
--pipeline_config_path=training/faster_rcnn_custom.config
I get the following error:
Traceback (most recent call last):
File "train.py", line 184, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "/opt/models/research/object_detection/trainer.py", line 274, in train
train_config.prefetch_queue_capacity, data_augmentation_options)
File "/opt/models/research/object_detection/trainer.py", line 59, in create_input_queue
tensor_dict = create_tensor_dict_fn()
File "train.py", line 121, in get_next
dataset_builder.build(config)).get_next()
File "/opt/models/research/object_detection/builders/dataset_builder.py", line 176, in build
num_additional_channels=num_additional_channels)
File "/opt/models/research/object_detection/data_decoders/tf_example_decoder.py", line 204, in __init__
repeated=True)
TypeError: __init__() got an unexpected keyword argument 'repeated'
How should i fix the error? i am quite new into this. Any help would be appreciated.
Check the correctness of your command and if the config file is in correct relative nested directory. I see there is a space between "training /" it should be "training/"
My assumption is the error is due to the incompatibility of file tf_example_decoder.py with the Tensorflow installed. Try removing that argument. Hopefully this helps.
I had the similar issue. i had older tensorflow installed and trying to use new models, upgrading tensorflow solved my problem.

Error loading library gpuarray with Theano

I am trying to run this script to test Theano's use of my GPU and get the following error:
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
File "/home/me/anaconda3/envs/py35/lib/python3.5/site-
packages/theano/gpuarray/__init__.py", line 164, in <module>
use(config.device)
File "/home/me/anaconda3/envs/py35/lib/python3.5/site-
packages/theano/gpuarray/__init__.py", line 151, in use
init_dev(device)
File "/home/me/anaconda3/envs/py35/lib/python3.5/site-
packages/theano/gpuarray/__init__.py", line 60, in init_dev
sched=config.gpuarray.sched)
File "pygpu/gpuarray.pyx", line 614, in pygpu.gpuarray.init
(pygpu/gpuarray.c:9419)
File "pygpu/gpuarray.pyx", line 566, in pygpu.gpuarray.pygpu_init
(pygpu/gpuarray.c:9110)
File "pygpu/gpuarray.pyx", line 1021, in
pygpu.gpuarray.GpuContext.__cinit__ (pygpu/gpuarray.c:13472)
pygpu.gpuarray.GpuArrayException: Error loading library: -1
I need to use the nvidia-381 driver since my GPU is a 1080 ti and is not compatible with nvidia-375. I'm not sure if that matters but installing nvcc overwrites 381 and causes some errors if I reinstall 381 after setting up nvcc so I can't use nvcc.
I can import pygpu without errors but if I run pygpu.test() I get the following error and I don't know how to specify the DEVICE variable without nvcc.
======================================================================
ERROR: Failure: RuntimeError (No test device specified. Specify one using the DEVICE or GPUARRAY_TEST_DEVICE environment variables.)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/me/anaconda3/envs/py35/lib/python3.5/site-packages/nose/failure.py", line 39, in runTest
raise self.exc_val.with_traceback(self.tb)
File "/home/me/anaconda3/envs/py35/lib/python3.5/site-packages/nose/loader.py", line 418, in loadTestsFromName
addr.filename, addr.module)
File "/home/me/anaconda3/envs/py35/lib/python3.5/site-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/home/me/anaconda3/envs/py35/lib/python3.5/site-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/me/anaconda3/envs/py35/lib/python3.5/imp.py", line 234, in load_module
return load_source(name, filename, file)
File "/home/me/anaconda3/envs/py35/lib/python3.5/imp.py", line 172, in load_source
module = _load(spec)
File "<frozen importlib._bootstrap>", line 693, in _load
File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 665, in exec_module
File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
File "/home/me/.local/lib/python3.5/site-packages/pygpu-0.6.2-py3.5-linux-x86_64.egg/pygpu/tests/test_tools.py", line 5, in <module>
from .support import (guard_devsup, rand, check_flags, check_meta, check_all,
File "/home/me/.local/lib/python3.5/site-packages/pygpu-0.6.2-py3.5-linux-x86_64.egg/pygpu/tests/support.py", line 32, in <module>
context = gpuarray.init(get_env_dev())
File "/home/me/.local/lib/python3.5/site-packages/pygpu-0.6.2-py3.5-linux-x86_64.egg/pygpu/tests/support.py", line 29, in get_env_dev
raise RuntimeError("No test device specified. Specify one using the DEVICE or GPUARRAY_TEST_DEVICE environment variables.")
RuntimeError: No test device specified. Specify one using the DEVICE or GPUARRAY_TEST_DEVICE environment variables.
----------------------------------------------------------------------
Ran 7 tests in 0.003s
FAILED (errors=7)
<nose.result.TextTestResult run=7 errors=7 failures=0>
Warning: its entirely possible that this is all wrong and the actual reason for your problem is in fact - as you suspect - your gpu driver.
I had the same issue with gpuarray on Windows 10.
In the end I solved it by:
completely uninstall python
install cuda 8.0 (with cudnn 5.1)
install anaconda
install theano through anaconda:
conda install theano pygpu
As you are using linux: This error message basically means It didn't work, don't ask me why And is mostly shown if something with your setup is wrong (e.g. different compilers used for compiling python and theano, or incompatible cuda version)
I would recommend to update to cuda 8.0 and to reinstall your python environment over anaconda (just in case)
On a side note: I tested your example script from the docu and at least that is working....
Note for windows users: Never try to install Anaconda in a location where you have spaces in the path... Everything looks fine ... until theano starts having trouble finding and compiling things.
Note regarding the pygpu.test():
Normally you just set the environment variable:
windows: set DEVICE=cuda
linux: export DEVICE=cuda
BUT The test has the habit of saying you didn't specify a device if the library couldn't be loaded...

Error installing CNTK - MemoryError while installing .whl file

I have Ubuntu 16.04 server environment and was trying to install CNTK on it. While I was trying to install pip install in an environment section I get the following error.
I succesfully ran below 2 steps:
$ conda create --name cntk-py34 python=3.4 numpy scipy h5py jupyter
$ activate cntk-py35
But when I try to install the cntk whl file I get an error:
$ pip install https://cntk.ai/PythonWheel/CPU-Only/cntk-2.0.beta15.0-cp35-cp35m-linux_x86_64.whl
========error==================
Exception:
Traceback (most recent call last):
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/commands/install.py", line 335, in run
wb.build(autobuilding=True)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/wheel.py", line 749, in build
self.requirement_set.prepare_files(self.finder)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/req/req_set.py", line 380, in prepare_files
ignore_dependencies=self.ignore_dependencies))
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/req/req_set.py", line 620, in _prepare_file
session=self.session, hashes=hashes)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/download.py", line 821, in unpack_url
hashes=hashes
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/download.py", line 659, in unpack_http_url
hashes)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/download.py", line 853, in _download_http_url
stream=True,
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/_vendor/requests/sessions.py", line 488, in get
return self.request('GET', url, **kwargs)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/download.py", line 386, in request
return super(PipSession, self).request(method, url, *args, **kwargs)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/_vendor/requests/sessions.py", line 475, in request
resp = self.send(prep, **send_kwargs)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/_vendor/requests/sessions.py", line 596, in send
r = adapter.send(request, **kwargs)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/_vendor/cachecontrol/adapter.py", line 37, in send
cached_response = self.controller.cached_request(request)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/_vendor/cachecontrol/controller.py", line 111, in cached_request
resp = self.serializer.loads(request, cache_data)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/_vendor/cachecontrol/serialize.py", line 114, in loads
return getattr(self, "_loads_v{0}".format(ver))(request, data)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/_vendor/cachecontrol/serialize.py", line 176, in _loads_v2
cached = json.loads(zlib.decompress(data).decode("utf8"))
MemoryError
Any ideas???
Thanks in advance!
In your question, you mention two different Python versions (3.4 and 3.5). Also, the Anaconda Environment activation needs to be sourced. Assuming you already satisfied the OpenMPI dependency (cf. https://github.com/Microsoft/CNTK/wiki/Setup-Linux-Python), can you try one of these:
# For a Python 3.4 based setup
conda create --name cntk-py34 python=3.4 numpy scipy h5py jupyter
source activate cntk-py34
pip install https://cntk.ai/PythonWheel/CPU-Only/cntk-2.0rc1-cp34-cp34m-linux_x86_64.whl
# For a Python 3.5 based setup
conda create --name cntk-py35 python=3.5 numpy scipy h5py jupyter
source activate cntk-py35
pip install https://cntk.ai/PythonWheel/CPU-Only/cntk-2.0rc1-cp35-cp35m-linux_x86_64.whl