Linking the French spaCy model but failing to load it - spacy

I successfully downloaded spaCy and the French model to apply them to the Rasa starter pack. Yet when I run the rasa_nlu training command, it seems the OS can't find the French model:
C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
2019-05-02 19:14:58 INFO rasa_nlu.utils.spacy_utils - Trying to load spacy model with name 'fr'
Traceback (most recent call last):
File "C:\Python36\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Python36\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\rasa_nlu\train.py", line 184, in <module>
num_threads=cmdline_args.num_threads)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\rasa_nlu\train.py", line 148, in do_train
trainer = Trainer(cfg, component_builder)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\rasa_nlu\model.py", line 155, in __init__
self.pipeline = self._build_pipeline(cfg, component_builder)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\rasa_nlu\model.py", line 166, in _build_pipeline
component_name, cfg)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\rasa_nlu\components.py", line 441, in create_component
cfg)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\rasa_nlu\registry.py", line 142, in create_component_by_name
return component_clz.create(config)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\rasa_nlu\utils\spacy_utils.py", line 73, in create
nlp = spacy.load(spacy_model_name, parser=False)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\spacy\__init__.py", line 27, in load
return util.load_model(name, **overrides)
File "C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\spacy\util.py", line 136, in load_model
raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'fr'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
(staenv) C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack>python -m spacy download fr
Requirement already satisfied: fr_core_news_sm==2.1.0 from https://github.com/explosion/spacy-models/releases/download/fr_core_news_sm-2.1.0/fr_core_news_sm-2.1.0.tar.gz#egg=fr_core_news_sm==2.1.0 in c:\users\antoi\documents\programming\paco\starter-pack-rasa-stack\staenv\lib\site-packages (2.1.0)
You are using pip version 10.0.1, however version 19.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
✔ Download and installation successful
You can now load the model via spacy.load('fr_core_news_sm')
You do not have sufficient privilege to perform this operation.
✔ Linking successful
C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\fr_core_news_sm
-->
C:\Users\antoi\Documents\Programming\PACO\starter-pack-rasa-stack\staenv\lib\site-packages\spacy\data\fr
You can now load the model via spacy.load('fr')
My spaCy version is 2.1.3.

I think this is actually a spaCy issue.
Because of the line "You do not have sufficient privilege to perform this operation." I think you have to run the Windows command line as administrator.
This spaCy issue describes the same problem and gives further recommendations.
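If an elevated prompt fixes the linking, you can verify the result from Python before re-running the Rasa training. A minimal sketch, using spaCy v2's loader:
import spacy
# Loading by the full package name needs no shortcut link (and no admin rights):
nlp = spacy.load('fr_core_news_sm')
# Loading by the shortcut only works once 'python -m spacy link fr_core_news_sm fr'
# has succeeded, e.g. from a prompt run as administrator:
nlp = spacy.load('fr')
Running python -m spacy validate also lists the models and shortcut links spaCy can currently see.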

Related

How could I use GPU in google colab to train spacy relation extraction model, [E002] Can't find factory for 'transformer' for language English (en)

I am running a relation extraction spaCy model on Google Colab. It works when I use !spacy project run all or !spacy project run train_cpu, but when I run !spacy project run train_gpu it returns the following error:
================================= train_gpu =================================
Running command: /usr/bin/python3 -m spacy train configs/rel_trf.cfg --output training --paths.train data/train.spacy --paths.dev data/dev.spacy -c ./scripts/custom_functions.py --gpu-id 0
ℹ Saving to output directory: training
ℹ Using GPU: 0
=========================== Initializing pipeline ===========================
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.7/dist-packages/spacy/__main__.py", line 4, in <module>
setup_cli()
File "/usr/local/lib/python3.7/dist-packages/spacy/cli/_util.py", line 71, in setup_cli
command(prog_name=COMMAND)
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/typer/main.py", line 497, in wrapper
return callback(**use_params) # type: ignore
File "/usr/local/lib/python3.7/dist-packages/spacy/cli/train.py", line 45, in train_cli
train(config_path, output_path, use_gpu=use_gpu, overrides=overrides)
File "/usr/local/lib/python3.7/dist-packages/spacy/cli/train.py", line 72, in train
nlp = init_nlp(config, use_gpu=use_gpu)
File "/usr/local/lib/python3.7/dist-packages/spacy/training/initialize.py", line 41, in init_nlp
nlp = load_model_from_config(raw_config, auto_fill=True)
File "/usr/local/lib/python3.7/dist-packages/spacy/util.py", line 531, in load_model_from_config
validate=validate,
File "/usr/local/lib/python3.7/dist-packages/spacy/language.py", line 1784, in from_config
raw_config=raw_config,
File "/usr/local/lib/python3.7/dist-packages/spacy/language.py", line 794, in add_pipe
validate=validate,
File "/usr/local/lib/python3.7/dist-packages/spacy/language.py", line 652, in create_pipe
raise ValueError(err)
ValueError: [E002] Can't find factory for 'transformer' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator `#Language.component` (for function components) or `#Language.factory` (for class components).
Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, doc_cleaner, parser, beam_parser, entity_linker, ner, beam_ner, entity_ruler, lemmatizer, tagger, morphologizer, senter, sentencizer, textcat, spancat, textcat_multilabel, relation_extractor, en.lemmatizer
I used both of the following installations (interchangeably) in case the GPU wasn't being invoked correctly:
!pip install -U spacy[cuda101]
#!pip install -U spacy-nightly --pre
You haven't installed spacy-transformers. The easiest way to get it is probably to run spacy download en_core_web_trf, which pulls it in as a dependency.
I would recommend you check the install quickstart again - I don't think spacy-nightly has been updated since v3 was released almost a year ago. Also check the Discussions FAQ - it's been a while since we've heard reports of it, but a while ago you had to specifically not install cupy (that is, not use pip install spacy[cuda101]) in order to get GPU support on Colab.
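Concretely, installing the plugin in a Colab cell should make the 'transformer' factory available before re-running the project (a sketch; pin versions to match your spaCy install if needed):
!pip install spacy-transformers
!spacy project run train_gpu
spaCy discovers the factory from the installed package, which is why the "Available factories" list above contains everything except transformer.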

Cannot start tensorboard from console due to numpy issue

I'd like to make use of TensorBoard. I already saved training and validation data in the same directory as the rest of the project data using the tensorflow.keras.callbacks library.
I already reinstalled numpy via pip, which didn't work out. Then I deleted tb-nightly and tensorboard, since I had them both for some reason, and just reinstalled tb-nightly via pip.
(Tensorflow) C:\Users\alias>python -c "print(__import__('numpy').__version__)"
1.16.2
(Tensorflow) C:\Users\alias>python -c "print(__import__('tensorflow').__version__)"
2.0.0-alpha0
(Tensorflow) C:\Users\alias>python -c "print(__import__('tensorboard.version').version.VERSION)"
1.14.0a20190301
When calling it via the Anaconda prompt on Windows 10, the following happens:
(Tensorflow) C:\Users\alias>tensorboard --logdir=logs\
TensorBoard 1.14.0a20190301 at http://LAPTOP-4E1BJCAV:6006 (Press CTRL+C to quit)
Traceback (most recent call last):
File "c:\users\alias\anaconda3\envs\tensorflow\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\alias\anaconda3\envs\tensorflow\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\alias\Anaconda3\envs\Tensorflow\Scripts\tensorboard.exe\__main__.py", line 9, in <module>
File "c:\users\alias\anaconda3\envs\tensorflow\lib\site-packages\tensorboard\main.py", line 58, in run_main
app.run(tensorboard.main, flags_parser=tensorboard.configure)
File "c:\users\alias\anaconda3\envs\tensorflow\lib\site-packages\absl\app.py", line 300, in run
_run_main(main, args)
File "c:\users\alias\anaconda3\envs\tensorflow\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "c:\users\alias\anaconda3\envs\tensorflow\lib\site-packages\tensorboard\program.py", line 228, in main
self._register_info(server)
File "c:\users\alias\anaconda3\envs\tensorflow\lib\site-packages\tensorboard\program.py", line 274, in _register_info
manager.write_info_file(info)
File "c:\users\alias\anaconda3\envs\tensorflow\lib\site-packages\tensorboard\manager.py", line 269, in write_info_file
payload = "%s\n" % _info_to_string(tensorboard_info)
File "c:\users\alias\anaconda3\envs\tensorflow\lib\site-packages\tensorboard\manager.py", line 129, in _info_to_string
for k in _TENSORBOARD_INFO_FIELDS
File "c:\users\alias\anaconda3\envs\tensorflow\lib\site-packages\tensorboard\manager.py", line 129, in <dictcomp>
for k in _TENSORBOARD_INFO_FIELDS
File "c:\users\alias\anaconda3\envs\tensorflow\lib\site-packages\tensorboard\manager.py", line 51, in <lambda>
(dt - datetime.datetime.fromtimestamp(0)).total_seconds()),
OSError: [Errno 22] Invalid argument
Would be glad about some help.
Thanks in advance!
Did you do a fresh install and/or update recently?
Not sure if this applies to your case or not. Yesterday I installed the latest conda, for Python 2.7, from scratch and updated to the latest packages after installation. Running Python from PyCharm or Windows PowerShell and importing numpy would throw a multiarray import error. The fix for me was to downgrade numpy from "1.16.12" to "1.5.14". Sorry, but I am away from the PCs where I encountered the error, but I think those versions are correct.
I experienced the same error in my local Jupyter notebook. Upgrading the numpy package worked for me.
Try upgrading numpy as below:
pip install numpy==1.16
If the above doesn't work, upgrade tensorflow with the command below and try again:
pip install tensorflow --upgrade
The issue was solved on GitHub: https://github.com/tensorflow/tensorboard/issues/2092
Thanks for the support
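For context on why that issue applies here: the traceback bottoms out in TensorBoard's manager.py evaluating (dt - datetime.datetime.fromtimestamp(0)).total_seconds(), and as the traceback shows, fromtimestamp(0) can raise OSError: [Errno 22] on some Windows configurations. A minimal illustration (platform-dependent; it typically succeeds on Linux/macOS):
import datetime
# On affected Windows setups this exact call raises
# OSError: [Errno 22] Invalid argument, matching the traceback above.
print(datetime.datetime.fromtimestamp(0))
Upgrading to a TensorBoard release containing the fix from that issue is the practical remedy.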

Issues while using Snappy for tensorflow preprocessing using BeamIO

While using Apache Beam's I/O for preprocessing data, the snappy library was a nice-to-have module for compression, but the file transformation doesn't seem to work, because Beam cannot find the crc32c function in the library. I'm using python-snappy version 0.5.2.
The error looks like this:
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
ERROR:root:Exception at bundle <apache_beam.runners.direct.bundle_factory._Bundle object at 0x7f1dd1d60e50>, due to an exception.
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/direct/executor.py", line 312, in call
side_input_values)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/direct/executor.py", line 347, in attempt_call
evaluator.process_element(value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/direct/transform_evaluator.py", line 551, in process_element
self.runner.process(element)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/common.py", line 390, in process
self._reraise_augmented(exn)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/common.py", line 388, in process
self.do_fn_invoker.invoke_process(windowed_value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/common.py", line 281, in invoke_process
self._invoke_per_window(windowed_value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/common.py", line 307, in _invoke_per_window
windowed_value, self.process_method(*args_for_process))
File "/usr/local/lib/python2.7/dist-packages/apache_beam/typehints/typecheck.py", line 63, in process
return self.wrapper(self.dofn.process, args, kwargs)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/typehints/typecheck.py", line 81, in wrapper
result = method(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/iobase.py", line 965, in process
self.writer.write(element)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsink.py", line 299, in write
self.sink.write_record(self.temp_handle, value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsink.py", line 129, in write_record
self.write_encoded_record(file_handle, self.coder.encode(value))
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/tfrecordio.py", line 235, in write_encoded_record
_TFRecordUtil.write_record(file_handle, value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/tfrecordio.py", line 97, in write_record
struct.pack('<I', cls._masked_crc32c(encoded_length)), #
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/tfrecordio.py", line 77, in _masked_crc32c
crc = crc32c_fn(value)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/tfrecordio.py", line 43, in _default_crc32c_fn
_default_crc32c_fn.fn = snappy._crc32c # pylint: disable=protected-access
AttributeError: 'module' object has no attribute '_crc32c' [while running 'WriteTrainData/Write/WriteImpl/WriteBundles']
Could anyone help me use snappy with TensorFlow correctly?
Thank you
I just hit this issue; I think it is due to Beam being a little careless about versions of optional test-dependencies (in this case, tensorflow and python-snappy).
The problematic code:
import snappy
snappy._crc32c
works in python-snappy version 0.5.1 but not in 0.5.2 (the latest version).
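You can check which side of that boundary your environment falls on (a sketch; _crc32c is a python-snappy internal, so the attribute name is version-specific):
import snappy
# True on python-snappy 0.5.1, False on 0.5.2; this is exactly the
# attribute Beam's _default_crc32c_fn reaches for in the traceback above.
print(hasattr(snappy, '_crc32c'))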
I got these Beam tests passing by installing python-snappy 0.5.1 via:
pip install \
--upgrade --ignore-installed \
python-snappy==0.5.1 \
--global-option=build_ext \
--global-option="-I/usr/local/include" \
--global-option="-L/usr/local/lib"
On OS X I need the three --global-option flags, otherwise it doesn't find my snappy headers (symptom: errors about #include <snappy-c.h>) and library files, which brew install snappy placed in /usr/local/include and /usr/local/lib, respectively.
The bits before that seem necessary to override pip's default of wanting to give me the latest version.

Error installing CNTK - MemoryError while installing .whl file

I have an Ubuntu 16.04 server environment and was trying to install CNTK on it. While running pip install inside a conda environment, I got the following error.
I successfully ran the two steps below:
$ conda create --name cntk-py34 python=3.4 numpy scipy h5py jupyter
$ activate cntk-py35
But when I try to install the cntk whl file I get an error:
$ pip install https://cntk.ai/PythonWheel/CPU-Only/cntk-2.0.beta15.0-cp35-cp35m-linux_x86_64.whl
========error==================
Exception:
Traceback (most recent call last):
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/commands/install.py", line 335, in run
wb.build(autobuilding=True)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/wheel.py", line 749, in build
self.requirement_set.prepare_files(self.finder)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/req/req_set.py", line 380, in prepare_files
ignore_dependencies=self.ignore_dependencies))
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/req/req_set.py", line 620, in _prepare_file
session=self.session, hashes=hashes)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/download.py", line 821, in unpack_url
hashes=hashes
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/download.py", line 659, in unpack_http_url
hashes)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/download.py", line 853, in _download_http_url
stream=True,
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/_vendor/requests/sessions.py", line 488, in get
return self.request('GET', url, **kwargs)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/download.py", line 386, in request
return super(PipSession, self).request(method, url, *args, **kwargs)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/_vendor/requests/sessions.py", line 475, in request
resp = self.send(prep, **send_kwargs)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/_vendor/requests/sessions.py", line 596, in send
r = adapter.send(request, **kwargs)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/_vendor/cachecontrol/adapter.py", line 37, in send
cached_response = self.controller.cached_request(request)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/_vendor/cachecontrol/controller.py", line 111, in cached_request
resp = self.serializer.loads(request, cache_data)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/_vendor/cachecontrol/serialize.py", line 114, in loads
return getattr(self, "_loads_v{0}".format(ver))(request, data)
File "/home/ubuntu/.conda/envs/cntk-py35/lib/python3.5/site-packages/pip/_vendor/cachecontrol/serialize.py", line 176, in _loads_v2
cached = json.loads(zlib.decompress(data).decode("utf8"))
MemoryError
Any ideas???
Thanks in advance!
In your question, you mention two different Python versions (3.4 and 3.5). Also, the Anaconda Environment activation needs to be sourced. Assuming you already satisfied the OpenMPI dependency (cf. https://github.com/Microsoft/CNTK/wiki/Setup-Linux-Python), can you try one of these:
# For a Python 3.4 based setup
conda create --name cntk-py34 python=3.4 numpy scipy h5py jupyter
source activate cntk-py34
pip install https://cntk.ai/PythonWheel/CPU-Only/cntk-2.0rc1-cp34-cp34m-linux_x86_64.whl
# For a Python 3.5 based setup
conda create --name cntk-py35 python=3.5 numpy scipy h5py jupyter
source activate cntk-py35
pip install https://cntk.ai/PythonWheel/CPU-Only/cntk-2.0rc1-cp35-cp35m-linux_x86_64.whl
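As an additional suggestion beyond the original answer: the MemoryError occurs while pip decompresses its HTTP cache (the traceback ends in cachecontrol/serialize.py calling zlib.decompress), so if the error persists, bypassing the cache with pip's standard flag is worth a try:
pip install --no-cache-dir https://cntk.ai/PythonWheel/CPU-Only/cntk-2.0rc1-cp35-cp35m-linux_x86_64.whl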

Cannot install or use scrapy on OS X -- blocked by proxy (I think)

So I am really having a tough time with something that should be very easy. At work we have a lot of annoying proxies etc., and I'm pretty sure that's involved here. Anyhow, when I try to install scrapy, I get "Connection reset by peer" in the middle of downloading lxml, always 37% of the way in:
root#rcmac (~ ): pip install scrapy
Requirement already satisfied (use --upgrade to upgrade): scrapy in /Library/Python/2.7/site-packages/Scrapy-0.24.5-py2.7.egg
Requirement already satisfied (use --upgrade to upgrade): Twisted>=10.0.0 in /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python (from scrapy)
Requirement already satisfied (use --upgrade to upgrade): w3lib>=1.8.0 in /Library/Python/2.7/site-packages (from scrapy)
Collecting queuelib (from scrapy)
Using cached queuelib-1.2.2-py2.py3-none-any.whl
Collecting lxml (from scrapy)
/Library/Python/2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:79: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
Downloading lxml-3.4.2.tar.gz (3.5MB)
37% |############ | 1.3MB 186kB/s eta 0:00:12 Exception:
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/pip/basecommand.py", line 246, in main
status = self.run(options, args)
File "/Library/Python/2.7/site-packages/pip/commands/install.py", line 342, in run
requirement_set.prepare_files(finder)
File "/Library/Python/2.7/site-packages/pip/req/req_set.py", line 345, in prepare_files
functools.partial(self._prepare_file, finder))
File "/Library/Python/2.7/site-packages/pip/req/req_set.py", line 290, in _walk_req_to_install
more_reqs = handler(req_to_install)
File "/Library/Python/2.7/site-packages/pip/req/req_set.py", line 487, in _prepare_file
download_dir, do_download, session=self.session,
File "/Library/Python/2.7/site-packages/pip/download.py", line 827, in unpack_url
session,
File "/Library/Python/2.7/site-packages/pip/download.py", line 673, in unpack_http_url
from_path, content_type = _download_http_url(link, session, temp_dir)
File "/Library/Python/2.7/site-packages/pip/download.py", line 888, in _download_http_url
_download_url(resp, link, content_file)
File "/Library/Python/2.7/site-packages/pip/download.py", line 621, in _download_url
for chunk in progress_indicator(resp_read(4096), 4096):
File "/Library/Python/2.7/site-packages/pip/utils/ui.py", line 133, in iter
for x in it:
File "/Library/Python/2.7/site-packages/pip/download.py", line 586, in resp_read
decode_content=False):
File "/Library/Python/2.7/site-packages/pip/_vendor/requests/packages/urllib3/response.py", line 273, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "/Library/Python/2.7/site-packages/pip/_vendor/requests/packages/urllib3/response.py", line 203, in read
data = self._fp.read(amt)
File "/Library/Python/2.7/site-packages/pip/_vendor/cachecontrol/filewrapper.py", line 49, in read
data = self.__fp.read(amt)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 567, in read
s = self.fp.read(amt)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 380, in read
data = self._sock.recv(left)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py", line 241, in recv
return self.read(buflen)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py", line 160, in read
return self._sslobj.read(len)
error: [Errno 54] Connection reset by peer
I can get my hands on the lxml tarball, but I don't know how to get pip past this roadblock. I have somehow managed to get scrapy installed, but it blows up when I try to import it:
root#rcmac (~ ): python
Python 2.7.5 (default, Mar 9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import scrapy
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/Scrapy-0.24.5-py2.7.egg/scrapy/__init__.py", line 56, in <module>
from scrapy.spider import Spider
File "/Library/Python/2.7/site-packages/Scrapy-0.24.5-py2.7.egg/scrapy/spider.py", line 7, in <module>
from scrapy.http import Request
File "/Library/Python/2.7/site-packages/Scrapy-0.24.5-py2.7.egg/scrapy/http/__init__.py", line 11, in <module>
from scrapy.http.request.form import FormRequest
File "/Library/Python/2.7/site-packages/Scrapy-0.24.5-py2.7.egg/scrapy/http/request/form.py", line 9, in <module>
import lxml.html
ImportError: No module named lxml.html
So let's see: I guess my question is, "Help?" :-) thanks.
OK, solved again. Sorry. It looks like a proxy server here at work was blocking my lxml install from pip. I reran the pip install command for scrapy and lxml got properly installed at that point. After that, the error went away.
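For anyone else stuck behind a similar proxy, two common workarounds beyond the original resolution, both using standard pip options (user, password, proxyhost, and port are placeholders):
pip install --proxy http://user:password@proxyhost:port lxml
pip install ./lxml-3.4.2.tar.gz
The second command installs the tarball you fetched by hand, so pip never has to download lxml through the proxy at all.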