Allennlp: How to use CPU instead of GPU?

I'm running some code that works when a GPU is available, but I'm trying to figure out how to run it locally on the CPU. Here's the error:
2022-07-06 17:58:39,042 - INFO - allennlp.common.plugins - Plugin allennlp_models available
Traceback (most recent call last):
File "/Users/xiaoqingwan/opt/miniconda3/envs/absa/bin/allennlp", line 8, in <module>
sys.exit(run())
File "/Users/xiaoqingwan/opt/miniconda3/envs/absa/lib/python3.7/site-packages/allennlp/__main__.py", line 34, in run
main(prog="allennlp")
File "/Users/xiaoqingwan/opt/miniconda3/envs/absa/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 118, in main
args.func(args)
File "/Users/xiaoqingwan/opt/miniconda3/envs/absa/lib/python3.7/site-packages/allennlp/commands/predict.py", line 205, in _predict
predictor = _get_predictor(args)
File "/Users/xiaoqingwan/opt/miniconda3/envs/absa/lib/python3.7/site-packages/allennlp/commands/predict.py", line 105, in _get_predictor
check_for_gpu(args.cuda_device)
File "/Users/xiaoqingwan/opt/miniconda3/envs/absa/lib/python3.7/site-packages/allennlp/common/checks.py", line 131, in check_for_gpu
" 'trainer.cuda_device=-1' in the json config file." + torch_gpu_error
allennlp.common.checks.ConfigurationError: Experiment specified a GPU but none is available; if you want to run on CPU use the override 'trainer.cuda_device=-1' in the json config file.
module 'torch.cuda' has no attribute '_check_driver'
Could you give me some guidance on what to do? Where is the config file and what is it called?
Here's the code (originally from: https://colab.research.google.com/drive/1F9zW_nVkwfwIVXTOA_juFDrlPz5TLjpK?usp=sharing):
# Use pretrained SpanModel weights for prediction
import sys
sys.path.append("aste")
from pathlib import Path
from data_utils import Data, Sentence, SplitEnum
from wrapper import SpanModel

def predict_sentence(text: str, model: SpanModel) -> Sentence:
    path_in = "temp_in.txt"
    path_out = "temp_out.txt"
    sent = Sentence(tokens=text.split(), triples=[], pos=[], is_labeled=False, weight=1, id=1)
    data = Data(root=Path(), data_split=SplitEnum.test, sentences=[sent])
    data.save_to_path(path_in)
    model.predict(path_in, path_out)
    data = Data.load_from_full_path(path_out)
    return data.sentences[0]

text = "Did not enjoy the new Windows 8 and touchscreen functions ."
model = SpanModel(save_dir="pretrained_14lap", random_seed=0)
sent = predict_sentence(text, model)

Try using something like:
device = torch.device("cpu")
model = SpanModel(save_dir="pretrained_14lap", random_seed=0)
model.to(device)
The config file is inside the model.tar.gz archive in the pretrained_14lap directory (it is always named config.json). It also contains the parameter "cuda_device": 0, which may be what is causing your problem.
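If you would rather fix it in the archive than pass an override, here is a minimal sketch using only the standard library; it assumes the archive keeps config.json at its top level, which is the usual AllenNLP layout:
# Sketch: rewrite cuda_device inside model.tar.gz so the model loads on CPU.
import json
import shutil
import tarfile
from pathlib import Path

archive = Path("pretrained_14lap/model.tar.gz")
workdir = Path("model_unpacked")

# 1. Unpack the archive.
with tarfile.open(archive) as tar:
    tar.extractall(workdir)

# 2. Point the trainer at the CPU (if cuda_device appears elsewhere in
#    config.json, set those occurrences to -1 as well).
config_path = workdir / "config.json"
config = json.loads(config_path.read_text())
config.setdefault("trainer", {})["cuda_device"] = -1
config_path.write_text(json.dumps(config, indent=2))

# 3. Repack (keeping a backup) so SpanModel(save_dir="pretrained_14lap") picks up the change.
shutil.copy(archive, str(archive) + ".bak")
with tarfile.open(archive, "w:gz") as tar:
    for item in workdir.iterdir():
        tar.add(item, arcname=item.name)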

Related

TPUEstimator error -- AttributeError: module 'tensorflow.contrib.tpu.python.ops.tpu_ops' has no attribute 'cross_replica_sum'

I have written TensorFlow code using the TPUEstimator, but I am having problems running it in use_tpu=False mode. I would like to run it on my local computer to make sure that all the operations are TPU-compatible. The code works fine with the normal Estimator. Here is my master code:
import logging
from tensorflow.contrib.tpu.python.tpu import tpu_config, tpu_estimator, tpu_optimizer
from tensorflow.contrib.cluster_resolver import TPUClusterResolver
from capser_7_model_fn import *
from capser_7_input_fn import *
import subprocess
from absl import flags

flags.DEFINE_bool(
    'use_tpu', False,
    'Use TPUs rather than plain CPUs')
tf.flags.DEFINE_string(
    "tpu", default='$TPU_NAME',
    help="The Cloud TPU to use for training. This should be either the name "
         "used when creating the Cloud TPU, or a grpc://ip.address.of.tpu:8470 "
         "url.")
tf.flags.DEFINE_string("model_dir", LOGDIR, "Estimator model_dir")
flags.DEFINE_integer(
    'save_checkpoints_secs', 1000,
    'Interval (in seconds) at which the model data '
    'should be checkpointed. Set to 0 to disable.')
flags.DEFINE_integer(
    'save_summary_steps', 100,
    'Number of steps which must have run before showing summaries.')
tf.flags.DEFINE_integer("iterations", 1000,
                        "Number of iterations per TPU training loop.")
tf.flags.DEFINE_integer("num_shards", 8, "Number of shards (TPU chips).")
tf.flags.DEFINE_integer("batch_size", 1024,
                        "Mini-batch size for the training. Note that this "
                        "is the global batch size and not the per-shard batch.")
FLAGS = tf.flags.FLAGS

if FLAGS.use_tpu:
    my_project_name = subprocess.check_output(['gcloud', 'config', 'get-value', 'project'])
    my_zone = subprocess.check_output(['gcloud', 'config', 'get-value', 'compute/zone'])
    cluster_resolver = TPUClusterResolver(
        tpu=[FLAGS.tpu],
        zone=my_zone,
        project=my_project_name)
    master = TPUClusterResolver(tpu=[os.environ['TPU_NAME']]).get_master()
else:
    master = ''

my_tpu_run_config = tpu_config.RunConfig(
    master=master,
    model_dir=FLAGS.model_dir,
    save_checkpoints_secs=FLAGS.save_checkpoints_secs,
    save_summary_steps=FLAGS.save_summary_steps,
    session_config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True),
    tpu_config=tpu_config.TPUConfig(iterations_per_loop=FLAGS.iterations, num_shards=FLAGS.num_shards),
)

# create estimator for model (the model is described in capser_7_model_fn)
capser = tpu_estimator.TPUEstimator(model_fn=model_fn_tpu,
                                    config=my_tpu_run_config,
                                    use_tpu=FLAGS.use_tpu,
                                    train_batch_size=batch_size,
                                    params={'model_batch_size': batch_size_per_shard})

# train model
logging.getLogger().setLevel(logging.INFO)  # to show info about training progress
capser.train(input_fn=train_input_fn_tpu, steps=n_steps)
I have a capsule network defined in model_fn_tpu, which returns the TPUEstimator spec. The optimizer is a standard AdamOptimizer. I have made all the changes explained here https://www.tensorflow.org/guide/using_tpu#optimizer to make my code compatible with TPUEstimator. I get the following error:
Traceback (most recent call last):
File "C:/Users/doerig/PycharmProjects/capser/TPU_playground.py", line 85, in <module>
capser.train(input_fn=train_input_fn_tpu, steps=n_steps)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 363, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 843, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 856, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 831, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\contrib\tpu\python\tpu\tpu_estimator.py", line 2016, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\contrib\tpu\python\tpu\tpu_estimator.py", line 1121, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\contrib\tpu\python\tpu\tpu_estimator.py", line 1317, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "C:\Users\doerig\PycharmProjects\capser\capser_7_model_fn.py", line 101, in model_fn_tpu
**output_decoder_deconv_params)
File "C:\Users\doerig\PycharmProjects\capser\capser_model.py", line 341, in capser_model
loss_training_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step(), name="training_op")
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\python\training\optimizer.py", line 424, in minimize
name=name)
File "C:\Users\doerig\AppData\Local\Continuum\Anaconda2\envs\tensorflow\lib\site-packages\tensorflow\contrib\tpu\python\tpu\tpu_optimizer.py", line 113, in apply_gradients
summed_grads_and_vars.append((tpu_ops.cross_replica_sum(grad), var))
AttributeError: module 'tensorflow.contrib.tpu.python.ops.tpu_ops' has no attribute 'cross_replica_sum'
Any ideas to solve this problem? Thank you in advance!
I suspect this is either a bug in the combination of the TensorFlow version you are using and Windows, or else an issue with your build of TensorFlow.
For example, when I chase down the file tensorflow\contrib\tpu\python\tpu\tpu_optimizer.py in the TF 1.4 branch, I see that tpu_ops is imported as:
from tensorflow.contrib.tpu.python.ops import tpu_ops
and if you chase that to the relevant file, you see:
if platform.system() != "Windows":
    # pylint: disable=wildcard-import,unused-import,g-import-not-at-top
    from tensorflow.contrib.tpu.ops.gen_tpu_ops import *
    from tensorflow.contrib.util import loader
    from tensorflow.python.platform import resource_loader
    # pylint: enable=wildcard-import,unused-import,g-import-not-at-top

    _tpu_ops = loader.load_op_library(
        resource_loader.get_path_to_datafile("_tpu_ops.so"))
else:
    # We have already built the appropriate libraries into the binary via CMake
    # if we have built contrib, so we don't need this
    pass
Following up with the other TF branches that existed at the time of this posting, we see similar comments in 1.5, in 1.6, in 1.7, in 1.8, and in 1.9.
I strongly suspect this would not occur under Linux, but I might test this later and edit this answer.
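A practical workaround while you stay on Windows, sketched under the assumption that capser_7_model_fn.py wraps the Adam optimizer in tf.contrib.tpu.CrossShardOptimizer unconditionally: only wrap it when you actually run on a TPU, which is also the pattern the TPU guide linked above recommends. With use_tpu=False the Windows-only tpu_ops path is then never reached:
# Inside the model_fn (sketch; the learning rate is illustrative):
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
if use_tpu:  # e.g. passed in through params
    # CrossShardOptimizer.apply_gradients calls tpu_ops.cross_replica_sum,
    # which is what fails on this Windows build, so skip the wrapper on CPU runs.
    optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
loss_training_op = optimizer.minimize(loss=loss,
                                      global_step=tf.train.get_global_step(),
                                      name="training_op")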

How to write a pickle file to S3, as a result of a luigi Task?

I want to store a pickle file on S3, as a result of a luigi Task. Below is the class that defines the Task:
class CreateItemVocabulariesTask(luigi.Task):
    def __init__(self):
        self.client = S3Client(AwsConfig().aws_access_key_id,
                               AwsConfig().aws_secret_access_key)
        super().__init__()

    def requires(self):
        return [GetItem2VecDataTask()]

    def run(self):
        filename = 'item2vec_results.tsv'
        data = self.client.get('s3://{}/item2vec_results.tsv'.format(AwsConfig().item2vec_path),
                               filename)
        df = pd.read_csv(filename, sep='\t', encoding='latin1')
        unique_users = df['CustomerId'].unique()
        unique_items = df['ProductNumber'].unique()
        item_to_int, int_to_item = utils.create_lookup_tables(unique_items)
        user_to_int, int_to_user = utils.create_lookup_tables(unique_users)
        with self.output()[0].open('wb') as out_file:
            pickle.dump(item_to_int, out_file)
        with self.output()[1].open('wb') as out_file:
            pickle.dump(int_to_item, out_file)
        with self.output()[2].open('wb') as out_file:
            pickle.dump(user_to_int, out_file)
        with self.output()[3].open('wb') as out_file:
            pickle.dump(int_to_user, out_file)

    def output(self):
        files = [S3Target('s3://{}/item2int.pkl'.format(AwsConfig().item2vec_path), client=self.client),
                 S3Target('s3://{}/int2item.pkl'.format(AwsConfig().item2vec_path), client=self.client),
                 S3Target('s3://{}/user2int.pkl'.format(AwsConfig().item2vec_path), client=self.client),
                 S3Target('s3://{}/int2user.pkl'.format(AwsConfig().item2vec_path), client=self.client)]
        return files
When I run this task I get the error ValueError: Unsupported open mode 'wb'. The items I'm trying to dump into pickle files are just Python dictionaries.
Full traceback:
Traceback (most recent call last):
File "C:\Anaconda3\lib\site-packages\luigi\worker.py", line 203, in run
new_deps = self._run_get_new_deps()
File "C:\Anaconda3\lib\site-packages\luigi\worker.py", line 140, in _run_get_new_deps
task_gen = self.task.run()
File "C:\Users\user\Documents\python workspace\pipeline.py", line 60, in run
with self.output()[0].open('wb') as out_file:
File "C:\Anaconda3\lib\site-packages\luigi\contrib\s3.py", line 714, in open
raise ValueError("Unsupported open mode '%s'" % mode)
ValueError: Unsupported open mode 'wb'
This is an issue that only happens on Python 3.x, as explained here. In order to use Python 3 and write a binary file or target (i.e. using 'wb' mode), just set the format parameter of S3Target to Nop, like this:
S3Target('s3://path/to/file', client=self.client, format=luigi.format.Nop)
Note that this is just a workaround; it is neither intuitive nor well documented.
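Applied to the task in the question, output() would look something like this (a sketch; the paths and AwsConfig come from the question itself):
def output(self):
    # format=luigi.format.Nop keeps luigi from wrapping the stream in a text
    # formatter, so open('wb') is accepted on Python 3.
    base = 's3://{}/{{}}'.format(AwsConfig().item2vec_path)
    return [S3Target(base.format(name), client=self.client, format=luigi.format.Nop)
            for name in ('item2int.pkl', 'int2item.pkl', 'user2int.pkl', 'int2user.pkl')]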

Sagemaker ImportError: Import by filename is not supported

I have a custom algorithm for text prediction and want to deploy it in SageMaker. I am following this tutorial:
https://docs.aws.amazon.com/sagemaker/latest/dg/tf-example1.html
The only change from the tutorial is:
from sagemaker.tensorflow import TensorFlow

iris_estimator = TensorFlow(entry_point='/home/ec2-user/SageMaker/sagemaker.py',
                            role=role,
                            output_path=model_artifacts_location,
                            code_location=custom_code_upload_location,
                            train_instance_count=1,
                            train_instance_type='ml.c4.xlarge',
                            training_steps=1000,
                            evaluation_steps=100,
                            source_dir="./",
                            requirements_file="requirements.txt")
and:
%%time
import boto3
train_data_location = 's3://sagemaker-<my bucket>'
iris_estimator.fit(train_data_location)
INFO: the dataset is at the root of the bucket.
Error log:
ValueError: Error training sagemaker-tensorflow-2018-06-19-07-11-13-634: Failed Reason: AlgorithmError: uncaught exception during training: Import by filename is not supported.
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/container_support/training.py", line 36, in start
fw.train()
File "/usr/local/lib/python2.7/dist-packages/tf_container/train_entry_point.py", line 143, in train
customer_script = env.import_user_module()
File "/usr/local/lib/python2.7/dist-packages/container_support/environment.py", line 101, in import_user_module
user_module = importlib.import_module(script)
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
ImportError: Import by filename is not supported.
I solved this issue. The problem was using an absolute path for entry_point:
when you use the source_dir parameter, the path to entry_point should be relative to source_dir.
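For example (a sketch based on the estimator call from the question; only entry_point changes, and it assumes the notebook runs from /home/ec2-user/SageMaker so that "./" contains sagemaker.py, otherwise point source_dir at that directory):
iris_estimator = TensorFlow(entry_point='sagemaker.py',   # relative to source_dir
                            role=role,
                            output_path=model_artifacts_location,
                            code_location=custom_code_upload_location,
                            train_instance_count=1,
                            train_instance_type='ml.c4.xlarge',
                            training_steps=1000,
                            evaluation_steps=100,
                            source_dir="./",               # the directory that contains sagemaker.py
                            requirements_file="requirements.txt")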
I solved it with:
region = boto3.Session().region_name
train_data_location = 's3://sagemaker-<my bucket>'.format(region)

Adam optimizer reports an error in Chainer?

version: chainer 2.0.2
I use the Adam optimizer and it reports an error. I found that it is caused by this code (is fix1 == 0?):
in adam.py:
@property
def lr(self):
    fix1 = 1. - math.pow(self.hyperparam.beta1, self.t)
    fix2 = 1. - math.pow(self.hyperparam.beta2, self.t)
    return self.hyperparam.alpha * math.sqrt(fix2) / fix1
Error log:
Traceback (most recent call last):
File "AU_rcnn/train.py", line 237, in <module>
main()
File "AU_rcnn/train.py", line 233, in main
trainer.run()
File "/root/anaconda3/lib/python3.6/site-packages/chainer/training/trainer.py", line 285, in run
initializer(self)
File "/root/anaconda3/lib/python3.6/site-packages/chainer/training/extensions/exponential_shift.py", line 48, in initialize
self._init = getattr(optimizer, self._attr)
File "/root/anaconda3/lib/python3.6/site-packages/chainer/optimizers/adam.py", line 121, in lr
return self.hyperparam.alpha * math.sqrt(fix2) / fix1
ZeroDivisionError: float division by zero
Use "alpha" attribute to control learning rate for Adam in Chainer.
"lr" is defined as built-in property, it should not be override by other value.
Set "alpha" as an attribute for ExponentialShift (official doc) as well to decay learning rate, if you use Adam optimizer.
from chainer.optimizers import Adam
optimizer = Adam(alpha=0.001)
# --- Define trainer here... ---
trainer.extend(extensions.ExponentialShift("alpha", 0.99, optimizer=optimizer), trigger=(1, 'epoch'))
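For context (a minimal sketch, not part of the original answer): ExponentialShift("lr", ...) reads optimizer.lr while initializing, and before the first parameter update self.t is still 0, so fix1 = 1 - beta1**0 evaluates to 0 and the property divides by zero. Shifting "alpha" sidesteps that computed property:
import math

beta1, t = 0.9, 0                 # before any update, t == 0
fix1 = 1. - math.pow(beta1, t)
print(fix1)                       # 0.0, so alpha * sqrt(fix2) / fix1 raises ZeroDivisionError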
I have the same problem and tried corochann's approach.
However, it didn't solve the problem.
My chainer version is 2.1.0.
The code used is https://github.com/chainer/chainer/blob/master/examples/cifar/train_cifar.py
with L57 changed to "optimizer = chainer.optimizers.Adam()".

Error: No module named numpy.core.multiarray. Maxent treebank pos-tagger model is already installed

This is my program:
import nltk
text = "Rabbit is eating"
token2 = nltk.word_tokenize(text)
print token2
txttoken = nltk.pos_tag(token2)
print txttoken
This is the error I'm getting:
Traceback (most recent call last):
File "PosTag.py", line 8, in <module>
txttoken = nltk.pos_tag(token2)
File "C:\Python27\lib\site-packages\nltk-2.0.4-py2.7.egg\nltk\tag\__init__.py", line 99, in pos_tag
tagger = load(_POS_TAGGER)
File "C:\Python27\lib\site-packages\nltk-2.0.4-py2.7.egg\nltk\data.py", line 605, in load
resource_val = pickle.load(_open(resource_url))
ImportError: No module named numpy.core.multiarray
I just checked with nltk.download() in the Python cmd.
These models are already installed, but I still get the error.
NLTK Downloader models: the Maxent treebank pos-tagger model is installed, and the punkt model is installed.
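A likely cause, hedged: the pickled tagger depends on NumPy, and numpy.core.multiarray cannot be imported by the C:\Python27 interpreter running the script, so having the NLTK models installed does not help. A quick check in that same interpreter:
# Run with the same C:\Python27 interpreter that runs PosTag.py; if this fails,
# NumPy is missing or broken there and (re)installing it should resolve the error.
import numpy
print numpy.__version__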