scrapyd Error on schedule new spider - scrapy

I cannot schedule a spider run.
The deploy seems to be OK:
Deploying to project "scraper" in http://localhost:6800/addversion.json
Server response (200):
{"status": "ok", "project": "scraper", "version": "1418909664", "spiders": 3}
I schedule a new spider run:
curl http://localhost:6800/schedule.json -d project=scraper -d spider=spider
{"status": "ok", "jobid": "3f81a0e486bb11e49a6800163ed5ae93"}
But in the scrapyd log I get this error:
2014-12-18 14:39:12+0100 [-] Process started: project='scraper' spider='spider' job='3f81a0e486bb11e49a6800163ed5ae93' pid=28565 log='/usr/scrapyd/logs/scraper/spider/3f81a0e486bb11e49a6800163ed5ae93.log' items='/usr/scrapyd/items/scraper/spider/3f81a0e486bb11e49a6800163ed5ae93.jl'
2014-12-18 14:39:13+0100 [Launcher,28565/stderr] Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/local/lib/python2.7/dist-packages/scrapyd/runner.py", line 39, in <module>
2014-12-18 14:39:13+0100 [Launcher,28565/stderr] main()
File "/usr/local/lib/python2.7/dist-packages/scrapyd/runner.py", line 36, in main
execute()
File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 143, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 89, in _run_print_help
func(*a, **kw)
File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 150, in _run_command
cmd.run(args, opts)
File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 58, in run
spider = crawler.spiders.create(spname, **opts.spargs)
2014-12-18 14:39:13+0100 [Launcher,28565/stderr] File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermanager.py", line 48, in create
return spcls(**spider_kwargs)
File "build/bdist.linux-x86_64/egg/scraper/spiders/spider.py", line 104, in __init__
File "/usr/lib/python2.7/os.py", line 157, in makedirs
mkdir(name, mode)
OSError: [Errno 20] Not a directory: '/tmp/scraper-1418909944-dKTRZI.egg/logs/'
2014-12-18 14:39:14+0100 [-] Process died: exitstatus=1 project='scraper'
Any ideas? :(

You are trying to create a directory inside an egg. The egg is a single archive file, not a directory, so os.makedirs fails on the path below:
OSError: [Errno 20] Not a directory: '/tmp/scraper-1418909944-dKTRZI ---->.egg<----- /logs/'
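Line 104 of spider.py is not shown in the question, so the following is only a sketch of one way around it: build the log directory under a location that is writable at run time instead of under the package path. The function name, the SCRAPER_LOG_DIR environment variable and the fallback to the system temp directory are all assumptions made for illustration, not part of the original spider.

```python
import os
import tempfile


def writable_log_dir(app_name="scraper", env_var="SCRAPER_LOG_DIR"):
    """Return a log directory outside the deployed egg, creating it if needed.

    app_name and env_var are hypothetical names chosen for this sketch.
    """
    # Prefer an explicit override; otherwise fall back to the system temp dir.
    base = os.environ.get(env_var) or os.path.join(tempfile.gettempdir(), app_name)
    log_dir = os.path.join(base, "logs")
    # isdir/makedirs instead of exist_ok so this also runs on Python 2.7.
    if not os.path.isdir(log_dir):
        os.makedirs(log_dir)
    return log_dir
```

In the spider's __init__ you would then call writable_log_dir() instead of deriving the path from the module's own location, which points inside the .egg file once the project is deployed to scrapyd.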

Related

Spyder can't launch

My Spyder crashes while launching. It has never opened successfully.
PermissionError: [Errno 13] Permission denied: '/Users/macbookair/.config/pylintrc'
Detailed Error:
Traceback (most recent call last):
File "/Users/macbookair/anaconda3/bin/spyder", line 11, in <module>
sys.exit(main())
File "/Users/macbookair/anaconda3/lib/python3.9/site-packages/spyder/app/start.py", line 252, in main
mainwindow.main(options, args)
File "/Users/macbookair/anaconda3/lib/python3.9/site-packages/spyder/app/mainwindow.py", line 1956, in main
mainwindow = create_window(MainWindow, app, splash, options, args)
File "/Users/macbookair/anaconda3/lib/python3.9/site-packages/spyder/app/utils.py", line 289, in create_window
main.setup()
File "/Users/macbookair/anaconda3/lib/python3.9/site-packages/spyder/app/mainwindow.py", line 736, in setup
internal_plugins = find_internal_plugins()
File "/Users/macbookair/anaconda3/lib/python3.9/site-packages/spyder/app/find_plugins.py", line 40, in find_internal_plugins
mod = importlib.import_module(entry_point.module_name)
File "/Users/macbookair/anaconda3/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1030, in _gcd_import
File "", line 1007, in _find_and_load
File "", line 986, in _find_and_load_unlocked
File "", line 680, in _load_unlocked
File "", line 850, in exec_module
File "", line 228, in _call_with_frames_removed
File "/Users/macbookair/anaconda3/lib/python3.9/site-packages/spyder/plugins/pylint/plugin.py", line 22, in <module>
from spyder.plugins.pylint.confpage import PylintConfigPage
File "/Users/macbookair/anaconda3/lib/python3.9/site-packages/spyder/plugins/pylint/confpage.py", line 16, in <module>
from spyder.plugins.pylint.main_widget import (MAX_HISTORY_ENTRIES,
File "/Users/macbookair/anaconda3/lib/python3.9/site-packages/spyder/plugins/pylint/main_widget.py", line 35, in <module>
from spyder.plugins.pylint.utils import get_pylintrc_path
File "/Users/macbookair/anaconda3/lib/python3.9/site-packages/spyder/plugins/pylint/utils.py", line 16, in <module>
import pylint.config
File "/Users/macbookair/anaconda3/lib/python3.9/site-packages/pylint/config/__init__.py", line 27, in <module>
from pylint.config.environment_variable import PYLINTRC
File "/Users/macbookair/anaconda3/lib/python3.9/site-packages/pylint/config/environment_variable.py", line 11, in <module>
PYLINTRC = find_pylintrc()
File "/Users/macbookair/anaconda3/lib/python3.9/site-packages/pylint/config/find_default_config_files.py", line 94, in find_pylintrc
for config_file in find_default_config_files():
File "/Users/macbookair/anaconda3/lib/python3.9/site-packages/pylint/config/find_default_config_files.py", line 76, in find_default_config_files
if home_rc.is_file():
File "/Users/macbookair/anaconda3/lib/python3.9/pathlib.py", line 1456, in is_file
return S_ISREG(self.stat().st_mode)
File "/Users/macbookair/anaconda3/lib/python3.9/pathlib.py", line 1232, in stat
return self._accessor.stat(self)
PermissionError: [Errno 13] Permission denied: '/Users/macbookair/.config/pylintrc'
Has anyone else run into this problem?
(Spyder maintainer here) This problem will be fixed in Spyder 5.4.1, to be released before the end of 2022.
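For anyone wondering why a config file that merely cannot be read is enough to stop startup: the traceback shows Path.is_file() letting the PermissionError propagate instead of returning False. The sketch below only illustrates that failure mode and a guarded alternative. It is not the actual Spyder or pylint patch, and the helper name is made up for the example.

```python
from pathlib import Path


def is_file_or_false(path):
    """Like Path.is_file(), but treat an inaccessible path as 'not a file'.

    Hypothetical helper for illustration only.
    """
    try:
        return Path(path).is_file()
    except PermissionError:
        # As in the traceback above, is_file() re-raises "permission denied"
        # instead of answering False, so the caller has to handle it.
        return False
```

A caller probing optional config locations such as ~/.config/pylintrc could use a check like this and simply skip a location it is not allowed to read.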

problem running python script with Selenium wrong parameter

Good morning. For the past few days I have had this problem when I run my Python script. It was working until two days ago, but now it fails and I can't work out what the problem is.
I attach the error output and imports.
Thanks
2022-07-14 15:49:04,691 INFO Your selenium-driver-updater library is up to date.
2022-07-14 15:49:05,756 INFO Latest version of edgedriver: 103.0.1264.51
2022-07-14 15:49:05,756 INFO Started download edgedriver latest_version: 103.0.1264.51
2022-07-14 15:49:05,756 INFO Started download edgedriver by url: https://msedgedriver.azureedge.net/103.0.1264.51/edgedriver_win64.zip
2022-07-14 15:49:06,882 ERROR error: Traceback (most recent call last):
File "C:\Users\diriy\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium_driver_updater\driverUpdater.py", line 133, in install
driver_path = DriverUpdater.__run_specific_driver()
File "C:\Users\diriy\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium_driver_updater\driverUpdater.py", line 339, in __run_specific_driver
driver_path = driver.main()
File "C:\Users\diriy\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium_driver_updater\_edgeDriver.py", line 54, in main
driver_path = self.__check_if_edgedriver_is_up_to_date()
File "C:\Users\diriy\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium_driver_updater\_edgeDriver.py", line 80, in __check_if_edgedriver_is_up_to_date
driver_path = self._download_driver()
File "C:\Users\diriy\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium_driver_updater\_edgeDriver.py", line 187, in _download_driver
archive_path = wget.download(url=url, out=out_path)
File "C:\Users\diriy\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\wget.py", line 526, in download
(tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 239, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 214, in urlopen
return opener.open(url, data, timeout)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 523, in open
response = meth(req, response)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 632, in http_response
response = self.parent.error(
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 561, in error
return self._call_chain(*args)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 494, in _call_chain
result = func(*args)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 641, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
c:\Users\diriy\OneDrive\Python\Monatseinteilung\Monatseint_Tool_v3.104.py:65: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
driver = webdriver.Edge(filename, options=options)
Traceback (most recent call last):
File "c:\Users\diriy\OneDrive\Python\Monatseinteilung\Monatseint_Tool_v3.104.py", line 65, in <module>
driver = webdriver.Edge(filename, options=options)
File "C:\Users\diriy\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium\webdriver\edge\webdriver.py", line 61, in __init__
super().__init__(DesiredCapabilities.EDGE['browserName'], "ms",
File "C:\Users\diriy\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium\webdriver\chromium\webdriver.py", line 89, in __init__
self.service.start()
File "C:\Users\diriy\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\selenium\webdriver\common\service.py", line 71, in start
self.process = subprocess.Popen(cmd, env=self.env,
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\subprocess.py", line 1420, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
OSError: [WinError 87] Wrong parameter

Tensorflow TF_records Generate Error

When I try to generate a TFRecord file, I get the following error message:
Traceback (most recent call last):
File "generate_tfrecord.py", line 112, in <module>
tf.app.run()
File "/home/harisohmnaathss/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 124, in run
_sys.exit(main(argv))
File "generate_tfrecord.py", line 98, in main
writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
File "/home/harisohmnaathss/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/lib/io/tf_record.py", line 106, in __init__
compat.as_bytes(path), compat.as_bytes(compression_type), status)
File "/home/harisohmnaathss/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ; No such file or directory
The command that I try to run is:
python generate_tfrecord.py --csv_input=data/Train_labels.csv --output_path=data/train.records
Any ideas to solve this issue?
You are passing the output path as data/train.records instead of data/train.tfrecord.
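Separately from the file name, one more thing worth ruling out with a "No such file or directory" error at the writer is that the data/ directory does not exist relative to where the script is run. As far as I know the writer does not create missing parent directories, but treat that as an assumption to verify. A minimal sketch of a pre-check, with a hypothetical helper name:

```python
import os


def ensure_parent_dir(output_path):
    """Create the directory that will hold output_path if it is missing.

    Hypothetical helper; returns the absolute parent directory.
    """
    parent = os.path.dirname(os.path.abspath(output_path))
    # exist_ok avoids an error when the directory is already there.
    os.makedirs(parent, exist_ok=True)
    return parent
```

Calling ensure_parent_dir(FLAGS.output_path) just before the TFRecordWriter line in generate_tfrecord.py would take this cause off the table.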

Job failed on Cloud ML after successful completion of 1000 steps

I walked through this Cloud ML tutorial on census data: cloud.google.com/ml-engine/docs/how-tos/getting-started-training-prediction and the job was successful. However, when I walk through this tutorial on flower image data: https://cloud.google.com/blog/big-data/2016/12/how-to-classify-images-with-tensorflow-using-google-cloud-machine-learning-and-cloud-dataflow my training task appears to be successful, based on the completion of 1000 steps in the log. Yet once those steps complete, the Stackdriver logs say the job failed. I have tried reusing the command-line argument structure from the census data walkthrough, deleting and recreating JOB_ID and the --output_path user argument, and using the STANDARD_1 scale tier, all to no avail. Any help from the community would be appreciated. Thanks!
Below are the errors that appeared towards the tail end of the log snapshot:
{
textPayload: "The replica master 0 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 542, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 305, in main
run(model, argv)
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 436, in run
dispatch(args, model, cluster, task)
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 477, in dispatch
Trainer(args, model, cluster, task).run_training()
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 241, in run_training
self.eval(session)
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 283, in eval
self.model.format_metric_values(self.evaluator.evaluate()))
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 95, in evaluate
return metric_values
File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 960, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/supervisor.py", line 788, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 386, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 234, in _run
sess.run(enqueue_op)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
NotFoundError: Error executing an HTTP request (HTTP response code 404, error code 0, error message '')
when reading gs://project-166422-ml/User/flowers_User_20170522_121407/preproc/eval
[[Node: ReaderReadUpToV2 = ReaderReadUpToV2[_device="/job:localhost/replica:0/task:0/cpu:0"](TFRecordReaderV2, input_producer, ReaderReadUpToV2/num_records)]]
Caused by op u'ReaderReadUpToV2', defined at:
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 542, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 305, in main
run(model, argv)
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 436, in run
dispatch(args, model, cluster, task)
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 477, in dispatch
Trainer(args, model, cluster, task).run_training()
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 241, in run_training
self.eval(session)
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 283, in eval
self.model.format_metric_values(self.evaluator.evaluate()))
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 57, in evaluate
self.eval_batch_size)
File "/root/.local/lib/python2.7/site-packages/trainer/model.py", line 310, in build_eval_graph
return self.build_graph(data_paths, batch_size, GraphMod.EVALUATE)
File "/root/.local/lib/python2.7/site-packages/trainer/model.py", line 231, in build_graph
num_epochs=None if is_training else 2)
File "/root/.local/lib/python2.7/site-packages/trainer/util.py", line 52, in read_examples
filename_queue, batch_size)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/io_ops.py", line 226, in read_up_to
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 380, in _reader_read_up_to_v2
num_records=num_records, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
self._traceback = _extract_stack()
NotFoundError (see above for traceback): Error executing an HTTP request (HTTP response code 404, error code 0, error message '')
when reading gs://project-166422-ml/User/flowers_User_20170522_121407/preproc/eval
[[Node: ReaderReadUpToV2 = ReaderReadUpToV2[_device="/job:localhost/replica:0/task:0/cpu:0"](TFRecordReaderV2, input_producer, ReaderReadUpToV2/num_records)]]
To find out more about why your job exited please check the logs: console.cloud.google.com/logs/viewer?project=123456234&resource=ml_job%2Fjob_id%2Fflowers_User_20170524_145125&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%22flowers_User_20170524_145125%22"
The error indicates a 404 not found when trying to read
gs://project-166422-ml/User/flowers_User_20170522_121407/preproc/eval
Does that file exist?
Based on the name, I'm guessing that's the evaluation data. If evaluation only runs every 1000 steps, that would explain why the first 1000 training steps complete successfully: the job then tries to run evaluation and fails because the data doesn't exist.
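If that is the cause, a pre-flight check on the input patterns before submitting the job would surface it much earlier than step 1000. The sketch below is an assumption-laden illustration: missing_inputs and its list_matches parameter are hypothetical names, and the default glob.glob only understands local paths. For gs:// paths you would pass in a lister that understands Cloud Storage, for example tf.io.gfile.glob (tf.gfile.Glob on older TensorFlow 1.x), assuming it is available in your version.

```python
import glob


def missing_inputs(patterns, list_matches=glob.glob):
    """Return the file patterns that match no files at all.

    list_matches is any callable mapping a pattern to a list of matching
    paths; the default only handles the local filesystem.
    """
    return [p for p in patterns if not list_matches(p)]
```

For the job in the question, the patterns to check would be the training and evaluation data paths passed to the trainer. An empty result means every pattern matched at least one file.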

Scrapyd S3 feed export "Connection Reset by Peer"

I'm running Scrapyd with FEED_URI set to export to S3, but I received the following error at the very end of my scrape. Note that it successfully uploaded a few hundred KB of data to the bucket as the scrape began, then threw this error at the end:
2014-11-24 10:11:23+0000 [word] ERROR: Error storing csv feed (2285242 items) in: s3://kitchen.bucket/FoodItem.csv
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 783, in __bootstrap
self.__bootstrap_inner()
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
--- <exception caught here> ---
File "/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 191, in _worker
result = context.call(ctx, function, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
return func(*args,**kw)
File "/usr/local/lib/python2.7/dist-packages/scrapy/contrib/feedexport.py", line 101, in _store_in_thread
key.set_contents_from_file(file)
File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 1291, in set_contents_from_file
chunked_transfer=chunked_transfer, size=size)
File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 748, in send_file
chunked_transfer=chunked_transfer, size=size)
File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 949, in _send_file_internal
query_args=query_args
File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 664, in make_request
retry_handler=retry_handler
File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1068, in make_request
retry_handler=retry_handler)
File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 939, in _mexe
request.body, request.headers)
File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 842, in sender
http_conn.send(chunk)
File "/usr/lib/python2.7/httplib.py", line 805, in send
self.sock.sendall(data)
File "/usr/lib/python2.7/ssl.py", line 329, in sendall
v = self.send(data[count:])
File "/usr/lib/python2.7/ssl.py", line 298, in send
v = self._sslobj.write(data)
socket.error: [Errno 104] Connection reset by peer
Looks similar to boto issue 2207. I'm using gbirke's MultiFeedExporter, and received a similar error on both of my items.