I am trying to deploy a scrapy 2.1.0 project with scrapy-deploy 1.2 and get this error:
scrapyd-deploy example
/Library/Frameworks/Python.framework/Versions/3.8/bin/scrapyd-deploy:23: ScrapyDeprecationWarning: Module `scrapy.utils.http` is deprecated, Please import from `w3lib.http` instead.
from scrapy.utils.http import basic_auth_header
fatal: No names found, cannot describe anything.
Packing version r1-master
Deploying to project "crawler" in http://myip:6843/addversion.json
Server response (200):
{"node_name": "spider1", "status": "error", "message": "/usr/local/lib/python3.8/dist-packages/scrapy/utils/project.py:90: ScrapyDeprecationWarning: Use of environment variables prefixed with SCRAPY_ to override settings is deprecated. The following environment variables are currently defined: EGG_VERSION\n warnings.warn(\nTraceback (most recent call last):\n File \"/usr/lib/python3.8/runpy.py\", line 193, in _run_module_as_main\n return _run_code(code, main_globals, None,\n File \"/usr/lib/python3.8/runpy.py\", line 86, in _run_code\n exec(code, run_globals)\n File \"/usr/local/lib/python3.8/dist-packages/scrapyd/runner.py\", line 40, in <module>\n main()\n File \"/usr/local/lib/python3.8/dist-packages/scrapyd/runner.py\", line 37, in main\n execute()\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/cmdline.py\", line 142, in execute\n cmd.crawler_process = CrawlerProcess(settings)\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py\", line 280, in __init__\n super(CrawlerProcess, self).__init__(settings)\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py\", line 152, in __init__\n self.spider_loader = self._get_spider_loader(settings)\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py\", line 146, in _get_spider_loader\n return loader_cls.from_settings(settings.frozencopy())\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/spiderloader.py\", line 60, in from_settings\n return cls(settings)\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/spiderloader.py\", line 24, in __init__\n self._load_all_spiders()\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/spiderloader.py\", line 46, in _load_all_spiders\n for module in walk_modules(name):\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/utils/misc.py\", line 69, in walk_modules\n mod = import_module(path)\n File \"/usr/lib/python3.8/importlib/__init__.py\", line 127, in import_module\n return _bootstrap._gcd_import(name[level:], package, level)\n File \"<frozen importlib._bootstrap>\", line 1014, in _gcd_import\n File \"<frozen importlib._bootstrap>\", line 991, in _find_and_load\n File \"<frozen importlib._bootstrap>\", line 973, in _find_and_load_unlocked\nModuleNotFoundError: No module named 'crawler.spiders_prod'\n"}
crawler.spiders_prod is the first module defined in SPIDER_MODULES
Part of crawler.settings.py:
SPIDER_MODULES = ['crawler.spiders_prod', 'crawler.spiders_dev']
NEWSPIDER_MODULE = 'crawler.spiders_dev'
The crawler works localy, but using deploy it will fail to use whatever I call the folder where my spiders live in.
scrapyd-deploy setup.py:
# Automatically created by: scrapyd-deploy
from setuptools import setup, find_packages
setup(
name = 'project',
version = '1.0',
packages = find_packages(),
entry_points = {'scrapy': ['settings = crawler.settings']},
)
scrapy.cfg:
[deploy:example]
url = http://myip:6843/
username = test
password = whatever.
project = crawler
version = GIT
Is this possibly a bug or am I missing something?
Modules have to be initialised within scrapy. This happens through simply placing the following file into each folder defined as a module:
__init__.py
This has solved my described problem.
Learning:
If you want to split your spiders into folders, it is not enough to simple create a folder and specify this folder as a module within the settings file, but you also need to place this file into the new folder. Funny engough the crawler works, without the file just deployment to scrapyd fails.
Related
I'm trying to install numpy on a Virtual Environment using pip, there is a failure when building wheel for it, however.
The problem is only present when trying to install it in the virtualenv, I can install and update it on my system just fine.
It is running on Windows 10, Python 3.10.5, pip 22.3.1.
Running from numpy source directory.
setup.py:67: DeprecationWarning:
`numpy.distutils` is deprecated since NumPy 1.23.0, as a result
of the deprecation of `distutils` itself. It will be removed for
Python >= 3.12. For older Python versions it will remain present.
It is recommended to use `setuptools < 60.0` for those Python versions.
For more details, see:
https://numpy.org/devdocs/reference/distutils_status_migration.html
import numpy.distutils.command.sdist
Processing numpy/random\_bounded_integers.pxd.in
Processing numpy/random\bit_generator.pyx
Processing numpy/random\mtrand.pyx
Processing numpy/random\_bounded_integers.pyx.in
Processing numpy/random\_common.pyx
Processing numpy/random\_generator.pyx
Processing numpy/random\_mt19937.pyx
Processing numpy/random\_pcg64.pyx
Processing numpy/random\_philox.pyx
Processing numpy/random\_sfc64.pyx
Cythonizing sources
INFO: blas_opt_info:
INFO: blas_armpl_info:
Looking for python310.dll
Traceback (most recent call last):
File "C:\stable_diffusion\diffusers_venv\lib\python3.10\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 363, in <module>
main()
File "C:\stable_diffusion\diffusers_venv\lib\python3.10\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 345, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "C:\stable_diffusion\diffusers_venv\lib\python3.10\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 261, in build_wheel
return _build_backend().build_wheel(wheel_directory, config_settings,
File "C:\Users\Vitor\AppData\Local\Temp\pip-build-env-amr3d935\overlay\lib\python3.10\site-packages\setuptools\build_meta.py", line 230, in build_wheel
return self._build_with_temp_dir(['bdist_wheel'], '.whl',
File "C:\Users\Vitor\AppData\Local\Temp\pip-build-env-amr3d935\overlay\lib\python3.10\site-packages\setuptools\build_meta.py", line 215, in _build_with_temp_dir
self.run_setup()
File "C:\Users\Vitor\AppData\Local\Temp\pip-build-env-amr3d935\overlay\lib\python3.10\site-packages\setuptools\build_meta.py", line 267, in run_setup
super(_BuildMetaLegacyBackend,
File "C:\Users\Vitor\AppData\Local\Temp\pip-build-env-amr3d935\overlay\lib\python3.10\site-packages\setuptools\build_meta.py", line 158, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 479, in <module>
setup_package()
File "setup.py", line 471, in setup_package
setup(**metadata)
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\core.py", line 135, in setup
config = configuration()
File "setup.py", line 118, in configuration
config.add_subpackage('numpy')
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\misc_util.py", line 1050, in add_subpackage
config_list = self.get_subpackage(subpackage_name, subpackage_path,
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\misc_util.py", line 1016, in get_subpackage
config = self._get_configuration_from_setup_py(
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\misc_util.py", line 958, in _get_configuration_from_setup_py
config = setup_module.configuration(*args)
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\setup.py", line 9, in configuration
config.add_subpackage('core')
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\misc_util.py", line 1050, in add_subpackage
config_list = self.get_subpackage(subpackage_name, subpackage_path,
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\misc_util.py", line 1016, in get_subpackage
config = self._get_configuration_from_setup_py(
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\misc_util.py", line 958, in _get_configuration_from_setup_py
config = setup_module.configuration(*args)
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\core\setup.py", line 853, in configuration
blas_info = get_info('blas_opt', 0)
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\system_info.py", line 585, in get_info
return cl().get_info(notfound_action)
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\system_info.py", line 845, in get_info
self.calc_info()
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\system_info.py", line 2077, in calc_info
if self._calc_info(blas):
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\system_info.py", line 2063, in _calc_info
return getattr(self, '_calc_info_{}'.format(name))()
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\system_info.py", line 1979, in _calc_info_armpl
info = get_info('blas_armpl')
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\system_info.py", line 585, in get_info
return cl().get_info(notfound_action)
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\system_info.py", line 845, in get_info
self.calc_info()
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\system_info.py", line 1337, in calc_info
info = self.check_libs2(lib_dirs, armpl_libs)
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\system_info.py", line 1001, in check_libs2
exts = self.library_extensions()
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\system_info.py", line 960, in library_extensions
c = customized_ccompiler()
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\system_info.py", line 216, in customized_ccompiler
global_compiler = _customized_ccompiler()
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\__init__.py", line 62, in customized_ccompiler
c = ccompiler.new_compiler(plat=plat, compiler=compiler, verbose=verbose)
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\ccompiler.py", line 780, in new_compiler
compiler = klass(None, dry_run, force)
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\mingw32ccompiler.py", line 64, in __init__
build_import_library()
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\mingw32ccompiler.py", line 348, in build_import_library
return _build_import_library_amd64()
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\mingw32ccompiler.py", line 397, in _build_import_library_amd64
dll_file = find_python_dll()
File "C:\Users\Vitor\AppData\Local\Temp\pip-install-1sr3obr4\numpy_fa19a5e39e9448e7b6553e0f5a275159\numpy\distutils\mingw32ccompiler.py", line 219, in find_python_dll
raise ValueError("%s not found in %s" % (dllname, lib_dirs))
ValueError: python310.dll not found in ['C:\\stable_diffusion\\diffusers_venv\\', 'C:\\stable_diffusion\\diffusers_venv\\lib', 'C:\\stable_diffusion\\diffusers_venv\\bin', 'C:\\Program Files\\Inkscape\\', 'C:\\Program Files\\Inkscape\\lib', 'C:\\Program Files\\Inkscape\\bin', 'C:\\WINDOWS\\System32']
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for numpy
Failed to build numpy
ERROR: Could not build wheels for numpy, which is required to install pyproject.toml-based projects
As I faced a similar problem before, my instinct was to update pip, change the running python version or try an older, specific, version of numpy, but nothing seems to work and the problem persist.
My tests pass locally and in fact on Github Actions it also says "ran 8 tests" and then "OK" (and I have 8). However, the test stage fails due to a strange error in the traceback.
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django/db/backends/utils.py", line 82, in _execute
return self.cursor.execute(sql)
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 421, in execute
return Database.Cursor.execute(self, query)
sqlite3.OperationalError: near "SCHEMA": syntax error
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/runner/work/store/store/manage.py", line 22, in <module>
main()
File "/home/runner/work/store/store/manage.py", line 18, in main
execute_from_command_line(sys.argv)
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
utility.execute()
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django/core/management/__init__.py", line 413, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django/core/management/commands/test.py", line 23, in run_from_argv
super().run_from_argv(argv)
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django/core/management/base.py", line 354, in run_from_argv
self.execute(*args, **cmd_options)
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django/core/management/base.py", line 398, in execute
output = self.handle(*args, **options)
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django/core/management/commands/test.py", line 55, in handle
failures = test_runner.run_tests(test_labels)
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django/test/runner.py", line 736, in run_tests
self.teardown_databases(old_config)
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django_heroku/core.py", line 41, in teardown_databases
self._wipe_tables(connection)
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django_heroku/core.py", line 26, in _wipe_tables
cursor.execute(
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django/db/backends/utils.py", line 82, in _execute
return self.cursor.execute(sql)
File "/opt/hostedtoolcache/Python/3.9.9/x64/lib/python3.9/site-packages/django/db/backends/sqlite3/base.py", line 421, in execute
return Database.Cursor.execute(self, query)
django.db.utils.OperationalError: near "SCHEMA": syntax error
Error: Process completed with exit code 1.
These are all just default Django files and I haven't messed with any of them. I don't really know what to do about it and internet searches yield nothing helpful.
Just had the same issue. For me I had django_heroku in my project settings.py. It looks like, based on the comments that your app is using this.
I had something like this in settings.py:
import django_heroku
django_heroku.settings(locals())
change to the following so it doesn't use django_heroku on github actions:
if os.environ.get('ENVIRONMENT') != 'github':
import django_heroku
django_heroku.settings(locals())
and then declare an environment variable in the workflow file, I have this:
jobs:
build:
runs-on: ubuntu-latest
strategy:
max-parallel: 4
matrix:
python-version: [3.7, 3.8, 3.9, '3.10']
env:
DJANGO_SECRET_KEY: "someKey"
ENVIRONMENT: github
Note: this is assuming you have environment variables setup. I know there are different ways of doing it, so depending on how they are setup you might have to change how they are accessed
more info here
edit:
As per the comments, you don't have to declare an additional environment variable because github has a default variable called GITHUB_ACTIONS
so you can do
if os.environ.get('GITHUB_ACTIONS') != 'true':
import django_heroku
django_heroku.settings(locals())
My scrapy settings.py
from datetime import datetime
file_name = datetime.today().strftime('%Y-%m-%d_%H%M_')
save_name = file_name + 'Mobile_Nshopping'
FEED_URI = 'ftp://myusername:mypassword#ftp.mymail.com/uploads/%(save_name)s.csv'
when I'm running my spider scrapy crawl my_project_name getting error...
Can I have to create a pipeline?
\scrapy\extensions\feedexport.py:247: ScrapyDeprecationWarning: The `FEED_URI` and `FEED_FORMAT` settings have been deprecated in favor of the `FEEDS` setting. Please see the `FEEDS` setting docs for more details
exporter = cls(crawler)
Traceback (most recent call last):
File "c:\users\viren\appdata\local\programs\python\python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\viren\appdata\local\programs\python\python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\viren\AppData\Local\Programs\Python\Python38\Scripts\scrapy.exe\__main__.py", line 7, in <module>
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\cmdline.py", line 145, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\cmdline.py", line 100, in _run_print_help
func(*a, **kw)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\cmdline.py", line 153, in _run_command
cmd.run(args, opts)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\commands\crawl.py", line 22, in run
crawl_defer = self.crawler_process.crawl(spname, **opts.spargs)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 191, in crawl
crawler = self.create_crawler(crawler_or_spidercls)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 224, in create_crawler
return self._create_crawler(crawler_or_spidercls)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 229, in _create_crawler
return Crawler(spidercls, self.settings)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 72, in __init__
self.extensions = ExtensionManager.from_crawler(self)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\middleware.py", line 53, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\middleware.py", line 35, in from_settings
mw = create_instance(mwcls, settings, crawler)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\utils\misc.py", line 167, in create_instance
instance = objcls.from_crawler(crawler, *args, **kwargs)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 247, in from_crawler
exporter = cls(crawler)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 282, in __init__
if not self._storage_supported(uri, feed_options):
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 427, in _storage_supported
self._get_storage(uri, feed_options)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 458, in _get_storage
instance = build_instance(feedcls.from_crawler, crawler)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 455, in build_instance
return build_storage(builder, uri, feed_options=feed_options, preargs=preargs)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 46, in build_storage
return builder(*preargs, uri, *args, **kwargs)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 201, in from_crawler
return build_storage(
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 46, in build_storage
return builder(*preargs, uri, *args, **kwargs)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 192, in __init__
self.port = int(u.port or '21')
File "c:\users\viren\appdata\local\programs\python\python38\lib\urllib\parse.py", line 174, in port
raise ValueError(message) from None
ValueError: Port could not be cast to integer value as 'Edh=)9sd'
I don't know how to store CSV into FTP.
error is coming because my password is int?
Is there anything I forget to write?
Can I have to create a pipeline?
Yes, you probably should create a pipeline. As shown in the Scrapy Architecture Diagram, the basic concept is this: requests are sent, responses come back and processed by the spider, and finally, the pipeline does something with the items returned by the spider. In your case, you could create a pipeline that saves the data in a CSV file and uploads it to an ftp server. See Scrapy's Item Pipeline documentation for more information.
I don't know how to store CSV into FTP. error is coming because my password is int? Is there anything I forget to write?
I believe this is due to the deprecation error below (and shown at the top of the errors you provided):
ScrapyDeprecationWarning: The FEED_URI and FEED_FORMAT settings have been deprecated in favor of the FEEDS setting. Please see the FEEDS setting docs for more details.
Try replacing FEED_URI with FEEDS; see the Scrapy documentation on FEEDS.
You need to specify the port as well.
You can specify this in settings.
See also class definition from scrapy docs
class FTPFilesStore:
FTP_USERNAME = None
FTP_PASSWORD = None
USE_ACTIVE_MODE = None
def __init__(self, uri):
if not uri.startswith("ftp://"):
raise ValueError(f"Incorrect URI scheme in {uri}, expected 'ftp'")
u = urlparse(uri)
self.port = u.port
self.host = u.hostname
self.port = int(u.port or 21)
self.username = u.username or self.FTP_USERNAME
self.password = u.password or self.FTP_PASSWORD
self.basedir = u.path.rstrip('/')
I am trying to deploy a scrapy project via scrapyd-deploy to a remote scrapyd server. The project itself is functional and works perfectly on my local machine and on the remote server when I deploy it via git push prod to the remote server.
With scrapyd-deploy I get this error:
% scrapyd-deploy example -p apo
{ "node_name": "spider1",
"status": "error",
"message": "/usr/local/lib/python3.8/dist-packages/scrapy/utils/project.py:90: ScrapyDeprecationWarning: Use of environment variables prefixed with SCRAPY_ to override settings is deprecated. The following environment variables are currently defined: EGG_VERSION\n warnings.warn(\nTraceback (most recent call last):\n File \"/usr/lib/python3.8/runpy.py\", line 193, in _run_module_as_main\n return _run_code(code, main_globals, None,\n File \"/usr/lib/python3.8/runpy.py\", line 86, in _run_code\n exec(code, run_globals)\n File \"/usr/local/lib/python3.8/dist-packages/scrapyd/runner.py\", line 40, in <module>\n main()\n File \"/usr/local/lib/python3.8/dist-packages/scrapyd/runner.py\", line 37, in main\n execute()\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/cmdline.py\", line 142, in execute\n cmd.crawler_process = CrawlerProcess(settings)\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py\", line 280, in __init__\n super(CrawlerProcess, self).__init__(settings)\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py\", line 152, in __init__\n self.spider_loader = self._get_spider_loader(settings)\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py\", line 146, in _get_spider_loader\n return loader_cls.from_settings(settings.frozencopy())\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/spiderloader.py\", line 60, in from_settings\n return cls(settings)\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/spiderloader.py\", line 24, in __init__\n self._load_all_spiders()\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/spiderloader.py\", line 46, in _load_all_spiders\n for module in walk_modules(name):\n File \"/usr/local/lib/python3.8/dist-packages/scrapy/utils/misc.py\", line 77, in walk_modules\n submod = import_module(fullpath)\n File \"/usr/lib/python3.8/importlib/__init__.py\",
line 127, in import_module\n return _bootstrap._gcd_import(name[level:], package, level)\n File \"<frozen importlib._bootstrap>\",
line 1014, in _gcd_import\n File \"<frozen importlib._bootstrap>\",
line 991, in _find_and_load\n File \"<frozen importlib._bootstrap>\",
line 975, in _find_and_load_unlocked\n File \"<frozen importlib._bootstrap>\",
line 655, in _load_unlocked\n File \"<frozen importlib._bootstrap>\",
line 618, in _load_backward_compatible\n File \"<frozen zipimport>\",
line 259, in load_module\n File \"/tmp/apo-v1.0.2-114-g8a2f218-master-5kgzxesk.egg/bid/spiders/allaboutwatches.py\",
line 31, in <module>\n File \"/tmp/apo-v1.0.2-114-g8a2f218-master-5kgzxesk.egg/bid/spiders/allaboutwatches.py\",
line 36, in GetbidSpider\n File \"/tmp/apo-v1.0.2-114-g8a2f218-master-5kgzxesk.egg/bid/act_functions.py\",
line 10, in create_image_dir\nNotADirectoryError: [Errno 20]
Not a directory: '/tmp/apo-v1.0.2-114-g8a2f218-master-5kgzxesk.egg/bid/../images/allaboutwatches'\n"}
My guess is that it has to do something with this method I am calling, since part of the error disapears once I outcomment the method call:
# function will create a custom name directory to hold the images of each crawl
def create_image_dir(name):
project_dir = os.path.dirname(__file__)+'/../' #<-- absolute dir the script is in
img_dir = project_dir+"images/"+name
if not os.path.exists(img_dir):
os.mkdir(img_dir);
custom_settings = {
'IMAGES_STORE': img_dir ,
}
return custom_settings
Same goes for this method:
def brandnames():
brands = dict()
script_dir = os.path.dirname(__file__) #<-- absolute dir the script is in
rel_path = "imports/brand_names.csv"
abs_file_path = os.path.join(script_dir, rel_path)
with open(abs_file_path, newline='') as csvfile:
reader = csv.DictReader(csvfile, delimiter=';', quotechar='"')
for row in reader:
brands[row['name'].lower()] = row['name']
return brands
How can I change the method or the deploy config and keep my functionality as it is?
os.mkdir might be failing because it cannot create nested directories. You can use os.makedirs(img_dir, exist_ok=True) instead.
os.path.dirname(__file__) will point to /tmp/... under scrapyd. Not sure if this is what you want. I would use an absolute path without calling
os.path.dirname(__file__) if you don't want images to be downloaded under /tmp.
i installed google-cloud-sdk in ubuntu 14.04 and when tried to login,it is showing this error.
krish#jarvis:~$ gcloud auth login
Traceback (most recent call last):
File "/usr/bin/../lib/google-cloud-sdk/./lib/googlecloudsdk/gcloud/gcloud.py", line 87, in
from googlecloudsdk.calliope import base
File "/usr/bin/../lib/google-cloud-sdk/./lib/googlecloudsdk/calliope/base.py", line 8, in
from googlecloudsdk.core import log
File "/usr/bin/../lib/google-cloud-sdk/./lib/googlecloudsdk/core/log.py", line 413, in
_log_manager = _LogManager()
File "/usr/bin/../lib/google-cloud-sdk/./lib/googlecloudsdk/core/log.py", line 195, in init
self.console_formatter = _ConsoleFormatter(sys.stderr)
File "/usr/bin/../lib/google-cloud-sdk/./lib/googlecloudsdk/core/log.py", line 172, in init
use_color = not properties.VALUES.core.disable_color.GetBool()
File "/usr/bin/../lib/google-cloud-sdk/./lib/googlecloudsdk/core/properties.py", line 782, in GetBool
value = _GetBoolProperty(self, PropertiesFile.Load(), required)
File "/usr/bin/../lib/google-cloud-sdk/./lib/googlecloudsdk/core/properties.py", line 1141, in Load
PropertiesFile._PROPERTIES = PropertiesFile(paths)
File "/usr/bin/../lib/google-cloud-sdk/./lib/googlecloudsdk/core/properties.py", line 1160, in init
self.__Load(properties_path)
File "/usr/bin/../lib/google-cloud-sdk/./lib/googlecloudsdk/core/properties.py", line 1174, in __Load
raise PropertiesParseError(e.message)
googlecloudsdk.core.properties.PropertiesParseError: File contains no section headers.
file: /home/krish/.config/gcloud/properties, line: 1
'h4\xaf\xe3\xda^\xa6\xe8\xb2\xdb`$?\x11\x7f\xce\xc1\x1f\x88\xcd"\x82c\x13Bj\x07\xc3\xe3\x9ds\xdd d\xe1\n'
This does look like you have a corruppted gcloud config file /home/krish/.config/gcloud/properties. Check what's the context, may be it's just one line and you can fix it, or just move/remove it (you will need to set all configurations once again if you did any).