Scrapy Selenium geckodriver problem - error while trying to scrape - selenium

Unhandled error in Deferred:
2020-07-24 09:12:40 [twisted] CRITICAL: Unhandled error in Deferred:
Traceback (most recent call last):
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/crawler.py", line 192, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/crawler.py", line 196, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/crawler.py", line 87, in crawl
    self.engine = self._create_engine()
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/crawler.py", line 101, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/core/engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/core/downloader/__init__.py", line 83, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/middleware.py", line 35, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/utils/misc.py", line 150, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy_selenium/middlewares.py", line 67, in from_crawler
    middleware = cls(
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy_selenium/middlewares.py", line 43, in __init__
    for argument in driver_arguments:
builtins.TypeError: 'NoneType' object is not iterable

2020-07-24 09:12:40 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/crawler.py", line 87, in crawl
    self.engine = self._create_engine()
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/crawler.py", line 101, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/core/engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/core/downloader/__init__.py", line 83, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/middleware.py", line 35, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy/utils/misc.py", line 150, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy_selenium/middlewares.py", line 67, in from_crawler
    middleware = cls(
  File "/home/baku/Dev/workspace/moje-python/scrape_linkedin/venv/lib/python3.8/site-packages/scrapy_selenium/middlewares.py", line 43, in __init__
    for argument in driver_arguments:
TypeError: 'NoneType' object is not iterable
My settings.py:
from shutil import which
SELENIUM_DRIVER_NAME = 'firefox'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('geckodriver')
SELENIUM_BROWSER_EXECUTABLE_PATH = which('firefox')
...
'scrapy_selenium.SeleniumMiddleware': 800,
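Worth noting: the traceback above fails on for argument in driver_arguments inside scrapy_selenium/middlewares.py, and that value is read from the SELENIUM_DRIVER_ARGUMENTS setting, which is not visible in the excerpt above (it may be hiding behind the '...'). As a hedged sketch, the scrapy-selenium README configures it for Firefox like this:
SELENIUM_DRIVER_ARGUMENTS = ['-headless']  # or [] to run a visible browser window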
Permissions for the driver look fine:
:/usr/local/bin$ ll | grep gecko
-rwxrwxrwx 1 baku baku 7008696 lip 24 09:09 geckodriver*
Crawler code:
class LinkedInProfileSeleniumSpider(scrapy.Spider):
    name = 'lips'
    allowed_domains = ['www.linkedin.com']

    def start_requests(self):
        yield SeleniumRequest(
            url="https://www.linkedin.com/login/",
            callback=self.proceed_login,
            wait_until=(
                EC.presence_of_element_located(
                    (By.CSS_SELECTOR, "#username")
                )
            ),
            script='window.scrollTo(0, document.body.scrollHeight);',
            wait_time=30
        )

    def proceed_login(self, response):
        # AFTER LOGIN
        driver = response.request.meta['driver']
        ...
Can you please help me figure out why it's failing? Thanks!
(By the way, it works with the Chrome driver but fails with gecko.)
I had the same problem on macOS; this time I'm trying on an Ubuntu machine.
I'm not sure what the issue could be or where to start debugging.
It doesn't even get into self.proceed_login; it fails on the first request.

Related

pyshark "TypeError: sequence item 6: expected str instance, _io.TextIOWrapper found"

I am using pyshark for live packet capture. When I pass the parameter output_file=myFileObject to save the capture to a file,
I get the following error on the sniffing line. If the output_file parameter is removed, it works absolutely fine. Please suggest.
My sample code:
import pyshark

def capturePacket():
    outputF = open('capturepcap.pcap', 'w')
    cap = pyshark.LiveCapture(interface='Ethernet 8', output_file=outputF)
    cap.sniff(timeout=60)
    outputF.close()
Error:
Traceback (most recent call last):
File "C:\Users\wxyz\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\wxyz\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "c:\Users\wxyz\.vscode\extensions\ms-python.python-2022.6.2\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
cli.main()
File "c:\Users\wxyz\.vscode\extensions\ms-python.python-2022.6.2\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 444, in main
run()
File "c:\Users\wxyz\.vscode\extensions\ms-python.python-2022.6.2\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 285, in run_file
runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
File "C:\Users\wxyz\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 269, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\Users\wxyz\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\Users\wxyz\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "c:\Users\wxyz\Documents\automation\practice_set_script\paket_capture\basic_packetCapture.py", line 29, in <module>
capturePacket()
File "c:\Users\wxyz\Documents\automation\practice_set_script\paket_capture\basic_packetCapture.py", line 22, in capturePacket
cap.sniff(timeout=60)
File "C:\Users\wxyz\AppData\Local\Programs\Python\Python310\lib\site-packages\pyshark\capture\capture.py", line 137, in load_packets
self.apply_on_packets(keep_packet, timeout=timeout, packet_count=packet_count)
File "C:\Users\wxyz\AppData\Local\Programs\Python\Python310\lib\site-packages\pyshark\capture\capture.py", line 274, in apply_on_packets
return self.eventloop.run_until_complete(coro)
File "C:\Users\wxyz\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 641, in run_until_complete
return future.result()
File "C:\Users\wxyz\AppData\Local\Programs\Python\Python310\lib\asyncio\tasks.py", line 445, in wait_for
return fut.result()
File "C:\Users\wxyz\AppData\Local\Programs\Python\Python310\lib\site-packages\pyshark\capture\capture.py", line 283, in packets_from_tshark
tshark_process = await self._get_tshark_process(packet_count=packet_count)
File "C:\Users\wxyz\AppData\Local\Programs\Python\Python310\lib\site-packages\pyshark\capture\live_capture.py", line 94, in _get_tshark_process
tshark = await super(LiveCapture, self)._get_tshark_process(packet_count=packet_count, stdin=read)
File "C:\Users\wxyz\AppData\Local\Programs\Python\Python310\lib\site-packages\pyshark\capture\capture.py", line 399, in _get_tshark_process
self._log.debug("Creating TShark subprocess with parameters: " + " ".join(parameters))
TypeError: sequence item 6: expected str instance, _io.TextIOWrapper found
Error on reading from the event loop self pipe
loop: <ProactorEventLoop running=True closed=False debug=False>
Traceback (most recent call last):
File "C:\Users\wxyz\AppData\Local\Programs\Python\Python310\lib\asyncio\proactor_events.py", line 779, in _loop_self_reading
f = self._proactor.recv(self._ssock, 4096)
File "C:\Users\wxyz\AppData\Local\Programs\Python\Python310\lib\asyncio\windows_events.py", line 450, in recv
self._register_with_iocp(conn)
File "C:\Users\wxyz\AppData\Local\Programs\Python\Python310\lib\asyncio\windows_events.py", line 723, in _register_with_iocp
_overlapped.CreateIoCompletionPort(obj.fileno(), self._iocp, 0, 0)
OSError: [WinError 87] The parameter is incorrect
PS C:\Users\wxyz\Documents\automation\practice_set_script\paket_capture>
The issue in your code is these lines:
outputF = open('capturepcap.pcap', 'w')
cap = pyshark.LiveCapture(interface='Ethernet 8', output_file=outputF)
The output_file parameter expects a string, not an io.TextIOWrapper:
:param output_file: A string of a file to write every read packet into (useful when filtering).
So this works:
import pyshark

def capturePacket():
    cap = pyshark.LiveCapture(interface='en0', output_file='capturepcap.pcap')
    cap.sniff(timeout=60)

capturePacket()
Here is a reference that I put together on using PyShark
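If you want to confirm the capture actually reached the file, a small follow-up sketch (the filename matches the example above; the printed field is just illustrative) reads it back with FileCapture:
import pyshark

cap = pyshark.FileCapture('capturepcap.pcap')  # open the pcap written by LiveCapture
for packet in cap:
    print(packet.highest_layer)  # e.g. TCP, DNS, TLS
cap.close()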

Django Compressor returns error when applied

I have been implementing active search with htmx. I changed my settings, after which my app stopped working. Here is part of my settings:
COMPRESS_FILTERS = {'css': ['libman.admin.PostCSSFilter']}
COMPRESS_ROOT = BASE_DIR / 'static'
COMPRESS_ENABLED = True
STATICFILES_FINDERS = [
    'compressor.finders.CompressorFinder',
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder',
]
STATICFILES_STORAGE = 'whitenoise.storage.CompressedManifestStaticFilesStorage'
STATIC_URL = '/static/'
STATIC_ROOT = os.path.join(BASE_DIR, 'staticfiles')
STATICFILES_DIRS = [os.path.join(BASE_DIR,'static')]
MEDIA_URL = '/media/'
MEDIA_ROOT = os.path.join(BASE_DIR, 'media')
I already have compressor added to my INSTALLED_APPS.
This is what I have in the libman app's admin:
from compressor.filters import CompilerFilter

class PostCSSFilter(CompilerFilter):
    command = 'postcss'
When I applied these two blocks to my base template, I got an error.
{% compress css %}
<link rel="stylesheet" href="{% static 'src/main.css' %}">
{% endcompress %}
<!-- new -->
{% compress js %}
<script type="text/javascript" src="{% static 'src/htmx.js' %}"></script>
{% endcompress %}
I got the error below. Would anyone help me debug it?
C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\views\debug.py:420: ExceptionCycleWarning: Cycle in the exception chain detected: exception 'Error 10061 connecting to 127.0.0.1:6379. No connection could be made because the target machine actively refused it.' encountered again.
warnings.warn(
Internal Server Error: /library/home
Traceback (most recent call last):
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django_redis\cache.py", line 31, in _decorator
return method(self, *args, **kwargs)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django_redis\cache.py", line 98, in _get
return self.client.get(key, default=default, version=version, client=client)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django_redis\client\default.py", line 260, in get
raise ConnectionInterrupted(connection=client) from e
django_redis.exceptions.ConnectionInterrupted: Redis ConnectionError: Error 10061 connecting to 127.0.0.1:6379. No connection could be made because the target machine actively refused it.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\core\handlers\exception.py", line 47, in inner
response = get_response(request)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\core\handlers\base.py", line 181, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\contrib\auth\decorators.py", line 21, in _wrapped_view
return view_func(request, *args, **kwargs)
File "D:\Python\Django\Completed Projects\lib_system\Library-System\libman\views.py", line 273, in index
return render(request, 'libman/home.html',{'books':books,'students':students,
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\shortcuts.py", line 19, in render
content = loader.render_to_string(template_name, context, request, using=using)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\template\loader.py", line 62, in render_to_string
return template.render(context, request)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\template\backends\django.py", line 61, in render
return self.template.render(context)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\template\base.py", line 170, in render
return self._render(context)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\template\base.py", line 162, in _render
return self.nodelist.render(context)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\template\base.py", line 938, in render
bit = node.render_annotated(context)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\template\base.py", line 905, in render_annotated
return self.render(context)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\template\loader_tags.py", line 150, in render
return compiled_parent._render(context)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\template\base.py", line 162, in _render
return self.nodelist.render(context)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\template\base.py", line 938, in render
bit = node.render_annotated(context)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\template\base.py", line 905, in render_annotated
return self.render(context)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\compressor\templatetags\compress.py", line 143, in render
return self.render_compressed(context, self.kind, self.mode, forced=forced)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\compressor\templatetags\compress.py", line 111, in render_compressed
cache_key, cache_content = self.render_cached(compressor, kind, mode)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\compressor\templatetags\compress.py", line 89, in render_cached
cache_key = get_templatetag_cachekey(compressor, mode, kind)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\compressor\cache.py", line 104, in get_templatetag_cachekey
"templatetag.%s.%s.%s" % (compressor.cachekey, mode, kind))
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\utils\functional.py", line 48, in __get__
res = instance.__dict__[self.name] = self.func(instance)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\compressor\base.py", line 200, in cachekey
[self.content] + self.mtimes).encode(self.charset), 12)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django\utils\functional.py", line 48, in __get__
res = instance.__dict__[self.name] = self.func(instance)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\compressor\base.py", line 193, in mtimes
return [str(get_mtime(value))
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\compressor\base.py", line 193, in <listcomp>
return [str(get_mtime(value))
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\compressor\cache.py", line 110, in get_mtime
mtime = cache.get(key)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django_redis\cache.py", line 91, in get
value = self._get(key, default, version, client)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django_redis\cache.py", line 38, in _decorator
raise e.__cause__
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\django_redis\client\default.py", line 258, in get
value = client.get(key)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\redis\client.py", line 1606, in get
return self.execute_command('GET', name)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\redis\client.py", line 898, in execute_command
conn = self.connection or pool.get_connection(command_name, **options)
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\redis\connection.py", line 1192, in get_connection
connection.connect()
File "C:\Users\Ptar\AppData\Local\Programs\Python\Python39\lib\site-packages\redis\connection.py", line 563, in connect
raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 10061 connecting to 127.0.0.1:6379. No connection could be made because the target machine actively refused it.
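A side note that may or may not apply here: the traceback bottoms out in django-redis failing to connect to 127.0.0.1:6379, and compressor goes through Django's cache framework when it resolves {% compress %} tags, so a Redis server that isn't running would surface exactly at this point. A hedged sketch (the cache aliases and LOCATION are assumptions, not taken from the question) that gives compressor its own local-memory cache while debugging:
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',   # presumably what the project uses now
        'LOCATION': 'redis://127.0.0.1:6379/1',
    },
    'compress': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
    },
}
COMPRESS_CACHE_BACKEND = 'compress'  # django-compressor setting: which cache alias to use
Alternatively, simply starting the Redis server should make this particular error go away.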

How to store crawled data from Scrapy to FTP as csv?

My Scrapy settings.py:
from datetime import datetime
file_name = datetime.today().strftime('%Y-%m-%d_%H%M_')
save_name = file_name + 'Mobile_Nshopping'
FEED_URI = 'ftp://myusername:mypassword#ftp.mymail.com/uploads/%(save_name)s.csv'
When I run my spider with scrapy crawl my_project_name, I get the error below.
Do I have to create a pipeline?
\scrapy\extensions\feedexport.py:247: ScrapyDeprecationWarning: The `FEED_URI` and `FEED_FORMAT` settings have been deprecated in favor of the `FEEDS` setting. Please see the `FEEDS` setting docs for more details
exporter = cls(crawler)
Traceback (most recent call last):
File "c:\users\viren\appdata\local\programs\python\python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\viren\appdata\local\programs\python\python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\viren\AppData\Local\Programs\Python\Python38\Scripts\scrapy.exe\__main__.py", line 7, in <module>
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\cmdline.py", line 145, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\cmdline.py", line 100, in _run_print_help
func(*a, **kw)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\cmdline.py", line 153, in _run_command
cmd.run(args, opts)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\commands\crawl.py", line 22, in run
crawl_defer = self.crawler_process.crawl(spname, **opts.spargs)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 191, in crawl
crawler = self.create_crawler(crawler_or_spidercls)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 224, in create_crawler
return self._create_crawler(crawler_or_spidercls)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 229, in _create_crawler
return Crawler(spidercls, self.settings)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\crawler.py", line 72, in __init__
self.extensions = ExtensionManager.from_crawler(self)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\middleware.py", line 53, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\middleware.py", line 35, in from_settings
mw = create_instance(mwcls, settings, crawler)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\utils\misc.py", line 167, in create_instance
instance = objcls.from_crawler(crawler, *args, **kwargs)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 247, in from_crawler
exporter = cls(crawler)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 282, in __init__
if not self._storage_supported(uri, feed_options):
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 427, in _storage_supported
self._get_storage(uri, feed_options)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 458, in _get_storage
instance = build_instance(feedcls.from_crawler, crawler)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 455, in build_instance
return build_storage(builder, uri, feed_options=feed_options, preargs=preargs)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 46, in build_storage
return builder(*preargs, uri, *args, **kwargs)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 201, in from_crawler
return build_storage(
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 46, in build_storage
return builder(*preargs, uri, *args, **kwargs)
File "c:\users\viren\appdata\local\programs\python\python38\lib\site-packages\scrapy\extensions\feedexport.py", line 192, in __init__
self.port = int(u.port or '21')
File "c:\users\viren\appdata\local\programs\python\python38\lib\urllib\parse.py", line 174, in port
raise ValueError(message) from None
ValueError: Port could not be cast to integer value as 'Edh=)9sd'
I don't know how to store the CSV on FTP.
Is the error coming because my password is an int?
Is there anything I forgot to write?
Do I have to create a pipeline?
Yes, you probably should create a pipeline. As shown in the Scrapy Architecture Diagram, the basic concept is this: requests are sent, responses come back and are processed by the spider, and finally the pipeline does something with the items returned by the spider. In your case, you could create a pipeline that saves the data in a CSV file and uploads it to an FTP server. See Scrapy's Item Pipeline documentation for more information.
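For illustration, a minimal sketch of such a pipeline; the FTP host, credentials, local file name, and ITEM_PIPELINES path are placeholders rather than anything from the question:
import csv
import ftplib

class CsvFtpExportPipeline:
    def open_spider(self, spider):
        # collect items into a local CSV file while the spider runs
        self.file = open('items.csv', 'w', newline='')
        self.writer = None

    def process_item(self, item, spider):
        row = dict(item)
        if self.writer is None:
            self.writer = csv.DictWriter(self.file, fieldnames=list(row.keys()))
            self.writer.writeheader()
        self.writer.writerow(row)
        return item

    def close_spider(self, spider):
        self.file.close()
        # upload the finished CSV over FTP (placeholder server and credentials)
        with ftplib.FTP('ftp.example.com', 'myusername', 'mypassword') as ftp:
            with open('items.csv', 'rb') as f:
                ftp.storbinary('STOR uploads/items.csv', f)
Enable it in settings.py with something like ITEM_PIPELINES = {'myproject.pipelines.CsvFtpExportPipeline': 300} (module path assumed).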
I don't know how to store the CSV on FTP. Is the error coming because my password is an int? Is there anything I forgot to write?
I believe this is due to the deprecation error below (and shown at the top of the errors you provided):
ScrapyDeprecationWarning: The FEED_URI and FEED_FORMAT settings have been deprecated in favor of the FEEDS setting. Please see the FEEDS setting docs for more details.
Try replacing FEED_URI with FEEDS; see the Scrapy documentation on FEEDS.
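For example, a hedged sketch of the FEEDS equivalent of the FEED_URI above (note the @ between the credentials and the host, plus an explicit port; the file name mirrors the question's save_name):
from datetime import datetime

file_name = datetime.today().strftime('%Y-%m-%d_%H%M_')
save_name = file_name + 'Mobile_Nshopping'

FEEDS = {
    f'ftp://myusername:mypassword@ftp.mymail.com:21/uploads/{save_name}.csv': {
        'format': 'csv',
    },
}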
You also need to specify the port; you can set this in the settings.
See also the class definition from the Scrapy docs:
class FTPFilesStore:

    FTP_USERNAME = None
    FTP_PASSWORD = None
    USE_ACTIVE_MODE = None

    def __init__(self, uri):
        if not uri.startswith("ftp://"):
            raise ValueError(f"Incorrect URI scheme in {uri}, expected 'ftp'")
        u = urlparse(uri)
        self.port = u.port
        self.host = u.hostname
        self.port = int(u.port or 21)
        self.username = u.username or self.FTP_USERNAME
        self.password = u.password or self.FTP_PASSWORD
        self.basedir = u.path.rstrip('/')
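Incidentally, the ValueError in the question ("Port could not be cast to integer value as ...") suggests urlparse is treating part of the credentials as the port, which can happen when the password contains characters like ':' or ')' or when '#' is used instead of '@' before the host. A hedged sketch (placeholder credentials) that percent-encodes them before building the feed URI:
from urllib.parse import quote

username = quote('myusername', safe='')
password = quote('myp@ss:w)rd', safe='')  # hypothetical password with special characters
save_name = 'Mobile_Nshopping'            # built as in the question's settings.py
feed_uri = f'ftp://{username}:{password}@ftp.mymail.com:21/uploads/{save_name}.csv'
# use feed_uri as the key in FEEDS (or as FEED_URI on older Scrapy versions)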

IndentationError: expected an indented block, Scrapy

career#careercrawler:~/stack/stack$ scrapy crawl stack
Traceback (most recent call last):
  File "/home/career/.local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/home/career/.local/lib/python2.7/site-packages/scrapy/cmdline.py", line 141, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File "/home/career/.local/lib/python2.7/site-packages/scrapy/crawler.py", line 238, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "/home/career/.local/lib/python2.7/site-packages/scrapy/crawler.py", line 129, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "/home/career/.local/lib/python2.7/site-packages/scrapy/crawler.py", line 325, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/home/career/.local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 33, in from_settings
    return cls(settings)
  File "/home/career/.local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 20, in __init__
    self._load_all_spiders()
  File "/home/career/.local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 28, in _load_all_spiders
    for module in walk_modules(name):
  File "/home/career/.local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
    submod = import_module(fullpath)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/home/career/stack/stack/spiders/stack_spider.py", line 4, in <module>
    from stack.items import StackItem
  File "/home/career/stack/stack/items.py", line 13
    title = scrapy.Field()
        ^
IndentationError: expected an indented block
This is my error; I don't know what is happening there. Can someone help me, please?
This error is because of indentation.
As mentioned in the traceback:
/home/career/stack/stack/items.py", line 13
    title = scrapy.Field()
Go to ~/stack/stack/items.py and check the indentation at line 13.
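For reference, a minimal sketch of a correctly indented items.py (only title appears in the traceback; the second field is a placeholder):
import scrapy

class StackItem(scrapy.Item):
    # every field must be indented one level under the class statement
    title = scrapy.Field()
    url = scrapy.Field()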

Scrapy calling spider other than the one specified on the command line

(P6Svenv)malikarumi#Tetuoan2:~/Projects/P6/P6Svenv/test2/test2/spiders$ scrapy crawl zomd
Traceback (most recent call last):
File "/usr/bin/scrapy", line 9, in <module>
load_entry_point('Scrapy==1.0.3.post6-g2d688cd', 'console_scripts', 'scrapy')()
File "/usr/lib/pymodules/python2.7/scrapy/cmdline.py", line 142, in execute
cmd.crawler_process = CrawlerProcess(settings)
File "/usr/lib/pymodules/python2.7/scrapy/crawler.py", line 209, in __init__
super(CrawlerProcess, self).__init__(settings)
File "/usr/lib/pymodules/python2.7/scrapy/crawler.py", line 115, in __init__
self.spider_loader = _get_spider_loader(settings)
File "/usr/lib/pymodules/python2.7/scrapy/crawler.py", line 296, in _get_spider_loader
return loader_cls.from_settings(settings.frozencopy())
File "/usr/lib/pymodules/python2.7/scrapy/spiderloader.py", line 30, in from_settings
return cls(settings)
File "/usr/lib/pymodules/python2.7/scrapy/spiderloader.py", line 21, in __init__
for module in walk_modules(name):
File "/usr/lib/pymodules/python2.7/scrapy/utils/misc.py", line 71, in walk_modules
submod = import_module(fullpath)
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
File "/home/malikarumi/Projects/P6/P6Svenv/test2/test2/spiders/t350_crawl.py", line 36
def parse_item(self, response):
^
IndentationError: unindent does not match any outer indentation level
Do you see it? Scrapy isn't even calling the spider I specified on the command line!
I see that super in the traceback, but all my t350's are derived from CrawlSpider. zomd is subclassed from scrapy.Spider. Why is this happening and what do I do about it?
The spider name is not the same as the file name. It is defined inside the spider file, as in the second line below:
class CAPjobSpider(Spider):
    name = "spider_name"
The above spider's name is "spider_name", even if the file is named "New_York.py".
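As a quick check (run from inside the project directory), scrapy list prints the name attribute of every spider the project can load, regardless of file names; the output below is hypothetical, based on the names in the question:
$ scrapy list
spider_name
zomd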