Possible bug in Scrapy 2.3.0: invalid syntax async=False - scrapy

I keep getting a syntax error when I try to run Scrapy on an AWS Ubuntu 18.04 instance:
scrapy crawl pcz -o px.csv
Here's the log:
ubuntu#ip-172-31-60-245:~/free_proxy/free_proxy$ scrapy crawl pcz -o px.csv
2020-08-27 14:09:37 [scrapy.utils.log] INFO: Scrapy 2.3.0 started (bot: free_proxy)
2020-08-27 14:09:37 [scrapy.utils.log] INFO: Versions: lxml 4.5.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 17.9.0, Python 3.8.5 (default, Jul 20 2020, 19:48:14) - [GCC 7.5.0], pyOpenSSL 17.5.0 (OpenSSL 1.1.1 11 Sep 2018), cryptography 2.1.4, Platform Linux-5.3.0-1033-aws-x86_64-with-glibc2.27
2020-08-27 14:09:37 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2020-08-27 14:09:37 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'free_proxy',
'CONCURRENT_REQUESTS_PER_DOMAIN': 2,
'DOWNLOAD_TIMEOUT': 10,
'NEWSPIDER_MODULE': 'free_proxy.spiders',
'RETRY_HTTP_CODES': [500, 503, 504, 400, 403, 404, 408, 401],
'RETRY_TIMES': 10,
'SPIDER_MODULES': ['free_proxy.spiders']}
2020-08-27 14:09:37 [scrapy.middleware] WARNING: Disabled TelnetConsole: TELNETCONSOLE_ENABLED setting is True but required twisted modules failed to import:
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.8/site-packages/scrapy/extensions/telnet.py", line 15, in <module>
from twisted.conch import manhole, telnet
File "/usr/lib/python3/dist-packages/twisted/conch/manhole.py", line 154
def write(self, data, async=False):
^
SyntaxError: invalid syntax
2020-08-27 14:09:37 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
Unhandled error in Deferred:
Temporarily disabling observer LegacyLogObserverWrapper(<bound method PythonLoggingObserver.emit of <twisted.python.log.PythonLoggingObserver object at 0x7fe474901f10>>) due to exception: [Failure instance: Traceback: <class 'TypeError'>: _findCaller() takes from 1 to 2 positional arguments but 3 were given
/home/ubuntu/.local/lib/python3.8/site-packages/scrapy/cmdline.py:153:_run_command
/usr/lib/python3/dist-packages/twisted/internet/defer.py:954:__del__
/usr/lib/python3/dist-packages/twisted/logger/_logger.py:261:critical
/usr/lib/python3/dist-packages/twisted/logger/_logger.py:135:emit
--- <exception caught here> ---
/usr/lib/python3/dist-packages/twisted/logger/_observer.py:131:__call__
/usr/lib/python3/dist-packages/twisted/logger/_legacy.py:93:__call__
/usr/lib/python3/dist-packages/twisted/python/log.py:595:emit
/usr/lib/python3/dist-packages/twisted/logger/_legacy.py:154:publishToNewObserver
/usr/lib/python3/dist-packages/twisted/logger/_stdlib.py:115:__call__
/usr/lib/python3.8/logging/__init__.py:1500:log
/usr/lib/python3.8/logging/__init__.py:1565:_log
]
Temporarily disabling observer LegacyLogObserverWrapper(<bound method PythonLoggingObserver.emit of <twisted.python.log.PythonLoggingObserver object at 0x7fe474901f10>>) due to exception: [Failure instance: Traceback: <class 'TypeError'>: _findCaller() takes from 1 to 2 positional arguments but 3 were given
/home/ubuntu/.local/lib/python3.8/site-packages/scrapy/cmdline.py:153:_run_command
/usr/lib/python3/dist-packages/twisted/internet/defer.py:963:__del__
/usr/lib/python3/dist-packages/twisted/logger/_logger.py:181:failure
/usr/lib/python3/dist-packages/twisted/logger/_logger.py:135:emit
--- <exception caught here> ---
/usr/lib/python3/dist-packages/twisted/logger/_observer.py:131:__call__
/usr/lib/python3/dist-packages/twisted/logger/_legacy.py:93:__call__
/usr/lib/python3/dist-packages/twisted/python/log.py:595:emit
/usr/lib/python3/dist-packages/twisted/logger/_legacy.py:154:publishToNewObserver
/usr/lib/python3/dist-packages/twisted/logger/_stdlib.py:115:__call__
/usr/lib/python3.8/logging/__init__.py:1500:log
/usr/lib/python3.8/logging/__init__.py:1565:_log
]

Related

Scrapy now times out on a website that used to work well

I'm using Scrapy to scrape a website: https://www.sephora.fr/marques/de-a-a-z/.
It used to work well a year ago, but it now shows an error:
User timeout caused connection failure: Getting https://www.sephora.fr/robots.txt took longer than 180.0 seconds
It retries 5 times and then fails completely. I can access the URL in Chrome, but it doesn't work in Scrapy. I've tried using custom user agents and emulating request headers, but it still doesn't work.
Below is my code:
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
import json
import requests
from urllib.parse import parse_qsl, urlencode
import re
from ..pipelines import Pipeline


class SephoraSpider(scrapy.Spider):
    """
    The SephoraSpider object gets you the data on all products hosted on sephora.fr
    """
    name = 'sephora'
    allowed_domains = ['sephora.fr']
    # the url of all the brands
    start_urls = ['https://www.sephora.fr/marques-de-a-a-z/']
    custom_settings = {
        'DOWNLOAD_TIMEOUT': '180',
    }

    def __init__(self):
        self.base_url = 'https://www.sephora.fr'

    def parse(self, response):
        """
        Parses the response of a webpage we are given when we start crawling the first webpage.
        This method is automatically launched by Scrapy when crawling.

        :param response: the response from the webpage triggered by a get query while crawling.
            A Response object represents an HTTP response, which is usually downloaded (by the Downloader)
            and fed to the Spiders for processing.
        :return: the results of parse_brand().
        :rtype: scrapy.Request()
        """
        # if we are given an url of the brand we are interested in (burberry) we send an http request to them
        if response.url == "https://www.sephora.fr/marques/de-a-a-z/burberry-burb/":
            yield scrapy.Request(url=response.url, callback=self.parse_brand)
        # otherwise it means we are visiting another html object (another brand, a higher level url ...)
        # we call the url back with another method
        else:
            self.log("parse: I just visited: " + response.url)
            urls = response.css('a.sub-category-link::attr(href)').extract()
            if urls:
                for url in urls:
                    yield scrapy.Request(url=self.base_url + url, callback=self.parse_brand)
    ...
Scrapy log:
(scr_env) antoine.cop1#protonmail.com:~/environment/bass2/scraper (master) $ scrapy crawl sephora
2022-03-13 16:39:19 [scrapy.utils.log] INFO: Scrapy 2.6.1 started (bot: nosetime_scraper)
2022-03-13 16:39:19 [scrapy.utils.log] INFO: Versions: lxml 4.8.0.0, libxml2 2.9.12, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 22.2.0, Python 3.6.9 (default, Dec 8 2021, 21:08:43) - [GCC 8.4.0], pyOpenSSL 22.0.0 (OpenSSL 1.1.1m 14 Dec 2021), cryptography 36.0.1, Platform Linux-5.4.0-1068-aws-x86_64-with-Ubuntu-18.04-bionic
2022-03-13 16:39:19 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'nosetime_scraper',
'CONCURRENT_REQUESTS': 1,
'COOKIES_ENABLED': False,
'DOWNLOAD_DELAY': 7,
'DOWNLOAD_TIMEOUT': '180',
'EDITOR': '',
'NEWSPIDER_MODULE': 'nosetime_scraper.spiders',
'ROBOTSTXT_OBEY': True,
'SPIDER_MODULES': ['nosetime_scraper.spiders'],
'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) '
'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 '
'Safari/537.36'}
2022-03-13 16:39:19 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2022-03-13 16:39:19 [scrapy.extensions.telnet] INFO: Telnet Password: af81c5b648cc3542
2022-03-13 16:39:19 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats']
2022-03-13 16:39:19 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2022-03-13 16:39:19 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-03-13 16:39:19 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2022-03-13 16:39:19 [scrapy.core.engine] INFO: Spider opened
2022-03-13 16:39:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 16:39:19 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2022-03-13 16:40:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 16:41:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 16:42:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 16:42:19 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.sephora.fr/robots.txt> (failed 1 times): User timeout caused connection failure: Getting https://www.sephora.fr/robots.txt took longer than 180.0 seconds..
2022-03-13 16:42:19 [py.warnings] WARNING: /home/ubuntu/environment/bass2/scraper/scr_env/lib/python3.6/site-packages/scrapy/core/engine.py:276: ScrapyDeprecationWarning: Passing a 'spider' argument to ExecutionEngine.download is deprecated
return self.download(result, spider) if isinstance(result, Request) else result
2022-03-13 16:43:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
current.result, *args, **kwargs
File "/home/ubuntu/environment/bass2/scraper/scr_env/lib/python3.6/site-packages/scrapy/core/downloader/handlers/http11.py", line 360, in _cb_timeout
raise TimeoutError(f"Getting {url} took longer than {timeout} seconds.")
twisted.internet.error.TimeoutError: User timeout caused connection failure: Getting https://www.sephora.fr/robots.txt took longer than 180.0 seconds..
2022-03-13 16:49:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 16:50:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 16:51:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 16:51:19 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.sephora.fr/marques-de-a-a-z/> (failed 1 times): User timeout caused connection failure: Getting https://www.sephora.fr/marques-de-a-a-z/ took longer than 180.0 seconds..
2022-03-13 16:52:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 16:53:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 16:54:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 16:54:19 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.sephora.fr/marques-de-a-a-z/> (failed 2 times): User timeout caused connection failure: Getting https://www.sephora.fr/marques-de-a-a-z/ took longer than 180.0 seconds..
2022-03-13 16:55:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 16:56:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 16:57:19 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 16:57:19 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://www.sephora.fr/marques-de-a-a-z/> (failed 3 times): User timeout caused connection failure: Getting https://www.sephora.fr/marques-de-a-a-z/ took longer than 180.0 seconds..
2022-03-13 16:57:19 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.sephora.fr/marques-de-a-a-z/>
Traceback (most recent call last):
File "/home/ubuntu/environment/bass2/scraper/scr_env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1657, in _inlineCallbacks
cast(Failure, result).throwExceptionIntoGenerator, gen
File "/home/ubuntu/environment/bass2/scraper/scr_env/lib/python3.6/site-packages/twisted/internet/defer.py", line 62, in run
return f(*args, **kwargs)
File "/home/ubuntu/environment/bass2/scraper/scr_env/lib/python3.6/site-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/home/ubuntu/environment/bass2/scraper/scr_env/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 49, in process_request
return (yield download_func(request=request, spider=spider))
File "/home/ubuntu/environment/bass2/scraper/scr_env/lib/python3.6/site-packages/twisted/internet/defer.py", line 858, in _runCallbacks
current.result, *args, **kwargs
File "/home/ubuntu/environment/bass2/scraper/scr_env/lib/python3.6/site-packages/scrapy/core/downloader/handlers/http11.py", line 360, in _cb_timeout
raise TimeoutError(f"Getting {url} took longer than {timeout} seconds.")
twisted.internet.error.TimeoutError: User timeout caused connection failure: Getting https://www.sephora.fr/marques-de-a-a-z/ took longer than 180.0 seconds..
2022-03-13 16:57:19 [scrapy.core.engine] INFO: Closing spider (finished)
2022-03-13 16:57:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 6,
'downloader/exception_type_count/twisted.internet.error.TimeoutError': 6,
'downloader/request_bytes': 1881,
'downloader/request_count': 6,
'downloader/request_method_count/GET': 6,
'elapsed_time_seconds': 1080.231435,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 3, 13, 16, 57, 19, 904633),
'log_count/DEBUG': 5,
'log_count/ERROR': 4,
'log_count/INFO': 28,
'log_count/WARNING': 1,
'memusage/max': 72749056,
'memusage/startup': 70950912,
'retry/count': 4,
'retry/max_reached': 2,
'retry/reason_count/twisted.internet.error.TimeoutError': 4,
"robotstxt/exception_count/<class 'twisted.internet.error.TimeoutError'>": 1,
'robotstxt/request_count': 1,
'scheduler/dequeued': 3,
'scheduler/dequeued/memory': 3,
'scheduler/enqueued': 3,
'scheduler/enqueued/memory': 3,
'start_time': datetime.datetime(2022, 3, 13, 16, 39, 19, 673198)}
2022-03-13 16:57:19 [scrapy.core.engine] INFO: Spider closed (finished)
I am going to look at the request headers using Fiddler and do some tests. Maybe Scrapy is sending a Connection: close header by default, due to which I'm not getting any response from the Sephora site?
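One way to check that hypothesis from the Scrapy side would be to send the request with an explicit, browser-like header set and see whether the response ever arrives. A minimal sketch (the spider name, the header values, and the Connection: keep-alive header are assumptions for the test, not a confirmed fix):

import scrapy

BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'Connection': 'keep-alive',
}


class HeaderTestSpider(scrapy.Spider):
    name = 'header_test'
    # skip the robots.txt fetch so it cannot be the request that times out
    custom_settings = {'ROBOTSTXT_OBEY': False}

    def start_requests(self):
        yield scrapy.Request('https://www.sephora.fr/marques-de-a-a-z/',
                             headers=BROWSER_HEADERS)

    def parse(self, response):
        self.log('got %s with %d bytes' % (response.status, len(response.body)))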
Here are the logs when I chose not to respect robots.txt:
(scr_env) antoine.cop1#protonmail.com:~/environment/bass2/scraper (master) $ scrapy crawl sephora
2022-03-13 23:23:38 [scrapy.utils.log] INFO: Scrapy 2.6.1 started (bot: nosetime_scraper)
2022-03-13 23:23:38 [scrapy.utils.log] INFO: Versions: lxml 4.8.0.0, libxml2 2.9.12, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 22.2.0, Python 3.6.9 (default, Dec 8 2021, 21:08:43) - [GCC 8.4.0], pyOpenSSL 22.0.0 (OpenSSL 1.1.1m 14 Dec 2021), cryptography 36.0.1, Platform Linux-5.4.0-1068-aws-x86_64-with-Ubuntu-18.04-bionic
2022-03-13 23:23:38 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'nosetime_scraper',
'CONCURRENT_REQUESTS': 1,
'COOKIES_ENABLED': False,
'DOWNLOAD_DELAY': 7,
'DOWNLOAD_TIMEOUT': '180',
'EDITOR': '',
'NEWSPIDER_MODULE': 'nosetime_scraper.spiders',
'SPIDER_MODULES': ['nosetime_scraper.spiders'],
'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) '
'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 '
'Safari/537.36'}
2022-03-13 23:23:38 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2022-03-13 23:23:38 [scrapy.extensions.telnet] INFO: Telnet Password: 3f4205a34aff02c5
2022-03-13 23:23:38 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats']
2022-03-13 23:23:38 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2022-03-13 23:23:38 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-03-13 23:23:38 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2022-03-13 23:23:38 [scrapy.core.engine] INFO: Spider opened
2022-03-13 23:23:38 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 23:23:38 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2022-03-13 23:24:38 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 23:25:38 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 23:26:38 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 23:26:38 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.sephora.fr/marques-de-a-a-z/> (failed 1 times): User timeout caused connection failure: Getting https://www.sephora.fr/marques-de-a-a-z/ took longer than 180.0 seconds..
2022-03-13 23:27:38 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 23:28:38 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 23:29:38 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 23:29:38 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.sephora.fr/marques-de-a-a-z/> (failed 2 times): User timeout caused connection failure: Getting https://www.sephora.fr/marques-de-a-a-z/ took longer than 180.0 seconds..
2022-03-13 23:30:38 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 23:31:38 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 23:32:38 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-13 23:32:38 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://www.sephora.fr/marques-de-a-a-z/> (failed 3 times): User timeout caused connection failure: Getting https://www.sephora.fr/marques-de-a-a-z/ took longer than 180.0 seconds..
2022-03-13 23:32:38 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.sephora.fr/marques-de-a-a-z/>
Traceback (most recent call last):
File "/home/ubuntu/environment/bass2/scraper/scr_env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1657, in _inlineCallbacks
cast(Failure, result).throwExceptionIntoGenerator, gen
File "/home/ubuntu/environment/bass2/scraper/scr_env/lib/python3.6/site-packages/twisted/internet/defer.py", line 62, in run
return f(*args, **kwargs)
File "/home/ubuntu/environment/bass2/scraper/scr_env/lib/python3.6/site-packages/twisted/python/failure.py", line 489, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/home/ubuntu/environment/bass2/scraper/scr_env/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 49, in process_request
return (yield download_func(request=request, spider=spider))
File "/home/ubuntu/environment/bass2/scraper/scr_env/lib/python3.6/site-packages/twisted/internet/defer.py", line 858, in _runCallbacks
current.result, *args, **kwargs
File "/home/ubuntu/environment/bass2/scraper/scr_env/lib/python3.6/site-packages/scrapy/core/downloader/handlers/http11.py", line 360, in _cb_timeout
raise TimeoutError(f"Getting {url} took longer than {timeout} seconds.")
twisted.internet.error.TimeoutError: User timeout caused connection failure: Getting https://www.sephora.fr/marques-de-a-a-z/ took longer than 180.0 seconds..
2022-03-13 23:32:39 [scrapy.core.engine] INFO: Closing spider (finished)
2022-03-13 23:32:39 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 3,
'downloader/exception_type_count/twisted.internet.error.TimeoutError': 3,
'downloader/request_bytes': 951,
'downloader/request_count': 3,
'downloader/request_method_count/GET': 3,
'elapsed_time_seconds': 540.224149,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 3, 13, 23, 32, 39, 59500),
'log_count/DEBUG': 3,
'log_count/ERROR': 2,
'log_count/INFO': 19,
'memusage/max': 72196096,
'memusage/startup': 70766592,
'retry/count': 2,
'retry/max_reached': 1,
'retry/reason_count/twisted.internet.error.TimeoutError': 2,
'scheduler/dequeued': 3,
'scheduler/dequeued/memory': 3,
'scheduler/enqueued': 3,
'scheduler/enqueued/memory': 3,
'start_time': datetime.datetime(2022, 3, 13, 23, 23, 38, 835351)}
2022-03-13 23:32:39 [scrapy.core.engine] INFO: Spider closed (finished)
And here is my environment, pip list output:
(scr_env) C:\Users\antoi\Documents\Programming\Work\scrapy-scraper>pip list
Package Version
------------------- ------------------
async-generator 1.10
attrs 21.4.0
Automat 20.2.0
beautifulsoup4 4.10.0
blis 0.7.5
bs4 0.0.1
catalogue 2.0.6
certifi 2021.10.8
cffi 1.15.0
charset-normalizer 2.0.12
click 8.0.4
colorama 0.4.4
configparser 5.2.0
constantly 15.1.0
crayons 0.4.0
cryptography 36.0.1
cssselect 1.1.0
cymem 2.0.6
DAWG-Python 0.7.2
docopt 0.6.2
en-core-web-sm 3.2.0
et-xmlfile 1.1.0
geographiclib 1.52
geopy 2.2.0
h11 0.13.0
h2 3.2.0
hpack 3.0.0
hyperframe 5.2.0
hyperlink 21.0.0
idna 3.3
incremental 21.3.0
itemadapter 0.4.0
itemloaders 1.0.4
Jinja2 3.0.3
jmespath 0.10.0
langcodes 3.3.0
libretranslatepy 2.1.1
lxml 4.8.0
MarkupSafe 2.1.0
murmurhash 1.0.6
numpy 1.22.2
openpyxl 3.0.9
outcome 1.1.0
packaging 21.3
pandas 1.4.1
parsel 1.6.0
pathy 0.6.1
pip 22.0.4
preshed 3.0.6
priority 1.3.0
Protego 0.2.1
pyaes 1.6.1
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.21
pydantic 1.8.2
PyDispatcher 2.0.5
pymongo 3.11.0
pymorphy2 0.9.1
pymorphy2-dicts-ru 2.4.417127.4579844
pyOpenSSL 22.0.0
pyparsing 3.0.7
PySocks 1.7.1
python-dateutil 2.8.2
pytz 2021.3
queuelib 1.6.2
requests 2.27.1
rsa 4.8
ru-core-news-md 3.2.0
Scrapy 2.5.1
selenium 4.1.2
service-identity 21.1.0
setuptools 56.0.0
six 1.16.0
smart-open 5.2.1
sniffio 1.2.0
sortedcontainers 2.4.0
soupsieve 2.3.1
spacy 3.2.2
spacy-legacy 3.0.9
spacy-loggers 1.0.1
srsly 2.4.2
Telethon 1.24.0
thinc 8.0.13
tqdm 4.62.3
translate 3.6.1
trio 0.20.0
trio-websocket 0.9.2
Twisted 22.1.0
twisted-iocpsupport 1.0.2
typer 0.4.0
typing_extensions 4.1.1
urllib3 1.26.8
w3lib 1.22.0
wasabi 0.9.0
webdriver-manager 3.5.3
wsproto 1.0.0
zope.interface 5.4.0
With scrapy runspider sephora.py I notice it doesn't accept my relative import from ..pipelines import Pipeline:
(scr_env) C:\Users\antoi\Documents\Programming\Work\scrapy-scraper\nosetime_scraper\spiders>scrapy runspider sephora.py
2022-03-14 01:00:27 [scrapy.utils.log] INFO: Scrapy 2.5.1 started (bot: nosetime_scraper)
2022-03-14 01:00:27 [scrapy.utils.log] INFO: Versions: lxml 4.8.0.0, libxml2 2.9.12, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 22.1.0, Python 3.
9.6 (tags/v3.9.6:db3ff76, Jun 28 2021, 15:26:21) [MSC v.1929 64 bit (AMD64)], pyOpenSSL 22.0.0 (OpenSSL 1.1.1m 14 Dec 2021), cryptography 36.0.1, Platform
Windows-10-10.0.19043-SP0
2022-03-14 01:00:27 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
Usage
=====
scrapy runspider [options] <spider_file>
runspider: error: Unable to load 'sephora.py': attempted relative import with no known parent package
Here is my settings.py:
# Scrapy settings for nosetime_scraper project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://docs.scrapy.org/en/latest/topics/settings.html
# https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html
BOT_NAME = 'nosetime_scraper'
SPIDER_MODULES = ['nosetime_scraper.spiders']
NEWSPIDER_MODULE = 'nosetime_scraper.spiders'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
# Configure maximum concurrent requests performed by Scrapy (default: 16)
CONCURRENT_REQUESTS = 1
# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
DOWNLOAD_DELAY = 7
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16
# Disable cookies (enabled by default)
COOKIES_ENABLED = True
# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False
# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
# 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
# 'Accept-Language': 'en',
#}
# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
# 'nosetime_scraper.middlewares.NosetimeScraperSpiderMiddleware': 543,
#}
# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
# 'nosetime_scraper.middlewares.NosetimeScraperDownloaderMiddleware': 543,
#}
# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
# 'scrapy.extensions.telnet.TelnetConsole': None,
#}
# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
#ITEM_PIPELINES = {
# 'nosetime_scraper.pipelines.NosetimeScraperPipeline': 300,
#}
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False
# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

scrapy list command can see spider but runspider can't find it?

If I type in
scrapy list
I get the correct answer, which is a single spider I made called
wiki
but (staying in the same directory on the command line) when I do:
scrapy runspider wiki
I get:
2020-02-19 18:58:05 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: wikipedia)
2020-02-19 18:58:05 [scrapy.utils.log] INFO: Versions: lxml 4.4.2.0, libxml2 2.9.5, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 22:39:24) [MSC v.1916 32 bit (Intel)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1d 10 Sep 2019), cryptography 2.8, Platform Windows-10-10.0.18362-SP0
Usage
=====
scrapy runspider [options] <spider_file>
runspider: error: File not found: wiki
What's going on?
You are looking for scrapy crawl. scrapy runspider requires the file of your spider as an argument.
scrapy runspider <spider_file.py>
scrapy crawl <spider>
https://docs.scrapy.org/en/latest/topics/commands.html
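For example, with a spider file like the one below saved at wikipedia/spiders/wiki.py (the path, class name, and start URL are illustrative), scrapy crawl wiki finds it by its name attribute, while scrapy runspider needs the file path wikipedia/spiders/wiki.py:

# wikipedia/spiders/wiki.py (hypothetical path)
import scrapy


class WikiSpider(scrapy.Spider):
    name = 'wiki'  # what `scrapy crawl wiki` matches on
    start_urls = ['https://en.wikipedia.org/wiki/Web_scraping']

    def parse(self, response):
        # just grab the page title to show the spider ran
        yield {'title': response.css('title::text').get()}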

docker api client/server version mismatch?

docker version shows the correct API version for both the client and the server, but when I run it from Python it throws the error shown below.
# docker version
Client:
Version: 1.12.6
API version: 1.24
Package version: docker-1.12.6-48.git0fdc778.el7.x86_64
Go version: go1.8.3
Git commit: 0fdc778/1.12.6
Built: Thu Jul 20 00:06:39 2017
OS/Arch: linux/amd64
Server:
Version: 1.12.6
API version: 1.24
Package version: docker-1.12.6-48.git0fdc778.el7.x86_64
Go version: go1.8.3
Git commit: 0fdc778/1.12.6
Built: Thu Jul 20 00:06:39 2017
OS/Arch: linux/amd64
#
But when I run it from Python it throws the error below.
# python
Python 2.7.5 (default, Aug 29 2016, 10:12:21)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import docker
>>> client = docker.APIClient(base_url='unix://var/run/docker.sock')
>>> print client.version()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/site-packages/docker/api/daemon.py", line 177, in version
return self._result(self._get(url), json=True)
File "/usr/lib/python2.7/site-packages/docker/api/client.py", line 226, in _result
self._raise_for_status(response)
File "/usr/lib/python2.7/site-packages/docker/api/client.py", line 222, in _raise_for_status
raise create_api_error_from_http_exception(e)
File "/usr/lib/python2.7/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 400 Client Error: Bad Request ("client is newer than server (client API version: 1.30, server API version: 1.24)")
>>>
It says the API version of your docker Python package doesn't match the Docker Engine server API. You should install a docker Python package compatible with API version 1.24, or update your Docker Engine to API version 1.30.
Additionally, you can try passing the API version explicitly when creating your Docker client, as follows:
client = docker.DockerClient(base_url='unix://var/run/docker.sock', version="1.24")
OR
client = docker.APIClient(base_url='unix://var/run/docker.sock', version="1.24")
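A quick sanity check, assuming the daemon socket is at the default /var/run/docker.sock (a sketch, not a definitive fix): pin the client to the daemon's 1.24 API, or let docker-py negotiate the version with version='auto'.

import docker

# pin the client-side API version to what the 1.12.6 daemon speaks (1.24)
client = docker.APIClient(base_url='unix://var/run/docker.sock', version='1.24')
print(client.version())

# or let docker-py ask the daemon and use the highest version both sides support
auto_client = docker.APIClient(base_url='unix://var/run/docker.sock', version='auto')
print(auto_client.version())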

pyinstaller and matplotlib - error while executing .exe

I'm building a simple program in Python that creates a bar chart. Since I later want to build a more complicated version that will be used on other PCs (where Python is not installed), I need to create a .exe. To create the executable, I'm using PyInstaller.
PyInstaller seems to work without any problem and creates the executable. But when I run it, I get the following error:
Traceback (most recent call last):
File "PyInstaller\loader\rthooks\pyi_rth_pkgres.py", line 11, in <module>
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\loader\pyimod03_importers.py", line 389, in load_module exec(bytecode, module.__dict__)
File "pkg_resources\__init__.py", line 68, in <module>
File "pkg_resources\extern\__init__.py", line 60, in load_module
ImportError: The 'packaging' package is required; normally this is bundled with this package so if you get this warning, consult the packager of your distribution.Failed to execute script pyi_rth_pkgres
Do you guys have any idea how to solve it?
Here is the source code:
import matplotlib.pyplot as plt
A = [5., 30., 45., 22.]
B = [5., 25., 50., 20.]
X = range(4)
plt.bar(X, A, color = 'b')
plt.bar(X,B, color = 'r', bottom = A)
plt.show()
and here is the output log of PyInstaller:
223 INFO: PyInstaller: 3.2
223 INFO: Python: 3.5.2
223 INFO: Platform: Windows-7-6.1.7601-SP1
226 INFO: wrote C:\Users\310251823\PycharmProjects\Prove1\Prova.spec
243 INFO: UPX is not available.
251 INFO: Extending PYTHONPATH with paths
['C:\\Users\\310251823\\PycharmProjects\\Prove1',
'C:\\Users\\310251823\\PycharmProjects\\Prove1']
251 INFO: checking Analysis
252 INFO: Building Analysis because out00-Analysis.toc is non existent
252 INFO: Initializing module dependency graph...
255 INFO: Initializing module graph hooks...
256 INFO: Analyzing base_library.zip ...
4572 INFO: running Analysis out00-Analysis.toc
4646 WARNING: Can not get binary dependencies for file: C:\Users\310251823\AppData\Local\Continuum\Anaconda3\api-ms-win-crt-stdio-l1-1-0.dll
Traceback (most recent call last):
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 695, in getImports
return _getImports_pe(pth)
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 122, in _getImports_pe
dll, _ = sym.forwarder.split('.')
TypeError: a bytes-like object is required, not 'str'
4652 WARNING: Can not get binary dependencies for file: C:\Users\310251823\AppData\Local\Continuum\Anaconda3\api-ms-win-crt-heap-l1-1-0.dll
Traceback (most recent call last):
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 695, in getImports
return _getImports_pe(pth)
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 122, in _getImports_pe
dll, _ = sym.forwarder.split('.')
TypeError: a bytes-like object is required, not 'str'
4657 WARNING: Can not get binary dependencies for file: C:\Users\310251823\AppData\Local\Continuum\Anaconda3\api-ms-win-crt-locale-l1-1-0.dll
Traceback (most recent call last):
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 695, in getImports
return _getImports_pe(pth)
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 122, in _getImports_pe
dll, _ = sym.forwarder.split('.')
TypeError: a bytes-like object is required, not 'str'
4689 WARNING: Can not get binary dependencies for file: C:\Users\310251823\AppData\Local\Continuum\Anaconda3\api-ms-win-crt-math-l1-1-0.dll
Traceback (most recent call last):
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 695, in getImports
return _getImports_pe(pth)
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 122, in _getImports_pe
dll, _ = sym.forwarder.split('.')
TypeError: a bytes-like object is required, not 'str'
4703 WARNING: Can not get binary dependencies for file: C:\Users\310251823\AppData\Local\Continuum\Anaconda3\api-ms-win-crt-runtime-l1-1-0.dll
Traceback (most recent call last):
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 695, in getImports
return _getImports_pe(pth)
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 122, in _getImports_pe
dll, _ = sym.forwarder.split('.')
TypeError: a bytes-like object is required, not 'str'
5129 WARNING: Can not get binary dependencies for file: C:\Users\310251823\AppData\Local\Continuum\Anaconda3\api-ms-win-crt-time-l1-1-0.dll
Traceback (most recent call last):
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 695, in getImports
return _getImports_pe(pth)
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 122, in _getImports_pe
dll, _ = sym.forwarder.split('.')
TypeError: a bytes-like object is required, not 'str'
5134 WARNING: Can not get binary dependencies for file: C:\Users\310251823\AppData\Local\Continuum\Anaconda3\api-ms-win-crt-conio-l1-1-0.dll
Traceback (most recent call last):
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 695, in getImports
return _getImports_pe(pth)
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 122, in _getImports_pe
dll, _ = sym.forwarder.split('.')
TypeError: a bytes-like object is required, not 'str'
5160 WARNING: Can not get binary dependencies for file: C:\Users\310251823\AppData\Local\Continuum\Anaconda3\api-ms-win-crt-string-l1-1-0.dll
Traceback (most recent call last):
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 695, in getImports
return _getImports_pe(pth)
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 122, in _getImports_pe
dll, _ = sym.forwarder.split('.')
TypeError: a bytes-like object is required, not 'str'
5169 WARNING: Can not get binary dependencies for file: C:\Users\310251823\AppData\Local\Continuum\Anaconda3\api-ms-win-crt-process-l1-1-0.dll
Traceback (most recent call last):
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 695, in getImports
return _getImports_pe(pth)
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 122, in _getImports_pe
dll, _ = sym.forwarder.split('.')
TypeError: a bytes-like object is required, not 'str'
5186 WARNING: Can not get binary dependencies for file: C:\Users\310251823\AppData\Local\Continuum\Anaconda3\api-ms-win-crt-convert-l1-1-0.dll
Traceback (most recent call last):
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 695, in getImports
return _getImports_pe(pth)
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 122, in _getImports_pe
dll, _ = sym.forwarder.split('.')
TypeError: a bytes-like object is required, not 'str'
5190 WARNING: Can not get binary dependencies for file: C:\Users\310251823\AppData\Local\Continuum\Anaconda3\api-ms-win-crt-environment-l1-1-0.dll
Traceback (most recent call last):
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 695, in getImports
return _getImports_pe(pth)
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 122, in _getImports_pe
dll, _ = sym.forwarder.split('.')
TypeError: a bytes-like object is required, not 'str'
5203 WARNING: Can not get binary dependencies for file: C:\Users\310251823\AppData\Local\Continuum\Anaconda3\api-ms-win-crt-filesystem-l1-1-0.dll
Traceback (most recent call last):
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 695, in getImports
return _getImports_pe(pth)
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 122, in _getImports_pe
dll, _ = sym.forwarder.split('.')
TypeError: a bytes-like object is required, not 'str'
5207 INFO: Caching module hooks...
5213 INFO: Analyzing C:\Users\310251823\PycharmProjects\Prove1\Prova.py
5471 INFO: Processing pre-find module path hook distutils
10751 INFO: Processing pre-find module path hook site
10752 INFO: site: retargeting to fake-dir 'C:\\Users\\310251823\\AppData\\Local\\Continuum\\Anaconda3\\lib\\site-packages\\pyinstaller-3.2-py3.5.egg\\PyInstaller\\fake-modules'
10772 INFO: Processing pre-safe import module hook win32com
13072 INFO: Processing pre-safe import module hook six.moves
28631 INFO: Loading module hooks...
28633 INFO: Loading module hook "hook-jinja2.py"...
28644 INFO: Loading module hook "hook-xml.dom.domreg.py"...
28647 INFO: Loading module hook "hook-_tkinter.py"...
28922 INFO: checking Tree
28922 INFO: Building Tree because out00-Tree.toc is non existent
28922 INFO: Building Tree out00-Tree.toc
28991 INFO: checking Tree
28991 INFO: Building Tree because out01-Tree.toc is non existent
28992 INFO: Building Tree out01-Tree.toc
29005 INFO: Loading module hook "hook-pythoncom.py"...
29569 INFO: Loading module hook "hook-sqlite3.py"...
29573 INFO: Loading module hook "hook-PyQt4.QtGui.py"...
30078 INFO: Loading module hook "hook-jsonschema.py"...
30082 INFO: Loading module hook "hook-win32com.py"...
30194 INFO: Loading module hook "hook-pydoc.py"...
30196 INFO: Loading module hook "hook-encodings.py"...
30205 INFO: Loading module hook "hook-setuptools.py"...
30207 INFO: Loading module hook "hook-requests.py"...
30212 INFO: Loading module hook "hook-matplotlib.py"...
30644 INFO: Loading module hook "hook-lib2to3.py"...
30647 INFO: Loading module hook "hook-matplotlib.backends.py"...
31457 INFO: Matplotlib backend "GTK": ignored
Gtk* backend requires pygtk to be installed.
32068 INFO: Matplotlib backend "GTKAgg": ignored
Gtk* backend requires pygtk to be installed.
32468 INFO: Matplotlib backend "GTKCairo": ignored
No module named 'gtk'
33046 INFO: Matplotlib backend "MacOSX": ignored
cannot import name '_macosx'
33684 INFO: Matplotlib backend "Qt4Agg": added
34328 INFO: Matplotlib backend "Qt5Agg": added
34923 INFO: Matplotlib backend "TkAgg": added
35499 INFO: Matplotlib backend "WX": ignored
Matplotlib backend_wx and backend_wxagg require wxPython >=2.8.12
36105 INFO: Matplotlib backend "WXAgg": ignored
Matplotlib backend_wx and backend_wxagg require wxPython >=2.8.12
36529 INFO: Matplotlib backend "GTK3Cairo": ignored
Gtk3 backend requires pygobject to be installed.
37143 INFO: Matplotlib backend "GTK3Agg": ignored
Gtk3 backend requires pygobject to be installed.
38607 INFO: Matplotlib backend "WebAgg": added
39577 INFO: Matplotlib backend "nbAgg": added
40146 INFO: Matplotlib backend "agg": added
40539 INFO: Matplotlib backend "cairo": ignored
Cairo backend requires that cairocffi or pycairo is installed.
40930 INFO: Matplotlib backend "emf": ignored
No module named 'matplotlib.backends.backend_emf'
41330 INFO: Matplotlib backend "gdk": ignored
No module named 'gobject'
41939 INFO: Matplotlib backend "pdf": added
42663 INFO: Matplotlib backend "pgf": added
43259 INFO: Matplotlib backend "ps": added
44023 INFO: Matplotlib backend "svg": added
44768 INFO: Matplotlib backend "template": added
45009 INFO: Loading module hook "hook-pytz.py"...
45088 INFO: Loading module hook "hook-IPython.py"...
45101 INFO: Excluding import 'PySide'
45113 WARNING: Removing import IPython.external.qt_loaders from module PySide.QtSvg
45113 WARNING: Removing import IPython.external.qt_loaders from module PySide
45113 WARNING: Removing import IPython.external.qt_loaders from module PySide.QtGui
45113 WARNING: Removing import IPython.external.qt_loaders from module PySide.QtCore
45114 INFO: Excluding import 'PyQt4'
45124 WARNING: Removing import IPython.external.qt_loaders from module PyQt4.QtCore
45124 WARNING: Removing import IPython.external.qt_loaders from module PyQt4.QtSvg
45124 WARNING: Removing import IPython.external.qt_loaders from module PyQt4
45141 WARNING: Removing import IPython.external.qt_loaders from module PyQt4.QtGui
45144 INFO: Excluding import 'matplotlib'
45159 WARNING: Removing import IPython.core.pylabtools from module matplotlib
45163 WARNING: Removing import IPython.core.pylabtools from module matplotlib._pylab_helpers
45163 WARNING: Removing import IPython.core.pylabtools from module matplotlib.rcParams
45164 WARNING: Removing import IPython.core.pylabtools from module matplotlib.pylab
45164 WARNING: Removing import IPython.core.pylabtools from module matplotlib.figure
45164 WARNING: Removing import IPython.core.pylabtools from module matplotlib.pyplot
45164 WARNING: Removing import IPython.core.pylabtools from module matplotlib.figure.Figure
45166 INFO: Excluding import 'gtk'
45175 WARNING: Removing import IPython.lib.inputhook from module gtk
45175 WARNING: Removing import IPython.lib.inputhookgtk from module gtk
45177 INFO: Excluding import 'tkinter'
45186 WARNING: Removing import IPython.lib.inputhook from module tkinter
45189 WARNING: Removing import IPython.lib.clipboard from module tkinter
45189 INFO: Excluding import 'PyQt5'
45200 WARNING: Removing import IPython.external.qt_loaders from module PyQt5.QtGui
45202 WARNING: Removing import IPython.external.qt_loaders from module PyQt5.QtCore
45202 WARNING: Removing import IPython.external.qt_loaders from module PyQt5
45202 WARNING: Removing import IPython.external.qt_loaders from module PyQt5.QtSvg
45202 WARNING: Removing import IPython.external.qt_loaders from module PyQt5.QtWidgets
45203 INFO: Loading module hook "hook-pycparser.py"...
45358 INFO: Loading module hook "hook-pygments.py"...
46509 INFO: Loading module hook "hook-PIL.py"...
46514 INFO: Excluding import 'PyQt5'
46519 WARNING: Removing import PIL.ImageQt from module PyQt5.QPixmap
46519 WARNING: Removing import PIL.ImageQt from module PyQt5.qRgba
46519 WARNING: Removing import PIL.ImageQt from module PyQt5.QImage
46521 WARNING: Removing import PIL.ImageQt from module PyQt5
46521 INFO: Excluding import 'PySide'
46526 WARNING: Removing import PIL.ImageQt from module PySide.QImage
46527 WARNING: Removing import PIL.ImageQt from module PySide
46527 WARNING: Removing import PIL.ImageQt from module PySide.QPixmap
46527 WARNING: Removing import PIL.ImageQt from module PySide.qRgba
46528 INFO: Excluding import 'tkinter'
46532 INFO: Import to be excluded not found: 'FixTk'
46532 INFO: Excluding import 'PyQt4'
46539 WARNING: Removing import PIL.ImageQt from module PyQt4.QtCore
46539 WARNING: Removing import PIL.ImageQt from module PyQt4.QtCore.QBuffer
46539 WARNING: Removing import PIL.ImageQt from module PyQt4.QtGui.qRgba
46539 WARNING: Removing import PIL.ImageQt from module PyQt4.QtCore.QIODevice
46539 WARNING: Removing import PIL.ImageQt from module PyQt4.QtGui.QImage
46541 WARNING: Removing import PIL.ImageQt from module PyQt4.QtGui.QPixmap
46542 WARNING: Removing import PIL.ImageQt from module PyQt4.QtGui
46544 INFO: Loading module hook "hook-PyQt4.QtCore.py"...
46643 INFO: Loading module hook "hook-zmq.py"...
47427 INFO: Excluding import 'zmq.libzmq'
47432 WARNING: Removing import zmq from module zmq.libzmq
47433 INFO: Loading module hook "hook-PyQt4.QtSvg.py"...
47435 INFO: Loading module hook "hook-xml.etree.cElementTree.py"...
47436 INFO: Loading module hook "hook-distutils.py"...
47437 INFO: Loading module hook "hook-xml.py"...
47439 INFO: Loading module hook "hook-pkg_resources.py"...
47440 INFO: Loading module hook "hook-sysconfig.py"...
47442 INFO: Loading module hook "hook-PyQt4.py"...
47443 INFO: Loading module hook "hook-PIL.Image.py"...
47621 INFO: Loading module hook "hook-cryptography.py"...
47627 INFO: Loading module hook "hook-pywintypes.py"...
48177 INFO: Loading module hook "hook-shelve.py"...
48187 INFO: Loading module hook "hook-gevent.monkey.py"...
48192 INFO: Loading module hook "hook-PIL.SpiderImagePlugin.py"...
48198 INFO: Excluding import 'tkinter'
48202 INFO: Import to be excluded not found: 'FixTk'
48350 INFO: checking Tree
48351 INFO: Building Tree because out02-Tree.toc is non existent
48351 INFO: Building Tree out02-Tree.toc
48581 INFO: checking Tree
48582 INFO: Building Tree because out03-Tree.toc is non existent
48582 INFO: Building Tree out03-Tree.toc
48585 INFO: Looking for ctypes DLLs
48657 INFO: Analyzing run-time hooks ...
48677 INFO: Including run-time hook 'pyi_rth_pkgres.py'
48680 INFO: Including run-time hook 'pyi_rth_traitlets.py'
48682 INFO: Including run-time hook 'pyi_rth_win32comgenpy.py'
48687 INFO: Including run-time hook 'pyi_rth__tkinter.py'
48690 INFO: Including run-time hook 'pyi_rth_qt4plugins.py'
48693 INFO: Including run-time hook 'pyi_rth_mplconfig.py'
48695 INFO: Including run-time hook 'pyi_rth_mpldata.py'
48742 INFO: Looking for dynamic libraries
65958 WARNING: lib not found: libzmq.cp35-win_amd64.pyd dependency of C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\zmq\backend\cython\socket.cp35-win_amd64.p
yd
66065 WARNING: lib not found: libzmq.cp35-win_amd64.pyd dependency of C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\zmq\backend\cython\_device.cp35-win_amd64.
pyd
66161 WARNING: lib not found: libzmq.cp35-win_amd64.pyd dependency of C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\zmq\backend\cython\error.cp35-win_amd64.py
d
66361 WARNING: lib not found: libzmq.cp35-win_amd64.pyd dependency of C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\zmq\backend\cython\context.cp35-win_amd64.
pyd
66574 WARNING: lib not found: libzmq.cp35-win_amd64.pyd dependency of C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\zmq\backend\cython\_version.cp35-win_amd64
.pyd
66679 WARNING: lib not found: libzmq.cp35-win_amd64.pyd dependency of C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\zmq\backend\cython\utils.cp35-win_amd64.py
d
66797 WARNING: lib not found: libzmq.cp35-win_amd64.pyd dependency of C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\zmq\backend\cython\message.cp35-win_amd64.
pyd
66983 WARNING: lib not found: libzmq.cp35-win_amd64.pyd dependency of C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\zmq\backend\cython\_poll.cp35-win_amd64.py
d
75241 WARNING: Can not get binary dependencies for file: C:\Users\310251823\AppData\Local\Continuum\Anaconda3\api-ms-win-crt-utility-l1-1-0.dll
Traceback (most recent call last):
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 695, in getImports
return _getImports_pe(pth)
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 122, in _getImports_pe
dll, _ = sym.forwarder.split('.')
TypeError: a bytes-like object is required, not 'str'
76302 WARNING: Can not get binary dependencies for file: C:\Users\310251823\AppData\Local\Continuum\Anaconda3\api-ms-win-crt-multibyte-l1-1-0.dll
Traceback (most recent call last):
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 695, in getImports
return _getImports_pe(pth)
File "C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\depend\bindepend.py", line 122, in _getImports_pe
dll, _ = sym.forwarder.split('.')
TypeError: a bytes-like object is required, not 'str'
86495 INFO: Found C:\windows\WinSxS\Manifests\amd64_policy.9.0.microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.4940_none_acd19a1fe1da248a.manifest
86496 INFO: Found C:\windows\WinSxS\Manifests\amd64_policy.9.0.microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.6161_none_acd388d7e1d8689f.manifest
86726 INFO: Searching for assembly amd64_Microsoft.VC90.CRT_1fc8b3b9a1e18e3b_9.0.30729.6161_none ...
86726 INFO: Found manifest C:\windows\WinSxS\Manifests\amd64_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.6161_none_08e61857a83bc251.manifest
86728 INFO: Searching for file msvcr90.dll
86728 INFO: Found file C:\windows\WinSxS\amd64_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.6161_none_08e61857a83bc251\msvcr90.dll
86728 INFO: Searching for file msvcp90.dll
86729 INFO: Found file C:\windows\WinSxS\amd64_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.6161_none_08e61857a83bc251\msvcp90.dll
86729 INFO: Searching for file msvcm90.dll
86729 INFO: Found file C:\windows\WinSxS\amd64_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.6161_none_08e61857a83bc251\msvcm90.dll
86947 INFO: Found C:\windows\WinSxS\Manifests\amd64_policy.9.0.microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.4940_none_acd19a1fe1da248a.manifest
86948 INFO: Found C:\windows\WinSxS\Manifests\amd64_policy.9.0.microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.6161_none_acd388d7e1d8689f.manifest
86949 INFO: Adding redirect Microsoft.VC90.CRT version (9, 0, 21022, 8) -> (9, 0, 30729, 6161)
92515 WARNING: Attempted to add Python module twice with different upper/lowercases: PyQt4.QtCore
92515 WARNING: Attempted to add Python module twice with different upper/lowercases: PyQt4.QtSvg
92515 WARNING: Attempted to add Python module twice with different upper/lowercases: PyQt4.QtGui
92516 WARNING: Attempted to add Python module twice with different upper/lowercases: PIL._imaging
92516 WARNING: Attempted to add Python module twice with different upper/lowercases: PIL._imagingft
92516 INFO: Looking for eggs
92517 INFO: Using Python library C:\Users\310251823\AppData\Local\Continuum\Anaconda3\python35.dll
92517 INFO: Found binding redirects:
[BindingRedirect(name='Microsoft.VC90.CRT', language=None, arch='amd64', oldVersion=(9, 0, 21022, 8), newVersion=(9, 0, 30729, 6161), publicKeyToken='1fc8b3b9a1e18e3b')]
92561 INFO: Warnings written to C:\Users\310251823\PycharmProjects\Prove1\build\Prova\warnProva.txt
92722 INFO: checking PYZ
92722 INFO: Building PYZ because out00-PYZ.toc is non existent
92722 INFO: Building PYZ (ZlibArchive) C:\Users\310251823\PycharmProjects\Prove1\build\Prova\out00-PYZ.pyz
96605 INFO: checking PKG
96605 INFO: Building PKG because out00-PKG.toc is non existent
96605 INFO: Building PKG (CArchive) out00-PKG.pkg
96703 INFO: Bootloader C:\Users\310251823\AppData\Local\Continuum\Anaconda3\lib\site-packages\pyinstaller-3.2-py3.5.egg\PyInstaller\bootloader\Windows-64bit\run.exe
96703 INFO: checking EXE
96703 INFO: Building EXE because out00-EXE.toc is non existent
96703 INFO: Building EXE from out00-EXE.toc
96703 INFO: Appending archive to EXE C:\Users\310251823\PycharmProjects\Prove1\build\Prova\Prova.exe
96733 INFO: checking COLLECT
96733 INFO: Building COLLECT because out00-COLLECT.toc is non existent
96733 INFO: Building COLLECT out00-COLLECT.toc
Here seems to be something like what I have, but I do not understand the solution. How can I import the 'packaging' package into the .spec file?
Thank you,
Sm
First of all, I see in the PyInstaller log that the PIL hook is loaded, which excludes the tkinter module required by matplotlib. This is a common problem that I also had, and the error message doesn't always point in that direction.
To solve it, I did two things:
1. Use import Tkinter and import FileDialog in my code, preferably after importing PIL (see the sketch below).
2. Download and install the latest development version of PyInstaller; 3.2 was still giving me issues.
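On the Python 3.5 setup shown in the log, Tkinter and FileDialog go by their Python 3 names, so in the script above the idea would look roughly like this (a sketch, not a confirmed fix):

import matplotlib.pyplot as plt
# imported explicitly so PyInstaller picks up the Tk backend pieces
import tkinter
import tkinter.filedialog

A = [5., 30., 45., 22.]
B = [5., 25., 50., 20.]
X = range(4)
plt.bar(X, A, color='b')
plt.bar(X, B, color='r', bottom=A)
plt.show()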
If the packaging error persists, you can add it as a hidden-import. In the .spec file, that is an option of the Analysis.
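In the generated Prova.spec that goes into the Analysis call, roughly like this (only the Analysis part is shown; the exact list of hidden imports here is an assumption based on the error message, not taken from a working spec):

# Prova.spec (excerpt) -- keep the rest of the generated file as it is
a = Analysis(
    ['Prova.py'],
    pathex=['C:\\Users\\310251823\\PycharmProjects\\Prove1'],
    hiddenimports=['packaging', 'packaging.version', 'packaging.specifiers',
                   'tkinter.filedialog'],
    hookspath=[],
    runtime_hooks=[],
)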
A part of setuptools, pkg_resources._vendor.packaging is missing in case of anaconda. As it says, it might be necessary to contact the packager of the distribution.
The development version of PyInstaller can be installed with
pip install https://github.com/pyinstaller/pyinstaller/archive/develop.zip
That version seems to work with anaconda3 and it even copies necessary dll dependencies. tkinter.filedialog needs to be specified as a hidden import though.
It is possible to install a more standard version of setuptools by
python -m pip install --upgrade --force-reinstall setuptools
I don't know if this causes problems with anaconda though.
I was able to build an executable of your script after installing another version of setuptools, adding tkinter.filedialog as a hidden import, and copying mkl_*.dll files from Anaconda3/Library/bin folder.

Scrapy (1.0) - Signals not received

What I'm trying to do is trigger a function (abc) when a Scrapy spider is opened, which should be triggered by Scrapy's 'signals'.
(Later on I want to change it to 'closed' to save the stats from each spider to the database for daily monitoring.)
So for now I tried this simple solution just to print something out, which I would expect to see in the console the moment the spider is opened when I run the CrawlerProcess.
What happens is that the crawler runs fine, but it does not print the output of 'abc' at the moment the spider is opened, which should trigger it. At the end I posted what I see in the console, which just shows the spider running perfectly fine.
Why is the abc function not triggered by the signal at the point where I see 'INFO: Spider opened' in the log (or at all)?
MyCrawlerProcess.py:
from twisted.internet import reactor
from scrapy import signals
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())


def abc():
    print '######################works!######################'


def from_crawler(crawler):
    crawler.signals.connect(abc, signal=signals.spider_opened)


process.crawl('Dissident')
process.start()  # the script will block here until the crawling is finished
Console output:
2016-03-17 13:00:14 [scrapy] INFO: Scrapy 1.0.4 started (bot: Chrome 41.0.2227.1. Mozilla/5.0 (Macintosh; Intel Mac Osource)
2016-03-17 13:00:14 [scrapy] INFO: Optional features available: ssl, http11
2016-03-17 13:00:14 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'scrapytry.spiders', 'SPIDER_MODULES': ['scrapytry.spiders'], 'DOWNLOAD_DELAY': 5, 'BOT_NAME': 'Chrome 41.0.2227.1. Mozilla/5.0 (Macintosh; Intel Mac Osource'}
2016-03-17 13:00:14 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2016-03-17 13:00:14 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-03-17 13:00:14 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-03-17 13:00:14 [scrapy] INFO: Enabled item pipelines: ImagesPipeline, FilesPipeline, ScrapytryPipeline
2016-03-17 13:00:14 [scrapy] INFO: Spider opened
2016-03-17 13:00:14 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-03-17 13:00:14 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-03-17 13:00:14 [scrapy] DEBUG: Crawled (200) <GET http://www.xyz.zzm/> (referer: None)
Simply defining from_crawler isn't enough, as it's not hooked into the Scrapy framework. Take a look at the docs here, which show how to create an extension that does exactly what you're attempting to do. Be sure to follow the instructions for enabling the extension via the MYEXT_ENABLED setting.
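A minimal sketch of such an extension, modelled on the extension example in the Scrapy docs (the module path scrapytry/extensions.py and the class name are placeholders):

# scrapytry/extensions.py (hypothetical module)
from scrapy import signals
from scrapy.exceptions import NotConfigured


class SpiderOpenedLogger(object):
    """Print a marker whenever a spider is opened."""

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this because the class is listed in EXTENSIONS below
        if not crawler.settings.getbool('MYEXT_ENABLED'):
            raise NotConfigured
        ext = cls()
        crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
        return ext

    def spider_opened(self, spider):
        print('######################works!###################### (%s)' % spider.name)


# settings.py
# MYEXT_ENABLED = True
# EXTENSIONS = {
#     'scrapytry.extensions.SpiderOpenedLogger': 500,
# }

With that in place, the marker should show up right around the 'INFO: Spider opened' line instead of never firing.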