Scrapy crawlera bug - scrapy

Scrapy 2.0.1, scrapy_crawlera 1.7.0.
I think scrapy_crawlera should access meta differently (https://github.com/scrapy/scrapy/issues/3516)
2020-04-02 06:02:36 [scrapy.core.engine] INFO: Spider opened
2020-04-02 06:02:36 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-04-02 06:02:36 [officedepot] INFO: Spider opened: officedepot
2020-04-02 06:02:36 [root] INFO: Using crawlera at http://proxy.crawlera.com:8010 (apikey: 0036675)
2020-04-02 06:02:36 [root] INFO: CrawleraMiddleware: disabling download delays on Scrapy side to optimize delays introduced by Crawlera. To avoid this behaviour you can use the CRAWLERA_PRESERVE_DELAY setting but keep in mind that this may slow down the crawl significantly
2020-04-02 06:02:36 [scrapy.extensions.httpcache] DEBUG: Using filesystem cache storage in /root/.scrapy/httpcache
2020-04-02 06:02:36 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-04-02 06:02:36 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.officedepot.com/a/products/9859127/Office-Depot-Brand-EverBind-D-Ring/>
Traceback (most recent call last):
File "/root/venv/lib/python3.7/site-packages/scrapy/http/response/__init__.py", line 30, in meta
return self.request.meta
AttributeError: 'NoneType' object has no attribute 'meta'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/venv/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/root/venv/lib/python3.7/site-packages/scrapy/core/downloader/middleware.py", line 53, in process_response
response = yield method(request=request, response=response, spider=spider)
File "/root/venv/lib/python3.7/site-packages/scrapy_crawlera/middleware.py", line 192, in process_response
retries = response.meta.get('crawlera_auth_retry_times', 0)
File "/root/venv/lib/python3.7/site-packages/scrapy/http/response/__init__.py", line 33, in meta
"Response.meta not available, this response "
AttributeError: Response.meta not available, this response is not tied to any request

Just update your scrapy-crawlera version to >= 1.7.2
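
As a rough way to confirm whether the installed package is older than the suggested 1.7.2, something like the following could be used. This is only a sketch: the helper names (`version_tuple`, `needs_upgrade`, `installed_version`) are made up for illustration, the comparison is simplified and ignores pre-release suffixes, and `importlib.metadata` needs Python 3.8 or newer (on older interpreters the lookup simply returns None).

```python
import re


def version_tuple(version):
    """Turn a dotted version such as '1.7.2' into a comparable tuple (1, 7, 2)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def needs_upgrade(installed, minimum="1.7.2"):
    """Return True when the installed version is older than the minimum."""
    return version_tuple(installed) < version_tuple(minimum)


def installed_version(distribution="scrapy-crawlera"):
    """Look up an installed distribution's version, or None if unavailable."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python older than 3.8
        return None
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("scrapy-crawlera not found (or Python older than 3.8)")
    elif needs_upgrade(found):
        print("scrapy-crawlera %s is older than 1.7.2; upgrade it" % found)
    else:
        print("scrapy-crawlera %s is new enough" % found)
```

With the version from the question, `needs_upgrade("1.7.0")` is true, so an upgrade (for example via pip) would be the next step.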

Related

driver.get(URL) not navigating to the specified URL

The webdriver is opening the browser but not navigating to the specified URL, and is returning the following exception:
Traceback (most recent call last):
File "C:/Users/91800/PycharmProjects/Automation/automation.py", line 3, in <module>
driver = webdriver.Chrome(executable_path='C:\\Program Files (x86)\\Google\Chrome\\Application\\chrome.exe')
File "C:\Users\91800\PycharmProjects\Automation\venv\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
self.service.start()
File "C:\Users\91800\PycharmProjects\Automation\venv\lib\site-packages\selenium\webdriver\common\service.py", line 98, in start
self.assert_process_still_running()
File "C:\Users\91800\PycharmProjects\Automation\venv\lib\site-packages\selenium\webdriver\common\service.py", line 111, in assert_process_still_running
% (self.path, return_code)
selenium.common.exceptions.WebDriverException: Message: Service C:\Program Files (x86)\Google\Chrome\Application\chrome.exe unexpectedly exited. Status code was: 0
from selenium import webdriver
driver = webdriver.Chrome(executable_path='C:\\Program Files (x86)\\Google\Chrome\\Application\\chrome.exe')
driver.get('https://www.google.com/')
First check whether your Chrome version matches your ChromeDriver version; a mismatch can prevent the code from running.
To check the Chrome version, open Help in Google Chrome, then About Google Chrome; the version is shown there.
Then install the matching ChromeDriver version from https://chromedriver.chromium.org/downloads
After that it should work.
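
Note also that the traceback shows executable_path set to chrome.exe, which is the browser itself; in Selenium 3 that argument is meant to point at the ChromeDriver binary. Both checks can be sketched without Selenium. The helper names below are hypothetical, and the sketch assumes that comparing only the major version number is good enough.

```python
import ntpath


def looks_like_chromedriver(executable_path):
    """Return True if the path's file name looks like a ChromeDriver binary."""
    # ntpath splits on both "\\" and "/", so Windows and POSIX paths work.
    name = ntpath.basename(executable_path).lower()
    return name.startswith("chromedriver")


def same_major_version(chrome_version, driver_version):
    """Compare only the leading component, e.g. '80' in '80.0.3987.149'."""
    return chrome_version.split(".")[0] == driver_version.split(".")[0]


if __name__ == "__main__":
    path = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe"
    if not looks_like_chromedriver(path):
        print("executable_path points at the browser, not at ChromeDriver")
```

The version strings you pass to `same_major_version` would come from the About Google Chrome page and from the ChromeDriver download you picked.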

Running headless Firefox or chrome in DigitalOcean

I am trying to run headless Chrome or Firefox on DigitalOcean and have tried a lot of solutions, but none seem to work.
The code works perfectly on my local system but not on my DigitalOcean server.
This is the test I am using for headless Firefox:
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
# print("open virtual display")
with Display():
    print("open Firefox browser")
    browser = webdriver.Firefox()
    browser.set_window_size(1120, 550)
    url = 'http://arbspiper.com/'
    browser.get(url)
    title = browser.title
    print(title)
    browser.quit()
The error I get is:
Traceback (most recent call last):
File "firefox.py", line 9, in <module>
browser = webdriver.Firefox()
File "/home/arbspiper_project/env/lib/python3.5/site-packages/selenium/webdriver/firefox/webdriver.py", line 174, in __init__
keep_alive=True)
File "/home/arbspiper_project/env/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
self.start_session(capabilities, browser_profile)
File "/home/arbspiper_project/env/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/home/arbspiper_project/env/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/home/arbspiper_project/env/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: newSession
Mozilla Firefox 64.0
Selenium version (3.141.0)
Answer, from Manoj Kengudelu's comment:
geckodriver has to be compatible with the version of Firefox you are using.
Check the geckodriver supported platforms table.
Once you have found the geckodriver version you want to use, download it, make it executable, and add it to your PATH.
Thanks again, Manoj Kengudelu.
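
To verify the last two steps (executable and on PATH), a small check like this might help. It is a sketch using only the standard library; `find_driver` and `make_executable` are names invented here.

```python
import os
import shutil
import stat


def find_driver(name="geckodriver"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


def make_executable(path):
    """Add the execute bits for owner, group and others, like `chmod +x`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    location = find_driver()
    if location is None:
        print("geckodriver is not on PATH")
    else:
        print("geckodriver found at", location)
```

If `find_driver()` returns None on the server but a path on your local machine, that difference alone could explain why the same script behaves differently.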

selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status 1

version:
firefox : Mozilla Firefox 61.0
geckodriver : geckodriver v0.20.1
I only tried the code below:
from selenium import webdriver
browser = webdriver.Firefox()
But I get the following error:
Traceback (most recent call last):
File "my.py", line 3, in <module>
browser = webdriver.Firefox()
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.py", line 170, in __init__
keep_alive=True)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 156, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 245, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 314, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status: 1
And geckodriver.log:
1528101123327 geckodriver INFO geckodriver 0.20.1
1528101123336 geckodriver INFO Listening on 127.0.0.1:43481
1528101124336 mozrunner::runner INFO Running command: "/usr/bin/firefox" "-marionette" "-profile" "/tmp/rust_mozprofile.y93GPXwtXuKC"
Running Firefox as root in a regular user's session is not supported. ($XAUTHORITY is /home/username/.Xauthority which is owned by username.)
The problem only occurs under the root account. Please help.
This error message...
Running Firefox as root in a regular user's session is not supported. ($XAUTHORITY is /home/keti/.Xauthority which is owned by keti.)
...implies that you were either trying to invoke the Firefox browser as the root user or running Firefox as root in a non-root session.
As per "User's Firefox process runs as root (if root is running Firefox)", neither case is supported, and both should have been relatively difficult to achieve. Technically it was still possible (the --new-instance and --no-remote flags are available to control remoting), but X11's permissive security model meant that a user should basically treat their own account as if it had passwordless sudo.
A couple of associated issues were:
If a user runs Firefox as root but using their own home directory, many things become broken for that user, sometimes permanently.
When Firefox is running as root, other users on the same display can gain root privileges.
With the general availability (GA) of Firefox v60.0, the Mozilla team decided to disallow running Firefox under sudo, as:
Use clone() instead of fork() for sandboxed Linux processes and remove SandboxEarlyInit etc.
Running sudo firefox, which previously seemed to work but was unsupported, now fails to load content (tab crash on any page) on most Linux distributions; it fails to start and prints a message such as:
Running Firefox as root in a regular user's session is not supported. ($XAUTHORITY is /home/username/.Xauthority which is owned by username.)
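
One way to surface this earlier than the geckodriver failure is to check the effective user before creating the driver. The following is a blunt sketch with hypothetical helper names: it refuses every root launch, which is stricter than the actual Firefox check (that one concerns root inside a regular user's session).

```python
import os


def running_as_root():
    """Return True when the effective user is root (POSIX only)."""
    geteuid = getattr(os, "geteuid", None)  # os.geteuid is missing on Windows
    return geteuid is not None and geteuid() == 0


def check_can_launch_firefox():
    """Raise early with a clear message instead of a geckodriver error."""
    if running_as_root():
        raise RuntimeError(
            "Refusing to start Firefox as root; run this script as a regular user."
        )
```

Calling `check_can_launch_firefox()` just before `webdriver.Firefox()` would turn the opaque "Process unexpectedly closed" message into an explicit one.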

PermissionError: [Errno 1] Operation not permitted while using Selenium with Pythonista on iOS

I want to create a program in Pythonista that can control the web browser. I know Selenium is the best tool for this, but when I tried it in Pythonista on my iPhone I got an error.
This is the code:
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('http://www.yahoo.com')
Here is the error:
PermissionError: [Errno 1] Operation not permitted
Traceback (most recent call last):
File "/private/var/mobile/Containers/Shared/AppGroup/A2EBDF28-CB6C-4190-8199-7406AA3821A3/Pythonista3/Documents/selen.py", line 3, in <module>
browser = webdriver.Chrome()
File "/private/var/mobile/Containers/Shared/AppGroup/A2EBDF28-CB6C-4190-8199-7406AA3821A3/Pythonista3/Documents/site-packages-3/selenium/webdriver/chrome/webdriver.py", line 68, in __init__
self.service.start()
File "/private/var/mobile/Containers/Shared/AppGroup/A2EBDF28-CB6C-4190-8199-7406AA3821A3/Pythonista3/Documents/site-packages-3/selenium/webdriver/common/service.py", line 76, in start
stdin=PIPE)
File "/var/containers/Bundle/Application/24DD2A57-320E-4E21-9BE2-7C3605830DE0/Pythonista3.app/Frameworks/Py3Kit.framework/pylib/subprocess.py", line 708, in __init__
restore_signals, start_new_session)
File "/var/containers/Bundle/Application/24DD2A57-320E-4E21-9BE2-7C3605830DE0/Pythonista3.app/Frameworks/Py3Kit.framework/pylib/subprocess.py", line 1261, in _execute_child
restore_signals, start_new_session, preexec_fn)
PermissionError: [Errno 1] Operation not permitted
This error message...
PermissionError: [Errno 1] Operation not permitted
...implies that ChromeDriver was unable to create a required resource (e.g. a log file) while initializing a new WebDriver and web client session.
As per the discussion "Pythonista - Limitations due to iOS", the following are some of the limitations of using Pythonista:
No fork/exec for new processes. Impacts the subprocess module.
Due to missing fork, no full cleanup of process resources (memory, threads, file handles).
No file access outside of application directory.
No /dev/null and other special files.
Limited processing power of devices (compared to typical PC/Mac).
Process usually is stopped/killed after a while.
A simple example is as follows:
>>> import subprocess
>>> subprocess.call(["ls", "-l"])
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/mobile/Containers/Bundle/Application/8C59C68D-71BF-4CBB-90F8-373A1752DEE1/Pythonista.app/pylib/subprocess.py", line 524, in call
return Popen(*popenargs, **kwargs).wait()
File "/private/var/mobile/Containers/Bundle/Application/8C59C68D-71BF-4CBB-90F8-373A1752DEE1/Pythonista.app/pylib/subprocess.py", line 711, in __init__
errread, errwrite)
File "/private/var/mobile/Containers/Bundle/Application/8C59C68D-71BF-4CBB-90F8-373A1752DEE1/Pythonista.app/pylib/subprocess.py", line 1205, in _execute_child
self.pid = os.fork()
OSError: [Errno 1] Operation not permitted
What's wrong in your use case
There can be two issues, as follows:
When you invoke the following line of code:
browser = webdriver.Chrome()
The ChromeDriver tries to create/modify/access the scoped_directory within the file system. For example on Windows OS :
"chromedriverVersion": "2.35.528161 (5b82f2d2aae0ca24b877009200ced9065a772e73)",
"userDataDir": "C:\\Users\\username\\AppData\\Local\\Temp\\scoped_dir5188_12717"
Possibly ChromeDriver is unable to perform this task/method/functionality.
Again when you invoke the following line of code :
browser = webdriver.Chrome()
As per selenium.webdriver.chrome.webdriver ChromeDriver tries to create a logfile within the file system as per the constructor as follows :
class selenium.webdriver.chrome.webdriver.WebDriver(executable_path='chromedriver', port=0, options=None, service_args=None, desired_capabilities=None, service_log_path=None, chrome_options=None)
Possibly ChromeDriver is unable to perform this task/method/functionality.
For the reasons mentioned above, you are seeing the error:
PermissionError: [Errno 1] Operation not permitted
Solution
In either of the above cases, the solution would be to restrict resource access and creation to the application directory only.
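
The question's traceback ends inside subprocess's _execute_child, which is consistent with the "no fork/exec" limitation listed above. A quick probe for whether an environment can start child processes at all might look like this sketch (`can_spawn_processes` is a made-up name; the fallback to "python" is an assumption for interpreters that do not report their own path).

```python
import subprocess
import sys


def can_spawn_processes():
    """Try to start a trivial child process; return False if the OS forbids it."""
    executable = sys.executable or "python"  # assumption: "python" as fallback
    try:
        subprocess.call([executable, "-c", "pass"])
    except OSError:  # PermissionError is a subclass of OSError
        return False
    return True


if __name__ == "__main__":
    if can_spawn_processes():
        print("child processes can be started here")
    else:
        print("child processes are not permitted; Selenium cannot start a driver")
```

Where this returns False, Selenium has no way to launch the driver executable, regardless of which directories are writable.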

Why does Selenium code execute successfully only in debug mode but fail in run mode?

C:\apache-tomcat-8.0.27>python
Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:42:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from selenium import webdriver
>>> webdriver.__version__
'3.0.2'
>>>
The Selenium Python test code is as follows:
driver.get("http://localhost:8080/")
self.assertEqual("Cubiender", driver.title)
driver.find_element_by_id("login_email").clear()
driver.find_element_by_id("login_email").send_keys("gin@cubi.com")
driver.find_element_by_id("login_pwd").clear()
driver.find_element_by_id("login_pwd").send_keys("pass")
driver.find_element_by_css_selector("input[type=\"submit\"]").click()
driver.find_element_by_link_text('Project List')
sleep(0.05)
driver.find_element_by_xpath("//input[@value='2588']").click()
sleep(0.05)
driver.find_element_by_css_selector("div.menu > #inquireProject").click()
The above code runs successfully in debug mode, but in run mode it fails at
driver.find_element_by_xpath("//input[@value='2588']").click()
even after I added the sleep calls.
Stack trace:
C:\Python27\python.exe "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pycharm\utrunner.py" F:\python\sub_proj2.py true
Testing started at 10:53 ...
Error
Traceback (most recent call last):
File "F:\python\sub_proj2.py", line 41, in test_untitled
driver.find_element_by_css_selector("span.triangle").click()
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 437, in find_element_by_css_selector
return self.find_element(by=By.CSS_SELECTOR, value=css_selector)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 752, in find_element
'value': value})['value']
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 236, in execute
self.error_handler.check_response(response)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 192, in check_response
raise exception_class(message, screen, stacktrace)
NoSuchElementException: Message: Unable to locate element: span.triangle
Process finished with exit code 0
Please share the stack trace.
It seems to me that there are two possibilities:
Either your value='2588' is dynamic and changes on every page load,
or the sleep simply halts the whole process. Why not use an explicit wait instead, as in the example below?
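from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Wait up to 10 seconds for the element to become clickable
wait = WebDriverWait(driver, 10)
element = wait.until(EC.element_to_be_clickable((By.ID, 'someid')))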
This is probably because in debug mode the page has more time to load before the element is looked up. Try increasing the sleep time, or use another approach to make sure the element is loaded.
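
To illustrate why polling tends to be more robust than a fixed sleep(0.05), here is a minimal, Selenium-free sketch of the idea behind an explicit wait. `wait_until` is a hypothetical helper written for this example, not Selenium's implementation, and the default timeout and poll interval are arbitrary choices.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)


if __name__ == "__main__":
    attempts = []

    def ready_on_third_try():
        # Stand-in for "is the element present yet?"
        attempts.append(1)
        return "element" if len(attempts) >= 3 else None

    print(wait_until(ready_on_third_try, timeout=2, poll_interval=0.01))
```

A fixed sleep either wastes time when the page is fast or is too short when it is slow; polling returns as soon as the condition holds and only fails after the full timeout.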