Scrapy error: TypeError: __init__() got an unexpected keyword argument 'cb_kwargs'

I use Scrapy to parse a site (Scrapy version 2.1.0). When I try to make an additional request:
taglines_request = Request(url=tagline_url,
                           callback=self.get_tags_and_awards,
                           cb_kwargs={'item': item, 'awards_url': awards_url})
I get the following error:
TypeError: __init__() got an unexpected keyword argument 'cb_kwargs'
But the __init__() method does have a cb_kwargs parameter. Please tell me, what could be the problem?
I launch it through ScrapyD.

I think the problem here is that you are passing cb_kwargs to a Request that doesn't accept it. cb_kwargs is new in Scrapy 1.7, so you should check whether ScrapyD in your case is actually running a Scrapy version >= 1.7.
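If you're unsure which Scrapy the ScrapyD daemon actually runs, a quick check is to print the version string from the same Python environment ScrapyD was started in (a minimal sketch; adjust it to match your deployment):

import scrapy
print(scrapy.__version__)  # cb_kwargs needs >= 1.7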
Alternatively, to pass data to your callback, you could use Request's meta attribute.
taglines_request = Request(
    url=tagline_url,
    callback=self.get_tags_and_awards,
    meta={
        'item': item,
        'awards_url': awards_url
    }
)
You can then access the data from your response via meta.
def get_tags_and_awards(self, response):
    item = response.meta['item']
    awards_url = response.meta['awards_url']
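For reference, once the environment is on Scrapy >= 1.7, each entry in cb_kwargs is passed to the callback as a keyword argument, so the meta lookups become plain parameters:

def get_tags_and_awards(self, response, item, awards_url):
    # item and awards_url are injected directly from cb_kwargs
    ...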


Robot Framework ConnectionError Handling Syntax

import requests

def make_request():
    try:
        url = 'https://reqres.in/api/users'
        response = requests.get(url)
        parsed = response.json()
        print(parsed)
    except requests.exceptions.ConnectionError:
        # 👇️ handle error here or use a `pass` statement
        print('connection error occurred')

make_request()
I know how to write a try/except block in Python using requests.exceptions.ConnectionError, but the same construct is not syntactically valid in Robot Framework.
I am using the RequestsLibrary in Robot Framework.
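For completeness, a minimal sketch of the equivalent in Robot Framework, assuming Robot Framework 5.0+ (which added native TRY/EXCEPT) and a RequestsLibrary version that provides the session-less GET keyword; the GLOB pattern is an assumption about how the ConnectionError message surfaces:

*** Settings ***
Library    RequestsLibrary

*** Test Cases ***
Make Request
    TRY
        ${response}=    GET    https://reqres.in/api/users
        Log    ${response.json()}
    EXCEPT    *ConnectionError*    type=GLOB
        Log    connection error occurred
    END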

When compiling in Python 3 I'm getting an error "TypeError: 'module' object is not callable" [duplicate]

This is my third Python project, and I've received an error message: 'module' object is not callable.
I know that this means I'm referencing a variable or function incorrectly, but trial and error hasn't helped me solve it.
import urllib

def get_url(url):
    '''get_url accepts a URL string and returns the server response code, response headers, and contents of the file'''
    req_headers = {
        'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13',
        'Referer': 'http://python.org'}
    # errors here on next line
    request = urllib.request(url, headers=req_headers)  # create a request object for the URL
    opener = urllib.build_opener()  # create an opener object
    response = opener.open(request)  # open a connection and receive the http response headers + contents
    code = response.code
    headers = response.headers  # headers object
    contents = response.read()  # contents of the URL (HTML, javascript, css, img, etc.)
    return code, headers, contents

testURL = get_url('http://www.urlhere.filename.zip')
print("outputs: %s" % (testURL,))
I've been using this link for reference:
http://docs.python.org/release/3.0.1/library/urllib.request.html
Traceback:
Traceback (most recent call last):
  File "C:\Project\LinkCrawl\LinkCrawl.py", line 31, in <module>
    testURL = get_url('http://www.urlhere.filename.zip')
  File "C:\Project\LinkCrawl\LinkCrawl.py", line 21, in get_url
    request = urllib.request(url, headers=req_headers)  # create a request object for the URL
TypeError: 'module' object is not callable
In Python 3, urllib.request is a module. You need to call objects contained in this module. This is an important change from Python 2; if you are using example code you need to take that into account.
For example, creating the Request object and the opener:
request = urllib.request.Request(url, headers=req_headers)
opener = urllib.request.build_opener()
response = opener.open(request)
Read the documentation carefully.
urllib.request is a module. urllib.request.Request is a class. Calling a module like you're currently doing raises an error. You probably want to call the class, like this:
request = urllib.request.Request(url, headers=req_headers) # create a request object for the URL
You'll also probably want to use build_opener of urllib.request rather than just urllib:
opener = urllib.request.build_opener() # create an opener object
This error also occurs if you have declared the method you are calling as a property by annotating it with @property.
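A hypothetical illustration of that last case: a @property already returns its value on plain attribute access, so adding call parentheses tries to call the returned object itself, and if that object happens to be a module you get exactly this message:

class Config:
    @property
    def loader(self):
        import json
        return json  # the property returns a module

c = Config()
c.loader.dumps({})  # fine: attribute access already returned the json module
c.loader()          # TypeError: 'module' object is not callable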

Code Error with Scrapy Tutorial

I am trying to learn Scrapy and am going through the basic tutorial. I am using Anaconda Navigator, working in an environment with Scrapy installed. I have entered the code, but keep getting an error.
Here is the code:
import scrapy

class FirstSpider(scrapy.Spider):
    name = "FirstSpider"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Requests(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = "quotes-%.html" % page
        with open(filename, "wb") as f:
            f.write(response.body)
        self.log("saved file %s") % filename
The code runs for a bit, says it crawled 0 pages, DEBUGs the Telnet console, and then puts out this error: "[scrapy.core.engine] ERROR: Error while obtaining start requests."
The code then runs some more and, after "yield scrapy.Requests(url=url, callback=self.parse)", puts out another error: "AttributeError: module 'scrapy' has no attribute 'Requests'".
I have re-written the code and looked for answers. Please help. Thanks!
You have a typo here:
yield scrapy.Requests(url=url, callback = self.parse)
It's Request and not Requests.
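As an aside, the pasted spider has two more small slips that will surface once the Requests typo is fixed: the filename format string is missing the s in %s, and the % filename lands outside the self.log(...) call. A corrected version of the two methods:

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = "quotes-%s.html" % page
        with open(filename, "wb") as f:
            f.write(response.body)
        self.log("saved file %s" % filename)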

'HttpResponseRedirect' object has no attribute 'client'

Django 1.9.6
I'd like to write a unit test that checks redirection.
Could you help me understand what I am doing wrong here?
Thank you in advance.
The test:
from django.test import TestCase
from django.core.urlresolvers import reverse
from django.http.request import HttpRequest
from django.contrib.auth.models import User

class GeneralTest(TestCase):
    def test_anonymous_user_redirected_to_login_page(self):
        user = User(username='anonymous', email='vvv@mail.ru', password='ttrrttrr')
        user.is_active = False
        request = HttpRequest()
        request.user = user
        hpv = HomePageView()
        response = hpv.get(request)
        self.assertRedirects(response, reverse("auth_login"))
The result:
ERROR: test_anonymous_user_redirected_to_login_page (general.tests.GeneralTest)
Traceback (most recent call last):
  File "/home/michael/workspace/photoarchive/photoarchive/general/tests.py", line 44, in test_anonymous_user_redirected_to_login_page
    self.assertRedirects(response, reverse("auth_login"))
  File "/home/michael/workspace/venvs/photoarchive/lib/python3.5/site-packages/django/test/testcases.py", line 326, in assertRedirects
    redirect_response = response.client.get(path, QueryDict(query),
AttributeError: 'HttpResponseRedirect' object has no attribute 'client'

Ran 3 tests in 0.953s
What pdb says:
-> self.assertRedirects(response, reverse("auth_login"))
(Pdb) response
<HttpResponseRedirect status_code=302, "text/html; charset=utf-8", url="/accounts/login/">
You need to add a client to the response object. See the updated code below.
from django.test import TestCase, Client
from django.core.urlresolvers import reverse
from django.http.request import HttpRequest
from django.contrib.auth.models import User

class GeneralTest(TestCase):
    def test_anonymous_user_redirected_to_login_page(self):
        user = User(username='anonymous', email='vvv@mail.ru', password='ttrrttrr')
        user.is_active = False
        request = HttpRequest()
        request.user = user
        hpv = HomePageView()
        response = hpv.get(request)
        response.client = Client()
        self.assertRedirects(response, reverse("auth_login"))
It looks like you are calling your view's get directly rather than going through the built-in test Client. When you use the test client, your client instance comes back attached to the response, presumably for cases such as this where you want to check/fetch a redirect.
One solution is to use the client to fetch the response from your view. Another is to stick a client on the response, as shown above.
A third option is to tell assertRedirects not to fetch the redirect; there is no need for a client if you don't ask the assertion to fetch the redirect target. That's done by adding fetch_redirect_response=False to your assertion. The first and third options are sketched below.
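In both sketches, the 'home' URL name for HomePageView is a hypothetical stand-in, since the project's URLconf isn't shown:

# Option 1: let the test client produce the response, so response.client is set
response = self.client.get(reverse('home'))  # 'home' is a hypothetical URL name
self.assertRedirects(response, reverse("auth_login"))

# Option 3: keep calling the view directly, but don't fetch the redirect target
self.assertRedirects(response, reverse("auth_login"), fetch_redirect_response=False)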

ScrapyDeprecationWarning: Command's default `crawler` is deprecated and will be removed. Use `create_crawler` method to instantiate crawlers

Scrapy version 0.19
I am using the code from this page (Run multiple scrapy spiders at once using scrapyd). When I run scrapy allcrawl, I get:
ScrapyDeprecationWarning: Command's default `crawler` is deprecated and will be removed. Use `create_crawler` method to instantiate crawlers
Here is the code:
from scrapy.command import ScrapyCommand
import urllib
import urllib2
from scrapy import log

class AllCrawlCommand(ScrapyCommand):
    requires_project = True
    default_settings = {'LOG_ENABLED': False}

    def short_desc(self):
        return "Schedule a run for all available spiders"

    def run(self, args, opts):
        url = 'http://localhost:6800/schedule.json'
        for s in self.crawler.spiders.list():  # this line raises the warning
            values = {'project': 'YOUR_PROJECT_NAME', 'spider': s}
            data = urllib.urlencode(values)
            req = urllib2.Request(url, data)
            response = urllib2.urlopen(req)
            log.msg(response)
How do I fix the DeprecationWarning? Thanks.
Use:
crawler = self.crawler_process.create_crawler()
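In context, that means creating the crawler once inside run() and listing spiders from it, instead of touching the deprecated self.crawler default. A sketch with the rest of the command unchanged:

    def run(self, args, opts):
        url = 'http://localhost:6800/schedule.json'
        crawler = self.crawler_process.create_crawler()  # replaces the deprecated self.crawler
        for s in crawler.spiders.list():
            values = {'project': 'YOUR_PROJECT_NAME', 'spider': s}
            data = urllib.urlencode(values)
            req = urllib2.Request(url, data)
            response = urllib2.urlopen(req)
            log.msg(response)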