I tried to put a boolean in the formdata dict for scrapy.FormRequest() because the POST request requires a boolean, but Scrapy wouldn't accept it. I then tried the string "false" instead, but that doesn't seem to work either, and I get this error after the POST request:
2019-12-11 21:48:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <400 https://passport.twitch.tv/login>: HTTP status code is not handled or not allowed
2019-12-11 21:48:55 [scrapy.core.engine] INFO: Closing spider (finished)
Is there a way to use a boolean in the POST request? And am I getting the error because I'm not using the right formdata?
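One workaround sketch (the field names below are illustrative, not Twitch's actual parameters): formdata values must be strings because they are form-encoded, so if the server expects a real JSON boolean, serialize the payload yourself and send it as the request body instead of using FormRequest:

```python
import json

# formdata in FormRequest only accepts string values, so a boolean can't
# survive form encoding. If the endpoint expects JSON, build the body with
# json.dumps so False becomes a genuine JSON false:
payload = {"username": "someuser", "remember_me": False}
body = json.dumps(payload)
print(body)  # {"username": "someuser", "remember_me": false}

# The body can then be posted with a plain scrapy.Request:
# scrapy.Request(url, method='POST', body=body,
#                headers={'Content-Type': 'application/json'},
#                callback=self.after_login)
```

Note that `false` in the output has no quotes: it is a JSON boolean, not the string "false" the form-encoded attempt produced.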
I'm having a problem mocking an HTTP request from my Node.js code using nock.
To make the request I'm using the axios library:
axios.get(`https://api.spoonacular.com/recipes/${id}/information?apiKey=${process.env.MY_SPOONACULAR_KEY}`);
id is just an integer like 1131030.
I'm using nock this way:
nock('https://api.spoonacular.com')
  .persist()
  .log(console.log)
  .get('/recipes\/\d*\/information$/')
  .query(true)
  .reply(200, {answer: 'any'});
I'm using query(true) to match any query string. The regex in the get is to match any possible id on the request.
However it is not working, and from the nock log I'm getting the following:
matching https://api.spoonacular.com:443/recipes/1131030/information to GET https://api.spoonacular.com:443/recipes/d*/information$/: false
You're passing a string, not a regex, to the .get() method. You need to remove the quotes.
.get(/recipes\/\d*\/information$/)
Also, using .log() is not the preferred way to figure out why Nock is not matching. Use the DEBUG environment variable instead to get more information: https://github.com/nock/nock#debugging
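For example, assuming the tests are run directly with node (my-test.js is a placeholder filename), enabling nock's debug output looks like:

```shell
DEBUG=nock.* node my-test.js
```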
I am trying to scrape https://www.mywebsite.com.sg but the following command returns 400 bad request error:
scrapy view https://www.mywebsite.com.sg
If I use:
data=requests.get("https://www.mywebsite.com.sg")
I can get the content of the webpage in data.text and data.content.
However, all the XPath operations in my script don't work, as data.xpath and data.content are both empty.
There seems to be no protection on the webpage, since Postman can get a result with a simple HTTP GET query.
How do I get the response object to be properly filled?
I am trying to HTTP POST an item to an API using Scrapy. In my pipeline I have:
Request(url, method='POST',
        body=json.dumps(item),
        headers={'Content-Type': 'application/json'})
This does not work. The error is:
{ some JSON } is not JSON serializable
Any idea on what I am doing wrong?
As stated in paul trmbrth's comment, instead of
body=json.dumps(item)
use
body=json.dumps(dict(item))
So your code would become:
Request(url, method='POST',
        body=json.dumps(dict(item)),
        headers={'Content-Type': 'application/json'})
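To see why dict(item) matters: a Scrapy Item behaves like a mapping but is not a dict subclass, so json.dumps rejects it. A minimal self-contained sketch (using a stand-in class rather than scrapy itself):

```python
import json
from collections.abc import MutableMapping

class FakeItem(MutableMapping):
    """Stand-in for scrapy.Item: a mapping that json.dumps can't serialize."""
    def __init__(self, **kwargs):
        self._values = dict(kwargs)
    def __getitem__(self, key): return self._values[key]
    def __setitem__(self, key, value): self._values[key] = value
    def __delitem__(self, key): del self._values[key]
    def __iter__(self): return iter(self._values)
    def __len__(self): return len(self._values)

item = FakeItem(name='example', price=10)

try:
    json.dumps(item)            # json.dumps only knows dict/list/str/number
except TypeError as e:
    print(e)                    # ... is not JSON serializable

print(json.dumps(dict(item)))   # {"name": "example", "price": 10}
```

Converting with dict(item) hands json.dumps a plain dict it knows how to encode, which is exactly what the fix above does.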
I have used JsonRequest from scrapy.http as follows:
JsonRequest(
    url=<url>,
    method='POST',
    # Replace with your item or dict(item)
    data=item
)
Behind the scenes it calls json.dumps on the dictionary you pass to the data parameter, so you first might want to make sure that the item you're trying to send is JSON serializable.
During my crawl, some pages return a response with a partial HTML body and status 200; when I compare the response body with the one I open in a browser, the former is missing something.
How can I catch this unexpected partial response body in the spider or in a downloader middleware?
Here is a log example:
2014-01-23 16:31:53+0100 [filmweb_multi] DEBUG: Crawled (408) http://www.filmweb.pl/film/Labirynt-2013-507169/photos> (referer: http://www.filmweb.pl/film/Labirynt-2013-507169) ['partial']
It's not partial content as such. The rest of the content is dynamically loaded by a JavaScript AJAX call.
To debug what content is being sent as the response for a particular request, use Scrapy's open_in_browser() function (from scrapy.utils.response).
There's another thread on how to extract dynamic content from websites that use AJAX; refer to it for a workaround.
Seeing ['partial'] in the log means that the response is missing certain headers; see my answer here for more detail on what causes the partial flag.
To catch these responses, you can simply check the response flags. For example, if you created the request using Request(url=url, callback=self.parse), you would do the following in the callback:
def parse(self, response):
    if 'partial' in response.flags:
        # Do something with the response
        pass
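To handle this in a downloader middleware instead, one option (a sketch; the class name and the meta key are mine, not Scrapy's) is to re-schedule any request whose response came back flagged partial, retrying at most once:

```python
class PartialResponseMiddleware:
    """Downloader middleware sketch: retry responses flagged 'partial' once."""

    def process_response(self, request, response, spider):
        if 'partial' in response.flags and not request.meta.get('partial_retried'):
            retry = request.copy()
            retry.meta['partial_retried'] = True   # avoid infinite retry loops
            retry.dont_filter = True               # bypass the duplicate filter
            return retry                           # returning a Request re-schedules it
        return response
```

Enable it via DOWNLOADER_MIDDLEWARES in settings.py; the exact priority number is a judgment call for your middleware stack.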
I am testing a REST API (JSON format) using the HTTP Request sampler in JMeter. I am facing problems with the PUT calls for the update operation.
The PUT call with parameters doesn't work at all using the HTTP Request sampler, so now I am using the post body to pass the JSON.
How can I pass the values extracted from the previous response to the next PUT request in the thread group? Passing the regex variable to the PUT call in the post body doesn't work; it doesn't substitute ${value} in the post body.
How do I carry out UPDATE operations using the HTTP Request sampler in JMeter?
Check that your regex extractor really worked by using a Debug Sampler to show the extracted value.
Check that your regex extractor is scoped correctly.
See this configuration (screenshots): a variable, its use with a PUT request, and the sampler result.
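As a sketch, once the extractor has stored the value (assumed here under the reference name value, as in the question), the Body Data tab of the PUT sampler can reference it directly; the JSON fields are illustrative:

```
{
  "id": ${value},
  "name": "updated name"
}
```

You typically also need an HTTP Header Manager on the sampler setting Content-Type: application/json, otherwise the API may reject the body.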