Where is a Response transformed into one of its subclasses? - scrapy

I'm trying to write a downloader middleware that ignores responses that don't have a pre-defined element. However, I can't use the css method of the HtmlResponse class inside the middleware because, at that point, the response's type is just Response. By the time it reaches the spider it's an HtmlResponse, but then it's too late because I can no longer apply certain actions to the middleware's state.
Where is the response's final type set?

Without seeing your middleware code it is hard to tell what the matter is.
However, my middleware below gets an HtmlResponse object:
class FilterMiddleware(object):
    def process_response(self, request, response, spider):
        print(response.__class__)
        print(type(response))
        return response
Both print statements verify this:
<class 'scrapy.http.response.html.HtmlResponse'>
<class 'scrapy.http.response.html.HtmlResponse'>
And I can use the css method on the response without any exception. The order of the middleware in settings.py does not matter either: with 10, 100 or 500 I get the same result as above.
However, if I configure the middleware at 590 or above, I get a plain old Response object. This is because the conversion happens in the HttpCompressionMiddleware class (registered at order 590 by default), on line 35 in the current version.
To solve your issue, order your middleware later in the pipeline (with a lower order number), or convert the response yourself (I would not do this, however).
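For example, a minimal settings.py sketch (the middleware path is hypothetical):
DOWNLOADER_MIDDLEWARES = {
    # Any number below HttpCompressionMiddleware's default of 590 means
    # process_response runs after decompression and conversion, so this
    # middleware sees an HtmlResponse on its way back to the engine.
    'myproject.middlewares.FilterMiddleware': 100,
}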

Related

API request problem: response 500 from backend

Hi all, I ran into a problem: when I pass the id and the body to the method, it returns a 500 error. In Swagger the method is defined like this:
[screenshot: Swagger method definition]
Accordingly, in the request body I pass all the field parameters it needs (hard-coded, just to check that it works):
[screenshot: request code]
In response I get this:
[screenshot: error response]
If I do the same in Swagger, the method works and a 200 response comes back:
[screenshot: Swagger 200 response]
What am I doing wrong?
I tried making the code asynchronous and passing the fields in different ways, but nothing helps; the response is still 500.
Try putting the id field inside the object:
const form = {
    id: 790,
    ...
}
You are passing too many params inside the callback (id, form).

Insert Record to BigQuery or some RDB during API Call

I am writing a REST API GET endpoint that needs to both return a response and store records to either BigQuery or GCP Cloud SQL (MySQL), but I don't want the return to depend on the completion of the record writes. Basically, my code will look like:
def predict():
    req = request.json.get("instances")
    resp = make_response(req)
    write_to_bq(req)
    write_to_bq(resp)
    return resp
Is there any easy way to do this with Cloud SQL Client Library or something?
Turns out Flask has functionality that does what I require:
#app.route("predict", method=["GET"]):
def predict():
# do some stuff with the request.json object
return jsonify(response)
#app.after_request
def after_request_func(response):
# do anything you want that relies on context of predict()
#response.call_on_close
def persist():
# this will happen after response is sent,
# so even if this function fails, the predict()
# will still get it's response out
write_to_db()
return response
One important thing is that a function decorated with after_request must take an argument and return something of type flask.Response. Also, I think a function decorated with call_on_close cannot access the context of the main view, so you need to capture anything you want to use from the main view inside the after_request function but outside (above) the call_on_close function.
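Putting that together, here is a hedged sketch of the full pattern; flask.g carries the payload from the view to the hook, and write_to_db stands in for whatever persistence call you use. The payload is read from g before persist() is defined, since the app context may already be torn down by the time call_on_close callbacks run:
from flask import Flask, g, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["GET"])
def predict():
    req = request.json.get("instances")
    g.payload = req              # stash for the after_request hook
    return jsonify(req)

@app.after_request
def after_request_func(response):
    payload = g.get("payload")   # capture while the context is alive

    @response.call_on_close
    def persist():
        if payload is not None:
            write_to_db(payload)  # hypothetical persistence helper

    return response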

use case of process_spider_input in spidermiddleware

Does anyone know the difference between process_spider_input(response, spider) in spider middleware and process_response(request, response, spider) in downloader middleware?
And how do I choose one over the other? They seem to do much the same work: both handle responses.
According to the source, they do differ, namely in their return value:
spider_mw.process_spider_input() returns None; you can check or modify the Response, but basically the response is assumed to have been accepted and you can't refuse it (other than by raising an exception).
downloader_mw.process_response() returns a Response or a Request. You can refuse the response from the download handler and generate a new request (e.g. the RetryMiddleware).
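A minimal sketch contrasting the two hooks (class names and conditions are illustrative, not from the question):
class MyDownloaderMiddleware(object):
    def process_response(self, request, response, spider):
        # Free to swap the response for a new Request, which is
        # essentially what RetryMiddleware does.
        if response.status == 503:
            retryreq = request.copy()
            retryreq.dont_filter = True
            return retryreq
        return response

class MySpiderMiddleware(object):
    def process_spider_input(self, response, spider):
        # May only inspect the response: returning None accepts it,
        # and the only way to reject it is to raise an exception.
        if not response.body:
            raise ValueError("empty response")
        return None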

Flask Error: If something fails in the flask backend, how does the error propagate to the front end?

Consider a simple application where a user fills in a form to divide two numbers; in the routes the form data is processed (made into floats) and then passed as parameters to a Python script's function that has the division logic.
A failure of the logic due to division by zero is handled as a custom message in the terminal. How does one send this custom message back to the frontend UI along with a 500 error? Trying to make a RESTful Flask app here.
So far I can abort and show a custom message, but not the one that propagated from the backend. I also looked into custom error handling, but I want the writer of the external Python script to be able to write the custom message.
You can use Flask's errorhandler(errorcode) to manage your errors and display them on the frontend.
@app.errorhandler(500)
def code_500(error):
    return render_template("errors/500.html", error=error), 500
You can put whatever else you want in the html template.
You can also call the code_500(error) func directly.
Same principle applies for any other HTTP error code if you want to customize the page and the message (401, 403, 404, etc...).
If you're inside a blueprint, you can use app_errorhandler instead.
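For instance (a hedged sketch; the blueprint name is arbitrary):
from flask import Blueprint, render_template

bp = Blueprint("errors", __name__)

@bp.app_errorhandler(500)
def code_500(error):
    # registered application-wide even though it lives in a blueprint
    return render_template("errors/500.html", error=error), 500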
You could use the abort() function. From the docs:
When using Flask for web APIs, you can use the same techniques as above to return JSON responses to API errors. abort() is called with a description parameter. The errorhandler() will use that as the JSON error message, and set the status code to 404.
You could implement it like this:
@app.route("/div", methods=["POST"])
def divide():
    x = float(request.form['x'])
    y = float(request.form['y'])
    try:
        result = x / y
    except ZeroDivisionError:
        abort(400, description="Your message here")
    else:
        return str(result)  # proper response
From there, the important step is properly catching that message on your frontend.
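For a RESTful frontend, a hedged sketch of an error handler that returns the abort() description as JSON instead of an HTML page:
from flask import jsonify

@app.errorhandler(400)
def bad_request(error):
    # error.description carries the message passed to abort()
    return jsonify(error=error.description), 400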

Persist items using a POST request within a Pipeline

I want to persist items within a pipeline by POSTing them to a URL.
I am using this code within the pipeline:
class XPipeline(object):
    def process_item(self, item, spider):
        log.msg('in SpotifylistPipeline', level=log.DEBUG)
        yield FormRequest(url="http://www.example.com/additem",
                          formdata={'title': item['title'],
                                    'link': item['link'],
                                    'description': item['description']})
but it seems it's not making the HTTP request.
Is it possible to make HTTP requests from pipelines? If not, do I have to do it in the spider?
Do I need to specify a callback function? If so, which one?
If I can make the HTTP call, can I check the response (JSON) and return the item if everything went OK, or discard the item if it didn't get saved?
As a final thing, is there a diagram that explains the flow that Scrapy follows from beginning to end? I am getting slightly lost about what gets called when. For instance, if pipelines returned items to spiders, what would spiders do with those items? What comes after a pipeline call?
Many thanks in advance
Migsy
You can inherit your pipeline from scrapy.contrib.pipeline.media.MediaPipeline and yield Requests in 'get_media_requests'. Responses are passed into the 'media_downloaded' callback.
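A rough sketch of that approach (the method bodies are illustrative; the import path follows the old scrapy.contrib layout used in this answer):
from scrapy.contrib.pipeline.media import MediaPipeline
from scrapy.http import FormRequest

class PostItemPipeline(MediaPipeline):
    def get_media_requests(self, item, info):
        # Scheduled through the engine like any other request.
        yield FormRequest("http://www.example.com/additem",
                          formdata={'title': item['title']})

    def media_downloaded(self, response, request, info):
        # Inspect the server's reply here.
        return response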
Quote:
This method is called for every item pipeline component and must
either return an Item (or any descendant class) object or raise a
DropItem exception. Dropped items are no longer processed by further
pipeline components.
So, only the spider can yield a request with a callback.
Pipelines are used for processing items.
You'd better describe what you want to achieve.
is there a diagram that explains the flow that Scrapy follows from beginning to end
Architecture overview
For instance, if Pipelines returned items to Spiders
Pipelines do not return items to spiders. The items returned are passed to the next pipeline.
This could be done easily by using the requests library. If you don't want to use another library then look into urllib2.
import requests
from scrapy.exceptions import DropItem

class XPipeline(object):
    def process_item(self, item, spider):
        r = requests.post("http://www.example.com/additem",
                          data={'title': item['title'],
                                'link': item['link'],
                                'description': item['description']})
        if r.status_code == 200:
            return item
        raise DropItem("Failed to post item with title %s." % item['title'])