JSONDecodeError with Scrapy: Expecting value: line 1 column 1 (char 0)

I am using requests in order to fetch and parse some data scraped using Scrapy with Scrapyrt (real time scraping).
This is how I do it:
# pass spider name and start_requests flag as request parameters
params = {
    'spider_name': spider,
    'start_requests': True
}
# scrape items
response = requests.get('http://scrapyrt:9080/crawl.json', params=params)
print('RESPONSE JSON', response.json())
data = response.json()
As per the ScrapyRT documentation, with the 'start_requests' parameter set to True, the spider automatically requests URLs and passes the responses to the parse method, which is the default method used for parsing requests.
start_requests
type: boolean
optional
Whether spider should execute Scrapy.Spider.start_requests method. start_requests are executed by default when you run Scrapy Spider normally without ScrapyRT, but this method is NOT executed in API by default. By default we assume that spider is expected to crawl ONLY url provided in parameters without making any requests to start_urls defined in Spider class. start_requests argument overrides this behavior. If this argument is present API will execute start_requests Spider method.
But the setup is not working. Log:
[2019-05-19 06:11:14,835: DEBUG/ForkPoolWorker-4] Starting new HTTP connection (1): scrapyrt:9080
[2019-05-19 06:11:15,414: DEBUG/ForkPoolWorker-4] http://scrapyrt:9080 "GET /crawl.json?spider_name=precious_tracks&start_requests=True HTTP/1.1" 500 7784
[2019-05-19 06:11:15,472: ERROR/ForkPoolWorker-4] Task project.api.routes.background.scrape_allmusic[87dbd825-dc1c-4789-8ee0-4151e5821798] raised unexpected: JSONDecodeError('Expecting value: line 1 column 1 (char 0)',)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
return self.run(*args, **kwargs)
File "/usr/src/app/project/api/routes/background.py", line 908, in scrape_allmusic
print ('RESPONSE JSON',response.json())
File "/usr/lib/python3.6/site-packages/requests/models.py", line 897, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The error was due to a bug in Twisted 19.2.0, a ScrapyRT dependency, which assumed the response to be of the wrong type; as a result ScrapyRT answered with an HTML 500 error page, which response.json() cannot parse. Once I downgraded to Twisted==18.9.0, it worked.
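More generally, a JSONDecodeError at line 1 column 1 (char 0) means the response body is not JSON at all; here it was ScrapyRT's HTML 500 error page. A minimal defensive sketch (an addition, not part of the original answer) that surfaces the real server error instead of the decode failure, reusing the params dict from above:
import requests

response = requests.get('http://scrapyrt:9080/crawl.json', params=params)
response.raise_for_status()  # turns the 500 into a requests.HTTPError carrying the status line
try:
    data = response.json()
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    print('Non-JSON response:', response.status_code, response.text[:200])
    raise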

Related

Python configparser KeyError raised when using it globally

When I try to pass the API endpoint values in the POST API file, a KeyError is raised. In the baseapi.ini file, I wrote:
[API]
endpoint = value
Post API file:
import requests
from APIs.payLoad import addBookPayload
from Utilities.configration import config
from Utilities.resources import *

url = config()['API']['endpoint'] + ApiResources.addBook
header = {"Content-Type": "application/json"}
response = requests.post(url, json=addBookPayload("pl74"), headers=header)
print(response.json())
response_json = response.json()
book_ID = response_json['ID']
Error:
Traceback (most recent call last):
File "C:\Users\Muhammad Azmul Haq\PycharmProjects\BackEndProject\APIs\PostAPI.py", line 8, in <module>
url = config()['API']['endpoint']+ApiResources.addBook
File "C:\Users\Muhammad Azmul Haq\AppData\Local\Programs\Python\Python39\lib\configparser.py", line 960, in __getitem__
raise KeyError(key)
KeyError: 'API'
Does anyone have an idea what I did wrong? Kind regards.
You are not initializing the global variable in config() before accessing it. Either assign the value in the current file,
or
put all configuration in a separate configuration file and import that configuration file.
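A related pitfall: configparser's read() silently ignores files it cannot find, which leaves the parser empty so that ['API'] raises KeyError. A minimal sketch of a config() helper that fails loudly instead, assuming baseapi.ini sits next to the module (the path handling is an assumption, not from the original question):
import configparser
import os

def config():
    # read() does not raise on a missing file, so resolve and check the path explicitly
    path = os.path.join(os.path.dirname(__file__), 'baseapi.ini')
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    parser = configparser.ConfigParser()
    parser.read(path)
    return parser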

How to read an image sent in body of req in falcon

I'm sending a jpg image in the body of a POST request, using Postman to do so:
Reading it with
image_text_similarity.py:
import json

class ImageTextSimilarity():
    def on_post(self, req, resp):
        image_raw = json.loads(req.stream.read())
which errors out with
Traceback (most recent call last):
File "/home/dario/.local/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 134, in handle
self.handle_request(listener, req, client, addr)
File "/home/dario/.local/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 175, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "falcon/api.py", line 274, in falcon.api.API.__call__
File "falcon/api.py", line 269, in falcon.api.API.__call__
File "/home/dario/ImageTextSimilarityApp/image_text_similarity.py", line 95, in on_post
image_raw = json.loads(req.stream.read())
File "/usr/lib/python3.6/json/__init__.py", line 349, in loads
s = s.decode(detect_encoding(s), 'surrogatepass')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
How do we read the image from the body of the POST request?
Rest of the code is
image_similarity_app.py:
import falcon
from image_text_similarity import ImageTextSimilarity
api = application = falcon.API()
api.req_options.auto_parse_form_urlencoded = True
image_text_similarity_object = ImageTextSimilarity()
api.add_route('/image_text_similarity', image_text_similarity_object)
And I start the service with gunicorn image_similarity_app.
I'm not an expert at Postman, but it appears that by choosing binary, you are sending your JPEG image data as the request body (see Postman Chrome: What is the difference between form-data, x-www-form-urlencoded and raw).
In Falcon, you can simply read the request payload as
jpeg_data = req.stream.read()
(Note that on some app servers such as the stdlib's wsgiref.simple_server, you may need to use the safe Request.bounded_stream wrapper.)
See also Falcon's WSGI and ASGI tutorials for inspiration; they use a closely related topic (building an image service) to illustrate the basic concepts of the framework. You'll find examples of how to handle RESTful image resources: upload, convert, store, list, serve, cache, etc.
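Putting that together, a minimal sketch of the handler reading the raw JPEG bytes (the destination path and response shape are illustrative assumptions):
import falcon

class ImageTextSimilarity():
    def on_post(self, req, resp):
        # the body is raw JPEG bytes, not JSON, so read the stream directly
        jpeg_data = req.bounded_stream.read()
        with open('/tmp/upload.jpg', 'wb') as f:  # hypothetical destination
            f.write(jpeg_data)
        resp.media = {'received_bytes': len(jpeg_data)}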

bot.sendAudio and bot.sendPhoto methods in telepot return { 'error code' : 400 , 'Bad Request: wrong HTTP URL specified'}

I am using the
telepot.Bot(bot_id).sendAudio(chat_id, file_url)
method, which is supposed to send the file, but it returns:
Traceback (most recent call last):
File "C:\Users\vinu\AppData\Local\Programs\Python\Python37\lib\site-packages\telepot\__init__.py", line 1158, in collector
callback(item)
File "bot.py", line 72, in handle
bot.sendAudio(chat_id, url)
File "C:\Users\vinu\AppData\Local\Programs\Python\Python37\lib\site-packages\telepot\__init__.py", line 556, in sendAudio
return self._api_request_with_file('sendAudio', _rectify(p), 'audio', audio)
File "C:\Users\vinu\AppData\Local\Programs\Python\Python37\lib\site-packages\telepot\__init__.py", line 496, in _api_request_with_file
return self._api_request(method, _rectify(params), **kwargs)
File "C:\Users\vinu\AppData\Local\Programs\Python\Python37\lib\site-packages\telepot\__init__.py", line 491, in _api_request
return api.request((self._token, method, params, files), **kwargs)
File "C:\Users\vinu\AppData\Local\Programs\Python\Python37\lib\site-packages\telepot\api.py", line 155, in request
return _parse(r)
File "C:\Users\vinu\AppData\Local\Programs\Python\Python37\lib\site-packages\telepot\api.py", line 150, in _parse
raise exception.TelegramError(description, error_code, data)
telepot.exception.TelegramError: ('Bad Request: wrong HTTP URL specified', 400, {'ok': False, 'error_code': 400, 'description': 'Bad Request: wrong HTTP URL specified'})
The same happened with sendPhoto, but there I used Python requests to send photos:
response = requests.post('https://api.telegram.org/bot/sendphoto', files=files)
I want to know either why the sendAudio() and sendPhoto() methods don't work, or the correct HTTP URL for sending audio.
With telepot, bot.sendPhoto, bot.sendVideo and bot.sendAudio work both with files and with URLs that point to a file.
In your case it seems that the URL you used was incorrect; can you share it?
In my experience this can happen because the URL contains &amp; instead of &.
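If the URL was copied out of HTML markup, the escaped ampersands have to be unescaped before handing it to telepot. A small sketch, assuming a hypothetical file_url taken from page source:
import html

file_url = 'http://example.com/track.mp3?id=1&amp;fmt=mp3'  # hypothetical URL copied from HTML
clean_url = html.unescape(file_url)  # -> 'http://example.com/track.mp3?id=1&fmt=mp3'
bot.sendAudio(chat_id, clean_url)  # bot is the telepot.Bot instance from the question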

Catching multiple exceptions - Python

I have a program that occasionally throws a BadStatusLine exception. After catching it, we are now getting another error, and I can't seem to catch it, so the program stops. Here is what I have; any help would be appreciated.
The error:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/Users/mattduhon/trading4.py", line 30, in trade
execution.execute_order(event)
File "/Users/mattduhon/execution.py", line 33, in execute_order
params, headers
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1001, in request
self._send_request(method, url, body, headers)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1029, in _send_request
self.putrequest(method, url, **skips)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 892, in putrequest
raise CannotSendRequest()
CannotSendRequest
The file responsible for catching the error:
import httplib
import urllib
from httplib import BadStatusLine
from httplib import CannotSendRequest

class Execution(object):
    def __init__(self, domain, access_token, account_id):
        self.domain = domain
        self.access_token = access_token
        self.account_id = account_id
        self.conn = self.obtain_connection()

    def obtain_connection(self):
        return httplib.HTTPSConnection(self.domain)

    def execute_order(self, event):
        headers = {
            "Content-Type": "application/x-www-form-urlencoded",
            "Authorization": "Bearer " + self.access_token}
        params = urllib.urlencode({
            "instrument": event.instrument,
            "units": event.units,
            "type": event.order_type,
            "side": event.side,
            "stopLoss": event.stopLoss,
            "takeProfit": event.takeProfit
        })
        self.conn.request(
            "POST",
            "/v1/accounts/%s/orders" % str(self.account_id),
            params, headers)
        try:
            response = self.conn.getresponse().read()
        except BadStatusLine as e:
            print(e)
        except CannotSendRequest as a:  # my attempt at catching the error
            print(a)
        else:
            print response
If you change the final else to:
except:
    print "Unexpected error:", sys.exc_info()[0]
    raise
you should get the real uncaught error, if it's really coming from the try block (remember to import sys). But are you sure you haven't gotten into a bad state that raises outside that block?
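In fact, the traceback shows CannotSendRequest being raised from self.conn.request(), which sits outside the try block, so neither except clause can catch it. A minimal sketch (an addition to the answer above, with the reconnect step as an assumption) that widens the try to cover the request call:
try:
    self.conn.request(
        "POST",
        "/v1/accounts/%s/orders" % str(self.account_id),
        params, headers)
    response = self.conn.getresponse().read()
except (BadStatusLine, CannotSendRequest) as e:
    print(e)
    # the connection is now in a broken state; open a fresh one for the next order
    self.conn = self.obtain_connection()
else:
    print(response)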

Scrapyd with Polipo and Tor

UPDATE: I am now running this command:
scrapyd-deploy <project_name>
And getting this error:
504 Connect to localhost:8123 failed: General SOCKS server failure
I am trying to deploy my Scrapy spider through scrapyd-deploy; the following is the command I use:
scrapyd-deploy -L <project_name>
I get the following error message:
Traceback (most recent call last):
File "/usr/local/bin/scrapyd-deploy", line 269, in <module>
main()
File "/usr/local/bin/scrapyd-deploy", line 74, in main
f = urllib2.urlopen(req)
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 410, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 448, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not found
The following is my scrapy.cfg file:
[settings]
default = <project_name>.settings
[deploy:<project_name>]
url = http://localhost:8123
project = <project_name>
eggs_dir = eggs
logs_dir = logs
items_dir = items
jobs_to_keep = 5
dbs_dir = dbs
max_proc = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5
http_port = 8123
debug = on
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
I am running Tor and Polipo, with the Polipo proxy at http://localhost:8123. I can perform a wget and download that page without any problems. The proxy is working correctly; I can connect to the internet and so on. Please ask if you need more clarification.
Thanks!
urllib2.HTTPError: HTTP Error 404: Not found
The URL is not being reached.
Anything interesting in /var/log/polipo/polipo.log? What comes from tail -100 /var/log/polipo/polipo.log?
Apparently this was because I forgot to run the main command. It is easy to miss because it is mentioned on the Overview page of the documentation, and not the Deployment page. The following is the command:
scrapyd
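For clarity, the expected sequence, assuming default settings, is to start the daemon first and then deploy against it (the target name is illustrative):
scrapyd                        # starts the daemon, listening on http://localhost:6800 by default
scrapyd-deploy <project_name>  # packages the project and uploads the egg to scrapyd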
504 Connect to localhost:8123 failed: General SOCKS server failure
You're asking Polipo to connect to localhost:8123; Polipo passes the request to Tor, which returns a failure result that is dutifully relayed by Polipo ("General SOCKS server failure").
url = http://localhost:8123
This is certainly not what you meant.
http_port = 8123
I'm also pretty sure you didn't want to run scrapyd on the same port as Polipo.
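For reference, a minimal corrected scrapy.cfg sketch, assuming scrapyd listens on its default port 6800 while Polipo keeps 8123; the daemon options from the question (settings such as http_port and eggs_dir, and the [services] entries) belong in scrapyd's own configuration file, not in scrapy.cfg:
[settings]
default = <project_name>.settings

[deploy:<project_name>]
url = http://localhost:6800/
project = <project_name>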