I was trying to deploy my Scrapy code to AWS using scrapyd, but I ran into an issue I could not figure out, and I have been stuck on it for two days. I saw similar problems on the web, but did not find any helpful solution.
2016-02-15 08:41:20+0000 [HTTPChannel,1,xx.xxx.x.xxx] Unhandled Error
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 1730, in allContentReceived
req.requestReceived(command, path, version)
File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 826, in requestReceived
self.process()
File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 189, in process
self.render(resrc)
File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 238, in render
body = resrc.render(self)
--- <exception caught here> ---
File "/usr/local/lib/python2.7/dist-packages/scrapyd/webservice.py", line 17, in render
return JsonResource.render(self, txrequest)
File "/usr/local/lib/python2.7/dist-packages/scrapyd/utils.py", line 19, in render
r = resource.Resource.render(self, txrequest)
File "/usr/lib/python2.7/dist-packages/twisted/web/resource.py", line 249, in render
raise UnsupportedMethod(allowedMethods)
twisted.web.error.UnsupportedMethod: ['HEAD', 'object', 'POST']
I have tried running the Scrapy code on its own, both on my MacBook and on the EC2 server, and it works in both cases. It only fails when I use my MacBook to schedule a job on EC2.
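For reference, the UnsupportedMethod error above is what Twisted raises when a resource is requested with an HTTP method it does not accept, and Scrapyd's schedule.json endpoint only accepts POST. A minimal sketch of what a scheduling request normally looks like (the server address, project name and spider name are placeholders):

# POST to schedule.json; a plain GET on this endpoint triggers UnsupportedMethod
curl http://EC2-PUBLIC-IP:6800/schedule.json -d project=myproject -d spider=myspider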
These are the steps I followed to set things up.
Related
Since my site is more of a demonstration, I haven't used it in a couple of months. When I came back to the site, I found that I wasn't able to access it securely. So I logged into the Linux server (Ubuntu 20.04) and tried the certbot and letsencrypt commands to renew. This is the output I got:
Original exception was:
Traceback (most recent call last):
File "/usr/bin/letsencrypt", line 11, in <module>
load_entry_point('certbot==0.40.0', 'console_scripts', 'certbot')()
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 490, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2854, in load_entry_point
return ep.load()
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2445, in load
return self.resolve()
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2451, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/usr/lib/python3/dist-packages/certbot/main.py", line 17, in <module>
from certbot import account
File "/usr/lib/python3/dist-packages/certbot/account.py", line 17, in <module>
from acme import messages
File "/usr/lib/python3/dist-packages/acme/messages.py", line 7, in <module>
from acme import challenges
File "/usr/lib/python3/dist-packages/acme/challenges.py", line 9, in <module>
import requests
ModuleNotFoundError: No module named 'requests'
I don't know what could have happened since I last accessed the site, because I am sure that I did not change anything in that period.
Sure do appreciate any help.
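The traceback shows that the system Python 3 can no longer import requests, which certbot's acme package depends on. A minimal diagnostic sketch, assuming the Ubuntu-packaged certbot is in use (the package name below is the standard Ubuntu one, but treat it as an assumption for your setup):

# check whether the system Python 3 can import requests at all
python3 -c "import requests; print(requests.__version__)"
# if that fails, reinstalling the distribution package is one way to restore it
sudo apt install --reinstall python3-requests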
I tried to deploy a test Scrapy project to the remote Scrapyd server and got the following error message on the client side.
curl http://IP:6800/addversion.json -d project=test_project -d spider=quotes
{"status": "error", "message": "'version'", "node_name": "serverName"}
Error message on the server side:
2018-11-13T12:22:22+0000 [_GenericHTTPChannelProtocol,0,IP Address] Unhandled Error
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/twisted/web/http.py", line 2190, in allContentReceived
req.requestReceived(command, path, version)
File "/usr/lib64/python2.7/site-packages/twisted/web/http.py", line 917, in requestReceived
self.process()
File "/usr/lib64/python2.7/site-packages/twisted/web/server.py", line 199, in process
self.render(resrc)
File "/usr/lib64/python2.7/site-packages/twisted/web/server.py", line 259, in render
body = resrc.render(self)
--- <exception caught here> ---
File "/usr/lib/python2.7/site-packages/scrapyd/webservice.py", line 21, in render
return JsonResource.render(self, txrequest).encode('utf-8')
File "/usr/lib/python2.7/site-packages/scrapyd/utils.py", line 20, in render
r = resource.Resource.render(self, txrequest)
File "/usr/lib64/python2.7/site-packages/twisted/web/resource.py", line 250, in render
return m(request)
File "/usr/lib/python2.7/site-packages/scrapyd/webservice.py", line 83, in render_POST
version = txrequest.args[b'version'][0].decode('utf-8')
exceptions.KeyError: 'version'
I checked both the client and server sides; the Scrapy version is 1.5.1 on both, and the Python version is 2.7.*.
The sample curl command you showed earlier is not supposed to work. According to the documentation, you also need:
A version argument, whose absence is what causes the KeyError: 'version' you are seeing now.
An egg argument containing the actual project code; otherwise scrapyd has nothing to receive when you pass in only the project name and spider name.
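A sketch of a complete addversion.json call following the Scrapyd documentation; the project name, version string and egg filename are placeholders, and the egg is the one produced by scrapyd-client or setup.py bdist_egg:

curl http://IP:6800/addversion.json -F project=test_project -F version=1.0 -F egg=@test_project.egg

In practice it is usually easier to let scrapyd-client's scrapyd-deploy command build the egg and issue this request for you.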
When I try to schedule a job after I have deployed a project, I get the following error:
Traceback (most recent call last):
File "/usr/lib/pymodules/python2.7/scrapyd/webservice.py", line 18, in render
return JsonResource.render(self, txrequest)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/txweb.py", line 10, in render
r = resource.Resource.render(self, txrequest)
File "/usr/local/lib/python2.7/dist-packages/twisted/web/resource.py", line 250, in render
return m(request)
File "/usr/lib/pymodules/python2.7/scrapyd/webservice.py", line 37, in render_POST
self.root.scheduler.schedule(project, spider, **args)
File "/usr/lib/pymodules/python2.7/scrapyd/scheduler.py", line 16, in schedule
q.add(spider_name, **spider_args)
File "/usr/lib/pymodules/python2.7/scrapyd/spiderqueue.py", line 18, in add
self.q.put(d, priority)
File "/usr/lib/pymodules/python2.7/scrapyd/sqlite.py", line 103, in put
self.conn.execute(q, args)
OperationalError: attempt to write a readonly database
If I restart scrapyd after deploying the project and then schedule a job, there are no problems and it works fine. But to be honest, I do not see the point of restarting scrapyd every time I deploy... it does not make sense.
I have checked the DB folder and there is a crawling_project.db file with root:root ownership. Could this be causing the issue?
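If it is a permissions problem, one way to confirm and fix it could look like the sketch below. The dbs path and the user scrapyd runs as (assumed here to be scrapyd) depend on your installation, so adjust both before running anything:

# see who owns the spider queue databases (path is an assumption; check dbs_dir in scrapyd.conf)
ls -l /var/lib/scrapyd/dbs
# hand ownership to the account the scrapyd daemon actually runs under
sudo chown -R scrapyd:scrapyd /var/lib/scrapyd/dbs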
I tried to migrate Plone 3.3.6 to a newer Plone 4.0.7 version (and then to 4.3.x), but I ran into multiple errors:
Full traceback
2013-10-07 13:51:33 INFO ProgressHandler Process started (1842 objects to go)
2013-10-07 13:51:33 ERROR plone.app.upgrade Upgrade aborted. Error:
Traceback (most recent call last):
File "/Users/iie/Projects/plone4.0/rwa/eggs/Plone-4.0.7-py2.6.egg/Products/CMFPlone/MigrationTool.py", line 175, in upgrade
step['step'].doStep(setup)
File "/Users/iie/Projects/plone4.0/rwa/eggs/Products.GenericSetup-1.6.3-py2.6.egg/Products/GenericSetup/upgrade.py", line 142, in doStep
self.handler(tool)
File "/Users/iie/Projects/plone4.0/rwa/eggs/plone.app.upgrade-1.0.7-py2.6.egg/plone/app/upgrade/v40/betas.py", line 117, in updateIconMetadata
obj = brain.getObject()
File "/Users/iie/Projects/plone4.0/rwa/eggs/Zope2-2.12.18-py2.6-macosx-10.7-x86_64.egg/Products/ZCatalog/CatalogBrains.py", line 92, in getObject
target = parent.restrictedTraverse(path[-1])
File "/Users/iie/Projects/plone4.0/rwa/eggs/Zope2-2.12.18-py2.6-macosx-10.7-x86_64.egg/OFS/Traversable.py", line 310, in restrictedTraverse
return self.unrestrictedTraverse(path, default, restricted=True)
File "/Users/iie/Projects/plone4.0/rwa/eggs/Zope2-2.12.18-py2.6-macosx-10.7-x86_64.egg/OFS/Traversable.py", line 278, in unrestrictedTraverse
raise e
AttributeError: pa_20120810.pdf
If I delete "pa_20120810.pdf", another file throws an error, and so on...
I hope you understand my problem and that someone can help me.
Thanks
Something to try: before migrating, use collective.catalogcleanup to remove broken references from your catalog. It's easy to use: add it to your buildout, restart the site, and go to /@@collective-catalogcleanup?dry_run=false in your browser.
As collective.catalogcleanup's documentation states:
The goal is to get rid of outdated brains that could otherwise cause problems, for example during an upgrade to Plone 4.
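A minimal sketch of the buildout change, assuming a conventional buildout.cfg with an [instance] part (the part name is an assumption; use whichever part lists your instance eggs):

[instance]
eggs +=
    collective.catalogcleanup

After re-running bin/buildout and restarting the instance, the /@@collective-catalogcleanup?dry_run=false view mentioned above becomes available.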
I followed instructions for indexing PDFs in Plone 4 (on Windows 2008) that were originally written for Plone 3:
http://plone.org/documentation/kb/enable-indexing-of-pdf-and-word-docs-with-windows-in-five-steps-occurs-three-minutes-without-problems
I got an error on the fifth step, "Add Transform; Enter in ID: pdf_to_text", when I tried to add the module Products.PortalTransforms.transforms.pdf_to_text.
Here is the report:
Traceback (innermost last):
Module ZPublisher.Publish, line 127, in publish
Module ZPublisher.mapply, line 77, in mapply
Module ZPublisher.Publish, line 47, in call_object
Module Products.PortalTransforms.TransformEngine, line 487, in manage_addTransform
Module Products.PortalTransforms.TransformEngine, line 254, in _mapTransform
Module Products.MimetypesRegistry.MimeTypesRegistry, line 220, in lookup
- __traceback_info__: ("'BROKEN'", 'BROKEN')
Module Products.MimetypesRegistry.MimeTypesRegistry, line 457, in split
MimeTypeException: Malformed MIME type (BROKEN)
Well, that is very old.
On Windows, Word indexing "Just Works" in Plone 4 if you have MS Office installed. PDF still needs xpdf.
Is pdftotext.exe on your PATH? It won't work if it isn't. Remember that if you have added it to the system-wide environment, you still have to stop and start Zope so the Zope process picks up the changed PATH: run Zope in the foreground, from a command window that you know has pdftotext on the path, and see what happens.
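A quick way to check both points from a Windows command window, assuming a standard buildout layout (the bin\instance script name is an assumption; yours may differ):

:: confirm pdftotext is reachable from this window's PATH
where pdftotext
:: then start Zope in the foreground from the same window so it inherits that PATH
bin\instance fg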