UnsupportedMethod error when deploying a Scrapy project on EC2

I was trying to deploy my Scrapy code to AWS using scrapyd, but I ran into an issue I could not figure out; I have been stuck on it for two days. I saw similar problems on the web, but none of the solutions I found fixed this issue.
2016-02-15 08:41:20+0000 [HTTPChannel,1,xx.xxx.x.xxx] Unhandled Error
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 1730, in allContentReceived
req.requestReceived(command, path, version)
File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 826, in requestReceived
self.process()
File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 189, in process
self.render(resrc)
File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 238, in render
body = resrc.render(self)
--- <exception caught here> ---
File "/usr/local/lib/python2.7/dist-packages/scrapyd/webservice.py", line 17, in render
return JsonResource.render(self, txrequest)
File "/usr/local/lib/python2.7/dist-packages/scrapyd/utils.py", line 19, in render
r = resource.Resource.render(self, txrequest)
File "/usr/lib/python2.7/dist-packages/twisted/web/resource.py", line 249, in render
raise UnsupportedMethod(allowedMethods)
twisted.web.error.UnsupportedMethod: ['HEAD', 'object', 'POST']
I have tried running the Scrapy code on its own, both on my MacBook and on the EC2 server, and it works in both cases. It only fails when I use my MacBook to schedule a job on EC2.
These are the steps I followed to set things up.
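In case it helps others hitting the same traceback: an UnsupportedMethod error whose allowedMethods include 'POST' but not 'GET' typically means a Scrapyd webservice endpoint was called with the wrong HTTP method, for example by opening schedule.json in a browser, which issues a GET. For reference, schedule.json accepts POST, as in the sketch below (the host, project, and spider names are placeholders):
```
# schedule.json only accepts POST; a GET raises UnsupportedMethod
curl http://your-ec2-host:6800/schedule.json \
    -d project=myproject \
    -d spider=myspider
```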

Related

Why Did Letsencrypt Stop Renewing?

Since my site is more of a demonstration, I hadn't used it in a couple of months. When I came back to the site, I found that I wasn't able to access it securely. So I logged into Linux (Ubuntu 20.04) and tried the certbot and letsencrypt commands to renew. This is the output that I got:
Original exception was:
Traceback (most recent call last):
File "/usr/bin/letsencrypt", line 11, in <module>
load_entry_point('certbot==0.40.0', 'console_scripts', 'certbot')()
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 490, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2854, in load_entry_point
return ep.load()
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2445, in load
return self.resolve()
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2451, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/usr/lib/python3/dist-packages/certbot/main.py", line 17, in <module>
from certbot import account
File "/usr/lib/python3/dist-packages/certbot/account.py", line 17, in <module>
from acme import messages
File "/usr/lib/python3/dist-packages/acme/messages.py", line 7, in <module>
from acme import challenges
File "/usr/lib/python3/dist-packages/acme/challenges.py", line 9, in <module>
import requests
ModuleNotFoundError: No module named 'requests'
I don't know what could have happened since I last accessed the site, because I am sure that I did not change anything in that period.
Sure do appreciate any help.
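Something to try: certbot 0.40.0 on Ubuntu 20.04 is the apt-packaged version, and the traceback shows its acme module importing the Debian-packaged requests library from /usr/lib/python3/dist-packages. If that package was removed (for example during an automatic upgrade or a cleanup), reinstalling it usually restores the import. A minimal sketch, assuming Ubuntu 20.04's stock repositories:
```
# Reinstall the packaged 'requests' module that the system certbot imports
sudo apt update
sudo apt install --reinstall python3-requests

# Then retry the renewal without touching real certificates
sudo certbot renew --dry-run
```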

Deploy Scrapy project to remote Scrapyd service error

I tried to deploy a test Scrapy project to a remote Scrapyd server and got the following error message on the client side:
curl http://IP:6800/addversion.json -d project=test_project -d spider=quotes
{"status": "error", "message": "'version'", "node_name": "serverName"}
Error message on the server side:
2018-11-13T12:22:22+0000 [_GenericHTTPChannelProtocol,0,IP Address] Unhandled Error
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/twisted/web/http.py", line 2190, in allContentReceived
req.requestReceived(command, path, version)
File "/usr/lib64/python2.7/site-packages/twisted/web/http.py", line 917, in requestReceived
self.process()
File "/usr/lib64/python2.7/site-packages/twisted/web/server.py", line 199, in process
self.render(resrc)
File "/usr/lib64/python2.7/site-packages/twisted/web/server.py", line 259, in render
body = resrc.render(self)
--- <exception caught here> ---
File "/usr/lib/python2.7/site-packages/scrapyd/webservice.py", line 21, in render
return JsonResource.render(self, txrequest).encode('utf-8')
File "/usr/lib/python2.7/site-packages/scrapyd/utils.py", line 20, in render
r = resource.Resource.render(self, txrequest)
File "/usr/lib64/python2.7/site-packages/twisted/web/resource.py", line 250, in render
return m(request)
File "/usr/lib/python2.7/site-packages/scrapyd/webservice.py", line 83, in render_POST
version = txrequest.args[b'version'][0].decode('utf-8')
exceptions.KeyError: 'version'
I checked both the client and the server: the Scrapy version is 1.5.1 on both, and the Python version is 2.7.*.
The sample curl command you showed is not supposed to work. According to the documentation, you also need:
A version argument, whose absence is what causes the KeyError you are seeing now.
An egg argument containing the actual project code; otherwise scrapyd has nothing to receive when you pass in only the project name and spider name. A corrected command is sketched below.
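Per the Scrapyd documentation, a working call looks like this; build the egg first (scrapyd-client's scrapyd-deploy command packages and uploads it for you). The version string and egg path here are placeholders:
```
# addversion.json needs project, version, and the packaged egg
curl http://IP:6800/addversion.json \
    -F project=test_project \
    -F version=r1 \
    -F egg=@test_project.egg
```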

Scrapyd error when trying to schedule a job

When I try to schedule a job after deploying a project, I get the following error:
Traceback (most recent call last):
File "/usr/lib/pymodules/python2.7/scrapyd/webservice.py", line 18, in render
return JsonResource.render(self, txrequest)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/txweb.py", line 10, in render
r = resource.Resource.render(self, txrequest)
File "/usr/local/lib/python2.7/dist-packages/twisted/web/resource.py", line 250, in render
return m(request)
File "/usr/lib/pymodules/python2.7/scrapyd/webservice.py", line 37, in render_POST
self.root.scheduler.schedule(project, spider, **args)
File "/usr/lib/pymodules/python2.7/scrapyd/scheduler.py", line 16, in schedule
q.add(spider_name, **spider_args)
File "/usr/lib/pymodules/python2.7/scrapyd/spiderqueue.py", line 18, in add
self.q.put(d, priority)
File "/usr/lib/pymodules/python2.7/scrapyd/sqlite.py", line 103, in put
self.conn.execute(q, args)
OperationalError: attempt to write a readonly database
If I restart scrapyd after deploying the project and then schedule a job, it works fine with no problems. But honestly, I do not see the point of restarting scrapyd every time I deploy; it does not make sense.
I have checked the DB folder and there is a crawling_project.db file with root:root ownership. Could this be causing the issue?
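Something to try, assuming the root:root ownership really is the cause of the readonly error: give the account that scrapyd runs as ownership of the dbs directory and the .db files in it. If you deploy with sudo, the .db file may be recreated as root on each deploy, which would explain why it breaks again after every deployment. The user name and path below are placeholders for your setup:
```
# Find out which user the scrapyd process runs as
ps -o user= -C scrapyd

# Give that user ownership of scrapyd's database directory
sudo chown -R scrapyd-user:scrapyd-user /path/to/scrapyd/dbs
```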

Upgrade Plone 3.3.6 to Plone 4.0.7 file error

I tried to migrate Plone 3.3.6 to a newer Plone 4.0.7 version (and then to 4.3.x), but I ran into multiple errors.
Full traceback:
2013-10-07 13:51:33 INFO ProgressHandler Process started (1842 objects to go)
2013-10-07 13:51:33 ERROR plone.app.upgrade Upgrade aborted. Error:
Traceback (most recent call last):
File "/Users/iie/Projects/plone4.0/rwa/eggs/Plone-4.0.7-py2.6.egg/Products/CMFPlone/MigrationTool.py", line 175, in upgrade
step['step'].doStep(setup)
File "/Users/iie/Projects/plone4.0/rwa/eggs/Products.GenericSetup-1.6.3-py2.6.egg/Products/GenericSetup/upgrade.py", line 142, in doStep
self.handler(tool)
File "/Users/iie/Projects/plone4.0/rwa/eggs/plone.app.upgrade-1.0.7-py2.6.egg/plone/app/upgrade/v40/betas.py", line 117, in updateIconMetadata
obj = brain.getObject()
File "/Users/iie/Projects/plone4.0/rwa/eggs/Zope2-2.12.18-py2.6-macosx-10.7-x86_64.egg/Products/ZCatalog/CatalogBrains.py", line 92, in getObject
target = parent.restrictedTraverse(path[-1])
File "/Users/iie/Projects/plone4.0/rwa/eggs/Zope2-2.12.18-py2.6-macosx-10.7-x86_64.egg/OFS/Traversable.py", line 310, in restrictedTraverse
return self.unrestrictedTraverse(path, default, restricted=True)
File "/Users/iie/Projects/plone4.0/rwa/eggs/Zope2-2.12.18-py2.6-macosx-10.7-x86_64.egg/OFS/Traversable.py", line 278, in unrestrictedTraverse
raise e
AttributeError: pa_20120810.pdf
If I delete "pa_20120810.pdf", another file throws an error, and so on...
I hope you understand me and that someone can help me.
Thanks
Something to try: before migration, use collective.catalogcleanup to remove broken references from your catalog. It's easy to use: add it to your buildout, restart the site, and go to /@@collective-catalogcleanup?dry_run=false in your browser.
As collective.catalogcleanup's documentation states:
The goal is to get rid of outdated brains that could otherwise cause problems, for example during an upgrade to Plone 4.
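Installation is the usual Plone add-on routine; a minimal sketch of the buildout change, assuming a standard buildout.cfg with an [instance] part using the stock Plone 4 recipe:
```
[instance]
recipe = plone.recipe.zope2instance
eggs =
    Plone
    collective.catalogcleanup
```
Then re-run bin/buildout and restart the instance before calling the cleanup view on the site root.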

Indexing of PDF and Word docs with Plone 4 and Windows

I followed instructions for indexing PDFs with Plone 4 (on Windows 2008) that were originally written for Plone 3:
http://plone.org/documentation/kb/enable-indexing-of-pdf-and-word-docs-with-windows-in-five-steps-occurs-three-minutes-without-problems
I got an error on the fifth step ("Add Transform; Enter in ID: pdf_to_text") when I tried to add the module Products.PortalTransforms.transforms.pdf_to_text.
Here is the report:
Traceback (innermost last):
Module ZPublisher.Publish, line 127, in publish
Module ZPublisher.mapply, line 77, in mapply
Module ZPublisher.Publish, line 47, in call_object
Module Products.PortalTransforms.TransformEngine, line 487, in manage_addTransform
Module Products.PortalTransforms.TransformEngine, line 254, in _mapTransform
Module Products.MimetypesRegistry.MimeTypesRegistry, line 220, in lookup
- __traceback_info__: ("'BROKEN'", 'BROKEN')
Module Products.MimetypesRegistry.MimeTypesRegistry, line 457, in split
MimeTypeException: Malformed MIME type (BROKEN)
Well, that is very old.
On Windows, Word indexing "just works" in Plone 4 if you have MS Office installed. PDF indexing still needs xpdf.
Is pdftotext.exe on your PATH? It won't work if it isn't. Remember that even if you have added it to the system-wide environment, you still have to stop and start Zope for the Zope process to pick up the changed PATH. Run Zope in the foreground from a command window that you know has pdftotext on the PATH, and see what happens; a quick check is sketched below.
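A minimal sketch of that check, assuming a standard Windows buildout layout where the instance script lives at bin\instance (the install path is illustrative):
```
rem Check whether pdftotext.exe resolves from this command window
where pdftotext

rem If it prints a path, start Zope in the foreground from this same
rem window so the process inherits this PATH, then retry the transform
cd C:\Plone\zinstance
bin\instance fg
```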