Deploy Scrapy project to remote Scrapyd service error

I tried to deploy a test Scrapy project to a remote Scrapyd server and got the following error message on the client side:
curl http://IP:6800/addversion.json -d project=test_project -d spider=quotes
{"status": "error", "message": "'version'", "node_name": "serverName"}
Error message on the server side:
2018-11-13T12:22:22+0000 [_GenericHTTPChannelProtocol,0,IP Address] Unhandled Error
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/twisted/web/http.py", line 2190, in allContentReceived
req.requestReceived(command, path, version)
File "/usr/lib64/python2.7/site-packages/twisted/web/http.py", line 917, in requestReceived
self.process()
File "/usr/lib64/python2.7/site-packages/twisted/web/server.py", line 199, in process
self.render(resrc)
File "/usr/lib64/python2.7/site-packages/twisted/web/server.py", line 259, in render
body = resrc.render(self)
--- <exception caught here> ---
File "/usr/lib/python2.7/site-packages/scrapyd/webservice.py", line 21, in render
return JsonResource.render(self, txrequest).encode('utf-8')
File "/usr/lib/python2.7/site-packages/scrapyd/utils.py", line 20, in render
r = resource.Resource.render(self, txrequest)
File "/usr/lib64/python2.7/site-packages/twisted/web/resource.py", line 250, in render
return m(request)
File "/usr/lib/python2.7/site-packages/scrapyd/webservice.py", line 83, in render_POST
version = txrequest.args[b'version'][0].decode('utf-8')
exceptions.KeyError: 'version'
I checked both the client and server sides; the Scrapy version is 1.5.1 on both, and the Python version is 2.7.*.

The sample curl command you showed earlier is not supposed to work. According to the documentation, you'll also need:
A version argument, whose absence is the likely cause of the issue you're seeing now.
An egg argument containing the actual project code; otherwise Scrapyd has nothing to deploy when you pass in only the project and spider names (see the sketch below).
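For reference, Scrapyd's addversion.json takes project, version, and egg (there is no spider argument). A minimal sketch in Python, with hypothetical host, version string, and egg filename:

import requests

# Sketch: upload a project egg to Scrapyd's addversion.json endpoint.
# Build the egg first, e.g. with scrapyd-client's
# "scrapyd-deploy --build-egg test_project.egg".
with open('test_project.egg', 'rb') as egg:
    response = requests.post(
        'http://IP:6800/addversion.json',
        data={'project': 'test_project', 'version': 'r1'},
        files={'egg': egg},
    )
print(response.json())  # expect {"status": "ok", ...} on success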


How to fix "AttributeError: 'module' object has no attribute 'SOL_UDP'" error in Python Connector Mule

I'm trying to execute a basic script that returns a Cisco config file in JSON format. It runs successfully under Python 2.7.16 and Python 3.7.3, but when I execute the same script through the Python Connector for Mule ESB I receive the error referred to in the title of this thread.
This is a Mule feature: the Python connector in this tool runs scripts under Jython 2.7.1, which is loaded as a library by Mule.
I expect the output as a JSON file, but the actual output is:
Root Exception stack trace:
Traceback (most recent call last):
File "<script>", line 2, in <module>
File "C:\Python27\Lib\site-packages\ciscoconfparse\__init__.py", line 1, in <module>
from ciscoconfparse import *
File "C:\Python27\Lib\site-packages\ciscoconfparse\ciscoconfparse.py", line 17, in <module>
from models_cisco import IOSHostnameLine, IOSRouteLine, IOSIntfLine
File "C:\Python27\Lib\site-packages\ciscoconfparse\models_cisco.py", line 8, in <module>
from ccp_util import _IPV6_REGEX_STR_COMPRESSED1, _IPV6_REGEX_STR_COMPRESSED2
File "C:\Python27\Lib\site-packages\ciscoconfparse\ccp_util.py", line 16, in <module>
from dns.resolver import Resolver
File "C:\Python27\Lib\site-packages\dns\resolver.py", line 1148, in <module>
_protocols_for_socktype = {
AttributeError: 'module' object has no attribute 'SOL_UDP'
The only thing I had to do was comment out that line in resolver.py, and with that the script ran smoothly in Anypoint Studio.
Thanks for your help; I hope this helps other people.
The problem appears to be that you are trying to execute a script that depends on another Python package. Mule supports executing Python scripts using the Java-based Jython implementation, but it probably doesn't know about Python package dependencies.
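If editing resolver.py by hand feels too invasive, an alternative (untested under Jython, offered only as a sketch) is to define the missing constant before anything imports dns.resolver:

import socket

# Workaround sketch: Jython's socket module lacks SOL_UDP, which
# dns.resolver references at import time. The value 17 is the IP
# protocol number for UDP, matching CPython's socket.SOL_UDP.
if not hasattr(socket, 'SOL_UDP'):
    socket.SOL_UDP = 17

from ciscoconfparse import CiscoConfParse  # the dns import chain should now succeed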

Cuckoo sandbox, api error after installation

I'm investigating the possibility of using Cuckoo Sandbox as a malware detonator in series with Cortex.
I've (seemingly) installed all of the dependencies, enabled reporting and Elasticsearch in the config files, and started the web server with the command below without issues.
sudo cuckoo web runserver [ip redacted]:[port]
I am able to connect to my web instance without errors on the browser side, but in stdout I get the following:
2018-07-06 05:32:19,152 [django.request] ERROR: Internal Server Error: /cuckoo/api/status
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 132, in get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/local/lib/python2.7/dist-packages/cuckoo/web/utils.py", line 55, in inner
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/http.py", line 45, in inner
return func(request, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/cuckoo/web/controllers/cuckoo/api.py", line 45, in status
temp_file = Files.temp_put("")
File "/usr/local/lib/python2.7/dist-packages/cuckoo/common/files.py", line 97, in temp_put
prefix="upload_", dir=path or temppath()
File "/usr/lib/python2.7/tempfile.py", line 314, in mkstemp
return _mkstemp_inner(dir, prefix, suffix, flags)
File "/usr/lib/python2.7/tempfile.py", line 244, in _mkstemp_inner
fd = _os.open(file, flags, 0600)
OSError: [Errno 2] No such file or directory: '/tmp/cuckoo-tmp-root/upload_IUQt4r'
[06/Jul/2018 05:32:19] "POST /analysis/api/tasks/recent/ HTTP/1.1" 200 13
[06/Jul/2018 05:32:19] "GET /cuckoo/api/status HTTP/1.1" 500 12976
In addition, I can neither upload a file nor submit a URL; both attempts result in exactly the same error.
Does anyone with experience setting up Cuckoo have a hint? I'm not sure whether this is a dependency issue or a post-installation configuration issue.
Thanks in advance!
I had the same problem. Mine was due to the fact that my virtual environment's root did not include the default "/tmp/" folder that Cuckoo tries to use as its default temp-file path in files.py. Yours could be related to the directory structure under "~" changing when you sudo to run the server.
Either way, the fix was to change the tmppath setting in cuckoo.conf from blank to an explicit directory with no permission issues (e.g. "/tmp/"), as sketched below.
Once I updated this, the error stopped and my Cuckoo API ran properly.
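For reference, the setting lives in the [cuckoo] section of cuckoo.conf; the change described above looks roughly like this:

[cuckoo]
# tmppath defaults to blank; point it at any directory the user
# running Cuckoo can write to.
tmppath = /tmp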

UnsupportedMethod error when deploying a Scrapy project on EC2

I was trying to deploy my Scrapy code to AWS using Scrapyd, but I ran into an issue I could not figure out, and it has been two days. I saw similar problems on the web but did not find any helpful solution.
2016-02-15 08:41:20+0000 [HTTPChannel,1,xx.xxx.x.xxx] Unhandled Error
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 1730, in allContentReceived
req.requestReceived(command, path, version)
File "/usr/lib/python2.7/dist-packages/twisted/web/http.py", line 826, in requestReceived
self.process()
File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 189, in process
self.render(resrc)
File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 238, in render
body = resrc.render(self)
--- <exception caught here> ---
File "/usr/local/lib/python2.7/dist-packages/scrapyd/webservice.py", line 17, in render
return JsonResource.render(self, txrequest)
File "/usr/local/lib/python2.7/dist-packages/scrapyd/utils.py", line 19, in render
r = resource.Resource.render(self, txrequest)
File "/usr/lib/python2.7/dist-packages/twisted/web/resource.py", line 249, in render
raise UnsupportedMethod(allowedMethods)
twisted.web.error.UnsupportedMethod: ['HEAD', 'object', 'POST']
I have tried running the Scrapy code on its own on both my MacBook and the EC2 server, and it works in both cases. It just doesn't work when I use my MacBook to schedule a job on EC2.
These are the steps I followed to set things up.
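One detail worth noting: Scrapyd's schedule.json endpoint only accepts POST, so opening it with a plain GET (e.g. in a browser) raises exactly this UnsupportedMethod error. A minimal scheduling call from the MacBook might look like this sketch (host, project, and spider names are hypothetical):

import requests

# Sketch: schedule a spider run on the remote Scrapyd instance via POST.
response = requests.post(
    'http://ec2-host.compute.amazonaws.com:6800/schedule.json',
    data={'project': 'myproject', 'spider': 'myspider'},
)
print(response.json())  # expect {"status": "ok", "jobid": "..."}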

Scrapyd error when trying to schedule a job

When I try to schedule a job after deploying a project, I get the following error:
Traceback (most recent call last):
File "/usr/lib/pymodules/python2.7/scrapyd/webservice.py", line 18, in render
return JsonResource.render(self, txrequest)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/txweb.py", line 10, in render
r = resource.Resource.render(self, txrequest)
File "/usr/local/lib/python2.7/dist-packages/twisted/web/resource.py", line 250, in render
return m(request)
File "/usr/lib/pymodules/python2.7/scrapyd/webservice.py", line 37, in render_POST
self.root.scheduler.schedule(project, spider, **args)
File "/usr/lib/pymodules/python2.7/scrapyd/scheduler.py", line 16, in schedule
q.add(spider_name, **spider_args)
File "/usr/lib/pymodules/python2.7/scrapyd/spiderqueue.py", line 18, in add
self.q.put(d, priority)
File "/usr/lib/pymodules/python2.7/scrapyd/sqlite.py", line 103, in put
self.conn.execute(q, args)
OperationalError: attempt to write a readonly database
If I restart scrapyd after deploying the project and then schedule a job, it works fine with no problems. But honestly I do not see the point of restarting scrapyd every time I deploy... it does not make sense.
I have checked the DB folder and there is a crawling_project.db file with root:root ownership. Could this be causing the issue?
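It could be: SQLite needs write access to both the .db file and its containing directory, so root:root ownership would block whichever user scrapyd runs as. A quick diagnostic sketch (the paths are assumptions; substitute your actual dbs_dir):

import os

# Sketch: check whether the current user (the one running scrapyd)
# can write both the SQLite file and its directory.
db_dir = '/var/lib/scrapyd/dbs'  # hypothetical; match your dbs_dir setting
db_file = os.path.join(db_dir, 'crawling_project.db')
for path in (db_dir, db_file):
    print(path, 'writable:', os.access(path, os.W_OK))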

Upgrade Plone 3.3.6 to Plone 4.0.7 file error

I tried to migrate Plone 3.3.6 to the newer Plone 4.0.7 (and then to 4.3.x), but I ran into multiple errors:
Full traceback
2013-10-07 13:51:33 INFO ProgressHandler Process started (1842 objects to go)
2013-10-07 13:51:33 ERROR plone.app.upgrade Upgrade aborted. Error:
Traceback (most recent call last):
File "/Users/iie/Projects/plone4.0/rwa/eggs/Plone-4.0.7-py2.6.egg/Products/CMFPlone/MigrationTool.py", line 175, in upgrade
step['step'].doStep(setup)
File "/Users/iie/Projects/plone4.0/rwa/eggs/Products.GenericSetup-1.6.3-py2.6.egg/Products/GenericSetup/upgrade.py", line 142, in doStep
self.handler(tool)
File "/Users/iie/Projects/plone4.0/rwa/eggs/plone.app.upgrade-1.0.7-py2.6.egg/plone/app/upgrade/v40/betas.py", line 117, in updateIconMetadata
obj = brain.getObject()
File "/Users/iie/Projects/plone4.0/rwa/eggs/Zope2-2.12.18-py2.6-macosx-10.7-x86_64.egg/Products/ZCatalog/CatalogBrains.py", line 92, in getObject
target = parent.restrictedTraverse(path[-1])
File "/Users/iie/Projects/plone4.0/rwa/eggs/Zope2-2.12.18-py2.6-macosx-10.7-x86_64.egg/OFS/Traversable.py", line 310, in restrictedTraverse
return self.unrestrictedTraverse(path, default, restricted=True)
File "/Users/iie/Projects/plone4.0/rwa/eggs/Zope2-2.12.18-py2.6-macosx-10.7-x86_64.egg/OFS/Traversable.py", line 278, in unrestrictedTraverse
raise e
AttributeError: pa_20120810.pdf
If I delete "pa_20120810.pdf", another file throws an error, and so on...
I hope you understand me and that someone can help me.
Thanks
Something to try: before migrating, use collective.catalogcleanup to remove broken references from your catalog. It's easy to use: add it to your buildout, restart the site, and go to /@@collective-catalogcleanup?dry_run=false in your browser.
As collective.catalogcleanup's documentation states:
The goal is to get rid of outdated brains that could otherwise cause problems, for example during an upgrade to Plone 4.
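A minimal buildout addition might look like this sketch (the part name "instance" is an assumption; use whichever part defines your Zope instance's eggs):

# Sketch: register the add-on so its @@collective-catalogcleanup view
# becomes available after re-running buildout and restarting the site.
[instance]
eggs +=
    collective.catalogcleanup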