How to disable or change the path of ghostdriver.log? - scrapy

Question is straightfoward, but some context may help.
I'm trying to deploy scrapy while using selenium and phantomjs as downloader. But the problem is that it keeps on saying permission denied when trying to deploy. So I want to change the path of ghostdriver.log or just disable it. Looking at phantomjs -h and ghostdriver github page I couldn't find the answer, my friend google let me down also.
$ scrapy deploy
Building egg of crawler-1370960743
'build/scripts-2.7' does not exist -- can't clean it
zip_safe flag not set; analyzing archive contents...
tests.fake_responses.__init__: module references __file__
Deploying crawler-1370960743 to http://localhost:6800/addversion.json
Server response (200):
Traceback (most recent call last):
File "/usr/lib/pymodules/python2.7/scrapyd/webservice.py", line 18, in render
return JsonResource.render(self, txrequest)
File "/usr/lib/pymodules/python2.7/scrapy/utils/txweb.py", line 10, in render
r = resource.Resource.render(self, txrequest)
File "/usr/lib/python2.7/dist-packages/twisted/web/resource.py", line 216, in render
return m(request)
File "/usr/lib/pymodules/python2.7/scrapyd/webservice.py", line 66, in render_POST
spiders = get_spider_list(project)
File "/usr/lib/pymodules/python2.7/scrapyd/utils.py", line 65, in get_spider_list
raise RuntimeError(msg.splitlines()[-1])
RuntimeError: IOError: [Errno 13] Permission denied: 'ghostdriver.log

When using the PhantomJS driver add the following parameter:
driver = webdriver.PhantomJS(service_log_path='/var/log/phantomjs/ghostdriver.log')
Related code, would be nice to have an option to turn off logging though, seems thats not supported:
selenium/webdriver/phantomjs/service.py
class Service(object):
"""
Object that manages the starting and stopping of PhantomJS / Ghostdriver
"""
def __init__(self, executable_path, port=0, service_args=None, log_path=None):
"""
Creates a new instance of the Service
:Args:
- executable_path : Path to PhantomJS binary
- port : Port the service is running on
- service_args : A List of other command line options to pass to PhantomJS
- log_path: Path for PhantomJS service to log to
"""
self.port = port
self.path = executable_path
self.service_args= service_args
if self.port == 0:
self.port = utils.free_port()
if self.service_args is None:
self.service_args = []
self.service_args.insert(0, self.path)
self.service_args.append("--webdriver=%d" % self.port)
if not log_path:
log_path = "ghostdriver.log"
self._log = open(log_path, 'w')

#Reduce logging level
driver = webdriver.PhantomJS(service_args=["--webdriver-loglevel=SEVERE"])
#Remove logging
import os
driver = webdriver.PhantomJS(service_log_path=os.path.devnull)

Related

'Page' object has no attribute 'site_id'

I am building a brand new website with Django cms and I'm using aldryn_bootstrap3.
When I create a link/button my site seems broken. I get:
File "c:\newCMS\venv37\lib\site-packages\aldryn_bootstrap3\model_fields.py", line 172, in get_link_url
if ref_page.site_id != getattr(cms_page, 'site_id', None):
AttributeError: 'Page' object has no attribute 'site_id'
I tried to install the multisite module (I've seen in the forum that it worked for other people...)
Then, the configuration in my seetings.py is:
from multisite import SiteID
SITE_ID = SiteID(default=1)
Environment:
Request Method: GET
Request URL: http://localhost:8000/es/?edit&language=es
Django Version: 1.11.22
Python Version: 3.7.3
Installed Applications:
...
'django.contrib.sites',
'aldryn_bootstrap3',
'multisite',
'djangocms_multisite',
'MyCMS']
Error during template rendering:
In template c:\newCMS\venv37\lib\site-packages\aldryn_bootstrap3\templates\aldryn_bootstrap3\plugins\button.html, error at line 2
Traceback:
File "c:\newCMS\venv37\lib\site-packages\aldryn_bootstrap3\model_fields.py" in get_link_url
172. if ref_page.site_id != getattr(cms_page, 'site_id', None):
Exception Type: AttributeError at /es/
Exception Value: 'Page' object has no attribute 'site_id'
Edit this file and comment lines 172/173/174 (in your case)
$ c:\newCMS\venv37\lib\site-packages\aldryn_bootstrap3\model_fields.py
#if ref_page.site_id != getattr(self.page, 'site_id', None):
#ref_site = Site.objects._get_site_by_id(ref_page.site_id)
#link = '//{}{}'.format(ref_site.domain, link)
Then return to the GUI and delete the 'Link-button' that caused the problem
Open the file 'model_fields.py' again and uncomment the 3 lines. Then restart your instance
$ c:\newCMS\venv37\lib\site-packages\aldryn_bootstrap3\model_fields.py
if ref_page.site_id != getattr(self.page, 'site_id', None):
ref_site = Site.objects._get_site_by_id(ref_page.site_id)
link = '//{}{}'.format(ref_site.domain, link)
!! This is not a definitive solution but it will allow you to make your site operational again
On a ** Debian/Ubuntu ** server, edit and comment this file
$ sudo vim /usr/local/lib/python3.6/site-packages/aldryn_bootstrap3/model_fields.py
--> Raphaƫl Jonard | Web Performances Accelerator <--
Edit the file: aldryn_bootstrap3/model_fields.py
Replace line 169 which looks like:
if ref_page.site_id != getattr(self.page, 'site_id', None):
with:
if ref_page.node.site_id != getattr(self.page.node, 'site_id', None):

nornir napalm jinja Template Issue

Jinja Template Issue when using napalm function within nornir framework.
updated: My hosts are Cisco IOS devices.
I am running this nornir Jinja template script in a python3.6 virtualenv. I have other simple nornir and napalm code running fine, which makes we suspect the issue is related to jinja2 template function i am trying to use.
The error i am receiving is below. Can anyone assist me with spotting the problem?
working nornir script w/ napalm function - Example showing working environment
from nornir.core import InitNornir
from nornir.plugins.tasks import text
from nornir.plugins.tasks.networking import napalm_get, napalm_configure
from nornir.plugins.functions.text import print_title, print_result
from nornir.core.exceptions import NornirExecutionError
import logging
nr = InitNornir(config_file="config.yaml", dry_run=True)
ctil_net = nr.filter(site="ctil", type="network_device")
core = nr.filter(role="core", type="network_device")
results_napalm_get = nr.run(
task=napalm_get, getters=["facts", "interfaces"]
)
print_result(results_napalm_get)
Nornir script causing error:
from nornir.core import InitNornir
from nornir.plugins.tasks import text
from nornir.plugins.tasks.networking import napalm_get, napalm_configure
from nornir.plugins.functions.text import print_title, print_result
from nornir.core.exceptions import NornirExecutionError
import logging
nr = InitNornir(config_file="config.yaml", dry_run=True)
ctil_net = nr.filter(site="ctil", type="network_device")
core = nr.filter(role="core", type="network_device")
def basic_configuration(task):
# Transform inventory data to configuration via a template file
r = task.run(task=text.template_file,
name="Base Template Configuration",
template="base.j2", ## modified
path=f"templates",
severity_level=logging.DEBUG)
# Save the compiled configuration into a host variable
task.host["config"] = r.result
print (r.result)
# Deploy that configuration to the device using NAPALM
task.run(task=networking.napalm_configure,
name="Loading Configuration on the device",
replace=False,
configuration=task.host["config"],
severity_level=logging.INFO)
try:
print_title("Playbook to configure the network")
result = core.run(task=basic_configuration)
print_result(result)
except NornirExecutionError:
print("ERROR!!!")
config.yaml
num_workers: 100
inventory: nornir.plugins.inventory.simple.SimpleInventory
SimpleInventory:
host_file: "inventory/hosts.yaml"
group_file: "inventory/groups.yaml"
base.j2 (template file)
hostname {{ system.hostname }}
ip domain-name {{ site }}.{{ domain }}
base.j2 (template file) - Alt version (same exact error msg / result
hostname {{ host.name }}
ip domain-name {{ site }}.{{ domain }}
Hosts.yaml
switch101:
nornir_host: 192.168.2.101
site: ctil
role: core
groups:
- lab
nornir_nos: ios
type: network_device
switch007:
nornir_host: 192.168.2.7
site: ctil
role: access
groups:
- production
nornir_nos: ios
type: network_device
error output when I run:
$ python adv-nornir-example.py
**** Playbook to configure the network *****************************************
basic_configuration*************************************************************
* switch101 ** changed : False *************************************************
vvvv basic_configuration ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv ERROR
Subtask: Base Template Configuration (failed)
---- Base Template Configuration ** changed : False ---------------------------- ERROR
Traceback (most recent call last):
File "/home/melshman/projects/virtualenv/nornir/lib/python3.6/site-packages/nornir/core/task.py", line 62, in start
r = self.task(self, **self.params)
File "/home/melshman/projects/virtualenv/nornir/lib/python3.6/site-packages/nornir/plugins/tasks/text/template_file.py", line 26, in template_file
**merged
File "/home/melshman/projects/virtualenv/nornir/lib/python3.6/site-packages/nornir/core/helpers/jinja_helper.py", line 11, in render_from_file
return template.render(**kwargs)
File "/home/melshman/projects/virtualenv/nornir/lib/python3.6/site-packages/jinja2/asyncsupport.py", line 76, in render
return original_render(self, *args, **kwargs)
File "/home/melshman/projects/virtualenv/nornir/lib/python3.6/site-packages/jinja2/environment.py", line 1008, in render
return self.environment.handle_exception(exc_info, True)
File "/home/melshman/projects/virtualenv/nornir/lib/python3.6/site-packages/jinja2/environment.py", line 780, in handle_exception
reraise(exc_type, exc_value, tb)
File "/home/melshman/projects/virtualenv/nornir/lib/python3.6/site-packages/jinja2/_compat.py", line 37, in reraise
raise value.with_traceback(tb)
File "templates/base.j2", line 2, in top-level template code
hostname {{ system.hostname }}
File "/home/melshman/projects/virtualenv/nornir/lib/python3.6/site-packages/jinja2/environment.py", line 430, in getattr
return getattr(obj, attribute)
jinja2.exceptions.UndefinedError: 'system' is undefined
^^^^ END basic_configuration ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You are not passing system to the template. I think what you are trying to do in your template is:
hostname {{ host.name }}
ip domain-name {{ site }}.{{ domain }}
Basically the template, unless you add extra kwargs, is going to have access to the host itself via the host variable and to all of the host attributes specified in the inventory.

Mongos + Pymongo 2.5 ==>No suitable hosts found

Our application is using pymongo. I'm trying to connect to mongos. The code fails on the following line
pymongo.MongoReplicaSetClient(ec2-aa-bbb-124-22.compute-1.amazonaws.com:27017,
replicaSet=self.class_settings['mongo_rs'])
Exception
/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/.../server_tornado.py --config=conf/development.conf --port=9001
Traceback (most recent call last):
File "/Users/..../server_tornado.py", line 319, in
BaseCatalog.db_instance = DBInit(config=settings)
File "/Users/..../lib/sc/singleton.py", line 20, in call
cls._instances[cls] = super(Singleton, cls).call(*args, **kwargs)
File "/Users/..../app/models/db_init.py", line 50, in init
raise Exception("init() => " + str(err))
Exception: init() => No suitable hosts found
Process finished with exit code 1`
Found the solution, if at all anyone faces this issue:
Using MongoClient instead of MongoReplicaClient fixes the issue. This is because Mongos acts like a single instance of mongodb.

Scrapyd with Polipo and Tor

UPDATE: I am now running this command:
scrapyd-deploy <project_name>
And getting this error:
504 Connect to localhost:8123 failed: General SOCKS server failure
I am trying to deploy my scrapy spider through scrapyd-deploy, the following is the command I use:
scrapyd-deploy -L <project_name>
I get the following error message:
Traceback (most recent call last):
File "/usr/local/bin/scrapyd-deploy", line 269, in <module>
main()
File "/usr/local/bin/scrapyd-deploy", line 74, in main
f = urllib2.urlopen(req)
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 410, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 448, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not found
The following is my scrapy.cfg file:
[settings]
default = <project_name>.settings
[deploy:<project_name>]
url = http://localhost:8123
project = <project_name>
eggs_dir = eggs
logs_dir = logs
items_dir = items
jobs_to_keep = 5
dbs_dir = dbs
max_proc = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5
http_port = 8123
debug = on
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
I am running tor and polipo, with the polipo proxy on port 'http://localhost:8123'. I can perform a wget and download that page without any problems. The proxy is correctly working, I can connect to the internet and so on. Please ask if you need more clarification.
Thanks!
urllib2.HTTPError: HTTP Error 404: Not found
The url is not reached.
Anything interesting in /var/log/polipo/polipo.log? What comes from tail -100 /var/log/polipo/polipo.log?
Apparently this is because I forgot to run the main command. It is easy to miss because it is mentioned in the Overview page of the documentation, and not the Deployment page. The following is the command:
scrapyd
504 Connect to localhost:8123 failed: General SOCKS server failure
You're asking Polipo to connect to localhost:8123; Polipo passes the request to tor, which returns a failure result which is dutifully returned by Polipo ("General SOCKS server failure").
url = http://localhost:8123
This is certainly not what you meant.
http_port = 8123
I'm also pretty sure you didn't want to run scrapyd on the same port as Polipo.

https with jython2.7 + trusting all certificates does not work. Result: httplib.BadStatusLine

UPDATE: Problem related to bug in jython 2.7b1. See bug report: http://bugs.jython.org/issue2021. jython-coders are working on a fix!
After changing to jython2.7beta1 from Jython2.5.3 I am no longer able to read content of webpages using SSL, http and "trusting all certificates". The response from the https-page is always an empty string, resulting in httplib.BadStatusLine exception from httplib.py in Jython.
I need to be able to read from a webpage which requires authentication and do not want to setup any certificate store since I must have portability. Therefore my solution is to use the excellent implementation provided by http://tech.pedersen-live.com/2010/10/trusting-all-certificates-in-jython/
Example code is detailed below. Twitter might not be the best example, since it does not require certificate trusting; but the result is the same with or without the decorator.
#! /usr/bin/python
import sys
from javax.net.ssl import TrustManager, X509TrustManager
from jarray import array
from javax.net.ssl import SSLContext
class TrustAllX509TrustManager(X509TrustManager):
# Define a custom TrustManager which will blindly
# accept all certificates
def checkClientTrusted(self, chain, auth):
pass
def checkServerTrusted(self, chain, auth):
pass
def getAcceptedIssuers(self):
return None
# Create a static reference to an SSLContext which will use
# our custom TrustManager
trust_managers = array([TrustAllX509TrustManager()], TrustManager)
TRUST_ALL_CONTEXT = SSLContext.getInstance("SSL")
TRUST_ALL_CONTEXT.init(None, trust_managers, None)
# Keep a static reference to the JVM's default SSLContext for restoring
# at a later time
DEFAULT_CONTEXT = SSLContext.getDefault()
def trust_all_certificates(f):
# Decorator function that will make it so the context of the decorated
# method will run with our TrustManager that accepts all certificates
def wrapped(*args, **kwargs):
# Only do this if running under Jython
if 'java' in sys.platform:
from javax.net.ssl import SSLContext
SSLContext.setDefault(TRUST_ALL_CONTEXT)
print "SSLContext set to TRUST_ALL"
try:
res = f(*args, **kwargs)
return res
finally:
SSLContext.setDefault(DEFAULT_CONTEXT)
else:
return f(*args, **kwargs)
return wrapped
##trust_all_certificates
def read_page(host):
import httplib
print "Host: " + host
conn = httplib.HTTPSConnection(host)
conn.set_debuglevel(1)
conn.request('GET', '/example')
response = conn.getresponse()
print response.read()
read_page("twitter.com")
This results in:
Host: twitter.com
send: 'GET /example HTTP/1.1\r\nHost: twitter.com\r\nAccept-Encoding: identity\r\n\r\n'
reply: ''
Traceback (most recent call last):
File "jytest.py", line 62, in <module>
read_page("twitter.com")
File "jytest.py", line 59, in read_page
response = conn.getresponse()
File "/Users/erikiveroth/Workspace/Procera/sandbox/jython/jython2.7.jar/Lib/httplib.py", line 1030, in getresponse
File "/Users/erikiveroth/Workspace/Procera/sandbox/jython/jython2.7.jar/Lib/httplib.py", line 407, in begin
File "/Users/erikiveroth/Workspace/Procera/sandbox/jython/jython2.7.jar/Lib/httplib.py", line 371, in _read_status
httplib.BadStatusLine: ''
Changing back to jython2.5.3 gives me parseable output from twitter.
Have any of you seen this before? Can not find any bug-tickets on jython project page about this nor can I understand what changes could result in this behaviour (more than maybe #1309, but I do not understand if it is related to my problem).
Cheers