I've launched an EMR cluster with JupyterHub included and set up LDAP following the guide below:
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-jupyterhub-ldap-users.html
I can log into JupyterHub using LDAP, but it doesn't launch the notebook server and throws an error:
500 : Internal Server Error
Spawner failed to start [status=1]. The logs for joe.blogg may contain
details.
You can try restarting your server from the home page.
Checking the JupyterHub logs, I find the following error:
[I 2019-04-30 12:24:11.590 JupyterHub log:122] 302 GET / → /hub (@::ffff:172.31.150.206) 0.75ms
[I 2019-04-30 12:24:11.629 JupyterHub log:122] 302 GET /hub → /hub/home (joe.blogg@::ffff:172.31.150.206) 2.22ms
[I 2019-04-30 12:24:11.685 JupyterHub log:122] 200 GET /hub/home (joe.blogg@::ffff:172.31.150.206) 26.51ms
[I 2019-04-30 12:24:13.741 JupyterHub log:122] 302 GET /hub/spawn → /user/joe.blogg/ (joe.blogg@::ffff:172.31.150.206) 3.70ms
[I 2019-04-30 12:24:13.769 JupyterHub log:122] 302 GET /user/joe.blogg/ → /hub/user/joe.blogg/ (@::ffff:172.31.150.206) 0.51ms
[I 2019-04-30 12:24:13.847 JupyterHub spawner:978] Spawning jupyterhub-singleuser --port=46821 --debug
[D 2019-04-30 12:24:14.255 SingleUserNotebookApp application:177] Searching ['/home', '/home/users/joe.blogg/.jupyter', '/opt/conda/etc/jupyter', '/usr/local/etc/jupyter', '/etc/jupyter'] for config files
[D 2019-04-30 12:24:14.256 SingleUserNotebookApp application:555] Looking for jupyter_config in /etc/jupyter
[D 2019-04-30 12:24:14.256 SingleUserNotebookApp application:555] Looking for jupyter_config in /usr/local/etc/jupyter
[D 2019-04-30 12:24:14.256 SingleUserNotebookApp application:555] Looking for jupyter_config in /opt/conda/etc/jupyter
[D 2019-04-30 12:24:14.256 SingleUserNotebookApp application:555] Looking for jupyter_config in /home/users/joe.blogg/.jupyter
[D 2019-04-30 12:24:14.256 SingleUserNotebookApp application:555] Looking for jupyter_config in /home
[D 2019-04-30 12:24:14.257 SingleUserNotebookApp application:555] Looking for jupyter_notebook_config in /etc/jupyter
[D 2019-04-30 12:24:14.257 SingleUserNotebookApp application:577] Loaded config file: /etc/jupyter/jupyter_notebook_config.py
[D 2019-04-30 12:24:14.258 SingleUserNotebookApp application:555] Looking for jupyter_notebook_config in /usr/local/etc/jupyter
[D 2019-04-30 12:24:14.258 SingleUserNotebookApp application:555] Looking for jupyter_notebook_config in /opt/conda/etc/jupyter
[D 2019-04-30 12:24:14.258 SingleUserNotebookApp application:577] Loaded config file: /opt/conda/etc/jupyter/jupyter_notebook_config.json
[D 2019-04-30 12:24:14.258 SingleUserNotebookApp application:555] Looking for jupyter_notebook_config in /home/users/joe.blogg/.jupyter
[D 2019-04-30 12:24:14.258 SingleUserNotebookApp application:555] Looking for jupyter_notebook_config in /home
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/traitlets/traitlets.py", line 528, in get
value = obj._trait_values[self.name]
KeyError: 'runtime_dir'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/bin/jupyterhub-singleuser", line 6, in <module>
main()
File "/opt/conda/lib/python3.6/site-packages/jupyterhub/singleuser.py", line 455, in main
return SingleUserNotebookApp.launch_instance(argv)
File "/opt/conda/lib/python3.6/site-packages/jupyter_core/application.py", line 266, in launch_instance
return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/traitlets/config/application.py", line 657, in launch_instance
app.initialize(argv)
File "<decorator-gen-7>", line 2, in initialize
File "/opt/conda/lib/python3.6/site-packages/traitlets/config/application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/notebook/notebookapp.py", line 1366, in initialize
self.init_configurables()
File "/opt/conda/lib/python3.6/site-packages/notebook/notebookapp.py", line 1100, in init_configurables
connection_dir=self.runtime_dir,
File "/opt/conda/lib/python3.6/site-packages/traitlets/traitlets.py", line 556, in __get__
return self.get(obj, cls)
File "/opt/conda/lib/python3.6/site-packages/traitlets/traitlets.py", line 535, in get
value = self._validate(obj, dynamic_default())
File "/opt/conda/lib/python3.6/site-packages/jupyter_core/application.py", line 99, in _runtime_dir_default
ensure_dir_exists(rd, mode=0o700)
File "/opt/conda/lib/python3.6/site-packages/jupyter_core/utils/__init__.py", line 13, in ensure_dir_exists
os.makedirs(path, mode=mode)
File "/opt/conda/lib/python3.6/os.py", line 210, in makedirs
makedirs(head, mode, exist_ok)
File "/opt/conda/lib/python3.6/os.py", line 210, in makedirs
makedirs(head, mode, exist_ok)
File "/opt/conda/lib/python3.6/os.py", line 210, in makedirs
makedirs(head, mode, exist_ok)
[Previous line repeated 1 more times]
File "/opt/conda/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/users'
[W 2019-04-30 12:24:23.874 JupyterHub web:1588] 500 GET /hub/user/joe.blogg/ (::ffff:172.31.150.206): Spawner failed to start [status=1]. The logs for joe.blogg may contain details.
[E 2019-04-30 12:24:23.886 JupyterHub log:114] {
"X-Forwarded-Host": "ip-172-31-150-206.eu-west-1.compute.internal:9443",
"X-Forwarded-Proto": "https",
"X-Forwarded-Port": "9443",
"X-Forwarded-For": "::ffff:172.31.150.206",
"Upgrade-Insecure-Requests": "1",
"Cookie": "jupyter-hub-token=\"2|1:0|10:1556624768|17:jupyter-hub-token|44:ZThlOTZkYWM0NzRiNDRkMDlmYzdkNDUwOTUzMTNjYjA=|12a53077b8d92723bba01fc9273eb64050911e22317385f96c1c4f52ff5253a8\"; _xsrf=2|5370034d|137cc417d37f89a6aed65c0ec72ad572|1556623914",
"Connection": "close",
"Referer": "https://ip-172-31-150-206.eu-west-1.compute.internal:9443/hub/home",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "en-US,en;q=0.5",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
"Host": "ip-172-31-150-206.eu-west-1.compute.internal:9443"
}
[E 2019-04-30 12:24:23.886 JupyterHub log:122] 500 GET /hub/user/joe.blogg/ (joe.blogg@::ffff:172.31.150.206) 10088.04ms
[W 2019-04-30 12:24:42.370 JupyterHub user:458] joe.blogg's server never showed up at http://127.0.0.1:46821/user/joe.blogg/ after 30 seconds. Giving up
[E 2019-04-30 12:24:42.380 JupyterHub gen:914] Exception in Future <tornado.concurrent.Future object at 0x7f53b1b9ce10> after timeout
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 910, in error_callback
future.result()
File "/opt/conda/lib/python3.6/site-packages/jupyterhub/handlers/base.py", line 445, in finish_user_spawn
yield spawn_future
File "/opt/conda/lib/python3.6/site-packages/jupyterhub/user.py", line 476, in spawn
raise e
File "/opt/conda/lib/python3.6/site-packages/jupyterhub/user.py", line 450, in spawn
resp = yield server.wait_up(http=True, timeout=spawner.http_timeout)
File "/opt/conda/lib/python3.6/site-packages/jupyterhub/utils.py", line 180, in wait_for_http_server
timeout=timeout
File "/opt/conda/lib/python3.6/site-packages/jupyterhub/utils.py", line 135, in exponential_backoff
raise TimeoutError(fail_message)
TimeoutError: Server at http://127.0.0.1:46821/user/joe.blogg/ didn't respond in 30 seconds
After following the AWS guide, the /etc/jupyter/conf/jupyterhub_config.py file looks like this:
# Configuration file for jupyterhub.
import os
notebook_dir = os.environ.get('DOCKER_NOTEBOOK_DIR')
network_name='jupyterhub-network'
c.Spawner.debug = True
c.Spawner.environment = {'SPARKMAGIC_CONF_DIR':'/etc/jupyter/conf'}
c.JupyterHub.hub_ip = '0.0.0.0'
c.JupyterHub.admin_access = True
c.JupyterHub.ssl_key = '/etc/jupyter/conf/server.key'
c.JupyterHub.ssl_cert = '/etc/jupyter/conf/server.crt'
c.JupyterHub.port = 9443
c.Authenticator.admin_users = {'jovyan'}
c.JupyterHub.authenticator_class = 'ldapauthenticator.LDAPAuthenticator'
c.LDAPAuthenticator.use_ssl = False
c.LDAPAuthenticator.server_address = 'openldap.companyx.com'
c.LDAPAuthenticator.bind_dn_template = 'cn={username},ou=Users,dc=openldap,dc=companyx,dc=com'
Has anyone else managed to set up multi-user access on AWS EMR JupyterHub?
I had faced similar issues. Can you try commenting out these two lines:
#c.JupyterHub.ssl_key = '/etc/jupyter/conf/server.key'
#c.JupyterHub.ssl_cert = '/etc/jupyter/conf/server.crt'
After this, try accessing over http (instead of https); this should fix your problem (hoping it's OK for you to go without https).
I have an Anaconda environment with Selenium installed. When I try to run my script, I get this error:
Traceback (most recent call last):
File "c:\Users\Nick\Desktop\Code\product-scraper\sephora-scraper\scraper.py", line 31, in <module>
ChromeDriverManager().install(), options=options)
File "C:\Users\Nick\anaconda3\envs\web-scraper\lib\site-packages\webdriver_manager\chrome.py", line 34, in install
driver_path = self._get_driver_path(self.driver)
File "C:\Users\Nick\anaconda3\envs\web-scraper\lib\site-packages\webdriver_manager\manager.py", line 21, in _get_driver_path
driver_version = driver.get_version()
File "C:\Users\Nick\anaconda3\envs\web-scraper\lib\site-packages\webdriver_manager\driver.py", line 40, in get_version
return self.get_latest_release_version()
File "C:\Users\Nick\anaconda3\envs\web-scraper\lib\site-packages\webdriver_manager\driver.py", line 63, in get_latest_release_version
resp = requests.get(f"{self._latest_release_url}_{self.browser_version}")
File "C:\Users\Nick\anaconda3\envs\web-scraper\lib\site-packages\requests\api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\Nick\anaconda3\envs\web-scraper\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\Nick\anaconda3\envs\web-scraper\lib\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\Nick\anaconda3\envs\web-scraper\lib\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "C:\Users\Nick\anaconda3\envs\web-scraper\lib\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='chromedriver.storage.googleapis.com', port=443): Max retries exceeded with url: /LATEST_RELEASE_88.0.4324 (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available."))
I'm new to Anaconda, so I don't know what else to provide. Please leave a comment if I need to add anything and I will do so right away. Thanks.
Try adding these paths to your PATH environment variable:
..\Anaconda3
..\Anaconda3\scripts
..\Anaconda3\Library\bin
You might need to restart Windows after setting the environment variable.
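A quick way to check which of those directories are missing from the current PATH (the prefix below is an assumed default per-user install location, not taken from the question; adjust it to yours):

```python
import os

# Assumed default per-user Anaconda install prefix (hypothetical; adjust to yours).
prefix = os.path.expandvars(r"%USERPROFILE%\Anaconda3")
needed = [prefix,
          os.path.join(prefix, "Scripts"),
          os.path.join(prefix, "Library", "bin")]

# Report which of the three directories are not yet on PATH.
missing = [d for d in needed if d not in os.environ.get("PATH", "").split(os.pathsep)]
print("Missing from PATH:", missing)
```

If any are listed, add them via System Properties → Environment Variables, then restart the terminal (or Windows).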
s3fs seems to fail from time to time when reading from an S3 bucket from an AWS Lambda function inside a VPC. I am using s3fs==0.4.0 and pandas==1.0.1.
import s3fs
import pandas as pd

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    s3_file = event['Records'][0]['s3']['object']['key']

    s3fs.S3FileSystem.connect_timeout = 1800
    s3fs.S3FileSystem.read_timeout = 1800
    with s3fs.S3FileSystem(anon=False).open(f"s3://{bucket}/{s3_file}", 'rb') as f:
        data = pd.read_json(f)
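For reference, the two lookups at the top of the handler follow the standard S3 event notification shape; a minimal, self-contained illustration with a made-up sample event:

```python
def parse_s3_event(event):
    # First record of an S3 notification: bucket name and object key.
    record = event['Records'][0]['s3']
    return record['bucket']['name'], record['object']['key']

# Hypothetical sample event, trimmed to just the fields the handler uses.
sample_event = {'Records': [{'s3': {
    'bucket': {'name': 'my-bucket'},
    'object': {'key': 'my_folder/data.json'},
}}]}

bucket, key = parse_s3_event(sample_event)
# bucket -> 'my-bucket', key -> 'my_folder/data.json'
```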
The stacktrace is the following:
Traceback (most recent call last):
File "/var/task/urllib3/connection.py", line 157, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/var/task/urllib3/util/connection.py", line 84, in create_connection
raise err
File "/var/task/urllib3/util/connection.py", line 74, in create_connection
sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/task/botocore/httpsession.py", line 263, in send
chunked=self._chunked(request.headers),
File "/var/task/urllib3/connectionpool.py", line 720, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/var/task/urllib3/util/retry.py", line 376, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/var/task/urllib3/packages/six.py", line 735, in reraise
raise value
File "/var/task/urllib3/connectionpool.py", line 672, in urlopen
chunked=chunked,
File "/var/task/urllib3/connectionpool.py", line 376, in _make_request
self._validate_conn(conn)
File "/var/task/urllib3/connectionpool.py", line 994, in _validate_conn
conn.connect()
File "/var/task/urllib3/connection.py", line 300, in connect
conn = self._new_conn()
File "/var/task/urllib3/connection.py", line 169, in _new_conn
self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPSConnection object at 0x7f4d578e3ed0>: Failed to establish a new connection: [Errno 110] Connection timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/task/botocore/endpoint.py", line 200, in _do_get_response
http_response = self._send(request)
File "/var/task/botocore/endpoint.py", line 244, in _send
return self.http_session.send(request)
File "/var/task/botocore/httpsession.py", line 283, in send
raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://my_bucket.s3.eu-west-1.amazonaws.com/?list-type=2&prefix=my_folder%2Fsomething%2F&delimiter=%2F&encoding-type=url"
Has anyone faced this same issue? Why would it fail only sometimes? Is there an s3fs configuration that could help with this specific issue?
Actually, there was no problem with s3fs at all. We were using a Lambda function with two subnets in the VPC: one worked normally, but the other wasn't allowed to access S3 resources, so whenever a Lambda instance was spawned on the second subnet it couldn't connect at all.
Fixing this issue was as easy as removing the second subnet.
You could also use boto3, which is supported by AWS, to get JSON from S3.
import json
import boto3

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    s3 = boto3.resource('s3')
    file_object = s3.Object(bucket, key)
    json_content = json.loads(file_object.get()['Body'].read())
I am trying to get Jupyter Notebook running in WSL following this tutorial, and I get a server error at http://localhost:8889/tree. I don't know if this is important:
Server error: Traceback (most recent call last):
  File "/home/antoi/anaconda3/lib/python3.7/site-packages/tornado/web.py", line 1592, in _execute
    result = yield result
  File "/home/antoi/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/home/antoi/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/home/antoi/anaconda3/lib/python3.7/site-packages/notebook/services/contents/handlers.py", line 112, in get
    path=path, type=type, format=format, content=content,
  File "/home/antoi/anaconda3/lib/python3.7/site-packages/notebook/services/contents/filemanager.py", line 431, in get
    model = self._dir_model(path, content=content)
  File "/home/antoi/anaconda3/lib/python3.7/site-packages/notebook/services/contents/filemanager.py", line 337, in _dir_model
    if self.should_list(name) and not is_file_hidden(os_path, stat_res=st):
  File "/home/antoi/anaconda3/lib/python3.7/site-packages/notebook/utils.py", line 145, in is_file_hidden_posix
    stat_res = os.stat(abs_path)
PermissionError: [Errno 13] Permission denied: '/mnt/c/Users/antoi/Application Data'
And in the WSL terminal:
(base) antoi@LAPTOP-UTL8OHHO:/mnt/c/Users/antoi$ jupyter notebook --no-browser
[I 16:19:55.476 NotebookApp] The port 8888 is already in use, trying another port.
[I 16:19:55.505 NotebookApp] JupyterLab extension loaded from /home/antoi/anaconda3/lib/python3.7/site-packages/jupyterlab
[I 16:19:55.506 NotebookApp] JupyterLab application directory is /home/antoi/anaconda3/share/jupyter/lab
[I 16:19:55.532 NotebookApp] Serving notebooks from local directory: /mnt/c/Users/antoi
[I 16:19:55.533 NotebookApp] The Jupyter Notebook is running at:
[I 16:19:55.534 NotebookApp] http://localhost:8889/?token=6e5f12a846547af4515d05140a142a60945bae661cce6571
[I 16:19:55.551 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:19:55.566 NotebookApp]
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8889/?token=6e5f12a846547af4515d05140a142a60945bae661cce6571
[I 16:20:02.914 NotebookApp] 302 GET /?token=6e5f12a846547af4515d05140a142a60945bae661cce6571 (127.0.0.1) 0.44ms
[E 16:20:03.523 NotebookApp] Uncaught exception GET /api/contents?type=directory&_=1582129203252 (127.0.0.1)
HTTPServerRequest(protocol='http', host='localhost:8889', method='GET', uri='/api/contents?type=directory&_=1582129203252', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
File "/home/antoi/anaconda3/lib/python3.7/site-packages/tornado/web.py", line 1592, in _execute
result = yield result
File "/home/antoi/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/home/antoi/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/home/antoi/anaconda3/lib/python3.7/site-packages/notebook/services/contents/handlers.py", line 112, in get
path=path, type=type, format=format, content=content,
File "/home/antoi/anaconda3/lib/python3.7/site-packages/notebook/services/contents/filemanager.py", line 431, in get
model = self._dir_model(path, content=content)
File "/home/antoi/anaconda3/lib/python3.7/site-packages/notebook/services/contents/filemanager.py", line 337, in _dir_model
if self.should_list(name) and not is_file_hidden(os_path, stat_res=st):
File "/home/antoi/anaconda3/lib/python3.7/site-packages/notebook/utils.py", line 145, in is_file_hidden_posix
stat_res = os.stat(abs_path)
PermissionError: [Errno 13] Permission denied: '/mnt/c/Users/antoi/Application Data'
[W 16:20:03.528 NotebookApp] Unhandled error
[E 16:20:03.529 NotebookApp] {
"Host": "localhost:8889",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0",
"Accept": "application/json, text/javascript, */*; q=0.01",
"Accept-Language": "fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3",
"Accept-Encoding": "gzip, deflate",
"X-Xsrftoken": "2|2adc4da8|04ae20472b692936cf36fbc3b6869509|1581352093",
"X-Requested-With": "XMLHttpRequest",
"Connection": "keep-alive",
"Referer": "http://localhost:8889/tree",
"Cookie": "username-localhost-8888=\"2|1:0|10:1581936567|23:username-localhost-8888|44:ZjIyODAyNzlmYTAxNGE0MTljODBmNGU3MGJlMmZmZGI=|ca6177ff5468b65d9b1202ac00c91c3c0536635c39d430be915f2b96ee1c2934\"; username-localhost-8889=\"2|1:0|10:1582129202|23:username-localhost-8889|44:OWNhNzhlM2E2Yzc3NDc1NDlkNWQ1NzAwZGRlZGU5N2Q=|e22d8c299570cd29dd8933e8f60b6481cd5aa78d972d6d4fd2258c630440915c\"; _xsrf=2|2adc4da8|04ae20472b692936cf36fbc3b6869509|1581352093",
"Dnt": "1"
}
[E 16:20:03.547 NotebookApp] 500 GET /api/contents?type=directory&_=1582129203252 (127.0.0.1) 18.27ms referer=http://localhost:8889/tree
[E 16:25:22.478 NotebookApp] Uncaught exception GET /api/contents?type=directory&_=1582129203255 (127.0.0.1)
HTTPServerRequest(protocol='http', host='localhost:8889', method='GET', uri='/api/contents?type=directory&_=1582129203255', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
File "/home/antoi/anaconda3/lib/python3.7/site-packages/tornado/web.py", line 1592, in _execute
result = yield result
File "/home/antoi/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
value = future.result()
File "/home/antoi/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/home/antoi/anaconda3/lib/python3.7/site-packages/notebook/services/contents/handlers.py", line 112, in get
path=path, type=type, format=format, content=content,
File "/home/antoi/anaconda3/lib/python3.7/site-packages/notebook/services/contents/filemanager.py", line 431, in get
model = self._dir_model(path, content=content)
File "/home/antoi/anaconda3/lib/python3.7/site-packages/notebook/services/contents/filemanager.py", line 337, in _dir_model
if self.should_list(name) and not is_file_hidden(os_path, stat_res=st):
File "/home/antoi/anaconda3/lib/python3.7/site-packages/notebook/utils.py", line 145, in is_file_hidden_posix
stat_res = os.stat(abs_path)
PermissionError: [Errno 13] Permission denied: '/mnt/c/Users/antoi/Application Data'
[W 16:25:22.589 NotebookApp] Unhandled error
[E 16:25:22.589 NotebookApp] {
"Host": "localhost:8889",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0",
"Accept": "application/json, text/javascript, */*; q=0.01",
"Accept-Language": "fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3",
"Accept-Encoding": "gzip, deflate",
"X-Xsrftoken": "2|2adc4da8|04ae20472b692936cf36fbc3b6869509|1581352093",
"X-Requested-With": "XMLHttpRequest",
"Connection": "keep-alive",
"Referer": "http://localhost:8889/tree",
"Cookie": "username-localhost-8888=\"2|1:0|10:1581936567|23:username-localhost-8888|44:ZjIyODAyNzlmYTAxNGE0MTljODBmNGU3MGJlMmZmZGI=|ca6177ff5468b65d9b1202ac00c91c3c0536635c39d430be915f2b96ee1c2934\"; username-localhost-8889=\"2|1:0|10:1582129202|23:username-localhost-8889|44:OWNhNzhlM2E2Yzc3NDc1NDlkNWQ1NzAwZGRlZGU5N2Q=|e22d8c299570cd29dd8933e8f60b6481cd5aa78d972d6d4fd2258c630440915c\"; _xsrf=2|2adc4da8|04ae20472b692936cf36fbc3b6869509|1581352093",
"Dnt": "1"
}
[E 16:25:22.625 NotebookApp] 500 GET /api/contents?type=directory&_=1582129203255 (127.0.0.1) 122.78ms referer=http://localhost:8889/tree
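Both tracebacks fail while stat-ing /mnt/c/Users/antoi/Application Data, a Windows junction point that WSL cannot read, and they hit it because the server was started from /mnt/c/Users/antoi and lists that directory. One workaround (my assumption, not something confirmed in the post) is to serve notebooks from a directory that contains no such junctions, e.g. the Linux home directory:

```shell
# Start the server from the Linux home directory instead of /mnt/c/Users/<name>,
# so the /tree listing never tries to stat Windows junction points.
cd ~
jupyter notebook --no-browser --notebook-dir="$HOME"
```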
I'm trying to do a REST GET using Jython, but I'm getting an SSLError.
Python 2.7.1 (default:0df7adb1b397, Jun 30 2017, 19:02:43)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.8.0_172
>>> import requests
>>> r = requests.get('https://jsonplaceholder.typicode.com/posts/1')
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\jython2.7.1\Lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\jython2.7.1\Lib\site-packages\requests\api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "C:\jython2.7.1\Lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "C:\jython2.7.1\Lib\site-packages\requests\sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "C:\jython2.7.1\Lib\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
SSLError: HTTPSConnectionPool(host='jsonplaceholder.typicode.com', port=443): Max retries exceeded with url: /posts/1 (Caused by SSLError(SSLError(1, u'Received fatal alert: handshake_failure'),))
The regular Java way of doing it works:
>>> import java.net.URL as URL
>>> url = URL("https://jsonplaceholder.typicode.com/posts/1")
>>> conn = url.openConnection()
>>> conn.setRequestMethod("GET")
>>> conn.setRequestProperty("Accept", "application/json")
>>> conn.getResponseCode()
200
Why am I getting an SSLError when using requests? Do I need some further configuration to make it work?
I'm trying to have Scrapy download a copy of each page it crawls, but when I run my spider the log contains entries like:
2016-06-20 15:39:12 [scrapy] ERROR: Error processing {'file_urls': 'http://example.com/page',
'title': u'PageTitle'}
Traceback (most recent call last):
File "c:\anaconda3\envs\scrapy\lib\site-packages\twisted\internet\defer.py", line 588, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "c:\anaconda3\envs\scrapy\lib\site-packages\scrapy\pipelines\media.py", line 44, in process_item
requests = arg_to_iter(self.get_media_requests(item, info))
File "c:\anaconda3\envs\scrapy\lib\site-packages\scrapy\pipelines\files.py", line 365, in get_media_requests
return [Request(x) for x in item.get(self.files_urls_field, [])]
File "c:\anaconda3\envs\scrapy\lib\site-packages\scrapy\http\request\__init__.py", line 25, in __init__
self._set_url(url)
File "c:\anaconda3\envs\scrapy\lib\site-packages\scrapy\http\request\__init__.py", line 57, in _set_url
raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: h
Other questions on SO about this error seem to relate to problems with start_urls, but my start URL is fine: the spider crawls across the site, it just doesn't save the pages to my specified FILES_STORE.
I populate file_urls using item['file_urls'] = response.url
Do I need to specify the URL a different way?
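The stray h in the error message points at the likely cause: FilesPipeline iterates over the file_urls field, so a bare string gets iterated character by character and the first "request URL" is the single letter h. A self-contained sketch of that behavior (mimicking the pipeline's loop, not importing Scrapy):

```python
def media_requests(file_urls_field):
    # Roughly what FilesPipeline does internally:
    #   [Request(x) for x in item.get(self.files_urls_field, [])]
    return [url for url in file_urls_field]

# A bare string yields one "URL" per character, hence "Missing scheme in request url: h"
assert media_requests('http://example.com/page')[0] == 'h'

# A one-element list yields the whole URL, as the pipeline expects
assert media_requests(['http://example.com/page']) == ['http://example.com/page']
```

So the fix would be to populate the field as a list: item['file_urls'] = [response.url].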