Py Selenium uploaded file is unclosed

I have a working test case that uploads a file, but I am getting a warning about an unclosed file.
I can't find a way to close it, since I am not the one opening it.
The portion of the test that handles the file:
file = "/tmp/CoolPDF.pdf"
upload_document = self.driver.find_element(By.ID,"file-upload")
upload_document.clear()
upload_document.send_keys(file)
ResourceWarning: unclosed <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 50145), raddr=('127.0.0.1', 50141)>
return self.run(*args, **kwds)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
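For what it's worth, the socket in the warning is the WebDriver client's HTTP connection to the driver, not the uploaded PDF: send_keys only passes the path as text, so the test never opens the file itself. A minimal sketch (assuming a unittest-style test class, which the post does not show) of the usual ways to quiet it:
import unittest
import warnings
from selenium import webdriver

class UploadTest(unittest.TestCase):  # hypothetical test class name
    def setUp(self):
        # Optionally silence ResourceWarning noise from the driver's pooled HTTP sockets.
        warnings.simplefilter("ignore", ResourceWarning)
        self.driver = webdriver.Chrome()  # assumption: the suite uses Chrome
    def tearDown(self):
        # Ending the session releases the connections held by the client.
        self.driver.quit()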

Related

Zappa packaged lambda error: botocore.exceptions.SSLError: SSL validation failed for <s3 file> [Errno 2] No such file or directory

I am running an AWS Lambda service packaged using Zappa.io.
The service is running; however, it's not able to reach the S3 file due to an SSL error.
I am getting the error below while trying to access remote_env from an S3 bucket:
[1592935276008] [DEBUG] 2020-06-23T18:01:16.8Z b8374974-f820-484a-bcc3-64a530712769 Exception received when sending HTTP request.
Traceback (most recent call last):
File "/var/task/urllib3/util/ssl_.py", line 336, in ssl_wrap_socket
context.load_verify_locations(ca_certs, ca_cert_dir)
FileNotFoundError: [Errno 2] No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/runtime/botocore/httpsession.py", line 254, in send
urllib_response = conn.urlopen(
File "/var/task/urllib3/connectionpool.py", line 719, in urlopen
retries = retries.increment(
File "/var/task/urllib3/util/retry.py", line 376, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/var/task/six.py", line 703, in reraise
raise value
File "/var/task/urllib3/connectionpool.py", line 665, in urlopen
httplib_response = self._make_request(
File "/var/task/urllib3/connectionpool.py", line 376, in _make_request
self._validate_conn(conn)
File "/var/task/urllib3/connectionpool.py", line 996, in _validate_conn
conn.connect()
File "/var/task/urllib3/connection.py", line 352, in connect
self.sock = ssl_wrap_socket(
File "/var/task/urllib3/util/ssl_.py", line 338, in ssl_wrap_socket
raise SSLError(e)
urllib3.exceptions.SSLError: [Errno 2] No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/runtime/botocore/endpoint.py", line 200, in _do_get_response
http_response = self._send(request)
File "/var/runtime/botocore/endpoint.py", line 244, in _send
return self.http_session.send(request)
File "/var/runtime/botocore/httpsession.py", line 281, in send
raise SSLError(endpoint_url=request.url, error=e)
botocore.exceptions.SSLError: SSL validation failed for ....... [Errno 2] No such file or directory
My Environment
Zappa version used: 0.51.0
Operating System and Python version: Ubuntu, Python 3.8
Output of pip freeze:
appdirs==1.4.3
argcomplete==1.11.1
boto3==1.14.8
botocore==1.17.8
CacheControl==0.12.6
certifi==2019.11.28
cffi==1.14.0
cfn-flip==1.2.3
chardet==3.0.4
click==7.1.2
colorama==0.4.3
contextlib2==0.6.0
cryptography==2.9.2
distlib==0.3.0
distro==1.4.0
docutils==0.15.2
durationpy==0.5
Flask==1.1.2
Flask-Cors==3.0.8
future==0.18.2
h11==0.9.0
hjson==3.0.1
html5lib==1.0.1
httptools==0.1.1
idna==2.8
ipaddr==2.2.0
itsdangerous==1.1.0
Jinja2==2.11.2
jmespath==0.10.0
kappa==0.6.0
lockfile==0.12.2
mangum==0.9.2
MarkupSafe==1.1.1
msgpack==0.6.2
packaging==20.3
pep517==0.8.2
pip-tools==5.2.1
placebo==0.9.0
progress==1.5
pycparser==2.20
pydantic==1.5.1
PyMySQL==0.9.3
pyOpenSSL==19.1.0
pyparsing==2.4.6
python-dateutil==2.6.1
python-slugify==4.0.0
pytoml==0.1.21
PyYAML==5.3.1
requests==2.22.0
retrying==1.3.3
s3transfer==0.3.3
six==1.14.0
starlette==0.13.4
text-unidecode==1.3
toml==0.10.1
tqdm==4.46.1
troposphere==2.6.1
typing-extensions==3.7.4.2
urllib3==1.25.8
uvloop==0.14.0
webencodings==0.5.1
websockets==8.1
Werkzeug==0.16.1
wsgi-request-logger==0.4.6
zappa==0.51.0
My zappa_settings.json:
{
    "dev": {
        "app_function": "main.app",
        "aws_region": "us-west-2",
        "profile_name": "default",
        "project_name": "d3c",
        "runtime": "python3.8",
        "keep_warm": false,
        "cors": true,
        "s3_bucket": "my-lambda-deployables",
        "remote_env": "<my remote s3 file>"
    }
}
I have confirmed that my S3 file is accessible from my local Ubuntu machine; however, it does not work on AWS.
This seems to be related to an open issue on Zappa.
I had the same issue with my Zappa deployment.
I tried every option I could find and nothing worked, until the following steps fixed it for me:
I copied python3.8/site-packages/botocore/cacert.pem to my lambda folder.
I set the "REQUESTS_CA_BUNDLE" environment variable to /var/task/cacert.pem.
/var/task is where AWS Lambda extracts your zipped-up code to.
How to set environment variables in Zappa
I updated my Zappa function and everything worked fine.
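A minimal sketch of those two steps in code, assuming the copied cacert.pem sits at the root of the deployment package so it lands at /var/task/cacert.pem at runtime:
import os

# /var/task is where Lambda unpacks the deployment package, so the cacert.pem
# copied in the first step is available at this path.
os.environ['REQUESTS_CA_BUNDLE'] = '/var/task/cacert.pem'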
I fixed this by adding the cert path to the environment (Python):
os.environ['REQUESTS_CA_BUNDLE'] = os.path.join('/etc/ssl/certs/','ca-certificates.crt')
Edit: Sorry, the issue was not really fixed by the code above, but I found a hacky workaround by adding verify=False for all SSL requests:
boto3.client('s3', verify=False)
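Rather than turning verification off entirely, the same verify parameter also accepts a path to a CA bundle, so the cacert.pem packaged in the earlier answer can be reused (a sketch; the path assumes the file sits at the package root):
import boto3

# Validate SSL against the CA bundle shipped inside the Lambda package
# instead of disabling verification altogether.
s3 = boto3.client('s3', verify='/var/task/cacert.pem')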

JSONDecodeError with Scrapy: Expecting value: line 1 column 1 (char 0)

I am using requests in order to fetch and parse some data scraped using Scrapy with Scrapyrt (real time scraping).
This is how I do it:
import requests

# Pass the spider to the request parameters
params = {
    'spider_name': spider,
    'start_requests': True
}

# Scrape items
response = requests.get('http://scrapyrt:9080/crawl.json', params)
print('RESPONSE JSON', response.json())
data = response.json()
As per the ScrapyRT documentation, with the 'start_requests' parameter set to True, the spider automatically requests its start URLs and passes the responses to the parse method, which is the default method used for parsing requests.
start_requests
type: boolean
optional
Whether spider should execute Scrapy.Spider.start_requests method. start_requests are executed by default when you run Scrapy Spider normally without ScrapyRT, but this method is NOT executed in API by default. By default we assume that spider is expected to crawl ONLY url provided in parameters without making any requests to start_urls defined in Spider class. start_requests argument overrides this behavior. If this argument is present API will execute start_requests Spider method.
But the setup is not working. Log:
[2019-05-19 06:11:14,835: DEBUG/ForkPoolWorker-4] Starting new HTTP connection (1): scrapyrt:9080
[2019-05-19 06:11:15,414: DEBUG/ForkPoolWorker-4] http://scrapyrt:9080 "GET /crawl.json?spider_name=precious_tracks&start_requests=True HTTP/1.1" 500 7784
[2019-05-19 06:11:15,472: ERROR/ForkPoolWorker-4] Task project.api.routes.background.scrape_allmusic[87dbd825-dc1c-4789-8ee0-4151e5821798] raised unexpected: JSONDecodeError('Expecting value: line 1 column 1 (char 0)',)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
return self.run(*args, **kwargs)
File "/usr/src/app/project/api/routes/background.py", line 908, in scrape_allmusic
print ('RESPONSE JSON',response.json())
File "/usr/lib/python3.6/site-packages/requests/models.py", line 897, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The error was due to a bug in Twisted 19.2.0, a scrapyrt dependency, which assumed the response to be of the wrong type.
Once I installed Twisted==18.9.0, it worked.
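Independent of the Twisted pin, the traceback also shows the 500 response being passed straight to response.json(); a small defensive check (a sketch, not part of the original fix) surfaces the actual ScrapyRT error instead of a bare JSONDecodeError:
import requests

response = requests.get('http://scrapyrt:9080/crawl.json', params)
# Fail loudly on HTTP errors and non-JSON bodies before trying to decode them.
response.raise_for_status()
if 'application/json' not in response.headers.get('Content-Type', ''):
    raise ValueError('Unexpected ScrapyRT response: %s' % response.text[:200])
data = response.json()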

Error occurred while adding a new consumer

I am getting the following error when I try to add a new consumer (queue) through the control command on current_app imported from celery.
The details of the logged error are as follows:
reply = current_app.control.add_consumer(queue_name, destination = WORKER_PROCESSES, reply = True)
File "/opt/msx/python-env/lib/python2.7/site-packages/celery/app/control.py", line 232, in add_consumer
**kwargs
File "/opt/msx/python-env/lib/python2.7/site-packages/celery/app/control.py", line 307, in broadcast
limit, callback, channel=channel,
File "/opt/msx/python-env/lib/python2.7/site-packages/kombu/pidbox.py", line 300, in _broadcast
channel=chan)
File "/opt/msx/python-env/lib/python2.7/site-packages/kombu/pidbox.py", line 336, in _collect
with consumer:
File "/opt/msx/python-env/lib/python2.7/site-packages/kombu/messaging.py", line 396, in __enter__
self.consume()
File "/opt/msx/python-env/lib/python2.7/site-packages/kombu/messaging.py", line 445, in consume
self._basic_consume(T, no_ack=no_ack, nowait=False)
File "/opt/msx/python-env/lib/python2.7/site-packages/kombu/messaging.py", line 567, in _basic_consume
no_ack=no_ack, nowait=nowait)
File "/opt/msx/python-env/lib/python2.7/site-packages/kombu/entity.py", line 611, in consume
nowait=nowait)
File "/opt/msx/python-env/lib/python2.7/site-packages/librabbitmq/__init__.py", line 81, in basic_consume
no_local, no_ack, exclusive, arguments or {},
ChannelError: basic.consume: server channel error 404, message: NOT_FOUND - no queue '2795c73e-2b6a-34d6-bd1f-13de0d1e5497.reply.celery.pidbox' in vhost '/'
I didn't understand the error. I am passing a queue name different from the one mentioned in the logs.
Any help will be appreciated. Thanks.
Note: this issue started occurring after setting the MAX_TASK_PER_CHILD value. Is this related to the error?
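For reference, a minimal sketch of the control call in question (the function and worker names are placeholders, not from the original code); destination should be a list of fully qualified worker names, and reply=True makes the broadcast wait for acknowledgements on the pidbox reply queue named in the error:
from celery import current_app

def add_queue_consumer(queue_name, worker_names):
    # Ask the given workers to start consuming from queue_name and collect their replies.
    return current_app.control.add_consumer(
        queue_name,
        destination=worker_names,  # e.g. ['celery@worker1']
        reply=True,
    )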

Mongos + Pymongo 2.5 ==> No suitable hosts found

Our application is using pymongo. I'm trying to connect to mongos. The code fails on the following line:
pymongo.MongoReplicaSetClient('ec2-aa-bbb-124-22.compute-1.amazonaws.com:27017',
                              replicaSet=self.class_settings['mongo_rs'])
Exception
/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/.../server_tornado.py --config=conf/development.conf --port=9001
Traceback (most recent call last):
File "/Users/..../server_tornado.py", line 319, in
BaseCatalog.db_instance = DBInit(config=settings)
File "/Users/..../lib/sc/singleton.py", line 20, in call
cls._instances[cls] = super(Singleton, cls).call(*args, **kwargs)
File "/Users/..../app/models/db_init.py", line 50, in init
raise Exception("init() => " + str(err))
Exception: init() => No suitable hosts found
Process finished with exit code 1
Found the solution, in case anyone else faces this issue:
Using MongoClient instead of MongoReplicaSetClient fixes the issue, because mongos behaves like a single, standalone mongod instance.
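A minimal sketch of the working connection (the hostname is the placeholder from the question; the database name is hypothetical):
import pymongo

# mongos looks like a standalone server to the driver, so a plain MongoClient is enough.
client = pymongo.MongoClient('ec2-aa-bbb-124-22.compute-1.amazonaws.com', 27017)
db = client['my_database']  # hypothetical database name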

How to disable or change the path of ghostdriver.log?

The question is straightforward, but some context may help.
I'm trying to deploy scrapy while using selenium and phantomjs as the downloader, but it keeps saying permission denied when trying to deploy. So I want to change the path of ghostdriver.log or just disable it. Looking at phantomjs -h and the ghostdriver GitHub page I couldn't find the answer, and my friend Google let me down as well.
$ scrapy deploy
Building egg of crawler-1370960743
'build/scripts-2.7' does not exist -- can't clean it
zip_safe flag not set; analyzing archive contents...
tests.fake_responses.__init__: module references __file__
Deploying crawler-1370960743 to http://localhost:6800/addversion.json
Server response (200):
Traceback (most recent call last):
File "/usr/lib/pymodules/python2.7/scrapyd/webservice.py", line 18, in render
return JsonResource.render(self, txrequest)
File "/usr/lib/pymodules/python2.7/scrapy/utils/txweb.py", line 10, in render
r = resource.Resource.render(self, txrequest)
File "/usr/lib/python2.7/dist-packages/twisted/web/resource.py", line 216, in render
return m(request)
File "/usr/lib/pymodules/python2.7/scrapyd/webservice.py", line 66, in render_POST
spiders = get_spider_list(project)
File "/usr/lib/pymodules/python2.7/scrapyd/utils.py", line 65, in get_spider_list
raise RuntimeError(msg.splitlines()[-1])
RuntimeError: IOError: [Errno 13] Permission denied: 'ghostdriver.log
When using the PhantomJS driver, add the following parameter:
driver = webdriver.PhantomJS(service_log_path='/var/log/phantomjs/ghostdriver.log')
Related code (it would be nice to have an option to turn off logging, though it seems that's not supported):
selenium/webdriver/phantomjs/service.py
class Service(object):
    """
    Object that manages the starting and stopping of PhantomJS / Ghostdriver
    """

    def __init__(self, executable_path, port=0, service_args=None, log_path=None):
        """
        Creates a new instance of the Service

        :Args:
         - executable_path : Path to PhantomJS binary
         - port : Port the service is running on
         - service_args : A List of other command line options to pass to PhantomJS
         - log_path: Path for PhantomJS service to log to
        """
        self.port = port
        self.path = executable_path
        self.service_args = service_args
        if self.port == 0:
            self.port = utils.free_port()
        if self.service_args is None:
            self.service_args = []
        self.service_args.insert(0, self.path)
        self.service_args.append("--webdriver=%d" % self.port)
        if not log_path:
            log_path = "ghostdriver.log"
        self._log = open(log_path, 'w')
# Reduce the logging level
driver = webdriver.PhantomJS(service_args=["--webdriver-loglevel=SEVERE"])

# Remove logging entirely
import os
driver = webdriver.PhantomJS(service_log_path=os.devnull)