use AWS_PROFILE in pandas.read_parquet - pandas

I'm testing this locally where I have a ~/.aws/config file.
~/.aws/config looks some thing like:
[profile a]
...
[profile b]
...
I also have a AWS_PROFILE environmental variable set as "a".
I would like to read a file in which is accessible with profile b using pandas.
I am able to access it through s3fs by doing:
import s3fs
fs = s3fs.S3FileSystem(profile="b")
fs.get("BUCKET/FILE.parquet", "FILE.parquet")
pd.read_parquet("FILE.parquet")
However, if I try to pass this to pd.read_parquet using storage_options I get a PermissionError: Forbidden.
pd.read_parquet(
"s3://BUCKET/FILE.parquet",
storage_options={"profile": "b"},
)
full Traceback below
Traceback (most recent call last):
File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/s3fs/core.py", line 233, in _call_s3
out = await method(**additional_kwargs)
File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/aiobotocore/client.py", line 154, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/pandas/io/parquet.py", line 459, in read_parquet
return impl.read(
File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/pandas/io/parquet.py", line 221, in read
return self.api.parquet.read_table(
File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/pyarrow/parquet.py", line 1672, in read_table
dataset = _ParquetDatasetV2(
File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/pyarrow/parquet.py", line 1504, in __init__
if filesystem.get_file_info(path_or_paths).is_file:
File "pyarrow/_fs.pyx", line 438, in pyarrow._fs.FileSystem.get_file_info
File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/_fs.pyx", line 1004, in pyarrow._fs._cb_get_file_info
File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/pyarrow/fs.py", line 226, in get_file_info
info = self.fs.info(path)
File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/fsspec/asyn.py", line 72, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/fsspec/asyn.py", line 53, in sync
raise result[0]
File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/fsspec/asyn.py", line 20, in _runner
result[0] = await coro
File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/s3fs/core.py", line 911, in _info
out = await self._call_s3(
File "/home/ray/local/bin/anaconda3/envs/main/lib/python3.8/site-packages/s3fs/core.py", line 252, in _call_s3
raise translate_boto_error(err)
PermissionError: Forbidden
Note: there is an old question somewhat related to this but it didn't help: How to read parquet file from s3 using dask with specific AWS profile

You just need to add the following argument to the function:
storage_options=dict(profile='your_profile_name')
Hence the read statement is:
pd.read_parquet("s3://your_bucket",storage_options=dict(profile='your_profile_name'))

Related

(Scrapy-Redis) Error caught on signal handler: <bound method RedisMixin.spider_idle of spider

I have three machines on Azure. One is for redis server. The others are crawlers. But, they showed this message after few hours. Did anyone encounter this situation before? Thanks.
2023-02-04 10:58:14 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method RedisMixin.spider_idle of <Spider at 0x29ca19e9580>>
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\redis\connection.py", line 812, in read_response
response = self._parser.read_response(disable_decoding=disable_decoding)
File "C:\ProgramData\Anaconda3\lib\site-packages\redis\connection.py", line 318, in read_response
raw = self._buffer.readline()
File "C:\ProgramData\Anaconda3\lib\site-packages\redis\connection.py", line 249, in readline
self._read_from_socket()
File "C:\ProgramData\Anaconda3\lib\site-packages\redis\connection.py", line 192, in _read_from_socket
data = self._sock.recv(socket_read_size)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy\utils\signal.py", line 30, in send_catch_log
response = robustApply(receiver, signal=signal, sender=sender, *arguments, **named)
File "C:\ProgramData\Anaconda3\lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
return receiver(*arguments, **named)
File "C:\ProgramData\Anaconda3\lib\site-packages\scrapy_redis\spiders.py", line 205, in spider_idle
if self.server is not None and self.count_size(self.redis_key) > 0:
File "C:\ProgramData\Anaconda3\lib\site-packages\redis\commands\core.py", line 2604, in llen
return self.execute_command("LLEN", name)
File "C:\ProgramData\Anaconda3\lib\site-packages\redis\client.py", line 1258, in execute_command
return conn.retry.call_with_retry(
File "C:\ProgramData\Anaconda3\lib\site-packages\redis\retry.py", line 49, in call_with_retry
fail(error)
File "C:\ProgramData\Anaconda3\lib\site-packages\redis\client.py", line 1262, in <lambda>
lambda error: self._disconnect_raise(conn, error),
File "C:\ProgramData\Anaconda3\lib\site-packages\redis\client.py", line 1248, in _disconnect_raise
raise error
File "C:\ProgramData\Anaconda3\lib\site-packages\redis\retry.py", line 46, in call_with_retry
return do()
File "C:\ProgramData\Anaconda3\lib\site-packages\redis\client.py", line 1259, in <lambda>
lambda: self._send_command_parse_response(
File "C:\ProgramData\Anaconda3\lib\site-packages\redis\client.py", line 1235, in _send_command_parse_response
return self.parse_response(conn, command_name, **options)
File "C:\ProgramData\Anaconda3\lib\site-packages\redis\client.py", line 1275, in parse_response
response = connection.read_response()
File "C:\ProgramData\Anaconda3\lib\site-packages\redis\connection.py", line 818, in read_response
raise ConnectionError(f"Error while reading from {hosterr}" f" : {e.args}")
redis.exceptions.ConnectionError: Error while reading from X.X.X.X:X : (10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)
Expect to get the clue/suggestion/solution.

Python 3.8 Downloading Packages/Modules error using PIP

I am trying to install numpy but it is giving this error please help what should I do ?
ERROR: Exception:
Traceback (most recent call last):
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_vendor\urllib3\response.py", line 425, in _error_catcher
yield
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_vendor\urllib3\response.py", line 507, in read
data = self._fp.read(amt) if not fp_closed else b""
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_vendor\cachecontrol\filewrapper.py", line 62, in read
data = self.__fp.read(amt)
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\http\client.py", line 454, in read
n = self.readinto(b)
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\http\client.py", line 498, in readinto
n = self.fp.readinto(b)
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_internal\cli\base_command.py", line 186, in _main
status = self.run(options, args)
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_internal\commands\install.py", line 331, in run
resolver.resolve(requirement_set)
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_internal\legacy_resolve.py", line 177, in resolve
discovered_reqs.extend(self._resolve_one(requirement_set, req))
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_internal\legacy_resolve.py", line 333, in _resolve_one
abstract_dist = self._get_abstract_dist_for(req_to_install)
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_internal\legacy_resolve.py", line 282, in _get_abstract_dist_for
abstract_dist = self.preparer.prepare_linked_requirement(req)
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_internal\operations\prepare.py", line 480, in prepare_linked_requirement
local_path = unpack_url(
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_internal\operations\prepare.py", line 282, in unpack_url
return unpack_http_url(
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_internal\operations\prepare.py", line 158, in unpack_http_url
from_path, content_type = _download_http_url(
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_internal\operations\prepare.py", line 303, in _download_http_url
for chunk in download.chunks:
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_internal\utils\ui.py", line 160, in iter
for x in it:
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_internal\network\utils.py", line 15, in response_chunks
for chunk in response.raw.stream(
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_vendor\urllib3\response.py", line 564, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_vendor\urllib3\response.py", line 529, in read
raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "c:\users\cutea\appdata\local\programs\python\python38-32\lib\site-packages\pip\_vendor\urllib3\response.py", line 430, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.
Look directly at the last line :
Read timed out
Connect to wifi or faster internet and try again.
my internet connection was poor then i got this error. Then i tried it with faster connection and it worked for me...

Apache BEAM pipeline fails when writing TF Records - AttributeError: 'str' object has no attribute 'iteritems'

The issue started appearing over the weekend. For some reason, it feels to be a DataFlow issue.
Previously, I was able to execute the script and write TF records just fine. However, now, I am unable to initialize the computation graph to process the data.
The traceback is:
Traceback (most recent call last):
File "my_script.py", line 1492, in <module>
MyBeamClass()
File "my_script.py", line 402, in __init__
self.run()
File "my_script.py", line 514, in run
transform_fn_io.WriteTransformFn(path=self.JOB_DIR + '/transform/'))
File "/anaconda3/envs/ml27/lib/python2.7/site-packages/apache_beam/pipeline.py", line 426, in __exit__
self.run().wait_until_finish()
File "/anaconda3/envs/ml27/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1238, in wait_until_finish
(self.state, getattr(self._runner, 'last_error_msg', None)), self)
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 649, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 176, in execute
op.start()
File "apache_beam/runners/worker/operations.py", line 531, in apache_beam.runners.worker.operations.DoOperation.start
def start(self):
File "apache_beam/runners/worker/operations.py", line 532, in apache_beam.runners.worker.operations.DoOperation.start
with self.scoped_start_state:
File "apache_beam/runners/worker/operations.py", line 533, in apache_beam.runners.worker.operations.DoOperation.start
super(DoOperation, self).start()
File "apache_beam/runners/worker/operations.py", line 202, in apache_beam.runners.worker.operations.Operation.start
def start(self):
File "apache_beam/runners/worker/operations.py", line 206, in apache_beam.runners.worker.operations.Operation.start
self.setup()
File "apache_beam/runners/worker/operations.py", line 480, in apache_beam.runners.worker.operations.DoOperation.setup
with self.scoped_start_state:
File "apache_beam/runners/worker/operations.py", line 485, in apache_beam.runners.worker.operations.DoOperation.setup
pickler.loads(self.spec.serialized_fn))
File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 247, in loads
return dill.loads(s)
File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 317, in loads
return load(file, ignore)
File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 305, in load
obj = pik.load()
File "/usr/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/usr/lib/python2.7/pickle.py", line 1232, in load_build
for k, v in state.iteritems():
AttributeError: 'str' object has no attribute 'iteritems'
I am using tensorflow==1.13.1 and tensorflow-transform==0.9.0 and apache_beam==2.7.0
with beam.Pipeline(options=self.pipe_opt) as p:
with beam_impl.Context(temp_dir=self.google_cloud_options.temp_location):
# rest of the script
_ = (
transform_fn
| 'WriteTransformFn' >>
transform_fn_io.WriteTransformFn(path=self.JOB_DIR + '/transform/'))
I was experiencing the same error.
It seems to be triggered by a mismatch in the tensorflow-transform versions of your local (or master) machine and the workers one (specified in the setup.py file).
In my case I was running tensorflow-transform==0.13 on my local machine whereas the workers were running 0.8.
Downgrading the local version to 0.8 fixed the issue.

sqlalchemy.orm.exc.DetachedInstanceError depending on Pyramid webapp serving method?

This error in thrown only on production Apache2 server, and not at all when using pserve method to test the project locally :
sqlalchemy.orm.exc.DetachedInstanceError: Parent instance <My_Table at 0x7fc82076d2d0> is not bound to a Session; lazy load operation of attribute 'my_table_relationship' cannot proceed
Crashing code in template is:
<tal:x repeat="oMot_cn oMot_mcnii_cni.r_mcnii_cp.r_newphonets">
This kind of chained relationships pattern had been working well so far, I am just trying to load the whole query object at launching time of the Pyramid app, instead of loading a new one each time a request is handled by the view/controller...
It keeps crashing on production while it does not crash on pserve dev testing, so it's hard to debug.
Whole traceback on production server:
mod_wsgi (pid=15170): Exception occurred processing WSGI script '/var/local/env_py3/env_py3_2an/MyProject/myproject.wsgi'.
Traceback (most recent call last):
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/chameleon/template.py", line 170, in render
self._render(stream, econtext, rcontext)
File "tp_lesson_T_ex_mots_from_audio_765c6cd23ec517b3893410ce969cf2d9.py", line 584, in render
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/chameleon/py26.py", line 5, in lookup_attr
return getattr(obj, key)
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/sqlalchemy/orm/attributes.py", line 233, in __get__
return self.impl.get(instance_state(instance), dict_)
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/sqlalchemy/orm/attributes.py", line 579, in get
value = self.callable_(state, passive)
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/sqlalchemy/orm/strategies.py", line 479, in _load_for_state
(orm_util.state_str(state), self.key)
sqlalchemy.orm.exc.DetachedInstanceError: Parent instance <Caract_phonet at 0x7fc82076d2d0> is not bound to a Session; lazy load operation of attribute 'r_newphonets' cannot proceed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/fanstatic/publisher.py", line 219, in __call__
return self.app(environ, start_response)
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/fanstatic/injector.py", line 64, in __call__
response = request.get_response(self.app)
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/webob/request.py", line 1296, in send
application, catch_exc_info=False)
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/webob/request.py", line 1260, in call_application
app_iter = application(self.environ, start_response)
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/pyramid/router.py", line 251, in __call__
response = self.invoke_subrequest(request, use_tweens=True)
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/pyramid/router.py", line 227, in invoke_subrequest
response = handle_request(request)
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/pyramid/tweens.py", line 21, in excview_tween
response = handler(request)
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/pyramid_tm-0.7-py3.2.egg/pyramid_tm/__init__.py", line 82, in tm_tween
reraise(*exc_info)
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/pyramid_tm-0.7-py3.2.egg/pyramid_tm/compat.py", line 13, in reraise
raise value
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/pyramid_tm-0.7-py3.2.egg/pyramid_tm/__init__.py", line 63, in tm_tween
response = handler(request)
File "/var/local/env_py3/env_py3_2an/lib/python3.2/site-packages/pyramid/router.py", line 161, in handle_request
response = view_callable(context, request)
Is there any missing PasteDeploy configuration to get it crashing the same way it does once it's pushed to production server ?

python alternative for devices which does not support SSH exec_command

I am new to python and want to know alternate way for doing the following.
I am having issue with the exec_command of paramiko...
Following is the code:
sshdell = paramiko.SSHClient()
sshdell.set_missing_host_key_policy(paramiko.AutoAddPolicy())
sshdell.connect('ip', port=22, username='user', password='pwd')
stdin,stdout,stderr = sshdell.exec_command("ping 4.2.2.2 interface X1")
ping_check = stdout.readlines()
for line in ping_check:
print(line)
the given error is thrown.
Traceback (most recent call last):
File "delltest.py", line 36, in <module>
stdin,stdout,stderr = sshdell.exec_command("ping 4.2.2.2 interface X1")
File "C:\python35\lib\site-packages\paramiko\client.py", line 441, in exec_command
chan.exec_command(command)
File "C:\python35\lib\site-packages\paramiko\channel.py", line 60, in _check
return func(self, *args, **kwds)
File "C:\python35\lib\site-packages\paramiko\channel.py", line 234, in exec_command
self._wait_for_event()
File "C:\python35\lib\site-packages\paramiko\channel.py", line 1161, in _wait_for_event
raise e
paramiko.ssh_exception.SSHException: Channel closed.
Please suggest as my device may not support the exec_command() function.