Unable to connect to endpoint when writing to S3 using Tensorflow - amazon-s3

Tensorflow 1.4.0 comes with the S3 filesystem driver by default. I'm having trouble using it; here is a minimal example that does not work for me:
import tensorflow as tf
f = tf.gfile.Open("s3://bucket/plipp", mode='w')
f.write("foo")
f.close()
which gives the following error:
Traceback (most recent call last):
File "test2.py", line 5, in <module>
f.close()
File "/Users/me/venv3/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 234, in close
pywrap_tensorflow.Set_TF_Status_from_Status(status, ret_status)
File "/Users/me/venv3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: : Unable to connect to endpoint
From what I can see, it seems like "Unable to connect to endpoint" is an error from the C++ AWS SDK. I've given myself * permissions to the bucket.
My bucket is in eu-west-1 and I've tried doing export S3_ENDPOINT=https://s3-eu-west-1.amazonaws.com and export S3_REGION=eu-west-1 since it seems that those variables are consumed by the S3 driver, but this changes nothing.
I've also tried using s3://bucket.s3-eu-west-1.amazonaws.com/plipp as the path, instead of just using the bucket name.
I can copy files to the bucket fine:
~> aws s3 cp foo s3://bucket/plipp
upload: ./foo to s3://bucket/plipp
Any ideas what I might be doing wrong? How can I debug further?

I'm not quite sure what went wrong last time I tried this, but now I got it working by just doing export S3_REGION=eu-west-1 and writing to the bucket with
with tf.gfile.Open("s3://bucket/plipp", mode='w') as f:
    f.write("foo")
So, don't export the S3_ENDPOINT variable.
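For reference, here is a minimal sketch of the working version with the region set from inside the script rather than the shell, under the assumption that the S3 driver reads S3_REGION from the environment at the time of the first S3 access (bucket and key are the placeholders from the question):
import os

# Assumption: setting S3_REGION before any s3:// path is touched is enough;
# S3_ENDPOINT is deliberately left unset.
os.environ["S3_REGION"] = "eu-west-1"

import tensorflow as tf

with tf.gfile.Open("s3://bucket/plipp", mode='w') as f:
    f.write("foo")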

Related

How to download netCDF4 file from webpage?

I want to download a netCDF4 file from a webpage. I can download the data file, but there seem to be some errors in the file I downloaded using the following code:
import shutil

import requests
from netCDF4 import Dataset

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename

url = 'https://smos-diss.eo.esa.int/oads/data/SMOS_Open_V7/SM_REPR_MIR_SMUDP2_20191222T183243_20191222T192549_700_300_1.nc'
local_filename = download_file(url)
sm_nc = Dataset(local_filename)
But then I got this error message:
Traceback (most recent call last):
File "<ipython-input-98-809c92d8bce8>", line 1, in <module>
sm_nc = Dataset(local_filename)
File "netCDF4/_netCDF4.pyx", line 2321, in netCDF4._netCDF4.Dataset.__init__
File "netCDF4/_netCDF4.pyx", line 1885, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -51] NetCDF: Unknown file format: b'SM_REPR_MIR_SMUDP2_20191222T183243_20191222T192549_700_300_1.nc'
I also simply tried urllib.request.urlretrieve(url, './1.nc'), then sm_nc = Dataset('./1.nc'), but just got the following error message:
Traceback (most recent call last):
File "<ipython-input-101-61d1f577421e>", line 1, in <module>
sm_nc = Dataset('./1.nc')
File "netCDF4/_netCDF4.pyx", line 2321, in netCDF4._netCDF4.Dataset.__init__
File "netCDF4/_netCDF4.pyx", line 1885, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -51] NetCDF: Unknown file format: b'./1.nc'
But the thing is, if I paste the URL into the address bar of Safari or Chrome and then click download, the file I get is readable by netCDF4.Dataset. (You could also try that.) I tried many other solutions, but none of them worked. So could anybody do me a favour? Thanks!
By the way, I am using requests 2.26.0 and netCDF4 1.5.3, and urllib.request from Python 3.7.
You probably want to use urlretrieve. The following call to urllib should work:
import urllib
new_x = "/tmp/temp.nc"
x = "https://smos-diss.eo.esa.int/oads/data/SMOS_Open_V7/SM_REPR_MIR_SMUDP2_20191222T183243_20191222T192549_700_300_1.nc"
urllib.request.urlretrieve(x, new_x)
When I try wget, it gives me an .nc file, but its size is only 19 KB, so I am not sure it is complete. You can use wget from Python if that file is okay for you.
wget https://smos-diss.eo.esa.int/oads/data/SMOS_Open_V7/SM_REPR_MIR_SMUDP2_20191222T183243_20191222T192549_700_300_1.nc
But it is not readable, because if you access the URL without logging in to the site, it returns a meaningless file. If you paste the link into your browser and log in, you get a roughly 6 MB file, which I am sure is readable. If you still want to fetch the file with a Python script, look at Selenium, which lets you click through the website, so you can log in and then download the file from a script.
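One way to confirm what was actually downloaded is to look at the file signature: a netCDF classic file starts with b'CDF' and a netCDF-4/HDF5 file starts with b'\x89HDF', while a login page starts with HTML markup. A minimal check, with 'downloaded.nc' as a placeholder for the local file:
# Peek at the first bytes to see whether we got a netCDF file or an HTML login page.
with open("downloaded.nc", "rb") as f:
    magic = f.read(8)

if magic.startswith(b"CDF") or magic.startswith(b"\x89HDF"):
    print("Looks like a netCDF file")
else:
    print("Not a netCDF file; first bytes:", magic)  # e.g. b'<!DOCTYP' for an HTML page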

Tensorflow TFX pipeline in Windows machine is failing when trying to create a folder with Linux like folder naming structure

I am trying to run the simple TFX pipeline on a Windows 10 machine, using the code from the TensorFlow website (https://www.tensorflow.org/tfx/tutorials/tfx/penguin_simple). When I try to run the pipeline, it throws the error below. The folder name uses a mix of '\' and '/' while TFX is trying to create the folder. I am not sure how to solve this issue, as it happens inside TensorFlow's internal code.
ERROR:absl:Failed to make stateful working dir: pipelines\penguin-simple\CsvExampleGen.system\stateful_working_dir\2021-06-24T20:11:37.715669
Traceback (most recent call last):
File "G:\Anaconda3\lib\site-packages\tfx\orchestration\portable\outputs_utils.py", line 211, in get_stateful_working_directory
fileio.makedirs(stateful_working_dir)
File "G:\Anaconda3\lib\site-packages\tfx\dsl\io\fileio.py", line 83, in makedirs
_get_filesystem(path).makedirs(path)
File "G:\Anaconda3\lib\site-packages\tfx\dsl\io\plugins\tensorflow_gfile.py", line 76, in makedirs
tf.io.gfile.makedirs(path)
File "G:\Anaconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 483, in recursive_create_dir_v2
_pywrap_file_io.RecursivelyCreateDir(compat.path_to_bytes(path))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Failed to create a directory: pipelines\penguin-simple\CsvExampleGen.system\stateful_working_dir/2021-06-24T20:11:37.715669; Invalid argument
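The mixed separators suggest that the base directory is joined with Windows-style backslashes while the timestamped subdirectory is appended with a forward slash. Here is a small illustration of how such a path can arise; it only reproduces the pattern from the error message and is not the TFX code itself. Note also that the colons in the timestamp are not legal in Windows directory names, which may be the actual reason the directory creation fails.
import ntpath      # Windows-style path joins (what os.path is on Windows)
import posixpath   # POSIX-style path joins

# Hypothetical reproduction of the mixed-separator path from the error message.
base = ntpath.join("pipelines", "penguin-simple", "CsvExampleGen.system", "stateful_working_dir")
stateful_dir = posixpath.join(base, "2021-06-24T20:11:37.715669")
print(stateful_dir)
# pipelines\penguin-simple\CsvExampleGen.system\stateful_working_dir/2021-06-24T20:11:37.715669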

ERROR trying to load data into Google Colab from disk

I am trying to open and load some data from disk in Google Colab, but I get the following error message:
FileNotFoundError Traceback (most recent call last)
<ipython-input-38-cc9c795dc8d8> in <module>()
----> 1 test=open(r"C:\Users\Stefanos\Desktop\ΑΕΡΟΜΑΓΝΗΤΙΚΑ PUBLICATION\data\test.txt",mode="r")
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Stefanos\\Desktop\\ΑΕΡΟΜΑΓΝΗΤΙΚΑ PUBLICATION\\data\\test.txt'
The error is raised by this code:
test=open(r"C:\Users\Stefanos\Desktop\ΑΕΡΟΜΑΓΝΗΤΙΚΑ PUBLICATION\data\test.txt",mode="r")
Your problem is that you are trying to load from disk using a path on your own computer!
Colab gives you a completely different machine in the cloud to work with, so it won't be able to open the files on your computer.
You have to upload the files to Colab.
Use this function to upload files. It will save them to the Colab filesystem as well.
def upload_files():
    from google.colab import files
    uploaded = files.upload()
    for k, v in uploaded.items():
        open(k, 'wb').write(v)
    return list(uploaded.keys())
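A short usage sketch, assuming you upload the test.txt from the question through the file picker that the helper opens:
# Call the helper, choose the file(s) in the browser dialog, then open them
# by the names they were saved under in the Colab filesystem.
uploaded_names = upload_files()
with open(uploaded_names[0], mode="r") as test:
    print(test.read())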

Anaconda 2020.07 with python3.8 lacks support for 'snappy' compressor in blosc?

I'm loading an HDF file written with pandas.to_hdf(..., complib="blosc:snappy") under Python 3.7 installed by Anaconda.
After I upgraded Anaconda to Python 3.8, it shows:
HDF5ExtError: HDF5 error back trace
File "C:\ci\hdf5_1545244154871\work\src\H5Dio.c", line 199, in H5Dread
can't read data
File "C:\ci\hdf5_1545244154871\work\src\H5Dio.c", line 601, in H5D__read
can't read data
File "C:\ci\hdf5_1545244154871\work\src\H5Dchunk.c", line 2229, in H5D__chunk_read
unable to read raw data chunk
File "C:\ci\hdf5_1545244154871\work\src\H5Dchunk.c", line 3609, in H5D__chunk_lock
data pipeline read failed
File "C:\ci\hdf5_1545244154871\work\src\H5Z.c", line 1326, in H5Z_pipeline
filter returned failure during read
File "hdf5-blosc/src/blosc_filter.c", line 188, in blosc_filter
this Blosc library does not have support for the 'snappy' compressor, but only for: blosclz,lz4,lz4hc,zlib,zstd
End of HDF5 error back trace
Problems reading the array data.
It seems that Blosc 1.19.0 drops support for 'snappy', or no longer includes it by default? How can I solve this?
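One possible workaround, assuming you still have access to an environment whose Blosc build supports snappy (for example the old Python 3.7 Anaconda install), is to re-save the file there with one of the compressors the newer Blosc still lists (blosclz, lz4, lz4hc, zlib, zstd); file paths and the key below are placeholders:
import pandas as pd

# Run this in the OLD environment that can still decode blosc:snappy.
df = pd.read_hdf("data_snappy.h5", key="df")

# Re-save with a compressor the newer Blosc build supports.
df.to_hdf("data_zstd.h5", key="df", complib="blosc:zstd", complevel=9)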

How to fix "AttributeError: 'module' object has no attribute 'SOL_UDP'" error in Python Connector Mule

I'm trying to execute a basic script that returns a Cisco config file in JSON format. It runs successfully on Python 2.7.16 and Python 3.7.3, but when I try to execute the same script through the Python Connector for Mule ESB, I receive the error referred to in the title of this thread.
This is for a Mule feature: the Python connector in this tool runs on Jython 2.7.1 and is loaded as a library by Mule.
I expect the output to be a JSON file, but the actual output is:
Root Exception stack trace:
Traceback (most recent call last):
File "<script>", line 2, in <module>
File "C:\Python27\Lib\site-packages\ciscoconfparse\__init__.py", line 1, in <module>
from ciscoconfparse import *
File "C:\Python27\Lib\site-packages\ciscoconfparse\ciscoconfparse.py", line 17, in <module>
from models_cisco import IOSHostnameLine, IOSRouteLine, IOSIntfLine
File "C:\Python27\Lib\site-packages\ciscoconfparse\models_cisco.py", line 8, in <module>
from ccp_util import _IPV6_REGEX_STR_COMPRESSED1, _IPV6_REGEX_STR_COMPRESSED2
File "C:\Python27\Lib\site-packages\ciscoconfparse\ccp_util.py", line 16, in <module>
from dns.resolver import Resolver
File "C:\Python27\Lib\site-packages\dns\resolver.py", line 1148, in <module>
_protocols_for_socktype = {
AttributeError: 'module' object has no attribute 'SOL_UDP'
The only thing I had to do was comment out that line in resolver.py, and with that the script ran smoothly in Anypoint Studio.
Thanks for your help, I hope this helps other people as well.
The problem appears to be that you are trying to execute a script that depends on a different Python package. Mule supports executing Python scripts using the Java-based Jython implementation, but it probably doesn't know about the script's Python package dependencies.
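A less invasive alternative to commenting out the line in resolver.py, offered only as a sketch under the assumption that Jython's socket module is merely missing the SOL_UDP constant that dns.resolver references at import time (in CPython on Linux its value is 17, the UDP protocol number), is to patch the constant in before the import chain runs:
import socket

# Assumption: Jython 2.7.1 does not define socket.SOL_UDP, which dns.resolver
# uses when building _protocols_for_socktype. Define it before anything
# imports dns.resolver (here, before ciscoconfparse).
if not hasattr(socket, "SOL_UDP"):
    socket.SOL_UDP = 17

from ciscoconfparse import CiscoConfParse  # the import chain should now get past resolver.py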