ERROR trying to load data into Google Colab from disk - google-colaboratory

I am trying to open and load some data from disk in Google Colab, but I get the following error message:
FileNotFoundError Traceback (most recent call last)
<ipython-input-38-cc9c795dc8d8> in <module>()
----> 1 test=open(r"C:\Users\Stefanos\Desktop\ΑΕΡΟΜΑΓΝΗΤΙΚΑ PUBLICATION\data\test.txt",mode="r")
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Stefanos\\Desktop\\ΑΕΡΟΜΑΓΝΗΤΙΚΑ PUBLICATION\\data\\test.txt'
the error occurs by this code:
test=open(r"C:\Users\Stefanos\Desktop\ΑΕΡΟΜΑΓΝΗΤΙΚΑ PUBLICATION\data\test.txt",mode="r")

Your problem is that you are trying to load from disk using a path on your own computer!
Colab gives you a completely different machine in the cloud to work with, so it won't be able to open files on your computer.
You have to upload the files to Colab first.
Use this function to upload files. It will save them to the Colab filesystem as well.
def upload_files():
    from google.colab import files
    uploaded = files.upload()
    for k, v in uploaded.items():
        open(k, 'wb').write(v)
    return list(uploaded.keys())
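To see why this fixes the FileNotFoundError, here is a minimal sketch (with a stand-in dict replacing files.upload(), which only exists inside Colab): the helper writes each uploaded file into the current working directory, after which a plain relative path opens it.

```python
# Stand-in for files.upload(), which returns {filename: bytes}.
uploaded = {"test.txt": b"hello from disk"}

# This is what the helper's loop does: recreate each uploaded
# file in the current working directory of the Colab VM.
for k, v in uploaded.items():
    with open(k, "wb") as f:
        f.write(v)

# Now a relative path works -- not the C:\Users\... path, which
# only exists on your own machine, not on the Colab VM.
with open("test.txt", "r") as f:
    print(f.read())  # hello from disk
```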

Related

Error regarding no such file or directory

I am new to Python and am using Google Colab. I am learning football (soccer) analytics with Python and am stuck loading Wyscout data.
The error reads FileNotFoundError: [Errno 2] No such file or directory: '/content/Wyscout/competitions.json'.
The code is
path = os.path.join(str(pathlib.Path().resolve()), 'Wyscout', 'competitions.json')
with open(path) as f:
    data = json.load(f)
The error occurs in the second line of the above code.
Can anyone help?
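A common way to narrow down this kind of FileNotFoundError is to check whether the constructed path actually exists before opening it. A general sketch (the Wyscout folder name here is just the one from the question):

```python
import json
import os
import pathlib

# Build the same kind of path as in the question.
base = pathlib.Path().resolve()
path = os.path.join(str(base), "Wyscout", "competitions.json")

# Check before calling open(), so you learn which part is missing.
print("working directory:", base)
print("path exists?", os.path.exists(path))

if os.path.exists(path):
    with open(path) as f:
        data = json.load(f)
else:
    # In Colab, a missing folder usually means the data was never
    # uploaded (or a mounted Drive path should be used instead).
    print("contents of working dir:", os.listdir(base))
```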

plt.style.use('./deeplearning.mplstyle') is not working

I am trying to run the optional labs of the Machine Learning Specialization from Coursera, and I am stuck on some libraries and functions that
I cannot install:
plt.style.use('./deeplearning.mplstyle')
I got the error message
ModuleNotFoundError Traceback (most recent call last)
in
3 import matplotlib.pyplot as plt
4 print(plt.style.available)
----> 5 plt.style.use('./deeplearning.mplstyle')
OSError: './deeplearning.mplstyle' not found in the style library and input is not a valid URL or path; see `style.available` for list of available styles
What can I do?
This is likely because you did not download all the files from Coursera. Make sure to download all of them, especially deeplearning.mplstyle, lab_utils_common.py, and lab_utils_multi.py, and keep them in one folder.
You need to download the deeplearning.mplstyle file to use the plotting style.
To download this file from the Optional Lab, follow these steps:
Open the Optional Lab from your course.
Click on File -> Open.
Select deeplearning.mplstyle and choose the download option at the top.
Save this file to your working directory. To use plt.style.use('./deeplearning.mplstyle') as is, make sure your main code file and deeplearning.mplstyle are in the same folder.
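A defensive sketch, assuming only that deeplearning.mplstyle may or may not be present next to the script: try the lab's style file first, and fall back to a built-in matplotlib style so the notebook still runs.

```python
import os
import matplotlib.pyplot as plt

style_file = "./deeplearning.mplstyle"

# Use the course's style sheet when it sits next to this script;
# otherwise fall back to a style that ships with matplotlib.
if os.path.exists(style_file):
    plt.style.use(style_file)
else:
    plt.style.use("ggplot")  # any entry of plt.style.available works

print(plt.style.available[:3])
```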

How to download netCDF4 file from webpage?

I want to download a netCDF4 file from a webpage. I can download the data file, but there seem to be some errors in the file I downloaded using the following code:
import shutil
import requests
from netCDF4 import Dataset

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename
url = 'https://smos-diss.eo.esa.int/oads/data/SMOS_Open_V7/SM_REPR_MIR_SMUDP2_20191222T183243_20191222T192549_700_300_1.nc'
local_filename = download_file(url)
sm_nc = Dataset(local_filename)
But finally I got error message:
Traceback (most recent call last):
File "<ipython-input-98-809c92d8bce8>", line 1, in <module>
sm_nc = Dataset(local_filename)
File "netCDF4/_netCDF4.pyx", line 2321, in netCDF4._netCDF4.Dataset.__init__
File "netCDF4/_netCDF4.pyx", line 1885, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -51] NetCDF: Unknown file format: b'SM_REPR_MIR_SMUDP2_20191222T183243_20191222T192549_700_300_1.nc'
I also simply tried urllib.request.urlretrieve(url, './1.nc'), then sm_nc = Dataset('./1.nc'), but just got the following error message:
Traceback (most recent call last):
File "<ipython-input-101-61d1f577421e>", line 1, in <module>
sm_nc = Dataset('./1.nc')
File "netCDF4/_netCDF4.pyx", line 2321, in netCDF4._netCDF4.Dataset.__init__
File "netCDF4/_netCDF4.pyx", line 1885, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -51] NetCDF: Unknown file format: b'./1.nc'
But the thing is, if I paste the URL into the address bar of Safari or Chrome and click download, the file I get is readable by netCDF4.Dataset. (You could try that as well.) I have tried many other solutions, but none of them worked. Could anybody do me a favour? Thanks!
By the way, I am using requests 2.26.0, netCDF4 1.5.3, and the urllib.request module from Python 3.7.
You probably want to use urlretrieve. The following call to urllib should work:
import urllib
new_x = "/tmp/temp.nc"
x = "https://smos-diss.eo.esa.int/oads/data/SMOS_Open_V7/SM_REPR_MIR_SMUDP2_20191222T183243_20191222T192549_700_300_1.nc"
urllib.request.urlretrieve(x, new_x)
When I try wget, it gives me an .nc file, but its size is only 19 KB. You can use wget from Python if that file is okay for you:
wget https://smos-diss.eo.esa.int/oads/data/SMOS_Open_V7/SM_REPR_MIR_SMUDP2_20191222T183243_20191222T192549_700_300_1.nc
But it is not readable, because accessing the URL without logging in to the site returns a meaningless file. If you paste the link into your browser and log in, you get a 6 MB file, which I am sure is readable. If you still want to fetch the file from a Python script, look at Selenium, which can click through the website so you can log in and then download the file from the script.
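One way to confirm this theory is to inspect the first bytes of whatever was downloaded: real NetCDF files start with a magic signature (b'CDF' for classic files, b'\x89HDF' for netCDF-4/HDF5 files), while a login page starts with HTML. A minimal sketch (the saved filename here is just an example):

```python
# Distinguish a real NetCDF download from an HTML login page by
# checking the file's magic bytes instead of trusting its name.
def looks_like_netcdf(path):
    with open(path, "rb") as f:
        head = f.read(8)
    # Classic NetCDF files begin with b'CDF'; netCDF-4 files are
    # HDF5 containers and begin with b'\x89HDF'.
    return head.startswith(b"CDF") or head.startswith(b"\x89HDF")

# Example: a fake "download" that is actually an HTML error page.
with open("downloaded.nc", "wb") as f:
    f.write(b"<!DOCTYPE html><html>login required</html>")

print(looks_like_netcdf("downloaded.nc"))  # False -> not NetCDF
```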

How to Include a Directory Structure in Analytics Zoo

I want to execute a file on Analytics Zoo, but this file uses functions from other files in different subdirectories.
I am getting this error:
LogType:stdout LogLastModifiedTime:Tue Jun 15 06:43:07 -0500 2021 LogLength:157
LogContents:
Traceback (most recent call last):
  File "main.py", line 7, in <module>
    from args import define_main_parser
ModuleNotFoundError: No module named 'args'
End of LogType:stdout
Here args.py is a separate file (I provided its path in --py-files when submitting the Spark job).
Please try adding your module's directory to the PYTHONPATH environment variable on the driver node, using export PYTHONPATH=$PYTHONPATH:/path/to/your/modules.
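An alternative sketch in Python itself (the utils/ directory name is illustrative, not from the question): prepend the directory containing args.py to sys.path before the import, which is roughly what extending PYTHONPATH does.

```python
import os
import sys

# Hypothetical layout: the helper modules live in ./utils next to
# main.py. Prepending that directory to sys.path lets a bare
# "import args" resolve, mirroring the PYTHONPATH export.
module_dir = os.path.join(os.getcwd(), "utils")
sys.path.insert(0, module_dir)

# from args import define_main_parser  # would now resolve if
#                                      # utils/args.py exists
print(module_dir in sys.path)  # True
```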

Unable to connect to endpoint when writing to S3 using Tensorflow

TensorFlow 1.4.0 comes with an S3 filesystem driver by default. I'm having trouble using it; here is a minimal example that does not work for me:
import tensorflow as tf
f = tf.gfile.Open("s3://bucket/plipp", mode='w')
f.write("foo")
f.close()
which gives the following error:
Traceback (most recent call last):
File "test2.py", line 5, in <module>
f.close()
File "/Users/me/venv3/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 234, in close
pywrap_tensorflow.Set_TF_Status_from_Status(status, ret_status)
File "/Users/me/venv3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: : Unable to connect to endpoint
From what I can see, "Unable to connect to endpoint" seems to be an error from the C++ AWS SDK. I've given myself * permissions on the bucket.
My bucket is in eu-west-1 and I've tried doing export S3_ENDPOINT=https://s3-eu-west-1.amazonaws.com and export S3_REGION=eu-west-1 since it seems that those variables are consumed by the S3 driver, but this changes nothing.
I've also tried using s3://bucket.s3-eu-west-1.amazonaws.com/plipp as the path, instead of just using the bucket name.
I can copy files to the bucket fine:
~> aws s3 cp foo s3://bucket/plipp
upload: ./foo to s3://bucket/plipp
Any ideas what I might be doing wrong? How can I debug further?
I'm not quite sure what went wrong last time I tried this, but now I got it working by just doing export S3_REGION=eu-west-1 and writing to the bucket with
with tf.gfile.Open("s3://bucket/plipp", mode='w') as f:
    f.write("foo")
So, don't export the S3_ENDPOINT variable.
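If you'd rather set the region from inside the script than in the shell, one sketch (the bucket name stays a placeholder, as in the question) is to set S3_REGION via os.environ before TensorFlow's S3 filesystem is first used:

```python
import os

# Environment variables must be in place before the S3 filesystem
# is first touched, so do this at the very top of the script.
os.environ["S3_REGION"] = "eu-west-1"
os.environ.pop("S3_ENDPOINT", None)  # make sure this one is NOT set

# Later: import tensorflow and write to the bucket as in the
# answer above, e.g. tf.gfile.Open("s3://bucket/plipp", mode='w').
print(os.environ["S3_REGION"])  # eu-west-1
```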