Error regarding no such file or directory - google-colaboratory

I am new to Python and currently using Google Colab. I am learning Football (Soccer) Analytics with Python and am stuck loading the Wyscout data.
The error reads FileNotFoundError: [Errno 2] No such file or directory: '/content/Wyscout/competitions.json'.
The code is:
path = os.path.join(str(pathlib.Path().resolve()), 'Wyscout', 'competitions.json')
with open(path) as f:
    data = json.load(f)
The error occurs on the second line of the above code (the open call).
Can anyone help?
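A minimal diagnostic sketch, not part of the original question, assuming the notebook's working directory is /content: list the directory first to confirm the Wyscout folder has actually been uploaded or mounted before opening the file.
import json
import os
import pathlib

base = pathlib.Path().resolve()
print(base, os.listdir(base))  # is there a 'Wyscout' folder here at all?

path = os.path.join(str(base), 'Wyscout', 'competitions.json')
if os.path.exists(path):
    with open(path) as f:
        data = json.load(f)
else:
    print("competitions.json not found - upload or mount the Wyscout data into /content first")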

Related

ERROR trying to load data into Google Colab from disk

I am trying to open and load some data from disk in Google Colab, but I get the following error message:
FileNotFoundError Traceback (most recent call last)
<ipython-input-38-cc9c795dc8d8> in <module>()
----> 1 test=open(r"C:\Users\Stefanos\Desktop\ΑΕΡΟΜΑΓΝΗΤΙΚΑ PUBLICATION\data\test.txt",mode="r")
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Stefanos\\Desktop\\ΑΕΡΟΜΑΓΝΗΤΙΚΑ PUBLICATION\\data\\test.txt'
the error occurs by this code:
test=open(r"C:\Users\Stefanos\Desktop\ΑΕΡΟΜΑΓΝΗΤΙΚΑ PUBLICATION\data\test.txt",mode="r")
Your problem is that you are trying to load from disk using a path on your own computer!
Colab gives you a completely different machine in the cloud to work with, so it won't be able to open the files on your computer.
You have to upload the files to Colab.
Use this function to upload files. It will save them as well.
def upload_files():
    from google.colab import files
    uploaded = files.upload()  # opens a browser dialog to choose local files
    for k, v in uploaded.items():
        open(k, 'wb').write(v)  # write each uploaded file into the Colab filesystem
    return list(uploaded.keys())
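A short usage sketch, assuming the function above has been defined in the same notebook: pick test.txt in the dialog that appears, then open it with a plain relative path instead of the Windows path from the question.
uploaded_names = upload_files()  # a file-picker dialog appears in the notebook
print(uploaded_names)

with open(uploaded_names[0], mode="r") as test:
    print(test.read())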

Anaconda 2020.07 with python3.8 lacks support for 'snappy' compressor in blosc?

I'm loading an HDF file written with pandas.to_hdf(..., complib="blosc:snappy") under Python 3.7, installed via Anaconda.
After I upgraded Anaconda to Python 3.8, it shows:
HDF5ExtError: HDF5 error back trace
File "C:\ci\hdf5_1545244154871\work\src\H5Dio.c", line 199, in H5Dread
can't read data
File "C:\ci\hdf5_1545244154871\work\src\H5Dio.c", line 601, in H5D__read
can't read data
File "C:\ci\hdf5_1545244154871\work\src\H5Dchunk.c", line 2229, in H5D__chunk_read
unable to read raw data chunk
File "C:\ci\hdf5_1545244154871\work\src\H5Dchunk.c", line 3609, in H5D__chunk_lock
data pipeline read failed
File "C:\ci\hdf5_1545244154871\work\src\H5Z.c", line 1326, in H5Z_pipeline
filter returned failure during read
File "hdf5-blosc/src/blosc_filter.c", line 188, in blosc_filter
this Blosc library does not have support for the 'snappy' compressor, but only for: blosclz,lz4,lz4hc,zlib,zstd
End of HDF5 error back trace
Problems reading the array data.
It seems that Blosc 1.19.0 either deprecates support for 'snappy' or no longer includes it by default. How can I solve this?
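One possible workaround, sketched under the assumption that an environment with an older blosc build that still handles 'snappy' (such as the previous Python 3.7 Anaconda install) is still available: read the file there once and rewrite it with a compressor the new build lists as supported, for example blosc:zstd. The file name and key below are placeholders.
import pandas as pd

# Run this in the environment whose blosc can still decompress 'snappy'.
df = pd.read_hdf("data.h5", key="df")  # placeholder path and key

# Rewrite with a compressor the newer blosc build supports.
df.to_hdf("data_zstd.h5", key="df", complib="blosc:zstd", complevel=9)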

Error: File b'tpu_train/metadata/results.csv' does not exist: b'tpu_train/metadata/results.csv' when running automl_gs on Colab

I came across this interesting automl library, and was following this guide to run the module on Colab. In my Colab instance, I have this code:
!pip install automl_gs

import os
from automl_gs import automl_grid_search
from google.colab import files

tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
automl_grid_search(csv_path='20190419.csv',
                   target_field='UpdatedSiteName',
                   model_name='tpu',
                   tpu_address=tpu_address)
and this results in the following error:
FileNotFoundError: [Errno 2] File b'tpu_train/metadata/results.csv' does not exist: b'tpu_train/metadata/results.csv'
I'm wondering if anyone on Stack Overflow has tried this automl_gs module on Colab and got it working. Based on the guide, it is supposed to create a tpu_* folder on its own, but it is creating the results.csv file somewhere else, as shown in the attached screenshot.
Any tip/advice on how to overcome this would be greatly appreciated! Thank you.
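A small diagnostic sketch, not part of the original post: printing the working directory and searching for the generated files can show where automl_gs actually wrote its output relative to the tpu_train/metadata/results.csv path it later tries to read.
import glob
import os

print(os.getcwd())                                  # the notebook's working directory
print(glob.glob('tpu_*'))                           # folders automl_gs created here
print(glob.glob('**/results.csv', recursive=True))  # wherever results.csv actually landed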

Unable to connect to endpoint when writing to S3 using Tensorflow

Tensorflow 1.4.0 comes with the S3 filesystem driver by default. I'm having trouble using it; here is a minimal example that does not work for me:
import tensorflow as tf
f = tf.gfile.Open("s3://bucket/plipp", mode='w')
f.write("foo")
f.close()
which gives the following error:
Traceback (most recent call last):
File "test2.py", line 5, in <module>
f.close()
File "/Users/me/venv3/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 234, in close
pywrap_tensorflow.Set_TF_Status_from_Status(status, ret_status)
File "/Users/me/venv3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: : Unable to connect to endpoint
From what I can see, it seems like "Unable to connect to endpoint" is an error from the C++ AWS SDK. I've given myself * permissions to the bucket.
My bucket is in eu-west-1 and I've tried doing export S3_ENDPOINT=https://s3-eu-west-1.amazonaws.com and export S3_REGION=eu-west-1 since it seems that those variables are consumed by the S3 driver, but this changes nothing.
I've also tried using s3://bucket.s3-eu-west-1.amazonaws.com/plipp as the path, instead of just using the bucket name.
I can copy files to the bucket fine:
~> aws s3 cp foo s3://bucket/plipp
upload: ./foo to s3://bucket/plipp
Any ideas what I might be doing wrong? How can I debug further?
I'm not quite sure what went wrong last time I tried this, but now I got it working by just doing export S3_REGION=eu-west-1 and writing to the bucket with:
with tf.gfile.Open("s3://bucket/plipp", mode='w') as f:
    f.write("foo")
So, don't export the S3_ENDPOINT variable.
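A variant sketch, assuming the same TF 1.4 setup: the region can also be set from inside the script before the first S3 access, leaving the endpoint unset as noted above.
import os

import tensorflow as tf

os.environ['S3_REGION'] = 'eu-west-1'  # set before TensorFlow first touches S3
os.environ.pop('S3_ENDPOINT', None)    # leave the endpoint unset, as noted above

with tf.gfile.Open("s3://bucket/plipp", mode='w') as f:
    f.write("foo")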

Scrapy - Issue with setting FILES_STORE?

So I have a custom pipeline that extends Scrapy's FilesPipeline. However, I'm having trouble setting the FILES_STORE variable. My current file structure is:
my_scraper.py
files/  # this is where I want the files to download to
So I set FILES_STORE = '/files/' and run the spider. But when I do that, I get the following error:
PermissionError: [Errno 13] Permission denied: '/files/'
Why does this happen? Is there anything that I am doing wrong?
If it's useful to anyone else, it was a simple error: FILES_STORE requires the full path, not just the relative path from the project folder. With the leading slash, '/files/' points at a directory in the filesystem root, which is what triggers the permission error.
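A minimal sketch of that fix, assuming a standard Scrapy settings.py sitting next to the layout shown above; the BASE_DIR name is just an illustration.
# settings.py
import os

BASE_DIR = os.path.dirname(os.path.abspath(__file__))

# Absolute path to the project's files/ directory,
# instead of '/files/' at the filesystem root.
FILES_STORE = os.path.join(BASE_DIR, 'files')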