Google Colab unzipped file won't appear - google-colaboratory

I have downloaded some datasets via the Kaggle API into Colab. However, after unzipping them they do not appear in my directory and I cannot read them with pandas.
As you can see, the files were successfully unzipped, and I then unzipped them again because I couldn't find them. However, they do not appear in the directory, as I mentioned.
Furthermore, pd.read_csv can't read either the CSV files that don't show up or the csv.zip files that do show up, even when using the compression='zip' argument.
I get:
FileNotFoundError: File b'/data/train.csv' does not exist
FileNotFoundError: [Errno 2] No such file or directory: 'data/train.csv.zip'
Any idea what's going on?

Try unzipping them individually, like:
!unzip train.csv.zip
then do:
import numpy as np
import pandas as pd
train = pd.read_csv('train.csv', nrows=6000000, dtype={'acoustic_data': np.int16, 'time_to_failure': np.float64})
I got this from this GitHub repo, which you can either follow step by step or import into Colab and then swap in your own data:
https://github.com/llSourcell/Kaggle_Earthquake_challenge/blob/master/Earthquake_Challenge.ipynb
You can import .ipynb notebooks by searching for them in Colab.
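If the files still don't show up after unzipping, it can help to check where the extraction actually landed before calling pd.read_csv; a quick sketch (the 'data' directory is only an assumption based on the paths in the error messages):
import os
# Show the current working directory and what it contains.
print(os.getcwd())
print(os.listdir('.'))
# If the Kaggle API put the zips under ./data, inspect that too.
print(os.listdir('data'))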

Related

Cannot Locate the file in AWS EMR Notebook

I have been trying to use some .txt and .csv files in an EMR Notebook, but I cannot locate them.
I am trying to read them via:
with open('file.txt', 'r') as f:
    notes = f.read()
Things I tried:
Uploaded the file using the JupyterHub UI. I can see the file, but I can't read it from the path. I also checked the file using the JupyterHub terminal.
Tried to read from S3 (lots of people seem to get it working this way):
with open('s3://<repo>/file.txt', 'r') as f:
Copied the file to HDFS and to Hadoop on the master node (in the cluster) using both hdfs dfs and hadoop fs. The file is present in both locations.
However, I have no clue how I can reach the file in EMR Notebook.
Any ideas?
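For reference, the built-in open() cannot resolve s3:// URLs on its own; a minimal sketch of fetching the object with boto3 instead (the bucket and key names are placeholders):
import boto3
# Read the object directly from S3 into memory.
s3 = boto3.client('s3')
obj = s3.get_object(Bucket='my-bucket', Key='file.txt')
notes = obj['Body'].read().decode('utf-8')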

Is there a way to read the images for my convolutional neural network directly from my desktop?

I'm training a convolutional neural network using Google Colaboratory. I have my data (images) stored in Google Drive and I'm able to use it correctly. However, sometimes the process of reading the images is too slow or does not work at all (other times it is faster and I have no problem reading the images). In order to read the images from Google Drive I use:
from google.colab import drive
drive.mount('/content/drive')
!unzip -u "/content/drive/My Drive/the folder/files.zip"
import glob
from os import path
IMAGE_PATH = '/content/drive/My Drive/the folder'
file_paths = glob.glob(path.join(IMAGE_PATH, '*.png'))
Sometimes this works and other times it doesn't, or it is too slow :).
Either way, I would like to read my data from a folder on my desktop without using Google Drive, but I'm not able to do this.
I'm trying the following:
IMAGE_PATH = 'C:/Users/path/to/my/folder'
file_paths = glob.glob(path.join(IMAGE_PATH, '*.png'))
But I get an error saying that the directory/file does not exist.
Google Colab cannot directly access a dataset on your local machine because it runs on a separate virtual machine in the cloud. You need to upload the dataset to Google Drive, and then you can load it into Google Colab's runtime for model building.
For that you need to follow the steps given below:
Create a zip file of your large dataset and then upload this file to your Google Drive.
Now, open Google Colab with the same Google ID, mount the Google Drive using the code below, and authorize access to the drive:
from google.colab import drive
drive.mount('/content/drive')
Your uploaded zip file will be available in the mounted drive under /content/drive/MyDrive/, visible in the Files pane on the left.
To read the dataset in Google Colab, unzip the folder and extract its contents into the /tmp folder using the code below.
import zipfile
import os
# Open the zip archive in read mode and extract everything into /tmp
with zipfile.ZipFile('/content/drive/MyDrive/train.zip', 'r') as zip_ref:
    zip_ref.extractall('/tmp')
You can check the extracted files under /tmp/train in the Files pane on the left.
Finally, build the path to your dataset so you can use it in Google Colab's runtime environment:
train_dataset = os.path.join('/tmp/train/') # dataset
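From here you can read the images from the VM's local disk instead of Drive, which avoids the slow per-file Drive reads; a minimal sketch, assuming the zip contained .png images directly under train/:
import glob
import os
train_dataset = os.path.join('/tmp/train/')
# Collect the image paths from the local (fast) VM disk.
file_paths = glob.glob(os.path.join(train_dataset, '*.png'))
print(len(file_paths), 'images found')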

VideoColorizerColab.ipynb doesn't communicate with Google Drive links

DownloadError: ERROR: unable to download video data: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Found
Since yesterday I can't use my Google Drive shared file links with VideoColorizerColab.ipynb. I get the error above every time I try to colorize my videos.
Does anyone know what's going on? Thank you, Géza.
You might want to try mounting your Google Drive in your Colab and copying the video into the Colab VM, rather than using the link to download the video.
The code to mount your Google Drive in Colab is:
from google.colab import drive
drive.mount('/content/drive')
After this step, you can use all the content in your Drive as folders in your colab. You can see them in the Files section on the left side of your notebook. You can select a file, right-click and copy path and use the path to do any operation on the file.
This is an example of copying:
!cp -r /content/drive/My\ Drive/headTrainingDatastructure/eval /content/models/research/object_detection/
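If you prefer to do the copy in Python rather than in a shell cell, shutil does the same job; a minimal sketch using the paths from the example above (the destination folder name is an assumption):
import shutil
# Copy the folder from the mounted Drive onto the Colab VM's local disk.
shutil.copytree('/content/drive/My Drive/headTrainingDatastructure/eval',
                '/content/models/research/object_detection/eval')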

Save files/pictures in Google Colaboratory

At the moment, I work with 400+ images and upload them with:
from google.colab import files
uploaded = files.upload()
This works fine, but I have to re-upload all the images every time I leave my Colaboratory session. That's pretty annoying because the upload takes 5-10 minutes.
Is there any way to prevent this? It seems like Colaboratory only saves the files temporarily.
I have to use Google Colaboratory because I need their GPU.
Thanks in advance :)
As far as I know, there is no way to permanently store data on a Google Colab VM, but there are faster ways to upload data to Colab than files.upload().
For example, you can upload your images to Google Drive once and then either 1) mount Google Drive directly in your VM or 2) use PyDrive to download your images to your VM. Both of these options should be much faster than uploading your images from your local drive.
Mounting Drive in your VM
Mount Google Drive:
from google.colab import drive
drive.mount('/gdrive')
Print the contents of foo.txt located in the root directory of Drive:
with open('/gdrive/My Drive/foo.txt') as f:
    for line in f:
        print(line)
Using PyDrive
Take a look at the first answer to this question.
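A minimal PyDrive sketch of the same idea, assuming you have the file's Drive ID (YOUR_FILE_ID and images.zip below are placeholders; PyDrive may first need to be installed with pip):
from google.colab import auth
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from oauth2client.client import GoogleCredentials
# Authenticate inside Colab and hand the credentials to PyDrive.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Download a single file from Drive to the VM's local disk.
downloaded = drive.CreateFile({'id': 'YOUR_FILE_ID'})
downloaded.GetContentFile('images.zip')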
First of all, mount your Google Drive:
# Load the Drive helper and mount
from google.colab import drive
# This will prompt for authorization.
drive.mount('/content/drive')
The result is:
Mounted at /content/drive
To check the mounted directory, run this command:
# After executing the cell above, Drive
# files will be present in "/content/drive/My Drive".
!ls "/content/drive/My Drive"
The result is something like this:
07_structured_data.ipynb Sample Excel file.xlsx
BigQuery recipes script.ipynb
Colab Notebooks TFGan tutorial in Colab.txt
Copy of nima colab.ipynb to_upload (1).ipynb
created.txt to_upload (2).ipynb
Exported DataFrame sheet.gsheet to_upload (3).ipynb
foo.txt to_upload.ipynb
Pickle + Drive FUSE example.ipynb variables.pickle
Sample Excel file.gsheet

Failed to download .ckpt weights from Google Colab

I've trained a TensorFlow model on Google Colab and saved the model in ".ckpt" format.
I want to download the model, so I tried to do this:
from google.colab import files
files.download('/content/model.ckpt.index')
files.download('/content/model.ckpt.meta')
files.download('/content/model.ckpt.data-00000-of-00001')
I was able to get the meta and index files. However, the data file is giving me the following error:
"MessageError: Error: Failed to download: Service Worker Response
Error"
Could anybody tell me how I should solve this problem?
Google Colab doesn't allow downloading files of large sizes (I'm not sure about the exact limit). Possible solutions are to either split the file into smaller files, or to use GitHub to push your files and then download them to your local machine.
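A minimal sketch of the splitting idea in Python, assuming a 50 MB chunk size (the checkpoint filename is the one from the question; reassemble the parts locally by concatenating them in order):
CHUNK_SIZE = 50 * 1024 * 1024  # 50 MB per part

def split_file(path, chunk_size=CHUNK_SIZE):
    """Write path as numbered .partNNN files small enough to download."""
    with open(path, 'rb') as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            with open(f'{path}.part{index:03d}', 'wb') as dst:
                dst.write(chunk)
            index += 1

split_file('/content/model.ckpt.data-00000-of-00001')
# On your local machine, reassemble with e.g.:
# cat model.ckpt.data-00000-of-00001.part* > model.ckpt.data-00000-of-00001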
I just tried with a 17 MB graph file using the same command syntax with no error. Perhaps it was a transient problem on Google's servers?
For me, it helped to rename the file before downloading.
I had a file named
26.9766_0.5779_150-Adam-mean_absolute_error#3#C-8-1-....-RL#training-set-6x6.04.hdf5
and renamed it to
model.hdf5
before downloading, and then it worked. Maybe the '-' in the filename caused the error in my case.
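A minimal sketch of that workaround (the long original filename is stood in by a placeholder):
import os
from google.colab import files
# Rename to something short and simple before downloading.
os.rename('long_original_name.hdf5', 'model.hdf5')
files.download('model.hdf5')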