Read filenames from CSV and copy the files to a different directory in Google Drive - tensorflow

I want to copy image files listed in a CSV to another directory in Google Drive. Below is the code I'm using:
import os
import shutil
from shutil import move
from shutil import copy
from shutil import make_archive
import pandas as pd

hm_read = pd.read_csv('/content/drive/MyDrive/TA_NAUFAL/HAM10000/metadata_balanced.csv')
for i in range(len(hm_read)):
    copy(f"/drive/MyDrive/TA_NAUFAL/HAM10000/HAM10000_ ALLDULLRAZOR/{hm_read['image_id'].values[i]}.jpg", f"/drive/MyDrive/TA_NAUFAL/HAM10000/HAM10000_BALANCED_DULLRAZOR/{hm_read['image_id'].values[i]}.jpg")
I ran the code and got this error:
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-13-4119d3dfcf84> in <module>
7 hm_read = pd.read_csv('/content/drive/MyDrive/TA_NAUFAL/HAM10000/metadata_balanced.csv')
8 for i in range(len(hm_read)):
----> 9 copy(f"/drive/MyDrive/TA_NAUFAL/HAM10000/HAM10000_ ALLDULLRAZOR/{hm_read['image_id'].values[i]}.jpg", f"/drive/MyDrive/TA_NAUFAL/HAM10000/HAM10000_BALANCED_DULLRAZOR/{hm_read['image_id'].values[i]}.jpg")
1 frames
/usr/lib/python3.8/shutil.py in copyfile(src, dst, follow_symlinks)
262 os.symlink(os.readlink(src), dst)
263 else:
--> 264 with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
265 # macOS
266 if _HAS_FCOPYFILE:
FileNotFoundError: [Errno 2] No such file or directory: '/drive/MyDrive/TA_NAUFAL/HAM10000/HAM10000_ ALLDULLRAZOR/ISIC_0027419.jpg'
But when I checked the file /drive/MyDrive/TA_NAUFAL/HAM10000/HAM10000_ ALLDULLRAZOR/ISIC_0027419.jpg, there is an image there.
here is my CSV
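For what it's worth, the CSV is read from /content/drive/MyDrive/... while the copy paths start with /drive/MyDrive/..., and in Colab the Drive mount only exists under /content/drive. A minimal sketch with that prefix added (assuming the same folder names as above, including the space in 'HAM10000_ ALLDULLRAZOR', and skipping missing files instead of crashing):
import os
import pandas as pd
from shutil import copy

src_dir = "/content/drive/MyDrive/TA_NAUFAL/HAM10000/HAM10000_ ALLDULLRAZOR"
dst_dir = "/content/drive/MyDrive/TA_NAUFAL/HAM10000/HAM10000_BALANCED_DULLRAZOR"

hm_read = pd.read_csv('/content/drive/MyDrive/TA_NAUFAL/HAM10000/metadata_balanced.csv')
for image_id in hm_read['image_id']:
    src = os.path.join(src_dir, f"{image_id}.jpg")
    dst = os.path.join(dst_dir, f"{image_id}.jpg")
    if os.path.exists(src):
        copy(src, dst)
    else:
        # report files listed in the CSV that are not present in the source folder
        print("missing:", src)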

Related

Writing pandas dataframe to excel in dbfs azure databricks: OSError: [Errno 95] Operation not supported

I am trying to write a pandas dataframe to the local file system in azure databricks:
import pandas as pd

url = 'https://www.stats.govt.nz/assets/Uploads/Business-price-indexes/Business-price-indexes-March-2019-quarter/Download-data/business-price-indexes-march-2019-quarter-csv.csv'
data = pd.read_csv(url)
with pd.ExcelWriter(r'/dbfs/tmp/export.xlsx', engine="openpyxl") as writer:
    data.to_excel(writer)
Then I get the following error message:
OSError: [Errno 95] Operation not supported
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
in <module>
      3 data = pd.read_csv(url)
      4 with pd.ExcelWriter(r'/dbfs/tmp/export.xlsx', engine="openpyxl") as writer:
----> 5     data.to_excel(writer)

/databricks/python/lib/python3.8/site-packages/pandas/io/excel/_base.py in __exit__(self, exc_type, exc_value, traceback)
    892
    893     def __exit__(self, exc_type, exc_value, traceback):
--> 894         self.close()
    895
    896     def close(self):

/databricks/python/lib/python3.8/site-packages/pandas/io/excel/_base.py in close(self)
    896     def close(self):
    897         """synonym for save, to make it more file-like"""
--> 898         content = self.save()
    899         self.handles.close()
    900         return content
I read in this post some limitations for mounted file systems: Pandas: Write to Excel not working in Databricks
But if I got it right, the solution is to write to the local workspace file system, which is exactly what is not working for me.
My user is workspace admin and I am using a standard cluster with 10.4 Runtime.
I also verified that I can write a CSV file to the same location using pd.to_csv.
What could be missing?
Databricks has a limitation that does not allow random write operations into DBFS, as indicated in the SO thread you are referring to.
So, a workaround for this would be to write the file to the local file system (file:/) and then move it to the required location inside DBFS. You can use the following code:
import pandas as pd

url = 'https://www.stats.govt.nz/assets/Uploads/Business-price-indexes/Business-price-indexes-March-2019-quarter/Download-data/business-price-indexes-march-2019-quarter-csv.csv'
data = pd.read_csv(url)
with pd.ExcelWriter(r'export.xlsx', engine="openpyxl") as writer:
    # file will be written to /databricks/driver/ i.e., the local file system
    data.to_excel(writer)
Calling dbutils.fs.ls("/databricks/driver/") indicates that the path you want to list is dbfs:/databricks/driver/ (an absolute DBFS path), which does not exist.
/databricks/driver/ belongs to the local file system (DBFS is a part of this). The absolute path of /databricks/driver/ is file:/databricks/driver/. You can list the contents of this path using either of the following:
import os
print(os.listdir("/databricks/driver/"))
# OR
dbutils.fs.ls("file:/databricks/driver/")
So, use the file located in this path and move (or copy) it to your destination using the shutil library, as follows:
from shutil import move
move('/databricks/driver/export.xlsx','/dbfs/tmp/export.xlsx')
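If you prefer to stay within dbutils, the same move can likely be done with a file:/-prefixed source path (a sketch, assuming the export was written to /databricks/driver/export.xlsx as above):
# move the locally written file into DBFS using dbutils instead of shutil
dbutils.fs.mv("file:/databricks/driver/export.xlsx", "dbfs:/tmp/export.xlsx")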

move files within Google Colab directory

What do I need to do or change?
I want to move files within a Google Colab directory. My dataframe of filenames and labels looks like this:
            file  label
0  img_44733.jpg      1
1  img_72999.jpg      1
2  img_25094.jpg      1
3  img_69092.jpg      1
4  img_92629.jpg      1
All files are stored in /content/train/, in a separate folder 0 or 1 depending on their label (as seen in the attached picture).
I picked some filenames to use as a validation dataset using sklearn and stored them in x_test, and I am trying to move those files from their original directory /content/train/ into the /content/validation/ folder (again separated based on their label) using the code below.
import os.path
from os import path
from os import mkdir
import shutil

filter_list = x_test['file']
tmp = data[data.file.isin(filter_list)].index
for validator in tmp:
    src_name = os.path.join('/content/train/' + str(data.loc[validator]['label']) + "/" + data.loc[validator]['file'])
    trg_name = os.path.join('/content/validation/' + str(data.loc[validator]['label']) + "/" + data.loc[validator]['file'])
    shutil.move(src_name, trg_name)
data is the original dataframe storing filename and label (shown above).
I get the error below:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
/usr/lib/python3.7/shutil.py in move(src, dst, copy_function)
565 try:
--> 566 os.rename(src, real_dst)
567 except OSError:
FileNotFoundError: [Errno 2] No such file or directory: '/content/train/1/img_92629.jpg' -> '/content/validation/1/img_92629.jpg'
During handling of the above exception, another exception occurred:
FileNotFoundError Traceback (most recent call last)
3 frames
/usr/lib/python3.7/shutil.py in copyfile(src, dst, follow_symlinks)
119 else:
120 with open(src, 'rb') as fsrc:
--> 121 with open(dst, 'wb') as fdst:
122 copyfileobj(fsrc, fdst)
123 return dst
FileNotFoundError: [Errno 2] No such file or directory: '/content/validation/1/img_92629.jpg'
I've tried using these for source directory input (shutil):
/content/train/
/train/
content/train/
not using the os.path
and using shell command:
!mv /content/train/
!mv content/train/
!mv /train/
!mv train/
What needs to be changed?
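For what it's worth, the second traceback fails while opening the destination for writing, which usually means the destination folder (e.g. /content/validation/1/) does not exist yet. A minimal sketch that creates the label folders before moving (assuming the same data and tmp variables as in the question):
import os
import shutil

for validator in tmp:
    label = str(data.loc[validator]['label'])
    dst_dir = os.path.join('/content/validation', label)
    os.makedirs(dst_dir, exist_ok=True)  # create /content/validation/0 or /content/validation/1 if missing
    src_name = os.path.join('/content/train', label, data.loc[validator]['file'])
    trg_name = os.path.join(dst_dir, data.loc[validator]['file'])
    shutil.move(src_name, trg_name)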

Using pandas' read_hdf to load data on Google Drive fails with ValueError

I have uploaded a HDF file to Google Drive and wish to load it in Colab. The file was created from a dataframe with DataFrame.to_hdf() and it can be loaded successfully locally with pd.read_hdf(). However, when I try to mount my Google Drive and read the data in Colab, it fails with a ValueError.
Here is the code I am using to read the data:
import pandas as pd
from google.colab import drive

drive.mount('/content/drive')
data = pd.read_hdf('/content/drive/My Drive/Ryhmäytyminen/data/data.h5', 'students')
And this is the full error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-cfe913c26e60> in <module>()
----> 1 data = pd.read_hdf('/content/drive/My Drive/Ryhmäytyminen/data/data.h5', 'students')
7 frames
/usr/local/lib/python3.6/dist-packages/tables/vlarray.py in read(self, start, stop, step)
819 listarr = []
820 else:
--> 821 listarr = self._read_array(start, stop, step)
822
823 atom = self.atom
tables/hdf5extension.pyx in tables.hdf5extension.VLArray._read_array()
ValueError: cannot set WRITEABLE flag to True of this array
Reading some JSON data was successful, so the problem probably is not with mounting. Any ideas what is wrong or how to debug this problem?
Thank you!
Try navigating to the directory where your HDF file is stored first:
cd /content/drive/My Drive/Ryhmäytyminen/data
From here you should be able to load the HDF file directly:
data = pd.read_hdf('data.h5', 'students')
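If changing the working directory does not help, another thing worth trying is copying the file from the mounted Drive to the local Colab disk and reading it from there (a sketch, assuming the same path and key as in the question):
import shutil
import pandas as pd

# copy from the mounted Drive to the local Colab filesystem, then read locally
shutil.copy('/content/drive/My Drive/Ryhmäytyminen/data/data.h5', '/content/data.h5')
data = pd.read_hdf('/content/data.h5', 'students')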

pandas.read_clipboard from cloud-hosted jupyter?

I am running a Data8 instance of JupyterHub running JupyterLab on a server, and pd.read_clipboard() does not seem to work. I see the same problem in Google Colab.
import pandas as pd
pd.read_clipboard()
errors out like so:
---------------------------------------------------------------------------
PyperclipException Traceback (most recent call last)
<ipython-input-2-8cbad928c47b> in <module>()
----> 1 pd.read_clipboard()
/opt/conda/lib/python3.6/site-packages/pandas/io/clipboards.py in read_clipboard(sep, **kwargs)
29 from pandas.io.clipboard import clipboard_get
30 from pandas.io.parsers import read_table
---> 31 text = clipboard_get()
32
33 # try to decode (if needed on PY3)
/opt/conda/lib/python3.6/site-packages/pandas/io/clipboard/clipboards.py in __call__(self, *args, **kwargs)
125
126 def __call__(self, *args, **kwargs):
--> 127 raise PyperclipException(EXCEPT_MSG)
128
129 if PY2:
PyperclipException:
Pyperclip could not find a copy/paste mechanism for your system.
For more information, please visit https://pyperclip.readthedocs.org
Is there a way to get this working?
No. The machine runs in the cloud; Python there cannot access your local machine's clipboard.
I tried the JavaScript clipboard API, but it didn't work, probably because the output is in an iframe that isn't allowed access to the clipboard either. If it did, this would have worked:
from google.colab.output import eval_js
text = eval_js("navigator.clipboard.readText()")
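As a practical workaround, you can paste the copied table into a notebook cell as a string and parse it with pandas directly (a sketch with made-up sample data):
import io
import pandas as pd

# paste the clipboard content between the triple quotes instead of calling read_clipboard()
raw = """col_a,col_b
1,2
3,4
"""
df = pd.read_csv(io.StringIO(raw))
print(df)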

Failure to import numpy in Jupyter notebook

I am new to IPython/Jupyter. My Python skills are limited, but I am learning. I am trying to import numpy as np and get the following:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-4ee716103900> in <module>()
----> 1 import numpy as np
/Users/jmmiii/Library/Enthought/Canopy_32bit/User/lib/python2.7/site-packages/numpy/__init__.py in <module>()
166 return loader(*packages, **options)
167
--> 168 from . import add_newdocs
169 __all__ = ['add_newdocs', 'ModuleDeprecationWarning']
170
/Users/jmmiii/Library/Enthought/Canopy_32bit/User/lib/python2.7/site-packages/numpy/add_newdocs.py in <module>()
11 from __future__ import division, absolute_import, print_function
12
---> 13 from numpy.lib import add_newdoc
14
15 ###############################################################################
/Users/jmmiii/Library/Enthought/Canopy_32bit/User/lib/python2.7/site-packages/numpy/lib/__init__.py in <module>()
6 from numpy.version import version as __version__
7
----> 8 from .type_check import *
9 from .index_tricks import *
10 from .function_base import *
/Users/jmmiii/Library/Enthought/Canopy_32bit/User/lib/python2.7/site-packages/numpy/lib/type_check.py in <module>()
9 'common_type']
10
---> 11 import numpy.core.numeric as _nx
12 from numpy.core.numeric import asarray, asanyarray, array, isnan, \
13 obj2sctype, zeros
/Users/jmmiii/Library/Enthought/Canopy_32bit/User/lib/python2.7/site-packages/numpy/core/__init__.py in <module>()
4 from numpy.version import version as __version__
5
----> 6 from . import multiarray
7 from . import umath
8 from . import _internal # for freeze programs
ImportError: dlopen(/Users/jmmiii/Library/Enthought/Canopy_32bit/User/lib/python2.7/site-packages/numpy/core/multiarray.so, 2): no suitable image found. Did find:
/Users/jmmiii/Library/Enthought/Canopy_32bit/User/lib/python2.7/site-packages/numpy/core/multiarray.so: mach-o, but wrong architecture
I have several Python installs on my Mac, which runs Yosemite, including Canopy and Anaconda. I want my Jupyter notebook to use the Anaconda install, including all the modules, libraries, etc. associated with it. It seems, however, that Jupyter is targeting Canopy instead. Thus, I think my problem might stem from the wrong linkage.
QUESTION 1: Does my conclusion hold water? If not, what might I be missing?
QUESTION 2: How can I direct/link jupyter with Anaconda and not with Canopy so that I import everything from anaconda only?
Thanks for everyone's help!
You can either set the PATH to execute python commands from the ~/anaconda/bin directory by adding the following line to your ~/.bash_profile:
export PATH="/Users/jmmiii/anaconda/bin:$PATH"
OR, you can create an alias for the command by editing your ~/.bash_profile and adding:
alias jupyter-notebook="/Users/jmmiii/anaconda/bin/jupyter-notebook"
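To confirm which installation a notebook kernel is actually using, you can check sys.executable from inside the notebook (a quick diagnostic; once the linkage is right, the path shown should point at Anaconda rather than Canopy):
import sys
print(sys.executable)  # path of the interpreter backing this kernel
print(sys.version)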