Google colab unable to work with hdf5 files - google-colaboratory

I have 4 hdf5 files in my drive. While using colab, db=h5py.File(path_to_file, "r") works sometimes and doesn't rest of the time. While writing the hdf5 file, I have ensured that I closed it after writing. Say File1 works on notebook_#1, when I try to use it on notebook_#2 it works sometimes, and doesn't other times. When I run it again on notebook_#1 it may work or maynot.
Probably size is not a matter because my 4 files are 32GB and others 4GB and mostly the problem is with 4GB files.
The hdf5 files were generated using colab itself. The error that I get is:
OSError: Unable to open file (file read failed: time = Tue May 19 12:58:36 2020
, filename = '/content/drive/My Drive/Hrushi/dl4cv/hdf5_files/train.hdf5', file descriptor = 61, errno = 5, error message = 'Input/output error', buf = 0x7ffc437c4c20, total read size = 8, bytes this sub-read = 8, bytes actually read = 18446744073709551615, offset = 0
or
/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
171 if swmr and swmr_support:
172 flags |= h5f.ACC_SWMR_READ
--> 173 fid = h5f.open(name, flags, fapl=fapl)
174 elif mode == 'r+':
175 fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5f.pyx in h5py.h5f.open()
OSError: Unable to open file (bad object header version number)
Would be grateful for any help, thanks in advance.

Reading directly from Google Drive can cause problems.
Try copying it to local directory e.g. /content/ first.

Related

In Google collab I get IOPub data rate exceeded

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
--NotebookApp.iopub_data_rate_limit.
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
An IOPub error usually occurs when you try to print a large amount of data to the console. Check your print statements - if you're trying to print a file that exceeds 10MB, its likely that this caused the error. Try to read smaller portions of the file/data.
I faced this issue while reading a file from Google Drive to Colab.
I used this link https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb
and the problem was in this block of code
# Download the file we just uploaded.
#
# Replace the assignment below with your file ID
# to download a different file.
#
# A file ID looks like: 1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz
file_id = 'target_file_id'
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
# _ is a placeholder for a progress object that we ignore.
# (Our file is small, so we skip reporting progress.)
_, done = downloader.next_chunk()
downloaded.seek(0)
#Remove this print statement
#print('Downloaded file contents are: {}'.format(downloaded.read()))
I had to remove the last print statement since it exceeded the 10MB limit in the notebook - print('Downloaded file contents are: {}'.format(downloaded.read()))
Your file will still be downloaded and you can read it in smaller chunks or read a portion of the file.
The above answer is correct, I just commented the print statement and the error went away. just keeping it here so someone might find it useful. Suppose u are reading a csv file from google drive just import pandas and add pd.read_csv(downloaded) it will work just fine.
file_id = 'FILEID'
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
# _ is a placeholder for a progress object that we ignore.
# (Our file is small, so we skip reporting progress.)
_, done = downloader.next_chunk()
downloaded.seek(0)
pd.read_csv(downloaded);
Maybe this will help..
from via sv1997
IOPub Error on Google Colaboratory in Jupyter Notebook
IoPub Error is occurring in Colab because you are trying to display the output on the console itself(Eg. print() statements) which is very large.
The IoPub Error maybe related in print function.
So delete or annotate the print function. It may resolve the error.
%cd darknet
!sed -i 's/OPENCV=0/OPENCV=1/' Makefile
!sed -i 's/GPU=0/GPU=1/' Makefile
!sed -i 's/CUDNN=0/CUDNN=1/' Makefile
!sed -i 's/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile
!apt update
!apt-get install libopencv-dev
its important to update your make file. and also, keep your input file name correct

writeOGR error: creation of output file failed

I'm an R rookie and attempting to create home ranges from fish telemetry data using kernel density estimates within the adehabitatHR package
kud <- kernelUD(muskydetectdata.P[,6], h="href", extent = 5)
class(kud)
image(kud)
kud[[1]]#h
muskykud.P95 <- getverticeshr(kud, percent = 95)
muskykud.P95
muskykud.P50 <- getverticeshr(kud, percent = 50)
muskykud.P50
when exporting to a shapefile
writeOGR(muskydetectdata.sp,"musky_kde1", "gps",
driver="ESRI Shapefile",
dataset_options= "FieldName= id")
an error message is displayed
##creation of output file failed
I have also attempted to use writeSpatialShape with similar results
I'm using R version 3.3.2 on windows 64 bit
I had the same problem and have solved it only when I added a full name of my directory and a name of a layer plus a shp suffix:
writeOGR(muskydetectdata.sp, dsn="d:/your directory here/musky_kde.shp", layer="musky_kde", driver="ESRI Shapefile")
I had that same error.
I resolved mine by correcting the directory it was saving to (making sure it existed)
e.g.
writeOGR(muskydetectdata.sp, dsn = save.dir, layer = filename.save, driver = 'ESRI Shapefile')
where save.dir is the directory you want saved as a string and filename.save is the filename you want it saved as (excluding extension)
I guess you are trying to write on an existing file and the writeOGR function don't allow that. I guess this is a known behavior of some drivers supported by OGR (as far as I remember in R as in python and in the C API).
You have to check if the file exists prior to your writing and removing it (or changing the path you want to use).
For example here the first write operation succeed but the attempt to overwrite the file fails with your error message :
> rgdal::writeOGR(spdf, 'b.shp', layer="brazil", driver='ESRI Shapefile')
> rgdal::writeOGR(spdf, 'b.shp', layer="brazil", driver='ESRI Shapefile')
Error in rgdal::writeOGR(spdf, "b.shp", layer = "brazil", driver = "ESRI Shapefile") :
Creation of output file failed

Graphlab Create setup error: graphlab.get_dependencies() results in BadZipFile error

After installing Graphlab Create on Win 10, it asks us to install 2 dependencies using graphlab.get_dependencies().
However, I am getting the following error:
In [9]: gl.get_dependencies()
By running this function, you agree to the following licenses.
* libstdc++: https://gcc.gnu.org/onlinedocs/libstdc++/manual/license.html
* xz: http://git.tukaani.org/?p=xz.git;a=blob;f=COPYING
Downloading xz.
Extracting xz.
---------------------------------------------------------------------------
BadZipfile Traceback (most recent call last)
in ()
----> 1 gl.get_dependencies()
C:\Users\nikulk\Anaconda2\envs\gl-env\lib\site-packages\graphlab\dependencies.pyc in get_dependencies()
34 xzarchive_dir = tempfile.mkdtemp()
35 print('Extracting xz.')
---> 36 xzarchive = zipfile.ZipFile(xzarchive_file)
37 xzarchive.extractall(xzarchive_dir)
38 xz = os.path.join(xzarchive_dir, 'bin_x86-64', 'xz.exe')
C:\Users\nikulk\Anaconda2\envs\gl-env\lib\zipfile.pyc in __init__(self, file, mode, compression, allowZip64)
768 try:
769 if key == 'r':
--> 770 self._RealGetContents()
771 elif key == 'w':
772 # set the modified flag so central directory gets written
C:\Users\nikulk\Anaconda2\envs\gl-env\lib\zipfile.pyc in _RealGetContents(self)
809 raise BadZipfile("File is not a zip file")
810 if not endrec:
--> 811 raise BadZipfile, "File is not a zip file"
812 if self.debug > 1:
813 print endrec
BadZipfile: File is not a zip file
Anyone knows how to resolve?
If you get this error, a firewall might be blocking you from downloading a dependency. Here is some information and a work around:
Please see the SFrame source code for get_dependencies to see how GraphLab uses this package: https://github.com/turicode/SFrame/blob/master/oss_src/unity/python/sframe/dependencies.py
The xz utility is only used to extract runtime dependencies from the other file downloaded there (from repo.msys2.org): http://repo.msys2.org/mingw/x86_64/mingw-w64-x86_64-gcc-libs-5.1.0-1-any.pkg.tar.xz. Two DLLs from that file need to be extracted into the "cython" directory inside the GraphLab Create install path (typically something like lib/site-packages/python2.7/graphlab within a virtualenv or conda env). Once extracted the dependency issue should be resolved.
In graphlab folder make the folder writable.Initially it is only readable.Go to properties of folder undo the only read option.Hope it solve your problem.

Connection reset by peer error in MongoDb on bulk insert

I am trying to insert 500 documents by doing a bulk insert in pymongo and i get this error
File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 306, in insert
continue_on_error, self.__uuid_subtype), safe)
File "/usr/lib64/python2.6/site-packages/pymongo/connection.py", line 748, in _send_message
raise AutoReconnect(str(e))
pymongo.errors.AutoReconnect: [Errno 104] Connection reset by peer
i looked around and found here that this happens because the size of inserted documents exceeds 16 MB so according to that the size of 500 documents should be over 16 MB. So i checked the size of the size of the 500 documents(python dictionaries) like this
size=0
for dict in dicts:
size+=dict.__sizeof__()
print size
this gives me 502920. This is like 500 KB. way less than 16 MB. Then why do i get this error.
I know i am calculating the size of python dictionaries not BSON documents and MongoDB takes in BSON documents but that cant turn 500KB into 16+ MB. Moreover i dont know how to convert a python dict into A BSON document.
My MongoDB version is 2.0.6 and pymongo version is 2.2.1
EDIT
I can do a bulk insert with 150 documents and thats fine but over 150 documents this error appears
This Bulk Inserts bug has been resolved, but you may need to update your pymongo version:
pip install --upgrade pymongo
The error occurs due to the fact that the bulk inserted documents have
an overall size of greater than 16 MB
My method of calculating the size of dictionaries was wrong.
When i manually inspected each key of the dictionary and found that 1 key was having a value of size 300 KB. So that did make the overall size of documents in the bulk insert more than 16 MB. (500*(300+)KB) > 16 MB. But i still dont know how to calculate size of a dictionary without manually inspecting it. Can someone please suggest?
Just had the same error and got around it by creating my own small bulks like this:
region_list = []
region_counter = 0
write_buffer = 1000
# loop through regions
for region in source_db.region.find({}, region_column):
region_counter += 1 # up _counter
region_list.append(region)
# save bulk if we're at the write buffer
if region_counter == write_buffer:
result = user_db.region.insert(region_list)
region_list = []
region_counter = 0
# if there is a rest, also save that
if region_counter > 0:
result = user_db.region.insert(region_list)
Hope this helps
NB: small update, from pymongo 2.6 on, PyMongo will auto-split lists based on the max transfer size: "The insert() method automatically splits large batches of documents into multiple insert messages based on max_message_size"

Cannot append to file when some other process writes to it on *nix systems

I have a very simple piece of code which just writes a small amount of data to a file at some regular interval. Once my program has created the file and appended some data, when I open this file in vim(or any other editor for that matter) and edit it, my process cannot seem to update the file anymore. I do not see any errors being returned from the syscall. I tried tracing the system calls, and did not observe anything weird even while the file is NOT being updated.
Since each process gets its own file table entry which has the current offset, all I was expecting was an output file with data interspersed with writes from the two non-cooperating processes(possibly garbled too). But what I am observing is that my program cannot update the file anymore once any other editor writes to the file.
Couple of other interesting observations
1) When I cat something to the output file, my program can continue to update no problem
2) When multiple instances of my own program are writing to the same file, everything is fine again
I understand that there's mandatory locking to prevent multiple writes, but I am trying to understand whats happening underneath. Also this kind of scenario behaves normally for some loggers (like system log, apache logs etc)
Any ideas to explain this behavior?. Also any hints on how I can debug this further?
My code is pretty simple:
1 int main(int argc, char** argv)
2 {
3 const char* buf;
4 if(argc < 2)
5 buf = "test->";
6 else
7 buf = argv[1];
8
9 int fd;
10 if((fd = open("test.log", O_CREAT|O_WRONLY|O_APPEND, 0644)) == -1) {
11 perror("Cannot open test.log");
12 exit(1);
13 }
14
15 int num_bytes = strlen(buf), num_bytes_written = -1;
16
17 while(1) {
18 if((num_bytes_written = write(fd, buf, num_bytes)) == -1) {
19 perror("Could not write to fd");
20 }
21 fsync(fd);
22 sleep(5);
23 }
24 }
When the vim(1) editor exits, it's likely replacing the original file with the edited version. Your process is holding the original file open but that file no longer exists in the sense that it's directory entry has been replaced and, so, no process that doesn't already have the file open can access it. Your process is now appending to a file that can't be accessed by any other process. Once your process closes the file, it will be gone for good (unless you run a partition recovery program).
Your vim editor works on a cached version of your file. It modifies this cache while your other programs append to the original file. When you save with vim, you overwrite the original file with the updated cached file and loose all logs.