Getting an EOF error when calling pd.read_pickle - pandas

Had a quick question regarding a pandas DataFrame and the pd.read_pickle() function. Basically, I have a large but simple DataFrame (333 MB). When I run pd.read_pickle on the pickled file, I get an EOFError.
Is there any way around this issue? What might be causing this?

I saw the same EOFError when I created a pickle using:
df.to_pickle('path.pkl', compression='bz2')
and then tried to read it with:
pandas.read_pickle('path.pkl')
I fixed the issue by supplying the compression argument on read:
pandas.read_pickle('path.pkl', compression='bz2')
According to the Pandas docs:
compression : {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’
string representing the compression to use in the output file. By default,
infers from the file extension in specified path.
Thus, simply changing the path from 'path.pkl' to 'path.bz2' also fixed the problem.
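For illustration, a minimal round trip showing both fixes (the DataFrame and paths here are placeholders):
import pandas as pd

df = pd.DataFrame({'a': range(10)})

# Option 1: keep the .pkl extension and pass compression explicitly on both ends.
df.to_pickle('path.pkl', compression='bz2')
df_read = pd.read_pickle('path.pkl', compression='bz2')

# Option 2: use a .bz2 extension so the default compression='infer' works.
df.to_pickle('path.bz2')
df_read = pd.read_pickle('path.bz2')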

I can confirm greg_data's valuable comment:
When I encountered this error, I worked out that it was due to the
initial pickling not having completed correctly. The pickle file was
created, but not finished correctly. This seems to me the only
possible source of the EOFError in pickle: the pickle is
malformed, i.e. not finished.
My error during pickling was:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-40-263240bbee7e> in <module>()
----> 1 main()

<ipython-input-38-9b3c6d782a2a> in main()
     43     with open("/content/drive/MyDrive/{}.file".format(tm.id), "wb") as f:
---> 44         pickle.dump(tm, f, pickle.HIGHEST_PROTOCOL)
     45
     46     print('Coherence:', get_coherence(tm, token_lists, 'c_v'))

TypeError: can't pickle weakref objects
And when reading that pickle file, which was obviously not finished during pickling, the reported error occurred:
pd.read_pickle(r'/content/drive/MyDrive/TEST_2021_06_01_10_23_02.file')
Error:
---------------------------------------------------------------------------
EOFError Traceback (most recent call last)
<ipython-input-41-460bdd0a2779> in <module>()
----> 1 object = pd.read_pickle(r'/content/drive/MyDrive/TEST_2021_06_01_10_23_02.file')
/usr/local/lib/python3.7/dist-packages/pandas/io/pickle.py in read_pickle(filepath_or_buffer, compression)
180 # We want to silence any warnings about, e.g. moved modules.
181 warnings.simplefilter("ignore", Warning)
--> 182 return pickle.load(f)
183 except excs_to_catch:
184 # e.g.
EOFError: Ran out of input
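A minimal sketch of guarding against such truncated pickles (the placeholder obj stands in for the tm object from the traceback above; deleting the partial file is my own suggestion, not part of the original answer):
import os
import pickle

obj = {"example": "data"}  # stands in for the object being pickled
path = 'model.file'        # hypothetical path
try:
    with open(path, 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
except Exception:
    # Remove the partially written file so a later pickle.load / pd.read_pickle
    # doesn't hit "EOFError: Ran out of input" on a truncated pickle.
    if os.path.exists(path):
        os.remove(path)
    raise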

Related

How to fix __init__() got an unexpected keyword argument 'location' error in scanpy.pl.umap?

I am trying to run single-cell analysis on scanpy 1.9.1. When I try to run scanpy.pl.umap(adata, color=["PDGFRB","RGS5"], s = 30), I get the following error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-…> in <module>
----> 1 sc.pl.umap(adata, color=["PDGFRB","RGS5"], s = 30)

5 frames
/usr/local/lib/python3.8/dist-packages/matplotlib/colorbar.py in __init__(self, ax, mappable, **kw)
   1228         """
   1229         Return colorbar data coordinates for the boundaries of
-> 1230         a proportional colorbar, plus extension lengths if required:
   1231         """
   1232         if (isinstance(self.norm, colors.BoundaryNorm) or
TypeError: __init__() got an unexpected keyword argument 'location'
And I get a blank color heat legend.
I saw a suggestion to use an older version of matplotlib for a similar problem. This error occurred with matplotlib 3.6.3, so I tried installing matplotlib 3.1.3, but that did not work either.
Any help will be appreciated!
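For completeness, a sketch of pinning a specific matplotlib release in Colab; the version below is illustrative only, not a fix confirmed in this thread, and the runtime must be restarted before the new version takes effect:
!pip install matplotlib==3.5.3
import matplotlib
print(matplotlib.__version__)  # verify the pin after restarting the runtime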

'DB2ExecutionContext_ibm_db' object has no attribute 'compiled_parameters'

I ran into a problem while querying IBM Db2 on Cloud.
I checked the db connection; the connection is okay, but whenever I try to query anything it gives me this error:
'DB2ExecutionContext_ibm_db' object has no attribute 'compiled_parameters'
A snapshot of the error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-22-ec7ae9958cc4> in <module>
......
......
~/conda/envs/python/lib/python3.6/site-packages/ibm_db_sa/ibm_db.py in pre_exec(self)
47 def pre_exec(self):
48 # if a single execute, check for outparams
---> 49 if len(self.compiled_parameters) == 1:
50 for bindparam in self.compiled.binds.values():
51 if bindparam.isoutparam:
AttributeError: 'DB2ExecutionContext_ibm_db' object has no attribute 'compiled_parameters'
Any solution?
You could check these installations if you haven't already:
!pip install sqlalchemy==1.3.9
!pip install ibm_db_sa
I got the same error and this helped.
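For context, a minimal sketch of the connection setup those pins apply to, assuming the standard ibm_db_sa URL format (the credentials, host, and database below are placeholders):
import sqlalchemy

# dialect://user:password@host:port/database  (placeholders, not real credentials)
engine = sqlalchemy.create_engine('ibm_db_sa://user:password@host:50000/BLUDB')
with engine.connect() as conn:
    # Simple smoke-test query against DB2's dummy table
    print(conn.execute('SELECT 1 FROM SYSIBM.SYSDUMMY1').fetchall())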

Using pandas' read_hdf to load data on Google Drive fails with ValueError

I have uploaded an HDF file to Google Drive and wish to load it in Colab. The file was created from a DataFrame with DataFrame.to_hdf(), and it loads successfully locally with pd.read_hdf(). However, when I mount my Google Drive and try to read the data in Colab, it fails with a ValueError.
Here is the code I am using to read the data:
import pandas as pd
from google.colab import drive

drive.mount('/content/drive')
data = pd.read_hdf('/content/drive/My Drive/Ryhmäytyminen/data/data.h5', 'students')
And this is the full error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-cfe913c26e60> in <module>()
----> 1 data = pd.read_hdf('/content/drive/My Drive/Ryhmäytyminen/data/data.h5', 'students')
7 frames
/usr/local/lib/python3.6/dist-packages/tables/vlarray.py in read(self, start, stop, step)
819 listarr = []
820 else:
--> 821 listarr = self._read_array(start, stop, step)
822
823 atom = self.atom
tables/hdf5extension.pyx in tables.hdf5extension.VLArray._read_array()
ValueError: cannot set WRITEABLE flag to True of this array
Reading some JSON data was successful, so the problem is probably not with mounting. Any ideas what is wrong or how to debug this?
Thank you!
Try navigating to the directory that holds your HDF file first (the path contains spaces, so quote it):
%cd "/content/drive/My Drive/Ryhmäytyminen/data"
From here you should be able to load the HDF file directly:
data = pd.read_hdf('data.h5', 'students')
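Putting the pieces together, a minimal sketch of the full sequence in one Colab session (paths are from the question):
import pandas as pd
from google.colab import drive

drive.mount('/content/drive')
%cd "/content/drive/My Drive/Ryhmäytyminen/data"
data = pd.read_hdf('data.h5', 'students')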

What is OSError: [Errno 95] Operation not supported for pandas to_csv on Colab?

My input is:
test=pd.read_csv("/gdrive/My Drive/data-kaggle/sample_submission.csv")
test.head()
It ran as expected.
But, for
test.to_csv('submitV1.csv', header=False)
The full error message that I got was:
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-5-fde243a009c0> in <module>()
      9 from google.colab import files
     10 print(test)'''
---> 11 test.to_csv('submitV1.csv', header=False)
     12 files.download('/gdrive/My Drive/data-kaggle/submission/submitV1.csv')

2 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
   3018                                  doublequote=doublequote,
   3019                                  escapechar=escapechar, decimal=decimal)
-> 3020         formatter.save()
   3021
   3022         if path_or_buf is None:

/usr/local/lib/python3.6/dist-packages/pandas/io/formats/csvs.py in save(self)
    155             f, handles = _get_handle(self.path_or_buf, self.mode,
    156                                      encoding=self.encoding,
--> 157                                      compression=self.compression)
    158             close = True
    159

/usr/local/lib/python3.6/dist-packages/pandas/io/common.py in _get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text)
    422     elif encoding:
    423         # Python 3 and encoding
--> 424         f = open(path_or_buf, mode, encoding=encoding, newline="")
    425     elif is_text:
    426         # Python 3 and no explicit encoding

OSError: [Errno 95] Operation not supported: 'submitV1.csv'
Additional information about the error:
If, before running this command, I run
df = pd.DataFrame()
df.to_csv("file.csv")
files.download("file.csv")
it runs properly, but the same code produces the "operation not supported" error if I run it after trying to convert the test DataFrame to a CSV file.
I also get the message "A Google Drive timeout has occurred (most recently at 13:02:43)" just before running the command.
You are currently in a directory in which you don't have write permissions.
Check your current directory with pwd. It is probably gdrive or some directory inside it; that's why you are unable to save there.
Now change the current working directory to a directory where you have permission to write; cd ~ will work fine. It will change the directory to /root.
Now you can use:
test.to_csv('submitV1.csv', header=False)
It will save 'submitV1.csv' to /root.
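Putting the workaround together in one Colab cell (test is the DataFrame from the question):
from google.colab import files

%pwd   # check where you are; a Drive directory here explains the failure
%cd ~  # switch to /root, which is writable
test.to_csv('submitV1.csv', header=False)
files.download('/root/submitV1.csv')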

geopandas cannot read a GeoJSON properly

I am trying the following:
After downloading http://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_20m.json
In [2]: import geopandas
In [3]: geopandas.read_file('./gz_2010_us_050_00_20m.json')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-83a1d4a0fc1f> in <module>
----> 1 geopandas.read_file('./gz_2010_us_050_00_20m.json')
~/miniconda3/envs/ml3/lib/python3.6/site-packages/geopandas/io/file.py in read_file(filename, **kwargs)
24 else:
25 f_filt = f
---> 26 gdf = GeoDataFrame.from_features(f_filt, crs=crs)
27
28 # re-order with column order from metadata, with geometry last
~/miniconda3/envs/ml3/lib/python3.6/site-packages/geopandas/geodataframe.py in from_features(cls, features, crs)
207
208 rows = []
--> 209 for f in features_lst:
210 if hasattr(f, "__geo_interface__"):
211 f = f.__geo_interface__
fiona/ogrext.pyx in fiona.ogrext.Iterator.__next__()
fiona/ogrext.pyx in fiona.ogrext.FeatureBuilder.build()
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
On the page http://eric.clst.org/tech/usgeojson/, with four GeoJSON files under the 20m column, the above file corresponds to the US Counties row and is the only one of the four that cannot be read. The error message isn't very informative; what might the reason be?
If your error message looks anything like "Polygons and MultiPolygons should follow the right-hand rule", it means the ring orientation of those geometries is wrong: per the GeoJSON spec (RFC 7946), exterior rings should be counter-clockwise and holes clockwise.
Here's an online tool to "fix" your objects, with a short explanation:
https://mapster.me/right-hand-rule-geojson-fixer/
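As a scripted alternative to the web tool, shapely's orient() can rewrite ring orientation (exterior counter-clockwise when sign=1.0). The input file name below is hypothetical, and the sketch assumes the file is otherwise readable:
import geopandas
from shapely.geometry.polygon import orient

gdf = geopandas.read_file('some_polygons.json')  # hypothetical input
# Re-orient plain Polygons; other geometry types pass through unchanged.
gdf['geometry'] = gdf.geometry.apply(
    lambda g: orient(g, sign=1.0) if g.geom_type == 'Polygon' else g
)
gdf.to_file('fixed.json', driver='GeoJSON')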
Possibly an answer for people arriving at this page: I received the same error, and it was thrown due to encoding issues.
Try re-encoding the initial file as UTF-8, or try opening the file with the encoding you think it uses. This fixed my error.
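A minimal sketch of that re-encoding workaround; the source encoding ('latin-1' below) is an assumption, so substitute whatever the file actually uses:
import geopandas

# Read with the presumed original encoding, then rewrite the file as UTF-8.
with open('gz_2010_us_050_00_20m.json', encoding='latin-1') as src:
    text = src.read()
with open('gz_2010_us_050_00_20m_utf8.json', 'w', encoding='utf-8') as dst:
    dst.write(text)

gdf = geopandas.read_file('gz_2010_us_050_00_20m_utf8.json')
print(gdf.head())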