I am running a Data8 JupyterHub instance serving JupyterLab on a server, and pd.read_clipboard() does not seem to work. I see the same problem in Google Colab.
import pandas as pd
pd.read_clipboard()
errors out like so:
---------------------------------------------------------------------------
PyperclipException Traceback (most recent call last)
<ipython-input-2-8cbad928c47b> in <module>()
----> 1 pd.read_clipboard()
/opt/conda/lib/python3.6/site-packages/pandas/io/clipboards.py in read_clipboard(sep, **kwargs)
29 from pandas.io.clipboard import clipboard_get
30 from pandas.io.parsers import read_table
---> 31 text = clipboard_get()
32
33 # try to decode (if needed on PY3)
/opt/conda/lib/python3.6/site-packages/pandas/io/clipboard/clipboards.py in __call__(self, *args, **kwargs)
125
126 def __call__(self, *args, **kwargs):
--> 127 raise PyperclipException(EXCEPT_MSG)
128
129 if PY2:
PyperclipException:
Pyperclip could not find a copy/paste mechanism for your system.
For more information, please visit https://pyperclip.readthedocs.org
Is there a way to get this working?
No. The machine runs in the cloud; Python running there cannot access your local machine's clipboard.
I tried the JavaScript Clipboard API, but it didn't work, probably because the output runs in an iframe that isn't allowed clipboard access either. If it were, this would have worked:
from google.colab.output import eval_js
text = eval_js("navigator.clipboard.readText()")
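A practical workaround in a remote notebook is to paste the copied text into a cell yourself and parse it with read_csv; a minimal sketch (the column names and data here are placeholders for whatever you copied):
import io
import pandas as pd

# Paste the clipboard content into a string literal. "Copy" from a
# spreadsheet usually produces tab-separated text, hence sep="\t".
raw = """a\tb
1\t2
3\t4"""

df = pd.read_csv(io.StringIO(raw), sep="\t")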
When executing the code below:
from pandas_profiling import ProfileReport

profile = ProfileReport(df, title="Data Profile Report")
profile.to_file("data_profile_report.html")
Here is the exception that is thrown:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
c:\Projections 2022-08-16\Projections.py in <cell line: 4>()
102 # %%
103 #Creating EDA of data
104 profile = ProfileReport(df_cdap,title="CDAP Data Profile Report")
----> 105 profile.to_file("cdap_data_profile_report.html")
File c:\Users\fengq\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas_profiling\profile_report.py:257, in ProfileReport.to_file(self, output_file, silent)
254 self.config.html.assets_prefix = str(output_file.stem) + "_assets"
255 create_html_assets(self.config, output_file)
--> 257 data = self.to_html()
259 if output_file.suffix != ".html":
260 suffix = output_file.suffix
File c:\Users\fengq\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas_profiling\profile_report.py:368, in ProfileReport.to_html(self)
360 def to_html(self) -> str:
361 """Generate and return complete template as lengthy string
362 for using with frameworks.
363
(...)
366
367 """
--> 368 return self.html
...
--> 810 fig = manager.canvas.figure
811 if fig_label:
812 fig.set_label(fig_label)
AttributeError: 'NoneType' object has no attribute 'canvas'
I've tried reinstalling Python and the dependencies for pandas-profiling, but nothing has worked so far. I've also tried downgrading Python to 3.9 and matplotlib to an older version; neither changed the error.
I notice that the error seems to be attributed to "manager.canvas.figure" but I'm not sure how to resolve it from that point onwards. Any help is greatly appreciated!
The problem was resolved when I set the matplotlib backend to inline, as suggested in some comments I found on another forum. I'm still really interested to learn what causes this! Please feel free to answer and suggest other solutions; I would love to try them!
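For reference, the fix is the standard IPython magic, run before any plotting code; in a plain script, an explicit non-GUI backend (a guess at the equivalent, since the traceback suggests no figure manager was created) plays a similar role:
# In a notebook or QtConsole session:
%matplotlib inline

# In a plain script, select a non-interactive backend before importing pyplot:
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt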
I am trying to write a pandas dataframe to the local file system in azure databricks:
import pandas as pd
url = 'https://www.stats.govt.nz/assets/Uploads/Business-price-indexes/Business-price-indexes-March-2019-quarter/Download-data/business-price-indexes-march-2019-quarter-csv.csv'
data = pd.read_csv(url)
with pd.ExcelWriter(r'/dbfs/tmp/export.xlsx', engine="openpyxl") as writer:
    data.to_excel(writer)
Then I get the following error message:
OSError: [Errno 95] Operation not supported
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
in <module>
      3 data = pd.read_csv(url)
      4 with pd.ExcelWriter(r'/dbfs/tmp/export.xlsx', engine="openpyxl") as writer:
----> 5     data.to_excel(writer)
/databricks/python/lib/python3.8/site-packages/pandas/io/excel/_base.py in __exit__(self, exc_type, exc_value, traceback)
    892
    893     def __exit__(self, exc_type, exc_value, traceback):
--> 894         self.close()
    895
    896     def close(self):
/databricks/python/lib/python3.8/site-packages/pandas/io/excel/_base.py in close(self)
    896     def close(self):
    897         """synonym for save, to make it more file-like"""
--> 898         content = self.save()
    899         self.handles.close()
    900         return content
I read about some limitations of mounted file systems in this post: Pandas: Write to Excel not working in Databricks
But if I got it right, the solution is to write to the local workspace file system, which is exactly what is not working for me.
My user is workspace admin and I am using a standard cluster with 10.4 Runtime.
I also verified that I can write a CSV file to the same location using to_csv.
What could be missing?
Databricks has a drawback in that it does not allow random write operations into DBFS, as indicated in the SO thread you are referring to. (Saving an .xlsx file involves seeking within the file, whereas CSV output is written sequentially, which is presumably why to_csv works.)
So, a workaround is to write the file to the local file system (file:/) and then move it to the required location inside DBFS. You can use the following code:
import pandas as pd
url = 'https://www.stats.govt.nz/assets/Uploads/Business-price-indexes/Business-price-indexes-March-2019-quarter/Download-data/business-price-indexes-march-2019-quarter-csv.csv'
data = pd.read_csv(url)
with pd.ExcelWriter(r'export.xlsx', engine="openpyxl") as writer:
    # The file will be written to /databricks/driver/, i.e., the local file system.
    data.to_excel(writer)
Note that dbutils.fs.ls("/databricks/driver/") interprets the path as dbfs:/databricks/driver/ (an absolute DBFS path), which does not exist.
/databricks/driver/ belongs to the driver's local file system (DBFS is mounted into it at /dbfs). Its absolute path is file:/databricks/driver/, and you can list its contents using either of the following:
import os
print(os.listdir("/databricks/driver/"))
#OR
dbutils.fs.ls("file:/databricks/driver/")
So, use the file located in this path and move (or copy) it to your destination using the shutil library, as follows:
from shutil import move
move('/databricks/driver/export.xlsx','/dbfs/tmp/export.xlsx')
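If you'd rather stay with dbutils, the same move can be done with explicit schemes (this assumes the file was written to the driver's working directory as above):
# file:/ is the driver's local disk, dbfs:/ is the DBFS root.
dbutils.fs.mv("file:/databricks/driver/export.xlsx", "dbfs:/tmp/export.xlsx")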
I have uploaded a HDF file to Google Drive and wish to load it in Colab. The file was created from a dataframe with DataFrame.to_hdf() and it can be loaded successfully locally with pd.read_hdf(). However, when I try to mount my Google Drive and read the data in Colab, it fails with a ValueError.
Here is the code I am using to read the data:
import pandas as pd
from google.colab import drive

drive.mount('/content/drive')
data = pd.read_hdf('/content/drive/My Drive/Ryhmäytyminen/data/data.h5', 'students')
And this is the full error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-cfe913c26e60> in <module>()
----> 1 data = pd.read_hdf('/content/drive/My Drive/Ryhmäytyminen/data/data.h5', 'students')
7 frames
/usr/local/lib/python3.6/dist-packages/tables/vlarray.py in read(self, start, stop, step)
819 listarr = []
820 else:
--> 821 listarr = self._read_array(start, stop, step)
822
823 atom = self.atom
tables/hdf5extension.pyx in tables.hdf5extension.VLArray._read_array()
ValueError: cannot set WRITEABLE flag to True of this array
Reading some JSON data was successful, so the problem probably is not with mounting. Any ideas what is wrong or how to debug this problem?
Thank you!
Try navigating to the directory where your HDF file is stored first:
cd /content/drive/My Drive/Ryhmäytyminen/data
From here you should be able to load the HDF file directly:
data = pd.read_hdf('data.h5', 'students')
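If that doesn't help, one more thing worth trying (my own suggestion, not something from the thread) is to copy the file from the Drive mount onto the Colab VM's local disk first and read it from there:
import shutil
import pandas as pd

# Copy from the mounted Drive onto the VM's local disk, then read locally.
shutil.copy('/content/drive/My Drive/Ryhmäytyminen/data/data.h5', '/content/data.h5')
data = pd.read_hdf('/content/data.h5', 'students')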
Had a quick question regarding a pandas DataFrame and the pd.read_pickle() function. Basically, I have a large but simple DataFrame (333 MB). When I run pd.read_pickle on it, I get an EOFError.
Is there any way around this issue? What might be causing this?
I saw the same EOFError when I created a pickle using:
df.to_pickle('path.pkl', compression='bz2')
and then tried to read with:
pandas.read_pickle('path.pkl')
I fixed the issue by supplying the compression on read:
pandas.read_pickle('path.pkl', compression='bz2')
According to the Pandas docs:
compression : {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’
string representing the compression to use in the output file. By default,
infers from the file extension in specified path.
Thus, simply changing the path from 'path.pkl' to 'path.bz2' also fixed the problem.
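In other words, either name the file so the compression can be inferred, or pass the same compression on both sides; a minimal round trip:
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# Explicit compression on both write and read:
df.to_pickle('data.pkl', compression='bz2')
df2 = pd.read_pickle('data.pkl', compression='bz2')

# Or rely on compression='infer' (the default) via the file extension:
df.to_pickle('data.pkl.bz2')
df3 = pd.read_pickle('data.pkl.bz2')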
I can confirm the valuable comment of greg_data:
When I encountered this error I worked out that it was due to the
initial pickling not having completed correctly. The pickle file was
created, but not finished correctly. Seems to me this is the only
possible source of the EOFError in pickle, that the pickle is
malformed, i.e. not finished.
My error during pickling was:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-40-263240bbee7e> in <module>()
----> 1 main()
<ipython-input-38-9b3c6d782a2a> in main()
43 with open("/content/drive/MyDrive/{}.file".format(tm.id), "wb") as f:
---> 44 pickle.dump(tm, f, pickle.HIGHEST_PROTOCOL)
45
46 print('Coherence:', get_coherence(tm, token_lists, 'c_v'))
TypeError: can't pickle weakref objects
And when reading that pickle file, which was obviously not finished during pickling, the reported error occurred:
pd.read_pickle(r'/content/drive/MyDrive/TEST_2021_06_01_10_23_02.file')
Error:
---------------------------------------------------------------------------
EOFError Traceback (most recent call last)
<ipython-input-41-460bdd0a2779> in <module>()
----> 1 object = pd.read_pickle(r'/content/drive/MyDrive/TEST_2021_06_01_10_23_02.file')
/usr/local/lib/python3.7/dist-packages/pandas/io/pickle.py in read_pickle(filepath_or_buffer, compression)
180 # We want to silence any warnings about, e.g. moved modules.
181 warnings.simplefilter("ignore", Warning)
--> 182 return pickle.load(f)
183 except excs_to_catch:
184 # e.g.
EOFError: Ran out of input
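A defensive pattern against exactly this failure mode (my own suggestion, not from the thread) is to dump to a temporary file and rename it into place only on success, so an interrupted dump never leaves a truncated pickle behind:
import os
import pickle
import tempfile

def safe_pickle_dump(obj, path):
    # Dump to a temp file in the target directory, then rename atomically,
    # so a failure mid-dump cannot leave a truncated file at `path`.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
        os.replace(tmp, path)
    except BaseException:
        os.remove(tmp)
        raise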
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
tips = sns.load_dataset('tips')
tips.head()
tips['tip_pct'] = 100 * tips['tip'] / tips['total_bill']
grid = sns.FacetGrid(tips, row="sex", col="time", margin_titles=True)
grid.map(plt.hist, "tip_pct", bins=np.linspace(0, 40, 15));
When I run the above code in the Spyder IDE (from the Anaconda Navigator package), I get the desired results. But when the same code is run in the Jupyter QtConsole (including the line %matplotlib inline), I get the following errors:
Out:
ValueError                                Traceback (most recent call last)
<ipython-input-47-c7ea1bbe0c80> in <module>()
----> 1 grid.map(plt.hist, "tip_pct", bins=np.linspace(0, 40, 15));
/Users/waqas/anaconda/lib/python3.5/site-packages/seaborn/axisgrid.py in map(self, func, *args, **kwargs)
701
702 # Get the current axis
--> 703 ax = self.facet_axis(row_i, col_j)
704
705 # Decide what color to plot with
/Users/waqas/anaconda/lib/python3.5/site-packages/seaborn/axisgrid.py in facet_axis(self, row_i, col_j)
832
833 # Get a reference to the axes object we want, and make it active
--> 834 plt.sca(ax)
835 return ax
836
/Users/waqas/anaconda/lib/python3.5/site-packages/matplotlib/pyplot.py in sca(ax)
905 m.canvas.figure.sca(ax)
906 return
--> 907 raise ValueError("Axes instance argument was not found in a figure.")
908
909
ValueError: Axes instance argument was not found in a figure.
I don't know what's going on.
Somewhat related: I was getting the same error when running a Jupyter notebook because I was running the following lines in separate cells.
g = sns.FacetGrid(data=titanic,col='sex')
g.map(plt.hist,'age')
Once I ran them both in the same cell the image displayed properly.
Since you're using the Qt console, see if it helps to assign the result of the mapping back to grid:
grid = grid.map(plt.hist, "tip_pct", bins=np.linspace(0, 40, 15))
You'll see the same approach is used in the documentation for FacetGrid.