Incomplete graph in Google Colab - google-colaboratory

Good day to all,
2 weeks ago my pyplots graph were working good. Hoever this week using the same code they are not plotting corectly on the y_axis. I thank you in advance all the help. Please find below a link to an example notebook as well as the example code I took from here [2], using seabrone library.
On the following picture you can see that the first and last row are incomplete.
https://drive.google.com/open?id=1My18DBfbTLsmeN2TxKYeFMkezXMGeb5W
Or you can copy the following code:
import seaborn as sn
import pandas as pd
import matplotlib.pyplot as plt
array = [[33,2,0,0,0,0,0,0,0,1,3],
[3,31,0,0,0,0,0,0,0,0,0],
[0,4,41,0,0,0,0,0,0,0,1],
[0,1,0,30,0,6,0,0,0,0,1],
[0,0,0,0,38,10,0,0,0,0,0],
[0,0,0,3,1,39,0,0,0,0,4],
[0,2,2,0,4,1,31,0,0,0,2],
[0,1,0,0,0,0,0,36,0,2,0],
[0,0,0,0,0,0,1,5,37,5,1],
[3,0,0,0,0,0,0,0,0,39,0],
[0,0,0,0,0,0,0,0,0,0,38]]
df_cm = pd.DataFrame(array, index = [i for i in "ABCDEFGHIJK"],
columns = [i for i in "ABCDEFGHIJK"])
plt.figure(figsize = (10,7))
sn.heatmap(df_cm, annot=True)
## Retrieved from https://stackoverflow.com/questions/35572000/how-can-i-plot-a-confusion-matrix

I already solved it or at least I found the reason of the problem. There was an update of matplotlib. The lastest version is 3.1.1 and I was using 3.1.0. So I used the following command to install the 3.1.0 version on colab. After that everything went back to normal
#This was the code for changing the matplotlib version in Google Colab:
! pip install matplotlib==3.1.0

Related

Plot a subset of data from a grib file on google colab

I'm trying to plot a subset of a field from a grib file on google colab. The issue I am finding is that due to google colab using an older version of python I can't get enough libraries to work together to 1.) get a field from the grib file and then 2.) extract a subset of that field by lat/lon, and then 3.) be able to plot with matplotlib/cartopy.
I've been able to do each of the above steps on my own PC and there are numerous answers on this forum already that work away from colab, so the issue is related to making it work on the colab environment, which uses python 3.7.
For simplicity, here are some assumptions that could be made for anybody who wants to help.
1.) Use this file, since its been what I have been trying to use:
https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20221113/conus/hrrr.t18z.wrfnatf00.grib2
2.) You could use any field, but I've been extracting this one (output from pygrib):
14:Temperature:K (instant):lambert:hybrid:level 1:fcst time 0 hrs:from 202211131800
3.) You can get this data in zarr format from AWS, but the grib format uploads to the AWS database faster so I need to use it.
Here are some notes on what I've tried:
Downloading the data isn't an issue, it's mostly relating to extracting the data (by lat lon) that is the main issue. I've tried using condacolab or pip to download pygrib, pupygrib, pinio, or cfgrib. I can then use these to download the data above.
I could never get pupygrib or pinio to even download correctly. Cfgrib I was able to get it to work with conda, but then xarray fails when trying to extract fields due to a library conflict. Pygrib worked the best, I was able to extract fields from the grib file. However, the function grb.data(lat1=30,lat2=40,lon1=-100,lon2-90) fails. It dumps the data into 1d arrays instead of 2d as it is supposed to per the documentation found here: https://jswhit.github.io/pygrib/api.html#example-usage
Here is some code I used for the pygrib set up in case that is useful:
!pip install pyproj
!pip install pygrib
# Uninstall existing shapely
!pip uninstall --yes shapely
!apt-get install -qq libgdal-dev libgeos-dev
!pip install shapely --no-binary shapely
!pip install cartopy==0.19.0.post1
!pip install metpy
!pip install wget
!pip install s3fs
import time
from matplotlib import pyplot as plt
import numpy as np
import scipy
import pygrib
import fsspec
import xarray as xr
import metpy.calc as mpcalc
from metpy.interpolate import cross_section
from metpy.units import units
from metpy.plots import USCOUNTIES
import cartopy.crs as ccrs
import cartopy.feature as cfeature
!wget https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20221113/conus/hrrr.t18z.wrfnatf00.grib2
grbs = pygrib.open('/content/hrrr.t18z.wrfnatf00.grib2')
grb2 = grbs.message(1)
data, lats, lons = grb2.data(lat1=30,lat2=40,lon1=-100,lon2=-90)
data.shape
This will output a 1d array for data, or lats and lons. That is as far as I can get here because existing options like meshgrib don't work on big datasets (I tried it).
The other option is to get data this way:
grb_t = grbs.select(name='Temperature')[0]
This is plottable, but I don't know of a way to extract a subset of the data from here using lat/lons.
If you can help, feel free to ask me anything I can add more details, but since I've tried like 10 different ways probably no sense in adding every failure. Really, I am open to any way to accomplish this task. Thank you.

Sklearn datasets default data structure is pandas or numPy?

I'm working through an exercise in https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/ and am finding unexpected behavior on my computer when I fetch a dataset. The following code returns
numpy.ndarray
on the author's Google Collab page, but returns
pandas.core.frame.DataFrame
on my local Jupyter notebook. As far as I know, my environment is using the exact same versions of libraries as the author. I can easily convert the data to a numPy array, but since I'm using this book as a guide for novices, I'd like to know what could be causing this discrepancy.
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)
mnist.keys()
type(mnist['data'])
The author's Google Collab is at the following link, scrolling down to the "MNIST" heading. Thanks!
https://colab.research.google.com/github/ageron/handson-ml2/blob/master/03_classification.ipynb#scrollTo=LjZxzwOs2Q2P.
Just to close off this question, the comment by Ben Reiniger, namely to add as_frame=False, is correct. For example:
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
The OP has already made this change to the Colab code in the link.

JupyterLab fig does not show. It shows blank result (but works fine on jupyternotebook)

I am new to JupyterLab trying to learn.
When I try to plot a graph, it works fine on jupyter notebook, but does not show the result on jupyterlab. Can anyone help me with this?
Here are the codes below:
import pandas as pd
import pandas_datareader.data as web
import time
# import matplotlib.pyplot as plt
import datetime as dt
import plotly.graph_objects as go
import numpy as np
from matplotlib import style
# from matplotlib.widgets import EllipseSelector
from alpha_vantage.timeseries import TimeSeries
Here is the code for plotting below:
def candlestick(df):
fig = go.Figure(data = [go.Candlestick(x = df["Date"], open = df["Open"], high = df["High"], low = df["Low"], close = df["Close"])])
fig.show()
JupyterLab Result:
Link to the image (JupyterLab)
JupyterNotebook Result:
Link to the image (Jupyter Notebook)
I have updated both JupyterLab and Notebook to the latest version. I do not know what is causing JupyterLab to stop showing the figure.
Thank you for reading my post. Help would be greatly appreciated.
Note*
I did not include the parts for data reading (Stock OHLC values). It contains the API keys. I am sorry for inconvenience.
Also, this is my second post on stack overflow. If this is not a well-written post, I am sorry. I will try to put more effort if it is possible. Thank you again for help.
TL;DR:
run the following and then restart your jupyter lab
jupyter labextension install #jupyterlab/plotly-extension
Start the lab with:
jupyter lab
Test with the following code:
import plotly.graph_objects as go
from alpha_vantage.timeseries import TimeSeries
def candlestick(df):
fig = go.Figure(data = [go.Candlestick(x = df.index, open = df["1. open"], high = df["2. high"], low = df["3. low"], close = df["4. close"])])
fig.show()
# preferable to save your key as an environment variable....
key = # key here
ts = TimeSeries(key = key, output_format = "pandas")
data_av_hist, meta_data_av_hist = ts.get_daily('AAPL')
candlestick(data_av_hist)
Note: Depending on system and installation of JupyterLab versus bare Jupyter, jlab may work instead of jupyter
Longer explanation:
Since this issue is with plotly and not matplotlib, you do NOT have to use the "inline magic" of:
%matplotlib inline
Each extension has to be installed to the jupyter lab, you can see the list with:
jupyter labextension list
For a more verbose explanation on another extension, please see related issue:
jupyterlab interactive plot
Patrick Collins already gave the correct answer.
However, the current JupyterLab might not be supported by the extension, and for various reasons one might not be able to update the JupyterLab:
ValueError: The extension "#jupyterlab/plotly-extension" does not yet support the current version of JupyterLab.
In this condition a quick workaround would be to save the image and show it again:
from IPython.display import Image
fig.write_image("image.png")
Image(filename='image.png')
To get the write_image() method of Plotly to work, kaleido must be installed:
pip install -U kaleido
This is a full example (originally from Plotly) to test this workaround:
import os
import pandas as pd
import plotly.express as px
from IPython.display import Image
df = pd.DataFrame([
dict(Task="Job A", Start='2009-01-01', Finish='2009-02-28', Resource="Alex"),
dict(Task="Job B", Start='2009-03-05', Finish='2009-04-15', Resource="Alex"),
dict(Task="Job C", Start='2009-02-20', Finish='2009-05-30', Resource="Max")
])
fig = px.timeline(df, x_start="Start", x_end="Finish", y="Resource", color="Resource")
if not os.path.exists("images"):
os.mkdir("images")
fig.write_image("images/fig1.png")
Image(filename='images/fig1.png')

NameError: name 'pd' is not defined

I am attempting run in Jupyter
import pandas as pd
import matplotlib.pyplot as plt # plotting
import numpy as np # dense matrices
from scipy.sparse import csr_matrix # sparse matrices
%matplotlib inline
However when loading the dataset with
wiki = pd.read_csv('people_wiki.csv')
# add id column
wiki['id'] = range(0, len(wiki))
wiki.head(10)
the following error persists
NameError Traceback (most recent call last)
<ipython-input-1-56330c326580> in <module>()
----> 1 wiki = pd.read_csv('people_wiki.csv')
2 # add id column
3 wiki['id'] = range(0, len(wiki))
4 wiki.head(10)
NameError: name 'pd' is not defined
Any suggestions appreciated
Select Restart & Clear Output and run the cells again from the beginning.
I had the same issue, and as Ivan suggested in the comment, this resolved it.
If you came here from a duplicate, notice also that your code needs to contain
import pandas as pd
in the first place. If you are using a notebook like Jupyter and it's already there, or if you just added it, you probably need to re-evaluate the cell, as suggested in the currently top-voted answer by martin-martin.
python version will need 3.6 above, I think you have been use the python 2.7. Please select from top right for your python env version.
Be sure to load / import Pandas first
When stepping through the Anaconda Navigator demo, I found that pressing "play" on the first line before inputting the second line resolved the issue.

Is there any 2D plotting library compatible with pypy?

I am a heavy user of jupyter notebook and, lately, I am running it using pypy instead of python to get extra speed. It works perfectly but I am missing matplotlib so much. Is there any decent 2D plotting library compatible with pypy and jupyter notebook? I don't need fancy stuff, scatter, line and bar plots would be more than enough.
Bokeh is working fairly good with pypy. The only problem I have encountered is linked to the use of numpy.datetime64 that is not yet supported by pypy. Fortunately it is enough to monkey-patch bokeh/core/properties.py and bokeh/util/serialization.py to pass in case of datetime64 reference.
I did it in this way:
bokeh/core/properties.py
...
try:
import numpy as np
datetime_types += (np.datetime64,)
except:
pass
...
and
bokeh/util/serialization.py
...
# Check for astype failures (putative Numpy < 1.7)
try:
dt2001 = np.datetime64('2001')
legacy_datetime64 = (dt2001.astype('int64') ==
dt2001.astype('datetime64[ms]').astype('int64'))
except:
legacy_datetime64 = False
pass
...
And managed to get nice looking plots in jupyter using pypy.