How to save a correlation map generated in pandas [duplicate]

The code below when run in jupyter notebook renders a table with a colour gradient format that I would like to export to an image file.
The resulting 'styled_table' object that notebook renders is a pandas.io.formats.style.Styler type.
I have not been able to find a way to export the Styler to an image.
I hope someone can share a working example of an export, or give me some pointers.
import pandas as pd
import seaborn as sns
data = {('count', 's25'):
{('2017-08-11', 'Friday'): 88.0,
('2017-08-12', 'Saturday'): 90.0,
('2017-08-13', 'Sunday'): 93.0},
('count', 's67'):
{('2017-08-11', 'Friday'): 404.0,
('2017-08-12', 'Saturday'): 413.0,
('2017-08-13', 'Sunday'): 422.0},
('count', 's74'):
{('2017-08-11', 'Friday'): 203.0,
('2017-08-12', 'Saturday'): 227.0,
('2017-08-13', 'Sunday'): 265.0},
('count', 's79'):
{('2017-08-11', 'Friday'): 53.0,
('2017-08-12', 'Saturday'): 53.0,
('2017-08-13', 'Sunday'): 53.0}}
table = pd.DataFrame.from_dict(data)
table.sort_index(ascending=False, inplace=True)
cm = sns.light_palette("seagreen", as_cmap=True)
styled_table = table.style.background_gradient(cmap=cm)
styled_table

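As mentioned in the comments, you can use the render method to obtain the HTML of the styled table (render() was deprecated in pandas 1.4 and removed in 2.0; on newer versions call to_html() instead):
html = styled_table.render()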
You can then use a package that converts html to an image. For example, IMGKit: Python library of HTML to IMG wrapper. Bear in mind that this solution requires the installation of wkhtmltopdf, a command line tool to render HTML into PDF and various image formats. It is all described in the IMGKit page.
Once you have that, the rest is straightforward:
import imgkit
imgkit.from_string(html, 'styled_table.png')

You can use dexplo's dataframe_image from https://github.com/dexplo/dataframe_image. After installing the package, it also lets you save styler objects as images like in this example from the README:
import numpy as np
import pandas as pd
import dataframe_image as dfi
df = pd.DataFrame(np.random.rand(6,4))
df_styled = df.style.background_gradient()
dfi.export(df_styled, 'df_styled.png')

Related

can't create a graph with matplotlib from a csv file / data type issue

I'm hoping to get some help here. I'm trying to create some simple bar/line graphs from a csv file, however, it gives me an empty graph until I open this csv file manually in excel and change the data type to numeric. I've tried changing the data type with pd.to_numeric but it still gives an empty graph.
The csv that I'm trying to visualise is web data that I scraped using Beautiful Soup. I used the .text attribute to get rid of all of the HTML tags, so maybe that is causing the issue?
Would really appreciate some help. thanks!
Data file: https://dropmefiles.com/AYTUT
import numpy
import matplotlib
from matplotlib import pyplot as plt
import pandas as pd
import csv
my_data = pd.read_csv('my_data.csv')
my_data_n = my_data.apply(pd.to_numeric)
plt.bar(x=my_data_n['Company'], height=my_data_n['Market_Cap'])
plt.show()
Your csv file is corrupt. There are commas at the end of each line. Remove them and your code should work. pd.to_numeric is not required for this sample dataset.
Test code:
from matplotlib import pyplot as plt
import pandas as pd
my_data = pd.read_csv('/mnt/ramdisk/my_data.csv')
fig = plt.bar(x=my_data['Company'], height=my_data['Market_Cap'])
plt.tick_params(axis='x', rotation=90)
plt.title("Title")
plt.tight_layout()
plt.show()
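If editing the file by hand is inconvenient, the trailing commas can also be removed in code before parsing. This is a minimal sketch using only the standard library; the helper name strip_trailing_commas and the sample rows are hypothetical, and it assumes every affected line ends with exactly one stray comma:

```python
import csv
import io


def strip_trailing_commas(text):
    """Return CSV text with one trailing comma removed from each line."""
    cleaned = []
    for line in text.splitlines():
        cleaned.append(line[:-1] if line.endswith(",") else line)
    return "\n".join(cleaned) + "\n"


# Hypothetical sample mimicking the problem: data lines end with a stray comma.
raw = "Company,Market_Cap\nAcme,120.5,\nGlobex,98.1,\n"
fixed = strip_trailing_commas(raw)

# Parse the cleaned text to confirm every row now has two fields.
rows = list(csv.reader(io.StringIO(fixed)))
print(rows)
```

Note that this would also strip a legitimately empty last field. The pandas read_csv documentation additionally describes index_col=False for files with delimiters at the end of each line, which may make editing the file unnecessary.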

GeoViews saving inline HTML file is very large

I have created geo-dataframe using a combination of geopandas and geoviews. Libraries I'm using are below:
import pandas as pd
import numpy as np
import geopandas as gpd
import holoviews as hv
import geoviews as gv
import matplotlib.pyplot as plt
import matplotlib
import panel as pn
from cartopy import crs
gv.extension('bokeh')
I have concatenated 3 shapefiles to build a polygon picture of UK healthcare boundaries (links to files provided if needed). Unfortunately, from what I have found, the UK doesn't produce one file that combines all of them, so I have had to merge the shapefiles from the 3 individual countries I'm interested in. The 3 shapefiles have the following sizes:
shape file 1 = 5 MB (https://www.opendatani.gov.uk/dataset/department-of-health-trust-boundaries)
shape file 2 = 204 KB (https://geoportal.statistics.gov.uk/datasets/5252644ec26e4bffadf9d3661eef4826_4)
shape file 3 = 22 KB (https://data.gov.uk/dataset/31ab16a2-22da-40d5-b5f0-625bafd76389/local-health-boards-december-2016-ultra-generalised-clipped-boundaries-in-wales)
I have merged them all successfully to build the picture I am looking for using:
Test = gv.Polygons(Merged_Shapes, vdims=[('Data'), ('CCG_Name')], crs=crs.OSGB()).options(tools=['hover'], width=550, height=700)
Test_2 = gv.Polygons(Merged_Shapes, vdims=[('Data'), ('CCG_Name')], crs=crs.OSGB()).options(tools=['hover'], width=550, height=700)
However, I would like to include these charts in a shareable html file. The issue I'm running into, is that when I save the HTML using:
from bokeh.resources import INLINE
layout = hv.Layout(Test + Test_2)
Final_report = pn.Tabs(('Test',layout)).save('Map_test.html', resources=INLINE)
This generates an HTML file that displays the charts, but its size is 80 MB, which is far too large, especially if I want to include more polygon charts and other charts in the same HTML file.
Does anyone know of a more efficient way, in terms of file size, to store my polygon charts within an HTML file for sharing?
You can make the file smaller by rasterizing or by decimating the shapes. For rasterizing you can call hv.operation.datashader.rasterize(obj). For decimating, both Shapely geometries and GeoPandas GeoSeries provide a simplify(tolerance) method that reduces the number of vertices in each shape.
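The decimation idea can be sketched with Shapely alone. The circle below is only a stand-in for a detailed boundary polygon, and the tolerance of 0.05 is an arbitrary value chosen for illustration; a suitable tolerance for real boundaries depends on the coordinate units (metres for OSGB) and on how much detail the map needs:

```python
from shapely.geometry import Point

# Stand-in for a detailed boundary: a circle approximated by many vertices
# (the second argument is the number of segments per quarter circle).
detailed = Point(0, 0).buffer(1.0, 64)

# simplify() drops vertices that deviate from the outline by less than the
# tolerance; preserve_topology=True keeps the result a valid polygon.
simplified = detailed.simplify(0.05, preserve_topology=True)

before = len(detailed.exterior.coords)
after = len(simplified.exterior.coords)
print(f"vertices before: {before}, after: {after}")
```

Assuming Merged_Shapes is a GeoDataFrame, the equivalent GeoPandas call would be along the lines of Merged_Shapes.geometry.simplify(tolerance), applied before building the gv.Polygons objects.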

JupyterLab fig does not show. It shows blank result (but works fine on jupyternotebook)

I am new to JupyterLab trying to learn.
When I try to plot a graph, it works fine on jupyter notebook, but does not show the result on jupyterlab. Can anyone help me with this?
Here are the imports:
import pandas as pd
import pandas_datareader.data as web
import time
# import matplotlib.pyplot as plt
import datetime as dt
import plotly.graph_objects as go
import numpy as np
from matplotlib import style
# from matplotlib.widgets import EllipseSelector
from alpha_vantage.timeseries import TimeSeries
Here is the code for plotting below:
def candlestick(df):
    fig = go.Figure(data = [go.Candlestick(x = df["Date"], open = df["Open"], high = df["High"], low = df["Low"], close = df["Close"])])
    fig.show()
JupyterLab Result:
Link to the image (JupyterLab)
JupyterNotebook Result:
Link to the image (Jupyter Notebook)
I have updated both JupyterLab and Notebook to the latest version. I do not know what is causing JupyterLab to stop showing the figure.
Thank you for reading my post. Help would be greatly appreciated.
Note*
I did not include the parts for data reading (Stock OHLC values). It contains the API keys. I am sorry for inconvenience.
Also, this is my second post on stack overflow. If this is not a well-written post, I am sorry. I will try to put more effort if it is possible. Thank you again for help.
TL;DR:
run the following and then restart your jupyter lab
jupyter labextension install @jupyterlab/plotly-extension
Start the lab with:
jupyter lab
Test with the following code:
import plotly.graph_objects as go
from alpha_vantage.timeseries import TimeSeries
def candlestick(df):
    fig = go.Figure(data = [go.Candlestick(x = df.index, open = df["1. open"], high = df["2. high"], low = df["3. low"], close = df["4. close"])])
    fig.show()
# preferable to save your key as an environment variable....
key = "YOUR_API_KEY"  # placeholder: put your own key here
ts = TimeSeries(key = key, output_format = "pandas")
data_av_hist, meta_data_av_hist = ts.get_daily('AAPL')
candlestick(data_av_hist)
Note: Depending on system and installation of JupyterLab versus bare Jupyter, jlab may work instead of jupyter
Longer explanation:
Since this issue is with plotly and not matplotlib, you do NOT have to use the "inline magic" of:
%matplotlib inline
Each extension has to be installed into JupyterLab; you can see the list of installed extensions with:
jupyter labextension list
For a more verbose explanation on another extension, please see related issue:
jupyterlab interactive plot
Patrick Collins already gave the correct answer.
However, the extension might not support the current version of JupyterLab, and for various reasons one might not be able to update JupyterLab:
ValueError: The extension "@jupyterlab/plotly-extension" does not yet support the current version of JupyterLab.
In that case, a quick workaround is to save the image and display it again:
from IPython.display import Image
fig.write_image("image.png")
Image(filename='image.png')
To get the write_image() method of Plotly to work, kaleido must be installed:
pip install -U kaleido
This is a full example (originally from Plotly) to test this workaround:
import os
import pandas as pd
import plotly.express as px
from IPython.display import Image
df = pd.DataFrame([
dict(Task="Job A", Start='2009-01-01', Finish='2009-02-28', Resource="Alex"),
dict(Task="Job B", Start='2009-03-05', Finish='2009-04-15', Resource="Alex"),
dict(Task="Job C", Start='2009-02-20', Finish='2009-05-30', Resource="Max")
])
fig = px.timeline(df, x_start="Start", x_end="Finish", y="Resource", color="Resource")
if not os.path.exists("images"):
    os.mkdir("images")
fig.write_image("images/fig1.png")
Image(filename='images/fig1.png')

Generating a NetCDF from a text file

Using Python can I open a text file, read it into an array, then save the file as a NetCDF?
The following script I wrote was not successful.
import os
import pandas as pd
import numpy as np
import PIL.Image as im
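path = r'C:\path\to\data'  # raw string so backslashes are not read as escapes
grb = []  # one DataFrame per file
for fn in os.listdir(path):
    file = os.path.join(path, fn)
    if os.path.isfile(file):
        df = pd.read_table(file, skiprows=6)
        grb.append(df)
df2 = np.array(grb)  # pd.np was removed from pandas; use numpy directly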
#imarray = im.fromarray(df2) ##cannot handle this data type
#imarray.save('Save_Array_as_TIFF.tif')
I once used xarray (formerly named xray) to read a NetCDF file into an ASCII DataFrame. I just googled it and apparently it also has a to_netcdf function.
Import xarray; it lets you work with labelled data much like pandas does. A pandas DataFrame has no to_netcdf method of its own, so convert it to an xarray Dataset first.
So give this a try:
df.to_xarray().to_netcdf(file_path)
xarray slow to save netCDF
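The text-to-NetCDF round trip could look roughly like the sketch below. It assumes each text file holds a whitespace-delimited numeric grid after six header lines (suggested by skiprows=6 in the question, but not confirmed there). The helper names and the "y", "x" and "grid" labels are hypothetical, and writing the file needs a NetCDF backend such as netCDF4, h5netcdf or scipy installed alongside xarray:

```python
import numpy as np
import xarray as xr


def load_text_grid(path, skiprows=6):
    """Read a whitespace-delimited numeric grid into a 2-D DataArray."""
    values = np.loadtxt(path, skiprows=skiprows, ndmin=2)
    # "y", "x" and "grid" are illustrative names, not taken from the question.
    return xr.DataArray(values, dims=("y", "x"), name="grid")


def save_netcdf(data_array, out_path):
    """Write the array to a NetCDF file using whichever backend is installed."""
    data_array.to_netcdf(out_path)
```

A call such as save_netcdf(load_text_grid("grid.txt"), "grid.nc") would then write one file; the file names here are placeholders.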

How to plot remote image (from http url)

This must be easy, but I can't figure how right now without using urllib module and manually fetching remote file
I want to overlay plot with remote image (let's say "http://matplotlib.sourceforge.net/_static/logo2.png"), and neither imshow() nor imread() can load the image.
Any ideas which function will allow loading remote image?
It is easy indeed (this uses Python 2's urllib2; on Python 3, use urllib.request.urlopen instead):
import urllib2
import matplotlib.pyplot as plt
# create a file-like object from the url
f = urllib2.urlopen("http://matplotlib.sourceforge.net/_static/logo2.png")
# read the image file in a numpy array
a = plt.imread(f)
plt.imshow(a)
plt.show()
This works for me in a notebook with python 3.5:
from skimage import io
import matplotlib.pyplot as plt
url = "http://matplotlib.sourceforge.net/_static/logo2.png"  # image from the question
image = io.imread(url)
plt.imshow(image)
plt.show()
You can do it with this code:
from matplotlib import pyplot as plt
a = plt.imread("http://matplotlib.sourceforge.net/_static/logo2.png")
plt.imshow(a)
plt.show()
pyplot.imread for URLs is deprecated
Passing a URL is deprecated. Please open the URL for reading and pass
the result to Pillow, e.g. with
np.array(PIL.Image.open(urllib.request.urlopen(url))).
Matplotlib suggests using PIL instead. I prefer using imageio, as suggested by SciPy:
imread is deprecated in SciPy 1.0.0, and will be removed in 1.2.0. Use
imageio.imread instead.
imageio.imread(uri, format=None, **kwargs)
Reads an image from the specified file. Returns a numpy array, which
comes with a dict of meta data at its ‘meta’ attribute.
Note that the image data is returned as-is, and may not always have a
dtype of uint8 (and thus may differ from what e.g. PIL returns).
Example:
import matplotlib.pyplot as plt
from imageio import imread
url = "http://matplotlib.sourceforge.net/_static/logo2.png"
img = imread(url)
plt.imshow(img)
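For completeness, the Pillow route that the Matplotlib deprecation message points to can be sketched as follows on Python 3. The helper names image_from_bytes and fetch_image are hypothetical; the response is read fully into memory first so that Pillow gets a seekable buffer:

```python
import io
import urllib.request

import numpy as np
from PIL import Image


def image_from_bytes(data):
    """Decode raw image bytes (PNG, JPEG, ...) into a numpy array."""
    return np.array(Image.open(io.BytesIO(data)))


def fetch_image(url):
    """Download the image at `url` and return it as a numpy array."""
    with urllib.request.urlopen(url) as response:
        return image_from_bytes(response.read())


# Usage (requires network access):
# import matplotlib.pyplot as plt
# plt.imshow(fetch_image("http://matplotlib.sourceforge.net/_static/logo2.png"))
# plt.show()
```

Splitting the decoding from the download keeps the decoding step usable for images that are already in memory.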