GeoViews saving inline HTML file is very large - pandas

I have created geo-dataframe using a combination of geopandas and geoviews. Libraries I'm using are below:
import pandas as pd
import numpy as np
import geopandas as gpd
import holoviews as hv
import geoviews as gv
import matplotlib.pyplot as plt
import matplotlib
import panel as pn
from cartopy import crs
gv.extension('bokeh')
I have concatenated 3 shapefiles to build a polygon picture of UK healthcare boundaries (links to files provided if needed). Unfortunately, from what i have found the UK doesn't produce one file that combines all of those, so have had to merge the shape files from the 3 individual countries i'm interested in. The 3 shape files have a size of:
shape file 1 = 5mb (https://www.opendatani.gov.uk/dataset/department-of-health-trust-boundaries)
shape file 2 = 204kb (https://geoportal.statistics.gov.uk/datasets/5252644ec26e4bffadf9d3661eef4826_4)
shape file 3 = 22kb (https://data.gov.uk/dataset/31ab16a2-22da-40d5-b5f0-625bafd76389/local-health-boards-december-2016-ultra-generalised-clipped-boundaries-in-wales)
I have merged them all successfully to build the picture i am looking for using:
Test = gv.Polygons(Merged_Shapes, vdims=[('Data'), ('CCG_Name')], crs=crs.OSGB()).options(tools=['hover'], width=550, height=700)
Test_2 = gv.Polygons(Merged_Shapes, vdims=[('Data'), ('CCG_Name')], crs=crs.OSGB()).options(tools=['hover'], width=550, height=700)
However, I would like to include these charts in a shareable html file. The issue I'm running into, is that when I save the HTML using:
from bokeh.resources import INLINE
layout = hv.Layout(Test + Test_2)
Final_report = pn.Tabs(('Test',layout)).save('Map_test.html', resources=INLINE)
I generate a html file that displays the charts, but the size is 80mb, which is far to large, especially if I want include more polygon charts and other charts in the same html.
Does anyone know of a more efficient way, from a memory perspective, I can store my polygon charts within a HTML file for sharing?

You can make the file smaller by rasterizng or by decimating the shapes. For rasterizng you can call hv.operation.datashader.rasterize(obj), and I think there is something in Shapely or GeoPandas for simplifying the shapes.

Related

How to save histogram plots from Tensorboard 2 to disk, just like you can do with scalars?

I am using Tensorboard 2 to visualize my training data and I am able to save scalar plots to disk. However, I am unable to find a way to do this for histogram plots (tf.summary.histogram).
Is it possible to save histogram plots from Tensorboard 2 to disk, just like it is possible to do with scalars? I have looked through the documentation and it seems like this is not supported, but I wanted to confirm with the community before giving up. Any help or suggestions would be greatly appreciated.
There is an open issue to add a download button for histograms. However, this issue is open for more than 4 years, so I doubt it is getting resolved soon.
A workaround is to use the url that tensorboard would use to get the data.
A short example:
# writing some data to tensorboard
from torch.utils.tensorboard import SummaryWriter
import numpy as np
writer = SummaryWriter('./tmp')
writer.add_histogram('hist', np.arange(10), 0)
Open tensorboard in the browser (here localhost:6006):
Get data as JSON using the template
http://<tb-host>/data/plugin/histograms/histograms?run=<run-name>&tag=<tag-name>.
Here http://localhost:6006/data/plugin/histograms/histograms/?run=.&tag=hist:
Now you can download the data as JSON.
Quick comparison with matplotlib:
import pandas as pd
import json
import matplotlib.pyplot as plt
with open('histograms.json', 'r') as f:
d = pd.DataFrame(json.load(f)[0][2])
fix, axes = plt.subplots(1, 2, figsize=(10, 3))
axes[0].bar(d[1], d[2])
axes[0].set_title('tb')
axes[1].hist(data)
axes[1].set_title('original')

Generate graphs for all the columns in a excel file with Pandas in Google Colab

I'm trying to generate graphs for all the columns from a excel file but I'm having one error. My goal is getting this graphs in png files and then download it. Let me give you some context: I'm reading a csv, for each column, I'm trying to use a for to use .value_counts() and then create a graph once the graph is generated saving this one in a png file with the number of the index my code is this one:
import pandas as pd
from google.colab import files
from matplotlib import pyplot as plt
df = pd.read_excel('columns.xlsx')
for i in df.columns:
print(i)
#i.value_counts().plot(kind="bar", figsize=(15,7), color="#61d199")
df[i].value_counts().plot(kind="bar", figsize=(15,7), color="#61d199")
#plt.savefig('viz_movies.png')
Error in this code line:
df[i].value_counts().plot(kind="bar", figsize=(15,7), color="#61d199")
Error:
index 0 is out of bounds for axis 0 with size 0
Also I want to add in the for the names of the files something like this:
for index, number in enumerate(numbers):
plt.savefig('index.png') #use the index as a name

how to save Correlation map generated in pandas [duplicate]

The code below when run in jupyter notebook renders a table with a colour gradient format that I would like to export to an image file.
The resulting 'styled_table' object that notebook renders is a pandas.io.formats.style.Styler type.
I have not been able to find a way to export the Styler to an image.
I hope someone can share a working example of an export, or give me some pointers.
import pandas as pd
import seaborn as sns
data = {('count', 's25'):
{('2017-08-11', 'Friday'): 88.0,
('2017-08-12', 'Saturday'): 90.0,
('2017-08-13', 'Sunday'): 93.0},
('count', 's67'):
{('2017-08-11', 'Friday'): 404.0,
('2017-08-12', 'Saturday'): 413.0,
('2017-08-13', 'Sunday'): 422.0},
('count', 's74'):
{('2017-08-11', 'Friday'): 203.0,
('2017-08-12', 'Saturday'): 227.0,
('2017-08-13', 'Sunday'): 265.0},
('count', 's79'):
{('2017-08-11', 'Friday'): 53.0,
('2017-08-12', 'Saturday'): 53.0,
('2017-08-13', 'Sunday'): 53.0}}
table = pd.DataFrame.from_dict(data)
table.sort_index(ascending=False, inplace=True)
cm = sns.light_palette("seagreen", as_cmap=True)
styled_table = table.style.background_gradient(cmap=cm)
styled_table
As mentioned in the comments, you can use the render property to obtain an HTML of the styled table:
html = styled_table.render()
You can then use a package that converts html to an image. For example, IMGKit: Python library of HTML to IMG wrapper. Bear in mind that this solution requires the installation of wkhtmltopdf, a command line tool to render HTML into PDF and various image formats. It is all described in the IMGKit page.
Once you have that, the rest is straightforward:
import imgkit
imgkit.from_string(html, 'styled_table.png')
You can use dexplo's dataframe_image from https://github.com/dexplo/dataframe_image. After installing the package, it also lets you save styler objects as images like in this example from the README:
import numpy as np
import pandas as pd
import dataframe_image as dfi
df = pd.DataFrame(np.random.rand(6,4))
df_styled = df.style.background_gradient()
dfi.export(df_styled, 'df_styled.png')

can't create a graph with matplotlib from a csv file / data type issue

I'm hoping to get some help here. I'm trying to create some simple bar/line graphs from a csv file, however, it gives me an empty graph until I open this csv file manually in excel and change the data type to numeric. I've tried changing the data type with pd.to_numeric but it still gives an empty graph.
The csv that I'm trying to visualise is web data that I scraped using Beautiful Soup, I used .text method do get rid of all of the HTML tags so maybe it's causing the issue?
Would really appreciate some help. thanks!
Data file: https://dropmefiles.com/AYTUT
import numpy
import matplotlib
from matplotlib import pyplot as plt
import pandas as pd
import csv
my_data = pd.read_csv('my_data.csv')
my_data_n = my_data.apply(pd.to_numeric)
plt.bar(x=my_data_n['Company'], height=my_data_n['Market_Cap'])
plt.show()
Your csv file is corrupt. There are commas at the end of each line. Remove them and your code should work. pd.to_numeric is not required for this sample dataset.
Test code:
from matplotlib import pyplot as plt
import pandas as pd
my_data = pd.read_csv('/mnt/ramdisk/my_data.csv')
fig = plt.bar(x=my_data['Company'], height=my_data['Market_Cap'])
plt.tick_params(axis='x', rotation=90)
plt.title("Title")
plt.tight_layout()
plt.show()

reading arrays from netCDF, why I get a size of (1,1,n)

I am trying to read and later on to plot data from a netcdf file. Some of the arrays contained at the .nc file that I am trying to store as variables, are created as a (1,1,n) size variable. When printing them i see [[[ numbers, numbers,....]]]. Why are these three [[[ are created? How can I read these variables as a simple (n,1) array?
Here is my code
import pandas as pd
import netCDF4 as nc
import matplotlib.pyplot as plt
from tkinter import filedialog
import numpy as np
file_path=filedialog.askopenfilename(title = "Select files", filetypes = (("all files","*.*"),("txt files","*.txt")))
file=nc.Dataset(file_path)
print(file.variables.keys()) # get all variable names
read_alt=file.variables['altitude'][:]
alt=np.array(read_alt)
read_b355=file.variables['backscatter'][:]
read_error_b355=file.variables['error_backscatter'][:]
b355=np.array(read_b355)
error_b355=np.array(read_error_b355)
the variable alt is fine, for the other two I have the aforementioned problem.
Is it possible that your variables - altitude, backscatter and error_backscatter - have more than one dimensions? Whenever you load that kind of data, the number of dimensions is kept by the netCDF library.
Nevertheless, what I usually do, is that I remove the dimensions that I do not need from the arrays by squeezing them:
read_alt = np.squeeze(file.variables['altitude'][:])
read_b355 = np.squeeze(file.variables['backscatter'][:]);
read_error_b355 = np.squeeze(file.variables['error_backscatter'][:]);