Saving Matplotlib Output to DBFS on Databricks - matplotlib

I'm writing Python code on Databricks to process some data and output graphs. I want to be able to save these graphs as a picture file (.png or something, the format doesn't really matter) to DBFS.
Code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'fruits':['apple','banana'], 'count': [1,2]})
plt.close()
df.set_index('fruits',inplace = True)
df.plot.bar()
# plt.show()
Things that I tried:
plt.savefig("/FileStore/my-file.png")
[Errno 2] No such file or directory: '/FileStore/my-file.png'
fig = plt.gcf()
dbutils.fs.put("/dbfs/FileStore/my-file.png", fig)
TypeError: has the wrong type - (,) is expected.
After some research, I think the fs.put only works if you want to save text files.
running the above code with plt.show() will get you a bar graph - I want to be able to save the bar graph as an image to DBFS. Any help is appreciated, thanks in advance!

Easier way, just with matplotlib.pyplot. Fix the dbfs path:
Example
import matplotlib.pyplot as plt
plt.scatter(x=[1,2,3], y=[2,4,3])
plt.savefig('/dbfs/FileStore/figure.png')

You can do this by saving the figure to memory and then using the Python local file APIs to write to the DataBricks filesystem (DBFS).
Example:
import matplotlib.pyplot as plt
from io import BytesIO
# Create a plt or fig, then:
buf = BytesIO()
plt.savefig(buf, format='png')
path = '/dbfs/databricks/path/to/file.png'
# Make sure to open the file in bytes mode
with open(path, 'wb') as f:
# You can also use Bytes.IO.seek(0) then BytesIO.read()
f.write(buf.getvalue())

Related

Plot from multiple files imported with glob

I need to process hundreds of data files and I want to plot the results in a single graph. I'm using glob with a for loop to read and store the data, but I have no idea how to plot them with plotly.
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import glob
pio.renderers.default = 'browser'
files = glob.glob('GIRS12_L_8V_0.95bar.*')
traces = []
for file in files:
dat = pd.read_csv(file, sep=' ')
dat.columns = ['time','v(t)']
fig = go.Figure()
traces.append(go.Scatter(x = dat['time'], y = dat['v(t)']))
px.scatter(data_frame = traces)
Is it right to call px.scatter(...)? I was using fig.show() at the end but I don't know why it does not show anything in the graph.
have generated 100s of CSVs to demonstrate
pathlib is more pythonic / OO approach to interacting with file system and hence glob()
simplest approach with plotly is to use Plotly Express to generate all of the traces. Have taken approach of preparing all data into a single pandas data frame to make this super simple
per comments, a figure with so many traces and hence such a long legend may not be best visualisation for what you are trying to achieve. Consider what you need to visualise and tune solution to achieve a better visualisation
from pathlib import Path
import pandas as pd
import numpy as np
import plotly.express as px
# location where files exist
p = Path.cwd().joinpath("SO_csv")
if not p.is_dir():
p.mkdir()
# generate 100s of files
for i in range(400):
pd.DataFrame(
{
"time": pd.date_range("00:00", freq="30min", periods=47),
"v(t)": pd.Series(np.random.uniform(1, 5, 47)).sort_values(),
}
).to_csv(p.joinpath(f"GIRS12_L_8V_0.95bar.{i}"), index=False)
# read and concat all the CSVs into one dataframe, creating additional column that is the filename
# scatter this dataframe, a scatter / color per CSV
px.scatter(
pd.concat(
[pd.read_csv(f).assign(name=f.name) for f in p.glob("GIRS12_L_8V_0.95bar.*")]
),
x="time",
y="v(t)",
color="name",
)

Unable to generate plot using matplotlib

I am a beginner to Python and experimenting with a plot. the script runs fine but plot does not show up.
the matplotlib and numpy libraries are installed.
import numpy as np
f= h5py.File('3DIMG_05JUN2021_0000_L3B_HEM_DLY.h5','r')
#Studying the structure of the file by printing what HDF5 groups are present
for key in f.keys():
print(key) #Names of the groups in HDF5 file.
# will print the variables in the file
#Get the HDF5 group
ls=list(f.keys())
print("ls")
print(ls)
tsurf = f['HEM_DLY'][:]
print("tsurf")
print(tsurf)
tsurf1=np.squeeze(tsurf)
print(tsurf1.shape)
import matplotlib.pyplot as plt
im= plt.plot(tsurf1)
#plt.colorbar()
plt.imshow(im)```
Python version is 3 running on Ubuntu
Difficult to give you the exact answer without the dataset (please update the question with the dataset), but for sure, plt.plot does not return an object that can be plotted with plt.imshow
Try instead:
ax = plt.plot(tsurf1)
plt.show()
Probably the error was on the final plot.Try this:
import numpy as np
import matplotlib.pyplot as plt
f= h5py.File('/path','r')
ls=list(f.keys())
tsurf = f['your_key_str'][:]
tsurf1=np.squeeze(tsurf)
im= plt.plot(tsurf1)
plt.show(im) # <-- plt.show() NOT plt.imshow()

can't create a graph with matplotlib from a csv file / data type issue

I'm hoping to get some help here. I'm trying to create some simple bar/line graphs from a csv file, however, it gives me an empty graph until I open this csv file manually in excel and change the data type to numeric. I've tried changing the data type with pd.to_numeric but it still gives an empty graph.
The csv that I'm trying to visualise is web data that I scraped using Beautiful Soup, I used .text method do get rid of all of the HTML tags so maybe it's causing the issue?
Would really appreciate some help. thanks!
Data file: https://dropmefiles.com/AYTUT
import numpy
import matplotlib
from matplotlib import pyplot as plt
import pandas as pd
import csv
my_data = pd.read_csv('my_data.csv')
my_data_n = my_data.apply(pd.to_numeric)
plt.bar(x=my_data_n['Company'], height=my_data_n['Market_Cap'])
plt.show()
Your csv file is corrupt. There are commas at the end of each line. Remove them and your code should work. pd.to_numeric is not required for this sample dataset.
Test code:
from matplotlib import pyplot as plt
import pandas as pd
my_data = pd.read_csv('/mnt/ramdisk/my_data.csv')
fig = plt.bar(x=my_data['Company'], height=my_data['Market_Cap'])
plt.tick_params(axis='x', rotation=90)
plt.title("Title")
plt.tight_layout()
plt.show()

Zoom not working on Loading pickled matplotlib Figure

I want to store a matplotlib figure and load it later to use it interactively. To be more specific, I want to be able to use zoom in this figure.
I am using pickle to dump the figure handle into a file.
I then load the figure later using pickle, but the zoom does not work after loading the file even though I can use zoom in the figure before pickling it.
Here's a sample script that illustrates my problem.
import matplotlib.pyplot as plt
import pickle
import numpy as np
import os
import time
# Create Plot Data
x = np.arange(100)
# Create Figure, Axes and plot
fig1,axes1 = plt.subplots()
axes1.plot(x)
# Pickle plot
fileName = os.getcwd() + "/img"\
+ time.asctime(time.localtime()) + ".pickle"
with open(fileName,'wb') as pickle_file:
pickle.dump(fig1,pickle_file)
plt.show() # ZOOM WORKS HERE
plt.close()
# Load pickled plot
with open(fileName,'rb') as read_pickle:
fig_handle = pickle.load(read_pickle)
plt.show() # ZOOM DOES NOT WORK HERE
Zooming into the image before Pickle
Non-Zoomable image after Pickle
Version:
Python 3.7.0
Matplotlib 3.0.0
Pickle 4.0
Is this a limitation with Pickling matplotlib figure? Or is there something I can do to load/dump the figure in a zoomable way?
As #ImportanceOfBeingErnest pointed out I fixed the problem by changing my backends from MacOSX to TkAgg.
Here's what I did.
import matplotlib
matplotlib.get_backend() # Get the current backend
'MacOSX'
# Get Location where configuration file was loaded from.
matplotlib.matplotlib_fname()
'/usr/local/lib/python3.7/site-packages/matplotlib/mpl-data/matplotlibrc'
# Edit -> backend:TkAgg
matplotlib.get_backend()
'TkAgg'

How to plot remote image (from http url)

This must be easy, but I can't figure how right now without using urllib module and manually fetching remote file
I want to overlay plot with remote image (let's say "http://matplotlib.sourceforge.net/_static/logo2.png"), and neither imshow() nor imread() can load the image.
Any ideas which function will allow loading remote image?
It is easy indeed:
import urllib2
import matplotlib.pyplot as plt
# create a file-like object from the url
f = urllib2.urlopen("http://matplotlib.sourceforge.net/_static/logo2.png")
# read the image file in a numpy array
a = plt.imread(f)
plt.imshow(a)
plt.show()
This works for me in a notebook with python 3.5:
from skimage import io
import matplotlib.pyplot as plt
image = io.imread(url)
plt.imshow(image)
plt.show()
you can do it with this code;
from matplotlib import pyplot as plt
a = plt.imread("http://matplotlib.sourceforge.net/_static/logo2.png")
plt.imshow(a)
plt.show()
pyplot.imread for URLs is deprecated
Passing a URL is deprecated. Please open the URL for reading and pass
the result to Pillow, e.g. with
np.array(PIL.Image.open(urllib.request.urlopen(url))).
Matplotlib suggests using PIL instead. I prefer using imageio as sugested by SciPy:
imread is deprecated in SciPy 1.0.0, and will be removed in 1.2.0. Use
imageio.imread instead.
imageio.imread(uri, format=None, **kwargs)
Reads an image from the specified file. Returns a numpy array, which
comes with a dict of meta data at its ‘meta’ attribute.
Note that the image data is returned as-is, and may not always have a
dtype of uint8 (and thus may differ from what e.g. PIL returns).
Example:
import matplotlib.pyplot as plt
from imageio import imread
url = "http://matplotlib.sourceforge.net/_static/logo2.png"
img = imread(url)
plt.imshow(img)