I am drawing a scatter plot with matplotlib and trying to use mpld3 to render it as HTML. When the figure is converted to a dict, it contains a property edgewidths whose value is a NumPy array with a single element, 1.0, which json.dumps cannot serialize. I am not sure which matplotlib axes property I need to change to fix this, or whether either library needs some initialization setting. Also, is mpld3 the best backend for rendering as HTML?
Sample code
import json
import mpld3
from matplotlib import pyplot
fig, ax = pyplot.subplots()
ax.scatter(a, b)
fig_json = json.dumps(mpld3.fig_to_dict(fig), skipkeys=True)
Problematic dict property when pretty printing dict
{'axes': [{'axes': [{'fontsize': 10.0,
'grid': {'alpha': 1.0,
'color': '#D3D3D3',
...
'axesbgalpha': None,
...
'collections': [{'alphas': [None],
...
'edgewidths': array([1.]), <--- This property
Exception
fig_json = json.dumps(mpld3.fig_to_dict(fig), skipkeys=True)
File "/usr/lib/python3.5/json/__init__.py", line 237, in dumps
**kw).encode(obj)
File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
return _iterencode(o, 0)
File "/usr/lib/python3.5/json/encoder.py", line 179, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: array([1.]) is not JSON serializable
Dependencies used
matplotlib (2.1.2)
mpld3 (0.3)
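One common way around this kind of error is to give json.dumps a custom encoder that converts NumPy values to plain Python types. The sketch below is a minimal illustration, not a confirmed fix for mpld3 itself: NumpyEncoder is a hypothetical helper name, and fig_dict is a hand-built stand-in for the output of mpld3.fig_to_dict(fig).

```python
import json

import numpy as np


class NumpyEncoder(json.JSONEncoder):
    """Hypothetical helper: turn NumPy values into JSON-friendly types."""

    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()   # array([1.]) -> [1.0]
        if isinstance(obj, np.generic):
            return obj.item()     # NumPy scalar -> Python scalar
        return super().default(obj)


# Stand-in for mpld3.fig_to_dict(fig); only the problematic key is modeled.
fig_dict = {"edgewidths": np.array([1.0]), "alphas": [None]}
fig_json = json.dumps(fig_dict, cls=NumpyEncoder)
```

With a real figure, the same encoder would be passed as json.dumps(mpld3.fig_to_dict(fig), cls=NumpyEncoder), assuming the NumPy array is the only unserializable value in the dict.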
I am converting a pandas DataFrame to a Polars DataFrame, but pyarrow throws an error.
My code:
import polars as pl
import pandas as pd
if __name__ == "__main__":
    with open(r"test.xlsx", "rb") as f:
        excelfile = f.read()
    excelfile = pd.ExcelFile(excelfile)
    sheetnames = excelfile.sheet_names
    df = pd.concat(
        [
            pd.read_excel(
                excelfile, sheet_name=x, header=0)
            for x in sheetnames
        ], axis=0)
    df_pl = pl.from_pandas(df)
Error:
File "pyarrow\array.pxi", line 312, in pyarrow.lib.array
File "pyarrow\array.pxi", line 83, in pyarrow.lib._ndarray_to_array
File "pyarrow\error.pxi", line 122, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'int' object
I tried changing the pandas DataFrame dtype to str, which solves the problem, but I don't want to change dtypes. Is this a bug in pyarrow, or am I missing something?
Edit: Polars 0.13.42 and later
Polars now has a read_excel function that will correctly handle this situation. read_excel is now the preferred way to read Excel files into Polars.
Note: to use read_excel, you will need to install xlsx2csv (which can be installed with pip).
Polars: prior to 0.13.42
I can replicate this result. It is due to a column in the original Excel file that contains both text and numbers.
For example, create a new Excel file with one column in which you type both numbers and text, save it, and run your code on that file. I get the following traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/xxx/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/convert.py", line 299, in from_pandas
return DataFrame._from_pandas(df, rechunk=rechunk, nan_to_none=nan_to_none)
File "/home/xxx/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/frame.py", line 454, in _from_pandas
pandas_to_pydf(
File "/home/xxx/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 485, in pandas_to_pydf
arrow_dict = {
File "/home/xxx/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 486, in <dictcomp>
str(col): _pandas_series_to_arrow(
File "/home/xxx/.virtualenvs/StackOverflow3.10/lib/python3.10/site-packages/polars/internals/construction.py", line 237, in _pandas_series_to_arrow
return pa.array(values, pa.large_utf8(), from_pandas=nan_to_none)
File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
File "pyarrow/error.pxi", line 122, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'int' object
There are several lengthy discussions on this issue, such as these:
to_parquet can't handle mixed type columns #21228
pyarrow.lib.ArrowTypeError: "Expected a string or bytes object, got a 'int' object" #349
This particular comment might be relevant, as you are concatenating the results of parsing multiple sheets in an Excel file. This may lead to conflicting dtypes for a column:
https://github.com/pandas-dev/pandas/issues/21228#issuecomment-419175116
How to approach this depends on your data and its use, so I can't recommend a blanket solution (e.g., fixing your source Excel file, or changing the dtype to str).
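To see which columns are affected before converting, it can help to list the columns that hold more than one Python type. The sketch below is only an illustration under stated assumptions: sheet_a and sheet_b are made-up frames standing in for two Excel sheets, and mixed_type_columns is a hypothetical helper, not part of pandas or Polars.

```python
import pandas as pd

# Made-up stand-ins for two sheets whose "code" column disagrees on type.
sheet_a = pd.DataFrame({"code": [1, 2], "name": ["x", "y"]})
sheet_b = pd.DataFrame({"code": ["A3", "B4"], "name": ["z", "w"]})
df = pd.concat([sheet_a, sheet_b], axis=0, ignore_index=True)


def mixed_type_columns(frame):
    """Return the names of columns containing more than one Python type."""
    mixed = []
    for col in frame.columns:
        kinds = frame[col].dropna().map(type).unique()
        if len(kinds) > 1:
            mixed.append(col)
    return mixed


mixed = mixed_type_columns(df)
```

Any column reported here is a candidate for an explicit cast (or a fix in the source file) before calling pl.from_pandas.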
I solved my problem by saving the pandas DataFrame to a CSV file and then reading that file into Polars.
import os
import polars as pl
import pandas as pd
if __name__ == "__main__":
    with open(r"test.xlsx", "rb") as f:
        excelfile = f.read()
    excelfile = pd.ExcelFile(excelfile)
    sheetnames = excelfile.sheet_names
    df = pd.concat([pd.read_excel(excelfile, sheet_name=x, header=0)
                    for x in sheetnames
                    ], axis=0)
    df.to_csv("temp.csv", index=False)
    # read_csv loads the data eagerly; scan_csv is lazy, so the file
    # would need to exist until the query is collected.
    df_pl = pl.read_csv("temp.csv")
    os.remove("temp.csv")
How do you get the object returned by ax.hist() and then call setp on it?
Here is what I mean:
n,bins2,patches = ax2.hist(arra,bins=18,weights=1./bias,normed=False,color='#d9d9db')
ax2.hist.setp(edgecolor='g')
Well, obviously this doesn't work! I am getting an error:
File "./bin_data.py", line 112, in <module>
ax2.hist.setp(edgecolor='g')
AttributeError: 'function' object has no attribute 'setp'
Your help will be greatly appreciated!
Of course, to change the edgecolor you can pass it directly to the hist call:
n,bins2,patches = ax2.hist(..., facecolor='#d9d9db', edgecolor="g")
To answer the question: the object whose properties you want to set is the third return value of hist, which is a container of bar patches:
n,bins2,patches = ax2.hist(..., color='#d9d9db')
plt.setp(patches, edgecolor="g")
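For reference, here is a self-contained version of the second approach. It is a sketch under assumptions: the data is synthetic, the weights argument from the question is left out, and the Agg backend is selected only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no GUI needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the question's `arra`.
rng = np.random.default_rng(0)
data = rng.normal(size=200)

fig, ax2 = plt.subplots()
# hist returns (counts, bin edges, container of bar patches).
n, bins2, patches = ax2.hist(data, bins=18, color="#d9d9db")
# setp applies the property to every patch in the container.
plt.setp(patches, edgecolor="g")
```

Because patches is iterable, the same result could be had with a loop calling set_edgecolor on each bar; setp is just the shorter form.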
This question is something of an extension of "How can I use values read from TFRecords as arguments to tf.reshape?"
I cast my images into a certain shape with the following code:
height = tf.cast(features['height'],tf.int32)
width = tf.cast(features['width'],tf.int32)
image = tf.reshape(image,tf.pack([height, width, 3]))
In cifar10_input code, the image is then distorted with the following, where IMAGE_SIZE = 32:
height = IMAGE_SIZE
width = IMAGE_SIZE
distorted_image = tf.random_crop(image, [height, width, 3])
However, for my purposes, I don't need to do a random crop now. As such, I replaced that line with:
distorted_image = image
When I do this, it throws the following error:
Traceback (most recent call last):
File "cnn_train.py", line 128, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 30, in run
sys.exit(main(sys.argv))
File "cnn_train.py", line 124, in main
train()
File "cnn_train.py", line 56, in train
images, labels = cnn.distorted_inputs()
File "/home/samuelchin/tensorflow/my_code/CNN/cnn.py", line 123, in distorted_inputs
batch_size=BATCH_SIZE)
File "/home/samuelchin/tensorflow/my_code/CNN/cnn_input.py", line 128, in distorted_inputs
min_queue_examples, batch_size)
File "/home/samuelchin/tensorflow/my_code/CNN/cnn_input.py", line 70, in _generate_image_and_label_batch
min_after_dequeue=min_queue_examples)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 494, in shuffle_batch
dtypes=types, shapes=shapes)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 404, in __init__
shapes = _as_shape_list(shapes, dtypes)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 70, in _as_shape_list
raise ValueError("All shapes must be fully defined: %s" % shapes)
ValueError: All shapes must be fully defined: [TensorShape([Dimension(None), Dimension(None), Dimension(None)]), TensorShape([])]
I have 2 questions:
Why does it throw an error when I don't do a tf.random_crop? It seems to me as if tf.random_crop returns something that is totally different from image.
Is simply setting IMAGE_SIZE to the size that I want a good solution? For example, if the images are 32 x 32 and I want to crop it to 24 x 24, I will set IMAGE_SIZE = 24. Now, since I want it to be 32 x 32, should I simply set IMAGE_SIZE = 32?
Because you're generating your image dynamically, including pulling the height and width out of the TFRecord file at run time, TensorFlow doesn't know the static shape of the resulting image. Many of the later ops in the pipeline need to be able to determine the shape at graph-construction time, when the Python code executes.
tf.random_crop has the incidental effect of setting the image to a known, fixed size, which leaves its static shape available for subsequent processing.
You can just slice the image to the size you want instead of doing a random_crop, but you need to perform some operation to turn the image into a fixed-size thing. If you want it to be 32x32 and you know that your input height and width are 32x32, then you can just do set_shape on it (but you'd better be right). Otherwise, you can crop and/or resize to the size you want.
Simple example of using Google prices in DataFrame format. Gadfly plot gives the following error: TypeError(u'There is no Line2D property "y"',). Also references matplotlib for some reason.
Here's code:
using Quandl
using DataFrames
google = quandl("GOOG/NASDAQ_QQQ", format = "DataFrame")
date = google[1]
dt_str = Array(Any,length(date))
for i=1:length(date)
dt_str[i] = string(date[i]);
end
price = google[5]
using Gadfly
set_default_plot_size(20cm, 10cm)
p1 = plot(x=dt_str, y=price,
Geom.point,
Geom.smooth(method=:lm),
Guide.xticks(ticks=[1:25]),
Guide.yticks(ticks=[1:25]),
Guide.xlabel("Date"),
Guide.ylabel("Price"),
Guide.title("Google: Close Price"))
LoadError: PyError (:PyObject_Call)
TypeError(u'There is no Line2D property "y"',)
File "C:\Anaconda2\lib\site-packages\matplotlib\pyplot.py", line 3154, in plot
ret = ax.plot(*args, **kwargs)
File "C:\Anaconda2\lib\site-packages\matplotlib\__init__.py", line 1811, in inner
return func(ax, *args, **kwargs)
File "C:\Anaconda2\lib\site-packages\matplotlib\axes\_axes.py", line 1424, in plot
for line in self._get_lines(*args, **kwargs):
File "C:\Anaconda2\lib\site-packages\matplotlib\axes\_base.py", line 395, in _grab_next_args
for seg in self._plot_args(remaining[:isplit], kwargs):
File "C:\Anaconda2\lib\site-packages\matplotlib\axes\_base.py", line 374, in _plot_args
seg = func(x[:, j % ncx], y[:, j % ncy], kw, kwargs)
File "C:\Anaconda2\lib\site-packages\matplotlib\axes\_base.py", line 281, in _makeline
self.set_lineprops(seg, **kwargs)
File "C:\Anaconda2\lib\site-packages\matplotlib\axes\_base.py", line 189, in set_lineprops
line.set(**kwargs)
File "C:\Anaconda2\lib\site-packages\matplotlib\artist.py", line 936, in set
(self.__class__.__name__, k))
while loading In[64], in expression starting on line 1
in getindex at C:\Users\yburkitbayev\.julia\v0.4\PyCall\src\PyCall.jl:239
The presence of the PyError would imply to me that the session in which this example was executed has loaded PyPlot prior to loading Gadfly. Both PyPlot and Gadfly export the plot function, so uses of plot in a session where both PyPlot and Gadfly have been loaded require the qualification of the function name with the package name (e.g. PyPlot.plot or Gadfly.plot).
Executing your example in a session where PyPlot has not been loaded, but Gadfly is loaded, produces a Gadfly plot without displaying the error message provided in your post.
I am new to Matplotlib and am struggling a bit to differentiate between the OO and pyplot interfaces. I’m actually working with the Kivy GUI framework and trying to plot 4 subplots on a single figure, to be displayed by Kivy. Here’s a snippet of my code:
def create_plot(self):
    self.fig, ((self.ax0, self.ax1), (self.ax2, self.ax3)) = plt.subplots(nrows=2, ncols=2)
    self.ax0.set_title("A")
    self.ax0.grid(True, lw = 2, ls = '--', c = '.75')
    self.ax1.set_title("B")
    self.ax1.grid(True, lw = 2, ls = '--', c = '.75')
    self.ax2.set_title("C")
    self.ax2.grid(True, lw = 2, ls = '--', c = '.75')
    self.ax3.set_title("D")
    self.ax3.grid(True, lw = 2, ls = '--', c = '.75')
    #plt.tight_layout()
    plt.show()
    canvas = self.fig.canvas
    self.add_widget(canvas)
What worries me is that I am calling plt methods and assigning the results to my objects. Is plt the state machine interface and not the OO interface, or is this OK?
Secondly, I want to periodically update the plotted lines, so I have a plot method that does this:
def plot(self, xCoords, yCoords):
    if len(self.ax0.lines) > 0:
        self.ax0.lines.pop(0)
    line = self.ax0.plot(xCoords, yCoords, color='blue')
    canvas = self.fig.canvas
    canvas.draw()
Does that look ok? Can I just pop the existing line, or should I reuse the existing line?
Lastly, and most difficult, if I enable:
plt.tight_layout()
I get an exception:
C:\Kivy-1.9.0-py3.4-win32-x64\Python34\lib\site-packages\matplotlib\tight_layout.py:225: UserWarning: tight_layout : falling back to Agg renderer
warnings.warn("tight_layout : falling back to Agg renderer")
Traceback (most recent call last):
File "main.py", line 1117, in <module>
GuiApp().run()
File "C:\Kivy-1.9.0-py3.4-win32-x64\Python34\lib\site-packages\kivy\app.py", line 801, in run
self.load_kv(filename=self.kv_file)
File "C:\Kivy-1.9.0-py3.4-win32-x64\Python34\lib\site-packages\kivy\app.py", line 598, in load_kv
root = Builder.load_file(rfilename)
File "C:\Kivy-1.9.0-py3.4-win32-x64\Python34\lib\site-packages\kivy\lang.py", line 1801, in load_file
return self.load_string(data, **kwargs)
File "C:\Kivy-1.9.0-py3.4-win32-x64\Python34\lib\site-packages\kivy\lang.py", line 1880, in load_string
self._apply_rule(widget, parser.root, parser.root)
File "C:\Kivy-1.9.0-py3.4-win32-x64\Python34\lib\site-packages\kivy\lang.py", line 2038, in _apply_rule
self._apply_rule(child, crule, rootrule)
File "C:\Kivy-1.9.0-py3.4-win32-x64\Python34\lib\site-packages\kivy\lang.py", line 2037, in _apply_rule
self.apply(child)
File "C:\Kivy-1.9.0-py3.4-win32-x64\Python34\lib\site-packages\kivy\lang.py", line 1924, in apply
self._apply_rule(widget, rule, rule)
File "C:\Kivy-1.9.0-py3.4-win32-x64\Python34\lib\site-packages\kivy\lang.py", line 2038, in _apply_rule
self._apply_rule(child, crule, rootrule)
File "C:\Kivy-1.9.0-py3.4-win32-x64\Python34\lib\site-packages\kivy\lang.py", line 2038, in _apply_rule
self._apply_rule(child, crule, rootrule)
File "C:\Kivy-1.9.0-py3.4-win32-x64\Python34\lib\site-packages\kivy\lang.py", line 2035, in _apply_rule
child = cls(__no_builder=True)
File "C:\SVNProj\Raggio\trunk\hostconsole\gui\mygraph.py", line 127, in __init__
self.create_plot()
File "C:\SVNProj\Raggio\trunk\hostconsole\gui\mygraph.py", line 224, in create_plot
self.add_widget(canvas)
File "C:\Kivy-1.9.0-py3.4-win32-x64\Python34\lib\site-packages\kivy\uix\boxlayout.py", line 211, in add_widget
widget.bind(
AttributeError: 'FigureCanvasAgg' object has no attribute 'bind'
Can anyone help with that please?
Best regards
David
Original post
Regarding def create_plot
plt.subplots() is just a wrapper around creating a new figure and adding subplots (axes) to that figure, so it is safe to use in any context, AFAIK.
Regarding updating lines vs. creating new lines
Popping the old line and creating a new one with a call to axes.plot() will work, but it will be slower than reusing the existing line artist and calling artist.set_data(x_data, y_data).
Regarding the tight_layout issue
Consider calling self.fig.tight_layout() instead of plt.tight_layout(), as plt.tight_layout() might be grabbing the wrong figure.
Update based on discussion in comments:
Since you are plotting one line per panel, I would suggest adding these four lines to your create_plot method to keep references to the Line2D artists returned by axes.plot():
self.line0, = self.ax0.plot([], [], 'b')
self.line1, = self.ax1.plot([], [], 'b')
self.line2, = self.ax2.plot([], [], 'b')
self.line3, = self.ax3.plot([], [], 'b')
Then in the plot method you can just do this:
self.line0.set_data(xCoords, yCoords)
One caveat: set_data does not rescale the axes on its own. Calling self.ax0.relim() only recomputes the data limits; follow it with self.ax0.autoscale_view() so that the view limits are actually updated.
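On the axis limits, a pattern that generally works is to call relim() and then autoscale_view() after each set_data. The sketch below shows the idea outside of Kivy and makes a few assumptions: update is a hypothetical stand-in for the plot method, a single axes replaces the 2x2 grid, and the Agg backend is used so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no GUI needed
import matplotlib.pyplot as plt

fig, ax0 = plt.subplots()
# Keep a reference to the Line2D artist so it can be reused.
(line0,) = ax0.plot([], [], "b")


def update(x_coords, y_coords):
    """Hypothetical stand-in for the plot method in the question."""
    line0.set_data(x_coords, y_coords)
    ax0.relim()            # recompute data limits from the artists
    ax0.autoscale_view()   # apply those limits to the visible view
    fig.canvas.draw()


update([0, 1, 2, 3], [10, 20, 15, 30])
x_limits = ax0.get_xlim()
y_limits = ax0.get_ylim()
```

Reusing the one artist keeps len(ax0.lines) at 1 no matter how often update runs, which is the main benefit over popping and re-plotting.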