Draw an ordinary plot with the same style as in plt.hist(histtype='step') - matplotlib

The method plt.hist() in pyplot has a way to create a 'step-like' plot style when calling
plt.hist(data, histtype='step')
but the 'ordinary' methods that plot raw data without processing (plt.plot(), plt.scatter(), etc.) apparently do not have style options to obtain the same result. My goal is to plot a given set of points using that style, without making a histogram of these points.
Is that achievable with standard library methods for plotting a given 2-D set of points?
I also think that there is at least one hack (generating a fake distribution which would have histogram equal to our data) and a 'low-level' solution to draw each segment manually, but none of these ways seems favorable.

Maybe you are looking for drawstyle="steps".
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
data = np.cumsum(np.random.randn(10))
plt.plot(data, drawstyle="steps")
plt.show()
Note that this is slightly different from histograms, because the lines do not go to zero at the ends.
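If the outline should also drop to zero at both ends, as a step histogram does, one possible approach is to pad the data with a zero on each side before plotting. This is only a sketch: the padding positions and the "steps-mid" variant are choices made here for illustration, not part of the answer above.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
y = np.cumsum(rng.standard_normal(10))
x = np.arange(len(y))

# Assumption: one extra point on each side, at y = 0, closes the outline
# down to the baseline the way a step histogram does.
x_padded = np.concatenate(([x[0] - 1], x, [x[-1] + 1]))
y_padded = np.concatenate(([0.0], y, [0.0]))

fig, ax = plt.subplots()
# "steps-mid" switches level halfway between neighbouring x positions
(line,) = ax.plot(x_padded, y_padded, drawstyle="steps-mid")
```

Call plt.show() afterwards to display the figure in an interactive session.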

Related

Cannot plot a histogram from a Pandas dataframe

I've used pandas.read_csv to generate a 1000-row dataframe with 32 columns. I'm looking to plot a histogram or bar chart (depending on data type) of each column. For columns of type 'int64', I've tried doing matplotlib.pyplot.hist(df['column']) and df.hist(column='column'), as well as calling matplotlib.pyplot.hist on df['column'].values and df['column'].to_numpy(). Weirdly, they all take a really long time (>30s), and when I've allowed them to complete, I get unit-height bars in multiple colors, as if there's some sort of implicit grouping and they're all being separated into different groups. Any ideas about what I can do to get a normal histogram? Unfortunately I closed the charts so I can't show you an example right now.
Edit - this seems to be a much bigger problem with Int columns, and casting them to float fixes the problem.
Follow these two steps:
import pyplot from the Matplotlib library
call its hist function on the dataframe column you want to plot
import matplotlib.pyplot as plt
plt.hist(df['column'], color='blue', edgecolor='black', bins=int(45/1))
Here's the source.
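The workaround mentioned in the question's edit (casting the integer column to float before plotting) can be sketched as follows. The dataframe here is made up for illustration; only the column name 'column' and the plt.hist styling come from the thread.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the 1000-row dataframe read with pandas.read_csv
rng = np.random.default_rng(0)
df = pd.DataFrame({"column": rng.integers(0, 100, size=1000)})

# Cast the int64 column to float and hand matplotlib a plain 1-D array
values = df["column"].astype(float).to_numpy()

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(values, bins=45, color="blue", edgecolor="black")
```

Passing a single 1-D array gives one histogram with 45 bins, so the returned counts add up to the number of rows.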

Matplotlib widget, secondary y axis, twinx

I use JupyterLab together with matplotlib widgets. I have ipywidgets installed.
My goal is to choose which y-axis data is displayed at the bottom of the figure.
When I use the interactive tool to see the coordinates, only the data of the right y-axis is displayed. Both would be really nice^^ My minimal code example:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib widget
x=np.linspace(0,100)
y=x**2
y2=x**3
fig,ax=plt.subplots()
ax2=ax.twinx()
ax.plot(x,y)
ax2.plot(x,y2)
plt.show()
With this example you might ask why I don't plot them on the same y-axis, but that's why it is a minimal example. I would like to plot data with different units.
To choose which y-axis is used, you can set the zorder property of the axes containing this y-axis to a higher value than that of the other axes (0 is the default):
ax.zorder = 1
However, that will cause this Axes to obscure the other Axes. To counteract this, use
ax.set_facecolor((0, 0, 0, 0))
to make the background color of this Axes transparent.
Alternatively, use the grab_mouse function of the figure canvas:
fig.canvas.grab_mouse(ax)
See here for the (minimal) documentation for grab_mouse.
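Putting the zorder approach together, a minimal sketch could look like this. The data mirrors the question's example; the final lookup through fig.canvas.inaxes() is added here only to check which Axes would receive the mouse position, and is not part of the answer.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 100)
fig, ax = plt.subplots()
ax2 = ax.twinx()
ax.plot(x, x**2)
ax2.plot(x, x**3)

# Raise the left Axes above the right one so it is chosen for mouse events
ax.set_zorder(ax2.get_zorder() + 1)
# Make its background transparent so the line drawn on ax2 stays visible
ax.set_facecolor((0, 0, 0, 0))

# Ask the canvas which Axes it reports for the pixel at the figure center
center = (fig.bbox.width / 2, fig.bbox.height / 2)
picked = fig.canvas.inaxes(center)
```

With the zorder raised, picked should be the left Axes (ax) rather than ax2.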
The reason this works is this:
The coordinate line shown below the figure is produced by an event callback that ultimately calls matplotlib.axes.Axes.format_coord() on the Axes instance given by the inaxes property of the matplotlib events generated by your mouse movement. This Axes is the one returned by FigureCanvasBase.inaxes(), which uses the Axes zorder and, in case of ties, chooses the last Axes created.
However, you can tell the figure canvas that one Axes should receive all mouse events, in which case this Axes is also set as the inaxes property of generated events (see the code).
I have not found a clean way to make the display show data from both Axes. The only solution I have found would be to monkey-patch NavigationToolbar2._mouse_event_to_message (also here) to do what you want.
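A lighter alternative to monkey-patching the toolbar, offered here as a sketch rather than as the answerer's method, is to replace format_coord on the Axes that receives the events (ax2 with default zorder) and convert its coordinates to the other Axes through display space. The helper name format_both and the fixed y-limits are assumptions chosen to make the mapping easy to check.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 100)
fig, ax = plt.subplots()
ax2 = ax.twinx()
ax.plot(x, x)
ax2.plot(x, 10 * x)
# Fixed limits so that the right axis is exactly 10 times the left axis
ax.set_ylim(0, 100)
ax2.set_ylim(0, 1000)

def format_both(x_right, y_right):
    """Hypothetical formatter: report the y value on both Axes."""
    # data coordinates of ax2 -> display (pixel) coordinates
    display_xy = ax2.transData.transform((x_right, y_right))
    # display coordinates -> data coordinates of ax
    x_left, y_left = ax.transData.inverted().transform(display_xy)
    return f"x={x_right:.1f}, left y={y_left:.1f}, right y={y_right:.1f}"

# ax2 was created last, so with equal zorder it receives the mouse events
ax2.format_coord = format_both
message = ax2.format_coord(10.0, 500.0)
```

If the zorder trick is applied as well, the formatter would need to be attached to whichever Axes ends up on top.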

Difference between matplotlib.contourf and matlab.contourf() - odd sharp edges in matplotlib

I am a recent migrant from Matlab to Python and have recently worked with Numpy and Matplotlib. I recoded one of my scripts from Matlab, which employs Matlab's contourf function, into Python using matplotlib's corresponding contourf function. I managed to replicate the output in Python, except that the contourf plots are not exactly the same, for a reason that is unknown to me. When I run the contourf function in matplotlib, I get an otherwise nice figure, but it has sharp edges on the contour levels at the top and bottom, which should not be there (see Figure 1 below, matplotlib output). When I export the arrays I used in Python to Matlab (i.e. exactly the same data set that was used to generate the matplotlib contourf plot) and use Matlab's contourf function, I get a slightly different output, without those sharp contour-level edges (see Figure 2 below, Matlab output). I used the same number of levels in both figures. In Figure 3 I have made a scatter plot of the same data, which shows that there are no such sharp edges in the data as shown in the contourf plot (I added contour lines just for reference).
An example data set can be downloaded through the Dropbox link given below. The data set contains three txt files: X, Y, Z. Each of them is a 500x500 array, which can be used directly with contourf(), i.e. plt.contourf(X,Y,Z,...). The code that I used was
plt.contourf(X,Y,Z,10, cmap=plt.cm.jet)
plt.contour(X,Y,Z,10,colors='black', linewidths=0.5)
plt.axis('equal')
plt.axis('off')
Does anyone have an idea why this happens? I would appreciate any insight on this!
Cheers,
Jussi
Below are the details of my setup:
Python 3.7.0
IPython 6.5.0
matplotlib 2.2.3
Figure 1: Matplotlib output
Figure 2: Matlab output
Figure 3: Matplotlib scatter plot
Link to data set
The confusing thing about the Matlab plot is that its colorbar shows many more levels than there actually are in the plot. Hence you don't see the actual intervals that are contoured.
You would achieve the same result in matplotlib by choosing 12 instead of 11 levels.
import numpy as np
import matplotlib.pyplot as plt
X, Y, Z = [np.loadtxt("data/roundcontourdata/{}.txt".format(i)) for i in list("XYZ")]
levels = np.linspace(Z.min(), Z.max(), 12)
cntr = plt.contourf(X,Y,Z,levels, cmap=plt.cm.jet)
plt.contour(X,Y,Z,levels,colors='black', linewidths=0.5)
plt.colorbar(cntr)
plt.axis('equal')
plt.axis('off')
plt.show()
So in conclusion, both plots are correct and show the same data. Just the levels being automatically chosen are different. This can be circumvented by choosing custom levels depending on the desired visual appearance.
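To see how the level choice changes the contour boundaries without the original files, the comparison can be sketched on synthetic data. The Gaussian bump below is an assumption standing in for the X, Y, Z arrays from the question.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the 500x500 arrays of the question
x = np.linspace(-3, 3, 200)
X, Y = np.meshgrid(x, x)
Z = np.exp(-(X**2 + Y**2))

fig, (ax_auto, ax_custom) = plt.subplots(1, 2)

# An integer asks matplotlib to pick "nice" level values automatically
auto = ax_auto.contourf(X, Y, Z, 10, cmap=plt.cm.jet)

# An explicit array fixes the level boundaries exactly
levels = np.linspace(Z.min(), Z.max(), 12)
custom = ax_custom.contourf(X, Y, Z, levels, cmap=plt.cm.jet)

print("automatic levels:", auto.levels)
print("explicit levels: ", custom.levels)
```

Comparing the two printed arrays shows where the automatically chosen boundaries differ from the evenly spaced ones.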

Accessing backend specific functionality with Julia Plots

Plots is simple and powerful but sometimes I would like to have a little bit more control over individual elements of the plot to fine-tune its appearance.
Is it possible to update the plot object of the backend directly?
E.g., for the default pyplot backend, I tried
using Plots
p = plot(sin)
p.o[:axes][1][:xaxis][:set_ticks_position]("top")
but the plot does not change. Calling p.o[:show]() afterwards does not help, either.
In other words: Is there a way to use the PyPlot interface for a plot that was initially created with Plots?
Edit:
The changes to the PyPlot object become visible (also in the gui) when saving the figure:
using Plots
using PyPlot
p = Plots.plot(sin, top_margin=1cm)
gui() # not needed when using the REPL
gca()[:xaxis][:set_ticks_position]("top")
PyPlot.savefig("test.png")
Here, I used p.o[:axes][1] == gca(). One has to set top_margin=1cm because the plot area is not adjusted automatically (for my actual fine-tuning, this doesn't matter).
This also works for subsequent updates as long as only the PyPlot interface is used. E.g., after the following commands, the plot will have a red right border in addition to labels at the top:
gca()[:spines]["right"][:set_color]("red")
PyPlot.savefig("test.png")
However, when a Plots command like plot!(xlabel="foo") is used, all previous changes made with PyPlot are overwritten (which is not surprising).
The remaining question is how to update the gui interactively without having to call PyPlot.savefig explicitly.
No - the plot is a Plots object, not a PyPlot object. In your specific example you can do plot(sin, xmirror = true).
I'm trying to do the same but didn't find a solution to update an existing plot. But here is a partial answer: you can query information from the PyPlot axes object
julia> Plots.plot(sin, 1:4)
julia> Plots.PyPlot.plt[:xlim]()
(1.0,4.0)
julia> Plots.plot(sin, 20:24)
julia> ax = Plots.PyPlot.plt[:xlim]()
(20.0,24.0)
and it gets updated.
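For reference, the PyPlot calls used in the question wrap ordinary matplotlib methods. Written directly in Python they would look roughly like this; it is a sketch of the underlying calls only and says nothing about how Plots manages its own figure.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Same calls as gca()[:xaxis][:set_ticks_position]("top") and
# gca()[:spines]["right"][:set_color]("red") in the Julia snippets
ax.xaxis.set_ticks_position("top")
ax.spines["right"].set_color("red")
```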

Figures with lots of data points in matplotlib

I generated the attached image using matplotlib (png format). I would like to use eps or pdf, but I find that with all the data points, the figure is really slow to render on the screen. Other than just plotting less of the data, is there any way to optimize it so that it loads faster?
I think you have three options:
As you mentioned yourself, you can plot fewer points. For the plot you showed in your question I think it would be fine to only plot every other point.
As @tcaswell stated in his comment, you can use a line instead of points, which will be rendered more efficiently.
You could rasterize the blue dots. Matplotlib allows you to selectively rasterize single artists, so if you pass rasterized=True to the plotting command you will get a bitmapped version of the points in the output file. This will be way faster to load at the price of limited zooming due to the resolution of the bitmap. (Note that the axes and all the other elements of the plot will remain as vector graphics and font elements).
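The rasterization option can be sketched as below. The random data, marker size, and dpi value are assumptions for illustration; the figure is written to an in-memory buffer here, but a file name works the same way.

```python
import io
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in data
rng = np.random.default_rng(0)
x = rng.uniform(size=20000)
y = rng.uniform(size=20000)

fig, ax = plt.subplots()
# Only this artist is rasterized; axes, ticks and labels stay vector graphics
points = ax.scatter(x, y, marker="x", s=6, rasterized=True)

# dpi sets the resolution of the embedded bitmap in the PDF
buffer = io.BytesIO()
fig.savefig(buffer, format="pdf", dpi=150)
pdf_bytes = buffer.getvalue()
```

A higher dpi gives a sharper bitmap when zooming, at the cost of a larger file.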
First, if you want to show a "trend" in your plot, and the x, y arrays you are plotting are "huge", you could apply random sub-sampling to keep only a fraction of your data:
import numpy as np
import matplotlib.pyplot as plt
# x and y are the full data arrays from your own plot
fraction = 0.50  # probability of keeping each point
x_resampled = []
y_resampled = []
for k in range(len(x)):
    # keep point k with probability `fraction`
    if np.random.rand() < fraction:
        x_resampled.append(x[k])
        y_resampled.append(y[k])
plt.scatter(x_resampled, y_resampled, s=6)
plt.show()
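The same random sub-sampling can be written without a Python loop by using a boolean mask, which is usually much faster for large arrays. The random x and y below are placeholders for the real data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=10_000)  # placeholder data
y = rng.uniform(size=10_000)

fraction = 0.50
# One random draw per point; True means "keep this point"
keep = rng.random(x.size) < fraction
x_resampled = x[keep]
y_resampled = y[keep]
```

The resampled arrays can then be passed to plt.scatter exactly as in the loop version.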
Second, have you considered using log-scale in the x-axis to increase visibility?
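With rasterized=True, only the plotting area is rasterized; the axes are still in vector format. The example below, as written, uses rasterized=False and therefore saves an all-vector file; switch it to rasterized=True to rasterize the points:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.uniform(size=400000)
y = np.random.uniform(size=400000)
plt.scatter(x, y, marker='x', rasterized=False)  # set rasterized=True to rasterize the points
plt.savefig("norm.pdf", format='pdf')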