Does matplotlib autoscale have a default minimum tick size? Can this be changed?

I am using pyplot.scatter(x_coords, y_coords) to plot some points. When the points have very fine granularity, the tick spacing does not scale below 0.0002 as I would expect.
I have tried using ax.autoscale(tight=True), but the result did not change. Is there a way to autoscale my axes when points have a small granularity without manually finding and setting the axis limits?
These graphs should illustrate the problem. Both graphs are generated by the same code, but from different data sets. The values along the y-axis of the lower graph are not all 0; they are spread over the 10^-9 order of magnitude.
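For reference, a minimal sketch of the setup described above (the data here is made up to mimic a y-range on the order of 10^-9; autoscale(tight=True) is the call that was tried):

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: the y values only span roughly 1e-9
x = np.arange(10)
y = 1e-9 * np.random.rand(10)

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.autoscale(tight=True)  # the call tried in the question
plt.show()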

Related

How Can I Find Peak Values of Defined Areas from Spectrogram Data using numpy?

I have spectrogram data from an audio analysis which looks like this:
On one axis I have frequencies in Hz and on the other, times in seconds. I added the grid over the map to show the actual data points. Due to the nature of the frequency analysis used, the best results never give evenly spaced time and frequency values.
To allow comparison of data from multiple sources, I would like to normalize this data. For this reason, I would like to calculate the peak values (maximum and minimum values) for specified areas in the map.
The second visualization shows the areas where I would like to calculate the peak values. I marked an area with a green rectangle to visualize this.
While for the time values I would like to use equally spaced ranges (e.g. 0.0-10.0, 10.0-20.0, 20.0-30.0), the frequency ranges are unevenly distributed. At higher frequencies they will be something like 450-550, 550-1500, 1500-2500, ...
You can download an example data set here: data.zip. You can unpack it like this:
import numpy as np

with np.load(DATA_PATH) as data:
    frequency_labels = data['frequency_labels']
    time_labels = data['time_labels']
    spectrogram_data = data['data']
DATA_PATH has to point to the path of the .npz data file.
As input, I would provide an array of frequency ranges and an array of time ranges. The result should be another 2D NumPy ndarray containing either the maximum or the minimum values. As the amount of data is huge, I would like to rely on NumPy as much as possible to speed up the calculations.
How do I calculate the maximum/minimum values of defined areas from a 2d data map?
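A minimal sketch of one way to do this, assuming spectrogram_data is indexed as [frequency, time] and that frequency_labels and time_labels are sorted ascending (the function and argument names are made up for illustration):

import numpy as np

def peak_values(spectrogram, time_labels, frequency_labels,
                time_edges, frequency_edges, reduce=np.max):
    """Reduce the spectrogram over each time/frequency box.

    time_edges and frequency_edges are 1D arrays of bin boundaries,
    e.g. time_edges = [0.0, 10.0, 20.0, 30.0]. Returns an array of
    shape (len(frequency_edges) - 1, len(time_edges) - 1).
    """
    # Map each boundary onto an index into the sorted label arrays.
    t_idx = np.searchsorted(time_labels, time_edges)
    f_idx = np.searchsorted(frequency_labels, frequency_edges)

    out = np.full((len(frequency_edges) - 1, len(time_edges) - 1), np.nan)
    for i in range(len(frequency_edges) - 1):
        for j in range(len(time_edges) - 1):
            block = spectrogram[f_idx[i]:f_idx[i + 1], t_idx[j]:t_idx[j + 1]]
            if block.size:
                out[i, j] = reduce(block)
    return out

# Example call with the ranges mentioned above (swap reduce=np.min for minima)
peaks = peak_values(spectrogram_data, time_labels, frequency_labels,
                    time_edges=[0.0, 10.0, 20.0, 30.0],
                    frequency_edges=[450, 550, 1500, 2500])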

"Zoom in" on a violinplot whilst keeping accurate quartile lines (matplotlib/seaborn)

TL;DR: How can I get a subrange of a violinplot whilst keeping accurate quartile lines?
I am using seaborn violinplots to make static charts for a report, but as far as I can tell, there's no way to redraw a particular area between limits whilst retaining the 25/median/75 quartile lines of the original dataset.
Here's my example dataset as a violin. The 25/median/75 values are: left side 1.0/5.0/9.0; right side 2.0/5.0/9.0.
My data has such a long tail that all the useful info is scrunched up into a tiny area. I want to ignore (but not throw away) the tail and show a closer look at the interesting bit.
I tried to reset the ylim using ax.set(ylim=(0, upp)), but the resultant graph is not great: it's jaggy and the inner lines don't meet the violin edge.
Is there a way to reset the y-axis limits but get a better quality result?
Next I tried to cut off the tail by dropping values from the dataset. I dropped anything over the 97th centile. The violin looks way better, but the quartile lines have been recalculated for this new dataset. They're showing a median of about 4, not 5 as per the original dataset.
I'm using inner="quartile", so the code that gets called in Seaborn is _ViolinPlotter::draw_quartiles
def draw_quartiles(self, ax, data, support, density, center, split=False):
    """Draw the quartiles as lines at width of density."""
    q25, q50, q75 = np.percentile(data, [25, 50, 75])
    self.draw_to_density(ax, center, q25, support, density, split,
                         linewidth=self.linewidth,
                         dashes=[self.linewidth * 1.5] * 2)
As you can see, it assumes (understandably) that one wants to draw the quartile lines at percentiles 25, 50 and 75. It'd be amazeballs if there was a way I could call draw_to_density with my own values (is there?).
At the moment, I am attempting to manually adjust the position of the lines. It's trivial to figure out & set the y-values:
for l in ax.lines:
    l.set_ydata(<get correct quartile value from original dataset>)
but I'm finding it hard to figure out the limits for x, i.e. the density of the distribution at the quartiles. It seems to involve a Gaussian KDE, and tbh it's getting hacky and inelegant at this point. Is there an easy way to calculate how long each line should be?
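For what it's worth, a rough sketch of that hacky route using scipy's gaussian_kde to get the density at the true quartiles (here `values` stands in for the original, untrimmed dataset; matching seaborn's drawn outline exactly would still need an extra width normalisation):

import numpy as np
from scipy.stats import gaussian_kde

# `values` is a placeholder for the original, untrimmed data of one violin.
q25, q50, q75 = np.percentile(values, [25, 50, 75])

# Density of the distribution at each quartile. Seaborn rescales its KDE so
# the widest point of the violin matches the `width` parameter, so these
# numbers would still need to be scaled to match the drawn outline.
kde = gaussian_kde(values)
half_widths = kde([q25, q50, q75])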
What do you suggest?
Thanks for your help
Lnr
Update, with thanks to @JohanC:
I added gridsize=1000 to the params of the violinplot and used ax.set(ylim=(0, upp)) to resize the y-axis to show the range from 0 to upp, where upp is the upper limit. Much prettier lookin' graph:
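A simplified, single-violin sketch of that fix, assuming `data` is the original dataset and taking the 97th percentile as the upper limit, as described above:

import numpy as np
import seaborn as sns

# `data` is a placeholder for the original (untrimmed) dataset.
upp = np.percentile(data, 97)  # upper y limit

ax = sns.violinplot(data=data, inner="quartile", gridsize=1000)
ax.set(ylim=(0, upp))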

Scale domain vs filter selection in vega-lite: automatic axis scaling

In Scale Domains docs of Vega-Lite it is noted:
An alternate way to construct this technique would be to filter out the input data to the top (detail) view like so:
{
  "vconcat": [{
    "transform": [{"filter": {"selection": "brush"}}],
    ...
  }]
}
This is indeed almost the same (although the filter method is much slower, as noted in the docs), except for one difference:
With the filter-selection method (demo), the y-axis of the upper chart is automatically zoomed in to the selected points. This is pretty neat, especially if you have a large number of points.
With the scale-domain method (demo), the y-axis remains frozen as you move the selection around.
The question: is it possible to have the y-axis automatically zoom in to the selected points as you move the selection, with "scale domain" method (same as it does with filter-selection method)?
Why is the above difference important? Imagine a stock price that has increased on average by a total of $1 every day over the last year (but within a particular day it may have experienced all kinds of volatile behaviour), and we're plotting it with line marks. If you plot the entire year, you see the whole picture. If you zoom in on a particular day without resetting your y-axis zoom, however, your intraday price plot will be just a flat line, or close to it.
// I've checked all scale-domain-related issues on the vega-lite and altair repos and on SO and couldn't find anything related; I also posted this question on the vega-lite repo on GitHub, but was forwarded over to SO.
No. Unless otherwise specified, the y scale is determined from all of the data within the plot.
When you filter the data, the data in the plot changes, which causes the y axis to change. When you change the scale based on an x-selection without filtering the data, it does not change the data in the plot, and so the y scale remains constant.
If you want the y-scale to be determined automatically based on the data within the selection, the only option is to filter on that selection.
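For illustration, a minimal Altair sketch of the filter-on-selection approach described above (the sp500 example dataset from vega_datasets is only a stand-in; add_selection is the pre-Altair-5 spelling):

import altair as alt
from vega_datasets import data  # example dataset only

source = data.sp500.url
brush = alt.selection_interval(encodings=["x"])

# Overview (lower) chart carries the brush selection.
overview = alt.Chart(source).mark_line().encode(
    x="date:T",
    y="price:Q",
).add_selection(brush).properties(height=60)

# Detail (upper) chart filters on the brush, so its y scale
# re-fits to whatever points fall inside the selection.
detail = alt.Chart(source).mark_line().encode(
    x="date:T",
    y="price:Q",
).transform_filter(brush)

chart = detail & overview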

A plot describing the density of data points in 2D space in Julia

I am trying to use Julia to create a gif animation showing how the density of data points changes with time (the data points are initially concentrated at the center, and then spread to the sides, a bit like a 2D Gaussian whose variance increases with time). I have checked a catalogue of the available kinds of plots in Julia:
http://docs.juliaplots.org/latest/examples/gr/
And I have tried contour plots, heatmaps and 2D histograms. However, it seems that the grids of a heatmap or a contour plot have to be specified manually, which is highly inconvenient. A 2D histogram serves the purpose better, but it is tied to the number of data points, and when I try to make the plot more continuous by using more bins, it no longer describes the density of data points well. Are there any good substitutes in Julia for matplotlib's 2D density plot, like the following?
https://python-graph-gallery.com/85-density-plot-with-matplotlib/
You can use a package like KernelDensity to calculate the point density, then plot that. Here's an example:
using StatsPlots, KernelDensity

a, b = randn(10000), randn(10000)
dens = kde((a, b))   # fit a 2D kernel density estimate to the points
plot(dens)           # StatsPlots supplies a plot recipe for the KDE object
The philosophy, in the Plots package and elsewhere in Julia, is that you first generate the object you are interested in, and then dispatch takes care of plotting it correctly.
Alternatively, you can always use PyPlot to plot anything using matplotlib directly.

How do I increase the size of subplots in a pair plot?

I have a dataset with 15 different numeric columns and I would like to plot a pair plot using seaborn. However, the subplots are too small to make any inferences from them.
I've tried using height and aspect with pairplot, but it doesn't seem to be working for me; the plot size keeps shrinking. The same goes for figsize:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(40,40))
sns.pairplot(df)
plt.show()
I'm expecting all the pairs to be large enough that some inference can be made from them. However, the plots I get are too small to even recognise the column names.
The command works for me. I was not aware that in a Jupyter notebook you can maximise the output to its actual size. So essentially, the below works just fine:
plt.figure(figsize=(100,100))
sns.pairplot(df)
plt.show()
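For reference, an alternative sketch (not from the original answer): since pairplot creates and manages its own figure, its height parameter (inches per subplot panel) controls the panel size directly; the filename and dpi below are only illustrative.

import seaborn as sns

# height sets the size (in inches) of each subplot panel; with 15 numeric
# columns, height=2.5 gives a grid roughly 37 x 37 inches.
g = sns.pairplot(df, height=2.5, aspect=1)
g.savefig("pairplot.png", dpi=150)  # saving avoids the notebook downscaling the image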