Matplotlib multiple scatter subplots - reduce SVG file size

I generated a plot in Matplotlib that consists of 50 subplots, each containing a scatter plot with about 3000 data points. I'm doing this because I want an overview of the different scatter plots in a document I'm working on.
This works so far and looks nice, but the problem is that the resulting SVG file is very large (about 15 MB), and Word can't handle an SVG file that big.
So my question: is there a way to optimize this SVG file? Many of the data points in the scatter plots overlap each other, so I guess it should be possible to remove many of the "invisible" ones without changing the visible output (something like this in Illustrator seems to be what I want to do: Link). Is it also possible to do something like this in Inkscape, or even directly in Matplotlib?
I know I can just produce a PNG file, but I would prefer to have the plot as a vector graphic in my document.

If you want to keep all the data points as vector graphics, it's unlikely you'll be able to reduce the file size much.
While not ideal, one option is to rasterize only the data points created by ax.scatter and leave the axes, labels, titles, etc. as vector elements. This can dramatically reduce the file size, and if you set the dpi high enough, you probably won't lose any useful information from the plot.
You can do this by setting rasterized=True when calling ax.scatter.
You can then control the resolution of the rasterized elements by passing dpi=300 (or whatever dpi you want) when you call fig.savefig.
Consider the following:
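import matplotlib.pyplot as plt
# Two identical 10x5 grids of scatter subplots
figV, axesV = plt.subplots(nrows=10, ncols=5)
figR, axesR = plt.subplots(nrows=10, ncols=5)
# Vector version: every marker is written to the SVG as its own element
for ax in figV.axes:
    ax.scatter(range(3000), range(3000))
# Rasterized version: each scatter's markers are embedded as a bitmap
for ax in figR.axes:
    ax.scatter(range(3000), range(3000), rasterized=True)
figV.savefig('bigscatterV.svg')
# dpi sets the resolution of the rasterized parts only
figR.savefig('bigscatterR.svg', dpi=300)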
bigscatterV.svg has a file size of 16MB, while bigscatterR.svg has a file size of only 250KB.
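If you want to stay fully vector, a minimal sketch of the "drop the invisible points" idea from the question, done directly in Matplotlib, is to map each point to display pixels with ax.transData, keep one point per pixel cell, and scatter only those. thin_points and svg_size are hypothetical helpers, not Matplotlib API. The sketch assumes every marker has the same style (so it does not matter which duplicate survives) and that the axis limits and layout are final before thinning, since the pixel mapping is read at that moment. The result is approximate: a kept point can sit up to one cell away from the ones it replaces.

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt


def thin_points(ax, x, y, cell_px=1.0):
    """Hypothetical helper: keep one point per cell_px x cell_px block of display pixels.

    Assumes ax's limits and position are already fixed, because the
    data-to-pixel mapping is taken from ax.transData right now.
    """
    xy = np.column_stack([np.asarray(x, float), np.asarray(y, float)])
    pix = ax.transData.transform(xy)              # data coords -> display pixels
    cells = np.floor(pix / cell_px).astype(np.int64)
    _, first = np.unique(cells, axis=0, return_index=True)
    first.sort()                                  # preserve the original drawing order
    return xy[first, 0], xy[first, 1]


def svg_size(fig):
    """Hypothetical helper: size in bytes of the figure saved as SVG."""
    buf = io.BytesIO()
    fig.savefig(buf, format="svg")
    return len(buf.getvalue())


x = y = np.arange(3000)

# Full vector scatter, as in the question
fig_full, ax_full = plt.subplots(figsize=(2, 2))
ax_full.scatter(x, y)

# Thinned vector scatter: fix the limits first, then thin, then plot
fig_thin, ax_thin = plt.subplots(figsize=(2, 2))
ax_thin.set_xlim(-150, 3150)
ax_thin.set_ylim(-150, 3150)
xt, yt = thin_points(ax_thin, x, y)
ax_thin.scatter(xt, yt)

print(len(xt), svg_size(fig_full), svg_size(fig_thin))
```

How much this saves depends on how densely the points overlap on screen; for spread-out data most points survive, and rasterizing as described above remains the more reliable size reduction.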

Related

Dots repel in Pandas scatterplot

I am trying to draw a scatter plot where seeing each dot's hue and size is important.
However, some dots sit at the same (x, y) location, so they overlap and we cannot see them well.
I know there is an equivalent of the 'repel' function in Pandas for the dot labels, provided by the following library:
https://github.com/Phlya/adjustText
Would anyone know of another package that can repel the dots themselves, and not just the text annotations?
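This is not a repel algorithm like adjustText's, but a simpler, deterministic alternative is to spread points that share an identical (x, y) onto a small circle around that location, so each dot's hue and size stays visible. A minimal sketch assuming NumPy; spread_duplicates is a hypothetical helper, and the radius is in data units and has to be picked by hand (roughly one marker diameter).

```python
import numpy as np


def spread_duplicates(x, y, radius):
    """Hypothetical helper: move points sharing an identical (x, y) onto a circle.

    `radius` is in data units. Points with a unique location are left unchanged.
    """
    xy = np.column_stack([x, y]).astype(float)
    out = xy.copy()
    _, inverse, counts = np.unique(xy, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)  # 1-D group label per point across NumPy versions
    for group in np.flatnonzero(counts > 1):
        idx = np.flatnonzero(inverse == group)
        # Evenly spaced angles so the stacked points form a small ring
        angles = 2 * np.pi * np.arange(len(idx)) / len(idx)
        out[idx, 0] += radius * np.cos(angles)
        out[idx, 1] += radius * np.sin(angles)
    return out[:, 0], out[:, 1]


# Three points stacked at (1, 1) and one on its own at (2, 3)
x = [1, 1, 1, 2]
y = [1, 1, 1, 3]
xs, ys = spread_duplicates(x, y, radius=0.05)
# Then plot as usual, e.g. ax.scatter(xs, ys, c=hue, s=size) with the original hue/size arrays
```

It only separates exact duplicates; dots that are close but not identical can still overlap.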

A plot describing the density of data points in 2D space in Julia

I am trying to use Julia to create a GIF animation showing how the density of data points changes with time (the data points are concentrated at the center at the beginning and then spread to the sides, a bit like a 2D Gaussian whose variance increases with time). I have checked the catalogue of available plot types in Julia:
http://docs.juliaplots.org/latest/examples/gr/
I have tried a contour plot, a heatmap and a 2D histogram. However, it seems that the grid of a heatmap or a contour plot has to be specified manually, which is highly inconvenient. A 2D histogram serves the purpose better, but it depends strongly on the number of data points: when I use more bins to make the plot look more continuous, it no longer describes the density of the data points well. Is there a good substitute in Julia for the 2D density plot in matplotlib shown below?
https://python-graph-gallery.com/85-density-plot-with-matplotlib/
You can use a package like KernelDensity to calculate the point density, and then plot that. Here's an example:
using StatsPlots, KernelDensity
a, b = randn(10000), randn(10000)
dens = kde((a,b))
plot(dens)
The philosophy, in the Plots package and other places in Julia, is that you generate the object you are interested in first, and then dispatch takes care of plotting it correctly.
Alternatively, you can always use PyPlot to plot anything using matplotlib directly.
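For comparison, the matplotlib density plot the question links to follows the same "compute the density object, then plot it" pattern. A sketch assuming SciPy is available: scipy.stats.gaussian_kde estimates the density, it is evaluated on a regular grid, and pcolormesh draws it. density_grid is a hypothetical helper name, and the random samples stand in for the question's data points.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde


def density_grid(x, y, nbins=100):
    """Hypothetical helper: evaluate a Gaussian KDE of (x, y) on an nbins x nbins grid."""
    kde = gaussian_kde(np.vstack([x, y]))  # bandwidth from Scott's rule by default
    # Complex step in mgrid means "nbins points, endpoints included"
    xi, yi = np.mgrid[x.min():x.max():nbins * 1j, y.min():y.max():nbins * 1j]
    zi = kde(np.vstack([xi.ravel(), yi.ravel()])).reshape(xi.shape)
    return xi, yi, zi


rng = np.random.default_rng(0)
a, b = rng.standard_normal(10000), rng.standard_normal(10000)
xi, yi, zi = density_grid(a, b)

fig, ax = plt.subplots()
ax.pcolormesh(xi, yi, zi, shading="auto")
fig.savefig("density.png")
```

The grid resolution (nbins) only affects how finely the smooth estimate is sampled, not the estimate itself, which is what makes this behave better than adding more histogram bins.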

How do I increase the size of subplots in a pair plot?

I have a dataset with 15 different numeric columns and I would like to draw a pair plot using seaborn. However, the subplots are too small to make any inference from them.
I've tried using height and aspect with pairplot, but it doesn't seem to work for me: the plots keep getting smaller. The same goes for figsize.
plt.figure(figsize=(40,40))
sns.pairplot(df)
plt.show()
I'm expecting all the pairs to be large enough that some inference can be made from them. However, the plots I'm getting are too small to even read the column names.
The command works for me.
I was not aware that in a Jupyter notebook the output can be maximised to its actual size.
So essentially, the code below works just fine:
plt.figure(figsize=(100,100))
sns.pairplot(df)
plt.show()

How do I save a color-mapped array with the same dimensions as the original array?

I have data that I would like to save as PNGs. I need to keep the exact pixel dimensions: I don't want any inter-pixel interpolation, smoothing, up/down sizing, etc. I do want to use a colormap, though (and maybe some other features of matplotlib's imshow). As I see it, there are a couple of ways I could do this:
1) Manually roll my own colormapping. (I'd rather not do this)
2) Figure out how to make sure the pixel dimensions of the image in the figure produced by imshow are exactly correct, and then extract just the image portion of the figure for saving.
3) Use some other method which will directly give me a color-mapped array (i.e. my NxN grayscale array -> NxNx3 array, using one of matplotlib's colormaps). Then save it using another PNG save method such as scipy.misc.imsave.
How can I do one of the above? (Or another alternate)
My problem arose when I was saving the figure directly with savefig and realized that I couldn't zoom in on details. Upscaling wouldn't solve the problem, since the blurring between pixels is exactly one of the things I'm looking for, and the pixel size has a physical meaning.
EDIT:
Example:
import numpy as np
import matplotlib.pyplot as plt
X,Y = np.meshgrid(np.arange(-50.0,50,.1), np.arange(-50.0,50,.1))
Z = np.abs(np.sin(2*np.pi*(X**2+Y**2)**.5))/(1+(X/20)**2+(Y/20)**2)
plt.imshow(Z,cmap='inferno', interpolation='nearest')
plt.savefig('colormapeg.png')
plt.show()
Note that zooming in on the interactive figure gives a very different view than zooming in on the saved figure. I could increase the resolution of the saved figure, but that has its own problems. I really just need the resolution fixed.
It seems you are looking for plt.imsave().
In this case,
plt.imsave("filename.png", Z, cmap='inferno')
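Option 3 from the question can also be done with matplotlib alone: a Normalize object scales the data to [0, 1], and calling a colormap on the result returns an RGBA array with one pixel per input value. A sketch under the assumption that a linear min-to-max scaling is wanted; colormap_array is a hypothetical helper, and the check at the end reads the imsave output back to confirm its pixel dimensions match Z.

```python
import os
import tempfile

import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize


def colormap_array(Z, cmap_name="inferno"):
    """Hypothetical helper: map an (N, M) array to an (N, M, 4) uint8 RGBA array."""
    norm = Normalize(vmin=Z.min(), vmax=Z.max())  # linear scaling of data to [0, 1]
    cmap = plt.get_cmap(cmap_name)
    return cmap(norm(Z), bytes=True)              # bytes=True -> uint8 values 0..255


X, Y = np.meshgrid(np.arange(-50.0, 50, .1), np.arange(-50.0, 50, .1))
Z = np.abs(np.sin(2*np.pi*(X**2+Y**2)**.5))/(1+(X/20)**2+(Y/20)**2)

rgba = colormap_array(Z)  # color-mapped array, same height and width as Z

# plt.imsave writes one PNG pixel per array element, with no figure or axes around it
path = os.path.join(tempfile.mkdtemp(), "colormapeg.png")
plt.imsave(path, Z, cmap="inferno")
saved = plt.imread(path)
```

Since scipy.misc.imsave has been removed from recent SciPy releases, the rgba array can instead be written with Pillow, e.g. PIL.Image.fromarray(rgba).save(...).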

How to control the specific size of plot in matplotlib?

Suppose I am plotting a few plots with pyplot/matplotlib. The first has to have tick marks and tick labels, and only the first. The last has to have a colorbar and some marks for scale. If I write a script that specifies the figure size, the plot area in the first and last plots is drawn smaller, because the figure has to make room for the extra markings. I can't seem to control that automatically, e.g. by drawing the other plots at the same scale, or by putting the first and last plots inside a larger figure.
Example code (it looks a little non-pythonic because I am using PyPlot inside Julia):
using PyPlot
SomeData=randn(64,64,3)
for t=1:3
    figure(figsize=(3.0,3.0))
    imagen=imshow(SomeData[:,:,t], origin="lower")
    if t!=3
        xticks([])
        yticks([])
    else
        tick_params(labelsize=8, direction="out")
    end
    if t==1
        cbx=colorbar(imagen, fraction=0.045, ticks=[])
        cbx[:set_label]("Some proper English Label", fontsize=8)
    end
    savefig("CSD-$t.svg",dpi=92)
end
Thanks in advance.
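One way to keep the image area identical across all files, sketched in plain matplotlib (Python rather than Julia, though the same calls are reachable through PyPlot), is to fix the axes rectangle in inches with fig.add_axes and let the margins, not the plot, absorb the tick labels and the colorbar. fixed_axes_figure and all margin values are hypothetical choices assuming a 2.5 x 2.5 inch plot area; following the description, ticks go on the first plot and the colorbar on the last.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt


def fixed_axes_figure(ax_w=2.5, ax_h=2.5, left=0.1, right=0.1, bottom=0.1, top=0.1):
    """Hypothetical helper: figure whose axes is exactly ax_w x ax_h inches.

    Margins are in inches; decorations go into the margins, which enlarge
    the figure instead of shrinking the plot area.
    """
    fig_w, fig_h = left + ax_w + right, bottom + ax_h + top
    fig = plt.figure(figsize=(fig_w, fig_h))
    # add_axes takes [left, bottom, width, height] as fractions of the figure
    ax = fig.add_axes([left / fig_w, bottom / fig_h, ax_w / fig_w, ax_h / fig_h])
    return fig, ax


rng = np.random.default_rng(0)
some_data = rng.standard_normal((64, 64, 3))
axes_sizes = []

for t in range(3):
    first, last = t == 0, t == 2
    # Extra margin only where the first plot's tick labels / last plot's colorbar go
    fig, ax = fixed_axes_figure(left=0.6 if first else 0.1,
                                bottom=0.5 if first else 0.1,
                                right=1.0 if last else 0.1)
    img = ax.imshow(some_data[:, :, t], origin="lower")
    if first:
        ax.tick_params(labelsize=8, direction="out")
    else:
        ax.set_xticks([])
        ax.set_yticks([])
    if last:
        # Colorbar in its own axes inside the right margin, so it takes no room from the image
        pos = ax.get_position()
        fig_w = fig.get_figwidth()
        cax = fig.add_axes([pos.x1 + 0.1 / fig_w, pos.y0, 0.15 / fig_w, pos.height])
        cbar = fig.colorbar(img, cax=cax, ticks=[])
        cbar.set_label("Some proper English Label", fontsize=8)
    fig.savefig(f"CSD-{t + 1}.svg")
    # Record the plot area in inches to confirm it is the same for every file
    pos = ax.get_position()
    axes_sizes.append((pos.width * fig.get_figwidth(), pos.height * fig.get_figheight()))
    plt.close(fig)
```

The files end up with different overall sizes, but the image itself is drawn at the same physical size in each, which is what keeps the plots at a common scale when placed side by side.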