Dots repel in Pandas scatterplot - pandas

I am trying to draw a scatterplot where seeing the hue and size of every dot is important.
However, some dots share the same x,y location, so they overlap and cannot be seen well.
I know there is an equivalent of the 'repel' function for the dot labels in Pandas, using the following library:
https://github.com/Phlya/adjustText
Would anyone know of other software that can repel the dots themselves, and not just the text annotations?

Related

Matplotlib multiple scatter subplots - reduce svg file size

I generated a plot in Matplotlib which consists of 50 subplots. In each of these subplots I have a scatterplot with about 3000 datapoints. I'm doing this, because I just want to have an overview of the different scatter plots in a document I'm working on.
This also works so far and looks nice, but the problem is obviously that the SVG file that I'm getting is really big (about 15 MB). And Word just can't handle such a big SVG file.
So my question: is there a way to optimize this SVG file? Many of the datapoints in the scatter plots overlap each other, so I guess it should be possible to remove many of the "invisible" ones without changing the visible output. (Something like this in Illustrator seems to be what I want to do: Link.) Is it also possible to do something like this in Inkscape? Or even directly in Matplotlib?
I know that I can just produce a PNG file, but I would prefer to have the plot as a vector graphic in my document.
If you want to keep all the data points as vector graphics, it's unlikely you'll be able to reduce the file size.
While not ideal, one potential option is to rasterize only the data points created by ax.scatter, and leave the axes, labels, titles, etc. all as vector elements on your figure. This can dramatically reduce the file size, and if you set the dpi high enough, you probably won't lose any useful information from the plot.
You can do this by setting rasterized=True when calling ax.scatter.
You can then control the dpi of the rasterized elements by passing dpi=300 (or whatever dpi you want) when you call fig.savefig.
Consider the following:
import matplotlib.pyplot as plt

# Two identical 10 x 5 grids: one fully vector, one with rasterized points
figV, axesV = plt.subplots(nrows=10, ncols=5)
figR, axesR = plt.subplots(nrows=10, ncols=5)

for ax in figV.axes:
    ax.scatter(range(3000), range(3000))

for ax in figR.axes:
    # Only the scatter points are rasterized; axes and labels stay vector
    ax.scatter(range(3000), range(3000), rasterized=True)

figV.savefig('bigscatterV.svg')
figR.savefig('bigscatterR.svg', dpi=300)
bigscatterV.svg has a file size of 16MB, while bigscatterR.svg has a file size of only 250KB.

How to control the specific size of plot in matplotlib?

Let us suppose that I am plotting a few plots with pyplot/matplotlib. Only the first has to have tick marks and tick labels. The last has to have a colorbar and some marks for scale. If I write a script that specifies the figure size, the plot proper in the first and last plots is drawn smaller, as the figure has to make room for the extra markings. I cannot find a way to control that automatically, for example by drawing the other plots at the same scale inside a larger figure.
Example code (it looks a little non-pythonic because I am using PyPlot inside Julia):
using PyPlot
SomeData=randn(64,64,3)
for t=1:3
    figure(figsize=(3.0,3.0))
    imagen=imshow(SomeData[:,:,t], origin="lower")
    if t!=3
        xticks([])
        yticks([])
    else
        tick_params(labelsize=8, direction="out")
    end
    if t==1
        cbx=colorbar(imagen, fraction=0.045, ticks=[])
        cbx[:set_label]("Some proper English Label", fontsize=8)
    end
    savefig("CSD-$t.svg",dpi=92)
end
Thanks in advance.

Matplotlib's Figure and Axes explanation

I am really pretty new to matplotlib, though I know that it can be very powerful.
I've been reading a number of tutorials and examples, and it's a real hassle to understand how matplotlib's Figure and Axes work. I am illustrating what I am trying to understand with the attached figure.
I know how to create a figure instance of certain size in inches. However, what bothers me is how can I create subplots and then axes, within each subplot, with relative coordinates (bottom=0,left=0,top=1,right=1) as illustrated.
So, for example, I want to create a "parent" plot area (say (6in,10in)). Then, I want to create two subplot areas, each with size (3in,3in), with 1in of space from the top, 2in of space between the two vertically stacked subplot areas and 1in from the bottom. Then, 1in of space on the left and 2in of space on the right. At the same time, I would like to be able to get the coordinates of the subplot areas with respect to the main plot area.
Then, inside the first subplot area, I'd like to create two Axes instances: Axes 1 with coordinates (0.1,0.7,0.7,0.2) with respect to Subplot Area 1, and Axes 2 with (0.1,0.2,0.7,0.5). And then of course I'd like to be able to plot on these axes, e.g. ax1.plot()....
If you could provide a sample code to achieve that, then I can study it.
Your help will be very much appreciated!
A subplot and an Axes object are really the same thing. There is not really a "subplot" as you describe it in matplotlib. You can just create your three Axes objects using gridspec without the need to put them in your "subplots".
There are a few different ways to create Axes instances within your figure.
fig.add_axes will create an Axes instance at the position given to it. You give it [left,bottom,width,height] in figure coordinates (i.e. 0,0 is bottom left, 1,1 is top right).
fig.add_subplot will also create an Axes instance. In this case, rather than giving it a rectangle to be created in, you give it the number of rows and columns of subplots you would like, and then the plot_number, where plot_number starts at 1, increments across rows first and has a maximum of nrows * ncols.
For example, to create the top-left Axes in a grid of 2 rows and 2 columns, you could do the following:
fig.add_subplot(2,2,1)
or the shorthand
fig.add_subplot(221)
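To make the plot_number ordering concrete, here is a minimal sketch. The 2 x 3 grid, the Agg backend and the variable names are assumptions chosen for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig = plt.figure()
nrows, ncols = 2, 3  # illustrative grid size

# plot_number runs from 1 to nrows * ncols, left to right, then top to bottom
axes = {}
for plot_number in range(1, nrows * ncols + 1):
    ax = fig.add_subplot(nrows, ncols, plot_number)
    ax.set_title(f"plot_number = {plot_number}")
    axes[plot_number] = ax

# Number 1 is the top-left Axes; number ncols + 1 starts the second row
top_left = axes[1].get_position()
top_right = axes[ncols].get_position()
bottom_left = axes[ncols + 1].get_position()
```

Comparing the positions shows that numbers 1 to 3 share a row, and that number 4 sits directly below number 1.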
There are some more customisable ways to create Axes as well, for example gridspec and subplot2grid which allow for easy creation of many subplots of different shapes and sizes.

Seaborn Heatmap Colorbar Location

The cbar_kws argument of seaborn.heatmap accepts the parameters that fig.colorbar accepts.
Is there a way to adjust the placement of the colorbar, for example to move it to the left (especially when the correlation matrix is reduced to only its lower triangle)?
I can adjust the labels by overriding the tick labels. As of now I still have to adjust the upper-right borders in post-processing, but it would make things much easier if I didn't have to edit the color bar as well.
heatmap accepts a cbar_ax argument; if you want to specify the position of the colorbar, the best thing to do is to set up the figure how you want it and then pass the specific axes.
You can also move axes around after plotting through normal matplotlib commands.
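A minimal sketch of that approach, assuming a made-up 6 x 6 correlation matrix and hand-picked rectangles (all the numbers and variable names below are illustrative, not from the original answer):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(0)
corr = np.corrcoef(rng.normal(size=(6, 50)))  # made-up 6 x 6 correlation matrix

# Hide the cells above the diagonal so only the lower triangle is drawn
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

fig = plt.figure(figsize=(6, 5))
# Rectangles are [left, bottom, width, height] in figure coordinates
cbar_ax = fig.add_axes([0.06, 0.15, 0.03, 0.7])  # narrow Axes on the left
heat_ax = fig.add_axes([0.22, 0.15, 0.72, 0.7])  # main heatmap Axes

sns.heatmap(corr, mask=mask, vmin=-1, vmax=1, cmap="coolwarm",
            ax=heat_ax, cbar_ax=cbar_ax)

# Moving an Axes after plotting also works, e.g. shrinking the colorbar a little
cbar_ax.set_position([0.06, 0.2, 0.03, 0.6])
```

The rectangle values would need tuning for real tick labels and figure sizes.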

Selecting a single color from a matplotlib colormap in Julia

I'm constructing a graph plot in Julia and need to color each edge of the graph differently, based on some weighting factor. I can't find a way to get a specific RGB (or HSV, it doesn't matter) value from a colormap. Let's say I'd like to get the RGB value on 'jet' that would correspond to a data value of n on imshow plot.
In python, I would just use jet(n), where n is the value along the colormap in which I am interested. PyPlot in Julia doesn't seem to have wrapped this functionality. I've also already tried indexing into the cmap object returned from get_cmap(). Any advice?
I'm stumped, so even an approximate solution would help. Thanks!
Maybe you can look at the Colors.jl package (https://github.com/JuliaGraphics/Colors.jl):
using Colors
palette = colormap("Oranges", 100)
Then you can access each color with palette[n]. Or are you using PyCall? Some code showing what you're trying to do would help.
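For comparison, the Python behaviour the question refers to can be sketched as follows. This is plain matplotlib in Python, not Julia, and the data range and edge weight are made-up values for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

cmap = plt.get_cmap("jet")

# A colormap object is callable: floats in [0, 1] map to RGBA tuples
rgba_mid = cmap(0.5)

# For a data value, normalise with the same vmin/vmax that imshow would use
vmin, vmax = 0.0, 20.0  # made-up data range
norm = Normalize(vmin=vmin, vmax=vmax)
weight = 5.0            # made-up edge weight
r, g, b, a = cmap(norm(weight))
```

Each component is a float between 0 and 1, so the tuple can be passed directly as a color to most matplotlib plotting calls.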