Incorrect marker sizes with Seaborn relplot and scatterplot relative to legend - pandas

I'm trying to understand how to get the legend examples to align with the dots plotted using Seaborn's relplot in a Jupyter notebook. I have a size (float64) column in my pandas DataFrame df:
sns.relplot(x="A", y="B", size="size", data=df)
The values in the size column are [0.0, -7.0, -14.0, -7.0, 0.0, 1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 8.0, 2.0, 0.0, -4.0, 7.0, -4.0, 0.0, 0.0, 4.0, 0.0, 0.0, -3.0, 0.0, 1.0, 7.0], so the minimum value is -14 and the maximum value is 8. The legend appears to align well with that range. However, looking at the actual dots plotted, there's a dot considerably smaller than the one corresponding to -16 in the legend, and there's no dot plotted as large as the 8 in the legend.
What am I doing wrong -- or is this a bug?
I'm using pandas 0.24.2 and seaborn 0.9.0.
Edit:
Looking closer at the Seaborn relplot example:
the smallest weight is 1613, but there's an orange dot at the far left of the plot that's smaller than the dot for 1500 in the legend. I think this points to this being a bug.

Not sure what seaborn does here, but if you're willing to use matplotlib alone, it could look like this:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
s = [0.0, -7.0, -14.0, -7.0, 0.0, 1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 8.0, 2.0,
     0.0, -4.0, 7.0, -4.0, 0.0, 0.0, 4.0, 0.0, 0.0, -3.0, 0.0, 1.0, 7.0]
x = np.linspace(0, 2*np.pi, len(s))
y = np.sin(x)
df = pd.DataFrame({"A" : x, "B" : y, "size" : s})
# calculate some sizes in points^2 from the initial values
smin = df["size"].min()
df["scatter_sizes"] = 0.25 * (df["size"] - smin + 3)**2
# state the inverse of the above transformation
finv = lambda y: 2*np.sqrt(y)+smin-3
sc = plt.scatter(x="A", y="B", s="scatter_sizes", data=df)
plt.legend(*sc.legend_elements("sizes", func=finv), title="Size")
plt.show()
More details are in the Scatter plots with a legend example.
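The legend in this approach is only faithful if the `func` passed to `legend_elements` is the exact inverse of the size transformation. A quick round-trip check of the transform pair used above (with the sample's minimum of -14):

```python
import numpy as np

s = np.array([0.0, -7.0, -14.0, 8.0])
smin = s.min()
forward = lambda v: 0.25 * (v - smin + 3)**2   # data value -> marker area in points^2
inverse = lambda a: 2 * np.sqrt(a) + smin - 3  # marker area -> data value
# the inverse must recover the original data values exactly
print(np.allclose(inverse(forward(s)), s))  # True
```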

Related

Is there a way to have TWO axis with DIFFERENT labels in MATPLOTLIB?

I'm trying to create a plot that has one x-axis on the bottom with text-based labels, and I want another one on top with different text-based labels.
The closest thing I found so far is 'secondary_axis' (Link), but when I try to fill in text-based labels, a TypeError: unhashable type: numpy.ndarray is raised.
I produced the following sample based on some code from this:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
vegetables = ["cucumber", "tomato", "lettuce", "asparagus",
              "potato", "wheat", "barley"]
farmers = ["Farmer Joe", "Upland Bros.", "Smith Gardening",
           "Agrifun", "Organiculture", "BioGoods Ltd.", "Cornylee Corp."]
vegetables_to_farmers = dict(zip(vegetables, farmers))
farmers_to_vegetables = dict(zip(farmers, vegetables))
harvest = np.array([[0.8, 2.4, 2.5, 3.9, 0.0, 4.0, 0.0],
                    [2.4, 0.0, 4.0, 1.0, 2.7, 0.0, 0.0],
                    [1.1, 2.4, 0.8, 4.3, 1.9, 4.4, 0.0],
                    [0.6, 0.0, 0.3, 0.0, 3.1, 0.0, 0.0],
                    [0.7, 1.7, 0.6, 2.6, 2.2, 6.2, 0.0],
                    [1.3, 1.2, 0.0, 0.0, 0.0, 3.2, 5.1],
                    [0.1, 2.0, 0.0, 1.4, 0.0, 1.9, 6.3]])
fig, ax = plt.subplots()
im = ax.imshow(harvest)
# We want to show all ticks...
ax.set_xticks(np.arange(len(farmers)))
ax.set_yticks(np.arange(len(vegetables)))
# ... and label them with the respective list entries
ax.set_xticklabels(farmers)
# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")
def vegetables2farmers(x):
    return vegetables_to_farmers[x]
def farmers2vegetables(x):
    return farmers_to_vegetables[x]
secax = ax.secondary_xaxis('top', functions=(vegetables2farmers, farmers2vegetables))
fig.tight_layout()
plt.show()
It would be optimal to have the vegetables as labels in the top bar. Do you have any idea how to approach this?
You're mixing the concept of axes scales with axes labels. The function and its inverse for a secondary_axis is meant to transform the coordinates, not the labels.
In principle, using a secondary axis would be nice if it allowed setting a different formatter. Unfortunately, that is not possible as of now.
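To illustrate what the `functions` argument is actually for, here is a purely numeric use: a degrees axis on the bottom with a radians axis on top. Both functions map coordinates, and each must be the inverse of the other:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import numpy as np
import matplotlib.pyplot as plt

deg = np.linspace(0, 360, 100)
fig, ax = plt.subplots()
ax.plot(deg, np.sin(np.radians(deg)))
ax.set_xlabel("angle [degrees]")
# secondary_xaxis expects a coordinate transform and its inverse
secax = ax.secondary_xaxis("top", functions=(np.radians, np.degrees))
secax.set_xlabel("angle [radians]")
```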
An alternative could be the use of a twin axes. But that won't work for axes with equal aspect.
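For plots without a fixed aspect, the twin-axes route could look like the following sketch (the label lists are taken from the sample data above):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(np.arange(7))
ax.set_xticks(np.arange(7))
ax.set_xticklabels(["Farmer Joe", "Upland Bros.", "Smith Gardening",
                    "Agrifun", "Organiculture", "BioGoods Ltd.", "Cornylee Corp."],
                   rotation=45, ha="right")
ax2 = ax.twiny()             # independent top x-axis sharing the y-axis
ax2.set_xlim(ax.get_xlim())  # keep the two x scales aligned
ax2.set_xticks(np.arange(7))
ax2.set_xticklabels(["cucumber", "tomato", "lettuce", "asparagus",
                     "potato", "wheat", "barley"], rotation=45, ha="left")
```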
So as a last resort, one could just create a new axes on top of the old one and format it consistently:
import numpy as np
import matplotlib.pyplot as plt
vegetables = ["cucumber", "tomato", "lettuce", "asparagus",
              "potato", "wheat", "barley"]
farmers = ["Farmer Joe", "Upland Bros.", "Smith Gardening",
           "Agrifun", "Organiculture", "BioGoods Ltd.", "Cornylee Corp."]
farmers_to_vegetables = dict(zip(farmers, vegetables))
harvest = np.array([[0.8, 2.4, 2.5, 3.9, 0.0, 4.0, 0.0],
                    [2.4, 0.0, 4.0, 1.0, 2.7, 0.0, 0.0],
                    [1.1, 2.4, 0.8, 4.3, 1.9, 4.4, 0.0],
                    [0.6, 0.0, 0.3, 0.0, 3.1, 0.0, 0.0],
                    [0.7, 1.7, 0.6, 2.6, 2.2, 6.2, 0.0],
                    [1.3, 1.2, 0.0, 0.0, 0.0, 3.2, 5.1],
                    [0.1, 2.0, 0.0, 1.4, 0.0, 1.9, 6.3]])
fig, ax = plt.subplots()
im = ax.imshow(harvest)
# We want to show all xticks...
ax.set_xticks(np.arange(len(farmers)))
# ... and label them with the respective list entries
ax.set_xticklabels(farmers)
# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")
ax2 = fig.add_subplot(111, label="secondary")
ax2.set_aspect("equal")
ax2.set_xlim(ax.get_xlim())
ax2.set_ylim(ax.get_ylim())
ax2.tick_params(left=False, labelleft=False, bottom=False, labelbottom=False,
                top=True, labeltop=True)
ax2.set_facecolor("none")
for _, spine in ax2.spines.items():
    spine.set_visible(False)
ax2.set_xticks(ax.get_xticks())
ax2.set_xticklabels([farmers_to_vegetables[x.get_text()] for x in ax.get_xticklabels()])
plt.setp(ax2.get_xticklabels(), rotation=45, ha="left",
         rotation_mode="anchor")
fig.tight_layout()
plt.show()

numpy array changes to string when writing to file

I have a dataframe where one of the columns is a numpy array:
DF
Name Vec
0 Abenakiite-(Ce) [0.0, 0.0, 0.0, 0.0, 0.0, 0.043, 0.0, 0.478, 0...
1 Abernathyite [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
2 Abhurite [0.176, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.235, 0...
3 Abswurmbachite [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25, 0.0,...
When I check the data type of each element, the correct data type is returned.
type(DF['Vec'].iloc[1])
numpy.ndarray
I save this into a csv file:
DF.to_csv('.\\file.csv',sep='\t')
Now, when I read the file again,
new_DF=pd.read_csv('.\\file.csv',sep='\t')
and check the datatype of Vec at index 1:
type(new_DF['Vec'].iloc[1])
str
The size of the numpy array is 1x127.
The data type has changed from a numpy array to a string. I can also see some new line elements in the individual vectors. I think this might be due to some problem when the vector is written into a csv but I don't know how to fix it. Can someone please help?
Thanks!
In the comments I made a mistake and said dtype instead of converters. What you want is to convert them as you read them using a function. With some dummy variables:
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['name1', 'name2'], 'Vec': [np.array([1, 2]), np.array([3, 4])]})
df.to_csv('tmp.csv')
# strip the surrounding brackets and parse the space-separated numbers back into an array
def converter(instr):
    return np.fromstring(instr[1:-1], sep=' ')
df1 = pd.read_csv('tmp.csv', converters={'Vec': converter})
df1.iloc[0,2]
array([1., 2.])
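If you control the writing side as well, another option is to serialize each array as a JSON list, which makes the round-trip unambiguous (whitespace and line breaks in numpy's repr no longer matter). A sketch using an in-memory buffer in place of a file:

```python
import io
import json
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['name1', 'name2'],
                   'Vec': [np.array([1.0, 2.0]), np.array([3.0, 4.0])]})
# store each vector as a JSON string instead of numpy's repr
df['Vec'] = df['Vec'].map(lambda a: json.dumps(a.tolist()))
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
# parse the JSON string back into an array on read
df1 = pd.read_csv(buf, converters={'Vec': lambda s: np.array(json.loads(s))})
print(df1['Vec'].iloc[0])  # [1. 2.]
```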

Converting gnuplot color map to matplotlib

I am trying to port some plotting code from gnuplot to matplotlib and am struggling with porting a discontinuous color map that is specified by color names. Any suggestions on how to do this in matplotlib?
# Establish a 3-section color palette, with the lower 1/4 in the blues,
# the middle 1/2 from light green to yellow, and the top 1/4 in the reds
set palette defined (0 'dark-blue', 0.5 'light-blue', \
                     0.5 'light-green', 1 'green', 1.5 'yellow', \
                     1.5 'red', 2 'dark-red')
# Set the palette range such that the middle green band corresponds
# to 0.95 to 1.05
set cbrange [0.9:1.1]
I've used this script for years and can't really remember how or where I got it (edit: after some searching, this seems to be the source, but it requires some minor changes for Python 3). It has helped me a lot in quickly creating custom color maps. It allows you to simply specify a dictionary with locations (0..1) and colors, and creates a linear color map out of that; e.g. make_colormap({0:'w',1:'k'}) creates a linear color map going from white to black.
import numpy as np
import matplotlib.pylab as pl
def make_colormap(colors):
    from matplotlib.colors import LinearSegmentedColormap, ColorConverter
    z = np.array(sorted(colors.keys()))
    n = len(z)
    z1 = min(z)
    zn = max(z)
    x0 = (z - z1) / (zn - z1)
    CC = ColorConverter()
    R = []
    G = []
    B = []
    for i in range(n):
        Ci = colors[z[i]]
        if type(Ci) == str:
            RGB = CC.to_rgb(Ci)
        else:
            RGB = Ci
        R.append(RGB[0])
        G.append(RGB[1])
        B.append(RGB[2])
    cmap_dict = {}
    cmap_dict['red'] = [(x0[i], R[i], R[i]) for i in range(len(R))]
    cmap_dict['green'] = [(x0[i], G[i], G[i]) for i in range(len(G))]
    cmap_dict['blue'] = [(x0[i], B[i], B[i]) for i in range(len(B))]
    mymap = LinearSegmentedColormap('mymap', cmap_dict)
    return mymap
test1 = make_colormap({0.:'#40004b',0.5:'#ffffff',1.:'#00441b'})
test2 = make_colormap({0.:'b',0.25:'w',0.251:'g',0.75:'y',0.751:'r',1:'k'})
data = np.random.random((10,10))
pl.figure()
pl.subplot(121)
pl.imshow(data, interpolation='nearest', cmap=test1)
pl.colorbar()
pl.subplot(122)
pl.imshow(data, interpolation='nearest', cmap=test2)
pl.colorbar()
Bart's function is very nice. However, if you want to make the colormap yourself, you can define a colormap like this using a dictionary in the way it is done in the custom_cmap example from the mpl website.
Here's an example that's pretty close to your colormap:
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import numpy as np
cdict = {'red':   ((0.0,  0.0, 0.0),   # From 0 to 0.25, we fade the red and green channels
                   (0.25, 0.5, 0.5),   # up a little, to make the blue a bit more grey
                   (0.25, 0.0, 0.0),   # From 0.25 to 0.75, we fade red from 0.5 to 1
                   (0.75, 1.0, 1.0),   # to fade from green to yellow
                   (1.0,  0.5, 0.5)),  # From 0.75 to 1.0, we bring the red down from 1
                                       # to 0.5, to go from bright to dark red
         'green': ((0.0,  0.0, 0.0),   # From 0 to 0.25, we fade the red and green channels
                   (0.25, 0.6, 0.6),   # up a little, to make the blue a bit more grey
                   (0.25, 1.0, 1.0),   # Green is 1 from 0.25 to 0.75 (we add red
                   (0.75, 1.0, 1.0),   # to turn it from green to yellow)
                   (0.75, 0.0, 0.0),   # No green needed in the red upper quarter
                   (1.0,  0.0, 0.0)),
         'blue':  ((0.0,  0.9, 0.9),   # Keep blue at 0.9 from 0 to 0.25, and adjust its
                   (0.25, 0.9, 0.9),   # tone using the green and red channels
                   (0.25, 0.0, 0.0),   # No blue needed above 0.25
                   (1.0,  0.0, 0.0))
         }
cmap = colors.LinearSegmentedColormap('BuGnYlRd',cdict)
data = 0.9 + (np.random.rand(8,8) * 0.2) # Data in range 0.9 to 1.1
p=plt.imshow(data,interpolation='nearest',cmap=cmap,vmin=0.9,vmax=1.1)
plt.colorbar(p)
plt.show()
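For completeness: newer matplotlib versions also accept a list of (position, color) pairs in LinearSegmentedColormap.from_list, and repeating a position creates the same kind of hard break, so a palette like the one above can be written more compactly. The colour names below are approximate stand-ins for gnuplot's:

```python
from matplotlib.colors import LinearSegmentedColormap

cmap = LinearSegmentedColormap.from_list("BuGnYlRd", [
    (0.0,  "darkblue"),
    (0.25, "lightblue"),   # hard break at 0.25 ...
    (0.25, "lightgreen"),  # ... blue section ends, green begins
    (0.75, "yellow"),
    (0.75, "red"),         # hard break at 0.75
    (1.0,  "darkred"),
])
low, high = cmap(0.0), cmap(0.999)  # RGBA at the two ends
```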

importing nested dictionaries to pandas

How do I convert a list of nested dictionaries to pandas dataframe?
In [123]:
metric_list
Out[123]:
[{'14-09-18': {u'AWSDataTransfer': 0.14,
u'AmazonDynamoDB': 0.0,
u'AmazonEC2': 15.0,
u'AmazonGlacier': 0.0,
u'AmazonRedshift': 275.4,
u'AmazonS3': 16.29,
u'AmazonSNS': 0.0}},
{'14-09-17': {u'AWSDataTransfer': 0.14,
u'AmazonDynamoDB': 0.0,
u'AmazonEC2': 13.75,
u'AmazonGlacier': 0.0,
u'AmazonRedshift': 249.05,
u'AmazonS3': 16.28,
u'AmazonSNS': 0.0}}]
I can not directly import the data to dataframe as shown below:
In [127]:
pd.DataFrame(metric_list)
Out[127]:
14-09-17 14-09-18
0 NaN {u'AmazonEC2': 15.0, u'AWSDataTransfer': 0.14,...
1 {u'AmazonEC2': 13.75, u'AWSDataTransfer': 0.14... NaN
Try this:
import pandas as pd
data = [{'14-09-18': {u'AWSDataTransfer': 0.14,
u'AmazonDynamoDB': 0.0,
u'AmazonEC2': 15.0,
u'AmazonGlacier': 0.0,
u'AmazonRedshift': 275.4,
u'AmazonS3': 16.29,
u'AmazonSNS': 0.0}},
{'14-09-17': {u'AWSDataTransfer': 0.14,
u'AmazonDynamoDB': 0.0,
u'AmazonEC2': 13.75,
u'AmazonGlacier': 0.0,
u'AmazonRedshift': 249.05,
u'AmazonS3': 16.28,
u'AmazonSNS': 0.0}}]
pd.concat([pd.DataFrame.from_dict(item, orient="index") for item in data])
Alternatively, flatten the nested dictionaries into a long-format table:
mylist = []
for i in metric_list:
    for k, v in i.items():
        for a, b in v.items():
            mytup = (k, a, b)
            mylist.append(mytup)
df = pd.DataFrame(mylist)
df.columns = ['date', 'type', 'amount']
df.set_index(['date', 'type'])
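If you'd rather keep one row per service with the dates as columns, you can also merge the single-key dicts first and hand the result straight to the DataFrame constructor (shown here with a shortened version of the data):

```python
import pandas as pd

metric_list = [{'14-09-18': {'AWSDataTransfer': 0.14, 'AmazonEC2': 15.0, 'AmazonS3': 16.29}},
               {'14-09-17': {'AWSDataTransfer': 0.14, 'AmazonEC2': 13.75, 'AmazonS3': 16.28}}]
# collapse the list of single-key dicts into one {date: {service: cost}} mapping
merged = {date: costs for d in metric_list for date, costs in d.items()}
df = pd.DataFrame(merged)   # services become the index, dates the columns
print(df.loc['AmazonEC2', '14-09-18'])  # 15.0
```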

Values of image pixels according to the colorbar

Suppose a four-pixel image is as follows:
image = array([[[0.0, 0.0, 1.0],
                [0.0, 0.0, 1.0]],
               [[0.0, 0.0, 1.0],
                [0.0, 0.0, 1.0]]], dtype=float32)
that is, all four pixels are blue, and on the colour scale bar their value is zero. I want to estimate the sum of all pixel values of an image according to the scale bar. For example, for the above case the sum of all pixel values is 0.0. I tried image.sum(), but that gives 4.0 (the sum over all RGB channels), which is not the result I need. Any help please?
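One way to approach this, assuming you know which colormap produced the image (the sketch below assumes matplotlib's 'bwr', whose low end is exactly pure blue), is to invert the colour mapping with a nearest-neighbour lookup against the colormap's lookup table:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import numpy as np
import matplotlib.pyplot as plt

image = np.array([[[0.0, 0.0, 1.0],
                   [0.0, 0.0, 1.0]],
                  [[0.0, 0.0, 1.0],
                   [0.0, 0.0, 1.0]]], dtype=np.float32)

cmap = plt.get_cmap('bwr')                      # assumption: this colormap made the image
lut = cmap(np.linspace(0.0, 1.0, 256))[:, :3]   # 256 RGB rows sampled from the colormap
pixels = image.reshape(-1, 3)
# index of the closest colormap entry for every pixel
idx = np.argmin(((pixels[:, None, :] - lut[None, :, :]) ** 2).sum(axis=-1), axis=1)
values = idx / 255.0                            # back to scalar data values in [0, 1]
print(values.sum())  # 0.0
```

If the true data range on the colorbar was not [0, 1], rescale `values` with the known vmin/vmax afterwards.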