How to change legend scale - folium choropleth map - legend

I have some sort of error with the scale of the legend on my choropleth map in folium.
I have tried to use the bins parameter to change it, but I can't work out what I am doing wrong. No matter what numbers I input, the legend changes with regard to the colours on the map, but the values are still grouped weirdly on the legend.
This is what the map looks like:
Here is the code:
import folium

bins = list(dfForces1['Total'].quantile([0, 0.1, 0.2, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 1]))

# 'Mapbox Bright' tiles are no longer shipped with recent folium versions,
# so a built-in tile set is used here instead
m1 = folium.Map(location=[50, 3], zoom_start=6.5, tiles='cartodbpositron')

# Map.choropleth() is deprecated in favour of the folium.Choropleth class
folium.Choropleth(
    geo_data=forces_json,  # geographical data read from the JSON file
    data=dfForces1,  # the DataFrame holding the values to plot
    columns=['Police force sent NRM referral for Crime Recording', 'Total'],  # key column, value column
    key_on='feature.properties.objectid',
    fill_color='YlGn',
    fill_opacity=0.8,
    line_opacity=0.3,
    bins=bins,
    nan_fill_color='white',
    legend_name='Number of NRM Referrals Received',
    highlight=True,
).add_to(m1)
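The uneven grouping is most likely caused by the quantile bins themselves. As far as I understand folium's internals, the legend is a branca step colour scale drawn on a linear value axis, so each colour block's width on the legend is proportional to its range of values, not to how many areas fall into it. With skewed counts, quantile edges pile up at the low end and their labels crowd together. A minimal sketch comparing the two kinds of edges, using hypothetical right-skewed totals in place of dfForces1['Total']:

```python
import numpy as np
import pandas as pd

# Hypothetical, right-skewed referral counts standing in for dfForces1['Total']
totals = pd.Series([1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 15, 40, 120, 400])

# Quantile edges, as in the question: roughly equal counts per bin,
# but the edges bunch together where most of the values are
q_bins = list(totals.quantile([0, 0.1, 0.2, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 1]))

# Evenly spaced edges: equal width per bin, so labels on a
# value-proportional legend are evenly spaced as well
lin_bins = list(np.linspace(totals.min(), totals.max(), 6))

print("quantile edges:", q_bins)
print("linear edges:  ", lin_bins)
```

Here eight of the ten quantile edges fall inside the first linear bin, which is the kind of crowding described in the question. Passing evenly spaced edges such as lin_bins as bins= should spread the legend labels out, at the cost of most areas sharing the lightest colours. (The question's quantile list also skips 0.4, which may or may not be intentional.)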


make specific data points in scatter plot seaborn more visible [duplicate]

I have a Seaborn scatterplot and am trying to control the plotting order with 'hue_order', but it is not working as I would have expected (I can't get the blue dot to show on top of the gray).
import pandas as pd
import seaborn as sns

x = [1, 2, 3, 1, 2, 3]
cat = ['N', 'Y', 'N', 'N', 'N', 'N']  # six labels to match the six x values
test = pd.DataFrame(list(zip(x, cat)), columns=['x', 'cat'])
print(test)  # display(test) in a Jupyter notebook
colors = {'N': 'gray', 'Y': 'blue'}
sns.scatterplot(data=test, x='x', y='x',
                hue='cat', hue_order=['Y', 'N'],
                palette=colors)
Flipping the 'hue_order' to hue_order=['N', 'Y'] doesn't change the plot. How can I get the 'Y' category to plot on top of the 'N' category? My actual data has duplicate x,y coordinates that are differentiated by the category column.
The reason this is happening is that, unlike most plotting functions, scatterplot doesn't (internally) iterate over the hue levels when it's constructing the plot. It draws a single scatterplot and then sets the color of the elements with a vector. It does this so that you don't end up with all of the points from the final hue level on top of all the points from the penultimate hue level on top of all the ... etc. But it means that the scatterplot z-ordering is insensitive to the hue ordering and reflects only the order in the input data.
So you could use your desired hue order to sort the input data:
import numpy as np

hue_order = ["N", "Y"]
colors = {'N': 'gray', 'Y': 'blue'}
sns.scatterplot(
    data=test.sort_values('cat', key=np.vectorize(hue_order.index)),  # key= needs pandas >= 1.1
    x='x', y='x',
    hue='cat', hue_order=hue_order,
    palette=colors, s=100,  # Embiggen the points to see what's happening
)
There may be a more efficient way to do that "sort by list of unique values" built into pandas; I am not sure.
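One pandas-native way to do the "sort by list of unique values" is an ordered Categorical, which sorts by position in the category list rather than alphabetically. This is a minimal sketch, not something the answer above uses; it assumes the question's data with a sixth 'N' so both lists have six entries:

```python
import pandas as pd

hue_order = ["N", "Y"]  # later entries end up last, i.e. drawn on top
test = pd.DataFrame({"x": [1, 2, 3, 1, 2, 3],
                     "cat": ["N", "Y", "N", "N", "N", "N"]})

# An ordered Categorical sorts by position in hue_order, not alphabetically;
# a temporary column keeps the original 'cat' column untouched
rank = pd.Categorical(test["cat"], categories=hue_order, ordered=True)
ordered = (test.assign(_rank=rank)
               .sort_values("_rank", kind="mergesort")  # stable: ties keep row order
               .drop(columns="_rank"))
print(ordered)
```

Passing ordered as data= to sns.scatterplot should then draw the 'Y' point last. Changing hue_order changes the sort without touching the rest of the code.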
TLDR: Before plotting, sort the data so that the dominant color appears last in the data. Here, it could just be:
test = test.sort_values('cat') # ascending = True
Then you get:
It seems that hue_order doesn't affect the order (or z-order) in which things are plotted. Rather, it affects how colors are assigned. E.g., if you don't specify a specific mapping of categories to colors (i.e. you just use a list of colors or a color palette), this parameter can determine whether 'N' or 'Y' gets the first (and which gets the second) color of the palette. There's an example showing this behavior here in the hue_order section. When you have the dict already linking categories to colors (colors = {'N': 'gray', 'Y': 'blue'}), it seems to just affect the order of labels in the legend, as you probably have seen.
So the key is to make sure the color you want on top is plotted last (and thus "on top"). I would have also assumed the hue_order parameter would do as you expected, but apparently not!
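The mechanism both answers describe can be checked with plain matplotlib: a single scatter call with a vector of colours draws its points in row order, so whatever comes last in the data ends up on top. A minimal sketch, assuming a sixth 'N' in the question's category list (it only has five entries as posted):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The question's data, assuming a sixth "N" so both lists have six entries
test = pd.DataFrame({"x": [1, 2, 3, 1, 2, 3],
                     "cat": ["N", "Y", "N", "N", "N", "N"]})
colors = {"N": "gray", "Y": "blue"}

# One scatter call with a vector of colours: points are drawn in row order,
# so the gray point at index 4 covers the blue point at index 1 (both at (2, 2))
fig, ax = plt.subplots()
unsorted = ax.scatter(test["x"], test["x"], s=100,
                      c=test["cat"].map(colors).tolist())

# Sorting so that "Y" rows come last puts the blue point on top
ordered = test.sort_values("cat", kind="mergesort")  # stable sort, "N" < "Y"
fig2, ax2 = plt.subplots()
on_top = ax2.scatter(ordered["x"], ordered["x"], s=100,
                     c=ordered["cat"].map(colors).tolist())
```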

Stratigraphic column in matplotlib

My goal is to create a stratigraphic column (colored stacked rectangles) using matplotlib like the example below.
Data is in this format:
depth = [1,2,3,4,5,6,7,8,9,10] #depth (feet) below ground surface
lithotype = [4,4,4,5,5,5,6,6,6,2] #lithology type. 4 = clay, 6 = sand, 2 = silt
I tried matplotlib.patches.Rectangle but it's cumbersome. Wondering if someone has another suggestion.
IMHO, using Rectangle is neither difficult nor cumbersome.
from matplotlib import colormaps
from matplotlib.patches import Rectangle
from matplotlib.pyplot import show, subplots

# a simplification is to use a qualitative colormap for the lithology types;
# here I use Paired, but other qualitative colormaps are displayed in
# https://matplotlib.org/stable/tutorials/colors/colormaps.html#qualitative
qcm = colormaps['Paired']  # matplotlib.cm.get_cmap was removed in Matplotlib 3.9

# the data, augmented with type descriptions
# note that depths start from zero
depth = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # depth (feet) below ground surface
lithotype = [4, 4, 4, 5, 5, 5, 6, 1, 6, 2]  # lithology type
types = {1: 'swiss cheese', 2: 'silt', 4: 'clay', 5: 'silty sand', 6: 'sand'}

# prepare the figure
fig, ax = subplots(figsize=(4, 8))
w = 2  # a conventional width, used to size the x-axis and the rectangles
ax.set(xlim=(0, w), xticks=[])  # size the x-axis, no x ticks
ax.set_ylim(bottom=0, top=depth[-1])
ax.invert_yaxis()
fig.suptitle('Soil Behaviour Type')
fig.subplots_adjust(right=0.5)

# plot a series of dots that will eventually be covered by the Rectangles,
# so that we can draw a legend
for lt in set(lithotype):
    ax.scatter(lt, depth[1], color=qcm(lt), label=types[lt], zorder=0)
fig.legend(loc='center right')
ax.plot((1, 1), (0, depth[-1]), lw=0)

# do the rectangles
for d0, d1, lt in zip(depth, depth[1:], lithotype):
    ax.add_patch(
        Rectangle((0, d0),     # anchor corner; appears upper left since the y axis is inverted
                  w, d1 - d0,  # conventional width on x, thickness of the layer
                  facecolor=qcm(lt), edgecolor='k'))

# That's all, folks!
show()
As you can see, placing the rectangles is not complicated; what is indeed cumbersome is preparing the Figure and the Axes properly.
I know that I omitted part of the qualifying details from my solution, but I hope these omissions won't stop you from profiting from my answer.
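An alternative that avoids Rectangle altogether (my suggestion, not part of the answer above) is a single ax.bar call with bottom set to each layer's top depth, so matplotlib stacks the rectangles for you. A minimal sketch reusing the answer's data and the Paired colormap, with proxy Patch handles for the legend instead of hidden scatter points:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; call plt.show() interactively instead
import matplotlib.pyplot as plt
from matplotlib import colormaps
from matplotlib.patches import Patch

# Same data as the Rectangle answer: depths start at zero
depth = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # feet below ground surface
lithotype = [4, 4, 4, 5, 5, 5, 6, 1, 6, 2]
types = {1: 'swiss cheese', 2: 'silt', 4: 'clay', 5: 'silty sand', 6: 'sand'}
qcm = colormaps['Paired']  # qualitative colormap, indexed by lithotype

tops = depth[:-1]
thickness = [d1 - d0 for d0, d1 in zip(depth, depth[1:])]

fig, ax = plt.subplots(figsize=(3, 8))
# One bar per layer: 'bottom' is the layer's top depth and 'height' its
# thickness, so the bars stack into a single column
bars = ax.bar(x=0, height=thickness, bottom=tops, width=1, align='edge',
              color=[qcm(lt) for lt in lithotype], edgecolor='k')
ax.set(xlim=(0, 1), xticks=[], ylim=(0, depth[-1]), ylabel='Depth (ft)')
ax.invert_yaxis()  # depth increases downwards

# Proxy patches give one legend entry per lithology type
handles = [Patch(facecolor=qcm(lt), edgecolor='k', label=types[lt])
           for lt in sorted(set(lithotype))]
ax.legend(handles=handles, loc='center left', bbox_to_anchor=(1, 0.5))
```

The trade-off is that bar is designed for bar charts, so the x-axis setup is still manual; the layer bookkeeping, though, shrinks to two list comprehensions.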
I made a package called striplog for handling this sort of data and making these kinds of plots.
The tool can read CSV, LAS, and other formats directly (although it is rather particular about the format), but we can also construct a Striplog object manually. First, let's set up the basic data:
depth = [1,2,3,4,5,6,7,8,9,10]
lithotype = [4,4,4,5,5,5,6,6,6,2]
KEY = {2: 'silt', 4: 'clay', 5: 'mud', 6: 'sand'}
Now you need to know that a Striplog is composed of Interval objects, each of which can have one or more Component elements:
from striplog import Striplog, Component, Interval

intervals = []
for top, base, lith in zip(depth, depth[1:], lithotype):
    comp = Component({'lithology': KEY[lith]})
    iv = Interval(top, base, components=[comp])
    intervals.append(iv)

s = Striplog(intervals).merge_neighbours()  # Merge like with like.
This results in Striplog(3 Intervals, start=1.0, stop=10.0). Now we'd like to make a plot using an appropriate Legend object.
from striplog import Legend
legend_csv = u"""colour, width, component lithology
#F7E9A6, 3, Sand
#A68374, 2.5, Silt
#99994A, 2, Mud
#666666, 1, Clay"""
legend = Legend.from_csv(text=legend_csv)
s.plot(legend=legend, aspect=2, label='lithology')
Which gives:
Admittedly the plotting is a little limited, but it's just matplotlib so you can always add more code. To be honest, if I were to build this tool today, I think I'd probably leave the plotting out entirely; it's often easier for the user to do their own thing.
Why go to all this trouble? Fair question. striplog lets you merge zones, make thickness or lithology histograms, make queries ("show me sandstone beds thicker than 2 m"), make 'flags', export LAS or CSV, and even do Markov chain sequence analysis. But even if it's not what you're looking for, maybe you can recycle some of the plotting code! Good luck.
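To see why the merge_neighbours() step in the striplog answer leaves three intervals, here is a plain-Python sketch of the same effect using itertools.groupby. It only mimics the result on this data; it is not striplog's implementation, and it assumes the intervals are contiguous, as they are here:

```python
from itertools import groupby

depth = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
lithotype = [4, 4, 4, 5, 5, 5, 6, 6, 6, 2]
KEY = {2: 'silt', 4: 'clay', 5: 'mud', 6: 'sand'}

# zip stops at the shortest list, so the last lithotype (2, silt) has no
# base depth and is dropped, leaving nine one-foot intervals
intervals = [(top, base, KEY[lith])
             for top, base, lith in zip(depth, depth[1:], lithotype)]

# Merge runs of neighbouring intervals that share a lithology
merged = []
for lith, run in groupby(intervals, key=lambda iv: iv[2]):
    run = list(run)
    merged.append((run[0][0], run[-1][1], lith))

print(merged)  # [(1, 4, 'clay'), (4, 7, 'mud'), (7, 10, 'sand')]
```

Note that the silt layer disappears for the same reason in the striplog version: with one more lithotype than top/base pairs, the final entry never becomes an interval.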

How to manually scale a continuous legend in a seaborn scatterplot?

I'm creating a scatterplot with seaborn like this:
plt.figure(figsize=(20,5))
ax = sns.scatterplot(x=x,
y=y,
hue=errors,
s=errors*20,
alpha=0.8,
edgecolors='w')
ax.set(xlabel='X', ylabel='Y')
ax.legend(title="Error (m)", loc='upper right')
My errors contain values between approximately 0.1 and 12.5. However, for my legend seaborn automatically generates labels 0, 5, 10, 15. This makes my algorithm look worse than it is. I would like to change the step size in the legend while maintaining a correct mapping between colors and error magnitudes. For example 0, 4, 8, 12.5. Is this possible?
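One workaround, sketched here with plain matplotlib instead of seaborn (a swapped-in approach, with hypothetical random data in place of x, y and errors): fix the colour normalisation yourself and show the scale as a colorbar, whose ticks can be set to exactly 0, 4, 8, 12.5. Because the same Normalize object drives both the point colours and the colorbar, the colour-to-error mapping stays correct:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; use plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

# Hypothetical data standing in for x, y and errors (values ~0.1 to 12.5)
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 50)
y = rng.uniform(0, 10, 50)
errors = rng.uniform(0.1, 12.5, 50)

# Fix the colour limits explicitly so colours and tick labels share one mapping
norm = Normalize(vmin=0, vmax=12.5)

fig, ax = plt.subplots(figsize=(20, 5))
sc = ax.scatter(x, y, c=errors, s=errors * 20, cmap='viridis', norm=norm,
                alpha=0.8, edgecolors='w')
ax.set(xlabel='X', ylabel='Y')

# A colorbar instead of a discrete legend: the ticks are exactly what you ask for
cbar = fig.colorbar(sc, ax=ax, ticks=[0, 4, 8, 12.5])
cbar.set_label('Error (m)')
```

The colorbar encodes colour only, not marker size. To stay with sns.scatterplot, passing legend=False, palette='viridis' and hue_norm=(0, 12.5), then adding a colorbar for a ScalarMappable built with the same colormap and norm, should behave the same way; I have not tested that variant.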

Geopandas reduce legend size (and remove white space below map)

I would like to know how to change the legend automatically generated by Geopandas. Mostly I would like to reduce its size because it's quite big on the generated image. The legend seems to take all the available space.
An additional question: do you know how to remove the empty space below my map? I've tried with
pad_inches = 0, bbox_inches='tight'
but I still have an empty space below the map.
Thanks for your help.
This works for me:
some_geodataframe.plot(..., legend=True, legend_kwds={'shrink': 0.3})
Other options here: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.colorbar.html
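For a continuous column, geopandas documents that legend_kwds are forwarded to matplotlib's colorbar, so 'shrink' scales the colorbar's length. A quick way to see what it does, using a hypothetical array and imshow in place of a GeoDataFrame: build the same colorbar twice and compare the heights of the colorbar axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.arange(100).reshape(10, 10)  # hypothetical values to colour

def colorbar_height(shrink):
    """Height (figure fraction) of the colorbar axes created with this shrink."""
    fig, ax = plt.subplots()
    im = ax.imshow(data)
    cbar = fig.colorbar(im, ax=ax, shrink=shrink)
    height = cbar.ax.get_position(original=True).height
    plt.close(fig)
    return height

full, small = colorbar_height(1.0), colorbar_height(0.3)
print(small / full)  # about 0.3
```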
To show how to get a properly sized colorbar legend for a map created by geopandas' plot() method, I use the built-in 'naturalearth_lowres' dataset.
The working code is as follows.
import matplotlib.pyplot as plt
import geopandas as gpd

# note: gpd.datasets was deprecated in geopandas 0.14 and removed in 1.0,
# so this line needs an older geopandas (or a local copy of the Natural Earth data)
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world = world[(world.name != "Antarctica") & (world.name != "Fr. S. Antarctic Lands")]  # exclude 2 no-man's lands

# plot as usual, grab the axes 'ax' returned by the plot
colormap = "copper_r"  # add _r to reverse the colormap
ax = world.plot(column='pop_est', cmap=colormap,
                figsize=[12, 9],
                vmin=world.pop_est.min(), vmax=world.pop_est.max())

# map marginal/face deco
ax.set_title('World Population')
ax.grid()

# colorbar will be created by ...
fig = ax.get_figure()
# add colorbar axes to the figure
# here, need trial-and-error to get [l,b,w,h] right
# l:left, b:bottom, w:width, h:height; in normalized unit (0-1)
cbax = fig.add_axes([0.95, 0.3, 0.03, 0.39])
cbax.set_title('Population')

sm = plt.cm.ScalarMappable(cmap=colormap,
                           norm=plt.Normalize(vmin=world.pop_est.min(), vmax=world.pop_est.max()))
# at this stage, 'cbax' is just a blank axes, with unneeded labels on the x and y axes
# blank out the array of the scalar mappable 'sm'
sm.set_array([])
# draw colorbar into 'cbax'
fig.colorbar(sm, cax=cbax, format="%d")

# don't use plt.tight_layout() here
plt.show()
Read the comments in the code for useful info.
The resulting plot:

How to pass different scatter kwargs to facets in lmplot in Seaborn

I'm trying to map a 3rd variable to the scatter point colour in the Seaborn lmplot. So: total_bill on x, tip on y, and point colour as a function of size.
It works when no faceting is enabled but fails when col is used because the colour array size does not match the size of the data plotted in each facet.
This is my code
import matplotlib as mpl
import seaborn as sns
sns.set(color_codes=True)
# load data
data = sns.load_dataset("tips")
# size of data
print(len(data.index))
### we want to plot scatter point colour as function of variable 'size'
# first, sort the data by 'size' so that high 'size' values are plotted
# over the smaller sizes (so they are more visible)
data = data.sort_values(by=['size'], ascending=True)
scatter_kws = dict()
cmap = mpl.colormaps['Blues']  # mpl.cm.get_cmap() was removed in Matplotlib 3.9
# normalise 'size' variable as float range needs to be
# between 0 and 1 to map to a valid colour
scatter_kws['c'] = data['size'] / data['size'].max()
# map normalised values to colours
scatter_kws['c'] = cmap(scatter_kws['c'].values)
# colour array has same size as data
print(len(scatter_kws['c']))
# this works as intended
g = sns.lmplot(data=data, x="total_bill", y="tip", scatter_kws=scatter_kws)
The above works well and produces the following (not allowed to include images yet, so here's the link):
lmplot with point colour as function of size
However, when I add col='sex' to lmplot (try the code below), the colour array still has the size of the original dataset, which is larger than the number of points plotted in each facet. For example, the Male facet has 157 data points, so the first 157 values from the colour array are mapped to its points (and these aren't even the correct ones). See below:
lmplot with point colour as function of size with col=sex
g = sns.lmplot(data=data, x="total_bill", y="tip", col="sex", scatter_kws=scatter_kws)
Ideally, I'd like to pass an array of scatter_kws to the lmplot so that each facet uses the correct colour array (which I'd calculate before passing to lmplot). But that doesn't seem to be an option.
Any other ideas or workarounds that still allow me to use the functionality of Seaborn's lmplot (meaning, without resorting to re-creating lmplot functionality from FacetGrid)?
In principle, lmplot with different cols seems to be just a wrapper around several regplots. So instead of one lmplot, we could use two regplots, one for each sex.
We therefore need to separate the original dataframe into male and female; the rest is rather straightforward.
import matplotlib.pyplot as plt
import seaborn as sns
data = sns.load_dataset("tips")
data = data.sort_values(by=['size'], ascending=True)
# make a new dataframe for males and females
male = data[data["sex"] == "Male"]
female = data[data["sex"] == "Female"]
# get normalized colors for all data
colors = data['size'].values / float(data['size'].max())
# get colors for males / females
colors_male = colors[data["sex"].values == "Male"]
colors_female = colors[data["sex"].values == "Female"]
# colors are values in [0,1] range; vmin/vmax pin each colormap to that range
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(9, 4))
# create regplot for males, put it to left axes
# use colors_male to color the points with Blues cmap
sns.regplot(data=male, x="total_bill", y="tip", ax=ax1,
            scatter_kws={"c": colors_male, "cmap": "Blues", "vmin": 0, "vmax": 1})
# same for females
sns.regplot(data=female, x="total_bill", y="tip", ax=ax2,
            scatter_kws={"c": colors_female, "cmap": "Greens", "vmin": 0, "vmax": 1})
ax1.set_title("Males")
ax2.set_title("Females")
for ax in [ax1, ax2]:
    ax.set_xlim([0, 60])
    ax.set_ylim([0, 12])
plt.tight_layout()
plt.show()
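The same split generalises to any number of facets with groupby: store the colour value as a column first, so it travels with its rows into each subset. A minimal sketch using plain matplotlib scatter (no regression line, so it is not a full lmplot replacement; inside the loop you could call sns.regplot(..., ax=ax, scatter_kws=...) as in the answer above) and a small hypothetical stand-in for the tips data, since load_dataset needs a download:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; use plt.show() interactively
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the tips data
data = pd.DataFrame({
    "total_bill": [10.3, 21.0, 23.7, 24.6, 16.9, 35.3, 8.8, 26.9, 15.0, 14.8],
    "tip": [1.7, 3.5, 3.3, 3.6, 1.0, 5.0, 1.3, 3.1, 2.0, 3.0],
    "sex": ["Male", "Male", "Male", "Female", "Female",
            "Male", "Female", "Male", "Female", "Female"],
    "size": [3, 3, 2, 4, 2, 4, 2, 4, 2, 2],
})

# Store the colour value as a column so it stays aligned with its row
data["colour_val"] = data["size"] / data["size"].max()

groups = list(data.groupby("sex"))  # one (key, sub-DataFrame) pair per facet
fig, axes = plt.subplots(ncols=len(groups), figsize=(9, 4), sharey=True)
for ax, (sex, sub) in zip(axes, groups):
    # Each facet only sees its own rows, so c has the right length;
    # vmin/vmax keep the colour scale identical across facets
    ax.scatter(sub["total_bill"], sub["tip"], c=sub["colour_val"],
               cmap="Blues", vmin=0, vmax=1)
    ax.set_title(sex)
```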