How to overlay hatches on shapefile with condition? - matplotlib

I've been trying to plot hatches (such as the "//" pattern) on the polygons of a shapefile, based on a condition: every polygon whose "Sig" value is greater than or equal to 0.05 should get a hatch pattern. Unfortunately, the resulting map doesn't meet my requirements.
I first plot the "AMOTL" variable and then want to plot the hatches (variable Sig) on top of it, wherever the value is greater than or equal to 0.05. I have used the following code:
import contextily as ctx
import geopandas as gpd  # needed for gpd.read_file below
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as ticker
from matplotlib.patches import Ellipse, Polygon
data = gpd.read_file("mapsignif.shp")
Sig = data.loc[data["Sig"].ge(0.05)]
data.loc[data["AMOTL"].eq(0), "AMOTL"] = np.nan
ax = data.plot(
    figsize=(12, 10),
    column="AMOTL",
    legend=True,
    cmap="bwr",
    vmin=-1,
    vmax=1,
    missing_kwds={"color": "white"},
)
Sig.plot(
    ax=ax,
    hatch='//'
)
map = Basemap(
    llcrnrlon=-50,
    llcrnrlat=30,
    urcrnrlon=50.0,
    urcrnrlat=85.0,
    resolution="i",
    lat_0=39.5,
    lon_0=1,
)
map.fillcontinents(color="lightgreen")
map.drawcoastlines()
map.drawparallels(np.arange(10,90,20),labels=[1,1,1,1])
map.drawmeridians(np.arange(-180,180,30),labels=[1,1,0,1])
Now the problem is that my original image (on which I want to plot the hatches) is different from the image resulting from the above code:
Original Image -
Resultant image from above code:
I basically want to plot hatches on that first image. This topic is similar to correlation plots where you have places with hatches (if the p-value is greater than 0.05). The first image plots the correlation variable and some of them are significant (defined by Sig). So I want to plot the Sig variable on top of the AMOTL. I've tried variations of the code, but still can't get through.
Would be grateful for some assistance... Here's my file - https://drive.google.com/file/d/10LPNjBtQMdQMw6XmXdJEg6Uq4icx_LD6/view?usp=sharing

I’d bet this is the culprit:
data.loc[data["Sig"].ge(0.05), "Sig"].plot(
    column="Sig", hatch='//'
)
In this line, you’re selecting only the 'Sig' column, eliminating all spatial data in the 'geometry' column and returning a pandas.Series instead of a geopandas.GeoDataFrame. In order to plot a data column using the geometries column for your shapes you must maintain at least both of those columns in the object you call .plot on.
So instead, don’t select the column:
data.loc[data["Sig"].ge(0.05)].plot(
    column="Sig", hatch='//'
)
You are already telling geopandas to plot the "Sig" column by using the column argument to .plot - no need to limit the actual data too.
Also, when overlaying a plot on an existing axis, be sure to pass in the axis object:
data.loc[data["Sig"].ge(0.05)].plot(
    column="Sig", hatch='//', ax=ax
)
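The column-selection point can be checked with plain pandas, without the shapefile. This is only a sketch: the table below is a made-up stand-in for the attribute table, and its "geometry" column holds placeholder strings rather than real polygons.

```python
import pandas as pd

# Hypothetical stand-in for the shapefile's attribute table.
# "geometry" holds placeholder strings here, not real polygons.
table = pd.DataFrame(
    {
        "AMOTL": [0.4, -0.2, 0.9],
        "Sig": [0.01, 0.20, 0.07],
        "geometry": ["poly_a", "poly_b", "poly_c"],
    }
)

mask = table["Sig"].ge(0.05)

# Selecting rows AND a single column returns a Series: the geometry is gone.
only_sig = table.loc[mask, "Sig"]

# Selecting rows only keeps every column, including the geometry.
with_geometry = table.loc[mask]

print(type(only_sig).__name__)        # Series
print(list(with_geometry.columns))    # ['AMOTL', 'Sig', 'geometry']
```

With a real GeoDataFrame the second form is the one to call .plot on. If the hatched layer hides the AMOTL colours underneath, passing facecolor="none" to that .plot call is worth trying; that keyword is my suggestion, not part of the answer above.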

Related

Specify Matplotlib's kwargs to Seaborn's displot when hue is used

Suppose we have this:
import seaborn as sns
import pandas as pd
import numpy as np
samples = 2**13
data = pd.DataFrame({'Values': list(np.random.normal(size=samples)) + list(np.random.uniform(size=samples)),
                     'Kind': ['Normal'] * samples + ['Uniform'] * samples})
sns.displot(data, hue='Kind', x='Values', fill=True)
I want my Normal's histogram (or KDE) emphasized. I'd like it in red and non transparent in the background. Uniform should have alpha = .5.
How do I specify these style parameters in a "per hue" manner?
It's possible to do it with two separate histplots on the same Axes, as @Redox suggested. We can basically recreate the same plot, but with fine-grained control over colours and alpha. However, I had to explicitly pass in the number of bins to get the same plot as yours. I also needed to define the colour for Uniform, otherwise a ghost element would be added to the legend! I used C1, meaning the second colour in the default colour cycle (C0 is the first).
import matplotlib.pyplot as plt
_, ax = plt.subplots()
# Normal: opaque red; Uniform: half-transparent, drawn on top
sns.histplot(data=data[data.Kind=='Normal'], x="Values", ax=ax, label='Normal', color='tab:red', bins=130, alpha=1)
sns.histplot(data=data[data.Kind=='Uniform'], x="Values", ax=ax, label='Uniform', color='C1', bins=17, alpha=.5)
ax.set_xlabel('')
ax.legend()
Note that if you just want to set the colour without alpha you can already do this on a displot via the palette argument - pass in a dictionary of your unique hue values to colour names. However, the alpha that you pass in must be a scalar. I tried to use this clever answer to set colours as RGBA colours which include alpha, which seems to work with other figure level plots in Seaborn. However, displot overrides this and sets the alpha separately!
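The hand-tuned bin counts (130 and 17) are needed because each subset otherwise gets its own bins, while the hue version bins the pooled data. A way around the guessing is to compute one set of bin edges from all values and reuse it for both groups. The sketch below does this with plain matplotlib ax.hist rather than seaborn, so it is a swapped-in technique and not the code from the answer; the variable names are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
samples = 2**13
normal = rng.normal(size=samples)
uniform = rng.uniform(size=samples)

# One set of bin edges from the pooled data, shared by both histograms.
edges = np.histogram_bin_edges(np.concatenate([normal, uniform]), bins="auto")

fig, ax = plt.subplots()
# Normal is drawn first: opaque red, in the background.
_, _, normal_bars = ax.hist(normal, bins=edges, color="tab:red", alpha=1.0, label="Normal")
# Uniform is drawn on top with alpha 0.5.
_, _, uniform_bars = ax.hist(uniform, bins=edges, color="C1", alpha=0.5, label="Uniform")
ax.legend()
```

As far as I know, sns.histplot also accepts a sequence of edges for bins, so the same edges array should be usable in the two histplot calls above.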

Get real range in colormap with LogLocator

The following code
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import ticker
n = 50
A = np.tile(np.linspace(-26,-2,n),(n,1))
plt.figure()
plt.contourf(A)
plt.colorbar()
B = np.tile(np.logspace(-26,-2,n),(n,1))
plt.figure()
plt.contourf(B,locator=ticker.LogLocator())
plt.colorbar()
plt.show()
produces these two plots:
For the linear case (first image), every color in the colorbar is present in the image, and the min and max values of A lie respectively in the first and last color bin (going bottom to top).
For the log case (second image), the colorbar's min and max values don't make sense to me anymore.
The minimum of B is 10^-26, so this value lies at the border between the first and second color bins of the colormap, but neither of these two colors appears in the image.
The maximum of B is 10^-2, and it lies at the border between the third-to-last and second-to-last color bins, so it could be considered to be in either.
But then, why is the last (yellow) color bin here, especially since there is no yellow in the image?
So I find the default behavior of the colormap limits (for the LogLocator) weird: unlike in the linear case, it is not representative of the real (or at least approximate) data range, and it adds color bins (3 in this case: 2 below the min and 1 above the max) that are not present in the image.
Is this a bug or is there something I didn't understand?
@ImportanceOfBeingErnest's answer below gives the output that I want, but it feels like I shouldn't have to do this, and that I should be able to expect the same behavior from the LogLocator color mapping as from the colormap with linear values.
If you want to have specific intervals in your contour plot, you need to decide on them yourself and supply them to the contouring function via the levels argument.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import ticker
n = 50
A = np.tile(np.logspace(-26,-2,n),(n,1))
levels = 10.**np.arange(-26,-1,4)
plt.figure()
plt.contourf(A,levels=levels, locator=ticker.LogLocator())
plt.colorbar()
plt.show()
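If the exponents should not be hard-coded, the levels can be derived from the data. The following is a minimal sketch; decade_levels, step and tol are names I made up, and the tolerance value is an arbitrary choice.

```python
import numpy as np

def decade_levels(values, step=1, tol=1e-9):
    """Hypothetical helper: powers of ten that bracket the positive data."""
    positive = values[values > 0]  # a log scale only makes sense for positive values
    # tol keeps data that sits exactly on a power of ten from being pushed
    # into an extra decade by floating-point rounding in log10.
    lo = int(np.floor(np.log10(positive.min()) + tol))
    hi = int(np.ceil(np.log10(positive.max()) - tol))
    # Enough steps to reach hi; at least one so there are two levels.
    n_steps = max(1, int(np.ceil((hi - lo) / step)))
    exponents = lo + step * np.arange(n_steps + 1)
    return 10.0 ** exponents.astype(float)

n = 50
B = np.tile(np.logspace(-26, -2, n), (n, 1))
levels = decade_levels(B, step=4)
print(levels)
```

For the array in the question, step=4 gives the same seven levels as the hard-coded 10.**np.arange(-26,-1,4), and the result can be passed as levels=levels in the contourf call above.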

sns.clustermap ticks are missing

I'm trying to visualize what the filters are learning in a CNN text classification model. To do this, I extracted the feature maps of text samples right after the convolutional layer, and for the size-3 filters I got a (filter_num)*(length_of_sentences) sized tensor.
df = pd.DataFrame(-np.random.randn(50,50), index = range(50), columns= range(50))
g= sns.clustermap(df,row_cluster=True,col_cluster=False)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()
This code results in :
I can't see all the ticks on the y-axis. I need them because I have to see which filters learn which information. Is there any way to properly show all the ticks on the y-axis?
kwargs from sns.clustermap get passed on to sns.heatmap, which has an option yticklabels, whose documentation states (emphasis mine):
If True, plot the column names of the dataframe. If False, don’t plot the column names. If list-like, plot these alternate labels as the xticklabels. If an integer, use the column names but plot only every n label. If “auto”, try to densely plot non-overlapping labels.
Here, the easiest option is to set it to an integer, so it will plot every n labels. We want every label, so we want to set it to 1, i.e.:
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
In your complete example:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame(-np.random.randn(50,50), index=range(50), columns=range(50))
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()

Using pd.cut to create bins for a graph, but bin values are not coming out as expected

Here is the code I'm running:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
titanic = sns.load_dataset("titanic")
y =titanic.groupby([titanic.fare//1,'sex']).survived.mean().reset_index() #grouping by 'fare' rounded to an integer and 'sex' and then getting the survivability
x =pd.cut(y.fare, (0,17,35,70,300,515)) #I'm not sure if my format is correct but this is how I cut up the fare values
y['Fare_bins']= x # adding the newly created bins to a new column "Fare_bins' in original dataframe.
#graphing with seaborn
sns.set(style="whitegrid")
# factorplot and its size argument were renamed to catplot and height in seaborn 0.9
g = sns.catplot(x='Fare_bins', y='survived', col='sex', kind='bar', data=y,
                height=4, aspect=2.5, palette="muted")
g.despine(left=True)
g.set_ylabels("Survival Probability")
g.set_xlabels('Fare')
plt.show()
The problem I'm having is that Fare_values are showing up as (0,17].
The left side is a circle bracket and the right side is square bracket.
If possible I would like to have something like this:
(0-17) or [0-17]
Next, there seems to be a gap between the bars. I was expecting them to be adjoined. There are two graphs, so I don't expect all of the bars to be adjoined, but the first 5 bars (first graph) should be connected to each other, and likewise the last 5 bars (second graph).
How can I go about fixing these two issues?
It seems I can add labels.
Just by adding labels to the "cut" method parameters, I can display the Fare_values as I want.
x =pd.cut(y.fare, (0,17,35,70,300,515), labels = ('(0-17)', '(17-35)', '(35-70)', '(70-300)','(300-515)') )
As for the brackets showing around the fare_value groups, according to the documentation:
right : bool, optional
Indicates whether the bins include the rightmost edge or not. If right == True (the default), then the bins [1,2,3,4] indicate (1,2], (2,3], (3,4].
Still not sure if it's possible to join the bars though.
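To make the bracket behaviour concrete, here is a small sketch with a few made-up fares (not the titanic data). It shows the default right-closed bins, the right=False variant, and labels built from the bin edges instead of typed by hand.

```python
import pandas as pd

# Made-up fares for illustration; 17 sits exactly on a bin edge.
fares = pd.Series([5, 17, 20, 80, 400])
edges = [0, 17, 35, 70, 300, 515]

# Default (right=True): "(" means the left edge is excluded, "]" means the
# right edge is included, so 17 falls in (0, 17].
default_bins = pd.cut(fares, edges)

# right=False flips this: 17 now falls in [17, 35).
left_closed = pd.cut(fares, edges, right=False)

# Labels generated from consecutive pairs of edges.
labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]
labelled = pd.cut(fares, edges, labels=labels)

print(default_bins.iloc[1], left_closed.iloc[1], labelled.iloc[1])
```

Note that custom labels only change what is displayed; which edge is included is still controlled by right.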

Colors for pandas timeline graphs with many series

I am using pandas for graphing data for a cluster of nodes. I find that pandas is repeating color values for the different series, which makes them indistinguishable.
I tried giving custom color values like this and passed the my_colors to the colors field in plot:
my_colors = []
for node in nodes_list:
    my_colors.append(rand_color())
rand_color() is defined as follows:
def rand_color():
    from random import randrange
    return "#%s" % "".join([hex(randrange(16, 255))[2:] for i in range(3)])
But here, too, I need to avoid color values that are too close to distinguish. I sometimes have as many as 60 nodes (series). Would a hard-coded list of color values be the best option?
You can get a list of colors from any colormap defined in Matplotlib, and even custom colormaps, by:
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> colors = plt.cm.Paired(np.linspace(0,1,60))
Plotting an example with these colors:
>>> plt.scatter( range(60), [0]*60, color=colors )
<matplotlib.collections.PathCollection object at 0x04ED2830>
>>> plt.axis("off")
(-10.0, 70.0, -0.0015, 0.0015)
>>> plt.show()
I found the "Paired" colormap to be especially useful for this kind of thing, but you can use any other available or custom colormap.
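One caveat worth checking on a current Matplotlib: as far as I know, "Paired" is now a qualitative colormap with 12 fixed entries, so sampling it at 60 points repeats colors. The sketch below counts the distinct colors and compares against a continuous colormap; picking "hsv" is my assumption, any continuous map would do, and count_distinct is a made-up helper.

```python
import matplotlib.pyplot as plt
import numpy as np

n_series = 60

# Qualitative colormap: a fixed list of entries, so samples can repeat.
paired = plt.get_cmap("Paired")(np.linspace(0, 1, n_series))

# Continuous colormap. hsv is cyclic (0 and 1 are both red), so the
# endpoint is left out to avoid getting the same color twice.
hsv = plt.get_cmap("hsv")(np.linspace(0, 1, n_series, endpoint=False))

def count_distinct(rgba_rows):
    """Hypothetical helper: number of different RGBA rows."""
    return len({tuple(np.round(row, 6)) for row in rgba_rows})

print(count_distinct(paired), count_distinct(hsv))
```

Either array can be turned into a list and passed to the pandas plot call through its color argument. Neighbouring hues among 60 samples will still look similar, so combining colors with different line styles or markers may be needed.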