Colors for pandas timeline graphs with many series - matplotlib

I am using pandas for graphing data for a cluster of nodes. I find that pandas is repeating color values for the different series, which makes them indistinguishable.
I tried giving custom color values like this and passed the my_colors to the colors field in plot:
my_colors = []
for node in nodes_list:
    my_colors.append(rand_color())
rand_color() is defined as follows:
def rand_color():
    from random import randrange
    return "#%s" % "".join([hex(randrange(16, 255))[2:] for i in range(3)])
But here I also need to avoid color values that are too close to distinguish. I sometimes have as many as 60 nodes (series). Would a hard-coded list of color values be the best option?

You can get a list of colors from any colormap defined in Matplotlib, and even custom colormaps, by:
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> colors = plt.cm.Paired(np.linspace(0,1,60))
Plotting an example with these colors:
>>> plt.scatter( range(60), [0]*60, color=colors )
<matplotlib.collections.PathCollection object at 0x04ED2830>
>>> plt.axis("off")
(-10.0, 70.0, -0.0015, 0.0015)
>>> plt.show()
I found the "Paired" colormap to be especially useful for this kind of thing, but you can use any other available or custom colormap.
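One caveat worth checking before relying on this for 60 series: Paired is a qualitative colormap with 12 entries, so 60 samples from it contain repeats. The sketch below is a minimal illustration, assuming a continuous colormap such as gist_rainbow is acceptable; the node names and the random data are made up for the example. It counts the distinct colors each colormap yields and passes one color per column to pandas through the color argument.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import to_hex

n_series = 60

# Hypothetical data: one column per node, 100 time steps each.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(100, n_series)).cumsum(axis=0),
    columns=[f"node{i:02d}" for i in range(n_series)],
)

# Sample each colormap at evenly spaced points and convert to hex strings.
samples = np.linspace(0, 1, n_series)
paired = [to_hex(c) for c in plt.cm.Paired(samples)]
rainbow = [to_hex(c) for c in plt.cm.gist_rainbow(samples)]

# Count how many different colors each colormap actually produced.
distinct_paired = len(set(paired))
distinct_rainbow = len(set(rainbow))

# pandas takes a list of colors, one per column, via the color argument.
ax = df.plot(color=rainbow, legend=False, figsize=(10, 4))
```

Call plt.show() afterwards to display the figure. Neighbouring colors in a continuous colormap can still look similar, so combining colors with different line styles may help when the series count is high.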

Related

How to overlay hatches on shapefile with condition?

I've been trying to plot hatches (like this pattern, "//") on polygons of a shapefile, based on a condition. The condition is that every polygon whose value ("Sig") is greater than or equal to 0.05 should get a hatch pattern. Unfortunately the resulting map doesn't meet my requirements.
So I first plot the "AMOTL" variable and then want to plot the hatches (variable Sig) on top of it (if the values are greater than or equal to 0.05). I have used the following code:
import geopandas as gpd
import contextily as ctx
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as ticker
from matplotlib.patches import Ellipse, Polygon
data = gpd.read_file("mapsignif.shp")
Sig = data.loc[data["Sig"].ge(0.05)]
data.loc[data["AMOTL"].eq(0), "AMOTL"] = np.nan
ax = data.plot(
    figsize=(12, 10),
    column="AMOTL",
    legend=True,
    cmap="bwr",
    vmin=-1,
    vmax=1,
    missing_kwds={"color": "white"},
)
Sig.plot(
    ax=ax,
    hatch='//'
)
map = Basemap(
    llcrnrlon=-50,
    llcrnrlat=30,
    urcrnrlon=50.0,
    urcrnrlat=85.0,
    resolution="i",
    lat_0=39.5,
    lon_0=1,
)
map.fillcontinents(color="lightgreen")
map.drawcoastlines()
map.drawparallels(np.arange(10,90,20),labels=[1,1,1,1])
map.drawmeridians(np.arange(-180,180,30),labels=[1,1,0,1])
Now the problem is that my original image (on which I want to plot the hatches) is different from the image resulting from the above code:
Original Image -
Resultant image from above code:
I basically want to plot hatches on that first image. This topic is similar to correlation plots where you have places with hatches (if the p-value is greater than 0.05). The first image plots the correlation variable and some of them are significant (defined by Sig). So I want to plot the Sig variable on top of the AMOTL. I've tried variations of the code, but still can't get through.
Would be grateful for some assistance... Here's my file - https://drive.google.com/file/d/10LPNjBtQMdQMw6XmXdJEg6Uq4icx_LD6/view?usp=sharing
I’d bet this is the culprit:
data.loc[data["Sig"].ge(0.05), "Sig"].plot(
    column="Sig", hatch='//'
)
In this line, you’re selecting only the 'Sig' column, eliminating all spatial data in the 'geometry' column and returning a pandas.Series instead of a geopandas.GeoDataFrame. In order to plot a data column using the geometries column for your shapes you must maintain at least both of those columns in the object you call .plot on.
So instead, don’t select the column:
data.loc[data["Sig"].ge(0.05)].plot(
    column="Sig", hatch='//'
)
You are already telling geopandas to plot the "Sig" column by using the column argument to .plot - no need to limit the actual data too.
Also, when overlaying a plot on an existing axis, be sure to pass in the axis object:
data.loc[data["Sig"].ge(0.05)].plot(
    column="Sig", hatch='//', ax=ax
)

matplotlib - seaborn - the numbers on the correlation plots are not readable

The plot below shows the correlation for one column. The problem is that the numbers are not readable, because there are many columns in it.
How is it possible to show only 5 or 6 most important columns and not all of them with very low importance?
plt.figure(figsize=(20,3))
sns.heatmap(df.corr()[['price']].sort_values('price', ascending=False).iloc[1:].T, annot=True,
            cmap='Spectral_r', vmax=0.9, vmin=-0.31)
You can limit the cells shown via .iloc[1:7]. If you also want to show the highest negative values, you could create a second plot with .iloc[-6:]. To have both together, you could use numpy's slicing function and write .iloc[np.r_[1:4, -3:0]].
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.random.rand(7, 27), columns=['price'] + [*'abcdefghijklmnopqrstuvwxyz'])
plt.figure(figsize=(20, 3))
sns.heatmap(df.corr()[['price']].sort_values('price', ascending=False).iloc[1:7].T,
            annot=True, annot_kws={'rotation':90, 'size': 20},
            cmap='Spectral_r', vmax=0.9, vmin=-0.31)
plt.show()
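To make the np.r_ indexing concrete, here is a small sketch on random data (the column names are placeholders, not taken from the question). np.r_[1:4, -3:0] expands to the positions 1, 2, 3, -3, -2, -1, so .iloc returns the three highest and the three lowest correlations while skipping position 0, which is price correlated with itself.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a 'price' column plus ten other numeric columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(100, 11)),
    columns=["price"] + list("abcdefghij"),
)

# Correlation of every column with 'price', highest first.
corr = df.corr()["price"].sort_values(ascending=False)

# Concatenate two slices into one array of positions.
idx = np.r_[1:4, -3:0]

# Top three (after 'price' itself) and bottom three correlations.
selected = corr.iloc[idx]
print(selected)
```

The same idx can be used inside the heatmap call in place of 1:7.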
annot can also be an array of labels with the same shape as the data. Using this, you can define a string matrix that displays the desired numbers and sets the others to an empty string.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(0)
import seaborn as sns; sns.set_theme()
import pandas as pd
from string import ascii_letters
# generate random data
rs = np.random.RandomState(33)
df = pd.DataFrame(data=rs.normal(size=(100, 26)),
                  columns=list(ascii_letters[26:]))
importance_index = 5 # until which idx to hide values
data = df.corr()[['A']].sort_values('A', ascending=False).iloc[1:].T
labels = data.astype(str) # make a str-copy
labels.iloc[0,:importance_index] = ' ' # mask columns that you want to hide
sns.heatmap(data, annot=labels, cmap='Spectral_r', vmax=0.9, vmin=-0.31, fmt='', annot_kws={'rotation':90})
plt.show()
The output on some random data:
This works but it has its limits, particularly with setting fmt='' (you can't use it to conveniently format decimals anymore and need to do it manually). I would also question whether your approach is even the best one to take here. I think consistency in plots is quite important. I would rather evaluate whether we can rotate the heatmap labels (I've included that above) or leave them out completely, since they are technically redundant given the color-coding. Alternatively, you could plot only the cells with the "important" values.
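Since fmt='' turns off seaborn's number formatting, the label strings have to be formatted by hand. A minimal sketch of one way to do that, assuming two decimals are wanted and that only the first n_visible columns (the highest correlations after sorting) should keep their label; n_visible is a name introduced here for illustration:

```python
import numpy as np
import pandas as pd

# Random data with columns A..Z, as in the example above.
rng = np.random.default_rng(33)
df = pd.DataFrame(
    rng.normal(size=(100, 26)),
    columns=list("ABCDEFGHIJKLMNOPQRSTUVWXYZ"),
)

# One-row frame of correlations with 'A', highest first, 'A' itself dropped.
data = df.corr()[["A"]].sort_values("A", ascending=False).iloc[1:].T

n_visible = 5  # hypothetical: how many leading cells keep their label

# Format every value to two decimals, producing a frame of strings.
labels = data.apply(lambda col: col.map("{:.2f}".format))

# Blank out the labels of the remaining cells.
labels.iloc[0, n_visible:] = ""
```

The resulting labels frame has the same shape as data and can be passed as annot=labels together with fmt='' in the sns.heatmap call.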

sns.clustermap ticks are missing

I'm trying to visualize what filters are learning in a CNN text classification model. To do this, I extracted feature maps of text samples right after the convolutional layer, and for filters of size 3, I got a (filter_num)*(length_of_sentences) sized tensor.
df = pd.DataFrame(-np.random.randn(50,50), index = range(50), columns= range(50))
g= sns.clustermap(df,row_cluster=True,col_cluster=False)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()
This code results in :
I can't see all the ticks on the y-axis. This is necessary because I need to see which filters learn which information. Is there any way to properly show all the ticks on the y-axis?
kwargs from sns.clustermap get passed on to sns.heatmap, which has an option yticklabels, whose documentation states (emphasis mine):
If True, plot the column names of the dataframe. If False, don’t plot the column names. If list-like, plot these alternate labels as the xticklabels. If an integer, use the column names but plot only every n label. If “auto”, try to densely plot non-overlapping labels.
Here, the easiest option is to set it to an integer, so it will plot every n-th label. We want every label, so we set it to 1, i.e.:
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
In your complete example:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame(-np.random.randn(50,50), index=range(50), columns=range(50))
g = sns.clustermap(df, row_cluster=True, col_cluster=False, yticklabels=1)
plt.setp(g.ax_heatmap.yaxis.get_majorticklabels(), rotation=0) # ytick rotate
g.cax.remove() # remove colorbar
plt.show()

Customize the axis label in seaborn jointplot

I seem to have got stuck on a relatively simple problem but couldn't fix it after searching for the last hour and after a lot of experimenting.
I have two numpy arrays x and y and I am using seaborn's jointplot to plot them:
sns.jointplot(x, y)
Now I want to label the x-axis and y-axis as "X-axis label" and "Y-axis label" respectively. If I use plt.xlabel, the label goes to the marginal distribution. How can I make the labels appear on the joint axes?
sns.jointplot returns a JointGrid object, which gives you access to the matplotlib axes and you can then manipulate from there.
import seaborn as sns
import numpy as np
# example data
X = np.random.randn(1000,)
Y = 0.2 * np.random.randn(1000) + 0.5
h = sns.jointplot(x=X, y=Y)  # recent seaborn versions require x and y as keyword arguments
# JointGrid has a convenience function
h.set_axis_labels('x', 'y', fontsize=16)
# or set labels via the axes objects
h.ax_joint.set_xlabel('new x label', fontweight='bold')
# also possible to manipulate the histogram plots this way, e.g.
h.ax_marg_y.grid('on') # with ugly consequences...
# labels appear outside of plot area, so auto-adjust
h.figure.tight_layout()
(The problem with your attempt is that functions such as plt.xlabel("text") operate on the current axis, which is not the central one in sns.jointplot; but the object-oriented interface is more specific as to what it will operate on).
Note that the last command uses the figure attribute of the JointGrid. The initial version of this answer used the simpler - but not object-oriented - approach via the matplotlib.pyplot interface.
To use the pyplot interface:
import matplotlib.pyplot as plt
plt.tight_layout()
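The point about the current axis can be reproduced with plain matplotlib. This is only a sketch: the two axes below stand in for the joint and marginal axes of a JointGrid, and the names ax_main and ax_side are made up for the illustration.

```python
import matplotlib.pyplot as plt

# Two axes stand in for the joint and marginal axes of a JointGrid.
fig, (ax_main, ax_side) = plt.subplots(1, 2)

# pyplot functions act on the current axes, here the one created last.
current = plt.gca()
plt.xlabel("set via pyplot")

# The object-oriented call names its target explicitly.
ax_main.set_xlabel("set via ax_main")

print(ax_main.get_xlabel(), "|", ax_side.get_xlabel())
```

The pyplot label ends up on ax_side, not on the axes you are probably looking at, which is the same effect as plt.xlabel labelling a marginal plot in jointplot.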
Alternatively, you can specify the axes labels in a pandas DataFrame in the call to jointplot.
import pandas as pd
import seaborn as sns
x = ...
y = ...
data = pd.DataFrame({
'X-axis label': x,
'Y-axis label': y,
})
sns.jointplot(x='X-axis label', y='Y-axis label', data=data)

How to pick a new color for each plotted line within a figure in matplotlib?

I'd like to NOT specify a color for each plotted line, and have each line get a distinct color. But if I run:
from matplotlib import pyplot as plt
for i in range(20):
    plt.plot([0, 1], [i, i])
plt.show()
then I get this output:
If you look at the image above, you can see that matplotlib attempts to pick colors for each line that are different, but eventually it re-uses colors - the top ten lines use the same colors as the bottom ten. I just want to stop it from repeating already used colors AND/OR feed it a list of colors to use.
I usually use the second one of these:
import matplotlib.pyplot as plt
from matplotlib.pyplot import cm
import numpy as np
# example data; n should be the number of curves to plot
n = 10
x = np.linspace(0, 1, 50)
# version 1:
color = cm.rainbow(np.linspace(0, 1, n))
for i, c in zip(range(n), color):
    y = x + i
    plt.plot(x, y, c=c)
# or version 2:
color = iter(cm.rainbow(np.linspace(0, 1, n)))
for i in range(n):
    c = next(color)
    y = x + i
    plt.plot(x, y, c=c)
Example of 2:
matplotlib 1.5+
You can use axes.set_prop_cycle (example).
matplotlib 1.0-1.4
You can use axes.set_color_cycle (example).
matplotlib 0.x
You can use Axes.set_default_color_cycle.
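For the matplotlib 1.5+ case, a minimal sketch of set_prop_cycle, assuming one color per line sampled from the rainbow colormap (any list of valid colors would do):

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_hex

n = 20  # number of lines to plot

# One hex color per line, sampled evenly from the rainbow colormap.
colors = [to_hex(c) for c in plt.cm.rainbow(np.linspace(0, 1, n))]

fig, ax = plt.subplots()
ax.set_prop_cycle(color=colors)  # set the cycle before plotting

for i in range(n):
    ax.plot([0, 1], [i, i])  # no color argument needed

# Collect the color each line actually received.
line_colors = [line.get_color() for line in ax.get_lines()]
```

Each line picks up the next color from the cycle, so with n colors and n lines nothing repeats. Call plt.show() to display the result.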
You can use a predefined "qualitative colormap" like this:
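import matplotlib.pyplot as plt
name = "Accent"
# plt.get_cmap also works on newer releases where matplotlib.cm.get_cmap is no longer available
cmap = plt.get_cmap(name) # type: matplotlib.colors.ListedColormap
colors = cmap.colors # type: list
fig, axes = plt.subplots()
axes.set_prop_cycle(color=colors)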
Tested on matplotlib 3.0.3. See https://github.com/matplotlib/matplotlib/issues/10840 for discussion on why you can't call axes.set_prop_cycle(color=cmap).
A list of predefined qualitative colormaps is available at https://matplotlib.org/gallery/color/colormap_reference.html
prop_cycle
color_cycle was deprecated in 1.5 in favor of this generalization: http://matplotlib.org/users/whats_new.html#added-axes-prop-cycle-key-to-rcparams
# cycler is a separate package extracted from matplotlib.
from cycler import cycler
import matplotlib.pyplot as plt
plt.rc('axes', prop_cycle=(cycler('color', ['r', 'g', 'b'])))
plt.plot([1, 2])
plt.plot([2, 3])
plt.plot([3, 4])
plt.plot([4, 5])
plt.plot([5, 6])
plt.show()
Also shown in the (now badly named) example: http://matplotlib.org/1.5.1/examples/color/color_cycle_demo.html mentioned at: https://stackoverflow.com/a/4971431/895245
Tested in matplotlib 1.5.1.
I don't know if you can automatically change the color, but you could exploit your loop to generate different colors:
for i in range(20):
    ax1.plot(x, y, color=(0, i / 20.0, 0, 1))
In this case, colors will vary from black to 100% green, but you can tune it if you want.
See the matplotlib plot() docs and look for the color keyword argument.
If you want to feed a list of colors, just make sure that you have a list big enough and then use the index of the loop to select the color
colors = ['r', 'b', ...., 'w']
for i in range(20):
    ax1.plot(x, y, color=colors[i])
You can also change the default color cycle in your matplotlibrc file.
If you don't know where that file is, do the following in python:
import matplotlib
matplotlib.matplotlib_fname()
This will show you the path to your currently used matplotlibrc file.
In that file you will find, among many other settings, the one for the color cycle: axes.prop_cycle (axes.color_cycle in versions before 1.5). Just put in your desired sequence of colors and it will be used in every plot you make.
Note that you can also use all valid html color names in matplotlib.
As Ciro's answer notes, you can use prop_cycle to set a list of colors for matplotlib to cycle through. But how many colors? What if you want to use the same color cycle for lots of plots, with different numbers of lines?
One tactic would be to use a formula like the one from https://gamedev.stackexchange.com/a/46469/22397, to generate an infinite sequence of colors where each color tries to be significantly different from all those that preceded it.
Unfortunately, prop_cycle won't accept infinite sequences - it will hang forever if you pass it one. But we can take, say, the first 1000 colors generated from such a sequence, and set it as the color cycle. That way, for plots with any sane number of lines, you should get distinguishable colors.
Example:
from matplotlib import pyplot as plt
from matplotlib.colors import hsv_to_rgb
from cycler import cycler
# 1000 distinct colors:
colors = [hsv_to_rgb([(i * 0.618033988749895) % 1.0, 1, 1])
          for i in range(1000)]
plt.rc('axes', prop_cycle=(cycler('color', colors)))
for i in range(20):
    plt.plot([1, 0], [i, i])
plt.show()
Output:
Now, all the colors are different - although I admit that I struggle to distinguish a few of them!