Setting colors individually in matplotlib - numpy

I want to create a custom plot in which I precisely specify the color of each object. Specifically, I am creating a Gantt chart of system events, classifying the events into groups and color coding them for visualization.
Please consider the following code:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame()
df['y'] = [0,4,5,6,10]
df['color'] = [(.5, .5, .5, .5),]*len(df)
print(df['color'])
#fig = plt.figure(figsize=(12, 6))
#vax = fig.add_subplot(1,1,1)
#vax.hlines(df['y'], 0, 10, colors=df['color'])
#fig.savefig('ok.png')
only_four = df['y']==4
df['color'][only_four] = [(0.7, 0.6, 0.5, 0.4),]*sum(only_four)
print(df['color'])
Note that I first set the color of all rows to a semi-transparent gray. Later, for a particular set of values, I want to change the color. I end up with this color table:
0 (0.5, 0.5, 0.5, 0.5)
1 0.6
2 (0.5, 0.5, 0.5, 0.5)
3 (0.5, 0.5, 0.5, 0.5)
4 (0.5, 0.5, 0.5, 0.5)
I want to be able to specify any RGBA value (i.e. including transparency) for any subset of the hlines. Could someone share how to do this? I'm open to any other way to do this as long as I can precisely color each line including a transparency.
ADDITION TO QUESTION:
I am able to update multiple rows by iterating as in:
import numpy as np

def set_color(df, row_bool, r, g, b, a=1.0):
    idx = np.where(row_bool)[0]
    for i in idx:
        df['color'][i] = (r, g, b, a)
    return
This is sufficient, but I really wanted a vector operation (i.e. no explicit loop written by me).

I'm guessing the problem is that you cannot get your updated tuple into the DataFrame, and you only end up with that 0.6 value. Have you tried using DataFrame.set_value?
In [1]: df
Out[1]:
y color
0 0 (0.5, 0.5, 0.5, 0.5)
1 4 0.6
2 5 (0.5, 0.5, 0.5, 0.5)
3 6 (0.5, 0.5, 0.5, 0.5)
4 10 (0.5, 0.5, 0.5, 0.5)
In [2]: df.set_value(1, 'color', (0.7, 0.6, 0.5, 0.4))
Out[2]:
y color
0 0 (0.5, 0.5, 0.5, 0.5)
1 4 (0.7, 0.6, 0.5, 0.4)
2 5 (0.5, 0.5, 0.5, 0.5)
3 6 (0.5, 0.5, 0.5, 0.5)
4 10 (0.5, 0.5, 0.5, 0.5)
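A side note, as a sketch not taken from the answer above: set_value has since been deprecated and removed in newer pandas (df.at plays the same role for single cells), and if you want to recolor a whole boolean selection without an explicit index loop, you can rebuild the column in one pass:
import pandas as pd

df = pd.DataFrame({'y': [0, 4, 5, 6, 10]})
df['color'] = [(0.5, 0.5, 0.5, 0.5)] * len(df)

# Single-cell assignment in current pandas (replacement for set_value):
df.at[1, 'color'] = (0.7, 0.6, 0.5, 0.4)

# Recolor every row matched by a boolean mask without an explicit index loop;
# rebuilding the object column sidesteps pandas trying to broadcast the tuple.
only_four = df['y'] == 4
df['color'] = [(0.7, 0.6, 0.5, 0.4) if flag else old
               for flag, old in zip(only_four, df['color'])]
print(df)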

Numpy Linalg on transition matrix

I have the following states
states = [(0,2,3,0), (2,2,3,0), (2,2,2,0), (2,2,1,0)]
In addition, I have the following transition matrix
import pandas as pd
transition_matrix = pd.DataFrame([[1, 0, 0, 0],
                                  [0.5, 0.3, 0.2, 0],
                                  [0.5, 0.3, 0, 0.2],
                                  [0.5, 0.5, 0, 0]],
                                 columns=states, index=states)
So, if you are in state (2,2,1,0), there is a 50% probability that you go to state (0,2,3,0) and a 50% probability that you go to (2,2,3,0).
If you are in state (0,2,3,0), the absorbing state, you win.
We can write the following equations
p_win(0,2,3,0) = 1
p_win(2,2,3,0) = 0.5 * p_win(0,2,3,0) + 0.3 * p_win(2,2,3,0) + 0.2 * p_win(2,2,2,0)
p_win(2,2,2,0) = 0.5 * p_win(0,2,3,0) + 0.3 * p_win(2,2,3,0) + 0.2 * p_win(2,2,1,0)
p_win(2,2,1,0) = 0.5 * p_win(0,2,3,0) + 0.5 * p_win(2,2,3,0)
I would like to solve the above formulas. I looked at the documentation of the np.linalg.solve function. The example doesn't use defined variables and, in addition, I have terms on both sides of the equals sign.
Please show me how I can solve the above.
First, your first equation is wrong for this formulation (it should be p_win(0,2,3,0) = 1 * p_win(0,2,3,0)).
You are essentially trying to get the eigenvector corresponding to the largest eigenvalue (eig = 1) of the transition matrix. p_win is determined by:
v = Pv, i.e. (P - I)v = 0, together with sum(v) = 1, where I is the identity matrix np.eye(4)
which we can write in extended form as:
import numpy as np

I = np.eye(4)
P = np.array([[1, 0, 0, 0],
              [0.5, 0.3, 0.2, 0],
              [0.5, 0.3, 0, 0.2],
              [0.5, 0.5, 0, 0]])
# if you already have it in a DataFrame, you can alternatively do:
# P = transition_matrix.to_numpy()

# Stack the sum(v) = 1 constraint below the four (P - I)v = 0 equations
extend_m = np.concatenate((P - I, np.ones((1, 4))), axis=0)
# Equation to solve is extend_m @ v = np.array([0, 0, 0, 0, 1])
So the solution is given by
v = np.linalg.lstsq(extend_m, np.array([0, 0, 0, 0, 1]), rcond=None)[0]
I use lstsq because we have an overdetermined system (5 equations, 4 unknowns). If you want to use np.linalg.solve you need to reduce it to 4 equations, which I leave up to you (in this particular case, there is one obviously redundant equation, which you can just remove; see the sketch below).
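As a minimal sketch of that reduced square system (my assumption: the all-zero first row of P - I is the obviously redundant equation to drop), np.linalg.solve can then be used directly:
import numpy as np

P = np.array([[1, 0, 0, 0],
              [0.5, 0.3, 0.2, 0],
              [0.5, 0.3, 0, 0.2],
              [0.5, 0.5, 0, 0]])
I = np.eye(4)

# Drop the all-zero first row of (P - I) and append the sum(v) = 1 constraint,
# which leaves a square, invertible 4x4 system.
A = np.vstack(((P - I)[1:], np.ones((1, 4))))
rhs = np.array([0, 0, 0, 1])
v = np.linalg.solve(A, rhs)
print(v)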

Matplotlib: plt.text with user-defined circle radii

Dear stackoverflow users,
I want to plot some data labels at their coordinates in an x,y-plot. Around each label I want to put a circle with a user-defined radius, as I want to symbolize the magnitude of a data property by the radius of the circle.
An example dataset could look like the following:
point1 = ["label1", 0.5, 0.25, 1e0] # equals [label, x, y, radius]
point2 = ["label2", 0.5, 0.75, 1e1] # equals [label, x, y, radius]
I want to use code similar to the following:
import matplotlib.pyplot as plt
plt.text(point1[1], point1[2], point1[0], bbox = dict(boxstyle="circle")) # here I want to alter the radius by passing point1[3]
plt.text(point2[1], point2[2], point2[0], bbox = dict(boxstyle="circle")) # here I want to alter the radius by passing point2[3]
plt.show()
Is this possible somehow or is the plt.add_patch variant the only possible way?
Regards
In principle, you can use the boxes' pad parameter to define the circle size. However, this is then relative to the label: a small label would have a smaller circle around it than a larger label, for the same value of pad. Also, pad is measured in units of the font size (i.e. with a font size of 10pt, a padding of 1 corresponds to 10pt).
import numpy as np
import matplotlib.pyplot as plt

points = [["A", 0.2, 0.25, 0],           # zero radius
          ["long label", 0.4, 0.25, 0],  # zero radius
          ["label1", 0.6, 0.25, 1]]      # one radius
for point in points:
    plt.text(point[1], point[2], point[0], ha="center", va="center",
             bbox=dict(boxstyle=f"circle,pad={point[3]}", fc="lightgrey"))
plt.show()
I don't know to what extent this is what you want.
I guess you would usually rather create a scatter plot at the same positions as the text:
import numpy as np
import matplotlib.pyplot as plt

points = [["A", 0.2, 0.25, 100],           # 5 pt radius
          ["long label", 0.4, 0.25, 100],  # 5 pt radius
          ["label1", 0.6, 0.25, 1600]]     # 20 pt radius
data = np.array([l[1:] for l in points])
plt.scatter(data[:, 0], data[:, 1], s=data[:, 2], facecolor="gold")
for point in points:
    plt.text(point[1], point[2], point[0], ha="center", va="center")
plt.show()
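If the radius should instead be meaningful in data units (as the 1e0/1e1 values in the question suggest), here is a sketch of the add_patch route the question mentions, using matplotlib.patches.Circle; the radii below are made up for illustration:
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

points = [["label1", 0.5, 0.25, 0.05],   # [label, x, y, radius in data units]
          ["label2", 0.5, 0.75, 0.15]]
ax = plt.gca()
for label, x, y, r in points:
    ax.add_patch(Circle((x, y), radius=r, facecolor="lightgrey",
                        edgecolor="k", alpha=0.5))
    ax.text(x, y, label, ha="center", va="center")
ax.set_aspect("equal")   # keep the circles circular rather than elliptical
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
plt.show()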

Converting gnuplot color map to matplotlib

I am trying to port some plotting code from gnuplot to matplotlib and am struggling with porting a discontinuous color map that is specified by color names. Any suggestions on how to do this in matplotlib?
# Establish a 3-section color palette with the lower 1/4 in the blues,
# the middle 1/2 from light green to yellow, and the top 1/4 in the reds
set palette defined (0 'dark-blue', 0.5 'light-blue', \
                     0.5 'light-green', 1 'green', 1.5 'yellow', \
                     1.5 'red', 2 'dark-red')
# Establish the palette range such that the middle green range corresponds
# to 0.95 to 1.05
set cbrange [0.9:1.1]
I've used this script for years and can't really remember how or where I got it (edit: after some searching, this seems to be the source, but it requires some minor changes for Python 3). It has helped me a lot in quickly creating custom color maps: you simply specify a dictionary of locations (0..1) and colors, and it creates a linear color map out of that; e.g. make_colormap({0:'w', 1:'k'}) creates a linear color map going from white to black.
import numpy as np
import matplotlib.pylab as pl

def make_colormap(colors):
    from matplotlib.colors import LinearSegmentedColormap, ColorConverter
    z = np.array(sorted(colors.keys()))
    n = len(z)
    z1 = min(z)
    zn = max(z)
    x0 = (z - z1) / (zn - z1)
    CC = ColorConverter()
    R = []
    G = []
    B = []
    for i in range(n):
        Ci = colors[z[i]]
        if type(Ci) == str:
            RGB = CC.to_rgb(Ci)
        else:
            RGB = Ci
        R.append(RGB[0])
        G.append(RGB[1])
        B.append(RGB[2])
    cmap_dict = {}
    cmap_dict['red'] = [(x0[i], R[i], R[i]) for i in range(len(R))]
    cmap_dict['green'] = [(x0[i], G[i], G[i]) for i in range(len(G))]
    cmap_dict['blue'] = [(x0[i], B[i], B[i]) for i in range(len(B))]
    mymap = LinearSegmentedColormap('mymap', cmap_dict)
    return mymap
test1 = make_colormap({0.:'#40004b',0.5:'#ffffff',1.:'#00441b'})
test2 = make_colormap({0.:'b',0.25:'w',0.251:'g',0.75:'y',0.751:'r',1:'k'})
data = np.random.random((10,10))
pl.figure()
pl.subplot(121)
pl.imshow(data, interpolation='nearest', cmap=test1)
pl.colorbar()
pl.subplot(122)
pl.imshow(data, interpolation='nearest', cmap=test2)
pl.colorbar()
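As a hedged sketch of how the gnuplot palette from the question could be fed to make_colormap, using matplotlib's named colors (darkblue, lightblue, and so on) as stand-ins for gnuplot's 'dark-blue' and friends; the gnuplot positions 0..2 are rescaled to 0..1, and the hard breaks are approximated with closely spaced stops:
gnuplot_like = make_colormap({0.00: 'darkblue', 0.249: 'lightblue',
                              0.25: 'lightgreen', 0.50: 'green',
                              0.749: 'yellow',
                              0.75: 'red', 1.00: 'darkred'})
pl.figure()
pl.imshow(0.9 + 0.2 * np.random.random((10, 10)), interpolation='nearest',
          cmap=gnuplot_like, vmin=0.9, vmax=1.1)
pl.colorbar()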
Bart's function is very nice. However, if you want to build the colormap yourself, you can define it with a dictionary, in the way it is done in the custom_cmap example on the matplotlib website.
Here's an example that's pretty close to your colormap:
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import numpy as np
cdict = {'red':   ((0.0, 0.0, 0.0),    # From 0 to 0.25, we fade the red and green channels
                   (0.25, 0.5, 0.5),   # up a little, to make the blue a bit more grey
                   (0.25, 0.0, 0.0),   # From 0.25 to 0.75, we fade red from 0.5 to 1
                   (0.75, 1.0, 1.0),   # to fade from green to yellow
                   (1.0, 0.5, 0.5)),   # From 0.75 to 1.0, we bring the red down from 1
                                       # to 0.5, to go from bright to dark red
         'green': ((0.0, 0.0, 0.0),    # From 0 to 0.25, we fade the red and green channels
                   (0.25, 0.6, 0.6),   # up a little, to make the blue a bit more grey
                   (0.25, 1.0, 1.0),   # Green is 1 from 0.25 to 0.75 (we add red
                   (0.75, 1.0, 1.0),   # to turn it from green to yellow)
                   (0.75, 0.0, 0.0),   # No green needed in the red upper quarter
                   (1.0, 0.0, 0.0)),
         'blue':  ((0.0, 0.9, 0.9),    # Keep blue at 0.9 from 0 to 0.25, and adjust its
                   (0.25, 0.9, 0.9),   # tone using the green and red channels
                   (0.25, 0.0, 0.0),   # No blue needed above 0.25
                   (1.0, 0.0, 0.0))
         }
cmap = colors.LinearSegmentedColormap('BuGnYlRd',cdict)
data = 0.9 + (np.random.rand(8,8) * 0.2) # Data in range 0.9 to 1.1
p=plt.imshow(data,interpolation='nearest',cmap=cmap,vmin=0.9,vmax=1.1)
plt.colorbar(p)
plt.show()

How to perform subtraction on a single element of a tensor

I have a tensor that consists of 4 floats, called label.
How do I, with a 50% chance, execute label[0] = 1 - label[0]?
Right now I have:
label = tf.constant([0.35, 0.5, 0.17, 0.14]) # just an example
uniform_random = tf.random_uniform([], 0, 1.0)
# Create a tensor with [1.0, 0.0, 0.0, 0.0] if uniform_random > 50%
# else it's only zeroes
inv = tf.pack([tf.round(uniform_random), 0.0, 0.0, 0.0])
label = tf.sub(inv, label)
label = tf.abs(label) # need abs because it inverted the other elements
# output will be either [0.35, 0.5, 0.17, 0.14] or [0.65, 0.5, 0.17, 0.14]
which works, but looks extremely ugly. Isn't there a smarter/simpler way of doing this?
Related question: How do I apply a certain op (e.g. sqrt) just to two elements? I'm guessing I have to remove these two elements, perform the op and then concat them back to the original vector?
tf.select and tf.cond come in handy for situations where you have to perform computations conditionally on elements of a tensor. For your example, the following would work:
import tensorflow as tf

label = tf.constant([0.35, 0.5, 0.17, 0.14])
inv = tf.pack([1.0, 0.0, 0.0, 0.0])
mask = tf.pack([1.0, -1.0, -1.0, -1.0])
output = tf.cond(tf.random_uniform([], 0, 1.0) > 0.5,
                 lambda: label,
                 lambda: (inv - label) * mask)
with tf.Session(''):
    print(output.eval())
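For the related question about applying an op (e.g. sqrt) to just two elements, a minimal sketch in the same spirit, reusing the masking idea (and the same TF 0.x-era tf.pack as above) rather than slice-and-concat:
# Apply tf.sqrt to elements 0 and 1 only; the other elements pass through.
apply_mask = tf.pack([1.0, 1.0, 0.0, 0.0])  # 1 where the op is applied
keep_mask = tf.pack([0.0, 0.0, 1.0, 1.0])   # 1 where the element is kept as-is
result = apply_mask * tf.sqrt(label) + keep_mask * label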

Logarithmic multi-sequence plot with equal bar widths

I have something like
import matplotlib.pyplot as plt
import numpy as np
a=[0.05, 0.1, 0.2, 1, 2, 3]
plt.hist((a*2, a*3), bins=[0, 0.1, 1, 10])
plt.gca().set_xscale("symlog", linthreshx=0.1)
plt.show()
which gives me the following plot:
As one can see, the bar widths are not equal. In the linear part (from 0 to 0.1) everything is fine, but after that the bar width stays on a linear scale while the axis is logarithmic, giving me uneven widths for the bars and for the spaces in between (the tick is not in the middle of the bars).
Is there any way to correct this?
Inspired by https://stackoverflow.com/a/30555229/635387 I came up with the following solution:
import matplotlib.pyplot as plt
import numpy as np

d = [0.05, 0.1, 0.2, 1, 2, 3]

def LogHistPlot(data, bins):
    totalWidth = 0.8
    colors = ("b", "r", "g")
    for i, d in enumerate(data):
        heights = np.histogram(d, bins)[0]
        width = 1 / len(data) * totalWidth
        left = np.array(range(len(heights))) + i * width
        # align='edge' so that `left` is treated as the bar's left edge
        plt.bar(left, heights, width, color=colors[i], label=i, align='edge')
    plt.xticks(range(len(bins)), bins)
    plt.legend(loc='best')

LogHistPlot((d*2, d*3, d*4), [0, 0.1, 1, 10])
plt.show()
Which produces this plot:
The basic idea is to drop the plt.hist function, compute the histogram with numpy, and plot it with plt.bar. Then, you can easily use a linear x-axis, which makes the bar width calculation trivial. Finally, the ticks are relabeled with the bin edges, giving the appearance of a logarithmic scale. And you don't even have to deal with the symlog linear/logarithmic botchery anymore.
You could use histtype='stepfilled' if you are okay with a plot where the data sets are plotted one behind the other. Of course, you'll need to carefully choose colors with alpha values, so that all your data can still be seen...
a = [0.05, 0.1, 0.2, 1, 2, 3] * 2
b = [0.05, 0.05, 0.05, 0.15, 0.15, 2]
colors = [(0.2, 0.2, 0.9, 0.5), (0.9, 0.2, 0.2, 0.5)] # RGBA tuples
plt.hist((a, b), bins=[0, 0.1, 1, 10], histtype='stepfilled', color=colors)
plt.gca().set_xscale("symlog", linthreshx=0.1)
plt.show()
I've changed your data slightly for a better illustration. This gives me:
For some reason the overlap color seems to be going wrong (matplotlib 1.3.1 with Python 3.4.0; is this a bug?), but it's one possible solution/alternative to your problem.
Okay, I found out the real problem: when you create the histogram with those bin-edge settings, the histogram creates bars which have equal size, and equal outside-spacing on the non-log scale.
To demonstrate, here's a zoomed-in version of the plot in the question, but in non-log scale:
Notice how the first two bars are centered around (0 + 0.1) / 2 = 0.05, with a gap of 0.1 / 10 = 0.01 at either edge, while the next two bars are centered around (0.1 + 1.0) / 2 = 0.55, with a gap of 0.9 / 10 = 0.09 at either edge.
When converting things to log scale, bar widths and edge widths all go for a huge toss. This is compounded further by the fact that you have a linear scale from 0 to 0.1, after which things become log-scale.
I know no way of fixing this, other than to do everything manually. I've used the geometric means of the bin edges in order to compute what the bar edges and bar widths should be. Note that this piece of code will work only for two datasets. If you have more datasets, you'll need some function that fills in the bin edges with a geometric series appropriately (a sketch of such a function follows this answer).
import numpy as np
import matplotlib.pyplot as plt

def geometric_means(a):
    """Return pairwise geometric means of adjacent elements."""
    return np.sqrt(a[1:] * a[:-1])

a = [0.05, 0.1, 0.2, 1, 2, 3] * 2
b = [0.05, 0.1, 0.2, 1, 2, 3] * 3

# Find frequencies
bins = np.array([0, 0.1, 1, 10])
a_hist = np.histogram(a, bins=bins)[0]
b_hist = np.histogram(b, bins=bins)[0]

# Find log-scale mid-points for bar edges
# (0.05 is the linear midpoint of the first bin, which lies in the symlog
# linear region)
mid_vals = np.hstack((np.array([0.05]), geometric_means(bins[1:])))

# Compute bar left-edges and bar widths
a_x = bins[:-1]
a_widths = mid_vals - bins[:-1]
b_x = mid_vals
b_widths = bins[1:] - mid_vals

# align='edge' so that a_x/b_x are interpreted as left edges
plt.bar(a_x, a_hist, width=a_widths, color='b', align='edge')
plt.bar(b_x, b_hist, width=b_widths, color='g', align='edge')

plt.gca().set_xscale("symlog", linthreshx=0.1)
plt.show()
And the final result:
Sorry, but the neat gaps between the bars get killed. Again, this can be fixed by doing the appropriate geometric interpolation, so that everything is linear on log-scale.
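As a rough sketch of the generalization mentioned above (a hypothetical helper, not part of the original answer): split every bin into one sub-bar per dataset, spacing the sub-bar edges linearly below the symlog threshold and geometrically above it, so that all bars appear equally wide on the symlog axis.
import numpy as np
import matplotlib.pyplot as plt

def symlog_bar_edges(bins, n_series, linthresh=0.1):
    """For each bin, return (left_edges, widths) of n_series sub-bars that
    look equally wide on a symlog axis: linear spacing below linthresh,
    geometric spacing above it."""
    lefts, widths = [], []
    for lo, hi in zip(bins[:-1], bins[1:]):
        if hi <= linthresh:                      # linear region of the axis
            edges = np.linspace(lo, hi, n_series + 1)
        else:                                    # logarithmic region
            edges = lo * (hi / lo) ** (np.arange(n_series + 1) / n_series)
        lefts.append(edges[:-1])
        widths.append(np.diff(edges))
    return np.array(lefts), np.array(widths)

d = [0.05, 0.1, 0.2, 1, 2, 3]
datasets = (d * 2, d * 3, d * 4)
bins = np.array([0, 0.1, 1, 10])

lefts, widths = symlog_bar_edges(bins, len(datasets))
for i, (data, color) in enumerate(zip(datasets, "brg")):
    heights = np.histogram(data, bins=bins)[0]
    plt.bar(lefts[:, i], heights, width=widths[:, i], color=color,
            align='edge', label=str(i))
plt.gca().set_xscale("symlog", linthreshx=0.1)  # newer matplotlib: linthresh=0.1
plt.legend(loc='best')
plt.show()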
Just in case someone stumbles upon this problem:
This solution looks much more like the way it should be done:
plotting a histogram on a Log scale with Matplotlib