How do I create a bar chart that starts and ends in a certain range - matplotlib

I created a computer model (just for fun) to predict soccer match result. I ran a computer simulation to predict how many points that a team will gain. I get a list of simulation result for each team.
I want to plot something like confidence interval, but using bar chart.
I considered the following option:
I considered using matplotlib's candlestick, but this is not Forex price.
I also considered using matplotlib's errorbar, especially since it turns out I can mashes graphbar + errorbar, but it's not really what I am aiming for. I am actually aiming for something like Nate Silver's 538 election prediction result.
Nate Silver's is too complex, he colored the distribution and vary the size of the percentage. I just want a simple bar chart that plots on a certain range.
I don't want to resort to plot bar stacking like shown here

Matplotlib's barh (or bar) is probably suitable for this:
import numpy as np
import matplotlib.pylab as pl
x_mean = np.array([1, 3, 6 ])
x_std = np.array([0.3, 1, 0.7])
y = np.array([0, 1, 2 ])
pl.figure()
pl.barh(y, width=2*x_std, left=x_mean-x_std)
The bars have a horizontal width of 2*x_std and start at x_mean-x_std, so the center denotes the mean value.
It's not very pretty (yet), but highly customizable:

Related

Where is the list of available built-in colormap names?

Question
Where in the matplotlib documentations lists the name of available built-in colormap names to set as the name argument in matplotlib.cm.get_cmap(name)?
Choosing Colormaps in Matplotlib says:
Matplotlib has a number of built-in colormaps accessible via matplotlib.cm.get_cmap.
matplotlib.cm.get_cmap says:
matplotlib.cm.get_cmap(name=None, lut=None)
Get a colormap instance, defaulting to rc values if name is None.
name: matplotlib.colors.Colormap or str or None, default: None
https://www.kite.com/python/docs/matplotlib.pyplot.colormaps shows multiple names.
autumn sequential linearly-increasing shades of red-orange-yellow
bone sequential increasing black-white color map with a tinge of blue, to emulate X-ray film
cool linearly-decreasing shades of cyan-magenta
copper sequential increasing shades of black-copper
flag repetitive red-white-blue-black pattern (not cyclic at endpoints)
gray sequential linearly-increasing black-to-white grayscale
hot sequential black-red-yellow-white, to emulate blackbody radiation from an object at increasing temperatures
hsv cyclic red-yellow-green-cyan-blue-magenta-red, formed by changing the hue component in the HSV color space
inferno perceptually uniform shades of black-red-yellow
jet a spectral map with dark endpoints, blue-cyan-yellow-red; based on a fluid-jet simulation by NCSA [1]
magma perceptually uniform shades of black-red-white
pink sequential increasing pastel black-pink-white, meant for sepia tone colorization of photographs
plasma perceptually uniform shades of blue-red-yellow
prism repetitive red-yellow-green-blue-purple-...-green pattern (not cyclic at endpoints)
spring linearly-increasing shades of magenta-yellow
summer sequential linearly-increasing shades of green-yellow
viridis perceptually uniform shades of blue-green-yellow
winter linearly-increasing shades of blue-green
However, simply google 'matplotlib colormap names' seems not hitting the right documentation. I suppose there is a page listing the names as a enumeration or constant strings. Please help find it out.
There is some example code in the documentation (thanks to #Patrick Fitzgerald for posting the link in the comments, because it's not half as easy to find as it should be) which demonstrates how to generate a plot with an overview of the installed colormaps.
However, this uses an explicit list of maps, so it's limited to the specific version of matplotlib for which the documentation was written, as maps are added and removed between versions. To see what exactly your environment has, you can use this (somewhat crudely) adapted version of the code:
import numpy as np
import matplotlib.pyplot as plt
gradient = np.linspace(0, 1, 256)
gradient = np.vstack((gradient, gradient))
def plot_color_gradients(cmap_category, cmap_list):
# Create figure and adjust figure height to number of colormaps
nrows = len(cmap_list)
figh = 0.35 + 0.15 + (nrows + (nrows - 1) * 0.1) * 0.22
fig, axs = plt.subplots(nrows=nrows + 1, figsize=(6.4, figh))
fig.subplots_adjust(top=1 - 0.35 / figh, bottom=0.15 / figh,
left=0.2, right=0.99)
axs[0].set_title(cmap_category + ' colormaps', fontsize=14)
for ax, name in zip(axs, cmap_list):
ax.imshow(gradient, aspect='auto', cmap=plt.get_cmap(name))
ax.text(-0.01, 0.5, name, va='center', ha='right', fontsize=10,
transform=ax.transAxes)
# Turn off *all* ticks & spines, not just the ones with colormaps.
for ax in axs:
ax.set_axis_off()
cmaps = [name for name in plt.colormaps() if not name.endswith('_r')]
plot_color_gradients('all', cmaps)
plt.show()
This plots just all of them, without regarding the categories.
Since plt.colormaps() produces a list of all the map names, this version only removes all the names ending in '_r', (because those are the inverted versions of the other ones), and plots them all.
That's still a fairly long list, but you can have a look and then manually update/remove items from cmaps narrow it down to the ones you would consider for a given task.
You can also automatically reduce the list to monochrome/non-monochrome maps, because they provide that properties as an attribute:
cmaps_mono = [name for name in cmaps if plt.get_cmap(name).is_gray()]
cmaps_color = [name for name in cmaps if not plt.get_cmap(name).is_gray()]
That should at least give you a decent starting point.
It'd be nice if there was some way within matplotlib to select just certain types of maps (categorical, perceptually uniform, suitable for colourblind viewers ...), but I haven't found a way to do that automatically.
You can use my CMasher to make simple colormap overviews of a list of colormaps.
In your case, if you want to see what every colormap in MPL looks like, you can use the following:
import cmasher as cmr
import matplotlib.pyplot as plt
cmr.create_cmap_overview(plt.colormaps(), savefig='MPL_cmaps.png')
This will give you an overview with all colormaps that are registered in MPL, which will be all built-in colormaps and all colormaps my CMasher package adds, like shown below:

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots in order to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image right) however in some the x axis collapses slightly (see image left) or strongly (see image middle)
https://i.imgur.com/dxLR4B4.png
Looking into the code how seaborn plots a box/violin plot https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961)
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many nans
Is there any possibility to prevent this collapsing by code only
I do not want to modify my dataframe (replace the nans by zero)
Below you find my code:
boxp_df=pd.read_csv(pf_in,sep="\t",skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)
The output is a per cancer type differently sized plot
(depending on if there is any category completely nan)
I am expecting each plot to be in the same width.
Update
trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps ?
|Cat1|Cat2|Cat3|Cat4|Cat5
|3.93| |0.52| |6.01
|3.34| |0.89| |2.89
|3.39| |1.96| |4.63
|1.59| |3.66| |3.75
|2.73| |0.39| |2.87
|0.08| |1.25| |-0.27
Update
Apparently, the problem is not the data but the length of the title
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question
#Diziet should I delete it or does my issue might help other ones?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
for instance:
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])

Visualizing Data, Tracking Specific SD Values

BLUF: I want to track a specific Std Dev, e.g. 1.0 to 1.25, by color coding it and making a separate KDF or other probability density graph.
What I want to do with this is be able to pick out other Std Dev ranges and get back new graphs that I can turn around and use to predict outcomes in that specific Std Dev.
Data: https://www.dropbox.com/s/y78pynq9onyw9iu/Data.csv?dl=0
What I have so far is normalized data that looks like a shotgun blast:
Code used to produce it:
data = pd.read_csv("Data.csv")
sns.jointplot(data.x,data.y, space=0.2, size=10, ratio=2, kind="reg");
What I want to achieve here looks like what I have marked up below:
I kind of know how to do this in RStudio using RidgePlot-type functions, but I'm at a loss here in Python, even while using Seaborn. Any/All help appreciated!
The following code might point you in the right directly, you can tweak the appearance of the plot as you please from there.
tips = sns.load_dataset("tips")
g = sns.jointplot(x="total_bill", y="tip", data=tips)
top_lim = 4
bottom_lim = 2
temp = tips.loc[(tips.tip>=bottom_lim)&(tips.tip<top_lim)]
g.ax_joint.axhline(top_lim, c='k', lw=2)
g.ax_joint.axhline(bottom_lim, c='k', lw=2)
# we have to create a secondary y-axis to the joint-plot, otherwise the
# kde might be very small compared to the scale of the original y-axis
ax_joint_2 = g.ax_joint.twinx()
sns.kdeplot(temp.total_bill, shade=True, color='red', ax=ax_joint_2, legend=False)
ax_joint_2.spines['right'].set_visible(False)
ax_joint_2.spines['top'].set_visible(False)
ax_joint_2.yaxis.set_visible(False)

how to shift x axis labesl on line plot?

I'm using pandas to work with a data set and am tring to use a simple line plot with error bars to show the end results. It's all working great except that the plot looks funny.
By default, it will put my 2 data groups at the far left and right of the plot, which obscures the error bar to the point that it's not useful (the error bars in this case are key to intpretation so I want them plainly visible).
Now, I fix that problem by setting xlim to open up some space on either end of the x axis so that the error bars are plainly visible, but then I have an offset from where the x labels are to where the actual x data is.
Here is a simplified example that shows the problem:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df6 = pd.DataFrame( [-0.07,0.08] , index = ['A','B'])
df6.plot(kind='line', linewidth=2, yerr = [ [0.1,0.1],[0.1,0.1 ] ], elinewidth=2,ecolor='green')
plt.xlim(-0.2,1.2) # Make some room at ends to see error bars
plt.show()
I tried to include a plot (image) showing the problem but I cannot post images yet, having just joined up and do not have anough points yet to post images.
What I want to know is: How do I shift these labels over one tick to the right?
Thanks in advance.
Well, it turns out I found a solution, which I will jsut post here in case anyone else has this same issue in the future.
Basically, it all seems to work better in the case of a line plot if you just specify both the labels and the ticks in the same place at the same time. At least that was helpful for me. It sort of forces you to keep the length of those two lists the same, which seems to make the assignment between ticks and labels more well behaved (simple 1:1 in this case).
So I coudl fix my problem by including something like this:
plt.xticks([0, 1], ['A','B'] )
right after the xlim statement in code from original question. Now the A and B align perfectly with the place where the data is plotted, not offset from it.
Using above solution it works, but is less good-looking since now the x grid is very coarse (this is purely and aesthetic consideration). I could fix that by using a different xtick statement like:
plt.xticks([-0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.0], ['','A','','','','','B',''])
This gives me nice looking grid and the data where I need it, but of course is very contrived-looking here. In the actual program I'd find a way to make that less clunky.
Hope that is of some help to fellow seekers....

Dotted line style from non-evenly distributed data

I'm new to Python and MatPlotlib.
This is my first posting to Stackoverflow - I've been unable to find the answer elsewhere and would be grateful for your help.
I'm using Windows XP, with Enthought Canopy v1.1.1 (32 bit).
I want to plot a dotted-style linear regression line through a scatter plot of data, where both x and y arrays contain random floating point data.
The dots in the resulting dotted line are not distributed evenly along the regression line, and are "smeared together" in the middle of the red line, making it look messy (see upper plot resulting from attached minimal example code).
This does not seem to occur if the items in the array of x values are evenly distributed (lower plot).
I'm therefore guessing that this is an issue with how MatplotLib renders dotted lines, or with how Canopy interfaces Python with Matplotlib.
Please could you tell me a workaround which will make the dots on the dotted line type appear evenly distributed; even if both x and y data are non-evenly distributed; whilst still using Canopy and Matplotlib?
(As a general point, I'm always keen to improve my coding skills - if any code in my example can be written more neatly or concisely, I'd be grateful for your expertise).
Many thanks in anticipation
Dave
(UK)
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
#generate data
x1=10 * np.random.random_sample((40))
x2=np.linspace(0,10,40)
y=5 * np.random.random_sample((40))
slope, intercept, r_value, p_value, std_err = stats.linregress(x1,y)
line = (slope*x1)+intercept
plt.figure(1)
plt.subplot(211)
plt.scatter(x1,y,color='blue', marker='o')
plt.plot(x1,line,'r:',label="Regression Line")
plt.legend(loc='upper right')
slope, intercept, r_value, p_value, std_err = stats.linregress(x2,y)
line = (slope*x2)+intercept
plt.subplot(212)
plt.scatter(x2,y,color='blue', marker='o')
plt.plot(x2,line,'r:',label="Regression Line")
plt.legend(loc='upper right')
plt.show()
Welcome to SO.
You have already identified the problem yourself, but seem a bit surprised that a random x-array results in the line be 'cluttered'. But you draw a dotted line repeatedly over the same location, so it seems like the normal behavior to me that it gets smeared at places where there are multiple dotted lines on top of each other.
If you don't want that, you can sort your array and use that to calculate the regression line and plot it. Since its a linear regression, just using the min and max values would also work.
x1_sorted = np.sort(x1)
line = (slope * x1_sorted) + intercept
or
x1_extremes = np.array([x1.min(),x1.max()])
line = (slope * x1_extremes) + intercept
The last should be faster if x1 becomes very large.
With regard to your last comment. In your example you use whats called the 'state-machine' environment for plotting. It means that specified commands are applied to the active figure and the active axes (subplots).
You can also consider the OO approach where you get figure and axes objects. This means you can access any figure or axes at any time, not just the active one. Its useful when passing an axes to a function for example.
In your example both would work equally well and it would be more a matter of taste.
A small example:
# create a figure with 2 subplots (2 rows, 1 column)
fig, axs = plt.subplots(2,1)
# plot in the first subplots
axs[0].scatter(x1,y,color='blue', marker='o')
axs[0].plot(x1,line,'r:',label="Regression Line")
# plot in the second
axs[1].plot()
etc...