plotting 2 dictionaries in matplotlib

I have 2 dictionaries: dict1 = {'Beef':10, 'Poultry': 13, 'Pork': 14, 'Lamb': 11} and dict2 = {'Beef':3, 'Poultry': 1, 'Pork': 17, 'Lamb': 16}
I want to plot a double bar chart using the dictionary keys as the x-axis values, and the associated values on the y-axis. I am using matplotlib for this. Does anyone have any information?

The bar chart examples in the matplotlib documentation may be what you are looking for. To plot your data, the x and y values need to be extracted from the dicts, for example via dict.keys() and dict.values().
import matplotlib.pyplot as plt
import numpy as np
dict1 = {'Beef':10, 'Poultry': 13, 'Pork': 14, 'Lamb': 11}
dict2 = {'Beef':3, 'Poultry': 1, 'Pork': 17, 'Lamb': 16}
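x = list(dict1.keys())          # the categories, in dict1's order
y1 = list(dict1.values())
y2 = [dict2[k] for k in x]      # look values up by key so both series follow x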
N = len(x)
fig, ax = plt.subplots()
ind = np.arange(N) # the x locations for the groups
width = 0.35 # the width of the bars
p1 = ax.bar(ind, y1, width)
p2 = ax.bar(ind + width, y2, width)
ax.set_xticks(ind + width / 2)  # centre each tick between its pair of bars
ax.set_xticklabels(x)
ax.legend((p1[0], p2[0]), ('dict1', 'dict2'))
plt.show()
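If pandas is available, an alternative not used in the answer above is to let pandas do the alignment: building a DataFrame from the two dicts matches the values by key (so the key order inside each dict does not matter), and DataFrame.plot.bar draws one group of bars per key. A minimal sketch, assuming pandas is installed; the column names 'dict1' and 'dict2' are just what ends up in the legend.

```python
import matplotlib.pyplot as plt
import pandas as pd

dict1 = {'Beef': 10, 'Poultry': 13, 'Pork': 14, 'Lamb': 11}
dict2 = {'Beef': 3, 'Poultry': 1, 'Pork': 17, 'Lamb': 16}

# Each dict becomes one column; rows are matched on the dict keys.
df = pd.DataFrame({'dict1': dict1, 'dict2': dict2})

# One group of bars per row (key), one bar per column (dict).
# rot=0 keeps the category labels horizontal.
ax = df.plot.bar(rot=0)
ax.set_ylabel('value')
plt.tight_layout()
plt.show()
```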

I'd like to propose a more general approach: instead of just two dicts, what happens if we have a list of dictionaries?
In [89]: from random import randint, seed, shuffle
...: seed(20201213)
...: cats = 'a b c d e f g h i'.split() # categories
...: # List Of Dictionaries
...: lod = [{k:randint(5, 15) for k in shuffle(cats) or cats[:-2]} for _ in range(5)]
...: lod
Out[89]:
[{'d': 14, 'h': 10, 'i': 13, 'f': 13, 'c': 5, 'b': 5, 'a': 14},
{'h': 12, 'd': 5, 'c': 5, 'i': 11, 'b': 14, 'g': 8, 'e': 13},
{'d': 8, 'a': 12, 'f': 7, 'h': 10, 'g': 10, 'c': 11, 'i': 12},
{'g': 11, 'f': 8, 'i': 14, 'h': 11, 'a': 5, 'c': 7, 'b': 8},
{'e': 11, 'h': 13, 'c': 5, 'i': 8, 'd': 12, 'a': 11, 'g': 11}]
As you can see, the keys are not ordered in the same way and the dictionaries do not contain all the possible keys...
Our first step is to find the list of keys (lok) with a set comprehension and then sort it (yes, we already know the keys, but here we are looking for a general solution…)
In [90]: lok = sorted({k for d in lod for k in d})
The lengths of the two lists are
In [91]: nk, nd = len(lok), len(lod)
At this point we can compute the width of a single bar: taking the bar groups to be 1 unit apart (hence x = range(nk)) and leaving about 1/3 of a unit between groups, we get
In [92]: x, w = range(nk), 0.67/nd
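Where should each group's tick go? With matplotlib's default align='center', bar n of a group whose first bar sits at ξ is centred at ξ + n*w, so the middle of the group lies halfway between the first and the last bar centre, i.e. at ξ + w*(nd-1)/2; ξ + w*nd/2 would be half a bar too far to the right. A quick numeric check with the values used here (nd = 5 dictionaries, w = 0.67/nd); xi plays the role of ξ for the first group.

```python
nd = 5           # number of dictionaries, i.e. bars per group
w = 0.67 / nd    # width of a single bar
xi = 0           # x position of the first group (x = range(nk) starts at 0)

# Bar n of the group is centred at xi + n*w (matplotlib's default align='center').
centres = [xi + n * w for n in range(nd)]

# The group's midpoint lies halfway between the first and the last bar centre.
group_mid = (centres[0] + centres[-1]) / 2
print(round(group_mid, 3))               # 0.268
print(round(xi + w * (nd - 1) / 2, 3))   # 0.268, the same value
```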
We are ready to plot:
In [93]: import matplotlib.pyplot as plt
...: for n, d in enumerate(lod):
...:     plt.bar([ξ+n*w for ξ in x], [d.get(k, 0) for k in lok], w,
...:             label='dict %d'%(n+1))
...: plt.xticks([ξ+w*(nd-1)/2 for ξ in x], lok)
...: plt.legend();
Let's wrap this in a small function:
def plot_lod(lod, ws=0.33, ax=None, legend=True):
    """bar plot from the values in a list of dictionaries.
    lod: list of dictionaries,
    ws: optional, white space between groups of bars as a fraction of unity,
    ax: optional, the Axes object to draw into,
    legend: are we going to draw a legend?
    Return: the Axes used to plot and a list of BarContainer objects."""
    from matplotlib.pyplot import subplot
    from numpy import arange, nan
    if ax is None : ax = subplot()
    lok = sorted({k for d in lod for k in d})
    nk, nd = len(lok), len(lod)
    x, w = arange(nk), (1.0-ws)/nd
    lobars = [
        ax.bar(x+n*w, [d.get(k, nan) for k in lok], w, label='%02d'%(n+1))
        for n, d in enumerate(lod)
    ]
    ax.set_xticks(x+w*nd/2-w/2)
    ax.set_xticklabels(lok)
    if legend : ax.legend()
    return ax, lobars
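Using the data of the previous example, we get a slightly different graph: keys missing from a dictionary are now plotted as nan, so no bar is drawn for them (the first version used 0), and the legend labels are zero-padded numbers.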
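A usage sketch, with plot_lod reproduced from above (re-indented and commented) so the snippet runs on its own. It draws the random example into one Axes, and only the first two dictionaries, with wider gaps and no legend, into a second one. Passing ax= is what lets the function draw into subplots you created yourself; the two-panel layout is just for illustration.

```python
from random import randint, seed, shuffle

import matplotlib.pyplot as plt
from numpy import arange, nan


def plot_lod(lod, ws=0.33, ax=None, legend=True):
    """Bar plot from the values in a list of dictionaries (same logic as above)."""
    if ax is None:
        ax = plt.subplot()
    lok = sorted({k for d in lod for k in d})    # union of all keys, sorted
    nk, nd = len(lok), len(lod)
    x, w = arange(nk), (1.0 - ws) / nd           # group positions, bar width
    lobars = [
        # keys missing from d become nan, so no bar is drawn for them
        ax.bar(x + n * w, [d.get(k, nan) for k in lok], w, label='%02d' % (n + 1))
        for n, d in enumerate(lod)
    ]
    ax.set_xticks(x + w * nd / 2 - w / 2)        # centre of each group
    ax.set_xticklabels(lok)
    if legend:
        ax.legend()
    return ax, lobars


seed(20201213)
cats = 'a b c d e f g h i'.split()
lod = [{k: randint(5, 15) for k in shuffle(cats) or cats[:-2]} for _ in range(5)]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))
ax1, bars1 = plot_lod(lod, ax=ax1)
ax2, bars2 = plot_lod(lod[:2], ws=0.5, ax=ax2, legend=False)
plt.tight_layout()
plt.show()
```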

Related

How to plot the relation between an array's columns and rows mean value

I'm a newcomer to Pandas and Matplotlib, trying to plot a relation between the mean value of my array's rows and columns. The result I'm looking for is something like this:
"linhas" refers to the rows and "colunas" refers to the columns. The Y label refers to the means and the X label refers to the number of columns in my array
I came up with some solutions, as shown below:
print(arr)
df = pd.DataFrame(arr)
display(df)
num_cols = [df.shape[1]]
print(type(num_cols))
print(num_cols)
cols = df.count(axis=1)
lcols = cols.tolist()
print(type(lcols))
col_mean = df[:].mean(axis=0)
print(type(col_mean))
col_mean.tolist()
row_mean = df[:].mean(axis=1)
print(type(row_mean))
row_mean.tolist()
print(type(row_mean))
print(row_mean)
dados = pd.DataFrame({
'Colunas': col_mean,
'Linhas': row_mean
}, index=lcols)
lines = dados.plot.line()
Unfortunately, my output is totally wrong.
Any help would be deeply appreciated, as I'm a bit lost right now.
Thanks in advance!
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# just a dummy array
arr = np.array([[37, 68, 1, 19, 6],
[ 0, 14, 32, 73, 53],
[37, 85, 67, 30, 91],
[42, 52, 6, 42, 85],
[82, 26, 44, 38, 48],
[54, 55, 23, 46, 78]])
n_rows, n_cols = arr.shape
df = pd.DataFrame(arr)
col_mean = df.mean(axis=0)
row_mean = df.mean(axis=1)
plt.plot(range(1, n_rows+1), row_mean, marker='^', c='orange', label='rows')
plt.plot(range(1, n_cols+1), col_mean, marker='o', c='blue', label='cols')
plt.xlabel('Label x axis')
plt.ylabel('Label y axis')
plt.title('Title plotting')
plt.legend()
plt.show()
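Why the question's attempt goes wrong, as far as I can tell: col_mean has one value per column (5) and row_mean one per row (6), and the DataFrame is built with index=lcols, which holds the per-row counts (every entry equals the number of columns), so pandas aligns both Series to that repeated label instead of to positions. Plotting each mean against its own 1-based position, as above, avoids that. The same means can also be computed with NumPy alone; a sketch on the same dummy array, with placeholder axis labels:

```python
import matplotlib.pyplot as plt
import numpy as np

arr = np.array([[37, 68, 1, 19, 6],
                [ 0, 14, 32, 73, 53],
                [37, 85, 67, 30, 91],
                [42, 52, 6, 42, 85],
                [82, 26, 44, 38, 48],
                [54, 55, 23, 46, 78]])

row_mean = arr.mean(axis=1)   # one value per row    -> length 6
col_mean = arr.mean(axis=0)   # one value per column -> length 5

fig, ax = plt.subplots()
# Each series gets its own x positions because the lengths differ.
ax.plot(np.arange(1, row_mean.size + 1), row_mean, marker='^', label='rows')
ax.plot(np.arange(1, col_mean.size + 1), col_mean, marker='o', label='cols')
ax.set_xlabel('row / column number')
ax.set_ylabel('mean')
ax.legend()
plt.show()
```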

Cubic spline interpolation produces straight lines

I would like to obtain a smooth curve going through specific points with integer coordinates. Instead, I get straight line segments between the points. I tried interp1d(x,y,kind='cubic') and also CubicSpline, but nothing works. Here is my code:
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d,CubicSpline
x = np.arange(34)
y = [8,3,0,1,6,2,1,7,6,2,0,2,6,0,1,6,2,2,0,2,7,0,2,8,6,3,6,2,0,1,6,2,7,2]
f = CubicSpline(x, y)
plt.figure(figsize=(10,3))
plt.plot(x, y, 'o', x, f(x))
plt.show()
and here is the result:
Can you tell me how to get smooth curves instead?
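Currently you are evaluating the spline only at the original x-values, so plt.plot simply connects those points with straight segments. To draw a smooth curve, you need a new array with many more intermediate x-values. NumPy's np.linspace() creates such an array between a given minimum and maximum.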
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import CubicSpline
y = [8, 3, 0, 1, 6, 2, 1, 7, 6, 2, 0, 2, 6, 0, 1, 6, 2, 2, 0, 2, 7, 0, 2, 8, 6, 3, 6, 2, 0, 1, 6, 2, 7, 2]
x = np.arange(len(y))
f = CubicSpline(x, y)
plt.figure(figsize=(10, 3))
xs = np.linspace(x.min(), x.max(), 500)
plt.plot(x, y, 'o', xs, f(xs))
plt.tight_layout()
plt.show()
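The same fix applies to interp1d(..., kind='cubic') from the question: both interpolators pass through the data points, and both only look smooth when evaluated on a dense grid. A sketch that draws the two side by side for comparison:

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import CubicSpline, interp1d

y = [8, 3, 0, 1, 6, 2, 1, 7, 6, 2, 0, 2, 6, 0, 1, 6, 2, 2, 0, 2, 7, 0, 2, 8, 6, 3, 6, 2, 0, 1, 6, 2, 7, 2]
x = np.arange(len(y))

cs = CubicSpline(x, y)
f1 = interp1d(x, y, kind='cubic')

# Dense grid between the first and the last data point (endpoints included).
xs = np.linspace(x.min(), x.max(), 500)
ys_cs = cs(xs)
ys_i1 = f1(xs)

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(x, y, 'o', label='data')
ax.plot(xs, ys_cs, label='CubicSpline')
ax.plot(xs, ys_i1, '--', label="interp1d(kind='cubic')")
ax.legend()
plt.tight_layout()
plt.show()
```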

Coloring minimum bars in seaborn FacetGrid barplot

Is there an easy way to automatically color (or mark in any way) the minimum/maximum bars for each plot of a FacetGrid?
For example, how to mark the minimal Z value on each one of the following 16 plots?
df = pd.DataFrame({'A':[10, 20, 30, 40]*4, 'Y':[1,2,3,4]*4, 'W':range(16), 'Z':range(16)})
g = sns.FacetGrid(df, row="A", col="Y", sharey=False)
g.map(sns.barplot, "W", "Z")
plt.show()
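In this example A and Y are always paired (A=10 with Y=1, A=20 with Y=2, and so on), so only the diagonal facets contain bars; the others are empty. The following approach loops through the diagonal axes, finds the minimum bar height in each one and then colors the bars with that height: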
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.DataFrame({'A': [10, 20, 30, 40] * 4, 'Y': [1, 2, 3, 4] * 4, 'W': range(16), 'Z': range(16)})
g = sns.FacetGrid(df, row="A", col="Y", sharey=False)
g.map(sns.barplot, "W", "Z")
for i in range(len(g.axes)):
    ax = g.axes[i, i]
    min_height = min([p.get_height() for p in ax.patches])
    for p in ax.patches:
        if p.get_height() == min_height:
            p.set_color('red')
plt.tight_layout()
plt.show()
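The diagonal loop relies on the layout of this particular example data, and min() would raise a ValueError on an empty facet. A more general sketch visits every Axes and skips those without bars. The helper name highlight_min_bars is made up for illustration, and it assumes every patch on an Axes is a bar (true for a plain barplot, not necessarily if you add other patches yourself). It is demonstrated on plain matplotlib subplots so it runs without seaborn; with the FacetGrid above you would call it as: for ax in g.axes.flat: highlight_min_bars(ax)

```python
import matplotlib.pyplot as plt


def highlight_min_bars(ax, color='red'):
    """Hypothetical helper: color the shortest bar(s) of an Axes.

    Returns how many bars were recolored; Axes without any patches
    (e.g. empty facets) are skipped instead of making min() fail.
    """
    heights = [p.get_height() for p in ax.patches]
    if not heights:
        return 0
    min_height = min(heights)
    n = 0
    for p in ax.patches:
        if p.get_height() == min_height:
            p.set_color(color)
            n += 1
    return n


# Stand-alone demo: a 2x2 grid where only two panels contain bars.
fig, axes = plt.subplots(2, 2)
axes[0, 0].bar(range(4), [3, 1, 4, 1])   # two bars share the minimum
axes[1, 1].bar(range(3), [5, 9, 2])
counts = [highlight_min_bars(ax) for ax in axes.flat]
print(counts)   # [2, 0, 0, 1]
plt.show()
```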

Managing high dimensions in Numpy

I want to write a function of 4 variables: f(x1,x2,x3,x4), each in a different dimension.
This can be achieved by f(x1,x2[newaxis],x3[newaxis,newaxis],x4[newaxis,newaxis,newaxis]).
Do you know a smarter way?
You're looking for np.ix_ [1]:
f(*np.ix_(x1, x2, x3, x4))
For example:
>>> np.ix_([1, 2, 3], [4, 5])
(array([[1],
        [2],
        [3]]), array([[4, 5]]))
[1] Or equivalently, np.meshgrid(..., sparse=True, indexing='ij')
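A small check of both variants with four arrays of different lengths; f is a made-up element-wise function standing in for the real one. Each returned array varies along its own axis only, so f broadcasts them to the full 4-D grid, and the np.ix_ and sparse np.meshgrid results agree.

```python
import numpy as np

x1 = np.array([1, 2, 3])
x2 = np.array([4, 5])
x3 = np.array([6, 7, 8, 9])
x4 = np.array([10, 11])


def f(a, b, c, d):
    # hypothetical element-wise function of four variables
    return a + 10 * b + 100 * c + 1000 * d


open_grid = np.ix_(x1, x2, x3, x4)
sparse_grid = np.meshgrid(x1, x2, x3, x4, sparse=True, indexing='ij')

print([g.shape for g in open_grid])
# [(3, 1, 1, 1), (1, 2, 1, 1), (1, 1, 4, 1), (1, 1, 1, 2)]
result = f(*open_grid)
print(result.shape)   # (3, 2, 4, 2)
```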
One way would be to reshape each array, giving it the appropriate number of singleton dimensions along the leading axes. To do this across all arrays, we could use a list comprehension.
Thus, one way to handle a generic number of input arrays would be:
L = [x1,x2,x3,x4]
out = [l.reshape([1]*i + [len(l)]) for i,l in enumerate(L)]
Sample run -
In [186]: # Initialize input arrays
...: x1 = np.random.randint(0,9,(4))
...: x2 = np.random.randint(0,9,(2))
...: x3 = np.random.randint(0,9,(5))
...: x4 = np.random.randint(0,9,(3))
...:
In [187]: A = x1,x2[None],x3[None,None],x4[None,None,None]
In [188]: L = [x1,x2,x3,x4]
...: out = [l.reshape([1]*i + [len(l)]) for i,l in enumerate(L)]
...:
In [189]: A
Out[189]:
(array([2, 1, 1, 1]),
array([[8, 2]]),
array([[[0, 3, 5, 8, 7]]]),
array([[[[6, 7, 0]]]]))
In [190]: out
Out[190]:
[array([2, 1, 1, 1]),
array([[8, 2]]),
array([[[0, 3, 5, 8, 7]]]),
array([[[[6, 7, 0]]]])]
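One caveat, as far as I can tell: the reshape above mirrors the newaxis pattern from the question, which keeps every array's values on the last axis (shapes (4,), (1, 2), (1, 1, 5) and (1, 1, 1, 3) in the run above). Because broadcasting aligns trailing axes, the lengths 4, 2, 5 and 3 all collide there, so these arrays cannot actually be combined into a 4-D grid. For each array to get its own dimension, as np.ix_ does, array i needs its length at axis i with singleton axes on both sides. A sketch of that generic version; open_mesh is a hypothetical name.

```python
import numpy as np


def open_mesh(*arrays):
    """Hypothetical helper: reshape 1-D arrays so array i varies along axis i only."""
    k = len(arrays)
    out = []
    for i, a in enumerate(arrays):
        a = np.asarray(a)
        # singleton axes before and after position i, the array's length at i
        out.append(a.reshape([1] * i + [a.size] + [1] * (k - i - 1)))
    return out


x1, x2, x3, x4 = np.arange(4), np.arange(2), np.arange(5), np.arange(3)
grid = open_mesh(x1, x2, x3, x4)
print([g.shape for g in grid])
# [(4, 1, 1, 1), (1, 2, 1, 1), (1, 1, 5, 1), (1, 1, 1, 3)]
print((grid[0] + grid[1] + grid[2] + grid[3]).shape)   # (4, 2, 5, 3)
```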

connecting all numpy array plot points to each other using plt.plot() from matplotlib

I have a numpy array with xy co-ordinates for points. I have plotted each of these points and want a line connecting each point to every other point (a complete graph). The array is a 2x50 structure so I have transposed it and used a view to let me iterate through the rows. However, I am getting an 'index out of bounds' error with the following:
plt.plot(*zip(*v.T)) #to plot all the points
viewVX = (v[0]).T
viewVY = (v[1]).T
for i in range(0, 49):
    xPoints = viewVX[i], viewVX[i+1]
    print("xPoints is", xPoints)
    yPoints = viewVY[i+2], viewVY[i+3]
    print("yPoints is", yPoints)
    xy = xPoints, yPoints
    plt.plot(*zip(*xy), ls ='-')
I was hoping that the indexing would 'wrap-around' so that for the ypoints, it'd start with y0, y1 etc. Is there an easier way to accomplish what I'm trying to achieve?
import matplotlib.pyplot as plt
import numpy as np
import itertools
v=np.random.random((2,50))
plt.plot(
*zip(*itertools.chain.from_iterable(itertools.combinations(v.T,2))),
marker='o', markerfacecolor='red')
plt.show()
The advantage of doing it this way is that plt.plot is called only once. This should be significantly faster than methods that make O(N**2) calls to plt.plot.
Note also that you do not need to plot the points separately. Instead, you can use the marker='o' parameter.
Explanation: I think the easiest way to understand this code is to see how it operates on a simple v:
In [4]: import numpy as np
In [5]: import itertools
In [7]: v=np.arange(8).reshape(2,4)
In [8]: v
Out[8]:
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
itertools.combinations(...,2) generates all possible pairs of points:
In [10]: list(itertools.combinations(v.T,2))
Out[10]:
[(array([0, 4]), array([1, 5])),
(array([0, 4]), array([2, 6])),
(array([0, 4]), array([3, 7])),
(array([1, 5]), array([2, 6])),
(array([1, 5]), array([3, 7])),
(array([2, 6]), array([3, 7]))]
Now we use itertools.chain.from_iterable to convert this list of pairs of points into a (flattened) list of points:
In [11]: list(itertools.chain.from_iterable(itertools.combinations(v.T,2)))
Out[11]:
[array([0, 4]),
array([1, 5]),
array([0, 4]),
array([2, 6]),
array([0, 4]),
array([3, 7]),
array([1, 5]),
array([2, 6]),
array([1, 5]),
array([3, 7]),
array([2, 6]),
array([3, 7])]
If we plot these points one after another, connected by lines, we get our complete graph. The only problem is that plt.plot(x,y) expects x to be a sequence of x-values, and y to be a sequence of y-values.
We can use zip to convert the list of points into a list of x-values and y-values:
In [12]: list(zip(*itertools.chain.from_iterable(itertools.combinations(v.T,2))))
Out[12]: [(0, 1, 0, 2, 0, 3, 1, 2, 1, 3, 2, 3), (4, 5, 4, 6, 4, 7, 5, 6, 5, 7, 6, 7)]
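The splat operator (*) unpacks a sequence into separate positional arguments: zip(*points) groups all the x-values and all the y-values together, and plt.plot(*...) passes those two tuples as x and y.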
Thus we've managed to massage the data into the right form to be fed to plt.plot.
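A variant not used in this answer: matplotlib's LineCollection takes a list of separate segments, so each pair of points becomes exactly one segment and nothing depends on the order in which points are chained. A sketch using random data shaped like the question's 2x50 array; the seeded generator is only there to make the run reproducible.

```python
import itertools

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

rng = np.random.default_rng(0)
v = rng.random((2, 50))   # row 0: x, row 1: y, as in the question
pts = v.T                 # shape (50, 2): one (x, y) point per row

# One segment per unordered pair of points -> shape (n_pairs, 2, 2).
segments = np.array([(pts[i], pts[j])
                     for i, j in itertools.combinations(range(len(pts)), 2)])

fig, ax = plt.subplots()
lc = LineCollection(segments, linewidths=0.5)
ax.add_collection(lc)
ax.plot(pts[:, 0], pts[:, 1], 'o', markerfacecolor='red')
ax.autoscale_view()       # make sure the view includes the collection
plt.show()
```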
With a 2 by 50 array,
for i in range(0, 49):
    xPoints = viewVX[i], viewVX[i+1]
    print("xPoints is", xPoints)
    yPoints = viewVY[i+2], viewVY[i+3]
would go out of bounds for i = 47 and i = 48, since you use i+2 and i+3 as indices into viewVY.
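On the wrap-around the question hoped for: plain NumPy indexing never wraps past the end (negative indices count back from the end, but i+3 = 50 on a length-50 array raises an IndexError). If wrap-around is what you want, np.take with mode='wrap' or the modulo operator gives it explicitly. This only fixes the indexing; it does not by itself connect every point to every other point. viewVY is replaced by a stand-in array here.

```python
import numpy as np

view_vy = np.arange(50) * 10     # stand-in for viewVY: 50 values, valid indices 0..49
idx = np.array([47, 48]) + 3     # i + 3 on the last two loop iterations -> 50, 51

wrapped = np.take(view_vy, idx, mode='wrap')   # out-of-range indices wrap around
same = view_vy[idx % view_vy.size]             # same result via the modulo operator
print(wrapped, same)   # [ 0 10] [ 0 10]
```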
This is what I came up with, but I hope someone comes up with something better.
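import matplotlib.pyplot as plt

def plot_complete(v):
    # draw a line between every pair of points (each pair is drawn twice)
    for x1, y1 in v.T:
        for x2, y2 in v.T:
            plt.plot([x1, x2], [y1, y2], 'b')
    # mark the points themselves
    plt.plot(v[0], v[1], 'sr')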
The 'b' makes the lines blue, and 'sr' marks the points with red squares.
Figured it out. I basically used the simplified plotting syntax provided by @Bago and took @Daniel's indexing tip into account. I just have to iterate over every pair of points and build the corresponding x pair and y pair to pass to plt.plot():
viewVX = (v[0]).T #this is if your matrix is 2x100 ie row [0] is x and row[1] is y
viewVY = (v[1]).T
for i in range(0, v.shape[1]): #v.shape[1] gives the number of columns
    for j in range(0, v.shape[1]):
        xPoints = viewVX[j], viewVX[i]
        yPoints = viewVY[j], viewVY[i]
        xy = [xPoints, yPoints] #tuple/array of xx, yy point
        #print("xy points are", xy)
        plt.plot(xy[0],xy[1], ls ='-')