plotting stacked bar graph - pandas

I want to plot a stacked bar graph using matplotlib and pandas. The code below plots the bar graph nicely. However, when I change a, b, c, d, e, f, g, h, etc. to January, February, ... it does not plot the same graph; it only plots the columns in alphabetical order. Is there any way to overcome this problem?
import pandas as pd
import matplotlib.pyplot as plt
years=["2016","2017","2018","2019","2020", "2021"]
dataavail={
"a":[20,0,0,0,10,21],
"b":[20,13,10,18,15,45],
"c":[20,20,10,15,18,78],
"d":[20,20,10,15,18,75],
"e":[20,20,10,15,18,78],
"f":[20,20,10,15,18,78],
"g":[20,20,10,15,18,78],
"h":[20,20,10,15,18,78],
"i":[20,20,10,15,18,78],
"j":[20,20,10,15,18,78],
"k":[20,20,10,15,18,78],
"l":[20,20,0,0,0,20],
}
df=pd.DataFrame(dataavail,index=years)
df.plot(kind="bar",stacked=True,figsize=(10,8))
plt.legend(loc="center",bbox_to_anchor=(0.8,1.0))
plt.show()
But when I change the keys a, b, c, ... to January, February, ... the plot is not the same.
import pandas as pd
import matplotlib.pyplot as plt
years=["2016","2017","2018","2019","2020", "2021"]
dataavail={
"january":[20,0,0,0,10,21],
"February":[20,13,10,18,15,45],
"March":[20,20,10,15,18,78],
"April":[20,20,10,15,18,75],
"may":[20,20,10,15,18,78],
"June":[20,20,10,15,18,78],
"July":[20,20,10,15,18,78],
"August":[20,20,10,15,18,78],
"September":[20,20,10,15,18,78],
"October":[20,20,10,15,18,78],
"November":[20,20,10,15,18,78],
"December":[20,20,0,0,0,20],
}
df=pd.DataFrame(dataavail,index=years)
df.plot(kind="bar",stacked=True,figsize=(10,8))
plt.legend(loc="center",bbox_to_anchor=(0.8,1.0))
plt.show()

With Python 3.9.7, both graphs look the same:
>>> df_alpha
       a   b   c   d   e   f   g   h   i   j   k   l
2016  20  20  20  20  20  20  20  20  20  20  20  20
2017   0  13  20  20  20  20  20  20  20  20  20  20
2018   0  10  10  10  10  10  10  10  10  10  10   0
2019   0  18  15  15  15  15  15  15  15  15  15   0
2020  10  15  18  18  18  18  18  18  18  18  18   0
2021  21  45  78  75  78  78  78  78  78  78  78  20
>>> df_month
      January  February  March  April  may  June  July  August  September  October  November  December
2016       20        20     20     20   20    20    20      20         20       20        20        20
2017        0        13     20     20   20    20    20      20         20       20        20        20
2018        0        10     10     10   10    10    10      10         10       10        10         0
2019        0        18     15     15   15    15    15      15         15       15        15         0
2020       10        15     18     18   18    18    18      18         18       18        18         0
2021       21        45     78     75   78    78    78      78         78       78        78        20
Full code:
import pandas as pd
import matplotlib.pyplot as plt
years = ['2016', '2017', '2018', '2019', '2020', '2021']
dataavail1 = {'a': [20, 0, 0, 0, 10, 21], 'b': [20, 13, 10, 18, 15, 45], 'c': [20, 20, 10, 15, 18, 78], 'd': [20, 20, 10, 15, 18, 75], 'e': [20, 20, 10, 15, 18, 78], 'f': [20, 20, 10, 15, 18, 78], 'g': [20, 20, 10, 15, 18, 78], 'h': [20, 20, 10, 15, 18, 78], 'i': [20, 20, 10, 15, 18, 78], 'j': [20, 20, 10, 15, 18, 78], 'k': [20, 20, 10, 15, 18, 78], 'l': [20, 20, 0, 0, 0, 20]}
dataavail2 = {'January': [20, 0, 0, 0, 10, 21], 'February': [20, 13, 10, 18, 15, 45], 'March': [20, 20, 10, 15, 18, 78], 'April': [20, 20, 10, 15, 18, 75], 'may': [20, 20, 10, 15, 18, 78], 'June': [20, 20, 10, 15, 18, 78], 'July': [20, 20, 10, 15, 18, 78], 'August': [20, 20, 10, 15, 18, 78], 'September': [20, 20, 10, 15, 18, 78], 'October': [20, 20, 10, 15, 18, 78], 'November': [20, 20, 10, 15, 18, 78], 'December': [20, 20, 0, 0, 0, 20]}
df_alpha = pd.DataFrame(dataavail1, index=years)
df_month = pd.DataFrame(dataavail2, index=years)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 8))
df_alpha.plot(kind='bar', stacked=True, colormap=plt.cm.tab20, ax=ax1, rot=0)
df_month.plot(kind='bar', stacked=True, colormap=plt.cm.tab20, ax=ax2, rot=0)
plt.show()
Update: the code also works with Python 3.7.12
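If the columns do come out in alphabetical order on your setup (this can happen with older Python/pandas combinations that did not preserve dict insertion order), one possible workaround is to select the columns in an explicit order before plotting. This is only a sketch: the small frame and the names `month_order` and `df_ordered` are made up for illustration.

```python
import pandas as pd

# Calendar order written out explicitly so the result does not depend on locale
calendar_order = ["January", "February", "March", "April", "May", "June",
                  "July", "August", "September", "October", "November", "December"]

# Hypothetical frame whose columns arrived sorted alphabetically
df = pd.DataFrame(
    {"April": [1, 2], "February": [3, 4], "January": [5, 6], "March": [7, 8]},
    index=["2016", "2017"],
)

# Keep only the months that are present, in calendar order
month_order = [m for m in calendar_order if m in df.columns]
df_ordered = df[month_order]

# df_ordered.plot(kind="bar", stacked=True) would then stack the months in this order
print(list(df_ordered.columns))
```

Note that the match is case-sensitive, so keys such as "may" would need to be spelled the same way as in the order list.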

Related

Create nested array for all unique indices in a pandas MultiIndex DataFrame

Generate dummy data:
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame({'subject': ['A'] * 10 + ['B'] * 10,
'trial': list(range(5)) * 4,
'value1': np.random.randint(0, 100, 20),
'value2': np.random.randint(0, 100, 20)
})
df = df.set_index(['subject', 'trial']).sort_index()
print(df)
               value1  value2
subject trial
A       0          51       1
        0          20      75
        1          92      63
        1          82      57
        2          14      59
        2          86      21
        3          71      20
        3          74      88
        4          60      32
        4          74      48
B       0          87      90
        0          52      79
        1          99      58
        1           1      14
        2          23      41
        2          87      61
        3           2      91
        3          29      61
        4          21      59
        4          37      46
Notice: Each subject / trial combination has multiple rows.
I want to create an array with the rows as nested dimensions.
My current (and, I find, ugly) data transformation goes through a list:
tmp = []
for idx in df.index.unique():
    tmp.append(df.loc[idx].to_numpy())
goal = np.array(tmp)
print(goal)
[[[51 1]
[20 75]]
...
[[21 59]
[37 46]]]
Can you show me a native pandas / numpy way to do it (without the list crutch)?
To produce a non-ragged numpy array, every index value must have the same number of duplicates, so you don't have to loop over them. Just compute that number and reshape:
n = len(df) / (~df.index.duplicated()).sum()  # rows per unique index value
assert n.is_integer()
# axes: unique index value, rows per index value, columns
out = df.to_numpy().reshape(-1, int(n), df.shape[1])
Output:
array([[[51, 1],
[20, 75]],
[[92, 63],
[82, 57]],
[[14, 59],
[86, 21]],
[[71, 20],
[74, 88]],
[[60, 32],
[74, 48]],
[[87, 90],
[52, 79]],
[[99, 58],
[ 1, 14]],
[[23, 41],
[87, 61]],
[[ 2, 91],
[29, 61]],
[[21, 59],
[37, 46]]])
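A small NumPy-only sketch of the reshape idea, assuming the rows are already sorted so that duplicates of one index value are contiguous. The sizes (`n_unique`, `n_dup`, `n_cols`) are invented so that the number of duplicates differs from the number of columns, which makes the axis order visible: groups first, then rows per group, then columns.

```python
import numpy as np

# Stand-in for df.to_numpy(): 4 unique index values, 3 rows each, 2 columns
n_unique, n_dup, n_cols = 4, 3, 2
flat = np.arange(n_unique * n_dup * n_cols).reshape(n_unique * n_dup, n_cols)

# Loop-based reference, equivalent to collecting one block per index value
reference = np.array([flat[i * n_dup:(i + 1) * n_dup] for i in range(n_unique)])

# Single reshape: (groups, rows per group, columns)
out = flat.reshape(-1, n_dup, n_cols)

print(out.shape)
print(np.array_equal(out, reference))
```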
You can use stack:
df.stack().values
Output:
array([[ 0, 25],
       [16, 11],
       [49, 87],
       [38, 77],
       [67,  6],
       [27, 27],
       [40,  0],
       [22, 81],
       [83, 89],
       [36, 55],
       [41,  1],
       [13, 74],
       [88, 61],
       [85, 73],
       [55, 66],
       [44, 82],
       [20, 30],
       [82, 69],
       [37, 71],
       [30, 16],
       [81, 96],
       [ 0, 56],
       [ 5, 99],
       [73, 86]], dtype=int64)

How to plot my data using Matplotlib with step size

Consider the following code and the graph obtained from it
import matplotlib.pyplot as plt
import numpy as np
fig,axs = plt.subplots(figsize=(10,10))
data1 = [5, 6, 18, 7, 19]
x_ax = [10, 20, 30, 40, 50]
y_ax = [0, 5, 10, 15, 20]
axs.plot(data1,marker="o")
axs.set_xticks(x_ax)
axs.set_xticklabels(labels=x_ax,rotation=45)
axs.set_yticks(y_ax)
axs.set_yticklabels(labels=y_ax,rotation=45)
axs.set_xlabel("X")
axs.set_ylabel("Y")
axs.set_title("Name")
I need to plot my data1 = [5, 6, 18, 7, 19] with a step size of 10: 5 at 10, 6 at 20, 18 at 30, 7 at 40 and 19 at 50. But the plot uses a step size of one.
How can I modify my code to do the required?
If you don't pass x values to plot, it automatically uses 0, 1, 2, ... as the x positions.
So in your case you need:
x = range(10, len(data1)*10+1, 10)
axs.plot(x, data1, marker="o")
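Put together with the question's setup, this might look like the sketch below. The `Agg` backend line and the `step` variable are my own additions so the snippet runs without a display; drop the backend line in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

data1 = [5, 6, 18, 7, 19]
step = 10
# One x position per data point: 10, 20, 30, 40, 50
x = np.arange(1, len(data1) + 1) * step

fig, axs = plt.subplots(figsize=(10, 10))
(line,) = axs.plot(x, data1, marker="o")  # explicit x values instead of 0, 1, 2, ...
axs.set_xticks(x)
axs.set_yticks([0, 5, 10, 15, 20])
axs.tick_params(axis="x", labelrotation=45)
axs.tick_params(axis="y", labelrotation=45)
axs.set_xlabel("X")
axs.set_ylabel("Y")
axs.set_title("Name")
```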

Getting positions from two given numpy arrays

I have two sets of given numbers, (100, 110) and (20, 30).
I want to get the numbers between them.
X = np.arange(100, 110)
Y = np.arange(20, 30)
print (X)
print (Y)
[100 101 102 103 104 105 106 107 108 109]
[20 21 22 23 24 25 26 27 28 29]
I want to get their positions as follows.
xy = np.array( [(x,y) for x in X for y in Y])
print (xy)
X_result = xy[:,0]
Y_result = xy[:,1]
The results are correct.
However, I wonder whether this could be done more directly and faster.
Expected results are same as shown by the prints of (X_result and Y_result).
print (X_result)
print (Y_result)
[100 100 100 100 100 100 100 100 100 100 101 101 101 101 101 101 101 101
101 101 102 102 102 102 102 102 102 102 102 102 103 103 103 103 103 103
103 103 103 103 104 104 104 104 104 104 104 104 104 104 105 105 105 105
105 105 105 105 105 105 106 106 106 106 106 106 106 106 106 106 107 107
107 107 107 107 107 107 107 107 108 108 108 108 108 108 108 108 108 108
109 109 109 109 109 109 109 109 109 109]
[20 21 22 23 24 25 26 27 28 29 20 21 22 23 24 25 26 27 28 29 20 21 22 23
24 25 26 27 28 29 20 21 22 23 24 25 26 27 28 29 20 21 22 23 24 25 26 27
28 29 20 21 22 23 24 25 26 27 28 29 20 21 22 23 24 25 26 27 28 29 20 21
22 23 24 25 26 27 28 29 20 21 22 23 24 25 26 27 28 29 20 21 22 23 24 25
26 27 28 29]
Edit.
I noticed that what I wanted is:
X_result, Y_result = np.meshgrid(X, Y)
print (X_result.flatten())
print (Y_result.flatten())
Please let me know if there are better ways of doing it.
You can use numpy.meshgrid:
np.meshgrid(X, Y, indexing='ij')
[array([[100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
[101, 101, 101, 101, 101, 101, 101, 101, 101, 101],
[102, 102, 102, 102, 102, 102, 102, 102, 102, 102],
[103, 103, 103, 103, 103, 103, 103, 103, 103, 103],
[104, 104, 104, 104, 104, 104, 104, 104, 104, 104],
[105, 105, 105, 105, 105, 105, 105, 105, 105, 105],
[106, 106, 106, 106, 106, 106, 106, 106, 106, 106],
[107, 107, 107, 107, 107, 107, 107, 107, 107, 107],
[108, 108, 108, 108, 108, 108, 108, 108, 108, 108],
[109, 109, 109, 109, 109, 109, 109, 109, 109, 109]]), array([[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])]
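To get the flat `X_result` and `Y_result` from the question, the two grids can be raveled. The sketch below checks that against the original list comprehension and also shows a `np.repeat` / `np.tile` variant that avoids building the 2-D grids; the names `X_alt` and `Y_alt` are only for illustration.

```python
import numpy as np

X = np.arange(100, 110)
Y = np.arange(20, 30)

# Reference: the list comprehension from the question
xy = np.array([(x, y) for x in X for y in Y])

# indexing="ij" makes X vary along rows, so ravel() follows the loop order
X_grid, Y_grid = np.meshgrid(X, Y, indexing="ij")
X_result = X_grid.ravel()
Y_result = Y_grid.ravel()

# Same result without the intermediate grids
X_alt = np.repeat(X, len(Y))  # each x repeated len(Y) times
Y_alt = np.tile(Y, len(X))    # the whole Y sequence repeated len(X) times

print(np.array_equal(X_result, xy[:, 0]), np.array_equal(Y_result, xy[:, 1]))
```

With the default `indexing="xy"`, flattening gives the pairs in a different order (X cycles fastest), so the `indexing` argument matters if the order should match the nested loop.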

Appending numpy arrays using numpy.insert

I have a numpy array (inputs) of shape (30,1). I want to insert a 31st value (e.g. x = 2). I am trying to use the np.insert function, but it gives me an out-of-bounds error.
np.insert(inputs,b+1,x)
IndexError: index 31 is out of bounds for axis 0 with size 30
Short answer: you need to insert it at index b, not b+1.
The index you pass to np.insert(...) is the position at which the element is added. Indexes are zero-based, so an array with 30 elements has 29 as its last index, and inserting at index 30 places the new element last:
>>> a
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
>>> np.insert(a,30,42)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 42])
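Since the array in the question has shape (30, 1), it may also help to see what happens to the shape. The sketch below uses a made-up `inputs` array: without `axis`, `np.insert` flattens the input first, while `axis=0` keeps the column layout. `np.append` is shown as an alternative when the value always goes at the end.

```python
import numpy as np

inputs = np.arange(30).reshape(30, 1)  # hypothetical data with the question's shape
x = 2
end = inputs.shape[0]  # 30: one past the last valid index, i.e. "insert at the end"

flat = np.insert(inputs, end, x)            # no axis: result is flattened
column = np.insert(inputs, end, x, axis=0)  # keeps the (n, 1) layout
appended = np.append(inputs, [[x]], axis=0)  # appending a (1, 1) row

print(flat.shape, column.shape, appended.shape)
```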

MultiPoint crossover using Numpy

I am trying to do crossover on a Genetic Algorithm population using numpy.
I have sliced the population using parent 1 and parent 2.
population = np.random.randint(2, size=(4,8))
p1 = population[::2]
p2 = population[1::2]
But I am not able to figure out any lambda or numpy command to do a multi-point crossover over parents.
The concept is to take ith row of p1 and randomly swap some bits with ith row of p2.
I think you want to select from p1 and p2 at random, cell by cell.
To make it easier to follow, I've changed p1 to hold values from 10 to 15 and p2 to hold values from 20 to 25. Both were generated at random within these ranges.
In [66]: p1
Out[66]:
array([[15, 15, 13, 14, 12, 13, 12, 12],
[14, 11, 11, 10, 12, 12, 10, 12],
[12, 11, 14, 15, 14, 10, 13, 10],
[11, 12, 10, 13, 14, 13, 12, 13]])
In [67]: p2
Out[67]:
array([[23, 25, 24, 21, 24, 20, 24, 25],
[21, 21, 20, 20, 25, 22, 24, 22],
[24, 22, 25, 20, 21, 22, 21, 22],
[22, 20, 21, 22, 25, 23, 22, 21]])
In [68]: sieve=np.random.randint(2, size=(4,8))
In [69]: sieve
Out[69]:
array([[0, 1, 0, 1, 1, 0, 1, 0],
[1, 1, 1, 0, 0, 1, 1, 1],
[0, 1, 1, 0, 0, 1, 1, 0],
[0, 0, 0, 1, 1, 1, 1, 1]])
In [70]: not_sieve=sieve^1 # Complement of sieve
In [71]: pn = p1*sieve + p2*not_sieve
In [72]: pn
Out[72]:
array([[23, 15, 24, 14, 12, 20, 12, 25],
[14, 11, 11, 20, 25, 12, 10, 12],
[24, 11, 14, 20, 21, 10, 13, 22],
[22, 20, 21, 13, 14, 13, 12, 13]])
The numbers in the teens come from p1, where sieve is 1.
The numbers in the twenties come from p2, where sieve is 0.
This could probably be made more efficient, but is this the output you expect?
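The same cell-by-cell selection can also be written with `np.where`, which avoids the two multiplications. The second half of the sketch is a hypothetical k-point variant, which is my own assumption about what "multi-point" means here: each row gets `k` random cut positions, and the source parent flips at every cut. The names `cuts`, `segment_mask` and `k` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.integers(2, size=(8, 8))
p1 = population[::2]
p2 = population[1::2]

# Uniform crossover: take p1 where the mask is True, p2 elsewhere
sieve = rng.integers(2, size=p1.shape).astype(bool)
uniform_child = np.where(sieve, p1, p2)

# Hypothetical k-point crossover: mark k cut positions per row ...
k = 2
n_rows, n_cols = p1.shape
cuts = np.zeros((n_rows, n_cols), dtype=int)
for row in range(n_rows):
    # cut positions are drawn from 1..n_cols-1 so each cut lies between two cells
    cuts[row, rng.choice(np.arange(1, n_cols), size=k, replace=False)] = 1

# ... then switch parents at every cut: an even number of cuts so far means p1
segment_mask = np.cumsum(cuts, axis=1) % 2 == 0
kpoint_child = np.where(segment_mask, p1, p2)

print(uniform_child)
print(kpoint_child)
```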