imshow: labels as any arbitrary function of the image indices - matplotlib

imshow plots a matrix against its column indices (x axis) and row indices (y axis). I would like the axes labels to not be indices, but an arbitrary function of the indices.
e.g. pitch detection
imshow(A, aspect='auto') where A.shape == (88200,8)
in the x-axis, shows several ticks at about [11000, 22000, ..., 88000]
in the y-axis, shows the frequency bin [0,1,2,3,4,5,6,7]
What I want is:
The x-axis labels are converted from samples to seconds. For 2 seconds of audio at a 44.1 kHz sample rate, I want two ticks at [1, 2].
The y-axis labels are the pitch as a note name. I want the labels to be ['c', 'd', 'e', 'f', 'g', 'a', 'b'].
ideally:
imshow(A, ylabel=lambda i: freqs[i], xlabel=lambda j: j/44100)

You can do this with a combination of Locators and Formatters from matplotlib.ticker.
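import matplotlib.pyplot as plt
from numpy.random import rand
from matplotlib.ticker import FuncFormatter, LinearLocator, FixedFormatter
ax = plt.gca()
ax.imshow(rand(500, 500))
# x axis: convert the sample index to seconds
ax.xaxis.set_major_formatter(FuncFormatter(lambda x, p: "%.2f" % (x / 44100)))
# y axis: 7 evenly spaced ticks, labelled with the note names in order
ax.yaxis.set_major_locator(LinearLocator(7))
ax.yaxis.set_major_formatter(FixedFormatter(['c', 'd', 'e', 'f', 'g', 'a', 'b']))
plt.draw()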
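Applied to the pitch-detection case from the question, the same idea could look like the sketch below. It assumes the array is laid out with one row per pitch bin and one column per audio sample, and that there are exactly seven bins; `seconds_label` and `note_label` are illustrative helper names, not part of matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FixedLocator, FuncFormatter, MultipleLocator

SAMPLE_RATE = 44100  # samples per second, as in the question
NOTES = ['c', 'd', 'e', 'f', 'g', 'a', 'b']

# Assumed layout: rows are pitch bins, columns are audio samples (2 seconds).
A = np.random.rand(len(NOTES), 2 * SAMPLE_RATE)


def seconds_label(x, pos):
    """Convert a sample index on the x axis to seconds."""
    return "%g" % (x / SAMPLE_RATE)


def note_label(y, pos):
    """Map a row index on the y axis to its note name (blank if out of range)."""
    i = int(round(y))
    return NOTES[i] if 0 <= i < len(NOTES) else ""


fig, ax = plt.subplots()
ax.imshow(A, aspect='auto')
ax.xaxis.set_major_locator(MultipleLocator(SAMPLE_RATE))  # one tick per second
ax.xaxis.set_major_formatter(FuncFormatter(seconds_label))
ax.yaxis.set_major_locator(FixedLocator(list(range(len(NOTES)))))  # one tick per row
ax.yaxis.set_major_formatter(FuncFormatter(note_label))
fig.canvas.draw()  # render once; use plt.show() in an interactive session
```

Using a FuncFormatter for the y axis, rather than a FixedFormatter, keeps each label tied to its row index even if the tick positions change.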

Related

Matplotlib plot with x-axis as binned data and y-axis as the mean value of various variables in the bin?

My apologies if this is rather basic; I can't seem to find a good answer yet because everything refers only to histograms. I have circular data, with a degrees value as the index. I am using pd.cut() to create bins of a few degrees in order to summarize the dataset. Then, I use df.groupby() and .mean() to calculate mean values of all columns for the respective bins.
Now - I would like to plot this, with the bins on the x-axis, and lines for the columns.
I tried to iterate over the columns, adding them as:
for i in df.columns:
    ax.plot(df.index, df[i])
However, this gives me the error: "float() argument must be a string or number, not 'pandas._libs.interval.Interval'"
Therefore, I assume it wants the x-axis values to be numbers or strings and not intervals. Is there a way I can make this work?
To get the dataframe containing the mean values of each variable with respect to bins, I used:
bins = np.arange(0,360,5)
df = df.groupby(pd.cut(df['Dir'], bins)).mean()
Here is what df looks like at the point of plotting - each column includes mean values for each variable 0,1,2 etc. for each bin, which I would like plotted on y-axis, and "Dir" is the index with bins.
                  0            1            2            3          4          5
Dir
(0, 5]    37.444135  2922.848675  3244.325904  4203.001446  36.262371  37.493497
(5, 10]   42.599494  3248.194328  3582.355759  4061.098517  36.351476  37.148341
(10, 15]  47.277694  2374.379517  2709.435714  2932.064076  36.537377  36.878293
(15, 20]  52.345712  2626.774240  2659.391040  3087.324800  36.114965  36.603918
(20, 25]  57.318976  2207.845000  2228.002353  2811.066176  36.279392  37.165979
(25, 30]  62.454386  2436.117405  2839.255696  3329.441772  36.762896  37.861577
(30, 35]  67.705955  3138.968411  3462.831977  4007.180620  36.462313  37.560977
(35, 40]  72.554786  2554.552620  2548.955581  3079.570159  36.256386  36.819579
(40, 45]  77.501479  2862.703066  2965.408491  2857.901887  36.170788  36.140976
(45, 50]  82.386679  2973.858188  2539.348967  2000.606359  36.067776  37.210645
There are multiple options. You can obtain the midpoint of each bin, as shown below, or use the left or right edge of each bin instead. Let me know if you need any further help.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(data={'x': np.random.uniform(low=0, high=10, size=10), 'y': np.random.exponential(size=10)})
bins = range(0, 360, 5)
df['bin'] = pd.cut(df['x'], bins)
agg_df = df.groupby(by='bin').mean()
# This is the important step: build an IntervalIndex from the categorical bin index and take the midpoints.
mids = pd.IntervalIndex(agg_df.index.get_level_values('bin')).mid
# To apply to plots, use the aggregated frame (not the raw rows) so the lengths match:
for col in agg_df.columns:
    plt.plot(mids, agg_df[col])
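For a frame shaped like the one in the question (a direction column plus several variables), the left edges, right edges, and midpoints can all be pulled out of the binned index. The following is a minimal sketch with made-up data; the column names `speed` and `temp` are placeholders, and `observed=False` is passed explicitly so that empty bins are kept.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable

# Hypothetical stand-in for the asker's data: a direction column plus two variables.
raw = pd.DataFrame({
    'Dir': rng.uniform(0, 360, size=2000),
    'speed': rng.normal(size=2000),
    'temp': rng.normal(size=2000),
})

bins = np.arange(0, 361, 5)  # 73 edges, so 72 bins of 5 degrees each
binned = pd.cut(raw['Dir'], bins)
means = raw.drop(columns='Dir').groupby(binned, observed=False).mean()

# Iterating the index yields pandas Interval objects with numeric attributes.
lefts = np.array([iv.left for iv in means.index])
rights = np.array([iv.right for iv in means.index])
mids = np.array([iv.mid for iv in means.index])

print(mids[:3])
```

Any of the three arrays is plain numeric data, so it can be passed as the x values to ax.plot(mids, means[col]) without the Interval error.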

Changing the value of a Numpy Array based on a probability and the value itself

I have a 2d Numpy Array:
a = np.reshape(np.random.choice(['b','c'],4), (2,2))
I want to go through each element and, with probability p=0.2 change the element. If the element is 'b' I want to change it to 'c' and vice versa.
I've tried all sorts (looping through with enumerate, where statements) but I can't seem to figure it out.
Any help would be appreciated.
You could generate a random mask with the wanted probability and use it to swap the values on a subset of the array:
# select 20% of cells
mask = np.random.choice([True, False], a.shape, p=[0.2, 0.8])
# swap the values for those
a[mask] = np.where(a=='b', 'c', 'b')[mask]
example output:
array([['b', 'c'],
       ['c', 'c']], dtype='<U1')
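On a 2x2 array it is hard to see whether the probability is right, so here is a quick sanity check on a larger array. It is only a sketch: it uses a seeded Generator and builds the mask by comparing uniform random numbers with p, which is a swapped-in alternative to np.random.choice with the same intent.

```python
import numpy as np

rng = np.random.default_rng(42)  # the seed is arbitrary, chosen for repeatability
p = 0.2

a = rng.choice(['b', 'c'], size=(200, 200))
original = a.copy()

# Each cell is selected independently with probability p.
mask = rng.random(a.shape) < p

# Swap 'b' <-> 'c' on the selected cells only.
a[mask] = np.where(a == 'b', 'c', 'b')[mask]

flipped_fraction = (a != original).mean()
print(flipped_fraction)  # should be close to p
```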

How can I print it out in this order: table, bar chart, table ...?

How can I print it out in this order: table, bar chart, table, bar chart, ...?
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(100, 10),
                  columns=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
for column in df:
    print(df[column].value_counts(normalize=True, bins=10))
    print(df[column].hist(bins=10))
It prints all tables first. Then prints one joint bar chart. But I want to mix tables and bar charts.
What do you mean by tables? Are you doing plt.show() to get your plots?
import matplotlib.pyplot as plt
for column in df:
    print(df[column].value_counts(normalize=True, bins=10))
    print(df[column].hist(bins=10))
    plt.show()
This shows me the value_counts table followed by its individual plot for each column. If you call plt.show() outside the loop, the plots just accumulate on the same axes unless you clear them.
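A variation that makes the one-figure-per-column behaviour explicit is to create a fresh figure inside the loop and close it afterwards. This is a sketch under assumptions: it uses three columns to keep the output short, and the `bars_per_column` dict exists only to check that each figure holds a single histogram.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get plot windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100, 3), columns=['a', 'b', 'c'])

bars_per_column = {}
for column in df:
    # Table first ...
    print(df[column].value_counts(normalize=True, bins=10))
    # ... then a histogram on its own, new figure.
    fig, ax = plt.subplots()
    df[column].hist(bins=10, ax=ax)
    ax.set_title(column)
    bars_per_column[column] = len(ax.patches)
    plt.show()      # displays this figure before the next table is printed
    plt.close(fig)  # free the figure so nothing accumulates
```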

Pandas Dataframe Create Seaborn Horizontal Barplot with categorical data

I'm currently working with a data frame like this:
What I want is to show the total number of rows where the VICTORY column is S, grouped by AGE_GROUP and split by GENDER, something like the following horizontal barplot:
So far I have obtained the following chart:
By following these steps:
import seaborn as sns
victory_df = main_df[main_df["VICTORY"] == "S"]
victory_count = victory_df["AGE_GROUP"].value_counts()
sns.set(style="darkgrid")
# recent seaborn versions require x and y to be passed as keywords
sns.barplot(x=victory_count.index, y=victory_count.values, alpha=0.9)
Which strategy should I use to split the value counts by gender and include that in the chart?
It would obviously help to give raw data and not an image. I came up with my own data. I'm not sure I understood your question, but my attempt is below.
Data
df = pd.DataFrame.from_dict({
    'VICTORY': ['S', 'S', 'N', 'N', 'N', 'S', 'N', 'S', 'N', 'S', 'N', 'S', 'S'],
    'AGE': [5., 88., 12., 19., 30., 43., 77., 50., 78., 34., 45., 9., 67.],
    'AGE_GROUP': ['0-13', '65+', '0-13', '18-35', '18-35', '36-64', '65+', '36-64', '65+', '18-35', '36-64', '0-13', '65+'],
    'GENDER': ['M', 'M', 'F', 'M', 'F', 'F', 'M', 'F', 'F', 'F', 'M', 'M', 'F'],
})
Plotting. I group by AGE_GROUP, take the value counts of GENDER, unstack, and plot a stacked horizontal bar plot. Seaborn is built on matplotlib, and when a plot such as the stacked horizontal bar is not straightforward in seaborn, I fall back to matplotlib. Hope you don't take offence.
df[df['VICTORY']=='S'].groupby('AGE_GROUP')['GENDER'].apply(lambda x: x.value_counts()).unstack().plot(kind='barh', stacked=True)
plt.xlabel('Count')
plt.title('xxxx')
Output
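The count table behind that plot can also be built with pd.crosstab, which may be easier to read than the groupby/value_counts/unstack chain. This is an alternative sketch, not the answerer's method; it reuses the sample data from the answer and fills missing gender/age combinations with 0.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'VICTORY': ['S', 'S', 'N', 'N', 'N', 'S', 'N', 'S', 'N', 'S', 'N', 'S', 'S'],
    'AGE_GROUP': ['0-13', '65+', '0-13', '18-35', '18-35', '36-64', '65+',
                  '36-64', '65+', '18-35', '36-64', '0-13', '65+'],
    'GENDER': ['M', 'M', 'F', 'M', 'F', 'F', 'M', 'F', 'F', 'F', 'M', 'M', 'F'],
})

# Keep only the victories, then count rows per (age group, gender) pair.
winners = df[df['VICTORY'] == 'S']
counts = pd.crosstab(winners['AGE_GROUP'], winners['GENDER'])
print(counts)

# One horizontal bar per age group, stacked by gender.
ax = counts.plot(kind='barh', stacked=True)
ax.set_xlabel('Count')
```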

How can I change the filled color of stacked area plot in DataFrame?

I want to change the fill color in stacked area plots drawn with a pandas DataFrame.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
ax = df.plot.area(linewidth=0)
The area plot example
Now I guess that the instance returned by the plot function gives access to attributes such as colors.
But the axes classes are too complicated to learn quickly, and I failed to find similar questions on Stack Overflow.
So can any master do me a favor?
Use the 'colormap' parameter (see the documentation for more details):
ax = df.plot.area(linewidth=0, colormap="Pastel1")
The trick is using the 'color' parameter:
Soln 1: dict
Simply pass a dict of {column name: color}
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
ax = df.plot.area(color={'b':'0', 'c':'#17A589', 'a':'#9C640C', 'd':'#ECF0F1'})
Soln 2: sequence
Simply pass a sequence of color codes (it will match the order of your columns).
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
ax = df.plot.area(color=('0', '#17A589', '#9C640C', '#ECF0F1'))
No need to set linewidth (it will automatically adjust colors). Also, this wouldn't mess with the legend.
The matplotlib API is really complex, but the artist module documentation gives a very plain illustration. For bar/barh plots, the drawn objects can be accessed and modified through ax.patches, but for area plots they are in ax.collections.
To make a specific modification, use code like this:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
ax = df.plot.area(linewidth=0)
# grey out every area, then highlight one of them
for collection in ax.collections:
    collection.set_facecolor('#888888')
highlight = 0
ax.collections[highlight].set_facecolor('#aa3333')
Other methods of the collections can be found by running:
dir(ax.collections[highlight])