Draw a point at the mean peak of a distplot or kdeplot in Seaborn - pandas

I'm interested in automatically plotting a point just above the mean peak of a distribution, represented by a kdeplot or distplot with kde. Plotting points and lines manually is simple, but I'm having difficulty deriving this maximal coordinate point.
For example, the kdeplot generated below should have a point drawn at about (3.5, 1.0):
import seaborn as sns
iris = sns.load_dataset("iris")
setosa = iris.loc[iris.species == "setosa"]
sns.kdeplot(setosa.sepal_width)
The ultimate goal is to draw a line across to the next peak (two distributions in one graph) with a t-statistic printed above it.

Here is one way to do it. The idea is to first extract the x and y data of the line object in the plot, then get the index of the peak, and finally plot the single (x, y) point corresponding to the peak of the distribution.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
iris = sns.load_dataset("iris")
setosa = iris.loc[iris.species == "setosa"]
ax = sns.kdeplot(setosa.sepal_width)
x = ax.lines[0].get_xdata() # Get the x data of the distribution
y = ax.lines[0].get_ydata() # Get the y data of the distribution
maxid = np.argmax(y) # The index of the peak (maximum of y data)
plt.plot(x[maxid], y[maxid], 'bo', ms=10)
plt.show()
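For the stated end goal (two distributions, a line joining their peaks, and a t-statistic above it), a sketch that computes the densities directly with scipy.stats.gaussian_kde instead of reading ax.lines might look like this. Assumptions: two synthetic normal samples stand in for the real groups, the KDE uses scipy's default Scott's-rule bandwidth (as far as I know also seaborn's default, so the peaks should land close to what kdeplot draws), and the test is scipy.stats.ttest_ind with its default equal-variance setting.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Assumption: two synthetic samples stand in for two groups (e.g. the sepal
# widths of two iris species); replace them with your own data.
rng = np.random.default_rng(0)
samples = {"group A": rng.normal(3.4, 0.4, 50),
           "group B": rng.normal(2.8, 0.3, 50)}

# One evaluation grid shared by both densities
lo = min(s.min() for s in samples.values()) - 1
hi = max(s.max() for s in samples.values()) + 1
grid = np.linspace(lo, hi, 512)

fig, ax = plt.subplots()
peaks = []
for label, sample in samples.items():
    density = stats.gaussian_kde(sample)(grid)  # Scott's-rule bandwidth by default
    i = np.argmax(density)                       # index of the peak
    peaks.append((grid[i], density[i]))
    line, = ax.plot(grid, density, label=label)
    ax.plot(grid[i], density[i], "o", color=line.get_color(), ms=8)  # mark the peak

# Two-sample t-test between the groups (equal variances assumed by default)
t_stat, p_value = stats.ttest_ind(*samples.values())

# Dashed line joining the two peaks, with the t-statistic printed above it
(x1, y1), (x2, y2) = peaks
ax.plot([x1, x2], [y1, y2], "k--")
ax.annotate(f"t = {t_stat:.2f}", ((x1 + x2) / 2, max(y1, y2)),
            xytext=(0, 8), textcoords="offset points", ha="center")
ax.legend()
plt.show()
```

Computing the KDE yourself avoids depending on the order of ax.lines, which changes as soon as more than one curve is on the axes.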

Related

Plotting fuzzy data with matplotlib

I don't know where to start, as I think it is a new approach for me. Using matplotlib with python, I would like to plot a set of fuzzy numbers (for instance a set of triangular or bell curve fuzzy numbers) as in the picture below:
You can plot the curves in a loop. Here is my attempt at reproducing your example (including the superposition of labels 1 and 6):
import matplotlib.pyplot as plt
import numpy as np
# creating the figure and axis
fig, ax = plt.subplots(1,1,constrained_layout=True)
# generic gaussian
y = np.linspace(-1,1,100)
x = np.exp(-5*y**2)
center_x = (0,2,4,1,3,0,5)
center_y = (6,2,3,4,5,6,7)
# loop for all the values
for i in range(len(center_x)):
    x_c, y_c = center_x[i], center_y[i]
    # plotting the several bells, relocated to (x_c, y_c)
    ax.plot(x + x_c,y + y_c,
            color='red',linewidth=2.0)
    ax.plot(x_c,y_c,
            'o',color='blue',markersize=3)
    # adding label
    ax.annotate(
        str(i+1),
        (x_c - 0.1,y_c), # slight shift in x
        horizontalalignment='right',
        verticalalignment='center',
        color='blue',
        )
ax.grid()
plt.show()
Every call to ax.plot() adds points or curves (more precisely, Artists) to the same Axes; the same goes for ax.annotate(), which creates the labels.
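The question also mentions triangular fuzzy numbers. A minimal sketch, assuming the same sideways orientation and centres as the code above: only the profile changes, and 1 - |y| gives a triangle that peaks at 1 exactly where the Gaussian does. fuzzy_shape is a hypothetical helper, and alternating the two shapes is only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np


def fuzzy_shape(y, kind="bell"):
    """Hypothetical helper: membership profile over y in [-1, 1], peak 1 at y = 0."""
    if kind == "bell":
        return np.exp(-5 * y**2)   # same Gaussian profile as above
    if kind == "triangle":
        return 1 - np.abs(y)       # triangular fuzzy number
    raise ValueError(f"unknown kind: {kind}")


y = np.linspace(-1, 1, 101)        # odd count so that y = 0 is sampled
center_x = (0, 2, 4, 1, 3, 0, 5)
center_y = (6, 2, 3, 4, 5, 6, 7)

fig, ax = plt.subplots(constrained_layout=True)
for i, (x_c, y_c) in enumerate(zip(center_x, center_y)):
    kind = "triangle" if i % 2 else "bell"   # alternate shapes for illustration
    ax.plot(fuzzy_shape(y, kind) + x_c, y + y_c, color="red", linewidth=2.0)
    ax.plot(x_c, y_c, "o", color="blue", markersize=3)
    ax.annotate(str(i + 1), (x_c - 0.1, y_c),
                horizontalalignment="right", verticalalignment="center",
                color="blue")
ax.grid()
plt.show()
```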

Dynamically scaling axes during a matplotlib ArtistAnimation

It appears to be impossible to change the y and x axis view limits during an ArtistAnimation, and have the frames replayed with different axis limits.
The limits seem to be fixed to those set last before the animation function is called.
In the code below, I have two plotting stages. The input data in the second plot is a much smaller subset of the data in the 1st frame. The data in the 1st stage has a much wider range.
So, I need to "zoom in" when displaying the second plot (otherwise the plot would be very tiny if the axis limits remain the same).
The two plots are overlaid on two different images (that are of the same size, but different content).
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import matplotlib.image as mpimg
import random
# sample 640x480 image. Actual frame loops through
# many different images, but of same size
image = mpimg.imread('image_demo.png')
fig = plt.figure()
plt.axis('off')
ax = fig.gca()
artists = []
def plot_stage_1():
    # both x, y axis limits automatically set to 0 - 100
    # when we call ax.imshow with this extent
    im_extent = (0, 100, 0, 100) # (xmin, xmax, ymin, ymax)
    im = ax.imshow(image, extent=im_extent, animated=True)
    # y axis is a list of 100 random numbers between 0 and 100
    p, = ax.plot(range(100), random.choices(range(100), k=100))
    # Text label at 90, 90
    t = ax.text(im_extent[1]*0.9, im_extent[3]*0.9, "Frame 1")
    artists.append([im, t, p])
def plot_stage_2():
    # axes remain at the 0 - 100 limit from the previous
    # imshow extent so both the background image and plot are tiny
    im_extent = (0, 10, 0, 10)
    # so let's update the x, y axis limits
    ax.set_xlim(im_extent[0], im_extent[1])
    ax.set_ylim(im_extent[2], im_extent[3])
    im = ax.imshow(image, extent=im_extent, animated=True)
    p, = ax.plot(range(10), random.choices(range(10), k=10))
    # Text label at 9, 9
    t = ax.text(im_extent[1]*0.9, im_extent[3]*0.9, "Frame 2")
    artists.append([im, t, p])
plot_stage_1()
plot_stage_2()
# clear white space around plot
fig.subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=None, hspace=None)
# set figure size
fig.set_size_inches(6.67, 5.0, True)
anim = animation.ArtistAnimation(fig, artists, interval=2000, repeat=False, blit=False)
plt.show()
If I call just one of the two functions above, the plot is fine. However, if I call both, the axis limits in both frames will be 0 - 10, 0 - 10. So frame 1 will be super zoomed in.
Also calling ax.set_xlim(0, 100), ax.set_ylim(0, 100) in plot_stage_1() doesn't help. The last set_xlim(), set_ylim() calls fix the axis limits throughout all frames in the animation.
I could keep the axis bounds fixed and apply a scaling function to the input data.
However, I'm curious to know whether I can simply change the axis limits -- my code will be better this way, because the actual code is complicated with multiple stages, zooming plots across many different ranges.
Or perhaps I have to rejig my code to use FuncAnimation, instead of ArtistAnimation?
FuncAnimation appears to result in the expected behavior. So I'm changing my code to use that instead of ArtistAnimation.
Still curious to know though, whether this can at all be done using ArtistAnimation.
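As far as I can tell, ArtistAnimation only switches the visibility of the artists listed for each frame, while the axis limits belong to the Axes that every frame shares; that would explain why the last set_xlim()/set_ylim() call wins. With FuncAnimation, the update function runs per frame, so it can redraw the frame and set that frame's limits. A sketch of the two stages in that style, assuming a random array in place of 'image_demo.png' so it runs without the file:

```python
import random

import matplotlib.animation as animation
import matplotlib.pyplot as plt
import numpy as np

# Assumption: a random 640x480 RGB array stands in for the background image
image = np.random.rand(480, 640, 3)

fig, ax = plt.subplots()

# One entry per stage: (extent, number of points, label)
stages = [((0, 100, 0, 100), 100, "Frame 1"),
          ((0, 10, 0, 10), 10, "Frame 2")]


def update(frame):
    extent, n, label = stages[frame]
    ax.clear()                 # start each frame from an empty Axes
    ax.axis("off")             # clear() turns the axis back on
    ax.imshow(image, extent=extent)
    ax.plot(range(n), random.choices(range(n), k=n))
    ax.text(extent[1] * 0.9, extent[3] * 0.9, label)
    # Limits set here apply to this frame only
    ax.set_xlim(extent[0], extent[1])
    ax.set_ylim(extent[2], extent[3])


anim = animation.FuncAnimation(fig, update, frames=len(stages),
                               interval=2000, repeat=False)
plt.show()
```

Clearing and redrawing each frame is the simplest way to keep stages independent; for many frames, updating existing artists with set_data() would be cheaper.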

Plotting a graph of 3 parameters (PosX, PosY) vs. time: time-series data

I am new to this module. I have time-series data for the movement of a particle over time. The movement has X and Y components against time T, and I want to plot these 3 parameters in one graph. The sample data looks like this: the first column is time, the 2nd the X coordinate, and the 3rd the Y coordinate.
1.5193 618.3349 487.5595
1.5193 619.3349 487.5595
2.5193 619.8688 489.5869
2.5193 620.8688 489.5869
3.5193 622.9027 493.3156
3.5193 623.9027 493.3156
If you want to add a 3rd piece of information to a 2D curve, one possibility is a color mapping, i.e. a relationship between the value of the 3rd coordinate and a set of colors.
Matplotlib has no direct way of plotting a curve with changing color, but we can fake one using matplotlib.collections.LineCollection.
In the following I've used some arbitrary curve but I have no doubt that you could adjust my code to your particular use case if my code suits your needs.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
# e.g., a Lissajous curve
t = np.linspace(0, 2*np.pi, 6280)
x, y = np.sin(4*t), np.sin(5*t)
# to use LineCollection we need an array of segments
# the canonical answer (to upvote...) is https://stackoverflow.com/a/58880037/2749397
points = np.array([x, y]).T.reshape(-1,1,2)
segments = np.concatenate([points[:-1],points[1:]], axis=1)
# instantiate the line collection with appropriate parameters,
# the associated array controls the color mapping, we set it to time
lc = LineCollection(segments, cmap='nipy_spectral', linewidth=6, alpha=0.85)
lc.set_array(t)
# usual stuff, just note ax.autoscale, not needed here because we
# replot the same data but typically needed with ax.add_collection
fig, ax = plt.subplots()
plt.xlabel('x/mm') ; plt.ylabel('y/mm')
ax.add_collection(lc)
ax.autoscale()
cb = plt.colorbar(lc)
cb.set_label('t/s')
# we plot a thin line over the colormapped line collection, especially
# useful when our colormap contains white...
plt.plot(x, y, color='black', linewidth=0.5, zorder=3)
plt.show()
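Applied to the sample rows from the question, a sketch could look like this. Assumptions: the data is whitespace-separated with the columns time, X, Y; it is read from a string here so the example is self-contained (for a file, pass the filename to np.loadtxt instead); and each segment is colored by the time at its start. The sample has repeated time stamps, so consecutive segments can share a color.

```python
import io

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

# The sample rows from the question: time, X, Y
raw = """1.5193 618.3349 487.5595
1.5193 619.3349 487.5595
2.5193 619.8688 489.5869
2.5193 620.8688 489.5869
3.5193 622.9027 493.3156
3.5193 623.9027 493.3156"""
T, x, y = np.loadtxt(io.StringIO(raw), unpack=True)

# One segment between each pair of consecutive samples
points = np.column_stack([x, y]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

lc = LineCollection(segments, cmap="viridis", linewidth=4)
lc.set_array(T[:-1])       # one time value per segment (its start time)

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()             # needed because the collection is the only artist
ax.set_xlabel("X")
ax.set_ylabel("Y")
cb = fig.colorbar(lc, ax=ax)
cb.set_label("time")
plt.show()
```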

How to change pyplot.specgram x and y axis scaling?

I have never worked with audio signals before and know little about signal processing. Nevertheless, I need to represent an audio signal using the pyplot.specgram function from the matplotlib library. Here is how I do it.
import matplotlib.pyplot as plt
import scipy.io.wavfile as wavfile
rate, frames = wavfile.read("song.wav")
plt.specgram(frames)
The result I am getting is this nice spectrogram below:
When I look at the x-axis and y-axis, which I suppose are the frequency and time domains, I can't get my head around the fact that frequency is scaled from 0 to 1.0 and time from 0 to 80k.
What is the intuition behind it and, more importantly, how can I represent it in a human-friendly format, such that frequency is 0 to 100k and time is in seconds?
As others have pointed out, you need to specify the sample rate; otherwise matplotlib uses its default Fs=2, so you get a normalised frequency (between 0 and 1) and a time axis equal to the sample index divided by 2 (0 to 80k here). Fortunately, the fix is as simple as:
plt.specgram(frames, Fs=rate)
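To see the effect of Fs without a WAV file, here is a small sketch that assumes a synthetic 2 s, 440 Hz tone sampled at 8000 Hz in place of "song.wav". Without Fs the frequency axis tops out at 1; with Fs=rate it tops out at rate / 2 (the Nyquist frequency) and the time axis is in seconds.

```python
import matplotlib.pyplot as plt
import numpy as np

# Assumption: a synthetic tone stands in for the audio file
rate = 8000                                 # samples per second
t = np.arange(0, 2.0, 1.0 / rate)           # 2 seconds of samples
frames = np.sin(2 * np.pi * 440 * t)        # 440 Hz sine

fig, (ax1, ax2) = plt.subplots(2, 1, constrained_layout=True)

# Default Fs (2): normalised frequency axis from 0 to 1
_, freqs_norm, t_norm, _ = ax1.specgram(frames)
ax1.set_title("default Fs")

# Fs=rate: frequency in Hz (0 to rate / 2), time in seconds
_, freqs_hz, t_sec, _ = ax2.specgram(frames, Fs=rate)
ax2.set_title("Fs=rate")
ax2.set_xlabel("time [s]")
ax2.set_ylabel("frequency [Hz]")
plt.show()
```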
To expand on Nukolas's answer, and combining my answers to "Changing plot scale by a factor in matplotlib" and "matplotlib intelligent axis labels for timedelta", we can get not only kHz on the frequency axis but also minutes and seconds on the time axis.
import copy
import matplotlib.pyplot as plt
import scipy.io.wavfile as wavfile
# 'viridis' may be missing on older matplotlib versions; copy it so that
# set_under() does not modify the globally registered colormap
cmap = copy.copy(plt.get_cmap('viridis'))
vmin = -40 # hide anything below -40 dB
cmap.set_under(color='k', alpha=None)
rate, frames = wavfile.read("song.wav")
fig, ax = plt.subplots()
pxx, freq, t, cax = ax.specgram(frames[:, 0], # first channel
Fs=rate, # to get frequency axis in Hz
cmap=cmap, vmin=vmin)
cbar = fig.colorbar(cax)
cbar.set_label('Intensity dB')
ax.axis("tight")
# Prettify
import matplotlib
import datetime
ax.set_xlabel('time h:mm:ss')
ax.set_ylabel('frequency kHz')
scale = 1e3 # kHz
ticks = matplotlib.ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x/scale))
ax.yaxis.set_major_formatter(ticks)
def timeTicks(x, pos):
    d = datetime.timedelta(seconds=x)
    return str(d)
formatter = matplotlib.ticker.FuncFormatter(timeTicks)
ax.xaxis.set_major_formatter(formatter)
plt.show()
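One possible refinement, offered as a sketch: str(datetime.timedelta(seconds=12.5)) gives '0:00:12.500000', so tick positions that are not whole seconds produce long labels. Rounding to whole seconds first keeps them short. The names time_ticks and khz_ticks are my own; the formatters are attached with set_major_formatter in the same way as in the code above.

```python
import datetime

import matplotlib.ticker as ticker


def time_ticks(x, pos):
    """Format a tick value in seconds as h:mm:ss, dropping fractional seconds."""
    return str(datetime.timedelta(seconds=int(round(x))))


def khz_ticks(x, pos, scale=1e3):
    """Format a tick value in Hz as kHz."""
    return '{0:g}'.format(x / scale)


time_formatter = ticker.FuncFormatter(time_ticks)
khz_formatter = ticker.FuncFormatter(khz_ticks)
# Usage: ax.xaxis.set_major_formatter(time_formatter)
#        ax.yaxis.set_major_formatter(khz_formatter)
```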
Firstly, a spectrogram is a representation of the spectral content of a signal as a function of time - this is a frequency-domain representation of the time-domain waveform (e.g. a sine wave, your file "song.wav" or some other arbitrary wave - that is, amplitude as a function of time).
The frequency values (y-axis, hertz) depend entirely on the sampling frequency of your waveform ("song.wav") and range from 0 to sampling frequency / 2, the upper limit being the "Nyquist frequency" or "folding frequency" (https://en.wikipedia.org/wiki/Aliasing#Folding). The sampling frequency is defined as 1 / dt, with dt being the time interval between discrete samples of the waveform. The matplotlib specgram function cannot infer it from a bare array of samples: if Fs is not specified, it defaults to 2, which is why your frequency axis runs from 0 to 1. Pass the option Fs=<sampling rate> to the specgram function to define it (for a WAV file, the rate returned by wavfile.read). It will be easier for you to understand what is going on if you work out these variables and pass them to the specgram function yourself.
The time values (x-axis, seconds) depend on the length of your "song.wav" and on Fs, which converts sample counts into seconds. You may notice some whitespace or padding if you use a large window length to calculate each spectral slice (think of the individual spectra, which are arranged vertically and tiled horizontally to create the spectrogram image).
To make the axes more intuitive, label the x- and y-axes; you can also scale the axis values (i.e. change the units), for example with a tick formatter such as matplotlib.ticker.FuncFormatter.
Take home message - try to be a bit more verbose with your code: see below for my example.
import matplotlib.pyplot as plt
import numpy as np
# generate a 5Hz sine wave
fs = 50
t = np.arange(0, 5, 1.0/fs)
f0 = 5
phi = np.pi/2
A = 1
x = A * np.sin(2 * np.pi * f0 * t +phi)
nfft = 25
# plot x-t, time-domain, i.e. source waveform
plt.subplot(211)
plt.plot(t, x)
plt.xlabel('time')
plt.ylabel('amplitude')
# plot power(f)-t, frequency-domain, i.e. spectrogram
plt.subplot(212)
# call specgram function, setting Fs (sampling frequency)
# and nfft (number of waveform samples, defining a time window,
# for which to compute the spectra)
plt.specgram(x, Fs=fs, NFFT=nfft, noverlap=5, detrend='mean', mode='psd')
plt.xlabel('time')
plt.ylabel('frequency')
plt.show()
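A detail worth checking in this example: specgram's frequency bins are spaced Fs / NFFT apart, i.e. 50 / 25 = 2 Hz here, so the bins sit at 0, 2, 4, 6 Hz and so on, and a 5 Hz tone falls between two of them. The sketch below is my own variation on the example, not part of it: it reads the strongest bin from the arrays that specgram returns and compares NFFT=25 with NFFT=50, which gives 1 Hz bins and puts 5 Hz exactly on a bin.

```python
import matplotlib.pyplot as plt
import numpy as np

fs = 50
t = np.arange(0, 5, 1.0 / fs)
x = np.sin(2 * np.pi * 5 * t + np.pi / 2)    # same 5 Hz wave as above

fig, axes = plt.subplots(2, 1, constrained_layout=True)
results = {}
for ax, nfft in zip(axes, (25, 50)):
    # noverlap must stay below NFFT (its default of 128 would not)
    pxx, freqs, bins, im = ax.specgram(x, Fs=fs, NFFT=nfft, noverlap=5,
                                       detrend='mean', mode='psd')
    peak = freqs[np.argmax(pxx.mean(axis=1))]  # strongest frequency bin
    results[nfft] = (freqs[1] - freqs[0], peak)
    ax.set_title(f"NFFT={nfft}: bin spacing {fs / nfft:g} Hz, peak bin {peak:g} Hz")
    ax.set_xlabel('time [s]')
    ax.set_ylabel('frequency [Hz]')
plt.show()
```

The trade-off is time resolution: a longer NFFT gives finer frequency bins but fewer, longer time slices.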

Matplotlib plotting a single line that continuously changes color

I would like to plot a curve in the (x,y) plane, where the color of the curve depends on a value of another variable T. x is a 1D numpy array, y is a 1D numpy array.
T=np.linspace(0,1,np.size(x))**2
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x,y)
I want the line to change from blue to red (using RdBu colormap) depending on the value of T (one value of T exists for every (x,y) pair).
I found this, but I don't know how to adapt it to my simple example. How would I use LineCollection for my example? http://matplotlib.org/examples/pylab_examples/multicolored_line.html
Thanks.
One idea is to set the color using color=(R, G, B), split your plot into segments, and continuously vary one of R, G or B (or a combination of them) from segment to segment.
import matplotlib.pyplot as plt
import numpy as np
# Make some data
n=1000
x=np.linspace(0,100,n)
y=np.sin(x)
# Your coloring array
T=np.linspace(0,1,np.size(x))**2
fig = plt.figure()
ax = fig.add_subplot(111)
# Segment plot and color depending on T
s = 10 # Segment length
for i in range(0, n-1, s): # stop at n-1 so the last points are included
    ax.plot(x[i:i+s+1],y[i:i+s+1],color=(0.0,0.5,T[i]))
plt.show()
Hope this is helpful
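Since the question asks specifically about LineCollection, here is a sketch of that approach for the same x, y and T. Each pair of consecutive points becomes one segment, and set_array() supplies one T value per segment to the colormap. To go from blue to red as T grows, it uses the reversed colormap 'RdBu_r', because 'RdBu' itself runs from red at low values to blue at high values.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

# Same data and coloring array as in the question and answer above
n = 1000
x = np.linspace(0, 100, n)
y = np.sin(x)
T = np.linspace(0, 1, np.size(x))**2

# (n - 1) two-point segments: segments[i] = [[x_i, y_i], [x_i+1, y_i+1]]
points = np.column_stack([x, y]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

# 'RdBu_r' maps low T to blue and high T to red
lc = LineCollection(segments, cmap="RdBu_r", norm=plt.Normalize(T.min(), T.max()))
lc.set_array(T[:-1])          # one T value per segment
lc.set_linewidth(2)

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()                # add_collection alone does not rescale the view
fig.colorbar(lc, ax=ax, label="T")
plt.show()
```

Unlike the loop over ax.plot, this creates a single artist while still coloring every segment individually.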