I have a simple dataset of X and Y values that I am plotting in matplotlib. The independent variable is a duration (timedelta), e.g. 60 seconds, 2 hours, 24 hours or 10 days, which my input data always represents as an integer number of seconds. My question is: does matplotlib have any way of labelling a duration axis intelligently, in a human-readable form?
For example, at the small end of the scale it would be desirable to show, say, 30 seconds simply as "30 seconds". At the large end it would be nicer to show "10 days" rather than 864000 seconds, and somewhere in between the labels should read in minutes and hours. Does matplotlib have any automated way of producing approximately human-readable labels for durations spanning several orders of magnitude?
Ideally whatever approach I use should generalize to datasets that span different duration timescales, rather than a plot that is individually tailored to one input dataset.
Could you provide an example of your data? Is something like this what you want:
import datetime
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
fig = plt.figure()
ax = fig.add_subplot(111)
x = np.linspace(0, 300)  # 0 to 300 seconds (5 minutes)
y = np.random.random(len(x))
ax.plot(x, y)
def timeTicks(x, pos):
    # convert the tick value (in seconds) to a timedelta and let str() format it
    d = datetime.timedelta(seconds=x)
    return str(d)
formatter = matplotlib.ticker.FuncFormatter(timeTicks)
ax.xaxis.set_major_formatter(formatter)
plt.show()
It uses Python's timedelta. With 864000 seconds the above will result in "10 days, 0:00:00". You can of course put more advanced formatting into the timeTicks() function above.
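As an example of such "more advanced formatting", here is a minimal sketch that produces labels like "30 seconds" or "10 days", assuming it is acceptable to label each tick independently with the largest unit (day, hour, minute, second) that fits it. The names human_duration and NICE_TICKS are hypothetical, and the fixed list of "nice" tick positions on a log-scaled axis is just one possible choice for data spanning several orders of magnitude:

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FixedLocator, FuncFormatter, NullFormatter

# (unit name, length in seconds), largest first
UNITS = [("day", 86400), ("hour", 3600), ("minute", 60), ("second", 1)]


def human_duration(seconds, pos=None):
    """Label a tick value given in seconds with the largest unit that fits."""
    if seconds == 0:
        return "0"
    for name, size in UNITS:
        if abs(seconds) >= size:
            value = seconds / size
            break
    else:  # sub-second values stay in seconds
        name, value = "second", seconds
    label = f"{value:g} {name}"
    return label if abs(value) == 1 else label + "s"


# hypothetical "nice" tick positions, in seconds (1 s up to 30 days)
NICE_TICKS = [1, 10, 30, 60, 600, 1800, 3600, 6 * 3600, 86400, 7 * 86400, 30 * 86400]

fig, ax = plt.subplots()
x = np.logspace(1, 6, 50)  # 10 s to 1e6 s (about 11.6 days)
ax.plot(x, np.random.random(len(x)))
ax.set_xscale("log")  # must come first: changing the scale resets locators/formatters
ax.xaxis.set_major_locator(FixedLocator(NICE_TICKS))  # ticks outside the view are not drawn
ax.xaxis.set_major_formatter(FuncFormatter(human_duration))
ax.xaxis.set_minor_formatter(NullFormatter())  # leave the log-scale minor ticks unlabeled
plt.show()
```

Because each tick is labeled on its own, a linear axis covering e.g. 0 to 2 days would mix units ("12 hours", "1 day", "1.5 days"); if you prefer a single unit per axis, you could instead choose the unit once from the axis limits.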
Both the x and y scale values in this plot should be multiplied by 100, so that the x-axis and y-axis values read 0 2500 5000 7500 10000 12500 15000 17500 20000.
My plot remains unchanged even after multiplying the data by that factor.
Input data: https://www.file.io/GAYM/download/dDB51UAJdAG5
My code:
import matplotlib.pyplot as plt
import numpy as np
data=np.loadtxt("input.txt")
plt.imshow(100*data,cmap='jet', interpolation='none')
plt.show()
Multiplying the data doesn't help, because your array only contains the pixel values, not their coordinates: imshow places pixel (i, j) at x = j, y = i, so the axis values are simply array indices.
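A quick and dirty hack is to modify the tick formatter so that each label shows the tick position multiplied by 100 (add this before plt.show()):
ax = plt.gca()  # the axes that imshow drew into
def formatter(x, pos):
    del pos  # tick index, not needed
    return f"{x * 100:g}"  # e.g. 25.0 -> "2500"
# passing a plain function needs matplotlib >= 3.3; on older versions wrap it in matplotlib.ticker.FuncFormatter
ax.xaxis.set_major_formatter(formatter)
ax.yaxis.set_major_formatter(formatter)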
A more proper way is to use imshow's extent keyword argument, as explained in this tutorial, but I don't have time right now to turn that into an answer to your question. Maybe someone else (or you) does.
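A minimal sketch of the extent approach, assuming the goal is for pixel (i, j) to sit at coordinates (100 * j, 100 * i), i.e. the same mapping the formatter hack displays. With the default origin='upper', imshow uses the extent (-0.5, ncols - 0.5, nrows - 0.5, -0.5), so the sketch multiplies that by the scale factor. The random 200 x 200 array is a hypothetical placeholder for np.loadtxt("input.txt"):

```python
import matplotlib.pyplot as plt
import numpy as np

scale = 100
data = np.random.rand(200, 200)  # placeholder for np.loadtxt("input.txt")
nrows, ncols = data.shape

# imshow's default extent with origin='upper' is (-0.5, ncols - 0.5, nrows - 0.5, -0.5);
# scaling it puts the centre of pixel (i, j) at (j * scale, i * scale)
extent = (-0.5 * scale, (ncols - 0.5) * scale, (nrows - 0.5) * scale, -0.5 * scale)

fig, ax = plt.subplots()
im = ax.imshow(data, cmap='jet', interpolation='none', extent=extent)
fig.colorbar(im)
plt.show()
```

Unlike the formatter hack, this changes the actual data coordinates, so anything else you draw on these axes (points, lines, annotations) must use the scaled coordinates too.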
I have a chart created from a pandas DataFrame that looks like this:
I've formatted the ticks with:
ax = df.plot(kind='bar')
ax.set_xticklabels(df.index.strftime('%I %p'))
However, I'd like to add a second set of larger ticks, to achieve this kind of effect:
I've tried many variations using set_major_locator and set_major_formatter (as well as combining major and minor formatters), but it seems I'm not approaching it correctly, and I wasn't able to find useful examples of similar combined ticks online either.
Does someone have a suggestion on how to achieve something similar to the bottom image?
The DataFrame has a datetime index and contains binned data, produced by something like df.resample(bin_size, label='right', closed='right').sum().
One idea is to set major ticks to display the date (%-d-%b) at noon each day with some padding (e.g., pad=40). This will leave a minor tick gap at noon, so for consistency you could set minor ticks only on the odd hours and give them rotation=90.
Note that this uses matplotlib's bar() since pandas' plot.bar() doesn't play well with the date formatting.
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# toy data
dates = pd.date_range('2021-08-07', '2021-08-10', freq='1h')
df = pd.DataFrame({'date': dates, 'value': np.random.randint(10, size=len(dates))}).set_index('date')
# pyplot bar instead of pandas bar
fig, ax = plt.subplots(figsize=(14, 4))
ax.bar(df.index, df.value, width=0.02)  # width is in days (0.02 d is about 29 min)
# put day labels at noon
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=[12]))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%-d-%b'))  # %-d (no zero padding) is not supported on Windows
ax.xaxis.set_tick_params(which='major', pad=40)
# put hour labels on odd hours
ax.xaxis.set_minor_locator(mdates.HourLocator(byhour=range(1, 25, 2)))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%-I %p'))
ax.xaxis.set_tick_params(which='minor', pad=0, rotation=90)
# add day separators at every midnight tick
ticks = df[df.index.strftime('%H:%M:%S') == '00:00:00'].index
arrowprops = dict(width=2, headwidth=1, headlength=1, shrink=0.02)
for tick in ticks:
    xy = (mdates.date2num(tick), 0)  # convert date index to float coordinate
    xytext = (0, -65)  # draw downward 65 points
    ax.annotate('', xy=xy, xytext=xytext, textcoords='offset points',
                annotation_clip=False, arrowprops=arrowprops)
plt.show()
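The %-d and %-I directives (no zero padding) are platform extensions that Windows' strftime rejects. If portability matters, a sketch of equivalent labels built with FuncFormatter and matplotlib.dates.num2date could look like this; day_label and hour_label are hypothetical helper names, and %b/%p still depend on the current locale:

```python
from datetime import datetime

import matplotlib.dates as mdates
from matplotlib.ticker import FuncFormatter


def day_label(x, pos=None):
    """Matplotlib date number -> e.g. '7-Aug' (day without zero padding)."""
    d = mdates.num2date(x)
    return f"{d.day}-{d:%b}"


def hour_label(x, pos=None):
    """Matplotlib date number -> e.g. '1 PM' (12-hour clock, no zero padding)."""
    d = mdates.num2date(x)
    return f"{d.hour % 12 or 12} {d:%p}"


day_formatter = FuncFormatter(day_label)
hour_formatter = FuncFormatter(hour_label)
```

These would replace the two DateFormatter objects, e.g. ax.xaxis.set_major_formatter(day_formatter) and ax.xaxis.set_minor_formatter(hour_formatter).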
I generate plots like the one below:
from pylab import *
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
import matplotlib.ticker as ticker
rcParams['axes.linewidth'] = 2 # set the value globally
rcParams['font.size'] = 16# set the value globally
rcParams['font.family'] = ['DejaVu Sans']
rcParams['mathtext.fontset'] = 'stix'
rcParams['legend.fontsize'] = 24
rcParams['axes.prop_cycle'] = cycler(color=['grey','b','g','r','orange'])
rc('lines', linewidth=2, linestyle='-',marker='o')
rcParams['axes.xmargin'] = 0
rcParams['axes.ymargin'] = 0
t = arange(0,21,1)
v = 2.0
s = v*t
plt.figure(figsize=(12, 4))
plt.plot(t,s,label=r'$s=%1.1f\cdot t$'%v)
plt.title(r'Plot of distance over time $s=v\cdot t$')
plt.xlabel('Time $t$, s')
plt.ylabel('Distance $s$, m')
plt.autoscale(enable=True, axis='both', tight=None)
legend(loc='best')
plt.xlim(min(t),max(t))
plt.ylim(min(s),max(s))
plt.grid()
plt.show()
When I change t = arange(0,21,1) to, for example, t = arange(0,20,1), the maximum x value becomes 19.0 and it no longer appears as a label on the x axis. The same happens on the y axis.
My question is: how can I force matplotlib to always label the maximum values right at the ends of the axes, as my purposes require, or at least make this selectable as an option?
Image from a program I wrote in Fortran some years ago:
Matplotlib is more powerful than the way I use it, so there should be an option like that (as in the picture above).
At the moment I always have to look up the max/min values in a text window or take additional steps to confirm them. I would like to read them from the axes, so the question is: are there such possibilities in matplotlib? If not, I will close the post.
Roughly the kind of axes I have in mind:
I see two ways to solve the problem.
Set the axes automatic limit mode to round numbers
In the rcParams you can do this with
rcParams['axes.autolimit_mode'] = 'round_numbers'
and remove the manual axis limits set from min and max:
plt.xlim(min(t),max(t))
plt.ylim(min(s),max(s))
This will produce the image below. The axes still end at the nearest "round numbers" beyond the data, so the exact extremes are not labeled, but the reader can approximately see the data range. If you need the exact values to be displayed, see the second solution, which cannot be set through rcParams alone.
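A small self-contained version of the first option, using the question's arange(0, 20, 1) case. It assumes, as in the question, that the axis margins are set to 0, so the limits are rounded outwards from the exact data range; the particular round numbers chosen depend on figure and font size, so the sketch only prints them:

```python
import matplotlib.pyplot as plt
import numpy as np

# set before creating the axes; margins of 0 as in the question
plt.rcParams['axes.autolimit_mode'] = 'round_numbers'
plt.rcParams['axes.xmargin'] = 0
plt.rcParams['axes.ymargin'] = 0

t = np.arange(0, 20, 1)  # maximum is 19, the case whose label disappeared
s = 2.0 * t

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(t, s, marker='o')
ax.grid()
# no plt.xlim/plt.ylim here: the limits come from autoscaling to round numbers
print(ax.get_xlim(), ax.get_ylim())
plt.show()
```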
Or: manually generate the axis ticks
This solution explicitly asks for a given number of ticks. There is probably a way to automate it depending on the axes size etc., but if you are dealing with graphs of more or less the same size every time, you can choose a fixed number of ticks manually. This can be done with
plt.xlim(min(t),max(t))
plt.ylim(min(s),max(s))
plt.xticks(np.linspace(t.min(), t.max(), 7)) # arbitrary chosen
plt.yticks(np.linspace(s.min(), s.max(), 5)) # arbitrary chosen
This generates the image below, quite similar to your example image.
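One way to avoid recomputing np.linspace for every dataset (a swapped-in alternative, not what the answer above uses) is matplotlib.ticker.LinearLocator. It places a fixed number of evenly spaced ticks from the lower to the upper view limit, so the first and last ticks always coincide with the limits set by xlim/ylim, whatever the data range. The tick counts 7 and 5 are as arbitrary as in the answer, and the '%.4g' label format is an assumption to keep non-round values short:

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FormatStrFormatter, LinearLocator

t = np.arange(0, 20, 1)
s = 2.0 * t

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(t, s, marker='o')
ax.set_xlim(t.min(), t.max())
ax.set_ylim(s.min(), s.max())
# numticks evenly spaced ticks from the lower to the upper view limit,
# so both data extremes are always labeled
ax.xaxis.set_major_locator(LinearLocator(numticks=7))
ax.yaxis.set_major_locator(LinearLocator(numticks=5))
# short labels for the non-round positions, e.g. 3.1667 -> '3.167'
ax.xaxis.set_major_formatter(FormatStrFormatter('%.4g'))
ax.yaxis.set_major_formatter(FormatStrFormatter('%.4g'))
ax.grid()
plt.show()
```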
I am trying to plot multiple time series in one plot. The scales are different, so they need separate y-axes, and I want one specific time series to have its y-axis on the right. I also want that time series to be behind the others. But I find that when I use secondary_y=True, this time series is always brought to the front, even if the code to plot it comes before the others. How can I control the order of the plots when using secondary_y=True (or is there an alternative)?
Furthermore, when I use secondary_y=True, the y-axis on the left no longer adapts to appropriate values. Is there a fix for this?
# imports
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# dummy data
lenx = 1000
x = range(lenx)
np.random.seed(4)
y1 = np.random.randn(lenx)
y1 = pd.Series(y1, index=x)
y2 = 50.0 + y1.cumsum()
# plot time series.
# use ax to make Pandas plot them in the same plot.
ax = y2.plot.area(secondary_y=True)
y1.plot(ax=ax)
So what I would like is to have the blue area plot behind the green time series, and to have the left y-axis take appropriate values for the green time series:
https://i.stack.imgur.com/6QzPV.png
Perhaps something like the following using matplotlib.axes.Axes.twinx instead of using secondary_y, and then following the approach in this answer to move the twinned axis to the background:
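# plot time series.
fig, ax = plt.subplots()
y1.plot(ax=ax, color='green')
ax.set_zorder(10)  # draw the left axes (green line) above the twin created below
ax.patch.set_visible(False)  # transparent background so the twin stays visible
ax1 = ax.twinx()  # right-hand y-axis sharing the x-axis
y2.plot.area(ax=ax1, color='blue')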
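A self-contained sketch of the same idea with the question's dummy data. It sets the z-order relative to the twin instead of hard-coding 10, and passes stacked=False to the area plot as an assumption on my part: a single series needs no stacking, and pandas' stacked area plot refuses series that contain both positive and negative values. Because y1 is plotted on its own axes, the left y-axis autoscales to y1 alone:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# dummy data, as in the question
np.random.seed(4)
x = range(1000)
y1 = pd.Series(np.random.randn(1000), index=x)
y2 = 50.0 + y1.cumsum()

fig, ax = plt.subplots()
y1.plot(ax=ax, color='green')  # left axis autoscales to y1 only
ax1 = ax.twinx()  # right axis, shares the x-axis with ax
y2.plot.area(ax=ax1, color='blue', stacked=False)
# put the left axes (green line) above the twin, with a transparent background
ax.set_zorder(ax1.get_zorder() + 1)
ax.patch.set_visible(False)
plt.show()
```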
I have never worked with audio signals before and know little about signal processing. Nevertheless, I need to represent an audio signal using the pyplot.specgram function from the matplotlib library. Here is how I do it.
import matplotlib.pyplot as plt
import scipy.io.wavfile as wavfile
rate, frames = wavfile.read("song.wav")
plt.specgram(frames)
The result I am getting is this nice spectrogram below:
When I look at the x-axis and y-axis, which I suppose represent time and frequency, I can't get my head around the fact that frequency is scaled from 0 to 1.0 and time from 0 to 80k.
What is the intuition behind this and, more importantly, how can I represent it in a human-friendly format, such that frequency runs from 0 to 100k and time is in seconds?
As others have pointed out, you need to specify the sample rate; otherwise matplotlib assumes Fs=2, so you get a normalised frequency (between 0 and 1) and a time axis equal to the sample index divided by 2 (0 to 80k). Fortunately the fix is as simple as:
plt.specgram(frames, Fs=rate)
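To see what Fs changes, here is a sketch that uses a synthetic 440 Hz tone at an assumed 44.1 kHz sampling rate as a hypothetical stand-in for song.wav. Axes.specgram also returns the frequency bins and the times of the slice centres, so the axis ranges can be checked numerically: the frequencies run up to rate / 2 (the Nyquist frequency), and the times stay within the signal duration of len(frames) / rate seconds:

```python
import matplotlib.pyplot as plt
import numpy as np

# hypothetical stand-in for wavfile.read("song.wav"): 2 s of a 440 Hz tone
rate = 44100
n = 2 * rate
frames = np.sin(2 * np.pi * 440 * np.arange(n) / rate)

fig, ax = plt.subplots()
# returns the spectrum, the frequency bins (Hz) and the slice centre times (s)
spectrum, freqs, times, im = ax.specgram(frames, Fs=rate)
ax.set_xlabel('time [s]')
ax.set_ylabel('frequency [Hz]')
print(freqs.max(), times.max())  # Nyquist frequency, last slice centre in seconds
plt.show()
```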
To expand on Nukolas' answer, and combining my answers to "Changing plot scale by a factor in matplotlib" and "matplotlib intelligent axis labels for timedelta", we can get not only kHz on the frequency axis but also minutes and seconds on the time axis.
import copy
import datetime
import matplotlib.pyplot as plt
import matplotlib.ticker
import scipy.io.wavfile as wavfile
# copy so that set_under doesn't modify the globally registered colormap
cmap = copy.copy(plt.get_cmap('viridis'))  # 'viridis' needs matplotlib >= 1.5
vmin = -40  # hide anything below -40 dB
cmap.set_under(color='k', alpha=None)
rate, frames = wavfile.read("song.wav")
fig, ax = plt.subplots()
pxx, freq, t, cax = ax.specgram(frames[:, 0],  # first channel (assumes a stereo file)
                                Fs=rate,  # to get frequency axis in Hz
                                cmap=cmap, vmin=vmin)
cbar = fig.colorbar(cax)
cbar.set_label('Intensity dB')
ax.axis("tight")
# Prettify
ax.set_xlabel('time h:mm:ss')
ax.set_ylabel('frequency kHz')
scale = 1e3  # kHz
ticks = matplotlib.ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x/scale))
ax.yaxis.set_major_formatter(ticks)
def timeTicks(x, pos):
    # seconds -> "h:mm:ss" via timedelta
    d = datetime.timedelta(seconds=x)
    return str(d)
formatter = matplotlib.ticker.FuncFormatter(timeTicks)
ax.xaxis.set_major_formatter(formatter)
plt.show()
Result:
Firstly, a spectrogram is a representation of the spectral content of a signal as a function of time: a frequency-domain representation of the time-domain waveform (e.g. a sine wave, your file "song.wav" or some other arbitrary wave, that is, amplitude as a function of time).
The frequency values (y-axis, hertz) depend wholly on the sampling frequency of your waveform ("song.wav") and range from 0 to sampling frequency / 2, the upper limit being the "Nyquist frequency" or "folding frequency" (https://en.wikipedia.org/wiki/Aliasing#Folding). The sampling frequency is defined as 1 / dt, with dt being the time interval between discrete samples of the waveform. The matplotlib specgram function cannot infer it from the data: if you do not specify it, it assumes Fs=2, which is why your frequency axis runs from 0 to 1. You can pass the option Fs=<sampling rate> to the specgram function to define it. It will be easier to understand what is going on if you work out these variables and pass them to the specgram function yourself.
The time values (x-axis, seconds) depend only on the length of your "song.wav" (its number of samples divided by the sampling frequency). You may notice some whitespace or padding at the edges if you use a large window length to calculate each spectrum slice (think of the individual spectra, arranged vertically and tiled horizontally, that make up the spectrogram image).
To make the axes more intuitive, label the x- and y-axes; you can also scale the axis values (i.e. change the units) using a method similar to this.
Take-home message: be more explicit with your code. See my example below.
import matplotlib.pyplot as plt
import numpy as np
# generate a 5Hz sine wave
fs = 50
t = np.arange(0, 5, 1.0/fs)
f0 = 5
phi = np.pi/2
A = 1
x = A * np.sin(2 * np.pi * f0 * t +phi)
nfft = 25
# plot x-t, time-domain, i.e. source waveform
plt.subplot(211)
plt.plot(t, x)
plt.xlabel('time')
plt.ylabel('amplitude')
# plot power(f)-t, frequency-domain, i.e. spectrogram
plt.subplot(212)
# call specgram function, setting Fs (sampling frequency)
# and nfft (number of waveform samples, defining a time window,
# for which to compute the spectra)
plt.specgram(x, Fs=fs, NFFT=nfft, noverlap=5, detrend='mean', mode='psd')
plt.xlabel('time')
plt.ylabel('frequency')
plt.show()
Resulting 5 Hz spectrogram:
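For the 5 Hz example, the grid of the spectrogram follows directly from the parameters: the frequency resolution is Fs / NFFT = 50 / 25 = 2 Hz, so the bins sit at 0, 2, ..., 24 Hz (the 5 Hz tone falls between the 4 Hz and 6 Hz bins), and consecutive slices are (NFFT - noverlap) / Fs = 20 / 50 = 0.4 s apart. A quick check with matplotlib.mlab.specgram, which computes the same arrays as plt.specgram without plotting:

```python
import numpy as np
from matplotlib import mlab

# same signal and parameters as the 5 Hz example above
fs = 50
t = np.arange(0, 5, 1.0 / fs)
x = np.sin(2 * np.pi * 5 * t + np.pi / 2)
nfft, noverlap = 25, 5

spec, freqs, times = mlab.specgram(x, NFFT=nfft, Fs=fs, noverlap=noverlap,
                                   detrend='mean', mode='psd')
print(np.diff(freqs)[0], np.diff(times)[0])  # frequency step (Hz), time step (s)
```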