mplfinance moving average gaps and exponential moving averages - pandas

I am plotting moving averages on an mplfinance plot and, as expected, there are gaps at the start.
Most charting software (TradingView, etc.) does not have gaps on the moving averages - presumably they pull data from the previous n elements (and accept the skew even across a discontinuous jump).
I have two questions please:
How can I run a moving average without a gap (understanding it would
be skewed within n elements of the discontinuity)? I.e. pull in the prior day and use it in the moving average calculation but do not display that day, so that the moving average is already running at the left-hand side of the plot - for the example below it would be starting at Dec 21st.
If I wanted to calculate this moving average outside of mplfinance's internal function (or change to an exponential moving average, etc.), how would I go about adding it as a separate plot on top of the candlesticks?
And my code is below:
import mplfinance as mpf
import pandas as pd
from polygon import RESTClient
import yfinance as yf
import datetime
start = datetime.date(2021,12,21)
end = datetime.date(2021,12,23)
yfResults = yf.download("AAPL", start=start, end=end, period='1d', interval='5m')
mpf.plot(yfResults, type='candlestick', xrotation=0, style='yahoo', tight_layout=True, volume=True, mav=(9, 20), figratio=(48,24))

As you have implied, those systems that show no gap at the beginning of the moving average do so by using data prior to the displayed range as part of the moving average calculation. You can accomplish the same thing by passing kwarg xlim=(min,max) in your call to mpf.plot(), with min equal to one less than your largest moving average and max=len(data) ... so, for example, given your code above, do:
mpf.plot( yfResults, type='candlestick', xrotation=0, style='yahoo',
tight_layout=True, volume=True, mav=(9, 20), figratio=(48,24),
xlim=(19,len(yfResults)) )
You can calculate and plot any additional data using the mpf.make_addplot() api and the addplot kwarg. For further details, see https://github.com/matplotlib/mplfinance/blob/master/examples/addplot.ipynb

Related

what kind of moving average will be drawn when we use (mpl.finance mav) function

My question is: when we use the mplfinance mpf.plot function to draw a candlestick chart and the mav kwarg to draw the moving average, what kind of moving average is it? Is it an exponential moving average or a simple moving average?
fig, axlist = mpf.plot(daily,type='candle',mav=(20),volume=True, style='blueskies',returnfig=True)
mplfinance mpf.plot(data,...,mav=...) does a simple moving average (although it also allows for a shift in the moving average).
The code for the moving average is here.
Specifically, the actual calculation is on this line of code.
It should not be very difficult to modify the code to allow for other types of moving averages, particularly if you want to contribute.
That said, in the meantime, alternative moving averages can be calculated externally and plotted with mpf.make_addplot().
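To make the difference concrete, a short sketch (the values are illustrative): pandas' rolling().mean() reproduces what mav computes, while ewm() gives the exponential variant you could pass to make_addplot():

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Simple moving average - this is what mav=(3) computes
sma = s.rolling(window=3).mean()          # NaN, NaN, 2.0, 3.0, 4.0

# Exponential moving average - computed externally with pandas
ema = s.ewm(span=3, adjust=False).mean()  # 1.0, 1.5, 2.25, 3.125, 4.0625
```

Note the SMA has NaN for the first window-1 points (the "gap" from the first question), while the EMA is defined from the first point.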

How to plot timeseries with many NaNs?

Originally I had a dataframe containing power consumption of some devices like this:
and I wanted to plot power consumption vs time for different devices, one plot per one of 6 possible dates. After grouping by date I got plots like this one (for each group = date):
Then I tried to create similar plot, but switch date and device roles so that it is grouped by device and colored by date. In order to do it I prepared this dataframe:
It is similar to the previous one, but has many NaN values due to differing measurement times. I thought this wouldn't be a problem, but after grouping by device the subplots look like this one (ex is just the name of the sub-dataframe extracted in a loop going through the groups = devices):
This is the ex dataframe (mean lag between observations is around 20 seconds)
Question: What should I do to make plot grouped by device look like ones grouped by date? (I'd like to use ex dataframe but handle NaNs somehow.)
I found the solution in an answer to a similar question: ex.interpolate(method='linear').plot(). This fills the gaps between data points via linear interpolation before plotting. This is the result:
Another thing that can help is using .plot(marker='o', ms=3), which won't fill gaps between points but at least makes the points visible (previously some points, mainly the peaks in energy consumption, were too small at the scale of the whole plot). This is the result:
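The two approaches can be sketched like this (the ex frame here is a tiny hypothetical stand-in for the real one):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch

ex = pd.DataFrame({"dev_a": [1.0, np.nan, np.nan, 4.0, np.nan, 6.0]})

# Fill the NaN gaps by linear interpolation, then plot a continuous line
filled = ex.interpolate(method="linear")   # 1.0, 2.0, 3.0, 4.0, 5.0, 6.0
filled.plot()

# Alternatively: keep the gaps but make every data point visible
ex.plot(marker="o", ms=3)
```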

"Zoom in" on a violinplot whilst keeping accurate quartile lines (matplotlib/seaborn)

TL;DR: How can I get a subrange of a violinplot whilst keeping accurate quartile lines?
I am using seaborn violinplots to make static charts for a report, but as far as I can tell, there's no way to redraw a particular area between limits whilst retaining the 25/median/75 quartile lines of the original dataset.
Here's my example dataset as a violin. The 25/median/75 values are left side: 1.0/5.0/9.0; right side: 2.0/5.0/9.0
My data has such a long tail that all the useful info is scrunched up into a tiny area. I want to ignore (but not throw away) the tail and show a closer look at the interesting bit.
I tried to reset the ylim using ax.set(ylim=(0, upp)), but the resultant graph is not great: it's jaggy and the inner lines don't meet the violin edge.
Is there a way to reset the y-axis limits but get a better quality result?
Next I tried to cut off the tail by dropping values from the dataset. I dropped anything over the 97th centile. The violin looks way better, but the quartile lines have been recalculated for this new dataset. They're showing a median of about 4, not 5 as per the original dataset.
I'm using inner="quartile", so the code that gets called in Seaborn is _ViolinPlotter::draw_quartiles
def draw_quartiles(self, ax, data, support, density, center, split=False):
    """Draw the quartiles as lines at width of density."""
    q25, q50, q75 = np.percentile(data, [25, 50, 75])
    self.draw_to_density(ax, center, q25, support, density, split,
                         linewidth=self.linewidth,
                         dashes=[self.linewidth * 1.5] * 2)
As you can see, it assumes (understandably) that one wants to draw the quartile lines at percentiles 25, 50 and 75. It'd be amazeballs if there was a way I could call draw_to_density with my own values (is there?).
At the moment, I am attempting to manually adjust the position of the lines. It's trivial to figure out & set the y-values:
for l in ax.lines:
    l.set_ydata(<get correct quartile value from original dataset>)
but I'm finding it hard to figure out the limits for x, i.e. the density of the distribution at the quartiles. It seems to involve gaussian kde, and tbh it's getting hacky and inelegant at this point. Is there an easy way to calculate how long each line should be?
What do you suggest?
Thanks for your help
Lnr
With thanks to @JohanC.
I added gridsize=1000 to the params of the violinplot and used ax.set(ylim=(0, upp)) to resize the y-axis to show the range from 0 to upp, where upp is the upper limit. Much prettier lookin' graph:
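A sketch of that fix, using a made-up long-tailed sample in place of the report data (the 97th-centile cutoff for upp is also just illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical long-tailed sample: a tight bulk plus a far-out tail
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(5, 2, 950), rng.normal(60, 10, 50)])

# High gridsize -> the KDE is evaluated on a dense grid, so the violin
# edge stays smooth after zooming; ylim zooms without recomputing quartiles
ax = sns.violinplot(y=data, inner="quartile", gridsize=1000)
upp = np.percentile(data, 97)
ax.set(ylim=(0, upp))
plt.savefig("violin_zoom.png")
```

Because the quartile lines are still computed from the full dataset, they stay accurate; only the visible range changes.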

Is there a way of using Pandas or Matplotlib to plot Pandas Time Series density?

I am having a hard time of plotting the density of Pandas time series.
I have a data frame with perfectly organised timestamps, like below:
It's a web log, and I want to show the density of the timestamps, which indicates how many visitors there are in a certain period of time.
My solution atm is extracting the year, month, week and day of each timestamp, and group them. Like below:
But I don't think that is an efficient way of dealing with time, and I couldn't find any good info on this; most of what I found is about plotting pre-calculated values against a date or similar.
So, anybody have any suggestions on how to plot Pandas time series?
Much appreciated!
The best way to compute the values you want to plot is to use Series.resample; for example, to aggregate the count of dates daily, use this:
ser = pd.Series(1, index=dates)
ser.resample('D').sum()
The documentation there has more details depending on exactly how you want to resample & aggregate the data.
If you want to plot the result, you can use Pandas built-in plotting capabilities; for example:
ser.resample('D').sum().plot()
More info on plotting is here.
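As a tiny concrete example of the resample step (the timestamps are made up):

```python
import pandas as pd

# Three hypothetical log timestamps: two on Jan 1st, one on Jan 2nd
dates = pd.to_datetime(["2021-01-01 10:00", "2021-01-01 14:30",
                        "2021-01-02 09:15"])
ser = pd.Series(1, index=dates)

# Count events per day
daily = ser.resample("D").sum()  # 2021-01-01 -> 2, 2021-01-02 -> 1
```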

Setting the axis custom limits matplotlib dataframe

Across a list of dataframes (dflist), each showing some sensor readings in a 24 hour window, I am setting the y axis limits for these readings in matplotlib.
axes[3].set_ylim(dflist[day]['AS_%s_WE_%d(mv)' %(gas,sensor)].min(),dflist[day]['AS_%s_WE_%d(mv)' %(gas,sensor)].max())
So for each df in my list, a graph is produced. Unfortunately the first 10 minutes of readings throw off the scale dramatically, and I can't interpret the readings.
Now, for each df, instead of setting the minimum sensor reading as the ymin, could I tell the df to ignore the first 10 minutes (which is the first 10 readings, as I have 1 minute a reading) and take the min in the rest of the data?
You can use a boolean mask in pandas that filters out undesired values.
You didn't provide the structure of your dataframe, so I'm just writing something that gives you the right idea:
df = dflist[day]
df[df['minute'] > 10]['AS_%s_WE_%d(mv)' % (gas, sensor)].min()
Essentially you are indexing the rows of the dataframe with a boolean Series produced by the conditional expression, keeping only the rows where it is True.
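A self-contained sketch (the column names and values are hypothetical); since the readings are one per minute, positional slicing with iloc is an equivalent alternative when there is no minute column:

```python
import numpy as np
import pandas as pd

# 20 one-minute readings; the first 10 are wildly off-scale
df = pd.DataFrame({"minute": range(20),
                   "reading": np.concatenate([np.full(10, 999.0),
                                              np.linspace(1.0, 10.0, 10)])})

masked_min = df[df["minute"] >= 10]["reading"].min()  # boolean mask
pos_min = df["reading"].iloc[10:].min()               # positional slice
# both give 1.0, ignoring the off-scale warm-up readings
```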