Scale domain vs filter selection in vega-lite: automatic axis scaling - vega

The Scale Domains section of the Vega-Lite docs notes:
An alternate way to construct this technique would be to filter out
the input data to the top (detail) view like so:
{
  "vconcat": [{
    "transform": [{"filter": {"selection": "brush"}}],
    ...
  }]
}
This is indeed almost the same (although the filter method is much slower, as noted in the docs), except for one difference:
With the filter-selection method (demo), the y-axis of the upper chart is automatically zoomed in to the selected points. This is pretty neat, especially if you have a large number of points.
With the scale-domain method (demo), the y-axis remains frozen as you move the selection around.
The question: is it possible to have the y-axis automatically zoom in to the selected points as you move the selection with the "scale domain" method (the same as it does with the filter-selection method)?
Why is the above difference important? Imagine a stock price that has been increasing on average by $1 every day over the last year (but within a particular day it may have experienced any kind of volatile behaviour), and we're plotting it with line marks. If you plot the entire year, you see the whole picture. If you zoom in on a particular day without resetting your y-axis zoom, however, your intraday price plot will be just a flat line, or close to that.
// I've checked all scale-domain-related issues on vega-lite, on altair repo and SO and couldn't find anything related; I've also posted this question on vega-lite repo on GH, but was forwarded over to SO.

No. Unless otherwise specified, the y scale is determined from all of the data within the plot.
When you filter the data, the data in the plot changes, which causes the y axis to change. When you change the scale based on an x-selection without filtering the data, it does not change the data in the plot, and so the y scale remains constant.
If you want the y-scale to be determined automatically based on the data within the selection, the only option is to filter on that selection.
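To make the difference concrete, here is a minimal sketch that builds both variants of the overview+detail spec as plain Python dictionaries. It uses the same selection syntax as the snippet quoted in the question; the data URL, the field names "date" and "price", and the helper name build_spec are placeholders I made up for illustration, not anything prescribed by Vega-Lite.

```python
import json

# Hypothetical data source and field names; replace with your own.
DATA_URL = "data/sp500.csv"


def build_spec(use_filter: bool) -> dict:
    """Build an overview+detail spec using the filter or the scale-domain method."""
    detail = {
        "width": 480,
        "mark": "line",
        "encoding": {
            "x": {"field": "date", "type": "temporal"},
            "y": {"field": "price", "type": "quantitative"},
        },
    }
    if use_filter:
        # Filter method: the detail view only receives the brushed rows,
        # so its y scale is recomputed from the selected points.
        detail["transform"] = [{"filter": {"selection": "brush"}}]
    else:
        # Scale-domain method: the data is unchanged, only the x domain
        # follows the brush, so the y scale still covers all of the data.
        detail["encoding"]["x"]["scale"] = {"domain": {"selection": "brush"}}

    overview = {
        "width": 480,
        "height": 60,
        "mark": "area",
        "selection": {"brush": {"type": "interval", "encodings": ["x"]}},
        "encoding": {
            "x": {"field": "date", "type": "temporal"},
            "y": {"field": "price", "type": "quantitative"},
        },
    }
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
        "data": {"url": DATA_URL},
        "vconcat": [detail, overview],
    }


print(json.dumps(build_spec(use_filter=True), indent=2))
```

The printed JSON can be pasted into the Vega editor; switching use_filter to False produces the scale-domain variant, whose y-axis stays fixed.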

Related

"Zoom in" on a violinplot whilst keeping accurate quartile lines (matplotlib/seaborn)

TL;DR: How can I get a subrange of a violinplot whilst keeping accurate quartile lines?
I am using seaborn violinplots to make static charts for a report, but as far as I can tell, there's no way to redraw a particular area between limits whilst retaining the 25/median/75 quartile lines of the original dataset.
Here's my example dataset as a violin. The 25/median/75 values are left side: 1.0/5.0/9.0; right side: 2.0/5.0/9.0
My data has such a long tail that all the useful info is scrunched up into a tiny area. I want to ignore (but not throw away) the tail and show a closer look at the interesting bit.
I tried to reset the ylim using ax.set(ylim=(0, upp)), but the resultant graph is not great: it's jaggy and the inner lines don't meet the violin edge.
Is there a way to reset the y-axis limits but get a better quality result?
Next I tried to cut off the tail by dropping values from the dataset. I dropped anything over the 97th centile. The violin looks way better, but the quartile lines have been recalculated for this new dataset. They're showing a median of about 4, not 5 as per the original dataset.
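The shift in the median is easy to reproduce on a toy sample. The numbers below are made up for illustration (they are not my real data), and the cutoff is an arbitrary stand-in for the 97th-centile cut:

```python
import statistics

# Made-up long-tailed sample: most values are small, two sit far out in the tail.
full = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100, 200]

# Dropping the tail changes the dataset itself, not just the view of it.
cutoff = 50
trimmed = [v for v in full if v <= cutoff]

median_full = statistics.median(full)        # computed on all 11 values
median_trimmed = statistics.median(trimmed)  # computed on the 9 remaining values

print(median_full, median_trimmed)  # prints "6 5"
```

Changing only the axis limits leaves the quartiles alone, because they are still computed from the full sample.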
I'm using inner="quartile", so the code that gets called in Seaborn is _ViolinPlotter::draw_quartiles
def draw_quartiles(self, ax, data, support, density, center, split=False):
    """Draw the quartiles as lines at width of density."""
    q25, q50, q75 = np.percentile(data, [25, 50, 75])
    self.draw_to_density(ax, center, q25, support, density, split,
                         linewidth=self.linewidth,
                         dashes=[self.linewidth * 1.5] * 2)
As you can see, it assumes (understandably) that one wants to draw the quartile lines at percentiles 25, 50 and 75. It'd be amazeballs if there was a way I could call draw_to_density with my own values (is there?).
At the moment, I am attempting to manually adjust the position of the lines. It's trivial to figure out & set the y-values:
for l in ax.lines:
    l.set_ydata(<get correct quartile value from original dataset>)
but I'm finding it hard to figure out the limits for x, i.e. the density of the distribution at the quartiles. It seems to involve gaussian kde, and tbh it's getting hacky and inelegant at this point. Is there an easy way to calculate how long each line should be?
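For reference, this is roughly the calculation I have in mind, as a standard-library-only sketch. It assumes a Gaussian kernel with a Scott's-rule bandwidth, which I believe is close to what seaborn does by default but have not verified; the helper names (scott_bandwidth, kde_at, relative_widths) are my own, and the result is only a relative width, not seaborn's exact scaling.

```python
import math
import statistics


def scott_bandwidth(data):
    """Scott's rule of thumb for 1-D data: sample std * n^(-1/5)."""
    return statistics.stdev(data) * len(data) ** (-1 / 5)


def kde_at(x, data, bandwidth):
    """Gaussian kernel density estimate of `data`, evaluated at x."""
    scale = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return scale * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data)


def relative_widths(original, values):
    """Density at each value as a fraction of the peak density on a fine grid."""
    h = scott_bandwidth(original)
    lo, hi = min(original), max(original)
    grid = [lo + (hi - lo) * i / 1000 for i in range(1001)]
    peak = max(kde_at(g, original, h) for g in grid)
    return [kde_at(v, original, h) / peak for v in values]


# Toy data; the quartiles come from the ORIGINAL sample, not a trimmed one.
original = [1.0, 2.0, 3.0, 4.0, 5.0]
quartiles = statistics.quantiles(original, n=4, method="inclusive")
widths = relative_widths(original, quartiles)
print(quartiles, widths)
```

Each relative width would then have to be multiplied by the violin's maximum half-width to get the x-extent of the line.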
What do you suggest?
Thanks for your help
Lnr
With thanks to @JohanC.
I added gridsize=1000 to the params of the violinplot and used ax.set(ylim=(0, upp)) to resize the y-axis to show the range from 0 to upp, where upp is the upper limit. Much prettier lookin' graph:

Optimal display for overlapping series in a line chart

In a context of a line chart displaying time data in regular intervals where multiple series might overlap what would be the optimal way to:
A) hint the user that the chart has overlapping series?
B) give the user the capability to visualize all those series? Like spanning the series somehow?
For overlapping series in a line chart, I would keep the traditional line chart but put a label at the end of the graph with a color legend. The legend and label will help the user get information quickly.
Another version of a line chart for overlapping series can be a line area chart.
If you are not stuck on only line charts, I would suggest a bar chart. (The original answer showed three example images here: Example 1, Example 2, Example 3.)
There are a couple of ways to indicate that there are overlapping series on a chart. You can increase the marker radius of one of them. The number of legend elements tells you how many series there are, too. Finally, you can distribute the series across different yAxis instances, with different top and height properties. Also, in styled mode, when you hover over a legend item, the opacity of the other series changes.
API Reference:
http://api.highcharts.com/highcharts/plotOptions.line.marker.radius
Examples:
http://jsfiddle.net/whsgpdyw/ - changing marker radius
http://jsfiddle.net/fuq6j4sg/ - each series on a different yAxis

How to make gnuplot generate figures with smaller/fixed size (Bytes)?

I would like to avoid using the every option, since it simply discards data that might be very important (a spike, for instance). I would also like to avoid downsizing the figure afterwards, since this might degrade the text on the figure...
Is there a way/option to force gnuplot to generate files (eps) with a maximum size?
You'd need some adaptive compression on your data. Without actually knowing it, that's rather tough.
The stats command can tell you how many datapoints you actually have, and you can then adjust the every statement to a sensible value. Otherwise, you can use smooth to achieve a predefined (set samples) number of datapoints, or (if you have a sensible model for your data) you can do a fit and simply plot the fitted model function instead of your dataset.
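As a rough illustration of the first suggestion, the stride passed to every can be derived from the point count reported by stats and a target number of plotted points. The sketch below is in Python rather than gnuplot; the function name every_stride, the file name data.dat and the target of 500 points are arbitrary assumptions.

```python
import math


def every_stride(n_points: int, target: int) -> int:
    """Smallest stride such that at most `target` points are kept."""
    return max(1, math.ceil(n_points / target))


# Example: 10000 records in the file, aim for about 500 plotted points.
stride = every_stride(10000, 500)

# Hypothetical file name; the resulting line is an ordinary gnuplot plot command.
command = f"plot 'data.dat' every {stride} with lines"
print(command)  # prints "plot 'data.dat' every 20 with lines"
```

Note that this still drops points blindly, so a narrow spike can be lost.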
If you specifically want outliers to show in the plot, this might be helpful:
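# fit the model f(x) to the data file; *parameter* stands for the free parameters of f
fit f(x) 'data' via *parameter*
# plot the fit, plus only those points that lie farther than threshold from it
plot f(x), 'data' using 1:((abs($2-f($1)) > threshold) ? $2 : NaN)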
It plots a fit to your dataset, and all actual datapoints that deviate from the fit by more than threshold.
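The same selection logic can be sketched outside gnuplot, for instance to pre-filter a data file. The example below is a hedged illustration in Python: it assumes a straight-line model fitted by ordinary least squares (your real f(x) may be anything), and the names fit_line and outliers are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def outliers(xs, ys, threshold):
    """Points whose distance from the fitted line exceeds `threshold`."""
    slope, intercept = fit_line(xs, ys)
    return [(x, y) for x, y in zip(xs, ys)
            if abs(y - (slope * x + intercept)) > threshold]


# Toy data: a straight line with a single spike at x = 5.
xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
ys[5] = 50.0

spikes = outliers(xs, ys, threshold=10.0)
print(spikes)  # prints "[(5, 50.0)]"
```

Writing the fitted curve at a coarse sampling plus these few points keeps the file small while the spike stays visible.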

Programmatically set drawCircle on a per value basis on LineChart

Is there a way to toggle drawCircle on individual data points? From the documentation it appears you can only enable this at the dataset level. I'm trying to draw circles for the min/max of a dataset.
If there isn't a way to draw circles on individual data points another workaround I've tried implementing has been to create a second dataset and only plot the corresponding X index and min/max Y value; every other value is set small enough that it is not visible in the viewport. The only problem here is I can't seem to plot both datasets without overwriting the first dataset's background fill.

Layered, not stacked column graph in Excel

I want to layer (superimpose) one column graph on another in Excel. So it would be like a stacked column graph, except that each column for a given category on the x-axis would have its origin at 0 on the y-axis. My data are before and after scores. By layering the columns instead of putting them side-by-side, it would be easier to visualize the magnitude and direction of the difference between the two scores. I've seen this done with R, but can't find examples in Excel. Anyone ever attempted this?
I tried the 3D suggestion and it worked. But the other answer I discovered was to choose a Clustered Column graph, click 'Format Data Series' and change the 'overlap' percentage to 100%. I'm using a Mac so it's slightly different, but the person who helped me with this was on a PC and I've mainly used PCs. What I ended up discovering is that using 90% looked quite nice, but 100% will achieve what you're looking for.
I did the same thing for my thesis presentation. It's a little tricky and I worked it out by myself. To do it, you have to create a 3D bar graph (not a stacked one), in which your columns are put in front of each other. You have to make sure that all the taller columns in each X cell are behind the shorter columns in that cell on the X axis.
Once you have created that graph, you can rotate the 3D graph so that it looks like a 2D graph (by resetting the axes values to zero). Now you have a bar graph in which every bar has different columns and all of the columns start at zero. ;)
Short answer: Change the post score to (post - pre), then you can proceed with making the stacked bar chart.
Long and correct answer: DO NOT DO THIS. Clustered bar chart is much better because:
The visual line for comparison is the same line anyway, so you're not making the chart any easier to understand.
Any kind of overlapping of the bars conceals the area of the post-score, which induces visual distortion. A pre-score of 10 and a post score of 20 should have a column area ratio of 1:2. But if you completely overlap them, it'd be reduced to 1:1. Partial overlapping is equally problematic.