Layered, not stacked column graph in Excel - data-visualization

I want to layer (superimpose) one column graph on another in Excel. So it would be like a stacked column graph, except that each column for a given category on the x-axis would have its origin at 0 on the y-axis. My data are before and after scores. By layering the columns instead of putting them side-by-side, it would be easier to visualize the magnitude and direction of the difference between the two scores. I've seen this done with R, but can't find examples in Excel. Anyone ever attempted this?

I tried the 3D suggestion and it worked. But the other answer I discovered was to choose a Clustered Column chart, open 'Format Data Series', and change the 'Overlap' percentage to 100%. I'm on a Mac so the menus are slightly different, but the person who helped me was on a PC and the steps are essentially the same. What I ended up discovering is that 90% overlap looked quite nice, but 100% will achieve exactly what you're looking for.

I did the same thing for my thesis presentation. It's a little tricky, but I worked it out myself. Create a 3D column chart (not a stacked one) in which the series sit in front of each other, making sure that for each category on the x-axis the taller columns are behind the shorter ones.
Once you've created that chart, rotate the 3D view so that it looks like a 2D chart (by setting the rotation values of the axes to zero). You now have a column chart in which every category shows several overlapping columns, all of which start at zero. ;)

Short answer: change the post score to (post - pre), then proceed with making the stacked bar chart (see the sketch below).
Long and correct answer: DO NOT DO THIS. A clustered bar chart is much better, because:
The visual reference line for the comparison is the same either way, so you're not making the chart any easier to read.
Any overlapping of the bars conceals part of the post-score's area, which introduces visual distortion. A pre-score of 10 and a post-score of 20 should have a column area ratio of 1:2, but if you completely overlap them it is reduced to 1:1. Partial overlapping is equally problematic.
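For completeness, here is a minimal matplotlib sketch of the short answer's mechanics, with made-up scores; the caveats above still apply, and negative differences (post < pre) would need separate handling:

import numpy as np
import matplotlib.pyplot as plt

# Made-up before/after scores (illustrative only).
categories = ["A", "B", "C"]
pre = np.array([10, 30, 25])
post = np.array([20, 22, 40])

fig, ax = plt.subplots()
# Stack the difference on top of the pre score: the top of each stack
# lands at the post score, while the pre segment still starts at zero.
ax.bar(categories, pre, label="pre")
ax.bar(categories, post - pre, bottom=pre, label="post - pre")
ax.legend()
plt.show()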

Related

How to zoom in data points without changing x values?

I have four sets of data which have been created as dictionaries. After plotting them in one graph it is hard to clearly differentiate the points.
I put all of the data frames in one graph. My intention is to zoom in on the data points without changing the x values, so that one can see the difference.
Please, help me in this regard. Thank you!
[Figure: graph showing the four sets of data]

"Zoom in" on a violinplot whilst keeping accurate quartile lines (matplotlib/seaborn)

TL;DR: How can I get a subrange of a violinplot whilst keeping accurate quartile lines?
I am using seaborn violinplots to make static charts for a report, but as far as I can tell, there's no way to redraw a particular area between limits whilst retaining the 25/median/75 quartile lines of the original dataset.
Here's my example dataset as a violin. The 25/median/75 values are left side: 1.0/5.0/9.0; right side: 2.0/5.0/9.0
My data has such a long tail that all the useful info is scrunched up into a tiny area. I want to ignore (but not throw away) the tail and show a closer look at the interesting bit.
I tried to reset the ylim using ax.set(ylim=(0, upp)), but the resultant graph is not great: it's jaggy and the inner lines don't meet the violin edge.
Is there a way to reset the y-axis limits but get a better quality result?
Next I tried to cut off the tail by dropping values from the dataset. I dropped anything over the 97th centile. The violin looks way better, but the quartile lines have been recalculated for this new dataset. They're showing a median of about 4, not 5 as per the original dataset.
I'm using inner="quartile", so the code that gets called in Seaborn is _ViolinPlotter::draw_quartiles
def draw_quartiles(self, ax, data, support, density, center, split=False):
    """Draw the quartiles as lines at width of density."""
    q25, q50, q75 = np.percentile(data, [25, 50, 75])
    self.draw_to_density(ax, center, q25, support, density, split,
                         linewidth=self.linewidth,
                         dashes=[self.linewidth * 1.5] * 2)
As you can see, it assumes (understandably) that one wants to draw the quartile lines at percentiles 25, 50 and 75. It'd be amazeballs if there was a way I could call draw_to_density with my own values (is there?).
At the moment, I am attempting to manually adjust the position of the lines. It's trivial to figure out & set the y-values:
for l in ax.lines:
    l.set_ydata(<get correct quartile value from original dataset>)
but I'm finding it hard to figure out the limits for x, i.e. the density of the distribution at the quartiles. It seems to involve gaussian kde, and tbh it's getting hacky and inelegant at this point. Is there an easy way to calculate how long each line should be?
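For illustration, this is roughly the kind of thing I'm fumbling with (a sketch only; the part I'm unsure about is how seaborn actually scales the violin width, which depends on the width and scale settings):

import numpy as np
from scipy.stats import gaussian_kde

def quartile_line_halfwidths(data, half_width=0.4):
    # Hypothetical helper: estimate the half-length of each quartile line,
    # assuming the violin's widest point spans half_width on either side of
    # its centre (which may not match seaborn's exact normalisation).
    kde = gaussian_kde(data)
    q25, q50, q75 = np.percentile(data, [25, 50, 75])
    grid = np.linspace(np.min(data), np.max(data), 1000)
    peak = kde(grid).max()
    return [half_width * kde(q)[0] / peak for q in (q25, q50, q75)]

Each returned value would be the half-length to use when resetting a line's xdata around the violin's centre.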
What do you suggest?
Thanks for your help
Lnr
With thanks to @JohanC:
I added gridsize=1000 to the violinplot parameters and used ax.set(ylim=(0, upp)) to resize the y-axis to show the range from 0 to upp, where upp is the upper limit. Much prettier-looking graph.
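Putting that together, a minimal sketch of the fix (the dataset, column names and upp below are made-up stand-ins for the real ones):

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Small made-up long-tailed dataset standing in for the original data.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": ["left"] * 500 + ["right"] * 500,
    "value": np.concatenate([rng.gamma(2, 3, 500), rng.gamma(2, 4, 500)]),
})

upp = np.percentile(df["value"], 97)   # clip the display at the 97th centile

# gridsize=1000 gives the KDE enough resolution that clipping the y-axis
# doesn't leave jagged edges or inner lines stopping short of the violin.
ax = sns.violinplot(data=df, x="group", y="value",
                    inner="quartile", gridsize=1000)
ax.set(ylim=(0, upp))
plt.show()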

Scale domain vs filter selection in vega-lite: automatic axis scaling

In the Scale Domains section of the Vega-Lite docs it is noted:
An alternate way to construct this technique would be to filter out
the input data to the top (detail) view like so:
{
  "vconcat": [{
    "transform": [{"filter": {"selection": "brush"}}],
    ...
  }]
}
Which is indeed almost the same (although the filter method is much slower, as noted in the docs), except for one difference:
With the filter-selection method (demo), the y-axis of the upper chart is automatically zoomed in to the selected points. This is pretty neat, especially if you have a large number of points.
With the scale-domain method (demo), the y-axis remains frozen as you move the selection around.
The question: is it possible to have the y-axis automatically zoom in to the selected points as you move the selection with the scale-domain method, the same as it does with the filter-selection method?
Why is the above difference important? Imagine a stock price that has been increasing on average by a total of $1 every day over the last year (although within any particular day it may have experienced all kinds of volatile behaviour), and we're plotting it with line marks. If you plot the entire year, you see the whole picture. If you zoom in on a particular day without resetting your y-axis zoom, however, your intraday price plot will be just a flat line, or close to that.
// I've checked all scale-domain-related issues on vega-lite, on altair repo and SO and couldn't find anything related; I've also posted this question on vega-lite repo on GH, but was forwarded over to SO.
No. Unless otherwise specified, the y scale is determined from all of the data within the plot.
When you filter the data, the data in the plot changes, which causes the y axis to change. When you change the scale based on an x-selection without filtering the data, it does not change the data in the plot, and so the y scale remains constant.
If you want the y-scale to be determined automatically based on the data within the selection, the only option is to filter on that selection.
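For reference, a minimal Altair (Python) sketch of the filter-on-selection pattern the answer recommends, using the S&P 500 example dataset from vega_datasets (exact method names may differ slightly between Altair versions); as the brush on the lower chart moves, the upper chart's y-axis rescales to the filtered data:

import altair as alt
from vega_datasets import data

source = data.sp500.url

brush = alt.selection_interval(encodings=["x"])

# Detail view: filtering on the brush means the y scale is recomputed
# from the selected points only, so it zooms in automatically.
upper = alt.Chart(source).mark_line().encode(
    x="date:T",
    y="price:Q",
).transform_filter(brush)

# Overview view that carries the brush selection.
lower = alt.Chart(source).mark_area().encode(
    x="date:T",
    y="price:Q",
).properties(height=60).add_selection(brush)

(upper & lower).save("overview_detail.html")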

How to draw an outline of a group of multiple rectangles?

I need to draw an enclosing polygon of a group of rectangles that are placed next to each other.
Let's think of text fields that share at least one edge (or part of it) with at least one of the other rectangles.
I can get the rectangles' corner coordinates, so I basically have all the data I need about them.
Can you think of a simple algorithm / procedure to draw a polygon (connected straight paths) around these objects?
Here's a demonstration of different potential cases (A, B, C, etc...). In example A I also drew a blue polygon which is the path that I need to draw, outlining the group of rectangles.
I've read here about convex hull and stuff like that but really, this looks like a far simpler problem.
One (beginning of a) solution I thought of was that the points I actually need to draw through are only the ones that are not shared by any pair of rectangles, meaning points that are vertices of more than one rectangle are redundant. What I couldn't figure out was the order in which I need to draw the lines from one point to the next.
I'm currently working in Objective-C, but any other language or algorithm would be appreciated, including pseudocode.
Thanks!
IMHO it should go like this: make a list of edges and check whether some of them overlap. This should be simple if the rectangles are aligned with the x and y axes: you just find the edges whose vertices share the same x or y value and whose other coordinates overlap. After removing those, the remaining edges form the outline.
Another way to find common edges is to split the edges of all rectangles at every x and y value where a vertex occurs, as if you were extending every such line to infinity (see the sketch below). After this, all common edges have identical vertices and can be eliminated.
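A rough Python sketch of that edge-cancellation idea, assuming axis-aligned rectangles given as (x0, y0, x1, y1) tuples that only touch along edges (they don't overlap in area):

from collections import Counter

def outline_edges(rects):
    # Collect every x and y value where a vertex occurs, so each rectangle
    # side can be split at all of them ("grow the lines to infinity").
    xs = sorted({x for x0, y0, x1, y1 in rects for x in (x0, x1)})
    ys = sorted({y for x0, y0, x1, y1 in rects for y in (y0, y1)})

    edges = Counter()
    for x0, y0, x1, y1 in rects:
        # Horizontal pieces of the bottom and top sides.
        for a, b in zip(xs, xs[1:]):
            if x0 <= a and b <= x1:
                edges[((a, y0), (b, y0))] += 1
                edges[((a, y1), (b, y1))] += 1
        # Vertical pieces of the left and right sides.
        for a, b in zip(ys, ys[1:]):
            if y0 <= a and b <= y1:
                edges[((x0, a), (x0, b))] += 1
                edges[((x1, a), (x1, b))] += 1

    # Pieces shared by two rectangles cancel out; pieces that appear
    # exactly once lie on the outline.
    return [edge for edge, count in edges.items() if count == 1]

# Example: two unit squares sharing a vertical edge form a 2x1 box.
print(outline_edges([(0, 0, 1, 1), (1, 0, 2, 1)]))

Ordering the returned pieces into a closed path (the part the question asks about) is then a matter of walking from segment to segment via shared endpoints.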
You have two rows, and three different y-values. Let's say y0 is the top of the thing, y2 is the bottom end, and y1 marks the middle between both rows.
Each row has a maximum and a minimum x-value, let's say the top-row goes from x0_min to x0_max, and the bottom row from x2_min to x2_max. Given those values you just draw around the thing:
(x0_min,y0)->
(x0_max,y0)->
(x0_max,y1)->
(x2_max,y1)->
(x2_max,y2)->
(x2_min,y2)->
(x2_min,y1)->
(x0_min,y1)->
(x0_min,y0)

Plot variable size/color-heatmap for mulitple occurences of points in scatter plot

I'm stuck with the following problem and I hope I can explain it coherently.
So, I have a number (about 10) of discrete positions on a coordinate system.
Now, I want to analyse data from a program in which users could label each point as somethingA or somethingB.
I extracted the data points for each class, so I have about 60 points for the somethingA class and slightly fewer for the other one. One class stands for good points and one for bad points. I want to find the positions which have the most good/bad labels. I do that with machine learning algorithms; I just want to visualize it with plots.
I now want to plot those points. So I make one plot per class. But since in every class every point occurs at least once, the two plots would look exactly the same.
But the number of occurrences has a different distribution across the positions.
If point A has 20 occurrences in class A and 1 occurrence in class B, both plots would still look the same.
So, my question is: how can I take the number of occurrences of each point into account when plotting scatters in Matplotlib?
Either with different colors (like a heatmap?), maybe with a nice legend,
or with different sizes (e.g. higher count = bigger circle).
Any help would be appreciated!
I don't know if this helps you, but I had a problem where I wanted a scatter plot to reflect both the positions and two variables attributed to the data points.
Passing my variables directly to the color and size arguments of scatter, i.e. something like
ax.scatter(..., c=whatEverFunction, s=numberOfOccurences, ...)
did not work for me.
What I did instead was to bin the values of the two variables I wanted to visualize, in my case nodeMass and another variable.
# Select the points whose variableOne falls in the first bin and plot them
# with one color and size.
for i in range(Number):
    mask[i] = False
    if lowerBound1 < variableOne[i] < upperBound1:
        mask[i] = True & pmask[i]
if len(positionX[mask]) > 0:
    ax.scatter(positionX[mask], positionY[mask], positionZ[mask],
               c='#424242', s=10, edgecolors='none')

# Same again for the second bin, with a different color and size.
for i in range(Number):
    mask[i] = False
    if lowerBound2 < variableOne[i] < upperBound2:
        mask[i] = True & pmask[i]
if len(positionX[mask]) > 0:
    ax.scatter(positionX[mask], positionY[mask], positionZ[mask],
               c='#9E0050', s=25, edgecolors='none')
I know it is not very elegant, but it worked for me. I had to write as many for loops as I had bins in my variables. With the if queries and the masks I could at least avoid redundant or 'unreadable' plots.
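For what it's worth, matplotlib's scatter does accept per-point arrays for s and c, so counts can also be encoded directly; a minimal sketch with hypothetical positions:

import numpy as np
import matplotlib.pyplot as plt
from collections import Counter

# Hypothetical labelled points: (x, y) positions, repeated once per occurrence.
points_class_a = [(0, 0), (0, 0), (0, 0), (1, 2), (1, 2), (3, 1)]

counts = Counter(points_class_a)              # occurrences per position
xs, ys = zip(*counts.keys())
n = np.array(list(counts.values()))

fig, ax = plt.subplots()
sc = ax.scatter(xs, ys, s=40 * n, c=n, cmap="viridis")  # size and colour encode the count
fig.colorbar(sc, ax=ax, label="occurrences")
plt.show()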