Vega-Lite: how to get a nested bar chart?

I'm fairly new to Vega-Lite and would like to get the following nested bar chart working.
This nested bar chart depicts aggregated values across multiple categories. The input data is subdivided according to two fields (with uneven category membership). Each sub-group is then aggregated to show the average value of a third, quantitative field.
Example in Vega: Nested Bar Chart Example
How can I do this without using the row facet?
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {
"values": [
{"a0": 0,"a": 0, "b": "a", "c": 6.3},
{"a0": 0,"a": 0, "b": "a", "c": 4.2},
{"a0": 0,"a": 0, "b": "b", "c": 6.8},
{"a0": 0,"a": 0, "b": "c", "c": 5.1},
{"a0": 0,"a": 1, "b": "b", "c": 4.4},
{"a0": 0,"a": 2, "b": "b", "c": 3.5},
{"a0": 0,"a": 2, "b": "c", "c": 6.2}
]
},
"transform": [
{"window": [{"op": "count", "as": "room2"}]}
],
"vconcat": [
{
"facet": {"column": {"field": "a0"},"row": {"field": "a"}},
"spec": {
"width": 100,
"encoding": {
"y": {"field": "room2", "type": "nominal","axis": null},
"x": {"value": 100, "type": "quantitative"}
},
"layer": [
{
"mark": {"type": "bar", "cornerRadius": 10},
"encoding": {
"color": {
"field": "room2"
}
}
}
]
}
}
]
}

I have converted the Vega Nested Bar Chart Example to Vega-Lite, which I hope helps you understand or resolve your case. If you need help with your own chart, I would need more details about your data and the result you expect. See the code below, or open it in the editor.
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": [
{"a": 0, "b": "a", "c": 6.3},
{"a": 0, "b": "a", "c": 4.2},
{"a": 0, "b": "b", "c": 6.8},
{"a": 0, "b": "c", "c": 5.1},
{"a": 1, "b": "b", "c": 4.4},
{"a": 2, "b": "b", "c": 3.5},
{"a": 2, "b": "c", "c": 6.2}
]
},
"transform": [
{
"aggregate": [{"field": "c", "as": "avg_c", "op": "average"}],
"groupby": ["a", "b"]
}
],
"vconcat": [
{
"facet": {"row": {"field": "a", "title": null, "header": null}},
"spec": {
"width": 300,
"encoding": {
"y": {"field": "b", "type": "nominal", "axis": null},
"x": {"field": "avg_c", "type": "quantitative"}
},
"layer": [
{
"mark": {"type": "bar", "cornerRadius": 10},
"encoding": {"color": {"field": "a"}}
},
{
"mark": {"type": "text", "align": "right", "dx": -5},
"encoding": {
"text": {"field": "b"},
"x": {"datum": 0, "type": "quantitative"}
}
}
]
}
}
]
}
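To make the aggregate step in the spec above concrete, here is a minimal Python sketch of what it computes: the average of "c" for every ("a", "b") pair. The helper name average_by is hypothetical and only illustrates the grouping; it is not part of Vega-Lite.

```python
from collections import defaultdict

# Same rows as the "values" array in the spec above.
rows = [
    {"a": 0, "b": "a", "c": 6.3},
    {"a": 0, "b": "a", "c": 4.2},
    {"a": 0, "b": "b", "c": 6.8},
    {"a": 0, "b": "c", "c": 5.1},
    {"a": 1, "b": "b", "c": 4.4},
    {"a": 2, "b": "b", "c": 3.5},
    {"a": 2, "b": "c", "c": 6.2},
]


def average_by(rows, keys, field):
    """Group rows by `keys` and return the mean of `field` per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[field])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Mirrors: {"aggregate": [{"field": "c", "op": "average", "as": "avg_c"}],
#           "groupby": ["a", "b"]}
avg_c = average_by(rows, ["a", "b"], "c")
for (a, b), value in sorted(avg_c.items()):
    print(a, b, round(value, 2))
```

Only the (0, "a") group contains two rows, so it is the only bar whose length is a true average; every other bar shows its single input value. The row facet on "a" then splits these six bars into three groups of uneven size.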

Related

How to use Data Studio bindings in a Vega chart code table?

I would like to implement an example of a RoseWid chart in Data Studio with the Vega community visualization, based on this great code from Stan Nowak. Once adapted for the Vega plugin, it looks like this:
{"$schema": "https://vega.github.io/schema/vega/v5.json",
"width": 400,
"height": 400,
"data": [
{
"name": "table",
"values": [
{"g": 1, "c": 0, "r": 11},
{"g": 1, "c": 1, "r": 22},
{"g": 1, "c": 2, "r": 13},
{"g": 2, "c": 0, "r": 24},
{"g": 2, "c": 1, "r": 35},
{"g": 2, "c": 2, "r": 36},
{"g": 3, "c": 0, "r": 42},
{"g": 3, "c": 1, "r": 32},
{"g": 3, "c": 2, "r": 32},
{"g": 4, "c": 0, "r": 6},
{"g": 4, "c": 1, "r": 27},
{"g": 4, "c": 2, "r": 16},
{"g": 5, "c": 0, "r": 52},
{"g": 5, "c": 1, "r": 79},
{"g": 5, "c": 2, "r": 38},
{"g": 6, "c": 0, "r": 19},
{"g": 6, "c": 1, "r": 83},
{"g": 6, "c": 2, "r": 3}
]
},
{
"name": "angles",
"source": "table",
"transform": [
{"type": "aggregate", "groupby": ["g"]},
{"type": "pie"}
]
},{
"name": "stack",
"source": "table",
"transform": [
{"type": "stack", "groupby": ["g"], "sortby": ["c"], "field": "r"},
{"type": "lookup", "from": "angles", "key": "g", "fields": ["g"], "as": ["obj"]}
]
}
],
"scales": [
{
"name": "color",
"type": "linear",
"domain": {"data": "stack", "field": "c"},
"range": {"scheme": "redyellowgreen"}
},
{
"name": "r",
"type": "sqrt",
"domain": {"data": "table", "field": "y"},
"range": [20, 200]
}
],
"marks": [
{
"type": "arc",
"from": {"data": "stack"},
"encode": {
"enter": {
"x": {"field": {"group": "width"}, "mult": 0.5},
"y": {"field": {"group": "height"}, "mult": 0.5},
"startAngle": {"data": "table", "field": "obj.startAngle"},
"endAngle": {"data": "table", "field": "obj.endAngle"},
"innerRadius": {"field": "y0"},
"outerRadius": {"field": "y1"},
"stroke": {"value": "black"}
},
"update": {"fill": {"scale": "color", "field": "c"}},
"hover": {"fill": {"value": "red"}}
}
}
],
"config": {}
}
The dataset has been arranged in Data Studio to have the same structure as the provided sample table (3 columns with the same structure as the inline table). However, I'm stuck on how to replace the table with the Data Studio bindings (namely $dimension0, $dimension1 and $metric0).
So far I have tried:
"data": [
{
"name": "table",
"values": ["$dimension0","$dimension1","$metric0"],
"as": ["g","c","r"]
},
....
and some variations on this, all without result. The visualization stays blank and there's little information on what is failing.
Any help would be greatly appreciated.
EDIT: Here's a Google Sheet that reproduces the structure of the table in the inline code and would be used as the data source in Data Studio. You can find an example of the working visualization and my attempts to solve this here.
First of all, there is an error in your example Vega source: the stack transform does not have a "sortby" parameter. Instead, specify "sort":
"sort": { "field": ["c"],
"order": ["ascending"]
},
View example in Vega online editor
See the Vega documentation for the stack transform.
I have no experience with Google Data Studio, but its "Visualizations" web page has examples of using Vega in Data Studio: "Data Studio Vega Viz" by Jerry Chen.
Based on his examples, "$dimension0", "$dimension1" and "$metric0" can be used directly as Vega data fields.
Your question is how to use the Vega source that already has field names "g", "c" and "r". One way is to globally replace all "g", "c" and "r" with "$dimension0","$dimension1" and "$metric0" as shown in Jerry Chen's examples.
An alternative approach is to create formula fields "g", "c" and "r" as substitutes for "$dimension0","$dimension1" and "$metric0" as shown below. If this works then it is a general solution for converting any existing Vega spec to Google Data Studio and also for retaining meaningful field names in the Vega spec.
"data": [
{ "name": "table",
"transform": [
{
"type": "formula",
"as": "g",
"expr": "datum['$dimension0']"
},
{
"type": "formula",
"as": "c",
"expr": "datum['$dimension1']"
},
{
"type": "formula",
"as": "r",
"expr": "datum['$metric0']"
}
]
},
{
"name": "angles",
"source": "table",
"transform": [
{"type": "aggregate", "groupby": ["g"]},
{"type": "pie"}
]
}
... etc
I came across the same issue and managed to solve it after much trial and error.
To bind to Data Studio, you have to name your data source "default"; you can then rename the fields with a "type": "project" transform in Vega.
"data": [
{
"name": "default",
"transform" : [{ "type" : "project",
"fields" : ["$dimension0","$dimension1","$metric0"],
"as": ["g","c","r"]}
]
},
....
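As a rough illustration of what the project transform above does with "fields" and "as", the following Python sketch renames keys position by position. The incoming rows are hypothetical: they assume Data Studio hands over one object per record keyed by "$dimension0", "$dimension1" and "$metric0", which the answer implies but the thread does not show.

```python
def project(rows, fields, as_):
    """Keep only `fields` from each row and rename them to `as_`, position by position."""
    return [{new: row[old] for old, new in zip(fields, as_)} for row in rows]


# Hypothetical rows in the shape Data Studio is assumed to deliver.
incoming = [
    {"$dimension0": 1, "$dimension1": 0, "$metric0": 11},
    {"$dimension0": 1, "$dimension1": 1, "$metric0": 22},
]

table = project(
    incoming,
    ["$dimension0", "$dimension1", "$metric0"],
    ["g", "c", "r"],
)
print(table)
```

After the rename, the rest of the spec can keep referring to "g", "c" and "r", provided the downstream data sets use the renamed data set as their source.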

Use data from datasets

I am trying to use data from datasets. Here is the Vega-Lite spec:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"datasets": {
"data_2": [
{"category": "A", "sex": 1, "people": 1483789},
{"category": "B", "sex": 2, "people": 1450376},
{"category": "C", "sex": 1, "people": 2411067}
],
"data_1": [
{"a": "A", "b": 28},
{"a": "B", "b": 55},
{"a": "C", "b": 43}
]
},
"concat": [
{
"data": {"name": "data_1"},
"encoding": {
"x": {"type": "nominal", "field": "a"},
"y": {"type": "quantitative", "field": "b"}
},
"mark": "bar"
},
{
"data": {"name": "data_2"},
"encoding": {
"color": {"field": "sex"},
"x": {"title": "population", "field": "category", "sort": "color"},
"y": {"type": "quantitative", "field": "people"}
},
"mark": "bar"
}
]
}
It seems that Vega doesn't find the data referenced by name (data_1 or data_2).
Tested in the Vega editor, it returns the following message: Duplicate data set name: "data_1"
All advice is welcome.
Erwan
I am definitely not a Vega-Lite superhero, but Vega-Lite compiles to Vega.
You can see what it produces in the online editor. At the bottom you can see: compile to Vega. You can then open that Vega spec in the online editor and see that it does something with the datasets in which it uses the name data_1 for something else.
So I think there is nothing wrong with your Vega-Lite code; just rename the datasets data_1 and data_2 to something else, and it will be fine.

Ordering of a faceted Bar-chart in Vega

I have a question about facet visualizations in Vega.
My problem is very similar to the nested bar chart example.
Here I changed the order of the tuple values and, as expected, the order in the visualization changes too.
But I want the 3 facets ordered by ascending value of "a".
I don't really understand whether there is a "scale" for the facets, or where to order them.
The second part would be to have an axis for the value "a" shown below or above the bar chart.
I hope someone can help me.
Greetings
Christian
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"description": "A nested bar chart example, with bars grouped by category.",
"width": 300,
"padding": 5,
"autosize": "pad",
"signals": [
{
"name": "rangeStep", "value": 20,
"bind": {"input": "range", "min": 5, "max": 50, "step": 1}
},
{
"name": "innerPadding", "value": 0.1,
"bind": {"input": "range", "min": 0, "max": 0.7, "step": 0.01}
},
{
"name": "outerPadding", "value": 0.2,
"bind": {"input": "range", "min": 0, "max": 0.4, "step": 0.01}
},
{
"name": "height",
"update": "trellisExtent[1]"
}
],
"data": [
{
"name": "tuples",
"values": [
{"a": 1, "b": "b", "c": 4.4},
{"a": 0, "b": "c", "c": 5.1},
{"a": 0, "b": "a", "c": 6.3},
{"a": 0, "b": "a", "c": 4.2},
{"a": 0, "b": "b", "c": 6.8},
{"a": 2, "b": "b", "c": 3.5},
{"a": 2, "b": "c", "c": 6.2}
],
"transform": [
{
"type": "aggregate",
"groupby": ["a", "b"],
"fields": ["c"],
"ops": ["average"],
"as": ["c"]
}
]
},
{
"name": "trellis",
"source": "tuples",
"transform": [
{
"type": "aggregate",
"groupby": ["a"]
},
{
"type": "formula", "as": "span",
"expr": "rangeStep * bandspace(datum.count, innerPadding, outerPadding)"
},
{
"type": "stack",
"field": "span"
},
{
"type": "extent",
"field": "y1",
"signal": "trellisExtent"
}
]
}
],
"scales": [
{
"name": "xscale",
"domain": {"data": "tuples", "field": "c"},
"nice": true,
"zero": true,
"round": true,
"range": "width"
},
{
"name": "color",
"type": "ordinal",
"range": "category",
"domain": {"data": "trellis", "field": "a"}
}
],
"axes": [
{ "orient": "bottom", "scale": "xscale", "domain": true }
],
"marks": [
{
"type": "group",
"from": {
"data": "trellis",
"facet": {
"name": "faceted_tuples",
"data": "tuples",
"groupby": "a"
}
},
"encode": {
"enter": {
"x": {"value": 0},
"width": {"signal": "width"}
},
"update": {
"y": {"field": "y0"},
"y2": {"field": "y1"}
}
},
"scales": [
{
"name": "yscale",
"type": "band",
"paddingInner": {"signal": "innerPadding"},
"paddingOuter": {"signal": "outerPadding"},
"round": true,
"domain": {"data": "faceted_tuples", "field": "b"},
"range": {"step": {"signal": "rangeStep"}}
}
],
"axes": [
{ "orient": "left", "scale": "yscale",
"ticks": false, "domain": false, "labelPadding": 4 }
],
"marks": [
{
"type": "rect",
"from": {"data": "faceted_tuples"},
"encode": {
"enter": {
"x": {"value": 0},
"x2": {"scale": "xscale", "field": "c"},
"fill": {"scale": "color", "field": "a"},
"strokeWidth": {"value": 2}
},
"update": {
"y": {"scale": "yscale", "field": "b"},
"height": {"scale": "yscale", "band": 1},
"stroke": {"value": null},
"zindex": {"value": 0}
},
"hover": {
"stroke": {"value": "firebrick"},
"zindex": {"value": 1}
}
}
}
]
}
]
}

multi-histogram plot with Vegalite

I would like to create a single visual showing multiple histograms. I have simple arrays of values, like so:
"data": {"values": {"foo": [0,0,0,1,1,1,2,2,2], "baz": [2,2,2,3,3,3,4,4,4]}}
I want to use different color bars to show the spread of values for "foo" and "baz". I am able to make a single histogram for "foo" like so:
{
"data": {"values": {"foo": [0,0,0,1,1,1,2,2,2]}},
"mark": "bar",
"transform": [{"flatten": ["foo"]}],
"encoding": {
"x": {"field": "foo", "type": "quantitative"},
"y": {"field": "foo", "type": "quantitative", "aggregate": "count"}
}
}
However, I cannot find the correct way to flatten out the arrays. This doesn't work:
{
"data": {"values": {"foo": [0,0,0,1,1,1,2,2,2], "baz": [0,0,0,1,1,1,2,2,2]}},
"mark": "bar",
"transform": [{"flatten": ["foo", "baz"]}],
"encoding": {
"x": {"field": "foo", "type": "quantitative"},
"y": {"field": "foo", "type": "quantitative", "aggregate": "count"}
},
"layer": [{
"mark": "bar",
"encoding": {
"y": {"field": "baz", "type": "quantitative", "aggregate": "count"}
}
}]
}
https://vega.github.io/editor/#/url/vega-lite/N4IgJghgLhIFygG4QDYFcCmBneoBmA9gfANoAMANJZQIwV10BMFzjAuhSAEYQBep1KvWFMWLNgF8JnALYQATgGt43BSE5R5EAHZZC8maXwpoUDNtIhCxTj36SOIcwGMCYAJbaA5rhAAPXzx3DBQwFWt1ECgATwAHDBUARzQdKHcYNMQE6RBowODQ8KJImPiklO00jPcsyIgvL3kML2gEuBBXNEqQKU4TaIx5IxA5JRUeIc4XN08fBFz8kLD2uxK4tpBk1PToGoTOesbm1pVO7qkJSSA
Inspecting data_0, there are columns for foo and its counts, but nothing for baz.
This doesn't work, either:
{
"data": {
"values": {
"foo": [0, 0, 0, 1, 1, 1, 2, 2, 2],
"baz": [0, 0, 0, 1, 1, 1, 2, 2, 2]
}
},
"mark": "bar",
"transform": [{"flatten": ["foo"]},{"flatten": ["baz"]}],
"encoding": {
"x": {"field": "foo", "type": "quantitative"},
"y": {"field": "foo", "type": "quantitative", "aggregate": "count"}
},
"layer": [
{
"mark": "bar",
"encoding": {
"y": {"field": "baz", "type": "quantitative", "aggregate": "count"}
}
}
]
}
https://vega.github.io/editor/#/url/vega-lite/N4IgJghgLhIFygG4QDYFcCmBneoBmA9gfANoAMANJZQIwV10BMFzjAuhSAEYQBep1KvWFMWLNgF8JnALYQATgGt43BSE5R5EAHZZC8maXwpoUDNtIhCxSRWOnzlnv0kcQ5gMYEwAS20BzXBAADyC8HwwUMBVrdRAoAE8ABwwVAEc0HSgfGGzEVOkQBLCIqJiiOMSU9MztbNyffLiIf395DH9oVLgQLzQ6kClOEwSMeSMQOSUVHnHOT28-QIQiksjonudK5O6QDKyc6EbUzha2jq6VPoGpCUkgA
That still only gives columns for foo and its count, but now the count is 27 for each bucket!
How can I accomplish a multi-histogram graphic starting with array data?
You can do this using a flatten transform followed by a fold transform, and then use a color encoding to separate the two datasets. For example (open in editor):
{
"data": {
"values": {
"foo": [0, 0, 1, 1, 1, 1, 2, 2, 2],
"baz": [4, 4, 5, 5, 6, 6, 6, 6, 7]
}
},
"transform": [{"flatten": ["foo", "baz"]}, {"fold": ["foo", "baz"]}],
"mark": "bar",
"encoding": {
"x": {"field": "value", "type": "quantitative"},
"y": {
"field": "value",
"type": "quantitative",
"aggregate": "count",
"stack": null
},
"color": {"field": "key", "type": "nominal"}
}
}
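The following Python sketch mimics the flatten-then-fold pipeline above on the same arrays, to show which rows the bar mark ends up counting. The functions flatten and fold are simplified stand-ins written for this illustration, not the Vega-Lite implementation. The last part reproduces the question's second attempt, where two separate flatten steps yield a cross product and therefore a count of 27 per bucket.

```python
from collections import Counter
from itertools import zip_longest


def flatten(rows, fields):
    """Expand the listed array fields in parallel: one output row per array index."""
    out = []
    for row in rows:
        for values in zip_longest(*(row[f] for f in fields)):
            out.append({**row, **dict(zip(fields, values))})
    return out


def fold(rows, fields):
    """Emit one (key, value) row per listed field of every input row."""
    return [{**row, "key": f, "value": row[f]} for row in rows for f in fields]


data = {"foo": [0, 0, 1, 1, 1, 1, 2, 2, 2], "baz": [4, 4, 5, 5, 6, 6, 6, 6, 7]}

flat = flatten([data], ["foo", "baz"])   # 9 rows: {"foo": 0, "baz": 4}, ...
long_rows = fold(flat, ["foo", "baz"])   # 18 rows with "key" and "value"
counts = Counter((r["key"], r["value"]) for r in long_rows)
print(sorted(counts.items()))

# The question's second attempt: flatten "foo", then flatten "baz" separately.
question_data = {"foo": [0, 0, 0, 1, 1, 1, 2, 2, 2], "baz": [0, 0, 0, 1, 1, 1, 2, 2, 2]}
crossed = flatten(flatten([question_data], ["foo"]), ["baz"])  # 9 x 9 = 81 rows
foo_counts = Counter(r["foo"] for r in crossed)
print(foo_counts)
```

Flattening both fields in one step pairs the arrays index by index, while chaining two flatten steps repeats every foo value once per baz value. The fold step then turns the two columns into a single "value" column tagged by "key", which is what the color encoding splits on.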
As an aside, your layer approach also works if you put the encodings in separate layers, so that the outer foo aggregate doesn't clobber the baz data, but it's a bit more verbose than the approach based on fold:
{
"data": {
"values": {
"foo": [0, 0, 1, 1, 1, 1, 2, 2, 2],
"baz": [4, 4, 5, 5, 6, 6, 6, 6, 7]
}
},
"transform": [{"flatten": ["foo", "baz"]}],
"layer": [
{
"mark": {"type": "bar", "color": "orange"},
"encoding": {
"x": {"field": "foo", "type": "quantitative"},
"y": {"field": "foo", "type": "quantitative", "aggregate": "count"}
}
},
{
"mark": "bar",
"encoding": {
"x": {"field": "baz", "type": "quantitative"},
"y": {"field": "baz", "type": "quantitative", "aggregate": "count"}
}
}
]
}

Calculations in vega-lite

Is there a way to dynamically calculate growth rates in Vega-Lite?
For example:
[
{"date": "1/1/2020", "b": 27},
{"date": "1/2/2020", "b": 30},
{"date": "1/3/2020", "b": 33}
]
How could I create data (and a chart) that shows the daily +3 (or the ~+10%)?
Edit: Thanks for the answer, @jakevdp.
I should have outlined the added complexity earlier; apologies. I need to aggregate prior to calculating changes. See below for the data and my attempt (dates seem offset and the last date's drop doesn't make sense).
[Vega Editor][1]
{
"data": {
"values": [
{"date": "2020-01-01", "country": "CHN", "count": 0},
{"date": "2020-01-02", "country": "CHN", "count": 2},
{"date": "2020-01-03", "country": "CHN", "count": 4},
{"date": "2020-01-01", "country": "GER", "count": 0},
{"date": "2020-01-02", "country": "GER", "count": 2},
{"date": "2020-01-03", "country": "GER", "count": 4},
{"date": "2020-01-04", "country": "GER", "count": 6}
]
},
"transform": [
{
"aggregate": [{"op": "sum", "field":"count", "as":"daily_count"}],
"groupby": ["date"]
},
{
"window": [
{"op": "lead", "field": "daily_count", "as": "daily_count_tomorrow"}
]
},
{"filter": "isValid(datum.daily_count_tomorrow)"},
{"calculate": "datum.daily_count_tomorrow - datum.daily_count", "as": "change"}
],
"mark": "bar",
"encoding": {
"x": {"type": "ordinal", "field": "date", "timeUnit": "yearmonthdate"},
"y": {"type": "quantitative", "field": "change"}
}
}
[1]: https://vega.github.io/editor/#/url/vega-lite/N4KABGBEAmCGAutIC4yghSA3WAbArgKYDOKYA2uBhMDAoWZAEwAMrAtCwIydeQA0UAMYB7fADt4AJwCejAMIAJAHIDhYyWRYBfflWq048BqmZsWvTkzWRRE6XNNLVg2xvhkmu-RkP1GrBzcnADMNnaSsgoq4e5kACzePjR0xgHmltyx9lGmAOIAogBK2ZqoOnrUKUYmUIEWwWylDoyFJa4RHqhelVV+aab1mWEd7rlQbc0J3lVoqbVmQTws8c3jkJOj9mQAbNo+ALpU3pDSsOLEAGYiUgC2ZJQGyVCwAOavUoSv-qjktCIAB0YxHw91clwAloRcNAUG5tq5YKRkHQIbgZAB9TqQbQHXrUSAfMQAgBGjgo80gR2oM18z0gAHcIeJoCIGQ9nilAYxcIRYLDwVCYYw4GjMdjEcioKL0Vj3Bj4CJbjcpGycc9qRhaSlIbhjFJGBDiAA1PAQ6AACiMoIAdDLxfLFcqpKqGQBKHH4uZCPBCfC4H7ShC2+1y+wKpUqtlgdhga23O2wMVhzSSxhCAAW51eDH2EDxICokFusCkAGtGCTSwIi4RxKJoMzXmR0BhIAAPFunGQAhY3RviPA2SHQ2GmGo2eAQ26EACq4ghXSgMj5dxEkgzE+1y678B7CwAjvhzlPEFOsAxBaP01nxDn1RB9togA
Yes, you can do this using the window transform with the lead or lag operation. For example (vega editor):
{
"data": {
"values": [
{"date": "2020-01-01", "b": 29},
{"date": "2020-01-02", "b": 30},
{"date": "2020-01-03", "b": 32},
{"date": "2020-01-04", "b": 31},
{"date": "2020-01-05", "b": 34}
]
},
"transform": [
{"window": [{"op": "lead", "field": "b", "as": "b1"}]},
{"filter": "isValid(datum.b1)"},
{"calculate": "datum.b1 - datum.b", "as": "change"}
],
"mark": "bar",
"encoding": {
"x": {"type": "ordinal", "field": "date", "timeUnit": "yearmonthdate"},
"y": {"type": "quantitative", "field": "change"},
"color": {
"condition": {"test": "datum.change > 0", "value": "green"},
"value": "red"
}
}
}
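For the aggregated case described in the question's edit, a small Python sketch of the same steps (sum per date, then a difference against the next or previous day) may help when reading the chart. It uses the data from the question; the variable names are illustrative only, and it is one possible reading of the follow-up rather than a confirmed diagnosis.

```python
from collections import defaultdict

# Data from the question's edit.
rows = [
    {"date": "2020-01-01", "country": "CHN", "count": 0},
    {"date": "2020-01-02", "country": "CHN", "count": 2},
    {"date": "2020-01-03", "country": "CHN", "count": 4},
    {"date": "2020-01-01", "country": "GER", "count": 0},
    {"date": "2020-01-02", "country": "GER", "count": 2},
    {"date": "2020-01-03", "country": "GER", "count": 4},
    {"date": "2020-01-04", "country": "GER", "count": 6},
]

# Aggregate step: sum of "count" per date.
daily = defaultdict(int)
for row in rows:
    daily[row["date"]] += row["count"]
dates = sorted(daily)

# "lead": each day is compared with the NEXT day, so the change is
# attached to the earlier date and the last date has no value.
lead_change = {d: daily[nxt] - daily[d] for d, nxt in zip(dates, dates[1:])}

# "lag": each day is compared with the PREVIOUS day, so the change is
# attached to the later date and the first date has no value.
lag_change = {d: daily[d] - daily[prev] for prev, d in zip(dates, dates[1:])}

print(dict(daily))
print(lead_change)
print(lag_change)
```

With lead, the change from one day to the next is plotted on the earlier date, which may be one reason the dates look shifted; using lag attaches it to the later date instead. The negative final value follows from the data itself: CHN has no row for 2020-01-04, so that day's total contains only GER.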