Clamping y-axis when layering aggregated charts in Vega-Lite

This is a follow-up to a previous question. I built a test case in a (hopefully now public) notebook and noticed the following behavior:
At the end of the notebook, in the "bugs" section, you will notice that the y-axis for max_precipitation in the layered chart is clamped to 10.
I tried changing the domain, but the bars do not go above 10.
Here is the code example in Vega-Lite's editor, reproduced below:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"title": "Top Months by Mean Precipitation",
"data": {"url": "data/seattle-weather.csv"},
"transform": [
{"timeUnit": "month", "field": "date", "as": "month_date"},
{
"aggregate": [
{"op": "mean", "field": "precipitation", "as": "mean_precipitation"},
{"op": "max", "field": "precipitation", "as": "max_precipitation"}
],
"groupby": ["month_date"]
},
{
"window": [{"op": "row_number", "as": "rank"}],
"sort": [{"field": "mean_precipitation", "order": "descending"}]
}
],
"encoding": {
"x": {
"field": "month_date",
"type": "ordinal",
"timeUnit": "month",
"title": "month (descending by max precip)",
"sort": {
"field": "max_precipitation",
"op": "average",
"order": "descending"
}
}
},
"layer": [
{
"mark": {"type": "bar"},
"encoding": {
"y": {
"field": "max_precipitation",
"type": "quantitative",
"title": "precipitation (mean & max)"
}
}
},
{
"mark": "tick",
"encoding": {
"y": {"field": "mean_precipitation", "type": "quantitative"},
"color": {"value": "red"},
"size": {"value": 15}
}
}
]
}
Please help me understand what I am doing wrong.

It appears that the precipitation column is being parsed as strings rather than as numbers. You can specify the parsing format for the column using the "parse" property of the data format:
"data": {
"url": "data/seattle-weather.csv",
"format": {"parse": {"precipitation": "number"}}
},
With this change, max_precipitation is computed on numbers and the bars are no longer capped at 10.
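Why would string parsing produce a cap near 10 specifically? A minimal Python illustration, assuming Vega's `max` aggregate compares unparsed strings lexicographically the way Python's `max` does: any value whose first digit is 9 outranks a two-digit value such as 55.9, so the largest "maximum" you can see is just under 10. The sample values below are made up.

```python
# Hypothetical precipitation readings as they arrive from a CSV file:
# every field is text until something parses it.
raw = ["0.0", "9.4", "55.9", "10.9", "32.5"]

# Comparing strings goes character by character, so "9.4" beats "55.9".
string_max = max(raw)

# Comparing parsed numbers gives the real maximum, which is what
# {"parse": {"precipitation": "number"}} makes possible in the spec.
numeric_max = max(float(v) for v in raw)

print(string_max)   # 9.4
print(numeric_max)  # 55.9
```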

Related

Why is the column facet in Vega-Lite not working properly with layers?

I'm trying to create a three-column plot, and it works when there's no layer.
But when I add the layer, the three columns get merged into one plot (open in editor).
How can I split it into three columns by the duration field?
CODE
For the plot with the full data, please use the editor link above.
{
"encoding": {
"column": { "field": "duration", "type": "nominal" },
"x": { "field": "bin_i", "type": "ordinal" }
},
"layer": [
{
"mark": { "type": "bar", "size": 2 },
"encoding": {
"y": { "field": "min", "type": "quantitative" },
"y2": { "field": "max", "type": "quantitative" }
}
},
{
"mark": { "type": "tick" },
"encoding": {
"y": { "field": "mean", "type": "quantitative" }
}
}
],
"data": {
"values": [
{
"bin_i": 1,
"duration": 1,
"max": 1.9642835793718165,
"mean": 1.0781367168962268,
"min": 0.3111818864927448
},
...
]
}
}
A layered chart does not accept facet encodings (row or column). If you want to facet a layered chart, use the facet operator rather than a facet encoding.
For your example, it would look like this (Vega Editor):
{
"facet": {"column": {"field": "duration", "type": "nominal"}},
"spec": {
"encoding": {
"x": {"field": "bin_i", "type": "ordinal"}
},
"layer": [
{
"mark": {"type": "bar", "size": 2},
"encoding": {
"y": {"field": "min", "type": "quantitative"},
"y2": {"field": "max"}
}
},
{
"mark": {"type": "tick"},
"encoding": {
"y": {"field": "mean", "type": "quantitative"}
}
}
]
},
"data": {
"values": [
{
"bin_i": 1,
"duration": 1,
"max": 1.9642835793718165,
"mean": 1.0781367168962268,
"min": 0.3111818864927448
},
...
]
}
}
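If you have several specs to convert, the restructuring can be done mechanically. Here is a minimal sketch in Python; `facet_layered_spec` is a hypothetical helper (not part of Vega-Lite or Altair) that moves the row/column channels out of the shared encoding into a top-level `facet` and wraps everything else in `spec`, keeping `$schema` and `data` at the top level.

```python
import json

# Keys that stay at the top level of the faceted spec.
TOP_LEVEL_KEYS = ("$schema", "data")
# Encoding channels that become the facet operator.
FACET_CHANNELS = ("row", "column")


def facet_layered_spec(spec):
    """Rewrite a layered spec that uses row/column encodings into the
    facet operator form {"facet": ..., "spec": {...}} (hypothetical helper)."""
    spec = json.loads(json.dumps(spec))  # deep copy; leave the caller's dict alone
    top = {k: spec.pop(k) for k in TOP_LEVEL_KEYS if k in spec}
    encoding = spec.pop("encoding", {})
    facet = {ch: encoding.pop(ch) for ch in FACET_CHANNELS if ch in encoding}
    if encoding:  # shared channels such as x stay with the layers
        spec["encoding"] = encoding
    return {**top, "facet": facet, "spec": spec}


layered = {
    "encoding": {
        "column": {"field": "duration", "type": "nominal"},
        "x": {"field": "bin_i", "type": "ordinal"},
    },
    "layer": [
        {"mark": {"type": "bar", "size": 2},
         "encoding": {"y": {"field": "min", "type": "quantitative"},
                      "y2": {"field": "max"}}},
        {"mark": {"type": "tick"},
         "encoding": {"y": {"field": "mean", "type": "quantitative"}}},
    ],
    "data": {"values": []},
}

faceted = facet_layered_spec(layered)
print(json.dumps(faceted, indent=2))
```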

Vega-Lite single line or trail mark with multiple colours

I'm trying to plot something like a trail mark but where I can map the line colour instead of the line size. Is that possible? So far I haven't been able to achieve it.
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"description": "Google's stock price over time.",
"data": {"url": "data/stocks.csv"},
"transform": [
{"filter": "datum.symbol==='GOOG'"},
{"calculate": "datum.price>400", "as": "good"}
],
"mark": "trail",
"encoding": {
"x": {"field": "date", "type": "temporal"},
"y": {"field": "price", "type": "quantitative"},
"size": {"field": "good", "type": "nominal"}
}
}
This is the result when using size with a trail mark.
And this is the result when I map the field to color.
A single line cannot be drawn in multiple colors in Vega-Lite, but you can use a color encoding along with an impute setting on the y encoding to change the color of different sections of the line (vega editor):
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"description": "Google's stock price over time.",
"data": {"url": "data/stocks.csv"},
"transform": [
{"filter": "datum.symbol==='GOOG'"},
{"calculate": "datum.price>400", "as": "good"}
],
"mark": "line",
"encoding": {
"x": {"field": "date", "type": "temporal"},
"y": {"field": "price", "type": "quantitative", "impute": {"value": null}},
"color": {"field": "good", "type": "nominal"}
}
}
Unfortunately, this leaves breaks in the line; you can get around this by adding a background layer, like this (vega editor):
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"description": "Google's stock price over time.",
"data": {"url": "data/stocks.csv"},
"transform": [
{"filter": "datum.symbol==='GOOG'"},
{"calculate": "datum.price>400", "as": "good"}
],
"encoding": {
"x": {"field": "date", "type": "temporal"},
"y": {"field": "price", "type": "quantitative", "impute": {"value": null}}
},
"layer": [
{"mark": "line"},
{
"mark": "line",
"encoding": {"color": {"field": "good", "type": "nominal"}}
}
]
}
Edit: if you're using Altair, the equivalent would be something like this:
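import altair as alt
from vega_datasets import data
# Two line layers share the data, transforms, and x/y encodings below:
# an uncolored background line plus a line colored by the "good" flag.
alt.layer(
alt.Chart().mark_line(),
alt.Chart().mark_line().encode(color='good:N'),
data=data.stocks.url
).transform_filter(
'datum.symbol==="GOOG"'
).transform_calculate(
good="datum.price>400"
).encode(
x='date:T',
# impute null so each color group breaks instead of bridging its gaps
y=alt.Y('price:Q', impute={'value': None})
)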
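To see where the breaks come from, here is a rough Python model, assuming a line mark draws one path per color group and stops at null values (which is what `impute` with `"value": null` inserts for x positions a group lacks). The prices are hypothetical. The step from a "false" point to the next "true" point belongs to neither colored group, so it is never drawn; the uncolored background layer is one unbroken path and fills those gaps.

```python
# Hypothetical prices; the flag mirrors the spec's calculate "datum.price>400".
prices = [380, 395, 410, 420, 399, 405]
flags = [p > 400 for p in prices]


def impute_group(values, flags, key):
    """Series for one color group: keep values whose flag equals key and
    put None elsewhere, like impute {"value": null} applied to that group."""
    return [v if f == key else None for v, f in zip(values, flags)]


def drawn_segments(series):
    """Split a series into the connected runs a line mark would draw;
    a None value ends the current run (a break in the line)."""
    runs, current = [], []
    for x, y in enumerate(series):
        if y is None:
            if current:
                runs.append(current)
            current = []
        else:
            current.append((x, y))
    if current:
        runs.append(current)
    return runs


good_runs = drawn_segments(impute_group(prices, flags, True))
bad_runs = drawn_segments(impute_group(prices, flags, False))
background = drawn_segments(prices)  # the uncolored layer: one unbroken run

print(good_runs)        # [[(2, 410), (3, 420)], [(5, 405)]]
print(bad_runs)         # [[(0, 380), (1, 395)], [(4, 399)]]
print(len(background))  # 1
```

Note that single-point runs such as `(4, 399)` draw nothing visible as a line, which is another reason the background layer helps.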

Best way to split data into several lines?

I have data like this:
2021 43466.822 medium variant
2021 43510.982 high variant
2021 43416.407 low variant
2021 43468.429 constant fertility
2021 43580.45 instant replacement
And need to get chart:
https://image.prntscr.com/image/eBKqmOUsSa_6PBlomh5Erg.png
I have tried the fold transform, but it does not help. Making a separate layer for each variant would take a lot of code. Is there a smarter way? I will also need a legend like the one shown.
vegalite({
height:300,
autosize: "fit",
width:width,
title: {text:"Ukraine Population Prospects",
subtitle:"Total population, million"
},
data: {
url:"https://gist.githubusercontent.com/turiy/005f2ce11637fefcde8e9d6efdb0c2e6/raw/19e67bb3a6d63e7fd9f49a596e5d24404469bd63/population_prospects.csv"},
transform: [{"calculate": "datum.population/1000", "as": "population"},{fold:["medium variant","high variant", "low variant", "constant fertility","instant replacement", "momentum", "zero migration", "constant mortality", "no change"]}],
layer: [
{ mark: "line",
encoding:{
"x": {
"timeUnit": "utcyear",
"field": "year",
"type": "temporal",
"axis": {
"values":[1950,1991,2020,2100],
"domain": false,
"gridDash": {"value": [1,1]}
}
},
"y": {
"field": "population",
"type": "quantitative",
"scale": {"domain": [15,55]},
"axis": {
"domain": false ,
"gridDash": {"value": [1,1]}
}
}
},
color: {"value":"#0000ff"},
transform:[{filter:{"timeUnit": "utcyear", "field": "year", "range": [1950, 2020]}}]
},
{
mark: "line",legend:{title:"low variant"},
encoding:{
x: {
"timeUnit": "utcyear",
"field": "year",
"type": "temporal",
"axis": {
"values":[1950,1991,2020,2100],
"domain": false,
"gridDash": {"value": [1,1]}}
},
y: {
"field": "population",
"type": "quantitative",
"scale": {"domain": [15,55]},
"axis": {
"domain": false ,
"gridDash": {"value": [1,1]}
}
},
legends:{
"orient": "top-right",
"stroke": "color",
"title": "Origin",
"encode": {
"symbols": {
"update": {
"fill": {"value": ""},
"strokeWidth": {"value": 2},
"size": {"value": 64}
}
}
}
},
color: {"field": "key", "type":"nominal"}
},
transform:[{filter:{"timeUnit": "year", "field": "year", "range": [2020, 2100]}},
{filter:{field:"type", "equal":"low variant"}}]
}
]})
And I am getting this instead: https://image.prntscr.com/image/3Y9WNk4SQzGYWDr2JKWV9A.png
If your variants are listed by name in a column as in your example dataset, you can use a detail encoding to split them into different lines (vega editor link):
{
"data": {
"url": "https://gist.githubusercontent.com/turiy/005f2ce11637fefcde8e9d6efdb0c2e6/raw/19e67bb3a6d63e7fd9f49a596e5d24404469bd63/population_prospects.csv"
},
"mark": "line",
"encoding": {
"detail": {"type": "nominal", "field": "type"},
"x": {"type": "quantitative", "field": "year"},
"y": {"type": "quantitative", "field": "population"}
}
}
If you use color rather than detail, each line will be a different color and a legend will be included.
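Conceptually, a detail (or color) encoding asks Vega-Lite to group rows by the field and draw one path per group. A minimal Python sketch of that grouping; the 2021 populations come from the question, while the 2022 rows are hypothetical, and sorting by year reflects the line mark's default ordering along x.

```python
from collections import defaultdict

# Rows in the question's (year, population, type) shape.
# The 2021 values are from the question; the 2022 values are hypothetical.
rows = [
    (2021, 43466.822, "medium variant"),
    (2021, 43510.982, "high variant"),
    (2022, 43300.0, "medium variant"),
    (2022, 43420.0, "high variant"),
]


def split_lines(rows):
    """Group rows by type into one polyline each, points ordered by year."""
    lines = defaultdict(list)
    for year, population, kind in rows:
        lines[kind].append((year, population))
    return {kind: sorted(points) for kind, points in lines.items()}


lines = split_lines(rows)
for kind, points in lines.items():
    print(kind, points)
```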
To add labels at the right side of the chart, you can use a text mark with an aggregate transform; something like this:
{
"data": {
"url": "https://gist.githubusercontent.com/turiy/005f2ce11637fefcde8e9d6efdb0c2e6/raw/19e67bb3a6d63e7fd9f49a596e5d24404469bd63/population_prospects.csv"
},
"layer": [
{
"mark": "line",
"encoding": {
"detail": {"type": "nominal", "field": "type"},
"x": {"type": "quantitative", "field": "year"},
"y": {"type": "quantitative", "field": "population"}
}
},
{
"transform": [
{"filter": "datum.type != 'estimate'"},
{
"aggregate": [{"op": "argmax", "field": "year", "as": "rightmost"}],
"groupby": ["type"]
}
],
"mark": {"type": "text", "align": "left"},
"encoding": {
"text": {"type": "nominal", "field": "rightmost.type"},
"x": {"type": "quantitative", "field": "rightmost.year"},
"y": {"type": "quantitative", "field": "rightmost.population"}
}
}
],
"width": 400
}
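The label layer relies on `argmax`, which returns the whole row holding the largest `year` for each `type`; that is why the text encoding reads nested fields like `rightmost.year`. A small Python sketch of the same filter + argmax step, using hypothetical toy rows:

```python
# Hypothetical rows in the CSV's shape; the numbers are placeholders.
rows = [
    {"year": 2019, "population": 40, "type": "estimate"},
    {"year": 2020, "population": 41, "type": "estimate"},
    {"year": 2050, "population": 35, "type": "medium variant"},
    {"year": 2100, "population": 25, "type": "medium variant"},
    {"year": 2100, "population": 30, "type": "high variant"},
]


def rightmost_per_type(rows, exclude="estimate"):
    """Drop the excluded type, then keep, for each type, the whole row with
    the largest year (like aggregate argmax over "year" grouped by "type")."""
    best = {}
    for row in rows:
        if row["type"] == exclude:
            continue
        current = best.get(row["type"])
        if current is None or row["year"] > current["year"]:
            best[row["type"]] = row
    return best


labels = rightmost_per_type(rows)
for kind, row in labels.items():
    # The text mark shows `kind` at (row["year"], row["population"]).
    print(kind, row["year"], row["population"])
```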

vega-lite line plot - color not getting applied in transform filter

Vega Editor link here
I have an overlay color change based on a filter condition in a multi-line chart. I got it working with a single line here, but the red overlay line (along with the red dot) doesn't appear in the multi-line example above. Could anyone help me out?
Short answer: your chart is working, except the filtered values are not colored red.
The core issue is that encodings always supersede mark properties, as you can see in this simpler example: editor link
{
"$schema": "https://vega.github.io/schema/vega-lite/v3.json",
"description": "A scatterplot showing horsepower and miles per gallons.",
"data": {"url": "data/cars.json"},
"mark": {"type": "point", "color": "red"},
"encoding": {
"x": {"field": "Horsepower", "type": "quantitative"},
"y": {"field": "Miles_per_Gallon", "type": "quantitative"},
"color": {"field": "Origin", "type": "nominal"},
"shape": {"field": "Origin", "type": "nominal"}
}
}
Notice that although we specify that the mark should have the color red, this is overridden by the color encoding. This is by design within Vega-Lite, because encodings are more specific than mark properties.
Back to your chart: because you specify the color encoding in the parent chart, each individual layer inherits that color encoding, and those colors override the "color": "red" that you specify in the individual layers.
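The two rules at play can be modeled in a few lines of Python: a layer's encoding is the parent's encoding with the layer's own channels taking precedence, and an encoded color beats the mark's `color` property. This is a simplified model with hypothetical helper names, not Vega-Lite's actual resolution code.

```python
def layer_encoding(parent, layer):
    """Channels a layer ends up with: the parent's shared encoding,
    with any channel the layer sets itself taking precedence."""
    return {**parent, **layer}


def color_source(mark, encoding):
    """Which setting decides the color: an encoding channel wins over
    the mark property of the same name."""
    if "color" in encoding:
        return "encoding"
    if "color" in mark:
        return "mark"
    return "default"


red_line = {"type": "line", "color": "red"}
by_name = {"field": "name", "type": "nominal"}

# Original chart: color is encoded on the parent, so every layer inherits it.
before = color_source(red_line, layer_encoding({"color": by_name}, {}))

# Fixed chart: the parent only groups by name (detail), so the red layer keeps red.
after = color_source(red_line, layer_encoding({"detail": by_name}, {}))

print(before, after)  # encoding mark
```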
To make it do what you want, you can move the color encoding into the individual layers (and use a detail encoding to ensure the data are still grouped by that field). For example (editor link):
{
"$schema": "https://vega.github.io/schema/vega-lite/v3.json",
"data": {...},
"width": 1000,
"height": 200,
"autosize": {"type": "pad", "resize": true},
"transform": [
{
"window": [{"op": "rank", "as": "rank"}],
"sort": [{"field": "dateTime", "order": "descending"}]
},
{"filter": "datum.rank <= 100"}
],
"layer": [
{
"mark": {"type": "line"},
"encoding": {
"color": {
"field": "name",
"type": "nominal",
"legend": {"title": "Type"}
}
}
},
{
"mark": {"type": "line", "color": "red"},
"transform": [
{
"as": "count",
"calculate": "if(datum.anomaly == true, datum.count, null)"
},
{"calculate": "true", "as": "baseline"}
]
},
{
"mark": {"type": "circle", "color": "red"},
"transform": [
{"filter": "datum.anomaly == true"},
{"calculate": "true", "as": "baseline"}
]
}
],
"encoding": {
"x": {
"field": "dateTime",
"type": "temporal",
"timeUnit": "hoursminutesseconds",
"sort": {"field": "dateTime", "op": "count", "order": "descending"},
"axis": {"title": "Time", "grid": false}
},
"y": {
"field": "count",
"type": "quantitative",
"axis": {"title": "Count", "grid": false}
},
"detail": {"field": "name", "type": "nominal"}
}
}

Vega Lite Independent Scale with Multiple Layers and Facet

Is it possible to have an independent scale for each facet and each layer? The resolve works great when you have either a facet or an extra layer, but I cannot get it to do both, and I'm wondering if it is even possible.
What I want is the two scales on each side combined with the faceting here.
The way this would be expressed in Vega-Lite is using a layer, with resolve set, within a facet. Something like this:
{
"data": {
"url": "https://vega.github.io/vega-datasets/data/seattle-weather.csv"
},
"facet": {
"column": {
"field": "weather",
"type": "nominal"
}
},
"spec": {
"layer": [
{
"encoding": {
"x": {
"field": "date",
"timeUnit": "month",
"type": "temporal"
},
"y": {
"aggregate": "mean",
"field": "temp_max",
"type": "quantitative"
}
},
"mark": {
"color": "salmon",
"type": "line"
}
},
{
"encoding": {
"x": {
"field": "date",
"timeUnit": "month",
"type": "temporal"
},
"y": {
"aggregate": "mean",
"field": "precipitation",
"type": "quantitative"
}
},
"mark": {
"color": "steelblue",
"type": "line"
}
}
],
"resolve": {
"scale": {
"y": "independent"
}
}
}
}
While this spec is valid according to the Vega-Lite schema, there is unfortunately a bug in the Vega-Lite renderer that prevents it from rendering this spec.
As a workaround, you can manually concatenate two layered charts with a filter transform that selects the desired subset of data for each. For example:
{
"data": {
"url": "https://vega.github.io/vega-datasets/data/seattle-weather.csv"
},
"hconcat": [
{
"layer": [
{
"mark": {"type": "line", "color": "salmon"},
"encoding": {
"x": {"type": "temporal", "field": "date", "timeUnit": "month"},
"y": {
"type": "quantitative",
"aggregate": "mean",
"field": "temp_max"
}
}
},
{
"mark": {"type": "line", "color": "steelblue"},
"encoding": {
"x": {"type": "temporal", "field": "date", "timeUnit": "month"},
"y": {
"type": "quantitative",
"aggregate": "mean",
"field": "precipitation"
}
}
}
],
"resolve": {"scale": {"y": "independent", "x": "shared"}},
"transform": [{"filter": "(datum.weather === 'sun')"}]
},
{
"layer": [
{
"mark": {"type": "line", "color": "salmon"},
"encoding": {
"x": {"type": "temporal", "field": "date", "timeUnit": "month"},
"y": {
"type": "quantitative",
"aggregate": "mean",
"field": "temp_max"
}
}
},
{
"mark": {"type": "line", "color": "steelblue"},
"encoding": {
"x": {"type": "temporal", "field": "date", "timeUnit": "month"},
"y": {
"type": "quantitative",
"aggregate": "mean",
"field": "precipitation"
}
}
}
],
"resolve": {"scale": {"y": "independent", "x": "shared"}},
"transform": [{"filter": "(datum.weather === 'fog')"}]
}
],
"$schema": "https://vega.github.io/schema/vega-lite/v2.6.0.json"
}
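Because the workaround repeats the same layered chart once per category, it is easier to generate than to write by hand. A minimal Python sketch that builds the hconcat spec above; `layered_panel` and `faceted_by_hand` are hypothetical helpers, and only the "sun" and "fog" categories from the example are used (extend the list for other weather values).

```python
import json


def layered_panel(weather):
    """One layered chart (mean temp_max and precipitation with independent y
    scales) restricted to a single weather category by a filter transform."""
    def line(field, color):
        return {
            "mark": {"type": "line", "color": color},
            "encoding": {
                "x": {"type": "temporal", "field": "date", "timeUnit": "month"},
                "y": {"type": "quantitative", "aggregate": "mean", "field": field},
            },
        }

    return {
        "layer": [line("temp_max", "salmon"), line("precipitation", "steelblue")],
        "resolve": {"scale": {"y": "independent", "x": "shared"}},
        "transform": [{"filter": f"(datum.weather === '{weather}')"}],
    }


def faceted_by_hand(values):
    """Concatenate one panel per category, emulating a column facet."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v2.6.0.json",
        "data": {"url": "https://vega.github.io/vega-datasets/data/seattle-weather.csv"},
        "hconcat": [layered_panel(v) for v in values],
    }


spec = faceted_by_hand(["sun", "fog"])
print(json.dumps(spec, indent=2))
```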