Vega-Lite: week starting from Monday and wrong week numbers in general - data-visualization

I am new to Vega-Lite and am trying to aggregate my data by week. The built-in week time unit does not suit me because I'd like the week to start on Monday (rather than on Sunday, as it does now), and the week numbers it displays also appear to be wrong.
Below is my basic code.
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {
"values": [
{"date": "2020-09-29", "count": "13", "outcome": "invalid"},
{"date": "2020-09-29", "count": "14", "outcome": "fail"},
{"date": "2020-09-29", "count": "20", "outcome": "pass"},
{"date": "2020-09-27", "count": "70", "outcome": "invalid"},
{"date": "2020-09-27", "count": "30", "outcome": "fail"},
{"date": "2020-09-27", "count": "20", "outcome": "pass"},
{"date": "2020-09-26", "count": "5", "outcome": "invalid"},
{"date": "2020-09-26", "count": "15", "outcome": "fail"},
{"date": "2020-09-26", "count": "13", "outcome": "pass"}
]
},
"width": 280,
"height": 200,
"mark": {"type": "bar", "tooltip": true},
"encoding": {
"x": {
"title": "Week",
"field": "date",
"type": "ordinal",
"timeUnit": "week",
"axis": {"format": "%W"}
},
"y": {
"title": "Number of tests",
"field": "count",
"aggregate": "sum",
"type": "quantitative",
"axis": {"orient": "right"}
},
"color": {
"field": "outcome",
"type": "nominal",
"scale": {
"domain": ["invalid", "fail", "pass"],
"range": ["#c7c7c7", "#8fd7f9", "#ef9292"]
},
"legend": {"title": "Test results"}
}
}
}
I could in principle calculate the counts per week using something like the window function in the snippet below, but I have multiple instances of each date and I do not want to collapse across the "outcome" variable. Moreover, my data can start at any arbitrary date, so calculating the week number starting from 0 is not an option either.
{"calculate": "day(datum.date) == 0", "as": "sundays"},
{
"window": [{"op": "sum", "field": "sundays", "as": "week"}],
"sort": "date"
}
I also thought of a less elegant solution - taking steps of 7 days in the x axis and aggregating in the y axis (while making sure that the data starts on a Monday). This gives me a correct total count per week, but then I am struggling with labeling the X axis correctly with week numbers.
Finally, even if I was OK with starting the weeks on Sundays (so using the basic code that I give above), I am seeing unexpected week numbers. For some reason (and perhaps that's because I don't know how to count week numbers correctly), the week numbers displayed are 37 and 38 (as can be seen in the attached image) when in fact they should be 39 and 40. How do I solve this?
I'd be grateful for any tips.

Vega's week timeUnit has well-defined behavior, spelled out in the timeUnit documentation:
"week": Sunday-based weeks. Days before the first Sunday of the year are considered to be in week 0, the first Sunday of the year is the start of week 1, the second Sunday week 2, etc.
There is currently no alternative week definition built into the package, but you can make use of Vega expressions within transforms to compute arbitrary quantities from your data.
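As a cross-check of the Sunday-based definition quoted above, here is a minimal sketch in plain Python (not Vega) that applies the same rule to the dates from the question and prints the ISO week next to it. The function name `sunday_week` is made up for this illustration, and the sketch only models the documented definition, not how Vega-Lite renders or formats the axis.

```python
from datetime import date


def sunday_week(d: date) -> int:
    """Sunday-based week number: days before the year's first Sunday are week 0."""
    # Weekday of January 1st with Sunday=0 ... Saturday=6.
    jan1_weekday = (date(d.year, 1, 1).weekday() + 1) % 7
    # 1-based day of year of the first Sunday.
    first_sunday = 1 + (7 - jan1_weekday) % 7
    day_of_year = d.timetuple().tm_yday
    if day_of_year < first_sunday:
        return 0
    return 1 + (day_of_year - first_sunday) // 7


for iso in ("2020-09-26", "2020-09-27", "2020-09-29"):
    d = date.fromisoformat(iso)
    # Columns: date, Sunday-based week, ISO (Monday-based) week.
    print(iso, sunday_week(d), d.isocalendar()[1])
# 2020-09-26 38 39
# 2020-09-27 39 39
# 2020-09-29 39 40
```

Under the Sunday-based rule the Saturday and the Sunday land in different weeks, whereas ISO weeks group them together and put the Tuesday in week 40.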
If I've done the calculations correctly, I think this will give you the ISO weeks that you are after:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {
"values": [
{"date": "2020-09-29", "count": "14", "outcome": "fail"},
{"date": "2020-09-29", "count": "20", "outcome": "pass"},
{"date": "2020-09-27", "count": "70", "outcome": "invalid"},
{"date": "2020-09-27", "count": "30", "outcome": "fail"},
{"date": "2020-09-27", "count": "20", "outcome": "pass"},
{"date": "2020-09-26", "count": "5", "outcome": "invalid"},
{"date": "2020-09-26", "count": "15", "outcome": "fail"},
{"date": "2020-09-26", "count": "13", "outcome": "pass"}
]
},
"transform": [
{"calculate": "day(datetime(utcyear(datum.date), 0, 1))", "as": "startingDay"},
{"calculate": "(11 - datum.startingDay) % 7 - 2", "as": "mondayOfFirstWeek"},
{"calculate": "1 + floor((utcdayofyear(datum.date) - datum.mondayOfFirstWeek) / 7)", "as": "ISOweek"}
],
"width": 280,
"height": 200,
"mark": {"type": "bar", "tooltip": true},
"encoding": {
"x": {
"title": "Week",
"field": "ISOweek",
"type": "ordinal"
},
"y": {
"title": "Number of tests",
"field": "count",
"aggregate": "sum",
"type": "quantitative",
"axis": {"orient": "right"}
},
"color": {
"field": "outcome",
"type": "nominal",
"scale": {
"domain": ["invalid", "fail", "pass"],
"range": ["#c7c7c7", "#8fd7f9", "#ef9292"]
},
"legend": {"title": "Test results"}
}
}
}
Brief explanation of the transforms:
{"calculate": "day(datetime(utcyear(datum.date), 0, 1))", "as": "startingDay"},
This computes the day of the week on which January 1st falls for the given year (Sunday=0, Monday=1, ..., Saturday=6).
{"calculate": "(11 - datum.startingDay) % 7 - 2", "as": "mondayOfFirstWeek"},
This computes the day of the year on which the first ISO week starts, that is, the Monday of the first week containing a Thursday. So, for example, if startingDay = 5, then January 1st is a Friday, so day 4 of the year is that Monday. If startingDay = 4, then January 1st is a Thursday, so day -2 (December 29th of the previous year) is that Monday. The offset is written as 11 - startingDay rather than 4 - startingDay because % in Vega expressions is JavaScript's remainder operator, which returns a negative result for a negative left operand; adding 7 keeps the operand non-negative for every startingDay.
{"calculate": "1 + floor((utcdayofyear(datum.date) - datum.mondayOfFirstWeek) / 7)", "as": "ISOweek"}
This counts the number of whole 7-day weeks since the first Monday identified above and adds 1, so that numbering starts at week 1. Dates that belong to an ISO week of the neighboring year (the first days of January or the last days of December in some years) come out as week 0 or week 53 with this formula, so they would need separate handling.
Note that we use the UTC versions of the date functions on datum.date in order to correctly handle date-only strings like 2020-09-29, which JavaScript parses as midnight UTC. If we had used the local-time versions, the ISOweek could be incorrect for the 1st of January, because in some timezones that instant still falls on the previous day.
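One way to sanity-check the arithmetic behind these transforms is to redo it in plain Python and compare against the standard library's `date.isocalendar()`. This is only a sketch under stated assumptions: the helper names are invented for the illustration, and Python's `%` never returns a negative result for a positive modulus (unlike JavaScript's remainder operator), so the offset is written here as `(4 - starting_day) % 7`.

```python
from datetime import date, timedelta


def monday_of_first_week(year: int) -> int:
    """1-based day of year of the Monday that starts ISO week 1 (may be <= 0)."""
    # Weekday of January 1st with Sunday=0 ... Saturday=6, like Vega's day().
    starting_day = (date(year, 1, 1).weekday() + 1) % 7
    # Python's % is a true modulo here, so the result is always in -2..4.
    return (4 - starting_day) % 7 - 2


def iso_week(d: date) -> int:
    """Week number counted in whole 7-day steps from the first ISO Monday."""
    day_of_year = d.timetuple().tm_yday  # 1-based, like utcdayofyear
    return 1 + (day_of_year - monday_of_first_week(d.year)) // 7


print(iso_week(date(2020, 9, 27)), iso_week(date(2020, 9, 29)))  # 39 40

# Compare with the standard library for every date whose ISO year equals
# its calendar year (the formula does not cover the boundary dates).
d = date(2015, 1, 1)
while d.year <= 2030:
    iso_year, iso_wk, _ = d.isocalendar()
    if iso_year == d.year:
        assert iso_week(d) == iso_wk, d
    d += timedelta(days=1)
```

The loop deliberately skips dates such as 2023-01-01, which ISO assigns to week 52 of 2022 and which this formula would label week 0.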

Related

Calculations in vega-lite

Is there a way to dynamically calculate growth rates in Vega-Lite?
For example:
[
{"date": "1/1/2020", "b": 27},
{"date": "1/2/2020", "b": 30},
{"date": "1/3/2020", "b": 33}
]
How could I create data (and a chart) that shows the daily +3 (or the ~+10%)?
Edit: Thanks for the answer, @jakevdp.
I should have outlined the added complexity earlier, apologies: I need to aggregate prior to calculating changes. See below for the data and my attempt (the dates seem offset and the last date's drop doesn't make sense).
[Vega Editor][1]
{
"data": {
"values": [
{"date": "2020-01-01", "country": "CHN", "count": 0},
{"date": "2020-01-02", "country": "CHN", "count": 2},
{"date": "2020-01-03", "country": "CHN", "count": 4},
{"date": "2020-01-01", "country": "GER", "count": 0},
{"date": "2020-01-02", "country": "GER", "count": 2},
{"date": "2020-01-03", "country": "GER", "count": 4},
{"date": "2020-01-04", "country": "GER", "count": 6}
]
},
"transform": [
{
"aggregate": [{"op": "sum", "field":"count", "as":"daily_count"}],
"groupby": ["date"]
},
{
"window": [
{"op": "lead", "field": "daily_count", "as": "daily_count_tomorrow"}
]
},
{"filter": "isValid(datum.daily_count_tomorrow)"},
{"calculate": "datum.daily_count_tomorrow - datum.daily_count", "as": "change"}
],
"mark": "bar",
"encoding": {
"x": {"type": "ordinal", "field": "date", "timeUnit": "yearmonthdate"},
"y": {"type": "quantitative", "field": "change"}
}
}
[1]: https://vega.github.io/editor/#/url/vega-lite/N4KABGBEAmCGAutIC4yghSA3WAbArgKYDOKYA2uBhMDAoWZAEwAMrAtCwIydeQA0UAMYB7fADt4AJwCejAMIAJAHIDhYyWRYBfflWq048BqmZsWvTkzWRRE6XNNLVg2xvhkmu-RkP1GrBzcnADMNnaSsgoq4e5kACzePjR0xgHmltyx9lGmAOIAogBK2ZqoOnrUKUYmUIEWwWylDoyFJa4RHqhelVV+aab1mWEd7rlQbc0J3lVoqbVmQTws8c3jkJOj9mQAbNo+ALpU3pDSsOLEAGYiUgC2ZJQGyVCwAOavUoSv-qjktCIAB0YxHw91clwAloRcNAUG5tq5YKRkHQIbgZAB9TqQbQHXrUSAfMQAgBGjgo80gR2oM18z0gAHcIeJoCIGQ9nilAYxcIRYLDwVCYYw4GjMdjEcioKL0Vj3Bj4CJbjcpGycc9qRhaSlIbhjFJGBDiAA1PAQ6AACiMoIAdDLxfLFcqpKqGQBKHH4uZCPBCfC4H7ShC2+1y+wKpUqtlgdhga23O2wMVhzSSxhCAAW51eDH2EDxICokFusCkAGtGCTSwIi4RxKJoMzXmR0BhIAAPFunGQAhY3RviPA2SHQ2GmGo2eAQ26EACq4ghXSgMj5dxEkgzE+1y678B7CwAjvhzlPEFOsAxBaP01nxDn1RB9togA
Yes, you can do this using the window transform with the lead or lag operation. For example (vega editor):
{
"data": {
"values": [
{"date": "2020-01-01", "b": 29},
{"date": "2020-01-02", "b": 30},
{"date": "2020-01-03", "b": 32},
{"date": "2020-01-04", "b": 31},
{"date": "2020-01-05", "b": 34}
]
},
"transform": [
{"window": [{"op": "lead", "field": "b", "as": "b1"}]},
{"filter": "isValid(datum.b1)"},
{"calculate": "datum.b1 - datum.b", "as": "change"}
],
"mark": "bar",
"encoding": {
"x": {"type": "ordinal", "field": "date", "timeUnit": "yearmonthdate"},
"y": {"type": "quantitative", "field": "change"},
"color": {
"condition": {"test": "datum.change > 0", "value": "green"},
"value": "red"
}
}
}
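To make the lead/lag distinction concrete, here is a small Python sketch (an illustration of the arithmetic only, not of Vega-Lite's window transform) using the same five values. The function names are hypothetical. With lead, each change is attached to the earlier of the two dates and the last date drops out; with lag, it is attached to the later date and the first date drops out, which may be closer to what the question means by a daily change.

```python
rows = [
    ("2020-01-01", 29),
    ("2020-01-02", 30),
    ("2020-01-03", 32),
    ("2020-01-04", 31),
    ("2020-01-05", 34),
]


def changes_with_lead(data):
    """Pair each row with the NEXT row; the change is labelled with the earlier date."""
    return [(d, b_next - b) for (d, b), (_, b_next) in zip(data, data[1:])]


def changes_with_lag(data):
    """Pair each row with the PREVIOUS row; the change is labelled with the later date."""
    return [(d, b - b_prev) for (_, b_prev), (d, b) in zip(data, data[1:])]


def growth_rates_with_lag(data):
    """Relative change versus the previous row, labelled with the later date."""
    return [(d, (b - b_prev) / b_prev) for (_, b_prev), (d, b) in zip(data, data[1:])]


print(changes_with_lead(rows))
# [('2020-01-01', 1), ('2020-01-02', 2), ('2020-01-03', -1), ('2020-01-04', 3)]
print(changes_with_lag(rows))
# [('2020-01-02', 1), ('2020-01-03', 2), ('2020-01-04', -1), ('2020-01-05', 3)]
```

Both variants assume the rows are already in date order; if they are not, the pairing is meaningless, so sorting by date before computing the differences is the safer choice.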

Clamping y-axis when layering aggregated charts in vega-lite

This is a follow-up to a previous question, for which I built a test case in a (hopefully now public) notebook and noticed the following behavior:
At the end of the notebook, in the section "bugs", you will notice that the y-axis for max_precipitation in the layered chart is clamped to 10.
I tried changing the domain but the bars do not go above 10.
Here is the code example from the Vega-Lite editor, reproduced below:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"title": "Top Months by Mean Precipitation",
"data": {"url": "data/seattle-weather.csv"},
"transform": [
{"timeUnit": "month", "field": "date", "as": "month_date"},
{
"aggregate": [
{"op": "mean", "field": "precipitation", "as": "mean_precipitation"},
{"op": "max", "field": "precipitation", "as": "max_precipitation"}
],
"groupby": ["month_date"]
},
{
"window": [{"op": "row_number", "as": "rank"}],
"sort": [{"field": "mean_precipitation", "order": "descending"}]
}
],
"encoding": {
"x": {
"field": "month_date",
"type": "ordinal",
"timeUnit": "month",
"title": "month (descending by max precip)",
"sort": {
"field": "max_precipitation",
"op": "average",
"order": "descending"
}
}
},
"layer": [
{
"mark": {"type": "bar"},
"encoding": {
"y": {
"field": "max_precipitation",
"type": "quantitative",
"title": "precipitation (mean & max)"
}
}
},
{
"mark": "tick",
"encoding": {
"y": {"field": "mean_precipitation", "type": "quantitative"},
"color": {"value": "red"},
"size": {"value": 15}
}
}
]
}
Please help me understand what I am doing wrong?
It appears that the precipitation column is being parsed as strings rather than as numbers. You can specify how the column is parsed using the parse property of the data format:
"data": {
"url": "data/seattle-weather.csv",
"format": {"parse": {"precipitation": "number"}}
},
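As a rough analogy for why string-typed values can cap a maximum just below 10, consider how strings compare in Python. This sketch does not reproduce Vega's aggregation internals, and the sample values are made up for illustration; it only shows that comparing numeric-looking strings character by character gives a different maximum than comparing the numbers themselves.

```python
# Hypothetical precipitation readings as they might arrive from a CSV file.
raw = ["0.0", "10.9", "55.9", "9.9"]

# Compared as text, "9.9" wins because the character '9' sorts after '5' and '1'.
max_as_text = max(raw)

# Compared as numbers, the largest reading is found as expected.
max_as_number = max(float(v) for v in raw)

print(max_as_text)    # 9.9
print(max_as_number)  # 55.9
```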

Vega-lite difference between Firefox and Chrome

I have a vega-lite chart that shows up as expected in Chrome (72.0.3626.96), but not in Firefox (70.0.1). I have checked the spec in the Vega Editor. Does anyone know why this might be?
Here are the rendered charts:
Firefox:
Chrome:
Here is the spec:
Any help you might be able to give would be much appreciated.
Apologies, but I do not know how to collapse this code.
{
"$schema": "https://vega.github.io/schema/vega-lite/v3.2.1.json",
"background": "white",
"config": {"mark": {"tooltip": null}, "view": {"height": 300, "width": 400}},
"datasets": {
"data-511198e25d4dbee99248144390684caa": [
{
"counts": 338,
"filter_method": "greater than",
"grade": "9",
"index": 3,
"perc": 0.2669826224328594,
"school_code": "Board",
"threshold": "8",
"year": 20172018,
"year_lab": "2017/18",
"year_lab_q": "2017"
},
{
"counts": 414,
"filter_method": "greater than",
"grade": "9",
"index": 4,
"perc": 0.30689399555226093,
"school_code": "Board",
"threshold": "8",
"year": 20182019,
"year_lab": "2018/19",
"year_lab_q": "2018"
}
],
"data-72a083843a98847e44077116c495e448": [
{
"counts": 49,
"filter_method": "greater than",
"grade": "9",
"index": 0,
"perc": 0.3356164383561644,
"school_code": "KING",
"threshold": "8",
"year": 20142015,
"year_lab": "2014/15",
"year_lab_q": "2014"
},
{
"counts": 62,
"filter_method": "greater than",
"grade": "9",
"index": 5,
"perc": 0.3668639053254438,
"school_code": "MLTS",
"threshold": "8",
"year": 20162017,
"year_lab": "2016/17",
"year_lab_q": "2016"
},
{
"counts": 53,
"filter_method": "greater than",
"grade": "9",
"index": 6,
"perc": 0.29608938547486036,
"school_code": "KING",
"threshold": "8",
"year": 20172018,
"year_lab": "2017/18",
"year_lab_q": "2017"
},
{
"counts": 44,
"filter_method": "greater than",
"grade": "9",
"index": 7,
"perc": 0.25882352941176473,
"school_code": "MLTS",
"threshold": "8",
"year": 20172018,
"year_lab": "2017/18",
"year_lab_q": "2017"
},
{
"counts": 53,
"filter_method": "greater than",
"grade": "9",
"index": 8,
"perc": 0.3212121212121212,
"school_code": "KING",
"threshold": "8",
"year": 20182019,
"year_lab": "2018/19",
"year_lab_q": "2018"
},
{
"counts": 61,
"filter_method": "greater than",
"grade": "9",
"index": 9,
"perc": 0.25206611570247933,
"school_code": "MLTS",
"threshold": "8",
"year": 20182019,
"year_lab": "2018/19",
"year_lab_q": "2018"
}
]
},
"height": 400,
"layer": [
{
"data": {"name": "data-72a083843a98847e44077116c495e448"},
"encoding": {
"color": {
"field": "school_code",
"legend": {"labelFontSize": 15, "titleFontSize": 20},
"title": null,
"type": "nominal"
},
"tooltip": [
{
"field": "perc",
"format": ".2%",
"title": "percentage",
"type": "quantitative"
},
{
"field": "counts",
"title": "number",
"type": "quantitative"
},
{"field": "year_lab", "title": "school year", "type": "nominal"},
{"field": "school_code", "title": "level", "type": "nominal"},
{"field": "grade", "type": "nominal"},
{"field": "filter_method", "type": "nominal"},
{"field": "threshold", "type": "nominal"}
],
"x": {
"axis": {"format": "%Y", "tickCount": 5},
"field": "year_lab_q",
"scale": {"domain": ["2013.9", "2018.5"]},
"title": "School Year (beginning)",
"type": "temporal"
},
"y": {
"axis": {"format": ".0%"},
"field": "perc",
"title": "Percentage",
"type": "quantitative"
}
},
"mark": {"point": true, "type": "line"},
"selection": {
"selector001": {
"bind": "scales",
"encodings": ["x", "y"],
"type": "interval"
}
}
},
{
"data": {"name": "data-511198e25d4dbee99248144390684caa"},
"encoding": {
"color": {
"field": "school_code",
"legend": {"labelFontSize": 15, "titleFontSize": 20},
"scale": {"domain": ["Board"], "range": ["black"]},
"title": null,
"type": "nominal"
},
"tooltip": [
{
"field": "perc",
"format": ".2%",
"title": "percentage",
"type": "quantitative"
},
{
"field": "counts",
"title": "number",
"type": "quantitative"
},
{"field": "year_lab", "title": "school year", "type": "nominal"},
{"field": "school_code", "title": "level", "type": "nominal"},
{"field": "grade", "type": "nominal"},
{"field": "filter_method", "type": "nominal"},
{"field": "threshold", "type": "nominal"}
],
"x": {"field": "year_lab_q", "type": "temporal"},
"y": {"field": "perc", "type": "quantitative"}
},
"mark": {"point": true, "type": "line"}
}
],
"resolve": {"scale": {"color": "independent"}},
"title": "A title!",
"width": 700
}
It appears that your temporal values are not being parsed correctly in Firefox (the details of JavaScript date parsing are often browser-dependent). You could try forcing the correct parsing by changing your data specification (in both places) to:
"data": {
"name": "data-72a083843a98847e44077116c495e448",
"format": {"parse": {"year_lab_q": "date:%Y"}}
}
This should ensure that the year string is parsed as a year, rather than e.g. a Unix timestamp.
The other place date parsing is happening is in your domain specification. You might try changing those to a more standard time format, e.g.
"domain": ["2013-11-01", "2018-06-01"]
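The difference between the two readings of a bare year string can be sketched in Python. This is an illustration of the ambiguity only; which reading a given browser chooses is not something the sketch claims to reproduce, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_as_year(text: str) -> datetime:
    """Explicit format: the string is a four-digit year."""
    return datetime.strptime(text, "%Y")


def parse_as_unix_millis(text: str) -> datetime:
    """Alternative reading: the string is a count of milliseconds since 1970-01-01 UTC."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=int(text))


print(parse_as_year("2017"))         # 2017-01-01 00:00:00
print(parse_as_unix_millis("2017"))  # 1970-01-01 00:00:02.017000+00:00
```

Stating the format explicitly removes the guesswork, which is the role the date:%Y parse directive plays in the spec.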

Render different fill/stroke colours depending on positive/negative x-axis value

I'm trying to code a little chart for use in risk/opportunity analysis. You feed in 4 values:
PreConsequence & PostConsequence (ints ranging from -4 to +4)
PreLikelihood & PostLikelihood (ints ranging from 0 to 4)
And it visualises those with charts as follows:
The shaded fill renders the Pre values, whilst the stroke renders the Post values.
Geometry in the left quadrant represents risk and should be red, whilst geometry in the right quadrant represents opportunity and should be green.
I'm struggling to find out how to check for negative consequence values and assign colour accordingly. I think it'll be done in the last two lines of my code below:
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"width": 500,
"height": 250,
"data": [
{
"name": "table",
"values": [
{"C": -4, "L": 0, "f":"Pre"}, {"C": 0, "L": 4, "f":"Pre"},
{"C": 0, "L": 2, "f":"Post"}, {"C": -1, "L": 0, "f":"Post"}
]
}
],
"scales": [
{
"name": "xscale",
"type": "linear",
"range": "width",
"nice": true,
"zero": true,
"domain": {"data": "table", "field": "C"},
"domainMax": 4,
"domainMin": -4
},
{
"name": "yscale",
"type": "linear",
"range": "height",
"domain": {"data": "table", "field": "L"},
"domainMax": 4,
"domainMin": 0
},
{
"name": "color",
"type": "ordinal",
"range":"ordinal",
"domain": {"data": "table", "field": "f"}
}
],
"axes": [
{"orient": "bottom", "scale": "xscale", "tickCount": 10 },
{"orient": "left", "scale": "yscale", "tickCount": 5, "offset": -250 }
],
"marks": [
{
"type": "group",
"from": {
"facet": {
"name": "series",
"data": "table",
"groupby": "f"
}
},
"marks": [
{
"type": "area",
"from": {"data": "series"},
"encode": {
"enter": {
"x": {"scale": "xscale", "field": "C" },
"y": {"scale": "yscale", "field": "L"},
"y2": {"scale": "yscale", "value": 0 },
"fillOpacity": [{ "test": "indata('series', 'f', 'Pre')", "value": 0.3 }, {"value": 0}],
"strokeWidth": [{ "test": "indata('series', 'f', 'Pre')", "value": 0 }, {"value": 2}],
"fill": [{ "test": "indata('series', 'f', 'Pre')", "value": "red" }, {"value": "red"}],
"stroke": [{ "test": "indata('series', 'f', 'Pre')", "value": "red" }, {"value": "red"}]
}
}
}
]
}
]
}
Can anyone give me some pointers as to how to test the data values & set fill & stroke colours accordingly?
Thanks
Vega signals can be used to conditionally set colors based on data values.
In the given Vega spec, changing the fill property as follows should do what you expect, provided the data also contains positive C values (the right-hand quadrant).
"fill": { "signal": "datum.C > 0 ? 'green': 'red'"},

Vega legend and color per mark

As you can see in the example, I want to create a chart with multiple marks. Each of these marks needs a specific color, a corresponding label and legend. I get that I define the color by "domain": {"data": "table", "field": "District"}. But I need a specific color for my 'subtotal' and 'totals' line. And the legend should read Neustadt - Totals and Neustadt - Subtotals.
I have played around with what I can find in the scales documentation, but I just can't seem to define the color and legend items per mark field.
"data": [
{
"name": "table",
"values": [
{"District": "Neustadt", "total": "86", "id": 12, "subtotal": "600", "Year": "2017"},
{"District": "Neustadt", "total": "398", "id": 13, "subtotal": "100", "Year": "2018"},
{"District": "Neustadt", "total": "155", "id": 14, "subtotal": "10", "Year": "2019"}
],
"transform": [
{
"type": "collect",
"sort": {
"field": "Year"
}
}
]
}
],
"scales": [
{
"name": "Year",
"type": "point",
"range": "width",
"domain": {"data": "table", "field": "Year", "sort": true}
},
{
"name": "subtotal",
"type": "linear",
"range": "height",
"nice": true,
"zero": true,
"domain": {"data": "table", "field": "subtotal"}
},
{
"name": "color",
"type": "ordinal",
"range": "category",
"domain": {"data": "table", "field": "District"}
}
]
I can only get one legend item, or of course multiple items if I had another district in the data. Is grouping marks an option?