Accessing array in vega lite - vega

I need to perform an operation in Vega-Lite/Kibana 6.5 similar to the one below. I need to divide the y axis by "data.values[0].b". How can I perform this operation?
{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "A simple bar chart with embedded data.",
  "data": {
    "values": [
      {"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43},
      {"a": "D", "b": 91}, {"a": "E", "b": 81}, {"a": "F", "b": 53},
      {"a": "G", "b": 19}, {"a": "H", "b": 87}, {"a": "I", "b": 52}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "a", "type": "ordinal"},
    "y": {"field": "b", "type": "quantitative"}
  }
}

Please take a look at the Calculate transform topic in the Vega-Lite docs.
You can do:
"transform": [
  {"calculate": "datum.a / datum.b", "as": "y"}
]
Notice that 'datum' is the keyword used to access the current row of the data set.
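Note that datum only exposes the current row, so a calculate expression cannot index into data.values directly; precomputing the ratio outside the spec is one workaround. Here is a plain-Python sketch (not Vega-Lite) of the intended result, using the first few rows of the data above:

```python
# Plain-Python sketch: divide each row's "b" by the first row's "b",
# which is the "data.values[0].b" the question refers to.
values = [
    {"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43},
]

first_b = values[0]["b"]
normalized = [{**row, "y": row["b"] / first_b} for row in values]
```

The precomputed "y" field can then be used directly in the encoding without any transform.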

Related

Spark SQL running sum with over() function - summing an indicator

I have the following data:
df = [{"Category": 'A', "date": '01/01/2022', "Indictor": 1},
{"Category": 'A', "date": '02/01/2022', "Indictor": 0},
{"Category": 'A', "date": '03/01/2022', "Indictor": 1},
{"Category": 'A', "date": '04/01/2022', "Indictor": 1},
{"Category": 'A', "date": '05/01/2022', "Indictor": 1},
{"Category": 'B', "date": '01/01/2022', "Indictor": 0},
{"Category": 'B', "date": '02/01/2022', "Indictor": 1},
{"Category": 'B', "date": '03/01/2022', "Indictor": 1},
{"Category": 'B', "date": '04/01/2022', "Indictor": 0},
{"Category": 'B', "date": '05/01/2022', "Indictor": 0},
{"Category": 'B', "date": '06/01/2022', "Indictor": 1}]
df = spark.createDataFrame(df)
I want to use a LEAD() function to group by 'Category' and then order by 'date' ascending, then create a new field called 'consec_ind' that counts the number of consecutive days the indicator has been 1.
This is the code I have tried, but it doesn't quite work.
df.createOrReplaceTempView('df')
%sql
select date, Indictor,
case when Indictor > 0 THEN
(sum(count(Indictor)) over (order by date)) else 0 end as running_total
from df
WHERE Category = 'A'
group by date, Indictor
order by date, Indictor;
This is what I would like the data to look like:
[{"Category": 'A', "date": '02/01/2022', "Indictor": 0,"consec_ind":0},
{"Category": 'A', "date": '03/01/2022', "Indictor": 1,"consec_ind":1},
{"Category": 'A', "date": '04/01/2022', "Indictor": 1,"consec_ind":2},
{"Category": 'A', "date": '05/01/2022', "Indictor": 1,"consec_ind":3},
{"Category": 'B', "date": '01/01/2022', "Indictor": 0,"consec_ind":0},
{"Category": 'B', "date": '02/01/2022', "Indictor": 1,"consec_ind":1},
{"Category": 'B', "date": '03/01/2022', "Indictor": 1,"consec_ind":2},
{"Category": 'B', "date": '04/01/2022', "Indictor": 0,"consec_ind":0},
{"Category": 'B', "date": '05/01/2022', "Indictor": 0,"consec_ind":0},
{"Category": 'B', "date": '06/01/2022', "Indictor": 1,"consec_ind":1}]
Here is my solution.
First step: partition by Category.
Second step: partition by your custom condition within the first partitioning - I do this by incrementing a counter every time I meet a 0 while iterating over the first window.
Third step: calculate the sum within the final partitions.
Here is the PySpark version:
import pyspark.sql.functions as F
from pyspark.sql import Window
df = [
{"Category": "A", "date": "01/01/2022", "Indictor": 1},
{"Category": "A", "date": "02/01/2022", "Indictor": 0},
{"Category": "A", "date": "03/01/2022", "Indictor": 1},
{"Category": "A", "date": "04/01/2022", "Indictor": 1},
{"Category": "A", "date": "05/01/2022", "Indictor": 1},
{"Category": "B", "date": "01/01/2022", "Indictor": 0},
{"Category": "B", "date": "02/01/2022", "Indictor": 1},
{"Category": "B", "date": "03/01/2022", "Indictor": 1},
{"Category": "B", "date": "04/01/2022", "Indictor": 0},
{"Category": "B", "date": "05/01/2022", "Indictor": 0},
{"Category": "B", "date": "06/01/2022", "Indictor": 1},
]
df = spark.createDataFrame(df)
windowSpec = Window.partitionBy("Category").orderBy("date")
df.withColumn(
    "partition_number", F.sum((F.col("Indictor") == 0).cast("int")).over(windowSpec)
).withColumn(
    "part_sum",
    F.sum(F.col("Indictor")).over(
        Window.partitionBy("Category", "partition_number").orderBy("date")
    ),
).drop(
    "partition_number"
).show()
And here is the SQL:
select
  Category,
  Indictor,
  date,
  sum(Indictor) over (
    PARTITION BY Category, partition_number
    ORDER BY date
  ) as part_sum
from (
  select
    *,
    sum(case when Indictor == 0 then 1 else 0 end) over (
      PARTITION BY Category
      ORDER BY date
    ) as partition_number
  from df
)
Output is:
+--------+--------+----------+--------+
|Category|Indictor| date|part_sum|
+--------+--------+----------+--------+
| A| 1|01/01/2022| 1|
| A| 0|02/01/2022| 0|
| A| 1|03/01/2022| 1|
| A| 1|04/01/2022| 2|
| A| 1|05/01/2022| 3|
| B| 0|01/01/2022| 0|
| B| 1|02/01/2022| 1|
| B| 1|03/01/2022| 2|
| B| 0|04/01/2022| 0|
| B| 0|05/01/2022| 0|
| B| 1|06/01/2022| 1|
+--------+--------+----------+--------+
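The partition-number trick above can be sketched in plain Python (no Spark needed): a running count of zeros segments each category into runs, and a running sum of the indicator within each segment gives the consecutive counter.

```python
from itertools import groupby

def consec_ind(rows):
    """Count consecutive 1s per Category, resetting at each 0.

    Mirrors the two-window approach: a cumulative count of zeros acts
    as the partition number; summing the indicator within each
    partition yields the consecutive-day counter.
    """
    out = []
    for _, group in groupby(rows, key=lambda r: r["Category"]):
        partition = 0   # cumulative count of zeros seen so far
        part_sum = {}   # running sum of the indicator per partition
        for r in sorted(group, key=lambda r: r["date"]):
            if r["Indictor"] == 0:
                partition += 1
            part_sum[partition] = part_sum.get(partition, 0) + r["Indictor"]
            out.append({**r, "consec_ind": part_sum[partition]})
    return out
```

Running this over category B's rows reproduces the 0, 1, 2, 0, 0, 1 sequence shown in the output table.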

Make a property bag from a list of keys and values

I have a list containing the keys and another list containing values (obtained from splitting a log line). How can I combine the two to make a property bag in Kusto?
let headers = pack_array("A", "B", "C");
datatable(RawData:string)
[
"1,2,3",
"4,5,6",
]
| expand fields = split(RawData, ",")
| expand dict = ???
Expected:
dict
-----
{"A": 1, "B": 2, "C": 3}
{"A": 4, "B": 5, "C": 6}
Here's one option that uses a combination of:
mv-apply: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/mv-applyoperator
pack(): https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/packfunction
make_bag(): https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/make-bag-aggfunction
let keys = pack_array("A", "B", "C");
datatable(RawData:string)
[
"1,2,3",
"4,5,6",
]
| project values = split(RawData, ",")
| mv-apply with_itemindex = i key = keys to typeof(string) on (
summarize dict = make_bag(pack(key, values[i]))
)
values            | dict
------------------|-------------------------------
["1", "2", "3"]   | {"A": "1", "B": "2", "C": "3"}
["4", "5", "6"]   | {"A": "4", "B": "5", "C": "6"}
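The key-to-value pairing that the mv-apply/make_bag combination performs can be sketched in plain Python: split each raw line on commas, zip the pieces with the header keys, and build a dict.

```python
keys = ["A", "B", "C"]
raw_rows = ["1,2,3", "4,5,6"]

# Pair each split value with its header key, mirroring
# make_bag(pack(key, values[i])) in the Kusto query.
bags = [dict(zip(keys, raw.split(","))) for raw in raw_rows]
```

As in the Kusto output, the values stay strings unless explicitly converted.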

Combining separate temporal measurement series

I have a data set that combines two temporal measurement series, with one row per measurement:
time: 1, measurement: a, value: 5
time: 2, measurement: b, value: false
time: 10, measurement: a, value: 2
time: 13, measurement: b, value: true
time: 20, measurement: a, value: 4
time: 24, measurement: b, value: true
time: 30, measurement: a, value: 6
time: 32, measurement: b, value: false
In a visualization using Vega-Lite, I'd like to combine the measurement series and encode measurements a and b in a single visualization, not by simply layering their representations on a temporal axis, but by representing their values in a single encoding spec.
Either measurement a values need to be interpolated and added as a new value to rows of measurement b,
e.g.:
time: 2, measurement: b, value: false, interpolatedMeasurementA: 4.6667
or the other way around, which leaves the question of how to interpolate a boolean. Maybe the closest value by time, or simpler: the last value,
e.g.:
time: 30, measurement: a, value: 6, lastValueMeasurementB: true
I suppose this could be done either query-side, in which case this question would be regarding the InfluxDB Flux query language,
or on the visualization side, in which case it would be regarding vega-lite.
There aren't any true linear interpolation schemes built into Vega-Lite (though the loess transform comes close), but you can achieve roughly what you wish with a window transform.
Here is an example (view in editor):
{
  "data": {
    "values": [
      {"time": 1, "measurement": "a", "value": 5},
      {"time": 2, "measurement": "b", "value": false},
      {"time": 10, "measurement": "a", "value": 2},
      {"time": 13, "measurement": "b", "value": true},
      {"time": 20, "measurement": "a", "value": 4},
      {"time": 24, "measurement": "b", "value": true},
      {"time": 30, "measurement": "a", "value": 6},
      {"time": 32, "measurement": "b", "value": false}
    ]
  },
  "transform": [
    {
      "calculate": "datum.measurement == 'a' ? datum.value : null",
      "as": "measurement_a"
    },
    {
      "window": [
        {"op": "mean", "field": "measurement_a", "as": "interpolated"}
      ],
      "sort": [{"field": "time"}],
      "frame": [1, 1]
    },
    {"filter": "datum.measurement == 'b'"}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "time"},
    "y": {"field": "interpolated"},
    "color": {"field": "value"}
  }
}
This first uses a calculate transform to isolate the values to be interpolated, then a window transform that computes the mean over adjacent values (frame: [1, 1]), then a filter transform to isolate interpolated rows.
If you wanted to go the other route, you could do a similar sequence of transforms targeting the boolean value instead.
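What the window transform computes can be sketched in plain Python: sort by time, isolate the a values, and for each b row average the non-null a values inside a frame of one row on each side.

```python
rows = [
    {"time": 1, "measurement": "a", "value": 5},
    {"time": 2, "measurement": "b", "value": False},
    {"time": 10, "measurement": "a", "value": 2},
    {"time": 13, "measurement": "b", "value": True},
]

rows.sort(key=lambda r: r["time"])
interpolated = []
for i, r in enumerate(rows):
    if r["measurement"] != "b":
        continue
    # frame [1, 1]: previous row, current row, next row
    frame = rows[max(i - 1, 0):i + 2]
    a_vals = [f["value"] for f in frame if f["measurement"] == "a"]
    interpolated.append({**r, "interpolated": sum(a_vals) / len(a_vals)})
```

For the b row at time 2, the frame spans the a rows at times 1 and 10, so the mean of 5 and 2 is attached; this is the adjacent-value averaging the spec performs, not true linear interpolation.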

Filtering down a Karate test response object to get a sub-list?

Given this feature file:
Feature: test
Scenario: filter response
* def response =
"""
[
{
"a": "a",
"b": "a",
"c": "a",
},
{
"d": "ab",
"e": "ab",
"f": "ab",
},
{
"g": "ac",
"h": "ac",
"i": "ac",
}
]
"""
* match response[1] contains { e: 'ab' }
How can I filter the response down so that it is equal to:
{
"d": "ab",
"e": "ab",
"f": "ab",
}
Is there a built-in way to do this? In the same way as you can filter a List using a Java stream?
Sample code:
Feature: test
Scenario: filter response
* def response =
"""
[
{
"a": "a",
"b": "a",
"c": "a",
},
{
"d": "ab",
"e": "ab",
"f": "ab",
},
{
"g": "ac",
"h": "ac",
"i": "ac",
}
]
"""
* def filt = function(x){ return x.e == 'ab' }
* def items = get response[*]
* def res = karate.filter(items, filt)
* print res
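karate.filter applies a predicate over the JSON array, much like a Java stream filter; the same selection can be sketched in plain Python:

```python
response = [
    {"a": "a", "b": "a", "c": "a"},
    {"d": "ab", "e": "ab", "f": "ab"},
    {"g": "ac", "h": "ac", "i": "ac"},
]

# Keep only the items whose "e" key equals "ab", mirroring the
# function(x){ return x.e == 'ab' } predicate passed to karate.filter.
res = [item for item in response if item.get("e") == "ab"]
```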

C3JS Access value shown on X axis

I have a simple bar chart like this:
Here is my C3JS:
var chart = c3.generate({
  data: {
    json: [
      {"A": 67, "B": 10, "site": "Google", "C": 12},
      {"A": 10, "B": 20, "site": "Amazon", "C": 12},
      {"A": 25, "B": 10, "site": "Stackoverflow", "C": 8},
      {"A": 20, "B": 22, "site": "Yahoo", "C": 12},
      {"A": 76, "B": 30, "site": "eBay", "C": 9}
    ],
    mimeType: 'json',
    keys: {
      x: 'site',
      value: ['A', 'B', 'C']
    },
    type: 'bar',
    selection: {
      enabled: true
    },
    onselected: function (d, element) {
      alert('selected x: ' + chart.selected()[0].x +
            ' value: ' + chart.selected()[0].value +
            ' name: ' + chart.selected()[0].name);
    },
    groups: [
      ['A', 'B', 'C']
    ]
  },
  axis: {
    x: {
      type: 'category'
    }
  }
});
After a chart element is selected (clicked), the alert shows the X, Value, and Name attributes of the first selected element - for example "selected x: 0 value: 67 name: A" after I click the top-left chart element. How can I get the value shown on the X axis? In this case it is "Google".
The categories property is populated when the x-axis is declared to be of type category, as it is in this case. So to get the data from the x-axis you need to call the .categories() function.
onselected: function(d,element){alert(chart.categories()[d.index]);}
https://jsfiddle.net/4bos2qzx/1/
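The lookup the answer relies on, indexing the category labels with the selected datum's index, can be sketched in plain Python (the selected index here is hypothetical, standing in for d.index):

```python
# The labels chart.categories() would return for this chart.
categories = ["Google", "Amazon", "Stackoverflow", "Yahoo", "eBay"]

# chart.categories()[d.index]: the selected datum carries a row index,
# and the x-axis label is looked up by that index.
selected_index = 0  # hypothetical: user clicked a bar in the first column
label = categories[selected_index]
```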