KQL: How to put unique values from a column as names of columns

I want to take the unique entries from one column and use them as column names, with their counts as the values.
let sales = datatable (store: string, category: string, product: string)
[
"StoreA", "Food", "Steak",
"StoreB", "Drink", "Cola",
"StoreB", "Food", "Fries",
"StoreA", "Sweets", "Cake",
"StoreB", "Food", "Hotdog",
"StoreB", "Food", "Salad",
"StoreA", "Sweets", "Chocolate",
"StoreC", "Food", "Steak"
];
sales
| summarize Food=countif(category=="Food"), Drink=countif(category=="Drink"), Sweets=countif(category=="Sweets") by store
It can be done manually, but I want to make this query universal, so it doesn't need to be changed when new categories are added.

Converting rows to columns is called "pivoting"; see the KQL pivot plugin:
let sales = datatable (store: string, category: string, product: string)
[
"StoreA", "Food", "Steak",
"StoreB", "Drink", "Cola",
"StoreB", "Food", "Fries",
"StoreA", "Sweets", "Cake",
"StoreB", "Food", "Hotdog",
"StoreB", "Food", "Salad",
"StoreA", "Sweets", "Chocolate",
"StoreC", "Food", "Steak"
];
sales
| evaluate pivot(category, count(), store)
+--------+-------+------+--------+
| store  | Drink | Food | Sweets |
|--------+-------+------+--------|
| StoreA | 0     | 1    | 2      |
| StoreB | 1     | 3    | 0      |
| StoreC | 0     | 1    | 0      |
+--------+-------+------+--------+
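Because pivot derives its output columns from the data at query time, the query stays universal: a new category simply shows up as a new column. A small illustration with hypothetical data:
// Hypothetical rows including a brand-new category, "Frozen";
// the pivot output gains a Frozen column with no query changes.
let sales = datatable (store: string, category: string, product: string)
[
"StoreA", "Food", "Steak",
"StoreC", "Frozen", "IceCream"
];
sales
| evaluate pivot(category, count(), store)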

Related

BigQuery string-formatting to json

Is the following a full list of all value types as they're passed to JSON in BigQuery? I've gotten this by trial and error but haven't been able to find it in the documentation:
select
NULL as NullValue,
FALSE as BoolValue,
DATE '2014-01-01' as DateValue,
INTERVAL 1 year as IntervalValue,
DATETIME '2014-01-01 01:02:03' as DatetimeValue,
TIMESTAMP '2014-01-01 01:02:03' as TimestampValue,
"Hello" as StringValue,
B"abc" as BytesValue,
123 as IntegerValue,
NUMERIC '3.14' as NumericValue,
3.14 as FloatValue,
TIME '12:30:00.45' as TimeValue,
[1,2,3] as ArrayValue,
STRUCT('Mark' as first, 'Thomas' as last) as StructValue,
[STRUCT(1 as x, 2 as y), STRUCT(5 as x, 6 as y)] as ArrayStructValue,
STRUCT(1 as x, [1,2,3] as y, ('a','b','c') as z) as StructNestedValue
The resulting JSON:
{
"NullValue": null,
"BoolValue": "false", // why not just false without quotes?
"DateValue": "2014-01-01",
"IntervalValue": "1-0 0 0:0:0",
"DatetimeValue": "2014-01-01T01:02:03",
"TimestampValue": "2014-01-01T01:02:03Z",
"StringValue": "Hello",
"BytesValue": "YWJj",
"IntegerValue": "123",
"NumericValue": "3.14",
"FloatValue": "3.14",
"TimeValue": "12:30:00.450000",
"ArrayValue": ["1", "2", "3"],
"StructValue": {
"first": "Mark",
"last": "Thomas"
},
"ArrayStructValue": [
{"x": "1", "y": "2"},
{"x": "5", "y": "6"}
],
"StructNestedValue": {
"x": "1",
"y": ["1", "2", "3"],
"z": {"a": "a", "b": "b", "c": "c"}
}
}
Honestly, it seems to me that other than the null value and the array [] or struct {} container, everything is string-enclosed, which seems a bit odd.
According to this document, JSON is built on two structures:
A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
The result of the SELECT query is in JSON format, where [] denotes an array, {} denotes an object, and double quotes (" ") denote a string value, just as in the query itself.
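As a minimal illustration of those two structures (plain JSON, not BigQuery-specific):
{
"pairs": {"first": "Mark", "last": "Thomas"},
"ordered": ["1", "2", "3"]
}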

Create tabular View by Spreading Data from JSON in Snowflake

I'm very new to Snowflake and I'm working on creating a view from a table that holds JSON data, as follows:
{
"data": {
"baseData": {
"dom_url": "https://www.soccertables.com/european_tables",
"event_id": "01b2722a-d8e6-4f67-95d0-8dd7ba088a4a",
"event_utc_time": "2020-05-11 09:01:14.821",
"ip_address": "125.238.134.96",
"table_1": [
{
"position": "1",
"team_name": "Liverpool",
"games_played": "29",
"games_won": "26",
"games_drawn": "2",
"games_lost": "1",
"goals_for": "75",
"goals_against": "35",
"points": "80"
},
{
"position": "2",
"team_name": "Man. City",
"games_played": "29",
"games_won": "20",
"games_drawn": "5",
"games_lost": "4",
"goals_for": "60",
"goals_against": "45",
"points": "65"
},
{
"position": "...",
"team_name": "...",
"games_played": "...",
"games_won": "...",
"games_drawn": "...",
"games_lost": "...",
"goals_for": "...",
"goals_against": "...",
"points": "..."
}
],
"unitID": "CN 8000",
"ver": "1.0.0"
},
"baseType": "MatchData"
},
"dataName": "CN8000.Prod.MatchData",
"id": "18a89f9e-9620-4453-a546-23412025e7c0",
"tags": {
"itrain.access.level1": "Private",
"itrain.access.level2": "Kumar",
"itrain.internal.deviceID": "",
"itrain.internal.deviceName": "",
"itrain.internal.encodeTime": "2022-03-23T07:41:19.000Z",
"itrain.internal.sender": "Harish",
"itrain.software.name": "",
"itrain.software.partNumber": 0,
"itrain.software.version": ""
},
"timestamp": "2021-02-25T07:32:31.000Z"
}
I want to extract the common values (dom_url, event_id, event_utc_time, ip_address) along with each team_name in a separate column, and the associated team details (position, games_played, etc.) in rows for each team name.
I've been trying the LATERAL FLATTEN function but haven't succeeded so far:
create or replace view AWSS3_PM.PUBLIC.PM_POWER_CN8000_V1(
DOM_URL,
EVENT_ID,
EVENT_UTC_TIME,
IP_ADDRESS,
TIMESTAMP,
POSITION,
GAMES_PLAYED,
GAMES_WON,
GAMES_LOST,
GAMES_DRAWN
) as
select c1:data:baseData:dom_url dom_url,
c1:data:baseData:event_id event_id,
c1:data:baseData:event_utc_time event_utc_time,
c1:data:baseData:ip_address ip_address,
c1:timestamp timestamp,
value:position TeamPosition,
value:games_played gamesPlayed,
value:games_won wins ,
value:games_lost defeats,
value:games_drawn draws
from pm_power, lateral flatten(input => c1:data:baseData:table_1);
Any help would be greatly appreciated.
Thanks,
Harish
For the table portion of the JSON, you would need flattening and a transpose; example below.
Sample table:
select * from test_json;
+--------------------------------+
| TAB_VAL |
|--------------------------------|
| { |
| "table_1": [ |
| { |
| "games_drawn": "2", |
| "games_lost": "1", |
| "games_played": "29", |
| "games_won": "26", |
| "goals_against": "35", |
| "goals_for": "75", |
| "points": "80", |
| "position": "1", |
| "team_name": "Liverpool" |
| }, |
| { |
| "games_drawn": "5", |
| "games_lost": "4", |
| "games_played": "29", |
| "games_won": "20", |
| "goals_against": "45", |
| "goals_for": "60", |
| "points": "65", |
| "position": "2", |
| "team_name": "Man. City" |
| } |
| ] |
| } |
+--------------------------------+
1 Row(s) produced. Time Elapsed: 0.285s
Perform the transpose after flattening the JSON:
select * from (
select figures,stats,team_name
from (
select
f.value:"games_drawn"::number as games_drawn,
f.value:"games_lost"::number as games_lost,
f.value:"games_played"::number as games_played,
f.value:"games_won"::number as games_won,
f.value:"goals_against"::number as goals_against,
f.value:"goals_for"::number as goals_for,
f.value:"points"::number as points,
f.value:"position"::number as position,
f.value:"team_name"::String as team_name
from
TEST_JSON, table(flatten(input=>tab_val:table_1, mode=>'ARRAY')) as f
) flt
unpivot (figures for stats in(games_drawn, games_lost, games_played, games_won, goals_against, goals_for, points,position))
) up
pivot (min(up.figures) for up.team_name in ('Liverpool','Man. City'));
+---------------+-------------+-------------+
| STATS | 'Liverpool' | 'Man. City' |
|---------------+-------------+-------------|
| GAMES_DRAWN | 2 | 5 |
| GAMES_LOST | 1 | 4 |
| GAMES_PLAYED | 29 | 29 |
| GAMES_WON | 26 | 20 |
| GOALS_AGAINST | 35 | 45 |
| GOALS_FOR | 75 | 60 |
| POINTS | 80 | 65 |
| POSITION | 1 | 2 |
+---------------+-------------+-------------+
8 Row(s) produced. Time Elapsed: 0.293s
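To also carry the common fields (dom_url, event_id, event_utc_time, ip_address) alongside each team row, a hedged sketch, assuming the full document sits in the variant column c1 of pm_power as in the question:
select
c1:data:baseData:dom_url::string as dom_url,
c1:data:baseData:event_id::string as event_id,
c1:data:baseData:event_utc_time::string as event_utc_time,
c1:data:baseData:ip_address::string as ip_address,
f.value:team_name::string as team_name,
f.value:position::number as position,
f.value:games_played::number as games_played,
f.value:games_won::number as games_won,
f.value:games_lost::number as games_lost,
f.value:games_drawn::number as games_drawn
from pm_power,
lateral flatten(input => c1:data:baseData:table_1) f;
-- the ::string / ::number casts turn the variant values into typed columns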

Add Map within a Map in a column

In the Metadata column I have a Map-type value:
+-----------+--------+-----------+--------------------------------+
| Noun| Pronoun| Adjective|Metadata |
+-----------+--------+-----------+--------------------------------+
| Homer| Simpson|Engineer |["Age": "50", "Country": "USA"] |
| Elon | Musk |King |["Age": "45", "Country": "RSA"] |
| Bart | Lee |Cricketer |["Age": "35", "Country": "AUS"] |
| Lisa | Jobs |Daughter |["Age": "35", "Country": "IND"] |
| Joe | Root |Player |["Age": "31", "Country": "ENG"] |
+-----------+--------+-----------+--------------------------------+
I want to append another Map-type value inside Metadata under a key called tags.
+-----------+--------+-----------+--------------------------------------------------------------------+
| Noun| Pronoun| Adjective|Metadata |
+-----------+--------+-----------+--------------------------------------------------------------------+
| Homer| Simpson|Engineer |["Age": "50", "Country": "USA", "tags": ["Gen": "M", "Fit": "Yes"]] |
| Elon | Musk |King |["Age": "45", "Country": "RSA", "tags": ["Gen": "M", "Fit": "Yes"]] |
| Bart | Lee |Cricketer |["Age": "35", "Country": "AUS", "tags": ["Gen": "M", "Fit": "No"]] |
| Lisa | Jobs |Daughter |["Age": "35", "Country": "IND", "tags": ["Gen": "F", "Fit": "Yes"]] |
| Joe | Root |Player |["Age": "31", "Country": "ENG", "tags": ["Gen": "M", "Fit": "Yes"]] |
+-----------+--------+-----------+--------------------------------------------------------------------+
In the Metadata column the outer Map is already a typedLit; adding another Map within it is not allowed.
I implemented it using a struct. This is how it looks:
df.withColumn("Metadata", struct(col("Metadata")("Age").alias("Age"), col("Metadata")("Country").alias("Country"), typedLit(tags).alias("tags")))
It won't be exactly a key/value pair, but it is still queryable through the field aliases.
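For reference, a fuller, self-contained sketch of that line (Scala Spark; df with a map<string,string> Metadata column and the tags map are assumed from the question):
import org.apache.spark.sql.functions.{col, struct, typedLit}

val tags = Map("Gen" -> "M", "Fit" -> "Yes")

// Rebuild Metadata as a struct: the existing map entries become named
// fields and the tags map is nested alongside them. A struct may mix
// value types, which a map<string,string> column cannot.
val out = df.withColumn(
  "Metadata",
  struct(
    col("Metadata")("Age").alias("Age"),
    col("Metadata")("Country").alias("Country"),
    typedLit(tags).alias("tags")
  )
)

// Queryable through the aliases:
out.select(col("Metadata")("tags")("Gen")).show()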

lodash: group objects by itemColor, itemSize, shopId

I'm using the lodash library. This is my data, an array of objects:
[ {
"itemColor": "red",
"itemSize": "L",
"itemCount": 1,
"shopId": "shop 1",
"itemName": "product name 1",
},
{
"itemColor": "red",
"itemSize": "L",
"itemCount": 3,
"shopId": "shop 2",
"itemName": "product name 1",
},
{
"itemColor": "red",
"itemSize": "L",
"itemCount": 5,
"shopId": "shop 3",
"itemName": "product name 1",
},
{
"itemColor": "green",
"itemSize": "S",
"itemCount": 1,
"shopId": "shop 3",
"itemName": "product name 2",
}]
I need to group the items by itemSize and itemColor, and as a result I need this table:
+----------------+-------+------+--------+--------+--------+
| itemName | color | size | shop 1 | shop 2 | shop 3 |
+================+=======+======+========+========+========+
| product name 1 | red | L | 1 | 3 | 5 |
+----------------+-------+------+--------+--------+--------+
| product name 2 | green | S | 0 | 0 | 1 |
+----------------+-------+------+--------+--------+--------+
If a shop has no matching items, the value should be 0.
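A minimal sketch with lodash (variable names are illustrative; the shop columns are derived from the data, and shops with no matching items default to 0):
const _ = require('lodash');

// items = the array of objects from the question
const shops = _.uniq(_.map(items, 'shopId')).sort();

const rows = _(items)
  .groupBy(i => [i.itemName, i.itemColor, i.itemSize].join('|'))
  .map(group => {
    const { itemName, itemColor, itemSize } = group[0];
    const row = { itemName, color: itemColor, size: itemSize };
    shops.forEach(s => { row[s] = 0; });                    // default 0 per shop
    group.forEach(i => { row[i.shopId] += i.itemCount; });  // fill real counts
    return row;
  })
  .value();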

Combining separate temporal measurement series

I have a data set that combines two temporal measurement series, with one row per measurement:
time: 1, measurement: a, value: 5
time: 2, measurement: b, value: false
time: 10, measurement: a, value: 2
time: 13, measurement: b, value: true
time: 20, measurement: a, value: 4
time: 24, measurement: b, value: true
time: 30, measurement: a, value: 6
time: 32, measurement: b, value: false
In a visualization using Vega-Lite, I'd like to combine the measurement series and encode measurements a and b in a single visualization, without simply layering their representations on a temporal axis, but instead representing their values in a single encoding spec.
Either measurement a values need to be interpolated and added as a new field on rows of measurement b, e.g.:
time: 2, measurement: b, value: false, interpolatedMeasurementA: 4.6667
Or the other way around, which leaves the question of how to interpolate a boolean; maybe closest value by time, or simpler, last value, e.g.:
time: 30, measurement: a, value: 6, lastValueMeasurementB: true
I suppose this could be done either query-side, in which case this question is about the InfluxDB Flux query language, or on the visualization side, in which case it is about Vega-Lite.
There aren't any true linear interpolation schemes built into Vega-Lite (though the loess transform comes close), but you can achieve roughly what you want with a window transform.
Here is an example (view in editor):
{
"data": {
"values": [
{"time": 1, "measurement": "a", "value": 5},
{"time": 2, "measurement": "b", "value": false},
{"time": 10, "measurement": "a", "value": 2},
{"time": 13, "measurement": "b", "value": true},
{"time": 20, "measurement": "a", "value": 4},
{"time": 24, "measurement": "b", "value": true},
{"time": 30, "measurement": "a", "value": 6},
{"time": 32, "measurement": "b", "value": false}
]
},
"transform": [
{
"calculate": "datum.measurement == 'a' ? datum.value : null",
"as": "measurement_a"
},
{
"window": [
{"op": "mean", "field": "measurement_a", "as": "interpolated"}
],
"sort": [{"field": "time"}],
"frame": [1, 1]
},
{"filter": "datum.measurement == 'b'"}
],
"mark": "line",
"encoding": {
"x": {"field": "time"},
"y": {"field": "interpolated"},
"color": {"field": "value"}
}
}
This first uses a calculate transform to isolate the values to be interpolated, then a window transform that computes the mean over adjacent values (frame: [1, 1]), then a filter transform to isolate interpolated rows.
If you wanted to go the other route, you could do a similar sequence of transforms targeting the boolean value instead.
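A hedged sketch of that reverse route, replacing the transform array above. It assumes the rows strictly alternate between a and b, so a mean over the frame [1, 0] (which ignores nulls) picks up the previous b value; the field name lastValueMeasurementB follows the question:
"transform": [
  {
    "calculate": "datum.measurement == 'b' ? (datum.value ? 1 : 0) : null",
    "as": "measurement_b"
  },
  {
    "window": [{"op": "mean", "field": "measurement_b", "as": "b_prev"}],
    "sort": [{"field": "time"}],
    "frame": [1, 0]
  },
  {"filter": "datum.measurement == 'a'"},
  {"calculate": "datum.b_prev >= 0.5", "as": "lastValueMeasurementB"}
]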