BigQuery string-formatting to JSON

Is the following a full list of all value types as they're passed to JSON in BigQuery? I've gotten this by trial and error but haven't been able to find it in the documentation:
select
NULL as NullValue,
FALSE as BoolValue,
DATE '2014-01-01' as DateValue,
INTERVAL 1 year as IntervalValue,
DATETIME '2014-01-01 01:02:03' as DatetimeValue,
TIMESTAMP '2014-01-01 01:02:03' as TimestampValue,
"Hello" as StringValue,
B"abc" as BytesValue,
123 as IntegerValue,
NUMERIC '3.14' as NumericValue,
3.14 as FloatValue,
TIME '12:30:00.45' as TimeValue,
[1,2,3] as ArrayValue,
STRUCT('Mark' as first, 'Thomas' as last) as StructValue,
[STRUCT(1 as x, 2 as y), STRUCT(5 as x, 6 as y)] as ArrayStructValue,
STRUCT(1 as x, [1,2,3] as y, ('a','b','c') as z) as StructNestedValue
{
"NullValue": null,
"BoolValue": "false", // why not just false without quotes?
"DateValue": "2014-01-01",
"IntervalValue": "1-0 0 0:0:0",
"DatetimeValue": "2014-01-01T01:02:03",
"TimestampValue": "2014-01-01T01:02:03Z",
"StringValue": "Hello",
"BytesValue": "YWJj",
"IntegerValue": "123",
"NumericValue": "3.14",
"FloatValue": "3.14",
"TimeValue": "12:30:00.450000",
"ArrayValue": ["1", "2", "3"],
"StructValue": {
"first": "Mark",
"last": "Thomas"
},
"ArrayStructValue": [
{"x": "1", "y": "2"},
{"x": "5", "y": "6"}
],
"StructNestedValue": {
"x": "1",
"y": ["1", 2", "3"],
"z": {"a": "a", b": "b", "c": "c"}
}
}
Honestly, it seems to me that other than the null value and the array [] or struct {} container, everything is string-enclosed, which seems a bit odd.

According to this document, JSON is built on two structures:
A collection of name/value pairs. In various languages, this is
realized as an object, record, struct, dictionary, hash table, keyed
list, or associative array.
An ordered list of values. In most
languages, this is realized as an array, vector, list, or sequence.
The result of the SELECT query is in JSON format, where [] denotes an array, {} denotes an object, and double quotes (" ") denote a string value, just as in the query itself. As for why scalars such as booleans and integers arrive quoted: BigQuery's REST API represents every scalar cell as a string, most likely so that 64-bit integers survive JSON parsers that treat all numbers as IEEE 754 doubles; only null and the array/object containers keep a native JSON form.
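If you want to experiment with the formatting rules from inside SQL itself, TO_JSON_STRING is one way to do it (a small sketch; note that, unlike the REST API export shown above, TO_JSON_STRING emits native JSON booleans and numbers):
-- A quick sketch: render a row as JSON from SQL itself. Unlike the API
-- export above, TO_JSON_STRING keeps booleans and integers unquoted.
SELECT TO_JSON_STRING(STRUCT(
  FALSE as BoolValue,
  123 as IntegerValue,
  [1, 2, 3] as ArrayValue
)) as json_row
-- returns: {"BoolValue":false,"IntegerValue":123,"ArrayValue":[1,2,3]}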

Related

Laravel query sum group by week and get 0 for weeks not existent in the dataset

Hi, I am trying to get the sum of quantity grouped by week for the current year.
Here is my query, which works:
Sale::selectRaw('sum(qty) as y')
->selectRaw('WEEK(order_date, 1) as x')
->whereYear('order_date',Carbon::now()->format('Y'))
->groupBy('x')
->orderBy('x', 'ASC')
->get();
The response I get looks like this, where x is the week number and y is the sum:
[
{
"y": "50",
"x": 2
},
{
"y": "4",
"x": 14
}
]
I want to get 0 values for the weeks that don't have any value for y.
My desired result should look like this:
[
{
"y": "0",
"x": 1
},
{
"y": "50",
"x": 2
},
...
...
...
{
"y": "4",
"x": 14
}
]
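One way to approach this (a sketch, not from the original thread) is to generate the full list of week numbers on the database side and LEFT JOIN the aggregated sales onto it, so missing weeks come back as 0. The sketch below assumes MySQL 8+ for the recursive CTE and a sales table matching the column names in the question; in Laravel it could be run via DB::select():
-- Generate week numbers 1..53, then LEFT JOIN the sales onto them so
-- weeks with no orders still appear, with y coalesced to 0.
WITH RECURSIVE weeks AS (
  SELECT 1 AS x
  UNION ALL
  SELECT x + 1 FROM weeks WHERE x < 53
)
SELECT weeks.x, COALESCE(SUM(sales.qty), 0) AS y
FROM weeks
LEFT JOIN sales
  ON WEEK(sales.order_date, 1) = weeks.x
 AND YEAR(sales.order_date) = YEAR(CURDATE())
GROUP BY weeks.x
ORDER BY weeks.x;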

Make a property bag from a list of keys and values

I have a list containing the keys and another list containing values (obtained from splitting a log line). How can I combine the two to make a property bag in Kusto?
let headers = pack_array("A", "B", "C");
datatable(RawData:string)
[
"1,2,3",
"4,5,6",
]
| extend fields = split(RawData, ",")
| extend dict = ???
Expected:
dict
-----
{"A": 1, "B": 2, "C": 3}
{"A": 4, "B": 5, "C": 6}
Here's one option that uses a combination of:
mv-apply: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/mv-applyoperator
pack(): https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/packfunction
make_bag(): https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/make-bag-aggfunction
let keys = pack_array("A", "B", "C");
datatable(RawData:string)
[
"1,2,3",
"4,5,6",
]
| project values = split(RawData, ",")
| mv-apply with_itemindex = i key = keys to typeof(string) on (
summarize dict = make_bag(pack(key, values[i]))
)
values              dict
------------------  -------------------------------
["1", "2", "3"]     {"A": "1", "B": "2", "C": "3"}
["4", "5", "6"]     {"A": "4", "B": "5", "C": "6"}

Combining separate temporal measurement series

I have a data set that combines two temporal measurement series, with one row per measurement:
time: 1, measurement: a, value: 5
time: 2, measurement: b, value: false
time: 10, measurement: a, value: 2
time: 13, measurement: b, value: true
time: 20, measurement: a, value: 4
time: 24, measurement: b, value: true
time: 30, measurement: a, value: 6
time: 32, measurement: b, value: false
In a visualization using Vega-Lite, I'd like to combine the two measurement series and encode measurements a and b in a single encoding spec, rather than simply layering their representations on a shared temporal axis.
Either measurement a's values need to be interpolated and added as a new field on the rows of measurement b,
e.g.:
time: 2, measurement: b, value: false, interpolatedMeasurementA: 4.6667
or the other way around, which leaves the question of how to interpolate a boolean. Maybe closest value by time, or simpler: last value,
e.g.:
time: 30, measurement: a, value: 6, lastValueMeasurementB: true
I suppose this could be done either query-side, in which case this question is about InfluxDB's Flux query language,
or on the visualization side, in which case it is about Vega-Lite.
There aren't any true linear interpolation schemes built into Vega-Lite (though the loess transform comes close), but you can achieve roughly what you wish with a window transform.
Here is an example (view in editor):
{
"data": {
"values": [
{"time": 1, "measurement": "a", "value": 5},
{"time": 2, "measurement": "b", "value": false},
{"time": 10, "measurement": "a", "value": 2},
{"time": 13, "measurement": "b", "value": true},
{"time": 20, "measurement": "a", "value": 4},
{"time": 24, "measurement": "b", "value": true},
{"time": 30, "measurement": "a", "value": 6},
{"time": 32, "measurement": "b", "value": false}
]
},
"transform": [
{
"calculate": "datum.measurement == 'a' ? datum.value : null",
"as": "measurement_a"
},
{
"window": [
{"op": "mean", "field": "measurement_a", "as": "interpolated"}
],
"sort": [{"field": "time"}],
"frame": [1, 1]
},
{"filter": "datum.measurement == 'b'"}
],
"mark": "line",
"encoding": {
"x": {"field": "time"},
"y": {"field": "interpolated"},
"color": {"field": "value"}
}
}
This first uses a calculate transform to isolate the values to be interpolated, then a window transform that computes the mean over adjacent values (frame: [1, 1]), then a filter transform to isolate interpolated rows.
If you wanted to go the other route, you could do a similar sequence of transforms targeting the boolean value instead.

How to extract separate values from GeoJSON in BigQuery

I have a GeoJSON string for a multipoint geometry. I want to extract each of those points to a table of individual point geometries in BigQuery.
I have been able to achieve a point geometry for one of the points, and I want to do it for all the others as well in an automated fashion. I've already tried converting the string to an array, but it remains an array of size 1 with the entire content as a single string.
This is what worked for me to extract one point and convert it to a geometry:
WITH temp_table as (select '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ] }' as string)
select ST_GEOGPOINT(CAST(JSON_EXTRACT(string, '$.coordinates[0][0]') as FLOAT64), CAST(JSON_EXTRACT(string, '$.coordinates[0][1]') as FLOAT64)) from temp_table
This results in POINT(20 10)
I can write manual queries for each of these points and do a UNION ALL, but that won't scale or work every time. I want to achieve this in an automated fashion. For architectural purposes, we can't do string manipulation in languages like Python.
Below is for BigQuery Standard SQL
#standardSQL
SELECT
ARRAY(
SELECT ST_GEOGPOINT(
CAST(SPLIT(pair)[OFFSET(0)] AS FLOAT64), CAST(SPLIT(pair)[SAFE_OFFSET(1)] AS FLOAT64))
FROM UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(STRING, '$.coordinates'), r'\[(\d+,\d+)\]')) pair
) points
FROM `project.dataset.temp_table`
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.temp_table` AS (
SELECT '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ] }' AS STRING
)
SELECT
ARRAY(
SELECT ST_GEOGPOINT(
CAST(SPLIT(pair)[OFFSET(0)] AS FLOAT64), CAST(SPLIT(pair)[SAFE_OFFSET(1)] AS FLOAT64))
FROM UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(STRING, '$.coordinates'), r'\[(\d+,\d+)\]')) pair
) points
FROM `project.dataset.temp_table`
with result
Row points
1 POINT(20 10)
POINT(30 5)
POINT(90 50)
POINT(40 80)
Note: in the above version, an array of points is produced for each respective original row. Obviously you can adjust it to flatten the output, as in the example below:
#standardSQL
WITH `project.dataset.temp_table` AS (
SELECT '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ] }' AS STRING
)
SELECT
ST_GEOGPOINT(
CAST(SPLIT(pair)[OFFSET(0)] AS FLOAT64), CAST(SPLIT(pair)[SAFE_OFFSET(1)] AS FLOAT64)
) points
FROM `project.dataset.temp_table`, UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(STRING, '$.coordinates'), r'\[(\d+,\d+)\]')) pair
with result
Row points
1 POINT(20 10)
2 POINT(30 5)
3 POINT(90 50)
4 POINT(40 80)
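As a side note (an alternative I'm adding here, not part of the original answer): on current BigQuery Standard SQL you can avoid the regex entirely with JSON_EXTRACT_ARRAY, which also handles negative or fractional coordinates that \d+ would miss:
#standardSQL
WITH `project.dataset.temp_table` AS (
SELECT '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ] }' AS STRING
)
SELECT
ST_GEOGPOINT(
-- each array element is a JSON-formatted string like "[20,10]"
CAST(JSON_EXTRACT_SCALAR(point, '$[0]') AS FLOAT64),
CAST(JSON_EXTRACT_SCALAR(point, '$[1]') AS FLOAT64)
) points
FROM `project.dataset.temp_table`, UNNEST(JSON_EXTRACT_ARRAY(STRING, '$.coordinates')) point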

In PostgreSQL, what's the best way to select an object from a JSONB array?

Right now, I have an array that I'm able to select off a table.
[{"_id": 1, "count": 3}, {"_id": 2, "count": 14}, {"_id": 3, "count": 5}]
From this, I only need the count for a particular _id. For example, I need the count for
_id: 3
I've read the documentation but I haven't been able to figure out the correct way to get the object.
WITH test_array(data) AS ( VALUES
('[
{"_id": 1, "count": 3},
{"_id": 2, "count": 14},
{"_id": 3, "count": 5}
]'::JSONB)
)
SELECT val->>'count' AS result
FROM
test_array ta,
jsonb_array_elements(ta.data) val
WHERE val @> '{"_id":3}'::JSONB;
Result:
result
--------
5
(1 row)
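On PostgreSQL 12 and later, the same lookup can also be written with an SQL/JSON path expression; a minimal sketch, assuming the same test_array setup as above:
-- Filter the array elements by _id and project out the count.
SELECT jsonb_path_query(ta.data, '$[*] ? (@._id == 3).count') AS result
FROM test_array ta;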