Is it possible to combine data from different tables in one column - sql

I'm trying to achieve the following JSON structure with a SQL-Schema:
"data_set": [
{
"id": 0,
"annotation": "foo",
"value":
{
"type": "number",
"value": 10.0,
"unit": "m"
}
},
{
"id": 1,
"annotation": "bar",
"value":
{
"type": "text",
"value": "Hello World"
}
}
]
The tricky part is that I don't want to include values of only one type, but of different types. My thought was to have a different table for each value type, e.g.:
numeric_value: id {PK} | type | value | unit
text_values: id {PK} | type | value
and include them in the data_set table via foreign key:
data_set: id {PK} | annotation | value {FK}
My problem is that I'm not sure how to reference ids from different tables in one column using keys, or if I'm taking a totally wrong approach to this problem.

You can do this in different ways. One of them is to use only one table, defined in such a way that when value_type is 'number' you are forced to fill in numeric_value and unit (and nothing else), and when it is 'text' you must fill in text_value (and leave the rest NULL).
This definition could do it:
CREATE TABLE data_set
(
    data_set_id   integer PRIMARY KEY,
    annotation    text,
    value_type    text NOT NULL,
    numeric_value numeric,
    unit          text,
    text_value    text,

    /* Constraint to make sure values are consistent with the declared type */
    CHECK (CASE WHEN value_type = 'text' THEN
                     numeric_value IS NULL AND unit IS NULL AND text_value IS NOT NULL
                WHEN value_type = 'number' THEN
                     numeric_value IS NOT NULL AND unit IS NOT NULL AND text_value IS NULL
                ELSE
                     false
           END)
);
The following inserts would be valid:
INSERT INTO data_set
    (data_set_id, annotation, value_type, numeric_value, unit)
VALUES
    (1, 'height', 'number', 10.0, 'm');

INSERT INTO data_set
    (data_set_id, annotation, value_type, text_value)
VALUES
    (2, 'brand', 'text', 'BigBrand');
These would not:
-- Rejected by the CHECK: unknown value_type
INSERT INTO data_set
    (data_set_id, annotation, value_type, text_value)
VALUES
    (3, 'brand', 'strange_type', 'mistake');

-- Rejected by the CHECK: value_type 'number' with a text_value filled in
INSERT INTO data_set
    (data_set_id, annotation, value_type, text_value)
VALUES
    (4, 'brand', 'number', 'mistake');
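If you then want to render the JSON shape from your question out of this table, it can be produced directly; a minimal sketch, assuming PostgreSQL's json_build_object and json_agg:

SELECT json_agg(
           json_build_object(
               'id', data_set_id,
               'annotation', annotation,
               'value', CASE value_type
                            WHEN 'number' THEN json_build_object('type', value_type, 'value', numeric_value, 'unit', unit)
                            ELSE json_build_object('type', value_type, 'value', text_value)
                        END
           )
       ) AS data_set
FROM data_set;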


In Snowflake, how do I parse an unnamed JSON array and access each key with key value rather than array slicing methods?

I have a JSON array in Snowflake which I need to parse into a table, i.e., convert all its information into a row. Assume my table has two columns: id and json_tag.
An example row is as below:
json_tag:
[
    {
        "key": "app.name",
        "value": "myapp1"
    },
    {
        "key": "device.name",
        "value": "myiPhone11"
    },
    {
        "key": "iOS.dist",
        "value": "latestDist5"
    }
]
The structure is not fixed: another row can have 5 or even 6 entries, with new key names. An example is below:
json_tag:
[
    {
        "key": "app.name",
        "value": "myapp2"
    },
    {
        "key": "app.cost",
        "value": "$2.99"
    },
    {
        "key": "device.name",
        "value": "myiPhoneX"
    },
    {
        "key": "device.color",
        "value": "gold"
    },
    {
        "key": "iOS.dist",
        "value": "latestDist4.9"
    }
]
What I want is to work on this table (without creating a new one), with the JSON split into columns as below:

id | app.name | app.cost | device.name | device.color | iOS.dist
---+----------+----------+-------------+--------------+--------------
 1 | myapp1   | null     | myiPhone11  | null         | latestDist5
 2 | myapp2   | $2.99    | myiPhoneX   | gold         | latestDist4.9
I tried the following snippet:
with parsed_tb as (
    select id,
           to_variant(parse_json(json_tag)) as parsed_json_tag
    from mytable
)
select parsed_json_tag[0]:value::varchar as app_name,
       parsed_json_tag[1]:value::varchar as app_cost,
       parsed_json_tag[2]:value::varchar as device_name
from parsed_tb;
As you can imagine, the snippet above does not work when a row has no app.cost key, or when rows differ in their number of keys and values.
I tried Snowflake's lateral flatten command, but it creates many rows and I cannot figure out how to put them into columns of the same row. I also tried a recursive approach, but could not make it work.
So my question is:
1. How can I access a key by its name rather than slicing the array as I do above? This alone would solve my problem, I guess.
2. If the solution I imagine in #1 does not exist, how can I attain the table above?
I have found a solution to this problem with the help of the following thread:
Mysql, reshape data from long / tall to wide
The solution is to first parse the JSON column together with a unique identifier, then lateral flatten, and then group the results by that identifier (reshaping the table from long to wide, as shown in the link above).
So for the examples I provided above, the solution would be as follows:
with tb as (
    select id,
           to_variant(parse_json(json_tag)) as parsed_json_tags
    from mytable
),
flattened as (
    select id,
           VALUE:value::string as value,
           VALUE:key::string as key
    from tb, lateral flatten(input => tb.parsed_json_tags)
)
select id,
       MAX( IFF( key = 'app.name',     value, NULL ) ) AS app_name,
       MAX( IFF( key = 'app.cost',     value, NULL ) ) AS app_cost,
       MAX( IFF( key = 'device.name',  value, NULL ) ) AS device_name,
       MAX( IFF( key = 'device.color', value, NULL ) ) AS device_color,
       MAX( IFF( key = 'iOS.dist',     value, NULL ) ) AS ios_dist
from flattened
group by 1;
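A quick way to sanity-check the pattern is a scratch table (names and data hypothetical, matching the columns above):

create or replace temporary table mytable (id integer, json_tag varchar);

insert into mytable
    select 1, '[{"key":"app.name","value":"myapp1"},{"key":"iOS.dist","value":"latestDist5"}]'
    union all
    select 2, '[{"key":"app.name","value":"myapp2"},{"key":"app.cost","value":"$2.99"}]';

Running the query above against this table should return one row per id, with NULL in the columns whose key is absent from that row.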

How to extract a value as a column from JSON with multiple key-value lists using (materialized-view-compatible) SQL?

There is a table with measurements. This table has a column named measurement (JSON type) that contains lists of named parameter values.
A sample table with one key-value list called parameters can be defined as follows:
select 1 id, parse_json('{"parameters":[{"name":"aaa","value":10},{"name":"bbb","value":20},{"name":"ccc","value":30}]}') measurement union all
select 2 id, parse_json('{"parameters":[{"name":"aaa","value":11},{"name":"bbb","value":22},{"name":"ccc","value":33}]}') measurement union all
select 3 id, parse_json('{"parameters":[{"name":"aaa","value":111},{"name":"bbb","value":222},{"name":"ccc","value":333}]}') measurement
Same in table form:

id | measurement
---+----------------------------------------------------------------------------------------------------
 1 | {"parameters":[{"name":"aaa","value":10},{"name":"bbb","value":20},{"name":"ccc","value":30}]}
 2 | {"parameters":[{"name":"aaa","value":11},{"name":"bbb","value":22},{"name":"ccc","value":33}]}
 3 | {"parameters":[{"name":"aaa","value":111},{"name":"bbb","value":222},{"name":"ccc","value":333}]}
Now, I want to extract some values from the list to columns. For example, if I want parameters aaa and bbb, I would expect the output like:
id | aaa | bbb
---+-----+-----
 1 |  10 |  20
 2 |  11 |  22
 3 | 111 | 222
CASE-WHEN-GROUP-BY method
I can achieve this using 4 sub-queries. It already starts getting complex, but is still bearable:
with measurements AS (
    select 1 id, parse_json('{"parameters":[{"name":"aaa","value":10},{"name":"bbb","value":20},{"name":"ccc","value":30}]}') measurement union all
    select 2 id, parse_json('{"parameters":[{"name":"aaa","value":11},{"name":"bbb","value":22},{"name":"ccc","value":33}]}') measurement union all
    select 3 id, parse_json('{"parameters":[{"name":"aaa","value":111},{"name":"bbb","value":222},{"name":"ccc","value":333}]}') measurement
),
parameters AS (
    select id, JSON_QUERY_ARRAY(measurements.measurement.parameters) measurements_list
    from measurements
),
param_values as (
    select id, JSON_VALUE(ml.name) name, JSON_VALUE(ml.value) value
    from parameters, parameters.measurements_list ml
),
trimmed_values as (
    select id,
           case when name = "aaa" then value else null end as aaa,
           case when name = "bbb" then value else null end as bbb
    from param_values
    where name in ("aaa", "bbb")
)
select id, max(aaa) aaa, max(bbb) bbb
from trimmed_values
group by id
JSONPath method
I can also use full-featured JSONPath function as suggested by Mikhail. Then things start looking more manageable:
select id,
       bq_data_loader_json.CUSTOM_JSON_VALUE(TO_JSON_STRING(measurement.parameters), '$.[?(@.name=="aaa")].value') aaa,
       bq_data_loader_json.CUSTOM_JSON_VALUE(TO_JSON_STRING(measurement.parameters), '$.[?(@.name=="bbb")].value') bbb
from `sap-clm-analytics-dev.ag_experiment.measurements`
(It may be less efficient than the CASE-WHEN-GROUP-BY method because of the external UDF call, but let's focus on maintainability for now.)
Adding another list of values
Now I add another list of key-value pairs named colors:
select 1 id, parse_json('{"parameters":[{"name":"aaa","value":10},{"name":"bbb","value":20},{"name":"ccc","value":30}], "colors": [{"name": "green", "value": "A"}, {"name": "yellow", "value": "B"}]}') measurement union all
select 2 id, parse_json('{"parameters":[{"name":"aaa","value":10},{"name":"bbb","value":20},{"name":"ccc","value":30}], "colors": [{"name": "green", "value": "AA"}, {"name": "yellow", "value": "BB"}]}') measurement union all
select 3 id, parse_json('{"parameters":[{"name":"aaa","value":10},{"name":"bbb","value":20},{"name":"ccc","value":30}], "colors": [{"name": "green", "value": "AAA"}, {"name": "yellow", "value": "BBB"}]}') measurement
Let's pick values for green from the list of colors. Then the output will be:
id | aaa | bbb | green
---+-----+-----+------
 1 |  10 |  20 | A
 2 |  11 |  22 | AA
 3 | 111 | 222 | AAA
The JSONPath solution from above can be trivially extended to cover this case:
select id,
       bq_data_loader_json.CUSTOM_JSON_VALUE(TO_JSON_STRING(measurement.parameters), '$.[?(@.name=="aaa")].value') aaa,
       bq_data_loader_json.CUSTOM_JSON_VALUE(TO_JSON_STRING(measurement.parameters), '$.[?(@.name=="bbb")].value') bbb,
       bq_data_loader_json.CUSTOM_JSON_VALUE(TO_JSON_STRING(measurement.colors), '$.[?(@.name=="green")].value') green
from measurements
With the CASE-WHEN approach, things start getting tricky. The query below is already complex, and it is simply wrong:
with measurements AS (
    select 1 id, parse_json('{"parameters":[{"name":"aaa","value":10},{"name":"bbb","value":20},{"name":"ccc","value":30}], "colors": [{"name": "green", "value": "A"}, {"name": "yellow", "value": "B"}]}') measurement union all
    select 2 id, parse_json('{"parameters":[{"name":"aaa","value":10},{"name":"bbb","value":20},{"name":"ccc","value":30}], "colors": [{"name": "green", "value": "AA"}, {"name": "yellow", "value": "BB"}]}') measurement union all
    select 3 id, parse_json('{"parameters":[{"name":"aaa","value":10},{"name":"bbb","value":20},{"name":"ccc","value":30}], "colors": [{"name": "green", "value": "AAA"}, {"name": "yellow", "value": "BBB"}]}') measurement
),
parameters_colors AS (
    select id,
           JSON_QUERY_ARRAY(measurements.measurement.parameters) parameters_list,
           JSON_QUERY_ARRAY(measurements.measurement.colors) colors_list
    from measurements
),
param_color_values AS (
    select id,
           JSON_VALUE(parameters_list.name) param_name,
           JSON_VALUE(parameters_list.value) param_value,
           JSON_VALUE(colors_list.name) color_name,
           JSON_VALUE(colors_list.value) color_value
    from parameters_colors, parameters_colors.parameters_list, parameters_colors.colors_list
),
trimmed_values AS (
    select id,
           case when param_name = "aaa" then param_value else null end as aaa,
           case when param_name = "bbb" then param_value else null end as bbb,
           case when color_name = "green" then color_value else null end as green
    from param_color_values
    where param_name in ("aaa", "bbb") and color_name = "green"
)
select id, max(aaa) aaa, max(bbb) bbb, max(green) green
from trimmed_values
group by 1
Wrong result:
id | aaa | bbb | green
---+-----+-----+------
 1 | 10  | 20  | A
 2 | 10  | 20  | AA
 3 | 10  | 20  | AAA
The cartesian product in param_color_values is fine, but trimmed_values incorrectly fills the permutations with nulls. Apparently another level of nesting is needed for the "green" values.
It would probably be possible to fix my example, but it won't stay maintainable once yet another list of parameters is added. So I want to phrase my question differently.
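(For reference, a sketch of one possible fix, reusing the measurements CTE from above and assuming BigQuery's JSON type: flatten each list in its own CTE and join on id, so the two lists never multiply each other.)

with param_values AS (
    select id,
           max(case when JSON_VALUE(p.name) = "aaa" then JSON_VALUE(p.value) end) as aaa,
           max(case when JSON_VALUE(p.name) = "bbb" then JSON_VALUE(p.value) end) as bbb
    from measurements, unnest(JSON_QUERY_ARRAY(measurement.parameters)) p
    group by id
),
color_values AS (
    select id,
           max(case when JSON_VALUE(c.name) = "green" then JSON_VALUE(c.value) end) as green
    from measurements, unnest(JSON_QUERY_ARRAY(measurement.colors)) c
    group by id
)
select id, aaa, bbb, green
from param_values join color_values using (id)

Each additional list costs one more CTE and one more join, which stays correct but still grows with every list, hence the question below.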
Question
What would be a maintainable way to extract multiple values from such data structures in SQL?
Materialized view
Ideally, I'd like to persist such a query as a BigQuery materialized view. The original data object is huge, so I want to create a stage in the data pipeline that persists a curated subset of it, clustered differently, and I want BigQuery to manage the refreshes of this object. Materialized views support only a limited set of functions; for example, UDFs (like CUSTOM_JSON_VALUE) are not supported.
My current state
I tend to drop the idea of using the materialized view in favor of the maintainability of the UDF/JSONPath method and organize the refresh of the extracted dataset myself using scheduled queries.
Am I overlooking a trivial pure-SQL solution, ideally one that is materialized-view compatible and easy to extend to more complex cases?
Consider below approach (not compatible with materialized view)
create temp function get_keys(input string) returns array<string> language js as """
    return Object.keys(JSON.parse(input));
""";
create temp function get_values(input string) returns array<string> language js as """
    return Object.values(JSON.parse(input));
""";
create temp function get_leaves(input string) returns string language js as '''
    // recursively flatten nested objects into {"path.to.leaf": value} pairs
    function flattenObj(obj, parent = '', res = {}) {
        for (let key in obj) {
            let propName = parent ? parent + '.' + key : key;
            if (typeof obj[key] == 'object') {
                flattenObj(obj[key], propName, res);
            } else {
                res[propName] = obj[key];
            }
        }
        return JSON.stringify(res);
    }
    return flattenObj(JSON.parse(input));
''';
with temp as (
    select id, val,
           if(ends_with(key, '.name'), 'name', 'value') type,
           regexp_replace(key, r'\.name$|\.value$', '') key
    from your_table,
         unnest([struct(get_leaves(json_extract(to_json_string(measurement), '$')) as leaves)]),
         unnest(get_keys(leaves)) key with offset
    join unnest(get_values(leaves)) val with offset using(offset)
)
select * from (
    select * except(key)
    from temp
    pivot (any_value(val) for type in ('name', 'value'))
)
pivot (any_value(value) for name in ('aaa', 'bbb', 'ccc', 'green', 'yellow'))
If applied to the sample data in your question, the output should look like:
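id | aaa | bbb | ccc | green | yellow
---+-----+-----+-----+-------+-------
 1 | 10  | 20  | 30  | A     | B
 2 | 10  | 20  | 30  | AA    | BB
 3 | 10  | 20  | 30  | AAA   | BBB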
In case the keys are not known in advance, or there are too many to manage manually, you can use the below dynamic version:
create temp function get_keys(input string) returns array<string> language js as """
    return Object.keys(JSON.parse(input));
""";
create temp function get_values(input string) returns array<string> language js as """
    return Object.values(JSON.parse(input));
""";
create temp function get_leaves(input string) returns string language js as '''
    // recursively flatten nested objects into {"path.to.leaf": value} pairs
    function flattenObj(obj, parent = '', res = {}) {
        for (let key in obj) {
            let propName = parent ? parent + '.' + key : key;
            if (typeof obj[key] == 'object') {
                flattenObj(obj[key], propName, res);
            } else {
                res[propName] = obj[key];
            }
        }
        return JSON.stringify(res);
    }
    return flattenObj(JSON.parse(input));
''';
create temp table temp as (
    select * except(key) from (
        select id, val,
               if(ends_with(key, '.name'), 'name', 'value') type,
               regexp_replace(key, r'\.name$|\.value$', '') key
        from your_table,
             unnest([struct(get_leaves(json_extract(to_json_string(measurement), '$')) as leaves)]),
             unnest(get_keys(leaves)) key with offset
        join unnest(get_values(leaves)) val with offset using(offset)
    )
    pivot (any_value(val) for type in ('name', 'value'))
);
execute immediate (select '''
select * from temp
pivot (any_value(value) for name in ("''' || string_agg(distinct name, '","') || '"))'
from temp
);
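The EXECUTE IMMEDIATE above assembles and runs a statement of roughly this shape, with the key list aggregated from the data rather than hard-coded:

select * from temp
pivot (any_value(value) for name in ("aaa","bbb","ccc","green","yellow"))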

JSON arrays of objects to PostgreSQL table format

I have a JSON file (an array of objects) which I have to convert into table format using a PostgreSQL query.
See the sample data below. "b", "c", "d", "e" are to be extracted as separate tables, as they are arrays whose elements are objects.
I have tried using json_populate_recordset(), but it only works if I have a single array:
[{a:"1",b:"2"},{a:"10",b:"20"}]
I have referred to some links and code samples:
jsonb_array_element example
postgreSQL functions
Sample Data:
{
    "b": [
        {columnB1:value, columnB2:value},
        {columnB1:value, columnB2:value},
    ],
    "c": [
        {columnC1:value, columnC2:value, columnC3:value},
        {columnC1:value, columnC2:value, columnC3:value},
        {columnC1:value, columnC2:value, columnC3:value}
    ],
    "d": [
        {columnD1:value, columnD2:value},
        {columnD1:value, columnD2:value},
    ],
    "e": [
        {columnE1:value, columnE2:value},
    ]
}
Expected output: b should become one table in which columnB1 and columnB2 are displayed with their values. Similarly for tables c, d, and e with their respective columns and values.
You can use jsonb_to_recordset(), but you need to unnest your JSON first. You need to do this inline, as this is a JSON processing function which cannot use derived values.
I am using validated JSON, as simplified and formatted at the end of this answer.
To unnest your JSON, use the below notation, which extracts the JSON object field with the given key.
-- one level
select '{"a":1}'::json -> 'a';
-- result: 1

-- two levels
select '{"a":{"b":[2]}}'::json -> 'a' -> 'b';
-- result: [2]
We now expand this to include json_to_recordset():
select *
from json_to_recordset(
         '{"a":{"b":[{"f1":2,"f2":4},{"f1":3,"f2":6}]}}'::json -> 'a' -> 'b'  -- inner array b
     ) as x("f1" int, "f2" int);  -- fields from table b
Or use json_array_elements. Either way, we need to list our fields. With the second solution the resulting type will be json, not int, so you can't sum etc.:
with b as (
    select json_array_elements('{"a":{"b":[{"f1":2,"f2":4},{"f1":3,"f2":6}]}}'::json -> 'a' -> 'b') as jx
)
select jx -> 'f1' as f1, jx -> 'f2' as f2 from b;
Output:

f1 | f2
---+---
 2 |  4
 3 |  6
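If arithmetic is needed on the second form, the ->> operator returns text rather than json, which can then be cast; a small sketch:

with b as (
    select json_array_elements('{"a":{"b":[{"f1":2,"f2":4},{"f1":3,"f2":6}]}}'::json -> 'a' -> 'b') as jx
)
select (jx ->> 'f1')::int + (jx ->> 'f2')::int as total from b;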
We now use your data structure in jsonb_to_recordset():
select *
from jsonb_to_recordset(
         '{"a":{"b":[{"columnname1b":"value1b","columnname2b":"value2b"},{"columnname1b":"value","columnname2b":"value"}],"c":[{"columnname1":"value","columnname2":"value"},{"columnname1":"value","columnname2":"value"},{"columnname1":"value","columnname2":"value"}]}}'::jsonb -> 'a' -> 'b'
     ) as x(columnname1b text, columnname2b text);
Output:

columnname1b | columnname2b
-------------+-------------
value1b      | value2b
value        | value
For table c:

select *
from jsonb_to_recordset(
         '{"a":{"b":[{"columnname1b":"value1b","columnname2b":"value2b"},{"columnname1b":"value","columnname2b":"value"}],"c":[{"columnname1":"value","columnname2":"value"},{"columnname1":"value","columnname2":"value"},{"columnname1":"value","columnname2":"value"}]}}'::jsonb -> 'a' -> 'c'
     ) as x(columnname1 text, columnname2 text);
Output:

columnname1 | columnname2
------------+------------
value       | value
value       | value
value       | value
Sample JSON
{
    "a": {
        "b": [
            {
                "columnname1b": "value1b",
                "columnname2b": "value2b"
            },
            {
                "columnname1b": "value",
                "columnname2b": "value"
            }
        ],
        "c": [
            {
                "columnname1": "value",
                "columnname2": "value"
            },
            {
                "columnname1": "value",
                "columnname2": "value"
            },
            {
                "columnname1": "value",
                "columnname2": "value"
            }
        ]
    }
}
Well, I came up with some ideas, here is one that worked. I was able to get one table at a time.
https://www.postgresql.org/docs/9.5/functions-json.html
I am using json_populate_recordset.
The column used in the first select statement comes from the table whose JSON-typed column we are trying to extract into a table.
The 'tablename from column' argument to json_populate_recordset is the key of the array we want to extract, and the alias b lists its column names and data types.
WITH input AS (
    SELECT cast(column as json) as a
    FROM tablename
)
SELECT b.*
FROM input c,
     json_populate_recordset(NULL::record, c.a -> 'tablename from column')
         as b(columnname1 datatype, columnname2 datatype)
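Applied to the sample data in the question (table and column names hypothetical, e.g. source_table and json_column), extracting table b might look like:

WITH input AS (
    SELECT cast(json_column as json) as a
    FROM source_table
)
SELECT b.*
FROM input c,
     json_populate_recordset(NULL::record, c.a -> 'b')
         as b(columnB1 text, columnB2 text);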

How to parse this array of dicts and extract key columns from a BigQuery external table

I have a giant array of dicts loaded from JSON into a date-partitioned BigQuery external table with the table structure below:
Field name | Type    | Mode
-----------+---------+---------
meta       | Record  | Nullable
Messages   | String  | Repeated
date       | Integer | Nullable
Every "Messages" Field is in its own row/record in my Bigquery table (New_line_delimited_Json)
I am trying to parse the "messages" field/column to extract some fields Key1 and Key2 which happens to be inside an Array (of dicts). For sake of simplicity ,below is the snippet of json of which "messages" is a field that I am trying to unnest/explode.
Ignore this schema;updated schema below***
[
    {
        "meta": {
            "table": "FEED",
            "source": "CP1"
        },
        "Messages": [
            "{
                "Key1": "2022-01-10",
                "Key2": "H21257061"
            }"
        ],
        "date": "20220110"
    },
    {
        "meta": {
            "table": "FEED",
            "source": "CP1"
        },
        "Messages": [
            "{
                "Key1": "2022-01-11",
                "Key2": "H21257062"
            }"
        ],
        "date": "20220111"
    }
]
updated schema on 01/17
{
    "meta": {
        "table": "FEED",
        "source": "CP1"
    },
    "Messages": [
        "{
            "Key1": "2022-01-10",
            "Key2": "H21257061"
        }",
        "{
            "Key1": "2022-01-10",
            "Key2": "H21257062"
        }"
    ],
    "date": "20220110"
},
So far I have tried the following, but I am getting NULLs for Key1 and Key2 in the SQL output:
WITH table AS (
    SELECT Messages as array_column
    FROM `project.dataset.table`
)
SELECT
    json_extract_scalar(flattened_array, '$.Messages.key1') as key1,
    json_extract_scalar(flattened_array, '$.Messages.key2') as key2
FROM table t
CROSS JOIN UNNEST(t.array_column) AS flattened_array
Still a little ambiguous, so I assume the below correctly represents your table (at least it matches the structure/schema in your question).
If my assumption is correct, consider the below approach:
select * except(id) from (
    select to_json_string(t) id, kv[offset(0)] as key, kv[safe_offset(1)] as value
    from your_table t,
         t.messages as message,
         unnest([struct( split(translate(message, '"', ''), ':') as kv )])
)
pivot (min(value) for key in ('Key1', 'Key2'))
If/when applied to the above sample data, the output is as expected.
Edit: trying to help you further. OK, so it looks like your table is as below.
In this case, try the below (a quite light modification of the previous version):
select * except(id) from (
    select to_json_string(t) id, kv[offset(0)] as key, kv[safe_offset(1)] as value
    from your_table t,
         unnest(regexp_extract_all(messages, r'"[^"]+":"[^"]+"')) as message,
         unnest([struct( split(translate(message, '"', ''), ':') as kv )])
)
pivot (min(value) for key in ('Key1', 'Key2'))

with the same output as before.
But obviously, I would use the below simplest approach (note the unnest, since Messages is a repeated field):

select
    json_extract_scalar(message, '$.Key1') as Key1,
    json_extract_scalar(message, '$.Key2') as Key2
from your_table, unnest(Messages) as message
SELECT
    JSON_QUERY(message, "$.Key1") as Key1,
    JSON_QUERY(message, "$.Key2") as Key2
FROM `project.dataset.table` as t
CROSS JOIN UNNEST(t.Messages) as message

CROSS JOIN UNNEST flattens the array, returning a row for each message.
After that, JSON_QUERY extracts the needed values from the JSON string.
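One detail worth knowing: JSON_QUERY returns the extracted value as JSON text, quotes included, while JSON_VALUE (equivalent to the legacy-named JSON_EXTRACT_SCALAR) returns the bare scalar. A quick illustration:

SELECT
    JSON_QUERY('{"Key1":"2022-01-10"}', '$.Key1') as quoted,    -- "2022-01-10"
    JSON_VALUE('{"Key1":"2022-01-10"}', '$.Key1') as unquoted   -- 2022-01-10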

Get data from jsonb field array

I have the following schema in PostgreSQL database:
CREATE TABLE survey_results (
id integer NOT NULL,
raw jsonb DEFAULT '{}'::jsonb,
created_at timestamp without time zone,
updated_at timestamp without time zone
);
INSERT INTO survey_results (id, raw, created_at, updated_at)
VALUES (1, '{ "slides": [{"id": "1", "name": "Test", "finished_at": 1517421628092}, {"id": "2", "name": "Test", "finished_at": 1517421894736}]}', now(), now());
I want to get the data from raw['slides']: a query that returns the id and finished_at of every element of raw['slides']. So the result of the query should look like this:
id | finished_at
---+--------------
 1 | 1517421628092
 2 | 1517421894736
Here is an sqlfiddle to experiment with:
http://sqlfiddle.com/#!17/ae504
How can I do this in PostgreSQL?
You need to unnest the array, then you can access each element:
select s.slide ->> 'id' as id,
s.slide ->> 'finished_at' as finished_at
from survey_results, jsonb_array_elements(raw -> 'slides') as s (slide)
http://sqlfiddle.com/#!17/ae504/80
For more details see JSON Functions and Operators in the manual.
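If you also need the position of each slide inside the array, jsonb_array_elements can be combined with WITH ORDINALITY; a small sketch on the same table:

select o.ord,
       o.slide ->> 'id' as id,
       o.slide ->> 'finished_at' as finished_at
from survey_results,
     jsonb_array_elements(raw -> 'slides') with ordinality as o (slide, ord);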