JSON_EXTRACT: How to extract the value of a dynamic key - google-bigquery

The query below returns the value of key 'a', i.e. "x". This works if one already knows the key name.
SELECT JSON_EXTRACT('{"a":"x", "b":"y"}', "$['a']") as val
In my use case, the key name is dynamic, so the above wouldn't help. Is there any way to extract only the first child element without mentioning the key name 'a' in standard SQL?

#standardSQL
SELECT REGEXP_EXTRACT('{"a":"x", "b":"y"}', r'^{"\w+":"(\w+)"') AS val

Mikhail proposes a good compromise to solve this within SQL, but sometimes a regular expression cannot parse complex JSON objects.
You can perform any operation on a JSON object by leveraging JavaScript within a BigQuery SQL query.
For example:
#standardSQL
CREATE TEMPORARY FUNCTION anyJsonOp(json STRING, langs STRING)
RETURNS STRING
LANGUAGE js AS """
  // Parse the payload and pull out the pull request's repo language
  var lang = JSON.parse(json).pull_request.base.repo.language;
  // Return the language only if it is in the comma-separated allow-list
  if (langs.split(",").indexOf(lang) > -1) {
    return lang;
  }
""";
SELECT anyJsonOp(payload, langs), COUNT(*)
FROM `githubarchive.day.20171010` a
CROSS JOIN (SELECT 'JavaScript,Java,Python,Ruby' langs)
WHERE type='PullRequestEvent'
GROUP BY 1
ORDER BY 2 DESC
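Applied to the original question, a minimal sketch of the same JS-UDF approach can return the value of the first key of an arbitrary JSON object without naming that key (the function name firstValue is just illustrative, and "first" here means the first key in the serialized object):
#standardSQL
CREATE TEMPORARY FUNCTION firstValue(json STRING)
RETURNS STRING
LANGUAGE js AS """
  // Parse the object and return the value of its first key, whatever that key is
  var obj = JSON.parse(json);
  var keys = Object.keys(obj);
  return keys.length ? String(obj[keys[0]]) : null;
""";
SELECT firstValue('{"a":"x", "b":"y"}') AS val  -- "x"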

Related

Hive sql extract one to multiple values from key value pairs

I have a column that looks like:
[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false},...]
There can be one to many items, each described by parameters inside {}, in the column.
I would like to extract only the values of the parameter key_1. Is there a function for that? So far I have tried JSON-related functions (json_tuple, get_json_object), but each time I received NULL.
Consider the JSON path below.
WITH sample_data AS (
SELECT '[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false}]' json
)
SELECT get_json_object(json, '$[*].key_1') AS key1_values FROM sample_data;
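If each value is also needed on its own row, a hedged sketch (assuming Hive, and that the array items stay as flat objects) could strip the outer brackets, split the array string on the commas between objects, and explode it:
WITH sample_data AS (
  SELECT '[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false}]' AS json
)
SELECT get_json_object(item, '$.key_1') AS key1_value
FROM sample_data
-- strip the outer [ ], then split on the commas that sit between } and {
LATERAL VIEW explode(split(regexp_replace(json, '^\\[|\\]$', ''), '(?<=\\}),(?=\\{)')) t AS item;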

How to cast postgres JSON column to int without key being present in JSON (simple JSON values)?

I am working on data in postgresql as in the following mytable with the fields id (type int) and val (type json):
 id | val
----+--------
  1 | "null"
  2 | "0"
  3 | "2"
The values in the json column val are simple JSON values, i.e. just strings with surrounding quotes and no key.
I have looked at the SO post How to convert postgres json to integer and attempted something like the solution presented there
SELECT (mytable.val->>'key')::int FROM mytable;
but in my case, I do not have a key to address the field and leaving it empty does not work:
SELECT (mytable.val->>'')::int as val_int FROM mytable;
This returns NULL for all rows.
The best I have come up with is the following (casting to varchar first, trimming the quotes, filtering out the string "null" and then casting to int):
SELECT id, nullif(trim('"' from mytable.val::varchar), 'null')::int as val_int FROM mytable;
which works, but surely cannot be the best way to do it, right?
Here is a db<>fiddle with the example table and the statements above.
Found the way to do it:
You can access the content via the keypath (see e.g. this PostgreSQL JSON cheatsheet):
Using the #> operator, you can access the JSON fields through a keypath. Specifying an empty keypath like '{}' allows you to get the content without a key.
Using the double-arrow form #>> returns the content as text without the quotes, so there is no need for the trim() function.
Overall, the statement
select id
, nullif(val#>>'{}', 'null')::int as val_int
from mytable
;
will return the contents of the json column as int, or NULL respectively (in PostgreSQL >= 9.4):
 id | val_int
----+---------
  1 | NULL
  2 | 0
  3 | 2
See updated db<>fiddle here.
--
Note: As pointed out by @Mike in his comment above, if the column type is jsonb, you can also use val->>0 to dereference scalars. However, if the type is json, the ->> operator will yield NULL. See this db<>fiddle.
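A small sketch of that jsonb variant (assuming the json column can simply be cast to jsonb):
select id
     , nullif(val::jsonb ->> 0, 'null')::int as val_int
from mytable
;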

Big query unnest array with json values

Let's consider the following table in Google BigQuery:
WITH example AS (
SELECT 1 AS id, ["{\"id\":1, \"name\":\"AAA\"}", "{\"id\":2, \"name\":\"BBB\"}","{\"id\":3, \"name\":\"CCC\"}"] AS r
UNION ALL
SELECT 2 AS id, ["{\"id\":5, \"name\":\"XXX\"}", "{\"id\":6, \"name\":\"ZZZ\"}"]
)
SELECT *
FROM example;
I would like to compose a query that returns each name together with its parent row's id.
I tried using UNNEST with JSON functions and I just can't get this right.
Can anyone help me?
Thanks
Ido
According to your query, you already have JSON elements in your array. So, after UNNESTing the array, you can use a JSON function like json_value to extract the name attribute of each element.
select
  id,
  json_value(elt, '$.name')
from example, unnest(r) as elt;
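Combined with the sample data above, a minimal end-to-end sketch would be:
WITH example AS (
  SELECT 1 AS id, ["{\"id\":1, \"name\":\"AAA\"}", "{\"id\":2, \"name\":\"BBB\"}", "{\"id\":3, \"name\":\"CCC\"}"] AS r
  UNION ALL
  SELECT 2 AS id, ["{\"id\":5, \"name\":\"XXX\"}", "{\"id\":6, \"name\":\"ZZZ\"}"] AS r
)
SELECT id, json_value(elt, '$.name') AS name
FROM example, UNNEST(r) AS elt;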

Convert strings into table columns in BigQuery

I would like to convert this table
to something like this
The long string can be dynamic, so it's important to me that the solution is not hard-coded for these specific values.
Please help, I'm using BigQuery.
You could start by using SPLIT(value[, delimiter]) to convert your long string into an array of separate key-value pairs.
This will be sensitive to you having commas as part of your values.
SPLIT(session_experiments, ',')
Then you could either FLATTEN that array or access each element, and then use regular expressions to separate the key from the value, as sketched below.
If you share more context on your restrictions and intended result I could try and put together a query for you that does exactly what you want.
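A hedged sketch of that approach in standard SQL (the table name mytable, the columns id and session_experiments, and the '-' delimiter between key and value are assumptions, since the original table is only shown as an image):
SELECT
  id,
  REGEXP_EXTRACT(pair, r'^([^-]+)') AS experiment,
  REGEXP_EXTRACT(pair, r'-(.*)$') AS value
FROM mytable, UNNEST(SPLIT(session_experiments, ',')) AS pair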
What you want is not possible as such; however, there is a better practice for BigQuery.
You can use arrays of structs to store that information in a table.
Let's say you have a table like this.
You can use this sample query to understand how it works.
with rawdata AS (
  SELECT 1 as id, 'test1-val1,test2-val2,test3-val3' as experiments union all
  SELECT 1 as id, 'test1-val1,test3-val3,test5-val5' as experiments
)
select
  id,
  (select array_agg(struct(split(param, '-')[offset(0)] as experiment,
                           split(param, '-')[offset(1)] as value))
   from unnest(split(experiments)) as param) as experiments
from rawdata
The output will look like this:
With the data in that shape, it is much more convenient to manipulate.
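For example, a small sketch (the experiment name 'test1' is just illustrative) of pulling a single experiment's value back out of the array of structs:
with rawdata AS (
  SELECT 1 as id, 'test1-val1,test2-val2,test3-val3' as experiments union all
  SELECT 1 as id, 'test1-val1,test3-val3,test5-val5' as experiments
),
parsed AS (
  select
    id,
    (select array_agg(struct(split(param, '-')[offset(0)] as experiment,
                             split(param, '-')[offset(1)] as value))
     from unnest(split(experiments)) as param) as experiments
  from rawdata
)
select
  id,
  (select value from unnest(experiments) where experiment = 'test1') as test1_value
from parsed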

Get an average value for element in column of arrays of json data in postgres

I have some data in a postgres table that is a string representation of an array of json data, like this:
[
{"UsageInfo"=>"P-1008366", "Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0},
{"Role"=>"Text", "ProjectCode"=>"", "PublicationCode"=>"", "RetailPrice"=>2},
{"Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0, "ParentItemId"=>"396487"}
]
This is data in one cell from a single column of similar data in my database.
The datatype of this column in the db is varchar(max).
My goal is to find the average RetailPrice of EVERY json item with "Role"=>"Abstract", including all of the json elements in the array, and all of the rows in the database.
Something like:
SELECT avg(json_extract_path_text(json_item, 'RetailPrice'))
FROM (
SELECT cast(json_items to varchar[]) as json_item
FROM my_table
WHERE json_extract_path_text(json_item, 'Role') like 'Abstract'
)
Now, obviously this particular query wouldn't work for a few reasons. Postgres doesn't let you directly convert a varchar to a varchar[]. Even after I had an array, this query would do nothing to iterate through the array. There are probably other issues with it too, but I hope it helps to clarify what it is I want to get.
Any advice on how to get the average retail price from all of these arrays of json data in the database?
It does not seem like Redshift supports the json data type per se. At least, I found nothing in the online manual.
But I found a few JSON functions in the manual, which should be instrumental:
JSON_ARRAY_LENGTH
JSON_EXTRACT_ARRAY_ELEMENT_TEXT
JSON_EXTRACT_PATH_TEXT
Since generate_series() is not supported, we have to substitute for that ...
SELECT tbl_id
     , round(avg((json_extract_path_text(elem, 'RetailPrice'))::numeric), 2) AS avg_retail_price
FROM (
  SELECT *, json_extract_array_element_text(json_items, pos) AS elem
  FROM (VALUES (0),(1),(2),(3),(4),(5)) a(pos)
  CROSS JOIN tbl
) sub
WHERE json_extract_path_text(elem, 'Role') = 'Abstract'
GROUP BY 1;
I substituted with a poor man's solution: A dummy table counting from 0 to n (the VALUES expression). Make sure you count up to the maximum number of possible elements in your array. If you need this on a regular basis create an actual numbers table.
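For instance, a hedged sketch of such a numbers table (the table and column names are just illustrative):
CREATE TABLE numbers (pos INT);
INSERT INTO numbers VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
-- then CROSS JOIN numbers instead of the inline VALUES list above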
Modern Postgres has much better options, like json_array_elements() to unnest a json array. Compare to your sibling question for Postgres:
Can get an average of values in a json array using postgres?
I tested in Postgres with the related operator ->>, where it works:
SQL Fiddle.
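For reference, a sketch of that modern-Postgres route (assuming the column actually holds valid JSON rather than the =>-style text shown above, and assuming the table/column names tbl and json_items):
SELECT round(avg((elem ->> 'RetailPrice')::numeric), 2) AS avg_retail_price
FROM tbl, json_array_elements(json_items::json) AS elem
WHERE elem ->> 'Role' = 'Abstract';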