Length of JSON array field in BigQuery - google-bigquery

Given: a JSON string that represents an array field, selected as the result of a query.
How do I find the length of a JSON array field in BigQuery?

In case it is useful for someone like me looking for the answer today:
I replaced the JSON_LENGTH(json_string, json_path) function from MySQL with ARRAY_LENGTH(JSON_EXTRACT_ARRAY(json_string, json_path)) in BigQuery.
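For example, a minimal sketch (the JSON literal and path here are just illustrations):
#standardSQL
SELECT ARRAY_LENGTH(JSON_EXTRACT_ARRAY('{"items": [1, 3, 5, 7]}', '$.items')) AS len
-- len = 4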

See if this does what you want. It uses a JavaScript UDF to parse the JSON and then returns the array length.
#standardSQL
CREATE TEMP FUNCTION JsonArrayLength(json_array STRING)
RETURNS INT64 LANGUAGE js AS """
  // Parse the JSON string and return the number of top-level elements
  var arr = JSON.parse(json_array);
  return arr.length;
""";
SELECT JsonArrayLength('[1,3,5,7]');

The example below is for BigQuery Standard SQL. It counts elements by splitting the serialized array on commas, so it only works when the elements themselves contain no commas (see the sketch after the query for a case where it miscounts):
#standardSQL
SELECT ARRAY_LENGTH(SPLIT('[1,3,5,7]'))
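A quick illustration of that caveat, comparing the naive split with JSON_EXTRACT_ARRAY on a nested array (the literal is just an example):
#standardSQL
SELECT
  ARRAY_LENGTH(SPLIT('[[1,2],[3,4]]')) AS naive_len,             -- 4: commas inside elements are counted
  ARRAY_LENGTH(JSON_EXTRACT_ARRAY('[[1,2],[3,4]]')) AS json_len  -- 2: the actual element count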

Related

How to search a string in an array of strings using Apache Spark SQL query?

I have an array of strings like this:
SELECT ARRAY('item_1', 'item_2', 'item_3') AS items
Result:
items : ARRAY<STRING>
["item_1","item_2","item_3"]
I would like to search for an item inside of it, but if I try the regular way:
SELECT * FROM items WHERE items = 'item_1'
I'll get this error:
Cannot resolve '(items.items = 'item_1')' due to data type mismatch: differing types in '(items.items = 'item_1')' (array and string). line 1 pos 26
So, what can I do to search a string value inside of an array of strings using a Spark SQL query?
Thanks in advance =)
Use the array_contains function:
SELECT * FROM items WHERE array_contains(items, 'item_1')
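A minimal usage sketch on a literal array (the values are just an illustration):
SELECT array_contains(array('item_1', 'item_2', 'item_3'), 'item_1') AS has_item
-- has_item = true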

How to query only a particular part of the string on BigQuery

I have a BigQuery table with a column called topics holding values like
/finance/investing/funds/mutual funds. How do I write a BigQuery query that yields only the word between the first two slashes, i.e. in this example it would return only finance?
Just developing Gordon's answer further to work with your ARRAY<STRING>. All you need to do is UNNEST the array before passing it to the SPLIT function mentioned before.
Simple sample:
SELECT SPLIT(string, '/')[safe_ordinal(2)]
FROM UNNEST(['/finance/investing/funds/mutual funds', '/random/investing/funds/mutual funds']) AS string
You can use split(). Because the string starts with '/', the first element of the resulting array is empty, so the wanted word sits at ordinal position 2:
select split('/finance/investing/funds/mutual funds', '/')[safe_ordinal(2)]
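Alternatively, a regular-expression sketch that captures the first path segment directly (this pattern is my assumption, not part of the original answers):
SELECT REGEXP_EXTRACT('/finance/investing/funds/mutual funds', r'^/([^/]+)') AS first_segment
-- first_segment = 'finance'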

Is there a postgresql SQL equivalent of Python's map function

How do I apply a function with arguments to every element of a text array in PostgreSQL SQL queries?
Let's say my text array is
["abc-123-x", "def-123-y", "hij-234-k", "klm-232-p", "nop-3434-9", "qrs-23-p9"]
the result should be
[x,y,k,p,9,p9]
You need to unnest the array, extract the characters, then aggregate back:
select array_agg(right(t.w, 1))
from unnest(array['abc','def','hij','klm','nop','qrs']) as t(w);
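Applied to the asker's actual data, where the wanted value is everything after the second hyphen, a sketch using Postgres's split_part() (the extraction rule is my reading of the example):
select array_agg(split_part(t.w, '-', 3))
from unnest(array['abc-123-x', 'def-123-y', 'hij-234-k', 'klm-232-p', 'nop-3434-9', 'qrs-23-p9']) as t(w);
-- {x,y,k,p,9,p9}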

JSON_EXTRACT How to extract value of a dynamic key

The below returns the value of key 'a', i.e. "x". This is fine if one already knows the key name.
SELECT JSON_EXTRACT('{"a":"x", "b":"y"}', "$['a']") as val
In my use case, the key name is dynamic, so the above doesn't help. Is there any way to extract only the first child element, without mentioning the key name 'a', in standard SQL?
#standardSQL
SELECT REGEXP_EXTRACT('{"a":"x", "b":"y"}', r'^{"\w":"(\w)",') AS val
Mikhail proposes a good compromise to solve this within SQL, but sometimes a regular expression can't parse complex JSON objects.
You can perform any operation on a JSON object by leveraging JavaScript within a BigQuery SQL query.
For example:
#standardSQL
CREATE TEMPORARY FUNCTION anyJsonOp(json STRING, langs STRING)
RETURNS STRING
LANGUAGE js AS """
  // Navigate the parsed JSON down to the pull request's repo language
  var lang = JSON.parse(json).pull_request.base.repo.language;
  // Return the language only if it is in the allowed list; otherwise return null
  if (langs.split(",").indexOf(lang) > -1) {
    return lang;
  }
""";
SELECT anyJsonOp(payload, langs), COUNT(*)
FROM `githubarchive.day.20171010` a
CROSS JOIN (SELECT 'JavaScript,Java,Python,Ruby' langs)
WHERE type = 'PullRequestEvent'
GROUP BY 1
ORDER BY 2 DESC

Get an average value for element in column of arrays of json data in postgres

I have some data in a postgres table that is a string representation of an array of json data, like this:
[
{"UsageInfo"=>"P-1008366", "Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0},
{"Role"=>"Text", "ProjectCode"=>"", "PublicationCode"=>"", "RetailPrice"=>2},
{"Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0, "ParentItemId"=>"396487"}
]
This is the data in one cell from a single column of similar data in my database.
The datatype of this column in the db is varchar(max).
My goal is to find the average RetailPrice of EVERY json item with "Role"=>"Abstract", including all of the json elements in the array, and all of the rows in the database.
Something like:
SELECT avg(json_extract_path_text(json_item, 'RetailPrice'))
FROM (
  SELECT cast(json_items to varchar[]) as json_item
  FROM my_table
  WHERE json_extract_path_text(json_item, 'Role') like 'Abstract'
)
Now, obviously this particular query wouldn't work for a few reasons. Postgres doesn't let you directly convert a varchar to a varchar[]. Even after I had an array, this query would do nothing to iterate through the array. There are probably other issues with it too, but I hope it helps to clarify what it is I want to get.
Any advice on how to get the average retail price from all of these arrays of json data in the database?
It does not seem like Redshift supports a json data type per se; at least, I found nothing in the online manual.
But I found a few JSON functions in the manual, which should be instrumental:
JSON_ARRAY_LENGTH
JSON_EXTRACT_ARRAY_ELEMENT_TEXT
JSON_EXTRACT_PATH_TEXT
Since generate_series() is not supported, we have to substitute for that ...
SELECT tbl_id
     , round(avg((json_extract_path_text(elem, 'RetailPrice'))::numeric), 2) AS avg_retail_price
FROM (
  SELECT *, json_extract_array_element_text(json_items, pos) AS elem
  FROM (VALUES (0), (1), (2), (3), (4), (5)) a(pos)
  CROSS JOIN tbl
) sub
WHERE json_extract_path_text(elem, 'Role') = 'Abstract'
GROUP BY 1;
I substituted a poor man's solution: a dummy table counting from 0 to n (the VALUES expression). Make sure you count up to the maximum possible number of elements in your array. If you need this on a regular basis, create an actual numbers table, as sketched below.
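A sketch of such a numbers table (the source table and the limit of 100 are assumptions; any table with enough rows works):
CREATE TABLE numbers AS
SELECT row_number() OVER () - 1 AS pos
FROM tbl    -- assumption: tbl has at least 100 rows
LIMIT 100;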
Modern Postgres has much better options, like json_array_elements() to unnest a json array. Compare to your sibling question for Postgres:
Can get an average of values in a json array using postgres?
I tested in Postgres with the related operator ->>, where it works:
SQL Fiddle.
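For completeness, a sketch of the modern-Postgres route with json_array_elements() (table and column names follow the example above; this assumes json_items holds valid JSON text):
SELECT t.tbl_id
     , round(avg((elem->>'RetailPrice')::numeric), 2) AS avg_retail_price
FROM tbl t
   , json_array_elements(t.json_items::json) AS elem
WHERE elem->>'Role' = 'Abstract'
GROUP BY 1;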