BigQuery - JSON_EXTRACT only extracts first entry - sql

I have a column containing a json-string as follows:
[{"answer":"europe-austria-swiss","text":"Österreich, Schweiz"},{"answer":"europe-italy","text":"Italien"},{"answer":"europe-france","text":"Frankreich"}]
I want to extract ALL answers given in ONE column and row, seperated by a comma:
europe-austria-swiss, europe-italy, europe-france
I think I tried all possibilites offered by JSON_EXTRACT and JSON_EXTRACT_ARRAY or replacing parentheses and other signs, but I either only get the first entry extracted (in this case
europe-austria-swiss
) or it splits up in rows as array from which I can no longer extract the strings of "answer".
Has anyone any idea on how to solve that problem? It's very much appreciated!
This column is of course part of a much larger table (if that is relevant anyhow).

I think I know what's going on (please, correct me if I'm wrong).
My best guess is that you are trying something like:
SELECT JSON_EXTRACT(json_text, "$.answer") AS answers
FROM UNNEST([
'{"answer":"europe-austria-swiss","text":"Österreich, Schweiz"},{"answer":"europe-italy","text":"Italien"},{"answer":"europe-france","text":"Frankreich"}'
]) as json_text
This returns:
"europe-austria-swiss"
However, if you change the underlying data for something like this (each line as a json string object), it should resolve the issue:
SELECT JSON_EXTRACT(json_text, "$.answer") AS answers
FROM UNNEST([
'{"answer":"europe-austria-swiss","text":"Österreich, Schweiz"}',
'{"answer":"europe-italy","text":"Italien"}',
'{"answer":"europe-france","text":"Frankreich"}'
]) as json_text
Result:
"europe-austria-swiss"
"europe-italy"
"europe-france"
Hope this helps!

Below is for BigQuery Standard SQL
#standardSQL
SELECT (
SELECT STRING_AGG(JSON_EXTRACT_SCALAR(answer, '$.answer'), ' ,')
FROM UNNEST(JSON_EXTRACT_ARRAY(json_string)) answer
) AS answers
FROM `project.dataset.table`
You can test, play with above using sample data from your question as in below example
#standardSQL
WITH `project.dataset.table` AS (
SELECT '[{"answer":"europe-austria-swiss","text":"Österreich, Schweiz"},{"answer":"europe-italy","text":"Italien"},{"answer":"europe-france","text":"Frankreich"}]' json_string
)
SELECT (
SELECT STRING_AGG(JSON_EXTRACT_SCALAR(answer, '$.answer'), ' ,')
FROM UNNEST(JSON_EXTRACT_ARRAY(json_string)) answer
) AS answers
FROM `project.dataset.table`
with result
Row answers
1 europe-austria-swiss ,europe-italy ,europe-france

Related

Hive sql extract one to multiple values from key value pairs

I have a column that looks like:
[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false},...]
There can be 1 to many items described by parameters in {} in the column.
I would like to extract values only of parameters described by key_1. Is there a function for that? I tried so far json related functions (json_tuple, get_json_object) but each time I received null.
Consider below json path.
WITH sample_data AS (
SELECT '[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false}]' json
)
SELECT get_json_object(json, '$[*].key_1') AS key1_values FROM sample_data;
Query results

Splitting value in database in WHERE clause

I am querying a BigQuery table to extract building data. We are storing building data in a cell, together with location and sensor name data. The building row has the following values e.g.
GB-FRE-BB2003_MSU-01
GB-FRE-BB2001_MSU-12
GB-FRE-BB2003_MSU-12
GB-FRE-BB2012_MSU-12
GB-FRE-BB2003_MSU-10
etc
I would like to query the data, using a substring, so I can find all the data from the BD2003 building, regardless of location and sensor.
SELECT `presentvalue`
FROM `database`
WHERE ???
Is someone able to help with this? I have looked at SPLIT and SUBSTRING but can't seem to get the query right.
Few more options
Using LIKE
select *
from your_table
where presentvalue like '%-BB2003_%'
Using REGEXP_CONTAINS
select *
from your_table
where regexp_contains(presentvalue, '-BB2003_')
if applied to sample data in your question - both have below output
use where substring(split(val , '-')[offset(2)], 1, 6)='BB2003'
tested it on the below code and it works:
create temp table sample(
val string
);
insert into sample
(val)
values
('GB-FRE-BB2003_MSU-01'),
('GB-FRE-BB2001_MSU-12'),
('GB-FRE-BB2003_MSU-12'),
('GB-FRE-BB2012_MSU-12'),
('GB-FRE-BB2003_MSU-10');
select *, from sample where substring(split(val , '-')[offset(2)], 1, 6)='BB2003''

Splitting a string and converting to integer in BigQuery

I have a simple problem but I started to use google bq and their help menu was so complex for me.
I have a column like that for some rows:
ANSWER(title of column)
9
10 - Certainly Satisfied.
7 -
My aim is to split the previous part of that column from "-" sign and convert it to integer. I found some formulas like split(), regexp_extract() but I couldn't be sure how can I imply them for my data.
Thanks for your help in advance :)
If the number is always first, you can use:
select sum(safe.cast((split(answer, '-'))[ordinal(1)] as int64)
from t;
Note: It looks like you have spaces, so you might really want to split on the space:
select sum(safe.cast((split(answer, ' '))[ordinal(1)] as int64)
from t;
Consider below option
select answer,
safe_cast(regexp_extract(trim(answer), r'^\d+') as int64) as score
from `project.dataset.table`
if to apply to sample data in your question - output is

How to query only a particular part of the string on BigQuery

I have a BigQuery table that has column called topics under which I have result like this
/finance/investing/funds/mutual funds. How do I write a query on BigQuery to yield me only the word between the first two slashes i.e. in this example I would like it to return only finance.
Just developing Gordons answer further to work with your ARRAY<STRING>. All you need to do, is just UNNEST the array before passing it to the SPLIT function mentioned before.
Simple sample:
SELECT SPLIT(string, '/')[safe_ordinal(2)]
FROM UNNEST([ '/finance/investing/funds/mutual funds', '/random/investing/funds/mutual
funds' ]) AS string
You can use split():
select split('/finance/investing/funds/mutual funds', '/')[safe_ordinal(2)]

function to convert unicode in bigquery

I trying out the NORMALIZE function with NFKC in bigquery from the documentation, I see that I can convert a string to a readable format. For example
WITH EquivalentNames AS (
SELECT name
FROM UNNEST([
'Jane\u2004Doe',
'\u0026 Hello'
]) AS name
)
SELECT
NORMALIZE(name, NFKC) AS normalized_str
FROM EquivalentNames
GROUP BY 1;
The ampersand character shows up correctly, but I have a table, with a column of STRING with unicode character in its values, but I'm not able to use NORMALIZE to convert it to a readable format.
I've also tried some of the other solutions presented
Decode Unicode's to Local language in Bigquery but nothing is working yet.
Attached is an example of the data:
You posted a question about NORMALIZE, but didn't make your goals clear.
Here I'll answer the question about NORMALIZE - to point out that it probably doesn't do what you are expecting it to do. But at least it's acting as expected.
There are many ways to encode the same string with Unicode. Normalize chooses one, while preserving the string.
See this query:
SELECT *, a=b ab, a=c ac, a=d ad, b=c bc, b=d bd, c=d cd
FROM (
SELECT NORMALIZE('hello ñá 😞', NFC) a
, NORMALIZE('hello ñá 😞', NFKC) b
, NORMALIZE('hello ñá 😞', NFD) c
, NORMALIZE('hello ñá 😞', NFKD) d
)
As you see - every time you get the same string, they just have different non-visible representations.
The \u2004 is so called thick space so that is why you thnk it is not showing correctly because you just see space -
But if you will try some other codes - like for example \2020 - you will see it is actually showing even without extra processing with NORMALIZE function
As in below
#standardSQL
WITH EquivalentNames AS (
SELECT name
FROM UNNEST([
'Jane\u2020Doe',
'\u0026 Hello'
]) AS name
)
SELECT
name, NORMALIZE(name, NFKC) AS normalized_str
FROM EquivalentNames
GROUP BY 1
with result
Row name normalized_str
1 Jane†Doe Jane†Doe
2 & Hello & Hello