How to query only a particular part of the string on BigQuery - sql

I have a BigQuery table with a column called topics, which holds values like this:
/finance/investing/funds/mutual funds. How do I write a BigQuery query that returns only the word between the first two slashes? In this example, I would like it to return only finance.

Just developing Gordon's answer further to work with your ARRAY<STRING>. All you need to do is UNNEST the array before passing each element to the SPLIT function from that answer.
Simple sample:
SELECT SPLIT(string, '/')[safe_ordinal(2)]
FROM UNNEST([ '/finance/investing/funds/mutual funds', '/random/investing/funds/mutual funds' ]) AS string
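If the array lives in a table column rather than in a literal, the same idea might look like the sketch below. The table name my_table and the alias first_segment are hypothetical; only the topics column name comes from the question, and the CTE just stands in for the real table.

```sql
-- Hypothetical stand-in for the real table; topics is an ARRAY<STRING> column.
WITH my_table AS (
  SELECT ['/finance/investing/funds/mutual funds',
          '/random/investing/funds/mutual funds'] AS topics
)
SELECT
  topic,
  -- Element 1 is the empty string before the leading '/', so element 2 is the first word.
  SPLIT(topic, '/')[SAFE_ORDINAL(2)] AS first_segment
FROM my_table, UNNEST(topics) AS topic;
```

The comma before UNNEST is an implicit cross join, so each array element becomes its own row.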

You can use split():
select split('/finance/investing/funds/mutual funds', '/')[safe_ordinal(2)]
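To see why the index is 2 rather than 1, it can help to look at the whole array that SPLIT produces. The sketch below also shows REGEXP_EXTRACT as one possible alternative; the column aliases are made up for illustration.

```sql
SELECT
  -- ['', 'finance', 'investing', 'funds', 'mutual funds']: the leading '/' yields an empty first element.
  SPLIT(topic, '/') AS parts,
  -- SAFE_ORDINAL is 1-based and returns NULL instead of an error when the index is out of range.
  SPLIT(topic, '/')[SAFE_ORDINAL(2)] AS first_segment,
  -- Alternative: capture everything between the first and second slash.
  REGEXP_EXTRACT(topic, r'^/([^/]+)') AS first_segment_regex
FROM UNNEST(['/finance/investing/funds/mutual funds']) AS topic;
```

Both first_segment and first_segment_regex come out as finance for this input.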

Related

Hive sql extract one to multiple values from key value pairs

I have a column that looks like:
[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false},...]
There can be 1 to many items described by parameters in {} in the column.
I would like to extract only the values of the parameter described by key_1. Is there a function for that? So far I have tried the JSON-related functions (json_tuple, get_json_object), but each time I received null.
Consider the JSON path below.
WITH sample_data AS (
SELECT '[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false}]' json
)
SELECT get_json_object(json, '$[*].key_1') AS key1_values FROM sample_data;

Why json_extract works but json_extract_scalar does not?

I have a dataset containing a column in json with an attribute giving me a list, and I would like to unnest it to join some different data.
My idea was to apply json_extract_scalar to the json_data, then split the result and finally unnest it for the other operations. However, I ran into a problem.
In my case, json_extract works fine, but I cannot convert its result to a varchar. On the other hand, if I use json_extract_scalar, it returns a null value.
I think the problem might be the quotation marks, but I am not sure how to deal with them, or even whether that is the real problem.
Let me give you a sample of the data:
{"my_test_list":["756596263-0","743349523-371296","756112380-0","755061590-0"]}
Can you guys give me some advice?
I'm querying SQL in Presto.
What you are storing under key my_test_list is a JSON array, not a scalar value - which is why json_extract_scalar() returns null.
It is rather unclear how you want to use that data. A typical solution is to cast it to an array, that you can then use as needed, for example by unnesting it. The base syntax would be:
cast(json_extract(mycol, '$.my_test_list') as array(varchar))
You would then use that in a lateral join, like:
select t.mycol, x.myval
from mytable t
cross join unnest(
cast(json_extract(mycol, '$.my_test_list') as array(varchar))
) as x(myval)
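To make the scalar versus array distinction concrete, here is a small self-contained sketch in Presto SQL. The CTE reuses the mytable and mycol names from the answer with a shortened version of the sample data; the output column aliases are invented for illustration.

```sql
WITH mytable AS (
  SELECT '{"my_test_list":["756596263-0","743349523-371296"]}' AS mycol
)
SELECT
  -- The path points at an array, not a scalar, so this is NULL.
  json_extract_scalar(mycol, '$.my_test_list') AS scalar_attempt,
  -- A single element of the array is a scalar, so this returns a varchar.
  json_extract_scalar(mycol, '$.my_test_list[0]') AS first_item,
  -- json_format serializes the JSON value back to varchar text.
  json_format(json_extract(mycol, '$.my_test_list')) AS list_as_text
FROM mytable;
```

If the goal is one row per list entry, the cast to array(varchar) plus unnest shown above is still the more direct route.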

Convert strings into table columns in BigQuery

I would like to convert this table
to something like this
The long string can be dynamic, so it's important to me that the solution is not fixed to these specific values.
Please help, I'm using BigQuery.
You could start by using SPLIT, i.e. SPLIT(value[, delimiter]), to convert your long string into an array of separate key-value pairs.
Note that this will break if any of your values contain commas.
SPLIT(session_experiments, ',')
Then you could either flatten that array (UNNEST in standard SQL) or access each element directly, and then use regular expressions to separate the key from the value.
If you share more context on your restrictions and intended result I could try and put together a query for you that does exactly what you want.
What you want is not possible; however, there is a better practice for BigQuery.
You can use arrays of structs to store that information in a table.
Let's say you have a table like that
You can use that sample query to understand how to use it.
with rawdata AS
(
SELECT 1 as id, 'test1-val1,test2-val2,test3-val3' as experiments union all
SELECT 1 as id, 'test1-val1,test3-val3,test5-val5' as experiments
)
select
id,
(
select array_agg(struct(
split(param, '-')[offset(0)] as experiment,
split(param, '-')[offset(1)] as value))
from unnest(split(experiments)) as param
) as experiments
from rawdata
The output will look like that:
Once the data is in that shape, it is more convenient to manipulate.
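As one example of that kind of manipulation, the array of structs can be unnested again and filtered by experiment name. This is only a sketch: the parsed CTE name, the alias e, and the filter value 'test2' are assumptions made for illustration.

```sql
WITH rawdata AS (
  SELECT 1 AS id, 'test1-val1,test2-val2,test3-val3' AS experiments
),
parsed AS (
  SELECT
    id,
    (
      SELECT ARRAY_AGG(STRUCT(
        SPLIT(param, '-')[OFFSET(0)] AS experiment,
        SPLIT(param, '-')[OFFSET(1)] AS value))
      FROM UNNEST(SPLIT(experiments)) AS param
    ) AS experiments
  FROM rawdata
)
-- One row per (id, experiment) pair, restricted to a single experiment.
SELECT id, e.experiment, e.value
FROM parsed, UNNEST(experiments) AS e
WHERE e.experiment = 'test2';
```

Because the experiment names stay in rows rather than columns, the query does not need to change when new experiments appear in the string.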

BigQuery Standard SQL: how to return the first value of array?

Small working example
SELECT SPLIT("hello::hej::hallo::hoi", "::")
returns an array [hello, hej, hallo, hoi], from which I want to select the first element, i.e. hello. BigQuery Standard SQL provides no FIRST; instead there is FIRST_VALUE(..) OVER(), which I cannot get working for the example above, so:
How can I select the first value of array with BigQuery Standard SQL?
I think the documentation in BigQuery is pretty good. You can read about arrays here.
You can use either OFFSET() or ORDINAL(). The method would be:
select array[offset(0)]
or
select array[ordinal(1)]
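Applied to the example from the question, that could look like the sketch below (the column aliases are made up). The SAFE_ variants are included because plain OFFSET and ORDINAL raise an error when the index is out of range.

```sql
SELECT
  -- OFFSET is 0-based, ORDINAL is 1-based; both pick 'hello' here.
  SPLIT('hello::hej::hallo::hoi', '::')[OFFSET(0)] AS first_by_offset,
  SPLIT('hello::hej::hallo::hoi', '::')[ORDINAL(1)] AS first_by_ordinal,
  -- SAFE_OFFSET returns NULL instead of failing when the element does not exist.
  SPLIT('hello', '::')[SAFE_OFFSET(1)] AS missing_is_null;
```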

Length of json array field in Big Query

Given: a JSON string that represents an array field, selected as the result of a query.
How do I find the length of a JSON array field in BigQuery?
In case it may be useful for someone like me looking for the answer today:
I replaced the JSON_LENGTH(json_string, json_path) method from MySQL with ARRAY_LENGTH(JSON_EXTRACT_ARRAY(json_string, json_path)) in BigQuery.
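A minimal sketch of that replacement, assuming the array sits under a hypothetical key named items:

```sql
-- JSON_EXTRACT_ARRAY returns an ARRAY of JSON-formatted strings; ARRAY_LENGTH counts its elements.
SELECT ARRAY_LENGTH(JSON_EXTRACT_ARRAY('{"items":[1,3,5,7]}', '$.items')) AS array_len;  -- 4
```

When the JSON string is itself the array, the path can be '$'.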
See if this does what you want. It uses a JavaScript UDF to parse the JSON and then returns the array length.
#standardSQL
CREATE TEMP FUNCTION JsonArrayLength(json_array STRING)
RETURNS INT64 LANGUAGE js AS """
var arr = JSON.parse(json_array);
return arr.length;
""";
SELECT JsonArrayLength('[1,3,5,7]');
The example below is for BigQuery Standard SQL.
#standardSQL
SELECT ARRAY_LENGTH(SPLIT('[1,3,5,7]'))