Convert Array Athena into String - sql

I currently have a JSON output as an array in Athena.
This is the query I'm running:
WITH dataset AS (SELECT
Items
FROM
(SELECT * FROM (
SELECT
JSON_EXTRACT(message, '$.items') AS Items
FROM kafka.database
)))
select * from dataset
LIMIT 10
And this is the current output:
["item0","item1","item2","item3"]
But I would like to generate the output from AWS Athena in this way:
"item0,item1,item2,item3"
I have tried to follow these steps from the Athena documentation, but it's not working:
WITH dataset AS (SELECT
Items
FROM
(SELECT * FROM (
SELECT
array_join(JSON_EXTRACT(message, '$.items'),' ') AS Items
FROM kafka.database
)))
select * from dataset
LIMIT 10
But, for example, in this way I am able to select the first item in the JSON output:
WITH dataset AS (SELECT
Items
FROM
(SELECT * FROM (
SELECT
json_array_get(JSON_EXTRACT(message, '$.items'),0) AS Items
FROM kafka.database
)))
select * from dataset
LIMIT 10

JSON_EXTRACT does not return an array; it returns a value of type json, so array functions such as array_join cannot be applied to it directly. One way to handle this is to cast it to array(varchar) and then use array_join on the result:
-- sample data
WITH dataset (json_arr) AS (
VALUES (json '["item0","item1","item2","item3"]')
)
-- query
select array_join(cast(json_arr as array(varchar)), ', ')
from dataset;
Output:
_col0
item0, item1, item2, item3
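Applied to the query from the question, the same cast might look like the sketch below. It assumes the kafka.database table and the message column from the question, and that every element under $.items is a string. It uses ',' as the separator so the result has no spaces, matching the requested output.

```sql
-- Sketch only: table and column names are taken from the question.
SELECT
  array_join(
    CAST(json_extract(message, '$.items') AS array(varchar)),
    ','  -- no space after the comma, as in the desired output
  ) AS items
FROM kafka.database
LIMIT 10;
```

Rows where $.items is missing would yield NULL rather than an empty string.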

Related

How to convert fields to JSON in Postgresql

I have a table with the following schema (postgresql 14):
message  | sentiment | classification
any text | positive  | mobile, communication
message contains only strings (phrases).
sentiment is a string containing a single word.
classification is a string that can contain one or more comma-separated words.
I would like to create a json field with these columns, like this:
{"msg":"any text", "sentiment":"positive","classification":["mobile","communication"]}
Also, if possible, is there a way to consider the classification this way:
{"msg":"any text", "sentiment":"positive","classification 1":"mobile","classification 2":"communication"}
The first part of the question is easy, since Postgres provides functions for splitting a string and converting it to JSON:
with t(message, sentiment, classification) as (values
('any text','positive','mobile, communication')
)
select row_to_json(x.*)
from (
select t.message
, t.sentiment
, array_to_json(string_to_array(t.classification, ', ')) as classification
from t
) x
The second part is harder: you want the JSON to have a variable number of attributes, mixing grouped and non-grouped data. I suggest unwinding all attributes and then assembling them back (note that the numbered CTE is not actually needed if your real table has an id; I just needed some column to group by):
with t(message, sentiment, classification) as (values
('any text','positive','mobile, communication')
)
, numbered (id, message, sentiment, classification) as (
select row_number() over (order by null)
, t.*
from t
)
, extracted (id,message,sentiment,classification,index) as (
select n.id
, n.message
, n.sentiment
, l.c
, l.i
from numbered n
join lateral unnest(string_to_array(n.classification, ', ')) with ordinality l(c,i) on true
), unioned (id, attribute, value) as (
select id, concat('classification ', index::text), classification
from extracted
union all
select id, 'message', message
from numbered
union all
select id, 'sentiment', sentiment
from numbered
)
select json_object_agg(attribute, value)
from unioned
group by id;
Use jsonb_build_object and pass it the columns you want as key/value pairs:
SELECT
jsonb_build_object(
'msg',message,
'sentiment',sentiment,
'classification',
string_to_array(classification,', '))
FROM mytable;
The second output is definitely not trivial. The SQL code would be much larger and harder to maintain, not to mention that parsing such a file also requires a little more effort.
You can use a CTE to flatten the classification values and then perform the necessary grouping in the main query for each part of the problem:
-- first part (does not need the CTE) --
select json_build_object('msg', t1.message, 'sentiment', t1.sentiment, 'classification', string_to_array(t1.classification, ', ')) from tbl t1;
-- second part --
with cte(r, m, s, k) as (
select row_number() over (order by t.message), t.message, t.sentiment, v.* from tbl t
cross join json_array_elements(array_to_json(string_to_array(t.classification, ', '))) v
)
select jsonb_build_object('msg', t1.m, 'sentiment', t1.s)||('{'||t1.g||'}')::jsonb
from (select c.m, c.s, array_to_string(array_agg('"classification '||c.r||'":'||c.k), ', ') g
from cte c group by c.m, c.s) t1;

BigQuery - Count how many words in array are equal

I want to count how many similar words I have in a path (which will be split at delimiter /) and return a matching array of integers.
Input data will be something like:
I want to add another column, match_count, with an array of integers. For example:
To replicate this case, this is the query I'm working with:
CREATE TEMP FUNCTION HOW_MANY_MATCHES_IN_PATH(src_path ARRAY<STRING>, test_path ARRAY<STRING>) RETURNS ARRAY<INTEGER> AS (
-- WHAT DO I PUT HERE?
);
SELECT
*,
HOW_MANY_MATCHES_IN_PATH(src_path, test_path) as dir_path_match_count
FROM (
SELECT
ARRAY_AGG(x) AS src_path,
ARRAY_AGG(y) as test_path
FROM
UNNEST([
'lib/client/core.js',
'lib/server/core.js'
]) AS x, UNNEST([
'test/server/core.js'
]) as y
)
I've tried working with ARRAY and UNNEST in the HOW_MANY_MATCHES_IN_PATH function, but I end up with either an error or an array of 4 items (in this example).
Consider the approach below:
create temp function how_many_matches_in_path(src_path string, test_path string) returns integer as (
(select count(distinct src)
from unnest(split(src_path, '/')) src,
unnest(split(test_path, '/')) test
where src = test)
);
select *,
array( select how_many_matches_in_path(src, test)
from t.src_path src with offset
join t.test_path test with offset
using(offset)
) dir_path_match_count
from your_table t
If applied to the sample input data in your question:
with your_table as (
select
['lib/client/core.js', 'lib/server/core.js'] src_path,
['test/server/core.js', 'test/server/core.js'] test_path
)
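the output is a dir_path_match_count of [1, 2]: the first pair of paths shares one segment (core.js) and the second pair shares two (server and core.js).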

BigQuery Standard SQL, get max value from json array

I have a BigQuery column which contains STRING values like
col1
[{"a":1,"b":2},{"a":2,"b":3}]
[{"a":3,"b":4},{"a":5,"b":6}]
Now, when doing a SELECT, I want to get just the maximum value of "a" in each JSON array. For example, here I would want the output of the SELECT on the table to be:
2
5
Any ideas please? Thanks!
Use JSON_QUERY_ARRAY() to split the JSON array into its elements, then JSON_VALUE() to extract "a" from each element:
with t as (
select '[{"a":1,"b":2},{"a":2,"b":3}]' as col union all
select '[{"a":3,"b":4},{"a":5,"b":6}]'
)
select t.*,
(select max(cast(json_value(el, '$.a') as int64))
from unnest(JSON_QUERY_ARRAY(col, '$')) el
)
from t;
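One detail worth checking: JSON_VALUE returns a STRING, so MAX over its raw result compares text rather than numbers. The sketch below uses made-up values (9 and 10, not from the question) to illustrate the difference between the two comparisons.

```sql
-- Illustrative values only; they are not from the question.
WITH t AS (
  SELECT '[{"a":9,"b":2},{"a":10,"b":3}]' AS col
)
SELECT
  -- compares strings, so '9' sorts after '10'
  (SELECT MAX(JSON_VALUE(el, '$.a'))
   FROM UNNEST(JSON_QUERY_ARRAY(col, '$')) AS el) AS max_as_string,
  -- compares numbers after an explicit cast
  (SELECT MAX(CAST(JSON_VALUE(el, '$.a') AS INT64))
   FROM UNNEST(JSON_QUERY_ARRAY(col, '$')) AS el) AS max_as_int64
FROM t;
```

With single-digit values, as in the question, both columns agree; they diverge as soon as the values differ in number of digits.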

Bigquery array of STRINGs to array of INTs

I'm trying to pull an array of INT64s in BigQuery standard SQL from a column which is a long string of numbers separated by commas (for example, 2013,1625,1297,7634). I can pull an array of strings easily with:
SELECT
SPLIT(string_col,",")
FROM
table
However, I want to return an array of INT64s, not an array of strings. How can I do that? I've tried
CAST(SPLIT(string_col,",") AS ARRAY<INT64>)
but that doesn't work.
The query below is for BigQuery Standard SQL:
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, '2013,1625,1297,7634' AS string_col UNION ALL
SELECT 2, '1,2,3,4,5'
)
SELECT id,
(SELECT ARRAY_AGG(CAST(num AS INT64))
FROM UNNEST(SPLIT(string_col)) AS num
) AS num,
ARRAY(SELECT CAST(num AS INT64)
FROM UNNEST(SPLIT(string_col)) AS num
) AS num_2
FROM yourTable
Mikhail beat me to it and his answer is more extensive, but I'm adding this as a more minimal repro:
SELECT CAST(num as INT64) from unnest(SPLIT("2013,1625,1297,7634",",")) as num;
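If the input may contain spaces after the commas or non-numeric entries, a more defensive variant could be sketched as follows. The literal string is made up for illustration. SAFE_CAST returns NULL instead of raising an error, and the WHERE clause drops those NULLs, since BigQuery raises an error when a query result contains an array with NULL elements. WITH OFFSET plus ORDER BY keeps the original element order.

```sql
-- Sketch with a hypothetical input string; replace the literal with your column.
SELECT ARRAY(
  SELECT SAFE_CAST(TRIM(num) AS INT64)
  FROM UNNEST(SPLIT('2013, 1625,abc,7634', ',')) AS num WITH OFFSET AS pos
  WHERE SAFE_CAST(TRIM(num) AS INT64) IS NOT NULL  -- skip entries that are not integers
  ORDER BY pos                                     -- preserve the order of the source string
) AS nums;
```

Whether silently skipping bad entries is acceptable depends on the data; with a plain CAST the query fails on the first invalid entry instead.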

Convert array of strings into array of integers

I have the following SQL:
SELECT * FROM (SELECT t.id, t.summary, null as worker, tt.worked from ticket t
INNER JOIN (SELECT ticket, sum(seconds_worked)/3600.0 as worked FROM ticket_time GROUP BY ticket) tt ON tt.ticket=t.id
UNION ALL
SELECT ticket,null, tt.worker, sum(tt.seconds_worked)/3600.0 from ticket_time tt GROUP BY ticket,worker) as foo
WHERE id in ('9755, 9759') ORDER BY id
The ids string '9755, 9759' in the last line can and will change each time the SQL is executed.
I can convert the string to an array like this:
string_to_array('9755, 9759', ',')
But is there a way to convert this array of strings into an array of integers?
Just cast the resulting array to int[]:
where id = ANY ( string_to_array('9755, 9759', ',')::int[] )
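A quick way to check the cast in isolation, assuming PostgreSQL: the integer input conversion ignores surrounding whitespace, so the space after the comma in the literal does not break the cast.

```sql
-- Standalone check; no tables required.
select string_to_array('9755, 9759', ',')::int[] as ids,
       9759 = any (string_to_array('9755, 9759', ',')::int[]) as is_match;
```

An entry that is not a valid integer would make the cast fail with an error, so the ids string should be validated before it reaches the query.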