BigQuery - concatenate ignoring NULL

I'm very new to SQL. I understand in MySQL there's the CONCAT_WS function, but BigQuery doesn't recognise this.
I have a bunch of twenty fields I need to CONCAT into one comma-separated string, but some are NULL, and if one is NULL then the whole result will be NULL. Here's what I have so far:
CONCAT(m.track1, ", ", m.track2) As Tracks,
I tried this but it returns NULL too:
CONCAT(m.track1, IFNULL(m.track2,CONCAT(", ", m.track2))) As Tracks,
Super grateful for any advice, thank you in advance.

Unfortunately, BigQuery doesn't support concat_ws(). So, one method is string_agg():
select t.*,
(select string_agg(track, ',')
from (select t.track1 as track union all select t.track2) x
) as tracks
from t;
Actually a simpler method uses arrays:
select t.*,
array_to_string([track1, track2], ',') as tracks
from t;
Arrays with NULL values are not supported in result sets, but they can be used for intermediate results.
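Since array_to_string() skips NULL elements by default (unless you pass its optional third null_text argument), the whole question reduces to one call. A minimal sketch with inline dummy data - the table and track columns are made up to mirror the question:
#standardSQL
with m as (
select 'a' as track1, cast(null as string) as track2, 'c' as track3
)
select array_to_string([m.track1, m.track2, m.track3], ', ') as Tracks
from m
-- Tracks = 'a, c'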

I have a bunch of twenty fields I need to CONCAT into one comma-separated string
Assuming these are the only fields in the table, you can use the approach below - generic enough to handle any number of columns, without explicitly enumerating their names:
select
(select string_agg(col, ', ' order by offset)
from unnest(split(trim(format('%t', (select as struct t.*)), '()'), ', ')) col with offset
where not upper(col) = 'NULL'
) as Tracks
from `project.dataset.table` t
Below is a simplified dummy example to try out and test the approach:
#standardSQL
with `project.dataset.table` as (
select 1 track1, 2 track2, 3 track3, 4 track4 union all
select 5, null, 7, 8
)
select
(select string_agg(col, ', ' order by offset)
from unnest(split(trim(format('%t', (select as struct t.*)), '()'), ', ')) col with offset
where not upper(col) = 'NULL'
) as Tracks
from `project.dataset.table` t
with output:
Tracks
1, 2, 3, 4
5, 7, 8
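The trick is format('%t', ...), which renders the whole row struct as text with NULLs spelled out; trim() and split() then turn that text back into per-column values, and the where clause drops the literal 'NULL' entries. You can see the intermediate form directly:
select format('%t', (select as struct 5 as track1, cast(null as int64) as track2))
-- returns '(5, NULL)'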

Related

How do I select columns based on a string pattern in BigQuery

I have a table in BigQuery with hundreds of columns, and it just happens that I want to select all of them except for those that begin with an underscore. I know how to query the columns beginning with an underscore using the INFORMATION_SCHEMA.COLUMNS table, but I can't figure out how to use that query to select the columns I want. I know BigQuery has EXCEPT, but I want to avoid writing out each column that begins with an underscore, and I can't seem to pass it a subquery or even something like a._*.
Consider the approach below:
execute immediate (select '''
select * except(''' || string_agg(col) || ''') from your_table
'''
from (
select col
from (select * from your_table limit 1) t,
-- render the sample row as JSON text and strip the {, } and " characters
unnest([struct(translate(to_json_string(t), '{}"', '') as kvs)]),
-- split into key:value pairs, then take the key (column name) of each pair
unnest(split(kvs)) kv,
unnest([struct(split(kv, ':')[offset(0)] as col)])
-- keep only the columns that begin with an underscore
where starts_with(col, '_')
));
If applied to a table whose columns include _c and _e, it generates the statement
select * except(_c,_e) from your_table
and returns all the remaining columns.
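A self-contained way to try it - the table and column names here are invented for illustration:
create temp table your_table as
select 1 as a, 2 as _c, 3 as d, 4 as _e;
execute immediate (select 'select * except(' || string_agg(col) || ') from your_table'
from (
select split(kv, ':')[offset(0)] as col
from (select * from your_table limit 1) t,
unnest(split(translate(to_json_string(t), '{}"', ''))) kv
)
where starts_with(col, '_')
);
-- generates and runs: select * except(_c,_e) from your_table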

How to convert fields to JSON in PostgreSQL

I have a table with the following schema (PostgreSQL 14):
message | sentiment | classification
any text | positive | mobile, communication
message is just a string - phrases.
sentiment is a string, only one word.
classification is a string too, but can have 1 to many words, comma separated.
I would like to create a json field with these columns, like this:
{"msg":"any text", "sentiment":"positive","classification":["mobile,"communication"]}
Also, if possible, is there a way to consider the classification this way:
{"msg":"any text", "sentiment":"positive","classification 1":"mobile","classification 2" communication"}
The first part of the question is easy - Postgres provides functions for splitting a string and converting it to JSON:
with t(message, sentiment, classification) as (values
('any text','positive','mobile, communication')
)
select row_to_json(x.*)
from (
select t.message
, t.sentiment
, array_to_json(string_to_array(t.classification, ', ')) as classification
from t
) x
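For the sample row this returns (the JSON keys follow the column names, so alias message as msg in the subquery if you need that exact key):
{"message":"any text","sentiment":"positive","classification":["mobile","communication"]}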
The second part is harder - you want the JSON to have a variable number of attributes, a mix of grouped and ungrouped data. I suggest unwinding all attributes and then assembling them back (note the numbered CTE is not actually needed if your real table has an id - I just needed some column to group by):
with t(message, sentiment, classification) as (values
('any text','positive','mobile, communication')
)
, numbered (id, message, sentiment, classification) as (
select row_number() over (order by null)
, t.*
from t
)
, extracted (id,message,sentiment,classification,index) as (
select n.id
, n.message
, n.sentiment
, l.c
, l.i
from numbered n
join lateral unnest(string_to_array(n.classification, ', ')) with ordinality l(c,i) on true
), unioned (id, attribute, value) as (
select id, concat('classification ', index::text), classification
from extracted
union all
select id, 'message', message
from numbered
union all
select id, 'sentiment', sentiment
from numbered
)
select json_object_agg(attribute, value)
from unioned
group by id;
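For the sample row this yields one JSON object per id (whitespace and attribute order may vary):
{"classification 1":"mobile", "classification 2":"communication", "message":"any text", "sentiment":"positive"}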
Use jsonb_build_object() with the columns you want:
SELECT
jsonb_build_object(
'msg',message,
'sentiment',sentiment,
'classification',
string_to_array(classification, ', '))
FROM mytable;
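For the sample row this produces:
{"msg": "any text", "sentiment": "positive", "classification": ["mobile", "communication"]}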
The second output is definitely not trivial. The SQL code would be much larger and harder to maintain - not to mention that parsing such JSON also requires a little more effort.
You can use a CTE to handle the flattening of the classification attributes and then perform the necessary grouping in the main query for each part of the problem (the two queries below should be run separately):
-- first part --
select json_build_object('msg', t1.message, 'sentiment', t1.sentiment,
'classification', string_to_array(t1.classification, ', '))
from tbl t1;
-- second part --
with cte(r, m, s, k) as (
select row_number() over (order by t.message), t.message, t.sentiment, v.*
from tbl t
cross join json_array_elements(array_to_json(string_to_array(t.classification, ', '))) v
)
select jsonb_build_object('msg', t1.m, 'sentiment', t1.s) || ('{' || t1.g || '}')::jsonb
from (select c.m, c.s, array_to_string(array_agg('"classification ' || c.r || '":' || c.k), ', ') g
from cte c group by c.m, c.s) t1;
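For the sample row, the second part returns:
{"msg": "any text", "sentiment": "positive", "classification 1": "mobile", "classification 2": "communication"}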

TEXTJOIN-like function based on a condition in SQL

Trying to figure out if it is possible to do a TEXTJOIN-like function in SQL based on a condition. Right now the only way I can think of doing it is by running a pivot to turn the columns into rows and aggregating them that way. I think this is the only way to transpose the data in SQL?
Input. This would be a SQL table (tbl_fruit) with a Group column and 0/1 flag columns: apples, oranges, bananas, grapes.
SELECT *
FROM tbl_fruit
Output. For each Group, a comma-separated list of the fruit columns whose value is 1.
Below is for BigQuery Standard SQL (without specifically listing each column, so it scales to any number of columns):
#standardSQL
select `Group`, string_agg(split(kv, ':')[offset(0)], ', ') output
from `project.dataset.table` t,
unnest(split(translate(to_json_string((select as struct t.* except(`Group`))), '{}"', ''))) kv
where split(kv, ':')[offset(1)] != '0'
group by `Group`
Applied to the sample data from your question, it returns the expected comma-separated list for each Group.
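To try and test the approach without the original data, here is an invented dummy table (the Group values and 0/1 flags are made up):
#standardSQL
with `project.dataset.table` as (
select 'A' as `Group`, 1 as apples, 0 as oranges, 1 as bananas, 0 as grapes union all
select 'B', 0, 1, 0, 1
)
select `Group`, string_agg(split(kv, ':')[offset(0)], ', ') output
from `project.dataset.table` t,
unnest(split(translate(to_json_string((select as struct t.* except(`Group`))), '{}"', ''))) kv
where split(kv, ':')[offset(1)] != '0'
group by `Group`
-- Group A -> 'apples, bananas'; Group B -> 'oranges, grapes'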
In BigQuery, you could do this with arrays:
select grp,
array_to_string(
[
case when apples = 1 then 'apples' end,
case when oranges = 1 then 'oranges' end,
case when bananas = 1 then 'bananas' end,
case when grapes = 1 then 'grapes' end
],
','
) as output
from mytable
This puts all the columns in an array, transcoding each 1 to the corresponding literal string and each 0 to a null value. Then array_to_string() builds the output CSV string - this function ignores null values by default.

Count unique within combination of json keys in BigQuery

In BigQuery I have JSON stored in one column, like this:
{"key1": "value1", "key3":"value3"}
{"key2": "value2"}
{"key3": "value3"}
What I'd like to know is how to calculate the number of unique combinations, bearing in mind that there can be 100+ different keys, so avoiding listing them explicitly would be beneficial.
In the example above the end result will be 2, because the first and third rows match on "key3", while the second didn't match anything.
I understand how to build this by writing an app that calculates it, but I'd like to see if a solution is possible with a single query.
If your JSON values are formatted with no spaces after the :, then you can treat this as string manipulations:
with t as (
select '{"key1":"value1", "key3":"value3"}' as kv union all
select '{"key2":"value2"}' union all
select '{"key3":"value3"}'
)
select x, count(*)
from t cross join
unnest(regexp_extract_all(t.kv, '"[^,]+"')) x
group by x
having count(*) = 1;
With the spaces, you can use replace() to get rid of them:
with t as (
select '{"key1": "value1", "key3":"value3"}' as kv union all
select '{"key2": "value2"}' union all
select '{"key3": "value3"}'
)
select replace(x, '": "', '":"'), count(*)
from t cross join
unnest(regexp_extract_all(t.kv, '"[^,]+"')) x
group by 1
having count(*) = 1;
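For the sample rows, both versions return the pairs that occur exactly once - "key1":"value1" and "key2":"value2", each with count 1 - while "key3":"value3" is filtered out by the having clause because it appears in two rows.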

T-SQL function to split string with two delimiters as column separators into table

I'm looking for a T-SQL function to get a string like:
a:b,c:d,e:f
and convert it to a table like
ID Value
a b
c d
e f
Everything I found on the Internet does single-column parsing (e.g. XMLSplit function variations), but none of it lets me describe my string with two delimiters, one for column separation and the other for row separation.
Can you please guide me on this? I have very limited T-SQL knowledge and can't fork those ready-made functions into a two-column solution.
You can find a split() function on the web. Then, you can do string logic:
select left(val, charindex(':', val) - 1) as col1,
substring(val, charindex(':', val) + 1, len(val)) as col2
from dbo.split(@str, ',') s(val);
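If you're on SQL Server 2016 or later, the built-in string_split() can stand in for the custom function - a sketch (note string_split() does not guarantee row order):
declare @str nvarchar(100) = 'a:b,c:d,e:f';
select left(value, charindex(':', value) - 1) as ID,
substring(value, charindex(':', value) + 1, len(value)) as Value
from string_split(@str, ',');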
You can use a custom SQL Split function in order to separate the data-value columns.
Here is a SQL split function that you can use on a development system.
It returns an ID value that can be helpful to keep id and value together.
You need to split twice: first using the "," character, then a second split using the ":" character.
declare @str nvarchar(100) = 'a:b,c:d,e:f'
select
id = max(id),
value = max(value)
from (
select
rowid,
-- the first half of each pair (id = 1) feeds the ID column,
-- the second half (id = 2) feeds the Value column
id = case when id = 1 then val else null end,
value = case when id = 2 then val else null end
from (
select
s.id rowid, t.id, t.val
from (
-- first split: "," separates the rows
select * from dbo.Split(@str, ',')
) s
-- second split: ":" separates the two columns within each pair
cross apply dbo.Split(s.val, ':') t
) k
) m group by rowid
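which returns:
id value
a b
c d
e f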