How to union an array when grouping? - sql

I am trying to combine multiple array columns into one with distinct elements and then get a count of the distinct elements. How can I do something like that in Postgres?
create temp table t as ( select 'james' as fn, array ['bond', 'milner'] as ln );
create temp table tt as ( select 'james' as fn, array ['mcface', 'milner'] as ln );
-- expected value: james, 3
select x.fn,
       array_length() -- what to do here?
from (
  select fn, ln
  from t
  union
  select fn, ln
  from tt
) as x
group by x.fn

You should unnest the arrays in the inner queries; the UNION then removes duplicate (fn, element) pairs, so a plain count() gives the number of distinct elements per fn:
select x.fn,
       count(elem)
from (
  select fn, unnest(ln) as elem
  from t
  union
  select fn, unnest(ln) as elem
  from tt
) as x
group by x.fn
Db<>fiddle.
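If you prefer to keep all rows in the derived table and deduplicate inside the aggregate instead, an equivalent sketch against the same temp tables is:
select x.fn,
       count(distinct x.elem)
from (
  select fn, unnest(ln) as elem
  from t
  union all
  select fn, unnest(ln) as elem
  from tt
) as x
group by x.fn
-- both forms return: james | 3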

Why do you (want to) use arrays? That's not needed if each last name is stored as its own row. With that layout, simply have a derived table using UNION, which eliminates duplicates, GROUP BY the name and use count():
SELECT fn,
       count(*)
FROM (SELECT fn,
             ln
      FROM t
      UNION
      SELECT fn,
             ln
      FROM tt) AS x
GROUP BY fn;
db<>fiddle
Side note: 9.3 has been out of support for a while. Consider upgrading.
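For reference, a minimal sketch of the row-per-last-name layout this answer assumes (hypothetical table names t2/tt2, so they don't collide with the temp tables above):
create temp table t2 as ( select 'james' as fn, unnest(array ['bond', 'milner']) as ln );
create temp table tt2 as ( select 'james' as fn, unnest(array ['mcface', 'milner']) as ln );
-- run against t2/tt2, the UNION / GROUP BY / count(*) query above returns: james | 3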

Related

bigquery transpose and concatenate for each record

I want to achieve the following transformation. I have last_name stored in a repeated record, as shown below (screenshot: data before transformation). I want to achieve the following (screenshot: data after transformation).
Example with sample data created.
create or replace table t1.cte1 as
WITH t1 AS (
SELECT 1 as id,'eren' AS last_name UNION ALL
SELECT 1 as id,'yilmaz' AS last_name UNION ALL
SELECT 1 as id,'kaya' AS last_name
)
SELECT id,ARRAY_AGG(STRUCT(last_name)) AS last_name_rec
FROM t1
GROUP BY id;
with test as (
  select x.id, x.lname_agg, y.last_name
  from (
    select id, STRING_AGG(h.last_name, ' ') lname_agg
    FROM t1.cte1
    LEFT JOIN UNNEST(last_name_rec) AS h
    group by id
  ) x,
  (
    select id, h.last_name last_name
    FROM t1.cte1
    LEFT JOIN UNNEST(last_name_rec) AS h
    group by last_name, id
  ) y
)
select id, sp.string_flatten_dedup(lname_agg, ' ') concat_last_name, last_name
from test;
I'm also not sure whether I should store it as an array instead of a concatenated field, but it would be good to know how to achieve both (i.e. also storing concat_last_name as an array).
I have achieved the first transformation as follows but I had to dedup the concatenated field with a function I wrote.
I'm sure there is a much better way of achieving this.
with test as (
  select x.id id, x.lname_agg, y.last_name
  from (
    select id, STRING_AGG(h.last_name, ' ') lname_agg
    FROM small_test
    LEFT JOIN UNNEST(last_name_rec) AS h
    group by id
  ) x,
  (
    select id, h.last_name last_name
    FROM small_test
    LEFT JOIN UNNEST(last_name_rec) AS h
    group by last_name, id
  ) y
)
select id, sp.string_flatten_dedup(lname_agg, ' ') concat_last_name, last_name
from test;
The function, string_flatten_dedup:
CREATE OR REPLACE FUNCTION sp.string_flatten_dedup(string_value string, delim string) AS
(
  ARRAY_TO_STRING(
    ARRAY(SELECT distinct string_value
          FROM UNNEST(SPLIT(string_value, delim)) AS string_value
          ORDER BY string_value desc, string_value),
    delim)
);
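For example (a hypothetical input, just to show the behaviour):
select sp.string_flatten_dedup('eren yilmaz eren', ' ') as deduped;
-- returns 'yilmaz eren': duplicates removed, remaining values sorted descending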
(screenshots: intermediate results before applying the function, and the final output after applying the dedup function)
Updated table structure (t1.cte1). Yours works, but I got the table structure incorrect when I first posted.
create or replace table t1.cte2 as
with your_table as (
  select 1 id, ['brown', 'smith', 'jones'] last_name union all
  select 2, ['ryan', 'murphy']
)
select id, ln as last_name,
  array_to_string(last_name, ',') as concat_last_name,
from your_table, unnest(last_name) ln;

select id, ln as last_name,
  array_to_string(last_name, ',') as concat_last_name,
from t1.cte2, unnest(last_name) ln;

-- fails, as it's not the structure I thought it was; cte1 is different from cte2
select id, ln.last_name
  -- array_to_string(last_name, ',') as concat_last_name,
from t1.cte1, unnest(last_name_rec) ln;
Consider below approach
select id, ln as last_name,
array_to_string(last_name, ',') as concat_last_name,
from your_table, unnest(last_name) ln
if applied to the sample data in your question (data before transformation)
with your_table as (
select 1 id, ['brown', 'smith', 'jones'] last_name union all
select 2, ['ryan', 'murphy']
)
output is
id  last_name  concat_last_name
1   brown      brown,smith,jones
1   smith      brown,smith,jones
1   jones      brown,smith,jones
2   ryan       ryan,murphy
2   murphy     ryan,murphy
In case you want the last names as an array - you already have this array; see below for how to use it:
select id, ln as last_name,
last_name as concat_last_name,
from your_table, unnest(last_name) ln
with output: the same rows as above, except concat_last_name now holds the array itself rather than a comma-separated string.

Any equivalent in bigquery to postgres any() array?

Coming to BigQuery somewhat recently from Postgres, I'm used to using the following pattern fairly regularly in specific cases (though not where the CTE result set would be too big, of course).
with cte1 as (
select id from my_table where feature1 = x and feature2 = y
)
select id, detail1, detail2, detail3 from my_table
where id = ANY( select id from cte1 )
Something to the effect of using my CTE as a place to gather records of particular interest (something in common, something uncommon, outliers, dupes, etc.) and then passing that to my main select using the id = ANY( select id from CTE ) pattern.
What would be the equivalent in BigQuery?
Equivalent is
where id in ( select id from CTE )
So, your query can look like below
with cte1 as (
select id from my_table where feature1 = x and feature2 = y
)
select id, detail1, detail2, detail3 from my_table
where id in ( select id from cte1 )
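If the right-hand side is an actual ARRAY value rather than a subquery, the BigQuery counterpart of Postgres's = ANY(array) is IN UNNEST; a small sketch with placeholder values:
select id, detail1, detail2, detail3
from my_table
where id in unnest([1, 2, 3])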

Defining a subtable and then querying from that table using SQL

I have a table with many columns, and I want to count the unique values of each column. I know that I can do
SELECT sho_01, COUNT(*) from sho GROUP BY sho_01
UNION ALL
SELECT sho_02, COUNT(*) from sho GROUP BY sho_02
UNION ALL
....
Here sho is the table and sho_01,.... are the individual columns. This is BigQuery by the way, so they use UNION ALL.
Next, I want to do the same thing, but for a subset of sho, say SELECT * FROM sho WHERE id in (1,2,3). Is there a way I can create a subtable first and then query the subtable? Something like this
SELECT * FROM (SELECT * FROM sho WHERE id IN (1,2,3)) AS t1;
SELECT sho_01, COUNT(*) from t1 GROUP BY sho_01
UNION ALL
SELECT sho_02, COUNT(*) from t1 GROUP BY sho_02
UNION ALL
....
Thanks
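For the literal "create a subtable first" phrasing, a WITH clause does exactly that; a minimal sketch reusing the queries from the question (assuming the sho_* columns are type-compatible, as your original UNION ALL already requires):
WITH t1 AS (
  SELECT * FROM sho WHERE id IN (1, 2, 3)
)
SELECT sho_01, COUNT(*) FROM t1 GROUP BY sho_01
UNION ALL
SELECT sho_02, COUNT(*) FROM t1 GROUP BY sho_02
-- UNION ALL ... repeat for the remaining columns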
Presumably, the columns are all of the same type. If so, you can simplify this using arrays:
select el.which, el.val, count(*)
from (select t1.*,
             [struct('sho_01' as which, sho_01 as val),
              struct('sho_02', sho_02),
              . . .
             ] as ar
      from sho t1
     ) t cross join
     unnest(ar) el
group by el.which, el.val;
You can then easily filter however you want by adding a where clause before the group by.
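For instance, restricting to the subset from the question (a sketch; the ellipsis again stands for the remaining struct entries):
select el.which, el.val, count(*)
from (select t1.*,
             [struct('sho_01' as which, sho_01 as val),
              struct('sho_02', sho_02),
              . . .
             ] as ar
      from sho t1
     ) t cross join
     unnest(ar) el
where t.id in (1, 2, 3)
group by el.which, el.val;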
Below is for BigQuery Standard SQL and allows you to avoid manually typing the column names, or even needing to know them in advance.
#standardSQL
SELECT
TRIM(SPLIT(kv, ':')[OFFSET(0)], '"') column,
SPLIT(kv, ':')[OFFSET(1)] value,
COUNT(1) cnt
FROM `project.dataset.table` t,
UNNEST(SPLIT(TRIM(TO_JSON_STRING(t), '{}'))) kv
GROUP BY column, value
-- ORDER BY column, value
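To run the same approach only over the subset from the question, a WHERE clause on the table's columns slots in before the GROUP BY (a sketch; `project.dataset.sho` is a placeholder path):
#standardSQL
SELECT
  TRIM(SPLIT(kv, ':')[OFFSET(0)], '"') column,
  SPLIT(kv, ':')[OFFSET(1)] value,
  COUNT(1) cnt
FROM `project.dataset.sho` t,
  UNNEST(SPLIT(TRIM(TO_JSON_STRING(t), '{}'))) kv
WHERE t.id IN (1, 2, 3)
GROUP BY column, value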

Array_agg containing distinct structs

I'm attempting to create an array with distinct structs as values for a column, something like so:
select array_agg(distinct struct(field_a, field_b)) as c FROM tables ...
is that possible?
Rather than applying DISTINCT inside ARRAY_AGG(), select the distinct field pairs first and aggregate the result:
#standardSQL
SELECT ARRAY_AGG(STRUCT(field_a, field_b)) c
FROM (
  SELECT DISTINCT field_a, field_b
  FROM `project.dataset.table`
)
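A self-contained way to try it, where the WITH clause just stands in for the real table (made-up sample values):
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' AS field_a, 1 AS field_b UNION ALL
  SELECT 'a', 1 UNION ALL
  SELECT 'b', 2
)
SELECT ARRAY_AGG(STRUCT(field_a, field_b)) c
FROM (
  SELECT DISTINCT field_a, field_b
  FROM `project.dataset.table`
)
-- c contains the structs ('a', 1) and ('b', 2), each exactly once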

Select values that exist in all arrays in Postgres

I've got a table ignore with a column ignored_entry_ids that contains an array of integers. For example:
id ignored_entry_ids
1 {1,4,6}
2 {6,8,11}
3 {5,6,7}
How can I select the numbers that exist in every row's array? (6 in the example)
If your numbers are unique inside each array, you can do something like this; I don't think it can be done without unnest:
with cte as (
select id, unnest(ignored_entry_ids) as arr
from ign
)
select arr
from cte
group by arr
having count(*) = (select count(*) from ign)
sql fiddle demo
If the numbers are not unique, add distinct:
with cte as (
select distinct id, unnest(ignored_entry_ids) as arr
from ign
)
select arr
from cte
group by arr
having count(*) = (select count(*) from ign)
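A self-contained version to check against the sample rows (values inlined from the question):
with ign(id, ignored_entry_ids) as (
  values (1, '{1,4,6}'::int[]),
         (2, '{6,8,11}'::int[]),
         (3, '{5,6,7}'::int[])
), cte as (
  select distinct id, unnest(ignored_entry_ids) as arr
  from ign
)
select arr
from cte
group by arr
having count(*) = (select count(*) from ign);
-- returns 6 for this sample data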