How to convert a BigQuery array row into a column and then count [duplicate] - google-bigquery

I have a table like the one shown below:
[sample table AS-IS]
I would like to count each value in the "Value" column per ID, as shown in the example below:
[sample table TO-BE]
The data in the "Value" column is subject to change.
Can you help?

You might consider PIVOT with dynamic SQL, as below.
CREATE TEMP TABLE sample_table AS
SELECT '111' ID, ['A', 'B', 'C', 'D'] Value UNION ALL
SELECT '222', ['E', 'F', 'G'] UNION ALL
SELECT '222', ['A', 'H', 'D'];
EXECUTE IMMEDIATE FORMAT("""
SELECT * FROM (
SELECT ID, v FROM sample_table, UNNEST(Value) v
) PIVOT (COUNT(v) FOR v IN ('%s'))
""", (SELECT STRING_AGG(DISTINCT v, "','") FROM (SELECT ID, v FROM sample_table, UNNEST(Value) v));

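To make the intended result concrete, here is a rough Python sketch of the same logic (unnest, count per ID, then spread the distinct values into columns). It is not BigQuery code; the names `rows` and `pivot_counts` are made up for illustration, and the data mirrors the temp table above.

```python
from collections import Counter

# Same sample rows as the temp table above: (ID, Value array).
rows = [
    ("111", ["A", "B", "C", "D"]),
    ("222", ["E", "F", "G"]),
    ("222", ["A", "H", "D"]),
]

def pivot_counts(rows):
    """Count each array element per ID and spread the counts into columns."""
    counts = {}  # ID -> Counter of the values seen for that ID
    for row_id, values in rows:
        # Equivalent of the UNNEST step: every element contributes one count.
        counts.setdefault(row_id, Counter()).update(values)
    # The dynamic column list: every distinct value seen anywhere.
    columns = sorted({v for c in counts.values() for v in c})
    # One output row per ID; values an ID never had count as 0.
    table = {row_id: [c[col] for col in columns] for row_id, c in counts.items()}
    return columns, table

columns, table = pivot_counts(rows)
for row_id, row in table.items():
    print(row_id, dict(zip(columns, row)))
```

Because the column list is derived from the data, new values in "Value" show up as new columns without changing the code, which is the same reason the SQL builds its IN list dynamically.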
Related

Create an artificial primary key / artificial column inside a SELECT statement (Oracle database)

I have a SQL statement that looks like this:
SELECT
x, AVG(y) AS z
FROM
table
UNPIVOT
(y FOR x
IN ("COLUMN1" AS 'A',
"COLUMN1" AS 'B',
"COLUMN2" AS 'C',
"COLUMN3" AS 'D',
"COLUMN4" AS 'E',
"COLUMN5" AS 'F',
"COLUMN6" AS 'G'))
GROUP BY
x;
Is there a possibility to create an artificial key for each line inside of the SELECT statement? I can't add the ID of the table, because then I would have to add it to the group by clause as well and the output would differ from the previous output...
Basically I need to add a unique numeric column to the output of my select.
I don't have an Oracle server to test it on, but something like this should work:
SELECT
ROW_NUMBER() OVER(ORDER BY NULL) AS id_num,
x, AVG(y) AS z
FROM
table
UNPIVOT
(y FOR x
IN ("COLUMN1" AS 'A',
"COLUMN1" AS 'B',
"COLUMN2" AS 'C',
"COLUMN3" AS 'D',
"COLUMN4" AS 'E',
"COLUMN5" AS 'F',
"COLUMN6" AS 'G'))
GROUP BY
x;
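The idea is that the row number is assigned after the grouping, so it does not change the groups. A small Python sketch of that ordering of steps, with made-up data (`unpivoted` and `grouped_with_row_number` are hypothetical names, not anything from Oracle):

```python
from statistics import mean

# Hypothetical unpivoted data: (x label, y value) pairs.
unpivoted = [("A", 1.0), ("A", 3.0), ("B", 10.0), ("C", 4.0), ("C", 6.0)]

def grouped_with_row_number(pairs):
    """Average y per x, then number the grouped rows starting at 1."""
    groups = {}
    for x, y in pairs:
        groups.setdefault(x, []).append(y)
    # Numbering happens on the already aggregated rows, so the grouping
    # itself is unaffected by the extra column.
    return [(i, x, mean(ys)) for i, (x, ys) in enumerate(groups.items(), start=1)]

result = grouped_with_row_number(unpivoted)
for row in result:
    print(row)
```

Here the numbering follows first-seen order of x. With ORDER BY NULL in the SQL, the order in which numbers are handed out is not guaranteed, so use a real ORDER BY expression if the numbering has to be stable.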

Snowflake SQL - OBJECT_CONSTRUCT from COUNT and GROUP BY

I'm trying to summarize data in a table:
- counting total rows
- counting values on specific fields
- getting the distinct values on specific fields
and, more importantly, I'm struggling with:
- getting the count for each field nested in an object
Given this data:
COL1 | COL2
A    | 0
null | 1
B    | null
B    | null
the expected result from this query would be:
with dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
)
select
count(1) as total
,count(col1) as col1
,array_agg(distinct col1) as dist_col1
--,object_construct(???) as col1_object_count
,count(col2) as col2
,array_agg(distinct col2) as dist_col2
--,object_construct(???) as col2_object_count
from
dummy
TOTAL | COL1 | DIST_COL1  | COL1_OBJECT_COUNT         | COL2 | DIST_COL2 | COL2_OBJECT_COUNT
4     | 3    | ["A", "B"] | {"A": 1, "B": 2, null: 1} | 2    | [0, 1]    | {0: 1, 1: 1, null: 2}
I've tried several functions inside OBJECT_CONSTRUCT mixed with ARRAY_AGG, but all failed.
OBJECT_CONSTRUCT can work with several columns, but only when given all of them (*); if you put a select statement inside, it fails.
Another issue is that analytical functions are not easily accepted by the object or array functions in Snowflake.
You could use Snowflake Scripting or Snowpark for this, but here's a solution that is somewhat flexible, so you can apply it to different tables and column sets.
Create test table/view:
Create or Replace View dummy as (
select 'A' as col1, 0 as col2
union all
select null, 1
union all
select 'B', null
union all
select 'B', null
);
Set session variables for table and colnames.
set tbname = 'DUMMY';
set colnames = '["COL1", "COL2"]';
Create view that generates the required table_column_summary data:
Create or replace View table_column_summary as
with
-- Create table of required column names
cn as (
select VALUE::VARCHAR CNAME
from table(flatten(input => parse_json($colnames)))
)
-- Convert rows into objects
,ro as (
select
object_construct_keep_null(*) row_object
-- using identifier on session variable to dynamically supply table/view name
from identifier($tbname) )
-- Flatten row objects into key/values
,rof as (
select
key col_name,
ifnull(value,'null')::VARCHAR col_value
from ro, lateral flatten(input => row_object), cn
-- You will only need this filter if you need a subset
-- of columns from the source table/query summarised
where col_name = cn.cname)
-- Get the column value distinct value counts
,cdv as (
select col_name,
col_value,
sum(1) col_value_count
from rof
group by 1,2
)
-- and derive required column level stats and combine with cdv
,cv as (
select
(select count(1) from identifier($tbname)) total,
col_name,
object_construct('COL_COUNT', count(col_value) ,
'COL_DIST', array_agg(distinct col_value),
'COL_OBJECT_COUNT', object_agg(col_value,col_value_count)) col_values
from cdv
group by 1,2)
-- Return result
Select * from cv;
Use this final query if you want a solution that works flexibly with any table/columns provided as input...
Select total, object_agg(col_name, col_values) col_values_obj
From table_column_summary
Group by 1;
Or use this final query if you want the fixed columns output as described in your question...
Select total,
COL1[0]:COL_COUNT COL1,
COL1[0]:COL_DIST DIST_COL1,
COL1[0]:COL_OBJECT_COUNT COL1_OBJECT_COUNT,
COL2[0]:COL_COUNT COL2,
COL2[0]:COL_DIST DIST_COL2,
COL2[0]:COL_OBJECT_COUNT COL2_OBJECT_COUNT
from table_column_summary
PIVOT ( ARRAY_AGG ( col_values )
FOR col_name IN ( 'COL1', 'COL2' ) ) as pt (total, col1, col2);
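For comparison, here is a plain Python sketch of the summary the question asks for, computed directly from the sample rows. It is not Snowflake code and is not a translation of the views above; `summarize` is a hypothetical helper, the output keys simply reuse the names from this answer, and nulls are keyed as the string "null" as an assumption about the desired shape.

```python
from collections import Counter

# The four sample rows from the question; None stands in for SQL NULL.
rows = [
    {"COL1": "A", "COL2": 0},
    {"COL1": None, "COL2": 1},
    {"COL1": "B", "COL2": None},
    {"COL1": "B", "COL2": None},
]

def summarize(rows, columns):
    """Per column: non-null count, distinct non-null values, count per value."""
    summary = {"TOTAL": len(rows)}
    for col in columns:
        values = [r[col] for r in rows]
        non_null = [v for v in values if v is not None]
        summary[col] = {
            "COL_COUNT": len(non_null),
            "COL_DIST": sorted(set(non_null)),
            # Object keys are strings here, with nulls counted under "null".
            "COL_OBJECT_COUNT": dict(
                Counter("null" if v is None else str(v) for v in values)
            ),
        }
    return summary

summary = summarize(rows, ["COL1", "COL2"])
print(summary)
```

This can serve as a reference for checking what the SQL returns on the same four rows.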

PostgreSQL: Select unique rows where distinct values are in list

Say that I have the following table:
with data as (
select 'John' "name", 'A' "tag", 10 "count"
union all select 'John', 'B', 20
union all select 'Jane', 'A', 30
union all select 'Judith', 'A', 40
union all select 'Judith', 'B', 50
union all select 'Judith', 'C', 60
union all select 'Jason', 'D', 70
)
I know there are a number of distinct tag values, namely (A, B, C, D).
I would like to select the unique names that have only the tag A.
I can get close by doing:
-- wrong!
select
distinct("name")
from data
group by "name"
having count(distinct tag) = 1
However, this will include every name that has exactly 1 distinct tag, regardless of which tag it is.
I am using PostgreSQL, although having more generic solutions would be great.
You're almost there - you already have groups with one tag, now just test if it is the tag you want:
select
distinct("name")
from data
group by "name"
having count(distinct tag) = 1 and max(tag)='A'
(Note that max could be min as well; SQL just doesn't have a single() aggregate function, but that's a different story.)
You can use not exists here:
select distinct "name"
from data d
where "tag" = 'A'
and not exists (
select * from data d2
where d2."name" = d."name" and d2."tag" != d."tag"
);
This is one possible way of solving it:
select
distinct("name")
from data
where "name" not in (
-- create list of names we want to exclude
select distinct name from data where "tag" != 'A'
)
But I don't know if it's the best or most efficient one.
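If you want to try the HAVING approach without a PostgreSQL server, here is a quick check using Python's built-in sqlite3 module. Note the assumption: this runs on SQLite, not PostgreSQL, although the query only uses standard GROUP BY / HAVING features. The table is loaded with the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table data ("name" text, "tag" text, "count" integer)')
conn.executemany(
    "insert into data values (?, ?, ?)",
    [
        ("John", "A", 10), ("John", "B", 20),
        ("Jane", "A", 30),
        ("Judith", "A", 40), ("Judith", "B", 50), ("Judith", "C", 60),
        ("Jason", "D", 70),
    ],
)

# Keep names with exactly one distinct tag, and require that tag to be 'A'.
query = """
select "name"
from data
group by "name"
having count(distinct "tag") = 1 and max("tag") = 'A'
"""
names = [row[0] for row in conn.execute(query)]
print(names)
```

Since the query already groups by "name", the extra distinct from the earlier versions is not needed.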

Get first N elements from an array in BigQuery table

I have an array column and I would like to get the first N elements of it (keeping an array data type). Is there a nice way to do it? Ideally without unnesting, ranking, and using array_agg to rebuild the array.
I could also do this (for getting first 2 elements):
WITH data AS
(
SELECT 1001 as id, ['a', 'b', 'c'] as array_1
UNION ALL
SELECT 1002 as id, ['d', 'e', 'f', 'g'] as array_1
UNION ALL
SELECT 1003 as id, ['h', 'i'] as array_1
)
select *,
[array_1[SAFE_OFFSET(0)], array_1[SAFE_OFFSET(1)]] as my_result
from data
But obviously this is not a nice solution, as it would fail when an array has only 1 element.
Here's a general solution with a UDF that you can call for any array type:
CREATE TEMP FUNCTION TopN(arr ANY TYPE, n INT64) AS (
ARRAY(SELECT x FROM UNNEST(arr) AS x WITH OFFSET off WHERE off < n ORDER BY off)
);
WITH data AS
(
SELECT 1001 as id, ['a', 'b', 'c'] as array_1
UNION ALL
SELECT 1002 as id, ['d', 'e', 'f', 'g'] as array_1
UNION ALL
SELECT 1003 as id, ['h', 'i'] as array_1
)
select *, TopN(array_1, 2) AS my_result
from data
It uses unnest and the array function, which it sounds like you didn't want to use, but it has the advantage of being general enough that you can pass any array to it.
Another option for BigQuery Standard SQL (with JS UDF)
#standardSQL
CREATE TEMP FUNCTION FirstN(arr ARRAY<STRING>, N FLOAT64)
RETURNS ARRAY<STRING> LANGUAGE js AS """
return arr.slice(0, N);
""";
SELECT *,
FirstN(array_1, 3) AS my_result
FROM data
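Both UDFs have the same semantics as a plain slice: take up to N elements, and return fewer when the array is shorter. A tiny Python sketch of that behaviour on the sample data (`first_n` is just an illustrative name, not a BigQuery function):

```python
def first_n(arr, n):
    """Return the first n elements, or the whole array if it is shorter."""
    return list(arr[:n])

# Sample arrays from the question, keyed by id.
data = {
    1001: ["a", "b", "c"],
    1002: ["d", "e", "f", "g"],
    1003: ["h", "i"],
}

result = {row_id: first_n(arr, 2) for row_id, arr in data.items()}
print(result)
print(first_n(["x"], 2))  # a one-element array is returned unchanged
```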

Find missing value in table from given set

Assume there is a table called "allvalues" with a column named "column".
This column contains the values "A" to "J" while missing the "H".
I am given a set of values from "G" to "J".
How can I query the table to see which value of my set is missing in the column?
The following does not work:
select * from allvalues where column not in ('G', 'H', 'I', 'J')
This query would result in A, B, C, D, E, F, H which also contains values not included in the given set.
Obviously in such a small data pool the missing value is noticeable by eye, but imagine more entries in the table and a bigger set.
You need to start with a (derived) table with the values you are checking. One explicit method is:
with testvalues as (
select 'G' as val from dual union all
select 'H' as val from dual union all
select 'I' as val from dual union all
select 'J' as val from dual
)
select tv.val
from testvalues tv
where not exists (select 1 from allvalues av where av.column = tv.val);
Often, the values originate through a query or a table. So explicitly declaring them is unnecessary -- you can replace that part with a subquery.
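Here is a quick way to try the NOT EXISTS pattern locally, using Python's built-in sqlite3 module. Assumptions: this runs on SQLite rather than Oracle, so the test values come from a VALUES list instead of dual, and the column is called val because column is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table allvalues (val text)")
# The table holds A to J, with H missing.
conn.executemany("insert into allvalues values (?)", [(c,) for c in "ABCDEFGIJ"])

# Start from the set being checked, then keep values with no match in the table.
query = """
with testvalues(val) as (values ('G'), ('H'), ('I'), ('J'))
select tv.val
from testvalues tv
where not exists (select 1 from allvalues av where av.val = tv.val)
"""
missing = [row[0] for row in conn.execute(query)]
print(missing)
```

The key point is the direction of the check: the query is driven by the given set, not by the table, so values that are only in the table never appear in the result.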
Depends on which SQL syntax you can use, but basically you want to check your table allvalues + the extra values.
eg:
SELECT *
FROM ALLVALUES
WHERE COLUMN NOT IN (select s.column from allvalues s)
and column not in ('G', 'H', 'I', 'J')
this will work:
select * from table1;
G
H
I
J
select * from table1
minus
(select * from table1
intersect
select column from allvalues
)
sample input:
select * from ns_table10;
G
H
I
J
SELECT * FROM ns_table11;
A
B
C
D
E
F
G
J
I
select * from ns_table10
minus
(select * from ns_table10
intersect
select * from ns_table11
);
output:
H