Build query for nested structure in bigquery - google-bigquery

I had the following code snippet
WITH sequences AS
(SELECT 1 AS id, [STRUCT(0 AS a, 1 AS b)] AS some_numbers
UNION ALL SELECT 2 AS id, [STRUCT(2 AS b, 4 AS a)] AS some_numbers
UNION ALL SELECT 3 AS id, [STRUCT(5 AS b, 3 AS a), STRUCT (7 AS b, 4 AS a)]
AS some_numbers)
SELECT id AS matching_rows
FROM sequences
WHERE EXISTS (SELECT 1
FROM UNNEST(some_numbers)
WHERE b > 3);
And I got the following output
Row matching_rows
1 2
2 3
Given the WHERE condition, only the 3rd row should be returned. Why is the 2nd row returned as well?

Struct fields are unioned by position, not name. So this:
WITH sequences AS
(SELECT 1 AS id, [STRUCT(0 AS a, 1 AS b)] AS some_numbers
UNION ALL SELECT 2 AS id, [STRUCT(2 AS b, 4 AS a)] AS some_numbers
UNION ALL SELECT 3 AS id, [STRUCT(5 AS b, 3 AS a), STRUCT (7 AS b, 4 AS a)]
AS some_numbers)
SELECT id AS matching_rows
FROM sequences
WHERE EXISTS (SELECT 1
FROM UNNEST(some_numbers)
WHERE b > 3);
is equivalent to this:
WITH sequences AS
(SELECT 1 AS id, [STRUCT(0 AS a, 1 AS b)] AS some_numbers
UNION ALL SELECT 2 AS id, [STRUCT(2 AS a, 4 AS b)] AS some_numbers
UNION ALL SELECT 3 AS id, [STRUCT(5 AS a, 3 AS b), STRUCT (7 AS a, 4 AS b)]
AS some_numbers)
SELECT id AS matching_rows
FROM sequences
WHERE EXISTS (SELECT 1
FROM UNNEST(some_numbers)
WHERE b > 3);
In every query of the union except the first, you can also remove the AS <name> aliases, since they don't affect the result.
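The positional matching can be reproduced outside BigQuery. The following is a minimal sketch using Python's built-in sqlite3 module; SQLite has no STRUCT type, so plain columns stand in for the struct fields (an assumption of this example), and the two structs of id 3 become two rows. As with UNION ALL in BigQuery, the names come from the first SELECT and later aliases are ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Plain columns a and b stand in for the STRUCT fields (SQLite has no STRUCT).
# The aliases in the 2nd to 4th SELECTs are swapped on purpose: they are ignored,
# because UNION ALL matches columns by position and takes names from the first SELECT.
rows = conn.execute("""
    WITH sequences AS (
        SELECT 1 AS id, 0 AS a, 1 AS b
        UNION ALL SELECT 2 AS id, 2 AS b, 4 AS a
        UNION ALL SELECT 3 AS id, 5 AS b, 3 AS a
        UNION ALL SELECT 3 AS id, 7 AS b, 4 AS a
    )
    SELECT DISTINCT id FROM sequences WHERE b > 3 ORDER BY id
""").fetchall()

matching_ids = [r[0] for r in rows]
print(matching_ids)  # [2, 3]
```

Row 2 matches because its second value, 4, ends up in column b regardless of the alias written next to it.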

Related

Count distinct by link on sql?

I'm trying to count distinct by the link between two columns.
Here is the example.
rownum  type  id
1       A     a
2       A     b
3       B     b
4       B     c
5       C     c
6       C     d
If I count distinct values of the type column, I get 3. However, I'd like to treat rownum 2 and 3, and likewise 4 and 5, as not distinct, because they share the same value in the id column.
To rephrase,
type  array of id
A     a, b
B     b, c
C     c, d
Since A and B share b, and B and C share c in their arrays, the result would be 1.
I have no idea where to start. I would appreciate any hint.
Consider below:
you might use STRING_AGG
WITH TMP_TBL AS
(
SELECT 1 AS ROWNUM, 'A' AS TYPE, 'a' AS ID UNION ALL
SELECT 2,'A','b' UNION ALL
SELECT 3,'B','b' UNION ALL
SELECT 4,'B','c' UNION ALL
SELECT 5,'C','c' UNION ALL
SELECT 6,'C','d'
)
SELECT DISTINCT TYPE,N_ID
FROM
(
SELECT TYPE,STRING_AGG(ID)OVER(PARTITION BY TYPE) AS N_ID FROM TMP_TBL
)
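The STRING_AGG query above lists the ids per type but does not by itself count the linked groups. One way to get the single number the question asks for is to treat types and ids as nodes of a graph and count connected components. The sketch below does this with a small union-find in plain Python; it is a different technique from the answer above, and the function name count_linked_groups is hypothetical.

```python
# Sample rows from the question: (rownum, type, id).
rows = [
    (1, "A", "a"), (2, "A", "b"),
    (3, "B", "b"), (4, "B", "c"),
    (5, "C", "c"), (6, "C", "d"),
]

def count_linked_groups(rows):
    """Count groups of types that are connected through shared ids."""
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_x] = root_y

    # Tag nodes so that a type named "a" never collides with an id named "a".
    for _, type_, id_ in rows:
        union(("type", type_), ("id", id_))

    # Each distinct root among the type nodes is one linked group.
    return len({find(node) for node in list(parent) if node[0] == "type"})

print(count_linked_groups(rows))  # 1
```

With the question's data every type is reachable from every other through a shared id, so the count is 1; two types with no id in common would give 2.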

Select a single maximum or minimum value in entire table of oracle

select maximum value from different columns of the table
For example, Table
A B C
-------
1 2 3
4 5 6
7 8 9
Result would be like
Max
9
Assuming the values are never NULL, I would simply do:
select max(greatest(a, b, c))
from t;
You could also phrase this as:
select greatest(max(a), max(b), max(c))
from t;
This version is more resilient to NULL values. It will work with NULLs unless all values for a column are NULL.
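The difference between the two forms shows up as soon as a row contains a NULL. Here is a minimal sketch using Python's built-in sqlite3 module, where SQLite's multi-argument max() stands in for Oracle's GREATEST (an assumption of this example; both return NULL when any argument is NULL). The table t and its rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(None, 2, 3), (4, 5, 6), (7, 8, None)],
)

# Row-wise first: the inner multi-argument max() returns NULL for any row
# that contains a NULL, and the outer aggregate then skips those rows.
row_wise = conn.execute("SELECT max(max(a, b, c)) FROM t").fetchone()[0]

# Column-wise first: each aggregate max() skips NULLs within its own column,
# so the scalar max() only ever sees non-NULL values here.
column_wise = conn.execute("SELECT max(max(a), max(b), max(c)) FROM t").fetchone()[0]

print(row_wise, column_wise)  # 6 8
```

The row-wise form silently loses the 8 because it sits in a row with a NULL, while the column-wise form returns the true maximum.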
Here's one option, which uses GREATEST and LEAST functions, enclosed into MAX and MIN aggregates:
with test (a, b, c) as
(select 1, 2, 3 from dual union all
select 4, 5, 6 from dual union all
select 7, 8, 9 from dual
)
select max(greatest(a, b, c)) max_result,
min(least(a, b, c)) min_result
from test;
MAX_RESULT MIN_RESULT
---------- ----------
9 1
What about:
select greatest(max(a), max(b), max(c))
from your_table;
Or:
select max(x)
from (select max(a) as x from your_table union all
select max(b) from your_table union all
select max(c) from your_table
)
You can try this:
select max(value) as Max from (
select max(A) as value from example
union
select max(B) as value from example
union
select max(C) as value from example ) tab;
It also handles NULL values present in a column.
WITH tempData (a, b, c) AS (SELECT NULL, 2, 3 FROM DUAL UNION ALL SELECT 4, 5, 6 FROM DUAL
UNION ALL SELECT 7, 8, NULL FROM DUAL)
SELECT GREATEST(MAX(a), MAX(b), MAX(c)) AS maxval, LEAST(MIN(a), MIN(b), MIN(c)) AS minval FROM tempData;
How about?
select greatest(NVL(c1, 0),NVL(c2, 0),NVL(c3, 0), NVL(c4, 0)) from T

BigQuery function to collapse row data into JSON or structure

I'm looking for a way to group by a number of columns in BigQuery but keep more detail than otherwise possible of the rows being aggregated.
Data:
ID A B C D
2 1 2 3 4
2 2 3 4 5
1 1 2 1 3
My query would look something like this:
SELECT id, TAKE_ANY(a), sum(b), count(d), max(d), MAGIC(a,b,c,d) FROM table GROUP BY 1
And the output I would like is something like:
1, 1, 2, 1, 3, [ (1,2,1,3)]
2, 2, 5, 2, 5, [ (1,2,3,4), (2,3,4,5) ]
Does anything like the MAGIC function exist that will package the data into a structure of some sort?
The option below (for BigQuery Standard SQL) covers the case where by [ (1,2,3,4), (2,3,4,5) ] you actually mean a STRING rather than an ARRAY of STRUCTs (the question is not very clear on this, but it seems possible):
#standardSQL
SELECT
id,
ANY_VALUE(a) any_a,
SUM(b) sum_b,
COUNT(d) count_d,
MAX(d) max_d,
FORMAT('[%s]', STRING_AGG(FORMAT('(%i,%i,%i,%i)', a, b, c, d), ',')) a_b_c_d
FROM `project.dataset.table`
GROUP BY id
Applied to the dummy data from your question, as below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 2 id, 1 a, 2 b, 3 c, 4 d UNION ALL
SELECT 2, 2, 3, 4, 5 UNION ALL
SELECT 1, 1, 2, 1, 3
)
SELECT
id,
ANY_VALUE(a) any_a,
SUM(b) sum_b,
COUNT(d) count_d,
MAX(d) max_d,
FORMAT('[%s]', STRING_AGG(FORMAT('(%i,%i,%i,%i)', a, b, c, d), ',')) a_b_c_d
FROM `project.dataset.table`
GROUP BY id
ORDER BY id
the result will be:
Row id any_a sum_b count_d max_d a_b_c_d
1 1 1 2 1 3 [(1,2,1,3)]
2 2 1 5 2 5 [(1,2,3,4),(2,3,4,5)]
Inside of your select list, use ARRAY_AGG with the STRUCT function. For example,
SELECT id, ARRAY_AGG(STRUCT(a, b, c, d))
FROM table
GROUP BY id
This will return an array containing all the values of those columns for each group.
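To make the shape of that result concrete, here is a rough plain-Python analogue of grouping by id and collecting one (a, b, c, d) tuple per row, next to the other aggregates from the question. It only illustrates the output structure and is not BigQuery code; the dictionary keys are hypothetical names.

```python
from collections import defaultdict

# Rows from the question: (id, a, b, c, d).
rows = [(2, 1, 2, 3, 4), (2, 2, 3, 4, 5), (1, 1, 2, 1, 3)]

# Analogue of GROUP BY id with ARRAY_AGG(STRUCT(a, b, c, d)).
groups = defaultdict(list)
for id_, a, b, c, d in rows:
    groups[id_].append((a, b, c, d))

result = {
    id_: {
        "sum_b": sum(member[1] for member in members),
        "count_d": len(members),  # equals COUNT(d) here because no d is NULL
        "max_d": max(member[3] for member in members),
        "a_b_c_d": members,
    }
    for id_, members in sorted(groups.items())
}

print(result[2]["a_b_c_d"])  # [(1, 2, 3, 4), (2, 3, 4, 5)]
```

Each group keeps its full row detail alongside the summary values, which is what the array of structs gives you in the query.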

Remove multiple entries for a column

I need two columns, A and B. A has repeated values, while B has unique values. For each value of A, I have to fetch only the row that has the max(C) value, where C is another column.
You can use ROW_NUMBER.
ROW_NUMBER
Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
PARTITION BY value_expression
Divides the result set produced by the FROM clause into partitions to which the ROW_NUMBER function is applied. value_expression specifies the column by which the result set is partitioned. If PARTITION BY is not specified, the function treats all rows of the query result set as a single group.
ORDER BY
The ORDER BY clause determines the sequence in which the rows are assigned their unique ROW_NUMBER within a specified partition. It is required.
Sample of ROW_NUMBER in your case:
SELECT A, B
FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY A ORDER BY C DESC) AS RowNum, A, B, C
FROM TableName
)
WHERE RowNum = 1
Use Row_Number Analytic function to do this
select A,B
from
(
select row_number() over(partition by A order by C desc)rn,A,B,C
from yourtable
)
where RN=1
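The ROW_NUMBER pattern can be tried end to end with Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions). This is a sketch under assumptions: the sample rows are illustrative, and a second sort key, b DESC, is added so that ties on C resolve deterministically, which the queries above leave unspecified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 3, 10), (1, 2, 20), (1, 1, 30),
     (2, 4, 40), (2, 5, 60), (2, 3, 60),
     (3, 6, 70), (4, 2, 80)],
)

# Number the rows of each A group from the highest C downward and keep row 1.
# "b DESC" is an added tie-breaker (assumption): a = 2 has two rows with c = 60.
top_rows = conn.execute("""
    SELECT a, b
    FROM (
        SELECT a, b, c,
               row_number() OVER (PARTITION BY a ORDER BY c DESC, b DESC) AS rn
        FROM yourtable
    )
    WHERE rn = 1
    ORDER BY a
""").fetchall()

print(top_rows)  # [(1, 1), (2, 5), (3, 6), (4, 2)]
```

Without the tie-breaker, either of the two rows with c = 60 could be numbered 1 for a = 2.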
An alternative to @NoDisplayName's solution is to use keep dense_rank first/last:
with your_table as (select 1 a, 3 b, 10 c from dual union all
select 1 a, 2 b, 20 c from dual union all
select 1 a, 1 b, 30 c from dual union all
select 2 a, 4 b, 40 c from dual union all
select 2 a, 5 b, 60 c from dual union all
select 2 a, 3 b, 60 c from dual union all
select 3 a, 6 b, 70 c from dual union all
select 4 a, 2 b, 80 c from dual)
select a,
max(b) keep (dense_rank first order by c desc) b,
max(c) max_c
from your_table
group by a;
A B MAX_C
---------- ---------- ----------
1 1 30
2 5 60
3 6 70
4 2 80
Using the INTERSECT keyword, get the rows that have the maximum value of ColC for each ColA:
select ColA, ColB from
(
select ColA, ColB, max(colC) from Tabl
group by ColA, ColB
intersect
select ColA, ColB, ColC from Tabl
) as A

Use a calculated field in the where clause

Is there a way to use a calculated field in the where clause?
I want to do something like
SELECT a, b, a+b as TOTAL FROM (
select 7 as a, 8 as b FROM DUAL
UNION ALL
select 8 as a, 8 as b FROM DUAL
UNION ALL
select 0 as a, 0 as b FROM DUAL
)
WHERE TOTAL <> 0
;
but I get ORA-00904: "TOTAL": invalid identifier.
So I have to use
SELECT a, b, a+b as TOTAL FROM (
select 7 as a, 8 as b FROM DUAL
UNION ALL
select 8 as a, 8 as b FROM DUAL
UNION ALL
select 0 as a, 0 as b FROM DUAL
)
WHERE a+b <> 0
;
Logically, the select clause is one of the last parts of a query to be evaluated, so its aliases and derived columns are not yet available in the where clause. (The exception is order by, which logically happens last.)
Using a derived table is a way around this:
select *
from (SELECT a, b, a+b as TOTAL FROM (
select 7 as a, 8 as b FROM DUAL
UNION ALL
select 8 as a, 8 as b FROM DUAL
UNION ALL
select 0 as a, 0 as b FROM DUAL)
)
WHERE TOTAL <> 0
;
This will work...
select *
from (SELECT a, b, a+b as TOTAL FROM (
select 7 as a, 8 as b FROM DUAL
UNION ALL
select 8 as a, 8 as b FROM DUAL
UNION ALL
select 0 as a, 0 as b FROM DUAL)
) Temp
WHERE TOTAL <> 0;