I had the following code snippet
WITH sequences AS
(SELECT 1 AS id, [STRUCT(0 AS a, 1 AS b)] AS some_numbers
UNION ALL SELECT 2 AS id, [STRUCT(2 AS b, 4 AS a)] AS some_numbers
UNION ALL SELECT 3 AS id, [STRUCT(5 AS b, 3 AS a), STRUCT (7 AS b, 4 AS a)]
AS some_numbers)
SELECT id AS matching_rows
FROM sequences
WHERE EXISTS (SELECT 1
FROM UNNEST(some_numbers)
WHERE b > 3);
And I got the following output
Row matching_rows
1 2
2 3
As per the where condition the result must be 3rd row only. Why it shows 2nd row also..?
Struct fields are unioned by position, not name. So this:
WITH sequences AS
(SELECT 1 AS id, [STRUCT(0 AS a, 1 AS b)] AS some_numbers
UNION ALL SELECT 2 AS id, [STRUCT(2 AS b, 4 AS a)] AS some_numbers
UNION ALL SELECT 3 AS id, [STRUCT(5 AS b, 3 AS a), STRUCT (7 AS b, 4 AS a)]
AS some_numbers)
SELECT id AS matching_rows
FROM sequences
WHERE EXISTS (SELECT 1
FROM UNNEST(some_numbers)
WHERE b > 3);
is equivalent to this:
WITH sequences AS
(SELECT 1 AS id, [STRUCT(0 AS a, 1 AS b)] AS some_numbers
UNION ALL SELECT 2 AS id, [STRUCT(2 AS a, 4 AS b)] AS some_numbers
UNION ALL SELECT 3 AS id, [STRUCT(5 AS a, 3 AS b), STRUCT (7 AS a, 4 AS b)]
AS some_numbers)
SELECT id AS matching_rows
FROM sequences
WHERE EXISTS (SELECT 1
FROM UNNEST(some_numbers)
WHERE b > 3);
Aside from the first query in the union, too, you can remove the AS <name> aliases since they don't affect the result.
Related
I'm trying to count distinct by the link between two columns.
Here is the example.
rownum
type
id
1
A
a
2
A
b
3
B
b
4
B
c
5
C
c
6
C
d
If I count distinct by type column, it returns 3. However, what I'd like to do is to consider rownum 2 and 3, 4 and 5 are not distinctive because they got the same value on id column.
To rephrase,
type
array of id
A
a, b
B
b, c
C
c, d
Since A and B got same b, and B and C got same c on their arrays, it would return 1 as a result.
I have no idea where to start. Would appreciate if I can get any hint or something.
Consider below:
you might use STRING_AGG
WITH TMP_TBL AS
(
SELECT 1 AS ROWNUM, 'A' AS TYPE, 'a' AS ID UNION ALL
SELECT 2,'A','b' UNION ALL
SELECT 3,'B','b' UNION ALL
SELECT 4,'B','b' UNION ALL
SELECT 5,'C','c' UNION ALL
SELECT 6,'C','d'
);
SELECT DISTINCT TYPE,N_ID
FROM
(
SELECT TYPE,STRING_AGG(ID)OVER(PARTITION BY TYPE) AS N_ID FROM TMP_TBL
)
select maximum value from different columns of the table
For example, Table
A B C
-------
1 2 3
4 5 6
7 8 9
Result would be like
Max
9
Assuming the values are never NULL, I would simply do:
select max(greatest(a, b, c))
from t;
You could also phrase this as:
select greatest(max(a), max(b), max(c))
from t;
This version is more resilient to NULL values. It will work with NULLs unless all values for a column are NULL.
Here's one option, which uses GREATEST and LEAST functions, enclosed into MAX and MIN aggregates:
SQL> with test (a, b, c) as
2 (select 1, 2, 3 from dual union all
3 select 4, 5, 6 from dual union all
4 select 7, 8, 9 from dual
5 )
6 select max(greatest(a, b, c)) max_result,
7 min(least(a, b, c)) min_result
8 from test;
MAX_RESULT MIN_RESULT
---------- ----------
9 1
SQL>
What about:
select greatest(max(a), max(b), max(c))
from your_table;
Or:
select max(x)
from (select max(a) as x from your_table union all
select max(b) from your_table union all
select max(c) from your_table union all
)
You can try this
select max(value) as Max from (
select max(A) as value from example
union
select max(B) as value from example
union
select max(C) as value from example ) as tab;
It will also handle the NULL values present in the column.
WITH tempData (a, b, c) AS (SELECT NULL, 2, 3 FROM DUAL UNION ALL SELECT 4, 5, 6 FROM DUAL
UNION ALL SELECT 7, 8, NULL FROM DUAL)
SELECT GREATEST(MAX(a), MAX(b), MAX(c)) AS maxval, LEAST(MIN(a), MIN(b), MIN(c)) AS minval FROM tempData;
How about?
select greatest(NVL(c1, 0),NVL(c2, 0),NVL(c3, 0), NVL(c4, 0)) from T
I'm looking for a way to group by a number of columns in BigQuery but keep more detail than otherwise possible of the rows being aggregated.
Data:
ID A B C D
2 1 2 3 4
2 2 3 4 5
1 1 2 1 3
How my query will look something like this:
SELECT id, TAKE_ANY(a), sum(b), count(d), max(d), MAGIC(a,b,c,d) FROM table GROUP BY 1
And the output I would like is something like:
1, 1, 2, 1, 3, [ (1,2,1,3)]
2, 2, 5, 2, 5, [ (1,2,3,4), (2,3,4,5) ]
Anything exist like the MAGIC function that will package the data into a structure of some sort?
Below option (for BigQuery Standard SQL) is for the case when by [ (1,2,3,4), (2,3,4,5) ] you actually mean STRING vs. ARRAY of STRUCTs (which is not very clear from question but i see possible)
#standardSQL
SELECT
id,
ANY_VALUE(a) any_a,
SUM(b) sum_b,
COUNT(d) count_d,
MAX(d) max_d,
FORMAT('[%s]', STRING_AGG(FORMAT('(%i,%i,%i,%i)', a, b, c, d), ',')) a_b_c_d
FROM `project.dataset.table`
GROUP BY id
If to apply to dummy data from your question as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 2 id, 1 a, 2 b, 3 c, 4 d UNION ALL
SELECT 2, 2, 3, 4, 5 UNION ALL
SELECT 1, 1, 2, 1, 3
)
SELECT
id,
ANY_VALUE(a) any_a,
SUM(b) sum_b,
COUNT(d) count_d,
MAX(d) max_d,
FORMAT('[%s]', STRING_AGG(FORMAT('(%i,%i,%i,%i)', a, b, c, d), ',')) a_b_c_d
FROM `project.dataset.table`
GROUP BY id
ORDER BY id
result will be
Row id any_a sum_b count_d max_d a_b_c_d
1 1 1 2 1 3 [(1,2,1,3)]
2 2 1 5 2 5 [(1,2,3,4),(2,3,4,5)]
Inside of your select list, use ARRAY_AGG with the STRUCT function. For example,
SELECT id, ARRAY_AGG(STRUCT(a, b, c, d))
FROM table
GROUP BY id
This will return an array containing all the values of those columns for each group.
I need two columns A and B but of them A has repeated values and B has single unique values. I have to fetch only those values of A which has max(C) value. C is another column.
You can use ROW_NUMBER.
ROW_NUMBER
Returns the sequential number of a row within a partition of a result
set, starting at 1 for the first row in each partition.
PARTITION BY value_expression
Divides the result set produced by the
FROM clause into partitions to which the ROW_NUMBER function is
applied. value_expression specifies the column by which the result set
is partitioned. If PARTITION BY is not specified, the function treats
all rows of the query result set as a single group.
ORDER BY
The ORDER BY clause determines the sequence in which the rows are assigned their unique ROW_NUMBER within a specified
partition. It is required.
Sample of ROW_NUMBER in your case:
SELECT A, B
FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY A ORDER BY C DESC) AS RowNum, A, B, C
FROM TableName
)
WHERE RowNum = 1
Use Row_Number Analytic function to do this
select A,B
from
(
select row_number() over(partition by A order by C desc)rn,A,B,C
from yourtable
)
where RN=1
An alternative to #NoDisplayName's solution is to use keep dense_rank first/last:
with your_table as (select 1 a, 3 b, 10 c from dual union all
select 1 a, 2 b, 20 c from dual union all
select 1 a, 1 b, 30 c from dual union all
select 2 a, 4 b, 40 c from dual union all
select 2 a, 5 b, 60 c from dual union all
select 2 a, 3 b, 60 c from dual union all
select 3 a, 6 b, 70 c from dual union all
select 4 a, 2 b, 80 c from dual)
select a,
max(b) keep (dense_rank first order by c desc) b,
max(c) max_c
from your_table
group by a;
A B MAX_C
---------- ---------- ----------
1 1 30
2 5 60
3 6 70
4 2 80
Using the INTERSECT keyword get those rows which have maximum value of ColC for the ColA.
select ColA, ColB from
(
select ColA, ColB, max(colC) from Tabl
group by ColA, ColB
intersect
select ColA, ColB, ColC from Tabl
) as A
Is there a way to use a calculated field in the where clause?
I want to do something like
SELECT a, b, a+b as TOTAL FROM (
select 7 as a, 8 as b FROM DUAL
UNION ALL
select 8 as a, 8 as b FROM DUAL
UNION ALL
select 0 as a, 0 as b FROM DUAL
)
WHERE TOTAL <> 0
;
but I get ORA-00904: "TOTAL": invalid identifier.
So I have to use
SELECT a, b, a+b as TOTAL FROM (
select 7 as a, 8 as b FROM DUAL
UNION ALL
select 8 as a, 8 as b FROM DUAL
UNION ALL
select 0 as a, 0 as b FROM DUAL
)
WHERE a+b <> 0
;
Logically, the select clause is one of the last parts of a query evaluated, so the aliases and derived columns are not available. (Except to order by, which logically happens last.)
Using a derived table is away around this:
select *
from (SELECT a, b, a+b as TOTAL FROM (
select 7 as a, 8 as b FROM DUAL
UNION ALL
select 8 as a, 8 as b FROM DUAL
UNION ALL
select 0 as a, 0 as b FROM DUAL)
)
WHERE TOTAL <> 0
;
This will work...
select *
from (SELECT a, b, a+b as TOTAL FROM (
select 7 as a, 8 as b FROM DUAL
UNION ALL
select 8 as a, 8 as b FROM DUAL
UNION ALL
select 0 as a, 0 as b FROM DUAL)
) as Temp
WHERE TOTAL <> 0;