I'm looking for a way to group by a number of columns in BigQuery while keeping more detail about the rows being aggregated than plain aggregate functions allow.
Data:
ID A B C D
2 1 2 3 4
2 2 3 4 5
1 1 2 1 3
My query would look something like this:
SELECT id, TAKE_ANY(a), sum(b), count(d), max(d), MAGIC(a,b,c,d) FROM table GROUP BY 1
And the output I would like is something like:
1, 1, 2, 1, 3, [ (1,2,1,3)]
2, 2, 5, 2, 5, [ (1,2,3,4), (2,3,4,5) ]
Does anything like the MAGIC function exist that will package the data into a structure of some sort?
The option below (for BigQuery Standard SQL) covers the case where by [ (1,2,3,4), (2,3,4,5) ] you actually mean a STRING rather than an ARRAY of STRUCTs (this is not very clear from the question, but I see it as possible).
#standardSQL
SELECT
id,
ANY_VALUE(a) any_a,
SUM(b) sum_b,
COUNT(d) count_d,
MAX(d) max_d,
FORMAT('[%s]', STRING_AGG(FORMAT('(%i,%i,%i,%i)', a, b, c, d), ',')) a_b_c_d
FROM `project.dataset.table`
GROUP BY id
If you apply this to the dummy data from your question, as below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 2 id, 1 a, 2 b, 3 c, 4 d UNION ALL
SELECT 2, 2, 3, 4, 5 UNION ALL
SELECT 1, 1, 2, 1, 3
)
SELECT
id,
ANY_VALUE(a) any_a,
SUM(b) sum_b,
COUNT(d) count_d,
MAX(d) max_d,
FORMAT('[%s]', STRING_AGG(FORMAT('(%i,%i,%i,%i)', a, b, c, d), ',')) a_b_c_d
FROM `project.dataset.table`
GROUP BY id
ORDER BY id
the result will be:
Row id any_a sum_b count_d max_d a_b_c_d
1 1 1 2 1 3 [(1,2,1,3)]
2 2 1 5 2 5 [(1,2,3,4),(2,3,4,5)]
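To make the string-packing idea concrete outside BigQuery, here is a minimal Python sketch that mimics what the query above computes for the question's sample rows: rows are grouped by id, and each row's (a, b, c, d) is formatted and joined into one string, much like FORMAT('(%i,%i,%i,%i)', ...) inside STRING_AGG. The function name summarize is hypothetical, and ANY_VALUE is approximated by taking the first row of each group.

```python
from collections import defaultdict

# Sample rows from the question: (id, a, b, c, d)
ROWS = [
    (2, 1, 2, 3, 4),
    (2, 2, 3, 4, 5),
    (1, 1, 2, 1, 3),
]


def summarize(rows):
    """Mimic ANY_VALUE/SUM/COUNT/MAX plus the packed string, grouped by id."""
    groups = defaultdict(list)
    for row_id, a, b, c, d in rows:
        groups[row_id].append((a, b, c, d))

    result = []
    for row_id in sorted(groups):  # like ORDER BY id
        members = groups[row_id]
        any_a = members[0][0]  # ANY_VALUE may pick any row; here: the first
        sum_b = sum(m[1] for m in members)
        count_d = len(members)  # the sample has no NULL d values
        max_d = max(m[3] for m in members)
        # FORMAT('(%i,%i,%i,%i)', ...) per row, joined with ',' and wrapped in [ ]
        packed = "[" + ",".join("(%d,%d,%d,%d)" % m for m in members) + "]"
        result.append((row_id, any_a, sum_b, count_d, max_d, packed))
    return result


for line in summarize(ROWS):
    print(line)
```

Keep in mind that in BigQuery, without an ORDER BY inside STRING_AGG, the order of the packed elements is not guaranteed; the sketch simply keeps input order.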
Inside your select list, use ARRAY_AGG with the STRUCT function. For example:
SELECT id, ARRAY_AGG(STRUCT(a, b, c, d))
FROM table
GROUP BY id
This will return an array containing all the values of those columns for each group.
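Unlike the string version, ARRAY_AGG(STRUCT(...)) keeps the packed values typed, so individual fields can still be read by name later (for example after UNNEST). A small Python sketch of that shape, using a NamedTuple as a stand-in for the STRUCT; the class name Abcd and the function array_agg_struct are hypothetical:

```python
from collections import defaultdict
from typing import NamedTuple


class Abcd(NamedTuple):
    """Stand-in for STRUCT(a, b, c, d): a record with named fields."""
    a: int
    b: int
    c: int
    d: int


# Sample rows from the question: (id, a, b, c, d)
ROWS = [(2, 1, 2, 3, 4), (2, 2, 3, 4, 5), (1, 1, 2, 1, 3)]


def array_agg_struct(rows):
    """Return {id: [Abcd, ...]}, like SELECT id, ARRAY_AGG(STRUCT(a, b, c, d)) ... GROUP BY id."""
    groups = defaultdict(list)
    for row_id, a, b, c, d in rows:
        groups[row_id].append(Abcd(a, b, c, d))
    return dict(groups)


packed = array_agg_struct(ROWS)
print(packed[2])
# Fields stay addressable by name, as they would be after UNNEST in SQL
print([s.d for s in packed[2]])
```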
I'm trying to count distinct by the link between two columns.
Here is the example.
rownum type id
1 A a
2 A b
3 B b
4 B c
5 C c
6 C d
If I count distinct values of the type column, it returns 3. However, I'd like rownums 2 and 3, and rownums 4 and 5, to be treated as not distinct, because they have the same value in the id column.
To rephrase, here are the ids for each type:
A: a, b
B: b, c
C: c, d
Since A and B share b, and B and C share c in their arrays, the result would be 1.
I have no idea where to start. I'd appreciate any hints.
Consider the approach below; you might use STRING_AGG:
WITH TMP_TBL AS
(
SELECT 1 AS ROWNUM, 'A' AS TYPE, 'a' AS ID UNION ALL
SELECT 2,'A','b' UNION ALL
SELECT 3,'B','b' UNION ALL
SELECT 4,'B','c' UNION ALL
SELECT 5,'C','c' UNION ALL
SELECT 6,'C','d'
)
SELECT DISTINCT TYPE,N_ID
FROM
(
SELECT TYPE,STRING_AGG(ID)OVER(PARTITION BY TYPE) AS N_ID FROM TMP_TBL
)
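The query above lists each type's ids but does not yet produce the single count the question asks for. If each type is treated as a node and every id shared by two types as an edge, that count is the number of connected components. Here is a minimal Python sketch of this idea using union-find (a swapped-in technique, not part of the original answer), run on the question's sample data; the function name count_linked_types is hypothetical. In BigQuery itself, this kind of transitive grouping typically needs iterative logic such as a recursive CTE.

```python
# Sample (rownum, type, id) rows from the question
ROWS = [
    (1, "A", "a"), (2, "A", "b"),
    (3, "B", "b"), (4, "B", "c"),
    (5, "C", "c"), (6, "C", "d"),
]


def count_linked_types(rows):
    """Count groups of types that are connected through shared ids."""
    parent = {}

    def find(x):
        # Walk up to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    first_type_for_id = {}
    for _, typ, ident in rows:
        parent.setdefault(typ, typ)
        if ident in first_type_for_id:
            # This id already appeared under another type: link the two types
            union(typ, first_type_for_id[ident])
        else:
            first_type_for_id[ident] = typ

    return len({find(t) for t in parent})


print(count_linked_types(ROWS))  # 1
```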
I have a table with array-type columns. For each row of the table, I want to calculate the average and median of the values in each values_* column.
Example table data:
id values_1 values_2
a [2, 4]
b [10, 4, 16]
c [NULL, 6]
d [NULL, NULL]
Sample expected output:
id avg_values_1 median_values_1 avg_values_2 median_values_2
a 3 3
b 7.5 10
c 6 6
d NULL NULL
Below is for BigQuery Standard SQL
#standardSQL
WITH temp AS (
SELECT id, ARRAY(SELECT * FROM UNNEST(values_1) i WHERE NOT i IS NULL) AS values_1
FROM `project.dataset.table`
)
SELECT id,
(SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
(SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() AS median FROM UNNEST(values_1) AS i) AS median_values_1,
FROM temp
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'a' id, [2, 4] values_1 UNION ALL
SELECT 'b', [10, 4, 16] UNION ALL
SELECT 'c', [6, NULL] UNION ALL
SELECT 'd', [NULL, NULL]
), temp AS (
SELECT id, ARRAY(SELECT * FROM UNNEST(values_1) i WHERE NOT i IS NULL) AS values_1
FROM `project.dataset.table`
)
SELECT id,
(SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
(SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() AS median FROM UNNEST(values_1) AS i) AS median_values_1,
FROM temp
with the output:
Row id avg_values_1 median_values_1
1 a 3.0 3.0
2 b 10.0 10.0
3 c 6.0 6.0
4 d null null
Note that I first had to introduce the temp CTE to eliminate NULL elements from the arrays.
You can repeat this construct for as many columns as you need.
Or, if you have more than just a few columns, you can use the approach shown in https://stackoverflow.com/a/63105643/5221944 to dynamically build and execute the query for all columns at once!
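For checking the expected numbers, here is a small Python sketch of the same per-row computation: NULLs (None here) are dropped first, as the temp CTE does, then the average and median are taken. PERCENTILE_CONT(x, 0.5) interpolates between the two middle values when the count is even (for row a: (2 + 4) / 2 = 3), which statistics.median also does. The function name avg_and_median is hypothetical.

```python
from statistics import mean, median

# values_1 arrays from the answer's sample data; None plays the role of NULL
VALUES_1 = {
    "a": [2, 4],
    "b": [10, 4, 16],
    "c": [6, None],
    "d": [None, None],
}


def avg_and_median(values):
    """Drop NULLs first (like the temp CTE), then return (avg, median)."""
    cleaned = [v for v in values if v is not None]
    if not cleaned:
        # AVG over an empty array is NULL, and the median subquery returns no row
        return None, None
    # median() averages the two middle values for an even count,
    # matching PERCENTILE_CONT(x, 0.5) with linear interpolation
    return float(mean(cleaned)), float(median(cleaned))


for row_id, values in VALUES_1.items():
    print(row_id, *avg_and_median(values))
```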
I had the following code snippet:
WITH sequences AS
(SELECT 1 AS id, [STRUCT(0 AS a, 1 AS b)] AS some_numbers
UNION ALL SELECT 2 AS id, [STRUCT(2 AS b, 4 AS a)] AS some_numbers
UNION ALL SELECT 3 AS id, [STRUCT(5 AS b, 3 AS a), STRUCT (7 AS b, 4 AS a)]
AS some_numbers)
SELECT id AS matching_rows
FROM sequences
WHERE EXISTS (SELECT 1
FROM UNNEST(some_numbers)
WHERE b > 3);
And I got the following output:
Row matching_rows
1 2
2 3
According to the WHERE condition, the result should be only the 3rd row. Why does it also show the 2nd row?
Struct fields are unioned by position, not by name. So the query above is equivalent to this:
WITH sequences AS
(SELECT 1 AS id, [STRUCT(0 AS a, 1 AS b)] AS some_numbers
UNION ALL SELECT 2 AS id, [STRUCT(2 AS a, 4 AS b)] AS some_numbers
UNION ALL SELECT 3 AS id, [STRUCT(5 AS a, 3 AS b), STRUCT (7 AS a, 4 AS b)]
AS some_numbers)
SELECT id AS matching_rows
FROM sequences
WHERE EXISTS (SELECT 1
FROM UNNEST(some_numbers)
WHERE b > 3);
Also, except in the first query of the union, you can remove the AS <name> aliases, since they don't affect the result.
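A Python sketch that models the two ways the structs could be combined, following the answer's explanation that the field names come from the first branch and later branches are matched by position. It shows why id 2 matches (its values 2 and 4 end up labelled a and b), and that matching by name would give only id 3. All names here are hypothetical. In the SQL itself, the fix is to write the fields in the same order (a, b) in every branch.

```python
# Each branch of the UNION ALL as written in the query: (name, value) pairs
BRANCHES = [
    (1, [(("a", 0), ("b", 1))]),
    (2, [(("b", 2), ("a", 4))]),
    (3, [(("b", 5), ("a", 3)), (("b", 7), ("a", 4))]),
]

# Field names taken from the first branch
FIELD_NAMES = ("a", "b")


def by_position(fields):
    """Keep the values in order and relabel them with the first branch's names."""
    return dict(zip(FIELD_NAMES, (value for _, value in fields)))


def by_name(fields):
    """What the question expected: match fields on their written names."""
    return dict(fields)


def matching_ids(to_struct):
    """Equivalent of WHERE EXISTS (SELECT 1 FROM UNNEST(some_numbers) WHERE b > 3)."""
    return [
        row_id
        for row_id, structs in BRANCHES
        if any(to_struct(s)["b"] > 3 for s in structs)
    ]


print(matching_ids(by_position))  # [2, 3]
print(matching_ids(by_name))      # [3]
```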
Select the maximum value from different columns of a table
For example, given this table:
A B C
-------
1 2 3
4 5 6
7 8 9
The result would be:
Max
9
Assuming the values are never NULL, I would simply do:
select max(greatest(a, b, c))
from t;
You could also phrase this as:
select greatest(max(a), max(b), max(c))
from t;
This version is more resilient to NULL values. It will work with NULLs unless all values for a column are NULL.
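To see the difference on data with NULLs, here is a Python sketch that uses the rows from the NULL example further down this thread. It assumes Oracle-style GREATEST, which returns NULL if any argument is NULL (MySQL behaves the same way, while PostgreSQL's GREATEST ignores NULLs), and models the MAX aggregate as ignoring NULLs. The helper names are hypothetical.

```python
# Rows (a, b, c) with None standing in for NULL
ROWS = [(None, 2, 3), (4, 5, 6), (7, 8, None)]


def greatest(*args):
    """Oracle-style GREATEST: NULL if any argument is NULL."""
    return None if any(x is None for x in args) else max(args)


def sql_max(values):
    """SQL MAX aggregate: ignores NULLs, returns NULL if nothing is left."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


# max(greatest(a, b, c)): a NULL anywhere in a row discards that whole row
print(sql_max(greatest(*row) for row in ROWS))  # 6

# greatest(max(a), max(b), max(c)): NULLs are dropped column by column
print(greatest(*(sql_max(col) for col in zip(*ROWS))))  # 8
```

The first form misses the 8 in the third row because that row's c is NULL; the second form still finds it.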
Here's one option, which uses the GREATEST and LEAST functions wrapped in MAX and MIN aggregates:
with test (a, b, c) as
(select 1, 2, 3 from dual union all
select 4, 5, 6 from dual union all
select 7, 8, 9 from dual
)
select max(greatest(a, b, c)) max_result,
min(least(a, b, c)) min_result
from test;
MAX_RESULT MIN_RESULT
---------- ----------
9 1
What about:
select greatest(max(a), max(b), max(c))
from your_table;
Or:
select max(x)
from (select max(a) as x from your_table union all
select max(b) from your_table union all
select max(c) from your_table
) t;
You can try this:
select max(value) as Max from (
select max(A) as value from example
union
select max(B) as value from example
union
select max(C) as value from example ) as tab;
It will also handle the NULL values present in the column.
WITH tempData (a, b, c) AS (SELECT NULL, 2, 3 FROM DUAL UNION ALL SELECT 4, 5, 6 FROM DUAL
UNION ALL SELECT 7, 8, NULL FROM DUAL)
SELECT GREATEST(MAX(a), MAX(b), MAX(c)) AS maxval, LEAST(MIN(a), MIN(b), MIN(c)) AS minval FROM tempData;
How about this? (It assumes the values are non-negative, since NVL turns NULLs into 0.)
select max(greatest(NVL(c1, 0), NVL(c2, 0), NVL(c3, 0), NVL(c4, 0))) from T
ID A B C D E(Time)
---------------------------
1 J 1 A B 1
2 J 1 A S 2
3 M 1 A B 1
4 M 1 A B 2
5 M 2 A S 3
6 M 2 A S 4
7 T 1 A B 1
8 T 2 A S 2
9 T 1 A B 3
10 k 1 A B 1
11 k 1 A B 2
I need to find unique values across multiple columns, with some added conditions. A unique value is the combination of columns A, B, and D.
If column A has only two rows (like records 1 and 2), column B is the same in both, and column D differs, BUT the S only comes after the B, we don't want to see those records.
If column A has multiple rows (like records 3 to 6) with different values in columns B and D, and in column D the S rows come after the B rows, we don't want to see those records.
If column A has multiple rows (like records 7 to 9) with different values in columns B and D, and in column D there is an S before a B, we want to see those records.
If column A has multiple rows (like records 10 and 11) with the same values in columns B and D, we don't want to see those records.
Any input is welcome. I was able to see the first and last rows using PARTITION BY with an UNBOUNDED window frame in my query...
It seems the basic logic is to check whether an S precedes any B in column D and, if so, show all of those records using the partition...
The desired output is rows 7 to 9. This is based on the logic that, for the same value in column A, we had a Sell (S) before a Buy (B) from the customer in column D when ordered by column E (time).
ID A B C D E(Time)
---------------------------------------------------
7 T 1 A B 1
8 T 2 A S 2
9 T 1 A B 3
I started to write a query to do this but ran out of spare time, and your criteria are very hard to follow. If you comment out the WHERE clause at the bottom of the query, it runs, but it doesn't yet produce your desired result.
Possibly this can point you in the right direction to achieve your goal:
WITH Src AS (
SELECT 1 AS ID, 'J' AS A, 1 AS B, 'A' AS C, 'B' AS D, 1 AS E
UNION ALL SELECT 2, 'J', 1, 'A', 'S', 2
UNION ALL SELECT 3, 'M', 1, 'A', 'B', 1
UNION ALL SELECT 4, 'M', 1, 'A', 'B', 2
UNION ALL SELECT 5, 'M', 2, 'A', 'S', 3
UNION ALL SELECT 6, 'M', 2, 'A', 'S', 4
UNION ALL SELECT 7, 'T', 1, 'A', 'B', 1
UNION ALL SELECT 8, 'T', 2, 'A', 'S', 2
UNION ALL SELECT 9, 'T', 1, 'A', 'B', 3
UNION ALL SELECT 10, 'k', 1, 'A', 'B', 1
UNION ALL SELECT 11, 'k', 1, 'A', 'B', 2
), ACnt AS (
SELECT A, Count(*) AS Cnt
FROM Src
GROUP BY A
), FirstD AS (
SELECT A, D
FROM Src
WHERE E=1
), FirstSRow AS (
SELECT A, Min(E) AS E
FROM Src
WHERE D='S'
GROUP BY A
), LastBRow AS (
SELECT A, Max(E) AS E
FROM Src
WHERE D='B'
GROUP BY A
), Mins AS (
SELECT A, Min(D) AS D, Min(B) AS B
FROM Src
GROUP BY A
), Maxs AS (
SELECT A, Max(D) AS D, Max(B) AS B
FROM Src
GROUP BY A
)
SELECT Src.*
FROM Src
JOIN ACnt ON ACnt.A=Src.A
JOIN FirstD ON FirstD.A=Src.A
JOIN Mins ON Mins.A=Src.A
JOIN Maxs ON Maxs.A=Src.A
LEFT JOIN FirstSRow ON FirstSRow.A=Src.A
LEFT JOIN LastBRow ON LastBRow.A=Src.A
WHERE
NOT (ACnt.Cnt=2 AND Mins.B=Maxs.B AND Mins.D<>Maxs.D AND FirstSRow.E < LastBRow.E)
AND NOT (ACnt.Cnt>=3 AND Mins.B<>Maxs.B AND Mins.D<>Maxs.D AND FirstD.D='B')
AND (ACnt.Cnt>=3 AND Mins.B<>Maxs.B AND Mins.D<>Maxs.D AND FirstD.D='B')
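As written, the last two WHERE conditions above contradict each other (NOT X AND X), so the query returns no rows. Going only by the rule stated in the question (show a group of A when an S precedes a B in column D, ordered by E), a much shorter query may be enough. Below is a sketch run against SQLite from Python so it can be tested locally; the table name Src follows the answer, and reading "S before B" as "the first S comes before the last B" is an assumption. It gives rows 7 to 9 on the sample data, but it has not been checked against other engines or other data.

```python
import sqlite3

# The question's sample data: (ID, A, B, C, D, E)
ROWS = [
    (1, "J", 1, "A", "B", 1), (2, "J", 1, "A", "S", 2),
    (3, "M", 1, "A", "B", 1), (4, "M", 1, "A", "B", 2),
    (5, "M", 2, "A", "S", 3), (6, "M", 2, "A", "S", 4),
    (7, "T", 1, "A", "B", 1), (8, "T", 2, "A", "S", 2),
    (9, "T", 1, "A", "B", 3),
    (10, "k", 1, "A", "B", 1), (11, "k", 1, "A", "B", 2),
]

# Keep every row of an A group whose first S comes before its last B (by E).
# Groups with no S compare NULL < ..., which is not true, so they drop out.
QUERY = """
SELECT s.ID, s.A, s.B, s.C, s.D, s.E
FROM Src AS s
JOIN (
    SELECT A
    FROM Src
    GROUP BY A
    HAVING MIN(CASE WHEN D = 'S' THEN E END) < MAX(CASE WHEN D = 'B' THEN E END)
) AS g ON g.A = s.A
ORDER BY s.ID
"""


def sell_before_buy(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Src (ID INTEGER, A TEXT, B INTEGER, C TEXT, D TEXT, E INTEGER)"
    )
    conn.executemany("INSERT INTO Src VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in sell_before_buy(ROWS):
    print(row)
```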