BQ - getting a field from an array of structs without join - sql

I have a table with the following columns:
items ARRAY<STRUCT<label STRING, counter INTEGER>>
explore BOOLEAN
For each record I would like to choose the label with the highest counter, and then count explore on each unique label.
Ideally I would like to run something like:
SELECT FIRST_VALUE(items.label) OVER (ORDER BY items.counter DESC) as label,
COUNT(explore) as explore
FROM my_table
GROUP BY 1
If this is the data in my table:
explore items
1 [['A',1],['B',3]]
1 [['B',1]]
0 [['C',2],['D',1]]
Then I would like to get:
label explore
'B' 2
'C' 1

Consider the approach below:
select ( select label from t.items
order by counter desc limit 1
) label,
count(*) explore
from your_table t
group by label
if applied to sample data in your question
with your_table as (
select 1 explore, [struct('A' as label, 1 as counter), struct('B' as label, 3 as counter) ] items union all
select 1, [struct('B', 1)] union all
select 0, [struct('C', 2), struct('D', 1) ]
)
the output is:
label explore
B 2
C 1
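As a cross-check of what the correlated subquery computes, here is a minimal Python sketch of the same logic on the sample data: pick the label with the highest counter in each row, then count rows per label. The variable names are illustrative only and are not part of the answer above.

```python
from collections import Counter

# Sample rows from the question: (explore, items), where items is a list of
# (label, counter) pairs standing in for ARRAY<STRUCT<label, counter>>.
rows = [
    (1, [("A", 1), ("B", 3)]),
    (1, [("B", 1)]),
    (0, [("C", 2), ("D", 1)]),
]

# Equivalent of: (select label from t.items order by counter desc limit 1)
top_labels = [max(items, key=lambda item: item[1])[0] for _, items in rows]

# Equivalent of: count(*) ... group by label
explore_per_label = Counter(top_labels)

print(dict(explore_per_label))  # {'B': 2, 'C': 1}
```

Note that the SQL uses count(*), which counts rows regardless of the explore value; that is why 'C' gets 1 even though its explore is 0.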

Using your sample data, consider the approach below.
with data as (
select 1 as explore, [STRUCT( 'A' as label, 1 as counter), STRUCT( 'B' as label, 3 as counter) ] as items,
union all select 1 as explore, [STRUCT( 'B' as label, 1 as counter)] as items,
union all select 0 as explore, [STRUCT( 'C' as label, 2 as counter), STRUCT( 'D' as label, 1 as counter) ] as items
),
add_row_num as (
SELECT
explore,
items,
row_number() over (order by explore desc) as row_number
FROM data
),
get_highest_label as (
select
explore,
row_number,
label,
counter,
first_value(label) over (partition by row_number order by counter desc) as highest_label_per_row
from add_row_num, unnest(items)
),
-- https://stackoverflow.com/questions/36675521/delete-duplicate-rows-from-a-bigquery-table (REMOVE DUPLICATE)
remove_dups as (
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY row_number) as new_row_number
FROM get_highest_label
)
select
highest_label_per_row,
count(highest_label_per_row) as explore,
from remove_dups
where new_row_number = 1
group by highest_label_per_row
Output:
highest_label_per_row explore
B 2
C 1

Related

How to summarize all values in column over excluded window in BigQuery and then set results into this excluded window?

I need to aggregate values over all rows outside the current id's window and return the result on the rows of that id.
I tried this query, but it returns the sum over the whole column:
WITH t AS (
SELECT 'a1' id, 10 value
UNION ALL
SELECT 'a1' id, 20 value
UNION ALL
SELECT 'a2' id, 20 value
UNION ALL
SELECT 'a2' id, 40 value
UNION ALL
SELECT 'a2' id, 12 value
UNION ALL
SELECT 'a3' id, 44 value
UNION ALL
SELECT 'a3' id, 34 value
)
SELECT id,
SUM(value) OVER (PARTITION BY id<>id) sum_all_except_current_id
FROM t
I also tried:
SUM(value) OVER (PARTITION BY id, id<>id),
SUM(value) OVER (ORDER BY id<>id),
SUM(value) OVER (PARTITION BY id ORDER BY id<>id),
SUM(value) OVER (PARTITION BY id<>id)
None of these work for me.
What am I doing wrong?
If you want the sum over all other ids, you can subtract each id's own sum from the grand total:
select t.*,
sum(value) over () - sum(value) over (partition by id) as excluded_sum
from t;
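To see the subtraction at work, here is a small runnable sketch. It assumes SQLite 3.25 or newer (as bundled with recent Python versions) as a stand-in for BigQuery, since only the window-function arithmetic is being illustrated; the DISTINCT and ORDER BY are added just to keep the printed result short.

```python
import sqlite3

# SQLite stands in for BigQuery here (assumption: SQLite >= 3.25 for window functions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a1", 10), ("a1", 20), ("a2", 20), ("a2", 40), ("a2", 12), ("a3", 44), ("a3", 34)],
)

# Grand total over the empty window minus the per-id total
# leaves the sum of every other id.
excluded = conn.execute(
    """
    SELECT DISTINCT id,
           SUM(value) OVER () - SUM(value) OVER (PARTITION BY id) AS excluded_sum
    FROM t
    ORDER BY id
    """
).fetchall()

print(excluded)  # [('a1', 150), ('a2', 108), ('a3', 102)]
```

The grand total is 180, so a1 (own sum 30) gets 150, a2 (72) gets 108, and a3 (78) gets 102.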

Bigquery uncommon elements in array

Let us say you have a column of arrays like the one below. I am trying to group the rows based on the count of uncommon elements: once the number of distinct uncommon elements reaches 5, the row goes into the next group.
In the example below, the first three rows form group 1 because their uncommon elements are ['3','4','6','7'], which is 4 in length. If you add the next row to the group, the array of distinct uncommon elements becomes ['1','3','4','5','6','7'], which exceeds the limit of 5 distinct uncommon elements.
with arr as (
select 1 ord, ['1','2','3','4'] as ar
union all
select 2, ['1','2','3']
union all
select 3,['1','2','6','7']
union all
select 4,['2','4','5','7']
union all
select 5, ['string1','5','6','7','8']
)
select * from arr
I am looking for an output like below
This is the code I have written so far. It is definitely missing a big piece, but I am adding it in case it is helpful:
with arr as (
select 1 ord, ['1','2','3','4'] as ar,1 subclass
union all
select 2, ['1','2','3'],1
union all
select 3,['1','2','6','7'],1
union all
select 4,['2','4','5','7'],1
union all
select 5, ['string1','5','6','7','8'],1
)
, history_t as (
select a.* ,
ARRAY_AGG(struct(ar)) OVER (PARTITION BY SUBCLASS ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as history
from arr a )
, tem2 as (
select a.* except(history,ar),
(SELECT COUNT(1) FROM UNNEST(history) AS col ) AS array_cnt
,b.ar unnest1
from history_t a
,unnest(history) b
)
, tem3 as (
select a.* except(unnest1),sku_lst
from tem2 a , unnest(unnest1) sku_lst
)
, all_sku_freq as (
select
ord, array_cnt , sku_lst , subclass,count(*) sku_freq
from tem3
group by 1,2,3,4 )
, uncommon_sku_cnt as (
select ord, subclass, count( sku_lst) uncommon_sku_count from all_sku_freq where sku_freq <> array_cnt group by 1,2 )
,rolling_uncomm_sku_cnt as (
select a.*, sum(uncommon_sku_count) over(partition by subclass order by ord asc range between unbounded preceding and current row ) roll_uncomm_sku_cnt
from uncommon_sku_cnt a
)
select a.* from rolling_uncomm_sku_cnt a
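The grouping rule described in the question can be sketched procedurally. This is only one reading of the rule and not a BigQuery solution. It assumes that an element is "uncommon" when it appears in some but not all arrays of the current group, and that a row opens a new group when adding it would bring the distinct uncommon count to the limit or above. The function name and the `limit` parameter are hypothetical.

```python
def group_by_uncommon(arrays, limit=5):
    """Greedily assign a group number to each array, in input order.

    Assumption: 'uncommon' = union of the group's arrays minus their
    intersection; a row starts a new group when keeping it in the current
    group would yield `limit` or more distinct uncommon elements.
    """
    groups = []
    group_no = 0
    union = common = None  # running union / intersection of the current group
    for arr in arrays:
        items = set(arr)
        if union is not None:
            candidate_union = union | items
            candidate_common = common & items
            if len(candidate_union - candidate_common) < limit:
                union, common = candidate_union, candidate_common
                groups.append(group_no)
                continue
        # First row overall, or the limit was reached: open a new group.
        group_no += 1
        union, common = items, set(items)
        groups.append(group_no)
    return groups


sample = [
    ["1", "2", "3", "4"],
    ["1", "2", "3"],
    ["1", "2", "6", "7"],
    ["2", "4", "5", "7"],
    ["string1", "5", "6", "7", "8"],
]
print(group_by_uncommon(sample))  # [1, 1, 1, 2, 3]
```

Because each row's group depends on the running state of the group before it, the rule is inherently sequential, which is why a plain window function over the rows does not express it directly.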

sql - single line per distinct values in a given column

Is there a way using SQL, in BigQuery more specifically, to get one row per unique value in a given column?
I know this is possible using a sequence of UNION queries, with as many unions as there are distinct values in the column of interest, but I'm wondering if there is a better way to do it.
You can use row_number():
select t.* except (seqnum)
from (select t.*, row_number() over (partition by col order by col) as seqnum
from t
) t
where seqnum = 1;
This returns an arbitrary row. You can control which row by adjusting the order by.
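A runnable sketch of the row_number() pattern follows, assuming SQLite 3.25 or newer as a stand-in for BigQuery (SQLite has no `* except (...)`, so the columns are listed explicitly). The six sample rows are the dummy data used elsewhere in this thread, and ordering the window by id is an illustrative choice that makes the picked row deterministic.

```python
import sqlite3

# SQLite stands in for BigQuery (assumption: SQLite >= 3.25 for window functions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3)],
)

# Number the rows inside each col value, then keep only the first one.
# ORDER BY id makes "first" mean "lowest id" instead of an arbitrary row.
one_per_col = conn.execute(
    """
    SELECT id, col
    FROM (
        SELECT id, col,
               ROW_NUMBER() OVER (PARTITION BY col ORDER BY id) AS seqnum
        FROM t
    )
    WHERE seqnum = 1
    ORDER BY col
    """
).fetchall()

print(one_per_col)  # [(1, 1), (4, 2), (6, 3)]
```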
Another fun solution in BigQuery uses structs:
select array_agg(t limit 1)[ordinal(1)].*
from t
group by col;
You can add an order by (order by X limit 1) if you want a particular row.
Here is the same query with cleaner formatting:
select tab.* except(seqnum)
from (
select *, row_number() over (partition by column_x order by column_x) as seqnum
from `project.dataset.table`
) as tab
where seqnum = 1
Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY col
You can test and play with the above using dummy data, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 1 col UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 1 UNION ALL
SELECT 4, 2 UNION ALL
SELECT 5, 2 UNION ALL
SELECT 6, 3
)
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY col
with result
Row id col
1 1 1
2 4 2
3 6 3

Find Top Most AND Lowest In a Table's Group Column

I have a table with 4 fields, ID, Price, Qty and Ratting, plus an optional Position column.
I have all the records grouped by the columns [Qty, Ratting].
I need to determine each row's position within its group and store it in the optional Position column.
For better understanding I have added an image with data in table:
On the basis of Qty within each Ratting, I have to mark the top 3, the bottom 3, and the rest of the rows as remaining.
I am not sure how to do it.
Can anybody suggest how to do it?
So far what I've tried is:
CREATE TABLE #RankTable
(
ID INT,
Price Decimal (10,2),
Qty INT,
Ratting INT
)
INSERT INTO #RankTable
SELECT 1,10,15,1
UNION ALL
SELECT 2,11,11,1
UNION ALL
SELECT 3,96,10,1
UNION ALL
SELECT 4,96,8,1
UNION ALL
SELECT 5,56,7,1
UNION ALL
SELECT 6,74,5,1
UNION ALL
SELECT 7,93,4,1
UNION ALL
SELECT 8,98,2,1
UNION ALL
SELECT 9,12,1,1
UNION ALL
SELECT 10,32,80,2
UNION ALL
SELECT 11,74,68,2
UNION ALL
SELECT 12,58,57,2
UNION ALL
SELECT 13,37,43,2
UNION ALL
SELECT 14,79,32,2
UNION ALL
SELECT 15,29,28,2
UNION ALL
SELECT 16,46,17,2
UNION ALL
SELECT 17,86,13,2
UNION ALL
SELECT 19,75,110,3
UNION ALL
SELECT 20,27,108,3
UNION ALL
SELECT 21,38,104,3
UNION ALL
SELECT 22,87,100,3
UNION ALL
SELECT 23,47,89,3
DECLARE @PositionGroup VARCHAR(1)
SELECT *,ISNULL(@PositionGroup,'') AS Position FROM #RankTable
You can try this:
SELECT ID
,Price
,Qty
,Ratting
,CASE WHEN RowID >= 1 AND RowID <= 3
THEN 0
ELSE CASE WHEN RowID > Total - 3 THEN 1 ELSE 2 END END AS Position
FROM (SELECT ID
,Price
,Qty
,Ratting
,COUNT(*) OVER(PARTITION BY Ratting) AS Total
,ROW_NUMBER() OVER(PARTITION BY Ratting ORDER BY Qty DESC) AS RowID
,ISNULL(@PositionGroup,'') AS Position
FROM #RankTable) AS T
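The top 3 / bottom 3 / remaining logic can be checked with a small runnable sketch. It assumes SQLite 3.25 or newer as a stand-in for SQL Server, a plain table named RankTable instead of the temp table, and only the Ratting 1 and Ratting 3 rows from the question (Price omitted); the position codes 0, 1 and 2 follow the answer above.

```python
import sqlite3

# SQLite stands in for SQL Server (assumption: SQLite >= 3.25 for window functions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RankTable (ID INTEGER, Qty INTEGER, Ratting INTEGER)")
conn.executemany(
    "INSERT INTO RankTable VALUES (?, ?, ?)",
    [(1, 15, 1), (2, 11, 1), (3, 10, 1), (4, 8, 1), (5, 7, 1),
     (6, 5, 1), (7, 4, 1), (8, 2, 1), (9, 1, 1),
     (19, 110, 3), (20, 108, 3), (21, 104, 3), (22, 100, 3), (23, 89, 3)],
)

# rn ranks rows by Qty (largest first) inside each Ratting;
# total is the group size, so rn > total - 3 marks the last three rows.
positions = conn.execute(
    """
    SELECT ID,
           CASE WHEN rn <= 3 THEN 0          -- top 3
                WHEN rn > total - 3 THEN 1   -- bottom 3
                ELSE 2                       -- remaining
           END AS Position
    FROM (
        SELECT ID,
               COUNT(*) OVER (PARTITION BY Ratting) AS total,
               ROW_NUMBER() OVER (PARTITION BY Ratting ORDER BY Qty DESC) AS rn
        FROM RankTable
    )
    ORDER BY ID
    """
).fetchall()

for row in positions:
    print(row)
```

With fewer than six rows in a group (as in Ratting 3 here), the top-3 test wins first, so the remaining rows fall into the bottom bucket and nothing is marked as 2.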
Use a window function. Try this.
;WITH cte
AS (SELECT *,
-- order by Qty DESC so that rn 1..3 are the largest quantities in each Ratting
Row_number()OVER(partition BY Ratting ORDER BY Qty DESC) rn,
count(id)OVER(partition BY Ratting) mx
FROM #RankTable)
SELECT ID,
Price,
Qty,
Ratting,
mx - rn AS rows_below,
CASE WHEN rn IN ( 1, 2, 3 ) THEN 0
WHEN mx - rn IN( 0, 1, 2 ) THEN 1
ELSE 2
END position
FROM cte
try this as well.
;WITH cte AS
(
SELECT MAX(Row) [Max],
MIN(Row) [Min],
LU.Ratting
FROM (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY Ratting ORDER BY Qty DESC) Row
FROM #RankTable)LU
GROUP BY LU.Ratting
)
SELECT ID,
R.Price,
R.Qty,
cte.Ratting,
CASE WHEN (Row - Min) <= 2 THEN 0 WHEN (Max - Row) <= 2 THEN 1 ELSE 2 END Position
FROM cte
JOIN (
SELECT Ratting,
ID,
Price,
Qty,
ROW_NUMBER() OVER(PARTITION BY Ratting ORDER BY Qty DESC) [Row]
FROM #RankTable
) R ON R.Ratting = cte.Ratting
Result:

Oracle SQL -- Analytic functions OVER a group?

My table:
ID NUM VAL
1 1 Hello
1 2 Goodbye
2 2 Hey
2 4 What's up?
3 5 See you
If I want to return the max number for each ID, it's really nice and clean:
SELECT MAX(NUM) FROM table GROUP BY (ID)
But what if I want to grab the VAL associated with the max NUM for each ID?
Why can't I do:
SELECT MAX(NUM) OVER (ORDER BY NUM) FROM table GROUP BY (ID)
Why is that an error? I'd like to have this select grouped by ID, rather than partitioning separately for each window...
EDIT: The error is "not a GROUP BY expression".
You could probably use the MAX() KEEP(DENSE_RANK LAST...) function:
with sample_data as (
select 1 id, 1 num, 'Hello' val from dual union all
select 1 id, 2 num, 'Goodbye' val from dual union all
select 2 id, 2 num, 'Hey' val from dual union all
select 2 id, 4 num, 'What''s up?' val from dual union all
select 3 id, 5 num, 'See you' val from dual)
select id, max(num), max(val) keep (dense_rank last order by num)
from sample_data
group by id;
When you use a windowing function, you don't need GROUP BY anymore; this would suffice:
select id,
max(num) over(partition by id)
from x
Actually, you can get the result without using a windowing function:
select *
from x
where (id,num) in
(
select id, max(num)
from x
group by id
)
Output:
ID NUM VAL
1 2 Goodbye
2 4 What's up
3 5 SEE YOU
http://www.sqlfiddle.com/#!4/a9a07/7
If you want to use a windowing function, you might be tempted to do this:
select id, val,
case when num = max(num) over(partition by id) then
1
else
0
end as to_select
from x
where to_select = 1
Or this:
select id, val
from x
where num = max(num) over(partition by id)
But neither of those is allowed, because the WHERE clause can reference neither a select-list alias nor a window function, so you have to do this:
with list as
(
select id, val,
case when num = max(num) over(partition by id) then
1
else
0
end as to_select
from x
)
select *
from list
where to_select = 1
http://www.sqlfiddle.com/#!4/a9a07/19
If you're looking to get the rows which contain the values from MAX(num) GROUP BY id, this tends to be a common pattern...
WITH
sequenced_data
AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY id ORDER BY num DESC) AS sequence_id,
*
FROM
yourTable
)
SELECT
*
FROM
sequenced_data
WHERE
sequence_id = 1
EDIT
I don't know if TeraData will allow this, but the logic seems to make sense...
SELECT
*
FROM
yourTable
WHERE
num = MAX(num) OVER (PARTITION BY id)
Or maybe...
SELECT
*
FROM
(
SELECT
*,
MAX(num) OVER (PARTITION BY id) AS max_num_by_id
FROM
yourTable
)
AS sub_query
WHERE
num = max_num_by_id
This is slightly different from my previous answer: if multiple records are tied with the same MAX(num), this will return all of them, whereas the ROW_NUMBER() version will only ever return one.
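The difference in tie handling can be seen in a small runnable sketch. It assumes SQLite 3.25 or newer as a stand-in for Oracle, and it adds one hypothetical row, (3, 5, 'Later'), to the question's data so that id 3 has two rows tied on the maximum num.

```python
import sqlite3

# SQLite stands in for Oracle (assumption: SQLite >= 3.25 for window functions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, num INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [(1, 1, "Hello"), (1, 2, "Goodbye"), (2, 2, "Hey"), (2, 4, "What's up?"),
     (3, 5, "See you"),
     (3, 5, "Later")],  # hypothetical extra row: ties with 'See you' on num
)

# ROW_NUMBER keeps exactly one row per id; which of the tied rows wins is arbitrary.
by_row_number = conn.execute(
    """
    SELECT id, num, val
    FROM (
        SELECT id, num, val,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY num DESC) AS sequence_id
        FROM yourTable
    )
    WHERE sequence_id = 1
    ORDER BY id
    """
).fetchall()

# Comparing against MAX(num) OVER (...) keeps every row that reaches the maximum.
by_max_over = conn.execute(
    """
    SELECT id, num, val
    FROM (
        SELECT id, num, val,
               MAX(num) OVER (PARTITION BY id) AS max_num_by_id
        FROM yourTable
    )
    WHERE num = max_num_by_id
    ORDER BY id, val
    """
).fetchall()

print(len(by_row_number))  # 3 rows: one per id
print(by_max_over)         # 4 rows: both tied rows for id 3 are returned
```

If exactly one row per id is required, adding a tie-breaker column to the ORDER BY inside ROW_NUMBER() makes the choice deterministic.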
EDIT
In your proposed SQL the error relates to the fact that the OVER() clause contains a field not in your GROUP BY. It's like trying to do this...
SELECT id, num FROM yourTable GROUP BY id
num is invalid, because there can be multiple values in that field for each row returned (with the rows returned being defined by GROUP BY id).
In the same way, you can't put num inside the OVER() clause.
SELECT
id,
MAX(num), <-- Valid as it is an aggregate
MAX(num) <-- still valid
OVER(PARTITION BY id), <-- Also valid, as id is in the GROUP BY
MAX(num) <-- still valid
OVER(PARTITION BY num) <-- Not valid, as num is not in the GROUP BY
FROM
yourTable
GROUP BY
id
See this question for when you can't specify something in the OVER() clause, and an answer showing when (I think) you can: over-partition-by-question