BigQuery: Concatenating arrays with window clause

I am working with data from tables for which dummy data could be:
WITH
Sequences AS (
SELECT [0, 1, 3] AS some_numbers UNION ALL
SELECT [2, 4, 8] UNION ALL
SELECT [0, 5] UNION ALL
SELECT [2, 16] UNION ALL
SELECT [0, 7]
)
SELECT
some_numbers[ORDINAL(1)] AS grp,
some_numbers[ORDINAL(2)] AS sub_grp,
some_numbers
FROM Sequences
I want to combine the arrays within groups of grp, but only up to the last 1 sub_grp.
I tried something like:
SELECT
grp,
sub_grp,
ARRAY_CONCAT_AGG(some_numbers) OVER (PARTITION BY grp ORDER BY sub_grp ROWS 1 PRECEDING)
FROM numbers
However, this results in the error:
Analytic function ARRAY_CONCAT_AGG is not supported.
Any pointer on how I can fix this problem?
EDIT: Adding the expected output. I am expecting the output to be:

But regardless, as I am hitting a hard wall of errors, any hint that gives me a working example will be of immense help. I can adapt it to my use case :)
Hope the below will unblock your efforts:
select * except(arrs, pos),
format('%t', (select array_concat_agg(arr) from t.arrs)) grouped_numbers
from (
select grp, sub_grp,
array_agg(struct(some_numbers as arr)) over win arrs,
row_number() over(partition by grp order by sub_grp) pos
from sequences, unnest([struct(some_numbers[ordinal(1)] as grp, some_numbers[ordinal(2)] as sub_grp)])
window win as (partition by grp order by sub_grp rows between current row and 1 following)
) t
where mod(pos, 2) = 1
if applied to the sample data in your question, the output is:
grp, sub_grp, grouped_numbers
0, 1, [0, 1, 3, 0, 5]
0, 7, [0, 7]
2, 4, [2, 4, 8, 2, 16]
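For convenience, here is a self-contained version (BigQuery Standard SQL) that combines the dummy data from your question with the query above, so it can be run as-is:
with Sequences as (
select [0, 1, 3] as some_numbers union all
select [2, 4, 8] union all
select [0, 5] union all
select [2, 16] union all
select [0, 7]
)
select * except(arrs, pos),
format('%t', (select array_concat_agg(arr) from t.arrs)) grouped_numbers
from (
select grp, sub_grp,
array_agg(struct(some_numbers as arr)) over win arrs,
row_number() over(partition by grp order by sub_grp) pos
from Sequences, unnest([struct(some_numbers[ordinal(1)] as grp, some_numbers[ordinal(2)] as sub_grp)])
window win as (partition by grp order by sub_grp rows between current row and 1 following)
) t
where mod(pos, 2) = 1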

Related

Given an id, can you get the "index" of that item's id in a sorted query

EDIT:
I am using Android versions that don't have a SQLite version >= 3.25, so I cannot use ROW_NUMBER.
Given the following table:
id, date(long)
1, 100
2, 25
3, 5
4, 50
If I query for the items sorted:
select * from items order by date
id, date
3, 5
2, 25
4, 50
1, 100
If I have id 4, can I query to get its index in the sorted list, in this case index "3"?
With the ROW_NUMBER() window function:
SELECT rn
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY date) rn
FROM items
)
WHERE id = 4;
An alternative, for versions of SQLite prior to 3.25.0, which don't support window functions:
SELECT COUNT(*) + 1 rn
FROM items
WHERE date < (SELECT date FROM items WHERE id = 4);
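To sanity-check both queries locally, a minimal SQLite session built from the sample data in the question could look like this:
-- sample data from the question
CREATE TABLE items (id INTEGER PRIMARY KEY, date INTEGER);
INSERT INTO items (id, date) VALUES (1, 100), (2, 25), (3, 5), (4, 50);
-- window-function version (SQLite 3.25.0+)
SELECT rn
FROM (SELECT id, ROW_NUMBER() OVER (ORDER BY date) rn FROM items)
WHERE id = 4;
-- returns 3
-- COUNT(*) version for older SQLite builds
SELECT COUNT(*) + 1 rn
FROM items
WHERE date < (SELECT date FROM items WHERE id = 4);
-- also returns 3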
You can use ROW_NUMBER() to get the location of the row according to a custom ordering. The query can look like:
select rn
from (
select t.*, row_number() over(order by date) as rn from items t
) x
where id = 4

Difference in the output of query, using rank() and CTE

My first query looks like:
select trans.* from
( select
acc_num,
acc_type,
trans_amount,
load_date,
rank() over(partition by acc_num order by load_date) as rk
from monetary
where rat_code = 123
) trans
where trans.rk =1;
The second query looks like:
with a as (
select *,
row_number() over(partition by acc_num order by load_date) as rn
from monetary
where rat_code = 123 )
select
acc_num,
acc_type,
trans_amount,
load_date
from a
where rn =1;
Can anyone please help me? I am getting a different number of records in the two cases,
even though the queries look the same.
It's because there is a difference between rank and row_number: when two rows tie on load_date within the same acc_num, rank() assigns them the same rank, while row_number() always assigns distinct numbers.
The example below shows this (note the tie on 2-jun-2022 for account 100):
Accno, dt, rank_col, rownum_col
100, 2-jun-2022, 1, 1
100, 2-jun-2022, 1, 2
100, 1-jul-2022, 3, 3
54, 2-jun-2022, 1, 1
54, 1-jul-2022, 2, 2
row_number always produces a unique sequence within each account, whereas rank repeats the same value for tied rows (and then skips ahead). In this example rank = 1 matches three rows, but row_number = 1 matches only two, which is why your two queries return a different number of records.
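Here is a small self-contained query (dummy data only, made up so that there is a tie on load_date; not your monetary table) that reproduces the effect in standard SQL:
with monetary_sample as (
select 100 as acc_num, date '2022-06-02' as load_date union all
select 100, date '2022-06-02' union all
select 100, date '2022-07-01' union all
select 54, date '2022-06-02' union all
select 54, date '2022-07-01'
)
select acc_num, load_date,
rank() over(partition by acc_num order by load_date) as rank_col,
row_number() over(partition by acc_num order by load_date) as rownum_col
from monetary_sample
-- rank_col = 1 matches three rows (the two tied rows of acc_num 100 plus the first row of acc_num 54),
-- while rownum_col = 1 matches exactly one row per acc_num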

Number rows 1, 2, 3, 1, 2, 3... etc

SELECT ROW_NUMBER() OVER() AS levels,
name
FROM players_table
The above gives me sequential row numbers 1, 2, 3, 4, and so on.
However, I'm looking to restart the count every 3 rows (1, 2, 3, 1, 2, 3, ...) while still preserving the order.
You can take the modulus:
SELECT ((ROW_NUMBER() OVER() - 1) % 3) + 1 AS levels,
name
FROM players_table
select levels, name
from (
select ((row_number() over() - 1) % 3) + 1 as levels,
name
from players_table
) t
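One caveat: ROW_NUMBER() OVER () with an empty OVER clause does not guarantee any particular order, so if there is a column that defines the order you want to preserve (sort_key below is just a hypothetical placeholder for it), it is safer to spell that out:
-- sort_key is a placeholder for whatever column defines your row order
SELECT ((ROW_NUMBER() OVER (ORDER BY sort_key) - 1) % 3) + 1 AS levels,
name
FROM players_table
ORDER BY sort_key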

Nth result in BigQuery Group By

I have a derived table like:
id, desc, total, account
1, one, 10, a
1, one, 9, b
1, one, 3, c
2, two, 27, c
I can do a simple
select id, `desc`, sum(total) as total from mytable group by id, `desc`
but I want to add the equivalent first(account), first(total), second(account), second(total) to the output so it'd be:
id, desc, total, first_account, first_account_total, second_account, second_account_total
1, one, 22, a, 10, b, 9
2, two, 27, c, 27, null, 0
Any pointers?
Thanks in advance!
Below is for BigQuery Standard SQL
#standardSQL
SELECT id, `desc`, total,
arr[OFFSET(0)].account AS first_account,
arr[OFFSET(0)].total AS first_account_total,
arr[SAFE_OFFSET(1)].account AS second_account,
arr[SAFE_OFFSET(1)].total AS second_account_total
FROM (
SELECT id, `desc`, SUM(total) total,
ARRAY_AGG(STRUCT(account, total) ORDER BY total DESC LIMIT 2) arr
FROM `project.dataset.table`
GROUP BY id, `desc`
)
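To try this without a real table, the sample rows from your question can be inlined as a CTE under the same `project.dataset.table` name:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 AS id, 'one' AS `desc`, 10 AS total, 'a' AS account UNION ALL
SELECT 1, 'one', 9, 'b' UNION ALL
SELECT 1, 'one', 3, 'c' UNION ALL
SELECT 2, 'two', 27, 'c'
)
SELECT id, `desc`, total,
arr[OFFSET(0)].account AS first_account,
arr[OFFSET(0)].total AS first_account_total,
arr[SAFE_OFFSET(1)].account AS second_account,
arr[SAFE_OFFSET(1)].total AS second_account_total
FROM (
SELECT id, `desc`, SUM(total) total,
ARRAY_AGG(STRUCT(account, total) ORDER BY total DESC LIMIT 2) arr
FROM `project.dataset.table`
GROUP BY id, `desc`
)
-- returns: (1, one, 22, a, 10, b, 9) and (2, two, 27, c, 27, NULL, NULL)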
In cases where more than the first 2 bins are required, I would use the pattern below, which eliminates the repetition of verbose lines like arr[SAFE_OFFSET(1)].total AS second_account_total:
#standardSQL
SELECT * FROM (SELECT NULL id, '' `desc`, NULL total, '' first_account, NULL first_account_total, '' second_account, NULL second_account_total) WHERE FALSE
UNION ALL
SELECT id, `desc`, total, arr[OFFSET(0)].*, arr[SAFE_OFFSET(1)].*
FROM (
SELECT id, `desc`, SUM(total) total,
ARRAY_AGG(STRUCT(account, total) ORDER BY total DESC LIMIT 2) arr
FROM `project.dataset.table`
GROUP BY id, `desc`
)
In the above, the first SELECT sets the layout (column names) of the output while returning no rows at all because of WHERE FALSE, so I don't need to explicitly extract the struct's elements and provide aliases.
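For example, if the first 3 bins were needed, the same pattern extends by widening the layout row and adding one more array access; a sketch (third_account and third_account_total are just illustrative names for the extra columns):
#standardSQL
SELECT * FROM (SELECT NULL id, '' `desc`, NULL total, '' first_account, NULL first_account_total, '' second_account, NULL second_account_total, '' third_account, NULL third_account_total) WHERE FALSE
UNION ALL
SELECT id, `desc`, total, arr[OFFSET(0)].*, arr[SAFE_OFFSET(1)].*, arr[SAFE_OFFSET(2)].*
FROM (
SELECT id, `desc`, SUM(total) total,
ARRAY_AGG(STRUCT(account, total) ORDER BY total DESC LIMIT 3) arr
FROM `project.dataset.table`
GROUP BY id, `desc`
)
-- third_account / third_account_total are illustrative column names for the added third bin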

Differences between rows in Google BigQuery

I'm currently attempting to calculate differences between rows in google big query. I actually have a working query.
SELECT
id, record_time, level, lag,
(level - lag) as diff
FROM (
SELECT
id, record_time, level,
LAG(level) OVER (ORDER BY id, record_time) as lag
FROM (
SELECT
*
FROM
TABLE_QUERY(MY_TABLES))
ORDER BY
1, 2 ASC
)
GROUP BY 1, 2, 3, 4
ORDER BY 1, 2 ASC
But I'm working with big data, and sometimes I get a memory limit error that does not let me execute the query. So I would like to understand why I can't run an optimized query like the one below. I think it would allow working with more records without hitting the memory limit.
SELECT
id, record_time, level,
level - LAG(level, 1) OVER (ORDER BY id, record_time) as diff
FROM (
SELECT
*
FROM
TABLE_QUERY(MY_TABLES))
ORDER BY
1, 2 ASC
When the query is executed, this kind of expression, level - LAG(level, 1) OVER (ORDER BY id, record_time) as diff, returns the error
Missing function in Analytic Expression
in BigQuery.
I also tried wrapping the expression in parentheses, but that does not work either.
Thanks for helping me!
It works fine for me. Maybe you forgot to enable standard SQL? Here is an example:
WITH Input AS (
SELECT 1 AS id, TIMESTAMP '2017-10-17 00:00:00' AS record_time, 2 AS level UNION ALL
SELECT 2, TIMESTAMP '2017-10-16 00:00:00', 3 UNION ALL
SELECT 1, TIMESTAMP '2017-10-16 00:00:00', 4
)
SELECT
id, record_time, level, lag,
(level - lag) as diff
FROM (
SELECT
id, record_time, level,
LAG(level) OVER (ORDER BY id, record_time) as lag
FROM Input
)
GROUP BY 1, 2, 3, 4
ORDER BY 1, 2 ASC;
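For completeness, the single-pass form you were aiming for also runs fine in standard SQL against the same Input data, without the extra nesting and GROUP BY:
WITH Input AS (
SELECT 1 AS id, TIMESTAMP '2017-10-17 00:00:00' AS record_time, 2 AS level UNION ALL
SELECT 2, TIMESTAMP '2017-10-16 00:00:00', 3 UNION ALL
SELECT 1, TIMESTAMP '2017-10-16 00:00:00', 4
)
SELECT
id, record_time, level,
level - LAG(level) OVER (ORDER BY id, record_time) AS diff
FROM Input
ORDER BY 1, 2 ASC;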