Nth result in BigQuery Group By - google-bigquery

I have a derived table like:
id, desc, total, account
1, one, 10, a
1, one, 9, b
1, one, 3, c
2, two, 27, c
I can do a simple
select id, `desc`, sum(total) as total from mytable group by id, `desc`
but I want to add the equivalent first(account), first(total), second(account), second(total) to the output so it'd be:
id, desc, total, first_account, first_account_total, second_account, second_account_total
1, one, 22, a, 10, b, 9
2, two, 27, c, 27, null, 0
Any pointers?
Thanks in advance!

The query below is for BigQuery Standard SQL:
#standardSQL
SELECT id, `desc`, total,
arr[OFFSET(0)].account AS first_account,
arr[OFFSET(0)].total AS first_account_total,
arr[SAFE_OFFSET(1)].account AS second_account,
arr[SAFE_OFFSET(1)].total AS second_account_total
FROM (
SELECT id, `desc`, SUM(total) total,
ARRAY_AGG(STRUCT(account, total) ORDER BY total DESC LIMIT 2) arr
FROM `project.dataset.table`
GROUP BY id, `desc`
)
When more than the first two entries are needed, I would use the pattern below, which avoids repeating lines such as arr[SAFE_OFFSET(1)].total AS second_account_total for every position:
#standardSQL
SELECT * FROM (SELECT NULL id, '' `desc`, NULL total, '' first_account, NULL first_account_total, '' second_account, NULL second_account_total) WHERE FALSE
UNION ALL
SELECT id, `desc`, total, arr[OFFSET(0)].*, arr[SAFE_OFFSET(1)].*
FROM (
SELECT id, `desc`, SUM(total) total,
ARRAY_AGG(STRUCT(account, total) ORDER BY total DESC LIMIT 2) arr
FROM `project.dataset.table`
GROUP BY id, `desc`
)
In the query above, the first SELECT only sets the layout of the output: it returns no rows at all because of WHERE FALSE. Since UNION ALL takes its column names from the first SELECT, I don't need to pick apart the struct's fields and alias each one explicitly.
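For readers who want to check the logic outside BigQuery, here is a minimal sketch in plain Python of what the ARRAY_AGG(... ORDER BY total DESC LIMIT 2) approach computes. The function name top_accounts and the tuple layout are assumptions made for illustration; the rows are the sample data from the question.

```python
from collections import defaultdict

# Sample rows from the question: (id, desc, total, account)
rows = [
    (1, "one", 10, "a"),
    (1, "one", 9, "b"),
    (1, "one", 3, "c"),
    (2, "two", 27, "c"),
]


def top_accounts(rows, n=2):
    """Hypothetical helper: per (id, desc) group, return the group total
    followed by the n largest (account, total) pairs, padded with None."""
    groups = defaultdict(list)
    for id_, desc, total, account in rows:
        groups[(id_, desc)].append((account, total))

    result = []
    for (id_, desc), members in groups.items():
        # Same idea as ORDER BY total DESC LIMIT n inside ARRAY_AGG
        members.sort(key=lambda m: m[1], reverse=True)
        top = members[:n]
        # Missing positions become None, like SAFE_OFFSET returning NULL
        top += [(None, None)] * (n - len(top))
        flat = [value for pair in top for value in pair]
        group_total = sum(total for _, total in members)
        result.append((id_, desc, group_total, *flat))
    return result


for line in top_accounts(rows):
    print(line)
```

Note that a group with fewer than n accounts gets None in the missing positions, which mirrors the NULLs that SAFE_OFFSET produces, not the 0 shown in the question's desired output.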

Related

Difference in the output of query, using rank() and CTE

My first query looks like:
select trans.* from
( select
acc_num,
acc_type,
trans_amount,
load_date,
rank() over(partition by acc_num order by load_date) as rk
from monetary
where rat_code = 123
) trans
where trans.rk =1;
My second query looks like:
with a as (
select *,
row_number() over(partition by acc_num order by load_date) as rn
from monetary
where rat_code = 123 )
select
acc_num,
acc_type,
trans_amount,
load_date
from a
where rn =1;
Can anyone please help? I am getting a different number of records in the two cases, even though the queries look the same.
That's because rank() and row_number() behave differently.
The example below shows the difference (both columns computed with partition by acc_num order by load_date):
acc_num, load_date, rank_col, rownum_col
100, 2-jun-2022, 1, 1
100, 2-jun-2022, 1, 2
100, 1-jul-2022, 3, 3
54, 2-jun-2022, 1, 1
54, 1-jul-2022, 2, 2
In the example above, you can see that row_number() gives every row in a partition a unique number, whereas rank() gives rows that tie on the ordering column the same rank and then skips ahead. So rank = 1 gives you three rows, but row number = 1 gives only two.
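To make the tie behaviour concrete, here is a small sketch in plain Python that computes both numberings by hand. The function name rank_and_row_number and the sample rows are made up for illustration; the only assumption is that two rows of account 100 share the same load_date.

```python
from itertools import groupby


def rank_and_row_number(rows):
    """Hypothetical helper. rows are (acc_num, load_date) pairs.
    Returns (acc_num, load_date, rank, row_number) tuples, numbering rows
    per acc_num in load_date order, like the two window functions."""
    out = []
    ordered = sorted(rows)  # by acc_num, then load_date
    for acc_num, partition in groupby(ordered, key=lambda r: r[0]):
        rank = 0
        previous_date = None
        for row_number, (_, load_date) in enumerate(partition, start=1):
            # rank only moves when the ordering value changes, and then
            # jumps to the current row number (so gaps appear after ties)
            if load_date != previous_date:
                rank = row_number
                previous_date = load_date
            out.append((acc_num, load_date, rank, row_number))
    return out


# Illustrative data: account 100 has two rows with the same load_date
sample = [
    (100, "2022-06-02"),
    (100, "2022-06-02"),
    (100, "2022-07-01"),
    (54, "2022-06-02"),
    (54, "2022-07-01"),
]
result = rank_and_row_number(sample)
rank_one = [r for r in result if r[2] == 1]
row_number_one = [r for r in result if r[3] == 1]
print(len(rank_one), len(row_number_one))
```

Filtering on rank = 1 keeps both tied rows for account 100, while filtering on row number = 1 keeps exactly one row per account, which is why the two queries return different counts.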

BigQuery SQL: Sum of first N related items

I would like to know the sum of a value over the first n items in a related table. For example, I want to get the sum of a company's first 6 invoices (the invoices can be sorted by ID ascending).
Current SQL:
SELECT invoices.company_id, SUM(invoices.amount)
FROM invoices
JOIN companies on invoices.company_id = companies.id
GROUP BY invoices.company_id
This seems simple but I can't wrap my head around it.
Consider also the approach below:
select company_id, (
select sum(amount)
from t.amounts amount
) as top_six_invoices_amount
from (
select invoices.company_id,
array_agg(invoices.amount order by invoices.invoice_id limit 6) amounts
from your_table invoices
group by invoices.company_id
) t
You can number the rows within each partition by invoice id and then filter on that row number, something like this:
with array_table as (
select 'a' field, * from unnest([3, 2, 1 ,4, 6, 3]) id
union all
select 'b' field, * from unnest([1, 2, 1, 7]) id
)
select field, sum(id) from (
select field, id, row_number() over (partition by a.field order by id desc) rownum
from array_table a
)
where rownum < 3
group by field
More examples of analytic functions here:
https://medium.com/@aliz_ai/analytic-functions-in-google-bigquery-part-1-basics-745d97958fe2
https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
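As a cross-check of either query, the same "first N per group, then sum" logic can be sketched in plain Python. The function name sum_first_n, the tuple layout, and the sample invoices are assumptions made for illustration, not data from the question.

```python
from collections import defaultdict


def sum_first_n(invoices, n=6):
    """Hypothetical helper. invoices are (invoice_id, company_id, amount).
    Returns {company_id: sum of amounts of its first n invoices by id}."""
    by_company = defaultdict(list)
    for invoice_id, company_id, amount in invoices:
        by_company[company_id].append((invoice_id, amount))

    totals = {}
    for company_id, items in by_company.items():
        # Order by invoice id ascending and keep only the first n
        first_n = sorted(items, key=lambda item: item[0])[:n]
        totals[company_id] = sum(amount for _, amount in first_n)
    return totals


# Made-up invoices, deliberately out of order
invoices = [
    (4, "acme", 30),
    (1, "acme", 10),
    (3, "globex", 5),
    (2, "acme", 20),
]
print(sum_first_n(invoices, n=2))
```

With n=2 only invoices 1 and 2 count for "acme", so invoice 4 is ignored even though it appears first in the input.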

SQL change the specific row

In my table below, I want to change the position of a specific row.
For example,
ID Name Count
1 X 50
2 Y 30
3 other 25
4 Z 20
It is ordered by Count descending, and I would like to see X, Y, Z in order, followed by 'other'. Also, 'other' should still be counted in the total. In other words, the total count should be 125.
You can use union all and add the 'other' row at the end.
Something like this:
select id, name, count from table where name <> 'other'
union all
select 5 as id, 'other' as name, 125 as count
order by 1
or, if you want to sum it:
select id, name, count from table where name <> 'other'
union all
select 5 as id, 'other' as name, sum(count) as count from table
order by 1
You can put some logic in the order by clause:
select id, name, count
from table
order by case when id <> 3 then 1 else 2 end, id
This way, the first ordering criterion is "rows X, Y, Z first, then the other one". Within each group you then order the rows the way you want; in your case either id or name will work.
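The two-level ordering can be sketched in plain Python with a tuple sort key. This is only an illustration of the idea: it uses the rows from the question and tests the name instead of id <> 3, which is an assumption that the 'other' row is identified by its name.

```python
# Rows from the question: (id, name, count)
rows = [(1, "X", 50), (2, "Y", 30), (3, "other", 25), (4, "Z", 20)]

# First key: False (0) for normal rows, True (1) for 'other', so 'other' sorts last.
# Second key: id, which orders the remaining rows within each group.
ordered = sorted(rows, key=lambda r: (r[1] == "other", r[0]))

# The total still includes the 'other' row
total = sum(count for _, _, count in rows)

for row in ordered:
    print(row)
print("total:", total)
```

The first tuple element plays the role of the CASE expression, and the second plays the role of the trailing id in the ORDER BY.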
Try this:
SELECT ID,
Name,
CASE
WHEN Name = 'other' THEN (SELECT SUM(COUNT) FROM YOUR_TABLE)
ELSE SUM(COUNT)
END AS COUNT
FROM YOUR_TABLE
GROUP BY ID, Name
ORDER BY CASE WHEN Name = 'other' THEN 1 ELSE 0 END, Name
I think union all may be the simplest approach, but like this:
select id, name, count
from ((select id, name, count, 1 as ord
from t
where name in ('X', 'Y', 'Z')
) union all
(select 4, 'other', sum(count), 2 as ord
from t
)
) t
order by ord, name;

Lead & Analytical Functions in BigQuery

Assume my table is this
I am trying to modify my table with this information
I have added two columns. WhenWasLastBasicSubjectDone shows the semester in which the student most recently completed a Basic course (sorted by Semester). TotalBasicSubjectsDoneTillNow shows how many times the student has completed a Basic course (subject) so far (sorted by Semester).
I think this is easy to solve with Joins as well as with UDFs but I want to use the power of existing analytical functions in BigQuery and solve it without joins.
You can use window functions for this -- assuming you have a column that specifies ordering. Let me assume that column is semester:
select t.*,
max( case when subject = 'Basic' then semester end ) over (partition by student order by semester) as lastbasic,
sum( case when subject = 'Basic' then 1 else 0 end ) over (partition by student order by semester) as numbasictillnow
from t
The query below is for BigQuery Standard SQL:
#standardSQL
SELECT *,
LAST_VALUE(IF(subject='Basic',semester,NULL) IGNORE NULLS) OVER(win) AS WhenWasLastBasicSubjectDone ,
COUNTIF(subject='Basic') OVER(win) AS TotalBasicSubjectsDoneTillNow
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY student ORDER BY semester)
You can test and play with the above using dummy data from your question, as below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 Student, 'Sub1' Subject, 'Sem1' Semester UNION ALL
SELECT 1, 'Sub2', 'Sem2' UNION ALL
SELECT 1, 'Basic', 'Sem3' UNION ALL
SELECT 1, 'Basic', 'Sem4' UNION ALL
SELECT 1, 'Sub3', 'Sem5' UNION ALL
SELECT 1, 'Sub2', 'Sem6' UNION ALL
SELECT 1, 'Sub3', 'Sem7' UNION ALL
SELECT 1, 'Sub4', 'Sem8'
)
SELECT *,
LAST_VALUE(IF(subject='Basic',semester,NULL) IGNORE NULLS) OVER(win) AS WhenWasLastBasicSubjectDone ,
COUNTIF(subject='Basic') OVER(win) AS TotalBasicSubjectsDoneTillNow
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY student ORDER BY semester)
-- ORDER BY Semester
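To see what the two windowed columns should contain, here is a minimal sketch in plain Python that walks each student's rows in semester order and keeps running state. The function name add_basic_columns is hypothetical, the rows are the dummy data from the query above, and the sketch assumes semester labels sort correctly as strings (true for Sem1 to Sem8, not for Sem10).

```python
def add_basic_columns(rows):
    """Hypothetical helper. rows are (student, subject, semester).
    Appends the latest Basic semester so far and the running Basic count,
    per student, in semester order."""
    last_basic = {}   # student -> semester of the most recent Basic course
    basic_count = {}  # student -> number of Basic courses seen so far
    out = []
    for student, subject, semester in sorted(rows, key=lambda r: (r[0], r[2])):
        if subject == "Basic":
            last_basic[student] = semester
            basic_count[student] = basic_count.get(student, 0) + 1
        out.append((
            student, subject, semester,
            last_basic.get(student),      # None until the first Basic course
            basic_count.get(student, 0),
        ))
    return out


# Dummy data from the query above
rows = [
    (1, "Sub1", "Sem1"), (1, "Sub2", "Sem2"),
    (1, "Basic", "Sem3"), (1, "Basic", "Sem4"),
    (1, "Sub3", "Sem5"), (1, "Sub2", "Sem6"),
    (1, "Sub3", "Sem7"), (1, "Sub4", "Sem8"),
]
table = add_basic_columns(rows)
for line in table:
    print(line)
```

The None values before the first Basic course correspond to the NULLs that LAST_VALUE(... IGNORE NULLS) returns when no earlier Basic row exists.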

Sort by id desc on multiple columns distinct Postgres

SELECT impressions.*
FROM impressions
WHERE impressions.user_id = 2
AND impressions.action_name = 'show'
AND (impressions.message IS NOT NULL)
GROUP BY impressionable_id, impressionable_type
I'd like to select the latest impression for each unique combination of impressionable_id and impressionable_type, order the result by id descending, and get the last 10.
To explain this further
id, impressionable_type, impressionable_id, action_name
50012, assignment, 2, show
50011, assignment, 1, show
50010, person, 1, show
50009, assignment, 1, show
50008, person, 5, show
50007, person, 4, show
50006, person, 3, show
50005, person, 1, show
50004, person, 1, show
50003, person, 2, show
50002, person, 2, show
50001, person, 1, show
50000, person, 1, show
Ideally I want this
50012, assignment, 2, show
50011, assignment, 1, show
50010, person, 1, show
50008, person, 5, show
50007, person, 4, show
50006, person, 3, show
50003, person, 2, show
I have tried DISTINCT and GROUP BY, but my SQL knowledge is fair at best.
I get
PG::GroupingError: ERROR: column "impressions.id" must appear in the GROUP BY clause or be used in an aggregate function
Can someone shed some light, please?
Maybe this will suit your needs:
SELECT t2.*
FROM (
SELECT DISTINCT impressionable_id, impressionable_type
FROM impressions
WHERE impressions.action_name = 'show'
) t1, LATERAL (
SELECT *
FROM impressions
WHERE (t1.impressionable_id,t1.impressionable_type) = (impressionable_id,impressionable_type)
ORDER BY id DESC
LIMIT 1
) t2
ORDER BY id DESC
LIMIT 10
This will find all unique combinations of impressionable_id and impressionable_type and for each of them will find the row with the largest id in a LATERAL subquery.
select *
from (
select *,
row_number() over (
partition by impressionable_id, impressionable_type
order by id desc
) as rn
from impressions
where
user_id = 2
and action_name = 'show'
and message is not null
) s
where rn = 1
order by id desc
limit 10
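To check the expected result against the sample rows from the question, the "latest row per group" logic can be sketched in plain Python. The function name latest_per_group and the tuple layout are assumptions made for illustration; the user_id and message filters are left out because the sample data does not include those columns.

```python
def latest_per_group(rows, limit=10):
    """Hypothetical helper. rows are
    (id, impressionable_type, impressionable_id, action_name).
    Keeps the row with the largest id per (impressionable_id,
    impressionable_type), then returns the newest `limit` rows."""
    latest = {}
    for row in rows:
        id_, type_, impressionable_id, _ = row
        key = (impressionable_id, type_)
        # Same effect as keeping row_number() = 1 when ordering by id desc
        if key not in latest or id_ > latest[key][0]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: r[0], reverse=True)[:limit]


# Sample rows from the question
rows = [
    (50012, "assignment", 2, "show"), (50011, "assignment", 1, "show"),
    (50010, "person", 1, "show"), (50009, "assignment", 1, "show"),
    (50008, "person", 5, "show"), (50007, "person", 4, "show"),
    (50006, "person", 3, "show"), (50005, "person", 1, "show"),
    (50004, "person", 1, "show"), (50003, "person", 2, "show"),
    (50002, "person", 2, "show"), (50001, "person", 1, "show"),
    (50000, "person", 1, "show"),
]
picked = latest_per_group(rows)
print([r[0] for r in picked])
```

The ids it keeps match the seven rows listed as the desired output in the question.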