BigQuery: Get top 3 records for each group

I'm new to BigQuery. I need the top 3 scores for each group:
| Name | Group | Score |
| A | 1 | 100 |
| B | 2 | 80 |
| C | 3 | 101 |
| D | 1 | 53 |
| X | 2 | 8 |
| Y | 3 | 61 |
| Z | 1 | 97 |
| W | 2 | 20 |

Consider the query below:
select * except(pos)
from (
select *, row_number() over(partition by `group` order by score desc) pos
from `project.dataset.table`
)
where pos <= 3
Another option (more BigQuery-ish):
select arr.*
from (
select array_agg(t order by score desc limit 3) arr
from `project.dataset.table` t
group by `group`
) a, a.arr
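To try the ROW_NUMBER() approach without a BigQuery project, here is a minimal sketch that runs the same pattern against an in-memory SQLite database from Python. The table name `scores`, the column name `grp` (standing in for the reserved word `group`), and the helper `top_n_per_group` are assumptions made for illustration. SQLite has no `select * except(...)`, so the columns are listed explicitly, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question: (name, group, score).
ROWS = [
    ("A", 1, 100), ("B", 2, 80), ("C", 3, 101), ("D", 1, 53),
    ("X", 2, 8), ("Y", 3, 61), ("Z", 1, 97), ("W", 2, 20),
]


def top_n_per_group(rows, n):
    """Return the n highest-scoring rows of each group."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE scores (name TEXT, grp INTEGER, score INTEGER)")
    con.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)
    query = """
        SELECT name, grp, score
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY grp ORDER BY score DESC) AS pos
            FROM scores
        )
        WHERE pos <= ?
        ORDER BY grp, score DESC
    """
    result = con.execute(query, (n,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in top_n_per_group(ROWS, 2):
        print(row)
```

With the sample data every group has at most three rows, so asking for the top 3 returns the whole table; using n = 2 makes the filtering visible.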

Related

SQL Select random rows partitioned by a column

I have a dataset that looks like this:
| Country | id |
-------------------
| a | 5 |
| a | 1 |
| a | 2 |
| b | 1 |
| b | 5 |
| b | 4 |
| b | 7 |
| c | 5 |
| c | 1 |
| c | 2 |
and I need a query that returns 2 random rows for each country in ('a', 'c'):
| Country | id |
------------------
| a | 2 | -- Two random rows from Country = 'a'
| a | 1 |
| c | 1 |
| c | 5 | --Two random rows from Country = 'c'
This should work:
select Country, id from
(select Country,
id,
row_number() over(partition by Country order by rand()) as rn
from table_name
) t
where Country in ('a', 'c') and rn <= 2
Replace rand() with random() if you're using Postgres or newid() in SQL Server.
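The random-sampling query can be checked locally with a small sketch like the one below, which uses Python's sqlite3 module. SQLite spells the function `random()`, as Postgres does. The table name `t` and the helper `sample_per_country` are hypothetical, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question: (country, id).
ROWS = [
    ("a", 5), ("a", 1), ("a", 2),
    ("b", 1), ("b", 5), ("b", 4), ("b", 7),
    ("c", 5), ("c", 1), ("c", 2),
]


def sample_per_country(rows, countries, k):
    """Return k randomly chosen rows for each requested country."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (country TEXT, id INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # One placeholder per country keeps the IN list parameterized.
    marks = ", ".join("?" for _ in countries)
    query = f"""
        SELECT country, id
        FROM (
            SELECT country, id,
                   ROW_NUMBER() OVER (PARTITION BY country ORDER BY random()) AS rn
            FROM t
        )
        WHERE country IN ({marks}) AND rn <= ?
        ORDER BY country
    """
    result = con.execute(query, (*countries, k)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(sample_per_country(ROWS, ["a", "c"], 2))
```

The ids returned differ from run to run, but there are always exactly k rows per requested country as long as each country has at least k rows.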

IF clause in ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)

Suppose I have a table like the following:
+----+------+---------+
| id | time | message |
+----+------+---------+
| 1 | 10 | x |
| 2 | 12 | y |
| 1 | 13 | z |
| 2 | 14 | x |
| 1 | 15 | y |
+----+------+---------+
I want to write a query that returns the most recent message per id. Here is my query:
WITH tmp AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY time DESC) as rn
FROM ##TempTable
)
SELECT *
FROM tmp
WHERE rn = 1
which returns:
+----+------+---------+----+
| id | time | message | rn |
+----+------+---------+----+
| 1 | 15 | y | 1 |
| 2 | 14 | x | 1 |
+----+------+---------+----+
I want to add a condition: for a given id, if one of its messages is "z", keep that row regardless of its time; if "z" is not among the messages, keep the most recent row for that id. So the desired output is:
+----+------+---------+----+
| id | time | message | rn |
+----+------+---------+----+
| 1 | 13 | z | ? |
| 2 | 14 | x | 1 |
+----+------+---------+----+
Any idea how I can modify the query?
Here is one way:
select * from (
select *, row_number() over (partition by id order by case when message = 'z' then 1 else 0 end desc, time desc ) rn
from data ) t
where rn = 1
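The CASE expression in the ORDER BY sorts rows with the preferred message ahead of all others, and time only breaks ties after that. A minimal sketch that runs this against the question's sample data with Python's sqlite3 module follows; the helper name `latest_or_preferred` is an assumption, and SQLite 3.25 or newer is needed for window functions.

```python
import sqlite3

# Sample rows from the question: (id, time, message).
ROWS = [(1, 10, "x"), (2, 12, "y"), (1, 13, "z"), (2, 14, "x"), (1, 15, "y")]


def latest_or_preferred(rows, preferred):
    """Per id, keep the row whose message equals `preferred` if there is one,
    otherwise keep the row with the greatest time."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (id INTEGER, time INTEGER, message TEXT)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    query = """
        SELECT id, time, message
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY id
                       ORDER BY CASE WHEN message = ? THEN 1 ELSE 0 END DESC,
                                time DESC
                   ) AS rn
            FROM data
        )
        WHERE rn = 1
        ORDER BY id
    """
    result = con.execute(query, (preferred,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(latest_or_preferred(ROWS, "z"))  # [(1, 13, 'z'), (2, 14, 'x')]
```

If an id has several "z" rows, the most recent of them wins, because time DESC is the secondary sort key.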

How do I select each data set from a Row_Number Over Partition by table based on the Row_Number Over Partition by column?

Please see the table below:
+-----------+-------------+-------------------+------------+----------+
| packageid | packagename | package max units | references | row_Numb |
+-----------+-------------+-------------------+------------+----------+
| 44 | Basic | 10 | 103 | 1 |
| 45 | Basic | 10 | 103 | 2 |
| 42 | Cola | 10 | 102 | 1 |
| 43 | Cola | 10 | 102 | 2 |
| 46 | Cola | 10 | 102 | 3 |
| 2 | Home | 11 | 101 | 1 |
| 11 | Home | 11 | 101 | 2 |
| 21 | Home | 11 | 101 | 3 |
| 1 | Spicy | 11 | 104 | 1 |
| 3 | Spicy | 11 | 104 | 2 |
| 41 | Spicy | 11 | 104 | 3 |
+-----------+-------------+-------------------+------------+----------+
I want to select each data set in each group based on the row_Numb column.
Every attempt is welcomed.
It sounds like you already have the ROW_NUMBER() column, but I believe this is what you are asking for. For the first record of each PACKAGENAME use:
SELECT s.* FROM (
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY t.packagename ORDER BY t.packageid) as rnk
FROM YourTable t) s
WHERE s.rnk = 1
For all of them use only the inner query.
Here is the CTE version, if you want to fetch a single record from each group:
;with cte_1
as(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY [packagename],[package max units], [references] ORDER BY [packageid]) as row_Numb
FROM YourTable )
SELECT [packageid],[packagename],[package max units],[references]
FROM cte_1
WHERE row_Numb = 1
You can use TOP 1 WITH TIES with ordering by ROW_NUMBER():
SELECT TOP 1 WITH TIES *
FROM YourTable
ORDER BY ROW_NUMBER() OVER (PARTITION BY packagename ORDER BY packageid)
Output:
packageid packagename package max units references
44 Basic 10 103
42 Cola 10 102
2 Home 11 101
1 Spicy 11 104
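The answers above all pick row number 1. Filtering on any other value of the row number works the same way, which may be closer to what the question asks. Below is a hedged sketch using Python's sqlite3 module; the column names `max_units` and `refs` are renamed stand-ins (`references` is a keyword in SQLite), and the helper `rows_at_position` is hypothetical. SQLite has no TOP ... WITH TIES, so only the subquery form is shown.

```python
import sqlite3

# Sample rows from the question: (packageid, packagename, max_units, refs).
ROWS = [
    (44, "Basic", 10, 103), (45, "Basic", 10, 103),
    (42, "Cola", 10, 102), (43, "Cola", 10, 102), (46, "Cola", 10, 102),
    (2, "Home", 11, 101), (11, "Home", 11, 101), (21, "Home", 11, 101),
    (1, "Spicy", 11, 104), (3, "Spicy", 11, 104), (41, "Spicy", 11, 104),
]


def rows_at_position(rows, k):
    """Return the k-th row (ordered by packageid) of every packagename."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE packages "
        "(packageid INTEGER, packagename TEXT, max_units INTEGER, refs INTEGER)"
    )
    con.executemany("INSERT INTO packages VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT packageid, packagename
        FROM (
            SELECT p.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY packagename ORDER BY packageid
                   ) AS rnk
            FROM packages p
        )
        WHERE rnk = ?
        ORDER BY packagename
    """
    result = con.execute(query, (k,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(rows_at_position(ROWS, 1))
    print(rows_at_position(ROWS, 3))
```

Groups with fewer than k rows simply drop out of the result, so asking for position 3 returns nothing for "Basic".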

Sum all sub group last value by group

Consider the following table:
ID | ITEM | GROUP_ID | VAL | COST
---+------+----------+-----------+-------
1 | A | 1 | 1 | 12
2 | B | 1 | 2 | 12
3 | C | 1 | 3 | 12
4 | D | 1 | 4 | 13
5 | D | 1 | 5 | 12
6 | E | 2 | 1 | 17
7 | E | 2 | 2 | 10
8 | E | 2 | 3 | 11
9 | E | 2 | 4 | 12
10 | F | 2 | 5 | 15
11 | F | 2 | 6 | 13
12 | F | 2 | 7 | 11
13 | F | 2 | 8 | 12
How can I get the following result?
GROUP_ID | VAL | COST
----------+-----------+-------
1 | 15 | 48
2 | 36 | 24
VAL is the sum of VAL per GROUP_ID.
COST is the sum, per GROUP_ID, of the last COST value of each item (the row with the highest ID for that item).
Use the analytic function ROW_NUMBER(), which is available on PostgreSQL, Oracle, and SQL Server:
WITH last_item as (
SELECT group_id, sum(cost) as sum_cost
FROM (
SELECT t.*,
ROW_NUMBER() over (partition by item order by id desc) as rn
FROM Table1 t
) as t
WHERE rn = 1
GROUP BY t.group_id
),
val_sum as (
SELECT t.group_id, SUM(val) as sum_val
FROM Table1 t
GROUP BY t.group_id
)
SELECT v.group_id, v.sum_val, l.sum_cost
FROM val_sum v
INNER JOIN last_item l
ON v.group_id = l.group_id
OUTPUT
| group_id | sum_val | sum_cost |
|----------|---------|----------|
| 1 | 15 | 48 |
| 2 | 36 | 24 |
Try this
WITH LastRow (id)
AS (
SELECT MAX(id)
FROM TheTable
GROUP BY item, group_id
)
SELECT group_Id, SUM(val), SUM(CASE WHEN B.id IS NULL THEN 0 ELSE cost END)
FROM TheTable A
LEFT OUTER JOIN LastRow B ON A.id = B.id
GROUP BY group_id
Thanks to @Juan Carlos Oropeza for creating the SQL Fiddle test data.
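Since the fiddle links are not reproduced here, the following sketch replays the ROW_NUMBER() approach on the question's data with Python's sqlite3 module. The table name `Table1` comes from that answer; the helper `group_totals` is an assumption, and SQLite 3.25 or newer is required for window functions.

```python
import sqlite3

# Sample rows from the question: (id, item, group_id, val, cost).
ROWS = [
    (1, "A", 1, 1, 12), (2, "B", 1, 2, 12), (3, "C", 1, 3, 12),
    (4, "D", 1, 4, 13), (5, "D", 1, 5, 12),
    (6, "E", 2, 1, 17), (7, "E", 2, 2, 10), (8, "E", 2, 3, 11),
    (9, "E", 2, 4, 12), (10, "F", 2, 5, 15), (11, "F", 2, 6, 13),
    (12, "F", 2, 7, 11), (13, "F", 2, 8, 12),
]


def group_totals(rows):
    """Per group: sum of val, and sum of the last cost of each item."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Table1 "
        "(id INTEGER, item TEXT, group_id INTEGER, val INTEGER, cost INTEGER)"
    )
    con.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        WITH last_item AS (
            -- rn = 1 marks the highest id of each item, i.e. its last row.
            SELECT group_id, SUM(cost) AS sum_cost
            FROM (
                SELECT t.*,
                       ROW_NUMBER() OVER (PARTITION BY item ORDER BY id DESC) AS rn
                FROM Table1 t
            )
            WHERE rn = 1
            GROUP BY group_id
        ),
        val_sum AS (
            SELECT group_id, SUM(val) AS sum_val
            FROM Table1
            GROUP BY group_id
        )
        SELECT v.group_id, v.sum_val, l.sum_cost
        FROM val_sum v
        INNER JOIN last_item l ON v.group_id = l.group_id
        ORDER BY v.group_id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(group_totals(ROWS))  # [(1, 15, 48), (2, 36, 24)]
```

Note that the window partitions by item only, which matches the sample data because no item appears in more than one group; partitioning by item and group_id would be the safer choice otherwise.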

SQL Select top frequent records

I have the following table:
Table
+----+------+-------+
| ID | Name | Group |
+----+------+-------+
| 0 | a | 1 |
| 1 | a | 1 |
| 2 | a | 2 |
| 3 | a | 1 |
| 4 | b | 1 |
| 5 | b | 2 |
| 6 | b | 1 |
| 7 | c | 2 |
| 8 | c | 2 |
| 9 | c | 1 |
+----+------+-------+
I would like to select the top 20 distinct names from a specific group, ordered by how frequently each name occurs in that group. For this example, group 1 would return a, b, c (a: 3 occurrences, b: 2 occurrences, c: 1 occurrence).
Thank you.
SELECT TOP(20) [Name], COUNT(*) AS Occurrences
FROM [Table]
WHERE [Group] = 1
GROUP BY [Name]
ORDER BY COUNT(*) DESC
To count every name in every group at once:
SELECT TOP(20)
[Name], [Group], COUNT(*) AS occurrences
FROM yourtable
GROUP BY [Name], [Group]
ORDER BY COUNT(*) DESC
The same query, written one column per line:
SELECT
TOP 20
[Name],
[Group],
COUNT(1) AS [Count]
FROM
MyTable
GROUP BY
[Name],
[Group]
ORDER BY
[Count] DESC
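As a quick check of the filtered GROUP BY approach, here is a minimal sketch using Python's sqlite3 module. SQLite uses LIMIT where SQL Server uses TOP, and the table name `names`, the column name `grp`, and the helper `top_names` are assumptions made for illustration. A secondary sort on name is added so that ties come back in a stable order.

```python
import sqlite3

# Sample rows from the question: (id, name, group).
ROWS = [
    (0, "a", 1), (1, "a", 1), (2, "a", 2), (3, "a", 1), (4, "b", 1),
    (5, "b", 2), (6, "b", 1), (7, "c", 2), (8, "c", 2), (9, "c", 1),
]


def top_names(rows, group, limit=20):
    """Return (name, occurrences) for one group, most frequent first."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE names (id INTEGER, name TEXT, grp INTEGER)")
    con.executemany("INSERT INTO names VALUES (?, ?, ?)", rows)
    query = """
        SELECT name, COUNT(*) AS occurrences
        FROM names
        WHERE grp = ?
        GROUP BY name
        ORDER BY occurrences DESC, name
        LIMIT ?
    """
    result = con.execute(query, (group, limit)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(top_names(ROWS, 1))  # [('a', 3), ('b', 2), ('c', 1)]
```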