Count distinct over partition by - sql

I am trying to get a distinct count of names partitioned by role. In the example below, I have a table with each person's name and role.
I would like a role count column that gives the total number of distinct people in that role. For example, the role manager comes up four times but there are only 3 distinct people for that role - Sam comes up again on a different date.
If I remove the date column, it works fine using:
select
a.date,
a.Name,
a.Role,
count(a.Role) over (partition by a.Role) as Role_Count
from table a
group by a.date, a.name, a.role
Including the date column makes it count every row for the role rather than the distinct names (which I know I haven't specified in the partition), giving 4 managers and 3 analysts.
How do I fix this?
Desired output:
Date Name Role Role_Count
01/01 Sam Manager 3
02/01 Sam Manager 3
01/01 John Manager 3
01/01 Dan Manager 3
01/01 Bob Analyst 2
02/01 Bob Analyst 2
01/01 Mike Analyst 2
Current output:
Date Name Role Role_Count
01/01 Sam Manager 4
02/01 Sam Manager 4
01/01 John Manager 4
01/01 Dan Manager 4
01/01 Bob Analyst 3
02/01 Bob Analyst 3
01/01 Mike Analyst 3

Unfortunately, SQL Server (like some other databases) doesn't support COUNT(DISTINCT) as a window function. Fortunately, there is a simple trick to work around this: the sum of two DENSE_RANK()s, one ascending and one descending, minus one:
select a.Name, a.Role,
(dense_rank() over (partition by a.Role order by a.Name asc) +
dense_rank() over (partition by a.Role order by a.Name desc) -
1
) as distinct_names_in_role
from table a
group by a.name, a.role
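To see why the two ranks add up to the distinct count, here is a minimal sketch that runs the same idea against the question's sample rows. It assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module as a stand-in for SQL Server, and a hypothetical table name `staff`. For any row, the ascending rank counts the distinct names up to and including it, the descending rank counts the distinct names from it onwards, and subtracting one removes the double count of the row's own name.

```python
import sqlite3

# Sample rows from the question; "staff" is a hypothetical table name.
rows = [
    ("01/01", "Sam", "Manager"),
    ("02/01", "Sam", "Manager"),
    ("01/01", "John", "Manager"),
    ("01/01", "Dan", "Manager"),
    ("01/01", "Bob", "Analyst"),
    ("02/01", "Bob", "Analyst"),
    ("01/01", "Mike", "Analyst"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (date TEXT, name TEXT, role TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", rows)

# Ascending rank + descending rank - 1 = number of distinct names in the role.
query = """
SELECT date, name, role,
       DENSE_RANK() OVER (PARTITION BY role ORDER BY name ASC)
     + DENSE_RANK() OVER (PARTITION BY role ORDER BY name DESC)
     - 1 AS role_count
FROM staff
"""
result = conn.execute(query).fetchall()

# Collapse to one count per role; every row of a role carries the same value.
role_counts = {role: count for _, _, role, count in result}
print(role_counts)
```

Because the date column is kept and no GROUP BY is needed, all seven input rows come back, each with its role's distinct-name count.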

Unfortunately, COUNT(DISTINCT) is not available as a window aggregate. But we can use a combination of DENSE_RANK and MAX to simulate it:
select
    a.date,
    a.Name,
    a.Role,
    MAX(rnk) OVER (PARTITION BY Role) as Role_Count
from (
    SELECT *,
        -- partition by Role only, so the rank (and the count) spans all dates
        DENSE_RANK() OVER (PARTITION BY Role ORDER BY Name) AS rnk
    FROM table
) a
If Name may have nulls then we need to take that into account:
select
    a.date,
    a.Name,
    a.Role,
    MAX(CASE WHEN Name IS NOT NULL THEN rnk END) OVER (PARTITION BY Role) as Role_Count
from (
    SELECT *,
        -- rank NULL names in their own partition so they don't shift the other ranks
        DENSE_RANK() OVER (PARTITION BY Role, CASE WHEN Name IS NULL THEN 0 ELSE 1 END ORDER BY Name) AS rnk
    FROM table
) a
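A small sketch of the NULL-aware variant, again assuming SQLite via Python's sqlite3 module and a hypothetical `staff` table. It partitions by role alone, since the question's desired output counts names across all dates, and adds one extra row with a NULL name (not in the original data) to show that it does not inflate the count.

```python
import sqlite3

# Question's sample rows plus one hypothetical row with a NULL name.
rows = [
    ("01/01", "Sam", "Manager"),
    ("02/01", "Sam", "Manager"),
    ("01/01", "John", "Manager"),
    ("01/01", "Dan", "Manager"),
    ("03/01", None, "Manager"),
    ("01/01", "Bob", "Analyst"),
    ("02/01", "Bob", "Analyst"),
    ("01/01", "Mike", "Analyst"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (date TEXT, name TEXT, role TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", rows)

# NULL names are ranked in their own partition and then ignored by the MAX,
# so only non-NULL names contribute to the distinct count.
query = """
SELECT date, name, role,
       MAX(CASE WHEN name IS NOT NULL THEN rnk END)
           OVER (PARTITION BY role) AS role_count
FROM (
    SELECT *,
           DENSE_RANK() OVER (
               PARTITION BY role, CASE WHEN name IS NULL THEN 0 ELSE 1 END
               ORDER BY name
           ) AS rnk
    FROM staff
) AS ranked
"""
result = conn.execute(query).fetchall()
role_counts = {role: count for _, _, role, count in result}
print(role_counts)
```

The NULL row still appears in the output and carries the Manager count of 3, which matches how COUNT(DISTINCT) treats NULLs.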

Related

BigQuery row_number to remove duplicates

I want to keep only the row with the latest timestamp for each ID. Is there a more efficient way to solve this problem?
The query I tried:
SELECT * except(row_number)
FROM (
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY ID)
row_number
FROM employees
)
WHERE row_number = 1
employees table:
ID NAME DEPARTMENT UPDATED_AT
1 James IT 2019-05-21 12:13:14
1 James IT 2019-05-21 12:14:14
1 James IT 2019-05-21 12:18:14
2 Pam HR 2019-05-26 13:18:14
2 Pam HR 2019-05-26 14:18:14
3 David IT 2019-06-22 14:18:14
3 David IT 2019-06-23 12:18:14
result:
ID NAME DEPARTMENT UPDATED_AT
1 James IT 2019-05-21 12:18:14
2 Pam HR 2019-05-26 14:18:14
3 David IT 2019-06-23 12:18:14
You are just missing the ORDER BY clause in your subquery statement.
WITH
DATA AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY UPDATED_AT DESC) AS _row,
*
FROM
employees )
SELECT
* EXCEPT(_row)
FROM
DATA
WHERE
_row = 1
Alternatively, BigQuery's QUALIFY clause lets you filter on the window function directly:
SELECT *
FROM employees
WHERE TRUE
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY UPDATED_AT DESC) = 1
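For anyone who wants to check the effect of the ORDER BY locally, here is a rough sketch using the question's employees rows. It assumes SQLite via Python's sqlite3 module as a stand-in for BigQuery; SQLite has neither QUALIFY nor `SELECT * EXCEPT`, so the sketch uses the subquery form with an explicit column list and a hypothetical alias `rn`.

```python
import sqlite3

# Rows copied from the employees table in the question.
rows = [
    (1, "James", "IT", "2019-05-21 12:13:14"),
    (1, "James", "IT", "2019-05-21 12:14:14"),
    (1, "James", "IT", "2019-05-21 12:18:14"),
    (2, "Pam", "HR", "2019-05-26 13:18:14"),
    (2, "Pam", "HR", "2019-05-26 14:18:14"),
    (3, "David", "IT", "2019-06-22 14:18:14"),
    (3, "David", "IT", "2019-06-23 12:18:14"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, updated_at TEXT)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", rows)

# ORDER BY updated_at DESC makes row number 1 the newest row for each id.
# The timestamps are ISO-formatted text, so text ordering matches time ordering.
query = """
SELECT id, name, department, updated_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
    FROM employees
) AS ranked
WHERE rn = 1
ORDER BY id
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Without the ORDER BY inside the window, the row numbered 1 for each ID is arbitrary, which is why the original query could return any of the duplicates.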

How to select only the most recent

Table A has ID, date, and name columns. Each time a record is changed, the first 11 digits of the ID remain the same but the final digit increases by 1. For example:
123456789110 01-01-2020 John smith
119876543210 01-01-2020 Peter Griffin
119876543211 05-01-2020 Peter Griffin
How could I write a statement that shows the ID associated with John smith as well as the most recent ID of Peter Griffin? Thanks
Yet another option is using WITH TIES (SQL Server syntax):
Select top 1 with ties *
From YourTable
Order by row_number() over (partition by left(id,11) order by date desc)
Why not just use max()?
select name, max(id)
from t
group by name;

SQL find and group consecutive number in rows without duplicate

So I have a table like this:
Taxi Client Time
Tom A 1
Tom A 2
Tom B 3
Tom A 4
Tom A 5
Tom A 6
Tom B 7
Tom B 8
Bob A 1
Bob A 2
Bob A 3
and the expected result will be like this:
Tom 3
Bob 1
I used a window function with PARTITION BY to count the consecutive values, but the result is this:
Tom A 2
Tom A 3
Tom B 2
Bob A 2
Please help, I am not good in English, thanks!
This is a variation of a gaps-and-islands problem. You can solve it using window functions:
select taxi, count(*)
from (select t.taxi, t.client, count(*) as num_times
      from (select t.*,
                   row_number() over (partition by taxi order by time) as seqnum,
                   row_number() over (partition by taxi, client order by time) as seqnum_c
            from t
           ) t
      -- the difference is constant within each run of consecutive rows for the same client
      group by t.taxi, t.client, (seqnum - seqnum_c)
      having count(*) >= 2
     ) islands
group by taxi;
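As a sanity check, the query can be run against the question's rows. This is only a sketch: it assumes SQLite via Python's sqlite3 module and a hypothetical table name `rides`. Within a run of consecutive rows for the same client, both row numbers advance together, so their difference stays constant and identifies the run; the `HAVING COUNT(*) >= 2` then keeps only runs of at least two rows, which is what gives Tom 3 rather than 4.

```python
import sqlite3

# Rows from the question; "rides" is a hypothetical table name.
rows = [
    ("Tom", "A", 1), ("Tom", "A", 2), ("Tom", "B", 3), ("Tom", "A", 4),
    ("Tom", "A", 5), ("Tom", "A", 6), ("Tom", "B", 7), ("Tom", "B", 8),
    ("Bob", "A", 1), ("Bob", "A", 2), ("Bob", "A", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (taxi TEXT, client TEXT, time INTEGER)")
conn.executemany("INSERT INTO rides VALUES (?, ?, ?)", rows)

# seqnum - seqnum_c is the same for every row in one consecutive run (island).
query = """
SELECT taxi, COUNT(*) AS runs
FROM (
    SELECT taxi, client, COUNT(*) AS num_times
    FROM (
        SELECT taxi, client,
               ROW_NUMBER() OVER (PARTITION BY taxi ORDER BY time) AS seqnum,
               ROW_NUMBER() OVER (PARTITION BY taxi, client ORDER BY time) AS seqnum_c
        FROM rides
    ) AS numbered
    GROUP BY taxi, client, seqnum - seqnum_c
    HAVING COUNT(*) >= 2
) AS islands
GROUP BY taxi
"""
runs_per_taxi = dict(conn.execute(query).fetchall())
print(runs_per_taxi)
```

Tom's single ride with client B at time 3 forms a run of length one, so it is dropped by the HAVING clause.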
Use a distinct count:
select taxi, count(distinct client)
from table_name
group by taxi
It seems your expected output is wrong.
I don't see where you get the number 3 from. If you're trying to do what your question says, grouping consecutive rows by client and then counting the groups, the following query should help. Bob has 1 group and Tom has 4.
Partition by taxi, order by time, and check whether the client matches the previous client for that taxi. If it does, don't count the row; if it doesn't, count it, because it starts a new group.
SELECT FEE.taxi,
       SUM(FEE.clientNotSameAsPreviousInSequence)
FROM
(
    SELECT taxi,
           CASE
               WHEN PreviousClient IS NULL THEN
                   1
               WHEN PreviousClient <> client THEN
                   1
               ELSE
                   0
           END AS clientNotSameAsPreviousInSequence
    FROM
    (
        SELECT *,
               LAG(client) OVER (PARTITION BY taxi ORDER BY taxi, time) AS PreviousClient
        FROM table
    ) taxisWithPreviousClient
) FEE
GROUP BY FEE.taxi;
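Here is a compact sketch of the same LAG approach on the question's rows, assuming SQLite via Python's sqlite3 module and a hypothetical table name `rides`. Unlike a version that filters on run length, it counts every change of client, including runs of a single row.

```python
import sqlite3

# Rows from the question; "rides" is a hypothetical table name.
rows = [
    ("Tom", "A", 1), ("Tom", "A", 2), ("Tom", "B", 3), ("Tom", "A", 4),
    ("Tom", "A", 5), ("Tom", "A", 6), ("Tom", "B", 7), ("Tom", "B", 8),
    ("Bob", "A", 1), ("Bob", "A", 2), ("Bob", "A", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (taxi TEXT, client TEXT, time INTEGER)")
conn.executemany("INSERT INTO rides VALUES (?, ?, ?)", rows)

# A row starts a new group when it is the taxi's first row (no previous client)
# or when its client differs from the previous row's client.
query = """
SELECT taxi,
       SUM(CASE WHEN prev_client IS NULL OR prev_client <> client
                THEN 1 ELSE 0 END) AS group_count
FROM (
    SELECT taxi, client,
           LAG(client) OVER (PARTITION BY taxi ORDER BY time) AS prev_client
    FROM rides
) AS with_prev
GROUP BY taxi
"""
groups_per_taxi = dict(conn.execute(query).fetchall())
print(groups_per_taxi)
```

Tom's groups are A, B, A, B in time order, which is where the count of 4 comes from.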

Issue with returning distinct records based on single column (Oracle)

If I have the table "members" (shown below), how would I go about getting the record of the first occurrence of each membership_id (Oracle)?
Expected results
123 John Doe A P
313 Michael Casey A A
113 Luke Skywalker A P
Table - members
membership_id first_name last_name status type
123 John Doe A P
313 Michael Casey A A
113 Luke Skywalker A P
123 Bob Dole A A
313 Lucas Smith A A
SELECT membership_id,
       first_name,
       last_name,
       status,
       type
  FROM (SELECT membership_id,
               first_name,
               last_name,
               status,
               type,
               rank() over (partition by membership_id
                                order by type desc) rnk
          FROM members)
 WHERE rnk = 1
will work as long as each membership_id has a single row with the highest type. If you can have ties (that is, multiple rows with the same membership_id and the same maximum type, as membership_id 313 has in the sample data), this query will return all those rows. If you only want to return one of the rows where there is a tie, you would either need to add additional criteria to the order by to ensure that all ties are broken, or you would need to use the row_number function rather than rank, which will arbitrarily break ties.
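The difference between rank and row_number on ties can be seen with the members rows from the question, where membership_id 313 has two rows with the same type. This is a sketch only, assuming SQLite via Python's sqlite3 module in place of Oracle, and a hypothetical helper named `top_rows`.

```python
import sqlite3

# Rows from the members table in the question.
rows = [
    (123, "John", "Doe", "A", "P"),
    (313, "Michael", "Casey", "A", "A"),
    (113, "Luke", "Skywalker", "A", "P"),
    (123, "Bob", "Dole", "A", "A"),
    (313, "Lucas", "Smith", "A", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members "
    "(membership_id INTEGER, first_name TEXT, last_name TEXT, status TEXT, type TEXT)"
)
conn.executemany("INSERT INTO members VALUES (?, ?, ?, ?, ?)", rows)


def top_rows(window_fn):
    """Return the rows numbered 1 per membership_id by RANK or ROW_NUMBER."""
    query = f"""
    SELECT membership_id, first_name, last_name
    FROM (
        SELECT membership_id, first_name, last_name,
               {window_fn}() OVER (PARTITION BY membership_id
                                   ORDER BY type DESC) AS rnk
        FROM members
    ) AS ranked
    WHERE rnk = 1
    """
    return conn.execute(query).fetchall()


rank_rows = top_rows("RANK")              # keeps both tied rows for 313
row_number_rows = top_rows("ROW_NUMBER")  # keeps one row per membership_id
print(len(rank_rows), len(row_number_rows))
```

Which of the two 313 rows row_number keeps is not defined, so add a tie-breaking column to the ORDER BY if a specific row is required.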
Select A.*
FROM Members AS A inner join
(Select membership_id, first(first_name) AS FN, first(last_name) AS LN
From Members
Group by membership_id) AS B
ON A.membership_id=B.membership_id and A.first_name=B.FN and A.last_name=B.LN
Hope that helps!
select *
from members
where rowid in (
select min(rowid)
from members
group by membership_id
)

Retrieve highest value from sql table

How can I query this data:
Name Title Profit
Peter CEO 2
Robert A.D 3
Michael Vice 5
Peter CEO 4
Robert Admin 5
Robert CEO 13
Adrin Promotion 8
Michael Vice 21
Peter CEO 3
Robert Admin 15
to get this:
Peter 4
Robert 15
Michael 21
Adrin 8
I want to get the highest profit value for each name.
If a name appears more than once, always take the highest value.
select name,max(profit) from table group by name
Since this type of request almost always follows with "now can I include the title?" - here is a query that gets the highest profit for each name but can include all the other columns without grouping or applying arbitrary aggregates to those other columns:
;WITH x AS
(
SELECT Name, Title, Profit, rn = ROW_NUMBER()
OVER (PARTITION BY Name ORDER BY Profit DESC)
FROM dbo.table
)
SELECT Name, Title, Profit
FROM x
WHERE rn = 1;