SQL Server get distinct counts of name by each ID

SQL Server get distinct counts of name by each ID - sql

I have a dataset like :
ID NAME
1 Aaron
2 Theon
3 Jon Snow
4 Jon Snow
4 Dany
5 Arya
5 Robert
5 Tyrion
I need to add a new column to this that shows the output based on the number of distinct names per ID. So expected output would be:
ID NAME Mapping
1 Aaron 1
2 Theon 1
3 Jon Snow 1
4 Jon Snow 2
4 Dany 2
5 Arya 3
5 Robert 3
5 Tyrion 3
I am confused about how to achieve this since I have tried a case statement where count(distinct(name)) does not return the right values.

You may try using COUNT as an analytic function:
SELECT
ID,
Name,
COUNT(*) OVER (PARTITION BY ID) Mapping
FROM yourTable
ORDER BY
ID;

Another approach to get COUNT of DISTINCT Name for each ID
SELECT *,
(SELECT Count(DISTINCT NAME)
FROM #table T
WHERE T1.id = T.id) Mapping
FROM #table T1
Online Demo

You can simply use below query
SELECT COUNT(DISTINCT NAME)
FROM YOUR_TABLE
GROUP BY ID
Thanks

Other method (specif SQL Server, otherwise use INNER JOIN LATERAL):
SELECT *
FROM #table f1
CROSS APPLY
(
select Count(*) Nb from #table f2
where f2.ID=f1.ID
) f3

Related

How to delete records with lower version in big query?

Lets say my table contains the following data
id
name
version
1
Rahul
1
1
Rahul
2
2
John
1
3
Mike
1
2
John
2
4
Rubel
1
5
David
1
1
Rahul
3
I need to filter the duplicate records with lower version. How can this be done?
The output essentially should be
id
name
version
1
Rahul
3
2
John
2
3
Mike
1
4
Rubel
1
5
David
1

For this dataset, aggregation seems sufficient:
select id, name, max(version) as max_version
from mytable
group by id, name

You can use not exists as follows:
select id, name, version
from your_table t
Where not exists
(Select 1 from your_table tt
Where tt.id = t.id and tt.version > t.version)
Or you can use analytical function row_number as follows:
Select id, name, version from
(select t.*,
Row_number() over (partition by id order by version desc) as rn
from your_table t) t
Where rn = 1

How to get MAX Hike in Min month?

below is table:
Name | Hike% | Month
------------------------
A 7 1
A 6 2
A 8 3
b 4 1
b 7 2
b 7 3
Result should be:
Name | Hike% | Month
------------------------
A 8 3
b 7 2

Here is one way of doing this:
SELECT Name, [Hike%], Month
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY [Hike%] DESC, Month) rn
FROM yourTable
) t
WHERE rn = 1
ORDER BY Name;
If you instead want to return multiple records per name, in the case where two or more records might be tied for having the greatest hike%, then replace ROW_NUMBER with RANK.

use correlated subquery
select Name,min(Hike) as Hike,min(Month) as Month
from
(
select * from tablename a
where Hike in (select max(Hike) from tablename b where a.name=b.name)
)A group by Name

You can use something similar to the below:
SELECT Name, MAX(Hike), Month
FROM table
GROUP BY Name, Month
Hope this helps :)

SQL COUNT(DISTINCT(field1)) GROUP BY MAX(filed2)

I have a table like
name num_try
John 2
John 1
Mike 3
Mike 2
Linda 2
And I want to know count distinct names group by MAX(num_try).
Desired result should look like
MAX(num_try) COUNT(DISTINCT(names))
2 2
3 1
Can you help me with this query?

select max_num_try, count(*) from
(
select name, max(num_try) as max_num_try
from table1
group by name
) a
group by max_num_try
order by max_num_try desc

How to write optimized query for multiple prioritize conditional joins in SQL server?

The scenario I'm after for is :
Result = Nothing
CollectionOfTables = Tbl1, Tbl2, Tbl3
While(True){
CurrentTable = GetHighestPriorityTable(CollectionOfTables)
If(CurrentTable) = Nothing Then Break Loop;
RemoveCurrentTableFrom(CollectionOfTables)
ForEach ID in CurrentTable as TempRow {
If(Result.DoesntContainsId(ID)) Then Result.AddRow(TempRow)
}
}
Assume I have following three tables.
IdNameTable1, Priority 1
1 John
2 Mary
3 Elsa
IdNameTable2, Priority 2
2 Steve
3 Max
4 Peter
IdNameTable3, Priority 3
4 Frank
5 Harry
6 Mona
Here is the final result I need.
IdNameResult
1 John
2 Mary
3 Elsa
4 Peter
5 Harry
6 Mona
A few tips to keep in mind.
Number of actual tables is 10.
Number of rows per table exceeds 1 Million.
It's not necessary to use join in query, but because of amount of data I'm working with the query must be optimized and used set-operations in SQL not a Cursor script.

Here's a way to do it using UNION and ROW_NUMBER():
;With Cte As
(
Select Id, Name, 1 As Prio
From Table1
Union All
Select Id, Name, 2 As Prio
From Table2
Union All
Select Id, Name, 3 As Prio
From Table3
), Ranked As
(
Select Id, Name, Row_Number() Over (Partition By Id Order By Prio) As RN
From Cte
)
Select Id, Name
From Ranked
Where RN = 1
Order By Id Asc;

postgresql - filter out double rows (but not the first and last one)

i got an "postgres" SQL problem.
I got a table which looks like this
id name level timestamp
1 pete 1 100
2 pete 1 200
3 pete 1 500
4 pete 5 900
7 pete 5 1000
9 pete 5 1200
15 pete 2 700
Now I want to delete the lines i dont need. i only want to now the first line where he get a new level and the last line he has this level.
id name level timestamp
1 pete 1 100
3 pete 1 500
15 pete 2 700
4 pete 5 900
9 pete 5 1200
(there much more columns like realmpoints and so on)
I have a solution if the the timestamp is only increasing.
SELECT id, name, level, timestamp
FROM player_testing
WHERE id IN ( SELECT MAX(dup.id)
FROM player_testing As dup
GROUP BY dup.name, dup.level)
UNION
SELECT MIN(dup.id)
FROM player_testing As dup
GROUP BY dup.name, dup.level)
)
ORDER BY ts
But I find no way to makes it work for my problem.

select id, name, level, timestamp
from (
select id,name,level,timestamp,
row_number() over (partition by name, level order by timestamp) as rn,
count(*) over (partition by name, level) as max_rn
from player_testing
) t
where rn = 1 or rn = max_rn;
Btw: timestamp is a horrible name for a column. For one reason because it's a reserved word, but more importantly because it doesn't document what the column contains. Is that a start_timestamp and end_timestamp a valid_until_timestamp, ...?

Here is an alternate solution to #a_horse_with_no_name's without over partition, and thus more generic SQL:
select *
from player_testing as A
where id = (
select min(id)
from player_testing as B
where A.name = B.name
and A.level = B.level
)
or id = (
select max(id)
from player_testing as B
where A.name = B.name
and A.level = B.level
)
Here is the fiddle to show it working: http://sqlfiddle.com/#!2/47bd44/1

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

SQL Server get distinct counts of name by each ID - sql

You may try using COUNT as an analytic function: SELECT ID, Name, COUNT(*) OVER (PARTITION BY ID) Mapping FROM yourTable ORDER BY ID;

Another approach to get COUNT of DISTINCT Name for each ID SELECT *, (SELECT Count(DISTINCT NAME) FROM #table T WHERE T1.id = T.id) Mapping FROM #table T1 Online Demo

You can simply use below query SELECT COUNT(DISTINCT NAME) FROM YOUR_TABLE GROUP BY ID Thanks

Other method (specif SQL Server, otherwise use INNER JOIN LATERAL): SELECT * FROM #table f1 CROSS APPLY ( select Count(*) Nb from #table f2 where f2.ID=f1.ID ) f3

Related

How to delete records with lower version in big query?

How to get MAX Hike in Min month?

SQL COUNT(DISTINCT(field1)) GROUP BY MAX(filed2)

How to write optimized query for multiple prioritize conditional joins in SQL server?

postgresql - filter out double rows (but not the first and last one)

Categories

Resources