Getting Person having interests in more number of subjects - Sql Server - sql

Sorry that the subject is not very descriptive. I have two tables: one stores person data, and the other stores subject data along with the person interested in each subject. The two tables look like this:
Person
Id Name
1 Imad
2 Sumeet
3 Suresh
4 Navin
Subjects
Id PId Subject
1 1 DC
2 1 DS
3 3 DS
4 4 CA
PId is the person's Id.
I need to get all students who are interested in the maximum number of subjects, e.g. Imad here.
Here is my query:
With c as
(
select Pid, count(Id) as 'Total' from subjects group by Pid
)
select Pid into #Temp from c where Total = (Select Max(Total) from c)
select * from Person where Id in (Select Pid from #Temp)
It gives me the desired output, but whenever this type of question is asked in an interview, I never get a good response from the interviewer, as they always expect a better solution. I am not confident in my SQL skills, so I think there must be a more efficient solution, which is why I posted it here.
Thanks

Simply order the data and take the top record with ties (this means that if several students have the same count, they will all appear in the result):
select top 1 with ties p.Id, p.Name
from Subjects s
join Person p on s.PId = p.Id
group by p.Id, p.Name
order by count(*) desc

You can try this:
;With c as
(
select Pid, count(Id) as 'Total' from subjects group by Pid
)
select * from Person join c on c.Pid=Person.Id where c.total>1

Try this:
select * from person
where id in(
Select b.pid
from subjects b
group by b.pid
having count(b.pid)>1
)

declare @t table (ID int,name varchar(10))
insert into @t (ID,name)values (1,'imad'),(2,'sumeet'),(3,'suresh'),(4,'navin')
declare @tt table (Id int,Pid int,Subject varchar(10))
insert into @tt (Id,Pid,subject)values (1,1,'DC'),(2,1,'DS'),(3,3,'DS'),(4,4,'CA')
select p.ID,P.name,ttt.Subject from (Select P.ID,P.name,P.Cnt from (
select t.ID,t.name,COUNT((t.ID))Cnt from @t t
INNER JOIN
@tt tt ON t.ID = tt.Pid
GROUP BY t.ID,t.name)P
GROUP BY P.Cnt,P.ID,p.name
HAVING(cnt) > 1)P
INNER JOIN @tt ttt ON Ttt.Pid = P.ID

The current solutions with TOP are specific to MS SQL Server. The following solution is based on standard SQL's windowed aggregate functions, which most DBMSs support:
select Pid, Total
from
(
select Pid, count(Id) as Total
,rank() over (order by count(Id) desc) as rn
from subjects group by Pid
) as dt
where rn = 1
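The ranking approach can be tried outside SQL Server. The following is a minimal sketch, assuming SQLite 3.25 or later (through Python's sqlite3 module) as a stand-in database. The CTE names `counts` and `ranked` and the join back to Person are illustrative additions, not part of the answer above.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption made
# so the sketch is self-contained; window functions need SQLite 3.25+).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Subjects (Id INTEGER PRIMARY KEY, PId INTEGER, Subject TEXT);
    INSERT INTO Person VALUES (1,'Imad'),(2,'Sumeet'),(3,'Suresh'),(4,'Navin');
    INSERT INTO Subjects VALUES (1,1,'DC'),(2,1,'DS'),(3,3,'DS'),(4,4,'CA');
""")

# Step 1: count subjects per person.
# Step 2: rank the counts; RANK() gives 1 to every person tied for the maximum.
# Step 3: join back to Person to get the names.
query = """
    WITH counts AS (
        SELECT PId, COUNT(Id) AS Total
        FROM Subjects
        GROUP BY PId
    ),
    ranked AS (
        SELECT PId, Total, RANK() OVER (ORDER BY Total DESC) AS rn
        FROM counts
    )
    SELECT p.Id, p.Name, r.Total
    FROM ranked AS r
    JOIN Person AS p ON p.Id = r.PId
    WHERE r.rn = 1
    ORDER BY p.Id
"""
top_people = conn.execute(query).fetchall()
print(top_people)  # [(1, 'Imad', 2)]
```

Because RANK() assigns the same rank to equal counts, a second person with two subjects would also be returned, which matches the WITH TIES behaviour.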

Related

Selecting only one row if the same ID - SQL Server

I'm trying to learn SQL and am currently working on a query that lists all customers whose status is active (ID = 1) or active-busy (ID = 2).
The problem is that some customers have the same ID but different types. For example, one customer has ID 1 and Type 3, but the same customer also has ID 1 and Type 1. What I'm trying to do is select only one row when the ID is the same: if the ID is the same and the Types are 1 and 3, select only Type 3.
SELECT
CASE
WHEN corel.opts LIKE 3
THEN (SELECT corel.opts
WHERE corel.objid = rel.id
AND corel.type IN (1, 2)
AND corel.opts = 3
ELSE corel.opts 1
END)
This is not the complete query, because it contains many other things that I can't post, but I would appreciate it if you could show me how to accomplish this. I just don't know how to express: if the same ID appears in the table with different Types, select only Type 3. Each customer has a different ID, but customers can have the same Type.
Use ROW_NUMBER() like this:
DECLARE @T TABLE
(
Id INT,
TypeNo INT
)
INSERT INTO @T
VALUES(1,1),(1,3),(2,1),(2,3),(3,1),(4,3)
;WITH CTE
AS
(
SELECT
RN = ROW_NUMBER() OVER(PARTITION BY Id ORDER BY TypeNo DESC),
Id,
TypeNo
FROM @T
)
SELECT
Id,
TypeNo
FROM CTE
WHERE RN = 1
The test scenario is borrowed from Jayasurya Satheesh. Thanks, I voted yours up!
DECLARE @T TABLE
(
Id INT,
TypeNo INT
)
INSERT INTO @T
VALUES(1,1),(1,3),(2,1),(2,3),(3,1),(4,3)
--The query will use ROW_NUMBER with PARTITION BY to start a row count for each T.Id separately.
--SELECT TOP 1 WITH TIES will take all first place rows and not just the very first:
SELECT TOP 1 WITH TIES
Id,
TypeNo
FROM @T AS T
ORDER BY ROW_NUMBER() OVER(PARTITION BY T.Id ORDER BY T.TypeNo DESC)
If your Type=3 is not the highest type code the simple ORDER BY T.TypeNo DESC won't be enough, but you can easily use a CASE to solve this.
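The CASE idea mentioned above could look roughly like the sketch below. It assumes SQLite 3.25 or later (via Python's sqlite3 module) as a stand-in for SQL Server, and it extends the sample data with a hypothetical TypeNo 5 so that type 3 is no longer the highest code. Falling back to the highest remaining type is also an assumption.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (Id INTEGER, TypeNo INTEGER);
    -- TypeNo 5 is hypothetical: it makes "ORDER BY TypeNo DESC" pick the wrong row.
    INSERT INTO T VALUES (1,1),(1,3),(1,5),(2,1),(2,5),(3,1),(4,3);
""")

# The CASE expression sorts type 3 ahead of every other type within each Id;
# remaining rows fall back to the highest TypeNo (an assumed tie-break rule).
query = """
    SELECT Id, TypeNo
    FROM (
        SELECT Id, TypeNo,
               ROW_NUMBER() OVER (
                   PARTITION BY Id
                   ORDER BY CASE WHEN TypeNo = 3 THEN 0 ELSE 1 END, TypeNo DESC
               ) AS RN
        FROM T
    ) AS ranked
    WHERE RN = 1
    ORDER BY Id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 3), (2, 5), (3, 1), (4, 3)]
```

Id 1 returns type 3 even though type 5 exists, while Id 2 has no type 3 and falls back to its highest type.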
As far as I understand, you need something like:
SELECT c1.*
FROM corel c1
LEFT OUTER JOIN corel c2 ON c1.objid=c2.objid AND c1.type <> c2.type
WHERE (c1.type=1 AND c2.type IS NULL) OR (c1.type=3 AND c2.type=1)

Sum of Highest 5 numbers in SQL Server 2000

I am having a problem querying some data from a database. My table is given below:
What I need is the sum of the 5 highest total_marks from the table for each student.
I tried the code given below, but it does not return what I expected.
SELECT s.studentid, SUM(s.total_marks)
FROM students s
WHERE s.sub_code IN (SELECT TOP 5 sub_code
FROM students a
WHERE a.studentid = s.studentid
ORDER BY total_marks DESC)
GROUP BY studentid
Please help me. Thanks in advance.
Your query could work if there were a unique/primary key on (studentId, subcode). At the moment, the query returns 6 records instead of 5 for studentId = 1, for example, because of the duplicate subcode 303.
A table should usually have a unique key. You could add an incremental id and rewrite your query like this:
select s.*
from students as s
where
s.id in (
select top 5 a.id
from students as a
where a.studentId = s.studentId
order by a.total_marks desc
);
Or, if the combinations of (studentId, subcode, total_marks) are unique, you can use a query like this:
select s.*
from students as s
where
exists (
select *
from (
select top 5 a.subcode, a.total_marks
from students as a
where a.studentId = s.studentId
order by a.total_marks desc
) as b
where b.subcode = s.subcode and b.total_marks = s.total_marks
);
First, select the top 5 marks for each student. The row number has to be computed in a derived table, because a window function cannot be referenced in the WHERE clause of the same SELECT (ROW_NUMBER() requires SQL Server 2005 or later):
select studentid, total_marks
from
(
select row_number() over (partition by studentid order by total_marks desc) as rn,
studentid,
total_marks
from students
) t
where rn <= 5
From there, you can group by studentid and sum:
select studentid, sum(total_marks)
from
(
select row_number() over (partition by studentid order by total_marks desc) as rn,
studentid,
total_marks
from students
) t
where rn <= 5
group by studentid
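A window function cannot be filtered in the same SELECT that computes it, so the row number is computed in a derived table and filtered outside it. Here is a minimal runnable sketch of that top-5-then-sum pattern, assuming SQLite 3.25 or later (via Python's sqlite3 module) rather than SQL Server. The sample marks are hypothetical and include the duplicate sub_code 303 mentioned in the thread.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (studentid INTEGER, sub_code INTEGER, total_marks INTEGER);
    -- Hypothetical data: student 1 has 7 rows (sub_code 303 twice), student 2 has 4.
    INSERT INTO students VALUES
        (1,301,90),(1,302,80),(1,303,70),(1,303,60),(1,304,50),(1,305,40),(1,306,30),
        (2,301,55),(2,302,65),(2,303,75),(2,304,85);
""")

# Number each student's rows from highest to lowest mark in a derived table,
# keep rows 1..5 in the outer query, then sum per student.
query = """
    SELECT studentid, SUM(total_marks) AS top5_total
    FROM (
        SELECT studentid, total_marks,
               ROW_NUMBER() OVER (
                   PARTITION BY studentid ORDER BY total_marks DESC
               ) AS rn
        FROM students
    ) AS ranked
    WHERE rn <= 5
    GROUP BY studentid
    ORDER BY studentid
"""
totals = conn.execute(query).fetchall()
print(totals)  # [(1, 350), (2, 280)]
```

Because the rows are numbered rather than matched on sub_code, the duplicate subject code no longer lets a sixth row slip into the sum.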
This isn't ideal, but the method you started to use requires a primary key column. You can simulate one with a temp table since SQL 2000.
CREATE TABLE #temp (
StudentID INT,
total_marks INT,
ID INT Identity(1,1)
)
INSERT INTO #temp (
StudentID,
total_Marks
)
Select
StudentID,
total_marks
FROM Students
SELECT s.studentid, SUM(s.total_marks)
FROM #temp s
WHERE s.ID IN (SELECT TOP 5
a.ID
FROM #temp a
WHERE a.studentid = s.studentid
ORDER BY total_marks DESC)
GROUP BY studentid
I think SQL 2000 may have a slightly more compact syntax for this, but SQL Fiddle won't let me test versions that old.
Please test this carefully. You will be dumping this entire table to a temp table and that's almost always a bad idea.
Also, ensure that there is some combination of fields not including the total that uniquely identifies a row, or consider adding a surrogate key column to the table.

Get Latest ID from a Duplicate Records in a table

So I have two tables: one is RAWtable and the other is MAINtable. I have to get the latest groupID when more than one record exists for the same name and code. For example, I have this in RAWtable:
id groupid name code
1 G09161405 Name1 Code1
2 G09161406 Name1 Code1
the two records should be treated as one and should return this value only:
id groupid name code
2 G09161406 Name1 Code1
This is the only row that should be inserted into the main table, since it has the latest GroupID (the groupid is a combination of date and time).
I've tried this, but it's not working:
SELECT MAST.ID, MAST.code, MAST.name FROM RAWtable AS MAST INNER JOIN
(SELECT code, name, grouid,id FROM RAWtable AS DUPT GROUP BY code, name, groupid,id HAVING COUNT(*) >= 2) DUPT
ON DUPT.code =MAST.code and DUPT.name =MAST.name where dupt.groupid >mast.groupid
How can I do this? Thanks a lot.
select R.id,
R.groupid,
R.name,
R.code
from (select id,
groupid,
name,
code,
row_number() over(partition by name, code order by groupid desc) as rn
from RawTable
) as R
where R.rn = 1
Or, if you don't have row_number():
select R1.id,
R1.groupid,
R1.name,
R1.code
from RawTable as R1
inner join (
select name, code, max(groupid) as groupid
from RawTable
group by name, code
) as R2
on R1.name = R2.name and
R1.code = R2.code and
R1.groupid = R2.groupid
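Both variants above should pick the same rows. The sketch below checks that on the question's sample data, assuming SQLite 3.25 or later (via Python's sqlite3 module) as a stand-in for SQL Server. The third row (Name2/Code2) is a hypothetical addition to show that non-duplicated records are kept too.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RawTable (id INTEGER PRIMARY KEY, groupid TEXT, name TEXT, code TEXT);
    INSERT INTO RawTable VALUES
        (1,'G09161405','Name1','Code1'),
        (2,'G09161406','Name1','Code1'),
        (3,'G09161407','Name2','Code2');  -- hypothetical extra row
""")

# Variant 1: number rows per (name, code) from latest groupid down, keep the first.
by_row_number = conn.execute("""
    SELECT id, groupid, name, code
    FROM (
        SELECT id, groupid, name, code,
               ROW_NUMBER() OVER (PARTITION BY name, code ORDER BY groupid DESC) AS rn
        FROM RawTable
    ) AS r
    WHERE rn = 1
    ORDER BY id
""").fetchall()

# Variant 2: find MAX(groupid) per (name, code) and join back to fetch the id.
by_max_join = conn.execute("""
    SELECT r1.id, r1.groupid, r1.name, r1.code
    FROM RawTable AS r1
    JOIN (
        SELECT name, code, MAX(groupid) AS groupid
        FROM RawTable
        GROUP BY name, code
    ) AS r2
      ON r1.name = r2.name AND r1.code = r2.code AND r1.groupid = r2.groupid
    ORDER BY r1.id
""").fetchall()

print(by_row_number)  # [(2, 'G09161406', 'Name1', 'Code1'), (3, 'G09161407', 'Name2', 'Code2')]
print(by_row_number == by_max_join)  # True
```

The two variants differ only if the same groupid occurs twice within one (name, code) pair: the join returns both rows, while ROW_NUMBER keeps just one.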
Try this way, it will give you max group id which will be latest :
SELECT MAX(GroupId), Name, Code
FROM RAWtable
GROUP BY Name, Code
select max(id), name, code from RAWtable
group by name, code having count(*) > 1
This will return:
id name code
2 Name1 Code1
It returns the max id for each (name, code) combination that has more than one record in the table.
Try this:
select max(t.groupid), t.name, t.code
from RAWtable t
group by t.name, t.code
This will basically select the max value of groupid for each name and code combination.

Finding duplicate rows in SQL Server

I have a SQL Server database of organizations, and there are many duplicate rows. I want to run a select statement to grab all of these and the amount of dupes, but also return the ids that are associated with each organization.
A statement like:
SELECT orgName, COUNT(*) AS dupes
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
Will return something like
orgName | dupes
ABC Corp | 7
Foo Federation | 5
Widget Company | 2
But I'd also like to grab the IDs of them. Is there any way to do this? Maybe like a
orgName | dupeCount | id
ABC Corp | 1 | 34
ABC Corp | 2 | 5
...
Widget Company | 1 | 10
Widget Company | 2 | 2
The reason is that there is also a separate table of users that link to these organizations, and I would like to unify them (that is, remove the dupes so the users link to the same organization instead of to dupe orgs). I would like to do part of this manually so I don't screw anything up, but I still need a statement that returns the IDs of all the dupe orgs so I can go through the list of users.
select o.orgName, oc.dupeCount, o.id
from organizations o
inner join (
SELECT orgName, COUNT(*) AS dupeCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName
You can run the following query and find the duplicates with max(id) and delete those rows.
SELECT orgName, COUNT(*) AS dupes, MAX(ID) AS maxId
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
But you'll have to run this query a few times.
You can do it like this:
SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
If you want to return just the records that can be deleted (leaving one of each), you can use:
SELECT
id, orgName
FROM (
SELECT
orgName, id,
ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
FROM organizations
) AS d
WHERE intRow != 1
Edit: SQL Server 2000 doesn't have the ROW_NUMBER() function. Instead, you can use:
SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount, MIN(id) AS minId
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
WHERE d.minId != o.id
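The two "records that can be deleted" queries above (ROW_NUMBER and the MIN(id) join) can be compared side by side. This is a minimal sketch assuming SQLite 3.25 or later (via Python's sqlite3 module) as a stand-in for SQL Server. The ids and the non-duplicated 'Solo Inc' row are hypothetical sample data.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organizations (id INTEGER PRIMARY KEY, orgName TEXT);
    -- Hypothetical sample rows; 'Solo Inc' has no duplicate.
    INSERT INTO organizations VALUES
        (34,'ABC Corp'),(5,'ABC Corp'),(12,'ABC Corp'),
        (10,'Widget Company'),(2,'Widget Company'),(7,'Solo Inc');
""")

# ROW_NUMBER variant: row 1 per orgName (lowest id) is kept, the rest are deletable.
deletable_row_number = conn.execute("""
    SELECT id, orgName
    FROM (
        SELECT orgName, id,
               ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
        FROM organizations
    ) AS d
    WHERE intRow != 1
    ORDER BY id
""").fetchall()

# MIN(id) variant for engines without ROW_NUMBER: keep the lowest id per orgName.
deletable_min_id = conn.execute("""
    SELECT o.id, o.orgName
    FROM (
        SELECT orgName, MIN(id) AS minId
        FROM organizations
        GROUP BY orgName
        HAVING COUNT(*) > 1
    ) AS d
    JOIN organizations AS o ON o.orgName = d.orgName
    WHERE d.minId != o.id
    ORDER BY o.id
""").fetchall()

print(deletable_row_number)  # [(10, 'Widget Company'), (12, 'ABC Corp'), (34, 'ABC Corp')]
print(deletable_row_number == deletable_min_id)  # True
```

Both variants keep the lowest id per organization, so users pointing at the listed ids could be re-linked to that surviving row before anything is deleted.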
You can try this:
WITH CTE AS
(
SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY orgName DESC) FROM organizations
)
select * from CTE where RN>1
go
The solution marked as correct didn't work for me, but I found this answer that worked just great: Get list of duplicate rows in MySql
SELECT n1.*
FROM myTable n1
INNER JOIN myTable n2
ON n2.repeatedCol = n1.repeatedCol
WHERE n1.id <> n2.id
If you want to delete duplicates:
WITH CTE AS(
SELECT orgName,id,
RN = ROW_NUMBER()OVER(PARTITION BY orgName ORDER BY Id)
FROM organizations
)
DELETE FROM CTE WHERE RN > 1
select * from [Employees]
To find duplicate records:
1) Using a CTE
with mycte
as
(
select Name,EmailId,ROW_NUMBER() over(partition by Name,EmailId order by id) as Duplicate from [Employees]
)
select * from mycte
2) Using GROUP BY
select Name,EmailId,COUNT(name) as Duplicate from [Employees] group by Name,EmailId
Select * from (Select orgName,id,
ROW_NUMBER() OVER(Partition By OrgName ORDER by id DESC) Rownum
From organizations )tbl Where Rownum>1
So the records with Rownum > 1 are the duplicate records in your table. PARTITION BY first groups the records and then numbers them serially within each group.
The records with Rownum > 1 are therefore the duplicates, which can be deleted.
select column_name, count(column_name)
from table_name
group by column_name
having count (column_name) > 1;
Src : https://stackoverflow.com/a/59242/1465252
select a.orgName, b.duplicate, a.id
from organizations a
inner join (
SELECT orgName, COUNT(*) AS duplicate
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) b on a.orgName = b.orgName
group by a.orgName, b.duplicate, a.id
select orgname, count(*) as dupes, id
from organizations
where orgname in (
select orgname
from organizations
group by orgname
having (count(*) > 1)
)
group by orgname, id
There are several ways to select duplicate rows.
For my solutions, first consider this example table:
CREATE TABLE #Employee
(
ID INT,
FIRST_NAME NVARCHAR(100),
LAST_NAME NVARCHAR(300)
)
INSERT INTO #Employee VALUES ( 1, 'Ardalan', 'Shahgholi' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 4, 'name3', 'lname3' );
First solution:
SELECT DISTINCT *
FROM #Employee;
WITH #DeleteEmployee AS (
SELECT ROW_NUMBER()
OVER(PARTITION BY ID, First_Name, Last_Name ORDER BY ID) AS
RNUM
FROM #Employee
)
SELECT *
FROM #DeleteEmployee
WHERE RNUM > 1
SELECT DISTINCT *
FROM #Employee
Second solution: use an identity field
SELECT DISTINCT *
FROM #Employee;
ALTER TABLE #Employee ADD UNIQ_ID INT IDENTITY(1, 1)
SELECT *
FROM #Employee
WHERE UNIQ_ID < (
SELECT MAX(UNIQ_ID)
FROM #Employee a2
WHERE #Employee.ID = a2.ID
AND #Employee.FIRST_NAME = a2.FIRST_NAME
AND #Employee.LAST_NAME = a2.LAST_NAME
)
ALTER TABLE #Employee DROP COLUMN UNIQ_ID
SELECT DISTINCT *
FROM #Employee
At the end of either solution, run this command:
DROP TABLE #Employee
I think I know what you need.
I had to combine several of the answers, and I think I got the solution the asker wanted:
select o.id,o.orgName, oc.dupeCount, oc.id,oc.orgName
from organizations o
inner join (
SELECT MAX(id) as id, orgName, COUNT(*) AS dupeCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName
Taking the max id gives you the id of the duplicate as well as the id of the original, which is what was asked for:
id, org name, duplicate count (left out in this case)
duplicate id, duplicate org name, duplicate count (left out again because it does not help in this case)
The only downside is that the output comes in this form:
id, name, dupid, name
Hope it still helps.
Suppose we have the table 'Student' with 2 columns:
student_id int
student_name varchar
Records:
+------------+---------------------+
| student_id | student_name |
+------------+---------------------+
| 101 | usman |
| 101 | usman |
| 101 | usman |
| 102 | usmanyaqoob |
| 103 | muhammadusmanyaqoob |
| 103 | muhammadusmanyaqoob |
+------------+---------------------+
Now we want to see the duplicate records.
Use this query:
select student_name, student_id, count(*) c from student group by student_id, student_name having count(*) > 1;
+---------------------+------------+---+
| student_name | student_id | c |
+---------------------+------------+---+
| usman | 101 | 3 |
| muhammadusmanyaqoob | 103 | 2 |
+---------------------+------------+---+
I found a better option for getting the duplicate records in a table:
SELECT x.studid, y.stdname, y.dupecount
FROM student AS x INNER JOIN
(SELECT a.stdname, COUNT(*) AS dupecount
FROM student AS a INNER JOIN
studmisc AS b ON a.studid = b.studid
WHERE (a.studid LIKE '2018%') AND (b.studstatus = 4)
GROUP BY a.stdname
HAVING (COUNT(*) > 1)) AS y ON x.stdname = y.stdname INNER JOIN
studmisc AS z ON x.studid = z.studid
WHERE (x.studid LIKE '2018%') AND (z.studstatus = 4)
ORDER BY x.stdname
The result of the above query shows all the duplicate names with their unique student ids and the number of duplicate occurrences.
/*To get duplicate data in table */
SELECT COUNT(EmpCode),EmpCode FROM tbl_Employees WHERE Status=1
GROUP BY EmpCode HAVING COUNT(EmpCode) > 1
I use two methods to find duplicate rows.
The first method is the best-known one, using GROUP BY and HAVING.
The second method uses a CTE (Common Table Expression).
As mentioned by @RedFilter, that way is also right. I often find the CTE method useful as well.
WITH TempOrg (orgName,RepeatCount)
AS
(
SELECT orgName,ROW_NUMBER() OVER(PARTITION by orgName ORDER BY orgName)
AS RepeatCount
FROM dbo.organizations
)
select t.*,e.id from organizations e
inner join TempOrg t on t.orgName= e.orgName
where t.RepeatCount>1
In the example above, we find repeated occurrences using ROW_NUMBER and PARTITION BY, then apply a WHERE clause to select only the rows whose repeat count is greater than 1. The result is collected in the CTE and joined with the organizations table.
Source : CodoBee
Try
SELECT orgName, id, count(*) as dupes
FROM organizations
GROUP BY orgName, id
HAVING count(*) > 1;

SQL query to return only 1 record per group ID

I'm looking for a way to handle the following scenario. I have a database table that I need to return only one record for each "group id" that is contained within the table, furthermore the record that is selected within each group should be the oldest person in the household.
ID   Group ID   Name             Age
1    134        John Bowers      37
2    134        Kerri Bowers     33
3    135        John Bowers      44
4    135        Shannon Bowers   42
So in the sample data provided above I would need ID 1 and 3 returned, as they are the oldest people within each group id.
This is being queried against a SQL Server 2005 database.
SELECT t.*
FROM (
SELECT DISTINCT groupid
FROM mytable
) mo
CROSS APPLY
(
SELECT TOP 1 *
FROM mytable mi
WHERE mi.groupid = mo.groupid
ORDER BY
age DESC
) t
or this:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY groupid ORDER BY age DESC) rn
FROM mytable
) x
WHERE x.rn = 1
This will return at most one record per group even in case of ties.
See this article in my blog for performance comparisons of both methods:
SQL Server: Selecting records holding group-wise maximum
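The difference in tie handling can be seen in a small sketch. It assumes SQLite 3.25 or later (via Python's sqlite3 module) as a stand-in for SQL Server 2005. Group 136 is a hypothetical household with two people of the same age, and the extra `id` sort key is an assumed tie-breaker that makes the ROW_NUMBER result deterministic.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, groupid INTEGER, name TEXT, age INTEGER);
    INSERT INTO mytable VALUES
        (1,134,'John Bowers',37),(2,134,'Kerri Bowers',33),
        (3,135,'John Bowers',44),(4,135,'Shannon Bowers',42),
        (5,136,'Pat Doe',50),(6,136,'Sam Doe',50);  -- hypothetical tie
""")

# ROW_NUMBER: exactly one row per group; ties are broken by the lowest id.
one_per_group = [row[0] for row in conn.execute("""
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY groupid ORDER BY age DESC, id) AS rn
        FROM mytable
    ) AS x
    WHERE rn = 1
    ORDER BY id
""")]

# MAX(age) join: every person who shares the group's maximum age is returned.
all_oldest = [row[0] for row in conn.execute("""
    SELECT t.id
    FROM mytable AS t
    JOIN (
        SELECT groupid, MAX(age) AS max_age
        FROM mytable
        GROUP BY groupid
    ) AS x ON x.groupid = t.groupid AND x.max_age = t.age
    ORDER BY t.id
""")]

print(one_per_group)  # [1, 3, 5]
print(all_oldest)     # [1, 3, 5, 6]
```

For the question's own data (groups 134 and 135) both queries return ids 1 and 3. They only diverge once a group contains a tie.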
Use:
SELECT DISTINCT
t.groupid,
t.name
FROM TABLE t
JOIN (SELECT t.groupid,
MAX(t.age) 'max_age'
FROM TABLE t
GROUP BY t.groupid) x ON x.groupid = t.groupid
AND x.max_age = t.age
So what if there's 2+ people with the same age for a group? It'd be better to store the birthdate rather than age - you can always calculate the age for presentation.
Try this (assuming Group is synonym for Household)
Select * From Table t
Where Age = (Select Max(Age)
From Table
Where GroupId = t.GroupId)
If there are two or more "oldest" people in some household (they are all the same age and there is no one older), then this will return all of them, not just one at random.
If this is an issue, then you need to add another subquery to return an arbitrary key value for one person in that set.
Select * From Table t
Where Id =
(Select Max(Id) From Table
Where GroupId = t.GroupId
And Age =
(Select Max(Age) From Table
Where GroupId = t.GroupId))
SELECT GroupID, Name, Age
FROM table
INNER JOIN
(
SELECT GroupID, MAX(Age) AS OLDEST
FROM table
) AS OLDESTPEOPLE
ON
table.GroupID = OLDESTPEOPLE.GroupID
AND
table.Age = OLDESTPEOPLE.OLDEST
SELECT table.GroupID, Name, Age
FROM table
INNER JOIN
(
SELECT GroupID, MAX(Age) AS OLDEST
FROM table
GROUP BY GroupID
) AS OLDESTPEOPLE
ON
table.GroupID = OLDESTPEOPLE.GroupID
AND
table.Age = OLDESTPEOPLE.OLDEST