Select entry of each group having exactly 1 entry - sql

I am looking for an optimized query
let me show you a small example.
Lets suppose I have a table having three field studentId, teacherId and subject as
Now I want those data in which a physics teacher is teaching to only one student, i.e
teacher 300 is only teaching student 3 and so on.
What I have tried till now
select sid,tid from tabletesting with(nolock)
where tid in (select tid from tabletesting with(nolock)
where subject='physics' group by tid having count(tid) = 1)
and subject='physics'
The above query is working fine. But I want different solution in which I don't have to scan the same table twice.
I also tried using Rank() and Row_Number() but no result.
FYI :
I have showed you an example, this is not the actual table i am playing with, my table contain huge number of rows and columns and where clause is also very complex(i.e date comparison etc.), so I don't want to give the same where clause in subquery and outquery.

You can do this with window functions. Assuming that there are no duplicate students for a given teacher (as in your sample data):
select tt.sid, tt.tid
from (select tt.*, count(*) over (partition by teacher) as scnt
from TableTesting tt
) tt
where scnt = 1;
Another way to approach this, which might be more efficient, is to use an exists clause:
select tt.sid, tt.tid
from TableTesting tt
where not exists (select 1 from TableTesting tt1 where tt1.tid = tt.tid and tt1.sid <> tt.sid)

Another option is to use an analytic function:
select sid, tid, subject from
(
select sid, tid, subject, count(sid) over (partition by subject, tid) cnt
from tabletesting
) X
where cnt = 1

Related

SQL: find most common values for specific members in column 1

I have the following SQL related question:
Let us assume I have the following simple data table:
I would like to identify the most common street address and place it in column 3:
I think this should be fairly straight-forward using COUNT? Not quite sure how to go about it though. Any help is greatly appreciated
Regards
This is a very long method that I just wrote. It only lists the most frequent address. You have to get these values and insert them into the table. See if it works for you:
select * from
(select d.company, count(d.address) as final, c.maxcount,d.address
from dbo.test d inner join
(select a.company,max(a.add_count) as maxcount from
(select company,address,count(address) as add_count from dbo.test group by company,address)a
group by a.company) c
on (d.company = c.company)
group by d.company,c.maxcount,d.address)e
where e.maxcount=e.final
Here is a query in standard SQL. It first counts records per company and address, then ranks them per company giving the most often occurring address rank #1. Then it only keeps those best ranked address records, joins with the table again and shows the results.
select
mytable.company,
mytable.address,
ranked.address as most_common_address
from mytable
join
(
select
company,
address,
row_number() over (partition by company oder by cnt desc) as rn
from
(
select
company,
address,
count(*) over (partition by company, address) as cnt
from mytable
) counted
) ranked on ranked.rn = 1
and ranked.company = mytable.company
and ranked.address = mytable.address;
This select statement will give you the most frequent occurrence. Let us call this A.
SELECT `value`,
COUNT(`value`) AS `value_occurrence`
FROM `my_table`
GROUP BY `value`
ORDER BY `value_occurrence` DESC
LIMIT 1;
To INSERT this into your table,
INSERT INTO db (col1, col2, col3) VALUES (val1, val2, A)
Note that you want that whole select statment for A!
You don't mention your DBMS. Here is a solution for Oracle.
select
company,
address,
(
select stats_mode(address)
from mytable this_company_only
where this_company_only.company = mytable.company
) as most_common_address
from mytable;
This looks a bit clumsy, because STATS_MODE is only available as an aggregate function, not as an analytic window function.

Pulling max values grouped by a variable with other columns in SQL

Say I have three columns in a very large table: a timestamp variable (last_time_started), a player name (Michael Jordan), and the team he was on the last time he started (Washington Wizards, Chicago Bulls), how do I pull the last time a player started, grouped by player, showing the team? For example:
if I did
select max(last_time_started), player, team
from table
group by 2
I would not know which team the player was on when he played his last game, which is important to me.
In Postgres the most efficient way is to use distinct on():
SELECT DISTINCT ON (player)
last_time_started,
player,
team,
FROM the_table
ORDER BY player, last_time_started DESC;
Using a window function is usually the second fastest solution, using a join with a derived table is usually the slowest alternative.
Here's a couple of ways to do this in Postgres:
With windowing functions:
SELECT last_time_started, player, team
FROM
(
SELECT
last_time_started,
player,
team,
CASE WHEN max(last_time_started) OVER (PARTITION BY PLAYER) = last_time_started then 'X' END as max_last_time_started
FROM table
)
WHERE max_last_time_started = 'x';
Or with a correlated subquery:
SELECT last_time_started, player, team
FROM table t1
WHERE last_time_started = (SELECT max(last_time_started) FROM table WHERE table.player = t1.player);
Try this solution
select s.*
from table s
inner join (
select max(t.last_time_started) as last_time_started, t.player
from table t
group by t.player) v on s.player = t.player and s.last_time_started = t.last_time_started
Also this approach should be faster, because it does not contain join
select v.last_time_started,
v.player,
v.team
from (
select t.last_time_started,
t.player,
t.team,
row_number() over (partition by t.player order by last_time_started desc) as n
from table t
) v
where v.n = 1

SQL random sample with groups

I have a university graduate database and would like to extract a random sample of data of around 1000 records.
I want to ensure the sample is representative of the population so would like to include the same proportions of courses eg
I could do this using the following:
select top 500 id from degree where coursecode = 1 order by newid()
union
select top 300 id from degree where coursecode = 2 order by newid()
union
select top 200 id from degree where coursecode = 3 order by newid()
but we have hundreds of courses codes so this would be time consuming and I would like to be able to reuse this code for different sample sizes and don't particularly want to go through the query and hard code the sample sizes.
Any help would be greatly appreciated
You want a stratified sample. I would recommend doing this by sorting the data by course code and doing an nth sample. Here is one method that works best if you have a large population size:
select d.*
from (select d.*,
row_number() over (order by coursecode, newid) as seqnum,
count(*) over () as cnt
from degree d
) d
where seqnum % (cnt / 500) = 1;
EDIT:
You can also calculate the population size for each group "on the fly":
select d.*
from (select d.*,
row_number() over (partition by coursecode order by newid) as seqnum,
count(*) over () as cnt,
count(*) over (partition by coursecode) as cc_cnt
from degree d
) d
where seqnum < 500 * (cc_cnt * 1.0 / cnt)
Add a table for storing population.
I think it should be like this:
SELECT *
FROM (
SELECT id, coursecode, ROW_NUMBER() OVER (PARTITION BY coursecode ORDER BY NEWID()) AS rn
FROM degree) t
LEFT OUTER JOIN
population p ON t.coursecode = p.coursecode
WHERE
rn <= p.SampleSize
It is not necessary to partition the population at all.
If you are taking a sample of 1000 from a population among hundreds of course codes, it stands to reason that many of those course codes will not be selected at all in any one sampling.
If the population is uniform (say, a continuous sequence of student IDs), a uniformly-distributed sample will automatically be representative of population weighting by course code. Since newid() is a uniform random sampler, you're good to go out of the box.
The only wrinkle that you might encounter is if a student ID is a associated with multiple course codes. In this case make a unique list (temporary table or subquery) containing a sequential id, student id and course code, sample the sequential id from it, grouping by student id to remove duplicates.
I've done similar queries (but not on MS SQL) using a ROW_NUMBER approach:
select ...
from
( select ...
,row_number() over (partition by coursecode order by newid()) as rn
from degree
) as d
join sample size as s
on d.coursecode = s.coursecode
and d.rn <= s.samplesize

SQL query to select distinct row with minimum value

I want an SQL statement to get the row with a minimum value.
Consider this table:
id game point
1 x 5
1 z 4
2 y 6
3 x 2
3 y 5
3 z 8
How do I select the ids that have the minimum value in the point column, grouped by game? Like the following:
id game point
1 z 4
2 y 5
3 x 2
Use:
SELECT tbl.*
FROM TableName tbl
INNER JOIN
(
SELECT Id, MIN(Point) MinPoint
FROM TableName
GROUP BY Id
) tbl1
ON tbl1.id = tbl.id
WHERE tbl1.MinPoint = tbl.Point
This is another way of doing the same thing, which would allow you to do interesting things like select the top 5 winning games, etc.
SELECT *
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Point) as RowNum, *
FROM Table
) X
WHERE RowNum = 1
You can now correctly get the actual row that was identified as the one with the lowest score and you can modify the ordering function to use multiple criteria, such as "Show me the earliest game which had the smallest score", etc.
This will work
select * from table
where (id,point) IN (select id,min(point) from table group by id);
As this is tagged with sql only, the following is using ANSI SQL and a window function:
select id, game, point
from (
select id, game, point,
row_number() over (partition by game order by point) as rn
from games
) t
where rn = 1;
Ken Clark's answer didn't work in my case. It might not work in yours either. If not, try this:
SELECT *
from table T
INNER JOIN
(
select id, MIN(point) MinPoint
from table T
group by AccountId
) NewT on T.id = NewT.id and T.point = NewT.MinPoint
ORDER BY game desc
SELECT DISTINCT
FIRST_VALUE(ID) OVER (Partition by Game ORDER BY Point) AS ID,
Game,
FIRST_VALUE(Point) OVER (Partition by Game ORDER BY Point) AS Point
FROM #T
SELECT * from room
INNER JOIN
(
select DISTINCT hotelNo, MIN(price) MinPrice
from room
Group by hotelNo
) NewT
on room.hotelNo = NewT.hotelNo and room.price = NewT.MinPrice;
This alternative approach uses SQL Server's OUTER APPLY clause. This way, it
creates the distinct list of games, and
fetches and outputs the record with the lowest point number for that game.
The OUTER APPLY clause can be imagined as a LEFT JOIN, but with the advantage that you can use values of the main query as parameters in the subquery (here: game).
SELECT colMinPointID
FROM (
SELECT game
FROM table
GROUP BY game
) As rstOuter
OUTER APPLY (
SELECT TOP 1 id As colMinPointID
FROM table As rstInner
WHERE rstInner.game = rstOuter.game
ORDER BY points
) AS rstMinPoints
This is portable - at least between ORACLE and PostgreSQL:
select t.* from table t
where not exists(select 1 from table ti where ti.attr > t.attr);
Most of the answers use an inner query. I am wondering why the following isn't suggested.
select
*
from
table
order by
point
fetch next 1 row only // ... or the appropriate syntax for the particular DB
This query is very simple to write with JPAQueryFactory (a Java Query DSL class).
return new JPAQueryFactory(manager).
selectFrom(QTable.table).
setLockMode(LockModeType.OPTIMISTIC).
orderBy(QTable.table.point.asc()).
fetchFirst();
Try:
select id, game, min(point) from t
group by id

SQL - Displaying entries that are the max of a count?

CREATE TABLE doctor( patient CHAR(13), docname CHAR(30) );
Say I had a table like this, then how would I display the names of the doctors that have the most patients? Like if the most was three and two doctors had three patients then I would display both of their names.
This would get the max patients:
SELECT MAX(count)
FROM (SELECT COUNT(docname) FROM doctor GROUP BY docname) a;
This is all the doctors and how many patients they have:
SELECT docname, COUNT(docname) FROM doctor GROUP BY name;
Now I can't figure out how to combine them to list only the names of doctors who have the max patients.
Thanks.
This should do it.
SELECT docname, COUNT(*) FROM doctor GROUP BY name HAVING COUNT(*) =
(SELECT MAX(c) FROM
(SELECT COUNT(patient) AS c
FROM doctor
GROUP BY docname))
On the other hand if you require only the first entry, then
SELECT docname, COUNT(docname) FROM doctor
GROUP BY name
ORDER BY COUNT(docname) DESC LIMIT 1;
This should do it for you:
SELECT docname
FROM doctor
GROUP BY docname
HAVING COUNT(patient)=
(SELECT MAX(patientcount) FROM
(SELECT docname,COUNT(patient) AS patientcount
FROM doctor
GROUP BY docname) t1)
Here's another alternative that only has one subquery instead of two:
SELECT docname
FROM author
GROUP BY name
HAVING COUNT(*) = (
SELECT COUNT(*) AS c
FROM author
GROUP BY name
ORDER BY c DESC
LIMIT 1
)
Allowing for any feature in any ISO SQL specification since you did not specify a database product or version, and assuming that the table of patients is called "patients" and has a column called "docname", the following might give you what you wanted:
With PatientCounts As
(
Select docname
, Count(*) As PatientCount
From patient
Group By docname
)
, RankedCounts As
(
Select docname, PatientCount
, Rank() Over( Order By PatientCount ) As PatientCountRank
From PatientCounts
)
Select docname, PatientCount, PatientCountRank
From RankedCounts
Where PatientCountRank = 1
While using ... HAVING COUNT(*) = ( ...MAX().. ) works:
Within the query, it needs almost the same sub-query twice.
For most databases, it needs a 2nd level sub-query as MAX( COUNT(*) )
is not supported.
While using TOP / LIMIT / RANK etc works:
It uses SQL extensions for a specific database.
Also, using TOP / LIMIT of 1 will only give one row - what if there are two or more doctors with the same maximum number of patients?
I would break the problem into steps:
Get target field(s) and associated count
SELECT docName, COUNT( patient ) AS countX
FROM doctor
GROUP BY docName
Using the above as a 'statement scoped view', use this to get the max count row(s)
WITH x AS
(
SELECT docName, COUNT( patient ) AS countX
FROM doctor
GROUP BY docName
)
SELECT x.docName, x.countX
FROM x
WHERE x.countX = ( SELECT MAX( countX ) FROM x )
The WITH clause, which defines a 'statement scoped view', effectively gives named sub-queries that can be re-used within the same query.
While this solution, using statement scoped views, is longer, it is:
Easier to test
Self documenting
Extendable
It is easier to test as parts of the query can be run standalone.
It is self documenting as the query directly reflects the requirement
ie the statement scoped view lists the target field(s) and associated count.
It is extendable as if other conditions or fields are required, this can be easily added to the statement scoped view.
eg in this case, the table stucture should be changed to include a doctor-id as a primary key field and this should be part of the results.
Take both queries and join them together to get the max:
SELECT
docName,
m.MaxCount
FROM
author
INNER JOIN
(
SELECT
MAX(count) as MaxCount,
docName
FROM
(SELECT
COUNT(docname)
FROM
doctor
GROUP BY
docname
)
) m ON m.DocName = author.DocName
Another alternative using CTE:
with cte_DocPatients
as
(
select docname, count(*) as patientCount
from doctor
group by docname
)
select docname, patientCount from
cte_DocPatients where
patientCount = (select max(patientCount) from cte_DocPatients)
if you do not need to care about performance I think just sorting and pick first element. Something like this:
SELECT docname, COUNT(docname) as CNT
FROM doctor
WHERE docname = docname
GROUP BY docname
ORDER BY CNT DESC
LIMIT 1
This will give you each doctor name and respective count of treating patients
SELECT docname, COUNT(docname) as TreatingPatients FROM doctor
WHERE docname = docname
GROUP BY docname