How can I rewrite a SELECT query in this situation? - sql

Here are two tables in a parent/child relationship.
What I need to do is select students with their average mark:
CREATE TABLE dbo.Students(
Id int NOT NULL,
Name varchar(15) NOT NULL,
CONSTRAINT PK_Students PRIMARY KEY CLUSTERED (Id)
)
CREATE TABLE [dbo].[Results](
Id int NOT NULL,
Subject varchar(15) NOT NULL,
Mark int NOT NULL
)
ALTER TABLE [dbo].[Results] WITH CHECK ADD CONSTRAINT [FK_Results_Students] FOREIGN KEY([Id])
REFERENCES [dbo].[Students] ([Id])
I wrote a query like this:
SELECT name , coalesce(avg(r.[mark]),0) as Avmark
FROM students s
LEFT JOIN results r ON s.[id]=r.[id]
GROUP BY s.[name]
ORDER BY ISNULL(AVG(r.[mark]),0) DESC;
But the result is all students with their average mark in descending order. What I need is to restrict the result set to the students who have the highest average mark, i.e. if there are two students with an average mark of 50 and one with 25, I need to display only the students with 50. If only one student has the highest average mark, only that student must appear in the result set. What is the best way to do this?

SQL Server 2005+, using CTEs:
WITH grade_average AS (
SELECT r.id,
AVG(r.mark) 'avg_mark'
FROM RESULTS r
GROUP BY r.id),
highest_average AS (
SELECT MAX(ga.avg_mark) 'highest_avg_mark'
FROM grade_average ga)
SELECT DISTINCT
s.name,
ga.avg_mark
FROM STUDENTS s
JOIN grade_average ga ON ga.id = s.id
JOIN highest_average ha ON ha.highest_avg_mark = ga.avg_mark
Non-CTE equivalent:
SELECT DISTINCT
s.name,
ga.avg_mark
FROM STUDENTS s
JOIN (SELECT r.id,
AVG(r.mark) 'avg_mark'
FROM RESULTS r
GROUP BY r.id) ga ON ga.id = s.id
JOIN (SELECT MAX(ga.avg_mark) 'highest_avg_mark'
FROM (SELECT r.id,
AVG(r.mark) 'avg_mark'
FROM RESULTS r
GROUP BY r.id) ga) ha ON ha.highest_avg_mark = ga.avg_mark

If you're using a relatively new version of SQL Server, you can use WITH to make this simple to write:
WITH T AS (
SELECT
name,
coalesce(avg(r.[mark]),0) as mark
FROM students s
LEFT JOIN results r ON s.[id]=r.[id]
GROUP BY s.[name])
SELECT name as 'ФИО', mark as 'Средний бал'
FROM T
WHERE T.mark = (SELECT MAX(mark) from T)
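The "average per student, then keep only the maximum" idea can be tried end to end with a small script. This is a minimal sketch, not the answerer's code: it runs the same CTE shape on SQLite through Python's sqlite3 module, the sample rows (Ann, Bob, Cid, Dee) are invented for illustration, and grouping by s.Id as well as s.Name is an assumption added so that two students who share a name are not merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Results (
        Id INTEGER NOT NULL REFERENCES Students (Id),
        Subject TEXT NOT NULL,
        Mark INTEGER NOT NULL
    );
    -- Invented sample data (not from the question).
    INSERT INTO Students VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
    INSERT INTO Results VALUES
        (1, 'Math', 40), (1, 'Art', 60),   -- Ann averages 50
        (2, 'Math', 50),                   -- Bob averages 50
        (3, 'Math', 25);                   -- Cid averages 25; Dee has no results
""")

# Step 1 (CTE): one row per student with the average mark, 0 when there are none.
# Step 2: keep only the rows whose average equals the overall maximum (ties kept).
top_students = conn.execute("""
    WITH T AS (
        SELECT s.Id, s.Name, COALESCE(AVG(r.Mark), 0) AS AvMark
        FROM Students s
        LEFT JOIN Results r ON r.Id = s.Id
        GROUP BY s.Id, s.Name
    )
    SELECT Name, AvMark
    FROM T
    WHERE AvMark = (SELECT MAX(AvMark) FROM T)
    ORDER BY Name
""").fetchall()

print(top_students)  # [('Ann', 50.0), ('Bob', 50.0)]
```

Note that SQLite's AVG returns a floating-point value, whereas SQL Server's AVG over an int column returns an int, so on SQL Server you may want to cast Mark to a decimal type before averaging.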

Is it as simple as this? For all versions of SQL Server 2000+
SELECT TOP 1 WITH TIES
name, ISNULL(avg(r.[mark]),0) as AvMark
FROM
students s
LEFT JOIN
results r ON s.[id]=r.[id]
GROUP BY
s.[name]
ORDER BY
ISNULL(avg(r.[mark]),0) DESC;

SELECT name as 'ФИО',
coalesce(avg(r.[mark]),0) as 'Средний бал'
FROM students s
LEFT JOIN results r
ON s.[id]=r.[id]
GROUP BY s.[name]
HAVING AVG(r.[mark]) >= 50
ORDER BY ISNULL(AVG(r.[mark]),0) DESC
See the documentation about the HAVING clause.

Related

Make string_agg() return unique values only [duplicate]

I am working in SQL Server 2017 and I have the following two tables:
create table Computer (
Id int Identity(1, 1) not null,
Name varchar(100) not null,
constraint pk_computer primary key (Id)
);
create table HardDisk (
Id int Identity(1, 1) not null,
Interface varchar(100) not null,
ComputerId int not null,
constraint pk_harddisk primary key (Id),
constraint fk_computer_harddisk foreign key (ComputerId) references Computer(Id)
);
I have data such as:
Query
My current query is the following:
-- select query
select c.Id as computer_id,
string_agg(cast(hd.Interface as nvarchar(max)), ' ') as hard_disk_interfaces
from Computer c
left join HardDisk hd on c.Id = hd.ComputerId
group by c.Id;
This gets me the following:
computer_id | hard_disk_interfaces
-------------+----------------------
1 | SATA SAS
2 | SATA SAS SAS SAS SATA
However, I only want the distinct values, I'd like to end up with:
computer_id | hard_disk_interfaces
-------------+----------------------
1 | SATA SAS
2 | SATA SAS
I tried to put distinct in front of the string_agg, but that didn't work:
Incorrect syntax near the keyword 'distinct'.
Here's a db-fiddle.
string_agg is missing that feature, so you have to prepare the distinct list you want first and then aggregate it:
select id , string_agg(interface,' ') hard_disk_interfaces
from (
select distinct c.id, interface
from Computer c
left join HardDisk hd on c.Id = hd.ComputerId
) t group by id
For your original query:
select *
from ....
join (
<query above> ) as temp
...
group by ... , hard_disk_interfaces
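The "deduplicate first, aggregate second" pattern above can be checked with a small runnable sketch. It is an illustration under assumptions: it uses SQLite through Python's sqlite3 module, with group_concat standing in for SQL Server's STRING_AGG, and the rows are invented to reproduce the output shown in the question. Because SQLite does not guarantee the concatenation order, the script compares sorted tokens rather than the raw string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Computer (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE HardDisk (
        Id INTEGER PRIMARY KEY,
        Interface TEXT NOT NULL,
        ComputerId INTEGER NOT NULL REFERENCES Computer (Id)
    );
    -- Invented sample data: computer 2 has repeated interfaces.
    INSERT INTO Computer VALUES (1, 'pc-1'), (2, 'pc-2');
    INSERT INTO HardDisk (Interface, ComputerId) VALUES
        ('SATA', 1), ('SAS', 1),
        ('SATA', 2), ('SAS', 2), ('SAS', 2), ('SAS', 2), ('SATA', 2);
""")

# The inner query removes duplicate (computer, interface) pairs;
# the outer query then concatenates what is left, one row per computer.
rows = conn.execute("""
    SELECT id, group_concat(interface, ' ') AS hard_disk_interfaces
    FROM (
        SELECT DISTINCT c.Id AS id, hd.Interface AS interface
        FROM Computer c
        LEFT JOIN HardDisk hd ON hd.ComputerId = c.Id
    ) t
    GROUP BY id
    ORDER BY id
""").fetchall()

# Split each aggregated string so the check does not depend on token order.
interfaces = {computer_id: sorted(text.split(" ")) for computer_id, text in rows}
print(interfaces)
```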
A couple of other ways:
;WITH cte AS
(
SELECT c.Id, Interface = CONVERT(varchar(max), hd.Interface)
FROM dbo.Computer AS c
LEFT OUTER JOIN dbo.HardDisk AS hd ON c.Id = hd.ComputerId
GROUP BY c.Id, hd.Interface
)
SELECT Id, STRING_AGG(Interface, ' ')
FROM cte
GROUP BY Id;
or
SELECT c.Id, STRING_AGG(x.Interface, ' ')
FROM dbo.Computer AS c
OUTER APPLY
(
SELECT Interface = CONVERT(varchar(max), Interface)
FROM dbo.HardDisk WHERE ComputerID = c.Id
GROUP BY Interface
) AS x
GROUP BY c.Id;
Example db<>fiddle
If you are getting duplicates in a larger query with more joins, I would argue those duplicates are not duplicates coming out of STRING_AGG(), but rather duplicate rows coming from one or more of your 47 joins, not from this portion of the query. And I would guess that you still get those duplicates even if you leave out this join altogether.

SQL - Selecting highest scores for different categories

Let's say I've got a DB with 3 tables:
Players (PK id_player, name...),
Tournaments (PK id_tournament, name...),
Game (PK id_turn, FK id_tournament, FK id_player and score)
Players participate in tournaments. The Game table keeps track of each player's score in each tournament.
I want to create a view that looks like this:
tournament_name  Winner  highest_score
Tournament_1     Jones   300
Tournament_2     White   250
I tried different approaches, but I'm fairly new to SQL (and also to this forum).
I tried using a UNION ALL clause like:
select * from (select "Id_player", avg("score") as "Score" from
"Game" where "Id_tournament" = '1' group by "Id_player" order by
"Score" desc) where rownum <= 1
union all
select * from (select "Id_player", avg("score") as "Score" from
"Game" where "Id_tournament" = '2' group by "Id_player" order by
"Score" desc) where rownum <= 1;
And of course it works, but whenever a new tournament happens, I would have to manually add another select statement to this with Id_tournament = the next value.
EDIT:
So let's say that the player with id 1 scored 50 points in tournament A and player 2 scored 40 points. Player 1 wins, so the table should show only player 1 as the winner of this tournament (or, if possible, 2 or more players if it's a tie). The next row shows the winner of the second tournament. I don't think I'm going to record multiple games for one player in the same tournament, but if I did, it would probably use the average of all that player's scores.
EDIT2:
Create table scripts:
create table players
(id_player numeric(5) constraint pk_id_player primary key, name
varchar2(50));
create table tournaments
(id_tournament numeric(5) constraint pk_id_tournament primary key,
name varchar2(50));
create table game
(id_game numeric(5) constraint pk_game primary key, id_player
numeric(5) constraint fk_id_player references players(id_player),
id_tournament numeric(5) constraint fk_id_tournament references
tournaments(id_tournament), score numeric(3));
RDBM screenshot
FINAL EDIT:
OK, in case anyone is wondering, I used Jorge Campos' script, changed it a bit, and it works. Thank you all for helping. Unfortunately I cannot upvote comments yet, so I can only thank you by posting. Here's the final script:
select
t.name,
p.name as winner,
g.score
from
game g inner join tournaments t
on g.id_tournament = t.id_tournament
inner join players p
on g.id_player = p.id_player
inner join
(select g.id_tournament, g.id_player,
row_number() over (partition by t.name order by
score desc) as rd from game g join tournaments t on
g.id_tournament = t.id_tournament
) a
on g.id_player = a.id_player
and g.id_tournament = a.id_tournament
and a.rd=1
order by t.name, g.score desc;
This query could be simplified depending on the RDBMS you are using.
select
t.name,
p.name as winner,
g.score
from
game g inner join tournaments t
on g.id_tournament = t.id_tournament
inner join players p
on g.id_player = p.id_player
inner join
(select id_tournament,
id_player,
row_number() over (partition by id_tournament order by score desc) as rd
from game
) a
on g.id_player = a.id_player
and g.id_tournament = a.id_tournament
and a.rd=1
order by t.name, g.score desc
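The question also asks for all tied players to be listed as winners, which row_number() cannot do because it always numbers exactly one row as 1 per partition. The sketch below swaps in RANK() for that purpose. It is an illustration, not the accepted script: it runs on SQLite (3.25 or later for window functions) through Python's sqlite3 module, and the third player and all scores are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (id_player INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tournaments (id_tournament INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE game (
        id_game INTEGER PRIMARY KEY,
        id_player INTEGER REFERENCES players (id_player),
        id_tournament INTEGER REFERENCES tournaments (id_tournament),
        score INTEGER
    );
    -- Invented sample data: Tournament_2 ends in a tie at 250.
    INSERT INTO players VALUES (1, 'Jones'), (2, 'White'), (3, 'Green');
    INSERT INTO tournaments VALUES (1, 'Tournament_1'), (2, 'Tournament_2');
    INSERT INTO game (id_player, id_tournament, score) VALUES
        (1, 1, 300), (2, 1, 200),
        (2, 2, 250), (3, 2, 250), (1, 2, 100);
""")

# RANK() restarts at 1 for every tournament and gives equal scores the same
# rank, so filtering on rk = 1 keeps every tied winner.
winners = conn.execute("""
    SELECT tournament_name, winner, highest_score
    FROM (
        SELECT t.name AS tournament_name,
               p.name AS winner,
               g.score AS highest_score,
               RANK() OVER (PARTITION BY g.id_tournament
                            ORDER BY g.score DESC) AS rk
        FROM game g
        JOIN tournaments t ON t.id_tournament = g.id_tournament
        JOIN players p ON p.id_player = g.id_player
    ) AS ranked
    WHERE rk = 1
    ORDER BY tournament_name, winner
""").fetchall()

for row in winners:
    print(row)
```

Because no tournament id is hard-coded, a new tournament shows up in the result without editing the query, which was the limitation of the UNION ALL attempt.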
Assuming what you want is to display the high score of each player in each tournament,
your query would look like this in MS SQL Server:
select
t.name as tournament_name,
p.name as Winner,
Max(g.score) as [Highest_Score]
from Tournaments t
Inner join Game g on t.id_tournament=g.id_tournament
inner join Players p on p.id_player=g.id_player
group by
g.id_tournament,
g.id_player,
t.name,
p.name
Please check whether this works for you:
SELECT tournemntData.id_tournament ,
tournemntData.name ,
dbo.Players.name ,
tournemntData.Score
FROM dbo.Game
INNER JOIN ( SELECT dbo.Tournaments.id_tournament ,
dbo.Tournaments.name ,
MAX(dbo.Game.score) AS Score
FROM dbo.Game
INNER JOIN dbo.Tournaments ON Tournaments.id_tournament = Game.id_tournament
INNER JOIN dbo.Players ON Players.id_player = Game.id_player
GROUP BY dbo.Tournaments.id_tournament ,
dbo.Tournaments.name
) tournemntData ON tournemntData.id_tournament = Game.id_tournament
INNER JOIN dbo.Players ON Players.id_player = Game.id_player
WHERE tournemntData.Score = dbo.Game.score

Trying to use a count column in the where part of a query

There are 2 tables:
Students
  stuID
  camID FK
Campus
  camID PK
  camName
I am trying to find the campuses with more than 4 students, returning the camName, camID, and the number of students.
This is what I have so far:
SELECT
students.camID, campus.camName, SUM(students.stuID) as [count]
FROM
students
JOIN
campus ON campus.camID = students.camID
WHERE
[count] > 3
GROUP BY
students.camID, campus.camName
ORDER BY
[count]
All this gets me, though, is an error: Invalid column name 'count'.
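You can't use a column alias in a WHERE clause, because the WHERE clause is evaluated before the alias is even created. You also can't use an alias in the HAVING clause, so repeat the aggregate expression there. ORDER BY is evaluated after SELECT, so the alias does work in it. Note also that counting students needs COUNT, not SUM, which would add up the stuID values.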
SELECT students.camID, campus.camName, COUNT(students.stuID) as studentCount
FROM students
JOIN campus
ON campus.camID = students.camID
GROUP BY students.camID, campus.camName
HAVING COUNT(students.stuID) > 3
ORDER BY studentCount
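To see the HAVING filter in action, here is a minimal runnable sketch of the query above. It is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module, the campus names and student ids are invented, and the threshold is written as > 4 to match the stated requirement of "more than 4 students". The last query shows why SUM(stuID) is not a head count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campus (camID INTEGER PRIMARY KEY, camName TEXT);
    CREATE TABLE students (
        stuID INTEGER PRIMARY KEY,
        camID INTEGER REFERENCES campus (camID)
    );
    -- Invented sample data: campus 1 has five students, campus 2 has two.
    INSERT INTO campus VALUES (1, 'North'), (2, 'South');
    INSERT INTO students (stuID, camID) VALUES
        (101, 1), (102, 1), (103, 1), (104, 1), (105, 1),
        (201, 2), (202, 2);
""")

# The aggregate is repeated in HAVING; the alias is only used in ORDER BY.
big_campuses = conn.execute("""
    SELECT students.camID, campus.camName,
           COUNT(students.stuID) AS studentCount
    FROM students
    JOIN campus ON campus.camID = students.camID
    GROUP BY students.camID, campus.camName
    HAVING COUNT(students.stuID) > 4
    ORDER BY studentCount
""").fetchall()
print(big_campuses)  # [(1, 'North', 5)]

# SUM adds the id values together, COUNT counts the rows.
sum_vs_count = conn.execute(
    "SELECT SUM(stuID), COUNT(stuID) FROM students WHERE camID = 2"
).fetchone()
print(sum_vs_count)  # (403, 2)
```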
SELECT [t0].* FROM campus AS [t0]
INNER JOIN (SELECT COUNT(*) AS [value], [t1].camID
FROM students AS [t1]
GROUP BY [t1].camID )
AS [t2] ON [t0].camID = [t2].camID
WHERE [t2].[value] > 3
The first SQL products didn't support derived tables, so HAVING was invented. But now we do have derived tables, so we no longer need HAVING and indeed it can cause confusion (note legacy functionality is never removed from the SQL Standard):
SELECT *
FROM (
SELECT students.camID, campus.camName,
COUNT(students.stuID) as [count]
FROM students
JOIN campus ON campus.camID = students.camID
GROUP
BY students.camID, campus.camName
) AS DT1
WHERE [count] > 3
ORDER
BY [count]

Oracle sql - referencing tables

My school task was to get, from my movie database, the names of the actors who play in the movies with the highest rating.
I did it this way and it works:
select name,surname
from actor
where ACTORID in(
select actorid
from actor_movie
where MOVIEID in (
select movieid
from movie
where RATINGID in (
select ratingid
from rating
where PERCENT_CSFD = (
select max(percent_csfd)
from rating
)
)
)
);
the output is :
Gary Oldman
Sigourney Weaver
...but I'd also like to add the matching movie and its rating to this select. They are accessible in the inner selects, but I don't know how to join them with the outer select, in which I can work only with rows found in the Actor table.
Thank you for your answers.
You just need to join the tables properly. Afterwards you can simply add the columns you'd like to select. The final select could look like this:
select ac.name, ac.surname, -- go on selecting from the different tables
from actor ac
inner join actor_movie amo
on amo.actorid = ac.actorid
inner join movie mo
on amo.movieid = mo.movieid
inner join rating ra
on ra.ratingid = mo.ratingid
where ra.PERCENT_CSFD =
(select max(percent_csfd)
from rating)
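Once the nested IN lookups are flattened into joins, every joined table's columns become available in the select list. The sketch below illustrates that on SQLite through Python's sqlite3 module. The table layout is an assumption reconstructed from the column names in the question, the "title" column is hypothetical (the question never shows the movie table), and all rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rating (ratingid INTEGER PRIMARY KEY, percent_csfd INTEGER);
    -- "title" is a hypothetical column name used only for this illustration.
    CREATE TABLE movie (movieid INTEGER PRIMARY KEY, title TEXT,
                        ratingid INTEGER REFERENCES rating (ratingid));
    CREATE TABLE actor (actorid INTEGER PRIMARY KEY, name TEXT, surname TEXT);
    CREATE TABLE actor_movie (actorid INTEGER REFERENCES actor (actorid),
                              movieid INTEGER REFERENCES movie (movieid));
    -- Invented sample data: two movies share the top rating of 90.
    INSERT INTO rating VALUES (1, 90), (2, 70);
    INSERT INTO movie VALUES (10, 'Movie A', 1), (11, 'Movie B', 1),
                             (12, 'Movie C', 2);
    INSERT INTO actor VALUES (1, 'Gary', 'Oldman'), (2, 'Sigourney', 'Weaver'),
                             (3, 'Some', 'Actor');
    INSERT INTO actor_movie VALUES (1, 10), (2, 11), (3, 12);
""")

# Each join replaces one level of "WHERE x IN (SELECT ...)"; the scalar
# subquery on the maximum rating is the only subquery that remains.
top_rated = conn.execute("""
    SELECT ac.name, ac.surname, mo.title, ra.percent_csfd
    FROM actor ac
    JOIN actor_movie amo ON amo.actorid = ac.actorid
    JOIN movie mo ON mo.movieid = amo.movieid
    JOIN rating ra ON ra.ratingid = mo.ratingid
    WHERE ra.percent_csfd = (SELECT MAX(percent_csfd) FROM rating)
    ORDER BY ac.surname
""").fetchall()

for row in top_rated:
    print(row)
```

One difference from the IN version is worth keeping in mind: an actor who plays in several top-rated movies appears once per movie here, which is usually what you want when the movie is part of the output.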
A way to get your result with a slightly different method could be something like:
select *
from
(
select name, surname, percent_csfd, rank() over (order by percent_csfd desc) as rnk
from actor
inner join actor_movie
using (actorId)
inner join movie
using (movieId)
inner join rating
using (ratingId)
)
where rnk = 1
This uses rank() to evaluate the rank of each movie's rating and then filters for the movie(s) with the highest rating. rank() gives tied ratings the same rank, so all of them are kept, whereas row_number() would return exactly one row.

Flattening nested query in WHERE clause with NOT IN

Suppose I have these two tables, simplified for the purpose of the question:
CREATE TABLE merchandises
(
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
price INT NOT NULL
)
CREATE TABLE gifts
(
id BIGSERIAL NOT NULL PRIMARY KEY,
from_user VARCHAR(255) REFERENCES users(id),
to_user VARCHAR(255) REFERENCES users(id),
with_merchandise BIGINT REFERENCES merchandises(id)
)
The merchandises table lists available merchandises. The gifts table show records that a user has sent a merchandise to another user as gift (proper index is in place to avoid duplication).
What I would like to query is a list of merchandises that a user can send to another user, provided that the merchandises should not have been gifted before.
This is a query that works, but I hope to find one that does not have a nested query, thinking that it might perform better thanks to the PostgreSQL optimizer.
SELECT DISTINCT ON (m.id) m.id, m.name, m.description
FROM merchandises m
WHERE m.id NOT IN (
SELECT g.with_merchandise
FROM gifts g
WHERE g.from_user = 'some_user_id' AND g.to_user = 'some_other_user_id'
)
ORDER BY m.id ASC
LIMIT 20 OFFSET 0
In the previous attempt, I had this query, but I found out that it does not work:
SELECT DISTINCT ON (m.id) m.id, m.name, m.description
FROM merchandises m
LEFT JOIN gifts g
ON m.id = g.with_merchandise
WHERE g.id IS NULL
OR g.from_user <> 'some_user_id' AND g.to_user <> 'some_other_user_id'
ORDER BY m.id ASC
LIMIT 20 OFFSET 0
This query does not work because even though the WHERE clause filters out gift entries from two specific users, two other users might have given gifts with the same merchandise (same merchandise_id).
Even though you asked to remove the subquery, a NOT EXISTS subquery might run faster than NOT IN, especially if the NOT IN subquery returns a lot of values:
SELECT m.id, m.name, m.description
FROM merchandises m
WHERE NOT EXISTS (
SELECT 1
FROM gifts g
WHERE g.with_merchandise = m.id
AND g.from_user = 'some_user_id'
AND g.to_user = 'some_other_user_id'
)
This query can take advantage of a composite index on gifts(with_merchandise, from_user, to_user).
If you would still rather use a left join, then move your conditions on from_user and to_user from the WHERE clause to the ON clause:
SELECT m.id, m.name, m.description
FROM merchandises m
LEFT JOIN gifts g ON m.id = g.with_merchandise
AND g.from_user = 'some_user_id' AND g.to_user = 'some_other_user_id'
WHERE g.id IS NULL
ORDER BY m.id ASC
LIMIT 20 OFFSET 0
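The two anti-join forms above can be compared side by side with a small script. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than PostgreSQL, the users table is left out, and the user names and rows are invented. It also includes the original NOT IN form to show a behaviour worth knowing: with_merchandise is nullable in the schema, and a single NULL in the subquery result makes NOT IN return no rows at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE merchandises (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                               price INTEGER NOT NULL);
    CREATE TABLE gifts (
        id INTEGER PRIMARY KEY,
        from_user TEXT,
        to_user TEXT,
        with_merchandise INTEGER REFERENCES merchandises (id)
    );
    -- Invented sample data.
    INSERT INTO merchandises VALUES (1, 'mug', 5), (2, 'pen', 2), (3, 'cap', 9);
    INSERT INTO gifts (from_user, to_user, with_merchandise) VALUES
        ('alice', 'bob', 1),     -- the pair we filter on already exchanged the mug
        ('carol', 'dave', 2),    -- another pair gifted the pen; must not hide it
        ('alice', 'bob', NULL);  -- NULL is allowed because the column is nullable
""")
params = {"sender": "alice", "receiver": "bob"}

# Anti-join with NOT EXISTS: keep merchandise with no matching gift row.
not_exists = conn.execute("""
    SELECT m.id FROM merchandises m
    WHERE NOT EXISTS (
        SELECT 1 FROM gifts g
        WHERE g.with_merchandise = m.id
          AND g.from_user = :sender AND g.to_user = :receiver)
    ORDER BY m.id
""", params).fetchall()

# Anti-join with LEFT JOIN: the user conditions live in ON, so gifts between
# other users never match and cannot hide a merchandise row.
left_join = conn.execute("""
    SELECT m.id FROM merchandises m
    LEFT JOIN gifts g ON g.with_merchandise = m.id
        AND g.from_user = :sender AND g.to_user = :receiver
    WHERE g.id IS NULL
    ORDER BY m.id
""", params).fetchall()

# NOT IN: the NULL returned by the subquery turns every comparison that is
# not a definite match into NULL, so no row passes the filter.
not_in = conn.execute("""
    SELECT m.id FROM merchandises m
    WHERE m.id NOT IN (
        SELECT g.with_merchandise FROM gifts g
        WHERE g.from_user = :sender AND g.to_user = :receiver)
    ORDER BY m.id
""", params).fetchall()

print(not_exists, left_join, not_in)  # [(2,), (3,)] [(2,), (3,)] []
```

If the NOT IN form is kept, adding "AND g.with_merchandise IS NOT NULL" to its subquery, or declaring the column NOT NULL, avoids the empty result.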
This uses a left outer join and should perform well.
SELECT m.*
FROM merchandises m
LEFT OUTER JOIN (SELECT with_merchandise FROM gifts WHERE from_user = 'some_user_id' AND to_user = 'some_other_user_id' GROUP BY with_merchandise) g ON m.id = g.with_merchandise
WHERE g.with_merchandise IS NULL
ORDER BY m.id ASC
LIMIT 20 OFFSET 0