SQL - Comparing two rows and two columns

I'm studying SQL and can't seem to find an answer to this exercise.
Exercise: For all cases where the same reviewer rated the same movie twice and gave it a higher rating the second time, return the reviewer's name and the title of the movie.
I don't know how to compare 2 rows and then get the higher rating.
The tables' schemas are:
Movie ( mID, title, year, director )
English: There is a movie with ID number mID, a title, a release
year, and a director.
Reviewer ( rID, name )
English: The reviewer with ID number rID has a certain name.
Rating ( rID, mID, stars, ratingDate )
English: The reviewer rID gave the movie mID a number of stars rating (1-5) on a certain ratingDate.
By searching this forum, I've gotten this far:
select *
from rating a
join Reviewer rv on rv.rid = a.rid
where 1 < (select COUNT(*) from rating b
where b.rid = a.rid and b.mid = a.mid)
I'd also appreciate an explanation of the code, since even the query above confuses me.
/* Create the schema for our tables */
create table Movie(mID int, title text, year int, director text);
create table Reviewer(rID int, name text);
create table Rating(rID int, mID int, stars int, ratingDate date);
/* Populate the tables with our data */
insert into Movie values(101, 'Gone with the Wind', 1939, 'Victor Fleming');
insert into Movie values(102, 'Star Wars', 1977, 'George Lucas');
insert into Movie values(103, 'The Sound of Music', 1965, 'Robert Wise');
insert into Movie values(104, 'E.T.', 1982, 'Steven Spielberg');
insert into Movie values(105, 'Titanic', 1997, 'James Cameron');
insert into Movie values(106, 'Snow White', 1937, null);
insert into Movie values(107, 'Avatar', 2009, 'James Cameron');
insert into Movie values(108, 'Raiders of the Lost Ark', 1981, 'Steven Spielberg');
insert into Reviewer values(201, 'Sarah Martinez');
insert into Reviewer values(202, 'Daniel Lewis');
insert into Reviewer values(203, 'Brittany Harris');
insert into Reviewer values(204, 'Mike Anderson');
insert into Reviewer values(205, 'Chris Jackson');
insert into Reviewer values(206, 'Elizabeth Thomas');
insert into Reviewer values(207, 'James Cameron');
insert into Reviewer values(208, 'Ashley White');
insert into Rating values(201, 101, 2, '2011-01-22');
insert into Rating values(201, 101, 4, '2011-01-27');
insert into Rating values(202, 106, 4, null);
insert into Rating values(203, 103, 2, '2011-01-20');
insert into Rating values(203, 108, 4, '2011-01-12');
insert into Rating values(203, 108, 2, '2011-01-30');
insert into Rating values(204, 101, 3, '2011-01-09');
insert into Rating values(205, 103, 3, '2011-01-27');
insert into Rating values(205, 104, 2, '2011-01-22');
insert into Rating values(205, 108, 4, null);
insert into Rating values(206, 107, 3, '2011-01-15');
insert into Rating values(206, 106, 5, '2011-01-19');
insert into Rating values(207, 107, 5, '2011-01-20');
insert into Rating values(208, 104, 3, '2011-01-02');
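To see what the query in the question actually returns, here is a minimal sketch that runs it against the sample Reviewer and Rating rows in an in-memory SQLite database. Python's built-in sqlite3 module is only an assumed test harness; the SQL is the point. For every row `a` of Rating, the correlated subquery counts the rows `b` with the same rID and mID, and the WHERE keeps `a` only when that count is greater than 1, i.e. when the reviewer rated that movie more than once. Explicit columns and an ORDER BY are added for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Reviewer(rID int, name text);
    create table Rating(rID int, mID int, stars int, ratingDate date);
""")
# Sample rows copied from the question.
conn.executemany("insert into Reviewer values (?, ?)", [
    (201, 'Sarah Martinez'), (202, 'Daniel Lewis'), (203, 'Brittany Harris'),
    (204, 'Mike Anderson'), (205, 'Chris Jackson'), (206, 'Elizabeth Thomas'),
    (207, 'James Cameron'), (208, 'Ashley White'),
])
conn.executemany("insert into Rating values (?, ?, ?, ?)", [
    (201, 101, 2, '2011-01-22'), (201, 101, 4, '2011-01-27'),
    (202, 106, 4, None), (203, 103, 2, '2011-01-20'),
    (203, 108, 4, '2011-01-12'), (203, 108, 2, '2011-01-30'),
    (204, 101, 3, '2011-01-09'), (205, 103, 3, '2011-01-27'),
    (205, 104, 2, '2011-01-22'), (205, 108, 4, None),
    (206, 107, 3, '2011-01-15'), (206, 106, 5, '2011-01-19'),
    (207, 107, 5, '2011-01-20'), (208, 104, 3, '2011-01-02'),
])

rows = conn.execute("""
    select a.rID, a.mID, a.stars, a.ratingDate, rv.name
    from Rating a
    join Reviewer rv on rv.rID = a.rID
    -- correlated subquery: evaluated for each outer row a, it counts
    -- the ratings b by the same reviewer for the same movie
    where 1 < (select count(*) from Rating b
               where b.rID = a.rID and b.mID = a.mID)
    order by a.rID, a.ratingDate
""").fetchall()

for row in rows:
    print(row)
```

Both ratings of each repeated (reviewer, movie) pair come back, for reviewers 201 and 203, but nothing yet compares the two ratings; that comparison is what the answers below add.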

Something like this should work (there are other ways, too):
SELECT rev.name, m.title
FROM Reviewer rev
INNER JOIN Rating r1 on r1.rID = rev.rID
INNER JOIN Rating r2 on r2.rID = rev.rID and r2.mID = r1.mID
INNER JOIN Movie m on m.mID = r1.mID
WHERE r2.ratingDate > r1.ratingDate and r2.stars > r1.stars
Or, in this case, you can do it all in the JOIN (instead of the WHERE clause):
SELECT rev.name, m.title
FROM Reviewer rev
INNER JOIN Rating r1 on r1.rID = rev.rID
INNER JOIN Rating r2
on r2.rID = rev.rID
and r2.mID = r1.mID
and r2.ratingDate > r1.ratingDate
and r2.stars > r1.stars
INNER JOIN Movie m on m.mID = r1.mID
Explanation: I assume you know the JOIN syntax, so:
The trick is to join Rating twice.
The WHERE clause then checks whether there is a row where one of the ratings (from the same reviewer on the same movie) has a later ratingDate and more stars. That checks "gave it a higher rating the second time".
If a reviewer rated the same movie three times, each time higher than before, the query would return the same name and title once per matching pair of ratings. To avoid those duplicates you would add GROUP BY rev.name, m.title (or use SELECT DISTINCT). With your sample data this is not needed.
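A minimal sketch of the self-join above, run in SQLite through Python's built-in sqlite3 module (the harness is an assumption; the SQL is what matters). It compares the query with and without DISTINCT, and inserts one hypothetical third rating (reviewer 201, movie 101, 5 stars on 2011-01-30, not part of the exercise data) to show the duplicate rows that appear when a reviewer raises a rating more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Movie(mID int, title text, year int, director text);
    create table Reviewer(rID int, name text);
    create table Rating(rID int, mID int, stars int, ratingDate date);
""")
# Sample rows copied from the question.
conn.executemany("insert into Movie values (?, ?, ?, ?)", [
    (101, 'Gone with the Wind', 1939, 'Victor Fleming'),
    (102, 'Star Wars', 1977, 'George Lucas'),
    (103, 'The Sound of Music', 1965, 'Robert Wise'),
    (104, 'E.T.', 1982, 'Steven Spielberg'),
    (105, 'Titanic', 1997, 'James Cameron'),
    (106, 'Snow White', 1937, None),
    (107, 'Avatar', 2009, 'James Cameron'),
    (108, 'Raiders of the Lost Ark', 1981, 'Steven Spielberg'),
])
conn.executemany("insert into Reviewer values (?, ?)", [
    (201, 'Sarah Martinez'), (202, 'Daniel Lewis'), (203, 'Brittany Harris'),
    (204, 'Mike Anderson'), (205, 'Chris Jackson'), (206, 'Elizabeth Thomas'),
    (207, 'James Cameron'), (208, 'Ashley White'),
])
conn.executemany("insert into Rating values (?, ?, ?, ?)", [
    (201, 101, 2, '2011-01-22'), (201, 101, 4, '2011-01-27'),
    (202, 106, 4, None), (203, 103, 2, '2011-01-20'),
    (203, 108, 4, '2011-01-12'), (203, 108, 2, '2011-01-30'),
    (204, 101, 3, '2011-01-09'), (205, 103, 3, '2011-01-27'),
    (205, 104, 2, '2011-01-22'), (205, 108, 4, None),
    (206, 107, 3, '2011-01-15'), (206, 106, 5, '2011-01-19'),
    (207, 107, 5, '2011-01-20'), (208, 104, 3, '2011-01-02'),
])

def higher_second_time(distinct):
    """Self-join Rating: r1 is an earlier rating, r2 a later and higher one."""
    keyword = "distinct" if distinct else ""
    return conn.execute(f"""
        select {keyword} rev.name, m.title
        from Reviewer rev
        join Rating r1 on r1.rID = rev.rID
        join Rating r2 on r2.rID = rev.rID and r2.mID = r1.mID
        join Movie m on m.mID = r1.mID
        where r2.ratingDate > r1.ratingDate  -- r2 was given later
          and r2.stars > r1.stars            -- and with more stars
    """).fetchall()

base_rows = higher_second_time(distinct=False)  # exercise data only
print(base_rows)

# Hypothetical third rating, added only to expose the duplicate problem.
conn.execute("insert into Rating values (201, 101, 5, '2011-01-30')")
dup_rows = higher_second_time(distinct=False)   # one row per matching pair
unique_rows = higher_second_time(distinct=True)
print(len(dup_rows), unique_rows)
```

With three increasing ratings there are three (earlier, later) pairs that satisfy the WHERE clause, so the plain query repeats the same name and title three times; DISTINCT collapses them.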

Start by getting every reviewer who rated the same movie exactly twice:
select rid, mid
from rating r
group by rid, mid
having count(*) = 2
Now the question is: are the two ratings the same, or is the second one larger? To find out, join back to rating, and also include the two dates:
from (select rid, mid, min(ratingdate) as minratingdate, max(ratingdate) as maxratingdate
from rating r
group by rid, mid
having count(*) = 2
) twotimes join
rating r1
on r1.rid = twotimes.rid and r1.mid = twotimes.mid and r1.ratingdate = twotimes.minratingdate join
rating r2
on r2.rid = twotimes.rid and r2.mid = twotimes.mid and r2.ratingdate = twotimes.maxratingdate
This brings in the information about the two reviews. You can finish the query from here.
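One way to finish the query, sketched in SQLite through Python's built-in sqlite3 module (an assumed harness). This version groups by both rID and mID so that the two ratings are of the same movie, joins back to Rating on the earliest and latest dates, and keeps the pair only when the later rating has more stars. Like the approach above, it only considers reviewer/movie pairs rated exactly twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Movie(mID int, title text, year int, director text);
    create table Reviewer(rID int, name text);
    create table Rating(rID int, mID int, stars int, ratingDate date);
""")
# Sample rows copied from the question.
conn.executemany("insert into Movie values (?, ?, ?, ?)", [
    (101, 'Gone with the Wind', 1939, 'Victor Fleming'),
    (102, 'Star Wars', 1977, 'George Lucas'),
    (103, 'The Sound of Music', 1965, 'Robert Wise'),
    (104, 'E.T.', 1982, 'Steven Spielberg'),
    (105, 'Titanic', 1997, 'James Cameron'),
    (106, 'Snow White', 1937, None),
    (107, 'Avatar', 2009, 'James Cameron'),
    (108, 'Raiders of the Lost Ark', 1981, 'Steven Spielberg'),
])
conn.executemany("insert into Reviewer values (?, ?)", [
    (201, 'Sarah Martinez'), (202, 'Daniel Lewis'), (203, 'Brittany Harris'),
    (204, 'Mike Anderson'), (205, 'Chris Jackson'), (206, 'Elizabeth Thomas'),
    (207, 'James Cameron'), (208, 'Ashley White'),
])
conn.executemany("insert into Rating values (?, ?, ?, ?)", [
    (201, 101, 2, '2011-01-22'), (201, 101, 4, '2011-01-27'),
    (202, 106, 4, None), (203, 103, 2, '2011-01-20'),
    (203, 108, 4, '2011-01-12'), (203, 108, 2, '2011-01-30'),
    (204, 101, 3, '2011-01-09'), (205, 103, 3, '2011-01-27'),
    (205, 104, 2, '2011-01-22'), (205, 108, 4, None),
    (206, 107, 3, '2011-01-15'), (206, 106, 5, '2011-01-19'),
    (207, 107, 5, '2011-01-20'), (208, 104, 3, '2011-01-02'),
])

rows = conn.execute("""
    select rv.name, m.title
    from (select rID, mID,
                 min(ratingDate) as minratingdate,
                 max(ratingDate) as maxratingdate
          from Rating
          group by rID, mID
          having count(*) = 2) twotimes
    join Rating r1                     -- the earlier of the two ratings
      on r1.rID = twotimes.rID and r1.mID = twotimes.mID
     and r1.ratingDate = twotimes.minratingdate
    join Rating r2                     -- the later of the two ratings
      on r2.rID = twotimes.rID and r2.mID = twotimes.mID
     and r2.ratingDate = twotimes.maxratingdate
    join Reviewer rv on rv.rID = twotimes.rID
    join Movie m on m.mID = twotimes.mID
    where r2.stars > r1.stars          -- the second rating is higher
""").fetchall()
print(rows)
```

Reviewer 203 also rated movie 108 twice, but the later rating is lower, so only Sarah Martinez's pair survives the final filter.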

You can use GROUP BY and HAVING:
SELECT m.mId, m.Title, r.Name, ra.stars
FROM Movie m
JOIN (SELECT mId, rId, MAX(stars) stars
FROM Rating
GROUP BY mId, rId
HAVING COUNT(*) > 1) ra ON m.mId = ra.mId
JOIN Reviewer r ON ra.rId = r.rId
GROUP BY m.mId, m.Title, r.Name, ra.stars
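This returns every movie that the same reviewer rated more than once, along with the highest number of stars that reviewer gave it. It does not compare the rating dates, so it does not check that the later rating is the higher one.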
Good luck.

Related

SQL Query: Matching highest review with the review author - Movie database

I'm working on a SQL homework problem:
"For each movie that has at least one rating, find the movie title and total number of stars, the highest star and the person who gave highest star."
Database:
create table Movies(mID integer, title varchar(100));
create table Reviewers(rID integer, name varchar(100));
create table Ratings(rID integer, mID integer, stars integer);
insert into Movies values(101, 'Gone with the Wind');
insert into Movies values(102, 'Star Wars');
insert into Movies values(103, 'The Sound of Music');
insert into Reviewers values(201, 'Sarah Martinez');
insert into Reviewers values(202, 'Daniel Lewis');
insert into Reviewers values(203, 'Brittany Harris');
insert into Ratings values(201, 101, 2);
insert into Ratings values(203, 101, 4);
insert into Ratings values(203, 102, 4);
insert into Ratings values(203, 103, 4);
insert into Ratings values(202, 103, 2);
Best query I can come up with is:
SELECT title,
SUM(stars) AS total_stars,
MAX(stars) AS highest_stars,
name AS highest_stars_reviewer
FROM Movies
INNER JOIN Ratings USING(mID)
INNER JOIN Reviewers USING(rID)
GROUP BY mID;
The problem is that instead of returning the name of the reviewer who gave the highest stars review, the query returns the reviewer with the lower rID who reviewed the movie.
I would appreciate any help with this query to get the desired result.
The proper way to do this in SQL uses window functions:
SELECT m.title,
SUM(r.stars) AS total_stars,
MAX(r.stars) AS highest_stars,
MAX(CASE WHEN r.seqnum = 1 THEN rv.name END) AS highest_stars_reviewer
FROM Movies m INNER JOIN
(SELECT r.*,
ROW_NUMBER() OVER (PARTITION BY r.mID ORDER BY r.stars DESC) as seqnum
FROM Ratings r
) r
USING (mID) INNER JOIN
Reviewers rv
USING (rID)
GROUP BY m.title, m.mID;
Notes:
If you are using multiple tables in a query, you should be qualifying all column names.
The GROUP BY columns should match the SELECT columns -- although your version is okay because SQL allows you to do this when the aggregation key is a unique key.
This returns an arbitrary reviewer with the highest stars, in the event that there is more than one review with the maximum.
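A sketch of the window-function approach on the question's data, run in SQLite through Python's built-in sqlite3 module (an assumed harness; window functions need SQLite 3.25 or later). It partitions by the rating's mID, since the outer alias m is not visible inside the derived table, renames the inner alias to ra for readability, and uses explicit ON joins instead of USING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema and rows copied from the question.
conn.executescript("""
    create table Movies(mID integer, title varchar(100));
    create table Reviewers(rID integer, name varchar(100));
    create table Ratings(rID integer, mID integer, stars integer);
    insert into Movies values (101, 'Gone with the Wind'), (102, 'Star Wars'),
                              (103, 'The Sound of Music');
    insert into Reviewers values (201, 'Sarah Martinez'), (202, 'Daniel Lewis'),
                                 (203, 'Brittany Harris');
    insert into Ratings values (201, 101, 2), (203, 101, 4), (203, 102, 4),
                               (203, 103, 4), (202, 103, 2);
""")

rows = conn.execute("""
    SELECT m.title,
           SUM(r.stars) AS total_stars,
           MAX(r.stars) AS highest_stars,
           -- seqnum = 1 marks the highest-rated row within each movie
           MAX(CASE WHEN r.seqnum = 1 THEN rv.name END) AS highest_stars_reviewer
    FROM Movies m
    JOIN (SELECT ra.*,
                 ROW_NUMBER() OVER (PARTITION BY ra.mID
                                    ORDER BY ra.stars DESC) AS seqnum
          FROM Ratings ra) r ON r.mID = m.mID
    JOIN Reviewers rv ON rv.rID = r.rID
    GROUP BY m.mID, m.title
    ORDER BY m.mID
""").fetchall()

for row in rows:
    print(row)
```

In this data there are no ties for the top rating of a movie, so the reviewer picked by ROW_NUMBER() is deterministic here.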

Inner join an inner join with another inner join

I'm wondering if it is possible to inner join an inner join with another inner join.
I have a database of 3 tables:
people
athletes
coaches
Every athlete or coach must exist in the people table, but there are some people who are neither coaches nor athletes.
What I am trying to do is find a list of people who are active (meaning play or coach) in at least 3 different sports. The definition of active is they are either coaches, athletes or both a coach and an athlete for that sport.
The person table would consist of (id, name, height)
the athlete table would be (id, sport)
the coaching table would be (id, sport)
I have written 3 queries using inner joins which tell me who is both a coach and an athlete, who is just a coach and who is just an athlete.
For example,
1) who is both a coach and an athlete
select
person.id,
person.name,
coach.sport as 'Coaches and plays this sport'
from coach
inner join athlete
on coach.id = athlete.id
and coach.sport = athlete.sport
inner join person
on athlete.id = person.id
That brings up a list of everyone who both coaches and plays the same sport.
2) To find out who only coaches sports, I have used inner joins as below:
select
person.id,
person.name,
coach.sport as 'Coaches this sport'
from coach
inner join person
on coach.id = person.id
3) Then to find out who only plays sports, I've got the same as 2) but just tweaked the words
select
person.id,
person.name,
athlete.sport as 'Plays this sport'
from athlete
inner join person
on athlete.id = person.id
The end result is now I've got:
1) persons who both play and coach the same sport
2) persons who coach a sport
3) persons who play a sport
What I would like to know is how to find a list of people who play or coach at least 3 different sports? I can't figure it out because if someone plays and coaches a sport like hockey in table 1, then I don't want to count them in table 2 and 3.
I tried using these 3 inner joins to make a massive join table so that I could pick the distinct values but it is not working.
Is there an easier way to go about this without making sub-sub-queries?
What I would like to know is how to find a list of people who play / coach at least 3 different sports? I can't figure it out because if someone plays and coaches a sport like hockey in table 1, then I don't want to count them in table 2 and 3.
You can do something like this:
select p.id,min(p.name) name
from
person p inner join
(
select id,sport from athlete
union
select id,sport from coach
)
ca
on ca.id=p.id
group by p.id
having count(ca.sport)>2
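A minimal sketch of this union-then-count query in SQLite, run through Python's built-in sqlite3 module (an assumed harness). It uses the Bob/Carol/Sam rows from the #temp-table answer in this thread, as plain tables, with Sam given Id 3 on the assumption that his second Id 2 was a typo. Running it once with UNION and once with UNION ALL shows why the deduplication matters: Carol both plays and coaches Tennis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table person(id int, name text);
    create table athlete(id int, sport text);
    create table coach(id int, sport text);
    insert into person values (1, 'Bob'), (2, 'Carol'), (3, 'Sam');
    insert into athlete values (1, 'Golf'), (1, 'Football'),
                               (2, 'Tennis'), (2, 'Swimming');
    insert into coach values (1, 'Tennis'), (2, 'Tennis');
""")

def active_in_three(set_op):
    """People with more than 2 rows in the combined athlete/coach list."""
    return conn.execute(f"""
        select p.id, min(p.name) as name
        from person p
        join (select id, sport from athlete
              {set_op}
              select id, sport from coach) ca on ca.id = p.id
        group by p.id
        having count(ca.sport) > 2
        order by p.id
    """).fetchall()

union_rows = active_in_three("union")          # drops Carol's repeated (2, 'Tennis')
union_all_rows = active_in_three("union all")  # keeps it, so Carol counts 3 rows
print(union_rows)
print(union_all_rows)
```

With UNION only Bob qualifies; with UNION ALL Carol's Tennis is counted twice and she is wrongly included.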
CREATE TABLE #person (Id INT, Name VARCHAR(50));
CREATE TABLE #athlete (Id INT, Sport VARCHAR(50));
CREATE TABLE #coach (Id INT, Sport VARCHAR(50));
INSERT INTO #person (Id, Name) VALUES(1, 'Bob');
INSERT INTO #person (Id, Name) VALUES(2, 'Carol');
INSERT INTO #person (Id, Name) VALUES(3, 'Sam');
INSERT INTO #athlete (Id, Sport) VALUES(1, 'Golf');
INSERT INTO #athlete (Id, Sport) VALUES(1, 'Football');
INSERT INTO #coach (Id, Sport) VALUES(1, 'Tennis');
INSERT INTO #athlete (Id, Sport) VALUES(2, 'Tennis');
INSERT INTO #coach (Id, Sport) VALUES(2, 'Tennis');
INSERT INTO #athlete (Id, Sport) VALUES(2, 'Swimming');
-- so Bob has 3 sports, Carol has only 2 (she both coaches and plays Tennis)
SELECT p.Id, p.Name
FROM
(
SELECT Id, Sport
FROM #athlete
UNION -- this has an implicit "distinct"
SELECT Id, Sport
FROM #coach
) a
INNER JOIN #person p ON a.Id = p.Id
GROUP BY p.Id, p.Name
HAVING COUNT(*) >= 3
-- returns 1, Bob
I have created a SQL script with some test data; it should work in your case.
The two results are combined in the subselect with UNION.
UNION returns only non-duplicate rows, so every sport is counted only once per person.
Finally, the result set is grouped by person.Person_id and person.name.
Because of the HAVING clause, only people with 3 or more sports are returned:
CREATE TABLE person
(
Person_id int
,name varchar(50)
,height int
)
CREATE TABLE coach
(
id int
,sport varchar(50)
)
CREATE TABLE athlete
(
id int
,sport varchar(50)
)
INSERT INTO person VALUES
(1,'John', 130),
(2,'Jack', 150),
(3,'William', 170),
(4,'Averel', 190),
(5,'Lucky Luke', 180),
(6,'Jolly Jumper', 250),
(7,'Rantanplan ', 90)
INSERT INTO coach VALUES
(1,'Football'),
(1,'Hockey'),
(1,'Skiing'),
(2,'Tennis'),
(2,'Curling'),
(4,'Tennis'),
(5,'Volleyball')
INSERT INTO athlete VALUES
(1,'Football'),
(1,'Hockey'),
(2,'Tennis'),
(2,'Volleyball'),
(2,'Hockey'),
(4,'Tennis'),
(5,'Volleyball'),
(3,'Tennis'),
(6,'Volleyball'),
(6,'Tennis'),
(6,'Hockey'),
(6,'Football'),
(6,'Cricket')
SELECT person.Person_id
,person.name
FROM person
INNER JOIN (
SELECT id
,sport
FROM athlete
UNION
SELECT id
,sport
FROM coach
) sports
ON sports.id = person.Person_id
GROUP BY person.Person_id
,person.name
HAVING COUNT(*) >= 3
ORDER BY Person_id
What matters for your answer is the coaches and athletes, i.e. people who are coaches or athletes. That is a union (rows in one table or the other), not an inner join (rows in one table and the other). (An outer join involves a union, so there is a complicated way to use one here.) But there's no point in getting that by unioning only-coaches, only-athletes and coach-athletes.
The idiomatic approach is to group and count the union of Athletes and Coaches:
select id
from (select * from Athletes union select * from Coaches) as u
group by id
having COUNT(*) >= 3
Alternatively, you want ids of people who coach or play a 1st sport and coach or play a 2nd sport and coach or play a 3rd sport where the sports are all different.
with u as (select * from Athletes union select * from Coaches)
select distinct u1.id
from u u1
join u u2 on u1.id = u2.id
join u u3 on u2.id = u3.id
where u1.sport <> u2.sport and u2.sport <> u3.sport and u1.sport <> u3.sport
If you wanted names you would join that with People.
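A sketch comparing the two formulations above in SQLite, run through Python's built-in sqlite3 module (an assumed harness), on the coach and athlete rows from the SQL Server answer in this thread (John, Jack, ..., Jolly Jumper), stored in tables named Athletes and Coaches. It also counts the rows the three-way self-join produces before DISTINCT: each qualifying id appears once per ordered triple of different sports, so DISTINCT is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Athletes(id int, sport text);
    create table Coaches(id int, sport text);
    insert into Coaches values (1,'Football'), (1,'Hockey'), (1,'Skiing'),
        (2,'Tennis'), (2,'Curling'), (4,'Tennis'), (5,'Volleyball');
    insert into Athletes values (1,'Football'), (1,'Hockey'), (2,'Tennis'),
        (2,'Volleyball'), (2,'Hockey'), (4,'Tennis'), (5,'Volleyball'),
        (3,'Tennis'), (6,'Volleyball'), (6,'Tennis'), (6,'Hockey'),
        (6,'Football'), (6,'Cricket');
""")

# Distinct (id, sport) pairs from either table.
UNION_CTE = "with u as (select id, sport from Athletes union select id, sport from Coaches)"

# Group-and-count formulation.
by_count = conn.execute(UNION_CTE + """
    select id from u group by id having count(*) >= 3 order by id
""").fetchall()

# Three-way self-join: three pairwise-different sports for the same id.
TRIPLES = """
    from u u1
    join u u2 on u1.id = u2.id
    join u u3 on u2.id = u3.id
    where u1.sport <> u2.sport and u2.sport <> u3.sport and u1.sport <> u3.sport
"""
raw = conn.execute(UNION_CTE + " select u1.id " + TRIPLES).fetchall()
by_join = conn.execute(
    UNION_CTE + " select distinct u1.id " + TRIPLES + " order by u1.id").fetchall()

print(by_count, by_join, len(raw))
```

Both formulations return ids 1, 2 and 6. Without DISTINCT the self-join yields 6 + 24 + 60 = 90 rows for people with 3, 4 and 5 sports respectively.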
See also: Is there any rule of thumb to construct SQL query from a human-readable description? https://stackoverflow.com/a/33952141/3404097

PostgreSQL: Get an entity with all his relationships

I have a table "Cars" and a table "Person". A Person drives many Cars and a Car can be driven by many People so I have another table "Person_Car" which has both id's per row.
Car(id, name)
Person(id, name)
Person_Car(car_id, person_id)
How can I get a list of all people with the cars they drive (car names concatenated), something like this:
("John", "Car 1, Car 2, Car 3")
("Kate", "Car 2, Car 4, Car 5")
Example is here: http://sqlfiddle.com/#!15/ba949/1
Test data:
Create table Car(id int, name text);
Create table Person(id int, name text);
Create table Person_Car(car_id int, person_id int);
INSERT INTO Car VALUES (1, 'Car 1'),
(2, 'Car 2'),
(3, 'Car 3'),
(4, 'Car 4'),
(5, 'Car 5');
INSERT INTO Person VALUES(1, 'John'), (2, 'Kate');
INSERT INTO Person_Car VALUES (1,1), (2,1), (3,1), (2,2), (4,2), (5,2);
Your desired code:
SELECT p.name, array_to_string(array_agg(c.name), ',') FROM Person p
INNER JOIN Person_Car pc ON p.id=pc.person_id
INNER JOIN Car c ON c.id=pc.car_id
GROUP BY p.id, p.name
Output:
John Car 1,Car 2,Car 3
Kate Car 2,Car 4,Car 5
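SQLite has no array_agg; its group_concat(X, separator) aggregate plays the same role, so the sketch below swaps in that different function and runs it through Python's built-in sqlite3 module (an assumed harness) on the question's test data. SQLite does not guarantee the order of the concatenated names, so the check compares them as sorted lists. In PostgreSQL, string_agg(c.name, ', ') is a shorter equivalent of array_to_string(array_agg(c.name), ', ').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Test data copied from the question.
conn.executescript("""
    create table Car(id int, name text);
    create table Person(id int, name text);
    create table Person_Car(car_id int, person_id int);
    insert into Car values (1, 'Car 1'), (2, 'Car 2'), (3, 'Car 3'),
                           (4, 'Car 4'), (5, 'Car 5');
    insert into Person values (1, 'John'), (2, 'Kate');
    insert into Person_Car values (1,1), (2,1), (3,1), (2,2), (4,2), (5,2);
""")

rows = conn.execute("""
    select p.name, group_concat(c.name, ', ') as cars
    from Person p
    join Person_Car pc on pc.person_id = p.id
    join Car c on c.id = pc.car_id
    group by p.id, p.name   -- group by the key, so two people named alike stay apart
    order by p.id
""").fetchall()

for name, cars in rows:
    print(name, cars)
```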
In case you want to avoid the GROUP BY:
Option 1
WITH foreignKeyTable AS
(
SELECT pc.person_id, c.name
FROM Car c
JOIN Person_Car pc ON pc.car_id = c.id
)
SELECT p.name
, array_to_string
(ARRAY(
SELECT fkt.name
FROM foreignKeyTable fkt
WHERE fkt.person_id = p.id
)::text[], ','::text, 'empty'::text)
FROM Person p;
Option 2
SELECT p.name
, array_to_string
(ARRAY(
SELECT c.name
FROM Car c
JOIN Person_Car pc ON pc.car_id = c.id
WHERE pc.person_id = p.id
)::text[], ','::text, 'empty'::text)
FROM Person p;

SQL query involving NULL values

The following are the tables I am working with:
Movie (mID, title, year, director)
Reviewer (rID, name)
Rating (rID, mID, stars, ratingDate)
Which statement would I use to display all reviewers that have a NULL value for the date (ratingDate)? This means I need to extract information from both the Reviewer and Rating tables.
I have tried different things with the IS NULL command but to no avail. Any help would be appreciated.
SELECT
rev.rID
, rev.name
, rat.mID
, rat.stars
FROM
Reviewer AS rev
JOIN
Rating AS rat
ON rat.rID = rev.rID
WHERE
rat.ratingDate IS NULL
SELECT Reviewer.name
FROM Reviewer
INNER JOIN Rating
ON Reviewer.rID = Rating.rID
WHERE Rating.ratingDate IS NULL
Try this, I think it will do.
It should be trivial:
select *
from rating
join reviewer on rating.rid = reviewer.rid
join movie on rating.mid = movie.mid
where ratingdate is null
If that returns nothing, there might be another problem, e.g. ratingDate is not NULL but an empty string (ratingDate = '').
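A minimal sketch of the IS NULL query on the exercise's sample data, run in SQLite through Python's built-in sqlite3 module (an assumed harness). An inner join is enough here because only reviewers who have at least one rating with a NULL date are wanted. The second query shows the common pitfall: comparing with = NULL evaluates to NULL rather than true, so it matches no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Movie(mID int, title text, year int, director text);
    create table Reviewer(rID int, name text);
    create table Rating(rID int, mID int, stars int, ratingDate date);
""")
# Sample rows copied from the exercise.
conn.executemany("insert into Movie values (?, ?, ?, ?)", [
    (101, 'Gone with the Wind', 1939, 'Victor Fleming'),
    (102, 'Star Wars', 1977, 'George Lucas'),
    (103, 'The Sound of Music', 1965, 'Robert Wise'),
    (104, 'E.T.', 1982, 'Steven Spielberg'),
    (105, 'Titanic', 1997, 'James Cameron'),
    (106, 'Snow White', 1937, None),
    (107, 'Avatar', 2009, 'James Cameron'),
    (108, 'Raiders of the Lost Ark', 1981, 'Steven Spielberg'),
])
conn.executemany("insert into Reviewer values (?, ?)", [
    (201, 'Sarah Martinez'), (202, 'Daniel Lewis'), (203, 'Brittany Harris'),
    (204, 'Mike Anderson'), (205, 'Chris Jackson'), (206, 'Elizabeth Thomas'),
    (207, 'James Cameron'), (208, 'Ashley White'),
])
conn.executemany("insert into Rating values (?, ?, ?, ?)", [
    (201, 101, 2, '2011-01-22'), (201, 101, 4, '2011-01-27'),
    (202, 106, 4, None), (203, 103, 2, '2011-01-20'),
    (203, 108, 4, '2011-01-12'), (203, 108, 2, '2011-01-30'),
    (204, 101, 3, '2011-01-09'), (205, 103, 3, '2011-01-27'),
    (205, 104, 2, '2011-01-22'), (205, 108, 4, None),
    (206, 107, 3, '2011-01-15'), (206, 106, 5, '2011-01-19'),
    (207, 107, 5, '2011-01-20'), (208, 104, 3, '2011-01-02'),
])

rows = conn.execute("""
    select rev.name, m.title, rat.stars
    from Rating rat
    join Reviewer rev on rev.rID = rat.rID
    join Movie m on m.mID = rat.mID
    where rat.ratingDate is null   -- IS NULL, not = NULL
    order by rev.name
""").fetchall()

# "= null" yields NULL (not true) for every row, so the count is 0.
wrong = conn.execute(
    "select count(*) from Rating where ratingDate = null").fetchone()[0]

print(rows)
print(wrong)
```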
You need to use a LEFT JOIN, that is, a join that keeps every row of the left table even when there is no matching row on the right (the right-hand columns are then NULL).

How to get results of this particular "query"

NOTE: I accidentally put another question's sentence in here (massive apologies on my part). I have updated this post as of Wednesday 14th March at 23:21 with the correct question.
I spent a few hours trying to figure out this question on my own, but realised I had wasted too much productive time and should have asked someone sooner. I had a decent crack at it and came close, but cannot get the final solution I need. What I am supposed to get is:
For all cases where the same reviewer rated the same movie twice and gave it a higher rating the second time, return the reviewer's name and the title of the movie.
This is the query I managed to get here:
SELECT reviewer.name, movie.title, rating.stars
FROM (reviewer JOIN rating ON reviewer.rid = rating.rid)
JOIN movie ON movie.mid = rating.mid
GROUP BY reviewer.name
HAVING COUNT(*) >= 2
ORDER BY reviewer.name DESC
(I have a feeling there is a missing WHERE clause from the above query, but am not sure where to place it)
(From what I have learned, RIGHT and FULL OUTER JOINs are not currently supported in SQLite)
And here is the DB code with the tables and data:
/* Delete the tables if they already exist */
drop table if exists Movie;
drop table if exists Reviewer;
drop table if exists Rating;
/* Create the schema for our tables */
create table Movie(mID int, title text, year int, director text);
create table Reviewer(rID int, name text);
create table Rating(rID int, mID int, stars int, ratingDate date);
/* Populate the tables with our data */
insert into Movie values(101, 'Gone with the Wind', 1939, 'Victor Fleming');
insert into Movie values(102, 'Star Wars', 1977, 'George Lucas');
insert into Movie values(103, 'The Sound of Music', 1965, 'Robert Wise');
insert into Movie values(104, 'E.T.', 1982, 'Steven Spielberg');
insert into Movie values(105, 'Titanic', 1997, 'James Cameron');
insert into Movie values(106, 'Snow White', 1937, null);
insert into Movie values(107, 'Avatar', 2009, 'James Cameron');
insert into Movie values(108, 'Raiders of the Lost Ark', 1981, 'Steven Spielberg');
insert into Reviewer values(201, 'Sarah Martinez');
insert into Reviewer values(202, 'Daniel Lewis');
insert into Reviewer values(203, 'Brittany Harris');
insert into Reviewer values(204, 'Mike Anderson');
insert into Reviewer values(205, 'Chris Jackson');
insert into Reviewer values(206, 'Elizabeth Thomas');
insert into Reviewer values(207, 'James Cameron');
insert into Reviewer values(208, 'Ashley White');
insert into Rating values(201, 101, 2, '2011-01-22');
insert into Rating values(201, 101, 4, '2011-01-27');
insert into Rating values(202, 106, 4, null);
insert into Rating values(203, 103, 2, '2011-01-20');
insert into Rating values(203, 108, 4, '2011-01-12');
insert into Rating values(203, 108, 2, '2011-01-30');
insert into Rating values(204, 101, 3, '2011-01-09');
insert into Rating values(205, 103, 3, '2011-01-27');
insert into Rating values(205, 104, 2, '2011-01-22');
insert into Rating values(205, 108, 4, null);
insert into Rating values(206, 107, 3, '2011-01-15');
insert into Rating values(206, 106, 5, '2011-01-19');
insert into Rating values(207, 107, 5, '2011-01-20');
insert into Rating values(208, 104, 3, '2011-01-02');
I have another relatively similar question like this, but if I get some help on this one I should be able to apply the patterns and techniques from this one to the next one.
Thanks in advance! :)
I have added an inner join with a derived table that returns the maximum stars per movie. Because of the inner join between movies and ratings, only movies with ratings are retrieved. Joining it back to the main query gives the maximum stars per movie.
Note: you stated that you wish to order by movie title but your query orders by reviewer.
SELECT reviewer.name, movie.title, rating.stars, maxStarsPerMovie.MaxStars
FROM (reviewer JOIN rating ON reviewer.rid = rating.rid)
JOIN movie ON movie.mid = rating.mid
join
(
select movie.mid, max(rating.stars) MaxStars
from movie
inner join rating
on movie.mid = rating.mid
group by movie.mid
) maxStarsPerMovie
on movie.mid = maxStarsPerMovie.mid
ORDER BY reviewer.name DESC
EDIT: the requirements changed. This query returns the reviewers who later changed their opinion of a movie in its favor. It does so by joining Rating a second time, adding two filters on stars and date to the join.
SELECT reviewer.name, movie.title, rating.ratingDate, rating.stars,
newRating.ratingDate newRatingDate, newRating.Stars newRatingStars
FROM (reviewer JOIN rating ON reviewer.rid = rating.rid)
JOIN movie ON movie.mid = rating.mid
inner join rating newRating
on newRating.mid = movie.mid
and newRating.rid = reviewer.rid
and newRating.ratingdate > rating.ratingdate
and newRating.stars > rating.stars
ORDER BY reviewer.name, movie.title
From the description of the requirement:
For each movie that has at least one rating, find the highest number of stars that movie received. Return the movie title and number of stars, sorted by movie title.
The reviewer details do not appear to be required - only the Movie and the maximum stars.
Therefore, I suggest:
SELECT movie.mid, MAX(movie.title) as title, MAX(rating.stars) as max_stars
FROM rating
JOIN movie ON movie.mid = rating.mid
GROUP BY movie.mid
ORDER BY 2, 1
You were probably doing this for Stanford's SQL mini-course, as I just did. Here's what I got for my answer (I had no experience with SQL prior to watching the lectures, so hopefully this isn't too terrible):
Start with a query that finds each rID for a reviewer who rated a movie twice and gave it a higher score the second time:
select R1.rID from Rating R1, Rating R2 where R1.mID = R2.mID and R1.rID = R2.rID
and R1.ratingDate < R2.ratingDate and R2.stars > R1.stars;
Think of R1 as the first rating of a particular movie by a particular reviewer, and R2 as the second.
We need to be talking about 2 reviews of the same movie by the same person, hence R1.mID = R2.mID and R1.rID = R2.rID. Next, to make sure that R1 was indeed first, chronologically, we set R1.ratingDate < R2.ratingDate, and to make sure that R2 was indeed given a greater score, we set R2.stars > R1.stars.
You can check that this gives us rID 201, which is the correct answer (you can verify it against the data). Now we need to display the reviewer's name and the movie title instead of the rID.
I did this by taking the cross product of all 3 relations (I suppose using joins would be cleaner?), removing duplicates, and using the query listed above as a subquery in the where clause:
select distinct name, title
from Movie, Reviewer, Rating
where Movie.mID = Rating.mID and Reviewer.rID = Rating.rID and (Rating.rID, Rating.mID) in (select R1.rID, R1.mID from
Rating R1, Rating R2 where R1.mID = R2.mID and R1.rID = R2.rID and R1.ratingDate < R2.ratingDate
and R2.stars > R1.stars);
In the where clause I just made the cross products into natural joins by setting the respective mIDs and rIDs equal, and made sure that each (rID, mID) pair (written Rating.rID and Rating.mID for disambiguation) is one found by the initial query, extended to also select R1.mID. Filtering on Rating.rID alone would also return any other movie the same reviewer rated.
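A sketch of why the subquery should match the (rID, mID) pair rather than rID alone, run in SQLite through Python's built-in sqlite3 module (an assumed harness). One hypothetical extra rating, Sarah Martinez giving Star Wars 3 stars on 2011-02-01, is added; it is not part of the exercise data. With it, filtering on rID alone also returns Star Wars, while the pair version, written as a row-value IN (supported by SQLite 3.15 and later), does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Movie(mID int, title text, year int, director text);
    create table Reviewer(rID int, name text);
    create table Rating(rID int, mID int, stars int, ratingDate date);
""")
# Sample rows copied from the exercise.
conn.executemany("insert into Movie values (?, ?, ?, ?)", [
    (101, 'Gone with the Wind', 1939, 'Victor Fleming'),
    (102, 'Star Wars', 1977, 'George Lucas'),
    (103, 'The Sound of Music', 1965, 'Robert Wise'),
    (104, 'E.T.', 1982, 'Steven Spielberg'),
    (105, 'Titanic', 1997, 'James Cameron'),
    (106, 'Snow White', 1937, None),
    (107, 'Avatar', 2009, 'James Cameron'),
    (108, 'Raiders of the Lost Ark', 1981, 'Steven Spielberg'),
])
conn.executemany("insert into Reviewer values (?, ?)", [
    (201, 'Sarah Martinez'), (202, 'Daniel Lewis'), (203, 'Brittany Harris'),
    (204, 'Mike Anderson'), (205, 'Chris Jackson'), (206, 'Elizabeth Thomas'),
    (207, 'James Cameron'), (208, 'Ashley White'),
])
conn.executemany("insert into Rating values (?, ?, ?, ?)", [
    (201, 101, 2, '2011-01-22'), (201, 101, 4, '2011-01-27'),
    (202, 106, 4, None), (203, 103, 2, '2011-01-20'),
    (203, 108, 4, '2011-01-12'), (203, 108, 2, '2011-01-30'),
    (204, 101, 3, '2011-01-09'), (205, 103, 3, '2011-01-27'),
    (205, 104, 2, '2011-01-22'), (205, 108, 4, None),
    (206, 107, 3, '2011-01-15'), (206, 106, 5, '2011-01-19'),
    (207, 107, 5, '2011-01-20'), (208, 104, 3, '2011-01-02'),
])

# Hypothetical extra rating, added only to expose the difference.
conn.execute("insert into Rating values (201, 102, 3, '2011-02-01')")

# Same reviewer, same movie, later rating with more stars.
HIGHER_LATER = """from Rating R1, Rating R2
    where R1.mID = R2.mID and R1.rID = R2.rID
      and R1.ratingDate < R2.ratingDate and R2.stars > R1.stars"""

def reviewer_titles(condition):
    """Cross-product style query from the answer, with a pluggable filter."""
    return conn.execute(f"""
        select distinct name, title
        from Movie, Reviewer, Rating
        where Movie.mID = Rating.mID and Reviewer.rID = Rating.rID
          and {condition}
        order by title
    """).fetchall()

# rID alone: keeps every movie reviewer 201 rated.
by_rid = reviewer_titles(f"Rating.rID in (select R1.rID {HIGHER_LATER})")
# (rID, mID) pair: keeps only the movie that was re-rated higher.
by_pair = reviewer_titles(
    f"(Rating.rID, Rating.mID) in (select R1.rID, R1.mID {HIGHER_LATER})")

print(by_rid)
print(by_pair)
```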
I would suggest using a self-join in a subquery to identify the reviewer, then using the result to get the reviewer's name and the movie title:
SELECT name, title
FROM Reviewer R
JOIN (SELECT Ra.mID AS mID, Ra.rID AS rID
      FROM Rating Ra
      JOIN Rating Rb ON Ra.mID = Rb.mID
      WHERE Ra.rID = Rb.rID
        AND Ra.ratingDate < Rb.ratingDate
        AND Ra.stars < Rb.stars) AS Tmp ON R.rID = Tmp.rID
JOIN Movie M ON M.mID = Tmp.mID