Remove duplicate rows from answer of below query - sql

List all directors who directed 10 movies or more, in descending order of the number of movies they directed.
Using DISTINCT before d.name does not help.
import pandas as pd
# conn is an existing database connection
result = pd.read_sql_query("""
SELECT d.name, count(*) AS num
FROM PERSON d, M_DIRECTOR md
WHERE d.Pid = md.Pid
GROUP BY d.Pid, d.name
HAVING COUNT(*) >= 10
ORDER BY count(*) DESC
""", conn)

Use proper explicit joins between the tables and count distinct movies:
select
p.name,
count(distinct d.mid) num
from person p
inner join m_director d on d.pid = p.pid
inner join movie m on m.mid = d.mid
group by p.pid, p.name
having num >= 10
order by num desc
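This minimal sketch uses Python's built-in sqlite3 module with made-up rows to show why counting distinct movies matters. The data is hypothetical; only the table and column names come from the query above. One (mid, pid) pair is stored twice in m_director, so COUNT(*) over-counts while COUNT(DISTINCT d.mid) does not.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not the real data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (pid TEXT, name TEXT);
CREATE TABLE m_director (mid TEXT, pid TEXT);
CREATE TABLE movie (mid TEXT, title TEXT);
INSERT INTO person VALUES ('p1', 'Director A');
INSERT INTO movie VALUES ('m1', 'One'), ('m2', 'Two');
-- The (m1, p1) pair is stored twice to simulate duplicate rows.
INSERT INTO m_director VALUES ('m1', 'p1'), ('m1', 'p1'), ('m2', 'p1');
""")

rows = conn.execute("""
SELECT p.name,
       COUNT(*) AS all_rows,            -- counts the duplicate join row too
       COUNT(DISTINCT d.mid) AS num     -- counts each movie once
FROM person p
INNER JOIN m_director d ON d.pid = p.pid
INNER JOIN movie m ON m.mid = d.mid
GROUP BY p.pid, p.name
""").fetchall()
print(rows)  # [('Director A', 3, 2)]
```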

You probably have duplicate records in the Person table: people with the same name but different ids. Try grouping by name only, not by id:
import pandas as pd
# conn is an existing database connection
result = pd.read_sql_query("""
SELECT d.name, count(*) AS num
FROM PERSON d, M_DIRECTOR md
WHERE d.Pid = md.Pid
GROUP BY d.name
HAVING COUNT(*) >= 10
ORDER BY count(*) DESC
""", conn)

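This small sqlite3 sketch shows the effect of grouping by name, using hypothetical data in which two people with different pid values share a name. Grouping by pid keeps them apart. Grouping by name alone merges their movies into one row, so the count is per name rather than per person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (pid TEXT, name TEXT);
CREATE TABLE m_director (mid TEXT, pid TEXT);
-- Two different people who happen to share a name (hypothetical data).
INSERT INTO person VALUES ('p1', 'Same Name'), ('p2', 'Same Name');
INSERT INTO m_director VALUES ('m1', 'p1'), ('m2', 'p1'), ('m3', 'p2');
""")

# One row per person id.
by_id = conn.execute("""
SELECT d.name, COUNT(*) AS num
FROM person d, m_director md
WHERE d.pid = md.pid
GROUP BY d.pid, d.name
ORDER BY num DESC
""").fetchall()

# One row per name: both people collapse into a single group.
by_name = conn.execute("""
SELECT d.name, COUNT(*) AS num
FROM person d, m_director md
WHERE d.pid = md.pid
GROUP BY d.name
""").fetchall()

print(by_id)    # [('Same Name', 2), ('Same Name', 1)]
print(by_name)  # [('Same Name', 3)]
```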

sql: get a single-column table, order by column from another table

I have interconnected tables.
movies (main, parent) : id | title | year
people (child) : people_id | name | birthyear
ratings (child) : movie_id | rating | votes
stars (child) : movie_id | person_id
I need to write a query that returns a single column from the tables movies, people and stars, ordered by the rating column of the ratings table, without including rating in my output.
My code:
SELECT title from movies
where id in (select movie_id from stars
where person_id in(select id from people where name = "Chadwick Boseman"))LIMIT 5;
It returns the titles of all movies Chadwick Boseman stars in. I need to order them by rating. How can I do that?
Although in practice you would never do this without a join, since this is homework you can use a correlated subquery on the ratings table in the ORDER BY clause:
select m.title
from movies m
inner join stars s on s.movie_id = m.id
inner join people p on p.people_id = s.person_id
where p.name = 'Chadwick Boseman'
order by (select r.rating from ratings r where r.movie_id = m.id) desc
limit 5
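Here is a minimal runnable sketch of the correlated ORDER BY, using Python's sqlite3 module. The titles and ratings are invented for illustration, and the schema follows the question. Note that rating is used for sorting but never appears in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER, title TEXT, year INTEGER);
CREATE TABLE people (people_id INTEGER, name TEXT, birthyear INTEGER);
CREATE TABLE stars (movie_id INTEGER, person_id INTEGER);
CREATE TABLE ratings (movie_id INTEGER, rating REAL, votes INTEGER);
-- Hypothetical titles and ratings, for illustration only.
INSERT INTO movies VALUES (1, 'Film A', 2016), (2, 'Film B', 2018), (3, 'Film C', 2020);
INSERT INTO people VALUES (10, 'Chadwick Boseman', 1976);
INSERT INTO stars VALUES (1, 10), (2, 10), (3, 10);
INSERT INTO ratings VALUES (1, 6.5, 100), (2, 7.9, 100), (3, 7.1, 100);
""")

titles = [row[0] for row in conn.execute("""
SELECT m.title
FROM movies m
INNER JOIN stars s ON s.movie_id = m.id
INNER JOIN people p ON p.people_id = s.person_id
WHERE p.name = 'Chadwick Boseman'
-- Sort by a value looked up per movie; rating itself is not selected.
ORDER BY (SELECT r.rating FROM ratings r WHERE r.movie_id = m.id) DESC
LIMIT 5
""")]
print(titles)  # ['Film B', 'Film C', 'Film A']
```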
You could also use your query and add the ORDER BY clause:
select m.title
from movies m
where m.id in (
select movie_id
from stars
where person_id in(
select id
from people
where name = 'Chadwick Boseman'
)
)
order by (select r.rating from ratings r where r.movie_id = m.id) desc
limit 5;
ORDER BY sorts your output by the column you specify, and in most databases that column does not have to appear in the select list (the main exceptions are SELECT DISTINCT and set operations such as UNION). Also, why not use JOINs for your query, like below?
SELECT m.title, d.rating
FROM movies m
JOIN stars s ON s.movie_id = m.id
JOIN people p ON p.id = s.person_id
JOIN ratings d ON d.movie_id = m.id -- join the ratings table here and use it in the select list
WHERE p.name = 'Chadwick Boseman'
ORDER BY d.rating
LIMIT 5
Update: this might work, but I can't test it because I don't have access to the actual data and tables. (TOP is SQL Server syntax; SQLite and MySQL use LIMIT instead.)
SELECT m.title
FROM movies m
JOIN stars s ON s.movie_id = m.id
JOIN people p ON p.id = s.person_id
WHERE p.name = 'Chadwick Boseman'
AND m.id in (SELECT top 5 movie_id
FROM ratings r
WHERE r.movie_id = m.id
ORDER BY rating desc)

Shorten a query

I have to write a query that calculates the number of tickets purchased that consist only of movies of a single genre. At the end, I have to return each movie genre and the number of tickets bought for that genre. I have written a query, but I wonder whether it can be made shorter and more compact.
Following is the database scheme:
movies(movieId, movieGenre, moviePrice)
tickets(ticketId, ticketDate, customerId)
details(ticketId, movieId, numOfTickets)
Here is my query:
select m.genre, count(*)
from(select t.ticketId, m.genre
from(select d.ticketId
from(select m.genre, t.ticketId
from tickets t join details d on t.ticketId =
d.ticketId join movies m on d.movieId = m.movieId
group by m.genre, t.ticketId) d
group by d.ticketId
having count(*) = 1) as t join details d on t.ticketId =
d.ticketId join movies m on d.movieId = m.movieId
group by t.ticketId, m.genre) m
group by m.genre;
This runs on a database so I am only able to post sample output:
comedy 29821
action 27857
rom-com 19663
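For comparison, the same single-genre logic can probably be expressed with a single GROUP BY ... HAVING COUNT(DISTINCT ...) = 1 step. This is a sketch and has not been tested against the real data. It makes three assumptions:

- Every ticketId in details also exists in tickets; the original join to tickets only matters otherwise.
- The genre column is movieGenre, as in the posted schema, rather than genre.
- The query counts tickets, not numOfTickets, as the original does.

The rows below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (movieId INTEGER, movieGenre TEXT, moviePrice REAL);
CREATE TABLE details (ticketId INTEGER, movieId INTEGER, numOfTickets INTEGER);
-- Hypothetical rows: ticket 1 is all comedy, ticket 2 mixes genres,
-- ticket 3 is action only.
INSERT INTO movies VALUES (1, 'comedy', 10), (2, 'comedy', 12), (3, 'action', 11);
INSERT INTO details VALUES (1, 1, 2), (1, 2, 1), (2, 1, 1), (2, 3, 1), (3, 3, 4);
""")

result = conn.execute("""
SELECT genre, COUNT(*) AS tickets
FROM (
    -- One row per ticket whose movies all share a single genre.
    SELECT MIN(m.movieGenre) AS genre
    FROM details d
    JOIN movies m ON m.movieId = d.movieId
    GROUP BY d.ticketId
    HAVING COUNT(DISTINCT m.movieGenre) = 1
) AS single_genre
GROUP BY genre
ORDER BY genre
""").fetchall()
print(result)  # [('action', 1), ('comedy', 1)]
```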
I see no reason to use the table tickets, because the results do not filter or aggregate by ticketDate or customerId. Thus, a shorter SQL query is:
SELECT m.moviegenre,
Sum(d.numoftickets) as SumNum
FROM details d
LEFT JOIN movies m
ON d.movieid = m.movieid
GROUP BY m.moviegenre
HAVING SumNum > 0
ORDER BY m.moviegenre
Added 3/28 a.m.:
I am not sure what is meant by duplicates. Are they in the table details(ticketId, movieId, numOfTickets)?
I would expect ticketId to be unique, so what would explain duplicates?
Is the same ticketId being printed repeatedly?
Find which ticketId values are duplicated:
SELECT ticketId, count(*) as cnt
FROM details d
GROUP By ticketId
HAVING count(*) > 1
Find which details rows are exact duplicates:
SELECT ticketId, movieId, numOfTickets, count(*) as cnt
FROM details d
GROUP By ticketId, movieId, numOfTickets
HAVING count(*) > 1
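This quick sqlite3 sketch runs the two checks above on invented rows. It assumes one ticket may legitimately cover several movies, so a repeated ticketId alone is not necessarily an error. Only rows that are identical in every column are true duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE details (ticketId INTEGER, movieId INTEGER, numOfTickets INTEGER);
-- Hypothetical rows: ticket 1 legitimately has two movies,
-- and the row (2, 3, 1) was accidentally inserted twice.
INSERT INTO details VALUES (1, 1, 2), (1, 2, 1), (2, 3, 1), (2, 3, 1);
""")

# ticketId values that appear more than once (for any reason).
repeated_ids = conn.execute("""
SELECT ticketId, COUNT(*) AS cnt
FROM details
GROUP BY ticketId
HAVING COUNT(*) > 1
ORDER BY ticketId
""").fetchall()

# Rows that are identical in every column.
repeated_rows = conn.execute("""
SELECT ticketId, movieId, numOfTickets, COUNT(*) AS cnt
FROM details
GROUP BY ticketId, movieId, numOfTickets
HAVING COUNT(*) > 1
""").fetchall()

print(repeated_ids)   # [(1, 2), (2, 2)]
print(repeated_rows)  # [(2, 3, 1, 2)]
```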
Then again, it may be that the table movies(movieId, movieGenre, moviePrice) is the one with duplicates.
Find which movieId values are duplicated:
SELECT movieId, count(*) as cnt
FROM movies m
GROUP BY movieId
HAVING count(*) > 1
Remove duplicates from details before aggregating:
SELECT m.moviegenre,
Sum(d.numoftickets) as SumNum
FROM
(Select Distinct * From details) d
LEFT JOIN movies m
ON d.movieid = m.movieid
GROUP BY m.moviegenre
ORDER BY m.moviegenre
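This minimal sqlite3 sketch, with hypothetical rows, shows how an exact duplicate in details inflates the per-genre sum. It also shows how the SELECT DISTINCT subquery removes that duplicate. The f-string only swaps in one of two fixed table expressions written in the code itself, never user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (movieId INTEGER, movieGenre TEXT, moviePrice REAL);
CREATE TABLE details (ticketId INTEGER, movieId INTEGER, numOfTickets INTEGER);
INSERT INTO movies VALUES (1, 'comedy', 10), (2, 'action', 11);
-- Hypothetical data: the row (2, 2, 3) is stored twice.
INSERT INTO details VALUES (1, 1, 2), (2, 2, 3), (2, 2, 3);
""")

def genre_totals(details_source):
    # details_source is a fixed table expression: the raw table
    # or a de-duplicating subquery.
    return conn.execute(f"""
    SELECT m.movieGenre, SUM(d.numOfTickets) AS SumNum
    FROM {details_source} d
    LEFT JOIN movies m ON d.movieId = m.movieId
    GROUP BY m.movieGenre
    ORDER BY m.movieGenre
    """).fetchall()

raw = genre_totals("details")
deduped = genre_totals("(SELECT DISTINCT * FROM details)")
print(raw)      # [('action', 6), ('comedy', 2)]
print(deduped)  # [('action', 3), ('comedy', 2)]
```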

How to make LEFT JOIN with row having max date?

I have two tables in Oracle DB
Person (
id
)
Bill (
id,
date,
amount,
person_id
)
I need to get each person and the amount of their last bill, if one exists.
I am trying to do it this way:
SELECT
p.id,
b.amount
FROM Person p
LEFT JOIN Bill b
ON b.person_id = p.id AND b.date = (SELECT MAX(date) FROM Bill WHERE person_id = 1)
WHERE p.id = 1;
But this query works only with INNER JOIN. With LEFT JOIN it throws ORA-01799: a column may not be outer-joined to a subquery.
How can I get the amount from the last bill using a left join?
Try the query below, which avoids outer-joining to a subquery:
SELECT
p.id,
b.amount
FROM Person p
LEFT JOIN(select * from Bill where date =
(SELECT MAX(date) FROM Bill b1 WHERE person_id = 1)) b ON b.person_id = p.id
WHERE p.id = 1;
What you are looking for is a way to find, for each person, the latest record in the bills table, and to join only with that record. One way is to use row_number:
select * from person p
left join (select b.*,
row_number() over (partition by person_id order by date desc) as seq_num
from Bill b) b
on p.id = b.person_id
and seq_num = 1
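Here is a runnable sqlite3 sketch of the row_number approach, with hypothetical bills. Window functions need SQLite 3.25 or later. The seq_num = 1 filter sits in the ON clause rather than in WHERE, which keeps a person with no bills in the result with a NULL amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER);
CREATE TABLE bill (id INTEGER, date TEXT, amount REAL, person_id INTEGER);
-- Hypothetical data: person 2 has no bills at all.
INSERT INTO person VALUES (1), (2);
INSERT INTO bill VALUES
    (100, '2021-01-05', 40.0, 1),
    (101, '2021-03-01', 75.0, 1);
""")

rows = conn.execute("""
SELECT p.id, b.amount
FROM person p
LEFT JOIN (
    SELECT bl.*,
           ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY date DESC) AS seq_num
    FROM bill bl
) b
  ON p.id = b.person_id
 AND b.seq_num = 1      -- keep only the latest bill per person
ORDER BY p.id
""").fetchall()
print(rows)  # [(1, 75.0), (2, None)]
```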
Oracle does not let you outer-join a column to a subquery in the ON clause (that is what ORA-01799 reports).
Instead, move the filtering into a subquery and LEFT JOIN to that.
Not tested, but this should work:
SELECT
p.id,
b.amount
FROM Person p
LEFT JOIN (
SELECT person_id, amount FROM Bill
WHERE date = (SELECT MAX(date) FROM Bill WHERE person_id = 1)) b
ON b.person_id = p.id
WHERE p.id = 1;
I'm not quite sure why you would want to filter by date, though.
Simply filtering by person_id should do the trick.
You should join Person to a derived table holding the max bill date per person_id, then join Bill on that date:
select Person.id, bill.amount
from Person
left join (
select person_id, max(date) as max_date
from bill
group by person_id ) t on t.person_id = Person.id
left join bill on bill.person_id = t.person_id and bill.date = t.max_date
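This minimal runnable sketch of the max-date derived-table approach uses Python's sqlite3 module and hypothetical rows. Bill is joined on both person_id and the per-person max_date, so only the latest bill survives. A person with no bills still appears, with a NULL amount. If two bills share the latest date, both are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER);
CREATE TABLE bill (id INTEGER, date TEXT, amount REAL, person_id INTEGER);
-- Hypothetical data: person 2 has no bills.
INSERT INTO person VALUES (1), (2);
INSERT INTO bill VALUES
    (100, '2021-01-05', 40.0, 1),
    (101, '2021-03-01', 75.0, 1);
""")

rows = conn.execute("""
SELECT p.id, b.amount
FROM person p
LEFT JOIN (
    -- Latest bill date per person
    SELECT person_id, MAX(date) AS max_date
    FROM bill
    GROUP BY person_id
) t ON t.person_id = p.id
LEFT JOIN bill b
  ON b.person_id = t.person_id
 AND b.date = t.max_date
ORDER BY p.id
""").fetchall()
print(rows)  # [(1, 75.0), (2, None)]
```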
You can do it like this:
SELECT
p.id,
b.amount
FROM Person p
LEFT JOIN Bill b
ON b.person_id = p.id AND b.date = (SELECT max(date) FROM Bill WHERE person_id = 1)
WHERE p.id = 1
SELECT
p.id,
b.amount
FROM Person p
LEFT JOIN Bill b
ON b.person_id = p.id
WHERE (SELECT max(date) FROM bill AS sb WHERE sb.person_id=p.id)=b.date;
SELECT
p.id,
c.amount
FROM Person p
LEFT JOIN (select b.person_id as personid,b.amount as amount from Bill b where b.date1= (select max(date1) from Bill where person_id=1)) c
ON c.personid = p.id
WHERE p.id = 1;
Try this:
select * from person p
left join (select person_id,
MAX(amount) KEEP (DENSE_RANK FIRST ORDER BY date DESC) as amount
from Bill
group by person_id) b
on p.id = b.person_id
I use GREATEST() function in join condition:
SELECT
p.id,
b.amount
FROM Person p
LEFT JOIN Bill b
ON b.person_id = p.id
AND b.date = GREATEST(b.date)
WHERE p.id = 1
This lets you grab the whole row if necessary, and also the top x rows per person (filter on row_num <= x instead of row_num = 1):
SELECT p.id
,b.amount
FROM person p
LEFT JOIN
(
SELECT * FROM
(
SELECT bl.*
,ROW_NUMBER() OVER (PARTITION BY bl.person_id ORDER BY bl.date DESC) AS row_num
FROM bill bl
)
WHERE row_num = 1
) b ON p.id = b.person_id
WHERE p.id = 1
;

How to find the second largest value within certain row?

I have two tables, movies(id, name, year, rating) and movies_genres(movie_id, genre). I want to find the ids of the second-highest-rated movies within each genre (not globally), but when I wrote this:
select MG.genre, M.id
from movies_genres MG inner join movies M on MG.movie_id = M.id
where M.rating =
(select max(rating) from
(select rating from movies M2 inner join movies_genres MG2 on M2.id = MG2.movie_id where MG2.genre = MG.genre)
where rating <
(select max(rating) from
(select rating from movies M3 inner join movies_genres MG3 on M3.id = MG3.movie_id where MG3.genre = MG.genre)))
order by MG.genre;
I got an error saying that MG.genre in line 5 is an invalid identifier.
If your database supports window functions (MySQL does only from version 8.0), you can use the row_number window function to get the second-highest-rated movie for each genre.
select MG.genre, M.id
from movies_genres MG
inner join movies M on MG.movie_id = M.id
inner join (select m.*,mg.*,
row_number() over(partition by mg.genre order by m.rating desc) as rn
from movies m inner join movies_genres mg on mg.movie_id = M.id) x
on x.id = m.id and mg.genre = x.genre
where x.rn = 2;
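This simplified sqlite3 sketch applies the same row_number idea to invented movies. It filters the windowed subquery directly instead of joining it back to the base tables. ROW_NUMBER breaks ties arbitrarily, so if two movies share the top rating in a genre, one of them is reported as "second". DENSE_RANK would treat them as tied instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER, name TEXT, year INTEGER, rating REAL);
CREATE TABLE movies_genres (movie_id INTEGER, genre TEXT);
-- Hypothetical sample data; movie 2 belongs to two genres.
INSERT INTO movies VALUES
    (1, 'A', 2000, 9.0), (2, 'B', 2001, 8.0), (3, 'C', 2002, 7.0), (4, 'D', 2003, 6.0);
INSERT INTO movies_genres VALUES
    (1, 'drama'), (2, 'drama'), (3, 'drama'),
    (2, 'comedy'), (4, 'comedy');
""")

rows = conn.execute("""
SELECT genre, id
FROM (
    -- Rank movies by rating separately inside each genre.
    SELECT mg.genre, m.id,
           ROW_NUMBER() OVER (PARTITION BY mg.genre ORDER BY m.rating DESC) AS rn
    FROM movies m
    INNER JOIN movies_genres mg ON mg.movie_id = m.id
) x
WHERE x.rn = 2
ORDER BY genre
""").fetchall()
print(rows)  # [('comedy', 4), ('drama', 2)]
```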
You don't have to reference the outer genre from every level, only from the next select level up, like this:
Select MG.genre, M.id
From movies_genres MG, movies M
Where M.id = MG.movie_id
And M.rating = (Select max(rating)
From movies M2, movies_genres MG2
Where M2.id = MG2.movie_id
And MG2.genre = MG.genre
And rating < (Select max(rating)
From movies M3, movies_genres MG3
Where M3.id = MG3.movie_id
And MG3.genre = MG2.genre)
)
order by MG.genre;
Edit: I had missed one reference to genre, And MG2.genre = MG.genre; it has now been added.
How does it work? The innermost select returns the max rating of genre MG2. The middle select returns the max rating within genre MG that is lower than that innermost max, i.e. the second-highest rating. The outer select lists all movie ids (and genres) with exactly this rating.
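The nested-MAX query also runs as-is on SQLite. This sketch uses hypothetical data in which two comedies tie for the second-highest rating. Unlike a ROW_NUMBER filter, this query returns both. An M.id sort key is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER, name TEXT, year INTEGER, rating REAL);
CREATE TABLE movies_genres (movie_id INTEGER, genre TEXT);
-- Hypothetical data: comedies 4 and 5 tie at 6.0.
INSERT INTO movies VALUES
    (1, 'A', 2000, 9.0), (2, 'B', 2001, 8.0), (3, 'C', 2002, 7.0),
    (4, 'D', 2003, 6.0), (5, 'E', 2004, 6.0);
INSERT INTO movies_genres VALUES
    (1, 'drama'), (2, 'drama'), (3, 'drama'),
    (2, 'comedy'), (4, 'comedy'), (5, 'comedy');
""")

rows = conn.execute("""
SELECT MG.genre, M.id
FROM movies_genres MG, movies M
WHERE M.id = MG.movie_id
  AND M.rating = (SELECT MAX(rating)                 -- best rating below the top
                  FROM movies M2, movies_genres MG2
                  WHERE M2.id = MG2.movie_id
                    AND MG2.genre = MG.genre
                    AND rating < (SELECT MAX(rating) -- top rating of the genre
                                  FROM movies M3, movies_genres MG3
                                  WHERE M3.id = MG3.movie_id
                                    AND MG3.genre = MG2.genre))
ORDER BY MG.genre, M.id
""").fetchall()
print(rows)  # [('comedy', 4), ('comedy', 5), ('drama', 2)]
```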

SQL: How to save order in sql query?

I have a PostgreSQL database and I am trying to print all my users (Person).
When I execute this query:
-- show owners
-- sorted by maximum cars amount
SELECT p.id
FROM car c JOIN person p ON c.person_id = p.id
GROUP BY p.id
ORDER BY COUNT(p.name) ASC;
I get all owners sorted by number of cars:
Output: 3 2 4 1
But the order is lost when I use these owner ids to fetch the owners' rows:
SELECT *
FROM person p
WHERE p.id IN (
SELECT p.id
FROM car c JOIN person p ON c.person_id = p.id
GROUP BY p.id
ORDER BY COUNT(p.name) ASC);
Output: 1 2 3 4 and other data
As you can see, the order is wrong. So my question is: how can I preserve that order?
Use a join instead of a subquery. Try this:
SELECT p.*
FROM person p
JOIN (SELECT p.id,
Count(p.name) AS cnt
FROM car c
JOIN person p
ON c.person_id = p.id
GROUP BY p.id) b
ON p.id = b.id
ORDER BY cnt ASC
Untangle the mess. Aggregate first, join later:
SELECT p.*
FROM person p
JOIN (
SELECT person_id, count(*) AS ct
FROM car
GROUP BY person_id
) c ON c.person_id = p.id
ORDER BY c.ct;
No need to join to person twice. This should be fastest if you count most or all rows.
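This minimal sqlite3 sketch runs the aggregate-first version with hypothetical owners and cars. The question uses PostgreSQL, but this query is plain SQL. The sort happens in the outer query on the joined count, so the order survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER, name TEXT);
CREATE TABLE car (id INTEGER, person_id INTEGER);
-- Hypothetical owners: person 1 has 3 cars, person 2 has 2, person 3 has 1.
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO car VALUES (10, 1), (11, 1), (12, 1), (13, 2), (14, 2), (15, 3);
""")

rows = conn.execute("""
SELECT p.*
FROM person p
JOIN (
    -- Aggregate first: one row per owner with its car count.
    SELECT person_id, COUNT(*) AS ct
    FROM car
    GROUP BY person_id
) c ON c.person_id = p.id
ORDER BY c.ct
""").fetchall()
print(rows)  # [(3, 'Cid'), (2, 'Bob'), (1, 'Ann')]
```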
For a small selection, correlated subqueries are faster:
SELECT p.*
FROM person p
WHERE p.id BETWEEN 10 AND 20 -- some very selective predicate
ORDER BY (SELECT count(*) FROM car c WHERE c.person_id = p.id);
As for your original query: IN takes a set on its right-hand side, so the order of elements is ignored and ORDER BY in the subquery is pointless.
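Because IN only tests membership, any ordering has to be applied in the outermost query. This sqlite3 sketch shows the correlated-subquery form, with the WHERE clause placed before ORDER BY as SQL requires. The BETWEEN range and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER, name TEXT);
CREATE TABLE car (id INTEGER, person_id INTEGER);
-- Hypothetical owners: person 1 has 3 cars, person 2 has 2, person 3 has 1.
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO car VALUES (10, 1), (11, 1), (12, 1), (13, 2), (14, 2), (15, 3);
""")

rows = conn.execute("""
SELECT p.*
FROM person p
WHERE p.id BETWEEN 1 AND 2   -- stand-in for a selective predicate
ORDER BY (SELECT COUNT(*) FROM car c WHERE c.person_id = p.id)
""").fetchall()
print(rows)  # [(2, 'Bob'), (1, 'Ann')]
```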