Using where clause with not-equal condition after join - sql

I am trying to use WHERE with a not-equal condition after joining two tables, but it does not work.
Example: I have a table with data on famous people and a separate table with their works. Some works can have several authors. So I want a table listing authors with their co-authors:
CREATE TABLE famous_people (id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
profession TEXT,
birth_year INTEGER);
INSERT INTO famous_people (name, profession, birth_year)
VALUES ("Landau", "physicist", 1908);
INSERT INTO famous_people (name, profession, birth_year)
VALUES ("Lifshitz", "physicist", 1908);
INSERT INTO famous_people (name, profession, birth_year)
VALUES ("Fisher", "statistician", 1908);
INSERT INTO famous_people (name, profession, birth_year)
VALUES ("Ginzburg", "physicist", 1916);
INSERT INTO famous_people (name, profession, birth_year)
VALUES ("A. Strugatsky", "writer", 1925);
INSERT INTO famous_people (name, profession, birth_year)
VALUES ("B. Strugatsky", "writer", 1933);
CREATE TABLE works (id INTEGER PRIMARY KEY AUTOINCREMENT,
person_id INTEGER,
work TEXT);
INSERT INTO works (person_id, work)
VALUES (1, "Theoretical Physics");
INSERT INTO works (person_id, work)
VALUES (2, "Theoretical Physics");
INSERT INTO works (person_id, work)
VALUES (1, "Theory of Superconductivity");
INSERT INTO works (person_id, work)
VALUES (4, "Theory of Superconductivity");
INSERT INTO works (person_id, work)
VALUES (3, "Fisher test");
INSERT INTO works (person_id, work)
VALUES (5, "Roadside Picnic");
INSERT INTO works (person_id, work)
VALUES (6, "Roadside Picnic");
INSERT INTO works (person_id, work)
VALUES (5, "Hard to Be a God");
INSERT INTO works (person_id, work)
VALUES (6, "Hard to Be a God");
/* Co-authors */
SELECT a.name AS author, b.name AS coauthor FROM works
JOIN famous_people a
ON works.person_id = a.id
JOIN famous_people b
ON works.person_id = b.id;
This is OK, except that each author is also listed as their own co-author, so I tried to filter those rows out by adding WHERE author <> coauthor as the last line. But what I get is a table with two columns: work and name. I get the same weird result with WHERE a.name <> b.name.
Funnily enough, WHERE author = coauthor works fine, but this is not what I want.
Expected result: a table with 2 columns:
author          co-author
Landau          Lifshitz
A. Strugatsky   B. Strugatsky
Fisher          NULL

Find all works that have two authors (using inner join on same work but different authors) and find all works that have one author (using not exists). Then combine the results:
SELECT w1.work, p1.name AS author, p2.name AS coauthor
FROM works AS w1
JOIN works AS w2 ON w1.work = w2.work AND w1.person_id < w2.person_id
JOIN famous_people AS p1 ON w1.person_id = p1.id
JOIN famous_people AS p2 ON w2.person_id = p2.id
UNION ALL
SELECT w1.work, p1.name, null
FROM works AS w1
JOIN famous_people AS p1 ON w1.person_id = p1.id
WHERE NOT EXISTS (
SELECT 1
FROM works AS w2
WHERE w2.work = w1.work AND w2.person_id <> w1.person_id
)
Demo on DB<>Fiddle
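One way to check the query above is to run it against the question's data in an in-memory SQLite database. This is only a sketch: it assumes Python's sqlite3 module, trims the tables to the columns the query needs, and adds ORDER BY 1 for a stable row order; none of that is part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE famous_people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE works (id INTEGER PRIMARY KEY, person_id INTEGER, work TEXT);
INSERT INTO famous_people (id, name) VALUES
    (1, 'Landau'), (2, 'Lifshitz'), (3, 'Fisher'),
    (4, 'Ginzburg'), (5, 'A. Strugatsky'), (6, 'B. Strugatsky');
INSERT INTO works (person_id, work) VALUES
    (1, 'Theoretical Physics'), (2, 'Theoretical Physics'),
    (1, 'Theory of Superconductivity'), (4, 'Theory of Superconductivity'),
    (3, 'Fisher test'),
    (5, 'Roadside Picnic'), (6, 'Roadside Picnic'),
    (5, 'Hard to Be a God'), (6, 'Hard to Be a God');
""")

rows = conn.execute("""
    -- works with two authors: pair each work row with the other author's row
    SELECT w1.work, p1.name AS author, p2.name AS coauthor
    FROM works AS w1
    JOIN works AS w2 ON w1.work = w2.work AND w1.person_id < w2.person_id
    JOIN famous_people AS p1 ON w1.person_id = p1.id
    JOIN famous_people AS p2 ON w2.person_id = p2.id
    UNION ALL
    -- works with a single author: no other row exists for the same work
    SELECT w1.work, p1.name, NULL
    FROM works AS w1
    JOIN famous_people AS p1 ON w1.person_id = p1.id
    WHERE NOT EXISTS (
        SELECT 1 FROM works AS w2
        WHERE w2.work = w1.work AND w2.person_id <> w1.person_id
    )
    ORDER BY 1
""").fetchall()

for row in rows:
    print(row)
```

The condition w1.person_id < w2.person_id keeps each pair once; with <> instead, every pair would appear in both orders.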

Your query cannot work. Keep in mind that a join works on rows, so your WHERE clause looks at one works row, with one person ID, at a time. You join the person to that works row as a, and then you join the person to the same works row again as b. That is of course the same person twice, because one works row refers to only one person.
This shows another, minor, problem. You call this table works. I would consider "Theoretical Physics" a work. You do so too; you named the column work. But then why does the same work appear twice in the works table? It should not. A works table should store works, i.e. one work per row. What you actually have is a work_author table in which a work is identified by its title. This kind of makes sense; a title may uniquely identify a work, as long as no other author happens to name their work "Theoretical Physics" too :-( and as long as there are no typos in the table either.
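The point about rows can be seen on a two-row sample. The sketch below assumes Python's sqlite3 module and a cut-down copy of the question's tables; it runs the original double join and then the same join with the not-equal filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE famous_people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE works (id INTEGER PRIMARY KEY, person_id INTEGER, work TEXT);
INSERT INTO famous_people (id, name) VALUES (1, 'Landau'), (2, 'Lifshitz');
INSERT INTO works (person_id, work) VALUES
    (1, 'Theoretical Physics'), (2, 'Theoretical Physics');
""")

# Both aliases are matched against the same works.person_id,
# so a and b are always the same person.
DOUBLE_JOIN = """
    SELECT a.name AS author, b.name AS coauthor
    FROM works
    JOIN famous_people a ON works.person_id = a.id
    JOIN famous_people b ON works.person_id = b.id
"""

unfiltered = conn.execute(DOUBLE_JOIN + " ORDER BY a.name").fetchall()
filtered = conn.execute(DOUBLE_JOIN + " WHERE a.name <> b.name").fetchall()

print(unfiltered)  # every author paired with themselves
print(filtered)    # nothing survives the not-equal filter
```

To reach a different person, the query needs a second works row for the same work, which is why the other answers join the works (or work_author) table to itself.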
This would be a better model:
person (person_id, name, birth_year, ...)
work (work_id, title, year, ...)
work_author (work_id, person_id)
If you have a typo in a title in this model, there is one row where you correct it and the data stays intact.
Now you want to get the authors of a work. This is easily done with aggregation:
select w.*, group_concat(p.name) as authors
from work_author wa
join person p on p.person_id = wa.person_id
join work w on w.work_id = wa.work_id
group by w.work_id
order by w.work_id;
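As a rough check of the aggregation, here is a sketch that builds the three-table model in an in-memory SQLite database (SQLite also provides group_concat). The column types, the keys, and the three sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE work (work_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE work_author (
    work_id INTEGER REFERENCES work (work_id),
    person_id INTEGER REFERENCES person (person_id),
    PRIMARY KEY (work_id, person_id)
);
INSERT INTO person VALUES (1, 'Landau'), (2, 'Lifshitz'), (3, 'Fisher');
INSERT INTO work VALUES (1, 'Theoretical Physics'), (2, 'Fisher test');
INSERT INTO work_author VALUES (1, 1), (1, 2), (2, 3);
""")

rows = conn.execute("""
    SELECT w.work_id, w.title, group_concat(p.name) AS authors
    FROM work_author wa
    JOIN person p ON p.person_id = wa.person_id
    JOIN work w ON w.work_id = wa.work_id
    GROUP BY w.work_id, w.title
    ORDER BY w.work_id
""").fetchall()

# group_concat does not promise an order, so sort the names before use.
authors_by_title = {title: sorted(authors.split(","))
                    for _, title, authors in rows}
print(authors_by_title)
```

Each title is stored once in work, so correcting a typo touches a single row and the author links stay intact.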
You forgot to tell us your DBMS. As you are using double quotes where it must be single quotes according to the SQL standard, and your DBMS doesn't complain, this may be MySQL. (You should still always use single quotes for string literals.) For MySQL the string aggregation function is GROUP_CONCAT, so guessing MySQL, I used that in my query. Other DBMS use STRING_AGG, LISTAGG or something else.
If you just want to show up to two authors per work, you can take the minimum and maximum name (and compare the two in order not to show the same author twice):
select
w.*,
min(p.name) as author1,
case when min(p.name) <> max(p.name) then max(p.name) end as author2
from ...
UPDATE
In the comments you say that for every author you want to know all authors who worked with them. For this you need to join authors to authors based on their works. Still assuming MySQL:
select p1.name, group_concat(distinct p2.name) as others
from work_author wa1
join work_author wa2 on wa2.work_id = wa1.work_id
and wa2.person_id <> wa1.person_id
join person p1 on p1.person_id = wa1.person_id
join person p2 on p2.person_id = wa2.person_id
group by p1.name
order by p1.name;
Or not aggregated:
select distinct p1.name as person1, p2.name as person2
from work_author wa1
join work_author wa2 on wa2.work_id = wa1.work_id
and wa2.person_id <> wa1.person_id
join person p1 on p1.person_id = wa1.person_id
join person p2 on p2.person_id = wa2.person_id
order by p1.name, p2.name;
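The non-aggregated query leaves out people who never had a co-author, while the question's expected output lists Fisher with NULL. The sketch below runs the query as written and then a LEFT JOIN variant that keeps such people. The variant is an assumption about what the asker wants, not part of the answer, and the sample rows are illustrative (run with Python's sqlite3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE work_author (work_id INTEGER, person_id INTEGER,
                          PRIMARY KEY (work_id, person_id));
INSERT INTO person VALUES
    (1, 'Landau'), (2, 'Lifshitz'), (3, 'Fisher'), (4, 'Ginzburg');
INSERT INTO work_author VALUES (1, 1), (1, 2), (2, 1), (2, 4), (3, 3);
""")

# {join} is filled with a fixed keyword only, never with user input.
PAIRS = """
    SELECT DISTINCT p1.name AS person1, p2.name AS person2
    FROM work_author wa1
    {join} work_author wa2 ON wa2.work_id = wa1.work_id
                          AND wa2.person_id <> wa1.person_id
    JOIN person p1 ON p1.person_id = wa1.person_id
    {join} person p2 ON p2.person_id = wa2.person_id
    ORDER BY p1.name, p2.name
"""

inner_pairs = conn.execute(PAIRS.format(join="JOIN")).fetchall()
outer_pairs = conn.execute(PAIRS.format(join="LEFT JOIN")).fetchall()

print(inner_pairs)  # only people who share a work with someone else
print(outer_pairs)  # additionally ('Fisher', None)
```

One caveat with the LEFT JOIN variant: a person with both a solo work and a co-authored work would get a NULL row in addition to their real co-authors.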

I changed the model as proposed by Thorsten Kettner and solved the task of matching authors with their co-authors as follows:
CREATE TABLE famous_people (id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
profession TEXT,
birth_year INTEGER);
INSERT INTO famous_people (name, profession, birth_year)
VALUES ("Landau", "physicist", 1908);
INSERT INTO famous_people (name, profession, birth_year)
VALUES ("Lifshitz", "physicist", 1908);
INSERT INTO famous_people (name, profession, birth_year)
VALUES ("Fisher", "statistician", 1908);
INSERT INTO famous_people (name, profession, birth_year)
VALUES ("Ginzburg", "physicist", 1916);
INSERT INTO famous_people (name, profession, birth_year)
VALUES ("A. Strugatsky", "writer", 1925);
INSERT INTO famous_people (name, profession, birth_year)
VALUES ("B. Strugatsky", "writer", 1933);
CREATE TABLE works (id INTEGER PRIMARY KEY AUTOINCREMENT,
work TEXT,
subject TEXT);
INSERT INTO works (work, subject)
VALUES ("Theoretical Physics", "physics");
INSERT INTO works (work, subject)
VALUES ("Theory of Superconductivity", "physics");
INSERT INTO works (work, subject)
VALUES ("Fisher test", "statistics");
INSERT INTO works (work, subject)
VALUES ("Roadside Picnic", "scifi");
INSERT INTO works (work, subject)
VALUES ("Hard to Be a God", "scifi");
CREATE TABLE author_works (id INTEGER PRIMARY KEY AUTOINCREMENT,
work_id INTEGER,
author_id INTEGER);
INSERT INTO author_works (work_id, author_id) VALUES (1, 1);
INSERT INTO author_works (work_id, author_id) VALUES (1, 2);
INSERT INTO author_works (work_id, author_id) VALUES (2, 1);
INSERT INTO author_works (work_id, author_id) VALUES (2, 4);
INSERT INTO author_works (work_id, author_id) VALUES (3, 3);
INSERT INTO author_works (work_id, author_id) VALUES (4, 5);
INSERT INTO author_works (work_id, author_id) VALUES (4, 6);
INSERT INTO author_works (work_id, author_id) VALUES (5, 5);
INSERT INTO author_works (work_id, author_id) VALUES (5, 6);
/* List of authors and their works */
SELECT famous_people.name, works.work FROM author_works
JOIN famous_people
ON author_works.author_id = famous_people.id
JOIN works
ON works.id = author_works.work_id;
/* Authors and their co-authors */
SELECT DISTINCT a.name, b.name
FROM author_works aw1
JOIN author_works aw2
ON aw1.work_id = aw2.work_id
JOIN famous_people a
ON aw1.author_id = a.id
JOIN famous_people b
ON aw2.author_id = b.id
WHERE aw1.author_id <> aw2.author_id;

Related

How to join tables together via Ids using SQLite?

I am having trouble joining parts of tables. I want first and last names of the people and whatever their interest is to be joined together. I get this error message: "[1] [SQLITE_ERROR] SQL error or missing database (ambiguous column name: pi.PersonID)"
CREATE TABLE people (
PersonID INTEGER PRIMARY KEY AUTOINCREMENT,
FirstName VARCHAR(100),
LastName VARCHAR(100)
);
INSERT INTO people (FirstName, LastName)
VALUES ('Walter', 'White'),
('Jesse', 'Pinkman'),
('Saul', 'Goodman');
SELECT * FROM people;
CREATE TABLE interests (
InterestID INTEGER PRIMARY KEY AUTOINCREMENT,
Interest VARCHAR(100)
);
INSERT INTO interests (Interest)
values ('Swimming'),
('Basketball'),
('Running');
SELECT * FROM interests;
CREATE TABLE persons_interests (
PersonID INTEGER,
InterestID INTEGER,
PRIMARY KEY (PersonID, InterestID),
FOREIGN KEY (PersonID) REFERENCES people,
FOREIGN KEY (InterestID) REFERENCES interests
);
DROP TABLE persons_interests;
INSERT INTO persons_interests (PersonID, InterestID)
VALUES (1, 3),
(2, 2),
(3, 3);
SELECT * FROM persons_interests;
SELECT FirstName, LastName, Interest FROM people p, interests i
JOIN persons_interests pi on p.PersonID = pi.PersonID
JOIN persons_interests pi on i.Interest = pi.InterestID;
Don't mix implicit and explicit joins! You seem to want:
select p.firstname, p.lastname, i.interest
from people p
inner join persons_interests pi on pi.personid = p.personid
inner join interests i on i.interestid = pi.interestid;
Here, each table appears just once in the from clause, with the relevant join conditions.
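For reference, a small sketch of that join run against the question's data in an in-memory SQLite database through Python's sqlite3 module. The ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (
    PersonID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE interests (InterestID INTEGER PRIMARY KEY, Interest TEXT);
CREATE TABLE persons_interests (
    PersonID INTEGER REFERENCES people (PersonID),
    InterestID INTEGER REFERENCES interests (InterestID),
    PRIMARY KEY (PersonID, InterestID)
);
INSERT INTO people (FirstName, LastName) VALUES
    ('Walter', 'White'), ('Jesse', 'Pinkman'), ('Saul', 'Goodman');
INSERT INTO interests (Interest) VALUES
    ('Swimming'), ('Basketball'), ('Running');
INSERT INTO persons_interests VALUES (1, 3), (2, 2), (3, 3);
""")

# The junction table is joined once and links people to interests by ID.
rows = conn.execute("""
    SELECT p.FirstName, p.LastName, i.Interest
    FROM people p
    INNER JOIN persons_interests pi ON pi.PersonID = p.PersonID
    INNER JOIN interests i ON i.InterestID = pi.InterestID
    ORDER BY p.PersonID
""").fetchall()

for row in rows:
    print(row)
```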

SQL Query: Matching highest review with the review author - Movie database

I'm working on a SQL homework problem:
"For each movie that has at least one rating, find the movie title and total number of stars, the highest star and the person who gave highest star."
Database:
create table Movies(mID integer, title varchar(100));
create table Reviewers(rID integer, name varchar(100));
create table Ratings(rID integer, mID integer, stars integer);
insert into Movies values(101, 'Gone with the Wind');
insert into Movies values(102, 'Star Wars');
insert into Movies values(103, 'The Sound of Music');
insert into Reviewers values(201, 'Sarah Martinez');
insert into Reviewers values(202, 'Daniel Lewis');
insert into Reviewers values(203, 'Brittany Harris');
insert into Ratings values(201, 101, 2);
insert into Ratings values(203, 101, 4);
insert into Ratings values(203, 102, 4);
insert into Ratings values(203, 103, 4);
insert into Ratings values(202, 103, 2);
Best query I can come up with is:
SELECT title,
SUM(stars) AS total_stars,
MAX(stars) AS highest_stars,
name AS highest_stars_reviewer
FROM Movies
INNER JOIN Ratings USING(mID)
INNER JOIN Reviewers USING(rID)
GROUP BY mID;
The problem is that instead of returning the name of the reviewer who gave the highest stars review, the query returns the reviewer with the lower rID who reviewed the movie.
I would appreciate any help with this query to get the desired result.
The proper way to do this in SQL uses window functions:
SELECT m.title,
SUM(r.stars) AS total_stars,
MAX(r.stars) AS highest_stars,
MAX(CASE WHEN r.seqnum = 1 THEN rv.name END) AS highest_stars_reviewer
FROM Movies m INNER JOIN
(SELECT r.*,
ROW_NUMBER() OVER (PARTITION BY r.mID ORDER BY r.stars DESC) as seqnum
FROM Ratings r
) r
USING (mID) INNER JOIN
Reviewers rv
USING (rID)
GROUP BY m.title, m.mID;
Notes:
If you are using multiple tables in a query, you should be qualifying all column names.
The GROUP BY columns should match the SELECT columns -- although your version is okay because SQL allows you to do this when the aggregation key is a unique key.
This returns an arbitrary reviewer with the highest stars, in the event that there is more than one review with the maximum.
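A runnable sketch of the window-function approach on the question's data, assuming Python's sqlite3 module linked against SQLite 3.25 or later (the first version with window functions). It uses explicit ON conditions instead of USING and partitions by the mID column of Ratings; both are choices made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (mID INTEGER, title TEXT);
CREATE TABLE Reviewers (rID INTEGER, name TEXT);
CREATE TABLE Ratings (rID INTEGER, mID INTEGER, stars INTEGER);
INSERT INTO Movies VALUES (101, 'Gone with the Wind'),
    (102, 'Star Wars'), (103, 'The Sound of Music');
INSERT INTO Reviewers VALUES (201, 'Sarah Martinez'),
    (202, 'Daniel Lewis'), (203, 'Brittany Harris');
INSERT INTO Ratings VALUES (201, 101, 2), (203, 101, 4),
    (203, 102, 4), (203, 103, 4), (202, 103, 2);
""")

rows = conn.execute("""
    SELECT m.title,
           SUM(r.stars) AS total_stars,
           MAX(r.stars) AS highest_stars,
           -- seqnum = 1 marks the top-rated review within each movie
           MAX(CASE WHEN r.seqnum = 1 THEN rv.name END) AS highest_stars_reviewer
    FROM Movies m
    INNER JOIN (SELECT rt.*,
                       ROW_NUMBER() OVER (PARTITION BY rt.mID
                                          ORDER BY rt.stars DESC) AS seqnum
                FROM Ratings rt) r ON r.mID = m.mID
    INNER JOIN Reviewers rv ON rv.rID = r.rID
    GROUP BY m.mID, m.title
    ORDER BY m.mID
""").fetchall()

for row in rows:
    print(row)
```

The inner join drops movies without ratings, which matches the "at least one rating" requirement in the homework text.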

How to get multiple columns + count in a single query?

I usually don't ask for "scripts" but for mechanisms, but I think that in this case, if I see an example, I will understand the principle.
I have three tables as shown below:
and I want to get the columns from all three, plus a count of the number of episodes in each series and to get a result like this:
Currently, I am opening multiple DB threads and I am afraid that as I get more visitors on my site it will eventually respond really slowly.
Any ideas?
Thanks a lot!
First join all the tables together to get the columns. Then, to get a count, use a window function:
SELECT count(*) over (partition by seriesID) as NumEpisodesInSeries,
st.SeriesId, st.SeriesName, et.episodeID, et.episodeName,
ct.creatorID, ct.CreatorName
FROM series_table st join
episode_table et
ON et.ofSeries = st.seriesID join
creator_table ct
ON ct.creatorID = st.byCreator;
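Since the question's table images are missing, the sketch below guesses a minimal schema from the names used in the query above (series_table, episode_table, creator_table and their columns are assumptions) and runs the window-function count in an in-memory SQLite database. It needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema inferred from the query; not the asker's real tables.
CREATE TABLE creator_table (creatorID INTEGER PRIMARY KEY, creatorName TEXT);
CREATE TABLE series_table (
    seriesID INTEGER PRIMARY KEY, seriesName TEXT, byCreator INTEGER);
CREATE TABLE episode_table (
    episodeID INTEGER PRIMARY KEY, episodeName TEXT, ofSeries INTEGER);
INSERT INTO creator_table VALUES (1, 'Some Guy'), (2, 'Seth McFarlane');
INSERT INTO series_table VALUES (1, 'Friends', 1), (2, 'Family Guy', 2);
INSERT INTO episode_table VALUES
    (1, 'Joey', 1), (2, 'Ross', 1), (3, 'Stewie', 2);
""")

# One row per episode; the window count repeats the series total on each row.
rows = conn.execute("""
    SELECT COUNT(*) OVER (PARTITION BY st.seriesID) AS NumEpisodesInSeries,
           st.seriesID, st.seriesName,
           et.episodeID, et.episodeName,
           ct.creatorID, ct.creatorName
    FROM series_table st
    JOIN episode_table et ON et.ofSeries = st.seriesID
    JOIN creator_table ct ON ct.creatorID = st.byCreator
    ORDER BY et.episodeID
""").fetchall()

for row in rows:
    print(row)
```

Everything comes back from a single statement, so one connection is enough.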
Do your appropriate joins between the tables and their IDs as you would expect, and also join onto the result of a subquery that determines the total episode count using the Episodes table.
SELECT SeriesCount.NumEpisodes AS #OfEpisodesInSeries,
S.id AS SeriesId,
S.name AS SeriesName,
E.id AS EpisodeId,
E.name AS EpisodeName,
C.id AS CreatorId,
C.name AS CreatorName
FROM
Series S
INNER JOIN
Episodes E
ON E.seriesId = S.id
INNER JOIN
Creators C
ON S.creatorId = C.id
INNER JOIN
(
SELECT seriesId, COUNT(id) AS NumEpisodes
FROM Episodes
GROUP BY seriesId
) SeriesCount
ON SeriesCount.seriesId = S.id
SQL Fiddle Schema:
CREATE TABLE Series (id int, name varchar(20), creatorId int)
INSERT INTO Series VALUES(1, 'Friends', 1)
INSERT INTO Series VALUES(2, 'Family Guy', 2)
INSERT INTO Series VALUES(3, 'The Tonight Show', 1)
CREATE TABLE Episodes (id int, name varchar(20), seriesId int)
INSERT INTO Episodes VALUES(1, 'Joey', 1)
INSERT INTO Episodes VALUES(2, 'Ross', 1)
INSERT INTO Episodes VALUES(3, 'Phoebe', 1)
INSERT INTO Episodes VALUES(4, 'Stewie', 2)
INSERT INTO Episodes VALUES(5, 'Kevin Kostner', 3)
INSERT INTO Episodes VALUES(6, 'Brad Pitt', 3)
INSERT INTO Episodes VALUES(7, 'Tom Hanks', 3)
INSERT INTO Episodes VALUES(8, 'Morgan Freeman', 3)
CREATE TABLE Creators (id int, name varchar(20))
INSERT INTO Creators VALUES(1, 'Some Guy')
INSERT INTO Creators VALUES(2, 'Seth McFarlane')
Try this:
http://www.sqlfiddle.com/#!3/5f938/17
select min(ec.num) as NumEpisodes,s.Id,S.Name,
Ep.ID as EpisodeID,Ep.name as EpisodeName,
C.ID as CreatorID,C.Name as CreatorName
from Episodes ep
join Series s on s.Id=ep.SeriesID
join Creators c on c.Id=s.CreatorID
join (select seriesId,count(*) as Num from Episodes
group by seriesId) ec on s.id=ec.seriesID
group by s.Id,S.Name,Ep.ID,Ep.name,C.ID,C.Name
Thanks Gordon
I would do the following:
SELECT (SELECT Count(*)
FROM episodetbl e1
WHERE e1.ofseries = s.seriesid) AS "#ofEpisodesInSeries",
s.seriesid,
s.seriesname,
e.episodeid,
e.episodename,
c.creatorid,
c.creatorname
FROM seriestbl s
INNER JOIN creatortbl c
ON s.bycreator = c.creatorid
INNER JOIN episodetbl e
ON e.ofseries = s.seriesid

How to solve this unique SQL query?

I have the following tables:
Student(Sid,Sname) Primary key: {sid}
Course(cid,cname,duration,fee) Primary key:{cid}
Enrolled(sid,cid) Foreign key: {sid,cid}
Query: Find the maximum fees paid by each student where a student can
enroll in different courses.
My attempt:
SELECT ssid, max(fee) as MAX_FEES from (Select sid as ssid, C.cid as ccid,
fee from Course C,Enrolled E where C.cid = E.cid) group by
rollup(ssid,ccid,fee)
However, this doesn't give the desired output. How can I output only the highest fee paid by each student?
Try:
SELECT e.sid, max(c.fee) AS max_fee from course c, student s, enrolled e where s.sid=e.sid and e.cid=c.cid group by e.sid;
You didn't say if you also needed to list the students who are not enrolled in any course, so I'll provide one more solution:
CREATE TABLE student (
sid NUMBER PRIMARY KEY,
sname VARCHAR2(40)
);
CREATE TABLE course (
cid NUMBER PRIMARY KEY,
cname VARCHAR2(40),
duration NUMBER,
fee NUMBER
);
CREATE TABLE enrolled (
sid NUMBER,
cid NUMBER,
PRIMARY KEY (sid, cid),
FOREIGN KEY (sid) REFERENCES student (sid),
FOREIGN KEY (cid) REFERENCES course (cid)
);
INSERT INTO student (sid, sname) VALUES (1, 'John');
INSERT INTO student (sid, sname) VALUES (2, 'Peter');
INSERT INTO student (sid, sname) VALUES (3, 'Jake');
INSERT INTO course (cid, cname, duration, fee) VALUES (1, 'Math', 1, 1000);
INSERT INTO course (cid, cname, duration, fee) VALUES (2, 'Physics', 1, 1500);
INSERT INTO enrolled (sid, cid) VALUES (1, 1); -- John taking Math
INSERT INTO enrolled (sid, cid) VALUES (1, 2); -- John taking Physics
-- Peter being lazy
INSERT INTO enrolled (sid, cid) VALUES (3, 1); -- Jake taking Math
COMMIT;
-- not taking lazy students (those not enrolled in any course) into account
SELECT s.sid, MAX(c.fee)
FROM student s
JOIN enrolled e ON (e.sid = s.sid)
JOIN course c ON (e.cid = c.cid)
GROUP BY s.sid
;
-- all students
SELECT s.sid, NVL(MAX(c.fee), 0)
FROM student s
LEFT JOIN enrolled e ON (e.sid = s.sid)
LEFT JOIN course c ON (e.cid = c.cid)
GROUP BY s.sid
;
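The difference between the two queries can be checked with a short sketch. It assumes Python's sqlite3 module rather than Oracle, so NVL is replaced by the standard COALESCE and the column types are simplified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE course (cid INTEGER PRIMARY KEY, cname TEXT, fee INTEGER);
CREATE TABLE enrolled (sid INTEGER, cid INTEGER, PRIMARY KEY (sid, cid));
INSERT INTO student VALUES (1, 'John'), (2, 'Peter'), (3, 'Jake');
INSERT INTO course VALUES (1, 'Math', 1000), (2, 'Physics', 1500);
INSERT INTO enrolled VALUES (1, 1), (1, 2), (3, 1);  -- Peter has no rows
""")

# Inner joins: students without enrolments drop out of the result.
enrolled_only = conn.execute("""
    SELECT s.sid, MAX(c.fee)
    FROM student s
    JOIN enrolled e ON e.sid = s.sid
    JOIN course c ON c.cid = e.cid
    GROUP BY s.sid
    ORDER BY s.sid
""").fetchall()

# Left joins: every student is kept; a missing fee is reported as 0.
all_students = conn.execute("""
    SELECT s.sid, COALESCE(MAX(c.fee), 0)
    FROM student s
    LEFT JOIN enrolled e ON e.sid = s.sid
    LEFT JOIN course c ON c.cid = e.cid
    GROUP BY s.sid
    ORDER BY s.sid
""").fetchall()

print(enrolled_only)
print(all_students)
```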

Cannot use Max with Count in SQL*Plus

This is my SQL statement, and I get the error shown below. When I use only MAX on a single column, without displaying the other results, it works.
SELECT cat.CategoryName,sb.SubCategoryName,MAX((COUNT(bs.BookID)))
FROM
Category cat,SubCategory sb, Book_Subcategory bs
WHERE cat.CategoryID = sb.CategoryID AND sb.SubCategoryID = bs.SubCategoryID
GROUP BY cat.CategoryName, sb.SubCategoryName, bs.BookID;
ERROR at line 1:
ORA-00937: not a single-group group function
Can someone help me?
SQL does not allow aggregates of aggregates directly.
However, if you write the inner aggregate in a sub-query in the FROM clause (or use a WITH clause and a Common Table Expression, CTE), you can achieve the result:
SELECT gc1.CategoryName, gc1.SubCategoryName, gc1.BookCount
FROM (SELECT cat.CategoryName, sb.SubCategoryName,
COUNT(bs.BookID) AS BookCount
FROM Category AS cat
JOIN SubCategory AS sb ON cat.CategoryID = sb.CategoryID
JOIN Book_Subcategory AS bs ON sb.SubCategoryID = bs.SubCategoryID
GROUP BY cat.CategoryName, sb.SubCategoryName
) AS gc1
WHERE gc1.BookCount = (SELECT MAX(gc2.BookCount)
FROM (SELECT cat.CategoryName, sb.SubCategoryName,
COUNT(bs.BookID) AS BookCount
FROM Category AS cat
JOIN SubCategory AS sb
ON cat.CategoryID = sb.CategoryID
JOIN Book_Subcategory AS bs
ON sb.SubCategoryID = bs.SubCategoryID
GROUP BY cat.CategoryName, sb.SubCategoryName
) AS gc2
)
This is complex because it doesn't use a CTE, and there is a common table expression that must be written out twice.
Using the CTE form (possibly with syntax errors):
WITH gc1 AS (SELECT cat.CategoryName, sb.SubCategoryName,
COUNT(bs.BookID) AS BookCount
FROM Category AS cat
JOIN SubCategory AS sb
ON cat.CategoryID = sb.CategoryID
JOIN Book_Subcategory AS bs
ON sb.SubCategoryID = bs.SubCategoryID
GROUP BY cat.CategoryName, sb.SubCategoryName
)
SELECT gc1.CategoryName, gc1.SubCategoryName, gc1.BookCount
FROM gc1
WHERE gc1.BookCount = (SELECT MAX(gc1.BookCount) FROM gc1);
Much tidier!
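The CTE form can at least be exercised on SQLite, which supports WITH. The sketch below is run through Python's sqlite3 module with the schema and data listed later in this answer, slightly simplified; it says nothing about whether Oracle accepts the same text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT);
CREATE TABLE SubCategory (
    CategoryID INTEGER, SubCategoryID INTEGER PRIMARY KEY,
    SubCategoryName TEXT);
CREATE TABLE Book_SubCategory (
    SubCategoryID INTEGER, BookID INTEGER PRIMARY KEY);
INSERT INTO Category VALUES (1, 'Fiction'), (2, 'Non-Fiction');
INSERT INTO SubCategory VALUES (2, 1, 'SQL Theory'), (2, 2, 'Mathematics'),
    (1, 3, 'Romance'), (1, 4, 'War');
""")
# (SubCategoryID, BookID) pairs copied from the answer's data section.
books = [(1, 10), (2, 11), (3, 12), (3, 13), (4, 14), (1, 15), (1, 16),
         (2, 17), (1, 18), (3, 19), (4, 20), (4, 21), (4, 22)]
conn.executemany("INSERT INTO Book_SubCategory VALUES (?, ?)", books)

rows = conn.execute("""
    WITH gc1 AS (
        -- inner aggregate: books per (category, subcategory)
        SELECT cat.CategoryName, sb.SubCategoryName,
               COUNT(bs.BookID) AS BookCount
        FROM Category AS cat
        JOIN SubCategory AS sb ON cat.CategoryID = sb.CategoryID
        JOIN Book_SubCategory AS bs ON sb.SubCategoryID = bs.SubCategoryID
        GROUP BY cat.CategoryName, sb.SubCategoryName
    )
    -- outer aggregate: keep only the groups with the maximum count
    SELECT CategoryName, SubCategoryName, BookCount
    FROM gc1
    WHERE BookCount = (SELECT MAX(BookCount) FROM gc1)
    ORDER BY CategoryName
""").fetchall()

print(rows)
```

Both subcategories with four books are returned, which agrees with the two-row result reported for the Informix run.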
You can simulate a CTE with a temporary table if your DBMS makes it easy to create them. For example, IBM Informix Dynamic Server could use:
SELECT cat.CategoryName, sb.SubCategoryName,
COUNT(bs.BookID) AS BookCount
FROM Category AS cat
JOIN SubCategory AS sb ON cat.CategoryID = sb.CategoryID
JOIN Book_Subcategory AS bs ON sb.SubCategoryID = bs.SubCategoryID
GROUP BY cat.CategoryName, sb.SubCategoryName
INTO TEMP gc1;
SELECT gc1.CategoryName, gc1.SubCategoryName, gc1.BookCount
FROM gc1
WHERE gc1.BookCount = (SELECT MAX(gc1.BookCount) FROM gc1);
DROP TABLE gc1; -- Optional: table will be deleted at end of session anyway
Given the following tables and data, the main query (copied and pasted from this answer) gave the result I expected when run against IBM Informix Dynamic Server 11.50.FC6 on MacOS X 10.6.4, namely:
Non-Fiction SQL Theory 4
Fiction War 4
That doesn't prove that it 'must work' when run against Oracle - I don't have Oracle and can't demonstrate either way. It does show that there is at least one SQL DBMS that handles the query without problems. (Since IDS does not support the WITH clause and CTEs, I can't show whether that formulation works.)
Schema
CREATE TABLE Category
(
CategoryID INTEGER NOT NULL PRIMARY KEY,
CategoryName VARCHAR(20) NOT NULL
);
CREATE TABLE SubCategory
(
CategoryID INTEGER NOT NULL REFERENCES Category,
SubCategoryID INTEGER NOT NULL PRIMARY KEY,
SubCategoryName VARCHAR(20) NOT NULL
);
CREATE TABLE Book_SubCategory
(
SubCategoryID INTEGER NOT NULL REFERENCES SubCategory,
BookID INTEGER NOT NULL PRIMARY KEY
);
Data
INSERT INTO Category VALUES(1, 'Fiction');
INSERT INTO Category VALUES(2, 'Non-Fiction');
INSERT INTO SubCategory VALUES(2, 1, 'SQL Theory');
INSERT INTO SubCategory VALUES(2, 2, 'Mathematics');
INSERT INTO SubCategory VALUES(1, 3, 'Romance');
INSERT INTO SubCategory VALUES(1, 4, 'War');
INSERT INTO Book_SubCategory VALUES(1, 10);
INSERT INTO Book_SubCategory VALUES(2, 11);
INSERT INTO Book_SubCategory VALUES(3, 12);
INSERT INTO Book_SubCategory VALUES(3, 13);
INSERT INTO Book_SubCategory VALUES(4, 14);
INSERT INTO Book_SubCategory VALUES(1, 15);
INSERT INTO Book_SubCategory VALUES(1, 16);
INSERT INTO Book_SubCategory VALUES(2, 17);
INSERT INTO Book_SubCategory VALUES(1, 18);
INSERT INTO Book_SubCategory VALUES(3, 19);
INSERT INTO Book_SubCategory VALUES(4, 20);
INSERT INTO Book_SubCategory VALUES(4, 21);
INSERT INTO Book_SubCategory VALUES(4, 22);
I think the error is in the GROUP BY clause (bs.BookID does not belong there):
SELECT cat.CategoryName,sb.SubCategoryName,MAX(COUNT(bs.BookID))
FROM Category cat,SubCategory sb, Book_Subcategory bs
WHERE cat.CategoryID =sb.CategoryID AND sb.SubCategoryID=bs.SubCategoryID
GROUP BY cat.CategoryName,sb.SubCategoryName;
BTW, spaces (and punctuation) are your friends. Don't be lazy about it.