I want to create a view that displays book titles and the number of reviews written for each book.
What are the options when the values are not compatible?
Relevant columns in the Books table:
ISBN13 PK bigint
Title nvarchar(50)
Language nvarchar(30)
Author Id FK int
Category ID FK int
Sample data Books:
INSERT INTO Books VALUES (9783852913735, 'Ulysses', 'English', 100, 'January 06, 2002', 1, null);
INSERT INTO Books VALUES (9780195038637, 'Battle Cry of Freedom', 'English', 490, 'February 25, 1988', 99, null);
INSERT INTO Books VALUES (9789178615155, 'Surhörningen', 'Swedish', 195, '2019', 4, null);
INSERT INTO Books VALUES (9789178614577, 'Jag älskar regnbågsenhörningar', 'Swedish', 190, '2021', 2, null);
Relevant columns in the Reviews table:
ReviewId PK int
BookId FK bigint -- FK to ISBN13
CategoryID FK
WriterId FK
Date
Sample data Reviews:
insert into Reviews values(0020, '9783852913735', '120', 11, '2001-02-21');
insert into Reviews values(0021, '9789177836599', '140', 4, '2001-10-19');
insert into Reviews values(0022, '9789178130979', '110', 1, '2002-02-22');
insert into Reviews values(0023, '9789178130979', '90', 8, '2003-09-06');
insert into Reviews values(0024, '9789178614677', '50', 2, '2005-08-29');
insert into Reviews values(0025, '9789178615155', '10', 5, '2004-08-25');
insert into Reviews values(0026, '971019503872', '10', 9, '2009-06-11');
insert into Reviews values(0027, '9780195038637', '20', 2, '2010-11-10');
Sample data Categories:
insert into Categories (CategoryId, Name) values(10, 'Architecture');
insert into Categories values(20, 'Art');
insert into Categories values(30, 'Astrology');
insert into Categories values(40, 'Baking');
insert into Categories values(50, 'Business Management');
insert into Categories values(60, 'Biology');
insert into Categories values(70, 'Comics');
insert into Categories values(80, 'Computational Science');
SELECT Books.Title, Books.[Author Id]
FROM Books
INNER JOIN Reviews ON Reviews.BookId=Books.ISBN13;
Below is my code for the reviews part, as I want to show the number of reviews per book:
SELECT
BookId,
COUNT
(BookId) [Reviews]
FROM
Reviews
GROUP BY BookId
HAVING COUNT
(BookId)> 1
So expected results would be:
Title | Author | BookId | Category | Number of Reviews
Have a look at this query. I created the view, and since the category has no values compatible with the Books table, I used a LEFT JOIN to Categories so that books with reviews are still returned even when no category matches. Feel free to comment on the answer and let me know if any additions or alterations are required; I am happy to help. Thanks for posting the insert scripts and table definitions, which made it quick to implement and test.
CREATE view My_View AS
(
SELECT
[B].[ISBN13] AS [BookId]
,[B].[Title]
,[B].[AuthorId] AS [Author]
,[C].[Name] As [Category]
, COUNT([R].[ReviewId]) OVER ( PARTITION BY [B].[Title]) AS [Number of reviews]
FROM Reviews [R]
INNER JOIN Books [B]
ON [R].[BookId] = [B].[ISBN13]
LEFT JOIN Categories [C]
ON [B].[CategoryId] = [C].[CategoryId]
)
SELECT * FROM My_View
Assuming from your sample query you are after just a count of reviews, you would have something like this (guessing obviously for the other tables you need to join with). Several ways to correlate but a simple count only requires an inline correlated subquery:
create view MyView as
select
b.Title,
a.Name Author,
b.ISBN13 BookId,
c.Name Category,
(select Count(*) from Reviews r where r.BookId=b.ISBN13) Reviews
from Books b
join Categories c on c.Id=b.CategoryId
join Authors a on a.Id=b.AuthorId
Using a subset of the data you added, this query works fine:
Title BookId Reviews
------------------------------ --------------- -----------
Ulysses 9783852913735 1
Battle Cry of Freedom 9780195038637 1
Surhörningen 9789178615155 1
Jag älskar regnbågsenhörningar 9789178614577 0
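The correlated-subquery count can be checked with a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for the real database, and the tables are cut down to the columns the count needs; the view name BookReviewCounts is made up for this illustration. A book with no matching review row still appears, with a count of 0, because the subquery runs once per book rather than joining.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced versions of the tables: only the columns the count needs.
    CREATE TABLE Books   (ISBN13 INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Reviews (ReviewId INTEGER PRIMARY KEY, BookId INTEGER);

    INSERT INTO Books VALUES (9783852913735, 'Ulysses');
    INSERT INTO Books VALUES (9780195038637, 'Battle Cry of Freedom');
    INSERT INTO Books VALUES (9789178615155, 'Surhörningen');
    INSERT INTO Books VALUES (9789178614577, 'Jag älskar regnbågsenhörningar');

    INSERT INTO Reviews VALUES (20, 9783852913735);
    INSERT INTO Reviews VALUES (21, 9789177836599);
    INSERT INTO Reviews VALUES (22, 9789178130979);
    INSERT INTO Reviews VALUES (23, 9789178130979);
    INSERT INTO Reviews VALUES (24, 9789178614677);
    INSERT INTO Reviews VALUES (25, 9789178615155);
    INSERT INTO Reviews VALUES (26, 971019503872);
    INSERT INTO Reviews VALUES (27, 9780195038637);

    -- Hypothetical view name; one row per book, reviewed or not.
    CREATE VIEW BookReviewCounts AS
    SELECT b.Title,
           b.ISBN13 AS BookId,
           (SELECT COUNT(*) FROM Reviews r WHERE r.BookId = b.ISBN13) AS Reviews
    FROM Books b;
""")

rows = conn.execute(
    "SELECT Title, BookId, Reviews FROM BookReviewCounts ORDER BY Title"
).fetchall()
counts = {title: n for title, _, n in rows}

for title, book_id, n in rows:
    print(title, book_id, n)
```

Reviews whose BookId has no row in Books (for example 9789177836599) are simply ignored by the view, which is the "not compatible" case from the question.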
I am trying to extract distinct items from a Postgres database pairing a column from a table with a column from another table based on a condition. Simplified version looks like this:
CREATE TABLE users
(
id SERIAL PRIMARY KEY,
name VARCHAR(255)
);
CREATE TABLE photos
(
id INT PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
flag VARCHAR(255)
);
INSERT INTO users VALUES (1, 'Bob');
INSERT INTO users VALUES (2, 'Alice');
INSERT INTO users VALUES (3, 'John');
INSERT INTO photos VALUES (1001, 1, 'a');
INSERT INTO photos VALUES (1002, 1, 'b');
INSERT INTO photos VALUES (1003, 1, 'c');
INSERT INTO photos VALUES (1004, 2, 'a');
INSERT INTO photos VALUES (1005, 2, 'x');
What I need is to extract each user name, only once, and a flag value for each of them. The flag value should prioritize a specific one, let's say b. So, the result should look like:
Bob b
Alice a
Where Bob owns a photo having the b flag, while Alice does not and John has no photos. For Alice the output for the flag value is not important (a or x would be just as good) as long as she owns no photo flagged b.
The closest thing I found were some self-join queries where the flag value would have been aggregated using min() or max(), but I am looking for a particular value, which is not first, nor last. Moreover, I found out that you can define your own aggregate functions, but I wonder if there is an easier way of conditioning the query in order to obtain the required data.
Thank you!
Here is a method with aggregation:
select u.name,
coalesce(max(flag) filter (where flag = 'b'),
min(flag)
) as flag
from users u left join
photos p
on u.id = p.user_id
group by u.id, u.name;
That said, a more typical method would be a prioritization query. Perhaps:
select distinct on (u.id) u.name, p.flag
from users u left join
photos p
on u.id = p.user_id
order by u.id, (p.flag = 'b') desc;
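For engines without FILTER or DISTINCT ON, the aggregation idea can be written with a CASE expression instead. The sketch below is an assumption-laden illustration: it runs on Python's built-in sqlite3 module rather than Postgres, and the last photo is given id 1005 so that the primary key stays unique. Because of the LEFT JOIN, John is returned too, with a NULL flag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         flag TEXT);

    INSERT INTO users VALUES (1, 'Bob'), (2, 'Alice'), (3, 'John');
    INSERT INTO photos VALUES (1001, 1, 'a'), (1002, 1, 'b'), (1003, 1, 'c'),
                              (1004, 2, 'a'), (1005, 2, 'x');
""")

# MAX(CASE ...) is non-NULL only when the user owns a photo flagged 'b';
# otherwise COALESCE falls back to an arbitrary but deterministic flag.
rows = conn.execute("""
    SELECT u.name,
           COALESCE(MAX(CASE WHEN p.flag = 'b' THEN p.flag END),
                    MIN(p.flag)) AS flag
    FROM users u
    LEFT JOIN photos p ON u.id = p.user_id
    GROUP BY u.id, u.name
    ORDER BY u.id
""").fetchall()

for name, flag in rows:
    print(name, flag)
```

Switching the LEFT JOIN to an inner JOIN would drop users without photos, which matches the two-row result shown in the question.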
The task is to write a SQL query that finds the name and ID of every student who attends all lectures with more than 4 ECTS.
The tables are:
CREATE TABLE CLASS (
STUDENT_ID INT NOT NULL,
LECTURE_ID INT NOT NULL
);
CREATE TABLE STUDENT (
STUDENT_ID INT NOT NULL,
STUDENT_NAME VARCHAR(255),
PRIMARY KEY (STUDENT_ID)
);
CREATE TABLE LECTURE (
LECTURE_ID INT NOT NULL,
LECTURE_NAME VARCHAR(255),
ECTS INT,
PRIMARY KEY (LECTURE_ID)
);
I came up with this query but this didn't seem to work on SQLFIDDLE. I'm new to SQL and this query has been a little troublesome for me. How would you query this?
SELECT STUD.STUDENT_NAME FROM STUDENT STUD
INNER JOIN CLASS CLS AND LECTURE LEC ON
CLS.STUDENT_ID = STUD.STUDENT_ID
WHERE LEC.CTS > 4
How do I fix this query?
UPDATE
insert into STUDENT values(1, 'wick', 20);
insert into STUDENT values(2, 'Drake', 25);
insert into STUDENT values(3, 'Bake', 42);
insert into STUDENT values(4, 'Man', 5);
insert into LECTURE values(1, 'Math', 6);
insert into LECTURE values(2, 'Prog', 6);
insert into LECTURE values(3, 'Physics', 1);
insert into LECTURE values(4, '4ects', 4);
insert into LECTURE values(5, 'subj', 4);
insert into SCLASS values(1, 3);
insert into SCLASS values(1, 2);
insert into SCLASS values(2, 3);
insert into SCLASS values(3, 1);
insert into SCLASS values(3, 2);
insert into SCLASS values(3, 3);
insert into SCLASS values(4, 4);
insert into SCLASS values(4, 5);
The following approach might get the job done.
It works by generating two subqueries:
one that counts, for each student, how many lectures with ects greater than 4 they attended
another that counts the total number of lectures with ects greater than 4
Then the outer query keeps only the students whose count equals the total:
SELECT x.student_id, x.student_name
FROM
(
SELECT s.student_id, s.student_name, COUNT(DISTINCT l.lecture_id) cnt
FROM
student s
INNER JOIN class c ON c.student_id = s.student_id
INNER JOIN lecture l ON l.lecture_id = c.lecture_id
WHERE l.ects > 4
GROUP BY s.student_id, s.student_name
) x
CROSS JOIN (SELECT COUNT(*) cnt FROM lecture WHERE ects > 4 ) y
WHERE x.cnt = y.cnt ;
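To see the count-and-compare approach on the sample rows, here is a small sketch using Python's built-in sqlite3 module (an assumption; the question does not name its engine). The student table keeps only the two declared columns and the enrolment table is named class as in the CREATE TABLE statement, so the third value and the SCLASS name from the posted inserts are left out. Only lectures 1 and 2 have more than 4 ECTS, and only student 3 attends both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, student_name TEXT);
    CREATE TABLE lecture (lecture_id INTEGER PRIMARY KEY,
                          lecture_name TEXT, ects INTEGER);
    CREATE TABLE class   (student_id INTEGER NOT NULL,
                          lecture_id INTEGER NOT NULL);

    INSERT INTO student VALUES (1, 'wick'), (2, 'Drake'), (3, 'Bake'), (4, 'Man');
    INSERT INTO lecture VALUES (1, 'Math', 6), (2, 'Prog', 6), (3, 'Physics', 1),
                               (4, '4ects', 4), (5, 'subj', 4);
    INSERT INTO class VALUES (1, 3), (1, 2), (2, 3), (3, 1),
                             (3, 2), (3, 3), (4, 4), (4, 5);
""")

# x: qualifying lectures attended per student; y: qualifying lectures overall.
rows = conn.execute("""
    SELECT x.student_id, x.student_name
    FROM (
        SELECT s.student_id, s.student_name,
               COUNT(DISTINCT l.lecture_id) AS cnt
        FROM student s
        INNER JOIN class c   ON c.student_id = s.student_id
        INNER JOIN lecture l ON l.lecture_id = c.lecture_id
        WHERE l.ects > 4
        GROUP BY s.student_id, s.student_name
    ) x
    CROSS JOIN (SELECT COUNT(*) AS cnt FROM lecture WHERE ects > 4) y
    WHERE x.cnt = y.cnt
""").fetchall()

print(rows)
```

One property worth knowing: if no lecture has more than 4 ECTS, the inner subquery produces no rows, so the query returns no students at all.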
As GMB already said in their answer: count the required lectures and compare that with the number taken per student. Here is another way to write such a query. We outer join classes to all lectures with ECTS > 4. Analytic window functions allow us to aggregate over two different groups at the same time (here: all rows and each student's rows).
select *
from student
where (student_id, 0) in -- 0 means no gap between required and taken lectures
(
select
student_id,
count(distinct lecture_id) over () -
count(distinct lecture_id) over (partition by c.student_id) as gap
from lecture l
left join class c using (lecture_id)
where l.ects > 4
);
Demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=74371314913565243863c225847eb044
You can try the following query.
SELECT distinct
STUD.STUDENT_NAME,
STUD.STUDENT_ID
FROM STUDENT STUD
INNER JOIN CLASS CLS ON CLS.STUDENT_ID = STUD.STUDENT_ID
INNER JOIN LECTURE LEC ON LEC.LECTURE_ID=CLS.LECTURE_ID
where LEC.ECTS > 4 group by STUD.STUDENT_ID,STUD.STUDENT_NAME
having COUNT(STUD.STUDENT_ID) =(SELECT COUNT(*) FROM LECTURE WHERE ECTS > 4)
As per the question title, I would like to understand which of the SQL statements below would be most efficient, and why, assuming the data volumes are high in both tables: PRODUCTS (millions of rows) and SALES (billions of rows). The schema details are below. I am not interested in a solution to the question itself; I would like to learn what makes a solution optimal.
CREATE TABLE PRODUCTS
(
PRODUCT_ID INTEGER,
PRODUCT_NAME VARCHAR2(30)
);
CREATE TABLE SALES
(
SALE_ID INTEGER,
PRODUCT_ID INTEGER,
YEAR INTEGER,
Quantity INTEGER,
PRICE INTEGER
);
INSERT INTO PRODUCTS VALUES ( 100, 'Nokia');
INSERT INTO PRODUCTS VALUES ( 200, 'IPhone');
INSERT INTO PRODUCTS VALUES ( 300, 'Samsung');
INSERT INTO PRODUCTS VALUES ( 400, 'LG');
INSERT INTO SALES VALUES ( 1, 100, 2010, 25, 5000);
INSERT INTO SALES VALUES ( 2, 100, 2011, 16, 5000);
INSERT INTO SALES VALUES ( 3, 100, 2012, 8, 5000);
INSERT INTO SALES VALUES ( 4, 200, 2010, 10, 9000);
INSERT INTO SALES VALUES ( 5, 200, 2011, 15, 9000);
INSERT INTO SALES VALUES ( 6, 200, 2012, 20, 9000);
INSERT INTO SALES VALUES ( 7, 300, 2010, 20, 7000);
INSERT INTO SALES VALUES ( 8, 300, 2011, 18, 7000);
INSERT INTO SALES VALUES ( 9, 300, 2012, 20, 7000);
COMMIT;
--Solution 1
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
LEFT OUTER JOIN
SALES S
ON (P.PRODUCT_ID = S.PRODUCT_ID)
WHERE S.QUANTITY IS NULL;
--Solution 2
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
WHERE P.PRODUCT_ID NOT IN
(SELECT DISTINCT PRODUCT_ID FROM SALES);
--Solution 3
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
WHERE NOT EXISTS
(SELECT 1 FROM SALES S WHERE S.PRODUCT_ID = P.PRODUCT_ID);
This is too long for a comment.
Your three queries are different semantically, as written with the provided table definitions.
The first query is logically different from the other two. Unless quantity is declared NOT NULL, it can return matching records where the quantity is NULL.
The second query uses a redundant select distinct. I am guessing that all decent databases would optimize it away. The normal syntax is:
SELECT P.PRODUCT_NAME
FROM PRODUCTS P
WHERE P.PRODUCT_ID NOT IN (SELECT PRODUCT_ID FROM SALES);
And, this is not equivalent to the third. If SALES.PRODUCT_ID is ever NULL, then this query will return no rows at all.
Semantically, I prefer the third version with NOT EXISTS. However, depending on the database one of the first two (fixed?) versions might be an iota faster. As with any performance question, the answer depends on your data, data structure, database engine, and even potentially your hardware.
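The semantic differences are easy to reproduce. The sketch below uses Python's built-in sqlite3 module purely as a convenient engine (the question's schema looks like Oracle, so this is an assumption), with a reduced SALES table and two extra rows invented for the demonstration: one sale with a NULL PRODUCT_ID and one sale with a NULL QUANTITY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCTS (PRODUCT_ID INTEGER, PRODUCT_NAME TEXT);
    CREATE TABLE SALES    (SALE_ID INTEGER, PRODUCT_ID INTEGER, QUANTITY INTEGER);

    INSERT INTO PRODUCTS VALUES (100, 'Nokia'), (200, 'IPhone'),
                                (300, 'Samsung'), (400, 'LG');
    INSERT INTO SALES VALUES (1, 100, 25), (4, 200, 10), (7, 300, 20);
    -- Hypothetical rows added only to expose the differences:
    INSERT INTO SALES VALUES (10, NULL, 5);    -- sale with unknown product
    INSERT INTO SALES VALUES (11, 300, NULL);  -- sale with unknown quantity
""")

def names(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Solution 1: also returns Samsung, because a matching sale has NULL quantity.
left_join = names("""
    SELECT P.PRODUCT_NAME
    FROM PRODUCTS P
    LEFT OUTER JOIN SALES S ON P.PRODUCT_ID = S.PRODUCT_ID
    WHERE S.QUANTITY IS NULL
    ORDER BY P.PRODUCT_ID
""")

# Solution 2: a single NULL in the subquery makes NOT IN reject every row.
not_in = names("""
    SELECT P.PRODUCT_NAME
    FROM PRODUCTS P
    WHERE P.PRODUCT_ID NOT IN (SELECT PRODUCT_ID FROM SALES)
""")

# Solution 3: unaffected by either NULL.
not_exists = names("""
    SELECT P.PRODUCT_NAME
    FROM PRODUCTS P
    WHERE NOT EXISTS (SELECT 1 FROM SALES S WHERE S.PRODUCT_ID = P.PRODUCT_ID)
""")

print(left_join, not_in, not_exists)
```

Testing the join key (S.PRODUCT_ID IS NULL) instead of QUANTITY in the first query, and adding WHERE PRODUCT_ID IS NOT NULL to the NOT IN subquery, would bring all three in line on this data.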
I've provided a summary of my tables and I've gotten a good start on the SQL, but I'm stuck on figuring out how to limit how many items are returned. I should be able to choose one or more terms, and get back the balance due from just those terms.
A student should have 1 record, they can have several reservations over several terms, but payment is not specific to the reservation but rather to the student. That is the part that is throwing me off.
The table structure, data, and the start of my SQL follow. Can someone help me with it, please? This result should not be showing the $500 payment from term 3 for Sue Smith.
I'm using PostgreSQL, but I think this is a pretty basic question that doesn't require anything specific to Postgres.
Current result set:
Student ID  Last      First  Total Fees  Reservation Count  Amount Paid  Amount Due
123456      Jones     Amy    50          1                  50           0
412365      Smith     Sue    100         3                  545          -445
741258      Anderson  Jon    50          1                  0.00         50.00
963258      Holmes    Fred   100         2                  30           70
The schema:
SET search_path TO temp, public;
CREATE TABLE term
(term_id SERIAL PRIMARY KEY,
term_title VARCHAR(100));
CREATE TABLE student
(student_id SERIAL PRIMARY KEY,
student_sis_id VARCHAR(15),
student_first_name VARCHAR(30),
student_last_name VARCHAR(100));
CREATE TABLE reservation
(reservation_id SERIAL PRIMARY KEY,
student_id INTEGER REFERENCES student ON UPDATE CASCADE,
term_id INTEGER REFERENCES term ON UPDATE CASCADE,
reservation_fee_amount NUMERIC DEFAULT 0.00);
CREATE TABLE payment
(payment_id SERIAL PRIMARY KEY,
student_id INTEGER REFERENCES student ON UPDATE CASCADE,
term_id INTEGER REFERENCES term ON UPDATE CASCADE,
payment_cash_amount NUMERIC,
payment_credit_card_amount NUMERIC,
payment_check_amount NUMERIC);
INSERT INTO term VALUES (DEFAULT, 'SESSION 1');
INSERT INTO term VALUES (DEFAULT, 'SESSION 2');
INSERT INTO term VALUES (DEFAULT, 'SESSION 3');
INSERT INTO student VALUES (DEFAULT, 412365, 'Sue', 'Smith');
INSERT INTO student VALUES (DEFAULT, 123456, 'Amy', 'Jones');
INSERT INTO student VALUES (DEFAULT, 741258, 'Jon', 'Anderson');
INSERT INTO student VALUES (DEFAULT, 963258, 'Fred', 'Holmes');
INSERT INTO reservation VALUES (DEFAULT, 1, 1, 50);
INSERT INTO reservation VALUES (DEFAULT, 1, 2, 50);
INSERT INTO reservation VALUES (DEFAULT, 2, 1, 50);
INSERT INTO reservation VALUES (DEFAULT, 3, 2, 50);
INSERT INTO reservation VALUES (DEFAULT, 4, 1, 50);
INSERT INTO reservation VALUES (DEFAULT, 4, 2, 50);
INSERT INTO reservation VALUES (DEFAULT, 1, 3, 50);
INSERT INTO payment VALUES (DEFAULT, 1, 1, 25, 0, 0);
INSERT INTO payment VALUES (DEFAULT, 1, 1, 0, 20, 0);
INSERT INTO payment VALUES (DEFAULT, 2, 1, 25, 25, 0);
INSERT INTO payment VALUES (DEFAULT, 4, 1, 10, 10, 10);
INSERT INTO payment VALUES (DEFAULT, 1, 3, 500, 0, 0);
The query:
SELECT
student.student_sis_id AS "Student ID",
student.student_last_name AS Last,
student.student_first_name AS First,
SUM(reservation.reservation_fee_amount) AS "Total Fees",
(
SELECT COUNT(reservation.reservation_id)
FROM reservation
WHERE student.student_id = reservation.student_id
) AS "Reservation Count",
(
SELECT
COALESCE(SUM(
payment.payment_check_amount
+ payment.payment_cash_amount
+ payment.payment_credit_card_amount
), 0.00)
FROM payment
WHERE payment.student_id = student.student_id
) AS "Amount Paid",
SUM(reservation.reservation_fee_amount) - (
SELECT
COALESCE(SUM(
payment.payment_check_amount
+ payment.payment_cash_amount
+ payment.payment_credit_card_amount
), 0.00)
FROM payment WHERE payment.student_id = student.student_id
) AS "Amount Due"
FROM
student
INNER JOIN reservation ON student.student_id = reservation.student_id
WHERE reservation.term_id IN (1,2)
GROUP BY
student.student_id,
student.student_sis_id,
student.student_last_name,
student.student_first_name
ORDER BY
student.student_sis_id
;
Here is my updated version of the query:
SELECT
s.student_sis_id AS "Student ID",
s.student_last_name AS Last,
s.student_first_name AS First,
SUM(r.reservation_fee_amount) AS "Total Fees",
COUNT(r.reservation_id) AS "Reservation Count",
COALESCE(
SUM(
p.payment_check_amount
+ p.payment_cash_amount
+ p.payment_credit_card_amount
), 0.00
) AS "Amount Paid",
SUM(r.reservation_fee_amount) - (
COALESCE(
SUM(
p.payment_check_amount
+ p.payment_cash_amount
+ p.payment_credit_card_amount
), 0.00
)
) AS "Amount Due"
FROM
student s
INNER JOIN reservation r ON s.student_id = r.student_id
LEFT JOIN payment p ON p.student_id = r.student_id AND p.term_id = r.term_id
WHERE r.term_id IN (1,2)
GROUP BY
s.student_id,
s.student_sis_id,
s.student_last_name,
s.student_first_name
ORDER BY
s.student_sis_id
;
Things to watch:
I included payments in the main (outer) query in order to avoid subqueries
the join type is LEFT [OUTER] JOIN, so the lack of any payment rows will not prevent other data from appearing in the result set
the join condition includes term_id (basically this was the point where you were lost, I think)
and finally I used short table aliases for improving readability.
I hope this is what you are after.
Found the solution to the 2 payment entry problem (that I didn't recognize in my original question). Here is the answer:
set search_path to temp, public;
SELECT
s.student_sis_id AS "Student ID",
s.student_last_name AS "Last Name",
s.student_first_name AS "First Name",
SUM(r.reservation_fee_amount) AS "Total Fees",
COALESCE(p.paid, 0.00) AS "Amount Paid",
COALESCE(SUM(r.reservation_fee_amount) - p.paid, 0.00) AS "Amount Due"
FROM
student s
INNER JOIN reservation r ON s.student_id = r.student_id
left outer join
(
select student_id, term_id,
SUM(
p.payment_check_amount
+ p.payment_cash_amount
+ p.payment_credit_card_amount
) AS "paid"
from payment p
group by student_id, term_id
) as p
ON p.student_id = r.student_id AND p.term_id = r.term_id
WHERE r.reservation_completed AND r.term_id IN (1,2)
GROUP BY
s.student_sis_id,
s.student_last_name,
s.student_first_name,
p.paid
ORDER BY
s.student_sis_id
Thank you dezso and davek
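One way to sidestep the double-counting that appears when a reservation row joins to several payment rows is to aggregate each side per student before joining. The sketch below is a variation on that idea, not the query posted above: it runs on Python's built-in sqlite3 module instead of PostgreSQL, leaves out the reservation_completed filter (that column is not in the posted schema), and hard-codes terms 1 and 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, student_sis_id TEXT,
                          student_first_name TEXT, student_last_name TEXT);
    CREATE TABLE reservation (reservation_id INTEGER PRIMARY KEY,
                              student_id INTEGER, term_id INTEGER,
                              reservation_fee_amount NUMERIC);
    CREATE TABLE payment (payment_id INTEGER PRIMARY KEY,
                          student_id INTEGER, term_id INTEGER,
                          payment_cash_amount NUMERIC,
                          payment_credit_card_amount NUMERIC,
                          payment_check_amount NUMERIC);

    INSERT INTO student VALUES (1, '412365', 'Sue', 'Smith'),
                               (2, '123456', 'Amy', 'Jones'),
                               (3, '741258', 'Jon', 'Anderson'),
                               (4, '963258', 'Fred', 'Holmes');
    INSERT INTO reservation (student_id, term_id, reservation_fee_amount)
    VALUES (1, 1, 50), (1, 2, 50), (2, 1, 50), (3, 2, 50),
           (4, 1, 50), (4, 2, 50), (1, 3, 50);
    INSERT INTO payment (student_id, term_id, payment_cash_amount,
                         payment_credit_card_amount, payment_check_amount)
    VALUES (1, 1, 25, 0, 0), (1, 1, 0, 20, 0), (2, 1, 25, 25, 0),
           (4, 1, 10, 10, 10), (1, 3, 500, 0, 0);
""")

# Fees and payments are each reduced to one row per student first,
# so neither side can multiply the other's rows in the join.
rows = conn.execute("""
    SELECT s.student_sis_id,
           r.fees,
           r.reservations,
           COALESCE(p.paid, 0)          AS paid,
           r.fees - COALESCE(p.paid, 0) AS due
    FROM student s
    JOIN (SELECT student_id,
                 SUM(reservation_fee_amount) AS fees,
                 COUNT(*)                    AS reservations
          FROM reservation
          WHERE term_id IN (1, 2)
          GROUP BY student_id) r ON r.student_id = s.student_id
    LEFT JOIN (SELECT student_id,
                      SUM(payment_cash_amount
                          + payment_credit_card_amount
                          + payment_check_amount) AS paid
               FROM payment
               WHERE term_id IN (1, 2)
               GROUP BY student_id) p ON p.student_id = s.student_id
    ORDER BY s.student_sis_id
""").fetchall()

for row in rows:
    print(row)
```

On the sample data this gives Sue Smith 100 in fees, 45 paid and 55 due for terms 1 and 2, with the 500 payment from term 3 excluded.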
I have the following (simplified) model:
Book
id
name
BookCategory
book_id
category_id
rank
Category
id
name
With a given category id, I'd like to get the books having that category as the highest ranked one.
I'll give an example to be more clear about it:
Book
id name
--- -------
1 On Writing
2 Zen teachings
3 Siddharta
BookCategory
book_id category_id rank
--- ------- -----
1 2 34.32
1 5 24.23
1 9 54.65
2 5 27.33
2 9 28.32
3 2 30.43
3 5 27.87
Category
id name
--- -------
2 Writing
5 Spiritual
9 Buddism
The result for category_id = 2 would be the book with id = 3.
This is the query I'm running:
SELECT book."name" AS bookname
FROM bookcategory AS bookcat
LEFT JOIN book ON bookcat."book_id" = book."id"
LEFT JOIN category cat ON bookcat."category_id" = cat."id"
WHERE cat."id" = 2
ORDER BY bookcat."rank"
This is not the right way to do it because it doesn't select the max rank of each book. I've yet to find a proper solution.
Note: I'm using the postgresql 9.1 version.
Edit:
DB Schema (taken from martin's SQL Fiddle answer):
create table Book (
id int,
name varchar(16)
);
insert into Book values(1, 'On Writing');
insert into Book values(2, 'Zen teachings');
insert into Book values(3, 'Siddharta');
create table BookCategory (
book_id int,
category_id int,
rank real
);
insert into BookCategory values(1,2,34.32);
insert into BookCategory values(1,5,24.23);
insert into BookCategory values(1,9,54.65);
insert into BookCategory values(2,5,27.33);
insert into BookCategory values(2,9,28.32);
insert into BookCategory values(3,2,30.43);
insert into BookCategory values(3,5,27.87);
create table Category (
id int,
name varchar(16)
);
insert into Category values(2, 'Writing');
insert into Category values(5,'Spiritual');
insert into Category values(9, 'Buddism');
Add another column to calculate the rank:
dense_rank() OVER (PARTITION BY book."name" ORDER BY bookcat."rank" ASC) AS rank
To set up:
CREATE TABLE Book
(
id int PRIMARY KEY,
name text not null
);
CREATE TABLE Category
(
id int PRIMARY KEY,
name text not null
);
CREATE TABLE BookCategory
(
book_id int,
category_id int,
rank numeric not null,
primary key (book_id, category_id)
);
INSERT INTO Book VALUES
(1, 'On Writing'),
(2, 'Zen teachings'),
(3, 'Siddharta');
INSERT INTO Category VALUES
(2, 'Writing'),
(5, 'Spiritual'),
(9, 'Buddism');
INSERT INTO BookCategory VALUES
(1, 2, 34.32),
(1, 5, 24.23),
(1, 9, 54.65),
(2, 5, 27.33),
(2, 9, 28.32),
(3, 2, 30.43),
(3, 5, 27.87);
The solution:
SELECT Book.name
FROM (
SELECT DISTINCT ON (book_id)
*
FROM BookCategory
ORDER BY book_id, rank DESC
) t
JOIN Book ON Book.id = t.book_id
WHERE t.category_id = 2
ORDER BY t.rank;
Logically, the subquery in the FROM clause generates a relation with the highest ranking category for each book, from which you then select the books in that category and order them by the ranking in that category.
Results:
name
-----------
Siddharta
(1 row)
Is this what you want?
SELECT
book.name, mx.max_rank
FROM
(SELECT
max(rank) AS max_rank , book_id
FROM BookCategory WHERE category_id = 2
GROUP BY
book_id
) mx
JOIN Book ON
mx.book_id = Book.id
If I understand your question correctly, you need to get the maximum for a given category for every book in BookCategory (that is what the inner select does) and then simply join it to the Book table on book_id.
The whole example is on SQL Fiddle
EDIT:
I see that there is already an accepted answer, but for the sake of completeness, here is my answer following the clarification of the question:
SELECT
Book.name
FROM
(SELECT max(rank) AS max_rank, book_id AS bid
FROM BookCategory GROUP BY book_id
) mx
JOIN BookCategory ON
rank = max_rank
AND book_id = bid
JOIN Book
ON book_id = Book.id
WHERE category_id = 2
On SQL Fiddle.
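The join-on-maximum variant uses only standard SQL, so it can be tried outside Postgres. Below is a minimal sketch on Python's built-in sqlite3 module (an assumption made for portability; the helper name books_topped_by is invented for this example). It loads the sample rows and asks, for a given category, which books have that category as their highest-ranked one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE BookCategory (book_id INTEGER, category_id INTEGER,
                               "rank" REAL NOT NULL,
                               PRIMARY KEY (book_id, category_id));

    INSERT INTO Book VALUES (1, 'On Writing'), (2, 'Zen teachings'),
                            (3, 'Siddharta');
    INSERT INTO BookCategory VALUES
        (1, 2, 34.32), (1, 5, 24.23), (1, 9, 54.65),
        (2, 5, 27.33), (2, 9, 28.32),
        (3, 2, 30.43), (3, 5, 27.87);
""")

def books_topped_by(category_id):
    """Names of books whose highest-ranked category is category_id."""
    sql = """
        SELECT b.name
        FROM BookCategory bc
        JOIN (SELECT book_id, MAX("rank") AS max_rank
              FROM BookCategory
              GROUP BY book_id) mx
          ON mx.book_id = bc.book_id AND mx.max_rank = bc."rank"
        JOIN Book b ON b.id = bc.book_id
        WHERE bc.category_id = ?
        ORDER BY bc."rank"
    """
    return [row[0] for row in conn.execute(sql, (category_id,))]

writing = books_topped_by(2)
spiritual = books_topped_by(5)
buddhism = books_topped_by(9)
print(writing, spiritual, buddhism)
```

Unlike DISTINCT ON, this form returns a book for more than one category if two of its categories tie for the maximum rank.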