Query to get the highest rank of each item - sql

I have the following (simplified) model:
Book
id
name
BookCategory
book_id
category_id
rank
Category
id
name
Given a category id, I'd like to get the books for which that category is the highest-ranked one.
Here's an example to make it clearer:
Book
id name
--- -------
1 On Writing
2 Zen teachings
3 Siddharta
BookCategory
book_id category_id rank
--- ------- -----
1 2 34.32
1 5 24.23
1 9 54.65
2 5 27.33
2 9 28.32
3 2 30.43
3 5 27.87
Category
id name
--- -------
2 Writing
5 Spiritual
9 Buddism
The result for category_id = 2 would be the book with id = 3.
This is the query I'm running:
SELECT book."name" AS bookname
FROM bookcategory AS bookcat
LEFT JOIN book ON bookcat."book_id" = book."id"
LEFT JOIN category cat ON bookcat."category_id" = cat."id"
WHERE cat."id" = 2
ORDER BY bookcat."rank"
This is not the right way to do it because it doesn't select the max rank of each book. I've yet to find a proper solution.
Note: I'm using PostgreSQL 9.1.
Edit:
DB Schema (taken from martin's SQL Fiddle answer):
create table Book (
id int,
name varchar(16)
);
insert into Book values(1, 'On Writing');
insert into Book values(2, 'Zen teachings');
insert into Book values(3, 'Siddharta');
create table BookCategory (
book_id int,
category_id int,
rank real
);
insert into BookCategory values(1,2,34.32);
insert into BookCategory values(1,5,24.23);
insert into BookCategory values(1,9,54.65);
insert into BookCategory values(2,5,27.33);
insert into BookCategory values(2,9,28.32);
insert into BookCategory values(3,2,30.43);
insert into BookCategory values(3,5,27.87);
create table Category (
id int,
name varchar(16)
);
insert into Category values(2, 'Writing');
insert into Category values(5,'Spiritual');
insert into Category values(9, 'Buddism');

Add another column that ranks each book's categories, highest first, so that the value 1 marks the top category:
dense_rank() OVER (PARTITION BY book."name" ORDER BY bookcat."rank" DESC) AS rank

To set up:
CREATE TABLE Book
(
id int PRIMARY KEY,
name text not null
);
CREATE TABLE Category
(
id int PRIMARY KEY,
name text not null
);
CREATE TABLE BookCategory
(
book_id int,
category_id int,
rank numeric not null,
primary key (book_id, category_id)
);
INSERT INTO Book VALUES
(1, 'On Writing'),
(2, 'Zen teachings'),
(3, 'Siddharta');
INSERT INTO Category VALUES
(2, 'Writing'),
(5, 'Spiritual'),
(9, 'Buddism');
INSERT INTO BookCategory VALUES
(1, 2, 34.32),
(1, 5, 24.23),
(1, 9, 54.65),
(2, 5, 27.33),
(2, 9, 28.32),
(3, 2, 30.43),
(3, 5, 27.87);
The solution:
SELECT Book.name
FROM (
SELECT DISTINCT ON (book_id)
*
FROM BookCategory
ORDER BY book_id, rank DESC
) t
JOIN Book ON Book.id = t.book_id
WHERE t.category_id = 2
ORDER BY t.rank;
Logically, the subquery in the FROM clause generates a relation with the highest ranking category for each book, from which you then select the books in that category and order them by the ranking in that category.
Results:
name
-----------
Siddharta
(1 row)
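For engines that lack DISTINCT ON, the same "top category per book" idea can be sketched with NOT EXISTS. This is not the answer's query, just an alternative formulation; it is run here against SQLite through Python's sqlite3 module (an assumption made so the example is self-contained, the question itself targets PostgreSQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Book (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE BookCategory (
    book_id INTEGER, category_id INTEGER, "rank" REAL NOT NULL,
    PRIMARY KEY (book_id, category_id)
);
INSERT INTO Book VALUES (1, 'On Writing'), (2, 'Zen teachings'), (3, 'Siddharta');
INSERT INTO BookCategory VALUES
    (1, 2, 34.32), (1, 5, 24.23), (1, 9, 54.65),
    (2, 5, 27.33), (2, 9, 28.32),
    (3, 2, 30.43), (3, 5, 27.87);
""")

# Keep a book's row for the requested category only if no other
# category of the same book has a strictly higher rank.
top_books = conn.execute("""
    SELECT b.name
    FROM BookCategory bc
    JOIN Book b ON b.id = bc.book_id
    WHERE bc.category_id = ?
      AND NOT EXISTS (
          SELECT 1 FROM BookCategory other
          WHERE other.book_id = bc.book_id
            AND other."rank" > bc."rank"
      )
    ORDER BY bc."rank"
""", (2,)).fetchall()

print(top_books)
```

One difference worth noting: if two categories tie for a book's highest rank, this version returns the book for both categories, whereas DISTINCT ON keeps only one of the tied rows.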

Is this what you want?
SELECT
book.name, mx.max_rank
FROM
(SELECT
max(rank) AS max_rank , book_id
FROM BookCategory WHERE category_id = 2
GROUP BY
book_id
) mx
JOIN Book ON
mx.book_id = Book.id
If I understand your question correctly, you need to get the maximum for a given category for every book in BookCategory (that is what the inner select does) and then simply join it to the Book table on book_id.
The whole example is on SQL Fiddle
EDIT:
I see that there is already an accepted answer, but for the sake of completeness, here is my answer following the clarification of the question:
SELECT
Book.name
FROM
(SELECT max(rank) AS max_rank, book_id AS bid
FROM BookCategory GROUP BY book_id
) mx
JOIN BookCategory ON
rank = max_rank
AND book_id = bid
JOIN Book
ON book_id = Book.id
WHERE category_id = 2
On SQL Fiddle.
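To see how the two queries in this answer differ, here is a small sketch that runs both against the question's sample data. It uses SQLite through Python's sqlite3 module, which is an assumption for the sake of a runnable example; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Book (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE BookCategory (book_id INTEGER, category_id INTEGER, "rank" REAL);
INSERT INTO Book VALUES (1, 'On Writing'), (2, 'Zen teachings'), (3, 'Siddharta');
INSERT INTO BookCategory VALUES
    (1, 2, 34.32), (1, 5, 24.23), (1, 9, 54.65),
    (2, 5, 27.33), (2, 9, 28.32),
    (3, 2, 30.43), (3, 5, 27.87);
""")

# First query: every book that appears in category 2 at all.
in_category = conn.execute("""
    SELECT Book.name
    FROM (SELECT max("rank") AS max_rank, book_id
          FROM BookCategory WHERE category_id = 2
          GROUP BY book_id) mx
    JOIN Book ON mx.book_id = Book.id
    ORDER BY Book.id
""").fetchall()

# Second query: only books whose overall best category is category 2.
top_category = conn.execute("""
    SELECT Book.name
    FROM (SELECT max("rank") AS max_rank, book_id AS bid
          FROM BookCategory GROUP BY book_id) mx
    JOIN BookCategory bc ON bc."rank" = mx.max_rank AND bc.book_id = mx.bid
    JOIN Book ON bc.book_id = Book.id
    WHERE bc.category_id = 2
""").fetchall()

print(in_category)
print(top_category)
```

The first result includes "On Writing" even though its best category is 9; only the second query matches the clarified question.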

Related

View table SQL Server

I want to create a view displaying book titles and the number of reviews written for each book.
What are the options when the values are not compatible?
Relevant columns in the Books table:
ISBN13 PK bigint
Title nvarchar(50)
Language nvarchar(30)
Author Id FK int
Category ID FK int
Sample data Books:
INSERT INTO Books VALUES (9783852913735, 'Ulysses', 'English', 100, 'January 06, 2002', 1, null);
INSERT INTO Books VALUES (9780195038637, 'Battle Cry of Freedom', 'English', 490, 'February 25, 1988', 99, null);
INSERT INTO Books VALUES (9789178615155, 'Surhörningen', 'Swedish', 195, '2019', 4, null);
INSERT INTO Books VALUES (9789178614577, 'Jag älskar regnbågsenhörningar', 'Swedish', 190, '2021', 2, null);
Relevant columns in the Reviews table:
ReviewId PK int
BookId FK bigint -- FK to ISBN13
CategoryID FK
WriterId FK
Date
Sample data Reviews:
insert into Reviews values(0020, '9783852913735', '120', 11, '2001-02-21');
insert into Reviews values(0021, '9789177836599', '140', 4, '2001-10-19');
insert into Reviews values(0022, '9789178130979', '110', 1, '2002-02-22');
insert into Reviews values(0023, '9789178130979', '90', 8, '2003-09-06');
insert into Reviews values(0024, '9789178614677', '50', 2, '2005-08-29');
insert into Reviews values(0025, '9789178615155', '10', 5, '2004-08-25');
insert into Reviews values(0026, '971019503872', '10', 9, '2009-06-11');
insert into Reviews values(0027, '9780195038637', '20', 2, '2010-11-10');
Sample data Categories:
insert into Categories (CategoryId, Name) values(10, 'Architecture');
insert into Categories values(20, 'Art');
insert into Categories values(30, 'Astrology');
insert into Categories values(40, 'Baking');
insert into Categories values(50, 'Business Management');
insert into Categories values(60, 'Biology');
insert into Categories values(70, 'Comics');
insert into Categories values(80, 'Computational Science');
SELECT Books.Title, Books.[Author Id]
FROM Books
INNER JOIN Reviews ON Reviews.BookId=Books.ISBN13;
Below is my code for the reviews part, as I want to show the number of reviews per book:
SELECT
BookId,
COUNT(BookId) [Reviews]
FROM
Reviews
GROUP BY BookId
HAVING COUNT(BookId) > 1
So expected results would be:
Title | Author | BookId | Category | Number of Reviews
Have a look at this query. I created the view with an INNER JOIN between Reviews and Books, so it returns only the records that have values in both tables, and with a LEFT JOIN to Categories, because the category values are not compatible with the Books table. Feel free to comment on the answer and let me know if any additions or alterations are required. Thanks for posting the insert scripts and table definitions, which made it quick to implement and test.
CREATE view My_View AS
(
SELECT
[B].[ISBN13] AS [BookId]
,[B].[Title]
,[B].[AuthorId] AS [Author]
,[C].[Name] As [Category]
, COUNT([R].[ReviewId]) OVER ( PARTITION BY [B].[Title]) AS [Number of reviews]
FROM Reviews [R]
INNER JOIN Books [B]
ON [R].[BookId] = [B].[ISBN13]
LEFT JOIN Categories [C]
ON [B].[CategoryId] = [C].[CategoryId]
)
SELECT * FROM My_View
Assuming from your sample query you are after just a count of reviews, you would have something like this (guessing obviously for the other tables you need to join with). Several ways to correlate but a simple count only requires an inline correlated subquery:
create view MyView as
select
b.Title,
a.Name Author,
b.ISBN13 BookId,
c.Name Category,
(select Count(*) from Reviews r where r.BookId=b.ISBN13) Reviews
from Books b
join Categories c on c.Id=b.CategoryId
join Authors a on a.Id=b.AuthorId
Using a subset of the data you added, this query works fine
Title BookId Reviews
------------------------------ --------------- -----------
Ulysses 9783852913735 1
Battle Cry of Freedom 9780195038637 1
Surhörningen 9789178615155 1
Jag älskar regnbågsenhörningar 9789178614577 0
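As a cross-check, the same per-book counts can be produced with a LEFT JOIN plus GROUP BY instead of a correlated subquery. The sketch below is an alternative formulation, not the answer's view; it runs on SQLite via Python's sqlite3 (an assumption, since the question is about SQL Server), and the view name BookReviewCounts and the trimmed-down columns are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Books (ISBN13 INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Reviews (ReviewId INTEGER PRIMARY KEY, BookId INTEGER NOT NULL);
INSERT INTO Books VALUES
    (9783852913735, 'Ulysses'),
    (9780195038637, 'Battle Cry of Freedom'),
    (9789178615155, 'Surhörningen'),
    (9789178614577, 'Jag älskar regnbågsenhörningar');
-- Review 21 points at an ISBN that is not in Books, as in the sample data.
INSERT INTO Reviews VALUES
    (20, 9783852913735), (21, 9789177836599),
    (25, 9789178615155), (27, 9780195038637);

-- LEFT JOIN keeps books without reviews; COUNT(r.ReviewId) ignores the
-- NULLs produced for them, so they get a count of 0.
CREATE VIEW BookReviewCounts AS
SELECT b.Title, b.ISBN13 AS BookId, COUNT(r.ReviewId) AS Reviews
FROM Books b
LEFT JOIN Reviews r ON r.BookId = b.ISBN13
GROUP BY b.ISBN13, b.Title;
""")

counts = dict(conn.execute("SELECT Title, Reviews FROM BookReviewCounts").fetchall())
print(counts)
```

Counting `r.ReviewId` rather than `*` matters here: `COUNT(*)` would report 1 for a book with no reviews, because the LEFT JOIN still emits one row for it.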

Insert multiple rows using the same foreign key that needs to be selected

Assume that there are two tables:
CREATE TABLE products (id SERIAL, name TEXT);
CREATE TABLE comments (id SERIAL, product_id INT, txt TEXT);
I would like to insert multiple comments for the same product. But I don't know the product_id yet, only the product name.
So I could do:
INSERT INTO comments (txt, product_id) VALUES
( 'cool', (SELECT id from products WHERE name='My product name') ),
( 'great', (SELECT id from products WHERE name='My product name') ),
...
( 'many comments later', (SELECT id from products WHERE name='My product name') );
I'd like to reduce the repetition. How to do this?
I tried this but it inserts no rows:
INSERT INTO
comments (txt, product_id)
SELECT
x.txt,
p.id
FROM
(
VALUES
('Great product'),
('I love it'),
...
('another comment')
) x (txt)
JOIN products p ON p.name = 'My product name';
Your query works just fine. The only way it inserts zero rows is if the products table contains no product with the given name (My product name in your query). However, @a_horse_with_no_name's suggestion to use a CROSS JOIN might simplify your query a bit. You can combine it with a CTE that collects all the comments, and then CROSS JOIN it with the record you filter from the products table.
CREATE TABLE products (id SERIAL, name TEXT);
CREATE TABLE comments (id SERIAL, product_id INT, txt TEXT);
INSERT INTO products VALUES (1, 'My product name'),(2,'Another product name');
WITH j (txt) AS (
VALUES ('Great product'),('I love it'),('another comment')
)
INSERT INTO comments (product_id,txt)
SELECT id,j.txt FROM products
CROSS JOIN j WHERE name = 'My product name';
SELECT * FROM comments;
id | product_id | txt
----+------------+-----------------
1 | 1 | Great product
2 | 1 | I love it
3 | 1 | another comment
Check this db<>fiddle
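The "zero rows when the name does not match" behaviour is easy to confirm. Below is a minimal sketch using SQLite through Python's sqlite3 module (an assumption; the question uses PostgreSQL, and SERIAL is replaced by INTEGER PRIMARY KEY). The helper comment_count and the name 'No such product' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, product_id INT, txt TEXT);
INSERT INTO products VALUES (1, 'My product name'), (2, 'Another product name');
""")

# The product name appears once, as a bound parameter.
INSERT_SQL = """
WITH j(txt) AS (VALUES ('Great product'), ('I love it'), ('another comment'))
INSERT INTO comments (product_id, txt)
SELECT p.id, j.txt FROM products p CROSS JOIN j
WHERE p.name = ?
"""

def comment_count(connection):
    """Return the current number of rows in comments."""
    return connection.execute("SELECT COUNT(*) FROM comments").fetchone()[0]

conn.execute(INSERT_SQL, ("My product name",))
matched = comment_count(conn)              # rows added for an existing product

conn.execute(INSERT_SQL, ("No such product",))
unmatched = comment_count(conn) - matched  # rows added for a missing product

print(matched, unmatched)
```

If the second number is what you are seeing in your own database, check the product name for typos, case differences or trailing spaces.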

SQL - EXIST OR ALL?

I have two tables, student and grade.
The grade table has an attribute student_id which references student_id in the student table.
How do I find which students have every grade that exists?
In case this is not clear:
Student ID Name
1 1 John
2 2 Paul
3 3 George
4 4 Mike
5 5 Lisa
Grade Student_Id Course Grade
1 1 Math A
2 1 English B
3 1 Physics C
4 2 Math A
5 2 English A
6 2 Physics B
7 3 Economics A
8 4 Art C
9 5 Biology A
Assume the only grades are A, B and C (no D, E or fail).
I want to find only John, because he has grades A, B and C, while
other students like Paul (2) should not be selected, because he does not have grade C. It does not matter which courses he took; I just need to find out whether he has all of the available grades.
I feel like I should use something like EXISTS or ALL in SQL, but I'm not sure.
Please help. Thank you in advance.
I would use GROUP BY and HAVING, but like this:
SELECT s.Name
FROM Student s JOIN
Grade g
ON s.ID = g.Student_Id
GROUP BY s.id, s.Name
HAVING COUNT(DISTINCT g.Grade) = (SELECT COUNT(DISTINCT g2.grade) FROM grade g2);
You say "all the grades out there", so the query should not use a constant for that.
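Here is a small sketch of why comparing against a subquery rather than a constant matters. It runs the query above on the question's sample data, then adds a new grade 'D' and runs it again. SQLite via Python's sqlite3 is an assumption made to keep the example runnable; the extra 'D' row is invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Grade (Student_Id INTEGER, Course TEXT, Grade TEXT);
INSERT INTO Student VALUES (1,'John'),(2,'Paul'),(3,'George'),(4,'Mike'),(5,'Lisa');
INSERT INTO Grade VALUES
    (1,'Math','A'),(1,'English','B'),(1,'Physics','C'),
    (2,'Math','A'),(2,'English','A'),(2,'Physics','B'),
    (3,'Economics','A'),(4,'Art','C'),(5,'Biology','A');
""")

# A student qualifies when their number of distinct grades equals the
# number of distinct grades found anywhere in the table.
QUERY = """
SELECT s.Name
FROM Student s JOIN Grade g ON s.ID = g.Student_Id
GROUP BY s.ID, s.Name
HAVING COUNT(DISTINCT g.Grade) = (SELECT COUNT(DISTINCT g2.Grade) FROM Grade g2)
"""

before = [row[0] for row in conn.execute(QUERY)]

# Hypothetical new row: once a 'D' exists, nobody holds every grade.
conn.execute("INSERT INTO Grade VALUES (4, 'Math', 'D')")
after = [row[0] for row in conn.execute(QUERY)]

print(before, after)
```

With a hard-coded `= 3`, John would still be returned after the 'D' row is added, which no longer matches "every grade that exists".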
You can use HAVING COUNT(DISTINCT Grade) = 3 to check that the student has all 3 grades:
SELECT Name
FROM Student S
JOIN Grade G ON S.ID = G.Student_Id
GROUP BY Name
HAVING COUNT(DISTINCT Grade) = 3
Guessing at S.ID vs S.Student on the join. Not sure what the difference is there.
By using exists
select * from student s
where exists ( select 1
from grades g where g.Student_Id=s.ID
group by g.Student_Id
having count(distinct Grade)=3
)
Example
with Student as
(
select 1 as id,'John' as person
union all
select 2 as id,'Paul' as person
union all
select 3 as id,'jorge'
),
Grades as
(
select 1 as Graden, 1 as Student_Id, 'Math' as Course, 'A' as Grade
union all
select 2 as Graden, 1 as Student_Id, 'English' as Course, 'B' as Grade
union all
select 3 as Graden, 1 as Student_Id, 'Physics' as Course, 'C' as Grade
union all
select 4 as Graden, 2 as Student_Id, 'Math' as Course, 'A' as Grade
union all
select 5 as Graden, 2 as Student_Id, 'English' as Course, 'A' as Grade
union all
select 6 as Graden, 2 as Student_Id, 'Physics' as Course, 'B' as Grade
)
select * from Student s
where exists ( select 1
from Grades g where g.Student_Id=s.ID
group by g.Student_Id
having count(distinct Grade)=3
)
Note: I used having count(distinct Grade)=3 because your sample data contains three distinct grades.
Before delving into the answer, here's a working SQL Fiddle Example so you can see this in action.
As Gordon Linoff points out in his excellent answer, you should use GroupBy and Having Count(Distinct ... ) ... as an easy way to check.
However, I'd recommend changing your design to ensure that you have tables for each concern.
Currently your Grade table holds each student's grade per course. So it's more of a StudentCourse table (i.e. it's the combination of student and course that's unique / gives you that table's natural key). You should have an actual Grade table to give you the list of available grades; e.g.
create table Grade
(
Code char(1) not null constraint PK_Grade primary key clustered
)
insert Grade (Code) values ('A'),('B'),('C')
This ensures that your query still works if you later decide to include grades D and E, without having to amend any code. It also means you only have to query a small table to get the complete list of grades, rather than a potentially huge one, which should give better performance. Finally, it helps you maintain good data: because the validation lives in the database as a constraint, you can't accidentally end up with a student with grade X due to a typo.
select Name from Student s
where s.Id in
(
select sc.StudentId
from StudentCourse sc
group by sc.StudentId
having count(distinct sc.Grade) = (select count(Code) from Grade)
)
order by s.Name
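The data-quality point can be illustrated with a foreign key to the Grade lookup table. This is a minimal sketch on SQLite via Python's sqlite3 (an assumption; the answer targets SQL Server), with a simplified StudentCourse table that keeps the course name as text. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Grade (Code TEXT PRIMARY KEY);
INSERT INTO Grade (Code) VALUES ('A'), ('B'), ('C');

CREATE TABLE StudentCourse (
    StudentId INTEGER NOT NULL,
    Course    TEXT NOT NULL,
    Grade     TEXT REFERENCES Grade(Code),  -- only listed grades are allowed
    UNIQUE (StudentId, Course)
);
""")

# A valid grade is accepted.
conn.execute("INSERT INTO StudentCourse VALUES (1, 'Math', 'A')")

# A typo such as 'X' violates the foreign key and is rejected.
try:
    conn.execute("INSERT INTO StudentCourse VALUES (1, 'English', 'X')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT Course, Grade FROM StudentCourse").fetchall()
print(rejected, stored)
```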
Likewise, it's sensible to create a Course table. In this case holding Ids for each course; since holding the full course name in your StudentCourse table (as we're now calling it) uses up a lot more space and again lacks validation / constraints. As such, I'd propose amending your database schema to look like this:
create table Grade
(
Code char(1) not null constraint PK_Grade primary key clustered
)
insert Grade (Code) values ('A'),('B'),('C')
create table Course
(
Id bigint not null identity(1,1) constraint PK_Course primary key clustered
, Name nvarchar(128) not null constraint UK_Course_Name unique
)
insert Course (Name) values ('Math'),('English'),('Physics'),('Economics'),('Art'),('Biology')
create table Student
(
Id bigint not null identity(1,1) constraint PK_Student primary key clustered
,Name nvarchar(128) not null constraint UK_Student_Name unique
)
set identity_insert Student on --inserting with IDs to ensure the ids of these students match data from your question
insert Student (Id, Name)
values (1, 'John')
, (2, 'Paul')
, (3, 'George')
, (4, 'Mike')
, (5, 'Lisa')
set identity_insert Student off
create table StudentCourse
(
Id bigint not null identity(1,1) constraint PK_StudentCourse primary key
, StudentId bigint not null constraint FK_StudentCourse_StudentId foreign key references Student(Id)
, CourseId bigint not null constraint FK_StudentCourse_CourseId foreign key references Course(Id)
, Grade char /* allow null in case we use this table for pre-results; otherwise make non-null */ constraint FK_StudentCourse_Grade foreign key references Grade(Code)
, Constraint UK_StudentCourse_StudentAndCourse unique clustered (StudentId, CourseId)
)
insert StudentCourse (StudentId, CourseId, Grade)
select s.Id, c.Id, x.Grade
from (values
('John', 'Math', 'A')
,('John', 'English', 'B')
,('John', 'Physics', 'C')
,('Paul', 'Math', 'A')
,('Paul', 'English', 'A')
,('Paul', 'Physics', 'B')
,('George', 'Economics','A')
,('Mike', 'Art', 'C')
,('Lisa', 'Biology', 'A')
) x(Student, Course, Grade)
inner join Student s on s.Name = x.Student
inner join Course c on c.Name = x.Course

Convert a one-to-many relationship to many-to-many and update existing references

I have a one-to-many relationship which I've converted to a many-to-many relationship.
Example:
Main Table (
Id int,
Code varchar(2)
)
Secondary Table (
Id int,
Name varchar(250),
MainId int
)
I have the following entries in the Main table:
Id Code
1 A
2 B
3 C
Secondary table:
Id Name MainId
1 Foo 1
2 Bar 1
3 Foo 2
4 Bar 2
5 Bar 3
Because the values in the 'Name' column of the 'Secondary' table are repeated quite often, the database has grown considerably. I've decided to convert the relationship to many-to-many and reference only unique 'Name' entries.
As a first step I've created the following join table:
MainSecondary Table (
MainId int,
SecondaryId int,
)
For the final step I need to update the existing references and delete duplicate records based on the 'Name' column, which is where I'm stuck (over a million records).
The intended outcome should be:
Main table:
Id Code
1 A
2 B
3 C
Secondary table:
Id Name
1 Foo
2 Bar
MainSecondary table:
MainId SecondaryId
1 (A) 1 (Foo)
1 (A) 2 (Bar)
2 (B) 1 (Foo)
2 (B) 2 (Bar)
3 (C) 1 (Foo)
Set-up
create table main
(
id int,
code varchar(2)
);
create table secondary
(
id int,
name varchar(250),
main_id int
);
insert into main (id, code) values (1, 'A');
insert into main (id, code) values (2, 'B');
insert into main (id, code) values (3, 'C');
insert into secondary (id, name, main_id) values (1, 'Foo', 1);
insert into secondary (id, name, main_id) values (2, 'Bar', 1);
insert into secondary (id, name, main_id) values (3, 'Foo', 2);
insert into secondary (id, name, main_id) values (4, 'Bar', 2);
insert into secondary (id, name, main_id) values (5, 'Bar', 3);
Create new_secondary table
create table new_secondary
(
id int,
name varchar(250)
);
Create new relationship table: main_secondary
create table main_secondary
(
main_id int,
secondary_id int
);
Populate new_secondary table, removing duplicates
insert into new_secondary
(
id,
name
)
select
min(id),
name
from
secondary
group by
name;
Populate main_secondary relationship table
insert into main_secondary
(
main_id,
secondary_id
)
select distinct
a.main_id,
b.id as secondary_id
from
secondary a
join
new_secondary b
on a.name = b.name;
Check the results
select
a.id as main_id,
a.code,
c.id as secondary_id,
c.name
from
main a
join
main_secondary b
on a.id = b.main_id
join
new_secondary c
on c.id = b.secondary_id;
Results
main_id code secondary_id name
----------- ---- ------------ -------
1 A 1 Foo
2 B 1 Foo
1 A 2 Bar
2 B 2 Bar
3 C 2 Bar
(5 rows affected)
3 (C) 2 (Bar) is different from your example, but I think it's correct.
You would need to drop the old secondary table and rename the new_secondary table (when you are sure everything is OK) to keep things tidy.
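Putting the steps together, including the final drop and rename, might look like the sketch below. It runs on SQLite through Python's sqlite3 module (an assumption; the rename syntax differs between engines, for example SQL Server uses sp_rename), so treat it as an outline and take a backup before dropping anything on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE secondary (id INTEGER, name TEXT, main_id INTEGER);
INSERT INTO secondary VALUES
    (1,'Foo',1), (2,'Bar',1), (3,'Foo',2), (4,'Bar',2), (5,'Bar',3);

CREATE TABLE new_secondary (id INTEGER, name TEXT);
CREATE TABLE main_secondary (main_id INTEGER, secondary_id INTEGER);

-- One row per distinct name, keeping the smallest original id.
INSERT INTO new_secondary (id, name)
SELECT min(id), name FROM secondary GROUP BY name;

-- Re-point every old (main_id, name) pair at the surviving id.
INSERT INTO main_secondary (main_id, secondary_id)
SELECT DISTINCT a.main_id, b.id
FROM secondary a JOIN new_secondary b ON a.name = b.name;

-- Swap the deduplicated table in under the original name.
DROP TABLE secondary;
ALTER TABLE new_secondary RENAME TO secondary;
""")

names = conn.execute("SELECT id, name FROM secondary ORDER BY id").fetchall()
links = conn.execute(
    "SELECT main_id, secondary_id FROM main_secondary "
    "ORDER BY main_id, secondary_id"
).fetchall()
print(names)
print(links)
```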

Add or delete repeated row

I have an output like this:
id name date school school1
1 john 11/11/2001 nyu ucla
1 john 11/11/2001 ucla nyu
2 paul 11/11/2011 uft mit
2 paul 11/11/2011 mit uft
I would like to achieve this:
id name date school school1
1 john 11/11/2001 nyu ucla
2 paul 11/11/2011 mit uft
I am using direct join as in:
select distinct
a.id, a.name,
b.date,
c.school,
a1.id, a1.name,
b1.date,
c1.school
from table a, table b, table c,table a1, table b1, table c1
where
a.id=b.id
and...
Any ideas?
We will need more information such as what your tables contain and what you are after.
One thing I noticed is that you have a school column and then a school1 column. Normalization (first normal form) says you should not repeat a field with numbers appended to hold more values, even if you think the relationship will only ever have one or two additional items. You need to create a second table that associates a user with one-to-many schools.
I agree with everyone else that both your source table and your desired output are poor design. While you probably can't do anything about your source table, I recommend the following code and output:
Select id, name, date, school from MyTable
union
Select id, name, date, school1 from MyTable;
(repeat as necessary)
This will give you results in the format:
id name date school
1 john 11/11/2001 nyu
1 john 11/11/2001 ucla
2 paul 11/11/2011 mit
2 paul 11/11/2011 uft
(Note: UNION removes duplicate rows by default, unlike UNION ALL, so a separate distinct isn't needed)
With this format, you could easily count the number of schools per student, number of students per school, etc.
If processing time and/or storage space is a factor here, you could then split this into 2 tables, 1 with the id,name & date, the other with the id & school (basically what JonH just said). But if you're just working up some simple statistics, this should suffice.
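A quick way to check the UNION approach is to run it on the four rows from the question. The sketch below does that with SQLite via Python's sqlite3 (an assumption, since the question does not name a database); MyTable stands in for whatever the original join produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (id INTEGER, name TEXT, date TEXT, school TEXT, school1 TEXT);
INSERT INTO MyTable VALUES
    (1, 'john', '11/11/2001', 'nyu',  'ucla'),
    (1, 'john', '11/11/2001', 'ucla', 'nyu'),
    (2, 'paul', '11/11/2011', 'uft',  'mit'),
    (2, 'paul', '11/11/2011', 'mit',  'uft');
""")

# Each half yields four rows; UNION collapses the mirrored duplicates,
# leaving one row per (student, school) pair.
rows = conn.execute("""
    SELECT id, name, date, school  FROM MyTable
    UNION
    SELECT id, name, date, school1 FROM MyTable
    ORDER BY id, school
""").fetchall()

for row in rows:
    print(row)
```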
This problem was just too irresistible, so I took a guess at the data structures we are dealing with. The technology wasn't specified in the question. This is in Transact-SQL.
create table student
(
id int not null primary key identity,
name nvarchar(100) not null default '',
graduation_date date not null default getdate(),
)
go
create table school
(
id int not null primary key identity,
name nvarchar(100) not null default ''
)
go
create table student_school_asc
(
student_id int not null foreign key references student (id),
school_id int not null foreign key references school (id),
primary key (student_id, school_id)
)
go
insert into student (name, graduation_date) values ('john', '2001-11-11')
insert into student (name, graduation_date) values ('paul', '2011-11-11')
insert into school (name) values ('nyu')
insert into school (name) values ('ucla')
insert into school (name) values ('uft')
insert into school (name) values ('mit')
insert into student_school_asc (student_id, school_id) values (1,1)
insert into student_school_asc (student_id, school_id) values (1,2)
insert into student_school_asc (student_id, school_id) values (2,3)
insert into student_school_asc (student_id, school_id) values (2,4)
select
s.id,
s.name,
s.graduation_date as [date],
(select max(name) from
(select name,
RANK() over (order by name) as rank_num
from school sc
inner join student_school_asc ssa on ssa.school_id = sc.id
where ssa.student_id = s.id) s1 where s1.rank_num = 1) as school,
(select max(name) from
(select name,
RANK() over (order by name) as rank_num
from school sc
inner join student_school_asc ssa on ssa.school_id = sc.id
where ssa.student_id = s.id) s2 where s2.rank_num = 2) as school1
from
student s
Result:
id name date school school1
--- ----- ---------- ------- --------
1 john 2001-11-11 nyu ucla
2 paul 2011-11-11 mit uft