Finding duplicate rows with unique keys in table - sql

I'm grabbing some text data from a table and having trouble figuring out how to do this exactly.
My data is book information. I have author, book title, and a book summary. Some of the summaries were duplicated into other book titles. So, for example:
Author: David Smith
Book Title: SQL For Dummies
Summary: A great book about SQL.
Author: David Smith
Book Title: Dummies Don't Need SQL
Summary: A great book about SQL.
Author: Jim Jones
Book Title: We Don't Use SQL Here
Summary: This book says you should not use SQL.
What I'm looking for is a way to get the author, book title and summary for the duplicates and tie them to each other. So I'd see the author and book titles for both books that have a duplicate summary, grouped together.
I'm using an Oracle database server.
I've racked my brain for hours and am not sure where to look next. Thanks!

select *
from (
    select
        summary, author, title,
        -- cnt = number of rows that share this summary
        count(*) over (partition by summary) cnt
    from your_table
)
where cnt > 1
order by summary, author, title
You can also use dense_rank to number the groups.
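As a rough sketch of the dense_rank idea: keep only the rows whose summary occurs more than once, then give every distinct summary a group number so the duplicates are tied together. The snippet runs the query through Python's sqlite3 module against an in-memory SQLite database purely so it can be executed as-is (an assumption; the question is about Oracle, where the same SELECT should work unchanged). It needs SQLite 3.25 or later for window functions. The table name your_table and the sample rows come from the posts above; the alias grp is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (author TEXT, title TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        ("David Smith", "SQL For Dummies", "A great book about SQL."),
        ("David Smith", "Dummies Don't Need SQL", "A great book about SQL."),
        ("Jim Jones", "We Don't Use SQL Here", "This book says you should not use SQL."),
    ],
)

# Inner query: count rows per summary. Outer query: keep duplicates only,
# then number each remaining summary with dense_rank (grp is a made-up alias).
query = """
SELECT summary, author, title,
       DENSE_RANK() OVER (ORDER BY summary) AS grp
FROM (
    SELECT summary, author, title,
           COUNT(*) OVER (PARTITION BY summary) AS cnt
    FROM your_table
)
WHERE cnt > 1
ORDER BY grp, author, title
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the WHERE filter is applied before the outer window function is evaluated, the group numbers run 1, 2, 3, ... over the duplicated summaries only.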

Related

How to combine two databases?

I am asking myself how to best define a new database given two databases.
I have the following two databases
Author, where I save information about authors, i.e. Name, Birthyear, Country, Genre etc.
Book, where I save information about the book, i.e. AuthorName, ReleaseYear, Pages etc.
How do I create a new database with the full information about all books and their authors? I.e. a Book_Complete database which includes Author.Name, ReleaseYear, Pages, Genre, Author.Birthyear and Author.Country?
It is better to use a single database with two tables in it, like this:
Author Table
    AuthorId (PK)
    Name
    Birthyear
    Country
    Genre
Book Table
    BookId (PK)
    AuthorId (FK)
    ReleaseYear
    Pages
If you have two tables in a database you can combine them using JOIN. Here is an SQLite tutorial on how to use JOIN.
https://www.sqlitetutorial.net/sqlite-join/
From the information you provided, I assume you can join on the columns Name in table Author and AuthorName in table Book. You can try something like this:
SELECT
    A.Name,
    B.ReleaseYear,
    B.Pages,
    A.Genre,      -- Genre is stored in Author, not in Book
    A.Birthyear,
    A.Country
FROM
    Author A
    LEFT JOIN Book B
        ON A.Name = B.AuthorName
Basically, I have to create these two tables in my app and want to store the combined version on a server. Or would you suggest another approach?
Yes we would. You should have one table per entity, i.e. one for the authors and one for the books. Thus you don't store data redundantly (e.g. the author's name or the book's title), which could otherwise lead to inconsistencies later. You can read up on database normalization here: https://en.wikipedia.org/wiki/Database_normalization or google it.
Now you must decide how to link the tables. One author can write multiple books. Can one book be written by multiple authors? Give both tables IDs you can work with (or find some natural unique key, like the ISBN for the book and maybe name + birth year for the author - the name only will probably not suffice and maybe it won't even when combined with the year).
If one book can only be written by one author, you have a 1:n relation (one book written by one author, one author writing multiple books).
author (author_id, name, birth_year, country, ...)
book (book_id, title, author_id, release_year, pages, ...)
If one book can have several authors, you have an m:n relation (one book written by multiple authors, one author writing multiple books).
author (author_id, name, birth_year, country, ...)
book (book_id, title, release_year, pages, ...)
book_author (book_id, author_id)
You can always join the tables, so as to get a combined result.
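A minimal sketch of the m:n layout and of the join that produces the combined result. It uses Python's sqlite3 module with an in-memory database; the column types, constraints and sample rows are assumptions for illustration, and only the table and column names come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (
    author_id  INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    birth_year INTEGER,
    country    TEXT
);
CREATE TABLE book (
    book_id      INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    release_year INTEGER,
    pages        INTEGER
);
-- Bridge table: one row per (book, author) pair.
CREATE TABLE book_author (
    book_id   INTEGER NOT NULL REFERENCES book(book_id),
    author_id INTEGER NOT NULL REFERENCES author(author_id),
    PRIMARY KEY (book_id, author_id)
);
-- Hypothetical sample rows.
INSERT INTO author VALUES (1, 'Author One', 1950, 'UK'), (2, 'Author Two', 1960, 'US');
INSERT INTO book VALUES (10, 'Shared Book', 2001, 300), (11, 'Solo Book', 2005, 150);
INSERT INTO book_author VALUES (10, 1), (10, 2), (11, 2);
""")

# The combined result is computed on demand instead of being stored twice.
combined = conn.execute("""
SELECT b.title, b.release_year, b.pages, a.name, a.birth_year, a.country
FROM book b
JOIN book_author ba ON ba.book_id = b.book_id
JOIN author a ON a.author_id = ba.author_id
ORDER BY b.title, a.name
""").fetchall()
for row in combined:
    print(row)
```

If the combined shape is needed often, the same SELECT could be wrapped in a view (for example a hypothetical CREATE VIEW book_complete AS ...), which avoids storing a third, redundant table.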

SQL sub-queries with no joins

I am really struggling with one of the questions while revising. I guess you guys can help me out.
Here I have two tables named book and branch
Branch
Book
The question is:
List the title and author of books whose sales are greater than the
average sales. For each such book, also list the difference between
its sales and the average sales. The column of differences in the
table of results should be named "Difference".
Here is what I tried
SELECT title, author FROM book
WHERE sales > AVG (sales) ( SELECT bookNo AS Difference
FROM book
WHERE Difference= sales-AVG(sales));
You want a result table with three columns: title, author and Difference (the difference between the book's sales and the average sales). In SQL you can do arithmetic in the select list, and the keyword AS names the resulting column. An aggregate such as AVG cannot be used directly in the WHERE clause, though, so compute the average in a scalar sub-query:
SELECT title, author, sales - (SELECT AVG(sales) FROM book) AS Difference
FROM book
WHERE sales > (SELECT AVG(sales) FROM book);
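To check the idea on data, here is a small sketch that supplies the average through a scalar sub-query (aggregates such as AVG are not allowed directly in WHERE) and needs no join. It runs through Python's sqlite3 module against an in-memory database; the three sample rows are made up, and only the columns title, author and sales are assumed to exist in book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, author TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [
        ("Book A", "Author A", 100),  # hypothetical rows; average sales = 300
        ("Book B", "Author B", 200),
        ("Book C", "Author C", 600),
    ],
)

# The sub-query returns a single value, so it can be used both in the
# select list (for the difference) and in the WHERE filter.
query = """
SELECT title, author,
       sales - (SELECT AVG(sales) FROM book) AS Difference
FROM book
WHERE sales > (SELECT AVG(sales) FROM book)
"""
cur = conn.execute(query)
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)
print(rows)
```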

Issues with Group functions

I'm working on a lab question that I just cannot come up with a solution for: one that would let me find the books that have more than one author (see the posted question below for my comments to make sense). My mind is totally blank on it. I'm very bad with word problems. I know that I have to do a JOIN, which I've completed, and I know that I have to use the COUNT function to count the number of authors, but I honestly don't know how I would go about counting only the books that have 2 authors.
Any input would be appreciated. I tried to break it down into steps but it's just that one part that I'm not grasping in my mind.
Using the correct tables in your schema, create a query using either join operation you wish that will list the book title and number of authors for all books that have been written by more than one author. Give the title column an alias of "Book Title" and the column showing the number of authors an alias of "Number of Authors".
There is a BOOKS table and an AUTHOR table that are joined through a BOOK_AUTHOR table by BOOKID in BOOKS and AUTHORID in AUTHOR.
I think I'm starting to understand that I have to use a mathematical comparison to find more than one author. I don't understand the HAVING clause all too well, so I'm going to do more research on this one.
You were right: you need a JOIN and a COUNT, but also a HAVING to make sure the book was written by more than one author. GROUP BY has to come before HAVING:
select title as "Book Title", count(*) as "Number of Authors"
from books
join authors on books.id = authors.book_id
group by title
having count(*) > 1;
Make sure to adapt the table and column names to your schema, as you didn't post them.
Note that if the join column has the same name in both tables, you can use the USING keyword to join. Your query would then become:
select title as "Book Title", count(*) as "Number of Authors"
from books
join authors using (book_id)
group by title
having count(*) > 1;
Note that if you want only the books that have exactly 2 authors, you can change the HAVING clause to having count(*) = 2. But even though that is what you ask for in your question, the exercise you pasted asks for books with more than one author.
I finally figured out the answer with all your help. Please see below:
SELECT DISTINCT Title "Book Title", COUNT(*)"Number of Authors"
FROM BOOKS JOIN
BOOK_AUTHOR ON
BOOKS.BookId = BOOK_AUTHOR.BookId
JOIN AUTHOR ON
BOOK_AUTHOR.AuthorId = Author.AuthorId
GROUP BY Title
HAVING COUNT(*) > 1;
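A small sketch of the same counting idea on made-up rows, run through Python's sqlite3 module against an in-memory database. Two things differ from the query above, both as suggestions rather than requirements: BOOK_AUTHOR already holds one row per book and author pair, so the join to AUTHOR is not needed for the count, and grouping by BookId as well as Title keeps two different books that happen to share a title apart. Table and column names follow the post; the data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BOOKS (BookId INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE BOOK_AUTHOR (BookId INTEGER, AuthorId INTEGER);
-- Hypothetical data: books 3 and 4 are different books with the same title.
INSERT INTO BOOKS VALUES (1, 'Joint Work'), (2, 'Solo Work'),
                         (3, 'Same Title'), (4, 'Same Title');
INSERT INTO BOOK_AUTHOR VALUES (1, 101), (1, 102), (2, 101), (3, 103), (4, 104);
""")

# Group per book: only book 1 really has more than one author.
by_book = conn.execute("""
SELECT b.Title AS "Book Title", COUNT(*) AS "Number of Authors"
FROM BOOKS b
JOIN BOOK_AUTHOR ba ON ba.BookId = b.BookId
GROUP BY b.BookId, b.Title
HAVING COUNT(*) > 1
ORDER BY b.Title
""").fetchall()

# Group per title only: the two 'Same Title' books are merged into one group.
by_title = conn.execute("""
SELECT b.Title AS "Book Title", COUNT(*) AS "Number of Authors"
FROM BOOKS b
JOIN BOOK_AUTHOR ba ON ba.BookId = b.BookId
GROUP BY b.Title
HAVING COUNT(*) > 1
ORDER BY b.Title
""").fetchall()

print(by_book)
print(by_title)
```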

Using SQL to determine the number of different customers who have placed an order for books

I'm working on this assignment in Oracle and the question states "Determine the number of different customers who have ordered a book by an author or co-written by John Doe?" Ultimately it's supposed to look like this when completed.
COUNT DISTINCTCUSTOMER
-----------------------
5
My SQL query looks like this which is getting me nowhere.
SELECT customers
FROM books
where author John Doe
Without knowing your data model, the general idea is to use COUNT(DISTINCT ...), e.g.:
SELECT COUNT(DISTINCT customer)
FROM books
WHERE author = 'John Doe'
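Since the data model was not posted, the following is only a sketch under invented assumptions: a table orders(customer_id, book_id) with one row per ordered book, and a table book_author(book_id, author) with one row per book and author, so co-written books appear once per author. Both tables and all rows are hypothetical. It runs through Python's sqlite3 module against an in-memory database; the SELECT itself is plain SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema and data, not taken from the assignment.
CREATE TABLE book_author (book_id INTEGER, author TEXT);
CREATE TABLE orders (customer_id INTEGER, book_id INTEGER);
INSERT INTO book_author VALUES (1, 'John Doe'), (1, 'Jane Roe'),
                               (2, 'John Doe'), (3, 'Jane Roe');
INSERT INTO orders VALUES (501, 1), (501, 2), (502, 1), (503, 3);
""")

# Customer 501 ordered two John Doe books but must be counted only once,
# which is what DISTINCT inside COUNT takes care of.
query = """
SELECT COUNT(DISTINCT o.customer_id) AS distinct_customers
FROM orders o
JOIN book_author ba ON ba.book_id = o.book_id
WHERE ba.author = 'John Doe'
"""
(distinct_customers,) = conn.execute(query).fetchone()
print(distinct_customers)
```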

Create a relation

I have no idea how to create relations and write queries against them. I want relations as follows:
BookAuthor(book, author, earnings)
BookReference(book, referenceBook, times)
BookReview(book, reviewer, score)
BookPublish(book, year, publisher, price, number)
In this database, each book may have one or more authors, and each author may earn a different amount of money from that book. One book may make reference to other books. One book may be reviewed by different reviewers and get different scores. An author could also be a reviewer and a publisher.
I want to solve the following queries:
Find all books published in 2003 & reviewed by both Sammer Tulpule & Hemant Mehta.
Find all the reviewers who never reviewed their own books.
Find all authors who reviewed more than 2 books written by Sita Mitra.
Find all authors who have written exactly 1 book and reviewed more than 1 book.
Find all the reviewers who reviewed every book by 'Stephen King'.
Find all books published in 1995-2000 in descending order.
I know, these is not good to find an answer, but believe me i really don't under
A solution to your problem could use these tables:
BOOKS (BookID, BookName,PubDate,PublisherPerson_ID,Price,Number) //stores each book and its data
PERSONS(PersonID, Name) //stores any people (authors, reviewers)
BOOKAUTHORS(Book_ID, Person_ID) //many to many relationship between books and authors
BOOKREVENUE(Book_ID, Person_ID, Revenue) //stores revenue for each book per author
BOOKREFERENCES(Book_ID, RefBook_ID) //many to many reference table for books
BOOKREVIEWERS(Book_ID, Person_ID, Score) //many to many relationship for book reviews
I'm not going to write the queries here, but this should get you started.
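As one possible starting point, here is a sketch of the first query (books published in 2003 and reviewed by both named reviewers) against a cut-down version of the tables suggested above. It is run through Python's sqlite3 module against an in-memory database. The assumptions are mine: BOOKS is reduced to three columns, PubDate is stored as ISO text (YYYY-MM-DD), and all rows are invented. On Oracle with a real DATE column, EXTRACT(YEAR FROM PubDate) = 2003 would replace the substr test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced, hypothetical versions of the suggested tables.
CREATE TABLE BOOKS (BookID INTEGER PRIMARY KEY, BookName TEXT, PubDate TEXT);
CREATE TABLE PERSONS (PersonID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE BOOKREVIEWERS (Book_ID INTEGER, Person_ID INTEGER, Score INTEGER);
INSERT INTO PERSONS VALUES (1, 'Sammer Tulpule'), (2, 'Hemant Mehta'), (3, 'Someone Else');
INSERT INTO BOOKS VALUES (1, 'Book X', '2003-05-01'),
                         (2, 'Book Y', '2003-07-01'),
                         (3, 'Book Z', '2004-01-01');
INSERT INTO BOOKREVIEWERS VALUES (1, 1, 8), (1, 2, 7),   -- Book X: both reviewers
                                 (2, 1, 5), (2, 3, 6),   -- Book Y: only one of them
                                 (3, 1, 9), (3, 2, 9);   -- Book Z: both, but 2004
""")

# Keep reviews by the two named people, then require that a book
# collected both distinct names.
query = """
SELECT b.BookName
FROM BOOKS b
JOIN BOOKREVIEWERS r ON r.Book_ID = b.BookID
JOIN PERSONS p ON p.PersonID = r.Person_ID
WHERE substr(b.PubDate, 1, 4) = '2003'
  AND p.Name IN ('Sammer Tulpule', 'Hemant Mehta')
GROUP BY b.BookID, b.BookName
HAVING COUNT(DISTINCT p.Name) = 2
"""
both_2003 = conn.execute(query).fetchall()
print(both_2003)
```

The same filter, group and HAVING pattern may also serve for the "more than 2 books" and "exactly 1 book" questions by changing the condition in the HAVING clause.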