How can we build such an extremely complex SQL statement? - sql

[Book] isbn(PK),title,category_id,subcategory_id,price
[Author] isbn(FK),author_id(PK),name
[Category] category_id(PK),name
[SubCategory] sub_category_id(PK),category_id(FK),name
I have a database (not designed by me) that contains the above four tables.
I want to have a book list having the following format:
isbn, title, author name(s), category name, subcategory name(may not have), price
There is some complexity, though. Each book can have more than one author, so the author name column should contain the author names separated by commas.
The category is the more difficult part. Some categories have no subcategories, so some book records have subcategory_id set to 0 because their category_id refers to a category that has no subcategories. In this case, the subcategory name column in the book list does not need to show anything.
I really have no idea how such a complex SQL statement can be built quickly to get the book list. Would somebody kindly think of a solution?
Many thanks to you all.

When you find yourself building an "extremely complex SQL statement", it's usually best to step back and rethink.
Remember this - the vast majority of operations performed on database tables are selects, not inserts or updates (though there are exceptions to every rule, of course).
The right time to be "spending" CPU cycles calculating things like author lists is when the list changes, not when you just want to extract the information.
Add another column to the book table called author_list and then create an insert/update trigger on authors so that this column is rebuilt whenever an author is changed for the specific ISBN.
That puts the cost where it should be and will make your query a lot simpler. The trigger ensures the data stays consistent, and it's okay to break 3NF if you know what you're doing.
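A minimal sketch of that trigger idea, using SQLite through Python's sqlite3 module purely so it can be run as-is. The trigger names and the cut-down table definitions are assumptions for illustration, and trigger syntax differs between database systems, so treat this as a starting point rather than the exact statements for your RDBMS. Note that group_concat does not guarantee the order of the names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Book (isbn TEXT PRIMARY KEY, title TEXT, author_list TEXT);
CREATE TABLE Author (author_id INTEGER PRIMARY KEY, isbn TEXT, name TEXT);

-- Rebuild the cached list for the affected book after each kind of change.
-- Trigger names (author_ai, author_au, author_ad) are hypothetical.
CREATE TRIGGER author_ai AFTER INSERT ON Author BEGIN
    UPDATE Book SET author_list =
        (SELECT group_concat(a.name, ', ') FROM Author a WHERE a.isbn = Book.isbn)
    WHERE isbn = NEW.isbn;
END;

CREATE TRIGGER author_au AFTER UPDATE ON Author BEGIN
    UPDATE Book SET author_list =
        (SELECT group_concat(a.name, ', ') FROM Author a WHERE a.isbn = Book.isbn)
    WHERE isbn IN (OLD.isbn, NEW.isbn);
END;

CREATE TRIGGER author_ad AFTER DELETE ON Author BEGIN
    UPDATE Book SET author_list =
        (SELECT group_concat(a.name, ', ') FROM Author a WHERE a.isbn = Book.isbn)
    WHERE isbn = OLD.isbn;
END;
""")


def author_list(isbn):
    """Return the cached, comma-separated author list for one book."""
    row = conn.execute(
        "SELECT author_list FROM Book WHERE isbn = ?", (isbn,)
    ).fetchone()
    return row[0]


conn.execute("INSERT INTO Book (isbn, title) VALUES ('0001', 'Example Book')")
conn.executemany(
    "INSERT INTO Author (isbn, name) VALUES (?, ?)",
    [("0001", "Ann"), ("0001", "Bob")],
)
print(author_list("0001"))  # both names, comma-separated
```

The cost of building the list is paid on each write to Author, and the book list query only has to read Book.author_list.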
As to the subcategory, the case statement can be your friend, but per-row functions on select never scale well.
I would just create a set of rows in subcategories with the id of 0 (one for each category) and make its name blank. Then it can be done with a simple join without having to concern yourself with performance. This could also be done with a trigger on category so every category will always have a subcategory of 0.
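The placeholder-row idea can be sketched as follows, again with SQLite via Python's sqlite3 so it runs unchanged. One assumption is worth stating loudly: for subcategory 0 to exist once per category, the SubCategory key has to be the pair (category_id, subcategory_id) rather than the single-column key in the question's schema. The sample data and the trigger name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (category_id INTEGER PRIMARY KEY, name TEXT);

-- Assumption: composite key, so subcategory 0 can exist once per category.
CREATE TABLE SubCategory (
    category_id INTEGER,
    subcategory_id INTEGER,
    name TEXT,
    PRIMARY KEY (category_id, subcategory_id)
);

CREATE TABLE Book (
    isbn TEXT PRIMARY KEY, title TEXT,
    category_id INTEGER, subcategory_id INTEGER, price REAL
);

-- Every new category automatically gets a blank subcategory 0.
CREATE TRIGGER category_ai AFTER INSERT ON Category BEGIN
    INSERT INTO SubCategory VALUES (NEW.category_id, 0, '');
END;

INSERT INTO Category VALUES (1, 'Fiction'), (2, 'Maps');
INSERT INTO SubCategory VALUES (1, 5, 'Crime');
INSERT INTO Book VALUES ('111', 'A Novel', 1, 5, 9.99),
                        ('222', 'An Atlas', 2, 0, 19.99);
""")

# Plain inner joins now keep books whose subcategory_id is 0.
rows = conn.execute("""
    SELECT b.isbn, c.name, sc.name
    FROM Book b
    JOIN Category c ON c.category_id = b.category_id
    JOIN SubCategory sc ON sc.category_id = b.category_id
                       AND sc.subcategory_id = b.subcategory_id
    ORDER BY b.isbn
""").fetchall()

for row in rows:
    print(row)
```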
With those two changes, the query becomes a lot less complex, something along the lines of:
select b.isbn, b.title, b.author_list, c.name, sc.name, b.price
from Book b, Category c, SubCategory sc
where b.category_id = c.category_id
and b.category_id = sc.category_id
and b.subcategory_id = sc.subcategory_id
order by ...
This query should scream along since it's using just the basic levels of relational algebra (i.e., no per-row functions (including case statements), no subqueries). And that's an "old-school" query, you may get even more performance by using explicit rather than implicit JOINs.
One final point: a properly 3NF schema would not have the ISBN in the authors table - a better option would be to have a separate BookAuthor table holding the ISBN and author_id to properly model the many-to-many relationship. But you may have already changed that for performance (I don't know).

That's an odd schema, not how I would have designed it. Being denormalized, it's probably going to have a lot of duplication in the author table.
Anyways, because you may have one or more authors, joins aren't really going to cut it for that information. Some things, to be honest, are better done outside of SQL and this is one of them. You can just build a loop that constructs the information and emits the data when the ISBN changes, assuming you do your ordering well.
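A rough sketch of that loop in Python, assuming the rows arrive as (isbn, title, author_name) tuples already ordered by ISBN. The function name and the tuple layout are illustrative only.

```python
from itertools import groupby
from operator import itemgetter


def collapse_authors(rows):
    """Yield one (isbn, title, authors) tuple per book.

    rows must be (isbn, title, author_name) tuples sorted by isbn,
    e.g. the result of a query with ORDER BY isbn.
    """
    for isbn, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        title = group[0][1]
        authors = ", ".join(author for _, _, author in group)
        yield isbn, title, authors


sample = [
    ("111", "A Novel", "Ann"),
    ("111", "A Novel", "Bob"),
    ("222", "An Atlas", "Cy"),
]
for book in collapse_authors(sample):
    print(book)
```

groupby only merges adjacent rows, which is why the ordering matters: a new group is emitted each time the ISBN changes.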
As for the categories and subcategories, use a left join and it will return NULL on the subcategory information which you can test for. If there's more than one subcategory possible for the book (or categories for that matter), then you're really DOA with SQL here.

See @Pax's answer for a nicer way to handle the null / zero values for sub_category_id
select b.isbn, b.title, a.name as author_name, c.name as category_name, sc.name as subcategory_name, b.price
from Book b
join Author a on b.isbn = a.isbn
join Category c on b.category_id = c.category_id
join SubCategory sc on b.category_id = sc.category_id and b.subcategory_id = sc.sub_category_id
where b.subcategory_id != 0
union
select b.isbn, b.title, a.name as author_name, c.name as category_name, '' as subcategory_name, b.price
from Book b
join Author a on b.isbn = a.isbn
join Category c on b.category_id = c.category_id
where b.subcategory_id = 0

Well, the subcategory business is poor database design. Even if you assume that a book can only be in one category, it's a poor design because (in that case) a category can always be derived from the subcategory, so you've introduced redundancy by having book have attributes for both.
As far as the query you want, that's just a matter of doing the joins and projecting the select statement. If you don't know enough SQL to do that, you probably shouldn't be trying to write queries (or you should be asking about basic joins and projections).
As to how you turn multiple rows into one (which is what you want to do with the authors), that depends on your RDBMS (which you don't specify) and/or your front-end.
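As one concrete case, here is a sketch for SQLite, run through Python's sqlite3. It uses group_concat for the author names and a LEFT JOIN so that books with subcategory_id 0 are kept. Other systems have their own aggregate for this (GROUP_CONCAT in MySQL, string_agg in PostgreSQL, STRING_AGG in recent SQL Server, LISTAGG in Oracle), so the query needs adapting. The sample rows are invented, and group_concat does not guarantee the order of names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Book (isbn TEXT PRIMARY KEY, title TEXT,
                   category_id INTEGER, subcategory_id INTEGER, price REAL);
CREATE TABLE Author (author_id INTEGER PRIMARY KEY, isbn TEXT, name TEXT);
CREATE TABLE Category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE SubCategory (sub_category_id INTEGER PRIMARY KEY,
                          category_id INTEGER, name TEXT);

INSERT INTO Category VALUES (1, 'Fiction'), (2, 'Maps');
INSERT INTO SubCategory VALUES (5, 1, 'Crime');
INSERT INTO Book VALUES ('111', 'A Novel', 1, 5, 9.99),
                        ('222', 'An Atlas', 2, 0, 19.99);
INSERT INTO Author (isbn, name) VALUES ('111', 'Ann'), ('111', 'Bob'), ('222', 'Cy');
""")

BOOK_LIST = """
SELECT b.isbn,
       b.title,
       -- one comma-separated string per book
       (SELECT group_concat(a.name, ', ')
        FROM Author a WHERE a.isbn = b.isbn) AS authors,
       c.name AS category,
       -- no matching subcategory row gives NULL, shown as an empty string
       COALESCE(sc.name, '') AS subcategory,
       b.price
FROM Book b
JOIN Category c ON c.category_id = b.category_id
LEFT JOIN SubCategory sc ON sc.category_id = b.category_id
                        AND sc.sub_category_id = b.subcategory_id
ORDER BY b.isbn
"""

rows = conn.execute(BOOK_LIST).fetchall()
for row in rows:
    print(row)
```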

Something like this should be close.
select
Book.ISBN,
Book.Title,
Author.Name,
Category.Name as Category_Name,
SubCategory.Name as SubCategory_Name,
Book.Price
from
Book join Author
on Book.ISBN = Author.ISBN
join Category
on Book.Category_ID = Category.Category_ID
left join SubCategory
on Book.Category_ID = SubCategory.Category_ID
and Book.SubCategory_ID = SubCategory.Sub_Category_ID

Related

Need assistance with SQL statement having trouble with the JOINS and WHERE clause

The company is performing an analysis of their inventory. They are considering purging books that are not popular with their customers. To do this they need a list of books that have never been purchased. Write a query using a join that provides this information. Your results should include all the book details and the order number column. Sort your results by the book title.
SELECT o.order_nbr, b.*
FROM orders o JOIN books
WHERE
ORDER BY book_title
This is all I could come up with. I'm still learning joins and struggling to figure out what the correct statement should be. I wasn't sure what to put in the WHERE clause and don't really know how to properly join these tables.
You need an ON clause to specify what you are joining on. Also, your WHERE clause is empty, and you are not specifying the type of JOIN you are using. Looking at the way the tables are set up, the expectation is you are going to join the BOOKS table on ORDER_ITEMS, which also contains ORDER_NBR.
The question asks for books with no orders, so the correct join is a LEFT JOIN between BOOKS and ORDER_ITEMS. That includes every book, even those without orders, which will have an ORDER_NBR of NULL.
The SQL would look like:
SELECT o.order_nbr, b.*
FROM books b
LEFT JOIN order_items o on b.book_id = o.book_id
WHERE o.order_nbr is null
ORDER BY book_title
This would return only the books with no orders.
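To see the effect on a small data set, here is a sketch using SQLite through Python's sqlite3. The two tables are cut down to the columns the query needs and the rows are invented. A NOT EXISTS form is included only for comparison; the assignment itself asks for a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (book_id INTEGER PRIMARY KEY, book_title TEXT);
CREATE TABLE order_items (order_nbr INTEGER, book_id INTEGER);

INSERT INTO books VALUES (1, 'Sold Once'), (2, 'Never Sold'), (3, 'Also Unsold');
INSERT INTO order_items VALUES (100, 1);
""")

# LEFT JOIN keeps every book; unmatched books have NULL in the order columns.
left_join = conn.execute("""
    SELECT o.order_nbr, b.book_id, b.book_title
    FROM books b
    LEFT JOIN order_items o ON b.book_id = o.book_id
    WHERE o.order_nbr IS NULL
    ORDER BY b.book_title
""").fetchall()

# Equivalent anti-join written with NOT EXISTS, for comparison.
not_exists = conn.execute("""
    SELECT NULL, b.book_id, b.book_title
    FROM books b
    WHERE NOT EXISTS (SELECT 1 FROM order_items o WHERE o.book_id = b.book_id)
    ORDER BY b.book_title
""").fetchall()

print(left_join)
print(not_exists)
```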

SQL: Querying Against Composite Entity

I'm trying to figure out how to query against a composite key.
In my select result, I want the book name, author, category, and selling price. So far I have select title,category,price from books1 where books1.category='MYS';
but I'm not sure how to go about getting the author name.
I'm not sure why you have books1 when the model shows the table named books. Books is a horrible name for a table -- typically in relational databases you use the singular -- e.g. book.
Here is how you do a join:
Select a.First, a.Last
from books b
join books_authors ab on b.b_code = ab.book_code
join authors a on ab.authorId = a.id
where b.category = 'MYS'
Also, all of your field names have spaces in them -- I don't know what platform you are using, so I've no idea how to escape the names. Using spaces in field names is non-standard and not actually SQL. I'd advise against it whenever possible.

SQL Multiple Joins - How do they work exactly?

I'm pretty sure this works universally across various SQL implementations. Suppose I have many-to-many relationship between 2 tables:
Customer: id, name
has many:
Order: id, description, total_price
and this relationship is in a junction table:
Customer_Order: order_date, customer_id, order_id
Now I want to write SQL query to join all of these together, mentioning the customer's name, the order's description and total price and the order date:
SELECT name, description, total_price FROM Customer
JOIN Customer_Order ON Customer_Order.customer_id = Customer.id
JOIN Order ON Order.id = Customer_Order.order_id
This is all well and good. This query will also work if we change the order so it's FROM Customer_Order JOIN Customer or put the Order table first. Why is this the case? Somewhere I've read that JOIN works like an arithmetic operator (+, * etc.) taking 2 operands, and you can chain operators together so you can have 2+3+5, for example. Following this logic, first we have to calculate 2+3 and then take that result and add 5 to it. Is it the same with JOINs?
Is it that under the hood, the first JOIN must be completed before the second JOIN can take place? So basically, the first JOIN will create a table out of the 2 operands left and right of it. Then, the second JOIN will take that resulting table as its left operand and perform the usual joining. Basically, I want to understand how multiple JOINs work under the hood.
In many ways I think ORMs are the bane of modern programming, unleashing a barrage of underprepared coders. Oh well, diatribe out of the way: you're asking a question about set theory. There are potentially other options that center on relational algebra, but SQL is fundamentally set-theory based. Here are a couple of links to get you started:
Using set theory to understand SQL
A visual explanation of SQL
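The behaviour described in the question can be checked directly. For inner joins, the set of rows returned does not depend on the order in which the tables are written: logically the joins are evaluated left to right, each one combining the result so far with the next table, but because an inner join is commutative and associative the optimizer is free to pick a different physical order. A small sketch with SQLite via Python's sqlite3 follows; the sample rows are made up, and "Order" is quoted because it is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT);
-- "Order" is a reserved word, so it is quoted here.
CREATE TABLE "Order" (id INTEGER PRIMARY KEY, description TEXT, total_price REAL);
CREATE TABLE Customer_Order (order_date TEXT, customer_id INTEGER, order_id INTEGER);

INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO "Order" VALUES (10, 'Books', 30.0), (11, 'Pens', 5.0);
INSERT INTO Customer_Order VALUES ('2020-01-01', 1, 10),
                                  ('2020-01-02', 2, 11),
                                  ('2020-01-03', 1, 11);
""")

COLUMNS = "c.name, o.description, o.total_price, co.order_date"

# Customer first, as in the question.
customer_first = f"""
    SELECT {COLUMNS}
    FROM Customer c
    JOIN Customer_Order co ON co.customer_id = c.id
    JOIN "Order" o ON o.id = co.order_id
"""

# Same joins, written starting from the Order table.
order_first = f"""
    SELECT {COLUMNS}
    FROM "Order" o
    JOIN Customer_Order co ON co.order_id = o.id
    JOIN Customer c ON c.id = co.customer_id
"""


def run(sql):
    """Execute a query and return its rows in a stable order for comparison."""
    return sorted(conn.execute(sql).fetchall())


print(run(customer_first) == run(order_first))
```

This equivalence holds for inner joins. With outer joins, which table sits on which side changes the result, so the order is no longer interchangeable.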

SQL, using data from a number of separate tables to receive an output

So my tables I'm working with look like the following:
Book(BookID, Title)
Author(BookID, AuthID)
Writer(AuthID, PubID, FirstName, LastName)
Publisher(PubID, PubName, Country)
And I'd love to change them to make more sense, but I'm not allowed to even change their names at this point.
Anyway, I have two separate pieces of code that I want to run together. So it's the result of this:
select Book.Title
from Book
join Author
on Book.BookID=Author.BookID
group by Book.Title, Book.BookID
having count(*) >= 2
with this:
select AuthorID
from Author
join Publisher
on Author.PubID=Publisher.Publisher
where Publisher.Country like 'Australia'
Initially I thought INTERSECT might work but I quickly realised that because they're not matching fields, I need something else. And the fact that Writer and Publisher have to be linked via Author is throwing me off completely.
Is there a way to do this save going back to the table and changing it to something less unnecessarily complex?
I've been going through the list of statements and whatnot on trying to find a solution, but I'm not sure which one I'm supposed to be looking at. Perhaps something with GROUP in it? So anything, just a point in the right direction would be much appreciated.
Are you trying to find all books with more than one author published in Australia?
If so, the query would look something like this:
select b.Title
from Book b join
Author a
on b.BookID = a.BookID join
Writer w
on w.AuthId = a.AuthId join
Publisher p
on w.PubId = p.PubId
where p.Country like 'Australia'
group by b.Title, b.BookID
having count(*) >= 2;
I think there's a small conceptual misunderstanding blocking your way here.
The two queries can be combined into a single query, and return the results you want, via multiple joins involving all these tables.

What is the best way to implement this SQL query?

I have a PRODUCTS table, and each product can have multiple attributes so I have an ATTRIBUTES table, and another table called ATTRIBPRODUCTS which sits in the middle. The attributes are grouped into classes (type, brand, material, colour, etc), so people might want a product of a particular type, from a certain brand.
PRODUCTS
product_id
product_name
ATTRIBUTES
attribute_id
attribute_name
attribute_class
ATTRIBPRODUCTS
attribute_id
product_id
When someone is looking for a product they can select one or many of the attributes. The problem I'm having is returning a single product that has multiple attributes. This should be really simple I know but SQL really isn't my thing and past a certain point I get a bit lost in the logic. The problem is I'm trying to check each attribute class separately so I want to end up with something like:
SELECT DISTINCT products.product_id
FROM attribproducts
INNER JOIN products ON attribproducts.product_id = products.product_id
WHERE (attribproducts.attribute_id IN (9,10,11)
AND attribproducts.attribute_id IN (60,61))
I've used IN to separate the blocks of attributes of different classes, so I end up with the products which are of certain types, but also of certain brands. From the results I've had it seems to be that AND between the IN statements that's causing the problem.
Can anyone help a little? I don't have the luxury of completely refactoring the database unfortunately, there is a lot more to it than this bit, so any suggestions how to work with what I have will be gratefully received.
Take a look at the answers to the question SQL: Many-To-Many table AND query. It's the exact same problem. Cletus gave 2 possible solutions there, neither of which is trivial (but then again, there simply is no trivial solution).
SELECT DISTINCT p.product_id
FROM products p
INNER JOIN attribproducts ptype on p.product_id = ptype.product_id
INNER JOIN attribproducts pbrand on p.product_id = pbrand.product_id
WHERE ptype.attribute_id IN (9,10,11)
AND pbrand.attribute_id IN (60,61)
Try this:
select * from products p, attribproducts a1, attribproducts a2
where p.product_id = a1.product_id
and p.product_id = a2.product_id
and a1.attribute_id in (9,10,11)
and a2.attribute_id in (60,61);
The query in the question will return no rows because the WHERE clause is checked against one attribproducts row at a time, and a single attribute_id cannot be both in (9, 10, 11) AND in (60, 61).
Because those sets don't intersect, you'll get no rows.
If you use OR instead, it'll give products with attributes that are in the set 9, 10, 11, 60, 61, which isn't what you want either, although you'll then get multiple rows for each product.
You could use that select as a subquery under a GROUP BY, grouping by product and ordering the groups by the number of matched attributes. That will give you the closest matches first.
Alternatively (as another answer shows), you could join with a new copy of the table for each attribute set, giving you only those products that match all attribute sets.
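The grouping approach might look like the sketch below, using SQLite via Python's sqlite3. It assumes that attributes 9, 10 and 11 all belong to one class and 60 and 61 to another, and it counts how many distinct classes each product matched. The sample products and attribute names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE attributes (attribute_id INTEGER PRIMARY KEY,
                         attribute_name TEXT, attribute_class TEXT);
CREATE TABLE attribproducts (attribute_id INTEGER, product_id INTEGER);

INSERT INTO products VALUES (1, 'Acme mug'), (2, 'Plain mug'), (3, 'Acme hat');
INSERT INTO attributes VALUES (9, 'mug', 'type'), (10, 'cup', 'type'),
                              (11, 'glass', 'type'),
                              (60, 'Acme', 'brand'), (61, 'Zenith', 'brand');
INSERT INTO attribproducts VALUES (9, 1), (60, 1), (9, 2), (60, 3);
""")

# One row per product: how many of the requested classes it matched.
ranked = conn.execute("""
    SELECT ap.product_id,
           COUNT(DISTINCT a.attribute_class) AS classes_matched
    FROM attribproducts ap
    JOIN attributes a ON a.attribute_id = ap.attribute_id
    WHERE ap.attribute_id IN (9, 10, 11, 60, 61)
    GROUP BY ap.product_id
    ORDER BY classes_matched DESC, ap.product_id
""").fetchall()

# Products that matched both requested classes (type and brand).
full_matches = [product_id for product_id, n in ranked if n == 2]

print(ranked)
print(full_matches)
```

Adding HAVING COUNT(DISTINCT a.attribute_class) = 2 to the query would return only the full matches directly.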
It sounds like you have a data schema that is GREAT for storage but terrible for selecting/reporting. When you have a data structure of OBJECT, ATTRIBUTE, OBJECT-ATTRIBUTE and OBJECT-ATTRIBUTE-VALUE you can store many objects with many different attributes per object. This is sometimes referred to as "Vertical Storage".
However, when you want to retrieve a list of objects with all of their attribute values, you have to make a variable number of joins. It is much easier to retrieve data when it is stored horizontally (defined columns of data).
I have run into this scenario several times. Since you cannot change the existing data structure, my suggestion would be to write a "layer" of tables on top. Dynamically create a table for each object/product you have. Then dynamically create static columns in those new tables for each attribute. Pretty much you need to "flatten" your vertically stored attributes/values into static columns, converting from a vertical architecture into a horizontal one.
Use the "flattened" tables for reporting, and use the vertical tables for storage.
If you need sample code or more details, just ask me.
I hope this is clear. I have not had much coffee yet :)
Thanks,
- Mark
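One possible shape for such a flattened layer, sketched with SQLite via Python's sqlite3. This uses a view with conditional aggregation rather than physically generated tables, which is a substitution for brevity, not what the answer above describes. The view name product_flat, the two class names and the sample rows are all assumptions, and it presumes at most one attribute per class per product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE attributes (attribute_id INTEGER PRIMARY KEY,
                         attribute_name TEXT, attribute_class TEXT);
CREATE TABLE attribproducts (attribute_id INTEGER, product_id INTEGER);

INSERT INTO products VALUES (1, 'Acme mug'), (2, 'Plain mug'), (3, 'Acme hat');
INSERT INTO attributes VALUES (9, 'mug', 'type'), (10, 'cup', 'type'),
                              (60, 'Acme', 'brand'), (61, 'Zenith', 'brand');
INSERT INTO attribproducts VALUES (9, 1), (60, 1), (9, 2), (60, 3);

-- Hypothetical "flattened" layer: one row per product, one column per class.
-- CASE without ELSE yields NULL, which MAX ignores.
CREATE VIEW product_flat AS
SELECT p.product_id,
       p.product_name,
       MAX(CASE WHEN a.attribute_class = 'type'  THEN a.attribute_name END) AS product_type,
       MAX(CASE WHEN a.attribute_class = 'brand' THEN a.attribute_name END) AS brand
FROM products p
LEFT JOIN attribproducts ap ON ap.product_id = p.product_id
LEFT JOIN attributes a ON a.attribute_id = ap.attribute_id
GROUP BY p.product_id, p.product_name;
""")

flat = conn.execute("SELECT * FROM product_flat ORDER BY product_id").fetchall()

# Filtering by several classes is now an ordinary WHERE clause.
matches = conn.execute("""
    SELECT product_id FROM product_flat
    WHERE product_type IN ('mug', 'cup') AND brand = 'Acme'
""").fetchall()

print(flat)
print(matches)
```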
You can use multiple inner joins -- I think this would work:
select distinct p.product_id
from products p
inner join attribproducts a1 on a1.product_id=p.product_id
inner join attribproducts a2 on a2.product_id=p.product_id
where a1.attribute_id in (9,10,11)
and a2.attribute_id in (60,61)