query: cross product instead of join - sql

I have two tables that I would like to join but I am getting an error from MySQL
Table: books
bookTagNum ShelfTagNum
book1 1
book2 2
book3 2
Table: shelf
shelfNum shelfTagNum
1 shelf1
2 shelf2
I want my results to be:
bookTagNum ShelfTagNum shelfNum
book1 shelf1 1
book2 shelf2 2
book3 shelf2 2
but instead I am also getting an extra result:
book1 shelf2 2
I think my query is doing a cross product instead of a join:
SELECT `books`.`bookTagNum` , `books`.`shelfNum` , `shelf`.`shelfTagNum` , `books`.`title`
FROM books, shelf
where `books`.`shelfNum`=`books`.`shelfNum`
ORDER BY `shelf`.`shelfTagNum` ASC
LIMIT 0 , 30
What am I doing wrong?

I think you want
where `books`.`shelfTagNum`=`shelf`.`shelfNum`
In order to match rows from the books and shelf tables, you need to have terms from each in your where clause - otherwise, you're just performing a no-operation check on the rows of books, since every row's shelfNum will be equal to its shelfNum.
As @fixme.myopenid.com suggests, you could also go the explicit JOIN route, but it's not necessary.

If you want to be sure you're doing a join instead of a cross product, you should state it explicitly in the SQL, thus:
SELECT books.bookTagNum,books.shelfNum, shelf.shelfTagNum, books.title
FROM books INNER JOIN shelf ON books.shelfNum = shelf.shelfTagNum
ORDER BY shelf.shelfTagNum
(which will return only those rows which exist in both tables), or:
SELECT books.bookTagNum,books.shelfNum, shelf.shelfTagNum, books.title
FROM books LEFT OUTER JOIN shelf ON books.shelfNum = shelf.shelfTagNum
ORDER BY shelf.shelfTagNum
(which will return all rows from books), or:
SELECT books.bookTagNum,books.shelfNum, shelf.shelfTagNum, books.title
FROM books RIGHT OUTER JOIN shelf ON books.shelfNum = shelf.shelfTagNum
ORDER BY shelf.shelfTagNum
(which will return all rows from shelf)

FYI: If you rewrite your names to be consistent, things get a lot easier to read.
Table 1: Book
BookID ShelfID BookName
1 1 book1
2 2 book2
3 2 book3
Table 2: Shelf
ShelfID ShelfName
1 shelf1
2 shelf2
now, a query to extract books to shelves is
SELECT
b.BookName,
s.ShelfName
FROM
Book b
JOIN Shelf s ON s.ShelfID = b.ShelfID
To answer the original question:
> where `books`.`shelfNum`=`books`.`shelfNum`
> ^^^^^--------------^^^^^------------- books repeated - this is an error
The WHERE clause, as written, does nothing: it compares each row of books with itself. Because it isn't filtering out any rows, you are indeed getting the cross product.
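To see the difference concretely, here is a minimal sketch using Python's built-in sqlite3 module and the renamed Book/Shelf tables from above. SQLite stands in for MySQL here, which is an assumption of the sketch; the comma-join behaviour shown is the same in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book  (BookID INTEGER, ShelfID INTEGER, BookName TEXT);
    CREATE TABLE Shelf (ShelfID INTEGER, ShelfName TEXT);
    INSERT INTO Book  VALUES (1, 1, 'book1'), (2, 2, 'book2'), (3, 2, 'book3');
    INSERT INTO Shelf VALUES (1, 'shelf1'), (2, 'shelf2');
""")

# Self-comparison: true for every row with a non-NULL ShelfID,
# so every book is paired with every shelf.
cross_rows = conn.execute("""
    SELECT b.BookName, s.ShelfName
    FROM Book b, Shelf s
    WHERE b.ShelfID = b.ShelfID
    ORDER BY b.BookName, s.ShelfName
""").fetchall()

# Comparison across both tables: only matching pairs survive.
join_rows = conn.execute("""
    SELECT b.BookName, s.ShelfName
    FROM Book b, Shelf s
    WHERE b.ShelfID = s.ShelfID
    ORDER BY b.BookName
""").fetchall()

print(len(cross_rows))  # 3 books x 2 shelves
print(join_rows)
```

The first query returns every book/shelf combination; the second returns one row per book.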

Check your SQL. Your where clause cannot possibly be books.shelfNum=books.shelfNum
And what are all those single quotes for?

Try this:
SELECT `books`.`bookTagNum` , `books`.`shelfNum` , `shelf`.`shelfTagNum` ,
`books`.`title`
FROM books, shelf
where `books`.`shelftagNum`=`shelf`.`shelfNum`
ORDER BY `shelf`.`shelfTagNum` ASC
LIMIT 0 , 30
Because the implicit JOIN condition was not properly stated, the result was a cross product.

As others have mentioned, the problem you faced was with your ON condition. To specifically answer your question:
In MySQL, if you omit a JOIN, an INNER JOIN/CROSS JOIN is used. For other databases, it is different. For example, PostgreSQL uses a CROSS JOIN, not an INNER JOIN.
Re: http://dev.mysql.com/doc/refman/5.7/en/join.html
"In MySQL, JOIN, CROSS JOIN, and INNER JOIN are syntactic equivalents (they can replace each other). In standard SQL, they are not equivalent. INNER JOIN is used with an ON clause, CROSS JOIN is used otherwise."

Related

SQL Multiple SUM analysis

(First, sorry for my English I'm writing this from France)
I've read a lot of solutions here, but I'm quite lost!
Here's the problem: I've three tables
Budgets, Invoice, and Paiements
The relations are one-to-many between Invoice and Budgets (there's always at least one budget) and zero-to-many between Invoice and Paiements (i.e. the Invoice may be unpaid).
I'm trying to find any Invoice which is not paid OR partially paid!
Let's have an example
Then... I've written an SQL statement for it as the following:
Select sum(budget.amount) as m1, sum(Paiements.amount),budget.code
from
budget left outer join paiements
on
budget.code=Paiements.code
group by budget.code
I get this answer:
Now I'm trying to get only the rows where C2 is 0 or C1 does not equal C2.
How to modify my SQL statement?
You have to use the HAVING clause to specify conditions that decide which groups appear in the results.
The WHERE clause places conditions on individual rows before they are grouped, whereas the HAVING clause places conditions on the groups created by the GROUP BY clause.
So your query will be as the following:
Select sum(budget.amount) as m1, coalesce(sum(paiements.amount),0), budget.code
from
budget left outer join paiements
on
budget.code = paiements.code
group by budget.code
Having coalesce(sum(paiements.amount),0) = 0
or sum(budget.amount) <> coalesce(sum(paiements.amount),0)
The coalesce(sum(paiements.amount),0) call replaces a NULL sum (a budget with no payments at all) with zero.
See a demo from db-fiddle.
Note: it's good practice to use aliases for the tables and columns in your query; doing so makes the query more readable.
Consider the following query:
select sum(B.amount) c1, coalesce(sum(P.amount),0) c2 ,B.code c3
from
budget B left outer join paiements P
on
B.code=P.code
group by B.code
having coalesce(sum(P.amount),0) =0
Or sum(B.amount) <> coalesce(sum(P.amount),0)
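As a quick check of the HAVING filter, here is a small sketch using Python's sqlite3 module. The sample rows are invented for illustration (the original example data was not preserved), and the sketch assumes at most one budget row and one payment row per code; with several rows on both sides the join would multiply rows and inflate the sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE budget    (code TEXT, amount REAL);
    CREATE TABLE paiements (code TEXT, amount REAL);
    -- Hypothetical data: A is fully paid, B is partially paid, C is unpaid.
    INSERT INTO budget    VALUES ('A', 100), ('B', 200), ('C', 50);
    INSERT INTO paiements VALUES ('A', 100), ('B', 80);
""")

unpaid = conn.execute("""
    SELECT B.code, SUM(B.amount) AS c1, COALESCE(SUM(P.amount), 0) AS c2
    FROM budget B
    LEFT OUTER JOIN paiements P ON B.code = P.code
    GROUP BY B.code
    HAVING COALESCE(SUM(P.amount), 0) = 0
        OR SUM(B.amount) <> COALESCE(SUM(P.amount), 0)
    ORDER BY B.code
""").fetchall()

for code, budgeted, paid in unpaid:
    print(code, budgeted, paid)
```

Only the partially paid and unpaid codes come back; the fully paid one is filtered out by HAVING.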

APPLY operator - how does it decide which rows are a match between the two sets?

When using a JOIN, it is clear exactly what is deciding whether or not the rows will match, e.g. ON a.SomeID1=b.SomeID1. So the only rows returned will be ones where there is a matching 'SomeID1' in the tables referenced by aliases A and B.
My initial thought was that, when using APPLY, a WHERE clause is typically placed within the right-hand query, to provide similar functionality to the ON clause of a JOIN.
However, I see many SQL queries that do not include a WHERE in the right-hand query when using APPLY. So won't this mean that the resulting rows will just be the product of the number of rows from both tables?
What logic determines which rows will match between the left and right queries when using APPLY?
I have tried many blog posts, answers on here and even YouTube videos, but none of the explanations have 'clicked' with me.
The apply operator (in databases that support it) implements a type of join called a lateral join.
For me, the best way of understanding it starts with a correlated subquery. For instance:
select a.*,
(select count(*)
from b
where b.a_id = a.a_id
--------------^
) as b_count
from a;
The subquery is counting the number of matching rows in b for each row in a. How does it do this? The correlation clause is the condition that maps the subquery to the outer query.
Apply works the same way:
select a.*, b.b_count
from a outer apply
(select count(*) as b_count
from b
where b.a_id = a.a_id
------------^
) b;
In other words, the correlation clause is the answer to your question.
What is the difference between a lateral join and a correlated subquery? There are three differences:
A lateral join can return more than one row.
A lateral join can return more than one column.
A lateral join is in the FROM clause so the returned columns can be referenced multiple times in the query.
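One way to picture the per-row evaluation described above is as a nested loop. The following is only a conceptual sketch in plain Python, not how any database actually executes APPLY; the function names `cross_apply` and `outer_apply` and the sample rows are made up for illustration.

```python
def cross_apply(left_rows, right_query):
    """For each left row, run the right-hand query and emit one row per result."""
    for left in left_rows:
        for right in right_query(left):
            yield {**left, **right}


def outer_apply(left_rows, right_query, right_columns):
    """Like cross_apply, but keep left rows whose right-hand query is empty."""
    for left in left_rows:
        produced = False
        for right in right_query(left):
            produced = True
            yield {**left, **right}
        if not produced:
            # No right-hand rows: pad the right-hand columns with None (NULL).
            yield {**left, **dict.fromkeys(right_columns)}


customers = [{"id": 1, "name": "John"}, {"id": 2, "name": "Jack"}]
orders = [{"c_id": 1, "item": "pen"}, {"c_id": 1, "item": "ink"}]


def orders_for(customer):
    # The correlation: the right-hand side reads a column of the current left row.
    return [{"item": o["item"]} for o in orders if o["c_id"] == customer["id"]]


cross_rows = list(cross_apply(customers, orders_for))
outer_rows = list(outer_apply(customers, orders_for, ["item"]))
print(cross_rows)
print(outer_rows)
```

There is no separate matching step: whatever the right-hand side returns for a given left row is what gets attached to it. If `orders_for` ignored its argument, every customer would get every order, which is the row product the question asks about.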
Further to Gordon's excellent answer:
APPLY does not need to be correlated (i.e. it does not have to use columns from the outer query). The key is that it is lateral: the right-hand side is evaluated afresh, producing a new result set, for each row on the left.
So starting with a base query:
select c.*
from customer c;
Example Result:
Id Name
1 John
2 Jack
The idea is to apply a new resultset to this. In this case, we only want a single row (a grouped-up count) to apply for each existing row.
Note the correlation in the WHERE clause, which uses an outer reference:
select c.*, o.Orders
from customer c
outer apply
(select count(*) as Orders
from [order] o
where o.c_id = c.id
) o;
Id Name Orders
1 John 2
2 Jack 0
We can, however, return multiple results. In fact, we can return anything we like, and place arbitrary filters on the result:
select c.*, t.*
from customer c
outer apply
(select 'Thing1' thing
union all
select 'Thing2'
where c.Name = 'John'
) t;
Id Name thing
1 John Thing1
1 John Thing2
2 Jack Thing1
Note how the row containing John got doubled up, based on the filter. Note also that the first half of the union has no outer reference.
See also this answer for further APPLY tricks.

How to put conditions on left joins

I have two tables, CustomerCost and Products that look like the following:
I am joining the two tables using the following SQL query:
SELECT custCost.ProductId,
custCost.CustomerCost
FROM CUSTOMERCOST Cost
LEFT JOIN PRODUCTS prod ON Cost.productId =prod.productId
WHERE prod.productId=4
AND (Cost.Customer_Id =2717
OR Cost.Customer_Id IS NULL)
The result of the join is:
joins result
What I want is this: when I pass customerId 2717, the query should return only that customer's specific cost, i.e. 258.93. Only when no row matches the customerId should it fall back to the cost of 312.50.
What am I doing wrong here?
You can get your expected output as follows:
SELECT Cost.ProductId,
Cost.CustomerCost
FROM CUSTOMERCOST Cost
INNER JOIN PRODUCTS prod ON Cost.productId = prod.productId
WHERE prod.productId=4
AND Cost.Customer_Id = 2717
However, if you want to allow customer ID to be passed as NULL, you will have to change the last line to AND Cost.Customer_Id IS NULL. To do so dynamically, you'll need to use variables and generate the query based on the input.
The problem in the original query that you have posted is that you have used an alias called custCost which is not present in the query.
EDIT: Actually, you don't even need a join. The CUSTOMERCOST table seems to have both Customer and Product IDs.
You can simply write:
SELECT
Cost.ProductId, Cost.CustomerCost
FROM
CUSTOMERCOST Cost
WHERE
Cost.Customer_Id = 2717
AND Cost.productId = 4
You seem to want:
SELECT c.*
FROM CUSTOMERCOST c
WHERE c.productId = 4 AND c.Customer_Id = 2717
UNION ALL
SELECT c.*
FROM CUSTOMERCOST c
WHERE c.productId = 4 AND c.Customer_Id IS NULL AND
NOT EXISTS (SELECT 1 FROM CUSTOMERCOST c2 WHERE c2.productId = 4 AND c2.Customer_Id = 2717);
That is, take the matching cost, if it exists for the customer. Otherwise, take the default cost.
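Here is a small sketch of that fallback logic using Python's sqlite3 module. The two cost values come from the question, but the table layout (a default row with a NULL Customer_Id) is an assumption, since the original table images are missing; `cost_for` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERCOST (productId INTEGER, Customer_Id INTEGER, CustomerCost REAL);
    -- Assumed layout: one customer-specific row and one default row (NULL customer).
    INSERT INTO CUSTOMERCOST VALUES (4, 2717, 258.93), (4, NULL, 312.50);
""")

QUERY = """
    SELECT c.CustomerCost
    FROM CUSTOMERCOST c
    WHERE c.productId = :product AND c.Customer_Id = :customer
    UNION ALL
    SELECT c.CustomerCost
    FROM CUSTOMERCOST c
    WHERE c.productId = :product AND c.Customer_Id IS NULL
      AND NOT EXISTS (SELECT 1 FROM CUSTOMERCOST c2
                      WHERE c2.productId = :product AND c2.Customer_Id = :customer)
"""


def cost_for(product, customer):
    """Return the customer-specific cost if present, otherwise the default cost."""
    params = {"product": product, "customer": customer}
    return [row[0] for row in conn.execute(QUERY, params)]


print(cost_for(4, 2717))  # customer with a specific cost
print(cost_for(4, 9999))  # unknown customer falls back to the default
```

Passing the IDs as bound parameters means the same statement serves both cases, with no need to generate different SQL per input.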
SELECT Cost.ProductId,
Cost.CustomerCost
FROM CUSTOMERCOST Cost
LEFT JOIN PRODUCTS prod
ON Cost.productId =prod.productId
AND (Cost.Customer_Id =2717 OR Cost.Customer_Id IS NULL)
WHERE prod.productId=4
WHERE applies to the joined row. ON controls the join condition.
Outer joins are why JOIN and ON were added to SQL-92. The old SQL-89
syntax had no support for them, and different vendors added different,
incompatible syntax to support them.
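The difference between ON and WHERE shows up most clearly when the condition is on the right-hand table of a LEFT JOIN. A minimal sketch with Python's sqlite3 module follows; the `products` and `costs` tables and their rows are hypothetical and deliberately simpler than the tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (productId INTEGER, name TEXT);
    CREATE TABLE costs    (productId INTEGER, customerId INTEGER, cost REAL);
    INSERT INTO products VALUES (4, 'widget'), (5, 'gadget');
    INSERT INTO costs    VALUES (4, 2717, 258.93), (5, 1000, 10.0);
""")

# Condition in ON: it only decides which cost rows match.
# Products without a match are kept, with NULL for the cost.
on_rows = conn.execute("""
    SELECT p.name, c.cost
    FROM products p
    LEFT JOIN costs c ON c.productId = p.productId AND c.customerId = 2717
    ORDER BY p.productId
""").fetchall()

# Condition in WHERE: it is applied after the join, so the NULL-padded
# rows fail the test and the LEFT JOIN behaves like an INNER JOIN.
where_rows = conn.execute("""
    SELECT p.name, c.cost
    FROM products p
    LEFT JOIN costs c ON c.productId = p.productId
    WHERE c.customerId = 2717
    ORDER BY p.productId
""").fetchall()

print(on_rows)
print(where_rows)
```

The ON version returns both products, one with a NULL cost; the WHERE version returns only the product that has a cost for customer 2717.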

SQL - Max Vs Inner Join

I have a question on which is a better method in terms of speed.
I have a database with 2 tables that looks like this:
Table2
UniqueID Price
1 100
2 200
3 300
4 400
5 500
Table1
UniqueID User
1 Tom
2 Tom
3 Jerry
4 Jerry
5 Jerry
I would like to get the max price for each user, and I am now faced with 2 choices:
Use MAX, or use the INNER JOIN approach suggested in the following post: Getting max value from rows and joining to another table
Which method is more efficient?
The answer to your question is to try both methods, and see which performs faster on your data in your environment. Unless you have a large amount of data, the difference is probably not important.
In this case, the traditional method of group by is probably better:
select u.user, max(p.price)
from table1 u join
table2 p
on u.uniqueid = p.uniqueid
group by u.user;
For such a query, you want an index on table2(uniqueid, price), and perhaps on table1(uniqueid, user) as well. This depends on the database engine.
Instead of a join, I would suggest not exists:
select u.user, p.price
from table1 u join
table2 p
on u.uniqueid = p.uniqueid
where not exists (select 1
from table1 u2 join
table2 p2
on u2.uniqueid = p2.uniqueid
where u2.user = u.user and p2.price > p.price
);
Do note that these do not do exactly the same things. The first will return one row per user, no matter what. This version can return multiple rows, if there are multiple rows with the same price. On the other hand, it can return other columns from the rows with the maximum price, which is convenient.
Because your data structure requires a join in the subquery, I think you should stick with the group by approach.
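For comparison, here is a sketch that runs both approaches on the sample data from the question with Python's sqlite3 module. Note that the NOT EXISTS subquery here is correlated on the user, which is my assumption about the intent (a per-user maximum); without that condition it would only find the single highest price overall. It says nothing about speed, which you would still need to measure on your own data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (uniqueid INTEGER, "user" TEXT);
    CREATE TABLE table2 (uniqueid INTEGER, price INTEGER);
    INSERT INTO table1 VALUES (1, 'Tom'), (2, 'Tom'), (3, 'Jerry'), (4, 'Jerry'), (5, 'Jerry');
    INSERT INTO table2 VALUES (1, 100), (2, 200), (3, 300), (4, 400), (5, 500);
""")

# GROUP BY: always exactly one row per user.
group_by_rows = conn.execute("""
    SELECT u."user", MAX(p.price)
    FROM table1 u JOIN table2 p ON u.uniqueid = p.uniqueid
    GROUP BY u."user"
    ORDER BY u."user"
""").fetchall()

# NOT EXISTS: keep a row only if the same user has no higher-priced row.
# Ties for the maximum would produce several rows per user.
not_exists_rows = conn.execute("""
    SELECT u."user", p.price
    FROM table1 u JOIN table2 p ON u.uniqueid = p.uniqueid
    WHERE NOT EXISTS (
        SELECT 1
        FROM table1 u2 JOIN table2 p2 ON u2.uniqueid = p2.uniqueid
        WHERE u2."user" = u."user" AND p2.price > p.price
    )
    ORDER BY u."user"
""").fetchall()

print(group_by_rows)
print(not_exists_rows)
```

On this data both queries agree, because no user has two rows tied for the maximum price.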

SQL query on two tables - return rows in one table that don't have entries in the other

I have two database tables, Categories and SuperCategories, for an inventory control system I'm working on:
Categories: ID_Category, CategoryName
SuperCategories: ID_SuperCategory, CategoryID, SuperCategoryID
I'm putting category-subcategory relationships into the SuperCategories table. I'm putting all categories into the Categories table.
Here is an example:
Categories:
ID_Category CategoryName
1 Box
2 Red Box
3 Blue Box
4 Blue Plastic Box
5 Can
6 Tin Can
SuperCategories:
ID_Super CategoryID SuperCategoryID
1 2 1
2 3 1
3 4 3
4 6 5
CategoryID and SuperCategoryID relate back to the primary key ID_Category in the Categories table.
What I would like is a query that returns all of the category names that are not parents of any other categories:
Red Box
Blue Plastic Box
Tin Can
This amounts to finding all values of ID_Category that do not show up in the SuperCategoryID column (2, 4, and 6), but I'm having trouble writing the SQL.
I'm using VB6 to query an Access 2000 database.
Any help is appreciated. Thanks!
EDIT: I voted up everyone's answer that gave me something that worked. I accepted the answer that I felt was the most instructive. Thanks again for your help!
Mike Pone's answer works, because he joins the "Categories" table with the "SuperCategories" table as a "LEFT OUTER JOIN" - this will take all entries from "Categories" and add columns from "SuperCategories" to those where the link exists - where it does not exist (e.g. where there is no entry in "SuperCategories"), you'll get NULLs for the SuperCategories columns - and that's exactly what Mike's query then checks for.
If you would write the query like so:
SELECT c.CategoryName, s.ID_Super
FROM Categories c
LEFT OUTER JOIN SuperCategories s ON c.ID_Category = s.SuperCategoryID
you would get something like this:
CategoryName ID_Super
Box 1
Box 2
Red Box NULL
Blue Box 3
Blue Plastic Box NULL
Can 4
Tin Can NULL
So this basically gives you your answer - all the rows where the ID_Super on the LEFT OUTER JOIN is NULL are those who don't have any entries in the SuperCategories table. All clear? :-)
Marc
SELECT
CAT.ID_Category,
CAT.CategoryName
FROM
Categories CAT
WHERE
NOT EXISTS
(
SELECT
*
FROM
SuperCategories SC
WHERE
SC.SuperCategoryID = CAT.ID_Category
)
Or
SELECT
CAT.ID_Category,
CAT.CategoryName
FROM
Categories CAT
LEFT OUTER JOIN SuperCategories SC ON
SC.SuperCategoryID = CAT.ID_Category
WHERE
SC.ID_Super IS NULL
I'll also make the suggestion that your naming standards could probably use some work. They seem all over the place and difficult to work with.
Include only those categories that are not super categories. A simple outer join:
select CategoryName from Categories LEFT OUTER JOIN
SuperCategories ON Categories.ID_Category =SuperCategories.SuperCategoryID
WHERE SuperCategories.SuperCategoryID is null
Not sure if the syntax will work for Access, but something like this would work:
select CategoryName from Categories
where ID_Category not in (
select SuperCategoryID
from SuperCategories
)
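The three approaches suggested so far (NOT EXISTS, LEFT OUTER JOIN with an IS NULL check, and NOT IN) can be compared side by side on the sample data from the question. This is a sketch using Python's sqlite3 module, so SQLite stands in for Access here, which is an assumption; Access syntax may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Categories (ID_Category INTEGER PRIMARY KEY, CategoryName TEXT);
    CREATE TABLE SuperCategories (ID_Super INTEGER PRIMARY KEY,
                                  CategoryID INTEGER, SuperCategoryID INTEGER);
    INSERT INTO Categories VALUES (1, 'Box'), (2, 'Red Box'), (3, 'Blue Box'),
                                  (4, 'Blue Plastic Box'), (5, 'Can'), (6, 'Tin Can');
    INSERT INTO SuperCategories VALUES (1, 2, 1), (2, 3, 1), (3, 4, 3), (4, 6, 5);
""")

QUERIES = {
    "not_exists": """
        SELECT c.CategoryName FROM Categories c
        WHERE NOT EXISTS (SELECT 1 FROM SuperCategories s
                          WHERE s.SuperCategoryID = c.ID_Category)
        ORDER BY c.ID_Category
    """,
    "left_join": """
        SELECT c.CategoryName FROM Categories c
        LEFT OUTER JOIN SuperCategories s ON s.SuperCategoryID = c.ID_Category
        WHERE s.ID_Super IS NULL
        ORDER BY c.ID_Category
    """,
    "not_in": """
        SELECT c.CategoryName FROM Categories c
        WHERE c.ID_Category NOT IN (SELECT SuperCategoryID FROM SuperCategories)
        ORDER BY c.ID_Category
    """,
}

# Run each variant and collect the category names it returns.
results = {name: [row[0] for row in conn.execute(sql)] for name, sql in QUERIES.items()}
for name, names in results.items():
    print(name, names)
```

All three return the same leaf categories on this data. One caveat with NOT IN: if SuperCategoryID could ever be NULL, the NOT IN version would return no rows at all, so the other two forms are the safer choice in that case.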
I always take the outer join approach as marc_s suggests. There is a lot of power when using OUTER JOINS. Often times I'll have to do a FULL OUTER JOIN to check data on both sides of the query.
You should also look at the ISNULL function. If you are doing a query where data can be in either table A or table B, you can use ISNULL to return a value from whichever column is populated.
Here's an example
SELECT
isNull(a.[date_time],b.[date_time]) as [Time Stamp]
,isnull(a.[ip],b.[ip]) as [Device Address]
,isnull(a.[total_messages],0) as [Local Messages]
,isnull(b.[total_messages],0) as [Remote Messages]
FROM [Local_FW_Logs] a
FULL OUTER JOIN [Remote_FW_Logs] b
on b.ip = a.ip
I have two tables interface_category and interface_subcategory.
Interface_subcategory contains SubcategoryID, CategoryID, Name(SubcategoryName)
Interface_category contains CategoryID, Name(CategoryName)
Now I want to output the CategoryID and the Name (subcategory name).
The query I wrote is below, and it works for me:
select ic.CategoryID, ic.Name CategoryName, ISC.SubCategoryID, ISC.Name SubCategoryName from Interface_Category IC
inner join Interface_SubCategory ISC
on ISC.CategoryID = ic.CategoryID
order by ic.CategoryID, isc.SubCategoryID