Best practice for joining tables with case - sql

I need to join one of two different tables depending on the situation. I can think of two ways to solve this:
1) Use S.ShipmentType as a condition in the joins, like this:
SELECT S.*, P.*
,CASE
WHEN S.ShipmentType = 'import' THEN SP.SupplierName
WHEN S.ShipmentType = 'export' THEN C.CustomerName
END AS ShipmentDestination
FROM tblShippments S
INNER JOIN tblProducts P ON S.productId = P.productID
LEFT OUTER JOIN tblCustomers C ON S.companyId = C.customerId AND S.ShipmentType = 'export'
LEFT OUTER JOIN tblSuppliers SP ON S.companyId = SP.supplierId AND S.ShipmentType = 'import'
2) Or create two separate shipment tables, one for imports and one for exports, and use two different queries to get the data.
Which one is the best practice?

I think I would do this with two separate queries, but I don't hate this solution either; both are feasible.
By the way, I think you should be able to solve this without the CASE WHEN, assuming that CompanyId is unique across both tblCustomers and tblSuppliers. In that case you could simply use:
COALESCE(SP.SupplierName, C.CustomerName)
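A minimal sketch comparing the CASE and COALESCE versions, run against an in-memory SQLite database. The schema is a hypothetical cut-down version of the question's tables (tblProducts is left out) and the sample rows are invented. Customer 1 and supplier 1 deliberately share an id; because each outer join also filters on ShipmentType, at most one of them can match a given shipment, so both forms return the same destination.

```python
import sqlite3

# Hypothetical minimal schema modelled on the question; tblProducts is omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCustomers (customerId INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE tblSuppliers (supplierId INTEGER PRIMARY KEY, SupplierName TEXT);
CREATE TABLE tblShippments (ShipmentId INTEGER PRIMARY KEY,
                            ShipmentType TEXT, companyId INTEGER);
-- Customer 1 and supplier 1 share an id on purpose.
INSERT INTO tblCustomers VALUES (1, 'Acme Retail');
INSERT INTO tblSuppliers VALUES (1, 'Globex Parts');
INSERT INTO tblShippments VALUES (10, 'export', 1), (11, 'import', 1);
""")

# Each outer join is restricted by ShipmentType, so at most one of them
# matches any given shipment.
JOINS = """
FROM tblShippments S
LEFT OUTER JOIN tblCustomers C
    ON S.companyId = C.customerId AND S.ShipmentType = 'export'
LEFT OUTER JOIN tblSuppliers SP
    ON S.companyId = SP.supplierId AND S.ShipmentType = 'import'
ORDER BY S.ShipmentId
"""

with_case = conn.execute("""
SELECT S.ShipmentId,
       CASE WHEN S.ShipmentType = 'import' THEN SP.SupplierName
            WHEN S.ShipmentType = 'export' THEN C.CustomerName
       END AS ShipmentDestination
""" + JOINS).fetchall()

with_coalesce = conn.execute("""
SELECT S.ShipmentId,
       COALESCE(SP.SupplierName, C.CustomerName) AS ShipmentDestination
""" + JOINS).fetchall()

print(with_case)      # [(10, 'Acme Retail'), (11, 'Globex Parts')]
print(with_coalesce)  # same rows as with_case
```

If the ShipmentType conditions were dropped from the ON clauses, shipment 10 would also match supplier 1, and COALESCE would then pick the supplier name; that is the case where unique ids across both tables would matter.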

Is it true that all joins following a left join in a SQL query must also be left joins? Why or why not?

I remember this rule of thumb from back in college that if you put a left join in a SQL query, then all subsequent joins in that query must also be left joins instead of inner joins, or else you'll get unexpected results. But I don't remember what those results are, so I'm wondering if maybe I'm misremembering something. Anyone able to back me up on this or refute it? Thanks! :)
For instance:
select * from customer
left join ledger on customer.id = ledger.customerid
inner join "order" on ledger.orderid = "order".id -- this inner join might be bad mojo
It's not that they have to be, but they should be (or perhaps use a full join at the end). It is a safer way to write queries and express logic.
Your query is:
select *
from customer c left join
ledger l
on c.id = l.customerid inner join
"order" o
on l.orderid = o.id
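The left join says "keep all customers, even if there is no matching record in ledger." The second says, "I have to have a matching ledger record." For a customer with no ledger rows, l.orderid is NULL, which cannot match any order, so that customer is dropped: the inner join effectively converts the first join into an inner join.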
Because you presumably want all customers, regardless of whether there is a match in the other two tables, you would use a left join:
select *
from customer c left join
ledger l
on c.id = l.customerid left join
"order" o
on l.orderid = o.id
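A small sketch of the difference, using SQLite and invented sample data (a customer "Bob" with no ledger rows). The table name order is a reserved word, so it is double-quoted here.

```python
import sqlite3

# Hypothetical sample data: customer 2 (Bob) has no ledger rows at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ledger (id INTEGER PRIMARY KEY, customerid INTEGER, orderid INTEGER);
CREATE TABLE "order" (id INTEGER PRIMARY KEY, total REAL);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO ledger VALUES (100, 1, 500);
INSERT INTO "order" VALUES (500, 19.99);
""")

# LEFT then INNER: Bob's l.orderid is NULL, so the inner join discards him.
left_then_inner = conn.execute("""
SELECT c.name, o.id
FROM customer c
LEFT JOIN ledger l ON c.id = l.customerid
INNER JOIN "order" o ON l.orderid = o.id
ORDER BY c.id
""").fetchall()

# LEFT then LEFT: Bob survives, with NULL order columns.
left_then_left = conn.execute("""
SELECT c.name, o.id
FROM customer c
LEFT JOIN ledger l ON c.id = l.customerid
LEFT JOIN "order" o ON l.orderid = o.id
ORDER BY c.id
""").fetchall()

print(left_then_inner)  # [('Ann', 500)]
print(left_then_left)   # [('Ann', 500), ('Bob', None)]
```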
You remember some parts of it correctly!
The thing is, when you chain joins like this:
select * from customer
left join ledger on customer.id = ledger.customerid
inner join "order" on ledger.orderid = "order".id
The joins are logically evaluated in sequence, so when customer left join ledger happens, you are making sure every customer row is returned (because it's a left join, and you placed customer on the left).
Next, the result of that join is joined with order using an inner join, which requires every row to find a matching order. Rows where ledger.orderid is NULL (customers with no ledger entries) or matches no order are discarded, so you end up only with records that were also matched in the order table.
Bad mojo? It really depends on what you are trying to accomplish.
If you want to guarantee that all customer records are returned, you should keep using left joins.
You can, however, make this a little more intuitive to understand (not necessarily a better way of writing SQL!) by writing:
SELECT *
FROM
(
    SELECT c.*, l.orderid
    FROM customer c
    LEFT JOIN ledger l
        ON c.id = l.customerid
) c_and_l
INNER JOIN "order" o -- or perhaps LEFT JOIN, to keep customers without a matching order
    ON c_and_l.orderid = o.id
This makes it explicit that c_and_l is built first and then joined to order (you can think of it as two tables being joined again).
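A quick check, in SQLite with invented rows, that the derived-table form returns the same rows as the chained form. Here Bob has no ledger rows and Cy's ledger row points at an order that does not exist; both are dropped by the inner join either way.

```python
import sqlite3

# Hypothetical data: Bob has no ledger rows, Cy's ledger row points at a missing order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ledger (id INTEGER PRIMARY KEY, customerid INTEGER, orderid INTEGER);
CREATE TABLE "order" (id INTEGER PRIMARY KEY);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO ledger VALUES (100, 1, 500), (101, 3, 999);
INSERT INTO "order" VALUES (500);
""")

chained = conn.execute("""
SELECT c.name, o.id
FROM customer c
LEFT JOIN ledger l ON c.id = l.customerid
INNER JOIN "order" o ON l.orderid = o.id
ORDER BY c.name
""").fetchall()

# Same logic, with the customer/ledger join wrapped in a derived table first.
nested = conn.execute("""
SELECT c_and_l.name, o.id
FROM (
    SELECT c.*, l.orderid
    FROM customer c
    LEFT JOIN ledger l ON c.id = l.customerid
) c_and_l
INNER JOIN "order" o ON c_and_l.orderid = o.id
ORDER BY c_and_l.name
""").fetchall()

print(chained)  # [('Ann', 500)]
print(nested)   # [('Ann', 500)]
```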

To create a view using 5 tables with all columns of all tables

I have to create a view that joins together all of the columns in the CUSTOMERS, ORDERS, ORDERDETAILS, EMPLOYEES, PAYMENTS and PRODUCTS tables.
the schema for the table is below
I tried the following query, but I am at a loss as to how to solve the question above:
create view orders_view AS
select *
from sys.customers c
left JOIN EMPLOYEES e
on c.SALESREPEMPLOYEENUMBER = e.EMPLOYEENUMBER
left join sys.orders o
on c.CUSTOMER NUMBER = o.CUSTOMERNUMBER
left join sys.orderdetails od
on o.ORDERNUMBER = od.ORDERNUMBER
left join sys.products p
on od.PRODUCTCODE = p.PRODUCTCODE
left join sys.PAYMENTS py
on c.CUSTOMERNUMBER = py.customernumber
I am a newbie with SQL and databases, so any help is appreciated.
Here are some observations on things going wrong:
You have curly braces that are not necessary (perhaps this is a typo in the question).
You are selecting *, but have duplicate column names (such as productcode), which prevents the view from being created. Best practice: list all the columns explicitly in the view.
You have c.CUSTOMER NUMBER = o.CUSTOMERNUMBER. The space is probably a typo. If not, change the name so the space is not part of the name. Best practice: Use identifiers that do not have to be escaped.
I am not aware of a sys.customers table. The sys schema should only be used for internal Oracle objects. (Here is one source.)
Thank you for all the input; it helped me a lot in figuring out the answer.
The following query answers the question above; it gave me a view with columns from all the tables.
create or replace view overall AS
select c.*,
e.LASTNAME,
e.FIRSTNAME,
e.EXTENSION,
e.EMAIL,
e.OFFICECODE,
e.REPORTSTO,
e.JOBTITLE,
o.ORDERNUMBER,
o.ORDERDATE,
o.REQUIREDDATE,
o.SHIPPEDDATE,
o.STATUS,
o.COMMENTS,
od.PRODUCTCODE,
od.QUANTITYORDERED,
od.PRICEEACH,
od.ORDERLINENUMBER,
p.PRODUCTNAME,
p.PRODUCTLINE,
p.PRODUCTSCALE,
p.PRODUCTVENDOR,
p.PRODUCTDESCRIPTION,
p.QUANTITYINSTOCK,
p.BUYPRICE,
p.MSRP,
py.CHECKNUMBER,
py.PAYMENTDATE,
py.AMOUNT
from sys.customers c
left JOIN EMPLOYEES e
on c.SALESREPEMPLOYEENUMBER = e.EMPLOYEENUMBER
left join sys.orders o
on c.CUSTOMERNUMBER = o.CUSTOMERNUMBER
left join sys.orderdetails od
on o.ORDERNUMBER = od.ORDERNUMBER
left join sys.products p
on od.PRODUCTCODE = p.PRODUCTCODE
left join sys.PAYMENTS py
on c.CUSTOMERNUMBER = py.customernumber
;
Your query looks alright, but I don't think it will let you create a view if more than one column has the same name.
Since there are duplicates, e.g. CITY, I think the only way around it is to name all the columns and give unique names for duplicate columns.
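A minimal sketch for finding which column names collide before writing the view, so you know which ones need a unique alias or can be left out. It uses SQLite's PRAGMA table_info and a hypothetical, trimmed-down version of three of the tables; in Oracle the same information is available from the data dictionary (for example USER_TAB_COLUMNS).

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down versions of three of the tables in the question.
conn.executescript("""
CREATE TABLE customers (customernumber INTEGER, customername TEXT, city TEXT,
                        salesrepemployeenumber INTEGER);
CREATE TABLE employees (employeenumber INTEGER, lastname TEXT, firstname TEXT);
CREATE TABLE payments (customernumber INTEGER, checknumber TEXT, amount REAL);
""")

def duplicate_columns(conn, tables):
    """Map each column name found in more than one table to the tables that have it."""
    seen = defaultdict(list)
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        # Table names are interpolated directly, so only pass trusted names.
        for row in conn.execute(f"PRAGMA table_info({table})"):
            seen[row[1].lower()].append(table)
    return {col: owners for col, owners in seen.items() if len(owners) > 1}

print(duplicate_columns(conn, ["customers", "employees", "payments"]))
# {'customernumber': ['customers', 'payments']}
```

Each reported column then either gets a unique alias in the view (e.g. py.customernumber AS payment_customernumber) or is left out, which is what the final query above does for the payment and employee key columns.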

SQL Performance: Which is better, a WHERE clause or a JOIN ON condition? [duplicate]

Is there any difference (performance, best-practice, etc...) between putting a condition in the JOIN clause vs. the WHERE clause?
For example...
-- Condition in JOIN
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND CUS.FirstName = 'John'
-- Condition in WHERE
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE CUS.FirstName = 'John'
Which do you prefer (and perhaps why)?
The relational algebra allows the predicates in the WHERE clause and the INNER JOIN to be interchanged, so even in INNER JOIN queries with WHERE clauses the optimizer can rearrange the predicates so that rows may already be excluded during the JOIN process.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.
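For inner joins the two forms return the same rows; a quick SQLite check with invented data for the Customers/CustomerAccounts/Accounts example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows for the Customers/CustomerAccounts/Accounts example.
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, State TEXT);
CREATE TABLE CustomerAccounts (CustomerID INTEGER, AccountID INTEGER);
CREATE TABLE Accounts (AccountID INTEGER PRIMARY KEY, Status INTEGER);
INSERT INTO Customers VALUES (1, 'NY'), (2, 'NJ'), (3, 'NY');
INSERT INTO CustomerAccounts VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO Accounts VALUES (10, 1), (20, 1), (30, 0);
""")

# Filters inside the ON clauses.
in_on = conn.execute("""
SELECT c.CustomerID, a.AccountID
FROM Customers c
INNER JOIN CustomerAccounts ca ON ca.CustomerID = c.CustomerID AND c.State = 'NY'
INNER JOIN Accounts a ON ca.AccountID = a.AccountID AND a.Status = 1
ORDER BY c.CustomerID
""").fetchall()

# Same filters moved to WHERE.
in_where = conn.execute("""
SELECT c.CustomerID, a.AccountID
FROM Customers c
INNER JOIN CustomerAccounts ca ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a ON ca.AccountID = a.AccountID
WHERE c.State = 'NY' AND a.Status = 1
ORDER BY c.CustomerID
""").fetchall()

print(in_on)     # [(1, 10)]
print(in_where)  # [(1, 10)]
```

This only shows that the results match; whether the plans match is up to your optimizer, so compare execution plans on your own database.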
For inner joins I have not really noticed a difference (but as with all performance tuning, you need to check against your database under your conditions).
However, where you put the condition makes a huge difference if you are using left or right joins. For instance, consider these two queries:
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderDate >'20090515'
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'
The first will give you only those customers that have an order dated later than May 15, 2009. Rows for other customers have either NULL in ORD.OrderDate (no orders at all) or an earlier date, and both fail the WHERE test, so the left join is effectively converted to an inner join.
The second will give those records plus any customers with no orders after that date (their order columns are NULL). The result set is very different depending on where you put the condition. (SELECT * is for example purposes only; of course you should not use it in production code.)
The exception to this is when you want to see only the records in one table that have no match in the other. Then you put the condition in the WHERE clause, not the join:
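The two queries above can be compared directly in SQLite with invented data: customer 1 has an old and a new order, customer 2 only an old one, and customer 3 none. Dates are stored as 'YYYYMMDD' text so the string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: customer 1 has an old and a new order, 2 only an old one, 3 none.
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
INSERT INTO Customers VALUES (1), (2), (3);
INSERT INTO Orders VALUES (1, 1, '20090101'), (2, 1, '20090601'), (3, 2, '20090301');
""")

# Filter in WHERE: applied after the outer join, so NULL/old-order rows are removed.
filter_in_where = conn.execute("""
SELECT CUS.CustomerID, ORD.OrderID
FROM Customers AS CUS
LEFT JOIN Orders AS ORD ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderDate > '20090515'
ORDER BY CUS.CustomerID
""").fetchall()

# Filter in ON: only limits which orders can match; every customer is kept.
filter_in_on = conn.execute("""
SELECT CUS.CustomerID, ORD.OrderID
FROM Customers AS CUS
LEFT JOIN Orders AS ORD
    ON CUS.CustomerID = ORD.CustomerID AND ORD.OrderDate > '20090515'
ORDER BY CUS.CustomerID
""").fetchall()

print(filter_in_where)  # [(1, 2)]
print(filter_in_on)     # [(1, 2), (2, None), (3, None)]
```

Note that customer 2, who has only an older order, comes back with NULLs in the second query, just like customer 3.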
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderID is null
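A short SQLite sketch of that anti-join pattern with invented rows, alongside what happens if the IS NULL test is moved into the ON clause instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: customer 2 is the only one without orders.
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
INSERT INTO Customers VALUES (1), (2), (3);
INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Anti-join: the IS NULL test goes in WHERE, after the outer join has run.
no_orders = conn.execute("""
SELECT CUS.CustomerID
FROM Customers AS CUS
LEFT JOIN Orders AS ORD ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderID IS NULL
""").fetchall()

# In ON, the test matches no order (OrderID is never NULL in Orders),
# so every customer comes back once with NULL order columns.
wrong_place = conn.execute("""
SELECT CUS.CustomerID
FROM Customers AS CUS
LEFT JOIN Orders AS ORD
    ON CUS.CustomerID = ORD.CustomerID AND ORD.OrderID IS NULL
ORDER BY CUS.CustomerID
""").fetchall()

print(no_orders)    # [(2,)]
print(wrong_place)  # [(1,), (2,), (3,)]
```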
Most RDBMS products will optimize both queries identically. In "SQL Performance Tuning" by Peter Gulutzan and Trudy Pelzer, they tested multiple brands of RDBMS and found no performance difference.
I prefer to keep join conditions separate from query restriction conditions.
If you're using an OUTER JOIN, it's sometimes necessary to put conditions in the join clause.
WHERE will filter after the JOIN has occurred.
Filter on the JOIN to prevent rows from being added during the JOIN process.
I prefer the JOIN to join full tables/views and then use the WHERE clause to introduce the predicate on the resulting set.
It feels syntactically cleaner.
I typically see performance increases when filtering on the join, especially if you can join on indexed columns in both tables. You should be able to cut down on logical reads with most queries this way, too, which in a high-volume environment is a much better performance indicator than execution time.
I'm always mildly amused when someone shows their SQL benchmarking and they've executed both versions of a sproc 50,000 times at midnight on the dev server and compare the average times.
I agree with the second most-voted answer that it makes a big difference when using LEFT JOIN or RIGHT JOIN. In fact, the two statements below are equivalent, which shows that the AND condition in the ON clause filters before the join, while the WHERE clause filters after the join.
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN (SELECT * FROM dbo.Orders WHERE OrderDate >'20090515') AS ORD
ON CUS.CustomerID = ORD.CustomerID
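The equivalence can be checked in SQLite with invented data; customer 1 has two qualifying orders here, so it appears twice, while customers 2 and 3 each appear once with NULLs in both forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: customer 1 has two orders after the cut-off date.
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
INSERT INTO Customers VALUES (1), (2), (3);
INSERT INTO Orders VALUES (1, 1, '20090101'), (2, 1, '20090601'),
                          (3, 2, '20090301'), (4, 1, '20090701');
""")

# Date filter inside the ON clause.
on_clause = conn.execute("""
SELECT CUS.CustomerID, ORD.OrderID
FROM Customers AS CUS
LEFT JOIN Orders AS ORD
    ON CUS.CustomerID = ORD.CustomerID AND ORD.OrderDate > '20090515'
ORDER BY CUS.CustomerID, ORD.OrderID
""").fetchall()

# Orders filtered first in a derived table, then outer-joined.
prefiltered = conn.execute("""
SELECT CUS.CustomerID, ORD.OrderID
FROM Customers AS CUS
LEFT JOIN (SELECT * FROM Orders WHERE OrderDate > '20090515') AS ORD
    ON CUS.CustomerID = ORD.CustomerID
ORDER BY CUS.CustomerID, ORD.OrderID
""").fetchall()

print(on_clause)    # [(1, 2), (1, 4), (2, None), (3, None)]
print(prefiltered)  # same rows as on_clause
```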
In my opinion, putting the condition in the join is quicker when you have a larger table, though it really isn't much of a difference, especially if you are dealing with a smaller table. When I first learned about joins, I was told that conditions in joins are just like WHERE clause conditions and that I could use them interchangeably if the WHERE clause was specific about which table the condition applies to.
Putting the condition in the join seems "semantically wrong" to me, as that's not what JOINs are "for". But that's very qualitative.
Additional problem: if you decide to switch from an inner join to, say, a right join, having the condition be inside the JOIN could lead to unexpected results.
It is better to add the condition in the Join. Performance is more important than readability. For large datasets, it matters.

How to check if there are multiple values of the same attribute in the same object, SQL Server

I have three tables: Products, PRO_FAB and Fabrics. The Products table has Product_ID and Name, PRO_FAB has PRO_FAB_ID, Product_ID and Fabric_ID, and Fabrics has Fabric_ID and Name. They are related like this:
Products.Product_ID --- PRO_FAB.Product_ID, PRO_FAB.Fabric_ID --- Fabrics.Fabric_ID.
So it's basically an m:n relationship. I'm trying to write a query that lists the products that are made of all the fabrics in the Fabrics table. I've tried using HAVING, CONTAINS and other methods, but none seem to work, or at least I don't know how to use them. I appreciate your help.
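DECLARE @TotalFabrics int
SELECT @TotalFabrics = COUNT(*) FROM dbo.Fabrics
SELECT p.Name
FROM Products p
INNER JOIN PRO_FAB pf ON pf.Product_ID = p.Product_ID
INNER JOIN Fabrics f ON f.Fabric_ID = pf.Fabric_ID
GROUP BY p.Product_ID, p.Name -- group by the key so products sharing a name aren't merged
HAVING COUNT(DISTINCT f.Fabric_ID) = @TotalFabrics -- DISTINCT guards against duplicate PRO_FAB rows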
This should work if you're looking for products that use ALL fabrics. Would be interested to see what other approaches people will come up with.
You can also replace @TotalFabrics with a subquery if needed, but I think this is neater.
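A minimal SQLite sketch of the counting approach with invented rows. SQLite has no DECLARE, so it uses the subquery variant mentioned above, and it skips the join to Fabrics (assuming every PRO_FAB.Fabric_ID exists in Fabrics). Chair deliberately has a duplicated link row to show why COUNT(DISTINCT ...) is safer than a plain COUNT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: Sofa uses all three fabrics; Chair uses two, one linked twice.
conn.executescript("""
CREATE TABLE Products (Product_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Fabrics (Fabric_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE PRO_FAB (PRO_FAB_ID INTEGER PRIMARY KEY, Product_ID INTEGER, Fabric_ID INTEGER);
INSERT INTO Products VALUES (1, 'Sofa'), (2, 'Chair');
INSERT INTO Fabrics VALUES (1, 'Cotton'), (2, 'Linen'), (3, 'Wool');
INSERT INTO PRO_FAB (Product_ID, Fabric_ID)
    VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 1), (2, 2);
""")

QUERY = """
SELECT p.Name
FROM Products p
INNER JOIN PRO_FAB pf ON pf.Product_ID = p.Product_ID
GROUP BY p.Product_ID, p.Name
HAVING {count_expr} = (SELECT COUNT(*) FROM Fabrics)
ORDER BY p.Product_ID
"""

# Plain COUNT: Chair's duplicated link row makes its count 3, a false match.
naive = conn.execute(QUERY.format(count_expr="COUNT(pf.Fabric_ID)")).fetchall()
# COUNT(DISTINCT): Chair really uses only 2 distinct fabrics.
all_fabrics = conn.execute(QUERY.format(count_expr="COUNT(DISTINCT pf.Fabric_ID)")).fetchall()

print(naive)        # [('Sofa',), ('Chair',)]
print(all_fabrics)  # [('Sofa',)]
```

If PRO_FAB has a unique constraint on (Product_ID, Fabric_ID), the plain COUNT is already enough.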
As Dmitri correctly pointed out, the following query gets products with ANY fabrics, not ALL fabrics (I read the question too fast and didn't take the time to understand it).
SELECT Fabrics.Name as FabricName, Products.Name as ProductName
FROM Fabrics
INNER JOIN PRO_FAB pf on Fabrics.Fabric_ID = pf.Fabric_ID
INNER JOIN Products on Products.Product_ID = pf.Product_ID
ORDER BY FabricName, ProductName
Dmitri's solution is elegant and easy to understand (and I'm sure it performs better, so I've given it an up vote), but I thought of another solution that will work (I'm sure there are many more)
;with cte as
(
SELECT Products.Product_ID, Fabrics.Fabric_ID
FROM Fabrics CROSS JOIN Products
)
Select Products.Name
FROM Products
WHERE Product_ID
NOT IN(SELECT cte.Product_ID
FROM cte
LEFT OUTER JOIN PRO_FAB
ON cte.Fabric_ID = PRO_FAB.Fabric_ID
AND PRO_FAB.Product_ID = cte.Product_ID
WHERE PRO_FAB_ID IS NULL)
If your version of SQL doesn't support common table expressions, then you can use an inline subquery.
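The CROSS JOIN approach can be tried out in SQLite (which supports CTEs; the leading semicolon is a T-SQL habit and isn't needed here). The rows are invented: Sofa is linked to both fabrics, Chair only to Cotton.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: Sofa uses both fabrics, Chair only Cotton.
conn.executescript("""
CREATE TABLE Products (Product_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Fabrics (Fabric_ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE PRO_FAB (PRO_FAB_ID INTEGER PRIMARY KEY, Product_ID INTEGER, Fabric_ID INTEGER);
INSERT INTO Products VALUES (1, 'Sofa'), (2, 'Chair');
INSERT INTO Fabrics VALUES (1, 'Cotton'), (2, 'Linen');
INSERT INTO PRO_FAB (Product_ID, Fabric_ID) VALUES (1, 1), (1, 2), (2, 1);
""")

# cte lists every (product, fabric) pair that would have to exist; any pair
# without a PRO_FAB row puts that product in the NOT IN list.
rows = conn.execute("""
WITH cte AS (
    SELECT Products.Product_ID, Fabrics.Fabric_ID
    FROM Fabrics CROSS JOIN Products
)
SELECT Products.Name
FROM Products
WHERE Product_ID NOT IN (
    SELECT cte.Product_ID
    FROM cte
    LEFT OUTER JOIN PRO_FAB
        ON cte.Fabric_ID = PRO_FAB.Fabric_ID
       AND PRO_FAB.Product_ID = cte.Product_ID
    WHERE PRO_FAB.PRO_FAB_ID IS NULL
)
""").fetchall()

print(rows)  # [('Sofa',)]
```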
