I have three tables: customers, orders and refunds. Some customers (not all) placed orders and for some orders (not all) there were refunds.
When I join the three tables like this (the details are not that important):
SELECT ...
FROM customers LEFT JOIN orders
ON customers.customer_id=orders.customer_id
LEFT JOIN refunds
ON orders.order_id=refunds.order_id
-- WHERE orders.order_id IS NOT NULL; -- uncomment to filter out customers that have no orders
I get a big table that lists all customers (those that have not placed any orders have NULL in the 'order_id' column), together with all their orders and the orders' refunds (even though not all orders have refunds):
NAME      ORDER_ID   ORDER AMOUNT   REFUND
------------------------------------------------------------
Natalie   2          12.50          NULL
Natalie   3          18.00          18.00
Brenda    4          20.00          NULL
Adam      NULL       NULL           NULL
Since I want to see only customers that have placed orders, i.e. in this case I want to filter Adam out of the table, I uncomment the 'WHERE' row in the SQL query above.
This yields the desired result.
My question is:
On which table is the WHERE executed - on the original 'orders' table (which has no order_id that is NULL) or on the table that is result of the JOINs?
Apparently it is the latter, but I just want to make sure, since it is not very obvious from the SQL syntax and it is a very important point.
Thank you
In this case, you're making SQL work harder than it has to. The WHERE operates on the result of the JOINs, not on the original 'orders' table (physically the engine likely uses a merge join, or something along those lines, to build that result).
There's a chance the optimizer realizes what you're doing and changes the plan to an INNER JOIN for you. But I can't be certain (and neither can you: how it optimizes can change over time).
If you only want customers for which an order exists, use an INNER JOIN instead. It states the intent directly, and SQL will usually be at least as efficient at this.
SELECT ...
FROM customers
INNER JOIN orders
ON customers.customer_id=orders.customer_id
LEFT JOIN refunds
ON orders.order_id=refunds.order_id;
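As a rough check of the claim above, here is a minimal sketch using Python's sqlite3 module. The schema, the column names (name, amount, refund) and the customer ids are assumptions reconstructed from the result table in the question, not the asker's real tables. It compares the LEFT JOIN plus WHERE version with the INNER JOIN version.

```python
import sqlite3

# Hypothetical schema and rows, modelled on the result table in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE TABLE refunds (order_id INTEGER, refund REAL);
    INSERT INTO customers VALUES (1, 'Natalie'), (2, 'Brenda'), (3, 'Adam');
    INSERT INTO orders VALUES (2, 1, 12.50), (3, 1, 18.00), (4, 2, 20.00);
    INSERT INTO refunds VALUES (3, 18.00);
""")

select = """
    SELECT c.name, o.order_id, o.amount, r.refund
    FROM customers c
    {join} JOIN orders o ON c.customer_id = o.customer_id
    LEFT JOIN refunds r ON o.order_id = r.order_id
    {where}
    ORDER BY c.customer_id, o.order_id
"""

# LEFT JOIN first, then WHERE filters the joined rows (Adam's row has order_id NULL).
left_filtered = conn.execute(
    select.format(join="LEFT", where="WHERE o.order_id IS NOT NULL")
).fetchall()

# INNER JOIN never produces Adam's row in the first place.
inner = conn.execute(select.format(join="INNER", where="")).fetchall()

print(left_filtered == inner)
for row in inner:
    print(row)
```

Both queries return the same three rows here, which is consistent with the WHERE being evaluated against the joined rows rather than against the base orders table.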
You can change the first LEFT JOIN to an INNER JOIN to eliminate customers that don't have any orders:
SELECT ...
FROM customers INNER JOIN orders
ON customers.customer_id=orders.customer_id
LEFT JOIN refunds
ON orders.order_id=refunds.order_id;
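It's because you're using LEFT JOIN, which returns all rows from the left-hand table (in your case the customers table) and returns NULL where no corresponding values exist in the right-hand tables.
Just rewrite it using inner joins, so that only rows where matching data is found are returned. Note that joining refunds with an INNER JOIN as well also drops orders that have no refund; keep the LEFT JOIN on refunds if those orders should stay in the result.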
SELECT ...
FROM customers
INNER JOIN orders
ON customers.customer_id=orders.customer_id
INNER JOIN refunds
ON orders.order_id=refunds.order_id;
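A small sketch of what the all-inner-join query returns, again with sqlite3 and with a hypothetical schema and rows modelled on the result table in the question (the column names name, amount and refund are assumptions):

```python
import sqlite3

# Hypothetical schema and rows, modelled on the result table in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE TABLE refunds (order_id INTEGER, refund REAL);
    INSERT INTO customers VALUES (1, 'Natalie'), (2, 'Brenda'), (3, 'Adam');
    INSERT INTO orders VALUES (2, 1, 12.50), (3, 1, 18.00), (4, 2, 20.00);
    INSERT INTO refunds VALUES (3, 18.00);
""")

# Inner joins on both steps: a row survives only if it has an order AND a refund.
rows = conn.execute("""
    SELECT c.name, o.order_id, o.amount, r.refund
    FROM customers c
    INNER JOIN orders o ON c.customer_id = o.customer_id
    INNER JOIN refunds r ON o.order_id = r.order_id
""").fetchall()

print(rows)
```

With this data only Natalie's refunded order remains, so Brenda and Natalie's other order disappear along with Adam. Whether that is wanted depends on the report.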
I have 2 tables: one for customers, one for transactions. One customer does not have any transactions. How do I handle that? When I join my tables, the customer with no transactions does not show up, as shown in the code below.
SELECT Orders.Customer_Id, Customers.AcctOpenDate, Customers.CustomerFirstName, Customers.CustomerLastName, Orders.TxnDate, Orders.Amount
FROM Orders
INNER JOIN Customers ON Orders.Customer_Id=Customers.Customer_Id;
I need to be able to account for the customer with no transactions, for example when querying for the least transaction amount.
Use the updated query below. A RIGHT OUTER JOIN is used instead of an INNER JOIN to show all customers, regardless of whether the customer has placed an order yet. Customer_Id is taken from Customers, because Orders.Customer_Id is NULL for a customer without orders.
SELECT Customers.Customer_Id, Customers.AcctOpenDate,
Customers.CustomerFirstName, Customers.CustomerLastName,
Orders.TxnDate, Orders.Amount
FROM Orders
Right Outer JOIN Customers ON Orders.Customer_Id=Customers.Customer_Id;
INNER joins show only those records that are present in BOTH tables.
OUTER joins get SQL to list all the records present in the designated table and show NULLs for the fields of the other table where no matching record is present:
LEFT OUTER JOIN (all records of the first table)
RIGHT OUTER JOIN (all records of the second table)
FULL OUTER JOIN (all records of both tables)
Get up to speed on the join types and how to handle NULLs, and that is 90% of writing SQL scripts.
Below is the same query with a left join, using ISNULL to turn the amount column into 0 when no order is present (the two-argument ISNULL is T-SQL; COALESCE is the standard equivalent):
SELECT Customers.Customer_Id, Customers.AcctOpenDate, Customers.CustomerFirstName, Customers.CustomerLastName
, Orders.TxnDate, ISNULL(Orders.Amount,0) AS Amount
FROM Customers
LEFT OUTER JOIN Orders ON Orders.Customer_Id=Customers.Customer_Id;
Try this:
SELECT Orders.Customer_Id, Customers.AcctOpenDate, Customers.CustomerFirstName, Customers.CustomerLastName, Orders.TxnDate, Orders.Amount
FROM Orders
Right OUTER JOIN Customers ON Orders.Customer_Id=Customers.Customer_Id;
I strongly recommend LEFT JOIN. This keeps all rows in the first table, along with matching columns in the second. If there are no matching rows, these columns are NULL:
SELECT c.Customer_Id, c.AcctOpenDate, c.CustomerFirstName, c.CustomerLastName,
o.TxnDate, o.Amount
FROM Customers c LEFT JOIN
Orders o
ON o.Customer_Id = c.Customer_Id;
Although you could use RIGHT JOIN, I never use RIGHT JOINs, because I find them much harder to follow. The logic of "keep all rows in the first table I read" is relatively simple. The logic of "I don't know which rows I'm keeping until I read the last table" is harder to follow.
Also note that I included table aliases and changed Customer_Id to come from Customers, the table where you are keeping all rows.
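To illustrate how the LEFT JOIN plays out for the "least transaction amount" case from the question, here is a minimal sqlite3 sketch. The trimmed-down schema and the two customers are made up for the example. MIN ignores NULLs, so a customer without orders gets NULL unless it is wrapped in COALESCE.

```python
import sqlite3

# Hypothetical, trimmed-down versions of the Customers and Orders tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Customer_Id INTEGER PRIMARY KEY, CustomerFirstName TEXT);
    CREATE TABLE Orders (Customer_Id INTEGER, Amount REAL);
    INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');   -- Bob has no orders
    INSERT INTO Orders VALUES (1, 50.0), (1, 20.0);
""")

# LEFT JOIN keeps Bob; his Amount is NULL, so MIN yields NULL for him.
# COALESCE turns that NULL into 0 if a number is required.
rows = conn.execute("""
    SELECT c.Customer_Id,
           MIN(o.Amount)              AS MinAmount,
           COALESCE(MIN(o.Amount), 0) AS MinAmountOrZero
    FROM Customers c
    LEFT JOIN Orders o ON o.Customer_Id = c.Customer_Id
    GROUP BY c.Customer_Id
    ORDER BY c.Customer_Id
""").fetchall()

for row in rows:
    print(row)
```

Whether NULL or 0 is the better answer for a customer without transactions is a reporting decision; NULL at least keeps "no transactions" distinguishable from a real amount of 0.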
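The CASE maps rows without a matching transaction (where t.ID is NULL) to 0 and every matched row to 1, so the SUM gives the number of transactions per customer, with 0 for customers that have none. COUNT(t.ID) gives the same result, because COUNT ignores NULLs.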
SELECT c.Name,
SUM(CASE WHEN t.ID IS NULL THEN 0 ELSE 1 END) as TransactionsPerCustomer
FROM Customers c
LEFT JOIN Transactions t
ON c.Name = t.customerID
group by c.Name
SELECT c.Name,
SUM(CASE WHEN t.ID IS NULL THEN 0 ELSE 1 END) as numberoftransaction
FROM customers c
LEFT JOIN transactions t
ON c.Name = t.customerID
group by c.Name
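The counting approach can be tried out with a small sqlite3 sketch. The schema here is an assumption: customers are joined on a numeric ID column rather than on Name, and the two customers are invented for the example.

```python
import sqlite3

# Hypothetical schema: customers(ID, Name) and transactions(ID, customerID).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE transactions (ID INTEGER PRIMARY KEY, customerID INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');   -- Bob has no transactions
    INSERT INTO transactions VALUES (10, 1), (11, 1);
""")

# SUM(CASE ...) and COUNT(t.ID) both skip the NULL-extended row of the LEFT JOIN.
rows = conn.execute("""
    SELECT c.Name,
           SUM(CASE WHEN t.ID IS NULL THEN 0 ELSE 1 END) AS ViaCase,
           COUNT(t.ID)                                   AS ViaCount
    FROM customers c
    LEFT JOIN transactions t ON c.ID = t.customerID
    GROUP BY c.Name
    ORDER BY c.Name
""").fetchall()

for row in rows:
    print(row)
```

Note that COUNT(*) would report 1 for Bob, because the LEFT JOIN still produces one row for him.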
I have two tables, CustomerCost and Products that look like the following:
I am joining the two tables using the following SQL query:
SELECT custCost.ProductId,
custCost.CustomerCost
FROM CUSTOMERCOST Cost
LEFT JOIN PRODUCTS prod ON Cost.productId =prod.productId
WHERE prod.productId=4
AND (Cost.Customer_Id =2717
OR Cost.Customer_Id IS NULL)
The result of the join is:
[image: joins result]
What I want is this: when I pass customerId 2717, the query should return only that customer's specific cost, i.e. 258.93; only when no row matches the customerId should it take the cost 312.50.
What am I doing wrong here?
You can get your expected output as follows:
SELECT Cost.ProductId,
Cost.CustomerCost
FROM CUSTOMERCOST Cost
INNER JOIN PRODUCTS prod ON Cost.productId = prod.productId
WHERE prod.productId=4
AND Cost.Customer_Id = 2717
However, if you want to allow customer ID to be passed as NULL, you will have to change the last line to AND Cost.Customer_Id IS NULL. To do so dynamically, you'll need to use variables and generate the query based on the input.
The problem in the original query that you have posted is that you have used an alias called custCost which is not present in the query.
EDIT: Actually, you don't even need a join. The CUSTOMERCOST table seems to have both Customer and Product IDs.
You can simply write:
SELECT
Cost.ProductId, Cost.CustomerCost
FROM
CUSTOMERCOST Cost
WHERE
Cost.Customer_Id = 2717
AND Cost.productId = 4
You seem to want:
SELECT c.*
FROM CUSTOMERCOST c
WHERE c.productId = 4 AND c.Customer_Id = 2717
UNION ALL
SELECT c.*
FROM CUSTOMERCOST c
WHERE c.productId = 4 AND c.Customer_Id IS NULL AND
NOT EXISTS (SELECT 1 FROM CUSTOMERCOST c2 WHERE c2.productId = 4 AND c2.Customer_Id = 2717);
That is, take the matching cost, if it exists for the customer. Otherwise, take the default cost.
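A minimal sqlite3 sketch of this fallback, assuming CUSTOMERCOST holds one customer-specific row and one default row (Customer_Id NULL) for product 4, as the costs in the question suggest. The helper cost_for and the second customer id are hypothetical.

```python
import sqlite3

# Hypothetical contents of CUSTOMERCOST, based on the two costs in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERCOST (productId INTEGER, Customer_Id INTEGER, CustomerCost REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERCOST VALUES (?, ?, ?)",
    [(4, 2717, 258.93), (4, None, 312.50)],  # None is stored as NULL: the default cost
)

QUERY = """
    SELECT c.CustomerCost
    FROM CUSTOMERCOST c
    WHERE c.productId = :pid AND c.Customer_Id = :cid
    UNION ALL
    SELECT c.CustomerCost
    FROM CUSTOMERCOST c
    WHERE c.productId = :pid AND c.Customer_Id IS NULL AND
          NOT EXISTS (SELECT 1 FROM CUSTOMERCOST c2
                      WHERE c2.productId = :pid AND c2.Customer_Id = :cid)
"""

def cost_for(product_id, customer_id):
    """Return the customer-specific cost rows, or the default rows if none exist."""
    return conn.execute(QUERY, {"pid": product_id, "cid": customer_id}).fetchall()

print(cost_for(4, 2717))  # customer-specific cost
print(cost_for(4, 1234))  # unknown customer falls back to the default cost
```

Binding the ids as parameters also covers the "dynamic" case without generating the query text from the input.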
SELECT Cost.ProductId,
Cost.CustomerCost
FROM CUSTOMERCOST Cost
LEFT JOIN PRODUCTS prod
ON Cost.productId =prod.productId
AND (Cost.Customer_Id =2717 OR Cost.Customer_Id IS NULL)
WHERE prod.productId=4
WHERE applies to the joined row. ON controls the join condition.
Outer joins are why JOIN and ON were added to the FROM clause in SQL-92. The old SQL-89 syntax had no support for them, and different vendors added different, incompatible syntax to support them.
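The difference between ON and WHERE shows up as soon as the join is an outer join. The following sqlite3 sketch uses a made-up customers/orders pair (not the tables from the question) and puts the same condition first in ON, then in WHERE.

```python
import sqlite3

# Hypothetical tables: each customer has one order, only Ann's exceeds 10.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 30.0), (2, 2, 5.0);
""")

# Condition in ON: it only decides which orders match; every customer is kept.
in_on = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.customer_id AND o.amount > 10
    ORDER BY c.customer_id
""").fetchall()

# Condition in WHERE: it is applied to the joined rows, so Bob's row is removed.
in_where = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.amount > 10
    ORDER BY c.customer_id
""").fetchall()

print(in_on)
print(in_where)
```

In the ON version Bob stays in the result with a NULL order_id; in the WHERE version he is filtered out after the join.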
Suppose I have two tables. One is customers and the other is orders. Orders has a foreign key that joins to the customers table. How should I go about returning data from both tables:
Filtered on a field in the orders table and,
Filtered on a field in the customers table?
Is using WHERE to filter after the JOIN in my SELECT statement the correct way to go, or should I put an AND within the JOIN? And would I have to use one method for one of the above situations and the other method for the other one?
For example,
SELECT customers.customer_type, orders.grant_date
FROM orders
JOIN customers ON customers.customer_id = orders.customer_id
WHERE orders.order_id = 3;
or
SELECT customers.customer_type, orders.grant_date
FROM orders
JOIN customers ON customers.customer_id = orders.customer_id
AND orders.order_id = 3;
I guess I can summarize my questions as:
a. Which table should I pair with my FROM statement? Should it be the one which has the foreign key i.e. orders? Or does it depend on the situation?
b. How should I filter the data? With a WHERE or an AND with the JOIN? And how is one different from the other i.e. when should I use one over the other in my two situations?
It doesn't matter whether you do
FROM orders
JOIN customers ON customers.customer_id = orders.customer_id
or
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
Which table you make the first table is up to you. Here I would make orders the first table, because it is orders along with their customer information you are showing in your results.
With an inner join it also makes no difference whether you put criteria in the WHERE clause or in the ON clause.
However, it looks strange to join customers on a condition on orders:
JOIN customers ON customers.customer_id = orders.customer_id AND orders.order_id = 3
This is not how an ON clause is supposed to work. So either:
FROM orders
JOIN customers ON customers.customer_id = orders.customer_id
WHERE orders.order_id = 3
or
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
WHERE orders.order_id = 3
or
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id AND orders.order_id = 3
Many people prefer the last query over the second-to-last, because it can easily be converted into an outer join: a condition on orders in WHERE would discard the NULL-extended rows again. So the general advice is: put criteria on the first table in WHERE and the criteria on other tables in ON. Make this a rule of thumb.
In my experience with SQL, when using an inner join (a.k.a. join), the order of the tables after the FROM clause has never made a difference in the results. Also, it has not made any difference whether I used a 'where' clause or an 'and' clause after the join. But this behavior is limited to inner joins ONLY. If you are using left or right joins, the order of the tables as well as the choice of Where/And makes a lot of difference in the results returned.
I'd like to add that rows with NULL in the join key (which should never be the case) never satisfy an equality join condition, so an inner join drops them regardless of whether you use Where or And and regardless of the sequence of tables after from.
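For inner joins this can be checked quickly. The sketch below uses sqlite3 with invented rows for the customers and orders tables from the question (customer_type and grant_date values are made up, and one order deliberately has a NULL customer_id). It runs all four combinations of table order and WHERE/AND.

```python
import sqlite3

# Hypothetical rows; order 5 has no customer (NULL foreign key).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_type TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, grant_date TEXT);
    INSERT INTO customers VALUES (1, 'retail'), (2, 'wholesale');
    INSERT INTO orders VALUES (3, 1, '2020-01-01'), (4, 2, '2020-02-01'), (5, NULL, '2020-03-01');
""")

select = "SELECT customers.customer_type, orders.grant_date "
on = "ON customers.customer_id = orders.customer_id "
variants = [
    select + "FROM orders JOIN customers " + on + "WHERE orders.order_id = 3",
    select + "FROM orders JOIN customers " + on + "AND orders.order_id = 3",
    select + "FROM customers JOIN orders " + on + "WHERE orders.order_id = 3",
    select + "FROM customers JOIN orders " + on + "AND orders.order_id = 3",
]
results = [conn.execute(sql).fetchall() for sql in variants]

# The order with a NULL customer_id matches no customer in an inner join.
matched = conn.execute(
    "SELECT COUNT(*) FROM orders JOIN customers " + on
).fetchone()[0]

print(results)
print(matched)
```

All four variants return the same single row, and only two of the three orders take part in the join at all.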
I have a select statement with 5 ID columns. I need to look up and select the corresponding customer names from a Customer master table that stores ids/names and come up with a Customer report. The table's columns are as below:
origCustomerID,Tier1PartnerID,Tier2PartnerID,DistributorId,EndCustomerID,productId,OrderTotal,OrderDate
The first 5 columns are ID columns that match the CustID column in the Customers table. Note that NOT all of these columns will contain a value for a given record at all times, i.e. they could be null at times. Given the current constraints in HiveQL, I can only think of the following way, but this takes up a lot of time and is not the best possible way. Could you please suggest any improvements to this?
Select origCustomerID,a.name,Tier1PartnerID,b.name,Tier2PartnerID,
c.name,DistributorId,d.name,EndCustomerID,e.name,productId,OrderTotal,OrderDate
From Orders O
LEFT OUTER JOIN customers a on o.origCustomerID = a.custid
LEFT OUTER JOIN customers b on o.Tier1PartnerID = b.custid
LEFT OUTER JOIN customers c on o.Tier2PartnerID = c.custid
LEFT OUTER JOIN customers d on o.DistributorId = d.custid
LEFT OUTER JOIN customers e on o.EndCustomerID = e.custid
If the id values are always either customer ids or NULL (i.e. when they are not NULL you are sure they are customer ids and not something else) and each record in the Orders table matches at most one customer (i.e. every record has at most one id in those five columns, or possibly the same id several times), you could perhaps use COALESCE in your matching expression.
I can't test this at the moment, but this should join the records using the first non-NULL id from the Orders table.
SELECT [stuff]
FROM Orders O
LEFT OUTER JOIN customers a
ON COALESCE(o.origCustomerID,
o.Tier1PartnerID,
o.Tier2PartnerID,
o.DistributorId,
o.EndCustomerID) = a.custid
Hope that helps.
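Since the query above is untested, here is a small sqlite3 sketch of the same idea (sqlite3 standing in for Hive, with invented customers and orders). It only applies under the assumption stated above, namely that each order carries at most one distinct customer id across the five columns.

```python
import sqlite3

# Hypothetical data: each order has at most one non-NULL id column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (custid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Orders (origCustomerID INTEGER, Tier1PartnerID INTEGER,
                         Tier2PartnerID INTEGER, DistributorId INTEGER,
                         EndCustomerID INTEGER, OrderTotal REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
""")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        (None, None, None, None, None, 10.0),  # no id at all
        (1, None, None, None, None, 50.0),     # id in the first column
        (None, 2, None, None, None, 100.0),    # id in the second column
    ],
)

# A single join on the first non-NULL id instead of five joins.
rows = conn.execute("""
    SELECT o.OrderTotal, a.name
    FROM Orders o
    LEFT OUTER JOIN customers a
      ON COALESCE(o.origCustomerID, o.Tier1PartnerID, o.Tier2PartnerID,
                  o.DistributorId, o.EndCustomerID) = a.custid
    ORDER BY o.OrderTotal
""").fetchall()

for row in rows:
    print(row)
```

This yields one name per order. If a record can hold several different ids and the report needs a name for each of them, the five separate joins are still required.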
I'm a bit of a beginner with SQL so apologies if this seems trivial/basic. I'm struggling to get my head around it...
I am trying to generate results that show all customers that are in the customer table that have never placed an order and will therefore have no entry on the invoice table.
In other words, I want to select all customers from the customer table where there is no entry for their customer number in the invoice table.
Many thanks,
Mike
SELECT *
FROM customer c
WHERE NOT EXISTS (
SELECT 1
FROM invoice i
WHERE i.customerid = c.customerid
)
I would suggest you also read Oracle's documentation on the different types of table joins.
If customer_id is the column that identifies the customer, you should do something like this. Be aware that NOT IN returns no rows at all if the subquery yields a NULL customer_id.
select * from Customer
where customer_id not in (select customer_id from invoice)
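One caveat with NOT IN is worth a quick demonstration: if the subquery returns a NULL, the comparison is never true and the query returns nothing. The sqlite3 sketch below uses made-up rows (one invoice with a NULL customer_id) and compares NOT IN with NOT EXISTS.

```python
import sqlite3

# Hypothetical rows: customer 1 has an invoice, and one invoice has no customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO Customer VALUES (1), (2), (3);
    INSERT INTO invoice VALUES (100, 1), (101, NULL);
""")

# "x NOT IN (1, NULL)" evaluates to false or unknown, never true.
not_in = conn.execute("""
    SELECT customer_id FROM Customer
    WHERE customer_id NOT IN (SELECT customer_id FROM invoice)
    ORDER BY customer_id
""").fetchall()

# NOT EXISTS is unaffected by the NULL.
not_exists = conn.execute("""
    SELECT c.customer_id FROM Customer c
    WHERE NOT EXISTS (SELECT 1 FROM invoice i WHERE i.customer_id = c.customer_id)
    ORDER BY c.customer_id
""").fetchall()

print(not_in)
print(not_exists)
```

If invoice.customer_id is declared NOT NULL, both forms return the same rows; otherwise adding "WHERE customer_id IS NOT NULL" to the subquery is one way to keep NOT IN safe.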
If you want to return all customer rows, then you will want to use a LEFT JOIN and keep only the rows that found no match:
select *
from customer c
left join invoices i
on c.customerid = i.customerid
where i.customerid is null
See SQL Fiddle with Demo
If you need help learning JOIN syntax, then here is a great visual explanation of joins.
A LEFT JOIN will return all rows from the customer table even if there is not a matching row in the invoices table. If you wanted to return only the rows that matched in both tables, then you would use an INNER JOIN. By adding the where i.customerid is null to the query it will return only those rows with no match in invoices.
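The LEFT JOIN with IS NULL pattern can be compared against NOT EXISTS in a short sqlite3 sketch. The three customers and their invoices are invented for the example, and the name and invoiceid columns are assumptions.

```python
import sqlite3

# Hypothetical rows: Ann has two invoices, Cid has one, Bob has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customerid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (invoiceid INTEGER PRIMARY KEY, customerid INTEGER);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO invoices VALUES (100, 1), (101, 1), (102, 3);
""")

# Anti-join: keep only the NULL-extended rows, i.e. customers without a match.
via_left_join = conn.execute("""
    SELECT c.customerid, c.name
    FROM customer c
    LEFT JOIN invoices i ON c.customerid = i.customerid
    WHERE i.customerid IS NULL
""").fetchall()

# The same question asked with NOT EXISTS.
via_not_exists = conn.execute("""
    SELECT c.customerid, c.name
    FROM customer c
    WHERE NOT EXISTS (SELECT 1 FROM invoices i WHERE i.customerid = c.customerid)
""").fetchall()

print(via_left_join)
print(via_not_exists)
```

Selecting only the customer columns, rather than *, avoids carrying along the invoice columns, which are all NULL in this result anyway.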