SQL JOIN: get all records that do not match certain criteria

I have two tables, Order and Invoice.
An order can have multiple invoices. Each invoice record has a state: paid or unpaid.
Order    Invoice
O-123    i1 (paid)
O-123    i2 (unpaid)
O-123    i3 (unpaid)
O-456    i4 (paid)
O-456    i4 (paid)
O-678    i5 (paid)
O-678    i6 (paid)
I need a list of all orders that have no unpaid invoices. In this case it should return O-456 and O-678.
Sample query (it also returns O-123, because invoice i1 is paid):
select * from core.order as o
inner join
invoices as inv
on o.id = inv.order_id
where inv.status = 'paid'

You can use NOT EXISTS for that (assuming the status column is a varchar):
select * from core.order as o
where not exists
(
select 1 from invoices as inv where status='unpaid' and o.id=inv.order_id
)
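To check this answer against the sample data from the question, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in database. The table is named orders (instead of core.order, since SQLite has no core schema) and the invoice rows are copied from the question; both choices are assumptions for illustration.

```python
import sqlite3


def unpaid_free_orders(conn):
    """Return ids of orders that have no unpaid invoice, using NOT EXISTS."""
    rows = conn.execute(
        """
        SELECT o.id
        FROM orders AS o
        WHERE NOT EXISTS (
            SELECT 1 FROM invoices AS inv
            WHERE inv.status = 'unpaid' AND inv.order_id = o.id
        )
        ORDER BY o.id
        """
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id TEXT PRIMARY KEY);
    CREATE TABLE invoices (id TEXT, order_id TEXT, status TEXT);
    INSERT INTO orders VALUES ('O-123'), ('O-456'), ('O-678');
    INSERT INTO invoices VALUES
        ('i1', 'O-123', 'paid'), ('i2', 'O-123', 'unpaid'), ('i3', 'O-123', 'unpaid'),
        ('i4', 'O-456', 'paid'), ('i4', 'O-456', 'paid'),
        ('i5', 'O-678', 'paid'), ('i6', 'O-678', 'paid');
    """
)

result = unpaid_free_orders(conn)
print(result)  # ['O-456', 'O-678']
```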

One canonical approach uses aggregation:
SELECT o.id
FROM core.order o
LEFT JOIN invoices inv
ON inv.order_id = o.id
GROUP BY o.id
HAVING COUNT(CASE WHEN inv.status = 'unpaid' THEN 1 END) = 0;
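A minimal sqlite3 sketch contrasting the question's inner-join attempt with this aggregation. It assumes simplified table names, adds DISTINCT to the attempt so it shows one row per order, and adds a hypothetical order O-999 with no invoices: because of the LEFT JOIN, that order also counts as having no unpaid invoice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id TEXT PRIMARY KEY);
    CREATE TABLE invoices (id TEXT, order_id TEXT, status TEXT);
    INSERT INTO orders VALUES ('O-123'), ('O-456'), ('O-678'), ('O-999');
    INSERT INTO invoices VALUES
        ('i1', 'O-123', 'paid'), ('i2', 'O-123', 'unpaid'), ('i3', 'O-123', 'unpaid'),
        ('i4', 'O-456', 'paid'),
        ('i5', 'O-678', 'paid'), ('i6', 'O-678', 'paid');
    """
)

# The question's attempt: filtering the join on 'paid' keeps O-123, because i1 is paid.
attempt = [row[0] for row in conn.execute(
    """
    SELECT DISTINCT o.id
    FROM orders AS o JOIN invoices AS inv ON o.id = inv.order_id
    WHERE inv.status = 'paid'
    ORDER BY o.id
    """
)]

# Aggregation: count unpaid invoices per order and keep orders where the count is 0.
aggregated = [row[0] for row in conn.execute(
    """
    SELECT o.id
    FROM orders AS o LEFT JOIN invoices AS inv ON inv.order_id = o.id
    GROUP BY o.id
    HAVING COUNT(CASE WHEN inv.status = 'unpaid' THEN 1 END) = 0
    ORDER BY o.id
    """
)]

print(attempt)     # ['O-123', 'O-456', 'O-678']
print(aggregated)  # ['O-456', 'O-678', 'O-999']
```

If orders without invoices should be excluded, an inner JOIN instead of the LEFT JOIN drops them.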

One method is using NOT EXISTS:
select *
from core.order o
where not exists (
select 1
from invoices as inv
where o.id = inv.order_id and inv.status = 'unpaid'
)

Related

Return Records Based on Condition Comparing Aggregates of Two Child Tables

I have an invoices table that has many payments and has many invoice line items. The total of an invoice record is derived through invoice line items (quantity * price).
I'm trying to create a query that will return invoices with an outstanding balance.
SELECT inv.id, MAX(inv.invoice_total) as "InvoiceTotal", SUM(pt.amount) AS "TotalPayments" FROM (
SELECT
i.id,
SUM( ili.price * ili.quantity ) as "invoice_total"
FROM
invoices i
JOIN invoice_line_items ili
on i.id = ili.invoice_id
GROUP BY i.id
) inv
LEFT JOIN payment_transactions pt
ON pt.invoice_id = inv.id
GROUP BY inv.id
ORDER BY inv.id DESC
The last piece, I figured, was to add a HAVING clause that returns only records where TotalPayments is less than InvoiceTotal, but it's not working.
How can I achieve this? Is there a simpler approach than mine?
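This should work. The COALESCE() matters: for an invoice with no payments, SUM(pt.amount) is NULL, and a comparison with NULL is never true, so without it those invoices would be dropped: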
SELECT inv.id, MAX(inv.invoice_total) as InvoiceTotal, SUM(pt.amount) AS "TotalPayments"
FROM (SELECT i.id, SUM( ili.price * ili.quantity ) as invoice_total
FROM invoices i JOIN
invoice_line_items ili
ON i.id = ili.invoice_id
GROUP BY i.id
) inv LEFT JOIN
payment_transactions pt
ON pt.invoice_id = inv.id
GROUP BY inv.id
HAVING COALESCE(SUM(pt.amount), 0) < MAX(inv.invoice_total)
ORDER BY inv.id DESC;
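A small sqlite3 sketch of the effect of COALESCE() in the HAVING clause, using hypothetical data: invoice 1 is fully paid, invoice 2 is partly paid, and invoice 3 has no payments at all. Only the version with COALESCE() keeps invoice 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE invoices (id INTEGER PRIMARY KEY);
    CREATE TABLE invoice_line_items (invoice_id INTEGER, price REAL, quantity INTEGER);
    CREATE TABLE payment_transactions (invoice_id INTEGER, amount REAL);
    INSERT INTO invoices VALUES (1), (2), (3);
    -- Totals: invoice 1 = 100, invoice 2 = 50, invoice 3 = 30.
    INSERT INTO invoice_line_items VALUES (1, 25.0, 4), (2, 10.0, 5), (3, 30.0, 1);
    -- Invoice 1 is paid in full, invoice 2 partly, invoice 3 not at all.
    INSERT INTO payment_transactions VALUES (1, 60.0), (1, 40.0), (2, 20.0);
    """
)

QUERY = """
SELECT inv.id, MAX(inv.invoice_total) AS InvoiceTotal, SUM(pt.amount) AS TotalPayments
FROM (SELECT i.id, SUM(ili.price * ili.quantity) AS invoice_total
      FROM invoices i JOIN invoice_line_items ili ON i.id = ili.invoice_id
      GROUP BY i.id) inv
LEFT JOIN payment_transactions pt ON pt.invoice_id = inv.id
GROUP BY inv.id
HAVING {condition}
ORDER BY inv.id DESC
"""

with_coalesce = conn.execute(
    QUERY.format(condition="COALESCE(SUM(pt.amount), 0) < MAX(inv.invoice_total)")
).fetchall()
# Without COALESCE, SUM() over no payment rows is NULL and NULL < 30 is not true.
without_coalesce = conn.execute(
    QUERY.format(condition="SUM(pt.amount) < MAX(inv.invoice_total)")
).fetchall()

print([row[0] for row in with_coalesce])     # [3, 2]
print([row[0] for row in without_coalesce])  # [2]
```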

Only return rows after the exists condition

I have a SQL Server query.
It returns customers' orders for products that are related to (added to) their OrderID (final invoice).
It uses an EXISTS condition:
Select * from Orders o1
where DepartmentSpecialty = 'LivingRoom'
and Exists (SELECT o2.Department FROM Orders o2 WHERE o2.Department = 'Kitchen'
and o1.ID = o2.ID
and o1.OrderID = o2.OrderID
)
I only want to bring back the Living Room department rows whose order dates are AFTER the customer ordered from the Kitchen department under the same OrderID.
Any ideas on how I can amend the SQL to do this?
You can do this without a self-join or a correlated subquery, by using window functions:
select o.*
from (select o.*,
min(case when o.Department = 'Kitchen'
then date
end) over (partition by id, orderid) as kitchen_date
from orders o
) o
where o.DepartmentSpecialty = 'LivingRoom' and
o.date > o.kitchen_date;
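A sketch of the window-function answer on hypothetical data, run through sqlite3 (window functions need SQLite 3.25 or newer, which recent Python builds typically bundle). The date column is renamed order_date and a row_id column is added for readability; both are assumptions. Row 2 is a Living Room order placed before the Kitchen order, row 3 one placed after it, and row 4 belongs to an order with no Kitchen purchase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (row_id INTEGER, id INTEGER, orderid INTEGER,
                         Department TEXT, DepartmentSpecialty TEXT, order_date TEXT);
    INSERT INTO orders VALUES
        (1, 1, 10, 'Kitchen',    'Kitchen',    '2024-01-05'),
        (2, 1, 10, 'LivingRoom', 'LivingRoom', '2024-01-03'),
        (3, 1, 10, 'LivingRoom', 'LivingRoom', '2024-01-09'),
        (4, 2, 20, 'LivingRoom', 'LivingRoom', '2024-02-01');
    """
)

rows = conn.execute(
    """
    SELECT o.row_id
    FROM (SELECT o.*,
                 -- Earliest Kitchen date within the same (id, orderid) group.
                 MIN(CASE WHEN o.Department = 'Kitchen' THEN o.order_date END)
                     OVER (PARTITION BY id, orderid) AS kitchen_date
          FROM orders o) o
    WHERE o.DepartmentSpecialty = 'LivingRoom'
      AND o.order_date > o.kitchen_date  -- NULL kitchen_date never passes
    ORDER BY o.row_id
    """
).fetchall()

after_kitchen = [row[0] for row in rows]
print(after_kitchen)  # [3]
```

ISO-formatted date strings compare correctly as text here; with real date columns the comparison works the same way.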
I suspect that your original join conditions (and hence the partition by columns) are too restrictive. I would expect these to be some sort of customer id.
Something a bit like this, perhaps:
Select * from Orders o1
where DepartmentSpecialty = 'LivingRoom'
and o1.orderdate >= (SELECT MIN(o2.orderdate) FROM Orders o2 WHERE o2.Department = 'Kitchen'
and o1.ID = o2.ID
and o1.OrderID = o2.OrderID)
This should do the trick:
SELECT o1.*
FROM Orders AS o1
INNER JOIN Orders AS o2
ON o1.OrderId = o2.OrderId
AND o1.[SomeDateField] > o2.[SomeDateField] -- You will need to specify a date field
WHERE o1.DepartmentSpecialty = 'LivingRoom' -- Department you're interested in
AND o2.DepartmentSpecialty = 'Kitchen' -- Initial department

Join SQL Statements optimization

I have 2 tables:
Customer:
- ID
- Customer_ID
- Name
- Sir_Name
- Phone
- Email
and Invoice:
- Manager_Name
- Manaer_First_Name
- Customer_ID1
- Customer_ID2
- Customer_ID3
Each customer has exactly one Customer.Customer_ID, or none at all.
In Invoice.Customer_ID1, the same Customer.Customer_ID appears several times.
I'd like to get all records in the Customer table joined to the Invoice table: check whether Customer_ID = Customer_ID1; if not, check Customer_ID = Customer_ID2, or Customer_ID = Customer_ID3.
If the Customer_ID is found in one of these columns, stop the search.
Probably the best way to write the query is:
select . . .
from customer c join
invoice i
on c.customer_id = coalesce(i.customer_id1, i.customer_id2, i.customer_id3);
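Note that COALESCE() picks the first non-NULL ID column, not the first column that matches a customer. A small sqlite3 sketch with hypothetical data (and a hypothetical invoice_no column) makes that visible: invoice C has a Customer_ID1 that matches no customer, so it is not joined to C1 even though its Customer_ID2 is C1. The approach therefore assumes unused ID columns are NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id TEXT, name TEXT);
    CREATE TABLE invoice (invoice_no TEXT, customer_id1 TEXT,
                          customer_id2 TEXT, customer_id3 TEXT);
    INSERT INTO customer VALUES ('C1', 'Alice'), ('C2', 'Bob');
    INSERT INTO invoice VALUES
        ('A', 'C1', NULL, NULL),  -- matched through customer_id1
        ('B', NULL, 'C2', NULL),  -- customer_id1 is NULL, so customer_id2 is used
        ('C', 'C9', 'C1', NULL);  -- 'C9' is non-NULL, so customer_id2 is never tried
    """
)

matched = conn.execute(
    """
    SELECT i.invoice_no, c.name
    FROM customer c JOIN invoice i
      ON c.customer_id = COALESCE(i.customer_id1, i.customer_id2, i.customer_id3)
    ORDER BY i.invoice_no
    """
).fetchall()

print(matched)  # [('A', 'Alice'), ('B', 'Bob')]
```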
This should be able to take advantage of an index on customer(customer_id). If it is not efficient, another alternative is a LEFT JOIN:
select . . ., coalesce(c1.col1, c2.col1, c3.col1) as col1, . . .
from invoice i left join
customer c1
on c1.customer_id = i.customer_id1 left join
customer c2
on c2.customer_id = i.customer_id2 left join
customer c3
on c3.customer_id = i.customer_id3;
The left join can take advantage of an index on customer(customer_id). You need to use coalesce() in the select to choose the field from the right table.
select
*
from [Table Invoice] A
JOIN [Customer] B
ON B.Customer_ID = A.Customer_ID1 OR (B.Customer_ID <> A.Customer_ID1 AND B.Customer_ID = A.Customer_ID2) OR (B.Customer_ID = A.Customer_ID3 AND B.Customer_ID <> A.Customer_ID2 AND B.Customer_ID <> A.Customer_ID1)
This returns all invoices for all customers. If you need invoices for just one customer, add a
WHERE B.Customer_ID = #YourCustomerID
clause. If you need only the first invoice, add TOP 1 to the SELECT statement:
SELECT TOP 1
You could use an inner join with an OR clause:
select Customer.*, Invoice.*
from Customer
inner join Invoice on ( Customer.Customer_ID = Invoice.Customer_ID1
OR Customer.Customer_ID = Invoice.Customer_ID2
OR Customer.Customer_ID = Invoice.Customer_ID3)
This is how I understand your request: you want all customers that have at least one entry in the invoice table, but per customer you want only the "best" invoice record, where an ID1 match is better than an ID2 match and an ID2 match is better than an ID3 match.
So join the tables to get all matches, then rank the matches with ROW_NUMBER so that the best match gets number 1. Then keep only the rows ranked 1.
select *
from
(
select
c.*,
i.*,
row_number() over
(
partition by c.customer_id order by
case c.customer_id
when i.customer_id1 then 1
when i.customer_id2 then 2
when i.customer_id3 then 3
end
) as rn
from customer c
join invoice i on c.customer_id in (i.customer_id1, i.customer_id2, i.customer_id3)
) ranked
where rn = 1;
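A sqlite3 sketch of the ROW_NUMBER ranking on hypothetical data (window functions need SQLite 3.25 or newer). It selects only c.customer_id and a hypothetical invoice_no column instead of c.* and i.*, to keep the output short. C1 appears in invoice X as Customer_ID2 and in invoice Y as Customer_ID1, so Y wins; C2 appears in X as Customer_ID1 and in Z as Customer_ID3, so X wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id TEXT);
    CREATE TABLE invoice (invoice_no TEXT, customer_id1 TEXT,
                          customer_id2 TEXT, customer_id3 TEXT);
    INSERT INTO customer VALUES ('C1'), ('C2');
    INSERT INTO invoice VALUES
        ('X', 'C2', 'C1', NULL),
        ('Y', 'C1', NULL, NULL),
        ('Z', NULL, NULL, 'C2');
    """
)

best = conn.execute(
    """
    SELECT customer_id, invoice_no
    FROM (
        SELECT c.customer_id, i.invoice_no,
               ROW_NUMBER() OVER (
                   PARTITION BY c.customer_id
                   -- Rank by which column matched: ID1 beats ID2 beats ID3.
                   ORDER BY CASE c.customer_id
                                WHEN i.customer_id1 THEN 1
                                WHEN i.customer_id2 THEN 2
                                WHEN i.customer_id3 THEN 3
                            END
               ) AS rn
        FROM customer c
        JOIN invoice i
          ON c.customer_id IN (i.customer_id1, i.customer_id2, i.customer_id3)
    ) ranked
    WHERE rn = 1
    ORDER BY customer_id
    """
).fetchall()

print(best)  # [('C1', 'Y'), ('C2', 'X')]
```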

Trying to Optimize PostgreSQL Nested WHERE IN

I have a Postgres (9.1) customer database similar to:
customers.id
customers.lastname
customers.firstname
invoices.id
invoices.customerid
invoices.total
invoicelines.id
invoicelines.invoiceid
invoicelines.itemcode
invoicelines.price
I built a search which lists all customers who have purchased a certain item (say 'abc').
Select * from customers WHERE customers.id IN
(Select invoices.customerid FROM invoices WHERE invoices.id IN
(Select invoicelines.invoiceid FROM invoicelines WHERE
invoicelines.itemcode = 'abc')
)
The search works fine and brings up the correct customers but takes about 10 seconds or so on a database of 2 million invoices and 2 million line items.
I was wondering if there was another approach that could trim that down a bit.
An alternative is to use EXISTS:
Select *
from customers
WHERE EXISTS (
Select invoices.customerid
FROM invoices
JOIN invoicelines
ON invoicelines.invoiceid = invoices.id AND
invoicelines.itemcode = 'abc' AND
customers.id = invoices.customerid)
You might switch to using exists instead. I suspect that this might work well:
Select c.*
from customers c
where exists (Select 1
from invoices i join
invoicelines il
on i.id = il.invoiceid and il.itemcode = 'abc'
where c.id = i.customerid
);
For this, you want to be sure you have the right indexes: invoices(customerid, id) and invoicelines(invoiceid, itemcode).
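A sketch, using sqlite3 on hypothetical data rather than PostgreSQL 9.1, that creates the two composite indexes named in this answer (the index names are made up) and checks that the EXISTS rewrite returns the same customers as the original nested IN query. It only demonstrates equivalence of results; whether EXISTS is faster depends on the PostgreSQL planner and should be checked with EXPLAIN ANALYZE on the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, customerid INTEGER, total REAL);
    CREATE TABLE invoicelines (id INTEGER PRIMARY KEY, invoiceid INTEGER,
                               itemcode TEXT, price REAL);
    -- The composite indexes recommended in the answer (names are hypothetical).
    CREATE INDEX invoices_customerid_id ON invoices (customerid, id);
    CREATE INDEX invoicelines_invoiceid_itemcode ON invoicelines (invoiceid, itemcode);
    INSERT INTO customers VALUES (1, 'Smith', 'Ann'), (2, 'Jones', 'Bo'), (3, 'Lee', 'Cy');
    INSERT INTO invoices VALUES (100, 1, 10), (101, 1, 5), (102, 2, 7), (103, 3, 9);
    INSERT INTO invoicelines VALUES
        (1, 100, 'abc', 10), (2, 101, 'abc', 5), (3, 102, 'xyz', 7), (4, 103, 'abc', 9);
    """
)

nested_in_query = """
SELECT id FROM customers WHERE id IN
  (SELECT customerid FROM invoices WHERE id IN
    (SELECT invoiceid FROM invoicelines WHERE itemcode = ?))
ORDER BY id
"""

exists_query = """
SELECT c.id FROM customers c
WHERE EXISTS (SELECT 1
              FROM invoices i JOIN invoicelines il
                ON i.id = il.invoiceid AND il.itemcode = ?
              WHERE c.id = i.customerid)
ORDER BY c.id
"""

in_ids = [row[0] for row in conn.execute(nested_in_query, ("abc",))]
exists_ids = [row[0] for row in conn.execute(exists_query, ("abc",))]
print(in_ids, exists_ids)  # [1, 3] [1, 3]
```

Customer 1 bought 'abc' on two invoices but appears once in both results, because IN and EXISTS only test for membership.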
Do you want all of the rows and columns in customer where the itemcode for that customer's item is 'abc'? If you join on the customerid then you can find all of the customer information for those items. If you have duplicates within that list you can use DISTINCT which will only give you one entry per customerID.
SELECT DISTINCT [List of customer columns]
FROM customers
INNER JOIN invoices
ON customers.id = invoices.customerid
INNER JOIN invoicelines
ON invoicelines.invoiceid = invoices.id
AND invoicelines.itemcode = 'abc'
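In the schema from the question, invoicelines has no customerid column, so a join that reaches customers has to go through invoices. A short sqlite3 sketch with hypothetical data (key columns only) shows why DISTINCT is needed: customer 1 bought 'abc' on two invoices, so the plain join returns that customer twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, lastname TEXT);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, customerid INTEGER);
    CREATE TABLE invoicelines (id INTEGER PRIMARY KEY, invoiceid INTEGER, itemcode TEXT);
    INSERT INTO customers VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO invoices VALUES (100, 1), (101, 1), (102, 2);
    INSERT INTO invoicelines VALUES (1, 100, 'abc'), (2, 101, 'abc'), (3, 102, 'xyz');
    """
)

JOIN_QUERY = """
SELECT {distinct} customers.id, customers.lastname
FROM customers
INNER JOIN invoices ON customers.id = invoices.customerid
INNER JOIN invoicelines ON invoicelines.invoiceid = invoices.id
                       AND invoicelines.itemcode = 'abc'
ORDER BY customers.id
"""

plain = conn.execute(JOIN_QUERY.format(distinct="")).fetchall()
deduped = conn.execute(JOIN_QUERY.format(distinct="DISTINCT")).fetchall()
print(plain)    # [(1, 'Smith'), (1, 'Smith')]
print(deduped)  # [(1, 'Smith')]
```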

Optimize SQL query for canceled orders

Here is a subset of my tables:
orders:
- order_id
- customer_id
order_products:
- order_id
- order_product_id (unique key)
- canceled
I want to select all orders (order_id) for a given customer (customer_id) where ALL of the products in the order are canceled, not just some of them. Is there a more elegant or efficient way of doing it than this:
select order_id from orders
where order_id in (
select orders.order_id from orders
inner join order_products on orders.order_id = order_products.order_id
where orders.customer_id = 1234 and order_products.canceled = 1
)
and order_id not in (
select orders.order_id from orders
inner join order_products on orders.order_id = order_products.order_id
where orders.customer_id = 1234 and order_products.canceled = 0
)
If every order has at least one row in order_products, try this (an order qualifies when none of its products is un-canceled):
Select order_id from orders o
Where Not Exists
(Select * From order_products
Where order_id = o.order_id
And canceled = 0)
If the above assumption is not true, then you also need:
Select order_id from orders o
Where Exists
(Select * From order_products
Where order_id = o.order_id)
And Not Exists
(Select * From order_products
Where order_id = o.order_id
And canceled = 0)
The fastest way will be this:
SELECT order_id
FROM orders o
WHERE customer_id = 1234
AND
(
SELECT canceled
FROM order_products op
WHERE op.order_id = o.order_id
ORDER BY
canceled
LIMIT 1
) = 1
The subquery returns the smallest canceled flag among the order's products, so it returns 1 if and only if the order has some products and all of them are canceled.
If there are no products at all, the subquery returns NULL; if there is at least one uncanceled product, it returns 0.
Make sure you have an index on order_products (order_id, canceled)
Something like this? It assumes that every order has at least one product; otherwise the query will also return orders without any products.
select order_id
from orders o
where not exists (select 1 from order_products op
where canceled = 0
and op.order_id = o.order_id
)
and o.customer_id = 1234
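A sqlite3 sketch of the caveat, with hypothetical data: order 1 has only canceled products, order 2 has a mix, order 3 has no products, and order 4 belongs to another customer. The plain NOT EXISTS query also returns the empty order 3; adding an EXISTS guard removes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_products (order_product_id INTEGER PRIMARY KEY,
                                 order_id INTEGER, canceled INTEGER);
    INSERT INTO orders VALUES (1, 1234), (2, 1234), (3, 1234), (4, 999);
    INSERT INTO order_products VALUES
        (10, 1, 1), (11, 1, 1),  -- order 1: every product canceled
        (12, 2, 1), (13, 2, 0),  -- order 2: only some products canceled
        (14, 4, 1);              -- order 4: another customer
    -- order 3 has no products at all
    """
)

NOT_EXISTS = """
SELECT order_id FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM order_products op
                  WHERE canceled = 0 AND op.order_id = o.order_id)
  AND o.customer_id = 1234
ORDER BY order_id
"""

# Same query plus a guard that requires at least one product row.
WITH_GUARD = """
SELECT order_id FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM order_products op
                  WHERE canceled = 0 AND op.order_id = o.order_id)
  AND EXISTS (SELECT 1 FROM order_products op WHERE op.order_id = o.order_id)
  AND o.customer_id = 1234
ORDER BY order_id
"""

unguarded = [row[0] for row in conn.execute(NOT_EXISTS)]
guarded = [row[0] for row in conn.execute(WITH_GUARD)]
print(unguarded)  # [1, 3]
print(guarded)    # [1]
```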
SELECT orders.customer_id, orders.order_id, COUNT(*) AS product_count, SUM(canceled) AS canceled_count
FROM orders JOIN order_products
ON orders.order_id = order_products.order_id
WHERE orders.customer_id = <<VALUE>>
GROUP BY orders.customer_id, orders.order_id
HAVING COUNT(*) = SUM(canceled)
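A sqlite3 sketch of the counting approach on hypothetical data, with the columns qualified by table name. It assumes canceled is stored as 0 or 1, so that SUM(canceled) equals COUNT(*) exactly when every product is canceled. Because of the inner join, an order with no products (order 3) never appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_products (order_product_id INTEGER PRIMARY KEY,
                                 order_id INTEGER, canceled INTEGER);
    INSERT INTO orders VALUES (1, 1234), (2, 1234), (3, 1234);
    INSERT INTO order_products VALUES
        (10, 1, 1), (11, 1, 1),  -- order 1: all canceled
        (12, 2, 1), (13, 2, 0);  -- order 2: partly canceled
    """
)

QUERY = """
SELECT orders.order_id, COUNT(*) AS product_count, SUM(canceled) AS canceled_count
FROM orders JOIN order_products ON orders.order_id = order_products.order_id
WHERE orders.customer_id = ?
GROUP BY orders.order_id
HAVING COUNT(*) = SUM(canceled)
ORDER BY orders.order_id
"""

rows = conn.execute(QUERY, (1234,)).fetchall()
print(rows)  # [(1, 2, 2)]
```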
You can try something like this:
select orders.order_id
from #orders orders inner join
#order_products order_products on orders.order_id = order_products.order_id
where orders.customer_id = 1234
GROUP BY orders.order_id
HAVING SUM(order_products.canceled) = COUNT(order_products.canceled)
Since we don't know the database platform, here's an ANSI-standard approach. It assumes nothing about the data type of the canceled column or how a cancellation is recorded (e.g. 'YES', 1, etc.), except that products that are not canceled have NULL there. It also uses nothing specific to a given database platform (a platform-specific approach would likely be more efficient, if you could tell us the platform and version you are using):
select op1.order_id
from (
select op.order_id, cast( case when op.canceled is not null then 1 else 0 end as integer) as is_cancelled
from order_products op
) op1
group by op1.order_id
having count(*) = sum(op1.is_cancelled);
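A sqlite3 sketch of this NULL-based variant, assuming a hypothetical schema where canceled holds 'YES' for canceled products and NULL for live ones. Order 1 is fully canceled and order 2 still has one live product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE order_products (order_product_id INTEGER PRIMARY KEY,
                                 order_id INTEGER, canceled TEXT);
    INSERT INTO order_products VALUES
        (10, 1, 'YES'), (11, 1, 'YES'),  -- order 1: all canceled
        (12, 2, 'YES'), (13, 2, NULL);   -- order 2: one product still live
    """
)

rows = conn.execute(
    """
    SELECT op1.order_id
    FROM (SELECT op.order_id,
                 -- Any non-NULL marker counts as canceled.
                 CAST(CASE WHEN op.canceled IS NOT NULL THEN 1 ELSE 0 END AS INTEGER)
                     AS is_cancelled
          FROM order_products op) op1
    GROUP BY op1.order_id
    HAVING COUNT(*) = SUM(op1.is_cancelled)
    ORDER BY op1.order_id
    """
).fetchall()

all_cancelled = [row[0] for row in rows]
print(all_cancelled)  # [1]
```

With the question's 0/1 flag, a 0 is also non-NULL and would be counted as canceled, so this variant only fits a schema that leaves the column NULL for products that are not canceled.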