Rails/SQL: finding invoices by checking two sums

I have an Invoice model that has_many lines and has_many payments.
Both lines and payments have a total (decimal) column.
I need to find all paid invoices. So I'm doing the following:
Invoice.joins(:lines, :payments).having('sum(lines.total) = sum(payments.total)').group('invoices.id')
This generates the following query:
SELECT "invoices".*
FROM "invoices"
INNER JOIN "lines" ON "lines"."invoice_id" = "invoices"."id"
INNER JOIN "payments" ON "payments"."invoice_id" = "invoices"."id"
GROUP BY invoices.id
HAVING sum(lines.total) = sum(payments.total)
But it always returns an empty array, even when there are fully paid invoices.
Is something wrong with my code?

If you join to more than one table with a 1:n relationship, the joined rows can multiply each other.
This related answer has a more detailed explanation of the problem:
Two SQL LEFT JOINS produce incorrect result
To avoid that, sum the totals before you join. This way you join to exactly 1 (or 0) rows, and nothing is multiplied. Not only correct, also considerably faster.
SELECT i.*, l.sum_total
FROM   invoices i
JOIN  (
   SELECT invoice_id, sum(total) AS sum_total
   FROM   lines
   GROUP  BY invoice_id
   ) l ON l.invoice_id = i.id
JOIN  (
   SELECT invoice_id, sum(total) AS sum_total
   FROM   payments
   GROUP  BY invoice_id
   ) p ON p.invoice_id = i.id
WHERE  l.sum_total = p.sum_total;
I use [INNER] JOIN rather than LEFT [OUTER] JOIN on purpose: invoices without any lines or payments are not of interest to begin with, since we want "paid" invoices. For lack of a definition, and judging by the query in the question, I assume "paid" means invoices that have actual lines and payments, both totaling the same.
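A minimal sketch of the difference, using Python's built-in sqlite3 module and an in-memory database. The schema and rows are hypothetical (integer amounts instead of decimals, to keep the equality exact): invoice 1 is fully paid by two payments, invoice 2 is not. The naive join counts invoice 1's single line once per payment, while the pre-aggregated version finds it.

```python
import sqlite3

# Hypothetical schema mirroring the question: invoices, lines, payments.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE invoices (id INTEGER PRIMARY KEY);
CREATE TABLE lines    (id INTEGER PRIMARY KEY, invoice_id INTEGER, total INTEGER);
CREATE TABLE payments (id INTEGER PRIMARY KEY, invoice_id INTEGER, total INTEGER);
INSERT INTO invoices VALUES (1), (2);
INSERT INTO lines    (invoice_id, total) VALUES (1, 30), (2, 50);
INSERT INTO payments (invoice_id, total) VALUES (1, 10), (1, 20), (2, 20);
""")

# Naive version: both 1:n joins fan out, so each line repeats once per payment.
naive = con.execute("""
SELECT invoices.id, sum(lines.total), sum(payments.total)
FROM invoices
JOIN lines    ON lines.invoice_id    = invoices.id
JOIN payments ON payments.invoice_id = invoices.id
GROUP BY invoices.id
ORDER BY invoices.id
""").fetchall()

# Pre-aggregated version: each derived table has at most one row per invoice.
paid = [row[0] for row in con.execute("""
SELECT i.id
FROM invoices i
JOIN (SELECT invoice_id, sum(total) AS sum_total
      FROM lines GROUP BY invoice_id) l ON l.invoice_id = i.id
JOIN (SELECT invoice_id, sum(total) AS sum_total
      FROM payments GROUP BY invoice_id) p ON p.invoice_id = i.id
WHERE l.sum_total = p.sum_total
ORDER BY i.id
""")]

print(naive)  # invoice 1: line total 60 vs payment total 30
print(paid)
```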

Suppose one invoice has one line and two payments that pay it in full:
lines
id total invoice_id
1 30 1
payments
id total invoice_id
1 10 1
2 20 1
Joining lines and payments to the invoice on invoice_id then yields 2 rows:
payment_id payment_total line_id line_total invoice_id
1 10 1 30 1
2 20 1 30 1
So sum(line_total) is 60, because the single line is counted once per payment, while sum(payment_total) is 30; the two sums cannot match.
To get all paid invoices, you could use EXISTS instead of joins:
Invoice.where("exists
  (select 1 from
     (select l.invoice_id as invoice_id
      from (select invoice_id, sum(total) as line_total
            from lines
            group by invoice_id) as l
      inner join (select invoice_id, sum(total) as payment_total
                  from payments
                  group by invoice_id) as p
        on l.invoice_id = p.invoice_id
      where payment_total = line_total) as paid
   where invoices.id = paid.invoice_id)")
The subquery paid returns the invoice_id of every paid invoice.
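The same EXISTS condition can be checked against an in-memory SQLite database from Python. The tables and rows are hypothetical, and the Rails Invoice.where(...) call is replaced by the plain SQL it would produce; invoice 3 has no payments at all, so the inner join inside paid leaves it out.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE invoices (id INTEGER PRIMARY KEY);
CREATE TABLE lines    (id INTEGER PRIMARY KEY, invoice_id INTEGER, total INTEGER);
CREATE TABLE payments (id INTEGER PRIMARY KEY, invoice_id INTEGER, total INTEGER);
INSERT INTO invoices VALUES (1), (2), (3);
INSERT INTO lines    (invoice_id, total) VALUES (1, 30), (2, 50), (3, 40);
INSERT INTO payments (invoice_id, total) VALUES (1, 10), (1, 20), (2, 20);
""")

# SELECT ... FROM invoices WHERE EXISTS (...), i.e. what Invoice.where("exists ...") wraps.
paid_ids = [row[0] for row in con.execute("""
SELECT invoices.id FROM invoices
WHERE EXISTS
  (SELECT 1 FROM
     (SELECT l.invoice_id AS invoice_id
      FROM (SELECT invoice_id, sum(total) AS line_total
            FROM lines GROUP BY invoice_id) AS l
      JOIN (SELECT invoice_id, sum(total) AS payment_total
            FROM payments GROUP BY invoice_id) AS p
        ON l.invoice_id = p.invoice_id
      WHERE payment_total = line_total) AS paid
   WHERE invoices.id = paid.invoice_id)
ORDER BY invoices.id
""")]

print(paid_ids)
```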


Subquery amount not coming in full

I have this query:
Select I.Invoice_Number, PA.Invoice_Number, I.Line_Amount, PA.Invoiced_Amount
from XXX as PA
Left join (select Invoice_Number, Line_Amount from Invoices) as I
on PA.Invoice_Number = I.Invoice_Number
Group by PA.Invoice_Number;
Both should give me the same cost (I.Line_Amount = PA.Invoiced_Amount) per Invoice_Number, yet I.Line_Amount only returns the first row of the list, while PA.Invoiced_Amount holds the total cost of that invoice.
I tried using sum(Line_Amount) within the subquery but all records come out as Null.
Is there a way for me to join both tables and make sure that the amounts per invoice match the total amount of that invoice?
If I understand you correctly (you want to make sure that the sum of Line_Amount in the Invoices table equals Invoiced_Amount in the XXX table), the derived table should return the invoice number and the sum of the amounts:
select I.Invoice_Number, PA.Invoice_Number, I.total, PA.Invoiced_Amount
from XXX as PA
left join (
select Invoice_Number, sum(Line_Amount) as total
from Invoices
group by Invoice_Number
) as I
on PA.Invoice_Number = I.Invoice_Number
You can try it here: http://sqlfiddle.com/#!9/d1d010/1/0
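A small Python/sqlite3 sketch of the grouped derived table, with an extra matches column that flags invoices whose line amounts don't add up. It assumes, hypothetically, that XXX (called xxx here) holds one row per invoice; invoice C has no lines, so the LEFT JOIN leaves its total NULL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Invoices (Invoice_Number TEXT, Line_Amount INTEGER);
CREATE TABLE xxx      (Invoice_Number TEXT, Invoiced_Amount INTEGER);
INSERT INTO Invoices VALUES ('A', 100), ('A', 50), ('B', 70);
INSERT INTO xxx      VALUES ('A', 150), ('B', 80), ('C', 10);
""")

rows = con.execute("""
SELECT PA.Invoice_Number, I.total, PA.Invoiced_Amount,
       I.total = PA.Invoiced_Amount AS matches   -- 1, 0, or NULL in SQLite
FROM xxx AS PA
LEFT JOIN (SELECT Invoice_Number, sum(Line_Amount) AS total
           FROM Invoices
           GROUP BY Invoice_Number) AS I
  ON PA.Invoice_Number = I.Invoice_Number
ORDER BY PA.Invoice_Number
""").fetchall()

for row in rows:
    print(row)
```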

Sum from two tables and compare with a value from third table

I'm trying to do something which I believe is very simple, but I can't figure out how to write it as a SQL statement.
The tables:
Invoices (column - GrossAmount)
Receipts (column - ReceiptValue; there could be a receipt or no receipt at all)
Credit notes (column - GrossCredit; there could be a credit note or none)
I want to show the total outstanding invoices, i.e., all invoices where Invoices.GrossAmount > (sum(Receipts.ReceiptValue) + sum(CreditNotes.GrossCredit)).
The query needs to show all invoices that are not fully paid or not paid at all.
InvoiceId is the foreign key in all tables.
Using MS SQL Server 2014.
You need to sum each table individually (grouped by invoice) and then [left] join the results:
SELECT i.InvoiceId
FROM invoices i
LEFT JOIN (SELECT InvoiceId, SUM(ReceiptValue) AS sum_receipt
FROM receipts
GROUP BY InvoiceId) r ON i.InvoiceId = r.InvoiceId
LEFT JOIN (SELECT InvoiceId, SUM(GrossCredit) AS sum_credit
FROM credit
GROUP BY InvoiceId) g ON i.InvoiceId = g.InvoiceId
WHERE i.GrossAmount > COALESCE(sum_receipt, 0) + COALESCE(sum_credit, 0)
I think you want something like this:
select i.*,
coalesce(r.sumrv, 0) as receiptValue,
coalesce(c.sumgc, 0) as grossCredits
from invoices i left join
(select invoiceId, sum(receiptvalue) as sumrv
from receipts
group by invoiceId
) r
on i.invoiceId = r.invoiceId left join
(select invoiceId, sum(grosscredit) as sumgc
from credits c
group by invoiceId
) c
on i.invoiceId = c.invoiceId
where i.GrossAmount > coalesce(r.sumrv, 0) + coalesce(c.sumgc, 0);
Three important things:
1. Use left join so you don't drop invoices with no matching records in one or both of the tables.
2. Use coalesce() so NULL values are treated as 0.
3. Do the aggregations before joining the tables.
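To see why the LEFT JOINs and coalesce() matter, here is a sketch against a hypothetical in-memory SQLite copy of the three tables (credit notes stored in a table named credits). Invoice 3 has neither receipts nor credits; without coalesce() its comparison against NULL is never true, so it silently disappears from the result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY, GrossAmount INTEGER);
CREATE TABLE receipts (InvoiceId INTEGER, ReceiptValue INTEGER);
CREATE TABLE credits  (InvoiceId INTEGER, GrossCredit INTEGER);
INSERT INTO invoices VALUES (1, 100), (2, 100), (3, 80), (4, 50);
INSERT INTO receipts VALUES (1, 60), (1, 40), (2, 50);
INSERT INTO credits  VALUES (2, 20), (4, 50);
""")

# Aggregate each child table first, then LEFT JOIN the one-row-per-invoice results.
base = """
SELECT i.InvoiceId
FROM invoices i
LEFT JOIN (SELECT InvoiceId, sum(ReceiptValue) AS sumrv
           FROM receipts GROUP BY InvoiceId) r ON i.InvoiceId = r.InvoiceId
LEFT JOIN (SELECT InvoiceId, sum(GrossCredit) AS sumgc
           FROM credits GROUP BY InvoiceId) c ON i.InvoiceId = c.InvoiceId
WHERE {condition}
ORDER BY i.InvoiceId
"""

with_coalesce = [r[0] for r in con.execute(
    base.format(condition="i.GrossAmount > coalesce(r.sumrv, 0) + coalesce(c.sumgc, 0)"))]
# Any NULL operand makes the comparison NULL, which WHERE treats as false.
without_coalesce = [r[0] for r in con.execute(
    base.format(condition="i.GrossAmount > r.sumrv + c.sumgc"))]

print(with_coalesce)
print(without_coalesce)
```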

Aggregate after join without duplicates

Consider this query:
select *
from
  (select * from orders where <condition>) as s,
  (select * from products where <condition>) as p
where
  s.id = p.order;
There are, for example, 200 records in products and 100 in orders (one order can contain one or more products).
I need to join them and then:
1) count products (should return 200)
2) count orders (should return 100)
3) sum one of the orders' fields (should return the sum of 100 prices)
The problem is that after the join, p and s have the same length. For 2) I can write count(distinct s.id), but for 3) I get duplicates (for example, if a sale has 2 products, its price is summed twice), so sum works on the entire 200-record set when it should only consider 100.
Any thoughts on how to sum only distinct records from the joined table without breaking the other aggregates?
For example, the joined table contains:
id sale price
0 0 4
0 0 4
1 1 3
2 2 4
2 2 4
2 2 4
So sum(s.price) returns 23 (4 + 4 + 3 + 4 + 4 + 4),
but I need 11 (4 + 3 + 4, one price per sale).
If the products table is really more of an "order lines" table, then the query makes sense. You can do what you want in several ways. Here I'm going to suggest conditional aggregation:
select count(distinct p.id), count(distinct s.id),
sum(case when seqnum = 1 then s.price end)
from (select o.* from orders o where <condition>) s join
(select p.*, row_number() over (partition by p.order order by p.order) as seqnum
from products p
where <condition>
) p
on s.id = p.order;
Normally, a table called "products" would have one row per product, with things like a description and name. A table called something like "OrderLines" or "OrderProducts" or "OrderDetails" would have the products within a given order.
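A sketch of the conditional-aggregation idea in SQLite (window functions need SQLite 3.25 or later), using the sample data from the question. Assumptions: the column order is renamed order_id, since ORDER is a reserved word; the <condition> filters are dropped; and the window is ordered by p.id only to make seqnum deterministic. Any ordering works, because exactly one row per order gets seqnum = 1.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders   (id INTEGER PRIMARY KEY, price INTEGER);
CREATE TABLE products (id INTEGER PRIMARY KEY, order_id INTEGER);
INSERT INTO orders   VALUES (0, 4), (1, 3), (2, 4);
INSERT INTO products (order_id) VALUES (0), (0), (1), (2), (2), (2);
""")

# Plain join: each order's price is repeated once per product.
naive_sum = con.execute("""
SELECT sum(s.price) FROM orders s JOIN products p ON s.id = p.order_id
""").fetchone()[0]

result = con.execute("""
SELECT count(DISTINCT p.id), count(DISTINCT s.id),
       sum(CASE WHEN seqnum = 1 THEN s.price END)  -- price counted once per order
FROM orders s
JOIN (SELECT p.*,
             row_number() OVER (PARTITION BY p.order_id ORDER BY p.id) AS seqnum
      FROM products p) p
  ON s.id = p.order_id
""").fetchone()

print(naive_sum)
print(result)  # (products, orders, summed prices)
```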
You are not interested in individual product records, only in their number. So join the aggregate (one record per order) instead of the single rows:
select
  count(*) as count_orders,
  sum(p.cnt) as count_products,
  sum(s.price) as sum_price
from orders as s
join (
  select order, count(*) as cnt
  from products
  where <condition>
  group by order
) as p on p.order = s.id
where <condition>;
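The same sample data run through the aggregate-first join, again in SQLite from Python. As before, order is renamed order_id and the <condition> filters are dropped (both assumptions). Because p has one row per order, count(*) counts orders and s.price is summed once per order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders   (id INTEGER PRIMARY KEY, price INTEGER);
CREATE TABLE products (id INTEGER PRIMARY KEY, order_id INTEGER);
INSERT INTO orders   VALUES (0, 4), (1, 3), (2, 4);
INSERT INTO products (order_id) VALUES (0), (0), (1), (2), (2), (2);
""")

result = con.execute("""
SELECT count(*)     AS count_orders,
       sum(p.cnt)   AS count_products,
       sum(s.price) AS sum_price
FROM orders AS s
JOIN (SELECT order_id, count(*) AS cnt   -- one row per order
      FROM products
      GROUP BY order_id) AS p ON p.order_id = s.id
""").fetchone()

print(result)
```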
Your main problem is the table design. You currently have no way of knowing the price of a product if there were no sales of it. Price should be in the product table: a product costs a certain price. Then you can count all the products of a sale and also get the total price of the sale.
Also, why are you using subqueries? When you do this, some databases will not use indexes when joining the two subqueries. If your joins are that complicated, use views; in some databases they can be indexed.

SQL rewrite to optimize

I'm trying to optimize this SQL, or change it to use inner joins rather than independent subqueries.
Database: one invoice can have many payment records and many order (product) records.
SELECT InvoiceNum,
(SELECT SUM(Orders.Cost) FROM Orders WHERE Orders.Invoice = InvoiceNum and Orders.Returned <> 1 GROUP BY Orders.Invoice) as vat_only,
(SELECT SUM(Orders.Vat) FROM Orders WHERE Orders.Invoice = InvoiceNum and Orders.Returned <> 1 GROUP BY Orders.Invoice) as sales_prevat,
(SELECT SUM(pay.Amount) FROM Payments as pay WHERE Invoices.InvoiceNum = pay.InvoiceNum ) as income
FROM Invoices
WHERE InvoiceYear = currentyear
I'm sure we can do this another way by grouping and joining the tables together. When I tried the SQL statement below, I wasn't getting the same number of records... I think it has to do with the type of join or where it joins, but I still couldn't get it working after 3 hours of staring at the screen.
So far I've got:
SELECT Invoices.InvoiceNum,
Sum(Orders.Cost) AS SumOfCost,
Sum(Orders.VAT) AS SumOfVAT,
SUM(distinct Payments.Amount) as money
FROM Invoices
INNER JOIN Orders ON Orders.Invoice = Invoices.InvoiceNum
INNER JOIN Payments ON Invoices.InvoiceNum = Payments.InvoiceNum
WHERE Invoices.InvoiceYear = 11
AND Orders.Returned <> 1
GROUP BY Invoices.InvoiceNum
Sorry for the bad English; I'm not sure what to search for to find out whether this has already been answered here :D
Thanks in advance for all the help.
Your problem is that an invoice has multiple order lines and (sometimes) multiple payments. Joining both produces a cross-product effect for a given invoice: every order line is repeated once per payment. You fix this by pre-summarizing the tables.
A related problem is that an inner join drops invoices that have no payments, so you need a left outer join.
select i.InvoiceNum, osum.cost, osum.vat, psum.income
from Invoices i left outer join
(select o.Invoice, sum(o.Cost) as cost, sum(o.vat) as vat
from orders o
where Returned <> 1
group by o.Invoice
) osum
on osum.Invoice = i.InvoiceNum left outer join
(select p.InvoiceNum, sum(p.Amount) as income
from Payments p
group by p.InvoiceNum
) psum
on psum.InvoiceNum = i.InvoiceNum
where i.InvoiceYear = year(getdate())
Two comments: is the key field for orders really Invoice, or is it also InvoiceNum? Also, do you have a field Invoices.InvoiceYear, or do you want year(i.InvoiceDate) in the where clause?
Assuming that both payments and orders can contain more than one record per invoice, you will need to do your aggregates in a subquery to avoid cross joining:
SELECT Invoices.InvoiceNum, o.Cost, o.VAT, p.Amount
FROM Invoices
LEFT JOIN
( SELECT Invoice, Cost = SUM(Cost), VAT = SUM(VAT)
FROM Orders
WHERE Orders.Returned <> 1
GROUP BY Invoice
) o
ON o.Invoice = Invoices.InvoiceNum
LEFT JOIN
( SELECT InvoiceNum, Amount = SUM(Amount)
FROM Payments
GROUP BY InvoiceNum
) p
ON p.InvoiceNum = Invoices.InvoiceNum
WHERE Invoices.InvoiceYear = 11;
To expand on the CROSS JOIN comment, imagine this data for invoice 1:
Orders
Invoice Cost VAT
1 15.00 3.00
1 10.00 2.00
Payments
InvoiceNum Amount
1 15.00
1 10.00
When you join these tables as you did:
SELECT Orders.*, Payments.Amount
FROM Invoices
INNER JOIN Orders
ON Orders.Invoice = Invoices.InvoiceNum
LEFT JOIN Payments
ON Invoices.InvoiceNum = Payments.InvoiceNum;
You end up with:
Orders.Invoice Orders.Cost Orders.Vat Payments.Amount
1 15.00 3.00 15.00
1 10.00 2.00 15.00
1 15.00 3.00 10.00
1 10.00 2.00 10.00
That is, you get every combination of payments and orders, so each invoice produces many more rows than required, which distorts your totals. Even though the original data had £25 of payments, this doubles to £50 because of the two records in the Orders table. This is why each table needs to be aggregated individually; using DISTINCT would not work if there were more than one payment or order for the same amount on a single invoice.
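The doubling, and the DISTINCT trap, can be reproduced with Python's sqlite3 module. Invoice 1 uses the data above (with integer amounts); invoice 2 is a hypothetical extra case with one order and two payments of the same amount, where SUM(DISTINCT ...) drops one of them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Invoices (InvoiceNum INTEGER PRIMARY KEY);
CREATE TABLE Orders   (Invoice INTEGER, Cost INTEGER, VAT INTEGER);
CREATE TABLE Payments (InvoiceNum INTEGER, Amount INTEGER);
INSERT INTO Invoices VALUES (1), (2);
INSERT INTO Orders   VALUES (1, 15, 3), (1, 10, 2), (2, 5, 1);
INSERT INTO Payments VALUES (1, 15), (1, 10), (2, 10), (2, 10);
""")

# Joining both child tables directly: every order row pairs with every payment row.
joined = con.execute("""
SELECT Invoices.InvoiceNum, sum(Payments.Amount), sum(DISTINCT Payments.Amount)
FROM Invoices
INNER JOIN Orders ON Orders.Invoice = Invoices.InvoiceNum
LEFT JOIN Payments ON Invoices.InvoiceNum = Payments.InvoiceNum
GROUP BY Invoices.InvoiceNum
ORDER BY Invoices.InvoiceNum
""").fetchall()

# Aggregating Payments on its own first gives one correct total per invoice.
pre_aggregated = con.execute("""
SELECT i.InvoiceNum, p.Amount
FROM Invoices i
LEFT JOIN (SELECT InvoiceNum, sum(Amount) AS Amount
           FROM Payments GROUP BY InvoiceNum) p
  ON p.InvoiceNum = i.InvoiceNum
ORDER BY i.InvoiceNum
""").fetchall()

print(joined)          # (InvoiceNum, plain sum, DISTINCT sum)
print(pre_aggregated)
```

Invoice 1's plain sum is doubled to 50 while DISTINCT happens to give 25; for invoice 2 the plain sum is right only because there is a single order, and DISTINCT collapses the two 10s into one.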
One final point with regard to optimisation: you should probably index your tables. If you run the query and display the actual execution plan, SSMS will suggest indexes for you, but at a guess the following should improve the performance:
CREATE NONCLUSTERED INDEX IX_Orders_InvoiceNum ON Orders (Invoice) INCLUDE(Cost, VAT, Returned);
CREATE NONCLUSTERED INDEX IX_Payments_InvoiceNum ON Payments (InvoiceNum) INCLUDE(Amount);
This should allow both subqueries to use only the index on each table, with no bookmark lookup or clustered index scan required.
Try this; note that I haven't tested it, I just whipped it up in Notepad. If any of your invoices may have no rows in one of the subtables, use LEFT JOIN instead of INNER JOIN.
SELECT i.InvoiceNum, vat_only, sales_prevat, income
FROM Invoices i
INNER JOIN (SELECT Invoice, SUM(Cost) [vat_only], SUM(Vat) [sales_prevat]
FROM Orders
WHERE Returned <> 1
GROUP BY Invoice) o
ON i.InvoiceNum = o.Invoice
INNER JOIN (SELECT InvoiceNum, SUM(Amount) [income]
FROM Payments
GROUP BY InvoiceNum) p
ON i.InvoiceNum = p.InvoiceNum
WHERE i.InvoiceYear = currentyear
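select
      PreQuery.InvoiceNum,
      PreQuery.VAT_Only,
      PreQuery.Sales_Prevat,
      SUM( Pay.Amount ) as Income
   from
      ( select
              I.InvoiceNum,
              SUM( O.Cost ) as VAT_Only,
              SUM( O.Vat ) as Sales_Prevat
           from
              Invoices I
                 Join Orders O
                    on I.InvoiceNum = O.Invoice
                   AND O.Returned <> 1
           where
              I.InvoiceYear = currentYear
           group by
              I.InvoiceNum ) PreQuery
      JOIN Payments Pay
         on PreQuery.InvoiceNum = Pay.InvoiceNum
   group by
      PreQuery.InvoiceNum,
      PreQuery.VAT_Only,
      PreQuery.Sales_Prevat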
Your "currentYear" reference could be parameterized, or you can derive it from the current date with a SQL function such as
Year( GetDate() )
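A sketch of this pre-aggregate-then-join pattern in SQLite, with currentYear passed as a query parameter as suggested above. The rows are hypothetical: InvoiceYear uses the two-digit value 11 from the question, Returned is stored as 0/1, invoice 3 belongs to another year, and one returned order line is excluded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Invoices (InvoiceNum INTEGER PRIMARY KEY, InvoiceYear INTEGER);
CREATE TABLE Orders   (Invoice INTEGER, Cost INTEGER, Vat INTEGER, Returned INTEGER);
CREATE TABLE Payments (InvoiceNum INTEGER, Amount INTEGER);
INSERT INTO Invoices VALUES (1, 11), (2, 11), (3, 10);
INSERT INTO Orders   VALUES (1, 15, 3, 0), (1, 10, 2, 0), (1, 99, 9, 1),
                            (2, 5, 1, 0), (3, 7, 1, 0);
INSERT INTO Payments VALUES (1, 15), (1, 10), (2, 10), (2, 10), (3, 7);
""")

current_year = 11  # bound as a parameter instead of hard-coding currentYear

rows = con.execute("""
SELECT PreQuery.InvoiceNum, PreQuery.VAT_Only, PreQuery.Sales_Prevat,
       sum(Pay.Amount) AS Income
FROM (SELECT I.InvoiceNum,
             sum(O.Cost) AS VAT_Only,
             sum(O.Vat)  AS Sales_Prevat
      FROM Invoices I
      JOIN Orders O ON I.InvoiceNum = O.Invoice AND O.Returned <> 1
      WHERE I.InvoiceYear = ?
      GROUP BY I.InvoiceNum) PreQuery   -- exactly one row per invoice
JOIN Payments Pay ON PreQuery.InvoiceNum = Pay.InvoiceNum
GROUP BY PreQuery.InvoiceNum, PreQuery.VAT_Only, PreQuery.Sales_Prevat
ORDER BY PreQuery.InvoiceNum
""", (current_year,)).fetchall()

print(rows)
```

Because PreQuery already has one row per invoice, joining the raw payments cannot inflate the order totals. Note that the join to Payments is an inner join, so an invoice with no payments would drop out; a LEFT JOIN would keep it with a NULL Income.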

SQL JOIN, GROUP BY on three tables to get totals

I've inherited the following DB design. Tables are:
My query needs to return invoiceid, the invoice amount (in the invoices table), and the amount due (invoice amount minus any payments that have been made towards the invoice) for a given customernumber. A customer may have multiple invoices.
The following query gives me duplicate records when multiple payments are made to an invoice:
SELECT i.invoiceid, i.amount, i.amount - p.amount AS amountdue
FROM invoices i
LEFT JOIN invoicepayments ip ON i.invoiceid = ip.invoiceid
LEFT JOIN payments p ON ip.paymentid = p.paymentid
LEFT JOIN customers c ON p.customerid = c.customerid
WHERE c.customernumber = '100'
How can I solve this?
I'm not sure I understood you, but this might be what you're looking for:
SELECT i.invoiceid,
sum(case when i.amount is not null then i.amount else 0 end),
sum(case when i.amount is not null then i.amount else 0 end) - sum(case when p.amount is not null then p.amount else 0 end) AS amountdue
FROM invoices i
LEFT JOIN invoicepayments ip ON i.invoiceid = ip.invoiceid
LEFT JOIN payments p ON ip.paymentid = p.paymentid
LEFT JOIN customers c ON p.customerid = c.customerid
WHERE c.customernumber = '100'
GROUP BY i.invoiceid
This sums the amounts in case there are multiple payment rows for each invoice.
Thank you very much for the replies!
Saggi Malachi, that query unfortunately sums the invoice amount once per payment when there is more than one payment. Say a $39 invoice has two payments, of $18 and $12. Rather than ending up with a result that looks like:
1 39.00 9.00
You'll end up with:
1 78.00 48.00
Charles Bretana, in the course of trimming my query down to the simplest possible query, I (stupidly) omitted an additional table, customerinvoices, which links customers and invoices. It can be used to see invoices for which no payments have been made.
After much struggling, I think that the following query returns what I need it to:
SELECT DISTINCT i.invoiceid, i.amount, ISNULL(i.amount - p.amount, i.amount) AS amountdue
FROM invoices i
LEFT JOIN invoicepayments ip ON i.invoiceid = ip.invoiceid
LEFT JOIN customerinvoices ci ON i.invoiceid = ci.invoiceid
LEFT JOIN (
SELECT ip.invoiceid, SUM(p.amount) amount
FROM invoicepayments ip
LEFT JOIN payments p ON ip.paymentid = p.paymentid
GROUP BY ip.invoiceid
) p
ON p.invoiceid = ip.invoiceid
LEFT JOIN payments p2 ON ip.paymentid = p2.paymentid
LEFT JOIN customers c ON ci.customerid = c.customerid
WHERE c.customernumber='100'
Would you guys concur?
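For comparison, here is a sketch of the pre-aggregation approach used elsewhere on this page, applied to this schema without DISTINCT: sum each invoice's payments through invoicepayments first, then LEFT JOIN that total. The column names are inferred from the queries above and the rows are hypothetical; invoice 1 is the $39 invoice paid with $18 and $12, invoice 2 has no payments yet.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers        (customerid INTEGER PRIMARY KEY, customernumber TEXT);
CREATE TABLE invoices         (invoiceid INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE customerinvoices (customerid INTEGER, invoiceid INTEGER);
CREATE TABLE payments         (paymentid INTEGER PRIMARY KEY, customerid INTEGER, amount INTEGER);
CREATE TABLE invoicepayments  (invoiceid INTEGER, paymentid INTEGER);
INSERT INTO customers        VALUES (1, '100');
INSERT INTO invoices         VALUES (1, 39), (2, 50);
INSERT INTO customerinvoices VALUES (1, 1), (1, 2);
INSERT INTO payments         VALUES (1, 1, 18), (2, 1, 12);
INSERT INTO invoicepayments  VALUES (1, 1), (1, 2);
""")

rows = con.execute("""
SELECT i.invoiceid, i.amount, i.amount - coalesce(pp.paid, 0) AS amountdue
FROM customers c
JOIN customerinvoices ci ON ci.customerid = c.customerid
JOIN invoices i ON i.invoiceid = ci.invoiceid
LEFT JOIN (SELECT ip.invoiceid, sum(p.amount) AS paid   -- one row per invoice
           FROM invoicepayments ip
           JOIN payments p ON p.paymentid = ip.paymentid
           GROUP BY ip.invoiceid) pp ON pp.invoiceid = i.invoiceid
WHERE c.customernumber = ?
ORDER BY i.invoiceid
""", ("100",)).fetchall()

print(rows)
```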
I have a tip for those who want to get various aggregated values from the same table.
Let's say I have a table of users and a table of the points the users acquire, so the relationship between them is 1:N (one user, many points records).
In the 'points' table I also store what the user got the points for (login, clicking a banner, etc.). I want to list all users ordered by SUM(points) and then by SUM(points WHERE type = x), that is, ordered by all the points a user has and then by the points the user got for a specific action (e.g. login).
The SQL would be:
SELECT SUM(points.points) AS points_all, SUM(points.points * (points.type = 7)) AS points_login
FROM user
LEFT JOIN points ON user.id = points.user_id
GROUP BY user.id
The beauty of this is in SUM(points.points * (points.type = 7)): in MySQL the comparison in the inner parentheses evaluates to either 0 or 1, so each points value is multiplied by 0 or 1 depending on whether it is the type of points we want.
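The trick can be tried in SQLite, which, like MySQL, evaluates a comparison to 1 or 0. The tables are hypothetical (users instead of user), type 7 stands for login, and an ORDER BY is added to produce the ordering described above. In databases where a comparison cannot be used as a number, such as SQL Server, SUM(CASE WHEN points.type = 7 THEN points.points ELSE 0 END) expresses the same idea.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY);
CREATE TABLE points (user_id INTEGER, points INTEGER, type INTEGER);
INSERT INTO users  VALUES (1), (2), (3);
INSERT INTO points VALUES (1, 10, 7), (1, 5, 3), (1, 2, 7), (2, 20, 3);
""")

rows = con.execute("""
SELECT users.id,
       SUM(points.points) AS points_all,
       SUM(points.points * (points.type = 7)) AS points_login  -- (type = 7) is 1 or 0
FROM users
LEFT JOIN points ON users.id = points.user_id
GROUP BY users.id
ORDER BY points_all DESC, points_login DESC
""").fetchall()

print(rows)
```

User 3 has no points, so both sums are NULL; SQLite treats NULL as smaller than any value, which puts that user last in a descending sort.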
First of all, shouldn't there be a CustomerId in the Invoices table? As it is, you can't perform this query for invoices that have no payments on them yet. If there are no payments on an invoice, that invoice will not even show up in the output of the query, even though it's an outer join, because the WHERE filter on c.customernumber discards the NULL customer that the outer joins produce for it.
Also, when a customer makes a payment, how do you know which invoice to attach it to? If the only way is by the InvoiceId on the stub that arrives with the payment, then you are (perhaps inappropriately) associating invoices with the customer that paid them, rather than with the customer that ordered them. (Sometimes an invoice can be paid by someone other than the customer who ordered the services.)
I know this is late, but it does answer your original question.
/* Read the comments the same way that SQL runs the query
4) My final notes at the bottom
*/
SELECT
    list.invoiceid
  , cust.customernumber
  , MAX(list.inv_amount) AS invoice_amount /* we select the max because it will be the same for each payment to that invoice (presumably invoice amounts do not vary based on payment) */
  , MAX(list.inv_amount) - SUM(list.pay_amount) AS [amount_due]
FROM
    Customers AS cust
INNER JOIN
    Payments AS pay
ON
    pay.customerid = cust.customerid
INNER JOIN ( /* generate a list of payment_ids, their amounts, and the totals of the invoices they billed to */
    SELECT
        inpay.paymentid AS paymentid
      , inv.invoiceid AS invoiceid
      , inv.amount AS inv_amount
      , pay.amount AS pay_amount
    FROM
        InvoicePayments AS inpay
    INNER JOIN
        Invoices AS inv
        ON inv.invoiceid = inpay.invoiceid
    INNER JOIN
        Payments AS pay
        ON pay.paymentid = inpay.paymentid
    ) AS list
ON
    list.paymentid = pay.paymentid
/* so at this point my result set would look like:
-- All my customers (crossed by) every paymentid they are associated to (I'll call this A)
-- Every invoice payment and its association to: its own amount, the total invoice amount, its own paymentid (what I call list)
-- Filter out all records in A that do not have a paymentid matching in (list)
-- we filter the result because there may be payments that did not go towards invoices!
*/
GROUP BY /* we want a record line for each customer and invoice (or basically each invoice, but I believe this makes more sense logically) */
    cust.customernumber
  , list.invoiceid
/*
-- we can improve this query by only hitting the Payments table once by moving it inside of our list subquery,
-- but this is what made sense to me when I was planning.
-- Hopefully leaving it in there makes it clearer how the thought process works.
-- As several people have already pointed out, the data structure of the DB prevents us from looking at customers with invoices that have no payments towards them.
*/
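A trimmed-down sketch of this MAX()/SUM() idea in SQLite, with hypothetical rows: the $39 invoice paid with $18 and $12, plus an unpaid $50 invoice. Because list repeats the invoice amount once per payment, MAX() picks that amount once while SUM() adds up the payments. The unpaid invoice does not appear at all, which is the limitation mentioned in the final comment.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers       (customerid INTEGER PRIMARY KEY, customernumber TEXT);
CREATE TABLE Invoices        (invoiceid INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE Payments        (paymentid INTEGER PRIMARY KEY, customerid INTEGER, amount INTEGER);
CREATE TABLE InvoicePayments (invoiceid INTEGER, paymentid INTEGER);
INSERT INTO Customers       VALUES (1, '100');
INSERT INTO Invoices        VALUES (1, 39), (2, 50);
INSERT INTO Payments        VALUES (1, 1, 18), (2, 1, 12);
INSERT INTO InvoicePayments VALUES (1, 1), (1, 2);
""")

rows = con.execute("""
SELECT list.invoiceid, cust.customernumber,
       MAX(list.inv_amount) AS invoice_amount,
       MAX(list.inv_amount) - SUM(list.pay_amount) AS amount_due
FROM Customers AS cust
INNER JOIN Payments AS pay ON pay.customerid = cust.customerid
INNER JOIN (SELECT inpay.paymentid AS paymentid,
                   inv.invoiceid   AS invoiceid,
                   inv.amount      AS inv_amount,  -- repeated for every payment
                   pay.amount      AS pay_amount
            FROM InvoicePayments AS inpay
            INNER JOIN Invoices AS inv ON inv.invoiceid = inpay.invoiceid
            INNER JOIN Payments AS pay ON pay.paymentid = inpay.paymentid) AS list
  ON list.paymentid = pay.paymentid
GROUP BY cust.customernumber, list.invoiceid
""").fetchall()

print(rows)
```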