Multi-Table Invoice SUM Comparison - sql

Say I have 3 tables in a Rails app:
invoices
id | customer_id | employee_id | notes
---------------------------------------------------------------
1 | 1 | 5 | An order with 2 items.
2 | 12 | 5 | An order with 1 item.
3 | 17 | 12 | An empty order.
4 | 17 | 12 | A brand new order.
invoice_items
id | invoice_id | price | name
---------------------------------------------------------
1 | 1 | 5.35 | widget
2 | 1 | 7.25 | thingy
3 | 2 | 1.25 | smaller thingy
4 | 2 | 1.25 | another smaller thingy
invoice_payments
id | invoice_id | amount | method | notes
---------------------------------------------------------
1 | 1 | 4.85 | credit card | Not enough
2 | 1 | 1.25 | credit card | Still not enough
3 | 2 | 1.25 | check | Paid in full
This represents 4 orders:
The first has 2 items, for a total of 12.60. It has two payments, for a total paid amount of 6.10. This order is partially paid.
The second has only one item, and one payment, both totaling 1.25. This order is paid in full.
The third order has no items or payments. This case matters to us; we do use it sometimes. It is considered paid in full as well.
The final order has one item again, for a total of 1.25, but no payments as of yet.
Now I need a query:
Show me all orders that are not paid in full yet; that is, all orders such that the total of the items is greater than the total of the payments.
I can do it in pure sql:
SELECT invoices.*,
invoice_payment_amounts.amount_paid AS amount_paid,
invoice_item_amounts.total_amount AS total_amount
FROM invoices
LEFT JOIN (
SELECT invoices.id AS invoice_id,
COALESCE(SUM(invoice_payments.amount), 0) AS amount_paid
FROM invoices
LEFT JOIN invoice_payments
ON invoices.id = invoice_payments.invoice_id
GROUP BY invoices.id
) AS invoice_payment_amounts
ON invoices.id = invoice_payment_amounts.invoice_id
LEFT JOIN (
SELECT invoices.id AS invoice_id,
COALESCE(SUM(invoice_items.price), 0) AS total_amount
FROM invoices
LEFT JOIN invoice_items
ON invoices.id = invoice_items.invoice_id
GROUP BY invoices.id
) AS invoice_item_amounts
ON invoices.id = invoice_item_amounts.invoice_id
WHERE amount_paid < total_amount
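A minimal sketch of the pure-SQL approach above, run with Python's built-in sqlite3 module as a stand-in database (an assumption; the question doesn't say which database the Rails app uses) and loaded with the sample rows from the question. Each derived table collapses one child table to exactly one row per invoice before the join:

```python
import sqlite3

# In-memory SQLite database standing in for the app's real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_id INTEGER, employee_id INTEGER, notes TEXT);
CREATE TABLE invoice_items (id INTEGER PRIMARY KEY, invoice_id INTEGER, price REAL, name TEXT);
CREATE TABLE invoice_payments (id INTEGER PRIMARY KEY, invoice_id INTEGER, amount REAL, method TEXT, notes TEXT);
INSERT INTO invoices VALUES
  (1, 1, 5, 'An order with 2 items.'), (2, 12, 5, 'An order with 1 item.'),
  (3, 17, 12, 'An empty order.'), (4, 17, 12, 'A brand new order.');
INSERT INTO invoice_items VALUES
  (1, 1, 5.35, 'widget'), (2, 1, 7.25, 'thingy'),
  (3, 2, 1.25, 'smaller thingy'), (4, 2, 1.25, 'another smaller thingy');
INSERT INTO invoice_payments VALUES
  (1, 1, 4.85, 'credit card', 'Not enough'),
  (2, 1, 1.25, 'credit card', 'Still not enough'),
  (3, 2, 1.25, 'check', 'Paid in full');
""")

# One row per invoice from each derived table, so the two joins cannot
# multiply each other's rows; COALESCE turns "no children" into 0.
unpaid = conn.execute("""
SELECT invoices.id, p.amount_paid, i.total_amount
FROM invoices
LEFT JOIN (
  SELECT invoices.id AS invoice_id, COALESCE(SUM(invoice_payments.amount), 0) AS amount_paid
  FROM invoices LEFT JOIN invoice_payments ON invoices.id = invoice_payments.invoice_id
  GROUP BY invoices.id
) AS p ON invoices.id = p.invoice_id
LEFT JOIN (
  SELECT invoices.id AS invoice_id, COALESCE(SUM(invoice_items.price), 0) AS total_amount
  FROM invoices LEFT JOIN invoice_items ON invoices.id = invoice_items.invoice_id
  GROUP BY invoices.id
) AS i ON invoices.id = i.invoice_id
WHERE p.amount_paid < i.total_amount
ORDER BY invoices.id
""").fetchall()

for invoice_id, amount_paid, total_amount in unpaid:
    print(invoice_id, round(amount_paid, 2), round(total_amount, 2))
# prints: 1 6.1 12.6
#         2 1.25 2.5
```

Invoices 3 and 4 have neither items nor payments in the sample data, so both sides are 0 and they are correctly left out.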
But... now I need to get that into Rails (probably as a scope). I can use find_by_sql, but that returns an array rather than an ActiveRecord::Relation, which is not what I need: I want to chain it with other scopes (for example, there is an overdue scope that uses this one).
So raw SQL probably isn't the right way to go here... but what is? I haven't been able to express this in ActiveRecord's query interface.
The closest I've gotten so far was this:
Invoice.select('invoices.*, SUM(invoice_items.price) AS total, SUM(invoice_payments.amount) AS amount_paid').
joins(:invoice_payments, :invoice_items).
group('invoices.id').
where('amount_paid < total')
But that fails: on orders like #1, which have multiple payments, the multiple joins repeat each item once per payment, which incorrectly doubles the price of the order and shows it as still unpaid. I had the same problem in SQL, which is why I structured it the way I did.
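A small way to see the row multiplication, sketched with Python's built-in sqlite3 and only invoice #1's rows (the tables are trimmed to the columns that matter, an assumption made for brevity). The real totals are 12.60 and 6.10:

```python
import sqlite3

# Only invoice #1's child rows, trimmed to the columns that matter here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice_items (invoice_id INTEGER, price REAL);
CREATE TABLE invoice_payments (invoice_id INTEGER, amount REAL);
INSERT INTO invoice_items VALUES (1, 5.35), (1, 7.25);
INSERT INTO invoice_payments VALUES (1, 4.85), (1, 1.25);
""")

# Joining both child tables on invoice_id yields 2 items x 2 payments = 4 rows,
# so every price and every payment amount is counted twice in the sums.
joined_rows, total, amount_paid = conn.execute("""
SELECT COUNT(*), SUM(i.price), SUM(p.amount)
FROM invoice_items AS i
JOIN invoice_payments AS p ON p.invoice_id = i.invoice_id
WHERE i.invoice_id = 1
""").fetchone()

print(joined_rows, round(total, 2), round(amount_paid, 2))  # 4 25.2 12.2
```

In general each price is repeated once per payment and each payment once per item, so the two sums are inflated by different factors whenever the counts differ.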
Any thoughts here?

You can get your results using MySQL's GROUP BY and HAVING clauses:
Pure MySQL Query:
SELECT `invoices`.* FROM `invoices`
INNER JOIN `invoice_items` ON
`invoice_items`.`invoice_id` = `invoices`.`id`
INNER JOIN `invoice_payments` ON
`invoice_payments`.`invoice_id` = `invoices`.`id`
GROUP BY invoices.id
HAVING sum(invoice_items.price) > sum(invoice_payments.amount)
ActiveRecord Query:
Invoice.joins(:invoice_items, :invoice_payments).group("invoices.id").having("sum(invoice_items.price) > sum(invoice_payments.amount)")

When building more complex queries in Rails, Arel usually comes in handy:
Arel is a SQL AST manager for Ruby. It simplifies the generation of complex SQL queries, and adapts to various RDBMSes.
Here is a sample of how an Arel implementation could look, based on your requirements:
invoice_table = Invoice.arel_table
# Define invoice_payment_amounts
payment_arel_table = InvoicePayment.arel_table
invoice_payment_amounts = Arel::Table.new(:invoice_payment_amounts)
payment_cte = Arel::Nodes::As.new(
invoice_payment_amounts,
payment_arel_table
.project(payment_arel_table[:invoice_id],
payment_arel_table[:amount].sum.as("amount_paid"))
.group(payment_arel_table[:invoice_id])
)
# Define invoice_item_amounts
item_arel_table = InvoiceItem.arel_table
invoice_item_amounts = Arel::Table.new(:invoice_item_amounts)
item_cte = Arel::Nodes::As.new(
invoice_item_amounts,
item_arel_table
.project(item_arel_table[:invoice_id],
item_arel_table[:price].sum.as("total"))
.group(item_arel_table[:invoice_id])
)
# Define main query
query = invoice_table
.project(
invoice_table[Arel.sql('*')],
invoice_payment_amounts[:amount_paid],
invoice_item_amounts[:total]
)
.join(invoice_payment_amounts).on(
invoice_table[:id].eq(invoice_payment_amounts[:invoice_id])
)
.join(invoice_item_amounts).on(
invoice_table[:id].eq(invoice_item_amounts[:invoice_id])
)
.where(invoice_item_amounts[:total].gt(invoice_payment_amounts[:amount_paid]))
.with(payment_cte, item_cte)
res = Invoice.find_by_sql(query.to_sql)
for r in res do
puts "---- Invoice #{r.id} -----"
p r
puts "total: #{r[:total]}"
puts "amount_paid: #{r[:amount_paid]}"
puts "----"
end
This returns the same output as your SQL query on the sample data you provided in the question.
Output:
---- Invoice 2 -----
<Invoice id: 2, notes: "An order with 1 items.", created_at: "2017-12-18 21:15:47", updated_at: "2017-12-18 21:15:47">
total: 2.5
amount_paid: 1.25
----
---- Invoice 1 -----
<Invoice id: 1, notes: "An order with 2 items.", created_at: "2017-12-18 21:15:47", updated_at: "2017-12-18 21:15:47">
total: 12.6
amount_paid: 6.1
----
Arel is quite flexible, so you can use this as a base and refine the query conditions to fit any more specific requirements you might have.
I would strongly recommend considering cache columns (total, amount_paid) on the invoices table, kept up to date as items and payments change, so you can avoid this complex query. At least the total column would be quite simple to add and backfill with data.
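In a Rails app the cache columns would more likely be maintained from model callbacks; the sketch below swaps in SQLite triggers instead, purely to show the idea in a self-contained way (trigger names are hypothetical, and UPDATEs on the child tables are not handled). Each trigger recomputes the sum for the affected invoice rather than adding and subtracting, which avoids floating-point drift:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (
  id INTEGER PRIMARY KEY,
  total REAL NOT NULL DEFAULT 0,       -- cached SUM(invoice_items.price)
  amount_paid REAL NOT NULL DEFAULT 0  -- cached SUM(invoice_payments.amount)
);
CREATE TABLE invoice_items (id INTEGER PRIMARY KEY, invoice_id INTEGER, price REAL);
CREATE TABLE invoice_payments (id INTEGER PRIMARY KEY, invoice_id INTEGER, amount REAL);

-- Recompute the cached sum for the affected invoice after every insert/delete.
CREATE TRIGGER items_ins AFTER INSERT ON invoice_items BEGIN
  UPDATE invoices SET total = (SELECT COALESCE(SUM(price), 0) FROM invoice_items
                               WHERE invoice_id = NEW.invoice_id)
  WHERE id = NEW.invoice_id;
END;
CREATE TRIGGER items_del AFTER DELETE ON invoice_items BEGIN
  UPDATE invoices SET total = (SELECT COALESCE(SUM(price), 0) FROM invoice_items
                               WHERE invoice_id = OLD.invoice_id)
  WHERE id = OLD.invoice_id;
END;
CREATE TRIGGER payments_ins AFTER INSERT ON invoice_payments BEGIN
  UPDATE invoices SET amount_paid = (SELECT COALESCE(SUM(amount), 0) FROM invoice_payments
                                     WHERE invoice_id = NEW.invoice_id)
  WHERE id = NEW.invoice_id;
END;
CREATE TRIGGER payments_del AFTER DELETE ON invoice_payments BEGIN
  UPDATE invoices SET amount_paid = (SELECT COALESCE(SUM(amount), 0) FROM invoice_payments
                                     WHERE invoice_id = OLD.invoice_id)
  WHERE id = OLD.invoice_id;
END;

INSERT INTO invoices (id) VALUES (1), (2), (3), (4);
INSERT INTO invoice_items (invoice_id, price) VALUES (1, 5.35), (1, 7.25), (2, 1.25), (2, 1.25);
INSERT INTO invoice_payments (invoice_id, amount) VALUES (1, 4.85), (1, 1.25), (2, 1.25);
""")

# With the cached columns, "not paid in full" is a plain single-table filter.
unpaid_ids = [row[0] for row in conn.execute(
    "SELECT id FROM invoices WHERE total > amount_paid ORDER BY id")]
print(unpaid_ids)  # [1, 2]
```

Because the filter is now on plain columns of invoices, it could be written as an ordinary chainable `where` scope.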

Related

SQL - Identify if a user is present every month

I am performing some data analysis on users who have made transactions over the course of three months.
What I would like to do is identify customers who made specific transaction types (Credit) in every single month present in the data table over that period. As you can see in the data table below, User A has performed a Credit transaction in months 1, 2 and 3, and I would like a flag saying "Frequent" applied to the customer.
User B, however, has not performed a credit transaction every month (month 2 was Debit), and so I would like them to have a different flag name (e.g. "Infrequent").
How can I use SQL to identify if a user has made a specific transaction type each month?
| Date | User | Amount | Transaction Type | **Flag ** |
| 2022-01-15 | A | $15.00 | Credit | **Flag ** |
...
| 2022-02-15 | A | $15.00 | Credit | **Flag ** |
...
| 2022-03-15 | A | $15.00 | Credit | **Flag ** |
...
...
| 2022-01-15 | B | $15.00 | Credit | **Flag ** |
...
| 2022-02-15 | B | $15.00 | Debit | **Flag ** |
...
| 2022-03-15 | B | $15.00 | Credit | **Flag ** |
I have tried the following, but I'm hoping there is a better or simpler way:
SELECT
Date, User, Amount, Transaction_Type,
CASE WHEN Count(present) = 3 THEN 'Frequent' ELSE 'Infrequent' END AS Flag
FROM Transactions
LEFT JOIN (
SELECT
User,Month(Date) AS TxMonth,Count(Transaction_Type) as present
FROM
Transactions
WHERE
Transaction_Type = 'Credit'
GROUP BY
User,Month(Date)
Having
Count(Transaction_Type) > 0
) subquery
ON subquery.User = Transactions.User
GROUP BY
Date,User,Amount,Transaction_Type
That is the way I would approach it. Assuming you are using T-SQL, I would make the following changes. Instead of having the LEFT JOIN go to a sub-query, I would make the sub-query a CTE and then join to that. I find it easier to grok when the main query is not full of sub-queries, you can test the CTE on its own more easily, and if performance becomes an issue it is relatively trivial to convert the CTE to a temp table without affecting the main query too much.
You have a couple of problems, I think. The first is that your subquery is going to return the count of the credits in each month. If I make 3 credits in January, this is going to flag me as frequent because the count reaches 3. You probably want to do a
COUNT(DISTINCT Transaction_type) AS hasCredit
to identify whether there is AT LEAST ONE credit transaction in a month, then another aggregation,
SUM(hasCredit)
to get the number of months in which a credit appears.
Using nested sub-queries means your LEFT JOIN would now be two sub-queries deep and disappearing off the right-hand side of your screen. Writing them as CTEs keeps the main logic clean and the script narrow.
I think this does what you need, but I can't test it because I don't have any sample data.
WITH CTE_HasCredit AS
(
SELECT
User
,Month(Date) AS [TransactionMonth]
,Count(DISTINCT Transaction_Type) AS [hasCredit]
FROM
Transactions
WHERE
Transaction_Type = 'Credit'
GROUP BY
User
,Month(Date)
Having
Count(Transaction_Type) > 0
)
,
CTE_isFrequent AS
(
SELECT
User
,SUM(hasCredit) AS [TotalCredits]
FROM
CTE_HasCredit
GROUP BY
User
)
SELECT
TXN.Date
, TXN.User
, TXN.Amount
, TXN.Transaction_Type
,CASE
WHEN FRQ.TotalCredits >= 3 THEN 'Frequent'
ELSE 'Infrequent'
END AS [customerType]
FROM
Transactions AS TXN
LEFT JOIN
CTE_isFrequent AS FRQ ON FRQ.User = TXN.User
GROUP BY
TXN.Date
,TXN.User
,TXN.Amount
,TXN.Transaction_Type
,FRQ.TotalCredits
I don't think you need the GROUP BY on the main query either; it would de-dupe transactions for the same day for the same amount.
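You might also want to look at the syntax for COUNT() OVER(). It would allow you to do the calculations in the main query and would look something like the following (note that SQL Server rejects DISTINCT inside a windowed aggregate, so the COUNT(DISTINCT ...) part would need reworking before it runs):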
,CASE
WHEN COUNT(DISTINCT TXN.Transaction_Type) OVER(PARTITION BY User, MONTH(TXN.Date),TXN.Transaction_Type) >=3 THEN 'Frequent'
ELSE 'Infrequent'
END AS [customerType2]
This second way would give you customer type for both the Debits and Credits. I am not aware of a way to filter the COUNT() OVER() to just Credits, for that you would need to use the CTE method.
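A minimal sketch of the CTE method, run with Python's sqlite3 against the six rows visible in the question's table. Assumptions: the columns are renamed TxDate and TxUser (hypothetical names, because USER is a reserved word in T-SQL), SQLite's strftime('%Y-%m', ...) stands in for MONTH() so that January of one year is not merged with January of the next, and a user is "Frequent" when their count of credit months equals the number of months present in the data rather than a hard-coded 3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (TxDate TEXT, TxUser TEXT, Amount REAL, Transaction_Type TEXT);
INSERT INTO Transactions VALUES
  ('2022-01-15', 'A', 15.00, 'Credit'),
  ('2022-02-15', 'A', 15.00, 'Credit'),
  ('2022-03-15', 'A', 15.00, 'Credit'),
  ('2022-01-15', 'B', 15.00, 'Credit'),
  ('2022-02-15', 'B', 15.00, 'Debit'),
  ('2022-03-15', 'B', 15.00, 'Credit');
""")

query = """
WITH all_months AS (
  -- every distinct year-month that appears anywhere in the data
  SELECT COUNT(DISTINCT strftime('%Y-%m', TxDate)) AS n FROM Transactions
), credit_months AS (
  -- distinct year-months in which each user has at least one Credit
  SELECT TxUser, COUNT(DISTINCT strftime('%Y-%m', TxDate)) AS n
  FROM Transactions
  WHERE Transaction_Type = 'Credit'
  GROUP BY TxUser
)
SELECT t.TxDate, t.TxUser, t.Amount, t.Transaction_Type,
       CASE WHEN c.n = (SELECT n FROM all_months) THEN 'Frequent'
            ELSE 'Infrequent' END AS Flag
FROM Transactions AS t
LEFT JOIN credit_months AS c ON c.TxUser = t.TxUser
ORDER BY t.TxUser, t.TxDate
"""

# Every row of a given user carries the same flag, so collapse to one per user.
flags = {user: flag for _, user, _, _, flag in conn.execute(query)}
print(flags)  # {'A': 'Frequent', 'B': 'Infrequent'}
```

Counting distinct months directly replaces the hasCredit / SUM(hasCredit) two-step; a user with no credits at all gets NULL from the LEFT JOIN and falls through to 'Infrequent'.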

Create multiple filtered result sets of a joined table for use in aggregate functions

I have a (heavily simplified) orders table, total being the dollar amount, containing:
| id | client_id | type | total |
|----|-----------|--------|-------|
| 1 | 1 | sale | 100 |
| 2 | 1 | refund | 100 |
| 3 | 1 | refund | 100 |
And clients table containing:
| id | name |
|----|------|
| 1 | test |
I am attempting to create a per-client breakdown of metrics such as the number of sales and refunds, the sum of sales, the sum of refunds, etc.
To do this, I am querying the clients table and joining the orders table. The orders table contains both sales and refunds, specified by the type column.
My idea was to join the orders twice using subqueries and create aliases for those filtered tables. The aliases would then be used in aggregate functions to find the sum, average, etc. I have tried many variations of joining the orders table twice to achieve this, but they all produce the same incorrect results. This query demonstrates the idea:
SELECT
clients.*,
SUM(sales.total) as total_sales,
SUM(refunds.total) as total_refunds,
AVG(sales.total) as avg_ticket,
COUNT(sales.*) as num_of_sales
FROM clients
LEFT JOIN (SELECT * FROM orders WHERE type = 'sale') as sales
ON sales.client_id = clients.id
LEFT JOIN (SELECT * FROM orders WHERE type = 'refund') as refunds
ON refunds.client_id = clients.id
GROUP BY clients.id
Result:
| id | name | total_sales | total_refunds | avg_ticket | num_of_sales |
|----|------|-------------|---------------|------------|--------------|
| 1 | test | 200 | 200 | 100 | 2 |
Expected result:
| id | name | total_sales | total_refunds | avg_ticket | num_of_sales |
|----|------|-------------|---------------|------------|--------------|
| 1 | test | 100 | 200 | 100 | 1 |
When the second join is included in the query, the rows returned by the first join are repeated for every matching row of the second join, i.e. they are multiplied by the number of rows in the second join. It's clear my understanding of joining and/or subqueries is incomplete.
I understand that I can filter the orders table with each aggregate function. This produces correct results but seems inefficient:
SELECT
clients.*,
SUM(orders.total) FILTER (WHERE type = 'sale') as total_sales,
SUM(orders.total) FILTER (WHERE type = 'refund') as total_refunds,
AVG(orders.total) FILTER (WHERE type = 'sale') as avg_ticket,
COUNT(orders.*) FILTER (WHERE type = 'sale') as num_of_sales
FROM clients
LEFT JOIN orders
on orders.client_id = clients.id
GROUP BY clients.id
What is the appropriate way to create filtered and aliased versions of this joined table?
Also, what exactly is happening in my initial query, where the two subqueries are joined? I would expect them to be treated as separate subsets even though they are operating on the same (orders) table.
You should do the (filtered) aggregation once for all aggregates you want, and then join to the result of that. As your aggregation doesn't need any columns from the clients table, this can be done in a derived table. This is also typically faster than grouping the result of the join.
SELECT clients.*,
o.total_sales,
o.total_refunds,
o.avg_ticket,
o.num_of_sales
FROM clients
LEFT JOIN (
select client_id,
SUM(total) FILTER (WHERE type = 'sale') as total_sales,
SUM(total) FILTER (WHERE type = 'refund') as total_refunds,
AVG(total) FILTER (WHERE type = 'sale') as avg_ticket,
COUNT(*) FILTER (WHERE type = 'sale') as num_of_sales
from orders
group by client_id
) o on o.client_id = clients.id
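A sketch of both queries side by side, using Python's sqlite3 and the question's three orders. Assumptions: COUNT(sales.id) stands in for PostgreSQL's COUNT(sales.*), and `CASE WHEN ... THEN x END` (which yields NULL for non-matching rows, ignored by SUM, AVG and COUNT) is used as the portable equivalent of the FILTER clause:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, client_id INTEGER, type TEXT, total REAL);
INSERT INTO clients VALUES (1, 'test');
INSERT INTO orders VALUES (1, 1, 'sale', 100), (2, 1, 'refund', 100), (3, 1, 'refund', 100);
""")

# Original idea: 1 sale row x 2 refund rows = 2 joined rows, so the sale is counted twice.
naive = conn.execute("""
SELECT SUM(sales.total), SUM(refunds.total), COUNT(sales.id)
FROM clients
LEFT JOIN (SELECT * FROM orders WHERE type = 'sale') AS sales ON sales.client_id = clients.id
LEFT JOIN (SELECT * FROM orders WHERE type = 'refund') AS refunds ON refunds.client_id = clients.id
GROUP BY clients.id
""").fetchone()

# Answer's approach: aggregate orders once per client in a derived table, then join.
fixed = conn.execute("""
SELECT clients.id, clients.name, o.total_sales, o.total_refunds, o.avg_ticket, o.num_of_sales
FROM clients
LEFT JOIN (
  SELECT client_id,
         SUM(CASE WHEN type = 'sale' THEN total END) AS total_sales,
         SUM(CASE WHEN type = 'refund' THEN total END) AS total_refunds,
         AVG(CASE WHEN type = 'sale' THEN total END) AS avg_ticket,
         COUNT(CASE WHEN type = 'sale' THEN 1 END) AS num_of_sales
  FROM orders
  GROUP BY client_id
) AS o ON o.client_id = clients.id
""").fetchone()

print(naive)  # (200.0, 200.0, 2)
print(fixed)  # (1, 'test', 100.0, 200.0, 100.0, 1)
```

The naive result reproduces the question's wrong row; the derived table produces exactly one row per client before the join, so nothing can be duplicated.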

How do you use two SUM() aggregate functions in the same query for PostgreSQL?

I have a PostgreSQL query that yields the following results:
SELECT o.order || '-' || osh.ordinal_number AS order,
o.company,
o.order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
o.company,
o.order_total,
o.order_type;
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | null | 125.00 | new
123-2 | B corp. | null | 100.00 | new
I need to replace the o.order_total (it doesn't work properly) and sum up the sum of the order_shipment_total column so that, for the example above, each row winds up saying 225.00. I need the results above to look like this below:
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | 225.00 | 125.00 | new
123-2 | B corp. | 225.00 | 100.00 | new
What I've Tried
1.) To replace o.order_total, I've tried SUM(SUM(osh.items)) but get the error message that you cannot nest aggregate functions.
2.) I've tried to put the entire query as a subquery and sum the order_shipment_total column, but when I do, it just repeats the column itself. See below:
SELECT order,
company,
SUM(order_shipment_total) AS order_shipment_total,
order_shipment_total,
order_type
FROM (
SELECT o.order || '-' || osh.ordinal_number AS order,
o.company,
o.order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
o.company,
o.order_total,
o.order_type
) subquery
GROUP BY order,
company,
order_shipment_total,
order_type;
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | 125.00 | 125.00 | new
123-2 | B corp. | 100.00 | 100.00 | new
3.) I've tried to include only the columns I actually want to group by in my subquery/query example above, because I feel like I was able to do this in Oracle SQL. But when I do that, I get an error saying "column [name] must appear in the GROUP BY clause or be used in an aggregate function."
...
GROUP BY order,
company,
order_type;
ERROR: column "[a column name]" must appear in the GROUP BY clause or be used in an aggregate function.
How do I accomplish this? I was certain that a subquery would be the answer but I'm confused as to why this approach will not work.
The thing you're not quite grasping with your query / approach is that you're actually wanting two different levels of grouping in the same query row results. The subquery approach is half right, but when you do a subquery that groups, inside another query that groups you can only use the data you've already got (from the subquery) and you can only choose to keep it at the level of aggregate detail it already is, or you can choose to lose precision in favor of grouping more. You can't keep the detail AND lose the detail in order to sum up further. A query-of-subquery is hence (in practical terms) relatively senseless because you might as well group to the level you want in one hit:
SELECT groupkey1, sum(sumx) FROM
(SELECT groupkey1, groupkey2, sum(x) as sumx FROM table GROUP BY groupkey1, groupkey2) AS sub
GROUP BY groupkey1
Is the same as:
SELECT groupkey1, sum(x) FROM
table
GROUP BY groupkey1
Gordon's answer will probably work out (except for the same bug yours exhibits, in that the grouping set is wrong/doesn't cover all the columns), but it probably doesn't help much in terms of your understanding because it's a code-only answer. Here's a breakdown of how you need to approach this problem, but with simpler data and forgoing the window functions in favor of what you already know.
Suppose there are apples and melons, of different types, in stock. You want a query that gives a total of each specific kind of fruit, regardless of the date of purchase. You also want a column with the total for each overall fruit type:
Detail:
fruit | type | purchasedate | count
apple | golden delicious | 2017-01-01 | 3
apple | golden delicious | 2017-01-02 | 4
apple | granny smith | 2017-01-04 | 2
melon | honeydew | 2017-01-01 | 1
melon | cantaloupe | 2017-01-05 | 4
melon | cantaloupe | 2017-01-06 | 2
So that's 7 golden delicious, 2 granny smith, 1 honeydew, 6 cantaloupe, and it's also 9 apples and 7 melons.
You can't do it as one query*, because you want two different levels of grouping. You have to do it as two queries and then (critical understanding point) you have to join the less-precise (apples/melons) results back to the more precise (granny smith/golden delicious/honeydew/cantaloupe) results:
SELECT * FROM
(
SELECT fruit, type, sum(count) as fruittypecount
FROM fruit
GROUP BY fruit, type
) fruittypesum
INNER JOIN
(
SELECT fruit, sum(count) as fruitcount
FROM fruit
GROUP BY fruit
) fruitsum
ON
fruittypesum.fruit = fruitsum.fruit
You'll get this:
fruit | type | fruittypecount | fruit | fruitcount
apple | golden delicious | 7 | apple | 9
apple | granny smith | 2 | apple | 9
melon | honeydew | 1 | melon | 7
melon | cantaloupe | 6 | melon | 7
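A runnable sketch of the two-level grouping on the fruit data above, using Python's sqlite3 (the table name fruit is taken from the answer; an ORDER BY is added so the row order is deterministic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fruit (fruit TEXT, type TEXT, purchasedate TEXT, count INTEGER);
INSERT INTO fruit VALUES
  ('apple', 'golden delicious', '2017-01-01', 3),
  ('apple', 'golden delicious', '2017-01-02', 4),
  ('apple', 'granny smith', '2017-01-04', 2),
  ('melon', 'honeydew', '2017-01-01', 1),
  ('melon', 'cantaloupe', '2017-01-05', 4),
  ('melon', 'cantaloupe', '2017-01-06', 2);
""")

# Detail level (fruit, type) joined to summary level (fruit) on the shared key.
rows = conn.execute("""
SELECT fruittypesum.fruit, fruittypesum.type, fruittypesum.fruittypecount, fruitsum.fruitcount
FROM (
  SELECT fruit, type, SUM(count) AS fruittypecount FROM fruit GROUP BY fruit, type
) AS fruittypesum
INNER JOIN (
  SELECT fruit, SUM(count) AS fruitcount FROM fruit GROUP BY fruit
) AS fruitsum ON fruittypesum.fruit = fruitsum.fruit
ORDER BY fruittypesum.fruit, fruittypesum.type
""").fetchall()

for row in rows:
    print(row)
# ('apple', 'golden delicious', 7, 9)
# ('apple', 'granny smith', 2, 9)
# ('melon', 'cantaloupe', 6, 7)
# ('melon', 'honeydew', 1, 7)
```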
Hence, for your query, group at two different levels, detail and summary:
SELECT
detail.order || '-' || detail.ordinal_number as order,
detail.company,
summary.order_total,
detail.order_shipment_total,
detail.order_type
FROM (
SELECT o.order,
osh.ordinal_number,
o.company,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
osh.ordinal_number,
o.company,
o.order_type
) detail
INNER JOIN
(
SELECT o.order,
SUM(osh.items) AS order_total
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
--don't need the where clause; we'll join on order number
GROUP BY o.order
) summary
ON
summary.order = detail.order
Gordon's query uses a window function to achieve the same effect; the window function runs after the grouping is done, and it establishes another level of grouping (PARTITION BY ordernumber), which is the effective equivalent of my GROUP BY ordernumber in the summary. The window function summary data is inherently connected to the detail data via ordernumber; it is implicit that a query saying:
SELECT
ordernumber,
lineitemnumber,
SUM(amount) linetotal,
sum(SUM(amount)) over(PARTITION BY ordernumber) ordertotal
FROM lineitems -- hypothetical table name
GROUP BY
ordernumber,
lineitemnumber
...will have an ordertotal that is the total of all the linetotal values in the order: the GROUP BY prepares the data to the line-level detail, and the window function prepares data to just the order level, repeating the total as many times as necessary to fill in every line item. I wrote the SUM that belongs to the GROUP BY operation in capitals; the sum in lowercase belongs to the partition operation. It has to be sum(SUM()) and cannot simply be sum(amount), because amount as a column is not allowed on its own: it's not in the GROUP BY. Because amount has to be SUMmed for the GROUP BY to work, we have to sum(SUM()) for the partition to run (it runs after the GROUP BY is done).
It behaves exactly the same as grouping to two different levels and joining them together, and indeed I chose that way to explain it because it makes it clearer how it works in relation to what you already know about groups and joins.
Remember: JOINs make datasets grow sideways, UNIONs make them grow downwards. When you have some detail data and you want to grow it sideways with some more data (a summary), JOIN it on. (If you'd wanted totals to go at the bottom of each column, they would be UNIONed on.)
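The same fruit data can illustrate the sum(SUM()) pattern. A sketch with Python's sqlite3, assuming SQLite 3.25 or newer for window-function support: the uppercase SUM runs per (fruit, type) group, and the lowercase sum then runs over those group results, partitioned by fruit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fruit (fruit TEXT, type TEXT, purchasedate TEXT, count INTEGER);
INSERT INTO fruit VALUES
  ('apple', 'golden delicious', '2017-01-01', 3),
  ('apple', 'golden delicious', '2017-01-02', 4),
  ('apple', 'granny smith', '2017-01-04', 2),
  ('melon', 'honeydew', '2017-01-01', 1),
  ('melon', 'cantaloupe', '2017-01-05', 4),
  ('melon', 'cantaloupe', '2017-01-06', 2);
""")

# GROUP BY produces one row per (fruit, type); the window function then sums
# those per-type totals across each fruit without collapsing the rows.
rows = conn.execute("""
SELECT fruit, type,
       SUM(count) AS fruittypecount,
       sum(SUM(count)) OVER (PARTITION BY fruit) AS fruitcount
FROM fruit
GROUP BY fruit, type
ORDER BY fruit, type
""").fetchall()

for row in rows:
    print(row)
```

It should produce the same four rows as the join-based version, with no second pass over the table.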
*you can do it as one query (without window functions), but it can get awfully confusing because it requires all sorts of trickery that ultimately isn't worth it because it's too hard to maintain
You should be able to use window functions:
SELECT o.order || '-' || osh.ordinal_number AS order, o.company,
SUM(SUM(osh.items)) OVER (PARTITION BY o.order) as order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o JOIN
order_shipments osh
ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order, o.company, o.order_type;

SQL 2 Left outer joins with Sum and Group By

Looking for some guidance on this. I am attempting to run a report in my complaint management system: complaints by year, location and subcategory, showing totals for TotalCredits (child table) and TotalCwts (child table) as well as the total ExternalRootCause (on the master table).
This is my SQL, but TotalCwts and TotalCredits are not being calculated correctly. They are calculated once for each combination of child records rather than as a total for each master record.
SELECT
dbo.Complaints.Location,
YEAR(dbo.Complaints.ComDate) AS Year,
dbo.Complaints.ComplaintSubcategory,
COUNT(Distinct(dbo.Complaints.ComId)) AS CustomerComplaints,
SUM(DISTINCT CASE WHEN (dbo.Complaints.RootCauseSource = 'External' ) THEN 1 ELSE 0 END) as ExternalRootCause,
SUM(dbo.ComplaintProducts.Cwts) AS TotalCwts,
Coalesce(SUM(dbo.CreditDeductions.CreditAmount),0) AS TotalCredits
FROM dbo.Complaints
JOIN dbo.CustomerComplaints
ON dbo.Complaints.ComId = dbo.CustomerComplaints.ComId
LEFT OUTER JOIN dbo.CreditDeductions
ON dbo.Complaints.ComId = dbo.CreditDeductions.ComId
LEFT OUTER JOIN dbo.ComplaintProducts
ON dbo.Complaints.ComId = dbo.ComplaintProducts.ComId
WHERE
dbo.Complaints.Location = Coalesce(@Location,Location)
GROUP BY
YEAR(dbo.Complaints.ComDate),
dbo.Complaints.Location,
dbo.Complaints.ComplaintSubcategory
ORDER BY
[YEAR] desc,
dbo.Complaints.Location,
dbo.Complaints.ComplaintSubcategory
Data Results
Location | Year | Subcategory | Complaints | External RC | Total Cwts | Total Credits
---------------------------------------------------------------------------------------
Boston | 2016 | Documentation | 1 | 0 | 8 | 8.00
Data Should Read
Location | Year | Subcategory | Complaints | External RC | Total Cwts | Total Credits
---------------------------------------------------------------------------------------
Boston | 2016 | Documentation | 1 | 0 | 4 | 2.00
The data above reflects 1 complaint having 4 product records with 1 cwt each and 2 credit records of 1.00 each.
What do I need to change in my query or should I approach this query a different way?
The problem is that the 1 complaint has 2 Deductions and 4 products. When you join in this manner then it will return every combination of Deduction/Product for the complaint which gives 8 rows as you're seeing.
One solution, which should work here, is to not query the Deduction and Product tables directly; instead, query a subquery that returns one row per table per complaint. In other words, replace:
LEFT OUTER JOIN dbo.CreditDeductions ON dbo.Complaints.ComId = dbo.CreditDeductions.ComId
LEFT OUTER JOIN dbo.ComplaintProducts ON dbo.Complaints.ComId = dbo.ComplaintProducts.ComId
...with this (showing the Deductions table only; you can work out the Products one the same way):
LEFT OUTER JOIN (
select ComId, count(*) CountDeductions, sum(CreditAmount) CreditAmount
from dbo.CreditDeductions
group by ComId
) d on d.ComId = Complaints.ComId
You'll have to change the references to dbo.CreditDeductions to just d (or whatever you want to call it).
Once you've done both, you'll have one row per sub-table per complaint, which results in 1 row per complaint containing the counts and totals from the two sub-tables.
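A sketch of the fix applied to both child tables, run with Python's sqlite3 on the complaint described in the question (4 product rows of 1 cwt, 2 credit rows of 1.00). The tables are trimmed to the columns involved and the Year/Subcategory grouping is left out for brevity; both are simplifying assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Complaints (ComId INTEGER PRIMARY KEY, Location TEXT);
CREATE TABLE ComplaintProducts (ComId INTEGER, Cwts REAL);
CREATE TABLE CreditDeductions (ComId INTEGER, CreditAmount REAL);
INSERT INTO Complaints VALUES (1, 'Boston');
INSERT INTO ComplaintProducts VALUES (1, 1), (1, 1), (1, 1), (1, 1);
INSERT INTO CreditDeductions VALUES (1, 1.00), (1, 1.00);
""")

# Joining both child tables directly: 4 products x 2 credits = 8 rows.
naive = conn.execute("""
SELECT SUM(p.Cwts), SUM(d.CreditAmount)
FROM Complaints AS c
LEFT JOIN CreditDeductions AS d ON c.ComId = d.ComId
LEFT JOIN ComplaintProducts AS p ON c.ComId = p.ComId
""").fetchone()

# Pre-aggregate each child table to one row per complaint, then join.
fixed = conn.execute("""
SELECT c.Location,
       COALESCE(SUM(p.TotalCwts), 0) AS TotalCwts,
       COALESCE(SUM(d.CreditAmount), 0) AS TotalCredits
FROM Complaints AS c
LEFT JOIN (
  SELECT ComId, SUM(Cwts) AS TotalCwts FROM ComplaintProducts GROUP BY ComId
) AS p ON p.ComId = c.ComId
LEFT JOIN (
  SELECT ComId, COUNT(*) AS CountDeductions, SUM(CreditAmount) AS CreditAmount
  FROM CreditDeductions GROUP BY ComId
) AS d ON d.ComId = c.ComId
GROUP BY c.Location
""").fetchone()

print(naive)  # (8.0, 8.0)
print(fixed)  # ('Boston', 4.0, 2.0)
```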

SQL select only highest date

For a project I want to generate a price list.
I want to get only the latest prices from each supplier for each article.
There are just those two tables.
Table articles
ARTNR | TXT | ACTIVE | SUPPLIER
------------------------------------------
10 | APPLE | Y | 10
20 | ORANGE | Y | 10
30 | KEYBOARD | N | 20
40 | ORANGE | Y | 20
50 | BANANA | Y | 10
60 | CHERRY | Y | 10
Table prices
ARTNR | PRCGRP | PRCDAT | PRICE
--------------------------------------
10 | 10 | 01-Aug-10 | 2.1
10 | 10 | 05-Aug-11 | 2.2
10 | 10 | 21-Aug-12 | 2.5
20 | 0 | 01-Aug-10 | 2.1
20 | 10 | 09-Aug-12 | 2.3
10 | 10 | 14-Aug-13 | 2.7
This is what I have so far:
SELECT
ARTICLES.[ARTNR], ARTICLES.[TXT], ARTICLES.[ACTIVE], ARTICLES.[SUPPLIER], PRICES.PRCGRP, PRICES.PRCDAT, PRICES.PRICE
FROM
ARTICLES INNER JOIN PRICES ON ARTICLES.ARTNR = PRICES.ARTNR
WHERE
(
(ARTICLES.[ACTIVE]="Y") AND
(ARTICLES.[SUPPLIER]=10) AND
(PRICES.PRCGRP=0) AND
(PRICES.PRCDAT=(SELECT MAX(PRCDAT) FROM PRICES as art WHERE art.ARTNR = PRICES.artnr) )
)
ORDER BY ARTICLES.ARTNR
;
It is okay to choose just one supplier each time, but I want the max price.
The problem is:
Lots of articles do not show up with the query above, but I cannot figure out what is wrong.
I can see that they should be in the resultset when I leave out the subselect on max prcdat.
What is wrong?
Your subquery to get the latest price does not take the other conditions into account; that is, when you're getting the latest price, you may get a price in another price group or one that is not active. When you join that against the filtered list that has no inactive prices and only prices in a single price group, you get no hits that exist in both.
Either you need to duplicate your conditions or, better, move them inside the subquery to get the latest price under those conditions. I can't test against Access, but something like this should be possible if its SQL is not too limited:
SELECT a.artnr, a.txt, a.active, a.supplier, p.prcgrp, p.prcdat, p.price
FROM articles a INNER JOIN prices p ON a.ARTNR = p.ARTNR
JOIN (
SELECT a.artnr, MAX(p.prcdat) prcdat
FROM articles a JOIN prices p ON a.artnr = p.artnr
WHERE a.active='Y' AND a.supplier=10 AND p.prcgrp=10
GROUP BY a.artnr) z
ON a.artnr = z.artnr AND p.prcdat = z.prcdat
ORDER BY a.ARTNR
If the SQL support in Access won't allow a join with a subquery, you can just move the conditions inside your existing subquery, something like:
SELECT a.artnr, a.txt, a.active, a.supplier, p.prcgrp, p.prcdat, p.price
FROM articles a INNER JOIN prices p ON a.ARTNR = p.ARTNR
WHERE p.prcdat = (
SELECT MAX(p2.prcdat)
FROM articles a2 JOIN prices p2 ON a2.artnr = p2.artnr
WHERE a.artnr = a2.artnr AND a2.active='Y' AND a2.supplier=10 AND p2.prcgrp=10
)
ORDER BY a.ARTNR;
Note that due to limitations in identifying a unique price (no primary key in prices), the queries may give duplicates if several prices for the same article have the same prcdat. If that's a problem, you'll probably need to duplicate your conditions outside the subquery too.
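A sketch of the first (join-to-subquery) version, run with Python's sqlite3 on the question's two tables. Assumption: the DD-Mon-YY dates are rewritten as ISO 8601 text so that MAX() on a text column compares them chronologically; Access compares real date values natively:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (artnr INTEGER, txt TEXT, active TEXT, supplier INTEGER);
CREATE TABLE prices (artnr INTEGER, prcgrp INTEGER, prcdat TEXT, price REAL);
INSERT INTO articles VALUES
  (10, 'APPLE', 'Y', 10), (20, 'ORANGE', 'Y', 10), (30, 'KEYBOARD', 'N', 20),
  (40, 'ORANGE', 'Y', 20), (50, 'BANANA', 'Y', 10), (60, 'CHERRY', 'Y', 10);
INSERT INTO prices VALUES
  (10, 10, '2010-08-01', 2.1), (10, 10, '2011-08-05', 2.2), (10, 10, '2012-08-21', 2.5),
  (20, 0, '2010-08-01', 2.1), (20, 10, '2012-08-09', 2.3), (10, 10, '2013-08-14', 2.7);
""")

# z holds the latest price date per article under the same filters,
# and the outer join picks the price row(s) carrying that date.
rows = conn.execute("""
SELECT a.artnr, a.txt, p.prcgrp, p.prcdat, p.price
FROM articles a
INNER JOIN prices p ON a.artnr = p.artnr
JOIN (
  SELECT a.artnr, MAX(p.prcdat) AS prcdat
  FROM articles a JOIN prices p ON a.artnr = p.artnr
  WHERE a.active = 'Y' AND a.supplier = 10 AND p.prcgrp = 10
  GROUP BY a.artnr
) z ON a.artnr = z.artnr AND p.prcdat = z.prcdat
ORDER BY a.artnr
""").fetchall()

for row in rows:
    print(row)
# (10, 'APPLE', 10, '2013-08-14', 2.7)
# (20, 'ORANGE', 10, '2012-08-09', 2.3)
```

Articles 50 and 60 have no prices in the sample, so they drop out of the inner joins.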