Sql query group by question, creating monthly averages - sql

I am trying to compile a table which shows three columns:
product name, average monthly sales volume and average monthly sales price for said product.
I am using adventureworks 2019.
I have written the below query based upon some help I previously received. I have summed the order quantity and unit price for each product and then in the outer query averaged them.
SELECT
Product_Name,
AVG(Sales_Volume) AS Avg_Sales_Volume,
AVG(Price) AS Avg_Price
FROM
(SELECT
PP.[Name] AS Product_Name,
SUM(SSOD.[OrderQty]) AS Sales_Volume,
SUM(SSOD.[UnitPrice]) AS Price,
FORMAT(SSOH.[OrderDate], 'MM-yyyy') AS Month_Year
FROM
[Sales].[SalesOrderHeader] AS SSOH
LEFT JOIN Sales.SalesOrderDetail AS SSOD
ON SSOH.SalesOrderID = SSOD.SalesOrderID
LEFT JOIN production.product AS PP
ON SSOD.ProductID = PP.ProductID
GROUP BY PP.[Name], SSOH.[OrderDate]) AS T
--WHERE Product_Name = 'Road-150 Red, 44' (doing this for reference)
GROUP BY T.Product_Name, Month-Year
If I group by Month-Year I get around 3600 rows, whereas if I don't, I get 266. I am confused about this. Basically, I am not sure what it is actually showing.
To some of you this may seem really basic, but it feels like I cannot get my head around it.
Could anyone take a bit of time to explain this to me?
Thanks
Product_Name Avg_Sales_Volume Avg_Price
LL Mountain Frame - Silver, 48 14 844.96
LL Touring Frame - Blue, 50 26 2100.546
Women's Mountain Shorts, L 13 161.494
Road-550-W Yellow, 44 3 1890.7332
HL Road Frame - Red, 48 18 6025.3137
Mountain-500 Silver, 42 8 1395.0116
(266 rows)
Query with group by Month-Year also:
Product_Name Avg_Sales_Volume Avg_Price
Road-150 Red, 44 1 6758.9544
HL Mountain Frame - Silver, 46 15 4465.6362
AWC Logo Cap 14 76.4672
Long-Sleeve Logo Jersey, L 2 102.611
Road-150 Red, 56 2 6817.546
Mountain-500 Silver, 52 13 2118.7125
LL Touring Frame - Yellow, 62 62 5001.30
ML Mountain Frame-W - Silver, 40 115 6546.3382
(3862 rows)

A few notes first, then an explanation of what you're seeing.
First, your outer query has GROUP BY Month-Year with a dash, but I believe this is intended to be the column Month_Year (with an underscore) from the subquery.
Next, I like your use of AS to explicitly define aliases on columns, as well as your use of square brackets [ and ] around object names rather than quoted identifiers.
Finally, start getting in the practice of using schema- and alias-identifiers throughout your code - especially when working with subqueries. They'll make it much more clear which objects should be returned, and will prevent "ambiguous column" errors.
Explanation:
The difference in row-counts returned for each of your two queries is due to how those queries instruct the grouping to occur.
GROUP BY T.Product_Name; returns 266 rows because it is aggregating all of the sales volume and pricing information for the given product across all time. Changing up your subquery a bit to read:
SELECT COUNT(DISTINCT pp.Name)
FROM [Sales].[SalesOrderHeader] AS SSOH
LEFT JOIN Sales.SalesOrderDetail AS SSOD
ON SSOH.SalesOrderID = SSOD.SalesOrderID
LEFT JOIN production.product AS PP
ON SSOD.ProductID = PP.ProductID
returns a single row with a count of 266, indicating that there are 266 distinct product names included in that set.
When you add more grouping conditions (such as T.Month_Year), you are telling the engine to make "subgroups" in the aggregation structure.
GROUP BY T.Product_Name, T.Month_Year runs your aggregates once for each distinct combination of product name and T.Month_Year value, so each of the 266 product names gets one row for every month in which it appears.
Examining those row-counts a bit closer, the two-condition grouping returns 3,862 rows while the single-condition grouping returns 266 rows. Across those 3,862 rows there are 266 distinct product names, each represented an average of 14.52 times (3862 / 266 = 14.518...). If you assumed that every product had at least one sale per month, then you might conclude that we are looking at slightly more than one year's worth of sales data here. More likely, this is a set of several years of sales data with a lot of variation in sales volume and frequency between products.
ADDENDUM: Adding your GROUP BY columns to the SELECT illustrates the difference in result sets:
SELECT Product_Name,
T.Month_Year,
AVG(Sales_Volume) AS Avg_Sales_Volume,
AVG(Price) AS Avg_Price
FROM (SELECT PP.[Name] AS Product_Name,
SUM(SSOD.[OrderQty]) AS Sales_Volume,
SUM(SSOD.[UnitPrice]) AS Price,
FORMAT(SSOH.[OrderDate], 'MM-yyyy') AS Month_Year
FROM [Sales].[SalesOrderHeader] AS SSOH
LEFT JOIN Sales.SalesOrderDetail AS SSOD
ON SSOH.SalesOrderID = SSOD.SalesOrderID
LEFT JOIN production.product AS PP
ON SSOD.ProductID = PP.ProductID
GROUP BY PP.[Name], SSOH.[OrderDate]) AS T
--WHERE Product_Name = 'Road-150 Red, 44' (doing this for reference)
GROUP BY T.Product_Name,
T.Month_Year
ORDER BY Product_Name;
Examining the results shows that each product name also has records for any month in which that product sold:
+--------------------------+-----------+-------+-----------+
| ProductName | Month_Year|Avg_Vol| Avg_Price |
+--------------------------+-----------+-------+-----------+
| All-Purpose Bike Stand | 12-2013 | 1 | 193.0714 |
| All-Purpose Bike Stand | 06-2014 | 1 | 218.625 |
| All-Purpose Bike Stand | 05-2014 | 1 | 187.909 |
| All-Purpose Bike Stand | 10-2013 | 1 | 212.00 |
| AWC Logo Cap | 02-2014 | 6 | 57.7928 |
| AWC Logo Cap | 02-2012 | 48 | 93.357 |
| AWC Logo Cap | 08-2011 | 68 | 103.73 |
| AWC Logo Cap | 01-2013 | 124 | 129.4896 |
| AWC Logo Cap | 03-2014 | 21 | 71.1747 |
+--------------------------+-----------+-------+-----------+
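The same effect can be reproduced on a tiny scale. Below is a minimal sketch using Python's built-in sqlite3 module; the `sales` table, its columns, and its rows are made up for illustration and are not AdventureWorks. Unlike the queries above, the inner query here groups by the month rather than by the raw order date, on the assumption that a "monthly average" should average per-month totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical toy data, not AdventureWorks
    CREATE TABLE sales (product TEXT, order_date TEXT, qty INTEGER);
    INSERT INTO sales VALUES
        ('Cap',  '2013-01-05', 2),
        ('Cap',  '2013-01-20', 4),
        ('Cap',  '2013-02-10', 10),
        ('Bike', '2013-01-07', 1),
        ('Bike', '2013-03-15', 3);
""")

# Inner query: one row per product per month, holding that month's total.
monthly = """
    SELECT product,
           strftime('%Y-%m', order_date) AS month_year,
           SUM(qty) AS sales_volume
    FROM sales
    GROUP BY product, strftime('%Y-%m', order_date)
"""

# Outer GROUP BY product only: one row per product (average of monthly totals).
per_product = conn.execute(f"""
    SELECT product, AVG(sales_volume)
    FROM ({monthly}) AS t
    GROUP BY product
    ORDER BY product
""").fetchall()

# Outer GROUP BY product and month: one row per product-month combination,
# so AVG() only ever sees a single monthly total.
per_product_month = conn.execute(f"""
    SELECT product, month_year, AVG(sales_volume)
    FROM ({monthly}) AS t
    GROUP BY product, month_year
    ORDER BY product, month_year
""").fetchall()

print(per_product)        # 2 rows, one per product
print(per_product_month)  # 4 rows, one per product-month
```

With two products sold across two months each, the first query returns 2 rows and the second returns 4, which mirrors the 266 versus 3,862 split on a small scale.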


Finding products that were ordered 20% more times than the average of all other products in postgresql

I have asked a similar question and have received some help from some very nice people.
How to find the average of all other products in postgresql.
That question did not cover the whole problem. I thought I could work out the rest on my own once the hardest part was resolved, but apparently I overestimated my abilities. So I'm posting another question... :)
The question is as follows.
I have a table Products which looks like the following:
+-----------+-----------+----------+
|ProductCode|ProductType| .... |
+-----------+-----------+----------+
| ref01 | BOOKS | .... |
| ref02 | ALBUMS | .... |
| ref06 | BOOKS | .... |
| ref04 | BOOKS | .... |
| ref07 | ALBUMS | .... |
| ref10 | TOYS | .... |
| ref13 | TOYS | .... |
| ref09 | ALBUMS | .... |
| ref29 | TOYS | .... |
| ref02 | ALBUMS | .... |
| ..... | ..... | .... |
+-----------+-----------+----------+
Another table Sales which looks like the following:
+-----------+-----------+----------+
|ProductCode| qty | .... |
+-----------+-----------+----------+
| ref01 | 15 | .... |
| ref02 | 12 | .... |
| ref06 | 20 | .... |
| ref04 | 14 | .... |
| ref07 | 11 | .... |
| ref10 | 19 | .... |
| ref13 | 3 | .... |
| ref09 | 9 | .... |
| ref29 | 5 | .... |
| ref02 | 4 | .... |
| ..... | ..... | .... |
+-----------+-----------+----------+
I am trying to find the products that were ordered 20% more than the average of all other products of the same type.
A product can be ordered several times and the quantities (qty) of each order might not be the same. Such as ref02 in the sample table. I only included one example (ref02) but it is the case for all products. So to find how many times a specific product was ordered would mean to find the sum of quantities ordered from all orders of the product.
By manually calculating, the result should be something like:
+-----------+-----------+----------+
|ProductCode| qty | .... |
+-----------+-----------+----------+
| ref02 | 16 | .... |
| ref06 | 20 | .... |
| ref07 | 11 | .... |
| ref10 | 19 | .... |
| ..... | ..... | .... |
+-----------+-----------+----------+
So for the type ALBUMS and the product ref02, I need to find the average of the orders of ALL OTHER ALBUMS.
In this case, it is the average of ref07 and ref09, but there are more in the actual table. So what I need to do is the following:
Since product ref02 is 'ALBUMS' and there are two orders of ref02, the total orders will be 12+4=16. And ref07 and ref09 are also 'ALBUMS'.
So their average is (11+9)/2=10 < 12+4=16.
Since product ref06 is 'BOOKS', and **ref01** and ref04 are also 'BOOKS'.
So their average is (15+14)/2=14.5 <20.
Since product ref07 is 'ALBUMS', and **ref02** and ref09 are also 'ALBUMS'.
So their average is (12+9+4)/3=8.3 <11.
Since product ref10 is 'TOYS', and ref13 and ref29 are also 'TOYS'
So their average is (3+5)/2=4<19.
The rest does not satisfy the condition thus will not be in the result.
I know how to and was able to find the average of orders for all products under the same type, but I have no idea how to find the average of orders for all other products under the same type.
I know how to find the desired products with the help I received on my previous question, How to find the average of all other products in postgresql, but only when there is one order for each product. I don't know how to proceed if there are multiple orders for each product. This is the "overestimated" bit I mentioned at the beginning... :(
The answers I received to my previous question have this problem:
DEMO (db<>fiddle). The tables in the demo are much more similar to the ones I'm working with, and as you see, there are many rows for one product. (The duplicated rows are by accident. The values just happened to be the same)
I am using PostgreSQL, but the exercise forbids the use of several keywords including: WITH, OVER, LIMIT, PARTITION, or LATERAL. I realize that they are commonly used in most solutions I've found and the ones provided to me, but I cannot use them because no result will be returned otherwise... :(
I know not being allowed to use these keywords can be annoying, but I honestly don't know what to do so please help! :)
I wrote a query that returns all the combinations: total by product code, total by product type, and so on. If you need the average, you can calculate it as (sum of values / count of values).
select
main1.product_code,
main1.product_type,
main1.total as "Total by Product Code",
main1.sales_count as "Count by Product Code",
main2.total as "Total by Product Type",
main2.sales_count as "Count by Product Type",
main2.total - main1.total as "Total by Other Products Types (ignore this Product Code)",
main2.sales_count - main1.sales_count as "Count by Other Products Types (ignore this Product Code)"
from
(
select
s.product_code,
p.product_type,
sum(s.qty) as total,
count(*) as sales_count
from
examples.sales s
left join
examples.products p on p.product_code = s.product_code
group by
s.product_code, p.product_type
) main1
left join
(
select t1.product_type, sum(t1.qty) as total, count(*) as sales_count from (
select * from examples.sales s
left join examples.products p on p.product_code = s.product_code
) t1
group by t1.product_type
) main2 on main1.product_type = main2.product_type
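Result (the two "Other Pr.Types" columns ignore this product code):
+---------+---------+------------------+------------------+------------------+------------------+-------------------------+-------------------------+
| Pr.Code | Pr.Type | Total by Pr.Code | Count by Pr.Code | Total by Pr.Type | Count by Pr.Type | Total by Other Pr.Types | Count by Other Pr.Types |
+---------+---------+------------------+------------------+------------------+------------------+-------------------------+-------------------------+
| ref29   | TOYS    | 5                | 1                | 27               | 3                | 22                      | 2                       |
| ref06   | BOOKS   | 20               | 1                | 34               | 2                | 14                      | 1                       |
| ref13   | TOYS    | 3                | 1                | 27               | 3                | 24                      | 2                       |
| ref02   | ALBUMS  | 16               | 2                | 36               | 4                | 20                      | 2                       |
| ref10   | TOYS    | 19               | 1                | 27               | 3                | 8                       | 2                       |
| ref07   | ALBUMS  | 11               | 1                | 36               | 4                | 25                      | 3                       |
| ref04   | BOOKS   | 14               | 1                | 34               | 2                | 20                      | 1                       |
| ref09   | ALBUMS  | 9                | 1                | 36               | 4                | 27                      | 3                       |
+---------+---------+------------------+------------------+------------------+------------------+-------------------------+-------------------------+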
Fix two errors in the setup
1.
A product can be ordered several times ...
It should still appear once in the Products table. The 2nd entry of ref02 is wrong.
2.
So to find how many times a specific product was ordered would mean to find the sum of quantities ordered from all orders of the product.
So your rationale for ref07 doesn't hold:
Since product ref07 is 'ALBUMS', and **ref02** and ref09 are also 'ALBUMS'.
So their average is (12+9+4)/3=8.3 <11.
Counting the two sales for ref02 separately is wrong in light of your definition. Operate with sums per product:
Since product ref07 is 'ALBUMS', and ref02 and ref09 are also 'ALBUMS'.
So their average is (16+9)/2 = 12.5 > 11. -- doesn't qualify!
Answer
find the products that were ordered 20% more than the average of all other products of the same type.
I am putting a proper solution first: an efficient query for Postgres 11+ using a window function with custom window frame over the aggregate sum()
SELECT product_code, orders
FROM (
SELECT product_code, sum(s.orders) AS orders
, avg(sum(s.orders)) OVER (PARTITION BY p.product_type
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
EXCLUDE CURRENT ROW) AS avg_orders
FROM product p
JOIN sales s USING (product_code)
GROUP BY product_code, p.product_type
) sub
WHERE avg_orders * 1.2 < orders
ORDER BY product_code; -- optional
Result (with the errors mentioned above fixed):
product_code
orders
ref02
16
ref06
20
ref10
19
Much more efficient than the below.
Postgres can apply a window function over an aggregate in the same query level. See:
Postgres window function and group by exception
How to use a SQL window function to calculate a percentage of an aggregate
At your request, an inefficient solution working around modern SQL features:
SELECT product_code, ps.orders
FROM (
SELECT product_code, p.product_type, sum(s.orders) AS orders
FROM product p
JOIN sales s USING (product_code)
GROUP BY product_code, p.product_type
) ps
JOIN LATERAL (
SELECT avg(orders) AS avg_orders
FROM (
SELECT sum(s1.orders) AS orders
FROM product p1
JOIN sales s1 USING (product_code)
WHERE p1.product_type = ps.product_type
AND p1.product_code <> ps.product_code
GROUP BY product_code
) sub
) a ON a.avg_orders * 1.2 < ps.orders
ORDER BY product_code; -- optional
db<>fiddle here
Same result.
We have to repeat the basic aggregation for sums in the subquery, since we cannot use a CTE to materialize it. (Possible remaining workaround: use a temporary table instead.)
Basics in my answer to your previous question:
How to find the average of all other products in postgresql
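For readers who must also avoid LATERAL, the same per-product sums can be compared through a plain correlated subquery. The following is a minimal sketch, not the answerer's query: it runs against SQLite through Python's sqlite3 module, uses the question's sample rows (with the duplicate ref02 product entry removed), and assumes the column names `product_code`, `product_type`, and `qty`. The SQL itself uses none of WITH, OVER, LIMIT, PARTITION, or LATERAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample data from the question; table and column names are assumptions
    CREATE TABLE products (product_code TEXT PRIMARY KEY, product_type TEXT);
    CREATE TABLE sales (product_code TEXT, qty INTEGER);
    INSERT INTO products VALUES
        ('ref01', 'BOOKS'), ('ref02', 'ALBUMS'), ('ref04', 'BOOKS'),
        ('ref06', 'BOOKS'), ('ref07', 'ALBUMS'), ('ref09', 'ALBUMS'),
        ('ref10', 'TOYS'),  ('ref13', 'TOYS'),   ('ref29', 'TOYS');
    INSERT INTO sales VALUES
        ('ref01', 15), ('ref02', 12), ('ref06', 20), ('ref04', 14), ('ref07', 11),
        ('ref10', 19), ('ref13', 3),  ('ref09', 9),  ('ref29', 5),  ('ref02', 4);
""")

# Total quantity per product. It is repeated twice below because a CTE
# is not allowed by the exercise.
totals = """
    SELECT p.product_code, p.product_type, SUM(s.qty) AS total
    FROM products p
    JOIN sales s ON s.product_code = p.product_code
    GROUP BY p.product_code, p.product_type
"""

# Keep a product when its total exceeds 1.2 times the average total
# of the OTHER products of the same type.
query = f"""
    SELECT t.product_code, t.total
    FROM ({totals}) AS t
    WHERE t.total > 1.2 * (
        SELECT AVG(o.total)
        FROM ({totals}) AS o
        WHERE o.product_type = t.product_type
          AND o.product_code <> t.product_code
    )
    ORDER BY t.product_code
"""

rows = conn.execute(query).fetchall()
print(rows)  # [('ref02', 16), ('ref06', 20), ('ref10', 19)]
```

A product that is the only one of its type has no "other" products, so the inner AVG() is NULL and the product is filtered out. Whether that is the desired behaviour depends on the exercise.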

SQL MIN() with GROUP BY select additional columns

I am trying to query a sql database table for the minimum price for products. I also want to grab an additional column with the value of the row with the minimum price. My data looks something like this.
ProductId | Price | Location
1 | 50 | florida
1 | 55 | texas
1 | 53 | california
2 | 65 | florida
2 | 64 | texas
2 | 60 | new york
I can query the minimum price for a product with this query
select ProductId, Min(Price)
from Table
group by ProductId
What I want to do is also include the Location where the Min price is being queried from in the above query. Is there a standard way to achieve this?
One method uses a correlated subquery:
select t.*
from t
where t.price = (select min(t2.price) from t t2 where t2.productid = t.productid);
In most databases, this has very good performance with an index on (productid, price).
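As a quick check, here is a minimal sketch that runs this correlated subquery against the question's sample rows in SQLite via Python's sqlite3 module. The table name `t` follows the answer; the index name is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (productid INTEGER, price INTEGER, location TEXT);
    INSERT INTO t VALUES
        (1, 50, 'florida'), (1, 55, 'texas'), (1, 53, 'california'),
        (2, 65, 'florida'), (2, 64, 'texas'), (2, 60, 'new york');
    -- Index suggested in the answer; the name is hypothetical
    CREATE INDEX t_productid_price ON t (productid, price);
""")

# For each row, keep it only if its price equals the minimum price
# among rows with the same productid.
rows = conn.execute("""
    SELECT t.productid, t.price, t.location
    FROM t
    WHERE t.price = (SELECT MIN(t2.price)
                     FROM t AS t2
                     WHERE t2.productid = t.productid)
    ORDER BY t.productid
""").fetchall()

print(rows)  # [(1, 50, 'florida'), (2, 60, 'new york')]
```

If two locations tie for the minimum price of a product, this approach returns both rows.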

How do you use two SUM() aggregate functions in the same query for PostgreSQL?

I have a PostgreSQL query that yields the following results:
SELECT o.order || '-' || osh.ordinal_number AS order,
o.company,
o.order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
o.company,
o.order_total,
o.order_type;
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | null | 125.00 | new
123-2 | B corp. | null | 100.00 | new
I need to replace the o.order_total (it doesn't work properly) and sum up the sum of the order_shipment_total column so that, for the example above, each row winds up saying 225.00. I need the results above to look like this below:
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | 225.00 | 125.00 | new
123-2 | B corp. | 225.00 | 100.00 | new
What I've Tried
1.) To replace o.order_total, I've tried SUM(SUM(osh.items)) but get the error message that you cannot nest aggregate functions.
2.) I've tried to put the entire query as a subquery and sum the order_shipment_total column, but when I do, it just repeats the column itself. See below:
SELECT order,
company,
SUM(order_shipment_total) AS order_shipment_total,
order_shipment_total,
order_type
FROM (
SELECT o.order || '-' || osh.ordinal_number AS order,
o.company,
o.order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
o.company,
o.order_total,
o.order_type
) subquery
GROUP BY order,
company,
order_shipment_total,
order_type;
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | 125.00 | 125.00 | new
123-2 | B corp. | 100.00 | 100.00 | new
3.) I've tried to only include the rows I actually want to group by in my subquery/query example above, because I feel like I was able to do this in Oracle SQL. But when I do that, I get an error saying "column [name] must appear in the GROUP BY clause or be used in an aggregate function."
...
GROUP BY order,
company,
order_type;
ERROR: column "[a column name]" must appear in the GROUP BY clause or be used in an aggregate function.
How do I accomplish this? I was certain that a subquery would be the answer but I'm confused as to why this approach will not work.
The thing you're not quite grasping with your query / approach is that you're actually wanting two different levels of grouping in the same query row results. The subquery approach is half right, but when you do a subquery that groups, inside another query that groups you can only use the data you've already got (from the subquery) and you can only choose to keep it at the level of aggregate detail it already is, or you can choose to lose precision in favor of grouping more. You can't keep the detail AND lose the detail in order to sum up further. A query-of-subquery is hence (in practical terms) relatively senseless because you might as well group to the level you want in one hit:
SELECT groupkey1, sum(sumx) FROM
(SELECT groupkey1, groupkey2, sum(x) as sumx FROM table GROUP BY groupkey1, groupkey2)
GROUP BY groupkey1
Is the same as:
SELECT groupkey1, sum(x) FROM
table
GROUP BY groupkey1
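The equivalence of those two forms can be checked on a few rows. This is a minimal sketch using Python's sqlite3 module; the table `t` and its data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: two grouping keys and a value to sum
    CREATE TABLE t (groupkey1 TEXT, groupkey2 TEXT, x INTEGER);
    INSERT INTO t VALUES
        ('a', 'p', 1), ('a', 'p', 2), ('a', 'q', 3),
        ('b', 'p', 4), ('b', 'r', 5);
""")

# Group finely in a subquery, then group coarsely on top of it.
nested = conn.execute("""
    SELECT groupkey1, SUM(sumx)
    FROM (SELECT groupkey1, groupkey2, SUM(x) AS sumx
          FROM t
          GROUP BY groupkey1, groupkey2) AS sub
    GROUP BY groupkey1
    ORDER BY groupkey1
""").fetchall()

# Group coarsely in one step.
direct = conn.execute("""
    SELECT groupkey1, SUM(x)
    FROM t
    GROUP BY groupkey1
    ORDER BY groupkey1
""").fetchall()

print(nested)  # [('a', 6), ('b', 9)]
print(direct)  # [('a', 6), ('b', 9)]
```

This holds for SUM because a sum of partial sums is the overall sum. It would not hold for AVG, where an average of averages generally differs from the overall average.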
Gordon's answer will probably work out (except for the same bug yours exhibits in that the grouping set is wrong/doesn't cover all the columns) but it probably doesn't help much in terms of your understanding because it's a code-only answer. Here's a breakdown of how you need to approach this problem but with simpler data and foregoing the window functions in favor of what you already know.
Suppose there are apples and melons, of different types, in stock. You want a query that gives a total of each specific kind of fruit, regardless of the date of purchase. You also want a column for the total for each fruit overall type:
Detail:
fruit | type | purchasedate | count
apple | golden delicious | 2017-01-01 | 3
apple | golden delicious | 2017-01-02 | 4
apple | granny smith | 2017-01-04 | 2
melon | honeydew | 2017-01-01 | 1
melon | cantaloupe | 2017-01-05 | 4
melon | cantaloupe | 2017-01-06 | 2
So that's 7 golden delicious, 2 granny smith, 1 honeydew, 6 cantaloupe, and it's also 9 apples and 7 melons.
You can't do it as one query*, because you want two different levels of grouping. You have to do it as two queries and then (critical understanding point) you have to join the less-precise (apples/melons) results back to the more precise (granny smith/golden delicious/honeydew/cantaloupe):
SELECT * FROM
(
SELECT fruit, type, sum(count) as fruittypecount
FROM fruit
GROUP BY fruit, type
) fruittypesum
INNER JOIN
(
SELECT fruit, sum(count) as fruitcount
FROM fruit
GROUP BY fruit
) fruitsum
ON
fruittypesum.fruit = fruitsum.fruit
You'll get this:
fruit | type | fruittypecount | fruit | fruitcount
apple | golden delicious | 7 | apple | 9
apple | granny smith | 2 | apple | 9
melon | honeydew | 1 | melon | 7
melon | cantaloupe | 6 | melon | 7
Hence for your query, different groups, detail and summary:
SELECT
detail.order || '-' || detail.ordinal_number as order,
detail.company,
summary.order_total,
detail.order_shipment_total,
detail.order_type
FROM (
SELECT o.order,
osh.ordinal_number,
o.company,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
osh.ordinal_number,
o.company,
o.order_type
) detail
INNER JOIN
(
SELECT o.order,
SUM(osh.items) AS order_total
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
--don't need the where clause; we'll join on order number
GROUP BY o.order
) summary
ON
summary.order = detail.order
Gordon's query uses a window function to achieve the same effect; the window function runs after the grouping is done, and it establishes another level of grouping (PARTITION BY ordernumber) which is the effective equivalent of my GROUP BY ordernumber in the summary. The window function summary data is inherently connected to the detail data via ordernumber; it is implicit that a query saying:
SELECT
ordernumber,
lineitemnumber,
SUM(amount) linetotal,
sum(SUM(amount)) over(PARTITION BY ordernumber) ordertotal
GROUP BY
ordernumber,
lineitemnumber
..will have an ordertotal that is the total of all the linetotal values in the order. The GROUP BY prepares the data at line-level detail, and the window function prepares data at just the order level and repeats the total as many times as necessary to fill in every line item. I wrote the SUM that belongs to the GROUP BY operation in capitals; the sum in lowercase belongs to the partition operation. It has to be sum(SUM(amount)) and cannot simply be sum(amount), because amount as a column is not allowed on its own: it is not in the GROUP BY. The window function runs after the GROUP BY is done, so it can only work with the already-aggregated SUM(amount).
It behaves exactly the same as grouping to two different levels and joining together, and indeed I chose that way to explain it because it makes it more clear how it's working in relation to what you already know about groups and joins
Remember: JOINS make datasets grow sideways, UNIONS make them grow downwards. When you have some detail data and you want to grow it sideways with some more data(a summary), JOIN it on. (If you'd wanted totals to go at the bottom of each column, it would be unioned on)
*you can do it as one query (without window functions), but it can get awfully confusing because it requires all sorts of trickery that ultimately isn't worth it because it's too hard to maintain
You should be able to use window functions:
SELECT o.order || '-' || osh.ordinal_number AS order, o.company,
SUM(SUM(osh.items)) OVER (PARTITION BY o.order) as order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o JOIN
order_shipments osh
ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order, o.company, o.order_type;

SQL 2 Left outer joins with Sum and Group By

Looking for some guidance on this. I am attempting to run a report in my complaint management system: complaints by year, location, and subcategory, showing totals for TotalCredits (child table) and TotalCwts (child table) as well as the total ExternalRootCause (on the master table).
This is my SQL, but the TotalCwts and TotalCredits are not being calculated correctly. It calculates 1 time for each child record rather than the total for each master record.
SELECT
dbo.Complaints.Location,
YEAR(dbo.Complaints.ComDate) AS Year,
dbo.Complaints.ComplaintSubcategory,
COUNT(Distinct(dbo.Complaints.ComId)) AS CustomerComplaints,
SUM(DISTINCT CASE WHEN (dbo.Complaints.RootCauseSource = 'External' ) THEN 1 ELSE 0 END) as ExternalRootCause,
SUM(dbo.ComplaintProducts.Cwts) AS TotalCwts,
Coalesce(SUM(dbo.CreditDeductions.CreditAmount),0) AS TotalCredits
FROM dbo.Complaints
JOIN dbo.CustomerComplaints
ON dbo.Complaints.ComId = dbo.CustomerComplaints.ComId
LEFT OUTER JOIN dbo.CreditDeductions
ON dbo.Complaints.ComId = dbo.CreditDeductions.ComId
LEFT OUTER JOIN dbo.ComplaintProducts
ON dbo.Complaints.ComId = dbo.ComplaintProducts.ComId
WHERE
dbo.Complaints.Location = Coalesce(#Location,Location)
GROUP BY
YEAR(dbo.Complaints.ComDate),
dbo.Complaints.Location,
dbo.Complaints.ComplaintSubcategory
ORDER BY
[YEAR] desc,
dbo.Complaints.Location,
dbo.Complaints.ComplaintSubcategory
Data Results
Location | Year | Subcategory | Complaints | External RC | Total Cwts | Total Credits
---------------------------------------------------------------------------------------
Boston | 2016 | Documentation | 1 | 0 | 8 | 8.00
Data Should Read
Location | Year | Subcategory | Complaints | External RC | Total Cwts | Total Credits
---------------------------------------------------------------------------------------
Boston | 2016 | Documentation | 1 | 0 | 4 | 2.00
Above data reflects 1 complaint having 4 Product Records with 1cwt each and 2 credit records with 1.00 each.
What do I need to change in my query or should I approach this query a different way?
The problem is that the 1 complaint has 2 Deductions and 4 products. When you join in this manner then it will return every combination of Deduction/Product for the complaint which gives 8 rows as you're seeing.
One solution, which should work here, is to not query the Deduction and Product tables directly; instead, join to subqueries that each return one row per complaint. In other words, replace:
LEFT OUTER JOIN dbo.CreditDeductions ON dbo.Complaints.ComId = dbo.CreditDeductions.ComId
LEFT OUTER JOIN dbo.ComplaintProducts ON dbo.Complaints.ComId = dbo.ComplaintProducts.ComId
...with this - showing the Deductions table only, you can work out the Products:
LEFT OUTER JOIN (
select ComId, count(*) CountDeductions, sum(CreditAmount) CreditAmount
from dbo.CreditDeductions
group by ComId
) d on d.ComId = Complaints.ComId
You'll have to change the references to dbo.CreditDeductions to just d (or whatever you want to call it).
Once you've done them both, you'll have one row from each subquery per complaint, which results in 1 row per complaint containing the counts and totals from the two sub-tables.
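The fan-out and the fix can be reproduced with the numbers from the question (1 complaint, 4 product records of 1 cwt each, 2 credit records of 1.00 each). This is a minimal sketch in SQLite via Python's sqlite3 module; the simplified table and column names are assumptions, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the three tables
    CREATE TABLE complaints (com_id INTEGER PRIMARY KEY);
    CREATE TABLE complaint_products (com_id INTEGER, cwts INTEGER);
    CREATE TABLE credit_deductions (com_id INTEGER, credit_amount REAL);
    INSERT INTO complaints VALUES (1);
    INSERT INTO complaint_products VALUES (1, 1), (1, 1), (1, 1), (1, 1);
    INSERT INTO credit_deductions VALUES (1, 1.0), (1, 1.0);
""")

# Joining both child tables directly yields 4 x 2 = 8 rows for the complaint,
# so both sums are inflated.
naive = conn.execute("""
    SELECT c.com_id, SUM(p.cwts), COALESCE(SUM(d.credit_amount), 0)
    FROM complaints c
    LEFT JOIN credit_deductions d ON d.com_id = c.com_id
    LEFT JOIN complaint_products p ON p.com_id = c.com_id
    GROUP BY c.com_id
""").fetchall()

# Pre-aggregating each child table to one row per complaint avoids the fan-out.
fixed = conn.execute("""
    SELECT c.com_id,
           COALESCE(p.total_cwts, 0),
           COALESCE(d.total_credits, 0)
    FROM complaints c
    LEFT JOIN (SELECT com_id, SUM(credit_amount) AS total_credits
               FROM credit_deductions
               GROUP BY com_id) d ON d.com_id = c.com_id
    LEFT JOIN (SELECT com_id, SUM(cwts) AS total_cwts
               FROM complaint_products
               GROUP BY com_id) p ON p.com_id = c.com_id
""").fetchall()

print(naive)  # [(1, 8, 8.0)]
print(fixed)  # [(1, 4, 2.0)]
```

In the full report, the per-complaint rows produced by the second query would then be grouped by year, location, and subcategory as before.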

SQL select only highest date

For a project I want to generate a price list.
I want to get only the latest prices from each supplier for each article.
There are just those two tables.
Table articles
ARTNR | TXT | ACTIVE | SUPPLIER
------------------------------------------
10 | APPLE | Y | 10
20 | ORANGE | Y | 10
30 | KEYBOARD | N | 20
40 | ORANGE | Y | 20
50 | BANANA | Y | 10
60 | CHERRY | Y | 10
Table prices
ARTNR | PRCGRP | PRCDAT | PRICE
--------------------------------------
10 | 10 | 01-Aug-10 | 2.1
10 | 10 | 05-Aug-11 | 2.2
10 | 10 | 21-Aug-12 | 2.5
20 | 0 | 01-Aug-10 | 2.1
20 | 10 | 09-Aug-12 | 2.3
10 | 10 | 14-Aug-13 | 2.7
This is what I have so far:
SELECT
ARTICLES.[ARTNR], ARTICLES.[TXT], ARTICLES.[ACTIVE], ARTICLES.[SUPPLIER], PRICES.PRCGRP, PRICES.PRCDAT, PRICES.PRICE
FROM
ARTICLES INNER JOIN PRICES ON ARTICLES.ARTNR = PRICES.ARTNR
WHERE
(
(ARTICLES.[ACTIVE]="Y") AND
(ARTICLES.[SUPPLIER]=10) AND
(PRICES.PRCGRP=0) AND
(PRICES.PRCDAT=(SELECT MAX(PRCDAT) FROM PRICES as art WHERE art.ARTNR = PRICES.artnr) )
)
ORDER BY ARTICLES.ARTNR
;
It is okay to choose just one supplier each time, but I want the max price.
The problem is:
Lots of articles do not show up with the query above,
but I cannot figure out what is wrong.
I can see that they should be in the resultset when I leave out the subselect on max prcdat.
What is wrong?
Your subquery to get the latest price does not take the other conditions into account, that is when you're getting the latest price, you may get a price in another price group or that is not active. When you join that against the filtered list that has no inactive prices and only prices in a single price group, you get no hits that exist in both.
Either you need to duplicate or - better - move your conditions inside the subquery, so that it returns the latest price under those conditions. I can't test against Access, but something like this should be possible if the SQL is not too limited:
SELECT a.artnr, a.txt, a.active, a.supplier, p.prcgrp, p.prcdat, p.price
FROM articles a INNER JOIN prices p ON a.ARTNR = p.ARTNR
JOIN (
SELECT a.artnr, MAX(p.prcdat) prcdat
FROM articles a JOIN prices p ON a.artnr = p.artnr
WHERE a.active='Y' AND a.supplier=10 AND p.prcgrp=10
GROUP BY a.artnr) z
ON a.artnr = z.artnr AND p.prcdat = z.prcdat
ORDER BY a.ARTNR
If the SQL support in Access won't allow a join with a subquery, you can just move the conditions inside your existing subquery, something like:
SELECT a.artnr, a.txt, a.active, a.supplier, p.prcgrp, p.prcdat, p.price
FROM articles a INNER JOIN prices p ON a.ARTNR = p.ARTNR
WHERE p.prcdat = (
SELECT MAX(p2.prcdat)
FROM articles a2 JOIN prices p2 ON a2.artnr = p2.artnr
WHERE a.artnr = a2.artnr AND a2.active='Y' AND a2.supplier=10 AND p2.prcgrp=10
)
ORDER BY a.ARTNR;
Note that due to limitations in identifying a unique price (no primary key in prices), the queries may give duplicates if several prices for the same article have the same prcdat. If that's a problem, you'll probably need to duplicate your conditions outside the subquery too.
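The missing-rows effect can be seen with the question's sample prices. The sketch below uses SQLite through Python's sqlite3 module rather than Access, stores dates as ISO text so that MAX() orders them correctly, and loads only two of the articles; those choices are assumptions made to keep the example small. It is a variant of the second query above: the subquery is correlated on the price group instead of repeating the literal, and the filters stay in the outer query as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (artnr INTEGER, txt TEXT, active TEXT, supplier INTEGER);
    CREATE TABLE prices (artnr INTEGER, prcgrp INTEGER, prcdat TEXT, price REAL);
    INSERT INTO articles VALUES (10, 'APPLE', 'Y', 10), (20, 'ORANGE', 'Y', 10);
    INSERT INTO prices VALUES
        (10, 10, '2010-08-01', 2.1), (10, 10, '2011-08-05', 2.2),
        (10, 10, '2012-08-21', 2.5), (20, 0,  '2010-08-01', 2.1),
        (20, 10, '2012-08-09', 2.3), (10, 10, '2013-08-14', 2.7);
""")

# Latest date taken over ALL price groups: article 20's latest date belongs
# to price group 10, so its group-0 row never matches and the article vanishes.
unfiltered = conn.execute("""
    SELECT a.artnr, p.prcdat, p.price
    FROM articles a
    JOIN prices p ON a.artnr = p.artnr
    WHERE a.active = 'Y' AND a.supplier = 10 AND p.prcgrp = 0
      AND p.prcdat = (SELECT MAX(p2.prcdat)
                      FROM prices p2
                      WHERE p2.artnr = p.artnr)
""").fetchall()

# Latest date taken within the same price group as the outer row.
filtered = conn.execute("""
    SELECT a.artnr, p.prcdat, p.price
    FROM articles a
    JOIN prices p ON a.artnr = p.artnr
    WHERE a.active = 'Y' AND a.supplier = 10 AND p.prcgrp = 0
      AND p.prcdat = (SELECT MAX(p2.prcdat)
                      FROM prices p2
                      WHERE p2.artnr = p.artnr
                        AND p2.prcgrp = p.prcgrp)
""").fetchall()

print(unfiltered)  # []
print(filtered)    # [(20, '2010-08-01', 2.1)]
```

Article 10 is absent from both results because the sample data holds no price for it in price group 0.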